belltown
Roku Guru
07-19-2016
01:45 PM
Re: How can developer test with different regions?
"renojim" wrote:
I tried changing the font, but it didn't make a difference. I only have Consolas, Lucida Console, and Raster Fonts to choose from. I think there's more going on here than just the font. I don't think the Windows 7 console understands UTF-8.
-JT
10 more days to take advantage of the free upgrade to Windows 10, which has a much improved console experience. In Windows 7, you might be able to get UTF-8 console support by typing chcp 65001 at the command prompt (and using the Consolas or Lucida Console font), although I no longer have a Win7 system to test that out on. Of course, if you're doing this in a debug session you can use PurpleBug, which should print all currency symbols correctly; if not, let me know.
renojim
Community Streaming Expert
07-19-2016
01:56 PM
Re: How can developer test with different regions?
"RokuKC" wrote:"renojim" wrote:
......
asc() values: 48 44 55 57 32 8364 <- I thought asc() gave a value <= 255?
String characters represent Unicode code points.
8364 = U+20AC 'EURO SIGN'
I think I'm starting to get this. Internally the string is encoded as UTF-8. When I use asc() to get the ASCII code for the 6th character from the string, which is represented by 3 bytes in UTF-8, I get a Unicode value. Now how all that chooses a character from the font I'm using, I have no idea, but then that's exactly why I wanted to test this in the first place.
-JT
Roku Community Streaming Expert
Help others find this answer and click "Accept as Solution."
If you appreciate my answer, maybe give me a Kudo.
I am not a Roku employee.
renojim
Community Streaming Expert
07-19-2016
02:12 PM
Re: How can developer test with different regions?
EnTerr, belltown, keep in mind that you can't test this from the debugger. That's the whole problem. You can only test IAPs from a packaged channel and you can only test locales by creating a user account "in" another country. I'm using an updated and improved* version of my DbgPrint and netcat running in a Windows 7 console to capture the output.
I'll give the Windows 10 console a try, but for various reasons I won't be updating my laptop.
-JT
* - I found that using UDP (roDatagramSocket) works a lot better
belltown
Roku Guru
07-19-2016
02:34 PM
Re: How can developer test with different regions?
"renojim" wrote:
Now how all that chooses a character from the font I'm using, I have no idea, but then that's exactly why I wanted to test this in the first place.
Any Unicode "character" can be identified by a number, called a "code point".
The code point for the Euro sign has been assigned the decimal number 8364 (hex 20AC).
The internal representation of a character is determined by its "encoding". One such encoding is UTF-8, which can represent any character as a sequence of one to four 8-bit bytes. The UTF-8 encoding for the Euro sign is the three-byte sequence: E2 82 AC (hex).
The external representation of a character is called a "glyph". The glyph used to represent a particular character depends on the "font" used to display the characters. A font maps code points to glyphs. The font must contain a glyph representing a character for that character to display correctly. What glyphs the font contains are up to the font designer; there's no uniform standard for that, and some fonts contain more glyphs than others.
To display a character correctly requires use of the correct encoding to read the character's internal byte representation, AND a font that contains a glyph that represents the character.
Your Windows 7 console displayed gibberish because it was not aware that the character was encoded in UTF-8, i.e. it didn't know that the 3-byte sequence E2 82 AC represented the Euro character. It displayed each byte as a separate character because that is what its default encoding called for.
BrightScript stores characters internally using the UTF-8 encoding. 'Asc' returns the code point number of the character's UTF-8 byte sequence (regardless of what the documentation says). 'Len' returns the number of characters, not bytes (which the documentation states correctly).
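The code point / encoding distinction above can be sketched in a few lines. This is only an illustration in Python rather than BrightScript, since the concepts are language-independent. The choice of code page 437 as the console's default encoding is an assumption made for the example; the actual default depends on the Windows locale.

```python
# Code point vs. encoding for the Euro sign.
euro = "\u20ac"                       # one character: U+20AC EURO SIGN

code_point = ord(euro)                # the number that identifies the character
utf8_bytes = euro.encode("utf-8")     # its UTF-8 byte representation

# ASSUMPTION: a console that does not know the bytes are UTF-8 and reads
# them with a single-byte code page (cp437 is used here as an example).
misread = utf8_bytes.decode("cp437")

print(code_point)                     # 8364
print(utf8_bytes.hex(" "))            # e2 82 ac
print(len(euro), len(utf8_bytes))     # 1 3
print(len(misread))                   # 3 (one wrong glyph per byte)
```

Read with the wrong encoding, the three bytes turn into three unrelated characters, which is the kind of gibberish described above.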
EnTerr
Roku Guru
07-19-2016
02:41 PM
Re: How can developer test with different regions?
"renojim" wrote:
I think I'm starting to get this. Internally the string is encoded as UTF-8.
Not exactly. But close enough!
Internally (in memory) it is likely represented as 16-bit wide characters, `wchar_t` or `char16_t`. Long story, but suffice it to say that in the past the Unicode people thought 65536 possible characters "ought to be enough for anybody"... and were proven wrong. The RAM representation doesn't matter, since it's "externalized" to UTF-8 when using WriteAsciiFile() or roURLTransfer.PostFromString() or roByteArray.FromAsciiString().
When I use asc() to get the ASCII code for the 6th character from the string, which is represented by 3 bytes in UTF-8, I get a Unicode value. Now how all that chooses a character from the font I'm using, I have no idea, but then that's exactly why I wanted to test this in the first place.
Yes.
EnTerr
Roku Guru
07-19-2016
02:50 PM
Re: How can developer test with different regions?
"belltown" wrote:
BrightScript stores characters internally using the UTF-8 encoding.
You are right on the big picture. But this minor point would be very, very unusual if true. Strings in practice are not stored as UTF-8, because that way figuring out which character is the N-th would be tedious (either count every time from the very beginning or create an index - or be grossly wrong about what mid(beg, len) returns).
Instead, I believe the practice is to store them as 16-bit units (UCS-2, extended with surrogate pairs, i.e. UTF-16, for characters outside the BMP, meaning code points above 65535). Yeah, that gives the wrong length and indexing for those characters, but libraries redefine the meaning of "length" to cope with that.
belltown
Roku Guru
07-19-2016
02:55 PM
Re: How can developer test with different regions?
"EnTerr" wrote:"belltown" wrote:
BrightScript stores characters internally using the UTF-8 encoding.
You are right on the big picture. But this minor point would be very, very unusual if true. Strings in practice are not stored as UTF-8, because that way figuring out which character is the N-th would be tedious (either count every time from the very beginning or create an index - or be grossly wrong about what mid(beg, len) returns).
That's true. I was just explaining the concepts, not the implementation.
EnTerr
Roku Guru
07-19-2016
03:17 PM
Re: How can developer test with different regions?
I may have to eat crow on this one, because this does not seem to fit my theory (the counting should have misbehaved if the character were broken into a surrogate pair):
BrightScript Debugger> s = "123" + chr(1e6) + "567": ? len(s)
7
BrightScript Debugger> ? url.escape(s)
123%F3%B4%89%80567
BrightScript Debugger> ? mid(s,7,1)
7
RokuKC, will you tell us, pretty please? 🙂
(@renojim - sorry for the confusion I am causing with this discussion - the internal representation has no bearing on how a character gets displayed!)
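For comparison, here is a rough Python sketch of the same experiment, showing how the count differs depending on what is being counted. It only illustrates the arithmetic; it says nothing about how BrightScript actually stores strings. `urllib.parse.quote` stands in for the URL escaping step and is not the Roku API.

```python
from urllib.parse import quote

# Same test string: code point 1,000,000 (U+F4240) lies outside the BMP.
s = "123" + chr(1_000_000) + "567"

code_points = len(s)                            # characters (code points)
utf8_bytes = len(s.encode("utf-8"))             # bytes if stored as UTF-8
utf16_units = len(s.encode("utf-16-le")) // 2   # 16-bit units if stored as UTF-16

escaped = quote(s)                              # percent-encodes the UTF-8 bytes

print(code_points)   # 7
print(utf8_bytes)    # 10 (the middle character takes 4 bytes)
print(utf16_units)   # 8  (the middle character takes a surrogate pair)
print(escaped)       # 123%F3%B4%89%80567
```

A length of 7 matches counting code points. A naive count of 16-bit units would have given 8, and mid(s,7,1) would then have landed on "6" rather than "7".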
EnTerr
Roku Guru
07-19-2016
05:28 PM
Re: How can developer test with different regions?
Oh, marmalade!
BrightScript Debugger> tm = createObject("roTimeSpan")
BrightScript Debugger> s = string(2^12, "*"): s = s.replace("*", s)
BrightScript Debugger> tm.mark(): n = len(s): ? tm.TotalMilliseconds()
147
BrightScript Debugger> s2 = s.replace("*", chr(1e6))
BrightScript Debugger> tm.mark(): n = len(s2): ? tm.TotalMilliseconds()
591
😉 @belltown
belltown
Roku Guru
07-19-2016
06:39 PM
Re: How can developer test with different regions?
"EnTerr" wrote:
Oh, marmalade!
BrightScript Debugger> tm = createObject("roTimeSpan")
BrightScript Debugger> s = string(2^12, "*"): s = s.replace("*", s)
BrightScript Debugger> tm.mark(): ln = len(s): ? tm.TotalMilliseconds()
147
BrightScript Debugger> s = s.replace("*", chr(1e6))
BrightScript Debugger> tm.mark(): ln = len(s): ? tm.TotalMilliseconds()
591
😉 @belltown
Yes, len() counts the chars, one by one. So it probably indexes that way too, and maybe does store UTF-8 chars as contiguous bytes internally.
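A minimal sketch of what "counting the chars, one by one" over UTF-8 bytes could look like, written in Python for illustration. The helper names `utf8_len` and `utf8_mid` are hypothetical, and this is only a guess at the general technique, not Roku's actual implementation. Every byte has to be visited, so the cost grows with the byte length of the string, which would be consistent with the slower timing for the 4-byte characters.

```python
def utf8_len(data: bytes) -> int:
    """Count characters by skipping UTF-8 continuation bytes (10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_mid(data: bytes, start: int, count: int) -> str:
    """Return `count` characters from 1-based position `start`, like Mid().

    Hypothetical helper: scans the whole buffer to find where each
    character begins, because UTF-8 characters vary in width.
    """
    starts = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    starts.append(len(data))                  # sentinel: end of buffer
    begin = min(start - 1, len(starts) - 1)
    end = min(begin + count, len(starts) - 1)
    return data[starts[begin]:starts[end]].decode("utf-8")


data = ("123" + chr(1_000_000) + "567").encode("utf-8")
print(utf8_len(data))        # 7
print(utf8_mid(data, 7, 1))  # 7
```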