greubel
Visitor
01-16-2014
12:00 PM
Len() and UTF8
Len() returns an incorrect value for some UTF-8 sequences on some firmware builds.
This also affects the font string width, GetOneLineWidth().
Model=N1000 Level=3.1 1182 - chars should be 27 and 21
13:30:05.885 Get_Font w=268 h=60 b=True i=False m=2 [Вечная призрачная встречная
О чем говорят мужчины]
13:30:05.894 Line 0 chars 52
13:30:05.904 000000 D092D0B5D187D0BDD0B0D18F20D0BFD180D0B8D0B7D180D0B0D187D0BDD0B0D1
13:30:05.914 000032 8F20D0B2D181D182D180D0B5D187D0BDD0B0D18F000000000000000000000000
13:30:05.920 Line 1 chars 39
13:30:05.928 000000 D09E20D187D0B5D0BC20D0B3D0BED0B2D0BED180D18FD18220D0BCD183D0B6D1
13:30:05.937 000032 87D0B8D0BDD18B00000000000000000000000000000000000000000000000000
Model=3100X Level=5.4 230 - is correct
13:24:07.987 Get_Font w=476 h=90 b=True i=False m=2 [Вечная призрачная встречная
О чем говорят мужчины]
13:24:08.003 Line 0 chars 27
13:24:08.005 000000 D092D0B5D187D0BDD0B0D18F20D0BFD180D0B8D0B7D180D0B0D187D0BDD0B0D1
13:24:08.008 000032 8F20D0B2D181D182D180D0B5D187D0BDD0B0D18F000000000000000000000000
13:24:08.010 Line 1 chars 21
13:24:08.012 000000 D09E20D187D0B5D0BC20D0B3D0BED0B2D0BED180D18FD18220D0BCD183D0B6D1
13:24:08.023 000032 87D0B8D0BDD18B00000000000000000000000000000000000000000000000000
EnTerr
Roku Guru
01-19-2014
02:53 PM
Re: Len() and UTF8
As a clarification, I checked what those strings are:
- first one is "Вечная призрачная встречная" consisting of 27 characters, represented in UTF8 by 52 octets (D092 D0B5 D187 D0BD D0B0 D18F 20 D0BF D180 D0B8 D0B7 D180 D0B0 D187 D0BD D0B0 D18F 20 D0B2 D181 D182 D180 D0B5 D187 D0BD D0B0 D18F)
- second one is "О чем говорят мужчины", with length 21 chars, represented by 39 octets (D09E 20 D187 D0B5 D0BC 20 D0B3 D0BE D0B2 D0BE D180 D18F D182 20 D0BC D183 D0B6 D187 D0B8 D0BD D18B).
The text is in Russian, which is fairly straightforward here: there is nothing exotic or confusing in the use of Cyrillic characters, no combining characters or anything like that.
Based on the examples, it seems that on 3.x firmware len() returns the number of octets (bytes) in the string's UTF-8 encoding, whereas on 5.x firmware len() returns the number of characters. If so, this is a platform fragmentation issue that should be fixed (it would be ridiculous to tell developers to check the firmware version when using such a fundamental function as string length).
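The byte counts are consistent with plain UTF-8: every Cyrillic letter in these strings is a two-byte sequence (lead byte D0 or D1 in the dumps above), and each space is the single byte 20. Counting letters and spaces in each string:

$$
\begin{aligned}
\text{string 1:}\quad & 25 \text{ letters} \times 2 + 2 \text{ spaces} \times 1 = 52 \text{ bytes}, & 25 + 2 &= 27 \text{ characters} \\
\text{string 2:}\quad & 18 \text{ letters} \times 2 + 3 \text{ spaces} \times 1 = 39 \text{ bytes}, & 18 + 3 &= 21 \text{ characters}
\end{aligned}
$$

The fw3 log reports exactly the byte totals (52, 39) and the fw5 log the character totals (27, 21), which fits the byte-count explanation.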
dmitskevich
Visitor
02-26-2014
11:51 PM
Re: Len() and UTF8
The same problem here 😞

RokuJoel
Binge Watcher
02-27-2014
06:30 PM
Re: Len() and UTF8
Bug filed.
- Joel
EnTerr
Roku Guru
07-30-2014
09:53 PM
Re: Len() and UTF8
@greubel,
I was recently reminded of this thread when I had to face the WTF-8 music on fw3. Today I wondered whether I could write a reasonably fast function that counts the number of characters in a string (not the number of bytes, which is what len() returns on fw3 and which differs for strings with non-ASCII characters). I came up with something clever that makes me feel full of myself :roll:. Here goes:
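function charCount(s as String) as Integer
    ' chr(256) is U+0100, two bytes in UTF-8: len() is 1 on fw5, 2 on fw3
    if len(chr(256)) <> 1 then
        ' we are on fw3, evil hack required: replace every character matched
        ' by "." with a one-byte ASCII "." so that byte count = char count
        ' (newlines are not matched by ".", but they are one byte anyway)
        s = createObject("roRegex", ".", "").replaceAll(s, ".")
    end if
    return len(s)
end function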
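For comparison, here is a firmware-independent sketch that sidesteps len() altogether by counting UTF-8 lead bytes: every character contributes exactly one byte that is not a continuation byte (bit pattern 10xxxxxx, i.e. &h80 to &hBF). It assumes roByteArray.FromAsciiString() stores the string's UTF-8 bytes unchanged on both firmware lines, which is an assumption not verified on fw3; charCountUtf8 is a hypothetical name.

```brightscript
' Hypothetical helper: counts characters by counting UTF-8 lead bytes.
' ASSUMPTION: FromAsciiString() stores the string's UTF-8 bytes as-is.
function charCountUtf8(s as String) as Integer
    ba = CreateObject("roByteArray")
    ba.FromAsciiString(s)
    total = 0
    for i = 0 to ba.Count() - 1
        ' Continuation bytes look like 10xxxxxx (&h80..&hBF);
        ' every other byte starts a new character.
        if (ba[i] and &hC0) <> &h80 then total = total + 1
    end for
    return total
end function
```

If the assumption holds, the 52 bytes of the first string in greubel's dump contain 27 lead bytes, so this would return 27. Being an interpreted per-byte loop, it is likely slower than the regex hack on long strings.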
It seems the roFont.GetOneLineWidth() issue you mention will have to be fixed by RokuCo, though; nothing we can do about that.