Roku Developer Program

greubel
Visitor

Len() and UTF8

Len() returns an incorrect value for some UTF-8 sequences on some firmware builds.
This also affects the font string width returned by GetOneLineWidth().

Model=N1000 Level=3.1 1182 - is wrong, chars should be 27 and 21

13:30:05.885 Get_Font w=268 h=60 b=True i=False m=2 [Вечная призрачная встречная
О чем говорят мужчины]
13:30:05.894 Line 0 chars 52
13:30:05.904 000000 D092D0B5D187D0BDD0B0D18F20D0BFD180D0B8D0B7D180D0B0D187D0BDD0B0D1
13:30:05.914 000032 8F20D0B2D181D182D180D0B5D187D0BDD0B0D18F000000000000000000000000
13:30:05.920 Line 1 chars 39
13:30:05.928 000000 D09E20D187D0B5D0BC20D0B3D0BED0B2D0BED180D18FD18220D0BCD183D0B6D1
13:30:05.937 000032 87D0B8D0BDD18B00000000000000000000000000000000000000000000000000

Model=3100X Level=5.4 230 - is correct

13:24:07.987 Get_Font w=476 h=90 b=True i=False m=2 [Вечная призрачная встречная
О чем говорят мужчины]
13:24:08.003 Line 0 chars 27
13:24:08.005 000000 D092D0B5D187D0BDD0B0D18F20D0BFD180D0B8D0B7D180D0B0D187D0BDD0B0D1
13:24:08.008 000032 8F20D0B2D181D182D180D0B5D187D0BDD0B0D18F000000000000000000000000
13:24:08.010 Line 1 chars 21
13:24:08.012 000000 D09E20D187D0B5D0BC20D0B3D0BED0B2D0BED180D18FD18220D0BCD183D0B6D1
13:24:08.023 000032 87D0B8D0BDD18B00000000000000000000000000000000000000000000000000
EnTerr
Roku Guru

Re: Len() and UTF8

As clarification, I checked what those strings are:
  • first one is "Вечная призрачная встречная" consisting of 27 characters, represented in UTF8 by 52 octets (D092 D0B5 D187 D0BD D0B0 D18F 20 D0BF D180 D0B8 D0B7 D180 D0B0 D187 D0BD D0B0 D18F 20 D0B2 D181 D182 D180 D0B5 D187 D0BD D0B0 D18F)

  • second one is "О чем говорят мужчины", with length 21 chars, represented by 39 octets (D09E 20 D187 D0B5 D0BC 20 D0B3 D0BE D0B2 D0BE D180 D18F D182 20 D0BC D183 D0B6 D187 D0B8 D0BD D18B).

The text is in Russian, which is fairly straightforward here: there is nothing exotic or confusing in the use of Cyrillic characters, and there are no combining characters or anything like that.

Based on the examples, it seems that in 3.x firmware len() returns the number of octets (bytes) used to UTF-8-encode a string, whereas in 5.x firmware len() returns the number of characters in the string. If so, this is a platform fragmentation issue that should be fixed (it would be ridiculous to tell developers to check the platform version when using such a fundamental function as string length).
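To double-check the arithmetic, here is a small sketch in Python (not BrightScript, chosen only because it is easy to run off-device). It decodes the hex bytes from the logs above, with the trailing zero padding dropped, and compares the byte count with the character count. The names LINE0_HEX, LINE1_HEX and byte_and_char_counts are made up for this illustration.

```python
# Hex bytes copied from the dumps in the first post (zero padding removed).
LINE0_HEX = (
    "D092D0B5D187D0BDD0B0D18F20D0BFD180D0B8D0B7D180D0B0D187D0BDD0B0D1"
    "8F20D0B2D181D182D180D0B5D187D0BDD0B0D18F"
)
LINE1_HEX = (
    "D09E20D187D0B5D0BC20D0B3D0BED0B2D0BED180D18FD18220D0BCD183D0B6D1"
    "87D0B8D0BDD18B"
)


def byte_and_char_counts(hex_dump):
    """Return (number of UTF-8 bytes, number of characters) for a hex dump."""
    raw = bytes.fromhex(hex_dump)
    text = raw.decode("utf-8")
    return len(raw), len(text)


print(byte_and_char_counts(LINE0_HEX))  # (52, 27)
print(byte_and_char_counts(LINE1_HEX))  # (39, 21)
```

The first number of each pair matches what the 3.1 build logs as "chars", and the second matches the 5.4 build, which is consistent with the bytes-versus-characters explanation.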
dmitskevich
Visitor

Re: Len() and UTF8

The same problem here 😞
RokuJoel
Binge Watcher

Re: Len() and UTF8

Bug filed.

- Joel
EnTerr
Roku Guru

Re: Len() and UTF8

@greubel,
I was recently reminded of this thread when I faced the WTF-8 music on fw3. Today I wondered whether I could write a reasonably fast function that counts the number of characters in a string (not the number of bytes, which is what len() returns on fw3 and which differs for strings containing non-ASCII characters). I came up with something clever that makes me feel full of myself :roll:. Here goes:

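function charCount(s as String) as Integer
    ' chr(256) is a single non-ASCII character; if len() reports more than 1,
    ' len() is counting bytes rather than characters
    if len(chr(256)) <> 1 then
        ' we are on fw3, evil hack required:
        ' replace every character with a one-byte ".", so bytes = characters
        s = createObject("roRegex", ".", "").replaceAll(s, ".")
    end if
    return len(s)
end function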

It seems the roFont.GetOneLineWidth() problem you mention will have to be fixed by RokuCo though; there is nothing we can do about it.
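For reference, the character count that such a function has to produce can also be obtained straight from the bytes: in UTF-8 every character starts with exactly one byte that is not of the form 10xxxxxx, so counting those bytes counts characters. Below is a minimal sketch of that idea in Python (not BrightScript, and not tested on any Roku firmware); char_count is a hypothetical name, and the input is assumed to be well-formed UTF-8.

```python
def char_count(data):
    """Count UTF-8 characters by skipping continuation bytes (10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


sample = "О чем говорят мужчины".encode("utf-8")
print(len(sample))         # 39 (bytes, what len() reports on fw3)
print(char_count(sample))  # 21 (characters)
```

Like the regex hack, this only counts code points; it does nothing about the width calculation itself.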