Roku Community

Len() and UTF8

greubel · ‎01-16-2014

Len() returns an incorrect value for some UTF-8 sequences, depending on the firmware build.
This also affects the font string width, GetOneLineWidth().

Model=N1000 Level=3.1 1182 - chars should be 27 and 21


13:30:05.885 Get_Font w=268 h=60 b=True i=False m=2 [Вечная призрачная встречная
О чем говорят мужчины]
13:30:05.894 Line 0 chars 52
13:30:05.904 000000  D092D0B5D187D0BDD0B0D18F20D0BFD180D0B8D0B7D180D0B0D187D0BDD0B0D1
13:30:05.914 000032  8F20D0B2D181D182D180D0B5D187D0BDD0B0D18F000000000000000000000000
13:30:05.920 Line 1 chars 39
13:30:05.928 000000  D09E20D187D0B5D0BC20D0B3D0BED0B2D0BED180D18FD18220D0BCD183D0B6D1
13:30:05.937 000032  87D0B8D0BDD18B00000000000000000000000000000000000000000000000000

Model=3100X Level=5.4 230 - is correct


13:24:07.987 Get_Font w=476 h=90 b=True i=False m=2 [Вечная призрачная встречная
О чем говорят мужчины]
13:24:08.003 Line 0 chars 27
13:24:08.005 000000  D092D0B5D187D0BDD0B0D18F20D0BFD180D0B8D0B7D180D0B0D187D0BDD0B0D1
13:24:08.008 000032  8F20D0B2D181D182D180D0B5D187D0BDD0B0D18F000000000000000000000000
13:24:08.010 Line 1 chars 21
13:24:08.012 000000  D09E20D187D0B5D0BC20D0B3D0BED0B2D0BED180D18FD18220D0BCD183D0B6D1
13:24:08.023 000032  87D0B8D0BDD18B00000000000000000000000000000000000000000000000000

EnTerr · ‎01-19-2014

As clarification, I checked what those strings are:

The first one is "Вечная призрачная встречная", consisting of 27 characters, represented in UTF-8 by 52 octets (D092 D0B5 D187 D0BD D0B0 D18F 20 D0BF D180 D0B8 D0B7 D180 D0B0 D187 D0BD D0B0 D18F 20 D0B2 D181 D182 D180 D0B5 D187 D0BD D0B0 D18F).

The second one is "О чем говорят мужчины", with a length of 21 characters, represented by 39 octets (D09E 20 D187 D0B5 D0BC 20 D0B3 D0BE D0B2 D0BE D180 D18F D182 20 D0BC D183 D0B6 D187 D0B8 D0BD D18B).

The text is in Russian, which is fairly straightforward to handle: there is nothing exotic or confusing in the use of Cyrillic characters, and there are no combining characters or anything like that.

Based on the examples, it seems that in 3.x firmware len() returns the length in octets (bytes) used to UTF-8-encode a string, whereas in 5.x firmware len() returns the number of characters in the string. If so, this is a platform fragmentation issue that should be fixed (it would be ridiculous to tell developers to check the platform version when using such a fundamental function as string length).
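To make the arithmetic concrete: in these strings every Cyrillic letter takes two octets and every space takes one, so the first title is 25 x 2 + 2 = 52 octets and the second is 18 x 2 + 3 = 39. The sketch below counts characters directly from the bytes of the second title by skipping UTF-8 continuation bytes (those of the form 10xxxxxx). It is only an illustration, assuming roByteArray's FromHexString() and Count() behave the same on both firmware lines, which has not been verified here on 3.x. utf8CharCount is a hypothetical helper name, not a built-in.

```brightscript
' Hypothetical helper: counts UTF-8 characters in a roByteArray.
' Every byte that is NOT a continuation byte (binary 10xxxxxx)
' starts a new character, so counting those gives the character count.
function utf8CharCount(bytes as Object) as Integer
    n = 0
    for i = 0 to bytes.Count() - 1
        if (bytes[i] and &hC0) <> &h80 then n = n + 1
    end for
    return n
end function

sub main()
    ba = CreateObject("roByteArray")
    ' Bytes of the second title from the logs, "О чем говорят мужчины"
    ba.FromHexString("D09E20D187D0B5D0BC20D0B3D0BED0B2D0BED180D18FD18220D0BCD183D0B6D187D0B8D0BDD18B")
    print ba.Count()          ' 39 octets, what len() reports on 3.x
    print utf8CharCount(ba)   ' 21 characters, what len() reports on 5.x
end sub
```

The same loop applied to the first title's 52 octets gives 27, matching the 5.x log.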

dmitskevich · ‎02-26-2014

The same problem here 😞

RokuJoel · ‎02-27-2014

Bug filed.

- Joel

EnTerr · ‎07-30-2014

@greubel,
I was reminded recently of this thread when I faced the WTF-8 music on fw3. Today I wondered if I could write a reasonably fast function that counts the number of characters in a string (not the number of bytes, which is what len() returns on fw3 and which differs for strings with non-ASCII characters). And I came up with something clever that makes me feel full of myself :roll:. Here goes:


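function charCount(s as String) as Integer
    ' chr(256) is a single character; if len() disagrees, len() is counting bytes
    if len(chr(256)) <> 1 then
        ' we are on fw3, evil hack required:
        ' swap each character the regex matches for a one-byte ".",
        ' so that the byte length equals the character count
        s = createObject("roRegex", ".", "").replaceAll(s, ".")
    end if
    return len(s)
end function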

It seems the roFont.GetOneLineWidth() problem you mention will have to be fixed by RokuCo though; there is nothing we can do about that one.
