Roku Developer Program

greubel
Visitor

String Len() and UTF8

According to the documentation, the string function Len() works with ASCII.
BUT if a string contains UTF-8 multi-byte sequences, Len() counts each sequence as one character, not as the number of bytes it occupies.

x = CreateObject("roByteArray")
x[0] = 236
x[1] = 151
x[2] = 176

? "array count = " x.Count()
str = x.ToAsciiString()
? "string length = " Len(str)
---------------
array count = 3
string length = 1

This bit me for a day while I tried to figure out why I kept losing a connection that uses CHUNKED encoding.

It would be nice if you could do a Count() on a string!

Is there a better way than this?

Sub Length( in as string ) as integer
    ' Copy the string into a byte array and count the bytes rather than the characters
    a = CreateObject("roByteArray")
    a.FromAsciiString( in )
    return a.Count()
End Sub
4 REPLIES
EnTerr
Roku Guru

Re: String Len() and UTF8

I doubt it is any consolation, but here `len()` is doing the right thing: the UTF-8 byte sequence you gave (ec 97 b0) represents a single Unicode character, U+C5F0, hence the string is 1 character long.

(And yes, I understand your plight and have no better solution for it.)
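
For anyone who wants to check the arithmetic, here is a rough sketch of how those three bytes collapse into one code point. DecodeUtf8Triple is a made-up helper name, not a Roku API, and it assumes a well-formed three-byte sequence (1110xxxx 10xxxxxx 10xxxxxx) with no validation at all.

```brightscript
' Sketch only: decode one well-formed 3-byte UTF-8 sequence into its code point.
' DecodeUtf8Triple is a hypothetical helper, not part of the Roku SDK.
Function DecodeUtf8Triple(b0 as Integer, b1 as Integer, b2 as Integer) as Integer
    ' Keep the low 4 bits of the lead byte and the low 6 bits of each
    ' continuation byte, then shift them into place by multiplying.
    return ((b0 and 15) * 4096) + ((b1 and 63) * 64) + (b2 and 63)
End Function

Sub Main()
    cp = DecodeUtf8Triple(236, 151, 176)   ' the bytes ec 97 b0 from the first post
    hex = StrI(cp, 16)                     ' integer to string in radix 16
    hex = hex.Trim()                       ' guard against any padding
    print "code point = U+"; UCase(hex)
End Sub
```

The payload bits work out to 12 * 4096 + 23 * 64 + 48 = 50672, which is C5F0 in hexadecimal, so three bytes on the wire really are one character as far as Len() is concerned.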
greubel
Visitor

Re: String Len() and UTF8

Yes, this was more of a warning to other Devs that might hit it.
Maybe this behavior of the function should be better documented in the SDK.
RokuMarkn
Visitor

Re: String Len() and UTF8

As you discovered, Len returns the number of characters in the string, not bytes. I've added a note to the documentation to clarify this. Just out of curiosity, why do you need to know the number of bytes?

--Mark
greubel
Visitor

Re: String Len() and UTF8

I have random log content that I'm sending to a browser window using chunked encoding.
Each chunk has to start with its size in bytes, followed by the data itself.
I was using len() to get the byte count, but that fails as soon as the string contains UTF-8 multi-byte characters.
Works fine now using my Length() function.
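
In case it helps someone else, this is roughly what the framing looks like. It is only a sketch: BuildChunk and BuildLastChunk are hypothetical helper names, it counts bytes the same way as Length() above, and it assumes the usual HTTP/1.1 chunk layout (size in hexadecimal, CRLF, data, CRLF). How the result gets written to the socket is left out.

```brightscript
' Sketch only: frame one string as an HTTP/1.1 chunk.
' BuildChunk and BuildLastChunk are hypothetical helpers, not Roku APIs.
Function BuildChunk(text as String) as String
    crlf = Chr(13) + Chr(10)
    bytes = CreateObject("roByteArray")
    bytes.FromAsciiString(text)        ' same byte-count trick as Length()
    size = StrI(bytes.Count(), 16)     ' chunk size is written in hexadecimal
    size = size.Trim()                 ' guard against any padding
    return size + crlf + text + crlf
End Function

' The last chunk has size 0 and is followed by an empty line.
Function BuildLastChunk() as String
    crlf = Chr(13) + Chr(10)
    return "0" + crlf + crlf
End Function
```

The important part is that the size comes from the byte array's Count() and not from Len(): if the size is smaller than the number of bytes actually sent, the receiver loses track of where the chunk ends, which would be consistent with the dropped connections described at the top of the thread.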