Roku Developer Program

Developers and content creators—a complete solution for growing an audience directly.
kyleabaker
Level 7

roRegex doesn't work with special characters

I'm trying to split a comma-delimited string into an array. This works with normal characters, but I'm seeing issues with special characters, as in the following comma-delimited list:

"John, Tiña, Will"


The following code works without the special "ñ", but not with it. Am I doing something wrong with the regex here?


list = "John, Tiña, Will"
regex = CreateObject("roRegex", ",", "") ' split on comma
print regex.Split(list)


Any tips? Solutions?
7 Replies
kyleabaker
Level 7

Re: roRegex doesn't work with special characters

Actually, the roku doesn't even display those characters on screen? For example: "ü"

Is this a bug? Can I at least replace all of these characters so they display? Like "ü" --> "u"?
belltown
Level 7

Re: roRegex doesn't work with special characters

"kyleabaker" wrote:
Actually, the roku doesn't even display those characters on screen? For example: "ü"

Is this a bug? Can I at least replace all of these characters so they display? Like "ü" --> "u"?

I ran a quick test. On my Roku 2 with the 5.5 firmware, all of the half-dozen or so random components I looked at displayed those characters.

On my Roku 1 with the 3.1 firmware, none of the components displayed those characters.

Were you running on the 3.1 firmware?
https://github.com/belltown/
belltown
Level 7

Re: roRegex doesn't work with special characters

"kyleabaker" wrote:
Can I at least replace all of these characters so they display? Like "ü" --> "u"?

You could loop through the input data, converting all non-ASCII characters to ASCII "equivalents".

Something like this (untested):


function asciiConverter () as object
    this = {}

    this.lookup = CreateObject ("roArray", 256, false)

    ' Do not convert characters that are already ASCII
    for i = 0 to 127
        this.lookup [i] = i
    end for

    ' Map the remaining ANSI chars to a default "invalid" ASCII character
    for i = 128 to 255
        this.lookup [i] = Asc ("?")
    end for

    ' Add characters to be converted
    this.lookup [241] = Asc ("n") ' n tilde
    this.lookup [252] = Asc ("u") ' u umlaut
    ' etc ....

    ' Conversion function: maps each input byte through the lookup table
    this.ToAscii = function (data as string) as string
        baIn = CreateObject ("roByteArray")
        baOut = CreateObject ("roByteArray")
        baIn.FromAsciiString (data)

        for each ch in baIn
            baOut.Push (m.lookup [ch])
        end for

        return baOut.ToAsciiString ()
    end function

    return this
end function

' Example usage
ac = asciiConverter ()
strConverted = ac.ToAscii ("Hello" + Chr (241) + Chr (252))
print strConverted

https://github.com/belltown/
EnTerr
Level 8

Re: roRegex doesn't work with special characters

"kyleabaker" wrote:
The following code works without the special "ñ", but not with. Am I doing something wrong with the regex here?

list = "John, Tiña, Will"
regex = CreateObject("roRegex", ",", "") ' split on comma
print regex.Split(list)

That was a false alarm regarding "roRegex doesn't work with special characters". I just tested a variation of your example, and roRegex works fine with non-ASCII characters, phew:
regex = CreateObject("roRegex", ",", "")
l = regex.Split("John, Tiña, Will")
? l
for i = 1 to len(l[1]): ch = mid(l[1], i, 1): ? i, asc(ch), ch: end for

Output for fw5 (fw3 not discussed for brevity):
John
Ti ?a
Will

1 32
2 84 T
3 105 i
4 241 ?
5 97 a

You can see that the split worked fine and that "ñ" on fw5 = chr(241), where 241 is the proper Unicode code point for ñ. What might be surprising is that printing non-ASCII characters does not work in the console, or that some components may not show them on screen, but that's not related to roRegex.
kyleabaker
Level 7

Re: roRegex doesn't work with special characters

EnTerr,

You're correct: if I hard-code the string and test split, it does indeed work, and replace works as well. However, I'm reading this string from a text file, and in that case the regex does not match this character at all.


fileText = ReadAsciiFile(path_to_file)
r = CreateObject("roRegex", Chr(241), "") 'ñ
If (r.IsMatch(fileText)) Then
print "matched"
fileText = r.ReplaceAll(fileText, "n")
print "replaced"
End If


If I hardcode the string, fileText = "John, Tiña, Will", then it does indeed seem to work for the most part as expected. However, when reading this string from a file it is not working at all. Is ReadAsciiFile doing some manipulation to the text?

I guess I'm looking for a good way to take the variable holding this text and replace each special character with a similar ASCII character for readability. Any tips?
belltown
Level 7

Re: roRegex doesn't work with special characters

How is the file encoded?

Try reading the file into an roByteArray and examine the bytes to see what it is actually reading from the file.
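
A minimal sketch of that check, assuming the file path is passed in; `dumpFileBytes` is a hypothetical helper name, not something from the SDK. It loads the file into an roByteArray and prints the raw bytes as hex. If the file is UTF-8, "ñ" would appear as the two bytes C3 B1; if it is Latin-1 or Windows-1252, it would be the single byte F1 (241).

```brightscript
' Hypothetical helper: print the raw bytes of a file as hex
' so the file's actual encoding can be inspected.
sub dumpFileBytes(path as string)
    ba = CreateObject("roByteArray")

    ' ReadFile returns false if the file could not be read
    if not ba.ReadFile(path) then
        print "could not read "; path
        return
    end if

    print ba.Count(); " bytes: "; ba.ToHexString()
end sub
```

Calling `dumpFileBytes(path_to_file)` with the same path given to ReadAsciiFile() should show whether the "ñ" is stored as one byte or two.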
https://github.com/belltown/
EnTerr
Level 8

Re: roRegex doesn't work with special characters

Let's untangle this first:
"kyleabaker" wrote:
... However, when reading this string from a file it is not working at all. Is ReadAsciiFile doing some manipulation to the text?

Read the string with ReadAsciiFile(), then dump the interesting part of the string to the console char-by-char (like I did above, `for i = 1 to len(s): ch = mid(s, i, 1): ? i, asc(ch), ch: end for`) and let us see what it looks like. Also, what firmware version are you on, and which model#? That matters here (3 vs 5).