Roku Community

kyleabaker · ‎08-27-2014

I'm trying to split a comma-delimited string into an array. This works with normal characters, but I'm seeing issues with special characters, as in the following comma-delimited list:

"John, Tiña, Will"

The following code works without the special "ñ", but not with. Am I doing something wrong with the regex here?


list = "John, Tiña, Will"
regex = CreateObject("roRegex", ",", "") ' split on comma
print regex.Split(list)

Any tips? Solutions?

kyleabaker · ‎08-27-2014

Actually, the Roku doesn't even display those characters on screen? For example: "ü"

Is this a bug? Can I at least replace all of these characters so they display? Like "ü" --> "u"?

belltown · ‎08-27-2014

"kyleabaker" wrote:
Actually, the Roku doesn't even display those characters on screen? For example: "ü"

Is this a bug? Can I at least replace all of these characters so they display? Like "ü" --> "u"?

I ran a quick test. On my Roku 2 with the 5.5 firmware, all of the half-dozen or so random components I looked at displayed those characters.

On my Roku 1 with the 3.1 firmware, none of the components displayed those characters.

Were you running on the 3.1 firmware?

belltown · ‎08-27-2014

"kyleabaker" wrote:
Can I at least replace all of these characters so they display? Like "ü" --> "u"?

You could loop through the input data, converting all non-ASCII characters to ASCII "equivalents".

Something like this (untested):


function asciiConverter () as object
    this = {}

    this.lookup = CreateObject ("roArray", 256, false)

    ' Do not convert characters that are already ASCII
    for i = 0 to 127
        this.lookup [i] = i
    end for
	
    ' Map the remaining ANSI chars to a default "invalid" ASCII character
    for i = 128 to 255
        this.lookup [i] = Asc ("?")
    end for

    ' Add characters to be converted
    this.lookup [241] = Asc ("n")    ' n tilde
    this.lookup [252] = Asc ("u")    ' u umlaut
    ' etc ....
	
    ' Conversion function
    this.ToAscii = function (data as string) as string
        baIn = CreateObject ("roByteArray")
        baOut = CreateObject ("roByteArray")
        baIn.FromAsciiString (data)

        for each ch in baIn
            baOut.Push (m.lookup [ch])
        end for

        return baOut.ToAsciiString ()
    end function

    return this
end function

' Example usage
ac = asciiConverter ()
strConverted = ac.ToAscii ("Hello" + Chr (241) + Chr (252))
print strConverted

EnTerr · ‎08-27-2014

"kyleabaker" wrote:
The following code works without the special "ñ", but not with. Am I doing something wrong with the regex here?
list = "John, Tiña, Will"
regex = CreateObject("roRegex", ",", "") ' split on comma
print regex.Split(list)

That was a false alarm re "roRegex doesn't work with special characters". I just tested a variation of your example and roRegex works fine with non-ASCII characters, phew:

regex = CreateObject("roRegex", ",", "")
l = regex.Split("John, Tiña, Will")
? l
for i = 1 to len(l[1]): ch = mid(l[1], i, 1): ? i, asc(ch), ch: end for

Output for fw5 (fw3 not discussed for brevity):

John
 Ti   ?a
 Will

 1               32              
 2               84             T
 3               105            i
 4               241               ?
 5               97             a

You can see that the split worked fine and that "ñ" on fw5 is chr(241), where 241 is the proper Unicode code point for ñ. What might be surprising is that printing non-ASCII characters does not work in the console, or that some components may not show them on screen, but that's not related to roRegex.
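
The dump above also shows that the second element begins with a space (code 32), because the pattern splits on the comma alone. A minimal sketch of one way to drop that whitespace, assuming the same roRegex component and a pattern that also consumes any whitespace after the comma (the variable names here are only illustrative):

```brightscript
' Split on a comma plus any whitespace that follows it,
' so the elements do not keep a leading space.
regex = CreateObject("roRegex", ",\s*", "")
parts = regex.Split("John, Tiña, Will")

' Print each element in brackets to make stray whitespace visible.
for each item in parts
    print "["; item; "]"
end for
```

As noted above, the console may still not render the "ñ" itself even though the split is correct.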

kyleabaker · ‎08-28-2014

EnTerr,

You're correct: if I hard-code the string, split does indeed work, and replace works as well. However, I'm reading this string from a text file, and in that case the regex does not match this character at all.


fileText = ReadAsciiFile(path_to_file)
r = CreateObject("roRegex", Chr(241), "") 'ñ
If (r.IsMatch(fileText)) Then
	print "matched"
	fileText = r.ReplaceAll(fileText, "n")
	print "replaced"
End If

If I hardcode the string, fileText = "John, Tiña, Will", then it does indeed seem to work for the most part as expected. However, when reading this string from a file it is not working at all. Is ReadAsciiFile doing some manipulation to the text?

I guess I'm looking for a good way to take this variable with the text stored in it and replace all of the special characters with similar ASCII characters for readability. Any tips?

belltown · ‎08-28-2014

How is the file encoded?

Try reading the file into an roByteArray and examine the bytes to see what it is actually reading from the file.
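
A rough sketch of that check, assuming the file is readable from the channel; the path below is a hypothetical placeholder for whatever `path_to_file` holds. If the file is UTF-8, "ñ" should show up as the two bytes C3 B1 (195, 177); if it is Latin-1 (ISO-8859-1), it should be the single byte F1 (241).

```brightscript
' Hypothetical path; substitute the file actually being read.
path_to_file = "pkg:/data/names.txt"

ba = CreateObject("roByteArray")
if ba.ReadFile(path_to_file)
    ' Hex dump of the raw bytes, two hex digits per byte.
    print ba.ToHexString()

    ' Decimal value of each byte, one per line.
    for each b in ba
        print b
    end for
else
    print "could not read "; path_to_file
end if
```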

EnTerr · ‎08-28-2014

Let's untangle this first:

"kyleabaker" wrote:
... However, when reading this string from a file it is not working at all. Is ReadAsciiFile doing some manipulation to the text?

Read the string with ReadAsciiFile(), then dump the interesting part of the string to the console char by char (like I did above, `for i = 1 to len(s): ch = mid(s, i, 1): ? i, asc(ch), ch: end for`) and let us see what it looks like. Also, what firmware version are you on and what model number? That matters here (3 vs 5).


roRegex doesn't work with special characters
