kyleabaker
Visitor
08-27-2014
02:44 PM
roRegex doesn't work with special characters
I'm trying to split a comma-delimited string into an array. This works with normal characters, but I'm seeing issues with special characters, as in the following comma-delimited list:
"John, Tiña, Will"
The following code works without the special "ñ", but not with. Am I doing something wrong with the regex here?
list = "John, Tiña, Will"
regex = CreateObject("roRegex", ",", "") ' split on comma
print regex.Split(list)
Any tips? Solutions?
7 REPLIES
kyleabaker
Visitor
08-27-2014
03:08 PM
Re: roRegex doesn't work with special characters
Actually, the roku doesn't even display those characters on screen? For example: "ü"
Is this a bug? Can I at least replace all of these characters so they display? Like "ü" --> "u"?
belltown
Roku Guru
08-27-2014
04:28 PM
Re: roRegex doesn't work with special characters
"kyleabaker" wrote:
Actually, the roku doesn't even display those characters on screen? For example: "ü"
Is this a bug? Can I at least replace all of these characters so they display? Like "ü" --> "u"?
I ran a quick test. On my Roku 2 with the 5.5 firmware, all of the half-dozen or so random components I looked at displayed those characters.
On my Roku 1 with the 3.1 firmware, none of the components displayed those characters.
Were you running on the 3.1 firmware?
belltown
Roku Guru
08-27-2014
05:08 PM
Re: roRegex doesn't work with special characters
"kyleabaker" wrote:
Can I at least replace all of these characters so they display? Like "ü" --> "u"?
You could loop through the input data, converting all non-ASCII characters to ASCII "equivalents".
Something like this (untested):
function asciiConverter () as object
    this = {}
    this.lookup = CreateObject ("roArray", 256, false)
    ' Do not convert characters that are already ASCII
    for i = 0 to 127
        this.lookup [i] = i
    end for
    ' Map the remaining ANSI chars to a default "invalid" ASCII character
    for i = 128 to 255
        this.lookup [i] = Asc ("?")
    end for
    ' Add characters to be converted
    this.lookup [241] = Asc ("n") ' n tilde
    this.lookup [252] = Asc ("u") ' u umlaut
    ' etc ....
    ' Conversion function
    this.ToAscii = function (data as string) as string
        baIn = CreateObject ("roByteArray")
        baOut = CreateObject ("roByteArray")
        baIn.FromAsciiString (data)
        for each ch in baIn
            baOut.Push (m.lookup [ch])
        end for
        return baOut.ToAsciiString ()
    end function
    return this
end function
' Example usage
ac = asciiConverter ()
strConverted = ac.ToAscii ("Hello" + Chr (241) + Chr (252))
print strConverted
EnTerr
Roku Guru
08-27-2014
08:47 PM
Re: roRegex doesn't work with special characters
"kyleabaker" wrote:
The following code works without the special "ñ", but not with. Am I doing something wrong with the regex here?
list = "John, Tiña, Will"
regex = CreateObject("roRegex", ",", "") ' split on comma
print regex.Split(list)
That was a false alarm re "roRegex doesn't work with special characters". I just tested a variation of your example, and roRegex works fine with non-ASCII characters, phew:
regex = CreateObject("roRegex", ",", "")
l = regex.Split("John, Tiña, Will")
? l
for i = 1 to len(l[1]): ch = mid(l[1], i, 1): ? i, asc(ch), ch: end for
Output for fw5 (fw3 not discussed for brevity):
John
Ti ?a
Will
1 32
2 84 T
3 105 i
4 241 ?
5 97 a
You can see that the split worked fine and that "ñ" on fw5 = chr(241), where 241 is the Unicode code point of ñ. What might be surprising is that printing non-ASCII characters does not work in the console, or that some components may not show them on screen, but that's not related to roRegex.
kyleabaker
Visitor
08-28-2014
10:37 AM
Re: roRegex doesn't work with special characters
EnTerr,
You're correct: if I hard-code the string and test split, it does indeed work, and replace works as well. However, I'm reading this string from a text file, and in that case the regex does not match this character at all:
fileText = ReadAsciiFile(path_to_file)
r = CreateObject("roRegex", Chr(241), "") 'ñ
If (r.IsMatch(fileText)) Then
    print "matched"
    fileText = r.ReplaceAll(fileText, "n")
    print "replaced"
End If
If I hardcode the string, fileText = "John, Tiña, Will", then it does indeed seem to work for the most part as expected. However, when reading this string from a file it is not working at all. Is ReadAsciiFile doing some manipulation to the text?
I guess I'm looking for a good way to take the variable holding this text and replace each special character with a similar ASCII character for readability. Any tips?
belltown
Roku Guru
08-28-2014
10:59 AM
Re: roRegex doesn't work with special characters
How is the file encoded?
Try reading the file into an roByteArray and examine the bytes to see what it is actually reading from the file.
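A minimal sketch of that check, assuming the file is reachable at a path like the hypothetical "pkg:/data/names.txt" used here (dumpFileBytes is just an illustrative helper name). If the file is UTF-8, ñ should show up as the two bytes 195 177 (hex C3 B1); if it is ISO-8859-1 or Windows-1252, it should be the single byte 241 (hex F1).

```brightscript
' Dump the raw bytes of a file so its encoding can be inspected.
' dumpFileBytes is a hypothetical helper, not part of the Roku SDK.
sub dumpFileBytes(path as string)
    ba = CreateObject("roByteArray")
    if not ba.ReadFile(path) then
        print "could not read "; path
        return
    end if
    ' Whole file as a hex string, e.g. look for C3B1 versus F1
    print ba.ToHexString()
    ' One line per byte: index and byte value
    for i = 0 to ba.Count() - 1
        print i, ba[i]
    end for
end sub

sub Main()
    ' Hypothetical path; substitute the file you are actually reading
    dumpFileBytes("pkg:/data/names.txt")
end sub
```

Comparing this output with the Asc() values seen after ReadAsciiFile() should show whether the text is being changed on the way in.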
EnTerr
Roku Guru
08-28-2014
11:24 AM
Re: roRegex doesn't work with special characters
Let's untangle this first:
"kyleabaker" wrote:
... However, when reading this string from a file it is not working at all. Is ReadAsciiFile doing some manipulation to the text?
Read the string with ReadAsciiFile(), then dump the interesting part of the string to the console char-by-char (like I did above, `for i = 1 to len(s): ch = mid(s, i, 1): ? i, asc(ch), ch: end for`) and let us see what it looks like. Also, which firmware version are you on, and what model#? That matters here (3 vs 5).