EnTerr
Roku Guru
02-17-2014
04:56 PM
roString.tokenize() quirks? [explained]
Today I tried to use the tokenize() string method and was surprised by its behavior:
BrightScript Debugger> ? ("Nothing to Say with DJ Charlie").tokenize(" with ")
No
ng
o
Say
DJ
C
arl
e
I expected this to use " with " as the separator and split the string into only two pieces, "Nothing to Say" and "DJ Charlie". Instead it seems to treat the parameter as a set of separator characters. (Undocumented)
But even with a single separator char things don't go as planned. Here is an example of trying to parse CSV:
BrightScript Debugger> ? ("Joe,,Schmoe,2012-12-12").tokenize(",")
Joe
Schmoe
2012-12-12
Here the empty middle-initial field got lost (Mr. Schmoe, unlike John Q. Public, has no middle name). It seems to drop empty tokens. (Undocumented)
Since this function was practically undocumented until a month or so ago, can we fix these issues? Say, make it behave like str.split(). Are there any forum developers who would be hurt by the change in behavior?
PS. Alternatively, could you allow passing an roRegex as the parameter? That would make the method more powerful (I swear I will use it then!), and the regex library is already included. It would address both issues: one could pass CreateObject("roRegex", " with ", "i") for the first case, and for the second use the pattern "," or ",+" (or "[,;:\t]", etc.) as needed.

RokuMarkn
Visitor
02-17-2014
05:30 PM
Re: roString.tokenize() quirks?
The behavior copies that of strtok_r (in fact if the delimiter is all ASCII chars then it is implemented by strtok_r). Some developers may be familiar with this behavior and prefer it. Doesn't ifRegex.Split do what you want?
--Mark
EnTerr
Roku Guru
02-17-2014
06:11 PM
Re: roString.tokenize() quirks?
"RokuMarkn" wrote:
The behavior copies that of strtok_r (in fact if the delimiter is all ASCII chars then it is implemented by strtok_r). Some developers may be familiar with this behavior and prefer it. Doesn't ifRegex.Split do what you want?
It sure does, if only I had known about it 🙂
BrightScript Debugger> ? CreateObject("roRegex", " with ", "i").split("Nothing to Say with DJ Charlie")
Nothing to Say
DJ Charlie
BrightScript Debugger> ? CreateObject("roRegex", ",", "").split("Joe,,Schmoe,2012-12-12")
Joe
Schmoe
2012-12-12
Wonderful. Appreciate the pointer. Cheers!
In theory I knew strtok once upon a time, but I have been spoiled by high-level languages. When time permits, someone should add this to the documentation (snatched from man strtok):
The delim argument specifies a set of bytes that delimit the tokens in the parsed string. [...] A sequence of two or more contiguous delimiter bytes in the parsed string is considered to be a single delimiter. Delimiter bytes at the start or end of the string are ignored. Put another way: the tokens returned [...] are always nonempty strings.