Roku Developer Program

EnTerr
Roku Guru

roString.tokenize() quirks? [explained]

I tried to use the tokenize() string method today but was surprised by its behavior:
BrightScript Debugger> ? ("Nothing to Say with DJ Charlie").tokenize(" with ")
No
ng
o
Say
DJ
C
arl
e
I expected it to use " with " as the separator and split into only two pieces, "Nothing to Say" and "DJ Charlie". Instead it seems to treat the parameter as a set of separator characters. (Undocumented)

But even with a single separator character things don't go as planned. Here is an example of trying to parse CSV:

BrightScript Debugger> ? ("Joe,,Schmoe,2012-12-12").tokenize(",")
Joe
Schmoe
2012-12-12
Here the empty middle-initial field got lost (Mr. Schmoe, unlike John Q. Public, has no middle name). It seems to drop empty tokens. (Undocumented)

Since this function was practically undocumented until a month or so ago, can we fix these issues? Say, make it behave like str.split(). Are there any forum developers who would be hurt by a change of behavior?

PS. Alternatively, can you allow passing an roRegex as the parameter? That would make the method more powerful (I swear I will use it then!), and the regex library is already included. It would address both issues: one could pass CreateObject("roRegex", " with ", "i") for the first case, and for the second use the pattern "," or ",+" (or "[,;:\t]", etc.) as needed.
RokuMarkn
Visitor

Re: roString.tokenize() quirks?

The behavior copies that of strtok_r (in fact if the delimiter is all ASCII chars then it is implemented by strtok_r). Some developers may be familiar with this behavior and prefer it. Doesn't ifRegex.Split do what you want?

--Mark
EnTerr
Roku Guru

Re: roString.tokenize() quirks?

"RokuMarkn" wrote:
The behavior copies that of strtok_r (in fact if the delimiter is all ASCII chars then it is implemented by strtok_r). Some developers may be familiar with this behavior and prefer it. Doesn't ifRegex.Split do what you want?

It sure does, if only I had known about it 🙂
BrightScript Debugger> ? CreateObject("roRegex", " with ", "i").split("Nothing to Say with DJ Charlie")
Nothing to Say
DJ Charlie

BrightScript Debugger> ? CreateObject("roRegex", ",", "").split("Joe,,Schmoe,2012-12-12")
Joe

Schmoe
2012-12-12
Wonderful. Appreciate the pointer. Cheers!

In theory I knew strtok once upon a time, but I have been spoiled by high-level languages. When time permits, someone should add this to the documentation (taken from man strtok):
The delim argument specifies a set of bytes that delimit the tokens in the parsed string. [...] A sequence of two or more contiguous delimiter bytes in the parsed string is considered to be a single delimiter. Delimiter bytes at the start or end of the string are ignored. Put another way: the tokens returned [...] are always nonempty strings.