Roku Developer Program

Developers and content creators—a complete solution for growing an audience directly.
sjb64
Level 7

Brightscript request - Tokenize

Is there a version of Tokenize that still retains empty values instead of skipping them? The manual states "A sequence of two or more contiguous delimiters in the string is treated as a single delimiter." but this causes problems in delimiting some serialized files I use.

If not, is a request for a future method or Tokenize variant possible? I have a routine that does it but am sure what I write in BrightScript will have much slower performance than a native function. With me deserializing about eight thousand lines it would make a difference on channel startup.
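For illustration, a minimal sketch of the Tokenize behavior described above. The sub name and the sample record are made up for this example and are not from any real file format:

```brightscript
' Hypothetical demo: Tokenize collapses contiguous delimiters,
' so the empty middle field of "a,,c" is lost.
sub ShowTokenizeSkipsEmpties()
    record = "a,,c"                 ' three fields, the middle one empty
    tokens = record.Tokenize(",")   ' returns a list of the non-empty pieces
    print "token count: "; tokens.Count()
    for each t in tokens
        print "["; t; "]"
    end for
end sub
```

Going by the manual text quoted above, this should report two tokens rather than three, which is what breaks positional parsing of serialized lines.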
FlixRaider channel
16 Replies
RokuMarkn
Level 7

Re: Brightscript request - Tokenize

You can use roRegex.Split to do that.

--Mark
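A minimal sketch of that suggestion, assuming a comma-delimited record. The sub name and sample data are illustrative only; the pattern is a regular expression, so a delimiter that is a regex metacharacter (such as a pipe) would need escaping:

```brightscript
' Hypothetical demo: roRegex.Split as a Tokenize replacement
' that keeps empty fields between adjacent delimiters.
sub ShowRegexSplitKeepsEmpties()
    splitter = CreateObject("roRegex", ",", "")   ' pattern, flags
    fields = splitter.Split("a,,c")               ' returns a list of strings
    print "field count: "; fields.Count()
    for each f in fields
        print "["; f; "]"
    end for
end sub
```

Creating the roRegex object once and reusing it for every line avoids recompiling the pattern on each call.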
sjb64
Level 7

Re: Brightscript request - Tokenize

Didn't even think about the regex functions, my bad, thanks for the quick reply.
FlixRaider channel
EnTerr
Level 8

Re: Brightscript request - Tokenize

Previously covered in viewtopic.php?f=34&t=66914 (sheds some light on fn origins)
sjb64
Level 7

Re: Brightscript request - Tokenize

I think I was being too specific in my searching for answers, so I didn't see that post, but it is interesting and makes the rationale for the behavior make sense.

... and Regex obviously worked, but wasn't actually any faster than my routine. This has always been a bit of an issue with regular expressions, highly powerful, good but not great performance. But I'm sticking with the Regex option since it is more readable, instead of using a custom routine.
FlixRaider channel
EnTerr
Level 8

Re: Brightscript request - Tokenize

"sjb64" wrote:
... and Regex obviously worked, but wasn't actually any faster than my routine. This has always been a bit of an issue with regular expressions,

Check also this thread - "roRegex.split() is quadratically slow?!" - something I discovered last year and which might still be the case. At least I haven't heard of it being fixed, but it's easy to check with the code snippet provided whether that is still the case. In short, roRegex.splitting a really big string into many, many pieces is *massively* slow, but not because of the regex per se.
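One way to check this on a device is to time Split on inputs of doubling size. The sketch below is not the snippet from the referenced thread; the function names and field counts are assumptions chosen for illustration:

```brightscript
' Hypothetical timing helper: builds "x,x,x,..." with fieldCount fields
' and returns how many milliseconds roRegex.Split takes on it.
function TimeSplitMs(fieldCount as integer) as integer
    text = "x"
    for i = 2 to fieldCount
        text = text + ",x"
    end for
    splitter = CreateObject("roRegex", ",", "")
    timer = CreateObject("roTimespan")
    timer.Mark()                      ' start timing just before the split
    parts = splitter.Split(text)
    return timer.TotalMilliseconds()
end function

' Prints one timing per input size.
sub CompareSplitTimes()
    for each n in [1000, 2000, 4000, 8000]
        print n; " fields: "; TimeSplitMs(n); " ms"
    end for
end sub
```

If the time roughly quadruples each time the field count doubles, the quadratic behavior is still present; if it roughly doubles, the cost is linear.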
TheEndless
Level 7

Re: Brightscript request - Tokenize

"EnTerr" wrote:
"sjb64" wrote:
... and Regex obviously worked, but wasn't actually any faster than my routine. This has always been a bit of an issue with regular expressions,

Check also this thread - "roRegex.split() is quadratically slow?!" - something I discovered last year and which might still be the case. At least I haven't heard of it being fixed, but it's easy to check with the code snippet provided whether that is still the case. In short, roRegex.splitting a really big string into many, many pieces is *massively* slow, but not because of the regex per se.

FWIW, it's not just split. The same is true for replaces.
My Channels: http://roku.permanence.com - Twitter: @TheEndlessDev
Instant Watch Browser (NetflixIWB), Aquarium Screensaver (AQUARIUM), Clever Clocks Screensaver (CLEVERCLOCKS), iTunes Podcasts (ITPC), My Channels (MYCHANNELS)
sjb64
Level 7

Re: Brightscript request - Tokenize

I've probably been excessive over the years in avoiding Regex across the board. The vast majority of my programming is in C#, with some PHP for our website and BrightScript here. In C#, writing my own routines, or using Linq unless micro-optimizations are called for (in things like (de)serialization subroutines that run many millions of times), is substantially faster than Regex. In PHP there really isn't a big difference since it's interpreted(ish), and BrightScript seems to be a close call too. Even in the (too many) other languages I've used over the years, I've just gotten into the habit of ignoring it, and am therefore rather weak on its intricacies, which is why I didn't even think about it before starting this thread.
FlixRaider channel
dev42
Level 7

Re: Brightscript request - Tokenize

I keep meaning to give this a try. I'm not able to at the moment, but I assume this can handle splitting on strings rather than chars.

What sort of memory overhead is there for using this just for .split vs. rolling our own?
sjb64
Level 7

Re: Brightscript request - Tokenize

"dev42" wrote:
What sort of memory overhead is there for using this just for .split vs. rolling our own?


Very little, just the memory of the routine itself. Using .split would take none since the library is already on the device, and using a custom routine like the following just creates a new array, so little is used; the code size of the routine would be negligible. Since mine is used in a deserialization routine called thousands of times, I create the result array one time, then clear and reuse it every pass, which avoids lots of allocations and GC.

The code below could easily be adjusted to split on strings by adjusting the offsets in the Value.Mid calls.

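function roStringSplit(Result, Value, Delimiter) as object ' Like Tokenize, but doesn't drop empty values; boxed version
    Result.Clear() ' reuse the caller's array instead of allocating a new one
    OldSpot = -1 ' index of the previous delimiter (-1 before the first field)
    Spot = Value.Instr(Delimiter)
    while Spot <> -1
        Result.Push(Value.Mid(OldSpot + 1, Spot - OldSpot - 1)) ' field between two delimiters, possibly empty
        OldSpot = Spot
        Spot = Value.Instr(Spot + 1, Delimiter)
    end while
    Result.Push(Value.Mid(OldSpot + 1)) ' remainder after the last delimiter
    return Result
end function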


But even with me doing what I can to move as little data as possible in this routine (reusing the array, no array of delimiter locations, no string copy), its performance is not noticeably better than .split.
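As a sketch of the adjustment mentioned above for splitting on strings rather than single characters, the offsets can be driven by the delimiter's length. The function name StringSplitOn and the empty-delimiter guard are assumptions added for this example, not part of the original routine:

```brightscript
' Hypothetical variant of the routine above that accepts a delimiter
' of any length and still keeps empty fields.
function StringSplitOn(Result, Value, Delimiter) as object
    Result.Clear()
    DelimLen = Len(Delimiter)
    if DelimLen = 0 then
        ' Guard (an assumption): with no delimiter, return the whole value.
        Result.Push(Value)
        return Result
    end if
    Start = 0                                   ' index where the current field begins
    Spot = Value.Instr(Delimiter)
    while Spot <> -1
        Result.Push(Value.Mid(Start, Spot - Start))
        Start = Spot + DelimLen                 ' skip past the whole delimiter
        Spot = Value.Instr(Start, Delimiter)
    end while
    Result.Push(Value.Mid(Start))               ' remainder after the last delimiter
    return Result
end function
```

Called as `fields = [] : StringSplitOn(fields, "a::b::::c", "::")`, this should leave four entries in fields: "a", "b", an empty string, and "c". The same array can then be passed in again for the next line, as in the original routine.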
FlixRaider channel