Roku Community

sjb64 · ‎04-20-2015

Is there a version of Tokenize that still retains empty values instead of skipping them? The manual states "A sequence of two or more contiguous delimiters in the string is treated as a single delimiter." but this causes problems in delimiting some serialized files I use.

If not, is a request for a future method or Tokenize variant possible? I have a routine that does it but am sure what I write in BrightScript will have much slower performance than a native function. With me deserializing about eight thousand lines it would make a difference on channel startup.

RokuMarkn · ‎04-20-2015

You can use roRegex.Split to do that.

--Mark
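
For anyone comparing the two approaches, here is a minimal sketch of the difference. The sample string and the function name CompareSplitters are made up for illustration; the calls used are Tokenize (ifStringOps) and roRegex.Split. Per the manual quote above, Tokenize collapses the doubled comma, while the regex split is expected to keep the empty field between the two commas.

```brightscript
' Illustrative only: the sample data and the function name are assumptions.
function CompareSplitters() as object
    text = "a,,b"
    ' Tokenize treats the two contiguous commas as a single delimiter.
    tokens = text.Tokenize(",")
    ' roRegex.Split splits at every match, so the empty field survives.
    regex = CreateObject("roRegex", ",", "")
    fields = regex.Split(text)
    return { tokens: tokens, fields: fields }
end function
```

With this input, tokens should hold two items ("a" and "b"), while fields should hold three, with an empty string in the middle.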

sjb64 · ‎04-20-2015

Didn't even think about the regex functions, my bad, thanks for the quick reply.

EnTerr · ‎04-20-2015

Previously covered in viewtopic.php?f=34&t=66914 (sheds some light on fn origins)

sjb64 · ‎04-21-2015

I think I was being too specific in my searching and didn't see that post, but it is interesting and makes the rationale for the behavior make sense.

... and Regex obviously worked, but wasn't actually any faster than my routine. This has always been a bit of an issue with regular expressions: highly powerful, but with good rather than great performance. I'm sticking with the Regex option anyway, since it is more readable than a custom routine.

EnTerr · ‎04-21-2015

"sjb64" wrote:
... and Regex obviously worked, but wasn't actually any faster than my routine. This has always been a bit of an issue with regular expressions,

Check also this thread - "roRegex.split() is quadratically slow?!" - something I discovered last year and ~~might still be the case. At least I haven't heard of it being fixed - but it's easy to check with the code snippet provided~~ is still the case. In short, roRegex.splitting a really big string into many, many pieces is *massively* slow but not because of the regex per se.

TheEndless · ‎04-21-2015

"EnTerr" wrote:
"sjb64" wrote:
... and Regex obviously worked, but wasn't actually any faster than my routine. This has always been a bit of an issue with regular expressions,

Check also this thread - "roRegex.split() is quadratically slow?!" - something I discovered last year and ~~might still be the case. At least I haven't heard of it being fixed - but it's easy to check with the code snippet provided~~ is still the case. In short, roRegex.splitting a really big string into many, many pieces is *massively* slow but not because of the regex per se.

FWIW, it's not just split. The same is true for replaces.

My Channels: http://roku.permanence.com - Twitter: @TheEndlessDev
Instant Watch Browser (NetflixIWB), Aquarium Screensaver (AQUARIUM), Clever Clocks Screensaver (CLEVERCLOCKS), iTunes Podcasts (ITPC), My Channels (MYCHANNELS)
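
The snippet from the linked thread is not reproduced here, but the claim about roRegex.Split is easy to check with a rough timing harness along these lines. This is only a sketch: the function name, field counts, and sample data are arbitrary assumptions, and absolute timings will vary by device and firmware. If the elapsed time roughly quadruples each time the field count doubles, that is consistent with quadratic behavior.

```brightscript
' Hypothetical timing harness; the sizes and the "x," filler are arbitrary choices.
sub TimeRegexSplit()
    regex = CreateObject("roRegex", ",", "")
    timer = CreateObject("roTimespan")
    for each n in [1000, 2000, 4000]
        ' Build a string with n delimiters, e.g. "x,x,x,..."
        s = ""
        for i = 1 to n
            s = s + "x,"
        end for
        timer.Mark()                       ' reset the timer before the split
        parts = regex.Split(s)
        print n; " delimiters: "; timer.TotalMilliseconds(); " ms"
    end for
end sub
```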

sjb64 · ‎04-22-2015

I've probably been excessive over the years in avoiding Regex across the board. The vast majority of my programming is in C#, with some PHP for our website and BrightScript here. In C#, writing my own routines or using Linq (unless micro-optimizations are called for, in things like (de)serialization subroutines that run many millions of times) is substantially faster than Regex; in PHP there really isn't a big difference since it's interpreted(ish); and BrightScript seems to be a close call too. Even in the (too many) other languages I've used over the years, I've just gotten into the habit of ignoring it, and am therefore rather weak on its intricacies, which is why I didn't even think about it before starting this thread.

dev42 · ‎04-22-2015

I keep meaning to give this a try. I'm not able to at the moment, but I assume this can handle splitting on strings rather than chars.

What sort of memory overhead is there for using this just for .split vs. rolling our own?

sjb64 · ‎04-23-2015

"dev42" wrote:
What sort of memory overhead is there for using this just for .split vs. rolling our own?

Little at all, just the memory of the routine. Using .split would take none since the library is already on the device, and using a custom routine like the following just creates a new array, so little is used; the code size of the routine would be negligible. Since mine is used in a deserialization routine called thousands of times, I create the result array one time, then clear and reuse it every pass, which avoids lots of allocations and GC.

The code below could easily be adjusted to split on strings by adjusting the offsets in the Value.Mid calls.

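function roStringSplit(Result, Value, Delimiter) as object
    ' Like Tokenize, but doesn't drop empty values. Boxed version.
    ' Result is cleared and refilled, so the caller can reuse one array.
    ' As written, Delimiter is expected to be a single character.
    Result.Clear()
    OldSpot = -1
    Spot = Value.Instr(Delimiter)
    while Spot <> -1
        Result.Push(Value.Mid(OldSpot + 1, Spot - OldSpot - 1))
        OldSpot = Spot
        Spot = Value.Instr(Spot + 1, Delimiter)
    end while
    Result.Push(Value.Mid(OldSpot + 1)) ' remainder after the last delimiter
    return Result
end function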

But even with me doing what I can to move as little data as possible in this routine (reusing the array, no array of delimiter locations, no string copy), its performance is not noticeably better than .split.
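
As a sketch of the adjustment mentioned above for splitting on strings rather than single characters: the offsets advance by the delimiter's length instead of 1, both in the Mid calls and in the Instr start position. The name roStringSplitStr and the empty-delimiter guard are assumptions added for illustration, not part of the original routine.

```brightscript
' Hypothetical multi-character variant of roStringSplit (name is an assumption).
function roStringSplitStr(Result, Value, Delimiter) as object
    Result.Clear()
    DelimLen = Len(Delimiter)
    if DelimLen = 0 then
        ' Assumed guard: with no delimiter, return the whole value as one field.
        Result.Push(Value)
        return Result
    end if
    Start = 0                                 ' index where the current field begins
    Spot = Value.Instr(Delimiter)
    while Spot <> -1
        Result.Push(Value.Mid(Start, Spot - Start))
        Start = Spot + DelimLen               ' skip the whole delimiter
        Spot = Value.Instr(Start, Delimiter)
    end while
    Result.Push(Value.Mid(Start))             ' remainder after the last delimiter
    return Result
end function
```

Called as roStringSplitStr([], Box("a::b::::c"), "::"), this should produce four fields: "a", "b", an empty string, and "c". Box() is used so that Value is a boxed string, matching the original routine.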

Brightscript request - Tokenize
