Roku Developer Program

TheEndless
Channel Surfer

Re: Brightscript request - Tokenize

sjb64, one change you should consider in your roStringSplit routine: you're not taking the length of the delimiter into account, so a delimiter longer than one character, like "~~~", can produce incorrect results. For example...
BrightScript Debugger> ?roStringSplit([], "1~~~2~~~~~~4~~~5~~~6", "~~~")
1
~~2



~~4
~~5
~~6
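For comparison, a correct split of that string on "~~~" should give six fields, with one empty field where two delimiters sit back to back. Python's built-in str.split treats a multi-character separator as a single unit, so it makes a convenient cross-check (Python is used here only as a reference, not as a BrightScript equivalent).

```python
# Reference split of the test string from the debugger session,
# using Python's str.split, which matches the whole "~~~" separator at once.
fields = "1~~~2~~~~~~4~~~5~~~6".split("~~~")
print(fields)  # ['1', '2', '', '4', '5', '6']
```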

I'd also recommend declaring your parameter types. Without them, your code will crash inside the function if something other than a string is passed in; with them, you'll get a more meaningful error message when that happens.

And finally, I'm curious why you'd pass in an array, clear and modify it, then return that same array. If you weren't clearing it in the first line of the function, it might make more sense, since you could add to an existing array, but as it's written, I can't see the value of it.
My Channels: http://roku.permanence.com - Twitter: @TheEndlessDev
Instant Watch Browser (NetflixIWB), Aquarium Screensaver (AQUARIUM), Clever Clocks Screensaver (CLEVERCLOCKS), iTunes Podcasts (ITPC), My Channels (MYCHANNELS)
sjb64
Roku Guru

This is part of a deserializer. I get a data file from our servers and tokenize it into 10 strings, each a sequence of records for a different table. Say line 7 is a list of 500 movies (id, name, year, release date, rating, and so on); I tokenize it into 500 lines, one movie per line. Then my routine comes in. I call it with the pre-initialized array, and it splits line 1 into its fields, which are then named and saved into an associative array. Then I call the same routine with line 2. Since line 1 has been processed and is no longer needed, I clear the array so it can return the fields from line 2, and so on through all 500. The array is reused only to avoid creating a new result array 500 times (actually 5,000+ across the 10 tables), along with the associated garbage-collection work.
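A minimal Python sketch of the buffer-reuse scheme just described, under loudly stated assumptions: the "|" delimiter, the three column names, and the helpers split_fields() and deserialize() are all hypothetical, and the field-count check is only a stand-in for Record.Valid. What it shows is the calling convention: one list is allocated before the loop and refilled for every line.

```python
def split_fields(buf, line, delim):
    """Clear buf, refill it with the fields of one record, and return buf.

    Hypothetical stand-in for roStringSplit(Result, Item, Delimiter):
    the caller's list is reused instead of allocating a new one per record.
    """
    buf.clear()                    # drop the previous record's fields
    buf.extend(line.split(delim))  # str.split still builds a temporary list here
    return buf                     # returning buf lets the call be used inline


def deserialize(lines, delim, columns):
    """Build {id: record} from delimited lines, reusing one field buffer."""
    table = {}
    fields = []  # allocated once for the whole table
    for line in lines:
        record = dict(zip(columns, split_fields(fields, line, delim)))
        if len(fields) == len(columns):  # crude stand-in for Record.Valid
            table[record[columns[0]]] = record  # like List.AddReplace(ID, Record)
    return table


movies = deserialize(["1|Alien|1979", "2|Heat|1995", "bad line"], "|", ("id", "name", "year"))
print(movies["2"]["name"])  # Heat
```

Because dict(zip(...)) copies the field values out before the next call clears the buffer, reusing it is safe. In Python, str.split() still creates a temporary list on every call, so this illustrates the calling convention rather than an actual allocation saving.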

I don't really need to return it, since it comes from the caller anyway, but returning it lets me use the call directly as an argument:
for each Item in Values
    Record = Table.Deserializer(roStringSplit(Result,Item,Delimiter))
    if (Record.Valid) List.AddReplace(Record.ID.ToStr(), Record)
end for


I hope my explanation makes some sense.

On the delimiter length, that's what I meant when I said a small adjustment to the Value.Mid calls would fix it. For my use I only needed single-character delimiters, to match my existing C# and PHP code that creates and reads the same serialized file.
EnTerr
Roku Guru

"sjb64" wrote:
I don't really need to return it as it comes from the caller anyway, but doing so allows me to use the function as a return value -
for each Item in Values
    Record = Table.Deserializer(roStringSplit(Result,Item,Delimiter))
    if (Record.Valid) List.AddReplace(Record.ID.ToStr(), Record)
end for


There is a case to be made that destructive/mutator procedures (like what `roStringSplit()` is to `result`) should NOT return any value, i.e. should not be functions proper; and that functions should not be mutators (i.e. they take params, don't change them, and `return` their entire contribution to the world). Now, I don't hold to that rule hard and fast, but over the years I have found it beneficial to keep functions "functional" and procedures "imperative", so to speak, at arm's length from each other. There is something to be said for coming back to your code six months from now and trying to remember how things work. Not having side-effect hanky-panky helps.

An example of that practice is Python's list.sort() vs. sorted(). lst.sort() sorts the list in place and returns None; sorted(lst), on the other hand, accepts any iterable and returns a brand new list in sorted order.
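The two conventions side by side, as a quick Python illustration of the point above: the function leaves its input untouched and hands back a result, while the mutator changes its input and returns nothing.

```python
lst = [3, 1, 2]

# sorted() is a function proper: it leaves its argument alone and returns a new list.
new_lst = sorted(lst)
print(new_lst, lst)  # [1, 2, 3] [3, 1, 2]

# list.sort() is a mutator: it reorders lst in place and returns None.
ret = lst.sort()
print(ret, lst)      # None [1, 2, 3]
```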

How I'd write that snippet:

for each item in values
    strSplit(result, item, delimiter)
    record = table.deserializer(result)
    if record.isValid then list.addReplace(record.ID.toStr(), record)
end for

When reading, it draws your attention to the fact that `result` is being mutated by strSplit(); you can't miss it. And there is no performance penalty either.
sjb64
Roku Guru

Fair enough, I actually completely agree in principle. But I'm always under the gun with deadlines, new language or not, and the Roku dev tasks are just one part of a much larger picture. Plus, I wrote this routine early in my BrightScript learning. Data ingest was one of the startup processes I needed to write before I could even get to screens and other such fun.

I was also trying to squeeze out any additional performance in a language I'm not (and was even less so then) totally versed in.

So I'm going to claim hectic ignorance as an excuse.
EnTerr
Roku Guru

No sweat, I was just in the mood to proselytize*

Getting on a philosophical tangent, Kernighan once said:
Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?
But here is an interesting counterpoint I ran into when looking for the exact quote: "Kernighan's Lever". Its author uses the psychological concept of "flow" plus charts (strategically drawn to suggest that to progress, one should push up into frustration and then return to the zone with improved skill). If I boil it down to a one-liner, it's Nietzsche's "What does not kill me makes me stronger."

So it seems that if a certain practice never comes back to bite you (or others), more power to you. If it does bite you, and doesn't have rabies, it will toughen your skin with scar tissue and build up your immune system. Either way, wisdom can be found.

(*) No way I'd be able to spell that without an online dictionary.
sjb64
Roku Guru

Repost of the routine, more in line with observations by TheEndless and EnTerr.

function StringSplit(Value as string, Delimiter as string) as object
    Size=Len(Delimiter)
    Result = CreateObject("roArray",32,true)
    OldSpot=-1: Spot = Value.Instr(Delimiter)
    while Spot<>-1
        Result.Push(Value.Mid(OldSpot+Size,Spot-OldSpot-Size))
        OldSpot=Spot
        Spot = Value.Instr(Spot+Size, Delimiter)
    end while
    Result.Push(Value.Mid(OldSpot+Size))
    return Result
end function
EnTerr
Roku Guru

"sjb64" wrote:
Repost of the routine, more in line with observations by TheEndless and EnTerr.

Hmmm, it is still slower than using roRegex.split() though, by about 20%. I tried to improve on it and discovered there is still a bug: when len(delim) > 1, the first chunk is wrong (OldSpot probably had to start at `-Size`):
BrightScript Debugger> ?  stringSplit("123+-123+-123", "+-")
23
123
123
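To see where the stray "23" comes from, here is a Python model of the reposted StringSplit() (a sketch, not the BrightScript itself): Value.Instr() becomes str.find() and Value.Mid(start, count) becomes a slice, both 0-based like the string methods. The starting value of OldSpot is exposed as a hypothetical parameter, old_spot_start, so the current -1 and the suggested -Size can be compared.

```python
def string_split(value, delim, old_spot_start=-1):
    """Python model of the reposted BrightScript StringSplit().

    old_spot_start is a hypothetical knob: the repost uses -1,
    the suggested fix uses -len(delim).
    """
    size = len(delim)
    result = []
    old_spot = old_spot_start
    spot = value.find(delim)                       # Value.Instr(Delimiter)
    while spot != -1:
        start = old_spot + size                    # first char after previous delimiter
        count = spot - old_spot - size             # chars up to the next delimiter
        result.append(value[start:start + count])  # Value.Mid(start, count)
        old_spot = spot
        spot = value.find(delim, spot + size)      # Value.Instr(Spot+Size, Delimiter)
    result.append(value[old_spot + size:])         # Value.Mid(OldSpot+Size)
    return result


print(string_split("123+-123+-123", "+-"))      # ['23', '123', '123']
print(string_split("123+-123+-123", "+-", -2))  # ['123', '123', '123']
```

With OldSpot starting at -1, the first slice begins at index Size - 1, which is 0 only for a one-character delimiter. Starting at -Size makes the first slice begin at 0 for any delimiter length, and later iterations are unaffected because OldSpot is overwritten by the first match.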

My rework (using the global Instr() and Mid() functions instead of the string methods)
function str_split(valu as string, delim as string) as object
    res = [ ]
    siz = len(delim)
    old_ptr = 1 ' start of the current piece (1-based); after a match it is new_ptr + siz
    new_ptr = instr(1, valu, delim)
    while new_ptr > 0
        res.push(mid(valu, old_ptr, new_ptr - old_ptr))
        old_ptr = new_ptr + siz
        new_ptr = instr(old_ptr, valu, delim)
    end while
    res.push(mid(valu, old_ptr))
    return res
end function
did not lead to a noticeable speed-up. Both our home-brew functions exhibit quadratic behavior, just like roRegex.split(), and I am puzzled why. The practical takeaway seems to be to use roRegex.split().
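One caveat with the regex route: the delimiter then becomes a regular-expression pattern, so a delimiter such as "+-" has to be escaped, because "+" is a quantifier. The Python sketch below shows the idea with re.split() and re.escape(). For roRegex, the pattern would presumably need hand escaping as well (for example the pattern text \+-); that is an assumption, not something tested in this thread.

```python
import re


def regex_split(value, delim):
    """Split on a literal delimiter via a regex, escaping it first."""
    # re.escape makes "+-" match literally; unescaped, "+" at the start of a
    # pattern raises re.error ("nothing to repeat").
    return re.split(re.escape(delim), value)


print(regex_split("123+-123+-123", "+-"))          # ['123', '123', '123']
print(regex_split("1~~~2~~~~~~4~~~5~~~6", "~~~"))  # ['1', '2', '', '4', '5', '6']
```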