Roku Developer Program

greubel
Visitor

How to make it faster ?

I have DLNA working in Chaneru 2.0, but because of the amount of data, it is slow to index through the files.
Microsoft is the worst. It sends back 10 times the data that any other server would; for a request of 50 items, I get back about 500K.
So this takes a lot of time with directories with 1000 entries.
I have rewritten the parsing routine about 100 times trying to optimize it, and have tried all of the available BrightScript procedures and interfaces.
The big thing is to replace all of the escaped entities in the data (&gt; &lt; &quot; &apos;) with their real characters.
It seems that the BrightScript Tokenize is the most efficient method of parsing.

Is there any way to run a block of code natively ?
Or can you provide a call where I can pass an array of strings and their substitutions and then do it at one whack ?
Like roRegex BUT faster !
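
Something like this sketch is what I mean. Unescape() here is made up, not an existing firmware call; it takes a table of entity names and their substitutions:

Function Unescape(txt as String, map as Object) as String
    segs = txt.Tokenize("&")
    out = ""
    first = (Left(txt, 1) <> "&")   ' the first segment precedes any "&"
    for each seg in segs
        p = Instr(1, seg, ";")
        key = ""
        if p > 0 then key = Lcase(Mid(seg, 1, p - 1))
        if first
            out = out + seg
            first = false
        else if key <> "" and map.DoesExist(key)
            out = out + map[key] + Mid(seg, p + 1)
        else
            out = out + "&" + seg   ' unknown entity: put the "&" back
        end if
    end for
    return out
End Function

' usage:
entities = { gt: ">", lt: "<", amp: "&", quot: Chr(34), apos: "'" }
txt = Unescape(txt, entities)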
TheEndless
Channel Surfer

Re: How to make it faster ?

"greubel" wrote:
The big thing is to replace all of the escaped characters in the data ( > < " ' ) with their real character.
It seems that the BS Tokoenize is the most efficient method of parsing.

Are you not using the built-in XML parser? It should automatically decode those characters.

Also, how are you using Tokenize to parse? In my experience, the longer the string you pass it as the delimiter, the longer it takes to tokenize, and the more unpredictable the results.

Would you be willing to share your parsing code, so we can provide optimization suggestions?

And finally, it sounds like you're trying to pull back all items in a directory at once. Why not use paging, and either download each page asynchronously and reset/extend the content list as you get it, or when the user explicitly selects a "Next Page" item?
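
A rough sketch of the paged, asynchronous approach I mean is below. BuildPageUrl() is a hypothetical helper that formats your DLNA Browse request with a start index and count; error handling is omitted:

Function FetchPageAsync(port as Object, url as String) as Object
    xfer = CreateObject("roUrlTransfer")
    xfer.SetMessagePort(port)
    xfer.SetUrl(url)
    xfer.AsyncGetToString()
    return xfer   ' hold the reference so the transfer isn't cancelled
End Function

Sub LoadFolderPaged(baseUrl as String, total as Integer)
    port = CreateObject("roMessagePort")
    pageSize = 50
    fetched = 0
    ' BuildPageUrl() is hypothetical: start index + item count per page
    xfer = FetchPageAsync(port, BuildPageUrl(baseUrl, fetched, pageSize))
    while fetched < total
        msg = wait(0, port)
        if type(msg) = "roUrlEvent" and msg.GetResponseCode() = 200
            xml = CreateObject("roXMLElement")
            if xml.Parse(msg.GetString())
                ' append this page's items to the content list,
                ' and update the folder's progress bar here
            end if
            fetched = fetched + pageSize
            if fetched < total
                xfer = FetchPageAsync(port, BuildPageUrl(baseUrl, fetched, pageSize))
            end if
        end if
    end while
End Sub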
My Channels: http://roku.permanence.com - Twitter: @TheEndlessDev
Instant Watch Browser (NetflixIWB), Aquarium Screensaver (AQUARIUM), Clever Clocks Screensaver (CLEVERCLOCKS), iTunes Podcasts (ITPC), My Channels (MYCHANNELS)
greubel
Visitor

Re: How to make it faster ?

"TheEndless" wrote:
Are you not using the built-in XML parser? It should automatically decode those characters.

I tried it before and it is so slooooow, it's pathetic.

"TheEndless" wrote:
Also, how are you using Tokenize to parse?

The main parser chops the data on a ">" and then processes each field.

"TheEndless" wrote:
Would you be willing to share your parsing code, so we can provide optimization suggestions?

This is the fastest I've come up with to unescape the characters.

Q = Chr(34)   ' double-quote character, defined here for completeness

' Split on "&", then check whether each segment starts with an entity
' name terminated by ";". Unknown names get their "&" put back.
itm = txt.Tokenize( "&" )
txt = ""
for i = 0 to itm.Count() - 1
    fld = itm[i].Tokenize( ";" )
    if fld.Count() = 1
        if Lcase(fld[0]) = "gt"
            fld[0] = ">"
        elseif Lcase(fld[0]) = "lt"
            fld[0] = "<"
        elseif Lcase(fld[0]) = "amp"
            fld[0] = "&"
        elseif Lcase(fld[0]) = "quot"
            fld[0] = Q
        else
            fld[0] = "&" + itm[i]
        end if
    else
        if Lcase(fld[0]) = "gt"
            fld[0] = ">" + fld[1]
        elseif Lcase(fld[0]) = "lt"
            fld[0] = "<" + fld[1]
        elseif Lcase(fld[0]) = "amp"
            fld[0] = "&" + fld[1]
        elseif Lcase(fld[0]) = "quot"
            fld[0] = Q + fld[1]
        else
            fld[0] = fld[0] + ";" + fld[1]
        end if
        ' re-attach any remaining ";"-separated pieces of this segment
        for j = 2 to fld.Count() - 1
            fld[0] = fld[0] + ";" + fld[j]
        end for
    end if
    txt = txt + fld[0]
end for


"TheEndless" wrote:
it sounds like you're trying to pull back all items in a directory at once

There may be 1 to XXX folders; I display 5 on the screen. When I first start, I pull 1 item from each of the subdirectories, just to get a total count for that folder.

I split the processing across all the folders: I pull 50 for the selected folder, 20 for the others on the screen, and 10 for the next 5 folders, for a max of 120 outstanding items requested. As each request for a folder completes, another request gets fired off for the next set of items for that folder. This lets me display a progress bar for each folder while it is loading, and the user can see that it is actually doing something while still scrolling through the folders. That way, if he wants a particular directory, it will load faster when it is selected.
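
The per-folder split is roughly this (just a sketch; the function is made up and the numbers restate the scheme above):

Function RequestSize(folderIndex as Integer, selectedIndex as Integer) as Integer
    dist = Abs(folderIndex - selectedIndex)
    if dist = 0 then return 50    ' the selected folder
    if dist <= 4 then return 20   ' the other folders on the screen
    if dist <= 9 then return 10   ' the next 5 folders
    return 0                      ' nothing prefetched beyond that yet
End Function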

If you want to see it work - Add Chaneru 2.0 to your channels - Feb 27, 2013 - https://owner.roku.com/add/EBUQA
TheEndless
Channel Surfer

Re: How to make it faster ?

"greubel" wrote:
Are you not using the built-in XML parser? It should automatically decode those characters.

I tried it before and it is so slooooow, it's pathetic.

Really? I've only experienced slow XML parsing with files >2MB. It's certainly always been faster than parsing it programmatically. I wonder if there's something in your XML that's particularly problematic for the parser. Is it UTF-8?

I just did a test of parsing a 1MB XML file with the built-in XML parser, a simple regex parser I use for HTML decoding, and the code you posted, and got the following results:
XML Parser:       664ms
Regex Parser:    8779ms
Greubel Parser:  4542ms

The built-in XML parser is clearly superior, so I'm really curious as to why you're experiencing different results.
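
In case you want to reproduce the comparison, timings like these can be taken with roTimespan; a minimal sketch, where ParseWithXml() stands in for whichever parser is under test:

clock = CreateObject("roTimespan")
clock.Mark()
result = ParseWithXml(xmlString)   ' hypothetical: swap in each parser here
? "XML Parser: "; clock.TotalMilliseconds(); "ms"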
My Channels: http://roku.permanence.com - Twitter: @TheEndlessDev
Instant Watch Browser (NetflixIWB), Aquarium Screensaver (AQUARIUM), Clever Clocks Screensaver (CLEVERCLOCKS), iTunes Podcasts (ITPC), My Channels (MYCHANNELS)
greubel
Visitor

Re: How to make it faster ?

Ok, I'll recode it again and see how it does. The last time I used the parser was with iTunes and they were really large files.
greubel
Visitor

Re: How to make it faster ?

Ok, I recoded my XML parsing routine to use Parse().
Here are some times I get for large blocks with 50 items returned.

              Roku (ms)   Roku 2 (ms)
size = 18057
  Tokenize       1213        301
  regex          1461        408
  XML parse        37         15
  XML scan        166         39
  XML setup        19          4

size = 20242
  Tokenize       1436        359
  regex          1796        519
  XML parse        43         16
  XML scan        179         42
  XML setup        19          5

size = 20359
  Tokenize       1406        359
  regex          1789        507
  XML parse        43         16
  XML scan        179         41
  XML setup        19          4

You can see that my Tokenize routine is faster than regex; I have both in just to get the times.
The XML scan is based on the XMLPrint example in the docs; it converts the XMLList to objects.
The XML setup does some finalization.
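
The scan is essentially a recursive walk along these lines (a sketch, not my exact code):

Function ScanXml(elem as Object) as Object
    node = {}
    node.name = elem.GetName()
    node.attrs = elem.GetAttributes()
    node.text = elem.GetText()
    node.children = []
    for each child in elem.GetChildElements()
        node.children.Push(ScanXml(child))   ' recurse into each child element
    end for
    return node
End Function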

I'm calling roRegex 4 times to unescape &lt;, &gt;, &amp;, and &quot;:

Q = Chr(34)   ' double-quote character

r = CreateObject("roRegex", "&gt;", "i")
str = r.ReplaceAll(str, ">")
r = CreateObject("roRegex", "&lt;", "i")
str = r.ReplaceAll(str, "<")
r = CreateObject("roRegex", "&quot;", "i")
str = r.ReplaceAll(str, Q)
' &amp; goes last, so doubly-escaped text like "&amp;quot;" isn't decoded twice
r = CreateObject("roRegex", "&amp;", "i")
str = r.ReplaceAll(str, "&")

It would be nice if the Parse() routine recognized the codes and handled them, but it doesn't.
Without doing the regex calls, I get one long field for most of the data from Parse().

Is there a way for roRegex to handle all four values in one call?
TheEndless
Channel Surfer

Re: How to make it faster ?

"greubel" wrote:
It would be nice if the parse routine recognized them and handled the codes BUT he doesn't.
Without doing the regex() calls, I get one long field for most of the data from parse().

Either I'm not following what you're saying, or something funky is going on with your XML. The following code:
xmlString = "<test>quote: &quot;, ampersand: &amp;, greater than: &gt;, less than: &lt;</test>"
xml = CreateObject("roXmlElement")
xml.Parse(xmlString)
? xml.GetText()

Correctly prints the following to the console for me:
quote: ", ampersand: &, greater than: >, less than: <

Am I misunderstanding the issue?

"greubel" wrote:
Is there a way for roRegex to handle all four values in one call?

Nothing that I could figure out. I spent a couple of hours trying to come up with something faster the other day when you first posted the issue, but the best I could come up with was the following, which finds all instances of encoded characters in a single Regex, but you'd still have to walk through the matches to do the replacements, which is much slower than what you're doing.
regex = CreateObject("roRegex", "&#?([0-9|quot|lt|gt|amp|nbsp|apos]+?);", "i")

Ideally you could use the roRegex.Split() method with the above, then do something similar to what you're already doing, but Split() doesn't give you the match that caused the split. I suppose you could do some Mid() parsing on the original string based on the Split() string lengths, but I can't imagine that'd be any faster than what you're currently doing.
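
To make the "walk it yourself" idea concrete, here's a sketch that skips the regex entirely and walks the string with Instr(). DecodeEntities() is a made-up name, and as I said, I doubt it's any faster than your loop:

Function DecodeEntities(s as String) as String
    map = { gt: ">", lt: "<", amp: "&", quot: Chr(34), apos: "'", nbsp: " " }
    out = ""
    pos = 1
    while true
        amp = Instr(pos, s, "&")
        if amp = 0 then exit while
        semi = Instr(amp + 1, s, ";")
        if semi = 0 then exit while
        name = Lcase(Mid(s, amp + 1, semi - amp - 1))
        if map.DoesExist(name)
            ' copy the text before the entity, then its replacement
            out = out + Mid(s, pos, amp - pos) + map[name]
            pos = semi + 1
        else
            ' not a known entity: copy through the "&" and keep scanning
            out = out + Mid(s, pos, amp - pos + 1)
            pos = amp + 1
        end if
    end while
    return out + Mid(s, pos)   ' tail after the last entity
End Function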
My Channels: http://roku.permanence.com - Twitter: @TheEndlessDev
Instant Watch Browser (NetflixIWB), Aquarium Screensaver (AQUARIUM), Clever Clocks Screensaver (CLEVERCLOCKS), iTunes Podcasts (ITPC), My Channels (MYCHANNELS)
greubel
Visitor

Re: How to make it faster ?

Thanks for working on this!
From what you are saying, it looks like something is funny with either the Parse() or the field retrievals.

I'll bet that I need two calls to Parse(): one just to convert the &lt; and &gt;, so it knows that they are fields on the second.
RokuMarkn
Visitor

Re: How to make it faster ?

Two parse calls certainly should not be necessary. Can you post a sample of your XML and code to show the problem?

--Mark
greubel
Visitor

Re: How to make it faster ?

I took out the regex calls and turned on some displays in my XML_Scan routine.
Notice that at the "GetText value = <DIDL-Lite" line, all the characters are unescaped from the original.
Should they be processed on input to Parse()? Or do I really need a second call?

Here is a link to the xml trace.
http://www.chaneru.com/DLNA/XML.txt