"EnTerr" wrote:
Oh absolutely - if one has control over the server-side, it's easier to pick a more digestible form for data than HTML
In one case I was thinking of, I have no control over the server from which the source data is served (some of it HTML, some of it JSON), and I certainly wouldn't consider parsing that on the Roku. What I was alluding to, however, was using an intermediate server, one that you do have some degree of control over, which parses the source web pages and generates XML, which it in turn serves to the Roku. My WHATSONCHANNEL, for example, when serving up listings from TV cable or satellite providers, can serve 3 hours' worth of listings from a couple of hundred channels (something it really wasn't intended for) in less than 5 seconds -- and that uses a PHP script running on a free web host 5,000 miles away in France (with my channel automatically falling back to another free server in Texas if the primary server goes down). If I weren't so cheap I could probably pay for an account on a faster server, but I don't care, because most of the requests from that channel (local OTA TV listings) are served in less than 3 seconds, and no one has complained about that yet. If you're accessing other APIs that you have no control over, this method also gives you the advantage that you can strip out all the data that's not relevant to your Roku channel and present it in any form you wish.
"EnTerr" wrote:
I would like to say a good word for roXML though: in certain cases it can parse HTML and its use may be very convenient.
That's something I wasn't aware of. That's good to know.
"EnTerr" wrote:
I am trying to speed up my homebrew and, coincidentally, @belltown - I noticed that a couple of years ago you wrote a parser, JSONDecoder - with a focus on speed. Glancing over it I see you use roByteArray and numerals instead of roRegex and strings. Why is that, and do you have "words of wisdom" to share about parser performance in BrSc?
When I wrote my JSON parser, back before Roku had native JSON support, and when all I had was a basic Roku 1, I looked at a couple of other (regex-based) JSON parsers out there. They either had limitations in the JSON they would parse, or were just very slow, particularly when parsing larger amounts of data. I briefly looked at using roRegex and roString functions, but quickly came to the conclusion that string handling was just very slow no matter what you used, even with "native" roRegex/roString handling. I'm not well versed in the inner workings of regex engines, but I'd imagine that, due to their complexity, a lot of data accesses and CPU cycles get used as they scan the input data over and over again trying to match patterns. Also, with regexes, a lot depends on how you use them. Just about any given match can be performed using any number of different regex constructs. Even though I love regexes, I didn't want to spend too much of my time trying to figure out which of the gazillion possible regex techniques was the fastest to use in each particular situation.
So I decided to keep things as simple as possible, under the assumption that even interpreted code could be relatively fast if the underlying byte-code that was generated didn't have to do much. I generally tried to keep the number of data accesses to a minimum, even if that involved writing more lines of code. As you're probably aware, speed has nothing to do with how many lines of code you write; it's the number of CPU instructions executed and memory accesses performed that counts.
My general design principles were, as far as possible, to:
- Convert the input string to a byte array, as BrightScript seemed to process arrays of bytes fairly efficiently.
- Examine each character (byte) in turn, once and only once, with no lookahead, no copying, no stacks, etc., although I did make use of recursion in BS.
- When examining characters (bytes) to check whether they were number characters, whitespace characters, escape characters, etc., instead of having a comparison with multiple conditions (if char = space OR char = tab OR char = carriage-return OR char = newline), I just used a look-up table indexed by the character value (if whitespaceTable[byte]).
- Stay away from the BrightScript roRegex and roString functions; although they'd save lines of code, I just didn't think that saving made up for all the stuff they have going on behind the scenes. [Subject to what I said above about how you use them being a significant factor.]
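As a rough illustration of the byte-array and look-up-table technique described above (a sketch of my own; the identifiers here are not taken from the actual JSONDecoder source):

```brightscript
' Build a 256-entry table once at init time; then classifying a byte
' is a single indexed lookup instead of a chain of OR comparisons.
Function BuildWhitespaceTable() As Object
    whitespaceTable = CreateObject("roArray", 256, False)
    For i = 0 To 255
        whitespaceTable[i] = False
    End For
    whitespaceTable[9] = True    ' tab
    whitespaceTable[10] = True   ' newline
    whitespaceTable[13] = True   ' carriage return
    whitespaceTable[32] = True   ' space
    Return whitespaceTable
End Function

' Advance past any whitespace, examining each byte once, no lookahead
Function SkipWhitespace(bytes As Object, index As Integer, whitespaceTable As Object) As Integer
    While index < bytes.Count() And whitespaceTable[bytes[index]]
        index = index + 1
    End While
    Return index
End Function

' Usage: convert the input string to a byte array once, then scan
' ba = CreateObject("roByteArray")
' ba.FromAsciiString(jsonString)
' table = BuildWhitespaceTable()
' index = SkipWhitespace(ba, 0, table)
```

The same pattern extends to the other character classes (digits, escapes, structural characters): one table per class, each lookup costing a single array access regardless of how many characters belong to the class.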
I ran some tests on various-sized data sets and found that my parser was significantly faster than the others on large data sets, and generally comparable on smaller ones. Then Roku added their own native JSON parser, so I didn't need mine any more.
Parsing JSON, however, is way easier than parsing XML. I can see why you'd want to save writing code by using regexes in your case. Just don't expect it to be very fast.