Roku Developer Program


Not parsing the entire xml

I'm seeing strange behavior when parsing a large XML document (though I'm not sure its size is what's causing the problem).

This is the xml format:

<?xml version="1.0" encoding="UTF-8"?>
<webservice>
    <control>
        <operation>get_clips</operation>
        <status>0</status>
    </control>
    <data>
        <cat size="4" lang="EN"/>
        <cat size="3" lang="EN"/>
        <!-- and 19 more similar 'cat' elements -->
    </data>
</webservice>



And this is the code, where m.rawResponse is the above XML:

if m.rawResponse <> "" then
    xml = CreateObject("roXMLElement")
    print "################ "; m.rawResponse ' contains the entire xml
    xml.Parse(m.rawResponse)
    print "################ "; m.rawResponse ' contains the entire xml

    ' I've tried like this
    categories = xml.data.GetChildElements()
    print "number of categories: "; categories.Count() ' prints 14

    ' and also like this
    i = 0
    for each categ in xml.data.cat
        i = i + 1
    end for
    print i ' prints 14
end if



The problem is that only 14 cat elements are processed instead of 21, and I really don't know why.
Please help me with any ideas.
Thanks a lot!

Re: Not parsing the entire xml

Without seeing the full XML, it's hard to say, but I've parsed XML files upwards of 5 megabytes in size without issue, so the size of the XML is unlikely to be the problem. If it were a size issue, the Parse() itself would almost certainly fail, though you're not checking for that in your code. The Parse() method returns a boolean indicating whether the parse succeeded, so you should start by checking that result.
My Channels: http://roku.permanence.com - Twitter: @TheEndlessDev
Instant Watch Browser (NetflixIWB), Aquarium Screensaver (AQUARIUM), Clever Clocks Screensaver (CLEVERCLOCKS), iTunes Podcasts (ITPC), My Channels (MYCHANNELS)

Re: Not parsing the entire xml

You could also try running the XML through a validator, e.g. http://validator.w3.org/ to see if it can pick up anything strange in your XML.
https://github.com/belltown/

Re: Not parsing the entire xml

I've added a check to see whether the parsing succeeds and, yes, it seems that something goes wrong there. I still wonder why, without this check, it parsed the first 14 elements...
Here is the check I've added:

Function mGetCategories() as Dynamic
    catList = CreateObject("roList")

    if m.rawResponse <> "" then
        xml = CreateObject("roXMLElement")

        if not xml.Parse(m.rawResponse) then
            print "+++++++++++++++++++++++++++Can't parse"
            STOP
            return catList
        end if
    end if

    return catList
End Function


As belltown suggested, I've used http://validator.w3.org and I have an error in the xml:
Line 421, Column 30: character "&" is the first character of a delimiter but occurred as data

<director>Donald Nij & Rick Senjin</director>


This message may appear in several cases:

You tried to include the "<" character in your page: you should escape it as "&lt;"
You used an unescaped ampersand "&": this may be valid in some contexts, but it is recommended to use "&amp;", which is always safe.
Another possibility is that you forgot to close quotes in a previous tag.


So the problem is the "&" symbol. How can I resolve this in the Roku code rather than on the server side? I could not find a solution.

Thank you.

Re: Not parsing the entire xml

In the XML document, you need to code the ampersand characters using the XML entity reference: &amp;

For example:

<director>Donald Nij &amp; Rick Senjin</director>


Otherwise the XML is not valid and will not parse.
https://github.com/belltown/

Re: Not parsing the entire xml

I've run into this same problem. Here's how I fixed it:
xmlraw = xfer.GetToString()
re = CreateObject("roRegEx", " & ", "")
xml = re.ReplaceAll(xmlraw, " &amp; ")

-JT

Re: Not parsing the entire xml

"renojim" wrote:
I've run into this same problem. Here's how I fixed it:
xmlraw = xfer.GetToString()
re = CreateObject("roRegEx", " & ", "")
xml = re.ReplaceAll(xmlraw, " &amp; ")

-JT



Thanks a lot to all of you! It works perfectly now!

Re: Not parsing the entire xml

I wonder whether I should encode all the special characters that may appear in the XML responses. What do you think? And which characters exactly should these be? Thanks again.

Re: Not parsing the entire xml

It really depends on the server. The server should be encoding these characters itself. Otherwise it is sending invalid XML. You can't just encode all the special characters in the XML response, since that will destroy the XML structure. You only want to encode characters within each field. This gets complicated, which is why the server is supposed to do it. Note that your current solution will fail if the server uses any legitimate ampersand escapes like &gt; or &lt; in its content.

--Mark
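
To illustrate what encoding "within each field" means, here is a minimal sketch of escaping a single field's text with the five predefined XML entities. EscapeXmlText is a hypothetical helper name, not something from this thread, and the sketch assumes the ifStringOps Replace() method is available on your firmware. It is only safe on an individual text value; applied to a whole response it would destroy the tags.

```brightscript
' Hypothetical helper: escape one field's text before it is placed inside an element.
' Assumes ifStringOps.Replace() is available on the target firmware.
function EscapeXmlText(value as String) as String
    ' The ampersand must be handled first; otherwise the "&" in "&lt;" and
    ' the other entities produced below would be escaped a second time.
    escaped = value.Replace("&", "&amp;")
    escaped = escaped.Replace("<", "&lt;")
    escaped = escaped.Replace(">", "&gt;")
    escaped = escaped.Replace(Chr(34), "&quot;") ' Chr(34) is the double quote
    escaped = escaped.Replace("'", "&apos;")
    return escaped
end function
```

For example, EscapeXmlText("Donald Nij & Rick Senjin") returns "Donald Nij &amp; Rick Senjin".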

Re: Not parsing the entire xml

"RokuMarkn" wrote:
Note that your current solution will fail if the server uses any legitimate ampersand escapes like &gt; or &lt; in its content.

It only replaces an ampersand preceded and followed by a space, so it should be ok. Of course it could still miss an unescaped ampersand if there isn't a preceding and following space.

-JT
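
If you also need to catch ampersands that have no surrounding spaces, one possible variation is a negative lookahead that skips anything already looking like an entity reference. This is only a sketch: EscapeBareAmpersands is a made-up name, and it assumes roRegex accepts PCRE-style lookaheads. It is still a client-side workaround for invalid XML, and it can guess wrong on text such as "R&D;" or on ampersands inside CDATA sections.

```brightscript
' Hypothetical helper: escape only the ampersands that do not already begin
' an entity reference such as &amp; &lt; &#38; or &#x26;.
function EscapeBareAmpersands(rawXml as String) as String
    ' (?! ... ) is a negative lookahead: match "&" only when it is NOT followed
    ' by a name, a decimal code, or a hex code terminated by ";".
    pattern = "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
    re = CreateObject("roRegex", pattern, "")
    return re.ReplaceAll(rawXml, "&amp;")
end function

' Usage, with xfer being the same transfer object as in the snippet above:
xmlraw = xfer.GetToString()
xml = CreateObject("roXMLElement")
if not xml.Parse(EscapeBareAmpersands(xmlraw)) then print "Can't parse"
```

With this, "Nij&Senjin" becomes "Nij&amp;Senjin", while an existing "&lt;" or "&amp;" is left untouched.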