Forum Discussion

jandre
Visitor
15 years ago

& (ampersand) symbol issue with XML parsing

It looks like there is an issue parsing XML that contains an ampersand (&).
For example, in the following XML, when I use "Elizabeth & Gilbert" instead of "Elizabeth Gilbert", I get a "Can't parse feed" error.

<item sdImg="http://rokudev.roku.com/rokudev/examples/videoplayer/images/ElizabethGilbert.jpg" hdImg="http://rokudev.roku.com/rokudev/examples/videoplayer/images/ElizabethGilbert.jpg">
<title>Elizabeth & Gilbert on nurturing creativity</title>
<contentId>10051</contentId>
<contentType>Talk</contentType>
<contentQuality>SD</contentQuality>
<streamFormat>mp4</streamFormat>
<media>
<streamQuality>SD</streamQuality>
<streamBitrate>1500</streamBitrate>
<streamUrl>http://video.ted.com/talks/podcast/ElizabethGilbert_2009_480.mp4</streamUrl>
</media>
<synopsis>Elizabeth Gilbert muses on the impossible things we expect from artists and geniuses -- and shares the radical idea that, instead of the rare person 'being' a genius, all of us 'have' a genius. It's a funny, personal and surprisingly moving talk.</synopsis>
<genres>Creativity</genres>
<runtime>1172</runtime>
</item>

Any way to get around this situation?

Thanks!
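This error comes from standard XML well-formedness rules rather than from a quirk of one particular parser. The minimal sketch below reproduces it. It uses Python's standard-library xml.etree.ElementTree purely for illustration, not the parser inside the Roku SDK, and shows that the escaped &amp; form parses and decodes back to a plain &:

```python
import xml.etree.ElementTree as ET

# The <title> element from the feed above, with and without escaping.
RAW = "<title>Elizabeth & Gilbert on nurturing creativity</title>"
ESCAPED = "<title>Elizabeth &amp; Gilbert on nurturing creativity</title>"


def try_parse(doc: str) -> str:
    """Return the parsed element text, or a short description of the parse error."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError as err:
        return f"ParseError: {err}"


if __name__ == "__main__":
    print(try_parse(RAW))      # a bare '&' is not well-formed XML
    print(try_parse(ESCAPED))  # '&amp;' is decoded back to '&' by the parser
```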

9 Replies

  • Try this:

    <title>Elizabeth &amp; Gilbert on nurturing creativity</title>
  • Thanks, that works. However, I'm wondering whether this can be fixed at the SDK level. I'm pulling a lot of data from the backend, I don't know every place where the & symbol appears, and escaping it every time is going to be a problem. I also have secure video URLs that contain & symbols, and they cause problems because of the same parsing issue.

    Thanks,
  • "TheEndless" wrote:
    Ampersands need to be encoded in XML regardless of the parser.


    Yes... looks like that is the rule 🙂

    Thanks!
  • jbrave
    Channel Surfer
    So if the XML has an ampersand, it is going to fail parsing. Could one perhaps use regex to search and replace ampersands with &amp; before parsing?
  • "jbrave" wrote:
    So if the XML has an ampersand, it is going to fail parsing. Could one perhaps use regex to search and replace ampersands with &amp; before parsing?

    One could if they were positive there were no other properly encoded characters in the XML (or one had the patience to figure out the regex to detect the difference).
  • I believe an unescaped & literal violates the XML spec (and even the HTML spec), so the backend should be fixed not to emit one. If you are using PHP, it has a function for this: http://php.net/manual/en/function.htmlspecialchars.php. Just run any text fields that might contain special characters through a function like that to escape them.
  • "TheEndless" wrote:
    "jbrave" wrote:
    So if the XML has an ampersand, it is going to fail parsing. Could one perhaps use regex to search and replace ampersands with &amp; before parsing?

    One could if they were positive there were no other properly encoded characters in the XML (or one had the patience to figure out the regex to detect the difference).


    A zero-width negative lookahead makes the regex to find and replace rogue ampersands fairly trivial. Unescaped < and > characters are a bit harder...

    Match: &(?!\S+;)
    Replace: &amp;
  • Ampersands aren't the only characters you can (and will) have issues with. I'm not sure what platform or language you're using, but it would behoove you to make sure that all attributes and elements are properly encoded. We use .NET, so we use Microsoft's AntiXSS library to encode all XML attributes and elements. If you don't do that, you will run into issues with internationalization (non-ASCII UTF-8 characters) and with other symbols ('<', '>', etc.).
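The two approaches discussed in this thread are escaping at the source and repairing rogue ampersands with a regex. Both can be sketched in Python as below. This is an illustration under stated assumptions, not Roku SDK code. The names build_item, LOOSE_ROGUE_AMP, STRICT_ROGUE_AMP and repair_ampersands are hypothetical. xml.sax.saxutils.escape and quoteattr are Python's standard-library counterparts to PHP's htmlspecialchars or the AntiXSS encoders mentioned above.

The stricter pattern is a swapped-in variant, not the one posted in the thread. It leaves an & alone only when that & starts a named, decimal or hex reference. As a result, a query string like a=1&b=2&amp;c=3, which the looser &(?!\S+;) pattern leaves unchanged, does get repaired.

```python
import re
from xml.sax.saxutils import escape, quoteattr


# Preferred fix: escape values when the feed is generated.
def build_item(title: str, sd_img: str) -> str:
    """Hypothetical helper: build a minimal <item> with escaped content."""
    # escape() turns &, < and > into entities; quoteattr() does the same and
    # also returns the value wrapped in quotes, ready to use as an attribute.
    return f"<item sdImg={quoteattr(sd_img)}><title>{escape(title)}</title></item>"


# Fallback: repair rogue ampersands in a feed you cannot fix at the source.
# The pattern from the thread: skip any & followed by non-space text and a ';'.
LOOSE_ROGUE_AMP = re.compile(r"&(?!\S+;)")
# Assumed stricter variant: skip only an & that starts a real reference,
# i.e. a named entity (&amp;), a decimal (&#38;) or a hex (&#x26;) reference.
STRICT_ROGUE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def repair_ampersands(doc: str, pattern: re.Pattern = STRICT_ROGUE_AMP) -> str:
    """Replace every & that `pattern` does not treat as a reference with &amp;."""
    return pattern.sub("&amp;", doc)


if __name__ == "__main__":
    print(build_item("Elizabeth & Gilbert", "http://example.com/a.jpg?w=1&h=2"))
    query = "a=1&b=2&amp;c=3"
    print(repair_ampersands(query, LOOSE_ROGUE_AMP))  # unchanged: first & still bare
    print(repair_ampersands(query))                   # a=1&amp;b=2&amp;c=3
```

Escaping when the feed is generated is the more reliable option. The regex repair cannot fix stray < or > characters. It also still treats undeclared names such as &nbsp; as references, and an XML parser rejects those unless a DTD declares them.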