cpradio
Visitor
06-27-2011
03:19 PM
Strip out HTML Tags
I am reading an XML file to populate a springboard, but the descriptions in the XML have HTML tags embedded in them. How can I quickly strip these out?
Thanks,
Matt
6 REPLIES 6
kbenson
Visitor
06-27-2011
03:21 PM
Re: Strip out HTML Tags
"cpradio" wrote:
I am reading an XML file to populate a springboard, but the descriptions in the XML have HTML tags embedded in them. How can I quickly strip these out?
Do a Google search for a regular expression to strip HTML (there are tons online), and use the regular expression component in BrightScript (roRegex) to remove the tags.
-- GandK Labs
Check out Reversi! in the channel store!
cpradio
Visitor
06-27-2011
03:25 PM
Re: Strip out HTML Tags
Ah, missed the regular expression component. Thanks, that should do the trick. Was kinda hoping it would support the HTML tags, but I understand why it doesn't.
jbrave
Channel Surfer
06-27-2011
05:09 PM
Re: Strip out HTML Tags
I've had way more success using string functions like instr, mid, left, and right than regex. I posted a regex I found for parsing HTML a while ago, but have never actually gotten it to work beyond giving me the string "<HTML>", so if anyone has some code that works to get a specific tag or an associative array of tags, it would be awesome.
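For comparison, here is a minimal sketch of the string-function approach, assuming the goal is simply to drop everything between a "<" and the next ">". The helper name StripTagsWithInstr is hypothetical, and the sketch makes no attempt to handle comments, scripts, or a ">" inside an attribute value.

```brightscript
' Hypothetical helper (not from the thread): removes anything between
' a "<" and the next ">" using only Instr, Mid and Len.
function StripTagsWithInstr(text as String) as String
    result = ""
    pos = 1
    while pos <= Len(text)
        openPos = Instr(pos, text, "<")
        if openPos = 0 then
            ' No more tags: keep the rest of the string.
            result = result + Mid(text, pos)
            exit while
        end if
        ' Keep the text that precedes the tag, if there is any.
        if openPos > pos then
            result = result + Mid(text, pos, openPos - pos)
        end if
        closePos = Instr(openPos, text, ">")
        if closePos = 0 then
            ' Unterminated "<": keep the remainder as literal text.
            result = result + Mid(text, openPos)
            exit while
        end if
        ' Resume scanning just after the closing ">".
        pos = closePos + 1
    end while
    return result
end function
```

Unlike a regex, this walks the string once and keeps a stray "<" that never closes, which may or may not be what you want for your feed.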
Screenshades: The first Screensaver for Roku2!
Musiclouds: The best free internet music, on your Roku!
Ouroborialis: Psychedelic Screensaver for Roku!
kbenson
Visitor
06-27-2011
05:26 PM
Re: Strip out HTML Tags
"jbrave" wrote:
I've had way more success using string functions like instr, mid, left, and right than regex. I posted a regex I found for parsing HTML a while ago, but have never actually gotten it to work beyond giving me the string "<HTML>", so if anyone has some code that works to get a specific tag or an associative array of tags, it would be awesome.
The trick is to use lazy operators (instead of greedy ones, which are the default). You get lazy operation by appending "?" to a match multiplier.
e.g.
With the string "<b>this is a bold string.</b> <i>this is an italicized string.</i> <b>this is another bold string</b>"
With this content, the regex "<b>.*</b>" will match from the first <b> to the last </b>, which generally isn't what you want.
If you change the regex to "<b>.*?</b>" you get a "lazy" match between the opening and closing bold tags, which tries to match the minimum possible instead of the maximum possible. Matching the maximum possible is the default behavior of + and * (and of ? when used by itself).
That said, there are other problems commonly encountered. Using a regular expression to parse HTML is problematic in most cases. You really need to be able to make some assumptions about the specific content you are parsing.
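To make the greedy versus lazy difference concrete, here is a small sketch using roRegex's Match method, which returns an array whose first element is the whole match. The Main wrapper and variable names are just for illustration.

```brightscript
sub Main()
    text = "<b>this is a bold string.</b> <i>this is an italicized string.</i> <b>this is another bold string</b>"

    ' Greedy: .* grabs as much as it can, so the match runs from the
    ' first <b> to the last </b> (here, the entire string).
    greedy = CreateObject("roRegex", "<b>.*</b>", "")
    print greedy.Match(text)[0]

    ' Lazy: .*? stops at the first </b> it can reach, so this prints
    ' <b>this is a bold string.</b>
    lazy = CreateObject("roRegex", "<b>.*?</b>", "")
    print lazy.Match(text)[0]
end sub
```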
-- GandK Labs
Check out Reversi! in the channel store!
lewi-p
Visitor
10-12-2012
08:25 AM
Re: Strip out HTML Tags
For anyone that's interested I used this...
function StringRemoveHTMLTags(baseStr as String) as String
    r = createObject("roRegex", "<[^<]+?>", "i")
    return r.replaceAll(baseStr, "")
end function
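A quick usage sketch, assuming the function above is defined in the same channel source. The sample description string is made up for illustration.

```brightscript
sub Main()
    ' Made-up sample text standing in for a description from the XML feed.
    description = "<p>An <b>example</b> description.</p>"

    ' Each <...> run is replaced with an empty string, so this prints:
    ' An example description.
    print StringRemoveHTMLTags(description)
end sub
```

Note that this only removes tags. HTML entities such as &amp; are left in the text as-is.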

neoRiley
Visitor
03-13-2017
01:47 PM
Re: Strip out HTML Tags
"lewi-p" wrote:
For anyone that's interested I used this...
function StringRemoveHTMLTags(baseStr as String) as String
    r = createObject("roRegex", "<[^<]+?>", "i")
    return r.replaceAll(baseStr, "")
end function
Thank you 🙂 Does exactly what it says. I figured I'd say thanks 5yrs later since nobody else did