Forum Discussion

RENJITHVR4
Visitor
9 years ago

How to parse HTML tags by using Brightscript?

Our API returns text that contains HTML tags; in this case it is privacy policy content. Is it possible to show this content without the HTML tags while keeping the right styling, such as font size and weight? Can the HTML tags be converted to an equivalent format for this? Please suggest the best way.

For example:

<ol>\r\n\t<li>We use Personal Data to allow you to participate in the features on the Site, to process your registration, and to provide you with other requested content related to our content and other offerings. Click here to learn more&nbsp;</li></ol>

7 Replies

  • Hi 
    I am also looking for a solution to the same issue. If you found one, please post an update. Thanks in advance.
  • Brightscript Debugger> html = "<tag>hi there<another tag/><tag2> <TAG3>MORE</tag3>"

    Brightscript Debugger> ? html
    <tag>hi there<another tag/><tag2> <TAG3>MORE</tag3>

    Brightscript Debugger> r = CreateObject("roRegex", "<.*?>", "") : ? r.ReplaceAll(html, "")
    hi there MORE

    Brightscript Debugger> html = "\r\n\tHELLO \r\r\rHOW ARE YOU?"

    Brightscript Debugger> ? html
    \r\n\tHELLO \r\r\rHOW ARE YOU?

    Brightscript Debugger> r = CreateObject("roRegex", "(\\r|\\t|\\v|\\n)", "") : ? r.ReplaceAll(html, "")
    HELLO HOW ARE YOU?


    Brightscript Debugger> html = "<ol>\r\n\t<li>We use Personal Data to allow you to participate in the features on the Site, to process your registration, and to provide you with other requested content related to our content and other offerings. Click here to learn more&nbsp;</li></ol>"

    ' strip html tags
    Brightscript Debugger> r = CreateObject("roRegex", "<.*?>", "") : html = r.ReplaceAll(html, "")

    ' strip the literal \r, \t, \v and \n sequences (a backslash followed by a letter) that appear in the text
    Brightscript Debugger> r = CreateObject("roRegex", "(\\r|\\t|\\v|\\n)", "") : html = r.ReplaceAll(html, "")

    Brightscript Debugger> ?html
    We use Personal Data to allow you to participate in the features on the Site, to process your registration, and to provide you with other requested content related to our content and other offerings. Click here to learn more&nbsp;

    ' strip non breaking space entity
    Brightscript Debugger> r = CreateObject("roRegex", "&nbsp;", "") : ? r.ReplaceAll(html, "")
    We use Personal Data to allow you to participate in the features on the Site, to process your registration, and to provide you with other requested content related to our content and other offerings. Click here to learn more
  • Don't use roRegex when a simple .Replace() would do; the latter is faster.
    roXmlElement may help if the HTML in question is well-formed from the point of view of XML.
  • Replace doesn't do globbing or grouping, does it?

    So you would still need a regex to strip the HTML tags and possibly the grouped \r \n \t \v. You are right though, the last part, where it strips off the &nbsp;, could have been a Replace.
  • I doubt the actual string would have backslash literals; that's neither here (HTML) nor there (C source) encoding. In Roku-speak, \r\n\t would have been chr(13)+chr(10)+chr(8)
  • chr(8) is backspace. chr(9) = \t = horizontal tab and chr(11) = \v = vertical tab. You silly rabbit.

    Brightscript Debugger> ? "no"+chr(8)+chr(8)+"yes"
    yes
  • "speechles" wrote:
    chr(8) is backspace. chr(9) = \t = horizontal tab and chr(11) = \v = vertical tab. You silly rabbit.

    I stand corrected.
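
Pulling the replies together, here is a minimal sketch of a helper that combines the steps discussed in this thread: a regex for the tags, handling for both the literal backslash sequences and the real control characters (chr(13), chr(10), chr(9), chr(11)), and a plain Replace for the &nbsp; entity. The function name StripHtml is hypothetical and not from the thread. Replacing &nbsp; with a space rather than an empty string, the "s" flag on the tag regex, and the final Trim() are choices made for this sketch. It only strips markup; it does not carry over any styling.

```brightscript
' Hypothetical helper (name is an assumption): returns plain text from an HTML fragment.
function StripHtml(html as String) as String
    ' Remove anything that looks like a tag. The "s" flag lets "." match
    ' newlines, so a tag split across lines is still removed.
    tagRegex = CreateObject("roRegex", "<.*?>", "s")
    text = tagRegex.ReplaceAll(html, "")

    ' BrightScript string literals have no escape sequences, so the regex
    ' engine receives \\[rntv] as typed: a literal backslash followed by r, n, t or v.
    escRegex = CreateObject("roRegex", "\\[rntv]", "")
    text = escRegex.ReplaceAll(text, "")

    ' Real control characters: CR chr(13), LF chr(10), tab chr(9), vertical tab chr(11).
    ' Here the regex engine itself interprets \r, \n, \t and \x0B.
    ctrlRegex = CreateObject("roRegex", "[\r\n\t\x0B]", "")
    text = ctrlRegex.ReplaceAll(text, "")

    ' A fixed string needs no regex: a plain Replace is enough for the entity.
    text = text.Replace("&nbsp;", " ")

    return text.Trim()
end function
```

For example, `? StripHtml("<p>Hello&nbsp;<b>world</b></p>")` should print `Hello world`. If the input is well-formed XML, roXmlElement, as suggested above, may be the more robust route.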