Roku Developer Program

RENJITHVR4
Level 7

How to parse HTML tags by using Brightscript?

From the API, we get some text containing HTML tags; it is actually privacy policy content. Is it possible to show this content without the HTML tags while keeping the right styling, such as font size and weight? Can the HTML tags be converted to a suitable format for this? Please suggest the best way.

For example:

<ol>\r\n\t<li>We use Personal Data to allow you to participate in the features on the Site, to process your registration, and to provide you with other requested content related to our content and other offerings. Click here to learn more&nbsp;</li></ol>
venkatareddy
Level 7

Re: How to parse HTML tags by using Brightscript?

Hi,
I am also looking for a solution to the same issue. If you find one, please give me an update. Thanks in advance; I hope to hear from you.
speechles
Level 7

Re: How to parse HTML tags by using Brightscript?

Brightscript Debugger> html = "<tag>hi there<another tag/><tag2> <TAG3>MORE</tag3>"

Brightscript Debugger> ? html
<tag>hi there<another tag/><tag2> <TAG3>MORE</tag3>

Brightscript Debugger> r = CreateObject("roRegex", "<.*?>", "") : ? r.ReplaceAll(html, "")
hi there MORE

Brightscript Debugger> html = "\r\n\tHELLO \r\r\rHOW ARE YOU?"

Brightscript Debugger> ? html
\r\n\tHELLO \r\r\rHOW ARE YOU?

Brightscript Debugger> r = CreateObject("roRegex", "(\\r|\\t|\\v|\\n)", "") : ? r.ReplaceAll(html, "")
HELLO HOW ARE YOU?


Brightscript Debugger> html = "<ol>\r\n\t<li>We use Personal Data to allow you to participate in the features on the Site, to process your registration, and to provide you with other requested content related to our content and other offerings. Click here to learn more&nbsp;</li></ol>"

' strip html tags
Brightscript Debugger> r = CreateObject("roRegex", "<.*?>", "") : html = r.ReplaceAll(html, "")

' strip literal \r, \t, \v and \n escape text (a backslash followed by a letter), as typed in the test string
Brightscript Debugger> r = CreateObject("roRegex", "(\\r|\\t|\\v|\\n)", "") : html = r.ReplaceAll(html, "")

Brightscript Debugger> ?html
We use Personal Data to allow you to participate in the features on the Site, to process your registration, and to provide you with other requested content related to our content and other offerings. Click here to learn more&nbsp;

' strip non breaking space entity
Brightscript Debugger> r = CreateObject("roRegex", "&nbsp;", "") : ? r.ReplaceAll(html, "")
We use Personal Data to allow you to participate in the features on the Site, to process your registration, and to provide you with other requested content related to our content and other offerings. Click here to learn more
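The debugger steps above can be packaged into one reusable function. This is a minimal sketch, assuming the API response was decoded with ParseJson, so the \r\n\t in the JSON are already real control characters rather than backslash text. It also assumes &nbsp; is the only entity that needs handling and that no tag spans a line break (without the "s" flag, "." does not match a newline). The name StripHtml is hypothetical.

```brightscript
' Hypothetical helper: turns a small HTML fragment into one line of plain text.
Function StripHtml(html As String) As String
    ' 1. Remove tags: "<", then as few characters as possible, then ">"
    tagRegex = CreateObject("roRegex", "<.*?>", "")
    text = tagRegex.ReplaceAll(html, "")

    ' 2. A fixed entity needs no regex; a plain Replace is enough
    text = text.Replace("&nbsp;", " ")

    ' 3. BrightScript literals have no backslash escapes, so PCRE itself reads \s
    '    and matches real spaces, tabs, CR, LF, etc. Collapse each run to one space.
    spaceRegex = CreateObject("roRegex", "\s+", "")
    text = spaceRegex.ReplaceAll(text, " ")

    return text.Trim()
End Function
```

Replacing whitespace runs with a single space, rather than deleting them outright, keeps words on either side of a line break from being glued together.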
Roku Employee

Re: How to parse HTML tags by using Brightscript?

Don't use roRegex when a simple .Replace() would do; the latter is faster.
roXMLElement may also help, if the HTML in question is well-formed from the point of view of XML.
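A minimal sketch of the roXMLElement route, assuming the fragment has a single root element (like the <ol> in the question) and that each <li> holds plain text with no nested tags. XML predefines only five entities (&lt; &gt; &amp; &apos; &quot;), so an HTML entity such as &nbsp; would likely make Parse() fail; it is swapped out with a plain Replace() first. The function name ListItemsFromHtml is hypothetical.

```brightscript
' Hypothetical helper: returns the text of each <li> in a well-formed fragment,
' or invalid if the fragment cannot be parsed as XML.
Function ListItemsFromHtml(html As String) As Object
    ' &nbsp; is not an XML entity, so replace it before parsing
    fragment = html.Replace("&nbsp;", " ")

    xml = CreateObject("roXMLElement")
    if not xml.Parse(fragment) then return invalid

    items = []
    children = xml.GetChildElements()
    if children <> invalid
        for each child in children
            ' keep only list items; Trim drops the leftover space from &nbsp;
            if LCase(child.GetName()) = "li" then items.Push(child.GetText().Trim())
        end for
    end if
    return items
End Function
```

For the sample in the question this should yield a one-element array holding the privacy-policy sentence. Having each item separately lets the caller add its own bullets or line breaks instead of losing the list structure.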
speechles
Level 7

Re: How to parse HTML tags by using Brightscript?

Replace doesn't do glob patterns or grouping, does it?

So you would still need a regex to strip the HTML tags, and possibly for the grouped \r \n \t \v. You are right, though: the last step, which strips off &nbsp;, could have been a Replace.
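Replace() indeed takes no patterns, but each control character is a fixed one-character string, so a short loop of Replace() calls can stand in for the grouped regex; only the tag stripping genuinely needs roRegex. A minimal sketch, assuming the text contains real control characters (for example after ParseJson) rather than backslash text; the name RemoveControlWhitespace is hypothetical.

```brightscript
' Hypothetical helper: deletes CR, LF, horizontal tab and vertical tab
' using plain Replace calls instead of a grouped regex.
Function RemoveControlWhitespace(text As String) As String
    for each code in [13, 10, 9, 11]  ' CR, LF, horizontal tab, vertical tab
        text = text.Replace(Chr(code), "")
    end for
    return text
End Function
```

Like the regex version in the thread, this deletes the characters outright, which is fine between tags but will join two words separated only by a line break.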
Roku Employee

Re: How to parse HTML tags by using Brightscript?

I doubt the actual string would contain backslash literals; that's neither here (HTML) nor there (C source) encoding. In Roku-speak, \r\n\t would have been chr(13)+chr(10)+chr(8).
speechles
Level 7

Re: How to parse HTML tags by using Brightscript?

chr(8) is backspace. chr(9) = \t = horizontal tab and chr(11) = \v = vertical tab. You silly rabbit.

Brightscript Debugger> ? "no"+chr(8)+chr(8)+"yes"
yes
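The confusion above comes from BrightScript string literals having no backslash escapes: "\t" in source is two characters, a backslash and a t. roRegex patterns, however, are handed to PCRE, which does interpret \t, \r and so on. A minimal sketch contrasting the two (the Sub name ShowEscapeDifference is hypothetical):

```brightscript
' Hypothetical demo: literal backslash text versus a real tab character.
Sub ShowEscapeDifference()
    literalText = "a\tb"            ' no escapes in BrightScript: 4 chars, a \ t b
    actualText = "a" + Chr(9) + "b" ' 3 chars: a, horizontal tab, b
    print "lengths: "; Len(literalText); Len(actualText)

    ' The pattern text reaches PCRE unchanged, and PCRE interprets the escapes
    tabRegex = CreateObject("roRegex", "\t", "")         ' matches a real tab
    backslashRegex = CreateObject("roRegex", "\\t", "")  ' matches backslash + t
    print "tab pattern:       "; tabRegex.IsMatch(literalText); tabRegex.IsMatch(actualText)
    print "backslash pattern: "; backslashRegex.IsMatch(literalText); backslashRegex.IsMatch(actualText)
End Sub
```

The tab pattern should match only actualText and the backslash pattern only literalText. The pattern "(\\r|\\t|\\v|\\n)" used earlier in the thread is of the second kind, which is why it works on the debugger test strings; assuming the API response is decoded with ParseJson, the real control characters it produces would need a pattern of the first kind, or Chr(13), Chr(10), Chr(9) and Chr(11).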
Roku Employee

Re: How to parse HTML tags by using Brightscript?

"speechles" wrote:
chr(8) is backspace. chr(9) = \t = horizontal tab and chr(11) = \v = vertical tab. You silly rabbit.

I stand corrected.