Roku Community

belltown · ‎12-02-2016

"EnTerr" wrote:
"RokuKC" wrote:
I think it is unlikely that Roku would add support for embedded chr(0) characters in BrightScript strings.

Hm, let me correct myself - i implied the only way to allow for U+0000 is by re-implementing the String type with a length_counter field. I was wrong - turns out there is a wickedly clever way to represent \0 in UTF-8 as 0xC0 0x80, so as to avoid ever using the dreaded 0x00 octet. See Modified UTF-8.

However, if i try to do that through roByteArray, bizarre things happen:
Brightscript Debugger> ba = createObject("roByteArray"): ba.fromHexString("c080")
Brightscript Debugger> s = ba.toAsciiString(): ? len(s), s, asc(s)
 1              ??               63
What's going on here? Seems like a bug - len() is correct but the rest is whacked?

C080, or any 2-byte sequence starting with C0 or C1, is simply not a valid UTF-8 sequence. The official standard doesn't allow "overlong" encodings (representing a character with a 2-byte encoding when that character can be encoded in a single byte). BrightScript is just implementing the official standard.
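
To make the overlong-encoding point concrete, here is a minimal sketch in Python (chosen only because its strict UTF-8 decoder is easy to check; it says nothing about how BrightScript is implemented). The helper names `is_strict_utf8` and `modified_utf8_encode` are made up for this example, and the latter covers only the NUL rule of Modified UTF-8, not its handling of supplementary characters.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True if the bytes decode under standard (strict) UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def modified_utf8_encode(text: str) -> bytes:
    """Hypothetical helper: standard UTF-8, except U+0000 is written as C0 80.

    In UTF-8 the octet 0x00 only ever encodes U+0000, so a plain byte
    replacement is enough for this illustration.
    """
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


overlong_nul = bytes.fromhex("c080")

print(is_strict_utf8(overlong_nul))          # False: overlong form is rejected
print(is_strict_utf8(b"\x00"))               # True: the 1-byte NUL is valid UTF-8
print(modified_utf8_encode("a\x00b").hex())  # 61c08062
```

The output of the modified encoder never contains a 0x00 octet, which is the whole attraction for NUL-terminated storage, but a strict decoder will refuse it.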

Regarding your earlier point: "As far as i can tell the \0 arcane quirk is not even documented." -- it's in the roByteArray description in the Component Reference.

EnTerr · ‎12-02-2016

"belltown" wrote:
C080, or any 2-byte sequence starting with C0 or C1, is simply not a valid UTF-8 sequence. The official standard doesn't allow "overlong" encodings (representing a character with a 2-byte encoding when that character can be encoded in a single byte). BrightScript is just implementing the official standard.

Oh, please! Don't tell me BrightScript is "holier-than-thou Java, Android and TCL" in implementing the UTF-8 standard - that would be a ridiculous stance to take. :roll:
It's not out of purity that B/S does not use the overlong NUL for its internal representation.

"belltown" wrote:
Regarding your earlier point: "As far as i can tell the \0 arcane quirk is not even documented." -- it's in the roByteArray description in the Component Reference.

"Beware of the leopard!" - you are looking in the wrong place. NUL is a legitimate character in ASCII, Unicode and BASIC. Imagine you never knew C in your life. What's the expected outcome from the following?

for i = 0 to 10: ? len(chr(i)), : next

PS. RokuKC is right that NUL support in B/S strings is largely an academic concern... but shouldn't me and you as academic luminaries be able to get our opinions in a row? 8-)

Specifically, regarding how this thread started - with binary data - UTF-8 strings are indeed not the way to handle that. Cue the 0xFE and 0xFF octets, which can never be part of valid UTF-8.
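
A small sketch, again in Python, of why arbitrary binary data cannot be carried in a UTF-8 string, with a hex string as a text-safe alternative (the same idea as the hex string fed to roByteArray earlier in the thread). The payload bytes are invented for illustration and are not real ID3 data.

```python
# Made-up binary payload: contains NUL plus the octets 0xFE and 0xFF,
# which never occur anywhere in valid UTF-8.
payload = bytes([0x00, 0xFE, 0xFF, 0x41])

try:
    payload.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Hex is plain ASCII text, so it survives any string handling intact.
hex_text = payload.hex()
restored = bytes.fromhex(hex_text)

print(decoded_ok)           # False
print(hex_text)             # 00feff41
print(restored == payload)  # True
```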

RokuKC · ‎12-02-2016

Belltown is correct, in that at least some of the BrightScript processing enforces valid UTF-8, and would reject the C080 sequence.
You can expect enforcement to get more strict over time. 🙂

Regardless of what someone might expect, BrightScript doesn't define a character type, only a string type.
Having Chr(0) return an empty string is 'as designed'.

EnTerr · ‎12-02-2016

"RokuKC" wrote:
Having Chr(0) return an empty string is 'as designed'.

BASIC defines chr() as always returning a 1-character string, unless it raises an out-of-range exception.
Deviating from this in a single case to return an empty string adds a 3rd outcome, a singularity, a quirk. An undocumented one at that.

Can we please not label the quirks "as designed" retroactively? Feels sarcastic 😉
"As implemented" or "won't fix" ("as you were") would be more genuine.

RokuKC · ‎12-05-2016

"EnTerr" wrote:
...

BrightScript is not BASIC.

I'm not sure why you are attributing sarcasm or non-genuineness to my attempt to provide information. 😞

The BrightScript Chr() behavior of returning empty string for 0 and other non-valid codepoints is intentional and was done with forethought.

EnTerr · ‎12-05-2016

"RokuKC" wrote:
BrightScript is not BASIC.

BrightScript is a dialect of BASIC, extending it as a scripting language (dynamic types), i.e. in the general direction of VBScript and VBA.

I would like to think "because RokuAnthony wrote it" is not the sole reason we are using B/S on Roku - but rather^ because it's easy to pick up for people coming from a VB background. TRS-80 "Level II BASIC" - for which AJW wrote a simulator back in the day - is a BASIC, right? Besides the genealogical connection, it's easy to trace the BASIC roots in the B/S core functionality, incl. idiosyncrasies like the dummy argument in the pos() and upTime() functions.

To that extent, what i said was it would be desirable - >>> if practical <<< - to maintain the well-known functionality, which in the case of CHR(x) is to return a 1-char string for x in the ASCII range [0, 127].

"RokuKC" wrote:
The BrightScript Chr() behavior of returning empty string for 0 and other non-valid codepoints is intentional and was done with forethought.

Wait, we might have a misunderstanding here. In what sense is \0 a "non-valid codepoint"? It is valid ASCII, valid Unicode, valid UTF-8, valid in JSON (see the easy-to-read spec)... i think even in HTML (which points to Unicode definition of control characters, which points to "C0 control codes"). The only one that takes exception to that is XML, can't have \0 there - which is fine, as long as you do XML only.

"RokuKC" wrote:
I'm not sure why you are attributing sarcasm or non-genuineness to my attempt to provide information. 😞

My apologies if i offended you, i was just objecting to what i perceived to be a euphemism. To clarify the terms, if some behavior stems from the underlying implementation (i.e. comes "bottom up"), that is "as implemented" and not "as designed". Was i wrong in assuming chr(0) = "" comes from shoring up the fact that B/S strings are currently implemented internally as ASCIIZ?
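
For readers unfamiliar with the term, ASCIIZ means a NUL-terminated string. The following minimal Python sketch uses `ctypes` to show why such a representation cannot hold an embedded NUL. It illustrates the general mechanism only; whether BrightScript strings actually work this way internally is an assumption of this post, not something confirmed in the thread.

```python
import ctypes

# A C-style buffer holding the bytes 'a', 'b', NUL, 'c', 'd' plus the
# terminating NUL that create_string_buffer appends automatically.
buf = ctypes.create_string_buffer(b"ab\x00cd")

print(len(buf.raw))  # 6: all the bytes are physically present
print(buf.value)     # b'ab': reading as a NUL-terminated string stops early

# A string consisting of just NUL reads back as empty, which has the
# same shape as chr(0) = "".
print(ctypes.create_string_buffer(b"\x00").value)  # b''
```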

(^) reverse Hanlon's ("assume benevolence or forethought...") or me being a Pollyanna?

RokuKC · ‎12-05-2016

"EnTerr" wrote:
...

This discussion seems to be way off in the weeds at this point...

I've already commented on these points, sorry if it is not clear.

To reiterate, the Chr(0) behavior is explicitly intentional and "as designed".

BrightScript does not support embedded NUL characters in strings.

EnTerr · ‎12-05-2016

"RokuKC" wrote:
BrightScript does not support embedded NUL characters in strings.

Ok, so if the choice is clear, shouldn't that un-support be documented?
For String and roString, i imagine - otherwise there are far too many places where marshalling to/from another representation happens in which it would have to be repeated - besides roByteArray and chr() i can offhand think of ReadAsciiFile(), parseJSON(), roUrlTransfer.getToString(), roUrlEvent ...

Re: How to translate ID3 PRIV or GEOB to readable strings from HLS live stream?
