Forum Discussion
EnTerr
Roku Guru · 10 years ago
Looking for something unrelated, I ran into this: http://utf8everywhere.org/
Executive summary:
- "UTF-16 is the worst of both worlds, being both variable length^ and too wide. It exists only for historical reasons and creates a lot of confusion."
- Always use UTF-8 for the external representation of text (wire protocols, files).
- For historical reasons^^, the internal representation is more complicated. Use UTF-8 if you can, but beware that the API might be dictated by the host platform (e.g. Java, .NET, Qt).

(^) e.g. the Koala emoji U+1F428 will be represented as 2 UTF-16 "characters" (0xD83D 0xDC28, or 0x3DD8 0x28DC depending on byte-sexuality, yet another PITA) and string length() will return 2. Asking for the 1st or 2nd character will return a high or low surrogate half, .reverse() creates an "invalid" string, sorting is not lexicographic... things are not rosy.

(^^) Under the (wrong) early belief that all Unicode characters would fit in 16 bits, the early adopters of Unicode - the Qt framework (1992), Windows NT 3.1 (1993) and Java (1995) - started using a 2-byte encoding, UCS-2. A couple of years later, the dream of a fixed-width encoding was shattered.
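The koala example is easy to verify. Here is a minimal Python 3 sketch (Python strings count code points rather than UTF-16 units, so the surrogate pair only shows up once you encode):

```python
# U+1F428 KOALA lies outside the Basic Multilingual Plane,
# so UTF-16 must split it into a surrogate pair.
koala = "\U0001F428"

# One Unicode code point...
assert len(koala) == 1

# ...but two UTF-16 code units: high surrogate 0xD83D, low surrogate 0xDC28.
# Big-endian encoding makes the unit boundaries visible in the hex dump.
utf16_be = koala.encode("utf-16-be")
assert utf16_be.hex() == "d83ddc28"   # 2 units x 2 bytes = 4 bytes

# Little-endian swaps the bytes within each unit (the 0x3DD8 0x28DC form).
assert koala.encode("utf-16-le").hex() == "3dd828dc"

# UTF-8 takes 4 bytes here too, but with no surrogate machinery.
assert koala.encode("utf-8").hex() == "f09f90a8"
```

A UTF-16-based language (Java, JavaScript) would report length 2 for this same string, which is exactly the complaint in footnote (^).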