Forum Discussion
EnTerr
10 years ago
Roku Guru
Don't use UTF-16 (or its fixed-width predecessor UCS-2) for file encoding; that's outdated technology from the 1980s-1990s. The same goes for UTF-32/UCS-4.
With UTF-8
- You don't have to care about endianness, as you do with UTF-16BE vs UTF-16LE.
- There is no need for a BOM either.
- And there is no need to declare it with <?xml ... encoding="..."?> either - UTF-8 is what XML parsers assume by default.
- Your files will be smaller (~60% of the UTF-16 size even for worst-case HTML, e.g. Hindi/Japanese, where the text itself takes 3 bytes per character in UTF-8 but the markup is 1-byte ASCII).
- It's fully backwards compatible with ASCII.
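The points in the list above are easy to check for yourself. Here is a minimal sketch in Python (my choice for illustration, not something Roku-specific); the helper name `encoded_sizes` and the sample strings are made up for the example.

```python
import codecs


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, not counting any BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical samples: pure ASCII markup, non-Latin text, and a mix of both.
samples = {
    "ascii markup": "<p>Hello</p>",
    "japanese text": "こんにちは",
    "mixed html": '<p class="greeting">こんにちは</p>',
}

for name, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{name}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")

# Endianness: UTF-16 has two byte orders that produce different bytes,
# while UTF-8 has exactly one byte sequence for a given string.
assert "A".encode("utf-16-le") != "A".encode("utf-16-be")

# BOM: Python's generic "utf-16" codec prepends one so readers can detect
# the byte order; the "utf-8" codec adds none.
assert "A".encode("utf-16")[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
assert not "A".encode("utf-8").startswith(codecs.BOM_UTF8)

# ASCII compatibility: ASCII text is byte-for-byte identical in UTF-8.
assert "<p>Hello</p>".encode("utf-8") == "<p>Hello</p>".encode("ascii")
```

The "mixed html" sample is the interesting one: the Japanese characters cost more in UTF-8, but the ASCII markup around them costs half as much, and real pages tend to carry a lot of markup.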
Use UTF-8 and you are golden.
PS. Apparently >86% of the web uses UTF-8 and <0.1% uses UTF-16. HTML5 seems to have a strong opinion on encodings, saying "Authors should use UTF-8. Conformance checkers may advise authors against using legacy encodings. Authoring tools should default to using UTF-8 for newly-created documents." In particular, it mandates that if <meta charset=...> or <meta http-equiv="Content-Type" content="text/html; charset=..."> is used, the value there MUST be UTF-8.
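If you already have files saved as UTF-16, re-encoding them is short work. A rough sketch in Python, assuming the source file starts with a BOM (as most editors write it); the function name `reencode_to_utf8` and the file names in the usage line are hypothetical.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, src_encoding="utf-16"):
    """Decode src with src_encoding and write the same text to dst as UTF-8.

    Works on raw bytes so line endings are left exactly as they were.
    The "utf-16" codec reads the BOM to pick the byte order and strips it;
    the output is written without a BOM.
    """
    text = Path(src).read_bytes().decode(src_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


# Usage (hypothetical file names):
# reencode_to_utf8("feed_utf16.xml", "feed_utf8.xml")
```

If the file is XML and carries an explicit encoding="UTF-16" declaration, that declaration would also need to be edited or dropped after converting; this sketch does not touch the file's contents.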