Roku Developer Program

quartern
Level 7

Does roByteArray:FromBase64String() not like UTF8?

Hi

I'm trying to use some data that can be retrieved online as base64-encoded JSON.
I'm a little confused because yesterday a version of this seemed to work and now it does not,
so I took a snapshot of the data that doesn't work and stored it as a gist (to have a repeatable test).

I know I won't be able to render the international characters and will live with that (for now),
but the base64 decode seems to have some issue (is it the non-ASCII characters, or the overall size? I don't know).

Using an online base64 test tool (https://www.base64decode.org/) and a JSON validation tool (https://jsonformatter.curiousconcept.com/),
I surmise that the data itself is actually fine. I even took a decoded version
and served it up to this Roku test (bypassing the base64 decode),
and ParseJSON() didn't seem to have any issues with the international characters.


Function test()
    request = CreateObject("roUrlTransfer")
    request.SetCertificatesFile("common:/certs/ca-bundle.crt")
    request.AddHeader("X-Roku-Reserved-Dev-Id", "")
    request.InitClientCertificates()
    request.SetUrl("https://gist.githubusercontent.com/quartern/c4bb656c005ded4b3c3bead99dd5b71a/raw/e35ae64009457d050e24cbdf0ebf22fc8d4aecd8/gistfile1.txt")

    tmpfile="tmp:/quartern_resp.txt"
    request.GetToFile(tmpfile)
    ba = CreateObject("roByteArray")
    if not ba.ReadFile(tmpfile)
        print "ReadFile failed"
    else if ba.count() < 1
        print "Empty response"
    else
        ' we know we get these with quotes - strip them
        txt=ba.ToAsciiString()
        print "B4 STRIP >>>"+txt.left(10)+"..."+txt.right(10)+"<<<"
        ba.pop()
        ba.shift()
        txt=ba.ToAsciiString()
        print "AFTER STRIP >>>"+txt.left(10)+"..."+txt.right(10)+"<<<, count=",ba.count()
        ' decode and parse
        ba.FromBase64String(ba.ToAsciiString())
        txt=ba.ToAsciiString()
        print "AFTER DECODE >>>"+txt.left(10)+"..."+txt.right(10)+"<<<, count=",ba.count()
        theData=ParseJSON(ba.ToAsciiString())
    endif

    DeleteFile(tmpfile)
    theData.x=1 ' just force debugger on fail
    return theData
end Function




Here is the relevant output in the debugger:
B4 STRIP >>>"eyJ3Y2ZJc...lNiI6W119"<<<
AFTER STRIP >>>eyJ3Y2ZJcC...xlNiI6W119<<<, count= 271071
AFTER DECODE >>>{"wcfIp":[...??? ??? ??<<<, count= 203301
BRIGHTSCRIPT: ERROR: ParseJSON: Unterminated string: ...


The size ratio seems reasonable, but the end of that decoded string should be something like
,"Table6":[]}

so the question marks do not make sense

Are there any size or character limitations on the base64 encoded data?

Thanks
--QuarterN
Private apps: IsraTV (replaces IsraIBA, IsraNews2, IsraI24, Isra10, Isra20)
Users - to report issues with the app (not content of streams please) send me a tweet - @quartern_roku and follow (so we can DM)
quartern
Level 7

Re: Does roByteArray:FromBase64String() not like UTF8?

I think I found it.

The input included a few backslash characters, each escaping a forward slash (forward slash is a valid base64 character;
why does the server escape it within a string? Maybe a JavaScript thing, IDK):
15k\/INek16jXpyAxIiwibWVkaWFfY29kZSI6Ijg4OTIzNyIsIm1lZGlhX2Rlc2MiOiLXkNeZ16TXlCD


It would be great if the base64 decoder printed the value and offset of the character on which it choked.

The only way for me to locate this was to implement/port a base64 decoder (included here for "fun"):

' based on https://en.wikibooks.org/wiki/Algorithm_Implementation/Miscellaneous/Base64
Function Decode64FromFile(fname as String) as String
    ' Marker values used in the lookup table (real base64 values are 0-63)
    WHITESPACE_CHR=64
    EQUALS_CHR=65
    INVALID_CHR=66

    ' Lookup table indexed by input byte value (0-255)
    d = [
        66,66,66,66,66,66,66,66,66,66,64,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
        66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,62,66,66,66,63,52,53,
        54,55,56,57,58,59,60,61,66,66,66,65,66,66,66, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
        10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,66,66,66,66,66,66,26,27,28,
        29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,66,66,
        66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
        66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
        66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
        66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
        66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,
        66,66,66,66,66,66
    ]

    iter = 0      ' number of 6-bit groups currently held in buf
    buf = 0
    currLen = 0   ' decoded length so far (informational only)

    baIn = CreateObject("roByteArray")
    baOut = CreateObject("roByteArray")
    baIn.ReadFile(fname)

    inCount=0
    for each inByte in baIn
        c = d[inByte]
        inCount+=1

        if c=INVALID_CHR
            ' report the offending byte and skip it instead of aborting
            print "Invalid character"; inByte; " at position"; inCount-1
        else if c=WHITESPACE_CHR
            ' ignore
            print "ignore whitespace"
        else if c=EQUALS_CHR
            ' padding marks the end of the data
            exit for
        else
            buf = (buf<<6) or c
            iter+=1 ' increment the number of iterations
            ' If the buffer is full, split it into bytes
            if iter = 4
                currLen += 3
                baOut.push((buf >> 16) and 255)
                baOut.push((buf >> 8) and 255)
                baOut.push(buf and 255)
                buf = 0
                iter = 0
            end if
        end if
    end for

    ' flush a trailing partial group (3 chars -> 2 bytes, 2 chars -> 1 byte)
    if iter = 3
        currLen += 2
        baOut.push((buf >> 10) and 255)
        baOut.push((buf >> 2) and 255)
    else if iter = 2
        currLen += 1
        baOut.push((buf >> 4) and 255)
    end if

    return baOut.ToAsciiString()
end function
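
If the goal is only to find where the built-in decoder would choke, a full decoder may not be needed. Below is a minimal sketch of a scanner that reports the first byte outside the standard base64 alphabet. FindNonBase64Byte and DemoFindNonBase64Byte are hypothetical names made up for this example, not Roku APIs, and the sketch assumes the standard alphabet (A-Z, a-z, 0-9, +, /) with = padding. Unlike the port above, it also flags line feeds.

```brightscript
' Hypothetical helper (not a Roku API): returns the zero-based offset of the
' first byte that is not a standard base64 character, or -1 if there is none.
function FindNonBase64Byte(ba as Object) as Integer
    ' Guard the empty case explicitly rather than rely on loop semantics
    if ba.Count() = 0 then return -1

    for i = 0 to ba.Count() - 1
        b = ba[i]
        isUpper = (b >= 65 and b <= 90)           ' A-Z
        isLower = (b >= 97 and b <= 122)          ' a-z
        isDigit = (b >= 48 and b <= 57)           ' 0-9
        isOther = (b = 43 or b = 47 or b = 61)    ' + / =
        if not (isUpper or isLower or isDigit or isOther) then return i
    end for

    return -1
end function

' Usage sketch with the start of the fragment quoted earlier in this post.
' BrightScript string literals have no backslash escapes, so "\/" is two characters.
sub DemoFindNonBase64Byte()
    ba = CreateObject("roByteArray")
    ba.FromAsciiString("15k\/INek")
    print FindNonBase64Byte(ba) ' the backslash sits at offset 3
end sub
```

Running this over the stripped response before calling FromBase64String() would give the offset that the built-in decoder does not report.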
Arthy74
Level 7

Re: Does roByteArray:FromBase64String() not like UTF8?

amazing, works better than the builtin FromBase64String

Thank you so much !
EnTerr
Level 8

Re: Does roByteArray:FromBase64String() not like UTF8?

@quartern -
this is a VERY interesting puzzle you ran into!

You were using PHP^, yes? Its json_encode() escapes slashes.

It actually took a "double whammy" for this to happen - your second mistake was not calling ParseJson() on the downloaded JSON but instead using .shift() & .pop() shenanigans to get rid of the quotes. That left the \/ inside, something the parser would have taken care of.

That's why i am amused by the case, it's a "twofer" that popped up in a completely unrelated place (base64 decoding).

(^) Friends don't let friends use PHP!
On the other hand, were i in this trade, i would actively encourage my competition to use PHP and MySql...
quartern
Level 7

Re: Does roByteArray:FromBase64String() not like UTF8?

"Arthy74" wrote:
amazing, works better than the builtin FromBase64String

Thank you so much !


You're welcome - it's good for debugging - but for me it was not faster (though I only timed a modified version that does it in memory, not via a file; I should re-test with the above)
quartern
Level 7

Re: Does roByteArray:FromBase64String() not like UTF8?

"EnTerr" wrote:
You were using PHP^, yes?


I can just about spell PHP - server is not mine

"EnTerr" wrote:
It actually took a "double whammy" for this to happen - your second mistake was not calling ParseJson()


My errors were strictly in the base64 decode. Unless ParseJson() also does a base64 decode, I'm not sure that your comments apply -
I only stripped the quotes with shift & pop (granted, maybe not the cleanest way) because the base64 decode was choking on them. I could have tried Eval() but that seemed risky.

Regardless of what encoded the quotes and backslashes - the intent was for them to be evaluated by JS in a browser - so it's understandable why they would be there
EnTerr
Level 8

Re: Does roByteArray:FromBase64String() not like UTF8?

"quartern" wrote:
My errors were strictly in the base64 decode. Unless ParseJson() also does a base64 decode, I'm not sure that your comments apply -
I only stripped the quotes with shift & pop (granted, maybe not the cleanest way) because the base64 decode was choking on them. I could have tried Eval() but that seemed risky.

Your mistake was not using ParseJson() first on something that is JSON. If you had JSON-decoded the string, not only the quotes but also the backslash escapes inside would have been taken care of - and there would be no problem later with the base64 decode. FromBase64String() is justified in puking on \. And of course you wouldn't use Eval(), eww gross - not when there is ParseJson().

"quartern" wrote:
Regardless of what encoded the quotes and backslashes - the intent was for them to be evaluated by JS in a browser - so it's understandable why they would be there

JavaScript has no issue with / in strings. / does not need to be (although it might be) escaped. It works either way, just like you can say "\u0037" instead of "7".
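
A minimal sketch of the approach described above: JSON-decode the downloaded body first, then base64-decode the result. DecodeQuotedBase64 is a hypothetical helper name, not a Roku API. Wrapping the body in [ ] is an assumption made here so that the parser sees a JSON array rather than a bare quoted string, in case ParseJson() is strict about top-level values. ToAsciiString() is used only because the code earlier in the thread uses it.

```brightscript
' Hypothetical helper (not a Roku API).
' body: the raw response text, i.e. a quoted JSON string such as "eyJ3Y2Z...\/..."
' Returns the parsed inner JSON, or invalid if any step fails.
function DecodeQuotedBase64(body as String) as Object
    ' Let the JSON parser strip the surrounding quotes and turn \/ back into /
    wrapped = ParseJson("[" + body + "]")
    if wrapped = invalid then return invalid
    if wrapped.Count() <> 1 then return invalid

    encoded = wrapped[0]
    if encoded = invalid then return invalid
    if GetInterface(encoded, "ifString") = invalid then return invalid

    ' The string is now plain base64, so the built-in decoder can be used
    ba = CreateObject("roByteArray")
    ba.FromBase64String(encoded)

    ' The decoded bytes are themselves JSON in this thread's data
    return ParseJson(ba.ToAsciiString())
end function
```

With this shape there is no need for the shift/pop step, and the escaped slashes never reach FromBase64String().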