GPF
Visitor
07-16-2012
11:02 PM
PNG Encoder for Roku Proof of Concept
https://github.com/GPF/pngEncodeRoku
I ported a J2ME MIDP 1.0 PNG encoder (http://www.chrfr.de/software/midp_png.html, by Christian Fröschlin) to run on the Roku. It takes an RGBA ByteArray and returns a PNG ByteArray. It uses the LibRokuDev library.
It still needs some profiling and performance improvements, as it is not very speedy. I tested it by loading a JPG, calling GetByteArray on the loaded bitmap, and passing the result to toPNG(width,height,pixels), and I was able to display the PNG as a bitmap. The current code loads a static ByteArray of the Roku forums logo, encodes it, and displays it on the Roku screen.
Thanks,
Troy
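For readers who want to see what an "RGBA bytes in, PNG bytes out" function has to produce, here is a minimal sketch. It is written in Python rather than BrightScript so it can lean on the standard zlib module, and it is not the code from the repository: the names to_png and make_chunk are hypothetical, and zlib.compress stands in for the encoder's own deflate code.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """A PNG chunk: 4-byte length, 4-byte type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def to_png(width: int, height: int, pixels: bytes) -> bytes:
    """Encode 8-bit RGBA pixels (4 bytes per pixel, row-major) as a PNG."""
    if len(pixels) != width * height * 4:
        raise ValueError("pixels must hold width * height * 4 bytes")
    stride = width * 4
    # Every scanline is prefixed with a filter-type byte; 0 means "no filter".
    raw = b"".join(
        b"\x00" + pixels[y * stride:(y + 1) * stride] for y in range(height)
    )
    # IHDR: width, height, bit depth 8, color type 6 (RGBA),
    # compression 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"IDAT", zlib.compress(raw))
        + make_chunk(b"IEND", b"")
    )
```

The two expensive steps on the Roku are the ones zlib does here in native code: the deflate wrapper around the scanlines and the CRC-32 over each chunk.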
20 REPLIES
MSGreg
Visitor
07-17-2012
06:22 AM
Re: PNG Encoder for Roku Proof of Concept
+1
Works great! Thank you for this.
Takes about 20-30 seconds on a 237x276 image.
I started looking into writing a roByteArray as a PNG and was at the stage of needing a deflate encoder (not needed for compression, just wrapping a raw encoding).
Hopefully the speed will increase soon. Let us know if that happens. 🙂
Thanks for the time and effort to put this together!
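The "deflate encoder that only wraps raw data" mentioned above can be sketched as follows. This is a Python illustration under stated assumptions, not code from the project: zlib_store and adler32 are hypothetical names, and the output is a zlib stream made only of stored (uncompressed) deflate blocks, each holding at most 65535 bytes.

```python
import struct
import zlib

def adler32(data: bytes) -> int:
    """Adler-32 checksum, which ends every zlib stream."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % 65521
        b = (b + a) % 65521
    return (b << 16) | a

def zlib_store(data: bytes) -> bytes:
    """Wrap data in a zlib stream using only stored deflate blocks."""
    out = bytearray(b"\x78\x01")  # zlib header: deflate, 32K window, no dictionary
    pos = 0
    while True:
        block = data[pos:pos + 65535]
        pos += len(block)
        final = 1 if pos >= len(data) else 0
        out.append(final)  # bit 0 = BFINAL, bits 1-2 = 00 (stored block)
        # LEN and its one's complement NLEN, both little-endian.
        out += struct.pack("<HH", len(block), len(block) ^ 0xFFFF)
        out += block
        if final:
            break
    out += struct.pack(">I", adler32(data))  # checksum is big-endian
    return bytes(out)
```

The result costs 5 bytes per block plus 6 bytes for the header and checksum, and any standard inflater can read it, for example zlib.decompress(zlib_store(b"hello")).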
MSGreg
Visitor
07-18-2012
02:49 AM
Re: PNG Encoder for Roku Proof of Concept
Quick profiling: I know the headers aren't going to mean much without my specific code, but below is the breakdown, in milliseconds, of each section. Some are subsets of others. Letters A, B, etc. are in toPNG, tC is toChunk, and CDC is createDataChunk.
Significant time is in updateCRC (tc5), and a significant portion of that is rdRightShift and rdXOR, mostly rdRightShift.
Is there any way to improve the following function for a logical right shift? It looks like the main use in updateCRC is a shift by 8, so an optimization for count = 8 might suffice.
' ********************************************
' * Right logical (non-sign-extending) shift *
' ********************************************
function rdRightShift(num as integer, count = 1 as integer) as integer
    mult = 2 ^ count
    summand = 1
    total = 0
    for i = count to 31
        if (num and summand * mult)
            total = total + summand
        end if
        summand = summand * 2
    end for
    return total
end function
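To make clear what the loop above computes, here is a small Python model of it next to a one-step equivalent. This is only a sketch: the function names are hypothetical, and Python integers are unbounded, so the 32-bit view is imposed with an explicit mask rather than by the language.

```python
def right_shift_loop(num: int, count: int = 1) -> int:
    """Bit-by-bit model of the loop: copy bit i of num down to bit i - count."""
    total = 0
    for i in range(count, 32):
        if num & (1 << i):
            total += 1 << (i - count)
    return total

def right_shift_direct(num: int, count: int = 1) -> int:
    """Same result in one step: view num as unsigned 32-bit, then shift."""
    return (num & 0xFFFFFFFF) >> count
```

The loop runs 32 - count iterations per call, and updateCRC calls it once per input byte, which is consistent with the shift dominating the timings above. For -1 shifted by 8, both functions give &h00FFFFFF, whereas a sign-extending shift would give -1 again.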
Pulling the XOR inline saves about 3 seconds out of the 20 spent on updateCRC:
for n = 0 to lastn
    index = ((c and not buf[n]) or (not c and buf[n])) and &hFF
    'index = rdXOR(c, buf[n]) and &hFF
    shiftedc = rdRightShift(c, 8)
    c = ((shiftedc and not t[index]) or (not shiftedc and t[index]))
    'c = rdXOR(t[index], shiftedc)
end for
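The inlined expression works because (a and not b) or (not a and b) is XOR spelled with the three bitwise operators the loop uses. The Python sketch below runs the same table-driven loop with that identity and checks it against zlib's CRC-32. The names are hypothetical, and the table uses the standard reflected CRC-32 polynomial &hEDB88320 that PNG specifies.

```python
import zlib

def xor_via_and_or_not(a: int, b: int) -> int:
    """XOR built from AND, OR and NOT, as in the inlined loop."""
    return (a & ~b) | (~a & b)

def make_crc_table() -> list:
    """256-entry lookup table for the reflected CRC-32 polynomial."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

CRC_TABLE = make_crc_table()

def update_crc(crc: int, buf: bytes) -> int:
    """One table lookup, one shift by 8 and two XORs per input byte."""
    c = crc
    for byte in buf:
        index = xor_via_and_or_not(c, byte) & 0xFF
        c = xor_via_and_or_not(CRC_TABLE[index], c >> 8)
    return c

def crc32(buf: bytes) -> int:
    """CRC-32 as used by PNG: start at all ones, invert at the end."""
    return update_crc(0xFFFFFFFF, buf) ^ 0xFFFFFFFF
```

Since the loop body runs once per byte (261924 times for the image timed above), every function call removed from it is multiplied by that count.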
Start Convert
Time tC1: 1
Time tC2: 0
Time tC3: 0
Time tC4: 0
Time xor1: 0
Time shift 1
Time xor2: 0
Time xor1: 0
Time shift 0
Time xor2: 0
Time tC5: 5
Time tC6: 1
Time A: 8
Time CDC1: 1594
raw.count = 261924
Time tC1: 1
Time tC2: 0
Time tC3: 0
Time tC4: 4
Time xor1: 0
Time shift 0
Time xor2: 0
Time xor1: 3299
Time shift 14592
Time xor2: 2869
Time tC5: 20766
Time tC6: 3
Time B: 27484
Time tC1: 0
Time tC2: 0
Time tC3: 0
Time tC4: 0
Time xor1: 1
Time shift 0
Time xor2: 1
Time xor1: 0
Time shift 0
Time xor2: 0
Time tC5: 4
Time tC6: 0
Time C: 6
Time D: 7
PNG Count = 262032
That's all I have time for. I will review again when I actually need this function.
GPF
Visitor
07-18-2012
09:43 AM
Re: PNG Encoder for Roku Proof of Concept
Yeah, I was just looking into this last night. I haven't done any timing on it, though, and I don't know how often negative numbers get right shifted.
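function rdRightShift(num as integer, count = 1 as integer) as integer
    ' Sgn() returns -1, 0 or 1, so test for a negative value explicitly
    if Sgn(num) < 0 then
        ' Negative: rebuild the result bit by bit so the sign bit is not extended
        mult = 2 ^ count
        summand = 1
        total = 0
        for i = count to 31
            if (num and summand * mult)
                total = total + summand
            end if
            summand = summand * 2
        end for
    else
        ' Non-negative: a plain division already behaves like a logical shift
        total=num/(2^count)
    end if
    return total
end function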
A couple of other changes I was looking into:
In createDataChunk, instead of looping through the whole bitmap to add the PNG scanline filter byte (0) to the start of each scanline, insert it with a regex over the hex string. I'm not sure if this is faster or not.
hxstr=pixels.toHexString()
regfind="(.{"+(width*8).toStr()+"})"
r = CreateObject("roRegex",regfind , "")
hxstr=r.ReplaceAll(hxstr, "\100")
hxstr="00"+hxstr.left(hxstr.len()-2)
raw.FromHexString(hxstr)
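The hex-string trick above appends "00" after every row of width*8 hex digits and then moves the trailing "00" to the front, so each row ends up prefixed by its filter byte. Here is a Python sketch of the same idea next to a plain slicing version for comparison. The function names are hypothetical, and the replacement uses Python's \g<1> form to keep the group reference separate from the literal "00".

```python
import re

def add_filter_bytes_hex(pixels: bytes, width: int) -> bytes:
    """Insert a 0 filter byte before each RGBA scanline via a hex string."""
    hx = pixels.hex()
    row_hex_len = width * 8  # 4 bytes per pixel, 2 hex digits per byte
    # Append "00" after every row...
    hx = re.sub("(.{%d})" % row_hex_len, r"\g<1>00", hx)
    # ...then move the last "00" to the front so each row is prefixed instead.
    hx = "00" + hx[:-2]
    return bytes.fromhex(hx)

def add_filter_bytes_slices(pixels: bytes, width: int) -> bytes:
    """Reference version: slice out each row and prefix it with a 0 byte."""
    stride = width * 4
    return b"".join(
        b"\x00" + pixels[i:i + stride] for i in range(0, len(pixels), stride)
    )
```

Note that the hex detour doubles the size of the data while it is being processed, so whether it wins depends on how fast the regex engine is compared with an interpreted loop.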
In rdByteArray.brs I also experimented with different ways to copy a subset of a ByteArray to a new ByteArray.
Copy via a substring
sraw=source.toHexString()
strraw=sraw.Mid( (index_start*2), (index_end-index_start)*2)
dest.FromHexString(strraw)
Copy via a temp file
tmp = rdTempFile(".byt")
print "Byte File "+tmp
source.writefile(tmp,index_start,(index_end-index_start))
dest.readfile(tmp)
DeleteFile(tmp)
Another possibility is to create a palette chunk from the bitmap data if the number of colors is less than 256; the actual IDAT chunk would then only need to hold palette indexes, which would reduce the amount of data processed by the crc32/adler32/zlib deflate code. I'm also looking into breaking the pixel data up into multiple IDAT chunks to see if that is any faster, but it would make the PNG even larger because of the additional bookkeeping data.
Thanks,
Troy
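The palette idea can be sketched like this in Python. The function names are hypothetical and this is an illustration, not part of the encoder. It relies on two facts from the PNG format: a PLTE chunk holds up to 256 RGB entries, and per-entry alpha goes in a separate tRNS chunk.

```python
def build_palette(pixels: bytes):
    """Return (colors, indexes) for RGBA pixels, or None if over 256 colors."""
    palette = {}  # RGBA bytes -> palette index, in first-seen order
    indexes = bytearray()
    for i in range(0, len(pixels), 4):
        color = bytes(pixels[i:i + 4])
        if color not in palette:
            if len(palette) == 256:
                return None  # too many colors for an indexed PNG
            palette[color] = len(palette)
        indexes.append(palette[color])
    return list(palette), bytes(indexes)

def plte_and_trns(colors: list):
    """Chunk payloads: PLTE holds RGB triples, tRNS holds one alpha per entry."""
    plte = b"".join(color[:3] for color in colors)
    trns = bytes(color[3] for color in colors)
    return plte, trns
```

With an 8-bit palette, each scanline shrinks from 4 bytes per pixel to 1, so the CRC, Adler-32 and deflate passes see roughly a quarter of the data, at the cost of one extra pass over the pixels to build the palette.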
renojim
Community Streaming Expert
07-18-2012
05:08 PM
Re: PNG Encoder for Roku Proof of Concept
I can speed up the updateCRC function a little:
updateCRC: function(crc as integer, buf as object) as integer
    c% = crc
    for n = 0 to buf.count() - 1
        index% = ((c% and not buf[n]) or (not c% and buf[n])) and &hFF
        shiftedc% = (c% and &hFFFFFF00)/256
        shiftedc% = shiftedc% and &hFFFFFF
        c% = ((shiftedc% and not m.CRCTABLE[index%]) or (not shiftedc% and m.CRCTABLE[index%]))
    end for
    return c%
end function
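The two shiftedc% lines replace the bit loop with a mask, a division and another mask. Below is a Python model of why that equals a logical shift by 8 on signed 32-bit values. It is a sketch with hypothetical names, and to_signed32 only exists to imitate 32-bit signed integers in Python.

```python
def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def shift_right_8(c: int) -> int:
    """Logical right shift by 8 of a signed 32-bit value, without a shift operator."""
    masked = to_signed32(c & 0xFFFFFF00)  # clear the low byte: division is now exact
    divided = masked // 256               # moves bits down, but keeps the sign
    return divided & 0x00FFFFFF           # discard the sign-extended top byte
```

Clearing the low byte first means the division never has to round, so it does not matter how the language rounds negative quotients. The final mask is what turns the sign-preserving division into a logical shift.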
By the way, the PNG encoding is about 4x faster on a Roku 2 over a Roku 1.
It would be nice if there was a built-in function to save a region to a PNG or JPEG.
-JT
Roku Community Streaming Expert
Help others find this answer and click "Accept as Solution."
If you appreciate my answer, maybe give me a Kudo.
I am not a Roku employee.
MSGreg
Visitor
07-18-2012
07:51 PM
Re: PNG Encoder for Roku Proof of Concept
Nice, renojim!
I was wondering how much time could be saved by removing the boxing/unboxing (using the %), so thanks for that! I modified it slightly to remove the AA lookup and the repeated calls to Count (though I'm not sure whether BrightScript re-evaluates the "TO" expression of a for loop on every iteration; some languages do, some don't).
GPF, the frequency of negative numbers seems to be about 30-50%, and I suspect that testing "< 0" rather than calling Sgn() would perform much better.
The updateCRC function is now about 6x faster. This is the code:
updateCRC: function(crc as integer, buf as object) as integer
    c% = crc
    lastn = buf.count() - 1
    t = m.CRCTABLE
    for n = 0 to lastn
        index% = ((c% and not buf[n]) or (not c% and buf[n])) and &hFF
        shiftedc% = (c% and &hFFFFFF00)/256
        shiftedc% = shiftedc% and &hFFFFFF
        c% = ((shiftedc% and not t[index%]) or (not shiftedc% and t[index%]))
    end for
    return c%
end function
Setting Alpha to 252
Done Setting Alpha
Start Convert
Time tC1: 0
Time tC2: 0
Time tC3: 1
Time tC4: 0
Time tC5: 2
Time tC6: 1
Time A: 5
Time CDC1: 1604
raw.count = 261924
Time tC1: 0
Time tC2: 0
Time tC3: 0
Time tC4: 4
Time tC5: 3295
Time tC6: 3
Time B: 10186
Time tC1: 0
Time tC2: 0
Time tC3: 0
Time tC4: 0
Time tC5: 2
Time tC6: 0
Time C: 4
Time D: 8
PNG Count = 262032
MSGreg
Visitor
07-18-2012
08:54 PM
Re: PNG Encoder for Roku Proof of Concept
CDC1 sped up by a few tenths of a second (about 30%) in pngRoku.brs:
raw = CreateObject("roByteArray") ' highest index of raw will be dest%+4*width
'raw.setresize(4*(width*(eheight-sheight)) + (eheight-sheight), false)
'print "4*sheight*width = "+(4*sheight*width).toStr()
end% = eheight -1
for y = sheight to end%
    raw[dest%] = 0 ' No filter
    dest%=dest%+1
    mult% = 4*sheight*width
    wend% = dest%+4*width-1
    source% = source% + mult%
    for dest% = dest% to wend%
        raw[dest%] = pixels[source%] 'red
        'dest%=dest%+1
        source%=source%+1
    end for
end for
print "Time CDC1: ";timer.TotalMilliseconds() : timer.Mark()
MSGreg
Visitor
07-18-2012
09:18 PM
Re: PNG Encoder for Roku Proof of Concept
One more data point:
Before improvements below,
CDC1: 1.5 seconds
B: 10.2 seconds
tc5: 3.2 seconds
Timer B is timing in pngRoku.brs:
data = createDataChunk(width, startheight,endheight, pixels,true)',false)
Timer tc5 is timing in pngRoku.brs in toChunk():
crc = rdCRC().updateCRC(crc,bnid)
crc = rdCRC().updateCRC(crc,traw)
My experience is that indexing into a roByteArray is fairly quick and probably does not need to be reviewed for further optimization unless you can get rid of the array entirely. What DOES make sense is arranging it so that it doesn't need to be reallocated more than once. Reallocating an array could cause a copy if the new size doesn't fit in the heap space available after the old array.
In Zlibdeflate.brs, inside writeUncompressDeflateBlock, there are more calls to rdRightShift with a shift of 8. Replacing those gives good results.
About a 3x overall improvement. I think I would need another order of magnitude for this to be really useful. I'm looking for 0.2 to MAYBE 1 second total for a 256x256 block. I wish there were >> and >>> operators for arithmetic and logical right shift.
Done Setting Alpha
Start Convert
Time tC1: 0
Time tC2: 0
Time tC3: 1
Time tC4: 0
Time tC5: 2
Time tC6: 1
Time A: 5
Time CDC1: 1039
raw.count = 261924
Time tC1: 0
Time tC2: 1
Time tC3: 0
Time tC4: 4
Time tC5: 2886
Time tC6: 2
Time B: 8631
Time tC1: 1
Time tC2: 0
Time tC3: 0
Time tC4: 0
Time tC5: 2
Time tC6: 0
Time C: 5
Time D: 8
PNG Count = 262032
I haven't looked in detail, but if a lot of data is being copied unnecessarily, then that might be a further improvement. Another option to avoid copying is to pass in source and destination indexes and do the copying while writing in the first place. I'm guessing maybe 50% to 70% potential improvement, maybe more. Good work so far. I'm done on this today.
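The two suggestions in this post, sizing the output once and copying by index straight into it, can be sketched in Python as follows. The names copy_into and write_scanlines are hypothetical and the code only illustrates the pattern; it says nothing about how fast the equivalent BrightScript would be.

```python
def copy_into(dest: bytearray, dest_index: int, source: bytes,
              start: int, end: int) -> int:
    """Copy source[start:end] into dest at dest_index; return the next free index."""
    count = end - start
    # memoryview slicing avoids building a temporary bytes object for the row.
    dest[dest_index:dest_index + count] = memoryview(source)[start:end]
    return dest_index + count

def write_scanlines(pixels: bytes, width: int, height: int) -> bytearray:
    """Build filter-byte-prefixed RGBA scanlines in a buffer allocated once."""
    stride = width * 4
    raw = bytearray(height * (stride + 1))  # final size known up front, all zeros
    dest = 0
    for y in range(height):
        dest += 1  # this row's filter byte is already 0 (no filter)
        dest = copy_into(raw, dest, pixels, y * stride, (y + 1) * stride)
    return raw
```

Because the buffer never grows, nothing is reallocated or recopied, and each pixel byte is moved exactly once.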
destruk
Streaming Star
07-18-2012
09:58 PM
Re: PNG Encoder for Roku Proof of Concept
I still don't understand how this would be useful, unless the Roku is sending the resulting PNG file back to an external server. Wouldn't it be much quicker to supply an image in a format the Roku can already display? Or was the point of this simply an exercise, in which case can you have it convert a PNG file to, say, a TIF file for me, just to prove it can be done?

TheEndless
Channel Surfer
07-18-2012
10:30 PM
Re: PNG Encoder for Roku Proof of Concept
"destruk" wrote:
I still don't understand how this would be useful, unless the Roku is sending the resulting PNG file back to an external server. Wouldn't it be much quicker to supply an image in a format the Roku can already display? Or was the point of this simply an exercise, in which case can you have it convert a PNG file to, say, a TIF file for me, just to prove it can be done?
Performance aside, one of the primary benefits of something like this would be dynamic image creation for use on the built-in screens. As an example, consider a sports channel that has 30 teams. It would be a horribly tedious and space consuming task to create separate images for every single possible matchup. If you could dynamically build a matchup poster based on the individual team logos, then it could provide much more flexibility.
Take it even further, and say you were developing a game and wanted to provide functionality to allow players to upload screenshots of their high scores to various social media sites... or maybe provide the ability to design an avatar in game, and upload it to an associated website... or even capture screens for upload to the server in particular error scenarios...
I can think of countless uses for the ability to save bitmaps to PNGs/JPGs...
My Channels: http://roku.permanence.com - Twitter: @TheEndlessDev
Instant Watch Browser (NetflixIWB), Aquarium Screensaver (AQUARIUM), Clever Clocks Screensaver (CLEVERCLOCKS), iTunes Podcasts (ITPC), My Channels (MYCHANNELS)