EnTerr
Roku Guru
09-27-2016
04:35 PM
roAssociativeArray - why so tragically slow?!
When i wrote this viewtopic.php?f=34&t=96515, something bothered me about the example - the hash dictionary access seemed unusually slow. And so i went back to re-test, but the result was the same. Here is a slightly simplified example:
BrightScript Debugger> N = 100000: ti = createObject("roTimeSpan")
BrightScript Debugger> d = {}: for i = 1 to N: d[i.toStr()] = i: next
BrightScript Debugger> ti.mark(): for i = 1 to N: value = d["42"]: next: ? ti.totalMilliSeconds()
52855
So it takes 53 seconds to fetch a value 100,000 times - that makes roughly 0.5 ms per access on a Roku3. Which is a HUGE number. Massively slow. By a seat-of-the-pants estimate that should have been... eh, somewhere within 1-10us on said player - in other words 50x-500x faster.
What's going on, am i doing something wrong?
I can't imagine that's normal. No hash function is that slow.
What am i missing?
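For a sense of what order of magnitude a hash lookup normally costs, here is a rough analogue of the same measurement in Python (illustrative only, not from the original post; Python's built-in dict is a hash table, and timings will vary by machine):

```python
import time

N = 100_000
# build a 100K-entry hash table with string keys, as in the BrightScript test
d = {str(i): i for i in range(1, N + 1)}

start = time.perf_counter()
for _ in range(N):
    value = d["42"]  # repeatedly look up one key
elapsed = time.perf_counter() - start

# On typical hardware the whole loop finishes in a handful of milliseconds,
# i.e. well under a microsecond per lookup - nowhere near 0.5 ms each.
print(f"{elapsed * 1e6 / N:.3f} us per lookup")
```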
RokuKC
Roku Employee
09-27-2016
05:19 PM
Re: roAssociativeArray - why so tragically slow?!
"EnTerr" wrote:
...BrightScript Debugger> N = 100000: ti = createObject("roTimeSpan")
...
What's going on, am i doing something wrong?
BrightScript wasn't implemented to support AAs containing 100K items.
Nor even 1K, for that matter. 😉
EnTerr
Roku Guru
09-28-2016
10:55 AM
Re: roAssociativeArray - why so tragically slow?!
"RokuKC" wrote:
BrightScript wasn't implemented to support AAs containing 100K items.
Nor even 1K, for that matter. :wink:
But... it's an associative array? :cry: ^
Ok, give it to me straight, doc - when do we get S.O.L. with roAssociativeArray? Up to what size does it stay sub-linear; what's the "Big O"?
(^) Cue Azrael's amazement from "Dogma": "But I'm a <censored> demon?"
Lots of implementations are available (incl. perks like memory efficiency, order preservation, blah-blah) - all of them with amortized O(1) access, regardless of size?
Am i too spoiled for taking this for granted?
Am i the only developer who considers BrightScript a decent general-purpose (scripting) language?
And if maintaining B/S is too hard, shouldn't we just switch to a small, fast, portable, well-maintained language like Lua?
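Since the question above is why hash dictionaries can promise amortized O(1) access at any size, here is a minimal illustrative sketch in Python (not Roku's implementation) of the standard technique: separate chaining plus doubling the bucket array when the load factor crosses a threshold, which keeps the average chain length bounded:

```python
class TinyHashMap:
    """Minimal separate-chaining hash map. Resizing keeps chains short,
    so lookups stay amortized O(1) regardless of item count."""

    def __init__(self):
        self._buckets = [[] for _ in range(8)]
        self._count = 0

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # overwrite existing key
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self._count += 1
        if self._count > 2 * len(self._buckets):  # load factor > 2
            self._grow()

    def __getitem__(self, key):
        for k, v in self._bucket(key):   # scan a short chain, not the whole map
            if k == key:
                return v
        raise KeyError(key)

    def _grow(self):
        # double the bucket array and rehash; cost is amortized over inserts
        old = self._buckets
        self._buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for k, v in bucket:
                self._buckets[hash(k) % len(self._buckets)].append((k, v))


m = TinyHashMap()
for i in range(10_000):
    m[str(i)] = i
print(m["42"])  # prints 42; chain length stays bounded even at 10K entries
```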
EnTerr
Roku Guru
10-04-2016
12:57 AM
Re: roAssociativeArray - why so tragically slow?!
So i did some experimenting today (on a #4200) to see for what dictionary sizes B/S behaves reasonably - and compared it to another language as a baseline. Sizes tested were consecutive powers of 2, and B/S flew off the handle so fast that to fit all the points i had to use a chart with logarithmic scales:
[chart: log-log plot of per-lookup time (us) vs. dictionary size, BrightScript vs. Lua]
Horizontal axis is the size of the AA. Vertical is the time in microseconds to index a value (i.e. aa[key]), averaged over a 2^16-sample block. So we see that past 500-1000 it turns linear (or super-linear?), doubling the per-lookup time with each doubling of .count(). That is, it starts behaving O(N) - where a hash dictionary should stay O(1). In a way i sugar-coated it here by using a log-log chart: the "hockey stick" looks much worse/steeper if the scale were linear.
Conclusion: use roAssociativeArray preferably with <300 elements. Past 1000, roAA turns into molasses/tarpit.
Let's zoom in on the lower values, where roAA is usable; here is a linear (lin-lin) graph:
[chart: lin-lin plot of per-lookup time (us) for small dictionary sizes, BrightScript vs. Lua]
For small dictionaries, B/S is usable - but it would be a stretch to say it was "optimized" for small sizes, since Lua does dictionary access 7x faster even when B/S is at its best. Not sure why such a big difference. It can't be the case-insensitivity, since it could just always hash the lower-cased key and then resolve collisions case-sensitively.
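The "hash the lower-cased key" idea from the paragraph above can be sketched like this (a hypothetical illustration in Python, not how BrightScript actually implements roAssociativeArray): case-folding adds only one string operation per access, so case-insensitive lookup stays O(1):

```python
class CaseInsensitiveDict:
    """Hash on the lower-cased key; a lookup costs one extra .lower(),
    not a scan of the whole table."""

    def __init__(self):
        self._data = {}  # folded key -> (original key, value)

    def __setitem__(self, key, value):
        self._data[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._data[key.lower()][1]

    def __contains__(self, key):
        return key.lower() in self._data


d = CaseInsensitiveDict()
d["Hello"] = 1
print(d["HELLO"])  # prints 1: same bucket for any casing of the key
```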
I am including the code i used to test, so someone can check for flaws in my approach - as well as the data, so that one can draw their own charts and conclusions:
[spoiler=gory details:39dt7gim]Brightscript code and output:
for k = 1 to 16
    N = 2^k
    n_loops = 2^16
    dict = {}
    for i = 0 to N-1
        dict[i.toStr()] = 7
    next
    keys = []
    for i = 1 to n_loops
        keys.push((i mod N).toStr())
    next
    ti = createObject("roTimeSpan")
    sum = 0
    RunGarbageCollector()
    ti.mark()
    for each key in keys
        sum += dict[key]
    next
    time = 1e3 * ti.totalMilliSeconds() / n_loops   ' microseconds per lookup
    RunGarbageCollector()
    ti.mark()
    for each key in keys
        sum += 7
    next
    ref_time = 1e3 * ti.totalMilliSeconds() / n_loops   ' loop overhead, to subtract
    print N, time, time - ref_time
next
2 1.40381 0.640869
4 1.34277 0.564575
8 1.34277 0.579834
16 1.38855 0.62561
32 1.46484 0.686646
64 1.4801 0.701904
128 1.5564 0.778198
256 1.75476 0.976562
512 2.13623 1.32751
1024 2.91443 2.15149
2048 4.98962 4.22668
4096 10.6049 9.82666
8192 22.7814 22.0184
16384 53.5126 52.7496
32768 127.472 126.694
65536 327.148 326.385
Lua code and output:
local socket = require("socket")  -- LuaSocket, for socket.gettime()

for k = 1, 16 do
    local N = 2^k
    local n_loops = 2^16
    local dict = {}
    for i = 0, N-1 do
        dict[tostring(i)] = 7
    end
    local keys = {}
    for i = 1, n_loops do
        keys[i] = tostring(i % N)
    end
    local sum = 0
    collectgarbage('stop') -- just in case, to avoid GC peaks
    local start = socket.gettime()
    for _, key in pairs(keys) do
        sum = sum + dict[key]
    end
    local time = 1e6 * (socket.gettime() - start) / n_loops
    start = socket.gettime()
    for _, key in pairs(keys) do
        sum = sum + 7
    end
    local ref_time = 1e6 * (socket.gettime() - start) / n_loops
    collectgarbage('restart')
    print(N, time, time - ref_time)
end
2 0.640869985 0.076295692
4 0.656127668 0.106811058
8 0.640869985 0.076295692
16 0.656127668 0.106811058
32 0.671388989 0.122076017
64 0.640869985 0.091553375
128 0.656127668 0.091549737
256 0.701904355 0.152587745
512 0.717165676 0.183110387
1024 0.717165676 0.167849066
2048 0.762942364 0.213625754
4096 0.808715413 0.198364432
8192 0.930784154 0.381467544
16384 0.991822162 0.442505552
32768 1.068114216 0.503539923
65536 1.15966759 0.579831976
[/spoiler:39dt7gim]


RokuKC
Roku Employee
10-04-2016
10:16 AM
Re: roAssociativeArray - why so tragically slow?!
"EnTerr" wrote:
So i did some experimenting today (on a #4200) to see for what dictionary sizes does B/S behave reasonably - and compare to another as a baseline.
Impressive work! 8-) Where do you find the time...
I assume you are comparing with lua under Marmalade?
Stress testing and micro-performance analysis are interesting, but generally not the gating factor on usability for Roku apps, in my experience.
That said, performance enhancement request noted. Implementation details are always subject to change. 😉
EnTerr
Roku Guru
10-17-2016
04:57 PM
Re: roAssociativeArray - why so tragically slow?!
"RokuKC" wrote:
Impressive work! 8-) Where do you find the time...
Since there was no further info, i figured i could test on my own.
Like they say, "If the mountain won't come to Muhammad then Muhammad must go to the mountain." 🙂
I had started doubting my sanity since i always assumed that hash dictionaries have O(1) access time. Everywhere, always. So maybe i was wrong all along? I wanted to see on a practical example that at least some language keeps constant access time as dictionary size increases - and so Lua did, B/S did not. Good, i was not the crazy one.
Perhaps there should be a note in TFM to avoid using roAA with .count() > 1000, no?
I assume you are comparing with lua under Marmalade?
Indeed. Since i am
But there is nothing special about the Lua they have used - in fact it's very garden-variety, off-the-rack Lua 5.1.4, which is circa 2008, per the distro list. An 8-year-old interpreter - no JIT, no native-code juicing - yet it works quite well, doesn't it? But the best part might be the small memory footprint and its design for embedding...