From native code to browser: Flash, Haxe, Dart or asm.js?
November 17, 2014
If you developed your own video codec and wanted to watch the video in a browser,
what would you do? That is a question we faced a few years ago with
ScreenPressor, and at that time
the answer was Flash. It was cross-platform, cross-browser, widely available
and pretty fast if you used the right programming language, i.e. Haxe instead of ActionScript.
So we implemented a decoder
(and a player) in Flash back then.
But now Flash is clearly on the decline and supported only on desktop, meanwhile different browsers
So I decided it's time to check how JS compares to Flash when it comes to
a computation-intensive task such as a video codec. I really don't like the idea
use something that at least gets closures, objects and modules right and has
some static type checking. Since I already had working code in C++ for native code
and in Haxe for Flash, obvious choices were using Emscripten to generate asm.js from
C++ code and retargeting Haxe code to JS (just another target for Haxe compiler).
Also, Dart is pretty close as a language (porting to Dart is simpler than rewriting in
some Haskell or Lisp clone), and the Dart VM is marketed as a faster and better replacement
for JS engines, so I was curious to try it.
To test and compare the different languages and compilers, I decided to implement
in them a small part of the codec, the most CPU-intensive one:
decompression of a key frame to RGB24. I'm going to show the results first and then
follow with some notes on each language.
Here are the times for decompressing one particular 960x540 px frame on a laptop
with a 2.4 GHz Core i3 CPU and Windows 8.1:
Or in text:
Follow the links to run the benchmarks for yourself.
And here is the mobile side of the story, times for the same operation in ms:
| A tablet with 1.3 GHz CPU on Android 4.2.2
| || Firefox 33.1 || Chrome 39
|Haxe to JS || 250 || 289
|asm.js || 197 || 330
| A phone with 1.4 GHz CPU on Android 2.3.5, Firefox 33.0
|Flash || 403
|Haxe to JS || 313
|asm.js || 296
Another curious comparison:
|Compilation time, in seconds:
|Haxe to JS || 0.18
|Haxe to Flash || 0.13
|dart2js || 10.55
|Emscripten || 3.44
By the way, the picture used in the test demonstrates lossless compression from 960x540 = 518400
pixels = 1555200 bytes of RGB24 data to 149321 bytes, i.e. ~10x.
I couldn't reach a similar size with PNG even with special tools, and JPG at this size shows
Although people keep repeating "Flash is slow", it's actually pretty fast: 1.67 times slower than
native C++ here, and in some other tests of mine only 20% slower. That's comparable to
Java, C# or Go, and faster than most other languages/VMs. Of course, only if you use Haxe; with
ActionScript it can easily be twice as slow.
faster than Flash and
be just 1.3 - 2 times slower than native C++ code, which is really impressive, taking into account
its lack of static types and of plain integer values (every number is a double there). Internet
Explorer is somewhat behind; Flash is still the fastest option there.
Among the tested languages none is a clear winner in all browsers; each browser has its own
favorite. For example, asm.js is really great in Firefox, but only because Firefox has a special
ahead-of-time compiler that turns on for this code.
Other browsers treat asm.js code as ordinary JS.
Haxe is really great in the speed of the generated code, its size, and the speed of code generation
itself. A lesson to future compiler makers: if you want your compiler to be really fast,
use OCaml, not Java! Haxe is more complex than Dart language-wise, having a more sophisticated
type system, real type inference and some macros, and yet it translates
freaking 60 times faster (0.18 s vs. 10.55 s).
Mobile web apps: slooow. Even when CPUs have only a 2x lower clock frequency, due to ARM vs. Intel
differences they turn out to be 5-6 times slower. But who knows, they may catch up in a couple of years.
Some notes on particular languages / targets:
Haxe targeting Flash
If you need compact code that works consistently fast in all desktop browsers, Flash compiled
from Haxe is a really nice option. It's compact because it's distributed as bytecode.
In Flash 10 they added special instructions for fast direct memory access. These instructions
were not available in ActionScript, but Alchemy (a C++ to Flash compiler) and Haxe can use them.
In Haxe they are available via the flash.Memory API and work on a single byte array: you first
select an array to be this fast piece of memory and then use functions like Memory.getI32() and
Memory.setI32() to access it. This works faster than ordinary arrays, but if you need to access many
different arrays, you have to manually allocate them inside the selected one and use indices with offsets.
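To make the offset bookkeeping concrete, here is a minimal TypeScript sketch of the idea behind a single selected memory block: one byte buffer, 32-bit reads and writes by byte address, and two logical arrays placed inside it at hand-computed offsets. The class FlatMemory and its bump allocator are hypothetical illustrations, not the Haxe flash.Memory API; only the getI32/setI32 names echo the functions mentioned above, and little-endian byte order is an assumption of the sketch.

```typescript
// Hypothetical stand-in for one "selected" fast memory block.
class FlatMemory {
  private readonly view: DataView;
  private top = 0; // next free byte offset (simple bump allocator)

  constructor(sizeInBytes: number) {
    this.view = new DataView(new ArrayBuffer(sizeInBytes));
  }

  // Reserve `bytes` bytes, aligned to 4, and return the base address.
  alloc(bytes: number): number {
    const base = (this.top + 3) & ~3;
    if (base + bytes > this.view.byteLength) {
      throw new RangeError("out of memory");
    }
    this.top = base + bytes;
    return base;
  }

  getI32(addr: number): number {
    return this.view.getInt32(addr, true); // little-endian (assumption)
  }

  setI32(addr: number, value: number): void {
    this.view.setInt32(addr, value, true);
  }
}

// Two logical Int32 arrays living side by side in the same memory block.
const mem = new FlatMemory(1024);
const elemCount = 16;
const srcBase = mem.alloc(elemCount * 4);
const dstBase = mem.alloc(elemCount * 4);

for (let i = 0; i < elemCount; i++) {
  mem.setI32(srcBase + i * 4, i * i); // element i of "src"
}
for (let i = 0; i < elemCount; i++) {
  const v = mem.getI32(srcBase + i * 4);
  mem.setI32(dstBase + i * 4, v + 1); // element i of "dst"
}
```

Every access turns into `base + index * 4`; keeping that arithmetic right by hand is the price paid for the faster memory access.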
Also, Haxe generally
optimizes code much better than ActionScript; this, combined with the special memory API, gives very
fast code compared to AS3.
As you could see above, the Haxe compiler generates Flash incredibly fast, while the default
ActionScript compiler (again, written in Java) is significantly slower.
Haxe targeting JS
Haxe is a multi-target language; however, each target has some specific APIs, and there are
also some semantic differences. I had a fully functioning ScreenPressor decoder in Haxe
for Flash, but making a good JS version of it turned out harder than I expected. After
changing the API from Flash to JS (first of all, moving from flash.Memory to JS typed arrays)
I got a working JS version, but it took more than 170 ms in Chrome to decode that frame.
I knew it was too slow: by that time I already had a JS version generated from Dart,
and it worked ~3x faster. The slowdown was caused by UInts. In ScreenPressor we use a range coder,
a variant of arithmetic coding, and in our original C++ code it operates on 32-bit unsigned
integers, doing some arithmetic and bit shifts.
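Haxe has a proper type for them, UInt, and in Flash it works perfectly fine. In JS, however, every bitwise
operation, like a shift or bitwise-or, turns its operands and result into a signed 32-bit int value
(still stored as a double); the unsigned shift >>> is the only exception, producing an unsigned result.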
That means 0xFF00 << 16 becomes a negative number. In order to keep UInts working,
every time we use some UInt in an arithmetic expression, Haxe inserts a comparison with 0
and an addition of 4294967296.0 in case its JS value is negative. Comparisons with positive
constants, for some reason, turn into calls to in-place lambda functions containing
some odd constant comparisons. All these things make arithmetic with UInts very slow, hence
the 3x slowdown of our code. Interestingly, while Haxe keeps UInts 32-bit, it doesn't preserve
the 32-bitness of Ints (signed integers) in JS, so they allow values like 0xFFFFFFFF (which would
become -1 in other Haxe targets). But again, any bitwise operation on them can turn them back
into a 32-bit signed int. So, when targeting JS from Haxe, if you want UInts, i.e.
values 0 .. 0xFFFFFFFF, you can use Ints as long as you don't use shifts or bitwise logical
operations. Just replace "<< 8" with "* 256" and "a | b" with "a + b" (where a and b have no set bits
in common) and it works well and fast. Changing our UInts to Ints and doing these transformations
gave us simple and fast generated JS code.
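The TypeScript sketch below (TypeScript compiles to plain JS, so the number semantics are the same) shows both the problem and the workaround. uintFixup mimics the kind of sign correction described above; it is an illustration, not Haxe's actual generated code. The two packing functions are hypothetical examples of the "<< 8" to "* 256" and "|" to "+" rewrite.

```typescript
// In JS every number is a double, but << and | work on signed 32-bit ints.
const shifted = 0xFF00 << 16; // -16777216, not 4278190080

// Roughly the correction needed to keep a value unsigned:
// if the 32-bit result came out negative, add 2^32.
function uintFixup(x: number): number {
  return x < 0 ? x + 4294967296.0 : x;
}

// Packing four bytes with shifts and bitwise-or: the top byte can flip the sign.
function packWithBitOps(b3: number, b2: number, b1: number, b0: number): number {
  return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
}

// The same packing with * and +: the result stays an exact non-negative double
// (all intermediate values are far below 2^53). Using + instead of | is valid
// here only because the four bytes occupy non-overlapping bits.
function packWithArithmetic(b3: number, b2: number, b1: number, b0: number): number {
  return ((b3 * 256 + b2) * 256 + b1) * 256 + b0;
}
```

With all four bytes set to 0xFF, the bitwise version yields -1 while the arithmetic version yields 4294967295, with no fix-up code needed.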
Another note: the JS code generated by Haxe actually failed to work in IE, complaining that it cannot
cast HTMLInputElement to HTMLInputElement. After I manually edited the generated code to
skip this cast, it worked fine.
Dart
Dart is a nice language (although its generics are too limited; someone ate too much Java
for breakfast). It comes as a package with Dartium (a version of Chromium with the Dart VM on board)
and DartEditor, which does static analysis as you
type, infers types and shows lots of useful info in auto-completion pop-ups. Its API,
Porting our Haxe code to Dart went smoothly. The surprise came when I tried dart2js
and ran some benchmarks: code in the Dart VM worked ~3x slower than the same code translated to JS
with dart2js! And the reason was again our use of integers. There is only one integer type
in Dart: int. Semantically it's an unbounded integer. Internally, in the 32-bit Dart VM, values
that fit into 31 bits are called "smi" ("small int") and are stored directly in a 32-bit word. Larger
values are stored as boxed 64-bit integers, and even larger ones use full-blown BigInts.
When we write something like "x = 0xFFFFFFFF", it's a positive number with 32 "1"s in binary, so you
need at least 33 bits to store it as a signed integer. So it doesn't fit in a "smi" and
gets boxed. And although our original code only needs 32-bit unsigned integers, in the 32-bit Dart
VM many of them turn into boxed 64-bit values, and this causes the slowdown. When translated to
to Vyacheslav "mr_aleph" Egorov, a Dart VM group insider, for explaining this. He showed that
in the 64-bit Dart VM my code runs 26% faster than the generated JS. However, there is no 64-bit
Dartium available for Windows yet.
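A quick way to see why 0xFFFFFFFF falls out of the small-integer range: the TypeScript sketch below computes how many bits a value needs as a two's-complement signed integer and compares that with the 31-bit smi payload of the 32-bit Dart VM described above. The helper names are hypothetical, and the sketch models only the size argument, not the VM's actual tagging code.

```typescript
// Minimal two's-complement width (in bits) needed to hold integer v.
function signedBitsNeeded(v: number): number {
  if (!isFinite(v) || v !== Math.floor(v)) throw new RangeError("not an integer");
  let bits = 1; // the sign bit alone covers -1 and 0
  // Grow until v fits in [-2^(bits-1), 2^(bits-1) - 1].
  while (v < -(2 ** (bits - 1)) || v > 2 ** (bits - 1) - 1) bits++;
  return bits;
}

// Per the description above: on a 32-bit VM a smi holds a 31-bit signed value.
const SMI_BITS = 31;

function fitsInSmi(v: number): boolean {
  return signedBitsNeeded(v) <= SMI_BITS;
}
```

So `signedBitsNeeded(0xFFFFFFFF)` is 33, and any value of 2^30 or more already misses the smi range and would be boxed.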
There is, however, one obvious thing that makes dart2js generate less efficient code than Haxe:
each array access is preceded by an explicit bounds check and a call to a Dart-specific
function in case of a bounds error. Since the JS engine performs its own bounds check on the access
anyway, this doubles the number of bounds checks and bloats the code, hence the slowdown compared
to the cleaner Haxe-generated code.
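The difference in code shape can be sketched like this in TypeScript. throwIndexError is a hypothetical helper standing in for the Dart-specific error function; real dart2js output differs in naming and details. The point is that the explicit check in getChecked comes on top of the check the JS engine already performs on every array read.

```typescript
// Hypothetical stand-in for a language-specific "index out of range" helper.
function throwIndexError(arr: ArrayLike<number>, i: number): never {
  throw new RangeError(`index ${i} out of range 0..${arr.length - 1}`);
}

// Shape of dart2js-style access: an explicit check plus a call on failure,
// followed by the engine's own implicit bounds check on arr[i].
function getChecked(arr: Int32Array, i: number): number {
  if (i >>> 0 >= arr.length) throwIndexError(arr, i); // also catches negative i
  return arr[i];
}

// Shape of plain access, as in the Haxe-generated code: only the engine's check
// remains (an out-of-range read of a typed array simply yields undefined).
function getPlain(arr: Int32Array, i: number): number {
  return arr[i];
}
```

In a hot decoding loop the first shape roughly doubles the work per access and adds a call site per access, which also inflates code size.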
Emscripten and asm.js
Emscripten uses clang/LLVM to generate asm.js code from C++. Hearing about clang and LLVM
made me instinctively expect difficulties using them on Windows, and I was ready to reboot into Linux
for this experiment; however, it turned out that on Windows installing Emscripten is the easiest:
just a one-click installer, and everything works out of the box.
Much like Flash with its fast memory access inside a single array, asm.js works with one fixed-size array
that serves as the main memory (heap) for all your C++ code.
You can't just pass two JS arrays into
your C++ code; you need to manually allocate memory in this asm.js heap and copy the data there.
After the C++ code has finished working on them, you read the results back from the array that serves as the asm.js heap.
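The copy-in, run, copy-out workflow can be sketched in TypeScript with a plain ArrayBuffer standing in for the asm.js heap. Emscripten's generated glue exposes comparable pieces (such as `_malloc`, `_free` and the `HEAPU8` view), but everything below, including heapAlloc and the invertBytes stand-in for a compiled C++ function, is a hypothetical simulation so that it runs without Emscripten.

```typescript
// One fixed-size ArrayBuffer plays the role of the asm.js heap.
const HEAP_SIZE = 1 << 16;
const heapU8 = new Uint8Array(new ArrayBuffer(HEAP_SIZE));
let heapTop = 8; // bump allocator; address 0 is kept unused, nothing is freed

function heapAlloc(bytes: number): number {
  const ptr = (heapTop + 7) & ~7; // 8-byte alignment
  if (ptr + bytes > HEAP_SIZE) throw new RangeError("heap exhausted");
  heapTop = ptr + bytes;
  return ptr;
}

// Stand-in for a compiled C++ function: it sees only the heap and integer
// "pointers" into it, never the caller's JS arrays.
function invertBytes(srcPtr: number, dstPtr: number, len: number): void {
  for (let i = 0; i < len; i++) {
    heapU8[dstPtr + i] = 255 - heapU8[srcPtr + i];
  }
}

// Caller side: copy the input in, run the "C++" code, copy the result out.
function runOnHeap(input: Uint8Array): Uint8Array {
  const srcPtr = heapAlloc(input.length);
  const dstPtr = heapAlloc(input.length);
  heapU8.set(input, srcPtr); // copy in
  invertBytes(srcPtr, dstPtr, input.length);
  return heapU8.slice(dstPtr, dstPtr + input.length); // copy out (slice copies)
}
```

The two copies are pure overhead compared with code that works on JS arrays directly, which is worth keeping in mind for large frames.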
Porting our C++ code to asm.js was easy, since it was just pure algorithms and computations; no
external libraries were used. I don't know exactly how Emscripten handles the uints question,
but everything worked well and pretty fast without me having to worry about it or having to
turn uints into ints. The size of the generated code was, as expected, the largest: ~500 KB without
minification and ~200 KB with it. Since the JS code is generated from LLVM bitcode, it's rather
hard to trace it back to the source manually, so I didn't even try.
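For comparison with the Haxe UInt fix-ups, one standard way to keep 32-bit unsigned values in JS without any correction code is to store them as signed int32 and reinterpret them with `>>> 0` only where the sign matters, such as in comparisons; this is how asm.js's type system expresses unsigned values. Whether Emscripten emitted exactly this for our decoder wasn't checked, so treat the TypeScript sketch below (hypothetical function names) as an illustration of the technique only.

```typescript
// Unsigned 32-bit values stored as signed int32 and reinterpreted with >>> 0
// only where the sign matters (a general technique, not captured Emscripten output).

// Addition wraps modulo 2^32; "| 0" keeps the result an int32.
function addU32(a: number, b: number): number {
  return (a + b) | 0;
}

// Unsigned comparison: reinterpret both int32 bit patterns as 0 .. 2^32 - 1.
function ltU32(a: number, b: number): boolean {
  return (a >>> 0) < (b >>> 0);
}

// Convert the stored int32 to its unsigned value, e.g. for output.
function toU32(a: number): number {
  return a >>> 0;
}
```

No branch on the sign is ever needed: the bit pattern is always correct, and only its interpretation changes at the few places where it matters.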
As for speed, as mentioned above, it only works really fast in Firefox, where asm.js code
gets special treatment. In other browsers it's no faster than the much shorter code generated by
the other compilers, and sometimes (as in IE) significantly slower.
Given the numbers above and the preconditions mentioned, I think it's pretty obvious what choice
we're going to make: Haxe seems the best option for us for making the ScreenPressor decoder in JS.