Tracing the Web

We have landed!

For the last two months I have been working with Mozilla on a just-in-time compiler for the JavaScript engine in Firefox, and a few hours ago this project (codenamed TraceMonkey) has landed in the main Firefox development tree.

TraceMonkey is a trace-based JIT compiler and it pushes the envelope on JavaScript performance. On average, we speed up Apple’s popular SunSpider benchmarks by factor 4.6 over the last release of Firefox. The overall runtime of SunSpider improved by about 1.83x (parts of SunSpider excercise things like the regular expression engine which is outside of the scope of JIT compilation, hence the lesser overall speedup). For the SunSpider ubench suite, which focuses on core JavaScript language features, we achieve a speedup of 22x. Whichever metric you chose to apply, Firefox now has the fastest JavaScript engine in the world.

TraceMonkey Performance relative to Firefox 3.0

TraceMonkey Performance relative to Firefox 3.0

Mike Schroepfer put together a demo showing the real-world performance impact of TraceMonkey. You should also check out Brendan Eich’s and Mike Shaver’s blog post about TraceMonkey, as well as David Anderson’s updates on our 64-bit x86 port (yes, we do 64-bit!).

Dynamic Compilation with Traces

Traditional just-in-time compilers (like Sun’s Hotspot VM) are in their design and structure very similar to static compilers (like GCC). They observe which methods get executed frequently, and translate the method into native machine code once a certaint threshold has been reached. While such methods often contain performance-critical parts (such as loops), they often also contain slow paths and non-loopy code, which barely if at all contributes to the runtime of the method. A whole-method compiler, however, has to always analyze and translate the entire method, even if parts of it are not particularly “compilation-worthy”.

Trace-based compilation takes a very different approach. We monitor the interpretation of bytecode instruction by the virtual machine and scan for frequently taken backwards branches, which are an indicator for loops in the underlying program. Once we identify such a loop start point, we follow the interpreter as it executes the program and record the sequence of bytecode instructions that get executed along the way. Since we start at a loop header, the interpreter will eventually return to this entry point once it completed an iteration through the loop. The resulting linear sequence of instructions is what we call a trace.

Traces represent a single iteration through a loop, and can span multiple methods and program modules. If a function is invoked from inside a loop, we follow the function call and inline the instructions executed inside the called method. Function calls themselves are never actually recorded. We merely verify at runtime that the same conditions that caused that function to be activated still hold.

Trace Trees and Nested Trace Trees

TraceMonkey uses a particular flavor of trace-based compilation which I described in my dissertation: Trace Trees. Loops often consist of more than a single performance-relevant path, and Trace Trees allow to capture all of these and organize them in a tree-shaped data structure which can be compiled trace-by-trace yet produces a globally optimized result for the entire loop. To deal with nested loop, these trees can also be nested inside of each other, with outer loop trees calling the inner loop tree.

Control-Flow Graph representation of a loop with a nested condition.

Trace Tree for the code shown int he Control-Flow Graph. Traces are recorded starting at the loop header (A) and connect back to A after completing an iteration.

A particular advantage of Trace Trees is the fact that they always represent a loop and thus enter function frame and leave function frame operations are always balanced as long we stay on trace. Thus, we can actually completely optimize away the overhead of function calls. As long we stay on trace (which in case of a loop we usually do for many iterations), we don’t construct and destroy function frames. Instead, we simply execute the inlined trace we recorded. Function frames for inlined calls are only constructed should we detect that we have to leave the trace (for example because we reached the end of the loop).

Type Specialization

Trace Trees by their very nature are the result of a control-flow speculation. We speculate that loops tend to execute the same sequence of instructions over and over, which is usually true for many applications. In TraceMonkey we go a step further and also speculate on types.

JavaScript in contrast to Java or C/C++ is a dynamically typed language. Variables are declared by name only, and their type will be determined automatically once a value is assigned to them. Assigning values with different types to a variable changes the type of the variable on the fly to match the new value’s type. Executing such dynamically typed code has been traditionally fairly expensive. Type specialization eliminates much of this overhead.

Mike Shaver ran some benchmarks, comparing the performance of simple loops written in JavaScript and C. Our JIT generates code that is roughly equivalent to the performance of unoptimized C code (gcc -O0). We achieve this through aggressive type speculation. Whenever we see a program assign only integers to a variable, for example, we specialize the generated machine code to hold that variable in an integer machine register. Guards in the traces ensure that the type doesn’t unexpectedly change, in which case we leave the trace and let the interpreter handle this (unexpected and often infrequent) case.

Type specialization removes much of the principal overhead associated with dynamically typed languages, and as we further improve our JIT we expect to get fairly close to the performance of statically typed languages such as Java or C.

Traces Everywhere

Our work on TraceMonkey was done in close collaboration with Adobe’s Tamarin Tracing project. In fact, TraceMonkey and Tamarin Tracing share the same core tracing backend (nanojit), which was contributed by Adobe. Adobe has been criticized in the last few month for the slow performance of Tamarin Tracing on untyped JavaScript code. However, Tamarin Tracing is first and foremost a JIT compiler for ActionScript, a typed JavaScript dialect. While Tamarin Tracing does run untyped code, its not particularly optimized (yet) for this task.

TraceMonkey shows the full potential of Adobe’s nanojit backend when combined with a VM that was specifically designed and optimized for untyped JavaScript code (SpiderMonkey), and we expect much of our work to make its way into nanojit and Tamarin Tracing.

People

TraceMonkey was a tremendous group effort of a large group of extremely talented people. Much of the recent advances in the area of nested trees, aggressive type speculation and runtime type inference are based on work done by graduate students at our research group at UC Irvine (Michael Bebenita, Mason Chang, Marcelo Contra, Gregor Wagner and others). Our research was generously funded by a grant from the National Science Foundation (Principal Investigator Professor Michael Franz, Program Director Dr. Helen Gill) as well as grants and donations from Microsoft, Sun Microsystems, Intel, and last but not least Mozilla.

For me, it has been an amazing opportunity to spend the last two month here at Mozilla, turning our research ideas into actual product code. Its hard to describe what it feels like to work alongside people like Brendan Eich, the inventor of JavaScript, or Mike Shaver, Mozilla’s new VP Engineering and life-long JavaScript VM veteran. And even interns around here are rocket scientists. David Anderson, one of Mozilla’s interns, wrote a complete 64-bit backend for us over the summer, making TraceMonkey the first JavaScript JIT capable of targeting x86-64.

TraceMonkey was developed in close collaboration with Edwin Smith and his Tamarin Tracing team at Adobe, and the web at large owes Adobe a great deal of gratitude for open-sourcing the Tamarin and Tamarin Tracing VMs, allowing Mozilla to build TraceMonkey on top of Tamarin Tracing’s nanojit backend. nanojit is a small and highly efficient trace-based just-in-time compiler backend that is language agnostic and highly portable, and I think it has a bright future. It has just landed in Firefox, and hopefully we will see it pop up in a future release of Adobe’s Flash Player soon.

The Road Ahead

Landing in the central Firefox repository was a big step for us, but there is also definitively a lot of work ahead of us. We are now at the point where we trace a lot of code in benchmarks and on the web, but there is a lot more coverage we will add over time.

Also, we are far away from having exhausted all the potential of trace compilation and we plan to add many features and optimizations over the next few month. Our current speedups are just the beginning of whats possible:

  • Improve register allocation and code generation in nanojit.
  • Runtime analysis of builtins (machine code) to reduce spill overhead of builtin calls (Gregor Wagner from UCI did some work on this recently.)
  • Bring performance of the ARM backend up to par with x86 and x86-64 backends and add a PowerPC backend (joint work with Adobe).
  • Add tree-recompilation and parallel compilation (based on our prior work on Parallel Dynamic Compilation, Mohammad Haghighat from Intel has been looking into this for nanojit).
  • Add more advanced trace optimization techniques like Tree Folding, Load Propagation and Escape Analysis.

Our goal is to eventually close the performance gap between JavaScript and traditional desktop languages, and we believe that for many applications this will be possible.

In parallel to our work with Mozilla on JavaScript performance, we also have a number of exciting tracing-related projects going on at UC Irvine. Mason Chang, one of our graduate students, is currently working with Adobe on the Tamarin Tracing VM, adding context threading and trace visualisation. Michael Bebenita from UCI is currently interning with Sun Microssystems and has been making great progress integrating our Java trace compiler into Maxine, and we plan on switching to Maxine as our main research platform for Java compilation. Alexander Yermolovich (also UC Irvine) is working with Adobe this summer on an exciting project involving fast execution of rich dynamic content that Adobe will hopefully announce to the public soon.

If you are interested in their work, check out their blogs (linked from my website). For further reading material on traces and trace compilation you can also take a log at my earlier blog posts on this topic.

Update: Mason Chang did some benchmarks comparing TraceMonkey to Apple’s WebKit/SquirrelFish VM. Looks like we are on average 2.5x faster than SquirrelFish (about 15% faster on total runtime).

I am looking for a tenure-track faculty position for Fall 2009 to continue my research on virtual machines, dynamic compilation and type-safe languages.

This entry was posted in Trace Compilation. Bookmark the permalink.

51 Responses to Tracing the Web

  1. Nuss says:

    Awesome. Plain awesome.

  2. Pingback: John Resig - TraceMonkey

  3. Pingback: Ajaxian » JavaScript JIT: The Dream Gets Closer (in Firefox)

  4. Great work! So where is the Windows x86-64 version to test??

  5. Andreas says:

    As far as I know Mozilla doesn’t ship a win64 version of Firefox (but its supposed to be possible to compile it by hand I was just told). We will try to get some more test coverage for linux64 over the next few days. The JIT is fairly agnostic to linux/win, so once the JIT is stable the rest is up to the browser team as far as a win64 version is concerned.

  6. Thanks Andreas — I have Vista x64 on all our Windows machines, which we had to do ourselves. But I just got a new TouchSmart for home, and it comes with Vista x64 preinstalled (no other option), so I guess the transition is really starting to happen for 64-bit in the consumer space.

    Sadly I can’t get Minefield/Fx 3.1 to work on my page I wanted to test, even with JIT turned off. I guess I will have to wait…

  7. Pingback: Home of KaiRo: How Fast Is TraceMonkey In Real World?

  8. Pingback: JavaScript JIT: The Dream Gets Closer (in Firefox) | about ICT

  9. Pingback: To shell and back » Blog Archive » Firefox 3.1 to deliver massive Javascript performance improvements

  10. Pingback: TraceMonkey: DOM, Canvas, Opensource and more on Dion Almaer's Blog

  11. voracity says:

    Fantastic stuff. Very much appreciate your efforts in radically expanding the boundaries of possibility for both Mozilla and Javascript.

  12. Martino Pham says:

    Congrats! I took ics142b with you in spring ’08 and thought it really cool to read about your accomplishments with this project

  13. Dan says:

    I’d like to know what, if any, are the disadvantages of this new approach? More memory? A slowdown in certain circumstances? Or is it just win-win :-)

  14. Without having gone over the details of either, this reminds me of the work presented by Craig Chambers in his 1992 thesis “The Design and Implementation of the SELF Compiler, an Optimizing Compiler for Object-Oriented Programming Languages” and in particular chapter 10 on an optimisation technique called “Splitting”.

    It is nice to see dynamic languages making it to the fore front and leveraging their runtime nature as part the means for optimisations.

  15. Pingback: FireFox 3.1 mit TraceMonkey - Mozilla, JavaScript, Firefox, Browser, Opera, Entwickler, Kompiler, Applikationen, Unzulänglichkeiten, Verbesserung, Aktuell, Aktivierung, Usergemeinde, Benutzung, Tests, Erste, NightlyBuilds, Andreas, TraceMonkey, Allein -

  16. Pingback: JavaScript ve Firefoxu 3.1 trhne všechny rekordy - Martin Hassman: blog nejen o prohlížečích

  17. Philipp K. says:

    Andreas:
    “but its supposed to be possible to compile it by hand I was just told”
    Yes, in fact it is possible and you can get precompiled x86-64 builds for Windows at http://www.mozilla-x86-64.com and http://www.vector64.com already a long time.

  18. Pingback: Javascript-Beschleuniger fr Firefox - Leecher.To Rapidshare Forum

  19. Pingback: Der kommende Firefox beschleunigt Web-Anwendungen enorm | silicon.de

  20. Birotom says:

    We recently had lots of argument within our team, on whether the Javascript & AJAX approach to a rich GUI is the right one, or should we turn to something new, such as XAML. Our users experienced that the javascript based GUI is too slow, especially on slightly older machines. We definitely have to try this!

  21. Pingback: Firefox 3.1 soll JavaScript deutlich beschleunigen - Software | ZDNet.de News

  22. Pingback: Firefox 3.1 soll JavaScript deutlich beschleunigen - WinBoard - Die Windows Community

  23. Pingback: Outside the Box() » JavaScript at the Speed of Sound

  24. Pingback: Mozilla TraceMonkey speeds javascript « Shebanation

  25. Pingback: TraceMonkey, Firefox 3.1′i hızlandırıyor | Teknoloji Ekseni - Teknoloji haberleri

  26. Pingback: TraceMonkey, Firefox 3.1'i hzlandryor - KeyfiAlem.Net-Alemin En Keyiflisi

  27. 2.5x to squirrelfish. well done.

  28. Pingback: TraceMonkey Firefox’u hızlandırıyor | Bilgisayar-Destek

  29. Pingback: Inline threading, TraceMonkey, etc. :: David Mandelin’s blog

  30. Pingback: Firefox 3.1 erhält Javascript-Beschleuniger | OSNote

  31. Pingback: The Burning Edge » Blog Archive » 2008-08-29 Trunk builds

  32. Pingback: Le performance di JavaScript viste da Mozilla e dal team di IE 8 « dezone

  33. Pingback: This month in bookmarks: August 2008

  34. Pingback: VelociPeek - Eric’s weblog on tech » Mozilla Responds to V8: TraceMonkey With Trace Trees

  35. Luis says:

    I wonder if these techniques have anything in common with those of psyco for python (or the current pypy project).
    Are you familiar with them?

  36. Pingback: Most Fastester Browser in the World (Evar) | Machaxor.net

  37. Pingback: Minefield, Firefox supera en velocidad a Google Chrome | Mareos de un Geek

  38. Pingback: Firefox Minefield: más rápido que Chrome « Conocimiento Libre (o lo que está detrás del Software Libre)

  39. Olaf Spinczyk says:

    Hey Andreas, das ist ja wirklich cool! Schöne Grüße, Olaf.

  40. Pingback: Firefox 3.1 Beta1 steht bereit - NETZ-ONLINE

  41. The work you guys have been doing is simply awesome. Thanks to each and everyone in the team for all their efforts.

  42. Pingback: WebWorkerDaily » Archive Foxmarks Adds Cross-Platform Password Sync «

  43. Pingback: TraceMonkey « Sallaalla’s Blog

  44. Pingback: Future of Desktops Apps is the Web, Future of the Web is Desktop : TroyWorks

  45. Pingback: abuSe luvaRe »  Firefox Minefield: más rápido que Chrome

  46. Pingback: Techknology’s Blog » Murphy’s Law: Is a Firefox 3.5 Really That Fast?

  47. Pingback: Is a Firefox 3.5 Really That Fast? | Xtreme Core

  48. Pingback: Yet Another Meaningless JavaScript Benchmark « I will not buy this blog, it is scratched!

  49. Pingback: Fusible Traces « Lost Intentions

  50. Pingback: Firefox 3.1 ile hızlanıyor | Teknoloji Haberleri

  51. Pingback: FireFox 3.1 mit TraceMonkey » schwarz-weiss.cc

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s