While very well suited for fast interpretation, Spidermonkey’s fat opcodes are not well suited for the intermediate representation we want to record. Instead, for the IR we want to have a much more low level representation that exposes all the type checking and specialization that happens in the fat opcodes. Take Spidermonkey’s JSOP_ADD opcode, for example:
Apart from the nasty xml hack at the top, what this opcode mostly does is go along a chain of a bunch of type checks and conversions, and then perform the actual addition of 2 numbers, or a string concatenation in case either element is a string.
SpiderMonkey actually always performs a double addition and doesn’t specialize to integer additions, but even without this optimization JSOP_ADD contains a large amount of specialization decisions that must be exposed at the IR level in order for a tracer to be able to eliminate this overhead.
Since the interpreter code makes heavy use of macros to abstract certain intrinsic operations, we could use those very abstractions as IR instructions. FETCH_OPND, for example, fetches an operand from the stack (essentially returns “sp[n]”). This would be an ideal IR instruction since its low level enough to be compiled (well, actually its a no-op for the compiler since this turns into a register operation).
Ignoring for a moment the ugly xml code in between, the next operation is VALUE_TO_PRIMITIVE, which is actually a composite macro that itself calls a bunch of low-level macros. Such composite macros we don’t want to touch, instead we are interested in the low-level macros these macros eventually invoke.
Here we can see that the macro invokes JSVAL_IS_PRIMITIVE to check whether the value is already a primitive in which case the value is return (in the last parameter vp), otherwise the default value is obtained from the object.
JSVAL_IS_PRIMITIVE is again a nice primitive to record since it basically just checks for some bit patterns to apply.
The next question is how do we refactor the interpreter to be able to insert all the recording hooks we need.
In our JamVM-based trace compiler we hooked into the interpreter by modifying the macros JamVM uses to implement each bytecode. Similarly, in Spidermonkey we could try to hook into the low-level macros (primitives). In FETCH_OPND, for example, we could insert a call to the recorder right after reading the value from the stack:
#define FETCH_OPND(n) (record_FETCH_OPND(n), sp[n])
This approach works well for JamVM where each instruction we record performs a set of defined actions on the stack and local variables, and the recording functions can track those actions on some abstract stack and local variable array. A JVML push instruction, for example, pushes a value on the operand stack. The recording function (lets call it record_PUSH) performs the same action on the abstract stack, but instead of the value it pushes the address of the IR node on the abstract stack that generated the value PUSH is putting on the stack.
In case of FETCH_OPND, however, we don’t have a complete overview of the state, because the value FETCH_OPND reads is returned and then assigned to a variable (i.e. rval). While we could track rval and lval, this would also require modifying FETCH_OPND to indicate which value they write to, otherwise the recording function can’t tell which abstract equivalent of the variable it has to update.
Instead, I think it makes more sense to slightly change the interface of FETCH_OPND. Instead of returning a value, the output variable gets passed in as last argument (SpiderMonkey already does this for most macros):
#define FETCH_OPND(n, x) x = sp[n];
The code generated by this macro is still identical to the original approach, but we can now hook in a recorder much more easily. The recording version of FETCH_OPND would look like this:
#define FETCH_OPND(n, x) x = sp[n]; \
Essentially, we just append the recorder invocation to the existing macro. In addition, we can now record the result of the operation (x), as well as the address of the location we store to (&x), which is essentially the name of the value (think of SSA names here). Using this name we can uniquely identify any future use of the value using a hash table of name to instruction that generated the value (in this case that instruction would be identified by consulting the abstract stack inside record_FETCH_OPND and figure out who put the value onto the stack).
JSVAL_IS_PRIMITIVE can be transformed similarly, however, in contrast to FETCH_OPND its actually not located in jsinterp.c. Instead, its buried somewhere deep inside of jsapi.h. To make this work well we would have to gather all primitives in one place where we can annotate them with recording code. Some common naming for primitives would be nice too. There is no need to actually remove the JSVAL_IS_PRIMITIVE code from jsapi.h, but jsinterp.h should use a second set of macros that map the primitives to their implementation:
#define PRIM_FETCH_OPND(n, x) \
#define PRIM_JSVAL_IS_PRIMITIVE(v, x) \
x = JSVAL_IS_PRIMITIVE(v)
To generate the tracing equivalent (jstrace.h) we could probably resort to some automated code manipulation, i.e. translate every primitive by appending a call to a recording macro with identical signature:
#define PRIM_FETCH_OPND(n, x) FETCH_OPND(n, x); \
#define PRIM_JSVAL_IS_PRIMITIVE(v, x) x = JSVAL_IS_PRIMITIVE(v);
Transforming jsinterp.c like this will require a major code shakeup, however I think there is an argument to be made that this would actually improve readability and increase modularity even without tracing in mind. Also, if done right this entire macro refactoring business should not affect the actual underlying code at all. One could even do an automated code comparison of the code every time an opcode is rewritten to use primitives, since no code has to change in the non-recording case despite all the macro magic at the source level.
I will try to hack this up for a few opcodes to see what it looks like.
I think we’ll want the macros to have RECORD() and maybe INTERP() wrappers around the relevant portions of their bodies, such that we can avoid are-we-tracing? check overhead in the interpreter case. With RECORD()/INTERP() wrapping, we can include the jstrace.h file twice to create two switches or jump tables.
We live for major jsinterp shakeups. :)
Yes, we definitively want to have two separate interpreter loops, one non-recording and one recording, both derived from the same source with some macro magic. There will be a small overhead in branch instructions to monitor for potential trace start points, but that kind of check is usually shadowed by the general dispatch overhead (in the JamVM tracer it was around 1%).