At our last meeting with the SpiderMonkey developer team up at Mozilla, Edwin Smith from Adobe gave a presentation about pathological cases for trace trees. Among other things, Ed talked about the massive tree that gets recorded for John Conway’s Game of Life algorithm. It essentially consists of a nested loop over a matrix that counts, for each cell, the number of neighboring cells that are alive. The trivial implementation of this is a series of consecutive if statements inside the loop body, each incrementing a counter if the cell coordinates are within range and the cell is alive. To address this issue, we have now added a new tree transformation to our trace tree compiler: tree folding. The idea of tree folding is to reconnect traces that split off another trace but differ only minimally in the code they execute. A simple example for this is calculating the minimum of two values:
for (n = 0; n < 5000; ++n) a = min(b,c)
The trace tree for this loop consists of 2 traces: one primary trace that covers one side of the min condition (i.e. b < c), and a secondary trace that is recorded whenever b >= c. The goal of tree folding is to eliminate the need for 2 traces in this scenario. While 2 traces seem reasonable here, just imagine using several min functions inside a loop body. The tree would fan out quickly and contain hundreds of traces.

The tree folding algorithm is implemented as a short compiler pipeline that gets invoked every time we add a trace to the tree. The tree is serialized into a sequence of instructions (read about tree serialization here), which is then passed through the IdentifySpliceSites and TreeFolding filters. At the end of this pipeline a TreeBuilder re-constitutes a trace tree, which is substituted for the original tree if we actually folded any traces in it.

To identify sliceable trace parts (strands), we scan the instruction sequence for guards that have a secondary trace attached. To be sliceable, both traces (primary and attached secondary) must be followed by the same merge point, which is a guard or the end of the loop. Between the split point (where we will start merging) and the merge point (the common guard we expect both traces to hit) arbitrary instructions may execute as long as they don’t change the memory state (STORE, SynchronizationInstruction, AllocationInstruction). When we find such a guard, we mark it as a split point, and the next pass eliminates it and replaces it with conditional move instructions.

The reason that we can only merge traces at a second “common” guard is that we need to compare the state of all variables on both traces at the merge point. Guards contain a complete snapshot of the variable state (an InterpreterState object), because we need to restore the VM state in case of a side exit on that guard.
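In effect, folding turns the guarded min into straight-line code. Here is a sketch of the before/after shapes (our own illustration, not compiler output); the conditional move is written as a JavaScript ternary, standing in for the actual conditional move instruction the compiler emits:

```javascript
// Before folding, the guard on (b < c) splits the tree into two traces:
//   Trace 1 (primary):   guard(b < c);  a = b;
//   Trace 2 (secondary): guard(b >= c); a = c;
//
// After folding, there is a single straight-line trace: both strands
// execute, and the guard condition drives a conditional move that
// picks which strand's state survives.
function foldedMin(b, c) {
  var left = b;               // code from the left (primary) strand
  var right = c;              // code from the right (secondary) strand
  return (b < c) ? left : right;  // conditional move merging the states
}
```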
When merging the two strands, we emit the code from both strands into the primary trace, compare the states in the two merge guards, and for every difference between them we issue a conditional move instruction at the end of the code of both strands to merge the two states into one. For the remainder of the primary trace we have to replace all references to the “left” (primary) strand with references to the appropriate conditional moves, since we now execute both strands in parallel and the right side could be the valid strand. The conditional moves use the same condition as the guard to pick which values to copy. The guard instruction in the primary trace is eliminated entirely, and so is the secondary trace attached to it and any trace attached to that. We call this tree pruning.

It’s important to note that we might prune a code path from the tree that is then no longer represented in the tree and will have to be re-recorded. Consider the Game of Life code. It is a series of if statements, and when we merge the first if statement, the secondary trace has a long code path for the additional 7 ifs in it. We only copy the code for the first if into the primary trace (up to the merge guard); the rest of the trace is thrown away and will be re-recorded when we side exit on the 2nd remaining if. Over 8 iterations, eventually all the if statements are inlined into the primary trace.

The advantage of this approach is that we can do hierarchical inlining. If an if statement contains another if statement, the inner if statement is folded first and becomes a series of conditional moves, which can then be folded into the outer if statement in a second pass. This is particularly important because complex if clauses are essentially a series of nested ifs and can thus be folded using the algorithm described above. For this iterative folding, the folding pipeline has to be re-executed as long as the tree changes.
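A sketch of this hierarchical folding on a hypothetical nested branch (again using ternaries to stand in for conditional moves):

```javascript
// Hypothetical nested branch:
//   if (p) { if (q) x = 1; else x = 2; } else { x = 3; }
//
// Pass 1 folds the inner if into a conditional move:
//   if (p) { x = q ? 1 : 2; } else { x = 3; }
//
// Pass 2 folds the now branch-free outer if as well, leaving
// nothing but conditional moves:
function foldedNested(p, q) {
  var inner = q ? 1 : 2;  // result of pass 1 (inner if folded)
  return p ? inner : 3;   // result of pass 2 (outer if folded)
}
```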
In most cases, at most 2 iterations are necessary to obtain a tree that cannot be further simplified.

For the details of the algorithm, take a look at the source code of our compiler, which is available from our Hotpath Project website.
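For a concrete picture of the Game of Life case discussed above, the neighbor-counting code is shaped roughly like this (a sketch of the pattern, not the exact benchmark code) — each if is recorded as a guard, which is why the unfolded tree fans out so badly:

```javascript
// Count live neighbors of cell (x, y) in a rows-by-cols grid.
// Each of the eight ifs becomes a guard in the recorded trace,
// so the unfolded trace tree can fan out into up to 2^8 paths.
function countNeighbors(grid, rows, cols, x, y) {
  var count = 0;
  if (x > 0 && grid[x - 1][y]) ++count;                             // up
  if (x < rows - 1 && grid[x + 1][y]) ++count;                      // down
  if (y > 0 && grid[x][y - 1]) ++count;                             // left
  if (y < cols - 1 && grid[x][y + 1]) ++count;                      // right
  if (x > 0 && y > 0 && grid[x - 1][y - 1]) ++count;                // up-left
  if (x > 0 && y < cols - 1 && grid[x - 1][y + 1]) ++count;         // up-right
  if (x < rows - 1 && y > 0 && grid[x + 1][y - 1]) ++count;         // down-left
  if (x < rows - 1 && y < cols - 1 && grid[x + 1][y + 1]) ++count;  // down-right
  return count;
}
```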