Archive for June, 2007

Elimination of Initial Loads from Local Allocations

June 29, 2007

Allocating objects with NEW initializes the object state and sets all object variables to 0/0.0/null. When performing load elimination (where we match up stores into captured objects with subsequent loads), we previously didn’t eliminate these types of loads because they don’t have a matching store. I changed the load elimination pass to properly replace these loads with a CONSTANT instruction.

With this latest change all captured allocations should no longer be loaded from as long we can properly analyze them. We can’t always analyse array accesses, i.e. if the index used to access the array is variant. As long the latter is not the case, captured allocation sites will only be pointed to by stores, no reads. This means that if the allocation is singleton, we can actually hoist it out of the loop.

Read the rest of this entry »

PPPJ’07 Paper Accepted

June 29, 2007

A paper describing the Java Virtual Machine Language (JVML) interpreter we have implemented as part of our trace tree just-in-time (JIT) compiler has been accepted to the ACM Conference on Principles and Practice of Programming in Java (PPPJ’07). What makes our JVML interpreter unique is the fact that its implemented in Java itself, just as the rest of our JIT compiler. Despite the use of a high-language such as Java, our interpreter is reasonably fast (slowdown 4-9 over the hand-optimized assembly language implementation used by Sun’s Hotspot VM). In addition, our interpreter is entirely type-safe and thus is not part of the trusted code base. This is an ideal approach to add an interpreter to a Java VM system that already supports compiled code execution but doesn’t have a builtin Java interpreter (such as IBM’s Jikes RVM).

Guard Elimination

June 28, 2007

We finished today the guard elimination pass, which is responsible for redundant null check and array bounds check elimination. Since all conditional branches appear as guards in our representation, the guard elimination pass is able to merge and simplify loop conditions and array bounds checks into the minimum amount of necessary checks. Consider the following code example:


int i = 0;
while (i < 5000) {
  a[i++] = 0;
  a[i++] = 0;
}

The resulting intermediate representation (IR) code has 6 implicit guards (two null checks, two lower bounds checks, and two upper bounds checks), and a 7th explicit guard representing the loop condition. The guard elimination pass reduces these to a single check.

Read the rest of this entry »

Induction Variable Analysis and Array Bounds Check Elimination

June 25, 2007

I added an Induction Variable Analysis (IVA) pass to the compiler. It runs after the arithmetic optimizer which converts all instructions that substract a constant value from a value into additions with a negative constant. The IVA pass then searches for PHI instructions in tails of the tree that point update a context variable with an addition that draws its value directly from the same context variable. To deal with stacked additions, i.e. an a[i++] followed by a b[i++], the arithmetic optimizer accumulates chains of additions and substractions into a single addition. A context slot is only considered an induction variable if its increased (or decreased) by a constant amount along every trace in the tree. However, the value by which the variable changes does not have to be constant. Variables that are increasing along every trace (or remain unchanged which corresponds to an increase by zero) are flagged monotonic non-decreasing. Variables that always decrease (or stay the same) along every trace are flagged monotonic non-increasing.

The result of the IVA is used by the guard elimination pass to hoist guards out of the loop that check whether an induction variable is always smaller than a constant (i.e. a not less than zero index check for arrays). Since we know for induction variables that they are either monotonic non-increasing or monotonic non-decreasing, we can hoist such guards out of the loop and merly check them once in the prolog. This eliminates 50% of the guards associated with reading from or writing to arrays. The other bounds check (in case of an increasing index the upper bounds check) will be eliminated using guard condition coalescing.

Common Subexpression Elimination & Guards

June 22, 2007

I added guard instructions to the set of instructions that can be subject to common subexpression elimination. They are not strictly expressions since they don’t calculate a value, but they can be eliminated just like common subexpressions. To calculate the value number of a guard we compute the sum of the operand’s value numbers and add the hash code of the GUARD class. We intentially do not include the branch type in the value number, because we want complementary branches to match (LT and GE are equivalent if the operands are swapped around).

Load Elimination and Non-Constant Offsets

June 21, 2007

If during load elinination a store is observed into a captured object (we only care about captured objects at this point, since we know them to be alias free) that does not use a constant offset (i.e. array access with a variable index), we have to stop the analysis for all downstream accesses to this object. This is due to the fact that we can’t safely predict which part of the object the store modified, and thus we can’t reliably re-connent future loads to the value. I added this safety check to the compiler.

CSE and Value Numbering in Traces

June 21, 2007

In my previous trace compiler I did Common Subexpression Elimination (CSE) by going over the tree in forward order and keeping a per-opcode history of all instruction that were seen before during iteration and thus could replace this (redundant) instruction. Thus, for each instruction all instructions with the same opcode have to be scanned. In our new trace tree compiler we use a hash table lookup based on value numbers instead. The hash table is cloned every time we hit a guard node to save the set of available expression to match against at that point. This saved set is than used when a secondary trace branch off that guard is analyzed.

As value numbers we use the sum of the value number of each operand plus the hash code of the class of the instruction (each instruction in the IR is a class in our compiler). The value number of constants is the value of the constant. For every instruction we don’t want to be subject to CSE, we use the object hash code as value number and thus it won’t match anything (except itself). We also added a commutative() method to each binary instruction that indicates whether the operand order is irrelevant. Since we use the sum of the operand value numbers the overall value number will always be identical, no matter whether the operation is commutative or not, but the subsequent equality test will fail if the ordering doesn’t match and the operation is not commutative.

Escape Analysis on Trace Trees Revisited (Again)

June 21, 2007

We did more work on the escape analysis algorithm today. We realized that PHI updates in trace tails are essentially just another form of store, and thus we simply treat context slots as just another form of object. As such, it can be either captured or escaping. Context “objects” escape when we see a load of a reference from them. Since we can’t figure out where the reference the load instruction loads originated from, we flag the entire context as escaping, and during the next data flow analysis iteration any store to this context will let the stored reference escape.

At the same time we have eliminated the loop containment check from the algorithm. It turns out that for the purpose of our load elimination pass its sufficient to know that objects don’t escape into memory. For the allocation site hoisting, on the other hand, we will need confirmation that any object allocation in the loop doesn’t overlap with its own copies created during previous loop iterations. This can happen when references are passed around along the loop edges. We will integrate this check directly into the allocation site hoisting algorithm.

Tree Filtering & Ordering of Filters

June 20, 2007

Our compiler is organized as a series of filters that form a pipeline. We serialize the tree into a sequence of instructions and pipe these through the compiler pipeline. We already have some 12 or so filters, and its getting difficult to keep track of dependencies between them. Invariant code motion, for example, needs information from the invariant code analysis filter.

We introduced a CompilerStatus object that is passed along the pipeline. Each filter notifies the status object when it starts processing the tree. Since filters are truely pipelined and process instructions in an overlapping manner, there is no guarantee when a filter terminates. If such a termination property is needed, we insert Barrier filters into the pipeline which wait for all instructions to come in before allowing instructions to flow out and continue along the pipeline. Whenever a Barrier filter has received all instructions and thus all instructions have been processed by all previous filters, it clears the list of pending filters and moves them to the “completed” filter set.

Filters can query these two sets to ensure that any dependencies have been fulfilled. Currently the granularities are “must have been started”, and “has been completed”.

Load Elimination

June 20, 2007

We have added a load elimination pass to the compiler. It runs behind a barrier after the escape analysis pass, but the load elimination itself can be pipelined since it only considers one instruction at a time. We eliminate all loads that read from captured objects. For this we track the base address and offset and the corresponding reference that was written to that location. If we see a matching load, its substituted with a direct link to the instruction that produced the reference that was written to the location. Since we are operating on a mere sequence of instructions, this can happen in linear time.

We only eliminate loads with a constant offset. This does include arrays accessed with a constant index. Also, we currently do not eliminate loads that didn’t have a corresponding store. These read unitialized fields and thus always return 0/0.0/null/false, which of course can be optimized into a constant. This will be added soon since its one of the prerequisites of hoisting the allocation itself out of the loop (without having to manually clear it during every iteration to ensure that reading uninitialized fields doesn’t return some results from a previous loop iteration).

What we don’t eliminate are stores. These are trickier, since we might have to reconstitute them in side exits in case a reference escapes through a side exit (which is otherwise not an issue for our scheme). However, since we do eliminate all dependencies on the stores through load elimination, the stores likely do not create a significant performance loss.