I have updated the algorithm to fix a few bugs and make it more readable.
I have implemented a simple algorithm to convert PHI instructions into an equivalent sequence of register move and exchange instructions. This is a non-trivial task because PHI instructions are assumed to execute in parallel while actual moves execute in sequence, possibly overwriting values that will still be needed by subsequent moves. To ensure proper semantics we have to order the moves we emit such context slots are only written to after all read dependencies have been satisfied. In case of cycles in the data flow this is not always possible:
int a, b; for (...) { int t = a; a = b; b = t; }
In SSA form and after copy propagation has been performed, this loop consists of the PHI instructions (a←b, b←a) only. There is no ordering in which sequential moves can ensure an equivalent data flow. Preston Briggs described using temporary registers to break these cycles.
I discussed this problem today with Josiah Carlson. He actually sent me a python implementation of an algorithm that identifies cycles in the PHI set and then breaks them using a temporary register. I ended up using an algorithm that operates without temporaries by emitting a sequences of moves and register exchanges. Most CISC architectures (such as x86) support these, and at least between registers they are efficient. Never use them on memory locations though on x86, since they imply bus locking. Even on RISC architectures such exchanges can be implemented with a series of xor instructions:
(snippet taken from IBM’s PowerPC Compiler’s Writer Guide)
# R3 = a
# R4 = b
xor R3,R3,R4 # R3 = a ^ b
xor R4,R4,R3 # R4 = b ^ (a ^ b) = a
xor R3,R3,R4 # R3 = (a ^ b) ^ a = b
Here is the algorithm as pseudo code:
for (p : all phi edges) if (p.src in context) # context is the set of all slots that # can be the destination of a phi. # record in edges that there is one more dependency # on the source of this phi edges[p.src]++ # initially the value of all slots can be found in the # corresponding slot (this changes when we do exchanges) alias[src] = src alias[dst] = dst # add this phi to the worklist worklist.add(p) else mov(dst, src) while (worklist not empty) p = worklist.pop() # resolve the source of the phi through the alias table # in case we moved it somewhere else with an # exchange instruction dst = p.dst src = alias[p.src] # a previous exchange might have obsoleted this one if (dst != src) # are there any pending dependencies on the destination # of this phi? if (edges[dst] > 0) # If yes, that we can't just overwrite it with a move. # Instead we have to issue an exchange operation. xchg(dst, src) # Update alias to indicate where the old src value can be # found now (since we just moved it). alias[dst] = src else # no edges for src, value no longer needed, we can # overwrite it mov(dst, src) # remove the dependency from the phi we just # processed --edges[src]
We will have to do a little bit more testing and a formal correctness proof, but so far this seems to work quite well and produces shorter sequences than Brigg’s algorithm (at least on x86 where we do have an XCHG instruction).
Pingback: Register Swapping vs Temporary Registers « Dr. Andreas Gal