In the previous post I described an algorithm that transforms out of SSA form even in the presence of cyclic PHI dependencies. For this I use a series of exchange (XCHG) and move (MOV) instructions, instead of using a temporary register to break dependencies. This produces less instructions in some cases.
Consider this example:
int a, b; ... for (...) { int t = a; a = b; b = t; }
Using the highest optimization settings (-O9) gcc 4.0 produces the following code for this (I will use Intel notation here, GCC uses the GNU assembler syntax which is really unreadable):
label: movl edx, eax movl eax, ecx movl ecx, edx ... jne label
EDX is used as the temporary here, a is in EAX, and b in ECX. The algorithm I posted here resolve this scenario with a single XCHG instruction.
I did some benchmarks and I can’t see any performance gain, but then we are talking about a few register moves here. However, the code is certainly more compact, leading to better cache utilization, and we save one temporary, which reduces register pressure. We should probably publish this somewhere.