Wednesday, April 22, 2009

Loose ends and register pairing

I've been spending some time cleaning up loose ends and addressing the floating point register pairing on SPARC.

First loose end was refactoring the way information about what STG registers are pinned to what hardware regs is managed. When I fixed the NCG back in Jan / Feb I had defined a local freeRegs function which records which regs are available for allocation, mostly so I could work out what was going on. Last week I redid it so its gets the information from include/MachRegs.h again via some #ifdef nastiness. Said nastiness is contained to a single module, with the benefit being that the SPARC NCG behaves the same way as the ones for the other architectures.

Second loose end was working out why we couldn't pin a register to hold the pointer to a thread's gc_thread structure in the garbage collector code. The reason is that names like %l1 are just offsets into the SPARC register stack, and change during a function call. It's all well and good to pin a certain variable to %l1, but its value gets lost as soon as you make a call. Seems obvious in hindsight. We can pin STG registers to hardware registers while running the program proper, because Haskell code doesn't use the register window. However, the garbage collector is written in C, so the same plan doesn't fly.

I'm currently refactoring the graph coloring register allocator to do a better job of handling the SPARC's floating point registers. SPARC has 32x 32bit float registers. To store a 64 bit float you place the value in a pair of 32bit regs. This is much like the situation on x86 where the 16 bit regs AL and AH can be combined to form AX. Previously, we had just statically reserved 6 of the FP regs to be 32 bit, and the rest to be 64 bit. This works, but is kludgy and doesn't make good use of the hardware.

When I wrote GHC's graph coloring allocator back in 2007, I based it on the algorithm described by Smith, Ramsey and Holloway in A Generalized Algorithm for Graph-Coloring Register Allocation. This algorithm handles register quite elegantly, and I wrote the code with the SPARC architecture in mind. However, back then the SPARC NCG didn't actually work, so I couldn't test this functionality. It does now though, so i'm going back and finishing this unfinished business.