Last week I pushed some patches that started to break up the native code generator into architecture-specific modules. Unfortunately, due to a miscommunication, the patches weren't properly validated on PowerPC, and now the HEAD has been broken on that architecture for a few days.
I've been using Thorkil's machine to debug the problem, and have narrowed it down to the linear register allocator (surprise, surprise). The linear allocator has some code that tries to make sure that basic blocks are allocated in call order. Unfortunately, for tiresome reasons, if there are no branch instructions to a particular block, the allocator goes into an endless loop. The problem shows up when compiling AutoApply.cmm from the runtime system. Only a small amount of RTS code contains Cmm-level loops, so the allocator isn't well tested in that respect; regular compiled Haskell code contains no assembly-level loops.
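To see how "no branches to a block" can turn into an endless loop, here is a toy Haskell model of that kind of ordering scheme. It is not GHC's allocator code: `Block`, `branchTargets` and `allocOrder` are made-up names, and the rule "only visit a block once something that branches to it has been visited" is just one plausible reading of the ordering requirement. Blocks whose entry state isn't known yet are deferred to another round, and a block that nothing branches to is deferred every round. The model stops when a round makes no progress; drop that check and it spins forever.

```haskell
import qualified Data.Set as Set

-- Hypothetical, simplified model of a basic block: its label and the
-- labels its branch instructions target. Not GHC's real data types.
type BlockId = Int

data Block = Block
  { blockId       :: BlockId
  , branchTargets :: [BlockId]
  } deriving Show

-- Order blocks so that each one (apart from the entry) is visited only
-- after a block that branches to it, i.e. once an entry register
-- assignment would be known. Returns the visit order plus any blocks
-- that could never be reached through a branch.
allocOrder :: BlockId -> [Block] -> ([BlockId], [BlockId])
allocOrder entry = go (Set.singleton entry) []
  where
    go :: Set.Set BlockId -> [BlockId] -> [Block] -> ([BlockId], [BlockId])
    go known done pending
      | null pending   = (reverse done, [])
      -- A whole round deferred every block: without this check the
      -- leftovers would be retried forever.
      | null processed = (reverse done, map blockId pending)
      | otherwise      = go known' (reverse processed ++ done) waiting
      where
        (known', processed, waiting) = oneRound known pending

    -- One pass: visit each block whose entry assignment is known (marking
    -- its branch targets as known too) and defer the rest.
    oneRound :: Set.Set BlockId -> [Block] -> (Set.Set BlockId, [BlockId], [Block])
    oneRound known [] = (known, [], [])
    oneRound known (b : bs)
      | blockId b `Set.member` known =
          let known1      = foldr Set.insert known (branchTargets b)
              (k, ps, ws) = oneRound known1 bs
          in  (k, blockId b : ps, ws)
      | otherwise =
          let (k, ps, ws) = oneRound known bs
          in  (k, ps, b : ws)
```

With `allocOrder 1 [Block 1 [2], Block 3 [2], Block 2 []]`, block 3 plays the part of a block that is only reached in a way the allocator doesn't recognise as a branch: it is never visited and comes back in the second component of the result.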
The graph coloring allocator compiles the code fine, so I'm pretty sure it's not a problem with code generation. I'm not sure why splitting up the code generator created this problem.
There isn't a comment in the allocator code that fully explains the block ordering problem or gives a test case. I tried disabling this code, but it didn't help. Also, when I try to dump the Cmm code for AutoApply.cmm with -ddump-opt-cmm, the pretty printer (or something) goes into an endless loop too. This happens when compiling with -fregs-graph as well, so I think it's an unrelated problem.
More digging. Code generation for AutoApply.cmm produces basic blocks that have no explicit branches to them, because the branch is via a jump table...
Dah! Found it. My previous patch to the handling of fixup code wasn't complete for PowerPC: the BCTR instruction wasn't being treated as a branch.
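Here's a rough Haskell sketch of what "treating BCTR as a branch" means. The `Instr` type, its constructors and the functions `jumpDests` and `patchJump` are hypothetical stand-ins; GHC's real PowerPC instruction type and its per-architecture branch functions look different. The idea is only that BCTR, which branches to the address held in the count register and is how a jump table dispatches, must report all of its possible targets just like an ordinary branch, and must have those targets rewritten when the allocator redirects branches through fixup code.

```haskell
-- Hypothetical, cut-down PowerPC-like instruction type for illustration;
-- GHC's real instruction type is much larger.
type BlockId = Int
type Reg     = Int

data Instr
  = LI Reg Int         -- load immediate into a register
  | MTCTR Reg          -- move a register into the count register
  | B BlockId          -- unconditional branch
  | BCC BlockId        -- conditional branch (condition omitted)
  | BCTR [BlockId]     -- branch via the count register; the list holds
                       -- the jump table's possible targets
  deriving (Eq, Show)

-- Which blocks can this instruction transfer control to? Leaving out the
-- BCTR equation makes jump-table targets look like blocks that nothing
-- branches to.
jumpDests :: Instr -> [BlockId]
jumpDests (B b)          = [b]
jumpDests (BCC b)        = [b]
jumpDests (BCTR targets) = targets
jumpDests _              = []

-- Redirect every branch target, e.g. to a block of fixup moves. BCTR's
-- jump-table entries need rewriting just like an ordinary branch.
patchJump :: (BlockId -> BlockId) -> Instr -> Instr
patchJump f (B b)          = B (f b)
patchJump f (BCC b)        = BCC (f b)
patchJump f (BCTR targets) = BCTR (map f targets)
patchJump _ instr          = instr
```

For a block ending in `[LI 3 0, MTCTR 3, BCTR [7, 8]]`, `concatMap jumpDests` gives `[7, 8]`, so blocks 7 and 8 count as branch targets instead of looking unreachable.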
Realise that there is a problem: 1 min
Setup to work on the problem: 1 hr
Find the problem: 5 hrs
Fix the problem: 1 min
Ah, code generation. Who doesn't love it?