Monday, January 12, 2009

Bootstrapping 7

I'm still chasing down the source of that bug from last time. Spent some time re-reading the STG paper and trawling through the RTS code.

Running the program with +RTS -DS didn't help, but I noticed a flag -Da that checks the format of closures while the program runs. However, it seemed to be disabled, or unused. There is code for it in the RTS, but it doesn't show up in +RTS --help. Running the program with +RTS -DS -Da gives:

benl@mavericks:~/devel/ghc/ghc-HEAD-native/tmp> ./Main +RTS -DS -Da
stg_ap_v_ret... PAP/1(3f5212, 3ef238)
stg_ap_0_ret... Bus error (core dumped)

Went chasing through the RTS code looking for the source of the bus error. -Da causes Printer.c:printClosure to be invoked to print out a description of each closure, but the pointers passed to it are misaligned because they still contain pointer tag bits. Reread the dynamic pointer tagging paper, then fixed the RTS code.
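For the record, here is roughly what "containing pointer tag bits" means, as a small C sketch. This is not the RTS code. The names (sketch_get_tag, sketch_untag, SKETCH_TAG_BITS) are made up for illustration, and the choice of 2 tag bits is an assumption that matches a 32-bit target with word-aligned closures. Note that the first field in the PAP output above, 3f5212, is not a multiple of four, which is what a tagged pointer looks like.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2 tag bits, i.e. closures are aligned to 4 bytes. */
#define SKETCH_TAG_BITS 2
#define SKETCH_TAG_MASK ((uintptr_t)((1u << SKETCH_TAG_BITS) - 1))

/* Return the tag stored in the low bits of a possibly tagged pointer. */
uintptr_t sketch_get_tag(uintptr_t p)
{
    return p & SKETCH_TAG_MASK;
}

/* Clear the tag bits so the result is a word-aligned address. */
uintptr_t sketch_untag(uintptr_t p)
{
    return p & ~SKETCH_TAG_MASK;
}

/* Hypothetical helper for a debug printer: produce the address that is
   safe to dereference, and check that it really is aligned. */
uintptr_t sketch_closure_addr(uintptr_t p)
{
    uintptr_t q = sketch_untag(p);
    assert((q & SKETCH_TAG_MASK) == 0);
    return q;
}
```

On SPARC a word load from an unaligned address traps, so a printer that skips the untagging step dies with a bus error rather than just printing junk.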

Ended up doing a binary-ish search to find the original bug. Starting with a known good .s file generated with -fvia-c, slowly copied dubious sections of the -fasm version into it, testing along the way. This works because in the current STG -> Cmm translation, top level STG functions only share data via pinned registers. This is sort of like a standard calling convention for GHC functions, so it doesn't matter if GHC and GCC do register allocation a little differently. It might be worthwhile automating this process if I hit similar problems in the future.
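If I do automate it, the core would be an ordinary bisection over sections. A minimal sketch in C, assuming exactly one section is bad and that sections can be swapped independently (which is the point of the pinned-register argument above). The probe callback is hypothetical: it is supposed to build a .s file that takes sections [lo, hi) from the suspect version and everything else from the known good one, run the result, and report whether it works. demo_probe is only a stand-in so the sketch can be exercised without a compiler in the loop.

```c
#include <assert.h>

/* Returns nonzero if the program still works when sections [lo, hi)
   come from the suspect file and the rest from the known good file. */
typedef int (*probe_fn)(int lo, int hi, void *ctx);

/* Find the index of the single bad section among n sections.
   Invariant: the bad section is always inside [lo, hi). */
int find_bad_section(int n, probe_fn passes, void *ctx)
{
    int lo = 0, hi = n;
    assert(n >= 1);
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (passes(lo, mid, ctx))
            lo = mid;   /* lower half is clean, so look in the upper half */
        else
            hi = mid;   /* lower half contains the bad section */
    }
    return lo;
}

/* Stand-in probe: ctx points at the index of the bad section. */
int demo_probe(int lo, int hi, void *ctx)
{
    int bad = *(int *)ctx;
    return !(lo <= bad && bad < hi);
}
```

With n sections this needs about log2(n) rebuild-and-run cycles instead of copying sections over one at a time.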

Anyway, it turns out there's a world of difference between
mov %l1, %l2 and mov %l2, %l1...

With that fixed, ran the should_run codeGen tests and got

OVERALL SUMMARY for test run started at Monday, 12 January 2009  5:23:10 PM EST
61 total tests, which gave rise to
427 test cases, of which
0 caused framework failures
367 were skipped

42 expected passes
1 expected failures
0 unexpected passes
17 unexpected failures

Unexpected failures:
1852(optasm) -- regalloc
1861(optasm) -- regalloc
2080(optasm) -- wrong output
cg015(optasm) -- unknown unary match op
cg018(optasm) -- regalloc
cg021(optasm) -- segv
cg022(optasm) -- segv
cg024(optasm) -- regalloc
cg026(optasm) -- regalloc
cg028(optasm) -- regalloc
cg034(optasm) -- regalloc
cg035(optasm) -- regalloc
cg044(optasm) -- regalloc
cg046(optasm) -- segv
cg054(optasm) -- genSwitch not implemented
cg058(optasm) -- segv
cg060(optasm) -- segv


The ones marked regalloc die with:
ghc: panic! (the 'impossible' happened)
(GHC version 6.11.20090110 for sparc-sun-solaris2):
RegAllocLinear.allocRegsAndSpill: no spill candidates


Repaired the rot in the linear register allocator. The free register map only worked for x86(_64) and PPC. Now we've got:

2080(optasm) -- wrong output
cg015(optasm) -- unknown unary match op
cg021(optasm) -- segv
cg022(optasm) -- segv
cg026(optasm) -- segv
cg034(optasm) -- regalloc FF64
cg035(optasm) -- regalloc FF64
cg044(optasm) -- regalloc FF64
cg046(optasm) -- segv
cg054(optasm) -- genSwitch
cg058(optasm) -- segv
cg060(optasm) -- segv


The others marked regalloc are dying because the allocator is messing up the float register twinning. That'll be tomorrow's problem.
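To pin down what twinning asks of the allocator: on SPARC a double-precision value lives in an even/odd pair of single-precision float registers, so taking a register for a double has to take its neighbour too. Below is a rough C sketch of that constraint over a free-register bitmap. It is an illustration only, not the allocator's actual code; the function names and the bitmap layout (bit i set means %f<i> is free) are assumptions.

```c
#include <stdint.h>

/* Take the lowest free single-precision register, or return -1. */
int sketch_alloc_single(uint32_t *free_map)
{
    for (int r = 0; r < 32; r++) {
        uint32_t bit = (uint32_t)1 << r;
        if (*free_map & bit) {
            *free_map &= ~bit;
            return r;
        }
    }
    return -1;
}

/* Take an even/odd pair for a double. Both halves must be free, and
   both are marked as used. Returns the even register, or -1. */
int sketch_alloc_double(uint32_t *free_map)
{
    for (int r = 0; r < 32; r += 2) {
        uint32_t pair = (uint32_t)3 << r;
        if ((*free_map & pair) == pair) {
            *free_map &= ~pair;
            return r;
        }
    }
    return -1;
}
```

The failure mode to watch for is treating the even register as the whole story: if only that bit is cleared, a later single-precision allocation can land in the odd half of a live double.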
