Thursday, February 5, 2009

Just Testing

Ok, I've got the stage3 build with RTS and Libs working with -fasm -O2. Did a full testsuite run last night, and it looks promising. However, a stack of the driver tests failed because the compiler or build system spews something about "/bin/sh: cygpath: not found". This line isn't in the expected output file that the test compares against, which results in a failure.

.. the cygpath thing was easy enough to hack around. A full stage3 test run gives the following. I've sorted the failing tests by ways failed.

OVERALL SUMMARY for test run started at Thursday,  5 February 2009 12:43:50 PM EST
2287 total tests, which gave rise to
8550 test cases, of which
0 caused framework failures
1580 were skipped

6483 expected passes
247 expected failures
1 unexpected passes
239 unexpected failures

Unexpected passes:
hpc_ghc_ghci(normal)

Unexpected failures:
TH_ppr1(normal)
apirecomp001(normal)
gadt23(normal)
ghciprog004(normal)
ghcpkg02(normal)
hpc_draft(normal)
hpc_hand_overlay(normal)
hpc_markup_001(normal)
hpc_overlay(normal)
hpc_overlay2(normal)
rebindable5(normal)
recomp004(normal)

ThreadDelay001(normal,threaded1)
conc014(normal,threaded1)
conc015(normal,threaded1)
conc017(normal,threaded1)
concprog001(normal,hpc,threaded1)
signals001(normal,hpc,threaded1)
list001(normal,hpc,threaded1)
hSeek003(normal,hpc,threaded1)
hSetBuffering003(normal,hpc,threaded1)
copyFile002(normal,hpc,threaded1)
dynamic001(normal,hpc,threaded1)
dynamic002(normal,hpc,threaded1)
enum03(normal,hpc,threaded1)
integerBits(normal,hpc,threaded1)
xmlish(normal,hpc,threaded1)
memo001(normal,threaded1)
memo002(normal,threaded1)
num007(normal,threaded1)
num013(normal,threaded1)
openFile008(normal,threaded1)
read001(normal,threaded1)
readwrite002(normal,threaded1)
testblockalloc(normal,threaded1)
tup001(normal,threaded1)
getDirContents001(normal,threaded1)
jtod_circint(normal,threaded1)
life_space_leak(normal,threaded1)

arr016(normal,threaded1,threaded2)
random1283(normal,threaded1,threaded2)

1914(ghci)
seward-space-leak(ghci)

1861(optc)
barton-mangler-bug(optc)
joao-circular(optc,hpc,threaded1,threaded2)
time003(optc,hpc,optasm,threaded2)
simplrun007(optc,optasm)


2910(hpc)
arith012(hpc)
arr019(hpc)
cholewo-eval(hpc)
concio002(hpc)
hDuplicateTo001(hpc)
copyFile001(hpc)
hGetPosn001(hpc)
hIsEOF002(hpc)
hSeek001(hpc)
hSeek002(hpc)
hTell002(hpc)
ioeGetHandle001(hpc)
launchbury(hpc)
num005(hpc)
num009(hpc)
num014(hpc)
process004(hpc)
process006(hpc)
renameFile001(hpc)
show001(hpc)
signals002(hpc)

arith002(threaded1)
arith005(threaded1)
dsrun022(threaded1)

conc042(threaded2)
conc043(threaded2)
conc044(threaded2)
conc045(threaded2)
concprog002(threaded2)

andy_cherry(normal,optc,hpc,optasm,threaded1,threaded2)
annrun01(normal,optc,hpc,optasm,threaded1,threaded2)
arith011(normal,optc,hpc,optasm,ghci,threaded1,threaded2)
bits(normal,optc,hpc,optasm,ghci,threaded1,threaded2)
enum01(normal,optc,hpc,optasm,threaded1,threaded2)
enum02(normal,optc,hpc,optasm,ghci,threaded1,threaded2)
ffi009(normal,optc,hpc,optasm,threaded1,threaded2)
ffi019(optc,hpc,optasm,ghci,threaded1,threaded2)
genUpTo(normal,optc,hpc,optasm,ghci,threaded1,threaded2)
hClose002(normal,optc,hpc,optasm,ghci,threaded1,threaded2)
hGetBuf001(normal,optc,hpc,optasm,threaded1,threaded2)
integerConversions(normal,optc,hpc,optasm,threaded1,threaded2)
num012(normal,optc,hpc,optasm,ghci,threaded1,threaded2)
process007(normal,optc,hpc,optasm,ghci,threaded1,threaded2)
strings(normal,optc,hpc,optasm,ghci,threaded1,threaded2)
tcrun007(normal,optc,hpc,optasm,ghci,threaded1,threaded2)
user001(normal,optc,hpc,optasm,ghci,threaded1,threaded2)



A good percentage of these are seg faulting, so there is still a bug or two lurking around.

Continued to split the native code generator into arch specific modules while waiting for test runs. I'm splitting into Alpha, X86, PPC, and SPARC subdirs. The Alpha code is long dead, but it seems a shame to just delete it, so I've made modules for it but commented it out in those modules.

Due to time constraints I'll have to leave some of the #ifdefs in, especially for the difference between i386 and x86_64. There are also some #ifdefs to choose between Darwin and other OSs.

The way it's set up now is that (almost) all code from all arches gets compiled, no matter what the host platform is. This means that potential validate problems are always exposed. It should also help reduce the rot-factor for seldom used platforms.

The top level modules like MachCodeGen still have an #ifdef to choose which one to import for the specific platform. However, it should be straightforward to go from this setup to having a data type representing the target platform, and being able to specify it on the command line.
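To make that last idea concrete, here is a minimal sketch of what such a data type might look like. The names Arch, parseArch and archSubdir are hypothetical, as is the assumption that the platform name arrives as a plain string from a command line flag. None of this is GHC's actual API.

```haskell
import Data.Char (toLower)

-- Hypothetical: one constructor per code generator subdir.
data Arch = ArchAlpha | ArchX86 | ArchPPC | ArchSPARC
  deriving (Eq, Show)

-- Parse an architecture name, as it might be given on the command line.
-- Matching is case-insensitive; unknown names give Nothing.
parseArch :: String -> Maybe Arch
parseArch name = case map toLower name of
  "alpha" -> Just ArchAlpha
  "x86"   -> Just ArchX86
  "ppc"   -> Just ArchPPC
  "sparc" -> Just ArchSPARC
  _       -> Nothing

-- Pick the arch specific subdir with an ordinary case expression
-- at run time, rather than with an #ifdef at compile time.
archSubdir :: Arch -> String
archSubdir arch = case arch of
  ArchAlpha -> "Alpha"
  ArchX86   -> "X86"
  ArchPPC   -> "PPC"
  ArchSPARC -> "SPARC"
```

The point of the sketch is only that the choice of target becomes a value the compiler can inspect, so every arch specific module can stay compiled in.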

On the SPARC front, I've just finished cleaning up the horror show that was nativeGen/MachRegs.hs. I'll spare you the details, but this source comment (not mine) sums it up nicely: "Gag me with a spoon, eh?"

It feels a lot simpler when all the SPARC code is isolated, and you don't have to expend mental effort to reject code for architectures you're not working on. It's also reassuring to know that if you change something you're not going to break something for another architecture that is in an #ifdef block that isn't being compiled on the current platform.

Also went through and fixed all the warnings in code from MachRegs, MachInstrs, RegAllocInfo and PprMach. The world seems shiny and new.

Monday, February 2, 2009

Join Points

There were no updates last week on account of it being the Australia Day public holiday on Monday, and me going to fp-syd later in the week.

Although the SPARC NCG now passes almost all of the testsuite, and can compile the Haskell source of GHC itself, it still has some problems when compiling the RTS. Unlike the Cmm code emitted when compiling Haskell source, the hand written Cmm code in the RTS contains loops as well as join points.

Mid last week I hit a bug in the linear register allocator, but didn't finish fixing it. Here is some of the assembler code emitted when compiling rts/PrimOps.cmm:

.Lch:
....
st %l0,[%l6]
add %l6,4,%l6
b .Lci <--- JUMP to .Lci
nop

.Lci:
st %l0,[%i6-84]
ld [%i6-76],%l0
st %l0,[%i6-76]
sll %l0,2,%l0
....

.Lcj:
....
st %l6,[%i6-76]
add %l0,8,%l6
b .Lci <--- JUMP to .Lci
nop

.Ln1l:
b .Lci <--- JUMP to .Lci
mov %l7,%l0
ld [%i6-100],%l7 <--- ??? slot 100 never assigned to
b .Lci <--- unreachable code??
nop


Label .Lci is a join point because it is the target of several jumps. The code in block .Ln1l is supposed to be fixup code. Fixup code is used to make sure that register assignments match up for all jumps to a particular label.

The emitted code is badly broken though:
1) .Ln1l is unreachable, nothing ever jumps to it.
2) There should probably be a nop after the first branch instruction in .Ln1l to fill the SPARC branch delay slot. That's if that first jump is supposed to be there at all.
3) The instruction ld [%i6-100],%l7 loads from spill slot 100, but that slot is not stored to anywhere else in the code.
4) The three instructions after the first are also unreachable.

This is a bit funny considering I spent July-Sept 2007 at GHC HQ writing a new graph colouring register allocator for GHC, specifically because the linear allocator was known to be dodgy.

Alas, the graph allocator suffered the fate of many such "replacement projects". It is a little slower on some examples, and it comes with an ambient FUD cloud because it is fresh code. Sure, I could just use the graph allocator for SPARC, but that would still leave known bugs in the linear allocator for someone else to trip over. Until such time as we ditch the linear allocator entirely, for all architectures, we still need to maintain it.

The reason .Ln1l wasn't being referenced was that the code to change the destination of the jump instruction at the end of block .Lch was missing. ... Finally got sick enough of the organisation of RegAllocLinear.hs that I spent most of the day splitting it out into its own set of modules, which was well overdue.

Some time later...

It seems as though the code in block .Ln1l is supposed to be swapping the values in regs %l7 and %l0. In block .Lch vreg %vI_co is in %l7 and %vI_cp is in %l0, but the block .Lci wants them the other way around. What a nightmare.
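For anyone who hasn't met this problem before, here is a small sketch of why a swap is awkward for fixup code. It is a toy model under my own assumptions, not the code in RegAllocLinear.hs: registers are plain strings, and the names Assignment, Instr, fixupMoves and sequentialise are all made up for the illustration. It assumes each assignment puts different vregs in different real registers. fixupMoves works out which register moves an edge into a join point needs, and sequentialise orders them so that no register is overwritten before it has been read, breaking any cycle (such as a swap) by going through a spill slot.

```haskell
import Data.Graph (SCC (..), stronglyConnComp)
import qualified Data.Map as Map

type VReg       = String                -- virtual register, e.g. "%vI_co"
type RealReg    = String                -- real register, e.g. "%l7"
type Assignment = Map.Map VReg RealReg  -- where each vreg currently lives

data Instr
  = Move RealReg RealReg  -- Move src dst
  | Spill RealReg Int     -- store a register to a spill slot
  | Reload Int RealReg    -- load a spill slot into a register
  deriving (Eq, Show)

-- Moves (from, to) needed so that the assignment at the end of the
-- jumping block matches the one the join point expects.
fixupMoves :: Assignment -> Assignment -> [(RealReg, RealReg)]
fixupMoves src dst =
  [ (from, to)
  | (vreg, to) <- Map.toList dst
  , Just from <- [Map.lookup vreg src]
  , from /= to
  ]

-- Turn a set of simultaneous moves into a safe sequence of instructions.
-- The first argument is the (hypothetical) spill slot to use for a cycle.
sequentialise :: Int -> [(RealReg, RealReg)] -> [Instr]
sequentialise slot moves = concatMap emit (stronglyConnComp graph)
  where
    -- A move that writes `to` depends on every move that reads `to`.
    -- stronglyConnComp returns dependencies first, so readers come out
    -- before the move that clobbers their source.
    graph =
      [ ((from, to), to, [to' | (from', to') <- moves, from' == to])
      | (from, to) <- moves
      ]
    emit (AcyclicSCC (from, to)) = [Move from to]
    -- A cycle: save one source to the slot, do the remaining moves
    -- (now a chain), then reload the saved value into its destination.
    emit (CyclicSCC ((from, to) : rest)) =
      [Spill from slot] ++ sequentialise (slot + 1) rest ++ [Reload slot to]
    emit (CyclicSCC []) = []
```

Feeding it the situation above (%vI_co in %l7 and %vI_cp in %l0 on the way in, the other way around at .Lci) gives two moves that form a cycle, which come out as a spill, a single move, and a reload. The important part is that the spill really has to be emitted, otherwise the reload reads a slot nobody wrote to.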

More time later...

From handleComponents in RegAllocLinear.hs:

-- OH NOES!
(_, slot) <- spillR (RealReg sreg) spill_id

The _ is supposed to match the instruction, generated by spillR, that stores the value in sreg into the spill slot at [%i6-100]. The comment is mine.
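To spell out why that wildcard matters, here is a toy model of the situation. It is only a sketch under my own assumptions: spillToy is a made-up stand-in for spillR that returns a store instruction together with the slot number, and buggyFixup and fixedFixup are hypothetical names, not functions in GHC. Keeping the instruction is presumably what the real code needs to do as well, but the sketch shows the shape of the problem rather than the actual fix.

```haskell
data Instr
  = Store String Int  -- store a register to a spill slot
  | Load Int String   -- load a spill slot into a register
  deriving (Eq, Show)

-- Made-up stand-in for spillR: returns the store instruction that
-- writes the register to the slot, along with the slot itself.
spillToy :: String -> Int -> (Instr, Int)
spillToy reg slot = (Store reg slot, slot)

-- Discards the store, like the wildcard pattern does. The emitted
-- code then loads from a slot that was never written.
buggyFixup :: String -> String -> Int -> [Instr]
buggyFixup src dst slot =
  let (_, s) = spillToy src slot
  in  [Load s dst]

-- Keeps the store and emits it before the load.
fixedFixup :: String -> String -> Int -> [Instr]
fixedFixup src dst slot =
  let (store, s) = spillToy src slot
  in  [store, Load s dst]
```

In this model, buggyFixup "%l0" "%l7" 100 produces a lone load from slot 100, which has the same shape as the ld [%i6-100],%l7 with no matching store in the assembler listing.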