Thursday, January 15, 2009

Info tables

The info tables are getting broken for some reason. The Cmm code has:

sEl_ret()
{ [const Main.$wf_srt-sEl_info;, const 1;, const 2228231;]
}

-fvia-c gives:

.text
.align 4
.long Main_zdwf_srt - sEl_info
.long 1
.long 2228231
sEl_info:
....

but -fasm gives:

.text
.align 4
.long Main_zdwf_srt+0 <---------- lost sEl_info
.long 1
.long 2228231
sEl_info:
....

some time later..

#if sparc_TARGET_ARCH
-- ToDo: This should really be fixed in the PIC support, but only
-- print a for now.
pprImm (ImmConstantDiff a b) = pprImm a
#else
pprImm (ImmConstantDiff a b) = pprImm a <> char '-'
    <> lparen <> pprImm b <> rparen
#endif

!!!! .... sigh.
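The fix amounts to printing both operands on every target. As a minimal standalone sketch (plain strings and a hypothetical, cut-down Imm type stand in for GHC's real pretty-printer and its full set of constructors), the non-special-cased printing looks roughly like this:

```haskell
-- Hypothetical, cut-down immediate type for illustration only;
-- GHC's real Imm has more constructors and uses a Doc-based printer.
data Imm
  = ImmLabel String
  | ImmInt Integer
  | ImmConstantDiff Imm Imm

-- | Render an immediate as assembler text.
-- The difference case prints BOTH operands, as the non-sparc branch does.
pprImm :: Imm -> String
pprImm (ImmLabel l)          = l
pprImm (ImmInt i)            = show i
pprImm (ImmConstantDiff a b) = pprImm a ++ "-(" ++ pprImm b ++ ")"
```

With this, `pprImm (ImmConstantDiff (ImmLabel "Main_zdwf_srt") (ImmLabel "sEl_info"))` keeps the `sEl_info` term instead of silently dropping it.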

Fixed that, but it still doesn't work. Here's the code for a whole closure:

    .data                        <-------------------------------- data
.align 8
.global Main_a_srt
Main_a_srt:
.long Main_lvl_closure
.long base_GHCziHandle_stdout_closure
.long base_GHCziIO_a28_closure
.data
.align 8
.global Main_a_closure
Main_a_closure:
.long Main_a_info
.long 0
.text <-------------------------------- text
.align 4
.long Main_a_srt-(Main_a_info)+0 <-------- offset text to data
.long 196609
.long 0
.long 983047
.global Main_a_info
Main_a_info:
.LcGp:
sethi %hi(base_GHCziHandle_stdout_closure),%l2
or %l2,%lo(base_GHCziHandle_stdout_closure),%l2
sethi %hi(Main_lvl_closure),%l3
or %l3,%lo(Main_lvl_closure),%l3
call base_GHCziIO_a28_info,0
nop

SRTs (Static Reference Tables) are supposedly used for garbage collecting CAFs, but the GHC commentary page on them seems out of date, or missing. In any event, the sparc assembler can't make an offset between labels in .text and .data segments. In some architectures .text and .data use entirely separate address spaces. This probably got broken in a previous GHC release when info tables were moved to be next to the code. Checking against x86 reveals it does the same thing, but the x86 assembler is ok with cross segment offsets.

I ended up just changing the pretty printer so it prints out ReadOnlyData segments as .text, which is a bit nasty. I'll be able to handle this in a nicer way when the sparc NCG is factored out into its own set of modules.
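Roughly, the workaround looks like the sketch below. The Section type and the function name sectionDirective are made up for illustration and are not the actual NCG code; the only point is that read-only data is emitted under the same directive as code, so label differences stay inside one segment.

```haskell
-- Hypothetical section type, for illustration only.
data Section = Text | Data | ReadOnlyData
  deriving (Eq, Show)

-- | Assembler directive used to open a section.
-- ReadOnlyData is deliberately printed as .text (the "nasty" workaround),
-- so that offsets between info tables and read-only tables are
-- offsets within a single segment.
sectionDirective :: Section -> String
sectionDirective Text         = ".text"
sectionDirective Data         = ".data"
sectionDirective ReadOnlyData = ".text"
```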

Win!

56 expected passes
1 expected failures
0 unexpected passes
3 unexpected failures

Unexpected failures:
2080(optasm) -- wrong output
cg015(optasm) -- unknown unary match op
cg054(optasm) -- genSwitch

That seems to have fixed the seg faulting ones.

Working on 2080.hs

-- cmm code is
_sFv::I32 = %MO_UU_Conv_W8_W32(I8[R2]); <--- load unsigned
_sFx::I32 = _sFv::I32;
_cG7::I32 = %MO_S_Le_W32(_sFx::I32, 127);
if (_cG7::I32 >= 1) goto cGa;

-- -fvia-c gives:
ldub [%l2], %g1 <--- load unsigned
cmp %g1, 127
ble,pt %icc, .LL5
sethi %hi(ghczmprim_GHCziBool_True_closure), %g1
...

-- -fasm gives:
ldsb [%l2],%l0 <--- load signed
srl %l0,0,%l0 <--- nop
cmp %l0,127
ble .LcGa
nop
...

This was an easy operand format problem. However, I did notice that the Cmm code only ever loads unsigned data, like I8[R2]. If you were going to load signed data it would be better to use the sparc ldsb instruction which sign extends the byte in one go, versus doing an unsigned load then sign extending it separately. Another task for a simple peephole optimizer....
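To see why the signed load gives the wrong output, here is a small sketch that models the two loads in plain Haskell. The function names are invented for this example; only the behaviour of zero extension versus sign extension is being illustrated.

```haskell
import Data.Int  (Int8, Int32)
import Data.Word (Word8)

-- | Models ldub: zero-extend an unsigned byte to 32 bits.
zeroExtend :: Word8 -> Int32
zeroExtend = fromIntegral

-- | Models ldsb: reinterpret the byte as signed, then sign-extend.
signExtend :: Word8 -> Int32
signExtend b = fromIntegral (fromIntegral b :: Int8)

-- | The signed comparison from the Cmm above: value <= 127.
leq127 :: Int32 -> Bool
leq127 = (<= 127)
```

For a byte such as 200, `leq127 (zeroExtend 200)` is False, but `leq127 (signExtend 200)` is True because the byte is read as -56, so the branch goes the wrong way for any byte above 127.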

Also fixed the unknown unary match op problem - sign extension code was unfinished. genSwitch here we come.

Wednesday, January 14, 2009

Liveness lies

Yesterday I fixed the linear allocator to handle floating point register twinning, or at least I thought I did. The output code looked ok, but the programs I tried still crashed. I ended up spending the rest of the day writing a tool (mayet) to compare the -fasm and -fvia-c versions. Mayet takes the two .s files and splits them up into parts belonging to the individual closures. It then slowly substitutes the dubious -fasm sections for the known good -fvia-c sections.
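The core idea can be sketched in a few lines. This is not mayet's actual code: the names are hypothetical, and it makes the simplifying assumptions that each chunk starts at its top level label, that lines are already stripped of leading whitespace, and that anything before the first label can be dropped.

```haskell
-- | A named piece of assembly: (label name, lines belonging to it).
type Chunk = (String, [String])

-- | Assumption: a top level label ends in ':' and does not start
-- with '.', which would mark a local label such as .LcGp:
isTopLabel :: String -> Bool
isTopLabel l = not (null l) && last l == ':' && head l /= '.'

-- | Split assembly lines into chunks, one per top level label.
-- Lines before the first top level label are dropped.
splitChunks :: [String] -> [Chunk]
splitChunks [] = []
splitChunks (l : ls)
  | isTopLabel l = (init l, l : body) : splitChunks rest
  | otherwise    = splitChunks ls
  where
    (body, rest) = break isTopLabel ls

-- | Start from the known good chunks and swap in the dubious version
-- of every chunk whose name is listed in 'useDubious'.
mix :: [String] -> [Chunk] -> [Chunk] -> [Chunk]
mix useDubious good dubious =
  [ (name, if name `elem` useDubious then maybe body id (lookup name dubious) else body)
  | (name, body) <- good ]
```

Growing `useDubious` one name at a time, and rebuilding after each step, is what narrows the fault down to a single closure.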

Last night I got enough of it working to find a bad -fasm closure in cg034, which tests out floating point math. Curiously, the closure itself didn't do any float or double math. This morning I hand adjusted the -fvia-c version to look like the -fasm one until it exhibited the same problem.

After some wibbling around found this:

             ld [%l1+12],%vI_s1GO
# born: %vI_s1GO

cmp %vI_s1GO,0
# r_dying: %vI_s1GO <--------- LIES

bne .Lc2cm

......
......
c2cm:
ld [%l1+8],%vI_n2cH
# born: %vI_n2cH

st %vI_n2cH,[%i0-12]
# r_dying: %vI_n2cH

sethi %hi(base_GHCziFloat_a_closure),%l2

or %l2,%lo(base_GHCziFloat_a_closure),%l2

or %g0,%vI_s1GO,%l3
# r_dying: %vI_s1GO <---------


This is a dump of register liveness information. The line marked LIES shows that the allocator thinks that variable %vI_s1GO isn't used after the cmp instruction. Unfortunately, after the branch, it's used in an or. The vreg %vI_n2cH got allocated to the same register as %vI_s1GO, clobbering the contained value and causing the crash.

Turns out the register liveness determinator wasn't treating BI and BF as though they were branch instructions, so liveness information wasn't being propagated across the basic blocks properly.
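A toy model of the problem, as a hedged sketch: the types and names below are invented and are much simpler than the real allocator. Liveness is computed backwards over each block, and the registers live at the end of a block are whatever is live at the start of its jump targets. Fall-through edges are ignored to keep the example short. If the analysis fails to see the jump targets of a branch instruction, nothing is live at the end of the block.

```haskell
import Data.List  (nub, sort, (\\))
import Data.Maybe (fromMaybe)

type Reg   = String
type Label = String

-- | Toy instruction: registers read, registers written, jump targets.
data Instr = Instr { uses :: [Reg], defs :: [Reg], targets :: [Label] }

data Block = Block { blockLabel :: Label, blockInstrs :: [Instr] }

norm :: [Reg] -> [Reg]
norm = sort . nub

-- | Registers live on entry to a block, given those live on exit.
liveInOf :: [Reg] -> Block -> [Reg]
liveInOf liveOut = foldr step liveOut . blockInstrs
  where step i live = norm ((live \\ defs i) ++ uses i)

-- | Live-out sets for every block, iterated to a fixed point.
-- The first argument says which jump targets the analysis can see.
liveOuts :: (Instr -> [Label]) -> [Block] -> [(Label, [Reg])]
liveOuts jumpTargets blocks = go [ (blockLabel b, []) | b <- blocks ]
  where
    go outs
      | outs' == outs = outs
      | otherwise     = go outs'
      where
        ins   = [ (blockLabel b, liveInOf (fromMaybe [] (lookup (blockLabel b) outs)) b)
                | b <- blocks ]
        outs' = [ (blockLabel b, norm (concat [ fromMaybe [] (lookup t ins)
                                              | i <- blockInstrs b, t <- jumpTargets i ]))
                | b <- blocks ]

-- | The two blocks from the dump above, reduced to vregs only.
example :: [Block]
example =
  [ Block "entry" [ Instr [] ["s1GO"] []        -- ld
                  , Instr ["s1GO"] [] []        -- cmp
                  , Instr [] [] ["c2cm"] ]      -- bne
  , Block "c2cm"  [ Instr [] ["n2cH"] []        -- ld
                  , Instr ["n2cH"] [] []        -- st
                  , Instr ["s1GO"] [] [] ]      -- or
  ]
```

`liveOuts targets example` reports s1GO as live at the end of "entry", whereas `liveOuts (const []) example`, which models an analysis that does not recognise the branch, reports nothing live there, so s1GO looks dead after the cmp.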

Fixing that problem stopped cg034 from crashing, though it still gave the wrong answer. During debugging, noticed that if ghc is executed with -v or -ddump-reg-liveness then the top level labels emitted in the .s file change - which confuses mayet. Hmm.. let that be a lesson to all of us: changing compiler flags should not change top level names, if at all possible.

More digging

        (_s1Ri::F32,) = foreign "ccall" 
__encodeFloat((_c2sm::I32, `signed'), (_c2sn::I32, PtrHint),
(_c2so::I32, `signed'))[_unsafe_call_];
F32[Sp] = _s1Ri::F32;

Is translated to:
        call __int_encodeFloat,2
nop

st %f28,[%i0] <- BOGUS %f28

A floating point return value should be placed in %f0, but for some reason the GHC code that does just that was missing. Fixed that, and it almost works... just gives the wrong answer.

Loading of doubles looks broken.

via-c says:

ld [%l1+3], %f8
fitod %f8, %f2


but the NCG does:

ld [%l1+3],%l0
st %l0,[%o6-8]
ld [%o6-8],%f10
fitos %f10,%f10

Hmm.

Remember that comment from a few days ago:
-- ToDo: Verify correctness

Turns out it wasn't correct.. Who would have known :P

That fixed cg034 and cg035. Now we're down to:

Unexpected failures:
2080(optasm) -- segv
cg015(optasm) -- unknown unary match op
cg021(optasm) -- segv
cg022(optasm) -- segv
cg026(optasm) -- segv
cg044(optasm) -- segv
cg046(optasm) -- segv
cg054(optasm) -- genSwitch
cg058(optasm) -- segv
cg060(optasm) -- segv

Monday, January 12, 2009

Bootstrapping 7

I'm still chasing down the source of that bug from last time. Spent some time re-reading the STG paper, and trawling through the RTS code.

Running the program with +RTS -DS didn't help, but I noticed a flag -Da that checks the format of closures while the program runs. However, it seemed to be disabled, or unused. There is code for it in the RTS, but it doesn't show up in +RTS --help. Running the program with +RTS -Da gives:

benl@mavericks:~/devel/ghc/ghc-HEAD-native/tmp> ./Main +RTS -DS -Da
stg_ap_v_ret... PAP/1(3f5212, 3ef238)
stg_ap_0_ret... Bus error (core dumped)

Went chasing through the RTS code looking for the source of the Bus Error. -Da causes Printer.c:printClosure to be invoked to print out a description of each closure, but the pointers passed to it are misaligned. That is, misaligned / containing pointer tag bits. Reread the dynamic pointer tagging paper, then fixed the RTS code.
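As a reminder of what stripping the tag means, here is a small sketch in Haskell rather than the RTS's C. It assumes a 32-bit target, where the low two bits of a closure pointer carry the tag, and the names are invented for the example.

```haskell
import Data.Bits (complement, (.&.))
import Data.Word (Word32)

-- | Assumption: 32-bit target, so the tag lives in the low two bits.
tagMask :: Word32
tagMask = 3

-- | Clear the tag bits, giving back a word-aligned address.
untag :: Word32 -> Word32
untag p = p .&. complement tagMask

-- | Extract just the tag bits.
tagOf :: Word32 -> Word32
tagOf p = p .&. tagMask
```

Using the first address printed above purely as an example value, `tagOf 0x3f5212` is 2 and `untag 0x3f5212` is 0x3f5210. Dereferencing the value without untagging it first is a misaligned access, which sparc reports as a bus error.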

Ended up doing a binary-ish search to find the problem. Starting with a known good .s file generated with -fvia-c, slowly copied dubious sections of the -fasm version into it, testing along the way. This works because in the current STG -> Cmm translation, top level STG functions only share data via pinned registers. This is sort of like a standard calling convention for GHC functions, so it doesn't matter if GHC and GCC do register allocation a little differently. It might be worthwhile automating this process if I hit similar problems in the future.
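If this does get automated, the search itself is short. The sketch below is hypothetical: it assumes exactly one bad section, and it assumes a predicate `fails` that builds the program with the -fasm version of the given sections (everything else from -fvia-c) and reports whether the result misbehaves.

```haskell
-- | Find the single section whose -fasm version breaks the program.
-- 'fails suspects' should be True when the program misbehaves with
-- the dubious version of exactly those suspects swapped in.
findBad :: ([a] -> Bool) -> [a] -> Maybe a
findBad _     []  = Nothing
findBad fails [x] = if fails [x] then Just x else Nothing
findBad fails xs
  | fails left = findBad fails left
  | otherwise  = findBad fails right
  where
    (left, right) = splitAt (length xs `div` 2) xs
```

For example, with a stand-in predicate, `findBad (elem "c") ["a", "b", "c", "d"]` narrows down to `Just "c"` in two rebuilds rather than four.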

Anyway, it turns out there's a world of difference between
mov %l1, %l2 and mov %l2, %l1...

With that fixed, ran the should_run codeGen tests and got

OVERALL SUMMARY for test run started at Monday, 12 January 2009  5:23:10 PM EST
61 total tests, which gave rise to
427 test cases, of which
0 caused framework failures
367 were skipped

42 expected passes
1 expected failures
0 unexpected passes
17 unexpected failures

Unexpected failures:
1852(optasm) -- regalloc
1861(optasm) -- regalloc
2080(optasm) -- wrong output
cg015(optasm) -- unknown unary match op
cg018(optasm) -- regalloc
cg021(optasm) -- segv
cg022(optasm) -- segv
cg024(optasm) -- regalloc
cg026(optasm) -- regalloc
cg028(optasm) -- regalloc
cg034(optasm) -- regalloc
cg035(optasm) -- regalloc
cg044(optasm) -- regalloc
cg046(optasm) -- segv
cg054(optasm) -- genSwitch not implemented
cg058(optasm) -- segv
cg060(optasm) -- segv


The ones marked regalloc die with:
ghc: panic! (the 'impossible' happened)
(GHC version 6.11.20090110 for sparc-sun-solaris2):
RegAllocLinear.allocRegsAndSpill: no spill candidates


Repaired the rot in the linear register allocator. The free register map only worked for x86(_64) and PPC. Now we've got:

2080(optasm) -- wrong output
cg015(optasm) -- unknown unary match op
cg021(optasm) -- segv
cg022(optasm) -- segv
cg026(optasm) -- segv
cg034(optasm) -- regalloc FF64
cg035(optasm) -- regalloc FF64
cg044(optasm) -- regalloc FF64
cg046(optasm) -- segv
cg054(optasm) -- genSwitch
cg058(optasm) -- segv
cg060(optasm) -- segv


The others marked regalloc are dying because the allocator is messing up the float register twinning. That'll be tomorrow's problem.