GHC on SPARC: 2009-01-11

Thursday, January 15, 2009

Info tables

The info tables are getting broken for some reason. The Cmm code has:

sEl_ret()
        { [const Main.$wf_srt-sEl_info;, const 1;, const 2228231;]
        }

-fvia-c gives:


    .text
        .align 4
        .long Main_zdwf_srt - sEl_info
        .long 1
        .long 2228231
sEl_info:
        ....

but -fasm gives:


    .text
        .align 4
        .long Main_zdwf_srt+0         <---------- lost sEl_info
        .long 1
        .long 2228231
sEl_info:
        ....

some time later..


#if sparc_TARGET_ARCH
-- ToDo: This should really be fixed in the PIC support, but only
-- print a for now.
pprImm (ImmConstantDiff a b) = pprImm a 
#else
pprImm (ImmConstantDiff a b) = pprImm a <> char '-'
                            <> lparen <> pprImm b <> rparen
#endif

!!!! .... sigh.
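
For reference, a minimal sketch of what the non-SPARC branch does, using a toy Imm type and plain strings instead of GHC's real Imm and Doc types (both stand-ins, and the name renderImm, are assumptions made for illustration):

```haskell
-- Toy stand-in for the NCG immediate type; only the cases needed here.
data Imm
  = ImmLabel String          -- a symbol such as Main_zdwf_srt
  | ImmInt Integer           -- a literal
  | ImmConstantDiff Imm Imm  -- difference of two immediates, a - b

-- Render an immediate as assembler text. The difference case has to print
-- both operands: printing only the first gives an absolute address where
-- the info table expects an offset relative to the info label.
renderImm :: Imm -> String
renderImm (ImmLabel l)          = l
renderImm (ImmInt i)            = show i
renderImm (ImmConstantDiff a b) = renderImm a ++ "-(" ++ renderImm b ++ ")"
```

Applied to ImmConstantDiff (ImmLabel "Main_a_srt") (ImmLabel "Main_a_info"), this produces Main_a_srt-(Main_a_info), the same shape as the offset in the listing that follows.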

Fixed that, but it still doesn't work. Here's the code for a whole closure:

    .data                        <-------------------------------- data
    .align 8
        .global Main_a_srt
    Main_a_srt:
        .long Main_lvl_closure
        .long base_GHCziHandle_stdout_closure
        .long base_GHCziIO_a28_closure
    .data
        .align 8
        .global Main_a_closure
    Main_a_closure:
        .long Main_a_info
        .long 0
    .text                              <-------------------------------- text
        .align 4
        .long Main_a_srt-(Main_a_info)+0        <-------- offset text to data
        .long 196609
        .long 0
        .long 983047
        .global Main_a_info
    Main_a_info:
    .LcGp:
        sethi %hi(base_GHCziHandle_stdout_closure),%l2
        or %l2,%lo(base_GHCziHandle_stdout_closure),%l2
        sethi %hi(Main_lvl_closure),%l3
        or %l3,%lo(Main_lvl_closure),%l3
        call base_GHCziIO_a28_info,0
        nop

SRTs (Static Reference Tables) are supposedly used for garbage collecting CAFs, but the GHC commentary page on them seems out of date, or missing. In any event, the assembler can't make an offset between labels in the .text and .data segments. On some architectures .text and .data use entirely separate address spaces. This probably got broken in a previous GHC release when info tables were moved to be next to the code. Checking against x86 reveals it does the same thing, but the x86 assembler is ok with cross-segment offsets.

I ended up just changing the pretty printer so it prints out ReadOnlyData segments as .text, which is a bit nasty. I'll be able to handle this in a nicer way when the sparc NCG is factored out into its own set of modules.
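
A rough sketch of that workaround, with hypothetical Section and Target types rather than the real NCG ones. The directive shown for the non-SPARC read-only case is an assumption; only the SPARC override comes from the description above:

```haskell
-- Hypothetical section and target types, for illustration only.
data Section = Text | Data | ReadOnlyData
  deriving (Eq, Show)

data Target = X86 | SPARC
  deriving (Eq, Show)

-- Choose the assembler directive for a section. On SPARC, read-only data is
-- emitted into .text so that label differences against info tables stay
-- inside a single section.
sectionDirective :: Target -> Section -> String
sectionDirective _     Text         = ".text"
sectionDirective _     Data         = ".data"
sectionDirective SPARC ReadOnlyData = ".text"
sectionDirective _     ReadOnlyData = ".section .rodata"  -- assumed default
```

Keeping the override in one function means it can be removed in one place once the SPARC backend has its own modules.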

Win!


      56 expected passes
       1 expected failures
       0 unexpected passes
       3 unexpected failures

Unexpected failures:
   2080(optasm)      -- wrong output
   cg015(optasm)     -- unknown unary match op
   cg054(optasm)     -- genSwitch

That seems to have fixed the seg faulting ones.

Working on 2080.hs

-- cmm code is
        _sFv::I32 = %MO_UU_Conv_W8_W32(I8[R2]);   <--- load unsigned
        _sFx::I32 = _sFv::I32;
        _cG7::I32 = %MO_S_Le_W32(_sFx::I32, 127);
        if (_cG7::I32 >= 1) goto cGa;

-- -fvia-c gives:
        ldub [%l2], %g1                           <--- load unsigned
        cmp %g1, 127
        ble,pt %icc, .LL5
         sethi %hi(ghczmprim_GHCziBool_True_closure), %g1
        ...

-- -fasm gives:
        ldsb [%l2],%l0                            <--- load signed
        srl %l0,0,%l0                             <--- nop
        cmp %l0,127
        ble .LcGa
        nop
        ...

This was an easy operand format problem. However, I did notice that the Cmm code only ever loads unsigned data, like I8[R2]. If you were going to load signed data it would be better to use the sparc ldsb instruction which sign extends the byte in one go, versus doing an unsigned load then sign extending it separately. Another task for a simple peephole optimizer....
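
A minimal sketch of such a peephole rule over a toy instruction type. The Instr constructors are made up for illustration and are not the NCG's real instruction type; only the in-place case is fused, so no liveness information is needed:

```haskell
-- Toy instruction set; register and address operands are plain strings.
data Instr
  = LoadU8 String String     -- unsigned byte load:   addr, dst (like ldub)
  | LoadS8 String String     -- signed byte load:     addr, dst (like ldsb)
  | SignExt8 String String   -- sign extend low byte: src, dst
  | Other String             -- anything else, left untouched
  deriving (Eq, Show)

-- Fuse "unsigned load into r; sign extend r into r" into one signed load.
-- Any other sequence is passed through unchanged.
peephole :: [Instr] -> [Instr]
peephole (LoadU8 addr r : SignExt8 src dst : rest)
  | r == src && src == dst = LoadS8 addr r : peephole rest
peephole (i : rest) = i : peephole rest
peephole []         = []
```

Handling a sign extension into a different register would need to know that the loaded register is dead afterwards, which is why this sketch leaves that case alone.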

Also fixed the unknown unary match op problem - sign extension code was unfinished. genSwitch here we come.

Wednesday, January 14, 2009

Liveness lies

Yesterday I fixed the linear allocator to handle floating point register twinning, or at least I thought I did. The output code looked ok, but the programs I tried still crashed. I ended up spending the rest of the day writing a tool (mayet) to compare the -fasm and -fvia-c versions. Mayet takes the two .s files and splits them up into parts belonging to the individual closures. It then slowly substitutes the dubious -fasm sections for the known good -fvia-c sections.
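
The substitution step can be sketched roughly as below. This is not mayet's actual code: it assumes the two .s files have already been split into named per-closure chunks, and the types and the name substitute are hypothetical.

```haskell
type Chunk   = [String]            -- assembly lines for one closure
type Program = [(String, Chunk)]   -- closures in file order, keyed by name

-- Start from the known-good program and swap in the suspect version of the
-- named closures, keeping the original order. Closures missing from the
-- suspect program are left as they were.
substitute :: [String] -> Program -> Program -> Program
substitute names suspect good =
  [ (name, pick name chunk) | (name, chunk) <- good ]
  where
    pick name chunk
      | name `elem` names = maybe chunk id (lookup name suspect)
      | otherwise         = chunk
```

Concatenating the chunks of the result gives a .s file that can be assembled and run to see whether the substituted closures break the program.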

Last night I got enough of it working to find a bad -fasm closure in cg034, which tests out floating point math. Curiously, the closure itself didn't do any float or double math. This morning I hand adjusted the -fvia-c version to look like the -fasm one until it exhibited the same problem.

After some wibbling around found this:

             ld [%l1+12],%vI_s1GO
                    # born:    %vI_s1GO
                     
             cmp %vI_s1GO,0
                    # r_dying: %vI_s1GO       <--------- LIES
                     
             bne .Lc2cm

                ......
                ......
        c2cm:
             ld [%l1+8],%vI_n2cH
                    # born:    %vI_n2cH
                     
             st %vI_n2cH,[%i0-12]
                    # r_dying: %vI_n2cH
                     
             sethi %hi(base_GHCziFloat_a_closure),%l2
                     
             or %l2,%lo(base_GHCziFloat_a_closure),%l2
                     
             or %g0,%vI_s1GO,%l3
                    # r_dying: %vI_s1GO       <---------

This is a dump of register liveness information. The line marked LIES shows that the allocator thinks that variable %vI_s1GO isn't used after the cmp instruction. Unfortunately, after the branch, it's used in an or. The vreg %vI_n2cH got allocated to the same register as %vI_s1GO, clobbering the contained value and causing the crash.

Turns out the register liveness determinator wasn't treating BI and BF as though they were branch instructions, so liveness information wasn't being propagated across the basic blocks properly.
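
To show why a missed branch produces exactly this symptom, here is a small backward liveness sketch over a toy instruction type. It is not the GHC liveness code: the types and names are hypothetical, only virtual registers are tracked, and fall-through edges are ignored to keep it short.

```haskell
import Data.List (nub, sort, union, (\\))
import Data.Maybe (fromMaybe)

-- Toy instruction: registers read, registers written, possible jump targets.
data Instr = Instr
  { uses    :: [String]
  , defs    :: [String]
  , targets :: [String]
  }

type Block = (String, [Instr])   -- label and body

-- Walk a block backwards: a register is live before an instruction if the
-- instruction reads it, or if it is live afterwards and not overwritten.
liveIn :: [String] -> [Instr] -> [String]
liveIn = foldr step
  where step i live = uses i `union` (live \\ defs i)

-- Successors when branches are recognised as jumps...
branchSuccs :: Block -> [String]
branchSuccs (_, is) = concatMap targets is

-- ...and when they are not, which models the bug described above.
noSuccs :: Block -> [String]
noSuccs _ = []

-- Live-in sets for all blocks, iterated from empty sets to a fixed point.
liveInAll :: (Block -> [String]) -> [Block] -> [(String, [String])]
liveInAll succs blocks = go [ (l, []) | (l, _) <- blocks ]
  where
    go env
      | env' == env = env
      | otherwise   = go env'
      where
        env' = [ (l, sort (nub (liveIn (liveOut env b) is))) | b@(l, is) <- blocks ]
    liveOut env b = concat [ fromMaybe [] (lookup s env) | s <- succs b ]

-- Registers live on exit from the named block.
liveOutOf :: (Block -> [String]) -> [Block] -> String -> [String]
liveOutOf succs blocks name =
  sort (nub (concat [ fromMaybe [] (lookup s env) | s <- succs' ]))
  where
    env    = liveInAll succs blocks
    succs' = maybe [] (\is -> succs (name, is)) (lookup name blocks)

-- The two blocks from the dump, reduced to uses, defs and targets.
example :: [Block]
example =
  [ ("entry", [ Instr [] ["s1GO"] []          -- ld  -> s1GO
              , Instr ["s1GO"] [] []          -- cmp s1GO, 0
              , Instr [] [] ["c2cm"] ])       -- bne c2cm
  , ("c2cm",  [ Instr [] ["n2cH"] []          -- ld  -> n2cH
              , Instr ["n2cH"] [] []          -- st  n2cH
              , Instr ["s1GO"] [] [] ])       -- or  %g0, s1GO, %l3
  ]
```

With branchSuccs, s1GO is live on exit from the entry block, so it cannot die at the cmp. With noSuccs nothing is live on exit, the cmp looks like the last use, and the allocator is free to reuse the register.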

Fixing that problem stopped cg034 from crashing, though it still gave the wrong answer. During debugging, noticed that if ghc is executed with -v or -ddump-reg-liveness then the top level labels emitted in the .s file change - which confuses mayet. Hmm.. let that be a lesson to all of us: changing compiler flags should not change top level names, if at all possible.

More digging

        (_s1Ri::F32,) = foreign "ccall" 
               __encodeFloat((_c2sm::I32, `signed'), (_c2sn::I32, PtrHint),
                             (_c2so::I32, `signed'))[_unsafe_call_];
        F32[Sp] = _s1Ri::F32;

Is translated to:

        call __int_encodeFloat,2
        nop

        st %f28,[%i0]         <- BOGUS %f28

A floating point return value should be placed in %f0, but for some reason the GHC code that does just that was missing. Fixed that, and it almost works... just gives the wrong answer.

Loading of doubles looks broken.

via-c says:


        ld [%l1+3], %f8
        fitod %f8, %f2

but the NCG does:


        ld [%l1+3],%l0
        st %l0,[%o6-8]
        ld [%o6-8],%f10
        fitos %f10,%f10

Hmm.

Remember that comment from a few days ago:

-- ToDo: Verify correctness

Turns out it wasn't correct.. Who would have known :P

That fixed cg034 and cg035. Now we're down to:


Unexpected failures:
   2080(optasm)   -- segv
   cg015(optasm)  -- unknown unary match op
   cg021(optasm)  -- segv
   cg022(optasm)  -- segv
   cg026(optasm)  -- segv 
   cg044(optasm)  -- segv
   cg046(optasm)  -- segv
   cg054(optasm)  -- genSwitch 
   cg058(optasm)  -- segv
   cg060(optasm)  -- segv

Monday, January 12, 2009

Bootstrapping 7

I'm still chasing down the source of that bug from last time. Spent some time re-reading the STG paper, and trolling through the RTS code.

Running the program with +RTS -DS didn't help, but I noticed a flag -Da that checks the format of closures while the program runs. However, it seemed to be disabled, or unused. There is code for it in the RTS, but it doesn't show up in +RTS --help. Running the program with +RTS -Da gives:

benl@mavericks:~/devel/ghc/ghc-HEAD-native/tmp> ./Main +RTS -DS -Da
stg_ap_v_ret... PAP/1(3f5212, 3ef238)
stg_ap_0_ret... Bus error (core dumped)

Went chasing through the RTS code looking for the source of the Bus Error. -Da causes Printer.c:printClosure to be invoked to print out a description of each closure, but the pointers passed to it are misaligned. That is, they still contain pointer tag bits. Reread the dynamic pointer tagging paper, then fixed the RTS code.
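
The idea behind the fix, sketched in Haskell rather than the RTS's C. The names tagMask, untag and getTag are illustrative only, and the sketch assumes a 32-bit target, where word alignment leaves the low two bits of a closure pointer free for the tag:

```haskell
import Data.Bits (complement, (.&.))
import Data.Word (Word32)

-- On a 32-bit target closures are word aligned, so the low two bits of a
-- closure pointer are free to carry a tag.
tagMask :: Word32
tagMask = 3

-- Strip the tag to recover the real, aligned closure address.
untag :: Word32 -> Word32
untag p = p .&. complement tagMask

-- Extract the tag bits on their own.
getTag :: Word32 -> Word32
getTag p = p .&. tagMask
```

Taking the first address from the output above as an example value, untag 0x3f5212 gives 0x3f5210 and getTag 0x3f5212 gives 2. Dereferencing the tagged value directly is an unaligned access, which SPARC reports as a bus error.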

Ended up doing a binary-ish search to find the problem. Starting with a known good .s file generated with -fvia-c, slowly copied dubious sections of the -fasm version into it, testing along the way. This works because in the current STG -> Cmm translation, top level STG functions only share data via pinned registers. This is sort of like a standard calling convention for GHC functions, so it doesn't matter if GHC and GCC do register allocation a little differently. It might be worthwhile automating this process if I hit similar problems in the future.
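
An automated version could look something like this sketch. The oracle ok is hypothetical: in practice it would build and run the program with the named closures taken from the -fasm file and everything else from the -fvia-c file, which is an IO action; it is a pure function here to keep the example short. The sketch also assumes a single faulty closure that fails independently of the others.

```haskell
-- Find one faulty closure by repeated halving. 'ok names' reports whether
-- the program still works when exactly those closures come from the
-- suspect file.
bisect :: ([String] -> Bool) -> [String] -> Maybe String
bisect _  []     = Nothing
bisect ok [name]
  | ok [name] = Nothing
  | otherwise = Just name
bisect ok names
  | not (ok front) = bisect ok front
  | otherwise      = bisect ok back
  where
    (front, back) = splitAt (length names `div` 2) names
```

Each step halves the candidate list, so even a module with a few hundred closures needs only a handful of build-and-run cycles.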

Anyway, it turns out there's a world of difference between
mov %l1, %l2 and mov %l2, %l1...

With that fixed, ran the should_run codeGen tests and got

OVERALL SUMMARY for test run started at Monday, 12 January 2009  5:23:10 PM EST
      61 total tests, which gave rise to
     427 test cases, of which
       0 caused framework failures
     367 were skipped

      42 expected passes
       1 expected failures
       0 unexpected passes
      17 unexpected failures

Unexpected failures:
   1852(optasm)        -- regalloc
   1861(optasm)        -- regalloc
   2080(optasm)        -- wrong output
   cg015(optasm)       -- unknown unary match op
   cg018(optasm)       -- regalloc
   cg021(optasm)       -- segv
   cg022(optasm)       -- segv
   cg024(optasm)       -- regalloc
   cg026(optasm)       -- regalloc
   cg028(optasm)       -- regalloc
   cg034(optasm)       -- regalloc
   cg035(optasm)       -- regalloc
   cg044(optasm)       -- regalloc
   cg046(optasm)       -- segv
   cg054(optasm)       -- genSwitch not implemented
   cg058(optasm)       -- segv
   cg060(optasm)       -- segv

The ones marked regalloc die with:

ghc: panic! (the 'impossible' happened)
  (GHC version 6.11.20090110 for sparc-sun-solaris2):
        RegAllocLinear.allocRegsAndSpill: no spill candidates

Repaired the rot in the linear register allocator. The free register map only worked for x86(_64) and PPC. Now we've got:

   2080(optasm)    -- wrong output
   cg015(optasm)   -- unknown unary match op
   cg021(optasm)   -- segv
   cg022(optasm)   -- segv
   cg026(optasm)   -- segv
   cg034(optasm)   -- regalloc FF64
   cg035(optasm)   -- regalloc FF64
   cg044(optasm)   -- regalloc FF64
   cg046(optasm)   -- segv
   cg054(optasm)   -- genSwitch
   cg058(optasm)   -- segv
   cg060(optasm)   -- segv

The others marked regalloc are dying because the allocator is messing up the float register twinning. That'll be tomorrow's problem.
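
For context, a SPARC double lives in an even/odd pair of single-precision float registers, so the allocator has to claim both halves together. A minimal sketch of that constraint, assuming a hypothetical bitmap representation of the free registers (this is not the actual GHC free register map):

```haskell
import Data.Bits (clearBit, testBit)
import Data.Word (Word32)

-- Bit n set means single-precision register %f<n> is free.
type FreeFloatRegs = Word32

-- Find a free even/odd pair for a double and mark both halves as taken.
-- Returns the even register number and the updated map.
allocDouble :: FreeFloatRegs -> Maybe (Int, FreeFloatRegs)
allocDouble free =
  case [ n | n <- [0, 2 .. 30], testBit free n, testBit free (n + 1) ] of
    []      -> Nothing
    (n : _) -> Just (n, clearBit (clearBit free n) (n + 1))
```

An allocator that only marks the even register as taken will later hand out the odd half to a single-precision value, clobbering part of the double.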
