Saturday, January 10, 2009

Bootstrapping 6

The build with the SPARC native code generator went through. Tried to run the test suite. For some reason when the compiler panics, the test framework runs off and tries to consume all available memory - like a fork bomb. Not sure why, maybe it was built wrong. Ended up just running individual tests by hand.

Ah, the joy of debugging assembly code. Fixed one problem where bad instructions were being generated. If you have LD [%f26 + ... ], %l1 then that's easy to find, because you can't use float regs for addressing.
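A check for this kind of bad instruction is easy to state in code. The following is only a minimal sketch: the `Reg` and `AddrMode` types and the `validAddr` function are hypothetical stand-ins invented for illustration, not the NCG's own definitions.

```haskell
-- Hypothetical, simplified model of SPARC registers and addressing modes.
-- These types are illustrative only and are not GHC's own definitions.
data Reg
  = IntReg Int    -- integer register, e.g. %l1
  | FloatReg Int  -- floating point register, e.g. %f26
  deriving (Eq, Show)

data AddrMode
  = AddrRegReg Reg Reg  -- [reg + reg]
  | AddrRegImm Reg Int  -- [reg + imm]
  deriving (Eq, Show)

-- Registers mentioned by an addressing mode.
addrRegs :: AddrMode -> [Reg]
addrRegs (AddrRegReg r1 r2) = [r1, r2]
addrRegs (AddrRegImm r _)   = [r]

isFloatReg :: Reg -> Bool
isFloatReg (FloatReg _) = True
isFloatReg _            = False

-- An address is well formed only if it mentions no float registers.
validAddr :: AddrMode -> Bool
validAddr = not . any isFloatReg . addrRegs

main :: IO ()
main = do
  print (validAddr (AddrRegImm (FloatReg 26) 8))  -- False
  print (validAddr (AddrRegImm (IntReg 17) 8))    -- True
```

Running a predicate like this over every load and store before printing the assembly would flag the LD above without having to read the .s file by eye.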

Found a program from the test suite that seg faulted and spent some time reducing it to a smaller test case.
main = print (dude (1, 2))
dude xx
 = case xx of
        (x, y) -> y
Seems to work, but
main = print (sum2 [1,2])
sum2 xx
 = case xx of
        []      -> 0
        (x:xs)  -> x + sum2 xs
always returns 0, no matter what size the list being summed is.

Spent about an hour staring at the STG, Cmm and NCG code, but couldn't see anything obviously wrong. Noticed that the current NCG doesn't use branch delay slots. There's a comment in the code saying that doing so would confuse the register allocator. This'll be a first port of call for optimization once the NCG is working again.

On the other hand, to compile the sum2 program above with -O0, GCC used 420 instructions but the NCG used only 361 (doesn't actually work though).

I'm finding it hugely useful to have the via-c path still in place. Besides the obvious fact that I can still compile the libraries with it, it's comforting to know that the Cmm code is good, and I can just concentrate on the Cmm -> Asm pass. All the tricky dynamic pointer tagging and whatnot is expressed in Cmm, which makes life a lot easier for me. It's also good to have multiple -O flags, because it's an easy way to create wibbles in the Cmm code when diagnosing bugs.

Got bored staring at assembly code, and went back to find a better test case. Ended up with:
main = print (up 0)
up x
 = case x of
        0 -> x

Note that up also takes a Num dictionary. Got:
Main: internal error: stg_ap_pp_ret
(GHC version 6.11.20090105 for sparc_sun_solaris2)
Please report this as a GHC bug:
Abort (core dumped)

Bingo! Assertions in the RTS code are my new best friends.
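To make the dictionary argument of up visible, here is a hand-desugared sketch. The `NumDict` record, its field names and `upD` are hypothetical; GHC generates its own dictionary representation. The sketch assumes equality is reachable through the Num dictionary, as it was while Eq was a superclass of Num.

```haskell
-- Hand-written stand-in for a Num dictionary. The record and field names
-- are hypothetical; only the two methods needed by 'up' are modelled.
data NumDict a = NumDict
  { numEq   :: a -> a -> Bool  -- equality, via the Eq superclass
  , fromInt :: Integer -> a    -- fromInteger
  }

-- 'up' with its dictionary made explicit. Matching on the literal 0
-- becomes an equality test against 'fromInt 0'.
upD :: NumDict a -> a -> a
upD d x
  | numEq d x (fromInt d 0) = x
  | otherwise               = error "up: no matching alternative"

-- Dictionary for Integer.
numInteger :: NumDict Integer
numInteger = NumDict { numEq = (==), fromInt = id }

main :: IO ()
main = print (upD numInteger 0)
```

So even this tiny program passes a dictionary and a value to up, which makes it a small test of calls with more than one argument.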

Wednesday, January 7, 2009

Bootstrapping 5

Re-enabled compilation of the native code generator for SPARC, then went through and plugged enough holes to get it to compile again. It looks like the representation of operand formats was changed in the recent Cmm work, but the SPARC code didn't keep up because it was hidden inside #ifdef blocks. Remember kids, #ifdef = evil. Besides operand formats and the missing genSwitch function, it doesn't look like the SPARC backend has rotted too badly.

All the NCG modules have recompiled, but I must have touched something at the root of the module dependency tree. It's recompiling TcRnDriver.lhs again, which has a 2MB intermediate .hc file and has been going for 40 mins.. sigh.

Spent some time comparing the .s files produced by GCC and the GCC for Sun Systems (gccfss) compiler. It seems that any .hs file compiled via gccfss generates bad .s code because it doesn't handle pinning of STG registers the way real GCC does.

Tried to speed up recompilation by slurping across the object and interface files for TcRnDriver from a previous build. For some reason it accepted TcRnDriver.hi but not Parser.hi

Bad interface file: dist-stage1/build/Parser.hi
Something is amiss; requested module ghc-6.11.20090105:Parser differs from
name found in the interface file ghc-6.11.20081231:Parser

Curses @ sensible checking of interface file versions! :) It appears as though the configure date (or latest patch date?) is part of the GHC version, so the backup build tree I squirreled away isn't going to help me. I'll have to leave it building overnight and remember not to run ./configure again.

In other news, a stage3 build of the HEAD + yesterday's patch worked, so at least the via-c path is good again (modulo epic slowness).

Tuesday, January 6, 2009

Bootstrapping 4

Test of patched GHC 6.10.1 produced the following:

Unexpected failures:


When building the ghc-HEAD-native with the native code generator turned on, looks like the genSwitch function for SPARC is missing from MachCodeGen.hs. Also missing are the ALLOCATABLE_REGS defs from MachRegs.lhs.

Went through the description of the register set in includes/MachRegs.h. Looks like there are 3 allocatable integer regs, 6 allocatable double regs, and 4 allocatable float regs. The graph coloring allocator currently only allocates doubles, not single precision floats as well. Will have to come back and check this.

isRegRegMove and mkRegRegMovInstr from RegAllocInfo.hs were missing. So was regDotColor from RegAllocStats.hs, isFloatSize from MachRegs.lhs

MachRegs.lhs has a function mkVReg which determines the vreg to use for a particular size word. This is the source of single precision floating point vregs. All the word size code has also rotted. The sparc Size type is different from the ones used for the i386(64) and ppc. Should come back and fix this, but for now I've just added the missing functions and kept the old size type.
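The idea behind a function like mkVReg can be sketched as below. Everything here is an assumption made for illustration: the `Size` and `VReg` types, their constructor names and `mkVRegSketch` are hypothetical and are not the definitions in MachRegs.lhs.

```haskell
-- Hypothetical operand sizes and virtual register classes. These are
-- illustrative stand-ins, not the definitions in MachRegs.lhs.
data Size = I8 | I16 | I32 | F32 | F64
  deriving (Eq, Show)

data VReg
  = VRegInt    Int  -- integer class
  | VRegFloat  Int  -- single precision class
  | VRegDouble Int  -- double precision class
  deriving (Eq, Show)

-- Choose a virtual register class from the size of the value it will hold.
-- The Int is a unique id supplied by the caller.
mkVRegSketch :: Int -> Size -> VReg
mkVRegSketch u size = case size of
  F32 -> VRegFloat u
  F64 -> VRegDouble u
  _   -> VRegInt u

main :: IO ()
main = mapM_ (print . mkVRegSketch 0) [I32, F32, F64]
```

A mapping of this shape is where a single precision class would enter the picture, even if the allocator downstream only knows how to colour doubles.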

In MachInstrs.hs i386 has OpReg but sparc has RI, which is similar. Should refactor sparc code to use OpReg.

Monday, January 5, 2009

Bootstrapping 3

Running tests on patched stage2 build of GHC 6.10.1 on sparky. Most seem to be going through, but have:

calling convention not supported on this architecture: stdcall
When checking declaration:
foreign import stdcall safe "static &p" m_stdcall
:: StablePtr a -> IO (StablePtr b)

Started teasing out the SPARC stuff from nativeGen/MachCodeGen.hs. At the moment it's a mess of #ifdefery. I'm sure #ifdefs were the best way to do it back when there were only one or two targets, but now there is code for i386, i386_64, powerpc, alpha and sparc all mixed in together.

My plan is to copy out the non-architecture-specific functions from nativeGen/MachCodeGen.hs into their own module nativeGen/MachCodeGenShared.hs. I'll split the sparc specific stuff into a set of modules under nativeGen/sparc. Once the sparc native gen works again I'll then go back and delete the sparc specific stuff from nativeGen/MachCodeGen.hs, and make it use the MachCodeGenShared.hs. This should leave the original support for all architectures untouched during development.

While going through the code, found an amusing comment in the sparc section:
-- Floating point assignment to a register/temporary
-- ToDo: Verify correctness
assignReg_FltCode :: Size -> CmmReg -> CmmExpr -> NatM InstrBlock
assignReg_FltCode pk reg src = do ...

hmmm. I wonder how long that ToDo has been there..

The build of GCC 4.3.2 died with
Configuring stage 1 in sparc-sun-solaris2.10/libgcc
checking for suffix of object files... configure: error: cannot compute
suffix of object files: cannot compile.

Investigation of the config.log reveals:
/home/benl/files/gcc/build/gcc-4.3.2-obj/./gcc/xgcc ...
conftest.c:1: internal compiler error: Segmentation Fault
Please submit a full bug report,

Tried to back off to GCC 4.2.1 then GCC 4.2.4. Both die during the build with configure problems:
config.status: executing gstdint.h commands
make[3]: Entering directory `/home/benl/files/gcc/build/gcc-4.2.4-obj/libdecnumber'
make[3]: Nothing to be done for `all'.
make[3]: Leaving directory `/home/benl/files/gcc/build/gcc-4.2.4-obj/libdecnumber'
make[3]: Entering directory `/home/benl/files/gcc/build/gcc-4.2.4-obj/gcc'
make[3]: *** No rule to make target `all'. Stop.
make[3]: Leaving directory `/home/benl/files/gcc/build/gcc-4.2.4-obj/gcc'

For some reason the configure script isn't dropping the Makefile in ./gcc.

Giving up trying to compile a more recent GCC under Solaris. Will just copy the 4.2.1 binaries across from mavericks. Let's hope we don't stumble across any more bugs in it.

Stop. Rewind. I've been tripping over myself because the builds are taking so long. I've got copies of the head on mavericks and sparky, and they're all different versions. Some of the recent patches pushed to the head also seem to have broken the build, so I'm backing up to the head as of 2009/01/01.

I'm going to stick with this version, and the same compiler flags, for all builds until the NCG is fixed. I need to be able to reuse the .o files between builds because they take too long to remake.

Finally pushed the gc_thread patch. This should fix the via-c build. I've run through the testsuite, though I'm still waiting for the actual stage2 build to finish - it's stuck on Parser.hs again.

Turned on the NCG for sparc. That'll be another night's worth of rebuilding.

Sunday, January 4, 2009

Bootstrapping 2

A test of the patched GHC 6.10.1 looks promising:
make stage=1 WAY=optc EXTRA_HC_OPTS=-L/data0/home/benl/lib

Unexpected failures:
2594(optc) -- run segv. tests 64 bit FFI.
T2486(optc) -- diff top level specialisations, ok by hand.
andy_cherry(optc) -- timed out when compiling, ok by hand.
conc020(optc) -- does nothing. not sure.
enum01(optc) -- run segv. 64 bit code? not sure.
enum02(optc) -- run segv. 64 bit code
enum03(optc) -- run segv. 64 bit code

I'm getting linker problems when trying to build gdb on sparky.
gcc -g -O2      \
-o gdb gdb.o libgdb.a \
../readline/libreadline.a ../opcodes/libopcodes.a ../bfd/libbfd.a -lintl
../libiberty/libiberty.a ../libdecnumber/libdecnumber.a -ldl -lncurses
-lsocket -lnsl -lm -lexpat ../libiberty/libiberty.a
Undefined                       first referenced
 symbol                             in file
initscr32                           libgdb.a(tui.o)
w32addch                            libgdb.a(tui-io.o)
w32attron                           libgdb.a(tui-wingeneral.o)
w32attroff                          libgdb.a(tui-wingeneral.o)
acs32map                            libgdb.a(tui-win.o)
getcurx                             libgdb.a(tui-io.o)
getcury                             libgdb.a(tui-io.o)

After some Googling around, it turns out these syms are defined in the Solaris /usr/ccs/ library and not GNU libncurses.

Spent some time going through the current trac tickets. #186 says that Int64 code has been broken for some time, which will be the reason for many of the test failures.

On mavericks I have two builds running with a single thread, and another running with 4 threads. This is creating frequent 3-4 second pauses on the console, which I guess is due to memory starvation. The builds are also dying every 20 min due to race conditions, so I'm giving up on parallel make.

The stage1 build of ghc-HEAD-work on mavericks died with:
(hi-boot interface)
does not export

Module `IdInfo' (hi-boot interface) does not export `notGlobalId'

I think this was because the repo got into a weird state because I control-c'd darcs when it was running. The formatting of the first error message looks weird though.

Darcs failures.. zzzz
== running darcs pull --repodir testsuite
Pulling from "/data0/home/benl/devel/ghc/ghc-HEAD"...
darcs: bug in get_extra commuting patches:
First patch is:
Fri Sep 7 18:23:27 EST 2001 simonmar
* [project @ 2001-09-07 08:23:27 by simonmar]
Fix some signatures after Ord was removed as a superclass of Ix.
Second patch is:
Fri Sep 14 01:54:43 EST 2001 simonmar
* [project @ 2001-09-13 15:54:43 by simonmar]
Back out the change to remove Ord as a superclass of Ix; the revised
Haskell 98 report will no longer have this change.
darcs failed: 256 at ./darcs-all line 69.
benl@mavericks:~/devel/ghc/ghc-HEAD-work> darcs --version
2.1.0 (release)

Hrm. stage1 build of ghc-HEAD-work on sparky completed, but trying to compile something results in:
~/devel/ghc/ghc-HEAD-work/ghc/stage1-inplace/ghc --make Main.hs
[1 of 1] Compiling Main ( Main.hs, Main.o )
Linking Main ...
ld: fatal: relocation error: R_SPARC_32: file /home/benl/devel/ghc/ghc-HEAD-work/libffi/libHSffi.a(v8.o):
symbol : offset 0xfd0559b6 is non-aligned

Trying GNU ld instead gives:
ld: gct: TLS definition in /home/benl/devel/ghc/ghc-HEAD-work/rts/libHSrts.a(GC.o)
section .tbss mismatches non-TLS reference in

I expect this happened because I changed my ~/bin/gcc during the build... yah. I'm trying to do too many things at once because the builds take so long. Rebuilt the runtime system and now I'm getting this:
/opt/gcc/bin/../../SUNW0scgfss/4.0.4/prod/bin/fbe: "StgCRun.s", line 17: error: statement syntax
/opt/gcc/bin/../../SUNW0scgfss/4.0.4/prod/bin/fbe: "StgCRun.s", line 31: error: statement syntax

This is with GCC 4.0.4 installed on sparky. Checking out the .s files at those lines shows:
    sethi   %hi(%l1),%i5
    ld      [%i5+%lo(%l1)],%i0

From the OpenSPARC 2007 architecture manual, both of those instructions are bad. This looks like a legitimate GCC bug. Time to upgrade.
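For reference, %hi and %lo are assembler operators that split a 32-bit constant or symbol address into its upper 22 and lower 10 bits, so applying them to a register name like %l1 makes no sense. A small sketch of the arithmetic follows; the function names and the example address are made up for illustration.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word32)

-- %hi(x): the upper 22 bits of a 32-bit constant, as loaded by sethi.
hi22 :: Word32 -> Word32
hi22 x = x `shiftR` 10

-- %lo(x): the lower 10 bits, used as an immediate offset.
lo10 :: Word32 -> Word32
lo10 x = x .&. 0x3ff

-- What "sethi %hi(x), r" followed by "or r, %lo(x), r" computes.
rebuild :: Word32 -> Word32
rebuild x = (hi22 x `shiftL` 10) .|. lo10 x

main :: IO ()
main = do
  let addr = 0x12345678 :: Word32  -- arbitrary example address
  print (hi22 addr, lo10 addr)
  print (rebuild addr == addr)     -- True
```

Both operators only make sense on a value known at assembly or link time, which is why the assembler rejects the two lines above.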