Saturday, January 3, 2009

Bootstrapping

Sent a message to cvs-ghc asking about the huge compile times. Simon PJ responded saying that Roman had mentioned that GCC is uniquely slow on the SPARC T2. No suggestions for reducing the huge intermediate .hc files. I rebuilt Language/Haskell/TH/Syntax.hs again while dumping core. The source is 24k, but the desugared code is 2.7MB. The bulk of it is derived instance functions for the Data typeclass - stacks of gmaps of various sorts. Not much I can do about that, so I'll try and remove the lib from subsequent compiles.

Anyway. A stage2 build of GHC 6.8.3 with -O on sparky has worked, and so has a stage2 build with -O0 on mavericks. I had two builds of GHC 6.10.1 running on mavericks last night and both have died with:

Configuring installPackage-1.0...
cabal-bin: ghc version >=6.4 is required but the version of
/data0/home/benl/devel/ghc/ghc-6.10.1-quickest/ghc/stage2-inplace/ghc could
not be determined.
make[3]: *** [with-stage-2] Error 1

The version can't be determined because the stage2 compiler segfaults, so we've made it to ticket #2692.

All programs produced by the stage1 compiler segfault, including

main = return ()

The gdb stack trace says:

#0 0x0025e660 in todo_block_full ()
#1 0x0027d03c in evacuate ()
#2 0x0025e898 in markWeakPtrList ()
#3 0x0025d200 in GarbageCollect ()
#4 0x00256fa4 in scheduleDoGC ()
#5 0x002570ac in exitScheduler ()
#6 0x0025631c in hs_exit_ ()
#7 0x0025646c in shutdownHaskellAndExit ()
#8 0x00254770 in real_main ()
#9 0x002547c8 in main ()

So it's dying while performing the final GC, which corresponds to the error message on the ticket. This probably got broken when the new parallel GC was added.

ghc: internal error: ASSERTION FAILED: file sm/GCUtils.c, line 140

Attempting to compile the same source on mavericks gives linker errors..
/opt/gnat/gcc/lib/gcc/sparc-sun-solaris2.10/4.2.1/../../../libbfd.a(libbfd.o): In function `warn_deprecated':
/var/tmp/Binutils/binutils-2.17.50/bfd/libbfd.c:978: undefined reference to `libintl_dgettext'
/var/tmp/Binutils/binutils-2.17.50/bfd/libbfd.c:981: undefined reference to `libintl_dgettext'

.. guess the build environment wasn't that sane after all. Turns out on mavericks libintl is in:
/usr/local/stow/gettext-0.13/lib/libintl.so

Hacking on rts/sm/Evac.c, trying to find why the assertion above failed. Undid the STATIC_INLINEs to get a better stack trace:

(gdb) bt
#0 0xff1c5bf0 in _lwp_kill () from /lib/libc.so.1
#1 0xff164bfc in raise () from /lib/libc.so.1
#2 0xff141100 in abort () from /lib/libc.so.1
#3 0x00258d00 in rtsFatalInternalErrorFn (s=0x322428 "ASSERTION FAILED: file %s, line %u\n", ap=0xffbff330) at RtsMessages.c:164
#4 0x002589e8 in barf (s=0x322428 "ASSERTION FAILED: file %s, line %u\n") at RtsMessages.c:40
#5 0x00258a54 in _assertFail (filename=0x3241c8 "sm/GCUtils.c", linenum=147) at RtsMessages.c:55
#6 0x0026b7ac in todo_block_full (size=5, ws=0x36a1f4) at sm/GCUtils.c:147
#7 0x0029d730 in alloc_for_copy (size=5, stp=0x36a174) at sm/Evac.c:77
#8 0x0029d7ec in copy_tag (p=0xffbff5f4, info=0x27ac34, src=0xfee81244, size=5, stp=0x36a174, tag=0) at sm/Evac.c:96
#9 0x0029e408 in evacuate (p=0xffbff5f4) at sm/Evac.c:621
#10 0x0026ca5c in markWeakPtrList () at sm/MarkWeak.c:395
#11 0x00267d40 in GarbageCollect (force_major_gc=rtsFalse) at sm/GC.c:346
#12 0x0025bd60 in scheduleDoGC (cap=0x0, task=0x0, force_major=rtsFalse) at Schedule.c:1478
#13 0x0025c8ac in exitScheduler (wait_foreign=rtsFalse) at Schedule.c:2018
#14 0x00259414 in hs_exit_ (wait_foreign=rtsFalse) at RtsStartup.c:416
#15 0x002595bc in shutdownHaskellAndExit (n=0) at RtsStartup.c:554
#16 0x002550bc in real_main () at Main.c:141
#17 0x00255114 in main (argc=1, argv=0xffbffa4c) at Main.c:153


That pointer for ws looks dodgy. It's loaded via the global gct, which is stored in a global register, which is architecture specific. Adding the following to GCThread.h makes the trivial program above compile, but stage2 is still segfaulting. Will do a clean rebuild.

#if defined(sparc_HOST_ARCH)
// Don't use REG_base or R1 for gct on SPARC because they're getting clobbered
// by something else. Not sure what yet. -- BL 2009/01/03

extern __thread gc_thread* gct;
#define DECLARE_GCT __thread gc_thread* gct;


Building ghc-HEAD on sparky. The alex I installed before was the wrong version for some reason, so reinstalling alex-2.2.

My attempted build of GHC 6.8.3 stage3 on sparky seems to have gone to sleep. Top shows it's done 15 sec of work all day. On further investigation, running stage2/ghc-inplace shows that it does nothing except sleep forever. A hello program built by the stage1 compiler segfaults. Lesson learned: test stage1 before building stage2.
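A minimal sketch of that stage1 check, written as a shell function. The name smoke_test and its messages are made up for illustration and are not part of the GHC build system; it assumes mktemp -d is available, and that the compiler accepts --numeric-version and -o (as ghc does). It asks the compiler for its version, which is the step the cabal configure above could not get past, then builds and runs the same trivial program that was segfaulting.

```shell
# smoke_test HC: compile and run a trivial Haskell program with compiler HC.
# Returns 0 if HC reports a version and the program it builds exits cleanly,
# 1 otherwise. Hypothetical helper, not part of the GHC tree.
smoke_test() {
    hc=$1
    dir=$(mktemp -d) || return 2

    # The same trivial program used earlier in this log.
    printf 'main = return ()\n' > "$dir/Main.hs"

    rc=1
    # 1. The compiler itself must run and report a version.
    # 2. It must compile and link the program.
    # 3. The resulting binary must exit with status 0
    #    (a segfault shows up as a non-zero status).
    if "$hc" --numeric-version > /dev/null 2>&1 &&
       "$hc" -o "$dir/Main" "$dir/Main.hs" > "$dir/build.log" 2>&1 &&
       "$dir/Main"
    then
        echo "smoke test passed: $hc"
        rc=0
    else
        echo "smoke test FAILED: $hc"
    fi

    rm -rf "$dir"
    return "$rc"
}
```

Something like `smoke_test path/to/stage1/compiler && make stage2` would then refuse to start the long stage2 build when stage1 is producing broken binaries. The compiler path and the make target here are placeholders, not the real ones from this tree.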

Can't find gdb on sparky. Tried compiling gdb 6.8 from source. Died. FFS.
remote.c: In function `extended_remote_attach_1':
remote.c:2859: warning: unsigned int format, pid_t arg (arg 3)


When running tests for a patched GHC 6.10.1 on mavericks, I have to manually supply a -L libdir to stop it seeing the system 64-bit gmp libs. If it sees them, ld emits a warning to the console, which makes the test framework think the compile failed. Use:
make stage=1 WAY=optc TEST=cg036 EXTRA_HC_OPTS=-L/data0/home/benl/lib


Test framework was crashing because it was built with the old compiler. Must remember that all the framework stuff is built with stage1, not the host compiler.

Running the patched GHC 6.10.1 against the codeGen tests with just WAY=optc. They seem to be going through, so I'm hoping 6.10.1 is fixed for SPARC. Will run the full testsuite tomorrow when the stage2 build is done.
