The horizontal lines are hardware threads, with the darkness of the line corresponding to the number of instructions executed per unit of time. The ticks along the X axis give the number of seconds wall-clock runtime.
(sorry there are no numbers on the axes, custom hackery and all)
These are with 8, 32 and 64 haskell threads (+RTS -N8 ..) respectively. sumeuler scales well, so in the -N64 case we get even machine utilization and a short runtime.
In the plot for -N8, the system seems to favor the the first thread of each core, so every 8th trace is darker. The higher threads aren't used as much. There seems to more activity in the last two cores.
The plot for matmult reveals distinct phases of high and low parallelism. At about 7 sec into the -N32 plot we see a single running Haskell thread shuffling between the cores.
In this plot we see a faint background computation, along with what I expect are regular periods of garbage collection. However, it appears as though most of the work is being done by a single Haskell thread. The plots for 8 threads and 1 thread have a similar runtime.
I am eagerly awaiting the release of ThreadScope so I can match these hardware profiles up with GHC's view of the world.