Goal of this talk

Simplify the understanding of the runtime behaviour of Haskell programs compiled with GHC. For example,

Essentially, this talk provides an introduction to GHC’s RTS commentary, which guides through GHC’s RTS sources.

Memory layout of (heap) objects

GHC uses the same basic layout for every object in the RTS.

Basic closure layout Basic info table layout
Basic closure layout Basic info table layout

Structure of a header.

typedef struct {
    const struct _StgInfoTable* info;
#ifdef PROFILING
    StgProfHeader         prof;
#endif
#ifdef GRAN
    StgGranHeader         gran;
#endif
} StgHeader;

For more information, see the GHC commentary.

Example: a list of Int’s

data List a = Nil | One a (List a)

ex1 :: List Int
ex1 = One 1 (One 2 (One 3 Nil))

Example: a list of unboxed Int’s

data ListInt = NilI | OneI {-# UNPACK #-} !Int ListInt

ex1 :: ListInt
ex1 = OneI 1 (OneI 2 (OneI 3 NilI))

Normal forms

normal form (NF)

An expression in normal form is fully evaluated, and no sub-expression could be evaluated any further (i.e. it contains no un-evaluated thunks).

weak head normal form (WHNF)

An expression in weak head normal form has been evaluated to the outermost data constructor or lambda abstraction (the head).

-- in NF
42
(2, "hello")
\x -> (x + 1)

-- not in NF
1 + 2                 -- we could evaluate this to 3
(\x -> x + 1) 2       -- we could apply the function
"he" ++ "llo"         -- we could apply the (++)
(1 + 1, 2 + 2)        -- we could evaluate 1 + 1 and 2 + 2

-- in WHNF 
(1 + 1, 2 + 2)       -- the outermost part is the data constructor (,)
\x -> 2 + 2          -- the outermost part is a lambda abstraction
'h' : ("e" ++ "llo") -- the outermost part is the data constructor (:)

-- not in WHNF
1 + 2                -- the outermost part here is an application of (+)
(\x -> x + 1) 2      -- the outermost part is an application of (\x -> x + 1)
"he" ++ "llo"        -- the outermost part is an application of (++)

For more information, see this StackOverflow question.

Haskell evaluation

Case distinctions are the fuel of the execution of Haskell code.

A case distinction

case x of _ -> e

forces the evaluation of the value denoted by the variable x to WHNF when the expression e is evaluated to WHNF.

Ultimately, Haskell programs are run by forcing the value of (runIO main) theWorld# to WHNF, where


newtype IO a = IO { runIO :: World# -> (# a, World# #) }

theWorld# :: World#    -- magical reference to the world

For more information on the implementation of the IO monad, see GHC.Types.

Example: summing a list of Int’s (I)

sumList :: List Int -> Int
sumList Nil        = 0
sumList (One x xs) = x + sumList xs

test :: Int
test = sumList (One 1 (One 2 (One 3 Nil)))

High-level outline of evaluating test to WHNF:

   test
~> sumList (One 1 (One 2 (One 3 Nil)))
~> 1 + sumList (One 2 (One 3 Nil))
~> 1 + (2 + sumList (One 3 Nil))
~> 1 + (2 + (3 + (sumList Nil)))
~> 1 + (2 + (3 + 0))
~> 1 + (2 + 3)
~> 1 + 5
~> 6

Example: summing a list of Int’s (II)

sumList :: List Int -> Int
sumList xs = case xs of
               Nil      -> 0
               One x xs -> x + sumList xs

test :: Int
test = sumList ex1

ex1 = One 1 ex1_1
ex1_1 = One 2 ex1_2
ex1_2 = One 3 ex1_3
ex1_3 = Nil

Evaluating test to WHNF with explicit stack management.

   test                                    <| []                  -- tail call
~> sumList ex_1                            <| []                  -- tail call
~> case ex1 of Nil      -> 0
               One x xs -> x + sumList xs  <| []                  -- push return-closure onto stack
~> ex1                                     <| [case ?? of ...]    -- tail call
~> One 1 ex1_1                             <| [case ?? of ...]    -- WHNF reached: return control to caller, i.e.,
                                                                  --   jump to the top-most stack frame
~> case One 1 ex1_1 of ...                 <| []                  -- select appropriate case
~> (+) 1 (sumList ex1_1)                   <| []                  -- force 'sumList ex1_1' to WHNF 
                                                                  --   => push return-closure onto stack
~> sumList ex1_1                           <| [(+) 1 ??]          -- tail call
~> case ex1_1 of ...                       <| [(+) 1 ??]          -- push return-closure onto stack
~> ex1_1                                   <| [case ?? of ..., (+) 1 ??]  -- ...
~> One 2 ex1_2                             <| [case ?? of ..., (+) 1 ??]  -- ...
~> case One 2 ex1_2 of ...                 <| [(+) 1 ??]
~> (+) 2 (sumList ex1_2)                   <| [(+) 1 ??]
~> sumList ex1_2                           <| [(+) 2 ??, (+) 1 ??]
~> .... ~>
~> 0                                       <| [(+) 3 ??, (+) 2 ??, (+) 1 ??]
~> (+) 0 3                                 <| [(+) 2 ??, (+) 1 ??]
~> 3                                       <| [(+) 2 ??, (+) 1 ??]
~> (+) 2 3                                 <| [(+) 1 ??]
~> 5                                       <| [(+) 1 ??]
~> (+) 1 5                                 <| []
~> 6                                       <| []

Evaluation costs

When executing a Haskell program, time is spent on

The Spineless Tagless G-machine (STG machine) makes these costs explicit.

Overview: GHC’s compilation pipeline

For more information, see this and that.

Invariants established by CorePrep

-- not in ANF --
----------------
\x -> f (g x) [x]    -- use of compound arguments 'g x' and '[x]'

\x -> (x:)           -- unsaturated use of ':' constructor

case x of            -- nested pattern in case distinction
  [y] -> e1
   _  -> e_def

-- in ANF --
------------
\x -> let y = g x 
          z = [x]
      in f y z

\x y -> x:y

let zz = e_def
in case x of
      []     -> zz
      x1:xs1 -> case xs1 of
                  [] -> e1
                  _  -> zz

Example: core-prep of sumList

data List a = Nil | One a (List a)

sumList :: List Int -> Int
sumList Nil        = 0
sumList (One x xs) = x + sumList xs

ex1 :: List Int
ex1 = One 1 (One 2 (One 3 Nil))

test = sumList ex1

Compiled with ghc -O2 -dsuppress-all -ddump-prep List.hs yields

==================== CorePrep ====================
ex6 = I# 1
ex5 = I# 2
ex4 = I# 3
ex3 = One ex4 (Nil)
ex2 = One ex5 ex3
ex1 = One ex6 ex2

Rec {
$wsumList =
  \ w_sca ->
    case w_sca of _ {
      Nil -> 0;
      One x_sce xs_sch ->
        case x_sce of _ { I# x1_scj ->
        case $wsumList xs_sch of ww_sck { __DEFAULT -> +# x1_scj ww_sck }
        }
    }
end Rec }

sumList =
  \ w_scm ->
    case $wsumList w_scm of ww_sco { __DEFAULT -> I# ww_sco }

test = case $wsumList ex1 of ww_scq { __DEFAULT -> I# ww_scq }

Nil = Nil
One = \ @ a_aaf eta_B2 eta_B1 -> One eta_B2 eta_B1

Example: map over a list of Int’s

mapList :: (a -> b) -> List a -> List b
mapList f Nil        = Nil
mapList f (One x xs) = One (f x) (mapList f xs)

foo :: Int -> Int
foo x = x + 0xDeadBeef

test :: List Int
test = mapList foo (One 1 (One 2 (One 3 Nil)))

Compiled with ghc -O2 -dsuppress-all -ddump-prep List.hs yields

==================== CorePrep ====================
Rec {
mapList =
  \ @ a_abc @ b_abd f_sct ds_sco ->
    case ds_sco of _ {
      Nil -> Nil;
      One x_scs xs_scv ->
        let {
          sat_scE = mapList f_sct xs_scv } in
        let {
          sat_scF = f_sct x_scs } in
        One sat_scF sat_scE
    }
end Rec }

foo =
  \ x_scy ->
    case x_scy of _ { I# x1_scB ->
    case +# x1_scB 3735928559 of sat_scG { __DEFAULT -> I# sat_scG }
    }

test6 = I# 1
test5 = I# 2
test4 = I# 3
test3 = One test4 (Nil)
test2 = One test5 test3
test1 = One test6 test2
test = mapList foo test1

Nil = Nil
One = \ @ a_aab eta_B2 eta_B1 -> One eta_B2 eta_B1

Exploiting our newly gained knowledge (I)

What implementation is faster?

sumN :: Int -> Int
sumN 0 = 0
sumN n = n + sumN (n - 1)

sumNAcc :: Int -> Int
sumNAcc =
    go 0
  where
    go acc 0 = acc
    go acc n = go (n + acc) (n -1)

Exploiting our newly gained knowledge (II)

It’s time for some interactive experimentation; i.e., let’s have some fun with

ghc -O2 -dsuppress-all -ddump-prep

and all the other debugging flags provided by GHC.

Note that some of the examples are included in the sources of this talk at Github.

The end…

..and further references for the curious:

Unfinished appendix: memory layout during execution (I)

mapList f Nil        = Nil
mapList f (One x xs) = One (f x) (mapList f xs)

foo :: Int -> Int
foo x = x + 0xDeadBeef

Memory layout before reducing mapList foo (One 1 (One 2 Nil)) to WHNF.

TODO: Image

Unfinished appendix: memory layout during execution (II)

mapList f Nil        = Nil
mapList f (One x xs) = One (f x) (mapList f xs)

foo :: Int -> Int
foo x = x + 0xDeadBeef

Memory layout after reducing mapList foo (One 1 (One 2 Nil)) to WHNF.

TODO: Image

Unfinished appendix: common closure types

For more information, see the GHC commentary.