Property-based testing

Defining properties of a function which are expected to hold (unit testing)
Checking these properties using automatic data generation (random, exhaustive)
When the property is falsified, getting a counter-example to fix the properties or the code
Benefits of this approach
- Specifications are part of the code (through properties)
- Force the developer to think about what the code should do (test-driven development)
- Automatic data generation helps finding corner-cases (in both the properties and the code)

Haskell Libraries

QuickCheck [Claessen and Hughes, 2000]: random data generation
SmallCheck / Lazy SmallCheck [Runciman et al., 2008]: exhaustive data generation (small scope hypothesis)
SmartCheck : generalizing counter-examples (no need of user-defined shrink)
Feat [Duregard, 2012] (functional enumeration of algebraic types): efficient exhaustive data generation

Outline

Small tutorial of the QuickCheck library (based on the UCSD and Chalmers Haskell courses)
Quick overview of the alternatives: SmallCheck, Lazy SmallCheck, SmartCheck, Feat
Few methodology comments

QuickCheck Library

An embedded DSL for defining properties
Random data generation
An embedded DSL for defining random data generators

Properties

Simple Bool functions

prop_RevUnit:: [Int] -> Bool
prop_RevUnit x =
    reverse [x] == [x]

prop_RevApp:: [Int] -> [Int] -> Bool
prop_RevApp xs ys =
    reverse (xs ++ ys) == reverse ys ++ reverse xs

prop_RevRev:: [Int] -> Bool
prop_RevRev xs =
    reverse (reverse xs) == xs

Testing prop_RevApp with QuickCheck

main> quickCheck prop_RevApp
+++ OK, passed 100 tests

QuickChecking QuickSort

qsort []     = []
qsort (x:xs) = qsort lhs ++ [x] ++ qsort rhs
    where lhs  = [y | y <- xs, y < x]
          rhs  = [z | z <- xs, z > x]

Verify that the resulting list is ordered.

isOrdered (x1:x2:xs) = x1 <= x2 && isOrdered (x2:xs)
isOrdered _          = True

prop_qsort_isOrdered :: [Int] -> Bool
prop_qsort_isOrdered = isOrdered . qsort

Model-based Testing

Instead of having multiple properties, we can give a reference implementation (observational equivalence).

prop_qsort_sort    :: [Int] -> Bool
prop_qsort_sort xs =  qsort xs == sort xs

The model may be inefficient but its correctness can be easily checked by hand.

Does this property hold?

Duplicates are erased...

Conditional Properties

alldiff (x:xs) = not (x `elem` xs) && alldiff xs
alldiff []     = True

prop_qsort_alldiff_sort :: [Int] -> Property
prop_qsort_alldiff_sort xs = 
   (alldiff xs) ==> qsort xs == sort xs

Instead of Bool, the output of the function is a Property, a special type built into the QuickCheck library. . . .

The implies combinator ==> is one of many QuickCheck combinators (e.g., .&&., .||., forall).

Quantified Properties

insert x []                 = [x]
insert x (y:ys) | x < y     = x : y : ys
                | otherwise = y : insert x ys

We can now define the following property:

prop_insert_ordered :: Int -> [Int] -> Property 
prop_insert_ordered x xs = 
   isOrdered xs ==> isOrdered (insert x xs)

What happens?

main> quickCheck prop_insert_ordered 
*** Gave up! Passed only 78 tests.

Most generated values are discarded by the ==> combinator.

Solution: write customized generators and quantified properties.

Quantified Properties

Properties may take the form:

forAll :: (Testable prop, Show a) => Gen a -> (a -> prop) -> Property

Instead of discarding test cases that are not ordered, we provide a user-defined generator of ordered lists:

prop_insert_ordered2 x = forAll orderedList $ \xs -> ordered (insert x xs)
  where types = x::Int

Observing Test Case Distribution

Two special QuickCheck property combinators:

collect  :: Show a => a -> Property -> Property
classify :: Bool -> String -> Property -> Property

We can now redefine the prop_insert_ordered property:

prop_insert_ordered3 :: Int -> [Int] -> Property 
prop_insert_ordered3 x xs = 
   collect ((length xs `div` 10) * 10) $
   classify (isOrdered xs) "ord" $
   classify (not (isOrdered xs)) "not-ord" $
   not (isOrdered xs) || isOrdered (insert x xs)

We get details on the generated data:

main> quickCheck prop_insert_ordered
+++ OK, passed 100 tests:
 18% 0, not-ord
 17% 10, not-ord
 14% 40, not-ord
 ...

Generator Monad

A Haskell function that generates random values:

newtype Gen a = MkGen { unGen :: StdGen -> Int -> a }

unGen g takes a random number generator, a size and returns a a value.

Gen looks like a Reader monad that passes a random number generator and the current size around:

instance Monad Gen where
  return x =
    MkGen (\_ _ -> x)

  MkGen m >>= k =
    MkGen (\r n ->
      let (r1, r2)  = split r
          MkGen m' = k (m r1 n)
       in m' r2 n
    )

Data Generation

The class of types for which random values can be generated:

class Arbitrary a where
  arbitrary :: Gen a
  shrink :: a -> [a]

Main> sample (arbitrary:: Gen Bool)

The QuickCheck library provides:

Instances for base types
Various generator combinators (choose, elements, oneof, frequency, etc.)

`choose`

choose :: (System.Random.Random a) => (a, a) -> Gen a

choose takes an interval and returns a random value from that interval.

main> sample $ choose (0, 3)
2
1
0
2
1
0
2
3
0
0

`elements`

elements :: [a] -> Gen a

elements generates one of the given values.

Example:

data Suit = Spades | Hearts | Diamonds | Clubs
  deriving (Show,Eq)

instance Arbitrary Suit where
  arbitrary = elements [Spades , Hearts , Diamonds , Clubs]

`oneof`

oneof :: [Gen a] -> Gen a

oneof randomly uses one of the given generators.

Example:

data Rank = Numeric Integer | Jack | Queen | King | Ace
  deriving (Show,Eq)

rank :: Gen Rank
rank = oneof
    [ return Jack
    , return Queen
    , return King
    , return Ace
    , do r <- choose (2,10)
         return (Numeric r)
    ]

main> sample rank

What is wrong with this definition?

`frequency`

We can fix it with the frequency combinator:

rank2 :: Gen Rank
rank2 = frequency
    [ (1, return Jack)
    , (1, return Queen)
    , (1, return King)
    , (1, return Ace)
    , (9, do r <- choose (2,10)
             return (Numeric r))
    ]

We now get a uniform distribution for the ranks.

main> sample rank2

Recursive datatypes

Generating lists:

genList1 ::  (Arbitrary a) => Gen [a]
genList1 = liftM2 (:) arbitrary genList1

genList2 ::  (Arbitrary a) => Gen [a]
genList2 = oneof [ return []
                 , liftM2 (:) arbitrary genList2]

genList3 ::  (Arbitrary a) => Gen [a]
genList3 = frequency [ (1, return [])
                     , (4, liftM2 (:) arbitrary genList3) ]

Generating ordered lists:

genOrdList = fmap sort genList3

prop_insert :: Int -> Property 
prop_insert x = forAll genOrdList $ \xs -> isOrdered (insert x xs)

How to control the size of the generated data ?

`sized`

The sized combinator makes the size bound argument of the MkGen function visible to the user.

list :: Arbitrary a => Int -> Gen [a]
list 0 = return []
list n = liftM2 (:) arbitrary (list (n-1))

genList4 :: Arbitrary a => Gen [a]
genList4 = sized list

It still is not a uniform distribution of lists (see the more complex implementation provided by QuickCheck)

As shown in the Feat paper, it is really hard to get a uniform distribution for tree-like structures...

Shrinking

The shrink function reduces the size of the generated counter-example (introduced in QuickCheck2).

class Arbitrary a where
  arbitrary :: Gen a
  shrink :: a -> [a]

shrink produces a (possibly) empty list of all the possible immediate shrinks of the given value.

The default implementation returns the empty list.

Shrinking Binary Trees

data Tree a = Nil | Branch a (Tree a) (Tree a)

Most implementations of shrink should try at least three things:

Type-specific shrinkings such as replacing a constructor by a simpler constructor.
Shrink a term to any of its immediate subterms of the same type.
Recursively apply shrink to all immediate subterms.

shrink Nil = []
shrink (Branch x l r) =
  -- shrink Branch to Nil
  [Nil] ++
  -- shrink to subterms
  [l, r] ++
  -- recursively shrink subterms
  [Branch x' l' r' | (x', l', r') <- shrink (x, l, r)]

QuickCheck provides a genericShrink function (based on the Generic and Typeable type classes) that help users defining basic shrink functions.

Advanced Usage

Testing polymorphic properties
Shrinking heuristic for user-defined data types
Testing higher-order functions
Testing monadic code (e.g. in the IO or ST monad)
Measuring test coverage with HPC

Limits of QuickCheck

Random generation is good for middle-sized input
QuickCheck does not provide any guarantee about the correctness of the property
Difficulty to define uniform generators and shrinking heuristics
Even with customized shrink, no guarantee that we have found the smallest counter-example

SmallCheck

Let us revisit the insert function example:

insert x []                 = [x]
insert x (y:ys) | x < y     = x : y : ys
                | otherwise = y : insert x ys

prop_insert_ordered :: Int -> [Int] -> Property 
prop_insert_ordered x xs = 
   isOrdered xs ==> isOrdered (insert x xs)

Remember that we had to give up with QuickCheck:

main> quickCheck prop_insert_ordered 
*** Gave up! Passed only 78 tests.

With SmallCheck, we can verify this property on all inputs of depth <= d:

main> smallCheck 7 prop_insert_ordered

This will try to falsify the property with all the lists of size<=7

Main Contributions of SmallCheck

Establish complete coverage of a defined test-space (depth <= d)
Make sure that any counter-example found is minimal
Writing test generators is straightforward and can be automated
Allow for the definition of properties using existentials as well as universals
Display counter-examples of functional type (no more true, some new contribution to QuickCheck)

Data Generation

Instead of Arbitrary, SmallCheck defines the Serial type class that provides a series function.

data Tree a = Nil | Branch a (Tree a) (Tree a)

instance Serial m a => Serial m (Tree a) where
  series = cons0 Null \/ cons3 Branch

series n is the finite list of all values of depth n.

type Series a = Int -> [a]

Existential Quantifier

Suppose we define a isPrefix function. How to test its correctness ?

prop_isPrefix xs xs’ = isPrefix xs (xs ++ xs’)

This property is a necessary condition but it is not sufficient. For example, it holds for the following bogus definition:

isPrefix xs ys = True

What we would like to check is:

prop_isPrefixSound xs ys =
    isPrefix xs ys ==>
    exists $ \xs’ -> xs ++ xs’ == ys

SmallCheck Function Example

Let us define a bogus property that every boolean binary operator is associative.

prop_assoc op = \x y z ->
    (x ‘op‘ y) ‘op‘ z == x ‘op‘ (y ‘op‘ z)
    where typeInfo = op :: Bool -> Bool -> Bool

smallCheck prop_assoc
Depth 0:
Failed test no. 22. Test values follow.
{True->{True->True;False->True};
False->{True->False;False->True}}
False
True
False

The depth of a function is defined by the number of its nested cases.

Lazy SmallCheck

main> smallCheck 7 prop_insert_ordered

Finished in less than 1 second (very very long in SmallCheck)

How does it work ?

Tries to detect as soon as possible when a partial defined data falsifies the ordered condition.
It begins by testing undefined as the value and refines it by need. The demands of the test property guide the exploration of the test-data space.
For example isOrdered [1:0:undefined] == False. So we can cut this test-data generation branch.

New version of LazySmallCheck with functions and existentials (2012)!

SmartCheck

Generalize counter-examples with universal quantifiers:

prop_RevApp_false:: ([Int],[Int]) -> Bool
prop_RevApp_false (xs,ys) =
    reverse (xs ++ ys) == reverse xs ++ reverse ys

main = smartCheck scStdArgs prop_RevApp_false

How does it work?

Shrinks the counter-example (no need to define shrink)
Generalizes values in the counter-example (extrapolating values)
Generalizes constructors when the datatype has a finite number of constructors (extrapolating constructors)

Feat

Feat (Functional Enumeration of Algebraic Types) provides enumerations as functions from natural numbers to values.

These enumerations can be automatically generated for any algebraic datatype using template Haskell.

-- use template haskell to generate an instance of Enumerable  
deriveEnumerable ''Tree

SmallCheck and QuickCheck generators can be derived from these enumerations.

-- create a uniform distribution from an Enumerable
instance (Enumerable a, Arbitrary a) => Arbitrary (Tree a) where
  arbitrary = sized uniform

What are the benefits?

Free uniform generators for QuickCheck
Particular useful for large abstract syntax trees (less terms of size n than of depth n)

Feat helped find bugs in template-haskell and haskell-src-exts.

Some Methodology Comments

Can these approaches be combined?

Start by checking with SmallCheck if there are some small inputs that already falsify the property.
Then try to run it with QuickCheck to find middle-sized counter-examples.
Be careful about the distribution of the values. Use Feat if possible (e.g., data does not need to satisfy a condition).
If some specific corner-cases are suspected (e.g., large integers), define customized generators.
When testing complex data-structures (e.g., ASTs), use SmartCheck to generalize the counter-examples given by QuickCheck

QuickCheck and SmallCheck provide integration with Tasty and HSpec, two popular testing frameworks.

Conclusion

Real-size examples (QuviQ [http://www.quviq.com])

"QuickCheck forced me to think about the properties of the library more closely - I changed some design decisions after it turned out that the properties disagreed with me. The one thing QuickCheck helped with more than anything though was refactoring despite a massive number of the functions all depending on each other, QuickCheck allows me to change the behaviour of one function in some obscure case and check that no other function was relying on that."

Re-implementations of QuickCheck for other languages (Python, Java)
A lot of new work in this domain (e.g., Feat with constraints, QuickSpec)
My own experience with QuickCheck and SmallCheck:
- Java code generator rewritten in Haskell : found a lot of corner cases we did not find in our previous Java implementation
- Implementation of Feat in the Tom language for testing Java code (both QuickCheck and SmallCheck generation strategies)

Testing Functions in QuickCheck

We can define Tree as an instance of the Functor class:

instance Functor Tree where
  fmap f (Fork t u) = Fork (fmap f t) (fmap f u)
  fmap f (Leaf x) = Leaf (f x)

And now we check the functor laws with QuickCheck:

law1 :: Tree Int -> Bool
law1 t = fmap id t == t

law2 :: (Int -> Int) -> (Int -> Int) -> Tree Int -> Bool
law2 f g t = fmap (f . g) t == fmap f (fmap g t)

main = do
  quickCheck law1
  quickCheck law2

Introduction to property-based testing in Haskell

Property-based testing

Haskell Libraries

Outline

QuickCheck Library

Properties

QuickChecking QuickSort

Model-based Testing

Conditional Properties

Quantified Properties

Quantified Properties

Observing Test Case Distribution

Generator Monad

Data Generation

choose

elements

oneof

frequency

Recursive datatypes

sized

Shrinking

Shrinking Binary Trees

Advanced Usage

Limits of QuickCheck

SmallCheck

Main Contributions of SmallCheck

Data Generation

Existential Quantifier

SmallCheck Function Example

Lazy SmallCheck

SmartCheck

Feat

Some Methodology Comments

Conclusion

Testing Functions in QuickCheck

`choose`

`elements`

`oneof`

`frequency`

`sized`