Introduction to property-based testing in Haskell
Emilie Balland
29/01/15
Property-based testing
- Defining properties of a function which are expected to hold (unit testing)
- Checking these properties using automatic data generation (random, exhaustive)
- When the property is falsified, getting a counter-example to fix the properties or the code
- Benefits of this approach
- Specifications are part of the code (through properties)
- Force the developer to think about what the code should do (test-driven development)
- Automatic data generation helps finding corner-cases (in both the properties and the code)
Haskell Libraries
- QuickCheck [Claessen and Hughes, 2000]: random data generation
- SmallCheck / Lazy SmallCheck [Runciman et al., 2008]: exhaustive data generation (small scope hypothesis)
- SmartCheck : generalizing counter-examples (no need of user-defined shrink)
- Feat [Duregard, 2012] (functional enumeration of algebraic types): efficient exhaustive data generation
Outline
- Small tutorial of the QuickCheck library (based on the UCSD and Chalmers Haskell courses)
- Quick overview of the alternatives: SmallCheck, Lazy SmallCheck, SmartCheck, Feat
- Few methodology comments
QuickCheck Library
- An embedded DSL for defining properties
- Random data generation
- An embedded DSL for defining random data generators
Properties
Simple Bool
functions
prop_RevUnit:: [Int] -> Bool
prop_RevUnit x =
reverse [x] == [x]
prop_RevApp:: [Int] -> [Int] -> Bool
prop_RevApp xs ys =
reverse (xs ++ ys) == reverse ys ++ reverse xs
prop_RevRev:: [Int] -> Bool
prop_RevRev xs =
reverse (reverse xs) == xs
Testing prop_RevApp
with QuickCheck
main> quickCheck prop_RevApp
+++ OK, passed 100 tests
QuickChecking QuickSort
qsort [] = []
qsort (x:xs) = qsort lhs ++ [x] ++ qsort rhs
where lhs = [y | y <- xs, y < x]
rhs = [z | z <- xs, z > x]
Verify that the resulting list is ordered.
isOrdered (x1:x2:xs) = x1 <= x2 && isOrdered (x2:xs)
isOrdered _ = True
prop_qsort_isOrdered :: [Int] -> Bool
prop_qsort_isOrdered = isOrdered . qsort
Model-based Testing
Instead of having multiple properties, we can give a reference implementation (observational equivalence).
prop_qsort_sort :: [Int] -> Bool
prop_qsort_sort xs = qsort xs == sort xs
The model may be inefficient but its correctness can be easily checked by hand.
Conditional Properties
alldiff (x:xs) = not (x `elem` xs) && alldiff xs
alldiff [] = True
prop_qsort_alldiff_sort :: [Int] -> Property
prop_qsort_alldiff_sort xs =
(alldiff xs) ==> qsort xs == sort xs
Instead of Bool
, the output of the function is a Property
, a special type built into the QuickCheck library. . . .
The implies combinator ==>
is one of many QuickCheck combinators (e.g., .&&.
, .||.
, forall
).
Quantified Properties
insert x [] = [x]
insert x (y:ys) | x < y = x : y : ys
| otherwise = y : insert x ys
We can now define the following property:
prop_insert_ordered :: Int -> [Int] -> Property
prop_insert_ordered x xs =
isOrdered xs ==> isOrdered (insert x xs)
What happens?
main> quickCheck prop_insert_ordered
*** Gave up! Passed only 78 tests.
Most generated values are discarded by the ==>
combinator.
Solution: write customized generators and quantified properties.
Quantified Properties
Properties may take the form:
forAll :: (Testable prop, Show a) => Gen a -> (a -> prop) -> Property
Instead of discarding test cases that are not ordered, we provide a user-defined generator of ordered lists:
prop_insert_ordered2 x = forAll orderedList $ \xs -> ordered (insert x xs)
where types = x::Int
Observing Test Case Distribution
Two special QuickCheck property combinators:
collect :: Show a => a -> Property -> Property
classify :: Bool -> String -> Property -> Property
We can now redefine the prop_insert_ordered
property:
prop_insert_ordered3 :: Int -> [Int] -> Property
prop_insert_ordered3 x xs =
collect ((length xs `div` 10) * 10) $
classify (isOrdered xs) "ord" $
classify (not (isOrdered xs)) "not-ord" $
not (isOrdered xs) || isOrdered (insert x xs)
We get details on the generated data:
main> quickCheck prop_insert_ordered
+++ OK, passed 100 tests:
18% 0, not-ord
17% 10, not-ord
14% 40, not-ord
...
Generator Monad
A Haskell function that generates random values:
newtype Gen a = MkGen { unGen :: StdGen -> Int -> a }
unGen g
takes a random number generator, a size and returns a a
value.
Gen
looks like a Reader
monad that passes a random number generator and the current size around:
instance Monad Gen where
return x =
MkGen (\_ _ -> x)
MkGen m >>= k =
MkGen (\r n ->
let (r1, r2) = split r
MkGen m' = k (m r1 n)
in m' r2 n
)
Data Generation
The class of types for which random values can be generated:
class Arbitrary a where
arbitrary :: Gen a
shrink :: a -> [a]
Main> sample (arbitrary:: Gen Bool)
The QuickCheck library provides:
- Instances for base types
- Various generator combinators (
choose
, elements
, oneof
, frequency
, etc.)
choose
choose :: (System.Random.Random a) => (a, a) -> Gen a
choose
takes an interval and returns a random value from that interval.
main> sample $ choose (0, 3)
2
1
0
2
1
0
2
3
0
0
elements
elements :: [a] -> Gen a
elements
generates one of the given values.
Example:
data Suit = Spades | Hearts | Diamonds | Clubs
deriving (Show,Eq)
instance Arbitrary Suit where
arbitrary = elements [Spades , Hearts , Diamonds , Clubs]
oneof
oneof :: [Gen a] -> Gen a
oneof
randomly uses one of the given generators.
Example:
data Rank = Numeric Integer | Jack | Queen | King | Ace
deriving (Show,Eq)
rank :: Gen Rank
rank = oneof
[ return Jack
, return Queen
, return King
, return Ace
, do r <- choose (2,10)
return (Numeric r)
]
main> sample rank
What is wrong with this definition?
frequency
We can fix it with the frequency
combinator:
rank2 :: Gen Rank
rank2 = frequency
[ (1, return Jack)
, (1, return Queen)
, (1, return King)
, (1, return Ace)
, (9, do r <- choose (2,10)
return (Numeric r))
]
We now get a uniform distribution for the ranks.
main> sample rank2
Recursive datatypes
Generating lists:
genList1 :: (Arbitrary a) => Gen [a]
genList1 = liftM2 (:) arbitrary genList1
genList2 :: (Arbitrary a) => Gen [a]
genList2 = oneof [ return []
, liftM2 (:) arbitrary genList2]
genList3 :: (Arbitrary a) => Gen [a]
genList3 = frequency [ (1, return [])
, (4, liftM2 (:) arbitrary genList3) ]
Generating ordered lists:
genOrdList = fmap sort genList3
prop_insert :: Int -> Property
prop_insert x = forAll genOrdList $ \xs -> isOrdered (insert x xs)
How to control the size of the generated data ?
sized
The sized
combinator makes the size bound argument of the MkGen
function visible to the user.
list :: Arbitrary a => Int -> Gen [a]
list 0 = return []
list n = liftM2 (:) arbitrary (list (n-1))
genList4 :: Arbitrary a => Gen [a]
genList4 = sized list
It still is not a uniform distribution of lists (see the more complex implementation provided by QuickCheck)
As shown in the Feat paper, it is really hard to get a uniform distribution for tree-like structures...
Shrinking
The shrink
function reduces the size of the generated counter-example (introduced in QuickCheck2).
class Arbitrary a where
arbitrary :: Gen a
shrink :: a -> [a]
shrink
produces a (possibly) empty list of all the possible immediate shrinks of the given value.
The default implementation returns the empty list.
Shrinking Binary Trees
data Tree a = Nil | Branch a (Tree a) (Tree a)
Most implementations of shrink should try at least three things:
- Type-specific shrinkings such as replacing a constructor by a simpler constructor.
- Shrink a term to any of its immediate subterms of the same type.
- Recursively apply shrink to all immediate subterms.
shrink Nil = []
shrink (Branch x l r) =
-- shrink Branch to Nil
[Nil] ++
-- shrink to subterms
[l, r] ++
-- recursively shrink subterms
[Branch x' l' r' | (x', l', r') <- shrink (x, l, r)]
QuickCheck provides a genericShrink
function (based on the Generic
and Typeable
type classes) that help users defining basic shrink functions.
Advanced Usage
- Testing polymorphic properties
- Shrinking heuristic for user-defined data types
- Testing higher-order functions
- Testing monadic code (e.g. in the IO or ST monad)
- Measuring test coverage with HPC
Limits of QuickCheck
- Random generation is good for middle-sized input
- QuickCheck does not provide any guarantee about the correctness of the property
- Difficulty to define uniform generators and shrinking heuristics
- Even with customized
shrink
, no guarantee that we have found the smallest counter-example
SmallCheck
Let us revisit the insert function example:
insert x [] = [x]
insert x (y:ys) | x < y = x : y : ys
| otherwise = y : insert x ys
prop_insert_ordered :: Int -> [Int] -> Property
prop_insert_ordered x xs =
isOrdered xs ==> isOrdered (insert x xs)
Remember that we had to give up with QuickCheck:
main> quickCheck prop_insert_ordered
*** Gave up! Passed only 78 tests.
With SmallCheck, we can verify this property on all inputs of depth <= d:
main> smallCheck 7 prop_insert_ordered
This will try to falsify the property with all the lists of size<=7
Main Contributions of SmallCheck
- Establish complete coverage of a defined test-space (depth <= d)
- Make sure that any counter-example found is minimal
- Writing test generators is straightforward and can be automated
- Allow for the definition of properties using existentials as well as universals
- Display counter-examples of functional type (no more true, some new contribution to QuickCheck)
Data Generation
Instead of Arbitrary
, SmallCheck defines the Serial
type class that provides a series
function.
data Tree a = Nil | Branch a (Tree a) (Tree a)
instance Serial m a => Serial m (Tree a) where
series = cons0 Null \/ cons3 Branch
series n
is the finite list of all values of depth n
.
type Series a = Int -> [a]
Existential Quantifier
Suppose we define a isPrefix
function. How to test its correctness ?
prop_isPrefix xs xs’ = isPrefix xs (xs ++ xs’)
This property is a necessary condition but it is not sufficient. For example, it holds for the following bogus definition:
isPrefix xs ys = True
What we would like to check is:
prop_isPrefixSound xs ys =
isPrefix xs ys ==>
exists $ \xs’ -> xs ++ xs’ == ys
SmallCheck Function Example
Let us define a bogus property that every boolean binary operator is associative.
prop_assoc op = \x y z ->
(x ‘op‘ y) ‘op‘ z == x ‘op‘ (y ‘op‘ z)
where typeInfo = op :: Bool -> Bool -> Bool
smallCheck prop_assoc
Depth 0:
Failed test no. 22. Test values follow.
{True->{True->True;False->True};
False->{True->False;False->True}}
False
True
False
The depth of a function is defined by the number of its nested cases.
Lazy SmallCheck
main> smallCheck 7 prop_insert_ordered
Finished in less than 1 second (very very long in SmallCheck)
How does it work ?
Tries to detect as soon as possible when a partial defined data falsifies the ordered
condition.
It begins by testing undefined
as the value and refines it by need. The demands of the test property guide the exploration of the test-data space.
For example isOrdered [1:0:undefined] == False
. So we can cut this test-data generation branch.
New version of LazySmallCheck with functions and existentials (2012)!
SmartCheck
Generalize counter-examples with universal quantifiers:
prop_RevApp_false:: ([Int],[Int]) -> Bool
prop_RevApp_false (xs,ys) =
reverse (xs ++ ys) == reverse xs ++ reverse ys
main = smartCheck scStdArgs prop_RevApp_false
How does it work?
Shrinks the counter-example (no need to define shrink
)
Generalizes values in the counter-example (extrapolating values)
Generalizes constructors when the datatype has a finite number of constructors (extrapolating constructors)
Feat
Feat (Functional Enumeration of Algebraic Types) provides enumerations as functions from natural numbers to values.
These enumerations can be automatically generated for any algebraic datatype using template Haskell.
-- use template haskell to generate an instance of Enumerable
deriveEnumerable ''Tree
SmallCheck and QuickCheck generators can be derived from these enumerations.
-- create a uniform distribution from an Enumerable
instance (Enumerable a, Arbitrary a) => Arbitrary (Tree a) where
arbitrary = sized uniform
What are the benefits?
- Free uniform generators for QuickCheck
- Particular useful for large abstract syntax trees (less terms of size n than of depth n)
Feat helped find bugs in template-haskell
and haskell-src-exts
.
Conclusion
- Real-size examples (QuviQ [http://www.quviq.com])
"QuickCheck forced me to think about the properties of the library more closely - I changed some design decisions after it turned out that the properties disagreed with me. The one thing QuickCheck helped with more than anything though was refactoring despite a massive number of the functions all depending on each other, QuickCheck allows me to change the behaviour of one function in some obscure case and check that no other function was relying on that."
Re-implementations of QuickCheck for other languages (Python, Java)
A lot of new work in this domain (e.g., Feat with constraints, QuickSpec)
My own experience with QuickCheck and SmallCheck:
Java code generator rewritten in Haskell : found a lot of corner cases we did not find in our previous Java implementation
Implementation of Feat in the Tom language for testing Java code (both QuickCheck and SmallCheck generation strategies)
Testing Functions in QuickCheck
We can define Tree as an instance of the Functor class:
instance Functor Tree where
fmap f (Fork t u) = Fork (fmap f t) (fmap f u)
fmap f (Leaf x) = Leaf (f x)
And now we check the functor laws with QuickCheck:
law1 :: Tree Int -> Bool
law1 t = fmap id t == t
law2 :: (Int -> Int) -> (Int -> Int) -> Tree Int -> Bool
law2 f g t = fmap (f . g) t == fmap f (fmap g t)
main = do
quickCheck law1
quickCheck law2
Space, Right Arrow or swipe left to move to next slide, click help below for more details