Classifier Calibration
A classifier is "calibrated" when the predicted probability of a class matches the expected frequency of that class. mlr can visualize this by plotting estimated class probabilities (which are discretized) against the observed frequency of said class in the data using generateCalibrationData and plotCalibration.
generateCalibrationData takes as input Prediction, ResampleResult, BenchmarkResult,
or a named list of Prediction or ResampleResult objects on a classification (multiclass
or binary) task with learner(s) that are capable of outputting probabiliites (i.e., learners
must be constructed with predict.type = "prob"
). The result is an object of class
CalibrationData which has elements proportion
, data
, and
task
. proportion
gives the proportion of observations labelled with a given class for
each predicted probability bin (e.g., for observations which are predicted to have class "A"
with probability , what is the proportion of said observations which have class "A"?).
lrn = makeLearner("classif.rpart", predict.type = "prob")
mod = train(lrn, task = sonar.task)
pred = predict(mod, task = sonar.task)
cal = generateCalibrationData(pred)
cal$proportion
#> Learner bin Class Proportion
#> 1 prediction (0.1,0.2] M 0.1060606
#> 2 prediction (0.7,0.8] M 0.7333333
#> 3 prediction [0,0.1] M 0.0000000
#> 4 prediction (0.9,1] M 0.9333333
#> 5 prediction (0.2,0.3] M 0.2727273
#> 6 prediction (0.4,0.5] M 0.4615385
#> 7 prediction (0.8,0.9] M 0.0000000
#> 8 prediction (0.5,0.6] M 0.0000000
The manner in which the predicted probabilities are discretized is controlled by two arguments:
breaks
and groups
. By default breaks = "Sturges"
which uses the Sturges algorithm in
hist. This argument can specify other algorithms available in
hist, it can be a numeric vector specifying breakpoints for cut,
or a single integer specifying the number of bins to create (which are evenly spaced).
Alternatively, groups
can be set to a positive integer value (by default groups = NULL
)
in which case cut2 is used to create bins with an approximately equal number
of observations in each bin.
cal = generateCalibrationData(pred, groups = 3)
cal$proportion
#> Learner bin Class Proportion
#> 1 prediction [0.000,0.267) M 0.08860759
#> 2 prediction [0.267,0.925) M 0.51282051
#> 3 prediction [0.925,1.000] M 0.93333333
CalibrationData objects can be plotted using plotCalibration. plotCalibration by default plots a reference line which shows perfect calibration and a "rag" plot, which is a rug plot on the top and bottom of the graph, where the top pertains to "positive" cases, where the predicted class matches the observed class, and the bottom pertains to "negative" cases, where the predicted class does not match the observed class. Perfect classifier performance would result in all the positive cases clustering in the top right (i.e., the correct classes are predicted with high probability) and the negative cases clustering in the bottom left.
plotCalibration(cal)
Because of the discretization of the probabilities, sometimes it is advantageous to smooth
the calibration plot. Though smooth = FALSE
by default, setting this option to TRUE
replaces the estimated proportions with a loess smoother.
cal = generateCalibrationData(pred)
plotCalibration(cal, smooth = TRUE)
All of the above functionality works with multi-class classification as well.
lrns = list(
makeLearner("classif.randomForest", predict.type = "prob"),
makeLearner("classif.nnet", predict.type = "prob", trace = FALSE)
)
mod = lapply(lrns, train, task = iris.task)
pred = lapply(mod, predict, task = iris.task)
names(pred) = c("randomForest", "nnet")
cal = generateCalibrationData(pred, breaks = c(0, .3, .6, 1))
plotCalibration(cal)