Please keep in mind, that DALEXtra now supports usage of keras models imported from Python with dedicated explain_keras
function. It is recommended to use it.
DALEX is designed to work with various black-box models like tree ensembles, linear models, neural networks etc. Unfortunately R packages that create such models are very inconsistent. Different tools use different interfaces to train, validate and use models. Fortunately DALEX can handle it all easily.
In this vignette we will show explanations for Multi Layer Perceptron (MLP) trained with h2o and keras packages.
library(dplyr)
library(DALEX)
library(DALEXtra)
library(keras)
library(titanic)
library(fastDummies)
library(h2o)
set.seed(123)
To illustrate applications of DALEX to classification problems we will use the titanic_train
dataset available in the titanic
package. Our goal is to predict the probability that the person will survive the catastrophe based on selected features such as cabin class, sex, age, number of family members on the ship, fare and embarkation. In both packages we will try to build a model with same architecture and inputs.
In the begining we will prepare and clean the data. The first difference between keras
and h2o
is that in keras
predictors and exaplined variable have to be specified as separate numeric tensors. This means that if we want to insert a factor into keras
model we have to encode it first. We will do one-hot encoding using dummy_cols
function from fastDummies
package (for both, h2o
and keras
, so the number and type of inputs will be identical).
# Data preparation and cleaning
<- titanic_train %>%
titanic_small select(Survived, Pclass, Sex, Age, SibSp, Parch, Fare, Embarked) %>%
mutate_at(c("Survived", "Sex", "Embarked"), as.factor) %>%
mutate(Family_members = SibSp + Parch) %>% # Calculate family members
na.omit() %>%
dummy_cols() %>%
select(-Sex, -Embarked, -Survived_0, -Survived_1, -Parch, -SibSp)
print(head(titanic_small))
## Survived Pclass Age Fare Family_members Sex_female Sex_male Embarked_
## 1 0 3 22 7.2500 1 0 1 0
## 2 1 1 38 71.2833 1 1 0 0
## 3 1 3 26 7.9250 0 1 0 0
## 4 1 1 35 53.1000 1 1 0 0
## 5 0 3 35 8.0500 0 0 1 0
## 6 0 1 54 51.8625 0 0 1 0
## Embarked_C Embarked_Q Embarked_S
## 1 0 0 1
## 2 1 0 0
## 3 0 0 1
## 4 0 0 1
## 5 0 0 1
## 6 0 0 1
# Data preprocessing for Keras
<- titanic_small %>%
titanic_small_y select(Survived) %>%
mutate(Survived = as.numeric(as.character(Survived))) %>%
as.matrix()
<- titanic_small %>%
titanic_small_x select(-Survived) %>%
as.matrix()
We can build MLP model in h2o
using h2o.deeplearning
function. To do this w need to first initialize h2o
and we need to convert titanic_small
to H2OFrame
.
h2o.init()
h2o.no_progress()
<- as.h2o(titanic_small, destination_frame = "titanic_small")
titanic_h2o
<- h2o.deeplearning(x = 2:11,
model_h2o y = 1,
training_frame = "titanic_small",
activation = "Rectifier", # ReLU as activation functions
hidden = c(16, 8), # Two hidden layers with 16 and 8 neurons
epochs = 100,
rate = 0.01, # Learning rate
adaptive_rate = FALSE, # Simple SGD
loss = "CrossEntropy")
To build neural network in keras
we have to stack layers, in case of MLP we will use layer_dense
.
<- keras_model_sequential() %>% # Initialization
model_keras layer_dense(units = 16, # 16 neurons in first hidden layer
activation = "relu", # ReLU as activation function
input_shape = c(10)) %>% # Ten inputs
layer_dense(units = 8, activation = "relu") %>%
layer_dense(units = 1, activation = "sigmoid")
%>% compile(
model_keras optimizer = optimizer_sgd(lr = 0.01), # Simple SGD with learning rate 0.01
loss = "binary_crossentropy",
metrics = c("accuracy")
)
<- model_keras %>% fit(
history
titanic_small_x,
titanic_small_y,epochs = 40,
validation_split = 0.2
)
We can now check predictions from both models. Remember that h2o
and keras
are using different implementations of same algorithms and there is a lot of other, often randomized parameters like initial weights values, so to get exactly the same results you would have to consider all of them.
<- data.frame(
henry Pclass = 1,
Age = 8,
Fare = 72,
Family_members = 0,
Sex_male = 1,
Sex_female = 0,
Embarked_S = 0,
Embarked_C = 1,
Embarked_Q = 0,
Embarked_ = 0
)
<- as.h2o(henry, destination_frame = "henry")
henry_h2o <- as.matrix(henry)
henry_keras predict(model_h2o, henry_h2o) %>% print()
predict(model_keras, henry_keras) %>% print()
The first step of using the DALEX package is to wrap-up the black-box model with meta-data that unifies model interfacing.
To create an explainer we use explain()
function. For the models created by h2o package we will use the explain_h2o
funtion from DALEXtra
package.
<- DALEXtra::explain_h2o(model = model_h2o,
explainer_titanic_h2o data = titanic_small[ , -1],
y = as.numeric(as.character(titanic_small$Survived)),
label = "MLP_h2o",
colorize = FALSE)
<- DALEX::explain(model = model_keras,
explainer_titanic_keras data = titanic_small_x,
y = as.numeric(titanic_small_y),
type = "classification",
label = "MLP_keras",
colorize = FALSE)
Function model_performance()
calculates predictions and residuals for validation dataset.
<- model_performance(explainer_titanic_h2o)
mp_titinic_h2o <- model_performance(explainer_titanic_keras) mp_titanic_keras
Generic function print()
returns quantiles for residuals.
mp_titinic_h2o
## Measures for: classification
## recall : 0.7068966
## precision : 0.8874459
## f1 : 0.7869482
## accuracy : 0.8445378
## auc : 0.8983897
##
## Residuals:
## 0% 10% 20% 30% 40%
## -9.954327e-01 -3.220674e-01 -1.592363e-01 -1.109982e-01 -7.308569e-02
## 50% 60% 70% 80% 90%
## -2.791604e-02 2.880694e-06 2.809896e-03 3.623441e-02 5.488800e-01
## 100%
## 9.904999e-01
mp_titanic_keras
## Measures for: classification
## recall : 0.4689655
## precision : 0.7272727
## f1 : 0.5702306
## accuracy : 0.7128852
## auc : 0.7532775
##
## Residuals:
## 0% 10% 20% 30% 40% 50% 60%
## -0.8265051 -0.4426317 -0.3477609 -0.3010883 -0.2615811 -0.2057302 0.1789826
## 70% 80% 90% 100%
## 0.4714622 0.5191331 0.6694979 0.9073019
Generic function plot()
shows reversed empirical cumulative distribution function for absolute values from residuals. Plots can be generated for one or more models.
plot(mp_titinic_h2o, mp_titanic_keras)
We are also able to use the plot()
function to get an alternative comparison of residuals. Setting the geom = "boxplot"
parameter we can compare the distribution of residuals for selected models.
plot(mp_titinic_h2o, mp_titanic_keras, geom = "boxplot")
Using he DALEX package we are able to better understand which variables are important.
Model agnostic variable importance is calculated by means of permutations. We simply substract the loss function calculated for validation dataset with permuted values for a single variable from the loss function calculated for validation dataset.
This method is implemented in the model_parts()
function.
<- model_parts(explainer_titanic_h2o)
vi_titinic_h2o <- model_parts(explainer_titanic_keras)
vi_titinic_keras
# We can compare all models using the generic plot() function.
plot(vi_titinic_h2o, vi_titinic_keras)
Length of the interval coresponds to a variable importance. Longer interval means larger loss, so the variable is more important.
For better comparison of the models we can hook the variabe importance at 0 using the type="difference"
.
<- model_parts(explainer_titanic_h2o, type="difference")
vi_titinic_h2o <- model_parts(explainer_titanic_keras, type="difference")
vi_titinic_keras plot(vi_titinic_h2o, vi_titinic_keras)
As previously we create explainers which are designed to better understand the relation between a variable and model output: PDP plots and ALE plots.
Partial Dependence Plots (PDP) are one of the most popular methods for exploration of the relation between a continuous variable and the model outcome.
<- model_profile(explainer_titanic_h2o, variable = "Age")
mp_age_h2o <- model_profile(explainer_titanic_keras, variable = "Age")
mp_age_keras plot(mp_age_h2o, mp_age_keras)
Acumulated Local Effects (ALE) plot is the extension of PDP, that is more suited for highly correlated variables.
<- model_profile(explainer_titanic_h2o, variable = "Age", type = "accumulated")
mp_age_h2o <- model_profile(explainer_titanic_keras, variable = "Age", type = "accumulated")
mp_age_keras plot(mp_age_h2o, mp_age_keras)
The function predict_parts()
is a wrapper around a breakDown package. Model prediction is visualized with Break Down Plots, which show the contribution of every variable present in the model. Function predict_parts()
generates variable attributions for selected prediction. The generic plot()
function shows these attributions.
<- predict_parts(explainer_titanic_h2o, henry, type = "break_down")
pp_h2o <- predict_parts(explainer_titanic_keras, henry_keras, type = "break_down")
pp_keras
plot(pp_h2o)
plot(pp_keras)
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] h2o_3.32.0.1 fastDummies_1.6.3 titanic_0.1.0 keras_2.3.0.0
## [5] DALEXtra_2.0 DALEX_2.0.1 dplyr_1.0.2
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.5 pillar_1.4.6 compiler_4.0.2 ingredients_2.0
## [5] bitops_1.0-6 base64enc_0.1-3 tools_4.0.2 bit_4.0.4
## [9] zeallot_0.1.0 digest_0.6.25 jsonlite_1.7.1 evaluate_0.14
## [13] lifecycle_0.2.0 tibble_3.0.3 gtable_0.3.0 lattice_0.20-41
## [17] pkgconfig_2.0.3 rlang_0.4.7 Matrix_1.2-18 yaml_2.2.1
## [21] xfun_0.19 stringr_1.4.0 knitr_1.30 generics_0.1.0
## [25] vctrs_0.3.4 rappdirs_0.3.1 bit64_4.0.5 grid_4.0.2
## [29] tidyselect_1.1.0 data.table_1.13.4 reticulate_1.18 glue_1.4.2
## [33] R6_2.4.1 iBreakDown_1.3.1 rmarkdown_2.6 farver_2.0.3
## [37] whisker_0.4 ggplot2_3.3.2 purrr_0.3.4 magrittr_2.0.1
## [41] codetools_0.2-16 tfruns_1.4 scales_1.1.1 ellipsis_0.3.1
## [45] htmltools_0.5.0 colorspace_1.4-1 labeling_0.3 tensorflow_2.2.0
## [49] stringi_1.5.3 RCurl_1.98-1.2 munsell_0.5.0 crayon_1.3.4