This vignette shows the basic workflow of using SHAPforxgboost to interpret models trained with XGBoost, a highly efficient gradient boosting implementation (Chen and Guestrin 2016).
library("ggplot2")
library("SHAPforxgboost")
library("xgboost")
set.seed(9375)
Let’s train a small model to predict the first column in the iris data set, namely Sepal.Length.
head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
X <- data.matrix(iris[, -1])
dtrain <- xgb.DMatrix(X, label = iris[[1]])
fit <- xgb.train(
  params = list(
    objective = "reg:squarederror",
    learning_rate = 0.1
  ),
  data = dtrain,
  nrounds = 50
)
Now we can prepare the SHAP values and analyze the results, all in just a few lines of code.
# Crunch SHAP values
shap <- shap.prep(fit, X_train = X)
# SHAP importance plot
shap.plot.summary(shap)
# Alternatively, mean absolute SHAP values
shap.plot.summary(shap, kind = "bar")
# Dependence plots in decreasing order of importance
# (colored by strongest interacting variable)
for (x in shap.importance(shap, names_only = TRUE)) {
  p <- shap.plot.dependence(
    shap,
    x = x,
    color_feature = "auto",
    smooth = FALSE,
    jitter_width = 0.01,
    alpha = 0.4
  ) +
    ggtitle(x)
  print(p)
}
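To see what these plots are built from, it can help to look at the raw SHAP contributions. The following is a minimal sketch rather than part of the SHAPforxgboost workflow: it uses xgboost's own `predict(..., predcontrib = TRUE)` to obtain per-feature contributions and checks that each row sums to the model prediction (the additivity property of SHAP values). The model is refitted so that the block runs on its own, and the names `contrib` and `pred` are illustrative.

```r
library("xgboost")

set.seed(9375)

# Same setup as in the vignette, repeated so this block is self-contained
X <- data.matrix(iris[, -1])
dtrain <- xgb.DMatrix(X, label = iris[[1]])
fit <- xgb.train(
  params = list(objective = "reg:squarederror", learning_rate = 0.1),
  data = dtrain,
  nrounds = 50
)

# One column per feature plus one baseline (bias) column
contrib <- predict(fit, dtrain, predcontrib = TRUE)
pred <- predict(fit, dtrain)

# Additivity: the contributions in each row should sum to the prediction,
# so the largest absolute difference should be close to zero
max(abs(unname(rowSums(contrib)) - as.numeric(pred)))
```

A difference close to zero (up to floating-point error) indicates that the contributions fully decompose each prediction.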
Note: the explicit print(p) is required only because ggplot objects created inside a for loop (here, in an R Markdown document) are not printed automatically.
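The auto-printing behaviour can be seen in isolation with a small ggplot2-only sketch. It is an illustration, not part of the SHAPforxgboost workflow, and the variable names `v` and `p` are arbitrary. Without the `print(p)` call, the loop would run silently and produce no plots.

```r
library("ggplot2")

# Inside a for loop, a ggplot object is not auto-printed,
# so each plot has to be printed explicitly
for (v in c("Sepal.Width", "Petal.Length")) {
  p <- ggplot(iris, aes(x = .data[[v]], y = Sepal.Length)) +
    geom_point(alpha = 0.4) +
    ggtitle(v)
  print(p)
}
```

At the top level of a script or chunk, typing `p` on its own line is enough, because R prints the value of a top-level expression automatically.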
This is just a teaser: SHAPforxgboost can do much more!
Check out the README for further details.