.Rd file to satisfy CRAN policies.
The explain() function now returns a matrix, as opposed to a tibble, which makes more sense since Shapley values are always numeric; data frames (and tibbles) are really only necessary when the data are heterogeneous. In essence, the output from explain() will act like an R matrix but with class structure c("explain", "matrix", "array"); you can always convert the results to a tibble using tibble::as_tibble(result).
Two new data sets, titanic and titanic_mice, were added to the package; see the corresponding help pages for details.
The plotting functions have all been deprecated in favor of the (far superior) shapviz package by @Mayer79 (grid.arrange() is also no longer imported from gridExtra). Consequently, the output from explain() no longer needs to have its own "explain" class (only an ordinary c("matrix", "array") object is returned).
The explain() function gained three new arguments:
baseline, which defaults to NULL, contains the baseline to use when adjusting Shapley values to meet the efficiency property. If NULL and adjust = TRUE, it defaults to the average training prediction (i.e., the average prediction over X).
shap_only, which defaults to TRUE, determines whether to return a matrix of Shapley values containing the baseline as an attribute (TRUE) or a list containing the Shapley values, corresponding feature values, and baseline (FALSE); setting it to FALSE is a convenience when using the shapviz package.
parallel, which defaults to FALSE, determines whether to compute Shapley values in parallel (across features) using any suitable parallel backend supported by foreach.
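As a rough illustration of the default baseline described above, the base R sketch below computes the average training prediction for a small model and the value that each observation's Shapley values should sum to under the efficiency property. The model, the mtcars data, and the wrapper name pfun are assumptions chosen for illustration; the sketch does not call explain() itself.

```r
# Illustrative model and training features (assumptions, not package examples)
fit <- lm(mpg ~ wt + qsec, data = mtcars)
X <- mtcars[, c("wt", "qsec")]

# Hypothetical prediction wrapper that returns a plain numeric vector
pfun <- function(object, newdata) {
  as.numeric(predict(object, newdata = newdata))
}

# The documented default when baseline = NULL and adjust = TRUE:
# the average prediction over the training features X
baseline <- mean(pfun(fit, X))

# Efficiency property: the Shapley values for an observation should sum to
# that observation's prediction minus the baseline
target <- pfun(fit, X) - baseline
head(target, 3)
```

Passing a different baseline simply shifts every target by a constant, which is why the baseline is worth keeping alongside the Shapley values.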
The X and newdata arguments of explain() should now work with tibbles (#20).
Minor change to explain.lgb.Booster() to support breaking changes in lightgbm v4.0.0. (Thanks to @jameslamb and @Mayer79.)
The dependency on matrixStats has been removed in favor of using R’s internal apply() and var() functions.
The dependency on plyr, which has been retired, has been removed in favor of using foreach directly.
Removed the CXX_STD=CXX11 flag and, as a result, raised the minimum R version to R >= 3.6.
slowtests/ directory (for now).
The force_plot() function should now be compatible with shap (>=0.36.0); thanks to @hfshr and @jbwoillard for reporting (#12).
Fixed minor name repair issue caused by tibble.
explain() should now be MUCH faster at explaining a single observation, especially when nsim is relatively large (e.g., nsim >= 1000).
The default method of explain() gained a new logical argument called adjust. When adjust = TRUE (and nsim > 1), the algorithm will adjust the sum of the estimated Shapley values to satisfy the efficiency property; that is, to equal the difference between the model’s prediction for that sample and the average prediction over all the training data. This option is experimental and follows the same approach as shap (#6).
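To make the efficiency adjustment concrete, here is a minimal base R sketch. It assumes the correction is spread across features in proportion to the Monte Carlo variance of each estimate, which is one simple way to do this and not necessarily the exact rule used by the package or by shap. All numbers and names (reps, phi_hat, fx, baseline) are made up for illustration.

```r
set.seed(101)

# Made-up Monte Carlo replicates: 50 draws for each of 3 features
reps <- cbind(
  x1 = rnorm(50, mean = 2.0, sd = 0.5),
  x2 = rnorm(50, mean = -1.0, sd = 0.2),
  x3 = rnorm(50, mean = 0.5, sd = 1.0)
)

phi_hat <- colMeans(reps)       # unadjusted Shapley estimates
phi_var <- apply(reps, 2, var)  # per-feature sampling variance

fx <- 4.0        # hypothetical model prediction for this observation
baseline <- 2.3  # hypothetical average training prediction

# Gap between what the estimates sum to and what efficiency requires
err <- sum(phi_hat) - (fx - baseline)

# Assumed rule: noisier estimates absorb more of the correction
phi_adj <- phi_hat - err * phi_var / sum(phi_var)

sum(phi_adj)  # matches fx - baseline up to floating-point error
```

Under this assumed rule a variance estimate is needed for every feature, which is consistent with the requirement that nsim > 1.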
New (experimental) function for constructing force plots (#7) to help visualize prediction explanations. The function is also a generic, which means additional methods can be added.
Function explain() became a generic and gained a new logical argument, exact, for computing exact Shapley contributions for linear models (Linear SHAP, which assumes independent features) and boosted decision trees (Tree SHAP). Currently, only "lm", "glm", and "xgb.Booster" objects are supported (#2, #3).
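For intuition about the linear case, the sketch below computes Linear SHAP contributions by hand in base R, using the standard formula under independent features: the contribution of feature j is its coefficient times the distance of the feature value from its training mean. The model and data are illustrative assumptions, and the code does not call explain(); it only shows what exact contributions look like for an "lm" fit.

```r
# Illustrative linear model (an assumption, not a package example)
fit <- lm(mpg ~ wt + hp, data = mtcars)
X <- as.matrix(mtcars[, c("wt", "hp")])
beta <- coef(fit)[colnames(X)]

# Center each feature at its training mean, then scale by its coefficient
centered <- sweep(X, 2, colMeans(X), FUN = "-")
phi <- sweep(centered, 2, beta, FUN = "*")

# Each row of contributions sums to that row's prediction minus the
# average training prediction
baseline <- mean(fitted(fit))
all.equal(unname(rowSums(phi)), unname(fitted(fit) - baseline))
```

Because the contributions are available in closed form, no Monte Carlo sampling is needed for this model class.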
Minor improvements to package documentation.
Removed unnecessary legend from contribution plots.
Tweaked imports (in particular, use the @importFrom Rcpp sourceCpp tag).
Fixed a typo in the package description; Shapley was misspelled as Shapely (fixed by Dirk Eddelbuettel in #1).
You can now specify type = "contribution" in the call to autoplot.fastshap() to plot the explanation for a single instance (controlled by the row_num argument).
autoplot.fastshap() gained some useful new arguments:
color_by, for specifying an additional feature to color by in dependence plots (i.e., whenever type = "dependence");
smooth, smooth_color, smooth_linetype, smooth_size, and smooth_alpha, for adding/controlling a smoother in dependence plots (i.e., whenever type = "dependence");
..., which can be used to pass additional parameters on to geom_col() (when type = "importance") or geom_point() (when type = "dependence").
Function fastshap() was renamed to explain().
Functions explain() and explain_column() (not currently exported) now throw an error whenever the inputs X and newdata do not inherit from the same class.
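The kind of check described above can be sketched in a few lines of base R. The helper name check_same_class and the wording of the error message are hypothetical; this is an illustration of the idea, not the package's actual code.

```r
# Hypothetical helper mirroring the described check (not the package's code)
check_same_class <- function(X, newdata) {
  if (!identical(class(X), class(newdata))) {
    stop("`X` and `newdata` do not inherit from the same class: ",
         paste(class(X), collapse = "/"), " vs. ",
         paste(class(newdata), collapse = "/"), call. = FALSE)
  }
  invisible(TRUE)
}

X <- mtcars[, c("wt", "hp")]

# Passes silently: both inputs are data frames
check_same_class(X, X[1:2, ])

# Fails: a data frame paired with a matrix; capture the message instead of
# stopping the script
res <- tryCatch(
  check_same_class(X, as.matrix(X[1:2, ])),
  error = function(e) conditionMessage(e)
)
res
```

Failing early like this avoids harder-to-diagnose errors later, when rows of newdata are combined with rows sampled from X.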
Fixed a bug in the C++ source that gave more weight to extreme permutations.
Fixed a bug in the C++ source that caused doubles to be incorrectly converted to integers.
Fixed a bug in autoplot.fastshap() when type = "importance"; in particular, the function incorrectly used sum(|Shapley value|) instead of mean(|Shapley value|).
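The difference between the two importance measures is easy to see on a toy matrix. The values below are made up, and the variable names are only for illustration.

```r
# Made-up Shapley values: 4 observations (rows), 2 features (columns)
shap <- matrix(c(0.5, -0.5,  1.0, -1.0,
                 0.1,  0.1, -0.1,  0.1),
               nrow = 4, dimnames = list(NULL, c("x1", "x2")))

imp_mean <- colMeans(abs(shap))  # mean(|Shapley value|), the corrected measure
imp_sum  <- colSums(abs(shap))   # sum(|Shapley value|), the earlier behavior

imp_mean
imp_sum / nrow(shap)  # identical to imp_mean
```

The two measures rank features the same way, but the sum grows with the number of rows being explained, so only the mean is comparable across data sets of different sizes.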