sgboost
is an implementation of the sparse-group
boosting 1 to be used in conjunction with the
R-package mboost
. In the fitting process, a formula object
defining group base-learners and individual base-learners is used.
Regularization is based on the degrees of freedom of an individual
base-learners \(df(\lambda)\) and group
base-learners \(df(\lambda^{(g)})\),
such that \(df(\lambda) = \alpha\) and
\(df(\lambda^{(g)}) = 1- \alpha\).
Sparse-group boosting serves as an alternative method to sparse-group
lasso, employing boosted Ridge regression.
You can install the released version of sgboost from CRAN with:
You can install the development version of sgboost from GitHub with:
Vignettes are not included in the package by default. If you want to include vignettes, then use this modified command:
We use the same simulated data and model structure as in the vignette
of sparsegl
(McDonald, Liang , Heinsfeld, 2023)2. Randomly
generate a \(n\times p\) design matrix
X
.Randomly generate an \(n \times
p\) design matrix X. For the real-valued vector y, the following
two settings are being used:
where the coefficient vector \(\beta^*\) is specified as below, and the white noise \(\epsilon\) follows a standard normal distribution.
set.seed(10)
n <- 100
p <- 200
X <- matrix(data = rnorm(n * p, mean = 0, sd = 1), nrow = n, ncol = p)
beta_star <- c(
rep(5, 5), c(5, -5, 2, 0, 0), rep(-5, 5),
c(2, -3, 8, 0, 0), rep(0, (p - 20))
)
groups <- rep(1:(p / 5), each = 5)
# Linear regression model
eps <- rnorm(n, mean = 0, sd = 1)
y <- X %*% beta_star + eps
# Logistic regression model
pr <- 1 / (1 + exp(-X %*% beta_star))
y_binary <- as.factor(rbinom(n, 1, pr))
# Input data.frames
df <- X %>%
as.data.frame() %>%
mutate_all(function(x) {
as.numeric(scale(x))
}) %>%
mutate(y = as.numeric(y), y_binary = y_binary)
group_df <- data.frame(
group_name = groups,
variable_name = head(colnames(df), -2)
)
The sparse-group-boosting problem is formulated as the sum of mean squared error (linear regression) or logistic loss (logistic regression) within each boosting iteration:
Linear regression:
Logistic regression:
\(df(\lambda^{(g)}_j) = \alpha\)
\(df(\lambda^{g}) = 1-\alpha\)
\(df(\lambda) = tr(2H_\lambda-H_\lambda^TH_\lambda)\)
\(H\) is the Ridge Hat matrix of a base-learner
\(X^{(g)}\) is the submatrix of \(X\) with columns corresponding to the features in group \(g\).
\(\beta^{(g)}\) is the corresponding coefficients of the features in group \(g\).
\(p_g\) is the number of predictors in group \(g\).
\(\alpha\) adjusts the weight between group and individual base-learners.
\(\lambda^{(g)}, \lambda^{(g)}_j\) Ridge penalty parameters.
\(u\) is the negative gradient vector from the previous boosting iteration.
To estimate, tune and interpret a sparse-group boosting model, the following workflow is advised:
We start by creating the formula that describes the sparse-group
boosting optimization problem, as stated above. We pass three central
parameters to create_formula
.
alpha
: Mixing parameter between zero and one used for
the convex combination of the degrees of freedom.group_df
: data.frame containing the group structure to
be used.group_name
: The name of the column in
group_df
indicates the group structurevar_name
: The name of the column in
group_df
indicating the name of the variables in the
modelling data.frame to be used. Note that not all variables present on
the modelling data.frame have to be in group_df. These could be ID,
timestamp variables or additional information one does not want to use
in the model.outcome_name
: Name of the outcome variable in the
modelling data.frameintercept
: Should an intercept be used for all
base-learners? If the data is not centered, one should include use
intercept = TRUE
. Note that in this case, individual
base-learners will be groups of size two, and the group size of group
base-learners increases by one.sgb_formula_linear <- create_formula(
alpha = 0.4, group_df = group_df, outcome_name = "y", intercept = FALSE,
group_name = "group_name", var_name = "variable_name",
)
#> Warning in create_formula(alpha = 0.4, group_df = group_df, outcome_name = "y", : there is a group containing only one variable.
#> It will be treated as individual variable and as group
sgb_formula_binary <- create_formula(
alpha = 0.4, group_df = group_df, outcome_name = "y_binary", intercept = FALSE,
group_name = "group_name", var_name = "variable_name",
)
#> Warning in create_formula(alpha = 0.4, group_df = group_df, outcome_name = "y_binary", : there is a group containing only one variable.
#> It will be treated as individual variable and as group
Pass the formula to mboost
and use the arguments as
seems appropriate. The main hyperparameters are nu
and
mstop
. For model tuning, the mboost
function
cvrisk
can be used and plotted. cvrisk
may be
slow to run. One can run it in parallel to speed up.
sgb_model_linear <- mboost(
formula = sgb_formula_linear, data = df,
control = boost_control(nu = 1, mstop = 600)
)
# cv_sgb_model_linear <- cvrisk(sgb_model_linear,
# folds = cv(model.weights(sgb_model_linear),
# type = 'kfold', B = 10))
sgb_model_binary <- mboost(
formula = sgb_formula_binary, data = df, family = Binomial(),
control = boost_control(nu = 1, mstop = 600)
)
# cv_sgb_model_binary <- cvrisk(sgb_model_binary,
# folds = cv(model.weights(sgb_model_linear),
# type = 'kfold', B = 10))
# mstop(cv_sgb_model_linear)
# mstop(cv_sgb_model_binary)
# plot(cv_sgb_model_linear)
# plot(cv_sgb_model_binary)
sgb_model_linear <- sgb_model_linear[320]
sgb_model_binary <- sgb_model_binary[540]
In this example, the lowest out-of-sample risk is obtained at 320 (linear) and 540 (logistic) boosting iterations.
sgboost
has useful functions to understand sparse-group
boosting models and reflects that final model estimates of a specific
variable in the dataset can be attributed to group base-learners as well
as individual baseleraners depending on the boosting iteration
A good starting point for understanding a sparse-group boosting model
is the variable importance. In the context of boosting the variable
importance can be defined as the relative contribution of each predictor
to the overall reduction of the loss function (negative log likelihood).
get_varimp
returns the variable importance of each
base-learner/predictor selected in throughout the boosting process. Note
that in the case of the selection of a group and an individual variable
- call it \(x_1\) - from the same group
-\(x_1, x_2, ... x_p\) -, both
base-learners (predictors) will have an associated variable importance
as defined before. This allows us to differentiate between the
individual contribution of \(x_1\) its
own variable and the contribution of the group \(x_1\) is part of as a group construct. It
is not possible to compute the aggregated variable importance of \(x_1\) as it is not clear how much \(x_1\) contributes to the group. However,
the aggregated coefficients can be computed using get_coef
.
Also, the aggregated importance of all groups vs. all individual
variables are returned in a separate data.frame. With
plot_varimp
one can visualize the variable importance as a
barplot. One can indicate the maximum number of predictors to be printed
through n_predictors
or the minimal variable importance
that a predictor has to have though prop
. Through both
parameters the number of printed entries can be reduced. Note that in
this case, the relative importance of groups in the legend is based only
on the plotted variables and not the ones removed. To add information
about the direction of effect sizes, one could add arrows behind the
bars3. To
do this for the groups, one can use the aggregated coefficients from
get_coef
.
get_varimp(sgb_model = sgb_model_linear)$varimp %>% slice(1:10)
#> # A tibble: 10 × 6
#> reduction blearner predictor selfreq type relative_importance
#> <dbl> <chr> <chr> <dbl> <chr> <dbl>
#> 1 137. bols(V1, V2, V3, V4, V… V1, V2, … 0.162 group 0.300
#> 2 132. bols(V18, intercept = … V18 0.0125 indi… 0.291
#> 3 106. bols(V11, V12, V13, V1… V11, V12… 0.178 group 0.232
#> 4 19.1 bols(V7, intercept = F… V7 0.0625 indi… 0.0419
#> 5 18.1 bols(V6, intercept = F… V6 0.0656 indi… 0.0397
#> 6 16.1 bols(V14, intercept = … V14 0.00312 indi… 0.0354
#> 7 6.87 bols(V17, intercept = … V17 0.05 indi… 0.0151
#> 8 6.27 bols(V16, intercept = … V16 0.025 indi… 0.0138
#> 9 4.27 bols(V149, intercept =… V149 0.025 indi… 0.00939
#> 10 3.54 bols(V8, intercept = F… V8 0.0312 indi… 0.00779
The resulting coefficients can be retrieved through
get_coef
. In a sparse-group boosting models, a variable in
a dataset can be selected as an individual variable or as a group.
Therefore, there can be two associated effect sizes for the same
variable. This function aggregates both and returns them in a data.frame
sorted by the effect size 'effect'
.
get_coef(sgb_model = sgb_model_linear)$aggregate %>% slice(1:10)
#> # A tibble: 10 × 4
#> variable effect blearner predictor
#> <chr> <dbl> <chr> <chr>
#> 1 V18 7.80 bols(V18, intercept = FALSE, df = 0.4) V18
#> 2 V15 -5.87 bols(V11, V12, V13, V14, V15, intercept = FALSE, d… V11, V12…
#> 3 V5 5.42 bols(V1, V2, V3, V4, V5, intercept = FALSE, df = 0… V1, V2, …
#> 4 V4 5.15 bols(V1, V2, V3, V4, V5, intercept = FALSE, df = 0… V1, V2, …
#> 5 V11 -4.99 bols(V11, V12, V13, V14, V15, intercept = FALSE, d… V11, V12…
#> 6 V13 -4.97 bols(V11, V12, V13, V14, V15, intercept = FALSE, d… V11, V12…
#> 7 V14 -4.93 bols(V11, V12, V13, V14, V15, intercept = FALSE, d… V11, V12…
#> 8 V6 4.92 bols(V6, intercept = FALSE, df = 0.4); bols(V6, V7… V6; V6, …
#> 9 V7 -4.89 bols(V7, intercept = FALSE, df = 0.4); bols(V6, V7… V7; V6, …
#> 10 V2 4.87 bols(V1, V2, V3, V4, V5, intercept = FALSE, df = 0… V1, V2, …
get_coef(sgb_model = sgb_model_linear)$raw %>% slice(1:10)
#> # A tibble: 10 × 5
#> variable effect blearner predictor type
#> <chr> <dbl> <chr> <chr> <chr>
#> 1 V18 7.80 bols(V18, intercept = FALSE, df = 0.4) V18 indi…
#> 2 V5 5.42 bols(V1, V2, V3, V4, V5, intercept = FALSE, … V1, V2, … group
#> 3 V15 -5.39 bols(V11, V12, V13, V14, V15, intercept = FA… V11, V12… group
#> 4 V4 5.08 bols(V1, V2, V3, V4, V5, intercept = FALSE, … V1, V2, … group
#> 5 V11 -4.94 bols(V11, V12, V13, V14, V15, intercept = FA… V11, V12… group
#> 6 V6 4.92 bols(V6, intercept = FALSE, df = 0.4) V6 indi…
#> 7 V7 -4.88 bols(V7, intercept = FALSE, df = 0.4) V7 indi…
#> 8 V2 4.84 bols(V1, V2, V3, V4, V5, intercept = FALSE, … V1, V2, … group
#> 9 V3 4.49 bols(V1, V2, V3, V4, V5, intercept = FALSE, … V1, V2, … group
#> 10 V13 -4.48 bols(V11, V12, V13, V14, V15, intercept = FA… V11, V12… group
With plot_effects
we can plot the effect sizes of the
sparse-group boosting model in relation to the relative importance to
get an overall picture of the model. Through the parameter
'plot_type'
one can choose the type of visualization.
'radar'
refers to a radar plot using polar coordinates.
Here the angle is relative to the cumulative relative importance of
predictors and the radius is proportional to the effect size.
'clock'
does the same as 'radar'
but uses
clock coordinates instead of polar coordinates. 'scatter'
uses the effect size as y-coordinate and the cumulative relative
importance as x-axis in a classical Scatter plot.
plot_effects(sgb_model = sgb_model_binary, n_predictors = 10, base_size = 10)
#> 31 predictors were removed. Use prop or n_predictors to change
plot_path
calls get_coef_path
to retrieve
the aggregated coefficients from a mboost
object for each
boosting iteration and plots it, while indicating if a coefficient was
updated by an individual variable or group.
Obster F & Heumann C, (2024). Sparse-group boosting – Unbiased group and variable selection. https://doi.org/10.48550/arXiv.2206.06344↩︎
McDonald D, Liang X, Heinsfeld A, Cohen A, Yang Y, Zou H, Friedman J, Hastie T, Tibshirani R, Narasimhan B, Tay K, Simon N, Qian J & Yang J. (2022). Getting started with sparsegl. https://CRAN.R-project.org/package=sparsegl.↩︎
Obster F., Bohle H. & Pechan P.M. (2024). The financial well-being of fruit farmers in Chile and Tunisia depends more on social and geographical factors than on climate change. Commun Earth Environ 5, 16. https://doi.org/10.1038/s43247-023-01128-2.↩︎