- New functions:
  - `bootstrap_performance()` allows you to calculate confidence intervals for the model performance from a single train/test split by bootstrapping the test set (#329, @kelly-sovacool).
  - `calc_balanced_precision()` allows you to calculate balanced precision and balanced area under the precision-recall curve (#333, @kelly-sovacool).
- Changes to the `find_feature_importance()` output (#326, @kelly-sovacool):
  - Renamed the column `names` to `feat` to represent each feature or group of correlated features.
  - New columns `lower` and `upper` report the bounds of the empirical 95% confidence interval from the permutation test. See `vignette('parallel')` for an example of plotting feature importance with confidence intervals.
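One way such a plot might look, as a minimal sketch (not taken from the package or its vignette): `feat_imp` stands for the feature importance data frame, and the `perf_metric` column name is an assumption here; `feat`, `lower`, and `upper` are the columns described above.

```r
library(ggplot2)

# Plot each feature's permutation performance with its empirical 95% CI.
# `feat_imp` and the `perf_metric` column are illustrative assumptions.
ggplot(feat_imp, aes(x = perf_metric, y = feat)) +
  geom_pointrange(aes(xmin = lower, xmax = upper)) +
  labs(x = "Performance when feature is permuted", y = NULL)
```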
- Updated the parallel vignette (#310, @kelly-sovacool).
- New model method `parRF`, a parallel implementation of the `rf` method, with the same default hyperparameters as `rf` set automatically (#306, @kelly-sovacool).
- New functions:
  - `calc_model_sensspec()` - calculate sensitivity, specificity, and precision for a model.
  - `calc_mean_roc()` & `plot_mean_roc()` - calculate & plot specificity and mean sensitivity for multiple models.
  - `calc_mean_prc()` & `plot_mean_prc()` - calculate & plot recall and mean precision for multiple models.
- Additional arguments to `run_ml()` are now forwarded to `caret::train()` (#304, @kelly-sovacool).
  - For example, users can now pass `weights` through to `caret::train()`, allowing greater flexibility.
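A hedged sketch of what forwarding an extra argument might look like; the weights values are purely illustrative, and `otu_mini_bin` is the example dataset bundled with mikropml.

```r
library(mikropml)

# Illustrative case weights, one per row (uniform here just to show the call).
w <- rep(1, nrow(otu_mini_bin))

# Arguments run_ml() does not recognize, such as weights,
# are passed on to caret::train().
results <- run_ml(otu_mini_bin, "glmnet",
                  outcome_colname = "dx",
                  weights = w,
                  seed = 2019)
```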
- New function `compare_models()` compares the performance of two models with a permutation test (#295, @courtneyarmour).
- Fixed a bug where `cv_times` did not affect the reported repeats for cross-validation (#291, @kelly-sovacool).

This minor patch fixes a test failure on platforms with no long doubles. The actual package code remains unchanged.
- `run_ml()` now works when `kfold >= length(groups)` (#285, @kelly-sovacool).
  - Previously, `kfold` had to be <= the number of groups in the training set, and an error was thrown if this condition was not met. Now, if there are not enough groups in the training set for groups to be kept together during CV, groups are allowed to be split up across CV partitions.
- New parameter `cross_val` added to `run_ml()` allows users to define their own custom cross-validation scheme (#278, @kelly-sovacool).
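For example, a sketch assuming `cross_val` accepts the output of `caret::trainControl()` (check `?run_ml` for the exact expected format):

```r
library(mikropml)
library(caret)

# A custom scheme: 5-fold cross-validation repeated twice.
custom_cv <- trainControl(method = "repeatedcv", number = 5, repeats = 2)

results <- run_ml(otu_mini_bin, "glmnet",
                  outcome_colname = "dx",
                  cross_val = custom_cv,
                  seed = 2019)
```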
- New parameter `calculate_performance` controls whether performance metrics are calculated (default: `TRUE`). Users may wish to skip performance calculations when training models with no cross-validation.
- New parameter `group_partitions` added to `run_ml()` allows users to control which groups should go to which partition of the train/test split (#281, @kelly-sovacool).
- More flexibility in the `training_frac` parameter of `run_ml()` (#281, @kelly-sovacool).
  - By default, `training_frac` is a fraction between 0 and 1 that specifies how much of the dataset should be used in the training fraction of the train/test split.
  - Alternatively, users can give `training_frac` a vector of indices that correspond to which rows of the dataset should go in the training fraction of the train/test split. This gives users direct control over exactly which observations are in the training fraction if desired.
- `group_correlated_features()` is now a user-facing function.
- Any correlation method supported by `stats::cor` can now be used via the `corr_method` parameter: `get_feature_importance(corr_method = "pearson")`.
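The two forms of `training_frac` described above might look like this (a minimal sketch using the bundled `otu_mini_bin` example dataset; the 80% split size is illustrative):

```r
library(mikropml)

# Form 1: training_frac as a fraction between 0 and 1.
res_frac <- run_ml(otu_mini_bin, "glmnet",
                   outcome_colname = "dx",
                   training_frac = 0.8,
                   seed = 2019)

# Form 2: training_frac as explicit row indices for the training set.
set.seed(2019)
train_rows <- sample(nrow(otu_mini_bin),
                     size = round(0.8 * nrow(otu_mini_bin)))
res_rows <- run_ml(otu_mini_bin, "glmnet",
                   outcome_colname = "dx",
                   training_frac = train_rows,
                   seed = 2019)
```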
- Fixed a bug where `preprocess_data()` converted the outcome column to a character vector (#273, @kelly-sovacool, @ecmaggioncalda).
- New parameter in `preprocess_data()`: `prefilter_threshold` (#240, @kelly-sovacool, @courtneyarmour).
  - Features that appear in `prefilter_threshold` or fewer rows in the data are removed.
  - New function `remove_singleton_columns()`, called by `preprocess_data()`, carries this out.
- New parameter in `get_feature_importance()`: `groups` (#246, @kelly-sovacool).
  - `groups` is `NULL` by default; in this case, correlated features above `corr_thresh` are grouped together.
- `preprocess_data()` now replaces spaces in the outcome column with underscores (#247, @kelly-sovacool, @JonnyTran).
- Added progress bars to `preprocess_data()` and `get_feature_importance()` using the progressr package (#257, @kelly-sovacool, @JonnyTran, @FedericoComoglio).
- Fixed errors caused by the changed default `stringsAsFactors` behavior.
- Moved rpart from Suggests to Imports for consistency with other packages used during model training.

This is the first release version of mikropml! 🎉

- Added a `NEWS.md` file to track changes to the package.
- Major new functions:
  - `run_ml()`
  - `preprocess_data()`
  - `plot_model_performance()`
  - `plot_hp_performance()`
- ML methods supported in `run_ml()`:
  - `glmnet`: logistic and linear regression
  - `rf`: random forest
  - `rpart2`: decision trees
  - `svmRadial`: support vector machines
  - `xgbTree`: gradient-boosted trees
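Putting the major functions together, a minimal end-to-end sketch (assuming the bundled `otu_mini_bin` example dataset and that `preprocess_data()` returns the preprocessed data as `dat_transformed`):

```r
library(mikropml)

# Preprocess, then train and evaluate a random forest model.
preproc <- preprocess_data(otu_mini_bin, outcome_colname = "dx")
results <- run_ml(preproc$dat_transformed, "rf",
                  outcome_colname = "dx",
                  seed = 2019)

# Inspect test-set performance metrics and the trained caret model.
results$performance
results$trained_model
```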