nlpred
Small-sample optimized estimators of cross-validated prediction metrics
nlpred is an R package for computing estimates of cross-validated prediction metrics. These estimates are tailored for superior performance in small samples. Several estimators are available, including ones based on cross-validated targeted minimum loss-based estimation, estimating equations, and one-step estimation.
For standard use, we recommend installing the package from CRAN via
install.packages("nlpred")
You can install the current release of nlpred from GitHub via devtools with:
devtools::install_github("benkeser/nlpred")
The main functions in the package are cv_auc and cv_scrnp, which are used to compute, respectively, the K-fold cross-validated area under the receiver operating characteristic curve (CVAUC) and the K-fold cross-validated sensitivity-constrained rate of negative prediction (SCRNP). However, rather than using standard cross-validation estimators (where prediction algorithms are developed in a training sample and AUC/SCRNP estimated using the validation sample), we instead use techniques from efficiency theory to estimate these quantities. This allows us to use the training data both to develop the prediction algorithm and to estimate the key nuisance parameters needed to evaluate AUC/SCRNP. By reserving more data for estimation of these key parameters, we obtain improved performance in small samples.
# load package
library(nlpred)
#> Loading required package: data.table
# turn off messages from np package
options(np.messages=FALSE)
# simulate data
n <- 200
p <- 10
X <- data.frame(matrix(rnorm(n*p), nrow = n, ncol = p))
Y <- rbinom(n, 1, plogis(X[,1] + X[,10]))
# get cv auc estimates for logistic regression
logistic_cv_auc_ests <- cv_auc(Y = Y, X = X, K = 5, learner = "glm_wrapper")
logistic_cv_auc_ests
#>                est         se       cil       ciu
#> cvtmle   0.7598522 0.03223410 0.6966745 0.8230299
#> onestep  0.7601000 0.03252870 0.6963449 0.8238551
#> esteq    0.7557129 0.03252870 0.6919578 0.8194680
#> standard 0.7660940 0.03348094 0.7004726 0.8317154
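Each row of the printed summary corresponds to one estimator (the CV-TMLE, one-step, estimating-equations, and standard cross-validated estimators), and the columns report the point estimate (est), its standard error (se), and the lower and upper limits of a 95% confidence interval (cil, ciu). If you want to work with the fitted object beyond its printed summary, it can be inspected directly; a minimal sketch in base R (the internal element names are not shown here, so see ?cv_auc for the documented return value):
# list the top-level components of the fitted object; the table above is
# produced by the object's print method
str(logistic_cv_auc_ests, max.level = 1)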
# get cv auc estimates for random forest using nested
# cross-validation for nuisance parameter estimation. nested
# cross-validation is unfortunately necessary when aggressive learners
# are used.
rf_cv_auc_ests <- cv_auc(Y = Y, X = X, K = 5,
                         learner = "randomforest_wrapper",
                         nested_cv = TRUE)
rf_cv_auc_ests
#>                est         se       cil       ciu
#> cvtmle   0.7305404 0.03606462 0.6598550 0.8012257
#> onestep  0.7308869 0.03625171 0.6598349 0.8019390
#> esteq    0.7281639 0.03625171 0.6571118 0.7992159
#> standard 0.7435551 0.03553040 0.6739168 0.8131934
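The number of nested cross-validation folds used for nuisance parameter estimation can also be adjusted. A minimal sketch, assuming cv_auc accepts a nested_K argument (check ?cv_auc to confirm the argument name and its default):
# as above, but explicitly setting the (assumed) nested_K argument that
# controls the number of nested cross-validation folds
rf_cv_auc_ests_nested3 <- cv_auc(Y = Y, X = X, K = 5,
                                 learner = "randomforest_wrapper",
                                 nested_cv = TRUE, nested_K = 3)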
# same examples for scrnp
logistic_cv_scrnp_ests <- cv_scrnp(Y = Y, X = X, K = 5, learner = "glm_wrapper")
logistic_cv_scrnp_ests
#>                est         se        cil       ciu
#> cvtmle   0.1099379 0.03873987 0.03400918 0.1858667
#> onestep  0.1237150 0.03857579 0.04810785 0.1993222
#> esteq    0.1237150 0.03857579 0.04810785 0.1993222
#> standard 0.1612586 0.03851825 0.08576425 0.2367530
rf_cv_scrnp_ests <- cv_scrnp(Y = Y, X = X, K = 5,
                             learner = "randomforest_wrapper",
                             nested_cv = TRUE)
rf_cv_scrnp_ests
#>                 est         se         cil       ciu
#> cvtmle   0.09331934 0.02851627 0.037428470 0.1492102
#> onestep  0.09642105 0.02851279 0.040536999 0.1523051
#> esteq    0.09642105 0.02851279 0.040536999 0.1523051
#> standard 0.08475865 0.04111922 0.004166465 0.1653508
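Other learner wrappers ship with the package and can be substituted via the learner argument. A minimal sketch, assuming a glmnet-based wrapper named glmnet_wrapper is available (see the package documentation for the wrappers included in your installed version):
# SCRNP estimates using an (assumed available) lasso-based learner wrapper
lasso_cv_scrnp_ests <- cv_scrnp(Y = Y, X = X, K = 5, learner = "glmnet_wrapper")
lasso_cv_scrnp_ests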
If you encounter any bugs or have any specific feature requests, please file an issue.
Interested contributors can consult our contribution guidelines
prior to submitting a pull request.
After using the nlpred package, please cite the following:
@Manual{nlpredpackage,
title = {nlpred: Estimators of Non-Linear Cross-Validated Risks Optimized for Small Samples},
author = {David Benkeser},
note = {R package version 1.0.1}
}
@article{benkeser2019improved,
year = {2019},
author = {Benkeser, David C and Petersen, Maya and van der Laan, Mark J},
title = {Improved Small-Sample Estimation of Nonlinear Cross-Validated Prediction Metrics},
journal = {Journal of the American Statistical Association},
doi = {10.1080/01621459.2019.1668794}
}
© 2019- David Benkeser
The contents of this repository are distributed under the MIT license. See below for details:
The MIT License (MIT)
Copyright (c) 2019- David C. Benkeser
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.