rfPermute estimates the significance of importance metrics for a Random Forest model by permuting the response variable. It produces null distributions of importance metrics for each predictor variable and p-values of observed importances. The package also includes several summary and visualization functions for randomForest and rfPermute results. See rfPermuteTutorial() in the package for a guide on running, summarizing, and diagnosing rfPermute and randomForest models.
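The permutation idea described above can be sketched in base R. This is only an illustration, not the package's implementation: the importance metric here is a stand-in (absolute Pearson correlation) rather than a Random Forest importance score, the names importance_stat and permutation_pvals are hypothetical, and the p-value formula is one common convention for permutation tests.

```r
# Stand-in importance metric (assumption): absolute correlation of each
# predictor column with the response. A Random Forest importance score
# would take its place in a real analysis.
importance_stat <- function(x, y) {
  apply(x, 2, function(col) abs(cor(col, y)))
}

# Build a null distribution per predictor by permuting the response,
# then compare the observed metric against it.
permutation_pvals <- function(x, y, num_rep = 200) {
  observed <- importance_stat(x, y)
  # One column per permutation, one row per predictor
  null_dist <- replicate(num_rep, importance_stat(x, sample(y)))
  # Share of null values at least as large as the observed value,
  # counting the observed value itself as one draw
  pval <- (rowSums(null_dist >= observed) + 1) / (num_rep + 1)
  list(observed = observed, null_dist = null_dist, pval = pval)
}

# Simulated data: one informative predictor and one pure-noise predictor
set.seed(1)
n <- 100
x <- cbind(signal = rnorm(n), noise = rnorm(n))
y <- 2 * x[, "signal"] + rnorm(n)

res <- permutation_pvals(x, y, num_rep = 200)
print(res$pval)
```

In this sketch the informative predictor should receive a small p-value, while the p-value of the noise predictor depends on the random draw.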
To install the stable version from CRAN:
install.packages('rfPermute')
To install the latest version from GitHub:
# make sure you have devtools installed
if (!require('devtools')) install.packages('devtools')
# install from GitHub
devtools::install_github('EricArcher/rfPermute')
rfPermute: Estimate Permutation p-values for Random Forest Importance Metrics
importance: Extract rfPermute Importance Scores and p-values
plotNull: Plot Random Forest Importance Null Distributions
plotImpPreds: Distribution of Important Variables
summary: Summarize rfPermute and randomForest models
confusionMatrix: Confusion Matrix
casePredictions: Return predictions and votes for training cases
pctCorrect: Percent Correctly Classified
plotInbag: Distribution of sample inbag rates
plotPredictedProbs: Distribution of prediction assignment probabilities
plotProximity: Plot Random Forest Proximity Scores
plotTrace: Trace of cumulative error rates in forest
plotVotes: Vote Distribution
combineRP: Combine rfPermute models
balancedSampsize: Balanced Sample Size
cleanRFdata: Clean Random Forest Input Data

n predictors.
Added pct.correct argument to plotTrace(). Default is now to have y-axis as 1 - OOB error rate.

NOTE: v2.5 is a large redevelopment of the package.
The structure of rfPermute model objects has changed, making them incompatible with previous versions. Also, the names and functionality of several functions have changed to make them more consistent with one another. A tutorial (under construction) is available within the package as rfPermuteTutorial().
exptdErrRate
threshold argument in classConfInt and confusionMatrix to NULL
exptdErrRate and confusionMatrix
pctCorrect
casePredictions
plotConfMat, plotOOBtimes, plotRFtrace, and plotInbag, and plotImpVarDist visualizations.
Changed confusionMatrix so it will work when the randomForest model doesn't have a $confusion element, like when the model is the result of combine-ing multiple models.
num.cores to NULL.
Added type argument to plotVotes to choose between area and bar charts.
Renamed plot.rfPermute to plotNull to avoid clashes and maintain functionality of randomForest::plot.randomForest.
Renamed proximity.plot to proximityPlot, exptd.err.rate to exptdErrRate, and clean.rf.data to cleanRFdata to make the camelCase naming scheme more consistent in the package.
Changed plotNull from base graphics to ggplot2.
symb.metab data set.
n argument to impHeatmap.
classConfInt, confusionMatrix, plotVotes, pctCorrect.
Fixed a bug in plot.rfPermute that was reporting the p-value incorrectly at the top of the figure.
rfPermute so it works on Windows too.
impHeatmap function.
Changed proximity.plot to use ggplot2 graphics.
rfPermute has separate $null.dist and $pval elements, each with results for unscaled and scaled importance measures. See ?rfPermute for more information.
rp.importance and plot.rfPermute now take a scale argument to specify whether or not importance values should be scaled by standard deviations.
If nrep = 0 for rfPermute, a randomForest object is returned.
grid name clashes.
Fixed a bug in clean.rf.data where fixed predictors were not removed.
main argument in plot.rp.importance.
Added num.cores argument to rfPermute to take advantage of multi-threading.
calc.imp.pval to keep it from indexing