The fabPrediction
package provides functions to
implement FAB conformal prediction algorithms for continuous data as in
Bersson and Hoff,
2022 and categorical counts data as in Bersson and Hoff, 2023.
The package also includes capabilities to obtain standard direct and
Bayesian prediction regions for both types of data.
or
We will demonstrate FAB prediction with log radon values from counties in Minnesota. See Price et al., 1996.
If we have prior information regarding the population mean of the response, we can obtain FAB prediction regions for one county.
A FAB prediction region, although nonparametric, constructs prediciton intervals based on a standard normal-normal model, \[\begin{aligned} y \sim{}& N(\theta,\sigma^2)\\ \theta \sim{}& N(\mu,\tau^2\sigma^2), \end{aligned}\]and the algorithm requires input values of \(\{\mu,\tau^2\}\).
Say we are fairly confident log radon values in County 9 should be
centered at 0.5. Then, we can obtain a prediction region with the
fab_prediction
function with a nonparametric guarantee of
\(1-\alpha\) conservative coverage.
fab.region = predictionInterval(y_county9,method = "FAB",
alpha = .15,
mu = 0.5,tau2 = 1)
fab.region$bounds
So, with at least \(85\%\)
probability, the next observed radon value in County 9 will be in the
range contained in the object fab.region
. We can easily
plot the output:
We can easily compare this the distance-to-average conformal prediction region.
plot(predictionInterval(y_county9,method = "DTA",alpha = .15),
main="DTA Prediction Interval For County 9",xlab="log(radon)")
The functions include capabilities to obtain and plot standard frequentist and Bayesian prediction intervals based on a normal model. Unlike the FAB and DTA methods, the coverage of these methods rely on, among other things, accuracy of distributional assumptions.
The package also features capability to handle multiple groups. Spatial relationships among the groups and covariates can be used to estimate conformal parameters.
We will demonstrate constructing prediction sets for categorical data on a simulated dataset consisting of 50 categories and 5 groups with varying within-group sample sizes.
N.groups = c(10,50,75,100,150)
set.seed(1)
prob = rdirichlet(50:1)
y = t(sapply(N.groups,function(j)rmultinom(1,j,prob)))
If we have prior information regarding the prior concentration parameter for a given group, we can construct a FAB prediction set for that group.
And, we can plot the categories included in the prediction set:
Similarly, we can construct and plot a direct prediction set for the same data:
plot(predictionSet(y_group3,method = "direct"),
main = "Direct Prediction Set for Group 3",
cex.axis=.5)
We can also construct a Bayesian prediction set based on this model, using the same prior as before in the FAB approach. Note, though, that this prediction set does not necessarily maintain the nominal coverage rate.
The package also features capability to handle multiple groups. To construct a prediction set for group 3 in this way, the prior concentration \(\gamma\) can be estimated with data from all groups except 3.
This prior can be used as the prior information in constructing a FAB prediction set.