Given that we detected some form of bias during bias auditing, we are often interested in obtaining fair(er) models. There are several ways to achieve this, such as collecting additional data or finding and fixing errors in the data. Assuming there are no biases in the data and labels, one further option is to debias models using preprocessing, inprocessing, or postprocessing methods. mlr3fairness provides some of these operators as PipeOps for mlr3pipelines. If you are not familiar with mlr3pipelines, the mlr3 book contains an introduction.
We again showcase debiasing using the adult_train task.
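As a minimal sketch, assuming the mlr3 and mlr3fairness packages are installed, the task can be instantiated as follows:

library(mlr3)
library(mlr3fairness)

# binary classification task: predict whether yearly income exceeds $50K;
# the protected attribute ("pta" column role) is pre-set to "sex"
task = tsk("adult_train")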
mlr3fairness implements two reweighing-based algorithms: reweighing_wts and reweighing_os. reweighing_wts adds observation weights to a Task that can counteract imbalances between the conditional probabilities \(P(Y | pta)\), while reweighing_os instead oversamples the Task to balance those group ratios.
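As a sketch of the underlying idea, the classical reweighing scheme (Kamiran and Calders, 2012) assigns each observation with label \(y\) and protected group \(a\) the weight

\[
w(y, a) = \frac{P(Y = y)\, P(pta = a)}{P(Y = y,\ pta = a)},
\]

so that label and protected attribute are statistically independent in the reweighted data; the exact computation in mlr3fairness may differ in detail. The available debiasing PipeOps and their input/output types are listed below.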
| key | output.num | input.type.train | input.type.predict | output.type.train |
|-----|------------|------------------|--------------------|-------------------|
| EOd | 1 | TaskClassif | TaskClassif | NULL |
| reweighing_os | 1 | TaskClassif | TaskClassif | TaskClassif |
| reweighing_wts | 1 | TaskClassif | TaskClassif | TaskClassif |
We first instantiate the PipeOp:
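For example (the variable name r1 is our own choice):

r1 = po("reweighing_wts")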
and directly add the weights:
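Calling $train() on a list containing the task returns the modified task; a minimal sketch (the accessor for the weights may vary across mlr3 versions):

# train the PipeOp; the output is the task with observation weights attached
t1 = r1$train(list(task))[[1]]
# inspect the first few added weights
head(t1$weights)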
Often, we directly combine the PipeOp with a Learner to automate the preprocessing (see learner_rw below). We now instantiate a small benchmark comparing the plain learner to its reweighted counterpart:
set.seed(4321)
# baseline decision tree and the same tree preceded by the reweighing PipeOp
learner = lrn("classif.rpart", cp = 0.005)
learner_rw = as_learner(po("reweighing_wts") %>>% learner)
# compare both learners on the task with 3-fold cross-validation
grd = benchmark_grid(list(task), list(learner, learner_rw), rsmp("cv", folds = 3))
bmr = benchmark(grd)
#> INFO [22:45:19.313] [mlr3] Running benchmark with 6 resampling iterations
#> INFO [22:45:19.347] [mlr3] Applying learner 'classif.rpart' on task 'adult_train' (iter 1/3)
#> INFO [22:45:19.410] [mlr3] Applying learner 'classif.rpart' on task 'adult_train' (iter 2/3)
#> INFO [22:45:19.466] [mlr3] Applying learner 'classif.rpart' on task 'adult_train' (iter 3/3)
#> INFO [22:45:19.518] [mlr3] Applying learner 'reweighing_wts.classif.rpart' on task 'adult_train' (iter 1/3)
#> INFO [22:45:19.611] [mlr3] Applying learner 'reweighing_wts.classif.rpart' on task 'adult_train' (iter 2/3)
#> INFO [22:45:19.710] [mlr3] Applying learner 'reweighing_wts.classif.rpart' on task 'adult_train' (iter 3/3)
#> INFO [22:45:19.798] [mlr3] Finished benchmark
We can now compute the metrics for our benchmark to see whether reweighing actually improved fairness. Both fairness.tpr and fairness.acc measure the absolute difference in true positive rate (TPR) and classification accuracy (ACC), respectively, between the protected groups, so smaller values indicate a fairer model:
bmr$aggregate(msrs(c("fairness.tpr", "fairness.acc")))
#> nr task_id learner_id resampling_id iters fairness.tpr
#> 1: 1 adult_train classif.rpart cv 3 0.07494903
#> 2: 2 adult_train reweighing_wts.classif.rpart cv 3 0.01151982
#> fairness.acc
#> 1: 0.1162688
#> 2: 0.1054431
#> Hidden columns: resample_result
Reweighing made our model considerably fairer wrt. TPR, while the accuracy disparity between groups also decreased slightly!