maskr introduces the masked class, which suppresses or “masks” printing of
certain elements of an atomic vector while keeping the underlying data
available for computation. Masked vectors can hold numeric, logical,
character or factor data.
maskr can be installed from CRAN using:
install.packages('maskr')
You can also install the development version of maskr from GitHub:
devtools::install_github('inpowell/maskr')
library(maskr)
Masked vectors can be built from atomic vectors using masked():
x <- 0:8
masked(x, 0 < x & x < 5)
#> <integer+masked[9]>
#> 0 n.p. n.p. n.p. n.p. 5 6 7 8
By default, masked cells are shown as “n.p.” (not published). We can change
how they are presented using the maskr.replacement option:
options(maskr.replacement = '*')
masked(x, 0 < x & x < 5)
#> <integer+masked[9]>
#> 0 * * * * 5 6 7 8
options(maskr.replacement = NULL)
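If the replacement marker is only needed temporarily, one possible pattern is to save the previous option value and restore it afterwards, rather than resetting it to NULL by hand. This is a minimal sketch that relies only on base R's `options()` return value and the `masked()` call shown above; the `"--"` marker is an arbitrary choice for illustration.

```r
library(maskr)

x <- 0:8

# options() returns the previous values, so they can be restored later
old <- options(maskr.replacement = "--")

# Masked cells are printed with the temporary marker
print(masked(x, 0 < x & x < 5))

# Restore whatever marker was in effect before
options(old)
print(masked(x, 0 < x & x < 5))
```

Restoring the saved value keeps any marker that was already set elsewhere in the session, which a plain reset to NULL would discard.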
Other types of atomic vectors can be masked as well:
masked(letters, letters %in% c('a', 'e', 'i', 'o', 'u'))
#> <character+masked[26]>
#> n.p. b c d n.p. f g h n.p. j k l m n n.p. p
#> q r s t n.p. v w x y z
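As a further sketch, the same `masked()` call should apply to logical and factor data, which are among the supported types listed at the top. The vectors `answers` and `risk` below are made up for illustration, and no printed output is shown because it is not taken from the package documentation.

```r
library(maskr)

# Logical data: hide the second answer
answers <- c(TRUE, FALSE, TRUE, TRUE)
masked_answers <- masked(answers, c(FALSE, TRUE, FALSE, FALSE))
masked_answers

# Factor data: hide every "high" entry
risk <- factor(c("low", "high", "medium", "high"))
masked_risk <- masked(risk, risk == "high")
masked_risk
```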
We can also use this to control which data gets displayed in data frames and cross-tables.
tabular <- tibble::tibble(
  Activity = gl(4, 4, 16, labels = c("I", "II", "III", "Total")),
  Region = gl(4, 1, 16, labels = c("A", "B", "C", "Total")),
  Count = as.integer(c(
    10, 25, 5, 40,
    16, 13, 11, 40,
    17, 20, 24, 61,
    43, 58, 40, 141
  ))
)
suppress <- rep(FALSE, 16L)
suppress[c(5, 8, 9, 12)] <- TRUE
tabular$Count <- masked(tabular$Count, suppress)
tabular
#> # A tibble: 16 × 3
#> Activity Region Count
#> <fct> <fct> <int+msk>
#> 1 I A 10
#> 2 I B 25
#> 3 I C 5
#> 4 I Total 40
#> 5 II A n.p.
#> 6 II B 13
#> 7 II C 11
#> 8 II Total n.p.
#> 9 III A n.p.
#> 10 III B 20
#> 11 III C 24
#> 12 III Total n.p.
#> 13 Total A 43
#> 14 Total B 58
#> 15 Total C 40
#> 16 Total Total 141
This works with tidyverse reshaping functions like pivot_wider():
tidyr::pivot_wider(tabular, names_from = 'Region', values_from = 'Count')
#> # A tibble: 4 × 5
#> Activity A B C Total
#> <fct> <int+msk> <int+msk> <int+msk> <int+msk>
#> 1 I 10 25 5 40
#> 2 II n.p. 13 11 n.p.
#> 3 III n.p. 20 24 n.p.
#> 4 Total 43 58 40 141
Masked vectors support basic arithmetic, so for example we can find percentages while maintaining the correct masking pattern.
tabular |>
  dplyr::group_by(Activity) |>
  dplyr::mutate(Percent = 100 * Count / Count[Region == 'Total'])
#> # A tibble: 16 × 4
#> # Groups: Activity [4]
#> Activity Region Count Percent
#> <fct> <fct> <int+msk> <dbl+msk>
#> 1 I A 10 25
#> 2 I B 25 62.5
#> 3 I C 5 12.5
#> 4 I Total 40 100
#> 5 II A n.p. n.p.
#> 6 II B 13 n.p.
#> 7 II C 11 n.p.
#> 8 II Total n.p. n.p.
#> 9 III A n.p. n.p.
#> 10 III B 20 n.p.
#> 11 III C 24 n.p.
#> 12 III Total n.p. n.p.
#> 13 Total A 43 30.5
#> 14 Total B 58 41.1
#> 15 Total C 40 28.4
#> 16 Total Total 141 100
Notice that where we have divided by a masked cell, the percentage is also masked.
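The same propagation can be tried on a small vector without dplyr. The sketch below reuses the Activity II counts from the table and only the operations already used in the example (multiplying by a number, dividing by a logically subset masked vector); the variable names are illustrative.

```r
library(maskr)

# Activity II counts: regions A, B, C and the total; A and the total are masked
counts <- masked(c(16L, 13L, 11L, 40L), c(TRUE, FALSE, FALSE, TRUE))
is_total <- c(FALSE, FALSE, FALSE, TRUE)

# The divisor is a masked cell, so every percentage should print as masked
percent <- 100 * counts / counts[is_total]
percent
```

Masking B and C as well as A matters here: if only A were hidden, it could be recovered by subtracting the published percentages from 100.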
Using masked vectors, as opposed to just replacing values we want to suppress with missing values, means we can always recover our data before we publish it:
tabular$Count <- unmask(tabular$Count)
tabular
#> # A tibble: 16 × 3
#> Activity Region Count
#> <fct> <fct> <int>
#> 1 I A 10
#> 2 I B 25
#> 3 I C 5
#> 4 I Total 40
#> 5 II A 16
#> 6 II B 13
#> 7 II C 11
#> 8 II Total 40
#> 9 III A 17
#> 10 III B 20
#> 11 III C 24
#> 12 III Total 61
#> 13 Total A 43
#> 14 Total B 58
#> 15 Total C 40
#> 16 Total Total 141
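To make the contrast with missing values concrete, here is a small sketch using the Activity II counts. The first half is plain base R; the second half uses only `masked()` and `unmask()` as shown above. The variable names are illustrative.

```r
counts <- c(16L, 13L, 11L, 40L)
suppress <- c(TRUE, FALSE, FALSE, TRUE)

# Overwriting with NA discards the suppressed values for good
na_counts <- replace(counts, suppress, NA)
sum(na_counts)                # NA: the sum can no longer be computed
sum(na_counts, na.rm = TRUE)  # 24: only the unsuppressed cells remain

# A masked vector keeps the values, so unmask() gets them back
library(maskr)
m <- masked(counts, suppress)
recovered <- unmask(m)
sum(recovered)
```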