gglikert()
The purpose of gglikert()
is to generate a centered bar
plot comparing the answers of several questions sharing a common
Likert-type scale.
likert_levels <- c(
"Strongly disagree",
"Disagree",
"Neither agree nor disagree",
"Agree",
"Strongly agree"
)
set.seed(42)
df <-
tibble(
q1 = sample(likert_levels, 150, replace = TRUE),
q2 = sample(likert_levels, 150, replace = TRUE, prob = 5:1),
q3 = sample(likert_levels, 150, replace = TRUE, prob = 1:5),
q4 = sample(likert_levels, 150, replace = TRUE, prob = 1:5),
q5 = sample(c(likert_levels, NA), 150, replace = TRUE),
q6 = sample(likert_levels, 150, replace = TRUE, prob = c(1, 0, 1, 1, 0))
) |>
mutate(across(everything(), ~ factor(.x, levels = likert_levels)))
likert_levels_dk <- c(
"Strongly disagree",
"Disagree",
"Neither agree nor disagree",
"Agree",
"Strongly agree",
"Don't know"
)
df_dk <-
tibble(
q1 = sample(likert_levels_dk, 150, replace = TRUE),
q2 = sample(likert_levels_dk, 150, replace = TRUE, prob = 6:1),
q3 = sample(likert_levels_dk, 150, replace = TRUE, prob = 1:6),
q4 = sample(likert_levels_dk, 150, replace = TRUE, prob = 1:6),
q5 = sample(c(likert_levels_dk, NA), 150, replace = TRUE),
q6 = sample(
likert_levels_dk, 150,
replace = TRUE, prob = c(1, 0, 1, 1, 0, 1)
)
) |>
mutate(across(everything(), ~ factor(.x, levels = likert_levels_dk)))
Simply call gglikert()
.
The list of variables to plot (all by default) could by specify with
include
. This argument accepts tidy-select syntax.
The generated plot is a standard ggplot2
object. You can
therefore use ggplot2
functions to custom many aspects.
gglikert(df) +
ggtitle("A Likert-type items plot", subtitle = "generated with gglikert()") +
scale_fill_brewer(palette = "RdYlBu")
#> Scale for fill is already present.
#> Adding another scale for fill, which will replace the existing scale.
You can sort the plot with sort
.
By default, the plot is sorted based on the proportion being higher than the center level, i.e. in this case the proportion of answers equal to “Agree” or “Strongly Agree”. Alternatively, the questions could be transformed into a score and sorted accorded to their mean.
You can reverse the order of the answers with
reverse_likert
.
Proportion labels could be removed with
add_labels = FALSE
.
or customized.
By default, totals are added on each side of the plot. In case of an
uneven number of answer levels, the central level is not taken into
account for computing totals. With
totals_include_center = TRUE
, half of the proportion of the
central level will be added on each side.
Totals could be customized.
gglikert(
df,
totals_size = 4,
totals_color = "blue",
totals_fontface = "italic",
totals_hjust = .20
)
Or removed.
If you are using variable labels (see
labelled::set_variable_labels()
), they will be taken
automatically into account by gglikert()
.
if (require(labelled)) {
df <- df |>
set_variable_labels(
q1 = "first question",
q2 = "second question",
q3 = "this is the third question with a quite long variable label"
)
}
gglikert(df)
You can also provide custom variable labels with
variable_labels
.
gglikert(
df,
variable_labels = c(
q1 = "alternative label for the first question",
q6 = "another custom label"
)
)
You can control how variable labels are wrapped with
y_label_wrap
.
By default, Likert plots will be centered, i.e. displaying the same number of categories on each side on the graph. When the number of categories is odd, half of the “central” category is displayed negatively and half positively.
It is possible to control where to center the graph, using the
cutoff
argument, representing the number of categories to
be displayed negatively: 2
to display the two first
categories negatively and the others positively; 2.25
to
display the two first categories and a quarter of the third
negatively.
Simply specify symmetric = TRUE
.
Sometimes, the dataset could contain certain values that you should not be displayed.
A first option could be to convert the don’t knows into
NA
. In such case, the proportions will be computed on non
missing.
Or, you could use exclude_fill_values
to not display
specific values, but still counting them in the denominator for
computing proportions.
To define facets, use facet_rows
and/or
facet_cols
.
df_group <- df
df_group$group1 <- sample(c("A", "B"), 150, replace = TRUE)
df_group$group2 <- sample(c("a", "b", "c"), 150, replace = TRUE)
gglikert(df_group,
q1:q6,
facet_cols = vars(group1),
labels_size = 3
)
gglikert(df_group,
q3:q6,
facet_cols = vars(group1),
facet_rows = vars(group2),
labels_size = 3
) +
scale_x_continuous(
labels = label_percent_abs(),
expand = expansion(0, .2)
)
To compare answers by subgroup, you can alternatively map
.question
to facets, and define a grouping variable for
y
.
For a more classical stacked bar plot, you can use
gglikert_stacked()
.
Internally, gglikert()
is calling
gglikert_data()
to generate a long format dataset combining
all questions into two columns, .question
and
.answer
.
gglikert_data(df) |>
head()
#> # A tibble: 6 × 3
#> .weights .question .answer
#> <dbl> <fct> <fct>
#> 1 1 first question Strongly…
#> 2 1 second question Disagree
#> 3 1 this is the third question with a quite long variable label Agree
#> 4 1 q4 Disagree
#> 5 1 q5 Strongly…
#> 6 1 q6 Strongly…
Such dataset could be useful for other types of plot, for example for a classic stacked bar plot.
gglikert()
, gglikert_stacked()
and
gglikert_data()
accepts a weights
argument,
allowing to specify statistical weights.
The function position_likert()
used to center bars.