2. Main functions of “eatRep”
The four main functions can be seen as “replication variants” of the base R
functions mean()
, table()
, quantile()
, and glm()
:
repMean()
: computes means, standard deviations, mean differences, and standard deviation differencesrepTable()
: computes frequency tables and differences thereofrepQuantile()
for quantiles, percentiles, and so onrepGlm()
: linear and generalized linear regression models
2.1 Mean analysis
For the first example, we want to compute means and standard deviations (along with their standard errors) in reading competencies for several federal states at one distinct time of measurement (2010). As bt
contains data from 2010 and 2015 as well as both competencies reading and listening, we subset the data set:
bt2010 <- bt[which(bt[,"year"] == 2010),]
bt2010read <- bt2010[which(bt2010[,"domain"] == "reading"),]
We now call repMean()
with the reduced data set bt1010read
:
results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone",
repInd = "jkrep", imp="imp", groups = "country", dependent = "score",
progress = FALSE)
## 1 analyse(s) overall according to: 'group.splits = 1'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
We use country
as a grouping variable—all analyses are computed for each country separately. Important: persons in the data must be nested within the grouping variable. This is true for country
; each person belongs to only one federal state. For another possible grouping variable, domain
, this is not the case, as one single person may have worked on items from more than one domain. To check whether persons are nested within a grouping variable, the function isNested()
from the lme4
package package can be called:
lme4::isNested(bt2010[,"idstud"], bt2010[,"country"])
## [1] TRUE
lme4::isNested(bt2010[,"idstud"], bt2010[,"domain"])
## [1] FALSE
To conduct the analyses for both domains in a single call, we recommend using a loop, according to “listening” and “reading”. We demonstrate this usage in section 2.5. To collect the results in a single data.frame
which can be exported to excel, for example, the reporting function report()
should be called.
resReading <- report(results, add = list(kb="reading"))
The argument add
augments the output with additional columns. The function does not know that the analysis is about “reading” competence. If this information should be incorporated in the output, the add
argument allows to define one or multiple additional columns with scalar information of character type, for example:
resReading <- report(results, add = list(kb="reading", year = "2010"))
The analysis above includes one grouping variable (“country”) and one competence domain (“reading”) without any group comparisons. The output therefore is rather sparse.
However, the results can be computed according to more than one grouping variable. If the results should be computed for each country and each migration group, both are specified as grouping variables:
results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone",
repInd = "jkrep", imp="imp", groups = c("country", "mig"), dependent = "score",
progress = FALSE)
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
res <- report(results, add = list( kb="reading"))
Frequently, one might not only be interested in group means but also the total mean. Hence, we want to know the mean of each single country and the mean of the whole population. You can repeat the analysis two times, one including grouping variables and one ignoring all grouping variables, but it is easier to use only one single call:
results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone",
repInd = "jkrep", imp="imp", groups = "country", group.splits = 0:1,
dependent = "score", progress = FALSE)
## 2 analyse(s) overall according to: 'group.splits = 0 1'.
##
## analysis.number hierarchy.level groups.divided.by group.differences.by
## 1 0 NA
## 2 1 country NA
##
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
res <- report(results, add = list( kb="reading"))
The argument Argument group.splits
defines “hierarchy levels” for the analyses, indicating whether the analysis should be conducted within or across groups. The number of hierarchy levels always equals the number of grouping variables plus one. One grouping variable means two hierarchy levels, two grouping variables mean three levels, and so on. Without any grouping variables, only one level, the “zeroth” level, exists. At the zeroth level, no differentiation takes place; all statistics are computed for the whole population. With one grouping variable (country
, for example) two levels can be defined: at the zeroth level, statistics are computed for the whole population, and at the first level, statistics are computed for countryA
, countryB
, and countryC
separately. With two grouping variables (country
and migration background: mig
), three hierarchy levels are defined. The entire group (zeroth level), statistics computed for countryA
, countryB
, and countryC
(first level, according to country
), statistics computed for no migration background
and migration background
(first level, according to mig
), and at the second level, statistics separately computed for migrants in countryA
, migrants in countryB
, migrants in countryB
, natives in countryA
, natives in countryB
, natives in countryC
. group.splits
is a numeric vector which contains all desired hierarchy levels. Without specifying group.splits
, only the highest hierarchy level is considered for analysis.
Assume only one grouping variable. group.splits = c(0,1)
or group.splits = 0:1
additionally computes statistics for the zeroth level. For two grouping variables, group.splits = 1:2
computes statistics for the first and second level. The zeroth level is omitted. To yield statistics for all possible level, type group.splits = 0:x
, where “x” equals the number of grouping variables.
2.2 Group differences in means
Do boys and girls significantly differ in their mean competencies? Do Bavarian students outperform Saxonian students in “reading”? Is the mean score of Canadian students significantly above the OECD average? These examples can be distinguished regarding whether both units, which should be compared, share the same hierarchy level. Differences within a hierarchy level (e.g., boys vs. girls) are referred to as “group differences”. Differences between (adjacent) hierarchy levels (e.g., Canadian vs. OECD average, as Canada itself is part of the OECD average) are referred to as “cross-level differences”. The following example deals with group differences according to sex
—we compare, whether boys and girls significantly differ in their means:
results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone",
repInd = "jkrep", imp="imp", groups = "sex", group.differences.by = "sex",
dependent = "score", progress = FALSE)
## 1 analyse(s) overall according to: 'group.splits = 1'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
res <- report(results, add = list( kb="reading"))
The argument group.differences.by
defines the grouping variable for which differences should be computed. Note that only one variable can be specified in group.differences.by
, and this variable must also occur in groups
(which may, however, contain further variables). All pairwise contrasts are computed for the levels in the group.differences.by
-variable. If the grouping variable is dichotomous with two levels (boys, girls), only one contrast (boys vs. girls) can be defined. If the grouping variable is polytomous with three levels (for example, country
with countryA, countryB, countryC), three contrasts will be computed (countryA vs. countryB, countryA vs. countryC, countryB vs. countryC). A polytomous variable with four levels defines six contrasts, and so on. If groups
includes more than one variable, group.differences.by
defines for which of these variables group differences should be computed. If sex differences should be computed for each country separately, consider the following call:
results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone",
repInd = "jkrep", imp="imp", groups = c("country", "sex"),
group.differences.by = "sex", dependent = "score", progress = FALSE)
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
res <- report(results, add = list( kb="reading"))
Compute sex differences in each country and additionally for the whole group:
results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone",
repInd = "jkrep", imp="imp", groups = c("country", "sex"), group.splits = 0:2,
group.differences.by = "sex", dependent = "score", progress = FALSE)
## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##
## analysis.number hierarchy.level groups.divided.by group.differences.by
## 1 0 <NA>
## 2 1 country <NA>
## 3 1 sex sex
## 4 2 country + sex sex
##
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
res <- report(results, add = list( kb="reading"))
Compute country differences within each sex group and additionally for the whole group:
results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone",
repInd = "jkrep", imp="imp", groups = c("country", "sex"), group.splits = 0:2,
group.differences.by = "country", dependent = "score", progress = FALSE)
## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##
## analysis.number hierarchy.level groups.divided.by group.differences.by
## 1 0 <NA>
## 2 1 country country
## 3 1 sex <NA>
## 4 2 country + sex country
##
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
res <- report(results, add = list( kb="reading"))
2.3 cross-level differences in means
For the easiest case, assume only one grouping variable (sex
with levels boys
and girls
) and two hierarchy levels—the zeroth and the first level. Cross-level differences then refer to the difference of one group mean (e.g., boys
mean) and the total mean. With two or more grouping variables, cross-level differences can be thought of differences of one distinct group with all higher-ranking hierarchy levels. Assuming two grouping variables (country
with three levels, and migration background mig
with two levels), 23 cross-level differences are theoretically possible:
level 2 vs. level 1:
- migrants in countryA vs. migrants
- migrants in countryB vs. migrants
- migrants in countryC vs. migrants
- natives in countryA vs. natives
- natives in countryB vs. natives
- natives in countryC vs. natives
- migrants in countryA vs. countryA
- migrants in countryB vs. countryB
- migrants in countryC vs. countryC
- natives in countryA vs. countryA
- natives in countryB vs. countryB
- natives in countryC vs. countryC
level 1 vs. level 0:
- migrants vs. whole population
- natives vs. whole population
- countryA vs. whole population
- countryB vs. whole population
- countryC vs. whole population
level 2 vs. level 0:
- migrants in countryA vs. whole population
- migrants in countryB vs. whole population
- migrants in countryC vs. whole population
- natives in countryA vs. whole population
- natives in countryB vs. whole population
- natives in countryC vs. whole population
Each cross-level difference “connects” two hierarchy levels. Hierarchy levels are neighboring, if their difference equals 1. Levels 2 and 1 are neighboring, but levels 2 and 0 are not. To compute cross-level differences, group.splits
must be specified as a numeric vector of at least two distinct elements. To reduce number of cross-level differences, the argument cross.differences
allows to define for which pairs of hierarchy levels cross-level differences should be computed.
To give an example: Consider both grouping variables country
(3 levels) and mig
(2 levels). Means should be computed for each of the three hierarchy levels. Group differences should be computed for the country
variable (e.g., countryA vs. countryB, countryA vs. countryC, and countryB vs. countryC). Cross-level differences should be computed only in relation to the zeroth level, e.g. level 1 vs. level 0, and level 2 vs. level 0. The following command should be called:
results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone",
repInd = "jkrep", imp="imp", groups = c("country", "sex"), group.splits = 0:2,
group.differences.by = "country", cross.differences = list(c(0,1), c(0,2)),
dependent = "score", progress = FALSE)
## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##
## analysis.number hierarchy.level groups.divided.by group.differences.by
## 1 0 <NA>
## 2 1 country country
## 3 1 sex <NA>
## 4 2 country + sex country
##
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
##
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
##
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
res <- report(results, add = list( kb="reading"))
cross.differences
requests a list of numeric vectors with distinct elements each. Each vector must consist of two integers, specifying the hierarchy levels for which cross-differences should be computed. For simplicity, cross.differences = TRUE
requests all possible cross-level differences. Conversely, cross.differences = FALSE
omits all cross-level differences.
Combining group.differences.by
and cross.differences
allows to compute cross-level differences of group differences, for example, if researchers want to know whether the difference “boys vs. girls” in “countryA” differs from the difference “boys vs. girls” in the total population. Note that we explicitly assume heteroscedastic variance in cross-level difference estimation by setting hetero = TRUE
and clusters = "idclass"
:
results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone",
repInd = "jkrep", imp="imp", groups = c("country", "sex"), group.splits = 0:2,
group.differences.by = "sex", cross.differences = TRUE, dependent = "score",
progress = FALSE, clusters = "idclass", hetero = TRUE)
## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##
## analysis.number hierarchy.level groups.divided.by group.differences.by
## 1 0 <NA>
## 2 1 country <NA>
## 3 1 sex sex
## 4 2 country + sex sex
##
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
##
## Compute cross level differences using 'wec' method. Assume heteroscedastic variances.
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
##
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
##
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 32 replicate weights according to JK2 procedure.
##
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 60 replicate weights according to JK2 procedure.
##
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 40 replicate weights according to JK2 procedure.
##
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 91 replicate weights according to JK2 procedure.
##
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
res <- report(results, add = list( kb="reading"))
In the output data.frame created by report()
, cross-level differences of group differences are marked with the entry crossDiff_of_groupDiff
in the comparison
column.
2.4 Statistical remarks
For cross-level differences, the groups are not independent—when comparing countryA
with the whole population, one must consider that countryA
is part of the whole population. Hence, a t test is not appropriate. eatRep
supports “weighted effect coding” or replication methods (e.g, bootstrap), with “weighted effect coding” (wec) being the default. Alternative methods can be chosen with the crossDiffSE
argument. The method old
uses an inappropriate t test and should not be used. The method is maintained in the package to provide compatibility with older versions.
2.5 Trend analyses
In general, trends are just group differences. If the groups are distinct, persons are nested within the trend variable (each person belongs to solely one point in time). The major factor distinguishing trends from “conventional” group differences is the item sample: For group differences, the item sample is usually identical, for trends, this is not necessarily the case. Moreover, comparisons across different points in time run the risk of being affected by differential item functioning (DIF) or item parameter drift (IPD). If DIF can be considered as random, it should be incorporated into the computation of standard errors of trend estimates. If standard error of trend estimates should be computed, eatRep
allows to take the “linking error” according to differently functioning items into account.
When computing trends, the analysis is conducted in both cohorts (for example, 2010 and 2015) separately. Afterwards, for each combination of grouping variables, the difference \(\bar{m}_{2015}-\bar{m}_{2010}\) is estimated. The standard error of this difference is: \[\begin{equation} SE_{trend}=\sqrt{SE_{2010}^2+SE_{2015}^2+SE_{link}^2}. \end{equation}\]
Trends can be computed for simple means, group differences, and cross-level differences. For illustration the last analysis now will be repeated with additional trend estimation. The former used data object bt2010read
cannot be used any longer as only 2010 data are included. We use “reading competence” for 2010 and 2015 by subsetting the bt
data. In the function call, the trend variable trend = "year"
as well as the linking error linkErr = "leScore"
have to be defined. Without specifying the linkErr
argument, the linking error is defaulted to 0.
btread <- bt[which(bt[,"domain"] == "reading"),]
results <- repMean(datL = btread, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone",
repInd = "jkrep", imp="imp", groups = c("country", "sex"), group.splits = 0:2,
group.differences.by = "country", cross.differences = list(c(0,1), c(0,2)),
dependent = "score", trend = "year", linkErr = "leScore", progress = FALSE)
##
## Trend group: '2010'
## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##
## analysis.number hierarchy.level groups.divided.by group.differences.by
## 1 0 <NA>
## 2 1 country country
## 3 1 sex <NA>
## 4 2 country + sex country
##
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
##
##
## Trend group: '2015'
## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##
## analysis.number hierarchy.level groups.divided.by group.differences.by
## 1 0 <NA>
## 2 1 country country
## 3 1 sex <NA>
## 4 2 country + sex country
##
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
##
##
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
##
##
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
## Note: No linking error was defined. Linking error will be defaulted to '0'.
##
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
##
##
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
## Note: No linking error was defined. Linking error will be defaulted to '0'.
res <- report(results, add = list(kb="reading"))
2.6 Loops across non-nested (grouping) variables
Arguments groups
and group.splits
allow to analyze different groups and different hierarchy levels with just one single call. Alternatively, repMean()
may be called two times, once without grouping variabe(s), and one with additional grouping variable(s). The argument groups
requires that individuals are nested within grouping variables. Individuals, however, are not nested within competence domains (“reading” and “listening”)—domain
cannot be used as grouping variable. Alternatively, if both domains should be analyzed with one single call, an additional outer loop is necessary. We demonstrate this procedure with exemplary data bt
, containing both domains “reading” and “listening”. As in the example before, we analyze group, cross-level, and trend differences.
results <- by(data = bt, INDICES = bt[,"domain"], FUN = function ( subsample) {
repMean(datL = subsample, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone",
repInd = "jkrep", imp="imp", groups = c("country", "sex"),
group.splits = 0:2, group.differences.by = "country",
cross.differences = list(c(0,1), c(0,2)), dependent = "score",
trend = "year", linkErr = "leScore", progress = FALSE) } )
##
## Trend group: '2010'
## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##
## analysis.number hierarchy.level groups.divided.by group.differences.by
## 1 0 <NA>
## 2 1 country country
## 3 1 sex <NA>
## 4 2 country + sex country
##
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
##
##
## Trend group: '2015'
## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##
## analysis.number hierarchy.level groups.divided.by group.differences.by
## 1 0 <NA>
## 2 1 country country
## 3 1 sex <NA>
## 4 2 country + sex country
##
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
##
##
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
##
##
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
## Note: No linking error was defined. Linking error will be defaulted to '0'.
##
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
##
##
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
## Note: No linking error was defined. Linking error will be defaulted to '0'.
##
## Trend group: '2010'
## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##
## analysis.number hierarchy.level groups.divided.by group.differences.by
## 1 0 <NA>
## 2 1 country country
## 3 1 sex <NA>
## 4 2 country + sex country
##
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
##
##
## Trend group: '2015'
## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##
## analysis.number hierarchy.level groups.divided.by group.differences.by
## 1 0 <NA>
## 2 1 country country
## 3 1 sex <NA>
## 4 2 country + sex country
##
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
##
##
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
##
##
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
## Note: No linking error was defined. Linking error will be defaulted to '0'.
##
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
##
##
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
## Note: No linking error was defined. Linking error will be defaulted to '0'.
The by
-loop around repMean
splits the data in two subsets which are analyzed consecutively. The results
object is a list with two elements, “listening” and “reading”. The reporting function must be called for these two list elements separately. We now see that the argument add
can help to distinguish both resulting data.frames. First, the processing is demonstrated without using a loop:
names(results)
## [1] "listening" "reading"
resultsListening <- report(results[["listening"]], add = list(kb = "listening"))
resultsReading <- report(results[["reading"]], add = list(kb = "reading"))
alleResults1 <- rbind (resultsListening, resultsReading)
Using a loop shortens the call, especially if more than two competence domains are used:
alleResults2 <- lapply(names(results), FUN = function ( x ) {
report(results[[x]], add = list(kb=x))})
alleResults2 <- do.call("rbind", alleResults2)
2.7 Adjusted means
eatRep
also allows to compute “adjusted” means. We will not elaborate on theoretical issues about adjusted means—broadly speaking, unadjusted comparisons between two countries, say, Japan and Vietnam, may be difficult to interpret, because both countries differ substantially in terms of mean socio-economical status, migration background, and other background variables. Adjusted means can be thought of as weighted means to answer the question: would both countries still differ in their mean competencies, if the distribution of background variables would be equal? The researcher is free to select which background or demographic variables should be chosen for adjustment.
We demonstrate the computation of adjusted means for the domain “reading” in 2010, where we adjust for sex
, migration background (mig
) and socio-economical status (ses
). All variables selected for adjustment must be numeric. For polytomous variables (e.g., language at home: “german”, “german and another language”, “only another language”) dichotomous indicator variables must be generated beforehand. In the following example, we transform the non-numeric adjustment variables sex
and mig
to be numeric.
sapply(bt2010read[,c("sex", "mig", "ses")], class)
## sex mig ses
## "factor" "logical" "numeric"
bt2010read[,"sexnum"] <- car::recode(bt2010read[,"sex"], "'male'=0; 'female'=1",
as.factor = FALSE)
bt2010read[,"mignum"] <- as.numeric(bt2010read[,"mig"])
results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone",
repInd = "jkrep", imp="imp", groups = "country", group.splits = 0:1,
cross.differences = TRUE, adjust = c("sexnum", "mignum", "ses"),
dependent = "score", progress = FALSE)
## 2 analyse(s) overall according to: 'group.splits = 0 1'.
##
## analysis.number hierarchy.level groups.divided.by group.differences.by adjust
## 1 0 NA FALSE
## 2 1 country NA TRUE
##
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
res <- report(results, add = list( kb="reading"))
Please note that, to date, cross-level differences for adjusted means can only be computed using the methods old
. In the zeroth hierarchy level, no adjustment takes place. As the distribution of background variables in the total population is used as the reference for the adjusted group means, the adjusted population mean is equal to the unadjusted population mean.
If trends should be computed for adjusted means, the procedure sketched above cannot be adopted without further ado. If the adjusted mean of countryA
in 2015 should be compared with the adjusted mean of countryA
in 2010, the reference group is no longer the total population (e.g., all countries). Unadjusted means do no depend on the specific research questions, but for adjusted means, the research questions matters: Does countryA
in 2015 differ from countryB
in 2015? Or does countryA
in 2010 differ from countryA
in 2015? Both questions require different adjustment approaches.
In the previous section, the adjustment for only one time of measurement was sketched: Would Japan still differ from Vietnam, if the distribution of background variables were equivalent? For trend analyses, the research question could be: Would there be differences between 2010 and 2015 for Japan, if the composition of students according to some demographic variables would not have changed? The package eatRep
does not differentiate between both types of research questions. To compute adjusted trends, the formerly known trend variable year
must be used as grouping variable. If adjusted trends for different groups should be estimated, the subsetting according to groups must be done by hand or via an outer loop. The incorporation of linking errors, if desired, must be done by hand likewise. The following example illustrates the procedure. The standard error of the trend estimate is computed as the square root of the sum of the squared standard errors for 2010, 2015 and the link:
btread <- bt[which(bt[,"domain"] == "reading"),]
btread[,"sexnum"] <- car::recode(btread[,"sex"], "'male'=0; 'female'=1", as.factor = FALSE)
btread[,"mignum"] <- as.numeric(btread[,"mig"])
btread[,"year"] <- as.integer(btread[,"year"])
results <- by(data = btread, INDICES = btread[,"country"], FUN = function(sub.dat) {
res <- repMean(datL = sub.dat, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone",
repInd = "jkrep", imp="imp", groups = c("year"),
adjust = c("sexnum", "mignum", "ses"), dependent = "score",
progress = FALSE)
res <- report(res, add = list( kb="reading",
country= as.character(sub.dat[1,"country"])))
res[,"trend"] <- diff(res[,"est"])
res[,"trendSE"] <- sqrt(sum(res[,"se"]^2) + unique(sub.dat[,"leScore"])^2)
return(res)})
## 1 analyse(s) overall according to: 'group.splits = 1'.
## Assume unnested structure with 3 imputations.
## Create 62 replicate weights according to JK2 procedure.
##
## 1 analyse(s) overall according to: 'group.splits = 1'.
## Assume unnested structure with 3 imputations.
## Create 65 replicate weights according to JK2 procedure.
##
## 1 analyse(s) overall according to: 'group.splits = 1'.
## Assume unnested structure with 3 imputations.
## Create 48 replicate weights according to JK2 procedure.
results <- do.call("rbind", results)