Replication analyses with eatRep

Introduction

The following vignette demonstrates some analyses based on replication methods (Krewski & Rao, 1981; Rust, 2014; Rust & Rao, 1996; Wolter, 2007). Replication methods are quite common in the context of survey or large-scale assessment data (Foy et al., 2008) like the “National Assessment Studies and IQB Trends in Student Achievement” . The following examples are closely related to the “IQB” context (e.g., jackknife-2 methods are used instead of balanced repeated replicates), but may be adapted for PISA or TIMSS analyses as well. For an illustration how eatRep can be applied to PISA data and its specific replication design, see the last example in the documentation of the repMean() function.

Please note that the theoretical foundations of the presented methods are beyond the scope of this vignette—literature recommendations for in depth theoretical discussions can be found in the package documentation (type package?eatRep into the console). Instead, this vignette focuses on some prototypical analyses.

Furthermore, note that IRT item calibration (Hambleton et al., 1991; Hambleton & Jones, 1993; Mislevy, Johnson, et al., 1992) or “plausible values” imputation (Davier et al., 2009; Mislevy, Beaton, et al., 1992; Monseur & Adams, 2009) are not covered in this vignette. All the outlined analyses base on survey data in which “plausible values” are already included. Such kind of data is provided by the OECD or can be requested from the “Research Data Centre (FDZ) at the IQB”. Most of the analyses comprise of descriptive statistics (means, standard deviations), frequency distributions, and linear or logistic regression models. Usually, sampling designs for large-scale assessments have the following, specific characteristics:

Often, individuals in survey data are not randomly drawn from the population. In educational assessments which aim to compare countries, for example, the proportions in the sample do not necessarily correspond to the proportions in the population. Often, institutions like the OECD provide sampling weights according with their data which allow to estimate population parameters.
The (primary) sampling units in educational data are classes instead of individuals. Hence, the sample is clustered. Students within a class are more alike than students from different classes. Therefore, clustered sampled students are more homogeneous than randomly sampled students which may lead to biased standard error estimates in inference-based analyses.
Variables of interest (e.g. educational achievement) are latent and not directly observable (they are inherently missing). Additionally, questionnaire data frequently include missing responses. Therefore, institutions like the OECD or the IQB provide imputed data.

eatRep allows to compute (adjusted) means and mean differences, frequency tables, percentiles, and parameter of (log) linear regression models, taking the clustered and/or imputed sample into account via replication methods. Trend analyses are possible as well. eatRep meets the special features mentioned above (if they apply) in the following way:

1.: include sampling weights for the analyses.

2.: Use replication methods (Bootstrap, jackknife or “Balanced repeated replicates”) for inference statistics.

3.: Pool the results by applying specific rules (Little & Rubin, 1987; Rubin, 2003).

However, eatRep is also suitable if the data is clustered without imputations or imputed without clustered sampling. The three mentioned methods (using weights, replication methods, pooling methods) can be called independently from each other.

0. Installation

We recommend to use R version 4.0.0 or higher. eatRep is available from CRAN:

install.packages("eatRep")

1. Example data

eatRep contains exemplary data named lsa (“large scale assessment”), which resembles the “IQB Gesamtanalysedatensatz (GADS)”. lsa, however, is reduced in numbers of examinees, imputations, and variables. Once the package is loaded, the structure of lsa can be inspected via:

library(eatRep)
data(lsa, package="eatRep")
str(lsa, give.attr = FALSE)

## 'data.frame':    77322 obs. of  25 variables:
##  $ year     : num  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  $ idstud   : Factor w/ 11655 levels "P00001","P00002",..: 1 1 1 1 1 1 2 2 2 2 ...
##  $ idclass  : Factor w/ 432 levels "C001","C002",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ wgt      : num  2.6 2.6 2.6 2.6 2.6 ...
##  $ L2wgt    : num  2.43 2.43 2.43 2.43 2.43 ...
##  $ L1wgt    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ jkzone   : num  22 22 22 22 22 22 22 22 22 22 ...
##  $ jkrep    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ imp      : num  3 2 1 1 2 3 2 3 2 1 ...
##  $ nest     : num  1 2 2 1 1 2 2 2 1 2 ...
##  $ country  : Factor w/ 3 levels "countryA","countryB",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ sex      : Factor w/ 2 levels "female","male": 2 2 2 2 2 2 2 2 2 2 ...
##  $ ses      : num  56 56 56 56 56 56 32.5 32.5 32.5 32.5 ...
##  $ mig      : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ domain   : Factor w/ 2 levels "listening","reading": 1 1 1 1 1 1 2 2 2 2 ...
##  $ score    : num  366 366 425 551 485 ...
##  $ comp     : num  1 1 2 3 3 3 3 3 3 3 ...
##  $ failMin  : num  1 1 0 0 0 0 0 0 0 0 ...
##  $ passReg  : num  0 0 0 1 1 1 1 1 1 1 ...
##  $ passOpt  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ leScore  : num  1.22 1.22 1.22 1.22 1.22 ...
##  $ leComp   : num  0.00262 0.00262 0.0022 0.00155 0.00155 ...
##  $ leFailMin: num  0.00262 0.00262 0.00262 0.00262 0.00262 ...
##  $ lePassReg: num  0.00482 0.00482 0.00482 0.00482 0.00482 ...
##  $ lePassOpt: num  0.00059 0.00059 0.00059 0.00059 0.00059 ...

lsa is in the long format; this means the data set contains multiple rows per individual person. Imputed variables (e.g., migration background, mig) do not occur in several columns (mig_Imp_1, mig_Imp_2, mig_Imp_3, and so on), but only once: mig. Multiple imputations are stored in multiple rows, and the variable imp yields the number of the imputed data set. Furthermore, variables which refer to different competence domains (reading, listening) and different imputations and different times of measurement (i.e., the competence variable score) do not occur in multiple columns (reading_2010_score_Imp_1, reading_2010_score_Imp_2, …, listening_2010_score_Imp_1, listening_2015_score_Imp_1, …). score only occurs once, and imp defines the imputation, whereas domain gives the competence domain. To reshape data between the long and wide format, see for example the package tidyr (pivot_wider(), pivot_longer()) or the function wideToLong() from the package eatTools. See section 1.1 for more details about reshaping and examples.

lsa contains the following variables:

year: year of assessment (2010 or 2015)
idstud: individual student identifier; the data set contains 11,637 persons overall
idclass: class identifier; the data set contains 432 classes overall
wgt: sampling weight
jkzone: jackknife zone
jkrep: jackknife replicate
imp: number of imputation (1, 2, or 3)
nest: number of nest (for nested imputation which is not yet considered here, but see the examples of repMean() for further details)
country: federal state the student stems from
sex: students sex (male, female)
ses: socio economical status
mig: migration background
domain: competence domain (listening or reading)
score: point estimate (e.g., plausible value) for the corresponding imputation and domain. According to PISA, the scale is normed with a mean of 500 and a standard deviation of 100
comp: competence level with 5 distinct levels according to pre-defined cut scores. “1” corresponds to the lowest competence level, and “5” corresponds to the highest competence level
failMin: does the student fail to achieve the minimum standard?
passReg: does the student achieve regular standard?
passOpt: does the student achieve optimal standard?
leScore: linking error according to score variable
leComp: linking error according to comp variable
leFailMin: linking error according to failMin variable
lePassReg: linking error according to passReg variable
lePassOpt: linking error according to passOpt variable

lsa includes more than 77,000 observations. Actual large scale assessment data, however, have much more observations. lsa only represents a small section with only 3 imputations (instead of 10 used in PISA), 3 federal states (PISA includes 35 OECD countries), and two domains. Most variables have labels, stored as attributes:

attributes(lsa[,"year"])

## $varLabel
## [1] "year of assessment (2010 or 2015)"

As nested imputations are not considered here, we reduce the data to the first nest:

bt <- lsa[which(lsa[,"nest"] == 1),]

1.1 Excursus: reshape imputed data from wide into long format

Institutions like PISA provide imputed data in the wide format. Each row in the data matrix represents one person. eatRep, however, needs the long format. (Without imputations, this procedure is not necessary.) Wide format data stores different imputations of the same variable in different columns. The number of imputations is not stored in an explicit variable but results from the number of additional columns per variable. Long format data stores different imputations of the same variable in additional rows. A variable like imp defines the number of the imputation.

reshape2, tidyr or data.table provide functions for reshaping. Moreover, eatTools provides the wideToLong() function for easy reshaping into the required long format. We illustrate the functionality with the help of the wide format exemplary data data.timss3 from the BIFIEsurvey package:

data(data.timss3, package="BIFIEsurvey")
str(data.timss3, give.attr = FALSE)

## 'data.frame':    4668 obs. of  20 variables:
##  $ IDSTUD : num  4e+08 4e+08 4e+08 4e+08 4e+08 ...
##  $ TOTWGT : num  17.5 17.5 17.5 17.5 17.5 ...
##  $ JKZONE : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ JKREP  : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ female : num  1 0 1 1 1 1 1 1 0 0 ...
##  $ books  : num  3 3 5 3 3 2 4 3 3 4 ...
##  $ lang   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ migrant: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ scsci  : num  NA 2 2 1 2 3 2 2 1 1 ...
##  $ likesc : num  2 4 2 1 1 1 2 2 NA 1 ...
##  $ ASMMAT1: num  543 522 456 512 506 ...
##  $ ASSSCI1: num  600 512 497 584 533 ...
##  $ ASMMAT2: num  557 533 462 510 563 ...
##  $ ASSSCI2: num  578 519 545 614 568 ...
##  $ ASMMAT3: num  506 557 445 531 530 ...
##  $ ASSSCI3: num  570 554 528 569 564 ...
##  $ ASMMAT4: num  524 511 473 497 488 ...
##  $ ASSSCI4: num  560 506 550 597 483 ...
##  $ ASMMAT5: num  578 546 457 528 583 ...
##  $ ASSSCI5: num  607 565 546 623 578 ...

Data contains 4668 rows according to 4668 persons. The following variables should be considered for reshaping:

IDSTUD: individual student identifier
TOTWGT: weighting variable
JKZONE: jackknife zone
JKREP: jackknife replicate
female: students sex (1 = female; 0 = male)
books: number of books at home
lang: language at home (How often is the language used in the test spoken at home?)
migrant: migration background
ASMMAT1: first imputation (first plausible value) for math competence
ASMMAT2: second imputation (second plausible value) for math competence
ASMMAT3: third imputation (third plausible value) for math competence
ASMMAT4: fourth imputation (fourth plausible value) for math competence
ASMMAT5: fifth imputation (fifth plausible value) for math competence
ASSSCI1: first imputation (first plausible value) for science competence
ASSSCI2: second imputation (second plausible value) for science competence
ASSSCI3: third imputation (third plausible value) for science competence
ASSSCI4: fourth imputation (fourth plausible value) for science competence
ASSSCI5: fifth imputation (fifth plausible value) for science competence

As TIMSS data is in the wide format, no imputation variable exists. In contrast to the long format, we can easily see which variables are imputed (ASMMAT occurs five times), and which are not (female only occurs once). For reshaping, number of imputations must be constant across imputed variables—hence, wideToLong() cannot be used for nested imputed data. wideToLong() needs to know all variables which should be used for further analyses—the remaining variables can be ignored. The functionality differentiates between imputed and non-imputed variables:

library(eatTools)
timssLong <- eatTools::wideToLong(datWide = data.timss3, 
             noImp = c("IDSTUD", "TOTWGT", "JKZONE", "JKREP", "female"), 
             imp = list ( math = c("ASMMAT1", "ASMMAT2", "ASMMAT3", "ASMMAT4", "ASMMAT5"), 
                       science = c("ASSSCI1", "ASSSCI2", "ASSSCI3", "ASSSCI4", "ASSSCI5")))

The non-imputed variables can be defined in a single character string, whereas imputed variables should be defined in a named list with one or more character strings. In our example, variable math consists of five imputations ASMMAT1, ASMMAT2, ASMMAT3, ASMMAT4, and ASMMAT5. Variable science also consists of five imputations, ASSSCI1, ASSSCI2, ASSSCI3, ASSSCI4, and ASSSCI5. Resulting data timssLong now is suitable for eatRep analyses. For many imputations (e.g., 15), specifying character strings is more straightforward by using the paste() or paste0() function:

timssLong <- eatTools::wideToLong(datWide = data.timss3, 
             noImp = c("IDSTUD", "TOTWGT", "JKZONE", "JKREP", "female"), 
             imp = list ( math = paste0("ASMMAT",1:5),  
                       science = paste0("ASSSCI",1:5)))

Alternatively, reshaping can be performed with melt() from the data.table package:

library(data.table)
timssLong2<- data.table::melt(setDT(data.timss3), 
                id = c("IDSTUD", "TOTWGT", "JKZONE", "JKREP", "female"), 
                measure = patterns("^ASMMAT", "^ASSSCI"), 
                value.name = c("math", "science"), variable.name="imp")

2. Main functions of “eatRep”

The four main functions can be seen as “replication variants” of the base R functions mean(), table(), quantile(), and glm():

repMean(): computes means, standard deviations, mean differences, and standard deviation differences
repTable(): computes frequency tables and differences thereof
repQuantile() for quantiles, percentiles, and so on
repGlm(): linear and generalized linear regression models

2.1 Mean analysis

For the first example, we want to compute means and standard deviations (along with their standard errors) in reading competencies for several federal states at one distinct time of measurement (2010). As bt contains data from 2010 and 2015 as well as both competencies reading and listening, we subset the data set:

bt2010     <- bt[which(bt[,"year"] == 2010),]
bt2010read <- bt2010[which(bt2010[,"domain"] == "reading"),]

We now call repMean() with the reduced data set bt2010read:

results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
           repInd = "jkrep", imp="imp", groups = "country", dependent = "score", 
           progress = FALSE)

## 1 analyse(s) overall according to: 'group.splits = 1'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.

We use country as a grouping variable—all analyses are computed for each country separately. Important: persons in the data must be nested within the grouping variable. This is true for country; each person belongs to only one federal state. For another possible grouping variable, domain, this is not the case, as one single person may have worked on items from more than one domain. To check whether persons are nested within a grouping variable, the function isNested() from the lme4 package package can be called:

lme4::isNested(bt2010[,"idstud"], bt2010[,"country"])

## [1] TRUE

lme4::isNested(bt2010[,"idstud"], bt2010[,"domain"])

## [1] FALSE

To conduct the analyses for both domains in a single call, we recommend using a loop, according to “listening” and “reading”. We demonstrate this usage in section 2.5. To collect the results in a single data.frame which can be exported to excel, for example, the reporting function report2() should be called.

resReading <- report2(results, add = list(kb="reading"))[["plain"]]

To simplify the graphical visualization of the results using the eatPlot package, a new reporting function called report2() was introduced. If you are only interested in a tabular representation of the results, it is sufficient to simply extract the “plain” worksheet from the list object returned by the function. The old reporting function is still included in the package but deprecated. The argument add augments the output with additional columns. The function does not know that the analysis is about “reading” competence. If this information should be incorporated in the output, the add argument allows to define one or multiple additional columns with scalar information of character type, for example:

resReading <- report2(results, add = list(kb="reading", year = "2010"))[["plain"]]

The analysis above includes one grouping variable (“country”) and one competence domain (“reading”) without any group comparisons. The output therefore is rather sparse.

However, the results can be computed according to more than one grouping variable. If the results should be computed for each country and each migration group, both are specified as grouping variables:

results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
           repInd = "jkrep", imp="imp", groups = c("country", "mig"), dependent = "score", 
           progress = FALSE)

## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.

res <- report2(results, add = list( kb="reading"))[["plain"]]

Frequently, one might not only be interested in group means but also the total mean. Hence, we want to know the mean of each single country and the mean of the whole population. You can repeat the analysis two times, one including grouping variables and one ignoring all grouping variables, but it is easier to use only one single call:

results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
           repInd = "jkrep", imp="imp", groups = "country", group.splits = 0:1, 
           dependent = "score", progress = FALSE)

## 2 analyse(s) overall according to: 'group.splits = 0 1'.
##  
##  analysis.number hierarchy.level groups.divided.by group.differences.by
##                1               0                                     NA
##                2               1           country                   NA
## 
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.

res <- report2(results, add = list( kb="reading"))[["plain"]]

The argument Argument group.splits defines “hierarchy levels” for the analyses, indicating whether the analysis should be conducted within or across groups. The number of hierarchy levels always equals the number of grouping variables plus one. One grouping variable means two hierarchy levels, two grouping variables mean three levels, and so on. Without any grouping variables, only one level, the “zeroth” level, exists. At the zeroth level, no differentiation takes place; all statistics are computed for the whole population. With one grouping variable (country, for example) two levels can be defined: at the zeroth level, statistics are computed for the whole population, and at the first level, statistics are computed for countryA, countryB, and countryC separately. With two grouping variables (country and migration background: mig), three hierarchy levels are defined. The entire group (zeroth level), statistics computed for countryA, countryB, and countryC (first level, according to country), statistics computed for no migration background and migration background (first level, according to mig), and at the second level, statistics separately computed for migrants in countryA, migrants in countryB, migrants in countryB, natives in countryA, natives in countryB, natives in countryC. group.splits is a numeric vector which contains all desired hierarchy levels. Without specifying group.splits, only the highest hierarchy level is considered for analysis.

Assume only one grouping variable. group.splits = c(0,1) or group.splits = 0:1 additionally computes statistics for the zeroth level. For two grouping variables, group.splits = 1:2 computes statistics for the first and second level. The zeroth level is omitted. To yield statistics for all possible level, type group.splits = 0:x, where “x” equals the number of grouping variables.

2.2 Group differences in means

Do boys and girls significantly differ in their mean competencies? Do Bavarian students outperform Saxonian students in “reading”? Is the mean score of Canadian students significantly above the OECD average? These examples can be distinguished regarding whether both units, which should be compared, share the same hierarchy level. Differences within a hierarchy level (e.g., boys vs. girls) are referred to as “group differences”. Differences between (adjacent) hierarchy levels (e.g., Canadian vs. OECD average, as Canada itself is part of the OECD average) are referred to as “cross-level differences”. The following example deals with group differences according to sex—we compare, whether boys and girls significantly differ in their means:

results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
           repInd = "jkrep", imp="imp", groups = "sex", group.differences.by = "sex", 
           dependent = "score", progress = FALSE)

## 1 analyse(s) overall according to: 'group.splits = 1'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.

res <- report2(results, add = list( kb="reading"))[["plain"]]

The argument group.differences.by defines the grouping variable for which differences should be computed. Note that only one variable can be specified in group.differences.by, and this variable must also occur in groups (which may, however, contain further variables). All pairwise contrasts are computed for the levels in the group.differences.by-variable. If the grouping variable is dichotomous with two levels (boys, girls), only one contrast (boys vs. girls) can be defined. If the grouping variable is polytomous with three levels (for example, country with countryA, countryB, countryC), three contrasts will be computed (countryA vs. countryB, countryA vs. countryC, countryB vs. countryC). A polytomous variable with four levels defines six contrasts, and so on. If groups includes more than one variable, group.differences.by defines for which of these variables group differences should be computed. If sex differences should be computed for each country separately, consider the following call:

results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
           repInd = "jkrep", imp="imp", groups = c("country", "sex"), 
           group.differences.by = "sex", dependent = "score", progress = FALSE)

## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.

res <- report2(results, add = list( kb="reading"))[["plain"]]

Compute sex differences in each country and additionally for the whole group:

results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
           repInd = "jkrep", imp="imp", groups = c("country", "sex"), group.splits = 0:2, 
           group.differences.by = "sex", dependent = "score", progress = FALSE)

## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##  
##  analysis.number hierarchy.level groups.divided.by group.differences.by
##                1               0                                   <NA>
##                2               1           country                 <NA>
##                3               1               sex                  sex
##                4               2     country + sex                  sex
## 
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.

res <- report2(results, add = list( kb="reading"))[["plain"]]

Compute country differences within each sex group and additionally for the whole group:

results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
           repInd = "jkrep", imp="imp", groups = c("country", "sex"), group.splits = 0:2, 
           group.differences.by = "country", dependent = "score", progress = FALSE)

## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##  
##  analysis.number hierarchy.level groups.divided.by group.differences.by
##                1               0                                   <NA>
##                2               1           country              country
##                3               1               sex                 <NA>
##                4               2     country + sex              country
## 
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.

res <- report2(results, add = list( kb="reading"))[["plain"]]

2.3 cross-level differences in means

For the easiest case, assume only one grouping variable (sex with levels boys and girls) and two hierarchy levels—the zeroth and the first level. Cross-level differences then refer to the difference of one group mean (e.g., boys mean) and the total mean. With two or more grouping variables, cross-level differences can be thought of differences of one distinct group with all higher-ranking hierarchy levels. Assuming two grouping variables (country with three levels, and migration background mig with two levels), 23 cross-level differences are theoretically possible:

level 2 vs. level 1:

migrants in countryA vs. migrants
migrants in countryB vs. migrants
migrants in countryC vs. migrants
natives in countryA vs. natives
natives in countryB vs. natives
natives in countryC vs. natives
migrants in countryA vs. countryA
migrants in countryB vs. countryB
migrants in countryC vs. countryC
natives in countryA vs. countryA
natives in countryB vs. countryB
natives in countryC vs. countryC

level 1 vs. level 0:

migrants vs. whole population
natives vs. whole population
countryA vs. whole population
countryB vs. whole population
countryC vs. whole population

level 2 vs. level 0:

migrants in countryA vs. whole population
migrants in countryB vs. whole population
migrants in countryC vs. whole population
natives in countryA vs. whole population
natives in countryB vs. whole population
natives in countryC vs. whole population

Each cross-level difference “connects” two hierarchy levels. Hierarchy levels are neighboring, if their difference equals 1. Levels 2 and 1 are neighboring, but levels 2 and 0 are not. To compute cross-level differences, group.splits must be specified as a numeric vector of at least two distinct elements. To reduce number of cross-level differences, the argument cross.differences allows to define for which pairs of hierarchy levels cross-level differences should be computed.

To give an example: Consider both grouping variables country (3 levels) and mig (2 levels). Means should be computed for each of the three hierarchy levels. Group differences should be computed for the country variable (e.g., countryA vs. countryB, countryA vs. countryC, and countryB vs. countryC). Cross-level differences should be computed only in relation to the zeroth level, e.g. level 1 vs. level 0, and level 2 vs. level 0. The following command should be called:

results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
           repInd = "jkrep", imp="imp", groups = c("country", "sex"), group.splits = 0:2, 
           group.differences.by = "country", cross.differences = list(c(0,1), c(0,2)), 
           dependent = "score", progress = FALSE)

## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##  
##  analysis.number hierarchy.level groups.divided.by group.differences.by
##                1               0                                   <NA>
##                2               1           country              country
##                3               1               sex                 <NA>
##                4               2     country + sex              country
## 
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## Warning in repMeanList(datL = datL, a = a): Computation of cross level differences using 'wec' method is only possible for differences according to adjacent levels. Non-adjacent levels will be ignored.
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.

res <- report2(results, add = list( kb="reading"))[["plain"]]

## Warning in FUN(X[[i]], ...): Standard error correction for 'crossDiff_of_groupDiff' is currently not supported.
## Warning in FUN(X[[i]], ...): Standard error correction for crossDifferences across multiple hierarchy levels is currently not supported.

cross.differences requests a list of numeric vectors with distinct elements each. Each vector must consist of two integers, specifying the hierarchy levels for which cross-differences should be computed. For simplicity, cross.differences = TRUE requests all possible cross-level differences. Conversely, cross.differences = FALSE omits all cross-level differences.

Combining group.differences.by and cross.differences allows to compute cross-level differences of group differences, for example, if researchers want to know whether the difference “boys vs. girls” in “countryA” differs from the difference “boys vs. girls” in the total population. Note that we explicitly assume heteroscedastic variance in cross-level difference estimation by setting hetero = TRUE and clusters = "idclass":

results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
           repInd = "jkrep", imp="imp", groups = c("country", "sex"), group.splits = 0:2, 
           group.differences.by = "sex", cross.differences = TRUE, dependent = "score", 
           progress = FALSE, clusters = "idclass", hetero = TRUE)

## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##  
##  analysis.number hierarchy.level groups.divided.by group.differences.by
##                1               0                                   <NA>
##                2               1           country                 <NA>
##                3               1               sex                  sex
##                4               2     country + sex                  sex
## 
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## Compute cross level differences using 'wec' method. Assume heteroscedastic variances.
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 32 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 60 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 40 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 91 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.

res <- report2(results, add = list( kb="reading"))[["plain"]]

## Warning in FUN(X[[i]], ...): Standard error correction for 'crossDiff_of_groupDiff' is currently not supported.
## Warning in FUN(X[[i]], ...): Standard error correction for crossDifferences across multiple hierarchy levels is currently not supported.

In the output data.frame created by report2(), cross-level differences of group differences are marked with the entry crossDiff_of_groupDiff in the comparison column.

2.4 Statistical remarks

For cross-level differences, the groups are not independent—when comparing countryA with the whole population, one must consider that countryA is part of the whole population. Hence, a t test is not appropriate. eatRep supports “weighted effect coding” (Grotenhuis et al., 2017; Weirich et al., 2021) or replication methods (e.g, bootstrap), with “weighted effect coding” (wec) being the default. Alternative methods can be chosen with the crossDiffSE argument. The method old uses an inappropriate t test and should not be used. The method is maintained in the package to provide compatibility with older versions.

2.5 Trend analyses

In general, trends are just group differences. If the groups are distinct, persons are nested within the trend variable (each person belongs to solely one point in time). The major factor distinguishing trends from “conventional” group differences is the item sample: For group differences, the item sample is usually identical, for trends, this is not necessarily the case. Moreover, comparisons across different points in time run the risk of being affected by differential item functioning (DIF) or item parameter drift (IPD). If DIF can be considered as random, it should be incorporated into the computation of standard errors of trend estimates. If standard error of trend estimates should be computed, eatRep allows to take the “linking error” according to differently functioning items into account.

When computing trends, the analysis is conducted in both cohorts (for example, 2010 and 2015) separately. Afterwards, for each combination of grouping variables, the difference \(\bar{m}_{2015}-\bar{m}_{2010}\) is estimated. The standard error of this difference is: \[\begin{equation} SE_{trend}=\sqrt{SE_{2010}^2+SE_{2015}^2+SE_{link}^2}. \end{equation}\]

Trends can be computed for simple means, group differences, and cross-level differences. For illustration the last analysis now will be repeated with additional trend estimation. The former used data object bt2010read cannot be used any longer as only 2010 data are included. We use “reading competence” for 2010 and 2015 by subsetting the bt data. In the function call, the trend variable trend = "year" as well as the linking error linkErr = "leScore" have to be defined. Without specifying the linkErr argument, the linking error is defaulted to 0.

btread  <- bt[which(bt[,"domain"] == "reading"),]
results <- repMean(datL = btread, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
           repInd = "jkrep", imp="imp", groups = c("country", "sex"), group.splits = 0:2, 
           group.differences.by = "country", cross.differences = list(c(0,1), c(0,2)), 
           dependent = "score", trend = "year", linkErr = "leScore", progress = FALSE)

## 
## Trend group: '2010'
## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##  
##  analysis.number hierarchy.level groups.divided.by group.differences.by
##                1               0                                   <NA>
##                2               1           country              country
##                3               1               sex                 <NA>
##                4               2     country + sex              country
## 
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 
## 
## Trend group: '2015'
## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##  
##  analysis.number hierarchy.level groups.divided.by group.differences.by
##                1               0                                   <NA>
##                2               1           country              country
##                3               1               sex                 <NA>
##                4               2     country + sex              country
## 
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
##
## Warning in repMeanList(datL = datL, a = a): Computation of cross level differences using 'wec' method is only possible for differences according to adjacent levels. Non-adjacent levels will be ignored.
## 
##
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 
## 
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
## Note: No linking error was defined. Linking error will be defaulted to '0'.
##
## 
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 
## 
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
## Note: No linking error was defined. Linking error will be defaulted to '0'.

res     <- report2(results, add = list(kb="reading"))[["plain"]]

## Warning in FUN(X[[i]], ...): Standard error correction for 'crossDiff_of_groupDiff' is currently not supported.
## Warning in FUN(X[[i]], ...): Standard error correction for crossDifferences across multiple hierarchy levels is currently not supported.

2.6 Loops across non-nested (grouping) variables

Arguments groups and group.splits allow to analyze different groups and different hierarchy levels with just one single call. Alternatively, repMean() may be called two times, once without grouping variable(s), and one with additional grouping variable(s). The argument groups requires that individuals are nested within grouping variables. Individuals, however, are not nested within competence domains (“reading” and “listening”)—domain cannot be used as grouping variable. Alternatively, if both domains should be analyzed with one single call, an additional outer loop is necessary. We demonstrate this procedure with exemplary data bt, containing both domains “reading” and “listening”. As in the example before, we analyze group, cross-level, and trend differences.

results <- by(data = bt, INDICES = bt[,"domain"], FUN = function ( subsample) {
           repMean(datL = subsample, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
                    repInd = "jkrep", imp="imp", groups = c("country", "sex"), 
                    group.splits = 0:2, group.differences.by = "country", 
                    cross.differences = list(c(0,1), c(0,2)), dependent = "score", 
                    trend = "year", linkErr = "leScore", progress = FALSE) } )

## 
## Trend group: '2010'
## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##  
##  analysis.number hierarchy.level groups.divided.by group.differences.by
##                1               0                                   <NA>
##                2               1           country              country
##                3               1               sex                 <NA>
##                4               2     country + sex              country
## 
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 
## 
## Trend group: '2015'
## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##  
##  analysis.number hierarchy.level groups.divided.by group.differences.by
##                1               0                                   <NA>
##                2               1           country              country
##                3               1               sex                 <NA>
##                4               2     country + sex              country
## 
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
##
## Warning in repMeanList(datL = datL, a = a): Computation of cross level differences using 'wec' method is only possible for differences according to adjacent levels. Non-adjacent levels will be ignored.
##
## 
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 
## 
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
## Note: No linking error was defined. Linking error will be defaulted to '0'.
##
## 
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 
## 
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
## Note: No linking error was defined. Linking error will be defaulted to '0'.
##
## 
## Trend group: '2010'
## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##  
##  analysis.number hierarchy.level groups.divided.by group.differences.by
##                1               0                                   <NA>
##                2               1           country              country
##                3               1               sex                 <NA>
##                4               2     country + sex              country
## 
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 
## 
## Trend group: '2015'
## 4 analyse(s) overall according to: 'group.splits = 0 1 2'.
##  
##  analysis.number hierarchy.level groups.divided.by group.differences.by
##                1               0                                   <NA>
##                2               1           country              country
##                3               1               sex                 <NA>
##                4               2     country + sex              country
## 
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
##
## Warning in repMeanList(datL = datL, a = a): Computation of cross level differences using 'wec' method is only possible for differences according to adjacent levels. Non-adjacent levels will be ignored.
##
## 
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 
## 
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
## Note: No linking error was defined. Linking error will be defaulted to '0'.
##
## 
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 
## 
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
## Note: No linking error was defined. Linking error will be defaulted to '0'.

The by-loop around repMean splits the data in two subsets which are analyzed consecutively. The results object is a list with two elements, “listening” and “reading”. The reporting function must be called for these two list elements separately. We now see that the argument add can help to distinguish both resulting data.frames. First, the processing is demonstrated without using a loop:

names(results)

## [1] "listening" "reading"

resultsListening <- report2(results[["listening"]], add = list(kb = "listening"))[["plain"]]

## Warning in FUN(X[[i]], ...): Standard error correction for 'crossDiff_of_groupDiff' is currently not supported.
## Warning in FUN(X[[i]], ...): Standard error correction for crossDifferences across multiple hierarchy levels is currently not supported.

resultsReading   <- report2(results[["reading"]], add = list(kb = "reading"))[["plain"]]

## Warning in FUN(X[[i]], ...): Standard error correction for 'crossDiff_of_groupDiff' is currently not supported.
## Warning in FUN(X[[i]], ...): Standard error correction for crossDifferences across multiple hierarchy levels is currently not supported.

allResults1      <- rbind (resultsListening, resultsReading)

Using a loop shortens the call, especially if more than two competence domains are used:

allResults2      <- lapply(names(results), FUN = function ( x ) { 
                           report2(results[[x]], add = list(kb=x))[["plain"]]})

## Warning in FUN(X[[i]], ...): Standard error correction for 'crossDiff_of_groupDiff' is currently not supported.
## Warning in FUN(X[[i]], ...): Standard error correction for crossDifferences across multiple hierarchy levels is currently not supported.
## Warning in FUN(X[[i]], ...): Standard error correction for 'crossDiff_of_groupDiff' is currently not supported.
## Warning in FUN(X[[i]], ...): Standard error correction for crossDifferences across multiple hierarchy levels is currently not supported.

allResults2      <- do.call("rbind", allResults2)

2.7 Adjusted means

eatRep also allows to compute “adjusted” means (Mayer et al., 2016; Nachtigall et al., 2008). We will not elaborate on theoretical issues about adjusted means—broadly speaking, unadjusted comparisons between two countries, say, Japan and Vietnam, may be difficult to interpret, because both countries differ substantially in terms of mean socio-economical status, migration background, and other background variables. Adjusted means can be thought of as weighted means to answer the question: would both countries still differ in their mean competencies, if the distribution of background variables would be equal? The researcher is free to select which background or demographic variables should be chosen for adjustment.

We demonstrate the computation of adjusted means for the domain “reading” in 2010, where we adjust for sex, migration background (mig) and socio-economical status (ses). All variables selected for adjustment must be numeric. For polytomous variables (e.g., language at home: “german”, “german and another language”, “only another language”) dichotomous indicator variables must be generated beforehand. In the following example, we transform the non-numeric adjustment variables sex and mig to be numeric.

sapply(bt2010read[,c("sex", "mig", "ses")], class)

##       sex       mig       ses 
##  "factor" "logical" "numeric"

bt2010read[,"sexnum"] <- car::recode(bt2010read[,"sex"], "'male'=0; 'female'=1", 
                         as.factor = FALSE)
bt2010read[,"mignum"] <- as.numeric(bt2010read[,"mig"])
results <- repMean(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
           repInd = "jkrep", imp="imp", groups = "country", group.splits = 0:1, 
           cross.differences = TRUE, adjust = c("sexnum", "mignum", "ses"), 
           dependent = "score", progress = FALSE)

## Warning in repMeanList(datL = datL, a = a): To date, for adjusted means, cross-level differences can only be computed with method 'old'. Set 'crossDiffSE' to 'old'.
## 2 analyse(s) overall according to: 'group.splits = 0 1'.
##  
##  analysis.number hierarchy.level groups.divided.by group.differences.by adjust
##                1               0                                     NA  FALSE
##                2               1           country                   NA   TRUE
## 
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.

res <- report2(results, add = list( kb="reading"))[["plain"]]

Please note that, to date, cross-level differences for adjusted means can only be computed using the methods old. In the zeroth hierarchy level, no adjustment takes place. As the distribution of background variables in the total population is used as the reference for the adjusted group means, the adjusted population mean is equal to the unadjusted population mean.

If trends should be computed for adjusted means, the procedure sketched above cannot be adopted without further ado. If the adjusted mean of countryA in 2015 should be compared with the adjusted mean of countryA in 2010, the reference group is no longer the total population (e.g., all countries). Unadjusted means do no depend on the specific research questions, but for adjusted means, the research questions matters: Does countryA in 2015 differ from countryB in 2015? Or does countryA in 2010 differ from countryA in 2015? Both questions require different adjustment approaches.

In the previous section, the adjustment for only one time of measurement was sketched: Would Japan still differ from Vietnam, if the distribution of background variables were equivalent? For trend analyses, the research question could be: Would there be differences between 2010 and 2015 for Japan, if the composition of students according to some demographic variables would not have changed? The package eatRep does not differentiate between both types of research questions. To compute adjusted trends, the formerly known trend variable year must be used as grouping variable. If adjusted trends for different groups should be estimated, the subsetting according to groups must be done by hand or via an outer loop. The incorporation of linking errors, if desired, must be done by hand likewise. The following example illustrates the procedure. The standard error of the trend estimate is computed as the square root of the sum of the squared standard errors for 2010, 2015 and the link:

btread  <- bt[which(bt[,"domain"] == "reading"),]
btread[,"sexnum"] <- car::recode(btread[,"sex"], "'male'=0; 'female'=1", as.factor = FALSE)
btread[,"mignum"] <- as.numeric(btread[,"mig"])
btread[,"year"] <- as.integer(btread[,"year"])
results <- by(data = btread,  INDICES = btread[,"country"], FUN = function(sub.dat) {
           res <- repMean(datL = sub.dat, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
                  repInd = "jkrep", imp="imp", groups = c("year"), 
                  adjust = c("sexnum", "mignum", "ses"), dependent = "score",
                  progress = FALSE)
           res <- report2(res, add = list( kb="reading", 
                          country= as.character(sub.dat[1,"country"])))[["plain"]]
           res[,"trend"]   <- diff(res[,"est"])
           res[,"trendSE"] <- sqrt(sum(res[,"se"]^2) + unique(sub.dat[,"leScore"])^2)
           return(res)})

## 1 analyse(s) overall according to: 'group.splits = 1'.
## Assume unnested structure with 3 imputations.
## Create 62 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 1'.
## Assume unnested structure with 3 imputations.
## Create 65 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 1'.
## Assume unnested structure with 3 imputations.
## Create 48 replicate weights according to JK2 procedure.

results <- do.call("rbind", results)

3. Frequency analyses (repTable)

Univariate frequency analyses of polytomous variables can be thought of mean analyses of dichotomous indicator variables. repTable() would not be necessary then—for example, you can redefine a 5-level polytomous variable into five dichotomous indicators and call repMean() five times. The main differences between frequency and mean analyses is the underlying statistic for group differences: mean analyses typically uses a t test for independent samples (e.g., “Differ males and females in their mean reading competencies?”). Frequency tables, however, use \(\chi^2\) tests, for example (“Differ males and females in their distribution of competence levels?”). In theory, you can test for each competence level 1, 2, 3, 4, and 5 with a separate t test, whether males and females differ. The comparisons, however, are not independent—a Bonferroni correction might be appropriate then. In the following, both variants are sketched:

### first example: group comparisons with a chi^2-Test: we check for each country 
### whether the distribution of competence levels differs between males and females
freq1 <- repTable(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
         repInd = "jkrep", imp="imp", groups = c("country", "sex"), 
         group.differences.by = "sex", cross.differences = FALSE, dependent = "comp", 
         chiSquare = TRUE, progress = FALSE)

## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.

res1  <- report2(freq1, add = list( kb = "reading"))[["plain"]]

In the second example, the group comparisons are conducted applying five separate t tests. For each country and each single competence level, repTable() checks whether the distribution differs between males and females. Technically, repMean() is called five times consecutively.

freq2 <- repTable(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
         repInd="jkrep", imp="imp", groups = c("country", "sex"), group.differences.by = "sex", 
         cross.differences = FALSE, dependent = "comp", chiSquare = FALSE, progress = FALSE)

## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.

res2  <- report2(freq2, add = list( kb = "reading"))[["plain"]]

In repTable(), the computation of cross-level differences and trends works analogously to repMean().

3.1 Looping across several dependent variables

Section 2.5 demonstrates the use of by() loops to analyze more than one domain with one single call. The same principle works for several dependent variables within one domain. Suppose you have several dichotomous criteria (e.g. “failed to reach minimal standard”, “pass regular standard”, “pass optimal standard”), represented by several variables. A lapply() loop ca be applied then:

### abhaengige Variablen definieren
DVs   <- c("failMin", "passReg", "passOpt")
freq3 <- lapply(DVs, FUN = function (dv) { 
         repTable(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
            repInd = "jkrep", imp="imp", groups = c("country", "sex"), 
            group.differences.by = "sex", cross.differences = FALSE, 
            dependent = dv, progress = FALSE) })

## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.

The reporting function report() must be called three times, likewise in a lapply() loop:

allResults3     <- do.call("rbind", lapply(freq3, report))

3.2 Nested loops

Both types of loops (across non-nested grouping variables and across several dependent variables) may be combined. In the following example, we want to analyze three dependent variables for two domains. Hence, a two-stage loop for \(3\times 2=6\) analyses is necessary:

DVs   <- c("failMin", "passReg", "passOpt")
freq4 <- lapply(DVs, FUN = function (dv) { 
         f4 <- by ( data = bt2010, INDICES = bt2010[,"domain"], FUN = function (sub.dat) {
               repTable(datL = sub.dat, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
                         repInd = "jkrep", imp="imp", groups = c("country", "sex"), 
                         group.differences.by = "sex", cross.differences = FALSE, 
                         dependent = dv, progress = FALSE)})})

## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.

To convert the results in a more user-friendly format:

allResults4  <- lapply(freq4, FUN = function (av) { 
                       do.call("rbind", lapply(names(av), FUN = function ( x ) { 
                         report2(av[[x]], add = list(kb=x))[["plain"]]}))})
allResults4  <- do.call("rbind", allResults4)

A combination of two loops also works for trend analyses. Please note that each dependent variable potentially has its own linking error. If so, one solution is to store the variable name along with its linking error in a data.frame and use an apply() loop:

### two-stage nested loop with trend analysis
### first we define the dependent variables (dv) and their linking errors (le)
DVs   <- data.frame ( dv = c("failMin", "passReg", "passOpt"), 
                      le = c("leFailMin", "lePassReg", "lePassOpt"), 
                      stringsAsFactors = FALSE)
freq5 <- apply(DVs, MARGIN = 1, FUN = function (depVars) { 
         f4 <- by ( data = bt, INDICES = bt[,"domain"], FUN = function (sub.dat) {
               repTable(datL = sub.dat, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
                         repInd = "jkrep", imp="imp", groups = c("country", "sex"), 
                         group.differences.by = "sex", cross.differences = FALSE, 
                         trend = "year", dependent = depVars[["dv"]],
                         linkErr = depVars[["le"]], progress = FALSE)})})

## 
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 
## 
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
##
## 
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 
## 
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
##
## 
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 
## 
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
##
## 
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 
## 
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
##
## 
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 
## 
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.
##
## 
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 
## 
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 2'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.

To convert the results in a more user-friendly format:

allResults5 <- lapply(freq5, FUN = function (av) { 
                       do.call("rbind", lapply(names(av), FUN = function ( x ) { 
                          report2(av[[x]], add = list(kb=x))[["plain"]]}))})

## Warning in FUN(X[[i]], ...): Found 6 missing linking errors for dependent variable 'failMin' and parameter(s) 'NcasesValid'. Assume linking error of 0 for these cases.
## Warning in FUN(X[[i]], ...): Found 6 missing linking errors for dependent variable 'passReg' and parameter(s) 'NcasesValid'. Assume linking error of 0 for these cases.
## Warning in FUN(X[[i]], ...): Found 6 missing linking errors for dependent variable 'passOpt' and parameter(s) 'NcasesValid'. Assume linking error of 0 for these cases.

allResults5 <- do.call("rbind", allResults5)

4. Analyses of ranges (repQuantile)

When analyzing quartiles, quantiles or percentiles, please note that the computation of group differences is not supported yet. repQuantile requires to specify the probabilities via the probs argument. The following example illustrates the usage of the function for the domain “reading” for 2010 and 2015. We compute the 5., 10., 25., 75., 90., and 95. percentile:

btRead <- bt[which(bt[,"domain"] == "reading"),]
quan   <- repQuantile(datL = btRead, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
               repInd = "jkrep", imp="imp", groups = "country", trend = "year", 
               dependent = "score", linkErr = "leScore", 
               probs = c(.05, .1, .25, .75, .90, .95), progress = FALSE )

## 
## Trend group: '2010'
## 1 analyse(s) overall according to: 'group.splits = 1'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.
## 
## 
## Trend group: '2015'
## 1 analyse(s) overall according to: 'group.splits = 1'.
## Assume unnested structure with 3 imputations.
## Create 73 replicate weights according to JK2 procedure.

res    <- report(quan, add = list(domain = "reading"))

5. Regression analyses (repGlm)

repGlm allows to estimate linear and logistic regression models. To date, trend analyses do not incorporate linking errors. The reporting function report() optionally allows to print the results to the console (if printGlm is set to TRUE). In the first example, the regression of reading competence on sex, SES is estimated in each country separately. For a valid interpretation of interaction effects, SES variable is standardized within each imputation:

bt2010read <- by(data=bt2010read, INDICES = bt2010read[,"imp"], FUN = function ( i ) { 
                i[,"ses_std"] <- scale(i[,"ses"])[,1]
                return(i)})
bt2010read <- do.call("rbind", bt2010read)              
reg1   <- repGlm(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
               repInd = "jkrep", imp="imp", groups = "country",  formula = score~sex*ses_std, 
               family=gaussian(link="identity"), progress = FALSE)

## 1 analyse(s) overall according to: 'group.splits = 1'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.

res1   <- report2(reg1, add = list(domain = "reading"), printGlm = TRUE)[["plain"]]

##        Trend group: 'noTrend'.
##             groups: type = point; country = countryA; row = 1; id = 14574361_1
##             domain: reading
## dependent Variable: score
##  
##         parameter     est    se t.value p.value sig
## 1     (Intercept) 505.837 4.428 114.243   0.000 ***
## 2         ses_std  23.697 3.876   6.114   0.000 ***
## 3         sexmale   8.677 5.594   1.551   0.121    
## 4 sexmale:ses_std   4.109 5.527   0.743   0.457    
## 
##             R-squared: 0.115; SE(R-squared): NA
## 1034 observations and 1030 degrees of freedom.
## ------------------------------------------------------------------
##             groups: type = point; country = countryB; row = 4; id = 14574361_4
##             domain: reading
## dependent Variable: score
##  
##         parameter     est    se t.value p.value sig
## 1     (Intercept) 499.044 4.624 107.934   0.000 ***
## 2         ses_std  28.344 4.637   6.113   0.000 ***
## 3         sexmale   7.918 7.804   1.015   0.311    
## 4 sexmale:ses_std   7.598 5.024   1.512   0.131    
## 
##             R-squared: 0.181; SE(R-squared): NA
## 959 observations and 955 degrees of freedom.
## ------------------------------------------------------------------
##             groups: type = point; country = countryC; row = 7; id = 14574361_7
##             domain: reading
## dependent Variable: score
##  
##         parameter     est    se t.value p.value sig
## 1     (Intercept) 525.240 3.788 138.663   0.000 ***
## 2         ses_std  24.083 5.246   4.591   0.000 ***
## 3         sexmale  15.108 5.289   2.856   0.004  **
## 4 sexmale:ses_std   0.879 7.460   0.118   0.906    
## 
##             R-squared: 0.096; SE(R-squared): NA
## 1086 observations and 1082 degrees of freedom.

The second example illustrates a logistic regression. Whether or not the regular standard was passed (passReg) is the dependent variable. The variable country is now used as a predictor.

reg2   <- repGlm(datL = bt2010read, ID="idstud", wgt="wgt", type="jk2", PSU="jkzone", 
               repInd = "jkrep", imp="imp", formula = passReg~country*ses_std, 
               family=binomial(link="logit"), progress = FALSE)

## 1 analyse(s) overall according to: 'group.splits = 0'.
## Assume unnested structure with 3 imputations.
## Create 92 replicate weights according to JK2 procedure.

res2   <- report2(reg2, add = list(domain = "reading"), printGlm = TRUE)[["plain"]]

##        Trend group: 'noTrend'.
##             groups: type = point; row = 1; id = 14575099_1
##             domain: reading
## dependent Variable: passReg
##  
##                 parameter    est    se t.value p.value sig
## 1             (Intercept) -0.227 0.115  -1.972   0.049   *
## 2         countrycountryB -0.206 0.156  -1.317   0.188    
## 3 countrycountryB:ses_std  0.097 0.159   0.613   0.540    
## 4         countrycountryC  0.465 0.141   3.305   0.001 ***
## 5 countrycountryC:ses_std -0.086 0.142  -0.604   0.546    
## 6                 ses_std  0.609 0.082   7.401   0.000 ***
## 
##             R-squared: 0.116; SE(R-squared): NA
## 3079 observations and 3073 degrees of freedom.

The output gives the \(R^2\) (for linear regression models) as well as Nagelkerke’s \(R^2\) (for logistic regression models).

References

Davier, M. von, Gonzalez, E., & Mislevy, R. J. (2009). What are plausible values and why are they useful? [Journal Article]. IERI Monograph Series: Issues and Methodologies in Large Scale Assessments, 2, 9–36.

Foy, P., Galia, J., & Li, I. (2008). Scaling the data from the TIMSS 2007 mathematics and science asssessment [Book Section]. In J. F. Olson, M. O. Martin, & I. V. S. Mullis (Eds.), TIMSS 2007 technical report (pp. 225–280). TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.

Grotenhuis, M. te, Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2017). When size matters: Advantages of weighted effect coding in observational studies [Journal Article]. International Journal of Public Health, 62, 163–167.

Hambleton, R. K., & Jones, R. W. (1993). An NCME instructional module on: Comparison of classical test theory and item response theory and their applications to test development [Journal Article]. Educational Measurement: Issues and Practice, 12(3), 38–47.

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory [Book]. Sage.

Krewski, D., & Rao, J. N. K. (1981). Inference from stratified samples: Properties of the linearization, jackknife and balanced repeated replication method [Journal Article]. Annals of Statistics, 9, 1010–1019.

Little, R. J. A., & Rubin, D. B. (1987). Statistical analyses with missing data [Book]. Wiley.

Mayer, A., Dietzfelbinger, L., Rosseel, Y., & Steyer, R. (2016). The EffectLiteR approach for analyzing average and conditional effects [Journal Article]. Multivariate Behavioral Research, 51, 374–391.

Mislevy, R. J., Beaton, A. E., Kaplan, B., & Sheehan, K. M. (1992). Estimating population characteristics from sparse matrix samples of item responses [Journal Article]. Journal of Educational Measurement, 29, 133–161.

Mislevy, R. J., Johnson, E. G., & Muraki, E. (1992). Scaling procedures in NAEP [Journal Article]. Journal of Educational and Behavioral Statistics, 17(2), 131–154.

Monseur, C., & Adams, R. (2009). Plausible values: How to deal with their limitations [Journal Article]. Journal of Applied Measurement, 10(3), 320–334.

Nachtigall, C., Kröhne, U., Enders, U., & Steyer, R. (2008). Causal effects and fair comparisons: Considering the influence of context variables on student competencies [Book Section]. In J. Hartig, E. Klieme, & D. Leutner (Eds.), Assessment of competencies in educational contexts. Hogrefe & Huber.

Rubin, D. B. (2003). Nested multiple imputation of NMES via partially incompatible MCMC [Journal Article]. Statistica Neerlandica, 57(1), 3–18.

Rust, K. (2014). Sampling, weighting, and variance estimation [Book Section]. In L. Rutkowski, M. Von Davier, & D. Rutkowski (Eds.), Handbook of international large-scale assessment. CRC Press.

Rust, K., & Rao, J. N. K. (1996). Variance estimation for complex surveys using replication techniques [Journal Article]. Statistical Methods in Medical Research, 5, 283–310.

Weirich, S., Hecht, M., Becker, B., & Zitzmann, S. (2021). Comparing group means with the total mean in random samples, surveys, and large-scale assessments: A tutorial and software illustration [Journal Article]. Behavior Research Methods, 1–12. https://doi.org/https://doi.org/10.3758/s13428-021-01553-1

Wolter, K. M. (2007). Introduction to variance estimation [Book]. Springer.