Test Convergence

Andrew L Jackson

2023-10-19

In this example we save the raw jags model output and test for convergence using the coda package.

library(SIBER)
library(coda)

Fit a basic SIBER model to the example data bundled with the package, taking care to set parms$output = TRUE and to set an appropriate working directory. You might choose to set this as parms$save.dir = getwd() to save it to your current working directory or you may choose to specify a specific directory. In this example, I use a temporary R directory to avoid writing to the actual package directory on your machine when you install and build the this package and associated vignette.

# load in the included demonstration dataset
data("demo.siber.data")
#
# create the siber object
siber.example <- createSiberObject(demo.siber.data)

# Calculate summary statistics for each group: TA, SEA and SEAc
group.ML <- groupMetricsML(siber.example)

# options for running jags
parms <- list()
parms$n.iter <- 2 * 10^4   # number of iterations to run the model for
parms$n.burnin <- 1 * 10^3 # discard the first set of values
parms$n.thin <- 10     # thin the posterior by this many
parms$n.chains <- 3        # run this many chains

# set save.output = TRUE
parms$save.output = TRUE

# you might want to change the directory to your local directory or a 
# sub folder in your current working directory. I have to set it to a 
# temporary directory that R creates and can use for the purposes of this 
# generic vignette that has to run on any computer as the package is 
# built and installed.
parms$save.dir = tempdir()

# define the priors
priors <- list()
priors$R <- 1 * diag(2)
priors$k <- 2
priors$tau.mu <- 1.0E-3

# fit the ellipses which uses an Inverse Wishart prior
# on the covariance matrix Sigma, and a vague normal prior on the 
# means. Fitting is via the JAGS method.
ellipses.posterior <- siberMVN(siber.example, parms, priors)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 30
##    Unobserved stochastic nodes: 3
##    Total graph size: 45
## 
## Initializing model
## 
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 30
##    Unobserved stochastic nodes: 3
##    Total graph size: 45
## 
## Initializing model
## 
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 30
##    Unobserved stochastic nodes: 3
##    Total graph size: 45
## 
## Initializing model
## 
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 30
##    Unobserved stochastic nodes: 3
##    Total graph size: 45
## 
## Initializing model
## 
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 30
##    Unobserved stochastic nodes: 3
##    Total graph size: 45
## 
## Initializing model
## 
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 30
##    Unobserved stochastic nodes: 3
##    Total graph size: 45
## 
## Initializing model

Now we want to determine whether our models have converged. There are separate models for each ellipse, and there should be one for each saved in our parms$save.dir directory. Note that the gelman.diag function does not seem to deal with the multivariate covariance matrix properly. We can calculate the statistic on the marginal parameters of the covariance matrix separately by setting multivariate = FALSE. See ?gelman.diag for more advice and assistance with interpreting this statistic: basically, we are looking for scale reduction factors less than 1.1. These models tend to behave well since we have z-scored the data for each ellipse prior to fitting and so convergence should not be an issue in most cases.

N.B. the parameter estimates we are performing these convergence tests on are the based on the estimates for the z-scored data as fitted by the jags model and as saved to file, and so are not the same scale as your actual raw data. In this regard, the means should approximate 0, and the diagonals of the covariance matrix close to 1, with a non-zero off-diagonal. These are back-transformed within SIBER when calculating the subsequent statistics such as SEA or shifts in the bivariate means.

# get a list of all the files in the save directory
all.files <- dir(parms$save.dir, full.names = TRUE)

# find which ones are jags model files
model.files <- all.files[grep("jags_output", all.files)]

# test convergence for the first one
do.this <- 1

load(model.files[do.this])

gelman.diag(output, multivariate = FALSE)
## Potential scale reduction factors:
## 
##             Point est. Upper C.I.
## Sigma2[1,1]          1       1.00
## Sigma2[2,1]          1       1.01
## Sigma2[1,2]          1       1.01
## Sigma2[2,2]          1       1.00
## mu[1]                1       1.00
## mu[2]                1       1.00
gelman.plot(output, auto.layout = FALSE)

Repeat for whichever or all ellipses you want to test convergence. There are other convergence tests available within the coda package and beyond, which you may use with the mcmc.list object called output that is saved in the various *.RData files produced.