Abstract
The BayesianTools (BT) package supports model analysis (including sensitivity analysis and uncertainty analysis), Bayesian model calibration, as well as model selection and multi-model inference techniques for system models.The purpose of this first section is to give you a quick overview of the most important functions of the BayesianTools (BT) package. For a more detailed description, see the later sections
If you haven’t installed the package yet, either run
install.packages("BayesianTools")
Or follow the instructions on https://github.com/florianhartig/BayesianTools to install a development or an older version.
Loading and citation
library(BayesianTools)
citation("BayesianTools")
##
## To cite package 'BayesianTools' in publications use:
##
## Hartig F, Minunno F, Paul S (2023). _BayesianTools: General-Purpose
## MCMC and SMC Samplers and Tools for Bayesian Statistics_. R package
## version 0.1.8, <https://github.com/florianhartig/BayesianTools>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {BayesianTools: General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics},
## author = {Florian Hartig and Francesco Minunno and Stefan { Paul}},
## year = {2023},
## note = {R package version 0.1.8},
## url = {https://github.com/florianhartig/BayesianTools},
## }
Note: BayesianTools calls a number of secondary packages. Particular important is coda, which is used on a number of plots and summary statistics. If you make heavy use of the summary statistics and diagnostics plots, it would be nice to cite coda as well!
Pro-tip: if you are running a stochastic algorithms such as an MCMC, you should always set or record your random seed to make your results reproducible (otherwise, results will change slightly every time you run the code)
set.seed(123)
In a real application, to ensure reproducibility, it would also be useful to record the session info,
sessionInfo()
which lists the version number of R and all loaded packages.
The central object in the BT package is the BayesianSetup. This class contains the information about the model to be fit (likelihood), and the priors for the model parameters.
A BayesianSetup is created by the createBayesianSetup function. The function expects a log-likelihood and (optional) a log-prior. It then automatically creates the posterior and various convenience functions for the samplers.
Advantages of the BayesianSetup include 1. support for automatic parallelization 2. functions are wrapped in try-catch statements to avoid crashes during long MCMC evaluations 3. and the posterior checks if the parameter is outside the prior first, in which case the likelihood is not evaluated (makes the algorithms faster for slow likelihoods).
If no prior information is provided, an unbounded flat prior is created. If no explicit prior, but lower and upper values are provided, a standard uniform prior with the respective bounds is created, including the option to sample from this prior, which is useful for SMC and also for getting starting values. This option is used in the following example, which creates a multivariate normal likelihood density and a uniform prior for 3 parameters.
<- generateTestDensityMultiNormal(sigma = "no correlation")
ll = createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3)) bayesianSetup
See later more detailed description about the BayesianSetup.
Hint: for an example how to run this steps for dynamic ecological model, see ?VSEM
Once you have your setup, you may want to run a calibration. The runMCMC function is the main wrapper for all other implemented MCMC/SMC functions. It always takes the following arguments
As an example, choosing the sampler name “Metropolis” calls a versatile Metropolis-type MCMC with options for covariance adaptation, delayed rejection, tempering and Metropolis-within-Gibbs sampling. For details, see the the later reference on MCMC samplers. This is how we would call this sampler with default settings
= 10000
iter = list(iterations = iter, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings) out
All samplers can be plotted and summarized via the console with the standard print, and summary commands
print(out)
## [1] "mcmcSampler - you can use the following methods to summarize, plot or reduce this class:"
## [1] getSample plot print summary
## see '?methods' for accessing help and source code
summary(out)
## # # # # # # # # # # # # # # # # # # # # # # # # #
## ## MCMC chain summary ##
## # # # # # # # # # # # # # # # # # # # # # # # # #
##
## # MCMC sampler: Metropolis
## # Nr. Chains: 1
## # Iterations per chain: 10000
## # Rejection rate: 0.684
## # Effective sample size: 1005
## # Runtime: 2.883 sec.
##
## # Parameters
## MAP 2.5% median 97.5%
## par 1 0 -1.991 0.016 2.034
## par 2 0 -1.931 -0.077 1.802
## par 3 0 -1.970 -0.007 1.850
##
## ## DIC: 11.153
## ## Convergence
## Gelman Rubin multivariate psrf: Only one chain; convergence cannot be determined!
##
## ## Correlations
## par 1 par 2 par 3
## par 1 1.000 -0.019 -0.029
## par 2 -0.019 1.000 0.011
## par 3 -0.029 0.011 1.000
and plottted with several plot functions. The marginalPlot can either be plotted as histograms with density overlay, which is also the default, or as a violin plot (see “?marginalPlot”).
plot(out) # plot internally calls tracePlot(out)
correlationPlot(out)
marginalPlot(out, prior = TRUE)
Other Functions that can be applied to all samplers include model selection scores such as the DIC and the marginal Likelihood (for the calculation of the Bayes factor, see later section for more details), and the Maximum Aposteriori Value (MAP). For the marginal likelihood calculation it is possible to chose from a set of methods (see “?marginalLikelihood”).
marginalLikelihood(out)
## $ln.ML
## [1] -9.07554
##
## $ln.lik.star
## [1] -2.756816
##
## $ln.pi.star
## [1] -8.987197
##
## $ln.pi.hat
## [1] -2.668472
##
## $method
## [1] "Chib"
DIC(out)
## $DIC
## [1] 11.15333
##
## $IC
## [1] 13.96971
##
## $pD
## [1] 2.816389
##
## $pV
## [1] 2.8407
##
## $Dbar
## [1] 8.336937
##
## $Dhat
## [1] 5.520548
MAP(out)
## $parametersMAP
## par 1 par 2 par 3
## 4.650153e-04 -1.707961e-04 -6.534865e-05
##
## $valuesMAP
## Lposterior Llikelihood Lprior
## -11.744013 -2.756816 -8.987197
You can extract (a part of) the sampled parameter values by
getSample(out, start = 100, end = NULL, thin = 5, whichParameters = 1:2)
For all samplers, you can conveniently perform multiple runs via the nrChains argument
= 1000
iter = list(iterations = iter, nrChains = 3, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings) out
## Parameter(s) currentChain not used in Metropolis
##
## Parameter(s) currentChain not used in Metropolis
##
## Parameter(s) currentChain not used in Metropolis
The result is an object of mcmcSamplerList, which should allow to do everything one can do with an mcmcSampler object (with slightly different output sometimes).
print(out)
## [1] "mcmcSamplerList - you can use the following methods to summarize, plot or reduce this class:"
## [1] getSample plot print summary
## see '?methods' for accessing help and source code
summary(out)
## # # # # # # # # # # # # # # # # # # # # # # # # #
## ## MCMC chain summary ##
## # # # # # # # # # # # # # # # # # # # # # # # # #
##
## # MCMC sampler: Metropolis
## # Nr. Chains: 3
## # Iterations per chain: 1000
## # Rejection rate: 0.688
## # Effective sample size: 308
## # Runtime: 1.034 sec.
##
## # Parameters
## psf MAP 2.5% median 97.5%
## par 1 1.011 0 -2.172 0.005 1.876
## par 2 1.023 0 -1.853 0.103 1.936
## par 3 1.013 0 -1.795 0.027 2.180
##
## ## DIC: 11.187
## ## Convergence
## Gelman Rubin multivariate psrf: 1.024
##
## ## Correlations
## par 1 par 2 par 3
## par 1 1.000 0.006 0.085
## par 2 0.006 1.000 -0.030
## par 3 0.085 -0.030 1.000
For example, in the plot you now see 3 chains.
plot(out)
There are a few additional functions that may only be available for lists, for example convergence checks
#getSample(out, coda = F)
gelmanDiagnostics(out, plot = T)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## par 1 1.01 1.01
## par 2 1.02 1.07
## par 3 1.01 1.04
##
## Multivariate psrf
##
## 1.02
The BT package provides a large class of different MCMC samplers, and it depends on the particular application which is most suitable.
In the absence of further information, we currently recommend the DEzs sampler. This is also the default in the runMCMC function.
The likelihood should be provided as a log density function.
= logDensity(x) ll
See options for parallelization below. We will use a simple 3-d multivariate normal density for this demonstration.
<- generateTestDensityMultiNormal(sigma = "no correlation")
ll = createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3)) bayesianSetup
Likelihoods are often costly to compute. If that is the case for you, you should think about parallelization possibilities. The ‘createBayesianSetup’ function has the input variable ‘parallel’, with the following options
Algorithms in the BayesianTools package can make use of parallel computing if this option is specified in the BayesianSetup. Note that currently, parallelization is used by the following algorithms: SMC, DEzs and DREAMzs sampler. It can also be used through the BayesianSetup with the functions of the sensitivity package.
Here some more details on the parallelization
The in-build parallelization is the easiest way to make use of parallel computing. In the “parallel” argument you can choose the number of cores used for parallelization. Alternatively for TRUE or “auto” all available cores except for one will be used. Now the proposals are evaluated in parallel. Technically, the in-build parallelization uses an R cluster to evaluate the posterior density function. The input for the parallel function is a matrix, where each column represents a parameter and each row a proposal. In this way, the proposals can be evaluated in parallel. For sampler, where only one proposal is evaluated at a time (namely the Metropolis based algorithms as well as DE/DREAM without the zs extension), no parallelization can be used.
The second option is to use an external parallelization. Here, a parallelization is attempted in the user defined likelihood function. To make use of external parallelization, the likelihood function needs to take a matrix of proposals and return a vector of likelihood values. In the proposal matrix each row represents one proposal, each column a parameter. Further, you need to specify the “external” parallelization in the “parallel” argument. In simplified terms the use of external parallelization uses the following steps:
## Definition of likelihood function
<- function(matrix){
likelihood # Calculate likelihood in parallel
# Return vector of likelihood valus
}
## Create Bayesian Setup
<- createBayesianSetup(likelihood, parallel = "external", ...)
BS
## Run MCMC
runMCMC(BS, sampler = "SMC", ...)
If you want to run your calculations on a cluster there are several ways to achieve it.
In the first case you want to parallize n internal (not overall chains) on n cores. The argument “parallel = T” in “createBayesianSetup” allows only at most parallelization on 3 cores for the SMC, DEzs and DreamsSamplers. But by setting “parallel = n” to n cores in the “createBayesianSetup”, the internal chains of DEzs and DREAMzs will be parallelized on n cores. This works only for the DEzs and DREAMzs samplers.
## n = Number of cores
=2
n<- c(1:10)
x <- function(param) return(sum(dnorm(x, mean = param, log = T)))
likelihood <- createBayesianSetup(likelihood, parallel = n, lower = -5, upper = 5)
bayesianSetup
## give runMCMC a matrix with n rows of proposals as startValues or sample n times from the previous created sampler
<- runMCMC(bayesianSetup, settings = list(iterations = 1000)) out
In the second case you want to parallize n internal chains on n cores with a external parallilzed likelihood function. Unlike the previous case, that way DEzs, DREAMzs, and SMC samplers can be parallelized.
### Create cluster with n cores
<- parallel::makeCluster(n)
cl
## Definition of the likelihood
<- function(X) sum(dnorm(c(1:10), mean = X, log = T))
likelihood
## Definition of the likelihood which will be calculated in parallel. Instead of the parApply function, we could also define a costly parallelized likelihood
<- function(param) parallel::parApply(cl = cl, X = param, MARGIN = 1, FUN = likelihood)
pLikelihood
## export functions, dlls, libraries
# parallel::clusterEvalQ(cl, library(BayesianTools))
::clusterExport(cl, varlist = list(likelihood))
parallel
## create BayesianSetup
<- createBayesianSetup(pLikelihood, lower = -10, upper = 10, parallel = 'external')
bayesianSetup
## For this case we want to parallelize the internal chains, therefore we create a n row matrix with startValues, if you parallelize a model in the likelihood, do not set a n*row Matrix for startValue
= list(iterations = 100, nrChains = 1, startValue = bayesianSetup$prior$sampler(n))
settings
## runMCMC
<- runMCMC(bayesianSetup, settings, sampler = "DEzs") out
In a another case your likelihood requires a parallized model. Start your cluster and export your model, the required libraries, and dlls. Now you can start your calculations with the argument “parallel = external” in createBayesianSetup.
### Create cluster with n cores
<- parallel::makeCluster(n)
cl
## export your model
# parallel::clusterExport(cl, varlist = list(complexModel))
## Definition of the likelihood
<- function(param) {
likelihood # ll <- complexModel(param)
# return(ll)
}
## create BayesianSetup and settings
<- createBayesianSetup(likelihood, lower = -10, upper = 10, parallel = 'external')
bayesianSetup = list(iterations = 100, nrChains = 1)
settings
## runMCMC
<- runMCMC(bayesianSetup, settings) out
In the last case you can parallize over whole chain calculations. However, here the likelihood itself will not be parallelized. Each chain will be run on one core and the likelihood will be calculated on that core.
### Definition of likelihood function
<- c(1:10)
x <- function(param) return(sum(dnorm(x, mean = param, log = T)))
likelihood
## Create BayesianSetup and settings
<- createBayesianSetup(likelihood, lower = -10, upper = 10, parallel = F)
bayesianSetup = list(iterations = 100000)
settings
## Start cluster with n cores for n chains and export BayesianTools library
<- parallel::makeCluster(n)
cl ::clusterEvalQ(cl, library(BayesianTools))
parallel
## calculate parallel n chains, for each chain the likelihood will be calculated on one core
<- parallel::parLapply(cl, 1:n, fun = function(X, bayesianSetup, settings) runMCMC(bayesianSetup, settings, sampler = "DEzs"), bayesianSetup, settings)
out
## Combine the chains
<- createMcmcSamplerList(out) out
** Remark: even though parallelization can significantly reduce the computation time, it is not always useful because of the so-called communication overhead (computational time for distributing and retrieving infos from the parallel cores). For models with low computational cost, this procedure can take more time than the actual evaluation of the likelihood. If in doubt, make a small comparison of the runtime before starting your large sampling. **
The prior in the BayesianSetup consists of four parts
These information can be passed by first creating an a extra object, via createPrior, or through the the createBayesianSetup function.
You have 5 options to create a prior
If creating a user-defined prior, the following information can/should be provided to createPrior:
The following example from the help shows how this works
# Create a BayesianSetup
<- generateTestDensityMultiNormal(sigma = "no correlation")
ll = createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3))
bayesianSetup
= list(iterations = 2500, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, settings = settings)
out
= createPriorDensity(out, method = "multivariate", eps = 1e-10, lower = rep(-10, 3), upper = rep(10, 3), best = NULL)
newPrior <- createBayesianSetup(likelihood = ll, prior = newPrior)
bayesianSetup
= list(iterations = 1000, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, settings = settings) out
The runMCMC function is the central function for starting MCMC algorithms in the BayesianTools package. It requires a bayesianSetup, a choice of sampler (standard is DEzs), and optionally changes to the standard settings of the chosen sampler.
runMCMC(bayesianSetup, sampler = “DEzs”, settings = NULL)
One optional argument that you can always use is nrChains - the default is 1. If you choose more, the runMCMC will perform several runs.
<- generateTestDensityMultiNormal(sigma = "no correlation")
ll
= createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3))
bayesianSetup
= list(iterations = 10000, nrChains= 3, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
out
plot(out)
marginalPlot(out, prior = T)
correlationPlot(out)
gelmanDiagnostics(out, plot=T)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## par 1 1 1.00
## par 2 1 1.00
## par 3 1 1.01
##
## Multivariate psrf
##
## 1
# option to restart the sampler
= list(iterations = 1000, nrChains= 1, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
out
<- runMCMC(bayesianSetup = out)
out2
<- runMCMC(bayesianSetup = out2)
out3
#plot(out)
#plot(out3)
# create new prior from posterior sample
<- createPriorDensity(out2) newPriorFromPosterior
For convenience we define a number of iterations
= 10000 iter
The BayesianTools package is able to run a large number of Metropolis-Hastings (MH) based algorithms All of these samplers can be accessed by the “Metropolis” sampler in the runMCMC function by specifying the sampler’s settings.
The following code gives an overview about the default settings of the MH sampler.
applySettingsDefault(sampler = "Metropolis")
## $sampler
## [1] "Metropolis"
##
## $startValue
## NULL
##
## $iterations
## [1] 10000
##
## $burnin
## [1] 0
##
## $thin
## [1] 1
##
## $consoleUpdates
## [1] 100
##
## $parallel
## NULL
##
## $message
## [1] TRUE
##
## $optimize
## [1] TRUE
##
## $proposalGenerator
## NULL
##
## $adapt
## [1] FALSE
##
## $adaptationInterval
## [1] 500
##
## $adaptationNotBefore
## [1] 3000
##
## $DRlevels
## [1] 1
##
## $proposalScaling
## NULL
##
## $adaptationDepth
## NULL
##
## $temperingFunction
## NULL
##
## $gibbsProbabilities
## NULL
##
## $nrChains
## [1] 1
##
## $runtime
## [1] 0
##
## $sessionInfo
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] BayesianTools_0.1.8
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.10 highr_0.9 bslib_0.4.0
## [4] compiler_4.2.1 nloptr_2.0.3 jquerylib_0.1.4
## [7] tools_4.2.1 boot_1.3-28 digest_0.6.31
## [10] lme4_1.1-30 jsonlite_1.8.0 evaluate_0.15
## [13] nlme_3.1-157 lattice_0.20-45 rlang_1.0.6
## [16] Matrix_1.4-1 cli_3.4.1 rstudioapi_0.13
## [19] yaml_2.3.5 mvtnorm_1.1-3 xfun_0.31
## [22] fastmap_1.1.0 coda_0.19-4 DHARMa_0.4.6
## [25] stringr_1.4.0 knitr_1.39 sass_0.4.2
## [28] IDPmisc_1.1.20 stats4_4.2.1 grid_4.2.1
## [31] R6_2.5.1 tmvtnorm_1.5 rmarkdown_2.14
## [34] minqa_1.2.4 magrittr_2.0.3 htmltools_0.5.2
## [37] MASS_7.3-57 splines_4.2.1 numDeriv_2016.8-1.1
## [40] Brobdingnag_1.2-7 bridgesampling_1.1-2 sandwich_3.0-2
## [43] stringi_1.7.6 gmm_1.7 cachem_1.0.6
## [46] zoo_1.8-10
The following examples show how the different settings can be used. As you will see different options can be activated singly or in combination.
The following settings will run the standard Metropolis Hastings MCMC.
Refernences: Hastings, W. K. (1970). Monte carlo sampling methods using markov chains and their applications. Biometrika 57 (1), 97-109.
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller (1953). Equation of state calculations by fast computing machines. The journal of chemical physics 21 (6), 1087 - 1092.
<- list(iterations = iter, adapt = F, DRlevels = 1, gibbsProbabilities = NULL, temperingFunction = NULL, optimize = F, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
out plot(out)
This sampler uses an optimization step prior to the sampling process. The optimization aims at improving the starting values and the covariance of the proposal distribution.
<- list(iterations = iter, adapt = F, DRlevels = 1, gibbsProbabilities = NULL, temperingFunction = NULL, optimize = T, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
out plot(out)
In the adaptive Metropolis sampler (AM) the information already acquired in the sampling process is used to improve (or adapt) the proposal function. In the BayesianTools package the history of the chain is used to adapt the covariance of the propoasal distribution.
References: Haario, H., E. Saksman, and J. Tamminen (2001). An adaptive metropolis algorithm. Bernoulli , 223-242.
<- list(iterations = iter, adapt = T, DRlevels = 1, gibbsProbabilities = NULL, temperingFunction = NULL, optimize = T, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
out plot(out)
Even though rejection is an essential step of a MCMC algorithm it can also mean that the proposal distribution is (locally) badly tuned to the target distribution. In a delayed rejection (DR) sampler a second (or third, etc.) proposal is made before rejection. This proposal is usually drawn from a different distribution, allowing for a greater flexibility of the sampler. In the BayesianTools package the number of delayed rejection steps as well as the scaling of the proposals can be determined. ** Note that the current version only supports two delayed rejection steps. **
References: Green, Peter J., and Antonietta Mira. “Delayed rejection in reversible jump Metropolis-Hastings.” Biometrika (2001): 1035-1053.
<- list(iterations = iter, adapt = F, DRlevels = 2, gibbsProbabilities = NULL, temperingFunction = NULL, optimize = T, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
out plot(out)
The delayed rejection adaptive Metropolis (DRAM) sampler is merely a combination of the two previous sampler (DR and AM).
References: Haario, Heikki, et al. “DRAM: efficient adaptive MCMC.” Statistics and Computing 16.4 (2006): 339-354.
<- list(iterations = iter, adapt = T, DRlevels = 2, gibbsProbabilities = NULL, temperingFunction = NULL, optimize = T, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
out plot(out)
To reduce the dimensions of the target function a Metropolis-within-Gibbs sampler can be run with the BayesianTools package. This means in each iteration only a subset of the parameter vector is updated. In the example below at most two (of the three) parameters are updated each step, and it is double as likely to vary one than varying two.
** Note that currently adaptive cannot be mixed with Gibbs updating! **
<- list(iterations = iter, adapt = T, DRlevels = 1, gibbsProbabilities = c(1,0.5,0), temperingFunction = NULL, optimize = T, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
out plot(out)
Simulated tempering is closely related to simulated annealing (e.g. Bélisle, 1992) in optimization algorithms. The idea of tempering is to increase the acceptance rate during burn-in. This should result in a faster initial scanning of the target function. To include this a tempering function needs to be supplied by the user. The function describes how the acceptance rate is influenced during burn-in. In the example below an exponential decline approaching 1 (= no influece on the acceptance rate)is used.
References: Bélisle, C. J. (1992). Convergence theorems for a class of simulated annealing algorithms on rd. Journal of Applied Probability, 885–895.
C. J. Geyer (2011) Importance sampling, simulated tempering, and umbrella sampling, in the Handbook of Markov Chain Monte Carlo, S. P. Brooks, et al (eds), Chapman & Hall/CRC.
<- function(x) 5 * exp(-0.01*x) + 1
temperingFunction <- list(iterations = iter, adapt = F, DRlevels = 1, gibbsProbabilities = c(1,1,0), temperingFunction = temperingFunction, optimize = T, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
out plot(out)
The BT package implements two versions of the differential evolution MCMC. In doubt, you should use the DEzs option.
The first is the normal DE MCMC, corresponding to Ter Braak, Cajo JF. “A Markov Chain Monte Carlo version of the genetic algorithm Differential Evolution: easy Bayesian computing for real parameter spaces.” Statistics and Computing 16.3 (2006): 239-249. In this sampler multiple chains are run in parallel (but not in the sense of parallel computing). The main difference to the Metropolis based algorithms is the creation of the proposal. Generally all samplers use the current positin of the chain and add a step in the parameter space to generate a new proposal. Whereas in the Metropolis based sampler this step is usually drawn from a multivariate normal distribution (yet every distribution is possible), the DE sampler uses the current position of two other chains to generate the step for each chain. For sucessful sampling at least 2*d chains, with d being the number of parameters, need to be run in parallel.
<- list(iterations = iter, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DE", settings = settings)
out plot(out)
The second is the Differential Evolution MCMC with snooker update and sampling from past states, corresponding to ter Braak, Cajo JF, and Jasper A. Vrugt. “Differential evolution Markov chain with snooker updater and fewer chains.” Statistics and Computing 18.4 (2008): 435-446. This extension covers two differences to the normal DE MCMC. First a snooker update is used based on a user defined probability. Second also past states of other chains are respected in the creation of the proposal. These extensions allow for fewer chains (i.e. 3 chains are usually enough for up to 200 parameters) and parallel computing as the current position of each chain is only dependent on the past states of the other chains.
<- list(iterations = iter, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DEzs", settings = settings)
out plot(out)
Also for the DREAM sampler, there are two versions included. First of all, the standard DREAM sampler, see Vrugt, Jasper A., et al. “Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling.” International Journal of Nonlinear Sciences and Numerical Simulation 10.3 (2009): 273-290.
This sampler is largely build on the DE sampler with some significant differences: 1) More than two chains can be used to generate a proposal. 2) A randomized subspace sampling can be used to enhance the efficiency for high dimensional posteriors. Each dimension is updated with a crossover probalitity CR. To speed up the exploration of the posterior DREAM adapts the distribution of CR values during burn-in to favor large jumps over small ones. 3) Outlier chains can be removed during burn-in.
<- list(iterations = iter, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DREAM", settings = settings)
out plot(out)
The second implementation uses the same extension as the DEzs sampler. Namely sampling from past states and a snooker update. Also here this extension allows for the use of fewer chains and parallel computing.
Again, in doubt you should prefer “DREAMzs”.
<- list(iterations = iter, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DREAMzs", settings = settings)
out plot(out)
The T-walk is a MCMC algorithm developed by Christen, J. Andrés, and Colin Fox. “A general purpose sampling algorithm for continuous distributions (the t-walk).” Bayesian Analysis 5.2 (2010): 263-281. In the sampler two independent points are used to explore the posterior space. Based on probabilities four different moves are used to generate proposals for the two points. As for the DE sampler this procedure requires no tuning of the proposal distribution for efficient sampling in complex posterior distributions.
= list(iterations = iter, message = FALSE)
settings
<- runMCMC(bayesianSetup = bayesianSetup, sampler = "Twalk", settings = settings) out
All MCMCs should be checked for convergence. We recommend the standard procedure of Gelmal-Rubin. This procedure requires running several MCMCs (we recommend 3). This can be achieved either directly in the runMCMC (nrChains = 3), or, for runtime reasons, by combining the results of three independent runMCMC evaluations with nrChains = 1.
<- list(iterations = iter, nrChains = 3, message = FALSE)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings) out
## Parameter(s) currentChain not used in Metropolis
##
## Parameter(s) currentChain not used in Metropolis
##
## Parameter(s) currentChain not used in Metropolis
plot(out)
#chain = getSample(out, coda = T)
gelmanDiagnostics(out, plot = F)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## par 1 1 1.00
## par 2 1 1.01
## par 3 1 1.00
##
## Multivariate psrf
##
## 1
MCMCs sample the posterior space by creating a chain in parameter space. While this allows “learning” from past steps, it does not permit the parallel execution of a large number of posterior values at the same time.
An alternative to MCMCs are particle filters, aka Sequential Monte-Carlo (SMC) algorithms. See Hartig, F.; Calabrese, J. M.; Reineking, B.; Wiegand, T. & Huth, A. Statistical inference for stochastic simulation models - theory and application Ecol. Lett., 2011, 14, 816-827
The easiest option is to simply sample a large number of parameters and accept them according to their posterior value. This option can be emulated with the implemented SMC, setting iterations to 1.
<- list(initialParticles = iter, iterations= 1)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "SMC", settings = settings)
out plot(out)
The more sophisticated option is using the implemented SMC, which is basically a particle filter that applies several filter steps.
<- list(initialParticles = iter, iterations= 10)
settings <- runMCMC(bayesianSetup = bayesianSetup, sampler = "SMC", settings = settings)
out plot(out)
Note that the use of a number for initialParticles requires that the bayesianSetup includes the possibility to sample from the prior.
There are a number of Bayesian model selection and model comparison methods. The BT implements three of the most common of them, the DIC, the WAIC, and the Bayes factor.
On the Bayes factor, see Kass, R. E. & Raftery, A. E. Bayes Factors J. Am. Stat. Assoc., Amer Statist Assn, 1995, 90, 773-795
An overview on DIC and WAIC is given in Gelman, A.; Hwang, J. & Vehtari, A. (2014) Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997-1016-. On DIC, see also the original reference by Spiegelhalter, D. J.; Best, N. G.; Carlin, B. P. & van der Linde, A. (2002) Bayesian measures of model complexity and fit. J. Roy. Stat. Soc. B, 64, 583-639.
The Bayes factor relies on the calculation of marginal likelihoods, which is numerically not without problems. The BT package currently implements three methods
The recommended way is the method “Chib” (Chib and Jeliazkov, 2001). which is based on MCMC samples, but performs additional calculations. Despite being the current recommendation, note there are some numeric issues with this algorithm that may limit reliability for larger dimensions.
The harmonic mean approximation, is implemented only for comparison. Note that the method is numerically unrealiable and usually should not be used.
The third method is simply sampling from the prior. While in principle unbiased, it will only converge for a large number of samples, and is therefore numerically inefficient.
Data linear Regression with quadratic and linear effect
= 30
sampleSize <- (-(sampleSize-1)/2):((sampleSize-1)/2)
x <- 1 * x + 1*x^2 + rnorm(n=sampleSize,mean=0,sd=10)
y plot(x,y, main="Test Data")
Likelihoods for both
<- function(param){
likelihood1 = param[1] + param[2]*x + param[3] * x^2
pred = dnorm(y, mean = pred, sd = 1/(param[4]^2), log = T)
singlelikelihoods return(sum(singlelikelihoods))
}
<- function(param){
likelihood2 = param[1] + param[2]*x
pred = dnorm(y, mean = pred, sd = 1/(param[3]^2), log = T)
singlelikelihoods return(sum(singlelikelihoods))
}
Posterior definitions
<- createBayesianSetup(likelihood1, lower = c(-5,-5,-5,0.01), upper = c(5,5,5,30))
setUp1
<- createBayesianSetup(likelihood2, lower = c(-5,-5,0.01), upper = c(5,5,30)) setUp2
MCMC and marginal likelihood calculation
= list(iterations = 15000, message = FALSE)
settings <- runMCMC(bayesianSetup = setUp1, sampler = "Metropolis", settings = settings)
out1 #tracePlot(out1, start = 5000)
= marginalLikelihood(out1)
M1
M1
= list(iterations = 15000, message = FALSE)
settings <- runMCMC(bayesianSetup = setUp2, sampler = "Metropolis", settings = settings)
out2 #tracePlot(out2, start = 5000)
= marginalLikelihood(out2)
M2 M2
Bayes factor (need to reverse the log)
exp(M1$ln.ML - M2$ln.ML)
## [1] 1.726888e+15
BF > 1 means the evidence is in favor of M1. See Kass, R. E. & Raftery, A. E. (1995) Bayes Factors. J. Am. Stat. Assoc., Amer Statist Assn, 90, 773-795.
Assuming equal prior weights on all models, we can calculate the posterior weight of M1 as
exp(M1$ln.ML) / ( exp(M1$ln.ML) + exp(M2$ln.ML))
## [1] 1
If models have different model priors, multiply with the prior probabilities of each model.
The Deviance information criterion is a commonly applied method to summarize the fit of an MCMC chain. It can be obtained via
DIC(out1)$DIC
## [1] 292.0454
DIC(out2)$DIC
## [1] 365.3305
The Watanabe–Akaike information criterion is another criterion for model comparison. To be able to calculate the WAIC, the model must implement a log-likelihood that density that allows to calculate the log-likelihood point-wise (the likelihood functions requires a “sum” argument that determines whether the summed log-likelihood should be returned). It can be obtained via
# This will not work, since likelihood1 has no sum argument
# WAIC(out1)
# likelihood with sum argument
<- function(param, sum = TRUE){
likelihood3 <- param[1] + param[2]*x + param[3] * x^2
pred <- dnorm(y, mean = pred, sd = 1/(param[4]^2), log = T)
singlelikelihoods return(if (sum == TRUE) sum(singlelikelihoods) else singlelikelihoods)
}<- createBayesianSetup(likelihood3, lower = c(-5,-5,-5,0.01), upper = c(5,5,5,30))
setUp3 <- runMCMC(bayesianSetup = setUp3, sampler = "Metropolis", settings = settings)
out3
WAIC(out3)
## $WAIC1
## [1] 230.808
##
## $WAIC2
## [1] 231.8506
##
## $lppd
## [1] -112.2759
##
## $pWAIC1
## [1] 3.128133
##
## $pWAIC2
## [1] 3.649388