Abstract
This tutorial discusses how to interface models written in other programming languages with R, so that they can be fit with BayesianToolsA basic requirement to allow calibration in BT is that we need to be able to execute the model with a given set of parameters. Strictly speaking, BT will not see the model as such, but requires a likelihood function with interface likelihood(par), where par is a vector, but in this function, you will then probably run the model with parameters par, where par could stay a vector, or be transformed to another format, e.g. data.frame, matrix or list.
What happens now depends on how your model is programmed - I have listed the steps in order of convenience / speed. If your model has never been interfaced with R you will likely have to move down to the last option.
Usually, you will have to do nothing. Just make sure you can call your model a in
runMyModel(par)
Typically this function will directly return the model outputs, so step 2 can be skipped.
If you have your model prepared as a dll, or you can prepare it that way, you can use the \(\ref{https://stat.ethz.ch/R-manual/R-devel/library/base/html/dynload.html}{dyn.load()}\) function to link R to your model
dyn.load(model)
runMyModel(par){
= # model call here
out # process out
return(out)
}
Again, if you implement this, you will also typically return the output directly via the dll and not write to file, which means that step 2 can be skipped.
The tricky thing in this approach is that you have to code the interface to your dll, which technically means in most programming languages to set your variables as external or something similar, so that they can be accessed from the outside. How this works will depend on the programming language.
RCPP is a highly flexible environment to interface between R and C/C++. If your model is coded in C / C++, RCPP offers the most save and powerful way to connect with R (much more flexible than with command line or dll).
However, doing the interface may need some adjustments to the code, and there can be technical problems that are difficult to solve for beginners. I do not recommend to attempt interfacing an existing C/C++ model unless you have RCPP experience, or at least very food C/C++ experience.
Again, if you implement this, you will also typically return the output directly via the dll and not write to file, which means that step 2 can be skipped.
If your model is written in a compiled or interpreted language, and accepts parameters via std I/O, wrapping is usually nothing more than writing the system call in an R function. An example would be
runMyModel(par){
# Create here a string with what you would write to call the model from the command line
<- paste("model.exe", par[1], par[2])
systemCall
= system(systemCall, intern = TRUE) # intern indicates whether to capture the output of the command as an R character vector
out
# write here to convert out in the apprpriate R classes
}
Note: If you have problems with the system command, try system2. If the model returns the output via std.out, you can catch this and convert it and skip step 2. If your model writes to file, go to step 2.
Many models read parameters with a parameter file. In this case you want to do something like this
runMyModel(par, returnData = NULL){
writeParameters(par)
system("Model.exe")
if(! is.null(returnData)) return(readData(returnData)) # The readData function will be defined later
}
writeParameters(par){
# e.g.
# read template parameter fil
# replace strings in template file
# write parameter file
}
Depending on your problem, it can also make sense to define a setup function such as
<- function(parameterTemplate, site, localConditions){
setUpModel
# create the runModel, readData functions (see later) here
return(list(runModel, readData))
}
How you do the write parameter function depends on the file format you use for the parameters. In general, you probably want to create a template parameter file that you use as a base and from which you change parameters
You have to change your model code to achieve one of the former options. If the model is in C/C++, going directly to RCPP seems the best alternative.
If your model returns the output directly (which is highly preferable, ), you can skip this step.
For simple models, you might consider returning the model output directly with the runMyModel function. This is probably so for cases a) and b) above, i.e. model is already in R, or accepts parameters via command line.
More complicated models, however, produce a large number of outputs and you typically don’t need all of them. It is therefore more useful to make on or several separate readData or getDate function. The only two different cases I will consider here is
Model is a dll If the model is a dll file, the best thing would probably be to implement appropriate getData functions in the source code that can then be called from R. If your model is in C and in a dll, interfacing this via RCPP would probably be easier, because you can directly return R dataframes and other data structures.
Model writes file output If the model writes file output, write a getData function that reads in the model outputs and returns the data in the desired format, typically the same that you would use to represent your field data.
getData(type = X){
read.csv(xxx)
# do some transformation
# return data in desidered format
}
From R, you should now be able to do something like that
= c(1,2,3,4 ..)
par
runMyModel(par)
<- getData(type = DesiredType)
output
plot(output)
The last step is optional, but we recommend that you take it from the start, because there is really no downside to it. You work with R code in several files that you run by hand, or diretly put all code into an R package. Creating and managing R packages is very easy, and it’s easier to pass on your code because everything, including help, is in on package. To create an R package, follow the tutorial . Remember to create a good documentation using Roxygen.
For running sensitivity analyses or calibrations, runtime is often an issue. Before you parallelize, make sure your model is as fast as possible.
A possibility to speed up the run time of your model is to run it on multiple cores (CPU’s). To do so, you have two choices:
Which of the two makes more sense depends a lot on your problem. To parallelize the model itself will be interesting in particular for very large models, which could otherwise not be calibrated with MCMCs. However, this approach will typically require to write parallel C/C++ code and require advanced programming skills, which is the reason why we will not further discuss it here.
The usual advice in parallel computing is anyway to parallelize the outer loops first, to minimize communication overhead, which would suggest to start with parallelizing model evaluations. This is also much easier to program. Even within this, there are two levels of parallelization possible:
Currently, BT only supports parallelization within MCMCs / SMCs, but it easy to also implement between sampler parallelization by hand. Both approaches are describe below.
Within-sampler parallelization is particular useful for algorithms that can use a large number of cores in parallel, e.g. sensitivity analyses or SMC sampling. For the MCMCs, it depends on the settings and the algorithm how much parallelization they can make use of. In general, MCMCs are Markovian, as the name says, i.e. the set up a chain of sequential model evaluations, and those calls can therefore not be fully parallelized. However, a number of MCMCs in the BT package uses MCMC algorithms that can be partly parallelized, in particular the population MCMC algorithms DE/DEzs/DREAM/DREAMzs. For all these cases, BT will automatically use parallelization of the BT setup indicates that this is implemented.
How to do this? A first requirement to do so is to to have your model wrapped into an R function (see PREVIOUS SECTION). If that is the case, R offers a number of options to run functions in parallel. The easiest is to use the parallel package that comes with the R core. For other packages, see the internet and the CRAN task view on High Performance Computing
As an example, assume we have the following, very simple model:
<-function(x){
mymodel<-0.2*x+0.1^x
outputreturn(output)
}
To start a parallel computation, we will first need to create a cluster object. Here we will initiate a cluster with 2 CPU’s.
library(parallel)
<- makeCluster(2)
cl
<- function(parList){
runParallelparSapply(cl, parList, mymodel)
}
runParallel(c(1,2))
You could use this principle to build your own parallelized likelihood. However, something very similar to the previous loop is automatized in BayesianTools. You can directly create a parallel model evaluation function with the function generateParallelExecuter, or alternatively directly in the createBayesianSetup
library(BayesianTools)
<- generateParallelExecuter(mymodel) parModel
If your model is tread-safe, you should be able to run this out of the box. I therefore recommend using the hand-coded paraellelization only of the model is not thread-safe.
Additionally to the within-chain parallelization, you can also run several MCMCs in parallel, and combine them later to a single McmcSamplerList
library(BayesianTools)
<- generateTestDensityMultiNormal(sigma = "no correlation")
ll <- createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3))
bayesianSetup
= list(iterations = 200)
settings
# run the several MCMCs chains either in seperate R sessions, or via R parallel packages
<- runMCMC(bayesianSetup = bayesianSetup, sampler = "DEzs", settings = settings) out1
##
Running DEzs-MCMC, chain 1 iteration 201 of 201 . Current logp -16.84623 -12.12165 -13.12708 . Please wait!
## runMCMC terminated after 0.0760000000000005seconds
<- runMCMC(bayesianSetup = bayesianSetup, sampler = "DEzs", settings = settings) out2
##
Running DEzs-MCMC, chain 1 iteration 201 of 201 . Current logp -12.16394 -12.36114 -13.63612 . Please wait!
## runMCMC terminated after 0.0649999999999977seconds
<- createMcmcSamplerList(list(out1, out2))
res plot(res)
Thread safety quite generally means that you can execute multiple instances of your code on your hardware. There are various things that can limit Thread safety, for example