As part of a reproducible workflow, caching of function calls is a critical component. Down the road, it is likely that an entire workflow from raw data to publication, decision support, report writing, presentation building, etc., could be built and be reproducible anywhere, on demand. The reproducible::Cache function is built to work with any R function. However, it becomes very powerful in a SpaDES context because we can build large, powerful applications that are transparent and tied to raw data that may be many conceptual steps upstream in the workflow. To do this, we have built several customizations within the SpaDES package. An important one is dealing correctly with the simList, an object that has a slot that is an environment. More important still are the tools that apply caching at higher levels, i.e., not just to “standard” functions.

1 Caching as part of `SpaDES`

Some of the details of the simList-specific features of this Cache function include:

The function converts all elements that have an environment as part of their attributes into a format that has no unique environment attribute, using format if a function, and as.list in the case of the simList environment.
When used within SpaDES modules, Cache (capital C) does not require that the argument cachePath be specified. If called from inside a SpaDES module, Cache will use the value returned by cachePath(sim), taking the sim from the call stack. Otherwise, if no cachePath argument is specified, it will use getOption("spades.cachePath"), which is, by default, a temporary location with no persistence between R sessions! To persist between sessions, use SpaDES::setPaths() every session.
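The first point can be illustrated without SpaDES. The following is a minimal base-R sketch, not the package's implementation: `comparableForm` is a hypothetical helper, and `deparse()` is used here as a stand-in for the text conversion of functions. It shows why an environment has to be converted before two objects with equal contents can be recognised as the same.

```r
# Two environments holding the same values are still distinct objects
e1 <- new.env()
e2 <- new.env()
assign("landscape", 1:10, envir = e1)
assign("landscape", 1:10, envir = e2)
assign("burnStats", c(a = 0.5), envir = e1)
assign("burnStats", c(a = 0.5), envir = e2)

envsIdentical <- identical(e1, e2)  # environments are compared by reference

# Hypothetical helper: drop the environment identity before comparing
comparableForm <- function(x) {
  if (is.environment(x)) {
    # sorted = TRUE gives a stable element order
    as.list(x, all.names = TRUE, sorted = TRUE)
  } else if (is.function(x)) {
    # source text only; the enclosing environment is dropped
    deparse(x)
  } else {
    x
  }
}

contentsIdentical <- identical(comparableForm(e1), comparableForm(e2))

# Closures created by separate calls have different enclosing environments
makeFun <- function() function(x) x + 1
f1 <- makeFun()
f2 <- makeFun()
funsIdentical <- identical(f1, f2)
funTextIdentical <- identical(comparableForm(f1), comparableForm(f2))
```

Here the raw environments and closures do not compare as identical, whereas their converted forms do, which is the property a cache key needs.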

In a SpaDES context, there are several levels of caching that can be used as part of a reproducible workflow. Each level can be used to a modeller’s advantage, and all can be, and often are, used concurrently.

1.1 At the `spades` level

An entire call to spades can be cached. This will have the effect of eliminating any stochasticity in the model, as the output will simply be the cached version of the simList. This is likely most useful in situations where reproducibility is more important than “new” stochasticity (e.g., building decision support systems, apps, final version of a manuscript).
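Why caching removes stochasticity can be seen in a small base-R sketch. It is an illustration only: `simpleCache`, `cacheStore`, and `toyModel` are hypothetical names, the store lives in memory rather than on disk, and the key is plain deparsed text rather than a hash.

```r
# Hypothetical in-memory store; reproducible::Cache persists results instead
cacheStore <- new.env()

simpleCache <- function(FUN, ...) {
  # Key built from the function's source text and the argument values
  key <- paste(c(deparse(FUN), deparse(list(...))), collapse = "\n")
  if (!exists(key, envir = cacheStore, inherits = FALSE)) {
    # Cache miss: run the function and remember its result
    assign(key, FUN(...), envir = cacheStore)
  }
  get(key, envir = cacheStore, inherits = FALSE)
}

# A stand-in for a stochastic simulation
toyModel <- function(n) runif(n)

run1 <- simpleCache(toyModel, n = 3)  # computed
run2 <- simpleCache(toyModel, n = 3)  # retrieved, no new random draw
run3 <- simpleCache(toyModel, n = 4)  # different argument, computed afresh

cachedRunsMatch <- identical(run1, run2)
uncachedRunsMatch <- identical(toyModel(3), toyModel(3))
```

The second call with the same arguments returns the stored values, so the random draw is not repeated; calling `toyModel` directly twice gives different values.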

library(terra)
library(reproducible)
library(SpaDES.core)

mySim <- simInit(
  times = list(start = 0.0, end = 3.0),
  params = list(
    .globals = list(stackName = "landscape", burnStats = "testStats"),
    randomLandscapes = list(.plotInitialTime = NA),
    fireSpread = list(.plotInitialTime = NA)
  ),
  modules = list("randomLandscapes", "fireSpread"),
  paths = list(modulePath = getSampleModules(tempdir()))
)

This functionality can be achieved within a spades call.

# compare caching ... run once to create cache
system.time({
  outSim <- spades(Copy(mySim), cache = TRUE, notOlderThan = Sys.time())
})

## May30 13:32:03 simInit Using setDTthreads(1). To change: 'options(spades.DTthreads = X)'.

## May30 13:32:03 chckpn:init total elpsd: 0.067 secs | 0 checkpoint init 0

## May30 13:32:03 save  :init total elpsd: 0.068 secs | 0 save init 0

## May30 13:32:03 prgrss:init total elpsd: 0.07 secs | 0 progress init 0

## May30 13:32:03 load  :init total elpsd: 0.072 secs | 0 load init 0

## May30 13:32:03 rndmLn:init total elpsd: 0.074 secs | 0 randomLandscapes init

## May30 13:32:03 rndmLn:init New objects created:

## May30 13:32:03 rndmLn:init        <char>
## May30 13:32:03 rndmLn:init 1:  landscape

## May30 13:32:03 frSprd:init total elpsd: 0.18 secs | 0 fireSpread init 1

## May30 13:32:03 frSprd:init fireSpread

## May30 13:32:03 frSprd:init New objects created:

## May30 13:32:03 frSprd:init        <char>
## May30 13:32:03 frSprd:init 1:  testStats

## May30 13:32:03 frSprd:burn total elpsd: 0.2 secs | 1 fireSpread burn 5

## May30 13:32:03 frSprd:stats total elpsd: 0.23 secs | 1 fireSpread stats 5

## May30 13:32:03 frSprd:stats fireSpread

## May30 13:32:03 frSprd:burn total elpsd: 0.23 secs | 2 fireSpread burn 5

## May30 13:32:03 frSprd:stats total elpsd: 0.26 secs | 2 fireSpread stats 5

## May30 13:32:03 frSprd:stats fireSpread

## May30 13:32:03 frSprd:burn total elpsd: 0.27 secs | 3 fireSpread burn 5

## May30 13:32:03 frSprd:stats total elpsd: 0.3 secs | 3 fireSpread stats 5

## May30 13:32:03 frSprd:stats fireSpread

## simList saved in
## SpaDES.core:::savedSimEnv()$.sim
## It will be deleted at next spades() call.

## Saving large object (fn: spades, cacheId: ac0a5289e5ad6e29) to
##   Cache: 92.8 Mb

##  Done!

## Saved! Cache file: ac0a5289e5ad6e29.rds; fn: spades

##    user  system elapsed 
##   3.853   0.108   3.967

Note that any visualizations (turned off here with .plotInitialTime = NA above) would be drawn only on the first run, not when the result is retrieved from the cache.

# faster 2nd time
system.time({
  outSimCached <- spades(Copy(mySim), cache = TRUE)
})

## Object to retrieve (fn: spades, ac0a5289e5ad6e29.rds) ...

## Loaded! Cached result from previous spades call

## from  module

##    user  system elapsed 
##   1.240   0.000   1.243

all.equal(outSim, outSimCached)

##  [1] "Names: 3 string mismatches"                                       
##  [2] "Length mismatch: comparison on first 4 components"                
##  [3] "Component 2: Modes: numeric, NULL"                                
##  [4] "Component 2: Lengths: 4, 0"                                       
##  [5] "Component 2: target is numeric, current is NULL"                  
##  [6] "Component 3: target is NULL, current is PackedSpatRaster"         
##  [7] "Component 4: Modes: S4, numeric"                                  
##  [8] "Component 4: Lengths: 1, 3"                                       
##  [9] "Component 4: Attributes: < Modes: list, NULL >"                   
## [10] "Component 4: Attributes: < Lengths: 5, 0 >"                       
## [11] "Component 4: Attributes: < names for target but not for current >"
## [12] "Component 4: Attributes: < current is not list-like >"

1.2 Module-level caching

If the parameter .useCache in the module’s metadata is set to TRUE, then every event in the module will be cached. That means that every time that module is called from within a spades() call, Cache will be called. Only the objects inside the simList that correspond to the inputObjects or the outputObjects from the module metadata will be assessed for caching. For general use, module-level caching would be mostly useful for modules that have no stochasticity, such as data-preparation modules, GIS modules etc.
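The restriction to declared objects can be sketched in base R. This is an illustration under stated assumptions, not SpaDES code: `simObjects`, `declaredInputs`, and `inputKey` are hypothetical names, and the key is deparsed text rather than a real hash.

```r
# Stand-in for the objects held in a simList
simObjects <- list(
  landscape = matrix(1:4, nrow = 2),
  burnStats = c(10, 12),
  scratch   = "not declared in the module metadata"
)

# Stand-in for the module's declared inputObjects
declaredInputs <- c("landscape")

# Hypothetical key: built only from the declared objects, in a fixed order
inputKey <- function(objects, declared) {
  used <- objects[sort(intersect(names(objects), declared))]
  paste(deparse(used), collapse = "\n")
}

keyBefore <- inputKey(simObjects, declaredInputs)

# Changing an undeclared object leaves the key unchanged
simObjects$scratch <- "changed"
keyUndeclaredChanged <- inputKey(simObjects, declaredInputs)

# Changing a declared object produces a different key
simObjects$landscape[1, 1] <- 99L
keyDeclaredChanged <- inputKey(simObjects, declaredInputs)
```

Under this scheme, edits to unrelated objects would not invalidate a module's cached result, while edits to a declared input would.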

In the following example, we use the cache on the randomLandscapes module. This means that each subsequent call to spades will result in identical outputs from the randomLandscapes module (only!). This would be useful when only one random landscape is needed, either for trying something out or for production code (e.g., publication, decision support, etc.).

# Module-level
params(mySim)$randomLandscapes$.useCache <- TRUE
system.time({
  randomSim <- spades(Copy(mySim), .plotInitialTime = NA,
                      notOlderThan = Sys.time(), debug = TRUE)
})

## May30 13:32:08 simInit Using setDTthreads(1). To change: 'options(spades.DTthreads = X)'.

## May30 13:32:08 chckpn:init eventTime moduleName eventType eventPriority

## May30 13:32:08 chckpn:init 0         checkpoint init      0

## May30 13:32:08 save  :init 0         save       init      0

## May30 13:32:08 prgrss:init 0         progress   init      0

## May30 13:32:08 load  :init 0         load       init      0

## May30 13:32:08 rndmLn:init 0         randomLandscapes init      1

## May30 13:32:10 rndmLn:init Saving large object (fn: doEvent.randomLandscapes, cacheId:
## May30 13:32:10 rndmLn:init   7a1e1fac75a91549) to Cache: 92.6 Mb

##  Done!
##

## May30 13:32:12 rndmLn:init Saved! Cache file: 7a1e1fac75a91549.rds; fn: doEvent.randomLandscapes

## May30 13:32:12 rndmLn:init New objects created:

## May30 13:32:12 rndmLn:init        <char>
## May30 13:32:12 rndmLn:init 1:  landscape

## May30 13:32:12 frSprd:init 0         fireSpread       init      1

## May30 13:32:12 frSprd:init fireSpread

## May30 13:32:12 frSprd:init New objects created:

## May30 13:32:12 frSprd:init        <char>
## May30 13:32:12 frSprd:init 1:  testStats

## May30 13:32:12 frSprd:burn 1         fireSpread       burn      5

## May30 13:32:12 frSprd:stats 1         fireSpread       stats     5

## May30 13:32:12 frSprd:stats fireSpread

## May30 13:32:12 frSprd:burn 2         fireSpread       burn      5

## May30 13:32:12 frSprd:stats 2         fireSpread       stats     5

## May30 13:32:12 frSprd:stats fireSpread

## May30 13:32:12 frSprd:burn 3         fireSpread       burn      5

## May30 13:32:12 frSprd:stats 3         fireSpread       stats     5

## May30 13:32:12 frSprd:stats fireSpread

## simList saved in
## SpaDES.core:::savedSimEnv()$.sim
## It will be deleted at next spades() call.

##    user  system elapsed 
##   3.402   0.068   3.475

# faster the second time
system.time({
  randomSimCached <- spades(Copy(mySim), .plotInitialTime = NA, debug = TRUE)
})

## May30 13:32:12 simInit Using setDTthreads(1). To change: 'options(spades.DTthreads = X)'.

## May30 13:32:12 chckpn:init eventTime moduleName eventType eventPriority

## May30 13:32:12 chckpn:init 0         checkpoint init      0

## May30 13:32:12 save  :init 0         save       init      0

## May30 13:32:12 prgrss:init 0         progress   init      0

## May30 13:32:12 load  :init 0         load       init      0

## May30 13:32:12 rndmLn:init 0         randomLandscapes init      1

## May30 13:32:12 rndmLn:init Object to retrieve (fn: doEvent.randomLandscapes,
## May30 13:32:12 rndmLn:init   7a1e1fac75a91549.rds) ...

## May30 13:32:13 rndmLn:init Loaded! Cached result from previous doEvent.randomLandscapes call

## May30 13:32:13 rndmLn:init for init event in randomLandscapes module

## May30 13:32:13 rndmLn:init randomLandscapes

## May30 13:32:13 rndmLn:init New objects created:

## May30 13:32:13 rndmLn:init        <char>
## May30 13:32:13 rndmLn:init 1:  landscape

## May30 13:32:13 frSprd:init 0         fireSpread       init      1

## May30 13:32:13 frSprd:init fireSpread

## May30 13:32:13 frSprd:init New objects created:

## May30 13:32:13 frSprd:init        <char>
## May30 13:32:13 frSprd:init 1:  testStats

## May30 13:32:13 frSprd:burn 1         fireSpread       burn      5

## May30 13:32:13 frSprd:stats 1         fireSpread       stats     5

## May30 13:32:13 frSprd:stats fireSpread

## May30 13:32:13 frSprd:burn 2         fireSpread       burn      5

## May30 13:32:13 frSprd:stats 2         fireSpread       stats     5

## May30 13:32:13 frSprd:stats fireSpread

## May30 13:32:13 frSprd:burn 3         fireSpread       burn      5

## May30 13:32:13 frSprd:stats 3         fireSpread       stats     5

## May30 13:32:13 frSprd:stats fireSpread

## simList saved in
## SpaDES.core:::savedSimEnv()$.sim
## It will be deleted at next spades() call.

##    user  system elapsed 
##   1.155   0.008   1.166

Test that the layers produced by randomLandscapes are identical between the two runs, while the layer produced by fireSpread is not.

layers <- list("DEM", "forestAge", "habitatQuality", "percentPine", "Fires")
same <- lapply(layers, function(l) {
  identical(randomSim$landscape[[l]], randomSimCached$landscape[[l]])
})
names(same) <- layers
print(same) # Fires differs because the fireSpread events are not cached here

## $DEM
## [1] TRUE
## 
## $forestAge
## [1] TRUE
## 
## $habitatQuality
## [1] TRUE
## 
## $percentPine
## [1] TRUE
## 
## $Fires
## [1] FALSE

1.3 Event-level caching

If the parameter .useCache in the module’s metadata is set to a character or character vector, then that or those event(s), identified by their name, will be cached. That means that every time the event is called from within a spades call, Cache will be called. Only the objects inside the simList that correspond to the inputObjects or the outputObjects as defined in the module metadata will be assessed for caching inputs or outputs, respectively. The fact that all and only the named inputObjects and outputObjects are cached and returned may be inefficient (i.e., it may cache more objects than are necessary) for individual events.
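The two forms of .useCache (TRUE for a whole module, a character vector for named events) can be summarised in a short base-R sketch. `isEventCached` is a hypothetical helper written for illustration; it is not a function from SpaDES.core and only mirrors the rule described in the text.

```r
# Hypothetical helper mirroring the two forms of .useCache described in the text
isEventCached <- function(useCache, eventType) {
  if (is.logical(useCache)) {
    # TRUE caches every event of the module; FALSE caches none
    isTRUE(useCache)
  } else if (is.character(useCache)) {
    # A character vector caches only the named events
    eventType %in% useCache
  } else {
    FALSE
  }
}

events <- c("init", "burn", "stats")

# Module-level: every event is cached
moduleLevel <- sapply(events, isEventCached, useCache = TRUE)

# Event-level: only the "init" event is cached
eventLevel <- sapply(events, isEventCached, useCache = "init")
```

With `useCache = TRUE` all three events are flagged for caching; with `useCache = "init"` only the first is.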

Similar to module-level caching, event-level caching would be mostly useful for events that have no stochasticity, such as data-preparation events, GIS events etc. Here, we don’t change the module-level caching for randomLandscapes, but we add to it a cache for only the “init” event for fireSpread.

params(mySim)$fireSpread$.useCache <- "init"
system.time({
  randomSim <- spades(Copy(mySim), .plotInitialTime = NA,
                      notOlderThan = Sys.time(), debug = TRUE)
})

## May30 13:32:14 simInit Using setDTthreads(1). To change: 'options(spades.DTthreads = X)'.

## May30 13:32:14 chckpn:init eventTime moduleName eventType eventPriority

## May30 13:32:14 chckpn:init 0         checkpoint init      0

## May30 13:32:14 save  :init 0         save       init      0

## May30 13:32:14 prgrss:init 0         progress   init      0

## May30 13:32:14 load  :init 0         load       init      0

## May30 13:32:14 rndmLn:init 0         randomLandscapes init      1

## May30 13:32:15 rndmLn:init Saving large object (fn: doEvent.randomLandscapes, cacheId:
## May30 13:32:15 rndmLn:init   7a1e1fac75a91549) to Cache: 92.6 Mb

##  Done!
##

## May30 13:32:17 rndmLn:init Saved! Cache file: 7a1e1fac75a91549.rds; fn: doEvent.randomLandscapes

## May30 13:32:17 rndmLn:init New objects created:

## May30 13:32:17 rndmLn:init        <char>
## May30 13:32:17 rndmLn:init 1:  landscape

## May30 13:32:17 frSprd:init 0         fireSpread       init      1

## May30 13:32:19 frSprd:init Saving large object (fn: doEvent.fireSpread, cacheId:
## May30 13:32:19 frSprd:init   970325f6b148bd8a) to Cache: 92.7 Mb

##  Done!
##

## May30 13:32:20 frSprd:init Saved! Cache file: 970325f6b148bd8a.rds; fn: doEvent.fireSpread

## May30 13:32:20 frSprd:init New objects created:

## May30 13:32:20 frSprd:init        <char>
## May30 13:32:20 frSprd:init 1:  testStats

## May30 13:32:20 frSprd:burn 1         fireSpread       burn      5

## May30 13:32:20 frSprd:stats 1         fireSpread       stats     5

## May30 13:32:20 frSprd:stats fireSpread

## May30 13:32:20 frSprd:burn 2         fireSpread       burn      5

## May30 13:32:20 frSprd:stats 2         fireSpread       stats     5

## May30 13:32:20 frSprd:stats fireSpread

## May30 13:32:20 frSprd:burn 3         fireSpread       burn      5

## May30 13:32:21 frSprd:stats 3         fireSpread       stats     5

## May30 13:32:21 frSprd:stats fireSpread

## simList saved in
## SpaDES.core:::savedSimEnv()$.sim
## It will be deleted at next spades() call.

##    user  system elapsed 
##   6.797   0.036   6.842

# faster the second time
system.time({
  randomSimCached <- spades(Copy(mySim), .plotInitialTime = NA, debug = TRUE)
})

## May30 13:32:21 simInit Using setDTthreads(1). To change: 'options(spades.DTthreads = X)'.

## May30 13:32:21 chckpn:init eventTime moduleName eventType eventPriority

## May30 13:32:21 chckpn:init 0         checkpoint init      0

## May30 13:32:21 save  :init 0         save       init      0

## May30 13:32:21 prgrss:init 0         progress   init      0

## May30 13:32:21 load  :init 0         load       init      0

## May30 13:32:21 rndmLn:init 0         randomLandscapes init      1

## May30 13:32:21 rndmLn:init Object to retrieve (fn: doEvent.randomLandscapes,
## May30 13:32:21 rndmLn:init   7a1e1fac75a91549.rds) ...

## May30 13:32:22 rndmLn:init Loaded! Cached result from previous doEvent.randomLandscapes call

## May30 13:32:22 rndmLn:init for init event in randomLandscapes module

## May30 13:32:22 rndmLn:init randomLandscapes

## May30 13:32:22 rndmLn:init New objects created:

## May30 13:32:22 rndmLn:init        <char>
## May30 13:32:22 rndmLn:init 1:  landscape

## May30 13:32:22 frSprd:init 0         fireSpread       init      1

## May30 13:32:22 frSprd:init Object to retrieve (fn: doEvent.fireSpread, 970325f6b148bd8a.rds) ...

## May30 13:32:23 frSprd:init Loaded! Cached result from previous doEvent.fireSpread call

## May30 13:32:23 frSprd:init for init event in fireSpread module

## May30 13:32:23 frSprd:init fireSpread

## May30 13:32:23 frSprd:init New objects created:

## May30 13:32:23 frSprd:init        <char>
## May30 13:32:23 frSprd:init 1:  testStats

## May30 13:32:23 frSprd:burn 1         fireSpread       burn      5

## May30 13:32:23 frSprd:stats 1         fireSpread       stats     5

## May30 13:32:23 frSprd:stats fireSpread

## May30 13:32:23 frSprd:burn 2         fireSpread       burn      5

## May30 13:32:23 frSprd:stats 2         fireSpread       stats     5

## May30 13:32:23 frSprd:stats fireSpread

## May30 13:32:23 frSprd:burn 3         fireSpread       burn      5

## May30 13:32:23 frSprd:stats 3         fireSpread       stats     5

## May30 13:32:23 frSprd:stats fireSpread

## simList saved in
## SpaDES.core:::savedSimEnv()$.sim
## It will be deleted at next spades() call.

##    user  system elapsed 
##   2.071   0.000   2.077

1.4 Function-level caching

Any function can be cached using: Cache(FUN = functionName, ...).

This requires only a slight change to a function call, e.g., from projectRaster(raster, crs = crs(newRaster)) to Cache(projectRaster, raster, crs = crs(newRaster)).

ras <- terra::rast(terra::ext(0, 1e3, 0, 1e3), res = 1, vals = 1)
system.time({
  map <- Cache(SpaDES.tools::neutralLandscapeMap(ras),
               cachePath = cachePath(mySim),
               userTags = "neutralLandscapeMap",
               notOlderThan = Sys.time())
})

## Warning: In (SpaDES.tools::neutralLandscapeMap(ras))(): nlm_mpd changes the
## dimensions of the RasterLayer if even ncols/nrows are choosen.

## Saving large object (fn: SpaDES.tools::neutralLandscapeMap, cacheId:
##   94d035af43fc613d) to Cache: 16.9 Mb

##  Done!

## Saved! Cache file: 94d035af43fc613d.rds; fn:
##   SpaDES.tools::neutralLandscapeMap

##    user  system elapsed 
##   1.952   0.017   1.972

# faster the second time
system.time({
  mapCached <- Cache(SpaDES.tools::neutralLandscapeMap(ras),
                     cachePath = cachePath(mySim),
                     userTags = "neutralLandscapeMap")
})

## Object to retrieve (fn: SpaDES.tools::neutralLandscapeMap,
##   94d035af43fc613d.rds) ...

## Loaded! Cached result from previous
##   SpaDES.tools::neutralLandscapeMap call

##    user  system elapsed 
##   0.654   0.008   0.665

## NOTE: can't use all.equal on SpatRaster (they are pointers); use compareGeom()
all.equal(map[], mapCached[])

## [1] TRUE

1.5 Working with the Cache manually

Since the cache is simply a DBI database table, all DBI functions will work as is. In addition, there are several helpers in the reproducible package, including showCache, keepCache and clearCache, and the more advanced createCache, loadFromCache, rmFromCache, and saveToCache that may be useful. Also, one can access cached items manually (rather than simply rerunning the same Cache function again).

cacheDB <- showCache(mySim, userTags = "neutralLandscapeMap")

## Cache size:

##   Total (including Rasters): 4.2 Mb

##   Selected objects (not including Rasters): 4.2 Mb

## get the raster that was produced with neutralLandscapeMap()
map <- loadFromCache(cacheId = cacheDB$cacheId, cachePath = cachePath(mySim))

## Loaded! Cached result from previous  call

clearPlot()
Plot(map)

03 Caching `SpaDES` simulations

Eliot J. B. McIntire

May 30 2024
