rsatscan

Ken Kleinman

2024-06-24

SaTScan is a powerful stand-alone free software program that runs spatio-temporal scan statistics. It is carefully optimized and contains many tricks to reduce the computational burden, which is doubly intensive. The scanning itself is computer intensive, particularly in spatio-temporal settings, and the Monte Carlo hypothesis testing involves resampling and redoing the scanning for hundreds or thousands of random data sets.

There are two ways to run SaTScan. The easiest way to choose between the many data, analysis, parameter and output options is to use the graphical user interface (GUI). The GUI allows complete control, but precludes automated or repeated operation of multiple analyses over time. While more cumbersome, the SaTScan parameter file makes it possible to run SaTScan in batch mode, bypassing the GUI. It is not trivial to integrate it with various data sets and other analyses though. The rsatscan package contains a set of functions and defines a class and methods to make it easy to work with SaTScan from R. This allows easy automation and integration with data sets and analyses.

Before running rsatscan, it is recommended to first explore the SaTScan GUI to familiarize oneself with the various data, analysis, parameter options. Also, it is often a good idea to create a template parameter file using the GUI, and then list those parameter settings using the ss.options()in rsatscan.

The rsatscan functions can be grouped into three sets: SaTScan parameter functions that set parameters for SaTScan or write them in a file to the OS; write functions that write R data frames to the OS in SaTScan-readable formats; and the satscan() function, which calls out into the OS, runs SaTScan, and returns a satscan class object. Successful use of the package requires a fairly precise understanding of the SaTScan parameter file, for which users are referred to the SaTScan User Guide.

library("rsatscan")
## rsatscan only does anything useful if you have SaTScan
## See http://www.satscan.org/ for free access

Basic usage of the package will:

  1. use the ss.options() function to set SaTScan parameters; these are saved in R
  2. use the write.ss.prm() function to write the SaTScan parameter file
  3. use the satscan() function to run SaTScan
  4. use summary() on the satscan object and proceed to analyze the results from SaTScan in R.

Space-Time Permutation example: NYC fever data

The New York City fever data, which are distributed with SaTScan, are also included with the package.

head(NYCfevercas)
##     zip cases       date
## 1 11229     1 2001/11/22
## 2 11208     1 2001/11/13
## 3 11208     1 2001/11/24
## 4 11212     1  2001/11/3
## 5 11374     1 2001/11/10
## 6 10452     1 2001/11/20
head(NYCfevergeo)
##     zip      lat      long
## 1 10001 40.75037 -73.99674
## 2 10002 40.72199 -73.99000
## 3 10003 40.73097 -73.98841
## 4 10004 40.68834 -74.02002
## 5 10005 40.70550 -74.00816
## 6 10006 40.70754 -74.01292

For good style, an analysis would begin by resetting the parameter file:

invisible(ss.options(reset=TRUE))

Parameters for a specific SaTScan version (>= 9.2) can be set using the ‘version’ argument:

invisible(ss.options(reset=TRUE, version="10.2"))

If a version is not specified, the parameters will be set to the latest version available in ‘rsatscan’. More information on ss.options is available in the documentation page, accessible using ?ss.options.

Then, one would change parameters as desired. This can be done in as many or few steps as you like; the previous state of the parameter set is retained, as in par() or options(). Here, the parameters used in the example from the SaTScan manual are replicated:

ss.options(list(CaseFile="NYCfever.cas", PrecisionCaseTimes=3))
ss.options(c("StartDate=2001/11/1","EndDate=2001/11/24"))
ss.options(list(CoordinatesFile="NYCfever.geo", AnalysisType=4, ModelType=2, TimeAggregationUnits=3))
ss.options(list(UseDistanceFromCenterOption="y", MaxSpatialSizeInDistanceFromCenter=3, NonCompactnessPenalty=0))
ss.options(list(MaxTemporalSizeInterpretation=1, MaxTemporalSize=7))
ss.options(list(ProspectiveStartDate="2001/11/24", ReportGiniClusters="n", LogRunToHistoryFile="n"))
ss.options(list(SaveSimLLRsDBase="y"))

Note that the second call to ss.options() uses the character vector format, while the others use the list format; either works.

It might be reasonable at this point to check what the parameter file looks like:

head(ss.options(),3)
## [1] "[Input]"               ";case data filename"   "CaseFile=NYCfever.cas"

Then, we write the parameter file, the case file, and the geometry file to some writeable location in the OS, using the functions in package. These ensure that SaTScan-readable formats are used.

td = tempdir()
write.ss.prm(td, "NYCfever")
write.cas(NYCfevercas, td, "NYCfever")
write.geo(NYCfevergeo, td, "NYCfever")

The write.??? functions append the appropriate file extensions to the files they save into the OS.

Then we’re ready to run SaTScan. The location and name of the SaTScan executable may well differ on you r disk, particularly if you do not use Windows. In a later release of the package, it may be possible to detect the location the executable

# This step omitted in compliance with CRAN policies
# Please install SaTScan and run the vignette with this and following code uncommented
# SaTScan can be downloaded from www.satscan.org, free of charge
# you will also find there fully compiled versions of this vignette with results

## NYCfever = satscan(td, "NYCfever", sslocation="C:/Program Files/SaTScan", ssbatchfilename="SaTScanBatch64")

The rsatscan package provides a summary method for satscan objects.

## summary(NYCfever)

The satscan object has a slot for each possible output file that SaTScan creates, and contains whatever output files your call happened to generate.

## summary.default(NYCfever)

If SaTScan generated a shapefile, satscan() reads it, by way of the sf function st_read(), if it’s available, into a class defined in the sf package. You can use the plot methods defined in the sf package to plot it, or use one of the many packages that builds on the sf package for further processing.

## plot(sf::st_geometry(NYCfever$shapeclust))

It might be interesting to examine the scan statistics from the Monte Carlo steps.

## hist(unlist(NYCfever$llr), main="Monte Carlo")

# Let's draw a line for the clusters in the observed data
## abline(v=NYCfever$col[,c("TEST_STAT")], col = "red")

This shows why none of the observed clusters had small p=values.

Poisson spatio-temporal example: New Mexico brain tumor data

This is another data set included with SaTScan. It differs from the NYC fever example in that denominators are available; these are provided in a population file. The analysis uses the Poisson model rather than the Spatio-temporal permutation.

write.cas(NMcas, td,"NM")
write.geo(NMgeo, td,"NM")
write.pop(NMpop, td,"NM")

Again, replicating the examples from the SaTScan user guide, we set up and then write the parameter file, then run SaTScan.

invisible(ss.options(reset=TRUE))
ss.options(list(CaseFile="NM.cas",StartDate="1973/1/1",EndDate="1991/12/31", 
                PopulationFile="NM.pop",
                CoordinatesFile="NM.geo", CoordinatesType=0, AnalysisType=3))
ss.options(c("NonCompactnessPenalty=0", "ReportGiniClusters=n", "LogRunToHistoryFile=n"))

write.ss.prm(td,"testnm")
## testnm = satscan(td,"testnm", sslocation="C:/Program Files/SaTScan", ssbatchfilename="SaTScanBatch64")

Note that the parameter file need not have the same name as the case and other input files, which also need not share a name, though it may be helpful in keeping things organized.

## summary(testnm)

One of the elements of a satscan class object is the parameter set which was used to call SaTScan. This may be useful, later.

## head(testnm$prm,10)

Bernoulli purely spatial example: North Humberside leukemia and lymphoma

A third data set included with SaTScan is also included with the package. This one has cases and controls, and uses the Bernoulli model. We replicate the parameters from the SaTScan manual again.

write.cas(NHumbersidecas, td, "NHumberside")
write.ctl(NHumbersidectl, td, "NHumberside")
write.geo(NHumbersidegeo, td, "NHumberside")

invisible(ss.options(reset=TRUE))
ss.options(list(CaseFile="NHumberside.cas", ControlFile="NHumberside.ctl"))
ss.options(list(PrecisionCaseTimes=0, StartDate="2001/11/1", EndDate="2001/11/24"))
ss.options(list(CoordinatesFile="NHumberside.geo", CoordinatesType=0, ModelType=1))
ss.options(list(TimeAggregationUnits = 3, NonCompactnessPenalty=0))
ss.options(list(ReportGiniClusters="n", LogRunToHistoryFile="n"))

write.ss.prm(td, "NHumberside")
## NHumberside = satscan(td, "NHumberside", sslocation="C:/Program Files/SaTScan", ssbatchfilename="SaTScanBatch64")

## summary(NHumberside)