ern
The package ern
has two functions with which to estimate
the daily effective reproduction number, \(\mathcal{R}_t\), each for a different data
stream:
estimate_R_ww()
, which uses the concentration of a
pathogen in wastewater over time as the input signal;estimate_R_cl()
, which uses the count of clinical
reported cases over time as the input signal.In both cases, the general method is the same:
Step 2 is common to both wastewater and clinical methods; where they differ is in step 1. We give details of both steps below, followed by full demos for both wastewater and clinical input data.
In both the clinical and wastewater cases, step 1 will produce an ensemble of realizations for the inferred daily incidence. We use these timeseries of realizations, along with a family of generation interval distributions specified by the user, to compute \(\mathcal{R}_t\) in an ensemble of realizations.
To compute a single realization for the \(\mathcal{R}_t\) ensemble, we draw one
realization out of the ensemble of inferred daily incidence as well as
one generation interval distribution out of the user-specified family,
and feed both of these components into
EpiEstim::estimate_R()
. Once all realizations of \(\mathcal{R}_t\) have been computed, the
ensemble is summarized by day with a mean and confidence interval
bounds.
Note that the estimation of \(R_t\),
once the daily incidence has been inferred, is outsourced to the R
library EpiEstim
. Put simply, ern
is a wrapper
around EpiEstim
.
How we infer daily incidence from the input depends on the input data source.
We convert the pathogen concentration in wastewater over time to a daily disease incidence by performing a deconvolution with a fecal shedding distribution, which describes the distribution of virus shed in feces by an infected individual during their disease course.
If the input clinical reports are not daily, ern
assumes
that they are aggregated over the time between report dates and infer
the daily count of cases using a Markov Chain Monte-Carlo
algorithm implemented in the R library rjags
.
Then, we convert the daily count of clinical reports over time to the actual daily incidence of infections in the following way:
The function estimate_R_ww()
estimates \(R_t\) from pathogen concentration measured
in wastewater. It takes several inputs and parameters, described in the next two
sections.
estimate_R_ww()
requires the following inputs from the
user:
ww.conc
: pathogen concentration in wastewater over
time, as a data frame with columns date
(measurement date)
and value
(concentration value)
Distribution families for several quantities:
dist.fec
: fecal shedding ratedist.gi
: generation intervalHere, we estimate \(\mathcal{R}_t\) from a subset of wastewater data from the Iona Island wastewater treatment plant in Vancouver.
# Loading sample SARS-CoV-2 wastewater data
ww.conc = ern::ww.data
head(ww.conc)
#> # A tibble: 6 × 2
#> date value
#> <date> <dbl>
#> 1 2023-07-23 11.4
#> 2 2023-07-27 38.3
#> 3 2023-07-30 59.4
#> 4 2023-08-03 36.1
#> 5 2023-08-06 24.9
#> 6 2023-08-10 12.0
# Define SARS-CoV-2 fecal shedding and generation interval distributions
dist.fec = ern::def_dist(
dist = "gamma",
mean = 12.90215,
mean_sd = 1.136829,
shape = 1.759937,
shape_sd = 0.2665988,
max = 33
)
dist.gi = ern::def_dist(
dist = "gamma",
mean = 6.84,
mean_sd = 0.7486,
shape = 2.39,
shape_sd = 0.3573,
max = 15
)
We can visualize the assumed distributions with
plot_dist()
:
plot_dist(dist.fec) + labs(title = paste0("Mean fecal shedding distribution (", dist.fec$dist, ")"))
plot_dist(dist.gi) + labs(title = paste0("Mean generation interval distribution (", dist.gi$dist, ")"))
plot_dist()
returns a ggplot
object, and so
it can be further annotated with the usual ggplot2
tools
(like labs()
as above).
Note that the above dist.x
lists define
families of distributions (there is uncertainty specified in
the mean distribution parameters), while plot_dist()
only
plots the mean distribution in this family.
estimate_R_ww()
also takes a number of parameter sets
that give the user control over various components of the \(\mathcal{R}_t\) estimation:
scaling.factor
: a factor used to scale pathogen
concentration in wastewater to prevalence (number of infectious cases in
the population at a given point in time)
prm.smooth
: smoothing settings for the input
wastewater data. Pathogen concentration measurements are inherently
noisy, hence smoothing the input concentration data usually leads to
smoother \(R_t\) estimates.
prm.R
: settings for the \(\mathcal{R}_t\) calculation
All of these parameters have defaults, but they can also be adjusted
by the user. These settings are further described in the example below,
but you may also want to consult the documentation of
estimate_R_ww()
for more details.
# Initializing scaling factor
scaling.factor = 1
# Initializing smoothing parameters
prm.smooth = list(
align = 'center', # smoothing alignment
method = 'loess', # smoothing method
span = 0.30, # smoothing span (used for loess smoothing only)
floor = 5 # minimum smoothed concentration value (optional, loess smoothing only)
)
# Initialzing Rt settings
prm.R = list(
iter = 20, # number of iterations in Rt ensemble
CI = 0.95, # confidence interval
window = 10, # Time window for Rt calculations
config.EpiEstim = NULL # optional EpiEstim configuration for Rt calculations
)
Once the above inputs and parameters are defined, we estimate \(\mathcal{R}_t\) as follows:
r.estim = ern::estimate_R_ww(
ww.conc = ww.conc,
dist.fec = dist.fec,
dist.gi = dist.gi,
scaling.factor = scaling.factor,
prm.smooth = prm.smooth,
prm.R = prm.R,
silent = TRUE # suppress output messages
)
estimate_R_ww()
returns a list with four elements:
ww.conc
: the original input of pathogen
concentration in wastewater over time
ww.smooth
: the smoothed wastewater concentration
over time; includes columns:
t
: internal time indexobs
: smoothed value of the observationdate
inc
: the daily incidence inferred over time;
includes columns:
date
mean
: mean of the inferred daily incidencelwr
, upr
: lower and upper bounds of a 95%
confidence interval of the inferred daily incidenceR
: the estimated daily reproduction number over
time; includes columns:
date
mean
: mean \(\mathcal{R}_t\) valuelwr
, upr
: lower and upper bounds of a
confidence interval for each \(\mathcal{R}_t\) estimateThe output of estimate_R_ww()
can be visualized readily
using plot_diagnostic_ww()
, which generates a figure with
the following panels:
estimate_R_cl()
takes several inputs and parameters,
described in the next two sections.
estimate_R_cl()
requires the following inputs from the
user:
cl.data
: clinical disease reports over time, as a
data frame with columns date
(report date) and
value
(count of reports)
Distribution families for several quantities:
dist.repdelay
: reporting delaydist.repfrac
: reporting fractiondist.incub
: incubation perioddist.gi
: generation intervalHere, we estimate \(\mathcal{R}_t\) for a sample of weekly clinical COVID-19 reports in the province of Quebec:
dat <- (ern::cl.data
|> dplyr::filter(
pt == "qc",
dplyr::between(date, as.Date("2021-07-01"), as.Date("2021-09-01"))
))
# define reporting delay
dist.repdelay = ern::def_dist(
dist = 'gamma',
mean = 5,
mean_sd = 1,
sd = 1,
sd_sd = 0.1,
max = 10
)
# define reporting fraction
dist.repfrac = ern::def_dist(
dist = "unif",
min = 0.1,
max = 0.3
)
# define incubation period
dist.incub = ern::def_dist(
dist = "gamma",
mean = 3.49,
mean_sd = 0.1477,
shape = 8.5,
shape_sd = 1.8945,
max = 8
)
# define generation interval
dist.gi = ern::def_dist(
dist = "gamma",
mean = 6.84,
mean_sd = 0.7486,
shape = 2.39,
shape_sd = 0.3573,
max = 15
)
We can visualize the assumed distributions with
plot_dist()
:
plot_dist(dist.repdelay) + labs(title = paste0("Mean reporting delay distribution (", dist.repdelay$dist, ")"))
plot_dist(dist.incub) + labs(title = paste0("Mean incubation period distribution (", dist.incub$dist, ")"))
plot_dist(dist.gi) + labs(title = paste0("Mean generation interval distribution (", dist.gi$dist, ")"))
plot_dist()
returns a ggplot
object, and so
it can be further annotated with the usual ggplot2
tools
(like labs()
as above).
Note that the above dist.x
lists define
families of distributions (there is uncertainty specified in
the mean distribution parameters), while plot_dist()
only
plots the mean distribution in this family.
estimate_R_cl()
also takes a number of parameter sets
that give the user control over various components of the \(\mathcal{R}_t\) estimation:
prm.daily
: options for aggregate to daily report
inference (only required if input reports are not already
daily)
prm.daily.check
: options for checking aggregates of
inferred daily reports against input values and truncating the start of
the timeseries until aggregates are sufficiently close to the input
values (only required if input reports are not already daily)
prm.smooth
: smoothing settings for the daily
reports
prm.R
: settings for the \(\mathcal{R}_t\) calculation
All of these parameters have defaults, but they can also be adjusted
by the user. These settings are further described in the example below,
but you may also want to consult the documentation of
estimate_R_cl()
for more details.
# settings for daily report inference
prm.daily = list(
method = "renewal",
popsize = 1e7, # population size
# Here, low value for `burn` and `iter`
# to have a fast compilation of the vignette.
# For real-world applications, both `burn` and `iter`
# should be significantly increased (e.g., 10,000).
# Also, the number of chains should be at least 3
# (instead of 1 here) for real-world applications.
burn = 100,
iter = 200,
chains = 1,
prior_R0_shape = 2,
prior_R0_rate = 0.6,
prior_alpha_shape = 1,
prior_alpha_rate = 1
)
# settings for checks of daily inferred reports
prm.daily.check = list(
agg.reldiff.tol = 200
)
# smoothing settings for daily inferred reports
prm.smooth = list(
method = "rollmean",
window = 3,
align = 'center'
)
# Rt settings
prm.R = list(
iter = 10, # number of iterations in Rt ensemble
CI = 0.95, # 95% confidence interval
window = 7, # time window for each Rt estimate
config.EpiEstim = NULL
)
Once the above inputs and parameters are defined, we estimate \(\mathcal{R}_t\) as follows:
r.estim = estimate_R_cl(
cl.data = dat,
dist.repdelay = dist.repdelay,
dist.repfrac = dist.repfrac,
dist.incub = dist.incub,
dist.gi = dist.gi,
prm.daily = prm.daily,
prm.daily.check = prm.daily.check,
prm.smooth = prm.smooth,
prm.R = prm.R,
silent = TRUE # suppress output messages
)
estimate_R_cl()
returns a list with four elements:
cl.data
: the original input of clinical disease
reports over time
cl.daily
: reports as input for Rt calculation
(inferred daily counts if original inputs were aggregates, smoothed if
specified); includes columns:
id
: identifier for each realization of the daily report
inferencedate
: daily datevalue
: inferred daily report countt
: internal time indexinferred.agg
: inferred daily reports re-aggregated
on the reporting schedule as input in cl.data
; includes
columns:
date
: report dateobs
: original (aggregated) observationsmean.agg
: mean of the aggregated inferred daily
reportslwr.agg
, upr.agg
: lower and upper bounds
of a 95% confidence interval of the aggregated inferred daily
reportsR
: the estimated daily reproduction number over
time; includes columns:
date
mean
: mean \(\mathcal{R}_t\) valuelwr
, upr
: lower and upper bounds of a
confidence interval for each \(\mathcal{R}_t\) estimateuse
: logical flag used internally for the plotting
method demonstrated belowThe output of estimate_R_cl()
can be visualized readily
using plot_diagnostic_cl()
, which generates a figure with
the following panels:
burn
and
iter
.