Welcome to the SkipTrack Package!
SkipTrack is a Bayesian hierarchical model for self-reported menstrual cycle length data on mobile health apps. The model is an extension of the hierarchical model presented in Li et al. (2022) that focuses on predicting an individual’s next menstrual cycle start date while accounting for cycle length inaccuracies introduced by non-adherence in user self-tracked data.
Li et al. (2022) note that apps designed to help users track their menstrual cycles “are subject to adherence artifacts that may obscure health-related conclusions: if a user forgets to track their period, their cycle length computations are inflated.” This is visualized in the diagram below, in which the numbers represent days counted from the first bleeding day recorded in the app, \(\color{red}{\text{red}}\) days are bleeding days recorded by the user, and \(\color{blue}{\text{blue}}\) days are bleeding days not recorded by the user.
\[\overbrace{\underbrace{\color{red}{1, 2, 3, 4}, 5, \dots, 29}_\text{True Cycle, 29 Days}}^\text{Recorded Cycle, 29 Days}, \overbrace{\underbrace{\color{red}{30, 31, 32, 33}, 34, \dots, 61}_\text{True Cycle, 32 Days}, \underbrace{\color{blue}{62, 63, 64, 65}, 66, \dots, 90}_\text{True Cycle, 29 Days}}^\text{Recorded Cycle, 61 Days}\]
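The arithmetic behind the diagram can be checked with a few lines of base R. This is only an illustrative sketch: the start days are read off the diagram, and the start of a following cycle on day 91 is an assumption added so that the last cycle has an end point.

```r
# Days on which each true cycle starts (from the diagram), plus an assumed
# start of the following cycle on day 91
true_starts <- c(1, 30, 62, 91)
true_lengths <- diff(true_starts)          # true cycle lengths in days

# The user does not record the bleed that starts on day 62
recorded_starts <- true_starts[true_starts != 62]
recorded_lengths <- diff(recorded_starts)  # cycle lengths the app computes

print(true_lengths)
print(recorded_lengths)
```

The second recorded length is the sum of two true cycles (32 + 29 days), which is exactly the inflation the model is designed to detect.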
The SkipTrack model extends the model given by Li et al. (2022) by specifying individual-level parameters for cycle length regularity as well as cycle length mean, and by weakening the assumptions Li et al. make about the probability of failing to track a cycle.
In short, the modeling framework assumed by SkipTrack is as follows. The observed cycle lengths are represented by \(y_{ij}\), where \(i\), with \(1 \leq i \leq n\), indexes an individual who has contributed \(n_i\) observations, and \(j\), with \(1 \leq j \leq n_i\), indexes those observations. We assume that
\[ y_{ij} \sim \text{LogNormal}\big(\mu_i + \log(c_{ij}), \tau_i\big), \] where \(\mu_i\) is an individual level mean parameter, \(\tau_i\) is an individual level precision parameter, and \(c_{ij}\) is an integer-valued parameter representing the number of true cycles present in the observed cycle \(y_{ij}\). That is, if \(c_{ij} = 1\) then \(y_{ij}\) is a true cycle, if \(c_{ij} = 2\) then \(y_{ij}\) gives the length of two true cycles added together, and so on.
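To see how \(c_{ij}\) separates true cycles from skipped ones, the following base R sketch compares the log-normal likelihood of a single recorded cycle under \(c = 1, 2, 3\). All numeric values, including the weights placed on each value of \(c\), are hypothetical choices for illustration, and this is not the sampler used by the package.

```r
# Hypothetical values for one individual (assumptions, not fitted estimates)
mu_i  <- log(30)  # individual-level mean on the log scale
tau_i <- 100      # individual-level precision, so sdlog = 1 / sqrt(tau_i) = 0.1
y     <- 61       # one recorded cycle length in days

c_vals  <- 1:3
weights <- c(0.75, 0.20, 0.05)  # illustrative prior weights for c = 1, 2, 3

# Likelihood of y under each candidate number of true cycles
lik <- dlnorm(y, meanlog = mu_i + log(c_vals), sdlog = 1 / sqrt(tau_i))

# Normalized weights: how plausible each value of c is for this observation
post <- weights * lik / sum(weights * lik)
round(post, 3)
```

With these values, almost all of the weight falls on \(c = 2\): a 61-day record is far more plausible as two cycles of roughly 30 days than as one very long cycle.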
We then assume
\[ \mu_i \sim \text{Normal}(\mu, \rho) \mspace{100mu}\tau_i \sim \text{Gamma}(\theta, \phi) \]
where \(\rho\) is a precision parameter, and the Gamma distribution above is parameterized by its mean \(\theta\) and rate \(\phi\).
This is a fully interpretable model that allows for the identification of skipping in cycle tracking, while allowing cycle regularity to differ between individuals and accounting for uncertainty in the model. A paper discussing the full model details will be published soon.
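The hierarchy described above can be sketched generatively in base R. The hyperparameter values and the distribution used to draw \(c_{ij}\) are assumptions chosen for illustration; the only parts taken from the text are the precision parameterization (standard deviation \(1/\sqrt{\text{precision}}\)) and the mean/rate Gamma parameterization (shape = mean × rate).

```r
set.seed(1)

n   <- 5            # number of individuals
n_i <- rep(6, n)    # observations per individual

# Hypothetical hyperparameters (assumptions for illustration only)
mu    <- log(30); rho <- 100   # mean and precision of the mu_i
theta <- 200;     phi <- 0.5   # mean and rate of the Gamma for the tau_i

mu_i  <- rnorm(n, mean = mu, sd = 1 / sqrt(rho))
tau_i <- rgamma(n, shape = theta * phi, rate = phi)  # Gamma mean = shape / rate

cluster <- rep(seq_len(n), times = n_i)

# Number of true cycles in each observed cycle (assumed categorical draw)
c_ij <- sample(1:3, size = length(cluster), replace = TRUE,
               prob = c(0.75, 0.20, 0.05))

y_ij <- rlnorm(length(cluster),
               meanlog = mu_i[cluster] + log(c_ij),
               sdlog   = 1 / sqrt(tau_i[cluster]))

head(data.frame(cluster, c_ij, y_ij = round(y_ij, 1)))
```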
The SkipTrack package provides functions for fitting the SkipTrack model, evaluating model run diagnostics, retrieving and visualizing model results, and simulating related data. We begin our tutorial by examining some simulated data.
First, we simulate data on 100 individuals from the SkipTrack model where each observed \(y_{ij}\) value has a 75% probability of being a true cycle, a 20% probability of being two true cycles recorded as one, and a 5% probability of being three true cycles recorded as one.
#Simulate data
dat <- skipTrack.simulate(n = 100, model = 'skipTrack', skipProb = c(.75, .2, .05))
names(dat)
#> [1] "Y" "cluster" "X" "Z" "Beta"
#> [6] "Gamma" "NumTrue" "Underlying"
The result of the simulation function is simply a named list with various components. The (currently) important components are:
Y: the \(y_{ij}\) values, observed outcomes
cluster: the \(i\) values, individual markers
NumTrue: the \(c_{ij}\) values, number of true cycles in an observed cycle
Underlying: underlying parameters pertaining to the specific model used for data simulation
Looking at the histogram of dat$Y, we can see a clear mixture of at least two distributions, one centered around 30 days, and another centered near 60 days (corresponding to the true cycles and observed cycles containing two true cycles, respectively), which is what we expect based on our generation.
Fitting the SkipTrack model using this simulated data requires a call to the function skipTrack.fit. Note that because this is a Bayesian model and is fit with an MCMC algorithm, it can take some time with large datasets and a high number of MCMC reps and chains.
In this code we ask for 4 chains, each with 1000 iterations, run sequentially. Note that we recommend allowing the sampler to run longer than this (usually at least 5000 iterations per chain), but we use a short run here to save time.
If useParallel = TRUE, the MCMC chains will be evaluated in parallel, which helps with longer runs.
Once we have the model results we are able to examine model diagnostics, visualize results from the model, and view a model summary.
Multivariate, multichain MCMC diagnostics, including traceplots, Gelman-Rubin diagnostics, and effective sample size, are all available for various parameters from the model fit. These are supplied by the genMCMCDiag package; see that package’s documentation for details.
Here we show the output of the diagnostics on the \(c_{ij}\) parameters, which suggests that (at least for the \(c_{ij}\) values) the algorithm is mixing effectively (or will be, once the algorithm runs a little longer).
#> ----------------------------------------------------
#> Generalized MCMC Diagnostics using lanfear Method
#> ----------------------------------------------------
#>
#> |Effective Sample Size:
#> |---------------------------
#> | Chain 1| Chain 2| Chain 3| Chain 4| Sum|
#> |-------:|-------:|-------:|-------:|-------:|
#> | 86.077| 81.6| 91.054| 114.178| 372.909|
#>
#> |Gelman-Rubin Diagnostic:
#> |---------------------------
#> | Point est.| Upper C.I.|
#> |----------:|----------:|
#> | 1.001| 1.005|
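For intuition about what a Gelman-Rubin value near 1 means, here is a minimal base R sketch of the classic univariate potential scale reduction factor, applied to hypothetical chains. It is not the genMCMCDiag implementation, and the package's own computation may differ in its details.

```r
set.seed(42)

# Four hypothetical chains of 1000 draws each, all sampling the same target
chains <- replicate(4, rnorm(1000))   # matrix: one column per chain

gelman_rubin <- function(draws) {
  n <- nrow(draws)                    # draws per chain
  W <- mean(apply(draws, 2, var))     # average within-chain variance
  B <- n * var(colMeans(draws))       # between-chain variance
  var_hat <- (n - 1) / n * W + B / n  # pooled estimate of the target variance
  sqrt(var_hat / W)
}

gelman_rubin(chains)                  # close to 1 when chains agree

# Shift one chain so that it explores a different region
shifted <- chains
shifted[, 1] <- shifted[, 1] + 3
gelman_rubin(shifted)                 # clearly above 1 when chains disagree
```

Values close to 1, like the point estimate and upper limit reported above, indicate that the between-chain variability is small relative to the within-chain variability.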
In order to see some important plots for the SkipTrack model fit, you can simply use plot(ft), and the plots are directly accessible using skipTrack.visualize(ft).
A summary is available for the SkipTrack model fit with summary(ft), with more detailed results accessible through skipTrack.results(ft). Importantly, these results are based on a default chain burn-in value of 750 draws. This can be changed using the parameter burnIn for either function.
summary(ft)
#> ----------------------------------------------------
#> Summary of skipTrack.fit using skipTrack model
#> ----------------------------------------------------
#> Mean Coefficients:
#>
#> Estimate 95% CI Lower 95% CI Upper
#> (Intercept) 3.406 3.376 3.436
#>
#> ----------------------------------------------------
#> Precision Coefficients:
#>
#> Estimate 95% CI Lower 95% CI Upper
#> (Intercept) 5.36 5.134 5.593
#>
#> ----------------------------------------------------
#> Diagnostics:
#>
#> Effective Sample Size Gelman-Rubin
#> Betas 4004.0 1
#> Gammas 21.8 1
#> cijs 351.1 1
#>
#> ----------------------------------------------------
summary(ft, burnIn = 500)
#> ----------------------------------------------------
#> Summary of skipTrack.fit using skipTrack model
#> ----------------------------------------------------
#> Mean Coefficients:
#>
#> Estimate 95% CI Lower 95% CI Upper
#> (Intercept) 3.407 3.378 3.437
#>
#> ----------------------------------------------------
#> Precision Coefficients:
#>
#> Estimate 95% CI Lower 95% CI Upper
#> (Intercept) 5.342 5.125 5.569
#>
#> ----------------------------------------------------
#> Diagnostics:
#>
#> Effective Sample Size Gelman-Rubin
#> Betas 4004.00 1
#> Gammas 21.77 1
#> cijs 460.23 1
#>
#> ----------------------------------------------------
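The effect of the burn-in setting can be illustrated with a small base R sketch on a hypothetical chain. The helper drop_burn_in and the simulated chain are assumptions for illustration and are not part of the package; the sketch only shows why early draws are discarded before summarizing.

```r
set.seed(7)

# Hypothetical chain: 1000 draws that start far from the target value
# log(30) and settle down as the sampler converges
reps  <- 1000
chain <- log(30) + 2 * exp(-seq_len(reps) / 50) + rnorm(reps, sd = 0.02)

# Keep only the draws after the first burnIn iterations
drop_burn_in <- function(draws, burnIn) {
  tail(draws, length(draws) - burnIn)
}

kept_750 <- drop_burn_in(chain, 750)
kept_500 <- drop_burn_in(chain, 500)

c(n_750 = length(kept_750), n_500 = length(kept_500))
c(mean_all = mean(chain), mean_750 = mean(kept_750), mean_500 = mean(kept_500))
```

A smaller burn-in keeps more draws per chain but risks including draws from before the sampler has settled, which is the trade-off behind the two summaries shown above.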
This introduction provides enough information to start fitting the SkipTrack model. For further information regarding different methods of simulating data, additional model fitting, and tuning parameters for fitting the model, please see the help pages. Additional vignettes are forthcoming.