This app is an initial attempt to make the study design calculations in R/gap, which were benchmarked against relevant publications, easy to run and visualize; eventually the app could produce more generic results.
With R/gap installed, one can run the app as follows:
setwd(file.path(find.package("gap"),"shinygap"))
library(shiny)
runApp()
Alternatively, one can run the app from source using gap/inst/shinygap. In fact, these steps are conveniently wrapped up in the runshinygap() function.
To set the default parameters, some compromises need to be made, e.g., Kp=[1e-5, 0.4], MAF=[1e-3, 0.8], alpha=[1e-8, 0.05], beta=[0.01, 0.4]. The slider inputs provide upper bounds of parameters.
This is a call to fbsize().
This is a call to pbsize().
This is a call to ccsize(), whose power argument indicates power (TRUE) or sample size (FALSE) calculation.
We implement it in a function whose format is
tscc(model, GRR, p1, n1, n2, M, alpha.genome, pi.samples, pi.markers, K)
which requires specification of the disease model (multiplicative, additive, dominant, recessive), the genotypic relative risk (GRR), the estimated risk allele frequency in cases (\(p_1\)), the total number of cases (\(n_1\)), the total number of controls (\(n_2\)), the total number of markers (\(M\)), the false positive rate at genome level (\(\alpha_\mathit{genome}\)), the proportion of samples to be genotyped at stage 1 (\(\pi_\mathit{samples}\)), the proportion of markers to be selected (\(\pi_\mathit{markers}\), also used as the false positive rate at stage 1) and the population prevalence (\(K\)).
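To make the roles of the disease model, GRR and \(K\) concrete, the following is a minimal sketch of how these three inputs determine the risk allele frequencies expected in cases and controls. It is not the code used by tscc(): the helper `allele_freqs()` is hypothetical, its argument `p` is taken to be the population risk allele frequency, and Hardy-Weinberg proportions are assumed.

```r
# Hypothetical helper, not part of R/gap: risk allele frequencies in cases and
# controls implied by a disease model, assuming Hardy-Weinberg proportions.
allele_freqs <- function(model, GRR, p, K) {
  # Relative risks for 0, 1 and 2 copies of the risk allele
  rr <- switch(model,
    multiplicative = c(1, GRR, GRR^2),
    additive       = c(1, GRR, 2 * GRR - 1),
    dominant       = c(1, GRR, GRR),
    recessive      = c(1, 1, GRR),
    stop("unknown model: ", model))
  geno <- c((1 - p)^2, 2 * p * (1 - p), p^2)  # genotype frequencies under HWE
  f <- rr * K / sum(geno * rr)                # penetrances scaled so prevalence is K
  if (any(f > 1)) stop("penetrance exceeds 1; check GRR and K")
  copies <- c(0, 1, 2) / 2                    # share of risk alleles per genotype
  list(cases    = sum(copies * geno * f) / K,
       controls = sum(copies * geno * (1 - f)) / (1 - K))
}

# Illustrative values only
af <- allele_freqs("multiplicative", GRR = 1.4, p = 0.4, K = 0.1)
print(af)
```

With GRR above 1 the risk allele is enriched in cases, and the prevalence-weighted average of the two frequencies recovers the population frequency `p`.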
This is detailed in the package vignettes gap, https://cran.r-project.org/package=gap, or jss [1].
Our implementation covers two aspects [2], power and sample size.
The power is \[\Phi\left(Z_\alpha+\tilde{n}^\frac{1}{2}\theta\sqrt{\frac{p_1p_2p_D}{q+(1-q)p_D}}\right)\] where \(\alpha\) is the significance level, \(\theta\) is the log-hazard ratio for two groups, \(p_j, j = 1, 2\), are the proportions of the two groups in the population (\(p_1 + p_2 = 1\)), \(\tilde{n}\) is the total number of subjects in the subcohort, \(p_D\) is the proportion of failures in the full cohort, and \(q\) is the sampling fraction of the subcohort.
The sample size is \[\tilde{n}=\frac{nBp_D}{n-B(1-p_D)}\] where \(B=\frac{(Z_{1-\alpha}+Z_\beta)^2}{\theta^2p_1p_2p_D}\) and \(n\) is the whole cohort size.
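The two formulas can be checked against each other with a short base R sketch. The functions `cc_power()` and `cc_size()` below are hypothetical names, not the ccsize() interface. The sketch assumes a one-sided test with \(\theta > 0\), takes \(Z_\beta\) as the upper \(\beta\) quantile, and uses \(B\) with the squared sum of quantiles, which is what makes the sample size the inverse of the power formula when \(q = \tilde{n}/n\).

```r
# Hypothetical helpers, not part of R/gap; theta > 0 and a one-sided test assumed.

# Power for a subcohort of size n_sub drawn with sampling fraction q
cc_power <- function(n_sub, q, pD, p1, theta, alpha) {
  p2 <- 1 - p1
  pnorm(qnorm(alpha) +
        sqrt(n_sub) * theta * sqrt(p1 * p2 * pD / (q + (1 - q) * pD)))
}

# Subcohort size needed for power 1 - beta in a cohort of size n
cc_size <- function(n, pD, p1, theta, alpha, beta) {
  p2 <- 1 - p1
  z <- qnorm(1 - alpha) + qnorm(1 - beta)
  B <- z^2 / (theta^2 * p1 * p2 * pD)
  if (n <= B * (1 - pD)) stop("cohort too small for the requested power")
  n * B * pD / (n - B * (1 - pD))
}

# Illustrative values only
n <- 25000; pD <- 0.05; p1 <- 0.3; theta <- log(1.5)
alpha <- 0.05; beta <- 0.2

n_sub <- cc_size(n, pD, p1, theta, alpha, beta)
pw <- cc_power(n_sub, q = n_sub / n, pD, p1, theta, alpha)
print(c(subcohort = n_sub, power = pw))
```

Feeding the computed subcohort size back into the power function, with \(q\) set to \(\tilde{n}/n\), should return the target power \(1-\beta\).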
Tests of allele frequency differences between cases and controls in a two-stage design are described here [3]. The usual test of proportions can be written as \[z(p_1,p_2,n_1,n_2,\pi_{samples})=\frac{p_1-p_2}{\sqrt{\frac{p_1(1-p_1)}{2n_1\pi_{samples}}+\frac{p_2(1-p_2)}{2n_2\pi_{samples}}}}\] where \(p_1\) and \(p_2\) are the allele frequencies, \(n_1\) and \(n_2\) are the sample sizes, and \(\pi_{samples}\) is the proportion of samples to be genotyped at stage 1. The test statistics for stage 1, for stage 2 as replication, and for stages 1 and 2 in a joint analysis are then \(z_1 = z(\hat p_1,\hat p_2,n_1,n_2,\pi_{samples})\), \(z_2 = z(\hat p_1,\hat p_2,n_1,n_2,1-\pi_{samples})\), and \(z_j = \sqrt{\pi_{samples}}z_1+\sqrt{1-\pi_{samples}}z_2\), respectively. Let \(C_1\), \(C_2\), and \(C_j\) be the thresholds for these statistics. The false positive rates can then be obtained according to \(P(|z_1|>C_1)P(|z_2|>C_2,\mathrm{sign}(z_1)=\mathrm{sign}(z_2))\) and \(P(|z_1|>C_1)P(|z_j|>C_j \mid |z_1|>C_1)\) for replication-based and joint analyses, respectively.
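The statistics above translate directly into base R. The sketch below uses hypothetical helper names (`z_stat()`, `z_joint()`) and illustrative allele frequencies; it is not the tscc() implementation, and it allows separate frequency estimates for each stage.

```r
# Hypothetical helpers, not part of R/gap.

# Test of proportions on the fraction pi_samples of the n1 cases and n2 controls
z_stat <- function(p1, p2, n1, n2, pi_samples) {
  (p1 - p2) / sqrt(p1 * (1 - p1) / (2 * n1 * pi_samples) +
                   p2 * (1 - p2) / (2 * n2 * pi_samples))
}

# Joint statistic combining the stage 1 and stage 2 statistics
z_joint <- function(z1, z2, pi_samples) {
  sqrt(pi_samples) * z1 + sqrt(1 - pi_samples) * z2
}

# Illustrative values only: estimated allele frequencies in cases and controls
n1 <- 1000; n2 <- 1000; pi_samples <- 0.2
z1 <- z_stat(0.45, 0.40, n1, n2, pi_samples)      # stage 1
z2 <- z_stat(0.45, 0.40, n1, n2, 1 - pi_samples)  # stage 2
zj <- z_joint(z1, z2, pi_samples)
print(c(z1 = z1, z2 = z2, zj = zj))
```

When both stages yield the same frequency estimates, the weights \(\sqrt{\pi_{samples}}\) and \(\sqrt{1-\pi_{samples}}\) make \(z_j\) equal to the statistic computed on the full sample, which is a convenient sanity check.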