library(data.table)
library(simitation)
This is Part 2 of the vignette. Please refer to ‘Introduction_to_Simitation’ for Part 1.
The sim.prop2 method is used to generate binary data for a two-sample proportions test based on a probabilities of success \(p_x\) and \(p_y\) for samples of respective sizes \(n_x\) and \(n_y\) and the overall number of experiments \(B\).
simdat.prop2 <- sim.prop2(nx = 30, ny = 40, px = 0.5, py = 0.55, num.experiments = 2000,
experiment.name = "sim", group.name = "treatment", x.value = "group_1", y.value = "group_2",
value.name = "correct_answer", seed = 3)
print(simdat.prop2)
## sim treatment correct_answer
## 1: 1 group_1 1
## 2: 1 group_1 0
## 3: 1 group_1 1
## 4: 1 group_1 0
## 5: 1 group_1 0
## ---
## 139996: 2000 group_2 0
## 139997: 2000 group_2 0
## 139998: 2000 group_2 1
## 139999: 2000 group_2 0
## 140000: 2000 group_2 1
The sim.prop2.test method is used to implement two-sample tests of proportions (see prop.test) independently across the \(B\) repeated experiments. This is testing a hypothesized value of \(p = p_x - p_y\) (which defaults to zero) in each of the \(B\) simulated experiments.
test.statistics.prop2 <- sim.prop2.test(simdat.prop2 = simdat.prop2, p = NULL, alternative = "less",
conf.level = 0.95, correct = T, experiment.name = "sim", group.name = "treatment",
x.value = "group_1", y.value = "group_2", value.name = "correct_answer")
print(test.statistics.prop2)
## sim statistic df p.value lower.ci upper.ci estimate x.estimate
## 1: 1 8.101852e-01 1 0.1840328 -1 0.08648872 -0.13333333 0.5666667
## 2: 2 5.328372e-31 1 0.5000000 -1 0.19060611 -0.01666667 0.6333333
## 3: 3 1.188859e-02 1 0.4565874 -1 0.17665899 -0.04166667 0.6333333
## 4: 4 1.226521e-03 1 0.4860312 -1 0.19173799 -0.03333333 0.5666667
## 5: 5 7.578619e-31 1 0.5000000 -1 0.19795590 -0.01666667 0.5333333
## ---
## 1996: 1996 7.578619e-31 1 0.5000000 -1 0.19795590 -0.01666667 0.5333333
## 1997: 1997 1.031504e+00 1 0.1549028 -1 0.07393313 -0.15000000 0.5000000
## 1998: 1998 1.388221e+00 1 0.1193529 -1 0.05394214 -0.16666667 0.5333333
## 1999: 1999 2.759672e-01 1 0.2996784 -1 0.13320075 -0.09166667 0.5333333
## 2000: 2000 6.428819e-01 1 0.2113346 -1 0.10012327 -0.12500000 0.5000000
## y.estimate null.value alternative
## 1: 0.700 0 less
## 2: 0.650 0 less
## 3: 0.675 0 less
## 4: 0.600 0 less
## 5: 0.550 0 less
## ---
## 1996: 0.550 0 less
## 1997: 0.650 0 less
## 1998: 0.700 0 less
## 1999: 0.625 0 less
## 2000: 0.625 0 less
## method
## 1: 2-sample test for equality of proportions with continuity correction
## 2: 2-sample test for equality of proportions with continuity correction
## 3: 2-sample test for equality of proportions with continuity correction
## 4: 2-sample test for equality of proportions with continuity correction
## 5: 2-sample test for equality of proportions with continuity correction
## ---
## 1996: 2-sample test for equality of proportions with continuity correction
## 1997: 2-sample test for equality of proportions with continuity correction
## 1998: 2-sample test for equality of proportions with continuity correction
## 1999: 2-sample test for equality of proportions with continuity correction
## 2000: 2-sample test for equality of proportions with continuity correction
The analyze.simstudy.prop2 method summarizes the \(B\) two-sample tests of proportions across the repeated experiments.
analysis.prop2 <- analyze.simstudy.prop2(test.statistics.prop2 = test.statistics.prop2,
alternative = "less", conf.level = 0.95, the.quantiles = c(0.025, 0.1, 0.9, 0.975))
print(analysis.prop2)
## $estimate.summary
## q.2.5 q.10 q.90 q.97.5 mean st.error
## 1: -0.2833333 -0.2083333 0.1166667 0.191875 -0.04835833 0.1234189
##
## $stat.summary
## q.2.5 q.10 q.90 q.97.5 mean st.error
## 1: 0 5.498929e-31 2.442618 5.048447 0.8499318 1.38748
##
## $p.value.summary
## reject.proportion non.reject.proportion
## 1 0.073 0.927
##
## $df.summary
## q.2.5 q.10 q.90 q.97.5 mean st.error
## 1: 1 1 1 1 1 0
##
## $ci.limit.summary
## q.2.5 q.10 q.90 q.97.5 mean st.error
## 1: -0.06674141 0.01436183 0.3376187 0.415833 0.173367 0.1235462
The simstudy.prop2 method is used to a) generate data for experiments with two groups and a binary outcome, b) implement two-sample tests of proportions, and c) analyze these tests across the repeated experiments.
study.prop2 <- simstudy.prop2(nx = 30, ny = 40, px = 0.5, py = 0.55, num.experiments = 2000,
p = NULL, alternative = "less", conf.level = 0.95, correct = T, the.quantiles = c(0.025,
0.5, 0.975), experiment.name = "sim", group.name = "treatment", x.value = "treatment_1",
y.value = "treatment_2", value.name = "correct_answer", seed = 904)
print(study.prop2)
## $simdat.prop2
## sim treatment correct_answer
## 1: 1 treatment_1 0
## 2: 1 treatment_1 1
## 3: 1 treatment_1 1
## 4: 1 treatment_1 1
## 5: 1 treatment_1 0
## ---
## 139996: 2000 treatment_2 0
## 139997: 2000 treatment_2 1
## 139998: 2000 treatment_2 0
## 139999: 2000 treatment_2 0
## 140000: 2000 treatment_2 1
##
## $test.statistics.prop2
## sim statistic df p.value lower.ci upper.ci estimate x.estimate
## 1: 1 0.09674447 1 0.3778860 -1 0.16012343 -0.06666667 0.4333333
## 2: 2 3.10657248 1 0.9610116 -1 0.46206681 0.24166667 0.6666667
## 3: 3 0.64288194 1 0.2113346 -1 0.10012327 -0.12500000 0.5000000
## 4: 4 0.43116981 1 0.2557077 -1 0.11825463 -0.10833333 0.4666667
## 5: 5 0.09843750 1 0.3768564 -1 0.15917041 -0.06666667 0.5333333
## ---
## 1996: 1996 0.74955455 1 0.1933086 -1 0.09250375 -0.13333333 0.4666667
## 1997: 1997 0.05833333 1 0.4045749 -1 0.16910931 -0.05833333 0.4666667
## 1998: 1998 0.89413373 1 0.1721798 -1 0.07961568 -0.14166667 0.3333333
## 1999: 1999 0.64288194 1 0.2113346 -1 0.10012327 -0.12500000 0.5000000
## 2000: 2000 0.06076389 1 0.4026463 -1 0.16576452 -0.05833333 0.5666667
## y.estimate null.value alternative
## 1: 0.500 0 less
## 2: 0.425 0 less
## 3: 0.625 0 less
## 4: 0.575 0 less
## 5: 0.600 0 less
## ---
## 1996: 0.600 0 less
## 1997: 0.525 0 less
## 1998: 0.475 0 less
## 1999: 0.625 0 less
## 2000: 0.625 0 less
## method
## 1: 2-sample test for equality of proportions with continuity correction
## 2: 2-sample test for equality of proportions with continuity correction
## 3: 2-sample test for equality of proportions with continuity correction
## 4: 2-sample test for equality of proportions with continuity correction
## 5: 2-sample test for equality of proportions with continuity correction
## ---
## 1996: 2-sample test for equality of proportions with continuity correction
## 1997: 2-sample test for equality of proportions with continuity correction
## 1998: 2-sample test for equality of proportions with continuity correction
## 1999: 2-sample test for equality of proportions with continuity correction
## 2000: 2-sample test for equality of proportions with continuity correction
##
## $sim.analysis.prop2
## $sim.analysis.prop2$estimate.summary
## q.2.5 q.50 q.97.5 mean st.error
## 1: -0.275 -0.05 0.1916667 -0.04718333 0.1198669
##
## $sim.analysis.prop2$stat.summary
## q.2.5 q.50 q.97.5 mean st.error
## 1: 0 0.2507323 4.444274 0.793184 1.308529
##
## $sim.analysis.prop2$p.value.summary
## reject.proportion non.reject.proportion
## 1 0.067 0.933
##
## $sim.analysis.prop2$df.summary
## q.2.5 q.50 q.97.5 mean st.error
## 1: 1 1 1 1 0
##
## $sim.analysis.prop2$ci.limit.summary
## q.2.5 q.50 q.97.5 mean st.error
## 1: -0.05750388 0.1773755 0.4157645 0.1745985 0.1197995
The sim.chisq.gf method is used to generate categorical data for a \(\chi^2\) test of goodness of fit. The specifications include the sample size \(n\) of each experiment, the categorical values to be sampled, the probability density, and the overall number of experiments \(B\).
simdat.chisq.gf <- sim.chisq.gf(n = 100, values = LETTERS[1:4], prob = c(0.4, 0.3,
0.2, 0.1), num.experiments = 2000, experiment.name = "experiment_id", value.name = "classification",
seed = 31)
print(simdat.chisq.gf)
## experiment_id classification
## 1: 1 B
## 2: 1 A
## 3: 1 A
## 4: 1 A
## 5: 1 A
## ---
## 199996: 2000 A
## 199997: 2000 A
## 199998: 2000 D
## 199999: 2000 A
## 200000: 2000 B
The sim.chisq.test.gf method is used to implement \(\chi^2\) tests of goodness of fit (see chisq.test) independently across the \(B\) repeated experiments. This is testing the observed counts of the categorical values relative to those expected from the hypothesized probabilities.
test.statistics.chisq.test.gf <- sim.chisq.test.gf(simdat.chisq.gf = simdat.chisq.gf,
hypothesized.probs = c(0.25, 0.3, 0.15, 0.3), correct = F, experiment.name = "experiment_id",
value.name = "classification")
print(test.statistics.chisq.test.gf)
## experiment_id statistic df p.value
## 1: 1 19.99333 3 1.702833e-04
## 2: 2 33.84000 3 2.141413e-07
## 3: 3 31.46000 3 6.800906e-07
## 4: 4 32.40000 3 4.309971e-07
## 5: 5 25.62667 3 1.141766e-05
## ---
## 1996: 1996 24.16000 3 2.313046e-05
## 1997: 1997 22.47333 3 5.199072e-05
## 1998: 1998 23.40667 3 3.322050e-05
## 1999: 1999 27.27333 3 5.159507e-06
## 2000: 2000 28.40000 3 2.993459e-06
The analyze.simstudy.chisq.test.gf method summarizes the \(B\) \(\chi^2\) tests of goodness of fit across the repeated experiments.
analysis.chisq.gf <- analyze.simstudy.chisq.test.gf(test.statistics.chisq.test.gf = test.statistics.chisq.test.gf,
conf.level = 0.95, the.quantiles = c(0.25, 0.75))
print(analysis.chisq.gf)
## $stat.summary
## q.25 q.75 mean st.error
## 1: 21.39 32.47333 27.27865 8.331138
##
## $p.value.summary
## reject.proportion non.reject.proportion
## 1: 0.998 0.002
The simstudy.chisq.test.gf method is used to a) generate data for experiments with categorical data, b) implement \(\chi^2\) tests of goodness of fit, and c) analyze these tests across the repeated experiments.
study.chisq.gf <- simstudy.chisq.test.gf(n = 75, values = LETTERS[1:4], actual.probs = c(0.3,
0.3, 0.2, 0.2), hypothesized.probs = rep.int(x = 0.25, times = 4), num.experiments = 40,
conf.level = 0.95, correct = F, the.quantiles = c(0.25, 0.75), experiment.name = "experiment_id",
value.name = "classification", seed = 61)
print(study.chisq.gf)
## $simdat
## experiment_id classification
## 1: 1 B
## 2: 1 B
## 3: 1 B
## 4: 1 D
## 5: 1 A
## ---
## 2996: 40 D
## 2997: 40 D
## 2998: 40 A
## 2999: 40 C
## 3000: 40 D
##
## $test.statistics
## experiment_id statistic df p.value
## 1: 1 2.6000000 3 0.4574895469
## 2: 2 4.8400000 3 0.1838951036
## 3: 3 11.4533333 3 0.0095108954
## 4: 4 9.8533333 3 0.0198548943
## 5: 5 2.0666667 3 0.5586857021
## 6: 6 7.5066667 3 0.0573874040
## 7: 7 5.5866667 3 0.1335459099
## 8: 8 4.2000000 3 0.2406618852
## 9: 9 8.1466667 3 0.0430758306
## 10: 10 4.9466667 3 0.1757443898
## 11: 11 10.4933333 3 0.0148061892
## 12: 12 3.5600000 3 0.3130632920
## 13: 13 2.2800000 3 0.5163632002
## 14: 14 2.2800000 3 0.5163632002
## 15: 15 1.0000000 3 0.8012519569
## 16: 16 19.4533333 3 0.0002202992
## 17: 17 2.3866667 3 0.4961214445
## 18: 18 5.1600000 3 0.1604491360
## 19: 19 1.7466667 3 0.6266090464
## 20: 20 2.8133333 3 0.4213097138
## 21: 21 3.6666667 3 0.2997805886
## 22: 22 2.8133333 3 0.4213097138
## 23: 23 2.0666667 3 0.5586857021
## 24: 24 1.3200000 3 0.7243894462
## 25: 25 14.8666667 3 0.0019342103
## 26: 26 0.7866667 3 0.8526534606
## 27: 27 9.0000000 3 0.0292908865
## 28: 28 1.3200000 3 0.7243894462
## 29: 29 9.0000000 3 0.0292908865
## 30: 30 7.1866667 3 0.0661801654
## 31: 31 14.6533333 3 0.0021381941
## 32: 32 12.3066667 3 0.0064031877
## 33: 33 5.2666667 3 0.1532800232
## 34: 34 2.3866667 3 0.4961214445
## 35: 35 6.3333333 3 0.0964723224
## 36: 36 3.9866667 3 0.2629074931
## 37: 37 7.9333333 3 0.0474097862
## 38: 38 9.3200000 3 0.0253254084
## 39: 39 9.2133333 3 0.0265849405
## 40: 40 13.2666667 3 0.0040940177
## experiment_id statistic df p.value
##
## $sim.analysis
## $sim.analysis$stat.summary
## q.25 q.75 mean st.error
## 1: 2.386667 9.053333 6.226667 4.513542
##
## $sim.analysis$p.value.summary
## reject.proportion non.reject.proportion
## 1: 0.35 0.65
The sim.chisq.ind method is used to generate categorical outcome data across the levels of a categorical independent variable for use in a \(\chi^2\) test of independence. The specifications include the sample size \(n\) of each level of the independent variable (and within each experiment), the categorical values to be sampled, the probability density within each categorical level (specified as a matrix), and the overall number of experiments \(B\).
n <- c(50, 75, 100)
values <- LETTERS[1:4]
group.names <- sprintf("group_%d", 1:3)
probs <- matrix(data = c(0.25, 0.25, 0.25, 0.25, 0.4, 0.3, 0.2, 0.1, 0.2, 0.4, 0.2,
0.2), nrow = length(n), byrow = T)
simdat.chisq.ind <- sim.chisq.ind(n = c(50, 75, 100), values = LETTERS[1:4], probs = probs,
num.experiments = 2000, experiment.name = "exp_id", group.name = "treatment_group",
group.values = sprintf("group_%d", 1:3), value.name = "category", seed = 31)
print(simdat.chisq.ind)
## exp_id treatment_group category
## 1: 1 group_1 D
## 2: 1 group_1 B
## 3: 1 group_1 B
## 4: 1 group_1 B
## 5: 1 group_1 C
## ---
## 449996: 2000 group_3 D
## 449997: 2000 group_3 A
## 449998: 2000 group_3 A
## 449999: 2000 group_3 B
## 450000: 2000 group_3 B
The sim.chisq.test.ind method is used to implement \(\chi^2\) tests of independence (see chisq.test) independently across the \(B\) repeated experiments. This is testing the observed counts of the categorical values across the levels of the independent variable relative to those expected from a hypothesis of independence between the independent and dependent variables.
test.statistics.chisq.test.ind <- sim.chisq.test.ind(simdat.chisq.ind = simdat.chisq.ind,
correct = T, experiment.name = "exp_id", group.name = "treatment_group", value.name = "category")
print(test.statistics.chisq.test.ind)
## exp_id statistic df p.value
## 1: 1 5.307586 6 5.050104e-01
## 2: 2 26.706805 6 1.643136e-04
## 3: 3 12.795336 6 4.640364e-02
## 4: 4 28.470513 6 7.660256e-05
## 5: 5 7.964288 6 2.407314e-01
## ---
## 1996: 1996 8.374005 6 2.119629e-01
## 1997: 1997 16.892837 6 9.685260e-03
## 1998: 1998 13.645147 6 3.386126e-02
## 1999: 1999 12.854038 6 4.541328e-02
## 2000: 2000 16.606892 6 1.084191e-02
The analyze.simstudy.chisq.test.ind method summarizes the \(B\) \(\chi^2\) tests of independence across the repeated experiments.
analysis.chisq.ind <- analyze.simstudy.chisq.test.ind(test.statistics.chisq.test.ind = test.statistics.chisq.test.ind,
conf.level = 0.95, the.quantiles = c(0.025, 0.975))
print(analysis.chisq.ind)
## $stat.summary
## q.2.5 q.97.5 mean st.error
## 1: 6.646683 37.47393 19.64823 8.048949
##
## $p.value.summary
## reject.proportion non.reject.proportion
## 1: 0.806 0.194
The simstudy.chisq.test.ind method is used to a) generate data for experiments with categorical independent and dependent variables, b) implement \(\chi^2\) tests of independence, and c) analyze these tests across the repeated experiments.
study.chisq.ind <- simstudy.chisq.test.ind(n = c(30, 35, 40), values = LETTERS[1:4],
probs = probs, num.experiments = 2000, conf.level = 0.95, correct = T, the.quantiles = c(0.025,
0.975), experiment.name = "exp_id", group.name = "treatment_group", group.values = sprintf("group_%d",
1:3), value.name = "category", seed = 77)
print(study.chisq.ind)
## $simdat.chisq.ind
## exp_id treatment_group category
## 1: 1 group_1 C
## 2: 1 group_1 D
## 3: 1 group_1 B
## 4: 1 group_1 D
## 5: 1 group_1 A
## ---
## 209996: 2000 group_3 A
## 209997: 2000 group_3 D
## 209998: 2000 group_3 D
## 209999: 2000 group_3 B
## 210000: 2000 group_3 B
##
## $test.statistics.chisq.test.ind
## exp_id statistic df p.value
## 1: 1 16.692964 6 0.01048041
## 2: 2 11.365260 6 0.07772267
## 3: 3 8.828703 6 0.18344334
## 4: 4 11.848328 6 0.06543923
## 5: 5 8.797762 6 0.18527533
## ---
## 1996: 1996 6.811598 6 0.33862260
## 1997: 1997 8.888580 6 0.17994193
## 1998: 1998 7.443813 6 0.28174447
## 1999: 1999 6.565685 6 0.36288386
## 2000: 2000 15.127593 6 0.01928720
##
## $sim.analysis
## $sim.analysis$stat.summary
## q.2.5 q.97.5 mean st.error
## 1: 3.547409 26.60943 12.59337 6.036573
##
## $sim.analysis$p.value.summary
## reject.proportion non.reject.proportion
## 1: 0.431 0.569
Simulation studies grow more complex in multivariate settings. Generating multiple variables can require a specification of their distributions and interdependence. The simitation package aims to simplify the creation of simulated data in multivariate settings. This is achived through a number of steps:
Specify the variables to generate using symbolic inputs.
Allow for dependent relationships in the variables through specifications of regression formulas.
Then generate simulated data sets basesd only on the symbolic inputs.
We will discuss this approach in the sections below.
Using an example of a health and wellness study, we aim to simulate data for a study that aims to model:
Outcome 1: Healthy Lifestyle: a binary classification of a subject’s lifestyle.
Outcome 2: Weight: A continuous measurement of a subject’s weight.
These outcomes will be modeled in terms of the following covariates:
Basic functions in R such as rnorm(), sample(), runif(), and rpois() could be used to generate these values. However, the simitation package allows the specifications to remain symbolic, as seen below:
step.age <- "Age ~ N(45, 10)"
step.female <- "Female ~ binary(0.53)"
step.health.percentile <- "Health.Percentile ~ U(0,100)"
step.exercise.sessions <- "Exercise.Sessions ~ Poisson(2)"
step.diet <- "Diet ~ sample(('Light', 'Moderate', 'Heavy'), (0.2, 0.45, 0.35))"
Then we can create a regression formula as the symbolic representation for the binary Healthy.Lifestyle variable:
step.healthy.lifestyle <- "Healthy.Lifestyle ~ logistic(log(0.45) - 0.1 * (Age -45) + 0.05 * Female + 0.01 * Health.Percentile + 0.5 * Exercise.Sessions - 0.1 * (Diet == 'Moderate') - 0.4 * (Diet == 'Heavy'))"
Likewise, we can build a linear regression formula for the Weight outcome:
step.weight <- "Weight ~ lm(150 - 15 * Female + 0.5 * Age - 0.1 * Health.Percentile - 0.2 * Exercise.Sessions + 5 * (Diet == 'Moderate') + 15 * (Diet == 'Heavy') - 2 * Healthy.Lifestyle + N(0, 10))"
Note that this regression formula includes a specification of the error terms in terms of a distribution, e.g. \(N(0,10)\).
Finally, we can concatenate these symbolic inputs into a variable the.steps that will be used as an input to multivariate simulations.
the.steps <- c(step.age, step.female, step.health.percentile, step.exercise.sessions,
step.diet, step.healthy.lifestyle, step.weight)
print(the.steps)
## [1] "Age ~ N(45, 10)"
## [2] "Female ~ binary(0.53)"
## [3] "Health.Percentile ~ U(0,100)"
## [4] "Exercise.Sessions ~ Poisson(2)"
## [5] "Diet ~ sample(('Light', 'Moderate', 'Heavy'), (0.2, 0.45, 0.35))"
## [6] "Healthy.Lifestyle ~ logistic(log(0.45) - 0.1 * (Age -45) + 0.05 * Female + 0.01 * Health.Percentile + 0.5 * Exercise.Sessions - 0.1 * (Diet == 'Moderate') - 0.4 * (Diet == 'Heavy'))"
## [7] "Weight ~ lm(150 - 15 * Female + 0.5 * Age - 0.1 * Health.Percentile - 0.2 * Exercise.Sessions + 5 * (Diet == 'Moderate') + 15 * (Diet == 'Heavy') - 2 * Healthy.Lifestyle + N(0, 10))"
The simulation.steps method creates multivariate simulated data based on the.steps (specified in the previous section), the sample size \(n\) for a single experiment, and the overall number of experiments \(B\).
simdat.multivariate <- simulation.steps(the.steps = the.steps, n = 50, num.experiments = 2000,
experiment.name = "sim", seed = 41)
print(simdat.multivariate)
## sim Age Female Health.Percentile Exercise.Sessions Diet
## 1: 1 37.05632 TRUE 29.71407 1 Moderate
## 2: 1 44.46198 FALSE 76.89747 0 Moderate
## 3: 1 34.63613 FALSE 51.34102 3 Light
## 4: 1 48.20881 TRUE 51.50149 1 Moderate
## 5: 1 43.42641 TRUE 95.67911 3 Heavy
## ---
## 99996: 2000 50.01564 FALSE 23.67137 4 Moderate
## 99997: 2000 36.44738 TRUE 18.06063 3 Heavy
## 99998: 2000 42.20966 TRUE 56.94578 2 Light
## 99999: 2000 44.44400 TRUE 98.21218 2 Light
## 100000: 2000 36.43157 TRUE 70.15200 0 Heavy
## Healthy.Lifestyle Weight
## 1: TRUE 153.6319
## 2: FALSE 156.0598
## 3: TRUE 151.5775
## 4: TRUE 158.0484
## 5: TRUE 174.1727
## ---
## 99996: TRUE 174.3380
## 99997: TRUE 170.2326
## 99998: FALSE 154.7910
## 99999: TRUE 128.5885
## 100000: TRUE 174.0905
The sim.statistics.lm model is used to separately fit linear regression models in each experiment (as specified by the grouping.variables). This relies upon the simulated multivariate data as well as the intended formula for the linear regression model.
stats.lm <- sim.statistics.lm(simdat = simdat.multivariate, the.formula = Weight ~
Age + Female + Health.Percentile + Exercise.Sessions + Healthy.Lifestyle, grouping.variables = "sim")
print(stats.lm)
## $the.coefs
## sim Coefficient Estimate Std. Error t value
## 1: 1 (Intercept) 143.41244699 12.02644163 11.9247614
## 2: 1 Age 0.69115562 0.24810565 2.7857311
## 3: 1 FemaleTRUE -8.47760669 3.56128054 -2.3804939
## 4: 1 Health.Percentile -0.18169526 0.05213267 -3.4852473
## 5: 1 Exercise.Sessions 1.09512438 1.53223089 0.7147254
## ---
## 11996: 2000 Age 0.05654876 0.28312228 0.1997327
## 11997: 2000 FemaleTRUE -12.83816717 4.31336302 -2.9763707
## 11998: 2000 Health.Percentile -0.16658583 0.07629281 -2.1835063
## 11999: 2000 Exercise.Sessions 3.24032954 1.45462179 2.2276096
## 12000: 2000 Healthy.LifestyleTRUE -4.61786974 4.95439748 -0.9320749
## Pr(>|t|)
## 1: 2.237792e-15
## 2: 7.849760e-03
## 3: 2.168587e-02
## 4: 1.126164e-03
## 5: 4.785538e-01
## ---
## 11996: 8.426099e-01
## 11997: 4.726254e-03
## 11998: 3.437501e-02
## 11999: 3.106828e-02
## 12000: 3.563843e-01
##
## $summary.stats
## sim sigma df rse r.squared adj.r.squared fstatistic f.numdf
## 1: 1 11.143719 44 11.143719 0.4276650 0.3626270 6.575612 5
## 2: 2 12.470285 44 12.470285 0.4650459 0.4042556 7.650008 5
## 3: 3 11.142565 44 11.142565 0.3858350 0.3160435 5.528397 5
## 4: 4 10.508400 44 10.508400 0.5391233 0.4867509 10.294043 5
## 5: 5 9.069615 44 9.069615 0.7091380 0.6760855 21.454901 5
## ---
## 1996: 1996 10.332274 44 10.332274 0.5370773 0.4844724 10.209651 5
## 1997: 1997 11.833808 44 11.833808 0.4182409 0.3521319 6.326536 5
## 1998: 1998 11.758757 44 11.758757 0.4288661 0.3639645 6.607945 5
## 1999: 1999 10.416190 44 10.416190 0.5447162 0.4929794 10.528603 5
## 2000: 2000 14.041371 44 14.041371 0.3798946 0.3094281 5.391136 5
## f.dendf f.pvalue
## 1: 44 1.197250e-04
## 2: 44 3.035780e-05
## 3: 44 4.920593e-04
## 4: 44 1.400543e-06
## 5: 44 8.216837e-11
## ---
## 1996: 44 1.535810e-06
## 1997: 44 1.664007e-04
## 1998: 44 1.147523e-04
## 1999: 44 1.086012e-06
## 2000: 44 5.956363e-04
The analyze.simstudy.lm summarizes the \(B\) linear regression models across the repeated experiments.
analysis.lm <- analyze.simstudy.lm(the.coefs = stats.lm$the.coefs, summary.stats = stats.lm$summary.stats,
conf.level = 0.95, the.quantiles = c(0.25, 0.75), coef.name = "Coefficient",
estimate.name = "Estimate", lm.p.name = "Pr(>|t|)", f.p.name = "f.pvalue")
print(analysis.lm)
## $lm.estimate.summary
## Coefficient q.25 q.75 mean st.error
## 1: (Intercept) 151.5068859 165.33470637 158.43416265 10.46546190
## 2: Age 0.3455159 0.61848162 0.48300680 0.19722956
## 3: FemaleTRUE -17.1688096 -12.54728544 -14.90209299 3.48346863
## 4: Health.Percentile -0.1381232 -0.05769657 -0.09753103 0.05975137
## 5: Exercise.Sessions -0.9913888 0.74392705 -0.10991843 1.29124842
## 6: Healthy.LifestyleTRUE -5.5838909 -0.19630860 -2.86531869 4.04913916
##
## $lm.p.summary
## Coefficient reject.proportion non.reject.proportion
## 1: (Intercept) 1.0000 0.0000
## 2: Age 0.6840 0.3160
## 3: FemaleTRUE 0.9885 0.0115
## 4: Health.Percentile 0.3455 0.6545
## 5: Exercise.Sessions 0.0485 0.9515
## 6: Healthy.LifestyleTRUE 0.1155 0.8845
##
## $lm.stats.summary
## stat q.25 q.75 mean st.error
## 1: sigma 10.6873045 12.3143905 11.5124512 1.22594777
## 2: rse 10.6873045 12.3143905 11.5124512 1.22594777
## 3: r.squared 0.3976626 0.5388679 0.4655912 0.09954937
## 4: adj.r.squared 0.3292152 0.4864665 0.4048630 0.11086180
## 5: fstatistic 5.8097521 10.2834676 8.2803198 3.41380832
##
## $fstatistic.p.summary
## reject.proportion non.reject.proportion
## 1: 0.991 0.009
The simstudy.lm method is used to a) generate multivariate data for experiments, b) implement linear regression models, and c) analyze these models across the repeated experiments.
study.lm <- simstudy.lm(the.steps = the.steps, n = 100, num.experiments = 2000, the.formula = Weight ~
Age + Female + Health.Percentile + Exercise.Sessions + Healthy.Lifestyle, conf.level = 0.95,
the.quantiles = c(0.25, 0.75), experiment.name = "sim", seed = 11)
print(study.lm)
## $the.steps
## [1] "Age ~ N(45, 10)"
## [2] "Female ~ binary(0.53)"
## [3] "Health.Percentile ~ U(0,100)"
## [4] "Exercise.Sessions ~ Poisson(2)"
## [5] "Diet ~ sample(('Light', 'Moderate', 'Heavy'), (0.2, 0.45, 0.35))"
## [6] "Healthy.Lifestyle ~ logistic(log(0.45) - 0.1 * (Age -45) + 0.05 * Female + 0.01 * Health.Percentile + 0.5 * Exercise.Sessions - 0.1 * (Diet == 'Moderate') - 0.4 * (Diet == 'Heavy'))"
## [7] "Weight ~ lm(150 - 15 * Female + 0.5 * Age - 0.1 * Health.Percentile - 0.2 * Exercise.Sessions + 5 * (Diet == 'Moderate') + 15 * (Diet == 'Heavy') - 2 * Healthy.Lifestyle + N(0, 10))"
##
## $simdat
## sim Age Female Health.Percentile Exercise.Sessions Diet
## 1: 1 39.08969 FALSE 74.49755 2 Light
## 2: 1 46.50081 TRUE 98.57778 4 Moderate
## 3: 1 53.66492 TRUE 12.05114 5 Heavy
## 4: 1 43.58467 TRUE 97.88096 3 Light
## 5: 1 28.43229 TRUE 67.85286 0 Heavy
## ---
## 199996: 2000 19.37264 FALSE 26.48270 2 Heavy
## 199997: 2000 49.38716 TRUE 57.24480 0 Heavy
## 199998: 2000 37.58661 TRUE 84.77410 3 Light
## 199999: 2000 48.46724 FALSE 27.32050 3 Moderate
## 200000: 2000 48.92280 TRUE 27.51480 1 Heavy
## Healthy.Lifestyle Weight
## 1: TRUE 166.2727
## 2: TRUE 167.0829
## 3: TRUE 172.8174
## 4: TRUE 133.3357
## 5: FALSE 155.2499
## ---
## 199996: TRUE 167.6353
## 199997: FALSE 171.4798
## 199998: TRUE 130.6448
## 199999: TRUE 167.5629
## 200000: TRUE 172.3558
##
## $statistics
## $statistics$the.coefs
## sim Coefficient Estimate Std. Error t value
## 1: 1 (Intercept) 156.35192591 7.54424520 20.72466122
## 2: 1 Age 0.46370604 0.14074535 3.29464565
## 3: 1 FemaleTRUE -13.50742857 2.32305161 -5.81451936
## 4: 1 Health.Percentile -0.11561014 0.03864230 -2.99180295
## 5: 1 Exercise.Sessions -0.02755147 0.95516858 -0.02884462
## ---
## 11996: 2000 Age 0.47335644 0.11591227 4.08374763
## 11997: 2000 FemaleTRUE -15.83360573 2.19748554 -7.20532875
## 11998: 2000 Health.Percentile -0.06802191 0.03570605 -1.90505272
## 11999: 2000 Exercise.Sessions -1.02835128 0.95862934 -1.07273087
## 12000: 2000 Healthy.LifestyleTRUE -2.17698768 2.49814368 -0.87144214
## Pr(>|t|)
## 1: 8.010977e-37
## 2: 1.390472e-03
## 3: 8.315250e-08
## 4: 3.541780e-03
## 5: 9.770497e-01
## ---
## 11996: 9.309291e-05
## 11997: 1.428372e-10
## 11998: 5.983010e-02
## 11999: 2.861382e-01
## 12000: 3.857331e-01
##
## $statistics$summary.stats
## sim sigma df rse r.squared adj.r.squared fstatistic f.numdf
## 1: 1 11.51544 94 11.51544 0.3581783 0.3240388 10.491624 5
## 2: 2 10.86909 94 10.86909 0.6087671 0.5879568 29.253213 5
## 3: 3 10.23716 94 10.23716 0.3830009 0.3501818 11.670062 5
## 4: 4 11.83674 94 11.83674 0.4322802 0.4020823 14.314925 5
## 5: 5 11.29339 94 11.29339 0.5794029 0.5570307 25.898359 5
## ---
## 1996: 1996 11.25075 94 11.25075 0.4970658 0.4703140 18.580639 5
## 1997: 1997 12.50857 94 12.50857 0.2661006 0.2270634 6.816591 5
## 1998: 1998 12.32557 94 12.32557 0.3624283 0.3285149 10.686880 5
## 1999: 1999 12.54309 94 12.54309 0.4168465 0.3858277 13.438508 5
## 2000: 2000 10.46022 94 10.46022 0.4398125 0.4100153 14.760190 5
## f.dendf f.pvalue
## 1: 94 5.073379e-08
## 2: 94 8.539413e-18
## 3: 94 8.738455e-09
## 4: 94 2.076056e-10
## 5: 94 2.385578e-16
## ---
## 1996: 94 8.528932e-13
## 1997: 94 1.823595e-05
## 1998: 94 3.775146e-08
## 1999: 94 6.953370e-10
## 2000: 94 1.135880e-10
##
##
## $sim.analysis
## $sim.analysis$lm.estimate.summary
## Coefficient q.25 q.75 mean st.error
## 1: (Intercept) 153.8055564 163.28857287 158.59064763 7.00464638
## 2: Age 0.3950494 0.56843278 0.48232587 0.13047951
## 3: FemaleTRUE -16.4975226 -13.48388116 -14.99592332 2.35117629
## 4: Health.Percentile -0.1260884 -0.06945318 -0.09769115 0.04188296
## 5: Exercise.Sessions -0.7333715 0.42029077 -0.14193699 0.86623729
## 6: Healthy.LifestyleTRUE -4.7566724 -1.10116880 -2.93533317 2.75277017
##
## $sim.analysis$lm.p.summary
## Coefficient reject.proportion non.reject.proportion
## 1: (Intercept) 1.0000 0.0000
## 2: Age 0.9520 0.0480
## 3: FemaleTRUE 1.0000 0.0000
## 4: Health.Percentile 0.6375 0.3625
## 5: Exercise.Sessions 0.0495 0.9505
## 6: Healthy.LifestyleTRUE 0.1830 0.8170
##
## $sim.analysis$lm.stats.summary
## stat q.25 q.75 mean st.error
## 1: sigma 10.9924961 12.1214639 11.5614994 0.82715969
## 2: rse 10.9924961 12.1214639 11.5614994 0.82715969
## 3: r.squared 0.3953620 0.4889834 0.4412176 0.07059451
## 4: adj.r.squared 0.3632004 0.4618016 0.4114951 0.07434954
## 5: fstatistic 12.2929844 17.9894098 15.4042828 4.50505328
##
## $sim.analysis$fstatistic.p.summary
## reject.proportion non.reject.proportion
## 1: 1 0
The sim.statistics.logistic model is used to separately fit logistic regression models in each experiment (as specified by the grouping.variables). This relies upon the simulated multivariate data as well as the intended formula for the logistic regression model.
stats.logistic <- sim.statistics.logistic(simdat = simdat.multivariate, the.formula = Healthy.Lifestyle ~
Age + Female + Health.Percentile + Exercise.Sessions, grouping.variables = "sim")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
print(stats.logistic)
## $the.coefs
## sim Coefficient Estimate Std. Error z value Pr(>|z|)
## 1: 1 (Intercept) 8.570872490 3.18678409 2.6895052 0.007155803
## 2: 1 Age -0.251002813 0.08000813 -3.1372163 0.001705603
## 3: 1 FemaleTRUE 2.221545300 1.09352041 2.0315536 0.042198871
## 4: 1 Health.Percentile 0.004706005 0.01461277 0.3220475 0.747416721
## 5: 1 Exercise.Sessions 1.317739277 0.51665679 2.5505119 0.010756486
## ---
## 9996: 2000 (Intercept) 7.767045588 3.07772650 2.5236309 0.011614982
## 9997: 2000 Age -0.242044880 0.08040886 -3.0101765 0.002610959
## 9998: 2000 FemaleTRUE 0.312734358 0.93500249 0.3344744 0.738021637
## 9999: 2000 Health.Percentile 0.020461665 0.01450923 1.4102517 0.158465370
## 10000: 2000 Exercise.Sessions 0.958409635 0.40248878 2.3812084 0.017255949
##
## $summary.stats
## sim deviance aic df.residual null.deviance df.null iter dispersion
## 1: 1 34.83327 44.83327 45 62.68695 49 6 1
## 2: 2 40.12408 50.12408 45 65.34182 49 5 1
## 3: 3 51.58797 61.58797 45 66.40641 49 4 1
## 4: 4 58.74795 68.74795 45 69.23470 49 3 1
## 5: 5 44.90752 54.90752 45 67.30117 49 5 1
## ---
## 1996: 1996 57.01894 67.01894 45 69.23470 49 4 1
## 1997: 1997 52.59694 62.59694 45 69.23470 49 5 1
## 1998: 1998 56.74188 66.74188 45 68.99438 49 4 1
## 1999: 1999 51.08389 61.08389 45 68.02920 49 5 1
## 2000: 2000 44.20892 54.20892 45 69.23470 49 6 1
The analyze.simstudy.logistic summarizes the \(B\) logistic regression models across the repeated experiments.
analysis.logistic <- analyze.simstudy.logistic(the.coefs = stats.logistic$the.coefs,
summary.stats = stats.logistic$summary.stats, conf.level = 0.95, the.quantiles = c(0.1,
0.9))
print(analysis.logistic)
## $logistic.estimate.summary
## Coefficient q.10 q.90 mean st.error
## 1: (Intercept) 1.325120118 7.36873943 4.96610051 33.75119988
## 2: Age -0.189809359 -0.05993065 -0.14617820 1.12595874
## 3: FemaleTRUE -0.994708171 1.09472858 0.02736958 1.64590583
## 4: Health.Percentile -0.005779916 0.03077342 0.01402565 0.09100922
## 5: Exercise.Sessions 0.199163400 1.06745005 0.80315298 8.03293308
##
## $logistic.p.summary
## Coefficient reject.proportion non.reject.proportion
## 1: (Intercept) 0.4515 0.5485
## 2: Age 0.7950 0.2050
## 3: FemaleTRUE 0.0560 0.9440
## 4: Health.Percentile 0.1430 0.8570
## 5: Exercise.Sessions 0.5045 0.4955
##
## $logistic.stats.summary
## stat q.10 q.90 mean st.error
## 1: deviance 39.48138 58.20652 49.12613 7.5046829
## 2: aic 49.48138 68.20652 59.12613 7.5046829
## 3: df.residual 45.00000 45.00000 45.00000 0.0000000
## 4: null.deviance 61.08643 69.23470 66.20739 3.2487290
## 5: df.null 49.00000 49.00000 49.00000 0.0000000
## 6: iter 4.00000 6.00000 4.76800 0.8958668
## 7: dispersion 1.00000 1.00000 1.00000 0.0000000
The simstudy.logistic method is used to a) generate multivariate data for experiments, b) implement logistic regression models, and c) analyze these models across the repeated experiments.
study.logistic <- simstudy.logistic(the.steps = the.steps, n = 100, num.experiments = 2000,
the.formula = Healthy.Lifestyle ~ Age + Female + Health.Percentile + Exercise.Sessions,
conf.level = 0.95, the.quantiles = c(0.025, 0.1, 0.5, 0.9, 0.975), experiment.name = "sim",
seed = 222)
print(study.logistic)
## $the.steps
## [1] "Age ~ N(45, 10)"
## [2] "Female ~ binary(0.53)"
## [3] "Health.Percentile ~ U(0,100)"
## [4] "Exercise.Sessions ~ Poisson(2)"
## [5] "Diet ~ sample(('Light', 'Moderate', 'Heavy'), (0.2, 0.45, 0.35))"
## [6] "Healthy.Lifestyle ~ logistic(log(0.45) - 0.1 * (Age -45) + 0.05 * Female + 0.01 * Health.Percentile + 0.5 * Exercise.Sessions - 0.1 * (Diet == 'Moderate') - 0.4 * (Diet == 'Heavy'))"
## [7] "Weight ~ lm(150 - 15 * Female + 0.5 * Age - 0.1 * Health.Percentile - 0.2 * Exercise.Sessions + 5 * (Diet == 'Moderate') + 15 * (Diet == 'Heavy') - 2 * Healthy.Lifestyle + N(0, 10))"
##
## $simdat
## sim Age Female Health.Percentile Exercise.Sessions Diet
## 1: 1 59.87757 TRUE 53.2428192 2 Heavy
## 2: 1 48.74060 TRUE 21.1235821 2 Moderate
## 3: 1 43.40462 TRUE 89.7428594 1 Heavy
## 4: 1 24.27340 FALSE 78.9232109 2 Moderate
## 5: 1 40.92160 TRUE 40.9784437 2 Moderate
## ---
## 199996: 2000 44.08748 FALSE 0.8803819 1 Heavy
## 199997: 2000 49.89825 FALSE 66.3038554 1 Heavy
## 199998: 2000 34.60809 FALSE 97.7703888 9 Moderate
## 199999: 2000 42.53373 FALSE 34.2455237 1 Moderate
## 200000: 2000 51.71965 TRUE 57.2398308 7 Light
## Healthy.Lifestyle Weight
## 1: TRUE 167.7444
## 2: TRUE 172.1692
## 3: FALSE 152.2713
## 4: TRUE 166.4884
## 5: TRUE 154.0843
## ---
## 199996: FALSE 208.5940
## 199997: FALSE 162.0506
## 199998: TRUE 163.7009
## 199999: FALSE 176.3599
## 200000: FALSE 149.6969
##
## $statistics
## $statistics$the.coefs
## sim Coefficient Estimate Std. Error z value Pr(>|z|)
## 1: 1 (Intercept) 2.285946652 1.229073604 1.8598940 0.0629005209
## 2: 1 Age -0.061184650 0.022883910 -2.6736974 0.0075020119
## 3: 1 FemaleTRUE 0.662604161 0.453119465 1.4623167 0.1436544440
## 4: 1 Health.Percentile 0.005978490 0.007771018 0.7693317 0.4416964078
## 5: 1 Exercise.Sessions 0.185667303 0.157993205 1.1751601 0.2399306874
## ---
## 9996: 2000 (Intercept) 4.349599655 1.469778307 2.9593576 0.0030828110
## 9997: 2000 Age -0.106013657 0.028717101 -3.6916560 0.0002227987
## 9998: 2000 FemaleTRUE 0.213843598 0.464045088 0.4608250 0.6449241329
## 9999: 2000 Health.Percentile 0.006218133 0.008566194 0.7258922 0.4679048673
## 10000: 2000 Exercise.Sessions 0.208224291 0.160878118 1.2942984 0.1955623684
##
## $statistics$summary.stats
## sim deviance aic df.residual null.deviance df.null iter dispersion
## 1: 1 117.09183 127.0918 95 130.6836 99 4 1
## 2: 2 93.51307 103.5131 95 137.9888 99 5 1
## 3: 3 106.91722 116.9172 95 132.8128 99 4 1
## 4: 4 91.68049 101.6805 95 134.6023 99 5 1
## 5: 5 103.12119 113.1212 95 133.7496 99 4 1
## ---
## 1996: 1996 113.31108 123.3111 95 136.0584 99 4 1
## 1997: 1997 92.48790 102.4879 95 131.7911 99 5 1
## 1998: 1998 98.27400 108.2740 95 136.0584 99 5 1
## 1999: 1999 106.46465 116.4646 95 131.7911 99 4 1
## 2000: 2000 113.32667 123.3267 95 135.3717 99 4 1
##
##
## $sim.analysis
## $sim.analysis$logistic.estimate.summary
## Coefficient q.2.5 q.10 q.50 q.90
## 1: (Intercept) 1.022134461 1.9366636009 3.68466173 5.73934700
## 2: Age -0.174310418 -0.1493509481 -0.10465432 -0.06911418
## 3: FemaleTRUE -1.005002023 -0.5980217453 0.06354152 0.68079738
## 4: Health.Percentile -0.006602118 -0.0006587377 0.01011092 0.02208901
## 5: Exercise.Sessions 0.166758003 0.2916224153 0.53118976 0.81325366
## q.97.5 mean st.error
## 1: 7.20331742 3.77560592 1.534385977
## 2: -0.05310227 -0.10729112 0.031556005
## 3: 1.07653035 0.04598816 0.507826776
## 4: 0.02801667 0.01039520 0.008932604
## 5: 0.99250538 0.54517771 0.208341093
##
## $sim.analysis$logistic.p.summary
## Coefficient reject.proportion non.reject.proportion
## 1: (Intercept) 0.7745 0.2255
## 2: Age 0.9835 0.0165
## 3: FemaleTRUE 0.0555 0.9445
## 4: Health.Percentile 0.2075 0.7925
## 5: Exercise.Sessions 0.8185 0.1815
##
## $sim.analysis$logistic.stats.summary
## stat q.2.5 q.10 q.50 q.90 q.97.5 mean
## 1: deviance 83.72335 92.14815 105.2289 116.2879 121.4046 104.6476
## 2: aic 93.72335 102.14815 115.2289 126.2879 131.4046 114.6476
## 3: df.residual 95.00000 95.00000 95.0000 95.0000 95.0000 95.0000
## 4: null.deviance 122.17286 128.20710 134.6023 137.9888 138.5894 133.5637
## 5: df.null 99.00000 99.00000 99.0000 99.0000 99.0000 99.0000
## 6: iter 3.00000 4.00000 4.0000 5.0000 5.0000 4.4290
## 7: dispersion 1.00000 1.00000 1.0000 1.0000 1.0000 1.0000
## st.error
## 1: 9.6689714
## 2: 9.6689714
## 3: 0.0000000
## 4: 4.2571340
## 5: 0.0000000
## 6: 0.5823475
## 7: 0.0000000
Simulation studies can be used to investigate or substantiate many qualities of a planned study. In this section, we will illustrate some of the applications of the earlier results.
study.t <- simstudy.t(n = 25, mean = 0.3, sd = 1, num.experiments = 2000, alternative = "greater",
mu = 0, conf.level = 0.95, the.quantiles = c(0.025, 0.975), experiment.name = "experiment",
value.name = "x", seed = 817)
print(study.t)
## $simdat.t
## experiment x
## 1: 1 1.0668056
## 2: 1 -0.6146314
## 3: 1 -0.4373689
## 4: 1 0.3528564
## 5: 1 0.8214256
## ---
## 49996: 2000 -1.9832003
## 49997: 2000 2.0778384
## 49998: 2000 0.9263137
## 49999: 2000 1.5917713
## 50000: 2000 2.0528758
##
## $test.statistics.t
## experiment statistic df p.value lower.ci upper.ci estimate
## 1: 1 1.3583148 24 0.093498018 -0.06851698 Inf 0.26397123
## 2: 2 -0.1881835 24 0.573842530 -0.36901378 Inf -0.03656656
## 3: 3 0.5158334 24 0.305345272 -0.20995274 Inf 0.09062445
## 4: 4 1.3301625 24 0.097983811 -0.06490505 Inf 0.22676605
## 5: 5 2.8754202 24 0.004163486 0.22032277 Inf 0.54401014
## ---
## 1996: 1996 2.4060883 24 0.012092597 0.12504056 Inf 0.43276171
## 1997: 1997 -0.0905527 24 0.535700234 -0.31350069 Inf -0.01575874
## 1998: 1998 -0.1703523 24 0.566919482 -0.37240923 Inf -0.03372294
## 1999: 1999 0.3658412 24 0.358844273 -0.18927582 Inf 0.05148162
## 2000: 2000 2.1611001 24 0.020442317 0.09949100 Inf 0.47756865
## null.value alternative method
## 1: 0 greater One Sample t-test
## 2: 0 greater One Sample t-test
## 3: 0 greater One Sample t-test
## 4: 0 greater One Sample t-test
## 5: 0 greater One Sample t-test
## ---
## 1996: 0 greater One Sample t-test
## 1997: 0 greater One Sample t-test
## 1998: 0 greater One Sample t-test
## 1999: 0 greater One Sample t-test
## 2000: 0 greater One Sample t-test
##
## $sim.analysis.t
## $sim.analysis.t$estimate.summary
## q.2.5 q.97.5 mean st.error
## 1: -0.1003992 0.6894601 0.2987991 0.1995733
##
## $sim.analysis.t$stat.summary
## q.2.5 q.97.5 mean st.error
## 1: -0.4910528 3.739643 1.538052 1.061821
##
## $sim.analysis.t$p.value.summary
## reject.proportion non.reject.proportion
## 1 0.426 0.574
##
## $sim.analysis.t$ci.limit.summary
## q.2.5 q.97.5 mean st.error
## 1: -0.451764 0.3668702 -0.04020412 0.2049304
This demonstrates the selected quantiles, mean, and standard error for the estimated mean of the one-sample \(t\) test based upon the \(B\) experiments conducted in the simulation study.
print(study.t$sim.analysis.t$estimate.summary)
## q.2.5 q.97.5 mean st.error
## 1: -0.1003992 0.6894601 0.2987991 0.1995733
This demonstrates the selected quantiles, mean, and standard error for the test statistics of the one-sample \(t\) test based upon the \(B\) experiments conducted in the simulation study.
print(study.t$sim.analysis.t$stat.summary)
## q.2.5 q.97.5 mean st.error
## 1: -0.4910528 3.739643 1.538052 1.061821
study.t2 <- simstudy.t2(nx = 30, ny = 40, meanx = 0, meany = 0.2, sdx = 1, sdy = 1,
num.experiments = 2000, alternative = "less", mu = 0, conf.level = 0.9, the.quantiles = c(0.1,
0.5, 0.9), experiment.name = "experiment_id", group.name = "category", x.value = "a",
y.value = "b", value.name = "measurement", seed = 41)
print(study.t2)
## $simdat.t2
## experiment_id category measurement
## 1: 1 a -0.79436834
## 2: 1 a -0.05380187
## 3: 1 a -1.03638729
## 4: 1 a 0.32088088
## 5: 1 a -0.15735895
## ---
## 139996: 2000 b -1.24812965
## 139997: 2000 b 1.01921469
## 139998: 2000 b 1.92539513
## 139999: 2000 b 0.43106238
## 140000: 2000 b 0.39878221
##
## $test.statistics.t2
## experiment_id statistic df p.value lower.ci upper.ci
## 1: 1 -2.4406805 66.79080 0.008659102 -Inf -0.240911381
## 2: 2 -0.5966090 50.20063 0.276724805 -Inf 0.166058545
## 3: 3 -2.0641443 67.73438 0.021418533 -Inf -0.166332119
## 4: 4 -1.1731405 66.66282 0.122457116 -Inf 0.036474751
## 5: 5 -1.9871163 65.72674 0.025539963 -Inf -0.159738152
## ---
## 1996: 1996 -1.4066423 51.86134 0.082748915 -Inf -0.028622042
## 1997: 1997 -1.2632246 67.76179 0.105418846 -Inf 0.006960161
## 1998: 1998 -1.2048580 60.53539 0.116473685 -Inf 0.021924034
## 1999: 1999 0.5838079 52.11040 0.719065871 -Inf 0.441892144
## 2000: 2000 -0.3548475 59.68726 0.361977295 -Inf 0.206616067
## estimate x.estimate y.estimate null.value alternative
## 1: -0.51293286 -0.31924839 0.19368447 0 less
## 2: -0.14112115 -0.20991200 -0.06879084 0 less
## 3: -0.44590567 -0.12689675 0.31900892 0 less
## 4: -0.35293676 0.09945739 0.45239415 0 less
## 5: -0.45833178 -0.08318576 0.37514602 0 less
## ---
## 1996: -0.37088896 0.08419039 0.45507935 0 less
## 1997: -0.28411916 -0.05211736 0.23200180 0 less
## 1998: -0.29080441 0.02080981 0.31161422 0 less
## 1999: 0.13709093 0.10303569 -0.03405524 0 less
## 2000: -0.07791007 0.08914319 0.16705325 0 less
## method
## 1: Welch Two Sample t-test
## 2: Welch Two Sample t-test
## 3: Welch Two Sample t-test
## 4: Welch Two Sample t-test
## 5: Welch Two Sample t-test
## ---
## 1996: Welch Two Sample t-test
## 1997: Welch Two Sample t-test
## 1998: Welch Two Sample t-test
## 1999: Welch Two Sample t-test
## 2000: Welch Two Sample t-test
##
## $sim.analysis.t2
## $sim.analysis.t2$estimate.summary
## q.10 q.50 q.90 mean st.error
## 1: -0.4995356 -0.2004179 0.09904178 -0.1983179 0.2413597
##
## $sim.analysis.t2$stat.summary
## q.10 q.50 q.90 mean st.error
## 1: -2.072232 -0.848695 0.4213835 -0.8301772 1.019355
##
## $sim.analysis.t2$df.summary
## q.10 q.50 q.90 mean st.error
## 1: 54.78029 62.84369 67.5426 61.79574 5.090286
##
## $sim.analysis.t2$p.value.summary
## reject.proportion non.reject.proportion
## 1 0.317 0.683
##
## $sim.analysis.t2$ci.limit.summary
## q.10 q.50 q.90 mean st.error
## 1: -0.1842525 0.1102616 0.416871 0.1126389 0.2419847
This demonstrates the selected quantiles, mean, and standard error for the estimated difference in means of the two-sample \(t\) test based upon the \(B\) experiments conducted in the simulation study.
print(study.t2$sim.analysis.t2$estimate.summary)
## q.10 q.50 q.90 mean st.error
## 1: -0.4995356 -0.2004179 0.09904178 -0.1983179 0.2413597
This demonstrates the selected quantiles, mean, and standard error for the test statistics of the two-sample \(t\) test based upon the \(B\) experiments conducted in the simulation study.
print(study.t2$sim.analysis.t2$stat.summary)
## q.10 q.50 q.90 mean st.error
## 1: -2.072232 -0.848695 0.4213835 -0.8301772 1.019355
study.prop <- simstudy.prop(n = 30, p.actual = 0.42, p.hypothesized = 0.5, num.experiments = 2000,
alternative = "less", conf.level = 0.92, correct = T, the.quantiles = c(0.04,
0.5, 0.96), experiment.name = "simulation_id", value.name = "success", seed = 8001)
print(study.prop)
## $simdat.prop
## simulation_id success
## 1: 1 1
## 2: 1 0
## 3: 1 1
## 4: 1 1
## 5: 1 1
## ---
## 59996: 2000 0
## 59997: 2000 0
## 59998: 2000 0
## 59999: 2000 0
## 60000: 2000 1
##
## $test.statistics.prop
## simulation_id statistic df p.value lower.ci upper.ci estimate
## 1: 1 0.30000000 1 0.291941210 0 0.5767450 0.4333333
## 2: 2 4.03333333 1 0.022304859 0 0.4441283 0.3000000
## 3: 3 0.03333333 1 0.427566070 0 0.6085396 0.4666667
## 4: 4 1.63333333 1 0.100621310 0 0.5115639 0.3666667
## 5: 5 0.00000000 1 0.500000000 0 0.6242420 0.5000000
## ---
## 1996: 1996 0.30000000 1 0.291941210 0 0.5767450 0.4333333
## 1997: 1997 1.63333333 1 0.100621310 0 0.5115639 0.3666667
## 1998: 1998 4.03333333 1 0.022304859 0 0.4441283 0.3000000
## 1999: 1999 0.03333333 1 0.427566070 0 0.6085396 0.4666667
## 2000: 2000 5.63333333 1 0.008811045 0 0.4094787 0.2666667
## null.value alternative
## 1: 0.5 less
## 2: 0.5 less
## 3: 0.5 less
## 4: 0.5 less
## 5: 0.5 less
## ---
## 1996: 0.5 less
## 1997: 0.5 less
## 1998: 0.5 less
## 1999: 0.5 less
## 2000: 0.5 less
## method
## 1: 1-sample proportions test with continuity correction
## 2: 1-sample proportions test with continuity correction
## 3: 1-sample proportions test with continuity correction
## 4: 1-sample proportions test with continuity correction
## 5: 1-sample proportions test without continuity correction
## ---
## 1996: 1-sample proportions test with continuity correction
## 1997: 1-sample proportions test with continuity correction
## 1998: 1-sample proportions test with continuity correction
## 1999: 1-sample proportions test with continuity correction
## 2000: 1-sample proportions test with continuity correction
##
## $sim.analysis.prop
## $sim.analysis.prop$estimate.summary
## q.4 q.50 q.96 mean st.error
## 1: 0.2666667 0.4333333 0.5666667 0.42205 0.0885905
##
## $sim.analysis.prop$stat.summary
## q.4 q.50 q.96 mean st.error
## 1: 0 0.5666667 5.633333 1.317267 1.846495
##
## $sim.analysis.prop$p.value.summary
## reject.proportion non.reject.proportion
## 1 0.2135 0.7865
##
## $sim.analysis.prop$ci.limit.summary
## q.4 q.50 q.96 mean st.error
## 1: 0.4094787 0.576745 0.7008002 0.5624227 0.08459248
This demonstrates the selected quantiles, mean, and standard error for the estimated success probability of the one-sample proportions test based upon the \(B\) experiments conducted in the simulation study.
print(study.prop$sim.analysis.prop$estimate.summary)
## q.4 q.50 q.96 mean st.error
## 1: 0.2666667 0.4333333 0.5666667 0.42205 0.0885905
This demonstrates the selected quantiles, mean, and standard error for the test statistics of the one-sample proportions test based upon the \(B\) experiments conducted in the simulation study.
print(study.prop$sim.analysis.prop$stat.summary)
## q.4 q.50 q.96 mean st.error
## 1: 0 0.5666667 5.633333 1.317267 1.846495
This demonstrates the selected quantiles, mean, and standard error for the estimated difference in proportions of the two-sample proportions test based upon the \(B\) experiments conducted in the simulation study.
print(study.prop2$sim.analysis.prop2$estimate.summary)
## q.2.5 q.50 q.97.5 mean st.error
## 1: -0.275 -0.05 0.1916667 -0.04718333 0.1198669
This demonstrates the selected quantiles, mean, and standard error for the test statistics of the two-sample proportions test based upon the \(B\) experiments conducted in the simulation study.
print(study.prop2$sim.analysis.prop2$stat.summary)
## q.2.5 q.50 q.97.5 mean st.error
## 1: 0 0.2507323 4.444274 0.793184 1.308529
This demonstrates the selected quantiles, mean, and standard error for the test statistics of the \(\chi^2\) test of goodness of fit based upon the \(B\) experiments conducted in the simulation study.
print(study.chisq.gf$sim.analysis$stat.summary)
## q.25 q.75 mean st.error
## 1: 2.386667 9.053333 6.226667 4.513542
This demonstrates the selected quantiles, mean, and standard error for the test statistics of the \(\chi^2\) test of independence based upon the \(B\) experiments conducted in the simulation study.
print(study.chisq.ind$sim.analysis$stat.summary)
## q.2.5 q.97.5 mean st.error
## 1: 3.547409 26.60943 12.59337 6.036573
This demonstrates the selected quantiles, mean, and standard errors of the estimated coefficients of the linear regression model based upon the \(B\) experiments conducted in the simulation study.
print(study.lm$sim.analysis$lm.estimate.summary)
## Coefficient q.25 q.75 mean st.error
## 1: (Intercept) 153.8055564 163.28857287 158.59064763 7.00464638
## 2: Age 0.3950494 0.56843278 0.48232587 0.13047951
## 3: FemaleTRUE -16.4975226 -13.48388116 -14.99592332 2.35117629
## 4: Health.Percentile -0.1260884 -0.06945318 -0.09769115 0.04188296
## 5: Exercise.Sessions -0.7333715 0.42029077 -0.14193699 0.86623729
## 6: Healthy.LifestyleTRUE -4.7566724 -1.10116880 -2.93533317 2.75277017
This demonstrates the selected quantiles, mean, and standard errors of the estimated coefficients of the logistic regression model based upon the \(B\) experiments conducted in the simulation study.
print(study.logistic$sim.analysis$logistic.estimate.summary)
## Coefficient q.2.5 q.10 q.50 q.90
## 1: (Intercept) 1.022134461 1.9366636009 3.68466173 5.73934700
## 2: Age -0.174310418 -0.1493509481 -0.10465432 -0.06911418
## 3: FemaleTRUE -1.005002023 -0.5980217453 0.06354152 0.68079738
## 4: Health.Percentile -0.006602118 -0.0006587377 0.01011092 0.02208901
## 5: Exercise.Sessions 0.166758003 0.2916224153 0.53118976 0.81325366
## q.97.5 mean st.error
## 1: 7.20331742 3.77560592 1.534385977
## 2: -0.05310227 -0.10729112 0.031556005
## 3: 1.07653035 0.04598816 0.507826776
## 4: 0.02801667 0.01039520 0.008932604
## 5: 0.99250538 0.54517771 0.208341093
The rate of false positive conclusions can be empirically investigated in a simulation study of repeated experiments. This is performed when the data are generated under a scenario of no effect.
Here we show the rate of Type I errors (rejected null hypotheses) and true negatives (non-rejected null hypotheses) for a one-sample \(t\) test.
study.noeffect.t <- simstudy.t(n = 15, mean = 0, sd = 2, num.experiments = 2000,
seed = 71)
print(study.noeffect.t$sim.analysis.t$p.value.summary)
## reject.proportion non.reject.proportion
## 1 0.046 0.954
Here we show the rate of Type I errors (rejected null hypotheses) and true negatives (non-rejected null hypotheses) for a two-sample \(t\) test.
study.noeffect.t2 <- simstudy.t2(nx = 20, ny = 25, meanx = 0, meany = 0, sdx = 1,
sdy = 1, num.experiments = 2000, alternative = "greater", seed = 129)
print(study.noeffect.t2$sim.analysis.t2$p.value.summary)
## reject.proportion non.reject.proportion
## 1 0.045 0.955
Here we show the rate of Type I errors (rejected null hypotheses) and true negatives (non-rejected null hypotheses) for a one-sample proportions test.
study.noeffect.prop <- simstudy.prop(n = 40, p.actual = 0.25, p.hypothesized = 0.25,
num.experiments = 2000, alternative = "less", conf.level = 0.95, seed = 98)
print(study.noeffect.prop$sim.analysis.prop$p.value.summary)
## reject.proportion non.reject.proportion
## 1 0.0165 0.9835
Here we show the rate of Type I errors (rejected null hypotheses) and true negatives (non-rejected null hypotheses) for a two-sample proportions test.
study.noeffect.prop2 <- simstudy.prop2(nx = 40, ny = 40, px = 0.4, py = 0.4, num.experiments = 2000,
alternative = "two.sided", conf.level = 0.95, seed = 71)
print(study.noeffect.prop2$sim.analysis.prop2$p.value.summary)
## reject.proportion non.reject.proportion
## 1 0.028 0.972
Here we show the rate of Type I errors (rejected null hypotheses) and true negatives (non-rejected null hypotheses) for a \(\chi^2\) test of goodness of fit in which the data are generated from the hypothesized distribution.
study.noeffect.chisq.gf <- simstudy.chisq.test.gf(n = 100, values = LETTERS[1:5],
actual.probs = rep.int(x = 0.2, times = 5), hypothesized.probs = rep.int(x = 0.2,
times = 5), num.experiments = 2000, conf.level = 0.95, seed = 3)
print(study.noeffect.chisq.gf$sim.analysis$p.value.summary)
## reject.proportion non.reject.proportion
## 1: 0.0645 0.9355
Here we show the rate of Type I errors (rejected null hypotheses) and true negatives (non-rejected null hypotheses) for a \(\chi^2\) test of independence in which the data are generated from the hypothesized distribution.
study.noeffect.chisq.ind <- simstudy.chisq.test.ind(n = c(50, 50), values = LETTERS[1:5],
probs = matrix(data = 0.2, nrow = 2, ncol = 5), num.experiments = 2000, conf.level = 0.95,
seed = 8)
print(study.noeffect.chisq.ind$sim.analysis$p.value.summary)
## reject.proportion non.reject.proportion
## 1: 0.052 0.948
Here we show the rate of Type I errors (rejected null hypotheses) and true negatives (non-rejected null hypotheses) for a linear regression in which the Health.Score does not depend on the subject’s Weight.
study.noeffect.lm <- simstudy.lm(the.steps = c("Age ~ N(50,10)", "Weight ~ N(150, 10)",
"Health.Score ~ lm(47 + 0.1 * Age + N(0, 5))"), n = 100, num.experiments = 2000,
the.formula = Health.Score ~ Age + Weight, conf.level = 0.9, seed = 4)
print(study.noeffect.lm$sim.analysis$lm.p.summary)
## Coefficient reject.proportion non.reject.proportion
## 1: (Intercept) 1.0000 0.0000
## 2: Age 0.6245 0.3755
## 3: Weight 0.0975 0.9025
Here we show the rate of Type I errors (rejected null hypotheses) and true negatives (non-rejected null hypotheses) for a logistic regression in which the Hospital status does not depend on the subject’s Weight.
study.noeffect.logistic <- simstudy.logistic(the.steps = c("Age ~ N(50,10)", "Weight ~ N(150, 10)",
"Hospital ~ logistic(-2 + 0.05 * Age)"), n = 100, num.experiments = 2000, the.formula = Hospital ~
Age + Weight, conf.level = 0.4, seed = 31)
print(study.noeffect.logistic$sim.analysis$logistic.p.summary)
## Coefficient reject.proportion non.reject.proportion
## 1: (Intercept) 0.6605 0.3395
## 2: Age 0.9705 0.0295
## 3: Weight 0.6110 0.3890
When an effect exists, the rate of true positive conclusions can be empirically investigated in a simulation study of repeated experiments.
Here we show the rate of Type II errors (non-rejected null hypotheses) and true positives for a one-sample \(t\) test.
study.effect.t <- simstudy.t(n = 50, mean = 0.3, sd = 1, num.experiments = 2000,
alternative = "greater", seed = 44)
print(study.effect.t$sim.analysis.t$p.value.summary)
## reject.proportion non.reject.proportion
## 1 0.679 0.321
Here we show the rates of Type II errors (non-rejected null hypotheses) and true positives for a two-sample \(t\) test.
study.effect.t2 <- simstudy.t2(nx = 100, ny = 100, meanx = 52, meany = 50, sdx = 5,
sdy = 5, num.experiments = 2000, seed = 93, alternative = "two.sided")
print(study.effect.t2$sim.analysis.t2$p.value.summary)
## reject.proportion non.reject.proportion
## 1 0.803 0.197
Here we show the rates of Type II errors (non-rejected null hypotheses) and true positives for a one-sample proportions test.
study.effect.prop <- simstudy.prop(n = 300, p.actual = 0.8, p.hypothesized = 0.75,
num.experiments = 2000, alternative = "greater", conf.level = 0.95, seed = 81)
print(study.effect.prop$sim.analysis.prop$p.value.summary)
## reject.proportion non.reject.proportion
## 1 0.644 0.356
Here we show the rates of Type II errors (non-rejected null hypotheses) and true positives for a two-sample proportions test.
study.effect.prop2 <- simstudy.prop2(nx = 500, ny = 500, px = 0.5, py = 0.45, num.experiments = 2000,
alternative = "greater", conf.level = 0.95, seed = 117)
print(study.effect.prop2$sim.analysis.prop2$p.value.summary)
## reject.proportion non.reject.proportion
## 1 0.4685 0.5315
Here we show the rate of Type II errors (non-rejected null hypotheses) and true positives for a \(\chi^2\) test of goodness of fit.
study.effect.chisq.gf <- simstudy.chisq.test.gf(n = 100, values = LETTERS[1:5], actual.probs = c(0.3,
0.35, 0.15, 0.1, 0.1), hypothesized.probs = rep.int(x = 0.2, times = 5), num.experiments = 2000,
conf.level = 0.95, seed = 83)
print(study.effect.chisq.gf$sim.analysis$p.value.summary)
## reject.proportion non.reject.proportion
## 1: 0.9965 0.0035
Here we show the rates of Type II errors (non-rejected null hypotheses) and true positives for a \(\chi^2\) test of independence.
study.effect.chisq.ind <- simstudy.chisq.test.ind(n = c(300, 300), values = LETTERS[1:5],
probs = matrix(data = c(0.25, 0.25, 0.2, 0.2, 0.1, rep.int(x = 0.2, times = 5)),
nrow = 2, ncol = 5, byrow = T), num.experiments = 2000, conf.level = 0.95,
seed = 8)
print(study.effect.chisq.ind$sim.analysis$p.value.summary)
## reject.proportion non.reject.proportion
## 1: 0.8485 0.1515
Here we show the rates of Type II errors (non-rejected null hypotheses) and true positives for a linear regression in which the Health.Score depends on the subject’s Weight.
study.effect.lm <- simstudy.lm(the.steps = c("Age ~ N(50,10)", "Weight ~ N(150, 10)",
"Health.Score ~ lm(40 + 0.1 * Age + 0.05 * Weight + N(0, 5))"), n = 100, num.experiments = 2000,
the.formula = Health.Score ~ Age + Weight, conf.level = 0.9, seed = 4)
print(study.effect.lm$sim.analysis$lm.p.summary)
## Coefficient reject.proportion non.reject.proportion
## 1: (Intercept) 1.0000 0.0000
## 2: Age 0.6245 0.3755
## 3: Weight 0.2630 0.7370
Here we show the rates of Type II errors (non-rejected null hypotheses) and true positives for a logistic regression in which the Hospital status depends on the subject’s Weight.
glm.steps <- c("Age ~ N(50,10)", "Weight ~ N(170, 15)", "Hospital ~ logistic(-2 + 0.1 * (Age-50) + 0.08 * (Weight-170))")
study.effect.logistic <- simstudy.logistic(the.steps = glm.steps, n = 100, num.experiments = 2000,
the.formula = Hospital ~ Age + Weight, conf.level = 0.95, seed = 31)
print(study.effect.logistic$sim.analysis$logistic.p.summary)
## Coefficient reject.proportion non.reject.proportion
## 1: (Intercept) 0.9990 0.0010
## 2: Age 0.8990 0.1010
## 3: Weight 0.9675 0.0325