This package implements methods to evaluate the performance characteristics of various point and interval estimators for adaptive two-stage designs with prespecified sample-size recalculation rules. Further, it allows for evaluation of these estimators on real datasets, and it implements methods to calculate p-values.
Currently, it works for designs objects which were produced by the
R-package adoptr
, which calculates optimal design
parameters for adaptive two-stage designs. You can learn about adoptr
here: optad.github.io/adoptr/.
An introductory vignette covering common usecases is given at https://jan-imbi.github.io/adestr/articles/Introduction.html.
This package comes suite of unit tests. The code of the test cases can be viewed here: https://github.com/jan-imbi/adestr/tree/master/tests/testthat. The authors assume no responsibility for the correctness of the code or results produced by its usage. Use at your own risk.
You may also be interested in the reference implementation looking at the https://github.com/jan-imbi/adestr/blob/master/R/reference_implementation.R. It uses the same notation as in our paper (doi.org/10.1002/sim.10020) and may therefore be easier to understand at first.
In order to showcase the capabilities of this package, we need a trial design first. We refer to the example from the adoptr documentation for this. You can read more about optimal adaptive designs fitted via the adoptr package here: optad.github.io/adoptr/articles/adoptr_jss.html.
For the sake of this introduction, a pre-computed version of the
first example from optad.github.io/adoptr/articles/adoptr.html
is provided with this package via the get_example_design
function.
Now that we have created an optimal adaptive design, we can
investigate the performance characteristics of various estimators for
the mean in that design. To this end, the
evaluate_estimator
function can be used.
evaluate_estimator(
score = MSE(),
estimator = SampleMean(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = 0.3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: Sample mean
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Expectation: 0.3056727
#> Bias: 0.005672677
#> Variance: 0.03777784
#> MSE: 0.03781002
The mean squared error of the sample mean depends on the true
underlying value of the paramter \(\mu\), which of course is unknown.
Therefore, it may be advisable to use the
evaluate_estimator
function on an array of values for \(\mu\) to investigate the distributional
properties of an estimator.
In the following, the MSE of the sample mean vs. a weighted sample mean with fixed weights will be plotted.
mse_mle <- evaluate_estimator(
score = MSE(),
estimator = SampleMean(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = seq(-0.75, 1.32, .03),
sigma = 1
)
mse_weighted_sample_means <- evaluate_estimator(
score = MSE(),
estimator = WeightedSampleMean(w1 = .8),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = seq(-0.75, 1.32, .03),
sigma = 1
)
plot(c(mse_mle, mse_weighted_sample_means))
Median unbiased estimators are estimators where the probability of
overestimation is equal to the probability of underestimation. They can
be derived by choosing a sample space ordering. In adestr
,
5 different sample-space orderings are implemented: The MLE (maximum
likelihood estimator) ordering, the LR (likelihood ratio) test ordering,
the ST (score test) ordering, the SWCF (stage-wise combination function)
ordering, and the NP (Neyman-Pearson) ordering. The latter (NP ordering)
is only useful for the calculation of p-values, estimators derived from
that ordering are usually nonsense.
Lets look at the ‘median-unbiased’ property for estimators derived from the four orderings:
evaluate_estimator(
score = OverestimationProbability(),
estimator = MedianUnbiasedMLEOrdering(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = 0.4,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: Median unbiased (MLE ordering)
#> Assumed sigma: 1
#> Assumed mu: 0.4
#> Results:
#> Probability of overestimation: 0.4988042
evaluate_estimator(
score = OverestimationProbability(),
estimator = MedianUnbiasedLikelihoodRatioOrdering(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = 0.4,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: Median unbiased (LR test ordering)
#> Assumed sigma: 1
#> Assumed mu: 0.4
#> Results:
#> Probability of overestimation: 0.4990921
evaluate_estimator(
score = OverestimationProbability(),
estimator = MedianUnbiasedScoreTestOrdering(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = 0.4,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: Median unbiased (Score test ordering)
#> Assumed sigma: 1
#> Assumed mu: 0.4
#> Results:
#> Probability of overestimation: 0.5037828
evaluate_estimator(
score = OverestimationProbability(),
estimator = MedianUnbiasedStagewiseCombinationFunctionOrdering(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = 0.4,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: Median unbiased (SWCF ordering)
#> Assumed sigma: 1
#> Assumed mu: 0.4
#> Results:
#> Probability of overestimation: 0.4976513
Compare this to the Overestimation probability of the sample mean:
evaluate_estimator(
score = OverestimationProbability(),
estimator = SampleMean(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = 0.4,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: Sample mean
#> Assumed sigma: 1
#> Assumed mu: 0.4
#> Results:
#> Probability of overestimation: 0.5943933
Apart from sample-space ordering based methods, there are various other ways of defining alternative point estimators which may have desirable properties. Here are a couple that are presented in our paper evaluated for mu=0.3 and sigma=1.
evaluate_estimator(
score = MSE(),
estimator = SampleMean(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = 0.3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: Sample mean
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Expectation: 0.3056727
#> Bias: 0.005672677
#> Variance: 0.03777784
#> MSE: 0.03781002
evaluate_estimator(
score = MSE(),
estimator = BiasReduced(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = 0.3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: Bias reduced MLE (iterations=1)
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Expectation: 0.3042277
#> Bias: 0.004227723
#> Variance: 0.03453116
#> MSE: 0.03454903
evaluate_estimator(
score = MSE(),
estimator = PseudoRaoBlackwell(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = 0.3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: Pseudo Rao-Blackwellized
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Expectation: 0.3022287
#> Bias: 0.002228688
#> Variance: 0.03268444
#> MSE: 0.03268941
evaluate_estimator(
score = MSE(),
estimator = RaoBlackwell(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = 0.3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: Rao-Blackwellized
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Expectation: 0.300153
#> Bias: 0.0001530249
#> Variance: 0.03548938
#> MSE: 0.03548941
evaluate_estimator(
score = MSE(),
estimator = AdaptivelyWeightedSampleMean(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = 0.3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: AdaptivelyWeightedSampleMean(w1^2=50%)
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Expectation: 0.3039742
#> Bias: 0.003974222
#> Variance: 0.03817511
#> MSE: 0.0381909
The coverage of an interval estimator is the probability of that an estimator will cover the true value of the parameter in question. It may be evaluated like this:
evaluate_estimator(
score = Coverage(),
estimator = NaiveCI(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = .07,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: Naive CI
#> Assumed sigma: 1
#> Assumed mu: 0.07
#> Results:
#> Coverage: 0.933259
As you can see, the naive confidence interval does not have the correct coverage of 95%. Here is an example for an estimator that achieves the correct coverage:
evaluate_estimator(
score = Coverage(),
estimator = LikelihoodRatioOrderingCI(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = .07,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: LR test ordering CI
#> Assumed sigma: 1
#> Assumed mu: 0.07
#> Results:
#> Coverage: 0.9500889
For the parameter of mu that we chose in this example, the naive confidence interval performs particularly bad. We can plot the coverage of the naive confidence interval for an array of values like this:
Amongst the interval estimators which have the correct coverage, one might be interested in selecting the one with the least width. We can evaluate the expected width of an interval estimator for a particular set of assumptions like this:
evaluate_estimator(
score = Width(),
estimator = MLEOrderingCI(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = .3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: MLE ordering CI
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Width: 0.6453952
evaluate_estimator(
score = Width(),
estimator = LikelihoodRatioOrderingCI(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = .3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: LR test ordering CI
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Width: 0.6351904
evaluate_estimator(
score = Width(),
estimator = ScoreTestOrderingCI(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = .3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: Score test ordering CI
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Width: 0.6502033
evaluate_estimator(
score = Width(),
estimator = StagewiseCombinationFunctionOrderingCI(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = .3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: SWCF ordering CI
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Width: 0.6575988
When choosing a combination of point and interval estimator to report at the end of a trial, one might want the point estimator to be more or less in the middle of the respective interval. To evaluate the ‘centrality’ of an estimator, which in this case is defined as the difference between the distance of the point estimator the lower end of an interval and the upper end of an interval, one can use the following command:
evaluate_estimator(
score = Centrality(interval = StagewiseCombinationFunctionOrderingCI()),
estimator = SampleMean(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = .3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: Sample mean
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Centrality with respect to SWCF ordering CI: 0.00715509
evaluate_estimator(
score = Centrality(interval = NaiveCI()),
estimator = SampleMean(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = .3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: Sample mean
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Centrality with respect to Naive CI: -4.072497e-18
In the framework of optimal adaptive designs, the rejection boundaries for the primary testing decision of a design are optimized directly and are not derived from a common test statistic. Therefore, confidence intervals derived from such statistics do not necessarily agree with the primary test decision. One may evaluate the chance of agreement like this:
evaluate_estimator(
score = TestAgreement(),
estimator = MLEOrderingCI(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = .3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: MLE ordering CI
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Agreement with test decision: 0.9867598
evaluate_estimator(
score = TestAgreement(),
estimator = LikelihoodRatioOrderingCI(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = .3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: LR test ordering CI
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Agreement with test decision: 0.9859001
evaluate_estimator(
score = TestAgreement(),
estimator = ScoreTestOrderingCI(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = .3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: Score test ordering CI
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Agreement with test decision: 0.8269152
evaluate_estimator(
score = TestAgreement(),
estimator = StagewiseCombinationFunctionOrderingCI(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = .3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: SWCF ordering CI
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Agreement with test decision: 1.000034
The confidence interval derived from the stage-wise combination function ordering (by its construction) always agrees with the primary testing decision.
Like confidence intervals, p-values always come with an associated test decision. However, for the same reason as described in the chapter about interval estimators, these test decision derived from various ways of calculating p-values may not necessarily agree with the primary test decision of an optimal adaptive design. One may evaluate the probability of agreement for a particular set of assumptions like this:
evaluate_estimator(
score = TestAgreement(),
estimator = MLEOrderingPValue(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = .3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: MLE ordering p-value
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Agreement with test decision: 0.9846693
evaluate_estimator(
score = TestAgreement(),
estimator = LikelihoodRatioOrderingPValue(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = .3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: LR test ordering p-value
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Agreement with test decision: 0.9859001
evaluate_estimator(
score = TestAgreement(),
estimator = ScoreTestOrderingPValue(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = .3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: Score test ordering p-value
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Agreement with test decision: 0.8269606
evaluate_estimator(
score = TestAgreement(),
estimator = StagewiseCombinationFunctionOrderingPValue(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
mu = .3,
sigma = 1
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Estimator: SWCF ordering p-value
#> Assumed sigma: 1
#> Assumed mu: 0.3
#> Results:
#> Agreement with test decision: 1.000034
Again, we see that only the p-value derived from the stage-wise combination function ordering always agrees with the primary testing decision.
So far we have only looked at evaluating performance characteristics
for a particular set of assumptions, a choice of performance
characteristics and a choice of a few estimators. However, when
designing a trial, one might want to produce results for a variety of
different scenarios, which can be time consuming. The evaluation of
performance characteristics in adestr
can run in parallel
using the future
framework.
library(future.apply)
#> Loading required package: future
# Change to e.g. plan(multisession) for parallel computing
plan(sequential)
# Scenario 1:
scores1 <- list(MSE(), OverestimationProbability())
estimators1 <- list(SampleMean(), BiasReduced())
dist1 <- list(Normal(two_armed = TRUE))
designs1 <- list(get_example_design(two_armed = TRUE))
mu1 <- seq(-1,1,.5)
sigma1 <- 1
# Scenario 2:
scores2 <- list(Coverage(), Width())
estimators2 <- list(NaiveCI(), StagewiseCombinationFunctionOrderingCI())
dist2 <- list(Normal(two_armed = TRUE))
designs2 <- list(get_example_design(two_armed = TRUE))
mu2 <- seq(-1,1,.5)
sigma2 <- 1
# Evaluate in parallel
res <- evaluate_scenarios_parallel(
score_lists = list(scores1, scores2),
estimator_lists = list(estimators1, estimators2),
data_distribution_lists = list(dist1, dist2),
design_lists = list(designs1, designs2),
mu_lists = list(mu1, mu2),
sigma_lists = list(sigma1, sigma2)
)
res[[1]]
#> estimator data_distribution
#> 1 Bias reduced MLE (iterations=1) Normal<two-armed>
#> 2 Bias reduced MLE (iterations=1) Normal<two-armed>
#> 3 Bias reduced MLE (iterations=1) Normal<two-armed>
#> 4 Bias reduced MLE (iterations=1) Normal<two-armed>
#> 5 Bias reduced MLE (iterations=1) Normal<two-armed>
#> 6 Sample mean Normal<two-armed>
#> 7 Sample mean Normal<two-armed>
#> 8 Sample mean Normal<two-armed>
#> 9 Sample mean Normal<two-armed>
#> 10 Sample mean Normal<two-armed>
#> design mu sigma Expectation
#> 1 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> -0.5 1 -0.49939216
#> 2 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> -1.0 1 -0.99999772
#> 3 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> 0.0 1 -0.01606505
#> 4 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> 0.5 1 0.51833127
#> 5 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> 1.0 1 0.99918705
#> 6 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> -0.5 1 -0.50010511
#> 7 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> -1.0 1 -0.99999807
#> 8 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> 0.0 1 -0.02491862
#> 9 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> 0.5 1 0.52702761
#> 10 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> 1.0 1 1.00028935
#> error_Expectation functionEvaluations_Expectation Bias error_Bias
#> 1 1.960169e-05 96 6.078357e-04 1.960169e-05
#> 2 9.514441e-04 62 2.284733e-06 9.514441e-04
#> 3 1.777337e-05 213 -1.606505e-02 1.777337e-05
#> 4 4.691721e-04 591 1.833127e-02 4.691721e-04
#> 5 1.116012e-05 383 -8.129478e-04 1.116012e-05
#> 6 8.315412e-06 96 -1.051082e-04 8.315412e-06
#> 7 9.514289e-04 62 1.932112e-06 9.514289e-04
#> 8 1.594939e-05 213 -2.491862e-02 1.594939e-05
#> 9 2.894096e-04 523 2.702761e-02 2.894096e-04
#> 10 9.213421e-06 383 2.893473e-04 9.213421e-06
#> functionEvaluations_Bias Variance error_Variance
#> 1 96 0.03584328 1.763680e-05
#> 2 62 0.03550108 6.144227e-09
#> 3 213 0.02797111 7.090928e-06
#> 4 591 0.02881243 2.126220e-05
#> 5 383 0.03583686 1.865170e-05
#> 6 96 0.03539081 9.104316e-06
#> 7 62 0.03549953 4.460077e-09
#> 8 213 0.02779093 7.299455e-06
#> 9 523 0.02945197 9.840547e-06
#> 10 383 0.03520539 1.135277e-05
#> functionEvaluations_Variance MSE error_MSE functionEvaluations_MSE
#> 1 96 0.03584365 3.723849e-05 192
#> 2 122 0.03550108 9.514503e-04 184
#> 3 247 0.02822920 2.486430e-05 460
#> 4 707 0.02914847 4.904343e-04 1298
#> 5 213 0.03583752 2.981182e-05 596
#> 6 96 0.03539082 1.741973e-05 192
#> 7 122 0.03549953 9.514333e-04 184
#> 8 247 0.02841187 2.324884e-05 460
#> 9 553 0.03018246 2.992501e-04 1076
#> 10 213 0.03520547 2.056619e-05 596
#> Probability of overestimation error_Probability of overestimation
#> 1 0.5001404 2.251825e-04
#> 2 0.5000217 3.779495e-04
#> 3 0.5055955 1.861311e-03
#> 4 0.5141305 6.286754e-03
#> 5 0.4997007 3.746175e-04
#> 6 0.4999989 2.248484e-05
#> 7 0.5000217 3.779495e-04
#> 8 0.4736938 1.457061e-03
#> 9 0.5550895 1.395147e-02
#> 10 0.4999743 1.249999e-04
#> functionEvaluations_Probability of overestimation
#> 1 528
#> 2 362
#> 3 1393
#> 4 1333
#> 5 407
#> 6 528
#> 7 362
#> 8 1303
#> 9 1363
#> 10 377
#> EstimatorScoreResult
#> 1 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 2 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 3 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 4 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 5 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 6 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 7 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 8 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 9 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 10 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
res[[2]]
#> estimator data_distribution
#> 1 Naive CI Normal<two-armed>
#> 2 Naive CI Normal<two-armed>
#> 3 Naive CI Normal<two-armed>
#> 4 Naive CI Normal<two-armed>
#> 5 Naive CI Normal<two-armed>
#> 6 SWCF ordering CI Normal<two-armed>
#> 7 SWCF ordering CI Normal<two-armed>
#> 8 SWCF ordering CI Normal<two-armed>
#> 9 SWCF ordering CI Normal<two-armed>
#> 10 SWCF ordering CI Normal<two-armed>
#> design mu sigma Coverage
#> 1 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> -0.5 1 0.9500604
#> 2 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> -1.0 1 0.9499647
#> 3 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> 0.0 1 0.9449795
#> 4 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> 0.5 1 0.9372047
#> 5 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> 1.0 1 0.9500391
#> 6 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> -0.5 1 0.9499836
#> 7 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> -1.0 1 0.9499647
#> 8 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> 0.0 1 0.9500699
#> 9 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> 0.5 1 0.9499538
#> 10 TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80> 1.0 1 0.9498600
#> error_Coverage functionEvaluations_Coverage Width error_Width
#> 1 0.0008288183 712 0.7383716 9.444517e-06
#> 2 0.0008820986 482 0.7384423 3.414488e-04
#> 3 0.0010153071 1273 0.6918503 2.783059e-05
#> 4 0.0034991405 1573 0.6778015 1.804778e-04
#> 5 0.0008299689 1453 0.7382501 1.050093e-05
#> 6 0.0008194698 542 0.7384927 8.305015e-06
#> 7 0.0008820986 482 0.7384423 3.414488e-04
#> 8 0.0021817246 1273 0.7163671 1.228666e-04
#> 9 0.0006645727 1063 0.6996062 1.999098e-04
#> 10 0.0008090573 467 0.7384132 1.013179e-05
#> functionEvaluations_Width
#> 1 164
#> 2 62
#> 3 179
#> 4 421
#> 5 383
#> 6 62
#> 7 62
#> 8 281
#> 9 455
#> 10 383
#> EstimatorScoreResult
#> 1 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 2 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 3 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 4 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 5 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 6 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 7 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 8 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 9 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
#> 10 <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>, <S4 class 'EstimatorScoreResult' [package "adestr"] with 8 slots>
You will get back a list of data.frames containing the results for each scenario. Note that within one scenario, the evaluation will take place for the cross-product of all arguments that are supplied! This means that within one scenario, every estimator is evaluated with every score at every point of mu for every sigma and every design. Depending on your settings, this can get very time-consuming very fast, making parallelization essential.
Next, let us look at how to the package can be used to calculate estimates after data has been collected.
The first stage data of a trial might look like this:
set.seed(321)
dat <- data.frame(
endpoint = c(rnorm(56, .3, 1), rnorm(56, 0, 1)),
group = factor(rep(c("trt", "ctl"),
c(56,56)), levels = c("trt", "ctl")),
stage = rep(1, 112)
)
head(dat)
#> endpoint group stage
#> 1 2.00490322 trt 1
#> 2 -0.41203856 trt 1
#> 3 0.02201509 trt 1
#> 4 0.18035098 trt 1
#> 5 0.17603938 trt 1
#> 6 0.56818377 trt 1
analyze(data = dat,
statistics = list(),
data_distribution = Normal(two_armed = TRUE),
design = get_example_design(two_armed = TRUE),
sigma = 1)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Observed number of stages: 1
#> Observed n1 (group 1) 56
#> Observed n1 (group 2) 56
#> Observed n1 (total) 112
#> Z1 1.75
#> Interim decision: continue to second stage
#> Calculated n2(Z1) (per group) 46.99923
#> Calculated c2(Z1) 1.14
The results suggest recruiting 23 more patients per group for the second stage.
We will simulate 47 more patients per group:
dat <- rbind(dat,
data.frame(
endpoint = c(rnorm(47, .3, 1), rnorm(47, 0, 1)),
group = factor(rep(c("trt", "ctl"),
c(47, 47)), levels = c("trt", "ctl")),
stage = rep(2, 94)
))
Now, we can use the analyze
function to evaluate a
selection of estimators on the complete dataset:
analyze(
data = dat,
statistics = c(
SampleMean(),
BiasReduced(),
PseudoRaoBlackwell(),
MedianUnbiasedStagewiseCombinationFunctionOrdering(),
StagewiseCombinationFunctionOrderingCI(),
StagewiseCombinationFunctionOrderingPValue()
),
data_distribution = Normal(two_armed = TRUE),
sigma = 1,
design = get_example_design(two_armed = TRUE)
)
#> Design: TwoStageDesign<n1=56;0.8<=x1<=2.3:n2=18-80>
#> Data Distribution: Normal<two-armed>
#> Observed number of stages: 2
#> Observed n1 (group 1) 56
#> Observed n1 (group 2) 56
#> Observed n1 (total) 112
#> Z1 1.75
#> Interim decision: continue to second stage
#> Calculated n2(Z1) (per group) 46.99923
#> Calculated c2(Z1) 1.14
#> Observed n2 (group 1) 47
#> Observed n2 (group 2) 47
#> Observed n2 (in total) 94
#> Z2 2.71
#> Final test decision: reject null
#>
#> Stage 2 results:
#> Sample mean: 0.434684
#> Bias reduced MLE (iterations=1): 0.4221533
#> Pseudo Rao-Blackwellized: 0.2658506
#> Median unbiased (SWCF ordering): 0.3047428
#> SWCF ordering CI: [0.04435513, 0.5484439]
#> SWCF ordering p-value: 0.01097266
The estimates presented here are for the difference in means of the two normal distributions. Keep in mind that a difference of \(\mu=0.3\) was used in the simulation.
Note that while the median unbiased estimator performs well in this particular example, this is not universally true.