Health-related quality of life is a key outcome in health technology assessments because it is patient-relevant and because it is needed to calculate quality-adjusted life years. Quality of life instruments typically measure health problems in multiple domains on ordinal Likert scales. Value sets or valuation functions convert these profiles of ordinal measures into cardinal health-related utilities between minus infinity and 1, where 1 represents perfect health, 0 represents death, and negative values represent health states worse than death. Because 100% quality of life represents perfect health, health state utilities are bounded above at 1. The lowest possible utility in a local value set further defines a lower limit of health state utilities in a local population. Thus, health state utilities are limited dependent variables. In addition, health state utilities often show gaps between 1 and the next smaller utility in the value set. These gaps occur more frequently in quality of life instruments with few levels per Likert scale, such as the EQ-5D-3L (Mulhern et al. 2018). A final important particularity of health state utilities is that they can arise from multiple latent classes and can exhibit multi-modal marginal densities (Hernández Alava et al. 2014).
Adjusted limited dependent variable mixture models are finite mixtures of normal distributions that account for limits, gaps between 1 and the next smaller utility value, and multi-modality (Hernández Alava, Wailoo, and Ara 2012; Hernández Alava et al. 2013, 2014; Hernández Alava and Wailoo 2015; Mukuria et al. 2019). Accounting for these features can improve empirical fit, parameter identification and predictive accuracy compared to standard regression models. Adjusted limited dependent variable mixture models are thus particularly useful for mapping studies (Gray, Wailoo, and Hernández Alava 2018; Gray, Hernández Alava, and Wailoo 2018; Dixon, Hollingworth, and Sparrow 2020; Yang et al. 2019; Xu et al. 2020; Fuller et al. 2017; Pennington et al. 2020).
The R ‘aldvmm’ package is an implementation of the adjusted limited dependent variable mixture model proposed by Hernández Alava and Wailoo (2015) using normal component distributions and a multinomial logit model of probabilities of component membership.
The objectives of this vignette are to demonstrate the usage of the ‘aldvmm’ package, show important challenges of fitting adjusted limited dependent variable mixture models and validate the R implementation against the STATA package (Hernández Alava and Wailoo 2015) using publicly available data.
Adjusted limited dependent variable mixture models are finite mixtures of normal distributions with \(K\) components \(c\), conditional expectations \(E[y|X, c] = X\beta^{c}\) and standard deviations \(\sigma^{c}\). The probabilities of component membership are estimated using a multinomial logit model, \(P[c|X]=\exp(X\delta^{c})/\sum_{k=1}^{K}\exp(X\delta^{k})\). The model accumulates the density mass of the finite mixture below a minimum value \(\Psi_{1}\) at the value \(\Psi_{1}\), and the density mass above a maximum value \(\Psi_{2}\) at 1. If the maximum value \(\Psi_{2}\) is smaller than 1, the model emulates a value set with a gap between 1 and the next smaller value.
\[\begin{equation} \label{eq:limits} y_{i}|c = \begin{cases} 1 & \text{if } y_{i}|c > \Psi_{2}\\ \Psi_{1} & \text{if } y_{i}|c \leq \Psi_{1}\\ y_{i}|c & \text{if } \Psi_{1} < y_{i}|c \leq \Psi_{2} \end{cases} \end{equation}\]
In this vignette, we estimate the same models of post-operative EQ-5D-3L utilities as Hernández Alava and Wailoo (2015) and include post-operative Oxford Hip Scores (divided by 10) as the only explanatory variable \(x\).
\[\begin{equation} \label{eq:models} \begin{array}{lrl} \text{Model 1:}& E[y|c, X] &= \beta_{0}^{c} + \beta_{1}^{c}x\\ & P[c|X] &= \text{mlogit}(\delta_{0}^{c})\\ &&\\ \text{Model 2:}& E[y|c, X] &= \beta_{0}^{c} + \beta_{1}^{c}x\\ & P[c|X] &= \text{mlogit}(\delta_{0}^{c} + \delta_{1}^{c}x) \end{array} \end{equation}\]
The aldvmm() function fits an adjusted limited dependent variable mixture model using the likelihood function from Hernández Alava and Wailoo (2015). The function calls optimx::optimr() to minimize the negative log-likelihood using analytical gradients from aldvmm.gr(). The aldvmm() function accepts all optimization methods available in optimx::optimr() except for “nlm”, which would require a different implementation of the likelihood function. The default optimization method is “BFGS”.
The model formula in aldvmm() is an object of class “formula” with two parts on the right-hand side of ~. The first part, to the left of the | delimiter, represents the model of the expected values of the \(K\) normal component distributions. The second part, to the right of the | delimiter, represents the multinomial logit model of the probabilities of component membership.
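For example, the two model formulas used below can be written as follows; the first part models the component means, and the second part models the membership probabilities:

eq5d ~ hr | 1    # model 1: constant-only multinomial logit model
eq5d ~ hr | hr   # model 2: membership probabilities depend on 'hr'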
The ‘aldvmm’ package provides four options for generating initial values for the optimization algorithm (a short usage sketch follows the list).
- “zero”: A vector of zeroes (default).
- “random”: A vector of standard normal random values.
- “constant”: Parameter estimates of a constant-only model as starting values for intercepts and standard deviations, and zeroes for all other parameters. The auxiliary constant-only models are fitted with zero starting values.
- “sann”: Parameter estimates from a simulated annealing algorithm.
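For example, initial values from a constant-only model can be requested via the argument ‘init.method’ (a sketch using the model and data introduced in the following sections):

fit_const <- aldvmm::aldvmm(eq5d ~ hr | 1,
                            data = df,
                            psi = c(0.883, -0.594),
                            ncmp = 2,
                            init.method = "constant")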
The ‘aldvmm’ package obtains fitted values using the expected value function from Hernández Alava and Wailoo (2015). Covariance matrices and standard errors of parameters are obtained from a numerical approximation of the Hessian matrix using numDeriv::hessian(). Standard errors of fitted values in the estimation data \(SE^{fit}_{i}\) and standard errors of predicted values in new data \(SE^{pred}_{i}\) are calculated using the delta method (Dowd, Greene, and Norton 2014; Whitmore 1986). \(G_{i}\) denotes the gradient of a fitted value with respect to parameter estimates, \(\Sigma\) denotes the covariance matrix of parameters, and \(MSE\) denotes the mean squared error of fitted versus observed values in the estimation data.
\[\begin{equation}
\begin{array}{rl}
SE^{fit}_{i} &= \sqrt{G_{i}'\Sigma G_{i}}
\end{array}
\end{equation}\]
\[\begin{equation}
\begin{array}{rl}
SE^{pred}_{i} &= \sqrt{MSE + G_{i}'\Sigma G_{i}}
\end{array}
\end{equation}\]
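The following sketch illustrates the delta method for a single observation; ‘pred_fun’ is a hypothetical function that returns the fitted value as a function of the parameter vector, and ‘par_hat’, ‘Sigma’ and ‘mse’ stand for the parameter estimates, their covariance matrix and the mean squared error:

# Delta method sketch (hypothetical objects, see lead-in above)
G <- numDeriv::jacobian(func = pred_fun, x = par_hat)  # gradient G_i
se_fit <- sqrt(G %*% Sigma %*% t(G))                   # SE of fitted value
se_pred <- sqrt(mse + G %*% Sigma %*% t(G))            # SE of predicted value in new data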
The aldvmm() function returns an object of S3 class “aldvmm” for which methods for the generic functions print(), summary(), stats::predict(), stats::coef(), stats::nobs(), stats::vcov(), stats::model.matrix(), stats::formula(), stats::residuals(), stats::update() and sandwich::estfun() are available. Objects of class “aldvmm” can be supplied to sandwich::sandwich(), sandwich::vcovCL(), sandwich::vcovPL(), sandwich::vcovHAC() and sandwich::vcovBS() to estimate robust or clustered standard errors (see example in the appendix). sandwich::estfun() calls aldvmm.sc() to return analytical gradients of the log-likelihood with respect to the parameters for each observation, which allows fast computation of robust standard errors. sandwich::vcovBS() re-estimates the covariance matrix using bootstrapping, with or without clustering, which can be particularly useful when model fitting does not yield a valid covariance matrix.
The latest version of the ‘aldvmm’ package can be installed from CRAN:
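install.packages("aldvmm")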
We analyze the same publicly available EQ-5D-3L utility data of English patients after hip replacement in 2011 and 2012 (NHS Digital 2013) that Hernández Alava and Wailoo (2015) used in their description of the STATA ALDVMM package.
temp <- tempfile()
url <- paste0("https://files.digital.nhs.uk/publicationimport/pub11xxx/",
"pub11359/final-proms-eng-apr11-mar12-data-pack-csv.zip")
download.file(url, temp)
rm(url)
df <- read.table(unz(description = temp,
filename = "Hip Replacement 1112.csv"),
sep = ",",
header = TRUE)
unlink(temp)
rm(temp)
df <- df[, c("AGEBAND", "SEX", "Q2_EQ5D_INDEX", "HR_Q2_SCORE")]
df <- df[df$AGEBAND != "*" & df$SEX != "*", ]
df$eq5d <- df$Q2_EQ5D_INDEX
df$hr <- df$HR_Q2_SCORE/10
df <- df[stats::complete.cases(df), ]
set.seed(101010101)
df <- df[sample(1:nrow(df), size = nrow(df)*0.3), ]
The data include 35’166 observations with complete information on patients’ post-operative utilities, Oxford Hip Scores, age and sex. Like Hernández Alava and Wailoo (2015), we draw a 30% sub-sample of 10’549 observations from the population of complete observations of these variables. Although we follow a similar approach to data preparation as Hernández Alava and Wailoo (2015), our random sample is not identical to the data used in their study. Post-operative EQ-5D-3L utilities from the English value set (Dolan 1997) show a bimodal distribution, limits at -0.594 and 1, and a gap between 1 and 0.883 (figure 4.1).
We first fit model 1 with the default “BFGS” optimization method and “zero” initial values. The values 0.883 and -0.594 in the argument ‘psi’ represent the maximum value smaller than 1 and the minimum value in the English value set (Dolan 1997). As the data show a bimodal distribution (figure 4.1), we estimate a mixture of two normal distributions (‘ncmp’ = 2). aldvmm() returns an object of class “aldvmm”.
library("aldvmm")
fit <- aldvmm::aldvmm(eq5d ~ hr | 1,
data = df,
psi = c(0.883, -0.594),
ncmp = 2)
summary(fit)
pred <- predict(fit,
se.fit = TRUE,
type = "fit")
We obtain a summary table of regression results using the generic function summary(). The model converges at a log-likelihood of -706 and an Akaike information criterion value of -1’399 (table 5.1). The aldvmm() function reports the negative log-likelihood and the Akaike information criterion; for both measures, smaller values suggest better fit.
The coefficients of the intercepts and covariates for the expected values \(E[y|c, X]\) of the normal distributions can be interpreted as marginal effects on the component means. ‘lnsigma’ denotes the natural logarithm of the estimated standard deviation \(\sigma^{c}\). The coefficients of covariates in the multinomial logit model of the probabilities of component membership are log-transformed relative probabilities. Our model only includes two components, so the multinomial logit model collapses to a binomial logit model. The intercept of -0.7281 means that the average probability that an observation in the data belongs to component 1 is \(\text{exp}(-0.7281)\), or 0.483 times, the probability that it belongs to component 2.
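With two components and component 2 as the reference category, this interpretation can be verified directly from the estimate:

delta1 <- -0.7281   # intercept of the multinomial logit model
exp(delta1)         # relative probability of component 1 vs. 2: approx. 0.483
plogis(delta1)      # implied average probability of component 1: approx. 0.326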
|  |  | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|---|
| E[y\|X, c] |  |  |  |  |  |  |  |
| Comp1 | (Intercept) | -0.4312 | 0.0224 | -19.2471 | 0 | -0.4751 | -0.3873 |
|  | hr | 0.3136 | 0.0065 | 47.9444 | 0 | 0.3008 | 0.3264 |
|  | lnsigma | -1.2479 | 0.0215 | -57.9753 | 0 | -1.2901 | -1.2057 |
| Comp2 | (Intercept) | 0.2358 | 0.0069 | 34.1725 | 0 | 0.2223 | 0.2493 |
|  | hr | 0.1459 | 0.0019 | 76.359 | 0 | 0.1422 | 0.1497 |
|  | lnsigma | -2.4623 | 0.0178 | -138.1184 | 0 | -2.4972 | -2.4273 |
| P[c\|X] |  |  |  |  |  |  |  |
| Comp1 | (Intercept) | -0.7281 | 0.0607 | -12.0038 | 0 | -0.847 | -0.6092 |
| N = 10549 | ll = -706 | AIC = -1399 | BIC = -1348 |  |  |  |  |
We obtain expected values of the observations in the estimation data using the generic function predict(). Standard errors of fitted values (estimation data) or predicted values (new data) are calculated using the delta method. Expected values exhibit a smoother distribution than observed values and do not show a gap between 1 and 0.883, because they are weighted averages of the component distributions and 1 (figure 5.1).
Expected values of observations in the estimation data can also be used to calculate average incremental effects or average treatment effects on the treated. Average treatment effects on the treated compare predictions for treated individuals to predictions for the same individuals without the effect of the treatment indicator. Standard errors of average treatment effects on the treated can be calculated using the delta method (see example code in the appendix).
A visual inspection of mean residuals over deciles of expected values (see example code for the modified Hosmer-Lemeshow test in the appendix) shows that model 1 fits the data poorly (figure 5.2). Although the test is over-powered in large samples, and the confidence bands should not be interpreted directly, the absolute over-prediction among individuals with low expected values is considerable.
We can use the ‘sandwich’ package (Zeileis 2006) to calculate robust or clustered standard errors from objects of class “aldvmm” (see example code in the appendix).
Hernández Alava and Wailoo (2015) suggested that the likelihood function of the adjusted limited dependent variable mixture model with the English EQ-5D-3L data might have multiple local optima, and that the estimation is sensitive to initial values. We thus fit model 1 with all optimization algorithms and all methods for generating initial values available in aldvmm() to assess the sensitivity of model fits to these settings and to find the maximum likelihood estimates.
init.method <- c("zero", "random", "constant", "sann")
optim.method <- c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "nlminb", "Rcgmin",
"Rvmmin","hjn")
fit1_all <- list()
for (i in init.method) {
for (j in optim.method) {
set.seed(101010101) # Seed for random starting values
fit1_all[[i]][[j]] <- aldvmm::aldvmm(eq5d ~ hr | 1,
data = df,
psi = c(0.883, -0.594),
ncmp = 2,
init.method = i,
optim.method = j)
}
}
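The log-likelihoods in table 5.2 can be collected from the stored fits; a sketch assuming each fitted object reports its log-likelihood in the element ‘gof$ll’ (inspect str(fit1_all[[1]][[1]]) to confirm the exact structure):

# Matrix of log-likelihoods by initial values (rows) and methods (columns)
ll_tab <- sapply(fit1_all, function(i) {
  sapply(i, function(j) j[["gof"]][["ll"]])
})
round(t(ll_tab), 1)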
The maximized log-likelihood varies considerably across optimization methods and initial values, which confirms the sensitivity of the model to these settings (table 5.2). The most frequent log-likelihood is 706.32, but the Hooke and Jeeves pattern search (“hjn”) with “zero” initial values converges at a log-likelihood of 4’295.62.
The optimization methods “Nelder-Mead”, “CG”, “L-BFGS-B”, “Rcgmin” and “hjn” are particularly sensitive to changes in starting values. Only the method “nlminb” converges at the same log-likelihood of 706.32 regardless of initial values. However, even the “nlminb” algorithm converges at different parameters with starting values from a constant-only model than with the other starting values, despite the same maximum log-likelihood.
| Initial values | Nelder-Mead | BFGS | CG | L-BFGS-B | nlminb | Rcgmin | Rvmmin | hjn |
|---|---|---|---|---|---|---|---|---|
| zero | -245.7 | 706.3 | -634.8 | -4513.3 | 706.3 | -634.8 | 706.3 | 4295.6 |
| random | 213.9 | -616.8 | 696.9 | -2830.2 | 706.3 | 706.3 | -609.6 | -627.8 |
| constant | 706.3 | 706.3 | -634.8 | -634.8 | 706.3 | -634.8 | 706.3 | 1703.3 |
| sann | 706.3 | 706.3 | 701.4 | 706.3 | 706.3 | 706.3 | 706.3 | 706.3 |
The computation times of the optimization routines were relatively short and did not vary considerably across methods, except for the “hjn” algorithm and the simulated annealing (“sann”) starting values, which require more computation time (table 5.3).
| Initial values | Nelder-Mead | BFGS | CG | L-BFGS-B | nlminb | Rcgmin | Rvmmin | hjn |
|---|---|---|---|---|---|---|---|---|
| zero | 0.43 | 0.11 | 0.18 | 0.06 | 0.11 | 1.22 | 0.10 | 2.48 |
| random | 0.39 | 0.17 | 2.24 | 0.07 | 0.11 | 1.36 | 0.28 | 0.41 |
| constant | 0.43 | 0.15 | 0.23 | 0.07 | 0.15 | 1.22 | 0.12 | 2.68 |
| sann | 2.54 | 2.36 | 4.50 | 2.47 | 2.38 | 3.57 | 2.36 | 4.23 |
Parameter estimates differ considerably across selected optimization algorithms (table 5.4). The default “BFGS”-“zero” approach converges at the same parameters as the “nlminb”-“zero” method. The solution of the “hjn” method is rather extreme, with no effect of the Oxford Hip Score, a very small standard deviation in component 2 and a very low probability of membership of component 2. Moreover, the modified Hosmer-Lemeshow test did not show better fit of the “hjn” solution compared to the other algorithms (figure 7.2 in the appendix). The solution of the “hjn” method suggests that a single-component model should be explored as well.
|  |  | BFGS | Nelder-Mead | nlminb | hjn |
|---|---|---|---|---|---|
| E[y\|X, c] |  |  |  |  |  |
| Comp1 | (Intercept) | -0.4312 | -0.0426 | -0.4307 | 0.5263 |
|  | hr | 0.3136 | 0.2194 | 0.3135 | 0.0881 |
|  | lnsigma | -1.2479 | -1.8595 | -1.2481 | -1.2898 |
| Comp2 | (Intercept) | 0.2358 | 4.1853 | 0.2358 | 0.691 |
|  | hr | 0.1459 | -1.0081 | 0.1459 | 0 |
|  | lnsigma | -2.4623 | -0.0249 | -2.4624 | -11.5422 |
| P[c\|X] |  |  |  |  |  |
| Comp1 | (Intercept) | -0.7281 | 3.5636 | -0.728 | 2.1902 |
| ll |  | -706.32 | 245.69 | -706.32 | -4295.62 |
As adjusted limited dependent variable mixture models are frequently used for tasks that rely on predictions, we also compare expected values from the “BFGS”, “Nelder-Mead” and “hjn” methods to expected values from the “nlminb” method. Among observations with the lowest 80% of expected values, expected values from the “Nelder-Mead” method are smaller, and values from the “hjn” method are larger, than “nlminb” values. Predictions from “BFGS” closely match predictions from the “nlminb” model fit (figure 5.3).
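The comparison in figure 5.3 can be reproduced along the following lines; the element ‘pred$yhat’ holds the fitted values, as in the appendix code:

# Expected values of two model fits plotted against each other
yhat_nlminb <- fit1_all[["zero"]][["nlminb"]][["pred"]][["yhat"]]
yhat_bfgs <- fit1_all[["zero"]][["BFGS"]][["pred"]][["yhat"]]
plot(yhat_nlminb, yhat_bfgs,
     xlab = "Expected values, nlminb",
     ylab = "Expected values, BFGS")
abline(a = 0, b = 1, lty = 2)  # 45-degree line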
Based on the comparison of log-likelihoods, parameter estimates and predicted values, we deem the “nlminb” optimization method with “zero” starting values the most robust approach for the data and model at hand.
We can also fit model 1 with user-defined starting values and box constraints. When constraints are imposed, the aldvmm() function uses the optimization method “L-BFGS-B”, which appears to be very sensitive to starting values. We use zero initial values for all parameters except for the intercept in the multinomial logit model, which we set to the estimate from the “nlminb” optimization method with “zero” starting values (0.7283) (table 5.4). We impose a lower limit of -3 on the log-standard deviations in both components.
The aldvmm() function returns a warning that the covariance matrix includes negative values on the diagonal. These values are the variances of the intercept and the log-standard deviation in component 2 (table 5.5). The log-likelihood amounts to 628, which is smaller than the maximum log-likelihoods of the model fits without constraints. The parameter estimates do not resemble any of the solutions of the unconstrained “BFGS”, “Nelder-Mead”, “nlminb” or “hjn” optimization methods with “zero” starting values (table 5.4), which further emphasizes the difficulty of finding a global optimum of the likelihood with English EQ-5D-3L utilities after hip replacement.
init <- c(0, 0, 0, 0, 0, 0, 0.7283)
lo <- c(-Inf, -Inf, -3, -Inf, -Inf, -3, -Inf)
hi <- c(Inf, Inf, Inf, Inf, Inf, Inf, Inf)
fit1_cstr <- aldvmm::aldvmm(eq5d ~ hr | 1,
data = df,
psi = c(0.883, -0.594),
ncmp = 2,
init.est = init,
init.lo = lo,
init.hi = hi)
summary(fit1_cstr)
|  |  | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|---|
| E[y\|X, c] |  |  |  |  |  |  |  |
| Comp1 | (Intercept) | -0.092 | 0.0085 | -10.8048 | 0 | -0.1087 | -0.0753 |
|  | hr | 0.2325 | 0.0023 | 101.8493 | 0 | 0.228 | 0.237 |
|  | lnsigma | -1.6406 | 0.0089 | -183.9809 | 0 | -1.6581 | -1.6231 |
| Comp2 | (Intercept) | 0.3931 | NaN | NaN | NaN | NaN | NaN |
|  | hr | 2.5249 | 7.4795 | 0.3376 | 0.7357 | -12.1363 | 17.1861 |
|  | lnsigma | -0.7651 | NaN | NaN | NaN | NaN | NaN |
| P[c\|X] |  |  |  |  |  |  |  |
| Comp1 | (Intercept) | 6.852 | 0.5388 | 12.7168 | 0 | 5.7958 | 7.9082 |
| N = 10549 | ll = 628 | AIC = 1270 | BIC = 1320 |  |  |  |  |
As the solution of the “hjn” algorithm includes a component with very low probability, we also estimate a single-component model.
fit <- aldvmm::aldvmm(eq5d ~ hr,
data = df,
psi = c(0.883, -0.594),
ncmp = 1,
init.method = "zero",
optim.method = "nlminb")
summary(fit)
The coefficients of the single-component model are relatively similar to the parameters of the second component of model 1 from the “hjn” algorithm (table 5.6). The Akaike information criterion amounts to 1’276, which is larger than the Akaike information criterion of the two-component model 1 fitted with “nlminb” (-1’399, table 5.1) and thus suggests worse fit of the single-component model.
|  |  | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|---|
| E[y\|X, c] |  |  |  |  |  |  |  |
| Comp1 | (Intercept) | -0.0886 | 0.0085 | -10.4099 | 0 | -0.1053 | -0.0719 |
|  | hr | 0.2318 | 0.0023 | 101.4681 | 0 | 0.2273 | 0.2362 |
|  | lnsigma | -1.6363 | 0.0089 | -184.2422 | 0 | -1.6537 | -1.6189 |
| N = 10549 | ll = 635 | AIC = 1276 | BIC = 1297 |  |  |  |  |
As an alternative specification, we explore model 2, which includes a coefficient of the Oxford Hip Score in the multinomial logit model of component membership. For this fit, we use the method “nlminb” with estimates from Hernández Alava and Wailoo (2015) as starting values.
init <- c(-.40293118, .30502755, .22614716, .14801581, -.70755741, 0,
-1.2632051, -2.4541401)
fit2 <- aldvmm::aldvmm(eq5d ~ hr | hr,
data = df,
psi = c(0.883, -0.594),
ncmp = 2,
init.est = init,
optim.method = "nlminb")
summary(fit2)
The Akaike information criterion of model 2 fitted with the “nlminb” method amounts to -1’867, which is smaller than the Akaike information criterion of model 1 fitted with the same method (-1’399). The smaller Akaike information criterion suggests that the increase in the log-likelihood after including a coefficient of the Oxford Hip Score in the model of the probabilities of component membership is sufficiently large to justify the extra parameter.
|  |  | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|---|
| E[y\|X, c] |  |  |  |  |  |  |  |
| Comp1 | (Intercept) | 0.0031 | 0.0281 | 0.1084 | 0.9137 | -0.0521 | 0.0582 |
|  | hr | 0.0965 | 0.0116 | 8.3519 | 0 | 0.0739 | 0.1192 |
|  | lnsigma | -1.2676 | 0.0314 | -40.4141 | 0 | -1.3291 | -1.2061 |
| Comp2 | (Intercept) | 0.1816 | 0.0074 | 24.5105 | 0 | 0.1671 | 0.1961 |
|  | hr | 0.1609 | 0.0019 | 85.4201 | 0 | 0.1573 | 0.1646 |
|  | lnsigma | -2.2809 | 0.0126 | -181.0789 | 0 | -2.3056 | -2.2563 |
| P[c\|X] |  |  |  |  |  |  |  |
| Comp1 | (Intercept) | 2.4455 | 0.1719 | 14.2293 | 0 | 2.1086 | 2.7824 |
|  | hr | -1.3902 | 0.0566 | -24.5697 | 0 | -1.5011 | -1.2793 |
| N = 10549 | ll = -941 | AIC = -1867 | BIC = -1809 |  |  |  |  |
To validate the R implementation of adjusted limited dependent variable mixture models, we estimate the four models presented in Hernández Alava and Wailoo (2015) as reference cases in both R and STATA. The STATA and R code for model estimation is included in the appendix.

1. Model 1 with default options
2. Model 1 with parameter constraints
3. Model 1 with initial values from a constant-only model
4. Model 2 with user-defined initial values
|  |  | Ref. case 1 (R) | Ref. case 1 (STATA) | Ref. case 2 (R) | Ref. case 2 (STATA) | Ref. case 3 (R) | Ref. case 3 (STATA) | Ref. case 4 (R) | Ref. case 4 (STATA) |
|---|---|---|---|---|---|---|---|---|---|
| E[y\|X, c] |  |  |  |  |  |  |  |  |  |
| Comp1 | (Intercept) | -0.4307 | -0.427 | -0.092 | -0.092 | 0.2358 | 0.236 | 0.0031 | 0.006 |
|  | hr | 0.3135 | 0.312 | 0.2325 | 0.232 | 0.1459 | 0.146 | 0.0965 | 0.095 |
|  | lnsigma | -1.2481 | -1.251 | -1.6406 | -1.641 | -2.4624 | -2.463 | -1.2676 | -1.274 |
| Comp2 | (Intercept) | 0.2358 | 0.236 | 100 | 100.000 | -0.4307 | -0.427 | 0.1816 | 0.182 |
|  | hr | 0.1459 | 0.146 | 0 | 0.000 | 0.3135 | 0.312 | 0.1609 | 0.161 |
|  | lnsigma | -2.4624 | -2.463 | 0 | 0.000 | -1.2481 | -1.251 | -2.2809 | -2.280 |
| P[c\|X] |  |  |  |  |  |  |  |  |  |
| Comp1 | (Intercept) | -0.728 | -0.725 | 6.8564 | 6.855 | 0.728 | 0.725 | 2.4455 | 2.448 |
|  | hr | NA | NA | NA | NA | NA | NA | -1.3902 | -1.393 |
| N = 10549 | ll | -706 | 715.84 | 628 | -613.7 | -706 | 715.84 | -941 | 953.2 |
The parameter estimates and standard errors obtained in R are very similar to the results from STATA (tables 6.1 and 6.2). R does not obtain any standard errors in reference case 2, while STATA returns standard errors for the first component and for the probability of belonging to component 1. Although reference cases 1 and 3 yield different parameter estimates, they converge at the same log-likelihood, which further supports the hypothesis of multiple local optima of the likelihood. The relative ordering of the models is consistent across platforms.
|  |  | Ref. case 1 (R) | Ref. case 1 (STATA) | Ref. case 2 (R) | Ref. case 2 (STATA) | Ref. case 3 (R) | Ref. case 3 (STATA) | Ref. case 4 (R) | Ref. case 4 (STATA) |
|---|---|---|---|---|---|---|---|---|---|
| E[y\|X, c] |  |  |  |  |  |  |  |  |  |
| Comp1 | (Intercept) | 0.0224 | 0.022 | NA | 0.009 | 0.0069 | 0.007 | 0.0281 | 0.028 |
|  | hr | 0.0065 | 0.006 | NA | 0.002 | 0.0019 | 0.002 | 0.0116 | 0.012 |
|  | lnsigma | 0.0215 | 0.021 | NA | 0.009 | 0.0178 | 0.018 | 0.0314 | 0.032 |
| Comp2 | (Intercept) | 0.0069 | 0.007 | NA |  | 0.0224 | 0.022 | 0.0074 | 0.007 |
|  | hr | 0.0019 | 0.002 | NA |  | 0.0065 | 0.006 | 0.0019 | 0.002 |
|  | lnsigma | 0.0178 | 0.018 | NA |  | 0.0215 | 0.021 | 0.0126 | 0.013 |
| P[c\|X] |  |  |  |  |  |  |  |  |  |
| Comp1 | (Intercept) | 0.0607 | 0.061 | NA | 0.540 | 0.0607 | 0.061 | 0.1719 | 0.172 |
|  | hr | NA | NA | NA | NA | NA | NA | 0.0566 | 0.056 |
| N = 10549 | ll | -706 | 715.84 | 628 | -613.7 | -706 | 715.84 | -941 | 953.2 |
Fitted values show very similar marginal distributions on both platforms (figure 6.1). R does not return fitted values in reference case 2. The summary statistics of the differences in fitted values between R and STATA suggest that individual predictions are also quite similar across platforms (table 6.3).

Standard errors of fitted values differ visibly between platforms (figure 6.2 and table 6.4). The difference is particularly pronounced in reference case 1, in which the standard errors from STATA appear extreme compared with all other reference cases. R does not return standard errors of fitted values in reference case 2.
|  | Ref. case 1 | Ref. case 2 | Ref. case 3 | Ref. case 4 |
|---|---|---|---|---|
| Min. | -0.000038 | NA | -0.000038 | -0.000051 |
| 1st Qu. | -0.000036 | NA | -0.000036 | -0.000044 |
| Median | -0.000033 | NA | -0.000033 | 0.000008 |
| Mean | 0.000037 | NaN | 0.000037 | 0.000035 |
| 3rd Qu. | 0.000047 | NA | 0.000047 | 0.000070 |
| Max. | 0.000567 | NA | 0.000567 | 0.002319 |
|  | Ref. case 1 | Ref. case 2 | Ref. case 3 | Ref. case 4 |
|---|---|---|---|---|
| Min. | 0.005897 | NA | -0.002063 | -0.016984 |
| 1st Qu. | 0.006898 | NA | -0.000551 | -0.000823 |
| Median | 0.008690 | NA | 0.000189 | -0.000364 |
| Mean | 0.008917 | NaN | 0.000171 | -0.000515 |
| 3rd Qu. | 0.011050 | NA | 0.001054 | 0.000354 |
| Max. | 0.013830 | NA | 0.001530 | 0.000792 |
The comparison of the R and STATA packages shows that the R implementation sometimes behaves differently from the STATA package, but the results are not indicative of technical errors in the R implementation.
Adjusted limited dependent variable mixture models are powerful tools for regression analysis of health state utilities. Unlike standard regression models, adjusted limited dependent variable mixture models account for limits, gaps and multi-modal distributions.
The comparison of different optimization methods with EQ-5D-3L utility data from English patients after hip replacement in 2011 and 2012 (NHS Digital 2013) showed that the likelihood function can be challenging to maximize, and that the optimization can converge at local optima or extreme solutions. Parameter estimates varied considerably across optimization methods, and even across methods with the same maximum log-likelihood. However, fitted values were very similar across the four reference cases, which suggests that the model is more robust for the identification of incremental and average marginal effects than for parameter identification.
The ‘aldvmm’ package offers a variety of optimization algorithms and methods for generating initial values which is an important strength in such challenging situations. It is essential to assess different optimization algorithms and methods for initial values before interpreting the parameter estimates or predictions of adjusted limited dependent variable mixture models.
The analysis of the EQ-5D-3L utility data also suggests that simpler models with fewer components should be considered when multi-component models are difficult to fit. Even single-component adjusted limited dependent variable mixture models can improve fit compared to traditional regression techniques because they account for limits and gaps.
Although coefficients can be interpreted as marginal effects within each component, they cannot be interpreted in terms of overall expected values. Thus, average marginal effects and average treatment effects need to be calculated from predictions obtained with the generic function predict(). Standard errors of average marginal effects or average treatment effects can be calculated using the delta method.
When no valid covariance matrix of parameters can be obtained, or the user doubts the validity of the standard errors, the function sandwich::vcovBS() can be used to obtain bootstrapped standard errors. Due to the large variability of results across model fits, a large number of bootstrap iterations might be needed, which increases computation time. Other functions from the ‘sandwich’ package can also be used to estimate robust or clustered standard errors.
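For example, the number of bootstrap replications can be increased via the argument ‘R’ of sandwich::vcovBS(); a sketch for a fitted “aldvmm” object ‘fit’:

# Bootstrapped covariance matrix with 1,000 replications
vc_bs <- sandwich::vcovBS(fit, R = 1000)
lmtest::coeftest(fit, vcov = vc_bs)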
In situations with repeated utility measurements, the ‘aldvmm’ package only allows fixed effects estimation with group and time fixed effects, which can be an important limitation in the analysis of clinical data. However, time fixed effects can be an appropriate modeling strategy in the presence of general time trends and dynamic selection in the population, e.g. when health state utilities decrease over time and treated individuals survive longer and are thus over-represented in later measurements.
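A sketch of such a specification with hypothetical person and time indicators ‘id’ and ‘time’ in a hypothetical panel data set ‘paneldf’:

# Group and time fixed effects as factor covariates
fit_fe <- aldvmm::aldvmm(eq5d ~ hr + factor(time) + factor(id) | 1,
                         data = paneldf,
                         psi = c(0.883, -0.594),
                         ncmp = 2)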
Possible extensions of the ‘aldvmm’ package could include additional component distributions, a mixed model implementation for repeated measures and an implementation of functions for calculating average marginal effects and their standard errors.
# Create treatment indicator
#---------------------------
df$treated <- as.numeric(df$SEX == "2")
# Fit model
#----------
formula <- eq5d ~ treated + hr | 1
fit <- aldvmm(formula,
data = df,
psi = c(-0.594, 0.883))
# Predict treated
#----------------
# Subsample of treated observations
tmpdf1 <- df[df$treated == 1, ]
# Design matrix for treated observations
X1 <- aldvmm.mm(data = tmpdf1,
formula = fit$formula,
ncmp = fit$k,
lcoef = fit$label$lcoef)
# Average expected outcome of treated observations
mean1 <- mean(predict(fit,
newdata = tmpdf1,
type = "fit",
se.fit = TRUE)[["yhat"]], na.rm = TRUE)
# Predict counterfactual
#-----------------------
# Subsample of counterfactual observations
tmpdf0 <- tmpdf1
rm(tmpdf1)
tmpdf0$treated <- 0
# Design matrix for counterfactual observations
X0 <- aldvmm.mm(data = tmpdf0,
formula = fit$formula,
ncmp = fit$k,
lcoef = fit$label$lcoef)
# Average expected outcome of counterfactual observations
mean0 <- mean(predict(fit,
newdata = tmpdf0,
type = "fit",
se.fit = TRUE)[["yhat"]], na.rm = TRUE)
rm(tmpdf0)
# Standard error of ATET
#-----------------------
atet.grad <- numDeriv::jacobian(func = function(z) {
yhat1 <- aldvmm.pred(par = z,
X = X1,
y = rep(0, nrow(X1[[1]])),
psi = fit$psi,
ncmp = fit$k,
dist = fit$dist,
lcoef = fit$label$lcoef,
lcmp = fit$label$lcmp,
lcpar = fit$label$lcpar)[["yhat"]]
yhat0 <- aldvmm.pred(par = z,
X = X0,
y = rep(0, nrow(X0[[1]])),
psi = fit$psi,
ncmp = fit$k,
dist = fit$dist,
lcoef = fit$label$lcoef,
lcmp = fit$label$lcmp,
lcpar = fit$label$lcpar)[["yhat"]]
mean(yhat1 - yhat0, na.rm = TRUE)
},
x = fit$coef)
se.atet <- sqrt(atet.grad %*% fit$cov %*% t(atet.grad))
# Summarize
#----------
out <- data.frame(atet = mean1 - mean0,
se = se.atet,
z = (mean1 - mean0) / se.atet)
out$p <- 2*stats::pnorm(abs(out$z), lower.tail = FALSE)
out$ul <- out$atet + stats::qnorm((1 + 0.95)/2) * out$se
out$ll <- out$atet - stats::qnorm((1 + 0.95)/2) * out$se
print(out)
# Create cluster indicator
#-------------------------
df$grp <- as.factor(round(0.5 + runif(nrow(df)) * 5, 0))
# Fit model
#----------
formula <- eq5d ~ hr | 1
fit <- aldvmm(formula,
data = df,
psi = c(-0.594, 0.883))
# Calculate robust and clustered standard errors
#-----------------------------------------------
vc1 <- sandwich::sandwich(fit)
vc2 <- sandwich::vcovCL(fit, cluster = ~ grp)
vc3 <- sandwich::vcovPL(fit, cluster = ~ grp)
vc4 <- sandwich::vcovHAC(fit, cluster = ~ grp)
vc5 <- sandwich::vcovBS(fit)
vc6 <- sandwich::vcovBS(fit, cluster = ~ grp)
# Calculate test statistics
#--------------------------
lmtest::coeftest(fit)
lmtest::coeftest(fit, vcov = vc1)
lmtest::coeftest(fit, vcov = vc2)
lmtest::coeftest(fit, vcov = vc3)
lmtest::coeftest(fit, vcov = vc4)
lmtest::coeftest(fit, vcov = vc5)
lmtest::coeftest(fit, vcov = vc6)
# Number of percentiles
ngroup <- 10
# Extract expected values and residuals
yhat <- fit1_all[["zero"]][["Nelder-Mead"]][["pred"]][["yhat"]]
res <- fit1_all[["zero"]][["Nelder-Mead"]][["pred"]][["res"]]
# Make groups
group <- as.numeric(cut(yhat, breaks = ngroup))
# Auxiliary regression
aux <- stats::lm(res ~ factor(group))
# Data set of predictions from the auxiliary regression
newdf <- data.frame(group = sort(unique(group)))
pred_aux <- predict(aux,
                    newdata = newdf,
                    se.fit = TRUE,
                    interval = 'confidence',
                    level = 0.95)
plotdat <- as.data.frame(rbind(
  cbind(group = newdf$group,
        outcome = "mean",
        value = pred_aux$fit[ , 'fit']),
  cbind(group = newdf$group,
        outcome = "ll",
        value = pred_aux$fit[ , 'lwr']),
  cbind(group = newdf$group,
        outcome = "ul",
        value = pred_aux$fit[ , 'upr'])
))
# Make plot
plot <- ggplot2::ggplot(plotdat,
                        ggplot2::aes(x = factor(as.numeric(group)),
                                     y = as.numeric(value),
                                     group = factor(outcome))) +
  ggplot2::geom_line(ggplot2::aes(linetype = factor(outcome)))
Covariance matrices were incomplete or missing entirely (FALSE) in multiple optimization approaches (table 7.1). The rows refer to the methods for generating initial values in the order of table 5.2: “zero”, “random”, “constant” and “sann”.
| Initial values | Nelder-Mead | BFGS | CG | L-BFGS-B | nlminb | Rcgmin | Rvmmin | hjn |
|---|---|---|---|---|---|---|---|---|
| zero | FALSE | TRUE | FALSE | FALSE | TRUE | TRUE | TRUE | FALSE |
| random | TRUE | TRUE | TRUE | FALSE | TRUE | TRUE | TRUE | FALSE |
| constant | TRUE | TRUE | FALSE | FALSE | TRUE | FALSE | TRUE | FALSE |
| sann | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
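Table 7.1 can be generated by checking the covariance matrices of all stored model fits; the sketch below uses the element ‘cov’ of the fitted objects (as in the average treatment effect example above) and flags covariance matrices with missing or negative diagonal elements:

# TRUE if the covariance matrix is complete with non-negative variances
cov_ok <- sapply(fit1_all, function(i) {
  sapply(i, function(j) {
    v <- diag(j[["cov"]])
    all(!is.na(v)) && all(v >= 0)
  })
})
t(cov_ok)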
# (1) Reference case 1: Model 1 with default optimization settings
fit1_default <- aldvmm::aldvmm(eq5d ~ hr | 1,
data = df,
psi = c(0.883, -0.594),
ncmp = 2,
init.method = "zero",
optim.method = "nlminb")
# (2) Reference case 2: Model 1 with user-defined initial values and constraints on parameters
init <- c(0, 0, 0, 0, 0, 0, 0.7283)
lo <- c(-Inf, -Inf, -3, -Inf, -Inf, -3, -Inf)
hi <- c(Inf, Inf, Inf, Inf, Inf, Inf, Inf)
fit1_cstr <- aldvmm::aldvmm(eq5d ~ hr | 1,
data = df,
psi = c(0.883, -0.594),
ncmp = 2,
init.est = init,
init.lo = lo,
init.hi = hi)
# (3) Reference case 3: Model 1 with initial values from constant-only model
fit1_const <- aldvmm::aldvmm(eq5d ~ hr | 1,
data = df,
psi = c(0.883, -0.594),
ncmp = 2,
init.method = "constant",
optim.method = "nlminb")
# (4) Reference case 4: Model 2 with user-defined initial values
init <- c(-.40293118, .30502755, .22614716, .14801581, -.70755741, 0,
-1.2632051, -2.4541401)
fit2 <- aldvmm::aldvmm(eq5d ~ hr | hr,
data = df,
psi = c(0.883, -0.594),
ncmp = 2,
init.est = init,
optim.method = "nlminb")
* (1) Reference case 1: Model 1 with default optimization settings
aldvmm eq5d hr, ncomponents(2)
* (2) Reference case 2: Model 1 with user-defined initial values and constraints on parameters
matrix input a = (0, 0, 0, 0, 0, 0, 0.7283)
constraint 1 [Comp_2]:hr10 = 0
constraint 2 [Comp_2]:_cons = 100
constraint 3 [lns_2]:_cons = 1e-30
aldvmm eq5d hr, ncomp(2) from(a) c(1 2 3)
* (3) Reference case 3: Model 1 with initial values from constant-only model
aldvmm eq5d hr, ncomp(2) inim(cons)
* (4) Reference case 4: Model 2 with user-defined initial values
matrix input start = (.14801581, .22614716, .30502755, -.40293118, 0, ///
                      -.70755741, -2.4541401, -1.2632051)
aldvmm eq5d hr, ncomp(2) prob(hr) from(start)