Abstract
Functional data analysis is a statistical approach used to analyze data that appear as functions or images. Functional data analysis methods can be used to analyze device-based measures such as physical activity and sleep. Although device-based measures are more objective than self-reported measures of physical activity patterns or sleep activity, device-based measures can still be prone to measurement error. When assessing associations between scalar-valued outcomes and device-based measures, scalar-on-function regression models that correct for measurement error can be applied. We develop the MECfda package for several scalar-on-function regression models including multi-level generalized scalar-on-function regression models and functional quantile linear regression models. The developed package implements several bias-correction methods that account for the presence of multiple functional and scalar covariates prone to measurement error in various scalar-on-function regression settings.
Functional data analysis is commonly used to analyze high-dimensional data that appear as functions or images [1] [2] [3]. Functional data analysis can be used to analyzed data collected continuously or frequently over a given time period that appear as complex high dimensional functions of time. When assessing how both functional and scalar-valued covariates influence scalar-valued outcomes, scalar-on-function regression models may be used. Self-reported measures, such as dietary intake, are well known to be prone to measurement error[4], and recent studies have indicated that even the relatively more objective data obtained from wearable devices is prone to measurement error [5] [6]. It has been demonstrated that similar to scalar-valued covariates prone to measurement error, the failure to correct for biases due to measurement error associated with functional data also leads to biased estimates
We develop an R package MECfda for solving scalar-on-function regression models including multi-level generalized scalar-on-function regression models and functional quantile linear regression models measurement error corrections using various bias reduction techniques in these models.
The general form of scalar-on-function linear regression model is \[T(F_{Y_i|X_i,Z_i}) = \sum_{l=1}^{L} \int_{\Omega_l} \beta_l(t) X_{li}(t) dt + (1,Z_i^T)\gamma\] where \(Y_i\) represents the scalar-valued response variable, \(X_{li}(t)\) represents the \(l\)-th function-valued covariate (\(l=1,\dots,L\)), \(\Omega_l\) is the domain of the \(l\)-th function-valued covariate, \(\beta_l\) is a function-valued parameter, \(\beta_l\) and \(X_{li}\) (\(t\in\Omega_l\)) are in \(L^2(\Omega_l)\), \(Z_i = (Z_{1i},\dots,Z_{qi})^T\) represents the scalar valued covariates, \(\gamma = (\gamma_0,\gamma_1,\dots,\gamma_q)^T\) represents scalar-valued parameter, \(F_{Y_i|X_i,Z_i}\) represents the cumulative distribution function of \(Y_i\) given \(X_i\) and \(Z_i\), and \(T(\cdot)\) is a statistical functional. A statistical functional is defined as a mapping from probability distribution to real numbers. Statistical functionals represent quantifiable properties of probability distributions, such as expectation and variance. The following gives forms of the statistical functionals in the two types of scalar-on-function linear regression models that can be solved using the algorithms included in MECfda package.
In multi-level generalized scalar-on-function generalized linear regression models, \[T(F_{Y_i|X_i,Z_i}) = g\left\{\int_\mathbb{R}ydF_{Y_i|X_i,Z_i}(y)\right\} = g(E[Y_i|X_i,Z_i]),\] where \(g(\cdot)\) is a link function as in generalized linear model. The only restriction of \(g(\cdot)\) is that it should be strictly monotonic.
In scalar-on-function quantile linear regression models, \[T(F_{Y_i|X_i,Z_i}) = Q_{Y_i|X_i,Z_i}(\tau) = F_{Y_i|X_i,Z_i}^{-1}(\tau),\] where \(F_{Y_i|X_i,Z_i}^{-1}(\cdot)\) is the inverse of \(F_{Y_i|X_i,Z_i}(\cdot)\), \(\tau\in(0,1)\).
Function-valued variables (functional variables) are usually recorded as the value of the functions at certain (time) points in its domain. The data of a function-valued variable are often presented in the form of a matrix, \((x_{ij})_{n\times m}\), where \(x_{ij} = f_i(t_j)\), represents the value of \(f_i(t_j)\), each row represents an observation (subject), and each column corresponds to a measurement (time) point.
functional_variable
In the MECfda package, we have an s4 class,
functional_variable
, that represents the data of a
function-valued variable in a matrix form. The class has four slots. The
slot X
, represents matrix \((x_{ij})_{n\times m}\). The slots
t_0
and period
represent the left end and
length of the domain of the function-valued variable, respectively. The
slot t_points
represents the (time) points that the
functional variable is measured. A method, dim
, is provided
to return the number of subjects and measurement (time) points for a
functional_variable
object.
\(\{\rho_{k}\}_{k=1}^\infty\) is a basis of \(L^2(\Omega)\). For an arbitrary function-valued parameter \(\beta(\cdot)\in L^2(\Omega)\), there exists a number sequence \(\{c_{k}\}_{k=1}^\infty\) such that \[\beta(t) = \sum_{k=1}^\infty c_{k}\rho_{k}(t)\] and for a function-valued variable, \(X_i(t)\), \[\int_\Omega\beta(t) X_i(t) dt = \int_\Omega X_i(t) \sum_{k=1}^\infty c_{k}\rho_{k}(t) dt = \sum_{k=1}^\infty c_{k} \int_\Omega \rho_{k}(t) X_i(t) dt \] We perform a truncation for the infinite basis sequence, then \[\int_\Omega\beta(t) X_i(t) dt \approx \sum_{k=1}^p c_{k} \int_\Omega X_i(t) \rho_{k}(t) dt\] where \(p<\infty\). For statistical models with parts in the form of \(\int_\Omega\beta(t) X_i(t) dt\), we use \(c_{k} \int_\Omega X_i(t) \rho_{k}(t) dt\) to approximate \(\int_\Omega\beta(t) X_i(t) dt\) and treat \(\int_{\Omega} \rho_{k}(t) X_{i}(t) dt\) as the variable. In practice, we usually choose the number of basis, \(p\) based on the sample size, \(n\), and the \(p\) is in a form of \(p(n)\). The scalar-on-function linear model is converted to an ordinary scalar-on-scalar linear model, and the problem of estimating \(\beta(t)\) is converted to estimating \(c_k\), \(k=1,\dots p\).
In practice, we may not necessarily use the truncated complete basis of \(L^2\) function space. Instead, we can just use a finite sequence of linearly independent functions as the basis function.
Performing basis expansion methods on function-valued variable data in matrix form as we mentioned is to compute \((b_{ik})_{n\times p}\), where \(b_{ik} = \int_\Omega f(t)\rho_k(t) dt\).
Usually the domain \(\Omega\) of a function-valued variable \(\{X(t),t\in\Omega\}\) is an interval. The commonly used basis sequence types includes the Fourier basis, b-splines basis, and eigen function basis.
The Fourier basis of \(L^2([t_0,t_0+T])\) consists of \[\frac{1}{2},\ \cos(\frac{2\pi}{T}k[x-t_0]),\ \sin (\frac{2\pi}{T}k[x-t_0])\] where \(k = 1,\dots,\infty\).
Fourier_series
In the MECfda package, we have an s4 class,
Fourier_series
that represents the linear combination of
Fourier basis functions \[\frac{a_0}{2} +
\sum_{k=1}^{p_a} a_k \cos{(\frac{2\pi}{T}k(x-t_0))} +
\sum_{k=1}^{p_b} b_k \sin{(\frac{2\pi}{T}k(x-t_0))},\qquad
x\in[t_0,t_0+T].\] The slot double_constant
is the
value of \(a_0\). The slot
cos
contains the coefficient value of \(\cos\) waves, \(a_k\). The slot sin
contains
the coefficient value of \(\sin\)
waves, \(b_k\). The slot
k_cos
contains the values of \(k\) that correspond to the coefficients of
\(\cos\) waves. The slot
k_sin
contains the values of \(k\) that correspond to the coefficients of
\(\sin\) waves. The slot
t_0
is the left end of the domain interval \(t_0\). The slot period
is
length of the domain interval \(T\).
The method plot
for class Fourier_series
is
provided to generate a visual representation of the summation function.
The method FourierSeries2fun
is provided to compute the
value of the summation function. The argument object
should
be an object of Fourier_series
class. The argument
x
is the value of independent variable \(x\). The method extractCoef
is
provided to extract the Fourier coefficients of
Fourier_series
class object.
fsc = Fourier_series(
double_constant = 3,
cos = c(0,2/3),
sin = c(1,7/5),
k_cos = 1:2,
k_sin = 1:2,
t_0 = 0,
period = 1
)
plot(fsc)
FourierSeries2fun(fsc,seq(0,1,0.05))
#> [1] 2.1666667 3.1712610 3.6252757 3.4344848 2.7346112 1.8333333
#> [7] 1.0888125 0.7715265 0.9623175 1.5254623 2.1666667 2.5532270
#> [13] 2.4497052 1.8164508 0.8324982 -0.1666667 -0.8133005 -0.8465074
#> [19] -0.2132530 0.9074283 2.1666667
extractCoef(fsc)
#> $a_0
#> 0
#> 3
#>
#> $a_k
#> 1 2
#> 0.0000000 0.6666667
#>
#> $b_k
#> 1 2
#> 1.0 1.4
The object fsc
represents the summation \[\frac32 + \frac23 \cos(2\pi\cdot2x) + \sin(2\pi
x) + \frac75\sin(2\pi\cdot2x).\]
A b-spline basis \(\{B_{i,p}(x)\}_{i=-p}^{k}\) on the interval
\([t_0,t_{k+1}]\) is defined as \[B_{i,0}(x) = \left\{
\begin{aligned}
&I_{(t_i,t_{i+1}]}(x), & i = 0,1,\dots,k\\
&0, &i<0\ or\ i>k
\end{aligned}
\right.\] \[B_{i,r}(x) = \frac{x -
t_{i}}{t_{i+r}-t_{i}} B_{i,r-1}(x) + \frac{t_{i+r+1} - x}
{t_{i+r+1} - t_{i+1}}B_{i+1,r-1}(x)\] For all discontinuity
points of \(B_{i,r}\) (\(r>0\)) in the interval \((t_0,t_k)\), let the value equals its
limit, which means \[B_{i,r}(x) = \lim_{t\to
x} B_{i,r}(t).\] The slot Boundary.knots
is the
boundary of the spline domain (start and end), which is \(t_0\) and \(t_{k+1}\); the default is \([0,1]\). The slot knots
represents the spline knots, which is \((t_1,\dots,t_k)\), an equally spaced
sequence is chosen by the function automatically (\(t_j = t_0 +
j\cdot\frac{t_{k+1}-t_0}{k+1}\)) when not assigned. The slot
intercept
is used to define whether an intercept is
included in the basis; the default value is TRUE, and must be TRUE. The
slot df
is the degrees of freedom of the basis, which is
the number of the splines, equal to \(p+k+1\). By default \(k =0\) and \(\text{df} = p+1\). The slot
degree
is the degree of the splines, which is the degree of
piecewise polynomials, \(p\); the
default value is 3. See degree in bs.
bspline_basis
and
bspline_series
In the MECfda package, we have an s4 class,
bspline_basis
that represents a b-spline basis, \(\{B_{i,p}(x)\}_{i=-p}^{k}\), on the
interval \([t_0,t_{k+1}]\). We also
have an s4 class, bspline_series
, that represents the
summation \(\sum_{i=0}^{k}b_i
B_{i,p}(x)\). The slot coef
contains the b-spline
coefficients, \(b_i\). (\(i = 0,\dots,k\)) The slot
bspline_basis
is a bspline_basis
object, that
represents the b-spline basis used, \(\{B_{i,p}(x)\}_{i=-p}^{k}\).
The method plot
for class bspline_basis
is
provided to generate a visual representation of the summation function.
The method bsplineSeries2fun
is provided to compute the
value of the summation function. The argument object
should
be an object of the bspline_series
class. The argument
x
is the value of the independent variable \(x\).
bsb = bspline_basis(
Boundary.knots = c(0,24),
df = 7,
degree = 3
)
bss = bspline_series(
coef = c(2,1,3/4,2/3,7/8,5/2,19/10),
bspline_basis = bsb
)
plot(bss)
bsplineSeries2fun(bss,seq(0,24,0.5))
#> [1] 2.0000000 1.7677509 1.5690908 1.4011502 1.2610597 1.1459499 1.0529514
#> [8] 0.9791948 0.9218107 0.8779297 0.8446824 0.8191993 0.7986111 0.7805266
#> [15] 0.7644676 0.7504340 0.7384259 0.7284433 0.7204861 0.7145544 0.7106481
#> [22] 0.7087674 0.7089120 0.7110822 0.7152778 0.7216857 0.7312404 0.7450629
#> [29] 0.7642747 0.7899969 0.8233507 0.8654574 0.9174383 0.9804145 1.0555073
#> [36] 1.1438380 1.2465278 1.3634786 1.4897151 1.6190430 1.7452675 1.8621942
#> [43] 1.9636285 2.0433759 2.0952418 2.1130317 2.0905511 2.0216053 1.9000000
The object bsb
represents \(\{B_{i,3}(x)\}_{i=-3}^{0}\), and the object
bss
represents
\[2B_{i,-3}(x)+B_{i,-2}(x)+\frac34B_{i,-1}(x)+\frac23B_{i,0}(x)
+ \frac78B_{i,1}(x) +
\frac52B_{i,2}(x) +\frac{19}{10}B_{i,3}(x),\] where \(x\in[t_0,t_4]\) and \(t_0=0\), \(t_k =
t_{k-1}+6\) ,\(k=1,2,3,4\).
A generic function, basis2fun
, is provided for the class
Fourier_series
and bspline_series
. When
applied to a Fourier_series
object, it is equivalent to
method FourierSeries2fun
. When applied to a
bspline_series
object, it is equivalent to method
bsplineSeries2fun
.
basis2fun(fsc,seq(0,1,0.05))
#> [1] 2.1666667 3.1712610 3.6252757 3.4344848 2.7346112 1.8333333
#> [7] 1.0888125 0.7715265 0.9623175 1.5254623 2.1666667 2.5532270
#> [13] 2.4497052 1.8164508 0.8324982 -0.1666667 -0.8133005 -0.8465074
#> [19] -0.2132530 0.9074283 2.1666667
basis2fun(bss,seq(0,24,0.5))
#> [1] 2.0000000 1.7677509 1.5690908 1.4011502 1.2610597 1.1459499 1.0529514
#> [8] 0.9791948 0.9218107 0.8779297 0.8446824 0.8191993 0.7986111 0.7805266
#> [15] 0.7644676 0.7504340 0.7384259 0.7284433 0.7204861 0.7145544 0.7106481
#> [22] 0.7087674 0.7089120 0.7110822 0.7152778 0.7216857 0.7312404 0.7450629
#> [29] 0.7642747 0.7899969 0.8233507 0.8654574 0.9174383 0.9804145 1.0555073
#> [36] 1.1438380 1.2465278 1.3634786 1.4897151 1.6190430 1.7452675 1.8621942
#> [43] 1.9636285 2.0433759 2.0952418 2.1130317 2.0905511 2.0216053 1.9000000
Suppose \(K(s,t)\in L^2(\Omega\times \Omega)\), \(f(t)\in L^2(\Omega)\). Then \(K\) induces an linear operator \(\mathcal{K}\) by \[(\mathcal{K}f)(x) = \int_{\Omega} K(t,x)f(t)dt\] If \(\xi(\cdot)\in L^2(\Omega)\) such that \[\mathcal{K}\xi = \lambda \xi\] where \(\lambda\in C\), we call \(\xi\) an eigenfunction/eigenvector of \(\mathcal{K}\), and \(\lambda\) an eigenvalue associated with \(\xi\).
All the eigenfunctions of \(\mathcal{K}\) make a basis of \(L^2(\Omega)\). We call the basis induced by \[K(s,t)=\text{Cov}(X(t),X(s))\] a functional principal component (FPC) basis, where \(\{X(t),t\in\Omega\}\) is a stochastic process.
functional_variable
classThe MECfda pakcage provide the methods
fourier_basis_expansion
and
bspline_basis_expansion
for the class
functional_variable
, which perform basis expansion using
Fourier basis and b-spline basis respectively.
We use \[\frac{1}{|T|}\sum_{t\in T} \rho_{k}(t) X_{i}(t)\] to compute the integral \[\int_{\Omega} \rho_{k}(t) X_{i}(t) dt\] where \(T\) is the measurement (time) points of \(X_{i}(t)\), and \(|T|\) represents the cardinal number of \(T\).
fcRegression(Y, FC, Z, formula.Z, family = gaussian(link = "identity"),
basis.type = c("Fourier", "Bspline"), basis.order = 6L,
bs_degree = 3)
The MECfda package provides a function,
fcRegression
, to fit generalized linear mixed effect
models, including ordinary linear models, and generalized linear models
with fixed and random effects, using basis expansion to discretize the
function-valued covariates. The function fcRegression
can
solve a linear model in the following form: \[g(E(Y_i|X_i,Z_i)) = \sum_{l=1}^{L}
\int_{\Omega_l} \beta_l(t) X_{li}(t) dt + (1,Z_i^T)\gamma.\] The
function can allow one or multiple function-valued covariates to be used
as fixed effects, and zero, one, or multiple scalar-valued covariates to
be used as fixed or random effects. Response variable, function-valued
covariates, and scalar-valued covariates are input separately as three
different arguments, Y
, FC
, and
Z
, respectively. The format of the input data can be very
flexible. For response variable, the input format can be an atomic
vector, a one-column matrix or data frame. The recommended form is a
one-column data frame or matrix with a column name, because in this
case, the name of response variable is specified. For function-valued
covariates, a functional_variable
object or a matrix or a
data frame or a list of these objects can be accepted as input data.
When one functional_variable
object, matrix or, data frame
is input as argument FC
, only one function-valued covariate
is included in the model. When a list of these objects is input as
argument FC
, the model can have multiple function-valued
covariates, with each element of the list corresponding to a different
function-valued covariate. For of scalar-valued covariates, a matrix,
data frame, atomic vector, NULL
, or, no input can be
accepted as input data. When argument Z
is not assigned, no
scalar-valued covariate is included in the model and argument
formula.Z
should also be NULL
, or no input.
When an atomic vector is input as argument Z
, only one
scalar-valued is included in the model, and, the name of the
scalar-valued covariate is not specified. So even if only one
scalar-valued covariate is included in the model, a matrix or data frame
with column name is recommended as the input for argument
Z
. The argument formula.Z
is used to specify
which part of the argument Z
is used and whether to treat
scalar-valued covariates as fixed effects or random effects. The
argument family
can specify the distribution type of the
response variable and link function to be used in the regression. The
argument basis.type
indicate the type of basis function to
be used in basis expansion process. Available options are ‘Fourier’ and
‘Bspline’, representing the Fourier basis and the b-spline basis,
respectively. The argument basis.order
indicates the number
of function basis to be used. When using the Fourier basis \(\frac{1}{2},\sin k t, \cos k t, k =
1,\dots,p_f\), basis.order
is the number \(p_f\). When using the b-splines basis \(\{B_{i,p}(x)\}_{i=-p}^{k}\),
basis.order
is the number of splines, equal to \(k+p+1\). The argument
bs_degree
specifies the degree of the piecewise polynomials
of the b-spline basis function if using the b-splines basis, and is only
needed when using the b-spline basis.
The function fcRegression
returns an object of s3 class
fcRegression
. as a list that containing the following
elements.
We can use the method predict
to get a predicted value
from the model. We can use the method fc.beta
to get the
values of function-valued linear coefficient parameters, \(\beta_l(t)\).
data(MECfda.data.sim.0.0)
res = fcRegression(FC = MECfda.data.sim.0.0$FC,
Y=MECfda.data.sim.0.0$Y,
Z=MECfda.data.sim.0.0$Z,
family = gaussian(link = "identity"),
basis.order = 5, basis.type = c('Bspline'),
formula.Z = ~ Z_1 + (1|Z_2))
t = (0:100)/100
plot(x = t, y = fc.beta(res,1,t), ylab = expression(beta[1](t)))
fcQR(Y, FC, Z, formula.Z, tau = 0.5, basis.type = c("Fourier", "Bspline"),
basis.order = 6L, bs_degree = 3)
The MECfda package provides a function,
fcQR
, for fitting quantile linear regression models. The
method for addressing function-valued covariates is discretization using
basis expansion. The function fcQR
can solve a linear model
in the following form: \[Q_{Y_i|X_i,Z_i}(\tau) =
\sum_{l=1}^L\int_{\Omega_l} \beta_l(\tau,t) X_{li}(t) dt
+ (1,Z_i^T)\gamma.\] The function allows one or multiple
function-valued covariates, and zero, one, or multiple scalar-valued
covariates. Data input occurs as described for the function
fcRegression
. The treatment of the scalar-valued covariates
is specified by the argument formula.Z
, similar to the
function fcRegression
. The only difference between function
fcQR and function fcRegression is that the quantile linear regression
model does not include a random effect. The quantile \(\tau\) is specified by the argument
tau
. The type and parameters of the basis function are
specified bythe argument basis.type
,
basis.order
, and bs_degree
, as described for
the function fcRegression
.
The function fcQR
returns an object of s3 class
fcQR
, as a list that contains containing the following
elements.
We can use the method predict
to obtain a predicted
value from the model and use the method fc.beta
to obtain
the values of function-valued linear coefficient parameters \(\beta_l(t)\).
Data collected in real- world settings often include measurement error, especially function-valued data. Measurement error in a data set may result in estimation bias. The *MECfda** package also provides bias correction estimation method functions for certain linear regression models for use with data containing measurement error.
ME.fcRegression_MEM(
data.Y,
data.W,
data.Z,
method = c("UP_MEM", "MP_MEM", "average"),
t_interval = c(0, 1),
t_points = NULL,
d = 3,
family.W = c("gaussian", "poisson"),
family.Y = "gaussian",
formula.Z,
basis.type = c("Fourier", "Bspline"),
basis.order = NULL,
bs_degree = 3,
smooth = FALSE,
silent = TRUE
)
Wearable monitoring devices permit the continuous monitoring of
biological processes, such as blood glucose metabolism, and behaviors,
such as sleep quality and physical activity.
Continuous monitoring often collect data in 60-second epochs over
multiple days, resulting in high-dimensional multi-level longitudinal
curves that are best described and analyzed as multi-level functional
data.
Although researchers have previously addressed measurement error in
scalar covariates prone to error, less work has been done for correcting
measurement error in multi-level high dimensional curves prone to
heteroscedastic measurement error. Therefore, Luan et. al. proposed
mixed effects-based-model-based (MEM) methods for bias correction due to
measurement error in multi-level functional data from the exponential
family of distributions that are prone to complex heteroscedastic
measurement error.
They first developed MEM methods to adjust for biases due to the
presence of measurement error in multi-level generalized functional
regression models.
They assumed that the distributions of the function-valued covariates
prone to measurement error belong to the exponential family. This
assumption allows for a more general specification of the distributions
of error-prone functional covariates compared to current approaches that
often entail normality assumptions for these observed measures. The
approach adopted by Luan et al. allows a nonlinear association between
the true measurement and the observed measurement prone to measurement
error.
Second, they treated the random errors in the observed measures as
complex heteroscedastic errors from the Gaussian distribution with
covariance error functions. Third, these methods can be used to evaluate
relationships between multi-level function-valued covariates with
complex measurement error structures and scalar outcomes with
distributions in the exponential family.
Fourth, they treat the function-valued covariate as an observed measure
of the true function-valued unobserved latent covariate.
Additionally, these methods employ a point-wise method for fitting the
multi-level functional MEM approach, avoiding the need to compute
complex and intractable integrals that would be required in traditional
approaches for reducing biases due to measurement error in multi-level
functional data [7].
Their statistical model is as follows: \[\begin{align*} &g(E(Y_i|X_i,Z_i)) = \int_{\Omega} \beta(t) X_{i}(t) dt + (1,Z_i^T)\gamma\\ &h(E(W_{ij}(t)|X_i(t))) = X_i(t)\\ &X_i(t) = \mu_x(t) + \varepsilon_{xi}(t) \end{align*}\] where the response variable \(Y_i\) and scalar-valued covariates \(Z_i\) are measured without error, function-valued covariate \(X_i(t)\) is repeatedly measured with error as \(W_{ij}(t)\). The model also includes the following assumptions:
They proposed a MEM estimation method to correct for bias caused by the presence of measurement error in the function-valued covariate. This method allows for the investigation of a nonlinear measurement error model, in which the relationship between the true and observed measurements is not constrained to be linear, and the distribution assumption for the observed measurement is relaxed to encompass the exponential family rather than being limited to a Gaussian distribution.
The MEM approach is a two stage method that employs functional
mixed-effects models. The first stage of the MEM approach involves using
a functional mixed-effects model
to approximate the true measure \(X_i(t)\) with the repeated observed
measurement \(W_{ij}(t)\). The MEM
approach is primarily based on the assumptions that \(h[E\{W_{ij}(t)|X_i(t)\}] = X_i(t)\) and
\(X_i(t) = \mu_x(t) +
\varepsilon_{xi}(t)\). That is, in the functional mixed-effects
model containing one fixed intercept and one random intercept, the
random intercept is assumed to to be the subject-wise deviation of \(X_i(t)\) from the mean process \(\mu_x(t)\), and the fixed intercept is
assumed to represent \(\mu_x(t)\). The
second stage involves replacing \(X_i(t)\) in the regression model with the
resulting approximation of \(X_i(t)\)
from the first stage. The MEM approach employs point-wise (UP_MEM) and
multi-point-wise (MP_MEM) estimation procedures to avoid potential
computational complexities caused by analyses of multi-level functional
data and computations of potentially intractable and complex
integrals.
The MECfda package provides a function
ME.fcRegression_MEM
for application to the bias correction
estimation method. The response variable, function-valued covariates,
and scalar-valued covariates are input separately as three different
arguments. The argument data.Y
is the response variable.
Inputs can be an atomic vector, a one-column matrix or data frame,
recommended form is a one-column data frame with column name. The
argument data.W
is the data for \(W\), the measurement of \(X\), in the statistical model. The input
should be a 3-dimensional array. in which each row represents a subject,
each column represent a measurement (time) point, each layer represents
an observation. The argument data.Z
is the data for the
scalar covariates, It can be no input or NULL
if the model
does not include a scalar covariate, input in the form of an atomic
vector when the model includes only one scalar covariate, or input as a
matrix or data frame, with the recommended form being a data frame with
column names. The argument method
specifies the method for
constructing the substitution \(X\).
Available options includes 'UP_MEM'
, 'MP_MEM'
,
and 'average'
. The argument t_interval
specifies the domain of the function-valued covariate and should be a
2-element vector, representing an interval. The default is
c(0,1)
, represent interval \([0,1]\). The argument t_points
is the sequence of measurement (time) points, and the default is
NULL
. The argument d
is the number of
measurement (time) points involved for MP_MEM
(the default
value is 3, which is also the minimum value). The argument
family.W
is the distribution of \(W\) given \(X\), and available options are
'gaussian'
and 'poisson'
. The argument
family.Y
is a description of the error distribution and
link function to be used in the model. See stats::family
.
The argument formula.Z
is used to specify which part of the
argument Z
is used and whether to treat the scalar-valued
covariates, as fixed effects or random effects. The argument
basis.type
indicates the type of basis function to be used
in the basis expansion process. Available options are
'Fourier'
and 'Bspline'
, representing Fourier
basis and b-spline basis respectively. The argument
basis.order
indicates number of the function basis to be
used. When using Fourier basis \(\frac{1}{2},\sin k t, \cos k t, k =
1,\dots,p_f\), basis.order
is the number \(p_f\). When using b-splines basis, \(\{B_{i,p}(x)\}_{i=-p}^{k}\),
basis.order
is the number of splines, equal to \(k+p+1\). The argument
bs_degree
specifies the degree of the piecewise polynomials
of b-spline basis function if use b-splines basis. This argument is need
only when using b-spline basis. The argument smooth
specifies whether to smooth the substitution of \(X\), and the default is FALSE. The argument
silent
specifies whether not to show the statuse of the
running of the function, and the default is TRUE.
The function ME.fcRegression_MEM returns a fcRegression object.
ME.fcQR_IV.SIMEX(
data.Y,
data.W,
data.Z,
data.M,
tau = 0.5,
t_interval = c(0, 1),
t_points = NULL,
formula.Z,
basis.type = c("Fourier", "Bspline"),
basis.order = NULL,
bs_degree = 3
)
Health a major concerns for many people, and as technology advances,
wearable devices have become the mainstream method for monitoring and
evaluating individual physical activity levels. Despite personal
preferences in brands and feature design, the accuracy of the data
presented is what makes the device a great product. These functional
data collected from devices are generally considered more accurate and
subjective compared to other objective methods like questionnaires and
self-reports. Because physical activity levels are not directly
observable, algorithms are used generate corresponding summary reports
of different activity level using complex data (e.g. steps, heart
rate).
However, these results may differ depending on which device is used. And
aside from variation in hardware, data collected could also vary by the
combinations between how the device is worn and the activity of
interest. Although current devices may be sufficiently accurate to
monitor general physical activity levels, more precise data may enable
additional functions such as detecting irregular heart rhythms or
radiation exposures that would contribute toward improving the health of
the general public and elevating the overall well-being of society.
Quantile regression is a tool that was developed to meet the need for modeling complex relationships between a set of covariates and quantiles of an outcome. In obesity studies, the effects of interventions or covariates on body mass index (BMI) are most commonly estimated using ordinary least square methods. However, mean regression approaches are unable to address these effects for individuals whose BMI values fall in the upper quantile of the distribution. Compared with traditional linear regression approaches, quantile regression approaches make no assumptions regarding the distribution of residuals and are robust to outlying observations.
Tekwe et. al. proposed a simulation extrapolation (SIMEX) estimation method that uses an instrumental variable to correct for bias due to measurement error in scalar-on-function quantile linear regression. They demonstrated the usefulness of this method by evaluating the association between physical activity and obesity using the National Health and Nutrition Examination Survey (NHANES) dataset [8]. Their statistical model is as follows: \[\begin{align*} &Q_{Y_i|X_i,Z_i}(\tau) = \int_{\Omega} \beta(\tau,t) X_{i}(t) dt + (1,Z_i^T)\gamma(\tau)\\ &W_i(t) = X_i(t) + U_i(t)\\ &M_i(t) = \delta(t) X_i(t) + \eta_i(t) \end{align*}\] where the response variable \(Y_i\) and scalar-valued covariates \(Z_i\) are measured without error, the function-valued covariate \(X_i(t)\) is measured with error as \(W_i(t)\), and \(M_i(t)\) is a measured instrumental variable. The model also includes the following assumptions:
for \(\forall t,s\in[0,1]\) and \(i = 1,\dots,n\).
Their bias correction estimation method uses a two-stage strategy to correct for measurement error in a function-valued covariate and then fits a linear quantile regression model. In the first stage, an instrumental variable is used to estimate the covariance matrix associated with the measurement error. In the second stage, SIMEX is used to correct for measurement error in the function-valued covariate.
The MECfda package provides a function
ME.fcQR_IV.SIMEX
for the application of their bias
correction estimation method. The argument data.Y
is the
response variable. Input can be an atomic vector, a one-column matrix or
data frame, recommended form is a one-column data frame with column
name. The argument data.W
is the data for \(W\), the measurement of \(X\), in the statistical model. The input
should be a dataframe or matrix, in which each row represents a subject,
each column represent a measurement (time) point. The argument
data.Z
is the data for scalar covariates, This argument can
be input as no input or NULL
when no scalar covariate is
included in the model, an atomic vector when only one scalar covariate
is included in the model, or as a matrix or data frame. The recommended
form is a data frame with column names. The argument data.M
is the data of \(M\), the instrumental
variable. Input should be a dataframe or matrix, in which each row
represents a subject. each column represent a measurement (time) point.
The argument tau
is the quantile \(\tau\in(0,1)\), default is 0.5. The
argument t_interval
specifies the domain of the
function-valued covariate, which should be a 2-element vector,
represents an interval. The default is c(0,1)
, represent
interval \([0,1]\). The argument
t_points
is the sequence of the measurement (time) points,
default is NULL
. The argument formula.Z
is
used to specify which part of the argument Z
is used and
how to treat the scalar-valued covariates. The argument
basis.type
indicates the type of basis function to be used
in the basis expansion process. Available options are
'Fourier'
and 'Bspline'
, representing the
Fourier basis and the b-spline basis, respectively. The argument
basis.order
indicates number of the function basis to be
used. When using the Fourier basis \(\frac{1}{2},\sin k t, \cos k t, k =
1,\dots,p_f\), basis.order
is the number \(p_f\). When using the b-splines basis \(\{B_{i,p}(x)\}_{i=-p}^{k}\),
basis.order
is the number of splines, equal to \(k+p+1\). The argument
bs_degree
specifies the degrees of piecewise polynomials of
the b-spline basis function if using the b-splines basis. This argument
is only needed when using b-spline basis.
The function ME.fcQR_IV.SIMEX
returns an
ME.fcQR_IV.SIMEX class object as a list that containing the following
elements.
ME.fcQR_CLS(
data.Y,
data.W,
data.Z,
tau = 0.5,
t_interval = c(0, 1),
t_points = NULL,
grid_k,
grid_h,
degree = 45,
observed_X = NULL
)
Zhang et. al. proposed a corrected loss score estimation method for scalar-on-function quantile linear regression to correct for bias due to measurement error [9]. Their statistical model is as follows: \[\begin{align*} &Q_{Y_i|X_i,Z_i}(\tau) = \int_{\Omega} \beta(\tau,t) X_{i}(t) dt + (1,Z_i^T)\gamma(\tau)\\ &W_i(t) = X_i(t) + U_i(t) \end{align*}\] where the response variable \(Y_i\) and the scalar-valued covariates \(Z_i\) are measured without error, the function-valued covariate \(X_i(t)\) is measured with error as \(W_i(t)\).
Zhang et al. proposed a new corrected loss function for a partially functional linear quantile model with functional measurement error in this manuscript. They established a corrected quantile objective function for an observed measurement, which is an unbiased estimator of the quantile objective function that would be obtained if the true measurements were available. The estimators of the regression parameters are obtained by optimizing the resulting corrected loss function. The resulting estimator of the regression parameters is shown to be consistent.
The MECfda package provides a function,
ME.fcQR_CLS
, for the application of their bias correction
estimation method. The argument data.Y
is the response
variable. Input can be an atomic vector, a one-column matrix or data
frame, with the recommended form being a one-column data frame with
column name. The argument data.W
is the data for \(W\), the measurement of \(X\), in the statistical model. The input
should be a 3-dimensional array, in which each row represents a subject,
each column represent a measurement (time) point, each layer represents
an observation. The argument data.Z
is the data for scalar
covariates, This argument can be no input or NULL
when no
scalar covariate is included in the model, as an atomic vector when only
one scalar covariate is included in the model, or as a matrix or data
frame. The recommended form is a data frame with column names. The
argument tau
is the quantile \(\tau\in(0,1)\), with the default value 0.5.
The argument t_interval
specifies the domain of the
function-valued covariate, and should be a 2-element vector,
representing an interval. The default is c(0,1)
,
representing interval \([0,1]\). The
argument t_points
is the sequence of measurement (time)
points, default is NULL
. The argument grid_k
is an atomic vector, in which each element is a candidate basis
number.
The argument grid_h
is a non-zero-value atomic vector, in
which each element is a candidate value for the tuning parameter. The
argument degree
is used in to compute the derivative and
integral, and the default value is 45, which is large enough for most
scenario. The argument observed_X
is for estimating
parametric variance, and the default value is NULL
.
The function returns an ME.fcQR_CLS class object. as a list that containing the following elements.
NULL
.Tekwe et. al. proposed a bias correction estimation method for scalar-on-function linear regression model with measurement error using an instrumental variable [6]. Their statistical model is as follows: \[\begin{align*} &Y_i = \int_0^1 \beta(t)X_i(t)dt + \varepsilon_i\\ &W_i(t) = X_i(t) + U_i(t)\\ &M_i(t) = \delta X_i(t) + \eta_i(t) \end{align*}\] where the response variable \(Y_i\) and are measured without error, the function-valued covariate \(X_i(t)\) is measured with error as \(W_i(t)\), and \(M_i(t)\) is an measured instrumental variable. They included the following additional assumptions:
for \(\forall t,s\in[0,1]\) and \(i = 1,\dots,n\).
The MECfda package provides a function
ME.fcLR_IV
for the application of their bias correction
estimation method. The argument data.Y
is the response
variable, which can be input as an atomic vector, a one-column matrix or
data frame, with the recommended form being a one-column data frame with
column name. The argument data.W
is the data for \(W\), the measurement of \(X\), in the statistical model. The input
should be a dataframe or matrix, in which each row represents a subject.
each column represent a measurement (time) point. The argument
data.M
is the data for \(M\), the instrumental variable. Input
should be a dataframe or matrix. Each row represents a subject. Each
column represent a measurement (time) point. The argument
t_interval
specifies the domain of the function-valued
covariate, and should be a 2-element vector, represents an interval. The
default value is c(0,1)
, represent interval \([0,1]\). The argument t_points
is the sequence of the measurement (time) points, and the default value
is NULL
. The argument CI.bootstrap
specifies
whether to return the confidence jinterval using the bootstrap method,
and the default is FALSE.
The function returns a ME.fcLR_IV class object as a list that containing the following elements.