Denote the \(p\)-variate predictors \(x_i\), \(i=1,...,n\) with corresponding responses \(y_i\). The predictors are assumed to follow the model \[ x_i = A s_i + b, \] where \(A\) is a non-singular \(p \times p\) matrix, \(b\) a \(p\)-vector and the random vector \(s\) can be decomposed into \((s_1^T,s_2^T)^T\) with \(E(s)=0\) and \(Cov(s)=I_p\) where \(s_1\) has dimension \(k\) and \(s_2\) dimension \(p-k\). It is then assumed that \(s_1\) is the signal part, the interesting components, which are relevant to model \(y\), whereas \(s_2\) is the noise part.
The working model assumption is then that \[ (y,s_1^T)^T \ is \ independent \ of \ s_2. \] Defining \(S_1=E((x-b)(x-b)^T)\) and \(S_2=E(E(x-b|y)E(x-b|y)^T)\) the sliced inverse regression (SIR) can be interpreted as finding the transformation matrix \(W\) such that \[ W S_1 W^T = I_p \ and \ W S_2 W^T = D, \] where \(D\) is a diagonal matrix with diagonal elements \(d_1 \geq d_2 \geq ... \geq d_k > d_{k+1} = ... = d_p = 0\).
In practice \(S_2\) is estimated by approximating \(E(x-b|y)\) by dividing \(y\) into \(h\) slices where in this package \(y\) is divided into \(h\) intervals containing an equal number of observations.
The practical problem is to decide then the value \(k\).
For an asymptotic test using the test statistic \[ T= n\sum_{i=k+1}^p d_i, \] the limiting distribution under the null is then \(T \sim \chi^2_{(p-k)(h-k-1)}\). Therefore for the hypothetical value \(k\) and the number of slices \(h\) is required that \(k \geq h-1\).
Bootstrap tests can be constructed as follows
The p-value is then \[ \frac{\#(T_i^* \geq T)+1}{m+1}. \]
Some simulated data with true \(k=2\):
set.seed(1234)
n <- 200
p <- 10
X <- matrix(rnorm(p*n), ncol = p)
eps <- rnorm(n, sd=0.25)
y <- X[, 1]/ (0.5+(X[, 2]+1.5)^2)
pairs(cbind(y,X))
First performing the asymptotic test
library(ICtest)
SIRasympk2 <- SIRasymp(X,y,2)
screeplot(SIRasympk2)
SIRasympk2
##
## SIR test for subspace dimension
##
## data: X
## T = 65.121, df = 56, p-value = 0.1891
## alternative hypothesis: the last 8 eigenvalues are not zero
Then the bootstrap test
SIRbootk2 <- SIRboot(X,y,2)
SIRbootk2
##
## SIR bootstrapping test for subspace dimension
##
## data: X
## T = 0.32561, replications = 200, p-value = 0.209
## alternative hypothesis: the last 8 eigenvalues are not zero
Looking at the first two components and their relationship with the response
pairs(cbind(y, components(SIRbootk2, which="k")))