This is the simplest case for Meshed GPs. We start by generating some data
library(magrittr)
library(dplyr)
library(ggplot2)
library(meshed)
set.seed(2021)
<- 50 # coord values for jth dimension
SS <- 2 # spatial dimension
dd <- SS^2 # number of locations
n <- 3 # number of covariates
p
<- seq(0, 1, length.out=SS)
xlocs <- expand.grid(list(xlocs, xlocs))
coords
<- 1.5
sigmasq <- 0.5
nu <- 10
phi <- .1
tausq
# covariance at coordinates
<- sigmasq * exp(-phi * as.matrix(dist(coords)))
CC # cholesky of covariance matrix
<- t(chol(CC))
LC # spatial latent effects are a GP
<- LC %*% rnorm(n)
w # measurement errors
<- tausq^.5 * rnorm(n)
eps
# covariates and coefficients
<- matrix(rnorm(n*p), ncol=p)
X <- matrix(rnorm(p), ncol=1)
Beta
# univariate outcome, fully observed
<- X %*% Beta + w + eps
y_full
# now introduce some NA values in the outcomes
<- y_full %>% matrix(ncol=1)
y sample(1:n, n/5, replace=FALSE), ] <- NA
y[
<- coords %>%
simdata cbind(data.frame(Outcome_full= y_full,
Outcome_obs = y,
w = w))
%>% ggplot(aes(Var1, Var2, fill=w)) +
simdata geom_raster() +
scale_fill_viridis_c() +
theme_minimal() + ggtitle("w: Latent GP")
%>% ggplot(aes(Var1, Var2, fill=y)) +
simdata geom_raster() +
scale_fill_viridis_c() +
theme_minimal() + ggtitle("y: Observed outcomes")
We now fit the following spatial regression model \[ y = X \beta + w + \epsilon, \] where \(w \sim MGP\) are spatial random effects, and \(\epsilon \sim N(0, \tau^2)\). For spatial data, an exponential covariance function is used by default: \(C(h) = \sigma^2 \exp( -\phi h )\) where \(h\) is the spatial distance.
The prior for the spatial decay \(\phi\) is \(U[l,u]\) and the values of \(l\) and \(u\) must be specified. We choose \(l=1\), \(u=30\) for this dataset.1
Setting verbose
tells spmeshed
how many
intermediate messages to output while running MCMC. We set up MCMC to
run for 3000 iterations, of which we discard the first third. We use the
argument block_size=25
to specify the coarseness of domain
partitioning. In this case, the same result can be achieved by setting
axis_partition=c(10, 10)
.
Finally, we note that NA
values for the outcome are OK
since the full dataset is on a grid. spmeshed
will figure
this out and use settings optimized for partly observed lattices, and
automatically predict the outcomes at missing locations. On the other
hand, X
values are assumed to not be missing.
<- 200 # too small! this is just a vignette.
mcmc_keep <- 400
mcmc_burn <- 2
mcmc_thin
<- system.time({
mesh_total_time <- spmeshed(y, X, coords,
meshout #axis_partition=c(10,10), #same as block_size=25
block_size = 25,
n_samples = mcmc_keep, n_burn = mcmc_burn, n_thin = mcmc_thin,
n_threads = 2,
verbose = 0,
prior=list(phi=c(1,30))
)})
We can now do some postprocessing of the results. We extract
posterior marginal summaries for \(\sigma^2\), \(\phi\), \(\tau^2\), and \(\beta_2\). The model that
spmeshed
targets is a slight reparametrization of the
above:2
\[ y = X \beta + \lambda w + \epsilon,
\] where \(w\sim MGP\) has
unitary variance. This model is equivalent to the previous one and in
fact we find \(\sigma^2=\lambda^2\).
summary(meshout$lambda_mcmc[1,1,]^2)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 1.310 1.833 1.976 2.140 2.319 4.758
summary(meshout$theta_mcmc[1,1,])
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 2.839 5.933 7.305 7.137 7.949 12.564
summary(meshout$tausq_mcmc[1,])
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.08837 0.11358 0.12587 0.12465 0.13383 0.17332
summary(meshout$beta_mcmc[1,1,])
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -0.8278 -0.8049 -0.7970 -0.7958 -0.7860 -0.7515
We proceed to plot predictions across the domain along with the recovered latent effects.
# process means
<- data.frame(w_mgp = meshout$w_mcmc %>% summary_list_mean())
wmesh # predictions
<- data.frame(y_mgp = meshout$yhat_mcmc %>% summary_list_mean())
ymesh
<- meshout$coordsdata %>%
outdf cbind(wmesh, ymesh) %>%
left_join(simdata)
%>%
outdf ggplot(aes(Var1, Var2, fill=w_mgp)) +
geom_raster() +
scale_fill_viridis_c() +
theme_minimal() + ggtitle("w: recovered")
%>%
outdf ggplot(aes(Var1, Var2, fill=y_mgp)) +
geom_raster() +
scale_fill_viridis_c() +
theme_minimal() + ggtitle("y: predictions")
spmeshed
implements a model which can be
interpreted as assigning \(\sigma^2\) a
folded-t via parameter expansion if settings$ps=TRUE
, or an
inverse Gamma with parameters \(a=2\)
and \(b=1\) if
settings$ps=FALSE
, which cannot be changed at this time.
\(\tau^2\) is assigned an exponential
prior.↩︎
At its core, spmeshed
implements the
spatial factor model \(Y(s) = X(s)\beta +
\Lambda v(s) + \epsilon(s)\) where \(w(s) = \Lambda v(s)\) is modeled via linear
coregionalization.↩︎