Meiyue Wang and Shizhong Xu
Department of Botany and Plant Sciences, University of California, Riverside
Single-marker models, which test one locus at a time, are subject to several problems in genome-wide association studies (GWAS) and quantitative trait locus (QTL) mapping, including large matrix inversion, over-conservativeness after Bonferroni correction, and difficulty in evaluating the total genetic contribution. Such problems can be solved by a multiple-locus model that includes all markers in the same model and estimates their effects simultaneously. The sparse Bayesian learning (SBL) method, implemented in the sbl package, is a multiple-locus model that can handle extremely large sample sizes (>100,000) and outcompetes other multiple-locus GWAS methods in terms of detection power and computing time.
sbl package installation
sbl can be downloaded and installed locally. The download link is here.
install.packages('sbl_0.1.0.tar.gz', repos=NULL, type='source')
Using sbl to perform QTL mapping and GWAS is very simple (Note: please remove markers without variation before running the program):
library('sbl')
# load example data
data(phe)
data(intercept)
data(gen)
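The note above asks for markers without variation to be removed before the analysis. The following is a minimal base R sketch of one way to do that, not a function from the sbl package. It assumes the genotype matrix stores individuals in rows and markers in columns; `toy_gen`, `marker_var`, `keep` and `gen_filtered` are hypothetical names used only for illustration.

```r
# Toy genotype matrix (assumed layout: rows = individuals, columns = markers)
toy_gen <- cbind(
  m1 = c(1, 0, -1, 1, 0, -1),
  m2 = rep(1, 6),              # monomorphic: every individual has the same code
  m3 = c(0, 0, 1, -1, 1, 0),
  m4 = rep(0, 6)               # monomorphic
)

# Variance of each marker column, ignoring missing genotypes
marker_var <- apply(toy_gen, 2, var, na.rm = TRUE)

# Keep only markers whose variance is defined and greater than zero
keep <- !is.na(marker_var) & marker_var > 0
gen_filtered <- toy_gen[, keep, drop = FALSE]

colnames(gen_filtered)
```

Using `drop = FALSE` keeps the result a matrix even if only one marker survives the filter.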
sblgwas() function to perform regression and detect significant markers.
# A minimal invocation of the "sblgwas()" function looks like:
fit1<-sblgwas(x = intercept, y = phe, z = gen)
# Results of markers surrounding the second simulated QTL with non-zero effect in the example data
fit1$blup[c(17:21),]
##        gamma        vg     wald       p_wald
## 17  0.000000 0.0000000  0.00000 1.000000e+00
## 18  2.146942 0.1789244 25.76150 3.863190e-07
## 19 -2.050378 0.1539640 27.30541 1.737251e-07
## 20  0.000000 0.0000000  0.00000 1.000000e+00
## 21  0.000000 0.0000000  0.00000 1.000000e+00
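To help read the table above: the printed p_wald values appear consistent with an upper-tail chi-square test of wald on one degree of freedom. This is inferred from the numbers shown, not from the package documentation, so treat it as an assumption. The base R sketch below copies the five printed rows into a hypothetical data frame `toy_blup`, recomputes the p-values, and keeps the markers with a non-zero estimated effect.

```r
# Rows 17 to 21 copied by hand from the printed output (toy_blup is a hypothetical name)
toy_blup <- data.frame(
  gamma = c(0, 2.146942, -2.050378, 0, 0),
  vg    = c(0, 0.1789244, 0.1539640, 0, 0),
  wald  = c(0, 25.76150, 27.30541, 0, 0),
  row.names = 17:21
)

# Assumption: p_wald is the upper tail of a chi-square distribution with 1 df
toy_blup$p_recomputed <- pchisq(toy_blup$wald, df = 1, lower.tail = FALSE)

# Markers whose effect was not shrunk to exactly zero
detected <- toy_blup[toy_blup$gamma != 0, ]
print(detected)
```

With a real fit, the same filter could be applied to `fit1$blup` directly.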
Users can set the value of t anywhere in [-2, 0] to control the sparseness of the model; the default is -1.
# Setting t = 0 leads to the most sparse model
fit2<-sblgwas(x = intercept, y = phe, z = gen, t = 0)
fit2$parm
##   iter        error       s2     beta       df
## 1   22 5.930216e-07 2.827957 19.78626 4.612319
# Setting t = -2 leads to the least sparse model
fit3<-sblgwas(x = intercept, y = phe, z = gen, t = -2)
fit3$parm
##   iter       error       s2     beta       df
## 1   31 8.46504e-07 1.003592 20.03221 9.707904
max.iter and min.err are two thresholds; the program stops as soon as either one is met. max.iter defines the maximum number of iterations that the program is allowed to run (default 200). min.err defines the threshold on the mean squared error between the random effects estimated in the current and the previous iteration (default 1e-6).
# Set max.iter and min.err to control the convergence of the program
fit4<-sblgwas(x = intercept, y = phe, z = gen, t = -1, max.iter = 300, min.err = 1e-8)
fit4$parm
##   iter        error      s2     beta       df
## 1   18 4.675823e-09 2.30688 19.70123 5.332911
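The stopping rule described for max.iter and min.err can be illustrated with a generic loop. This is a sketch of the idea only, not the internal code of sblgwas(): `iterate_until_converged` and `toy_update` are hypothetical names, and the toy update is a simple fixed-point iteration standing in for the real re-estimation of random effects.

```r
# Iterate until the mean squared change in the estimates drops to min.err or below,
# or until max.iter iterations have been run, whichever happens first.
iterate_until_converged <- function(update, start, max.iter = 200, min.err = 1e-6) {
  gamma <- start
  iter <- 0
  err <- Inf
  while (iter < max.iter && err > min.err) {
    gamma_new <- update(gamma)
    err <- mean((gamma_new - gamma)^2)  # mean squared difference between iterations
    gamma <- gamma_new
    iter <- iter + 1
  }
  list(gamma = gamma, iter = iter, error = err)
}

# Toy update with fixed point c(2, -2, 0); a stand-in for a real estimation step
toy_update <- function(g) 0.5 * g + c(1, -1, 0)

res_default <- iterate_until_converged(toy_update, start = c(0, 0, 0))
res_capped  <- iterate_until_converged(toy_update, start = c(0, 0, 0), max.iter = 5)

res_default$iter   # stopped by min.err
res_capped$iter    # stopped by max.iter
```

A tighter min.err, as in the fit4 call above, generally yields a smaller final error; a low max.iter can end the run before the error threshold is reached.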