NOTE: Due to the size of the data, this vignette does not contain the output of the code.
This example study makes use of GWA data from the Wellcome Trust
Centre for Human Genetics\(^{1,2,3}\)
(http://mtweb.cs.ucl.ac.uk/mus/www/mouse/index.shtml).
The genotypes from this study were downloaded directly using the
BGLR-R
package. This study contains \(N =\) 1,814 heterogenous stock of mice from
85 families (all descending from eight inbred progenitor strains)\(^{1,2}\), and 131 quantitative traits that
are classified into 6 broad categories including behavior, diabetes,
asthma, immunology, haematology, and biochemistry. Phenotypic
measurements for these mice can be found freely available online to
download (details can be found at http://mtweb.cs.ucl.ac.uk/mus/www/mouse/HS/index.shtml).
In the main text, we focused on 15 hematological phenotypes including:
atypical lymphocytes (ALY; Haem.ALYabs
), basophils (BAS;
Haem.BASabs
), hematocrit (HCT; Haem.HCT
),
hemoglobin (HGB; Haem.HGB
), large immature cells (LIC;
Haem.LICabs
), lymphocytes (LYM; Haem.LYMabs
),
mean corpuscular hemoglobin (MCH; Haem.MCH
), mean
corpuscular volume (MCV; Haem.MCV
), monocytes (MON;
Haem.MONabs
), mean platelet volume (MPV;
Haem.MPV
), neutrophils (NEU; Haem.NEUabs
),
plateletcrit (PCT; Haem.PCT
), platelets (PLT;
Haem.PLT
), red blood cell count (RBC;
Haem.RBC
), red cell distribution width (RDW;
Haem.RDW
), and white blood cell count (WBC;
Haem.WBC
). All phenotypes were previously corrected for
sex, age, body weight, season, year, and cage effects \(^{1,2}\). For individuals with missing
genotypes, we imputed values by the mean genotype of that SNP in their
corresponding family. Only polymorphic SNPs with minor allele frequency
above 5% were kept for the analyses. This left a total of \(J =\) 10,227 autosomal SNPs that were
available for all mice.
In this section, we apply mvMAPIT to individual-level genotypes and 15 hematology traits in a heterogeneous stock of mice dataset from the Wellcome Trust Centre for Human Genetics\(^{1,2,3}\). This collection of data contains approximately \(N =\) 2,000 individuals depending on the phenotype, and each mouse has been genotyped at \(J =\) 10,346 SNPs. Specifically, this stock of mice are known to be genetically related with population structure and the genetic architectures of these particular traits have been shown to have different levels of broad-sense heritability with varying contributions from non-additive genetic effects.
The number of complete samples in the data varies for different
traits and trait pairs. For this study we created separate data sets for
each trait and trait pair containing the genotype data in a genotype
matrix encoded as {0, 1, 2}
(minor allele count) and the
trait or trait pair in a phenotype matrix. Apply mvmapit()
to each data set by running the following.
As a result, we get redundant \(P\)-values for some of the univariate variance components. The statistical detection of epistasis is sensitive to sample size. Therefore, we coalesce the redundant data by keeping the analysis results of the largest data set used in the analysis and impute missing data from the next smaller data set that has no missing data.
The results of the paper data are published on Harvard Dataverse. Find the files for Download here\(^{23}\). For running the code snippets in this vignette, download the two files
and read the files using the following:
We also include results corresponding to the univariate MAPIT model and the covariance test for comparison. Overall, the single-trait marginal epistatic test does only identifies significant variants for the large immature cells (LIC) after Bonferroni correction (\(P = 4.83\times 10^{-6}\)). A complete picture of this can be seen in the following figure, which depicts Manhattan plots of our genome-wide interaction study for all combinations of trait pairs. Here, we can see that most of the signal in the combined \(P\)-values from mvMAPIT likely stems from the covariance component portion of the model.
for_ticks_chr <- aggregate(position ~ chr, mice_data$fisher, function(x) c(first = min(x), last = max(x))) %>%
mutate(tick = floor((position[,"first"] + position[,"last"]) / 2)) %>%
mutate(chr2 = case_when(chr %% 5 == 0 ~ as.character(chr),
chr == 1 ~ as.character(chr),
TRUE ~ ""))
for_facetgrid_row <- as_labeller(c(`1` = "Trait #1", `2` = "Trait #2", `3` = "Covariance", `4` = "Combined"))
gg <- mice_SI_paper$fisher %>% ggplot(aes(
x = position,
y = -log10(pplot),
color = factor(color)
)) +
geom_point_rast(
size = 0.7) +
scale_color_manual(
values = c("#8b8b8b", "#bfbfbf", "#1b9e77")
) +
scale_y_continuous(breaks = c(0, 5, 10),
labels = c("0", "5", ">10")) +
geom_hline(
aes(
yintercept = -log10(5.179737e-06),
linetype = "Bonferroni"
),
color = "#d95f02",
size = 0.3
) +
theme_bw() +
facet_grid(x ~ y) +
theme(
panel.grid.major.x = element_blank(),
legend.position = "bottom",
text = element_text(family = "Times"),
) +
labs(
y = bquote(-log[10](p)),
color = "") +
scale_x_continuous("Chromosome",
breaks = for_ticks_chr$tick,
labels = for_ticks_chr$chr2) +
scale_linetype_manual(name = "", values = c('dashed'))
show(gg)
The hypothesis that most of the signal in the combined \(P\)-values from mvMAPIT likely stems from the covariance component portion of the model holds true for the joint pairwise analysis of hematocrit (HCT) and hemoglobin (HGB) and mean corpuscular hemoglobin (MCH) and mean corpuscular volume (MCV) (e.g., see the third and fourth rows of the following figure).
gg <- mice_HCTHGB_MCVMCH$fisher %>% ggplot(aes(
x = position,
y = -log10(pplot),
color = factor(color)
)) +
geom_point_rast(
size = 0.7) +
scale_color_manual(
values = c("#8b8b8b", "#bfbfbf", "#1b9e77")
) +
scale_y_continuous(breaks = c(0, 5, 10),
labels = c("0", "5", ">10")) +
geom_hline(
aes(
yintercept = -log10(5.179737e-06),
linetype = "Bonferroni"
),
color = "#d95f02",
size = 0.3
) +
theme_bw() +
facet_grid(row ~ case, labeller = labeller(row = for_facetgrid_row)) +
theme(
panel.grid.major.x = element_blank(),
legend.position = "bottom",
text = element_text(family = "Times"),
) +
labs(
y = bquote(-log[10](p)),
color = "") +
scale_x_continuous("Chromosome",
breaks = for_ticks_chr$tick,
labels = for_ticks_chr$chr2) +
scale_linetype_manual(name = "", values = c('dashed'))
show(gg)
One explanation for observing more signal in the covariance components over the univariate test could be derived from the traits having low heritability but high correlation between epistatic interaction effects. In our simulation studies (see publication) we showed that the sensitivity of the covariance statistic increased for these cases. Notably, the non-additive signal identified by the covariance test is not totally dependent on the empirical correlation between traits. Instead, as previously shown in our simulation study, the power of mvMAPIT over the univariate approach occurs when there is correlation between the effects of epistatic interactions shared between two traits. Importantly, many of the candidate SNPs selected by the mvMAPIT framework have been previously discovered by past publications as having some functional nonlinear relationship with the traits of interest. For example, the multivariate analysis with traits MCH and MCV show a significant SNP rs4173870 (\(P = 4.89\times 10^{-10}\)) in the gene hematopoietic cell-specific Lyn substrate 1 (Hcls1) on chromosome 16 which has been shown to play a role in differentiation of erythrocytes\(^7\). Similarly, the joint analysis of HGB and HCT shows hits in multiple coding regions. One example here are the SNPs rs3692165 (\(P = 1.82\times 10^{-6}\)) and rs13482117 (\(P = 8.94\times 10^{-7}\)) in the gene calcium voltage-gated channel auxiliary subunit alpha2delta 3 (Cacna2d3) on chromosome 14, which has been associated with decreased circulating glucose levels\(^8\), and SNP rs3724260 (\(P = 4.58\times 10^{-6}\)) in the gene Dicer1 on chromosome 12 which has been annotated for anemia both in humans and mice\(^9\).
For full analysis, we provide a summary table which lists the combined \(P\)-values after running mvMAPIT with Fisher’s method. The following table lists a select subset of SNPs in coding regions of genes that have been associated with phenotypes related to the hematopoietic system, immune system, or homeostasis and metabolism. Each of these are significant (after correction for multiple hypothesis testing) in the mvMAPIT analysis of related hematology traits. Some of these phenotypes have been reported as having large broad-sense heritability, which improves the ability of mvMAPIT to detect the signal. For example, the genes Arf2 and Cacna2d3 are associated with phenotypes related to glucose homeostasis, which has been reported to have a large heritable component (estimated \(H^2 = 0.3\) for insulin sensitivity\(^{10}\)). Similarly, the genes App and Pex1 are associated with thrombosis where (an estimated) more than half of phenotypic variation has been attributed to genetic effects (estimated \(H^2 \ge 0.6\) for susceptibility to common thrombosis\(^{11}\)).
SNP | Location | Trait 1 | Trait 2 | Trait 1 \(P\) | Trait 2 \(P\) | Cov. \(P\) | Comb. \(P\) | Gene | Genomic Annotation | Reference |
---|---|---|---|---|---|---|---|---|---|---|
rs3699393 | 2:5887012 | MCV | PLT | 0.21 | 0.23 | 5.75e-7 | 4.9e-06 | Upf2 | anemia and abnormal bone marrow cell development | \(^{12}\) |
rs13478092 | 5:3601413 | LIC | PLT | 0.034 | 0.58 | 1.67e-10 | 1.26e-9 | Pex1 | abnormal venous thrombosis | \(^{13}\) |
rs3694887 | 5:102770070 | ALY | LIC | 1.26e-4 | 0.013 | 2.54e-6 | 1.55e-9 | Aff1 | abnormal B and T cell number and morphology | \(^{14}\) |
rs3694887 | 5:102770070 | LIC | PLT | 0.013 | 0.28 | 5.47e-27 | 4.49e-26 | Aff1 | abnormal B and T cell number and morphology | \(^{14}\) |
rs13478923 | 6:99475169 | ALY | LIC | 2.8e-4 | 0.12 | 1.79e-6 | 1.81e-8 | Foxp1 | abnormal B cell differentiation, physiology, count | \(^{15,16}\) |
rs13478924 | 6:99571626 | ALY | LIC | 3.11e-4 | 0.12 | 2.70e-6 | 2.86e-8 | Foxp1 | abnormal B cell differentiation, physiology, count | \(^{15,16}\) |
rs13478985 | 6:115245823 | MCV | WBC | 0.16 | 0.40 | 1.14e-81 | 1.34e-78 | Atg7 | decreased bone marrow cell count | \(^{17}\), \(^{18}\) |
rs3723163 | 11:103800737 | HCT | LYM | 0.072 | 0.30 | 3.99e-107 | 2.66e-104 | Arf2 | decreased fasting circulating glucose level | \(^8\) |
rs3723163 | 11:103800737 | HGB | WBC | 0.069 | 0.25 | 1.85e-7 | 6.76e-7 | Arf2 | decreased fasting circulating glucose level | \(^8\) |
rs3724260 | 12:100163212 | HGB | HCT | 0.030 | 0.062 | 1.44e-5 | 4.58e-6 | Dicer1 | anemia | \(^9\) |
rs3692165 | 14:27756640 | HCT | HGB | 0.026 | 0.037 | 9.9e-6 | 1.8e-06 | Cacna2d3 | decreased circulating glucose level | \(^8\) |
rs3697466 | 14:27485228 | HCT | HGB | 0.026 | 0.037 | 9.9e-6 | 1.8e-06 | Cacna2d3 | decreased circulating glucose level | \(^8\) |
rs13482117 | 14:27614362 | HCT | HGB | 0.023 | 0.03 | 5.9e-6 | 9.0e-07 | Cacna2d3 | decreased circulating glucose level | \(^8\) |
rs6159786 | 14:27820736 | HCT | HGB | 0.026 | 0.037 | 9.9e-6 | 1.8e-06 | Cacna2d3 | decreased circulating glucose level | \(^8\) |
rs6244569 | 14:27044891 | HCT | HGB | 0.026 | 0.037 | 9.9e-6 | 1.8e-06 | Cacna2d3 | decreased circulating glucose level | \(^8\) |
rs13482288 | 14:81840412 | ALY | BAS | 0.036 | 0.65 | 1.78e-8 | 1.1e-07 | Tdrd3 | abnormal B cell differentiation and physiology | \(^{19}\) |
rs3680448 | 14:81934085 | ALY | BAS | 0.036 | 0.65 | 1.78e-8 | 1.1e-07 | Tdrd3 | abnormal B cell differentiation and physiology | \(^{19}\) |
rs4173870 | 16:35764290 | MCH | MCV | 0.14 | 0.71 | 1.20e-11 | 4.89e-10 | Hcls1 | differentiation of erythrocytes | \(^7\) |
rs4212102 | 16:84204704 | PLT | WBC | 0.17 | 0.35 | 1.16e-10 | 2.44e-9 | App | increased susceptibility to induced thrombosis | \(^{20,11}\) |
rs4212186 | 16:84273330 | PLT | WBC | 0.17 | 0.36 | 5.88e-11 | 1.31e-9 | App | increased susceptibility to induced thrombosis | \(^{20,11}\) |
rs3711994 | 19:45078018 | ALY | LYM | 3.71e-4 | 0.10 | 1.04e-12 | 2.80e-14 | Btrc | abnormal lymphocyte morphology | \(^{21}\) |
In the first two columns, we list SNPs and their genetic location
according to the mouse assembly NCBI build 34 (accessed from \(^{21}\)) in the format
Chromosome:Basepair
. Next, we give the results stemming
from univariate analyses on traits 1 and 2, respectively, the covariance
(cov) test, and the overall \(P\)-value
derived by mvMAPIT using Fisher’s method. The last columns detail the
closest neighboring genes found using the Mouse Genome Informatics
database\(^4\) \(^5\) \(^6\), a short summary of the suggested
annotated function for those genes, and the reference to the source of
the annotation.
1: William Valdar, Jonathan Flint, and Richard Mott. Simulating the collaborative cross: power of quantitative trait loci detection and mapping resolution in large sets of recombinant inbred strains of mice. Genetics, 172(3):1783–1797, 2006. ISSN 0016-6731. https://doi.org/10.1534/genetics.104.039313.
2: William Valdar, Leah C. Solberg, Dominique Gauguier, Stephanie Burnett, Paul Klenerman, William O. Cookson, Martin S. Taylor, J. Nicholas P. Rawlins, Richard Mott, and Jonathan Flint. Genome-wide genetic association of complex traits in heterogeneous stock mice. Nature Genetics, 38(8):879–887, 2006. ISSN 1546-1718. https://doi.org/10.1038/ng1840.
3: Wellcome trust centre for human genetics - mouse resources. URL http://mtweb.cs.ucl.ac.uk/mus/www/mouse/index.shtml.
4: Judith A Blake, Richard Baldarelli, James A Kadin, Joel E Richardson, Cynthia L Smith, and Carol J Bult. Mouse genome database (MGD): Knowledgebase for mouse–human comparative biology. Nucleic Acids Research, 49:D981–D987, 2020. ISSN 0305-1048. https://doi.org/10.1093/nar/gkaa1083.
5: Cynthia L. Smith and Janan T. Eppig. The mammalian phenotype ontology: enabling robust annotation and comparative analysis. Wiley Interdisciplinary Reviews. Systems Biology and Medicine, 1(3):390–399, 2009. ISSN 1939-005X. https://doi.org/10.1002/wsbm.44.
6: Mouse Genome Informatics database http://www.informatics.jax.org
7: Karla F. Castro-Ochoa, Idaira M. Guerrero-Fonseca, and Michael Schnoor. Hematopoietic cell specific lyn substrate (HCLS1 or HS1): A versatile actin-binding protein in leukocytes. Journal of Leukocyte Biology, 105(5):881–890, 2019. ISSN 1938-3673. doi: 10.1002/JLB.MR0618-212R.
8: Obtaining and loading phenotype annotations from the international mouse phenotyping consortium (IMPC) database. Mouse Genome Informatics and the International Mouse Phenotyping Consortium, 2014. URL http://www.informatics.jax.org/reference/J:211773.
9: Marc H. G. P. Raaijmakers, Siddhartha Mukherjee, Shangqin Guo, Siyi Zhang, Tatsuya Kobayashi, Jesse A. Schoonmaker, Benjamin L. Ebert, Fatima Al-Shahrour, Robert P. Hasserjian, Edward O. Scadden, Zinmar Aung, Marc Matza, Matthias Merkenschlager, Charles Lin, Johanna M. Rommens, and David T. Scadden. Bone progenitor dysfunction induces myelodysplasia and secondary leukaemia. Nature, 464(7290):852–857, 2010. ISSN 1476-4687. doi: 10.1038/nature08851. URL https://www.nature.com/articles/nature08851. Number: 7290 Publisher: Nature Publishing Group.
10: Jill M. Norris and Stephen S. Rich. Genetics of glucose homeostasis. Arteriosclerosis, Thrombosis, and Vascular Biology, 32(9):2091–2096, 2012. doi: 10.1161/ATVBAHA.112.255463. URL https://www.ahajournals.org/doi/10.1161/ATVBAHA.112.255463. Publisher: American Heart Association.
11: Juan Carlos Souto, Laura Almasy, Montserrat Borrell, Francisco Blanco-Vaca, José Mateo, José Manuel Soria, Inma Coll, Rosa Felices, William Stone, Jordi Fontcuberta, and John Blangero. Genetic susceptibility to thrombosis and its relationship to physiological risk factors: The GAIT study. The American Journal of Human Genetics, 67(6):1452–1459, 2000. ISSN 0002-9297. doi: 10.1086/316903. URL https://www.sciencedirect.com/science/article/pii/S0002929707632145.
12: Joachim Weischenfeldt, Inge Damgaard, David Bryder, Kim Theilgaard-Mönch, Lina A. Thoren, Finn Cilius Nielsen, Sten Eirik W. Jacobsen, Claus Nerlov, and Bo Torben Porse. NMD is essential for hematopoietic stem and progenitor cells and for eliminating by-products of programmed DNA rearrangements. Genes & Development, 22(10):1381–1396, 2008. ISSN 0890-9369, 1549-5477. doi: 10.1101/gad.468808. URL https://genesdev.cshlp.org/content/22/10/1381. Company: Cold Spring Harbor Laboratory Press Distributor: Cold Spring Harbor Laboratory Press Institution: Cold Spring Harbor Laboratory Press Label: Cold Spring Harbor Laboratory Press Publisher: Cold Spring Harbor Lab.
13: Kevin Berendse, Maxim Boek, Marion Gijbels, Nicole N. Van der Wel, Femke C. Klouwer, Marius A. van den Bergh-Weerman, Abhijit Babaji Shinde, Rob Ofman, Bwee Tien Poll-The, Sander M. Houten, Myriam Baes, Ronald J. A. Wanders, and Hans R. Waterham. Liver disease predominates in a mouse model for mild human zellweger spectrum disorder. Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease, 1865(10):2774–2787, 2019. ISSN 0925-4439. doi: 10.1016/j.bbadis.2019.06.013. URL https://www.sciencedirect.com/science/article/pii/S0925443919302121.
14: Patricia Isnard, Nathalie Coré, Philippe Naquet, and Malek Djabali. Altered lymphoid development in mice deficient for the mAF4 proto-oncogene. Blood, 96(2):705–710, 2000. ISSN 0006-4971. doi: 10.1182/blood.V96.2.705. URL https://doi.org/10.1182/blood.V96.2.705.
15: Xiaoming Feng, Gregory C. Ippolito, Lifeng Tian, Karla Wiehagen, Soyoung Oh, Arivazhagan Sambandam, Jessica Willen, Ralph M. Bunte, Shanna D. Maika, June V. Harriss, Andrew J. Caton, Avinash Bhandoola, Philip W. Tucker, and Hui Hu. Foxp1 is an essential transcriptional regulator for the generation of quiescent naive t cells during thymocyte development. Blood, 115(3):510–518, 2010. ISSN 0006-4971. doi: 10.1182/blood-2009-07-232694. URL https://doi.org/10.1182/blood-2009-07-232694.
16: 1451 Hui Hu, Bin Wang, Madhuri Borde, Julie Nardone, Shan Maika, Laura Allred, Philip W. Tucker, and Anjana Rao. Foxp1 is an essential transcriptional regulator of b cell development. Nature Immunology, 7(8):819–826, 2006. ISSN 1529-2916. doi: 10.1038/ni1358. URL https://www.nature.com/articles/ni1358. Number: 8 Publisher: Nature Publishing Group.
17: Monika Mortensen, Elizabeth J. Soilleux, Gordana Djordjevic, Rebecca Tripp, Michael Lutteropp, Elham Sadighi-Akha, Amanda J. Stranks, Julie Glanville, Samantha Knight, Sten-Eirik W. Jacobsen, Kamil R. Kranc, and Anna Katharina Simon. The autophagy protein atg7 is essential for hematopoietic stem cell maintenance. Journal of Experimental Medicine, 208(3):455–467, 2011. ISSN 0022-1007. doi: 10.1084/jem.20101145. URL https://doi.org/10.1084/jem.20101145.
18: Alexander J. Clarke, Ursula Ellinghaus, Andrea Cortini, Amanda Stranks, Anna Katharina Simon, Marina Botto, and Timothy J. Vyse. Autophagy is activated in systemic lupus erythematosus and required for plasmablast development. Annals of the Rheumatic Diseases, 74(5):912–920, 2015. ISSN 0003-4967, 1468-2060. doi: 10.1136/annrheumdis-2013-204343. URL https://ard.bmj.com/content/74/5/912. Publisher: BMJ Publishing Group Ltd Section: Basic and translational research.
19: Yanzhong Yang, Kevin M. McBride, Sean Hensley, Yue Lu, Frederic Chedin, and Mark T. Bedford. Arginine methylation facilitates the recruitment of TOP3b to chromatin to prevent r loop accumu lation. Molecular Cell, 53(3):484–497, 2014. ISSN 1097-2765. doi: 10.1016/j.molcel.2014.01.011. URL https://www.sciencedirect.com/science/article/pii/S1097276514000434.
20: Feng Xu, Judianne Davis, Michael Hoos, andWilliam E. Van Nostrand. Mutation of the kunitz-typeproteinase inhibitor domain in the amyloid-protein precursor abolishes its anti-thrombotic properties in vivo. Thrombosis Research, 155:58–64, 2017. ISSN 0049-3848. doi: 10.1016/j.thromres.2017.05.003. URL https://www.sciencedirect.com/science/article/pii/S0049384817303110.
21: Keiko Nakayama, Shigetsugu Hatakeyama, Shun-ichiro Maruyama, Akira Kikuchi, Kazunori Onoé, Robert A. Good, and Keiichi I. Nakayama. Impaired degradation of inhibitory subunit of NF-b (IkappaB) and -catenin as a result of targeted disruption of the -TrCP1 gene. Proceedings of the National Academy of Sciences, 100(15):8752–8757, 2003. doi: 10.1073/pnas.1133216100. URL https://www.pnas.org/doi/full/10.1073/pnas.1133216100. Publisher: Proceedings of the National Academy of Sciences.
22: Sagiv Shifman, 1480 Jordana Tzenova Bell, Richard R. Copley, Martin S. Taylor, Robert W. Williams, Richard Mott, and Jonathan Flint. A high-resolution single nucleotide polymorphism genetic map of the mouse genome. PLOS Biology, 4(12):e395, 2006. ISSN 1545-7885. doi: 10.1371/journal. pbio.0040395. URL https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.0040395. Publisher: Public Library of Science.
23: Stamp, Julian, 2022, “Leveraging the Genetic Correlation between Traits Improves the Detection of Epistasis in Genome-wide Association Studies”, https://doi.org/10.7910/DVN/WPFIGU, Harvard Dataverse, V1