DiffCorr [1-2] is an R package for identifying changes in correlation patterns between two experimental conditions in correlation networks (e.g., gene co-expression networks). It builds on commonly used association measures such as Pearson’s correlation coefficient. This document demonstrates a typical correlation network analysis using transcriptome and metabolome data.
Introduction
Molecular interactions can be modeled as networks by measuring associations between molecules in omics data. Gene co-expression analysis, commonly based on transcriptome datasets from microarray experiments and RNA-seq, uses metrics like Pearson’s correlation coefficient to quantify these relationships.
Gene pairs whose correlation exceeds a chosen threshold are connected by edges, and the resulting graph is called a co-expression or correlation network. These analyses, often using a “guide-gene” approach [3], offer insights into regulatory mechanisms and have been used to identify genes involved in plant secondary metabolism.
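The thresholding step described above can be sketched in base R. This is only an illustration on simulated data: the gene names, the noise levels, and the cutoff of 0.6 are assumptions chosen for the example, not values taken from DiffCorr.

```r
# Minimal sketch: build a co-expression network by thresholding Pearson
# correlations. Data are simulated; the 0.6 cutoff is an assumption.
set.seed(1)
n_samples <- 30
base <- rnorm(n_samples)
expr <- rbind(
  geneA = base + rnorm(n_samples, sd = 0.3),  # geneA and geneB share a signal
  geneB = base + rnorm(n_samples, sd = 0.3),
  geneC = rnorm(n_samples),                   # geneC and geneD are independent
  geneD = rnorm(n_samples)
)

# cor() correlates columns, so transpose to get gene-by-gene correlations
r <- cor(t(expr), method = "pearson")

threshold <- 0.6
adj <- abs(r) >= threshold   # adjacency matrix: TRUE where an edge exists
diag(adj) <- FALSE           # no self-loops

# Edge list with one row per connected gene pair
idx <- which(adj & upper.tri(adj), arr.ind = TRUE)
edges <- data.frame(
  from = rownames(r)[idx[, "row"]],
  to   = rownames(r)[idx[, "col"]],
  r    = r[idx]
)
print(edges)
```

Using the absolute value keeps strongly negative correlations as edges too; drop `abs()` if only positive co-expression is of interest.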
In addition to identifying differentially expressed genes (DEGs) between samples, changes in correlation patterns, or “differential correlations,” provide insights into molecular interactions [4]. Differential network analysis, which compares networks (e.g., normal vs. diseased), has been applied to both plant and animal studies and has been useful in metabolomics for understanding complex metabolic processes.
This document demonstrates a typical correlation network analysis using transcriptome and metabolome data. It also showcases the utility of the DiffCorr [1-2] package by identifying biologically relevant, differentially correlated molecules in transcriptome co-expression and metabolite-to-metabolite correlation networks.
DiffCorr for Golub’s data (ALL/AML leukemia dataset)
This section was adapted from Additional File 3 included in the original DiffCorr package. As an example, we use Golub’s data (https://coxpress.sourceforge.net/golub.txt). The dataset consists of gene expression profiles from 38 tumor samples covering two leukemia subtypes: 27 acute lymphoblastic leukemia (ALL) and 11 acute myeloid leukemia (AML) samples (Golub et al., 1999). The microarray platform used, Affymetrix GeneChip HuGeneFL (also known as HU6800), contains 6800 probe-sets. To demonstrate the usefulness of the DiffCorr package, we describe and discuss the results of analyzing this transcriptomic dataset.
Reading the Golub dataset
Clusters on each subset
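As a rough illustration of this step, the sketch below clusters simulated molecules with base R's hclust, using 1 minus the Pearson correlation as the distance and average linkage. It is a stand-in under stated assumptions, not the package's own clustering call: the data, the object names, and the choice of distance and linkage are all assumed here.

```r
# Sketch with simulated data: cluster molecules using 1 - Pearson r as the
# distance (an assumption), then cut the tree at height 0.4, i.e. r = 0.6.
set.seed(42)
n_samples <- 27
pattern1 <- rnorm(n_samples)
pattern2 <- rnorm(n_samples)

# Five noisy copies of each pattern: rows are molecules, columns are samples
expr <- rbind(
  t(replicate(5, pattern1 + rnorm(n_samples, sd = 0.2))),
  t(replicate(5, pattern2 + rnorm(n_samples, sd = 0.2)))
)
rownames(expr) <- paste0("mol", 1:10)

d <- as.dist(1 - cor(t(expr), method = "pearson"))
hc <- hclust(d, method = "average")

# Molecules that merge below height 0.4 (average r above 0.6) share a module
groups <- cutree(hc, h = 0.4)
table(groups)
```

On a 1 - r distance scale, a cut height of 0.4 and a correlation cutoff of 0.6 are the same threshold expressed two ways.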
Cut the tree at a correlation of 0.6 (height 0.4, i.e., 1 - 0.6) using the cutree function
g1 <- cutree(hc.mol1, h = 0.4)
g2 <- cutree(hc.mol2, h = 0.4)
## Compute the eigen-molecule of each module
res1 <- get.eigen.molecule(data = golub.df, groups = g1)
#> [1] 1
#> [1] 2
#> [...] (one line per module)
#> [1] 22
res2 <- get.eigen.molecule(data = golub.df, groups = g2)
#> [1] 1
#> [1] 2
#> [...] (one line per module)
#> [1] 87
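One common way to summarize a module by a single representative profile is to take the first principal component of its members. The sketch below shows that idea with base R's prcomp on simulated data. Whether get.eigen.molecule computes exactly this is an assumption, and all object names here are hypothetical.

```r
# Sketch: summarize a module by the first principal component of its members.
# Assumption: this PCA-based summary is a stand-in, not DiffCorr's own code.
set.seed(7)
n_samples <- 38
shared <- rnorm(n_samples)

# Six molecules (rows) that follow a shared pattern across 38 samples (columns)
module <- t(replicate(6, shared + rnorm(n_samples, sd = 0.3)))

# prcomp() expects observations in rows, so samples go in rows
pca <- prcomp(t(module), center = TRUE, scale. = TRUE)
eigen_profile <- pca$x[, 1]                       # one score per sample
var_explained <- pca$sdev[1]^2 / sum(pca$sdev^2)  # share of variance in PC1

# The sign of a principal component is arbitrary, so compare absolute values
abs(cor(eigen_profile, shared))
var_explained
```

A high share of explained variance indicates that the module members move together and that a single profile represents them well.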
Visualizing module networks
You can save the results.
write.modules(g1, res1, outfile = "module1_list.txt")
write.modules(g2, res2, outfile = "module2_list.txt")
You can examine the relationship between modules.
## Report module pairs whose eigen-molecules are strongly correlated (|rho| > 0.8)
for (i in seq_along(res1$eigen.molecules)) {
  for (j in seq_along(res2$eigen.molecules)) {
    r <- cor(res1$eigen.molecules[[i]], res2$eigen.molecules[[j]],
             method = "spearman")
    if (abs(r) > 0.8) {
      print(paste("(i, j): ", i, " ", j, sep = ""))
      print(r)
    }
  }
}
#> [1] "(i, j): 2 8"
#> [1] 0.830303
#> [1] "(i, j): 4 83"
#> [1] 0.8424242
#> [1] "(i, j): 5 86"
#> [1] 0.8787879
#> [1] "(i, j): 10 56"
#> [1] -0.8666667
#> [1] "(i, j): 10 63"
#> [1] -0.8545455
#> [1] "(i, j): 13 47"
#> [1] -0.8060606
#> [1] "(i, j): 13 87"
#> [1] 0.8181818
#> [1] "(i, j): 21 24"
#> [1] -0.9515152
cor(res1$eigen.molecules[[2]], res2$eigen.molecules[[8]], method = "spearman")
#> [1] 0.830303
plot(res1$eigen.molecules[[2]], res2$eigen.molecules[[8]])
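The same pairwise comparison can be done without explicit loops, since cor accepts two matrices and returns all column-by-column correlations at once. The sketch below uses two simulated lists as stand-ins for res1$eigen.molecules and res2$eigen.molecules; the list contents and the planted correlated pair are assumptions made for illustration.

```r
# Sketch: correlate every profile in one list with every profile in another
# in a single cor() call. Both lists are simulated stand-ins.
set.seed(3)
n <- 10
profiles1 <- replicate(4, rnorm(n), simplify = FALSE)
profiles2 <- replicate(5, rnorm(n), simplify = FALSE)
profiles2[[3]] <- -profiles1[[2]] + rnorm(n, sd = 0.01)  # planted negative pair

m1 <- do.call(cbind, profiles1)   # n x 4 matrix, one column per profile
m2 <- do.call(cbind, profiles2)   # n x 5 matrix

# r[i, j] is the Spearman correlation between profile i and profile j
r <- cor(m1, m2, method = "spearman")

# Keep only the strongly correlated pairs, as in the loop version
hits <- which(abs(r) > 0.8, arr.ind = TRUE)
data.frame(i = hits[, "row"], j = hits[, "col"], r = r[hits])
```

The result is a small table of (i, j, r) rows, which is easier to sort or write to a file than printed loop output.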
Examine groups of interest graphically
Look at groups 21 and 24, the pair with the strongest correlation above (r = -0.95)
Exploring the metabolome data of flavonoid-deficient Arabidopsis
Kusano et al. [5] studied flavonoid-deficient Arabidopsis thaliana (Arabidopsis) mutants and wild-type plants using gas chromatography-mass spectrometry (GC-MS) for metabolite profiling [5-6]. The mutant, transparent testa 4 (tt4), lacks chalcone synthase (CHS), a key enzyme in the flavonoid biosynthesis pathway, and is unable to produce flavonoids, which protect plants from UV-B radiation.
AraMetLeaves dataset
AraMetLeaves includes metabolite profiles of 37 aerial part samples, consisting of 17 Columbia-0 wild-type (Col-0) and 20 tt4 plants, covering a wide range of primary metabolites. The dataset AraMetLeaves is available in the DiffCorr package.
The AraMetLeaves dataset contains 59 metabolites (rows) and 50 observations (columns). For comparison with data from aerial parts [5-6], we selected 59 commonly detected metabolites across both datasets using MetMask (https://metmask.sourceforge.net). Note that another genotype, mto1, is also present in the data matrix; its 13 samples account for the remaining columns. For further information, refer to the help page of AraMetLeaves.
colnames(AraMetLeaves)
#> [1] "Col0.1" "Col0.2" "Col0.3" "Col0.4" "Col0.5" "Col0.6" "Col0.7"
#> [8] "Col0.8" "Col0.9" "Col0.10" "Col0.11" "Col0.12" "Col0.13" "Col0.14"
#> [15] "Col0.15" "Col0.16" "Col0.17" "tt4.1" "tt4.2" "tt4.3" "tt4.4"
#> [22] "tt4.5" "tt4.6" "tt4.7" "tt4.8" "tt4.9" "tt4.10" "tt4.11"
#> [29] "tt4.12" "tt4.13" "tt4.14" "tt4.15" "tt4.16" "tt4.17" "tt4.18"
#> [36] "tt4.19" "tt4.20" "mto1.1" "mto1.2" "mto1.3" "mto1.4" "mto1.5"
#> [43] "mto1.6" "mto1.7" "mto1.8" "mto1.9" "mto1.10" "mto1.11" "mto1.12"
#> [50] "mto1.13"
?AraMetLeaves
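Because the genotype is encoded in each column name, the column indices for each group can be derived from the names rather than typed by hand. The sketch below rebuilds the sample names shown above so that it runs without the package; with DiffCorr loaded, colnames(AraMetLeaves) would take their place. The variable names are hypothetical.

```r
# Sketch: derive per-genotype column indices from the sample names.
# The names are rebuilt here only to keep the example self-contained.
sample_names <- c(paste0("Col0.", 1:17),
                  paste0("tt4.", 1:20),
                  paste0("mto1.", 1:13))

# Strip the trailing ".<replicate number>" to get the genotype label
genotype <- sub("\\.[0-9]+$", "", sample_names)

# Named list of column indices, one element per genotype
cols <- split(seq_along(sample_names), genotype)

lengths(cols)
range(cols$Col0)
range(cols$tt4)
```

Indexing with cols$Col0 and cols$tt4 avoids silent errors if the column order of the matrix ever changes.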
Differential correlation analysis for tt4 mutant and the wild-type plants
Differential correlation analysis between tt4 and Col-0 can be performed as follows:
comp.2.cc.fdr(output.file = "Met_DiffCorr_res.txt",
              log10(AraMetLeaves[, 1:17]),  ## Col-0 (17 samples)
              log10(AraMetLeaves[, 18:37]), ## tt4 (20 samples)
              method = "pearson",
              threshold = 1.0, save = TRUE)
As indicated in the ASCII result file “Met_DiffCorr_res.txt,” the DiffCorr package identified significant differential correlations between sinapate and aromatic metabolites in tt4 and wild-type plants. Consistent with previous findings [2], aromatic metabolites in the shikimate pathway, specifically sinapate, phenylalanine, and tyrosine, exhibited significant correlations in tt4 but not in wild-type plants (Table 1). This suggests a connection to the role of sinapoyl-malate in protecting the flavonoid-deficient tt4 mutant against UV-B irradiation [5]. Our results indicate that Arabidopsis compensates for a deficiency in either flavonoid or sinapoyl-malate production by over-accumulating alternative protective compounds [7]. These findings suggest that DiffCorr is applicable not only to transcriptomic data but also to other post-genomic data types, including metabolomic data.
Table 1. A typical result of pairwise differential correlations from the DiffCorr package. The full list can be found in [1].
molecule X | molecule Y | r1 | p1 | r2 | p2 | p (difference) | (r1-r2) | lfdr (in cond. 1) | lfdr (in cond. 2) | lfdr (difference) |
---|---|---|---|---|---|---|---|---|---|---|
Malate | Threonine | 0.77 | 0.00034 | 0.94 | 1.5E-09 | 0.057 | -0.17 | 0.0049 | 3.2E-08 | 0.76 |
Malate | Phenylalanine | 0.45 | 0.070 | 0.89 | 1.2E-07 | 0.0086 | -0.44 | 0.20 | 2.2E-06 | 0.64 |
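The p (difference) column can be understood through Fisher's z-test, which the Conclusion names as the underlying test: each correlation is transformed with atanh, and the difference is scaled by a standard error that depends only on the two sample sizes. The sketch below is a minimal base R version under that description. The function name fisher_z_diff is hypothetical and this is not the package's own implementation; the inputs are the rounded Malate and Phenylalanine values from Table 1, with 17 Col-0 and 20 tt4 samples.

```r
# Sketch of Fisher's z-test for the difference between two correlations.
# fisher_z_diff is a hypothetical helper, not a DiffCorr function.
fisher_z_diff <- function(r1, n1, r2, n2) {
  z1 <- atanh(r1)                          # Fisher transformation of r1
  z2 <- atanh(r2)                          # Fisher transformation of r2
  se <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of z1 - z2
  z <- (z1 - z2) / se
  p <- 2 * pnorm(-abs(z))                  # two-sided p-value
  c(z = z, p = p)
}

# Malate vs. Phenylalanine: r1 = 0.45 (Col-0, n = 17), r2 = 0.89 (tt4, n = 20)
res <- fisher_z_diff(r1 = 0.45, n1 = 17, r2 = 0.89, n2 = 20)
res
```

Because the table reports correlations rounded to two decimals, the p-value from this sketch should be close to, but not identical to, the tabulated p (difference).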
Conclusion
The R package DiffCorr provides a straightforward and efficient framework for detecting differential correlations between two conditions in omics data, utilizing Fisher’s z-test. It is a useful tool for inferring potential relationships and identifying biomarker candidates. Based on the concept of “differential network biology,” DiffCorr [1, 5] is applicable not only to metabolomic data but also to transcriptome, proteome, and integrated omics datasets.
References
[1] Fukushima, Gene (2013) https://doi.org/10.1016/j.gene.2012.11.028
[2] Fukushima and Nishida, “Using the DiffCorr Package to Analyze and Visualize Differential Correlations in Biological Networks”, in “Challenges of Computational Network Analysis with R”, eds. Matthias Dehmer, Yongtang Shi, and Frank Emmert-Streib. Wiley.
[3] Saito et al., Trends Plant Sci (2008) https://doi.org/10.1016/j.tplants.2007.10.006
[4] de la Fuente, Trends Genet (2010) https://doi.org/10.1016/j.tig.2010.05.001
[5] Kusano et al., BMC Syst Biol (2007) https://doi.org/10.1186/1752-0509-1-53
[6] Fukushima et al., BMC Syst Biol (2011) https://doi.org/10.1186/1752-0509-5-1
[7] Kusano et al., Plant J (2011) https://doi.org/10.1111/j.1365-313x.2011.04599.x