A typical scenario of transcriptome data analysis is identification of differentially expressed genes (DEGs), those with significant changes in the number of transcripts (fold-change), and functional enrichment analysis of DEGs lists using Gene Ontology. Classical gene set enrichment analysis ignores the difference in the fold-changes that lead to the loss of valuable information. The FoldGO method has been created to identify the GO terms significantly overrepresented for the genes responded to the factor within a narrow fold-change-interval. FoldGO processes the DEGs list in three steps:
See an example of the FoldGO performance in the analysis of the transcriptome data on expression changes of Arabidopsis thaliana genes in response to plant hormone auxin treatment (Omelyanchuk et al., 2017)1.
FoldGO pipeline consists of three steps:
As input data the algorithm uses the tables for up- and down-regulated genes that contain Gene IDs and their fold-change values:
GeneID | fold-change |
---|---|
AT3G65420 | 3.6 |
AT1G78450 | 1.5 |
AT2G66890 | 2.1 |
… |
First, one has to separate initial set of genes in to quantiles and generate unions of all neighbouring quantiles. For example, we will use built-in data derived from experiment on auxin treatment of Arabidopsis thaliana roots. GeneGroups function will take only first two columns, so be sure that your data have gene identifiers in the first column and fold-change values (FC) in the second one. Note that data for up- and down-regulation must be processed separately. In the following example demonstrating GeneGroups function usage only the data for up-regulation is used.
GeneID | FC | pval | qval |
---|---|---|---|
AT1G01120 | 2.069 | 2.70e-10 | 2.60e-08 |
AT1G02520 | 1.641 | 7.79e-05 | 2.06e-03 |
AT1G02730 | 1.290 | 3.37e-05 | 9.95e-04 |
AT1G02850 | 1.978 | 2.50e-06 | 1.01e-04 |
AT1G02900 | 1.138 | 1.22e-03 | 1.95e-02 |
AT1G03110 | 2.686 | 8.44e-05 | 2.22e-03 |
At the next step we will conduct functional enrichment analysis of generated groups. For functional enrichment analysis FoldGO uses TopGO package.
For custom annotation one has to provide GO id -> Gene id list or object of MgsaSets
(from mgsa
package) or GAFReader
class. FoldGO provides GAFReader - simple and convinient parser for annotation presented in GAF format. It takes only two arguments:
NOTE: The GAF file in example is truncated. The original full file can be downloaded from GO consortium website: http://www.geneontology.org/page/download-go-annotations.
gaf_path <- system.file("extdata", "gene_association.tair.lzma", package = "FoldGO")
gaf <- GAFReader(file = gaf_path, geneid_col = 10)
One can retrieve direct annotations as list object and version of GAF file using following methods:
To annotate our gene groups we will use FuncAnnotGroupsTopGO function which uses topGO package for singular enrichment analysis. The minimal set of arguments needed for this function to work is:
Another possibility is to use bioconductor packages containing annotations for specific organism. For example “org.Hs.eg.db” (Human) and “org.At.tair.db” (Arabidopsis), package name must be assigned to mapping
argument. In this case one has to assign topGO::annFUN.org
to annot
argument and specify ID
(from topGO package manual: character string specifing the gene identifier to use). Currently only the following identifiers can be used: c("entrez", "genbank", "alias", "ensembl", "symbol", "genename", "unigene")
.
Enrichment analysis with Arabidopsis annotations:
up_annotobj <- FuncAnnotGroupsTopGO(up_groups,"MF", mapping = "org.At.tair.db", annot = topGO::annFUN.org, ID = "entrez", bggenes = bggenes)
Enrichment analysis with annotations for Human:
The fold-specificity recognition procedure consists of GO terms preselection from DEGs annotation and fold-change-specific enrichment analysis. At each step the FDR threshold must be established. By default FDR threshold for GO terms preselection (fdrstep1) is set to 1 (no preselection) and FDR threshold for fold-change-specific enrichment analysis (fdrstep2) is set to 0.05. As a default method for mutltiple testing correction FoldGO uses Benjamini-Hochberg correction procedure2.
up_fsobj <- FoldSpecTest(up_annotobj, fdrstep1 = 0.05, fdrstep2 = 0.01)
down_fsobj <- FoldSpecTest(down_annotobj, fdrstep1 = 0.05, fdrstep2 = 0.01)
It is possible to choose another correction procedure from R base which can be listed via p.adjust.methods
. Here Benjamini-Yekutieli correction procedure3 is selected.
One can inspect the results of enrichment analysis as dataframes. Access dataframe with fold-specific terms
ids | namespace | name | wholeint_pval | wholeint_padj | min_pval | padj | interval |
---|---|---|---|---|---|---|---|
GO:0003723 | MF | RNA binding | 1.80e-07 | 2.71e-05 | 7.28e-05 | 9.26e-03 | 1-4 |
GO:0003735 | MF | structural constituent of ribosome | 1.00e-30 | 1.13e-27 | 2.98e-27 | 2.32e-24 | 1-2 |
GO:0003743 | MF | translation initiation factor activity | 6.50e-05 | 4.80e-03 | 8.31e-05 | 9.26e-03 | 1-2 |
GO:0005198 | MF | structural molecule activity | 1.00e-30 | 1.13e-27 | 1.94e-26 | 7.57e-24 | 1-2 |
GO:0008135 | MF | translation factor activity, RNA binding | 1.80e-07 | 2.71e-05 | 2.56e-08 | 6.67e-06 | 1-2 |
GO:0097159 | MF | organic cyclic compound binding | 2.90e-07 | 3.88e-05 | 3.89e-06 | 6.07e-04 | 1-5 |
where:
And with not fold-specific:
ids | namespace | name | wholeint_pval | wholeint_padj | min_pval | padj | interval |
---|---|---|---|---|---|---|---|
GO:0000166 | MF | nucleotide binding | 3.70e-07 | 4.17e-05 | 3.30e-04 | 2.58e-02 | 1-5 |
GO:0003697 | MF | single-stranded DNA binding | 5.20e-04 | 3.01e-02 | 9.43e-02 | 1.00e+00 | 1-4 |
GO:0003746 | MF | translation elongation factor activity | 2.40e-04 | 1.46e-02 | 2.11e-04 | 2.06e-02 | 1 |
GO:0003883 | MF | CTP synthase activity | 3.90e-04 | 2.31e-02 | 2.26e-01 | 1.00e+00 | 4-5 |
GO:0004386 | MF | helicase activity | 6.50e-08 | 1.13e-05 | 2.32e-02 | 6.04e-01 | 3 |
GO:0005488 | MF | binding | 6.00e-06 | 5.41e-04 | 6.48e-04 | 4.50e-02 | 1-5 |
Via plot
function one can plot “Fold-change specific GO Profile” on which the GO terms significantly associated with a certain fold-change intervals are presented in yellow and blue boxes for up- and down-regulated genes, correspondingly. Here the result for six equal in size fold-change intervals is presented. The diagram presents only fold-change-specific terms. If the gene was associated fold-specifically with down-regulation but not fold-specifically with up-regulation (or vise versa) than not fold-change-specific interval (1-6 here) will be also shown.
Omelyanchuk, N. A. et al. Auxin regulates functional gene groups in a fold-change-specific manner in Arabidopsis thaliana roots // Nat Sci Rep – 2017. – N 7(1) – p 2489↩︎
Benjamini, Y., Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society - 1995 - Series B. 57 (1): 289–300↩︎
Benjamini, Y., Yekutieli. The control of the false discovery rate in multiple testing under dependency. Annals of Statistics - 2001 - 29 (4): 1165–1188. doi:10.1214/aos/1013699998↩︎