This vignette illustrates Amanida
R package, which
contains a collection of functions for computing a weighted
meta-analysis in R using significance, relative change and study size.
It explains how to conduct the meta-analysis, visualize the results and
draw conclusions.
The widespread of metabolomics as a potential tool for clinical diagno-sis has increased the systematic reviews and meta-analysis on this topic. Meta-analysis is the statistical combination in a single estimate for results from primary studies answering the same question, which is a common practice in medical research. When raw data is not available to perform a meta-analysis, there are different approaches that can be applied but them require the standard deviation, that is used for effect size estimate calculation and weighted methods. Nowadays, there isn’t a tool to perform a weighted meta-analysis which could be applied to a dataset of overall results without the standard deviation. Meta-analysis tools base the study of effect size using the difference of means, but in case of metabolomics we found that the comparison between groups is disclosed using relative change. The purpose of amanida is to perform a weighted meta-analysis combining overall results based on statistical significance, relative change and study size. In Amanida, statistical significance and relative change are combined using weighted methods which do no require standard deviation, weighted Fisher for p-values and weighted average for fold-change, where the weight comes from study size.
Furthermore, Amanida also computes qualitative meta-analysis performing a vote-counting for compounds, including the option of only using identifier and trend labels.
We have included an automatized id harmonization step based in Villalba H, Llambrich M, Gumà J, Brezmes J, Cumeras R. A Metabolites Merging Strategy (MMS): Harmonization to Enable Studies’ Intercomparison. Metabolites. 2023; 13(12):1167. https://doi.org/10.3390/metabo13121167. Briefly, all ids are converted to a unique one, in this case the PubChem ID and then checked for duplicates. It also allows to retrieve multiple descriptors such as Molecular Formula, Molecular Weight or InChIKey.
Here we show how to conduct a meta-analysis using Amanida approach. We are using a dataset obtained in systematic review and meta-analysis: Comprehensive Volatilome and Metabolome Signatures of Colorectal Cancer in Urine: A Systematic Review and Meta-Analysis Mallafré et al. Cancers 2021, 13(11), 2534; https://doi.org/10.3390/cancers13112534.
Before using Amanida
is important the curation of the
identifiers. The objective of a meta-analysis is to combine the
different results of a same feature, therefore, it is important that all
identifiers are in the same format.
Dataset to analyse must include the following columns: identifier, p-value, fold-change, study size (N) and reference.
Once we have the dataset in a supported format (xls/xlsx, csv or txt), we import as follows:
coln = c("Compound Name", "P-value", "Fold-change", "N total", "References")
input_file <- getsampleDB()
datafile <- amanida_read(input_file, mode = "quan", coln, separator=";")
#> Loaded dataset with 143 rows which contains 120 different identifiers. There are 18 rows skipped from original dataset because contained NA values.
It creates a tibble which contain the data needed to perform the amanida meta-analysis. Note: missing data is ignored and negative values of fold-change are transformed to positive (1/value).
Before the meta-analysis the IDs can be checked using public databases information. The IDs in format chemical name, InChI, InChIKey, and SMILES are searched in PubChem to transform all into a common nomenclature, PubChem ID. Then duplicates are searched.
In this step will be performed the combination of overall results weightening by study size. Methods for combine results are:
P-value: weighted p-values combination, which is a variant of Fisher’s method. A gamma distribution is used to assign non-integral weights proportional to study size to each p-value.
Fold-change: logarithmic transformation (base 2) and then averaged with weighting by study size.
Note: Only with the object created by
amanida_read
function in mode = "quan"
are
able to compute the amanida meta-analysis.
It is also included a qualitative analysis, the vote-counting, that is computed by the sum of votes assigned as follows: votes are +1 for up-regulation, -1 for down-regulation and 0 if no trend.
If you select the option comp.inf = T
the package will
retrieve the PubChem ID from the ID using webchem
. Results
will return the following information: PubChem ID, Molecular Formula,
Molecular Weight, SMILES, InChIKey, KEGG, ChEBI, HMDB, Drugbank.
Combination results are disclosed in two tables
amanida_result@stat
amanida_results@vote
Amanida meta-analysis gives the combination for p-value and fold-change for each feature.
Trend variable correspond to the direrction of the combination, 1 means that the feature is increased in case group and -1 the opposite. N_total is the sum of the study sizes for that feature, which are disclosed in reference variable.
head(amanida_result@stat)
#> # A tibble: 6 × 6
#> # Groups: id [6]
#> id trend pval fc N_total reference
#> <chr> <dbl> <dbl> <dbl> <int> <chr>
#> 1 1,1,6-Trimethyl-1,2-dihydronaphthalene -1 0.000391 0.224 60 [29] Por…
#> 2 1,3-Dithiane 1 0.0189 3.43 60 [29] Por…
#> 3 1,5,5-Trimethyl-6-methylene-cyclohexene -1 0.0342 0.596 60 [29] Por…
#> 4 1,6,7-Trimethylnaphthalene -1 0.000194 0.320 60 [29] Por…
#> 5 2,2,6-Trimethylcyclohexanone -1 0.0298 0.348 60 [29] Por…
#> 6 2,3-Butanediol -1 0.000091 0.271 135 [43] Lie…
To observe the results of meta-analysis graphically is done with a volcano plot, where the log10(p-values) are plotted against the log2(fold-change). The cut-offs can be selected by the user, in case of fold-change we recommend values higher than 2, where it is considered to have biological meaningfulness.
Note: cut-off have to be indicated without log scale.
volcano_plot(amanida_result, cutoff = c(0.05,3.5))
#> The cut-off used are 0.05 for p-value (1.3 in log10 scale) and 3.5 for fold-change (1.81 in log2 scale).
We will consider that a feature have statistical significance if it is over the cut-offs threshold (dashed lines).
Optionally we van observe the qualitative results. A bar plot shows the result of vote-counting.
Note: output is restricted to 30 compounds to
facilitate the readability. To visualize less compounds increase
counts
parameter.
vote_plot(amanida_result, counts = 1)
#> Cut-off for votes is 1.
#> Too much values, only showing 30 highest values. Please check counts parameter.
Note: The output include compound identifiers from
public repositories such as PubChem or the Human Metabolite DataBase, if
compounds are not used this could be avoided with
comp.inf = F
.
With vote plot discrepancies in compounds behaviour are not detected
at first glance, and we suggest to combine the results with the explore
plot. Discrepancies in results are shown when type = "mix"
is used:
Note: output is restricted to 25 compounds to
facilitate the readability. To visualize less compounds increase
counts
parameter.
In sample data we observe some compounds over the cut-off threshold, then we check the consistency of result. In this case, compounds with more than one report and statistical significance are Hippuric acid and Phenol. Both of them have consistency as all reports results are in the same trend. We can conclude that for Hippuric acid and Phenol have evidences to reject the null hypothesis.
Note: compounds with both up-regulated and down-regulated tendencies points to low consistency.
All computations and functions discussed above can be obtained
straightforward in an html document. From scratch
amanida_report
computes the meta-analysis and display the
results graphically.
It needs the following parameters:
column_id = c("Compound Name", "P-value", "Fold-change", "N total", "References")
input_file <- getsampleDB()
amanida_report(input_file,
separator = ";",
analysis_type = "quan-qual",
column_id = column_id,
pvalue_cutoff = 0.05,
fc_cutoff = 4,
votecount_lim = 2,
comp_inf = F)
The report will be saved in the working directory.
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] amanida_0.3.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.0 xfun_0.39 bslib_0.4.1 purrr_1.0.1
#> [5] colorspace_2.1-0 vctrs_0.6.2 generics_0.1.3 htmltools_0.5.3
#> [9] viridisLite_0.4.2 yaml_2.3.6 utf8_1.2.3 rlang_1.1.1
#> [13] jquerylib_0.1.4 pillar_1.9.0 withr_2.5.0 glue_1.6.2
#> [17] bit64_4.0.5 readxl_1.4.1 lifecycle_1.0.3 stringr_1.5.0
#> [21] munsell_0.5.0 gtable_0.3.3 cellranger_1.1.0 rvest_1.0.3
#> [25] data.tree_1.0.0 kableExtra_1.3.4 evaluate_0.21 labeling_0.4.2
#> [29] knitr_1.45 tzdb_0.3.0 fastmap_1.1.0 parallel_4.1.0
#> [33] fansi_1.0.4 highr_0.9 Rcpp_1.0.10 readr_2.1.3
#> [37] scales_1.2.1 cachem_1.0.6 vroom_1.6.0 webshot_0.5.4
#> [41] webchem_1.2.0 jsonlite_1.8.4 farver_2.1.1 bit_4.0.5
#> [45] systemfonts_1.0.4 ggplot2_3.4.2 hms_1.1.2 digest_0.6.31
#> [49] stringi_1.7.12 dplyr_1.1.2 ggrepel_0.9.3 grid_4.1.0
#> [53] cli_3.6.1 tools_4.1.0 magrittr_2.0.3 sass_0.4.2
#> [57] tibble_3.2.1 crayon_1.5.2 tidyr_1.3.0 pkgconfig_2.0.3
#> [61] ellipsis_0.3.2 xml2_1.3.3 rmarkdown_2.25 svglite_2.1.0
#> [65] httr_1.4.4 rstudioapi_0.14 R6_2.5.1 compiler_4.1.0