You can’t even begin to understand biology, you can’t understand life, unless you understand what it’s all there for, how it arose - and that means evolution. — Richard Dawkins
If you use ggtree in published research, please cite:
G Yu, D Smith, H Zhu, Y Guan, TTY Lam, ggtree: an R package for visualization and annotation of phylogenetic tree with different types of meta-data. revised.
This project arose from our need to annotate nucleotide substitutions on a phylogenetic tree; we found no tree visualization software that could do this easily. Existing tree viewers are designed for displaying phylogenetic trees, not for annotating them. Although some tree viewers can display bootstrap values on the tree, it is hard or impossible to display other information. Our first solution for displaying nucleotide substitutions was to add this information to the node/tip names and use a traditional tree viewer to show it. We displayed the information on the tree successfully, but we believe this indirect approach is inefficient.
Previously, phylogenetic trees were much smaller, and annotating them was less necessary than it is now, when much more data is becoming available. We want to associate our experimental data, for instance antigenic change, with evolutionary relationships. Visualizing these associations on a phylogenetic tree can help us identify evolutionary patterns. We believe we need a next-generation tree viewer that is programmable and extensible: one that can display a phylogenetic tree as easily as classical software does and that supports adding annotation data in layers above the tree. This is the objective of developing ggtree. Common annotation tasks should be easy, and complicated tasks should be achievable by adding multiple layers of annotation.
ggtree is designed by extending the ggplot2 [1] package. It is based on the grammar of graphics and takes all the good parts of ggplot2. Other R packages implement tree viewers using ggplot2, including OutbreakTools, phyloseq [2] and ggphylo; they mostly create complex tree view functions for their specific needs. Internally, these packages interpret a phylogenetic tree as a collection of lines, which makes it hard to annotate the tree with diverse user input related to nodes (taxa). ggtree differs by interpreting a tree as a collection of taxa, which allows flexible annotation of a phylogenetic tree with diverse types of user input.
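To make the "collection of taxa" idea concrete, here is a minimal sketch, assuming the ape and ggtree packages are installed. The four-taxon tree is made up for illustration. It shows that the object returned by ggtree() is an ordinary ggplot object whose data holds one row per node, so ggplot2 layers can be added to it with +.

```r
library(ape)
library(ggtree)

# Hypothetical four-taxon tree, used only for illustration
tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")

p <- ggtree(tree)

# The tree view is a regular ggplot object ...
is_ggplot <- inherits(p, "ggplot")

# ... and its data has one row per node (4 tips + 3 internal nodes)
node_table <- p$data
n_nodes <- nrow(node_table)
n_tips <- sum(node_table$isTip)

# Any ggplot2 layer, scale or theme can be added on top of the tree
p2 <- p + ggplot2::ggtitle("A tree view is a ggplot object")
```

Because every row of the plot data is a node, user data keyed by node or tip label can be joined to it and mapped to aesthetics in later layers.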
Most tree viewer software (including R packages) focuses on the Newick and Nexus file formats, while other evolutionary analysis software produces file formats that contain supporting evidence ready for annotating a phylogenetic tree. In addition to Newick and Nexus, ggtree supports the NHX, jplace and Phylip file formats. ggtree also supports output from BEAST [3], EPA [4], HYPHY [5], PAML [6], PHYLDOG [7], pplacer [8], r8s [9], RAxML [10] and RevBayes [11].
Parsing data from this range of molecular evolution software serves not only visualization in ggtree but also brings these data to R users for further analysis (e.g. summarization, visualization, comparison, testing, etc.).
For more details, please refer to the Tree Data Import vignette.
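As a rough illustration of importing tree data, the sketch below parses a Newick string with ape and an NHX string with read.nhx. It rests on two assumptions: both strings are made up, and read.nhx is taken from the treeio package, where the parsers live in recent releases (in the ggtree version listed in the session information they were exported by ggtree itself).

```r
library(ape)
library(treeio)

# Plain Newick: topology and branch lengths only (hypothetical tree)
nwk <- "((A:0.1,B:0.2):0.3,(C:0.1,D:0.4):0.2);"
tree <- read.tree(text = nwk)

# NHX: Newick plus per-node annotations in [&&NHX:key=value] tags
# (hypothetical tree; S is a species tag)
nhx_text <- paste0(
  "((A:0.1[&&NHX:S=human],B:0.2[&&NHX:S=chimp]):0.3[&&NHX:S=primates],",
  "C:0.5[&&NHX:S=mouse]);"
)
nhx <- read.nhx(textConnection(nhx_text))

# The parsed object keeps the tree and the annotations side by side
nhx_tree <- nhx@phylo
nhx_data <- nhx@data
```

The annotation table in nhx_data can then be summarized or tested in R like any other data frame, independently of plotting.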
Tree visualization in ggtree is easy, with a one-line command: ggtree(tree_object). It supports several layouts, including rectangular, slanted and circular for phylograms and cladograms, an unrooted layout, and time-scaled and two-dimensional phylogenies. The Tree Visualization vignette describes these features in detail.
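The layouts listed above might be selected as in the following sketch. It assumes ape and ggtree are installed and uses a random ten-tip tree purely as example input. The layout and branch.length arguments are the ones documented for ggtree(); the variable names are illustrative.

```r
library(ape)
library(ggtree)

# Random tree used only as example input
set.seed(1)
tree <- rtree(10)

# Phylograms: branch lengths are drawn to scale
p_rect  <- ggtree(tree)                       # rectangular (default)
p_slant <- ggtree(tree, layout = "slanted")
p_circ  <- ggtree(tree, layout = "circular")

# Cladogram: ignore branch lengths and show topology only
p_clado <- ggtree(tree, branch.length = "none")

# Unrooted layout
p_unroot <- ggtree(tree, layout = "unrooted")
```

Printing any of these objects (for example print(p_circ)) draws the corresponding view.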
We implement several functions to manipulate a phylogenetic tree: the groupClade and groupOTU functions and the collapse, expand, scaleClade, rotate and flip functions.
Details and examples can be found in the Tree Manipulation vignette.
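A minimal sketch of how these manipulation functions might be called is given below, assuming ape and ggtree are installed. The tree is hypothetical, and internal nodes are looked up with ape's getMRCA rather than hard-coded. Several of these names (collapse, expand, rotate) also exist in other packages, so they are called with an explicit ggtree:: prefix here.

```r
library(ape)
library(ggtree)

# Hypothetical four-taxon tree
tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")

# Internal nodes of the two clades, found from their tips
node_ab <- getMRCA(tree, c("A", "B"))
node_cd <- getMRCA(tree, c("C", "D"))

# Group a clade, then colour branches by the resulting group variable
grouped <- groupClade(tree, node_ab)
p_group <- ggtree(grouped, aes(color = group))

p <- ggtree(tree) + geom_tiplab()

p_collapsed <- ggtree::collapse(p, node = node_ab)          # hide a clade
p_expanded  <- ggtree::expand(p_collapsed, node = node_ab)  # show it again
p_scaled    <- scaleClade(p, node = node_ab, scale = 0.5)   # shrink a clade
p_rotated   <- ggtree::rotate(p, node = node_ab)            # rotate a clade
p_flipped   <- flip(p, node_ab, node_cd)                    # swap sister clades
```

Each call takes a tree view and returns a new one, so the operations can be chained before adding annotation layers.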
Most phylogenetic trees are scaled by evolutionary distance (substitutions/site). In ggtree, a phylogenetic tree can be re-scaled by any numerical variable inferred by evolutionary analysis (e.g. species divergence time, dN/dS, etc.). Numerical and categorical variables can be used to color a phylogenetic tree.
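The following sketch shows one way re-scaling and coloring could look. It assumes the treeio package is installed and uses the BEAST example file it ships with (extdata/BEAST/beast_mcc.tree), whose annotations include a numerical rate variable. Reading the file through treeio is an assumption tied to recent releases.

```r
library(treeio)
library(ggtree)

# Example BEAST output shipped with treeio (assumed to be installed)
beast_file <- system.file("extdata/BEAST", "beast_mcc.tree", package = "treeio")
beast <- read.beast(beast_file)

# Colour branches by a numerical variable inferred by the analysis
p_colored <- ggtree(beast, aes(color = rate))

# Re-scale branch lengths by the same variable, then plot the result
rescaled <- rescale_tree(beast, branch.length = "rate")
p_rescaled <- ggtree(rescaled)
```

A categorical variable can be mapped to color in the same way, by naming it inside aes().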
The ggtree package provides several layers to annotate a phylogenetic tree, including geom_tiplab for adding tip labels, geom_treescale for adding a legend of tree scale, geom_hilight for highlighting selected clades and geom_cladelabel for labelling selected clades.
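These four layers could be stacked as in the sketch below, assuming ape and ggtree are installed. The tree, colors and clade label text are made up for illustration.

```r
library(ape)
library(ggtree)

# Hypothetical four-taxon tree
tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")
node_ab <- getMRCA(tree, c("A", "B"))
node_cd <- getMRCA(tree, c("C", "D"))

base <- ggtree(tree)

p <- base +
  geom_tiplab() +                                        # tip labels
  geom_treescale() +                                     # tree scale legend
  geom_hilight(node = node_ab, fill = "steelblue",
               alpha = 0.3) +                            # highlight a clade
  geom_cladelabel(node = node_cd, label = "clade CD",
                  offset = 0.3)                          # label a clade
```

Each geom is an independent layer, so any of them can be dropped or repeated for other nodes without touching the rest of the plot.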
It supports annotating phylogenetic trees with analyses obtained from R packages and other commonly used evolutionary software. User-specific annotation (e.g. experimental data) can also be integrated to annotate phylogenetic trees. ggtree provides the write.jplace function to combine a Newick tree file and the user's own data into a single jplace file, which can then be parsed so that the data can be used to annotate the tree directly in ggtree.
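As a sketch of attaching user data to a tree, the example below uses ggtree's %<+% operator, which joins a data frame to a tree view by tip label. This is a different route from the write.jplace workflow described above, not a demonstration of it. The tree and the host column are hypothetical.

```r
library(ape)
library(ggtree)

# Hypothetical tree and experimental data; the first column of the
# data frame must hold the tip labels
tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")
user_data <- data.frame(
  label = c("A", "B", "C", "D"),
  host  = c("avian", "avian", "swine", "human")
)

# Attach the data to the tree view, then map a column to an aesthetic
p <- ggtree(tree) %<+% user_data +
  geom_tippoint(aes(color = host)) +
  geom_tiplab(offset = 0.05)

# The attached column is now part of the plot data
has_host <- "host" %in% colnames(p$data)
```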
ggtree integrates the phylopic database, so silhouette images of organisms can be downloaded and used to annotate a phylogenetic tree directly. ggtree also supports using local images to annotate a phylogenetic tree.
Visualizing an annotated phylogenetic tree together with a numerical matrix (e.g. genotype table), a multiple sequence alignment or subplots is also supported in ggtree. Examples of annotating phylogenetic trees can be found in the Tree Annotation and Advance Tree Annotation vignettes.
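For the matrix case, a minimal sketch using ggtree's gheatmap function is shown below, assuming ape and ggtree are installed. The genotype table is invented for illustration; its row names must match the tip labels.

```r
library(ape)
library(ggtree)

# Hypothetical tree and genotype table (row names = tip labels)
tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")
genotype <- data.frame(
  site1 = c("ref", "ref", "alt", "alt"),
  site2 = c("ref", "alt", "alt", "ref"),
  row.names = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

p <- ggtree(tree) + geom_tiplab()

# Draw the table as a heatmap aligned to the tips of the tree
p_heat <- gheatmap(p, genotype, offset = 0.3, width = 0.5)
```

The offset and width arguments control the gap between tree and heatmap and the heatmap's width relative to the tree, and usually need tuning per figure.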
More documents can be found at http://guangchuangyu.github.io/tags/ggtree.
If you have any bug reports or feature requests, let me know. Thx!
Here is the output of sessionInfo() on the system on which this document was compiled:
## R version 3.2.4 (2016-03-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.4 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] phangorn_2.0.2 Biostrings_2.38.4 XVector_0.10.0
## [4] IRanges_2.4.8 S4Vectors_0.8.11 BiocGenerics_0.16.1
## [7] colorspace_1.2-6 ggtree_1.2.17 ggplot2_2.1.0
## [10] ape_3.4
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.3 formatR_1.3 plyr_1.8.3 tools_3.2.4
## [5] zlibbioc_1.16.0 digest_0.6.9 jsonlite_0.9.19 evaluate_0.8.3
## [9] nlme_3.1-125 gtable_0.2.0 lattice_0.20-33 png_0.1-7
## [13] Matrix_1.2-4 igraph_1.0.1 DBI_0.3.1 yaml_2.1.13
## [17] stringr_1.0.0 dplyr_0.4.3 knitr_1.12.3 fftwtools_0.9-7
## [21] locfit_1.5-9.1 grid_3.2.4 R6_2.1.2 jpeg_0.1-8
## [25] rmarkdown_0.9.5 reshape2_1.4.1 tidyr_0.4.1 magrittr_1.5
## [29] nnls_1.4 scales_0.4.0 htmltools_0.3 assertthat_0.1
## [33] abind_1.4-3 EBImage_4.12.2 tiff_0.1-5 quadprog_1.5-5
## [37] labeling_0.3 stringi_1.0-1 lazyeval_0.1.10 munsell_0.4.3
1. Wickham, H. ggplot2: Elegant graphics for data analysis. (Springer, 2009).
2. McMurdie, P. J. & Holmes, S. phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE 8, e61217 (2013).
3. Bouckaert, R. et al. BEAST 2: A software platform for Bayesian evolutionary analysis. PLoS Comput Biol 10, e1003537 (2014).
4. Berger, S. A., Krompass, D. & Stamatakis, A. Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood. Systematic Biology 60, 291–302 (2011).
5. Pond, S. L. K., Frost, S. D. W. & Muse, S. V. HyPhy: Hypothesis testing using phylogenies. Bioinformatics 21, 676–679 (2005).
6. Yang, Z. PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24, 1586–1591 (2007).
7. Boussau, B. et al. Genome-scale coestimation of species and gene trees. Genome Res. 23, 323–330 (2013).
8. Matsen, F. A., Kodner, R. B. & Armbrust, E. V. pplacer: Linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11, 538 (2010).
9. Marazzi, B. et al. Locating evolutionary precursors on a phylogenetic tree. Evolution 66, 3918–3930 (2012).
10. Stamatakis, A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics btu033 (2014). doi:10.1093/bioinformatics/btu033
11. Höhna, S. et al. Probabilistic graphical model representation in phylogenetics. Syst Biol 63, 753–771 (2014).