ggtree
packageThis post is part of my R notes series
I use in this example the ggtree R package, which is designed for “visualization and annotation of phylogenetic trees with their covariates and other associated data”.
Load R packages
# Note: intsall `ggtree` from bioconductor
# source("https://bioconductor.org/biocLite.R")
# biocLite("ggtree")
library(ggtree)
library(ape)
library(data.table)
library(RColorBrewer)
Print version information about R, the OS and attached or loaded packages.
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RColorBrewer_1.1-2 data.table_1.11.4 ape_5.1
## [4] ggtree_1.10.0 treeio_1.2.1 ggplot2_3.0.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.18 pillar_1.3.0 compiler_3.4.3 plyr_1.8.4
## [5] bindr_0.1.1 tools_3.4.3 digest_0.6.15 jsonlite_1.5
## [9] evaluate_0.11 tibble_1.4.2 gtable_0.2.0 nlme_3.1-131
## [13] lattice_0.20-35 pkgconfig_2.0.1 rlang_0.2.1 rvcheck_0.1.0
## [17] yaml_2.2.0 parallel_3.4.3 bindrcpp_0.2.2 withr_2.1.2
## [21] dplyr_0.7.6 stringr_1.3.1 knitr_1.20 rprojroot_1.3-2
## [25] grid_3.4.3 tidyselect_0.2.4 glue_1.3.0 R6_2.2.2
## [29] rmarkdown_1.10 tidyr_0.8.1 purrr_0.2.5 magrittr_1.5
## [33] backports_1.1.2 scales_0.5.0 htmltools_0.3.6 assertthat_0.2.0
## [37] colorspace_1.3-2 stringi_1.2.4 lazyeval_0.2.1 munsell_0.5.0
## [41] crayon_1.3.4
Generate a random tree
set.seed(2018)
tree <- ape::rcoal(20)
# Simple tree plots using plot.phylo() from ape package
par(mar = c(4, 4, 1, 1))
plot.phylo(tree)
plot.phylo(tree, type = "fan")
Generate annotation data
set.seed(2018)
my_info <- data.table(tip_lbs = tree$tip.label,
groupA = c(rep(1, 3), rep(2, 6), rep(3, 11)),
groupB = c(1,1,1,2,2,3,3,3,2,4,4,4,1,1,2,4,4,4,4,1),
val = rnorm(20))
my_info[, ':=' (groupA = paste0("group_A", groupA),
groupB = paste0("group_B", groupB))]
my_info
## tip_lbs groupA groupB val
## 1: t1 group_A1 group_B1 -0.42298398
## 2: t11 group_A1 group_B1 -1.54987816
## 3: t6 group_A1 group_B1 -0.06442932
## 4: t4 group_A2 group_B2 0.27088135
## 5: t15 group_A2 group_B2 1.73528367
## 6: t20 group_A2 group_B3 -0.26471121
## 7: t2 group_A2 group_B3 2.09947070
## 8: t7 group_A2 group_B3 0.86335122
## 9: t8 group_A2 group_B2 -0.61058715
## 10: t17 group_A3 group_B4 0.63705561
## 11: t5 group_A3 group_B4 -0.64303470
## 12: t16 group_A3 group_B4 -1.03002873
## 13: t3 group_A3 group_B1 0.71248127
## 14: t18 group_A3 group_B1 -0.44577209
## 15: t9 group_A3 group_B2 0.24897965
## 16: t14 group_A3 group_B4 -1.07419395
## 17: t13 group_A3 group_B4 -1.82726172
## 18: t12 group_A3 group_B4 0.01549190
## 19: t10 group_A3 group_B4 -1.68436134
## 20: t19 group_A3 group_B1 0.20446755
Taxa classification with ggtree::groupOTU
## $group_A1
## [1] "t1" "t11" "t6"
##
## $group_A2
## [1] "t4" "t15" "t20" "t2" "t7" "t8"
##
## $group_A3
## [1] "t17" "t5" "t16" "t3" "t18" "t9" "t14" "t13" "t12" "t10" "t19"
## List of 4
## $ edge : int [1:38, 1:2] 21 22 25 25 33 33 22 26 38 38 ...
## $ edge.length: num [1:38] 0.0448 0.3992 0.3458 0.3094 0.0364 ...
## $ tip.label : chr [1:20] "t1" "t11" "t6" "t4" ...
## $ Nnode : int 19
## - attr(*, "class")= chr "phylo"
## - attr(*, "order")= chr "cladewise"
## - attr(*, "group")= Factor w/ 4 levels "0","group_A1",..: 2 2 2 3 3 3 3 3 3 4 ...
## [1] "0" "group_A1" "group_A2" "group_A3"
# If there is a "0" group label, overwrite it.
# "0 is for those didn't belong to any group."
# https://github.com/GuangchuangYu/ggtree/issues/127
# Overwrite "0"
levels(attributes(tree_grA)$group)[1] <- "group_A1"
# Reorder factor levels if needed (this controls the order in the legend)
attributes(tree_grA)$group <- factor(x = attributes(tree_grA)$group,
levels = c("group_A3", "group_A2", "group_A1"))
ggtree
Experiment with color
my_cols <- RColorBrewer::brewer.pal(n = 3, name = "Set1")
names(my_cols) <- levels(attributes(tree_grA)$group)
scales::show_col(my_cols); my_cols
## group_A3 group_A2 group_A1
## "#E41A1C" "#377EB8" "#4DAF4A"
Simple tree view
tree_plot <-
ggtree(tr = tree_grA,
# color by group attribute, check str(tree_grA)
mapping = aes(color = group),
layout = 'circular',
# set line thikness
size = 1) +
# adjust coloring of main groups
scale_color_manual(name = 'Group A',
values = my_cols)
# Add the tip labels
tree_plot + geom_tiplab(size = 4, aes(angle = angle))
This step is need for being able to place segments and labels around the tree view for desired groups.
## label node parent branch.length x y isTip branch angle
## 1: t1 1 25 0.345791354 0.7898439 1 TRUE 0.6169482 18
## 2: t11 2 33 0.036393168 0.7898439 2 TRUE 0.7716473 36
## 3: t6 3 33 0.036393168 0.7898439 3 TRUE 0.7716473 54
## 4: t4 4 38 0.009760975 0.7898439 4 TRUE 0.7849634 72
## 5: t15 5 38 0.009760975 0.7898439 5 TRUE 0.7849634 90
## 6: t20 6 36 0.026143382 0.7898439 7 TRUE 0.7767722 126
## group groupA groupB val
## 1: group_A1 group_A1 group_B1 -0.42298398
## 2: group_A1 group_A1 group_B1 -1.54987816
## 3: group_A1 group_A1 group_B1 -0.06442932
## 4: group_A2 group_A2 group_B2 0.27088135
## 5: group_A2 group_A2 group_B2 1.73528367
## 6: group_A2 group_A2 group_B3 -0.26471121
# select only the tip labels and order by coord y
tree_dt <- tree_dt[isTip == TRUE][order(y)]
# Make table with y cords for each groupB in consecutive order;
# this helps for drawing & labeling segments.
# Note the usage of "rleid" function, which is a grouping ID generator,
# needed because consecutive rows of an identical reoccurring group must form a unique group.
coord_groups <- tree_dt[, .(y1 = y[1],
y2 = y[.N],
angle = mean(angle),
n = .N), # optional - helps with counting
by = .(groupB,
id_gr = rleid(groupB,
prefix = "grp"))]
coord_groups
## groupB id_gr y1 y2 angle n
## 1: group_B1 grp1 1 3 36 3
## 2: group_B2 grp2 4 6 90 3
## 3: group_B3 grp3 7 9 144 3
## 4: group_B1 grp4 10 10 180 1
## 5: group_B4 grp5 11 17 252 7
## 6: group_B2 grp6 18 18 324 1
## 7: group_B1 grp7 19 20 351 2
# Compute the middle y - will be used for placing the group label;
# similarly the mean angle was computed above already.
coord_groups[, y_mid := rowMeans(.SD), .SDcols = c("y1", "y2")]
# For one-record groups where y1=y2, adjust their y coordinates so that a segment gets drawn;
# If not, no segment get drawn for such cases.
# To force a segment add and subtract a small amount (try & error until seems ok).
# Prefer a smaller values since with bigger ones you risk to exaggerate the segments.
coord_groups[, y1_adj := ifelse(y1 == y2, y1 - 0.1, y1)]
coord_groups[, y2_adj := ifelse(y1 == y2, y2 + 0.1, y2)]
# Labels need angle adjustment for cases between 90 and 270 dg
coord_groups[, angle_adj := ifelse(angle %between% c(90, 180),
yes = angle + 180,
no = ifelse(angle > 180 & angle <= 270,
yes = angle - 180,
no = angle))]
# Labels with angles between 90 and 270 dg
# need change of horizontal adjustment argument from 0 to 1.
coord_groups[, hjust_adj := ifelse(angle %between% c(90, 270), yes = 1L, no = 0L)]
# if needed, coloring could be binary
# coord_groups[, col := ifelse(.I%%2, 0.5, 1)]
coord_groups
## groupB id_gr y1 y2 angle n y_mid y1_adj y2_adj angle_adj hjust_adj
## 1: group_B1 grp1 1 3 36 3 2.0 1.0 3.0 36 0
## 2: group_B2 grp2 4 6 90 3 5.0 4.0 6.0 270 1
## 3: group_B3 grp3 7 9 144 3 8.0 7.0 9.0 324 1
## 4: group_B1 grp4 10 10 180 1 10.0 9.9 10.1 360 1
## 5: group_B4 grp5 11 17 252 7 14.0 11.0 17.0 72 1
## 6: group_B2 grp6 18 18 324 1 18.0 17.9 18.1 324 0
## 7: group_B1 grp7 19 20 351 2 19.5 19.0 20.0 351 0
The data table coord_groups
will be used in geom_segment
and geom_text
below.
# Define variable to control x coordinate of segments & labels
my_x <- max(tree_dt$x) + 0.05
tree_labeled <-
tree_plot +
# Add line segments for each group.
geom_segment(data = coord_groups,
aes(x = my_x,
y = y1_adj,
xend = my_x,
yend = y2_adj),
color = "black",
lineend = "butt",
size = 3) +
# Add text group labels at the middle of each segment.
geom_text(data = coord_groups,
aes(x = my_x,
y = y_mid,
angle = angle_adj,
hjust = hjust_adj,
label = groupB),
vjust = 0.5,
size = 2.5,
nudge_x = 0.05, # Offsetting label from its default x coordinate.
color = "black") +
# Adjust theme components
theme(
# Set font size & family - affects legend only
# "sans" = "Arial" and is the default on Windows OS; check windowsFonts()
text = element_text(size = 10, family = "sans"),
# Grab bottom-right (x=1, y=0) legend corner
legend.justification = c(1,0),
# and position it in the bottom-right plot area (needs some tweaking)
legend.position = c(1.1, 0.05),
legend.margin = margin(t = 0, r = 0, b = 0, l = 0),
# Set key hight - affects spacing between legend items (keys).
legend.key.height = unit(4, "mm"),
# Set margin around entire plot. Needs some try-and-error approach until ok,
# especially if you save the plot with `ggsave`.
plot.margin = unit(c(t = 0.2, r = 1.3, b = -0.35, l = -0.2), "cm")
)
tree_labeled
Save tree view as png
file. If a certain size is desired for the png file, then one might also need to adjust graphical element sizes, legend position and plot margins above in the geoms or theme components section.
This is about more complex tree annotation, specifically using segments to create a circular barplot effect. The size of the bar is related to a value corresponding to each tip.
# Define variable to control the x coordinates of bars (segments)
my_factor <- 0.05
x_base <- max(tree_dt$x) + abs(min(tree_dt$val, na.rm = TRUE))*my_factor + 0.02
# Define variable to control the x coordinates of segments & labels
my_x <- x_base + max(tree_dt$val, na.rm = TRUE)*my_factor + 0.05
# Need to add a value (usually a small amount) to `ymax`
# to force a complete circle (otherwise a stripe of white remains).
# This value needs to be just right:
# - not too big because affects the visual which results in strange angles on labels;
# - not too small because otherwise the remaining strip of white is not completely filled.
# Is a matter of try and error until you get it right.
# Also, better to be biased towards smaller values since big value can lead to
# big displacements when drawing the tree.
fill_in_value <- 0.1
# Categorize values - binary
tree_dt[, categ := ifelse(val <= 0, "neg", "pos")]
# Plot the tree with circular barplot
tree_bars <-
tree_plot +
# Add a disc to plot bars on top of it
geom_rect(data = tree_dt,
aes(xmin = x_base + min(val, na.rm = TRUE)*my_factor,
ymin = 0,
xmax = x_base + max(val, na.rm = TRUE)*my_factor,
# Add also fill_in_value to `ymax` to force a complete circle
ymax = max(y) + fill_in_value),
color = NA, # set NA so to avoid coloring borders
fill = "#deebf7", # or try "#8da0cb"
alpha = 0.1) +
# Add bars for each tip label
geom_rect(data = tree_dt,
aes(xmin = x_base,
ymin = y - 0.1,
xmax = x_base + val*my_factor,
ymax = y + 0.1,
fill = categ),
# no borders
color = NA) +
# Fill the bars
scale_fill_manual(name = 'Some variable',
breaks = c("neg", "pos"),
values = c("neg" = "#fc8d62",
"pos" = "#66c2a5"),
labels = c("negative", "positive")) +
# Add a ring (serves as reference for bars)
geom_segment(data = tree_dt,
aes(x = x_base,
y = 0,
xend = x_base,
# Add also fill_in_value to `yend` to force a complete circle
yend = max(y) + fill_in_value),
color = "gray50",
# linetype = "dashed",
size = 0.25) +
# Add line segments for each group.
geom_segment(data = coord_groups,
aes(x = my_x,
y = y1_adj,
xend = my_x,
yend = y2_adj),
color = "black",
lineend = "butt",
size = 3) +
# Add text group labels at the middle of each segment.
geom_text(data = coord_groups,
aes(x = my_x,
y = y_mid,
angle = angle_adj,
hjust = hjust_adj,
label = groupB),
vjust = 0.5,
size = 2.5,
nudge_x = 0.05, # Offsetting label from its default x coordinate.
color = "black") +
theme(
# Set font size & family - affects legend only
# "sans" = "Arial" and is the default on Windows OS; check windowsFonts()
text = element_text(size = 10, family = "sans"),
# Grab bottom-right (x=1, y=0) legend corner
legend.justification = c(1,0),
# and position it in the bottom-right plot area.
legend.position = c(1.25, 0.1),
legend.margin = margin(t = 0, r = 0, b = 0, l = 0),
# Set height of legend items (keys). This affects spacing between them as well.
legend.key.height = unit(4, "mm"),
# Set margin around entire plot.
plot.margin = unit(c(t = 0.4, r = 2.6, b = -0.6, l = 0), "cm")
)
tree_bars
Save tree view as png
file. As mentioned above, if a certain size is desired for the png file, then one might also need to adjust graphical element sizes, legend position and plot margins above in the geoms or theme components section.
ggtree package created by Guangchuang Yu is well documented:
https://bioconductor.org/packages/release/bioc/html/ggtree.html https://guangchuangyu.github.io/ggtree/ https://github.com/GuangchuangYu/ggtree
tidytree package, also by Guangchuang Yu, is worth checking as well:
https://cran.r-project.org/web/packages/tidytree/vignettes/tidytree.html https://cran.r-project.org/web/packages/tidytree/index.html https://github.com/GuangchuangYu/tidytree
Data tree manipulation - I do not work with phylogenetics and I struggled a lot with the jargon to get the graphs done, but this wiki page was very useful: