This post is part of my R notes series

I use in this example the ggtree R package, which is designed for “visualization and annotation of phylogenetic trees with their covariates and other associated data”.

1 Generate sample data

Load R packages

Print version information about R, the OS and attached or loaded packages.

## R version 3.4.3 (2017-11-30)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] RColorBrewer_1.1-2 data.table_1.11.4  ape_5.1           
## [4] ggtree_1.10.0      treeio_1.2.1       ggplot2_3.0.0     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.18     pillar_1.3.0     compiler_3.4.3   plyr_1.8.4      
##  [5] bindr_0.1.1      tools_3.4.3      digest_0.6.15    jsonlite_1.5    
##  [9] evaluate_0.11    tibble_1.4.2     gtable_0.2.0     nlme_3.1-131    
## [13] lattice_0.20-35  pkgconfig_2.0.1  rlang_0.2.1      rvcheck_0.1.0   
## [17] yaml_2.2.0       parallel_3.4.3   bindrcpp_0.2.2   withr_2.1.2     
## [21] dplyr_0.7.6      stringr_1.3.1    knitr_1.20       rprojroot_1.3-2 
## [25] grid_3.4.3       tidyselect_0.2.4 glue_1.3.0       R6_2.2.2        
## [29] rmarkdown_1.10   tidyr_0.8.1      purrr_0.2.5      magrittr_1.5    
## [33] backports_1.1.2  scales_0.5.0     htmltools_0.3.6  assertthat_0.2.0
## [37] colorspace_1.3-2 stringi_1.2.4    lazyeval_0.2.1   munsell_0.5.0   
## [41] crayon_1.3.4

Generate a random tree

Simple tree plots using plot.phylo() from ape package.Simple tree plots using plot.phylo() from ape package.

Simple tree plots using plot.phylo() from ape package.

Generate annotation data

##     tip_lbs   groupA   groupB         val
##  1:      t1 group_A1 group_B1 -0.42298398
##  2:     t11 group_A1 group_B1 -1.54987816
##  3:      t6 group_A1 group_B1 -0.06442932
##  4:      t4 group_A2 group_B2  0.27088135
##  5:     t15 group_A2 group_B2  1.73528367
##  6:     t20 group_A2 group_B3 -0.26471121
##  7:      t2 group_A2 group_B3  2.09947070
##  8:      t7 group_A2 group_B3  0.86335122
##  9:      t8 group_A2 group_B2 -0.61058715
## 10:     t17 group_A3 group_B4  0.63705561
## 11:      t5 group_A3 group_B4 -0.64303470
## 12:     t16 group_A3 group_B4 -1.03002873
## 13:      t3 group_A3 group_B1  0.71248127
## 14:     t18 group_A3 group_B1 -0.44577209
## 15:      t9 group_A3 group_B2  0.24897965
## 16:     t14 group_A3 group_B4 -1.07419395
## 17:     t13 group_A3 group_B4 -1.82726172
## 18:     t12 group_A3 group_B4  0.01549190
## 19:     t10 group_A3 group_B4 -1.68436134
## 20:     t19 group_A3 group_B1  0.20446755

2 Prepare tree & data for plotting

Taxa classification with ggtree::groupOTU

## $group_A1
## [1] "t1"  "t11" "t6" 
## 
## $group_A2
## [1] "t4"  "t15" "t20" "t2"  "t7"  "t8" 
## 
## $group_A3
##  [1] "t17" "t5"  "t16" "t3"  "t18" "t9"  "t14" "t13" "t12" "t10" "t19"
## List of 4
##  $ edge       : int [1:38, 1:2] 21 22 25 25 33 33 22 26 38 38 ...
##  $ edge.length: num [1:38] 0.0448 0.3992 0.3458 0.3094 0.0364 ...
##  $ tip.label  : chr [1:20] "t1" "t11" "t6" "t4" ...
##  $ Nnode      : int 19
##  - attr(*, "class")= chr "phylo"
##  - attr(*, "order")= chr "cladewise"
##  - attr(*, "group")= Factor w/ 4 levels "0","group_A1",..: 2 2 2 3 3 3 3 3 3 4 ...
## [1] "0"        "group_A1" "group_A2" "group_A3"

4 Prepare annotation data

4.1 Join annotation data with the tree view

4.2 Process the updated tree view data

This step is need for being able to place segments and labels around the tree view for desired groups.

##    label node parent branch.length         x y isTip    branch angle
## 1:    t1    1     25   0.345791354 0.7898439 1  TRUE 0.6169482    18
## 2:   t11    2     33   0.036393168 0.7898439 2  TRUE 0.7716473    36
## 3:    t6    3     33   0.036393168 0.7898439 3  TRUE 0.7716473    54
## 4:    t4    4     38   0.009760975 0.7898439 4  TRUE 0.7849634    72
## 5:   t15    5     38   0.009760975 0.7898439 5  TRUE 0.7849634    90
## 6:   t20    6     36   0.026143382 0.7898439 7  TRUE 0.7767722   126
##       group   groupA   groupB         val
## 1: group_A1 group_A1 group_B1 -0.42298398
## 2: group_A1 group_A1 group_B1 -1.54987816
## 3: group_A1 group_A1 group_B1 -0.06442932
## 4: group_A2 group_A2 group_B2  0.27088135
## 5: group_A2 group_A2 group_B2  1.73528367
## 6: group_A2 group_A2 group_B3 -0.26471121
##      groupB id_gr y1 y2 angle n
## 1: group_B1  grp1  1  3    36 3
## 2: group_B2  grp2  4  6    90 3
## 3: group_B3  grp3  7  9   144 3
## 4: group_B1  grp4 10 10   180 1
## 5: group_B4  grp5 11 17   252 7
## 6: group_B2  grp6 18 18   324 1
## 7: group_B1  grp7 19 20   351 2
##      groupB id_gr y1 y2 angle n y_mid y1_adj y2_adj angle_adj hjust_adj
## 1: group_B1  grp1  1  3    36 3   2.0    1.0    3.0        36         0
## 2: group_B2  grp2  4  6    90 3   5.0    4.0    6.0       270         1
## 3: group_B3  grp3  7  9   144 3   8.0    7.0    9.0       324         1
## 4: group_B1  grp4 10 10   180 1  10.0    9.9   10.1       360         1
## 5: group_B4  grp5 11 17   252 7  14.0   11.0   17.0        72         1
## 6: group_B2  grp6 18 18   324 1  18.0   17.9   18.1       324         0
## 7: group_B1  grp7 19 20   351 2  19.5   19.0   20.0       351         0

The data table coord_groups will be used in geom_segment and geom_text below.

5 Annotate tree

5.1 Annotate groups with segments and text

Use segments and text to annotate groups (clades) with `ggtree`

Use segments and text to annotate groups (clades) with ggtree

Save tree view as png file. If a certain size is desired for the png file, then one might also need to adjust graphical element sizes, legend position and plot margins above in the geoms or theme components section.

5.2 Add a circular barplot effect

This is about more complex tree annotation, specifically using segments to create a circular barplot effect. The size of the bar is related to a value corresponding to each tip.

# Define variable to control the x coordinates of bars (segments)
my_factor <- 0.05
x_base <- max(tree_dt$x) + abs(min(tree_dt$val, na.rm = TRUE))*my_factor + 0.02

# Define variable to control the x coordinates of segments & labels
my_x <- x_base + max(tree_dt$val, na.rm = TRUE)*my_factor + 0.05

# Need to add a value (usually a small amount) to `ymax` 
# to force a complete circle (otherwise a stripe of white remains).
# This value needs to be just right:
# -  not too big because affects the visual which results in strange angles on labels;
# -  not too small because otherwise the remaining strip of white is not completely filled.
# Is a matter of try and error until you get it right. 
# Also, better to be biased towards smaller values since big value can lead to 
# big displacements when drawing the tree.
fill_in_value <- 0.1

# Categorize values - binary
tree_dt[, categ := ifelse(val <= 0, "neg", "pos")]

# Plot the tree with circular barplot
tree_bars <- 
  tree_plot +
  # Add a disc to plot bars on top of it
  geom_rect(data = tree_dt,
            aes(xmin = x_base + min(val, na.rm = TRUE)*my_factor,
                ymin = 0,
                xmax = x_base + max(val, na.rm = TRUE)*my_factor,
                # Add also fill_in_value to `ymax` to force a complete circle
                ymax = max(y) + fill_in_value), 
            color = NA, # set NA so to avoid coloring borders
            fill = "#deebf7", # or try "#8da0cb"
            alpha = 0.1) +
  # Add bars for each tip label
  geom_rect(data = tree_dt,
            aes(xmin = x_base,
                ymin = y - 0.1,
                xmax = x_base + val*my_factor,
                ymax = y + 0.1,
                fill = categ),
            # no borders
            color = NA) +
  # Fill the bars
  scale_fill_manual(name   = 'Some variable',
                    breaks = c("neg", "pos"),
                    values = c("neg" = "#fc8d62",
                               "pos" = "#66c2a5"),
                    labels = c("negative", "positive")) +
  # Add a ring (serves as reference for bars)
  geom_segment(data = tree_dt,
               aes(x = x_base, 
                   y = 0, 
                   xend = x_base, 
                   # Add also fill_in_value to `yend` to force a complete circle
                   yend = max(y) + fill_in_value),
               color = "gray50",
               # linetype = "dashed", 
               size = 0.25) +
  # Add line segments for each group.
  geom_segment(data = coord_groups,
               aes(x = my_x, 
                   y = y1_adj, 
                   xend = my_x, 
                   yend = y2_adj),
               color = "black",
               lineend = "butt",
               size = 3) +
  # Add text group labels at the middle of each segment.
  geom_text(data = coord_groups,
            aes(x = my_x,
                y = y_mid,
                angle = angle_adj,
                hjust = hjust_adj,
                label = groupB),
            vjust = 0.5, 
            size  = 2.5,
            nudge_x = 0.05, # Offsetting label from its default x coordinate.
            color = "black") +
  theme(
    # Set font size & family - affects legend only 
    # "sans" = "Arial" and is the default on Windows OS; check windowsFonts()
    text = element_text(size = 10, family = "sans"),
    # Grab bottom-right (x=1, y=0) legend corner 
    legend.justification = c(1,0),
    # and position it in the bottom-right plot area.
    legend.position = c(1.25, 0.1),
    legend.margin = margin(t = 0, r = 0, b = 0, l = 0),
    # Set height of legend items (keys). This affects spacing between them as well.
    legend.key.height = unit(4, "mm"),
    # Set margin around entire plot.
    plot.margin = unit(c(t = 0.4, r = 2.6, b = -0.6, l = 0), "cm")
  )

tree_bars
More complex tree annotation - use segments to create a circular barplot effect

More complex tree annotation - use segments to create a circular barplot effect

Save tree view as png file. As mentioned above, if a certain size is desired for the png file, then one might also need to adjust graphical element sizes, legend position and plot margins above in the geoms or theme components section.

6 References

ggtree package created by Guangchuang Yu is well documented:

https://bioconductor.org/packages/release/bioc/html/ggtree.html https://guangchuangyu.github.io/ggtree/ https://github.com/GuangchuangYu/ggtree

tidytree package, also by Guangchuang Yu, is worth checking as well:

https://cran.r-project.org/web/packages/tidytree/vignettes/tidytree.html https://cran.r-project.org/web/packages/tidytree/index.html https://github.com/GuangchuangYu/tidytree

Data tree manipulation - I do not work with phylogenetics and I struggled a lot with the jargon to get the graphs done, but this wiki page was very useful:

https://www.r-phylo.org/wiki/HowTo/DataTreeManipulation