This post is part of my R notes series.

1 What is this about?

Take a standard PCA biplot and make it interactive using the R packages ggbiplot and plotly.

These notes build on the PCA example from An Introduction to Statistical Learning - with Applications in R (James et al. 2013). The running example is the PCA biplot from Figure 10.1, which shows the first two principal components for the USArrests data.

2 Prepare R session

Load R packages:
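The original code chunk is not shown here. Judging by the session information in this section, which lists ggbiplot 0.55 and plotly 4.8.0 as attached, a minimal sketch could look like the following. The installation hint in the comment is an assumption about where the package comes from.

```r
# Assumption: ggbiplot 0.55 is installed, e.g. from its GitHub repository with
#   devtools::install_github("vqv/ggbiplot")
library(ggbiplot)  # also attaches ggplot2, plyr, scales and grid
library(plotly)    # provides ggplotly() for converting ggplot2 objects
```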

Print version information about R, the OS and attached or loaded packages.
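The listing that follows has the shape of output from base R's sessionInfo(), so the call was presumably just this:

```r
# Report the R version, OS, locale, and attached/loaded packages
sessionInfo()
```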

## R version 3.4.3 (2017-11-30)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] plotly_4.8.0  ggbiplot_0.55 scales_0.5.0  plyr_1.8.4    ggplot2_3.0.0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.18      pillar_1.3.0      compiler_3.4.3   
##  [4] bindr_0.1.1       tools_3.4.3       digest_0.6.15    
##  [7] viridisLite_0.3.0 jsonlite_1.5      evaluate_0.11    
## [10] tibble_1.4.2      gtable_0.2.0      pkgconfig_2.0.1  
## [13] rlang_0.2.1       yaml_2.2.0        bindrcpp_0.2.2   
## [16] withr_2.1.2       dplyr_0.7.6       stringr_1.3.1    
## [19] httr_1.3.1        knitr_1.20        htmlwidgets_1.2  
## [22] rprojroot_1.3-2   tidyselect_0.2.4  glue_1.3.0       
## [25] data.table_1.11.4 R6_2.2.2          rmarkdown_1.10   
## [28] tidyr_0.8.1       purrr_0.2.5       magrittr_1.5     
## [31] backports_1.1.2   htmltools_0.3.6   assertthat_0.2.0 
## [34] colorspace_1.3-2  stringi_1.2.4     lazyeval_0.2.1   
## [37] munsell_0.5.0     crayon_1.3.4

3 Run PCA

Run a principal components analysis (PCA) on the USArrests data and draw the usual biplot.
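A minimal sketch of this step, using only base R. The object name pca is an assumption, and the features are standardized because they are measured on very different scales.

```r
# USArrests ships with base R (datasets package): 50 states, 4 features
pca <- prcomp(USArrests, scale. = TRUE)  # scale each feature to unit variance

summary(pca)            # proportion of variance explained per component
biplot(pca, scale = 0)  # base-graphics biplot of the first two components
```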

The usual PCA `biplot`

4 Examples with ggbiplot & ggplotly

4.1 Minimal case

First steps with ggbiplot & ggplotly.
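As a rough sketch of the minimal case, a prcomp object can be passed to ggbiplot, and the resulting ggplot2 object to ggplotly. The object names pca and p are assumptions.

```r
library(ggbiplot)
library(plotly)

pca <- prcomp(USArrests, scale. = TRUE)

p <- ggbiplot(pca)  # static ggplot2 biplot of the first two components
p

ggplotly(p)         # same plot as an interactive htmlwidget
```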

A simple PCA biplot with `ggbiplot`

A simple interactive PCA biplot with plotly (hover the mouse pointer)

4.2 Add color & size

An example of mapping one of the variables (features) to both color and size.
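One way to sketch this is to add a geom_point layer on top of the ggbiplot output, with the urban population values mapped to both aesthetics. The names urban_pop and p_draft are assumptions, and this is not necessarily how the original plot was built.

```r
library(ggbiplot)

pca <- prcomp(USArrests, scale. = TRUE)
urban_pop <- USArrests$UrbanPop  # same row order as the PCA scores

# The extra layer is drawn on top of the points ggbiplot adds by default
p_draft <- ggbiplot(pca) +
  geom_point(aes(color = urban_pop, size = urban_pop))
p_draft
```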

Map urban population feature into color and size

Pass the draft plot from above to ggplotly to make it interactive.

Interactive PCA biplot

The two aesthetics need to be combined into a single legend. This is possible in ggplot2 with the guides function, but the change does not carry over to the ggplotly output. A related Stack Overflow (SO) thread could be worth investigating.

If you need to adjust the limits and breaks for the bubbles, use the scale_* functions (for example scale_size_continuous). These legend changes also do not carry over to the ggplotly output.
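The static-plot side of the last two points can be sketched as follows. ggplot2 merges the color and size legends when both scales share the same name, limits and breaks and the color guide is a legend rather than a colorbar. The limits and breaks below are illustrative values, not the ones used in the original post.

```r
library(ggbiplot)
library(plotly)

pca <- prcomp(USArrests, scale. = TRUE)
urban_pop <- USArrests$UrbanPop

# Illustrative values only: UrbanPop ranges from 32 to 91 in USArrests
lims <- c(30, 95)
brks <- c(40, 60, 80)

p_legend <- ggbiplot(pca) +
  geom_point(aes(color = urban_pop, size = urban_pop)) +
  # identical name, limits and breaks let ggplot2 merge the two legends
  scale_color_continuous(name = "UrbanPop", limits = lims, breaks = brks) +
  scale_size_continuous(name = "UrbanPop", limits = lims, breaks = brks) +
  # show color as a legend with keys instead of the default colorbar
  guides(color = guide_legend(), size = guide_legend())
p_legend

ggplotly(p_legend)  # per the notes above, expect the legend tweaks to be lost here
```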

4.3 Custom hover text

A simple example that adds a single new field to the plotly popup.
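A sketch of one way to do this: map the state names to a text aesthetic in an extra point layer. ggplot2 does not know this aesthetic, which is consistent with the warning shown in this section, but ggplotly reads it when building the popup. The names state and p_hover are assumptions.

```r
library(ggbiplot)
library(plotly)

pca <- prcomp(USArrests, scale. = TRUE)
state <- rownames(USArrests)  # one label per observation, in row order

# `text` is not a ggplot2 aesthetic, so ggplot2 warns and otherwise ignores it
p_hover <- ggbiplot(pca) +
  geom_point(aes(text = state))

ggplotly(p_hover)  # the default tooltip shows all mapped aesthetics, including text
```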

## Warning: Ignoring unknown aesthetics: text

Interactive PCA biplot - add a new field in the popup (hover the mouse pointer)

Add more fields to the plotly popup. This was inspired by a plotly example.
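A possible sketch: build one string per state, with the fields separated by the "</br>" tag this post refers to, and show only that string in the popup. The choice of fields and the names hover_text and p_multi are assumptions.

```r
library(ggbiplot)
library(plotly)

pca <- prcomp(USArrests, scale. = TRUE)

# One popup string per state; "</br>" starts a new line in the plotly popup
hover_text <- paste0(
  "State: ", rownames(USArrests),
  "</br>Murder: ", USArrests$Murder,
  "</br>UrbanPop: ", USArrests$UrbanPop
)

p_multi <- ggbiplot(pca) +
  geom_point(aes(text = hover_text))

# tooltip = "text" restricts the popup to the custom string
ggplotly(p_multi, tooltip = "text")
```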

## Warning: Ignoring unknown aesthetics: text

Interactive PCA biplot - use the </br> trick to add several fields in the popup (hover the mouse pointer)

A way to automatically add all feature values from the data as fields in the plotly popup.
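One way this could be done is to paste every column name and value together row by row, so the popup follows whatever features the data frame contains. This is a sketch rather than the original code, and the names feature_lines, hover_all and p_all are assumptions.

```r
library(ggbiplot)
library(plotly)

pca <- prcomp(USArrests, scale. = TRUE)

# For each row, build "name: value" pairs for all columns.
# Note: apply() converts the data frame to a matrix first, which is fine
# here because every USArrests column is numeric.
feature_lines <- apply(USArrests, 1, function(values) {
  paste0(names(values), ": ", values, collapse = "</br>")
})
hover_all <- paste0("State: ", rownames(USArrests), "</br>", feature_lines)

p_all <- ggbiplot(pca) +
  geom_point(aes(text = hover_all))

ggplotly(p_all, tooltip = "text")
```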

## Warning: Ignoring unknown aesthetics: text

Interactive PCA biplot - add all feature values as fields in the popup (hover the mouse pointer)

5 References

James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning (Vol. 112). New York: Springer.

Sievert, C. (2018). plotly for R.

Vu, V. Q. (2011). ggbiplot: A ggplot2 based biplot. R package version 0.55.