library(MetaboDiff)

1 Part I: Implementation of WGNCA for metabolomics

The following workflow was adapted from the weighted gene co-expression analysis (WGCNA) proposed by Langfelder and Horvarth1 Zhang, B., & Horvath, S. (2005). A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology, 4(1), Article17. http://doi.org/10.2202/1544-6115.1128 and makes use of the functions of the corresponding WGCNA R package2 Langfelder, P., & Horvath, S. (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics, 9, 559–559. http://doi.org/10.1186/1471-2105-9-559.

As for the MetaboDiff tutorial, the implementation is demonstrated using the example data from a study by Priolo and colleagues in which the authors used the service of Metabolon® to compare the tissue metabolome of 40 prostate cancers with 16 normal prostate specimens3 Priolo, C., Pyne, S., Rose, J., Regan, E. R., Zadra, G., Photopoulos, C., et al. (2014). AKT1 and MYC Induce Distinctive Metabolic Fingerprints in Human Prostate Cancer. Cancer Research, 74(24), 7198–7204. http://doi.org/10.1158/0008-5472.CAN-14-1490. The MetaboDiff packages contains this data within the object met_example.

In the MetaboDiff tutorial, all steps for WGCNA were performed within a set of functions connected by pipes:

  • diss_matrix - construction of dissimilarity matrix
  • identify_modules - identification of metabolic correlation modules
  • name_modules - name metabolic correlation modules
  • calculate_MS - calculation of module signficance to relate sample traits to modules
met_example <- met_example %>%
    diss_matrix %>%
    identify_modules(min_module_size=5) %>%
    name_modules(pathway_annotation="SUB_PATHWAY") %>%
    calculate_MS(group_factors=c("tumor_normal","random_gender"))

The individual steps will be explained as follows.

1.1 Construction of dissimilarity matrix

The first step in constructing a metabolic correlation network is the creation of a dissimilarity matrix. Biweight midcorrelation was used as a similiarity measure as it is more robust to outliers than the absolute correlation coefficient4 Zheng, C.-H., Yuan, L., Sha, W., & Sun, Z.-L. (2014). Gene differential coexpression analysis based on biweight correlation and maximum clique. BMC Bioinformatics, 15 Suppl 15(Suppl 15), S3. http://doi.org/10.1186/1471-2105-15-S15-S3. This choice is important, as we do not expect metabolites to be correlated in all patients.

The core concept of the so called “weighted” correlation analysis by Langfelder and Horvarth is that instead of defining a “hard” threshold (e.g. an absolute correlation coefficient > 0.8) to decide whether a node to connected to another, the adjacency a is defined by raising the similarity s to a power beta (“soft” threshold):

\[\begin{equation} a_{ij} = s_{ij}^\beta \end{equation}\]

Lastly, the dissimilarity measure w is defined by

\[\begin{equation} w_{ij} = 1 - a_{ij} \end{equation}\]

For detailed rationale of this approach, please see Zhang and Horvath5 Zhang, B., & Horvath, S. (2005). A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology, 4(1), Article17. http://doi.org/10.2202/1544-6115.1128. For metabolic networks, we identified that a beta value of 3 was the lowest power for which the scale-free topology of the topology was met_example.

The function diss_matrix creates the dissimilarity measure for the met_example objects and saves it in the metadata slot

met_example <- diss_matrix(met_example)

1.2 Identification of metabolic correlation modules

To identify metabolic correlation modules, metabolites are next clustered based on the dissimilarity measure where branches of the dendrogram correspond to modules. Ultimately, modules are detected by applying a branch cutting method with a minimal module size of 5 metabolites. We employed the dynamic branch cut method developed by Langfelder and colleagues6 Langfelder, P., Zhang, B., & Horvath, S. (2008). Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics, 24(5), 719–720. http://doi.org/10.1093/bioinformatics/btm563., as constant height cutoffs exhibit suboptimal performance on complicated dendrograms. Figure 1 shows the hierarchical clustering and corresponding modules after branch cutting.

met_example <- identify_modules(met_example, 
                       min_module_size=5)
##  ..cutHeight not given, setting it to 0.991  ===>  99% of the (truncated) height range in dendro.
##  ..done.
WGCNA::plotDendroAndColors(metadata(met_example)$tree, 
                    metadata(met_example)$module_color_vector, 
                    'Module colors', 
                    dendroLabels = FALSE, 
                    hang = 0.03,
                    addGuide = TRUE, 
                    guideHang = 0.05, main='')
Hierarchical clustering of metabolites. The different colors represent the modules identified by the dynamic branch cutting method.

Figure 1: Hierarchical clustering of metabolites
The different colors represent the modules identified by the dynamic branch cutting method.

The relation between the identified metabolic correlation modules can be visualized by a dendrogram of their eigenmetabolite (figure 2). The module eigenmetabolite is defined as the first principal component of the metabolic measurements in the respective module.

par(mar=c(2,2,2,2))
ape::plot.phylo(ape::as.phylo(metadata(met_example)$METree),
           type = 'fan',
           show.tip.label = FALSE, 
           main='')
ape::tiplabels(frame = 'circle',
          col='black', 
          text=rep('',length(unique(metadata(met_example)$modules))), 
          bg = WGCNA::labels2colors(0:21))
Hierarchy of metabolic correlation modules as revealed by the clustering of module eigenmetabolite. Each node represents a metabolic correlation module.

Figure 2: Hierarchy of metabolic correlation modules as revealed by the clustering of module eigenmetabolite
Each node represents a metabolic correlation module.

# number of metabolites per module
table(metadata(met_example)$modules)
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 
##  2 30 22 20 15 14 14 13 12 11 10  9  9  8  8  8  7  6  5  5  5  5

1.3 Name metabolic correlation modules

To enable a better interpretation of metabolic correlation modules, modules are named according to the most abundant pathway annotation in a module (figure 3).

# calculate module significance
met_example <- name_modules(met_example,
                   pathway_annotation = "SUB_PATHWAY")

# plot phylogram with names
ape::plot.phylo(ape::as.phylo(metadata(met_example)$METree), cex=0.9)
Hierarchy of metabolic correlation modules. Modules are named according to the most abundant pathway annotation in a module. Module 0 comprises the two metabolites without a significant interaction.

Figure 3: Hierarchy of metabolic correlation modules
Modules are named according to the most abundant pathway annotation in a module. Module 0 comprises the two metabolites without a significant interaction.

1.4 Calculation of module signficance to relate sample traits to modules

An advantage of correlation network analysis is the possibility to integrate external information. At the lowest hierarchical level, metabolite significance (MetS) measures can be defined as the statistical significance (i.e. p-value, \(p_i\)) between the \(i\)-th node profile (metabolite) \(x_i\) and the sample trait \(T\)

\[\begin{equation} MetS_i = -log~p_i \end{equation}\]

Module significance (MS) in turn can be determined as the average absolute metabolite significance measure. This conceptual framework can be adapted to any research question.

met_example <- calculate_MS(met_example,
                   group_factors = c("tumor_normal","random_gender"))

Figure 4 shows that metabolic correlation module 2 (Creatine metabolism / Glutathione metabolism) was significantly associated with the tumor vs. normal comparison in the example data.

MS_plot(met_example,
        group_factor="tumor_normal",
        p_value_cutoff=0.05,
        p_adjust=FALSE)
Association of tumor vs normal trait with metabolic correlation modules. Module 2 (Creatine metabolism / Glutathione metabolism) showed a significant module significance with a higher relative abundance in normal prostate tissues. P-values were not adjusted for multiple testing.

Figure 4: Association of tumor vs normal trait with metabolic correlation modules
Module 2 (Creatine metabolism / Glutathione metabolism) showed a significant module significance with a higher relative abundance in normal prostate tissues. P-values were not adjusted for multiple testing.

In line with the volcano plots, the random grouping did not result in a significant association (figure 5)

MS_plot(met_example,
        group_factor="random_gender",
        p_value_cutoff=0.05,
        p_adjust=FALSE)
Association of random trait with metabolic correlation modules. No significant module significance can be observed. P-values were not adjusted for multiple testing.

Figure 5: Association of random trait with metabolic correlation modules
No significant module significance can be observed. P-values were not adjusted for multiple testing.

1.5 Exploration of individual metabolites within correlation module

Assessing the module significance for different sample traits facilitates an understanding of individual metabolic correlation modules. As for metabolomics, we are next interested in the role of the individual metabolite within module. To this end, Langfelder and Horvath suggest a ‘fuzzy’ measure of module membership defined as

\[\begin{equation} K^q = |cor(x_i,E^q)| \end{equation}\]

where \(x_i\) is the profile of metabolite \(i\) and \(E^q\) is the eigenmetabolite of module \(q\). Based on this definition, \(K\) describes how closely related metabolite \(i\) is to module \(q\). A meaningful visualization is consequently plotting the module membership over the p-value of the respective \(MetS\) measure (MOI_plot). As a third dimension, the color is scaled according to the effect size (i.e. difference in means).

MOI_plot(met_example,
         group_factor="tumor_normal",
         MOI = 2,
         label_colors=c("darkseagreen","dodgerblue"),
         p_adjust = FALSE) + xlim(c(-1,8))
Association of individual metabolites within module 2. Myo-inositol is most tightly linked to the module (e.g. eigenmetabolite).

Figure 6: Association of individual metabolites within module 2
Myo-inositol is most tightly linked to the module (e.g. eigenmetabolite).

For more information about the application and interpretation of metabolic correlation networks, please refer to the package vignette ‘Case_study’.

Session information

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                    LC_TIME=German_Germany.1252    
## 
## attached base packages:
##  [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] MetaboDiff_0.9.5            forcats_0.5.0               stringr_1.4.0               dplyr_1.0.0                 purrr_0.3.4                 readr_1.3.1                
##  [7] tidyr_1.1.0                 tibble_3.0.3                ggplot2_3.3.2               tidyverse_1.3.0             ComplexHeatmap_2.4.3        MultiAssayExperiment_1.14.0
## [13] SummarizedExperiment_1.18.2 DelayedArray_0.14.1         matrixStats_0.56.0          Biobase_2.48.0              GenomicRanges_1.40.0        GenomeInfoDb_1.24.2        
## [19] IRanges_2.22.2              S4Vectors_0.26.1            BiocGenerics_0.34.0         BiocStyle_2.16.1           
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1            backports_1.1.7         circlize_0.4.10         fastmatch_1.1-0         Hmisc_4.4-0             igraph_1.2.5            plyr_1.8.6             
##   [8] splines_4.0.2           digest_0.6.25           foreach_1.5.0           htmltools_0.5.0         magick_2.5.0            GO.db_3.11.4            fansi_0.4.1            
##  [15] phytools_0.7-70         magrittr_1.5            checkmate_2.0.0         memoise_1.1.0           cluster_2.1.0           doParallel_1.0.15       limma_3.44.3           
##  [22] fastcluster_1.1.25      annotate_1.66.0         modelr_0.1.8            jpeg_0.1-8.1            colorspace_1.4-1        blob_1.2.1              rvest_0.3.6            
##  [29] haven_2.3.1             xfun_0.15               crayon_1.3.4            RCurl_1.98-1.2          jsonlite_1.7.0          genefilter_1.70.0       impute_1.62.0          
##  [36] phangorn_2.5.5          ape_5.4-1               survival_3.1-12         iterators_1.0.12        glue_1.4.1              gtable_0.3.0            zlibbioc_1.34.0        
##  [43] XVector_0.28.0          GetoptLong_1.0.2        shape_1.4.4             maps_3.3.0              scales_1.1.1            vsn_3.56.0              DBI_1.1.0              
##  [50] Rcpp_1.0.5              plotrix_3.7-8           xtable_1.8-4            htmlTable_2.0.1         tmvnsim_1.0-2           clue_0.3-57             foreign_0.8-80         
##  [57] bit_1.1-15.2            preprocessCore_1.50.0   Formula_1.2-3           tsne_0.1-3              htmlwidgets_1.5.1       httr_1.4.2              RColorBrewer_1.1-2     
##  [64] acepack_1.4.1           ellipsis_0.3.1          pkgconfig_2.0.3         XML_3.99-0.5            farver_2.0.3            nnet_7.3-14             dbplyr_1.4.4           
##  [71] dynamicTreeCut_1.63-1   tidyselect_1.1.0        labeling_0.3            rlang_0.4.7             AnnotationDbi_1.50.3    munsell_0.5.0           cellranger_1.1.0       
##  [78] tools_4.0.2             cli_2.0.2               generics_0.0.2          RSQLite_2.2.0           broom_0.7.0             evaluate_0.14           yaml_2.2.1             
##  [85] knitr_1.30              bit64_0.9-7             fs_1.4.2                nlme_3.1-148            xml2_1.3.2              compiler_4.0.2          rstudioapi_0.11        
##  [92] png_0.1-7               affyio_1.58.0           clusterGeneration_1.3.5 reprex_0.3.0            stringi_1.4.6           highr_0.8               lattice_0.20-41        
##  [99] Matrix_1.2-18           vctrs_0.3.2             pillar_1.4.6            lifecycle_0.2.0         BiocManager_1.30.10     combinat_0.0-8          GlobalOptions_0.1.2    
## [106] data.table_1.13.0       cowplot_1.0.0           bitops_1.0-6            R6_2.4.1                latticeExtra_0.6-29     affy_1.66.0             bookdown_0.21          
## [113] gridExtra_2.3           codetools_0.2-16        gtools_3.8.2            MASS_7.3-51.6           assertthat_0.2.1        rjson_0.2.20            withr_2.2.0            
## [120] mnormt_2.0.2            GenomeInfoDbData_1.2.3  expm_0.999-5            hms_0.5.3               quadprog_1.5-8          rpart_4.1-15            coda_0.19-3            
## [127] rmarkdown_2.4           scatterplot3d_0.3-41    numDeriv_2016.8-1.1     WGCNA_1.69              lubridate_1.7.9         base64enc_0.1-3