This document serves as a demonstration and a tutorial for the DGSA package. DGSA stands for “Distance Based Generalized Sensitivity Analysis”, a method for computing the importance of input parameters in computer experiments. The method works with inputs of various types (continuous, categorical, functions) and with outputs of almost any type, since it relies only on a clustering of the outputs. The method was originally developed by Fenwick et al. (2014), and the implemented code closely follows the original MATLAB implementation by Celine Scheidt, which is available at github.com/SCRF_Public/DGSA.

The DGSA package extends the original visualizations by using the graphics provided by the corrplot and ggplot2 packages.

DGSA demonstration on a demo dataset:

The first step in DGSA is to cluster the outputs. Responses can be functional, vectors, scalars, or mixed. The most suitable clustering method depends on the data, so it is left to the user to choose one in the data pre-processing stage. The only cluster-related input that dgsa requires is the vector of cluster codes assigned to each observation (design point).

In the example below we use a simple principal component analysis followed by k-means clustering.

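library(DGSA)
# Load the input parameters and the simulated responses
INPUT         <- read.csv("../data/Input.csv")
OUTPUT.damage <- read.csv("../data/Output_damage.csv")
# Keep the first two principal component scores, each scaled by its standard deviation
comps  <- prcomp(OUTPUT.damage)
scores <- t(t(comps$x[, 1:2]) / comps$sdev[1:2])
# k-means with 2 clusters and at most 50 iterations; keep only the cluster codes
clustering <- kmeans(scores, centers = 2, iter.max = 50)$cluster
# Plot the scores, colored by cluster
plot(scores)
points(scores[clustering == 1, ], col = "red", pch = 19)
points(scores[clustering == 2, ], col = "blue", pch = 19)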
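
Since only the cluster codes are passed on, any clustering method that yields one integer code per run can be substituted. As a minimal sketch, assuming functional responses stored one curve per row, the codes could also come from hierarchical clustering on pairwise distances. The data below are synthetic and the names curves and codes are made up for this illustration; they are not part of the package.

```r
set.seed(1)
n_runs <- 40
t_grid <- seq(0, 1, length.out = 25)

# Two synthetic families of decay curves (hypothetical data, not the demo dataset)
rate   <- c(rep(1, n_runs / 2), rep(4, n_runs / 2))
curves <- t(sapply(rate, function(r) exp(-r * t_grid) + rnorm(length(t_grid), sd = 0.02)))

# Euclidean distances between runs, then Ward clustering cut into 2 groups
d     <- dist(curves)
tree  <- hclust(d, method = "ward.D2")
codes <- cutree(tree, k = 2)

# One integer code per run, the same form as kmeans(...)$cluster
table(codes)
```

The resulting codes vector plays the same role as clustering in the example above.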

After clustering, the user can either proceed straight to the computations or first explore the CDFs, in order to make an informed decision on how many bins to use when computing the sensitivities of interactions, or perhaps even to remove some parameters. The package provides convenient tools for such exploratory analysis. The first function we introduce is plotCDFS, which, as the name suggests, plots the parameter CDFs in the same form in which they are used in the dgsa code. The code all* specifies that we are interested in the CDFs of all available input parameters.

plotCDFS(clustering, INPUT, .code = "all*")

All CDFs
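
To make concrete what such a plot compares, here is a minimal base R sketch of class-conditional empirical CDFs for one parameter. It uses synthetic data and the standard ecdf function; it is not the plotCDFS implementation, and the variables beta and codes are hypothetical stand-ins for a column of INPUT and the clustering vector.

```r
set.seed(2)
n    <- 200
beta <- runif(n)   # hypothetical input parameter

# Hypothetical cluster codes that depend on beta, so the CDFs differ by cluster
codes <- ifelse(beta + rnorm(n, sd = 0.2) > 0.5, 2L, 1L)

prior_cdf  <- ecdf(beta)                       # CDF over all runs
class_cdfs <- lapply(split(beta, codes), ecdf) # one CDF per cluster

plot(prior_cdf, main = "beta", xlab = "beta", ylab = "CDF", do.points = FALSE)
plot(class_cdfs[["1"]], add = TRUE, col = "red",  do.points = FALSE)
plot(class_cdfs[["2"]], add = TRUE, col = "blue", do.points = FALSE)
```

A parameter whose per-cluster curves sit far from the overall curve is a candidate for being influential; curves that overlap suggest the opposite.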

Similarly, by changing the .code parameter we can plot the CDFs of a single parameter, or even the CDFs of interactions. The following examples demonstrate these capabilities.

plotCDFS(clustering, INPUT, .code = "beta")
plotCDFS(clustering, INPUT, .code = "lambda")

CDFs of single parameters

plotCDFS(clustering, INPUT, .code = "beta|lambda", .nBins = 2)

CDFs of interaction for 2 bins

plotCDFS(clustering, INPUT, .code = "beta|lambda", .nBins = 3)

CDFs of interaction for 3 bins
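
The effect of the bin count can be sketched in base R. The example below assumes that an interaction such as beta|lambda is examined by splitting one parameter into quantile-based bins within a cluster and comparing the CDF of the other parameter across those bins. That reading, the quantile binning, and all variable names are assumptions made for illustration; the package's own binning may differ.

```r
set.seed(3)
n      <- 300
lambda <- runif(n)                       # hypothetical conditioning parameter
beta   <- runif(n)                       # hypothetical parameter of interest
codes  <- sample(1:2, n, replace = TRUE) # hypothetical cluster codes
n_bins <- 3

in_cluster <- codes == 1

# Quantile-based bin edges for lambda within cluster 1
edges <- quantile(lambda[in_cluster], probs = seq(0, 1, length.out = n_bins + 1))
bins  <- cut(lambda[in_cluster], breaks = edges, include.lowest = TRUE, labels = FALSE)

# One empirical CDF of beta per lambda bin
binned_cdfs <- lapply(split(beta[in_cluster], bins), ecdf)

plot(ecdf(beta[in_cluster]), do.points = FALSE, main = "beta | lambda, cluster 1")
for (b in seq_len(n_bins)) {
  plot(binned_cdfs[[b]], add = TRUE, col = b + 1, do.points = FALSE)
}
```

With three bins, each conditional CDF is estimated from roughly a third of the cluster's runs, which is the trade-off to weigh when choosing the number of bins.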

Computing DGSA:

To compute parameter sensitivities with DGSA, call the dgsa function as follows.

myDGSA <- dgsa(clustering, INPUT, .interactions = TRUE, .nBoot = 100, .nBins = 3, .alpha = 0.95, .parallel = FALSE)

Note the controls available to the user: the number of bootstrap samples (.nBoot), the number of bins for interactions (.nBins), the quantile level of the significance test (.alpha), and a Boolean flag (.interactions) that specifies whether to compute interactions. In the current version the .parallel parameter is disabled; it is kept as an option for a future implementation that speeds up bootstrapping through parallelization.

This function returns a list with two components. The first component is a 3D array of size (nbClusters X nbParameters X nbParameters). For each cluster, the diagonal elements are the main effects and the off-diagonal elements are the sensitivities of interactions. All elements are automatically standardized by the .alpha quantile of the bootstrapped distributions. To change the normalization factor, the function above has to be run again because, unlike the original MATLAB code, this code does not save the bootstrapped samples.
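
The idea behind a standardized main effect can be sketched in a few lines of base R. The sketch assumes the sensitivity is an L1 distance between the per-cluster CDF and the overall CDF of a parameter, divided by the .alpha quantile of the same distance computed on random subsets of equal size. This is an illustration on synthetic data under those assumptions, not the code of the dgsa function, and the helper l1_cdf_distance is hypothetical.

```r
set.seed(4)
n     <- 200
x     <- runif(n)                                    # hypothetical input parameter
codes <- ifelse(x + rnorm(n, sd = 0.2) > 0.5, 2L, 1L) # hypothetical cluster codes

# Approximate L1 distance between two empirical CDFs on a common grid
l1_cdf_distance <- function(a, b, grid) {
  mean(abs(ecdf(a)(grid) - ecdf(b)(grid))) * diff(range(grid))
}

grid   <- seq(min(x), max(x), length.out = 200)
n_boot <- 100
alpha  <- 0.95

standardized <- sapply(sort(unique(codes)), function(k) {
  members  <- x[codes == k]
  observed <- l1_cdf_distance(members, x, grid)
  # Reference distribution: distances for random subsets of the same size
  null_d <- replicate(n_boot, l1_cdf_distance(sample(x, length(members)), x, grid))
  observed / unname(quantile(null_d, alpha))
})

standardized  # one standardized sensitivity per cluster
```

Under this convention a value above 1 means the observed distance exceeds what random subsets of the same size produce at the chosen quantile level.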

To aid visualization of the main effects and interactions we use the corrplot package, which was originally developed for visualizing correlation matrices. With little effort we can gain insight into parameter interactions, rank parameters through matrix rearrangement, and summarize the entire sensitivity analysis in one plot instead of several Pareto plots. The following wrapper function was developed for efficient communication with the corrplot functions.

Plotting the Results

plotMatrixDGSA(.dgsa, .hypothesis = TRUE, .method = "circle", ...)

The input parameter .dgsa is an object returned by the dgsa function (in our case myDGSA). The code checks that the object really came from the dgsa function. The second parameter, .hypothesis, specifies whether to mark “non-sensitive” parameters in the plot. The user has many options for advanced markup of non-significant sensitivities (more on that later). The third parameter, .method, specifies the plotting method passed to the corrplot function. The choice of method is limited only by corrplot’s capabilities (see help(corrplot)). Finally, all other input parameters are passed to corrplot unchanged, which enables direct communication with the corrplot function.
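
The pass-through of extra arguments relies on R's ... mechanism. The sketch below shows the pattern with a hypothetical wrapper, plot_matrix_sketch, around corrplot::corrplot, applied to a made-up matrix. It is not the source of plotMatrixDGSA; it only illustrates how arguments such as tl.srt or type can reach corrplot without being declared by the wrapper.

```r
set.seed(6)
p <- 4

# Hypothetical symmetric matrix of standardized sensitivities (not package output)
m <- matrix(runif(p * p, 0, 2), p, p)
m <- (m + t(m)) / 2
dimnames(m) <- list(paste0("p", 1:p), paste0("p", 1:p))

# Hypothetical wrapper: one named argument of its own, everything else forwarded
plot_matrix_sketch <- function(mat, .method = "circle", ...) {
  corrplot::corrplot(mat, method = .method, is.corr = FALSE, ...)
}

# Only draw when corrplot is installed
if (requireNamespace("corrplot", quietly = TRUE)) {
  plot_matrix_sketch(m, .method = "square", tl.srt = 45, type = "lower")
}
```

Here is.corr = FALSE tells corrplot that the values are not restricted to the correlation range of -1 to 1.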

Matrix Plots

Matrix Plot (w/o hypothesis)

plotMatrixDGSA(myDGSA, .hypothesis = FALSE)

Main/Interaction plot without hypothesis testing

Matrix Plot (w/ hypothesis)

plotMatrixDGSA(myDGSA, .hypothesis = TRUE)

Main/Interaction plot with hypothesis testing

One of the best features of the corrplot package is matrix reordering. Many methods have been implemented, mainly for reordering correlation matrices; the one found to work best in the DGSA setting is “hierarchical clustering” with “single” linkage. The results of this approach are given below. Notice that the parameters are ranked based on both their main effects and their interactions, not only their main effects. A particularly interesting part is the lower right corner, which ranks 7 parameters as the most important.

plotMatrixDGSA(myDGSA, order = 'hclust', hclust.method='single', .hypothesis = FALSE)

Main/Interaction plot with matrix reordering (hierarchical clustering method)
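
What such a reordering does can be sketched in base R. The example assumes a symmetric matrix scaled to the interval from 0 to 1 and converts it to dissimilarities as 1 minus the value before single-linkage clustering. The matrix is synthetic, and corrplot's internal ordering may differ in detail from this sketch.

```r
set.seed(5)
p <- 6

# Hypothetical symmetric matrix with values in [0, 1]
m <- matrix(runif(p * p), p, p)
m <- (m + t(m)) / 2
diag(m) <- 1
dimnames(m) <- list(paste0("p", 1:p), paste0("p", 1:p))

# Treat large values as "close", then order parameters by single-linkage clustering
tree <- hclust(as.dist(1 - m), method = "single")
ord  <- tree$order

# Apply the same permutation to rows and columns
round(m[ord, ord], 2)
```

Because rows and columns are permuted together, the matrix stays symmetric and parameters with strong mutual values end up next to each other.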

If we set the .hypothesis parameter to TRUE, the code adds significance marks to the matrix sensitivity plot to indicate which sensitivities were found to be insignificant in hypothesis testing. The following example demonstrates this capability.

plotMatrixDGSA(myDGSA, order = 'hclust', hclust.method='single', .hypothesis = TRUE)

Main/Interaction plot with matrix reordering (hierarchical clustering method) + significance

More Matrix Plots

A few more plots produced with advanced corrplot settings.

Hclust Reordered

plotMatrixDGSA(myDGSA, order = 'hclust', hclust.method='single', tl.srt = 65)

Main/Interaction plot with parameter reordering (hierarchical clustering method) with rotated text

Number

plotMatrixDGSA(myDGSA, .method = "number", .hypothesis = FALSE)

Main/Interaction plot “number”

Square

plotMatrixDGSA(myDGSA, .method = "square", .hypothesis = FALSE)

Main/Interaction plot “square”

Shaded

plotMatrixDGSA(myDGSA, .method = "shade", .hypothesis = FALSE)

Main/Interaction plot “shade”

Pie

plotMatrixDGSA(myDGSA, .method = "pie", .hypothesis = FALSE)

Main/Interaction plot “pie”

Pie Lower

plotMatrixDGSA(myDGSA, .method = "pie", type = "lower", .hypothesis = FALSE)

Main/Interaction plot “pie”, lower triangular part only

Circle

plotMatrixDGSA(myDGSA, .method = "circle", diag=TRUE, .hypothesis = TRUE, tl.srt = 45,
               insig = "pch", pch = "+", pch.col = "black", pch.cex = 1.5)

Main/Interaction plot. An example of modifying the significance symbol

Pareto Plots:

The DGSA package also has Pareto plotting capabilities, just like the original MATLAB code. For this feature we use the graphics of the ggplot2 package. All wrapper functions can also return a ggplot object for fine-tuning of fonts and colors via inline methods such as g + ggtitle('mytitle'). Users should consult the ggplot2 documentation for more information.
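
As a minimal sketch of this kind of fine-tuning, the example below builds a Pareto-style bar chart directly with ggplot2 from made-up sensitivity values and then modifies the resulting object with the + operator. It does not call plotParetoDGSA; the data frame, the values, and the threshold of 1 used for coloring are assumptions for illustration.

```r
# Hypothetical standardized sensitivities (not package output)
sens <- c(beta = 1.8, lambda = 1.2, gamma = 0.6)

# Order the factor levels by value so the bars appear ranked
df <- data.frame(
  parameter   = factor(names(sens), levels = names(sort(sens))),
  sensitivity = unname(sens)
)

# Only draw when ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  g <- ggplot(df, aes(x = parameter, y = sensitivity, fill = sensitivity > 1)) +
    geom_col() +
    coord_flip()
  # Fine-tune the returned object with further layers
  g <- g + ggtitle("mytitle") + theme(text = element_text(size = 14))
  print(g)
}
```

Any ggplot object can be adjusted this way, so the same pattern applies to an object returned by the package's wrapper functions.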

Main Effects

plotParetoDGSA(myDGSA)

Pareto plot of main effects (ranked by mean)

Interactions Beta

plotParetoDGSA(myDGSA, .interaction = "beta")

Pareto plot of parameter sensitivity given parameter beta (interactions). Note “beta|beta” is just its main effect

Interactions Lambda

plotParetoDGSA(myDGSA, .interaction = "lambda")

Pareto plot of parameter sensitivity given parameter lambda (interactions). Note “lambda|lambda” is just its main effect

First steps with R-DGSA

Ogy Grujic

November 16th, 2017