R in USA
R skill brought me to work in USA in Biostatistics Departments
R package hclust
hclust
library(s4vd)
data(lung200) # lung cancer data set (Bhattacharjee et al. 2001)
lung200.D.mat <- dist(lung200, method = "euclidean")
# hclust() takes the dist object directly; these row distances match Rowv below
lung200.hclust <- hclust(lung200.D.mat, method = "average")
heatmap(lung200, Rowv = as.dendrogram(lung200.hclust), cexRow = 0.3, cexCol = 0.7)
hclust - algorithm
Begin with \(n\) observations and a measure (e.g. Euclidean distance) of all the \(\binom{n}{2} = n(n-1)/2\) pairwise dissimilarities. Treat each observation as its own cluster.
For \(i = n, n-1, \ldots, 2\):
Examine all pairwise inter-cluster dissimilarities among the \(i\) clusters and identify the pair of clusters that are least dissimilar (that is, most similar). Fuse these two clusters.
Compute the new pairwise inter-cluster dissimilarities among the \(i-1\) remaining clusters.
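The steps above can be sketched in a few lines of R. This is a deliberately naive illustration, not how hclust is implemented internally: it assumes average linkage and Euclidean distance, and the names naive_average_linkage and toy are hypothetical. On data without tied distances, the fusion heights it records should agree with those returned by hclust.

```r
# Naive agglomerative clustering with average linkage (illustrative only).
# Returns the dissimilarity (dendrogram height) at each fusion.
naive_average_linkage <- function(x) {
  d <- as.matrix(dist(x, method = "euclidean"))  # all pairwise dissimilarities
  clusters <- as.list(seq_len(nrow(x)))          # each observation is its own cluster
  heights <- numeric(0)

  while (length(clusters) > 1) {
    k <- length(clusters)
    best <- c(1L, 2L)
    best_d <- Inf
    # Examine all pairwise inter-cluster dissimilarities among the k clusters
    for (a in seq_len(k - 1)) {
      for (b in (a + 1):k) {
        # Average linkage: mean of all distances between members of a and b
        d_ab <- mean(d[clusters[[a]], clusters[[b]]])
        if (d_ab < best_d) {
          best_d <- d_ab
          best <- c(a, b)
        }
      }
    }
    # Fuse the least dissimilar pair and record the fusion height
    merged <- c(clusters[[best[1]]], clusters[[best[2]]])
    clusters <- c(clusters[-best], list(merged))
    heights <- c(heights, best_d)
  }
  heights
}

set.seed(1)
toy <- matrix(rnorm(20), nrow = 10)  # hypothetical toy data: 10 observations, 2 variables
h_naive <- naive_average_linkage(toy)
h_ref <- hclust(dist(toy), method = "average")$height
all.equal(h_naive, h_ref)
```

The double loop re-scans every pair of clusters at each step, which is fine for a toy example but far too slow for real data; in practice hclust should be used.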
hclust - pros and cons
Pros:
Cons:
hclust - distance choice effect
# Function to plot heatmap with hclust dendrogram (method = "average")
plot.lung200.heatmap.1 <- function(row.dist, col.dist, plt.title){
  heatmap(lung200, main = plt.title,
          cexRow = 0.3, cexCol = 0.7, labRow = FALSE, labCol = FALSE,
          Rowv = as.dendrogram(hclust(row.dist, method = "average")),
          Colv = as.dendrogram(hclust(col.dist, method = "average")))
}
D.mat.row <- dist(lung200, method = "euclidean")
D.mat.col <- dist(t(lung200), method = "euclidean")
plot.lung200.heatmap.1(D.mat.row, D.mat.col, "Euclidean dist")
D.mat.row <- dist(lung200, method = "maximum")
D.mat.col <- dist(t(lung200), method = "maximum")
plot.lung200.heatmap.1(D.mat.row, D.mat.col, "Maximum dist")
D.mat.row <- as.dist(1 - cor(t(lung200)))
D.mat.col <- as.dist(1 - cor(lung200))
plot.lung200.heatmap.1(D.mat.row, D.mat.col, "Correlation dist")
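The effect of the distance choice is easier to see on a small made-up example than on a heatmap. The sketch below uses hypothetical toy profiles (not the lung200 data): two increasing and two decreasing profiles, each at a low and a high level. Under these assumptions, Euclidean distance should group profiles by level, while correlation distance should group them by shape.

```r
# Hypothetical toy data: 4 profiles measured at 6 points
base_up <- c(1, 2, 3, 4, 5, 6)
base_down <- rev(base_up)
toy <- rbind(
  up_low    = base_up,         # increasing, low level
  up_high   = base_up + 10,    # increasing, high level
  down_low  = base_down,       # decreasing, low level
  down_high = base_down + 10   # decreasing, high level
)

# Row distances under two different dissimilarity measures
d_euc <- dist(toy, method = "euclidean")
d_cor <- as.dist(1 - cor(t(toy)))

# Cut each average-linkage tree into two clusters
cl_euc <- cutree(hclust(d_euc, method = "average"), k = 2)
cl_cor <- cutree(hclust(d_cor, method = "average"), k = 2)

# Cross-tabulate the two cluster assignments
print(table(euclidean = cl_euc, correlation = cl_cor))
```

The same dendrogram method is used in both cases, so any difference in the cross-tabulation comes from the distance measure alone.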