---
title: "Genetic correlation and narrow sense heritability of PSA"
author: "Malachy Campbell"
date: "8/7/2017"
output:
  rmdformats::html_clean:
    fig_width: 6
    fig_height: 6
    highlight: kate
    thumbnails: true
    lightbox: true
    gallery: true
---

```{r setup, include=FALSE}
knitr::opts_knit$set(root.dir = '~/Desktop/RR/')
```

#Background
The following random regression (RR) model was used for all subsequent analyses

\begin{align}
  \label{RR2}
\text{PSA}_{tjk} =\mu + \sum_{k=0}^{2}\phi(t)_{jk}\beta_k + \sum_{k=0}^{2}\phi(t)_{jk} u_{jk} + \sum_{k=0}^{1}\phi(t)_{jk} s_{jk} + e_{tjk}
\end{align}
As mentioned previously, $\beta$ is the fixed second-order Legendre polynomial to model the overall trend in the trait overtime, $u_{jk}$ and $s_{jk}$ are the $k^{th}$ random regression coefficients for additive genetic effect and random experiment of line $j$, $nr$ is the order of polynomial for the random effects, and $e_{tjk}$ is the random residual.
The random additive genetic effects are described using a second-order Legendre polynomial, while a first-order Legendre polynomial is used to describe the experiment effects across time points.

The random regrression model was compared to a conventional, single time point model (TP). The model is as follows \begin{align}
  \label{TP}
  \mathbf{y} &= \mathbf{Zu} + \mathbf{Qs} +  \mathbf{e} , 
\end{align}
Here, $\mathbf{y}$ is the PSA at time $t$; $\mathbf{Z}$ and $\mathbf{Q}$ are incidence matrices corresponding to the random additive genetic effect ($\mathbf{u}$), and random experimental effect ($\mathbf{s}$), respectively; and $\mathbf{e}$ is the random residual error. The variances were based on the following assumptions $\mathbf{u} \sim N(0, \mathbf{G}\sigma_g^2)$, $\mathbf{s} \sim N(0, \mathbf{I}\sigma_s^2)$, and $\mathbf{e} \sim N(0, \mathbf{I}\sigma_e^2)$. Here, $\sigma_g^2$ is the additive genetic variance; $\sigma_s^2$ is an environmental variance associated with experiment; and $\sigma_e^2$ is the residual variance. 

For both the RR and TP appraches a genomic relationship matrix ($\mathbf{G}$) was calculated using [VanRaden 2008](https://www.ncbi.nlm.nih.gov/pubmed/18946147).
\begin{align}
  \label{G}
\mathbf{G} = \frac{\mathbf{Z_{cs} Z_{cs}}' }{m}
\end{align}
Here, $\mathbf{Z_{cs}}$ is a centered and scaled $n \times m$ matrix, where $m$ is 33,674 SNPs and $n$ is the 357 genotyped rice lines. 

```{r load packages, echo=F}
library(knitr)
```

To estimate the narrow sense heritability, variance components were obtained for each random term using ASREML for the TP analyses and the RR approach. For the RR approach, additive genetic variance was obtained at each time points using methods described by [Mrode (2014)](https://books.google.com/books/about/Linear_Models_for_the_Prediction_of_Anim.html?id=qzACnwEACAAJ). Briefly, for time $i$ the genetic variance can be obtained by $\mathbf{t}_{i} \mathbf{\Omega} \mathbf{t}_{i}'$, where $\mathbf{t}_{i} = \phi_{ik}$, the $i$th row vector of the matrix of Legendre polynomials at different time points ($\mathbf{\phi}$) for the $i$th day of imaging, $\mathbf{\Omega}$ is the covariance matrix of RR coefficients for the  genetic effects, and $k$ is the order of fit. The variance of the experimental effect across time points was calculated using the same approach. For both the single time point analysis $h^2$ was estimated as $\frac{\sigma_g^2}{\sigma_s^2 + \sigma_s^2 + \sigma_e^2}$.

#Running the models
##as file for random regression
```{asreml .as file}
h2 RR model
 NID !A
 Exp !A
 Rep 2
 DayOfImaging 20 !I
 PSA !/100000
G2.grm
PSA.cleaned.csv !SKIP 1 !MAXITER 1000 !EXTRA 100 !WORKSPACE 6144 !ASUV !DOPATH 1

!PATH 1 #Model8: 27537.59 -55017.19 -54782.49
!ASSIGN USe !< !INIT
1.37E-03 1.22E-03 2.07E-03 3.32E-03 5.32E-03 7.88E-03 1.18E-02 1.53E-02 2.04E-02 2.76E-02 4.03E-02 5.80E-02 8.19E-02 0.114627 0.164452 0.224128 0.336555 0.522506 0.790597 1.17392
!>
!ASSIGN USg !< !INIT
0.629183
0.495574 0.394557
0.129607 0.104540 0.281278E-01
!>
!ASSIGN USp !< !INIT
0.594161E-01
0.392295E-01 0.260640E-01
!>
PSA ~ mu leg(DayOfImaging,2) !r us(leg(DayOfImaging,1) $USp).Exp us(leg(DayOfImaging,2) $USg).grm(NID) !f mv
residual id(2142).idh(DayOfImaging $USe)
```

##as file for single time point
This will run the single time point model at each day of imaging. "!RENAME" will create a new set of files for each time point (Y1...Y20). 
```{asreml TP model}
!RENAME !ARG Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10 Y11 Y12 Y13 Y14 Y15 Y16 Y17 Y18 Y19 Y20
SINGLE TP
 NID !A
 Exp !A
 Rep 2
 Y1 !/100000
 Y2 !/100000
 Y3 !/100000
 Y4 !/100000
 Y5 !/100000
 Y6 !/100000
 Y7 !/100000
 Y8 !/100000
 Y9 !/100000
 Y10 !/100000
 Y11 !/100000
 Y12 !/100000
 Y13 !/100000
 Y14 !/100000
 Y15 !/100000
 Y16 !/100000
 Y17 !/100000
 Y18 !/100000
 Y19 !/100000
 Y20 !/100000
G2.grm
PSA.cleaned.csv !SKIP 1 !MAXITER 1000 !EXTRA 100 !WORKSPACE 6144 !ASUV !DOPATH 1

!PATH 1
$1 ~ mu !r Exp grm(NID)
```

#Estimating $h^2$
##Single time point
For the TP analysis, 20 .asr files were created. This chunk of code parses those .asr files, extracts the variance components, and writes those to a file. The r code loads each file and creates an object with the variance components at each time point.
###Parsing asreml output
```{bash var comp files for TP, eval = F}
cd ~/Desktop/RR/TP/h2
for file in *.asr; do less $file | grep -A6 "Model_Term" | sed -n "1,4p" | tr -s " " > $file.varcomp.txt; done
```
###Estimating $h^2$ for single time point approach
```{r TP var comp}
library(reshape2)

files <- paste0("TP/h2/TPY", 1:20, ".asr.varcomp.txt")
for(i in 1:20){
  tmp <- read.table(files[i], skip=1, sep="")
  tmp$DOI <- i
  if(i == 1){
    final <- tmp
  }else{
    final <- rbind(final, tmp)
  }
}
final <- final[c("V1", "V5", "DOI")]
colnames(final)[1:2] <- c("Source", "Sigma")

Var.TP <- dcast(final, Source ~ DOI, value.var = "Sigma")
Var.TP <- Var.TP[2:21]
row.names(Var.TP) <- c("Exp", "G", "E")

Var.TP.per <- rbind(Var.TP[1,]/colSums(Var.TP)*100, Var.TP[2,]/colSums(Var.TP)*100, Var.TP[3,]/colSums(Var.TP)*100)
TP.h2 <- Var.TP[2,]/colSums(Var.TP)
```
##Random regression
$h^2$ for the RR model is straightforward, and is done as described above. The variance components below are obtained from the .asr file. 

###Defining some functions
The functions below were adapted from Mrode (2005) by [Gota Morota](http://morotalab.org/Mrode2005/rr/rr.html#section00020000000000000000).
```{r define functions for RR}
##Return coefficient matrix (lambda) of n-th order Legendre polynomials. Scaling method implemented by Gengler et. al. (1999) converts constant Legendre polynomial coefficients into 1
`legendre` <-
  function(n, gengler){
    
    if (nargs()==1){
      gengler <- TRUE	
    }
    
    if (gengler != TRUE & gengler != FALSE){
      gengler=TRUE	
    }
    
    N <- n+1
    L <- matrix(0,nrow=N, ncol=N)
    
    for(i in (1:N)){
      if(i==1){
        L[i,i] <- 1
      }
      else if(i==2){
        L[i,i] <- 1
      }
      else  {
        tmp <- L[i-1,]
        tmp2 <- as.numeric()
        tmp2 <- c(0,tmp[1:(N-1)])
        L[i,] <- (1/(i-2+1))*( (2*(i-2) + 1)*tmp2 -(i-2)*L[i-2,] )
      }
    }
    
    # Normalize
    for (j in (1:N)){	
      L[j,] <- (sqrt( (2*(j-1)+1)/2)  )*L[j,]
    }
    
    
    # Gengler (1999)
    if (gengler==TRUE){
      L <- sqrt(2)*L
    }
    
    return(L)
    
  }

##Given time points covariate and order of fit for Legendre polynomials, return matrix 'M' containing the polynomials of standardized time. 'M' is order t (number of time points) by k (order of Legendre polynomials)
stdtime <- function(t, n, tmax, tmin){
  if(missing(tmax)) {
    tmax <- t[which.max(t)]
  }
  if(missing(tmin)) {
    tmin <- t[which.min(t)]
  }
  
  N <- n+1
  M <- matrix(0, nrow=length(t), ncol=N)
  a <- -1 + 2*(t-tmin)/(tmax - tmin)
  M[,1] <- 1
  
  for (i in 2:N){
    M[,i] <- a^(i-1)
  }
  
  return(M)
}
```
###Estimating $h^2$ for random regression approach
```{r h2 for RR}
#Variance components from the .asr file
##Residual variance. Recall that idh was used, so each time point will have a unique variance
Resvar <- c(1.39E-03, 1.22E-03, 2.06E-03, 3.33E-03, 
            5.32E-03, 7.88E-03, 1.18E-02, 1.53E-02, 
            2.04E-02, 2.75E-02, 4.02E-02, 5.80E-02, 
            8.19E-02, 0.114587, 0.164339, 0.22392, 
            0.336068, 0.521511, 0.78897, 1.17139)

#Genetic variance. Here a second order Legendre polynomial was used. So G will be a 3 x 3 matrix of 
G.mat <- matrix(c(0.629183, 0.495574, 0.129607,
                  0.495574, 0.394557, 0.104540,
                  0.129607, 0.104540, 0.281278E-01), 3, 3)

E.mat <- matrix(c(0.594161E-01, 0.392295E-01,
               0.392295E-01, 0.260640E-01), 2, 2)

Phi <- stdtime(1:20, 2) %*% t(legendre(2, gengler = F))
G.COV <- Phi %*% G.mat %*% t(Phi)
Gvar <- diag(G.COV)

Phi <- stdtime(1:20, 1) %*% t(legendre(1, gengler = F))
E.COV <- Phi %*% E.mat %*% t(Phi)
Evar <- diag(E.COV)

Var.RR <- rbind(Evar, Gvar, Resvar)
Var.RR.per <- rbind(Evar/colSums(Var.RR)*100, Gvar/colSums(Var.RR)*100, Resvar/colSums(Var.RR)*100)
RR.h2 <- Var.RR[2,]/colSums(Var.RR)
```

##Plotting $h^2$
Here, I'll plot $h^2$ and the variance components for the RR and TP approaches. This chunk of code will generate Figure 4 in the paper.
```{r h2 (F4) plot}
#single column figure 3.25in width
#pdf("~/Desktop/RR/RR/Figures/Fig4.pdf", h=6, w=3.25, useDingbats = F, pointsize = 10)
par(mar=c(3,3,2,.2), mgp=c(1.8,0.5,0))
nf=layout(rbind(c(1,1), c(2,2), c(3,3)))

plot(13:32, RR.h2, pch=19, cex=0.3, ylab=expression(italic(h^2)), xlab="Days After Transplant", col="black", ylim=c( 0, 1) )
lines(13:32, RR.h2, col="black")

points(13:32, TP.h2, pch=21, cex=0.3, col="grey")
lines(13:32, TP.h2, lty=2, cex=0.3, col="grey")

legend("bottomright", c("TP", "RR"), col=c("grey", "black"), pch=c(21, 19), lty=c(2, 1), cex=1, pt.cex = 0.3)

mtext("A", 2,  adj=3.5, las=1, padj=-10, font=1, cex=0.8)

###Varmat for TP
plot(13:32, Var.TP[2,], pch=19, cex=0.5, ylab=expression(italic(sigma^2)), xlab="Days After Transplant", col="black", ylim=c( 0, max(colSums(Var.TP))*1.05) )
lines(13:32, Var.TP[2,], col="black")

points(13:32, Var.TP[1,], pch=20, col="grey", cex=0.5)
lines(13:32, Var.TP[1,], col="grey")

points(13:32, colSums(Var.TP), pch=21, cex=0.5)
lines(13:32, colSums(Var.TP), col="black", lty=2)

legend("topleft", 
       c(expression(italic({sigma^{2}}[g])), 
         expression(italic({sigma^{2}}[Exp])), 
         expression(italic(sigma^2))),
       lty=c(1,1,2), 
       col=c("black", "grey", "black"), 
       pch=c(19, 20, 21),
       cex=1, 
       pt.cex = 0.5)
mtext("B", 2,  adj=3.5, las=1, padj=-10, font=1, cex=0.8)

###Varmat for RR
plot(13:32, Var.RR[2,], pch=19, cex=0.5, ylab=expression(italic(sigma^2)), xlab="Days After Transplant", col="black", ylim=c( 0, max(colSums(Var.RR))*1.05) )
lines(13:32, Var.RR[2,], col="black")

points(13:32, Var.RR[1,], pch=20, col="grey", cex=0.5)
lines(13:32, Var.RR[1,], col="grey")

points(13:32, colSums(Var.RR), pch=21, cex=0.5)
lines(13:32, colSums(Var.RR), col="black", lty=2)

legend("topleft", 
       c(expression(italic({sigma^{2}}[g])), 
         expression(italic({sigma^{2}}[Exp])), 
         expression(italic(sigma^2))),
       lty=c(1,1,2), 
       col=c("black", "grey", "black"), 
       pch=c(19, 20, 21),
       cex=1, 
       pt.cex = 0.5)
mtext("C", 2,  adj=3.5, las=1, padj=-10, font=1, cex=0.8)
#dev.off()
```

#Phenotypic and genetic correlation of PSA
Here, we will calculate the phenotypic and genetic correlations for PSA. The genetic correlation is obtained from the RR model. The heatmaps for Figure 3 in the paper were generated using this code and were combined in Inkscape. [pheatmap](https://cran.r-project.org/web/packages/pheatmap/index.html) is my new favorite package for creating heatmaps, however I haven't yet figured out how to make a multipanel heatmap.
```{r RR gen cor, eval = F}
library(pheatmap)
library(RColorBrewer)
library(plyr)
library(reshape2)

PSA.df <- read.csv("~/Desktop/RR/RR/FinalFiles/PSA.cleaned.csv")

#phenotypic correlation
PSA.df <- dcast(PSA.df, NSFTV.ID + Exp + Rep ~ DayOfImaging, value.var="PSA")
PSA.cor <- cor(PSA.df[4:ncol(PSA.df)], use="complete.obs")
row.names(PSA.cor) <- 13:32
colnames(PSA.cor) <- 13:32

#Phenotypic correlation heatmap
HMcolors <- colorRampPalette(c("red", "yellow", "white"))(100)

pheatmap(PSA.cor,
         color = HMcolors,
         border_color = NA,
         cellwidth = 10.5,
         cellheight = 10.5,
         fontsize = 6,
         scale = "none",
         cluster_rows = F,
         cluster_cols = F,
         legend = T,
         display_numbers = T,
         number_format = "%.2f",
         number_color = "black",
         fontsize_number = 3.5,
         breaks = seq(0.48, 1, length.out = 100),
         filename = "~/Desktop/RR/RR/Figures/PhenoCor.pdf",
         width = 3.5,
         height = 3.5,
         show_rownames = T,
         show_colnames = T,
         labels_row = 13:32,
         labels_col = 13:32
)


#genetic correlation
gCOR <- cov2cor(G.COV)
row.names(gCOR) <- 13:32
colnames(gCOR) <- 13:32


HMcolors <- colorRampPalette(c("red", "yellow", "white"))(100)
pheatmap(gCOR,
         color=HMcolors,
         border_color = NA,
         cellwidth = 10.5,
         cellheight = 10.5,
         fontsize = 6,
         scale = "none",
         cluster_rows = F,
         cluster_cols = F,
         legend = T,
         display_numbers = T,
         number_format = "%.2f",
         number_color = "black",
         fontsize_number = 3.5,
         breaks = seq(0.5, 1, length.out = 100),
         filename = "~/Desktop/RR/RR/Figures/GenoCor.pdf",
         width = 3.5,
         height = 3.5,
         show_rownames = T,
         show_colnames = T,
         labels_row = 13:32,
         labels_col = 13:32
         )
```