Background

The following random regression (RR) model was used for all subsequent analyses

$\begin{align} \label{RR2} \text{PSA}_{tjk} =\mu + \sum_{k=0}^{2}\phi(t)_{jk}\beta_k + \sum_{k=0}^{2}\phi(t)_{jk} u_{jk} + \sum_{k=0}^{1}\phi(t)_{jk} s_{jk} + e_{tjk} \end{align}$

As mentioned previously, $\beta$ is the fixed second-order Legendre polynomial to model the overall trend in the trait overtime, $u_{jk}$ and $s_{jk}$ are the $k^{th}$ random regression coefficients for additive genetic effect and random experiment of line $j$ , $nr$ is the order of polynomial for the random effects, and $e_{tjk}$ is the random residual. The random additive genetic effects are described using a second-order Legendre polynomial, while a first-order Legendre polynomial is used to describe the experiment effects across time points.

The random regrression model was compared to a conventional, single time point model (TP). The model is as follows

$\begin{align} \label{TP} \mathbf{y} &= \mathbf{Zu} + \mathbf{Qs} + \mathbf{e} , \end{align}$

Here, $\mathbf{y}$ is the PSA at time $t$ ; $\mathbf{Z}$ and $\mathbf{Q}$ are incidence matrices corresponding to the random additive genetic effect ( $\mathbf{u}$ ), and random experimental effect ( $\mathbf{s}$ ), respectively; and $\mathbf{e}$ is the random residual error. The variances were based on the following assumptions $\mathbf{u} \sim N(0, \mathbf{G}\sigma_g^2)$ , $\mathbf{s} \sim N(0, \mathbf{I}\sigma_s^2)$ , and $\mathbf{e} \sim N(0, \mathbf{I}\sigma_e^2)$ . Here, $\sigma_g^2$ is the additive genetic variance; $\sigma_s^2$ is an environmental variance associated with experiment; and $\sigma_e^2$ is the residual variance.

For both the RR and TP appraches a genomic relationship matrix (

$\mathbf{G}$ ) was calculated using VanRaden 2008.

$\begin{align} \label{G} \mathbf{G} = \frac{\mathbf{Z_{cs} Z_{cs}}' }{m} \end{align}$

Here, $\mathbf{Z_{cs}}$ is a centered and scaled $n \times m$ matrix, where $m$ is 33,674 SNPs and $n$ is the 357 genotyped rice lines.

To estimate the narrow sense heritability, variance components were obtained for each random term using ASREML for the TP analyses and the RR approach. For the RR approach, additive genetic variance was obtained at each time points using methods described by Mrode (2014). Briefly, for time $i$ the genetic variance can be obtained by $\mathbf{t}_{i} \mathbf{\Omega} \mathbf{t}_{i}'$ , where $\mathbf{t}_{i} = \phi_{ik}$ , the $i$ th row vector of the matrix of Legendre polynomials at different time points ( $\mathbf{\phi}$ ) for the $i$ th day of imaging, $\mathbf{\Omega}$ is the covariance matrix of RR coefficients for the genetic effects, and $k$ is the order of fit. The variance of the experimental effect across time points was calculated using the same approach. For both the single time point analysis $h^2$ was estimated as $\frac{\sigma_g^2}{\sigma_s^2 + \sigma_s^2 + \sigma_e^2}$ .

Running the models

as file for random regression

h2 RR model
 NID !A
 Exp !A
 Rep 2
 DayOfImaging 20 !I
 PSA !/100000
G2.grm
PSA.cleaned.csv !SKIP 1 !MAXITER 1000 !EXTRA 100 !WORKSPACE 6144 !ASUV !DOPATH 1

!PATH 1 #Model8: 27537.59 -55017.19 -54782.49
!ASSIGN USe !< !INIT
1.37E-03 1.22E-03 2.07E-03 3.32E-03 5.32E-03 7.88E-03 1.18E-02 1.53E-02 2.04E-02 2.76E-02 4.03E-02 5.80E-02 8.19E-02 0.114627 0.164452 0.224128 0.336555 0.522506 0.790597 1.17392
!>
!ASSIGN USg !< !INIT
0.629183
0.495574 0.394557
0.129607 0.104540 0.281278E-01
!>
!ASSIGN USp !< !INIT
0.594161E-01
0.392295E-01 0.260640E-01
!>
PSA ~ mu leg(DayOfImaging,2) !r us(leg(DayOfImaging,1) $USp).Exp us(leg(DayOfImaging,2) $USg).grm(NID) !f mv
residual id(2142).idh(DayOfImaging $USe)

as file for single time point

This will run the single time point model at each day of imaging. “!RENAME” will create a new set of files for each time point (Y1…Y20).

!RENAME !ARG Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10 Y11 Y12 Y13 Y14 Y15 Y16 Y17 Y18 Y19 Y20
SINGLE TP
 NID !A
 Exp !A
 Rep 2
 Y1 !/100000
 Y2 !/100000
 Y3 !/100000
 Y4 !/100000
 Y5 !/100000
 Y6 !/100000
 Y7 !/100000
 Y8 !/100000
 Y9 !/100000
 Y10 !/100000
 Y11 !/100000
 Y12 !/100000
 Y13 !/100000
 Y14 !/100000
 Y15 !/100000
 Y16 !/100000
 Y17 !/100000
 Y18 !/100000
 Y19 !/100000
 Y20 !/100000
G2.grm
PSA.cleaned.csv !SKIP 1 !MAXITER 1000 !EXTRA 100 !WORKSPACE 6144 !ASUV !DOPATH 1

!PATH 1
$1 ~ mu !r Exp grm(NID)

Estimating $h^2$

Single time point

For the TP analysis, 20 .asr files were created. This chunk of code parses those .asr files, extracts the variance components, and writes those to a file. The r code loads each file and creates an object with the variance components at each time point. ###Parsing asreml output

cd ~/Desktop/RR/TP/h2
for file in *.asr; do less $file | grep -A6 "Model_Term" | sed -n "1,4p" | tr -s " " > $file.varcomp.txt; done

Estimating $h^2$ for single time point approach

library(reshape2)

files <- paste0("TP/h2/TPY", 1:20, ".asr.varcomp.txt")
for(i in 1:20){
  tmp <- read.table(files[i], skip=1, sep="")
  tmp$DOI <- i
  if(i == 1){
    final <- tmp
  }else{
    final <- rbind(final, tmp)
  }
}
final <- final[c("V1", "V5", "DOI")]
colnames(final)[1:2] <- c("Source", "Sigma")

Var.TP <- dcast(final, Source ~ DOI, value.var = "Sigma")
Var.TP <- Var.TP[2:21]
row.names(Var.TP) <- c("Exp", "G", "E")

Var.TP.per <- rbind(Var.TP[1,]/colSums(Var.TP)*100, Var.TP[2,]/colSums(Var.TP)*100, Var.TP[3,]/colSums(Var.TP)*100)
TP.h2 <- Var.TP[2,]/colSums(Var.TP)

Random regression

$h^2$ for the RR model is straightforward, and is done as described above. The variance components below are obtained from the .asr file.

Defining some functions

The functions below were adapted from Mrode (2005) by Gota Morota.

##Return coefficient matrix (lambda) of n-th order Legendre polynomials. Scaling method implemented by Gengler et. al. (1999) converts constant Legendre polynomial coefficients into 1
`legendre` <-
  function(n, gengler){
    
    if (nargs()==1){
      gengler <- TRUE   
    }
    
    if (gengler != TRUE & gengler != FALSE){
      gengler=TRUE  
    }
    
    N <- n+1
    L <- matrix(0,nrow=N, ncol=N)
    
    for(i in (1:N)){
      if(i==1){
        L[i,i] <- 1
      }
      else if(i==2){
        L[i,i] <- 1
      }
      else  {
        tmp <- L[i-1,]
        tmp2 <- as.numeric()
        tmp2 <- c(0,tmp[1:(N-1)])
        L[i,] <- (1/(i-2+1))*( (2*(i-2) + 1)*tmp2 -(i-2)*L[i-2,] )
      }
    }
    
    # Normalize
    for (j in (1:N)){   
      L[j,] <- (sqrt( (2*(j-1)+1)/2)  )*L[j,]
    }
    
    
    # Gengler (1999)
    if (gengler==TRUE){
      L <- sqrt(2)*L
    }
    
    return(L)
    
  }

##Given time points covariate and order of fit for Legendre polynomials, return matrix 'M' containing the polynomials of standardized time. 'M' is order t (number of time points) by k (order of Legendre polynomials)
stdtime <- function(t, n, tmax, tmin){
  if(missing(tmax)) {
    tmax <- t[which.max(t)]
  }
  if(missing(tmin)) {
    tmin <- t[which.min(t)]
  }
  
  N <- n+1
  M <- matrix(0, nrow=length(t), ncol=N)
  a <- -1 + 2*(t-tmin)/(tmax - tmin)
  M[,1] <- 1
  
  for (i in 2:N){
    M[,i] <- a^(i-1)
  }
  
  return(M)
}

Estimating $h^2$ for random regression approach

#Variance components from the .asr file
##Residual variance. Recall that idh was used, so each time point will have a unique variance
Resvar <- c(1.39E-03, 1.22E-03, 2.06E-03, 3.33E-03, 
            5.32E-03, 7.88E-03, 1.18E-02, 1.53E-02, 
            2.04E-02, 2.75E-02, 4.02E-02, 5.80E-02, 
            8.19E-02, 0.114587, 0.164339, 0.22392, 
            0.336068, 0.521511, 0.78897, 1.17139)

#Genetic variance. Here a second order Legendre polynomial was used. So G will be a 3 x 3 matrix of 
G.mat <- matrix(c(0.629183, 0.495574, 0.129607,
                  0.495574, 0.394557, 0.104540,
                  0.129607, 0.104540, 0.281278E-01), 3, 3)

E.mat <- matrix(c(0.594161E-01, 0.392295E-01,
               0.392295E-01, 0.260640E-01), 2, 2)

Phi <- stdtime(1:20, 2) %*% t(legendre(2, gengler = F))
G.COV <- Phi %*% G.mat %*% t(Phi)
Gvar <- diag(G.COV)

Phi <- stdtime(1:20, 1) %*% t(legendre(1, gengler = F))
E.COV <- Phi %*% E.mat %*% t(Phi)
Evar <- diag(E.COV)

Var.RR <- rbind(Evar, Gvar, Resvar)
Var.RR.per <- rbind(Evar/colSums(Var.RR)*100, Gvar/colSums(Var.RR)*100, Resvar/colSums(Var.RR)*100)
RR.h2 <- Var.RR[2,]/colSums(Var.RR)

Plotting $h^2$

Here, I’ll plot $h^2$ and the variance components for the RR and TP approaches. This chunk of code will generate Figure 4 in the paper.

#single column figure 3.25in width
#pdf("~/Desktop/RR/RR/Figures/Fig4.pdf", h=6, w=3.25, useDingbats = F, pointsize = 10)
par(mar=c(3,3,2,.2), mgp=c(1.8,0.5,0))
nf=layout(rbind(c(1,1), c(2,2), c(3,3)))

plot(13:32, RR.h2, pch=19, cex=0.3, ylab=expression(italic(h^2)), xlab="Days After Transplant", col="black", ylim=c( 0, 1) )
lines(13:32, RR.h2, col="black")

points(13:32, TP.h2, pch=21, cex=0.3, col="grey")
lines(13:32, TP.h2, lty=2, cex=0.3, col="grey")

legend("bottomright", c("TP", "RR"), col=c("grey", "black"), pch=c(21, 19), lty=c(2, 1), cex=1, pt.cex = 0.3)

mtext("A", 2,  adj=3.5, las=1, padj=-10, font=1, cex=0.8)

###Varmat for TP
plot(13:32, Var.TP[2,], pch=19, cex=0.5, ylab=expression(italic(sigma^2)), xlab="Days After Transplant", col="black", ylim=c( 0, max(colSums(Var.TP))*1.05) )
lines(13:32, Var.TP[2,], col="black")

points(13:32, Var.TP[1,], pch=20, col="grey", cex=0.5)
lines(13:32, Var.TP[1,], col="grey")

points(13:32, colSums(Var.TP), pch=21, cex=0.5)
lines(13:32, colSums(Var.TP), col="black", lty=2)

legend("topleft", 
       c(expression(italic({sigma^{2}}[g])), 
         expression(italic({sigma^{2}}[Exp])), 
         expression(italic(sigma^2))),
       lty=c(1,1,2), 
       col=c("black", "grey", "black"), 
       pch=c(19, 20, 21),
       cex=1, 
       pt.cex = 0.5)
mtext("B", 2,  adj=3.5, las=1, padj=-10, font=1, cex=0.8)

###Varmat for RR
plot(13:32, Var.RR[2,], pch=19, cex=0.5, ylab=expression(italic(sigma^2)), xlab="Days After Transplant", col="black", ylim=c( 0, max(colSums(Var.RR))*1.05) )
lines(13:32, Var.RR[2,], col="black")

points(13:32, Var.RR[1,], pch=20, col="grey", cex=0.5)
lines(13:32, Var.RR[1,], col="grey")

points(13:32, colSums(Var.RR), pch=21, cex=0.5)
lines(13:32, colSums(Var.RR), col="black", lty=2)

legend("topleft", 
       c(expression(italic({sigma^{2}}[g])), 
         expression(italic({sigma^{2}}[Exp])), 
         expression(italic(sigma^2))),
       lty=c(1,1,2), 
       col=c("black", "grey", "black"), 
       pch=c(19, 20, 21),
       cex=1, 
       pt.cex = 0.5)
mtext("C", 2,  adj=3.5, las=1, padj=-10, font=1, cex=0.8)

#dev.off()

Phenotypic and genetic correlation of PSA

Here, we will calculate the phenotypic and genetic correlations for PSA. The genetic correlation is obtained from the RR model. The heatmaps for Figure 3 in the paper were generated using this code and were combined in Inkscape. pheatmap is my new favorite package for creating heatmaps, however I haven’t yet figured out how to make a multipanel heatmap.

library(pheatmap)
library(RColorBrewer)
library(plyr)
library(reshape2)

PSA.df <- read.csv("~/Desktop/RR/RR/FinalFiles/PSA.cleaned.csv")

#phenotypic correlation
PSA.df <- dcast(PSA.df, NSFTV.ID + Exp + Rep ~ DayOfImaging, value.var="PSA")
PSA.cor <- cor(PSA.df[4:ncol(PSA.df)], use="complete.obs")
row.names(PSA.cor) <- 13:32
colnames(PSA.cor) <- 13:32

#Phenotypic correlation heatmap
HMcolors <- colorRampPalette(c("red", "yellow", "white"))(100)

pheatmap(PSA.cor,
         color = HMcolors,
         border_color = NA,
         cellwidth = 10.5,
         cellheight = 10.5,
         fontsize = 6,
         scale = "none",
         cluster_rows = F,
         cluster_cols = F,
         legend = T,
         display_numbers = T,
         number_format = "%.2f",
         number_color = "black",
         fontsize_number = 3.5,
         breaks = seq(0.48, 1, length.out = 100),
         filename = "~/Desktop/RR/RR/Figures/PhenoCor.pdf",
         width = 3.5,
         height = 3.5,
         show_rownames = T,
         show_colnames = T,
         labels_row = 13:32,
         labels_col = 13:32
)


#genetic correlation
gCOR <- cov2cor(G.COV)
row.names(gCOR) <- 13:32
colnames(gCOR) <- 13:32


HMcolors <- colorRampPalette(c("red", "yellow", "white"))(100)
pheatmap(gCOR,
         color=HMcolors,
         border_color = NA,
         cellwidth = 10.5,
         cellheight = 10.5,
         fontsize = 6,
         scale = "none",
         cluster_rows = F,
         cluster_cols = F,
         legend = T,
         display_numbers = T,
         number_format = "%.2f",
         number_color = "black",
         fontsize_number = 3.5,
         breaks = seq(0.5, 1, length.out = 100),
         filename = "~/Desktop/RR/RR/Figures/GenoCor.pdf",
         width = 3.5,
         height = 3.5,
         show_rownames = T,
         show_colnames = T,
         labels_row = 13:32,
         labels_col = 13:32
         )

Genetic correlation and narrow sense heritability of PSA

Malachy Campbell

8/7/2017

Background

Running the models

as file for random regression

as file for single time point

Estimating $h^2$

Single time point

Estimating $h^2$ for single time point approach

Random regression

Defining some functions

Estimating $h^2$ for random regression approach

Plotting $h^2$

Phenotypic and genetic correlation of PSA

Genetic correlation and narrow sense heritability of PSA

Malachy Campbell

8/7/2017

Background

Running the models

as file for random regression

as file for single time point

Estimating h2h^2

Single time point

Estimating h2h^2 for single time point approach

Random regression

Defining some functions

Estimating h2h^2 for random regression approach

Plotting h2h^2

Phenotypic and genetic correlation of PSA

Estimating $h^2$

Estimating $h^2$ for single time point approach

Estimating $h^2$ for random regression approach

Plotting $h^2$