Background
The following random regression (RR) model was used for all subsequent analyses
\[\begin{align} \label{RR2} \text{PSA}_{tjk} =\mu + \sum_{k=0}^{2}\phi(t)_{jk}\beta_k + \sum_{k=0}^{2}\phi(t)_{jk} u_{jk} + \sum_{k=0}^{1}\phi(t)_{jk} s_{jk} + e_{tjk} \end{align}\]As mentioned previously, \(\beta\) is the fixed second-order Legendre polynomial to model the overall trend in the trait overtime, \(u_{jk}\) and \(s_{jk}\) are the \(k^{th}\) random regression coefficients for additive genetic effect and random experiment of line \(j\), \(nr\) is the order of polynomial for the random effects, and \(e_{tjk}\) is the random residual. The random additive genetic effects are described using a second-order Legendre polynomial, while a first-order Legendre polynomial is used to describe the experiment effects across time points.
The random regrression model was compared to a conventional, single time point model (TP). The model is as follows \[\begin{align} \label{TP} \mathbf{y} &= \mathbf{Zu} + \mathbf{Qs} + \mathbf{e} , \end{align}\]Here, \(\mathbf{y}\) is the PSA at time \(t\); \(\mathbf{Z}\) and \(\mathbf{Q}\) are incidence matrices corresponding to the random additive genetic effect (\(\mathbf{u}\)), and random experimental effect (\(\mathbf{s}\)), respectively; and \(\mathbf{e}\) is the random residual error. The variances were based on the following assumptions \(\mathbf{u} \sim N(0, \mathbf{G}\sigma_g^2)\), \(\mathbf{s} \sim N(0, \mathbf{I}\sigma_s^2)\), and \(\mathbf{e} \sim N(0, \mathbf{I}\sigma_e^2)\). Here, \(\sigma_g^2\) is the additive genetic variance; \(\sigma_s^2\) is an environmental variance associated with experiment; and \(\sigma_e^2\) is the residual variance.
For both the RR and TP appraches a genomic relationship matrix (\(\mathbf{G}\)) was calculated using VanRaden 2008. \[\begin{align} \label{G} \mathbf{G} = \frac{\mathbf{Z_{cs} Z_{cs}}' }{m} \end{align}\]Here, \(\mathbf{Z_{cs}}\) is a centered and scaled \(n \times m\) matrix, where \(m\) is 33,674 SNPs and \(n\) is the 357 genotyped rice lines.
To estimate the narrow sense heritability, variance components were obtained for each random term using ASREML for the TP analyses and the RR approach. For the RR approach, additive genetic variance was obtained at each time points using methods described by Mrode (2014). Briefly, for time \(i\) the genetic variance can be obtained by \(\mathbf{t}_{i} \mathbf{\Omega} \mathbf{t}_{i}'\), where \(\mathbf{t}_{i} = \phi_{ik}\), the \(i\)th row vector of the matrix of Legendre polynomials at different time points (\(\mathbf{\phi}\)) for the \(i\)th day of imaging, \(\mathbf{\Omega}\) is the covariance matrix of RR coefficients for the genetic effects, and \(k\) is the order of fit. The variance of the experimental effect across time points was calculated using the same approach. For both the single time point analysis \(h^2\) was estimated as \(\frac{\sigma_g^2}{\sigma_s^2 + \sigma_s^2 + \sigma_e^2}\).
Running the models
as file for random regression
h2 RR model
NID !A
Exp !A
Rep 2
DayOfImaging 20 !I
PSA !/100000
G2.grm
PSA.cleaned.csv !SKIP 1 !MAXITER 1000 !EXTRA 100 !WORKSPACE 6144 !ASUV !DOPATH 1
!PATH 1 #Model8: 27537.59 -55017.19 -54782.49
!ASSIGN USe !< !INIT
1.37E-03 1.22E-03 2.07E-03 3.32E-03 5.32E-03 7.88E-03 1.18E-02 1.53E-02 2.04E-02 2.76E-02 4.03E-02 5.80E-02 8.19E-02 0.114627 0.164452 0.224128 0.336555 0.522506 0.790597 1.17392
!>
!ASSIGN USg !< !INIT
0.629183
0.495574 0.394557
0.129607 0.104540 0.281278E-01
!>
!ASSIGN USp !< !INIT
0.594161E-01
0.392295E-01 0.260640E-01
!>
PSA ~ mu leg(DayOfImaging,2) !r us(leg(DayOfImaging,1) $USp).Exp us(leg(DayOfImaging,2) $USg).grm(NID) !f mv
residual id(2142).idh(DayOfImaging $USe)
as file for single time point
This will run the single time point model at each day of imaging. “!RENAME” will create a new set of files for each time point (Y1…Y20).
!RENAME !ARG Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10 Y11 Y12 Y13 Y14 Y15 Y16 Y17 Y18 Y19 Y20
SINGLE TP
NID !A
Exp !A
Rep 2
Y1 !/100000
Y2 !/100000
Y3 !/100000
Y4 !/100000
Y5 !/100000
Y6 !/100000
Y7 !/100000
Y8 !/100000
Y9 !/100000
Y10 !/100000
Y11 !/100000
Y12 !/100000
Y13 !/100000
Y14 !/100000
Y15 !/100000
Y16 !/100000
Y17 !/100000
Y18 !/100000
Y19 !/100000
Y20 !/100000
G2.grm
PSA.cleaned.csv !SKIP 1 !MAXITER 1000 !EXTRA 100 !WORKSPACE 6144 !ASUV !DOPATH 1
!PATH 1
$1 ~ mu !r Exp grm(NID)
Estimating \(h^2\)
Single time point
For the TP analysis, 20 .asr files were created. This chunk of code parses those .asr files, extracts the variance components, and writes those to a file. The r code loads each file and creates an object with the variance components at each time point. ###Parsing asreml output
cd ~/Desktop/RR/TP/h2
for file in *.asr; do less $file | grep -A6 "Model_Term" | sed -n "1,4p" | tr -s " " > $file.varcomp.txt; done
Estimating \(h^2\) for single time point approach
library(reshape2)
files <- paste0("TP/h2/TPY", 1:20, ".asr.varcomp.txt")
for(i in 1:20){
tmp <- read.table(files[i], skip=1, sep="")
tmp$DOI <- i
if(i == 1){
final <- tmp
}else{
final <- rbind(final, tmp)
}
}
final <- final[c("V1", "V5", "DOI")]
colnames(final)[1:2] <- c("Source", "Sigma")
Var.TP <- dcast(final, Source ~ DOI, value.var = "Sigma")
Var.TP <- Var.TP[2:21]
row.names(Var.TP) <- c("Exp", "G", "E")
Var.TP.per <- rbind(Var.TP[1,]/colSums(Var.TP)*100, Var.TP[2,]/colSums(Var.TP)*100, Var.TP[3,]/colSums(Var.TP)*100)
TP.h2 <- Var.TP[2,]/colSums(Var.TP)
Random regression
\(h^2\) for the RR model is straightforward, and is done as described above. The variance components below are obtained from the .asr file.
Defining some functions
The functions below were adapted from Mrode (2005) by Gota Morota.
##Return coefficient matrix (lambda) of n-th order Legendre polynomials. Scaling method implemented by Gengler et. al. (1999) converts constant Legendre polynomial coefficients into 1
`legendre` <-
function(n, gengler){
if (nargs()==1){
gengler <- TRUE
}
if (gengler != TRUE & gengler != FALSE){
gengler=TRUE
}
N <- n+1
L <- matrix(0,nrow=N, ncol=N)
for(i in (1:N)){
if(i==1){
L[i,i] <- 1
}
else if(i==2){
L[i,i] <- 1
}
else {
tmp <- L[i-1,]
tmp2 <- as.numeric()
tmp2 <- c(0,tmp[1:(N-1)])
L[i,] <- (1/(i-2+1))*( (2*(i-2) + 1)*tmp2 -(i-2)*L[i-2,] )
}
}
# Normalize
for (j in (1:N)){
L[j,] <- (sqrt( (2*(j-1)+1)/2) )*L[j,]
}
# Gengler (1999)
if (gengler==TRUE){
L <- sqrt(2)*L
}
return(L)
}
##Given time points covariate and order of fit for Legendre polynomials, return matrix 'M' containing the polynomials of standardized time. 'M' is order t (number of time points) by k (order of Legendre polynomials)
stdtime <- function(t, n, tmax, tmin){
if(missing(tmax)) {
tmax <- t[which.max(t)]
}
if(missing(tmin)) {
tmin <- t[which.min(t)]
}
N <- n+1
M <- matrix(0, nrow=length(t), ncol=N)
a <- -1 + 2*(t-tmin)/(tmax - tmin)
M[,1] <- 1
for (i in 2:N){
M[,i] <- a^(i-1)
}
return(M)
}
Estimating \(h^2\) for random regression approach
#Variance components from the .asr file
##Residual variance. Recall that idh was used, so each time point will have a unique variance
Resvar <- c(1.39E-03, 1.22E-03, 2.06E-03, 3.33E-03,
5.32E-03, 7.88E-03, 1.18E-02, 1.53E-02,
2.04E-02, 2.75E-02, 4.02E-02, 5.80E-02,
8.19E-02, 0.114587, 0.164339, 0.22392,
0.336068, 0.521511, 0.78897, 1.17139)
#Genetic variance. Here a second order Legendre polynomial was used. So G will be a 3 x 3 matrix of
G.mat <- matrix(c(0.629183, 0.495574, 0.129607,
0.495574, 0.394557, 0.104540,
0.129607, 0.104540, 0.281278E-01), 3, 3)
E.mat <- matrix(c(0.594161E-01, 0.392295E-01,
0.392295E-01, 0.260640E-01), 2, 2)
Phi <- stdtime(1:20, 2) %*% t(legendre(2, gengler = F))
G.COV <- Phi %*% G.mat %*% t(Phi)
Gvar <- diag(G.COV)
Phi <- stdtime(1:20, 1) %*% t(legendre(1, gengler = F))
E.COV <- Phi %*% E.mat %*% t(Phi)
Evar <- diag(E.COV)
Var.RR <- rbind(Evar, Gvar, Resvar)
Var.RR.per <- rbind(Evar/colSums(Var.RR)*100, Gvar/colSums(Var.RR)*100, Resvar/colSums(Var.RR)*100)
RR.h2 <- Var.RR[2,]/colSums(Var.RR)
Plotting \(h^2\)
Here, I’ll plot \(h^2\) and the variance components for the RR and TP approaches. This chunk of code will generate Figure 4 in the paper.
#single column figure 3.25in width
#pdf("~/Desktop/RR/RR/Figures/Fig4.pdf", h=6, w=3.25, useDingbats = F, pointsize = 10)
par(mar=c(3,3,2,.2), mgp=c(1.8,0.5,0))
nf=layout(rbind(c(1,1), c(2,2), c(3,3)))
plot(13:32, RR.h2, pch=19, cex=0.3, ylab=expression(italic(h^2)), xlab="Days After Transplant", col="black", ylim=c( 0, 1) )
lines(13:32, RR.h2, col="black")
points(13:32, TP.h2, pch=21, cex=0.3, col="grey")
lines(13:32, TP.h2, lty=2, cex=0.3, col="grey")
legend("bottomright", c("TP", "RR"), col=c("grey", "black"), pch=c(21, 19), lty=c(2, 1), cex=1, pt.cex = 0.3)
mtext("A", 2, adj=3.5, las=1, padj=-10, font=1, cex=0.8)
###Varmat for TP
plot(13:32, Var.TP[2,], pch=19, cex=0.5, ylab=expression(italic(sigma^2)), xlab="Days After Transplant", col="black", ylim=c( 0, max(colSums(Var.TP))*1.05) )
lines(13:32, Var.TP[2,], col="black")
points(13:32, Var.TP[1,], pch=20, col="grey", cex=0.5)
lines(13:32, Var.TP[1,], col="grey")
points(13:32, colSums(Var.TP), pch=21, cex=0.5)
lines(13:32, colSums(Var.TP), col="black", lty=2)
legend("topleft",
c(expression(italic({sigma^{2}}[g])),
expression(italic({sigma^{2}}[Exp])),
expression(italic(sigma^2))),
lty=c(1,1,2),
col=c("black", "grey", "black"),
pch=c(19, 20, 21),
cex=1,
pt.cex = 0.5)
mtext("B", 2, adj=3.5, las=1, padj=-10, font=1, cex=0.8)
###Varmat for RR
plot(13:32, Var.RR[2,], pch=19, cex=0.5, ylab=expression(italic(sigma^2)), xlab="Days After Transplant", col="black", ylim=c( 0, max(colSums(Var.RR))*1.05) )
lines(13:32, Var.RR[2,], col="black")
points(13:32, Var.RR[1,], pch=20, col="grey", cex=0.5)
lines(13:32, Var.RR[1,], col="grey")
points(13:32, colSums(Var.RR), pch=21, cex=0.5)
lines(13:32, colSums(Var.RR), col="black", lty=2)
legend("topleft",
c(expression(italic({sigma^{2}}[g])),
expression(italic({sigma^{2}}[Exp])),
expression(italic(sigma^2))),
lty=c(1,1,2),
col=c("black", "grey", "black"),
pch=c(19, 20, 21),
cex=1,
pt.cex = 0.5)
mtext("C", 2, adj=3.5, las=1, padj=-10, font=1, cex=0.8)
#dev.off()
Phenotypic and genetic correlation of PSA
Here, we will calculate the phenotypic and genetic correlations for PSA. The genetic correlation is obtained from the RR model. The heatmaps for Figure 3 in the paper were generated using this code and were combined in Inkscape. pheatmap is my new favorite package for creating heatmaps, however I haven’t yet figured out how to make a multipanel heatmap.
library(pheatmap)
library(RColorBrewer)
library(plyr)
library(reshape2)
PSA.df <- read.csv("~/Desktop/RR/RR/FinalFiles/PSA.cleaned.csv")
#phenotypic correlation
PSA.df <- dcast(PSA.df, NSFTV.ID + Exp + Rep ~ DayOfImaging, value.var="PSA")
PSA.cor <- cor(PSA.df[4:ncol(PSA.df)], use="complete.obs")
row.names(PSA.cor) <- 13:32
colnames(PSA.cor) <- 13:32
#Phenotypic correlation heatmap
HMcolors <- colorRampPalette(c("red", "yellow", "white"))(100)
pheatmap(PSA.cor,
color = HMcolors,
border_color = NA,
cellwidth = 10.5,
cellheight = 10.5,
fontsize = 6,
scale = "none",
cluster_rows = F,
cluster_cols = F,
legend = T,
display_numbers = T,
number_format = "%.2f",
number_color = "black",
fontsize_number = 3.5,
breaks = seq(0.48, 1, length.out = 100),
filename = "~/Desktop/RR/RR/Figures/PhenoCor.pdf",
width = 3.5,
height = 3.5,
show_rownames = T,
show_colnames = T,
labels_row = 13:32,
labels_col = 13:32
)
#genetic correlation
gCOR <- cov2cor(G.COV)
row.names(gCOR) <- 13:32
colnames(gCOR) <- 13:32
HMcolors <- colorRampPalette(c("red", "yellow", "white"))(100)
pheatmap(gCOR,
color=HMcolors,
border_color = NA,
cellwidth = 10.5,
cellheight = 10.5,
fontsize = 6,
scale = "none",
cluster_rows = F,
cluster_cols = F,
legend = T,
display_numbers = T,
number_format = "%.2f",
number_color = "black",
fontsize_number = 3.5,
breaks = seq(0.5, 1, length.out = 100),
filename = "~/Desktop/RR/RR/Figures/GenoCor.pdf",
width = 3.5,
height = 3.5,
show_rownames = T,
show_colnames = T,
labels_row = 13:32,
labels_col = 13:32
)