Random Regression

Defining the mixed model equation

The following random regression model was used to model trajectories for PSA across the 20 time points and obtain estimates for \(\mathbf{\Omega}\) and \(\mathbf{P}\)

\[\begin{align} \label{RR1} \text{PSA}_{tijk} =\mu + \sum_{k=0}^{2}\phi_{jtk}\beta_k + \sum_{k=0}^{2}\phi_{jtk} u_{jk} + \sum_{k=0}^{1}\phi_{itk} s_{ik} + e_{tijk} \end{align}\]

In matrix notation, the model can be written as \[\begin{align} \label{RR3} \mathbf{y} &= \mathbf{Xb} + \mathbf{Zu} + \mathbf{Qs} + \mathbf{e} \end{align}\]

\(\mathbf{y}\) is a vector with an order equal to the number of observations and contains the \(PSA\) over the 20 days. \(\mathbf{X}\) is a covariable matrix for the fixed effects where the number of rows is equal to the number of observations (\(n\)) and the number of columns is equal to the order of Legendre polynomial used to model fixed effects (\(k_f\)). The matrices \(\mathbf{Z}\) and \(\mathbf{Q}\) are covariable matrices for the random additive genetic and random experimental effects, respectively. The number of rows for \(\mathbf{Z}\) is equal to the number of observations and the number of columns corresponds to the order of Legendre polynomial times the number of lines used to fit the additive genetic effect (\(q * k_g = 357 * 3 = 1,071\)). For \(\mathbf{Q}\) the number of columns would be 6 (\(e * k_s = 3 * 2\)) and the number of rows would be equal to the number of observations. We assume \(\mathbf{u} \sim N(0, \mathbf{G} \otimes \mathbf{\Omega})\), \(\mathbf{s} \sim N(0, \mathbf{I} \otimes \mathbf{P})\), and \(\mathbf{e} \sim N(0, \mathbf{I} \otimes \mathbf{D})\). Here, \(\mathbf{\Omega}\) and \(\mathbf{P}\) are the covariance matrices for the RR coefficients for the additive genetic and permanent environmental effects. \(\mathbf{D}\) is a diagonal matrix that allows for heterogeneous variances over the 20 time points.

The mixed model equation (MME) is

\[\begin{align} \begin{bmatrix} \mathbf{X' R^{-1} X} & \mathbf{X' R^{-1} Z} & \mathbf{X' R^{-1} Q}\\ \mathbf{Z' R^{-1} X} & \mathbf{Z' R^{-1} Z} + \mathbf{G^{-1}} \otimes \boldsymbol{\Omega} & \mathbf{Z' R^{-1} Q}\\ \mathbf{Q' R^{-1} X} & \mathbf{Q' R^{-1} Z} & \mathbf{Q' R^{-1} Q} + \mathbf{I} \otimes \mathbf{P}\\ \end{bmatrix} \begin{bmatrix} \mathbf{\hat{b}} \\ \mathbf{\hat{u}} \\ \mathbf{\hat{s}} \\ \end{bmatrix} &= \begin{bmatrix} \mathbf{X' R^{-1} y} \\ \mathbf{Z' R^{-1} y} \\ \mathbf{Q' R^{-1} y} \\ \end{bmatrix} \end{align}\]

Solving the above MME will give three RR coefficients for each line for the random genetic effects. Using these RR coefficients, the genetic values at each time point can be obtained. For line \(j\) the predicted genetic values (gBLUP) at each time point is given by \(gBLUP_j = \boldsymbol{\Phi} \hat{u}_j\).

Setting up covariable matrices for the RR approach

For each term, we define a matrix of Legendre polynomials evaluated at each time point. Recall that both the fixed and random additive genetic effect are modeled using a second-order Legendre polynomial. The covariable matrix \(\mathbf{X}\) is defined as \(\mathbf{X = X^{o} \Phi_f}\) where \(\mathbf{X^{o}}\) is a vector of 1 with length \(q*e\). Similarly, we define matrices \(\mathbf{Z}\) and \(\mathbf{Q}\) as \[\begin{align} \mathbf{Z} &= \mathbf{Z^o} \otimes \boldsymbol{\Phi_g} \\ \mathbf{Q} &= \mathbf{Q^o} \otimes \boldsymbol{\Phi_s} \end{align}\] \(\mathbf{Z^o}\) and \(\mathbf{Q^o}\) are incidence matrices that allocate temporal records to individuals and experiments respectively. The order of \(\mathbf{Z^o}\) is \(q*e \times q\) and \(\mathbf{Q^o}\) is \(q*e \times e\) (\(q\) is the number of individuals, \(e\) is the number of experiments).

The functions below were adapted from Mrode (2005) by Gota Morota

Calculating \(\textrm{Var(}\hat{\beta} \textrm{)}\) at each time point

The objective is to calculate SNP effects at each time point. For the single time point approach \(\textrm{BLUP(}\hat{\beta} \textrm{)} = \mathbf{W_{sc}' G^{-1} \hat{g} }\). Thus, \[\begin{align} \label{eq:9} \textrm{Var(}\hat{\beta} \textrm{)} &= \textrm{Var(} \mathbf{W_{sc}' G^{-1} \hat{u} } \textrm{)} \\ &= \mathbf{W_{sc}' G^{-1}} \textrm{Var(} \mathbf{\hat{u}} \textrm{)} \mathbf{G^{-1} W_{sc}} \end{align}\]

The prediction error variance (PEV) of \(\boldsymbol{\hat{u}}\) is \[\begin{align} \textrm{PEV(}\hat{\mathbf{u}} \textrm{)} = \mathbf{C^{22}} \sigma^2_e &= \textrm{Var(} \mathbf{u - \hat{u} }\textrm{)} \\ &= \textrm{Var(} \mathbf{u} \textrm{)} - \textrm{Var(} \hat{\mathbf{u}} \textrm{)} \\ &= \mathbf{G} \sigma^2_g - \textrm{Var(} \hat{\mathbf{u}} \textrm{)} \end{align}\]

By rearranging the equation above we obtain

\[\begin{align} \label{eq:13} \textrm{Var(} \hat{\mathbf{u}} \textrm{)} &= \mathbf{G} \sigma^2_g - \mathbf{C^{22}} \sigma^2_e \end{align}\]

To calculate the variance of the SNP effects, we can introduce equation (13) into equation (9) giving

\[\begin{align} \textrm{Var(} \hat{\boldsymbol{\beta}} \textrm{)} &= \mathbf{W'_{sc} G^{-1}} (\mathbf{G} \sigma_g^2 - \mathbf{C^{22}} \sigma^2_e) \mathbf{G^{-1} W_{sc} } \\ &= \mathbf{W'_{sc} G^{-1} W_{sc}} \sigma_g^2 - \mathbf{W'_{sc} G^{-1} C^{22}} \mathbf{G^{-1} W_{sc} \sigma^2_e } \end{align}\]

At each time point, \(\sigma_g^2\) is extracted from the corresponding the diagonal element of \(\boldsymbol{\Phi_g \Omega \Phi'_g}\).

\(\mathbf{C^{22}}\) is obtained by inverting the coefficient matrix of the MME (equation 3), and is of order \(q*k_g \times q*k_g\). The diagonal elements of \(\mathbf{C^{22}}\) contain the PEV for the RR coefficients for the additive genetic effect. To obtain the variance of SNP effects at each time point \(\mathbf{C^{22}}\) must be transformed so that the diagonal elements correspond to the PEV for GEBVs at each time point. We will refer to this \(q*t \times q*t\) matrix as \(\mathbf{C^{22}}^*\). Following , PEV for individual \(i\) at each time point can be obtained by taking the diagonal elements of \(\textrm{PEV}_i = \boldsymbol{\Phi_g} \mathbf{C_{ii}} \boldsymbol{\Phi_g'}\). \(\mathbf{C_{ii}}\) is a \(3 \times 3\) submatrix of RR coefficients from \(\mathbf{C_{22}}\) for individual \(i\). To extend this approach to the full \(\mathbf{C_{22}}\) matrix, we construct a block matrix of \(\boldsymbol{\Phi_{g}}\) (\(\boldsymbol{\Phi^*_{g}}\)) via \(\boldsymbol{\Phi^*_{g}} = \mathbf{I} \otimes \boldsymbol{\Phi_{g}}\) and obtain \(\boldsymbol{C^{22*}}\) by

\[\begin{align} \boldsymbol{C^{22*}} &= \boldsymbol{\Phi_{g}^*} \boldsymbol{C^{22}} \boldsymbol{\Phi_{g}^*}' \end{align}\]

Thus, \(\mathbf{C_{22}^*}\) is \(q*t \times q*t\) and the diagonal elements are the PEV for GEBVs at each time point. Finally, to calculate the variance of SNP effects at each time point at each, we extract the corresponding elements of \(\boldsymbol{C^{22*}}\) and introduce them into .

Setting up MME and obtaining C22*

Set up the MME.

Single time point approach

Defining the mixed model equation

The single time point approach is essentially a univariate gBLUP approach. We fit the following model at each time point \[\begin{align} \label{eq:VarSNP_matrix} \mathbf{y} &= \mathbf{Xb} + \mathbf{Zu} + \mathbf{Qs} + \mathbf{e} , \end{align}\] The matrices \(\mathbf{X, Z}\) and \(\mathbf{Q}\) correspond to incidence matrices for the fixed, random additive genetic and random experimental effect, respectively. Moreover, the dimensions for \(\mathbf{X, Z}\) and \(\mathbf{Q}\) are \(n \times 1\), \(n \times q\) and \(n \times e\), where \(n\) is the number of observations at each time point. We assume the random terms are distributed as follows \(\mathbf{u} \sim N(0, \mathbf{G}\sigma_g^2)\), \(\mathbf{s} \sim N(0, \mathbf{I}\sigma_s^2)\), and \(\mathbf{e} \sim N(0, \mathbf{I}\sigma_e^2)\).

The MME for the TP approach is \[\begin{align} \label{eq:TP.MME} \begin{bmatrix} \mathbf{X' R^{-1} X} & \mathbf{X' R^{-1} Z} & \mathbf{X' R^{-1} Q}\\ \mathbf{Z' R^{-1} X} & \mathbf{Z' R^{-1} Z} + \mathbf{G^{-1}} \lambda_g & \mathbf{Z' R^{-1} Q}\\ \mathbf{Q' R^{-1} X} & \mathbf{Q' R^{-1} Z} & \mathbf{Q' R^{-1} Q} + \mathbf{I} \lambda_s\\ \end{bmatrix} \begin{bmatrix} \mathbf{\hat{b}} \\ \mathbf{\hat{u}} \\ \mathbf{\hat{s}} \\ \end{bmatrix} &= \begin{bmatrix} \mathbf{X' R^{-1} y} \\ \mathbf{Z' R^{-1} y} \\ \mathbf{Q' R^{-1} y} \\ \end{bmatrix}, \\ \lambda_g = \frac{\sigma_e^2}{\sigma_g^2}; \lambda_s = \frac{\sigma_e^2}{\sigma_s^2} \end{align}\]

Calculating \(\textrm{Var(}\hat{\beta} \textrm{)}\) at each time point

Since the TP approach is a univariate gBLUP model calculating the variance of SNP effects is much simpler than the RR approach, and are given by

\[\begin{align} \textrm{Var(} \hat{\boldsymbol{\beta}} \textrm{)} &= \mathbf{W'_{sc} G^{-1} W_{sc}} \sigma_g^2 - \mathbf{W'_{sc} G^{-1} C^{22}} \mathbf{G^{-1} W_{sc} \sigma^2_e) } \label{eq:14} \end{align}\]

\(\mathbf{C^{22}}\) is obtained by inverting the MME .

Setting up MME for TP approach and running GWAS from GEBVs

#Define all matrices to store the results
pvalue.mat <- matrix(0, ncol = 20, nrow = ncol(Zsc))
SD.mat <- matrix(0, ncol = 20, nrow = ncol(Zsc))

for(day in 1:20){
  tmp.PSA.df <- PSA.df[c("NSFTV.ID", "Exp", paste0("X", day))]
  colnames(tmp.PSA.df)[3] <- "Y"
  
  ###############################################
  #Set up incidence matrices for each time point#
  ###############################################
  
  #Fixed effects X; n x 1
  X <- model.matrix(Y ~ 1, data = tmp.PSA.df); dim(X)
  #Random additive genetic Z; n x q
  Z <- model.matrix(Y ~ -1 + NSFTV.ID, data = tmp.PSA.df); dim(Z)
  #Random experiment Q; n x e
  Q <- model.matrix(Y ~ -1 + Exp, data = tmp.PSA.df); dim(Q)
  
  #########
  ###R ####
  #########
  #R = I * \sigma^2_e
  Rinv <- matrix(0, ncol = nrow(X), nrow = nrow(X))
  diag(Rinv) <- VarComps[3, day]
  Rinv <- ginv(Rinv)

  ############
  #Set up MME#
  ############
  lambda.g <- 1 / VarComps[2, day]
  lambda.q <- 1 / VarComps[1, day]
  
  C11 <- t(X) %*% Rinv %*% X; dim(C11) #1 x 1
  C12 <- t(X) %*% Rinv %*% Z; dim(C12) #1 x 357
  C13 <- t(X) %*% Rinv %*% Q; dim(C13) #1 x 3
  C21 <- t(Z) %*% Rinv %*% X; dim(C21) #357 x 1
  C22 <- (t(Z) %*% Rinv %*% Z) + (ginv(G) * lambda.g); dim(C22) #357 x 357
  C23 <- t(Z) %*% Rinv %*% Q; dim(C23) #357 x 3
  C31 <- t(Q) %*% Rinv %*% X; dim(C31) #3 x 1
  C32 <- t(Q) %*% Rinv %*% Z; dim(C32) #3 x 357

  Idnt <- matrix(0, nrow = 3, ncol = 3)
  diag(Idnt) <- 1
  C33 <- (t(Q) %*% Rinv %*% Q) + (Idnt * lambda.q); dim(C33) #3 x 3
  
  TopRow <- cbind(C11, C12, C13)
  MidRow <- cbind(C21, C22, C23)
  BotRow <- cbind(C31, C32, C33)

  #Left hand side of MME  
  LHS <- rbind(TopRow, MidRow, BotRow) #361 x 361
  InvLHS <- ginv(LHS)
  #Extract only the additive genetic parts
  C22 <- InvLHS[2:358, 2:358]
  
  
  #######################################
  #Variance of predicted breeding values#
  #######################################
  VarU.hat <- (G * VarComps[2, day]) - C22
  
  #########################
  #Variance of SNP effects#
  #########################
  VarBeta.hat <- diag(t(Zsc) %*% ginv(G) %*% VarU.hat %*% ginv(G) %*% Zsc)
  
  #summary(diag(VarBeta.hat))
  
  sd.SNPs <- sqrt(VarBeta.hat)
  
  SNPe_ad <- Eff.mat[,day] / sd.SNPs
  
  pvalue.mat[,day] <- 2*(1-pnorm(abs(SNPe_ad)))
  SD.mat[,day] <- sd.SNPs
  
  print(day)
}

write.table(pvalue.mat, "pval.mat.txt", sep = "\t", row.names = T, col.names = T)
write.table(SD.mat, "SDmat.mat.txt", sep = "\t", row.names = T, col.names = T)