---
title: "Predicting GEBVs with random regression models"
author: "Malachy Campbell"
date: "8/7/2018"
output:
  rmdformats::html_clean:
    fig_width: 6
    fig_height: 6
    highlight: kate
    thumbnails: true
    lightbox: true
    gallery: true
---

```{r setup, include=FALSE, echo = F}
knitr::opts_knit$set(root.dir = '~/Documents/Dropbox/Work/Manuscripts/2018_RandomRegressionGWAS/ThePlantGenome/Revision/New Analysis/')
```

# Background
The purpose of this script is to predict genomic breeding values (GEBVs) using a random regression model. The RR model is 
$$ PSA_{tjk} = \sum_{k=0}^{2}\phi(t)_{jk}\beta_k + \sum_{k=0}^{2}\phi(t)_{jk} u_{jk} + \sum_{k=0}^{1}\phi(t)_{jk} s_{jk} + e_{tjk}$$. Where $\beta$ is the fixed second-order Legendre polynomial to model the overall trend in the trait overtime, $u_{jk}$ and $s_{jk}$ are the $k^{th}$ random regression coefficients for additive genetic effect and random experiment of line $j$, and $e_{tjk}$ is the random residual. This model was seelcted through comparisons between several other RR models. The complete process is decribed in our previous paper in Plant Direct and on [bioRxiv](https://www.biorxiv.org/content/early/2018/07/12/319897).

Since asreml-R does not allow the use of Legendre polynomials all analysis were done with the standalone version of ASREML. Below the ".as" file is provided as well as the call from the commandline.

# Fitting the RR model.
Here is the .as file.
```{asreml .as file, execute = F, echo = T}
RR model selection
 NID !A
 Exp !A
 DOI 20 !I
 PSA !/100000
G2.grm
PSA.cleaned.mn.csv !SKIP 1 !MAXITER 1000 !EXTRA 100 !WORKSPACE 6144 !ASUV !SLOW 10 !DOPART 1

!PATH 1
!ASSIGN USe !< !INIT
1.37E-03 1.11E-03 1.87E-03 3.00E-03 4.81E-03 7.08E-03 1.04E-02 1.33E-02 1.78E-02 2.40E-02 3.52E-02 5.10E-02 7.23E-02 0.101765 0.1486 0.203647 0.308825 0.484353 0.743772 1.11587
!>
!ASSIGN USg !< !INIT
0.583091
0.456320 0.360961
0.117960 0.945203E-01 0.251287E-01
!>
!ASSIGN USp !< !INIT
0.603125E-01 
0.400317E-01 0.267477E-01
!>
PSA ~ leg(DOI,2) !r us(leg(DOI,1) $USp).Exp us(leg(DOI,2) $USg).grm(NID) !f mv
residual id(1071).idh(DOI $USe)
```

Here is how it was run.
```{bash run asreml, execute = F, echo = T}
cd RR_GP/

asreml RR.as
```

# Solving for GEBVs at each time point
GEBVs at each time point were be obtained following to [Mrode (2014)](https://books.google.com/books?hl=en&lr=&id=b6MKAwAAQBAJ&oi=fnd&pg=PR5&dq=mrode+linear+models&ots=8Q3ejuZBpU&sig=PVGj2Zi13XD9f_ATZt3Pt4X7r1U). For line $j$ at time $t$, the GEBVs can be obtained by $\text{gBLUP}_{jt} = \phi_t\hat{u}_j$; where $\phi_t$ is the row vector of the matrix of Legendre polynomials of order 2. 

The functions below were adapted from Mrode (2005) by [Gota Morota](http://morotalab.org/Mrode2005/rr/rr.html#section00020000000000000000)
```{r load functions for RR with leg polynomials, echo=T, eval=FALSE}
##Return coefficient matrix (lambda) of n-th order Legendre polynomials. Scaling method implemented by Gengler et. al. (1999) converts constant Legendre polynomial coefficients into 1
`legendre` <-
  function(n, gengler){
    
    if (nargs()==1){
      gengler <- TRUE	
    }
    
    if (gengler != TRUE & gengler != FALSE){
      gengler=TRUE	
    }
    
    N <- n+1
    L <- matrix(0,nrow=N, ncol=N)
    
    for(i in (1:N)){
      if(i==1){
        L[i,i] <- 1
      }
      else if(i==2){
        L[i,i] <- 1
      }
      else  {
        tmp <- L[i-1,]
        tmp2 <- as.numeric()
        tmp2 <- c(0,tmp[1:(N-1)])
        L[i,] <- (1/(i-2+1))*( (2*(i-2) + 1)*tmp2 -(i-2)*L[i-2,] )
      }
    }
    
    # Normalize
    for (j in (1:N)){	
      L[j,] <- (sqrt( (2*(j-1)+1)/2)  )*L[j,]
    }
    
    
    # Gengler (1999)
    if (gengler==TRUE){
      L <- sqrt(2)*L
    }
    
    return(L)
    
  }

##Given time points covariate and order of fit for Legendre polynomials, return matrix 'M' containing the polynomials of standardized time. 'M' is order t (number of time points) by k (order of Legendre polynomials)
stdtime <- function(t, n, tmax, tmin){
  if(missing(tmax)) {
    tmax <- t[which.max(t)]
  }
  if(missing(tmin)) {
    tmin <- t[which.min(t)]
  }
  
  N <- n+1
  M <- matrix(0, nrow=length(t), ncol=N)
  a <- -1 + 2*(t-tmin)/(tmax - tmin)
  M[,1] <- 1
  
  for (i in 2:N){
    M[,i] <- a^(i-1)
  }
  
  return(M)
}
```

Here, we'll take the sln files, extract the breeding values for the RR coefficients, and calcualte the gBLUPs at each time point.
```{r solve for BVs at each time point, echo = T, eval = F}
library(reshape2)

setwd("DataandCode/RR_GP/")

sln <- read.delim("RR.sln", sep="", header=T)
  
#gBLUPs for legendre polynomials
g.hat.y <- t(cbind(sln[grep("1.NSFTV", sln$Level) ,][,3],
                     sln[grep("2.NSFTV", sln$Level) ,][,3],
                     sln[grep("3.NSFTV", sln$Level) ,][,3]))
  
colnames(g.hat.y) <- sub("1.", "", sln[grep("1.NSFTV", sln$Level) ,][,2])
  
#Calculated gBLUPs at each time point
Phi <- stdtime(1:20, 2) %*% t(legendre(2, gengler = F))
ghat.t.y <- t(apply(g.hat.y, 2, function (x) Phi %*% x))
colnames(ghat.t.y) <- 1:20
gBLUP <- melt(ghat.t.y)
colnames(gBLUP) <- c("NSFTV.ID", "DayOfImaging", "gBLUP")

write.csv(gBLUP, "RR_GEBVs.csv", row.names = F)
```