--- title: "Predicting GEBVs with random regression models" author: "Malachy Campbell" date: "8/7/2018" output: rmdformats::html_clean: fig_width: 6 fig_height: 6 highlight: kate thumbnails: true lightbox: true gallery: true --- ```{r setup, include=FALSE, echo = F} knitr::opts_knit$set(root.dir = '~/Documents/Dropbox/Work/Manuscripts/2018_RandomRegressionGWAS/ThePlantGenome/Revision/New Analysis/') ``` # Background The purpose of this script is to predict genomic breeding values (GEBVs) using a random regression model. The RR model is $$ PSA_{tjk} = \sum_{k=0}^{2}\phi(t)_{jk}\beta_k + \sum_{k=0}^{2}\phi(t)_{jk} u_{jk} + \sum_{k=0}^{1}\phi(t)_{jk} s_{jk} + e_{tjk}$$. Where $\beta$ is the fixed second-order Legendre polynomial to model the overall trend in the trait overtime, $u_{jk}$ and $s_{jk}$ are the $k^{th}$ random regression coefficients for additive genetic effect and random experiment of line $j$, and $e_{tjk}$ is the random residual. This model was seelcted through comparisons between several other RR models. The complete process is decribed in our previous paper in Plant Direct and on [bioRxiv](https://www.biorxiv.org/content/early/2018/07/12/319897). Since asreml-R does not allow the use of Legendre polynomials all analysis were done with the standalone version of ASREML. Below the ".as" file is provided as well as the call from the commandline. # Fitting the RR model. Here is the .as file. ```{asreml .as file, execute = F, echo = T} RR model selection NID !A Exp !A DOI 20 !I PSA !/100000 G2.grm PSA.cleaned.mn.csv !SKIP 1 !MAXITER 1000 !EXTRA 100 !WORKSPACE 6144 !ASUV !SLOW 10 !DOPART 1 !PATH 1 !ASSIGN USe !< !INIT 1.37E-03 1.11E-03 1.87E-03 3.00E-03 4.81E-03 7.08E-03 1.04E-02 1.33E-02 1.78E-02 2.40E-02 3.52E-02 5.10E-02 7.23E-02 0.101765 0.1486 0.203647 0.308825 0.484353 0.743772 1.11587 !> !ASSIGN USg !< !INIT 0.583091 0.456320 0.360961 0.117960 0.945203E-01 0.251287E-01 !> !ASSIGN USp !< !INIT 0.603125E-01 0.400317E-01 0.267477E-01 !> PSA ~ leg(DOI,2) !r us(leg(DOI,1) $USp).Exp us(leg(DOI,2) $USg).grm(NID) !f mv residual id(1071).idh(DOI $USe) ``` Here is how it was run. ```{bash run asreml, execute = F, echo = T} cd RR_GP/ asreml RR.as ``` # Solving for GEBVs at each time point GEBVs at each time point were be obtained following to [Mrode (2014)](https://books.google.com/books?hl=en&lr=&id=b6MKAwAAQBAJ&oi=fnd&pg=PR5&dq=mrode+linear+models&ots=8Q3ejuZBpU&sig=PVGj2Zi13XD9f_ATZt3Pt4X7r1U). For line $j$ at time $t$, the GEBVs can be obtained by $\text{gBLUP}_{jt} = \phi_t\hat{u}_j$; where $\phi_t$ is the row vector of the matrix of Legendre polynomials of order 2. The functions below were adapted from Mrode (2005) by [Gota Morota](http://morotalab.org/Mrode2005/rr/rr.html#section00020000000000000000) ```{r load functions for RR with leg polynomials, echo=T, eval=FALSE} ##Return coefficient matrix (lambda) of n-th order Legendre polynomials. Scaling method implemented by Gengler et. al. (1999) converts constant Legendre polynomial coefficients into 1 `legendre` <- function(n, gengler){ if (nargs()==1){ gengler <- TRUE } if (gengler != TRUE & gengler != FALSE){ gengler=TRUE } N <- n+1 L <- matrix(0,nrow=N, ncol=N) for(i in (1:N)){ if(i==1){ L[i,i] <- 1 } else if(i==2){ L[i,i] <- 1 } else { tmp <- L[i-1,] tmp2 <- as.numeric() tmp2 <- c(0,tmp[1:(N-1)]) L[i,] <- (1/(i-2+1))*( (2*(i-2) + 1)*tmp2 -(i-2)*L[i-2,] ) } } # Normalize for (j in (1:N)){ L[j,] <- (sqrt( (2*(j-1)+1)/2) )*L[j,] } # Gengler (1999) if (gengler==TRUE){ L <- sqrt(2)*L } return(L) } ##Given time points covariate and order of fit for Legendre polynomials, return matrix 'M' containing the polynomials of standardized time. 'M' is order t (number of time points) by k (order of Legendre polynomials) stdtime <- function(t, n, tmax, tmin){ if(missing(tmax)) { tmax <- t[which.max(t)] } if(missing(tmin)) { tmin <- t[which.min(t)] } N <- n+1 M <- matrix(0, nrow=length(t), ncol=N) a <- -1 + 2*(t-tmin)/(tmax - tmin) M[,1] <- 1 for (i in 2:N){ M[,i] <- a^(i-1) } return(M) } ``` Here, we'll take the sln files, extract the breeding values for the RR coefficients, and calcualte the gBLUPs at each time point. ```{r solve for BVs at each time point, echo = T, eval = F} library(reshape2) setwd("DataandCode/RR_GP/") sln <- read.delim("RR.sln", sep="", header=T) #gBLUPs for legendre polynomials g.hat.y <- t(cbind(sln[grep("1.NSFTV", sln$Level) ,][,3], sln[grep("2.NSFTV", sln$Level) ,][,3], sln[grep("3.NSFTV", sln$Level) ,][,3])) colnames(g.hat.y) <- sub("1.", "", sln[grep("1.NSFTV", sln$Level) ,][,2]) #Calculated gBLUPs at each time point Phi <- stdtime(1:20, 2) %*% t(legendre(2, gengler = F)) ghat.t.y <- t(apply(g.hat.y, 2, function (x) Phi %*% x)) colnames(ghat.t.y) <- 1:20 gBLUP <- melt(ghat.t.y) colnames(gBLUP) <- c("NSFTV.ID", "DayOfImaging", "gBLUP") write.csv(gBLUP, "RR_GEBVs.csv", row.names = F) ```