 

The Plant Genome - Article

 

 


  Vol. 10 No. 2
    Open access
     
    Received: Nov 16, 2016
    Accepted: Feb 06, 2017
    Published: May 18, 2017


    * Corresponding author(s): mes12@cornell.edu

doi:10.3835/plantgenome2016.11.0111

Multitrait, Random Regression, or Simple Repeatability Model in High-Throughput Phenotyping Data Improve Genomic Prediction for Wheat Grain Yield

  Jin Sun (a),
  Jessica E. Rutkoski (a, b),
  Jesse A. Poland (c),
  José Crossa (b),
  Jean-Luc Jannink (a, d), and
  Mark E. Sorrells* (a)

  a Dep. of Plant Breeding and Genetics, Cornell Univ., Ithaca, NY 14853, USA
  b CIMMYT, Km. 45, Carretera México-Veracruz, El Batán, Texcoco CP 56237, Mexico
  c Dep. of Plant Pathology and Dep. of Agronomy, Kansas State Univ., Manhattan, KS 66506, USA
  d USDA–ARS R.W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA
Core Ideas:
  • HTP platforms used to measure secondary traits across time
  • Longitudinal data of secondary traits evaluated by SR, MT, and RR models, separately
  • BLUPs of secondary traits used in the multivariate pedigree and genomic prediction
  • Grain yield predictive ability was improved by 70%

Abstract

High-throughput phenotyping (HTP) platforms can be used to measure, across time, traits that are genetically correlated with wheat (Triticum aestivum L.) grain yield. Incorporating such secondary traits in multivariate pedigree and genomic prediction models would be desirable to improve indirect selection for grain yield. In this study, we evaluated three statistical models, simple repeatability (SR), multitrait (MT), and random regression (RR), for the longitudinal data of secondary traits and compared the impact of these models on predictive ability for grain yield. Grain yield and the secondary traits canopy temperature (CT) and normalized difference vegetation index (NDVI) were collected in five diverse environments for 557 wheat lines with available pedigree and genomic information. A two-stage analysis was applied for pedigree and genomic selection (GS). First, secondary traits were fitted separately by the SR, MT, or RR model within each environment. Then, best linear unbiased predictions (BLUPs) of secondary traits from these models were used in multivariate prediction models to compare predictive abilities for grain yield. On average, multivariate pedigree and genomic models that included secondary traits in both training and test populations improved predictive ability by 70%. Additionally, (i) predictive abilities varied only slightly among the MT, RR, and SR models in this data set, (ii) including BLUPs of secondary traits from the MT model gave the best results in severe drought, and (iii) the RR model was slightly better than the SR and MT models in the drought environment.


Abbreviations

    BLUP, best linear unbiased prediction; BV, bivariate prediction; CT, canopy temperature; GNDVI, green NDVI; GS, genomic selection; HTP, high-throughput phenotyping; ISE, indirect selection efficiency; MT, multitrait; MV1, first multivariate prediction; MV2, second multivariate prediction; NDVI, normalized difference vegetation index; RNDVI, red NDVI; RR, random regression; SNP, single-nucleotide polymorphism; SR, simple repeatability; UV, univariate prediction

With the advent of next-generation sequencing technology, GS has become feasible, making it possible to accelerate breeding cycles relative to traditional approaches. Genomic selection takes advantage of low-cost, genome-wide molecular markers to increase genetic gain for complex quantitative traits (Poland et al., 2012), and it has been applied in wheat (Rutkoski et al., 2011, 2012; Poland et al., 2012). At the same time, there is increasing focus on HTP platforms (Araus and Cairns, 2014), which can perform large-scale, high-density phenotyping across time with high accuracy and low labor intensity (Yang et al., 2014). If traits collected from HTP platforms are genetically correlated with the primary trait, they can be used as secondary traits to increase the rate of genetic gain for the primary trait in GS. In addition, because secondary traits from HTP platforms can be phenotyped before a primary trait such as grain yield, they could also be useful for predicting the primary trait at early growth stages.

Grain yield is a complex quantitative trait that is controlled by many genes and influenced by the environment (Narjesi et al., 2015). Traits such as CT and NDVI can be measured with HTP platforms continuously across plant growth stages. They are genetically correlated with grain yield and have been used to predict grain yield in previous studies (Labus et al., 2002; Mason and Singh, 2014; Quarmby et al., 1993; Rees et al., 1993). The negative correlation between CT and grain yield can be used as a selection criterion in wheat breeding (Rees et al., 1993; Mason and Singh, 2014), particularly under heat stress (Mason and Singh, 2014). Green NDVI (GNDVI) and red NDVI (RNDVI) are vegetation indices calculated as the normalized difference between near-infrared reflectance and green or red reflectance, respectively (Gitelson et al., 1996; Tucker, 1979), and wheat grain yield can be estimated well before harvest using NDVI, based on a linear relationship between yield and the index (Labus et al., 2002; Quarmby et al., 1993). Multivariate pedigree and genomic prediction models have been shown to improve prediction accuracy when secondary traits are strongly correlated with a primary trait (Pszczola et al., 2013; Guo et al., 2014). Rutkoski et al. (2016) found that including CT and NDVI as secondary traits for grain yield in a multivariate prediction model could improve prediction accuracy relative to a univariate model.
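Both indices share the normalized-difference form (NIR − VIS)/(NIR + VIS), where VIS is green reflectance for GNDVI and red reflectance for RNDVI. A minimal Python sketch, assuming per-plot mean reflectances have already been extracted from the imagery; the function name and reflectance values are made up for illustration:

```python
import numpy as np


def normalized_difference(nir, vis):
    """Return (NIR - VIS) / (NIR + VIS) element-wise."""
    nir = np.asarray(nir, dtype=float)
    vis = np.asarray(vis, dtype=float)
    return (nir - vis) / (nir + vis)


# Hypothetical per-plot reflectances (fractions of incident light).
nir = np.array([0.45, 0.50, 0.38])
red = np.array([0.05, 0.04, 0.08])
green = np.array([0.08, 0.07, 0.10])

rndvi = normalized_difference(nir, red)    # red NDVI (Tucker, 1979)
gndvi = normalized_difference(nir, green)  # green NDVI (Gitelson et al., 1996)
print(np.round(rndvi, 3), np.round(gndvi, 3))
```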

In GS, a two-stage analysis is simpler and more computationally efficient than a one-stage analysis. Therefore, before secondary traits are used in prediction models, they can first be fitted with models that are able to handle data from HTP platforms. Canopy temperature and NDVI from HTP platforms are measured at many time points throughout the growth cycle and can be treated as longitudinal data. Such data sets are common in animal breeding, for example, body weight recorded over the course of cattle growth. Models generally applied to longitudinal data in animal breeding include the SR (called the repeatability model in animal breeding), MT, and RR models. Rutkoski et al. (2016) used the SR model for secondary traits collected across time points, treating each time point within a growth stage (either vegetative or grain filling) as a repeated measurement of the same trait, and incorporated best linear unbiased estimates of secondary traits from the SR model into multivariate pedigree and genomic prediction models. However, the SR model assumes constant variances and correlations at and between measurement dates, which may not be realistic for longitudinal data, especially when time points span growth stages (Meyer and Hill, 1997). In the MT model, the secondary trait at each time point is treated as a separate response variable for each line. However, high correlations between consecutive measurements and heavy computational requirements can restrict the application of the MT model (Speidel, 2011). In animal breeding, the RR model has been applied to capture the continuous change of a trait over growth stages with fewer parameters than the MT model. Furthermore, unlike the SR model, the RR model does not require constant variances and correlations (Meyer, 2000). (Co)variances at and between time points can be modeled in the RR model using different functions, typically Legendre polynomials (Meyer, 2005). Nonetheless, high-degree Legendre polynomials have been observed to change rapidly at the extremes of the time range, causing poor estimates of genetic parameters and variance components (Boligon et al., 2012; Meyer, 2005; Misztal, 2006). Compared with Legendre polynomials, spline functions with fractional polynomials are less prone to these problems because the curve is formed by short segments joined smoothly, and each segment is determined only by its own coefficients of the spline function (Boligon et al., 2012; Meyer, 2005; Misztal, 2006).
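For reference, the covariables of a Legendre-polynomial RR model are usually normalized Legendre polynomials evaluated at dates rescaled to [−1, 1]. A short NumPy sketch, assuming the common normalization sqrt((2k + 1)/2); the dates are hypothetical days after sowing, not the study's flight dates. Note that the normalized value at the ends of the range, sqrt((2k + 1)/2) in magnitude, grows with the degree k.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_covariables(days, order):
    """Matrix Phi (n_dates x (order + 1)) of normalized Legendre polynomials."""
    days = np.asarray(days, dtype=float)
    # Rescale dates linearly to the interval [-1, 1].
    x = -1.0 + 2.0 * (days - days.min()) / (days.max() - days.min())
    phi = np.empty((x.size, order + 1))
    for k in range(order + 1):
        coef = np.zeros(k + 1)
        coef[k] = 1.0  # coefficient vector selecting P_k
        phi[:, k] = np.sqrt((2 * k + 1) / 2.0) * legendre.legval(x, coef)
    return phi


days = [60, 70, 80, 90, 100, 110]
phi = legendre_covariables(days, order=3)
print(np.round(phi, 3))
```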

The objectives of this study were to (i) evaluate statistical models that could efficiently handle the longitudinal data for secondary traits from HTP platforms by comparing SR, MT and RR models and (ii) test the impact of the proposed models for the secondary traits based on their heritabilities, genetic correlations with grain yield, and genomic predictive abilities for grain yield.


Materials and Methods

Phenotypic Data

Our study used the same data set as Rutkoski et al. (2016). We analyzed 630 inbred wheat breeding lines grown in the 2013–2014 season by the International Maize and Wheat Improvement Center (CIMMYT) in Obregon, Mexico. The lines were grouped into 21 trials, each comprising 28 lines and two checks in an α-lattice design with three replicates and six blocks. The trials were planted in five diverse environments: optimal, early heat, late heat, drought, and severe drought; the sowing and irrigation conditions in each environment are described in Rutkoski et al. (2016). High-throughput phenotyping data for the secondary traits (CT and NDVI) were collected over wheat growth stages with hyperspectral and thermal cameras mounted on an aircraft, and grain yield was measured for all lines. The number of measurement dates for each secondary trait varied by environment: eight, seven, three, five, and six time points were available for the optimal, early heat, late heat, drought, and severe drought environments, respectively, and every time point fell within either the vegetative or the grain-filling stage.

Genotyping

Genome-wide genotyping of the lines was based on genotyping-by-sequencing. After filtering, 12,083 single-nucleotide polymorphism (SNP) markers were available for 557 individuals. A marker was removed if >80% of individuals had missing data or >20% of individuals were heterozygous for that SNP, and lines with >80% missing markers were removed. Markers with minor allele frequency <0.01 were also removed, and missing data were imputed with the marker mean (Rutkoski et al., 2016).
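The filtering and imputation steps can be sketched as follows. This is an illustrative NumPy version, not the pipeline of Rutkoski et al. (2016): it assumes markers coded −1/0/1 with 0 as heterozygous and NaN as missing, and it applies the filters in the order marker missingness and heterozygosity, then line missingness, then minor allele frequency, which is an assumption because the order is not stated.

```python
import numpy as np


def filter_and_impute(geno, max_marker_missing=0.8, max_het=0.2,
                      max_line_missing=0.8, min_maf=0.01):
    """Filter a lines x markers matrix coded -1/0/1 (NaN = missing).

    Returns the filtered, mean-imputed matrix plus boolean masks of the
    retained lines and markers (relative to the input matrix).
    """
    geno = np.asarray(geno, dtype=float)
    missing = np.isnan(geno)

    # Marker filters: fraction missing and fraction heterozygous (coded 0).
    frac_missing = missing.mean(axis=0)
    n_obs = (~missing).sum(axis=0)
    frac_het = (geno == 0).sum(axis=0) / np.maximum(n_obs, 1)
    keep_m = (frac_missing <= max_marker_missing) & (frac_het <= max_het)

    # Line filter, computed on the markers retained so far.
    keep_l = np.isnan(geno[:, keep_m]).mean(axis=1) <= max_line_missing
    sub = geno[np.ix_(keep_l, keep_m)]

    # Minor allele frequency; dosage of the allele coded 1 is (x + 1) / 2.
    p = np.nanmean((sub + 1.0) / 2.0, axis=0)
    keep_maf = np.minimum(p, 1.0 - p) >= min_maf
    sub = sub[:, keep_maf]
    keep_m[np.flatnonzero(keep_m)[~keep_maf]] = False

    # Mean imputation, marker by marker.
    col_means = np.nanmean(sub, axis=0)
    sub = np.where(np.isnan(sub), col_means, sub)
    return sub, keep_l, keep_m


nan = np.nan
geno = np.array([
    [1, -1, 0, nan, 1],
    [1, -1, 1, nan, nan],
    [1, 1, 0, nan, -1],
    [1, 1, -1, nan, 1],
    [nan, nan, nan, nan, nan],
])
sub, keep_l, keep_m = filter_and_impute(geno)
print(keep_l, keep_m)
print(sub)
```

In this toy matrix the first marker is dropped as monomorphic, the third for heterozygosity, the fourth for missingness, and the last line for missingness.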

Genetic Value Prediction

A two-step GS approach was applied in this study. In the first step, BLUPs of the 630 lines were predicted separately for grain yield and for the secondary traits. The BLUPs of secondary traits were predicted from the first two replicates, and the BLUPs of grain yield from the third replicate. Using different replicates for secondary traits and grain yield ensures that the genomic predictive ability for grain yield gained from secondary traits is not inflated by shared environment-specific effects.

For grain yield, BLUPs of each genotype were predicted from a simple mixed model fitted to the third-replicate data within each environment. In matrix notation,

$$y = Xb + Zg + Wt + Qp + e \qquad [1]$$

where y is the vector of observations for grain yield; X, Z, W, and Q are incidence matrices corresponding to the fixed effect (b), the random genetic effect (g), the random environmental trial effect (t), and the random environmental block effect (p); and e is the vector of random residual errors. The variance and covariance structures are assumed to be

$$g \sim N(0, I\sigma^2_g),\quad t \sim N(0, I\sigma^2_t),\quad p \sim N(0, I\sigma^2_p),\quad e \sim N(0, I\sigma^2_e),$$

where $\sigma^2_g$ is the genetic variance; $\sigma^2_t$ and $\sigma^2_p$ are environmental variances; $\sigma^2_e$ is the residual variance; and I is the identity matrix.
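Once variance components are available (ASReml-R estimates them by REML), BLUPs follow from Henderson's mixed-model equations. A minimal sketch of that second step for a model of the Eq. [1] form, with a line factor and a block factor only (the trial term is omitted for brevity), an intercept as the only fixed effect, and made-up data and variance components; it is not the study's code:

```python
import numpy as np


def mme_blup(y, X, random_blocks, var_e):
    """Solve Henderson's mixed-model equations with known variance components.

    random_blocks is a list of (Z_k, var_k) pairs; each random effect is
    assumed i.i.d. with variance var_k, as for g, t, and p in Eq. [1].
    Returns the fixed-effect estimates and one BLUP vector per random block.
    """
    Z = np.hstack([Zk for Zk, _ in random_blocks])
    # Each block adds var_e / var_k to its diagonal (the shrinkage ratio).
    shrink = np.concatenate(
        [np.full(Zk.shape[1], var_e / vk) for Zk, vk in random_blocks]
    )
    lhs = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + np.diag(shrink)],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    n_fixed = X.shape[1]
    b_hat, u_hat = sol[:n_fixed], sol[n_fixed:]
    blups, start = [], 0
    for Zk, _ in random_blocks:
        blups.append(u_hat[start:start + Zk.shape[1]])
        start += Zk.shape[1]
    return b_hat, blups


# Made-up grain yields (t/ha) for three lines, each grown once in two blocks.
line = np.array([0, 0, 1, 1, 2, 2])
block = np.array([0, 1, 0, 1, 0, 1])
y = np.array([5.0, 5.4, 4.1, 4.5, 6.0, 6.2])

X = np.ones((len(y), 1))  # intercept as the only fixed effect
Z_g = np.eye(3)[line]     # line (genetic) incidence matrix
Z_p = np.eye(2)[block]    # block incidence matrix

b_hat, (g_hat, p_hat) = mme_blup(y, X, [(Z_g, 0.4), (Z_p, 0.05)], var_e=0.1)
print(b_hat, g_hat, p_hat)
```

In this balanced toy layout each line BLUP equals its mean deviation shrunk by 2/(2 + 0.1/0.4), which is what makes the result easy to check by hand.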

For the secondary traits CT and NDVI, BLUPs of each genotype were predicted from the first two replicates by fitting the SR, MT, or RR spline model separately. Because the available time points differed among secondary traits and environments, each model was fitted separately for each trait within each environment.

Simple Repeatability Model

In the SR model, data collected at each time point were treated as repeated records of the same trait for each line. The SR model was the same as Eq. [1] with a fixed effect for time point and an additional random effect for replicate. In matrix form,

$$y = Xb + Zg + Wt + Sr + Qp + e \qquad [2]$$

where y is the vector of observations for secondary traits; X, Z, W, S, and Q are incidence matrices corresponding to the fixed effect of time point (b), the random genetic effect (g), and the random environmental effects of trial (t), replicate (r), and block (p); and e is the vector of random residual errors. The variance and covariance structures are assumed to be

$$g \sim N(0, I\sigma^2_g),\quad t \sim N(0, I\sigma^2_t),\quad r \sim N(0, I\sigma^2_r),\quad p \sim N(0, I\sigma^2_p),\quad e \sim N(0, I\sigma^2_e),$$

where $\sigma^2_g$ is the genetic variance; $\sigma^2_t$, $\sigma^2_r$, and $\sigma^2_p$ are environmental variances; $\sigma^2_e$ is the residual variance; and I is the identity matrix.
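Under the SR model all records of a line share a single genetic effect, so the model implies the same variance at every date and the same correlation between every pair of dates. A minimal sketch of that implied covariance for one line, ignoring the trial, replicate, and block terms and using made-up variance components; the last line counts, for contrast, the genetic (co)variance parameters an unstructured MT model would need for the eight dates of the optimal environment:

```python
import numpy as np


def sr_covariance(n_times, var_g, var_e):
    """Covariance among n_times records of one line implied by an SR model.

    Every pair of dates shares the full genetic variance (a genetic
    correlation of 1 across dates), and residuals are independent.
    Design effects (trial, replicate, block) are ignored here.
    """
    return var_g * np.ones((n_times, n_times)) + var_e * np.eye(n_times)


def n_unstructured_params(n):
    """Free (co)variances in an n x n unstructured covariance matrix."""
    return n * (n + 1) // 2


v = sr_covariance(n_times=4, var_g=2.0, var_e=1.0)
sd = np.sqrt(np.diag(v))
corr = v / np.outer(sd, sd)
print(np.diag(v))   # [3. 3. 3. 3.]: same variance at every date
print(corr[0, 1:])  # same correlation (2/3) for every pair of dates
print(n_unstructured_params(8))  # 36 genetic (co)variances vs. 1 under SR
```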

Multitrait Model

For the MT model, the data collected at each time point were treated as a separate trait. The MT model in matrix form is

$$y_i = X_i b_i + Z_i g_i + W_i t_i + S_i r_i + Q_i p_i + e_i, \qquad i = 1, \ldots, n \qquad [3]$$

where y_i is the vector of observations for the secondary trait at the ith time point; X_i, Z_i, W_i, S_i, and Q_i are incidence matrices relating the observations in y_i to the fixed effects in b_i, the random genetic effects in g_i, and the random environmental effects of trial, replicate, and block in t_i, r_i, and p_i, respectively, at the ith time point; and n is the number of time points for each secondary trait in each environment. With the effects stacked over time points, for example g = (g_1′, …, g_n′)′, the variance components are defined as

$$\mathrm{Var}(g) = K_g \otimes I,\quad \mathrm{Var}(t) = K_t \otimes I,\quad \mathrm{Var}(r) = K_r \otimes I,\quad \mathrm{Var}(p) = K_p \otimes I,\quad \mathrm{Var}(e) = R \otimes I,$$

where K_g is the covariance matrix of random genetic effects; K_t, K_r, and K_p are covariance matrices of random environmental effects; R is the covariance matrix of random residual effects; I is an identity matrix whose order differs among random effects; and ⊗ denotes the Kronecker product. K_g and R were assumed to have unstructured variance–covariance structures, and K_t, K_r, and K_p diagonal structures. For each environment, the overall BLUP of each genotype from the MT model was obtained by averaging its BLUPs over time points.
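The Kronecker structure means that, with the genetic effects stacked date by date, every line has the same n × n genetic covariance K_g across dates and different lines are independent. A small sketch with a hypothetical K_g for three dates and two lines, including the final step of averaging each line's per-date BLUPs into one value:

```python
import numpy as np

# Hypothetical unstructured genetic covariance among three dates.
K_g = np.array([
    [1.0, 0.8, 0.5],
    [0.8, 1.2, 0.9],
    [0.5, 0.9, 1.5],
])
n_dates = K_g.shape[0]
n_lines = 2

# Var(g) with g stacked by date: element i * n_lines + j is line j at date i.
var_g = np.kron(K_g, np.eye(n_lines))

# Same line (0) at dates 0 and 2: covariance K_g[0, 2].
print(var_g[0 * n_lines + 0, 2 * n_lines + 0])  # 0.5
# Different lines (0 and 1): independent.
print(var_g[0 * n_lines + 0, 2 * n_lines + 1])  # 0.0

# Hypothetical per-date BLUPs in the same stacking order; the overall BLUP
# of each line is the mean over dates.
g_hat = np.array([0.30, -0.10, 0.45, -0.05, 0.60, 0.00])
overall = g_hat.reshape(n_dates, n_lines).mean(axis=0)
print(overall)
```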

Random Regression Model using Cubic Smoothing Spline

Let y_ijklm be the observation of a secondary trait for line j in trial k, replicate l, and block m at time point i. The following model, derived from White et al. (1999), is equivalent to an RR model (Meyer, 1998, 2005), with additional permanent random environmental effects included. It was fitted using a cubic smoothing spline in ASReml-R (Butler et al., 2009). A spline is a curve formed by continuously joining multiple polynomial segments; each joint is referred to as a knot (Meyer, 2005). A cubic smoothing spline consists of piecewise cubic functions, and a roughness penalty is imposed to balance smoothness against fidelity to the data (White et al., 1999). Green and Silverman (1994) give details about cubic smoothing splines. The model is

$$y_{ijklm} = b_0 + b_1 t_{ijklm} + b_{j0} + b_{j1} t_{ijklm} + b_{k0} + b_{k1} t_{ijklm} + b_{l0} + b_{l1} t_{ijklm} + b_{m0} + b_{m1} t_{ijklm} + \sum_{n=1}^{q} v_n z_n(t_{ijklm}) + \sum_{n=1}^{q} v_{jn} z_n(t_{ijklm}) + \sum_{n=1}^{q} v_{kn} z_n(t_{ijklm}) + \sum_{n=1}^{q} v_{ln} z_n(t_{ijklm}) + \sum_{n=1}^{q} v_{mn} z_n(t_{ijklm}) + \varepsilon_{ijklm} \qquad [4]$$

where t_ijklm is time point i for line j in trial k, replicate l, and block m, and b_0 (intercept) and b_1 (slope) are the fixed general linear regression on t_ijklm. The remaining terms in Eq. [4] are random: b_j0 (line) and b_j1 (line × linear) capture the deviation of line j from the general linear regression, and b_k0 (trial) and b_k1 (trial × linear), b_l0 (replicate) and b_l1 (replicate × linear), and b_m0 (block) and b_m1 (block × linear) do the same for trial k, replicate l, and block m. The term Σ v_n z_n(t_ijklm) is the mean spline deviation, where z_n(t_ijklm) are functions of time used as covariables (here, the cubic smoothing spline functions), v_n is the regression coefficient for each spline function z_n(t_ijklm), and q is the number of regression coefficients (the number of spline knots in this model) fitted for the genetic and environmental effects. The line × spline term Σ v_jn z_n(t_ijklm) and the analogous trial × spline, replicate × spline, and block × spline terms are deviations from the mean spline for the additive genetic effect of line j and for the permanent environmental effects of trial k, replicate l, and block m; ε_ijklm is the residual error term. In this model, the number of knots was the same as the number of time points for each secondary trait in each environment. Detailed information about this model can be found in White et al. (1999).
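The roughness penalty can be made concrete with the formulation in Green and Silverman (1994): with a knot at every date, the fitted values g minimize Σ(y − g)² + α g′Kg, where K = QR⁻¹Q′ is built from the spacing of the dates. The sketch below fits a single curve with a smoothing parameter α chosen by hand. It is not the ASReml-R implementation, in which the spline enters the mixed model and its smoothing is governed by a variance component estimated with the other effects (White et al., 1999); the dates and NDVI values are hypothetical.

```python
import numpy as np


def smoothing_spline_fit(t, y, alpha):
    """Fitted values of a cubic smoothing spline with a knot at every t.

    Builds the Green and Silverman (1994) matrices Q (n x (n-2)) and
    R ((n-2) x (n-2)), then solves (I + alpha * K) g = y with
    K = Q R^{-1} Q'. Requires at least three strictly increasing t values.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = t.size
    h = np.diff(t)  # spacing between consecutive knots
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for c in range(n - 2):
        j = c + 1  # interior knot represented by column c
        Q[j - 1, c] = 1.0 / h[j - 1]
        Q[j, c] = -1.0 / h[j - 1] - 1.0 / h[j]
        Q[j + 1, c] = 1.0 / h[j]
        R[c, c] = (h[j - 1] + h[j]) / 3.0
        if c + 1 < n - 2:
            R[c, c + 1] = R[c + 1, c] = h[j] / 6.0
    K = Q @ np.linalg.solve(R, Q.T)  # roughness penalty matrix
    return np.linalg.solve(np.eye(n) + alpha * K, y)


# Hypothetical NDVI readings for one line on six flight dates (days).
days = np.array([0.0, 10.0, 21.0, 30.0, 42.0, 55.0])
ndvi = np.array([0.42, 0.55, 0.71, 0.78, 0.69, 0.51])
smooth = smoothing_spline_fit(days, ndvi, alpha=50.0)
print(np.round(smooth, 3))
```

Because K has straight lines in its null space, a linear trend is left untouched by the penalty; only curvature is shrunk, and larger α pulls the fit toward the least-squares line.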

The matrix notation for the RR model is (Mrode, 2005)

$$y = Xb + Zg + Wt + Sr + Qp + e \qquad [5]$$

where y is the vector of observations for secondary traits, X is the incidence matrix corresponding to fixed effects and fixed regression coefficients, b is the vector of fixed effects and fixed regression coefficients, and e is the vector of residuals. The matrices Z, W, S, and Q are covariable matrices for the random genetic and random environmental trial, replicate, and block effects, and g, t, r, and p are the corresponding vectors of random regression coefficients for genetic and environmental effects. The variance components are assumed to be

$$\mathrm{Var}(g) = K_g \otimes I,\quad \mathrm{Var}(t) = K_t \otimes I,\quad \mathrm{Var}(r) = K_r \otimes I,\quad \mathrm{Var}(p) = K_p \otimes I,$$

where K_g, K_t, K_r, and K_p are covariance matrices (of order equal to the order of the spline polynomial fitted) between random regression coefficients for the random genetic and random environmental (trial, replicate, and block) effects, respectively; I is an identity matrix whose order corresponds to the genetic or environmental effect; and ⊗ denotes the Kronecker product.

The predict function in ASReml-R was used to calculate the BLUP of each line at each time point. As for the MT model, the overall BLUP of each line in each environment was the average of these BLUPs over time points.
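The genetic covariance among dates implied by an RR model is ΦK_gΦ′, where Φ holds the covariables evaluated at each date, and a line's per-date BLUPs are Φ times its predicted coefficients. A minimal sketch with a hypothetical intercept-plus-linear basis (the line and line × linear terms; spline terms would add further columns) and made-up coefficient estimates; in the study, per-date values came from the ASReml-R predict function rather than this hand calculation:

```python
import numpy as np

# Hypothetical covariables at four dates: intercept and linear time.
t = np.array([0.0, 1.0, 2.0, 3.0])
Phi = np.column_stack([np.ones_like(t), t])

# Hypothetical covariance matrix of a line's random regression coefficients.
K_g = np.array([[0.50, -0.05],
                [-0.05, 0.02]])

# Genetic covariance among dates implied by the RR model; unlike the SR
# model, the variance is allowed to change from date to date.
G_time = Phi @ K_g @ Phi.T
print(np.round(np.diag(G_time), 3))

# Hypothetical predicted coefficients for two lines (rows), their per-date
# BLUPs, and the average over dates used as the overall BLUP.
coef_hat = np.array([[0.40, 0.05],
                     [-0.20, -0.02]])
per_date = coef_hat @ Phi.T  # lines x dates
overall = per_date.mean(axis=1)
print(overall)
```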

Heritability, Correlation, and Indirect Selection Efficiency

Variance components for narrow-sense heritability of each secondary trait and grain yield in each environment were estimated using the model

$$y = Xb + Zg + e \qquad [6]$$

where y is the vector of genotype BLUPs from the SR, MT, or RR spline models for secondary traits, or of genotype BLUPs from the mixed model in Eq. [1] for grain yield; X and Z are incidence matrices corresponding to the fixed effect (b) and random genetic effect (g); and e is the vector of random residual errors. The variance and covariance structures are assumed to be $g \sim N(0, G\sigma^2_a)$, where G is the genomic relationship matrix, or $g \sim N(0, A\sigma^2_a)$, where A is the pedigree relationship matrix, with $\sigma^2_a$ the additive genetic variance; and $e \sim N(0, I\sigma^2_e)$, where $\sigma^2_e$ is the residual variance and I is the identity matrix. Narrow-sense heritability was calculated as

$$h^2 = \frac{\sigma^2_a}{\sigma^2_a + \sigma^2_e}.$$

Variance and covariance components for correlations were estimated in each environment using the multivariate model (Eq. [5]) in Rutkoski et al. (2016). The response variables (y) were the genotype BLUPs from the SR, MT, or RR models for secondary traits and the genotype BLUPs from the mixed model (Eq. [1]) for grain yield. Genetic correlations between secondary traits and grain yield were calculated as

$$r_{g(ST,GRYLD)} = \frac{\mathrm{cov}_g(ST,GRYLD)}{\sqrt{\mathrm{var}_g(ST)\,\mathrm{var}_g(GRYLD)}}$$

where r_g(ST,GRYLD) is the genetic correlation between a secondary trait (either CT or NDVI) and grain yield, var_g(ST) and var_g(GRYLD) are the genetic variances of the secondary trait and grain yield, respectively, and cov_g(ST,GRYLD) is the genetic covariance between the secondary trait and grain yield. Phenotypic correlations between secondary traits and grain yield were also calculated as Pearson correlations.
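Applied to a 2 × 2 genetic (co)variance matrix, the genetic correlation formula is a one-liner. A sketch with made-up estimates (the function name is illustrative):

```python
import numpy as np


def genetic_correlation(cov_g):
    """Genetic correlation from a 2 x 2 genetic (co)variance matrix.

    cov_g[0, 0] = var_g(ST), cov_g[1, 1] = var_g(GRYLD),
    cov_g[0, 1] = cov_g(ST, GRYLD).
    """
    cov_g = np.asarray(cov_g, dtype=float)
    return cov_g[0, 1] / np.sqrt(cov_g[0, 0] * cov_g[1, 1])


# Hypothetical estimates for CT (row/column 0) and grain yield (1).
cov_g = np.array([[0.36, -0.108],
                  [-0.108, 0.04]])
print(round(genetic_correlation(cov_g), 3))  # -0.9
```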

Indirect selection efficiency (ISE) was estimated following Falconer and Mackay (1996), assuming the same selection intensity for direct and indirect selection:

$$\mathrm{ISE} = r_{g(ST,GRYLD)}\sqrt{\frac{h^2_{ST}}{h^2_{GRYLD}}}$$

where $h^2_{ST}$ is the narrow-sense heritability of a secondary trait (CT or NDVI), $h^2_{GRYLD}$ is the narrow-sense heritability of grain yield, and r_g(ST,GRYLD) is the genetic correlation between the secondary trait and grain yield, estimated using either genomic or pedigree relationships.
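Plugging reported estimates into the formula shows the scale of ISE. As an illustration, the genomic estimates for CT from the MT model in the late heat environment are h² = 0.42 for CT (Table 2), h² = 0.46 for grain yield (Table 1), and r_g = −0.90 (Table 3). The short sketch below keeps the sign of r_g and reports the magnitude as the efficiency, on the assumption that selection on CT would favor lower temperatures; that sign convention is an assumption, not something stated in the text.

```python
import math


def indirect_selection_efficiency(h2_st, h2_gryld, r_g):
    """Correlated response in grain yield from selecting on a secondary trait,
    relative to direct selection on grain yield at equal intensity:
    r_g * sqrt(h2_st / h2_gryld). The sign follows r_g.
    """
    return r_g * math.sqrt(h2_st / h2_gryld)


ise = indirect_selection_efficiency(0.42, 0.46, -0.90)
print(round(abs(ise), 2))  # 0.86
```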

Genomic Prediction and Cross Validation

In the second step of GS, the BLUPs of 557 individuals from the SR, MT, or RR model for secondary traits or from the mixed model (Eq. [1]) for grain yield were used as dependent variables in the genomic prediction modeling. As the lines in this data set are replicated the same number of times in each environment, differential shrinkage of the BLUPs used as the dependent variable for genomic prediction is not an issue in this case. Four different prediction models (Supplemental Fig. S1) were used to compare SR, MT, and RR models and to check the efficiency of genomic and pedigree prediction with secondary traits.

The univariate prediction (UV) model with grain yield only is the same as the model expressed in Eq. [6], where y is the BLUPs of genotypes from the mixed model (Eq. [1]) for grain yield only.

Two multivariate genomic prediction models were applied to quantify the improvement in genomic prediction of grain yield from adding secondary traits to the model. In the first multivariate prediction (MV1) model, three secondary traits were included in both the training and testing populations; in the second multivariate prediction (MV2) model, the three secondary traits were included only in the training population (Supplemental Fig. S1). The model was

$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} X_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & X_n \end{bmatrix}\begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} + \begin{bmatrix} Z_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & Z_n \end{bmatrix}\begin{bmatrix} g_1 \\ \vdots \\ g_n \end{bmatrix} + \begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix} \qquad [7]$$

where n is the number of traits (here n = 4: three secondary traits and grain yield); y_i are the genotype BLUPs from the SR, MT, or RR spline models for secondary traits and from the mixed model in Eq. [1] for grain yield; X_i and Z_i are the fixed- and random-effect design matrices, respectively; and b_i, g_i, and e_i are vectors of fixed effects, random genetic effects, and residual effects for each trait, respectively. Variance components were estimated assuming $\mathrm{Var}(g) = H \otimes G$ (or $H \otimes A$), where G and A are the genomic and pedigree relationship matrices and H is the genetic variance–covariance matrix among traits, and $\mathrm{Var}(e) = R \otimes I$, where I is an identity matrix and R is the residual variance–covariance matrix among traits. Both H and R were assumed to be unstructured.
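MV1 and MV2 differ only in whether the test lines' secondary-trait records enter the prediction. A minimal sketch of that difference, assuming H, R, and the relationship matrix are known (all values below are made up; in the study they were estimated in ASReml-R), data already centered, and fixed effects treated as known, so the prediction of grain yield genetic values reduces to Cov(g, y_obs) Var(y_obs)⁻¹ y_obs with Var(g) = H ⊗ G and Var(e) = R ⊗ I. The function name is hypothetical.

```python
import numpy as np


def predict_target(y, observed, H, R, G, target=0):
    """Predicted genetic values of trait `target` for every line.

    y:        traits x lines array of centered values (entries outside
              `observed` are ignored)
    observed: boolean array, same shape as y, marking records used
    H, R:     traits x traits genetic and residual covariance matrices
    G:        lines x lines relationship matrix
    """
    n_lines = y.shape[1]
    var_g = np.kron(H, G)  # trait-major stacking: index = trait * n_lines + line
    var_y = var_g + np.kron(R, np.eye(n_lines))
    obs_idx = np.flatnonzero(observed.ravel())  # same trait-major order
    rows = np.arange(target * n_lines, (target + 1) * n_lines)
    cov_g_obs = var_g[np.ix_(rows, obs_idx)]
    v_obs = var_y[np.ix_(obs_idx, obs_idx)]
    return cov_g_obs @ np.linalg.solve(v_obs, y.ravel()[obs_idx])


rng = np.random.default_rng(1)
n_lines = 20
markers = rng.choice([-1.0, 1.0], size=(n_lines, 60))
W = markers - markers.mean(axis=0)
G = W @ W.T / W.shape[1] + 0.01 * np.eye(n_lines)  # toy relationship matrix

# Made-up parameters; trait order: grain yield, CT, GNDVI, RNDVI.
C = np.array([[0.5, 0.0, 0.0, 0.0],
              [-0.4, 0.5, 0.0, 0.0],
              [0.3, -0.2, 0.6, 0.0],
              [0.3, -0.1, 0.4, 0.4]])
H = C @ C.T  # positive definite by construction
R = np.diag([0.6, 0.3, 0.2, 0.2])
y = rng.normal(size=(4, n_lines))  # stand-in for centered line BLUPs

test = np.arange(n_lines) < 4  # first four lines form the test set
mv1 = np.ones(y.shape, dtype=bool)
mv1[0, test] = False           # MV1: only test-line grain yield unknown
mv2 = mv1.copy()
mv2[:, test] = False           # MV2: all test-line records unknown

pred_mv1 = predict_target(y, mv1, H, R, G)[test]
pred_mv2 = predict_target(y, mv2, H, R, G)[test]
pred_uv = predict_target(y[:1], mv1[:1], H[:1, :1], R[:1, :1], G)[test]
print(pred_mv1, pred_mv2, pred_uv)
```

With diagonal H and R the secondary traits carry no information about grain yield and MV1 collapses to the univariate prediction, which is a useful sanity check for an implementation like this.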

In addition to these prediction models, the improvement in predictive ability from a single secondary trait was also tested. The bivariate prediction (BV) model, which included one additional secondary trait in both the training and testing populations (Supplemental Fig. S1), was the same as the MV model (Eq. [7]), with the number of traits n = 2: grain yield and one of the secondary traits.

Five-fold cross-validation was applied to all predictions and was conducted within each environment. For each fold, predictive ability was calculated as the Pearson correlation, in the testing population, between the BLUPs of grain yield from the mixed model (Eq. [1]) and the estimated breeding values of grain yield from the above prediction models, based on either the genomic or the pedigree relationship matrix.
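The cross-validation loop can be sketched as below for the univariate case. Assumptions, none taken from the study: a known variance ratio σ²_e/σ²_a, the training mean as the only fixed effect, random assignment of lines to folds, and simulated data. The predictor is a simple GBLUP and could be replaced by any of the multivariate models; the function names are hypothetical.

```python
import numpy as np


def gblup_predict(y_train, G_train, G_test_train, ratio):
    """GBLUP predictions for test lines; ratio = var_e / var_a (assumed known)."""
    mu = y_train.mean()  # training mean as the fixed effect (simplification)
    alpha = np.linalg.solve(G_train + ratio * np.eye(len(y_train)), y_train - mu)
    return G_test_train @ alpha


def cross_validate(y, G, ratio, n_folds=5, seed=0):
    """Predictive ability per fold: Pearson r of observed vs. predicted."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    abilities = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred = gblup_predict(y[train_idx],
                             G[np.ix_(train_idx, train_idx)],
                             G[np.ix_(test_idx, train_idx)],
                             ratio)
        abilities.append(np.corrcoef(y[test_idx], pred)[0, 1])
    return np.array(abilities)


# Simulated example: 100 lines, 200 biallelic markers.
rng = np.random.default_rng(42)
M = rng.choice([-1.0, 1.0], size=(100, 200))
W = M - M.mean(axis=0)
G = W @ W.T / W.shape[1]
g = W @ rng.normal(scale=0.1, size=200)
y = g + rng.normal(scale=1.0, size=100)
pa = cross_validate(y, G, ratio=1.0)
print(pa.round(2), pa.mean().round(2))
```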

Software Package

Data analysis was implemented in the R environment (R Development Core Team, 2010), and all models were fitted in ASReml-R (Butler et al., 2009). The genomic relationship matrix was calculated according to Eq. [15] in Endelman and Jannink (2012) using the R package rrBLUP (Endelman, 2011). The pedigree relationship matrix was estimated as twice the coefficient of parentage using the International Crop Information System.
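As we understand the rrBLUP documentation, A.mat without shrinkage and with complete data computes A = WW′/c, where W_ik = X_ik + 1 − 2p_k for markers coded −1/0/1, p_k is the frequency of the allele coded 1, and c = 2Σp_k(1 − p_k). A minimal NumPy re-implementation of that formula follows; the function name is hypothetical, A.mat's imputation and shrinkage options are not reproduced, and we have not checked this line by line against Eq. [15] of Endelman and Jannink (2012). For fully homozygous lines the mean diagonal equals 2, the same scale as a pedigree matrix computed as twice the coefficient of parentage of inbred lines.

```python
import numpy as np


def additive_relationship(M):
    """Relationship matrix from a lines x markers matrix coded -1/0/1.

    W = M + 1 - 2p, with p the frequency of the allele coded 1, and
    A = W W' / (2 * sum(p * (1 - p))). Assumes no missing data.
    """
    M = np.asarray(M, dtype=float)
    p = (M + 1.0).mean(axis=0) / 2.0
    W = M + 1.0 - 2.0 * p
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))


rng = np.random.default_rng(7)
M = rng.choice([-1.0, 1.0], size=(10, 500))  # fully homozygous toy lines
A = additive_relationship(M)
print(np.round(np.mean(np.diag(A)), 3))  # 2.0
```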


Results

Heritability

The heritability estimates for grain yield (Table 1) were lower than those for NDVI (Table 2) in all environments, whereas they were higher than those for CT (Table 2) in the late heat and severe drought environments. Among the five environments, relatively high heritabilities for secondary traits and relatively low heritability for grain yield were both observed in the optimal environment. Among the SR, MT, and RR models, heritability estimates for secondary traits were higher with BLUPs from the MT model than with BLUPs from the SR and RR models in most environments, the exception being severe drought. In addition, heritability estimates using the genomic relationship matrix were higher than those using the pedigree relationship matrix for both grain yield and secondary traits in all five environments.


Table 1.

Heritability estimates and corresponding standard errors (in parentheses) for grain yield in five environments.

 
Environment       Genomic (G)    Pedigree (A)
Optimal           0.25 (0.09)    0.09 (0.04)
Early heat        0.37 (0.09)    0.32 (0.06)
Late heat         0.46 (0.10)    0.22 (0.06)
Drought           0.47 (0.10)    0.24 (0.06)
Severe drought    0.45 (0.09)    0.41 (0.06)

Table 2.

Heritability estimates and corresponding standard errors (in parentheses) for secondary traits in five environments using best linear unbiased predictions of secondary traits from multitrait (MT), random regression (RR), and simple repeatability (SR) models.

 
Secondary trait†  Environment     MT, G‡       MT, A§       RR, G        RR, A        SR, G        SR, A
CT                Optimal         0.65 (0.09)  0.51 (0.06)  0.63 (0.09)  0.38 (0.06)  0.56 (0.09)  0.37 (0.06)
CT                Early heat      0.55 (0.10)  0.45 (0.06)  0.47 (0.10)  0.39 (0.06)  0.45 (0.10)  0.38 (0.06)
CT                Late heat       0.42 (0.10)  0.23 (0.06)  0.40 (0.10)  0.17 (0.06)  0.42 (0.09)  0.19 (0.06)
CT                Drought         0.56 (0.09)  0.39 (0.06)  0.55 (0.10)  0.32 (0.06)  0.55 (0.09)  0.31 (0.06)
CT                Severe drought  0.32 (0.09)  0.13 (0.05)  0.43 (0.10)  0.25 (0.06)  0.25 (0.09)  0.14 (0.06)
GNDVI             Optimal         0.76 (0.08)  0.53 (0.05)  0.71 (0.09)  0.49 (0.06)  0.70 (0.09)  0.48 (0.06)
GNDVI             Early heat      0.63 (0.09)  0.56 (0.05)  0.59 (0.09)  0.50 (0.06)  0.57 (0.09)  0.48 (0.06)
GNDVI             Late heat       0.58 (0.09)  0.40 (0.06)  0.56 (0.10)  0.38 (0.06)  0.53 (0.09)  0.36 (0.06)
GNDVI             Drought         0.73 (0.08)  0.54 (0.05)  0.72 (0.08)  0.54 (0.05)  0.70 (0.08)  0.51 (0.06)
GNDVI             Severe drought  0.68 (0.08)  0.42 (0.05)  0.69 (0.09)  0.43 (0.05)  0.64 (0.09)  0.39 (0.06)
RNDVI             Optimal         0.70 (0.09)  0.51 (0.06)  0.67 (0.09)  0.48 (0.06)  0.65 (0.09)  0.45 (0.06)
RNDVI             Early heat      0.59 (0.09)  0.51 (0.06)  0.58 (0.09)  0.47 (0.06)  0.55 (0.09)  0.45 (0.06)
RNDVI             Late heat       0.58 (0.09)  0.43 (0.06)  0.56 (0.09)  0.42 (0.06)  0.57 (0.09)  0.40 (0.06)
RNDVI             Drought         0.63 (0.09)  0.50 (0.06)  0.63 (0.09)  0.54 (0.06)  0.62 (0.09)  0.47 (0.06)
RNDVI             Severe drought  0.54 (0.09)  0.45 (0.05)  0.53 (0.09)  0.43 (0.06)  0.50 (0.09)  0.38 (0.06)
† CT, canopy temperature; GNDVI, green normalized difference vegetation index; RNDVI, red normalized difference vegetation index.
‡ G, genomic relationship matrix.
§ A, pedigree relationship matrix.

Correlations Between Secondary Traits and Grain Yield

Correlations between secondary traits and grain yield (Table 3) varied across environments. Canopy temperature was more strongly correlated with grain yield than NDVI in most environments, except severe drought, where NDVI had the stronger correlation with grain yield. Among the five environments, the strongest correlations with grain yield were observed in late heat for CT and in severe drought for NDVI. Among the SR, MT, and RR models, correlations with grain yield were stronger using BLUPs from the SR and RR models than from the MT model in most environments, except severe drought, where the MT model was better than the other two. For CT, correlations with grain yield were stronger using the genomic rather than the pedigree relationship matrix under late heat and drought; for NDVI, this was observed in the late heat and severe drought environments.


Table 3.

Correlations between secondary traits and grain yield and corresponding standard errors (in parentheses) for genetic correlations.

 
Secondary trait†  Environment     MT, G‡        MT, A§        MT, P¶  RR, G         RR, A         RR, P   SR, G         SR, A         SR, P
CT                Optimal         −0.69 (0.13)  −0.74 (0.11)  −0.34   −0.77 (0.11)  −0.78 (0.11)  −0.35   −0.78 (0.12)  −0.84 (0.10)  −0.35
CT                Early heat      −0.74 (0.10)  −0.80 (0.07)  −0.52   −0.76 (0.10)  −0.77 (0.08)  −0.50   −0.76 (0.10)  −0.81 (0.07)  −0.53
CT                Late heat       −0.90 (0.07)  −0.88 (0.08)  −0.40   −0.90 (0.07)  −0.89 (0.08)  −0.39   −0.89 (0.07)  −0.88 (0.09)  −0.40
CT                Drought         −0.69 (0.11)  −0.61 (0.12)  −0.38   −0.69 (0.11)  −0.61 (0.12)  −0.39   −0.69 (0.11)  −0.56 (0.13)  −0.36
CT                Severe drought  −0.58 (0.15)  −0.65 (0.14)  −0.34   −0.48 (0.15)  −0.45 (0.14)  −0.33   −0.50 (0.18)  −0.36 (0.17)  −0.28
GNDVI             Optimal         0.36 (0.16)   0.49 (0.15)   0.20    0.39 (0.16)   0.49 (0.15)   0.20    0.41 (0.16)   0.53 (0.15)   0.20
GNDVI             Early heat      0.45 (0.14)   0.64 (0.08)   0.43    0.46 (0.14)   0.62 (0.09)   0.40    0.48 (0.14)   0.66 (0.09)   0.42
GNDVI             Late heat       0.52 (0.12)   0.47 (0.13)   0.22    0.49 (0.13)   0.43 (0.14)   0.21    0.48 (0.13)   0.42 (0.14)   0.20
GNDVI             Drought         −0.43 (0.12)  −0.47 (0.11)  −0.39   −0.44 (0.12)  −0.48 (0.11)  −0.39   −0.42 (0.13)  −0.46 (0.11)  −0.39
GNDVI             Severe drought  −0.77 (0.07)  −0.62 (0.08)  −0.59   −0.72 (0.08)  −0.56 (0.09)  −0.53   −0.70 (0.09)  −0.56 (0.09)  −0.53
RNDVI             Optimal         0.30 (0.17)   0.42 (0.16)   0.19    0.32 (0.17)   0.36 (0.16)   0.17    0.36 (0.17)   0.45 (0.16)   0.18
RNDVI             Early heat      0.49 (0.13)   0.67 (0.08)   0.45    0.55 (0.13)   0.64 (0.09)   0.42    0.59 (0.13)   0.71 (0.08)   0.45
RNDVI             Late heat       0.64 (0.11)   0.56 (0.12)   0.27    0.61 (0.12)   0.52 (0.12)   0.26    0.61 (0.12)   0.53 (0.13)   0.26
RNDVI             Drought         −0.37 (0.14)  −0.45 (0.11)  −0.40   −0.38 (0.14)  −0.51 (0.11)  −0.39   −0.35 (0.14)  −0.47 (0.12)  −0.39
RNDVI             Severe drought  −0.70 (0.09)  −0.59 (0.08)  −0.54   −0.61 (0.11)  −0.44 (0.10)  −0.44   −0.57 (0.12)  −0.46 (0.10)  −0.44
† CT, canopy temperature; GNDVI, green normalized difference vegetation index; RNDVI, red normalized difference vegetation index.
‡ G, genomic relationship matrix.
§ A, pedigree relationship matrix.
¶ P, phenotypic correlation.

Predictive Abilities

Four different prediction models (Supplemental Fig. S1) were tested on this data set. The MV1 model outperformed both the MV2 and UV models; however, the MV2 model had predictive ability similar to that of UV even though secondary traits were included in its training population. Comparing UV and MV1 (Fig. 1; above MV1), the average improvement in predictive ability for grain yield over UV was 70% with either the genomic or the pedigree relationship matrix. In addition, the BV model with a single secondary trait (Fig. 2) was also superior to the UV model, and including CT in BV gave better predictive ability for grain yield than including NDVI in all environments except the drought-stressed ones. The gain in predictive ability from a single secondary trait was positively related to the ISE of that trait from the SR, MT, or RR models (Fig. 3).

Fig. 1.

Predictive ability comparison for grain yield between prediction models with secondary traits (MV1 and MV2) and without secondary trait (UV). MV1, multivariate prediction model with secondary traits in both training and testing populations; MV2, multivariate prediction model with secondary traits in training population only; UV, univariate prediction model with grain yield only; MT/RR/SR, multivariate prediction model MV1 or MV2 using best linear unbiased predictions (BLUPs) of secondary traits from multitrait (MT), random regression (RR), or simple repeatability (SR) model; G/A, genomic/pedigree relationship matrix.

 
Fig. 2.

Predictive ability comparison for grain yield between prediction models with a single secondary trait (BV) and without secondary trait (UV). BV, bivariate prediction model with a single secondary trait in both training and testing populations; UV, univariate prediction model with grain yield only; MT/RR/SR, bivariate prediction model BV using best linear unbiased predictions (BLUPs) of secondary traits from multitrait (MT), random regression (RR), or simple repeatability (SR) model; CT/GNDVI/RNDVI, bivariate prediction model BV including a single secondary trait CT, GNDVI, or RNDVI; G/A, genomic/pedigree relationship matrix.

 
Fig. 3.

Relationship between improved predictive ability and indirect selection efficiency (ISE) for single secondary trait. Improved predictive ability, predictive ability for grain yield from the bivariate prediction model (BV) minus predictive ability for grain yield from the univariate prediction model (UV); Model MTA/MTG, multitrait (MT) model based on pedigree (A) or genomic (G) relationship matrix; RRA/RRG, random regression (RR) model based on A or G relationship matrix; SRA/SRG, simple repeatability (SR) model based on A or G relationship matrix.

 

For secondary traits included in either the multivariate or BV models, we used BLUPs of secondary traits from the SR, MT, or RR models separately. Based on the predictive ability of the MV1 model with all three secondary traits, the MT and SR models performed slightly better than the RR model in the heat-stressed environments (Fig. 1; above MV1). However, predictive abilities based on BLUPs from the MT and RR models were superior to those from the SR model in the drought-stressed environments, where the MT model was best in the severe drought environment and the RR model was slightly better than the others in the drought environment (Fig. 1; above MV1).


Discussion

Secondary Traits

Heritability and Correlation

A previous simulation study reported that exploiting correlated, highly heritable traits improved the prediction accuracy of a low-heritability trait in a multivariate prediction model (Jia and Jannink, 2012). In our data set, the heritability estimates for grain yield were based on only one replicate, which resulted in a low heritability estimate for grain yield in the optimal environment with the pedigree relationship matrix (0.09). The remaining heritability estimates for grain yield were moderate, ranging from 0.22 to 0.47. In contrast, the heritabilities of the secondary traits were relatively high, which should benefit predictive ability for grain yield. In this study, the relatively high heritabilities of the secondary traits contributed to a substantial improvement in predictive ability for grain yield in the multivariate prediction model. For example, although the secondary traits were only moderately correlated with grain yield in the optimal environment, this environment showed the greatest improvement in predictive ability for grain yield among the five environments (84% with the genomic relationship matrix, 101% with the pedigree relationship matrix), which is likely a result of the higher heritabilities of the secondary traits and the lower heritability of grain yield. In addition, even when the heritability estimate of a secondary trait was lower than that of the primary trait, predictive ability still improved because of the high correlation between the two traits, for example, CT in the late heat environment. However, the amount of improvement was affected by heritability. Accordingly, both the heritabilities of the secondary traits and their correlations with grain yield played important roles in improving predictive ability for the primary trait in the multivariate prediction model. This was reflected in Fig. 3, where ISE, calculated from the correlation between a secondary trait and grain yield and the square root of the ratio of their heritabilities, was associated with the improvement in predictive ability.
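For reference, ISE is commonly written as the ratio of the correlated response in grain yield (Y) from selecting on a secondary trait (S) to the direct response from selecting on Y itself. The expression below assumes equal selection intensities (i) for both traits and takes r_g as the genetic correlation, with h the square root of heritability and σ_{A_Y} the additive genetic standard deviation of yield. The worked example uses the pedigree-based grain yield heritability for the optimal environment reported above (0.09); the secondary-trait heritability (0.60) and correlation (0.50) are hypothetical values chosen only for illustration.

```latex
\mathrm{ISE} = \frac{CR_Y}{R_Y}
  = \frac{i_S\, h_S\, r_g\, \sigma_{A_Y}}{i_Y\, h_Y\, \sigma_{A_Y}}
  = r_g \sqrt{\frac{h^2_S}{h^2_Y}} \qquad (i_S = i_Y)

\text{Example: } r_g = 0.50,\; h^2_S = 0.60,\; h^2_Y = 0.09:\qquad
\mathrm{ISE} = 0.50 \times \sqrt{0.60 / 0.09} \approx 0.50 \times 2.582 \approx 1.29
```

An ISE above 1 means that, under these assumptions, indirect selection on the secondary trait is expected to be more efficient than direct selection on grain yield. A low yield heritability inflates the ratio, which is consistent with the large gains observed in the optimal environment.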

Importance of Secondary Traits in the Prediction Model

Multivariate prediction models take advantage of genetically correlated secondary traits (Guo et al., 2014). Accordingly, including secondary traits in both training and test populations improved prediction accuracy for grain yield by exploiting the strong genetic correlations between the secondary traits and grain yield. However, predictive abilities did not increase when the secondary traits were available only in the training population (Fig. 1; below MV2), as also reported by Pszczola et al. (2013) and Rutkoski et al. (2016). Previous studies indicate that a multivariate prediction model is superior to a UV model when the trait of interest has low heritability or missing data, or when there is a large difference between its genetic and residual correlations with the secondary traits (Guo et al., 2014; Pszczola et al., 2013). One possible reason, suggested by Rutkoski et al. (2016), is that the moderate heritability of grain yield in this data set made the multivariate prediction model ineffective when secondary traits were available only in the training population. Nevertheless, using secondary traits for selection could be favorable in breeding when the primary trait is difficult to measure, for example under severe weather conditions, with limited seed, or when phenotyping is expensive (Guo et al., 2014; Rutkoski et al., 2016). In this study, the BLUPs for secondary traits and grain yield were predicted separately, using data from different replicates (environments). The predictive ability of the UV model was slightly lower when only one replicate was used for grain yield (0.29 on average) than when all three replicates were used (0.34 on average). However, because the secondary traits and grain yield were taken from different data, environmental covariance between them is avoided in the multivariate prediction models, which makes the comparison between the UV and multivariate models more robust.
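The difference between the UV, MV2, and MV1 scenarios can be illustrated with a small simulated bivariate BLUP. This is a minimal sketch and not the two-stage model fitted in this study. It assumes known (co)variance components (the `sigma_g` and `sigma_e` values are hypothetical), a simple marker-based relationship matrix `K` computed from simulated inbred lines, and zero residual covariance between grain yield and the secondary trait, reflecting that they were taken from different replicates. The genetic values of test lines are predicted by conditioning on the observed phenotypes, and the prediction error variance (PEV) is reported alongside the correlation with the simulated true values. The helper `blup_given` is hypothetical.

```python
import numpy as np

def blup_given(K, sigma_g, sigma_e, p, obs_idx, target_idx):
    """BLUP and prediction error variance (PEV) of genetic values at target_idx,
    given phenotypes p[obs_idx]. Values are stacked trait-major
    (index = trait * n_lines + line). Assumes zero means and known (co)variances."""
    n = K.shape[0]
    G = np.kron(sigma_g, K)                # genetic (co)variance of all trait x line values
    R = np.kron(sigma_e, np.eye(n))        # residuals independent across lines
    V = G[np.ix_(obs_idx, obs_idx)] + R[np.ix_(obs_idx, obs_idx)]
    C = G[np.ix_(target_idx, obs_idx)]     # Cov(target genetic values, observed phenotypes)
    g_hat = C @ np.linalg.solve(V, p[obs_idx])
    pev = np.diag(G[np.ix_(target_idx, target_idx)]) - np.einsum(
        "ij,ji->i", C, np.linalg.solve(V, C.T))
    return g_hat, pev

rng = np.random.default_rng(7)
n, m = 80, 300                                   # hypothetical numbers of lines and markers
X = rng.integers(0, 2, size=(n, m)) * 2.0        # inbred lines coded 0/2
Z = X - X.mean(axis=0)
K = Z @ Z.T / m                                  # simple scaled marker cross-product

# Hypothetical (co)variances: trait 0 = grain yield, trait 1 = secondary trait
sigma_g = np.array([[0.3, 0.3], [0.3, 0.6]])
sigma_e = np.array([[0.7, 0.0], [0.0, 0.4]])     # no residual covariance (different replicates)

# Simulate genetic values with covariance kron(sigma_g, K), plus residuals
evals, evecs = np.linalg.eigh(K)
L_K = evecs * np.sqrt(np.clip(evals, 0, None))   # K = L_K @ L_K.T
g = L_K @ rng.standard_normal((n, 2)) @ np.linalg.cholesky(sigma_g).T
e = rng.standard_normal((n, 2)) @ np.linalg.cholesky(sigma_e).T
p = (g + e).ravel(order="F")                     # trait-major phenotype vector

train, test = np.arange(60), np.arange(60, n)
sec = n                                          # offset of the secondary-trait block
designs = {
    "UV": train,                                         # yield on training lines only
    "MV2": np.concatenate([train, sec + train]),         # + secondary trait on training lines
    "MV1": np.concatenate([train, sec + np.arange(n)]),  # + secondary trait on test lines too
}
results = {}
for name, obs in designs.items():
    g_hat, pev = blup_given(K, sigma_g, sigma_e, p, obs, test)  # target: yield of test lines
    results[name] = (np.corrcoef(g_hat, g[test, 0])[0, 1], pev.mean())
    print(f"{name}: accuracy = {results[name][0]:.2f}, mean PEV = {results[name][1]:.3f}")
```

Because MV1 conditions on a superset of the MV2 data, its PEV can never exceed that of MV2. How much the correlation improves depends on the relationships between training and test lines and on the genetic correlation.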

Single Secondary Trait

Heritabilities and correlations varied among secondary traits and environments, resulting in different predictive abilities for each secondary trait. Canopy temperature predicted grain yield best in the heat-stressed environments, consistent with the conclusion of Mason and Singh (2014), whereas NDVI predicted grain yield better than CT in the drought-stressed environments. In addition, including all secondary traits in the pedigree and genomic multivariate prediction models did not necessarily improve predictive ability for grain yield compared with including only a single secondary trait, indicating that predictive ability in the multivariate prediction model was mostly determined by the secondary trait with the highest ISE. Therefore, a single secondary trait could be chosen for each environment, based on ISE, to provide the best prediction accuracy with a simple BV model. However, if computational resources allow, incorporating all secondary traits in the prediction model would ensure stable, high prediction accuracy across all environments.
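The single-trait selection rule described above can be sketched as follows. This is a minimal illustration, assuming ISE = |r_g| × sqrt(h²_S / h²_Y) with equal selection intensities (the sign of the correlation only sets the direction of selection, for example selecting for cooler canopies). The function names and all input values are hypothetical and are not estimates from this study.

```python
import math

def indirect_selection_efficiency(r_g, h2_secondary, h2_yield):
    """ISE = |r_g| * sqrt(h2_secondary / h2_yield), assuming equal selection intensity."""
    if not (0 < h2_yield <= 1 and 0 < h2_secondary <= 1):
        raise ValueError("heritabilities must lie in (0, 1]")
    return abs(r_g) * math.sqrt(h2_secondary / h2_yield)

def best_single_trait(traits, h2_yield):
    """Return (name, ISE) of the secondary trait with the largest ISE in one environment."""
    scores = {name: indirect_selection_efficiency(r_g, h2, h2_yield)
              for name, (r_g, h2) in traits.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical values for one environment: trait -> (genetic correlation with yield, heritability)
traits = {"CT": (-0.60, 0.35), "GNDVI": (0.45, 0.60), "RNDVI": (0.50, 0.55)}
name, score = best_single_trait(traits, h2_yield=0.30)
print(f"fit a BV model with {name} (ISE = {score:.3f})")
```

With these made-up inputs, RNDVI has the highest ISE even though CT has the strongest correlation with yield, because RNDVI's higher heritability outweighs the correlation difference.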

Genomic versus Pedigree Relationship

Both pedigree and genomic prediction models were compared in this study. Among the five environments, heritability estimates for grain yield were higher with the genomic than with the pedigree-based relationship, whereas predictive abilities for grain yield were higher with the pedigree-based relationship in the UV model, and even in the MV2 model in the early heat and severe drought environments. Furthermore, including secondary traits in the MV1 model resulted in similar or only slightly higher prediction accuracy with the genomic than with the pedigree-based relationship in all environments, as observed by Pszczola et al. (2013). The genomic and pedigree-based relationship matrices had different scales (Supplemental Fig. S2), which could affect the comparability of heritability estimates from the two relationships in this data set. In addition, CIMMYT has a reliable, deep pedigree database extending over several generations, which contributes to predictive ability. Also, the 557 lines in this population were derived from 332 families, an average of one to two lines per family. One advantage of the genomic over the pedigree relationship is that it captures within-family variation due to Mendelian sampling (Pszczola et al., 2013), and the within-family variance in this population may be very limited relative to the among-family variance because of the small family sizes. However, the stronger association between gains in predictive ability and the ISE of a single secondary trait with the genomic rather than the pedigree-based relationship indicated that the gain in predictive ability from secondary traits was greater when genomic relationships were used.
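The Mendelian sampling argument can be illustrated with a toy simulation. It assumes fully inbred sib lines from one biparental cross between unrelated homozygous parents and unlinked loci. These are simplifications, since the pedigrees in this population are more complex. Under these assumptions, the pedigree assigns every sib pair the same expected proportion of shared parental alleles (0.5), whereas the realized proportion measured by markers varies from pair to pair. The function name and all settings are hypothetical.

```python
import numpy as np

def realized_sib_sharing(n_pairs, n_loci, rng):
    """Proportion of loci at which two sib inbred lines from the same biparental cross
    carry the same parental allele (toy model: fully inbred lines, unrelated
    homozygous parents, unlinked loci, no selection)."""
    origin_a = rng.integers(0, 2, size=(n_pairs, n_loci))  # 0 = parent 1, 1 = parent 2
    origin_b = rng.integers(0, 2, size=(n_pairs, n_loci))
    return (origin_a == origin_b).mean(axis=1)

rng = np.random.default_rng(2017)
realized = realized_sib_sharing(n_pairs=200, n_loci=100, rng=rng)
pedigree_value = 0.5   # same expected value for every sib pair under the pedigree
print(f"pedigree: {pedigree_value}, realized mean: {realized.mean():.3f}, "
      f"realized range: {realized.min():.2f} to {realized.max():.2f}")
```

A genomic relationship matrix can use this pair-to-pair spread to separate sibs, but only if families contain enough sibs for that within-family variation to matter. With one to two lines per family, there is little of it to exploit.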

Simple Repeatability, Multitrait, and Random Regression Models

Heritability and Correlation

Best linear unbiased predictions of secondary traits from the MT model yielded higher heritability estimates than those from the SR and RR models in most environments, except under severe drought. Similar results were observed in previous studies (Ahrabi, 2015; Iwaisaki et al., 2005; Menéndez-Buxadera et al., 2008; Oh and See, 2008). In contrast, the correlations between grain yield and secondary traits from the MT model were lower than those from the other two models in most environments, again except under severe drought. Thus, heritability and correlation showed opposite trends across the SR, MT, and RR models: the model giving a relatively higher heritability estimate for a secondary trait tended to give a relatively lower correlation with grain yield.

Predictive Abilities

In this study, predictive abilities for grain yield using BLUPs of secondary traits from the RR model were only slightly better than those from the SR and MT models in the drought environment, whereas other studies have considered the RR model superior to the SR and MT models. The number of time points in this data set might have limited the fit of the growth curve in the RR model, particularly in regions where the data change rapidly (Kranis et al., 2007; Mota et al., 2013). By contrast, increasing the number of time points leads to an overparameterized analysis in the MT model (Oh and See, 2008). Thus, the small differences between the MT and RR models might be due to the few time points available in our data set. The SR model also showed accuracy similar to that of the other two models. The correlations between BLUPs of secondary traits from the three models were close to one, particularly between the SR and RR models, as also reported in a cattle study using these three models (Gernand and König, 2014). The repeatability of the secondary traits was also high and similar across time points in our data set, which could be another reason for these results. Although the SR model is simple and fast, it may not always be appropriate for longitudinal data because of its limiting assumptions (Kranis et al., 2007). Because the RR model can handle many unequally spaced time points and captures the covariance at and between time points (Huisman et al., 2002; Meyer, 2005), it is attractive for longitudinal data and would be worth investigating further with other data sets.
Several directions could be taken for the RR model in the future: (i) evaluate the RR model using a data set with more time points to check whether it better models genetic variation over time (Kranis et al., 2007); (ii) explore the number and location of knots in the spline RR model to improve the accuracy of the growth curve fit (Speidel, 2011); (iii) instead of using information from all time points, analyze estimated breeding values from each time point separately to select the optimum period for genomic prediction of grain yield; and (iv) incorporate genotype × environment effects in the prediction model to capture the relationships among environments.
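The contrast among the three longitudinal models can be sketched in terms of how each represents genetic covariance across measurement dates. The SR model treats a line's genetic value as constant over time, which implies a genetic correlation of 1 between dates. The MT model estimates an unstructured covariance matrix with one parameter per pair of dates. The RR model derives the full covariance from a smaller covariance matrix of random regression coefficients. This is a minimal illustration, not the study's fitted model. It assumes a linear (hat-function) spline basis, and the measurement days, knot positions, and coefficient covariance matrix `K_coef` are all hypothetical. The helper `linear_spline_basis` is also hypothetical.

```python
import numpy as np

def linear_spline_basis(t, knots):
    """Linear (hat-function) spline basis: column j equals 1 at knots[j] and falls
    linearly to 0 at the neighbouring knots."""
    knots = np.asarray(knots, dtype=float)
    eye = np.eye(knots.size)
    return np.column_stack([np.interp(t, knots, eye[j]) for j in range(knots.size)])

# Hypothetical measurement days and knot positions (not the study's schedule)
days = np.array([60.0, 70.0, 80.0, 90.0, 100.0])
knots = np.array([60.0, 80.0, 100.0])
Phi = linear_spline_basis(days, knots)           # one row per measurement day

# Hypothetical genetic covariance matrix of the random spline coefficients
K_coef = np.array([[1.0, 0.6, 0.3],
                   [0.6, 1.2, 0.7],
                   [0.3, 0.7, 1.5]])
G_rr = Phi @ K_coef @ Phi.T                      # RR genetic covariance between days
sd = np.sqrt(np.diag(G_rr))
corr_rr = G_rr / np.outer(sd, sd)                # RR genetic correlation between days

n_days, n_knots = days.size, knots.size
n_par_mt = n_days * (n_days + 1) // 2            # unstructured covariance among days
n_par_rr = n_knots * (n_knots + 1) // 2          # covariance among spline coefficients
n_par_sr = 1                                     # one genetic variance; correlation fixed at 1
print("genetic (co)variance parameters, MT/RR/SR:", n_par_mt, n_par_rr, n_par_sr)
print(np.round(corr_rr, 2))
```

The number of MT parameters grows quadratically with the number of dates. The RR count depends only on the number of basis functions (here, knots), so adding time points mainly burdens the MT model. Changing the number or placement of knots changes the RR covariance function, which is the flexibility item (ii) proposes to explore.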


Conclusion

Predictive abilities for grain yield improved by ∼70%, on average, when BLUPs of secondary traits from the SR, MT, or RR models were included in the multivariate pedigree and genomic selection models. In this data set, the RR or MT model was superior to the SR model in drought-stressed environments, and using a simple SR model for longitudinal data did not negatively affect predictive ability for grain yield in the multivariate models. However, more studies are needed to evaluate these models further in other data sets. The advantages of the RR model over the SR and MT models might be more apparent in other HTP data sets with more measurement time points within the vegetative and grain-filling stages. In addition, predictive ability improved only when both training and test populations contained secondary traits. A single secondary trait with the highest ISE could be selected to fit a simple BV model, which gave prediction accuracy similar to that obtained using all secondary traits in a multivariate prediction model.

Acknowledgments

Program activities were funded by the United States Agency for International Development (USAID) “Feed the Future Initiative” (Cooperative Agreement #AID-OAA-A-13-00051) and by participating US and host country institutions. Partial funding was provided by Hatch project 149-430. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of Cornell University, Kansas State University, CIMMYT, ICAR, and BISA and do not necessarily reflect the view of USAID.

 

References
