
The Plant Genome - Article

 

 

This article in TPG

  1. Vol. 10 No. 2
    OPEN ACCESS
     
    Received: Aug 19, 2016
    Accepted: Nov 13, 2016
    Published: April 6, 2017


    * Corresponding author(s): mes12@cornell.edu

doi:10.3835/plantgenome2016.08.0082

Comparison of Models and Whole-Genome Profiling Approaches for Genomic-Enabled Prediction of Septoria Tritici Blotch, Stagonospora Nodorum Blotch, and Tan Spot Resistance in Wheat

  1. Philomin Juliana (a),
  2. Ravi P. Singh (b),
  3. Pawan K. Singh (b),
  4. Jose Crossa (b),
  5. Jessica E. Rutkoski (a, b),
  6. Jesse A. Poland (c),
  7. Gary C. Bergstrom (d) and
  8. Mark E. Sorrells * (a)

    (a) Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY 14853
    (b) International Maize and Wheat Improvement Center (CIMMYT), Apdo, Postal 6-641, 06600 Mexico, D.F., Mexico
    (c) Wheat Genetics Resource Center, Dep. of Plant Pathology and Dep. of Agronomy, Kansas State Univ., Manhattan, KS 66506
    (d) Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY 14853

Abstract

The leaf spotting diseases in wheat that include Septoria tritici blotch (STB) caused by Zymoseptoria tritici, Stagonospora nodorum blotch (SNB) caused by Parastagonospora nodorum, and tan spot (TS) caused by Pyrenophora tritici-repentis pose challenges to breeding programs in selecting for resistance. A promising approach that could enable selection prior to phenotyping is genomic selection (GS), which uses genome-wide markers to estimate breeding values (BVs) for quantitative traits. To evaluate this approach for seedling and/or adult plant resistance (APR) to STB, SNB, and TS, we compared the predictive ability of the least-squares (LS) approach with genomic-enabled prediction models including genomic best linear unbiased predictor (GBLUP), Bayesian ridge regression (BRR), Bayes A (BA), Bayes B (BB), Bayes Cπ (BC), Bayesian least absolute shrinkage and selection operator (BL), and reproducing kernel Hilbert spaces markers (RKHS-M), a pedigree-based model (RKHS-P), and RKHS markers and pedigree (RKHS-MP). We observed that LS gave the lowest prediction accuracies and RKHS-MP the highest. The genomic-enabled prediction models and RKHS-P gave similar accuracies. The increase in accuracy using genomic prediction models over LS was 48%. The mean genomic prediction accuracies were 0.45 for STB (APR), 0.55 for SNB (seedling), 0.66 for TS (seedling), and 0.48 for TS (APR). We also compared markers from two whole-genome profiling approaches for prediction: genotyping by sequencing (GBS) and diversity arrays technology sequencing (DArTseq). While GBS markers performed slightly better than DArTseq markers, combining markers from the two approaches did not improve accuracies. We conclude that implementing GS in breeding for these diseases would help to achieve higher accuracies and rapid gains from selection.


Abbreviations

    APR, adult plant resistance; BA, Bayes A; BB, Bayes B; BC, Bayes Cπ; BL, Bayesian least absolute shrinkage and selection operator; BRR, Bayesian ridge regression; BV, breeding value; DArTseq, diversity arrays technology sequencing; GBLUP, genomic best linear unbiased predictor; GBS, genotyping by sequencing; IBWSN, International Bread Wheat Screening Nursery; IID, independent and identically distributed; LD, linkage disequilibrium; LS, least squares; NE, necrotrophic effectors; QTL, quantitative trait loci; rAUDPC, relative area under the disease progression curve; RKHS-M, reproducing kernel Hilbert spaces markers; RKHS-MP, reproducing kernel Hilbert spaces markers and pedigree; RKHS-P, reproducing kernel Hilbert spaces pedigree; RR-BLUP, ridge regression–best linear unbiased prediction; SNB, Stagonospora nodorum blotch; STB, Septoria tritici blotch; TS, tan spot

The major leaf spotting diseases threatening wheat (Triticum aestivum L.) are Septoria tritici blotch (STB) caused by Zymoseptoria tritici (Desm.) Quaedvlieg & Crous, SNB caused by Parastagonospora nodorum (Berk.) Quaedvlieg, Verkley & Crous and tan spot (TS) caused by Pyrenophora tritici-repentis (Died.) Drechsler. Among these, STB is an important disease in the temperate regions of the world and is considered to be the most damaging disease of wheat in Europe (Eyal et al., 1987; Goodwin, 2007; Orton et al., 2011; O’Driscoll et al., 2014). While the average annual yield loss in the UK was 20% when susceptible lines were not treated with fungicides, only 5 to 10% loss resulted from using resistant varieties and fungicide treatment (Fones and Gurr, 2015). About 70% ($1.2 billion) of the annual cereal fungicides in the European Union is used for STB management and fungicide resistance in Z. tritici populations is widespread (Torriani et al., 2015). This has made genetic resistance the preferred STB management strategy, which can be either qualitative (controlled by large effect major genes that follow the gene-for-gene model) or quantitative (controlled by few to many genes of moderate to small effects; Brown et al., 2015). Several genes for STB resistance have been reported, which include Stb1-Stb15, StbSm3, Stb16q, Stb17, Stb18, StbWW, and TmStb1 (Brown et al., 2015). Among these, Stb6 interaction shows a typical gene-for-gene relationship (Brading et al., 2002) and Stb17 is a gene for quantitative resistance expressed at the adult plant stage (Tabib Ghaffary et al., 2012).

Stagonospora nodorum blotch or glume blotch, is an important disease in the warm and moist growing areas of the world that can cause yield losses of up to 31% under high inoculum pressure (Bhathal et al., 2003). The relative importance of the causal necrotroph, P. nodorum varies in different parts of the world. It is a major pathogen of winter wheat in the United States (Crook et al., 2012), the second most economically important pathogen in the Western region in Australia (Murray and Brennan, 2009), and there was a shift in its prevalence in Europe when it was overtaken by Z. tritici populations in both the United Kingdom and Germany (Polley and Thomas, 1991; Meien-Vogeler et al., 1994). Tan spot or yellow spot is another devastating foliar disease that is a serious constraint to wheat production in Western Australia (Murray and Brennan, 2009). It can result in an average yield loss of 5 to 10%, but losses up to 50% can occur under conditions favorable for disease development (Shabeer and Bockus, 1988; Lamari and Bernier, 1989a, 1989b; De Wolf et al., 1998). While fungicides and agronomic practices are available for SNB and TS management, the deployment of resistant cultivars is the most sustainable, cost-effective, and environment-friendly strategy. Host–pathogen interactions for both P. nodorum and P. tritici-repentis follow the inverse gene-for-gene model. This involves the recognition of host-specific toxins or necrotrophic effectors (NE) by a host sensitivity gene resulting in a compatible interaction, leading to susceptibility. The nonrecognition of the toxin by the host results in an incompatible interaction, leading to resistance (Faris et al., 2013). 
For SNB, several interactions between NE and the host genes have been identified, which include Snn1 (Liu et al., 2004), Tsn1 (Friesen et al., 2006; Liu et al., 2006), Snn2 (Friesen et al., 2007), Snn3 (Friesen et al., 2008), Snn3-B1 and Snn3-D1 (Zhang et al., 2011), Snn4 (Abeysekara et al., 2009), Snn5 (Friesen et al., 2012), Snn6 (Gao et al., 2015), and Snn7 (Shi et al., 2015). For TS, six qualitative genes, Tsr1 to Tsr6, that interact with a range of host-specific toxins including ToxA, ToxB, and ToxC, have been reported (Singh et al., 2006; Tadesse et al., 2006a, 2006b).

Breeding for resistance to wheat leaf spotting diseases is a challenge because of the difficulties in phenotyping, the ephemeral nature of some of the known resistance genes, the emergence of new isolates, and the complex inheritance of genetic resistance. Hence, it is important to devise strategies to accelerate breeding for quantitative resistance, which is likely to be more durable. One promising approach that could help achieve this is genomic selection (GS; Meuwissen et al., 2001) which uses dense genome-wide markers to obtain the genomic estimated BVs of individuals. This enables selection prior to phenotyping, thereby leading to greater rates of genetic gain. The potential of GS to improve quantitative traits in wheat has been demonstrated in many empirical studies (Crossa et al., 2010, 2014; Heslot et al., 2012; Ornella et al., 2012; Rutkoski et al., 2014). In GS, a training population comprising individuals that have been genotyped and phenotyped for traits of interest is used to train a model that is used to predict the BVs of individuals in a selection population that is not phenotyped. While some studies comparing prediction model accuracies have been reported (Lorenzana and Bernardo, 2009; Crossa et al., 2010; Heffner et al., 2011b; Heslot et al., 2012), our objective was to compare the predictive ability of the LS approach (where selected loci were used as fixed effects) with genomic-enabled prediction models for STB, SNB, and TS resistance. The genomic prediction models evaluated include GBLUP, BRR, BA, BB, BC, BL, and RKHS-M. In addition, we also evaluated a pedigree based model, RKHS-P, and RKHS-MP that included both the pedigree- and marker-based relationship matrices. We also compared markers obtained from two whole-genome profiling approaches for genomic prediction: the GBS method (Poland et al., 2012) and the DArTseq, used by Diversity Arrays Technology, Canberra, Australia (http://www.diversityarrays.com/dart-application-dartseq).


Materials and Methods

For this study, we used CIMMYT’s 45th and 46th International Bread Wheat Screening Nurseries (IBWSNs) comprising 333 and 313 lines, respectively. The IBWSNs are large screening nurseries that are evaluated in multiple trials in Mexico and cooperating locations globally. They consist of 200 to 400 advanced lines from CIMMYT’s bread wheat breeding program. They are expected to have several novel genes for resistance and considerable variation in their BVs, making them ideal for building prediction models.

Disease Evaluation and Phenotypic Data

Adult Plant Resistance Evaluation for Septoria Tritici Blotch

Adult plant resistance to STB was evaluated at CIMMYT’s research station, Toluca, Mexico during the 2011, 2013, and 2014 crop seasons. The inoculum for STB was prepared according to Gilchrist-Saavedra et al. (2006) using a mixture of six aggressive strains: St1 (B1), St2 (P8), St5 (OT), St6 (KK), 64 (St 81.1) and 86 (St 133.4) at a concentration of 1 × 10⁷ spores/mL. The nurseries were inoculated 45 d after planting using an ultra-low volume applicator. Two additional applications were made at weekly intervals. A border row of a susceptible spreader variety, Huirivis, and a resistant variety, Murga, was planted surrounding the field. The plants were evaluated using the double-digit scale (00–99) that is a modification of the Saari-Prescott 0–9 scale for rating foliar diseases (Saari and Prescott, 1975). The first digit gives the relative height of the disease spread vertically using the original 0–9 Saari-Prescott scale, and the second digit represents the percentage disease severity in terms of 0–9 (Eyal et al., 1987). Three to four evaluations were performed. The disease severity percentages were calculated from the scores using the formula (first score/9) × (second score/9) × 100 and were used to obtain the relative area under the disease progression curve (rAUDPC). In 2014, there was a high incidence of stripe rust and the two diseases became nearly inseparable. Therefore, we included stripe rust severity as a covariate in all the models for this year.
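The conversion from double-digit scores to severity and then to rAUDPC can be sketched as follows. This is a minimal illustration, not the authors' code: the trapezoidal AUDPC and the normalization by the area of a constant 100% severity are assumptions (the text does not state how rAUDPC was scaled), and the function names and example scores are hypothetical.

```python
def severity_from_double_digit(score):
    """Convert a double-digit (00-99) score to percentage disease severity."""
    height, cover = divmod(int(score), 10)  # first digit: height, second digit: severity
    return (height / 9) * (cover / 9) * 100


def audpc(days, severities):
    """Trapezoidal area under the disease progress curve."""
    return sum(
        (severities[i] + severities[i + 1]) / 2 * (days[i + 1] - days[i])
        for i in range(len(days) - 1)
    )


def relative_audpc(days, severities, max_severity=100.0):
    """AUDPC divided by the area of a constant maximal severity (assumed scaling)."""
    return audpc(days, severities) / (max_severity * (days[-1] - days[0]))


# Hypothetical line scored four times at weekly intervals
days = [0, 7, 14, 21]
scores = [32, 54, 76, 87]
severities = [severity_from_double_digit(s) for s in scores]
print(round(relative_audpc(days, severities), 3))
```

Other rAUDPC definitions exist, for example scaling by the AUDPC of a susceptible check, and would only require changing the denominator in `relative_audpc`.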

Seedling Evaluation for Stagonospora Nodorum Blotch

Seedling resistance to SNB was evaluated in CIMMYT’s greenhouses in El Batan, Mexico, in 2014. Inoculum production and inoculation were done as described in Singh et al. (2006). The P. nodorum isolate Sn4 at a concentration of 1 × 10⁶ spores/mL was used. Each entry was represented by four seedlings planted in six replications, and the check varieties Erik, Glenlea, 6B-662, and 6B-365 were planted every 20 rows. The second leaf of each seedling was scored for SNB disease reaction 7 d postinoculation, using the 1–5 lesion rating scale (Feng et al., 2004).

Seedling and Adult Plant Resistance Evaluation for Tan Spot

Seedling resistance and APR to TS were evaluated at CIMMYT’s greenhouses and fields at El Batan, Mexico, in 2014. Race 1 (isolate Ptr1) was used and the inoculum was produced by the method described by Singh et al. (2011). The concentration of the inoculum (for both seedling and field inoculation) was adjusted to 4000 conidia/mL. Seedling inoculation and checks were the same as for P. nodorum, and the seedlings were planted in six replications. Seven days postinoculation, the seedlings were rated for disease response based on the 1 to 5 lesion rating scale developed by Lamari and Bernier (1989a). Field inoculation and evaluation were similar to those for STB. A continuous border row of the susceptible spreader Glenlea and the resistant cultivar Erik was planted surrounding the field. The double-digit scale was used and the rAUDPC was calculated from four evaluations done at weekly intervals.

The phenotypic distributions for all the diseases were transformed using the boxcox function in the R statistical program.
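For readers unfamiliar with the transformation, the sketch below shows the Box-Cox family and a grid search for the power parameter in an intercept-only model. It is an illustration in plain Python rather than the R function the study used; the grid from -2 to 2 in steps of 0.1 and all function names are assumptions made for this example.

```python
import math


def boxcox_transform(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def boxcox_loglik(values, lam):
    """Profile log-likelihood of lambda for an intercept-only normal model."""
    n = len(values)
    z = boxcox_transform(values, lam)
    mean = sum(z) / n
    var = sum((zi - mean) ** 2 for zi in z) / n
    return -n / 2 * math.log(var) + (lam - 1) * sum(math.log(v) for v in values)


def best_lambda(values, grid=None):
    """Grid search for the lambda that maximizes the profile log-likelihood."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # -2.0 to 2.0 in steps of 0.1
    return max(grid, key=lambda lam: boxcox_loglik(values, lam))


# Hypothetical right-skewed phenotype values
phenotypes = [1.2, 1.5, 2.0, 2.4, 3.1, 4.8, 9.5]
lam = best_lambda(phenotypes)
transformed = boxcox_transform(phenotypes, lam)
```

The transform requires strictly positive values, so phenotypes containing zeros would need a small offset before use.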

Genotyping

The two nurseries were genotyped using the GBS method described by Poland et al. (2012) for dense genome-wide coverage (Elshire et al., 2011). Markers with missing data > 50%, minor allele frequency < 10%, or pairwise marker correlation (r²) > 0.95 (for redundancy) were removed, which resulted in 5102 markers for the 45th IBWSN, 8066 markers for the 46th IBWSN, and 8857 markers for the combined nurseries. We also removed lines with > 50% missing data and obtained 267, 305, and 566 lines for the 45th IBWSN, 46th IBWSN, and the combined nurseries, respectively. The lines in the 45th IBWSN were genotyped using both the GBS and DArTseq platforms. After using the same filtering criteria as above, we obtained 5209 DArTseq markers for the 267 lines that, in combination with the 5102 GBS markers, resulted in 10,311 markers for the combined marker set. The expectation-maximization algorithm implemented in the R package rrBLUP (Endelman, 2011) was used to impute missing data.
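The missing-data and minor-allele-frequency filters can be sketched as below. The 0/1/2 dosage coding with `None` for missing calls and the function name are assumptions for illustration; the redundancy filter on pairwise r² is not shown.

```python
def filter_markers(genotypes, max_missing=0.5, min_maf=0.10):
    """Return indices of markers passing the missing-data and MAF thresholds.

    genotypes: list of lines, each a list of marker calls coded 0/1/2
    (allele dosage) or None for a missing call (assumed coding).
    """
    n_lines = len(genotypes)
    kept = []
    for m in range(len(genotypes[0])):
        calls = [row[m] for row in genotypes if row[m] is not None]
        missing_rate = 1 - len(calls) / n_lines
        if missing_rate > max_missing or not calls:
            continue  # too many missing calls
        freq = sum(calls) / (2 * len(calls))  # frequency of the counted allele
        if min(freq, 1 - freq) < min_maf:
            continue  # minor allele too rare
        kept.append(m)
    return kept


# Hypothetical toy data: 4 lines x 4 markers
genotypes = [
    [0, 2, None, 0],
    [2, 2, None, 0],
    [0, 2, None, 0],
    [2, 0, 2, 0],
]
print(filter_markers(genotypes))  # markers 2 (75% missing) and 3 (monomorphic) are dropped
```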

Relationship Matrix, Linkage Disequilibrium, and Heritability Estimation

The genomic relationship matrix (G-matrix) was obtained according to VanRaden (2008) as implemented in the R package rrBLUP (Endelman, 2011). It was centered and standardized for all the analyses. The linkage disequilibrium (LD) across the wheat chromosomes was calculated using a subset of markers that were mapped. This comprised 3531 GBS markers and 4793 DArTseq markers for the 45th IBWSN. We used the r² measure of LD, which is the square of the correlation coefficient between two loci. It measures the proportion of the variance of a response variable that is explained by a predictor variable (Hill and Robertson, 1968). The value of r is given by

$$r = \frac{D}{\sqrt{p_1 q_1 p_2 q_2}}$$

where D is the disequilibrium and p₁, q₁, p₂, and q₂ are the allele frequencies at the two loci. Heritability for the different traits was calculated on a line-mean basis. The average information-restricted maximum likelihood algorithm (Gilmour et al., 1995) implemented in the heritability package in R (Kruijer et al., 2015) was used to obtain estimates of the genetic and residual variances.
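The r² measure of LD between two biallelic loci can be illustrated with a short calculation. The sketch assumes that D is the difference between the observed haplotype frequency and the product of the allele frequencies, and that haplotypes are available as 0/1 pairs (a reasonable simplification for inbred lines); the function names are hypothetical.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic loci from haplotype and allele frequencies."""
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)  # p1 * q1 * p2 * q2
    return d * d / denom


def ld_from_haplotypes(haplotypes):
    """Estimate r^2 from (allele at locus 1, allele at locus 2) pairs coded 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    return ld_r_squared(p_ab, p_a, p_b)


# Complete LD: the two loci always carry the same allele
print(ld_from_haplotypes([(1, 1), (1, 1), (0, 0), (0, 0)]))
# Linkage equilibrium: all four haplotypes equally frequent
print(ld_from_haplotypes([(1, 1), (1, 0), (0, 1), (0, 0)]))
```

Both loci must be polymorphic in the sample, otherwise the denominator is zero.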

Prediction Models

Least Squares

A stepwise LS approach was used that involved selection of markers and estimation of effects for the selected markers. First, a genome-wide association analysis was conducted in the training set to identify the markers significantly associated with the trait. The markers were ranked according to their p-values for variable selection. In each iteration, one marker was added to the model, starting from the most significant marker, so that the model at iteration j contained the j top-ranked markers:

$$y = \mu + \sum_{i=1}^{j} X_i \beta_i + \varepsilon$$

where y is the phenotype, μ is the mean, βᵢ denotes the effect of the ith marker, Xᵢ denotes the ith marker’s genotype matrix, and ε is the residual. The fivefold cross-validation accuracy was calculated within the training set after each iteration, and the model with j − 1 markers was selected when Accuracy(j − 1) > Accuracy(j). However, two closely linked markers having very similar p-values will follow each other in the iteration, and adding the second marker will not improve the accuracy. Since this would stop the iteration and lead to the exclusion of other linked markers with lower p-values, we removed markers that had pairwise marker correlation (r²) > 0.80 for this model. Model selection was followed by estimation of marker effects from the selected model, which were then used to predict the BVs of the individuals.
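The stopping rule for marker selection can be sketched independently of the model fitting. The function below assumes that the fivefold cross-validation accuracy has already been computed for each model size; the function name and the example accuracies are hypothetical.

```python
def select_marker_count(cv_accuracies):
    """Return the number of markers retained by the stopping rule.

    cv_accuracies[j - 1] is the cross-validated accuracy of the model holding
    the j most significant markers. Markers are added one at a time; the search
    stops at the first model that is less accurate than its predecessor, and
    the predecessor is kept.
    """
    for j in range(1, len(cv_accuracies)):
        if cv_accuracies[j - 1] > cv_accuracies[j]:
            return j  # the j-marker model beat the (j + 1)-marker model
    return len(cv_accuracies)  # accuracy never dropped


# Hypothetical accuracies for models with 1 to 5 markers
print(select_marker_count([0.20, 0.26, 0.31, 0.29, 0.35]))  # keeps 3 markers
```

Because the rule stops at the first drop, the later model with accuracy 0.35 in this example is never reached, which is why redundant linked markers were removed beforehand.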

Genomic Best Linear Unbiased Prediction

The GBLUP is a whole-genome regression approach that uses the G matrix calculated from markers and has been successfully applied to predict complex traits (Yang et al., 2010; de los Campos et al., 2013b; Habier et al., 2013). It is equivalent to the ridge-regression best linear unbiased prediction (RR-BLUP; Hoerl and Kennard, 1970; Whittaker et al., 2000; Piepho, 2009) when the similarity between the lines in genomic space is proportional to their genetic covariance (Habier et al., 2007; Goddard, 2009; Piepho, 2009; Endelman, 2011). The mixed model used in GBLUP to calculate BVs of individuals is

$$y = \mu + Zu + \varepsilon$$

where y is the vector of the response phenotypic trait, μ is the mean vector, u is the vector of genotype effects that are assumed to be multivariate normal random effects [$u \sim N(0, G\sigma^2_u)$], Z is the design matrix for the random effects, and ε is the vector of independent residuals assumed to have a multivariate normal distribution [$\varepsilon \sim N(0, I\sigma^2_\varepsilon)$]. It was implemented using the R package rrBLUP (Endelman, 2011).
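To make the G matrix used by GBLUP concrete, the sketch below computes a VanRaden-type relationship matrix, G = ZZ′ / (2 Σ pₖ(1 − pₖ)), where Z holds allele dosages centered by twice the allele frequency. The 0/1/2 coding, the absence of missing values, and the function name are assumptions for this example; it does not reproduce the rrBLUP implementation used in the study.

```python
def vanraden_g(dosages):
    """Genomic relationship matrix, G = Z Z' / (2 * sum_k p_k (1 - p_k)).

    dosages: list of lines, each a list of allele dosages coded 0/1/2 with
    no missing values (assumed to be imputed beforehand).
    """
    n, m = len(dosages), len(dosages[0])
    freqs = [sum(row[k] for row in dosages) / (2 * n) for k in range(m)]
    # Center each marker column by twice its allele frequency
    z = [[row[k] - 2 * freqs[k] for k in range(m)] for row in dosages]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    return [
        [sum(z[i][k] * z[j][k] for k in range(m)) / scale for j in range(n)]
        for i in range(n)
    ]


# Hypothetical inbred lines: lines 0 and 2 are identical, as are lines 1 and 3
g = vanraden_g([[0, 2], [2, 0], [0, 2], [2, 0]])
print(g[0])
```

Identical inbred lines receive the same relationship as a line has with itself, and each row of G sums to zero because the markers are centered.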

Bayesian Models

The Bayesian models assume a prior distribution for the marker effects and are of the form

$$y = \mu + X\beta + \varepsilon$$

where X is the incidence matrix for the markers and β is the vector of k marker effects. All the Bayesian models were implemented in the R package BGLR (Pérez and de los Campos, 2014). The default prior parameters were used with 50,000 iterations, and the first 5000 iterations were discarded as burn-in.

Bayesian Ridge Regression. The BRR is the Bayesian counterpart of the RR-BLUP, where the estimates of all the marker effects are shrunken toward zero. The shrinkage is independent of the effect size, but dependent on the frequency and the sample size (Gianola, 2013). The BRR is equivalent to the RR-BLUP, but instead of choosing the ridge parameter, a Gaussian prior that is independent and identically distributed (IID) with variance common to all the marker effects is used, i.e.,

$$\beta_R \sim N(0, I\sigma^2_\beta)$$

where $\beta_R$ is the vector of regression coefficients and $\sigma^2_\beta$ is the a priori variance of marker effects (Pérez et al., 2010). Then, the variance parameter ($\sigma^2_\beta$) was assigned a scaled-inverse χ² density, $\sigma^2_\beta \sim \chi^{-2}(\mathrm{df}_\beta, S_\beta)$, where $\mathrm{df}_\beta$ and $S_\beta$ are the prior degrees of freedom and scale, respectively. The default degrees of freedom parameter was used ($\mathrm{df}_\beta = 5$), and $S_\beta$ was solved by BGLR to match the model R².

Bayes A. As markers may contribute differentially to the genetic variance, the assumption of a common variance to all the markers may not be realistic. So, Meuwissen et al. (2001) proposed the BA model, which induces marker-effect-specific shrinkage. A scaled-t prior density is assigned to the marker effects, which shrinks the markers with effects closer to zero more strongly but does not severely penalize the markers with large effects (Xu, 2003; Gianola, 2013). This density is implemented in BGLR as a mixture of scaled-normal densities. The BA hierarchical model involves two stages. In the first stage of the hierarchy, normal densities with mean zero and marker-specific variance parameters are assigned to the marker effects. In the second stage, IID scaled-inverse χ² densities with known $\mathrm{df}_\beta$ and $S_\beta$ are assigned to the marker variances (Gianola, 2013). The default degrees of freedom ($\mathrm{df}_\beta = 5$) was used and the scale parameter ($S_\beta$) was assigned a γ density (G) with rate parameter (r) and shape parameter (s). The prior densities for this model, according to Pérez and de los Campos (2014), are represented as

$$p(\beta, \sigma^2_\beta, S_\beta) = \left\{ \prod_k N(\beta_k \mid 0, \sigma^2_{\beta_k})\, \chi^{-2}(\sigma^2_{\beta_k} \mid \mathrm{df}_\beta, S_\beta) \right\} G(S_\beta \mid r, s)$$

Bayes B. Bayes B (BB, also proposed by Meuwissen et al., 2001) uses a mixture distribution prior where marker effects are assumed to be zero with probability π and are assumed to be drawn from a scaled-t distribution with probability 1 − π. In BA, π = 0, but BB assumes that many markers have no effect at all, and hence π > 0 (Habier et al., 2011). Heffner et al. (2009) referred to this as a more realistic prior because certain regions of the genome are expected to have no quantitative trait loci (QTL) and thereby zero effect. While zeroing out marker effects might not be ideal for infinitesimal traits, Meuwissen et al. (2001) argued that genetic variances are distributed across loci such that only a few have genetic variance and eliminating the marker effects close to zero reduces the noise. In BGLR, the parameter π denotes the proportion of nonnull effects; it is treated as unknown and assigned a beta prior, B(π | p₀, π₀), parameterized such that E(π) = π₀, where p₀ is the number of prior counts. The prior densities for the BB model, according to Pérez and de los Campos (2014) and with π following the BGLR definition, are represented as

$$p(\beta, \sigma^2_\beta, \pi, S_\beta) = \left\{ \prod_k \left[ \pi\, N(\beta_k \mid 0, \sigma^2_{\beta_k}) + (1 - \pi)\, 1(\beta_k = 0) \right] \chi^{-2}(\sigma^2_{\beta_k} \mid \mathrm{df}_\beta, S_\beta) \right\} B(\pi \mid p_0, \pi_0)\, G(S_\beta \mid r, s)$$

Bayes Cπ. The Bayes Cπ (BC) model, which is an extension of the Bayes C model (Kizilkaya et al., 2010), is similar to BB, except that it uses a Gaussian distribution instead of the t-density distribution used by BB (Habier et al., 2011; Lorenz et al., 2011). The BC was developed to address the drawbacks of the BA and BB models. It treats the probability of markers with a zero effect (π) as unknown and estimates it instead of assuming a fixed π, as this could affect the shrinkage of marker effects (Habier et al., 2011). Hence, Bayes Cπ is thought to be more flexible in modeling traits that are oligogenic to polygenic (Lorenz et al., 2011). The BC model implemented in BGLR is similar to BB, except that a Gaussian density with a single variance parameter common to all markers, estimated from the data, is used for the nonnull effects:

$$p(\beta, \sigma^2_\beta, \pi) = \left\{ \prod_k \left[ \pi\, N(\beta_k \mid 0, \sigma^2_\beta) + (1 - \pi)\, 1(\beta_k = 0) \right] \right\} \chi^{-2}(\sigma^2_\beta \mid \mathrm{df}_\beta, S_\beta)\, B(\pi \mid p_0, \pi_0)$$

Bayesian Least Absolute Shrinkage and Selection Operator. The classical least absolute shrinkage and selection operator (LASSO; Tibshirani, 1996) and its Bayesian counterpart (BL; Park and Casella, 2008) combine the features of both shrinkage and subset selection. The BL uses a double exponential prior distribution that has thick tails and places a higher density at zero (Pérez et al., 2010). This is implemented in BGLR as a mixture of scaled normal densities. The marker effects with large absolute values are shrunken less (de los Campos et al., 2009). Independent normal densities with mean zero and marker-specific variance parameters are assigned to the marker effects. A scaled-inverse χ² density is assigned to the residual variance. The marker-specific scale parameters, $\tau^2_k$, are assigned IID exponential densities with rate parameter λ²/2, and the prior for λ² was set to the default type (gamma) in BGLR (Pérez and de los Campos, 2014). The prior densities for BL are represented as

$$p(\beta, \tau^2, \lambda^2 \mid \sigma^2_\varepsilon) = \left\{ \prod_k N(\beta_k \mid 0, \tau^2_k \sigma^2_\varepsilon)\, \mathrm{Exp}(\tau^2_k \mid \lambda^2/2) \right\} G(\lambda^2 \mid r, s)$$

Reproducing Kernel Hilbert Spaces

The RKHS semiparametric approach for genomic prediction (Gianola et al., 2006; Gianola and van Kaam, 2008) is expected to capture some nonadditive effects as it does not assume linearity. The RKHS model using a Gaussian kernel is of the form

$$y_i = w_i'\beta + z_i'u + \sum_{j=1}^{n} \alpha_j K(x_i, x_j) + \varepsilon_i$$

where $x_i$ and $x_j$ are the observed marker genotypes of individuals, $w_i$ and $z_i$ are the incidence vectors, β is the vector of location effects, u is the vector of additive genetic effects, $\alpha_j$ is the regression coefficient, and $\varepsilon_i$ is the error term [$\varepsilon \sim N(0, I\sigma^2_\varepsilon)$; Gianola et al., 2006]. The additive genetic effects $u \sim N(0, K\sigma^2_u)$, where K is the reproducing Gaussian kernel, $K(x_i, x_j) = \exp\{-[(x_i - x_j)'(x_i - x_j)]/h\}$, h is the bandwidth parameter, and $\sigma^2_u$ is the additive genetic variance. We used the BGLR package (Pérez and de los Campos, 2014) to implement three RKHS models: RKHS markers (RKHS-M) with the G-matrix calculated from markers; RKHS pedigree (RKHS-P) with the pedigree relationship matrix; and RKHS markers and pedigree (RKHS-MP) with two kernels comprising the marker and pedigree relationship matrices. These models were fitted with three arbitrarily chosen bandwidth parameters and the three accuracies were averaged.
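The Gaussian kernel and the role of the bandwidth parameter h can be illustrated with a small example. The marker coding, the three bandwidth values, and the function name are hypothetical; the study does not report the bandwidths it used.

```python
import math


def gaussian_kernel(markers, h):
    """K[i][j] = exp(-d_ij^2 / h), with d_ij^2 the squared Euclidean distance
    between the marker genotypes of lines i and j."""
    n = len(markers)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(markers[i], markers[j]))
            kernel[i][j] = math.exp(-d2 / h)
    return kernel


# Three hypothetical bandwidths and three hypothetical lines
bandwidths = [0.5, 2.0, 8.0]
markers = [[0, 2, 2], [0, 2, 0], [2, 0, 0]]
kernels = {h: gaussian_kernel(markers, h) for h in bandwidths}

for h in bandwidths:
    print(h, round(kernels[h][0][1], 4))  # similarity of lines 0 and 1 under each bandwidth
```

With this parameterization, a small h makes the kernel fall off quickly so that only near-identical lines are treated as similar, whereas a large h yields a smoother kernel. Fitting the model once per bandwidth and averaging the three accuracies, as described above, avoids committing to a single value.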

Model Comparisons and GBS Marker Platform Comparisons

The Pearson’s correlation between the observed and the cross-validated BVs (prediction accuracy) was used to assess the predictive ability of the models. Tenfold cross-validation was used, and the training set comprised 240 lines in the 45th IBWSN, 275 lines in the 46th IBWSN, and 509 lines in the combined nurseries. To compare the BVs predicted by the different models, the Spearman’s rank correlations between the BVs for all the prediction models and traits were calculated. We also performed a hierarchical clustering to assess the similarity between the models. The cross-validated BVs for all six datasets were standardized to zero mean and unit variance. A Euclidean distance matrix between the prediction models was then obtained using the standardized BVs and averaged over all the datasets. This distance matrix was then used to perform a hierarchical clustering of the models based on Ward’s criterion and a dendrogram was constructed. We compared the prediction accuracies estimated from GBS markers, DArTseq markers, and a combined set including markers obtained from both platforms.
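The cross-validation scheme and the accuracy measure can be sketched as follows. The fold assignment by shuffling, the `fit_predict` callback, and all function names are assumptions for illustration; whether the correlation was computed per fold or over all held-out predictions is not stated in the text.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validated_predictions(n, fit_predict, k=10, seed=0):
    """Collect held-out predictions for every line.

    fit_predict(train_idx, test_idx) is a hypothetical callback that trains a
    model on train_idx and returns one prediction per entry of test_idx.
    """
    preds = [None] * n
    for test in kfold_indices(n, k, seed):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        for i, p in zip(test, fit_predict(train, test)):
            preds[i] = p
    return preds


# Training-set sizes under tenfold cross-validation with 267 lines
print(sorted({267 - len(fold) for fold in kfold_indices(267)}))
```

With 267 lines, each training set holds about 240 lines, which agrees with the size reported for the 45th IBWSN.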


Results

Phenotypic Data Analysis

The phenotypic distributions for all the diseases in the combined dataset are shown in Fig. 1. The mean correlation for STB across the years was moderate (0.47). The mean correlation between seedling resistance to SNB and TS was also moderate (0.33). Tan spot seedling resistance and APR had a low (0.16) correlation, indicating that the genetic bases of seedling resistance and APR were different.

Fig. 1.

Phenotypic distributions for Septoria tritici blotch (STB), Stagonospora nodorum blotch (SNB), and tan spot (TS) in the 45th and 46th International Bread Wheat Screening Nursery entries. rAUDPC, relative area under the disease progression curve.

 

Relationship Matrix and Heritability Estimation

A heat map of the relationship matrices for the 566 lines in the 45th and 46th IBWSN (Fig. 2) indicated that the pedigree relationship matrix shows a higher degree of relatedness among the lines than the genomic relationship matrix. The 566 lines came from 366 crosses, which included one family with eight full-sibs, one with seven full-sibs, three with six full-sibs, one with five full-sibs, 14 with four full-sibs, 27 with three full-sibs, 72 with two full-sibs, and 247 crosses represented by one individual per cross. The broad-sense, line-mean heritability was the highest for TS seedling (0.66), followed by SNB seedling (0.53) and STB APR (0.47). The broad-sense heritability for TS APR was moderate (0.57).

Fig. 2.

Heat map of the marker and pedigree based relationship matrices for the 566 lines in the 45th and 46th International Bread Wheat Screening Nursery, illustrating the familial relatedness (kinship) between the individuals.

 

Marker Data and Linkage Disequilibrium

We observed that GBS generated more missing data and had a higher number of markers with minor allele frequencies close to zero than DArTseq (Fig. 3). Filtering for 50% missing data resulted in 13,913 GBS markers from 45,818 markers and 11,007 DArTseq markers from 11,211 markers. We also compared the marker distributions across all the chromosomes (Fig. 4) and found that chromosomes 2B and 4D had the highest and lowest proportions, respectively, of both GBS and DArTseq markers. The percentages of DArTseq markers on the A, B, and D genomes were 46.25, 46.35, and 7.4%, respectively. Similarly, the percentages of GBS markers on the A, B, and D genomes were 39.9, 50.1, and 10%, respectively. Overall, we did not observe significant differences in the percentage of marker coverage across the different chromosomes using these two whole-genome profiling approaches. The LD in the wheat chromosomes using the GBS and DArTseq markers (Fig. 5 and 6) showed striking similarities across many chromosomes. Chromosomes 2A, 4A, 4B, 6B, and 7B had large LD blocks that were observed using both marker platforms. Although most of the D-genome chromosomes had similar LD patterns, they were hard to compare because of the limited number of markers. A few chromosomes (3A, 5A, and 4B) had slightly different LD patterns using the two marker platforms, but this could be due to the differences in the number of GBS and DArTseq markers used to calculate LD. On chromosome 1A, four small LD blocks were observed using GBS markers but not with the DArTseq markers, thereby indicating clearly different LD patterns.

Fig. 3.

Missing marker data and minor allele frequencies of genotyping by sequencing (GBS) and diversity arrays technology-sequencing (DArTseq) markers. IBWSN, International Bread Wheat Screening Nursery.

 
Fig. 4.

Distribution of genotyping by sequencing (GBS) and diversity arrays technology-sequencing (DArTseq) markers in the wheat chromosomes expressed as percentage of total markers.

 
Fig. 5.

Linkage disequilibrium in the wheat chromosomes using the genotyping by sequencing (GBS) markers expressed as r² between marker loci. Blue represents r² of 0–0.2, white represents r² of 0.21–0.40, green represents r² of 0.41–0.60, yellow represents r² of 0.61–0.80, and red represents r² of 0.8–1.0.

 
Fig. 6.

Linkage disequilibrium in the wheat chromosomes using the diversity arrays technology sequencing (DArTseq) markers expressed as r² between marker loci. Blue represents r² of 0–0.2, white represents r² of 0.21–0.40, green represents r² of 0.41–0.60, yellow represents r² of 0.61–0.80, and red represents r² of 0.8–1.0.

 

Prediction Accuracies for Septoria tritici Blotch Adult Plant Resistance

For STB APR, in the 45th IBWSN, the models that gave the highest accuracies were RKHS-MP in the 2011 dataset, both RKHS-P and RKHS-MP in the 2013 dataset, and RKHS-MP in the 2014 dataset (Table 1). The lowest accuracies were obtained using LS in all the datasets, and the increase in accuracy using the genome-wide markers was 82%. The average number of markers used as fixed effects in the LS model was six (2011), five (2013), and three (2014). The most significant marker, which occurred at a higher frequency in the folds, explained 7.6, 9.5, and 10% of the trait variation in the 2011, 2013, and 2014 datasets, respectively. The RKHS-P model performed similarly to the genome-wide marker-based models in all the datasets. In the 46th IBWSN, the models that gave the highest accuracies were BB, BC, and BL in the 2011 dataset, BB in the 2013 dataset, and BA in the 2014 dataset. The genome-wide marker-based models performed similarly to LS in the 2014 dataset, but gave a 95% increase in accuracy in the 2011 and 2013 datasets. The average number of markers used as fixed effects in the LS model was five (2013) and four (2011 and 2014). The most significant marker explained 9.2, 8.1, and 13.1% of the trait variation in the 2011, 2013, and 2014 datasets, respectively. Genomic prediction models did slightly better than RKHS-P in all the datasets (16% average increase in accuracy). In the combined dataset, the models that gave the highest accuracies were RKHS-MP in both the 2011 and 2013 datasets, and BB and BL in the 2014 dataset. The LS gave the lowest accuracies in all the datasets, and the increase in accuracy using genome-wide marker-based models was 71%. The average number of markers used as fixed effects in the LS model was three, five, and six, explaining 7.2, 9.1, and 10% of the trait variation in the 2011, 2013, and 2014 datasets, respectively.
The RKHS-P model performed similarly to the genomic prediction models in the 2011 and the 2013 datasets, but gave slightly lower accuracy in the 2014 dataset. Similar accuracies were obtained using the GBLUP, BRR, BA, BB, BC, BL, and RKHS-M models in the individual and combined nurseries across all the years. For the 2014 dataset, we also evaluated all the models without stripe rust as a covariate. This resulted in higher accuracies due to a large-effect stripe rust resistance locus that was highly significant in all the folds and explained >20% of the variation. Using stripe rust as a covariate accounted for this, and the marker associated with stripe rust was no longer significant.


Table 1.

Prediction accuracies for adult plant resistance (APR) to Septoria tritici blotch (STB) using different models in the 45th and 46th International Bread Wheat Screening Nurseries.

 
Model† | STB APR 2011, 45th | STB APR 2011, 46th | STB APR 2011, 45th and 46th | STB APR 2013, 45th | STB APR 2013, 46th | STB APR 2013, 45th and 46th | STB APR 2014, 45th | STB APR 2014, 46th | STB APR 2014, 45th and 46th
LS | 0.24 ± 0.05 | 0.28 ± 0.06 | 0.22 ± 0.04 | 0.34 ± 0.05 | 0.19 ± 0.04 | 0.34 ± 0.05 | 0.20 ± 0.05 | 0.35 ± 0.05 | 0.28 ± 0.05
GBLUP | 0.53 ± 0.04 | 0.43 ± 0.06 | 0.49 ± 0.04 | 0.46 ± 0.05 | 0.45 ± 0.04 | 0.49 ± 0.04 | 0.38 ± 0.07 | 0.39 ± 0.06 | 0.41 ± 0.04
BRR | 0.53 ± 0.04 | 0.43 ± 0.06 | 0.48 ± 0.04 | 0.46 ± 0.05 | 0.45 ± 0.04 | 0.49 ± 0.04 | 0.37 ± 0.08 | 0.40 ± 0.06 | 0.40 ± 0.04
BA | 0.53 ± 0.04 | 0.43 ± 0.07 | 0.49 ± 0.04 | 0.46 ± 0.06 | 0.47 ± 0.03 | 0.49 ± 0.04 | 0.38 ± 0.07 | 0.42 ± 0.06 | 0.41 ± 0.03
BB | 0.53 ± 0.04 | 0.44 ± 0.07 | 0.49 ± 0.04 | 0.47 ± 0.06 | 0.48 ± 0.03 | 0.49 ± 0.04 | 0.37 ± 0.08 | 0.41 ± 0.06 | 0.42 ± 0.04
BC | 0.53 ± 0.04 | 0.44 ± 0.06 | 0.49 ± 0.04 | 0.46 ± 0.05 | 0.46 ± 0.03 | 0.49 ± 0.04 | 0.39 ± 0.08 | 0.39 ± 0.06 | 0.41 ± 0.04
BL | 0.51 ± 0.05 | 0.44 ± 0.06 | 0.48 ± 0.04 | 0.45 ± 0.05 | 0.46 ± 0.03 | 0.49 ± 0.04 | 0.38 ± 0.08 | 0.40 ± 0.06 | 0.42 ± 0.03
RKHS-M | 0.53 ± 0.04 | 0.42 ± 0.06 | 0.49 ± 0.04 | 0.45 ± 0.05 | 0.46 ± 0.04 | 0.49 ± 0.04 | 0.39 ± 0.07 | 0.38 ± 0.06 | 0.41 ± 0.04
RKHS-P | 0.54 ± 0.04 | 0.37 ± 0.05 | 0.48 ± 0.04 | 0.51 ± 0.04 | 0.41 ± 0.04 | 0.46 ± 0.04 | 0.37 ± 0.06 | 0.31 ± 0.04 | 0.34 ± 0.03
RKHS-MP | 0.57 ± 0.03 | 0.42 ± 0.06 | 0.52 ± 0.03 | 0.51 ± 0.05 | 0.47 ± 0.04 | 0.50 ± 0.04 | 0.40 ± 0.07 | 0.39 ± 0.06 | 0.39 ± 0.03
† BA, Bayes A; BB, Bayes B; BC, Bayes Cπ; BL, Bayesian least absolute shrinkage and selection operator; BRR, Bayesian ridge regression; GBLUP, genomic best linear unbiased prediction; LS, least-squares; RKHS-M, reproducing kernel Hilbert spaces-markers; RKHS-P, reproducing kernel Hilbert spaces-pedigree; RKHS-MP, reproducing kernel Hilbert spaces-markers and pedigree.
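
The 82% increase quoted for the 45th IBWSN can be approximated from the Table 1 values. The sketch below is one possible reading, not a calculation stated in the article: it assumes that the gain is computed per dataset as the mean accuracy of the seven genome-wide marker based models (GBLUP, BRR, BA, BB, BC, BL, RKHS-M) relative to LS, and that the three yearly gains are then averaged. The function and variable names are illustrative only.

```python
from statistics import mean

# Accuracies copied from the 45th IBWSN columns of Table 1.
LS = {"2011": 0.24, "2013": 0.34, "2014": 0.20}

# Marker-based models in the order GBLUP, BRR, BA, BB, BC, BL, RKHS-M.
MARKER_MODELS = {
    "2011": [0.53, 0.53, 0.53, 0.53, 0.53, 0.51, 0.53],
    "2013": [0.46, 0.46, 0.46, 0.47, 0.46, 0.45, 0.45],
    "2014": [0.38, 0.37, 0.38, 0.37, 0.39, 0.38, 0.39],
}


def percent_increase(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


def mean_percent_increase(ls, marker_models):
    """Average the per-dataset gains of the mean marker-model accuracy over LS.

    Assumption: gains are computed within each dataset and then averaged;
    the article does not spell out its averaging scheme.
    """
    gains = [
        percent_increase(ls[year], mean(marker_models[year]))
        for year in ls
    ]
    return mean(gains)


if __name__ == "__main__":
    print(round(mean_percent_increase(LS, MARKER_MODELS), 1))
```

Under these assumptions the result is about 81.5%, close to the reported 82%. Taking the ratio of the overall means instead gives a somewhat lower value, so the per-dataset averaging appears to be the closer match.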

Prediction Accuracies for Stagonospora nodorum Blotch Seedling Resistance

For SNB seedling resistance, the RKHS-MP model performed the best in both nurseries and in the combined dataset (Table 2). Genome-wide prediction models resulted in a 36% increase in accuracy over LS in the 45th IBWSN and the combined dataset, but gave accuracies similar to LS in the 46th IBWSN. The average number of markers used as fixed effects in the LS model was five in all the datasets. The most significant marker explained 15, 17, and 16% of the variation in the 45th IBWSN, 46th IBWSN, and the combined datasets, respectively. Similar accuracies were obtained with the pedigree model (RKHS-P) and all genome-wide marker based models (GBLUP, BRR, BA, BB, BC, BL, and RKHS-M).


Table 2.

Prediction accuracies for Stagonospora nodorum blotch (SNB) and tan spot (TS) using different models in the 45th and 46th International Bread Wheat Screening Nurseries.

 
Model† | SNB (seedling), 45th | SNB (seedling), 46th | SNB (seedling), 45th and 46th | TS (seedling), 45th | TS (seedling), 46th | TS (seedling), 45th and 46th | TS (APR), 45th | TS (APR), 46th | TS (APR), 45th and 46th
LS | 0.43 ± 0.05 | 0.45 ± 0.04 | 0.43 ± 0.04 | 0.51 ± 0.05 | 0.43 ± 0.04 | 0.41 ± 0.03 | 0.28 ± 0.06 | 0.34 ± 0.04 | 0.35 ± 0.03
GBLUP | 0.57 ± 0.04 | 0.49 ± 0.04 | 0.60 ± 0.02 | 0.76 ± 0.02 | 0.57 ± 0.03 | 0.66 ± 0.02 | 0.47 ± 0.05 | 0.42 ± 0.06 | 0.56 ± 0.03
BRR | 0.58 ± 0.04 | 0.49 ± 0.04 | 0.60 ± 0.02 | 0.76 ± 0.02 | 0.56 ± 0.03 | 0.66 ± 0.02 | 0.48 ± 0.05 | 0.40 ± 0.06 | 0.56 ± 0.03
BA | 0.57 ± 0.04 | 0.49 ± 0.04 | 0.60 ± 0.02 | 0.76 ± 0.03 | 0.56 ± 0.04 | 0.66 ± 0.02 | 0.48 ± 0.05 | 0.41 ± 0.06 | 0.56 ± 0.03
BB | 0.57 ± 0.04 | 0.49 ± 0.04 | 0.59 ± 0.02 | 0.76 ± 0.02 | 0.56 ± 0.03 | 0.66 ± 0.02 | 0.47 ± 0.05 | 0.41 ± 0.06 | 0.56 ± 0.03
BC | 0.58 ± 0.04 | 0.49 ± 0.04 | 0.59 ± 0.02 | 0.76 ± 0.02 | 0.56 ± 0.03 | 0.66 ± 0.02 | 0.48 ± 0.05 | 0.41 ± 0.06 | 0.56 ± 0.03
BL | 0.57 ± 0.04 | 0.49 ± 0.04 | 0.60 ± 0.02 | 0.75 ± 0.02 | 0.56 ± 0.03 | 0.66 ± 0.02 | 0.48 ± 0.05 | 0.42 ± 0.06 | 0.56 ± 0.04
RKHS-M | 0.58 ± 0.04 | 0.49 ± 0.04 | 0.59 ± 0.03 | 0.75 ± 0.02 | 0.56 ± 0.02 | 0.66 ± 0.02 | 0.47 ± 0.05 | 0.41 ± 0.06 | 0.56 ± 0.03
RKHS-P | 0.55 ± 0.04 | 0.49 ± 0.03 | 0.60 ± 0.02 | 0.65 ± 0.03 | 0.55 ± 0.04 | 0.62 ± 0.03 | 0.46 ± 0.04 | 0.38 ± 0.04 | 0.52 ± 0.03
RKHS-MP | 0.59 ± 0.03 | 0.52 ± 0.03 | 0.63 ± 0.02 | 0.77 ± 0.03 | 0.58 ± 0.04 | 0.68 ± 0.02 | 0.52 ± 0.05 | 0.40 ± 0.05 | 0.57 ± 0.03
† APR, adult plant resistance; BA, Bayes A; BB, Bayes B; BC, Bayes Cπ; BL, Bayesian least absolute shrinkage and selection operator; BRR, Bayesian ridge regression; GBLUP, genomic best linear unbiased prediction; LS, least-squares; RKHS-M, reproducing kernel Hilbert spaces-markers; RKHS-P, reproducing kernel Hilbert spaces-pedigree; RKHS-MP, reproducing kernel Hilbert spaces-markers and pedigree.

Prediction Accuracies for Tan Spot Seedling Resistance

For TS seedling resistance, LS gave the lowest accuracies in all the datasets and the RKHS-MP model gave slightly higher accuracies than the other models (although it was not statistically different from the other genome-wide marker based models; Table 2). The increase in accuracy using genomic prediction models over LS was 48%. The average number of markers used as fixed effects in the LS model was four (45th IBWSN) and three (46th IBWSN and combined dataset). The most significant marker explained 23, 18, and 17% of the variation in the 45th IBWSN, 46th IBWSN, and the combined datasets, respectively. The RKHS-P model performed similarly to the genome-wide marker based models in the 46th IBWSN and in the combined dataset. But in the 45th IBWSN, genomic prediction models performed slightly better than the pedigree model (15.4% increase in accuracy). The accuracies obtained using the GBLUP, BRR, BA, BB, BC, BL, and RKHS-M models were similar.

Prediction Accuracies for Tan Spot Adult Plant Resistance

For TS APR, the RKHS-MP model gave the highest accuracy in the 45th IBWSN and the combined dataset, but the GBLUP and BL models gave the highest accuracy in the 46th IBWSN. LS gave the lowest accuracies, and the increase in accuracy using genome-wide markers was 50%. The average number of markers used as fixed effects in the LS model was seven (45th IBWSN), one (46th IBWSN), and two (combined dataset). The most significant marker explained 11, 15, and 16% of the variation in the 45th IBWSN, the 46th IBWSN, and the combined dataset, respectively. There were no significant differences in the accuracies obtained from GBLUP, BRR, BA, BB, BC, BL, RKHS-M, and RKHS-P.

Comparisons between the Prediction Models

Overall, RKHS-MP gave slightly higher accuracies than the other models and LS gave the lowest, across all the datasets. The accuracies obtained using the GBLUP, BRR, BA, BB, BC, BL, and RKHS-M models were similar. The RKHS-P model accuracies were not significantly different from those of the genome-wide marker based models. The average accuracies obtained using LS, genomic prediction models, RKHS-P, and RKHS-MP were 0.27, 0.45, 0.42, and 0.46, respectively, for STB; 0.44, 0.55, 0.55, and 0.58, respectively, for SNB; 0.45, 0.66, 0.61, and 0.68, respectively, for TS (seedling); and 0.32, 0.48, 0.45, and 0.50, respectively, for TS (APR). A cluster dendrogram showing the hierarchical clustering of the prediction models based on cross-validated BVs (Fig. 7) makes it clear that LS was different from the other models and branched out separately. The RKHS-P and RKHS-MP models clustered together. The BL, BA, BB, BC, BRR, GBLUP, and RKHS-M models clustered together. The Spearman’s rank correlations between BVs for all the pairs of models (Table 3) show that the BVs obtained from LS and RKHS-P had a moderate correlation with the BVs obtained from the genome-wide marker based models (0.57 and 0.68, respectively). The correlations among the BVs obtained from the other genome-wide marker based models were close to unity.

Fig. 7.

Cluster dendrogram showing the hierarchical clustering of the prediction models based on cross-validated estimated breeding values (BVs). RKHS, reproducing kernel Hilbert space; BLUP, best linear unbiased predictor; LASSO, least absolute shrinkage and selection operator.

 

Table 3.

Spearman’s rank correlations between estimated breeding values (BVs) for all the pairs of models.†

 
Model | LS | GBLUP | BRR | BA | BB | BC | BL | RKHS-M | RKHS-P | RKHS-MP
LS | 1.00 | 0.57 | 0.57 | 0.55 | 0.56 | 0.56 | 0.57 | 0.57 | 0.46 | 0.54
GBLUP | 0.57 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | 0.99 | 1.00 | 0.68 | 0.91
BRR | 0.57 | 1.00 | 1.00 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.68 | 0.91
BA | 0.55 | 0.99 | 0.99 | 1.00 | 0.99 | 0.99 | 0.99 | 0.99 | 0.68 | 0.91
BB | 0.56 | 1.00 | 0.99 | 0.99 | 1.00 | 0.99 | 0.99 | 0.99 | 0.68 | 0.91
BC | 0.56 | 1.00 | 0.99 | 0.99 | 0.99 | 1.00 | 0.99 | 0.99 | 0.68 | 0.91
BL | 0.57 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 1.00 | 0.99 | 0.68 | 0.91
RKHS-M | 0.57 | 1.00 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 1.00 | 0.68 | 0.91
RKHS-P | 0.46 | 0.68 | 0.68 | 0.68 | 0.68 | 0.68 | 0.68 | 0.68 | 1.00 | 0.87
RKHS-MP | 0.54 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.91 | 0.87 | 1.00
† BA, Bayes A; BB, Bayes B; BC, Bayes Cπ; BL, Bayesian least absolute shrinkage and selection operator; BRR, Bayesian ridge regression; GBLUP, genomic best linear unbiased prediction; LS, least-squares; RKHS-M, reproducing kernel Hilbert spaces-markers; RKHS-P, reproducing kernel Hilbert spaces-pedigree; RKHS-MP, reproducing kernel Hilbert spaces-markers and pedigree.
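
Each entry of Table 3 is a Spearman’s rank correlation between the BVs estimated by two models. As a minimal sketch of that statistic, the pure-Python code below ranks two BV vectors (tied values share the mean rank) and then computes the Pearson correlation of the ranks. The BV values are hypothetical toy numbers chosen for illustration, not data from this study, and the function names are not taken from the article.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical BVs for five lines from three models (illustration only).
bv_gblup = [0.8, -0.3, 1.2, 0.1, -0.9]
bv_brr = [0.7, -0.2, 1.1, 0.2, -1.0]   # same ranking as bv_gblup
bv_ls = [0.1, -0.3, 1.2, 0.8, -0.9]    # two lines swap rank

if __name__ == "__main__":
    print(spearman(bv_gblup, bv_brr))
    print(spearman(bv_gblup, bv_ls))
```

In this toy case the two shrinkage-type models rank the lines identically, so their correlation is 1, whereas swapping the ranks of two lines in the LS vector lowers the correlation to 0.9. The near-unity entries among the marker-based models in Table 3 reflect the same situation: the models rank the lines almost identically.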

Comparisons between Two Whole-Genome Profiling Approaches for Genomic Prediction

In the STB APR 2011 dataset, GBS markers, DArTseq markers, and the combined marker set gave similar prediction accuracies using both LS and the genome-wide marker based models (Fig. 8). In the STB APR 2013 dataset, GBS markers performed slightly better than the DArTseq markers with both LS (36% increase in accuracy) and genome-wide marker based models (21% increase in accuracy). The accuracies obtained using the combined marker set were similar to the accuracies obtained from the GBS markers. In the case of the STB APR 2014 dataset, GBS markers, DArTseq markers, and the combined marker set gave similar accuracies with LS. But with the genome-wide marker based models, GBS markers gave a 36% increase in accuracy over DArTseq markers. The accuracies obtained using the combined marker set were not significantly different from the accuracies obtained from GBS markers. For SNB seedling resistance, GBS markers performed slightly better than the DArTseq markers with both LS (34% increase in accuracy) and genome-wide marker based models (36% increase in accuracy). In the case of TS, using LS, DArTseq markers performed slightly better for seedling resistance (18% increase in accuracy) and GBS markers performed slightly better for APR (33% increase in accuracy). But with the genome-wide marker based models, GBS markers outperformed the DArTseq markers for both seedling resistance and APR (33.3 and 51.6% increase in accuracy, respectively). The combined marker set had slightly lower accuracies than the GBS markers with the genome-wide marker based models for both seedling resistance and APR, but the differences were not significant.

Fig. 8.

Prediction accuracies for Septoria tritici blotch, Stagonospora nodorum blotch, and tan spot in the 45th International Bread Wheat Screening Nursery using genotyping by sequencing (GBS), diversity arrays technology-sequencing (DArTseq), and both.

 


Discussion

Among the diseases, the mean genomic prediction accuracies were the highest for seedling resistance to TS (0.66) and SNB (0.55), followed by APR to TS (0.48) and STB (0.45). The same trend was observed with the LS approach: the highest prediction accuracies were obtained for seedling resistance to TS (0.45) and SNB (0.44), followed by APR to TS (0.32) and STB (0.27). These results indicate that genomic prediction models perform better than the LS approach, which is consistent with several previous studies (Meuwissen et al., 2001; Bernardo and Yu, 2007; Habier et al., 2007; Muir, 2007; Piyasatian et al., 2007; Lorenzana and Bernardo, 2009; Moser et al., 2009; Heffner et al., 2011a, 2011b; Rutkoski et al., 2014). The average increase in accuracy using genomic prediction models compared to the LS approach was 48%. This is consistent with several previous reports: 41% (Meuwissen et al., 2001); 18 and 43% for a trait that has high and low heritability, respectively (Bernardo and Yu, 2007); 32% (Piyasatian et al., 2007); and 28% (Heffner et al., 2011a). Bernardo (2014) used simulations to show that the known QTL can be fit as fixed effects only when they explain more than 10% of the genetic variance. In our study, the genetic variance explained by the most significant marker ranged from 7 to 23%. For STB APR, the significant markers used as fixed effects explained <10% of the variation (except in one dataset), which indicates that resistance in these nurseries is quantitative and controlled by many genes with moderate to small effects. However, we also observed that for SNB and TS seedling resistance, where the most significant marker explained >10% of the variation, there were significant differences between the accuracies of LS and GBLUP, indicating that in addition to the large-effect loci, there were also minor loci controlling these traits. These results clearly demonstrate the advantage of using genome-wide markers for complex traits that are controlled by several QTL and support the infinitesimal model of Fisher (1918).

The GBLUP and BRR models resulted in accuracies similar to the other Bayesian models, despite the assumption that all the marker effects have equal variance. The use of different prior distributions for the marker effects in the Bayesian models did not affect the prediction accuracies. This is consistent with several previous studies that report similarities between these models for different traits: RR-BLUP and Bayesian regression (same as BA) for two traits in dairy bulls (Moser et al., 2009); BA, BB, and BC using simulated and real data (Habier et al., 2011); RR-BLUP and BC for quantitative traits in elite North American oats (Asoro et al., 2011); RR, BA, BB, and BC for several traits in wheat (Heffner et al., 2011a); RR-BLUP, BC, and BL for Fusarium head blight resistance in barley (Lorenz et al., 2012); RR and BL for stem and stripe rust resistance in wheat (Ornella et al., 2012); and GBLUP, BC, and BL for stem rust APR in wheat (Rutkoski et al., 2014). However, a few studies have reported slightly higher accuracies with some Bayesian models. Some of the models that showed slight superiority over others are: BA and BB over best linear unbiased predictor (BLUP) in simulations (9 and 16% increase in accuracy, respectively; Meuwissen et al., 2001); BB over GBLUP in simulations (Clark et al., 2011); BC and BA over RR-BLUP and BL for fusiform rust resistance in loblolly pine (Resende et al., 2012); and RR-BLUP and BB over BA and BC models for predicting hybrid wheat performance (Zhao et al., 2013). A few studies have also reported the superiority of GBLUP over models that use different prior distributions, especially when the trait was controlled by a few QTL with large effects (Luan et al., 2009; Zhong et al., 2009; Daetwyler et al., 2010). However, for the traits that we analyzed, the equal-variance assumption still holds, and the differential shrinkage of the Bayesian models, which requires more computation time, might be unnecessary.

The RKHS-M model performed similarly to GBLUP in our study. While some studies (Gianola et al., 2006; Crossa et al., 2010; Howard et al., 2014) have reported that the nonparametric models performed better than the parametric ones, Crossa et al. (2013) concluded that there was no clear superiority of either type of model. An interesting observation was that RKHS-P did very well and markers gave only a 5.6% improvement in overall accuracies. This is consistent with several studies that have reported slight superiority of marker based models over the pedigree (Crossa et al., 2010; Spindel et al., 2015). However, the genomic-based relationship is expected to predict the allele sharing (within-family variation) or the Mendelian sampling with better accuracy (Villanueva et al., 2005; Daetwyler et al., 2007; Goddard and Hayes, 2007). Some of the benefits of using the G-matrix include avoiding selection of closely related sibs (Daetwyler et al., 2007), providing better accuracies when unrelated individuals are involved (van der Werf, 2009), and correcting for pedigree errors (Munoz et al., 2014). The high accuracies obtained with the pedigree in our study may be due to the following: the excellent pedigree recording system at CIMMYT that goes back several generations, small family sizes that have a minimal Mendelian sampling component for markers, and the inclusion of full-sibs in both the training and validation sets. But it should be noted that this resulted from the use of late-generation lines and might not work as well for unselected lines in early generations. We also observed that the RKHS-MP model performed better than the pedigree alone (9.9% increase in accuracy) or markers alone (3.6% increase in accuracy) and gave the highest accuracies for most datasets. This is consistent with several previous studies (de los Campos et al., 2009; Crossa et al., 2010, 2013; Burgueño et al., 2012). Thus, the pedigree in conjunction with molecular markers can enhance the accuracy of selections.

Our comparisons between the LD captured by the two whole-genome profiling approaches, GBS and DArTseq, indicated that they were similar except for a few chromosomes. For predictions using the LS approach, the accuracies were similar for GBS and DArTseq in the STB 2014 dataset, and DArTseq performed better than GBS for TS seedling resistance (17.6% increase in accuracy). But GBS performed slightly better than DArTseq in all the other datasets, resulting in a 34% mean increase in accuracy. Similarly, GBS performed slightly better (28.4% increase in mean accuracy) than DArTseq for all the diseases using genomic prediction models. This is consistent with a previous study by Heslot et al. (2013), who obtained a higher accuracy using GBS markers compared with DArT markers. We attribute our results to the following: (i) The two approaches use different restriction enzymes for complexity reduction. While GBS uses the combination of PstI and MspI (Poland et al., 2012), DArTseq uses two complexity reduction methods with PstI/HpaII and PstI/HhaI followed by selection of a subset of fragments (Sansaloni et al., 2011; Li et al., 2015). (ii) DArTseq is done at a higher sequencing depth and uses strict filtering criteria that generate markers with less missing data compared with GBS. But this could also lead to the loss of some rare informative markers. (iii) Although it was not possible to compare the differences in marker coverage across the genome using the two approaches (because the positions of all the markers were not available), inadequate marker coverage in regions associated with the trait could lead to higher accuracies with one approach over the other. An interesting observation was that the combined marker set with both GBS and DArTseq markers did not improve the prediction accuracies. This might be due to the redundancy in information captured by the two marker platforms. However, our results could be specific to the population used and the traits considered, and might not necessarily hold for other populations and traits.

In conclusion, we have used a range of models including a variable selection method, shrinkage methods, kernel-based methods, and two whole-genome profiling approaches to predict resistance to wheat leaf spotting diseases. Our results clearly indicate that genomic prediction is advantageous over selection based on a few markers, as in marker-assisted selection. While model choice and genotyping approach are key elements for implementing GS, the genetic architecture of the trait, heritability, marker density, LD between the QTL and the markers, training population size, and the relatedness between the individuals in the training and validation populations also play an important role in making decisions (de Roos et al., 2009; Lorenzana and Bernardo, 2009; Luan et al., 2009; Daetwyler et al., 2010; Clark et al., 2011; de los Campos et al., 2013a; Howard et al., 2014). We hope that implementing GS in breeding for complex leaf spotting disease resistance in wheat will result in higher accuracy and rapid gains from selection.

Acknowledgments

We thank Monsanto’s Beachell and Borlaug International Scholars Program for providing support to Miss Juliana and the research. We are also very grateful to Dr. Sukhwinder Singh, Dr. Susanne Dreisigacker, the USAID Feed the Future Innovation Lab at Kansas State University, the CGIAR Research Program WHEAT, and the Seeds of Discovery project for their contributions to the plant materials, phenotyping data, and genotyping data.

 

References
