
The Plant Genome - Article



This article in TPG

  1. Vol. 7 No. 3
    OPEN ACCESS
    Received: Feb 09, 2014
    Published: July 10, 2014

    * Corresponding author(s): mes12@cornell.edu


Genomic Selection for Quantitative Adult Plant Stem Rust Resistance in Wheat

  1. Jessica E. Rutkoski (a),
  2. Jesse A. Poland (b),
  3. Ravi P. Singh (c),
  4. Julio Huerta-Espino (d),
  5. Sridhar Bhavani (e),
  6. Hugues Barbier (a),
  7. Matthew N. Rouse (f),
  8. Jean-Luc Jannink (g) and
  9. Mark E. Sorrells (a)*

    a Dep. of Plant Breeding and Genetics, Cornell Univ., Ithaca, NY 14853
    b Dep. of Plant Pathology, Kansas State Univ., Manhattan, KS 66506
    c International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, 06600 El Batan, Mexico
    d Campo Experimental Valle de México INIFAP, Apdo. Postal 10, 56230 Chapingo, Edo de México, Mexico
    e CIMMYT, ICRAF House, United Nations Avenue, Gigiri, Village Market-00621, Nairobi, Kenya
    f USDA-ARS, Cereal Disease Lab. and Dep. of Plant Pathology, Univ. of Minnesota, St. Paul, MN 55108
    g USDA-ARS, Ithaca, NY 14853


Quantitative adult plant resistance (APR) to stem rust (Puccinia graminis f. sp. tritici) is an important breeding target in wheat (Triticum aestivum L.) and a potential target for genomic selection (GS). To evaluate the relative importance of known APR loci in applying GS, we characterized a set of CIMMYT germplasm at important APR loci and on a genome-wide profile using genotyping-by-sequencing (GBS). Using this germplasm, we describe the genetic architecture and evaluate prediction models for APR using data from the international Ug99 stem rust screening nurseries. Prediction models incorporating markers linked to important APR loci and seedling phenotype scores as fixed effects were evaluated along with the classic prediction models: multiple linear regression (MLR), genomic best linear unbiased prediction (G-BLUP), Bayesian Lasso (BL), and Bayes Cπ (BCπ). We found the Sr2 region to play an important role in APR in this germplasm. A model using Sr2 linked markers as fixed effects in G-BLUP was more accurate than MLR with Sr2 linked markers (p-value = 0.12) and ordinary G-BLUP (p-value = 0.15). Incorporating seedling phenotype information as fixed effects in G-BLUP did not consistently increase accuracy. Overall, levels of prediction accuracy found in this study indicate that GS can be effectively applied to improve stem rust APR in this germplasm, and if genotypes at Sr2 linked markers are available, modeling these genotypes as fixed effects could lead to better predictions.


    APR, adult plant resistance; BCπ, Bayes Cπ; BL, Bayesian Lasso; G-BLUP, genomic best linear unbiased prediction; GBS, genotyping-by-sequencing; GS, genomic selection; MLR, multiple linear regression; QTL, quantitative trait loci; STS, sequence tagged site

Stem rust is a globally widespread and highly damaging disease of wheat, capable of causing up to 100% yield losses in susceptible varieties (Park, 2007). After adoption of resistant varieties during the 1950s, outbreaks of stem rust became rare. However, the recent emergence of a new stem rust race group named Ug99 (Pretorius et al., 2000), capable of infecting the majority of the world's wheat germplasm (Singh et al., 2006), has highlighted the need for breeding efforts focused on durable stem rust resistance.

Resistance to stem rust generally falls into two categories: (i) all stage resistance, which is often conferred by race-specific genes involved in pathogen recognition and associated with a hypersensitive response, and (ii) slow rusting APR, which is quantitative resistance often conferred by multiple loci, and is not associated with a hypersensitive response. Quantitative resistance is usually considered more durable than that conferred by pathogen recognition genes (Parlevliet, 2002); however, it must be improved over multiple cycles of selection using well-managed screening nurseries for evaluation.

Genomic selection (Meuwissen et al., 2001; reviewed by Lorenz et al., 2011, and Heffner et al., 2009) is a breeding technology that may increase rates of genetic gain for quantitative traits. With GS, a genomic prediction model is used to predict breeding values of selection candidates, and selections are made based on these predictions. A model training population consisting of relevant individuals that have been both genotyped and phenotyped is used to calibrate the prediction model.

Various genomic prediction models have been developed. Models differ according to how markers of different effect sizes are treated. Genomic best linear unbiased prediction (Bernardo, 1994; Piepho, 2009) treats markers homogeneously, whereas Bayesian methods such as BL (Park and Casella, 2008) and BCπ (Habier et al., 2011) treat markers of different effect sizes heterogeneously. Such methods are expected to better model traits with large-effect quantitative trait loci (QTL).

Because moderate effect genes, such as Sr2 and Lr34, also known as Sr57, are known to be involved in stem rust APR (Sunderwirth and Roelfs, 1980; Dyck, 1987; Singh et al., 2012), prediction models that attempt to realistically model these loci may be more accurate than a standard G-BLUP model. Markers linked to these loci could be predictive alone or modeled as fixed effects in combination with genome-wide markers. Similarly, seedling resistance phenotypes, which are often collected in addition to APR, could be useful fixed-effects predictor variables. The objective of this study was to compare prediction models for stem rust APR and to determine if explicitly modeling large-effect loci or seedling phenotypes as fixed effects in a G-BLUP model could lead to higher accuracies than those achieved with G-BLUP or Bayesian models.

Materials and Methods

Phenotypic Data

Adult Plant Stage

Three hundred sixty-five advanced CIMMYT breeding lines were used in all analyses. Quantitative stem rust APR was phenotyped at the international Ug99 stem rust screening nurseries: Kenya Agricultural Research Institute, Njoro, Kenya, and the Ethiopian Institute of Agricultural Research, Debre Zeit, Ethiopia, between 2007 and 2012, as described in Yu et al. (2011). Data were from 12 environments (location and season combinations), three of which were at Debre Zeit. Kingbird and PBW343 served as moderately resistant and moderately susceptible check cultivars. Each breeding line, excluding the checks, appeared in approximately four of the 12 environments, and appeared only once per environment. Each plot consisted of two 70 cm rows spaced 30 cm apart. Disease severity was measured visually on a modified Cobb scale (Peterson et al., 1948). Measurements were taken between the early and late dough stage and a week to 10 d later. Phenotypic distributions within environments are shown in Fig. 1. A Box-Cox transformation was applied before all analyses (Box and Cox, 1964) to avoid nonnormal residuals.

Figure 1.

Phenotypic distributions of stem rust severity within each environment. OS, off-season; MS, main-season.


Seedling Stage

Lines were evaluated at the seedling stage for reaction to Ug99 stem rust race TTKSK, isolate 04KEN156/04, at the USDA-ARS Cereal Disease Laboratory using cool and normal post-inoculation temperature treatments. Seedlings were inoculated as in Jin et al. (2007) and then placed in a growth chamber with a 14 h photoperiod at 18°C day and 15°C night for the cool treatment and 22°C day and 19°C night for the normal treatment. Seedling evaluations at both cool and normal treatments were replicated twice. Infection types on a zero to four scale as in Stakman et al. (1962) were recorded 14 d postinoculation and then converted into a numerical value from zero to nine as described by Zhang et al. (2011). Stakman infection types ≥ 3 were considered high infection types. Infection type “; ” describes the observation of visible chlorotic spots associated with hypersensitive resistance. When multiple infection types were observed on a single leaf, all infection types were recorded starting with the most commonly observed infection type.

Heritability Estimation

Broad sense heritability (H²) on a line mean basis was calculated according to Hallauer et al. (2010). Variance components were estimated in R v. 3.0.1 (R Development Core Team, 2010) using the package lme4 (Bates and Maechler, 2010).
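As a rough illustration of a line-mean heritability calculation, the sketch below assumes the simplified form H² = σ²g / (σ²g + σ²e / e), where σ²g and σ²e are the genotypic and residual variance components and e is the average number of environments per line. The exact expression from Hallauer et al. (2010) is not reproduced in the text, so this form, the function name, and the example variance components are assumptions for illustration only.

```python
def line_mean_heritability(var_g, var_e, n_env):
    """Broad-sense heritability on a line-mean basis (simplified form).

    var_g: genotypic variance component
    var_e: residual variance component (confounded with genotype x
           environment when each line appears once per environment)
    n_env: average number of environments in which a line was observed
    """
    if var_g < 0 or var_e < 0 or n_env <= 0:
        raise ValueError("variance components must be >= 0 and n_env > 0")
    return var_g / (var_g + var_e / n_env)


# Hypothetical variance components, not values from the study.
h2 = line_mean_heritability(var_g=1.0, var_e=1.0, n_env=4)
print(round(h2, 3))
```

Because the residual variance is divided by the number of environments, lines observed in more environments give a higher line-mean heritability for the same variance components.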

Genotypic Data

Genome-Wide Genotyping

Genotyping-by-sequencing (Elshire et al., 2011) was used to generate genome-wide markers according to the protocol described in Poland et al. (2012a). A total of 27,434 polymorphisms were detected. Missing data were imputed using random forest imputation described in Poland et al. (2012b) as recommended by Rutkoski et al. (2013). Markers with >50% missing data were removed and a set of nonredundant GBS markers with pairwise r² values < 0.8 were selected (Carlson et al., 2004), leaving 4040 markers.
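The two filtering steps described above can be sketched as follows. This is a minimal illustration using a simple greedy pruning rule, which is not necessarily the procedure of Carlson et al. (2004); the function names, marker names, and toy genotype calls are hypothetical.

```python
def missing_rate(calls):
    """Fraction of missing genotype calls (None) for one marker."""
    return sum(call is None for call in calls) / len(calls)


def r_squared(a, b):
    """Squared Pearson correlation between two complete genotype vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic marker: r2 is undefined, treated as 0 here
    return cov * cov / (var_a * var_b)


def filter_markers(raw, imputed, max_missing=0.5, r2_max=0.8):
    """Return names of markers passing the missing-data and redundancy filters.

    raw:     dict of marker name -> calls before imputation (None = missing)
    imputed: dict of marker name -> complete calls after imputation
    """
    kept = []
    for name, calls in raw.items():
        if missing_rate(calls) > max_missing:
            continue  # too much missing data before imputation
        # Keep the marker only if it is not redundant with any kept marker.
        if all(r_squared(imputed[name], imputed[k]) < r2_max for k in kept):
            kept.append(name)
    return kept


# Toy data (hypothetical marker names and calls, coded 0/2 for inbred lines).
raw = {
    "m1": [0, 2, None, 2],
    "m2": [0, 2, 0, 2],
    "m3": [None, None, None, 2],
    "m4": [2, 0, 0, 2],
}
imputed = {
    "m1": [0, 2, 0, 2],
    "m2": [0, 2, 0, 2],
    "m3": [0, 0, 0, 2],
    "m4": [2, 0, 0, 2],
}
kept_markers = filter_markers(raw, imputed)
print(kept_markers)
```

In this toy example, m3 is dropped for missing data and m2 is dropped because it is fully redundant with m1. A greedy rule like this depends on marker order, so a different ordering can retain a different, equally nonredundant set.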

Loci Targeted Genotyping

Markers targeted to Sr2 and Lr34 were genotyped using sequence tagged site (STS), simple sequence repeat, and KASPar (www.lgcgenomics.com) assays. All KASPar assays were run at the Eastern Regional Small Grains Genotyping Laboratory, Raleigh, NC. For Lr34, two gene based KASPar assays were used to determine presence or absence of the resistance allele based on sequence polymorphism reported by Lagudah et al. (2009). The STS marker csLV34 (Lagudah et al., 2006), 0.4 cM from Lr34, was also assayed. For Sr2, the simple sequence repeat marker gwm533 (Spielmeyer et al., 2003), the STS marker csSr2 (Mago et al., 2011), and a KASPar assay based on the polymorphism targeted by csSr2 (referred to as csSr2_KASPar) were used.

Genotypic Value Estimation

The R package rrBLUP (Endelman, 2011) was used to calculate the restricted maximum likelihood (REML) solutions for the mixed model

$$Y = X\beta + Zu + \varepsilon$$

where Y is the vector of phenotypes, β is the vector of environment effects treated as fixed, u is the vector of genotype effects treated as random, X and Z are the design matrices relating β and u to the observations in Y, and ε is the vector of residual errors. Genetic values, u, were deregressed according to Garrick et al. (2009). Deregressed genetic values, YGV, were calculated as

$$Y_{GV} = \frac{\hat{u}}{1 - \mathrm{PEV}/\sigma_u^2}$$

where $\sigma_u^2$ is the genetic variance and PEV is a vector of prediction error variances. Solutions for both $\sigma_u^2$ and PEV were returned from the mixed model fit using rrBLUP. Deregressed genetic values, YGV, were used to validate prediction models. Deregression was appropriate because individuals had different numbers of observations. Genetic values for individuals with few observations are shrunk more towards zero than genetic values of individuals with many observations.
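To make the effect of deregression concrete, the sketch below assumes the common reliability-based form, in which each predicted genetic value is divided by its reliability, 1 − PEV/σ²u. The function name and the numbers are hypothetical and are not taken from the study or from the rrBLUP package.

```python
def deregress(u_hat, pev, var_u):
    """Divide each predicted genetic value by its reliability.

    u_hat: predicted (shrunken) genetic values
    pev:   prediction error variance for each individual
    var_u: genetic variance
    Assumes reliability = 1 - PEV / var_u.
    """
    values = []
    for u, p in zip(u_hat, pev):
        reliability = 1.0 - p / var_u
        if reliability <= 0:
            raise ValueError("PEV must be smaller than the genetic variance")
        values.append(u / reliability)
    return values


# Two hypothetical lines with the same shrunken value but different PEV:
# the second line has fewer observations, hence a larger PEV.
y_gv = deregress(u_hat=[0.4, 0.4], pev=[0.2, 0.6], var_u=1.0)
print([round(v, 3) for v in y_gv])
```

The line with the larger prediction error variance is scaled up more, which undoes the stronger shrinkage applied to lines with few observations.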

Genome-Wide Association

Genome-wide association was performed using a mixed model accounting for kinship (Yu et al., 2006). According to Kang et al. (2010), variance components were estimated once by fitting the mixed model:

$$Y_{GV} = \beta_0 + u + \varepsilon$$

where $u \sim N(0, G\sigma_u^2)$ and $\varepsilon \sim N(0, I\sigma_\varepsilon^2)$. I is an identity matrix and G is a marker relationship matrix which was calculated according to VanRaden (2008), implemented in the R package GAPIT (Lipka et al., 2012). For each marker k with MAF ≥ 0.05, a total of 3903 markers, we estimated its effect βk and F-statistic, testing the null hypothesis that βk = 0, in the model:

$$Y_{GV} = \beta_0 + X_k\beta_k + u + \varepsilon$$

βk is the effect of marker k, Xk is the marker genotype matrix of marker k, and $u \sim N(0, G\hat{\sigma}_u^2)$, with the variance components held at their estimates from the first model. One thousand permutations (Churchill and Doerge, 1994) were used to calculate the p-value significance threshold at an experimentwise α of 0.05.
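The permutation approach to an experimentwise threshold can be sketched as below. This is a simplified illustration, not the analysis used in the study: it uses the absolute marker-phenotype correlation as the test statistic and ignores kinship, whereas the study used a mixed-model F-statistic. All names and the toy data are hypothetical.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=1):
    """Experimentwise threshold: the (1 - alpha) quantile of the maximum
    test statistic across all markers under permuted phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        max_stats.append(max(abs_corr(m, shuffled) for m in markers))
    max_stats.sort()
    index = min(int((1 - alpha) * n_perm), n_perm - 1)
    return max_stats[index]


# Toy data: 5 hypothetical markers scored on 30 lines.
data_rng = random.Random(42)
markers = [[data_rng.choice([0, 2]) for _ in range(30)] for _ in range(5)]
phenotypes = [data_rng.gauss(0, 1) for _ in range(30)]
threshold = permutation_threshold(markers, phenotypes, n_perm=200)
print(round(threshold, 3))
```

Taking the maximum statistic over all markers in each permutation is what makes the threshold experimentwise rather than per-marker. An equivalent procedure on p-values would record the minimum p-value per permutation and take the α quantile.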

Prediction Models

Fixed Effects Models

Two MLR methods were used, A and B. MLR A consisted of a marker selection and marker effect estimation step. Both marker selection and marker estimation were performed within the model training set only. For variable selection, p-values from a genome-wide association analysis were used to rank markers. No kinship correction was used because markers that capture kinship are useful for prediction within the population of interest, even though they may not be linked to causative loci. Then, for each iteration i = 1 through l, a marker was added to the model:

$$Y_{GV} = \beta_0 + \sum_{i=1}^{l} X_i\beta_i + \varepsilon$$

where β0 is the mean, βi is the effect of marker i, and Xi is the marker genotype matrix of marker i. After each iteration, the fivefold cross validation accuracy was calculated within the training set and when Accuracy(l–1) > Accuracy(l), the model with l – 1 markers was selected. Predicted breeding values of an individual j were calculated as

$$\hat{Y}_j = \hat{\beta}_0 + \sum_{i=1}^{l-1} X_{ij}\hat{\beta}_i$$

For MLR B, the marker selection step was done only among the five markers linked to candidate genes.
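The stopping rule used for marker selection can be sketched as a small forward-selection loop. The `cv_accuracy` callback stands in for the fivefold cross validation within the training set and is a hypothetical placeholder, as are the marker names and accuracy values in the toy example.

```python
def select_markers(ranked_markers, cv_accuracy):
    """Add markers in rank order until cross-validation accuracy drops.

    ranked_markers: marker names sorted from most to least significant
    cv_accuracy:    function mapping a list of markers to an accuracy
    Returns the selected markers and the accuracy of that model.
    """
    selected = []
    best = None
    for marker in ranked_markers:
        candidate = selected + [marker]
        accuracy = cv_accuracy(candidate)
        if best is not None and best > accuracy:
            break  # previous model was more accurate: keep l - 1 markers
        selected, best = candidate, accuracy
    return selected, best


# Toy accuracy lookup keyed by the number of markers in the model.
toy_accuracy = {1: 0.40, 2: 0.45, 3: 0.44, 4: 0.50}
chosen, chosen_accuracy = select_markers(
    ["m1", "m2", "m3", "m4"],
    lambda model: toy_accuracy[len(model)],
)
print(chosen, chosen_accuracy)
```

Because the loop stops at the first drop in accuracy, it never evaluates the four-marker model in this example, even though that model would have scored higher. This is a property of the greedy rule as described, not of the sketch.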

Mixed Models

For G-BLUP (Bernardo, 1994; Piepho, 2009), breeding values were predicted using the mixed model

$$Y_{GV} = \beta_0 + u + \varepsilon$$

where the solutions for u consist of the genomic estimated breeding values. G-BLUP was implemented using the R package rrBLUP (Endelman, 2011). G-BLUP A was a version of G-BLUP that included selected markers as fixed effects in the G-BLUP model and all markers as random effects. By selecting markers as fixed effects, we assume that each selected marker has a unique variance. For fixed effect variable selection, p-values from a genome-wide association analysis without structure correction were used to rank markers. Then, for each iteration i = 1 through l, a marker was added to the model

$$Y_{GV} = \beta_0 + \sum_{i=1}^{l} X_i\beta_i + u + \varepsilon$$

and the fivefold cross validation accuracy within the training set was calculated. When Accuracy(l–1) > Accuracy(l), the model with l – 1 fixed effect markers was selected. Predicted breeding values of each individual j were calculated as

$$\hat{Y}_j = \hat{\beta}_0 + \sum_{i=1}^{l-1} X_{ij}\hat{\beta}_i + \hat{u}_j$$

For G-BLUP B, the fixed effect marker selection step was done only among the five markers linked to candidate genes. For G-BLUP T, the fixed effects were the seedling phenotypes for the normal and cool treatments.
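The marker relationship matrix that underlies G-BLUP can be sketched as follows, assuming the first method of VanRaden (2008): markers are coded as allele counts (0, 1, 2), centred by twice the allele frequency, and the cross-product is scaled by 2Σp(1 − p). The function name and toy genotypes are hypothetical, and this is not the GAPIT or rrBLUP implementation.

```python
def vanraden_g(genotypes):
    """Marker relationship matrix from allele counts.

    genotypes: list of individuals, each a list of allele counts (0, 1, 2)
    Returns an n x n matrix as a list of lists.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each marker.
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    # Centre each marker by twice its allele frequency.
    z = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
        for i in range(n)
    ]


# Three hypothetical individuals scored at two markers.
g_matrix = vanraden_g([[0, 2], [2, 0], [1, 1]])
for row in g_matrix:
    print(row)
```

In this toy example the first two individuals carry opposite genotypes at both markers, so their off-diagonal relationship is negative, while the third individual sits at the population mean and has zero relationship with everyone.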

Bayesian Models

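The general model for BL (Park and Casella, 2008) and BCπ was:

$$Y_{GV} = \beta_0 + X\beta + \varepsilon$$

where β0 is the mean, X is a design matrix for the markers, and β is a vector of m marker effects. Predicted breeding values were estimated as:

$$\hat{Y} = \hat{\beta}_0 + X\hat{\beta}$$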

For BL, the marginal prior of marker effects was a double exponential (Pérez et al., 2010). Bayesian Lasso was implemented in the R package BLR (de los Campos and Perez Rodriguez, 2010). For BCπ (Habier et al., 2011), the prior for βi depends on a common marker variance and the prior probability, π, that marker i has no effect. The priors and prior parameters were as described in Habier et al. (2011). BCπ was implemented in R using code adapted from R.L. Fernando (personal communication, 2010). For both BL and BCπ, a total of 60,000 iterations were used and the first 20,000 were excluded as burn-in.

Prediction Model Accuracy Calculation

Prediction accuracies were calculated using 10-fold cross validation. Cross validation folds were selected to be representative samples using cluster assignment information from hierarchical agglomerative clustering (Fraley and Raftery, 2002) implemented using the R package ‘mclust’ (Fraley et al., 2012). One accuracy value was computed for each model by computing the Pearson’s correlation (r) between the deregressed genetic values YGV and the predicted breeding values. Accuracies were computed using two different marker sets: GBS markers only, and all available markers. In addition to accuracy, Spearman’s rank correlations between the estimated breeding values for all possible pairs of prediction models was computed to compare prediction model outcomes.
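One way to build representative folds from cluster assignments, and to compute accuracy as a Pearson correlation, is sketched below. The text does not state exactly how cluster information was used to form folds, so the round-robin assignment here is an assumption; function names and toy data are hypothetical.

```python
def assign_folds(cluster_labels, n_folds=10):
    """Spread the members of each cluster across folds in round-robin order."""
    order = sorted(range(len(cluster_labels)), key=lambda i: cluster_labels[i])
    folds = [0] * len(cluster_labels)
    for position, index in enumerate(order):
        folds[index] = position % n_folds
    return folds


def pearson_r(x, y):
    """Pearson correlation between observed and predicted values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


# Toy example: 6 lines in cluster "A" and 4 in cluster "B", split into 2 folds.
labels = ["A"] * 6 + ["B"] * 4
folds = assign_folds(labels, n_folds=2)
print(folds)

# Accuracy for hypothetical observed and predicted values.
accuracy = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(accuracy)
```

Sorting by cluster before dealing individuals into folds keeps each cluster represented in roughly equal proportion in every fold, so that no fold is dominated by a single family.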

Significance Testing among Prediction Model Accuracies

Statistical significance of differences between prediction model accuracies was determined using paired, two-sided t tests performed by bootstrapping. The inference space for model comparison was CIMMYT spring wheat absent of major genes effective against stem rust race TTKST, evaluated for stem rust APR between 2007 and 2012, and identified as candidates for release to international partners, a population of about 500 lines. The set of 365 individuals from that population was randomly split into a training set of 265 individuals and a validation set of 100 individuals. Then, for each iteration, bootstrapped samples of the training and validation sets were drawn. To simulate the sampling variability of polymorphisms detected using GBS, a sample of GBS markers of size 2694 (2/3 of the total markers) was also drawn. This is equivalent to taking a bootstrap sample of markers and then only using nonredundant markers for model fitting. Selection of nonredundant markers is a common practice before GWAS or GS. Using this sampled dataset, prediction accuracy was measured using all prediction models except BCπ and BL, which were excluded to reduce computational burden. This process was repeated for 1000 iterations. For a given pair of models, the accuracy vectors were subtracted to create a distribution of differences. A two-tailed p-value was obtained by taking the frequency of values above or below 0 and multiplying it by two. Mean accuracies for each model were also calculated.
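The final step of this procedure, turning paired bootstrap accuracies into a two-tailed p-value, can be sketched as below. The sketch interprets "the frequency of values above or below 0" as the smaller of the two tail frequencies, which is an assumption, and the accuracy values are hypothetical.

```python
def bootstrap_p_value(acc_a, acc_b):
    """Two-tailed p-value from paired bootstrap accuracies of two models.

    acc_a, acc_b: accuracies of models A and B from the same bootstrap
                  iterations, in the same order
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    above = sum(d > 0 for d in diffs) / n
    below = sum(d < 0 for d in diffs) / n
    # Twice the smaller tail, capped at 1.
    return min(1.0, 2.0 * min(above, below))


# Hypothetical accuracies from four bootstrap iterations.
model_a = [0.66, 0.64, 0.60, 0.58]
model_b = [0.58, 0.59, 0.61, 0.55]
p_value = bootstrap_p_value(model_a, model_b)
print(p_value)
```

Pairing matters here: both models are evaluated on the same bootstrap sample in each iteration, so the differences remove sampling variation that is shared between the two models.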

The bootstrap t testing procedure for model comparison relies on several assumptions. The first assumption is that the sample of 265 individuals in the training set and the sample of 100 individuals in the validation set are representative of the population from which they were originally sampled, which is met as long as the samples are sufficiently large and selected from the population at random. The second assumption is that the observations, in our case deregressed genetic values, are independent. Nonindependence can arise if the values consist of repeated measurements on the same individuals or if the data consist of clusters of individuals more similar to each other than what would be expected based on random sampling from the original population. The third assumption is that the observations are identically distributed, meaning that there are no systematic trends in the mean or variance of the values.

Results


Phenotypic Data

Adult plant stem rust resistance was highly heritable, with a line mean broad sense heritability of 0.82. The absence of race-specific resistance genes effective against TTKST in the set of 365 lines was confirmed with seedling phenotypes, which were all high infection types under normal temperatures. Variation in high infection types was observed among the susceptible lines ranging from Stakman infection type “3” to “3+”. Under lower temperature conditions, 15 of the lines had low infection types ranging from “;13” to “3+;”. The resistance genes conferring these low infection types at the cool temperature treatment are not known. The seedling phenotypes converted to a numerical scale were weakly correlated with the genetic values for APR, with correlations of 0.1 and 0.19 for the normal and cool treatments, respectively. Both correlations were significant, with p-values of 0.049 and 3 × 10^-4 for the normal and cool treatments, respectively.

Genome-Wide Association Analysis

Eight markers were associated with stem rust resistance (Table 1). The most significant marker, csSr2_KASPar, explained 27% of the variation in the genotypic values. Both csSr2 and csSr2_KASPar are tightly linked to Sr2, located on chromosome 3BS (Mago et al., 2011). Two other markers associated with stem rust resistance are known to be located on chromosome 3BS based on the Synthetic × Opata genetic map (Poland et al., 2012a). The remaining four associated markers have unknown map locations. Pairwise associations between significant markers, measured as r², indicated that two markers of unknown map location are associated (r² ≥ 0.4) with markers known to be on chromosome 3B (Fig. 2). The two remaining markers of unknown location are not associated with each other or other significant markers.

Table 1.

Markers significantly associated with adult plant stem rust resistance.

Marker          MAF†    p-value          Effect    r²      Chromosome
csSr2_KASPar    0.29    3.38 × 10^-10    0.54      0.27    3BS
csSr2           0.16    1.21 × 10^-8     0.65      0.17    3BS
GBS_13164       0.19    1.62 × 10^-6     0.6       0.15
GBS_11008       0.29    7.09 × 10^-6     0.49      0.08    3BS
GBS_1863        0.20    1.01 × 10^-5     0.51      0.17
GBS_7565        0.30    1.19 × 10^-5     0.48      0.07
GBS_10286       0.12    2.83 × 10^-5     -0.61     0.08
GBS_20803       0.32    4.27 × 10^-5     0.42      0.19    3BS
† MAF, minor allele frequency.
Figure 2.

Pairwise associations, measured as r², between markers significantly associated with adult plant stem rust resistance.


The marker relationship matrix shows several small groups of closely related individuals, indicating family structure (Fig. 3). Based on pedigree information, 147 of the individuals were derived from 26 full-sib families. Individuals derived from the same full-sib family were found to group together based on the relationship matrix (Fig. 3). Principal components analysis of the relationship matrix also illustrated a similar pattern of family relationships (Fig. 4); however, principal components one and two explained only 14.4 and 2.9% of the variation, respectively. Correcting for kinship during genome-wide association was necessary to obtain uniformly distributed p-values (Fig. 5). Further correcting for population structure using principal components did not improve uniformity of p-values.

Figure 3.

Heatmap of the marker relationship matrix illustrating family structure. Individuals derived from the same full-sib family share a common symbol.

Figure 4.

Principal components (PC) analysis of the marker relationship matrix. Individuals derived from the same full-sib family share a common symbol.

Figure 5.

Quantile-quantile plot of the p-values from genome-wide association comparing the p-value distribution to a uniform null distribution.


Prediction Model Accuracies

The marker set containing all markers, both GBS and gene targeted markers, always resulted in higher accuracies than the marker set containing only GBS markers based on accuracies calculated using cross validation (Table 2) and bootstrapping (Table 3). Among the GS models, G-BLUP B and G-BLUP A led to the highest cross validation prediction accuracies, followed by G-BLUP T, BL, and BCπ. Using a bootstrap significance testing procedure, p-values (the probabilities that differences between pairs of model accuracies arose by chance) were estimated for all models except BL and BCπ (Table 3). For comparisons between G-BLUP B and ordinary G-BLUP or MLR models, p-values were always <0.15.

Table 2.

Cross validation prediction accuracies for adult plant stem rust resistance using different prediction models and marker sets.

Prediction model†    All markers    GBS‡ markers only
MLR A                0.477          0.446
MLR B                0.468
G-BLUP A             0.607          0.577
G-BLUP B             0.618
G-BLUP               0.568          0.563
BL                   0.579          0.561
BCπ                  0.578          0.558
G-BLUP T             0.591          0.573
† MLR A, multiple linear regression A, fixed effects selected among all markers; MLR B, fixed effects selected among candidate gene linked markers; G-BLUP A, genomic best linear unbiased prediction A, marker relationship matrix and fixed effects selected among all markers; G-BLUP B, marker relationship matrix and fixed effects selected among candidate gene linked markers; BL, Bayesian Lasso; BCπ, Bayes Cπ; G-BLUP T, marker relationship matrix and seedling phenotypes as fixed effects.
‡ GBS, genotyping-by-sequencing.

Table 3.

Probabilities that pairs of model accuracies are not different based on bootstrapping.†

                    GBS markers only                        All markers
Model, accuracy     G-BLUP    G-BLUP A  G-BLUP T  MLR A     G-BLUP    G-BLUP A  G-BLUP B  G-BLUP T  MLR A     MLR B
                    0.58      0.54      0.57      0.36      0.59      0.63      0.66      0.58      0.51      0.56
GBS markers only
  G-BLUP, 0.58      1         0.52      0.95      0.08      0.69      0.39      0.12      0.99      0.57      0.79
  G-BLUP A, 0.54    0.52      1         0.84      0.1       0.43      0.14      0.07      0.79      0.82      0.9
  G-BLUP T, 0.57    0.95      0.84      1         0.21      0.89      0.65      0.47      0.84      0.72      0.88
  MLR A, 0.36       0.08      0.1       0.21      1         0.08      0.02      0.01      0.18      0.15      0.08
All markers
  G-BLUP, 0.59      0.69      0.43      0.89      0.08      1         0.44      0.15      0.91      0.51      0.67
  G-BLUP A, 0.63    0.39      0.14      0.65      0.02      0.44      1         0.62      0.68      0.15      0.34
  G-BLUP B, 0.66    0.12      0.07      0.47      0.01      0.15      0.62      1         0.5       0.09      0.12
  G-BLUP T, 0.58    0.99      0.79      0.84      0.18      0.91      0.68      0.5       1         0.68      0.84
  MLR A, 0.51       0.57      0.82      0.72      0.15      0.51      0.15      0.09      0.68      1         0.75
  MLR B, 0.56       0.79      0.9       0.88      0.08      0.67      0.34      0.12      0.84      0.75      1
† G-BLUP A, genomic best linear unbiased prediction A, marker relationship matrix and fixed effects selected among all markers; G-BLUP T, marker relationship matrix and seedling phenotypes as fixed effects; MLR A, multiple linear regression A, fixed effects selected among all markers; G-BLUP B, marker relationship matrix and fixed effects selected among candidate gene linked markers; MLR B, multiple linear regression B, fixed effects selected among candidate gene linked markers.

The markers that were selected in MLR A and G-BLUP A were csSr2_KASPar, GBS_20803, csSr2, and GBS_1863 (Table 4). The map locations of GBS_20803 and GBS_1863 are unknown. The markers selected by G-BLUP B, the most accurate model, were csSr2_KASPar and csSr2. Differences in outcomes between pairs of prediction models are summarized by the Spearman’s rank correlations between estimated breeding values from cross validation for all pairs of models (Supplemental Table S1). MLR B had the lowest correlations with all other models, followed by MLR A.

Table 4.

Markers used as fixed effects in different prediction models, their minor allele frequencies (MAFs), and the frequency they appeared in the models during cross-validation.†

                        Frequency selected as fixed effects
Marker          MAF     MLR A    MLR B    G-BLUP A    G-BLUP B
csSr2_KASPar    0.29    1        1        1           1
csSr2           0.16    0.5      1        0.8         1
GBS_20803       0.31    0.9               0.6
csLV34          0.37    0        0.4      0           0
gwm533          0.34    0        0.2      0           0
GBS_1863        0.20    0.1      0        0           0
† MLR A, multiple linear regression A, fixed effects selected among all markers; MLR B, fixed effects selected among candidate gene linked markers; G-BLUP A, genomic best linear unbiased prediction A, marker relationship matrix and fixed effects selected among all markers; G-BLUP B, marker relationship matrix and fixed effects selected among candidate gene linked markers.

Discussion


Genetic Architecture

The association analysis results confirm the importance of the Sr2 region, with the most significant Sr2 linked marker explaining 27% of the variation. Out of eight significant markers, only two markers did not appear to be at the Sr2 region. Sr2 linked markers have been reported by several stem rust APR studies (Yu et al., 2011; Njau et al., 2012; Singh et al., 2013). Interestingly, the most significant Sr2 linked marker was csSr2_KASPar. This marker gave different results than the STS marker csSr2, which has been reported to not be diagnostic for Sr2 in CIMMYT germplasm (Mago et al., 2011). Our results suggest that csSr2_KASPar is capturing a different haplotype than the csSr2 STS marker. This may be due to restriction site polymorphism at the restriction enzyme cut site of the STS marker. Marker gwm533, which is still used for Sr2 genotyping, was not associated with resistance in this study, suggesting that this marker should be discontinued for Sr2 genotyping. In contrast with other studies (Dyck, 1987; Krattinger et al., 2009; Singh et al., 2012), this study did not find Lr34 to be associated with adult plant stem rust resistance. The frequency of the Lr34 resistance allele was 0.36; thus, the lack of association between Lr34 and resistance was not due to low minor allele frequency. In the association mapping study by Yu et al. (2011), which used a similar set of germplasm and environments, Lr34 was also not found to be significant; however, several significant marker interactions with Lr34 were detected. Based on the inconsistencies in detection and the reported marker interactions, the effect of Lr34 appears to vary depending on the genetic background.

The relatively low number of QTLs that we detected is due largely to the confounding of QTL effects with family structure. Without correcting for population or family structure, 138 markers exceed the significance threshold, and the p-values do not follow a uniform distribution, indicating many spurious associations. Confounding of marker effects with family structure is not a problem for GS because GS capitalizes on relationship information to predict breeding values.

Prediction Models

A G-BLUP model including Sr2 linked markers as fixed effects was the most accurate model tested; the p-values for its comparisons with MLR using Sr2 linked markers alone and with G-BLUP using GBS markers only were 0.12 and 0.15, respectively. These results suggest that GS based on G-BLUP with Sr2 linked markers as fixed effects would lead to the greatest genetic gain if GS was imposed on the specific dataset used in this study. However, if GS were to be applied on a new sample of individuals, there is some probability that the outcomes of GS using G-BLUP with GBS markers only, or MLR using Sr2 linked markers only, would be just as favorable as the outcomes of GS using G-BLUP with Sr2 linked markers as fixed effects.

Our finding that modeling selected markers as fixed effects in G-BLUP leads to improved accuracy over standard G-BLUP agrees with a recent simulation study (Bernardo, 2013) which found modeling a large-effect locus as fixed to be advantageous when heritability of the trait was >0.5 and the proportion of the genetic variance explained by the locus was >0.25. It is important to emphasize that, in this study, the markers selected as fixed effects were not assumed to be causative loci, thus variable selection and fixed effect estimation should occur each time the prediction model is trained.

The correlation between low temperature seedling and adult plant phenotypes was interesting, but not sufficient to be useful in combination with GS in the germplasm tested. Using the seedling data as fixed effects in G-BLUP did not consistently improve the prediction accuracy. Seedling data could be more predictive in another set of germplasm. On the other hand, if the level of APR can be explained well by seedling infection types, the resistance may be mostly qualitative, due to single race-specific genes. Thus, it may not be desirable to use this information source even if it is predictive of adult-stage resistance.

If we assume that two cycles of GS can be completed for every one cycle of phenotypic selection, and all other factors remain constant, then gain from selection from GS will exceed the gain from phenotypic selection when (GS accuracy × 2) is greater than the phenotypic selection accuracy. The GS accuracies we achieved in this study are sufficiently high to achieve greater gain from selection per unit time compared with phenotypic selection. Phenotypic selection accuracy, estimated as the square root of the line mean heritability ($\sqrt{H^2}$), was 0.9, and (GS accuracy × 2) was 1.12. The GS accuracies we observed were similar to those observed in a GS study that evaluated prediction accuracies for stem rust resistance in biparental populations (Ornella et al., 2012); however, the results are difficult to compare due to different training population sizes.
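The comparison above can be checked with a few lines of arithmetic. The sketch assumes that phenotypic selection accuracy is the square root of the line-mean heritability (0.82, as reported in this article) and uses a GS accuracy of 0.56, the value implied by the reported product of 1.12; the function name is hypothetical.

```python
import math


def compare_gain_per_unit_time(gs_accuracy, heritability, gs_cycles_per_ps_cycle=2.0):
    """Compare GS and phenotypic selection (PS) on a per-unit-time basis.

    Returns (GS accuracy scaled by cycles per PS cycle, PS accuracy,
    whether GS is expected to give more gain per unit time), assuming
    all other factors affecting gain are equal.
    """
    ps_accuracy = math.sqrt(heritability)
    gs_equivalent = gs_accuracy * gs_cycles_per_ps_cycle
    return gs_equivalent, ps_accuracy, gs_equivalent > ps_accuracy


gs_equivalent, ps_accuracy, gs_wins = compare_gain_per_unit_time(0.56, 0.82)
print(gs_equivalent, round(ps_accuracy, 2), gs_wins)
```

Under these assumptions the break-even GS accuracy is half the phenotypic selection accuracy, about 0.45 here, so any model in Table 2 would clear it when all markers are used.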


This study indicates that GS would be an effective breeding method for quantitative stem rust resistance despite the fact that the trait is highly heritable and is conferred in part by large-effect loci. Although one of the advantages of GS is that prior knowledge about loci affecting the trait is not needed, we found that in this dataset using prior information to selectively genotype markers at loci previously found to have a moderately large effect on the trait enabled us to achieve higher prediction accuracies especially when using models which treat large-effect loci as fixed effects. To ensure the best results from GS, markers linked to large to moderate effect genes or loci previously found to affect the traits of interest should be included in the genotypic data as long as doing so does not delay selection or incur excessive costs. Using cross-validation within the training data, one can then decide if these loci-specific markers should be modeled as fixed effects. Although the alleles at known loci may be different from those of the population where the loci were detected, they may still be important regions that should be tagged with markers. As more genes are mapped and cloned in wheat for various traits, the effect of utilizing gene information for genomic prediction of other traits in wheat can be further studied.

Supplemental Information Available

Supplemental material is included with this article.

Supplemental Table S1. Deregressed genetic values for stem rust adult plant resistance.

Supplemental Table S2. Seedling phenotype scores.

Supplemental Table S3. Genotyping-by-sequencing data and gene targeted genotypic data.

Acknowledgments


This research was funded by The Bill and Melinda Gates Foundation (Durable Rust Resistance in Wheat) and the United States Department of Agriculture-Agricultural Research Service (USDA-ARS) (Appropriation No. 5430-21000-006-00D). Partial support for J. Rutkoski was provided by a USDA National Needs Fellowship Grant #2008-38420-04755 and an American Society of Plant Biology (ASPB)-Pioneer Hi-Bred Graduate Student Fellowship. Loci targeted genotyping was provided by the Eastern Regional Small Grains Genotyping Laboratory, Raleigh, North Carolina. Statistical advice was provided by the Cornell Statistical Consulting Unit. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the USDA. USDA is an equal opportunity provider and employer.




