
The Plant Genome - Original Research

Strategies for Selecting Crosses Using Genomic Prediction in Two Wheat Breeding Programs


This article in TPG

  1. Vol. 10 No. 2
    OPEN ACCESS
    Received: Dec 14, 2016
    Accepted: Mar 18, 2017
    Published: July 6, 2017

    * Corresponding author(s): jpoland@ksu.edu

  1. Bettina Lado (a),
  2. Sarah Battenfield (c, d),
  3. Carlos Guzmán (e),
  4. Martín Quincke (f),
  5. Ravi P. Singh (e),
  6. Susanne Dreisigacker (e),
  7. R. Javier Peña (e),
  8. Allan Fritz (g),
  9. Paula Silva (f),
  10. Jesse Poland * (c) and
  11. Lucía Gutiérrez * (a, b)

    a Statistics Dep., Facultad de Agronomía, Univ. de la República, Garzón 780, Montevideo 12900, Uruguay
    b Dep. of Agronomy, Univ. of Wisconsin–Madison, 1575 Linden Dr, Madison, WI 53706
    c Wheat Genetics Resource Center, Dep. of Plant Pathology, 1712 Claflin Rd., Kansas State Univ., Manhattan, KS 66506
    d (current address), AgriPro Wheat, Syngenta, 11783 Ascher Rd. Junction City, KS, 66441
    e CIMMYT, El Batan, Mexico, Mexico
    f Programa Nacional de Investigación Cultivos de Secano, Instituto Nacional de Investigación Agropecuaria, Est. Exp. La Estanzuela, Colonia 70000, Uruguay
    g Dep. of Agronomy, Kansas State Univ., Manhattan, KS 66506
Core Ideas:
  • Cross prediction strategies for grain yield and baking quality traits were compared.
  • Crosses for all parent combinations were obtained via genomic prediction models.
  • Mid-parent selection was similar to accounting for variance when selecting yield.
  • The variance had a larger impact in cross predictions for quality traits.


The single most important decision in plant breeding programs is the selection of appropriate crosses. The ideal cross would provide superior predicted progeny performance and enough diversity to maintain genetic gain. The aim of this study was to compare the best crosses predicted using combinations of mid-parent value and variance prediction accounting for linkage disequilibrium (VLD) or assuming linkage equilibrium (VLE). After predicting the mean and the variance of each cross, we selected crosses based on mid-parent value, the top 10% of the progeny, and weighted mean and variance within progenies for grain yield, grain protein content, mixing time, and loaf volume in two applied wheat (Triticum aestivum L.) breeding programs: Instituto Nacional de Investigación Agropecuaria (INIA) Uruguay and CIMMYT Mexico. Although the variance of the progeny is important to increase the chances of finding superior individuals from transgressive segregation, we observed that the mid-parent values of the crosses drove the genetic gain but the variance of the progeny had a small impact on genetic gain for grain yield. However, the relative importance of the variance of the progeny was larger for quality traits. Overall, the genomic resources and the statistical models are now available to plant breeders to predict both the performance of breeding lines per se as well as the value of progeny from any potential crosses.


    BL, Bayes Lasso; GS, genomic selection; INIA, Instituto Nacional de Investigación Agropecuaria; RIL, recombinant inbred line; RR-BLUP, ridge regression best linear unbiased predictor; QTL, quantitative trait loci; SNP, single nucleotide polymorphism; VLD, variance accounting for linkage disequilibrium; VLE, variance assuming linkage equilibrium

The main objective of plant breeding is to increase the yield, productivity, adaptation, and quality of crops while optimizing resource use (Allard, 1960). Genetic gain in plant breeding is accomplished through the selection of the best genetic combinations between genotypes (Fehr, 1987). The most important traits selected in breeding are generally complex quantitative traits that are influenced by many genes and the environment (Falconer and Mackay, 1996; Bernardo, 2010). Historically, strategies for selecting superior individuals have relied on phenotypic evaluation of individuals and pedigree information. However, at the end of the twentieth century, marker-assisted selection strategies were sought. To this end, quantitative trait loci (QTL) were detected to identify genes involved in the expression of complex traits through biparental QTL mapping (Lander and Botstein, 1989; Lande and Thompson, 1990), or genome-wide association studies (Jannink et al., 2001). These genetic markers were then used in marker-assisted selection strategies (Dekkers and Hospital, 2002). However, not all of the segregating QTL are identified in a given study and the effects of those that are identified are largely overestimated (Beavis, 1994). Therefore, methodologies for QTL identification have been insufficient to capture the effects of the many genes involved in common complex traits (Manolio et al., 2009) and the actual use of marker-assisted selection for identified QTL has been very limited in breeding programs (Bernardo, 2008).

Genomic prediction (Meuwissen et al., 2001) provides an alternative method of using genetic information in breeding decisions. Rather than using individual QTL information, genomic prediction involves the prediction of genotypic performance accounting for all genome-wide marker effects simultaneously (Meuwissen et al., 2001). When these predictions are used as a selection strategy, it is called genomic selection (GS). Genomic selection has been extensively researched in plants (reviewed in Heslot et al., 2015) while the implementation in plant breeding programs has been evaluated through optimization of the training population (Asoro et al., 2011; Zhao et al., 2013; Liu et al., 2015; Isidro et al., 2015) and genotype × environment interaction (Burgueño et al., 2012; Heslot et al., 2014; Jarquín et al., 2014; Lado et al., 2016). Additionally, strategies of GS to select the best crosses by modeling the variance of a cross have been proposed but no study has assessed the consequences in cross selection (Bernardo, 2014; Mohammadi et al., 2015; Tiede et al., 2015).

Crossing is one of the main decisions of a breeding program. The number of possible crosses is orders of magnitude larger than the number of feasible crosses (Witcombe et al., 2013). Although breeders attempt to take into consideration all available information on the potential parents to determine which crosses to make, many crosses are discarded in subsequent years, as they do not deliver superior progeny (Koebner and Summers, 2003; Heslot et al., 2015). Methods to select optimum crosses accounting for the performance per se of each line (e.g., yield, adaptability, quality) were initially proposed on the basis of the progeny’s mean and variance. The variance was predicted using morphological data, pedigree information, a few molecular markers, or a combination of these (Souza and Sorrells, 1991; Melchinger et al., 1998; Utz et al., 2001; Bertan et al., 2007; Casassola and Brammer, 2013). These methods were then extended to genome-wide marker coverage (Endelman, 2011; Bernardo, 2014; Mohammadi et al., 2015; Tiede et al., 2015). The progeny mean was well-predicted from the mid-parent value (Bernardo, 2014). However, genetic variances were difficult to predict using either phenotypic, pedigree, or genetic distances (Souza and Sorrells, 1991; Melchinger et al., 1998; Utz et al., 2001; Hung et al., 2012) but could be adequately modeled by simulating progeny by using genomic prediction (Bernardo, 2014; Mohammadi et al., 2015; Tiede et al., 2015). Endelman (2011) proposed a strategy of estimating the mean and variance for the progeny of a cross using genomic data, where the progeny mean was estimated as the mid-parent value of the estimated parental performance. In this method, the variance was estimated from the marker effects, assuming linkage equilibrium (VLE) between markers. 
Another strategy used to estimate progeny variance with large genomic data is to simulate the progeny, accounting for the true linkage structure and recombination, and then to predict the performance of the progeny (Bernardo, 2014; Mohammadi et al., 2015; Tiede et al., 2015).

The aim of this research was to evaluate the selection of crosses based on progeny variance, predicted either by assuming linkage equilibrium (VLE) or by accounting for linkage disequilibrium (VLD) between the parents. Specifically, we evaluated the impact of cross selection based only on mid-parent performance or mid-parent performance in a combination with progeny variance, where the variance was estimated either by assuming VLE or VLD. We analyzed data for grain yield in the INIA spring bread wheat breeding program in Uruguay, and grain protein content, mixograph optimal mixing time, and loaf volume quality traits from the CIMMYT spring bread wheat breeding program in Mexico. We evaluated multiple traits and breeding programs to limit possible trait or program bias in the prediction methodology results.

Materials and Methods

Plant Material


A total of 1465 spring bread wheat lines from the INIA Wheat Breeding Program were used as the training population for GS for grain yield. The INIA lines consisted of all the lines from the preliminary yield trials (F7) from 2010, 2011, and 2013, as well as the advanced (F8) and elite (F9) yield trials from 2010.

Grain yield evaluations were conducted in 35 environments in Uruguay including five locations evaluated in 4 yr and one location with four sowing dates. Locations used to evaluate the genotypes were Dolores (33°50′S, 58°14′W; 15 m asl), Durazno (33°33′S, 56°31′W; 91 m asl), La Estanzuela (34°20′S, 57°42′W; 81 m asl), Young (32°76′S, 57°57′W; 85 m asl), and Ruta2 (33°45′S, 57°90′W; 95 m asl). Four sowing dates were evaluated in La Estanzuela (LE1, LE2, LE3, and LE4). The evaluation years were 2010 to 2014. The trials from 2012 and most of the locations of 2014 were not included in this study because of a strong genotype by environment interaction (see experimental details in Lado et al., 2016).


In total, 6095 spring bread wheat lines from the CIMMYT Global Bread Wheat Breeding Program were evaluated for quality traits (Battenfield et al., 2016). The CIMMYT lines were materials in the preliminary yield trials between 2009 and 2015. All lines were grown in Ciudad Obregon, Sonora, Mexico (27°29′N, 109°56′W; 40 m asl). Three phenotypes were chosen to represent different stages of testing wheat as grain, dough, and final product. Grain protein content was determined by near-infrared spectroscopy, using NIR Systems 6500 (Foss, Denmark) using AACC International (2000a) Method 39–10, and reported at 12.5% moisture basis. Grain samples were tempered and milled using Brabender Quadrumat Senior (C. W. Brabender OHG, Germany). Mixograph optimal mixing time was assessed using the Swanson and Working Mixograph (National MFG Co., National Manufacturing Company, Lincoln, NE) according to AACC International Method 54–40A (AACC International, 2000b). Loaf volume was assessed on pup loaves baked according to AACC International (2000c) Method 10–09. Loaf volume and mixing time tests used the adjustment for optimal water absorption described in Guzmán et al. (2015).


Tissue was collected and the CTAB method (Saghai-Maroof et al., 1984) was used to isolate DNA for the genotyping-by-sequencing protocol described in Poland et al. (2012a). For the INIA material, the TASSEL-GBS pipeline (Glaubitz et al., 2014) was run with modifications for nonreference genomes (Poland et al., 2012b). For the CIMMYT lines, TASSEL Version 5.2 was used with alignment to the T. aestivum International Wheat Genome Sequencing Consortium genome assembly version 2.25 (Mayer et al., 2014) using Bowtie2 version 2.24 (Langmead and Salzberg, 2012). Genotyping-by-sequencing tags with single nucleotide polymorphisms (SNPs) were mapped against the POPSEQ Synthetic W7984 by Opata M85 reference to determine the map position (Chapman et al., 2015). Single nucleotide polymorphisms were filtered by setting maximum missing values of 20% for the INIA and 50% for the CIMMYT populations. Different thresholds for missing data were used to keep the balance between coverage and good SNP quality in each population. Low tolerance for missing values was used to increase the reliability of the imputed SNPs. Individuals with more than 50% missing information were discarded. The final numbers of individuals were 1465 and 5984 in the INIA and CIMMYT populations, respectively. SNP imputation was conducted using multivariate normal expectation maximization imputation (Endelman, 2011; Poland et al., 2012a). This method assumes that marker genotypes follow a multivariate normal distribution. The imputation was done through multiple regression starting with the mean of the marker and was then updated using maximum likelihood to estimate the mean and covariance between lines with nonmissing data.
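The missing-data filters described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function name, the use of `None` for a missing call, and the order of the two steps (markers first, then lines) are all assumptions made for the example.

```python
def filter_by_missing(genotypes, max_marker_missing, max_line_missing=0.5):
    """Drop markers, then lines, whose share of missing calls is too high.

    genotypes: list of lines, each a list of calls (-1, 0, 1) or None if missing.
    Returns (filtered matrix, kept marker indices, kept line indices).
    The marker-then-line order is an assumption of this sketch.
    """
    n_lines = len(genotypes)
    n_markers = len(genotypes[0])
    keep_markers = [
        k for k in range(n_markers)
        if sum(row[k] is None for row in genotypes) / n_lines <= max_marker_missing
    ]
    if not keep_markers:
        return [], [], []
    reduced = [[row[k] for k in keep_markers] for row in genotypes]
    keep_lines = [
        i for i, row in enumerate(reduced)
        if sum(call is None for call in row) / len(row) <= max_line_missing
    ]
    return [reduced[i] for i in keep_lines], keep_markers, keep_lines


# Hypothetical toy matrix: 4 lines by 4 markers.
toy = [
    [1, None, -1, 0],
    [1, None, None, 1],
    [None, None, None, -1],
    [0, 1, 1, 1],
]
kept, marker_ids, line_ids = filter_by_missing(toy, max_marker_missing=0.5)
```

In the toy matrix the second marker (75% missing) and the third line (two of three remaining calls missing) are dropped; a threshold of 0.2 would correspond to the stricter INIA setting.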

Phenotypic Data Analysis

Phenotypic best linear unbiased estimations were obtained for the grain yield of all genotypes present in each trial. Field analysis was conducted according to an experimental design that consisted of a series of smaller α-design trials connected through common checks. The following model was used to estimate INIA grain yield genotypic means:

$$y_{ijklm} = \mu + g_i + e_j + t_{k(j)} + r_{l(jk)} + b_{m(ljk)} + \varepsilon_{ijklm} \qquad [1]$$

where $y_{ijklm}$ is the observed grain yield; $\mu$ is the overall mean; $g_i$ is the fixed effect of the ith genotype; $e_j$ is the fixed effect of the jth year–location combination; $t_{k(j)}$ is the random effect of the kth trial nested within the jth year–location; $r_{l(jk)}$ is the random effect of the lth replicate nested within the kth trial and jth year–location; $b_{m(ljk)}$ is the random effect of the mth incomplete block nested within the lth replicate, kth trial, and jth year–location; and $\varepsilon_{ijklm}$ is the residual error for the ith genotype in the mth block within the lth replicate in the kth trial located in the jth year–location. The random terms are distributed as $t_{k(j)} \sim N(0, \sigma_t^2)$, $r_{l(jk)} \sim N(0, \sigma_r^2)$, $b_{m(ljk)} \sim N(0, \sigma_b^2)$, and $\varepsilon_{ijklm} \sim N(0, \sigma_e^2)$. The best linear unbiased estimations were obtained with the nlme package (Pinheiro et al., 2013) in R statistical software (R Development Core Team, 2016). Heritability was estimated via the model shown in Eq. [1] but with genotypes as a random effect, $g_i \sim N(0, G\sigma_g^2)$, where G is the genotypic variance–covariance matrix between lines estimated from markers, as proposed by VanRaden (2008), using the A.mat function in the rrBLUP package (Endelman, 2011). Genotypic variance ($\sigma_g^2$) and the average pairwise variance error of the best linear unbiased predictor ($\bar{v}_{BLUP}$) were estimated using the sommer R package (Covarrubias-Pazaran, 2016). Heritability for grain yield was estimated following Cullis et al. (2006):

$$H^2 = 1 - \frac{\bar{v}_{BLUP}}{2\sigma_g^2} \qquad [2]$$

For the CIMMYT population, quality data were not replicated within or across years and no statistical design was used within years, since only high-yielding selections from the yield trial were advanced to quality testing. The heritability of the quality traits was estimated as in Battenfield et al. (2016) for CIMMYT data from the ridge regression best linear unbiased predictor (RR-BLUP) model, which only models additive effects. Briefly,

$$h^2 = \frac{\sigma_P^2 - \sigma_e^2}{\sigma_P^2} \qquad [3]$$

where $\sigma_P^2$ is the phenotypic variance of the trait in the training population and $\sigma_e^2$ is the error variance of the RR-BLUP model. The $\sigma_e^2$ is assumed to include genotypic variance caused by dominance and epistatic effects, as well as any nongenetic errors. This leaves only additive genetic variance when $\sigma_e^2$ is subtracted from $\sigma_P^2$.

Genomic Prediction Models

Two genomic prediction models were compared in order to estimate marker effects and the performance of each parent. The ridge regression model was evaluated via the rrBLUP package (Endelman, 2011) and the Bayesian Lasso (BL) model was calculated via the BGLR package (Pérez and de los Campos, 2010), both in R software (R Development Core Team, 2016). Genomic prediction models incorporated the genotypic matrix X coded as (−1, 0, 1), where +1 indicates a locus homozygous for the most frequent allele, −1 indicates a locus homozygous for the least frequent allele, and 0 indicates a heterozygous locus. The missing SNPs were imputed as fractions between −1 and 1 and were then rounded to the nearest whole number (to 1 if the SNP value was ≥0.5, to −1 if the SNP value was ≤−0.5, and to 0 if the SNP value was between −0.5 and 0.5). For BL, initial arbitrary values of trait heritability and error variance were used, then values were optimized via prediction accuracy to calculate the rate and shape hyperparameters of the double exponential distribution. Next, the marker effects and parent performance values were estimated from the posterior distribution. For the ridge regression model, marker effects were estimated by using mixed models with markers considered as random effects distributed as $N(0, I\sigma_{mrk}^2)$. After that, to predict the performance of the parents, mixed models with lines ($g_i$) as random effects distributed as $N(0, G\sigma_g^2)$ were used, where G is the variance–covariance matrix estimated as in VanRaden (2008) via the A.mat function in the rrBLUP package (Endelman, 2011). The models were selected because of their use of different statistical approaches (mixed models and Bayesian, respectively) and their different assumptions about marker effects on predictions.
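The rounding rule for imputed SNPs and the shrinkage of marker effects under ridge regression can be illustrated with a small stdlib-only sketch. It is not the rrBLUP implementation: here the penalty `lam` is supplied by hand (in RR-BLUP it corresponds to the ratio of error variance to marker variance estimated by the mixed model), and all function names are hypothetical.

```python
def round_imputed(value):
    """Map an imputed marker score in [-1, 1] back to -1, 0 or 1."""
    if value >= 0.5:
        return 1
    if value <= -0.5:
        return -1
    return 0


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_marker_effects(x, y, lam):
    """Return (intercept, effects) with effects = (X'X + lam I)^-1 X'(y - mean)."""
    n, p = len(x), len(x[0])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    xty = [sum(x[i][a] * yc[i] for i in range(n)) for a in range(p)]
    return y_mean, solve(xtx, xty)


# Hypothetical toy data: four lines, two markers, true effects 2 and 1.
x = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3.0, 1.0, -1.0, -3.0]
_, unshrunk = ridge_marker_effects(x, y, lam=0.0)   # ordinary least squares
_, shrunk = ridge_marker_effects(x, y, lam=4.0)     # both effects halved
```

With this orthogonal toy design every effect is shrunk by the same factor, which mirrors the equal-variance assumption that ridge regression places on all markers.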

Random cross-validation was conducted by using 60% of the lines to train the model, predicting the remaining 40% of entries for both the INIA and CIMMYT datasets separately. Each model was iterated 100 times and then, the model with the best predictive ability was used for subsequent analysis. All genotyped parents were included in the training population to estimate the marker effects and the parents’ performance for grain yield (n = 1465), grain protein content, mixing time, and loaf volume (n = 5984), via the model with the best predictive ability. However, as there were no differences in the accuracy of predictions between the models, the RR-BLUP model was used to estimate marker effects for all traits for the remaining analyses.
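A random 60/40 cross-validation loop of the kind described above might look as follows. This is a sketch under stated assumptions: `fit_predict` is a hypothetical callback standing in for any genomic prediction model, predictive ability is taken as the Pearson correlation between observed and predicted values, and the seed is arbitrary.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def random_cross_validation(y, fit_predict, train_fraction=0.6, n_iter=100, seed=1):
    """Average predictive ability over repeated random train/test splits.

    fit_predict(train_ids, test_ids) must return predictions for test_ids.
    """
    rng = random.Random(seed)
    ids = list(range(len(y)))
    n_train = round(train_fraction * len(ids))
    scores = []
    for _ in range(n_iter):
        rng.shuffle(ids)
        train, test = ids[:n_train], ids[n_train:]
        predicted = fit_predict(train, test)
        scores.append(pearson([y[i] for i in test], predicted))
    return sum(scores) / len(scores)


# Hypothetical check with an "oracle" model that returns the observed values.
y_obs = [float(v) for v in range(20)]
ability = random_cross_validation(y_obs, lambda train, test: [y_obs[i] for i in test])
```

In practice `fit_predict` would fit the model on the training lines only and the returned average would be compared between models.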

Cross Selection Based on the Performance of Simulated Progeny

Progeny of all possible pairwise cross combinations were simulated and their performance was predicted for grain yield in the INIA population. However, in the CIMMYT population, only the lines evaluated in 2015 (n = 1425) were used as parents for progeny simulation and prediction. Reciprocal crosses were not predicted, as negligible maternal effects were assumed to be present (McNeal et al., 1968; Jumbo and Carena, 2008).
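To give a sense of scale, the number of pairwise crosses without reciprocals is n(n − 1)/2. The short sketch below computes it for the two parent sets mentioned above; the function names are illustrative only.

```python
from itertools import combinations


def n_crosses(n_parents):
    """Number of unordered parent pairs (reciprocal crosses counted once)."""
    return n_parents * (n_parents - 1) // 2


def enumerate_crosses(parents):
    """List every unordered pair of distinct parents."""
    return list(combinations(parents, 2))


inia_crosses = n_crosses(1465)     # parents in the INIA population
cimmyt_crosses = n_crosses(1425)   # CIMMYT lines evaluated in 2015
small_example = enumerate_crosses(["A", "B", "C", "D"])
```

Both parent sets give slightly more than one million candidate crosses.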

The VLE of the progeny within each cross ($\hat{\sigma}^2_{VLE_i}$) was predicted, taking into account allelic differences between the parents and marker effects for each trait, following Endelman (2011):

$$\hat{\sigma}^2_{VLE_i} = 4\sum_{k=1}^{M} p_k^{+} p_k^{-} \hat{u}_k^2 \qquad [4]$$

where $p_k^{+}$ is the frequency of the +1 allele at the kth SNP among the parents of the ith cross, $p_k^{-}$ is the corresponding frequency of the −1 allele, and $\hat{u}$ is the vector of marker effects with length M (the number of SNPs). Marker effects were estimated via RR-BLUP. For example, if both parents had the allele +1, the frequencies were $p_k^{+}$ = 1 and $p_k^{-}$ = 0; if one parent had the allele +1 and the other had −1, the frequencies were $p_k^{+}$ = 0.5 and $p_k^{-}$ = 0.5; if both parents had the allele 0, $p_k^{+}$ = 0 and $p_k^{-}$ = 0, and so on. Next, the distribution of predicted performance values for the population of progeny in each cross was simulated, assuming a normal distribution for the progeny. The parameters used in the normal distribution were the mean (equal to the mid-parent value) and the variance (equal to the progeny variance estimated as in Eq. [4]). The mean of the top 10% of the progeny was estimated for each cross.
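The calculation in this paragraph can be sketched as below. Two assumptions are made and should be checked against Endelman (2011): the variance is taken as 4 times the sum over markers of p+ · p− · u², which is the variance expected for inbred progeny under the −1/+1 coding, and the mean of the top 10% is obtained from the closed-form truncated-normal expression rather than by simulating the normal distribution as in the study. Function names are hypothetical.

```python
from statistics import NormalDist


def vle_variance(parent1, parent2, effects):
    """Progeny variance assuming linkage equilibrium between markers."""
    total = 0.0
    for g1, g2, u in zip(parent1, parent2, effects):
        p_plus = ((g1 == 1) + (g2 == 1)) / 2     # share of parents carrying +1
        p_minus = ((g1 == -1) + (g2 == -1)) / 2  # share of parents carrying -1
        total += 4 * p_plus * p_minus * u * u
    return total


def mid_parent(parent1, parent2, effects):
    """Average of the two parental genomic predictions."""
    gv1 = sum(g * u for g, u in zip(parent1, effects))
    gv2 = sum(g * u for g, u in zip(parent2, effects))
    return (gv1 + gv2) / 2


def top_fraction_mean(mean, variance, fraction=0.10):
    """Expected mean of the best `fraction` of a normal progeny distribution."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - fraction)
    intensity = std_normal.pdf(z) / fraction   # about 1.755 for the top 10%
    return mean + intensity * variance ** 0.5


# Hypothetical parents and marker effects (4 markers).
p1 = [1, 1, -1, 0]
p2 = [1, -1, 1, 0]
u = [0.5, 2.0, 1.0, 3.0]
var_le = vle_variance(p1, p2, u)       # only markers 2 and 3 segregate
mp = mid_parent(p1, p2, u)
top10 = top_fraction_mean(mp, var_le)
```

Only markers at which the parents carry opposite alleles contribute to the variance, so crosses between similar parents have a small predicted variance regardless of the size of the marker effects.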

To predict VLD, we used the PopVar package described in Mohammadi et al. (2015) and implemented in R software (R Development Core Team, 2016). Within PopVar, recombinant inbred lines (RILs) were simulated via the R/qtl package (Broman et al., 2003), which requires ordered markers with genetic distances to account for recombination. The function simulates recombination points along the chromosome by using independent crossovers, following the Stahl model (Stahl, 1979). Using this package, we simulated 1000 RILs for each cross and then predicted the performance of each simulated RIL. The mean predicted performance of the 1000 RILs was used as the progeny mean of each cross, and the variance of the predicted performance of the 1000 RILs was used as the progeny variance of each cross. The mean of the top 10% of the progeny was estimated for each cross.
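The idea of predicting variance from simulated RILs can be illustrated with a much simpler simulator than the one in PopVar and R/qtl. The sketch below is not their algorithm: it handles a single chromosome, converts map distance to a recombination fraction with the Haldane function, and then treats parental origin along the chromosome as a Markov chain that switches with the Haldane-Waddington probability 2r/(1 + 2r) for selfed RILs. All names and the toy inputs are hypothetical.

```python
import math
import random


def ril_switch_prob(distance_cm):
    """Chance that adjacent markers in a selfed RIL come from different parents."""
    r = 0.5 * (1 - math.exp(-2 * distance_cm / 100))  # Haldane map function
    return 2 * r / (1 + 2 * r)                        # Haldane-Waddington


def simulate_ril(parent1, parent2, positions_cm, rng):
    """Simulate one RIL as a mosaic of the two parents (one chromosome)."""
    from_p1 = rng.random() < 0.5
    genotype = []
    for k in range(len(parent1)):
        if k > 0:
            gap = positions_cm[k] - positions_cm[k - 1]
            if rng.random() < ril_switch_prob(gap):
                from_p1 = not from_p1
        genotype.append(parent1[k] if from_p1 else parent2[k])
    return genotype


def vld_cross(parent1, parent2, positions_cm, effects, n_ril=1000, seed=1):
    """Return (mean, variance, mean of top 10%) of predicted RIL performance."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_ril):
        geno = simulate_ril(parent1, parent2, positions_cm, rng)
        values.append(sum(g * u for g, u in zip(geno, effects)))
    values.sort(reverse=True)
    mean = sum(values) / n_ril
    variance = sum((v - mean) ** 2 for v in values) / (n_ril - 1)
    top = values[: max(1, n_ril // 10)]
    return mean, variance, sum(top) / len(top)


# Two completely linked markers with equal effects (hypothetical).
effects = [1.0, 1.0]
coupling = vld_cross([1, 1], [-1, -1], [0.0, 0.0], effects)
repulsion = vld_cross([1, -1], [-1, 1], [0.0, 0.0], effects)
```

With complete linkage, the coupling cross has a variance close to 4 and the repulsion cross a variance of 0, whereas the linkage-equilibrium variance would be 2 in both cases.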

Finally, the best 1000 crosses were selected for grain yield, grain protein, and loaf volume on the basis of either their mid-parent performance value or the predicted performance of the top 10% of their progeny, assuming VLD or VLE. For mixing time, where an intermediate optimal time is desired, the best crosses were selected on the basis of having the lowest variance within a window of targeted predicted progeny performance values (i.e., 2.5–4.0 min.).
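The two selection rules in this paragraph can be sketched as follows, assuming each cross is stored as a dictionary with hypothetical keys `mean`, `variance`, and `top10`. The window bounds are those given above for mixing time; everything else is illustrative.

```python
def select_top(crosses, criterion, n=1000):
    """Keep the n crosses with the largest value of `criterion`."""
    return sorted(crosses, key=lambda c: c[criterion], reverse=True)[:n]


def select_in_window(crosses, low=2.5, high=4.0, n=1000):
    """Keep the n lowest-variance crosses whose mean lies inside the window."""
    inside = [c for c in crosses if low <= c["mean"] <= high]
    return sorted(inside, key=lambda c: c["variance"])[:n]


# Hypothetical predicted crosses for mixing time (minutes).
toy = [
    {"id": "AxB", "mean": 3.0, "variance": 0.30, "top10": 3.9},
    {"id": "AxC", "mean": 3.5, "variance": 0.10, "top10": 4.0},
    {"id": "BxC", "mean": 4.4, "variance": 0.05, "top10": 4.8},
    {"id": "BxD", "mean": 2.6, "variance": 0.20, "top10": 3.4},
]
by_top10 = [c["id"] for c in select_top(toy, "top10", n=2)]
by_window = [c["id"] for c in select_in_window(toy, n=2)]
```

The cross with the smallest variance (BxC) is excluded by the window rule because its mean falls outside the target range, even though it ranks first on the top 10% criterion.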

Weighting Progeny Variance in Cross Selection

Five values of performance were established as thresholds to illustrate the trade-offs of selection on the basis of maximizing the expected mean or accounting for the variance. Therefore, five groups with different weights on the relative importance of the variance were established. The maximum parental predicted performance minus 20, 40, 60, 80, or 100% of the difference between the maximum and the mean predicted performance was used to define the groups G1 (20%), G2 (40%), G3 (60%), G4 (80%), and G5 (100%). One hundred crosses were selected within each group, taking into account the mean predicted progeny performance above the threshold and the maximum variance for three traits (mixing time was excluded from this analysis because the selection criteria for this trait are different). Finally, the mean of all progenies and the mean of the top 10% of the progeny for the 100 selected crosses were calculated.
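The thresholds and the within-group selection can be written compactly. The sketch below assumes that a cross qualifies when its mean is at or above the threshold and that, among qualifying crosses, those with the largest variance are kept; the dictionary keys and toy values are hypothetical.

```python
def group_thresholds(parent_values, weights=(0.2, 0.4, 0.6, 0.8, 1.0)):
    """Threshold per group: max - w * (max - mean) of parental predictions."""
    best = max(parent_values)
    average = sum(parent_values) / len(parent_values)
    return {f"G{i}": best - w * (best - average)
            for i, w in enumerate(weights, start=1)}


def select_by_variance(crosses, threshold, n=100):
    """Among crosses with mean >= threshold, keep the n with largest variance."""
    eligible = [c for c in crosses if c["mean"] >= threshold]
    return sorted(eligible, key=lambda c: c["variance"], reverse=True)[:n]


# Hypothetical parental predictions and crosses.
thresholds = group_thresholds([2.0, 4.0, 6.0, 8.0, 10.0])
toy = [
    {"id": "AxB", "mean": 9.5, "variance": 1.0},
    {"id": "AxC", "mean": 9.3, "variance": 3.0},
    {"id": "BxC", "mean": 7.0, "variance": 9.0},
    {"id": "CxD", "mean": 5.0, "variance": 20.0},
]
g1_pick = select_by_variance(toy, thresholds["G1"], n=1)[0]["id"]
g5_pick = select_by_variance(toy, thresholds["G5"], n=1)[0]["id"]
```

Lowering the threshold from G1 to G5 admits crosses with lower means, so the variance criterion has more room to change which crosses are chosen.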

Results and Discussion

Genotype Data

For INIA data, a total of 81,044 SNPs were identified after the TASSEL pipeline was run and duplicate markers were removed. Of these, 23,771 were mapped using the POPSEQ genetic framework. Finally, we kept 3884 SNPs with less than 20% missing data and 1465 individuals with less than 50% missing data. For CIMMYT data, a total of 31,166 SNPs were identified after the TASSEL pipeline was run and duplicate markers were removed. Of these, 8443 SNPs were mapped using POPSEQ. Finally, we kept 1164 SNPs with less than 50% missing data and 5984 individuals with less than 50% missing data. The SNPs were distributed across all chromosomes, with a median distance between adjacent markers of 0 cM in both maps and maximum gaps of 61 and 94 cM for the INIA and CIMMYT genetic maps, respectively (Supplemental Fig. S1).

Phenotype Data

Mean grain yield for the INIA dataset was 5160 kg ha−1 (SE = 853) and the heritability was 0.67. The means of the three end-use quality traits from CIMMYT were 11.97% (SE = 0.77) for grain protein content, 3.13 min (SE = 0.80) for mixing time, and 802 cm3 (SE = 59) for loaf volume. The heritabilities of the CIMMYT quality traits were 0.67, 0.71, and 0.68 for grain protein content, mixing time, and loaf volume, respectively.

Genomic Selection Models

High correlations between observed and predicted breeding values were found for grain yield in the INIA dataset (r = 0.42) and for grain protein content (r = 0.63), mixing time (r = 0.64), and loaf volume (r = 0.59) in the CIMMYT dataset. The predictive abilities of the two models, BL and RR-BLUP, were similar. As expected, the correlation of marker effects between models decreased at the extreme values for each trait (Supplemental Fig. S2). The BL method induces less shrinkage than RR-BLUP on large marker effects while shrinking many small effects close to zero, and thus BL estimates fewer but larger effects than RR-BLUP (Park and Casella, 2008). Mixing time shows the largest shrinkage effect for RR-BLUP (Supplemental Fig. S2). This was expected, because mixing time is influenced by a few genes with a large effect (Payne et al., 1987). However, when the marker effect values were combined to predict the performance of the parents, there was no difference between GS models (Supplemental Fig. S2). It has been documented elsewhere that different GS models have similar prediction ability in plants (Heslot et al., 2012). Thus, because of the similarity in accuracies between RR-BLUP and BL predictions, all further analyses were conducted using the RR-BLUP model.

Progeny Prediction

The predicted mean performance of the progeny was perfectly correlated between the mid-parent value and mean of the simulated progenies for all traits (data not shown). This result was expected because in an additive model, the expected value of the mean of the progeny calculated as the mid-parent value is the same as the mean of the RILs’ performance. The same results were found in maize (Zea mays L.) for silking date and protein (Bernardo, 2014) and in barley (Hordeum vulgare L.) for yield and deoxynivalenol content (Mohammadi et al., 2015).
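The expectation behind this result can be written out in one line. Assuming a purely additive model and that each RIL inherits the allele at marker k from either parent with probability 1/2 (no selection or segregation distortion), with $x_k^{(1)}$ and $x_k^{(2)}$ denoting the parental marker scores and $u_k$ the marker effects (notation introduced here for illustration):

```latex
E[g_{RIL}] = \sum_{k=1}^{M} u_k \, E[x_k]
           = \sum_{k=1}^{M} u_k \, \frac{x_k^{(1)} + x_k^{(2)}}{2}
           = \frac{g_{P1} + g_{P2}}{2}
```

Linkage does not enter this expression, which is why the simulated progeny mean and the mid-parent value agree up to sampling error.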

The correlation between VLD and VLE was different for each trait and was probably associated with trait complexity (Fig. 1). The order of the correlation between the two variance estimates was grain yield < grain protein content < loaf volume < mixing time. Variance predictions of more complex traits are more inconsistent (Bernardo, 2008) and thus modeling variances for these traits has less impact on progeny predictions. This is probably because the error in the estimates of marker effects has a stronger influence on the variance than on the mean predictions of a cross (Zhong and Jannink, 2007). The higher correlation between variances for mixing time found in our study can be explained by the fact that mixing time is a trait that is influenced by a few major genes (Payne et al., 1987; Simons et al., 2012).

Fig. 1.

Correlation between the variance of predicted progeny performance accounting for linkage disequilibrium (VLD) or assuming linkage equilibrium (VLE), for wheat grain yield, grain protein content, mixing time, and loaf volume.


The variance of predicted performance from 1000 progeny was larger for VLD than for VLE (Fig. 2 and Fig. 3). A potential explanation for this is as follows. The genetic variance for each pair of markers is the sum of the variance of each marker, plus twice the covariance between the two markers (Falconer and Mackay, 1996). When we assume linkage equilibrium, the markers were independent and therefore, the covariance between the markers was zero. If markers are in the coupling phase, the covariance is positive and therefore VLE is smaller than VLD (Lynch and Walsh, 1998; Bernardo, 2010). However, if the markers are in the repulsion phase, the covariance between markers becomes negative and therefore, VLE is larger than VLD (Lynch and Walsh, 1998; Bernardo, 2010). Hence, the relative magnitude of the variances (i.e., VLE and VLD) will depend on the marker phasing. In our case, we believe we have coupling phases in several situations. For example, one of the crosses that exhibits the largest differences in variance estimated by VLD and VLE showed a predominance of coupling phases for most chromosomes (Fig. 4).
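The argument can be made concrete for two markers that both segregate in a RIL population. The notation is introduced here for illustration: $u_1, u_2$ are the marker effects, $x_1, x_2 \in \{-1, +1\}$ are the marker scores of a RIL, $R$ is the proportion of recombinant RILs between the two markers, and $s = +1$ if one parent carries the $+1$ allele at both markers and $s = -1$ otherwise. Under these assumptions $\mathrm{Var}(x_1) = \mathrm{Var}(x_2) = 1$ and $\mathrm{Cov}(x_1, x_2) = s(1 - 2R)$, so that

```latex
V_{LD} = u_1^2 + u_2^2 + 2\, s\, u_1 u_2 \,(1 - 2R),
\qquad
V_{LE} = u_1^2 + u_2^2
```

The covariance term is positive when favorable alleles come from the same parent ($s\,u_1 u_2 > 0$, coupling), giving $V_{LD} > V_{LE}$, and negative in repulsion, giving $V_{LD} < V_{LE}$. It vanishes for unlinked markers, where $R = 1/2$.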

Fig. 2.

Cross performance characterization. Expected mid-parent performance (MP) versus the predicted progeny variance accounting for linkage disequilibrium (VLD) for yield (A) and grain protein content (D). Expected MP versus the predicted progeny variance assuming linkage equilibrium (VLE) for yield (C) and grain protein content (F). The top 1000 crosses selected by MP (green), VLD (blue), VLE (red), and crosses in common among the two groups (black) are highlighted in each plot (A, C, D, F). Venn diagrams represent the 1000 crosses; the parents of those crosses selected by using the MP (green), VLD (blue), VLE (red); or common crosses among the three groups (black) for grain yield (B) and grain protein content (E).

Fig. 3.

Cross performance characterization. Expected mid-parent performance (MP) versus the predicted progeny variance accounting for linkage disequilibrium (VLD) for mixing time (A) and loaf volume (D). Expected MP versus the predicted progeny variance assuming linkage equilibrium (VLE) for mixing time (C) and loaf volume (F). The top 1000 crosses selected by using MP (green), VLD (blue), VLE (red), and crosses in common among the two groups (black) are highlighted in each plot (A, C, D, F). Venn diagrams represent the 1000 crosses or the parents of those crosses selected only by using VLD (blue), VLE (red) for mixing time (B), and MP (green), alongside common crosses among the three groups (black) for loaf volume (E).

Fig. 4.

Relationship between mid-parent performance (MP) and variance for all possible crosses predicted considering linkage disequilibrium (VLD; left) or linkage equilibrium (VLE; right). An extreme cross between a high-performing parent and a low-performing parent is highlighted, as well as the progeny of each of those parents (i.e., dark colors for the high-performing parent and light colors for the low-performing parent). The bottom panel of the figure represents the average allelic phasing for each chromosome calculated for adjacent close markers.


We found a triangular relationship between variance and the mean of predicted progeny performance for the crosses in all traits when modeling VLD, but not when modeling VLE (Fig. 2 and Fig. 3). This relationship was stronger for grain yield than for the other traits (Fig. 2 and Fig. 3). This triangular relationship is generated when lines with similar trait values at the end of their distribution (i.e., low × low or high × high) are crossed, resulting in progeny with extreme values for mean and low variance (Bernardo, 2014). On the other hand, when lines from opposite extremes are crossed (i.e., low × high), the progeny have intermediate means with higher variance. This is supported by previous findings by Bernardo (2014) and Mohammadi et al. (2015). Furthermore, as in Bernardo (2014), we found a linear relationship between the mean and variance of crosses from the same parent prediction when using VLD (Fig. 4), which caused the triangular shape. Recombination between extreme parents would account for differences arising from coupling and repulsion linkage. These differences may not have an immediate impact but could become relevant for longer-term selection (Bernardo, 2014).

Cross Selection

When selecting crosses based on mid-parent values or on the basis of the mean of the top 10% of the progeny via two variance estimates (VLD and VLE), we observed that many of the selected crosses and parents were the same for all three selection methods (Fig. 2 and Fig. 3). Of the 1000 best crosses based on the two variance methods, there were 879 crosses in common for grain yield, 757 for grain protein content, 769 for mixing time, and 738 for loaf volume (Fig. 2B,E and Fig. 3B,E). In addition, 837 of the common crosses selected for grain yield were also among those with the highest mid-parent values. This was also confirmed by the high correlation (r = 0.94, p < 0.001) between total progeny mean and the mean of the top 10% of the predicted progeny performance for the 1000 best crosses (selected by top 10%). These results indicate that the predicted mean progeny performance is the strongest driver for selecting superior crosses for grain yield. Thus, regardless of the approach for predicting progeny variance, the crosses with the highest mid-parent values also provided the best top 10% of the progeny. This was largely independent of the genetic distance between parents. Mid-parent strategies are superior to a combination of mid-parent and progeny variance for selecting crosses when there is more variation in the progeny means than in the progeny variance (Zhong and Jannink, 2007).
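Counting the crosses shared between selection methods amounts to a set intersection over unordered parent pairs. The sketch below is illustrative only; the parent labels and function names are hypothetical, and a cross is keyed by a `frozenset` so that A × B and B × A are treated as the same cross, in line with reciprocal crosses not being predicted.

```python
def cross_key(parent_a, parent_b):
    """Order-free identifier for a cross."""
    return frozenset((parent_a, parent_b))


def shared_crosses(*selections):
    """Crosses present in every selection (each a list of parent pairs)."""
    sets = [{cross_key(a, b) for a, b in selection} for selection in selections]
    return set.intersection(*sets)


# Hypothetical selections from three methods.
by_mid_parent = [("A", "B"), ("A", "C"), ("B", "D")]
by_vld = [("B", "A"), ("C", "D"), ("D", "B")]
by_vle = [("A", "B"), ("B", "D"), ("A", "D")]

common = shared_crosses(by_mid_parent, by_vld, by_vle)
percent_common = 100 * len(common) / len(by_mid_parent)
```

Applied to the selected sets of 1000 crosses, the same intersection would give the counts reported in the Venn diagrams.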

For quality traits, the percentages of crosses in common between the three cross prediction strategies were 56.7% for grain protein content and 59.8% for loaf volume (Fig. 2 and Fig. 3). In addition, the correlations between the total progeny mean and the mean of the top 10% of the predicted progeny performance for the best 1000 crosses (selected by top 10%) were 0.67 (p < 0.001) for grain protein content and 0.80 (p < 0.001) for loaf volume. These results indicate that for quality traits, the variance was a more relevant factor in determining cross selection. The results found by Bernardo (2014) for silking date and protein concentration in maize are more similar to those found for wheat quality traits, in which crosses with an undesirable mean value but high variance could reach the same mean as the top 10% of the progeny.

Accounting for more variance (i.e., by having lower thresholds) decreased the number of shared crosses between VLD and VLE (Fig. 5, left and right plots). Additionally, the mean of the top 10% of the progeny increased as the threshold value increased, regardless of cross variance (Fig. 5, middle plots). This effect was more pronounced for grain yield, which had the highest ratio between the variance of mean predicted progeny performance values and the variance of progeny SD among crosses (Table 1).
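The threshold procedure can be sketched as follows. The sketch reflects one reading of the Fig. 5 caption: crosses are first restricted to the best fraction by mid-parent value (20% for G1 up to 100% for G5), and the crosses with the largest predicted variance are then taken from that subset. The function name, the field names, and the toy data are hypothetical.

```python
import math


def select_by_threshold(crosses, keep_fraction, n_select):
    """Keep the best `keep_fraction` of crosses by mid-parent value,
    then return the `n_select` crosses with the largest predicted variance."""
    ranked = sorted(crosses, key=lambda c: c["mp"], reverse=True)
    # Never keep fewer candidates than the number of crosses to select.
    n_keep = max(n_select, math.ceil(keep_fraction * len(ranked)))
    candidates = ranked[:n_keep]
    by_variance = sorted(candidates, key=lambda c: c["var"], reverse=True)
    return by_variance[:n_select]


def mean_mp(selected):
    return sum(c["mp"] for c in selected) / len(selected)


# Toy data: ten crosses in which mid-parent value and variance are negatively related.
crosses = [{"id": f"C{i}", "mp": 100.0 - i, "var": float(i)} for i in range(10)]

g1 = select_by_threshold(crosses, keep_fraction=0.2, n_select=2)  # strict on the mean
g5 = select_by_threshold(crosses, keep_fraction=1.0, n_select=2)  # variance only

print([c["id"] for c in g1], mean_mp(g1))
print([c["id"] for c in g5], mean_mp(g5))
```

With a strict threshold the selected crosses keep a high mean; with no threshold the selection is driven entirely by variance and the mean of the selected crosses drops.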

Fig. 5.

The best 100 crosses selected using five progeny mean predicted performance thresholds [G1 (20%) > G2 (40%) > G3 (60%) > G4 (80%) > G5(100%)] and maximum variance are indicated in two correlation plots within each trait. Plots on the left are the expected mid-parent performance (MP) versus the variance accounting for linkage disequilibrium (VLD). Plots on the right are the MP versus variance assuming linkage equilibrium (VLE). Blue and red points indicate the best 100 crosses selected by accounting for linkage disequilibrium or assuming linkage equilibrium, respectively. Black points indicate the common crosses between VLD and VLE. The mean predicted progeny performance values of 100% (MP; broken line) or 10% (MP T10; continuous line) of the top performing progeny considering linkage disequilibrium (VLD; blue) or linkage equilibrium (VLE; red) for each selection group (G1–G5) are shown in the middle plots.


Table 1.

Variance of mean predicted progeny performance among crosses, variance of the SD of predicted progeny performance, and the ratio between them. Predictions were obtained by considering linkage disequilibrium for four traits: grain yield, grain protein content, mixing time, and loaf volume.

Trait                        Variance of mean    Variance of SD    Ratio
Grain yield (kg ha–1)        90,969              28,400,576        312
Grain protein content (%)    0.056               0.022             2.5
Mixing time (min)            0.070               0.012             5.8
Loaf volume (cm3)            0.093               0.011             8.5

Although the differences between the variance predicted by the strategies considering VLD or VLE are high for some traits, the parents and the crosses selected on the basis of the top 10% performance of the progeny are similar. Therefore, the use of VLE as a substitute for VLD is a reasonable compromise if computation time or the lack of a linkage map are important considerations. The use of VLE could be favored when one considers only one generation of improvement. For grain yield, where all three methods gave similar results, the use of the mid-parent value would be the simplest method of cross prediction.
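The difference between the two variance predictions can be shown with a small sketch. It is an illustration under stated assumptions and not the computation used in this study: progeny are doubled haploid lines derived from an F1, markers are coded -1/+1 so that each segregating locus has variance 1, all loci lie on one chromosome, and recombination follows Haldane's mapping function without interference. Under these assumptions the covariance between two loci is the product of the parental phases times exp(-2d), where d is the map distance in Morgans. The function and argument names are hypothetical.

```python
import math


def progeny_variance(effects, phase, positions_morgan, linkage=True):
    """Predicted genetic variance among doubled haploid progeny of one cross.

    effects          -- marker effects at loci where the two parents differ
    phase            -- +1 if parent 1 carries the +1 allele at the locus, else -1
    positions_morgan -- map positions on a single chromosome, in Morgans
    linkage          -- False gives the linkage equilibrium value (VLE analogue),
                        True adds covariances between linked loci (VLD analogue)
    """
    variance = sum(b * b for b in effects)  # per-locus contributions only
    if not linkage:
        return variance
    n_loci = len(effects)
    for j in range(n_loci):
        for k in range(j + 1, n_loci):
            d = abs(positions_morgan[j] - positions_morgan[k])
            r = math.exp(-2.0 * d)  # equals 1 - 2c under Haldane's function
            variance += 2.0 * effects[j] * effects[k] * phase[j] * phase[k] * r
    return variance


effects = [1.0, 1.0]
positions = [0.0, 0.1]  # two loci 10 cM apart

print(progeny_variance(effects, [1, 1], positions, linkage=False))  # equilibrium
print(progeny_variance(effects, [1, 1], positions))   # favorable alleles in coupling
print(progeny_variance(effects, [1, -1], positions))  # favorable alleles in repulsion
```

Coupling-phase linkage raises the predicted variance above the equilibrium value and repulsion-phase linkage lowers it, which is why the two predictions can differ even when the resulting cross rankings remain similar.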

On the basis of our results, the best strategy for maintaining diversity is not clear. Fu (2015) discussed the difficulty of measuring and accounting for the impacts of plant breeding on crop genetic diversity. Jacobson et al. (2015) and Rutkoski et al. (2015) found that the losses of genetic diversity incurred when using genome-wide selection are higher than those for phenotypic selection, although there are situations in which some losses of diversity are necessary to obtain increased genetic gains. However, to sustain genetic gain in wheat breeding programs, genetic diversity is essential because it allows effective recombination, which, in turn, creates more diversity (Fehr, 1987). This diversity is fundamental when new challenges are encountered, such as changing races in biotic stressors, or changing environmental factors and abiotic stressors (Dreisigacker et al., 2005; Borlaug, 2007; Babiker et al., 2015). Thus, Fu (2015) emphasized the importance of maintaining allele variability so that improvement remains possible under unfavorable conditions.


Mid-parent value has historically been used for cross selection in plant breeding. Here, progeny mean was estimated using genomic predicted mid-parent value, which was identical to the mean of the simulated progeny. Modeling the variance to predict the best crosses had a small impact for complex traits like grain yield. However, for quality traits, the influence of variance was a large factor in determining cross selection. Differences in selection assuming either VLD or VLE were small for all traits and the assumption of VLE was less computationally intensive. Although selection of parental combinations based on the expected progeny variance for yield had little impact on improving the overall cross performance, maintaining genetic diversity should also be considered. In reality, this may be accomplished through the choice of parents that achieve high average yield through slightly different strategies or yield components.

Supplemental Information

Supplemental Fig. S1. Single nucleotide polymorphism (SNP) map with marker distribution across the wheat genome in the INIA (on the left) and CIMMYT (on the right) populations. Single nucleotide polymorphisms were mapped against the POPSEQ Synthetic W7984 by Opata M85 reference.

Supplemental Fig. S2. RR-BLUP vs BL scatterplots for marker effects (upper row), mid-parent value (MP, middle row), and progeny variance assuming linkage equilibrium (VLE, bottom row) for grain yield, grain protein content, mixing time and loaf volume (traits by column).

Conflict of Interest Disclosure

The authors declare no conflict of interest.


Support for grain yield phenotyping was provided by INIA. We express our appreciation for the effort of the technical personnel of the INIA wheat breeding program: Leonardo Hernández, María Ferreira, Dumas Laun, Abel López, and José Flores. Support for phenotyping of the CIMMYT quality traits was provided by CGIAR Comite Regional Permanente (Permanent Regional Committee) WHEAT, Durable Rust Resistance Project, and Fondo Sectorial Secretaria de Agricultura, Ganaderia, Desarrollo Rural, Pesca y Alimentacion (Secretary of Agriculture, Livestock, Rural Development, Fishery and Food, of Mexico) Consejo Nacional de Ciencia y Tecnología (National Council on Science and Technology, of Mexico) (No. 146788: “Sistema de mejoramiento genético para generar variedades resistentes a royas, de alto rendimiento y alta calidad para una producción sustentable en México de trigo”) of the Mexican government. Funding for CIMMYT GS was provided by the US Agency for International Development (USAID) Feed the Future Initiative (USAID Cooperative Agreement No. AID-OAA-A-13-0005) and the Bill & Melinda Gates Foundation through a grant to Cornell University for “Genomic Selection: The next frontier for rapid gains in maize and wheat improvement”. The opinions expressed herein are those of the author(s) and do not necessarily reflect the views of USAID. Support for the doctoral work of SB was provided by Monsanto’s Beachell-Borlaug International Scholar’s Program. Support for the doctoral work of BL was provided by Agencia Nacional de Investigación e Innovación–Uruguay, through Grant POS_NAC_2013_1_11261 and by Comisión Sectorial de Investigación Científica–Uruguay, through grants in the program internships abroad.




