About Us | Help Videos | Contact Us | Subscriptions
 

The Plant Genome - Original Research

Potential of Association Mapping and Genomic Selection to Explore PI 88788 Derived Soybean Cyst Nematode Resistance

 

This article in TPG

  1. Vol. 7 No. 3
    unlockOPEN ACCESS
     
    Received: Nov 27, 2013
    Published: July 18, 2014


    * Corresponding author(s): neviny@umn.edu
 View
 Download
 Alerts
 Permissions
Request Permissions
 Share

doi:10.3835/plantgenome2013.11.0039
  1. Yong Baoa,
  2. Tri Vuongb,
  3. Clinton Meinhardtb,
  4. Peter Tiffinc,
  5. Roxanne Dennyd,
  6. Senyu Chend,
  7. Henry T. Nguyenb,
  8. James H. Orfa and
  9. Nevin D. Young *d
  1. a Dep. of Agronomy and Plant Genetics, Univ. of Minnesota, 411 Borlaug Hall, St. Paul, MN 55108-6026
    b Division of Plant Sciences and National Center for Soybean Biotechnology (NCSB), Univ. of Missouri, 25 Agriculture Building, Columbia, MO 65211-7140
    c Dep. of Plant Biology, Univ. of Minnesota, 250 Biological Sciences, St. Paul, MN 55108
    d Dep. of Plant Pathology, Univ. of Minnesota, 495 Borlaug Hall, St. Paul, MN 55108

Abstract

The potential of association mapping (AM) and genomic selection (GS) has not yet been explored for investigating resistance to soybean cyst nematode (SCN), the most destructive pest affecting soybean. We genotyped 282 representative accessions from the University of Minnesota soybean breeding program using a genome-wide panel of 1536 single nucleotide polymorphism (SNP) markers and evaluated plant responses to SCN HG type 0. After adjusting for population structure, AM detected significant signals at two loci corresponding to rhg1 and FGAM1 plus a third locus located at the opposite end of chromosome 18. Our analysis also identified a discontinuous long-range haplotype of over 600 kb around rhg1 locus associated with resistance to SCN HG type 0. The same phenotypic and genotypic datasets were then used to access GS accuracy for prediction of SCN resistance in the presence of major genes through a sixfold cross-validation study. Genomic selection using the full marker set produced average prediction accuracy ranging from 0.59 to 0.67 for SCN resistance, significantly more accurate than marker-assisted selection (MAS) strategies using two rhg1-associated DNA makers. Reducing the number of markers to 288 SNPs in the GS training population had little effect on genomic prediction accuracy. This study demonstrates that AM can be an effective genomic tool for identifying genes of interest in diverse germplasm. The results also indicate that improved MAS and GS can enhance breeding efficiency for SCN resistance in existing soybean improvement programs.


Abbreviations

    AM, association mapping; BCP, Bayesian Cπ; BLR, Bayesian LASSO regression; cMAS, conventional marker-assisted selection; FDR, false discovery rate; FI, female index; GEBV, genomic estimated breeding value; GS, genomic selection; LD, linkage disequilibrium; MAF, minor allele frequency; iMAS, improved marker-assisted selection; MAS, marker-assisted selection; PCA, principal component analysis; QTL, quantitative trait loci; RF, random forest; RR, ridge-regression best linear unbiased prediction; RRF, ridge-regression best linear unbiased prediction with major genes fitted as fixed effect; SCN, soybean cyst nematode; SNP, single nucleotide polymorphism; SVM, support vector machine

Soybean [Glycine max (L.) Merr.] is the world’s foremost source of vegetable protein and oil with total crop value exceeding $43.2 billion in the United States in 2012 (www.soystats.com, verified 18 June 2014). However, soybean is a host to several challenging pathogens and pests. The SCN (Heterodera glycines Ichinohe) is a highly recalcitrant endoparasite of roots that causes the most damaging disease in soybean (Koenning and Wrather, 2010). Planting SCN-resistant soybean cultivars, in combination with crop rotation, is the principal way of managing SCN (Chen et al., 2001). Consequently, a more thorough understanding of the genetic basis of SCN resistance will enable soybean scientists to develop more effective resistant cultivars and improved marker-based selection strategies can accelerate the breeding process for complex quantitative traits like SCN resistance.

Previous biparental mapping studies have identified a total of 164 quantitative trait loci (QTL), many only weakly supported, conferring SCN resistance (www.soybase.org). Rhg1 on chromosome 18 and Rhg4 on chromosome 8 were the two genes repeatedly mapped in multiple resistance accessions in biparental populations (reviewed by Concibido et al., 2004). However, a major limitation with this type of genetic mapping is that it only captures a portion of soybean’s overall genetic variability in SCN resistance because it is based on limited numbers of parents. With the advance of high throughput, cost-effective marker genotyping platforms, AM has proved to be a powerful genomic tool for whole genome analysis and genetic dissection of complex traits in crop species (Huang et al., 2010; Jia et al., 2013; Li et al., 2013; Mamidi et al., 2011). Given high-density marker panels, AM provides an opportunity to identify QTL at a higher mapping resolution by taking advantage of historical linkage disequilibrium (LD) with diverse germplasm collections including unstructured populations from breeding programs (Asoro et al., 2013; Mamidi et al., 2011; Sukumaran et al., 2012). In the University of Minnesota soybean breeding program, rhg1 is the only known resistance gene characterized and deployed in the germplasm collection. Potentially, AM can identify novel resistance loci in addition to the known rhg1 through more precise dissection of genetic architecture of SCN resistance not possible in the previous biparental populations. Compared with out-crossing species, fewer markers are needed in AM to cover the entire genome of self-crossing species such as soybean because LD extends over a longer distance (Lam et al., 2010). One of the constraints to AM is the existence of subpopulations in mapping population, which can cause spurious associations when trait variation is correlated with subpopulation structure. Mixed linear models have been developed and applied to AM to reduce the number of the false positive associations caused by population structure and relatedness (Yu et al., 2006; Zhang et al., 2010).

To deploy the major SCN resistance genes in breeding germplasm, several molecular markers have been developed and implemented in breeding programs by means of MAS (Concibido et al., 1996; Cregan et al., 1999; Mudge et al., 1997). Although the use of molecular markers can potentially reduce resources and time needed for breeding disease resistance (Young, 1999), only QTL with large effects are selected and ultimately deployed in improved cultivars. Additionally, the intensive use of large-effect resistance QTL can shift the virulence phenotypes of nematode populations to overcome resistance. Therefore, an alternative marker-based selection strategy that accesses a broader range of variation while breeding for durable SCN resistance is highly desirable. Genomic selection (GS) is a promising selection method for complex traits based on the use of all marker information to capture the genetic variance generated by numerous small-effect loci (Bernardo and Yu, 2007; Meuwissen et al., 2001). Genomic selection involves two phases: model training and prediction. In training phase, data mining algorithms are employed to train a GS model by fitting both phenotypic and genotypic data. In the prediction phase, genomic estimated breeding values (GEBVs) of experimental breeding lines can be calculated using only genotypic data. These GEBVs are then used to select the individual breeding lines for advancement in the breeding cycle. Since 2007, a rapidly increasing number of studies have evaluated the performance of GS for important traits in crop species. In maize (Zea mays L.), barley (Hordeum vulgare L.), and Arabidopsis thaliana datasets, the accuracy of predicting genotypic values was consistently higher with GS than with QTL-based selection (Lorenzana and Bernardo, 2009). Heffner et al. (2011) found that average prediction accuracies using GS could be 28% greater than with MAS and were 95% as accurate as phenotypic selection for single traits in wheat (Triticum aestivum L.). The advantage of genomic over phenotypic prediction of traits in oat (Avena sativa L.) was larger under lower heritability and a larger training dataset (Asoro et al., 2011). More recently, Lorenz et al. (2012) and Rutkoski et al. (2012) evaluated genomic prediction models for Fusarium head blight resistance in barley and wheat, respectively.

The objectives of this project were to apply two state-of-the-art genomic approaches, AM and GS, to (i) explore the diversity of SCN resistance currently present in a public soybean breeding germplasm by assembling a representative collection of soybean accessions; (ii) identify novel resistance loci beyond the known rhg1 gene, which might contribute to breeders’ efforts in improving SCN resistance in soybean breeding; and (iii) evaluate the potential of GS as an improved breeding approach for complex disease resistance in soybean.


MATERIALS AND METHODS

Germplasm and Genotyping

After pedigree analysis of historical crosses collected from the University of Minnesota soybean breeding program using PediTree (van Berloo and Hutten, 2005), we selected a panel of 282 representative accessions based on footprint values that were calculated as the sum of the weighted contribution of individual accessions. The selected accessions included ancestral lines, plant introductions, elite lines, advanced breeding lines, and released public cultivars (Supplemental Table S1). We conducted both AM and GS with the same set of 282 accessions. All selected accessions were planted in the field to increase and generate pure seeds in 2012. DNA was extracted from young leaves of each accession using DNeasy 96 Plant Kit (QIAGEN, Valencia, CA). We genotyped DNA samples using an Illumina GoldenGate SNP assay with the Universal Soy Linkage Panel 1.0 (Hyten et al., 2010). The SNPs with >5% minor allele frequency (MAF) and a missing data rate < 50% were retained, followed by imputation of missing SNP data based on population mean of each marker. Total 1247 SNP markers passed the filters and were used in the subsequent analysis.

Phenotyping

We performed a greenhouse assay of plant responses to SCN HG type 0 at the National Center for Soybean Biotechnology, University of Missouri in spring 2013. The greenhouse experiment was run twice with five plants for each accession in each run. Experiments were conducted according to the Standardized Cyst Evaluation 2008 protocol (Niblack et al., 2009). SCN resistance was determined by calculating the female index (FI) using Hutcheson as susceptible control line (Schmitt and Shannon, 1992). To process samples efficiently, we utilized a fluorescence-based scanner and imaging software to count cysts in Run 1 (Brown et al., 2010), and counted cysts in Run 2 using a stereomicroscope. Previous studies had proved the method as reliable and robust evaluation of SCN resistance (Guo et al., 2005). We then fitted FI estimates into a linear model: yij = u + gi + rj + gi × rj + εij, and performed the ANOVA with basic package anova in R (R Development Core Team, 2010), where yij was the FI of each plant, u was the intercept, gi was the genetic value of ith accession, rj was the mean effect of jth run, gi × rj was the interactive effect of ith accession and jth run, and εij was the residual. We represented the phenotypic value of each accession as the mean of FI across 10 individual plants.

Association Mapping

We first assessed population structure by a model-based approach as implemented in STRUCTURE (Pritchard et al., 2000) and principal component analysis (PCA) implemented in TASSEL (Bradbury et al., 2007). To avoid overestimation of subpopulation divergence due to tightly linked SNP markers (Falush et al., 2003), two subsets of 227 SNP markers with approximately 10 cM spacing were chosen for use in STRUCTURE. For each marker subset, k = 1 to 10 subgroups with each k value were modeled 10 times with a burn-in period and number of replications equal to 10,000 using an admixture model in STRUCTURE. The optimal k was then determined based on the rate of lnPr(X|k) change from k – 1 to k; Pr = posterior probability estimated from the bioinformatics tool STRUCTURE based on the Bayesian model we used in this study. Soybean accessions were assigned to subpopulations based on the highest mean of membership probability. All SNP markers were used to investigate the population structure with PCA implemented in TASSEL (Bradbury et al., 2007). In addition, we estimated the extent of LD and illustrated the pair-wise measures of LD (r2) as a heatmap using Haploview4.2 at the genomic region of interest including rhg1 and FGAM1 (Barrett et al., 2005). We performed the AM for SCN resistance with Kinship (K) and Population Structure + Kinship (Q+K) models, respectively, in the “rrBLUP” package (Endelman, 2011) in R (R Development Core Team, 2010). In Q+K model, the first three principal components from PCA were used to control the population structure. A false discovery rate (FDR) of 0.05 was used to control for false positive associations in AM. Manhattan plots were created based on the AM results with SNPEVG (Wang et al., 2012). By assuming the identified candidate genes act additively, a forward stepwise linear regression model with the FI estimates as dependent variables and the significant SNP markers as explanatory variables was constructed in R (R Development Core Team, 2010). Adjusted R2 values were estimated from the linear regression model representing the percentage of phenotypic variation explained by the candidate genes. These R2 values were likely up-biased estimations because the population structure and relatedness were not adjusted and these significant SNPs were preselected in the simple linear regression model.

Genomic Selection Model

Besides identifying candidate genes, the same set of phenotypic and genetic data was utilized to assess the genomic prediction accuracy for SCN resistance. The GS models we employed were: ridge-regression best linear unbiased prediction (RR; Bernardo and Yu, 2007; Endelman, 2011; Meuwissen et al., 2001), Bayesian LASSO regression (BLR; de los Campos et al., 2009; Park and Casella, 2008; Pérez et al., 2010), Bayesian Cπ (BCP; Habier et al., 2011), support vector machine (SVM; Long et al., 2011), and random forest (RF; González-Recio and Forni, 2011). The RR, BLR, SVM, and RF were implemented with the respective package rrBLUP, BLR, Kernlab, and randomForest in R (R Development Core Team, 2010). The BCP was implemented with GenSel software (BIGS.ansci.iastate.edu, verified 18 June 2014). Specifically, a total of 10,000 burn-ins and 40,000 saved iterations of Markov-Chain Monte Carlo (MCMC) were used in both BLR and BCP; vanilladot kernel and eps-svr type were used in SVM; and 500 trees and four branches were used in RF. All other parameters in each model were adopted from the guidelines and examples in the corresponding references and R package description. To account for major resistance genes identified through AM analysis, we constructed a ridge-regression best linear unbiased prediction with major genes fitted as fixed effects (RRF) model by treating the significant SNPs tagging the major genes as fixed effects while the rest of SNPs as random effects. For each fold in the subsequent cross-validation study, the significant SNP markers were first detected using the AM analysis within training set and the significant SNPs then fitted as fixed effect in the RRF model. Additionally, we extended the RRF model to FGAM, rhg, and rhg+FGAM models by fixing the significant SNP(s) tagging the corresponding genes as fixed effects.

Marker Assisted Selection Model

To set up reference methods to compare with, we constructed two MAS models: multiple linear regression model fitted with two rhg1-associated SNP markers as fixed effect to represent conventional MAS (cMAS) approach frequently implemented in the soybean breeding program for SCN resistance; and multiple linear regression model fitted with 35 top SNP markers selected based on their association with phenotypic variation as fixed effect to represent an improved MAS approach (iMAS). For each fold in the subsequent cross-validation study, SNP effects were estimated from the linear regression models in the training set and then used to predict phenotype of SCN resistance in the validation set. Because our AM study had identified four significant SNP markers tagging rhg1 gene, all six possible pairs of SNPs were sampled from the four and fitted as fixed effect in the cMAS model. The average of prediction accuracy from all six pairs was estimated. For the iMAS model, the top 35 SNP markers detected using the AM analysis within training set in each fold were fitted as fixed effect in the model. The linear regression analysis was performed with basic R package lm (R Development Core Team, 2010).

Cross Validation

We conducted a sixfold cross-validation study to avoid inflated estimation of the prediction accuracy of GS and MAS for SCN resistance. All soybean accessions first were randomly divided into six subsets. In each fold, five subsets of lines were used as training sets and the remaining subset was a validation set. Marker effects were estimated from eight different models by fitting the genotypic and phenotypic data in the training set. Marker-based prediction of line performance in the validation set was calculated by summing all marker effects of that line using only genotypic data. Prediction accuracy was calculated as the correlation between marker-based prediction and phenotypic values. The mean, SD, and confidence interval of the six folds were calculated for each model. One-tail paired t-test was used to compare the prediction performance of all assayed models to reference models cMAS and iMAS, respectively.

Marker Number

We also determined the effect of marker numbers on GS accuracy through a sixfold cross-validation by including random samples of 96, 192, 288, 384, 768, and 1152 SNPs from the full marker set in the respective RR and RRF model. Within each fold, this was repeated 100 times to avoid sampling bias for markers and the average of 100 replications was used to represent the prediction accuracy of each fold. The mean of the six folds was calculated. All prediction accuracies were estimated with R package rrBLUP (R Development Core Team, 2010).


RESULTS

Phenotypic Analysis

The assayed soybean accessions had a mean FI = 84 with a range from 2 to 154 (Fig. 1). ANOVA for the FI indicated that the effect of accession, run, and accession by run had significant effects (Table 1). The vast majority of accessions were moderately to completely susceptible to SCN HG type 0, with only 11 accessions being highly resistant (FI < 10) and eight accessions being moderately resistant (10 < FI < 30; Fig. 1). All 11 resistant accessions share a single resistance source of PI 88788 with no other resistance sources uncovered through pedigree searching.

Figure 1.
Figure 1.

Soybean cyst nematode (SCN) female index (FI %) for 282 soybean accessions. R, resistant (FI < 10), blue; MR, moderately resistant (10 < FI < 30), yellow; MS, moderately susceptible (30 < FI < 60), green; S, susceptible (FI > 60), red.

 

View Full Table | Close Full ViewTable 1.

Analysis of variance for female index (FI %) on soybean accessions in greenhouse assay.

 
Source of variation df MS† F-value P (>F)
RUN 1 371,171.43 119.63 <0.001
ACCESSION 281 21,716.07 7.00 <0.001
RUN × ACCESSION 280 4,971.15 1.60 <0.001
MS, mean of square.

Marker Profile, Population Structure

Distances between adjacent markers ranged from 0 to 29.4 cM, with a mean of 1.4 cM (Supplemental Table S2). Approximately 80% of adjacent markers were within 2 cM of one another and only 6% were >5 cM apart (Supplemental Table S2). Among a total of 1521 polymorphic markers, 69 markers had MAF < 0.01, and 232 markers had MAF < 0.05 (Supplemental Fig. S1).

Each subset used in STRUCTURE included a random set of 227 SNP markers spanning all 20 linkage groups with an average gap size of 10 cM. Plots of natural logarithm probability difference k and k – 1 (Δk) against k were similar in the pattern for two subsets (Supplemental Fig. S2). We observed the first rapid drop in posterior probability from k = 3 to k = 4 in both subsets, suggesting the presence of three theoretical subpopulations in our germplasm pool (Supplemental Fig. S2). The pair-wise PCA also suggested a pattern of three clusters in the germplasm pool (Fig. 2). Based on the incomplete pedigree information and breeder’s knowledge, we identified three distinct genetic backgrounds corresponding to the three subpopulations, namely, High Protein, High Yield, and Small Seeds (Fig. 2). Specifically, subpopulation High Protein contains mostly University-released high-protein specialty cultivars and high-protein ancestors such as Kasota, Kato, Proto, and Toyopro. Subpopulation High Yield contains mostly University-released commodity cultivars and North American elites such as Agassiz, Alpha, Amsoy, Archer, Bell, Capital, Chico, Clay, Evan, Freeborn, Mccall, and Traill. Subpopulation Small Seeds contains largely University-released food-type cultivars such as Minnatto and numerous Plant Introduction lines. These three groups were approximately coincident with the three major categories of breeding materials the University of Minnesota soybean breeding program has created in the history. Soybean accessions with moderate to high resistance to SCN were all included in subpopulation High Yield (Fig. 2), indicating the breeder’s historical efforts of stacking SCN resistance with high yield commodity cultivars.

Figure 2.
Figure 2.

Principal component (PC) analysis of the germplasm pool. Soybean accessions were assigned to three subpopulations based on their highest membership probability estimated in STRUCTURE, namely, High Protein, High Yield, and Small Seeds, represented by circles, triangles, and diamonds, respectively. The SCN resistance phenotypes, moderately resistant (MR), moderately susceptible (MS), resistant (R), and susceptible (S), are represented by the color of blue, red, green, and black, respectively.

 

Association Mapping

Since the results from both STRUCTURE and PCA indicated the presence of three theoretical subpopulations in our germplasm pool, we fixed the first three PCs in Q+K model. The quantile-quantile plots showed the Q+K model performed slightly better than the K model for controlling Type I error caused by population stratification (Supplemental Fig. S3). Thus, the Q+K model was implemented in AM for SCN resistance in the subsequent analysis. We detected a total of six significant SNPs, all on chromosome 18, in AM with FDR of 0.05, but observed no additional significant SNPs on any other chromosome. The linear regression using the six significant SNPs as the explanatory variables collectively explained 49% of phenotypic variation, which was likely biased upward by using the preselected SNPs based on their significant association with phenotypes in AM (Stanton-Geddes et al., 2013).

Four of the six highly significant SNPs were located 3 kb to 258 kb away from the center of rhg1 gene that has been reported to confer the resistance to SCN previously in numerous studies (Table 2, Fig. 3; reviewed by Concibido et al., 2004). The four significant SNPs were in a cluster of high LD (Fig. 4). By comparing the haplotypes among resistant and susceptible lines, we identified a >600 kb haplotype block of SNP markers including the four significant ones that appears consistent with intensive selection for SCN resistance in modern breeding efforts (Table 3).


View Full Table | Close Full ViewTable 2.

The significant single nucleotide polymorphisms detected from association mapping for soybean cyst nematode resistance.

 
Marker MAF† Chromosome Position (bp) Nearby gene –log10(P)
BARC-048271-10520 0.47 18 1,705,138 rhg1 5.04
BARC-G01477-00243 0.34 18 1,710,320 rhg1 4.34
BARC-012259-01773 0.16 18 1,776,719 rhg1 4.92
BARC-012295-01800 0.14 18 1,970,944 rhg1 3.84
BARC-047665-10370 0.08 18 2,833,147 FGAM1 11.96
BARC-019001-03050 0.28 18 55,961,797 Glyma18g46201 4.72
MAF, minor allele frequency.
Figure 3.
Figure 3.

Manhattan plot of association mapping with Q+K model for soybean cyst nematode (SCN) resistance. The green line represents the false discovery rate of 5%.

 
Figure 4.
Figure 4.

Linkage disequilibrium (LD) heatmap of 17 single nucleotide polymorphisms (SNPs) near rhg1 and FGAM1 on chromosome 18. Number in each cell is the pair-wise LD calculated as r2 × 100. Color scheme: r2 = 0, white; 0 < r2 < 1 shades of grey; r2 = 1, black. The SNPs within box are significantly associated with resistance to SCN HG type 0.

 

View Full Table | Close Full ViewTable 3.

Haplotype analysis of rhg1 locus conferring resistance to soybean cyst nematode (SCN) HG type 0.

 
Line† FI‡ BARC-054083-12329
BARC-040479-07752
BARC-012237-01756
BARC-048277-10538
BARC-048275-10534
BARC-G01477-00243
BARC-048271-10520
BARC-048801-10723
BARC-012259-01773
BARC-012289-01799
BARC-012295-01800
BARC-015067-02556
BARC-025777-05064
BARC-047665-10370
BARC-014395-01348
–660 –484 –28 –17 –8 –8 –3§ 6 64 245 258 442 584 1120 1735
R 2 GG GG TT¶ AA AA AA GG GG AA AA CC GG GG GG AA
R 2 GG GG TT AA AA AA GG GG AA AA CC GG GG AA GG
R 4 CC CC TT AA AA AA GG GG AA CC GG GG AA GG
R 5 GG GG TT AA AA AA GG GG AA AA CC GG GG AA GG
R 5 GG GG TT AA AA AA GG GG AA AA CC GG GG AA GG
R 5 GG GG TT AA AA AA GG GG AA AA CC GG GG AA GG
R 6 CC CC TT AA AA AA GG GG AA AA CC GG GG AA GG
R 6 GG GG TT AA AA AA GG GG AA AA CC GG GG AA AA
R 7 GG GG TT AA AA AA GG AG AA AA CC GG GG GG
R 7 GG GG TT AA AA AA GG GG AA AA CC GG GG GG GG
R 7 GG GG TT AA AA AA GG GG AA AA CC GG GG GG AA
MR 12 GG GG TT AA AA AA GG GG AA AA CC GG GG AA AG
MR 14 GG GG TT AA AA AA GG GG AA AA CC GG GG GG AA
MR 14 GG TT AA AA AA GG GG AA AA CC GG GG AA GG
MR 14 GG GG TT AA AA AA GG GG AA AA CC GG GG AA GG
MR 22 GG GG TT AA AA AA GG GG AA AA CC GG GG GG GG
MR 23 GG GG TT AA AA AA GG GG AA AA CC GG GG AA AA
MR 24 GG GG TT AA AA AA AA AA CC GG GG GG AA
MR 29 CC CC AA GG GG AA AA AA TT GG CC CC GG AG
MS 30 CG CG AA AG AT AA CC CC GG GG GG
MS 39 CC CC AA AA
MS 42 GG GG TT GG GG GG AA GG TT GG CC CC GG AA GG
MS 44 GG GG TT GG GG GG AA AA AA CC GG GG
MS 44 CC CC TT AA AA AA GG GG AA AA CC GG GG AA GG
MS 44 GG GG TT AA AA AA GG GG AA AA CC GG GG GG
MS 53 GG GG TT AA AA AA GG GG AA AA AA CC GG GG GG
MS 54 GG CC TT GG GG GG AA GG TT GG CC GG GG GG GG
MS 57 CG CG TT GG GG GG AA AA TT AA AA CC GG GG AA
MS 58 CC CC TT GG GG GG AA AA TT AA AA CC GG GG AA
S 63 CC CC TT AA AA GG AG AA AA GG GG AA
S 63 CC CC TT GG GG GG AA AA TT AA AA CC AG GG GG
S 66 GG CC TT GG GG GG AA AA TT AA AA CC GG GG AA
S 67 CC CC TT AA GG AA AA AA TT AA AA CC GG AA GG
S 67 CC CC TT AG GG AT GG GG GG GG
S 68 CC CC AA AA AA AA AA TT AA CC CC AA AA
S 68 CC CC TT AA AA AA GG GG AA AA AA CC GG AA GG
S 68 CC CC TT GG GG GG AA GG TT GG CC GG GG GG AA
S 68 GG GG TT GG GG GG AA AA TT AA AA CC GG GG AA
S 69 CG CG TT GG GG GG AA GG TT GG CC CC GG AA GG
S 69 GG GG TT GG GG GG AA AA TT AA CC GG GG GG
S 70 CC CC TT GG GG AA AA TT AA CC AG GG AA
S 70 CC CC TT GG GG GG AA GG TT GG CC GG GG GG AA
S 70 GG GG TT GG GG GG AA AA TT AA CC GG GG
S 70 CC CC TT GG GG GG AA AG TT GG AA CC GG GG AA
R, resistant (FI < 10); MR, moderately resistant (10 < FI < 30); MS, moderately susceptible (30 < FI < 60); S, susceptible (FI > 60). None of the accessions with FI > 70 have resistance haplotype, and are not included in this table.
FI, female index of SCN HG type 0.
§The values in kb in first row represent the single nucleotide polymorphisms with varying distances from rhg1.
Boldface values within the table represent a discontinuous, long-range haplotype of over 600 kb around rhg1 locus associated with resistance to SCN HG type 0.

SNP (BARC-047665-10370) with the –log10(P) value of 11.92 (Table 2) was located in the coding region of FGAM1 gene, which has previously been shown to exhibit differential gene expression in response to SCN feeding (Vaghchhipawala et al., 2004). This SNP was not in the same haplotype block with rgh1 (Table 3, haplotype blocks defined by complete LD between markers), but it was in the strong LD (r2 ≈ 0.8) with all significant SNP markers at rhg1 (Fig. 4).

Additionally, we identified a significant SNP (BARC-019001-03050) at Glyma18 g46201 on the opposite end of chromosome 18 that is predicted to encode a ring finger protein (Table 2; Fig. 3). BARC-019001-03050 was approximately 2.1 Mb away from an SSR marker (Satt517) tagging a previously described QTL associated with resistance to SCN HG type 0 in a biparental mapping population derived from a S08-80 × PI 464925B cross (Winter et al., 2007).

Prediction Accuracy

The same set of phenotypic and genotypic data used in the AM analyses was used to assess the genomic prediction accuracy for SCN resistance through a sixfold cross-validation. The prediction accuracy for SCN resistance with the RR model using the full marker set ranged from 0.44 to 0.80 with a mean of 0.66 (Table 4). When major genes were fitted as fixed effects, the RRF model resulted in a mean of prediction accuracy of 0.61, not significantly different from any other GS models (Table 4). Three to eight SNP markers were detected as significant through AM analysis in the training set in each fold, and subsequently fitted as fixed effects in the RRF model. The signals at rhg1 and FGAM1 were always significant across six folds. Both BLR and BCP models assume that only a small portion of loci are causal loci with large effects and noncausal loci have infinitesimal to no effect. The prediction accuracy ranged from 0.59 to 0.75, with a mean of 0.67 in the BLR model and ranged from 0.48 to 0.71 with a mean of 0.62 in the BCP model (Table 4). The SVM and RF models with capacity to capture nonadditive sources of genetic variability including dominance and epistasis did not outperform the additive linear regression models neither in the case of SCN resistance (Table 4).


View Full Table | Close Full ViewTable 4.

The mean, SD, and confidence interval (CI) of prediction accuracy with various models estimated from sixfold cross-validation.

 
Model† Fold
Mean SD CI‡ p-value§
1 2 3 4 5 6 cMAS iMAS
cMAS 0.55 0.46 0.56 0.42 0.43 0.51 0.49 0.06 0.05 0.03
iMAS 0.81 0.68 0.54 0.72 0.5 0.51 0.63 0.13 0.10 0.03
RR 0.77 0.8 0.64 0.64 0.44 0.67 0.66 0.13 0.10 0.01 0.25
RRF 0.76 0.75 0.62 0.59 0.32 0.64 0.61 0.16 0.13 0.04 0.41
BLR 0.64 0.66 0.75 0.65 0.59 0.74 0.67 0.06 0.05 0 0.26
BCP 0.68 0.71 0.68 0.58 0.59 0.48 0.62 0.09 0.07 0.01 0.44
RF 0.74 0.82 0.68 0.63 0.46 0.71 0.67 0.12 0.10 0 0.21
SVM 0.72 0.67 0.58 0.59 0.35 0.62 0.59 0.13 0.10 0.04 0.19
cMAS, conventional marker assisted selection; iMAS, improved marker assisted selection; RR, ridge-regression best linear unbiased prediction; RRF, ridge-regression best linear unbiased prediction with major genes fixed; BLR, Bayesian linear regression; BCP, Bayesian Cπ; RF, random forest; SVM, support vector machine.
α = 0.05.
§p-value obtained from one-tail paired t-test each of assayed models compared to cMAS and iMAS model, respectively.

We also compared the six GS models with two reference MAS models: cMAS model represented by multiple linear regression fitted with two rhg1-associated SNP markers, and iMAS model represented by multiple linear regression fitted with 35 top SNP markers detected through AM within training set. The mean prediction accuracy of cMAS model was only 0.49, significantly lower than that of all GS models and the iMAS model (Table 4). All GS models and the iMAS model performed equivalently (Table 4).

To further determine the sufficient number of markers for sustaining high accuracy and meanwhile reducing genotyping cost, we compared the prediction accuracy estimated from RR, rhg, FGAM, and rgh+FGAM models with different sizes of marker set. The prediction accuracy generally increased as the number of SNP markers increased, and the gain in accuracy became minimal when more than 288 SNPs were used (Fig. 5). The rhg+FGAM model using as few as 96 genome-wide SNPs as random effects produced the prediction accuracy of 0.60, while the RR model was only 0.50 (Fig. 5). Using more markers substantially increased the prediction accuracy in the RR model, but had little effect in the rhg+FGAM model (Fig. 5).

Figure 5.
Figure 5.

The mean of prediction accuracy with different major gene(s) fixed and different number of single nucleotide polymorphism (SNP) markers in genomic selection for soybean cyst nematode resistance. The prediction accuracy was the mean of six folds estimated from sixfold cross-validation with 100 replications within each fold. RR, ridge-regression best linear unbiased prediction; rhg+FGAM, RR with both rhg1 and FGAM1 fixed by treating four significant SNPs at rhg1 and 1 significant SNP at FGAM1 as fixed effects; FGAM, RR with FGAM1 fixed by treating 1 significant SNP at FGAM1 as fixed effects; rhg, RR with rhg1 fixed by treating 4 significant SNPs at rhg1 as fixed effects.

 


DISCUSSION

Limitations of the SNP Array

The relatively low SNP density on our genotyping panel might have limited the power of AM to identify causal variants or even generated biased results, so our study should be interpreted with caution. With only 1247 polymorphic SNPs spread across the approximately 1100 Mb soybean genome, markers generally would not be in complete LD with all the causal genes controlling variation in SCN resistance. Stanton-Geddes et al. (2013) empirically compared the AM gene candidates identified with resequencing data in Medicago truncatula versus reduced-representation SNP arrays and showed that SNP arrays could bias AM results. With more recently developed high-density SNP chip (Song et al., 2013) and sequencing-based genotyping including genotyping-by-sequencing (Elshire et al., 2011; Xu et al., 2013), AM should enable improved dissection of genetic variation and pinpoint causal variants more accurately in future investigations.

The average significant LD extent was 29.3 cM based on the marker panel used in this population (data not shown). Linkage disequilibrium was extensive, with long-range LD (>20 cM) observed in over half of all significant pair-wise intrachromosomal LD (Supplemental Table S3). This agrees with the LD pattern identified in soybean with resequencing data by Lam et al. (2010). Extremely high LD is one of the distinctive characteristics of soybean genome compared with most other crop species. In a high LD species, less number of markers is needed for mapping and MAS while more linkage drag and low mapping resolution are expected.

SCN Resistance Genes

The robustness of AM still enabled us to rediscover the SCN resistance gene: rhg1 using moderate-density SNP markers in a diverse set of breeding germplasm. In previous linkage mapping studies, rhg1 on chromosome 18 was repeatedly mapped in resistant accessions such as Peking, PI 88788, PI 209332, and PI 437654, while second gene, Rhg4 on chromosome 8, was identified only in Peking, PI 209332, and PI 437654 (reviewed by Concibido et al., 2004). Kim et al. (2010) pinpointed the rhg1 locus in a 67-kb genomic region on chromosome 18. More recently, Cook et al. (2012) performed fine mapping at rhg1 locus and found that copy number variation of a genomic segment spanning three genes was required to confer resistance.

By contrast, we failed to observe Rhg4 in our AM, presumably due to the lack of resistant accessions with Peking or other sources distinct from PI 88788 in the germplasm pool. We performed a pedigree search for all soybean lines with FI < 10 in our germplasm pool and only identified PI 88788 in their ancestry, but no other sources of resistance. To reveal the genetic basis of SCN resistance more thoroughly in further AM studies, a larger number of soybean accessions, including Plant Introduction lines with resistance sources distinct from PI 88788, is essential. Another possibility could be the confounding effects of two effective rhg genes, because we only evaluated plant response to SCN HG type 0 that is avirulent to both PI 88788 and Peking resistance sources. We might expect different responses if plants were inoculated with SCN HG type 2, which is virulent to PI 88788, but avirulent to Peking.

In addition to the strong signal at rhg1 locus, we identified a significant SNP with MAF of 8% within the coding region of FGAM1 gene. FGAM synthase expression has been shown to respond to nematode feeding in the Arabidopsis thalianaHeterodera schachtii system (Vaghchhipawala et al., 2004). Although the FGAM1 gene resides about 1.1 Mb away from rhg1 gene, its co-localized SNP exhibited extensive LD (r2 ≈ 0.8) with almost all rhg1-associated SNPs in the population (Fig. 4). To determine whether this significant SNP detected at FGAM1 was actually tagging the rhg1 locus, we tested a mixed model fitting rhg1 locus as a fixed effect in the AM analyses, and found the SNP at FGAM1 locus was still significantly associated with SCN resistance (Supplemental Fig. S4). This suggests that the FGAM1 locus independently accounts for a considerable amount of SCN resistance variation in our germplasm panel.

The significant SNP identified at the opposite end of chromosome 18 was in the noncoding region of gene Glyma18 g46201 annotated as ring finger protein (Schmutz et al., 2010). No apparent nucleotide-binding site-leucine-rich repeat (NBS-LRR) or other potential resistance genes are known to be located nearby (Schmutz et al., 2010). Although a previously described resistance QTL was found in the vicinity, the closer flanking SSR marker was still about 2.1 Mb away from this SNP (Winter et al., 2007). It is possible that this SNP may tag a distant causal gene within the nearby resistance QTL interval or other uncharacterized genomic regions associated with SCN resistance. The candidate gene Glyma18 g46201 deserves further confirmation and molecular characterization.

Genomic Selection Models

More sophisticated, but computation-intensive Bayesian and machine learning models did not outperform additive linear models such as RR and RRF in prediction accuracy (Table 4). This may indicate that the additive linear models already capture most of the genetic variation present in our germplasm pool. In other words, the causal genes controlling SCN resistance act additively, while nonadditive gene actions including dominance and epistasis apparently contribute relatively little to variation in SCN resistance. Genomic selection studies conducted in maize, wheat, oat, and barley for both agronomic and disease traits also suggested slight differences among various genomic prediction algorithms (Asoro et al., 2011; Lorenzana and Bernardo, 2009; Lorenz et al., 2012; Rutkoski et al., 2012).

By taking into account of the presence of major genes, the RRF model didn’t increase prediction accuracy compare with the RR model (Table 4), consistent with the findings in a simulation study in maize (Bernardo, 2013). In that study, when R2 was 50%, heritability of trait was 0.5 and population size was 250, similar to the case of SCN resistance we studied here, selection response from GS with major gene(s) fixed was only 6% greater than GS treating major gene(s) as random effects (Bernardo, 2013). Additionally, we compared the RRF models with different major gene(s) fixed and found the higher prediction accuracy with more genes fixed (Fig. 5). The difference of prediction accuracy among different RRF models was more distinct when few random markers were used for GS modeling (Fig. 5).

Genomic Selection versus Marker Assisted Selection

In soybean breeding populations targeting SCN resistance, MAS is frequently used to evaluate the SCN resistance of breeding candidates by detecting the flanking markers of major resistance genes such as rhg1 and Rhg4. However, breeders might still miss a large proportion of total genetic variation in MAS caused by numerous loci of small-moderate effects. Potentially, this additional variation in SCN response can be captured through the use of genome-wide SNPs implemented through GS. As indicated in Table 4, breeders may be able to improve average prediction accuracy up to 0.67 through the use of all marker information, which is significantly more accurate than the conventional MAS approach with accuracy of only 0.49.

Interestingly, the iMAS model using top 35 SNPs performed equivalently to the RR model using full marker set in our study (Table 4). This is in contrast to the results from empirical GS recurrent selection for yield and stover index in maize and cross-validation study for agronomic traits in wheat (Heffner et al., 2011; Massman et al., 2013). A likely reason for the high predictive accuracy of iMAS in our study could be that two genes of fairly large effect control considerable variation in SCN resistance in our germplasm pool. Among the top 35 SNPs used in iMAS, we observed the six significant SNPs detected in the full panel across all the six folds. This is consistent with the narrow genetic base of SCN resistance in U.S. breeding programs, with >90% of SCN resistant cultivars grown in the Midwest coming from just a single source, PI 88788 (Concibido et al., 2004). Continuously selecting for the same major resistant genes and growing the same resistant cultivars is likely to lead to nematodes that overcome resistance (Mitchum et al., 2007). Thus, introduction of exotic soybean germplasm to improve SCN resistance diversity should remain an important goal in current SCN breeding programs.

In the presence of major genes in a breeding population, the preferred marker-based selection strategy, either MAS or GS, depends on the breeding goal and resource allocation. For example, when soybean breeders would like to select candidate lines with moderate resistance or horizontal resistance besides the rhg1-conferring resistance, GS tends to be more accurate than MAS in predicting the quantitative difference in plant responses to SCN. In contrast, the haplotype information we identified (Table 3) appeared to be sufficient for breeders to select for the highly resistant candidates in the breeding population. Since the iMAS model using 35 top SNPs had a mean of prediction accuracy of 0.63, equivalent to the GS model using full marker set (Table 4), the iMAS obviously is more cost-effective and likely implemented in the current breeding program for SCN resistance. However, the iMAS wouldn’t be sustainably effective as GS in a high throughput breeding program because many of 35 SNP markers tend to be fixed in breeding germplasm after few cycles of selections. With continuously declining genotyping costs, the iMAS might be replaced by GS considering increased genetic gain per unit cost.

Both iMAS and RRF models are probably germplasm-specific because they only capture the resistance loci currently present in our germplasm pool. In other words, the predictive ability of our model might be limited if applied to a germplasm pool where SCN-relevant loci distinct from the ones in ours. The model should therefore be updated to integrate the markers associated with the novel resistance gene(s) that the breeders desire to introgress into the germplasm pool. To maintain high prediction accuracy, model updating also applies to GS to ensure that the target resistance genes are represented in GS models.

Implementing MAS/GS in SCN Resistance Breeding

More accurate marker-based prediction, either iMAS or GS, can effectively accelerate SCN resistance breeding by increasing genetic gain per unit time and reducing the need for extensive phenotyping in soybean breeding program. Since the 1950s, soybean breeders have consistently introduced production lines with a limited set of exotic sources of SCN resistance to introgress with high-yielding elite lines in soybean breeding programs, including the program at the University of Minnesota. The breeding program has successfully developed and released numerous SCN-resistant soybean cultivars adapted to various soybean growing regions across short maturity groups through conventional MAS followed by extensive field trials. Despite the success in SCN resistance breeding, breeders agree that a cost-effective solution for sustainable resistance is required to meet the challenge of emerging SCN races of more virulence. To achieve the benefits of marker-based selection strategies over phenotypic selection, recurrent selection for multiple cycles is an effective method to increase genetic gain per unit time for complex traits. Given the high estimate of prediction accuracy, MAS with 35 most important SNPs or GS with hundreds of genome-scale SNPs can be developed as a molecular tool to assist breeders in selecting SCN resistant lines in soybean breeding programs. The more accurate prediction of phenotypes allows breeders to increase the selection intensity, leading to fewer candidate lines to be tested in the following expensive yield trials.

In an actual breeding program, breeders usually select promising lines based on the evaluations of multiple traits. A multitrait GS model can be developed by simultaneously fitting phenotypic data from the evaluations of yield, SCN resistance, and other traits of interest as dependent variables in the model. Without the need for including additional markers specifically for SCN resistance, the multitrait GS model using the existing marker panel leads to simultaneous prediction of genetic values for several traits including SCN resistance. Moreover, the prediction accuracy for low-heritability traits, such as yield, could be improved by taking the advantage of their genetic relationship with high-heritability traits in the multitrait GS (Jia and Jannink, 2012).

Empirical recurrent selection with GS has been conducted to improve agronomic traits and introgress desired exotic traits to elite lines, exhibiting superiority to phenotypic selection and MAS in cross-pollinated crop species such as maize (Combs and Bernardo, 2013; Massman et al., 2013). Recurrent selection in self-crossing species like soybean requires laborious pollination to obtain sufficient F1 seeds in each recombination. Bernardo (2010) proposed a “select-combine-self” scheme for self-crossing crops and indicated that the selection response from GS with minimal crossing was comparable to maize by intensive use of a year-round nursery. Future studies to empirically determine the genetic gain per unit cost and time in GS or improved MAS compared with conventional MAS in an SCN breeding population will facilitate the adoption of improved marker-based selection strategy in the existing breeding program.


CONCLUSIONS

We present the first study using AM to dissect the genetic architecture of SCN resistance and evaluate the ability of GS in predicting SCN resistance. Significant signals were detected at two SCN resistance genes: rhg1 and FGAM1, plus the third locus located at the opposite end of chromosome 18. Estimates of high prediction accuracy suggest that improved MAS and GS has the potential for accurate prediction in SCN resistance breeding. Overall, our results indicate that advanced genomic tools provide insights into the genetic basis of complex disease traits as well as offer the possibility for greater genetic improvement in developing improved cultivars.

Supplemental Information Available

Supplemental information is included with this article.

Acknowledgments

We thank Gerald C. Decker, Darcy F. Weston, Rafeal H. Echenique, Phil Schaus, and other members of the University of Minnesota soybean project for providing the soybean seeds and conducting the field planting. This work also benefited from discussions with Dr. Rex Bernardo. This research was supported by Minnesota Agricultural Experiment Station project MN-22-015 and by Minnesota Soybean Production and Promotion Council grants14-12C and 18-13CN.

 

References

Footnotes



Files:

Comments
Be the first to comment.



Please log in to post a comment.
*Society members, certified professionals, and authors are permitted to comment.