 

The Plant Genome - Original Research

Target Amplicon Sequencing for Genotyping Genome-Wide Single Nucleotide Polymorphisms Identified by Whole-Genome Resequencing in Peanut

 


Vol. 9 No. 3
Open access
Received: March 28, 2016
Accepted: August 28, 2016
Published: October 6, 2016


    * Corresponding author(s): shirasaw@kazusa.or.jp

doi:10.3835/plantgenome2016.06.0052
  1. Kenta Shirasawa (a, *)
  2. Chikara Kuwata (b)
  3. Manabu Watanabe (b)
  4. Masanobu Fukami (b)
  5. Hideki Hirakawa (a)
  6. Sachiko Isobe (a)
  a Kazusa DNA Research Institute, 2-6-7 Kazusa-Kamatari, Kisarazu, Chiba 292-0818, Japan
  b Chiba Prefectural Agriculture and Forestry Research Center, 808 Daizennocho, Midori, Chiba 266-0006, Japan

Abstract

Genome-wide genotyping data for breeding materials are essential resources for improving breeding efficiency, especially in plants with complex polyploid genomes. Several current breeding efforts in cultivated peanut (Arachis hypogaea L.), which has a tetraploid genome, are devoted to developing high oleic acid cultivars. Genetic maps for such breeding programs have been developed using simple-sequence repeat (SSR) markers, which require time-consuming electrophoretic analyses. Next-generation sequencing (NGS) technology can overcome this technical hurdle. Initially, we attempted double-digest restriction site-associated DNA sequencing on peanut breeding materials used to develop high oleic acid cultivars. However, this method was not effective because few single nucleotide polymorphisms (SNPs) were available, owing to the low genetic diversity of the lines. Because the genome sequences of the probable diploid ancestors of cultivated peanut, A. duranensis Krapov. & W. C. Greg. and A. ipaënsis Krapov. & W. C. Greg., are available, we next used whole-genome resequencing to obtain genome-wide SNP data. In this analysis, we observed large biases in the numbers and genomic positions of interspecific and intraspecific SNPs. For genome-wide genotyping, we selected a subset of SNPs covering the peanut genome as targets for amplicon sequencing analysis. Using this technique, we determined the genome-wide genotypes of the breeding materials easily and rapidly. The SNP information and analytic methods developed in this study should accelerate genetics, genomics, and breeding in peanut.


Abbreviations

    ddRAD-Seq, double-digest restriction-site-associated DNA sequencing; GBS, genotyping-by-sequencing; Ka/Ks, nonsynonymous/synonymous mutation ratio; MABS, marker-assisted backcrossing selection; NGS, next-generation sequencing; O/L, ratio of oleic to linoleic acid; QTL, quantitative trait loci; SNP, single nucleotide polymorphism; SSR, simple-sequence repeat; TAS, target amplicon sequencing; Ts/Tv, transition/transversion ratio

Peanut is one of the most important crops in the world for food and oil production, especially in semiarid tropical areas, because of the species' high drought tolerance. Because peanut is a major oil crop, improving the composition of its oil is an important target of breeding projects. In normal peanut cultivars, the major oil components are oleic acid and linoleic acid. Oleic acid is a monounsaturated fatty acid, whereas linoleic acid is polyunsaturated; therefore, oleic acid is less susceptible to oxidation than linoleic acid and is considered better for health and for long-term storage (Grundy, 1986). The ratio of oleic to linoleic acid (O/L) is usually ∼4 but can reach 30 or 40 in high-O/L cultivars (Norden et al., 1987). Such cultivars can be bred not only by conventional breeding methods (Koilkonda et al., 2013) but also by genetic modification (Yin et al., 2007).

Peanut has a tetraploid genome consisting of the A and B diploid genomes that are thought to have been contributed by A. duranensis and A. ipaënsis, respectively (Seijo et al., 2004, 2007). Consequently, the inheritance of traits related to oil components, as well as other agronomically important traits, is very complex. Over the last decade, genetic studies of peanut have advanced greatly: DNA markers were developed (Pandey et al., 2012); molecular genetic maps were established (Gautami et al., 2012; Shirasawa et al., 2013); and quantitative trait loci (QTL) were identified for agronomically important traits including oil components (Shirasawa et al., 2012; Leal-Bertioli et al., 2015; Kolekar et al., 2016). These resources can be used as tools for DNA marker-assisted selection to accelerate breeding programs in peanut as in other crop species (Perez-de-Castro et al., 2012). For example, a high-density genetic map of an F2 population, consisting primarily of SSR markers, revealed cosegregation of O/L phenotypes with FAD2 genotypes (Shirasawa et al., 2012). Accordingly, a breeding program for high-O/L peanut was initiated using a marker-assisted backcrossing selection (MABS) strategy (Koilkonda et al., 2013). However, because gel and capillary electrophoresis analyses are time-consuming, SSR markers are not suitable for analyzing large numbers of DNA markers across many samples within a single generation. This is especially true for genome-wide selection strategies with the goal of selecting desirable loci and eliminating undesirable genomic backgrounds.

Next-generation sequencing (NGS) technologies can generate huge amounts of DNA sequence data in a single experiment and have therefore been used for genome-wide genotyping methods such as genotyping-by-sequencing (GBS) and restriction site-associated DNA sequencing (RAD-Seq), both in peanut (Zhou et al., 2014) and in other plant species (He et al., 2014). Recently, we established an analytical workflow for double-digest restriction-site-associated DNA sequencing (ddRAD-Seq) in tomato (Solanum lycopersicum L.) based on empirical and in silico optimization (Shirasawa et al., 2016a). While developing this workflow, we found that the SNP density of the genome is a key determinant of SNP detectability by ddRAD-Seq. We therefore proposed that a target amplicon sequencing (TAS) strategy, in which marker genotypes are determined by sequencing PCR fragments amplified from polymorphic loci (Mamanova et al., 2010), could serve as an alternative approach for genome-wide genotyping in plants with ultralow SNP densities. However, SNP calling in polyploids is still challenging (Clevenger et al., 2015): when sequence reads from a polyploid are mapped onto a diploid genome sequence, reads from homeologous loci can align to the same position, so interhomeologue polymorphisms are detected together with allelic SNPs (Trick et al., 2009). To address this issue, a bioinformatics tool has been developed to identify true SNPs between polyploid genotypes (Clevenger and Ozias-Akins, 2015).

NGS technologies have also driven whole-genome sequencing, and the genomes of >100 plant species have been sequenced in this manner (Michael and VanBuren, 2015). Quite recently, the genome sequences of the peanut ancestors A. duranensis and A. ipaënsis were reported (Bertioli et al., 2016). This makes it possible to discover SNPs distributed throughout the peanut genome by whole-genome resequencing, with the two diploid genome sequences concatenated into a single reference for tetraploid peanut. Once the genomic positions of SNPs have been identified, TAS is a useful strategy for determining genome-wide SNP genotypes, even in plants whose SNP density is too low for ddRAD-Seq to be effective.

In this study, we established a SNP genotyping platform to accelerate a MABS breeding program aimed at developing high-O/L peanut. First, we investigated the possibility of using ddRAD-Seq; however, whole-genome resequencing using the A. duranensis and A. ipaënsis genomes as references revealed that this technique was not suitable for our breeding materials because of their low genomic SNP densities. Next, we used TAS to determine the genotypes of SNPs throughout the genomes of the breeding lines, revealing the genome segments derived from the donor and recurrent parents.


Materials and Methods

Plant Materials and DNA Extraction

Five inbred lines of peanut, four Virginia types (Nakateyutaka, YI-0311, Chiba-handachi, and Satonoka) and one Spanish type (Kintoki), were subjected to whole-genome resequencing analysis. An F2 mapping population (n = 190) derived from a cross between Nakateyutaka and YI-0311 (NYF2), which was used in our previous study (Shirasawa et al., 2012), was used for ddRAD-Seq analysis. Three breeding lines, Chiba 118 (BC2F2), Chiba 119 (BC3F3), and Chiba 121 (BC3F3), were used to demonstrate TAS in MABS. The three lines were siblings derived from a cross between YI-0311, the donor of the high-O/L trait, and Nakateyutaka, an elite recurrent parent (see Koilkonda et al. [2013] for details of the breeding scheme). A total of 20 plants (four, eight, and eight plants from Chiba 118, 119, and 121, respectively) were tested individually. In addition, a BC3F1 population (n = 92), 415BC3, derived from a cross between YI-0311 (donor) and Chiba-handachi (recurrent parent), was also genotyped genome-wide by TAS. Genomic DNA was extracted from the leaves of each line using the DNeasy Plant Mini Kit (Qiagen).

Double-Digest Restriction-Site-Associated DNA Sequencing Analysis

Genomic DNA of each NYF2 line and of its parental lines, YI-0311 and Nakateyutaka, was double-digested with the restriction enzymes PstI and MspI. The ddRAD-Seq libraries were constructed and sequenced on a MiSeq (Illumina) in paired-end 251-bp mode as described by Shirasawa et al. (2016a).
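Shirasawa et al. (2016a) optimized the ddRAD-Seq workflow partly in silico. As a rough illustration of what an in silico PstI–MspI double digest involves, the Python sketch below scans a sequence for PstI (CTGCA^G) and MspI (C^CGG) sites and keeps fragments that have one end from each enzyme and fall within a size window. The function names and the size window are hypothetical; this is not the authors' pipeline, and it ignores read length, size-selection efficiency, and ambiguous bases.

```python
import re
from typing import List, Tuple

# Recognition site and top-strand cut offset for each enzyme.
# Both sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "PstI": ("CTGCAG", 5),  # CTGCA^G
    "MspI": ("CCGG", 1),    # C^CGG
}


def cut_positions(seq: str) -> List[Tuple[int, str]]:
    """Return (cut_position, enzyme) pairs sorted by position."""
    seq = seq.upper()
    cuts = []
    for name, (site, offset) in ENZYMES.items():
        # A zero-width lookahead also reports overlapping occurrences.
        for match in re.finditer(f"(?={site})", seq):
            cuts.append((match.start() + offset, name))
    return sorted(cuts)


def pst_msp_fragments(seq: str, min_len: int, max_len: int) -> List[Tuple[int, int]]:
    """Fragments with one PstI end and one MspI end inside a size window.

    Fragments lying between two cuts of the same enzyme are discarded,
    because they lack one of the two enzyme-specific adaptor ends.
    """
    cuts = cut_positions(seq)
    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        if {left, right} == {"PstI", "MspI"} and min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments


# Toy sequence: one PstI site, a 20-bp spacer, one MspI site.
toy = "AAAA" + "CTGCAG" + "T" * 20 + "CCGG" + "AAAA"
print(pst_msp_fragments(toy, 10, 100))  # [(9, 31)]
```

Run over a whole reference, the number and total length of such fragments give a first estimate of how much of the genome a PstI–MspI library can sample, which, together with SNP density, roughly bounds how many SNPs ddRAD-Seq can detect.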

Whole-Genome Resequencing Analysis

Paired-end sequencing libraries (insert size, 500 bp) for five peanut lines, Nakateyutaka, YI-0311, Chiba-handachi, Satonoka, and Kintoki, were prepared as described by Shirasawa et al. (2016b). The nucleotide sequences were determined using massively parallel sequencing by synthesis on a HiSeq2000 (Illumina) in paired-end 101 bp mode.

Target Amplicon Sequencing Analysis

Single nucleotide polymorphisms between YI-0311 and both Nakateyutaka and Chiba-handachi were selected from the whole-genome resequencing data to cover the whole genome at 10-Mb intervals. Primer pairs flanking the SNPs were designed using Primer3 (Rozen and Skaletsky, 2000) to yield amplicons of 300 to 400 bp (Supplemental Table S1), and mixtures of primer pairs were prepared for multiplex PCR. Nested PCR amplifications were performed in two steps: in the first PCR, DNA fragments were amplified from the SNP loci, and in the second PCR, the amplicons were barcoded to distinguish individuals. Reaction mixtures for the first PCR (5 μL) contained 2.5 ng of peanut genomic DNA, 1× QIAGEN Multiplex PCR Master Mix (QIAGEN Multiplex PCR Kit; Qiagen), and 0.1 μM of each primer (Supplemental Table S1). Thermal cycling conditions were as follows: initial denaturation at 95°C for 15 min; 20 cycles of denaturation at 94°C for 30 s, annealing at 60°C for 90 s, and extension at 72°C for 90 s; and a final extension at 72°C for 10 min. Amplicons from each line were pooled and used as template DNA for the second PCR. Reaction mixtures for the second PCR (10 μL) contained 0.2 ng of amplicons from the first PCR, 1× Buffer for KOD-plus- (Toyobo Life Science), 0.2 mM dNTPs, 1.5 mM MgSO4, 0.8 μM of each primer (index primers described by Shirasawa et al. [2016a]), and 0.4 U of KOD-plus- DNA polymerase (Toyobo Life Science). Thermal cycling conditions were as follows: initial denaturation at 94°C for 3 min; 35 cycles of denaturation at 98°C for 30 s, annealing at 55°C for 60 s, and extension at 68°C for 2 min; and a final extension at 72°C for 7 min. The final PCR products were pooled and separated on a BluePippin 1.5% agarose cassette (Sage Science), and fragments of 400 to 750 bp were purified using the PCR Clean-Up Micro Kit (Favorgen Biotech Corporation).
Concentrations of the resultant libraries were determined using the KAPA Library Quantification Kit (KAPA Biosystems) on an ABI-7900HT real-time PCR system (Life Technologies). Nucleotide sequences of the libraries were determined on a MiSeq (Illumina) in paired-end 300-bp mode.

Computational Data Processing

Sequence reads were processed (quality control of raw reads, adaptor trimming, mapping of trimmed reads onto reference sequences, and primary SNP calling) as described by Shirasawa et al. (2016a). Low-quality sequences were removed and adaptors were trimmed using PRINSEQ and fastx_clipper in the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit). The filtered reads were mapped with Bowtie 2 (Langmead and Salzberg, 2012) onto a reference consisting of the concatenated genome sequences of the two wild Arachis diploids, A. duranensis and A. ipaënsis (Bertioli et al., 2016). The resulting sequence alignment/map (SAM) files were converted to binary alignment/map (BAM) files and subjected to SNP calling using the mpileup command of SAMtools and the view command of BCFtools (Li et al., 2009). The lengths of genome regions covered by at least one read were calculated with the genome coverage utility (genomeCoverageBed) of BEDtools (Quinlan and Hall, 2010). In the ddRAD-Seq and whole-genome resequencing analyses, the filtering criteria for high-confidence SNPs were as follows: (i) homozygous loci in both lines and (ii) a SNP quality score of 999 for each locus. The effects of SNPs on gene function were predicted using SnpEff v4.1g (Cingolani et al., 2012). In the TAS analysis, primary data processing was performed as described above, and high-confidence SNPs were selected using VCFtools (Danecek et al., 2011) with the following criteria: (i) homozygous loci in the parental lines, (ii) depth of coverage ≥4 for each line, (iii) SNP quality value >10 for each locus, (iv) minor allele frequency ≥0.2 for each locus, and (v) proportion of missing data <0.5 for each locus.
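VCFtools applies these thresholds directly to the VCF file. To make the logic of the five TAS criteria explicit, the Python sketch below re-implements them on a simplified in-memory record rather than a VCF. The `Site` record and the function name are hypothetical, and two points are our interpretation rather than something stated in the text: criterion (ii) is applied by treating genotypes supported by fewer than four reads as missing, and the minor allele frequency is computed over all called samples, parents included.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# One call per sample: (alternate-allele count 0/1/2, or None if uncalled; read depth).
Call = Tuple[Optional[int], int]


@dataclass
class Site:
    qual: float
    calls: Dict[str, Call]


def passes_tas_filters(site: Site, parents: Iterable[str], min_dp: int = 4,
                       min_qual: float = 10.0, min_maf: float = 0.2,
                       max_missing: float = 0.5) -> bool:
    # (ii) Treat genotypes supported by fewer than min_dp reads as missing.
    gts = {s: (g if dp >= min_dp else None) for s, (g, dp) in site.calls.items()}
    # (i) Every parent must be called and homozygous (0 or 2 alternate alleles).
    if any(gts.get(p) not in (0, 2) for p in parents):
        return False
    # (iii) Site-level quality must exceed the threshold.
    if not site.qual > min_qual:
        return False
    called = [g for g in gts.values() if g is not None]
    if not called:
        return False
    # (v) The proportion of missing genotypes must stay below the threshold.
    if 1 - len(called) / len(gts) >= max_missing:
        return False
    # (iv) Minor allele frequency over all called genotypes.
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq) >= min_maf


site = Site(qual=50.0, calls={"P1": (0, 10), "P2": (2, 10),
                              "a": (1, 8), "b": (0, 5), "c": (2, 3)})
print(passes_tas_filters(site, ["P1", "P2"]))  # True
```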


Results

Double-Digest Restriction-Site-Associated DNA Sequencing for the NYF2 Mapping Population

Following our previous study (Shirasawa et al., 2016a), we used the restriction enzymes PstI and MspI to construct ddRAD-Seq libraries from the NYF2 population (n = 190) and its parental lines, Nakateyutaka and YI-0311. Sequencing yielded 50.8 million reads (265,000 reads per line on average). After trimming of low-quality sequences and adaptors, 45.7 million high-quality reads (90.0%; 238,000 reads per line on average) were mapped onto the concatenated genome sequences of the two wild peanut diploids, which served as a reference for tetraploid peanut. On average, 86.9% of the high-quality reads were aligned to the reference sequences, covering 2.9 Mb of the genome. However, only 87 SNPs were detected (Supplemental Fig. S1), corresponding to an estimated SNP density of three SNPs per 100 kb in the genomes of our materials. Given our previous finding that SNP density influences the detectability of SNPs by ddRAD-Seq (Shirasawa et al., 2016a), we concluded that ddRAD-Seq is not effective for genome-wide genotyping in this MABS program.
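The density estimate is the SNP count divided by the length of sequence covered by reads, scaled to 100 kb. A minimal Python sketch, assuming the 2.9 Mb of covered sequence is the appropriate denominator, reproduces the figure; the function name is illustrative.

```python
def snps_per_100kb(n_snps: int, covered_bp: int) -> float:
    """SNP count normalized to 100 kb of sequence covered by reads."""
    if covered_bp <= 0:
        raise ValueError("covered_bp must be positive")
    return n_snps / covered_bp * 100_000


# ddRAD-Seq figures from the text: 87 SNPs over 2.9 Mb of covered sequence.
print(round(snps_per_100kb(87, 2_900_000), 1))  # 3.0
```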

Whole-Genome Resequencing of Five Peanut Lines to Identify Inter- and Intraspecific Single Nucleotide Polymorphisms

To obtain genome-wide SNPs between the parental lines of the MABS programs, as well as among Japanese peanut cultivars, we performed whole-genome resequencing of five lines: Nakateyutaka, YI-0311, Chiba-handachi, Satonoka, and Kintoki. An average of 656 million raw reads was obtained for each line, corresponding to ∼22× genome coverage. After trimming of low-quality data and adaptor sequences, 82.5% of these reads were retained as high-quality reads. Of the high-quality reads, 97.1% were aligned onto the reference genome sequences of A. duranensis and A. ipaënsis, covering 87.9% of the genome (83.0% of the A genome and 91.7% of the B genome) with at least one read.
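Two coverage measures appear in this paragraph: mean depth (∼22×) and breadth of coverage (87.9% of the genome covered by at least one read). The Python sketch below computes both from toy inputs, using the expected-depth relation C = N × L / G for N reads of length L on a genome of size G. The genome size behind the ∼22× estimate is not given in the text, so the example does not try to reproduce that number; the function names are illustrative.

```python
from typing import Sequence


def mean_depth(n_reads: int, read_len: int, genome_size: int) -> float:
    """Expected mean depth C = N * L / G."""
    return n_reads * read_len / genome_size


def breadth_of_coverage(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of reference positions covered by at least min_depth reads."""
    if not depths:
        raise ValueError("no positions")
    return sum(d >= min_depth for d in depths) / len(depths)


print(mean_depth(1_000, 100, 10_000))                 # 10.0
print(breadth_of_coverage([0, 1, 3, 0, 2, 5, 0, 1]))  # 0.625
```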

From the alignments, 5,056,484 SNP candidates were identified using a SNP quality score threshold of 999 (Supplemental Table S2). To assess their accuracy, a random subset of intraspecific SNP candidates was verified using the amplicon sequencing strategy described by Shirasawa et al. (2016a). Of the 145 candidates tested, seven were false positives and the remaining 138 were confirmed as actual SNPs, indicating that 95.2% of candidates were true positives. Because reads from cultivated peanut were aligned to the genomes of the wild species A. duranensis and A. ipaënsis, the high-quality SNPs included both polymorphisms between cultivated peanut and its wild ancestors (interspecific SNPs) and polymorphisms among cultivated peanut lines (intraspecific SNPs). Of the 5,056,484 SNPs, 4,791,359 were categorized as interspecific polymorphisms between cultivated peanut and either A. duranensis or A. ipaënsis, whereas 265,125 were categorized as intraspecific polymorphisms among the five cultivated peanut lines (Supplemental Table S2).
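The interspecific/intraspecific split follows from aligning cultivated reads to a wild reference. A minimal Python sketch of one way to apply it, assuming each site is summarized by the homozygous base called in each of the five lines (heterozygous sites having already been removed by the homozygosity filter): a base shared by all lines but differing from the reference is interspecific, and any difference among the lines is intraspecific. This rule is our reading of the definitions above, not code from the study.

```python
from typing import Sequence


def classify_site(ref_base: str, line_bases: Sequence[str]) -> str:
    """Classify a site from the homozygous base called in each cultivated line.

    'interspecific': all lines share one base that differs from the wild reference
    'intraspecific': the cultivated lines differ from one another
    'monomorphic'  : all lines match the reference (no SNP)
    """
    if not line_bases:
        raise ValueError("at least one line is required")
    distinct = set(line_bases)
    if len(distinct) > 1:
        return "intraspecific"
    return "monomorphic" if distinct == {ref_base} else "interspecific"


print(classify_site("A", ["G"] * 5))                  # interspecific
print(classify_site("A", ["A", "A", "G", "A", "A"]))  # intraspecific
```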

The interspecific SNPs consisted of 3,606,584 transitions and 1,184,775 transversions (Ts/Tv ratio = 3.0). The A and B genomes contained remarkably different numbers of interspecific SNPs: 4,523,863 and 267,496, respectively. The average SNP density was 434.6 per 100 kb (Ts/Tv ratio = 3.2) in the A genome and 19.9 per 100 kb (Ts/Tv ratio = 1.8) in the B genome (Supplemental Table S2), consistent with the report of Bertioli et al. (2016). Furthermore, the SNPs were unevenly distributed across the genome (Supplemental Fig. S2A), with extreme biases: SNP-enriched regions (hot spots) were observed on B02, B04, B05, B06, B07, B08, and B09, whereas cold spots with very few SNPs were observed on A01 (40–80 Mb), A05 (40–80 Mb), and A09 (40–55 Mb).
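Transitions are substitutions within the purines (A↔G) or within the pyrimidines (C↔T); all other single-base changes are transversions. A short Python sketch of the Ts/Tv calculation, with a check of the interspecific ratio from the counts given above (the function names are illustrative):

```python
from typing import Iterable, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = frozenset((ref.upper(), alt.upper()))
    if len(pair) != 2 or not pair <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return pair in (PURINES, PYRIMIDINES)


def ts_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    ts = tv = 0
    for ref, alt in snps:
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions")
    return ts / tv


print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))  # 2.0
print(round(3_606_584 / 1_184_775, 2))  # 3.04, reported as 3.0 in the text
```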

In contrast, the intraspecific polymorphisms comprised 186,056 transitions and 79,069 transversions (Ts/Tv ratio = 2.4), with 92,434 and 172,691 polymorphisms in the A and B genomes, respectively; two cold spots were present, on B04 and B05 (Supplemental Fig. S2B). The SNP densities were 8.9 and 12.9 SNPs per 100 kb in the A and B genomes, respectively, with Ts/Tv ratios of 2.5 and 2.3 (Supplemental Table S2). The average number of SNPs between any two of the five lines was 117,024 (4.9 SNPs per 100 kb) (Table 1), ranging from 25,549 between Nakateyutaka and Satonoka (1.1 SNPs per 100 kb) to 251,678 between Chiba-handachi and Kintoki (10.6 SNPs per 100 kb). An average of 229,466 SNPs (9.6 SNPs per 100 kb over the genome) were identified between the Spanish type (Kintoki) and the Virginia types (Nakateyutaka, YI-0311, Chiba-handachi, and Satonoka), whereas only 42,063 SNPs (1.8 SNPs per 100 kb) were identified, on average, among the Virginia types.
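Hot spots and cold spots such as those in Supplemental Fig. S2 are identified from SNP densities computed in windows along each chromosome. The window size and thresholds used for the figure are not stated, so the Python sketch below uses hypothetical values (1-Mb windows; "hot" above twice and "cold" below one-tenth of the genome-wide mean density) purely to illustrate the procedure.

```python
from collections import Counter
from typing import Iterable, List


def window_density(positions: Iterable[int], chrom_len: int,
                   window: int = 1_000_000) -> List[float]:
    """SNPs per 100 kb in consecutive, non-overlapping windows (0-based positions)."""
    counts: Counter = Counter()
    for pos in positions:
        if not 0 <= pos < chrom_len:
            raise ValueError(f"position {pos} outside chromosome")
        counts[pos // window] += 1
    n_windows = -(-chrom_len // window)  # ceiling division
    densities = []
    for i in range(n_windows):
        win_len = min(window, chrom_len - i * window)  # last window may be shorter
        densities.append(counts[i] / win_len * 100_000)
    return densities


def flag_windows(densities: List[float], genome_mean: float,
                 hot: float = 2.0, cold: float = 0.1) -> List[str]:
    """Label windows 'hot' or 'cold' relative to the genome-wide mean density."""
    labels = []
    for d in densities:
        if d > hot * genome_mean:
            labels.append("hot")
        elif d < cold * genome_mean:
            labels.append("cold")
        else:
            labels.append("")
    return labels
```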


Table 1. Number and density of single nucleotide polymorphisms (SNPs) between pairs of peanut cultivars. The upper triangle gives the number of SNPs and the lower triangle the density (SNPs per 100 kb) for each pair.

 
Cultivar         Chiba-handachi   Nakateyutaka   YI-0311   Satonoka   Kintoki
Chiba-handachi   -                45,316         55,709    39,527     251,678
Nakateyutaka     1.9              -              44,241    25,549     222,208
YI-0311          2.3              1.9            -         42,036     218,861
Satonoka         1.7              1.1            1.8       -          225,115
Kintoki          10.6             9.3            9.2       9.4        -
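The summary values quoted in the text are plain averages over the relevant cells of Table 1. The Python sketch below uses only the counts in the table to reproduce them; the Spanish-versus-Virginia mean is 229,465.5, which rounds to the reported 229,466. Variable and function names are illustrative.

```python
from itertools import combinations
from typing import Iterable, Tuple

# Pairwise SNP counts from the upper triangle of Table 1.
SNP_COUNTS = {
    frozenset(("Chiba-handachi", "Nakateyutaka")): 45_316,
    frozenset(("Chiba-handachi", "YI-0311")): 55_709,
    frozenset(("Chiba-handachi", "Satonoka")): 39_527,
    frozenset(("Chiba-handachi", "Kintoki")): 251_678,
    frozenset(("Nakateyutaka", "YI-0311")): 44_241,
    frozenset(("Nakateyutaka", "Satonoka")): 25_549,
    frozenset(("Nakateyutaka", "Kintoki")): 222_208,
    frozenset(("YI-0311", "Satonoka")): 42_036,
    frozenset(("YI-0311", "Kintoki")): 218_861,
    frozenset(("Satonoka", "Kintoki")): 225_115,
}
VIRGINIA = ["Chiba-handachi", "Nakateyutaka", "YI-0311", "Satonoka"]
SPANISH = ["Kintoki"]


def mean_pairwise(pairs: Iterable[Tuple[str, str]]) -> float:
    counts = [SNP_COUNTS[frozenset(p)] for p in pairs]
    return sum(counts) / len(counts)


all_pairs = mean_pairwise(combinations(VIRGINIA + SPANISH, 2))
within_virginia = mean_pairwise(combinations(VIRGINIA, 2))
spanish_vs_virginia = mean_pairwise((s, v) for s in SPANISH for v in VIRGINIA)
print(all_pairs, within_virginia, spanish_vs_virginia)  # 117024.0 42063.0 229465.5
```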

We predicted the probable effects of these polymorphisms using SnpEff (Cingolani et al., 2012). Among both interspecific and intraspecific SNPs, the predominant type was intergenic polymorphisms (91.1 and 94.4%, respectively), followed by intron variants (5.9 and 3.6%, respectively) (Supplemental Table S3). High- and moderate-impact SNPs constituted 0.1 and 1.2%, respectively, of the interspecific SNPs and 0.1 and 1.0% of the intraspecific SNPs. The nonsynonymous/synonymous mutation ratios (Ka/Ks) were 1.43 and 2.19 for interspecific and intraspecific SNPs, respectively.
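SnpEff writes its predictions to the ANN field of the VCF INFO column as comma-separated entries whose pipe-delimited subfields begin with the allele, the annotation (effect), and the annotation impact (HIGH, MODERATE, LOW, or MODIFIER). The Python sketch below tallies impacts and a missense-to-synonymous count ratio from such INFO strings. It assumes that only the first ANN entry per variant is counted (SnpEff sorts annotations by putative impact, so this is the most severe one), and its count ratio is only a simple proxy: a site-normalized Ka/Ks would also need the numbers of nonsynonymous and synonymous sites, and the text does not state exactly how its ratios were computed. Function names are illustrative.

```python
from collections import Counter
from typing import Iterable, Tuple


def first_annotation(info: str) -> Tuple[str, str]:
    """(effect, impact) of the first ANN entry in a VCF INFO string."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            parts = field[len("ANN="):].split(",")[0].split("|")
            return parts[1], parts[2]  # Annotation, Annotation_Impact
    raise ValueError("no ANN field in INFO")


def summarize(infos: Iterable[str]) -> Tuple[Counter, float]:
    impacts: Counter = Counter()
    effects: Counter = Counter()
    for info in infos:
        effect, impact = first_annotation(info)
        impacts[impact] += 1
        effects.update(effect.split("&"))  # combined effects are joined with '&'
    syn = effects["synonymous_variant"]
    ratio = effects["missense_variant"] / syn if syn else float("nan")
    return impacts, ratio


infos = [
    "DP=9;ANN=T|missense_variant|MODERATE|g1|g1|transcript|t1|protein_coding",
    "ANN=C|missense_variant|MODERATE|g2|g2|transcript|t2|protein_coding",
    "ANN=G|synonymous_variant|LOW|g3|g3|transcript|t3|protein_coding",
    "ANN=A|intergenic_region|MODIFIER|g4|g4|intergenic_region|x|",
]
impacts, ratio = summarize(infos)
print(dict(impacts), ratio)
```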

Target Amplicon Sequencing for Genome-Wide Single Nucleotide Polymorphism Genotyping in Breeding Populations

The breeding populations for high-O/L peanut lines were derived from crosses between YI-0311 and either Nakateyutaka or Chiba-handachi. Of the 27,317 SNPs polymorphic between the parents of these crosses, we selected 244 covering the whole genome at 10-Mb intervals. In addition, 22 SNPs close to each of the two FAD2 loci, on A09 and B09, were added to the targets. Thus, 288 SNPs (244 + 22 + 22) were selected for TAS analysis, and primers were designed against the flanking sequences of each target (Supplemental Table S1).
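The text does not say how the single SNP within each 10-Mb interval was chosen. The Python sketch below uses one hypothetical rule, keeping the SNP closest to the centre of each interval, plus a helper that picks the k SNPs nearest a locus of interest such as FAD2. Function names, chromosome labels, and positions are illustrative.

```python
from typing import Dict, List, Tuple

Snp = Tuple[str, int]  # (chromosome, 0-based position)


def one_per_bin(snps: List[Snp], bin_size: int = 10_000_000) -> List[Snp]:
    """Keep, for each chromosome bin, the SNP closest to the bin centre."""
    best: Dict[Tuple[str, int], Snp] = {}
    for chrom, pos in snps:
        key = (chrom, pos // bin_size)
        centre = key[1] * bin_size + bin_size // 2
        current = best.get(key)
        if current is None or abs(pos - centre) < abs(current[1] - centre):
            best[key] = (chrom, pos)
    return sorted(best.values())


def nearest(snps: List[Snp], chrom: str, locus_pos: int, k: int) -> List[Snp]:
    """The k SNPs on the same chromosome closest to a locus of interest."""
    same = [s for s in snps if s[0] == chrom]
    return sorted(same, key=lambda s: abs(s[1] - locus_pos))[:k]


snps = [("A01", 1_000_000), ("A01", 5_200_000), ("A01", 12_000_000), ("B09", 3_000_000)]
print(one_per_bin(snps))
```

A panel like the one described would be the union of `one_per_bin` over the candidate SNPs and `nearest(..., k=22)` around each FAD2 locus; taking a set union avoids counting a SNP twice. The FAD2 coordinates are not given in the text.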

Because performing PCR individually with each primer pair across all breeding materials would be time-consuming, expensive, and laborious, we instead performed multiplex PCR. To determine the most effective number of primer pairs per reaction and the efficiency of DNA amplification by multiplex PCR, we tested five multiplexing levels: (i) two subsets of 96-plex; (ii) four subsets of 48-plex; (iii) eight subsets of 24-plex; (iv) 12 subsets of 16-plex; and (v) 48 subsets of 4-plex. Genomic DNA of Nakateyutaka and YI-0311 was used as the template for the multiplex PCRs. The PCR amplicons were purified and sequenced, and the sequence data were mapped onto the reference genomes of the wild species. As expected, the success rate of SNP detection correlated negatively with the number of primer pairs per reaction, ranging from 10.9% in 96-plex PCR to 85.7% in 4-plex PCR (Fig. 1). Accordingly, we adopted 24-plex (71.1%) or 48-plex PCR (58.9%) for subsequent analyses.
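The five multiplexing levels all partition the same 192 primer pairs. A minimal Python sketch of that partitioning, assuming primer pairs are simply assigned to consecutive subsets (the text does not describe the assignment, and a real design would also have to avoid primer interactions within a subset); the function name and primer labels are illustrative.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_plexes(primer_pairs: Sequence[T], plex: int) -> List[List[T]]:
    """Split primer pairs into consecutive subsets of `plex` pairs each."""
    if plex <= 0 or len(primer_pairs) % plex:
        raise ValueError("the number of primer pairs must be a multiple of plex")
    return [list(primer_pairs[i:i + plex]) for i in range(0, len(primer_pairs), plex)]


pairs = [f"pair{i:03d}" for i in range(192)]
for plex in (96, 48, 24, 16, 4):
    print(plex, len(split_into_plexes(pairs, plex)))
# prints: 96 2, 48 4, 24 8, 16 12, 4 48 (one pair per line)
```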

Fig. 1.

Detectability of single nucleotide polymorphisms (SNPs) in the multiplex polymerase chain reaction (PCR): A total of 192 SNPs were divided into 2, 4, 8, 12, and 48 subsets for 96-, 48-, 24-, 16-, and 4-plex PCR, respectively, using genomic DNA from Nakateyutaka and YI-0311 as templates.

 

For the 20 BC2F2 and BC3F3 plants of Chiba118, Chiba119, and Chiba121, as well as the parental lines, the target SNP loci were amplified with six subsets of 48-plex PCR (288 primer pairs in total). Sequencing of the amplicons yielded 3.2 million reads per line. After trimming of adaptors and low-quality sequences, 1.8 million reads per line were mapped onto the reference genome, with a mean mapping rate of 95.8%. SNP genotype data were obtained from 181 target loci with an average depth of coverage of 1916 reads per line. Graphical genotypes indicated that the 20 plants carried, on average, 88.4% of the Nakateyutaka genome, ranging from 84.9% in Chiba121-24 to 93.4% in Chiba118-8 (Fig. 2A). The remaining 11.6%, including the two FAD2 loci, was derived from the YI-0311 genome.
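The percentages above summarize graphical genotypes such as those in Fig. 2, but the text does not state how they were calculated. The Python sketch below shows one plausible scheme under explicit assumptions: each call is coded R, H, or D (recurrent homozygote, heterozygote, donor homozygote, mirroring the color classes in Fig. 2), missing calls are skipped, every locus is weighted equally, and a heterozygote counts as half recurrent. A version weighting each locus by the chromosome interval it represents would give somewhat different values. Function names are illustrative.

```python
from typing import Optional, Sequence, Tuple

WEIGHTS = {"R": 1.0, "H": 0.5, "D": 0.0}  # share of recurrent-parent alleles


def code_genotype(alleles: Optional[Tuple[str, str]], recurrent: str,
                  donor: str) -> Optional[str]:
    """Code a diploid call as R (recurrent homozygote), D (donor homozygote),
    H (heterozygote) or None (missing, or an allele from neither parent)."""
    if alleles is None or not set(alleles) <= {recurrent, donor}:
        return None
    n_recurrent = alleles.count(recurrent)
    return {2: "R", 1: "H", 0: "D"}[n_recurrent]


def recurrent_fraction(codes: Sequence[Optional[str]]) -> float:
    """Mean share of recurrent-parent alleles over non-missing loci."""
    scored = [WEIGHTS[c] for c in codes if c is not None]
    if not scored:
        raise ValueError("no genotyped loci")
    return sum(scored) / len(scored)


calls = [("A", "A"), ("C", "C"), ("T", "G"), ("G", "G"), None]
codes = [code_genotype(c, r, d) for c, (r, d) in
         zip(calls, [("A", "G"), ("C", "T"), ("T", "G"), ("A", "G"), ("A", "C")])]
print(codes, recurrent_fraction(codes))  # ['R', 'R', 'H', 'D', None] 0.625
```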

Fig. 2.

Graphical genotypes of breeding materials. Each row represents one plant or line: (A) Chiba 118, Chiba 119, Chiba 121, and their parents (Nakateyutaka and YI-0311); (B) 415BC3 and its parents (Chiba-handachi and YI-0311). The peanut chromosomes, A01 through B10, are shown as bars colored red, blue, yellow, or gray, representing homozygotes for the alleles of the recurrent parent (Nakateyutaka or Chiba-handachi), homozygotes for the alleles of the donor (YI-0311), heterozygotes, and missing data, respectively. The FAD2A and FAD2B loci, responsible for the oleic acid content of seeds, are indicated at the top.

 

For the 415BC3 population, comprising 92 BC3F1 plants and both parents, we amplified the target SNP loci with 12 subsets of 24-plex PCR. Sequencing of the amplicons yielded 521,400 reads per line, and 520,100 filtered reads per line were mapped onto the reference genomes, with an average mapping rate of 70.6%. Genotype data were obtained from 128 loci with an average depth of coverage of 865 reads per line. Graphical genotypes indicated that the 92 plants carried, on average, 55.4% of the Chiba-handachi genome, ranging from 40.3% in 415BC3-39 to 75.3% in 415BC3-12 (Fig. 2B). The remaining 44.6%, including the FAD2B locus, was derived from the YI-0311 genome.


Discussion

Whole-genome resequencing is a powerful method for discovering hundreds of thousands of genome-wide SNPs in plants. In this study, we performed large-scale SNP identification in cultivated peanut using the genome sequences of its diploid ancestors (A. duranensis and A. ipaënsis) as a reference. Because the genetic diversity of cultivated peanut is quite low, probably as a result of its recent origin through polyploidization of the two wild species, considerable effort has previously been expended to develop DNA markers that can detect polymorphisms among cultivars (Pandey et al., 2012). Simple-sequence repeat markers have been popular in genetic studies of cultivated peanut and have contributed to the establishment of a genetic linkage map with 1114 loci (Shirasawa et al., 2012). However, many more DNA markers would be required for accurate genetic and genomic studies such as QTL analyses, genome-wide association studies, and genomic selection, as well as for breeding programs per se. Genome-wide SNPs, which can be identified easily and cost-effectively by whole-genome resequencing, should help meet the demand for additional markers in peanut genetics, genomics, and breeding.

Genotyping-by-sequencing methods, including ddRAD-Seq, are popular for genome-wide SNP genotyping because of their experimental flexibility and cost-effectiveness, which derive in large part from their use of NGS technologies (Davey et al., 2011). Indeed, GBS and ddRAD-Seq have accelerated plant genetics, genomics, and breeding (He et al., 2014). In peanut, a genetic linkage map was constructed based on ddRAD-Seq analysis of a mapping population whose parents were genetically distant lines (Zhou et al., 2014). In this study, however, ddRAD-Seq yielded few SNPs because of the low genetic diversity of materials used for practical breeding. In a previous ddRAD-Seq study in tomato, we predicted that SNP density in the genome is a key determinant of SNP detectability (Shirasawa et al., 2016a). The SNP density in cultivated peanut (4.9 SNPs per 100 kb) was much lower than that in tomato (11.9–98.9 SNPs per 100 kb). To overcome this limitation, we propose whole-genome resequencing followed by TAS as an alternative. Because of its high experimental flexibility, lower DNA requirement, and high speed, TAS may be advantageous over SNP-chip array technologies.

Theoretically, BC2 and BC3 lines are expected to carry 12.5 and 6.25% of the donor-parent genome, respectively. As expected, in the BC2F2 (Chiba118) and BC3F3 (Chiba119 and Chiba121) lines, the fraction of the donor YI-0311 genome was reduced (11.6% on average). In the BC3F1 population (415BC3), however, the fraction of the donor genome (44.6% on average) was much higher than expected. The Chiba 118, 119, and 121 lines were subjected to MABS with 19 to 42 SSR markers (Koilkonda et al., 2013), whereas no MABS was performed on 415BC3. These contrasting results demonstrate the efficiency of MABS in peanut breeding. When we used SSRs in MABS, 2 to 3 mo were required to obtain genotypes of 19 to 42 markers across three lines of 108 to 178 plants. In addition, because few polymorphic SSR markers were available, it was impossible to use SSR markers evenly distributed across the genome. With TAS, we obtained genotype data for 128 to 181 SNPs from 112 plants within a few weeks, and the SNPs were almost evenly spaced across the genome (Fig. 2). In our preliminary experiments, the accuracy of the SNP calls was validated by Kompetitive allele-specific PCR analysis (LGC Genomics): 140 data points from 10 SNPs across all 14 lines of Chiba118, Chiba119, and Chiba121, including parents, all matched the SNP calls from the TAS analysis (data not shown). We therefore expect TAS to accelerate MABS in peanut by increasing throughput and accuracy.
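The theoretical values follow from halving: the F1 carries half of the donor genome, and each backcross to the recurrent parent halves the expected donor share again, giving (1/2)^(n+1) after n backcrosses. A Python sketch of this expectation, which ignores selection and linkage drag around selected loci such as FAD2 (the function name is illustrative):

```python
from fractions import Fraction


def expected_donor_fraction(n_backcrosses: int) -> Fraction:
    """Expected donor-genome share after n backcrosses to the recurrent parent.

    The F1 carries 1/2 of the donor genome and each backcross halves it,
    giving (1/2)**(n + 1). Selfing does not change this expectation, and
    selection for donor loci is ignored.
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be >= 0")
    return Fraction(1, 2 ** (n_backcrosses + 1))


for n in (1, 2, 3):
    print(f"BC{n}: {float(expected_donor_fraction(n)) * 100}%")
# prints: BC1: 25.0%, BC2: 12.5%, BC3: 6.25% (one per line)
```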

The breadth of read coverage on the wild-species reference was lower for the A genome (83.0%) than for the B genome (91.7%). In addition, the density of interspecific SNPs in the A genome was >20 times that in the B genome (434.6 vs. 19.9 SNPs per 100 kb; Supplemental Fig. S2A; Supplemental Table S2). These differences suggest that the A. ipaënsis genome is more similar to the B genome of A. hypogaea than the A. duranensis genome is to the A genome of cultivated peanut. Consistent with this, diploid A-genome chromosomes are distinctly less similar to cultivated peanut sequences than B-genome chromosomes (Bertioli et al., 2016). Furthermore, the Ts/Tv ratio of interspecific SNPs was greater in the A genome (3.2) than in the B genome (1.8), suggesting that the interspecific SNPs in the two genomes have different origins. However, the Ka/Ks ratios of interspecific SNPs in the A and B genomes were not significantly different, implying that the A and B genomes were subject to similar evolutionary pressure after tetraploidization. For intraspecific SNPs, by contrast, the genomic density, Ts/Tv ratio, and Ka/Ks ratio were comparable between the A and B genomes, suggesting that the two genomes of cultivated peanut coevolved under similar selective pressures in breeding programs.

In conclusion, TAS is useful for genome-wide SNP genotyping in cultivated peanut, whose SNP density is too low for effective genotyping by GBS strategies such as RAD-Seq. Because the genomes of the diploid ancestors are available as a reference for cultivated peanut (Bertioli et al., 2016), whole-genome resequencing enabled easy and rapid identification of genome-wide SNPs, which provide insights into Arachis genome evolution and adaptation and facilitate the development of DNA markers for peanut genetics, genomics, and breeding.

Footnotes

Nucleotide sequence data and the SNP information are available from the DDBJ Sequence Read Archive (accession numbers DRA004503–DRA004506) and the Kazusa Marker DataBase (Shirasawa et al., 2014; http://marker.kazusa.or.jp), respectively.

Supplemental Information Available

Supplemental Fig. S1. Physical positions of SNPs in the peanut genome, detected by ddRAD-Seq analysis: Red bars on peanut chromosomes (A01–B10) indicate SNPs detected by ddRAD-Seq analysis of Nakateyutaka, YI-0311, and their F2 progeny (NYF2).

Supplemental Fig. S2. SNP density maps in peanut: density of interspecific (A) and intraspecific (B) SNPs. Line charts show SNP densities (SNPs per 100 kb) along each peanut chromosome. SNP-rich and SNP-poor regions are indicated with red and blue bars, respectively.

Acknowledgments

We are grateful to S. Sasamoto and C. Mimani (Kazusa DNA Research Institute) for technical assistance. This work was supported by the Science and Technology Research Promotion Program for Agriculture, Forestry, Fisheries and Food Industry (Grant Number 26107C) and the Kazusa DNA Research Institute Foundation.

 


