Fragaria is the most commercially important soft fruit genus, primarily due to the cultivation of the genetically complex octoploid species F. ×ananassa (2n = 8x = 56). In 2009, the world production of strawberries exceeded 4.1 million t and the crop was valued in excess of US$4 billion (FAO, 2011). Due to its economic importance, F. ×ananassa has been the subject of much genetic research aimed at developing superior cultivars with enhanced disease resistance, fruit quality, and other characters, prompting the development of a number of molecular marker maps for this species (Rousseau-Gueutin et al., 2008; Sargent et al., 2009; van Dijk et al., 2010).
Simple sequence repeats (SSRs) have been the marker of choice in the genus Fragaria for linkage map development to date (Rousseau-Gueutin et al., 2008; Sargent et al., 2006; Sargent et al., 2009; Spigler et al., 2010; van Dijk et al., 2010) due to their abundance in the genome, their codominant and highly polymorphic nature, and their relative ease of development from enriched genomic libraries and expressed sequence tag (EST) collections (Celton et al., 2009; Sargent et al., 2003). The diploid Fragaria spp. reference map has been constructed using predominantly SSR markers, and a total of 272 have previously been mapped in the diploid Fragaria spp. reference linkage map Fragaria vesca ‘815’ × Fragaria bucharica ‘601’ (FV×FB) (Sargent et al., 2008). The reference map was developed from an interspecific F2 population derived from a cross between Fragaria vesca L. and its closest diploid relative Fragaria bucharica Losinsk.; it spans seven linkage groups, corresponding to the basic, seven-member Fragaria spp. chromosome set and covers a total of 528.1 cM. This map contains a total of 422 sequence-characterized sequence tagged site (STS) markers mapped in the full progeny (Ruiz-Rojas et al., 2010) and a further 230 STS markers mapped using the bin-mapping progeny (Illa et al., 2011; Sargent et al., 2008).
The recent publication of the woodland strawberry (F. vesca) genome sequence was a milestone in plant biology (Shulaev et al., 2011). Not only was it one of the smallest plant genomes ever to be sequenced, but sequencing was performed using only short-read sequencing technologies. The complex polyploid genome of the cultivated strawberry precludes de novo assembly of sequences generated using current sequencing technologies, and so F. vesca (2n = 2x = 14), a close relative of Fragaria ×ananassa Duchesne ex Rozier and widely regarded as the closest extant descendant of the octoploid's A genome donor (Davis et al., 2009), was chosen as a surrogate system for the development of a diploid reference genome sequence for the genus (Shulaev et al., 2011). The F. vesca ‘Hawaii 4’ (FvH4) genotype was chosen due to its amenability to genetic transformation, its self-compatibility and therefore highly homozygous genome, and its perpetually flowering “semperflorens” habit (Shulaev et al., 2011). Fragaria vesca ‘Hawaii 4’ was sequenced to 39x depth of coverage and the sequence is composed of a total of 219 Mbp of nucleotide data contained in more than 3200 genome sequence scaffolds.
Despite the large number of scaffolds, the majority of the genome sequence, 209.3 Mbp (96%), is contained in just 247 scaffolds of over 50 kbp in length (Shulaev et al., 2011). The genome was assembled into scaffolds and pseudochromosomes without a physical map, assigning the vast majority of contiguous sequences (198.1 Mbp of the sequence, contained in 204 sequence scaffolds) to a physical location on the F. vesca genome using molecular markers genetically mapped to the FV×FB linkage map (Shulaev et al., 2011). Of these, 131 scaffolds containing 169.5 Mbp of sequence have been anchored using the full FV×FB mapping progeny. The remaining 73 scaffolds were anchored using a selective mapping approach known as bin mapping (Sargent et al., 2008), the majority through sequencing of a reduced complexity genome scan of each of the six genotypes that comprise the selective mapping or “bin” set (Celton et al., 2010).
To complement and extend the existing SSR marker resources available for Fragaria spp., in particular for the development of saturated linkage maps, we developed a set of novel SSR markers from regions of the F. vesca genome that were unanchored and thus previously unmapped or had been anchored only through bin mapping. We designed 152 novel primer pairs flanking selected polymorphic SSRs and, through mapping of these loci and a further 42 previously published SSRs, anchored a further 28.2 Mbp of scaffolded genome sequence to precise positions on the FV×FB reference map using the full mapping progeny of 76 individuals. Using these additional data we have improved the resolution and completeness of the anchoring of the F. vesca genome and have created a set of revised pseudochromosomes for FvH4.
Materials and Methods
Fragaria vesca ‘815’ × Fragaria bucharica ‘601’ Diploid Fragaria Reference Population
Novel markers developed in this investigation were tested for polymorphism between the grandparental genotypes (F. vesca ‘815’ and F. bucharica ‘601’) of the F2 diploid Fragaria spp. reference mapping population FV×FB (Sargent et al., 2006). Segregation data for markers that were polymorphic between the grandparental genotypes were generated in the full FV×FB progeny of 76 individuals.
Simple Sequence Repeat Marker Development
Previously unanchored FvH4 sequence scaffolds over 50 kbp in length were submitted individually to the SSR server at the Genome Database for Rosaceae (Jung et al., 2008) to determine the precise locations of di- and trinucleotide SSR loci contained within each scaffold sequence. Di- and trinucleotide SSRs were investigated as these had been shown to be the most polymorphic motifs in previous mapping investigations in Fragaria spp. Cutoff criteria for candidate SSR marker loci were di- and trinucleotides with a minimum repeat length of 12 and 6 repeats, respectively. Where no SSRs were discovered under these criteria, the minimum repeat length for dinucleotides was reduced to eight repeats. Simple sequence repeats were considered for marker development when they occurred with a minimum of 200 bp of high-quality sequence data flanking either side of the repeat region. A maximum of four pairs of polymerase chain reaction (PCR) primers targeting SSR loci were designed and synthesized per scaffold. Where possible, SSRs were selected for marker development at evenly spaced intervals throughout the scaffold sequence. Primers were designed to amplify products between 100 and 350 bp in length. All primer pairs were designed with PRIMER 3 (Rozen and Skaletsky, 2000) to have an oligonucleotide melting temperature of 55 to 65°C (optimum 60°C), a primer length of 20 to 24 bp (optimum 22 bp), and a 2-bp GC clamp at the 5′ end of each primer. Primers were synthesized by Integrated DNA Technologies Ltd (Leuven, Belgium). Primer pairs amplifying markers mapping to the FV×FB map were named with the prefix FvH4 followed by a four-digit numerical identifier, for example, FvH40001.
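The cutoff criteria above can be illustrated with a minimal Python sketch. It is not the SSR server used in this investigation; the function name `find_ssrs` and its parameters are hypothetical. The sketch detects only perfect repeats and checks only the length of the flanking sequence, not its quality.

```python
import re

# Thresholds taken from the text: dinucleotides >= 12 repeats,
# trinucleotides >= 6 repeats, >= 200 bp of flanking sequence per side.
MIN_REPEATS = {2: 12, 3: 6}
MIN_FLANK = 200


def find_ssrs(seq, min_repeats=MIN_REPEATS, min_flank=MIN_FLANK):
    """Return (start, end, motif, repeats) for perfect di-/trinucleotide SSRs.

    Coordinates are 0-based, end-exclusive. Hypothetical helper for illustration.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One unit followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # mononucleotide run, not a di-/trinucleotide SSR
            start, end = m.start(), m.end()
            if start < min_flank or len(seq) - end < min_flank:
                continue  # too little flanking sequence for primer design
            hits.append((start, end, motif, (end - start) // unit_len))
    return hits
```

Passing `min_repeats={2: 8, 3: 6}` would correspond to the relaxed dinucleotide threshold applied when no SSRs were found under the default criteria.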
Polymerase Chain Reaction Conditions and Product Visualization
Following the touchdown protocol described by Sargent et al. (2003), amplicons were generated from the grandparental genotypes of the mapping population FV×FB for novel SSR markers developed in this investigation and SSRs previously published by Sargent et al. (2008) and Spigler et al. (2010) that had not been previously scored in the full FV×FB mapping progeny. Initially, PCR products were assessed for polymorphism following agarose gel electrophoresis, ethidium bromide staining, and visualization over ultraviolet light. Where possible, at least one marker identified as polymorphic between the grandparental genotypes per sequence scaffold was labeled on the forward primer with one of two fluorescent dyes: 6-FAM or HEX (IDT, Leuven, Belgium). Labeled products were then sized following fractionation by capillary electrophoresis through a 3100 genetic analyzer (Applied Biosystems, Warrington, UK) and the data generated were collected and analyzed using the GENESCAN and GENOTYPER (Applied Biosystems, Warrington, UK) software applications.
Based on compatible product sizes and fluorescent dye colors, markers (primer pairs) were grouped into sets of up to 16 and multiplex PCR was performed using the “Type-it” PCR mastermix (Qiagen, Crawley, UK) following the manufacturer's recommendations, except that PCRs were performed in a final volume of 12.5 μL. Reactions were performed using the following PCR cycles: an initial denaturation step of 95°C for 5 min was followed by 28 cycles of 95°C for 30 sec, an annealing temperature of 55°C decreasing by 0.5°C per cycle until 50°C for 90 sec, and 72°C for 30 sec, followed by a 30 min final extension step at 68°C. Samples were analyzed by capillary electrophoresis as described above.
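As an illustration of the touchdown annealing step, the following Python sketch lists the annealing temperature for each of the 28 cycles. It assumes that the first cycle anneals at 55°C and that all cycles after the floor is reached are held at 50°C; the text does not state this explicitly, and the function name is hypothetical.

```python
def touchdown_annealing(start=55.0, step=0.5, floor=50.0, cycles=28):
    """Annealing temperature (degrees C) for each cycle of a touchdown PCR.

    Assumption: cycle 1 uses `start`, and the temperature is held at
    `floor` once it has been reached.
    """
    temps = []
    temp = start
    for _ in range(cycles):
        temps.append(temp)
        temp = max(floor, temp - step)  # never drop below the floor
    return temps


schedule = touchdown_annealing()
```

Under these assumptions the floor of 50°C is reached at cycle 11, so the remaining 18 cycles all anneal at 50°C.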
Data Analysis and Map Construction
Segregation data for all new markers were analyzed for cosegregation with previously mapped markers using the published data for the FV×FB mapping population (Ruiz-Rojas et al., 2010) to determine the map position of the markers and thus the physical positions on the FvH4 genome sequence of the sequence scaffolds from which these markers were derived. Data were analyzed using JOINMAP 4.0 (Van Ooijen, 2006) using the Kosambi mapping function. Linkage group construction was determined using a minimum logarithm of the odds (LOD) score threshold of 3.0, a recombination fraction threshold of 0.35, ripple value of 1.0, jump threshold of 3.0, and a triplet threshold of 5.0.
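For reference, the Kosambi mapping function converts a recombination fraction r into a map distance of 25 ln[(1 + 2r)/(1 − 2r)] cM. The Python sketch below implements this standard formula and its inverse; it does not reproduce the internals of JOINMAP, and the function names are hypothetical.

```python
import math


def kosambi_cm(r):
    """Map distance in cM for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm):
    """Inverse: recombination fraction for a map distance in cM."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)
```

Under this function, the recombination fraction threshold of 0.35 used for linkage group construction corresponds to a map distance of approximately 43.4 cM.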
Creation of Pseudochromosomes
Pseudochromosomes were created from FvH4 sequence scaffolds over 50 kbp in length that had been anchored to the FV×FB reference map by markers developed in this investigation or using previously mapped markers. Sequence scaffolds were arranged in order based on their map positions on the FV×FB linkage map and, where possible (when at least two markers were mapped to the same scaffold with sufficient resolution to permit orientation), scaffolds were orientated in accordance with the linkage group on which they were located. Scaffolds were separated in the pseudochromosomes by arbitrary gaps of 10,000 N nucleotides, written (N)10k, to allow clear demarcation between the end of one scaffold and the beginning of the next within each pseudochromosome. Fragaria spp. pseudochromosomes were plotted alongside the FV×FB linkage groups using HARRY PLOTTER (Moretto et al., 2010).
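The ordering, orientation, and gap-insertion steps described above can be sketched in Python as follows. This is an illustrative sketch rather than the pipeline used in this investigation; the function names and the input layout (a tuple of scaffold identifier, map position, and orientation) are assumptions.

```python
GAP = "N" * 10_000  # arbitrary spacer between adjacent scaffolds
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (N is left as N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_pseudochromosome(anchored, sequences, gap=GAP):
    """Join the scaffolds of one linkage group into a pseudochromosome.

    anchored:  iterable of (scaffold_id, map_position_cM, orientation),
               where orientation is "+", "-", or None when unknown.
    sequences: dict mapping scaffold_id to its nucleotide sequence.
    Scaffolds of unknown orientation are kept as given.
    """
    ordered = sorted(anchored, key=lambda item: item[1])  # order by map position
    parts = []
    for scaffold_id, _position, orientation in ordered:
        seq = sequences[scaffold_id]
        parts.append(reverse_complement(seq) if orientation == "-" else seq)
    return gap.join(parts)
```

Because the spacer is a fixed run of N, scaffold boundaries can later be recovered from the pseudochromosome sequence alone.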
Linkage Group and Pseudochromosome Nomenclature
For clarity, when referring to markers mapped to the FV×FB linkage map, linkage group nomenclature followed that of Vilanova et al. (2008): Fragaria spp. linkage groups (FG) FG1 through FG7. When referring to markers or scaffolds located to one of the FvH4 pseudochromosomes, nomenclature followed that of Shulaev et al. (2011): Fragaria spp. pseudochromosome (FC) FC1 through FC7.
Simple Sequence Repeat Identification in Fragaria vesca ‘Hawaii 4’ Sequence Scaffolds
A FASTA (Pearson, 1990) file containing the seven FvH4 pseudochromosomes created from FvH4 sequence scaffolds over 50 kbp in length that had been anchored to the FV×FB reference map was submitted to MSATCOMMANDER v0.8.2 for Windows (Faircloth, 2008) for SSR screening. All SSRs with a di- or trinucleotide motif of at least 12 and 6 repeats, respectively, were identified and compiled, listing their repeat length, motif, and pseudochromosome position.
Results
Simple Sequence Repeat Design, Amplification, Polymorphism, and Mapping in Fragaria vesca ‘815’ × Fragaria bucharica ‘601’
In total, 296 primer pairs were designed from FvH4 genome sequence scaffolds. Of these, 152 primer pairs (51%) fulfilled three key criteria: (i) they generated single-locus polymorphisms between the grandparental genotypes of the FV×FB mapping population, (ii) they were targeted to previously unmapped or bin-mapped sequence scaffolds and were thus located in areas of the genome from which SSR markers had not previously been developed, and (iii) they mapped to one of the seven diploid Fragaria spp. linkage groups. One hundred twenty-one of the remaining 144 primer pairs generated products that were monomorphic between the parents of the FV×FB mapping population. Of the remaining primer pairs, 14 generated complex, or nonspecific, amplification profiles, six failed to amplify a product, and three generated products of over 1 kbp in length that could not be reliably genotyped in the progeny. Supplemental Table S1 lists the locus names, primer sequences, repeat motif, repeat number, and European Molecular Biology Laboratory (EMBL) reference numbers of the 152 mapped primer pairs. Additionally, a further 42 polymorphic SSRs that had been developed in previous investigations (Sargent et al., 2008; Spigler et al., 2010) and that had been identified in scaffolds over 50 kbp in length but had not previously been scored in the full FV×FB mapping population were also mapped.
Genome Sequence Scaffolds Anchored to the Fragaria vesca ‘815’ × Fragaria bucharica ‘601’ Reference Map
The 194 SSR loci genotyped in the full mapping population anchored a total of 93 genome sequence scaffolds to the FV×FB reference map. Sixty of these scaffolds had been previously located to FV×FB mapping bins using a selective mapping strategy (Celton et al., 2010) while 33 had not previously been located to the FV×FB map. Simple sequence repeat markers developed for one scaffold (scf0513139) mapped to two distinct locations: one on FG2 and one on FG5 of the FV×FB reference map. The sequence of scaffold scf0513139 was therefore split at nucleotide 563,780 between the two mapped SSRs in an area of low sequence coverage, and the two sections of the scaffold were renamed 513139a and 513139b following the convention of Shulaev et al. (2011).
Ninety-three FvH4 genome sequence scaffolds were anchored using markers mapped in the full FV×FB progeny in this investigation, adding a further 28.2 Mbp of sequence to precise locations on the F. vesca genome. These, along with those anchored using markers mapped in the full FV×FB progeny by Shulaev et al. (2011), bring the total number of anchored scaffolds (including the two scaffolds created by splitting scaffold scf0513139) to 222. These scaffolds are anchored using a total of 411 genetic markers (Fig. 1), 194 mapped in the present investigation and a further 217 previously mapped markers (Ruiz-Rojas et al., 2010), and contain 197,682,269 nucleotides (197.7 Mbp) of sequence including embedded gaps. Using only markers that were contained in genome sequence scaffolds and by removing markers whose positioning on the FV×FB map was questionable (through the erroneous positioning of an otherwise robustly positioned sequence scaffold within a region of the genetic map, poor fit of the data for a single marker in relation to other markers on the linkage group, or large amounts of data missing for a marker), we have produced an updated version of the reference linkage map for Fragaria spp. in which the positioning of the markers and underlying sequence scaffolds is robust and reliable. The distribution of scaffolds per linkage group in the updated map was not even, with FG3 containing the highest number of scaffolds (47) and FG7 the lowest (14). The distribution of sequence was also uneven, the longest pseudochromosome being FC6 (39.5 Mbp), which was almost twice the length of FC1 (20.0 Mbp), while FC3 contained the greatest number of nucleotides per centimorgan of its equivalent linkage group (687 kbp) and FC7 contained the fewest (372 kbp). Table 1 lists the number of scaffolds, linkage groups, and equivalent pseudochromosome lengths along with the number of nucleotides per scaffold and per centimorgan of each of the linkage groups of the FV×FB map.
| Linkage group | No. of scaffolds | Pseudochromosome (FC) length (bp) | Linkage group length (cM) | Megabase pairs per scaffold | Base pairs per centimorgan |
The 222 scaffolds mapped to the FV×FB reference map using the full mapping progeny now represent 90% of the FvH4 genome sequence scaffolds over 50 kbp in length. Figure 2 shows the physical positions of the 222 sequence scaffolds in relation to the markers to which they are anchored on the FV×FB reference map while Supplemental Table S2 lists the 411 genetic markers mapped to the FV×FB reference map that have been identified in genome sequence scaffolds herein or previously by Shulaev et al. (2011), their map positions, and the FvH4 genome sequence scaffolds that they anchor, along with the scaffold sizes in base pairs. Figure 3 displays scaffolds anchored to FG2 of the FV×FB linkage map in this investigation using the full mapping progeny and those anchored in the investigation of Shulaev et al. (2011) in which some scaffolds were anchored using bin mapping, showing the increased precision in the positioning of scaffolds in this investigation. A total of 10% (24) of scaffolds over 50 kbp in length, collectively containing 11.9 Mbp of sequence, were not anchored to the FV×FB map with the full FV×FB mapping progeny using SSRs developed in this investigation; however, of the 11.9 Mbp, 56% (6.7 Mbp) was previously bin mapped, leaving just 5.2 Mbp (2.4% of the FvH4 genome sequence contained in scaffolds over 50 kbp) unanchored to the Fragaria spp. genome. In most cases (92%), when scaffolds were not anchored in this investigation it was due to difficulties in designing primers for amplification of single SSR loci or a lack of polymorphism in the SSRs designed; however, in a very small number of cases (8% of unmapped scaffolds), it was due to the absence of SSR motifs of sufficient length around which to design primers.
Fragaria vesca ‘Hawaii 4’ Pseudochromosomes
The seven FvH4 pseudochromosomes (FC1–FC7) containing the 222 genome sequence scaffolds over 50 kbp in length that were anchored to the FV×FB linkage map using the full mapping progeny in this investigation or by Shulaev et al. (2011), along with an eighth pseudochromosome, FC0, containing the 24 unanchored genome sequence scaffolds over 50 kbp in length, have been deposited at the strawberry genome browser (Shulaev et al., 2011; Jung et al., 2005) and have been denoted v1.1.
Simple Sequence Repeat Identification in Fragaria vesca ‘Hawaii 4’ Pseudochromosomes
From the 222 genome sequence scaffolds over 50 kbp in length anchored to the FV×FB genetic map, a total of 10,071 SSRs were identified, 7321 (72.7%) of which were dinucleotide and 2744 (27.3%) of which were trinucleotide, with a repeat length equal to or in excess of 12 and 6, respectively. Simple sequence repeat distribution across the seven FvH4 pseudochromosomes was random with no clustering observed in relation to physical distance (data not shown). There was a strong correlation (R² = 0.923) between chromosome physical length and number of SSRs per chromosome, with an average of one SSR every 19,518 nucleotides. Simple sequence repeat motifs were present in the genome at different frequencies. Within the dinucleotides, the (AG)n motif was present at the highest frequency (59%), followed by (AT)n (37%) and (AC)n (4%) (Fig. 4a). The (CG)n motif was not found in the FvH4 genome sequence scaffolds with a repeat number n ≥ 12. Within the trinucleotides, the (AAG)n motif was present at the highest frequency (33%), followed by (AAT)n (21%) and (ATG)n (10%), the other seven trinucleotide repeat motifs making up the remaining 36% (Fig. 4b).
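The motif classes reported above (four dinucleotide and ten trinucleotide classes) arise when repeat units that differ only by rotation or by strand are pooled, so that (AG)n, (GA)n, (CT)n, and (TC)n are counted together. The Python sketch below shows one way to assign a repeat unit to its class. Choosing the alphabetically smallest equivalent as the class label is an assumption of this sketch, so a label may differ from the one used in the text; for example, (ATG)n falls into the class labeled ATC here.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Smallest rotation of a repeat unit or of its reverse complement."""
    motif = motif.upper()
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    rotations = [m[i:] + m[:i] for m in (motif, revcomp) for i in range(len(m))]
    return min(rotations)


def motif_classes(unit_len):
    """All distinct motif classes for repeat units of a given length."""
    classes = set()
    for bases in product("ACGT", repeat=unit_len):
        motif = "".join(bases)
        if len(set(motif)) > 1:  # exclude mononucleotide runs such as AA or AAA
            classes.add(canonical_motif(motif))
    return sorted(classes)
```

Enumerating the classes in this way gives AC, AG, AT, and CG for dinucleotides and ten classes for trinucleotides, consistent with the three named trinucleotide motifs plus "the other seven".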
Discussion
In this investigation we have developed and mapped a set of 152 novel polymorphic SSR markers from previously unmapped regions of the diploid Fragaria spp. genome, which will thus be useful for the continued development of genetic linkage maps in strawberry. Additionally, the mapping of these novel SSR markers has improved the precision of the anchoring of the FvH4 genome sequence assembly (Shulaev et al., 2011) through the location of a further 28.2 Mbp of sequence data to the diploid Fragaria spp. reference map FV×FB using the full mapping progeny (Ruiz-Rojas et al., 2010).
Due to their inherent transferability, codominant inheritance, and generally high levels of polymorphism, SSRs have found enormous utility as genetic markers and have been used extensively in numerous plant species, including in the development of reference maps for many rosaceous genera (Aranzana et al., 2003; Celton et al., 2009; Gisbert et al., 2009; Graham et al., 2004; Yamamoto et al., 2007). The SSR markers developed in this investigation mapped to discrete positions on the diploid Fragaria spp. reference map (Ruiz-Rojas et al., 2010) and permitted the precise anchoring of 28.2 Mbp more Fragaria spp. genome sequence data than had previously been anchored (Shulaev et al., 2011). In total, 90% of the genome sequence scaffolds over 50 kbp in length have now been anchored using STS markers in the full FV×FB progeny and a total of 97.6% are anchored using either a conventional or a bin-mapping strategy. The sequencing of the FvH4 genome was not performed as an end in itself, and it was envisaged that the resultant sequence would be utilized as a generic tool to assist the entire Fragaria spp. research community. Bin mapping, as performed by Shulaev et al. (2011), is a highly effective strategy for the placement of a large number of markers onto a linkage map with minimum experimental time and cost. However, the technique lacks precision and, while scaffolds anchored in this way were placed accurately in mapping bins on the FV×FB genetic map, the ordering of markers within the bins cannot be determined. Figure 3 displays the positions of scaffolds anchored in this investigation through the mapping of markers in the full FV×FB progeny compared to the placement of scaffolds using bin mapping by Shulaev et al. (2011). Mapping anchor markers in the full FV×FB progeny has increased the precision with which the scaffolds were anchored and therefore the utility of the anchored sequences for further research activities.
Thus, the pseudochromosomes presented in this investigation will improve the utility and generic value of the FvH4 sequence as a reference tool for genomics studies within Fragaria spp. and between plant genera. Marker systems exploiting single nucleotide polymorphisms, such as cleaved amplified polymorphic sequence (CAPS) markers (Ruiz-Rojas et al., 2010) and high resolution melting analyses, have found utility in other species including apple (Malus pumila Mill.), potato (Solanum tuberosum L.), and almond [Prunus dulcis (Mill.) D. A. Webb] (Chagné et al., 2008; De Koeyer et al., 2010; Wu et al., 2010), but such markers cannot be readily applied to the cultivated strawberry and its wild progenitors due to the complex polyploid nature of their genomes. The SSRs developed and utilized for scaffold anchoring in this investigation will be of great utility for the genetic characterization of and the continued development of genetic linkage maps for the octoploid cultivated strawberry F. ×ananassa (Sargent et al., 2009; van Dijk et al., 2010).
As an additional contribution to Fragaria spp. genomic information resources needed for mapping and other marker-dependent applications in strawberry, we have also surveyed the content and distribution of SSR repeats within the FvH4 genome sequence. As with other plant species, such as Arabidopsis thaliana (L.) Heynh., rice (Oryza sativa L.), almond (Prunus dulcis), peach [Prunus persica (L.) Batsch], and rose (Rosa spp.) (Jung et al., 2005; McCouch et al., 2002), the most abundant repeat motifs for di- and trinucleotide SSRs in the FvH4 genome sequence were (AG)n and (AAG)n, respectively. The distribution of repeat motifs for SSRs throughout the whole FvH4 genome sequence was similar to that found in ESTs sequenced from F. ×ananassa for dinucleotide repeats but differed for trinucleotide repeats, with the (AAT)n and (ATG)n repeats present in much lower frequencies in the ESTs sequenced and analyzed by Folta et al. (2005) using similar criteria to define the presence of di- and trinucleotide SSRs. In two more recent studies, however, Zhang and Deng (2010) found a similar percentage of (AAG)n trinucleotide repeats in SSRs found within the 17,565 Fragaria spp. ESTs deposited in the EMBL nucleotide sequence repository, with other motifs present at lower frequencies, and Bombarely et al. (2010) reported proportions of both di- and trinucleotide repeats within a collection of 4500 F. ×ananassa ESTs that were similar to those found in this investigation, suggesting that SSR repeat motifs in Fragaria spp. are distributed in similar frequencies within the coding and noncoding portions of the genome.
Simple sequence repeat distribution on the FV×FB genetic map was nonrandom, with clustering apparent on all seven linkage groups (Fig. 1) (Ruiz-Rojas et al., 2010; Sargent et al., 2006). The random physical distribution of SSR loci across the FvH4 pseudochromosomes suggests that the clustering of markers on the FV×FB linkage map is due to recombination suppression in specific regions of the genome. Such recombination suppression is apparent on linkage maps of other species, including papaya (Carica papaya L.) and barley (Hordeum vulgare L.), and has been shown to be located within centromeric regions (Chen et al., 2007; Wenzl et al., 2006). The direct effects of centromeric regions on the suppression of recombination have been demonstrated by Lambie and Roeder (1986) and thus we postulate that the regions of distinct recombination suppression apparent on all seven FV×FB linkage groups may be the centromeric regions of the seven Fragaria spp. chromosomes. It cannot be ruled out, however, that the clustering, particularly where more than one distinct region was observed on a single linkage group as on FG3, might have been caused by other factors such as low homology between the interspecific chromosomes due to high sequence divergence between F. vesca and F. bucharica or genomic rearrangements in the evolution of the two species since they diverged from a common ancestor.
Using the 152 SSRs developed in this investigation and 42 previously published SSRs, we have anchored 93 scaffolds that had not previously been located to precise but, in the majority of cases, unorientated positions using the full FV×FB mapping progeny and created seven pseudochromosomes composed of 222 sequence scaffolds covering 197.7 Mbp. As greater numbers of plant genomes are sequenced (Kaul et al., 2000; Ming et al., 2008; Shulaev et al., 2011; Tuskan et al., 2006; Velasco et al., 2007, 2010; Vogel et al., 2010; Yu et al., 2002), it has become possible to perform whole genome synteny analyses at the DNA sequence level (Gar et al., 2011; Illa et al., 2011; Jung et al., 2010). Marker density on the FV×FB map presented is one marker every 1.09 cM and, thus, anchoring the majority of the FvH4 sequence using the full mapping progeny will permit such analyses to be performed for Fragaria spp. with far greater precision than could have been achieved if scaffolds had been located using bin mapping alone, where markers were assigned to 45 mapping bins with an average bin length of 12.6 cM (Sargent et al., 2008). The seven FvH4 pseudochromosomes presented in this report will also assist resequencing efforts and genomic investigations in the cultivated strawberry, F. ×ananassa, and its wild octoploid progenitors Fragaria chiloensis (L.) Mill. and Fragaria virginiana Mill.
Supplemental Information Available
Supplemental material is available free of charge at http://www.crops.org/publications/tpg.