Research and development of improved crop varieties encompasses advances in genomics and biotechnology. Consensus genetic maps are available for many model and important crop and animal species including soybean [Glycine max (L.) Merr.] (Song et al., 2004), wheat (Triticum aestivum L.) (Somers et al., 2004), barley (Hordeum vulgare L.) (Wenzl et al., 2006), and chicken (Gallus-gallus domesticus L.) (Groenen et al., 2000) and are central to breeding and diversity initiatives. Recently, technological advances that have substantially reduced costs of sequencing and genotyping promoted the development of genome resources for many non-model species (Varshney et al., 2009). Linkage mapping in cowpea has progressed with marker technology to yield informative and increasingly dense genetic maps (Menendez et al., 1997; Ouedraego et al., 2002; Muchero et al., 2009).
Before this work an Illumina 1536 GoldenGate single nucleotide polymorphism (SNP) assay was developed and implemented to map 928 expressed sequence tag (EST)-derived SNPs in cowpea (Muchero et al., 2009). This map represented a substantial improvement over a previous but population-specific cowpea map, which utilized 441 amplified fragment length polymorphism (AFLP), restriction fragment length polymorphism (RFLP), and random amplified polymorphic DNA (RAPD) markers (Ouedraego et al., 2002). The 2009 cowpea SNP map contained 645 bins with 928 markers arranged on 11 linkage groups (680 cM) and was constructed by genotyping 632 individuals from six recombinant inbred line (RIL) populations. Both the 2009 consensus map and the improved map reported here are based entirely on EST-derived SNPs and all populations were genotyped for the same 1536 loci.
This SNP assay was also recently utilized in conjunction with simple sequence repeat (SSR) genotyping to develop the first genetic map of yard-long or asparagus bean [Vigna unguiculata subsp. sesquipedalis (L.) Verdc.] (Xu et al., 2011). Comparative analyses between subspecies revealed macro-synteny across most linkage groups and demonstrated the utility of the previous consensus map.
Here we report a new consensus map containing 1107 EST-derived SNP markers (856 bins), which was developed by integrating 13 population-specific maps. Improved methods of data analysis are realized in map characteristics and are apparent when surveying synteny of cowpea with soybean, Medicago truncatula Gaertn., and Arabidopsis thaliana (L.) Heynh. using HarvEST:Cowpea 1.27 (Wanamaker and Close, 2011) and Circos (Krzywinski et al., 2009). Development of this highly robust genetic map is of value to ongoing projects including genome assembly, marker assisted breeding, quantitative trait loci (QTL) analysis, map-based cloning, and comparative genomics.
Materials and Methods
The parents and progeny of 13 mapping populations were genotyped for 1536 SNPs using the GoldenGate assay as previously described (Muchero et al., 2009). Deoxyribonucleic acid isolation and preparation for genotyping also followed the methods described in Muchero et al. (2009). Detailed marker information and a plethora of other resources relevant to this work can be found online (Wanamaker and Close, 2011).
Raw data from the assay were imported into an Illumina GenomeStudio V2010.3 Workspace (Illumina, 2010) for analysis using genotype module V1.8.4. Custom workspaces were created for each mapping population to optimize cluster positions. Genotype calls were exported from GenomeStudio as spreadsheets for further data processing.
Data processing before mapping included the removal of apparently rogue individuals that exhibited excessive heterozygosity, nonparental genotypes, or no-call data points. Standards for these parameters were determined empirically for each population by obvious break points within the distribution of data. The parental phase of markers for which the parental genotype was uncertain was determined using the “Suspect Linkages” function within JoinMap4 (Van Ooijen, 2006). Genotypically identical individuals among the mapping populations were identified using the “Similarity of Individuals” function in JoinMap4 and were removed before mapping. Only SNPs with minor allele frequency >0.25 and >95% good calls were included for mapping.
Individual and Consensus Map Construction
Individual linkage maps representing 13 populations were constructed using JoinMap4 (Van Ooijen, 2006). Eleven of these populations were F8 to F10 RIL populations developed by inbreeding and single seed descent while the remaining two (IT84S-2246 × IT93K-503 and IT84S-2246 × ‘Mouride’) were F3–derived F4 families. These populations were selected on the basis of relevance to modern breeding programs, parental polymorphism, and segregation of agronomic traits. The mapping data sets used in Muchero et al. (2009) and Xu et al. (2011) were reused in the present work following the additional cleanup steps summarized above. Marker grouping was determined using logarithm of the odds (LOD) thresholds ≥4 while marker order was determined using LOD thresholds ≥6. These individual maps were compared to each other and to the Muchero et al. (2009) map to determine spurious linkages. When referenced to the consensus map, only two markers were found among the individual maps that corresponded to different consensus linkage groups. In these two scenarios the ultimate linkage group assignment agreed with the most popular assignment among the individual maps. The confounding marker among the population specific maps in which the assignment disagreed was removed, and the population was remapped.
The consensus map was constructed by first comparing the 13 population-specific maps generated using JoinMap4 (Van Ooijen, 2006) to define 11 consensus linkage groups, each of which consisted of at least one linkage group from each population. Consensus linkage groups (Vigna unguiculata linkage groups [VuLGs]) were constructed one at a time using MergeMap (Wu et al., 2008). A coefficient was applied to all map coordinates to correct for the inflation of map distances introduced by MergeMap and to normalize the map to 680 cM. Linkage group orientation was aligned to that of the Muchero et al. (2009) map for consistency and to facilitate comparisons between the two maps. MapChart (Voorrips, 2002) was used to align linkage maps and Circos (Krzywinski et al., 2009) was used for additional visualization of map characteristics.
Cowpea–soybean and cowpea–M. truncatula synteny was visualized using HarvEST:Cowpea 1.27 (Wanamaker and Close, 2011), which determines synteny based on BLASTX scores (<10−10) between cowpea unigenes containing a mapped SNP and translated gene models from reference genomes. For soybean the JGI Glyma1 database (Schmutz et al., 2010) was used while for M. truncatula the Medicago truncatula HapMap Mt3.5 database (Medicago truncatula HapMap Project, 2010) was used. TAIR 10 was utilized in HarvEST:Cowpea 1.27 to determine cowpea–A. thaliana synteny. Consensus map coordinates of cowpea unigenes were compared with chromosomal positions of soybean only if they contained at least five markers in common. Information was extracted from HarvEST to develop Circos (Krzywinski et al., 2009) diagrams.
Population Specific Maps
Characteristics of the 13 individual maps used in the construction of the consensus map are given in Table 1. The number of RILs in each population ranged from 56 in the ‘CB27’ × UCR 779 population to 160 in the CB27 × ‘IT82E-18’ population, with an average of 99 individuals. The number of SNPs mapped per population ranged from 155 in the IT84S-2246 × IT93K-503 population to 560 in the CB27 × UCR 779 population, with an average of 364 SNPs mapped per population. Population-specific map sizes ranged from 302 cM for IT84S-2246 × IT93K-503 to 710 cM for 524B × IT84S-2049 with an average size of 576 cM. The number of linkage groups for each population ranged from 14 (IT84S-2246 × IT93K-503) to 23 (CB27 × IT97K-556-6) with an average number of linkage groups per map of 17. Supplemental Table S1 provides a pairwise comparison of SNPs common among the 13 mapping populations. On average 136 SNPs are shared between a pair of mapping populations, ranging from IT84S-2246 × IT93K-503 and LB30#1 × LB1162 #7 sharing 27 markers to CB27 × IT82E-18 and CB27 × UCR 779 sharing 290 markers.
|Population||Individuals genotyped||Individuals used for mapping||Mapped SNPs||Linkage groups||Map size (cM)|
|CB27 × IT97K-566-6||95||92||438||23||505.56|
|CB27 × IT82E-18||166||160||430||23||701.15|
|CB27 × UCR 779||58||56||560||22||489.40|
|CB46 × IT93K-503-1||130||114||374||17||639.59|
|524B × IT84S-2049||91||85||438||22||710.09|
|Dan Ila × TVu-7778||113||79||288||22||549.56|
|Yacine × 58-77||141||97||435||22||650.98|
|Sanzi × Vita 7||142||122||413||19||753.22|
|IT84S-2246 × IT93K-503||93||88||155||14||302.46|
|IT84S-2246 × Mouride||92||87||347||15||595.33|
|TVu14676 × IT84S-2246-4||147||136||345||14||666.89|
|CB27 × 24-125B-1||108||87||329||23||526.75|
|LB30#1 × LB1162 #7||95||90||180||20||409.94|
The phases of 333 SNPs among all 13 mapping populations in which the parental genotypes were uncertain was inferred using the “Suspect Linkages” function within JoinMap4 (Van Ooijen, 2006). For 159 of these loci it was determined that the original phase designation was incorrect. Phases were subsequently reversed and the linkage group containing them was remapped. The phases of the remaining 174 SNPs were determined to have been called correctly (by chance) and were not inverted. Sixty-three duplicated pairs of individuals, which were genotypically identical for the SNPs under consideration, were solved by removing one of the contributing individuals. In addition, 120 individuals among the 13 mapping populations contained excessive numbers of nonparental genotypes and were removed before mapping. A synopsis of data processing for each mapping population can be found in Supplemental Table S2. Coordinates for the consensus map and the 13 population-specific maps can be found online (Wanamaker and Close, 2011).
Eleven linkage groups were constructed spanning 680 cM. Table 2 summarizes the number of markers mapped, the length (cM), and the number of bins for each VuLG. In total, 1107 markers were mapped across all 11 linkage groups with a range of 72 markers on VuLG 8 to 203 markers on VuLG 3. Linkage group length ranged from 45.2 cM on VuLG 9 to 92.4 cM on VuLG 3. Eight hundred and forty-five bins were mapped on the 11 linkage groups with an average distance between bins of 0.79 cM. This translates to an average of one bin per 733 Kb of the cowpea genome.
|Consensus VuLG||Current map||Muchero et al., 2009||Current map||Muchero et al., 2009||Current map||Muchero et al., 2009|
Figure 1 provides a graphical view of the cowpea map using Circos (Krzywinski et al., 2009), which includes the depiction of parameters characterizing map quality. In this figure five data tracks are drawn whose parameters are oriented so that maximum and minimum values are distal and proximal respectively to the ideogram's center. The first data track, a green histogram, symbolizes the average distance between bins for each VuLG where each white grid line represents a distance of 0.25 cM. The second data track, a purple histogram, displays the average number of markers per bin for each VuLG where each white grid line represents a value of 0.5 markers per bin. The third and fourth data tracks share the same radial position of initiation and thus share the same grid axis where the blue histogram overlaps the underlying red histogram and each white grid line represents 25 units. The blue and red histograms visualize the number of bins per VuLG and the number of markers per VuLG respectively The fifth data track, bands, reside within the VuLGs and depict the relative location of bins in each linkage group.
In the soybean and M. truncatula genome sequences, homeologous genes were identified for 85 and 80% of the SNPs mapped in cowpea, respectively. Supplemental Table S3 lists the soybean and M. truncatula chromosomes that are most syntenic with VuLGs based on the number of cowpea homeologs detected on syntenic chromosomes. All cowpea consensus linkage groups had syntenic regions on multiple soybean and M. truncatula chromosomes. Vigna unguiculata linkage group 1 and VuLG 3 displayed synteny with 7 of the 20 soybean chromosomes while VuLG 8 was syntenic with three soybean chromosomes. When compared to the eight chromosomes of M. truncatula, VuLG 2 and VuLG 3 were syntenic with seven chromosomes while VuLG 4, VuLG 5, and VuLG 7 were syntenic with only three chromosomes. All 11 VuLGs had regions with similarity to the five A. thaliana chromosomes but colinearity was nearly nonexistent and major genome rearrangement was obvious. The current cowpea consensus map accounts for all but three of the 191 SNPs mapped by Xu et al. (2011) in V. unguiculata subsp. sesquipedalis (population LB30#1 × LB1162 #7) and the two maps are highly syntenic across all VuLGs. Figure 2 uses Circos (Krzywinski et al., 2009) to display a genome-wide comparative view of the cowpea consensus map and the 20 chromosomes of soybean. This figure colors links based on cowpea VuLG origin and traces more than 2200 relationships between the two legume species. Expanded views of synteny with soybean are available in Supplemental Fig. S1 for each of the 11 VuLGs. Figure 3 juxtaposes syntenic views of VuLG 3 with soybean chromosome 17 using both the new and previous (Muchero et al., 2009) versions of the cowpea consensus map.
The 13 populations included in the genotyping and consensus map construction (Table 1) were chosen because of their relevance to modern breeding programs and diversity, especially considering the broad range of important traits for which they segregate. These traits include aphid resistance, bacterial blight resistance, cowpea weevil resistance, drought tolerance, Fusarium oxysporum L. wilt resistance, flower thrips and foliar thrips resistance, heat tolerance, individual grain weight, maturity, Macrophomina phaseolina L. resistance, nematode resistance, Striga resistance, seedling cold tolerance, virus resistance, and yield components.
The LB30#1 × LB1162 #7 population is derived from a cross of two V. unguiculata subsp. sesquipedalis accessions from China and not only shared the least amount of markers with the other maps but also contained a relatively low number of mapped markers (180) when compared to the average of all 13 populations (364). This is consistent with the initial mapping of this population recently published by Xu et al. (2011) in which SSR and SNP markers were employed to construct the first genetic map of asparagus bean. Their effort mapped many new loci that provided an additional perspective to the EST-derived SNP marker framework. Consensus maps constructed with and without the LB30#1 × LB1162#7 map were compared to determine the influence this population had on the consensus map. This map was included in the reported consensus map because it contributed three unique markers and did not impose conflicting marker assignments. It is important to consider the fact that the 1536 SNP assay used for genotyping was mainly developed for V. unguiculata subsp. unguiculata and this ascertainment bias may influence the interpretation of diversity for a distinct subspecies for which it was not optimized. This consideration is also important in interpreting marker number comparisons in Supplemental Table S1 because the number of SNPs mapped heavily influences the number of SNPs shared between populations. CB27 × IT82E-18 and CB27 × UCR 779 may have been expected to share the most markers not only because they share a parent (CB27) but the number of mapped SNPs for each of these populations was relatively high (430 and 560 respectively) when compared to the average (364).
High-throughput genotyping may have unexpected benefits to breeding programs as a quality filter. Data processing and cleanup before linkage mapping provides a map with better marker order, which ultimately affects downstream analyses such as QTL mapping or map-based cloning. In our analysis we identified a number of rogue individuals. Rogues are defined as those individuals with excessive no-call rates, heterozygosity, or nonparental alleles. Highly homozygous individuals with significant numbers of nonparental alleles are likely to have arisen from labeling or contamination errors during the eight or more cycles of planting or harvesting or seed cleaning and packaging operations required to develop an advanced RIL population. Individuals with heterozygosity greater than expected based on the level of inbreeding and significant number of nonparental alleles likely arose from outcrossing during the advancement of the RIL populations. Individuals with excessive heterozygosity, no calls, and nonparental genotypes were excluded before importing data into JoinMap4 (Van Ooijen, 2006). Another quality filter was to address genotypically identical individuals, which were removed before mapping and thus circumvented the possibility of biasing the map toward recombination events unique to those duplicated individuals. Supplemental Table S2 summarizes these quality control parameters for each of the 13 populations used in genotyping and mapping.
Genotyping a relatively large number of loci, 1536, allows for conservative thresholds to be applied when deciding which markers to include. In addition to thinning the populations, we only included SNPs with a minor allele frequency greater than 0.25 and less than 5% no calls. The “Suspect Linkages” tab in JoinMap4 (Van Ooijen, 2006) was utilized to identify SNPs with incorrect parental phase, which was subsequently corrected by inverting the genotype calls (A to b and B to a). These situations may have arisen due to a no call, low quality call, incorrect call, or monomorphic call for the parents and the marker is otherwise successful among the progeny. A monomorphic SNP among the parents that is polymorphic in the population can be attributed to the inability to genotype individuals genotypically identical to the true parents. After inversion, those SNPs for which the genotypes were inverted did not appear as “Suspect Linkages.” This genotype inversion was implemented on 159 of 333 SNPs for which the parental genotype was previously unknown. This ratio matches expectations that the parental phase was called correctly by chance for approximately 50% of the 333 SNPs. Supplemental Table S3 summarizes the results of these marker cleanup efforts. These improvements are reflected in marker order as indicated by improved colinearity when observing synteny with soybean (Fig. 3). However, this may be an incomplete comparison of map quality between the two consensus maps because other aspects besides data processing also were variable (number of mapping populations, mapped SNPs, etc.); nonetheless, improvements such as marker and bin density are also apparent.
This improved cowpea consensus map contains 1107 markers, a 19% increase in marker density compared to the previously reported 928 SNP consensus map (Muchero et al., 2009). Not only were 179 new markers mapped but the number of informative positions, bins, also increased. In total 211 bins, an increase of 33%, were added over all 11 VuLGs with a range of 12 bins on VuLG 4 and 34 bins on VuLG 3 and an average increase of 19 bins per linkage group. The average distance between bins was reduced from 1.05 to 0.79 cM. The total number of bins on a linkage map is a function of the number of individuals genotyped. By increasing the number of individuals the probability of observing unique meiotic crossover also increases. Traditionally the number of markers on a genetic map is the most popular statistic to report; however, when discussing map resolution bin statistics are more relevant. This map and the Muchero et al. (2009) map are somewhat unique when compared to other consensus genetic maps. Our approach was able to not only map expressed genes but also to be confident in their placement. Before the availability of high-throughput SNP genotyping typical consensus maps integrated different marker types (SSRs, RAPDs, RFLPs, AFLPs, and/or SNPs) even when only a few were shared among populations. This approach may yield more markers but the accuracy of marker order and distances, critical aspects of map quality, were compromised. Any one of the 13 populations used in the construction of this new cowpea consensus map shared on average 37% of its markers with any other map. This simplified approach to consensus mapping may yield more robust consensus maps when compared to maps that were constructed from only a few populations or those that used multiple marker systems.
Because they are derived from gene transcripts, the mapped SNPs can be used for explorations of structural and functional synteny across species. Ancestral soybean genome duplication is evident from the visualization of cowpea haplotype blocks being syntenic with more than one soybean chromosome (Supplemental Fig. S1). Chromosomal rearrangements were also observed between cowpea and soybean and can be easily visualized using the HarvEST:Cowpea comparative genome viewer (Wanamaker and Close, 2011) or the Circos (Krzywinski et al., 2009) diagrams included in Supplemental Fig. S1. Of the 1107 total mapped SNPs, 941 SNPs representing 85% of the genome had homeologs and exhibited synteny and colinearity with soybean. Vigna unguiculata linkage group 4 in Supplemental Fig. S1 provides an example of a common pattern observed when scanning synteny with soybean one linkage group at a time. Syntenic blocks between cowpea and soybean often span an entire linkage group of cowpea while in soybean the homeologous region is most often found on a single arm of a chromosome. This observation may indicate a similar mechanism of genome evolution to that which was recently described in grasses (Murat et al., 2011) where centromeric and/or telomeric recombination led to nested chromosome fusions and synteny break points. Current HarvEST:Cowpea resources indicate that from a genome perspective cowpea is more similar to soybean than to M. truncatula. However, the relationship is complex in that the ancestral genome of modern soybean underwent various modifications including duplication and diversification. As expected and previously observed (Muchero et al., 2009) A. thaliana–cowpea synteny is complicated by extensive chromosomal rearrangements; however, microsynteny with this model dicot is still informative. When comparing the current cowpea consensus map to the asparagus bean map (Xu et al., 2011) large scale genome conservation was observed. All 11 VuLGs were syntenic with this subspecies specific map and these relationships conveyed the very close evolutionary relatedness of the two subspecies.
The utility of this new cowpea consensus map complements ongoing genome sequencing and map-based cloning efforts. Whole genome sequence information regarding the context of these ESTs could provide insight into regulatory regions and splice junctions. In many genome sequencing projects the use of genetic maps is a popular tool for accurate assembly and genome finishing. This consensus map will help place sequence scaffolds to make a whole-genome assembly of cowpea. Analyses dependent on accurate and dense genetic maps including marker-assisted breeding, QTL analysis, and map-based cloning should consider map quality an important factor. In future work the utilities of Circos (Krzywinski et al., 2009) could be exploited to help infer an ancestral legume genome, which could promote discussion concerning the molecular mechanisms involved during the evolution of this important plant family.
Supplemental Information Available
Supplemental material is available free of charge at http://www.crops.org/publications/cs.