
The Plant Genome - Original Research

Sweet Sorghum Genetic Diversity and Association Mapping for Brix and Height


This article in TPG

  1. Vol. 2 No. 1, p. 48-62
    OPEN ACCESS
    Received: Oct 5, 2008
    Accepted: Feb 12, 2009


  1. Seth C. Murray,
  2. William L. Rooney,
  3. Martha T. Hamblin,
  4. Sharon E. Mitchell and
  5. Stephen Kresovich 
  1. S.C. Murray, M.T. Hamblin, S.E. Mitchell, and S. Kresovich, Institute for Genomic Diversity and Dep. of Plant Breeding and Genetics, Cornell Univ., Ithaca, NY 14853; W.L. Rooney, Dep. of Soil and Crop Sciences, Texas A&M Univ., College Station, TX 77843. S.C. Murray's present address: Dep. of Soil and Crop Sciences, Texas A&M Univ., College Station, TX 77843


Sweet sorghum [Sorghum bicolor (L.) Moench], like its close relative, sugarcane (Saccharum spp.), has been selected to accumulate high levels of edible sugars in the stem. Sweet sorghums are tall and produce high biomass in addition to sugar. Little has been documented about the genetic relationships and diversity within sweet sorghums and how sweet sorghums relate to grain sorghum racial types. In this study, a diverse panel of 125 sorghums (mostly sweet) was successfully genotyped with 47 simple sequence repeats (SSRs) and 322 single nucleotide polymorphisms (SNPs). Using both distance-based and model-based methods, we identified three main genetic groupings of sweet sorghums. Based on observed phenotypes and known origins we classified the three groups as historical and modern syrup, modern sugar/energy types, and amber types. Using SSR markers also scored in an available large grain sorghum germplasm panel, we found that these three sweet groupings clustered with kafir/bicolor, caudatum, and bicolor types, respectively. Using the information on population structure and relatedness, association mapping was performed for height and stem sugar (brix) traits. Three significant associations for height were detected. Two of these, on chromosomes 9 and 6, support published QTL studies. One significant association for brix, on chromosome 1, 12 kb from a glucose-6-phosphate isomerase homolog, was detected.


    Abbreviations: AFLP, amplified fragment length polymorphism; BLAST, basic local alignment search tool; GLM, general linear model; HPLC, high performance liquid chromatography; LD, linkage disequilibrium; Mb, megabase; MLM, mixed linear model; NIRS, near-infrared spectroscopy; PC, principal coordinate; PCoA, principal coordinate analysis; QTL, quantitative trait locus (loci); RFLP, restriction fragment length polymorphism; SNP, single nucleotide polymorphism; SSR, simple sequence repeat

Sweet sorghums belong to the same domesticated species [Sorghum bicolor (L.) Moench] as grain, forage, and broomcorn sorghums but have been selected to accumulate high levels of sucrose in the parenchyma of juicy stems (Harlan and deWet, 1972; Vietor and Miller, 1990). Sweet sorghum sugar accumulation levels can be similar to that in sugarcane (Saccharum spp.), a close relative, though studies on enzymatic control and carbon transport suggest that the mechanism of accumulation is different (Lingle, 1987; Tarpley and Vietor, 2007). The stems of sweet sorghum are desired for food-grade syrup (stalks are pressed and juice is subsequently boiled) but also for fresh chewing and alcohol production in Brazil and India (House et al., 2000).

Recent demand for biofuel, in light of perceived Brazilian success with sugarcane, has caused a re-evaluation of sweet sorghums as a source of energy (Rooney et al., 2007; Vermerris et al., 2007). Up to 13.2 t/ha of total sugars, equivalent to 7682 L of ethanol per hectare can be produced by sweet sorghum under favorable conditions (Jackson et al., 1980). Sweet sorghum and other sugar crops have been researched for biofuel production in the U.S. for over 30 years (Lipinsky et al., 1977). Primary research, development, and breeding began in the late 1970s when the high cost of oil spurred interest in alternative energy sources. These investigations were ended by 1987 when petroleum costs had decreased (DOE-OSTI, 2008).

Sweet sorghums, also called sorgos, were originally brought to the U.S. as landraces from China (cv. Chinese Amber) and Africa (cvs. Orange, Sumac/Redtop, Gooseneck /Texas Seeded Ribbon Cane, Honey, White African, and others) via France in the 1850s for producing syrup (sirup) and forage (Winberry, 1980; Maunder, 2000). Many of these original sweet sorghum landraces continued to be selected by farmers regionally in the U.S. and were renamed. Other cultivars were introduced later: ‘Collier’ from South Africa, ‘McLean’ from Australia, and others with unknown origin such as ‘Folger,’ ‘Coleman,’ ‘Sugar Drip,’ and ‘Rex,’ referenced as early as 1923 (Sherwood, 1923; Vinall et al., 1936; Maunder, 2000).

Almost all sweet sorghum cultivars improved with modern methods were bred at the USDA-sponsored U.S. Sugar Crops Field Station in Meridian, MS, from the 1940s until it closed in 1983. The Meridian station used landraces for plant improvement and released improved syrup lines. A few lines were also selected for sugar production and energy (biomass tonnage) in collaboration with others across the U.S., notably Texas and Georgia. Of the syrup lines bred and released by the Meridian station, release notes suggest primary improvement was focused on improving disease resistance in high sugar lines. Disease can alter sorghum juice, reducing the desirability of syrup and contributing to lodging. Besides disease resistance, other selected traits include high brix (very few report stem sugar), low purity juicy stalks, high yields, stalk erectness, and good quality syrup.

The Meridian, MS, station additionally curated a “sweet sorghum world germplasm collection.” When it closed, materials were transferred to the USDA sorghum collection in Griffin, GA (Freeman, 1979; USDA-ARS, 2008). Many accessions from this collection, used in later breeding, were obtained in a 1945 collecting trip by Carl O. Grassl around the African center of sorghum domestication (Freeman, 1979). Six of these African landraces, specifically MN960, MN1048, MN1054, MN1056, MN1060, and MN1500 were used in the pedigrees of many U.S. released improved sweet sorghum lines (Table 1). This suggests that there may be a narrow genetic base for U.S. sweet sorghum cultivars resulting in close genetic relationships. If the genetic base is too narrow there may be difficulty in breeding from this material to develop energy types.

Table 1.

Sweet sorghum panel cultivar names and associated information.

Name† Full name Source‡ Source 2§ Type¶ Parentage or place of origin# Reference
7035S 7035S U PI 552851 ?
Atlas1 Atlas T ASA.61 HS
Atlas2 Atlas Sorgo T HS
Axtel Axtel T HS
Bailey Bailey K NSL 187557 MS Wiley, Tracy Duncan et al., 1984
Brandes Brandes T NSL 29336 MS Collier 706-C, MN1500 Coleman and Broadhead,1968
Brawley1 Brawley U PI 533998 MS Rex, White-seeded Collier USDA, 1958
Brawley2 Brawley T MS
CAmber1 Chinese Amber U PI 22913 A Maunder, 2000
CAmber2 Chinese Amber U PI 248298 A Maunder, 2000
CAmber3 Chinese Amber T ASA.45 A Maunder, 2000
Colier1 Collier U PI 19770 HS Maunder, 2000
Colier2 Collier T ASA.64 HS Maunder, 2000
Colier7 Collier 706C U PI 563032 HS Maunder, 2000
Colier3 Collier Meridian T HS Maunder, 2000
Colier4 Collier T PI 19770 HS Maunder, 2000
Colman1 Colman T ASA.52 HS Sherwood, 1923
Colman2 Colman (Young Meridian) T HS Sherwood, 1923
Cowley Cowley T MS Collier 706-C, MN1054, MN960, MN 1056, MN 1054, Early Folgers Hodo, MN 1060 Kresovich et al., 1985
CnAtlas Cunningham Atlas T HS
DkAmber Dakota Amber T ASA.48 A
Dale Dale K NSL 74333 MS Tracy, MN960 Broadhead et al., 1970
Danton Danton T ASA.65 HS
Della1 Della K MS BTx622, Dale Harrison and Miller, 1993
Della2 Della T MS BTx622, Dale Harrison and Miller, 1993
Della3 Della U PI 566819 MS BTx622, Dale Harrison and Miller, 1993
EFolger Early Folger T HS
EllisSo Ellis Sorgo T HS Leoti, Atlas Karper, 1949
Folger Folger T ASA.59 HS
Fremont Freemont Sorgo T Akron, Co HS
GaBlueR Georgia Blue Ribbon T HS Freeman et al., 1973
HoneyS1 Honey Sorghum U A Freeman et al., 1986
HoneyS2 Honey Sorghum T PI 181080 A aka MN2931
Iceberg Iceberg Sorgo T HS Orange type
KColier Kansas Collier T Anthony, Ks HS Maunder, 2000
KOrange Kansas Orange T ASA.51 HHS Maunder, 2000
Keller1 Keller K MS MER 50–1, Rio Broadhead et al., 1979
Keller2 Keller T MS MER 50–1, Rio Broadhead et al., 1981
Leoi Leoi U PI 154995 HS
Leoti Leoti T ASA.58 HS
M81E M81E K NSL 174431 MS Brawley, Rio Broadhead et al., 1981
McLeanS McLean (Starchy) T HS
McLeanW McLean (Waxy) T ASA.62 HS
MnAmber Minnesota Amber T ASA.46 A
Mn1054 MN 1054 U PI 152965 LMN Sudan Freeman, 1979
Mn1056 MN 1056 U PI 152967 LMN Sudan Freeman, 1979
Mn1060 MN 1060 U PI 152971 LMN Sudan Freeman, 1979
Mn1500 MN 1500 U PI 154844 LMN Uganda-aka Grassl Kresovich et al., 1988
Mn2812 MN 2812 U PI 167093 LMN Egypt/Turkey
Mn291 MN 291 U Grif 14968 LMN Extra Early Sumac
Mn3046 MN 3046 U PI 195754 LMN China
Mn3083 MN 3083 U PI 196586 LMN India/Taiwan
Mn410 MN 410 U PI 145619 LMN S. Africa
Mn4125 MN 4125 U PI 250583 LMN Egypt
Mn4466 MN 4466 U PI 255744 LMN Turkey, Taslik village
Mn822 MN 822 U PI 152694 LMN Kordofan, Sudan
Mn856 MN 856 U PI 152728 LMN Sudan
Mn960 MN 960 U PI 534165 LMN Sudan Freeman, 1979
N100 N100 T PI535785 MS Waconia, Wray Gorz et al., 1990
N108 N108 T PI535793 MS Saccharum Sorgo Gorz et al., 1990
N109 N109 T PI535794 MS White Collier, Grain Sorghum Line Gorz et al., 1990
N110 N110 T PI535795 MS Red X Gorz et al., 1990
N111 N111 T PI535796 MS Waconia Gorz et al., 1990
N98 N98 T PI535783 MS Rio, Waconia, Fremont, AN39, N4692 Gorz et al., 1990
N99 N99 T PI535784 MS Fremont, Theis Gorz et al., 1990
Orange1 Orange U PI 2363 HHS Maunder, 2000
Orange2 Orange U PI 533902 HHS aka MN 604 Maunder, 2000
Orange3 Orange T ASA.50 HHS Maunder, 2000
PI52606 PI52606 K PI52606 LMN
P526905 PI526905 K PI526905 L- zimB
P527045 PI527045 K PI527045 L- zimB
P550604 PI550604 K PI550604 ?
Ranchr1 Rancher 3 T Brookings, SD A Karper, 1949
Ranchr2 Rancher 3 T ASA.93 A Karper, 1949
RedAmbr Red Amber T ASA.49 A
RedTopT Red Top Tennesse T HS Winberry, 1980
Rex Rex U PI 534163 HS Sherwood, 1923
Rio1 Rio T MS Rex, MN 1048 Broadhead, 1972
Rio2 Rio T MS Rex, MN 1048 Coleman et al., 1965
RxOrng1 Rox Orange K HHS
RxOrng2 Rox Orange T HHS
WhitMam White Mammoth T G
Saccaln Saccaline T HS Vinall et al., 1936
Sapling Sapling T ASA.55 HS Vinall et al., 1936
Simon Simon K HS
Smith Smith U PI 511355 MS MN4004 (Grif 16302), MN 2754,Wiley, MN 48, MN 1056, others Kresovich and Broadhead, 1988
Sorgras Sorgrass U PI 563222 F
SucreDm Sucre Drome U PI 197542 LMN
SgrDrp1 Sugar Drip U PI 586435 HS Freeman et al., 1986
SgrDrp2 Sugar Drip U PI 146890 HS Freeman et al., 1986
SgrDrp3 Sugar Drip K HS Freeman et al., 1986
SgrDrp4 Sugar Drip T HS Freeman et al., 1986
SgrDrp5 Sugar Drip T Oklahoma A&M HS Freeman et al., 1986
SgrDrp6 Sugar Drip T Oklahoma A&M HS Freeman et al., 1986
Sumac1 Sumac U PI 63715 HHS Maunder, 2000
Sumac2 Sumac U PI 35038 HHS Maunder, 2000
Sumac3 Sumac U PI 534120 HHS Maunder, 2000
TxDblSw Texas Double Sweet K HS
Top76 Top 76–6 K PI 583832 MS Brandes, Collier 706-C, MN 1500, MN 1056 Day et al., 1995
Tracy Tracy T NSL 4029 MS White African, Sumac Stokes et al., 1953
Umbrela Umbrella K HS
WcAmber Waconia Amber T ASA.47 A Maunder, 2000
WxAtlas Waxy Atlas T HS
WhtAfr1 White African U PI 52606 G
WhtAfr2 White African T ASA.60 G
WhtAfr3 White African T Oklahoma A&M G
WileyRL Wiley R Line K HS Stokes et al., 1956
WileySo Wiley Sorgo T MS Collier, MN 822, MN 2046 Coleman et al., 1956
Wiliams Williams Sorgo T Ky. Certified MS Freeman et al., 1973
Wray Wray T MS Brawley, Rio, MN 856 Broadhead et al., 1978
BTx623 B.Tx623 T G BTx3197, SC170–6 Miller, 1976
BTx635 B.Tx635 T G Miller et al., 1992
BTx631 B.Tx631 T G Miller, 1986
BTx642 B.Tx642 T G SC35
P850029 P850029 T G
Macia Macia T G
Sureno Sureno T G S423,CS3541,E35 Meckenstock et al., 1993
ATx623 A.Tx623 T G
EA1083 SC599 T sc599 G IS17459C
EA1074 Rio 9188 T Rio 9188 G
EA1084 SC599–6-9188 T PI 593916 G
Forag41 T F
Forag73 T F TX631, TX2910
Ramada Ramada U NSL 107377 MS MER 45–45, MN 1056, MN 1054, MN1060 Freeman et al., 1974
Sart Sart U NSL 91616 MS Sudan Stokes et al., 1951
†Abbreviated name used in later tables and figures.
‡K: University of Kentucky; T: Texas A&M University; U: USDA/ARS.
§USDA PI number or additional information to distinguish accessions.
¶A: amber; G: grain; HHS: historical sweet 1850s; HS: historical sweet by 1923; MS: modern sweet; F: forage; ?: unknown or diverse.
#If known, parent lines for modern cultivars with pedigrees, place of origin for collected landrace material. Additional information in reference.

Although published pedigree information is available for some of the more recent sweet sorghum lines, the relationships with historic sweet cultivars and grain sorghums are poorly documented. A few genetic studies (Anas and Yoshida, 2004; Casa et al., 2008) investigated grain sorghum germplasm panels that included some sweet sorghums. Further work by Seetharama et al. (1987) and Ritter et al. (2007) suggested that sweet sorghums are of polyphyletic origin, with relatives among kafir, caudatum, and other grain sorghum types.

Currently, there are no discrete objective criteria, such as a molecular marker or sugar concentration level, to differentiate sweet sorghums from grain sorghums. There are multiple generalized phenotypic differences: sweet sorghums are always tall, have high biomass and juicy stems [juicy versus dry stem is controlled by a major gene (Bennetzen et al., 2001)], and most importantly have high stem-sugar concentrations. Stem-sugar concentration may be quantitatively measured by high performance liquid chromatography (HPLC) or as brix, a measurement of soluble solids, which in sorghums are mostly sucrose. Stem-sugar concentration inheritance is not simple; environment, genotype × environment interaction, and the genetic background (epistasis) all play a role. Within mapping populations, few QTL have been identified and they explain little variation given the moderate heritability (0.51 to 0.86) reported for the trait (Schlehuber, 1945; Clark, 1981; Natoli et al., 2002; Bian et al., 2006; Ritter, 2008; Murray et al., 2008a). In two different populations, Natoli et al. (2002) and Murray et al. (2008a) both identified the strongest QTL for stem sugar on chromosome 3, explaining 18 and 25% of the trait variance, respectively. Natoli et al. (2002), in an F2 population derived from a sweet sorghum × sweet sorghum cross, estimated the chromosome 3 QTL effect was 56% additive and 44% dominant. Murray et al. (2008a) used a recombinant inbred-line population derived from a sweet sorghum × grain sorghum cross, so only additive effects could be calculated. We chose to follow up the stem-sugar QTL on chromosome 3 as a candidate for association mapping in a diverse panel of sorghums.

Association mapping uses diverse material to associate genetic markers with a phenotype of interest, taking advantage of lower levels of linkage disequilibrium than are present in linkage populations. Association mapping has been used to identify genes of interest in many plant species with varying degrees of success (Wilson et al., 2004; Aranzana et al., 2005; Breseghello and Sorrells, 2006). In sorghum, a diverse grain sorghum germplasm panel for association mapping was previously reported by Casa et al. (2008). However, only eight of the 356 accessions could be considered “sweet sorghum” types. Though there likely was variation for brix, the panel was mostly dwarf grain sorghum uncharacteristic of tall and high-biomass sorghums of interest. We therefore assembled a panel that represents historically important U.S. sweet-sorghum cultivars, important sweet-landrace progenitors, and cultivars that would serve as non-sweet controls.

In this study we were interested in addressing three questions. (i) What are the genetic relationships among sweet sorghums in the United States? (ii) What are the genetic relationships among sweet and grain sorghums across grain racial classifications? (iii) Can we use association mapping to confirm the major QTL for total stem sugar (brix), or any of the previously identified QTL for height?

Materials and Methods

Plant Material and Phenotypic Analysis

Two replicates of 125 diverse accessions were planted in College Station, Texas in 2006 (CS06) and 2007 (CS07), and one replicate was planted in Ithaca, NY in 2007 (ITH07). These accessions were primarily historical and modern sweet-sorghum cultivars, though grain and forage sorghums were also included (Table 1). These accessions will subsequently be referred to as the “sweet sorghum panel.” Literature and the GRIN database (USDA-ARS, 2008) were used to identify cultivars as amber, historical sweet, modern sweet, modern sugar and energy, MN landraces (brought to Meridian, MS from Africa by C.O. Grassl), or grain types. We use the term “modern” to denote improved lines that have published pedigree information. Seed was obtained from a variety of sources for CS06 (Table 1), and seed bulked from self-pollinated plants was planted for CS07 and ITH07. In CS06 and CS07, 3-m rows with 76 cm spacing (∼160,000 plants ha⁻¹) were planted in a randomized complete block design. In ITH07, 30 seeds were hand planted in 1.5-m rows with 76 cm spacing.

Some material was photoperiod sensitive and, depending on environment, there was a wide range for time of maturity. Plants were harvested when most accessions were in the soft-dough to hard-dough stage. By harvesting without regard to specific cultivar maturity we minimized the environmental effect, but likely caused biases in stem-sugar phenotypes due to flowering time, because stem sugar peaks right before the hard-dough stage (unpublished data). This would be expected to decrease our power but not create false positives. In each location, 1 m per row was harvested by cutting within 3 cm of the soil. Stems were separated from panicles and leaf tissue. Stem juice was extracted using a three-roller mill. Brix was measured using a handheld refractometer. Measurements were collected on 1 m of row in CS06 and CS07, and from three random plants in ITH07. HPLC was performed on CS06 samples according to Murray et al. (2008a); no HPLC analysis was performed for CS07 or ITH07. Plant height, measured from the soil to the top of the panicle, was averaged across each row for all three locations.

Genetic Analysis

Leaf tissues were collected from plants grown at the CS06 location. DNA was extracted from pooled tissue of five or more plants using a standard CTAB protocol (Doyle and Doyle, 1987). Forty-six polymorphic SSRs, used in the diverse association panel of Casa et al. (2008), were evaluated using the same equipment and published methods (Xcup19, Xtxp065, Xtxp287 were not included). One SSR, Xcup55, was not polymorphic in the sweet-sorghum panel and was excluded from further analysis, resulting in 45 SSRs shared with Casa et al. (2008). Two additional SSRs, Xtxp120 (Menz et al., 2002) and a new SSR (Xcup75; primer sequences: TTGCTTCATTCAACGGGAATACA, TTCGATGCAGCGAGCTTTGG), were successfully added. An additional 384 SNP genotypes were collected using an Illumina GoldenGate assay (Fan et al., 2006) at Cornell's Life Sciences Core Laboratories Center (Ithaca, NY) using recommended procedures (Illumina Inc., San Diego, CA). These 384 SNP assays were developed from SNPs discovered in previously published (Hamblin et al., 2004, 2005, 2006, 2007a) and unpublished [Murray, this study (sucrose pathways); Salas Fernandez et al., 2009 (carotenoid pathways)] resequencing studies, and were chosen both to provide genome-wide coverage and to survey variation in genes of interest. A total of 226 loci are represented in the panel, of which 39 loci are candidate genes; the remainder is distributed across all ten linkage groups. Genetically mapped loci were chosen from resequencing studies of unannotated restriction fragment length polymorphism (RFLP) probes (see Schloss et al., 2002). Supplemental Table 1 shows the GenBank accession numbers for reference sequences and map position, where available. Of the 384 Illumina SNP assays, 329 were successful, and 322 were polymorphic in the sweet-sorghum panel.

To identify candidate genes for brix, the major QTL for brix in a cross between a grain sorghum and a sweet sorghum from Murray et al. (2008a) was located on the sorghum genome sequence (Phytozome, http://www.phytozome.net/sorghum; verified 26 Jan. 2009) using BLAST analysis with sequence-based markers (Menz et al., 2002; Feltus et al., 2006). More than 100 starch and sucrose metabolism enzymes (Kanehisa et al., 2006) and sugar transport candidate genes from maize (Zea mays L.), sugarcane, tomato (Solanum lycopersicum L.), and rice (Oryza sativa L.) (NCBI, http://www.ncbi.nlm.nih.gov/; verified 26 Jan. 2009) were also placed on the sorghum genome using BLAST to identify co-localization with the chromosome 3 QTL. New SSRs within the chromosome 3 QTL were identified from Phytozome contig sequences using the program Tandem Repeats Finder (Benson, 1999). Primer3 (Rozen and Skaletsky, 2000) was used to design all primer sequences. All sequencing was performed on sweet-sorghum cultivar Rio at Cornell University's Bioresource Center using a 3730 capillary sequencer. Trace files were investigated for polymorphisms between Rio and grain sorghum ‘BTx623’ in Sequencher 4.0 (Gene Codes Corp., Ann Arbor, MI).

Genetic Distance and Principal Coordinate Analysis

The program PowerMarker version 3.0 (Liu and Muse, 2005) was used to evaluate FST (Wright, 1965) and create genetic distance matrices (Nei, 1972). Distance matrices were double-centered, and used to obtain eigenvectors, which were plotted in NTSYS-pc Version 2.02 (Rohlf, 1990).
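As an illustration of the distance measure named above, the following is a minimal sketch of Nei's (1972) standard genetic distance, D = −ln(J_xy / √(J_x J_y)), written with NumPy. It is not PowerMarker's implementation; the function name and the input layout (one allele-frequency array per locus, alleles in the same order for both entries) are assumptions made for the example.

```python
import numpy as np

def nei_1972_distance(freqs_x, freqs_y):
    """Nei's (1972) standard genetic distance between two populations or accessions.

    freqs_x, freqs_y: sequences with one array of allele frequencies per locus.
    Alleles must be listed in the same order in both inputs (assumed layout).
    """
    # Average homozygosity-like terms within each entry, over loci.
    j_x = np.mean([np.sum(np.square(p)) for p in freqs_x])
    j_y = np.mean([np.sum(np.square(q)) for q in freqs_y])
    # Average probability of drawing identical alleles from x and y, over loci.
    j_xy = np.mean([np.sum(np.asarray(p) * np.asarray(q))
                    for p, q in zip(freqs_x, freqs_y)])
    identity = j_xy / np.sqrt(j_x * j_y)
    # Note: identity is 0 (distance infinite) if no alleles are shared at any locus.
    return -np.log(identity)

# Two inbred accessions scored at two biallelic loci; they differ at the second locus.
x = [[1.0, 0.0], [1.0, 0.0]]
y = [[1.0, 0.0], [0.0, 1.0]]
print(round(float(nei_1972_distance(x, y)), 4))  # -ln(0.5) = 0.6931
```

For a fully homozygous accession the frequencies are simply 0 or 1 at each locus, so the distance depends only on the fraction of loci at which two accessions share an allele.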

To compare sweet sorghums with the larger sorghum panel of Casa et al. (2008), Nei's 1972 genetic distance matrix was created in PowerMarker using the polymorphic SSRs that had been scored in all accessions in both studies. Eigenvectors were obtained implementing the cmdscale function (eig = TRUE) and then plotted using R (R Development Core Team, 2005). R cmdscale was used rather than NTSYS-pc for this analysis because the data set was so large. Using smaller test data sets, the two principal coordinate analyses (PCoA) gave identical results (Gower, 1966).
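The double-centering and eigenvector steps described above correspond to classical principal coordinate analysis. A minimal NumPy sketch follows; it assumes squared distances are double-centered, as in classical scaling, and is not the NTSYS-pc or R code used in the study. The function name `pcoa` and its return values are hypothetical.

```python
import numpy as np

def pcoa(dist, n_axes=2):
    """Classical principal coordinate analysis of a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances: B = -1/2 * J * D^2 * J.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale eigenvectors by sqrt(eigenvalue); negative eigenvalues are clipped to 0.
    coords = eigvecs[:, :n_axes] * np.sqrt(np.clip(eigvals[:n_axes], 0, None))
    # Share of variation per axis, relative to the sum of positive eigenvalues.
    explained = eigvals[:n_axes] / eigvals[eigvals > 0].sum()
    return coords, explained

# Three points on a line at positions 0, 3, and 7.
positions = np.array([0.0, 3.0, 7.0])
distances = np.abs(positions[:, None] - positions[None, :])
coords, explained = pcoa(distances, n_axes=1)
```

In this toy case the single returned axis reproduces the original pairwise distances, which is a quick way to check that two PCoA implementations agree on a small test data set.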

Population Structure, Relatedness, and Association Mapping

To minimize false positives in association mapping it is important to control for population structure and relatedness (Falush et al., 2003; Yu et al., 2006). Three programs were used to estimate the number of populations and assign cultivars’ membership in them: Structure, version 2.1 (Pritchard et al., 2000), InStruct (Gao et al., 2007), and NTSYS-pc. Because population structure estimates assume unlinked markers, SNP assays from the same physical locus were converted into 208 haplotypic loci. Phase ambiguities were called as missing alleles and loci with more than 20% missing alleles were eliminated. Excluding brix candidate gene markers on chromosome 3, and including SSRs, a total of 241 markers were used. In both Structure and InStruct, five independent runs having 5 × 10⁵ burn-in and sampling iterations were conducted allowing k (number of populations) to vary between 1 and 15. For Structure, the ancestry model allowed for population admixture and correlated allele frequencies. For InStruct, population structure and individual selfing rates were inferred. Optimal k was identified using the marginal improvements in the estimated logarithm of the likelihood of the data, a posterior population assignment probability greater than 0.5, and the consistency of the five independent runs. k was additionally inferred using the DIC criterion in InStruct. Once k had been determined for both Structure and InStruct, a run of 5 × 10⁶ burn-in and sampling iterations was used. PCoA eigenvectors from haplotypes were also used as population assignments.
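The haplotype conversion and missing-data filter described above can be sketched as follows. This is an illustrative reading of the text, not the authors' script: the genotype encoding (two-letter strings, `None` for a failed assay), the treatment of failed assays as missing, and both function names are assumptions.

```python
def collapse_to_haplotypes(snp_calls):
    """Collapse one accession's SNP calls at a single locus into a haplotype pair.

    snp_calls: list of two-letter genotypes such as "AA" or "AG", or None for a
    failed assay (assumed encoding). Returns a (haplotype, haplotype) tuple, or
    None when the haplotype must be scored as missing.
    """
    if any(call is None for call in snp_calls):
        return None  # assumption: a failed assay makes the haplotype missing
    het_sites = [i for i, call in enumerate(snp_calls) if call[0] != call[1]]
    if len(het_sites) > 1:
        return None  # phase is ambiguous with two or more heterozygous sites
    first = "".join(call[0] for call in snp_calls)
    second = "".join(call[1] for call in snp_calls)
    return (first, second)

def drop_sparse_loci(haplotypes_by_locus, max_missing=0.20):
    """Keep loci whose fraction of missing haplotype calls is at most max_missing."""
    kept = {}
    for locus, calls in haplotypes_by_locus.items():
        missing = sum(1 for call in calls if call is None) / len(calls)
        if missing <= max_missing:
            kept[locus] = calls
    return kept

# Hypothetical locus names and calls for five accessions.
example = {
    "locus_a": [("AG", "AG"), ("AG", "AG"), ("AC", "AC"), None, ("AG", "AG")],
    "locus_b": [("T", "T"), None, None, ("C", "C"), ("T", "T")],
}
filtered = drop_sparse_loci(example)
```

Here `locus_a` (20% missing) is retained and `locus_b` (40% missing) is dropped, matching the "more than 20%" rule stated in the text.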

Using the package SPAGeDi 1.2 (Hardy and Vekemans, 2002), a kinship coefficient estimation matrix was created according to J. Nason (described in Loiselle et al., 1995). Association mapping was performed using the GLM and MLM procedures in TASSEL (Bradbury et al., 2007). Six Q (population structure) matrices, with different numbers of populations, were separately tested for the percent variation in brix and height phenotypes explained by the model. Positive tests were reported using a significance threshold of p < 1.3 × 10⁻⁴, based on a stringent Bonferroni correction of 0.05 divided by 369 tests.
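The Bonferroni threshold quoted above can be reproduced with a few lines of arithmetic. The sketch assumes that the 369 tests correspond to the 47 SSRs plus the 322 polymorphic SNPs reported earlier; the text does not state this breakdown explicitly, although the counts agree.

```python
# Marker counts taken from the genotyping description (assumed to make up the 369 tests).
n_ssr = 47
n_snp = 322
n_tests = n_ssr + n_snp

alpha = 0.05                   # family-wise error rate
threshold = alpha / n_tests    # per-test significance threshold

print(n_tests, f"{threshold:.2e}")  # 369 1.36e-04
```

The exact quotient is about 1.36 × 10⁻⁴; the reported cutoff of 1.3 × 10⁻⁴ is slightly more conservative.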


Results

Genetic Analysis

Between all pair-wise comparisons of SNPs from different loci, linkage disequilibrium (LD) was minimal (Supplemental Fig. 1) in this panel, as expected with this low density of markers. Perfect LD (r² = 1) was observed between at least two SNPs within each of four genes (SB00037, SB00076, SB00114, SB00130) and between two other pairs of SNPs (SB00124 and SB00027; SB00076 and SB00103) due to close physical distance.
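For readers unfamiliar with the r² statistic, a minimal sketch of pairwise LD between two biallelic SNPs follows. It assumes each accession is homozygous and coded 0 or 1 at each SNP (one haplotype per accession), with NaN for missing calls; it is not the software used to produce Supplemental Fig. 1, and the function name is hypothetical.

```python
import numpy as np

def ld_r2(alleles_a, alleles_b):
    """Squared correlation (r^2) between two biallelic SNPs coded 0/1 per accession."""
    a = np.asarray(alleles_a, dtype=float)
    b = np.asarray(alleles_b, dtype=float)
    # Use only accessions scored at both SNPs.
    keep = ~(np.isnan(a) | np.isnan(b))
    a, b = a[keep], b[keep]
    p_a, p_b = a.mean(), b.mean()
    # D is the difference between observed and expected haplotype frequency.
    d = np.mean(a * b) - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # undefined when either SNP is monomorphic
    return d ** 2 / denom

print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0, perfect LD
print(ld_r2([0, 1, 0, 1], [0, 0, 1, 1]))  # 0.0, no LD
```

Perfect LD (r² = 1) also results when the two SNPs are coded in opposite directions, since r² depends only on whether the alleles always co-occur.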

Seventy-seven of the 125 cultivars were heterozygous or heterogeneous at one or more marker loci. Two known to be F1 forage hybrids segregated at the most marker loci, 41% (Forage 73) and 37% (Forage 41). MN landraces as a group averaged 22% heterozygous markers, with only MN960 having no heterozygous marker loci and MN1054 having the most (37%). Departure from 1:1 ratios of alleles in some SNP assay results suggested that levels of heterozygosity were increased by pooling tissue from multiple individuals within cultivars, as landraces are often heterogeneous.

Cultivars in the sweet sorghum panel with identical names but different seed sources all had at least one genetic polymorphism (Table 2). With Sugar Drip, of the loci that differed, almost every possible combination of allele sharing across the six lines was observed. A few cultivars had very different names but identical genotypes, potentially due to human error. ‘N110’ and ‘Sugar Drip 4’ were found to be identical except for one locus with missing data. ‘Rox Orange 2,’ ‘Saccaline,’ and ‘Sapling’ were also genetically identical. The phenotypes of these cultivars were very similar, so it appears possible that the seed for the CS06 planting was unintentionally drawn from the same source.

Table 2.

Polymorphism within accessions with shared names.

Cultivar Accessions Shared alleles at 369 markers
Rio 2 359
Della 3 286
White African 3 282
Chinese Amber 3 194
Sumac 3 183
Orange 3 150
Sugar Drip 6 157

PCoA Relatedness

To identify accessions for use in breeding, it is useful to understand the relationships within the sweet sorghums and between sweet sorghum and grain sorghum's racial types. Genetic relationships were most easily seen by plotting the first two PCoA eigenvectors generated with the full SSR and SNP data set (Fig. 1). Three separate groups were observed and delineated based on historical references and breeding objectives. These three groups included a tight cluster of historical and modern syrup cultivars, modern sugar and energy sorghums with MN landraces, and amber types, which were the most diverse. Grain sorghums did not cluster in any one group. The first 12 PCoA eigenvectors explained 35.7, 21.4, 7.2, 6.3, 5.3, 4.4, 4.3, 3.6, 3.2, 3.1, 2.6, and 2.4% of the variation, respectively, totaling more than 100% due to model overfitting. The same three clusters seen in Fig. 1 were also observed when using only SNPs or only SSRs, though a few individuals did shift groups (data not shown). No clear relationships were observed when additional eigenvectors were plotted (data not shown).

Figure 1.

PCoA eigenvector plot of sweet sorghum panel genetic similarity. Nei's (1972) genetic distance was calculated from 47 SSRs and 318 SNPs.


To objectively assess sweet sorghum genetic relatedness to grain sorghum racial groups, PCoA analysis of SSR genotypes was used to compare the sweet sorghum panel to Casa et al.’s (2008) pure racial group (138 accessions, Supplemental Fig. 2). Comparing these two panels, the sweet sorghum historical and modern syrup group appeared most similar to kafir and to a lesser extent to bicolor. The modern sugar and energy sweet sorghum group appeared most similar to caudatum and possibly guinea types. The amber sweet sorghum group looked most similar to bicolor racial types but was more divergent than most of the material in the Casa et al. (2008) panel. The sweet panel had little material that was similar to durra types.

Candidate Gene Identification and Sequencing

The primary brix QTL identified in a cross between Rio and BTx623 (Murray et al., 2008a) was localized to a 15 Mb sorghum super contig (Phytozome). A sorghum homolog of maize shrunken2, the large subunit of ADP-glucose pyrophosphorylase (Hamblin et al., 2007a), and a rice hypothetical monosaccharide transporter (NM_001053738) (NCBI) were the only sugar metabolism genes found to align to this Phytozome contig. Furthermore, these sequences were both located in a 2 Mb region flanked by the SSR marker bordering the QTL on the left, and an SSR marker close to the 2-LOD peak border on the right (Supplemental Fig. 3). The full-length genes (as annotated), the 5′ and 3′ ends, and genetically close non-coding sequence were sequenced in Rio (a total of ∼20,000 bp) and no polymorphisms with the BTx623 genome sequence were observed. We then identified nine SSRs spaced through the 2 Mb interval. Only one of the nine (Xcup75) was found to be polymorphic between Rio and BTx623. This marker was included in all analyses.

Phenotypic Analysis

Brix and height values were recorded in three locations. For the sweet-sorghum panel in CS06, brix and HPLC-measured stem sugar were well correlated (r = 0.73, p < 2.2 × 10⁻¹⁶), with outliers caused by bacterial degradation in HPLC samples. Height and brix were positively correlated across locations (Fig. 2). Height had higher correlations within and across locations than brix in this panel. For brix, ITH07 was more similar to CS06 than to CS07. ITH07 did not correlate well with CS locations for height, due to photoperiod sensitivity, which delayed flowering in some cultivars.

Figure 2.

Relationship within and between brix and height across three locations. Units are degrees brix for brix and cm for height, measured in Ithaca 2007 (ITH07), College Station 2006 (CS06), and College Station 2007 (CS07). Trait names are presented in the center diagonal with a histogram of replicate mean values. Regression lines show positive relationships between all sets of measurements, but correlation was strongest within height, specifically between CS06 and CS07.


Population Structure and Association Mapping

To control for false positives in association mapping, Q (population structure) and K (kinship) matrices were first constructed (Yu et al., 2006). K is unrelated to k, the number of populations used in the model for Q. Six separate Q matrices were calculated using the two most likely population assignments in each of three programs, InStruct, Structure, and NTSYS-pc. InStruct results suggested five or eleven populations were likely with little posterior probability increase after eleven (Fig. 3). InStruct DIC criteria also found eleven populations to be most probable. Structure results suggested either four or eleven populations as most probable. Structure posterior probability continued to increase marginally past eleven populations, but consistency of runs and population assignment decreased. Because the posterior probability is calculated differently in Structure and InStruct, these cannot be directly compared (H. Gao, personal communication, 2008). Using haplotypes for PCoA resulted in eigenvectors very similar to those obtained using individual markers in Fig. 1.

Figure 3.

Results of population structure analysis using InStruct, Structure, and PCoA. Using haplotypes created from markers linked at the same locus, each method was run five times to develop population assignment vectors.


Association Analysis

Association mapping was performed for brix and height using the GLM procedure in TASSEL (Bradbury et al., 2007). Of the six Q matrices tested, models with 11 populations as inferred by InStruct and Structure explained the highest percent variation (Table 3). Models based on the smaller number of populations inferred by InStruct (k = 5) and Structure (k = 4) decreased the percent variation explained; the model with k = 4 also had a larger number of positive tests. Models using PCoA eigenvectors explained more variation than those with no Q matrix but much less than models based on Structure and InStruct analyses.

Table 3.

Variation explained by models with population structure (Q matrix) and/or kinship (K matrix) for brix and height in the sweet sorghum panel using TASSEL.

Model  Q matrix        k    R2 (brix)  R2 (height)
GLM    InStruct        11   0.39       0.28
GLM    InStruct        5    0.28       0.13
GLM    Structure       11   0.39       0.30
GLM    Structure       4    0.20       0.07
GLM    PCoA            12   0.09       0.14
GLM    PCoA            5    0.04       0.06
GLM    None            0    0          0
MLM    InStruct + K    11   0.45       0.54
MLM    InStruct + K    5    0.41       0.49
MLM    Structure + K   11   0.49       0.55
MLM    Structure + K   4    0.40       0.50
MLM    PCoA + K        12   0.39       0.55
MLM    PCoA + K        5    0.39       0.55
MLM    None + K        0    0.37       0.48
k, number of populations. GLM, general linear model. MLM, mixed linear model, which includes the kinship matrix (K).

The MLM, which included the kinship matrix, K, explained more variation than models with Q alone. With MLM, results were nearly identical even if no Q matrix was added.
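For orientation, the mixed model of Yu et al. (2006) that underlies the Q + K comparison is commonly written as below. This is a sketch of the general form only. The symbols other than Q and K are not defined in this article, and the exact parameterization used inside TASSEL may differ.

```latex
% y: phenotype vector (brix or height)
% X\beta: fixed effects other than the marker and population structure
% S\alpha: fixed effect of the tested marker
% Qv: fixed population-structure effects (Q matrix)
% Zu: random polygenic background effects, with covariance given by kinship (K matrix)
y = X\beta + S\alpha + Qv + Zu + e,
\qquad \operatorname{Var}(u) = 2K\sigma_g^{2},
\qquad \operatorname{Var}(e) = R\sigma_e^{2}
```

In this notation, the GLM rows of Table 3 correspond to dropping the Zu term, and the "None + K" row corresponds to dropping the Qv term. R is usually taken to be an identity matrix when residual variances are assumed homogeneous.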

Using MLM with a Bonferroni corrected cutoff of 0.05 (1.3 × 10−4), five significant associations were detected for height, and one was detected for brix (Table 4). One marker, SB00016.1, was most significant for height and nearly significant for brix. For brix the only significant marker was SB00166.1.
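The Bonferroni cutoff can be approximately reproduced with a short calculation. The sketch below assumes that the number of tests equals the 369 genotyped markers (47 SSRs plus 322 SNPs); the article does not state the exact number of tests used, so the result is only expected to be close to the reported 1.3 × 10−4.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: one test per marker, using the marker counts given in the abstract.
n_ssr = 47
n_snp = 322
n_tests = n_ssr + n_snp

threshold = bonferroni_threshold(0.05, n_tests)
print(f"{n_tests} tests, per-test cutoff = {threshold:.2e}")
```

Under this assumption the cutoff is about 1.36 × 10−4, which is consistent with the value reported above.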

View Full Table | Close Full ViewTable 4.

Markers with a significant p-value at 0.001 or highest FST in each category.


FST of Populations and Markers

Wright's (1965) classical FST (θ) was used to evaluate genetic differentiation between populations in the panel (Table 4). Four separate methods were used for dividing the material into populations to address different biological questions.

  • 1) Based on the a priori expectation of sorghum types [Table 1 (amber, historical syrup, grain, diverse)]. FST averaged 0.14 across loci (range: −0.04 to 0.47; negative FST values are likely due to imprecision in the estimation and should be interpreted as no genetic differentiation). Markers with high FST would be useful for distinguishing these a priori groups and might also be linked to traits important within only one population.

  • 2) Using the three groups identified in PCoA analysis (Fig. 1). FST averaged 0.26 (range: −0.02 to 0.77). Markers had higher FST than our a priori division. Markers with the highest FST would be useful for assigning germplasm with unknown background to these groups.

  • 3) Using a grouping based on brix. Cultivars in the top half for brix in all three locations (CS06, CS07, and ITH07) were placed in Population 3, and cultivars in the bottom half in all locations were placed in Population 0. FST averaged 0.03 (range: −0.03 to 0.19).

  • 4) Using the number of times a cultivar was in the top half of average height for a location, similar to divisions for brix. FST averaged 0.02 (range: −0.04 to 0.23). Markers with high FST when separated by brix and height may be linked to the phenotype of interest, and useful for characterizing different germplasms.
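The phenotype-based groupings in methods 3 and 4 above can be sketched as a count of the locations in which a cultivar falls in the top half for a trait. This is a minimal illustration, not the authors' procedure: using the location median as the cutoff and treating ties as "not in the top half" are assumptions, and the cultivar names and brix values are hypothetical.

```python
from statistics import median


def top_half_counts(trait_by_location):
    """Count, per cultivar, the locations where its value is above the location median.

    trait_by_location maps location -> {cultivar: trait value}.
    The returned count (0 to number of locations) serves as the population label.
    """
    counts = {}
    for values in trait_by_location.values():
        cutoff = median(values.values())  # assumption: median splits top and bottom halves
        for cultivar, value in values.items():
            counts[cultivar] = counts.get(cultivar, 0) + int(value > cutoff)
    return counts


# Hypothetical brix values for four hypothetical cultivars in the three environments.
brix = {
    "CS06": {"A": 18, "B": 12, "C": 15, "D": 10},
    "CS07": {"A": 17, "B": 16, "C": 11, "D": 9},
    "ITH07": {"A": 19, "B": 10, "C": 14, "D": 13},
}
populations = top_half_counts(brix)
print(populations)
```

With three locations, a count of 3 corresponds to Population 3 and a count of 0 to Population 0 in the scheme described above.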

Relationships between these estimates of FST and association results may suggest incomplete correction. Markers with high FST did not have significant associations with traits, except in the case of SB00016.1.
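As a reminder of what the per-marker FST values in this section measure, the sketch below computes a simple heterozygosity-based FST, (HT − HS)/HT, for one biallelic marker. It is a simplification and not the θ estimator used by the authors. In particular, this version cannot return the negative values reported above, which arise from sample-size corrections in the estimator. The function name and inputs are hypothetical.

```python
def fst_biallelic(freqs, sizes):
    """Simple FST = (H_T - H_S) / H_T for one biallelic marker.

    freqs: frequency of one allele in each population.
    sizes: number of sampled individuals in each population (used as weights).
    """
    total = sum(sizes)
    p_bar = sum(p * n for p, n in zip(freqs, sizes)) / total
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity in the pooled sample
    if h_t == 0:
        return 0.0  # marker is monomorphic overall, so there is no differentiation to measure
    # weighted mean of within-population expected heterozygosity
    h_s = sum(2 * p * (1 - p) * n for p, n in zip(freqs, sizes)) / total
    return (h_t - h_s) / h_t


print(fst_biallelic([0.0, 1.0], [10, 10]))  # populations fixed for different alleles
print(fst_biallelic([0.5, 0.5], [10, 10]))  # identical allele frequencies
```

A marker fixed for different alleles in two groups gives the maximum value, and identical frequencies give zero, which is why high-FST markers are suggested above for assigning unknown germplasm to groups.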


From historical publications on sweet sorghum, it initially seemed likely that sweet types might be closely related to each other and distant from grain sorghums. Two recent publications have suggested otherwise. Casa et al. (2008), using 377 diverse sorghums including eight sweet cultivars, found that while a few sweet sorghums clustered together, they were generally as diverse as grain sorghums (A. Casa, personal communication, 2007). This finding was supported by Ritter et al. (2007) who, using amplified fragment length polymorphisms (AFLPs), showed that 31 sweet sorghums clustered within three of the five clusters containing 64 diverse grain sorghums.

Harlan and deWet (1972) and others have classified sorghums into five major races: bicolor, caudatum, durra, guinea, and kafir. These divisions are mostly based on panicle and grain characteristics as well as the regions of Africa and India where the races are commonly found. Sweet sorghums have not been bred for panicle or grain characteristics, and the referenced origins of sweet sorghum provide little insight. Therefore, the relationship of sweet sorghum to the traditional classification of major sorghum races was inconsistent.

Our study, like that of Ritter et al. (2007), identified three separate groups of sweet sorghum which are often classified together. We classify these major types as syrup (historical and some modern), modern sugar and energy types with associated landrace parents, and amber types. These divisions were supported by PCoA, measures of FST, phenotypic observations, and structure analysis. Structure analysis and association results suggested that, within these three sweet sorghum groups, as many as eight additional subpopulation divisions exist (Supplemental Table 2). Population structure analysis is somewhat subjective and depends on the criteria used and the germplasm evaluated. Although InStruct and Structure assigned these subpopulations similarly, we did not observe a historical or biological basis for this further subdivision except where noted below.

Historical and Modern Syrup

Within the sweet sorghum panel, the historical and modern syrup population had the best representation but the least diversity. Among sweet sorghum cultivars, the historical cultivars are the best known and the modern cultivars are some of the most common for syrup: Orange, Sumac, White African, Collier, Sugar Drip, ‘N98’ through N110, ‘Della,’ and ‘Bailey.’ Phenotypically, this material generally had straight, tall, very juicy, medium-large diameter stalks. Across the cultivars the juice had high average brix, but lower than the sorghums developed for sugar production. Two of the sorghums developed for sugar and having very high brix, ‘Keller’ and ‘Wray,’ were close to being classified in this group based on PCoA. The clustering of the syrup types reflects selections from historical material and shared pedigrees from syrup × syrup crosses. Furthermore, cultivar release notes show that most modern syrup sorghums were developed within the Meridian, MS breeding program. InStruct and Structure divided this population into four subpopulations of 19, 18, 14, and 12 individuals (Supplemental Table 2). An interesting case is Sugar Drip, which is divided into two groups. Based on polymorphism data, Sugar Drip was likely heterozygous at many loci, which became fixed as different sets of seeds were isolated and maintained separately.

Sugar and Energy

Modern sweet sorghum cultivars for sugar and energy production such as Rio, ‘Ramada,’ ‘Top76-6,’ and ‘M81E’ tended to cluster together with MN landrace cultivars. Most MN landraces in the panel were specifically chosen because they were in the pedigrees of modern sweet sorghum cultivars. These MN cultivars were also from the center of sorghum domestication around Sudan, Ethiopia, and Uganda. This population was very diverse for brix and height. Nearly all of the cultivars were photoperiod sensitive and had very thick stalks, some with hard rinds like sugarcane. The modern sugar and energy cultivars had very high brix while the MN landrace progenitors did not. Many of these cultivars, especially MN1500, produced very high biomass. We initially believed that MN1500 was ‘Grassl,’ a cultivar selected from MN1500, but the high heterozygosity suggested that it is likely the landrace MN1500 and that seed for Grassl is no longer available. In contrast to the expectation that the sweet sorghums derived from MN cultivars would have a narrow genetic base, the heterogeneity in these landraces likely contributed to the diversity seen in the modern cultivars. Population analyses (Supplemental Table 2) further divided this population into groups of 24 (most sugar energy and MN cultivars), nine (Rio, Keller, Wray), and six (grain and forage).


Amber and honey sorghums were very distinct from the other two populations but were also very diverse within the population. The weak clustering of amber may be partially the result of a limited number of cultivars being included in this study. Amber sorghums are not included in published pedigrees of modern sweet sorghum but were among the earliest sweet sorghums introduced to the U.S. Unlike most sweet sorghums, amber types tended to senesce in CS06 and CS07 locations, but did not in ITH07. Possibly as a result, amber cultivars had relatively higher brix in ITH07 than in either CS06 or CS07. Amber types among the sweet sorghums also had the least consistency of brix between environments with cultivars having a high brix in only one location. This is why no amber cultivars were identified as top sugar producers. Structure and InStruct (Supplemental Table 2) divided the ambers into subpopulations of 12 (all but one cultivar with amber in the name and ‘Sucre Drome’), six (Honey, ‘7035S’), and three (grain sorghums). PCoA suggested that Honey sorghums were most like race Durra, suggesting geographic genetic relationships, since Honey accessions and Durra are both from India. The amber population also had some of the most unusual cultivars, e.g., 7035S was the tallest cultivar in the panel, had a very large stalk, and was the only cultivar not to tiller at all and to senesce before it flowered in CS06. Sucre Drome was an interesting cultivar in this panel because it was the only one with a “dry” stalk, carrying a dominant gene that reduced stem moisture by 50% of the panel average and may be useful for cellulosic biofuel.

Sweet and Grain Sorghum Comparison

PCoA was useful to visualize genetic distances between sorghum races, between our sweet sorghum panel and the panel of Casa et al. (2008), and between individuals. Using PCoA, races tended to cluster together but were not distinctly separated as observed in rice or maize (rice: Thomson et al., 2007; maize: Liu et al., 2003; Warburton et al., 2008; Hamblin et al., 2007b). Rio and BTx623 appeared to be closely related, and both were fairly distant from much of the other material. This suggests that variation found in the bi-parental population investigated in Murray et al. (2008a, 2008b) was more likely to be functional and not confounded by extreme divergence of genetic backgrounds.

The relationships in the sweet sorghum panel using only SSRs appeared to be similar to what was seen when the 322 SNPs were also included. In contrast, the PCoA eigenvectors explained less genetic variation using only SSRs. This discrepancy likely resulted from more rare alleles per locus, fewer loci, and a larger and more diverse germplasm set. From the combined data sets it appeared that the syrup sweet sorghums clustered best with kafirs, and that modern sugar energy sorghums and the landraces clustered best with caudatums. Amber types appeared to be poorly represented in the panel of Casa et al. (2008) but clustered most like bicolor types. In general, the SSR PCoA shows that the panels are structured very differently: the sweet sorghum panel has greater diversity from amber types, whereas the panel of Casa et al. (2008) has much more diversity from durra and caudatum types.

Population Structure and Relatedness in the Sweet Sorghum Panel

We attempted three separate methods for population assignment of cultivars: Structure, InStruct, and PCoA. Though they use different algorithms for calculation, all three methods suggested that three populations were an absolute minimum, and both four to five and 11 to 12 populations met our selection criteria. Though Structure is widely used for identifying population structure, the program was developed for natural outcrossing populations. The sweet sorghum panel violates Structure's assumption of Hardy-Weinberg equilibrium, and many lines share close pedigrees. InStruct, based on Structure, is a more valid method for a self-pollinated domesticated crop such as sorghum, because it relaxes the assumption of Hardy-Weinberg equilibrium (Gao et al., 2007). It was therefore surprising that Structure and InStruct resulted in nearly identical conclusions in this study. Finally, principal component analysis has been proposed to correct for population structure (Price et al., 2006) and similarly PCoA has been used in association mapping by Cockram et al. (2008). PCoA explained far more variation in this study than in Cockram et al., but the results of this approach were still disappointing in controlling for population structure.

Two main problems with population structure estimates are that they are subjective, on the basis of selection criteria, and they reduce very complex relationships into only a few numbers for population assignment. Thus, it is difficult to completely correct for genetic relationships using structure alone. From our results and model fit, it appears that using the kinship matrix (K matrix; Yu et al., 2006) better controlled for relatedness than any measure of population structure (Q matrix). In fact, we had better fit and fewer positive tests using K without Q than with any Q alone. It seems likely that this will be true for most bred material where admixed diverse crosses are routine, and closely related material has been selected.

Brix and Height QTL Association

The Sorghum bicolor genome is estimated to contain 811Mbp of DNA (Price et al., 2005). With 369 markers, the coverage in this study averaged one marker per 2.2Mbp. Although sorghum has much greater LD than maize, extending from a few kb to over 35kb, on the basis of the results of Hamblin et al. (2005) we would need at least 55,000 polymorphic markers for a saturated whole genome scan. However, LD is expected to vary greatly across genomic regions and among the germplasm sets investigated. On the basis of the pairwise linkage disequilibrium between markers (Supplemental Fig. 1), the linkage blocks were not saturated in this population.
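The coverage figures in this paragraph follow from simple arithmetic, sketched below. The only inputs are the numbers given in the text (811Mbp genome, 369 markers, 55,000 markers for saturation). The even spacing of markers is an idealization, and the variable names are illustrative.

```python
GENOME_BP = 811e6              # estimated genome size from the text (811Mbp)
N_MARKERS = 47 + 322           # SSRs + SNPs genotyped in this study
SATURATION_MARKERS = 55_000    # marker count quoted in the text for a saturated scan

# Average spacing achieved in this study, assuming evenly spaced markers.
avg_spacing_mbp = GENOME_BP / N_MARKERS / 1e6

# Spacing implied by the saturation estimate, for comparison with the extent of LD.
implied_spacing_kb = GENOME_BP / SATURATION_MARKERS / 1e3

print(f"average spacing in this study: {avg_spacing_mbp:.1f} Mbp")
print(f"spacing implied by 55,000 markers: {implied_spacing_kb:.1f} kb")
```

The implied spacing of roughly 15kb falls within the reported range of LD (a few kb to over 35kb), whereas the achieved spacing is more than two orders of magnitude larger, which is consistent with the statement that linkage blocks were not saturated.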

Given the average extent of LD in sorghum (Hamblin et al., 2005), it is unlikely that any marker locus tested was a causal polymorphism for phenotypic variation; each was instead likely linked to the causal polymorphism. Two of the five positive height associations, Xgap72 and Xtxp265, were on the same chromosome about 10Mb apart. QTL for height and/or flowering time have been found in this location on chromosome 6, corresponding to the photoperiod sensitivity gene ma1 (Lin et al., 1995; Rami et al., 1998; Brown et al., 2006; Murray et al., 2008a). This gene has undergone extremely strong selection for temperate adaptation in sorghum, and detection over a long physical distance was not surprising.

The most significant QTL in this study was found on chromosome 9 for height. QTL for height in this location have been detected both by QTL linkage analysis (Pereira and Lee, 1995; Lin et al., 1995; Murray et al., 2008a) and by association analysis (Brown et al., 2008). Association analysis in the panel of Casa et al. (2008) detected a peak approximately 400kb away, with significant locus associations on both sides of the marker (SB00016.1) used in this study (Brown et al., 2008). This locus would also be expected to have long range LD given the strength of selection in sorghum for height.

The only significant association for brix, on chromosome 1, has not been previously reported in linkage mapping studies. However, Murray et al. (2008a) did detect a QTL peak near this region in one location (the closest marker was Txp482, 5Mb away). This peak explained up to 9% of the variation for brix and sugar, but was slightly below the stringent threshold for significance (unpublished data). On the physical genome sequence, a sorghum homolog to glucose-6-phosphate isomerase is located ∼12kb away, the third closest predicted gene. Although this enzyme has not previously been implicated in stem sugar accumulation, it is known to convert D-glucose 6-phosphate into D-fructose 6-phosphate, both of which are important for synthesizing sucrose (Kanehisa et al., 2006).

We also attempted to identify additional markers for association mapping to support a QTL for brix on chromosome 3 detected by Natoli et al. (2002) and Murray et al. (2008a), but were unsuccessful. Furthermore, association analysis using three SSRs and one SNP in this region did not detect any significant associations.

Implications for Germplasm Collection, Conservation, and Breeding

The results of this analysis suggest that for genetic studies, and/or core collection development, as few as five cultivars from the sweet sorghum panel could be selected to represent 90% of the SNP alleles identified. Thus, within the sweet sorghum panel, many of the accessions could be considered redundant for germplasm conservation, especially in the population of syrup cultivars. These differences reflect close pedigrees with similar parentage.
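One way to explore how few cultivars are needed to capture a target share of alleles is a greedy selection, sketched below. The article does not say how the five-cultivar figure was obtained, so the greedy strategy is an assumption chosen for illustration, and the cultivar names, markers, and alleles in the toy panel are hypothetical.

```python
def greedy_core_set(alleles_by_cultivar, target=0.90):
    """Pick cultivars one at a time, always adding the one that contributes most new alleles.

    alleles_by_cultivar maps cultivar -> set of (marker, allele) pairs it carries.
    Returns the chosen cultivars and the fraction of all alleles they cover.
    """
    all_alleles = set().union(*alleles_by_cultivar.values())
    if not all_alleles:
        return [], 0.0
    needed = target * len(all_alleles)
    remaining = dict(alleles_by_cultivar)
    covered, chosen = set(), []
    while len(covered) < needed and remaining:
        # sorted() makes tie-breaking deterministic (alphabetical by cultivar name)
        best = max(sorted(remaining), key=lambda c: len(remaining[c] - covered))
        if not remaining[best] - covered:
            break  # no remaining cultivar adds a new allele
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen, len(covered) / len(all_alleles)


# Hypothetical toy panel: cv1 and cv2 are near-duplicates, cv3 is divergent.
panel = {
    "cv1": {("m1", "A"), ("m2", "C"), ("m3", "G"), ("m4", "T"), ("m5", "A")},
    "cv2": {("m1", "A"), ("m2", "C"), ("m3", "G"), ("m4", "T"), ("m5", "G")},
    "cv3": {("m1", "T"), ("m2", "G"), ("m3", "A"), ("m4", "C"), ("m5", "G")},
}
chosen, coverage = greedy_core_set(panel)
print(chosen, coverage)
```

In the toy panel the near-duplicate cultivar is never selected, which mirrors the redundancy argument made for closely related syrup cultivars. A greedy pass gives a small set but is not guaranteed to give the smallest possible one.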

To identify the most informative markers to differentiate the three main groupings, FST for each marker was calculated between populations defined on the basis of PCoA. A few of the markers having high FST (PCoA column in Table 4) could be applied to identify a population for sweet sorghums not included in this panel.

The diversity partitioned within sweet sorghum and between sweet and grain sorghum has implications for how this germplasm should be maintained. An interesting observation regarding same-named accessions, the six Sugar Drips for example, is that older cultivars were more diverse than the newer ones. There are two obvious explanations: residual heterozygosity would be greater for landraces than for elite cultivars, and more outcrossing is likely to occur over time. Inexpensive DNA markers may make testing easy, but to reduce redundancy in core collections it may be prudent to remove duplicates of modern named materials before historical landraces.

For crop improvement, understanding the diversity present within the three identified groupings and their subgroupings is important. For breeding of syrup cultivars, a larger and less diverse selection of elite material from the modern syrup cultivars would be most useful. For breeding energy types for biofuel (lignocellulose and sugar), further selections from within the sugar and energy population and hybrids across groupings would be most appropriate.


We have identified three major groupings within sweet sorghum, each with multiple subgroupings. This information is beneficial for understanding the origin of sweet sorghums and for identifying material for further improvement. These groupings showed some clustering similar to racial types within grain sorghums, but sweet and grain sorghums remain distinct in phenotype and origin. We have identified a marker with significant association for brix and a nearby candidate gene, glucose-6-phosphate isomerase, to be tested in the future. Future work within and across these populations may enable molecular cloning of genes responsible for stem-sugar accumulation in sorghum. Understanding the genetic basis for variation in stem sugar may ultimately allow genetic improvement of relatives with more complex genomes such as sugarcane, maize, switchgrass, and miscanthus.


We wish to thank Morris Bitzer and the USDA for seeds of some cultivars. We wish to thank Wenyan Zhu and Charlotte Acharya who provided valuable assistance in DNA extraction and genotyping. We also thank Karen Prihoda, Delroy Collins, Stephen Labar, Dustin Borden, and the field crews in the Texas A&M sorghum breeding program. Zhiwu Zhang, Patrick Brown, Peter Bradbury, and Michael Gore gave helpful suggestions for TASSEL analysis. We thank referees for helpful comments. Funding was partially provided by the USDA/DOE GTL program. InStruct analysis was performed on a cluster of the Computational Biology Service Unit from Cornell University which is partially funded by Microsoft Corporation.





