About Us | Help Videos | Contact Us | Subscriptions

The Plant Genome - Article



This article in TPG

  1. Vol. 4 No. 1, p. 36-46
    Received: Nov 29, 2010
    Published: Mar, 2011



Use of Non-Normalized, Non-Amplified cDNA for 454-Based RNA Sequencing of Fleshy Melon Fruit

  1. Vitaly Portnoy,
  2. Alex Diber,
  3. Sarah Pollock,
  4. Hagai Karchi,
  5. Shery Lev,
  6. Galil Tzuri,
  7. Rotem Harel-Beja,
  8. Relly Forer,
  9. Vitaly H. Portnoy,
  10. Efraim Lewinsohn,
  11. Yaakov Tadmor,
  12. Joseph Burger,
  13. Arthur Schaffer and
  14. Nurit Katzir 
  1. V. Portnoy, S. Lev, G. Tzuri, R. Harel-Beja, E. Lewinsohn, Y. Tadmor, J. Burger, and N. Katzir, Dep. of Vegetable Research, Agricultural Research Organization, Newe Ya'ar Research Center, P.O. Box 1021, Ramat Yishay 30095, Israel; A. Diber, S. Pollock, and H. Karchi, Evogene Ltd., P.O. Box 2100, Rehovot 76121, Israel; R. Forer, DYN Diagnostics Ltd., P.O. Box 3063, Caesarea Industrial Park 38900, Israel; V.H. Portnoy, Dep. of Statistics, The Hebrew Univ., Jerusalem, Israel; A. Schaffer, Dep. of Vegetable Research, Agricultural Research Organization, Volcani Research Center, P.O. Box 6, Bet Dagan, 50250, Israel.


The melon (Cucumis melo L.) fruit is an important crop and model system for the genomic study of both fleshy fruit development and the Cucurbitaceae family. To obtain an accurate representation of the melon fruit transcriptome based on expressed sequence tag (EST) abundance in 454-pyrosequencing data, we prepared double-stranded complementary DNA (cDNA) of melon without the usual amplification and normalization steps. A purification step was also included to eliminate small fragments. Complementary DNAs were obtained from 14 individual fruit libraries derived from two genotypes, separated into flesh and peel tissues, and sampled throughout fruit development. Pyrosequencing was performed using Genome Sequencer FLX (GS FLX) technology, resulting in 1,215,359 reads, with mean length of >200 nucleotides. The global digital expression data was validated by comparative reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) of 40 selected genes and expression patterns were similar for the two methods. The results indicate that high-quality, nonbiased cDNA for next-generation sequencing can be prepared from mature, fleshy fruit, which are notorious for difficulties in ribonucleic acid (RNA) preparation.


    40R, 40-d rind; cDNA, complementary DNA; DAA, days after anthesis; DEPC, diethylpyrocarbonate; EDTA, ethylenediaminetetraacetic acid; EST, expressed sequence tag; GS FLX, Genome Sequencer FLX; ICuGI, International Cucurbit Genomics Initiative; mRNA, messenger RNA; RNA, ribonucleic acid; RNA-Seq, ribonucleic acid sequencing; RT-qPCR, reverse transcription quantitative real-time polymerase chain reaction; SDS, sodium dodecyl sulfate

Melon (Cucumis melo L.) is an important crop species that is cultivated worldwide. The species is highly polymorphic: melon fruits vary markedly in shape, size, rind form, firmness, color, ripening behavior, and flavor. Fruits of wild and cultivated genotypes accumulate different levels of soluble sugars, organic acids, pigments, and aroma volatiles (Burger et al., 2006, 2009; Lester et al., 2001; Morales et al., 2004; Perin et al., 2002; Portnoy et al., 2008) affecting fruit quality through complex networks of metabolic pathways that are active during fruit ripening. Genomic and transcriptomic knowledge is currently limited for melon fruit despite its potential as an alternative model to tomato (Solanum lycopersicum L.) for fleshy fruit ripening (Ezura and Owino, 2008; Giovannoni, 2007). The International Cucurbit Genomics Initiative (ICuGI) database (available at http://www.icugi.org [verified 30 Dec. 2010]) currently includes circa 127,000 expressed sequence tags (ESTs), nearly a third of which have been derived from fruit libraries. The goal of this work was to enhance the study of melon fruit transcriptomes using a deep sequencing method, which can serve for the discovery of genes associated with major fruit metabolic pathways as well as those involved in fruit development and ripening.

Transcriptome analysis based on deep sequencing has been reported in a growing number of studies in recent years (e.g., Alagna et al., 2009; Ando and Grumet, 2010; Barakat et al., 2009; Emrich et al., 2007; Folta et al., 2010; Guo et al., 2010; Jones-Rhoades et al., 2007; Wang et al., 2009a; Weber et al., 2007). The massive parallel sequencing of complementary DNA (cDNA), utilizing Roche 454-pyrosequencing technology based on sequencing-by-synthesis (Margulies et al., 2005), has been the most extensively reported approach, especially for organisms whose genome is not yet available, although more recently alternative technologies of Illumina and SOLiD have been used (Metzker, 2010; Wang et al., 2009b). Nevertheless, 454 pyrosequencing is still advantageous for the study of organisms whose genome is not available, such as melon (Vera et al., 2008; Wang et al., 2009b). The high-throughput generation of massive sequence collections representing gene expression enables various applications, including gene or marker discovery and comparisons of genome-wide expression patterns. Measuring gene expression accurately is not a trivial matter and, although advanced methodologies are currently available, they each have their drawbacks (Metzker, 2010; Meyers et al., 2004; Wang et al., 2009b). The comparison of transcript counts among various genotypes, tissues, and time points reflects relative gene expression with statistical significance that increases with increasing number of transcripts compared (Audic and Claverie, 1997; Torres et al., 2008; Weber et al., 2007). Due to the large number of reads obtained by this technology, 454 pyrosequencing provides highly significant estimates of relative gene expression (Mane et al., 2009). Nevertheless, the precision of the approach still has not fully been accepted. Recently, a growing number of studies have addressed the issue of measuring transcriptome composition by counts of reads obtained by high-throughput sequencing of complementary DNA, also termed ribonucleic acid sequencing (RNA-Seq) (Nagalakshmi et al., 2008; Wang et al., 2009b), an approach that is widely applied (Marioni et al., 2008; Mortazavi et al., 2008; Nagalakshmi et al., 2008; Wang et al., 2009b; Wilhelm and Landry, 2009; Wilhelm et al., 2008).

One major factor determining the precision of transcriptome analysis is the quality of the cDNA. Most available protocols for cDNA library preparation (Wang et al., 2009b; Ying, 2004) include an amplification step aimed at obtaining the required quantities of cDNA even when the source tissue is limited. However, even a single round of amplification can distort the representation of messenger RNA (mRNA) in a cDNA library (Metzker, 2010; Sambrook and Russell, 2001; Wang et al., 2009b) and primary, nonamplified libraries have been recently suggested for studies of transcript abundance (Metzker, 2010; Wang et al., 2009b). Moreover, several of the protocols include a normalization step to detect rare transcripts (Wang et al., 2009a; Ying, 2004); however, for the purpose of digital expression, this may lead to a distorted representation of transcripts in a given sample.

Based on these considerations, nonamplified, non-normalized cDNA libraries were constructed in this study to obtain a true representation of gene expression along fruit development in the fleshy melon fruit. This is particularly significant in light of the fact that obtaining adequate levels of high-quality ribonucleic acid (RNA) of fleshy fruit in general of and melon in the climacteric and postclimacteric stages in particular is not trivial (Gasic et al., 2004; Pandit et al., 2007; Shellie et al., 1997). The transcriptome was pyrosequenced by 454 technology, followed by digital expression analysis. To validate the accuracy of this approach, expression levels of 40 genes were analyzed by reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) and the expression patterns obtained by the two methods were compared.


Plant Material

Two melon cultivars, PI414723 (Cucumis melo L. subsp. agrestis (Naudin) Pangalo; Momordica group) and ‘Dulce’ (Cucumis melo L. subsp. melo; Reticulatus group) were used (Harel-Beja et al., 2010). Fifty plants of each cultivar were grown in the field at the Newe Ya'ar Research Center under standard conditions. Female flowers were tagged on day of anthesis and fruits were collected at 10-d intervals until maturation. The PI414723 fruits reached maturity at 30 d after anthesis (DAA) and those of ‘Dulce’ at 40 DAA. Rind and flesh tissues from 30 separate fruits were collected for each developmental stage, cut into small cubes, and immediately frozen in liquid N in three bulks of 10 fruits each.

Preparation of Total RNA and Messenger RNA and Their Analysis

Total RNA was isolated by a method used in our laboratory that includes three major steps: (i) the application of two detergents, sodium dodecyl sulfate (SDS) and sodium lauroylsarcosine, for lysis and denaturation of cellular constituents (Carpenter and Simon, 1998; LaClaire and Herrin, 1997), (ii) high salt concentration to remove polysaccharides (Fang et al., 1992), and (iii) LiCl precipitation for the separation of RNA from DNA (Sambrook and Russell, 2001). Briefly, frozen fruit tissues (10 g) were pulverized with a mortar and pestle in liquid N. Pulverized tissues were mixed well by vortexing in a 50-mL tube with 10 mL of extraction buffer containing 0.2 M Tris-HCl (pH 9.0), 0.2 M ethylenediaminetetraacetic acid (EDTA), 0.4 M NaCl, and 2% (w/v) SDS, and incubated at 65°C for 5 min. Then 30% (w/v) sodium lauroylsarcosine was added to a final concentration of 2% (v/v), and the mixture was vortexed and incubated at 65°C for 2 to 3 min. An equal volume of phenol was added to the solution, vortexed, and centrifuged at 5000 g for 5 min. The aqueous phase was transferred to a new 50-mL tube on ice and the phenol phase was re-extracted with 5 mL of extraction buffer. Aqueous phases were pooled and re-extracted with an equal volume of phenol, vortexed, and centrifuged at 5000 g for 5 min. Following three rounds of chloroform-isoamyl alcohol (24:1, v/v) extractions, nucleic acids were precipitated with 1/10 volume of 3 M sodium acetate (pH 5.3) and 2 volumes of 95% (v/v) ethanol. The resulting nucleic acid pellet was dissolved in 10 mL 2 M LiCl at 4°C overnight.

Total RNA was precipitated by centrifugation at 15,000 g for 10 min at 4°C and dissolved in 0.5 mL diethylpyrocarbonate (DEPC) water. After reprecipitation with 1/10 volume of 3 M sodium acetate (pH 5.3) and 2 volumes 95% ethanol, the pellet was dissolved in 50 to 100 μL DEPC water. The quality of the RNA was analyzed by (i) ND-1000 spectrophotometer (Nanodrop Technologies, Wilmington, DE); (ii) electrophoresis on a formaldehyde-agarose gel, and (iii) Agilent 2100 Bioanalyzer RNA chip (Agilent Technologies, Santa Clara, CA) (Supplemental Fig. S1A and S1B). Yield and quality (A260/280 ratio) of the various RNA samples are presented in Supplemental Table S1. The quality of the RNA was best indicated by the RNA integrity number (RIN) parameter obtained by the Agilent bioanalyzer, which ranged between 7.6 and 9.3 in the 14 samples, indicating the high quality of the RNA samples (Schroeder et al., 2006) (Supplemental Table S1).

Poly(A)+ RNA (mRNA) was purified from 1 mg of total RNA using the PolyATtract mRNA isolation system (Promega, Madison, WI) according to the manufacturer's instructions. Qualitative and quantitative analyses of mRNA were evaluated by (i) ND-1000 spectrophotometer and (ii) electrophoresis on formaldehyde-agarose gel (Supplemental Fig. S1C). To ascertain that the purified mRNA did not contain genomic DNA, the melon expansin gene (DV632686) was PCR amplified in all 14 samples, using primers for part of the gene that includes an intron (# 34; Supplemental Table S2 and Supplemental Fig. S1D). The amplification test was performed using first-strand cDNA which was synthesized using Reverse-iT first Strand Synthesis kit (ABgene Inc., Epsom, UK). All mRNA samples were free of genomic DNA, as demonstrated in Supplemental Fig. S1D.

Complementary DNA Preparation

Seven cDNA libraries of fruit rind were prepared using Just cDNA Double-Stranded cDNA Synthesis Kit (Stratagene, La Jolla, CA) according to the manufacturer's instructions (Protocol A). First-strand cDNA synthesis was conducted initially with 5 μg of poly(A)+ RNA and oligo(dT) primer. Following the run of the first titration plate (see 454 Pyrosequencing section below and Supplemental Table S3), the first strand was synthesized again for the four lowest quantity samples, using ∼10 μg poly(A)+ RNA and oligo(dT) primer (Protocol B). This procedure yielded fragments that are 3′-enriched relative to the entire transcriptome. The advantages of this 3′ enrichment are that it provides a better estimate of the number of unique transcripts within a particular transcriptome and greater depth of coverage is achieved in the 3′ ends of the transcripts (Emrich et al., 2007). A 5-μL aliquot of the first-strand cDNA was kept for quality control and for RT-qPCR. This modification was also applied for the preparation of the seven cDNA libraries of fruit flesh. Second-strand cDNA synthesis was performed according to the manufacturer's instructions, except for the resuspension of the final pellets in 30 μL Tris-EDTA buffer instead of the recommended 9 μl.

In addition, to increase the mean read length (Supplemental Table S3), a double-stranded cDNA purification step was included (Protocol B). Complementary DNA was purified using MEGAquick-spin PCR & Agarose Gel DNA Extraction system (iNtRON Biotechnology, Inc., Sangdaewon-Dong, Korea) according to the manufacturer's instructions. The quantity and quality of the cDNA were analyzed by (i) comparison of UV images of spotted cDNA samples on an EtBr agarose plate with DNA samples of known concentration (according to Stratagene recommendations), (ii) ND-1000 spectrophotometer (Supplemental Table S4), (iii) Agilent 2100 Bioanalyzer DNA chip, and (iv) running samples of first- and second-strand cDNA synthesis reaction on a 5% polyacrylamide gel with consequent staining by SYBR Green and EtBr (Supplemental Fig. S1E). To ensure that the first- and double-strand cDNA do not contain genomic DNA, the cDNA was amplified using primers for expansin, as described above for the mRNA (Supplemental Fig. S1D).

454 Pyrosequencing

454 Pyrosequencing was performed on Genome Sequencer FLX (GS FLX) by DYN G.S. Ltd., Caesarea, Israel. Genome Sequencer FLX DNA libraries were generated following the manufacturer's protocol (USM-00032.A 12/07). In general, sequencing was performed according to the standard procedure described previously (Margulies et al., 2005). First, seven of the 14 samples were sequenced using a 40 × 75 PicoTiterPlate divided into 8 regions (Supplemental Table S3). Following this run, modifications were applied to double-stranded cDNA synthesis and purification, as described above. Finally, approximately 4 μg of purified double-stranded cDNA per library was used for 454 sequencing. Four sequencing runs were performed using 70 × 75 PicoTiterPlates divided into four regions each.

Sequence Analysis

The cDNA sequences were clustered using LEADS clustering and assembly software as described previously (Akiva et al., 2006; David et al., 2002; Hu et al., 2001; Sorek et al., 2002; Xie et al., 2002). Briefly, the software cleans the expressed sequences of vectors and filters them for repeats and low-complexity regions. It then aligns all of these expressed sequences to each other, taking alternative splicing into account, and clusters overlapping expressed sequences into “clusters” that represent genes or partial genes. The clusters are assembled and putative RNA transcripts are predicted, representing all alternative splicing forms. Lastly, the transcripts are utilized to predict proteins. The LEADS parameters were optimized to deal with short 454 reads. In addition to the 454 sequences, we included in the analysis all publicly available melon sequences (available at http://www.icugi.org [verified 30 Dec. 2010]; version 2.0, a total of 35,751 sequences).

Digital Expression Analysis

Transcript abundance in each library was calculated by counting the number of library reads clustered in the gene, normalized by the total number of reads per library multiplied by 106, resulting in a parts-per-million (ppm) value.

Reverse Transcription Quantitative Real-Time Polymerase Chain Reaction Analysis

Primers for the 40 selected genes (Supplemental Table S2) were designed using Gene Runner 3.0 software (Hastings Software, Inc., Hastings, NY). The sequences used for the primer design are all available in the melon database (available at http://www.icugi.org [verified 30 Dec. 2010]). All primers were designed to sequences that were conserved between the two genotypes—PI414723 and ‘Dulce’—to ensure that mismatches do not affect the expression patterns.

Aliquots of the first-strand cDNA (see above) of the samples analyzed by 454 sequencing were also analyzed by RT-qPCR. First-strand cDNA was diluted 500 times in a solution of 3 mM Tris-HCl (pH 7.2) and 0.2 mM EDTA. A 1-μL aliquot was used for each reaction, performed using the ABI Prism 7000 Sequence Detection System (Applied Biosystems, Foster, CA). Amplifications were conducted using the ABsolute QPCR SYBR Green Mixes (ABgene) as described previously (Portnoy et al., 2008). All samples were run in technical triplicates. Thermal cycling was initiated by 15 min at 95°C, followed by 40 cycles of 90°C for 15 s and 60°C for 1 min. A melting-curve analysis was performed for each reaction to confirm the specificity of the amplification. The reference gene cyclophilin was used for normalization (Supplemental Table S2). Cycle threshold (Ct) values were determined by the ABI Prism 7000 SDS software and exported into MS Excel workbook (Microsoft Inc., Redmond, WA) for statistical analysis. Real-time efficiencies (E) were calculated from the slopes of standard curves for each gene (E = 10[–1/slope]) (Supplemental Table S2). The relative expression ratio (R) was calculated according to Pfaffl (2001) and compared to the 10-d sample of each tissue.

Comparisons between Digital Expression and RT-qPCR Patterns

The expression levels obtained by digital expression analysis of the 454 data and RT-qPCR analyses were normalized to median across 14 samples (see cDNA preparation above) for each gene. Gene expression was calculated as Log10 of the normalized values. Similarity of expression patterns between the two methods was determined by Pearson correlation.


Generation of Non-Normalized, Nonamplified Complementary DNA Libraries

We developed a useful method for preparing cDNA for RNA-Seq analysis from the fleshy melon fruit, which can be used to investigate expression patterns and sequence polymorphism of the melon fruit transcriptome. Double-stranded cDNA synthesis was performed using the Just cDNA Synthesis Kit (Stratagene), which does not include amplification or normalization steps. Initially, we compared two cDNA libraries derived from the same mRNA preparation (mature 40-d Dulce rind). One library (A) was prepared from 5 μg poly(A)+ mRNA and the cDNA library was not size purified; the other library (B) was prepared from 10 μg poly(A)+ mRNA and the resulting cDNA library was size purified. Figure 1 compares the size distribution in the sample of ‘Dulce’ 40-d rind (40R) prepared according to the original protocol (A) with that prepared with the two modifications (B). The results clearly show the effect on mean read length (178 bp versus 243 bp, respectively) and total yield (7.3 Mbp versus 19.6 Mbp, respectively). To obtain 10 μg poly(A)+ mRNA, RNA preparation was started with 10 g of fruit tissues leading to a mean of 1 mg total RNA (Supplemental Table S1) and circa 15 μg poly(A)+ mRNA. This amount of starting material is in the range of 10 times higher than that used in reports in which methods that include amplification steps were applied (e.g., Ando and Grumet 2010; Gasic et al., 2004). To test the reproducibility of the library preparation and the pyrosequencing process, double-stranded cDNA was prepared twice from one of the poly(A)+ mRNA samples (mature 40-d Dulce flesh) and the two libraries were run on two separate plates. Similar results obtained for each library (Table 1) confirmed the reproducibility of the method.

View Full Table | Close Full ViewTable 1.

Data obtained from two libraries prepared twice from the same poly(A)+ messenger (mRNA) sample (Dulce 40 flesh).

Libraries Mbp Reads Mean length, bp
Dulce 40F A 11.99 64,963 185
Dulce 40F B 11.87 60,398 196
40F, 40-d flesh.
Figure 1.
Figure 1.

Comparison of frequency distribution of length of reads obtained by 454 pyrosequencing of two ‘Dulce’ 40-d rind (40R) libraries prepared using different protocols. A. Protocol A. B. Protocol B, which includes a purification step. Number of reads, mean length, and total yields are presented for each library. Mb, Mbp.


Data Analysis and Mapping to the Publicly Available Expressed Sequence Tag Libraries

A total of 243.83 Mbp from 1,215,359 reads were generated from the different libraries, irrespective of preparation method. Of these, 1117,036 reads (225.75 Mbp) remained after quality, complexity, and primer trimming. Clustering algorithms were subsequently applied for assembly. The 1,150,657 sequences (1,117,036 reads together with the remaining 33,621 public sequences) were assembled into 67,477 contigs (Table 2A). The size distribution of the contigs is presented in Fig. 2A (mean contig size 409 bp) and the distribution of number of reads per contig is shown in Fig. 2B. The 32,357 singleton contigs constituted only 2.8% of all reads, while the remaining 97.2% of the reads (1,118,300 reads) comprised 35,120 nonsingleton contigs, indicating the saturation of the library (Table 2 and Fig. 2). Large contigs of more than 10 reads comprised 90.3% of the reads.

View Full Table | Close Full ViewTable 2.

Summary statistics for clustering data relating to melon database and distribution between genotypes and tissues.

Consideration Contigs % Reads % Contigs† %
A. Sizes of contigs
Singletons 32,357 48.0 32,357 2.8
Nonsingletons 35,120 52.0 1,118,300 97.2
Large contigs (>10 reads) 13,418 19.9 1,038,774 90.3 13,416 100
Total 67,477 100 1,150,657 100 13,418 100
B. 454 contigs mapped to ICuGI melon database
Contigs aligning to melon database 11,093 16.4 886,004 77.0 8619 64.2
454 contigs only 54,700 81.1 262,510 22.8 4797 35.8
Melon ICuGI database only 1684 2.5 2143 0.2 2 0.0
Total 67,477 100 1,150,657 100 13,418 100
C. Distribution of genotypes in 454
Contigs of PI414723 and ‘Dulce’ 24,610 37.4 1,084,362 94.4 13,173 98.2
Contigs of PI414723 only 15,786 24.0 24,199 2.1 101 0.8
Contigs of ‘Dulce’ only 25,397 38.6 39,953 3.5 142 1.1
Total 65,793 100 1,148,514 100 13,416 100
D. Distribution of fruit tissues in 454
Contigs of rind and flesh 24,879 37.8 1,081,470 94.2 13,098 97.6
Contigs of rind only 21,681 33.0 39,526 3.4 244 1.8
Contigs of flesh only 19,233 29.2 27,518 2.4 74 0.6
Total 65,793 100 1,148,514 100 13,416 100
Large contigs are included in the nonsingletons.
ICuGI, International Cucurbit Genomic Initiative; database available at http://www.icugi.org (verified 10 Jan. 2011), version2.
Figure 2.
Figure 2.

Length (A) and number of reads (B) of assembled contigs.


The mapping to the publicly available melon sequences (available at http://www.icugi.org as version2 [verified 10 Jan. 2011]) is presented in Table 2B. In total, 77% of the 454 reads (886,004 reads assembled into 11,093 nonsingleton contigs) directly matched a specific melon transcript in the ICuGI database with a p-value of less than 9 × 10−5. Over 22% of the 454 reads (262,510 reads assembled into 54,700 contigs) did not align to the melon collection of the ICuGI database (which in version 2 comprises circa one-third of fruit-derived ESTs) and, hence, potentially identified novel transcripts. On the other hand, only 1684 contigs from the melon ICuGI database did not align to the 454 data and over 75% of those originated from nonfruit libraries, including root, phloem, and other vegetative tissues, as described in the ICuGI web site. The remainder were derived from fruit libraries representing genotypes other than the two studied here. In addition, combining the GS FLX sequences with the ICuGI database sequences enabled contig joining, thereby decreasing the number of large contigs to 8619.

The distribution of reads between genotypes and tissues is presented in Tables 2C and D. A comparison of the reads between the two melon genotypes demonstrates that 1,084,362 (94.4%) reads, assembled into 24,610 (37.4%) contigs, were common to the two tested genotypes, thus providing a rich data source for single nucleotide polymorphism (SNP) discovery. Approximately 1,081,470 (94.2%) reads, assembled into 24,879 (37.8%) contigs, were common to both the rind and the flesh (Table 2D), while the remainder were specific to either the rind or the flesh. Most of the common contigs (circa 98%), both between genotypes and between tissues, were large. Of the contigs distinguishing between genotypes (5.6% of the reads), 1.9% were large contigs (Table 2), circa 25% were contigs of 2 to10 reads, and the rest were singletons. Similar values were obtained for tissue specific contigs. The large contigs are expected to be of genes expressed explicitly in one genotype or tissue and are therefore of special interest (examples for these are presented in Supplemental Fig. S2, e.g., 2A #11 and 2B #35). On the other hand, the singletons are expected to be of genes expressed at low level that by chance were detected in one genotype or tissue and not in the other. The number of the latter, and of short sequences that were not assembled, is expected to be reduced with increase in reads. Only 1587 of the reads (0.13%) were mapped to different plant viruses, suggesting slight field contamination of the field-grown fruit, but these might be useful for an ecological characterization of the viral metagenomic population. A similar percentage of contamination was detected by Gonzalez-Ibeas et al. (2007).

Quality Validation of the RNA Sequencing

To evaluate the quality of the data, correlation analysis among all melon EST libraries, from both this study and the melon ICuGI database, was performed (Fig. 3). The ICuGI libraries included fruit, leaf, cotyledon, root, and phloem libraries. As expected, the closer the libraries in terms of genotype, tissue, or developmental stage, the higher the correlations obtained. Highest correlations were obtained between the replications of the same mRNA source, as described above: 0.96 for ‘Dulce’ 40-d flesh (40F) (reproducibility experiment) and 0.95 for ‘Dulce’ 40R (comparison of cDNA protocols). A high correlation was also detected between libraries of the same developmental stage: 0.87 for 20 DAA flesh samples of the two genotypes and 0.86 for two consecutive samples (10 and 20 DAA) of ‘Dulce’. In contrast, lowest correlations were detected between the vegetative and fruit libraries.

Figure 3.
Figure 3.

Heat-map analysis of correlations among 16 complementary DNA (cDNA) libraries sequenced by 454 technology and 10 cDNA libraries sequenced by traditional methods available at the International Cucurbit Genomics Initiative (ICuGI) database (version 2). The numbers in the cells are Pearson correlation coefficients between two samples. Color brightness is proportional to the correlation coefficient. Good correlation (bright red color) was found between sequences obtained by both sequencing methods from libraries of the same genotypes, tissues, and developmental stages.


Digital expression profiling was performed based on counting the reads clustered in a gene in each library. Altogether, the expression analysis was based on 1,150,657 reads obtained from four full plates with an average of approximately 76,000 reads per library (Supplemental Table S5). To validate the quantitative measures of transcript abundance using this method, expression patterns of 40 selected genes were compared with expression profiles obtained by RT-qPCR using the same cDNA from all 14 libraries (Table 3 and Fig. 4 and 5). The counts in the 454 data varied from 26 to 4327 reads for the 40 genes, and 4327 was the highest count for a single gene. Half of the genes were chosen from the highly abundant transcripts (Table 3: #1–9 from young fruits and #21–31 from mature fruits) and the other half were selected arbitrarily. The normalized expression patterns obtained by both methods were compared using the Pearson correlation coefficient (Table 3 and Fig. 4 and 5). In general, high positive correlations were found between the patterns obtained by both techniques, irrespective of the number of reads: >0.90 for 16 of the genes, >0.70 for 17 of the genes, and >0.50 for five genes (Table 3; Fig. 4). The correlations for only two genes were lower than 0.50 (0.46 and 0.41). Pearson correlation coefficient was also calculated for the non-normalized expression values and correlations were similarly high (Table 3 and Supplemental Fig. S2).

View Full Table | Close Full ViewTable 3.

List of genes showing differential expression based on read counts in 454 libraries of PI414723 and ‘Dulce’ and used for validation of expression quantification.

Accession no. Gene or enzyme 454 reads PC Abbreviation
A. Genes upregulated in young fruits
1 AF230211 Translationally controlled tumor protein 4136 0.86 TCTP
2 DQ288986 Alcohol dehydrogenase 3696 0.91 ADH 1
3 AM723137 Catalase isozyme 3 3131 0.87 CAT3
4 EB715472 HyPRP2 2633 0.94 HyP
5 EB715966 No hits found 2584 0.84
6 AF436852 Pyruvate decarboxylase 2509 0.88 PDC
7 DV632484 Ribulose bisphosphate carboxylase 2393 0.79 RBCS
8 DV634533 Glyceraldehyde-3-phosphate dehydrogenase 2041 0.77 GPDH
9 DV634305 Chlorophyll a/b-binding protein 1685 0.91 CAB1
10 DV634171 Dehydrin 1515 0.81 DEH
11 DV632443 Lipid transfer protein isoform 4 1473 0.91 LTP
12 DV634731 Polyubiquitin 10 1450 0.46 UBQ10
13 DV634376 Chlorophyll a/b-binding protein 1033 0.90 CAB2
14 DV632050 Elongation factor 1-alpha 896 0.41 EF1A
15 DV633092 Vacuolar cation/proton exchanger 1a 739 0.88 CAX
16 DV634937 Elongation factor 1-gamma 513 0.64 EF1B
17 AM718345 Clathrin heavy chain 349 0.50 Clathrin
18 DQ267934 13S-lipoxygenase 235 0.73 LOX1
19 DV634569 Hypothetical protein 106 0.53 OR1
20 DV634330 RNA-binding protein 26 0.70 RRM
B. Genes upregulated in mature fruits
21 DV633682 Adenosylhomocysteinase 4327 0.87 SAHH
22 X69935 ACC oxidase 4152 0.98 ACO1
23 AF426403 Abscisic acid response protein 4019 0.95 ASR1
24 DV632379 Type-2 metallothionein 3159 0.92 Type-2 MT
25 Z70522 Major latex protein 2996 0.94 pMEL7
26 DV632508 Branched-chain-amino-acid transaminase 2994 0.93 BCAT
27 DV633298 No hits found 2788 0.88
28 DV633532 Stearoyl-acyl carrier protein desaturase 2666 0.99 DES
29 DV632631 Acyl carrier protein 2458 0.93 ACP
30 DV634049 ECERIFERUM 1, octadecanal decarbonylase 1921 0.94 CER1
31 Z70521 Alcohol acetyltransferase 1753 0.94 AAT1
32 DV633593 Glutathione S-transferase 1411 0.92 GST
33 DV633781 Beta-1,3-glucanase 1186 0.88 BG
34 DV632686 Expansin 941 0.92 EXP1
35 DV631909 Ripening-related protein grip22 861 0.87 Kiw
36 DV634429 Sucrose-phosphate synthase 1 417 0.70 SPS1
37 DV632450 No hits found 192 0.86
38 DV633411 UDP-glucose 4-epimerase 1 173 0.68 UGE1
39 DV633715 Acireductone dioxygenase 59 0.65 ARD4
40 DV632398 No hits found 47 0.75
PC, Pearson correlation: between Log10 reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) and Log10 454 pyrosequencing expression level.
RNA, ribonucleic acid.
Figure 4.Figure 4.
Figure 4.

Continued on next page. Comparison of expression patterns of selected genes as determined by 454 pyrosequencing (dark columns) and reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) (gray columns). A. Twenty genes upregulated in young fruit. B. Twenty genes upregulated in mature fruit. Transcript abundance in 454 libraries was measured by counting and the same complementary DNA (cDNA) libraries were analyzed by RT-qPCR. Both 454 digital expression values (ppm) and RT-qPCR relative expression ratio (R) values were normalized to the median across 14 samples for each gene. Gene expression was plotted as Log10 (y axis) of the 454 and RT-qPCR normalized values. X axis: Days after anthesis; rind (R) and flesh (F) tissues; genotypes: PI414723 and ‘Dulce’. Gene numbers and abbreviations are according to Table 3. Numbers in parentheses indicate similarity of expression patterns between the two methods, as determined by Pearson correlation.

Figure 5.
Figure 5.

Scatter-plot comparison of the expression level (Log10) of 40 genes by 454 pyrosequencing (x axis) and reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) (y axis). Correlation was based on 560 ratio points calculated for 40 genes across developmental stages, fruit tissues (rind and flesh), and two genotypes.


Especially high correlations (>0.90; Table 3 and Fig. 4) were detected for genes expressed strongly in a particular library, such as those with strong expression at the mature stage (PI414723 30 DAA or ‘Dulce’ 40 DAA). These included, for example, genes encoding ACC oxidase (pMEL1; Balague et al., 1993), major latex protein and alcohol acetyltrasferase (pMEL7 and pMEL2; Aggelis et al., 1997), and expansin (Rose et al., 1997), which have been reported in the early literature as being highly expressed genes in melon. The highly expressed genes also included genes for volatile metabolism, such as alcohol acetyltrasferase (Aharoni et al., 2000; Shalit et al., 2001; Yahyaoui et al., 2002) and branched-chain amino acid transferase (Gonda et al., 2010; Hadfield et al., 2000), which characterize the ripening, aromatic and climacteric melon fruit, as well as other genes associated with ripening components of sugar accumulation (sucrose phosphate synthase) (Burger et al., 2006; Lester et al., 2001) and fruit softening (β-1,3-glucanase) (Nishiyama et al., 2007; Rose et al., 1998). Reciprocally, the highly expressed genes in young fruit libraries included photosynthetic genes (chlorophyll a/b-binding protein and ribulose biphosphate carboxylase) and dehydrin (Porat et al., 2004). Recently, Ando and Grumet (2010) reported the transcriptome of young (8 d) cucumber (Cucumis sativus L.) fruit, and the highly expressed genes reported there overlapped with many of the highly expressed genes of young fruit in our study.


The transcriptomic results presented in this study were obtained from nonamplified, non-normalized cDNA libraries. In general, transcriptome studies that have been recently presented based on deep-sequencing technologies (Ando and Grumet, 2010; Folta et al., 2010; Guo et al., 2010) have relied on cDNA libraries prepared with an amplification step, to overcome the problem of low RNA concentration. This step may affect the relative representation of genes and, recently, the importance of a nonamplified template source for quantitative applications such as RNA-Seq has been emphasized by Wang et al. (2009) and Metzker (2010), so as not to alter the representational abundance of mRNA molecules.

The results presented here are even more significant in light of the fact that, in general, mature fleshy fruit poses a challenge for the preparation of an adequate amount of high-quality RNA (Gasic et al., 2004; Pandit et al., 2007; Shellie et al., 1997). This can also be seen in the RNA-quality data of the different libraries studied, whereby the RNA derived from young fruit was of higher quality than that from mature fruit (Supplemental Table S1). To date, RNA preparation from fruit for next-generation transcriptome sequencing has been reported for immature cucumber fruit (Ando and Grumet, 2010), strawberry (Fragaria ×ananassa Duchesne ex Rozier) (Folta et al., 2010), grape (Vitis vinifera L.) (Bellin et al., 2009) and olive (Olea europaea L.) (Alagna et al., 2009). In all cases, amplification and normalization steps were implemented. With the expected increase in next-generation transcriptome sequencing, the described method and results of this study should serve as a useful tool for the investigation of expression patterns and sequence polymorphisms of agriculturally important crop plants, including many fleshy fruit in general, and cucurbit fruit in particular.

Supplemental Information Available

Supplemental material is available free of charge at http://www.crops.org/publications/tpg.


This study was supported by Office of the Chief Scientist in the Israeli Ministry of Industry, Trade, and Labor through Magnet Program and the Israeli Bio-TOV Consortium. Contribution No. 132/2010 of the Agricultural Research Organization, Israel.




  • All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher.

Be the first to comment.

Please log in to post a comment.
*Society members, certified professionals, and authors are permitted to comment.