Transcriptome analysis based on deep sequencing has been reported in a growing number of studies in recent years (e.g., Alagna et al., 2009; Ando and Grumet, 2010; Barakat et al., 2009; Emrich et al., 2007; Folta et al., 2010; Guo et al., 2010; Jones-Rhoades et al., 2007; Wang et al., 2009a; Weber et al., 2007). The massive parallel sequencing of complementary DNA (cDNA), utilizing Roche 454-pyrosequencing technology based on sequencing-by-synthesis (Margulies et al., 2005), has been the most extensively reported approach, especially for organisms whose genome is not yet available, although more recently alternative technologies of Illumina and SOLiD have been used (Metzker, 2010; Wang et al., 2009b). Nevertheless, 454 pyrosequencing is still advantageous for the study of organisms whose genome is not available, such as melon (Vera et al., 2008; Wang et al., 2009b). The high-throughput generation of massive sequence collections representing gene expression enables various applications, including gene or marker discovery and comparisons of genome-wide expression patterns. Measuring gene expression accurately is not a trivial matter and, although advanced methodologies are currently available, they each have their drawbacks (Metzker, 2010; Meyers et al., 2004; Wang et al., 2009b). The comparison of transcript counts among various genotypes, tissues, and time points reflects relative gene expression with statistical significance that increases with increasing number of transcripts compared (Audic and Claverie, 1997; Torres et al., 2008; Weber et al., 2007). Due to the large number of reads obtained by this technology, 454 pyrosequencing provides highly significant estimates of relative gene expression (Mane et al., 2009). Nevertheless, the precision of the approach still has not fully been accepted. 
Recently, a growing number of studies have addressed the issue of measuring transcriptome composition by counts of reads obtained by high-throughput sequencing of complementary DNA, also termed ribonucleic acid sequencing (RNA-Seq) (Nagalakshmi et al., 2008; Wang et al., 2009b), an approach that is widely applied (Marioni et al., 2008; Mortazavi et al., 2008; Nagalakshmi et al., 2008; Wang et al., 2009b; Wilhelm and Landry, 2009; Wilhelm et al., 2008).
One major factor determining the precision of transcriptome analysis is the quality of the cDNA. Most available protocols for cDNA library preparation (Wang et al., 2009b; Ying, 2004) include an amplification step aimed at obtaining the required quantities of cDNA even when the source tissue is limited. However, even a single round of amplification can distort the representation of messenger RNA (mRNA) in a cDNA library (Metzker, 2010; Sambrook and Russell, 2001; Wang et al., 2009b) and primary, nonamplified libraries have been recently suggested for studies of transcript abundance (Metzker, 2010; Wang et al., 2009b). Moreover, several of the protocols include a normalization step to detect rare transcripts (Wang et al., 2009a; Ying, 2004); however, for the purpose of digital expression, this may lead to a distorted representation of transcripts in a given sample.
Based on these considerations, nonamplified, non-normalized cDNA libraries were constructed in this study to obtain a true representation of gene expression during fruit development in the fleshy melon fruit. This is particularly significant because obtaining adequate quantities of high-quality ribonucleic acid (RNA) from fleshy fruit in general, and from melon at the climacteric and postclimacteric stages in particular, is not trivial (Gasic et al., 2004; Pandit et al., 2007; Shellie et al., 1997). The transcriptome was pyrosequenced by 454 technology, followed by digital expression analysis. To validate the accuracy of this approach, expression levels of 40 genes were analyzed by reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) and the expression patterns obtained by the two methods were compared.
MATERIALS AND METHODS
Two melon cultivars, PI414723 (Cucumis melo L. subsp. agrestis (Naudin) Pangalo; Momordica group) and ‘Dulce’ (Cucumis melo L. subsp. melo; Reticulatus group) were used (Harel-Beja et al., 2010). Fifty plants of each cultivar were grown in the field at the Newe Ya'ar Research Center under standard conditions. Female flowers were tagged on day of anthesis and fruits were collected at 10-d intervals until maturation. The PI414723 fruits reached maturity at 30 d after anthesis (DAA) and those of ‘Dulce’ at 40 DAA. Rind and flesh tissues from 30 separate fruits were collected for each developmental stage, cut into small cubes, and immediately frozen in liquid N in three bulks of 10 fruits each.
Preparation of Total RNA and Messenger RNA and Their Analysis
Total RNA was isolated by a method used in our laboratory that includes three major steps: (i) the application of two detergents, sodium dodecyl sulfate (SDS) and sodium lauroylsarcosine, for lysis and denaturation of cellular constituents (Carpenter and Simon, 1998; LaClaire and Herrin, 1997), (ii) high salt concentration to remove polysaccharides (Fang et al., 1992), and (iii) LiCl precipitation for the separation of RNA from DNA (Sambrook and Russell, 2001). Briefly, frozen fruit tissues (10 g) were pulverized with a mortar and pestle in liquid N. Pulverized tissues were mixed well by vortexing in a 50-mL tube with 10 mL of extraction buffer containing 0.2 M Tris-HCl (pH 9.0), 0.2 M ethylenediaminetetraacetic acid (EDTA), 0.4 M NaCl, and 2% (w/v) SDS, and incubated at 65°C for 5 min. Then 30% (w/v) sodium lauroylsarcosine was added to a final concentration of 2% (v/v), and the mixture was vortexed and incubated at 65°C for 2 to 3 min. An equal volume of phenol was added to the solution, vortexed, and centrifuged at 5000 g for 5 min. The aqueous phase was transferred to a new 50-mL tube on ice and the phenol phase was re-extracted with 5 mL of extraction buffer. Aqueous phases were pooled and re-extracted with an equal volume of phenol, vortexed, and centrifuged at 5000 g for 5 min. Following three rounds of chloroform-isoamyl alcohol (24:1, v/v) extractions, nucleic acids were precipitated with 1/10 volume of 3 M sodium acetate (pH 5.3) and 2 volumes of 95% (v/v) ethanol. The resulting nucleic acid pellet was dissolved in 10 mL 2 M LiCl at 4°C overnight.
Total RNA was precipitated by centrifugation at 15,000 g for 10 min at 4°C and dissolved in 0.5 mL diethylpyrocarbonate (DEPC) water. After reprecipitation with 1/10 volume of 3 M sodium acetate (pH 5.3) and 2 volumes 95% ethanol, the pellet was dissolved in 50 to 100 μL DEPC water. The quality of the RNA was analyzed by (i) ND-1000 spectrophotometer (Nanodrop Technologies, Wilmington, DE); (ii) electrophoresis on a formaldehyde-agarose gel, and (iii) Agilent 2100 Bioanalyzer RNA chip (Agilent Technologies, Santa Clara, CA) (Supplemental Fig. S1A and S1B). Yield and quality (A260/280 ratio) of the various RNA samples are presented in Supplemental Table S1. The quality of the RNA was best indicated by the RNA integrity number (RIN) parameter obtained by the Agilent bioanalyzer, which ranged between 7.6 and 9.3 in the 14 samples, indicating the high quality of the RNA samples (Schroeder et al., 2006) (Supplemental Table S1).
Poly(A)+ RNA (mRNA) was purified from 1 mg of total RNA using the PolyATtract mRNA isolation system (Promega, Madison, WI) according to the manufacturer's instructions. Qualitative and quantitative analyses of mRNA were evaluated by (i) ND-1000 spectrophotometer and (ii) electrophoresis on formaldehyde-agarose gel (Supplemental Fig. S1C). To ascertain that the purified mRNA did not contain genomic DNA, the melon expansin gene (DV632686) was PCR amplified in all 14 samples, using primers for part of the gene that includes an intron (# 34; Supplemental Table S2 and Supplemental Fig. S1D). The amplification test was performed using first-strand cDNA which was synthesized using Reverse-iT first Strand Synthesis kit (ABgene Inc., Epsom, UK). All mRNA samples were free of genomic DNA, as demonstrated in Supplemental Fig. S1D.
Complementary DNA Preparation
Seven cDNA libraries of fruit rind were prepared using Just cDNA Double-Stranded cDNA Synthesis Kit (Stratagene, La Jolla, CA) according to the manufacturer's instructions (Protocol A). First-strand cDNA synthesis was conducted initially with 5 μg of poly(A)+ RNA and oligo(dT) primer. Following the run of the first titration plate (see 454 Pyrosequencing section below and Supplemental Table S3), the first strand was synthesized again for the four lowest quantity samples, using ∼10 μg poly(A)+ RNA and oligo(dT) primer (Protocol B). This procedure yielded fragments that are 3′-enriched relative to the entire transcriptome. The advantages of this 3′ enrichment are that it provides a better estimate of the number of unique transcripts within a particular transcriptome and greater depth of coverage is achieved in the 3′ ends of the transcripts (Emrich et al., 2007). A 5-μL aliquot of the first-strand cDNA was kept for quality control and for RT-qPCR. This modification was also applied for the preparation of the seven cDNA libraries of fruit flesh. Second-strand cDNA synthesis was performed according to the manufacturer's instructions, except that the final pellets were resuspended in 30 μL Tris-EDTA buffer instead of the recommended 9 μL.
In addition, to increase the mean read length (Supplemental Table S3), a double-stranded cDNA purification step was included (Protocol B). Complementary DNA was purified using MEGAquick-spin PCR & Agarose Gel DNA Extraction system (iNtRON Biotechnology, Inc., Sangdaewon-Dong, Korea) according to the manufacturer's instructions. The quantity and quality of the cDNA were analyzed by (i) comparison of UV images of spotted cDNA samples on an EtBr agarose plate with DNA samples of known concentration (according to Stratagene recommendations), (ii) ND-1000 spectrophotometer (Supplemental Table S4), (iii) Agilent 2100 Bioanalyzer DNA chip, and (iv) running samples of first- and second-strand cDNA synthesis reaction on a 5% polyacrylamide gel with consequent staining by SYBR Green and EtBr (Supplemental Fig. S1E). To ensure that the first- and double-strand cDNA do not contain genomic DNA, the cDNA was amplified using primers for expansin, as described above for the mRNA (Supplemental Fig. S1D).
454 Pyrosequencing was performed on Genome Sequencer FLX (GS FLX) by DYN G.S. Ltd., Caesarea, Israel. Genome Sequencer FLX DNA libraries were generated following the manufacturer's protocol (USM-00032.A 12/07). In general, sequencing was performed according to the standard procedure described previously (Margulies et al., 2005). First, seven of the 14 samples were sequenced using a 40 × 75 PicoTiterPlate divided into 8 regions (Supplemental Table S3). Following this run, modifications were applied to double-stranded cDNA synthesis and purification, as described above. Finally, approximately 4 μg of purified double-stranded cDNA per library was used for 454 sequencing. Four sequencing runs were performed using 70 × 75 PicoTiterPlates divided into four regions each.
The cDNA sequences were clustered using LEADS clustering and assembly software as described previously (Akiva et al., 2006; David et al., 2002; Hu et al., 2001; Sorek et al., 2002; Xie et al., 2002). Briefly, the software cleans the expressed sequences of vectors and filters them for repeats and low-complexity regions. It then aligns all of these expressed sequences to each other, taking alternative splicing into account, and clusters overlapping expressed sequences into “clusters” that represent genes or partial genes. The clusters are assembled and putative RNA transcripts are predicted, representing all alternative splicing forms. Lastly, the transcripts are utilized to predict proteins. The LEADS parameters were optimized to deal with short 454 reads. In addition to the 454 sequences, we included in the analysis all publicly available melon sequences (available at http://www.icugi.org [verified 30 Dec. 2010]; version 2.0, a total of 35,751 sequences).
Digital Expression Analysis
Transcript abundance in each library was calculated by counting the number of library reads clustered in the gene, normalized by the total number of reads per library and multiplied by 10^6, resulting in a parts-per-million (ppm) value.
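The ppm calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the example counts are hypothetical and are not taken from the study's data or software.

```python
def reads_to_ppm(gene_reads: int, library_reads: int) -> float:
    """Return transcript abundance in parts per million (ppm).

    gene_reads: number of reads from one library clustered in the gene.
    library_reads: total number of reads in that library.
    """
    if library_reads <= 0:
        raise ValueError("library_reads must be positive")
    # Normalize by library size, then scale to one million reads.
    return gene_reads / library_reads * 1_000_000


# Hypothetical counts, for illustration only.
example_ppm = reads_to_ppm(gene_reads=150, library_reads=75_000)
print(f"{example_ppm:.1f} ppm")
```

Scaling to a common library size makes counts from libraries of different sequencing depth directly comparable.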
Reverse Transcription Quantitative Real-Time Polymerase Chain Reaction Analysis
Primers for the 40 selected genes (Supplemental Table S2) were designed using Gene Runner 3.0 software (Hastings Software, Inc., Hastings, NY). The sequences used for the primer design are all available in the melon database (available at http://www.icugi.org [verified 30 Dec. 2010]). All primers were designed to sequences that were conserved between the two genotypes—PI414723 and ‘Dulce’—to ensure that mismatches do not affect the expression patterns.
Aliquots of the first-strand cDNA (see above) of the samples analyzed by 454 sequencing were also analyzed by RT-qPCR. First-strand cDNA was diluted 500-fold in a solution of 3 mM Tris-HCl (pH 7.2) and 0.2 mM EDTA. A 1-μL aliquot was used for each reaction, performed using the ABI Prism 7000 Sequence Detection System (Applied Biosystems, Foster City, CA). Amplifications were conducted using the ABsolute QPCR SYBR Green Mixes (ABgene) as described previously (Portnoy et al., 2008). All samples were run in technical triplicates. Thermal cycling was initiated by 15 min at 95°C, followed by 40 cycles of 90°C for 15 s and 60°C for 1 min. A melting-curve analysis was performed for each reaction to confirm the specificity of the amplification. The reference gene cyclophilin was used for normalization (Supplemental Table S2). Cycle threshold (Ct) values were determined by the ABI Prism 7000 SDS software and exported into an MS Excel workbook (Microsoft Inc., Redmond, WA) for statistical analysis. Real-time efficiencies (E) were calculated from the slopes of standard curves for each gene (E = 10^(−1/slope)) (Supplemental Table S2). The relative expression ratio (R) was calculated according to Pfaffl (2001), relative to the 10-d sample of each tissue.
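As a rough illustration of the two calculations named above, the following Python sketch computes the amplification efficiency from a standard-curve slope and the efficiency-corrected relative expression ratio of Pfaffl (2001). The function names, the slope, and the Ct values are hypothetical. The sketch assumes that the control is the 10-d sample of the same tissue and that the reference gene is cyclophilin, as stated in the text.

```python
def amplification_efficiency(slope: float) -> float:
    """Efficiency from the slope of a standard curve: E = 10^(-1/slope)."""
    return 10 ** (-1.0 / slope)


def pfaffl_ratio(e_target: float, ct_target_control: float, ct_target_sample: float,
                 e_ref: float, ct_ref_control: float, ct_ref_sample: float) -> float:
    """Relative expression ratio R after Pfaffl (2001).

    R = E_target^(Ct_control - Ct_sample, target gene)
        / E_ref^(Ct_control - Ct_sample, reference gene)
    """
    delta_target = ct_target_control - ct_target_sample
    delta_ref = ct_ref_control - ct_ref_sample
    return (e_target ** delta_target) / (e_ref ** delta_ref)


# Hypothetical standard-curve slope; a slope near -3.32 corresponds to E close to 2.
efficiency = amplification_efficiency(-3.32)

# Hypothetical Ct values: the target gene crosses the threshold three cycles
# earlier in the sample than in the control, while the reference is unchanged.
ratio = pfaffl_ratio(e_target=2.0, ct_target_control=25.0, ct_target_sample=22.0,
                     e_ref=2.0, ct_ref_control=20.0, ct_ref_sample=20.0)
```

Using a per-gene efficiency instead of assuming perfect doubling avoids bias when the target and reference amplify at different rates.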
Comparisons between Digital Expression and RT-qPCR Patterns
The expression levels obtained by digital expression analysis of the 454 data and by RT-qPCR analysis were normalized, for each gene, to the median across the 14 samples (see Complementary DNA Preparation above). Gene expression was calculated as log10 of the normalized values. Similarity of expression patterns between the two methods was determined by Pearson correlation.
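The comparison described above (median normalization, log10 transformation, Pearson correlation) can be sketched as follows. This is an illustrative sketch, not the analysis code used in the study: the function names and the example values are hypothetical, and it assumes that all expression values are strictly positive, since the text does not state how zero counts were handled.

```python
import math
from statistics import median


def median_normalized_log10(values):
    """Divide each value by the median of the series, then take log10.

    Assumes every value is strictly positive (an assumption of this sketch).
    """
    m = median(values)
    return [math.log10(v / m) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression values for one gene across five samples.
digital_ppm = [120.0, 240.0, 480.0, 960.0, 1920.0]
rt_qpcr_ratio = [1.0, 2.0, 4.0, 8.0, 16.0]

r = pearson(median_normalized_log10(digital_ppm),
            median_normalized_log10(rt_qpcr_ratio))
```

Because each method is normalized to its own median, the two series are put on a common relative scale before their patterns are compared.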
RESULTS AND DISCUSSION
Generation of Non-Normalized, Nonamplified Complementary DNA Libraries
We developed a useful method for preparing cDNA for RNA-Seq analysis from the fleshy melon fruit, which can be used to investigate expression patterns and sequence polymorphism of the melon fruit transcriptome. Double-stranded cDNA synthesis was performed using the Just cDNA Synthesis Kit (Stratagene), which does not include amplification or normalization steps. Initially, we compared two cDNA libraries derived from the same mRNA preparation (mature 40-d ‘Dulce’ rind). One library (A) was prepared from 5 μg poly(A)+ mRNA and the cDNA library was not size purified; the other library (B) was prepared from 10 μg poly(A)+ mRNA and the resulting cDNA library was size purified. Figure 1 compares the size distribution in the sample of ‘Dulce’ 40-d rind (40R) prepared according to the original protocol (A) with that prepared with the two modifications (B). The results clearly show the effect on mean read length (178 bp versus 243 bp, respectively) and total yield (7.3 Mbp versus 19.6 Mbp, respectively). To obtain 10 μg poly(A)+ mRNA, RNA preparation was started with 10 g of fruit tissues, leading to a mean of 1 mg total RNA (Supplemental Table S1) and circa 15 μg poly(A)+ mRNA. This amount of starting material is roughly 10 times higher than that used in reports in which methods that include amplification steps were applied (e.g., Ando and Grumet, 2010; Gasic et al., 2004). To test the reproducibility of the library preparation and the pyrosequencing process, double-stranded cDNA was prepared twice from one of the poly(A)+ mRNA samples (mature 40-d ‘Dulce’ flesh) and the two libraries were run on two separate plates. Similar results obtained for each library (Table 1) confirmed the reproducibility of the method.
| Libraries | Mbp | Reads | Mean length, bp |
|---|---|---|---|
| Dulce 40F A | 11.99 | 64,963 | 185 |
| Dulce 40F B | 11.87 | 60,398 | 196 |
Data Analysis and Mapping to the Publicly Available Expressed Sequence Tag Libraries
A total of 243.83 Mbp from 1,215,359 reads were generated from the different libraries, irrespective of preparation method. Of these, 1,117,036 reads (225.75 Mbp) remained after quality, complexity, and primer trimming. Clustering algorithms were subsequently applied for assembly. The 1,150,657 sequences (1,117,036 reads together with the remaining 33,621 public sequences) were assembled into 67,477 contigs (Table 2A). The size distribution of the contigs is presented in Fig. 2A (mean contig size 409 bp) and the distribution of number of reads per contig is shown in Fig. 2B. The 32,357 singleton contigs constituted only 2.8% of all reads, while the remaining 97.2% of the reads (1,118,300 reads) comprised 35,120 nonsingleton contigs, indicating the saturation of the library (Table 2 and Fig. 2). Large contigs of more than 10 reads comprised 90.3% of the reads.
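The read accounting in this paragraph can be checked with simple arithmetic. The short Python sketch below uses only the figures quoted in the text and in Table 2; the variable names are illustrative. It shows that the reported percentages are reproduced when the denominator is the 1,150,657 assembled sequences (454 reads plus public sequences).

```python
# Figures quoted in the text.
filtered_reads = 1_117_036      # 454 reads remaining after trimming
public_sequences = 33_621       # public melon sequences included in the assembly
total_contigs = 67_477
singleton_contigs = 32_357      # each singleton contig holds exactly one sequence
large_contig_reads = 1_038_774  # reads in contigs of more than 10 reads (Table 2A)

total_sequences = filtered_reads + public_sequences
nonsingleton_contigs = total_contigs - singleton_contigs
reads_in_nonsingletons = total_sequences - singleton_contigs


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


singleton_pct = percent(singleton_contigs, total_sequences)
large_contig_pct = percent(large_contig_reads, total_sequences)

print(total_sequences, nonsingleton_contigs, reads_in_nonsingletons)
print(round(singleton_pct, 1), round(large_contig_pct, 1))
```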
| Category | Contigs | % of contigs | Reads | % of reads | Large contigs | % of large contigs |
|---|---|---|---|---|---|---|
| A. Sizes of contigs | | | | | | |
| Large contigs (>10 reads) | 13,418 | 19.9 | 1,038,774 | 90.3 | 13,416 | 100 |
| B. 454 contigs mapped to ICuGI melon database | | | | | | |
| Contigs aligning to melon database | 11,093 | 16.4 | 886,004 | 77.0 | 8619 | 64.2 |
| 454 contigs only | 54,700 | 81.1 | 262,510 | 22.8 | 4797 | 35.8 |
| Melon ICuGI database only | 1684 | 2.5 | 2143 | 0.2 | 2 | 0.0 |
| C. Distribution of genotypes in 454 | | | | | | |
| Contigs of PI414723 and ‘Dulce’ | 24,610 | 37.4 | 1,084,362 | 94.4 | 13,173 | 98.2 |
| Contigs of PI414723 only | 15,786 | 24.0 | 24,199 | 2.1 | 101 | 0.8 |
| Contigs of ‘Dulce’ only | 25,397 | 38.6 | 39,953 | 3.5 | 142 | 1.1 |
| D. Distribution of fruit tissues in 454 | | | | | | |
| Contigs of rind and flesh | 24,879 | 37.8 | 1,081,470 | 94.2 | 13,098 | 97.6 |
| Contigs of rind only | 21,681 | 33.0 | 39,526 | 3.4 | 244 | 1.8 |
| Contigs of flesh only | 19,233 | 29.2 | 27,518 | 2.4 | 74 | 0.6 |
The mapping to the publicly available melon sequences (available at http://www.icugi.org as version 2 [verified 10 Jan. 2011]) is presented in Table 2B. In total, 77% of the 454 reads (886,004 reads assembled into 11,093 nonsingleton contigs) directly matched a specific melon transcript in the ICuGI database with a p-value of less than 9 × 10^−5. Over 22% of the 454 reads (262,510 reads assembled into 54,700 contigs) did not align to the melon collection of the ICuGI database (which in version 2 comprises circa one-third of fruit-derived ESTs) and, hence, potentially identified novel transcripts. On the other hand, only 1684 contigs from the melon ICuGI database did not align to the 454 data and over 75% of those originated from nonfruit libraries, including root, phloem, and other vegetative tissues, as described on the ICuGI website. The remainder were derived from fruit libraries representing genotypes other than the two studied here. In addition, combining the GS FLX sequences with the ICuGI database sequences enabled contig joining, thereby decreasing the number of large contigs to 8619.
The distribution of reads between genotypes and tissues is presented in Tables 2C and 2D. A comparison of the reads between the two melon genotypes demonstrates that 1,084,362 (94.4%) reads, assembled into 24,610 (37.4%) contigs, were common to the two tested genotypes, thus providing a rich data source for single nucleotide polymorphism (SNP) discovery. A total of 1,081,470 (94.2%) reads, assembled into 24,879 (37.8%) contigs, were common to both the rind and the flesh (Table 2D), while the remainder were specific to either the rind or the flesh. Most of the common contigs (circa 98%), both between genotypes and between tissues, were large. Of the contigs distinguishing between genotypes (5.6% of the reads), 1.9% were large contigs (Table 2), circa 25% were contigs of 2 to 10 reads, and the rest were singletons. Similar values were obtained for tissue-specific contigs. The large contigs are expected to represent genes expressed exclusively in one genotype or tissue and are therefore of special interest (examples of these are presented in Supplemental Fig. S2, e.g., 2A #11 and 2B #35). On the other hand, the singletons are expected to represent genes expressed at a low level that by chance were detected in one genotype or tissue and not in the other. The number of the latter, and of short sequences that were not assembled, is expected to decrease as the number of reads increases. Only 1587 of the reads (0.13%) were mapped to different plant viruses, suggesting slight viral contamination of the field-grown fruit, but these might be useful for an ecological characterization of the viral metagenomic population. A similar percentage of contamination was detected by Gonzalez-Ibeas et al. (2007).
Quality Validation of the RNA Sequencing
To evaluate the quality of the data, correlation analysis among all melon EST libraries, from both this study and the melon ICuGI database, was performed (Fig. 3). The ICuGI libraries included fruit, leaf, cotyledon, root, and phloem libraries. As expected, the closer the libraries in terms of genotype, tissue, or developmental stage, the higher the correlations obtained. Highest correlations were obtained between the replications of the same mRNA source, as described above: 0.96 for ‘Dulce’ 40-d flesh (40F) (reproducibility experiment) and 0.95 for ‘Dulce’ 40R (comparison of cDNA protocols). A high correlation was also detected between libraries of the same developmental stage: 0.87 for 20 DAA flesh samples of the two genotypes and 0.86 for two consecutive samples (10 and 20 DAA) of ‘Dulce’. In contrast, lowest correlations were detected between the vegetative and fruit libraries.
Digital expression profiling was performed based on counting the reads clustered in a gene in each library. Altogether, the expression analysis was based on 1,150,657 reads obtained from four full plates with an average of approximately 76,000 reads per library (Supplemental Table S5). To validate the quantitative measures of transcript abundance using this method, expression patterns of 40 selected genes were compared with expression profiles obtained by RT-qPCR using the same cDNA from all 14 libraries (Table 3 and Fig. 4 and 5). The counts in the 454 data varied from 26 to 4327 reads for the 40 genes, and 4327 was the highest count for a single gene. Half of the genes were chosen from the highly abundant transcripts (Table 3: #1–9 from young fruits and #21–31 from mature fruits) and the other half were selected arbitrarily. The normalized expression patterns obtained by both methods were compared using the Pearson correlation coefficient (Table 3 and Fig. 4 and 5). In general, high positive correlations were found between the patterns obtained by both techniques, irrespective of the number of reads: >0.90 for 16 of the genes, >0.70 for 17 of the genes, and >0.50 for five genes (Table 3; Fig. 4). The correlations for only two genes were lower than 0.50 (0.46 and 0.41). Pearson correlation coefficient was also calculated for the non-normalized expression values and correlations were similarly high (Table 3 and Supplemental Fig. S2).
| No. | Accession no. | Gene or enzyme | 454 reads | PC | Abbreviation |
|---|---|---|---|---|---|
| A. Genes upregulated in young fruits | | | | | |
| 1 | AF230211 | Translationally controlled tumor protein | 4136 | 0.86 | TCTP |
| 2 | DQ288986 | Alcohol dehydrogenase | 3696 | 0.91 | ADH 1 |
| 3 | AM723137 | Catalase isozyme 3 | 3131 | 0.87 | CAT3 |
| 5 | EB715966 | No hits found | 2584 | 0.84 | – |
| 7 | DV632484 | Ribulose bisphosphate carboxylase | 2393 | 0.79 | RBCS |
| 9 | DV634305 | Chlorophyll a/b-binding protein | 1685 | 0.91 | CAB1 |
| 11 | DV632443 | Lipid transfer protein isoform 4 | 1473 | 0.91 | LTP |
| 13 | DV634376 | Chlorophyll a/b-binding protein | 1033 | 0.90 | CAB2 |
| 14 | DV632050 | Elongation factor 1-alpha | 896 | 0.41 | EF1A |
| 15 | DV633092 | Vacuolar cation/proton exchanger 1a | 739 | 0.88 | CAX |
| 16 | DV634937 | Elongation factor 1-gamma | 513 | 0.64 | EF1B |
| 17 | AM718345 | Clathrin heavy chain | 349 | 0.50 | Clathrin |
| B. Genes upregulated in mature fruits | | | | | |
| 23 | AF426403 | Abscisic acid response protein | 4019 | 0.95 | ASR1 |
| 24 | DV632379 | Type-2 metallothionein | 3159 | 0.92 | Type-2 MT |
| 25 | Z70522 | Major latex protein | 2996 | 0.94 | pMEL7 |
| 27 | DV633298 | No hits found | 2788 | 0.88 | – |
| 28 | DV633532 | Stearoyl-acyl carrier protein desaturase | 2666 | 0.99 | DES |
| 29 | DV632631 | Acyl carrier protein | 2458 | 0.93 | ACP |
| 30 | DV634049 | ECERIFERUM 1, octadecanal decarbonylase | 1921 | 0.94 | CER1 |
| 35 | DV631909 | Ripening-related protein grip22 | 861 | 0.87 | Kiw |
| 36 | DV634429 | Sucrose-phosphate synthase 1 | 417 | 0.70 | SPS1 |
| 37 | DV632450 | No hits found | 192 | 0.86 | – |
| 38 | DV633411 | UDP-glucose 4-epimerase 1 | 173 | 0.68 | UGE1 |
| 40 | DV632398 | No hits found | 47 | 0.75 | – |
Especially high correlations (>0.90; Table 3 and Fig. 4) were detected for genes expressed strongly in a particular library, such as those with strong expression at the mature stage (PI414723 30 DAA or ‘Dulce’ 40 DAA). These included, for example, genes encoding ACC oxidase (pMEL1; Balague et al., 1993), major latex protein and alcohol acetyltransferase (pMEL7 and pMEL2; Aggelis et al., 1997), and expansin (Rose et al., 1997), which have been reported in the early literature as being highly expressed genes in melon. The highly expressed genes also included genes for volatile metabolism, such as alcohol acetyltransferase (Aharoni et al., 2000; Shalit et al., 2001; Yahyaoui et al., 2002) and branched-chain amino acid transferase (Gonda et al., 2010; Hadfield et al., 2000), which characterize the ripening, aromatic and climacteric melon fruit, as well as other genes associated with ripening components of sugar accumulation (sucrose phosphate synthase) (Burger et al., 2006; Lester et al., 2001) and fruit softening (β-1,3-glucanase) (Nishiyama et al., 2007; Rose et al., 1998). Reciprocally, the highly expressed genes in young fruit libraries included photosynthetic genes (chlorophyll a/b-binding protein and ribulose bisphosphate carboxylase) and dehydrin (Porat et al., 2004). Recently, Ando and Grumet (2010) reported the transcriptome of young (8 d) cucumber (Cucumis sativus L.) fruit, and the highly expressed genes reported there overlapped with many of the highly expressed genes of young fruit in our study.
The transcriptomic results presented in this study were obtained from nonamplified, non-normalized cDNA libraries. In general, transcriptome studies that have been recently presented based on deep-sequencing technologies (Ando and Grumet, 2010; Folta et al., 2010; Guo et al., 2010) have relied on cDNA libraries prepared with an amplification step to overcome the problem of low RNA concentration. This step may affect the relative representation of genes and, recently, Wang et al. (2009b) and Metzker (2010) have emphasized the importance of a nonamplified template source for quantitative applications such as RNA-Seq, so as not to alter the representational abundance of mRNA molecules.
The results presented here are even more significant in light of the fact that, in general, mature fleshy fruit poses a challenge for the preparation of an adequate amount of high-quality RNA (Gasic et al., 2004; Pandit et al., 2007; Shellie et al., 1997). This can also be seen in the RNA-quality data of the different libraries studied, whereby the RNA derived from young fruit was of higher quality than that from mature fruit (Supplemental Table S1). To date, RNA preparation from fruit for next-generation transcriptome sequencing has been reported for immature cucumber fruit (Ando and Grumet, 2010), strawberry (Fragaria ×ananassa Duchesne ex Rozier) (Folta et al., 2010), grape (Vitis vinifera L.) (Bellin et al., 2009) and olive (Olea europaea L.) (Alagna et al., 2009). In all cases, amplification and normalization steps were implemented. With the expected increase in next-generation transcriptome sequencing, the described method and results of this study should serve as a useful tool for the investigation of expression patterns and sequence polymorphisms of agriculturally important crop plants, including many fleshy fruit in general, and cucurbit fruit in particular.
Supplemental Information Available
Supplemental material is available free of charge at http://www.crops.org/publications/tpg.