Previous large-scale expression profiling studies in maize have been performed with multiple oligonucleotide microarray platforms (Barr et al., 2010; Gardiner et al., 2005; Hayes et al., 2010; Liu et al., 2008; Sawers et al., 2007; Sekhon et al., 2011; Skibbe et al., 2009; Wang et al., 2010; Zheng et al., 2010; Zhu et al., 2009). These arrays are limited to measuring known sequences and thus cannot fully sample the transcriptome. Recent advances in sequencing technology have led to improved methods for expression profiling through ultrahigh throughput sequencing of messenger ribonucleic acid (mRNA). The RNA-seq method of whole transcriptome profiling involves converting mRNA into complementary DNA (cDNA), which then is sequenced in a massive and parallel manner involving short reads of 36 to 150 bp (Wang et al., 2009b). Short read mapping algorithms (Langmead et al., 2009; Trapnell et al., 2009) and the ability to convert millions of mapped reads to normalized expression values relative to annotated or unannotated gene models (Trapnell et al., 2010) allow for robust expression estimates in organisms with sequenced genomes. Thus, not only does RNA-seq data provide detailed expression phenotypes for genes but it also is informative for improving the quality of gene model annotation.
Though a few RNA-seq studies in maize have been completed (Eveland et al., 2010; Li et al., 2010; Wang et al., 2009a), none have thoroughly examined diverse reproductive tissues. To explore the maize reproductive transcriptome, we generated 10.5 Gbp of whole transcriptome sequence from 12 reproductive tissues from the reference B73 maize inbred line that represent male, female, and developing seed tissue types as well as leaves to provide a contrasting tissue. The transcriptomes of these diverse tissues represent 31,239 genes (79.2%) from the maize v2 annotation (Maize Genome Sequencing Project, 2011; Schnable et al., 2009). We have analyzed these data in the context of gene family expression, differential expression, tissue restricted expression, gene coexpression networks, and network relationships to previously characterized genes and genes of unknown function. This dataset and subsequent analyses provide an invaluable source of information for the community including gene family classification, tissue specific transcriptional networks, and candidate genes for targeted work in improving maize grain yield.
Materials and Methods
Plant Materials and RNA Isolation
Plants of the reference maize genotype, B73, were grown under field conditions at an average density of 37,000 plants ha−1 in East Lansing, MI, during the summer of 2009. The soil type at this location is Marlette fine sandy loam, 2 to 6% slopes. Before planting, fertilizer 19–19–19 N–P–K was applied at a rate of approximately 454 kg ha−1. Five weeks after emergence, one tablet of slow release fertilizer (15–11–8, Osmocote plus, Scotts, Marysville, OH) was applied to each plant. Thirteen tissues described in Table 1 were harvested and immediately frozen in liquid N. An independent sample of leaf tissue grown under growth chamber conditions (26/24°C day/night temperatures and 12 h photoperiod) was harvested to assess the repeatability of the data. Total RNA was isolated using a modified hot phenol method. One gram of tissue was ground with liquid N and immediately transferred into a mixture of prewarmed phenol (pH 4.3) and extraction buffer (100 mmol LiCl, 100 mmol Tris pH 8.5, 10 mmol ethylenediaminetetraacetic acid, 1% sodium dodecyl sulfate, and 15 mmol dithiothreitol), vortexed, and incubated at 60°C for 20 min. Chloroform:isoamyl alcohol (24:1) was added and the mixture was vortexed and centrifuged at 11,000 rpm for 10 min at 4°C. The aqueous phase was transferred into phenol:chloroform:isoamyl alcohol (25:24:1), vortexed, and centrifuged at 11,000 rpm for 10 min at 4°C. The previous step was repeated twice with phenol:chloroform:isoamyl alcohol (25:24:1) and twice with chloroform:isoamyl alcohol (24:1). The RNA pellet was precipitated with 4 M LiCl, washed with 70% ethanol, and dissolved in diethylpyrocarbonate water. Genomic DNA was removed by deoxyribonuclease (DNase) treatment using the DNA-Free kit (Ambion, Austin, TX). RNA quantity was determined using a Nanodrop (Thermo Scientific, Wilmington, DE), and RNA quality was checked using an Agilent bioanalyzer (Agilent Technologies, Santa Clara, CA).
|Tissue name||Tissue description||Time of harvest|
|Leaves (V2)||Pooled leaves||20 d after sowing|
|Preemergence cob (V15)||Immature cob||10 d before cob emergence|
|Postemergence cob (VT)||Whole cob||Silk emergence|
|Silk (R1)||Mature silk||Anthesis|
|Ovule (R1)||Mature ovule with stigma and style removed||Anthesis|
|Preemergence tassel (V18)||Post meiotic whole tassel including anthers||10 d before tassel emergence|
|Postemergence tassel (VT)||Post meiotic whole tassel including anthers||Tassel emergence from whorl|
|Whole Anthers (VT)||Whole anther including pollen dissected from tassel structural tissue||Anthesis|
|Pollen (VT)||Mature pollen||Anthesis|
|Seed 5 DAP (R1)||Whole seed||5 DAP|
|Seed 10 DAP (R2)||Whole seed||10 DAP|
|Embryo 25 DAP (R4)||Developing embryo||25 DAP|
|Endosperm 25 DAP (R4)||Developing endosperm||25 DAP|
RNA Sequencing Library Construction and Illumina Sequencing
Approximately 5 to 10 μg of total RNA was used for construction of each RNA-seq library. Polyadenylated RNA purification, RNA fragmentation, cDNA synthesis, and polymerase chain reaction (PCR) amplification was performed according to the Illumina RNA-seq protocol (Cat # RS-100-0801, Illumina, Inc., San Diego, CA). Parallel sequencing was performed using an Illumina Genome Analyzer II (Illumina, Inc., San Diego, CA) at the Research Technology Support Facility at Michigan State University (East Lansing, MI) and the Institute for Genome Sciences at the University of Maryland School of Medicine (Baltimore, MD). Single-end sequence reads between 35 and 40 bp were generated, and RNA-seq read quality was evaluated based on the Illumina purity filter, percent low quality reads, and distribution of phred-like scores at each cycle. All data presented passed the quality control filtering based on these metrics. Sequences are available in the Sequence Read Archive at the National Center for Biotechnology Information (Sequence Read Archive study number SRP006463; NCBI, 2011).
RNA Sequencing Data Analysis
Sequence reads for each tissue sample were mapped to v2 of the B73 reference genome (AGPv2; Maize Genome Sequence Project, 2011; Schnable et al., 2009), hereafter referred to as v2 pseudomolecules, using the quality and splice site aware alignment algorithms, Bowtie version 0.12.7 (Langmead et al., 2009) and TopHat version 1.2.0 (Trapnell et al., 2009) in conjunction with SAMtools version 0.1.7 (Li et al., 2009). The minimum and maximum intron length was set to 5 and 60,000 bp, respectively; all other parameters were set to the default values.
Normalized gene expression levels were calculated using Cufflinks version 0.9.3 (Trapnell et al., 2010) and are reported as fragments per kilobase pair of exon model per million fragments mapped (FPKM). The maximum intron length parameter was set to 60,000 bp. A reference annotation was supplied based on the AGPv2 5b filtered gene set (Maize Genome Sequence Project, 2011), hereafter referred to as v2 annotation, and the v2 pseudomolecules were provided to permit the use of the bias detection and correction algorithm. The quartile normalization option was used to improve differential expression calculations of lowly expressed genes. All other parameters were used at the default settings. Raw FPKM values for all tissues are provided (Supplemental Table S1). Cufflinks provides the upper and lower bound FPKM values for the 95% confidence interval of the abundance of each gene. For summary statistics of the number of genes expressed in each tissue and genes with expression restricted to a single tissue within this study, a gene was considered expressed in a given tissue if the FPKM 95% confidence interval lower boundary was greater than zero. If a gene had a lower boundary value greater than zero in only one tissue, that gene was classified as having expression restricted to that tissue within this study. To evaluate the effect of sampling depth on expression estimates, 5, 10, and 15 million reads were randomly selected from the total pool of reads for all tissues, and for tissues with sufficient reads, 20 and 25 million reads were also selected.
Pearson product-moment correlation analyses of log2 FPKM values among RNA-seq libraries were performed using PROC CORR with SAS version 9.2 (SAS Institute, 2003), and all log2 FPKM values less than zero were set to zero. Only tests significant at p = 0.05 are shown. The heat map of correlation values was clustered with hierarchical clustering using a Pearson correlation distance metric and average linkage, and bootstrap support values were calculated from 1000 replicate searches using Multiple Experiment Viewer Software (MeV) version 4.5 (Saeed et al., 2003). Differential expression analysis was conducted using the cuffdiff program within Cufflinks version 0.9.3 (Trapnell et al., 2010) utilizing the mapping results described above. For pairwise gene comparisons between two tissues, cuffdiff first calculates the variance of the transcript abundance in each tissue accounting both for the variation across replicates and the uncertainty in the expression estimate based on mapping uncertainty. When replicates are not available, a Poisson distribution is assumed, and the variation of the count across replicates is equal to the mean of the count. The variance for a gene's abundance is then determined using the variance in abundance for each isoform of that gene. The test statistic T is calculated as the log of the ratio of FPKM values for the two tissues divided by the variance of the log of the ratio. A two-tailed Student's t test is then used to determine if genes are differentially expressed. For additional information on how the differential expression test is conducted see Trapnell et al., 2010. The minimum number of alignments at a gene required to test was set to 100. Quartile normalization and a false discovery rate of 0.01 after Benjamini-Hochberg correction for multiple testing were used. The v2 pseudomolecules and v2 annotation were provided as input parameters. All other parameters were used at the default levels.
Wiggle tracks were generated to visualize read coverage across the genome using the wiggles program within TopHat version 1.2.0 (Trapnell et al., 2009). The wiggle tracks were displayed on the Integrated Genome Browser version 6.4.1 (Nicol et al., 2009) using the GFF file for the v2 annotation from maizesequence.org (Maize Genome Sequence Project, 2011).
RNA Sequencing and Microarray Comparative Analyses
For correlation analyses of RNA-seq to microarray, gene expression data from an extensive maize developmental tissue atlas (Sekhon et al., 2011) was used. Because this microarray data set was analyzed against version 1 (v1) of the reference genome (Schnable et al., 2009), FPKM values for the 13 tissues in this dataset were determined using the v1 pseudomolecules (Mazie Genome Sequence Project, 2011) and the 4a.53 filtered gene set (Maize Genome Sequence Project, 2011), hereafter referred to as v1 pseudomolecules and v1 annotation, with the same parameters as described above. For the RNA-seq to microarray comparative analysis only, genes with more than one predicted isoform were removed from both RNA-seq and microarray datasets. The remaining genes present in both datasets were used for downstream analyses. The FPKM values and probe intensities were log2 transformed, and Spearman rank correlations were performed using PROC CORR (SAS Institute, 2003). Only tests significant at p = 0.05 are shown. Spearman rank correlations, rather than Pearson correlations, were used due to the discrepancy in scale between the platforms.
Identification and Analysis of Paralogous Gene Families
Paralogous gene families were identified by OrthoMCL (Chen et al., 2007; Li et al., 2003) with default parameters. To eliminate the situation where alternative isoforms of one gene are grouped in different clusters, only peptide sequences of the representative gene models of the v2 annotation were used. A representative gene model was defined as the gene model that produces the longest peptide sequence among all alternative isoforms of a gene. OrthoMCL gene family assignments for all maize genes are provided (Supplemental Table S1). Pearson product-moment correlations were performed for log2 FPKM values of all gene pair combinations within the same gene families using a custom java script, and a random distribution of Pearson correlation coefficients (PCCs) was generated from correlations of 100,000 gene pairs not in the same gene family. The heat map of FPKM values was generated with MeV version 4.5 (Saeed et al., 2003).
Gene Coexpression Network Analysis
Gene expression values from the 13 tissues in this dataset were used to generate gene networks and to identify modules of highly correlated genes using the weighted gene coexpression network analysis method (WGCNA) (Zhang and Horvath, 2005). All gene FPKM expression values were log2 transformed, and any log2 FPKM values less than 1 were set to 0. Genes without variation across tissues were filtered out using a coefficient of variation (CV = σ/μ) cutoff; genes with a CV less than 1.0 were discarded from further network analysis. An R version 2.10.1 (R Development Core Team, 2011) implementation of the WGCNA was then used to identify gene modules (Langfelder and Horvath, 2008). Signed networks were constructed using Pearson correlation values for all filtered gene expression values. The β and treecut parameters that are used within WGCNA were chosen as 13 and 0.7, respectively. Eigengenes were calculated using the function moduleEigengenes within the WGCNA package (Langfelder and Horvath, 2007). Eigengenes are the first right-singular vector of the singular value decomposition of the module expression data. The heat map of eigengenes for each gene module was constructed using MeV version 4.5 (Saeed et al., 2003). Previously characterized maize genes that have been associated with the maize v2 annotation (Maize Genome Sequence Project, 2011; Schnable and Freeling 2011) were crossreferenced with the genes from each coexpression module.
Results and Discussion
Maize Tissue and RNA Sequencing Data Descriptions
The tissues selected for RNA-seq profiling represent different organ types as well as samples of some tissues at multiple developmental time points. The tissues used in this experiment represent four larger groups: male tissues (pre- and postemergence tassels, whole anthers, and pollen), female tissues (pre- and postemergence cobs, mature silks, and ovules), seed tissues (whole seed 5 and 10 d after pollination [DAP], embryo, and endosperm 25 DAP), and a vegetative tissue (leaves) (Table 1). Leaves were included to provide a basis for comparison of reproductive and nonreproductive tissue. This dataset provides an in-depth survey of the maize transcriptome in reproductive tissues.
The potential for biases in comparisons between the libraries was determined based on the total number of purity-filtered reads, the percentage of reads mapped to the v2 pseudomolecules, and the relationship between read depth and tissue complexity as measured by the number of expressed genes. Purity filtered reads ranged from 17.1 to 29.9 million reads per sample and the percentage of reads mapped per library ranged from 84.6 to 90.6% (Table 2) indicating a similar performance of library construction and sequencing across the samples. In addition, no relationship was observed between number of genes detected as expressed and number of reads sampled across a range of tissues (Supplemental Fig. S1). Randomly selected subsets of the total pool of reads for each tissue ranging from 5 to 25 million reads were used to further evaluate the effect of sampling depth on gene expression detection (Supplemental Fig. S2). The simulation demonstrates a clear positive relationship between sampling depth and numbers of expressed genes at lower sequencing depths (5 to 15 million reads). The number of expressed genes, however, begins to plateau at approximately 15 million reads, corresponding to the minimum sampling depth of all libraries in this study. To assess the repeatability of this data in an independent experiment, RNA from an additional leaf sample at approximately the same developmental time point as the field grown leaf sample was also sequenced. The correlation between the two leaf samples was R2 = 0.90 providing evidence for the repeatability of this dataset.
|Tissue||Number of purity filtered reads||Number of mapped reads||Maximum FPKM||Number of expressed genes||Number of tissue restricted genes|
|Leaves||17.1 million||14.5 million||128,735||21,956||492|
|Preemergence cob||18.3 million||16.0 million||17,256||23,338||70|
|Postemergence cob||18.3 million||15.8 million||17,784||23,948||44|
|Silk||18.9 million||16.8 million||257,614||23,015||138|
|Ovule||29.9 million||26.3 million||33,621||25,003||108|
|Preemergence tassel||19.8 million||17.0 million||24,576||25,165||88|
|Postemergence tassel||18.6 million||16.1 million||35,766||24,984||42|
|Whole anthers||27.0 million||24.0 million||127,465||22,178||96|
|Pollen||27.9 million||25.3 million||1,004,070||13,418||206|
|Seed 5 DAP||19.2 million||16.5 million||22,966||24,390||37|
|Seed 10 DAP||26.4 million||22.9 million||26,927||24,486||105|
|Embryo 25 DAP||19.9 million||17.0 million||36,127||22,493||188|
|Endosperm 25 DAP||23.6 million||20.3 million||283,698||22,887||251|
To further assess the quality of our datasets, we evaluated the expression of eight known genes with characterized expression patterns. As expected based on previous reports, opaque endosperm2 and shrunken-2 were expressed nearly exclusively in the endosperm 25 DAP tissue (Bhave et al., 1990; Schmidt et al., 1990), leafy cotyledon1 was expressed predominantly in the embryo 25 DAP tissue (Lotan et al., 1998), NADP malic enzyme3 was highly expressed in the leaves (Ku et al., 1996), branched silkless1, although at low levels, was expressed in pre- and postemergence cob tissues (Chuck et al., 2002), male gametophyte specific1 and tassel serine threonine kinase1 were both highly expressed in all of the male tissues (Fu et al., 2001; Hamilton et al., 1989), and waxy1 was expressed in endosperm 25 DAP, pollen, ovule, and floral structures containing these tissues (Nelson and Rines, 1962) (Supplemental Table S2). Thus, our tissues, resulting libraries, and expression sets are consistent with previous studies.
RNA Sequencing Transcriptome Profiles
Summary statistics revealed transcriptional diversity across reproductive tissues. In all tissues, the minimum FPKM values were zero while maximum FPKM values ranged from 17,256 in preemergence cob to 1,004,070 in pollen (Table 2). Intriguingly, phenotypically unique tissues such as mature silk, pollen, and endosperm 25 DAP had the most extreme range of expression levels. Of the top five most highly expressed genes in pollen, two encode a pollen-specific protein C13 (GRMZM2G022347 and GRMZM2G317406), one encodes an arabinoglucan protein 6 (GRMZM2G175243), and the remaining two encode conserved genes of unknown function (GRMZM2G041448 and GRMZM2G073155). Tissues with extreme maximum FPKM values were also extreme for other summary metrics. For example, pollen had the fewest number of genes expressed among all tissues with only 13,418 genes expressed (Table 2) consistent with a previous microarray study that showed only 10,539 transcripts expressed in mature pollen (Ma et al., 2008). Similarly, within reproductive tissues, endosperm 25 DAP had the most genes with expression restricted to one tissue type, that is, genes with expression in only one of the tissues included in this study. In contrast to reproductive tissues, the leaf sample had the most genes overall with expression restricted to a single tissue within this study (Table 2).
These differences in transcriptome profiles were reflected in correlation and cluster analyses among tissues (Fig. 1). Overall, PCC values ranged from 0.33 (leaves vs. pollen) to 0.97 (preemergence cob vs. postemergence cob). As expected, leaf tissue was poorly correlated with most reproductive tissues (PCC < 0.65) and showed the highest correlation with mature silks (PCC = 0.68). Among female tissues, the two stages of cob and ovule were all highly correlated with each other, and mature silk was more similar to developing seed than to female parts. Correlations greater than 0.73 were observed among tassels and anthers, which is not surprising given that anthers were included in the tassel samples. Pollen had the most unique transcriptome with PCC < 0.5 in all comparisons to nonmale tissues. Finally, among seed tissues, the early development stages (whole seed 5 and 10 DAP) showed high correlation to each other (PCC = 0.95) but lower correlations to endosperm 25 DAP (PCC = 0.78 and 0.80) and to developing embryo (PCC = 0.74 and 0.75). Interestingly, the relatively high correlations observed among cob, seed, ovule, and embryo tissues (PCC > 0.80) indicate a substantial subset of shared gene expression patterns in these tissues despite the cob being exclusively maternally derived and the seed and embryo being both maternally and paternally derived.
Comparative Analysis between RNA Sequencing and Microarray Data
We compared our RNA-seq data with a similar, microarray-based genome-wide expression dataset from maize (Sekhon et al., 2011). The microarray study assayed gene expression in 60 tissues including 11 that are developmentally similar to our RNA-seq tissues; pollen and postemergence tassel were the two samples in the RNA-seq dataset without similar tissues in the microarray data. Because the microarray data analysis used v1 annotation, we used FPKM values for the RNA-seq libraries calculated relative to this gene set as well. Only representative isoforms were used and all genes with more than one predicted isoform were excluded from the analysis. The resulting intersection of genes without predicted isoforms and present in both the microarray and RNA-seq analyses was 17,573 or 54% of the v1 annotation.
A nonparametric correlation statistic was used to compare the RNA-seq and microarray expression datasets. Spearman rank correlation coefficients (SCCs) of log2 expression values were calculated between 11 corresponding tissue pairs in the two datasets as well as for three dissimilar tissue pairs as negative controls (Table 3). Among the 11 comparisons between similar tissues (Table 3), the SCC values ranged from 0.68 to 0.82. The highest correlations (SCC > 0.8) were observed between the two stages of cob and in tissues where the developmental stages were systematically measured as DAP, including whole seed and embryo. Lower SCC values (SCC < 0.8) were observed in leaves, silk, ovule, tassel, anther, and endosperm comparisons where developmental stages and tissue types were slightly different between the studies. Overall, correlation coefficients between analogous tissues (0.68 to 0.82) were greater than comparisons between dissimilar tissues (0.30 to 0.56) indicating that similar expression patterns for this subset of maize genes were detected by the two different platforms.
|RNA-seq library||Microarray experiment||SCC||Tissues types|
|Leaves (V2)||First leaf (V3)||0.76||Similar|
|Preemergence cob (V15)||Immature cob (V18)||0.82||Similar|
|Postemergence cob (VT)||Prepollination cob (R1)||0.81||Similar|
|Silk (R1)||Silks (R1)||0.78||Similar|
|Ovule (R1)||Whole seed 2 DAP (R1)||0.79||Similar|
|Preemergence tassel (V18)||Meiotic tassel (V18)||0.68||Similar|
|Whole anthers (VT)||Anthers (R1)||0.74||Similar|
|Whole seed 5 DAP (R1)||Whole seed 4 DAP (R1)||0.80||Similar|
|Whole seed 10 DAP (R2)||Whole seed 10 DAP (R2)||0.81||Similar|
|Embryo 25 DAP (R4)||Embryo 24 DAP (R4)||0.82||Similar|
|Endosperm 25 DAP (R4)||Endosperm 24 DAP (R4)||0.74||Similar|
|Pollen (VT)||Silks (R1)||0.30||Dissimilar|
|Leaves (V2)||Endosperm 24 DAP (R4)||0.51||Dissimilar|
|Embryo 25 DAP (R4)||Anthers (R1)||0.56||Dissimilar|
Scatter plots of log2 expression values from various tissue pairs are shown in Fig. 2. These plots illustrate a primary difference between the two expression platforms, which is the broader range of expression detected by RNA-seq compared to microarray (Agarwal et al., 2010) as seen in the low-end expression values between the platforms (Fig. 2). We observed RNA-seq values near zero (FPKM = 1) in the entire range of microarray intensity values similar to previous comparisons of the two technologies (Marioni et al., 2008). This could be caused by cross-hybridization of the oligonucleotide probes to multiple genes and/or background noise in the microarray experiments. This limitation of the microarray platform prohibits accurate detection of expression values for genes belonging to paralogous gene families, which are predominant in plant genomes (Spannagl et al., 2011), underscoring an advantage of RNA-seq for characterization of complete transcriptomes.
Gene Expression in Paralogous Gene Families
To investigate paralogous gene expression patterns in reproductive tissues, we constructed paralogous gene family groups using the OrthoMCL algorithm (Chen et al., 2007; Li et al., 2003) (Supplemental Table S1). Of the 39,456 predicted genes in the v2 annotation, approximately 85% (33,457) were assigned to 4975 paralogous gene families of two or more members. The distribution of gene family size (Fig. 3A) shows that a large number of gene families (1829) have two members reflective of an ancient whole genome duplication in maize and previously identified nearly identical paralogs (Schnable et al., 2011, 2009). The remaining gene families (3029) contain 3 to 30 members with 117 gene families containing >30 members. The largest gene family includes 1435 genes, many of which have annotations related to phosphorylation or protein kinase activity. For all genes, we compared qualitative gene expression levels in our RNA-seq dataset with assignment to gene families (Fig. 3B). Of the 6161 single copy genes, 68% had measurable expression in one or more tissues compared to 81% of paralogous genes. Genes classified as not expressed in any of our tissues (8217 total) may be expressed in tissues not represented in our study or expressed below our detection or sampling limits, annotation artifacts, or induced by external stimuli.
It is hypothesized that genes in large gene families exhibit more diversified expression patterns than genes in smaller gene families due to evolutionary mechanisms such as subfunctionalization, neofunctionalization, or pseudogenization (Force et al., 1999; Yim et al., 2009). To assess the similarity of expression patterns among gene family members in the context of flower and seed development, we calculated pairwise Pearson correlations of log2 FPKM expression values for all gene combinations within each family. Distributions of PCC values (Fig. 3C) show that as many as 65% of within-family gene pairs are correlated at ≥0.5 in two-member families. This proportion steadily decreases as gene family size increases thus supporting this hypothesis. In the largest gene families tested (gene family size of 21–30 genes), 31% of gene pairs had PCC values greater than or equal to 0.5, which is higher than expected among random correlations (15%; Fig. 3C). This suggests that there are subsets of genes within all sizes of gene families that have retained conserved expression patterns in developing flower and seed tissues.
An example of divergent gene family expression in male reproductive tissues is a 15-member gene family annotated as a rapid alkalization factor (RALF; Fig. 3D). Five members show high levels of expression in pollen consistent with previously described roles for these proteins (Covey et al., 2010; Zhang et al., 2010) while the remaining members have low levels or complete lack of expression in pollen. These results demonstrate the utility of RNA-seq for deciphering complex expression patterns of large gene families in plant transcriptomes. More specifically, this dataset is valuable for identifying particular genes and gene family members involved with reproductive organ development in maize.
Differential Expression and Tissue Restricted Expression Across Tissues
To understand the global transcriptional profile differences of functionally diverse maize tissues, we characterized differential expression of genes expressed across multiple tissues, such as GRMZM2G019404, which encodes a plasma membrane adenosine triphosphatase (ATPase) (Fig. 4A) that is more expressed in leaves, preemergence tassel, and whole seed 5 DAP compared to all other tissues. Understanding differences in expression patterns between tissues provides insights into what makes these tissues unique. The number of differentially expressed genes between tissues ranged from 2653 (∼6.7% of the genes in the v2 annotation) for preemergence cob versus postemergence cob to 18,962 (∼48.1%) for pollen versus ovule (Table 4). Not surprisingly, the three comparisons with the fewest number of differentially expressed genes were those between tissues with repeated measures including cob (pre- and postemergence), tassel (pre- and post emergence), and whole seed (5 and 10 DAP). Comparisons between compound tissues and their component tissues such as postemergence cob versus ovule and ovule versus whole seed 5 DAP were also similar. For example, expression of a putative polyphenol oxidase is restricted to multiple female tissues including silk, ovule, and postemergence cob (Fig. 4B), consistent with its proposed role in oxidation of reproductive tissues after wounding (Sukalovic et al., 2010). Interestingly, 8 of the 10 pairwise tissue comparisons with the highest number of differentially expressed genes included pollen and the remaining two included whole anthers, a compound tissue that contains pollen (Table 4). This set of genes included a putative subtilisin-like protease (GRMZM2G006366; Fig. 4C) that has a rice (Oryza sativa L.) homolog known to be involved in anther development (Yoshida and Kuboyama, 2001). Due to limitations in the algorithm used for testing differential expression, genes with an expression value of zero in one of the tissues could not be tested. As pollen has fewer genes expressed than any other tissue, it also had the fewest tests conducted. However, it had the highest (89%) percentage of differentially expressed genes across all pairwise comparisons with the other 12 tissues. Whole anthers was the next highest tissue with 77% significant tests; the tissue with the lowest percentage of significant tests was whole seed 5 DAP with only 57%. These results lend support to the finding that for the few genes that are expressed in pollen relative to the other tissues (Table 2), the global expression patterns of those genes are grossly different than that in the other vegetative and reproductive tissues.
|Preemergence cob||Postemergence cob||Silk||Ovule||Preemergence tassel||Postemergence tassel||Whole anthers||Pollen||Seed 5 DAP||Seed 10 DAP||Embryo 25 DAP||Endosperm 25 DAP|
|Leaves||14,541 (75%)||14,195 (73%)||12,547 (69%)||14,930 (70%)||13,361 (70%)||12,795 (70%)||13,510 (83%)||13,578 (90%)||13,294 (69%)||14,022 (70%)||14,551 (77%)||13,793 (76%)|
|Preemergence cob||2653 (15%)||12,202 (63%)||8393 (41%)||13,677 (67%)||14,184 (70%)||15,664 (80%)||16,942 (90%)||10,869 (56%)||11,365 (57%)||9186 (50%)||12,527 (66%)|
|Postemergence cob||11,543 (60%)||6493 (32%)||12,869 (63%)||13,460 (67%)||15,494 (79%)||16,972 (90%)||9496 (50%)||10,519 (53%)||8689 (47%)||11,953 (63%)|
|Silk||11,049 (53%)||10,809 (56%)||11,123 (58%)||14,145 (76%)||15,966 (89%)||8781 (47%)||8694 (45%)||13,123 (69%)||12,623 (68%)|
|Ovule||12,263 (57%)||12,961 (60%)||16,316 (76%)||18,962 (90%)||8296 (40%)||10,472 (50%)||11,189 (54%)||12,649 (61%)|
|Preemergence tassel||3522 (19%)||12,514 (68%)||16,117 (88%)||10,160 (51%)||11,425 (56%)||13,691 (68%)||12,444 (64%)|
|Postemergence tassel||11,301 (66%)||14,942 (87%)||10,743 (55%)||11,972 (58%)||14,059 (71%)||12,891 (68%)|
|Whole anthers||9386 (83%)||14,811 (76%)||15,470 (76%)||15,161 (81%)||14,293 (80%)|
|Pollen||17,029 (90%)||17,886 (90%)||15,841 (90%)||15,275 (90%)|
|Seed 5 DAP||4810 (25%)||12,169 (63%)||12,083 (64%)|
|Seed 10 DAP||12,930 (65%)||12,533 (64%)|
|Embryo 25 DAP||11,088 (62%)|
In addition to differential expression between tissues, tissue exclusive or restricted gene expression is involved in distinguishing form and function of cells and ultimately whole tissues (Aoyama et al., 1995; Lafos et al., 2011; Lauter et al., 2005). Leaves had the most genes (492) with expression restricted to a single tissue within this study (Table 1; Supplemental Table S3). Excluding leaves, endosperm 25 DAP (251 genes), pollen (206 genes), and embryo 25 DAP (188 genes) had the most genes with expression restricted to a single tissue within this study (Table 1). An example of endosperm restricted expression within the tissues in this study is shown with GRMZM2G118205, which encodes a polycomb protein (Fig. 4D) known to be involved with imprinting processes in maternal tissues (Jullien and Berger, 2009). These differentially expressed genes and genes with expression restricted to a single tissue within this study are strong candidates for further analysis to resolve the molecular basis of maize flower and grain development.
Gene Coexpression Networks in Maize Reproductive Tissues
Gene network analysis is often performed with gene expression data to identify possible relationships between genes. Typically, gene networks are created based on correlation values derived from gene expression levels across a series of treatments or conditions. Although definitive interactions cannot be concluded from correlation analyses, gene networks do often reflect true biological relationships (Borghi et al., 2010; Ma et al., 2007; Mao et al., 2009; Mutwil et al., 2010; Ren et al., 2010; Xin et al., 2009). Here, we have used WGCNA to generate a correlation-based gene coexpression network and to identify modules of genes where all members of a module are more highly correlated with each other than they are to genes outside the module (Zhang and Horvath, 2005).
The 8751 genes that passed the CV filter, and thus have variation across tissues, were used for network analysis. Of the 8751 genes, a total of 6347 were assigned to 17 gene modules that contained between 49 and 1645 genes each and 2404 genes were not assigned to any module (Fig. 5; Supplemental Table S1). To visualize the expression patterns of the genes in each module, eigengenes for each module were calculated and displayed in a heat map (Langfelder and Horvath, 2007) (Fig. 5). The eigengene analysis indicates that each coexpression module represents a set of genes that are strongly expressed in only one or, at most, a few developmentally related tissues. The gene coexpression modules suggest relationships between larger tissue subgroups described in previous sections (Table 1; Fig. 1). For example, module 13 contains a large number of genes that are expressed in pre- and postemergence cobs as well as ovules and 25 DAP embryos. The coexpression analyses also highlight notable differences between similar tissues. Module 9 and module 10 contain relatively small numbers of genes that are almost exclusively expressed in either preemergence or postemergence cobs. Module 8 is the largest module with 1645 genes expressed in all male tissues sampled here while modules 2 through 5 each contain many fewer genes that are specifically expressed in preemergence tassels, postemergence tassels, anthers, and pollen, respectively.
A total of 121 previously characterized maize genes were found within 13 gene modules (Supplemental Table S4), and many are known to have tissue-specific expression similar to the patterns seen in the modules. For example, pollen extensin-like1, a pollen-specific cell wall extension gene (Rubinstein et al., 1995), was identified within module 8 that contains genes that are expressed in pollen containing male tissues. The gene viviparous1 codes for a transcription factor involved in seed development and maturation (Suzuki et al., 2003) and is a member of the 25 DAP embryo-specific module 16. Module 9 contains genes that are strongly expressed in preemergence cob tissue including the ramosa1 gene, which is a transcription factor that regulates early inflorescence development (Gallavotti et al., 2010). The identification of a number of known genes in coexpression modules that recapitulate expected expression patterns of those genes provides evidence for the biological relevance of these gene modules. Furthermore, the network analysis performed here provides a powerful tool for identifying candidate genes involved in related developmental pathways.
Ribonucleic acid sequencing-based expression profiling provides a unique opportunity to explore the transcriptional diversity among tissues on a global level. In this study, we have explored RNA-seq based transcriptional profiles for 12 reproductive tissues and one vegetative tissue in maize. Dramatic differences in whole transcriptome expression patterns were observed between these tissues, in particular for pollen, a tissue that is known to be unique both in terms of form and function.
In a comparative analysis with a previous microarray based expression study, we demonstrated the relative advantages of RNA-seq including a broader range of expression values and the ability to differentiate expression of paralogous genes. The capacity to distinguish members of genes families is of critical importance in plant species such as maize where the majority of genes are in paralogous families. Through the use of RNA-seq we were able to identify diverged expression patterns and genes with expression restricted to a single tissue as determined in this study within these families.
Using gene coexpression network analysis, we identified modules of genes that are expressed in very specific stages of floral development as well as genes that are more generally expressed across multiple male, female, or seed tissues. These analyses, as well as the differential gene expression and tissue restricted expression analyses, provide a core set of candidate genes for future analysis to further characterize maize reproductive development and grain yield potential.
Supplemental Information Available
Supplemental material is available free of charge at http://www.crops.org/publications/tpg.