Abstract
The appropriate timing of bolting and flowering is one of the keys to the reproductive success of Isatis indigotica. Several flowering regulatory pathways have been reported in plant species, but little is known about flowering regulation in I. indigotica. In the present study, we performed RNA-seq and annotated the I. indigotica transcriptome using RNA from five tissues (leaves, roots, flowers, fruit, and stems). Illumina sequencing generated 149,907,857 high-quality clean reads, and 124,508 unigenes were assembled from the sequenced reads. Of these unigenes, 88,064 were functionally annotated by BLAST searches against the public protein databases. Functional classification and annotation assigned 55,991 and 23,072 unigenes to 52 gene ontology (GO) terms and 25 clusters of orthologous group (COG) categories, respectively. A total of 19,927 unigenes were assigned to 124 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and 80 candidate genes related to plant circadian rhythm were identified. We also identified a number of differentially expressed genes (DEGs) and 91 potential bolting and flowering-related genes from the RNA-seq data. This study is the first to identify bolting and flowering-related genes based on transcriptome sequencing and assembly in I. indigotica. The results provide foundations for the exploration of flowering pathways in I. indigotica and investigations of the molecular mechanisms of bolting and flowering in Brassicaceae plants.
Isatis indigotica, a biennial herbaceous plant belonging to the Brassicaceae family, is distributed widely across the Chinese mainland. The dried roots and leaves of the plant, known in China as “Ban-Lan-Gen” and “Da-Qing-Ye,” respectively, have commonly been used as medicines for hundreds of years to treat mumps, febrile diseases, eruptive diseases, sore throat, and inflammatory diseases (National Pharmacopoeia Committee, 2015). Bolting and flowering are crucial developmental stages in the life cycle of I. indigotica. Premature bolting limits vegetative growth and reduces the yield and quality of medicinal products of I. indigotica.
Bolting and flowering are of great importance in the plant life cycle and mark the transition from vegetative growth to reproductive development. Proper timing of bolting and flowering is crucial for reproductive success and high productivity (Amasino and Michaels, 2010; Srikanth and Schmid, 2011). Interactions among central flowering genes regulate the transition from vegetative growth to reproductive development (Moon et al., 2005; Parcy, 2005). The genes involved in flowering control have been characterized in Arabidopsis thaliana and have been integrated into multiple flowering pathways, including the vernalization, aging, photoperiod, gibberellin (GA), and autonomous pathways (Amasino and Michaels, 2010; Srikanth and Schmid, 2011; Wang, 2014). In recent years, functional genes and regulatory pathways related to flowering time have also been investigated in many crops, including maize [Zea mays (Dong et al., 2012)], rice [Oryza sativa (Shrestha et al., 2014; Tsuji et al., 2011)], strawberry [Fragaria ×ananassa (Sánchez-Sevilla et al., 2014)], and soybean [Glycine max (Jung et al., 2012)].
RNA-seq is an important tool for obtaining transcripts from certain plant tissues under specific physiological conditions or at specific developmental stages (Strickler et al., 2012; Wang et al., 2009). Illumina (San Diego, CA) sequencing techniques have increased gene discoveries in the life sciences in recent years (Zhang et al., 2013). RNA-seq technology has been extensively used in transcriptome analyses of model plant species, such as A. thaliana (Zhu et al., 2013), maize (Dukowic-Schulze et al., 2014; Thakare et al., 2014), soybean (Stamm et al., 2014), rice (Huang et al., 2014), and non-model plant species, such as wild strawberry [Fragaria vesca (Mouhu et al., 2009)] and radish [Raphanus sativus (Nie et al., 2016)]. Tang et al. (2014) conducted transcriptome sequencing of I. indigotica by using Illumina technology to determine the genes involved in the biosynthesis of the active ingredient and its derivatives at the vegetative growth stage. There is still a lack of studies of the molecular mechanisms of bolting and flowering regulation of I. indigotica, and further transcriptome sequencing studies of I. indigotica at reproductive stages are needed.
In this study, Illumina technology was used to sequence mRNAs of I. indigotica from various tissues (flowers, leaves, stems, roots, and fruit) to investigate the molecular mechanisms of bolting and flowering regulation of I. indigotica. On the basis of unigene assembly and analysis of DEGs, several candidate genes related to the flowering pathway were analyzed by quantitative real time polymerase chain reaction (qRT-PCR). The results of this study will enhance our understanding of the bolting and flowering-time regulatory networks in I. indigotica and provide clues for further investigations on the molecular genetic mechanisms underlying bolting and flowering regulation in the Brassicaceae plants.
Materials and Methods
Plant materials and RNA isolation.
The experiment was carried out in the greenhouse of Nanjing Agricultural University (Nanjing, China). Seeds of I. indigotica cultivar Bozhou were sown in plastic pots (29.6-cm diameter, 19.7-cm height) containing a compost of humus and vermiculite at a ratio of 1:1 on 8 Sept. 2015. Plants were grown under natural light at 25 °C maximum and −2 °C minimum with 60% to 75% air humidity. The plants were irrigated with distilled water once every 5 d until they were sampled. The plants were sampled when some fruit had appeared and flowering was still ongoing. A total of 500 mg of different tissues of I. indigotica, including roots, stems, leaves, fruit, and flowers, was collected on 7 Apr. 2016 and then stored at −80 °C until analyzed. Three biological replicates from distinct plants were harvested for each tissue. Total RNA was extracted from each tissue using the Trizol plus kit (Biouniquer, Nanjing, China) and treated with DNase I to remove contaminating DNA. We used the 2100 Bioanalyzer (Agilent Technologies, San Francisco, CA) to analyze the quality and integrity of the DNase I–treated RNA.
RNA-seq library construction and sequencing.
mRNAs were purified from total RNA using the Oligotex mRNA Midi Kit (Qiagen, Dusseldorf, Germany), quantified using a spectrophotometer (Nano-Drop 2000; Thermo Scientific, Waltham, MA), and used to generate the cDNA library according to the Illumina manufacturer’s instructions. Briefly, fragmentation buffer was added to break the mRNA into short fragments. Random hexamer primers were added to these short fragments to synthesize the first-strand cDNA. The second-strand cDNA was synthesized using the Super-Script double-stranded cDNA synthesis kit (Invitrogen, Carlsbad, CA) and purified with a QiaQuick PCR extraction kit (Qiagen).
De novo assembly of transcriptome.
Transcriptome de novo assembly was conducted using the Trinity program based on the de Bruijn graph algorithm (Grabherr et al., 2011). Clean reads were first assembled to form longer fragments named contigs. Based on the paired-end reads, different contigs from the same transcript sequences and the distance among them were detected and calculated. These contigs were then further assembled by Trinity to generate unigenes, which could not be extended on either end. To quantify gene expression abundance, fragments per kilobase of transcript per million mapped reads (FPKM) was used (Mortazavi et al., 2008). The expression level of each unigene was calculated with the formula $\mathrm{FPKM} = (10^{6} \times C \times 10^{3})/(N \times L)$, where C is the number of reads that uniquely aligned to a certain unigene, N is the total number of reads that uniquely aligned to all unigenes, and L is the number of bases in this unigene.
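As an illustration of the FPKM formula above, the following minimal Python sketch computes FPKM for a single unigene from C, N, and L. The function name and the example numbers are hypothetical and are not taken from the data in this study.

```python
def fpkm(read_count: int, total_mapped_reads: int, unigene_length: int) -> float:
    """Return FPKM = (10^6 * C * 10^3) / (N * L).

    read_count          -- C, reads uniquely aligned to the unigene
    total_mapped_reads  -- N, reads uniquely aligned to all unigenes
    unigene_length      -- L, number of bases in the unigene
    """
    if total_mapped_reads <= 0 or unigene_length <= 0:
        raise ValueError("N and L must be positive")
    return (1e6 * read_count * 1e3) / (total_mapped_reads * unigene_length)


# Hypothetical example: 500 reads on a 1,000-bp unigene, 20 million mapped reads.
example_value = fpkm(read_count=500, total_mapped_reads=20_000_000, unigene_length=1_000)
```

The two scaling constants normalize for unigene length (per kilobase, 10^3) and for sequencing depth (per million mapped reads, 10^6), so values are comparable across unigenes and libraries.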
Functional annotation and classification.
Unigenes were annotated against the publicly available protein databases NCBI nonredundant protein [NR (Polashock et al., 2010)], Swiss-Prot (Sato et al., 2006), COG (Natale et al., 2000), GO (Boyle et al., 2004), protein family [Pfam (Bateman et al., 2002)], euKaryotic ortholog groups [KOG (Li et al., 2003)], and KEGG (Wixon and Kell, 2000) using BLASTx with an E-value cutoff of 1.0E−05. The coding sequences of the unigenes were determined from the BLAST results based on their orthologous proteins. If a unigene did not have a hit in any database, ESTScan software was used to find potential coding regions (Iseli et al., 1999), including the nucleotide (5′–3′) and amino acid sequences of the coding regions. Based on the NR annotations, the Blast2GO program was used to obtain GO annotations for the unigenes using a cutoff value of 1.0E−05 at the second level according to molecular functions, cellular components, and biological processes (Conesa et al., 2005). GO functional classification and distribution of gene functions of each assembled unigene were performed using WEGO software at the macro level (Ye et al., 2006).
Differential expression analysis.
RNA-seq by expectation maximization (RSEM), which allows for the assessment of transcript abundances based on the mapping of RNA-seq reads to the assembled transcriptome, was used for transcript abundance estimation of the de novo–assembled transcripts (Li and Dewey, 2011). Differential expression analysis of two groups was performed using the DEGseq R package (1.10.1) (Wang et al., 2010). The false discovery rate (FDR) method was used to determine the threshold probability value in multiple tests. The probability values were adjusted using the Benjamini–Hochberg procedure (Benjamini and Hochberg, 1995). If the FDR was ≤0.01 and the absolute value of the log2 fold change was ≥2, the unigene was considered to be a significant DEG. The Pearson correlation coefficient was calculated among the five samples according to the gene expression profiles.
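The Benjamini–Hochberg adjustment and the DEG filter described above can be sketched in a few lines of Python. This is only an illustration of the general procedure and not the DEGseq implementation. The function names are hypothetical, and the default cutoffs (FDR of 0.01 and an absolute log2 fold change of 2) are illustrative parameters based on the thresholds mentioned in the text.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (each is the minimum over all larger ranks).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def is_deg(fdr: float, log2_fold_change: float,
           fdr_cutoff: float = 0.01, log2fc_cutoff: float = 2.0) -> bool:
    """Apply the two-part filter: small FDR and large absolute log2 fold change."""
    return fdr <= fdr_cutoff and abs(log2_fold_change) >= log2fc_cutoff
```

In use, the raw probability values for all unigenes in one pairwise comparison would be passed to `benjamini_hochberg`, and each adjusted value would then be combined with that unigene's log2 fold change in `is_deg`.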
GO enrichment analysis of the DEGs was implemented using the GOseq R package based on the Wallenius noncentral hypergeometric distribution (Young et al., 2010). The numbers of all DEGs (upregulated and downregulated) were calculated for each GO term. In addition, KOBAS software was used to test the statistical enrichment of DEGs in KEGG pathways. The enrichment factor, defined as the ratio of the number of DEGs in a pathway to the number of all genes annotated in that pathway, was used to represent the degree of enrichment (Mao et al., 2005).
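The enrichment factor and a pathway enrichment test can be illustrated with the short Python sketch below. The enrichment factor follows the ratio described in the text. The upper-tail hypergeometric test is shown only as one common way to test pathway enrichment and is an assumption here: the exact statistical settings used in KOBAS are not stated in the text. All names and example numbers are hypothetical.

```python
from math import comb


def enrichment_factor(degs_in_pathway: int, genes_in_pathway: int) -> float:
    """Ratio of DEGs in a pathway to all genes annotated in that pathway."""
    if genes_in_pathway <= 0:
        raise ValueError("the pathway must contain at least one annotated gene")
    return degs_in_pathway / genes_in_pathway


def hypergeom_upper_tail(k: int, pathway_genes: int, n_degs: int, n_annotated: int) -> float:
    """P(X >= k) when n_degs genes are drawn without replacement from
    n_annotated genes, of which pathway_genes belong to the pathway."""
    upper = min(pathway_genes, n_degs)
    total = comb(n_annotated, n_degs)
    favorable = sum(
        comb(pathway_genes, i) * comb(n_annotated - pathway_genes, n_degs - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


# Hypothetical example: 30 of 120 pathway genes are DEGs.
factor = enrichment_factor(30, 120)
```

A larger enrichment factor means that a greater share of the genes annotated to a pathway are differentially expressed; the probability value indicates how unlikely that share would be if DEGs were spread at random across all annotated genes.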
qRT-PCR analysis.
qRT-PCR was performed to validate the relative expression levels of several flowering genes in different I. indigotica tissues. Total RNA from various tissue samples was isolated and reverse transcribed into cDNA using the SuperScript III First-Strand Synthesis System (18080400; Invitrogen) following the manufacturer’s protocol. The unigene-specific primers for qRT-PCR were designed using Primer Premier 5 (PREMIER Biosoft Intl., Palo Alto, CA). Three biological replicates were subjected to the qRT-PCR assay, which was carried out using the SYBR green methodology and the ABI 7500 real-time PCR system (Applied Biosystems, Carlsbad, CA) with Actin as the reference gene. The relative fold expression changes were calculated using the $2^{-\Delta\Delta C_T}$ method (Livak and Schmittgen, 2001).
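The relative quantification step can be written out as a minimal Python sketch of the Livak and Schmittgen calculation. It assumes roughly 100% amplification efficiency for both the target and the reference gene. The choice of a calibrator tissue, the function name, and the CT values shown are hypothetical, since the text does not state which sample served as the calibrator.

```python
def relative_expression(ct_target_sample: float, ct_reference_sample: float,
                        ct_target_calibrator: float, ct_reference_calibrator: float) -> float:
    """Return the fold change 2^(-ddCT) of a target gene in a sample
    relative to a calibrator, normalized to a reference gene (e.g., Actin)."""
    # dCT: target CT normalized to the reference gene within each sample.
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    # ddCT: difference between the sample and the calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical CT values: the target amplifies 2 cycles earlier (after
# normalization) in the sample than in the calibrator.
fold_change = relative_expression(22.0, 18.0, 24.0, 18.0)
```

Because each PCR cycle ideally doubles the product, a normalized difference of two cycles corresponds to a fourfold difference in relative expression.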
Results
Transcriptome sequencing and de novo transcriptome assembly.
RNA-seq was performed using mRNA isolated from leaves, roots, flowers, fruit, and stems of I. indigotica in the reproductive stage to obtain a complete set of I. indigotica transcripts. A total of 149,907,857 high-quality clean reads consisting of 44.97 Gbp of sequences were obtained after removing adapter sequences, low-quality reads, and ambiguous reads. Flowers and roots had the largest (33,718,037) and the smallest (26,789,893) numbers of clean reads, respectively (Table 1). The numbers of reads for leaf and stem were similar (28,623,805 and 28,825,056, respectively). There were 31,951,066 reads for fruit. These reads were used to assemble transcripts by using Trinity (Platel and Jain, 2012). A total of 176,074 transcripts with an N50 of 1338 bp and an average length of 940.22 bp were generated (Table 2); 124,508 unigenes were obtained with an N50 of 1027 bp and an average length of 802.70 bp. Length distributions of assembled transcripts and unigenes are presented in Fig. 1. About 52% of the 124,508 unigenes were longer than 500 bp, and 8644 unigenes (6.94%) were longer than 2 kb (Table 2).
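For readers unfamiliar with the N50 statistic reported for the assembly, the following Python sketch shows one common way to compute it from a list of sequence lengths. N50 is a length-weighted statistic: it is the length of the shortest sequence in the set of longest sequences that together cover at least half of the total assembled bases. The function name and the example lengths are hypothetical.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembled bases is the N50.
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths


# Hypothetical example lengths (bp).
example_n50 = n50([100, 200, 300, 400])
```

Because long sequences carry more weight, the N50 of an assembly is usually larger than its mean or median sequence length, which is consistent with the N50 values exceeding the average lengths reported above.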
Table 1. Summary of the sequencing data from the five tissues of Isatis indigotica.
Table 2. Summary of the transcriptome assembly of Isatis indigotica.
Functional annotation of I. indigotica transcriptome.
To assign functional annotations to all the assembled unigenes, a sequence similarity search was performed against the NR, Swiss-Prot, COG, GO, Pfam, KOG, and KEGG public protein databases using a cutoff E-value of 1.0E−5. Consequently, a total of 88,064 unigenes (70.73% of all unigenes) found matches in at least one of the public databases (Table 3; Supplemental Table 1). The BLAST searches showed that 23,072, 55,991, 50,154, 55,409, 19,927, 55,189, and 80,287 unigenes had a match in the COG, GO, Pfam, KOG, KEGG, Swiss-Prot, and NR databases, respectively.
Table 3. Summary of functional annotation for Isatis indigotica unigenes.
Sequence similarity search against the NCBI NR database detected 80,287 annotated sequences (91.17% of all annotated unigenes) that shared significant identity with known proteins (Table 3). The homologous species distribution is shown in Fig. 2. A total of 19.06% of the annotated sequences had matches to genes from Eutrema salsugineum, followed by Brassica napus (14.97%), Brassica rapa (12.89%), and Camelina sativa (6.28%) (Fig. 2). The top hits were mostly Brassicaceae species, which indicated that the assembly and annotation of the I. indigotica transcriptome are reliable.
GO and COG classification.
To further explore the functions of the I. indigotica unigenes, GO analyses were performed and 55,991 unigenes were categorized into 52 GO classes including 16 cellular components, 17 molecular functions, and 19 biological processes (Fig. 3). Under the cellular components, the three largest groups were “cell part” (42,536), “cell” (42,415), and “organelle” (33,702). Within the molecular function category, “binding” (29,463), “catalytic activity” (26,105), and “transporter activity” (3377) were the most highly represented categories. For the biological process category, the largest numbers of genes were clustered into “metabolic process” (35,975), “cellular process” (35,105), and “single-organism process” (30,137). In contrast, only a few sequences were assigned to the terms “extracellular matrix part,” “metallochaperone activity,” and “translation regulator activity.”
In the COG classification, 23,072 unigenes were assigned to 25 COG categories (Table 3; Fig. 4). Of these, “general function prediction only” was dominant (6178 sequences), followed by “replication, recombination, and repair” (3175), “transcription” (2768), “translation, ribosomal structure and biogenesis” (2715), and “signal transduction mechanisms” (2390). Only a few genes matched the terms “extracellular structures” (1) and “nuclear structure” (9).
Tissue-specific transcriptome analysis and identification of DEG in I. indigotica.
The Pearson’s distance correlation matrix, statistics of DEGs numbers, GO classification, and enriched KEGG pathways were used for comparative analysis. The RNA-Seq data were used to assess differences in the expression of genes in different tissues of I. indigotica, including leaves, flowers, fruit, stems, and roots. The FPKM values representing the expression levels of unigenes were calculated and compared among leaves, flowers, fruit, stems, and roots. The Pearson’s distance correlation matrix was generated to compare the transcriptomes from each sample. The correlation dendrogram showed the relative relationships among the five tissues visually (Fig. 5). The results suggest that leaves were most similar to fruit, and roots were significantly different from other tissues (Fig. 5).
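The pairwise comparison of tissue expression profiles can be sketched as follows. This minimal Python example computes Pearson correlation coefficients between FPKM vectors and collects them in a matrix; it is not the software used in this study. The function names and the toy FPKM profiles are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(profiles: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlations between named expression profiles."""
    names = list(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}


# Hypothetical FPKM values for four unigenes in three tissues.
toy_profiles = {
    "leaf": [10.0, 2.0, 35.0, 0.5],
    "fruit": [12.0, 3.0, 30.0, 1.0],
    "root": [1.0, 40.0, 2.0, 15.0],
}
toy_matrix = correlation_matrix(toy_profiles)
```

A dendrogram such as the one in Fig. 5 is typically obtained by hierarchical clustering of a distance derived from such a matrix (for example, 1 minus the correlation), although the exact clustering settings are not given in the text.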
We compared the expression levels of unigenes from different tissues of I. indigotica (Fig. 6). A unigene was regarded as a DEG when the fold change was ≥2 and the FDR was <0.01. There were 3690 DEGs between the flower and root, of which 3282 were downregulated, and 408 were upregulated (Table 4). In addition, there were 928 DEGs between the fruit and root, of which 784 were downregulated and 144 were upregulated. Moreover, we identified 717 DEGs between the leaf and root, 550 of which were downregulated and 167 of which were upregulated. Between the root and stem, 318 DEGs were downregulated whereas 570 DEGs were upregulated. Overall, we identified 272 common DEGs from the four comparison groups.
Table 4. Summary of the differentially expressed genes (DEGs) numbers among pairwise comparisons of flower, fruit, leaf, and stem with root.
GO analysis of the DEGs in Flower_vs._Root showed that they were enriched in 52 categories, with cell part (1792 unigenes) representing the most abundant category, followed by cell (1783 unigenes), metabolic process (1607 unigenes), and cellular process (1579 unigenes) (Fig. 7). In Fruit_vs._Root, cell (640 unigenes) and cell part (639 unigenes) were the most abundant categories, followed by metabolic process (576 unigenes) and single-organism process (550 unigenes) (Supplemental Fig. 1). The most abundant categories in Leaf_vs._Root were similar to the categories in Fruit_vs._Root (Supplemental Fig. 2). The unigenes involved in cell (626 unigenes), cell part (626 unigenes), metabolic process (542 unigenes), cellular process (527 unigenes), organelle (519 unigenes), and single-organism process (498 unigenes) were enriched between root and stem (Supplemental Fig. 3).
KEGG pathway analysis showed that the DEGs identified in fruit, flowers, leaves, and stems relative to roots were significantly enriched in several pathways, including photosynthesis; stilbenoid, diarylheptanoid, and gingerol biosynthesis; photosynthesis-antenna proteins; carbon fixation in photosynthetic organisms; pentose and glucuronate interconversions; and glucosinolate biosynthesis (Table 5). Some pathways, particularly those of energy metabolism and biosynthesis of other secondary metabolites, overlapped strongly across all comparisons. For example, ≈30 unigenes associated with photosynthesis were significantly enriched in flowers (Table 5). These results suggested that these genes were of great importance in the growth and development of I. indigotica.
Table 5. Top four enriched Kyoto Encyclopedia of Genes and Genomes (Wixon and Kell, 2000) pathways among pairwise comparisons of flower, fruit, leaf, and stem with root.
KEGG pathway analysis and functional genes involved in circadian rhythm.
To further analyze the transcriptome of I. indigotica, all unigenes were compared in the KEGG pathway database. In total, 19,927 unigenes were matched to the database and assigned to 124 KEGG pathways (Table 3; Supplemental Table 1). “Metabolic pathways” (ko01100: 5551 unigenes) had the largest number of unigenes, followed by “biosynthesis of secondary metabolites” (ko01110: 2552 unigenes), “ribosome” (ko03010: 1333 unigenes), “oxidative phosphorylation” (ko00190: 713 unigenes), and “spliceosome” (ko03040: 709 unigenes).
According to the pathway-based analysis, a total of 80 candidate genes were identified for the “circadian rhythm-plant” pathway (ko04712) (Fig. 8; Supplemental Tables 1 and 2). These rhythm-related genes, including circadian clock associated 1 (CCA1, K12134), gigantea (GI, K12124), late elongated hypocotyl (LHY, K12133), flavin-binding kelch domain F box 1 (FKF1, K12116), phytochrome D (PHYD, K12122), cryptochrome 1 (CRY1, K12118), phytochrome A (PHYA, K12120), phytochrome B (PHYB, K12121), phytochrome E (PHYE, K12123), with no lysine kinase 1 (WNK1, K12132), phytochrome interacting factor 3 (PIF3, K12126), early flowering 3 (ELF3, K12125), lov kelch protein 2 (LKP2, K12117), and cryptochrome 2 (CRY2, K12119), were involved in many rhythmic processes such as photoperiodic flowering, cell elongation, and ultraviolet-B protection (Fig. 8). In addition, several genes, including constans (CO, K12135), zeitlupe (ZTL, K12115), timing of cab expression 1 (TOC1, K12127), and chalcone synthase (CHS, K00660), were implicated in plant light signaling pathways. Studying the roles of circadian-regulated genes responsible for controlling flowering could help to elucidate the interplay between photoperiod and the circadian clock.
Identification of flowering-associated genes in I. indigotica.
The complex genetic network that controls the developmental transition to flowering comprises several coordinated flowering pathways (Srikanth and Schmid, 2011). More than 200 flowering-related genes have been identified and characterized in the model plant A. thaliana (Amasino and Michaels, 2010; Srikanth and Schmid, 2011). To identify the genes responsible for bolting and flowering regulation in I. indigotica, a local BLASTx similarity search against A. thaliana flowering genes was performed. A total of 91 unigenes that showed high homology to the known bolting and flowering-related genes in A. thaliana were identified, and these genes were involved in various flowering pathways (Table 6; Supplemental Tables 2 and 3), including the photoperiod/circadian clock, GA, autonomous, vernalization, and aging pathways (Amasino and Michaels, 2010; Srikanth and Schmid, 2011). These include orthologs of autonomous pathway genes such as flowering locus D (FLD), flowering locus with KH domains (FLK), and luminidependens (LD); orthologs of GA pathway genes such as flowering-promoting factor (FPF); orthologs of vernalization pathway genes such as vernalization 1 (VRN1), embryonic flower1 (EMF1), and enhancer of AG-4 2 (HUA2); and a number of genes in the photoperiod pathway, such as agamous-like 15 (AGL15), CO, CCA1, CHS, CRY2, PHYA, PIF3, ELF3, LHY, TOC1, and terminal flower1 (TFL1). In addition, homologs of the key genes involved in flowering regulation, such as flowering locus C (FLC), flowering locus T (FT), CO, and suppressor of overexpression of constans 1 (SOC1), were also identified in this study. We also identified several orthologs of the functional genes associated with floral meristem identity and flower development, such as apetala1 (AP1) and sepallata1 (SEP1). The known genetic flowering pathways and many critical flowering genes were highly similar between I. indigotica and A. thaliana.
Table 6. The identified candidate genes associated with flowering pathways of Isatis indigotica.
qRT-PCR analysis.
To evaluate the reliability of RNA-Seq analysis, eight bolting and flowering-related genes, AGL15 (TRINITY_DN18713_c0_g1), AGL17 (TRINITY_DN59446_c0_g1), ELF6 (TRINITY_DN69071_c0_g2), ELF8 (TRINITY_DN69534_c0_g2), EMF2 (TRINITY_DN66245_c2_g3), FLC (TRINITY_DN69832_c1_g1), photoperiod independent early flowering1 (PIE1) (TRINITY_DN67850_c0_g2), and TFL1 (TRINITY_DN21486_c0_g1), were selected for verification in five tissues (roots, stems, leaves, flowers, and fruit) using qRT-PCR (Fig. 9). The primer sequences, RNA-Seq results, and qRT-PCR values are listed in Supplemental Table 4. The expression profiles of the eight genes revealed by qRT-PCR were generally consistent with the corresponding FPKM values derived from RNA-seq. The expression patterns of five genes (Fig. 9A, B, D, G, and H) fit well with the RNA-Seq results, whereas three genes (Fig. 9C, E, and F) had similar expression patterns, but with very small inconsistencies compared with the RNA-Seq results. These results support the reliability of the RNA-Seq data.
Discussion
This study catalogs the genes in I. indigotica that have sequence similarity to flowering genes of A. thaliana and provides a preliminary investigation of flowering-associated genes in I. indigotica. Furthermore, we compared the transcriptomes of leaves, stems, flowers, and fruit with that of roots at the reproductive stage. Functional enrichment of the DEGs provided clues about the molecular basis of reproduction in I. indigotica.
I. indigotica is an important biennial herbaceous plant, and its dried leaves and roots have commonly been used as Chinese medicines for hundreds of years. Because of the low number of available I. indigotica gene sequences in public databases, little functional investigation has been done for this plant. In a previous study, Tang et al. (2014) generated 28 million Illumina paired-end reads that were then assembled into 33,238 unigenes. In this study, RNA-seq and differential gene expression profiling analyses were performed. About 150 million clean reads were generated and assembled into 124,508 unigenes. The numbers of high-quality reads and assembled unigenes exceeded those of the previous study. The unigenes assembled in this study were subjected to a BLASTx similarity search and annotation against the NR, Swiss-Prot, COG, KEGG, GO, and Pfam databases. The most frequent GO terms in the cellular component, biological process, and molecular function groups were cell part, metabolic process, and binding, respectively. These findings differ somewhat from the results of the previous study of the transcriptome of I. indigotica. The COG functional annotation of the unigenes revealed that the category “general function prediction only” was the most highly represented category. A large number of unigenes were also annotated to the category “replication, recombination, and repair,” which was similar to the previous study of the transcriptome of I. indigotica (Tang et al., 2014). These findings establish the basis for further genomic studies in I. indigotica.
Studies of A. thaliana flowering have revealed that multiple flowering pathways converge at several flowering pathway integrators such as SOC1 and FT, which are regulated by two key upstream genes, CO and FLC, the latter of which negatively controls flowering (Lee and Lee, 2010; Moon et al., 2005; Srikanth and Schmid, 2011). In this study, two unique sequences, TRINITY_DN63019_c0_g1 and TRINITY_DN63019_c0_g2, were found to be orthologs of AtCO. Moreover, some studies have suggested that many genes associated with the vernalization pathway play important roles in flowering control (Jung et al., 2012; Srikanth and Schmid, 2011). As expected, the vernalization-response genes including EMF1, VRN1, and vernalization 2 (VRN2) were also identified in this study (Table 6), and this result was consistent with previous studies of flowering gene discovery (Gao et al., 2014; Moon et al., 2003; Zhang et al., 2011, 2013). A unigene (TRINITY_DN38410_c0_g3) matching AtSOC1 was found as well (Supplemental Table 3). The interaction of SOC1 and AGL24, which belong to the MADS-box gene family, regulates the expression of LFY and determines the flowering time of A. thaliana (Liu et al., 2008; Seo et al., 2009). The MADS-box transcription factor family is a major group of regulators controlling floral development and floral organ specification in A. thaliana (Becker and Theißen, 2003; Smaczniak et al., 2012; Theißen, 2001). Several members of the MADS-box family such as AP1, AP2, and AGLs were identified in this study and probably take part in the regulation of the development and flowering of I. indigotica.
Conclusion
Illumina sequencing and de novo assembly were performed for I. indigotica at the reproductive stage. As a result, 44.97 Gbp of data were obtained and assembled into 124,508 unigenes with an N50 of 1027 bp and an average length of 802.70 bp, representing orthologs of known plant genes, as well as potential new genes. The assembly and annotation of the I. indigotica transcriptome achieved in this study revealed tissue-specific gene expression patterns and pathways. This study is the first to conduct systematic identification of flowering-associated genes based on transcriptome sequencing and assembly in I. indigotica. A total of 91 unigenes that showed high homology to the known bolting and flowering-related genes in A. thaliana were identified. The results presented herein could build a foundation for further investigation of bolting and flowering regulatory networks in I. indigotica and contribute to molecular and genetic research in Brassicaceae plants.
Literature Cited
Amasino, R.M. & Michaels, S.D. 2010 The timing of flowering Plant Physiol. 154 516 520
Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Jones, S.G., Howe, K.L., Marshall, M. & Sonnhammer, E.E.L. 2002 The Pfam protein families database Nucleic Acids Res. 30 276 280
Becker, A. & Theißen, G. 2003 The major clades of MADS-box genes and their role in the development and evolution of flowering plants Mol. Phylogenet. Evol. 29 464 489
Benjamini, Y. & Hochberg, Y. 1995 Controlling the false discovery rate: A practical and powerful approach to multiple testing J. R. Stat. Soc. B 57 1 289 300
Boyle, E.I., Weng, S., Gollub, J., Jin, H., Botstein, D., Cherry, J.M. & Sherlock, G. 2004 GO:TermFinder—Open source software for accessing gene ontology information and finding significantly enriched gene ontology terms associated with a list of genes Bioinformatics 20 3710 3715
Conesa, A., Götz, S., García-Gómez, J.M., Terol, J., Talón, M. & Robles, M. 2005 Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research Bioinformatics 21 3674 3676
Dong, Z., Danilevskaya, O., Abadie, T., Messina, C., Coles, N. & Cooper, M. 2012 A gene regulatory network model for floral transition of the shoot apex in maize and its dynamic modeling PLoS One 7 e43450
Dukowic-Schulze, S., Harris, A., Li, J.H., Sundararajan, A., Mudge, J., Retzel, E.F., Pawlowski, W.P. & Chen, C. 2014 Comparative transcriptomics of early meiosis in Arabidopsis and maize J. Genet. Genomics 41 139 152
Gao, J., Zhang, Y., Zhang, C.L., Qi, F.Y., Li, X.P., Mu, S.H. & Peng, Z.H. 2014 Characterization of the floral transcriptome of moso bamboo (Phyllostachys edulis) at different flowering developmental stages by transcriptome sequencing and RNA-seq analysis PLoS One 9 e98910
Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z.H., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., Palma, F., Birren, B.W., Nusbaum, C., Lindblad-Toh, K., Friedman, N. & Regev, A. 2011 Full-length transcriptome assembly from RNA-Seq data without a reference genome Nat. Biotechnol. 29 644 652
Huang, L.Y., Zhang, F., Zhang, F., Wang, W.S., Zhou, Y.L., Fu, B.Y. & Li, Z.K. 2014 Comparative transcriptome sequencing of tolerant rice introgression line and its parents in response to drought stress BMC Genomics 15 1026
Iseli, C., Jongeneel, C.V. & Bucher, P. 1999 ESTScan: A program for detecting, evaluating, and reconstructing potential coding regions in EST sequences Proc. Intl. Conf. Intell. Syst. Mol. Biol. 99 138 148
Jung, C.H., Wong, C.E., Singh, M.B. & Bhalla, P.L. 2012 Comparative genomic analysis of soybean flowering genes PLoS One 7 e38250
Lee, J. & Lee, I. 2010 Regulation and function of SOC1, a flowering pathway integrator J. Expt. Bot. 61 2247 2254
Li, B. & Dewey, C.N. 2011 RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome BMC Bioinformatics 12 323
Li, L., Stoeckert, C.J. & Roos, D.S. 2003 OrthoMCL: Identification of ortholog groups for eukaryotic genomes Genome Res. 13 2178 2189
Liu, C., Chen, H., Er, H.L., Soo, H.M., Kumar, P.P., Han, J.H., Liou, Y.C. & Yu, H. 2008 Direct interaction of AGL24 and SOC1 integrates flowering signals in Arabidopsis Development 135 1481 1491
Livak, K.J. & Schmittgen, T.D. 2001 Analysis of relative gene expression data using real-time quantitative PCR and the 2−ΔΔCT method Methods 25 402 408
Mao, X.Z., Cai, T., Olyarchuk, J.G. & Wei, L.P. 2005 Automated genome annotation and pathway identification using the KEGG orthology (KO) as a controlled vocabulary Bioinformatics 21 3787 3793
Moon, J., Lee, H., Kim, M. & Lee, I. 2005 Analysis of flowering pathway integrators in Arabidopsis Plant Cell Physiol. 46 292 299
Moon, Y.H., Chen, L.J., Pan, R.L., Chang, H.S., Zhu, T., Maffeo, D.M. & Sung, Z.R. 2003 EMF genes maintain vegetative development by repressing the flower program in Arabidopsis Plant Cell 15 681 693
Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. 2008 Mapping and quantifying mammalian transcriptomes by RNA-Seq Nat. Methods 5 621 628
Mouhu, K., Hytönen, T., Folta, K., Rantanen, M., Paulin, L., Auvinen, P. & Elomaa, P. 2009 Identification of flowering genes in strawberry, a perennial SD plant BMC Plant Biol. 9 122
Natale, D.A., Shankavaram, U.T., Galperin, M.Y., Wolf, Y.I., Aravind, L. & Koonin, E.V. 2000 Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs) Genome Biol. 1 5 research0009.1 0009.19
National Pharmacopoeia Committee 2015 Pharmacopoeia of the People’s Republic of China. China Med. Sci. Technol. Press, Beijing, China
Nie, S.S., Li, C., Xu, L., Wang, Y., Huang, D.Q., Muleke, E.M., Sun, X.C., Xie, Y. & Liu, L.W. 2016 De novo transcriptome analysis in radish (Raphanus sativus L.) and identification of critical genes involved in bolting and flowering BMC Genomics 17 389
Parcy, F. 2005 Flowering: A time for integration Intl. J. Dev. Biol. 49 585 593
Patel, R.K. & Jain, M. 2012 NGS QC toolkit: A toolkit for quality control of next generation sequencing data PLoS One 7 e30619
Polashock, J.J., Arora, R., Peng, Y., Naik, D. & Rowland, L.J. 2010 Functional identification of a C-repeat binding factor transcriptional activator from blueberry associated with cold acclimation and freezing tolerance J. Amer. Soc. Hort. Sci. 135 40 48
Sánchez-Sevilla, J.F., Cruz-Rus, E., Valpuesta, V., Botella, M.A. & Amaya, I. 2014 Deciphering gamma-decalactone biosynthesis in strawberry fruit using a combination of genetic mapping, RNA-Seq and eQTL analyses BMC Genomics 15 218
Sato, A., Okubo, H. & Saitou, K. 2006 Increase in the expression of an alpha-amylase gene and sugar accumulation induced during cold period reflects shoot elongation in hyacinth bulbs J. Amer. Soc. Hort. Sci. 131 185 191
Seo, E., Lee, H., Jeon, J., Park, H., Kim, J., Noh, Y.S. & Lee, I. 2009 Crosstalk between cold response and flowering in Arabidopsis is mediated through the flowering time gene SOC1 and its upstream negative regulator FLC Plant Cell 21 3185 3197
Shrestha, R., Gómez-Ariza, J., Brambilla, V. & Fornara, F. 2014 Molecular control of seasonal flowering in rice, Arabidopsis and temperate cereals Ann. Bot. 114 1445 1458
Smaczniak, C., Immink, R.G.H., Angenent, G.C. & Kaufmann, K. 2012 Developmental and evolutionary diversity of plant MADS-domain factors: Insights from recent studies Development 139 3081 3098
Srikanth, A. & Schmid, M. 2011 Regulation of flowering time: All roads lead to Rome Cell. Mol. Life Sci. 68 2013 2037
Stamm, M.D., Enders, L.S., Donze-Reiner, T.J., Baxendale, F.P., Siegfried, B.D. & Heng-Moss, T.M. 2014 Transcriptional response of soybean to thiamethoxam seed treatment in the presence and absence of drought stress BMC Genomics 15 1055
Strickler, S.R., Bombarely, A. & Mueller, L.A. 2012 Designing a transcriptome next-generation sequencing project for a nonmodel plant species Amer. J. Bot. 99 257 266
Tang, X.Q., Xiao, Y.H., Lv, T.T., Wang, F.Q., Zhu, Q.H., Zheng, T.Q. & Yang, J. 2014 High-throughput sequencing and de novo assembly of the Isatis indigotica transcriptome PLoS One 9 e102963
Thakare, D., Yang, R., Steffen, J.G., Zhan, J.P., Wang, D.F., Clark, R.M., Wang, X.F. & Yadegari, R. 2014 RNA-Seq analysis of laser-capture microdissected cells of the developing central starchy endosperm of maize Genom. Data 2 242 245
Theißen, G. 2001 Development of floral organ identity, stories from the MADS house Curr. Opin. Plant Biol. 4 75 85
Tsuji, H., Taoka, K.I. & Shimamoto, K. 2011 Regulation of flowering in rice: Two florigen genes, a complex gene network, and natural variation Curr. Opin. Plant Biol. 14 45 52
Wang, J.W. 2014 Regulation of flowering time by the miR156-mediated age pathway J. Expt. Bot. 65 4723 4730
Wang, L., Feng, Z., Wang, X., Wang, X. & Zhang, X. 2010 DEGseq: An R package for identifying differentially expressed genes from RNA-seq data Bioinformatics 26 136 138
Wang, Z., Gerstein, M. & Snyder, M. 2009 RNA-Seq: A revolutionary tool for transcriptomics Nat. Rev. Genet. 10 57 63
Wixon, J. & Kell, D. 2000 The Kyoto Encyclopedia of Genes and Genomes–KEGG Yeast 17 48 55
Ye, J., Fang, L., Zheng, H., Zhang, Y., Chen, J., Zhang, Z., Wang, J., Li, S.T., Li, R.Q., Bolund, L. & Wang, J. 2006 WEGO: A web tool for plotting GO annotations Nucleic Acids Res. 34 W293 W297
Young, M.D., Wakefield, M.J., Smyth, G.K. & Oshlack, A. 2010 Gene ontology analysis for RNA-seq: Accounting for selection bias Genome Biol. 11 R14
Zhang, J.X., Wu, K.L., Zeng, S.J., Teixeira da Silva, J.A., Zhao, X.L., Tian, C.E., Xia, H.Q. & Duan, J. 2013 Transcriptome analysis of Cymbidium sinense and its application to the identification of genes associated with floral development BMC Genomics 14 279
Zhang, J.Z., Ai, X.Y., Sun, L.M., Zhang, D.L., Guo, W.W., Deng, X.X. & Hu, C.G. 2011 Transcriptome profile analysis of flowering molecular processes of early flowering trifoliate orange mutant and the wild-type [Poncirus trifoliata (L.) Raf.] by massively parallel signature sequencing BMC Genomics 12 63
Zhu, Q.H., Stephen, S., Kazan, K., Jin, G., Fan, L.J., Taylor, J., Dennis, E.S., Helliwell, C.A. & Wang, M.B. 2013 Characterization of the defense transcriptome responsive to Fusarium oxysporum-infection in Arabidopsis using RNA-seq Gene 512 259 266
Kyoto Encyclopedia of Genes and Genomes [KEGG (Wixon and Kell, 2000)] pathways of the assembled unigenes in Isatis indigotica.
Coding sequences (CDS) of flowering-associated genes in Isatis indigotica.
Detailed information on the identified candidate genes associated with flowering pathways of Isatis indigotica.
List of gene-specific primers used in quantitative real-time polymerase chain reaction (qRT-PCR).