Abstract
Complex changes in gene expression occur during postharvest storage of apple (Malus ×domestica) and often precede or accompany changes in ripening and disorder development. Targeted gene expression analysis fundamentally relies on previous knowledge of the targeted gene. Minimally, a substantial fragment of the gene sequence must be known with high accuracy so that primers and probes, which bind to their targets in a complementary fashion, are highly specific. Here, we describe a workflow that leverages publicly available transcriptome data to discover apple cultivar–specific gene sequences to guide primer design for quantitative real-time polymerase chain reaction (qPCR). We find that problematic polymorphisms occur frequently in ‘Granny Smith’ and ‘Honeycrisp’ apple when candidate primer binding sites are selected using the ‘Golden Delicious’ genome. We attempted to validate qPCR-based gene expression measurements with RNA sequencing (RNA-Seq) analysis of the same RNA samples. However, we found that agreement between the two technologies was highly variable and positively correlated with the similarity between cultivar-specific genes and RNA-Seq reference genes. Thus, we offer insight that 1) improves the accuracy and efficiency of qPCR primer design in cultivars that lack sufficient sequence resources and 2) better guides the essential step of validation of RNA-Seq data with a subset of genes of interest examined via qPCR.
Physiological disorders of apple that develop during storage contribute to substantial postharvest losses (up to 30% cullage), with significant economic consequences for the apple industry (Bramlage and Watkins 1994; Doerflinger et al., 2015; Rosenberger et al., 2001). Our understanding of the molecular mechanisms of postharvest tree fruit disorders is continuously evolving (Johnson and Zhu 2015; Leisso et al., 2015; Lum et al., 2016; Sevillano et al., 2009). Disorder incidence and severity are often cultivar dependent, with some cultivars being predisposed to specific physiological defects over the course of long-term storage (Larrigaudière et al., 2016). Examining changes in gene expression during the postharvest period will 1) shed light on the molecular mechanisms underlying the physiology of tree fruit disorders; 2) be useful in classifying disorder susceptibility; 3) act as a guide for new storage strategies; and 4) improve risk assessment and management before, during, and after storage (Duan et al., 2017; Leisso et al., 2016; Nham et al., 2015).
Global-scale gene expression analysis afforded by second-generation sequencing of mRNA allows the activity of all genes to be monitored simultaneously. This powerful approach has been used extensively to deepen our understanding of numerous plant processes (Thudi et al., 2012; Voelckel et al., 2017; Wang et al., 2009). However, a primary limitation to gene activity measurements for both targeted (e.g., qPCR) and untargeted (e.g., RNA-Seq) techniques is accurate previous knowledge of gene sequences. As more genomes become available for important crop species, recently so for rosaceous specialty crops such as Malus (Daccord et al., 2017; Velasco et al., 2010), Prunus (Ahmad et al., 2011), Pyrus (Chagne et al., 2014; Wu et al., 2013), and Rubus (VanBuren et al., 2016; Ward et al., 2013), this wave of gene discovery can be leveraged for enhanced gene expression analysis. However, genetic diversity among cultivars in these specialty crops can cause a loss of fidelity in gene expression measurements due to cryptic polymorphisms or mismatches in primers/probes or digital gene expression reference sequences (Clark et al., 2007; Takahashi et al., 2009). Development of detailed and accurate knowledge of cultivar-specific gene sequences that recover this loss in fidelity is an essential step to obtain highly accurate estimates of gene expression in nonsequenced cultivars of interest. This is especially relevant in duplicated and highly heterozygous plant genomes like apple (Daccord et al., 2017; Velasco et al., 2010), where differentiation of highly similar but unique genes (and alleles) is challenging and requires highly accurate sequence information.
qPCR is the reference standard for gene expression analysis, as the detection of transcripts is direct: oligonucleotide primers and/or probes physically bind to a cDNA sequence of interest (Arya et al., 2005; O’Driscoll 2011; Wong and Medrano 2005; Wu et al., 2014). The highly accurate and highly sensitive nature of this interaction is a key feature of the technology, making it robust for the differentiation of highly similar sequences, even down to single-nucleotide polymorphisms (SNPs) (Gehring and Geider 2012; Jatayev et al., 2017). This high specificity also makes the technique susceptible to cryptic polymorphisms when, for instance, the genome of one cultivar is used for primer or probe design for another genetically distinct cultivar. The reliance on accurate previous knowledge of gene sequences has long been acknowledged as a hurdle for gene expression analysis (Freeman et al., 1999).
To create cultivar-specific and robust qPCR primers for targeted gene expression analysis, we assembled transcriptomes de novo from publicly available data for two apple cultivars (Granny Smith and Honeycrisp). We selected candidate genes using two different strategies: a forward approach in which we selected gene targets from published results (de Freitas et al., 2010, 2011) that implicated genes in a disorder of interest, and a reverse approach in which we selected genes with expression that was correlated with disorder incidence in a previous publication (Gapper et al., 2017). Using these candidates, we developed an efficient workflow (Fig. 1) that includes de novo assembly, contig selection, transcript validation, informed primer design, and qPCR assay validation. For validated transcripts, our qPCR assay success rate exceeded 95%. In addition to efficient primer design, we identified cultivar-specific polymorphisms, affording a deeper understanding of how genes of interest in specific cultivars may contribute to the development of physiological disorders. We also developed strategies to enhance the necessary qPCR validation of RNA-Seq data to provide more meaningful interrogation of changes in global gene expression during postharvest storage of apple.

Fig. 1. Workflow for quantitative real-time polymerase chain reaction (qPCR) primer design. This workflow uses the ‘Golden Delicious’ reference genome and cultivar-specific de novo assemblies for enhanced qPCR primer design, with applications for gene expression and RNA Sequencing (RNA-Seq) – qPCR cross validation. PCR = polymerase chain reaction.
Citation: Journal of the American Society for Horticultural Science J. Amer. Soc. Hort. Sci. 143, 5; 10.21273/JASHS04424-18
Materials and Methods
Transcriptome assembly using public data.
Transcriptome data were retrieved from the Sequence Read Archive (SRA) at the National Center for Biotechnology Information [NCBI (National Institutes of Health, 2018)] in compressed SRA archive format using the prefetch command. Archives were validated with the vdb-validate command and extracted with the fastq-dump command using the SRA Toolkit v2.8.2-1 (National Institutes of Health, 2018). The ‘Granny Smith’ apple fruit peel transcriptome was reported in Gapper et al. (2017) (SRA experiment SRP100589). The ‘Honeycrisp’ apple fruit peel transcriptome was reported in Leisso et al. (2016) (SRA experiment SRP081273).
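The retrieval steps named above can be sketched as a small Python helper that only assembles the three SRA Toolkit command lines in order. This is an illustrative sketch, not the commands used in this study: the run accession SRR0000000 and the output directory name are placeholders, and the use of the fastq-dump --outdir option is an assumption.

```python
import shlex


def sra_commands(run_accession, outdir="fastq"):
    """Return prefetch, vdb-validate and fastq-dump invocations for one SRA run."""
    return [
        ["prefetch", run_accession],                        # download the .sra archive
        ["vdb-validate", run_accession],                    # check archive integrity
        ["fastq-dump", "--outdir", outdir, run_accession],  # extract reads as FASTQ
    ]


if __name__ == "__main__":
    # SRR0000000 is a hypothetical run accession used only for illustration.
    for command in sra_commands("SRR0000000"):
        print(shlex.join(command))
```

Each list can be passed to subprocess.run(command, check=True) so that a failed validation stops the pipeline before extraction.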
Raw fastq data from the aforementioned NCBI SRA archives were quality trimmed and adapter filtered (referencing TruSeqv3 adapters; Illumina, San Diego, CA) using CLC Genomics Workbench 9.5.3 (QIAGEN, Hilden, Germany) with default parameters (e.g., trim bases less than Q20) except that all ambiguous bases were trimmed. The CLC Genomics Workbench 9.5.3 Toolkit de novo sequencing > de novo assembly function was used for de novo assembly with default parameters, except that the minimum contig length was decreased to 200 bp to allow inclusion of as many transcripts (and fragments thereof) as possible in our BLASTn searches. See Supplemental Table 1 for assembly statistics.
Identification of cultivar-specific gene sequences using BLAST.
Gene sequences from two versions of the apple genome appear in this manuscript. Apple V1 refers to the genome of the commercially produced ‘Golden Delicious’ apple [Velasco et al., 2010; specifically: Malus ×domestica Whole Genome v1.0.p - Assembly and Annotation at the Genome Database for Rosaceae (GDR) (Jung et al., 2014)]. Apple V2 refers to the genome of a ‘Golden Delicious’ double haploid from INRA breeding efforts in the 1960s [Daccord et al., 2017; specifically: Malus ×domestica GDDH13 v1.1 - Assembly and Annotation at GDR (Jung et al., 2014)]. The steps described below were performed with both Apple V1 and Apple V2 (beginning with the BLASTn search of de novo assembly databases). Because the annotations of these two genomes have not been reconciled, we used Apple V1 transcripts to identify best matches to Apple V2 transcripts.
Initially, 28 candidate apple genes for ‘Granny Smith’ and 14 for ‘Honeycrisp’ were selected for this analysis (see Supplemental Files 1 and 2, final lists in Tables 1 and 2). To create our ‘Granny Smith’ candidate list, we used Apple V1 gene IDs from Gapper et al. (2017) to retrieve coding sequences (CDS) from GDR. To create our ‘Honeycrisp’ candidate list, clone sequences from de Freitas et al. (2010, 2011) were retrieved from GenBank (Benson et al., 2012) using the published NCBI clone accession numbers, or using best hits to the M. ×domestica expressed sequence tags (ESTs; the referenced University of California at Davis URL to retrieve ESTs was not found—see Supplemental Table 2). Apple V1 was then queried with BLASTn (via GDR, default parameters, excluding hits <95% identity) to retrieve best matching ‘Golden Delicious’ CDS sequences for each ‘Honeycrisp’ candidate. For those that did not result in good matches to Apple V1 genes, the NCBI annotation of M. ×domestica was similarly searched using BLASTn (see Supplemental File 3 and Supplemental Table 2).
Table 1. Summary of ‘Granny Smith’ candidate genes.
Table 2. Summary of ‘Honeycrisp’ candidate genes.
BLASTn (built into CLC Genomics Workbench 9.5.3) was used to build nucleotide BLAST databases from the de novo transcriptome assemblies for each cultivar of interest. To retrieve cultivar-specific gene sequences, we used the known ‘Golden Delicious’ apple gene sequences to search each respective nucleotide BLASTn database. BLASTn parameters were default (including setting the e-value threshold to 1e-10 and enabling the “low complexity filter”), except that the maximum number of hits was reduced from 250 to five. For each gene, we selected de novo contigs from the multiple BLASTn hits that had the best combination of high percent identity (80% to 100%), greatest subject coverage, and highest bit score. Known apple genes that did not produce sufficiently long, high-identity hits were excluded from further analysis. When a BLASTn search produced multiple good hits (e.g., two ≈600-bp alignments with >95% identity to distinct portions of the query sequence), these hits were taken into subsequent steps in the analysis and either manually assembled for validation [e.g., HC_02 (see Supplemental Fig. 1)], excluded based on failure of PCR validation, or the higher identity match was chosen. The selected contigs and Apple V1 and V2 transcripts were imported into Geneious v10.1.2 (Kearse et al., 2012) for further analysis and primer design.
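The contig selection described above was a judgment over several criteria. As a rough sketch of how such a ranking could be automated, the Python below filters hits by percent identity and orders them by a composite of identity and coverage, with bit score as a tie-breaker. The composite score, the field names, and the example values are assumptions made for illustration and are not the procedure used in this study.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    contig: str          # name of the de novo contig (subject)
    pct_identity: float  # percent identity of the alignment, 0 to 100
    coverage: float      # fraction of the sequence covered by the alignment, 0 to 1
    bit_score: float


def rank_hits(hits, min_identity=80.0, max_hits=5):
    """Drop hits below min_identity, then sort best-first.

    Assumed ordering: identity fraction times coverage, then bit score.
    """
    kept = [h for h in hits if h.pct_identity >= min_identity]
    kept.sort(
        key=lambda h: ((h.pct_identity / 100.0) * h.coverage, h.bit_score),
        reverse=True,
    )
    return kept[:max_hits]


if __name__ == "__main__":
    # Hypothetical hits for one query gene.
    example = [
        Hit("contig_a", 99.1, 0.62, 1100.0),
        Hit("contig_b", 97.5, 0.95, 1650.0),
        Hit("contig_c", 78.0, 0.99, 900.0),
    ]
    for hit in rank_hits(example):
        print(hit.contig, hit.pct_identity, hit.coverage, hit.bit_score)
```

A ranking like this only proposes candidates; cases with several good hits to distinct portions of the query still need the manual inspection described in the text.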
Global alignments of reference genes and de novo contigs.
Alignment of each qualifying BLASTn hit to the query sequence was performed using the Align/Assemble > Pairwise/Multiple Align function in Geneious (see Supplemental File 4). Multiple alignment algorithms were used to produce alignments that were then curated by hand. Generally, the “Geneious Alignment” and “MUSCLE Alignment” options performed well, but we note that the quality of the alignment depended heavily on sequence characteristics and therefore an exploration of aligner settings is prudent. Instances of ambiguities, SNPs (e.g., bp mismatches with regard to primer binding), splice variants, and insertions/deletions were recorded for each candidate. For all contigs, open reading frames were identified and translated for subsequent protein alignments to verify the CDS. Protein alignments were examined to find changes in the encoded proteins. For reference genes that lacked nucleotide ambiguity codes (such codes are common in the 454-based Apple V1), EMBOSS 6.5.7 (Rice et al., 2000), run via a Geneious plugin, was used to predict secondary protein structure and scan for differences.
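The recording of polymorphisms from a curated pairwise alignment can be illustrated with a short Python sketch. It assumes the two sequences are already aligned to equal length with "-" as the gap character, counts gap columns rather than indel events, and treats any non-ACGT symbol as an ambiguity code; the function names and example sequences are hypothetical.

```python
def tally_polymorphisms(ref_aln, contig_aln):
    """Classify each column of a gapped pairwise alignment."""
    if len(ref_aln) != len(contig_aln):
        raise ValueError("aligned sequences must have equal length")
    counts = {"match": 0, "snp": 0, "indel": 0, "ambiguous": 0}
    for r, c in zip(ref_aln.upper(), contig_aln.upper()):
        if r == "-" or c == "-":
            counts["indel"] += 1       # gap column (not a count of indel events)
        elif r not in "ACGT" or c not in "ACGT":
            counts["ambiguous"] += 1   # e.g., N or another IUPAC ambiguity code
        elif r == c:
            counts["match"] += 1
        else:
            counts["snp"] += 1
    return counts


def percent_identity(counts):
    """Identical columns as a percentage of all alignment columns."""
    total = sum(counts.values())
    return 100.0 * counts["match"] / total


if __name__ == "__main__":
    # Toy alignment: one SNP, one gap column, one ambiguous base.
    counts = tally_polymorphisms("ACGTACGTNA", "ACCTAC-TAA")
    print(counts, percent_identity(counts))
```

Whether gap and ambiguous columns belong in the identity denominator is a design choice; alignment software may define percent identity differently.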
Tissue handling and quality control.
‘Granny Smith’ peel tissue was removed from fruit at or shortly after harvest using a vegetable peeler. Three evenly spaced peels cut from the stem end to calyx end (sections of ≈20 × 3 cm) of each apple fruit were immediately flash frozen in liquid nitrogen and stored at –80 °C. Four samples in biological triplicate (a total of 12 observations; each replicate was a pool of tissues from six fruit) represent a short time course sampling scheme in the first 2 weeks of 1 °C air storage. ‘Honeycrisp’ peel and cortical tissue were collected from individual apples after about 6 months of storage at 1 °C in air using a 4-mm biopsy punch to an approximate depth of 6 to 8 mm. The peel was immediately separated from the cortex using a razor blade, and both tissues were immediately frozen separately in liquid nitrogen, then stored at –80 °C. Four samples in biological triplicate (a total of 12 observations; each replicate was a pool of tissues from 10 fruit) represent peel and cortical tissues, each from fruit with and without bitter pit lesions. RNA was extracted using a CTAB/chloroform protocol modified specifically for pome fruit tissue (Honaas and Kahn, 2017). Extracted RNA was analyzed for quantity and purity using a spectrophotometer (NanoDrop ND-1000; Thermo Fisher Scientific, Waltham, MA) and for quantity and integrity on an automated electrophoresis system (Bioanalyzer 2100 G2938C; Agilent Technologies, Santa Clara, CA) with the Agilent-RNA Pico Kit (catalog no. 5067-1513). Only RNA that met the following standards was used for downstream analysis: A260/A280 ≈2.0, RNA integrity number of ≥8.0.
Candidate gene transcript validation.
Putative cultivar-specific transcripts (i.e., de novo contigs) were verified by PCR where the amplicon size was maximized to capture as much of the putative transcript as possible. PCR primers were designed in Geneious using Primer3 v2.3.4 (Untergasser et al., 2012). For each candidate gene transcript, five to 10 primer pairs were designed using the parameters found in Table 3. Primer pairs were selected based on closest melt temperature (Tm), lowest hairpin and dimer Tm, longest sequence length, and percent GC content closest to 50%. Preferred primer pairs for each putative transcript were then BLASTn searched against all de novo contigs to ensure primer binding specificity. In instances in which primer pairs matched multiple contigs, new primer pairs were selected and checked in the same manner until each primer pair had a unique target in the respective de novo transcriptome assembly. Primers were synthesized by Integrated DNA Technologies (IDT, Coralville, IA), dissolved in qPCR-grade water (catalog no. W4502; Sigma-Aldrich, St. Louis, MO) to produce 100 µM solutions, and stored at –20 °C. Recommendations by IDT were used for calculating final Tm.
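The primer pair selection criteria listed above could be applied programmatically along the following lines. This Python sketch assumes that Tm, hairpin Tm, and dimer Tm values have already been computed elsewhere (e.g., by Primer3) and are supplied by the caller. The priority order of the criteria, the reading of "sequence length" as combined primer length, and all example values are assumptions for illustration.

```python
def gc_percent(seq):
    """Percent G+C in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def rank_primer_pairs(pairs):
    """Sort candidate primer pairs best-first.

    Each pair is a dict with keys: name, fwd, rev (sequences),
    tm_fwd, tm_rev, hairpin_tm, dimer_tm (degrees C, precomputed).
    """
    def sort_key(pair):
        tm_gap = abs(pair["tm_fwd"] - pair["tm_rev"])          # closest Tm first
        secondary = max(pair["hairpin_tm"], pair["dimer_tm"])  # lowest hairpin/dimer Tm
        length = len(pair["fwd"]) + len(pair["rev"])           # longest primers preferred
        gc_dev = abs(gc_percent(pair["fwd"] + pair["rev"]) - 50.0)
        return (tm_gap, secondary, -length, gc_dev)

    return sorted(pairs, key=sort_key)


if __name__ == "__main__":
    # Hypothetical candidates; sequences and temperatures are made up.
    candidates = [
        {"name": "pair_2", "fwd": "ACGTACGTACGTACGTAC", "rev": "TGCATGCATGCATGCATG",
         "tm_fwd": 59.0, "tm_rev": 61.5, "hairpin_tm": 30.0, "dimer_tm": 25.0},
        {"name": "pair_1", "fwd": "ACGTACGTACGTACGTACGT", "rev": "TGCATGCATGCATGCATGCA",
         "tm_fwd": 60.1, "tm_rev": 60.3, "hairpin_tm": 35.0, "dimer_tm": 30.0},
    ]
    print([p["name"] for p in rank_primer_pairs(candidates)])
```

The top-ranked pair would still need the BLASTn specificity check against the de novo assembly described in the text.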
Table 3. PCR assay information: a summary of assay parameters used in this study for PCR primer and qPCR primer design.
Total RNA (1 µg) was converted to cDNA using qScript (catalog no. 95048-025; QuantaBio, Beverly, MA) for ‘Granny Smith’ and iScript (catalog no. 1708840; Bio-Rad Laboratories, Hercules, CA) for ‘Honeycrisp’. Template amounts were generally between 5 and 10 ng but were variable depending on amplicon yield. For ‘Granny Smith’, PCRs were run on a thermocycler (T100, catalog no. 1861096; Bio-Rad Laboratories) using KAPA2G Fast ReadyMix (catalog no. KK5021; Roche, Basel, Switzerland) and the manufacturer’s suggested reaction protocol. For ‘Honeycrisp’, PCRs were run on a thermocycler (Veriti 96-Well, catalog no. 4375786; Applied Biosystems, Foster City, CA) (using the gradient feature) using EconoTaq PLUS GREEN 2X Master Mix (catalog no. 30033-1; Lucigen, Madison, WI) with the manufacturer’s suggested protocol. Annealing temperatures predicted by Primer3 and IDT ± 1.0 °C defined the gradient range for all reactions. All PCRs were checked on 1.5% Tris-acetate-EDTA (TAE) or Tris-borate-EDTA (TBE) agarose gels. In situations in which at least one reaction yielded a single, expected-size product, the reaction was considered a successful validation of the transcript. Reactions that failed to generate a product, generated the wrong-sized product, or resulted in multiple products were re-run with the same reagents using a touch-down protocol (touch-down PCR) in which the initial annealing steps (for 10 to 15 cycles, less 0.5 °C each cycle) were ≈10 °C higher than the final annealing temperature. After touch-down PCR, any reaction that still failed to produce a clean product resulted in removal of the gene from the study.
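To make the touch-down arithmetic concrete, the following Python sketch lists the annealing temperature for each touch-down cycle. The default offset (10 °C), step (0.5 °C), and cycle count (15) follow the approximate values given above; the function name and the example final annealing temperature of 58 °C are hypothetical.

```python
def touchdown_annealing(final_anneal_c, start_offset_c=10.0, step_c=0.5, cycles=15):
    """Annealing temperature (degrees C) for each touch-down cycle, in order."""
    start = final_anneal_c + start_offset_c
    return [start - step_c * i for i in range(cycles)]


if __name__ == "__main__":
    # Hypothetical final annealing temperature of 58 degrees C.
    print(touchdown_annealing(58.0))
```

With these defaults the last touch-down cycle anneals 3 °C above the final annealing temperature, which is then used for the remaining cycles.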
In most cases, overlapping contig alignments were either easy to manually assemble (100% identical alignment overlap >30 bp required) or clearly different genes. In cases in which overlapping contig alignments were clearly different transcripts, the best was chosen for primer design and downstream analysis. In one instance (Supplemental Fig. 1) in which multiple short contigs aligned to query sequences with a substantial gap, the intervening sequences were determined by amplicon sequencing. The QiaQuick Gel Extraction kit (catalog no. 28704; QIAGEN) was used to extract gel-purified amplicons from the PCR validation reactions (described previously), and the amplicons were sequenced by Retrogen (San Diego, CA).
qPCR assay design and execution.
Apple reference genes identified specifically as suitable reference genes for gene expression studies of tree fruits were selected from the literature (Tables 1 and 2; Supplemental Table 2). The following reference genes from Apple V1 were used: MDP0000274900 (Perini et al., 2014), MDP0000173025 (Bowen et al., 2014), and MDP0000223691 (Storch et al., 2015) for ‘Granny Smith’ (Table 1), and MDP0000223691 (Storch et al., 2015), MDP0000095375 (Bowen et al., 2014), and MDP0000213603 (Perini et al., 2014) for ‘Honeycrisp’ (Table 2). A BLASTn search was used to find best matching contigs within respective cultivar-specific de novo assemblies. Published primers (Tables 1 and 2) were checked for perfect binding site matches, or new primers were designed based on the de novo cultivar-specific sequences as for the candidate genes. All qPCR primer design followed the parameters in Table 3. Multiple primer pairs (n = 5) were scanned for binding sites that overlapped with polymorphisms in each alignment, and the overlaps were tallied. The best qPCR primer pair for each transcript (no overlap with polymorphisms and closest to optimum parameters) was used to query the respective de novo assembly with BLASTn to ensure primer specificity. Primer sequences were synthesized by IDT and prepared as described above (see the section “Candidate gene transcript validation”).
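The scan for primer binding sites that overlap polymorphisms can be sketched in Python as follows. The sketch assumes an equal-length pairwise alignment and primer coordinates expressed as 0-based alignment columns; the function names, sequences, and primer positions are hypothetical.

```python
def polymorphic_positions(ref_aln, contig_aln):
    """0-based alignment columns where the two aligned sequences differ."""
    if len(ref_aln) != len(contig_aln):
        raise ValueError("aligned sequences must have equal length")
    pairs = zip(ref_aln.upper(), contig_aln.upper())
    return [i for i, (r, c) in enumerate(pairs) if r != c]


def primer_hits_polymorphism(primer_start, primer_len, positions):
    """True if any polymorphic column lies in [primer_start, primer_start + primer_len)."""
    primer_end = primer_start + primer_len
    return any(primer_start <= p < primer_end for p in positions)


if __name__ == "__main__":
    reference = "ACGTACGTACGTACGTACGT"
    contig = "ACGTACGAACGTACGTACGT"  # differs from the reference at column 7
    positions = polymorphic_positions(reference, contig)
    # Hypothetical primer binding sites as (start, length) in alignment columns.
    sites = [(0, 6), (5, 6), (12, 6)]
    affected = sum(primer_hits_polymorphism(s, n, positions) for s, n in sites)
    print(positions, affected)
```

Counting affected sites across the candidate primer pairs for a gene gives the kind of tally described above; pairs with no overlap are the ones carried forward.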
Conventional standard curve analysis to determine primer efficiency was performed for a subset of candidates (n = 4). Starting with 1 µg of total RNA, cDNA was synthesized (including DNase treatment) and a template dilution series of 10⁻¹ (2 ng) to 10⁻⁴ (2 pg) was prepared by serial dilution. The standard curve was prepared based on the Ct values of the dilution series run, and primer efficiency was calculated as described in Ginzinger (2002) (Supplemental Table 3). Reaction efficiencies based on raw amplification data were also estimated using the R 3.2.1 (R Core Team, 2017) package “qpcR” (Ritz and Spiess, 2008). The “Cy0” method was used in the efficiency calculation function (Guescini et al., 2008). The reaction efficiency estimates were generally concordant (Supplemental Table 3); therefore, the “Cy0” method was used to estimate efficiencies for all remaining candidate genes (Table 4).
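A standard-curve efficiency estimate is commonly obtained by regressing Ct on log10 template amount and applying E = 10^(−1/slope) − 1, where E = 1 corresponds to perfect doubling each cycle. The Python sketch below implements that widely used relation; that it matches the exact form given in Ginzinger (2002) is an assumption, and the example Ct values are synthetic.

```python
import math


def standard_curve_efficiency(template_amounts, ct_values):
    """Least-squares slope of Ct versus log10(template) and the implied efficiency.

    Returns (slope, efficiency), where efficiency = 10 ** (-1 / slope) - 1.
    """
    xs = [math.log10(amount) for amount in template_amounts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, efficiency


if __name__ == "__main__":
    # Tenfold dilution series from 2 ng to 2 pg, expressed in pg.
    amounts_pg = [2000.0, 200.0, 20.0, 2.0]
    # Synthetic Ct values for a perfectly efficient assay (log2(10) cycles per tenfold step).
    cts = [20.0 + i * math.log2(10) for i in range(4)]
    print(standard_curve_efficiency(amounts_pg, cts))
```

A slope near −3.32 therefore corresponds to an efficiency near 100%, and shallower or steeper slopes indicate over- or underestimated efficiency, respectively.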
Table 4. Summary statistics for the best overall match between the ‘Golden Delicious’ reference sequences (Apple V1 and Apple V2) and the de novo ‘Granny Smith’ and ‘Honeycrisp’ contigs.
For both ‘Granny Smith’ and ‘Honeycrisp’ samples, total RNA was concentrated in a Concentrator plus/Vacufuge® (AG 5305; Eppendorf, Hamburg, Germany) to a minimum concentration of 125 ng·µL⁻¹. One microgram of total RNA from each sample was used to synthesize single-strand cDNA using the iScript gDNA Clear cDNA Synthesis Kit (kit includes a DNase treatment, catalog no. 1725034; Bio-Rad Laboratories). All nucleic acid samples were stored at –20 °C. For ‘Granny Smith’ genes, all qPCRs were run on a CFX384 Touch (catalog no. 1855485; Bio-Rad Laboratories) and for ‘Honeycrisp’ on a CFX96 Real-Time PCR Detection System (catalog no. 1855195; Bio-Rad Laboratories) using SsoAdvanced Universal SYBR® Green Supermix (catalog no. 1725270; Bio-Rad Laboratories). The reaction volume was 10 µL, the template mass per reaction was 10 pg cDNA, and primer concentrations were 300 nM (‘Granny Smith’) and 500 nM (‘Honeycrisp’). The recommended thermal cycling protocol for SsoAdvanced SYBR Green was used: activation/DNA denaturation at 95 °C for 30 s, followed by 40 cycles of denaturation at 95 °C for 10 s and annealing/extension at 60 °C for 30 s. A melt curve analysis was included: 65 to 95 °C at 0.5-°C increments, 5 s per step. Samples were run in Bio-Rad plastics (catalog no. HSP-3801) and sealed with optical adhesive seals (catalog no. MSB-1001; Bio-Rad Laboratories). All assays included reverse transcription–negative controls to check for genomic DNA contamination and no-template controls to check for other contamination. Each reaction was run in technical triplicate. Reverse transcription–negative controls for each sample were run on a single plate, as were reactions for all three reference genes. Template amounts were optimized such that the crossing point (in the linear phase) was at or before cycle 35 of 40. Only one assay, GS_02, produced a poor (bimodal) melt curve, and even though the correlation with our RNA-Seq data was high, this assay requires further optimization.
We followed the “sample maximization” scheme described by Hellemans et al. (2007), where all samples are run on each plate, with each plate containing a subset of gene tests obviating the need for interrun calibrations.
Targeted validation with RNA-Seq of the 29 candidate genes.
As part of ongoing and separate research efforts, we are generating transcriptome data for apple fruit. To summarize, total RNA (described previously) was provided to the Penn State Genomics Core Facility in University Park for library preparation. Libraries were constructed with 600 ng of total RNA using the TruSeq Stranded mRNA Library Prep kit (catalog no. RS-122-2103; Illumina) according to the manufacturer's instructions. Libraries were sequenced on a 150-bp single-end protocol to a target volume of ≥20 million reads per biological replicate on a HiSeq 2500 in Rapid Mode (Illumina). Raw reads are available at the NCBI’s SRA (SRA accession SRP150622).
Illumina read data (150 bp single-end mRNA) were analyzed with FastQC (Babraham Bioinformatics, 2018) to survey overall data quality and identify adapter sequences. This was done iteratively after data trimming and filtering with Trimmomatic v0.36 (Bolger et al., 2014). The following command shows parameters that were sufficient to trim and filter raw data to remove adapter sequences and low-quality data:
java -jar trimmomatic.jar SE -phred33 Sample.fastq Sample.trimmed.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:5 TRAILING:5 SLIDINGWINDOW:4:15 MINLEN:50
Clean reads were mapped to the predicted ‘Golden Delicious’ apple v1.0 gene annotations obtained from Phytozome v12 (Joint Genome Institute, 2018; Mdomestica_196_v1.0.transcript.fa; mean mapping rate 59.5% ± 1.7%) and to the double haploid ‘Golden Delicious’ apple version 1.1 (Bucher Laboratory, 2018; GDDH13_1-1_mrna.fasta; mean mapping rate 84.2% ± 0.8%). Expression abundance was estimated for both using the RSEM v1.3.0 (Li and Dewey, 2011) pipeline with the inbuilt Bowtie2 (Langmead and Salzberg, 2012) read aligner option. Example commands show parameters:
rsem-prepare-reference --bowtie2 --num-threads 50 Mdomestica_196_v1.0.transcript.fa Mdomestica
rsem-calculate-expression --phred33-quals --num-threads 50 --bowtie2 Sample.trimmed.fastq Mdomestica Sample
Data analysis.
The “sample maximization” experimental set-up for multiplate qPCR studies is a prerequisite for creation of a “Gene Study” using the CFX Maestro Software (Bio-Rad Laboratories) in which plate data are combined to evaluate relative expression across experimental groups for each candidate gene. The CFX Maestro Software was used to analyze all qPCR data (crossing points in linear range) using the Pfaffl method (Pfaffl 2001) with three reference genes and accounting for primer efficiencies for each qPCR primer pair considered. For graph preparation and statistical analysis, R 3.2.1 (R Core Team, 2017) was used.
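For readers unfamiliar with efficiency-corrected relative quantification, the calculation can be sketched in Python. The sketch uses the Pfaffl (2001) ratio for a single reference gene and extends it to several reference genes by taking the geometric mean of the reference terms; that extension, the function name, and the example values are assumptions, and the exact computation performed by the CFX Maestro Software may differ.

```python
import math


def relative_expression(e_target, ct_target_control, ct_target_sample, references):
    """Efficiency-corrected expression ratio of sample versus control.

    e_target: amplification factor of the target assay (2.0 = perfect doubling).
    references: list of (amplification_factor, ct_control, ct_sample) tuples,
                one per reference gene.
    """
    target_term = e_target ** (ct_target_control - ct_target_sample)
    ref_terms = [e ** (ct_c - ct_s) for e, ct_c, ct_s in references]
    # Geometric mean of the reference-gene terms (assumed multi-reference form).
    ref_geomean = math.exp(sum(math.log(t) for t in ref_terms) / len(ref_terms))
    return target_term / ref_geomean


if __name__ == "__main__":
    # Hypothetical Ct values: target shifts by 2 cycles, references are stable.
    stable_refs = [(2.0, 20.0, 20.0), (1.9, 18.0, 18.0), (2.0, 22.0, 22.0)]
    print(relative_expression(2.0, 25.0, 23.0, stable_refs))
```

Here the amplification factor is 1 plus the fractional efficiency, so an assay with 95% efficiency has a factor of 1.95.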
Relative normalized expression values (calculated using the CFX Maestro Software) from qPCR experiments were correlated with normalized digital count data (reads per kilobase of transcript per million mapped reads) for each triplicated apple fruit biological sample (n = 4 for both ‘Granny Smith’ and ‘Honeycrisp’). Linear regression analysis of each observation (four biological samples in triplicate) was performed for each candidate gene (α = 0.05). If R² ≤ 0.80, the cultivar-specific transcript was used to query the annotated genes in Apple V1 and Apple V2 with BLASTn (default parameters) to search for other high-identity gene matches with better agreement to our qPCR gene expression estimates. These secondary matches were analyzed as described previously.
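The comparison described above reduces to two small calculations, sketched here in Python: the reads per kilobase of transcript per million mapped reads (RPKM) normalization and the coefficient of determination from a simple linear regression of one measurement on the other. The function names and example numbers are hypothetical, and the significance test (α = 0.05) is not reproduced here.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def needs_secondary_search(qpcr_values, rnaseq_values, threshold=0.80):
    """True when agreement is at or below the threshold used to trigger a new BLASTn search."""
    return r_squared(qpcr_values, rnaseq_values) <= threshold


if __name__ == "__main__":
    # Hypothetical example: 500 reads on a 1,000-bp transcript, 20 million mapped reads.
    print(rpkm(500, 1000, 20_000_000))
    print(needs_secondary_search([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0]))
```

In this study each regression would use the 12 observations per gene (four biological samples in triplicate) rather than the four toy points shown.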
Results
De novo assembly of publicly available transcriptome data.
The first step in our workflow to design efficient qPCR assays was to de novo assemble publicly available fruit transcriptome data for ‘Granny Smith’ [Gapper et al., 2017 (SRA experiment SRP100589)] and ‘Honeycrisp’ [Leisso et al., 2016 (SRA experiment SRP081273)]. The ‘Granny Smith’ assembly of 355 million reads resulted in ≈68,000 contigs with >9000 contigs over 750 bp in length, and the ‘Honeycrisp’ assembly of 254 million reads resulted in ≈50,000 contigs with >4500 contigs over 750 bp in length (Supplemental Table 1). As a quality check, reads were mapped back to the cultivar-specific assembly (see Honaas et al., 2016); >80% of reads mapped back, indicating an assembly that was representative of the input data and therefore high quality (Supplemental Table 1). The CLC Genomics Workbench de novo transcriptome assemblies were used without extensive postprocessing to maximize recovery of cultivar-specific transcripts (or sufficiently large fragments thereof) because no contigs would be removed during any cleanup or refinement steps.
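Contig counts such as those reported above can be recomputed from an assembly FASTA file with a few lines of Python. This sketch parses FASTA text directly, applies the 200-bp minimum contig length and the 750-bp cutoff mentioned in the text, and uses made-up sequences; it is not the CLC Genomics Workbench report.

```python
def contig_lengths(fasta_text):
    """Sequence length of each record in FASTA-formatted text."""
    lengths = []
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize_assembly(lengths, min_keep=200, long_cutoff=750):
    """Count contigs at or above min_keep and those longer than long_cutoff."""
    kept = [n for n in lengths if n >= min_keep]
    return {
        "contigs": len(kept),
        "long_contigs": sum(1 for n in kept if n > long_cutoff),
    }


if __name__ == "__main__":
    # Three toy contigs of 150, 300 and 800 bp.
    toy = ">c1\n" + "A" * 150 + "\n>c2\n" + "C" * 300 + "\n>c3\n" + "G" * 400 + "\n" + "T" * 400 + "\n"
    print(summarize_assembly(contig_lengths(toy)))
```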
Gene of interest and contig selection.
Candidate gene mRNA transcripts for the cultivar ‘Golden Delicious’ were obtained from GDR by simple sequence retrieval in the case of ‘Granny Smith’ candidates. For ‘Honeycrisp,’ apple ESTs related to sequences in de Freitas et al. (2010, 2011) were used to search the ‘Golden Delicious’ nucleotide database to retrieve candidates (see Supplemental File 2 for BLASTn results). These sequences were then used to query cultivar-specific transcriptome assemblies with BLASTn (see Supplemental Files 3 and 4 for BLASTn results). From the multiple hits, cultivar-specific contigs (putative transcripts or fragments thereof) were selected, resulting in an initial candidate list of 25 ‘Granny Smith’ genes and 14 ‘Honeycrisp’ genes.
Transcript validation.
Contigs that resulted from de novo assembly are essentially a hypothesis about oligonucleotides present in the sample from which a library was made. In the case of RNA-Seq, these oligonucleotides are typically enriched for mRNA [selected by hybridization of polyadenylated mature mRNA transcripts to oligo(dT) probes]; thus, the majority of de novo assembled contigs are hypothetical mRNA transcripts. Therefore, to quickly test these hypotheses, conventional PCR primers were developed for all candidates and tested for expected amplicon size (see Supplemental File 5 for primer sequences). A majority of ‘Granny Smith’ candidates (15 of 25) readily yielded clean amplicons of the expected size (Supplemental Fig. 2A). All ‘Honeycrisp’ candidates (n = 14) passed this validation step with clean and specific PCR tests (Supplemental Fig. 2B). This indicates that the de novo contigs were accurate hypotheses of cultivar-specific transcripts.
Informed primer design: Cultivar-specific transcripts are polymorphic.
We next examined alignments to ‘Golden Delicious’ sequences of the 15 validated ‘Granny Smith’ candidates and 14 validated ‘Honeycrisp’ candidates. Compared with Apple V1, the cultivar-specific transcripts (or fragments thereof) produced global alignments that ranged from 88.1% to 100% nucleotide identity and for Apple V2 ranged from 79.4% to 100% nucleotide identity (Table 4; Supplemental File 4). Referencing Apple V1, the overall average nucleotide identity was 96.9% ± 3.7% (‘Granny Smith’ 98.2% ± 2.0%, ‘Honeycrisp’ 94.9% ± 4.7%). For Apple V2, the average identity was slightly greater at 97.8% ± 4.4% (‘Granny Smith’ 97.9% ± 3.4%, ‘Honeycrisp’ 97.8% ± 5.3%), likely due to resolution of ambiguous base calls. Approximately 28% of the candidate genes contained insertions or deletions compared with the Apple V1 and Apple V2 transcripts, including a deletion that produced an open reading frame for a divergent and extended C-terminal polypeptide for GS_15. A majority of candidates (82.8% for Apple V1 and 85.7% for Apple V2) had SNPs (Table 4; Supplemental File 4). SNPs were observed in potential primer binding sites in 72.0% and 46.6% of Apple V1 and Apple V2 candidates, respectively (examples in Fig. 2, Table 4, and Supplemental Fig. 3). SNPs altered predicted protein sequences in 65.4% (Apple V1) and 75.0% (Apple V2) of candidate genes (Table 4; Supplemental File 4). The protein identity ranged from 71% to 100% between contigs and Apple V1 transcripts and from 89.7% to 100% for Apple V2 (Table 4; Supplemental File 4). For those candidates lacking ambiguity codes (thus allowing secondary structure prediction), the amino acid sequence changes altered the predicted secondary protein structure 90% of the time (Supplemental File 4).

Fig. 2. Cultivar-specific genes are polymorphic compared with ‘Golden Delicious’ reference genes. An alignment showing single-nucleotide polymorphisms (colored letters) between a ‘Granny Smith’ de novo transcriptome assembly contig (GS.contig, GS_06) and the ‘Golden Delicious’ predicted mRNA (GD.mRNA, MDP0000275383). Polymorphisms, especially those occurring in primer binding sites, are potentially problematic for gene expression measurements.
Citation: Journal of the American Society for Horticultural Science J. Amer. Soc. Hort. Sci. 143, 5; 10.21273/JASHS04424-18

In one ‘Honeycrisp’ example (HC_02), two short contigs aligned at either end of a single ‘Golden Delicious’ gene obtained from GenBank with high sequence similarity (>99%), yet a substantial gap remained in the middle of the reference transcript (Supplemental Fig. 1). This putative transcript was validated with conventional PCR, the resulting amplicon was sequenced, and it was aligned with 99.0% identity to the NCBI Expressed Sequence Tag cDNA clone and 100% identity with the GenBank reference apple genome transcript model (Supplemental File 3). This case demonstrates that even partial information can be successfully leveraged to develop primers and probes for cultivar-specific genes of interest. Also, contigs were manually assembled for HC_05 and HC_14 based on overlapping alignments that were validated by PCR (Supplemental File 4).
Primer binding sites frequently contain polymorphisms.
Approximately 60% of the putative qPCR primers spanned polymorphisms between the reference ‘Golden Delicious’ sequence and the cultivar-specific de novo contig (Supplemental File 4). This is likely an overestimate of polymorphic primer sites, given the use of one primer pair per gene. However, because of the nuanced nature of selecting primers and the frequent practice of evaluating multiple primer pairs, we reported the frequency of potentially problematic sites. Ambiguity codes in reference gene models from Apple V1 masked superior primer binding sites in ≈10% of the candidate transcripts (Supplemental File 4).
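The idea of screening candidate primer binding sites for polymorphisms and ambiguity codes can be sketched as a simple window scan over an alignment between a reference transcript and a cultivar-specific contig. This is a minimal Python sketch, not the primer selection procedure used here; the function name `clean_windows`, the window length, and the example sequences are hypothetical.

```python
def clean_windows(ref_aln: str, qry_aln: str, k: int = 20) -> list:
    """Return 0-based alignment start columns of every k-column window in
    which reference and contig are identical, ungapped, and unambiguous.

    Columns containing a SNP, a gap, or an IUPAC ambiguity code (anything
    other than A, C, G, or T) break a window.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    if k < 1:
        raise ValueError("window length must be positive")

    starts = []
    run = 0  # length of the current stretch of clean columns
    for i, (r, q) in enumerate(zip(ref_aln.upper(), qry_aln.upper())):
        run = run + 1 if (r == q and r in "ACGT") else 0
        if run >= k:
            starts.append(i - k + 1)
    return starts


# Hypothetical example: a single SNP at column 4 splits the clean regions.
sites = clean_windows("ACGTACGTAC", "ACGTTCGTAC", k=4)
```

Windows returned by such a scan would still need to pass the usual primer design criteria (melting temperature, GC content, amplicon length), for example in Primer3.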
Avoiding pitfalls in cross-technology validation of gene expression data.
Transcriptome data were used to validate the qPCR results for the biological samples used in this study. Since cross-platform validation is a critical step for RNA-Seq experiments, a similar analysis was used for this cross-cultivar mapping experiment where we mapped Illumina transcriptome data from ‘Granny Smith’ and ‘Honeycrisp’ to the ‘Golden Delicious’ genome. A correlation analysis of relative expression and normalized count data, reads per kilobase of transcript per million mapped reads, mapping against Apple V1 and Apple V2 showed that in nearly one-half of the candidates, the estimates agreed (R2 > 0.8) (Table 4; Supplemental Fig. 4A and B). For most of the remaining candidates in this study, lower, but still significant, positive correlations between RNA-Seq and qPCR were observed (Table 4; Supplemental Fig. 4A and B).
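For readers less familiar with the two quantities compared here, the following minimal Python sketch shows the standard reads per kilobase of transcript per million mapped reads (RPKM) normalization and a squared Pearson correlation. The function names and all numeric values are hypothetical and are not data from this study.

```python
def rpkm(read_count: float, transcript_length_bp: float,
         total_mapped_reads: float) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def pearson_r2(x, y) -> float:
    """Squared Pearson correlation between two equal-length samples.

    Note: R2 discards the sign of the correlation, so the direction of the
    relationship should be checked separately.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical values: qPCR relative expression vs. RNA-Seq RPKM for one
# gene across four samples.
qpcr_relative_expression = [1.0, 2.1, 3.9, 8.2]
rnaseq_rpkm = [rpkm(c, 2000, 25_000_000) for c in (500, 1100, 2050, 3900)]
agreement = pearson_r2(qpcr_relative_expression, rnaseq_rpkm)
```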
We hypothesized that the correlation between RNA-Seq and qPCR could be influenced by sequence similarity between qPCR targets and RNA-Seq reference sequences. Furthermore, because draft genome annotations and de novo transcriptome assemblies were incomplete, reciprocal searching for better matches to candidates revealed matches that showed improved agreement between qPCR and RNA-Seq. Apple V1 and Apple V2 transcripts were searched with BLASTn using cultivar-specific contigs. Generally, where BLASTn hits showed a higher sequence identity, the correlation between qPCR and RNA-Seq improved (Supplemental Fig. 4A and B). The improvement in agreement ranged from dramatic (Fig. 3A) to nominal (Fig. 3B). When we filtered out alignments of cultivar-specific genes that covered less than 75% of the reference sequences, we observed a positive correlation (R2 = 0.60) between agreement of qPCR vs. RNA-Seq estimates of gene expression and alignment identity (Supplemental Fig. 5). Although concordant with our other analyses, this relationship may have cultivar-specific characteristics, could be nonlinear, and will require additional data to fully resolve.
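A generic reciprocal best hit search can be sketched from two BLASTn result tables in the standard 12-column tabular format (`-outfmt 6`), one from searching contigs against reference transcripts and one from the reverse search. This is a minimal Python sketch under the assumption that the best hit is the one with the highest bit score; it is not necessarily the exact search strategy used in this study, and the sequence identifiers below are hypothetical.

```python
def best_hits(tabular_text: str) -> dict:
    """Map each query to its highest-bit-score subject.

    Expects BLAST tabular output (outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_text: str, reverse_text: str) -> list:
    """Return (query, subject) pairs that are each other's best hit."""
    forward = best_hits(forward_text)
    reverse = best_hits(reverse_text)
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)


def row(*fields) -> str:
    """Build one tab-separated result line (helper for the example only)."""
    return "\t".join(str(f) for f in fields)


# Hypothetical results: contig_1 hits two reference transcripts; only ref_B
# returns contig_1 as its own best hit.
contigs_vs_reference = "\n".join([
    row("contig_1", "ref_A", 95.0, 900, 45, 0, 1, 900, 1, 900, 0.0, 1400),
    row("contig_1", "ref_B", 99.0, 900, 9, 0, 1, 900, 1, 900, 0.0, 1600),
])
reference_vs_contigs = "\n".join([
    row("ref_A", "contig_2", 98.0, 900, 18, 0, 1, 900, 1, 900, 0.0, 1550),
    row("ref_B", "contig_1", 99.0, 900, 9, 0, 1, 900, 1, 900, 0.0, 1600),
])
pairs = reciprocal_best_hits(contigs_vs_reference, reference_vs_contigs)
```

A coverage filter such as the 75% threshold described above could be applied to the same table by comparing the alignment length column with the known length of each reference sequence.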

Fig. 3. Agreement between quantitative real-time polymerase chain reaction (qPCR) and RNA sequencing (RNA-Seq) is variable but can be improved in some cases (for all regressions of all candidate genes, see Supplemental Fig. 4). The change in agreement between the two technologies (Pearson’s R2 correlation between qPCR relative expression and RNA-Seq normalized expression; reads per kilobase of transcript per million mapped reads) ranged from dramatic (A) to minimal (B) when higher identity matches were found by additional and reciprocal searches of reference sequences. In (A), the identity increased by 10.3% (in the BestHit alignment) between matches for GS_10, resulting in a much better R2. In (B), the identity increased by 14.9% (in the BestHit alignment) between matches for HC_13 with virtually no change in R2. Biological samples in (A) represent a time-course experiment of ‘Granny Smith’ apple fruit: T0 = at harvest; T1 = 1 week of storage in 1 °C air; T2 = 2 weeks of storage in 1 °C air; T3 = 1 week of storage in 1 °C air and then 1 week of storage at ambient temperature in air. Biological samples in (B) represent ‘Honeycrisp’ tissue taken after 6 months of storage at 1 °C in air: S1 = symptomatic cortical tissue; S2 = symptomatic peel tissue; A1 = asymptomatic cortical tissue; A2 = asymptomatic peel tissue.

Discussion
Overview.
For our explorations of apple fruit cross-cultivar gene expression analysis, we ultimately selected 15 ‘Granny Smith’ and 14 ‘Honeycrisp’ candidate genes for analysis using qPCR. For ‘Granny Smith’ apple genes, a reverse approach was employed in which a subset of genes was hand selected from thousands that were identified ab initio (i.e., expression that was significantly correlated with superficial scald incidence) from a transcriptome analysis of ‘Granny Smith’ apple fruit (Gapper et al., 2017). For ‘Honeycrisp’ apple genes, a forward approach, beginning with a literature search, led to a short list of genes implicated in bitter pit (de Freitas et al., 2010, 2011).
The initial ‘Granny Smith’ candidates were reduced from 28 to 15 by our validation criteria. The excluded genes either lacked BLASTn search matches in the ‘Granny Smith’ de novo transcriptome assembly (n = 3) or failed to produce clean PCR products at the conventional PCR validation step (n = 10). All ‘Honeycrisp’ gene candidates passed the validation process, likely due to previous molecular validation (de Freitas et al., 2010, 2011). Four ‘Honeycrisp’ candidates (identified as four distinct clones in de Freitas et al., 2010) did not produce sufficiently good alignments with predicted ‘Golden Delicious’ genes in Apple V1. In addition, alignments to the NCBI M. ×domestica gene predictions of Apple V2 genes for those four ‘Honeycrisp’ candidates showed only two unique matches. The clones are unique, yet the two pairs are highly similar and may be alleles; the number of ‘Honeycrisp’ genes they represent remains unclear. The final 29 candidate genes (Tables 1, 2, and 4; Supplemental Files 1 and 5) were subject to our full analysis, which resulted in successful qPCR assay development for 28 candidate genes, a success rate exceeding 95%. The sole assay that requires further optimization (GS_02) produced a bimodal melt curve, perhaps due to cryptic alleles present in the qPCR amplicon.
Second-generation sequencing data can be leveraged for gene discovery.
Gene discovery in apple was stimulated by the release of the apple genome (Daccord et al., 2017; Velasco et al., 2010). However, substantial genetic diversity among apple cultivars likely presents a hurdle to gene expression analysis via loss of signal fidelity due to cryptic polymorphisms. Indeed, genomic surveys of apple during the development of a genotyping array detected more than 15 million variants across 63 apple cultivars, yet only 3.2% are represented in the final array (Bianco et al., 2016). A majority of gene candidates in this study (selected for their potential involvement in economically relevant disorders of apple fruit) are sufficiently divergent in coding sequences to impede targeted gene expression analysis (Lefever et al., 2013; Wu et al., 2009). We resolved these differences by gene discovery in cultivars of interest using publicly available transcriptome data. Transcriptome assembly is a more efficient approach to explore gene space (i.e., discover gene sequences) than genome assembly (De Wit et al., 2012), and in our case it was readily enabled by publicly available data (Gapper et al., 2017; Leisso et al., 2016). We developed accurate cultivar-specific gene models using a proven de novo transcriptome assembly tool, CLC Genomics Workbench. Thus, a workflow for enhanced targeted gene expression analysis was built around the new reference transcriptomes (Fig. 1).
Upfront informatics avoids pitfalls of cryptic polymorphisms.
SNPs have been shown to have dramatic effects on qPCR experiments, with four mismatches able to completely block amplification (Lefever et al., 2013). Fewer than four mismatches can be tolerated, although less so when mismatches occur nearer to the 3′ end of the primer. Just one SNP at a primer binding site can cause clear and significant effects on gene expression measurements (Lefever et al., 2013). To avoid these potential pitfalls, transcriptomes were assembled de novo to discover cultivar-specific versions of apple genes of interest. De novo transcriptomes were previously shown to be highly accurate with regard to base-call errors (Honaas et al., 2016). Therefore, we reasoned that de novo assembled ‘Granny Smith’ and ‘Honeycrisp’ transcripts would be excellent targets for high-accuracy qPCR assay design. Here, we report that previously unknown transcript polymorphisms in ‘Granny Smith’ and ‘Honeycrisp’, relative to ‘Golden Delicious’ apple, occur in candidate primer binding sites. In this study, qPCR assays showed high specificity and high efficiency, often with good agreement with RNA-Seq estimates for the same genes. Potential assay failures or problems due to cryptic polymorphisms were avoided by leveraging freely available public data.
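Because both the number of mismatches and their distance from the 3′ end of the primer matter, a candidate primer can be compared with its binding site in the cultivar-specific contig before ordering oligonucleotides. The following minimal Python sketch reports each mismatch as its distance from the 3′ end; the function names and sequences are hypothetical, and no amplification threshold is implied.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def mismatch_distances_from_3prime(primer: str, site: str) -> list:
    """Distances (1 = 3'-terminal base) of mismatches between a primer and
    its binding site.

    Both sequences must be written 5'->3' on the primer's strand and have
    equal length; for a reverse primer, pass the reverse complement of the
    transcript region it anneals to.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    n = len(primer)
    return sorted(
        n - i
        for i, (p, s) in enumerate(zip(primer.upper(), site.upper()))
        if p != s
    )


# Hypothetical 20-mer designed on a reference sequence, compared with the
# corresponding site in a cultivar-specific contig.
primer = "ATGGCTAGCTAGGATCCGTA"
contig_site = "ATGACTAGCTAGGATCCGTG"
distances = mismatch_distances_from_3prime(primer, contig_site)
```

A returned distance of 1 flags a mismatch at the 3′-terminal base, the position at which mismatches are least tolerated.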
Cultivar-specific gene discovery reveals altered protein sequences.
Polymorphisms in cultivar-specific genes were discovered that result in alterations of encoded proteins. SNPs that are silent and do not specify a different protein sequence are still equally problematic for primer design. Furthermore, protein sequence changes have the potential to alter biological activity of the protein. In addition to learning about nucleotide sequences to inform primer design, the biological significance of candidate genes can be enhanced by learning about the alterations in encoded proteins. In approximately two-thirds of cases, the cultivar-specific genetic differences resulted in altered protein-coding sequences. Although it is purely speculative to assume some biologically relevant change in these cases, this information will guide future experiments and interpretations thereof. Importantly, this information would remain cryptic without the gene discovery afforded by de novo assembly.
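The distinction between silent and protein-altering SNPs can be made explicit by translating matching codons from the reference and cultivar-specific coding sequences. The sketch below, in Python, uses the standard genetic code and assumes two in-frame coding sequences of equal length (substitutions only, no indels); the function name and example sequences are hypothetical.

```python
_BASES = "TCAG"
_AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = dict(zip(
    (a + b + c for a in _BASES for b in _BASES for c in _BASES),
    _AMINO_ACIDS,
))


def classify_codon_changes(ref_cds: str, alt_cds: str) -> list:
    """List codons that differ between two in-frame coding sequences.

    Returns tuples of (codon number starting at 1, reference codon,
    alternate codon, class), where class is 'synonymous', 'nonsynonymous',
    or 'unknown' (a codon contains a non-ACGT character).
    """
    ref, alt = ref_cds.upper(), alt_cds.upper()
    if len(ref) != len(alt) or len(ref) % 3 != 0:
        raise ValueError("sequences must be equal length and in frame")

    changes = []
    for i in range(0, len(ref), 3):
        r_codon, a_codon = ref[i:i + 3], alt[i:i + 3]
        if r_codon == a_codon:
            continue
        r_aa, a_aa = CODON_TABLE.get(r_codon), CODON_TABLE.get(a_codon)
        if r_aa is None or a_aa is None:
            label = "unknown"
        elif r_aa == a_aa:
            label = "synonymous"
        else:
            label = "nonsynonymous"
        changes.append((i // 3 + 1, r_codon, a_codon, label))
    return changes


# Hypothetical example: GCT->GCC keeps alanine; GAA->GAT changes Glu to Asp.
changes = classify_codon_changes("ATGGCTGAA", "ATGGCCGAT")
```

Both classes of change affect primer binding equally; only the nonsynonymous class alters the encoded protein.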
RNA-Seq validation.
It is common practice to validate RNA-Seq data with qPCR data. Several studies report comparisons between RNA-Seq and qPCR that found highly significant, positive correlations between the two methods in a diverse set of experiments (Asmann et al., 2009; Griffith et al., 2010; Gusberti et al., 2013; Wu et al., 2014; Xu et al., 2017). This cross-platform check is a critical step because two concordant estimates of gene expression from fundamentally different technologies provide robust validation. Here, because RNA-Seq data were available, we examined correlations between our qPCR estimates and cross-cultivar RNA-Seq estimates for our candidates.
Initially, several candidates showed poor correlations between cultivar-specific qPCR assays and cross-cultivar RNA-Seq. Because the qPCR assays were highly specific (with clean melt curves and no off-target primer sites detected in the de novo assembly) and highly efficient (mean efficiency 100.8% ± 6.3%), we suspected that the poor correlations were due to issues with the RNA-Seq analysis rather than the qPCR assay, likely due to high genetic diversity among apple cultivars (Bianco et al., 2016). By searching for reciprocal best hits, better gene matches were often found in the ‘Golden Delicious’ genome (Table 4; Supplemental Fig. 4A and B). Although we frequently found better matches, the higher identity matches did not always result in proportional increases in correlation coefficients (for example, see Fig. 3A vs. Fig. 3B). The presence of suboptimal matches is likely due to an incomplete annotation of the ‘Golden Delicious’ genome, genome mis-assembly (e.g., missing genes), transcriptome mis-assembly, or a combination thereof. Each of these could result in inferior matches when searching the de novo transcriptome assembly with predicted ‘Golden Delicious’ genes and vice versa. That closely related genes were often found in our analysis illustrates a key issue of large, complex gene families (that contain many highly similar members) in duplicated plant genomes (Jiao et al., 2011); this problem is compounded by high heterozygosity in apple (Daccord et al., 2017).
It is unclear why, in general, agreement between RNA-Seq and qPCR for the ‘Granny Smith’ candidates was better than for the ‘Honeycrisp’ candidates. It has been shown that even in isogenic comparisons, individual gene characteristics (e.g., gene length and expression level) can influence agreement between the technologies (Everaert et al., 2017). Thus, the difference in our study may arise from an issue of sampling effort and may require a much larger or even global-scale analysis to fully resolve. The different gene selection methods for each cultivar may also help explain the difference. Our ‘Granny Smith’ candidates were selected from a list of genes included in an RNA-Seq analysis, thus weeding out problematic loci—a step to which the ‘Honeycrisp’ candidates were not subjected. Parsing out which differences arise from genetic distinctness vs. other gene characteristics and how these interact with each technology will require additional work.
From a practical standpoint, our workflow provides an easy step to improve the agreement between RNA-Seq and qPCR for a set of validation genes. Typically, validation genes are either chosen at random or represent a small set of genes of interest. qPCR is used to verify that, generally, the estimates of gene expression are concordant between the two technologies. Reported concordance ranges from excellent correlations of R2 > 0.9 for genes of interest (Honaas et al., 2016; Zermiani et al., 2015) to nonmathematical visualization meant to show concordance (Busatto et al., 2018). Here, we show that disagreement may simply result from misidentification of the corresponding homologous gene, but we note that agreement also varies from gene to gene.
Beginning a study with a gene discovery step like that described here will promote efficient and accurate qPCR assay design for genes of interest. For example, instead of choosing just 20 genes randomly for validation, we suggest selecting more than 20 and then screening them for the highest identity matches in a de novo transcriptome for the cultivar of interest. Primers could then be designed for the best (i.e., highest identity) RNA-Seq validation candidates, thus providing data with fewer artifacts resulting from cross-cultivar polymorphisms. If genes have already been chosen and qPCR assays have already been run, reciprocal searches between de novo transcriptomes and reference genomes may yield matches in the genome with improved agreement with the existing qPCR data.
Considerations for data usage and software selection.
In the present study, we used free data and closed-source software to develop our workflow. However, if free and suitable transcriptome data for a cultivar of interest are not available, there are affordable options to generate sufficient data for robust gene discovery. We previously reported that a data set of roughly 4 Gbp (≈25 million 76 × 76 bp reads) from a single tissue sample was sufficient to reconstruct a majority of detected transcripts (Honaas et al., 2016). The cost to generate such a transcriptome data set is on the order of the cost to develop qPCR primers for just a handful of candidates and should therefore be considered when no public data are available.
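The data set size quoted above follows from simple arithmetic, shown here as a minimal Python sketch. It assumes that "25 million 76 × 76 bp reads" refers to 25 million read pairs with 76 bp per mate; the function name is hypothetical.

```python
def dataset_size_gbp(read_pairs: int, read_length_bp: int,
                     reads_per_pair: int = 2) -> float:
    """Total sequenced bases, in gigabase pairs, for a paired-end run.

    Assumption: every mate has the same length and no bases are trimmed.
    """
    return read_pairs * reads_per_pair * read_length_bp / 1e9


# 25 million pairs x 2 mates x 76 bp = 3.8 Gbp, i.e., roughly 4 Gbp.
size = dataset_size_gbp(25_000_000, 76)
```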
The workflow presented here does not rely on functionality unique to the closed-source software used in this study (CLC Genomics Workbench and Geneious). The data retrieval and validation application SRA Toolkit from NCBI is open-source. Trinity (Grabherr et al., 2011) is an excellent open-source de novo assembler, and Honaas et al. (2016) have previously shown it to be among the top de novo transcriptome assemblers along with CLC Genomics Workbench and SOAPdenovo-Trans (also open-source; Beijing Genomics Institute, 2013). BLASTn, Primer3 (Untergasser et al., 2012), R, and the package we used for determining primer efficiency, “qpcR” (Ritz and Spiess 2008), are all open-source and freely available. An open-source alternative for sequence alignments is MEGA (Kumar et al., 2016).
Conclusion
As genomes and transcriptomes become available for more specialty crops, opportunities to use these resources emerge. In this study, as a test group we focused on genes potentially involved in physiological disorders of apple fruit: superficial scald in ‘Granny Smith’ and bitter pit in ‘Honeycrisp’. To develop predictive and diagnostic tests for these disorders, knowledge of the molecular mechanisms underlying superficial scald and bitter pit development is essential. This knowledge relies on high-fidelity measurements of gene activity as well as high-confidence RNA-Seq validation tests. We aimed to enhance gene expression analysis, and to that end, we have shown that transcriptome data can be leveraged before assay development to avoid pitfalls, enhance efficiency, and learn about genes of interest in genetically distinct apple cultivars, as well as improve RNA-Seq validation with qPCR.
Authors contributed equally to this work.
Literature Cited
Ahmad, R., Parfitt, D.E., Fass, J., Ogundiwin, E., Dhingra, A., Gradziel, T.M., Lin, D., Joshi, N.A., Martinez-Garcia, P.J. & Crisosto, C.H. 2011 Whole genome sequencing of peach (Prunus persica L.) for SNP identification and selection BMC Genomics 12 1 569
Arya, M., Shergill, I.S., Williamson, M., Gommersall, L., Arya, N. & Patel, H.R.H. 2005 Basic principles of real-time quantitative PCR Expert Rev. Mol. Diagn. 5 2 209 219
Asmann, Y., Klee, E., Thompson, E.A., Perez, E., Middha, S., Oberg, A., Therneau, T., Smith, D., Poland, G., Wieben, E. & Kocher, J.P. 2009 3′ tag digital gene expression profiling of human brain and universal reference RNA using Illumina Genome Analyzer BMC Genomics 10 1 531
Babraham Bioinformatics 2018 FastQC. 20 Apr. 2017. <https://www.bioinformatics.babraham.ac.uk/projects/fastqc/>
Beijing Genomics Institute 2013 SOAP (Short Oligonucleotide Analysis Package). 11 Jan. 2018. <http://soap.genomics.org.cn/SOAPdenovo-Trans.html#faq2>
Benson, D.A., Cavanaugh, M., Clark, K., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J. & Sayers, E.W. 2012 GenBank Nucleic Acids Res. 41 D1 D36 D42
Bianco, L., Cestaro, A., Linsmith, G., Muranty, H., Denance, C., Theron, A., Poncet, C., Micheletti, D., Kerschbamer, E., Di Pierro, E.A., Larger, S., Pindo, M., Van de Weg, E., Davassi, A., Laurens, F., Velasco, R., Durel, C.E. & Troggio, M. 2016 Development and validation of the Axiom® Apple480K SNP genotyping array Plant J. 86 1 62 74
Bolger, A.M., Lohse, M. & Usadel, B. 2014 Trimmomatic: A flexible trimmer for Illumina sequence data Bioinformatics 30 15 2114 2120
Bowen, J., Ireland, H.S., Crowhurst, R., Luo, Z., Watson, A.E., Foster, T., Gapper, N., Giovanonni, J.J., Mattheis, J.P., Watkins, C., Rudell, D., Johnston, J.W. & Schaffer, R.J. 2014 Selection of low-variance expressed Malus × domestica (apple) genes for use as quantitative PCR reference genes (housekeepers) Tree Genet. Genomes 10 3 751 759
Bramlage, W.J. & Watkins, C.B. 1994 Influences of preharvest temperature and harvest maturity on susceptibility of New Zealand and North American apples to superficial scald N. Z. J. Crop Hort. Sci. 22 1 69 79
Bucher Laboratory 2018 The apple genome and epigenome: A project led by the Bucher lab. 15 Dec. 2017. <https://iris.angers.inra.fr/gddh13/the-apple-genome-downloads.html>
Busatto, N., Farneti, B., Commisso, M., Bianconi, M., Iadarola, B., Zago, E., Ruperti, B., Spinelli, F., Zanella, A., Velasco, R., Ferrarini, A., Chitarrini, G., Vrhovsek, U., Delledonne, M., Guzzo, F., Costa, G. & Costa, F. 2018 Apple fruit superficial scald resistance mediated by ethylene inhibition is associated with diverse metabolic processes Plant J. 93 2 270 285
Chagne, D., Crowhurst, R.N., Pindo, M., Thrimawithana, A., Deng, C., Ireland, H., Fiers, M., Dzierzon, H., Cestaro, A., Fontana, P., Bianco, L., Lu, A., Storey, R., Knabel, M., Saeed, M., Montanari, S., Kim, Y.K., Nicolini, D., Larger, S., Stefani, E., Allan, A.C., Bowen, J., Harvey, I., Johnston, J., Malnoy, M., Troggio, M., Perchepied, L., Sawyer, G., Wiedow, C., Won, K., Viola, R., Hellens, R.P., Brewer, L., Bus, V.G., Schaffer, R.J., Gardiner, S.E. & Velasco, R. 2014 The draft genome sequence of european pear (Pyrus communis L. ‘Bartlett’) PLoS One 9 e92644
Clark, R.M., Schweikert, G., Toomajian, C., Ossowski, S., Zeller, G., Shinn, P., Warthmann, N., Hu, T.T., Fu, G., Hinds, D.A., Chen, H., Frazer, K.A., Huson, D.H., Scholkopf, B., Nordborg, M., Ratsch, G., Ecker, J.R. & Weigel, D. 2007 Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana Science 317 5836 338 342
Daccord, N., Celton, J.-M., Linsmith, G., Becker, C., Choisne, N., Schijlen, E., van de Geest, H., Bianco, L., Micheletti, D., Velasco, R., Pierro, E., Gouzy, J., Rees, J.D.G., Guérif, P., Muranty, H., Durel, C.-E., Laurens, F., Lespinasse, Y., Gaillard, S., Aubourg, S., Quesneville, H., Weigel, D., van de Weg, E., Troggio, M. & Bucher, E. 2017 High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development Nat. Genet. 49 7 1099 1106
de Freitas, S.T., do Amarante, C.V.T., Labavitch, J.M. & Mitcham, E.J. 2010 Cellular approach to understand bitter pit development in apple fruit Postharvest Biol. Technol. 57 1 6 13
de Freitas, S.T., Padda, M., Wu, Q., Park, S. & Mitcham, E.J. 2011 Dynamic alternations in cellular and molecular components during blossom-end rot development in tomatoes expressing sCAX1, a constitutively active Ca2+/H+ antiporter from Arabidopsis Plant Physiol. 156 2 844 855
De Wit, P., Pespeni, M.H., Ladner, J.T., Barshis, D.J., Seneca, F., Jaris, H., Therkildsen, N.O., Morikawa, M. & Palumbi, S.R. 2012 The simple fool’s guide to population genomics via RNA-Seq: An introduction to high-throughput sequencing data analysis Mol. Ecol. Resour. 12 6 1058 1067
Doerflinger, F.C., Rickard, B.J., Nock, J.F. & Watkins, C.B. 2015 An economic analysis of harvest timing to manage the physiological storage disorder firm flesh browning in ‘Empire’ apples Postharvest Biol. Technol. 107 1 8
Duan, N., Bai, Y., Sun, H., Wang, N., Ma, Y., Li, M., Wang, X., Jiao, C., Legall, N., Mao, L., Wan, S., Wang, K., He, T., Feng, S., Zhang, Z., Mao, Z., Shen, X., Chen, X., Jiang, Y., Wu, S., Yin, C., Ge, S., Yang, L., Jiang, S., Xu, H., Liu, J., Wang, D., Qu, C., Wang, Y., Zuo, W., Xiang, L., Liu, C., Zhang, D., Gao, Y., Xu, Y., Xu, K., Chao, T., Fazio, G., Shu, H., Zhong, G.Y., Cheng, L., Fei, Z. & Chen, X. 2017 Genome re-sequencing reveals the history of apple and supports a two-stage model for fruit enlargement Nat. Commun. 8 1 249
Everaert, C., Luypaert, M., Maag, J.L.V., Cheng, Q.X., Dinger, M.E., Hellemans, J. & Mestdagh, P. 2017 Benchmarking of RNA-sequencing analysis workflows using whole-transcriptome RT-qPCR expression data Scientific Rpt. 7 1 1559
Freeman, W.M., Walker, S.J. & Vrana, K.E. 1999 Quantitative RT-PCR: Pitfalls and potential Biotechniques 26 1 112 125
Gapper, N.E., Hertog, M.L., Lee, J., Buchanan, D.A., Leisso, R.S., Fei, Z., Qu, G., Giovannoni, J.J., Johnston, J.W., Schaffer, R.J., Nicolaï, B.M., Mattheis, J.P., Watkins, C.B. & Rudell, D.R. 2017 Delayed response to cold stress is characterized by successive metabolic shifts culminating in apple fruit peel necrosis BMC Plant Biol. 17 1 77
Gehring, I. & Geider, K. 2012 Identification of Erwinia species isolated from apples and pears by differential PCR J. Microbiol. Methods 89 1 57 62
Ginzinger, D.G. 2002 Gene quantification using real-time quantitative PCR: An emerging technology hits the mainstream Exp. Hematol. 30 6 503 512
Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B.W., Nusbaum, C., Lindblad-Toh, K., Friedman, N. & Regev, A. 2011 Full-length transcriptome assembly from RNA-Seq data without a reference genome Nat. Biotechnol. 29 7 644 652
Griffith, M., Griffith, O.L., Mwenifumbo, J., Goya, R., Morrissy, A.S., Morin, R.D., Corbett, R., Tang, M.J., Hou, Y.C., Pugh, T.J., Robertson, G., Chittaranjan, S., Ally, A., Asano, J.K., Chan, S.Y., Li, H.I., McDonald, H., Teague, K., Zhao, Y., Zeng, T., Delaney, A., Hirst, M., Morin, G.B., Jones, S.J., Tai, I.T. & Marra, M.A. 2010 Alternative expression analysis by RNA sequencing Nat. Methods 7 10 843 847
Guescini, M., Sisti, D., Rocchi, M.B., Stocchi, L. & Stocchi, V. 2008 A new real-time PCR method to overcome significant quantitative inaccuracy due to slight amplification inhibition BMC Bioinformatics 9 1 326
Gusberti, M., Gessler, C. & Broggini, G.A. 2013 RNA-Seq analysis reveals candidate genes for ontogenic resistance in Malus-Venturia pathosystem PLoS One 8 e78457
Hellemans, J., Mortier, G., De Paepe, A., Speleman, F. & Vandesompele, J. 2007 qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data Genome Biol. 8 2 R19
Honaas, L.A., Wafula, E.K., Wickett, N.J., Der, J.P., Zhang, Y., Edger, P.P., Altman, N.S., Pires, J.C., Leebens-Mack, J.H. & dePamphilis, C.W. 2016 Selecting superior de novo transcriptome assemblies: Lessons learned by leveraging the best plant genome PLoS One 11 e0146062
Honaas, L.A. & Kahn, E. 2017 A practical examination of RNA isolation methods for european pear (Pyrus communis) BMC Res. Notes 10 1 237
Jatayev, S., Kurishbayev, A., Zotova, L., Khasanova, G., Serikbay, D., Zhubatkanov, A., Botayeva, M., Zhumalin, A., Turbekova, A., Soole, K., Langridge, P. & Shavrukov, Y. 2017 Advantages of Amplifluor-like SNP markers over KASP in plant genotyping BMC Plant Biol. 17 2 254
Jiao, Y.N., Wickett, N.J., Ayyampalayam, S., Chanderbali, A.S., Landherr, L., Ralph, P.E., Tomsho, L.P., Hu, Y., Liang, H.Y., Soltis, P.S., Soltis, D.E., Clifton, S.W., Schlarbaum, S.E., Schuster, S.C., Ma, H., Leebens-Mack, J. & dePamphilis, C.W. 2011 Ancestral polyploidy in seed plants and angiosperms Nature 473 7345 97 100
Johnson, F.T. & Zhu, Y. 2015 Transcriptome changes in apple peel tissues during CO2 injury symptom development under controlled atmosphere storage regimens Hort. Res. 2 15061
Joint Genome Institute 2018 Phytozome v12, the Plant Comparative Genomics portal of the Department of Energy's Joint Genome Institute. 8 Sept. 2017. <https://phytozome.jgi.doe.gov/pz/portal.html>
Jung, S., Ficklin, S.P., Lee, T., Cheng, C.H., Blenda, A., Zheng, P., Yu, J., Bombarely, A., Cho, I., Ru, S., Evans, K., Peace, C., Abbott, A.G., Mueller, L.A., Olmstead, M.A. & Main, D. 2014 The Genome Database for Rosaceae (GDR): Year 10 update Nucleic Acids Res. 42 D1 D1237 D1244
Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., Duran, C., Thierer, T., Ashton, B., Mentjies, P. & Drummond, A. 2012 Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data Bioinformatics 28 12 1647 1649
Kumar, S., Stecher, G. & Tamura, K. 2016 MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets Mol. Biol. Evol. 33 7 1870 1874
Langmead, B. & Salzberg, S.L. 2012 Fast gapped-read alignment with Bowtie 2 Nat. Methods 9 4 357 359
Larrigaudière, C., Candan, A.P., Giné-Bordonaba, J., Civello, M. & Calvo, G. 2016 Unravelling the physiological basis of superficial scald in pears based on cultivar differences Scientia Hort. 213 340 345
Lefever, S., Pattyn, F., Hellemans, J. & Vandesompele, J. 2013 Single-nucleotide polymorphisms and other mismatches reduce performance of quantitative PCR assays Clin. Chem. 59 10 1470 1480
Leisso, R.S., Buchanan, D.A., Lee, J., Mattheis, J.P., Sater, C., Hanrahan, I., Watkins, C.B., Gapper, N., Johnston, J.W., Schaffer, R.J., Hertog, M.L., Nicolai, B.M. & Rudell, D.R. 2015 Chilling-related cell damage of apple (Malus × domestica Borkh.) fruit cortical tissue impacts antioxidant, lipid and phenolic metabolism Physiol. Plant. 153 2 204 220
Leisso, R.S., Gapper, N.E., Mattheis, J.P., Sullivan, N.L., Watkins, C.B., Giovannoni, J.J., Schaffer, R.J., Johnston, J.W., Hanrahan, I., Hertog, M.L., Nicolai, B.M. & Rudell, D.R. 2016 Gene expression and metabolism preceding soft scald, a chilling injury of ‘Honeycrisp’ apple fruit BMC Genomics 17 1 798
Li, B. & Dewey, C.N. 2011 RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome BMC Bioinformatics 12 1 323
Lum, G.B., Shelp, B.J., DeEll, J.R. & Bozzo, G.G. 2016 Oxidative metabolism is associated with physiological disorders in fruits stored under multiple environmental stresses Plant Sci. 245 143 152
National Institutes of Health 2018 The National Center for Biotechnology Information. 20 Apr. 2017. <https://www.ncbi.nlm.nih.gov/>
Nham, N.T., de Freitas, S.T., Macnish, A.J., Carr, K.M., Kietikul, T., Guilatco, A.J., Jiang, C.Z., Zakharov, F. & Mitcham, E.J. 2015 A transcriptome approach towards understanding the development of ripening capacity in ‘Bartlett’ pears (Pyrus communis L.) BMC Genomics 16 1 762
O’Driscoll, L. 2011 Gene expression profiling. Humana Press, New York, NY
Perini, P., Pasquali, G., Margis-Pinheiro, M., de Oliveira, P.R.D. & Revers, L.F. 2014 Reference genes for transcriptional analysis of flowering and fruit ripening stages in apple (Malus × domestica Borkh.) Mol. Breed. 34 3 829 842
Pfaffl, M.W. 2001 A new mathematical model for relative quantification in real-time RT-PCR Nucleic Acids Res. 29 9 2002 2007
R Core Team 2017 R: A language and environment for statistical computing. 31 Oct. 2016. <https://www.R-project.org/>
Rice, P., Longden, I. & Bleasby, A. 2000 EMBOSS: The European molecular biology open source software suite Trends Genet. 16 6 276 277
Ritz, C. & Spiess, A. 2008 qpcR: An R package for sigmoidal model selection in quantitative real-time polymerase chain reaction analysis Bioinformatics 24 13 1549 1551
Rosenberger, D., Schupp, J., Watkins, C.B., Iungerman, K., Hoying, S., Straub, D. & Cheng, L. 2001 Honeycrisp: Promising profit maker or just another problem child? New York Fruit Qrtly. 9 3 9 13
Sevillano, L., Sanchez-Ballesta, M.T., Romojaro, F. & Flores, F.B. 2009 Physiological, hormonal and molecular mechanisms regulating chilling injury in horticultural species. Postharvest technologies applied to reduce its impact J. Sci. Food Agr. 89 4 555 573
Storch, T.T., Pegoraro, C., Finatto, T., Quecini, V., Rombaldi, C.V. & Girardi, C.L. 2015 Identification of a novel reference gene for apple transcriptional profiling under postharvest conditions PLoS One 10 e0120599
Takahashi, Y., Teshima, K.M., Yokoi, S., Innan, H. & Shimamoto, K. 2009 Variations in Hd1 proteins, Hd3a promoters, and Ehd1 expression levels contribute to diversity of flowering time in cultivated rice Proc. Natl. Acad. Sci. USA 106 11 4555 4560
Thudi, M., Li, Y., Jackson, S.A., May, G.D. & Varshney, R.K. 2012 Current state-of-art of sequencing technologies for plant genomics research Brief. Funct. Genomics 11 1 3 11
Untergasser, A., Cutcutache, I., Koressaar, T., Ye, J., Faircloth, B.C., Remm, M. & Rozen, S.G. 2012 Primer3-new capabilities and interfaces Nucleic Acids Res. 40 15 1 12
VanBuren, R., Bryant, D., Bushakra, J.M., Vining, K.J., Edger, P.P., Rowley, E.R., Priest, H.D., Michael, T.P., Lyons, E., Filichkin, S.A., Dossett, M., Finn, C.E., Bassil, N.V. & Mockler, T.C. 2016 The genome of black raspberry (Rubus occidentalis) Plant J. 87 6 535 547
Velasco, R., Zharkikh, A., Affourtit, J., Dhingra, A., Cestaro, A., Kalyanaraman, A., Fontana, P., Bhatnagar, S.K., Troggio, M., Pruss, D., Salvi, S., Pindo, M., Baldi, P., Castelletti, S., Cavaiuolo, M., Coppola, G., Costa, F., Cova, V., Dal Ri, A., Goremykin, V., Komjanc, M., Longhi, S., Magnago, P., Malacarne, G., Malnoy, M., Micheletti, D., Moretto, M., Perazzolli, M., Si-Ammour, A., Vezzulli, S., Zini, E., Eldredge, G., Fitzgerald, L.M., Gutin, N., Lanchbury, J., Macalma, T., Mitchell, J.T., Reid, J., Wardell, B., Kodira, C., Chen, Z., Desany, B., Niazi, F., Palmer, M., Koepke, T., Jiwan, D., Schaeffer, S., Krishnan, V., Wu, C., Chu, V.T., King, S.T., Vick, J., Tao, Q., Mraz, A., Stormo, A., Stormo, K., Bogden, R., Ederle, D., Stella, A., Vecchietti, A., Kater, M.M., Masiero, S., Lasserre, P., Lespinasse, Y., Allan, A.C., Bus, V., Chagné, D., Crowhurst, R.N., Gleave, A.P., Lavezzo, E., Fawcett, J.A., Proost, S., Rouzé, P., Sterck, L., Toppo, S., Lazzari, B., Hellens, R.P., Durel, C.-E., Gutin, A., Bumgarner, R.E., Gardiner, S.E., Skolnick, M., Egholm, M., Van de Peer, Y., Salamini, F. & Viola, R. 2010 The genome of the domesticated apple (Malus × domestica Borkh.) Nat. Genet. 42 10 833 839
Voelckel, C., Gruenheit, N. & Lockhart, P. 2017 Evolutionary transcriptomics and proteomics: Insight into plant adaptation Trends Plant Sci. 22 6 462 471
Wang, Z., Gerstein, M. & Snyder, M. 2009 RNA-Seq: A revolutionary tool for transcriptomics Nat. Rev. Genet. 10 1 57 63
Ward, J., Jasbir, B., Fernández-Fernández, F., Moore, P., Swanson, J.D., Viola, R., Velasco, R., Bassil, N., Weber, C.A. & Sargent, D.J. 2013 Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation BMC Genomics 14 1 2
Wong, M.L. & Medrano, J.F. 2005 Real-time PCR for mRNA quantitation Biotechniques 39 1 75 85
Wu, A.R., Neff, N.F., Kalisky, T., Dalerba, P., Treutlein, B., Rothenberg, M.E., Mburu, F.M., Mantalas, G.L., Sim, S., Clarke, M.F. & Quake, S.R. 2014 Quantitative assessment of single-cell RNA-sequencing methods Nat. Methods 11 1 41 46
Wu, J., Wang, Z., Shi, Z., Zhang, S., Ming, R., Zhu, S., Khan, M.A., Tao, S., Korban, S.S., Wang, H., Chen, N.J., Nishio, T., Xu, X., Cong, L., Qi, K., Huang, X., Wang, Y., Zhao, X., Wu, J., Deng, C., Gou, C., Zhou, W., Yin, H., Qin, G., Sha, Y., Tao, Y., Chen, H., Yang, Y., Song, Y., Zhan, D., Wang, J., Li, L., Dai, M., Gu, C., Wang, Y., Shi, D., Wang, X., Zhang, H., Zeng, L., Zheng, D., Wang, C., Chen, M., Wang, G., Xie, L., Sovero, V., Sha, S., Huang, W., Zhang, S., Zhang, M., Sun, J., Xu, L., Li, Y., Liu, X., Li, Q., Shen, J., Wang, J., Paull, R.E., Bennetzen, J.L., Wang, J. & Zhang, S. 2013 The genome of the pear (Pyrus bretschneideri Rehd.) Genome Res. 23 2 396 408
Wu, J.-H., Hong, P.-Y. & Liu, W.-T. 2009 Quantitative effects of position and type of single mismatch on single base primer extension J. Microbiol. Methods 77 3 267 275
Xu, X., Chen, M., Ji, J., Xu, Q., Qi, X. & Chen, X. 2017 Comparative RNA-seq based transcriptome profiling of waterlogging response in cucumber hypocotyls reveals novel insights into the de novo adventitious root primordia initiation BMC Plant Biol. 17 1 129
Zermiani, M., Zonin, E., Nonis, A., Begheldo, M., Ceccato, L., Vezzaro, A., Baldan, B., Trentin, A., Masi, A., Pegoraro, M., Fadanelli, L., Teale, W., Palme, K., Quintieri, L. & Ruperti, B. 2015 Ethylene negatively regulates transcript abundance of ROP-GAP rheostat-encoding genes and affects apoplastic reactive oxygen species homeostasis in epicarps of cold stored apple fruits J. Expt. Bot. 66 22 7255 7270