Feasibility of Genome-wide Association Analysis Using a Small Single Nucleotide Polymorphism Panel in an Apple Breeding Population Segregating for Fruit Skin Color
Single-nucleotide polymorphisms (SNPs) have been used for a range of genetic studies and are now starting to be applied to marker-assisted selection in plant breeding programs. To identify SNP markers associated with red fruit skin color, we conducted a genome-wide association (GWA) analysis in an apple (Malus ×domestica Borkh.) breeding population of 94 phenotyped individuals using a 384-plex SNP assay. Linkage disequilibrium (LD) analysis indicated that LD extends over a long physical distance in the population (17 Mbp), suggesting that only a small number of generations separates the individuals. No significant association was identified between any marker and anthocyanin content, overcolor, or the colorimetric measures (a*, b*, L*, a*/b*, and hue angle), even though a locus controlling apple fruit skin color has previously been mapped to apple linkage group 9. Our trial of a small SNP panel for GWA analysis in apple breeding material demonstrates the limitations of this approach for detecting marker-trait associations.
Plant breeders worldwide are focusing on improving the prediction of phenotype from genotype. A range of genomic tools is now available to address this issue, including dense genetic maps and whole genome sequences. Large sets of molecular markers are being developed, and our knowledge of the molecular control of a wide range of plant traits is growing rapidly. Molecular markers are now widely used in plant breeding to accelerate genetic gain. Approaches include marker-assisted selection (MAS), using markers that either lie in candidate genes for the trait of interest or are closely linked to the chromosomal segments controlling it (Collard and Mackill, 2008), as well as selection at the whole-genome level (e.g., Kumar et al., 2012). The use of molecular markers for phenotypic prediction is especially attractive for long-lived woody crop plants, in which fruiting traits are often expressed several years after crossing, the cost of raising plants in research orchards is high, and some traits are difficult to assess.
In recent years, the application of SNP markers has gained much attention in both the scientific and plant breeding communities (Rafalski, 2002). SNPs are the most abundant form of genetic variation within plant genomes and are distributed throughout the genomes of most plant species, which is their main advantage over other molecular markers. Considerable progress has been made in both SNP discovery and SNP assay development. In silico SNP discovery is now feasible thanks to the availability of whole genome sequences of rosaceous species including strawberry [Fragaria vesca L. (Shulaev et al., 2011)], apple (Velasco et al., 2010), pear [Pyrus bretschneideri Rehd. (Wu et al., 2013) and Pyrus communis (Chagné et al., 2014)], and peach [Prunus persica (L.) Batsch (Verde et al., 2013)]. In apple, SNP markers were developed from large sets of cDNA sequences before the whole genome sequence became available (Chagné et al., 2008) and, more recently, have been identified by whole genome resequencing (Chagné et al., 2012a). Once SNPs linked to a trait have been identified and validated, breeders can use them to screen large sets of seedlings. High-throughput SNP genotyping techniques have evolved rapidly in recent years, and several platforms now allow rapid, simultaneous genotyping of hundreds or thousands of plants with up to hundreds of thousands of SNPs. These include high-throughput SNP arrays (Chagné et al., 2012a; Verde et al., 2012) and, more recently, genotyping by sequencing (Elshire et al., 2011), made possible by the increasing throughput of next-generation sequencing. The GoldenGate® assay (Illumina, San Diego, CA) has been the method of choice for researchers screening smaller numbers of SNP markers, because it supports medium-density genotyping of 96 to 1536 SNPs per array (Fan et al., 2006).
The GoldenGate® assay has been used for genetic analysis in several crop species: wheat [Triticum aestivum L. (Akhunov et al., 2009)], maize [Zea mays L. (Yan et al., 2010)], soybean [Glycine max (L.) Merr. (Hyten et al., 2008)], barley [Hordeum vulgare L. (Rostoks et al., 2006)], loblolly pine [Pinus taeda L. (Eckert et al., 2009)], and recently in apple (Khan et al., 2012) and peach (Martinez-Garcia et al., 2013).
SNP arrays have the potential to be used for a variety of applications in crop improvement, including genetic mapping and quantitative trait locus (QTL) analysis, diversity analysis, germplasm management, and MAS, as well as association studies (Yan et al., 2010). One methodology used for association analysis is GWA, which involves spanning the genome with a sufficient number of markers to enable detection of regions associated with the phenotype of interest. GWA is based on LD mapping and identifies markers that show a statistically significant association with phenotype in natural populations or germplasm collections (Myles et al., 2009). Once molecular markers linked to a trait of interest have been identified and validated, they can be used for MAS in wider breeding populations (Varshney et al., 2009). LD varies along the genome as a function of recombination frequency, the history of the species, its mating system, and population structure (Gaut, 2003). Because the decay of LD depends on the history and diversity of the population used, plant material from breeding programs, which often use a limited number of founder individuals, is likely to show LD decaying over a longer physical distance than more broadly based germplasm populations sampled over extensive natural ranges. GWA was recently reported for apple using an 8000-SNP array and a total population size of 1200 individuals, enabling the identification of SNPs associated with a range of traits (Kumar et al., 2013).
The accumulation of anthocyanin pigments in apple fruit is an important determinant of fruit quality, because red coloration of the skin is a key factor in the acceptance of apple fruit by the marketplace and by individual consumers. In general, red-colored varieties achieve higher market prices (Baugher et al., 1990; Iglesias and Alegre, 2006). Furthermore, red pigment content is implicated in the health attributes of apple fruit (Boyer and Liu, 2004). The main pigment responsible for red color in apples is cyanidin-3-galactoside, a member of the anthocyanin family (McGhie et al., 2005). Anthocyanins are synthesized from cinnamic acid, which is produced by deamination of L-phenylalanine in a reaction catalyzed by phenylalanine ammonia-lyase (PAL) (Lancaster and Dougall, 1992). Both PAL activity and the development of red color are directly regulated by environmental factors such as light (Arakawa, 1991; Lancaster and Dougall, 1992; Saure, 1990; Ubi et al., 2006) and temperature (Faragher, 1983; Lin-Wang et al., 2011; Ubi et al., 2006; Xie et al., 2012), and they vary across cultivars (Curry, 1997; Dickinson and White, 1986; Iglesias et al., 2008). Although most of the genes encoding enzymes of anthocyanin biosynthesis are regulated by ultraviolet-B light and temperature, recent studies indicate that expression of the anthocyanin biosynthetic genes as a whole is controlled by a transcription factor of the MYB family, MdMYB1/MYB10 (Espley et al., 2007; Takos et al., 2006). This study, which investigates the use of a panel of 384 SNPs to identify markers associated with red skin coloration by GWA analysis, is a pilot study designed to evaluate the usefulness of this marker density for trait mapping.
A total of 94 individuals segregating for skin color (derived from a single maternal parent and an unknown number of paternal parents) was selected from the joint Institut de Recerca i Tecnologia Agroalimentàries (IRTA) and New Zealand Plant & Food Research Ltd. pipfruit breeding program in Lleida, Spain. The trees were planted at the Gimenells (Spain) IRTA research orchards in 2006 and grown at 1 × 3.4-m spacing using standard commercial management practices recommended for the area, including fertilizer application and disease and pest control. Leaf tissue was harvested from each genotype, freeze-dried, and stored at –80 °C until required. The leaf samples were ground to fine powder in liquid nitrogen and DNA was extracted using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions.
Five fruit from each tree/genotype were harvested at optimum maturity, corresponding to a mean starch pattern index of approximately 3 on a scale from 0 (low maturity) to 6 (overmature) (ENZA Fruit, Hastings, New Zealand), between the beginning of Aug. 2008 and the end of Oct. 2008. A total of 480 fruit were assessed within 24 h of harvest. Apple skin color was determined using three methods: visual assessment, colorimetry, and anthocyanin content. Overcolor was scored by a trained panel as the percentage of the fruit surface showing color, on a visual scale from 0% (no blush) to 100% (fully colored), and expressed as the mean value of all fruit of the same genotype. All other assessments were made at an equatorial location on the blushed side of each fruit. Colorimetric values were measured using a portable tristimulus colorimeter (Chroma Meter CR-200; Minolta Corp., Osaka, Japan). Chromaticity was expressed as L*, a*, and b* coordinates of the Commission Internationale de l'Eclairage (CIE) color space. The L* value represents the relative lightness of a color, ranging from 0 (black) to 100 (white): it is small for dark colors and large for light colors. Both the a* and b* scales extend from –60 to 60; a* ranges from green (–a*) to red (+a*) and b* from blue (–b*) to yellow (+b*). Hue angle was calculated as described by McGuire (1992) and expressed in degrees; the ratio a*/b* was also calculated.
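The hue angle and a*/b* calculations can be sketched as follows. This is a minimal Python illustration rather than the authors' procedure: it assumes the common definition h° = arctan(b*/a*) with quadrant correction, handled here by `math.atan2`, and the function names are hypothetical.

```python
import math


def hue_angle(a_star: float, b_star: float) -> float:
    """CIE hue angle in degrees (0-360), measured from the +a* (red) axis.

    Assumes h = arctan(b*/a*) with quadrant correction; math.atan2 handles the
    quadrant, so negative a* values (green fruit) are placed correctly.
    """
    return math.degrees(math.atan2(b_star, a_star)) % 360.0


def a_over_b(a_star: float, b_star: float) -> float:
    """Ratio a*/b*; with b* > 0, as for all fruit here, larger values mean redder skin."""
    return a_star / b_star


# Hypothetical readings for a red blushed fruit and a green-yellow fruit
for a, b in [(35.0, 20.0), (-15.0, 45.0)]:
    print(f"a*={a}, b*={b}: hue={hue_angle(a, b):.1f} deg, a*/b*={a_over_b(a, b):.2f}")
```

Because hue angle is measured from the red axis, redder fruit give smaller angles, which matches the direction of the hue values reported in the results.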
Total anthocyanins were extracted from 11-mm-diameter skin disks taken separately for each color measurement. The skin disks were held in the dark at 4 °C for 24 h in 10 mL of a 50:1:49 mixture of methanol (26.4 M), HCl (35%), and water, and the absorbance of the extracts was measured at 532 nm with a spectrophotometer (series 1000; Cecil Instruments, Cambridge, U.K.). Anthocyanin concentration was then calculated using a molar extinction coefficient of 3.43 × 10⁴ L·mol⁻¹·cm⁻¹, and the data were expressed in nanomoles per square centimeter of skin.
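The conversion from absorbance to nanomoles per square centimeter follows the Beer-Lambert law. The sketch below is a hypothetical illustration: the 1-cm cuvette path length and the use of a single disk per extract are our assumptions, not details stated in the text, so values it produces should not be compared directly with Supplemental Table 1.

```python
import math


def anthocyanin_nmol_per_cm2(absorbance: float,
                             epsilon: float = 3.43e4,         # L/mol/cm, from the text
                             volume_ml: float = 10.0,         # extraction volume, from the text
                             disk_diameter_mm: float = 11.0,  # skin disk diameter, from the text
                             n_disks: int = 1,                # ASSUMPTION: one disk per extract
                             path_cm: float = 1.0) -> float:  # ASSUMPTION: 1-cm cuvette
    """Anthocyanin per unit skin area (nmol/cm^2) from absorbance at 532 nm."""
    # Beer-Lambert law: concentration (mol/L) = A / (epsilon * path length)
    conc_mol_per_l = absorbance / (epsilon * path_cm)
    # Total pigment in the extract, converted from mol to nmol
    total_nmol = conc_mol_per_l * (volume_ml / 1000.0) * 1e9
    # Sampled skin area: n disks of the given diameter (mm diameter -> cm radius)
    radius_cm = disk_diameter_mm / 20.0
    area_cm2 = n_disks * math.pi * radius_cm ** 2
    return total_nmol / area_cm2


print(f"{anthocyanin_nmol_per_cm2(0.10):.1f} nmol/cm^2")
```

The result scales linearly with absorbance and inversely with the sampled area, so a different number of disks or path length only changes a constant factor.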
The 94 segregating individuals were genotyped with a panel of evenly spaced SNP markers spanning the entire apple genome, using a 384-plex assay developed from a set of 1679 SNPs. These SNPs have been mapped on an integrated apple genetic map (Velasco et al., 2010) and are included in the International RosBREED SNP Consortium 8000-SNP array (Chagné et al., 2012a). Data were analyzed with the Genotyping Module of GenomeStudio Version 1.0 software (Illumina) using a GenCall threshold of 0.5.
Descriptive statistics regarding phenotypic data (chromaticity values, anthocyanin content, and overcolor), including means, distributions and correlations, were generated using JMP 8.0 software (SAS Institute, Cary, NC).
The genome-wide decay of LD was calculated after removing SNPs with a minor allele frequency (MAF) lower than 5%. GDA 1.1 software (Lewis and Zaykin, 2001) was used to compute the composite disequilibrium coefficient (Δ_AB) between pairs of alleles A and B at two different loci, located either on the same or on different linkage groups (LGs), according to Weir (1996). For normalization, the interallelic correlation coefficient r²_AB was calculated as in Weir (1996). The significance threshold, based on the null hypothesis of no linkage, was obtained by comparison with a χ² distribution with 1 df at α ≤ 0.01 (χ²₁ = 6.635), as in Zaykin (2004).
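For readers unfamiliar with the composite coefficient, the following Python sketch computes Δ_AB and r²_AB from unphased genotypes coded as allele dosages, following the Weir (1996) estimator as we understand it. It is not GDA 1.1 code and has not been checked against GDA output; monomorphic loci (removed by the MAF filter) would give a zero denominator.

```python
def composite_ld(g_a, g_b):
    """Composite disequilibrium (Delta_AB) and its normalized r^2_AB for two biallelic loci.

    g_a, g_b: unphased allele dosages (0, 1 or 2 copies of allele A and of allele B),
    one entry per individual, with no missing data. Sketch of the Weir (1996)
    estimator as we understand it; not GDA 1.1 code.
    """
    n = len(g_a)
    p_a = sum(g_a) / (2 * n)   # frequency of allele A
    p_b = sum(g_b) / (2 * n)   # frequency of allele B
    # n_AB = 2*n(AABB) + n(AABb) + n(AaBB) + 0.5*n(AaBb), which equals sum(gA * gB) / 2
    n_ab = sum(x * y for x, y in zip(g_a, g_b)) / 2
    delta = n_ab / n - 2 * p_a * p_b
    # Hardy-Weinberg disequilibrium at each locus: D_A = P_AA - p_A^2
    d_a = sum(1 for x in g_a if x == 2) / n - p_a ** 2
    d_b = sum(1 for y in g_b if y == 2) / n - p_b ** 2
    # Zero for a monomorphic locus, which the MAF filter removes beforehand
    denom = (p_a * (1 - p_a) + d_a) * (p_b * (1 - p_b) + d_b)
    return delta, delta ** 2 / denom


def ld_test_statistic(g_a, g_b):
    """n * r^2_AB, compared with a chi-square distribution with 1 df."""
    return len(g_a) * composite_ld(g_a, g_b)[1]


# Hypothetical dosages for eight individuals at two loci
locus_1 = [0, 1, 2, 1, 0, 2, 1, 1]
locus_2 = [0, 1, 2, 2, 0, 1, 1, 0]
delta, r2 = composite_ld(locus_1, locus_2)
print(f"Delta_AB={delta:.3f}, r2_AB={r2:.3f}, "
      f"significant={ld_test_statistic(locus_1, locus_2) >= 6.635}")
```

Working through the algebra, this composite r²_AB equals the squared Pearson correlation between the two dosage vectors, which is one convenient way to cross-check an implementation.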
The association analysis was carried out using the GAPIT R package (Lipka et al., 2012). The genome-wide significance threshold was α ≤ 0.01, which corresponds to a comparison-wise P value of approximately 3.4 × 10⁻⁵ after Bonferroni correction for the 295 SNPs tested. A standard mixed linear model accounting for population structure (Q matrix) and kinship (K matrix) was used for the association analysis. The Q matrix comprised the first six principal components of a principal component analysis (PCA). The K matrix is a measure of relative kinship and quantifies the probability that two homologous genes are identical by descent; it was generated according to the method of VanRaden (2008). Both matrices were computed directly in GAPIT.
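The VanRaden (2008) kinship matrix can be sketched as below. GAPIT computed the K matrix internally in this study; this pure-Python version of VanRaden's first method (centering dosages by twice the allele frequency and scaling by 2Σp(1 − p)) is only an illustration, and the input layout and example data are hypothetical.

```python
def vanraden_kinship(genotypes):
    """Genomic relationship matrix following VanRaden (2008), method 1.

    genotypes: one list of allele dosages (0, 1, 2) per individual, same SNP order,
    no missing data and at least one polymorphic SNP. Illustration only; GAPIT
    computed the K matrix used in this study.
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    # Frequency of the counted allele at each SNP
    p = [sum(ind[j] for ind in genotypes) / (2 * n_ind) for j in range(n_snp)]
    # Centre each dosage by its expectation 2p (VanRaden's Z = M - P)
    z = [[ind[j] - 2 * p[j] for j in range(n_snp)] for ind in genotypes]
    # Scale so the matrix is comparable to a pedigree-based relationship matrix
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(zi[k] * zj[k] for k in range(n_snp)) / scale for zj in z] for zi in z]


# Hypothetical dosages: four individuals, five SNPs
example = [[0, 1, 2, 1, 0],
           [1, 1, 2, 0, 0],
           [2, 0, 1, 1, 2],
           [1, 2, 0, 1, 1]]
for row in vanraden_kinship(example):
    print(" ".join(f"{value:6.2f}" for value in row))
```

Because the dosages are centered on the sample allele frequencies, relationships are expressed relative to the genotyped set itself, so the matrix entries sum to zero.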
Half of the individuals in the germplasm set had colored fruit (orange-red, pink, red, or dark red) and the other half had non-colored fruit, based on the mean percentage of overcolor of the fruit harvested from each genotype. Fruit coloration ranged from blushed fruit (10% to 20% of the surface covered) through bicolored fruit (20% to 60%) to fully colored fruit (60% to 100%) (Supplemental Fig. 1). Of the colored fruit, 58.1% showed a blush pattern and 41.9% a striped pattern. Colorimeter L* values (lightness) ranged from 35.51 (dark colors) to 80.13 (light colors), a* from –17.63 (green) to 41.27 (red), and b* from 14 to 59.48 (different shades of yellow). Hue angle ranged from a minimum of 22.5 to a maximum of 111.7, with lower values indicating more intense red coloration. Anthocyanin concentration ranged from 6.02 to 77.66 nmol·cm⁻² (Supplemental Table 1).
The regressions of anthocyanin concentration on L*, hue angle, and overcolor gave coefficients of determination (R²) of 0.97, 0.97, and 0.94 (P ≤ 0.001), respectively (Fig. 1). The phenotypic correlations among color traits (chromaticity values, anthocyanin concentration, and overcolor) were highly significant for all traits (P < 0.001; Table 1).
Citation: Journal of the American Society for Horticultural Science J. Amer. Soc. Hort. Sci. 139, 6; 10.21273/JASHS.139.6.619
Of the 384 SNP markers screened over the 94 individuals (Supplemental Fig. 2), 309 (80.5%) were polymorphic, displaying three clear clusters corresponding to the expected genotypes (i.e., AA homozygote, BB homozygote, and AB heterozygote). Ambiguous data points located between clusters were scored as missing data. Of the remaining markers, 53 (13.8%) were monomorphic and 22 (5.7%) failed to amplify.
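The MAF filter applied to the genotype calls can be sketched as follows. The call strings ('AA', 'AB', 'BB') and the use of `None` for missing or ambiguous data points are assumptions about how exported calls might be represented; the helper names are hypothetical.

```python
def minor_allele_frequency(calls):
    """MAF from genotype calls 'AA', 'AB' or 'BB'; None marks a missing or ambiguous call."""
    called = [c for c in calls if c is not None]
    if not called:
        return None  # the assay failed for every individual
    freq_b = sum(c.count("B") for c in called) / (2 * len(called))
    return min(freq_b, 1 - freq_b)


def retain_snp(calls, maf_min=0.05):
    """True if the SNP is polymorphic with MAF above the 0.05 cut-off used in the study."""
    maf = minor_allele_frequency(calls)
    return maf is not None and maf > maf_min


# Hypothetical calls for three SNPs
print(retain_snp(["AA", "AB", "BB", None]))   # polymorphic, one missing call
print(retain_snp(["AA"] * 20))                # monomorphic
print(retain_snp(["AA"] * 19 + ["AB"]))       # MAF = 1/40, below the cut-off
```

Missing calls are simply excluded from the allele count here; a real pipeline might also drop SNPs with a high proportion of missing data.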
The 295 SNPs with MAF higher than 0.05 (Supplemental Table 2; Supplemental Figs. 2 and 3) were used for the LD analysis, in which interallelic r² values (the association between alleles at two loci) were calculated for a total of 43,345 pairs of alleles, 2,565 of which were intrachromosomal comparisons. When r² was plotted against the physical distance between pairs of loci based on the apple genome assembly (Velasco et al., 2010) (Fig. 2), the curve describing LD decay with distance reached the r² significance threshold (r² ≥ 0.07) at 17 Mbp. For pairs of SNPs less than 10 Mbp apart, the average r² was 0.29 and 67.8% of comparisons had r² ≥ 0.07. These values decreased to 0.18 and 49.4% across all intrachromosomal comparisons, and to 0.02 and 4.1% for interchromosomal comparisons.
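Our reading is that the r² ≥ 0.07 threshold follows from the χ² test: under the null hypothesis of no linkage, n·r² is approximately χ² distributed with 1 df, so with n = 94 individuals the critical r² is 6.635/94 ≈ 0.0706. The sketch below makes that arithmetic explicit and shows how summary values such as the mean r² and the percentage of pairs above the threshold could be computed; the input format and example pairs are hypothetical.

```python
CHI2_CRIT_1DF = 6.635  # chi-square critical value, 1 df, alpha = 0.01 (from the text)


def r2_threshold(n_individuals, chi2_crit=CHI2_CRIT_1DF):
    """Smallest significant r^2, assuming the test statistic n * r^2 ~ chi-square(1 df)."""
    return chi2_crit / n_individuals


def summarise_ld(pairs, threshold, max_distance_bp=None):
    """Mean r^2 and % of pairs with r^2 >= threshold.

    pairs: hypothetical (distance_bp, r2) tuples for intrachromosomal SNP pairs;
    max_distance_bp restricts the summary to pairs closer than that distance.
    """
    selected = [r2 for dist, r2 in pairs if max_distance_bp is None or dist < max_distance_bp]
    mean_r2 = sum(selected) / len(selected)
    pct_above = 100.0 * sum(r2 >= threshold for r2 in selected) / len(selected)
    return mean_r2, pct_above


threshold = r2_threshold(94)  # 94 genotyped individuals
print(f"r2 threshold for n = 94: {threshold:.4f}")
toy_pairs = [(1_200_000, 0.52), (6_000_000, 0.11), (9_500_000, 0.03), (25_000_000, 0.01)]
print(summarise_ld(toy_pairs, threshold, max_distance_bp=10_000_000))
```

Because the threshold shrinks as 1/n, the same r² that is significant with 94 individuals could be non-significant in a smaller sample.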
The kinship matrix (Supplemental Fig. 4) shows that most pairs of individuals share between one-fourth and one-half of their genome.
The population structure (Supplemental Fig. 5) was evaluated by PCA. The first principal component separated the individuals into two clusters (possibly indicating the involvement of at least two pollen parents), whereas no clear separation was apparent along the second component, on which the individuals were evenly distributed.
The genotypic data for the 295 polymorphic SNP markers with MAF greater than 0.05 were tested for association with the phenotypic data. When –log10(P) was plotted against SNP position on the ‘Golden Delicious’ apple genetic map (Velasco et al., 2010) (Fig. 3), no association reached the genome-wide threshold of α < 0.01. Nevertheless, weak associations were found between marker GDsnp00766 and overcolor (P = 0.006), anthocyanin concentration (P = 0.019), and the colorimetric traits L* (P = 0.006), b* (P = 0.005), a*/b* (P = 0.008), and hue angle (P = 0.005) (Table 2). However, these P values were not significant at either the genome-wide or the comparison-wise level. Based on the LG9 assembly, marker GDsnp00766 is located 1.4 Mbp upstream of the anthocyanin-regulating gene MYB10.
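The gap between the reported P values and the comparison-wise threshold can be checked directly. This sketch assumes the Bonferroni correction implied by α = 0.01 over 295 tests and reuses the P values reported for GDsnp00766; it is an illustration, not GAPIT output.

```python
ALPHA_GENOME_WIDE = 0.01
N_SNPS_TESTED = 295
# Bonferroni comparison-wise threshold, about 3.4e-5
comparison_wise = ALPHA_GENOME_WIDE / N_SNPS_TESTED

# P values for marker GDsnp00766, as reported in the text (Table 2)
reported_p = {"overcolor": 0.006, "anthocyanin": 0.019, "L*": 0.006,
              "b*": 0.005, "a*/b*": 0.008, "hue angle": 0.005}

significant = {trait: p for trait, p in reported_p.items() if p <= comparison_wise}
print(f"comparison-wise threshold: {comparison_wise:.2e}")
print(f"traits passing: {sorted(significant) or 'none'}")
# How far the strongest association falls short of the threshold
print(f"smallest P / threshold = {min(reported_p.values()) / comparison_wise:.0f}")
```

Even the smallest reported P value is more than two orders of magnitude above the Bonferroni threshold, which is why these associations are described as weak.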
Our use of a 384-plex SNP assay for GWA analysis was a trial of a low-throughput multiplexed SNP assay in an apple breeding population. The panel of 384 SNPs used for the assay had previously been validated in ‘Golden Delicious’ apple using the lower-throughput 48-plex SNPlex™ technique (Applied Biosystems, Foster City, CA) (Micheletti et al., 2011; Velasco et al., 2010). We found that 309 SNPs (80.5%) were polymorphic, of which 295 with MAF greater than 0.05 were useful for association analysis. This high degree of SNP transferability between ‘Golden Delicious’ and our set of individuals is similar to that reported by Micheletti et al. (2011) (84%) in a more broadly based set of 123 M. ×domestica accessions comprising 119 elite selections, founders, and old cultivars, plus four rootstock cultivars.
The polymorphic SNPs we used covered the apple genome at an approximate density of one marker every 4 cM, based on a genetic length of 1200 cM (Velasco et al., 2010), and one marker every 2.5 Mbp, based on an approximate apple genome size of 740 Mbp. This density is a substantial increase over the microsatellite marker systems previously used for genome scans in apple (Patocchi et al., 2005), but lower than that of the SNP array used by Kumar et al. (2013). The marker density required for GWA studies is dictated by the rate of LD decay across the genome: if the level of LD is low, GWA analysis may be difficult because a high density of markers is required, although resolution will be very high; if LD is extensive, whole-genome scans are feasible with a lower density of markers, but resolution will be limited (Rafalski, 2002). Our analysis indicated that LD extends over a large physical distance in our population, which might have been expected to allow a genome scan for marker-trait associations using the 384-plex SNP assay. However, GWA analysis of red skin coloration in our population did not identify any significant marker association. Although associations between a marker on LG9 and the color phenotypes were weakly significant (0.005 ≤ P ≤ 0.019), they did not reach the genome-wide threshold. Nonetheless, the weak association on LG9 is consistent with previous studies of the position of loci controlling red coloration in apple. A major locus for skin color was mapped at the distal end of LG9 in a segregating population (Cheng et al., 1996; Maliepaard et al., 1998); furthermore, recent research on the molecular control of anthocyanin biosynthesis in apple has shown that a transcription factor controlling fruit skin, flesh, and foliage color (MYB10/MYB1) (Espley et al., 2007; Takos et al., 2006) maps to the distal end of LG9 (Chagné et al., 2007).
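The density figures for this panel are simple ratios, and comparing marker spacing with the 17-Mbp LD extent illustrates why LD should, in principle, have been sufficient for a genome scan. The sketch assumes the 295 markers retained after MAF filtering (the 309 polymorphic SNPs give similar values, about 3.9 cM and 2.4 Mbp per marker) and even marker spacing, which is an idealization.

```python
N_MARKERS = 295            # ASSUMPTION: SNPs retained after MAF filtering (309 polymorphic in total)
GENETIC_LENGTH_CM = 1200   # apple genetic map length used in the text
GENOME_SIZE_MBP = 740      # approximate apple genome size used in the text
LD_EXTENT_MBP = 17         # distance at which r^2 fell to the significance threshold

cm_per_marker = GENETIC_LENGTH_CM / N_MARKERS
mbp_per_marker = GENOME_SIZE_MBP / N_MARKERS
# Average number of markers expected within one LD span, if markers were evenly spaced
markers_per_ld_span = LD_EXTENT_MBP / mbp_per_marker

print(f"{cm_per_marker:.1f} cM and {mbp_per_marker:.1f} Mbp per marker")
print(f"about {markers_per_ld_span:.1f} markers per {LD_EXTENT_MBP}-Mbp LD span")
```

Average LD extent says little about LD at a specific locus, so a marker 1.4 Mbp from MYB10 can still show only a weak association in a sample of 94 individuals.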
In addition, alleles of MYB10/MYB1 were associated with skin color in a candidate gene-based association study in U.S. breeding material (Zhu et al., 2011). Based on the apple genome assembly (Velasco et al., 2010), MYB10 is located 1.4 Mbp from GDsnp00766, the closest marker we found to be weakly associated with skin color. Although a significant association between MYB10 and red skin color was expected in this study, our GWA strategy using a population of 94 individuals and 309 markers could not recover the association previously demonstrated with the candidate gene approach.
Our GWA analysis is a pilot study demonstrating that a marker density of 309 polymorphic SNPs is insufficient to detect the association between markers and a trait governed by a large-effect QTL in apple breeding material derived from a narrow genetic base. This indicates that the expected QTL effect size needs careful consideration before such a GWA strategy is used. Many other quantitative traits of high agronomic importance in apple are controlled by a few large- to medium-effect QTLs together with a multitude of small-effect QTLs (Chagné et al., 2012b; Gardiner et al., 2012; Khan et al., 2007; Kumar et al., 2012; Soufflet-Freslon et al., 2008), and we recommend higher-density SNP arrays for GWA analysis of such traits. Although GWA analysis of a breeding population comprising seven full-sib families suggested that medium- to large-effect QTLs can be detected using an 8000-SNP array (Chagné et al., 2012a; Kumar et al., 2013), we believe that a larger array [e.g., the recent 20,000-SNP array developed by the FruitBreedomics project (Laurens et al., 2010)] will be needed to identify smaller-effect QTLs.
This study is an example of a whole-genome scan performed for association analysis in apple. The scan did not identify markers associated with a large-effect QTL of which we had prior knowledge. Our results indicate that SNP arrays of higher density than the one used here are required to identify associations between markers and large-effect QTLs in a breeding population. To identify marker associations for agronomically important traits controlled by a few large-effect QTLs, we suggest GWA analysis of an existing breeding population using a panel of 20,000 SNPs as an alternative to QTL analysis of full-sib populations.
Contributor Notes
This work is part of a European Union-funded project in the “7th Framework Programme: Marie Curie Actions, People International Research Staff Exchange Scheme (FP7-PEOPLE-IRSES-2008)” proposal 130857 and was also funded by Spain’s Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA) through the project RTA 2009-00069-00-00. L.L. was supported by a doctoral fellowship from INIA.
Current address: Research Centre for Agriculture and Forestry Laimburg, I-39040 Ora/Auer (BZ), Italy.
Corresponding author. E-mail: David.Chagne@plantandfood.co.nz.