Genetic Diversity of Cultivated and Wild Capsicum Accessions from Guam and Tinian Using MIG-seq

Principal component analyses of 37 local Capsicum accessions across different multiplexed intersimple sequence repeat genotyping by sequencing (MIG-seq) data sets with varying levels of loci filtering (R), where a minimum proportion of individuals was required to retain a locus. The R values ranged from 0.04 to 0.9. The percentage of variation explained by each principal component is also shown. Across all data sets, the Capsicum accessions clustered into three distinct groups: group A (n = 20), group B (n = 15), and group C (n = 2). Further details for each data set are provided in Supplemental Appendix 1. PC1 = principal component 1; PC2 = principal component 2.

Maximum likelihood tree of 37 local Capsicum accessions based on 585 single-nucleotide polymorphisms generated from multiplexed intersimple sequence repeat genotyping by sequencing (MIG-seq) analysis, rooted using two outgroup accessions from group C (C. chinense). The tree was constructed using the GTR + G + I nucleotide substitution model, and branch robustness was estimated with 1000 bootstrap replicates. Branches with bootstrap values greater than 50% are indicated. Three distinct genetic groups were identified with high bootstrap value: group A (n = 20), group B (n = 15), and group C (n = 2). Each group is represented by a different color.

Neighbor-Net network of 37 local Capsicum accessions based on 585 single-nucleotide polymorphisms generated from multiplexed intersimple sequence repeat genotyping by sequencing (MIG-seq) analysis. Three distinct genetic groups were identified: group A (C. frutescens, n = 20), group B (C. annuum, n = 15), and group C (C. chinense, n = 2).

Population structure of accessions in group A, Capsicum frutescens, based on 465 identified single-nucleotide polymorphisms using Bayesian clustering analysis in STRUCTURE. (A) Plot of L(K) (mean ± standard deviation) and ΔK values for the different number of assumed genotypic groups. (B) Bar plots showing genetic admixture proportions for each Capsicum accession for 2 (K = 2), 3 (K = 3), and 7 (K = 7) genotypic groups. Each genotypic group is represented by a different color.

Population structure of accessions in group B, Capsicum annuum, based on 644 identified single-nucleotide polymorphisms using Bayesian clustering analysis in STRUCTURE. (A) Plot of L(K) (mean ± standard deviation) and ΔK values for the different number of assumed genotypic groups. (B) Bar plots showing genetic admixture proportions for each Capsicum accession for 2 (K = 2), 3 (K = 3), and 4 (K = 4) genotypic groups. Each genotypic group is represented by a different color.

Fruit and flower morphology (inset) of Capsicum accessions: C. frutescens HP09 ‘DOAG Såli’ (A) and HP41 ‘Guåfi Down’ (B) with the yellowish-green to greenish-white corollas; and C. annuum accession HP61 ‘Hachon H2’ (C) with white corollas.
The genus Capsicum is a diverse group encompassing several wild and domesticated species native to tropical and temperate regions of the Americas. In cultivation, convergent domestication for desirable traits has resulted in significant morphological and genetic overlap across species, particularly within the globally distributed and economically valuable Capsicum annuum complex, which includes species such as C. annuum, Capsicum chinense, and Capsicum frutescens. In Guam, the hot pepper known as donne’ is an economically and culturally important crop first introduced by Spanish traders in the late 17th century. Since its introduction, Capsicum has become naturalized, growing in island forests as wild type and diversified as landraces maintained by local farmers and home growers. However, research on the genetic diversity of local Capsicum specimens remains limited. In this study, we used multiplexed intersimple sequence repeat genotyping by sequencing (MIG-seq) to assess the genetic diversity of 37 Capsicum accessions, including both cultivated and wild types, collected from Guam and Tinian in the Mariana Islands. Analysis of genome-wide single-nucleotide polymorphisms generated from MIG-seq strongly supported three distinct groups among the Capsicum accessions assignable to C. frutescens (n = 20), C. annuum (n = 15), and C. chinense (n = 2). Population structure analysis within the C. frutescens and C. annuum groups clustered accessions into seven and three distinct genotypic populations, respectively. The clustering of accessions into these genotypic groups elucidated shared ancestry among variants or clones of unknown origin. Furthermore, the evidence of genetic admixture in accessions between genotypic populations suggests introgression between cultivated and wild-type hot peppers and intraspecific cross-pollination between accessions. 
The study presents MIG-seq as a potential method for characterizing the genetic diversity in traditional Capsicum landraces, providing new knowledge of a valuable genomic resource for agriculture.
Characterization and conservation of genetic diversity in crop species remains essential in supporting the productivity and resilience of agricultural systems (Khoury et al. 2022). Genetic diversity within wild species, landraces, and breeding stocks has enabled farmers and plant breeders to develop new varieties possessing desirable morphological and phytochemical traits and tolerance to biotic and abiotic stressors in cultivation (Salgotra and Chauhan 2023). However, in the modernization of agriculture to satisfy consumer demands, monoculture, large-scale production, and homogenization to a few modern varieties has displaced many landraces traditionally grown by farmers (Khoury et al. 2022; Smolders 2006). Such genetic erosion poses a significant threat because traditional landraces serve as key genetic resources with adaptations that have been selected by farmers over generations or developed through introgression with wild relatives (Newton et al. 2011; van de Wouw et al. 2009; Zeven 1998).
The genus Capsicum L. (Solanaceae), commonly known as hot pepper, paprika, or chili, comprises approximately 43 species with a native range spanning tropical and temperate Central and South America (Carrizo-García et al. 2016, 2022; Hunziker 2001). Five domesticated species (C. annuum L., C. frutescens L., C. chinense Jacq., C. baccatum L., and C. pubescens Ruiz and Pav.) are cultivated worldwide as economically and culturally valuable vegetables, spices, and medicinal plants. Both pungent and nonpungent Capsicum varieties are widely consumed for their high nutritional value and distinct chemical profile of capsaicinoids, carotenoids, phenolics, and flavonoids (Antonio et al. 2018; Hamed et al. 2019). Recent statistics estimate global production of fresh peppers exceeding 36.9 million tons annually, grown across 2 million hectares (FAOSTAT 2022).
Capsicum is considered one of the oldest domesticated plants originating from the Americas (Perry et al. 2007; Pickersgill 1969). Alongside the diversity of wild Capsicum in the region, indigenous peoples cultivated hot pepper as a spice crop as early as 6000 BC (Basu and De 2003). Archaeological and genetic evidence suggests multiple centers of domestication, with C. annuum and C. frutescens likely originating from Mesoamerica, C. baccatum and C. pubescens from the Andean region, and C. chinense from tropical lowland South America (Albrecht et al. 2012; Eshbaugh 1993; Kraft et al. 2014; Moses et al. 2014; Pickersgill 2007). From its native range in the Americas, Capsicum was later introduced to Europe by the end of the 15th century and later spread into Africa, India, China, and Japan along routes in the spice trade (Andrews 1992; Bosland and Votava 2000; Davenport 1970). Human-driven selection under different agro-environments in both the New and Old Worlds has led to a second diversification of domesticated Capsicum with intra- and interspecific variations in morphology, flavor, and pungency (Aguilar-Meléndez et al. 2009; DeWitt and Bosland 1996; Sarpras et al. 2016).
Convergent domestication for desirable morphological and phytochemical traits across different Capsicum species has also resulted in hot pepper varieties with considerable overlap. Three major complexes of Capsicum are recognized based on morphological, cytogenetic, and genetic similarities: 1) C. annuum complex (C. annuum, C. chinense, C. frutescens, and C. galapagoense Hunz.); 2) C. baccatum complex (C. baccatum, C. chacoense Hunz., C. praetermissum Heiser and P.G.Sm, and C. tovarii Eshbaugh, P.G.Sm. and Nickrent); and 3) C. pubescens complex (C. pubescens, C. cardenasii Heiser and P.G.Sm, and C. eximium Hunz.) (Carrizo-García et al. 2016; Moscone et al. 2007). Species classification within these complexes originally relied on differences in key taxonomic characters such as plant pubescence, calyx constriction, corolla color, flower position, fruit appearance, and seed color (Eshbaugh 2012). However, morphological classification remains unreliable due to variations within species and shared characteristics between species, leading to potential misidentification of Capsicum varieties. Proper taxonomic identification and genotyping within these major complexes remain important, especially for characterizing traditional landraces and preserving genotypes to improve yield, pest and disease resistance, and other agronomic benefits (Cui et al. 2023; Newton et al. 2011; Smith and Heiser 1951; van de Wouw et al. 2009; Zeven 1998).
Recent advances in genetic studies using molecular markers and genome-wide approaches provide a more precise method for taxonomic classification in Capsicum spp. Such genetic information can also be used to infer evolutionary relationships that may not be clearly evident based on morphology alone. DNA barcoding approaches based on randomly amplified polymorphic DNA markers, plastome markers, and internal transcribed spacer sequences of nuclear ribosomal DNA have been previously used to describe Capsicum species with varying levels of success but are limited when there is not enough interspecific genetic differentiation in the target genes (Carrizo-García et al. 2016; Jarret and Dang 2004; Rodriguez et al. 1999; Shiragaki et al. 2020; Sun et al. 2014; Walsh and Hoot 2001). The development of high-throughput sequencing technologies, particularly genotyping-by-sequencing, has enabled robust analysis of genetic diversity and population structure in both domesticated and wild Capsicum (Colonna et al. 2019; Lozada et al. 2021; Pereira-Dias et al. 2019; Taranto et al. 2016; Tripodi et al. 2021). Multiplexed intersimple sequence repeat (ISSR) genotyping by sequencing (MIG-seq) is a novel, accessible, reduced-representation genotyping method using high-throughput sequencing technology that detects genome-wide single-nucleotide polymorphisms (SNPs) and genetic differences among individuals and populations (Suyama and Matsuki 2015). In MIG-seq, libraries are constructed in two polymerase chain reactions (PCRs) and can be prepared even from small amounts of DNA or from low-quality DNA. Although MIG-seq has rarely been applied to a major crop species such as Capsicum, the method has the potential to provide valuable insights into the genetic structure of populations, gene flow via pollination and seed dispersal, and hybridization between wild and cultivated plants.
Guam is the largest and southernmost of the Mariana Islands in the Western Pacific and lies between 13.2° and 13.7°N and between 144.6° and 145.0°E. The island spans 549 km2 in total, with an estimated 48% covered by forest (Donnegan et al. 2004; Young 1988). The island is characterized by a tropical marine climate with an average annual rainfall of 2540 mm (Lander and Guard 2003). The hot pepper, also known as donne’ in the native Chamorro language, is an economically and culturally important food in the Mariana Islands, including Guam, where it is eaten fresh or prepared in hot sauces or spicy dishes (Safford 1905; Stone 1970). Historical accounts suggest that hot pepper was first introduced to Guam by Spanish traders in the late 17th century alongside other major food crops such as corn, sweet potato, and cassava from the Americas and the Philippines (Rogers 2011; Stone 1970). In 1979, Fosberg et al. listed two Capsicum species, C. annuum and C. frutescens, present in Guam. A wild type known locally as donne’ såli is favored by locals for its small, bright red, pungent fruits, and it has been observed to be an important food source by its namesake såli bird or Micronesian starling (Aplonis opaca) (Egerer et al. 2017; Safford 1905; Stone 1970). Lee (1987) described one C. frutescens cultivar as the Guam Super Hot chili pepper. Over many generations, several traditional varieties or landraces have been maintained by local farmers and home growers. To date, studies characterizing the genetic diversity of wild and cultivated Capsicum in Guam remain limited, especially for peppers in the C. annuum and C. frutescens complex, which seem to comprise most of the traditional landraces grown in Guam. In our study, we used MIG-seq to characterize the genetic diversity, population structure, and taxonomic assignment of 36 Capsicum accessions collected in Guam and one from Tinian from the Mariana Islands.
Thirty-seven hot pepper accessions maintained at the University of Guam Horticulture Laboratory were studied. The majority of accessions had been collected earlier from wild habitats, local farmers, and local stores in Guam. One accession was collected from Tinian, an island of 102 km2 in the Northern Mariana Islands about 200 km north-northeast of Guam (Young 1989). A commercial cultivar, C. chinense ‘Chocolate Habanero’, was also included in the study as an outgroup taxon. Table 1 lists the 37 Capsicum accessions studied, including accession name, collection site, fruit color, and fruit size. An accession does not necessarily represent a cultivar, but each is a locally recognized type of hot pepper in Guam.
The plants were grown at a plant nursery at the University of Guam Horticulture Laboratory. Fresh leaf samples from one representative individual of each hot pepper accession were collected, dried in envelopes with silica gel, and stored at −20 °C until use. Dried leaf tissue was pulverized using a TissueLyser (Qiagen, Hilden, Germany), and total genomic DNA was extracted using the sucrose extraction solution described by Berendzen et al. (2005). DNA concentration and purity were assessed using a Qubit 2.0 (ThermoFisher Scientific, Carlsbad, CA, USA) and a NanoPhotometer N60 (Implen, Munich, Germany). Genome-wide SNPs were obtained by MIG-seq (Suyama and Matsuki 2015). Construction of the MIG-seq library involved two PCR steps, wherein (1) ISSR regions were amplified by PCR using universal primer sets and (2) amplicons were indexed by sample to allow high-throughput sequencing, following the protocols of Suyama and Matsuki (2015) and Suyama et al. (2022). PCR products of each sample were standardized at equimolar concentration using an MCE-202 MultiNA (Shimadzu, Kyoto, Japan) and pooled into a single volume to create the sample library. The MIG-seq sample library was size-selected for fragments between 300 and 800 bp using Sera-Mag SpeedBeads (Cytiva, Tokyo, Japan) according to the manufacturer’s protocol. The purified MIG-seq library was sequenced on a DNBSEQ-G400 sequencer (MGI Tech, Shenzhen, China) for PE150 cycles.
Adapter sequences and low-quality reads were removed using fastp with parameters q = 30 and u = 40, where q is the quality cut-off value a base must meet to count as qualified, and u is the maximum percentage of bases in a read that may fall below q before the read is discarded (Chen et al. 2018). Reads were then filtered against a reference chloroplast genome of C. baccatum var. pendulum (NC_072696) using Bowtie 2 (Langmead and Salzberg 2012), and any mapped chloroplast reads were removed using Samtools (Li et al. 2009). SNPs were identified from the resulting nuclear reads using reference mapping in Stacks 2 (Rochette et al. 2019). Reads were initially mapped to the reference genome of the nearest sister taxon of the C. annuum complex, C. baccatum var. pendulum (ASM3086422v1), using the “ref_map.pl” pipeline (Liu et al. 2023).
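The read-level quality rule can be illustrated with a minimal Python sketch. It assumes the documented behavior of fastp's -q and -u options: a base is qualified when its Phred score is at least q, and a read is discarded when more than u percent of its bases are unqualified. The function names are hypothetical and the code is not part of fastp or of the authors' pipeline.

```python
def phred_from_ascii(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def passes_quality_filter(phred_scores, q=30, u=40):
    """Keep a read unless more than u percent of its bases score below q.

    Illustrative re-implementation of the rule only; adapter trimming and
    the other fastp filters are not modeled here.
    """
    if not phred_scores:
        return False  # an empty read carries no usable bases
    low_quality = sum(1 for score in phred_scores if score < q)
    return 100.0 * low_quality / len(phred_scores) <= u


# Hypothetical 10-base read: six bases at Q40 and four at Q10
read_quality = "IIIIII++++"
keep = passes_quality_filter(phred_from_ascii(read_quality))
```

With q = 30 and u = 40, the toy read above sits exactly at the limit (40% of bases below Q30) and is retained; one more low-quality base would cause it to be dropped.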
The populations component of Stacks 2 was then run with the options --min-maf 0.05, --max-obs-het 0.7, --write-single-snp, and --min-samples-overall (or -R), which is the minimum percentage of individuals across populations required to process a locus. The -R parameter was varied and assessed at 0.04, 0.2, 0.4, 0.6, 0.8, and 0.9 to determine the optimal setting for balancing loci retention, SNP discovery, and average genotyping rate [1 – average missing rate (%)]. We selected R = 0.6 on this basis, with 1159 retained loci, 585 SNPs, and an average genotyping rate of 78.95% (Supplemental Appendix 1).
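The three filters can be sketched in Python for a single biallelic SNP. This is a simplified illustration, not the Stacks implementation: Stacks applies the sample threshold to loci, whereas the sketch applies it to one SNP's genotype calls. Genotypes are assumed to be coded as the number of alternate alleles (0, 1, or 2), with None for a missing call, and the function names are hypothetical.

```python
def snp_passes(genotypes, min_samples=0.6, min_maf=0.05, max_obs_het=0.7):
    """Apply sample-coverage, minor-allele-frequency, and heterozygosity filters.

    genotypes: list with one entry per individual (0, 1, 2, or None).
    """
    called = [g for g in genotypes if g is not None]
    if not genotypes or not called:
        return False
    # Proportion of individuals with a genotype call (analogous to R)
    if len(called) / len(genotypes) < min_samples:
        return False
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    obs_het = sum(1 for g in called if g == 1) / len(called)
    return maf >= min_maf and obs_het <= max_obs_het


def genotyping_rate(matrix):
    """Fraction of non-missing calls in an individuals-by-SNPs matrix."""
    total = sum(len(row) for row in matrix)
    called = sum(1 for row in matrix for g in row if g is not None)
    return called / total if total else 0.0


# Hypothetical SNP scored in 10 individuals, two of them missing
example_snp = [0, 1, 2, None, 0, 1, 0, 0, None, 1]
retained = snp_passes(example_snp)
```

Raising the sample threshold toward 0.9 in this sketch discards more SNPs but raises the genotyping rate of those that remain, which is the trade-off weighed when choosing R.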
Genetic relationships within and among the Capsicum accessions were assessed from the optimized SNP data set using principal component analysis (PCA) in R with the packages ‘adegenet’ and ‘ggplot2’. An unrooted phylogenetic network was created to visualize reticulate evolutionary patterns among accessions using SplitsTree, version 4.19.1 (Huson and Bryant 2006). A maximum likelihood phylogenetic tree was constructed to infer divergence patterns among accessions using RAxML-NG, version 1.2.0 (Kozlov et al. 2019). The best-fit nucleotide substitution model was evaluated using ModelTest-NG, version 0.1.7 (Darriba et al. 2020). We used a GTR + G + I model and performed 1000 bootstrap replicates to evaluate the relative robustness of each branch (Felsenstein 1985). The population structure of the major identified groups (groups A and B) was then analyzed separately to include more divergent SNP data in the analysis.
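As a rough illustration of how a PCA on SNP genotypes separates groups, the following Python sketch uses NumPy rather than the R packages named above. Mean imputation of missing calls and centering without scaling are assumptions of the sketch, not settings reported in the study, and the toy genotype matrix is invented for demonstration.

```python
import numpy as np


def snp_pca(genotypes, n_components=2):
    """PCA of an individuals-by-SNPs matrix of alternate-allele counts.

    Missing calls are np.nan and are replaced by the per-SNP mean
    (an assumption of this sketch). Returns the sample scores and the
    proportion of variance explained by each retained component.
    """
    g = np.asarray(genotypes, dtype=float)
    col_means = np.nanmean(g, axis=0)
    g = np.where(np.isnan(g), col_means, g)
    centered = g - g.mean(axis=0)
    # SVD of the centered matrix yields the principal axes directly
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Toy data: two groups of three individuals scored at five SNPs
toy = [
    [0, 0, 0, 2, 2],
    [0, 0, 1, 2, 2],
    [0, np.nan, 0, 2, 2],
    [2, 2, 2, 0, 0],
    [2, 2, 1, 0, 0],
    [2, 2, 2, 0, 1],
]
scores, explained = snp_pca(toy)
```

In the toy data, the first component carries most of the variance and places the two groups on opposite sides of zero, analogous to the separation of groups along PC1 and PC2.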
Because there are no reference genomes for C. frutescens, detailed analysis of each group was re-run using the Stacks 2 “denovo_map.pl” pipeline for de novo SNP identification. The parameters were optimized using the R80 method, wherein only polymorphic loci found in 80% of individuals were processed (Paris et al. 2017). The populations component of Stacks 2 was run with the options --min-maf 0.05, --max-obs-het 0.7, --write-single-snp, and --min-samples-overall (-R) 0.8. The population structure of the identified SNPs was assessed using Bayesian clustering methods in STRUCTURE (Pritchard et al. 2000). Markov chain Monte Carlo searches consisted of 100,000 burn-in steps, followed by 100,000 iterations in an admixture model. Ten replicate runs were conducted for each K value ranging from 1 to 10, indicative of the assumed number of genotypic populations. To determine the optimal K value, the logarithmic probability and ΔK values were assessed for each run using the Structure Harvester server and the Evanno method (Earl and vonHoldt 2012; Evanno et al. 2005). Replicate runs for the optimal K value were averaged and visualized using ‘ggplot2’.
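The ΔK statistic of the Evanno method can be sketched in a few lines of Python. The sketch assumes the commonly used form ΔK = |L(K+1) − 2L(K) + L(K−1)| / s[L(K)], with L(K) taken as the mean log probability across replicate runs and s[L(K)] as their standard deviation. The input values below are invented for illustration and are not results from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute delta K from replicate log probabilities.

    lnp_by_k: dict mapping each K to a list of L(K) values, one per
    replicate run. Delta K is undefined for the smallest and largest K,
    and is skipped when the replicates have zero standard deviation.
    """
    mean_l = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue
        second_difference = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        delta[k] = second_difference / sd
    return delta


# Hypothetical replicate L(K) values for K = 1 to 4
runs = {
    1: [-1000, -1002, -998],
    2: [-800, -801, -799],
    3: [-790, -795, -785],
    4: [-788, -780, -796],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
```

Because ΔK rewards a sharp change in the slope of L(K) relative to run-to-run noise, more than one K can show a peak, and a secondary peak may still be worth reporting for finer substructure.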
After quality filtering, an average of 298,195 reads (ranging from 163,995 to 1,044,869) were obtained from the 37 Capsicum accessions in this study. During reference mapping, about 4.9% of reads were mapped to the C. baccatum var. pendulum reference genome. When analyzing mapped reads, locus filtering at R = 0.6 was found to be optimal, generating 1159 loci composed of 174,207 sites (bp). A total of 585 SNPs (variant sites) were retained with an average genotyping rate of 78.95%. PCA of SNPs identified from the 37 Capsicum accessions resolved three major groups: group A (n = 20), group B (n = 15), and group C (n = 2), based on high genetic similarity across all variations in locus filtering (Fig. 1). Maximum likelihood analysis of the Capsicum accessions further supported these distinct groups with high bootstrap values (Fig. 2). Similarly, the Neighbor-Net network of the 37 accessions based on 585 SNPs generated from MIG-seq analysis indicated the same three distinct genetic groups (Fig. 3). Group A includes the wild types ‘Anao Såli’ (HP16) and ‘Cocos Island Såli’ (HP51), which fit the morphological description of C. frutescens. Group B includes the heirloom cultivars ‘Hachon’ (HP61) and ‘JB Mañu’ (HP04), which have morphological characteristics of C. annuum. Finally, group C includes the commercial C. chinense cultivars ‘Chocolate Habanero’ (HP26) and ‘Ghost Pepper’ (HP50). The present finding coincides with the report of naturalization of C. frutescens and C. annuum in Guam by Fosberg et al. (1979). Table 2 lists the accessions classified into the three groups based on the genetic analysis in this study.


Citation: HortScience 60, 4; 10.21273/HORTSCI18396-24


Using the R80 method in the de novo discovery of SNPs, 465 SNPs were identified with an average genotyping rate of 90.2% in group A (C. frutescens), and 644 SNPs were identified with an average genotyping rate of 91.0% in group B (C. annuum). Population structure analysis within group A identified ΔK peaks at K = 3 and K = 7, with both three and seven genotypic populations effectively explaining the SNP variation among accessions (Fig. 4A). We selected K = 7 to reveal fine-scale differences in population substructure between the Capsicum accessions of group A (Fig. 4B). Genetic admixture with two to four genotypic populations (F1, F2, F3, F4, F5, F6, or F7) was present in some accessions. ‘Tinian Såli’ (HP17) and ‘JM Small’ (HP56) showed a genetic admixture of F1 and F2; ‘RT Såli’ (HP03) and ‘Anao Såli’ (HP16) showed a genetic admixture of F1 and F6; ‘Vietnamese Mike’ (HP34) showed a genetic admixture of F1, F2, F5, and F6; and ‘Cocos Island Såli’ (HP51) showed a genetic admixture of F1, F3, F5, and F6.


Population structure analysis within group B (C. annuum) identified a maximum ΔK value at K = 3, with three genotypic populations effectively explaining the SNP variation among accessions (Fig. 5A). Genetic admixture with two to three genotypic populations (A1, A2, or A3) was present in some accessions (Fig. 5B). ‘Barcinas’ (HP07) and ‘Donne’ Poinsetta’ (HP29) showed similar genetic admixture of mostly A1 and A2, while ‘Hachon WR’ (HP13) and ‘Inarajan’ (HP62) showed a genetic admixture of A1, A2, and A3. The remaining accessions in group B showed SNPs dominated by a single genotypic population (A1, A2, or A3).


Genetic and morphological similarities between and among the domesticated Capsicum complexes have provided challenges in their taxonomic identification and genetic characterization globally (Eshbaugh 2012; Jarret and Dang 2004; Walsh and Hoot 2001). In our study, we demonstrated the application of MIG-seq to characterize the genetic diversity, population structure, and taxonomic assignment of 37 Capsicum accessions collected from Guam and Tinian. In analyzing sequence data generated from MIG-seq, a total of 1159 loci yielding 585 SNPs were detected in the final data set of all hot pepper accessions. PCA and maximum likelihood phylogenetic analysis on the SNP data resolved three genetically distinct groups with high support: group A (n = 20), group B (n = 15), and group C (n = 2), consistent with interspecific variation across the Capsicum accessions.
Preliminary field observations of growth morphology (Marutani M, University of Guam, unpublished data) suggest that certain morphological features support, to some extent, the Capsicum species assignment of the 37 pepper accessions. Accessions of group A generally produced small, erect, red fruits and yellowish-green to greenish-white corollas characteristic of C. frutescens (Smith and Heiser 1951; DeWitt and Bosland 1996). These were common characteristics of accessions classified as the wild type donne’ såli, particularly ‘RT Såli’ (HP03), ‘DOAG Såli’ (HP09; Fig. 6A), ‘Ånao Såli’ (HP16), ‘Tinian Såli’ (HP17), and ‘Cocos Island Såli’ (HP51) (Table 1). However, some accessions showed much larger and more varied fruit habits, most notably the ‘Guåfi’ type cultivars such as ‘Guåfi Up’ (HP06), ‘Guåfi Flat’ (HP14), and ‘Guåfi Down’ (HP41; Fig. 6B), which have previously been recognized as cultivars of C. annuum for these reasons. Our reclassification of ‘Guåfi’ as C. frutescens warrants further investigation into verifying the species assignments of traditional landraces in germplasm collections, as these can potentially be misclassified based solely on morphological taxonomy. Accessions of group B exhibited the greatest diversity, especially in fruit size, color, and shape, with some accessions closely resembling C. frutescens. Most accessions in this group shared a discriminating trait of white corollas characteristic of C. annuum (DeWitt and Bosland 1996; Smith and Heiser 1951), except for ‘Saipan’ (HP15), which had yellowish-green corollas as in C. frutescens. An example of C. annuum, ‘Hachon H2’ (HP61; Fig. 6C), is shown with its orange fruits and a white flower. The two accessions in group C shared a common morphological feature of C. chinense in that their fruits showed annular constriction at the base of the calyx (Smith and Heiser 1951; DeWitt and Bosland 1996). However, several accessions of C. annuum and C. frutescens also showed similar characters, namely ‘Dragon’ (HP01), ‘Guåfi Up’ (HP06), and ‘Guåfi Flat’ (HP14). The morphological diversity and consequential overlap between Capsicum spp. as demonstrated in our study highlight the limitations of morphological taxonomy in reliably delineating species. In contrast, MIG-seq enables more robust and accurate species identification, yielding results consistent with recent genotyping-by-sequencing studies that have clarified the taxonomic framework of domesticated and wild-type Capsicum, especially within the C. annuum complex (Colonna et al. 2019; Lee et al. 2016; Lozada et al. 2021; Pereira-Dias et al. 2019; Taranto et al. 2016; Tripodi et al. 2021).


In addition, the high-throughput data generated from our study provided high-resolution insight into the genetic variation within Capsicum species. Population structure analyses within group A and group B revealed detailed genetic differences and gene flow between accessions. Within group A (C. frutescens group), seven major genotypic populations F1, F2, F3, F4, F5, F6, and F7 were identified with varying degrees of genetic admixture (Fig. 4B). Accessions containing the genotypic population F1 included ‘RT Såli’ (HP03), ‘Kika’ (HP03), ‘DOAG Såli’ (HP09), ‘Ånao Såli’ (HP16), ‘Tinian Såli’ (HP17), and ‘JM Small’ (HP56), which closely resembled the wild-type C. frutescens (donne’ såli). Separation of this group from other accessions suggests that the wild type may be a genetically distinct group from cultivated varieties as a result of divergent evolution in a natural setting. Furthermore, the high degree of genetic admixture within this group may result from introgression from cultivated varieties after introduction into cultivation. Notably, ‘Tinian Såli’ (HP17), collected from the island of Tinian in the Northern Mariana Islands, also clustered with this group yet showed significant genetic admixture with the genotypic population F2, which is predominantly observed in cultivated accessions such as the ‘Guåfi’ group. Unlike Guam, which has lost nearly all avifauna as a result of predation by the invasive brown tree snake (Boiga irregularis), the island of Tinian hosts several frugivorous bird species, particularly the native Micronesian starling (Aplonis opaca), which functions as the primary seed disperser of wild-type C. frutescens on the island (Egerer et al. 2017). The high degree of genetic admixture between the genotypic populations F1 and F2 in ‘Tinian Såli’ (HP17) may be the consequence of introgression between wild-type and cultivated genotypes, given the ecological capacity of frugivorous avifauna to facilitate gene flow.
Accessions with the genotypic population F2 included a locally cultivated pepper, C. frutescens ‘Guåfi’, which was previously misidentified as C. annuum. Because open-pollinated ‘Guåfi’ showed some degree of phenotypic variation, three separate accession numbers (HP06, HP14, and HP41) were assigned to it. Other accessions with genetic similarity to ‘Guåfi’ included ‘Dragon’ (HP01), ‘Kathrina’ (HP63), and ‘Raymond’ (HP67) (Fig. 4B), which were collected from local farmers in Guam. Given the seed and plant-sharing practices in Guam, it is possible that these peppers originated from a single type of pepper of this variety. In collecting and saving seeds from presumably open-pollinated crops, natural cross-pollination might have occurred to generate both phenotypic and genetic variability in local peppers. Although Capsicum is generally recognized as a self-pollinating plant, it has been observed to undergo natural cross-pollination by insect pollinators (Liu et al. 2023; Tanksley 1984). Accessions of the genotypic population F3 included the genetically similar hot peppers ‘Quichocho’ (HP18) and ‘Donne’ Maseta’ (HP27) collected from different farmers, which may have originated from the same type of pepper. Accessions of the genotypic population F5 included the nearly identical ‘Boonie Pepper’ (HP10) and ‘Mai’ (HP20), grown by a local farmer and purchased at a local store, respectively, suggesting that the two may have originated from the same source. In this context, genome-wide approaches like MIG-seq can effectively detect genetically identical accessions of unknown origin and breeding history.
Within group B (C. annuum group), three major genotypic populations, A1, A2, and A3, were identified with variable degrees of genetic admixture (Fig. 5). Accessions of the genotypic population A1 included ‘Donne’ Pika’ (HP02), ‘JB Mañu’ (HP04), ‘Hachon WK’ (HP13), ‘GT Mañu’ (HP59), ‘Hachon H2’ (HP61), and ‘Guam X-mas’ (HP66), mostly belonging to variations of two heirloom varieties grown by local farmers: ‘Hachon’ and ‘Mañu’. Despite their striking morphological differences, namely that ‘Hachon’ ripens orange (Fig. 6C) while ‘Mañu’ ripens deep red, the two are grouped as more similar than other C. annuum accessions, which may be indicative of a common ancestry at some point in cultivation. Additionally, accession ‘Donne’ Pika’ (HP02), a local line maintained by the University of Guam, produces distinct deep red, wide, curved, conical fruit similar to C. annuum ‘Mañu’ and may have originated from the same seed source. By their genetic similarity and distinct wide, curved, conical morphology, C. annuum ‘Hachon’, ‘Mañu’, and ‘Donne’ Pika’ belong to the same morphotype. Accessions of the genotypic population A2 included ‘Donne’ Poinsetta’ (HP29), ‘GSC’ (HP40), ‘Madarang’ (HP48), ‘Ilo’ (HP52), and ‘Inarajan’ (HP62), which have similar long, tapering fruit and may constitute a distinct morphotype. Two accessions, ‘Maria Lynn’ (HP12) and ‘Saipan’ (HP15), were collected from local farmers, and ‘New Campus’ (HP35) was purchased from a local store. It is possible that the store-bought pepper originated from seed stock similar to that of the two local farmers. This finding illustrates that the genetic data generated from MIG-seq show a strong potential to investigate the genetic identity of peppers with unknown origin and breeding history. Additionally, accessions of the genotypic population A3 have similar narrow, sometimes erect, conical fruit and may constitute another distinct morphotype.
Genetic admixture was mostly apparent in ‘Inarajan’ (HP62), ‘Donne’ Poinsetta’ (HP29), and ‘Barcinas’ (HP07), which may be the consequence of cultivation in an open-pollinated setting, facilitating gene flow between genotypic populations. One of the striking features of ‘Barcinas’ (HP07) is its very high capsaicin content compared with other C. annuum accessions (Marutani M, University of Guam, unpublished data). The population substructure of C. annuum based on fruit shape observed in our study aligns with findings from other studies that similarly identified subpopulations associated with specific fruit morphotypes (Colonna et al. 2019; Du et al. 2019; Ortega-Albero et al. 2024; Pereira-Dias et al. 2019; Tripodi et al. 2021; Wang et al. 2024). These results prompt additional investigation into the variations within C. annuum as one of the most phenotypically and genetically varied domesticated Capsicum species (Stommel and Bosland 2007; Zhigila et al. 2014).
In using MIG-seq to genotype 37 Capsicum accessions collected from Guam and the island of Tinian in the Northern Mariana Islands, we identified three genetically distinct groups assignable to C. frutescens (n = 20), C. annuum (n = 15), and C. chinense (n = 2). Population structure analysis within the C. frutescens and C. annuum groups further revealed detailed genetic structure within populations and genetic admixture between populations suggestive of intraspecific cross-pollination. The findings from MIG-seq demonstrate its capacity to characterize the genetic structure of populations, gene flow via pollination, and introgression between wild and cultivated plants in a major crop, Capsicum. This information is valuable for characterizing and understanding the genetic diversity of poorly studied traditional landraces, which are among the most promising genetic resources for crop development and resilience.

Principal component analyses of 37 local Capsicum accessions across different multiplexed intersimple sequence repeat genotyping by sequencing (MIG-seq) data sets with varying levels of loci filtering (R), where a minimum proportion of individuals was required to retain a locus. The R values ranged from 0.04 to 0.9. The percentage of variation explained by each principal component is also shown. Across all data sets, the Capsicum accessions clustered into three distinct groups: group A (n = 20), group B (n = 15), and group C (n = 2). Further details for each data set are provided in Supplemental Appendix 1. PC1 = principal component 1; PC2 = principal component 2.
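
The locus filter described in this caption can be illustrated with a minimal sketch. It assumes a genotype matrix with individuals as rows and loci as columns, where `None` marks a missing call. The function name `filter_loci` and the toy data are hypothetical and are not part of the authors' pipeline.

```python
from typing import List, Optional

Genotype = Optional[int]  # None marks a missing genotype call


def filter_loci(matrix: List[List[Genotype]], r: float) -> List[int]:
    """Return indices of loci genotyped in at least a proportion r of individuals.

    matrix[i][j] is the genotype of individual i at locus j (None if missing).
    """
    if not matrix:
        return []
    n_individuals = len(matrix)
    n_loci = len(matrix[0])
    kept = []
    for j in range(n_loci):
        # Count the individuals with a non-missing call at locus j.
        called = sum(1 for row in matrix if row[j] is not None)
        if called / n_individuals >= r:
            kept.append(j)
    return kept


# Toy data: 4 individuals x 3 loci (hypothetical values, not from the study).
toy = [
    [0, None, 1],
    [1, None, 1],
    [0, 2, None],
    [2, None, 1],
]

strict = filter_loci(toy, 0.9)    # only loci called in >= 90% of individuals
lenient = filter_loci(toy, 0.04)  # nearly every locus passes
```

A higher R keeps fewer loci with less missing data, and a lower R keeps more loci at the cost of more missing calls. Comparing data sets across R values is one way to check that clustering is not driven by the filtering threshold.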

Maximum likelihood tree of 37 local Capsicum accessions based on 585 single-nucleotide polymorphisms generated from multiplexed intersimple sequence repeat genotyping by sequencing (MIG-seq) analysis, rooted using two outgroup accessions from group C (C. chinense). The tree was constructed using the GTR + G + I nucleotide substitution model, and branch robustness was estimated with 1000 bootstrap replicates. Branches with bootstrap values greater than 50% are indicated. Three distinct genetic groups were identified with high bootstrap value: group A (n = 20), group B (n = 15), and group C (n = 2). Each group is represented by a different color.

Neighbor-Net network of 37 local Capsicum accessions based on 585 single-nucleotide polymorphisms generated from multiplexed intersimple sequence repeat genotyping by sequencing (MIG-seq) analysis. Three distinct genetic groups were identified: group A (C. frutescens, n = 20), group B (C. annuum, n = 15), and group C (C. chinense, n = 2).

Population structure of accessions in group A, Capsicum frutescens, based on 465 identified single-nucleotide polymorphisms using Bayesian clustering analysis in STRUCTURE. (A) Plot of L(K) (mean ± standard deviation) and ΔK values for the different number of assumed genotypic groups. (B) Bar plots showing genetic admixture proportions for each Capsicum accession for 2 (K = 2), 3 (K = 3), and 7 (K = 7) genotypic groups. Each genotypic group is represented by a different color.

Population structure of accessions in group B, Capsicum annuum, based on 644 identified single-nucleotide polymorphisms using Bayesian clustering analysis in STRUCTURE. (A) Plot of L(K) (mean ± standard deviation) and ΔK values for the different number of assumed genotypic groups. (B) Bar plots showing genetic admixture proportions for each Capsicum accession for 2 (K = 2), 3 (K = 3), and 4 (K = 4) genotypic groups. Each genotypic group is represented by a different color.
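
The ΔK values plotted in panel A of the STRUCTURE figures are conventionally derived from the replicate L(K) values as the absolute second-order difference of mean L(K) divided by the standard deviation of L(K). The sketch below assumes that convention and applies the second difference to replicate means. The function name `delta_k` and the log-likelihood values are hypothetical and do not come from the study.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp: Dict[int, List[float]]) -> Dict[int, float]:
    """Compute delta K from replicate log-likelihoods L(K).

    lnp maps each K to the L(K) values of its replicate runs.
    Delta K is defined only for K values whose neighbours K-1 and K+1 were also run.
    """
    means = {k: mean(values) for k, values in lnp.items()}
    result = {}
    for k in sorted(lnp):
        if k - 1 not in means or k + 1 not in means:
            continue  # no second difference at the ends of the K range
        sd = stdev(lnp[k])
        if sd == 0:
            continue  # delta K is undefined when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical replicate log-likelihoods for K = 1..4 (two runs each).
runs = {
    1: [-1000.0, -1002.0],
    2: [-800.0, -802.0],
    3: [-790.0, -794.0],
    4: [-785.0, -795.0],
}

dk = delta_k(runs)
best_k = max(dk, key=dk.get)  # K with the largest delta K
```

Because ΔK needs both neighbouring K values, it cannot be evaluated at the smallest or largest K tested, which is why it is usually read alongside the L(K) curve itself.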

Fruit and flower morphology (inset) of Capsicum accessions: C. frutescens HP09 ‘DOAG Såli’ (A) and HP41 ‘Guåfi Down’ (B) with the yellowish-green to greenish-white corollas; and C. annuum accession HP61 ‘Hachon H2’ (C) with white corollas.
Contributor Notes
Current address for M.A.P.F.: Pacific Biosciences Research Center, University of Hawai’i at Mānoa, 1993 East-West Road, Honolulu, Hawai’i, 96822, USA.
Raw sequence reads for all samples are available via National Center for Biotechnology Information Bioproject PRJNA1140178. We thank the University of Guam Horticulture Lab and the Guam Plant Extinction Prevention Program. We also thank Alexander Greene, Chieriel Desamito, Maegan Delfin, Kiana Camacho, Jessica Muyco, Gerard Chargualaf, and Akihiro Nishimura for technical support. This project was funded by US Department of Agriculture/National Institute of Food and Agriculture Hatch Award 7004065, US Department of Agriculture/National Institute of Food and Agriculture Resident Instruction in Insular Areas Award 2023-70008-41051, and Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research JP23K20303 and JP21KK0131.
M.M. is the corresponding author. E-mail: marutanim@triton.uog.edu.
