Genome-enabled Breeding Strategies for Nitrogen Use Efficiency in Sweet Corn

(A) The mating plan used to create the hybrids. Each dotted line represents a parental line, and each dot represents a single hybrid. (B) Genetic diversity among individuals was analyzed by conducting a principal component analysis based on the markers for the lines and hybrids. (C) Distribution of nitrogen use efficiency traits. The blue values represent the best linear unbiased estimates for each trait. R1.LN, R3.LN, and R6.LN represent leaf nitrogen at stages R1, R3, and R6, respectively. R3.LN-H represents the trait measured in the hybrid population.

Manhattan plot with the single nucleotide polymorphisms (SNPs) associated with nitrogen use efficiency (NUE) traits. Candidate genes are indicated for some SNPs. (A) Plot of the R3.LN trait from the mixed linear model (MLM). (B) Plot of the R1.LN trait from the FarmCPU model and (C) plot of the R3.LN trait from the FarmCPU model. The dotted line represents the Bonferroni threshold, which is the same across all three plots (same number of SNPs used). Note that the scale of the x-axis is different in plots (A), (B), and (C).

Prediction accuracy of cross-validation scheme one (CV1) for the R1.LN, R3.LN, and R6.LN traits (A) and cross-validation scheme zero (CV0) for the R3.LN trait (B). The accuracy reported is the Pearson correlation between best linear unbiased estimates (BLUEs) and estimated breeding values from the genomic model. Note that the x-axis in (A) and that in (B) are on different scales. GBLUP = genomic best linear unbiased prediction; MT_GBLUP = multi-trait GBLUP model; MT_SpikeSlab = multi-trait spike–slab model; ST_BayesB = single-trait BayesB model; ST_GBLUP = single-trait GBLUP model.

F1 progenies predicted via SimpleMating. (A) Pairwise prediction of midparental values for all possible crosses among the line population plotted against the pairwise covariance coming from the additive relationship matrix. The dotted line represents the culling parameters. (B) Predicted crosses vs. the best linear unbiased estimates (BLUEs) for the trait measured in the hybrid population. The dotted line represents a regression line. NS = nonselected; S = selected by the SimpleMating algorithm.

Additive population mean and variance through 20 years of simulation for the three scenarios simulated. The shading around the curve represents the standard error (SE) for the 20 repetitions. (A) Mean and (B) variance. The OCS scenario (using the SimpleMating algorithm) yielded a smaller decrease in genetic diversity. Conv = truncated phenotypic selection; GS = truncated genomic selection; MPV = midparent value; OCS = optimum cross selection with the MPV-based performance.
Sweet corn is one of the most widely cultivated vegetable crops in the United States. Nitrogen plays a critical role in promoting plant growth and development and is essential for maximizing grain yield and accelerating genetic progress. In this study, we aimed to evaluate the integration of modern breeding tools into the sweet corn breeding program to improve nitrogen use efficiency (NUE) traits. A total of 693 inbred lines and 108 hybrids were assessed, with NUE traits measured across three developmental stages (leaf nitrogen at R1, R3, and R6). All inbred lines were whole-genome resequenced. We performed a genome-wide association study (GWAS) and implemented a set of genomic models to predict inbred and hybrid performance. We simulated two traits reflecting the genetic architecture of NUE traits in a sweet corn pipeline using coalescent theory to compare genomic selection strategies against a benchmark phenotypic selection scenario. Our results revealed significant genetic variation among inbred lines and hybrids for most NUE traits, indicating a complex genetic architecture. The GWAS identified candidate genes potentially associated with NUE. Among the traits evaluated in inbred lines, R1.LN showed the highest prediction accuracies (0.36–0.38), followed by R3.LN (0.29–0.33) and R6.LN (0.03–0.06). However, genomic prediction accuracy in the hybrid population was lower (R3.LN: 0.14). Notably, using molecular markers to optimize crosses emerged as the most effective strategy for the long-term improvement of NUE traits. In summary, genomic tools can enhance NUE traits, thus highlighting their potential to improve NUE not only in sweet corn but also in other vegetables and move toward more sustainable production.
Sweet corn (Zea mays L.) is an important vegetable crop worldwide. In the United States, it has consistently ranked among the top vegetables in both harvested area and total production in recent years (US Department of Agriculture 2025). To sustain high productivity, farmers commonly rely on synthetic nitrogen fertilizers. However, nitrogen fertilizers are a significant environmental pollutant and require substantial energy to produce (Wani et al. 2021; Yu et al. 2022). Breeding for nitrogen use efficiency (NUE) traits is important where breeders seek practices that promote more sustainable agriculture while reducing environmental impact. NUE is defined as the ratio of grain yield to available nitrogen (Moose and Below 2009). In other words, high values of NUE represent plants that are more efficient in using nitrogen (uptake and utilization), which could reduce the amount of synthetic nitrogen needed. Despite its importance, breeding for such traits in sweet corn has progressed slowly. Increases in field corn yield have been associated with increases in NUE (Govindasamy et al. 2023). However, NUE is not a trait that is directly selected for in sweet corn breeding programs. Future research is needed to understand the impact of explicitly selecting for NUE in a breeding program; however, increasing our understanding of the genetic control of this trait is a first step toward this goal.
NUE can be broken down into two components: uptake efficiency, which refers to the amount of available nitrogen that the plant can absorb, and utilization efficiency, which refers to how efficiently the plant converts the additional nitrogen taken up into yield (Gheith et al. 2022). On average, plants take up two-thirds of the total nitrogen by the time they shift from vegetative to reproductive growth (stage R1). Thereafter, some uptake continues (postsilking uptake) while remobilization begins and nitrogen is reallocated from the shoots to reproductive regions (Coque and Gallais 2007). Selecting for NUE (and its components) poses several challenges. Phenotyping for NUE is labor-intensive and time-consuming, and it often requires destructive sampling (Sanchez et al. 2023). Moreover, NUE is a complex trait governed by multiple physiological processes (Gheith et al. 2022). In sweet corn populations, the genetic architecture underlying this trait remains poorly understood.
Because of the importance and complexity of breeding for NUE traits in sweet corn, molecular markers represent a tool to enhance breeding efforts. For instance, markers can be used to identify genomic regions associated with NUE traits through a genome-wide association study (GWAS) (Sanchez et al. 2023). In the GWAS, a panel of markers is used in a regression-like analysis with the phenotypic response (the trait of interest) to determine whether changes in the markers are associated with changes in the phenotype. In addition, they can be incorporated in genomic selection (GS) models, where historical phenotypes together with molecular markers in the same individuals are used in a model to predict the performance of individuals in a new set of only genotyped individuals. In the context of a breeding program, GS models can be used to increase selective accuracy (i.e., enhance the certainty of which individuals to select or to discard), reduce phenotyping costs, shorten the breeding cycle, and guide crosses (Crossa et al. 2021; Marinho et al. 2022).
Our objectives were the following: provide insights into the genetic architecture (heritabilities, variance components, and correlations) of NUE traits in a sweet corn population; identify genomic regions associated with NUE traits and potential candidate genes for those traits; apply genomic models to predict the performance of inbreds and hybrids; and evaluate, through stochastic simulations, the long-term efficiency of genomic selection in a multitrait framework to improve NUE traits.
We used a diverse sweet corn population genotyped with whole-genome resequencing and phenotypes of key NUE traits from the lines and hybrids derived from the population to build a GWAS model for candidate gene identification and a genomic model (GS) to predict the performance of the individuals. In addition, simulations were conducted to infer the long-term effects of GS to improve NUE traits in a sweet corn program.
A diversity panel of 693 sweet corn inbred lines containing tropical and temperate-adapted materials was tested in this study. A sweet corn population assembled by Baseggio et al. (2021) was expanded using germplasm acquired from the US Department of Agriculture Germplasm Resources Information Network (GRIN) system, the University of Wisconsin-Madison Sweet Corn Breeding Program, and the University of Florida Sweet Corn Breeding Program. A subset of 49 inbred lines from the panel were crossed to create 108 unique F1 hybrids (Fig. 1) that were later used for the field evaluation. To generate the hybrids, we used an incomplete North Carolina II crossing scheme. This was a collaborative effort involving 24 elite inbred lines from the University of Florida Sweet Corn Breeding Program and 25 inbred lines from the University of Wisconsin-Madison Sweet Corn Breeding Program.


Citation: HortScience 60, 12; 10.21273/HORTSCI19004-25
For genotyping, DNA was extracted and sequenced with NovaSeq 2 × 150 bp reads at a shallow depth (mean depth = 8.45×) to create a high-density panel of single-nucleotide polymorphism (SNP) markers distributed throughout the genome. Reads were aligned to the Ia453-sh2 reference genome (Hu et al. 2021). Variants (including SNPs and small insertion-deletion polymorphisms) were called using both GATK (Van der Auwera and O’Connor 2020) and Freebayes (Garrison and Marth 2012). The overlapping variants resulting from these two methods (47,160,177 variants) were selected and further filtered using GATK best practices (DePristo et al. 2011) and by missingness (>30%) and minor allele frequency (>1%), resulting in a final panel of 28,498,353 variants (Colantonio et al. 2022).
Inbred lines were grown at the University of Florida Plant Science Research and Education Unit in Citra, FL, USA, in 2019, while the hybrid population was generated from inbreds in 2019 and grown in Citra, FL, USA, in 2021. A resolvable incomplete block design with two replications and nine incomplete blocks was used for the inbred line evaluation, whereas a resolvable incomplete block design with two replicates and four incomplete blocks was used for hybrids. Genotypes were planted in single-row plots with 12 plants per row. Plants were spaced at 7 inches apart, and rows were 36 inches apart. For each trial, seeds were treated with fungicide, and plots were irrigated and fertilized with nitrogen at 220 kg/ha.
The leaf nitrogen content was determined by a combustion analysis of pooled leaf samples collected from the leaf immediately above the uppermost ear on three representative plants per row. Sampling occurred at three key developmental stages: silking (R1, referred to as the R1.LN trait), 21 d after pollination or milk stage (R3, referred to as the R3.LN trait), and physiological maturity (R6, referred to as the R6.LN trait). This approach allowed us to track nitrogen content through plant development, including remobilization and contribution to yield. A combustion analysis was conducted by Waters Agricultural Laboratories, Inc. (Camilla, GA, USA). In the hybrid population, because of the high correlation between R1.LN and R3.LN, leaf nitrogen data were collected only at the R3 stage.
A two-stage analysis was implemented to evaluate the traits (Holland and Piepho 2024). In the first stage, the restricted maximum likelihood method (REML) and best linear unbiased prediction (BLUP) procedure (Henderson 1974; Patterson and Thompson 1971) were used along with the following linear mixed model: $y = Xb + Zg + Wp + e$ [1] where y is the vector of phenotypic data, b is the fixed effect of repetition and checks inside blocks (assumed as fixed) summed with the overall mean, g is the vector of genotypes (assumed as random), with $g \sim N(0, I\sigma^2_g)$, where $\sigma^2_g$ is the genotypic variance, p is the vector of blocks (assumed as random), with $p \sim N(0, I\sigma^2_p)$, where $\sigma^2_p$ is the block variance, and e is the residual effect (random), with $e \sim N(0, I\sigma^2_e)$, where $\sigma^2_e$ is the residual variance. Additionally, X, Z, and W were the incidence matrices for b, g, and p, respectively. Broad-sense heritability was calculated as the ratio of genotypic variance to phenotypic variance. For the second stage of the analysis, we assumed that the genotypic effect was fixed in the same model described, and best linear unbiased estimates (BLUEs) were estimated.
The variances of the random effects were tested using the likelihood ratio test (LRT) (Rao 1973), as follows: $LRT = -2(\log L_r - \log L)$ [2] where $\log L_r$ and $\log L$ are the logarithms of the maximum residual likelihood function of the reduced and full model, respectively. For the likelihood ratio test, the χ2 statistic with 1 degree of freedom (df) and a type I error probability of 5% was considered.
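As an illustration of the test described above, the following minimal Python sketch computes the likelihood ratio statistic from two residual log-likelihoods and refers it to a χ2 distribution with 1 df. The function name and the log-likelihood values are hypothetical and are not taken from the study; the original analysis was run in ASReml-R, not in this code.

```python
import math


def likelihood_ratio_test(loglik_reduced, loglik_full, alpha=0.05):
    """Compare a reduced and a full model with a 1-df likelihood ratio test.

    Returns the statistic, its p-value, and whether it is significant at alpha.
    """
    # LRT = -2 * (logL_reduced - logL_full); truncated at zero for safety.
    lrt = max(-2.0 * (loglik_reduced - loglik_full), 0.0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(lrt / 2.0))
    return lrt, p_value, p_value < alpha


# Hypothetical residual log-likelihoods (illustrative values only).
lrt, p_value, significant = likelihood_ratio_test(-1250.0, -1245.0)
print(f"LRT = {lrt:.2f}, p = {p_value:.4f}, significant = {significant}")
```

The closed-form tail probability holds only for 1 df; a test with more degrees of freedom would need a general χ2 survival function.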
Modeling of nongenetic effects was implemented to account for heterogeneity within the trial. The method allows for spatial autocorrelations among observations in a row–column fashion (Gilmour et al. 1997; Stefanova et al. 2009; Werner et al. 2024). Five models were implemented to expand the linear model in equation 1. The first model (model 1) did not consider any spatial pattern and was implemented as a benchmark comparison model. Model 1 was implemented by assuming spatially independent observations, $e \sim N(0, \sigma^2_e I_r \otimes I_c)$, where Ir is the identity matrix for rows (r × r) and Ic represents the identity matrix for columns (c × c).
Next, we modeled the spatial or correlated error. This error captures local and global trends, such as fertility gradients, small changes in soil composition, small-scale disease/insect damage at the plot level, and other factors (Coelho et al. 2021; Gilmour et al. 1997; Stefanova et al. 2009; Werner et al. 2024), that may affect the experiment under evaluation by creating dependency among spatially close plots. To do so, we modeled spatial correlation among rows and columns by including an autoregressive process of order 1 (AR1 structure) (Box et al. 2015). The residual was modeled as $\xi \sim N(0, \sigma^2_\xi [\Sigma_r(\rho_r) \otimes I_c])$ and $\xi \sim N(0, \sigma^2_\xi [I_r \otimes \Sigma_c(\rho_c)])$ for model 2 and model 3, respectively, where $\xi$ represents the spatially dependent or correlated error, ρr and ρc represent the correlations for rows and columns, respectively, and $\Sigma_c(\rho_c)$ and $\Sigma_r(\rho_r)$ represent the first-order autoregressive correlation matrices for columns and rows. Model 4 accounted for correlations among rows and columns simultaneously, with the residual structure denoted by $\Sigma_r(\rho_r) \otimes \Sigma_c(\rho_c)$.
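To make the row and column AR1 structures described above concrete, the sketch below builds first-order autoregressive correlation matrices and combines them with a Kronecker product. It assumes plots are ordered row by row (row index outer, column index inner); the grid size and the correlations are hypothetical, and the helper names are ours, not part of any package used in the study.

```python
def ar1_matrix(n, rho):
    """First-order autoregressive correlation matrix: entry (i, j) = rho^|i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def identity(n):
    """n x n identity matrix (independent observations)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kronecker(a, b):
    """Kronecker product of two square matrices stored as lists of lists."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
            for i in range(size)]


# Hypothetical field layout: 3 rows x 2 columns, with illustrative correlations.
n_rows, n_cols, rho_r, rho_c = 3, 2, 0.5, 0.2

model1 = kronecker(identity(n_rows), identity(n_cols))                 # no spatial pattern
model2 = kronecker(ar1_matrix(n_rows, rho_r), identity(n_cols))        # AR1 for rows only
model3 = kronecker(identity(n_rows), ar1_matrix(n_cols, rho_c))        # AR1 for columns only
model4 = kronecker(ar1_matrix(n_rows, rho_r), ar1_matrix(n_cols, rho_c))  # AR1 x AR1

# Correlation between plot (row 0, col 0) and plot (row 2, col 1) under model 4:
# rho_r^2 * rho_c^1.
print(model4[0][5])
```

In this separable structure, the correlation between two plots is the product of a row term and a column term, so it decays with distance in both directions.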
Finally, a random error, or nugget effect, which captures any additional variation that is not a trend, was included (Gilmour et al. 1997). In the model, this was achieved by adding an independent or uncorrelated term, with its own variance component ($\sigma^2_\eta$), to the structure implemented in model 4, which became model 5. Such a structure captures any noise, measurement error, or even intrinsic variability within the plots (random, independent, or nonspatially correlated) that could not be captured by the dependent structure (Gilmour et al. 1997; Wilkinson et al. 1983).
The best model for each trait was identified using the Bayesian Information Criterion (BIC) (Schwarz 1978). For each trait, the model with the lowest BIC among models 1 to 5 was taken as the best model. Spatial modeling was applied only to the field experiment with the inbred lines. All analyses conducted during the second stage used the genetic values generated by the selected model.
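The model comparison step can be sketched as follows, using the standard definition BIC = −2 logL + k ln(n). The log-likelihoods, parameter counts, and number of observations are hypothetical placeholders, not the values reported in Table 1, and the way ASReml-R counts parameters may differ from this simple convention.

```python
import math


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion: -2 * logL + k * ln(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Hypothetical fits for one trait: model name -> (log-likelihood, number of
# variance parameters). Values are illustrative only.
fits = {
    "model1": (-500.0, 3),
    "model2": (-495.0, 4),
    "model3": (-496.0, 4),
    "model4": (-494.5, 5),
    "model5": (-494.4, 6),
}
n_obs = 1000  # hypothetical number of plots

scores = {name: bic(loglik, k, n_obs) for name, (loglik, k) in fits.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best], 2))
```

In this toy example, the extra parameters of models 4 and 5 are not repaid by their small gains in likelihood, so the simpler row-only AR1 model is preferred.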
We hypothesized that genetic variants could be involved in the expression of NUE traits. To test this hypothesis, we conducted a genome-wide association analysis using two distinct methods. Initially, SNPs with minor allele frequency (<0.01) were filtered out, resulting in a set of 16,755,210 variants (hereafter referred to as the 16M set). The first method used the R package GAPIT (Wang and Zhang 2021), with the kinship matrix calculated using the default algorithm in the FarmCPU model. The analysis was conducted using a maxLoop threshold of 10 and a QTN threshold of 10. The second method used EMMAX (Kang et al. 2008) software to fit a standard univariate linear mixed model. For both methods, a Bonferroni-corrected significance threshold of 0.05 was applied to identify candidate SNPs, which represented a threshold of −log10(p) = 8.32.
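A Bonferroni threshold divides the nominal significance level by the number of tests and is usually reported on the −log10 scale used in Manhattan plots. The sketch below shows the arithmetic with a hypothetical number of tests and hypothetical P values; it does not reproduce the marker count underlying the threshold reported above.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Return the per-test P-value cutoff and its -log10 transform."""
    p_cut = alpha / n_tests
    return p_cut, -math.log10(p_cut)


def flag_significant(p_values, p_cut):
    """Mark each SNP as significant when its P value is at or below the cutoff."""
    return [p <= p_cut for p in p_values]


# Hypothetical example: 1,000,000 tests at a family-wise alpha of 0.05.
p_cut, neg_log10_cut = bonferroni_threshold(0.05, 1_000_000)
flags = flag_significant([1e-9, 3e-8, 0.2], p_cut)
print(p_cut, round(neg_log10_cut, 3), flags)
```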
The genome sequence of the maize line B73 (RefGen_v5) (https://www.maizegdb.org/) was used as the reference genome for candidate gene searching. The confidence interval of significant SNPs was determined based on the linkage disequilibrium (LD) of the population (Colantonio et al. 2022) at 1 Mb upstream and downstream of the significant SNPs. The confidence interval of each significant SNP was used to determine the search scope of candidate genes. First, the SCISPACE AI tool (PubGenius Inc. 2025) was used to identify genes with known functions related to nitrogen metabolism by using the prompt “candidate genes involved in nitrogen metabolism, uptake, utilization, and mobilization in maize,” which returned a description of genes with known functions related to nitrogen metabolism alongside cited articles. Any genes found in the artificial intelligence (AI)-provided list were preferred as candidate genes. Then, studies of NUE-related traits using the GWAS (He et al. 2020), transcriptomics (Singh et al. 2023; Zhang et al. 2024), network and regulatory analysis (Plett et al. 2017), and combined omics (Gong et al. 2020; Liu et al. 2012) suggested by the AI tool were used to assist with screening candidate genes.
We hypothesized that molecular markers can be used to predict the performance of individuals for NUE traits. To test this hypothesis, we implemented genomic selection models and tested their performance in cross-validation schemes. Genomic selection models were implemented in single-trait and multi-trait approaches. First, a genomic BLUP (GBLUP) model was used (VanRaden 2008), which was represented as follows: $y = \mu + Za + e$ [3] where y is the matrix of BLUEs estimated in the first stage of the analysis, µ is the overall mean for each trait, Z is the incidence matrix connecting the observations with the response variable, a is the vector of additive genetic effects (assumed as random), where $a \sim N(0, \Sigma_a \otimes G)$, with G as the additive relationship matrix (VanRaden 2008) and $\Sigma_a$ as the variance–covariance matrix for the genetic effects, and e is the residual effect (random), where $e \sim N(0, R_0 \otimes I)$, with $R_0$ as the residual variance–covariance matrix and I as an identity matrix. The GBLUP was also implemented in its univariate form.
The second set of models included marker-based models, where marker effects were estimated rather than individual predictions, as in the GBLUP model. Implementation involved two models: a univariate BayesB model (Meuwissen et al. 2001) and a similar implementation in a multitrait framework called the spike–slab model (Habier et al. 2011; Pérez-Rodríguez and de los Campos 2022). The model was implemented as follows: $y = \mu + X\beta + e$ [4] where y is the matrix of BLUEs from the first-stage analysis, µ is a vector of trait-specific overall means, X is the matrix for the predictors (i.e., the marker matrix for each individual in y), β is a matrix with the effect of each SNP (markers by traits), where $\beta \sim N(0, \sigma^2_\beta)$, with $\sigma^2_\beta$ as the variance of the nonzero markers (π), which follows an inverse χ2 distribution, and e is the residual (random), where $e \sim N(0, \sigma^2_e)$, with $\sigma^2_e$ as the residual variance.
In a BayesB model, a large proportion of the SNPs are assumed to have zero effects on the target trait (1 − π), while a small proportion (π) is considered to have nonzero effects. The prior distribution of the effect of each SNP is a mixture of a scaled-t distribution with probability π and a point mass at zero with probability 1 − π. In the spike–slab model, $\pi_k \sim \mathrm{Beta}(\alpha_1, \alpha_2)$, where α1 and α2 are the prior shape1 and shape2 parameters for the kth trait. An unstructured variance–covariance matrix (Ω) was chosen for the effects in this model, with the prior P(Ω) assumed to be an inverse Wishart distribution.
Using BGLR software, the residual variance is assigned a scaled inverted χ2 prior, $\sigma^2_e \sim \chi^{-2}(d_e, S_e)$, with de and Se being the df and the scaling parameter, respectively. Additionally, for R0, an inverse Wishart prior was assumed. The overall mean (μ) is assigned a flat prior.
A subset of 200,000 SNPs was randomly selected from the 16M set. The SNPs were filtered out by minor allele frequency (<0.05) and missingness (>0.80), yielding a total of 101,348 SNPs. The additive realized relationship matrix (VanRaden 2008) was generated with the aid of the R package AGHmatrix (Amadeu et al. 2023) and used for model prediction and cross optimization. The missing values of the markers were replaced by the mean of the markers, which is the default in AGHmatrix. For the hybrid combinations, we recreated the value of the markers of the hybrid by summing the values of the parent markers (Peixoto et al. 2024c). In addition, to summarize and visualize the genetic diversity of the populations (lines and hybrids), we plotted a principal component analysis using the prcomp() function from base R.
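The additive realized relationship matrix of VanRaden (2008) can be written as G = WW'/(2Σp(1 − p)), where W is the marker matrix centered by twice the allele frequency. The sketch below is a plain-Python illustration with missing genotypes replaced by the marker mean, as described above; it is not the AGHmatrix implementation, and the 0/1/2 dosage coding and the toy genotypes are assumptions made for the example.

```python
def vanraden_g(markers):
    """Additive relationship matrix from 0/1/2 dosages (None = missing).

    Missing calls are replaced by the marker mean before centering.
    """
    n_ind, n_mrk = len(markers), len(markers[0])

    # Mean dosage per marker, computed from observed genotypes only.
    means = []
    for j in range(n_mrk):
        observed = [row[j] for row in markers if row[j] is not None]
        means.append(sum(observed) / len(observed))

    # Allele frequency p = mean dosage / 2; scaling factor 2 * sum(p * (1 - p)).
    freqs = [m / 2.0 for m in means]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)

    # Center each marker by 2p; imputed cells therefore become zero.
    centered = [[(means[j] if row[j] is None else row[j]) - 2.0 * freqs[j]
                 for j in range(n_mrk)] for row in markers]

    return [[sum(a * b for a, b in zip(centered[i], centered[k])) / scale
             for k in range(n_ind)] for i in range(n_ind)]


# Toy genotypes for four lines and three markers (illustrative only).
toy_markers = [
    [0, 2, 1],
    [2, 0, None],
    [1, 1, 1],
    [2, 2, 0],
]
G = vanraden_g(toy_markers)
for row in G:
    print([round(value, 3) for value in row])
```

Because every marker column is centered at its own mean, each row of G sums to zero, which is why pairs of lines can have negative relationship coefficients.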
Cross-validation schemes were used to evaluate the predictive performance of genomic models for unobserved genotypes. Two schemes were implemented. First, cross-validation scheme one (CV1) involved five-fold cross validation applied to the population of inbred lines and repeated 10 times to ensure robustness. The second scheme (CV0) was designed to reflect a more realistic breeding program scenario in which models are expected to predict the performance of future hybrids in untested conditions. Using this approach, all inbred lines, including the parents, were used to predict hybrid performance. The prediction accuracy of the genomic models under each scenario was assessed by calculating the correlation between the estimated breeding values and the BLUEs obtained in the first-stage analysis for each trait.
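The CV1 scheme described above (five folds, 10 repetitions, accuracy as a Pearson correlation) can be sketched as follows. The predictor used here is a simple one-covariate regression that stands in for the genomic model so that the example runs on its own; it is not GBLUP or BayesB, and the data are synthetic. Averaging the correlation over folds is one common convention and is assumed here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kfold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_accuracy(blues, fit_predict, k=5, repeats=10, seed=1):
    """Mean fold-wise correlation between predictions and BLUEs."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        for test_idx in kfold_indices(len(blues), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(blues)) if i not in held_out]
            predictions = fit_predict(train_idx, test_idx)
            accuracies.append(pearson(predictions, [blues[i] for i in test_idx]))
    return sum(accuracies) / len(accuracies)


def make_regression_predictor(x, y):
    """Stand-in predictor: simple linear regression of y on one covariate."""
    def fit_predict(train_idx, test_idx):
        xs = [x[i] for i in train_idx]
        ys = [y[i] for i in train_idx]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
                 / sum((a - mx) ** 2 for a in xs))
        return [my + slope * (x[i] - mx) for i in test_idx]
    return fit_predict


# Synthetic data: 50 "lines" with one covariate and a noisy response.
covariate = list(range(50))
blues = [0.5 * v + (v * 7) % 5 for v in covariate]
accuracy = cv_accuracy(blues, make_regression_predictor(covariate, blues))
print(round(accuracy, 3))
```

Under CV0, the same accuracy function would be applied once, with all inbred lines as the training set and the hybrids as the test set, instead of looping over folds.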
We synthetically generated all possible pairwise crosses among the individuals in the line panel to develop a mating plan targeting the improvement of NUE traits. A total of 39,778 crosses were predicted from all combinations of the 546 phenotyped lines. Marker data and estimated SNP effects were used to predict the midparent value (MPV) for each F1 combination for both traits—R1.LN and R3.LN—using the getMPV() function from the SimpleMating package (Peixoto et al. 2025). Equal weights (1:1) were assigned to both traits to ensure balanced selection pressure.
Subsequently, the top 100 crosses were selected to form a mating plan using the selectCrosses() function under constraints that limited each parent to a maximum of two crosses and imposed a co-ancestry threshold of −0.02 to minimize relatedness. The SNP effects used for prediction were obtained from the BGLR package via the Multitrait() function by applying a spike–slab model (Pérez-Rodríguez and de los Campos 2022).
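The logic of the mating plan can be illustrated with a simplified sketch: breeding values are computed from markers and SNP effects, the midparent value of each cross is the average of its parents' values on an equally weighted two-trait index, and crosses are then picked in rank order subject to the parent-usage and co-ancestry constraints. This is a greedy stand-in written for illustration, not the algorithm inside SimpleMating's getMPV() or selectCrosses(); all function names, genotypes, effects, and relationship values below are hypothetical.

```python
from itertools import combinations


def breeding_values(markers, effects):
    """Sum of marker dosage times SNP effect for each line."""
    return [sum(m * e for m, e in zip(row, effects)) for row in markers]


def midparent_values(bv_by_trait, weights):
    """Weighted index per line, then the parental average for every pair."""
    n = len(bv_by_trait[0])
    index = [sum(w * bv[i] for w, bv in zip(weights, bv_by_trait)) for i in range(n)]
    return {(i, j): (index[i] + index[j]) / 2.0 for i, j in combinations(range(n), 2)}


def select_crosses(mpv, relationship, n_crosses, max_per_parent=2, max_relationship=-0.02):
    """Greedy pick of top crosses under usage and co-ancestry constraints."""
    usage, plan = {}, []
    for (i, j), value in sorted(mpv.items(), key=lambda item: item[1], reverse=True):
        if relationship[i][j] > max_relationship:
            continue  # parents too closely related
        if usage.get(i, 0) >= max_per_parent or usage.get(j, 0) >= max_per_parent:
            continue  # a parent has reached its cross limit
        plan.append((i, j, value))
        usage[i] = usage.get(i, 0) + 1
        usage[j] = usage.get(j, 0) + 1
        if len(plan) == n_crosses:
            break
    return plan


# Hypothetical inputs: four lines, three markers, two traits.
markers = [[2, 0, 2], [0, 2, 0], [2, 2, 0], [0, 0, 2]]
effects_r1 = [0.5, 0.1, 0.2]  # illustrative SNP effects for R1.LN
effects_r3 = [0.3, 0.2, 0.1]  # illustrative SNP effects for R3.LN
relationship = [
    [1.0, -0.1, 0.3, -0.1],
    [-0.1, 1.0, -0.1, -0.1],
    [0.3, -0.1, 1.0, -0.1],
    [-0.1, -0.1, -0.1, 1.0],
]

bvs = [breeding_values(markers, effects_r1), breeding_values(markers, effects_r3)]
mpv = midparent_values(bvs, weights=[1.0, 1.0])  # equal (1:1) trait weights
plan = select_crosses(mpv, relationship, n_crosses=2)
print(plan)
```

In this toy case, the cross with the highest midparent value (lines 0 and 2) is rejected because the parents are related above the threshold, which is the trade-off between performance and diversity that the constraints are meant to enforce.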
We hypothesized that implementing a genomic selection model for NUE traits would be the most effective strategy for achieving long-term genetic gains in the sweet corn breeding program. Additionally, as previously described, the use of cross prediction and optimization could enhance the breeding program by improving the selection of superior crosses for NUE traits. To evaluate this hypothesis, we simulated the sweet corn breeding pipeline from the University of Florida (for further details, see Peixoto et al. 2024b). In the simulations, various scenarios were implemented, and genetic progress was monitored across selection cycles.
Briefly, the University of Florida sweet corn breeding program follows a doubled haploid (DH) pipeline with three rounds of hybrid evaluation in the target environment. The program operates across two environments: the target environment, where hybrid testing occurs over three consecutive testcross rounds, and an off-season nursery used for creating the crosses, hybrids, and DH lines. The program leverages two pseudo-heterotic groups: one composed of University of Florida-derived lines and the other consisting of proprietary material from a commercial partner. Crosses between these groups are evaluated as hybrids in the target environment.
We used the AlphaSimR package (Gaynor et al. 2021) to simulate 20 years of the sweet corn breeding program at the University of Florida. The breeding pipeline, including population sizes and selection decisions, followed the doubled haploid scheme thoroughly described by Peixoto et al. (2024a). To initiate the simulation, a base genome of 100 individuals was generated using the Markovian Coalescent Simulator (Chen et al. 2009). The Maize option was specified in the species argument to ensure realistic haplotype structures and genome characteristics representative of maize.
Two traits were simulated to reflect the nitrogen content at silking (R1.LN) and 21 d after pollination (R3.LN) by incorporating additive, dominance, and genotype-by-environment (G × E) effects. The traits were modeled with a genetic correlation of 0.48, mimicking the empirical relationship observed between R1.LN and R3.LN. The additive means and variances were set to 4.04/3.04 and 0.5/0.5, respectively, for R1.LN and R3.LN. Residual variances were set to 0.9 and 0.7, respectively. The G × E variance for each trait was set to three times the residual variance. A dominance degree of 0.93 and dominance variance of 0.2 were used for both traits. For the parameters that could not be estimated in the base population (G × E variance and dominance degree), we used values from previous studies of sweet corn in the simulated pipeline (for more details, please see Peixoto et al. 2024a, 2024b).
Three breeding scenarios were simulated and compared. All were designed to reflect feasible strategies for improving NUE within the current sweet corn breeding pipeline.
The conventional (Conv) scenario is the baseline and represents the existing breeding approach, which relies on truncated phenotypic selection and doubled haploid production. Parents are selected based on phenotypic values and randomly mated to generate 50 crosses at the start of each cycle. This scenario includes a 15 year burn-in phase, after which all scenarios begin from the same genetic background; the baseline then continues for an additional 20 years under the Conv strategy.
The genomic selection (GS) scenario replaces phenotypic selection with genomic selection. Individuals are selected at each stage based on estimated breeding values using a truncated selection strategy. Genomic prediction is applied to identify superior DH lines for use as parents and to advance lines throughout the testcross stages (see Peixoto et al. 2024a for details). As in the Conv scenario, 50 crosses are produced per cycle by randomly mating selected parents.
The optimum cross selection (OCS) scenario builds on the GS framework by adding an optimized mating plan based on genomic predictions. All DH lines are considered potential parents. Crosses are predicted using midparent values computed via the getMPV() function in the SimpleMating R package, followed by optimization with the selectCrosses() function. Constraints included a maximum of two crosses per parent and a pairwise co-ancestry threshold of −0.02 to limit relatedness and manage inbreeding.
The simulations and scenarios described were designed to mimic the breeding process for the R1.LN and R3.LN traits in the sweet corn pipeline. These traits are not only costly to measure but also moderately correlated (r = 0.48). Because R1.LN data were not available for the hybrid population, we simulated all three scenarios using selection based solely on the R3.LN trait. This allowed us to evaluate the indirect response in R1.LN and assess the extent of genetic gain achievable by phenotyping only at a single developmental stage for NUE traits.
The genomic model for the GS and optimum cross selection (OCS) scenarios was implemented in BGLR using the BGLR() function and the BayesB model to estimate marker effects. The training population was updated using a sliding window approach that retained the most recent 4 years of genotypic and phenotypic data. The training population records reflected parental performance, which was assessed with a general combining ability model.
The parameters, additive population mean and variance, were tracked to compare the performance of the scenarios at each cycle. The measurement was conducted in the parental population in each cycle.
The mixed models (REML/BLUP) for BLUE estimation, heritability calculation, and spatial model correction were implemented in the R package ASREML (Butler et al. 2017). The genomic models were implemented in BGLR (Pérez-Rodríguez and de los Campos 2014, 2022), with a burn-in of 1000 iterations and a total of 10,000 iterations (for breeding program simulations and CV1 and CV0 schemes). The datasets and scripts to reproduce all analyses can be found online (https://github.com/Resende-Lab/NUE-Traits_SweetCorn).
The phenotypic distributions of the measured traits indicated that phenotypic ranges were consistently broader among inbred lines compared with hybrids (Fig. 1C). The R6.LN trait was the only one not showing statistical significance for the genotypic effect, indicating that no differences in performance among lines could be detected for that trait. The broad-sense heritability varied across traits and populations (range, 0.03–0.41), with higher values generally observed in the inbred lines (e.g., R3.LN: 0.41 in lines vs. 0.18 in hybrids). This is evidence that the trait is moderately heritable, which, for breeding, opens a window to select and make genetic progress. Among the trait correlations, only the R1.LN–R3.LN pair showed a statistically significant relationship with a moderate correlation of 0.48.
Regarding the phenotypic assessment models, model 1, which did not account for spatial correlation, provided the best fit for R3.LN in the inbred line population. For R1.LN, model fit improved with the inclusion of an autoregressive structure for rows (model 2). For the R6.LN trait, the most complex model tested (model 5) achieved the lowest BIC; therefore, it was selected as the best-fitting model (Table 1 and Supplemental Material A Fig. 1A).
A genome-wide association analysis of the inbred diversity panel was performed to identify genomic variants associated with NUE trait expression. The R6.LN trait was excluded because of the lack of genotypic variance captured (Table 1). Based on the GWAS results of both methods [FarmCPU and mixed linear model (MLM)], a total of 17 significant SNPs reached the Bonferroni threshold across chromosomes (Fig. 2).


Using the MLM, there was no significant SNP for R1.LN, and we identified only one significant SNP for R3.LN. The significant SNP for R3.LN is located on chromosome 10. Using the FarmCPU model, we identified 16 significant SNPs: 11 for R1.LN and five for R3.LN. The significant SNPs for R1.LN were distributed across chromosomes 1, 3, 4, 5, 8, and 9, while the significant SNPs for the R3.LN trait were distributed across chromosomes 2, 4, 6, 7, and 8.
In addition, the quantile-quantile plot, which assesses how well the SNP P value distribution fits the null hypothesis of no association, showed that most SNPs aligned closely with the expected distribution along the diagonal line, with only a few deviating significantly (Supplemental Material Fig. 1B). This suggests that the models effectively corrected for population structure using the kinship matrix while also identifying potential true associations.
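The logic of a quantile-quantile check can be sketched as follows. The example computes expected and observed -log10(P) values and a genomic inflation factor from simulated null P values. The inflation factor is a common companion diagnostic that is not reported in the text, and the simulated data are purely illustrative.

```python
import math
import random
from statistics import NormalDist, median

def qq_points(p_values):
    """Return expected and observed -log10(P), ordered from smallest P to largest."""
    n = len(p_values)
    observed = [-math.log10(p) for p in sorted(p_values)]
    expected = [-math.log10((rank + 0.5) / n) for rank in range(n)]
    return expected, observed

def genomic_inflation(p_values):
    """Median 1-df chi-square statistic divided by its null median (about 0.455)."""
    normal = NormalDist()
    chi2 = [normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / 0.4549364

# Simulated P values under the null hypothesis (uniform on (0, 1]).
rng = random.Random(1)
null_p = [1.0 - rng.random() for _ in range(20000)]

expected, observed = qq_points(null_p)
lambda_gc = genomic_inflation(null_p)
print(f"genomic inflation factor = {lambda_gc:.3f}")
```

Under the null, the points fall along the diagonal and the inflation factor is close to 1; systematic departures along the whole curve would suggest uncorrected population structure.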
In the FarmCPU model, candidate genes were identified for three of the 11 significant SNPs from R1.LN and three of the five significant SNPs from R3.LN (Fig. 2, Table 2, and Supplemental Material B). For the R1.LN trait, one gene model (Zm00001eb042130) was found in the confidence interval of the SNP on chromosome 1 (position 222056270). The gene encodes an aminopeptidase involved in the peptide catabolic process. Two candidate genes, Zm00001eb002490 and Zm00001eb339390, were identified within 234 kb of the SNP on chromosome 1 (position 7261188) and within 276 kb of the SNP on chromosome 8 (position 29802249), respectively. These genes encode protein NLP7 and protein NLP4, respectively, both of which function as DNA-binding transcription factors involved in controlling nitrate-responsive gene networks. Another candidate gene, Zm00001eb214960, located within 169 kb of the SNP on chromosome 5 (position 7874493), encodes ammonium transporter 10, which is involved in transmembrane ammonium ion transport.
For the R3.LN trait, we selected a candidate gene within 80 kb of the SNP on chromosome 4 (position 192007595). The gene model, Zm00001eb196180, encodes a putative peptide/nitrate transporter (NRT) involved in the transmembrane transport of nitrate. We also found another nitrate transporter gene, Zm00001eb287950, encoding protein NRT1/PTR family 3.1, within 336 kb of the SNP on chromosome 6 (position 157366906). We selected the candidate gene Zm00001eb326420, within 167 kb of the SNP on chromosome 7 (position 171898601), which encodes a putative indole-3-acetic acid-amido synthetase GH3.1/GH3.8 and is involved in amino acid ligase activity.
Using the MLM, only one candidate gene was associated with the R3.LN trait. The gene model, Zm00001eb406990, located within 108 kb of the SNP on chromosome 10 (position 5917216), encodes a cationic amino acid transporter involved in the transmembrane transport of amino acids. Overall, candidate genes included multiple nitrate regulatory genes, nitrate transporters, and GABA transaminases. No significant SNPs were shared between R1.LN and R3.LN. The complete list of genes and their putative descriptions is provided in Supplemental Material B.
For the genomic selection models, we evaluated two key aspects: the impact of different statistical methods on prediction accuracy and the use of cross-validation schemes to validate the genomic model for selection efficiency in a breeding program targeting NUE traits. Among the traits evaluated in the inbred line population, R1.LN showed the highest prediction accuracies (0.36–0.38), followed by R3.LN (0.29–0.33) and R6.LN (0.03–0.06) (Fig. 3). These results are particularly interesting because the inbred panel can be used directly as a training population in a breeding program, either to select top-performing individuals or to discard poorly performing ones.
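The way prediction accuracy is computed under cross-validation can be sketched as follows: lines are split into folds, each fold is predicted from a model trained on the remaining lines, and the accuracy is the Pearson correlation between the held-out BLUEs and their predictions. The predictor below is a deliberately simple stand-in (regression on a single simulated covariate), not the GBLUP or Bayesian models used in this study, and all data are simulated.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def kfold(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]

def fit_predict(x, y, train, test):
    """Stand-in predictor: simple regression of y on one covariate x."""
    mx = sum(x[i] for i in train) / len(train)
    my = sum(y[i] for i in train) / len(train)
    slope = (sum((x[i] - mx) * (y[i] - my) for i in train)
             / sum((x[i] - mx) ** 2 for i in train))
    return [my + slope * (x[i] - mx) for i in test]

rng = random.Random(42)
n_lines = 100
score = [rng.gauss(0, 1) for _ in range(n_lines)]   # hypothetical genomic covariate
blues = [s + rng.gauss(0, 0.5) for s in score]      # simulated BLUEs

accuracies = []
for test in kfold(n_lines, 5, rng):
    held_out = set(test)
    train = [i for i in range(n_lines) if i not in held_out]
    predicted = fit_predict(score, blues, train, test)
    accuracies.append(pearson([blues[i] for i in test], predicted))

mean_accuracy = sum(accuracies) / len(accuracies)
print(f"mean accuracy over folds = {mean_accuracy:.2f}")
```

Replacing `fit_predict` with a genomic model would leave the rest of the loop unchanged, which is why the same accuracy metric can be compared across statistical methods.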


Although all statistical models showed similar trends across traits, multi-trait models consistently outperformed the others, with the MT_GBLUP model yielding the best overall correlation. The multi-trait models also showed a smaller spread of predictions for all traits compared with single-trait models. R6.LN showed low prediction accuracy, which may be attributable to the very low variability of this trait (Fig. 1).
To assess how well our models predict the performance of untested future generations (hybrids) and untested environments, models were calibrated using the inbred lines (which included the parents) to predict hybrid performance for the R3.LN trait. The predictive accuracy of the models for hybrids was lower than the accuracy observed within the inbred population in which they were trained (Fig. 3B). Among the models tested, GBLUP slightly outperformed BayesB in terms of prediction accuracy (0.14 vs. 0.13).
The list of the top 100 predicted crosses with the highest performance is provided in Supplemental Material C. These crosses involved 72 unique parents. The average predicted performance for the index for the selected crosses was 2.69, with an estimated inbreeding coefficient of 0.022 for the resulting generation. A correlation of 0.13 was observed between the predicted values of these crosses and the phenotypic values of the corresponding F1 hybrids evaluated in the field (Fig. 4).
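The general idea of ranking crosses by predicted mean while limiting relatedness between parents can be sketched as follows. This is a simplified illustration and not the SimpleMating algorithm itself: the function name, the relatedness cutoff, and the breeding values and relationship coefficients are all hypothetical.

```python
from itertools import combinations

def select_crosses(gebv, relationship, max_relationship, n_crosses):
    """Rank all pairwise crosses by midparental value, skipping pairs whose
    relationship coefficient exceeds the cutoff."""
    candidates = []
    for p1, p2 in combinations(sorted(gebv), 2):
        if relationship[frozenset((p1, p2))] > max_relationship:
            continue  # parents too closely related
        midparent = (gebv[p1] + gebv[p2]) / 2.0
        candidates.append((midparent, p1, p2))
    candidates.sort(key=lambda cross: cross[0], reverse=True)
    return candidates[:n_crosses]

# Hypothetical estimated breeding values for four inbred lines.
gebv = {"L1": 3.0, "L2": 2.8, "L3": 2.0, "L4": 1.0}

# Hypothetical pairwise relationship coefficients.
relationship = {
    frozenset(("L1", "L2")): 0.60,
    frozenset(("L1", "L3")): 0.10,
    frozenset(("L1", "L4")): 0.05,
    frozenset(("L2", "L3")): 0.20,
    frozenset(("L2", "L4")): 0.10,
    frozenset(("L3", "L4")): 0.15,
}

selected = select_crosses(gebv, relationship, max_relationship=0.30, n_crosses=2)
for midparent, p1, p2 in selected:
    print(f"{p1} x {p2}: midparental value = {midparent:.2f}")
```

In this toy case the cross between the two best lines is excluded because its parents are closely related, so the next-best combinations are selected instead.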


The set of simulations represented breeding strategies that could be realistically implemented in the sweet corn breeding program to improve NUE traits (Fig. 5 and Table 3). Among them, the most effective was the application of OCS, which led to a 20% higher genetic gain compared with the benchmark Conv scenario.


This gain was achieved through the optimization of crosses using the SimpleMating algorithm to select the best combinations throughout the pipeline. The target trait (mimicking R3.LN) exhibited the highest performance, while trait 2 (representing indirect selection in R1.LN) showed a lower, yet positive, genetic gain after 20 years of selection. As expected, genomic selection increased the genetic gain for both traits, outperforming the Conv scenario. These results provide strong evidence that genomic tools can effectively accelerate genetic gain for NUE traits in the sweet corn breeding program.
The heritability estimates and variance components for traits within the diversity panel fall within expected ranges, indicating meaningful breeding potential, consistent with findings in field corn (Mastrodomenico et al. 2018). In terms of genetic architecture, we identified that NUE traits in our population had a complex genetic architecture, with low to moderate heritability values. Notably, an increase in the mean value of R3.LN was observed in hybrids derived from the diversity panel, likely reflecting heterosis (Labroo et al. 2021), because these hybrids were generated from a subset of the inbred lines. Within the inbred population, R1.LN and R3.LN showed a moderate positive correlation (0.48), suggesting a degree of pleiotropy. However, the fact that this correlation is not perfect implies the existence of partially independent genetic mechanisms underlying the two traits. This finding underscores the importance of considering potential indirect responses when selecting for one trait because it may impact others, and it is relevant for designing effective breeding strategies (Covarrubias-Pazaran et al. 2022; Marulanda et al. 2021).
The application of spatial models for genotypic evaluation appeared to capture certain patterns of field variability affecting the traits under study. Interestingly, although all traits were measured within the same trial, the spatial correction patterns differed. This discrepancy likely reflects environmental influences that vary across different stages of plant development. For R6.LN, in particular, a row–column effect and a nugget effect were included in the spatial model (Supplemental Material A Fig. B). The nugget effect helps account for intrinsic measurement errors or small-scale environmental variation (Werner et al. 2024). While the inclusion of this component improved overall model performance, the genotypic effect for R6.LN was not statistically significant. This suggests an absence of genetic variation for this trait among the evaluated genotypes, implying that selection would yield no genetic gain. This conclusion is further supported by the very low heritability estimate for R6.LN (H² = 0.03). In summary, spatial models can enhance the precision of genotypic evaluations and, thus, increase the efficiency of breeding programs targeting NUE. However, their effectiveness relies on the presence of genetic variation in the trait of interest.
We used GWAS to identify candidate genes associated with NUE traits, specifically R1.LN and R3.LN. Because of the highly quantitative nature of these traits, we used the FarmCPU method based on the hypothesis that it would outperform the traditional MLM, particularly for traits controlled by many small-effect SNPs. This hypothesis was supported by the results: FarmCPU identified candidate genes for both traits, whereas the MLM detected associations for only one trait. In addition, the variance explained by each significant marker was small, which is also consistent with the quantitative nature of these traits.
Gene mining and annotation are the most laborious steps of running a GWAS pipeline. However, with the advances of AI in several fields, some tools could be used to speed this process. After identifying the genes near the significant SNPs for both traits, we used the SCISPACE AI tool to perform an initial screening to identify genes related to NUE traits in sweet corn. Among those, we could identify the following four relevant categories of genes: nitrogen uptake genes, including the high-affinity nitrate transporters (NRT2.2 and NRT2.5) and the low-affinity nitrate transporters (ZmNPF7.9 or NRT1.5) essential for seed development, facilitating nitrate transport from maternal tissues to the endosperm (Wei et al. 2021); nitrogen metabolism and assimilation genes, including glutamine synthetase and asparagine synthetase, which convert inorganic nitrogen into organic forms for plant use (Singh et al. 2023), and ammonium transporter 1, which plays a role in ammonium uptake and transport; nitrogen utilization and efficiency, including the transmembrane amino acid transporter family protein associated with nitrogen compound metabolic processes and NUE, and the MADS26 transcription factor, which, when overexpressed, enhances nitrate utilization (Zhang et al. 2024); and nitrogen remobilization genes, including ZmASR6 and the ATP-dependent Clp protease gene, which have been linked to nitrogen remobilization efficiency, particularly during leaf senescence, and are part of a complex regulatory network involving hormone signaling (Gong et al. 2020).
In this context, the AI tool was instrumental in accelerating candidate gene identification, offering a promising avenue to streamline candidate gene screening within breeding programs. However, GWAS results are known to include both true-positive and false-positive quantitative trait loci (Fernando et al. 2004; Hayes 2013). Future functional studies are essential to validate the efficiency and accuracy of the SCISPACE tool following the GWAS. While validating the candidate genes identified in this study will be strategic for their introgression into elite germplasm to enhance NUE performance, our work represents an initial step toward that goal.
Breeding for NUE traits in the sweet corn program can greatly benefit from the implementation of genomic selection across multiple stages of the pipeline. First, genomic prediction can be used to identify the best inbred lines to advance to hybrid development (Graciano et al. 2025; Peixoto et al. 2024a). At this stage, a model capturing only additive genetic effects may be sufficient to select the top-performing individuals. Second, the same model can be used to estimate marker effects and, from a pool of potential parental candidates, facilitate the design of crosses that are more likely to enhance NUE traits (Peixoto et al. 2024a). Ultimately, because the primary goal of the sweet corn breeding program is the development of high-performing hybrids, genomic models can also be applied to predict hybrid performance (Peixoto et al. 2024c; Zystro et al. 2021a, 2021b). This enables the selection of the most promising combinations to advance for field evaluation.
As a vegetable crop, sweet corn presents multiple breeding targets, making the selection of superior commercial genotypes a complex task. While this complexity can hinder genetic gains, it also presents an opportunity for quantitative geneticists to apply multi-trait genomic selection models that leverage correlations among traits (Calus et al. 2013; Cui et al. 2020; Sandhu et al. 2022). In our study, the use of a multi-trait GBLUP model led to increased predictive performance and genetic gain compared with its single-trait counterpart, resulting in improvements of 5% for R1.LN, 12.5% for R3.LN, and 120% for R6.LN. This added benefit of genomic information in multi-trait models is most likely attributable to the moderate correlation between traits.
Our results highlight that the genomic models for selecting lines with outstanding performance in NUE traits and for guiding cross prediction demonstrated good predictive accuracy and could be implemented immediately in the breeding pipeline. However, the performance of the model when predicting hybrid performance, particularly for the R3.LN trait, was not as strong as that in the inbred line panel. Although the predictive values were not necessarily poor, several factors may explain this difference: only a subset of the inbred lines was used as parents, and not all of those hybrid parents were genotyped; the number of hybrids generated was limited, which could have introduced bias in the analyses given that the experiment was small and/or poorly replicated; the experimental design for evaluating hybrids was suboptimal (although 108 hybrids were taken to the field, only 49 had phenotypes for the target trait); and parents were present in only one or a few crosses.
Following Peixoto et al. (2024b), we also implemented a model with nonadditive effects (additive plus dominance effects). We hypothesized that this could improve prediction accuracy in the hybrid population. However, consistent with that work, we found lower or similar prediction accuracy compared with that of the model with additive effects only (results not shown). Despite the decrease in accuracy when predicting untested hybrids, the predictive ability can still be valuable for breeding. In particular, the model can still be used to eliminate low-performing candidates, reduce phenotyping efforts, and ensure that the most outstanding candidates reach the field stage (Beyene et al. 2021).
The genomic model applied to the inbred line population demonstrated reliable predictive accuracy for both traits that were studied (R1.LN and R3.LN). Because of the moderate genetic correlation between these traits and their high phenotyping costs, the following key question arose: what would be the impact of applying genomic selection targeting only one trait on the long-term genetic gain for both? The results indicated a clear benefit of implementing genomic selection for NUE traits. Direct genetic gain was substantial for trait 1 (mimicking R3.LN); notably, a positive genetic response was also observed for trait 2 (mimicking R1.LN). This outcome emphasized the potential of indirect selection to improve correlated traits (Marulanda et al. 2021), and it is useful for defining breeding goals that target NUE traits in the sweet corn breeding program.
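The expected indirect response can be illustrated with the standard quantitative-genetics expressions for direct and correlated response to selection. This is a textbook approximation and not the stochastic simulation used in this study; the selection intensity, heritabilities, genetic correlation, and phenotypic standard deviations below are hypothetical.

```python
import math

def direct_response(intensity, h2_x, sd_p_x):
    """Expected response in the selected trait X: R = i * h2 * sigma_P."""
    return intensity * h2_x * sd_p_x

def correlated_response(intensity, h2_x, h2_y, r_g, sd_p_y):
    """Expected response in trait Y when selection acts on trait X:
    CR = i * h_X * h_Y * r_G * sigma_P(Y)."""
    return intensity * math.sqrt(h2_x) * math.sqrt(h2_y) * r_g * sd_p_y

# Hypothetical inputs (not estimates from this study).
intensity = 1.755   # approximate intensity when the top 10% are selected
h2_target = 0.40    # heritability of the selected trait
h2_second = 0.30    # heritability of the correlated trait
r_g = 0.50          # genetic correlation between the two traits

gain_target = direct_response(intensity, h2_target, sd_p_x=1.0)
gain_second = correlated_response(intensity, h2_target, h2_second, r_g, sd_p_y=1.0)
print(f"direct response     = {gain_target:.3f}")
print(f"correlated response = {gain_second:.3f}")
```

With a positive genetic correlation, the correlated trait improves even though it is never measured directly, although by less than the selected trait.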
Therefore, the use of genomic selection for predicting crosses is highly recommended: it not only improved trait performance over the long term but also helped maintain genetic variance when the SimpleMating algorithm was applied with appropriate constraints. Notably, the strategy adopted in this study did not incur additional costs to the breeding program because both prediction and optimization were based on existing marker effects and the genomic relationship matrix available in both scenarios. Beyond enabling a cost-neutral increase in genetic gain, the implementation of OCS transformed the breeding program into a more data-driven operation (Akdemir and Sánchez 2016; Gorjanc et al. 2018; Peixoto et al. 2025).
In a sweet corn breeding program, while some traits, such as plant height, ear width, and taper, are relatively easy and less laborious to measure (Gonzalez et al. 2022), others require more arduous and time-consuming techniques (e.g., NUE, phytoglycogen content, and disease resistance scores) (Mahon 2023). As we advance through the selection pipeline, it is crucial to understand how the improvement of one trait may impact others. This impact is known as the indirect response, and it arises from the genetic correlation between the main trait and secondary traits (Mrode 2014). A negative genetic correlation can lead to an undesirable tradeoff, whereby improving one trait causes a decline in another. To mitigate such outcomes, a selection index can be used. This approach assigns weights to each trait and combines them into a single index value, enabling balanced selection that accounts for the genetic relationships among traits (Batista et al. 2021; Marulanda et al. 2021; Silva et al. 2021).
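A selection index of the kind described above can be sketched as a weighted sum of standardized trait values. The weights, genotype names, and trait values below are hypothetical, and standardizing each trait before weighting is one simple choice among several possible index constructions.

```python
from statistics import mean, pstdev

def selection_index(values, weights):
    """Combine traits into one index: the weighted sum of z-scores per genotype."""
    index = {genotype: 0.0 for genotype in values}
    for trait, weight in weights.items():
        trait_values = [values[g][trait] for g in values]
        center, spread = mean(trait_values), pstdev(trait_values)
        for genotype in values:
            z = (values[genotype][trait] - center) / spread
            index[genotype] += weight * z
    return index

# Hypothetical estimated values for two traits in three genotypes.
values = {
    "G1": {"R1.LN": 3.0, "R3.LN": 2.0},
    "G2": {"R1.LN": 2.0, "R3.LN": 3.0},
    "G3": {"R1.LN": 1.0, "R3.LN": 1.0},
}
weights = {"R3.LN": 0.7, "R1.LN": 0.3}  # hypothetical economic weights

index = selection_index(values, weights)
ranking = sorted(index, key=index.get, reverse=True)
print("ranking:", ranking)
```

Because the target trait carries the larger weight, the genotype that excels in it ranks first even though another genotype is better for the secondary trait.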
For NUE traits, indirect selection appears to be beneficial. First, because these traits are labor-intensive and expensive to measure, targeting only one trait (i.e., measuring it in the laboratory) can still yield a positive response in the correlated trait. Second, borrowing information across traits using a multi-trait genomic selection framework can enhance prediction performance. This approach is particularly advantageous for traits that are difficult or costly to measure, expressed in late phenological stages, or have low heritability because it can improve predictive accuracy through shared genetic signals (Calus et al. 2013; Lyra et al. 2017; Sandhu et al. 2022).
Through this work, we computed genetic parameters and suggested that NUE traits in sweet corn have a complex genetic architecture but potential for breeding. Additionally, we provided GWAS hits and candidate genes that should be validated in downstream analyses. We also drew attention to the potential of genomic selection to accelerate and increase genetic gains for those traits and opened a window for testing the genomic model in larger populations, especially hybrids across different breeding stages. Finally, we showed that combining genomic selection with cross prediction and optimization can increase genetic gain for NUE traits in the long term. Therefore, including genomic tools to breed for NUE traits is effective and opens a new window for future implementations. Ultimately, these results and strategies can be used in the University of Florida breeding program and any other program that targets NUE.

(A) The mating plan used to create the hybrids. Each dotted line represents a parental line, and each dot represents a single hybrid. (B) Genetic diversity among the individuals was analyzed by conducting a principal component analysis based on the markers for the lines and hybrids. (C) Distribution of nitrogen use efficiency traits. The blue values represent the best linear unbiased estimation for each trait. R1.LN, R3.LN, and R6.LN represent leaf nitrogen at stages R1, R3, and R6, respectively. R3.LN-H represents the trait measured in the hybrid population.

Manhattan plot with the single nucleotide polymorphisms (SNPs) associated with nitrogen use efficiency (NUE) traits. Some candidate genes were indicated for each SNP. (A) Plot of the R3.LN trait from mixed linear model (MLM). (B) Plot of the R1.LN trait from the FarmCPU model and (C) plot of the R3.LN trait from FarmCPU model. The dotted line represents the Bonferroni threshold, which is the same across all three plots (same number of SNPs used). Note that the scale of the x-axis is different in plots (A), (B), and (C).

Prediction accuracy of cross-validation scheme one (CV1) for the R1.LN, R3.LN, and R6.LN traits (A) and cross-validation scheme zero (CV0) for the R3.LN trait (B). The accuracy reported is the Pearson correlation between best linear unbiased estimates (BLUEs) and estimated breeding values from the genomic model. Note that the x-axis in (A) and that in (B) are on different scales. GBLUP = genomic best linear unbiased prediction; MT_GBLUP = multi-trait GBLUP model; MT_SpikeSlab = multi-trait spike–slab model; ST_BayesB = single-trait BayesB model; ST_GBLUP = single-trait GBLUP model.

F1 progenies predicted via SimpleMating. (A) Pairwise prediction of midparental values for all possible crosses among the lines’ population plotted against the pairwise covariance coming from the additive relationship matrix. The dotted line represents the culling parameters. (B). Predicted crosses vs. the best linear unbiased estimates (BLUEs) values for the trait measured in the hybrid population. The dotted line represents a regression line. NS = nonselected; S = selected by the SimpleMating algorithm.

Additive population mean and variance through 20 years of simulation for the three scenarios simulated. The shading around the curve represents the standard error (SE) for the 20 repetitions. (A) Mean and (B) variance. The OCS scenario (using the SimpleMating algorithm) yielded a smaller decrease in genetic diversity. Conv = truncated phenotypic selection; GS = truncated genomic selection; MPV = midparent value; OCS = optimum cross selection with the MPV-based performance.
Contributor Notes
This work was supported by the National Institute of Food and Agriculture SCRI 2018-51181-28419, AFRI 2019-05410, and USDA-NIFA 2022-51181-38333.
M.R. is the corresponding author. E-mail: mresende@ufl.edu.
