Abstract
It is not appropriate to compare ratio-based expressions for different cultivars or treatments if a plot of the denominator versus the numerator of a ratio-based expression has a nonzero y-intercept and the values for either the denominators or numerators differ with cultivars or treatments. Whenever nonzero y-intercepts are encountered, the value for a ratio-based expression will be dependent on both the denominator and numerator. The “ratio problem” is demonstrated with shoot N concentration in blueberries (Vaccinium corymbosum L.) and amino acid accumulation in almonds [Prunus dulcis (Mill.) D.A. Webb]. Data were collected from the first and second growth flush of blueberry shoots on plants that were at two in-row spacings and two rates of N fertilizer. Free amino acid:total amino acid ratios were measured in dormant almond trees fertilized at different rates with and without foliar N supplements. Functions describing the relationship between dry weight and total N content in blueberry tissues have positive y-intercepts for both N fertilizer application rates. Functions describing the relationship between total amino acids and free amino acids in almond trees have a negative y-intercept. Differences attributable to fertilization rate in blueberries probably were the result of differences in N uptake and N utilization, but the effects of spacing and growth flush are indirect and can be accounted for by differences in dry weight. Likewise, effects of fertilization rate and foliar N supplement in almonds are indirect and can be accounted for by differences in the total amino acids in dormant trees. With regression one can determine if the relationship between the denominator and numerator differs for the groups or treatments being studied. When an analysis of covariance is used to account for differences in the denominators of ratio-based expressions, results are consistent with the regression analysis.
When a conclusion is based on statistical differences of a ratio-based expression, it is the researcher's responsibility to determine whether these effects are direct or indirect.
Biologic research uses many ratio-based expressions to interpret experimental results. When we evaluated citations in physiology subsections (developmental, environmental stress, photosynthesis, source-sink) of the Journal of the American Society for Horticultural Science for 2004, we found that 95% of the published articles presented ratios in tables or figures. Water use efficiency (dry weight/water transpired), nutrient use efficiency (nutrient uptake/amount applied), nutrient derived from fertilizer (fertilizer-derived nutrient/total nutrient content), yield efficiency (yield/trunk cross-sectional area), canopy efficiency (yield/canopy area), CO2 assimilation (CO2/leaf area), and so on, are examples of commonly used ratio-based assessments. Mineral and chemical concentrations (amount/dry weight or amount/volume) are other ratio-based expressions widely used in biologic sciences.
This article presents evidence that it is not appropriate to compare ratio-based expressions for different experimental groups or treatments if a plot of the denominator versus the numerator of the ratio-based expression has a nonzero y-intercept and the values for either the denominators or numerators differ among experimental groups or treatments.
It is important to know whether statistical differences among treatments reflect physiological differences directly related to the treatments or are merely an artifact incited by differences in the size of ratio components (numerator or denominator). In this narrative, we argue that the important distinction between direct and indirect effects cannot be addressed without a more detailed analysis than what is conventionally presented in scientific studies.
We are not the first to express reservations about the use of ratios to scale biologic data. Concerns similar to ours have been discussed for over 100 years (Pearson, 1897). Tanner (1949) clearly stated that the use of per-weight and per-surface area expressions is often inappropriate. Atchley et al. (1976) suggested that ratios greatly confuse and, in many cases, invalidate critical statistical or biologic analyses of the original data. Packard and Boardman (1988) suggested that ecologic physiologists discontinue using ratios to scale data and use analysis of covariance (ANCOVA) instead. Atchley et al. (1976) and Packard and Boardman (1988) include additional literature citations addressing problems associated with the use of ratios in biologic research. We also have found several cursory reports of the “ratio problem” in the plant science literature (Meinzer and Zhu, 1998; Ranjith and Meinzer, 1997; Sage and Pearcy, 1987; Sandrock et al., 2005), and there are likely more.
The fact that the studies cited here are routinely ignored shows that ratios are a crucial part of our analytical thinking. Their use may never disappear. Furthermore, in many situations, ratio-based expressions are appropriate. Our goal here is to 1) reiterate an extremely important concept, 2) demonstrate that evaluating ratio-based expressions for only a control and an experimental treatment is almost always inconclusive, 3) demonstrate the importance of measuring both mineral concentrations and total dry weight when interpreting the relevance of statistical differences in mineral concentration, 4) offer an explanation of why nonzero y-intercepts occur when they are conceptually unexpected, and 5) provide an approach to help users decide whether a ratio-based expression is appropriate. The argument we describe is broadly applicable to all ratio-based expressions commonly used in scientific inquiries.
Theory
In Figure 1A, three different hypothetical linear functions describe the relationships between the numerators and denominators for ratio-based expressions. One function passes through the origin, one has a positive y-intercept, and one has a negative y-intercept. These three parallel lines represent systems that have the same incremental efficiency. For each unit gain in the denominator (x variable), there is a constant gain for the numerator (y variable).
Figure 1B describes the relationships between the ratios and their denominators for the same data presented in Figure 1A. The relationships between these ratios and their denominators are different (Fig. 1B). Ratio-based expressions are dependent on their denominators if the relationship between the numerator and the denominator is represented by a function with a nonzero y-intercept. Ratio-based expressions do not necessarily negate the differences associated with different-sized denominators.
For all linear functions of the form y = mx + b, a plot of y/x versus x will have the equation of y/x = (1/x)b + m. For linear functions that pass through the origin [(1/x)b = 0], y/x will be constant and equal to the slope of the original linear function. However, ratio-based expressions (y/x) for linear functions that have nonzero y-intercepts clearly are dependent on the value of the denominator (x). A ratio-based expression decreases (for functions with positive y-intercepts) or increases (for functions with negative y-intercepts) as denominators increase. Both the decreasing and increasing functions displayed in Figure 1B asymptotically approach the slope of the original linear function. Differences among efficiency expressions derived from the three linear functions are greatest for small values of the denominator.
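The dependence of y/x on x described above can be checked numerically. The following is a minimal sketch, assuming an illustrative common slope of 2 and y-intercepts of −5, 0, and 5; these values are hypothetical and are not taken from Figure 1.

```python
# Hypothetical parallel lines y = m*x + b that share one slope m.
# The slope and intercepts are illustrative assumptions, not values from the figures.

def ratio(x, m, b):
    """Return y/x for the line y = m*x + b, which equals b/x + m."""
    return (m * x + b) / x

m = 2.0
intercepts = {"negative": -5.0, "zero": 0.0, "positive": 5.0}

# Ratios differ most at small x and converge on the slope m as x grows.
for x in (5.0, 50.0, 500.0):
    row = {name: round(ratio(x, m, b), 3) for name, b in intercepts.items()}
    print(x, row)
```

Only the line through the origin gives a ratio that is constant across all three denominators; the other two approach the slope from above or below.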
Numerators and denominators are strongly correlated; therefore, ratios also will be dependent on the numerator if the relationship between the numerator and the denominator is represented by a function with a nonzero y-intercept. The linear function describing the relationship between the numerator and denominator dictates that a plot of y/x versus y will have the equation of y/x = bm/(y − b) + m. A ratio-based expression decreases (for functions with positive y-intercepts) or increases (for functions with negative y-intercepts) as numerators increase. In cases in which linear functions pass through the origin (bm/(y − b) = 0), y/x will be unrelated to the numerator, constant, and equal to the slope of the original linear function.
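The numerator form of the expression follows from substituting x = (y − b)/m into y/x. A short numerical check, assuming a hypothetical slope of 2 and y-intercept of 5, confirms that the two forms agree:

```python
def ratio_from_numerator(y, m, b):
    """y/x written in terms of y alone, using x = (y - b)/m."""
    return b * m / (y - b) + m

m, b = 2.0, 5.0  # hypothetical slope and y-intercept, chosen for illustration

for x in (1.0, 10.0, 100.0):
    y = m * x + b
    direct = y / x                          # ratio computed from x and y
    via_y = ratio_from_numerator(y, m, b)   # ratio computed from y only
    assert abs(direct - via_y) < 1e-9
    print(y, direct)
```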
Intuitively, one often expects to encounter functions with zero y-intercepts. For example, one normally would expect the function describing a relationship between dry weight and total N content to have a zero y-intercept. A positive y-intercept is not conceptually possible. If a plant has no dry weight, it cannot have any N. Similarly, negative values for total N content as dry weight approaches zero are not conceptually realistic. However, as discussed subsequently, positive and negative y-intercepts are possible in experimental data sets.
In many situations, there are no samples with x-axis data values close to zero. A nonzero y-intercept may occur when an incomplete data set is evaluated. It is possible that a real function bends and passes through the origin, but only a linear function with a nonzero y-intercept is apparent in the data collected. For example, data described by a linear equation with a nonzero y-intercept can often be described with a curvilinear power function y = cx^b that rapidly bends beyond the data range. Regardless of the equation used to describe the data set, the ratio will be dependent on the size of the denominator.
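The incomplete-data argument can be illustrated with a sketch in which the underlying relationship is assumed to be a power function through the origin, but samples are available only well away from zero. The constant (3.0), exponent (0.8), and sampling range are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

c, exponent = 3.0, 0.8                        # assumed power function y = c * x**exponent
xs = [float(x) for x in range(20, 101, 10)]   # sampled range excludes small x values
ys = [c * x ** exponent for x in xs]

slope, intercept = linear_fit(xs, ys)
print(slope, intercept)   # the fitted intercept is positive although the curve passes through (0, 0)
```

Because the assumed curve is concave, any straight line fitted within the sampled range lies above the curve when extended back to x = 0, so the fitted y-intercept is positive even though the true function passes through the origin.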
Nonzero y-intercepts may be incited by the error and variation involved in measuring any real-world phenomenon. Inherent biases in error toward overestimation or underestimation or the shape of the distribution of the dependent variable at each value of the independent variable can cause the y-intercept to be consistently negative or positive. A linear function with a nonzero y-intercept will be sensitive to small changes in x when x approaches its minimum value. Thus, small errors in the measurement of the x variable can result in large changes in the ratio-based expression.
Even if there were no variation in the values of the dependent variable at each value of the independent variable and the true values of the data points were on a straight line, the measured values would not necessarily be collinear because of measurement error or inherent variability in that population. As a result, the best-fit line likely would be different from the line along which the true data points lie.
Similarly, even if there were no error in measuring the quantities, there could be variation from one plant to the next causing a range of values of the dependent variable to be represented for each sampled value of the independent variable. The specific values sampled from these ranges are random variables and can, through statistical variation, cause the y-intercept to be nonzero.
In many data sets, a scatterplot of the x and y variables produces a wedge-shaped distribution (Kaiser et al., 1994; Maller, 1990; Maller et al., 1983; Thomson et al., 1996). In wedge-shaped distributions, variability is larger for x-axis values furthest from the origin. One can find low y values for either low or high values of x, but high y values only occur for large values of x. Positive y-intercepts sometimes occur when subsamples from a wedge-shaped distribution are averaged to represent data trends from a larger data set. When points are averaged, a function with a positive y-intercept can be produced. Wedge-shaped distributions are common when multiple limiting factors occur (Kaiser et al., 1994; Maller, 1990; Maller et al., 1983; Thomson et al., 1996). Subsampling or the use of composite samples can shift the y-intercept upward.
Negative y-intercepts are expected if there is a threshold level below which values on the y-axis do not occur. For example, when CO2 assimilation is plotted against total leaf N content, a leaf compensation point, below which respiratory release of CO2 exceeds CO2 assimilation, causes negative y-intercepts (Sage and Pearcy, 1987). Whenever something new is introduced to a preexisting system, negative y-intercepts can be expected. For example, if 15N is introduced into a perennial plant, one might expect a plot of total N content versus 15N to have a negative y-intercept. The limited data range and experimental error arguments presented here could also lead to negative y-intercepts.
One should never assume that a plot of the denominator versus the numerator of a ratio-based expression has a y-intercept of zero. In Figure 2, the upper function displayed in Figure 1A has been used to demonstrate the mathematical consequences of comparing treatments if a plot of the denominator versus the numerator of a ratio-based expression does not have a zero y-intercept and the values for the denominators differ with treatment.
There are four replicates for two in-row spacing treatments (narrow and wide) in this hypothetical horticultural experiment (Fig. 2A). All points fall on the same regression line, regardless of in-row spacing. Because plants at wide or narrow in-row spacing have a different dry weight (Fig. 2A), statistical differences in N concentration will be apparent (Fig. 2B). Small plants will have a higher N concentration and more variability than large plants. The effect of spacing is indirect because it arises through its effect on plant size rather than affecting N concentration directly. Similarly sized plants from the narrow and wide spacings would have similar N concentrations. In this case, statistical differences between experimental treatments do not imply that treatments directly affected the physiological parameter of interest but may have acted indirectly through a different physiological mechanism that affected plant size, which in turn was correlated with treatment.
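This hypothetical experiment can be reproduced with a few lines of arithmetic. The sketch below assumes a single shared line (total N = 0.010 × dry weight + 0.20) and invented dry weights for four plants per spacing; none of these numbers come from Figure 2.

```python
# Hypothetical values: every plant lies exactly on total_N = m*dry_weight + b.
m, b = 0.010, 0.20   # assumed slope (g N per g dry weight) and y-intercept (g N)

dry_weight = {
    "narrow": [20.0, 25.0, 30.0, 35.0],    # smaller plants
    "wide": [80.0, 90.0, 100.0, 110.0],    # larger plants
}

def n_concentration(dw):
    """Percent N for a plant of dry weight dw that lies on the shared line."""
    return 100.0 * (m * dw + b) / dw

conc = {k: [n_concentration(dw) for dw in v] for k, v in dry_weight.items()}
means = {k: sum(v) / len(v) for k, v in conc.items()}
spread = {k: max(v) - min(v) for k, v in conc.items()}
print(means)    # narrow spacing has the higher mean N concentration
print(spread)   # and the wider range of concentrations
```

Both spacings obey the same relationship between dry weight and total N, yet the smaller plants show the higher and more variable N concentration, purely because of the positive y-intercept.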
The phenomena in these hypothetical examples demonstrate the interpretive problems associated with statistical evaluations of ratio-based expressions. We hypothesized that it would be difficult to distinguish between direct and indirect effects and verified this with experimental data.
Materials and Methods
A blueberry data set demonstrates indirect effects that can occur when a plot of the numerator versus the denominator of a ratio-based expression has a positive y-intercept. Data were collected from an experiment using the first and second growth flush of blueberry shoots on plants that were at two in-row spacings and two rates of N fertilizer. There were three single plant replicates for each in-row spacing and N rate.
An existing ‘Bluecrop’ blueberry planting at the North Willamette Research and Extension Center, Aurora, Ore., established in Oct. 1993, was used for this study. The planting site was fumigated with methyl bromide/chloropicrin, and sawdust and fertilizer (66 kg·ha−1 N) were incorporated before planting 2-year-old container stock. Plants were spaced at 0.45 m and 1.2 m in the row with 3 m between rows. The N fertilizer rate treatments were 0 and 200 kg·ha−1 N applied as a triple split (33% 9 Apr., 33% 9 May, 33% 17 June 2002). All treatment plots were fertilized each spring with 35 kg·ha−1 of P and 66 kg·ha−1 K.
One plant per plot was destructively harvested in Oct. 2002. Plant shoots were separated into first and second flushes. Each flush category was dried and dry weight measured; tissues were then ground and the total N content in a subsample was determined. Both the total N content and N concentration for each flush category were calculated. Data for one replicate (0 N, narrow spacing, first growth flush) are unavailable. Regression analysis was used to evaluate the relationship between dry weight and total N content at both N rates. The relationship between the N concentration and total dry weight was also evaluated.
A Bartlett's test (Barnett, 1962; Bartlett, 1937) indicated that log-transformed values for dry weight and nitrogen content met assumptions concerning homogeneity of variance when the raw data did not. The P values for treatment effects also declined when the transformed data were analyzed; thus, only transformed data were used to detect treatment effects for dry weight and total N content. Nitrogen concentration met assumptions concerning homogeneity of variance; thus, the raw data were analyzed. Data were analyzed as a split-plot randomized complete block with N rate and spacing as main plots and flush as a subplot.
SAS statistical software (version 9.1; SAS Institute, Cary, N.C.) was used for statistical analyses. The PROC MIXED procedure with a RANDOM statement was used to conduct an analysis of variance (ANOVA) and to make pairwise comparisons of SAS LSMEANS. The PROC MIXED procedure was also used to conduct an ANCOVA on log10 total N content and percent N with log10 dry weight and the inverse of dry weight as the respective covariates.
Homogeneity of slopes was demonstrated because covariates × treatment interactions were not significant. Respective P values were 0.5170, 0.5920, and 0.9966 for log10 dry weight × N level, log10 dry weight × spacing, and log10 dry weight × flush interactions. Respective P values were 0.2959, 0.8236, and 0.3929 for inverse of dry weight × N level, inverse of dry weight × spacing, and inverse of dry weight × flush interactions.
The relative importance of different factors (nitrogen level, spacing treatment, growth flush, and dry weight) was also assessed. Mixed model, type I, P value estimates were used to evaluate the sequential incremental improvement as each effect was added to the model in the covariance procedure described here.
A wine grape (Vitis vinifera L.) data set demonstrates how averaging subsample data points for the numerator and denominator of a ratio-based expression can produce a positive y-intercept even when this is not conceptually possible. Data were collected from a rootstock-cultivar trial in which field-grown Räuschling wine grape accessions were evaluated. Eight 3-year-old grapevines grafted on 5C rootstock (Auer Baumschule, Hallau, Switzerland) were planted in 40-L pots containing loamy soil in Spring 1990. The pots were sunk in an open field (to minimize root-zone temperature fluctuation) and watered periodically with drip irrigation. Fertilizer was added to maintain optimal levels of N, P, K, and Mg based on soil analysis. At full bloom, one shoot per plant was retained, and clusters were removed to stimulate vegetative growth. In the next season, one fruit-bearing shoot per plant was allowed to grow. The third leaf from the shoot base was used for gas-exchange and leaf-area measurements. An individual leaf from single plant replicates was evaluated on three dates (15, 26, and 29 June), and an average value was used to represent each replicate. Gas-exchange measurements and calculations were performed with a portable LCA-2 system (Analytical Development and Co., Ltd., Hoddesdon, Herts, U.K.) as described by Candolfi-Vasconcelos and Koblet (1991). Leaf area of the leaves used in the gas-exchange measurements was measured after the last sampling date using an area-meter (model LI-3100; LI-COR Biosciences, Lincoln, Nebr.). Net CO2 assimilation per leaf area (μmol·m−2·s−1) was calculated. Regression analysis was used to evaluate the relationship between total CO2 assimilation/leaf (μmol·s−1) and leaf area. The relationship between net CO2 assimilation per leaf area (μmol·m−2·s−1) and leaf area was also evaluated.
A previously published almond data set (Bi et al., 2004) demonstrates indirect effects that can occur when a plot of the denominator versus the numerator of a ratio-based expression has a negative y-intercept. June-budded ‘Nonpareil’/‘Nemaguard’ almond trees were grown in 8-L pots containing 1:2:1 (by volume) mix of peatmoss, pumice, and sandy loam soil under natural conditions in Corvallis, Ore. Five replicate plants were randomly assigned to one of five groups. From 1 July to 1 Sept., each group was fertigated twice weekly (300 mL per pot) with one of five N concentrations (0, 5, 10, 15, or 20 mM N from NH4NO3) using a modified Hoagland's solution (Hoagland and Arnon, 1950). A set of five replicate plants at each N application rate was supplemented with foliarly applied urea. A 3% urea solution was applied (sprayed to drip) to each tree on 10 Oct. and 20 Oct. Dormant trees were harvested in December. A composite sample was made for each tree based on its dry matter partitioning. Free amino acids and total amino acids after protein hydrolysis were separated and quantified with a Beckman automated amino acid analyzer (model 6300; Global Medical Instrumentation, Ramsey, Minn.). Data shown are for whole trees. Digitally evaluated images of the figures in the original reference (Bi et al., 2004) were used to extract mean free amino acid and total amino acid levels for each treatment.
The original authors (Bi et al., 2004) conducted an ANOVA on total amino acids, free amino acids, and a ratio between the amino acid types. Data were analyzed as a completely randomized factorial experiment with five ground N application rates with and without supplemental foliar urea. Comparisons of means among treatments were performed by contrasts adjusting for multiple comparisons using Tukey's method (Bi et al., 2004).
We used regression analysis to evaluate the relationships between measured variables (free amino acids, total amino acids, and free amino acid:total amino acid ratio) and ground N application rates. Regression analysis was also used to evaluate the relationship between total and free amino acids and the relationship between total amino acids and the free amino acid:total amino acid ratio. Both the combined data and data for ground applications with and without supplemental foliar urea were analyzed.
Results and Discussion
Mean dry weight, total N content, and N concentration values for the eight blueberry treatments are shown in Table 1. The mixed model ANOVA P values (Table 2) suggest that significant differences in N concentration for N rate (P = 0.0004), spacing (P = 0.0066), and flush (P = 0.001) occurred. Trends for plant spacing × N rate and plant spacing × flush interactions were apparent but they were not statistically significant (P = 0.0749 and P = 0.0846, respectively). Flush × N rate and N rate × flush × spacing interactions were also not significant. Based on the ANOVA, one would conclude that N concentration increases with N application rate and is higher for narrow spacings and the second growth flush. However, spacing effects were only significant for the second growth flush at the 200 kg·ha−1 N rate.
Table 1. Mean values for dry weight, N content, and N concentration for an experiment evaluating the first and second growth flush of blueberry shoots on plants at two in-row spacings and given two rates of N fertilizer.
Table 2. Mixed-model analysis of variance P values for treatment and covariate effects on N concentration and log total N content for an experiment evaluating the first and second growth flush of blueberry shoots on plants at two in-row spacings and two rates of N fertilizer.
In the analysis that follows, we suggest that the differences attributable to N rate are probably the result of real differences in N uptake and N utilization, but effects of spacing and growth flush are indirect and can be accounted for by differences in the dry weight of the sampled tissues. Although a trend may suggest spacing effects for N concentration are still present when differences in dry weight are accounted for, these spacing effects are not statistically significant.
Plots of dry weight versus total N content for both the 0 N and 200 N rates are shown in Figure 3A. Slopes were different for the functions for different nitrogen fertilization rates (P = 0.0157), suggesting that statistical differences for N concentration between N treatments (Table 1) are direct. However, within an N treatment, all data values for spacing treatments (pooled across flush values) appear to fall on similar lines (Fig. 3B). Points for the 200 kg·ha−1 N, narrow spacing treatment, appear to fall slightly above the line for the 200 kg·ha−1 N, wide spacing treatment, but the y-intercepts and slopes are not significantly different (P = 0.7524 and P = 0.5255, respectively). This implies that, regardless of in-row spacing treatment, tissues accumulate a similar amount of N for every unit increase in dry weight. A regression analysis of flushes (pooled across plant spacing treatments) at the two N treatments produced similar results. Within nitrogen treatment, slopes and y-intercepts for the two flush types were not statistically different (data not shown). Because this experiment was conducted with only three replicates, it is impossible to determine if data values for individual spacing and flush treatments within an N fertilization rate produce different regression lines. As one would expect, slopes for the three-point regression lines for spacing and flush treatments do not differ significantly within an N fertilization treatment. It is possible that data points fall on different regression lines and the effects are, indeed, direct, but this cannot be determined with the current level of replication.
Functions describing the relationship between total N content and dry weight (Fig. 3A) have a positive y-intercept for both N rates. The cause of the positive y-intercept reported here cannot be determined. Bulk samples representing many different-sized twigs that likely had different N concentrations and N content were pooled for the replicate values. It is possible that pooling data produces a function with a positive y-intercept. The y-intercept was relatively small (compared with that presented in Fig. 5A), but a more subtle y-intercept shift might occur. The well-documented Steenbjerg effect (Steenbjerg, 1951) suggests that high N accumulation rates also occur at low dry weights. Therefore, the true function might bend and pass through the origin, but lack of data at the low dry weights obscured the pattern. Regardless of the cause, these positive y-intercepts, although very small, alter interpretation.
Only the high N rate produces functions with a y-intercept that is likely to be different from zero (P = 0.448 and P = 0.0575 for 0 and 200 kg·ha−1 of N rates, respectively). However, whether a y-intercept statistically differs from zero or whether a curvilinear function could also fit the data does not alter the relationship between the ratio and either its denominator or numerator. The issue is whether the ratio components are statistically related to the ratio-based expression and whether points that appear to fall on the same linear regression line have different ratio values. With small sample sizes, many y-intercepts will not significantly differ from zero. A conservative approach is to expect nonzero y-intercepts unless strong evidence suggests otherwise.
In Figure 3C, both the actual concentration data and the predicted N concentration = (1/dry weight)b + m relationship derived from the original total N = m(dry weight) + b functions are shown. N concentration is related to dry weight. Points representing small and large dry weights that fall on the same regression lines (Fig. 3A) have different concentrations. The major differences in N concentration for both Table 1 and Figure 3C occur when the largest tissues (first flush, wide spacing) are compared with the smallest tissues (second flush, narrow spacing). For simplicity of presentation, other flush and spacing treatments are not shown in Figure 3C. The mean values for dry weight and total N content for flush and spacing treatments were significantly different at both N rates (Table 1).
Figure 3D is a linear version (N concentration versus inverse of dry weight) of Figure 3C. Nitrogen concentration is significantly related to the inverse of dry weight for the 200 kg·ha−1 of N (r2 = 0.698; P < 0.01) and the 0 kg·ha−1 of N (r2 = 0.479; P < 0.05) functions. Slopes for the two nitrogen treatments are not significantly different (P = 0.9694). However, y-intercepts are significantly different (P = 0.0012), suggesting once again that the differences in N concentration attributable to N rate are direct. If each spacing treatment is plotted independently, the results are the same. Slopes are not significantly different for the four treatments (0 N narrow spacing, 200 kg·ha−1 N narrow spacing, 0 N wide spacing, 200 kg·ha−1 N wide spacing), but y-intercepts for 0 N and 200 kg·ha−1 N treatments statistically differ (data not shown).
The fact that dry weight differences influence N concentration irrespective of treatment suggests that an ANCOVA could help differentiate between direct and indirect effects. Packard and Boardman (1988) suggested that the denominator of a ratio-based expression could be used as a covariate when analyzing the numerator to eliminate indirect effects associated with different-sized denominators. Unfortunately, the data presented in Figure 3A violate both homogeneity of variance and homogeneity of slope assumptions. However, when log-transformed data for both dry weight and total N content are used, slopes become statistically indistinguishable and nearly parallel (Fig. 4A, B) and variances for different treatments become homogeneous (Bartlett's test P values = 0.0138, 0.0177, 0.8111, and 0.7448 for dry weight, total N content, log10 dry weight, and log10 total N content, respectively).
Plots of log10 dry weight versus log10 total N content for both the 0 and 200 N rates are shown in Figure 4A. Although values for the 200 kg·ha−1 N treatment appear to fall on a line above the control values, slopes (P = 0.9901) and y-intercepts (P = 0.2576) were not significantly different. Once again, within N fertilization rate, all data values for spacing treatments (pooled across flush values) appear to fall on similar lines (Fig. 4B). The regression equations do not significantly differ for narrow and wide spacing treatments within either N level. The 200 kg·ha−1 N, narrow spacing treatment regression line appears to fall slightly above the line for the 200 kg·ha−1 N, wide spacing treatment, but the slopes and y-intercepts are not significantly different (P = 0.1766 and P = 0.0894, respectively). There may be a trend for greater N accumulation at similar dry weights for the narrow spacing, but it is not statistically significant. A regression analysis of flushes (pooled across spacing treatments) at the two N treatments produces similar results (data not shown).
The slope of a log-log plot (log10 Y versus log10 M) is often used to define the exponent of a power law in scaling studies in which Y = Y0·M^b (Brown and West, 2000; Kleiber, 1932; Reich et al., 2006). In the scaling literature, the variable Y is some physiological or morphologic characteristic, Y0 is a scaling constant, and M is often body size (Brown and West, 2000; Kleiber, 1932; Reich et al., 2006). Scaling occurs when the exponent does not equal 1.0 (Brown and West, 2000; Kleiber, 1932; Reich et al., 2006). If b > 1, Y/M mathematically increases with body size. If b < 1, Y/M mathematically decreases with body size. The slopes of the log-log plots in Figure 4A are significantly less than 1 (P < 0.05 for both 0 and 200 kg·ha−1 N rates). Therefore, the slopes themselves provide additional evidence that N concentration is dependent on dry weight.
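The effect of the exponent on the ratio follows from dividing the power law by M, which leaves Y/M equal to the scaling constant times M raised to (b − 1). A brief sketch, assuming a hypothetical scaling constant of 1.5 and three illustrative exponents:

```python
def ratio_per_mass(mass, y0, b):
    """Y/M for the power law Y = y0 * mass**b, which simplifies to y0 * mass**(b - 1)."""
    return y0 * mass ** b / mass

y0 = 1.5  # hypothetical scaling constant

# b < 1: ratio falls with size; b = 1: ratio is constant; b > 1: ratio rises with size.
for b in (0.75, 1.0, 1.25):
    print(b, [round(ratio_per_mass(mass, y0, b), 4) for mass in (1.0, 10.0, 100.0)])
```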
In Table 2, the statistical differences in total N content for spacing treatment and growth flush apparent in Table 1 disappear if log10 dry weight is used as a covariate when analyzing log10 total N content. As Packard and Boardman (1988) suggest, this indicates that differences in total N content for spacing treatment and growth flush are explained by differences in dry weight. Therefore, N concentration differences appear to be indirect. Differences in N rate are still significant in the covariate analysis suggesting that N rate effects are direct. The significant N rate effects were apparent although regression lines in Figure 4A are not significantly different. In this example, the ANCOVA was better able to detect statistical differences than regression.
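The logic of using the denominator as a covariate can be sketched with covariate-adjusted means. The example below is not the PROC MIXED analysis used in this study: it omits random effects and significance tests, assumes parallel slopes, and uses invented log10 values in which both spacings lie on one line. The function name `ancova_adjusted_means` and all data are hypothetical.

```python
def ancova_adjusted_means(groups):
    """groups maps a label to (covariate_values, response_values).

    Returns the pooled within-group slope and each group's mean response
    adjusted to the grand mean of the covariate (assumes parallel slopes).
    """
    all_x = [x for xs, _ in groups.values() for x in xs]
    grand_x = sum(all_x) / len(all_x)
    sxx = sxy = 0.0
    group_means = {}
    for label, (xs, ys) in groups.items():
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        sxx += sum((x - x_bar) ** 2 for x in xs)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        group_means[label] = (x_bar, y_bar)
    slope = sxy / sxx
    adjusted = {
        label: y_bar - slope * (x_bar - grand_x)
        for label, (x_bar, y_bar) in group_means.items()
    }
    return slope, adjusted

def shared_line(x):
    """Hypothetical relationship: log10 total N = 0.9 * log10 dry weight - 1.0."""
    return 0.9 * x - 1.0

narrow_x = [1.2, 1.3, 1.4]   # invented log10 dry weights, smaller tissues
wide_x = [1.9, 2.0, 2.1]     # invented log10 dry weights, larger tissues
groups = {
    "narrow": (narrow_x, [shared_line(x) for x in narrow_x]),
    "wide": (wide_x, [shared_line(x) for x in wide_x]),
}

slope, adjusted = ancova_adjusted_means(groups)
print(slope, adjusted)   # raw group means differ, adjusted means coincide
```

When the groups share one regression line, adjusting to a common covariate value removes the apparent treatment difference, which is the pattern interpreted here as an indirect effect.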
A similar interpretation occurs if the inverse of dry weight is used as a covariate in the analysis of N concentration (Table 2). If data points for a plot of a ratio's denominator versus its numerator fall on the same regression line with a nonzero y-intercept, a ratio will be linearly related to the inverse of the denominator (Fig. 3D). Therefore, when the inverse of the denominator is used as a covariate, statistically different ratios in a standard ANOVA that have indirect causes (size of the denominator) will not be identified as statistically significant. Although numerators are often strongly correlated to denominators, a ratio is only related to its denominator when the equation for the original denominator versus numerator plot has a nonzero intercept. A significant relationship between a ratio and the inverse of its denominator in itself suggests that indirect effects are possible.
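The linear relationship between a ratio and the inverse of its denominator can be written out explicitly. In the notation assumed here, N is the numerator (for example, total N content), D is the denominator (dry weight), m is the slope, and b is the y-intercept of the numerator versus denominator function:

```latex
\[
N = mD + b
\quad\Longrightarrow\quad
\frac{N}{D} = m + b\,\frac{1}{D}.
\]
```

When b = 0, the ratio equals the constant m and is independent of D. When b is not zero, the ratio is a linear function of 1/D with slope b, which is why 1/D is the natural covariate for an analysis of the ratio.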
In Table 3, the two N levels are statistically evaluated independently. Conclusions with regard to N concentration support the results in Table 1. However, significant differences for spacing and flush for either log10 total N content or N concentration are not apparent when the respective covariates (log10 dry weight and inverse of dry weight, respectively) are used (Table 3). If only the data for the second growth flush at the 200 kg·ha−1 N rate are used (six data points), statistical differences for spacing effects do not change. N concentrations differ for spacing treatment (P = 0.0046). However, when either log10 total N content is adjusted with a log10 dry weight covariate or N concentration is adjusted with an inverse of dry weight covariate, significant differences disappear (P = 0.1244 and P = 0.0774 for log10 dry weight and inverse dry weight covariate analyses, respectively).
Mixed-model analysis of variance P values for treatment and covariate effects on N concentration and log total N content for an experiment evaluating the first and second growth flush of blueberry shoots on plants at two in-row spacings and two rates of N fertilizer.
The suggestion that indirect effects explain statistical effects for growth flush and spacing treatment is supported by the type I probability results presented in Tables 2 and 3. A model using only N rate and log10 of dry weight significantly explains the observed variability in log10 of total N content. Similarly, a model using only N rate and the inverse of dry weight significantly explains the observed variability in N concentration. Adding plant spacing or flush effects to the models does not result in statistically significant sequential incremental improvement.
The most appropriate interpretation of the data is that N concentration decreases as dry weight increases irrespective of spacing treatment or growth flush (Fig. 3C). The differences in N concentration for spacing treatment and growth flush (Tables 1–3) are associated with differences in dry weight and only indirectly related to spacing and flush. There may be a trend for greater N accumulation as dry weight increases for the narrow spacing at the 200 kg·ha−1 N level, but this effect was not statistically significant. In this experiment, indirect effects magnify a trend for greater N accumulation for the narrow spacing. Indirect effects also create statistical differences between growth flushes when there is no suggestion that factors other than differences in dry weight and total N content are important. Indirect effects related to the denominator can enhance or diminish the magnitude of direct effects depending on whether the sign of the y-intercept is negative or positive and whether treatments increase or decrease values for the denominator.
A scatterplot revealing the relationship between total CO2 assimilation per leaf (μmol·s−1) and leaf area for our wine grape example is shown in Figure 5A. None of the points suggests that photosynthesis can occur without leaf area; that is, a function representing the edge of the data cloud (maximum CO2 assimilation for a given leaf area) would have a negative y-intercept. However, a regression for averaged subsamples produces a linear function with a positive y-intercept that is not conceptually possible (there can be no photosynthesis without leaf area).
This positive y-intercept defines the relationship between leaf area and CO2 assimilation when expressed on a leaf-area basis (μmol·m−2·s−1). Photosynthetic performance (Fig. 5B) significantly declines as leaf area increases if a ratio-based expression for net CO2 assimilation (μmol·m−2·s−1) is used to describe photosynthetic efficiency. This occurs although the regression equation in Figure 5A suggests a constant incremental increase in total CO2 assimilation as leaf area increases.
Not all wedge-shaped distributions produce positive y-intercepts. In a cursory evaluation of real data distributions that, in our judgment, appeared to be wedge-shaped, averaging points produced positive y-intercepts in ≈15% of the data sets (data not shown). Although subsample-induced y-intercept shifts do not always occur, they are prevalent enough to be a concern. Averaging subsamples is a common practice in horticultural field trials. Single composite samples representing more than one individual also are collected routinely. In the citations in the Journal of the American Society for Horticultural Science discussed previously, 81% of the authors used either subsample means or composite samples to determine the values for replicates.
Mean almond free amino acid:total amino acid ratios for different ground N application concentrations with and without foliar urea supplements are shown in Figure 6. Our regression analysis for ground N application rate versus free amino acid:total amino acid ratio supports the conclusions of the original authors (Bi et al., 2004) in which a ratio between amino acid types was significantly altered by both ground and foliar N applications. The original authors also reported a significant ground N application rate × foliar N interaction.
For treatments without foliar urea, the free amino acid:total amino acid ratio increases with ground N application rate (r2 = 0.9788; P = 0.0013). For trees treated with foliar urea, the free amino acid:total amino acid ratio is higher than untreated trees and remains relatively constant with increasing ground N application rate. The modest increase in free amino acid:total amino acid ratio as ground N application rate increases for foliarly treated trees in Figure 6 may indicate a trend, but it is not statistically significant (r2 = 0.692; P = 0.081). Respective slopes (0.0003 and 0.0015) and y-intercepts (0.0855 and 0.0364) for the functions describing data with and without foliar urea supplements are statistically different (P < 0.001); therefore, an N rate × N regime interaction occurred. Our analysis and the authors’ original evaluations support the following conclusions:
- The whole tree free amino acid N to total amino acid N ratio increased with increasing ground N application rate. Fall foliar N supplements increased the N ratio at each given ground N application rate with trees fertigated at a lower N concentration being more responsive.
- Free amino acids account for a larger proportion of storage nitrogen with increasing N supply, but proteins still remain as the primary form of storage nitrogen.
However, the data do not necessarily suggest that foliar N supplements or changes in ground N application concentration alter how a tree stores N. The data in Figure 7A suggest that for every incremental increase in total N content that the plant takes up, the partitioning of this N into storage forms (free amino acid versus protein) is constant. For every unit increase in total amino acid N, there is a constant increase in free amino acid N. All points appear to fall on the same regression line regardless of ground N application rate or whether foliar N supplements were applied. Approximately 15% of each incremental increase in total amino acids consists of free amino acids, and this percentage never changes. Whether a plant has supplemental foliar fertilization or is exposed to different ground N application rates has no effect on how storage N is allocated between protein N and free amino acid forms. The negative y-intercept in Figure 7A may result from a threshold level below which values on the y-axis do not occur. It is likely that very few, if any, free amino acids accumulate in storage tissues when plants are N-deficient and have small amounts of total N.
Regression equations for ground N application rates with and without foliar N supplements are not significantly different. The treatment effects apparent in Figure 6 could only be direct if the points for the supplemental foliar N treatment fell on a different regression line than the points for the treatment without foliar N supplements. The function describing the relationship between total amino acids and free amino acids has a nonzero y-intercept. Therefore, the ratio of free amino acid N to total amino acid N will be dependent on the amount of total amino acid N (Fig. 7B). Differences in total amino acid N explain the differences apparent in Figure 6. Both our regressions and the statistics in the original reference suggest that total amino acids and free amino acids increase with increased N application (data not shown).
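A short numerical sketch in Python shows how a single line with a negative y-intercept produces a rising free amino acid:total amino acid ratio. The slope of 0.15 follows the approximate 15% increment mentioned above. The intercept, the total amino acid N values, and the function names are hypothetical and are not taken from Bi et al. (2004).

```python
SLOPE = 0.15      # about 15% of each increment in total amino acid N is free amino acid N
INTERCEPT = -2.0  # hypothetical negative y-intercept, not a value reported in the source

def free_amino_n(total_amino_n):
    """Free amino acid N predicted from total amino acid N on one common line."""
    return SLOPE * total_amino_n + INTERCEPT

def free_to_total_ratio(total_amino_n):
    """Free:total ratio implied by the common line."""
    return free_amino_n(total_amino_n) / total_amino_n

totals = [20.0, 40.0, 80.0, 160.0]  # hypothetical totals, arbitrary units
ratios = [free_to_total_ratio(t) for t in totals]
for t, r in zip(totals, ratios):
    print(f"total = {t:6.1f}  free:total = {r:.4f}")
```

Every point lies on the same line, yet the ratio rises with total amino acid N and approaches the slope from below. A treatment that only increases total amino acid N would therefore show a higher ratio without any change in how N is partitioned.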
Experiments are often conducted where ratio-based expressions are evaluated for only a control and a treatment. Because these experiments provide few data points, one cannot determine whether effects are direct or indirect. For example, if a foliar N-supplemented treatment for the almond experiment was evaluated at only one of the five ground N application rates, there would be a significant difference in the free amino acid:total amino acid ratio. The differences between mean amino acid ratio for urea supplemented and unsupplemented trees at the same ground N application rate are two- to fivefold larger than the standard errors for an amino acid ratio within treatments (Bi et al., 2004). However, with only two treatments, the researcher would be unable to determine whether replicate values for foliar N supplements or control treatments fall on the same or different regression lines.
The real cause of significant differences does not disappear when scientists design experiments that will not detect them. Experiments that evaluate ratio-based expressions for only a control and an experimental treatment are often inconclusive. More extensive experiments could address issues concerning indirect and direct effects.
A similar argument can be made with regard to the N concentration data discussed in Table 1 and Figure 3. If total dry weight of the tissues was not measured, there would be no way to determine if effects were direct or indirect. This is important because concentration data are often collected when total dry weights are not measured.
Issues involving statistically appropriate approaches to analyzing ratio-based expressions have been presented elsewhere (Huhn, 1991, 1993, 1998). When the numerator and the denominator of a ratio are both random variables, the variance of the ratio becomes a function of the variances and covariance of the numerator and denominator, and the statistical distribution of the response is confounded (Huhn, 1991, 1993, 1998). This could complicate interpretation of an ANOVA. Here, we are concerned with the interpretation of obvious differences rather than the statistical approach used to define these differences. Our goal is to differentiate between direct and indirect effects.
The issue of indirect effects extends to the study of treatment effects on any ratio-based quantity but is frequently overlooked. The indirect effect is not always significant, particularly if the ratio does not vary with the numerator or denominator. However, lines such as those shown in Figures 2B, 3C, 5B, and 7B often are not flat because flat lines require that a linear function with a zero y-intercept describe the numerator versus denominator relationship. When the y-intercept is not zero or the function is not linear, the perceived treatment effect may be a statistical artifact. In Figures 2B, 3C, and 7B, effects could be attributed to treatment, when dry weight (Figs. 2B and 3C) or total amino acids (Fig. 7B) are the important factors.
The y-intercept need only be slightly different from zero for indirect effects to occur. In Figure 3A, the y-intercepts are small, but the indirect effect is still the source of statistical differences among treatments. In other cases, a slightly nonzero y-intercept would not impact the analysis or conclusions.
At some level of precision, almost all best-fit linear functions will have a nonzero y-intercept. There is no “rule of thumb” about when a positive or negative y-intercept has serious interpretive consequences. A very small nonzero y-intercept can alter interpretation if data points for the x-axis are clustered close to zero. If data points are located far from the origin, nonzero y-intercepts are unimportant. Defining the y-intercept and determining whether it is statistically different from zero is helpful but not conclusive. This is apparent in Figure 3C in which the N concentration = (1/dry weight)b + m function for the 0 N treatment derived from the original total N = m(dry weight) + b function (Fig. 3A) significantly describes the data points although the original positive y-intercept is not statistically significant. We have found other examples in which data sets produce functions with y-intercepts that are not significantly different from zero but still have significant relationships between a ratio and its components, but this rarely occurs (data not shown).
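The effect of distance from the origin can be illustrated with a brief Python sketch. It compares the relative change in a ratio across a twofold range of the denominator when the data lie near the origin and when they lie far from it. The slope, intercept, ranges, and the function name `ratio_spread` are hypothetical.

```python
def ratio_spread(slope, intercept, x_low, x_high):
    """Relative change in y/x between x_low and x_high when y = slope*x + intercept."""
    r_low = slope + intercept / x_low
    r_high = slope + intercept / x_high
    return abs(r_low - r_high) / r_high

# Same line, same twofold range in the denominator, different distance from the origin.
near = ratio_spread(slope=2.0, intercept=0.5, x_low=1.0, x_high=2.0)
far = ratio_spread(slope=2.0, intercept=0.5, x_low=100.0, x_high=200.0)

print(f"relative change in ratio near the origin: {near:.4f}")
print(f"relative change in ratio far from origin: {far:.4f}")
```

With these assumed values the same y-intercept shifts the ratio by roughly 11% near the origin but by little more than 0.1% far from it, which is why the size of the y-intercept alone does not determine its interpretive consequences.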
Indirect effects are associated with nonzero y-intercepts for functions describing the relationship between the denominator and the numerator of a ratio-based expression. However, we view determining whether a plot of the denominator versus the numerator for a data set has a nonzero y-intercept as only a first step in determining if indirect effects are important. There are four criteria that, when met, suggest statistically different treatment effects for a ratio-based expression are indirect:
1. A plot of the denominator versus the numerator for a ratio-based expression produces a function with a nonzero y-intercept.
2. A plot of the denominator versus the numerator for the treatment with the greater ratio does not produce a regression line that is significantly different from the regression line for the treatment with a smaller ratio.
3. Treatments have significantly different values for their denominators or numerators.
4. The trend between a ratio and either its denominator or numerator is significant.
All four criteria were met in most of the examples in which we suggest that treatment effects are indirect. In the exception previously alluded to (blueberry experiment at 0 N level), three of the four criteria were met (numbers 2–4).
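The four criteria can be recorded as a simple checklist. The Python sketch below is only a bookkeeping aid under the assumption that each criterion has already been judged by the appropriate regression or ANOVA. The class name `RatioEvidence`, its field names, and the `minimum` threshold are hypothetical, and the criteria are numbered in the order listed above.

```python
from dataclasses import dataclass

@dataclass
class RatioEvidence:
    nonzero_intercept: bool        # criterion 1: nonzero y-intercept
    lines_not_different: bool      # criterion 2: treatments share one regression line
    components_differ: bool        # criterion 3: denominators or numerators differ
    ratio_trend_significant: bool  # criterion 4: ratio trends with a component

    def criteria_met(self):
        """Return the numbers (1-4) of the criteria that are satisfied."""
        flags = (self.nonzero_intercept, self.lines_not_different,
                 self.components_differ, self.ratio_trend_significant)
        return [i for i, ok in enumerate(flags, start=1) if ok]

    def suggests_indirect(self, minimum=4):
        """True when at least `minimum` criteria are satisfied."""
        return len(self.criteria_met()) >= minimum

# The blueberry 0 N case described in the text: criteria 2-4 met, criterion 1 not met.
blueberry_0n = RatioEvidence(False, True, True, True)
print(blueberry_0n.criteria_met())
print(blueberry_0n.suggests_indirect())
print(blueberry_0n.suggests_indirect(minimum=3))
```

The threshold is left adjustable because, as the 0 N example shows, a case that meets only three criteria may still warrant an indirect interpretation.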
Packard and Boardman (1988) are correct in suggesting that discontinuing the use of ratios to scale data and using ANCOVA to evaluate experimental data would reduce interpretive errors. Tanner (1949) is also correct in pointing out the difficulties of using per-area expressions. Furthermore, both Tanner's data and the examples presented here suggest that even traditional concentrations can have y-intercept-related interpretive problems. One could argue that per-area and per-weight expressions are inappropriate (Tanner, 1949). However, one cannot imagine a horticultural research environment where mineral concentrations and CO2 assimilation m−2·s−1 expressions are not used. A better solution is to continue the use of ratio-based expressions while insisting that editors and reviewers demand that both direct and indirect effects be evaluated. This is far better than the current state of affairs in which a large body of convincing research is simply ignored.
Conclusions and Recommendations
Scientists often are interested in differences in ratio-based expressions for samples with different-sized denominators or numerators. In many cases, the statistical differences reported in the literature may have a strong physiological basis that is unrelated to the size of the ratio components (denominator or numerator). However, unless the relationship between the denominator and numerator is known, it is difficult to determine if ratio differences are the result of different-sized denominators or numerators or other physiological causes.
Subsampling procedures, collecting data over a limited range, experimental variation, and error can lead to positive y-intercepts that are associated with indirect effects. Negative y-intercepts may also occur. Even subtle nonzero y-intercepts can affect interpretation radically. One should never assume that a plot of the denominator versus the numerator of a ratio-based expression has a y-intercept of zero.
Indirect effects associated with the “ratio problem” will not always occur. If a plot of the denominator versus the numerator of a ratio-based expression has a zero y-intercept or if the values for the denominators or numerators do not differ with cultivars or treatments, ratios are not dependent on the size of the ratio components. However, the widespread use of ratios in which the denominators or numerators for treatments are statistically different and the fact that nonzero y-intercepts may be ubiquitous suggest that interpretive errors may be common.
Although ratios have been used in scientific inquiries for hundreds of years, their interpretation is complex. Effects are direct when plots of the denominator versus the numerator for different groups or treatments produce regression lines that are significantly different. Data will likely be interpreted correctly when regression evaluations and an ANCOVA are used to supplement traditional data analysis. An ANCOVA can evaluate differences in covariate-adjusted means and determine the sequential importance of different variables. The inverse of the denominator can be used as a covariate when a ratio is analyzed. The denominator of a ratio-based expression can also be used as a covariate when the numerator is analyzed. However, as Packard and Boardman (1988) suggest, the covariance approach is not a panacea.
If covariates vary widely between treatments or groups, the adjustment may involve an element of extrapolation and the comparison of adjusted means has low precision (Snedecor and Cochran, 1973; Smith, 1957). One could conclude that differences among treatments are the result of the covariate, but a better interpretation may be that only very large effects would be detected (Snedecor and Cochran, 1973; Smith, 1957). In some cases, we have detected differences in smaller subsets of data when an evaluation of a larger subset, even with greater degrees of freedom, did not result in significant differences (data not shown). Although assumptions for homogeneity of variance and homogeneity of slope were met in the examples we present, this is not always the case (data not shown).
For all of these reasons, we prefer to use ANCOVA as confirmation for conclusions based on plots of the denominator versus the numerator for ratio-based expressions. In the blueberry example, ANCOVA procedures were consistent with the graphic observations and regression analysis for both the complete data and data subsets.
A complete analysis requires a determination of values for both the numerator and denominator of a ratio-based expression. Unfortunately, in many cases, experiments are designed in which it is impossible to plot the denominator versus the numerator of a ratio-based expression, and the researcher can never determine if effects are direct or indirect. For example, when mineral concentration data are collected without measuring total dry weight or CO2 assimilation per leaf area is measured without measuring total leaf size, an interpretation will always be inconclusive.
In addition to ensuring that the necessary data be collected, experiments should be designed with enough replicates to determine conclusively if the relationship between the denominator and numerator differs for experimental treatments. Evaluating whether effects are indirect will be more revealing than a simple statistical comparison of ratio-based expressions. When a conclusion is based on statistical differences of a ratio-based expression, it is the researcher's responsibility to determine whether these effects are direct or indirect. Indirect effects are common enough that the possibility needs to be eliminated before a conclusion is reached.
Literature Cited
Atchley, W.R., Gaskins, C.T. & Anderson, D. 1976. Statistical properties of ratios. I. Empirical results. Syst. Zool. 25:137–148.
Barnett, V.D. 1962. Large sample tables of percentage points for Hartley's correction to Bartlett's criterion for testing the homogeneity of a set of variances. Biometrika 49:487–494.
Bartlett, M.S. 1937. Some examples of statistical methods of research in agriculture and applied biology. J. R. Stat. Soc. [Ser A] Suppl. 4:137–170.
Bi, G., Scagel, C.F., Cheng, L. & Fuchigami, L.H. 2004. Soil and foliar nitrogen supply affects the composition of nitrogen and carbohydrates in young almond trees. J. Hort. Sci. Biotechnol. 79:175–181.
Brown, J.H. & West, G.B. 2000. Scaling and biology. Oxford University Press, N.Y.
Candolfi-Vasconcelos, M.C. & Koblet, W. 1991. Influence of partial defoliation on gas exchange parameters and chlorophyll content of field-grown grapevines. Mechanisms and limitations of the compensation capacity. Vitis 30:129–141.
Hoagland, D.R. & Arnon, D.I. 1950. The Water-Culture Method for Growing Plants Without Soil. Calif. Agr. Expt. Sta. Circ. 347.
Huhn, M. 1991. Character associations among grain yield, biological yield and harvest index. J. Agron. Crop Sci. 166:308–317.
Huhn, M. 1993. Comparison of harvest index and grain/straw ratio with applications to winter oilseed rape. J. Agron. Crop Sci. 170:270–280.
Huhn, M. 1998. A note on the skewness of the frequency distribution of harvest indices with an application to winter oilseed rape (Brassica napus L.). J. Agron. Crop Sci. 180:73–76.
Kaiser, M.S., Spekman, P.L. & Jones, J.R. 1994. Statistical models limiting nutrient relationships in inland waters. J. Amer. Stat. Assn. 89:410–423.
Kleiber, M. 1932. Body size and metabolism. Hilgardia 6:315–353.
Maller, F.C., de Boer, E.S., Joll, L.M., Anderson, D.A. & Hinde, P.J. 1983. Determination of the maximum foregut volume of western rock lobsters (Panulirus cygnus) from field data. Biometrics 39:543–551.
Maller, R.A. 1990. Some aspects of a mixture model for estimating the boundary of a set of data. J. du Conseil International pour l'Exploration de la Mer 46:140–147.
Meinzer, F.C. & Zhu, J. 1998. Nitrogen stress reduces efficiency of the C4 CO2 concentrating system, and therefore quantum yield, in Saccharum (sugarcane) species. J. Expt. Bot. 49:1227–1234.
Packard, G.C. & Boardman, T.J. 1988. The misuse of indices and percentages in ecophysiological research. Physiol. Zool. 61:1–9.
Pearson, K. 1897. On a form of spurious correlation which may arise when indices are used in the measurement of organs. Proc. Royal Soc. London 60:489–502.
Ranjith, S.A. & Meinzer, F.C. 1997. Physiological correlates of variation in nitrogen-use efficiency in two contrasting sugarcane cultivars. Crop Sci. 37:818–825.
Reich, P.B., Tjoelker, M.G., Machado, J.L. & Oleksyn, J. 2006. Universal scaling of respiratory metabolism, size and nitrogen in plants. Nature 439:457–461.
Sage, R.F. & Pearcy, R.W. 1987. The nitrogen use efficiency of C3 and C4 plants. Plant Physiol. 84:959–963.
Sandrock, D.R., Righetti, T.L. & Azarenko, A.N. 2005. Isotopic and nonisotopic estimation of nitrogen uptake efficiency in container-grown woody ornamentals. HortScience 40:665–669.
Smith, H.F. 1957. Interpretation of adjusted treatment means and regression in analysis of covariance. Biometrics 13:282–308.
Snedecor, G.W. & Cochran, W.G. 1973. Statistical methods. The Iowa State Univ. Press, Ames, Iowa.
Steenbjerg, F. 1951. Yield curves and chemical plant analysis. Plant Soil 3:97–109.
Tanner, J.M. 1949. Fallacy of per-weight and per-surface area standards and their relation to spurious correlation. J. Appl. Physiol. 2:1–15.
Thomson, J.D., Weiblen, G., Thomson, B.A., Alfaro, S. & Legendre, P. 1996. Untangling multiple factors in spatial distributions: Lilies, gophers, and rocks. Ecology 77:1698–1715.