Climate variability and change affect crop yields by imposing climatic stresses on components of the agricultural ecosystem (Bazzaz and Sombroek, 1996). Climate influences crop production by modifying the biophysical environment in which crops grow (Monteith, 1981). Temperature is one of the most important variables controlling crop growth and phenology (Hodges et al., 1993), while rainfall and atmosphere–soil interactions play determining roles in crop productivity (Reddy et al., 2002). Weather conditions interact with soil water availability to affect plant water status, which in turn influences how plant processes respond to other environmental factors. Photosynthesis, photorespiration, and transpiration are the main plant processes directly affected by changes in atmospheric CO2. In C3 plants such as cotton, elevated CO2 generally enhances photosynthesis by increasing the substrate concentration and suppressing photorespiration (Reddy et al., 2000). This response is accompanied by partial stomatal closure (Morison, 1987), which reduces transpiration (Rosenzweig and Hillel, 1998; Kimball and Idso, 1983). Together these effects increase water use efficiency (Reddy et al., 2000) and plant yields (Kimball, 1983). Before any crop growth model is used to study crop–climate interactions under present and future conditions, it is imperative to evaluate its ability to reproduce these observed relationships between crop yields and climate stresses.
Therefore, the objective of this study was to evaluate a geographically distributed cotton growth model, newly redeveloped from GOSSYM, for its ability to simulate historical cotton yield variations and their responses to climatic stresses under actual climatic conditions across the U.S. Cotton Belt. As part of a continuing effort to study cotton–climate interactions, GOSSYM has been completely reengineered to the FORTRAN 90 software standard (Xu et al., 2005). Its representation of physical processes and its specification of adjustable parameters (Liang et al., 2012a) have also been greatly improved to facilitate full coupling with the mesoscale regional Climate–Weather Research and Forecasting (CWRF) model (Liang et al., 2012b). The present stand-alone evaluation is driven by the best available observational analysis of climate. It therefore avoids the complications that CWRF climate biases would introduce through nonlinear feedbacks, enabling effective identification of physical and numerical deficiencies specific to GOSSYM. This evaluation is a prerequisite for applying the fully coupled CWRF–GOSSYM across the same computational domain and grid resolution.
MATERIALS AND METHODS
Model Simulations and Observational Data
The redeveloped, geographically distributed GOSSYM, at the CWRF 30-km grid spacing, was integrated continuously over the entire growing season of each year from 1979 to 2005 under realistic climate conditions and agricultural practices. The longest season spanned 30 April to 12 November, in New Mexico. The original GOSSYM has 55 parameters that users calibrate to represent the cotton cultivar at a farm site (Boone et al., 1993; Reddy et al., 2003). It also has an additional 30 parameters that must be specified as input for soil conditions and management practices. To facilitate spatially distributed modeling of climate–crop interactions, Liang et al. (2012a) first minimized the input parameter list by replacing most of these quantities with the best available physical representations and observational estimates. Two new adjustable parameters with regional dependence remained: the initial NO3 amount in the top 2 m of soil and the ratio of irrigation water amount to potential evapotranspiration. These were then determined through inverse modeling to minimize local root mean square errors (RMSEs) of annual cotton yields under realistic climate conditions. This new set of parameter specifications was adopted in the present study to evaluate GOSSYM's representation of cotton yields and climate stresses across the U.S. Cotton Belt.
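The inverse-modeling step can be illustrated with a minimal sketch. It assumes a simple exhaustive grid search over candidate values of the two parameters at a single grid cell. The actual search procedure of Liang et al. (2012a) may differ. `toy_yield` is a hypothetical stand-in for one GOSSYM season and is not the model. The candidate grids and synthetic "observations" are illustrative only.

```python
import itertools
import math


def toy_yield(no3_kg_ha: float, irr_ratio: float, climate_index: float) -> float:
    """HYPOTHETICAL surrogate for one GOSSYM season (kg/ha); not the real model.

    no3_kg_ha: initial soil NO3 in the top 2 m; irr_ratio: irrigation / potential ET.
    """
    n_effect = 1.0 - math.exp(-no3_kg_ha / 80.0)  # assumed diminishing returns to N
    water_effect = min(1.0, 0.4 + irr_ratio)  # assumed: rainfall meets 40% of demand
    return 1500.0 * n_effect * water_effect * climate_index


def rmse(sim, obs):
    """Root mean square error between simulated and observed annual yields."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def calibrate(obs_yields, climate, no3_grid, irr_grid):
    """Return (rmse, no3, irr_ratio) for the candidate pair with the smallest RMSE."""
    best = None
    for no3, irr in itertools.product(no3_grid, irr_grid):
        sim = [toy_yield(no3, irr, c) for c in climate]
        err = rmse(sim, obs_yields)
        if best is None or err < best[0]:
            best = (err, no3, irr)
    return best


# Synthetic check: generate "observations" from known parameters and recover them.
climate = [0.9, 1.0, 1.1, 0.95, 1.05]
obs = [toy_yield(120.0, 0.4, c) for c in climate]
err, no3, irr = calibrate(obs, climate, [40.0, 80.0, 120.0, 160.0], [0.0, 0.2, 0.4, 0.6, 0.8])
print(no3, irr)  # 120.0 0.4
```

Because both parameters act jointly on yield, different pairs can produce similar RMSEs with real data. A finer grid or a gradient-free optimizer would be natural refinements.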
The climatic conditions driving the stand-alone GOSSYM integration were taken from the North American Regional Reanalysis (NARR) (Mesinger et al., 2006). NARR is a long-term, consistent, high-resolution climate data set that represents the best available proxy for observations. It uses a 32-km grid, close to that of CWRF, and provides 3-hourly atmospheric and land surface data. These include precipitation, evapotranspiration, runoff, net radiation flux, surface air temperature, humidity and wind, and soil temperature and moisture. Given the large biases in the NARR model-based product, especially over the southeastern United States, we replaced daily precipitation with an objective analysis based on gauge measurements from 7235 U.S. National Weather Service cooperative stations (see Liang et al.  for the data source and analysis procedure).
Figure 1 illustrates the geographic distributions of rainfall, daily maximum surface air temperature, incident solar radiation, evapotranspiration, and 10-m wind speed averaged over the growing season during 1979 to 2005. These mean climatic conditions determined the overall planting and yield distributions across the U.S. Cotton Belt, as discussed below.
For this study, the main verification data were observed cotton yields. Annual county-level yield data are available from the National Agricultural Statistics Service (NASS) archive and were interpolated onto the CWRF grid using ArcGIS. The interpolation comprised six steps, each using an ArcGIS command:

1. “joinfield” incorporated the planted area, harvested area, and yield data from NASS into the county shape file.
2. “shapegrid” converted the data from ArcGIS shape file format to ArcGIS GRID format at 0.01° resolution by nearest neighbor assignment.
3. “project” projected the data onto the CWRF Lambert Conformal Conic Projection.
4. “resample” resampled the projected data to 1-km resolution by nearest neighbor assignment.
5. “latticeclip” clipped the 1-km data to each 30-km grid cell.
6. “zonalmean” computed the mean of the 1-km data within each 30-km grid cell.

All of the steps were programmed into one ArcGIS script and run in batch mode. The resulting gridded yields provided the reference against which the GOSSYM-simulated cotton yields were evaluated. The same observational data were also used to assess the model's ability to represent the dependence of cotton growth on climate.
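The last two steps ("latticeclip" and "zonalmean") amount to averaging the valid 1-km cells that fall inside each 30-km cell. This is a minimal NumPy sketch, not the ArcGIS implementation. It assumes the 1-km raster is already projected and aligned so that each 30-km cell covers exactly 30 × 30 fine cells. NaN marks fine cells outside any cotton-producing county, and an unweighted mean of the valid cells is taken. The function name `block_mean` is illustrative.

```python
import numpy as np


def block_mean(fine, factor):
    """Mean of the valid (non-NaN) fine cells in each non-overlapping factor x factor block.

    Blocks with no valid cells are returned as NaN.
    """
    ny, nx = fine.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be multiples of the block factor")
    # Axes 1 and 3 index positions inside each block.
    blocks = fine.reshape(ny // factor, factor, nx // factor, factor)
    valid = ~np.isnan(blocks)
    total = np.where(valid, blocks, 0.0).sum(axis=(1, 3))
    count = valid.sum(axis=(1, 3))
    out = np.full(total.shape, np.nan)
    np.divide(total, count, out=out, where=count > 0)  # leave empty blocks as NaN
    return out


# Hypothetical 60 x 60 km patch of 1-km yields (kg/ha) -> 2 x 2 cells of 30 km.
fine = np.full((60, 60), np.nan)
fine[:30, :30] = 800.0
fine[:30, 30:] = 1200.0
fine[30:, :15] = 600.0  # only half of this 30-km cell contains cotton counties
coarse = block_mean(fine, 30)
print(coarse)
```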
Cross-Validation for Parameter Optimization
Model optimization required a long time series of historical observations and retrospective predictions. These were needed to train the redeveloped GOSSYM and derive the appropriate geographic distributions of the two new adjustable parameters. Because of the constraints imposed by short data records, many previous studies have adopted the cross-validation approach (Efron and Gong, 1983; Michaelsen, 1987). It has been used to assess the performance of both climate models (Peng et al., 2002; Kharin and Zwiers, 2002; Liang et al., 2007) and crop growth models (Irmak et al., 2000; Wallach et al., 2001; Thorp et al., 2007; Xiong et al., 2008). For example, Irmak et al. (2000) evaluated the ability of a soybean [Glycine max (L.) Merr.] model to predict anthesis, maturity, and yields. They found that the errors obtained with cross-validation were similar to those obtained with variety trial data. Thorp et al. (2007) evaluated CERES-Maize yield simulations using 5 yr of observed yields and found cross-validation to be an appropriate method of model assessment. They also noted that a more stable optimized parameter can be obtained when additional growing seasons are used in the cross-validation.
We focused on annual cotton yield variations during 1979 to 2005 using the leave-one-out cross-validation approach. In this approach, the two new adjustable parameters are trained with data from all years except one, and GOSSYM then uses them to predict the cotton yield for the excluded year. Each optimization therefore used 26 yr of training data. Repeating the procedure for every year produced 27 yr of out-of-sample predictions for verification, which is sufficient to obtain robust statistics. Figure 2 illustrates the relative mean biases, $100\% \times (\bar{Y}_p - \bar{Y}_o)/\bar{Y}_o$, of the cotton yields predicted by GOSSYM using the cross-validation approach. Here $\bar{Y}$ denotes the mean of the cotton yields from 1979 to 2005, and the subscripts p and o represent predictions and observations, respectively.
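The leave-one-out procedure can be sketched as follows. This is a minimal sketch under stated assumptions:

- `fit_scale` and `predict_scale` are hypothetical one-parameter stand-ins for the two-parameter GOSSYM calibration and season run.
- The relative mean bias is taken as 100 × (mean prediction − mean observation) / mean observation, following the definition in the text.
- The yield values are hypothetical.

```python
from statistics import mean


def fit_scale(obs, climate):
    """HYPOTHETICAL calibration: least-squares k in yield = k * climate."""
    return sum(o * c for o, c in zip(obs, climate)) / sum(c * c for c in climate)


def predict_scale(k, climate_value):
    """HYPOTHETICAL prediction for one year from the calibrated parameter."""
    return k * climate_value


def loo_predictions(obs, climate, fit=fit_scale, predict=predict_scale):
    """For each year i, calibrate on all other years, then predict year i."""
    preds = []
    for i in range(len(obs)):
        params = fit(obs[:i] + obs[i + 1:], climate[:i] + climate[i + 1:])
        preds.append(predict(params, climate[i]))
    return preds


def relative_mean_bias(pred, obs):
    """Relative bias of the long-term mean prediction, in percent."""
    yp, yo = mean(pred), mean(obs)
    return 100.0 * (yp - yo) / yo


obs = [1000.0, 1000.0, 1000.0, 2000.0]
climate = [1.0, 1.0, 1.0, 1.0]
preds = loo_predictions(obs, climate)
print(preds[3])  # 1000.0: the 2000 kg/ha year is predicted only from the other three
print(relative_mean_bias(preds, obs))
```

With 27 years, the loop runs 27 calibrations on 26 years each and yields 27 out-of-sample predictions, matching the counts given above.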
The redeveloped GOSSYM performed well. The long-term mean predicted yields were within ±10% of observations in the 30-km grids across major areas of the U.S. Cotton Belt, and the overall bias was –4% (Fig. 2). The largest underestimations (below –20%) occurred along the Arizona–California border and in much of central and eastern Texas. In the Southeast, predictions were generally within ±10% of observations, with larger overestimations (>20%) in some areas. In contrast, the original GOSSYM overestimated yields by 27 to 135% at the state level and by 92% overall (Liang et al., 2012a). Clearly, the redeveloped GOSSYM substantially improved the prediction of mean cotton yields. In addition, the mean bias distribution based on cross-validation is very close to that obtained from a single optimization using all 27 yr of data (Liang et al., 2012a). These results indicate that optimizing the two new parameters through inverse modeling is appropriate and robust. This establishes the credibility of using the redeveloped model to study the relationships between cotton yields and climate stresses.
Mean Cotton Yields and Climate Dependences
Figure 3 depicts the observed and GOSSYM-simulated geographic distributions of 1979 to 2005 average cotton yields. In observations, the yields were greatest (>1200 kg ha−1) in California and Arizona, least (<600 kg ha−1) in the Texas High Plains and Oklahoma, medium (about 800 kg ha−1) in New Mexico and the Mississippi Delta, and low (about 600 kg ha−1) in the remaining states. The GOSSYM-simulated values agreed well with observations, where the long-term mean modeled yields were within ±10% of observations across most areas of the U.S. Cotton Belt. This provides a credible baseline to study crop yield interannual variability and its dependence on climate variations.
Comparison of Figs. 1 and 3 provides a general picture of how the geographic distribution of mean cotton yields depends on the distinct regional climate. Most critical to crop growth are temperature, incident solar radiation, water, and nutrient supply.

Rainfall is abundant in the southeastern states, exceeding 3 mm d−1 on average. It is accompanied by a temperature of approximately 30°C, close to the 26 to 28°C range for optimum growth (Reddy et al., 2002). Meanwhile, increased cloud cover reduces incident solar radiation, and rapid drainage in sandy soils depletes both moisture and nutrients. Despite the rich rainfall, these soils require light irrigation (<100 mm yr−1) in certain areas, such as Georgia. Each of these conditions is less than optimal for cotton growth. Because favorable and unfavorable conditions compete, actual yields in these states fall in the low range for the U.S. Cotton Belt.

In contrast, the southwestern states receive abundant daily solar radiation, in excess of 300 W m−2, which promotes cotton growth. They also experience excessively dry and hot weather, which causes large water and heat stresses. High cotton yields are achieved in this region through heavy irrigation (∼500–900 mm yr−1).

Across Texas and Oklahoma, the prevailing southerly low-level jet facilitates large evapotranspiration and produces additional water stress. The limited availability of water for irrigation in this region contributes to the lowest yields in the U.S. Cotton Belt. Along the Lower Mississippi River, moderate rainfall, large irrigation potential, warm temperatures, adequate solar radiation, and deep soils allow large planting acreages and average yields. GOSSYM realistically captured these physical cotton–climate correspondences.
Figure 4 illustrates the simulated geographic distributions of water and carbohydrate stresses on plant growth, N stress on the cotton bolls, annual maximum leaf area index (LAI), and plant height, averaged over the growing season. Each stress factor ranges from 0.0 (most severe stress) to 1.0 (no stress), so a larger value indicates weaker stress. The water stress factor (Fig. 4a) was generally larger under irrigated than unirrigated conditions, with mean values of 0.53 and 0.37, respectively. This indicates that water availability or precipitation timing was less favorable for cotton growth in the rainfed regions during 1979 to 2005. Higher cotton yields could be produced in these regions if more water were available. Across the irrigated regions, the larger water stress factor resulted from the model optimization that enables GOSSYM to produce realistic cotton yields. Uncertainties exist, however, because actual irrigation practices are unknown.

The water stress factor was largest (∼0.58) across Arizona and California, where cotton growth is maintained by heavy irrigation under the presumption of an abundant water supply. It was smallest (∼0.25) in the dryland areas of the Texas High Plains, where both irrigation and rainfall are insufficient. It was intermediate (∼0.33–0.54) in the Lower Mississippi River basin and the southeastern states, where abundant rainfall or irrigation occurs.
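Class means such as the 0.53 (irrigated) and 0.37 (unirrigated) values can be derived from gridded model output. This is a minimal sketch of that calculation under stated assumptions:

- Grid cells are classified by the >80% area rule this study uses to define irrigated and unirrigated grids.
- The unirrigated share is assumed to be one minus the irrigated share.
- Harvest-area weighting is assumed, because the weighting behind the reported means is not stated.
- The cell values are hypothetical.

```python
def classify_grid(irrigated_fraction, threshold=0.8):
    """Label a grid cell by the >80% rule (fractions in 0..1)."""
    if irrigated_fraction > threshold:
        return "irrigated"
    if 1.0 - irrigated_fraction > threshold:  # assumed unirrigated share
        return "unirrigated"
    return "mixed"


def class_mean_water_stress(cells):
    """Harvest-area-weighted mean water stress factor (1.0 = no stress) per class.

    cells: iterable of (irrigated_fraction, harvest_area, water_stress_factor).
    """
    weighted, area = {}, {}
    for frac, harvest, wsf in cells:
        cls = classify_grid(frac)
        weighted[cls] = weighted.get(cls, 0.0) + harvest * wsf
        area[cls] = area.get(cls, 0.0) + harvest
    return {cls: weighted[cls] / area[cls] for cls in weighted}


# Hypothetical cells: (irrigated fraction, harvested area, water stress factor).
cells = [(0.9, 100.0, 0.6), (0.95, 300.0, 0.5), (0.1, 200.0, 0.3), (0.0, 200.0, 0.4), (0.5, 100.0, 0.9)]
print(class_mean_water_stress(cells))
```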
Across most areas, regardless of irrigation practice, the simulated carbohydrate stress factor (Fig. 4b) varied between 0.37 and 0.75, below the no-stress value of 1.0. This suggests that the CO2 fertilization effect on cotton growth is not saturated at present, so higher yields can be projected in the future as atmospheric CO2 concentrations increase. The lowest carbohydrate stress factor occurred across the dryland areas of the Texas High Plains, where the water stress factor also reached its minimum. Across this region, greater cotton productivity could be achieved if both water availability and CO2 concentration were enhanced. On the other hand, the model simulated weak N stress across the U.S. Cotton Belt, with stress factors between 0.86 and 1.0 everywhere (Fig. 4c). This may reflect the tendency of farmers to apply adequate fertilizer to maximize cotton yields. The impact of this factor on cotton growth will be addressed in the future, when robust observational data become available.
Note that the annual yield (Fig. 3b), maximum LAI (Fig. 4d), and plant height (Fig. 4e) were all higher across areas with larger water stress factor values. These variables were maximized in Arizona and southern California, where irrigation is unlimited, and were somewhat smaller in northern California and along the Lower Mississippi River, where irrigation is also sufficient. Under unirrigated conditions, mostly in the southeastern states, the water stress on cotton growth may be substantial, depending on regional rainfall abundance and timing. Across Texas and Oklahoma, severe water stress greatly reduced LAI, plant height, boll weight, and number of open bolls, and consequently led to very low cotton yields (<500 kg ha−1).
Interannual Cotton Yield Variations and Climate Stresses
Figure 5 depicts the observed and GOSSYM-simulated interannual standard deviations of cotton yield, which show reasonable agreement. Assume that the yearly records are independent (27 yr, giving 26 degrees of freedom). Then the modeled and observed standard deviations are not statistically different at the 95% confidence level when their ratio lies between 0.67 and 1.50. By this measure, GOSSYM simulated cotton yield interannual variability close to observations across 57% of the harvested areas.
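The 0.67 to 1.50 range is consistent with a two-sided F test on the ratio of the two variances with 26 and 26 degrees of freedom. We assume that is the test behind it. The sketch below uses only the standard library, through numerical integration of the F density plus bisection, and recovers bounds close to that range. `scipy.stats.f.ppf` would give the same quantile if SciPy is available. The function names are illustrative.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0.0:
        return 0.0
    log_num = 0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
                     - (d1 + d2) * math.log(d1 * x + d2))
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    return math.exp(log_num - log_beta) / x


def f_cdf(x, d1, d2, n=2000):
    """CDF by composite Simpson integration of the density on [0, x] (n even)."""
    if x <= 0.0:
        return 0.0
    h = x / n
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3.0


def f_quantile(p, d1, d2, lo=0.0, hi=50.0, tol=1e-10):
    """Invert the CDF by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sd_ratio_bounds(n_years, alpha=0.05):
    """Range of sd_model / sd_obs not significantly different from 1 (two-sided F test)."""
    df = n_years - 1
    upper = math.sqrt(f_quantile(1.0 - alpha / 2.0, df, df))
    return 1.0 / upper, upper  # equal df: lower F quantile = 1 / upper F quantile


lo, hi = sd_ratio_bounds(27)
print(round(lo, 3), round(hi, 3))
```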
A more important capability in modeling variability, however, is the temporal correspondence between simulated and observed anomalies from one year to the next. This can be measured by correlating the two time series. Figure 5c shows that GOSSYM and observed crop yields are positively correlated across 99% of the grids of the U.S. Cotton Belt. Assuming yearly independence, correlation coefficients >0.32 are statistically significant at the 95% confidence level. By this measure, significant correlations were found across 79% (irrigated), 83% (unirrigated), and 87% (all) of the harvest grids. Note that a grid was defined as irrigated when its irrigated area percentage is >80%, and as unirrigated when its unirrigated area percentage is >80%. This result demonstrates the important contribution of climate variations to cotton yields in these areas. The simulated climate effects explain >50% of the observed interannual variance (i.e., correlations >0.71) in 53% of the harvest grids.

About 3% of the harvest grids, however, have correlations <0.1. These areas may be characterized by weak climate control, large model deficiencies, or observational inaccuracies arising from mapping USDA county-level data onto the CWRF 30-km grid. In particular, correlations were small in southern California and western Arizona. This model failure probably resulted from simulating the wrong cotton cultivar. In reality, California produces a special upland cotton cultivar known as San Joaquin Valley Acala, variations of which are also grown in Arizona. Its yield is higher than those of the upland cultivars planted in the southeastern, Mississippi Delta, and Plains states. This higher yield results from the longer growing season, the greater number of hot days, and close control of irrigation. Unfortunately, the current GOSSYM cannot simulate the growth of this Acala variant.
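The 0.32 threshold can be checked from the t distribution of a sample correlation with 27 years (25 degrees of freedom). In this stdlib-only sketch, a one-sided test gives about 0.32, and a two-sided test gives about 0.38. A one-sided test is appropriate when only positive model–observation correlations count as skill. We infer that a one-sided test was used, but the text does not say so. The same arithmetic shows that explaining >50% of the variance requires a correlation above √0.5. The function names are illustrative.

```python
import math


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - 0.5 * (nu + 1) * math.log1p(x * x / nu))


def t_cdf(x, nu, n=2000):
    """CDF for x >= 0: 0.5 plus a composite Simpson integral of the density on [0, x]."""
    h = x / n
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3.0


def t_quantile(p, nu, lo=0.0, hi=20.0, tol=1e-10):
    """Upper-tail quantile (p > 0.5) by bisection on the CDF."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def critical_r(n_samples, alpha=0.05, one_sided=True):
    """Smallest significant correlation, from t = r * sqrt(nu) / sqrt(1 - r^2)."""
    nu = n_samples - 2
    t = t_quantile(1.0 - alpha if one_sided else 1.0 - alpha / 2.0, nu)
    return t / math.sqrt(t * t + nu)


print(round(critical_r(27), 3))                   # one-sided threshold
print(round(critical_r(27, one_sided=False), 3))  # two-sided threshold
print(round(math.sqrt(0.5), 3))                   # 0.707: r needed to explain >50% of variance
```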
Previous empirical studies have sought to establish statistical relationships between crop yields and major climate variables and stresses for prediction purposes. Such relationships, if any, are very complex, however. The final yield is determined by the integrated effect of many factors (climate stresses, soil characteristics, cultural practices) over the entire crop life cycle. A physically based model such as GOSSYM provides a unique tool for identifying the underlying climatic controls on crop yields. For example, we sought to identify which climate variables, during which growth periods, play a major role in final cotton yield. This can be depicted by the evolution over the life cycle of the interannual correlation coefficient $R(s,t)$. It relates yearly cotton yields $Y(s,y)$ to 30-d running means $\bar{V}(s,t,y)$ of the climate variable of interest. Here $s$, $y$, and $t$ denote the grid, the year, and the day of the year, respectively. The overbar indicates the running mean of the variable $V$ from day $t$ to day $t+29$, that is, $\bar{V}(s,t,y) = \frac{1}{30}\sum_{\tau=t}^{t+29} V(s,\tau,y)$.
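The running correlation R(s,t) described above can be computed for a single grid with a minimal NumPy sketch. It assumes daily values are stored as a years × days array whose column index plays the role of the day of year t. Yearly yields are stored as a 1-D array over the same years. The function names and the synthetic data are illustrative.

```python
import numpy as np


def forward_running_mean(daily, window=30):
    """daily: (n_years, n_days). Column t of the result is the mean of days
    t .. t + window - 1, so the output has n_days - window + 1 columns."""
    csum = np.cumsum(daily, axis=1)
    csum = np.concatenate([np.zeros((daily.shape[0], 1)), csum], axis=1)
    return (csum[:, window:] - csum[:, :-window]) / window


def running_correlation(yields, daily, window=30):
    """Interannual Pearson correlation between yearly yields (n_years,) and the
    window-day running mean starting on each day; NaN where a window has zero variance."""
    vbar = forward_running_mean(daily, window)
    y_anom = yields - yields.mean()
    v_anom = vbar - vbar.mean(axis=0)  # anomalies across years, per start day
    num = (y_anom[:, None] * v_anom).sum(axis=0)
    den = np.sqrt((y_anom ** 2).sum() * (v_anom ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den


rng = np.random.default_rng(0)
tmax = rng.normal(30.0, 2.0, size=(27, 200))  # synthetic: 27 yr x 200 d
yields = -40.0 * forward_running_mean(tmax)[:, 60] + 2000.0  # tied to window start 60
r = running_correlation(yields, tmax)
print(r.shape, float(r[60]))
```

The cumulative-sum form computes every 30-d window in one pass, rather than re-averaging each window separately.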
Figure 6 illustrates, over the growing season, the percentage of the total harvest area across the U.S. Cotton Belt where 30-d running correlations are statistically significant at the 95% confidence level. The correlations relate observed and GOSSYM-simulated yields to four variables: rainfall, incident solar radiation, daily maximum surface air (2-m) temperature, and root-zone (top 1-m) soil temperature. All correlations are computed over 1979 to 2005. Note that the correlations may be positive or negative, depending on the actual heat, water, light, and nutrient stresses. For example, positive temperature anomalies may foster higher (lower) yields in a cold (hot) region, where the average temperature is below (above) the optimum for cotton growth. Therefore, absolute correlation values were considered here for both rainfed and irrigated lands.
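The percentage plotted in each curve can be sketched as the share of total harvest area whose absolute correlation exceeds the significance threshold. This is a minimal NumPy sketch under stated assumptions:

- Each grid is weighted by its harvested area, since the weighting is not stated.
- NaN correlations count as not significant.
- An optional mask selects rainfed or irrigated grids.
- The threshold is passed in (e.g., the 0.32 used for 27-yr correlations).
- The example values are hypothetical.

```python
import numpy as np


def percent_area_significant(r, harvest_area, r_crit, mask=None):
    """Percentage of total harvest area where |R| exceeds r_crit, for each window.

    r: (n_grids, n_windows) running correlations; harvest_area: (n_grids,) weights;
    mask: optional boolean (n_grids,) subset, e.g. rainfed or irrigated grids.
    """
    if mask is not None:
        r, harvest_area = r[mask], harvest_area[mask]
    with np.errstate(invalid="ignore"):
        significant = np.abs(r) > r_crit  # NaN compares as False
    return 100.0 * (harvest_area[:, None] * significant).sum(axis=0) / harvest_area.sum()


r = np.array([[0.5, -0.5], [0.1, -0.4], [0.33, np.nan]])  # 3 grids x 2 windows
area = np.array([1.0, 2.0, 1.0])
print(percent_area_significant(r, area, 0.32))
```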
Across the rainfed lands, cotton growth in GOSSYM is determined predominantly by climate variations. For all four climate variables examined, the simulated percentage of area with significant correlations exceeded the observed percentage, especially during the summer periods of peak dependence. This overestimation probably occurred because the model does not account for plant mortality caused by weather, weeds, and insects, or for variations in cultural practices. Nevertheless, the model faithfully captured the overall observed climate control on cotton yields.

The most important climatic factor affecting cotton yields, in both observations and GOSSYM, is the daily maximum air temperature. It showed a distinct plateau of significant correlations across >68% of the harvest area during late June to early August. Given the 30-d sliding window, this result suggests that the July to August maximum air temperature is a good predictor of annual cotton yield.

The second most important factor is root-zone soil temperature. Its observed peak occurred approximately 30 d later than that for maximum air temperature. Its significant correlations also covered a smaller percentage of the harvest area (51 vs. 68% for maximum air temperature). The model also correctly simulated this delayed effect. Note that soil temperature was not directly measured but was produced by the NARR model, with some biases. This may partially explain why the discrepancy between modeled and observed yield dependence was larger for soil temperature than for air temperature.

The third most important factor is incident solar radiation, for which the observed yield dependence peaked in late May to early June across about 35% of the harvest area. The model correctly depicted this peak but maintained an elevated percentage of significant correlations through early July.
The fourth factor is rainfall, for which significant correlations with cotton yield were observed during May to August across about 20% of the harvest area. The model produced a much higher percentage of significant correlations, exceeding 60% of the harvest area, during late May to early July. This large overestimation may have occurred, in part, because the model does not represent rainfall damage to cotton.
For irrigated lands, the percentage of area with significant correlations between cotton yield and the climate variables agrees in both magnitude and phase between GOSSYM and observations. The predictive signals, however, differ markedly from those for rainfed lands. A relatively large percentage of area (∼40%) had significant correlations with rainfall, incident solar radiation, and daily maximum air temperature during mid-April to May. A larger share (40–60%) had significant correlations with root-zone soil temperature during mid-April to June. These peaks occurred approximately 2 mo earlier than those for rainfed lands. The percentage of significant area then decreased rapidly, in early June for the first three variables and in late June for soil temperature. This transition coincides with the development of cotton squares and flowers. Model output showed that irrigation was essentially not applied during the early growing season, because surface temperature and evapotranspiration were relatively low. Initial crop growth therefore depended strongly on local climate conditions. As the growing season progressed, more frequent and heavier irrigation regulated crop growth, making climate control less evident. The percentage of area with significant correlations with daily maximum air temperature then increased rapidly in July and early August. The model correctly simulated this increase, although it overestimated its magnitude.
The above correlation results are not intended to indicate that rainfall and solar radiation are any less important than temperature at specific locations. They only suggest a general tendency under the prevailing regional climate conditions. Across the U.S. Cotton Belt, water stress is generally not severe because rainfall and irrigation are sufficiently plentiful. Given that nutrient stress is also weak and solar radiation is abundant, heat stress becomes the dominant factor that controls cotton yields.
Plant physical conditions at certain growth stages in the cotton life cycle are expected to be related to final yields (Li et al., 2001; Doraiswamy et al., 2004, 2005; Zhao et al., 2007). One representative variable is the LAI, which can be retrieved through remote sensing (Myneni et al., 2002). The LAI is a predictor of photosynthetic primary production and is often used as a reference for crop growth (Hay and Porter, 2006). In general, increased LAI implies a more vigorous vegetative canopy during the growing period. In the mature stage, however, it may reduce boll production, because vegetative and fruiting structures compete for the available energy and nutrients. To depict the general yield–LAI relationship across the entire U.S. Cotton Belt, four statistical measures were calculated:

- the gross correlation $R(t)$, which is the same as $R(s,t)$ except that all values at individual grids ($s$) and years ($y$) are pooled as data samples;
- the mean correlation $\bar{R}(t)$, which is the average of $R(s,t)$ across all grids;
- the frequency (across space) of negative $R(s,t)$ values significant at the 95% confidence level;
- the frequency (across space) of positive $R(s,t)$ values significant at the 95% confidence level.
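The four measures for a single day can be sketched with NumPy under stated assumptions:

- Yields and LAI are stored as grids × years arrays.
- The significance threshold is supplied (e.g., 0.32 for 27 years).
- The frequencies are counted as percentages of grids.

The gross correlation pools spatial and interannual variability, so it can differ in sign and magnitude from the mean of the per-grid correlations. The example shows this: the two per-grid correlations of +1 and −1 average to 0, while the pooled value is positive. Function names and data are illustrative.

```python
import numpy as np


def _corr_rows(a, b):
    """Pearson correlation of matching rows of a and b (interannual, per grid)."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))


def yield_lai_measures(yields, lai_t, r_crit):
    """yields, lai_t: (n_grids, n_years); lai_t holds LAI on (or around) day t."""
    per_grid = _corr_rows(yields, lai_t)
    gross = np.corrcoef(yields.ravel(), lai_t.ravel())[0, 1]  # all grids and years pooled
    return {
        "gross_R": float(gross),
        "mean_R": float(per_grid.mean()),
        "pct_pos_sig": 100.0 * float((per_grid > r_crit).mean()),
        "pct_neg_sig": 100.0 * float((per_grid < -r_crit).mean()),
    }


lai = np.array([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
ylds = np.array([[2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]])
m = yield_lai_measures(ylds, lai, 0.32)
print(m)
```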
Figure 7 shows the time evolution of the four statistical measures of cotton yield correlation with preceding LAI, as simulated by the redeveloped GOSSYM. Across the entire U.S. Cotton Belt, annual yields showed a distinct LAI correlation maximum during July to August, reaching 0.6 to 0.7 for $R(t)$ and 0.4 to 0.7 for $\bar{R}(t)$. The percentage of grids with significant positive interannual correlations increased from 25% in early July to 60% in late August. Thereafter, cotton plants reached maturity with fully developed fruiting structures. The correlations then decreased rapidly and became strongly negative in late September to October, reaching –0.4 to –0.5 for $R(t)$. The percentage of grids with significant negative interannual correlations increased from 18% in mid-September to 38% in mid-October. These modeled relationships are physically intuitive given cotton growth characteristics and, if validated by observations, have important implications. In particular, summer LAI information could be an effective predictor of final yields at a lead time of 2 to 3 mo (Zhao et al., 2007).
DISCUSSION AND CONCLUSIONS
The geographic distribution of cotton yields is critically determined by distinct regional climate characteristics. In the southwestern states, plentiful solar radiation and heavy irrigation counteract dry and hot weather to achieve high yields. Along the Lower Mississippi River, moderate rainfall plus convenient irrigation, warm temperatures, and abundant solar radiation produce average yields. In the southeastern states, the benefits of adequate rainfall and warm temperatures are offset by cloud-attenuated solar radiation and drainage-induced moisture and nutrient depletion, resulting in lower yields. Finally, across Texas and Oklahoma, large evapotranspiration and insufficient water availability for irrigation result in the lowest yields in the U.S. Cotton Belt. GOSSYM realistically captured these physical cotton–climate correspondences.
The GOSSYM-simulated water stress was generally large without irrigation. This indicates that rainfed regions experience less optimal water availability conditions for cotton growth, although higher yields can be produced when water becomes more abundant. Across the irrigated regions, water stress is relatively low when water supplies allow heavy irrigation. Lower yields may be anticipated in the future with a drier climate and less water available for irrigation. Across most irrigated and unirrigated areas, modeled carbohydrate stress ranged from medium to high. This suggests that the CO2 fertilization effect on cotton growth is not currently saturated, and higher yields can be projected in a future with enhanced CO2 concentrations. The GOSSYM model, however, simulated weak N stress, the credibility of which warrants tests against detailed observational measurements when they become available.
The GOSSYM model simulated interannual cotton yield variability in substantial agreement with observations, with significant correlations in 87% of all harvest grids. This indicates that the model can simulate regional climate impacts on cotton yields that explain >50% of the observed interannual variance in 53% of the harvest grids. Lag correlation analysis revealed that July to August maximum air temperature anomalies, and August to September soil temperature anomalies, may predict annual cotton yield differences in unirrigated lands. For both observed and GOSSYM yields, these time-specific air temperature and delayed soil temperature signals are more distinct than those of the other climate variables. A similar, but somewhat weaker, predictive signal was found for irrigated lands in both modeled and observed yields. In addition, simulated cotton yields were highly positively correlated with July to August LAI. These relationships, if confirmed by observations, have important implications for predicting annual cotton yields from summer LAI retrieved from satellite measurements.
This study demonstrates that GOSSYM simulations are very promising for the geographically distributed modeling of climate-driven cotton growth. There remain, however, important model output discrepancies with observations. In particular, the model generally overestimated cotton yield interannual variability, where standard deviations across the entire U.S. Cotton Belt were, on average, about 45% larger than observations. The percentage of areas containing significant cotton yield correlations with climate variables was also greater in GOSSYM than observations, especially during the summer peak dependence periods. These overestimations may have occurred because the model does not currently consider plant mortality caused by catastrophic weather events, such as hurricanes or heavy rainfall (especially in the southeastern states) and strong wind gusts (especially in Texas) or yield loss due to weeds, diseases, and insects. Large differences in technology and management, including irrigation, fertilizer and pesticide application, tillage practice, hybrid, and perhaps nutrient stresses such as K deficiency (Reddy and Zhao, 2005), also exist across individual farm fields and time periods, none of which are explicitly treated in the model. These missing factors are presumed to account for a large portion of the disagreement between modeled and observed variance and, hence, imperfect temporal correspondence.
In conclusion, the redeveloped GOSSYM model, with significant physics improvements and the best available soil and cultural specifications, realistically reproduced the geographic distribution of mean annual yields across the U.S. Cotton Belt and captured key interannual signals of climatic stresses observed during 1979 to 2005. This provides a baseline reference from which further model improvements and applications will be made. In particular, outcomes from the present study form a solid foundation for our next research phase. That work will examine the utility of the fully coupled CWRF–GOSSYM model to predict cotton–climate interactions and project future yield changes.