
Agronomy Journal - Agronomy, Soils & Environmental Quality

Testing Remote Sensing Approaches for Assessing Yield Variability among Maize Fields


This article in AJ

  1. Vol. 106 No. 1, p. 24-32
    OPEN ACCESS
    Received: June 26, 2013
    Published: October 18, 2013

    * Corresponding author(s): dlobell@stanford.edu

  1. Adam M. Sibley (a),
  2. Patricio Grassini (b),
  3. Nancy E. Thomas (c),
  4. Kenneth G. Cassman (b) and
  5. David B. Lobell* (a)

    (a) Dep. of Environmental Earth System Science and Center on Food Security and the Environment, Stanford Univ., Stanford, CA 94305
    (b) Dep. of Agronomy and Horticulture, Univ. of Nebraska, Lincoln, NE 68583–0915
    (c) Spatial Analysis Center, Stanford Univ., Stanford, CA 94305


For remote sensing to be useful for analyzing crop yield gaps, methods should be accurate at the field scale without need for local ground calibration. We used an extensive field-level data set of on-farm yields from 134 irrigated and 94 rainfed maize (Zea mays L.) fields in Nebraska during a 4-yr period to evaluate three methods that do not require ground-based calibration. The first method is based on summing estimates of absorbed photosynthetically active radiation from the Moderate Resolution Imaging Spectroradiometer (MODIS) sensor, the second on using either MODIS or Landsat data to calibrate a crop model (Hybrid-Maize), and the third using Hybrid-Maize simulations to train a simple regression, which is then applied to MODIS or Landsat data. For MODIS, all three methods performed similarly poorly at predicting maize yields, with an R2 between observed and predicted yields of roughly 0.10 for rainfed and 0.20 for irrigated fields. Estimates from Landsat were considerably more accurate, with up to 20% of rainfed and 50% of irrigated yield variation captured by the predictions. Across all methods and sensors, irrigated yield variations were more successfully captured than rainfed yields because of relatively smaller rainfed field sizes and the added difficulty of modeling crop water stress in rainfed fields. Agreement between observed and predicted yields was highest for the third approach, which is attractive because it leverages a crop model’s ability to synthesize knowledge on crop physiology and year-to-year differences in weather throughout the season, yet produces a simple regression that can be rapidly applied to Landsat imagery.


    APAR, total absorbed photosynthetically active radiation; CM-CS, crop model curve selection; CM-Reg, crop model based regression; fPAR, fraction of absorbed photosynthetically active radiation; LAI, leaf area index; MODIS, Moderate Resolution Imaging Spectroradiometer; NIR, near infrared; PAR, photosynthetically active radiation; RUE, radiation use efficiency; VI, vegetation index; WDRVI, wide dynamic range vegetation index

Since the advent of Earth-observing satellites several decades ago, researchers have made efforts to derive useful information about agricultural fields from satellite data. Perhaps the most relevant and sought-after quantity is yield (i.e., crop grain production per unit of land area), for which remotely sensed estimates would greatly benefit farmers as well as researchers and policymakers concerned with food production. Ideally, satellite data would be able to provide an accurate yield estimate for each individual field each year and even for individual parcels within fields, regardless of the crop being grown. Providing information about yield at this level of detail remains a challenge, however, both because of limitations imposed by the spatial, spectral, and temporal resolutions of available satellites and because of complicated relationships between remotely measurable parameters and crop productivity.

Yield estimates based on satellite data are useful for a variety of applications, each with their own requirements for achieving desired levels of accuracy and spatial detail. For example, the Famine Early Warning System utilizes satellite data to provide near-real-time assessments of crop conditions over large areas (http://fews.net). For this application, the objective is to provide indications of large areas that are well below average yields, so high accuracy and fine spatial resolution are much less important than rapid turnaround times. A different application of remote sensing, which we focused on in this study, is yield gap analysis, in which maps of actual yield estimates can be used to investigate sources of yield variability among fields across space and time, which can provide insight into causes of the yield gap (the difference between potential and actual yields) (Lobell et al., 2009). Moreover, estimation of actual crop yield through remote sensing offers an alternative to the more resource-consuming field measurements and surveys typically used to estimate crop yields at regional to national scales.

For a remote sensing approach to be most useful to yield gap analysis, two characteristics are especially important: (i) the ability to operate in new locations or years without additional ground-based calibration, and (ii) good field-level accuracy. The first is needed to limit the time and expense involved in obtaining information on yields across multiple areas and years. For example, several studies have demonstrated accurate maize yield prediction with statistical models that relate measured yield to vegetation indices (VIs) derived from remotely sensed reflectance measurements (Cicek et al., 2010; Panda et al., 2010; Shanahan et al., 2001); however, applying these approaches to other years or new areas would require recalibrating the statistical models by measuring yields on a sufficient number of fields. The second criterion, field-level accuracy, is relatively straightforward but rarely evaluated in studies that purport to test remote sensing methods. Some studies have sampled enough fields to report field-level correlations between estimated and observed yields, for instance in sugarbeet (Beta vulgaris L.) (Clevers, 1997; Launay and Guerif, 2005) and wheat (Triticum aestivum L.) (Lobell et al., 2005, 2007). More often, however, studies that estimated crop yields at the field level lacked sufficient field-level data and therefore reported only the mean yield estimates for a given region compared with the mean reported yields (Baez-Gonzalez et al., 2005; Doraiswamy et al., 2005; Fang et al., 2008) or the correlation between reported and estimated yields at an aggregate level where official statistics are available (Becker-Reshef et al., 2010; Lobell et al., 2010). Researchers commonly recognize the potential value of field-level evaluation but do not have access to sufficient data. For example, Dente et al. (2008) mapped wheat yields in Italy and compared the average and variance of estimated yields with reported values. 
They acknowledged the limited insight this gives into field-level accuracy but explained that “a deeper analysis about yield variations estimated by the model goes beyond the limit of this work because it would require a higher number of monitored fields and a larger dispersion between measured yields.”

To improve understanding of remote sensing accuracies for yield estimation at the field level, the current study focused on maize yields in Nebraska, for which a highly detailed ground-based data set was obtained for evaluation. Specifically, the goal of this study was to test three approaches to estimate actual yields with remote sensing, all of which avoid the need for ground-based calibration.

Background on Approaches

Many approaches have been developed to translate remote sensing data into estimates of yields, and several reviews of such methods exist (Gallego et al., 2010; Moulin et al., 1998). Here we focus on three approaches that do not require ground-based measurements, except for weather data that can be obtained from ground or satellite-based measures. The first approach is based on the light-use-efficiency approach (Monteith, 1977) in which biomass is proportional to the total amount of absorbed photosynthetically active radiation (APAR) throughout a growing season. Biomass is then related to yield by a constant harvest index (HI):

$$\mathrm{Yield} = \mathrm{APAR} \times \mathrm{RUE} \times \mathrm{HI} \qquad [1]$$

where RUE is the radiation use efficiency inherent to the crop and is generally higher for C4 crops like maize than C3 crops like wheat (Daughtry et al., 1992; Lindquist et al., 2005; Sinclair and Muchow, 1999). This approach, which we refer to as the APAR method, has been applied to a wide range of sensors, crops, and regions (Daughtry et al., 1992; Lobell et al., 2002; Reeves et al., 2005). Two key steps in this approach are to estimate the fraction of photosynthetically active radiation (PAR) absorbed by the canopy (fPAR) on a given day based on reflectance or VIs and to obtain enough fPAR estimates throughout the growing season to approximate the total season APAR. The APAR approach is most readily applied to sensors that have frequent observations, such as 8- or 16-d composite images from MODIS. In fact, the APAR approach underlies most standard products from MODIS related to plant growth, such as the gross primary production product (Zhao et al., 2005).

A second approach is to use crop simulation models to predict crop yields, with the remote sensing measurements used to adjust inputs or parameters for the model on a pixel-by-pixel basis. Like the APAR method, this approach has been common for at least two decades (Bouman, 1992; Maas, 1988). In practice, this approach is commonly applied by simulating crop growth and yields for multiple combinations of factors such as sowing date, relative growth rate, or soil water holding capacity, and the simulated values of leaf area index (LAI) or fPAR are then compared with remotely sensed estimates of these quantities (Clevers, 1997; Dente et al., 2008; Doraiswamy et al., 2005; Launay and Guerif, 2005). The inputs and parameters that result in the closest match between simulated and observed values throughout the season, for instance by producing the lowest RMSE, are then selected and the yield associated with that simulation is assigned to the given pixel. We refer to this second approach as crop model curve selection (CM-CS).

A third, less common approach uses crop model simulations in a different manner. As in the second approach, multiple simulations are performed with the crop model for different combinations of inputs and parameters. Rather than directly comparing the simulated data with remote sensing estimates, however, the simulated data are used to calibrate a simple regression model that relates yield to a single quantity, such as a VI or fPAR estimated on a specific date. This approach, which we refer to as crop model based regression (CM-Reg), was described by Clevers (1997) in a study of sugarbeet in the Netherlands. They found that despite its simplicity and use of fewer dates of imagery, it gave slightly better results than the curve selection approach (albeit based on an analysis of only 10 fields). A similar approach was also adopted in a study of maize in Mexico (Baez-Gonzalez et al., 2005), although no evaluations were made at the field scale. A key aspect of this approach is that the crop model is able to account for multiple processes throughout the season, such as the effect of temperature on crop development or the effect of radiation levels on grain filling rates, yet the resulting regression uses only a single estimate of VI for each field. Because the relationships between VI and final yields will depend on weather, a different regression is developed for each season. In theory, a simple extension of this approach would be to use multiple predictors in the regression, for instance if images are available on more than one date.

The accuracy of both crop model based approaches will undoubtedly depend on the crop model’s ability to simulate the main stresses experienced in a region. For example, few models accurately simulate biotic stresses, and regions where pests or weeds are common constraints would probably present difficulties. Similarly, many abiotic constraints such as salinity are not captured in most models. At the same time, if the main impact of stressors like pests or salinity is via a reduction in LAI, then this effect could be well represented by a crop model simulation with very low sowing density. In this situation, for example, the CM-CS approach would get the right answer for the wrong reason by selecting a simulation with lower sowing density than was actually the case but with a resulting yield estimate that was appropriately low.


Study Location and Field Measurement of Maize Yields

The study focused on maize fields located within the Lower Platte North Natural Resources District in Butler and Colfax counties, Nebraska (Fig. 1). Farmers in this area are requested to report their rainfed and irrigated maize yields to their local district office, together with a precise location of the fields in which the crops were grown. Average yields of maize reported by farmers were not statistically different from the average maize yields reported by the National Agricultural Statistics Service from the same counties and years (P > 0.5), with a high correlation between the two measures across all county–year–irrigation combinations (r = 0.92). The reported yield data provided by farmers also included supporting information such as grain elevator delivery records and/or yield maps of their fields derived from yield monitors mounted on their harvesters. Average (2007–2010) rainfed and irrigated maize yields were 8.2 and 12.3 Mg ha–1, respectively (associated coefficients of variation: 28.1 and 18.3%). A summary of the field sizes and reported yields in the data set is shown in Fig. 2.

Fig. 1.

The study location in eastern Nebraska. Boundaries for fields with actual yield data are shown in blue for irrigated and yellow for rainfed fields. The red arrow indicates the location of the weather station used in the study.

Fig. 2.

Histograms of field-based data for fields used to evaluate remote sensing products: (a) field size, (b) number of MODIS pixels in each field, (c) number of Landsat pixels in each field, and (d) reported yields. Values from all four study years are included, with irrigated and rainfed fields shown separately.


Beyond data availability, several features make this area attractive for testing remotely sensed approaches to maize yield estimation. First, there is a mixture of both rainfed and irrigated fields, which allows evaluation of the performance under both water regimes within the same region. Second, the fields are generally large enough to contain at least one 250- by 250-m MODIS pixel and a large number of 28.5-m Landsat pixels (Fig. 2), which helps to reduce challenges related to pixel contamination by bordering land covers. Third, the large range of yields in this area (approximately 2–18 Mg ha–1 across field-years) allowed us to test whether the methods are robust across a large range of productivity (Fig. 2).

A random sample of 35 irrigated fields from 2007 to 2010, along with all available rainfed fields, was selected from the data set of reported yields. Each site was then examined, and sites where paper records with field identification had an ambiguous match to paper maps were discarded. This resulted in a final data set of 134 irrigated and 94 rainfed field-year observations. Field boundaries from color-coded paper maps provided by Nebraska’s Natural Resource Division were used to delineate polygons in Google Earth that corresponded to the exact fields for which yields were reported. Importantly, this removed the chance of spatial mismatches between the remote sensing and field-based estimates, as often occurs when only a single reference point is given for each field.

Satellite Data

Two sources of satellite data were tested, namely MODIS and Landsat, with the products and dates summarized in Table 1. For MODIS, the 16-d VI maximum value composite (MVC) products at 250-m resolution from both the Aqua and Terra platform sensors were used. Rather than use the VIs reported in the product (Enhanced Vegetation Index or the Normalized Difference Vegetation Index), we instead used the red and near-infrared (NIR) spectral band reflectance values to compute an alternate VI, called the wide dynamic range vegetation index (WDRVI), which is more sensitive at high values common in maize canopies (Gitelson, 2004). The MVC acts as a coarse filter on contaminated observations because most sources of error in observations of vegetation greenness are negatively biased. In creating a time series of MODIS observations, we used the actual composite date rather than the center of the composite window to more accurately represent the time evolution of the VI (Guindin-Garcia et al., 2012). To further smooth out noise in the data and interpolate MODIS to the daily time scale, we applied a local polynomial (LOESS) fit to the yearly time series of each pixel (Cleveland and Devlin, 1988). For the LOESS fit, a span parameter of 0.3 was used, meaning that the fit at each point was made using a neighborhood spanning 30% of the year. Visual inspection of fitted curves verified that the LOESS did not introduce any biases, particularly for the early season with fast vegetative growth.
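The smoothing and daily interpolation step can be illustrated with a minimal sketch. It assumes a locally weighted linear fit with tricube weights, and it interprets the span as the fraction of observations in each neighborhood; the polynomial degree and software actually used are not stated in the text. The names `tricube`, `loess_point`, and `daily_series` are hypothetical, and at least three observations per series are assumed.

```python
def tricube(u):
    """Tricube kernel weight for a scaled distance u (0 at the target, 1 at the edge)."""
    u = min(abs(u), 1.0)
    return (1.0 - u ** 3) ** 3


def loess_point(x0, xs, ys, span=0.3):
    """Locally weighted linear fit evaluated at x0 (assumes len(xs) >= 3)."""
    n = len(xs)
    k = min(n, max(3, round(span * n)))  # neighborhood size (assumed: fraction of points)
    nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
    h = max(abs(xs[i] - x0) for i in nearest) or 1.0
    sw = swx = swy = swxx = swxy = 0.0
    for i in nearest:
        dx = xs[i] - x0                  # center on x0 so the intercept is the fitted value
        w = tricube(dx / h)
        sw += w
        swx += w * dx
        swy += w * ys[i]
        swxx += w * dx * dx
        swxy += w * dx * ys[i]
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:               # degenerate neighborhood: fall back to weighted mean
        return swy / sw
    return (swy * swxx - swx * swxy) / denom


def daily_series(days, values, span=0.3):
    """Interpolate composite-date observations to days 1..365 of the year."""
    return [loess_point(d, days, values, span) for d in range(1, 366)]


# Illustrative input only: one observation every 8 days on a straight line.
days = list(range(1, 366, 8))
values = [0.002 * d - 0.5 for d in days]
smoothed = daily_series(days, values)
```

Because the fit is evaluated on every day of the year, irregular composite dates need no special handling: `days` simply holds the actual acquisition day of each composite.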

Table 1.

Data sets used in this study.

Source | Product name | Spatial scale | Temporal frequency | Variables
MODIS sensors (Aqua and Terra) | Collection 5, MOD13Q1 and MYD13Q1 | 250 m | 16-d composite with 8-d offset between Aqua and Terra | red band reflectance (620–670 nm), near-infrared (NIR) band reflectance (841–876 nm)
Landsat sensors (Thematic Mapper and Enhanced Thematic Mapper) | Level 1T | 30 m | 10 June and 13 Aug. 2007; 11 May and 8 July 2008; 30 May and 2 Aug. 2009; 2 June and 21 Aug. 2010 | red band reflectance (630–690 nm), NIR band reflectance (760–900 nm)
Schuyler, NE, weather station | NE7640 | regional | daily | min. and max. temperature, precipitation
NASA POWER | Agroclimatology daily averaged data | 1/2° | daily | incident solar radiation, relative humidity, wind speed

To extract the appropriate MODIS pixels for our field locations, all of the geocoded field polygons were reprojected from WGS84 Geographic to the custom MODIS sinusoidal projection. To avoid pixels with large contributions from adjacent areas, only MODIS pixels that were at least 80% contained by a polygon were included in our analyses (varying this threshold did not appreciably affect results). Fifty-eight of our fields did not contain a pixel that met this criterion; for these fields, the single pixel with the largest percentage falling in the field was taken.
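The pixel selection rule, including the fallback for small fields, can be sketched as follows. The function name `select_pixels` and the input layout (one overlap fraction per candidate pixel) are hypothetical; computing the overlap fractions themselves would require a GIS library and is outside this sketch.

```python
def select_pixels(overlap_fractions, threshold=0.8):
    """Return the indices of the MODIS pixels to use for one field.

    overlap_fractions[i] is the fraction of pixel i's area that lies
    inside the field polygon (assumed to be precomputed).
    """
    kept = [i for i, frac in enumerate(overlap_fractions) if frac >= threshold]
    if kept:
        return kept
    # No pixel meets the threshold: fall back to the single best-covered pixel.
    best = max(range(len(overlap_fractions)), key=lambda i: overlap_fractions[i])
    return [best]


large_field = select_pixels([0.95, 0.82, 0.40])   # two pixels pass the 80% rule
small_field = select_pixels([0.60, 0.30])         # fallback to the largest overlap
```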

Landsat scenes that were cloud free or nearly cloud free and imaged during the summer growing season were identified and downloaded from the USGS EROS archive (Table 1). All images were converted to surface reflectance using NASA’s Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) preprocessing code (Masek et al., 2006), which performs up-to-date Landsat calibration, converts to top-of-atmosphere reflectance, and applies atmospheric correction following the radiative transfer methodology described by Vermote et al. (1997). Mean surface reflectance values for the red and NIR bands were extracted as zonal statistics from the digitized field boundaries and then converted to WDRVI for analysis. Two of the image dates were from the Landsat 7 Enhanced Thematic Mapper Plus sensor and thus exhibited missing data from the well-documented Scan Line Corrector failure. The zonal mean values for these years were calculated by ignoring the missing data in affected fields so that mean values represented only good pixel values.
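A minimal sketch of a zonal mean that skips missing pixels is given below. It assumes that the pixel values inside a field have already been extracted into a list and that Scan Line Corrector gaps are encoded as `None` or NaN; both the encoding and the name `zonal_mean` are hypothetical.

```python
import math


def zonal_mean(pixel_values):
    """Mean reflectance over a field, ignoring missing pixels.

    Missing pixels are assumed to be encoded as None or NaN.
    Returns None when the field has no valid pixel at all.
    """
    good = [v for v in pixel_values if v is not None and not math.isnan(v)]
    if not good:
        return None
    return sum(good) / len(good)


# Illustrative red-band reflectances for one field with a gap in the middle.
red_mean = zonal_mean([0.04, float("nan"), None, 0.06])
```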

Hybrid-Maize Simulations

For the methods of yield prediction that require simulated data from a crop model, we used the Hybrid-Maize model developed at the University of Nebraska, Lincoln (Grassini et al., 2009; Yang et al., 2004, 2006). This model uses ecophysiological principles to predict maize development and growth on a daily time step under a specified set of environmental and management conditions and has been extensively tested and validated in this region. Temperature and precipitation data used to drive the model were obtained from a weather station in nearby Schuyler, NE (available at http://mesonet.agron.iastate.edu/climodat/), and solar radiation, relative humidity, and wind speed were obtained from the NASA POWER agroclimatology database (Stackhouse, 2006) (Table 1).

To populate our database of simulation runs, we systematically varied the sowing date, seeding density, and maturity rating of the hybrids (measured as thermal accumulation needed for the crop to reach physiological maturity) to cover the full range of planting variability in this region, as described in Grassini et al. (2011). Lower values of seeding density were also added to mimic other potential yield-limiting factors not included in Hybrid-Maize, such as waterlogging, disease, or other factors that reduce plant populations. Five sowing dates evenly spaced between 14 April and 14 May, seven seeding density levels evenly spaced between 6 and 9 seeds m–2, and four relative maturity ratings evenly spaced between 106 and 118 d were used. These combinations were run for each of the 4 yr, resulting in a total of 560 rainfed and 560 irrigated simulations of maize growth and yield. As an indication of the appropriateness of the model and input parameters for this region, the overall distribution of simulated yields agreed relatively well with the observed distributions (Fig. 3), although the simulated rainfed yields had a slightly smaller range than observed (8.2 vs. 11.9 Mg ha–1) and simulated irrigated yields had a higher average than observed (14.1 vs. 12.3 Mg ha–1).
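The size of the simulation database follows directly from the factorial design, which can be enumerated in a short sketch. Sowing dates are represented here as day offsets from 14 April (30 d corresponds to 14 May); how fractional days were handled is not stated, so the offsets are left unrounded. All names are hypothetical, and the sketch only builds the input grid; it does not run Hybrid-Maize.

```python
from itertools import product


def evenly_spaced(start, stop, count):
    """Return `count` values evenly spaced from start to stop, inclusive."""
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]


SOWING_OFFSETS = evenly_spaced(0, 30, 5)     # days after 14 April
DENSITIES = evenly_spaced(6, 9, 7)           # seeds per square meter
MATURITIES = evenly_spaced(106, 118, 4)      # relative maturity, days
YEARS = [2007, 2008, 2009, 2010]
WATER_REGIMES = ["rainfed", "irrigated"]

runs = [
    {"year": y, "water": w, "sowing_offset": s, "density": d, "maturity": m}
    for y, w, s, d, m in product(YEARS, WATER_REGIMES, SOWING_OFFSETS,
                                 DENSITIES, MATURITIES)
]

per_year = len(SOWING_OFFSETS) * len(DENSITIES) * len(MATURITIES)   # 5 x 7 x 4
rainfed_runs = sum(1 for r in runs if r["water"] == "rainfed")
```

The 140 combinations per year and water regime match the number of runs per year cited in the caption of Fig. 4.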

Fig. 3.

Comparison of histograms of reported (red) and Hybrid-Maize simulated (blue) (a) rainfed and (b) irrigated yields. Values from all site-years observations are included. Purple areas indicate overlap between the two histograms.


Yield Estimation

Three methods of yield estimation were tested in this study: (i) the APAR approach, (ii) CM-CS, and (iii) CM-Reg. For each approach, the red and NIR bands from either the MODIS 16-d composite products or from Landsat were used to calculate the WDRVI (Gitelson, 2004):

$$\mathrm{WDRVI} = \frac{\alpha \rho_{\mathrm{NIR}} - \rho_{\mathrm{red}}}{\alpha \rho_{\mathrm{NIR}} + \rho_{\mathrm{red}}} \qquad [2]$$

where ρ refers to reflectance and α is a parameter fixed at 0.2 based on Gitelson (2004). As the name suggests, the advantage of this vegetation index is that it describes a larger range of vegetation cover before becoming saturated (Gitelson, 2004). Daily values of WDRVI were obtained for the MODIS data via the LOESS fitting method mentioned above.
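As a brief illustration, the WDRVI can be computed from red and NIR reflectance as below, using the index as defined by Gitelson (2004) with α = 0.2. The NDVI is included only for comparison, and the reflectance values are illustrative rather than taken from the study.

```python
def wdrvi(nir, red, alpha=0.2):
    """Wide dynamic range vegetation index (Gitelson, 2004)."""
    return (alpha * nir - red) / (alpha * nir + red)


def ndvi(nir, red):
    """Normalized difference vegetation index, shown for comparison only."""
    return (nir - red) / (nir + red)


# Illustrative reflectances (not from the study): sparse vs. dense canopy.
sparse = {"nir": 0.25, "red": 0.20}
dense = {"nir": 0.50, "red": 0.03}

wdrvi_sparse = wdrvi(**sparse)
wdrvi_dense = wdrvi(**dense)
ndvi_dense = ndvi(**dense)
```

Down-weighting the NIR term keeps the index further from its upper limit over dense canopies, where the NDVI is already close to 1.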

The Absorbed Photosynthetically Active Radiation Approach

From the daily MODIS WDRVI a record of fPAR was computed for the APAR approach. This was done using the simple linear relationship indicated by Viña and Gitelson (2005):

$$\mathrm{fPAR} = a \times \mathrm{WDRVI} + b \qquad [3]$$

where a = 0.727 and b = 0.48. With this record of fPAR, APAR was then computed for each day by multiplying fPAR by the daily values of incident PAR, calculated by multiplying daily radiation from the NASA POWER data set by the fraction of incoming radiation as PAR (0.48) (Table 1). Daily APAR was then summed for a window spanning 1 May to 1 October, resulting in a single value of APAR per pixel. Many other windows besides 1 May to 1 October were also tested, including the entire calendar year, the entire growing season, the first or last half of the growing season, and the peak growing months (July and August). For fields with multiple MODIS pixels, the seasonal APAR values were then averaged with weights equal to the fraction of the pixel falling within the field.
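The chain from daily WDRVI to a field-level seasonal APAR can be sketched as follows. The sketch assumes that the linear relationship has slope a = 0.727 and intercept b = 0.48, and that daily radiation is supplied in MJ m–2 d–1 for the days inside the summing window. Function names and the example inputs are hypothetical.

```python
def fpar_from_wdrvi(wdrvi, a=0.727, b=0.48):
    """Linear WDRVI-to-fPAR conversion (slope a, intercept b assumed)."""
    return a * wdrvi + b


def seasonal_apar(daily_wdrvi, daily_radiation, par_fraction=0.48):
    """Sum daily APAR over the window covered by the two input lists.

    daily_radiation is total incident solar radiation (MJ m-2 d-1);
    par_fraction converts it to incident PAR.
    """
    return sum(
        fpar_from_wdrvi(w) * par_fraction * rad
        for w, rad in zip(daily_wdrvi, daily_radiation)
    )


def field_apar(pixel_apar, pixel_weights):
    """Weighted mean of per-pixel APAR; weights are in-field pixel fractions."""
    return sum(a * w for a, w in zip(pixel_apar, pixel_weights)) / sum(pixel_weights)


# Illustrative two-day window and a field covered by two pixels.
two_day_apar = seasonal_apar([0.0, 0.0], [20.0, 10.0])
field_value = field_apar([100.0, 200.0], [1.0, 0.5])
```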

Total APAR for the field was then converted to yield based on Eq. [1]. Values of RUE and HI were derived from our database of Hybrid-Maize simulations, with a separate set of constants calculated for rainfed and irrigated fields. The HI was calculated by taking the average ratio of simulated yield (at a standard 15.5% moisture) to aboveground biomass, which was 0.54 and 0.50 for irrigated and rainfed fields, respectively. To calculate RUE, the LAI from Hybrid-Maize was first converted to WDRVI (Viña et al., 2011):

$$\mathrm{WDRVI} = y_0 + a\left(1 - e^{-b \times \mathrm{LAI}}\right) \qquad [4]$$

where a = 1.4392, b = 0.3418 and y0 = –0.6684. The WDRVI was then converted to APAR using the same method used for the satellite data (Eq. [3]). The average ratio between aboveground biomass and total APAR was then used as RUE in Eq. [1], equal to 3.24 g MJ–1 and 2.60 g MJ–1 for irrigated and rainfed fields, respectively. Allowing RUE and HI values to vary by year did not significantly improve the results.
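A worked example of the conversion from seasonal APAR to yield is sketched below. Two assumptions are made: the LAI-to-WDRVI relationship is taken to be a saturating exponential with the three reported coefficients, and seasonal APAR is expressed in MJ m–2 so that the product with RUE (g MJ–1) is in g m–2. The seasonal APAR of 800 MJ m–2 is an illustrative value, not a result from the study.

```python
import math


def wdrvi_from_lai(lai, a=1.4392, b=0.3418, y0=-0.6684):
    """LAI-to-WDRVI conversion (saturating exponential form assumed)."""
    return y0 + a * (1.0 - math.exp(-b * lai))


def yield_mg_per_ha(apar_mj_m2, rue_g_per_mj, harvest_index):
    """Yield = APAR x RUE x HI, converted from g m-2 to Mg ha-1."""
    grams_per_m2 = apar_mj_m2 * rue_g_per_mj * harvest_index
    return grams_per_m2 * 0.01   # 1 g m-2 equals 0.01 Mg ha-1


# Constants reported in the text for each water regime.
IRRIGATED = {"rue_g_per_mj": 3.24, "harvest_index": 0.54}
RAINFED = {"rue_g_per_mj": 2.60, "harvest_index": 0.50}

irrigated_yield = yield_mg_per_ha(800.0, **IRRIGATED)
rainfed_yield = yield_mg_per_ha(800.0, **RAINFED)
```

With these constants, the same seasonal APAR gives a lower estimate for a rainfed field than for an irrigated one, because both RUE and HI are lower.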

Crop Model Curve Selection

The CM-CS approach used daily records of LAI from the Hybrid-Maize simulations. The daily LAI was converted to WDRVI via Eq. [4]. The satellite-based WDRVIs for each field were then compared with the database of simulated VIs that corresponded to the year and irrigation status of the field site. For MODIS, this was done on a pixel-by-pixel basis, and for Landsat, the field-average WDRVI was used.

For each Hybrid-Maize WDRVI curve, the RMSE between observed and simulated WDRVI was computed. For Landsat, this meant using the simulated WDRVI on the dates for which imagery was available (Table 1). For MODIS, daily values were drawn from a specified window of time. The same windows from the APAR approach were tested, and the presented results used the 1 May to 1 October window. In all cases, the curve with the lowest RMSE was selected and the corresponding yield was assigned to the field (Landsat) or pixel (MODIS). For MODIS, a weighted average was calculated for the field using the fraction of each pixel overlapping the field as weights.
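The curve selection step can be sketched as a minimum-RMSE search over the simulation database. The data layout (a dictionary of WDRVI values keyed by day of year, plus a simulated yield per run) and the names `rmse` and `select_curve` are hypothetical, and the numbers are illustrative.

```python
import math


def rmse(observed, simulated):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def select_curve(observed_by_day, simulations):
    """Return the simulated yield of the run whose WDRVI curve fits best.

    observed_by_day maps day of year to observed WDRVI; each simulation
    holds a 'wdrvi' mapping for the same days and a 'yield' value.
    """
    days = sorted(observed_by_day)
    obs = [observed_by_day[d] for d in days]
    best = min(
        simulations,
        key=lambda run: rmse(obs, [run["wdrvi"][d] for d in days]),
    )
    return best["yield"]


# Illustrative field with two image dates and three candidate simulations.
observed = {160: -0.10, 220: 0.60}
simulations = [
    {"yield": 9.0, "wdrvi": {160: -0.40, 220: 0.30}},
    {"yield": 12.5, "wdrvi": {160: -0.10, 220: 0.62}},
    {"yield": 14.0, "wdrvi": {160: 0.20, 220: 0.75}},
]
selected_yield = select_curve(observed, simulations)
```

The same function covers both sensors: for Landsat the observed dictionary holds two to four image dates, whereas for MODIS it would hold every day in the chosen window.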

Crop Model Based Regression

Like CM-CS, the crop model regression approach used LAI converted to WDRVI from the Hybrid-Maize simulations. Within the simulation database, a simple linear regression was done to relate the WDRVI in the simulations of a given year to the corresponding yields. A separate regression was run for each day, with the day’s simulated WDRVI values as predictors and the resulting yield as the response. The coefficients and R2 value of each regression were stored, and the day among the available images with the highest R2 was selected for use with the satellite VIs. In general, the highest R2 was >0.80 for both rainfed and irrigated simulations, with the timing of the peak value depending on the year (Fig. 4). These regressions were performed for rainfed and irrigated fields separately and on a yearly basis because each year’s weather conditions will dictate, to some degree, which date of VI is the most closely related to the final yield.
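The day-by-day regression and the choice of the best available date can be sketched with ordinary least squares. The simulated values below are illustrative, and the names `fit_line` and `best_day_regression` are hypothetical; in practice each day's regression would use all simulations for the given year and water regime.

```python
def fit_line(x, y):
    """Simple linear regression of y on x; returns slope, intercept, R2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


def best_day_regression(sim_wdrvi_by_day, sim_yields, candidate_days):
    """Fit one regression per candidate day and keep the highest-R2 day."""
    fits = {d: fit_line(sim_wdrvi_by_day[d], sim_yields) for d in candidate_days}
    best_day = max(fits, key=lambda d: fits[d][2])
    return best_day, fits[best_day]


# Illustrative simulated WDRVI on two image dates for four model runs.
sim_yields = [8.0, 10.0, 12.0, 14.0]
sim_wdrvi_by_day = {
    150: [0.10, 0.30, 0.20, 0.25],
    200: [0.20, 0.30, 0.40, 0.50],
}
day, (slope, intercept, r2) = best_day_regression(sim_wdrvi_by_day, sim_yields, [150, 200])

# Apply the selected regression to a satellite-observed field WDRVI.
predicted_yield = intercept + slope * 0.35
```

Restricting `candidate_days` to the dates of available images corresponds to the Landsat case, whereas passing every day of the year corresponds to the MODIS case.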

Fig. 4.

The coefficient of determination (R2) for a simple regression between simulated maize yields and simulated vegetation index (VI) on each day of the year for (a) rainfed and (b) irrigated fields. Values were computed using 140 different runs of Hybrid-Maize for each year, each with different combinations of sowing date, seeding density, and cultivar maturity rating. High values of R2 indicate that the VI on the given date is a good predictor of final yield. Hatched values on the x axis indicate the dates of available Landsat images each year that are nearest to the optimal date for yield prediction.


For CM-Reg with MODIS, we used the VI and regression coefficients for the day corresponding to the highest R2 in Fig. 4. For Landsat, the best date among the available images was used (the day for the selected Landsat regression is indicated in colored hatches in Fig. 4). As with CM-CS, a field-average WDRVI was used for Landsat, whereas for MODIS a weighted average of each pixel’s predicted yield was taken.


Prediction of the yield using the APAR method shows a significant correlation between observed and predicted yield when considering all fields together (R2 = 0.52, Fig. 5); however, regressions within irrigated and rainfed points show a weaker, albeit statistically significant (P < 0.01), relationship, with a particularly low R2 of 0.09 for rainfed fields. Alternate windows for summing APAR gave similar accuracies for yield estimation and are therefore not reported for brevity.

Fig. 5.

Comparison of reported yields with values predicted from the absorbed photosynthetically active radiation (APAR) method using MODIS data. Best-fit regression lines and associated R2 are indicated, along with the 1:1 line in gray.


Neither the curve selection (CM-CS) nor model-based regression (CM-Reg) showed significant improvement over APAR for irrigated and rainfed fields using MODIS (Fig. 6a and 6c). In fact, the results for rainfed fields were significantly worse, with MODIS CM-CS and CM-Reg exhibiting a nonsignificant (P > 0.50, R2 = 0.01) correlation with the reported yields; however, CM-CS applied using two to four dates of Landsat imagery worked significantly better for irrigated fields, with an R2 of 0.32 compared with 0.25 for MODIS APAR (Fig. 6b). The CM-Reg applied to Landsat was the most effective method within both irrigated and rainfed fields and overall, with R2 of 0.50, 0.20, and 0.63, respectively (Fig. 6d and 7). The skill for rainfed fields remained quite low but statistically significant, with RMSE of 2.1 Mg ha–1. For irrigated fields, roughly half of the yield variance was explained with a RMSE of 2.2 Mg ha–1 (or roughly 18% of mean yields).

Fig. 6.

Comparison of reported yields with values predicted from two different crop model based methods for two different sensors (MODIS and Landsat). Best-fit regression lines and associated R2 are indicated, along with 1:1 line in gray.

Fig. 7.

Summary of (a) R2 and (b) RMSE for yield prediction across the different methods (absorbed photosynthetically active radiation [APAR], crop model curve selection [CM-CS], and crop model based regression [CM-Reg]) and sensors (MODIS and Landsat) for rainfed and irrigated fields as well as the combined data set.


Overall, the predictive ability in rainfed fields was lower with all methods, and in particular with MODIS (Fig. 7). The relative difficulty in predicting rainfed yields can probably be attributed to three sources. First, the relationship between environmental drivers, maize growth, and yield is inherently more difficult to model in rainfed systems because many additional processes such as soil hydrology and plant water stress dynamics become important (Grassini et al., 2009). Thus, rainfed simulations will be less accurate than irrigated simulations in reproducing the true relationship between LAI or VI and final yield. Both Landsat- and MODIS-based approaches would be affected by this discrepancy. Second, the Hybrid-Maize model is not sensitive to the effect of early-season water deficits on leaf expansion rates so that under severe water limitation, yields tend to be overestimated. During the period of this study, however, there were no major droughts.

The third reason rainfed yields are more difficult to predict is related to field size, although this applies only to the MODIS-based estimates. Figure 2a shows that rainfed fields tend to be smaller than the irrigated fields in this region. The MODIS values for small fields are more likely to contain a large contribution from adjacent areas in other fields. In particular, the point-spread function of the MODIS sensor means that, even when measured at nadir (i.e., from directly above), only 75% of a measurement originates from the ground area in the pixel to which it is assigned (Tan et al., 2006). At higher view angles, the signal is derived from a larger ground area, and a correspondingly smaller fraction derives from the area within the pixel. On average, only 30% of a MODIS observation comes from the grid cell to which it is assigned (Tan et al., 2006). For rainfed fields, which are typically only one to two MODIS pixels large (Fig. 2), the remaining 70% of the signal probably originates from other fields. Even in irrigated fields, the adjacency effect may be important, although more of the signal is likely to come from ground area within the same field.

These adjacency issues are exacerbated by the maximum value composite scheme because the compositing will favor any observation in the composite window that contains green vegetation (Tan et al., 2006). The MODIS compositing algorithm accounts for this to a limited degree by taking the observation with the smaller view angle among the two highest VIs during the composite period (Huete et al., 2002). Early in the season when a field has little vegetation, however, it is especially likely that high VI values will come from observations with large view angles, which include signal from vegetation in other fields. This argument is supported by Fig. 8, which plots field-averaged MODIS WDRVI observations against the corresponding field-averaged Landsat observations. Values lower than zero in the Landsat data correspond to early-season acquisitions, where MODIS observations are heavily biased towards higher values.
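The selection rule described above (take the two highest VIs in the composite window, then keep the one with the smaller view angle) can be sketched as follows. This is a simplified illustration, not the operational MODIS algorithm, which also applies quality screening that is not modeled here; the `Observation` type, the function name, and the example values are hypothetical.

```python
from typing import NamedTuple, Sequence


class Observation(NamedTuple):
    vi: float              # vegetation index value
    view_angle_deg: float  # sensor view zenith angle (0 = nadir)


def constrained_max_value_composite(observations: Sequence[Observation]) -> Observation:
    """Pick the smaller-view-angle observation among the two highest VIs."""
    if not observations:
        raise ValueError("composite window contains no observations")
    top_two = sorted(observations, key=lambda o: o.vi, reverse=True)[:2]
    return min(top_two, key=lambda o: abs(o.view_angle_deg))


# Hypothetical early-season window: near-nadir looks see mostly bare soil,
# while oblique looks pick up vegetation from neighboring fields.
window = [
    Observation(vi=0.10, view_angle_deg=5.0),
    Observation(vi=0.35, view_angle_deg=50.0),
    Observation(vi=0.30, view_angle_deg=40.0),
    Observation(vi=0.12, view_angle_deg=2.0),
]

selected = constrained_max_value_composite(window)
print(selected)
```

In this assumed window both of the two highest VIs come from large view angles, so the view-angle constraint cannot recover a near-nadir observation, consistent with the early-season bias described above.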

Fig. 8.

A comparison of field-averaged wide dynamic range vegetation index (WDRVI) values measured from Landsat and MODIS for all fields on all available Landsat image dates. Deviations from the 1:1 line (in black) indicate probable errors in MODIS measurements arising from contamination from adjacent areas, which can be exacerbated by the compositing approach. Note that the RMSE between MODIS and Landsat is larger for smaller fields.


Also evident in Fig. 8 is that even during the peak season when high WDRVI values are observed, there is only a weak relationship between the MODIS and Landsat observations. Thus, even during the peak of the season, signal from adjacent fields can complicate interpretation of the MODIS signal. These discrepancies are generally larger as fields get smaller, with an RMSE between Landsat and MODIS of 0.19 on the smallest half of fields and 0.15 on the larger half.
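A comparison of this kind can be sketched by sorting fields by area, splitting them at the median, and computing the RMSE between the two sensors in each half. The record layout, the function names, the median-split rule (with the middle field assigned to the larger half when the count is odd), and all data values are assumptions for illustration; they are not the study's data.

```python
import math
from typing import Sequence, Tuple

# (field_area_ha, landsat_wdrvi, modis_wdrvi); every value below is hypothetical.
Record = Tuple[float, float, float]


def rmse(pairs: Sequence[Tuple[float, float]]) -> float:
    """Root mean square error between paired values."""
    if not pairs:
        raise ValueError("need at least one pair")
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))


def rmse_by_size_half(records: Sequence[Record]) -> Tuple[float, float]:
    """Return (RMSE for the smaller half, RMSE for the larger half) of fields."""
    if len(records) < 2:
        raise ValueError("need at least two records to split")
    ordered = sorted(records, key=lambda r: r[0])  # sort by field area
    mid = len(ordered) // 2
    smaller = [(landsat, modis) for _, landsat, modis in ordered[:mid]]
    larger = [(landsat, modis) for _, landsat, modis in ordered[mid:]]
    return rmse(smaller), rmse(larger)


records = [
    (20.0, -0.5, -0.2),
    (25.0, 0.4, 0.2),
    (60.0, -0.4, -0.3),
    (65.0, 0.5, 0.4),
]

rmse_small, rmse_large = rmse_by_size_half(records)
print(f"smaller half: {rmse_small:.3f}, larger half: {rmse_large:.3f}")
```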

The large contribution of adjacent fields to the MODIS signal probably explains its poor performance relative to Landsat, despite the fact that MODIS provides much greater temporal frequency. Even when comparing the two methods applied to Landsat (CM-CS and CM-Reg), the method that used only a single date (CM-Reg) performed better than the method using two to four dates per season (CM-CS). This is similar to the results of Clevers (1997), who found that predictions based on a single date around the peak of the season outperformed a curve selection approach that used 10 observations throughout the season. Thus, while temporal frequency can be useful for many applications, such as for classifying crop types, its advantages for accurate yield estimation are not clearly evident.

One likely reason for the improved accuracy of CM-Reg relative to CM-CS is that the former can extrapolate to yields beyond those simulated by the crop model. That is, if a field exhibits a very low VI, CM-Reg can predict a low yield, but CM-CS will be limited to the range of yields simulated by the crop model. The lowest yield simulated by Hybrid-Maize for irrigated fields in this study was 9.4 Mg ha–1, yet the lowest reported yield was only 4.9 Mg ha–1 (probably the result of hail damage during the summer) (Fig. 3). In fact, the CM-Reg approach underpredicted yield on the lowest-yielding field, with an estimate of just 1.5 Mg ha–1, but this was still considerably closer to the reported yield than the CM-CS estimate.
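The difference in extrapolation behavior can be illustrated with a deliberately simplified, single-date sketch. Here a nearest-match lookup stands in for curve selection (the actual CM-CS procedure matches observations against simulated seasonal curves), and an ordinary least-squares line stands in for CM-Reg. The function names and all simulated values except the 9.4 Mg ha–1 minimum are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def nearest_simulation_yield(observed_vi: float,
                             sim_vi: Sequence[float],
                             sim_yield: Sequence[float]) -> float:
    """Stand-in for curve selection: return the yield of the closest simulation."""
    idx = min(range(len(sim_vi)), key=lambda i: abs(sim_vi[i] - observed_vi))
    return sim_yield[idx]


# Hypothetical crop-model output: VI on one date and simulated yield (Mg/ha).
sim_vi = [0.40, 0.50, 0.60, 0.70]
sim_yield = [9.4, 11.0, 12.6, 14.2]

observed_vi = 0.10  # hypothetical damaged field, far below any simulated VI

slope, intercept = fit_line(sim_vi, sim_yield)
regression_estimate = intercept + slope * observed_vi
lookup_estimate = nearest_simulation_yield(observed_vi, sim_vi, sim_yield)

print(f"regression: {regression_estimate:.1f}, lookup: {lookup_estimate:.1f}")
```

The lookup can never return less than the lowest simulated yield, whereas the fitted line continues below it; whether such extrapolation is accurate for any particular field is a separate question.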

The CM-Reg approach has the added advantage of being more computationally efficient than the curve selection approach, which must be done for every pixel individually and is orders of magnitude slower. Once the regression between VI and yield has been calibrated from the crop model simulations, the regression equation can be applied to an entire image in a single operation. In practical terms, the regressions can be developed for all possible image dates (as in Fig. 5) to guide the selection of image dates. The CM-Reg approach could also be readily extended to incorporate multiple dates of imagery, to use other VIs, or to use forecast weather conditions to provide in-season yield forecasts, although testing these aspects was beyond the scope of this study. It is worth repeating the main distinction between CM-Reg and traditional empirical approaches: the calibration relies on simulations rather than field observations, and the crop model allows incorporation of information on weather conditions and how they affect crop yields.
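One way the simulations might guide date selection is to compute, for each candidate date, the R2 between simulated VI and simulated yield and then prefer the date with the highest value. The sketch below assumes a simple linear relationship on each date; the function names, date labels, and all values are hypothetical.

```python
from typing import Dict, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation, i.e., R2 of a simple linear regression."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so nothing can be explained
    return (sxy * sxy) / (sxx * syy)


def best_image_date(sim_vi_by_date: Dict[str, Sequence[float]],
                    sim_yield: Sequence[float]) -> str:
    """Return the candidate date whose simulated VI best explains simulated yield."""
    return max(sim_vi_by_date, key=lambda d: r_squared(sim_vi_by_date[d], sim_yield))


# Hypothetical simulated yields (Mg/ha) and simulated VI on three candidate dates.
sim_yield = [9.5, 11.0, 12.5, 14.0]
sim_vi_by_date = {
    "day 160": [0.10, 0.30, 0.15, 0.25],
    "day 200": [0.40, 0.50, 0.62, 0.70],
    "day 240": [0.20, 0.10, 0.30, 0.15],
}

chosen = best_image_date(sim_vi_by_date, sim_yield)
print(chosen)
```

Because this ranking uses only crop-model output, it can be computed before any imagery is acquired and then compared with the dates on which cloud-free images are actually available.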

While Landsat-based approaches demonstrated stronger predictive skill than MODIS in this study, efforts to improve MODIS-based approaches are still worthwhile in our view, especially given the difficulty of obtaining cloud-free fine-resolution data in many agricultural areas. One possible direction for future work would be to attempt to dissect the contribution of adjacent pixels to a single pixel, as done for example by Huang et al. (2002). A simpler approach would be to utilize only MODIS observations taken at close-to-nadir view angles. Although this may greatly reduce the temporal frequency of images, the results of this study suggest that high temporal frequency is not a requirement for skillful yield estimation, and indeed only a single date can work well within the CM-Reg approach. Also, as shown in Fig. 4, the selected Landsat dates (tick marks on the x axis) in this study were not always located near the peak of the R2 curves owing to the limited number of available images per year. The additional high-quality observations per year obtained from MODIS would increase the probability that the optimal date predicted from the crop model could be precisely matched.


This study leveraged an extensive data set of field-based yield estimates, combined with precise knowledge of field boundaries, to rigorously test remote sensing estimates of maize yields in both irrigated and rainfed fields. For our study region of eastern Nebraska, estimates based on 250-m VI composites from MODIS generally captured yield variation poorly. Landsat-based estimates performed significantly better, particularly within irrigated fields and when using the CM-Reg approach. Given that the Landsat-based CM-Reg estimates had the lowest overall RMSE, highest R2, and required the least amount of imagery or processing time, we conclude that this is the most promising of the tested approaches and should be further tested in other regions.

Some factors could degrade the performance of CM-Reg in other regions. For instance, the Hybrid-Maize model has been extensively tested in eastern Nebraska, and the region has relatively little variation in soil reflectance that could negatively affect WDRVI performance. At the same time, some factors could be less problematic in other regions, such as the relatively poor performance in rainfed fields. The study region is at the western edge of rainfed maize systems within the United States and, as such, experiences more water stress than is typical of rainfed fields throughout most of the Corn Belt. The challenges of modeling water stress in rainfed systems, associated with imperfect knowledge of the spatial heterogeneity in soil properties and rainfall as well as model errors in representing the yield impacts of water stress, would probably be less severe in other rainfed corn systems in the United States. Improving the sensitivity of models like Hybrid-Maize to water stress at key developmental stages would also probably improve the performance of the CM-Reg approach.

In summary, this study has demonstrated that the combination of a well-tested crop model, easily acquired weather data, and Landsat images is able to successfully capture much (54–63%) of the variation in irrigated maize yields in Nebraska. This result is significant given the tendency for many VIs to saturate well below the LAI values seen in these maize fields (Viña et al., 2011). Testing of these methods in other regions and for other crops is needed. For studies to be most useful to yield gap analysis, it is important that they evaluate methods not just on how they reproduce average regional yields but also on their ability to capture yield differences between fields.


This work was supported by NASA New Investigator Grant no. NNX08AV25G to D.B. Lobell. We thank three reviewers for helpful comments.



