About Us | Help Videos | Contact Us | Subscriptions

Agronomy Journal - Biometry, Modeling & Statistics

Geostatistical Models in Agricultural Field Experiments: Investigations Based on Uniformity Trials


This article in AJ

  1. Vol. 104 No. 1, p. 91-105
    unlockOPEN ACCESS
    Received: Mar 30, 2011

    * Corresponding author(s): ch.richter@agrar.hu-berlin.de
Request Permissions

  1. Christel Richter *a and
  2. Bärbel Kroschewskia
  1. a Dep. of Crop and Animal Sciences, Division Biometry and Experimental Design, Humboldt-Universität Berlin, 10115 Berlin, Germany


The probability of detecting treatment differences can often be increased by using geostatistical instead of classical statistical models. Geostatistical approaches require the selection of the best fitted model from a set of alternative models. This additional analysis effort could be reduced if the same model shows consistently the best fit for a given field or crop. To prove whether this reduction can be expected for designed on-station trials, we analyzed five uniformity trials conducted on the same field. We studied whether different layouts of randomized complete block designs, the positions on the field, and the randomized plans influenced the model decision and analyzed the precision achieved. For this, the designs were shifted across the field, and 1000 randomized plans were projected onto each position. The model fit was evaluated using the corrected Akaike information criterion (AICC). The ranked AICC values were used for assessing model preference. In the means of all crops, designs, and models, the variation of the ranks depended on an individual decision for the combination of position and randomized plan by 62.6%. Therefore, the best fitted model was not predictable for a single experiment. As in the classical approach, the proper layout of a trial determines precision and unbiasedness of treatment differences. Randomization and blocking still should be the basic principles of experiment planning; however, their roles have partially changed. The detected bias of the Type I errors, both of the t-test and F test, needs further investigation. Basic findings are also valid for on-farm trials.


    A, anisotropic power model; AICC, corrected Akaike information criterion; B, basic model; G and g, Gaussian model; P and p, power model; r, random-walk model; S and s, spherical model; uppercase letters symbolize a correlation over the whole design and small letters a correlation only within blocks; SED, standard error of differences

During the last few years, a considerable amount of research on agricultural field experiments has focused on the possibility of taking spatial correlations into account. It is generally accepted that more or less strong correlations exist between the yield data of neighboring plots. Wiebe (1935) already reported on the existence of covariances that decrease with increasing distance between plots. The consideration of these covariances raised the possibility of estimating treatment differences more precisely than with classical statistical approaches. This, in turn, may increase the probability of detecting differences. This fact has motivated scientific and practical interest in geostatistical models that use the information included in correlations. This study investigated some issues in the implementation of geostatistical methods in experimental practice, concentrating on designed on-station trials where several treatments are compared. Despite some specifics, basic questions likewise arise in on-farm trials (Piepho et al., 2011).

Since Fisher (1935), randomization in conjunction with replication of treatments and blocking of experimental units have been the basic principles of planning and analyzing field experiments. In the case of randomized complete block designs, the classical analysis model assumes fixed treatment and block effects and error terms that are normally distributed with an expectation value equal to zero and a constant variance. The covariances of the error terms are assumed to be equal to zero both within blocks and between blocks. We call this model the basic model (B). The ANOVA with the classical F test and subsequent multiple treatment comparisons are based on this model. Due to the existing correlations, analyses using the basic model and all consequential statistical inferences are justified solely in the mean of all randomizations (Brownie et al., 1993). Randomization serves as the classical approach to overcome (Yates, 1938) or neutralize (Zimmerman and Harville, 1991) but not to eliminate (Grondona and Cressie, 1991) spatial correlations.

Starting with Papadakis (1937) and Bartlett (1938), nearest neighbor methods were established to take spatial correlation between adjacent plots into account. Although revived and modified in the 1960s and after, these methods have only partially been accepted for a variety of reasons. Zimmerman and Harville (1991) and Grondona and Cressie (1991) have investigated the idea of integrating geostatistical methods into the analysis of field experiments. Whereas nearest neighbor methods indirectly take spatial dependencies between neighboring plots into account, geostatistical methods model the covariance function directly (Stroup, 2002; Richter and Kroschewski, 2006a). Piepho et al. (2008) demonstrated the existence of specific relations between these two approaches.

Compared with classical analysis, spatial analysis changes from a model-driven to a data-driven approach related to the covariance structure. This means that the analysis model is no longer known a priori. It has to be selected from a set of covariance models by using a criterion that measures the goodness-of-fit of different models to a given data set (Burnham and Anderson, 1998; Spilke and Richter, 2007). Due to this selection procedure, the analysis effort is much greater for the geostatistical than for the classical approach. The fit of geostatistical models requires an iterative procedure. This iteration is often connected with numerical convergence problems, which further complicates the analysis.

To gain general acceptance of this change, extensive research on the advantages, disadvantages, and issues related to the implementation of geostatistical approaches has been done, and further research is necessary for the theoretical foundations as well as questions of the practical relevance and consequences.

The practical benefit of using geostatistical models is often studied for trials that have already been conducted with several treatments and a given layout or by means of uniformity trials (Zimmerman and Harville, 1991; Grondona and Cressie, 1991; Brownie et al., 1993; Wu et al., 1998; Wu and Dutilleul, 1999; Stroup, 2002; Richter and Kroschewski, 2006b; Pilarczyk, 2007; Richter et al., 2007). While trials that have already been conducted provide only the opportunity to analyze the data using different models, the consequences of several designs and their layouts on model preference can also be studied by using uniformity trials. The investigations of Zimmerman and Harville (1991), Wu et al. (1998), and Wu and Dutilleul (1999) were based on uniformity trials. They projected one or several designs with repeatedly randomized dummy treatments onto the whole uniformity trial and used different spatial analysis models to analyze the data. The consideration of different randomization plans enabled them to compare the observed and estimated variances of the treatment differences. The resulting bias was used to evaluate the analysis models.

Many of the studies mentioned above have shown that consideration of spatial dependency leads to a better model fit and to a greater precision of treatment comparisons than the basic model; however, the model best suited to describe plot dependency often varies from study to study. Of course, this is not surprising because the experiments have been conducted under different environmental conditions (soil and climate) and with different crops. Furthermore, the researchers have partially selected the best-fit model from different candidate models.

This leads to a question of practical interest: whether, for different trials conducted on a given field, the same spatial model consistently shows the best fit. If this proves to be true, the increased analysis effort connected with geostatistical approaches could be reduced in such cases where stationary experimental fields exist. To investigate whether this can be expected, we analyzed five uniformity trials conducted on the same field with four different crops. We investigated the influence of different possible determinants on the model decision: the crop, different layouts of randomized complete block designs, the position on the field, and the chosen randomized plan.

Zimmerman and Harville (1991), Wu et al. (1998), and Wu and Dutilleul (1999) studied neither the role of position nor the role of randomization because the whole uniformity trial was regarded as one trial and the randomization was only used to calculate the bias for the model evaluation. To answer the questions raised above, we modified their procedure. The modification is explained below.

The additional analysis effort has to be considered in relation to the benefit. Therefore, we analyzed the observed and estimated standard errors of treatment differences for selected designs and all analysis models. The increased statistical power of the t-test and F test could not be demonstrated because we did not simulate treatment effects in this study. The coincidence of their empirical Type I errors and given nominal values is a necessary precondition, however, to obtain reliable statistical conclusions and valid estimations of the power. We checked the coincidence for the selected designs. The chosen strategy of our investigations enabled us to discuss the role of the trials layout as well as the roles of blocking and randomization in cases where geostatistical models are used.


The uniformity trials under investigation here were laid out on the same field at Blumberg, a former experimental station of the Humboldt-Universität Berlin, between 1954 and 1958 (Bätz, 1968). Bätz (1968) characterized the soil as a sandy loam with 12 to 17% clay and about 25% silt, with a mean fertility score. The crops were hemp (Cannabis sativa L. ssp. sativa) in 1954, winter rye (Secale cereale L.) in 1955, beet (Beta vulgaris L. ssp. vulgaris var. crassa Alef.) in 1956, oat (Avena sativa L.) in 1957, and hemp again in 1958. For all years, the plot size was 2 by 2 m and the coordinates of the plots remained the same. Due to this fact, we have a situation where the influence of different crops (combined with different weather conditions) on the results of the analysis can be assessed. The plots were harvested by hand. The grain yield of the cereals, the beet root mass, and the whole plant mass of the hemp were recorded.

For our study, we used a 40- by 48-m subarea of the original 40- by 50-m trial. To control the influence of the design parameters on the analysis results, a total of 13 different designs were constructed in the following way:

  1. Plot size and plot shape: We used the original 2- by 2-m plots and created 2- by 8- and 8- by 2-m plots by combining neighboring original plots to more closely resemble commonly used plot sizes.

  2. Block size and block shape: Four or 10 neighboring plots were investigated as complete blocks and each design consisted of four blocks. The plots were combined in different ways so that different block shapes existed for one plot size. The same was done to the different block shapes so that different design shapes resulted. All investigated designs were characterized by blocks where the plots were side by side in one row or in one column within the block (Table 1).

To evaluate whether and to what extent the position in the field and the randomized plan influenced the results, the following procedure was chosen:

  1. Each design was shifted across the field so that different positions resulted. Each shift involved one plot. Using this procedure, the data for each position differed by at least one plot. To get different positions, the design dimensions could not be too large.

  2. Four or 10 dummy treatments were assigned to the plots corresponding to the randomization strategy of a randomized complete block design. For each number of treatments, 1000 randomized plans were generated by PROC PLAN of SAS version 9.2 (SAS Institute, Cary, NC). The 1000 plans were projected onto each position. For Design 1, for example, 357 positions existed and on each position, 1000 plans were projected, resulting in 357,000 combinations of position and plan.

View Full Table | Close Full ViewTable 1.

Parameters of the investigated designs.

Dimensions Positions in the field Analyzed models
Design Plot Block Design
m by m no.
Four dummy treatments
1 R† 2 by 2 2 by 8 8 by 8 357 9
2 2 by 2 8 by 2 32 by 2 120 8
3 R 2 by 2 2 by 8 2 by 32 180 8
4 2 by 8 8 by 8 32 by 8 30 8
5 2 by 8 8 by 8 8 by 32 51 9
6 2 by 8 2 by 32 8 by 32 51 9
7 8 by 2 32 by 2 32 by 8 42 9
8 R 8 by 2 8 by 8 32 by 8 42 9
9 R 8 by 2 8 by 8 8 by 32 45 8
Ten dummy treatments
10 2 by 2 20 by 2 40 by 4 23 9
11 2 by 8 20 by 8 40 by 16 5 9
12 2 by 8 20 by 8 20 by 32 33 9
13 R 8 by 2 8 by 20 32 by 20 30 9
R = recommended.

Table 1 shows all investigated designs and their characteristics.

Statistical Analysis

To get a first impression of the covariance structure, the covariograms of the data for the 2- by 2-, 2- by 8-, and 8- by 2-m plots were calculated. Because evaluating the covariograms for each position and design would have been too extensive, we calculated them for the whole field. Of course, the empirical variances and covariances varied by position.

For each crop, each design, and each combination of position and plan, the data were analyzed by eight or nine models. In general, treatment and block effects were assumed to be fixed effects. The eight or nine models reflect a specific spatial dependency between plots and are summarized in Table 2.

View Full Table | Close Full ViewTable 2.

Analyzed covariance models.

Model Covariance function† Variance–covariance parameters Analyzed correlation options and corresponding model abbreviation
Basic σ2f(d) = σ2 if d= 0= 0 if d ≠ 0 1 no correlation B
For the whole trial Only within blocks
Spherical σ2f(d) = σ2[1 – 3/2(d/θ) + 1/2(d/θ)3] if d ≤ θ= 0 if d > θ 2 S s
Power‡ σ2f(d) = σ2ρd 2 P p
Gaussian σ2f(d) = σ2{exp[–(d/θ)2]} 2 G g
First-order random walk Cov(i,j) = σd2min(i, j) 1 r
Anisotropic power σ2f(dx,dy) = σ2ρxdx ρydy 3 A
d, distance between the midpoints of two plots; σ2, variance for d = 0; θ, range for the spherical model, the range parameter for the Gaussian model; ρ, correlation between plots if d = 1; i and j, plot numbers within a block, σd2, variance of the difference of neighboring plots; dx and dy, distances of two plots in the x and y directions, respectively; ρx and ρy, corresponding correlations if dx = 1 and dy = 1.
In the notation of SAS PROC MIXED.

The spherical, power, and Gaussian isotropic models assume that the covariance between two plots can be described by Cov(d) = σ2f(d), where f(d) is a decreasing nonlinear function of the Euclidian distance d between the midpoints of the plots and σ2 is the variance for d = 0. Each of these three models was investigated with regard to two options. The first option considered the correlation of all plots in the design (spherical [S], power [P], and Gaussian [G]), and the second considered only the correlation within blocks (spherical [s], power [p], and Gaussian [g]). The first-order random-walk model (r) uses the fact that the plots lie side by side in one row or in one column within a block. It assumes that all plots have the same dimension and that the plots within blocks are indexed in field order. This model has been investigated only in combination with the second option. It yields the same results as a linear variance model with fixed block effects or the model of first differences (Piepho et al., 2008). We used the random-walk approach due to its good convergence behavior. In addition, for cases in which not all plots of the design lay side by side, the anisotropic power model (A) was tested in combination with a covariance over the whole experiment.

Attention should be paid to the different definitions of the power model in the literature and likewise in the procedures of SAS. The above description corresponds to that of Littell et al. (2006) and the procedure PROC MIXED of SAS version 9.2, which was used for the data analysis. In this form, P and p are equivalent to the well-known exponential model; they are only differently parameterized (Littell et al., 2006). Furthermore, p corresponds to the autoregressive model AR(1) assuming an autoregressive structure only within blocks. For Designs 2, 3, 4, and 9, P is equivalent to AR(1) assuming the autoregressive structure for the whole trial. These equivalences arise due to the chosen layout of plots within blocks and of the blocks within the designs, respectively. Model A corresponds to AR(1) × AR(1) due to the lattice structure of the investigated designs. Table 1 includes the number of analysis models for each design. Richter and Kroschewski (2006b) analyzed the spatial structure of these trials. They used all plots of the whole field without projecting any design structure on the field, however, and fitted models that had up to six parameters. For these more complex models, the convergence has often been achieved only by using well-chosen initial parameter values. For this study, this procedure proved too cumbersome because of the very large number of runs, and the default of position-independent parameters did not lead to satisfying convergence behavior.

The data analysis was performed on the basis of the linear mixed model as implemented in the procedure PROC MIXED of SAS 9.2 using the restricted maximum-likelihood algorithm (Schabenberger and Pierce, 2002). In this procedure, we specified the model by fixed treatment and fixed block effects as class variables. The variance–covariance matrix of the error effects was chosen corresponding to the basic model or the spatial models.

Initially, we also analyzed all these models in combination with a nugget variance. The runs converged very rarely, however, so in the end we decided to exclude the nugget.

For each crop, design, and combination of position and plan, we evaluated the model fit using the small-sample corrected version of Akaike's information criterion (AICC; Hurvich and Tsai, 1989). The value of this criterion is the sum of –2 times the maximized restricted likelihood function (2lR) plus a penalty term. This term depends on the number of variance–covariance parameters, on the number of treatments, and on the number of replications. It ensures that models with a simpler variance–covariance structure are preferred. Burnham and Anderson (1998) suggested the use of AICC if the ratio of the total number of plots and the number of parameters is <40. In our study it was ≤40, so we followed their recommendation. To evaluate which sources caused variations in the AICC values among the combinations of plan and position, we split the total variance of the AICC values into the variance components for the positions, the plans, and the residuals. The latter corresponds to the variance component of the interaction between position and plan. The variance components were calculated per crop, design, and model, so that they are equivalent to the variance components calculated with the –2lR values. Setting the total variance equal to 100%, the proportions of the variance components were determined. On this basis, an overall assessment of crops, designs, and models with respect to the different sources of variation is possible.

We ranked the models corresponding to the AICC for each combination of position and plan to assess the model preference by crop and design. The best model received the rank of 1 and the model with the worst fit received a rank of 8 or 9 (depending on the number of fitted models). In case of ties, the mid-rank method was used. If a model did not converge for one combination, the highest rank was given. Thus, the highest rank means either the worst fit or no fit. No convergence appeared frequently for P, p, and A. To evaluate the model preference for the positions in the mean of all plans, the mean rank of each model for each position was calculated and the same was done for each plan in the mean for all positions. We counted how often each model had the smallest rank for all combinations of position and plan and the smallest mean rank for the positions as well as for the plans. Especially for the designs with only four treatments, a spatial model sometimes reduced to the basic model. In such a situation, it was counted as the basic model. To evaluate the variability of the model preference and its variation sources, the total variance of the ranks was split into its components as was done for the AICC values above.

Designs 4, 9, 12, and 13 were analyzed in more detail to investigate the influence of the layout on the desired positive effect expected from using geostatistical approaches. For these designs, each crop, and each model, the mean observed and estimated standard errors of differences SED ( and , respectively) were determined. The was calculated as the square root of the mean variance of all estimated treatment differences and as the square root of the mean estimated variances of the treatment differences, in both cases averaged across all combinations of position and plan and all treatment comparisons. The bias of the as well as the empirical Type I error for the t-test and the F test, using a nominal Type I error of 5%, were determined. To investigate the consequences of the model selection, we also calculated these quantities for each design and crop in the mean for the results of the best-fitting models per combination. Generally, in the framework of linear mixed models with a variance–covariance matrix that has to be estimated, the bias of the SED follows from the use of estimated variance–covariances as if they would be known parameters; the variance of the estimators would not be taken into account. Therefore, the SED of S, P, G, A, s, p and g is typically underestimated by and the test statistics of the t- and F tests do not exactly follow the t- and F distributions, respectively. To overcome this problem, the following methods are useful:

  1. The Satterthwaite method (Satterthwaite, 1941; Giesbrecht and Burns, 1985; Fai and Cornelius, 1996) corrects (downward) the degrees of freedom of the t-test and the denominator degrees of freedom of the F test.

  2. The Kenward–Roger method (Kenward and Roger, 1997, 2009) corrects (downward) the degrees of freedom and also works with an asymptotic correction of the variance–covariances (Kackar and Harville, 1984). In the following, we refer to the improved Kenward–Roger method for nonlinear covariance structures (Kenward and Roger, 2009) implemented in SAS version 9.2 PROC MIXED by the option ddfm = KR(firstorder).

Through simulations, we found that for the P model, the Kenward–Roger method extremely overestimates the SED, whereas the Satterthwaite method worked rather well. For the G model, the simulations continuously showed a smaller bias with the Kenward–Roger method; for the S model, the situation was not unique. Based on these results, the and consequential quantities were calculated using the Satterthwaite method for P and p, and the Kenward–Roger method for G and g. For the models S and s, we found that the Satterthwaite method worked better for cereals in this study; for the non-cereals, we decided to use results based on the Kenward–Roger method.


Figure 1 shows the yield maps and Fig. 2 the covariograms of the 2- by 2-m plots across the whole field. Both figures demonstrate that covariances existed for all crops and that they were larger within columns than within rows, except for rye in the first 6 m and for hemp in 1958 in the first 8 m. The covariograms of the 2- by 8- and 8- by 2-m plots confirm this impression (not presented). Based on these figures, normally the blocks would be laid out in the direction of columns and the plots at a right angle to this direction. In this case, the conditions within the blocks would be as homogenous as possible and most existing heterogeneity within the blocks would act in all plots in nearly the same manner. All designs that followed this recommendation are marked in Table 1 with R (recommended). Designs 6 and 7 are extremely inappropriate situations because the directions of the blocks and plots are not perpendicular to one another. The ranges of the omnidirectional covariograms lay approximately between 12 and 20 m. Within rows, these ranges are nearly the same for hemp in 1954, beet, and hemp in 1958; for rye and oat, they are a little larger. Except for hemp in 1958, the covariances within columns do not reach a value of zero up to 40 m. These range parameters and the given recommendation, however, are the consequences of an average statement across the whole field. For individual positions and when potentially existing block effects are taken into account, modifications do exist.

Fig. 1.
Fig. 1.

Yield maps of the five crops (from Richter and Kroschewski, 2009).

Fig. 2.
Fig. 2.

Covariograms of the 2- by 2-m plots for the five crops.


In Table 3, the descriptive measures show that variances between the plots decrease as plot sizes increase. Due to existing covariances, the variances of the 2- by 8- and 8- by 2-m plots are larger than a quarter of the variance of the 2- by 2-m plots. Except for rye, the variation of the 8- by 2-m plots is smaller than that of the 2- by 8-m plots because the 8- by 2-m plots soak up more variation than the plots in the other direction. This exception for rye could mean that the directions of blocks and plots do not play such an important role as for the other crops. It is conspicuous that the relative reduction in variation for rye and oat is much smaller than that for the other crops; this was caused by their higher correlation between the 2- by 2-m plots. The coefficient of variation is the smallest for rye at 11.97% and the highest for oat at 41.83%. This relation between the coefficients conforms to the known fact that summer cereals react more sensitively than winter cereals in central Europe (Chmielewski and Köhn, 1999). The higher yield variability of oat results from the often observed limited water supply during their short growing period. This implies that oat serves as a good soil marker and constitutes a favorable crop in uniformity trials.

View Full Table | Close Full ViewTable 3.

Basic characteristics of the crop yields for the investigated plot sizes.

Parameter Plots Hemp 1954 Rye Beet Oat Hemp 1958
Mean, Mg ha−1 480 17.70 2.85 55.44 1.49 23.89
Variance of plots
 2 by 2 m 480 22.00 0.12 121.08 0.39 13.70
 2 by 8 m 120 16.84 0.11 75.62 0.33 9.08
 8 by 2 m 120 15.97 0.11 68.31 0.31 8.27
Coefficients of variation, %
 2 by 2 m 26.49 11.97 19.85 41.83 15.49
One-lag omnidirectional correlation
 2 by 2 m 0.63 0.90 0.42 0.80 0.56

Figure 2 gives the impression that covariogram models with nugget variance would be appropriate for hemp in 1954 and beet—at least for designs with 2- by 2-m plots (Designs 1–3 and 10). This could not be confirmed by the position-specific data analysis, however, and for larger plots, this impression vanished.

The mean AICC values of the different crops could not be compared because they were influenced to a great extent by the different variations in the yield data. The variances of the AICC values show a rather heterogeneous behavior, which is mainly characterized by different crop reactions to the designs. The only consistent result is the more sensitive reaction of oat. In contrast, the relative proportions of the variance components of the AICC values (or of the –2lR values) are more or less homogeneous. In the mean of all crops, designs, and models, the relative proportion for the position is 78.1%, for the plans 1.3%, and for the interaction 20.6%. The position-dependent proportion shows certain differences among crops, designs, and models. The values for the crops (averaged across designs and models) are between 86.1% for oat and 72.7% for beet. For the designs (averaged across crops and models), the largest value is 83.8% (Design 1) and the smallest 68.3% (Design 13). Finally, the models (averaged across designs and crops) yield values between 82.5% for B and 71.5% for G. In particular, position determines the AICC variation for oat analyzed by B to be 90%. A small interaction between crops and designs exists; however, it does not substantially influence the interpretation.

To determine which model is the best one for any given crop and design, we ranked the models according to the AICC on each combination of position and plan, as mentioned above. If one model was the best one (or the worst one) for all combinations, then the mean rank would be equal to 1 (or 8 or 9), with a total variance equal to 0 in both cases. The maximum total variance of 12.25 (or 16) would arise if a model was the best for half the combinations and the worst for the other half, with a mean rank of 4.5 (or 5). None of the analyzed situations yielded such extreme results. Table 4 provides the mean ranks and the total variances for Design 12 and two crops as an example. Due to these variances, the mean ranks are not good indicators to predict which model fits best in which situation. For both crops, it can be noticed that B is rather bad. For rye, S is often among the best models because of a small mean (1.74) and a small variance (0.91). The mean ranks and the variances for beet do not differ among the spatial models in such a pronounced way as they do for rye. Here, r has the smallest mean rank (3.10); however, it does not have the smallest variance. Therefore, it cannot be expected that this model dominates all situations as S does for rye. Furthermore, this table shows the variance components of the ranks for the positions (in the mean of plans) and for the plans (in the mean of positions). Here already it can be observed that the variance proportion caused by the plans is extremely small.

View Full Table | Close Full ViewTable 4.

Mean, total variance, and variance components of the ranked corrected Akaike Information Criterion values as a demonstration of the limited information value of the mean and of the consistent results for the variance components of plans using Design 12, rye and beet, as an example. The minimum value for each crop is in bold type.

Crop Model† Mean Total variance Variance components
Position Plan
Rye B 7.02 1.48 79.81 0.00
r 2.45 1.92 64.51 0.45
P 6.76 6.00 41.78 0.00
S 1.74 0.91 27.10 0.34
G 4.24 2.65 26.47 1.79
A 6.44 6.17 51.75 0.30
p 7.30 2.39 29.65 0.49
s 3.73 1.21 31.55 1.02
g 5.29 2.55 26.91 1.66
Beet B 7.89 5.18 58.38 0.50
r 3.10 5.74 61.49 0.04
P 3.18 3.41 45.92 0.25
S 4.70 5.88 54.28 0.00
G 5.98 3.21 30.24 0.81
A 5.70 5.24 44.23 0.46
p 3.86 2.95 36.88 0.31
s 4.12 4.06 40.72 0.00
g 6.47 3.74 38.05 0.87
B, basic; r, random walk; P and p, power; S and s, spherical; G and g; Gaussian; A, anisotropic power model.

Because the information from the mean ranks and the variances was not appropriate to obtain a differentiated insight, we counted which model for which design and crop occurred most frequently as the best-fitting one. Table 5 features only models with a relative frequency >10% and distinguishes among the following three cases: the best models for all combinations of position and plan, for the positions in the mean of plans, and for the plans in the mean across all positions. The bold type and underlines provide additional information on the amount of the frequency. The results for Design 12 and beet are explained as an example: For all 33,000 combinations of the 33 positions and 1000 plans, r had the best fit most often (the smallest AICC and therefore the smallest rank). Models P and A follow with decreasing frequency (exactly 38.3, 18.8, and 11.7%). The remaining 31.2% results from several other models with frequencies <10%. For the 33 positions, P has the smallest mean rank (averaged across the plans) 12 times, the mean rank of r is the smallest 11 times, and s has the smallest mean rank for five positions (exactly 36.4, 33.3, 15.2, and several other models combined at 15.1%). For the mean of the positions, r has the smallest rank for 577 plans (57.7%), P for 405 plans (40.5%), and other models for the remaining 18 plans (1.8%).

View Full Table | Close Full ViewTable 5.

Best-fit models† with a frequency >10% sorted by descending frequency.

Design Case Hemp 1954 Rye Beet Oat Hemp 1958
1 (A)‡ combinations B/r§ G/r/S = B B/r B = r/G/S B/r/G
position B/r G/r/S B/r B/r/S = G B/r
plan B G B r/B r/B
2 combinations B/r r/B B/r r/B B/r
position B/r r/B B/r r/B B/r
plan B/r r B r r/B
3 combinations B/r r/G/B B/r r/B B = r
position B/r r/G/B B/r r/B r/B
plan B r B r r/B
4 combinations r/B r/g r = B r/g r/B
position r/B r B/r r/B r/B
plan r r B = r r r
5 (A) combinations r/B r/G r = B r/B B/r
position r/B r B/r r/B B/r
plan r r B/r r r = B
6 (A) combinations S = B = r = G G/S/B = r B/S/G = r S/G B/G/r = S
position S/B/r/G S/G/B B/S S/G B/S/G/r
plan S S/G S/B S S
7 (A) combinations S/G S/G r/B/S/G S = G S/B/G/r
position S/G S/G r/S/B/G S/G S/B/r = G
plan S S S/B S/G S
8 (A) combinations r/B r B/r r/B/G r/B
position r/B r B/r r/B/G r/B
plan r r B r r
9 combinations r/B r/G B/r r/B/g r/B
position r/B r/G B/r r/B r/B
plan r r B/r r r
10 (A) combinations G/P/S S/A/G P/B/r S/P/G P/S/G/A
position G/P/S S/G = A P/B = r S/P S/P/G
plan P/G S P S P/S
11 (A) combinations A/G/r S/g/A r/P/S S/g S/A/r
position A = r/G S/r P/r S/g S/r = p = g
plan A/S S r/S/P S P/S = A
12 (A) combinations A/r S/r/G r/P/A S/r r/G = A/P/s
position A/r/s S/r P/r/s r/S r/P/G = A = s
plan A S r/P r P
13 (A) combinations r/A = s/S S/G/s = r/g r/P r/S/G S/r
position r/s/S/A S/s/G = r/g r/P/p S/r/G/s S/r
plan r r/S s/p S/r S
B, basic; r, random walk; P and p, power; S and s, spherical; G and g, Gaussian; A, anisotropic power model.
(A) indicates that the anisotropic model was analyzed.
§Bold type indicates 50% ≤ frequency ≤ 90%; bold type and underline indicates 90% < frequency; nearly the same frequency is indicated by =.

It is obvious that in many cases B and r are preferred for Designs 1 to 5, 8, and 9. These designs with only four treatments are relatively small. They belong to the recommended designs or to the designs where plots and blocks lay at right angles to each other. For the majority of these cases, it is not necessary to take any correlation into account or a linear correlation function is appropriate for the short distances within a block (Design 1 constitutes an exception for rye). Designs 6 and 7 have four treatments as well; however, plots and blocks are not at a right angle to each other. For these cases, the spatial models dominate. For the larger Designs 10 to 13, the correlations exist across larger distances and the majority of the best models have a nonlinear or even a nonlinear anisotropic structure. In many cases, the number of best models in the mean across the positions reduces to only one or two models. In most cases, these models are exactly those ones that have the smallest mean rank across all combinations. This follows from the very small variation among the ranks caused by the plans. For example, in Design 12, r and P are the two models with the smallest mean rank for beet, and S prevails for rye (compare Tables 4 and 5).

Table 4 provides the variances of the ranked AICC values and the relative proportions of their components only for Design 12 and two crops. These values were calculated for all designs and crops to get an overall assessment. Considering the total variances and the absolute values of their components, we found no results that could be generalized. The relative proportions do show more consistent properties, however, as we also have found for the variance components of the AICC values. On average for all crops, designs, and models, the variation of the ranks depends 36.5% on position, 0.9% on the plan, and 62.6% on individual reactions to a combination of position and plan. The models cause the largest differences in the reaction on position (averaged across designs and crops). The most position-dependent models were r (49.7%) and B (47.6%). The values of the other models were consistently smaller. The differences among the designs and crops were not as pronounced.

Selected analysis results for Designs 4, 9, 12, and 13 were compared. Designs 4 and 9 have the same number of treatments, as do Designs 12 and 13, but they differ from each other in layout: Designs 9 and 13 correspond to our recommendation, whereas the plots and blocks of Designs 4 and 12 have the opposite direction. Table 6 shows the effect of the different layouts and the different models on the estimated and observed precision of the treatment differences. In this table and in Fig. 3 and 4, the “best” model (indicated by b*) summarizes the results of the best models per combination (position × plan). That the results of the best model do not correspond to the smallest value in each row follows from the variances discussed above.

View Full Table | Close Full ViewTable 6.

Mean standard errors of differences (SED) (est = estimated SED and obs = observed SED) for Designs 4, 9, 12, and 13 and all analysis models, as well as for the best model for each combination of position and plan. The smallest SED in each row is indicated in bold type; the largest SED in each row is indicated by bold italic type.

Mean SED
Crop Design SED type Best Basic r P‡ p s§ g A#
Hemp 1954 4 est 1.467 1.769 1.497 1.531 1.504 1.626 1.527 1.591 1.624
obs 1.642 1.779 1.538 1.599 1.593 1.627 1.567 1.578 1.631
9 est 1.068 1.330 1.106 1.122 1.096 1.208 1.138 1.162 1.211
obs 1.178 1.330 1.121 1.146 1.136 1.250 1.210 1.145 1.244
12 est 1.236 2.297 1.521 1.561 1.490 1.552 1.604 1.542 1.555 1.227
obs 1.398 2.290 1.562 1.550 1.531 1.646 1.583 1.579 1.645 1.285
13 est 1.109 1.860 1.162 1.205 1.173 1.275 1.204 1.183 1.275 1.123
obs 1.226 1.859 1.188 1.185 1.188 1.402 1.162 1.192 1.197 1.401
Rye 4 est 0.056 0.082 0.061 0.059 0.060 0.070 0.062 0.061 0.060
obs 0.059 0.081 0.055 0.058 0.056 0.070 0.063 0.057 0.062
9 est 0.051 0.082 0.062 0.063 0.060 0.056 0.061 0.062 0.058
obs 0.056 0.082 0.056 0.058 0.057 0.059 0.065 0.058 0.060
12 est 0.060 0.146 0.067 0.066 0.064 0.052 0.067 0.067 0.052 0.060
obs 0.058 0.145 0.057 0.055 0.055 0.068 0.056 0.056 0.068 0.054
13 est. 0.056 0.109 0.065 0.068 0.065 0.055 0.067 0.064 0.055 0.065
obs 0.062 0.109 0.057 0.058 0.056 0.068 0.059 0.058 0.056 0.068
Beet 4 est. 3.549 4.035 3.609 3.624 3.592 4.015 3.727 3.798 4.015
obs 4.025 4.017 3.891 3.890 3.938 4.172 3.917 3.939 4.154
9 est 3.060 3.422 3.123 3.110 3.089 3.420 3.154 3.263 3.479
obs 3.428 3.422 3.392 3.350 3.416 3.548 3.343 3.390 3.583
12 est 3.772 5.482 3.768 3.964 3.808 4.433 3.975 3.894 4.434 3.806
obs 4.298 5.481 4.142 4.139 4.140 4.693 4.152 4.239 4.693 4.126
13 est 3.277 4.464 3.202 3.461 3.305 3.775 3.395 3.347 3.775 3.321
obs 3.714 4.461 3.553 3.563 3.588 3.901 3.613 3.563 3.687 3.901
Oat 4 est 0.139 0.195 0.149 0.152 0.145 0.159 0.166 0.149 0.147
obs 0.149 0.197 0.140 0.144 0.144 0.165 0.167 0.144 0.153
9 est 0.127 0.179 0.139 0.141 0.134 0.141 0.127 0.138 0.137
obs 0.138 0.180 0.131 0.135 0.134 0.144 0.130 0.133 0.141
12 est 0.144 0.315 0.154 0.146 0.148 0.145 0.146 0.154 0.146 0.140
obs 0.141 0.313 0.139 0.133 0.136 0.173 0.132 0.139 0.172 0.134
13 est 0.132 0.281 0.149 0.155 0.146 0.125 0.155 0.148 0.126 0.143
obs 0.134 0.281 0.133 0.136 0.132 0.143 0.133 0.136 0.131 0.143
Hemp 1958 4 est 1.203 1.414 1.225 1.232 1.216 1.348 1.286 1.311 1.366
obs 1.349 1.412 1.277 1.280 1.294 1.376 1.313 1.301 1.385
9 est 0.936 1.172 0.965 0.983 0.967 1.055 1.007 1.031 1.054
obs 1.061 1.179 0.977 1.010 1.009 1.102 1.057 1.020 1.089
12 est 1.300 1.903 1.337 1.402 1.336 1.440 1.418 1.396 1.440 1.318
obs 1.475 1.903 1.398 1.400 1.382 1.510 1.417 1.431 1.510 1.388
13 est 0.994 1.657 1.024 1.041 1.007 1.104 1.067 1.040 1.105 1.028
obs 1.070 1.647 1.044 1.035 1.036 1.172 1.082 1.055 1.056 1.171
r, random walk model.
P and p, power model; results based on the Satterthwaite method.
§S and s, spherical model; results for the non-cereals based on the Kenward–Roger method, for the cereals on the Satterthwaite method.
G and g, Gaussian model; results based on the Kenward–Roger method.
#A, anisotropic power model; results based on the Satterthwaite method.
Fig. 3.
Fig. 3.

Bias of the standard error of difference (SED) and empirical Type I errors of the t-test and F test for a nominal error of 5% for the 1954 hemp, beet, and 1958 hemp crops using Designs 4, 9, 12, and 13 and all analyzed models: B, basic; r, random walk; P and p, power; S and s, spherical; G and g, Gaussian; A, anisotropic power model; b* indicates the best model.

Fig. 4.
Fig. 4.

Bias of the standard error of difference (SED) and empirical Type I errors of the t-test and F test for a nominal error of 5% for the rye and oat crops using Designs 4, 9, 12, and 13 and all analyzed models: : B, basic; r, random walk; P and p, power; S and s, spherical; G and g, Gaussian; A, anisotropic power model; b* indicates the best model.


Aside from some exceptions for beet, the basic model yields the largest and . For almost all crops and models, Design 9 results in smaller and than Design 4. If Designs 12 and 13 are compared in the same way, then Design 13 yields the smaller values for the non-cereals. For the cereals, the results are not so clear. More frequent exceptions can be found for rye. Apart from one case for rye, the best model of the recommended Designs 9 and 13 has a smaller and than the best model of the nonrecommended Designs 4 and 12. The more frequent exceptions for rye obviously follow from the results discussed for Table 3. Likewise, the comparison of Designs 6 and 7 with Designs 4, 5, 8, and 9 highlights the role of the layout because all of these designs have the same number of treatments and the same plot size. The and has the largest values for the basic model and Designs 6 and 7 (not shown in the tables). For the spatial models, these two designs also yield mostly higher values than the others.

Figures 3 and 4 show the SED bias (based on Table 6) as well as the empirical Type I errors for Designs 4, 9, 12, and 13.

For B, the SED bias is about zero in all instances, which reflects the fact that the error mean square is an unbiased estimator of the error variance. For this model, the empirical Type I errors of the t-test lies between 3.8 and 4.9% and of the F test between 3.0 and 4.8%; larger differences with the chosen nominal error can be seen for the cereals. For the F and t-test of the spatial models (except r), a moderate underestimation of the SED is acceptable due to the following downward correction of the degrees of freedom by the Satterthwaite or Kenward–Roger method. For the non-cereals, most spatial models show a negative bias. Obviously, the negative bias cannot be completely corrected so that the empirical Type I errors mostly tend to values that are a little larger than 5%. The values of the t-test lie between 4.3 and 8.3% and of the F test between 2.9 and 15.3%, with the minimum and maximum resulting from Design 13. The cereals give a completely different impression. Models G and g mostly display a negative bias. Model p shows the same effect for Designs 4 and 9. All other spatial models have a positive SED bias, which means that their empirical errors are much too small and this causes conservative test decisions. For these models, the values of the t-test are between 1.6 and 5.0% and for the F test between 0.2 and 4.7%. Models G and g represent extreme opposites, especially for Designs 12 and 13. For rye, the empirical errors of the F test and t-test increase to 46.7 and 15.8%, respectively. Often, the empirical error of the F test has larger differences with the nominal error than does the t-test.

The results for the best-fitting and selected models are much more interesting than the results for the individual models. For these models, empirical Type I errors are between 5.6 and 9.9% for the t-test and, for the F test, between 5.1 and 22.1%. The upper values come from rye in Design 13.


The geostatistical models analyzed here are rather popular and are implemented in SAS 9.2. Piepho et al. (2008) supplied the program code for analyzing the random-walk model. The fit of anisotropic models was only possible for some designs. Geometrically anisotropic models and the anisotropic exponential model with five parameters were not analyzed because they commonly converged only by using reasonable initial parameters. The exclusion of a nugget variance conforms to Schabenberger and Pierce (2002) and Zimmerman and Harville (1991), who referred to Besag and Kempton (1986). They stated that the nugget variance is unnecessary in many field experiments. Other researchers have reported on other converse experiences so that a general recommendation to omit the nugget variance is not possible.

Model Fit and Model Preference Depending on the Position and the Randomized Plan

The analysis of the variance components of the AICC confirmed our expectation that the position in the field determines the highest proportion of variation in the model fit. We also expected a stronger dependency on position for oat than for the other crops. The fit of B showed a stronger influence of the position as it applied to the spatial models. Due to existing covariances, which are more or less pronounced at the different positions, the spatial models have the ability to react more flexibly to the characteristics of the different positions than does the basic model. For all cases, the plans played a very small role in the mean across all the positions.

The variance components of the ranked AICC values give an impression of the variation in the model preference. The ranking of B and r was more influenced by position than the other spatial models. About 50% of the variation of their ranks depended on the position and 50% on the individual combination of position and plan. For the other models, about 67% of the variation depended on the individual combination. Practically, these results mean that the preference of a model for a given position and for a chosen plan is not predictable. In any case, one is bound to make an individual decision and this refers back to the choice of the selection criterion. We also performed all these analyses using Akaike's information criterion (Akaike, 1973). The variance components of the AICC values are identical to those of the –2lR values and of any other information criterion; for the ranked values of the criteria, we obtained almost identical results.

Role of a Proper Layout: Blocking and Large-Scale Heterogeneity

The role of the layout of plots and blocks according to the classical recommendations was demonstrated by the results of Designs 4 and 12 compared with those of Designs 9 and 13, as well as for Designs 6 and 7 compared with Designs 4, 5, 8, and 9. From the discussion of the (observed and estimated), it follows that a proper layout still provides the best guarantee for obtaining precise estimations when using the basic model as well as spatial models. The results could lead to the mistaken impression that the precision of a badly planned design can be considerably improved by spatial analysis. Such trials involve the danger of biased treatment estimations, however, and the unbiasedness of the estimations cannot be checked for a single experiment.

One of the repeatedly discussed problems of analyses by using geostatistical models is the separation of large-scale and small-scale heterogeneity. The use of geostatistical models, which describe small-scale heterogeneity, assumes second-order stationarity. This stationarity is violated if large-scale heterogeneity exists and is not taken into account by an appropriate experimental design. In general, blocking aims to turn known disturbance factors into planning factors to get accurate and precise estimations of treatment differences. It should furthermore guarantee second-order stationarity when performing an analysis using spatial models. Depending on the experimental conditions, this may be achieved by designs with complete blocks, incomplete blocks, or by row–column designs (John and Williams, 1995). In our study, for S, P, G, and A, this stationarity should be fulfilled across the whole design after the subtraction of block effects, and within blocks for s, p, g, and r. Tamura et al. (1988), Brownie et al. (1993), and Richter et al. (2007) used polynomial functions of the plot coordinates to generate trend models for the description of large-scale heterogeneity that was not captured by blocks. In these studies, the selection of the polynomial degree played an important role in avoiding overfitting. Based on Tamura et al. (1988), Brownie et al. (1993) used a significance level of 0.01 to decide whether a polynomial term should be included. Alternatively, the decision about the relevance of trend components can be made by using the information criteria, as Richter et al. (2007) and Spilke et al. (2010) have described. In doing so, it should be taken into account that only the information criteria of the maximum likelihood method can penalize the number of fixed as well as the number of variance–covariance parameters. Besides these methods to describe large-scale variation, more sophisticated approaches have been suggested (Piepho et al., 2008). Gilmour et al. (1997) suggested the addition of random effects for the large-scale trends, Verbyla et al. (1999) used smoothing splines, and Besag and Higdon (1999) worked with Bayesian methods to describe the fertility of a field.

In our uniformity trials, we knew that large-scale heterogeneity existed for the whole field. When searching for reasons for the stronger bias found for cereals in Designs 12 and 13, we proved whether such trend components described by polynomials of the coordinates should be taken into consideration for the individual positions. Finally, we did not find serious changes in the results, and moreover, this could not really have been expected because Design 13 was a recommended design for which stationarity should have been a given.

In general, the existence of large-scale heterogeneity plays a more important role in on-farm trials (Piepho et al., 2011). For on-station trials, however, appropriate block designs should be used to control these disturbance factors. Beyond this, the blocks provide the opportunity to capture management effects (e.g., temporal and staff effects) arising during the experiment.

The Role of Randomization in Spatial Models

The small variance components for the plans calculated from the AICC values as well as from their ranks should under no circumstances create the impression that randomization in the experiments is dispensable. Rather, its aim is to equalize potentially existing and, during the planning phase, unknown random disturbance influences on an experiment so that a bias of the estimated treatment means and their differences can be avoided in the mean of all randomizations. For the basic model, as well as for the spatial models, these estimations were unbiased in our study. For the basic model, randomization has the further function of justifying the use of this model for the analysis. Although covariances existed in the analyzed uniformity trials, the results of the basic model showed no SED bias and largely coincidental values of nominal and empirical Type I errors. This is the consequence of the large number of randomizations. Generally, this does not apply to analysis using spatial models. As mentioned above, the correlation structure will be neutralized in the mean of all randomizations. Thus it cannot be expected that the bias vanishes for the spatial models for one or only a few positions in the mean of the randomizations. In the studies of Zimmerman and Harville (1991), Wu et al. (1998), and Wu and Dutilleul (1999), different positions did not exist because the whole uniformity trial was regarded as one design. They calculated the bias of the estimated variances of treatment differences to evaluate the models by referring to Besag (1983). Besag (1983) explicitly emphasized that this criterion is valid “in a randomization framework”; however, geostatistical models do not operate in such a framework. Generally, inference statements are valid conditionally on the correctness of the model; however, the correctness of a spatial model, and the inferences based on it, cannot be justified in the mean of all randomizations as given for the basic model. Whether a correct spatial model has been chosen can only be proven in the mean of all (a large number of) realizations of a random field and not in the mean of randomizations. In any case, if only one realization of a random field exists, it can be expected that the observed and estimated variances of the differences are not equal.

These remarks can be analogously applied to the bias of the standard deviations used in our study. In contrast to the above researchers, we had different positions. Whether they can be considered as realizations of the same random field cannot be answered. The fact that different models showed the best fit at different positions is not necessarily caused by different underlying random fields, as simulations have shown (Spilke and Richter, 2007). Due to the chosen shift of the designs across the uniformity trials, the data for the different positions are partially dependent. Whether and to what extent this procedure impacted the results cannot be assessed.

Questions That Have To Be Answered by Simulations

The question remains where the stronger bias for rye and oat comes from in the larger designs for the individual models as well as for the best selected models. Apparently, this was not caused by an inappropriate design but it was rather a question of the characteristics of the crops in combination with the soil. Generally, a bias may arise if an inappropriate model has been chosen or the model is correct but the number of realizations of the random field is not large enough so that the bias would vanish in the mean. We noticed in our own simulations that considerable over- and underestimations for an individual realization of a random field (averaged across the randomizations) were more likely in cases of strong correlations than in cases of weak correlations. This even applied if the true model was equivalent to the analysis model. So, the stronger correlations of the cereals in contrast to the weaker ones of the non-cereals may be the reason for the differently pronounced biases for the crops (cf. Table 3). Whereas the relatively small number of positions was insufficient for the cereals, the non-cereals showed better results even though they were not satisfactory either. Currently, we tend to this explanation for the observed bias.

For all cases with 10 treatments, the geostatistical models were better fitted, and their analysis gave rise to a considerable gain in efficiency compared with the basic model, from which a larger test power can be expected. The main problem that remains is compliance with the nominal Type I errors. Hu et al. (2006) and Spilke et al. (2010) have shown that the bias of the Type I error of the t-test more or less vanishes in the mean of a large number of realizations of a random field. Hu et al. (2006) remarked, however, that the parameters of the simulation models influence the results. The role of the randomization has not been considered. Both studies used a pairwise consideration: one simulation model and one (the same or another) analysis model, where the latter was not selected by any information criterion. For practical situations, however, the results for the best-fitting models are of interest. To clarify whether the general tendency toward too-liberal test decisions in the means of the best models is due to our special field situation, simulations with a sufficient number of realizations of a random field and randomizations will be essential. Concerning the variance components of the model fit and of the ranked information criteria, a few simulations we have conducted show an astonishing congruence with the results presented here.

Comparison of the Results of Different Crops and Potential Advantages of Analysis Using Spatial Models

We observed some similarities and some differences when comparing the results for the crops. The existence of covariances, their pronounced directions, and ranges are very similar. Due to the different variances of the yield data, however, the correlations are differently pronounced. The model selections tend toward similar decisions insofar as the basic or random-walk model is better suited to the small designs and nonlinear spatial models are better suited to the larger designs. There are other experimental situations in which the linear model (r) is also appropriate for experiments with higher treatment numbers, however, as Pilarczyk (2007) and Müller et al. (2010) have reported. The idea that a particular covariance model is better suited than another for all subareas of the field or for a specific crop has been proven to be untenable.

Whether a scientist is willing to spend an additional analysis effort certainly depends on the precision gain that can be expected. Because of the SED bias, the gain by using spatial models compared with using the basic model was evaluated on the basis of the observed SED (Table 6). The quotient [SEDobs(basic model)/SEDobs(best model)]100 may serve as an appropriate measure. Gain was only not reached with smaller designs for beet. This corresponds to Table 5, which shows that the basic model turned out to be the best one for most situations. For the other crops, the values of the smaller designs (Design 4 and 9) lay between 104.7% (hemp in 1958, Design 4) and 146.4% (rye, Design 9). The gain was higher for the larger designs (Designs 12 and 13), in the range from 120.1% (beet, Design 13) to 250.0% (rye, Design 12). The larger the one-lag omnidirectional correlation (Table 3), the larger the gain. The pairwise comparison of the models P with p, S with s, and G with g shows that P, S, and G have in most cases a smaller mean SED. These models use the existing correlations across the whole trial, where the others use only the correlation within blocks. It can be expected from this precision gain that the power for the treatment comparisons by using spatial models is higher than that of the basic model. Such analyses could not be conducted in this study because we did not simulate treatment effects.

Final Remarks

Schabenberger and Pierce (2002, p. 628) described three schools of data handling in field experiments: The first school fully relies on the randomization theory and solely uses the basic model for the analysis. The second one does not take care of the randomization distribution and goes from the model-driven to a complete data-driven approach. The third school is a mixture of the two. This approach considers randomization and blocking as the basic principles in the planning phase. Therefore analysis using the basic model is justified. The geostatistical models are handled as an add-on component to the basic model so that the model-driven approach changes to a data-driven approach, related, however, only to the covariance structure. We prefer the third approach as a possibility to compare treatments more precisely for on-station trials. This approach was demonstrated in this study. For on-farm trials with mostly larger plots and blocks, however, we suppose that blocking will not be able to capture the large-scale heterogeneity in each case. In such a situation, it may be necessary to include covariates in the analysis model to get unbiased estimates and to fulfill stationarity as a precondition for fitting spatial covariance models.


This study showed that blocking and randomization should be the basic principles of planning of experiments regardless of whether the analysis is planned by the basic model or by geostatistical models as an add-on component. Provided that blocks and plots are well adapted to the characteristics of the experimental field, this strategy ensures that treatment differences are precisely estimated and without bias in the mean of all randomizations. Compared with the basic model, the blocking procedure for the spatial models has the further function of fulfilling the required second-order stationarity. In this study, the recommended layout of blocks and plots based on classical principles of block and plot construction accounted for the large-scale heterogeneity of the whole field. The large-scale heterogeneity was characterized by a relatively consistent pattern for all crops. In the mean of all combinations of positions and randomized plans, the recommendation worked well. This was shown by comparing the mean SED values for each crop; the same conclusion can be drawn by comparing the mean AICC values. For the basic model, for most spatial models, and especially for the best-fit models, the recommended designs show more precise estimations and smaller mean AICC values than the other designs. On the investigated field, nonlinear spatial models were better adapted to designs with 10 treatments; for smaller designs with four treatments, the linear spatial or the basic model was better fitted. The number of treatments per block was relatively small in both cases, in the same range as is also typical for randomized incomplete block designs. This implies that spatial analysis may be even a useful add-on component for designs with incomplete blocks. Compared with the basic model, remarkable reductions of the SED were achieved especially for larger designs. The larger the spatial correlations, the larger was the precision gain. The precision gain can only be evaluated if analysis by the basic model is justified. Statistical inferences based on the basic model are only correct in the mean of all randomizations. For spatial models, the randomization has lost this function; however, it has still to ensure the desired unbiasedness of estimates.

The variation in the fit of all models was mainly caused by the position in the field. In contrast, the variation of the model preference resulted predominantly from the interaction of position and randomized plan. Hence, the best-fit model for a single trial is not predictable—different candidate models have to be checked. This leads to increased effort for data analysis, and the question arises which models should be taken into account as candidate models. A simplification of the search procedure would be desirable; however, this aim cannot be supported by this study. The same model did not even show the best fit for all randomizations and a given design, crop, and position. Consequently, the same model cannot be expected for the same position in different years or for the same crop at all positions. An individual decision for each single trial seems to be inevitable. It is not surprising that the best-fit models vary from study to study, especially because the results depend on the candidate models considered. The decision of which models should be taken into account is a fundamental problem in data-driven approaches. This problem is still more apparent in cases were covariates have to be included as often arises in on-farm trials. The directional and omnidirectional covariograms of the residuals of the basic model may give a hint whether and which spatial models are worth considering. Likewise, the decision on the inclusion of covariates can be supported by graphical representations. This is not practical, however, for routine analyses with large numbers of trials. The only way remaining in this case is the fit of different models—with and without a nugget, with and without anisotropy–as was done in this study. Further investigations with simulated data are necessary to clarify the reasons for partially observed noncompliance with the nominal Type I error.

Many of the questions discussed here also arise in on-farm experiments. This concerns the problem of blocking and randomization when geostatistical models are used as well as the difficulties connected with the model fit and with the inclusion of covariates. We refer to Piepho et al. (2011) for a detailed discussion and further specifics.


We wish to thank Hans-Peter Piepho (Universität Hohenheim, Germany) and Joachim Spilke (Martin-Luther-Universität, Halle-Wittenberg, Germany) for very fruitful discussions and helpful remarks on previous versions of this paper.




Be the first to comment.

Please log in to post a comment.
*Society members, certified professionals, and authors are permitted to comment.