 

Agronomy Journal - Article

 

 


Vol. 105 No. 1, p. 11-19
Open access
Received: Jan 12, 2012
Published: November 16, 2012

* Corresponding author: j.crossa@cgiar.org

doi:10.2134/agronj2012.0016

META: A Suite of SAS Programs to Analyze Multienvironment Breeding Trials

Mateo Vargas (a), Emily Combs (b), Gregorio Alvarado (c), Gary Atlin (d), Ky Mathews (c), and Jose Crossa* (c)

(a) Universidad Autonoma Chapingo, Chapingo, Mexico, and Biometrics and Statistics Unit, CIMMYT, Apdo. Postal 6-641, 06600, Mexico DF, Mexico
(b) Dep. of Agronomy and Plant Genetics, Univ. of Minnesota, 1991 Upper Buford Cir., St Paul, MN 55109
(c) Biometrics and Statistics Unit, CIMMYT, Apdo. Postal 6-641, 06600, Mexico DF, Mexico
(d) Global Maize Breeding Program, CIMMYT, Apdo. Postal 6-641, 06600, Mexico DF, Mexico

Abstract

Multienvironment trials (METs) enable the evaluation of the same genotypes under a variety of environments and management conditions. We present META (Multi Environment Trial Analysis), a suite of 33 SAS programs that analyze METs with complete or incomplete block designs, with or without adjustment by a covariate. The entire program is run through a graphical user interface. The program can produce boxplots or histograms for all traits, as well as univariate statistics. It also calculates best linear unbiased estimators (BLUEs) and best linear unbiased predictors (BLUPs) for the main response variable and BLUEs for all other traits. For all traits, it calculates variance components by restricted maximum likelihood, least significant difference, coefficient of variation, and broad-sense heritability using PROC MIXED. The program can analyze each location separately, combine the analysis by management conditions, or combine all locations. The flexibility and simplicity of use of this program make it a valuable tool for analyzing METs in breeding and agronomy. The META program can be used by any researcher who knows only a few fundamental principles of SAS.


Abbreviations

    BLUE, best linear unbiased estimator; BLUP, best linear unbiased predictor; MET, multienvironment trial; META, Multi Environment Trial Analysis; MRV, main response variable; PCA, principal component analysis; RCBD, randomized complete block design; REML, restricted maximum likelihood

Multienvironment trials allow breeders to select the best-performing genotype for their target regions by assessing the relative performance of genotypes under a variety of locations and environmental conditions (Xu, 2010). In addition to enabling thorough selections, METs also provide data for estimating broad-sense heritability (repeatability) and for studying the extent and pattern of genotype × environment interaction that can provide information on how genotypes respond to different environments (Cooper et al., 1996). Multienvironment trials produce a great deal of data and can give the breeder valuable insight into their genotypes and testing program, especially if there is a simple and efficient way of analyzing them.

An important component of any MET is the experimental design at each trial location. Effective experimental designs control plot-to-plot within-location variability so that data reflect the true genetic potential of each cultivar at the location (Oehlert, 2000). Randomized complete block designs (RCBDs) have the advantage that they are simple and work well when environmental conditions within a block are uniform, as is often the case in studies with small numbers of genotypes (<10) and optimal field conditions (Bos, 2008). Randomized complete block designs are not recommended, however, for experiments that include >10 genotypes or for variable field conditions such as those encountered under drought and low N. For these situations, incomplete block designs (e.g., lattice designs) break the field into smaller and more homogeneous sub-blocks for analysis, creating a greater reduction in within-environment variation so that differences among genotypes are more precisely measured.

In addition to adjusting for location effects, it is sometimes necessary to adjust a trait for a correlated trait. The correlated trait is included as a covariate in the model. The META program allows the user to adjust one user-specified variable, called the main response variable (MRV), by a covariate. For example, when breeding for drought tolerance in maize (Zea mays L.), it is useful to adjust grain yield for anthesis date. Anthesis date can be strongly positively or negatively correlated with grain yield, depending on the environment; if the target and selection environments do not match perfectly, selection will be ineffective unless yield is adjusted (Banziger et al., 2004; Campos et al., 2004).
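
As a rough illustration of what adjusting the MRV by a covariate does, the Python sketch below regresses plot yield on anthesis date and removes the linear trend, which is the same idea as the fixed covariate term in the PROC MIXED models described under Materials and Methods. It is a simplification under stated assumptions: it fits one pooled slope by ordinary least squares and ignores the replicate, block, and genotype terms that META's mixed models include. The data and the function names (`ols_slope`, `adjust_by_covariate`) are hypothetical.

```python
from statistics import fmean


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def adjust_by_covariate(y, cov):
    """Adjust y to the mean covariate value: y - b * (cov - mean(cov)).

    Simplified: one pooled OLS slope, no design or genotype terms.
    """
    b = ols_slope(cov, y)
    mean_cov = fmean(cov)
    adjusted = [yi - b * (ci - mean_cov) for yi, ci in zip(y, cov)]
    return adjusted, b


# Hypothetical plot data from one drought-stressed trial
grain_yield = [4.1, 3.6, 5.0, 3.2, 4.4, 2.9]   # Mg/ha
anthesis_date = [68, 71, 66, 73, 69, 74]       # days after planting

adjusted_yield, slope = adjust_by_covariate(grain_yield, anthesis_date)
print(f"slope: {slope:.3f} Mg/ha per day")
print([round(v, 2) for v in adjusted_yield])
```

Because the slope is estimated by least squares, the adjusted values keep the original mean and are uncorrelated with anthesis date, so a plot is no longer ranked low simply because it flowered late.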

Unbalanced data also complicate the analysis of METs. Lattice designs inherently contain unbalanced data, and RCBDs frequently do as well due to adverse field conditions, seed shortages, or other errors (Spilke et al., 2005). In general, METs must be analyzed using a mixed model because they contain a mixture of fixed and random effects. Replicates, incomplete blocks, and sites are considered random effects, while the covariate (if any) is a fixed effect. Genotypes can be considered either a fixed or a random effect, depending on the goals of the analysis and the way the genotypes were selected (Smith et al., 2005). Unbalanced data and mixed effects preclude the estimation of variances using the standard fixed effects model; instead, variances are estimated by restricted maximum likelihood (REML) (Holland, 2006). With unbalanced data and mixed effects, a simple mean does not adequately describe the data. Instead, best linear unbiased estimators (BLUEs) or best linear unbiased predictors (BLUPs) must be used (McLean et al., 1991; Shaw and Mitchell-Olds, 1993). BLUEs are the appropriate statistic for fixed effects; in a model with all fixed effects, they are the ordinary least squares estimates of the mean performance for the response variable. When data are unbalanced, these estimates deviate from the simple means but more accurately represent the true performance of the response variable. BLUPs are the corresponding predictors for random effects: they allow random effects to be included in the model and shrink each genotype's estimate toward the overall mean by an amount that depends on the variance components and the number of observations.
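
The difference between BLUEs and BLUPs is easiest to see in the simplest case. The Python sketch below assumes a single location, balanced data, and known variance components (in META these come from REML in PROC MIXED). Under those assumptions the BLUE of a genotype is its mean, and its BLUP is the grand mean plus the genotype's deviation multiplied by the shrinkage factor σg²/(σg² + σe²/n_reps), the same ratio used for broad-sense heritability later in this article. The genotype names, plot values, and function name are hypothetical; the variance components are those reported for Kenya in Table 1.

```python
from statistics import fmean


def blues_and_blups(plots, var_g, var_e):
    """Balanced one-location case with known variance components.

    plots maps genotype -> list of replicate observations (same length for all).
    Returns (BLUEs, BLUPs, shrinkage factor).
    """
    blues = {g: fmean(obs) for g, obs in plots.items()}  # genotype means
    grand_mean = fmean(blues.values())
    n_reps = len(next(iter(plots.values())))
    shrink = var_g / (var_g + var_e / n_reps)  # lies between 0 and 1
    blups = {g: grand_mean + shrink * (m - grand_mean) for g, m in blues.items()}
    return blues, blups, shrink


# Hypothetical yields (Mg/ha) for three genotypes in two replicates
plots = {"G1": [4.4, 4.0], "G2": [5.1, 4.7], "G3": [3.5, 3.8]}
blues, blups, shrink = blues_and_blups(plots, var_g=0.14, var_e=0.33)
print(round(shrink, 3))
for g in plots:
    print(g, round(blues[g], 2), round(blups[g], 2))
```

Genotypes with extreme means are pulled furthest toward the grand mean, and the pull is stronger when the residual variance is large relative to the genotypic variance.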

There are multiple ways to use MET data when making selections: (i) selections can be made based on a combined analysis across all locations; (ii) when multiple management conditions are tested, one management condition may be weighted more heavily than another when making selections (this is often the case with stressed locations); and (iii) weights can be assigned such that certain locations are weighted more heavily than others, possibly because one type of location is more prevalent in the target breeding area or it may be due to past experience (Bos, 2008). Because breeders may wish to use the data from a MET in many ways, the program suite can output results per individual location, per location identified by management, or per location combined by management levels, or the overall results may be combined across all locations depending on the user’s preference.
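
If a breeder chooses to weight some locations more heavily, as in option (iii), the per-location BLUEs or BLUPs produced by META can be combined afterward. The Python sketch below is a hypothetical post-processing step, not part of META: the location labels, weights, and values are invented for illustration, and a weighted mean is only one of several ways to build such an index.

```python
def weighted_index(values_by_location, weights):
    """Weighted mean of one genotype's per-location values.

    A location missing from `weights` raises KeyError, so no location is silently dropped.
    """
    total_weight = sum(weights[loc] for loc in values_by_location)
    weighted_sum = sum(value * weights[loc] for loc, value in values_by_location.items())
    return weighted_sum / total_weight


# Hypothetical BLUEs (Mg/ha) for one genotype; stressed sites count double
blues = {"stress_site_1": 4.9, "stress_site_2": 5.2, "optimum_site_1": 9.4}
weights = {"stress_site_1": 2.0, "stress_site_2": 2.0, "optimum_site_1": 1.0}
print(round(weighted_index(blues, weights), 2))  # 5.92
```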

Here we present META, a suite of SAS programs that easily and rapidly analyze data from RCBD and lattice designs with or without adjustment by a covariate; META also computes BLUEs, BLUPs, variance components, least significant difference (LSD), coefficient of variation (CV), and broad-sense heritability, among other statistics, for the genotypes evaluated in METs. We illustrate the use of this program with a case study of two data sets. The first is a CIMMYT MET, which is part of the Drought Tolerant Maize for Africa Project of CIMMYT’s Global Maize Breeding Program. Because this MET has two management conditions and adjusts grain yield by anthesis date, it is ideal for demonstrating the power of this program. We also present the analysis of an RCBD data set, in which 16 genotypes were evaluated at 36 locations across four countries in Africa. In this case, we grouped the locations by country instead of by management condition to show the flexibility of the program.


MATERIALS AND METHODS

Experimental Data

The drought tolerance data set consisted of 100 genotypes that were evaluated in five environments using an α-lattice design with two replicates per location. Three environments were drought stressed and two were managed under optimal water conditions. When applicable, grain yield was adjusted by anthesis date. Data on grain yield, anthesis date, anthesis to silking interval, and number of plants at harvest per plot were collected from all five locations and were included in the analysis. Ear height and plant height were collected at only four locations and were also analyzed, to show that the program excludes from its analyses any location at which all values for a trait are missing.

The RCBD data set looked at 16 genotypes grown with four replicates at 36 locations in four countries. Data on the traits of days to tasselling, plant height, moisture, and grain yield were taken at all locations, while data on stalk lodging were taken at 28 locations. To show that analyses can be grouped in other ways besides management, this analysis was grouped by country. We did not adjust by a covariate because these trials were grown under well-watered conditions.

The META Suite

The META suite consists of 33 associated SAS programs. It has been optimized for use with SAS version 9.2, but is compatible with version 9.1 (it was not tested with earlier versions). A flow diagram of the main options offered by META is depicted in Fig. 1. The user can run the entire suite of programs using the program META Menus, which has a menu-driven user interface. The user does not need to modify the SAS code; all options are chosen and the data read in through a graphical user interface. Although no changes are required, advanced users may wish to make some modifications; full details of common changes are provided in the user’s manual (provided in the supplemental materials) and as comments in the program’s code. Instructions on formatting data for META are also provided in the user’s manual.

Fig. 1.

Flow diagram of the options offered by META, showing the logic of how META works. The bold numbers show the code that the user types into the on-screen menu to select that option. The first menu, Select Design, allows the user to tell META whether a lattice design or an RCBD was used. Next, the Select Covariate menu tells META whether adjustment by a covariate is desired. One of four data entry programs is then opened, depending on the design and covariate options the user selected. The user is then prompted to enter the location of the programs, tell META the names of variables such as the main response variable, and state where the results should be saved; full details of the Data Entry menu are available in the user’s manual. Next, the user decides whether to visualize the raw data or analyze the data. There are three options for data visualization: in brief, 01 and 03 generate boxplots, 02 and 04 generate frequency histograms, and 00 returns to the Visualize or Analyze menu. For data analysis, there are seven options that correspond to the type of analysis the user wishes to do. These options are explained in detail in the user’s manual.

 

Running the META Suite

To run META, the driver program called META Menus must be opened in SAS and run. This launches a series of menus in which the user tells the programs what type of analysis to run (Fig. 1). First, the Select Design menu launches; this allows the user to select the type of experimental design used: lattice or RCBD (Fig. 2a). Next, the Select Covariate menu launches; it allows the user to choose analysis with or without a covariate (Fig. 2b). The Data Entry menu then has the user input details about the data set, such as where the SAS programs are saved and what the MRV is called (Fig. 2c). If errors occur when entering data-related information in (i) the input path that specifies where the programs are located, (ii) the output path for storing the results, (iii) the input file name, or (iv) any of the names of factors and variables requested, the program displays an error message (Fig. 2d). When the user presses Enter, the Data Entry menu reappears and errors can be corrected before moving to the next menu. Once the data are entered correctly, a menu launches that allows the user to choose between data visualization and data analysis (Fig. 2e). The data visualization submenu has three options: boxplots, frequency histograms, and return to the previous menu (Fig. 2f). The data analysis submenu has seven options. The submenus for the four combinations of design and covariate (lattice or RCBD, each with or without a covariate) are slightly different: the first digit of the option code changes, with 11 to 17 indicating a lattice design with a covariate, 21 to 27 an RCBD with a covariate, 31 to 37 a lattice design without a covariate, and 41 to 47 an RCBD without a covariate.
The suboptions are the same for each design type and include: 1, genetic correlations among locations; 2, BLUE and BLUP analysis by location without identifying the management type; 3, BLUE and BLUP analysis by location but sorted by management; 4, BLUEs and BLUPs combined by management type; 5, BLUEs and BLUPs combined across all locations; 6, all suboptions 1 to 5 run in a single step; and 7, exit META (Fig. 2g). Details of the data visualization and data analysis options are provided below. Boxplots or histograms can be printed for data visualization. Boxplots are printed for all locations by trait, while frequency histograms are printed by location, trait, and replicate, using the SAS procedures PROC Boxplot and PROC Univariate, respectively (Fig. 2f). The histogram option also prints basic statistics (mean, median, mode, standard deviation, variance, range, and interquartile range) and the extreme observations (the five lowest and five highest), which help detect mistakes or outliers.
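
The two-digit Analyze Data codes follow a simple pattern: the first digit selects the design and covariate combination, and the second selects the suboption. The hypothetical Python helper below (not part of META, which is written in SAS) decodes a code according to the numbering described above; the dictionary names and the function name are our own.

```python
# First digit of an Analyze Data code -> (design, adjusted by a covariate?)
DESIGNS = {
    1: ("lattice", True),
    2: ("RCBD", True),
    3: ("lattice", False),
    4: ("RCBD", False),
}

# Second digit -> suboption, as listed in the text
SUBOPTIONS = {
    1: "genetic correlations among locations",
    2: "BLUEs/BLUPs by location",
    3: "BLUEs/BLUPs by location, sorted by management",
    4: "BLUEs/BLUPs combined by management",
    5: "BLUEs/BLUPs combined across all locations",
    6: "suboptions 1 to 5 in a single step",
    7: "exit META",
}


def decode_option(code: int) -> tuple:
    """Split a code such as 13 into (design, covariate flag, suboption)."""
    design_digit, sub_digit = divmod(code, 10)
    if design_digit not in DESIGNS or sub_digit not in SUBOPTIONS:
        raise ValueError(f"not a valid Analyze Data code: {code}")
    design, covariate = DESIGNS[design_digit]
    return design, covariate, SUBOPTIONS[sub_digit]


print(decode_option(13))
print(decode_option(45))
```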

Fig. 2.

Menus for the SAS programs: (a) the Select Design menu is launched by opening META Menus and running the program, and the user chooses whether a lattice or randomized complete block design (RCBD) was used; (b) the Select Covariate menu allows the user to tell META whether the main response variable should be adjusted by a covariate; (c) the Data Entry menu, which differs slightly depending on the experimental design (lattice or RCBD, with or without a covariate): for this example of a lattice design with a covariate, the user is prompted to enter details about the files for analysis, including where the SAS META programs are saved, where the results should be stored, the name of the data file, the names of the variables to be analyzed, and the names of the factors to be used; (d) an error message sent by the META menu program if errors have occurred when filling in the Data Entry menu; (e) the Visualize or Analyze menu allows the user to decide whether to visualize the raw data or analyze them, with the same options for each experimental design with or without a covariate, although the codes used to select the options vary slightly; (f) the Visualize Raw Data menu allows the user to choose between boxplots and frequency histograms to visualize the data, again with the same options for each design but slightly different codes (once data visualization is done, typing 00 returns the user to the Visualize or Analyze menu); (g) the Analyze Data menu allows the user to select how to analyze the data using the options described in the menu, again with the same options for each design but slightly different codes.

 

Five types of analyses can be performed for data analysis. Using the options shown in Fig. 2g as an example, when locations are analyzed individually (Options 12 and 13), the results for every genotype in every location are printed. When the analysis is combined across locations (Options 11, 14, and 15), any location with a heritability less than the user-defined threshold (default = 0.05) will not be used in the combined analysis across locations. The default threshold (0.05) was used for the sample analyses described here. Options 11, 14, and 15 group together different locations, so the first step in these programs is to calculate the heritability for each location and delete any locations that fall below the threshold.
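
The screening step that starts Options 11, 14, and 15 amounts to a filter on per-location heritability. The Python sketch below reproduces that logic on the yield heritabilities in Table 1, using the default threshold of 0.05 and flagging excluded locations with –999 as META's output does; the function name and location labels are our own, hypothetical choices.

```python
H2_THRESHOLD = 0.05  # default user-defined threshold


def screen_locations(h2_by_location, threshold=H2_THRESHOLD):
    """Split locations into those kept for combined analyses and those flagged -999."""
    kept = [loc for loc, h2 in h2_by_location.items() if h2 >= threshold]
    flagged = {loc: -999 for loc, h2 in h2_by_location.items() if h2 < threshold}
    return kept, flagged


# Yield heritabilities from Table 1 (lattice data adjusted by anthesis date)
table1_h2 = {
    "Kenya": 0.46,
    "Tlalti_Optim": 0.70,
    "Tlalti_Stress": 0.58,
    "Zimba_Optim": 0.55,
    "Zimba_Stress": 0.02,
}
kept, flagged = screen_locations(table1_h2)
print(kept)     # ['Kenya', 'Tlalti_Optim', 'Tlalti_Stress', 'Zimba_Optim']
print(flagged)  # {'Zimba_Stress': -999}
```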

The corresponding linear models are implemented in PROC Mixed of SAS, using REML to estimate the variance components. For analyses of individual locations using a lattice design and adjusting by a covariate, using the same term names as in the SAS programs, the model is

$$Y_{ijk} = \mu + \mathrm{Rep}_i + \mathrm{Block}_{j(\mathrm{Rep}_i)} + \mathrm{Gen}_k + \mathrm{Cov} + \varepsilon_{ijk}$$

where $Y$ is the trait of interest, $\mu$ is the mean effect, $\mathrm{Rep}_i$ is the effect of the ith replicate, $\mathrm{Block}_{j(\mathrm{Rep}_i)}$ is the effect of the jth incomplete block within the ith replicate, $\mathrm{Gen}_k$ is the effect of the kth genotype, Cov is the effect of the covariate, and $\varepsilon_{ijk}$ is the error associated with the ith replicate, jth incomplete block, and kth genotype, which is assumed to be normally and independently distributed with mean zero and homoscedastic variance $\sigma^2$. When calculating the BLUEs, both the genotypes and the covariate are considered fixed effects, whereas all other terms are declared random effects; for calculating the BLUPs and broad-sense heritability, all effects are considered random except the covariate.

For individual analyses using an RCBD and adjusting by a covariate, the corresponding model becomes

$$Y_{ik} = \mu + \mathrm{Rep}_i + \mathrm{Gen}_k + \mathrm{Cov} + \varepsilon_{ik}$$

where the replicates now correspond to the complete blocks and all other terms are as above. For individual analyses without adjusting by a covariate, the models are the same as above, except that the covariate term is deleted.

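For the analyses combined across management conditions or across all locations, new terms are added to the above models. For the lattice design adjusted by a covariate, the model is

$$Y_{ijkl} = \mu + \mathrm{Loc}_i + \mathrm{Rep}_{j(\mathrm{Loc}_i)} + \mathrm{Block}_{k(\mathrm{Loc}_i\mathrm{Rep}_j)} + \mathrm{Gen}_l + \mathrm{Loc}_i \times \mathrm{Gen}_l + \mathrm{Cov} + \varepsilon_{ijkl}$$

where the new terms $\mathrm{Loc}_i$ and $\mathrm{Loc}_i \times \mathrm{Gen}_l$ are the effects of the ith location and the location × genotype interaction, respectively. Again, for a combined analysis of an RCBD, the above model becomes

$$Y_{ijl} = \mu + \mathrm{Loc}_i + \mathrm{Rep}_{j(\mathrm{Loc}_i)} + \mathrm{Gen}_l + \mathrm{Loc}_i \times \mathrm{Gen}_l + \mathrm{Cov} + \varepsilon_{ijl}$$

Similarly, for the analyses without a covariate, the models for the lattice design and the RCBD, respectively, are

$$Y_{ijkl} = \mu + \mathrm{Loc}_i + \mathrm{Rep}_{j(\mathrm{Loc}_i)} + \mathrm{Block}_{k(\mathrm{Loc}_i\mathrm{Rep}_j)} + \mathrm{Gen}_l + \mathrm{Loc}_i \times \mathrm{Gen}_l + \varepsilon_{ijkl}$$

$$Y_{ijl} = \mu + \mathrm{Loc}_i + \mathrm{Rep}_{j(\mathrm{Loc}_i)} + \mathrm{Gen}_l + \mathrm{Loc}_i \times \mathrm{Gen}_l + \varepsilon_{ijl}$$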

Also, in these last four combined models, all the effects are considered random, with two exceptions. Genotype is a fixed effect when calculating BLUEs, and the covariate is always a fixed effect.

Broad-sense heritability of a given trait at an individual location is calculated as

$$H^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2/n_{\mathrm{reps}}}$$

where $\sigma_g^2$ and $\sigma_e^2$ are the genotype and error variance components, respectively, and $n_{\mathrm{reps}}$ is the number of replicates. For the combined analyses, the heritability is calculated as

$$H^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_{ge}^2/n_{\mathrm{locs}} + \sigma_e^2/(n_{\mathrm{locs}}\, n_{\mathrm{reps}})}$$

where the new term $\sigma_{ge}^2$ is the genotype × environment interaction variance component and $n_{\mathrm{locs}}$ is the number of locations in the analysis. In both cases, the heritability of a given trait at a location or across all locations is printed to a .csv file and to the screen. In the combined analyses, any location discarded because of low heritability (below the selected threshold) is flagged with the code –999 in the output (Table 1). The estimation of broad-sense heritability (repeatability) provides good insight into the quality of a breeding program for traits and environments that are well known.
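
Both heritability formulas are easy to check against the tables. The Python sketch below implements them directly; plugging in the rounded variance components reported in Table 1 (Kenya) and Table 5 (yield) reproduces the printed heritabilities to two decimals. The function names are our own.

```python
def h2_single(var_g, var_e, n_reps):
    """Broad-sense heritability at one location."""
    return var_g / (var_g + var_e / n_reps)


def h2_combined(var_g, var_ge, var_e, n_locs, n_reps):
    """Broad-sense heritability across locations."""
    return var_g / (var_g + var_ge / n_locs + var_e / (n_locs * n_reps))


# Kenya, Table 1: genotype variance 0.14, residual variance 0.33, 2 replicates
print(round(h2_single(0.14, 0.33, 2), 2))  # 0.46
# Yield across locations, Table 5: 0.24, 0.59, 1.27, 4 locations, 2 replicates
print(round(h2_combined(0.24, 0.59, 1.27, 4, 2), 2))  # 0.44
```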


Table 1.

Estimated genotype and residual variance components, calculated using restricted maximum likelihood, for grain yield (Mg ha−1) at five different locations for the sample lattice data set adjusted by anthesis date. Heritability was calculated as genotype variance divided by the variance of genotype means (genotype variance + residual variance/number of replications).

 
Statistic           Kenya   Tlaltizapan, Mexico    Zimbabwe
                            Optimum   Stressed     Optimum   Stressed
CodeForOut†                                                  –999
Replications        2       2         2            2         2
Genotype variance   0.14    0.87      0.38         1.98      0.00
Residual variance   0.33    0.75      0.55         3.22      0.07
Heritability        0.46    0.70      0.58         0.55      0.02
† –999 indicates that location was dropped from further analyses because the estimated heritability was below the user-defined threshold (0.05).

Suboption 1 calculates phenotypic and genotypic correlations between locations, from which a distance matrix is calculated as the identity matrix (matrix with 1s on the diagonal and 0s in every other position) minus the genetic correlations matrix. Calculation of the genetic correlations matrix is explained below. The distance matrix is used as the input data set to create a dendrogram using PROC Cluster and PROC Tree and a biplot of principal component analysis (PCA) using PROC PrinComp. Both plots can be viewed directly on the screen or saved in a computer graphics metafile (.cgm), which can be imported into several Microsoft Office programs (PowerPoint, Word, Excel, etc.).
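
The distance matrix described above can be sketched in a few lines of Python, using the genetic correlations for yield in Table 2. Subtracting the correlation matrix from the identity matrix gives zeros on the diagonal and –r off the diagonal, which ranks location pairs in the same order as 1 – r: the most strongly correlated pair is the closest. This sketch only builds the matrix and finds the closest pair; the clustering itself is done in META by PROC Cluster and PROC Tree, whose settings are not reproduced here. The variable and function names are our own.

```python
locations = ["Kenya", "Tlalti_Optim", "Tlalti_Stress", "Zimba_Optim"]

# Genetic correlations for yield (lower triangle of Table 2), stored symmetrically
R = [
    [1.00, 0.37, 0.14, 0.00],
    [0.37, 1.00, 0.47, 0.46],
    [0.14, 0.47, 1.00, 0.13],
    [0.00, 0.46, 0.13, 1.00],
]


def identity_minus(matrix):
    """Return I - matrix for a square list-of-lists matrix."""
    n = len(matrix)
    return [[(1.0 if i == j else 0.0) - matrix[i][j] for j in range(n)] for i in range(n)]


D = identity_minus(R)

# The closest pair has the smallest off-diagonal entry (highest genetic correlation)
pairs = [
    (D[i][j], locations[i], locations[j])
    for i in range(len(locations))
    for j in range(i + 1, len(locations))
]
closest = min(pairs)
print(closest)
```

With these correlations the two Tlaltizapan environments form the closest pair, in line with the clustering by location discussed later.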

The phenotypic correlations of MRVs between locations are simple Pearson correlations between MRVs at the different location pairs, calculated using PROC Corr. The genetic correlations among locations are calculated using equations from Cooper et al. (1996):

$$r_{G(jj')} = \frac{\bar{\sigma}_{g(jj')}}{\overline{\sigma_{g(j)}\sigma_{g(j')}}}$$

where $\bar{\sigma}_{g(jj')}$ is the arithmetic mean of all pairwise genotypic covariances between environments j and j′, and $\overline{\sigma_{g(j)}\sigma_{g(j')}}$ is the arithmetic average of all pairwise geometric means among the genotypic variance components of the environments (Cooper et al., 1996).
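
A minimal Python version of this ratio is shown below. It takes the genotypic covariances and the genotypic variance pairs as inputs; in META those components come from REML fits in PROC MIXED, which the sketch does not reproduce, and the numbers used here are hypothetical. For a single pair of environments the averages reduce to one covariance and one geometric mean, giving the familiar cov/(σj σj′) form.

```python
from math import sqrt
from statistics import fmean


def genetic_correlation(covariances, variance_pairs):
    """Mean genotypic covariance divided by the mean geometric mean of variance pairs.

    covariances    -- pairwise genotypic covariances between environments
    variance_pairs -- matching (var_j, var_j') genotypic variance pairs
    """
    numerator = fmean(covariances)
    denominator = fmean(sqrt(v1 * v2) for v1, v2 in variance_pairs)
    return numerator / denominator


# Hypothetical single pair: covariance 0.25, variances 0.25 and 1.00
print(genetic_correlation([0.25], [(0.25, 1.00)]))  # 0.5
```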

Suboptions 2 to 5 calculate BLUEs and BLUPs for the MRV and BLUEs for all other variables present in the data set. They will also calculate the number of replicates, number of locations, location variance, genotypic variance, genotype × location variance, residual variance, grand mean, LSD, CV, and broad-sense heritability for all traits and locations in the individual analyses and for all traits across locations in the combined analyses. The different suboptions change how location and management are analyzed. In Suboptions 2 and 3, each location is analyzed individually, but in Suboption 3, locations are organized by management condition. This is accomplished by including a “by loc” statement in PROC Mixed. For Suboption 4, locations are combined by management type by including a “by management” statement in PROC Mixed. For Suboption 5, all locations are combined. Different mixed model equations are used for the different experimental designs, with or without adjusting by the covariate, as detailed above.
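
The difference among Suboptions 2 to 5 is essentially how the records are partitioned before a model is fitted to each group. The hypothetical Python sketch below mimics that partitioning with plain dictionaries; the record fields (`loc`, `management`), the site names, and the function name are assumptions for illustration. In META the grouping is done with BY statements in PROC Mixed (and, for Suboption 5, by pooling all locations).

```python
from collections import defaultdict

# Grouping key for each suboption (hypothetical field names)
GROUP_KEYS = {
    2: lambda rec: (rec["loc"],),                    # each location separately
    3: lambda rec: (rec["management"], rec["loc"]),  # each location, ordered by management
    4: lambda rec: (rec["management"],),             # locations pooled within management
    5: lambda rec: ("all locations",),               # everything pooled
}


def group_records(records, suboption):
    """Partition plot records into the groups that would each get one model fit."""
    key = GROUP_KEYS[suboption]
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return dict(sorted(groups.items()))


records = [
    {"loc": "Site A", "management": "optimum", "yield": 8.2},
    {"loc": "Site B", "management": "optimum", "yield": 9.1},
    {"loc": "Site C", "management": "stress", "yield": 4.9},
    {"loc": "Site D", "management": "stress", "yield": 5.3},
]
for suboption in (2, 3, 4, 5):
    print(suboption, list(group_records(records, suboption)))
```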

When the program calculates BLUEs for the MRV, only the covariance parameter estimates and Type 3 tests of fixed effects are printed (all other information from PROC Mixed is suppressed). The simple mean of the genotypic BLUEs is also calculated and serves as the grand mean. The LSD (α = 0.05) is calculated as LSD = t(1 – 0.05/2, dferror) × ASED, where t(1 – 0.05/2, dferror) is the 1 – α/2 quantile of Student’s t distribution with dferror degrees of freedom, 0.05 is the selected α level, dferror is the error degrees of freedom in the mixed model, and ASED is the average standard error of the differences for all pairwise comparisons between genotypes; ASED reflects the precision of the trial for that specific trait. The CV is calculated as CV = (ASED/grand mean) × 100. The CV is highly dependent on the level of the grand mean of the trial; experiments under drought stress often show high CV values simply because of a low grand mean. The program then calculates BLUPs for the MRV using the same models as above but with genotype now considered a random effect; covariance parameter estimates and Type 3 tests of fixed effects are printed. The BLUP for each genotype is the grand mean plus the predicted random effect for that genotype. Heritability is then calculated using the equations provided above.
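
The LSD, CV, and BLUP arithmetic can be written out as below. This Python sketch assumes that the standard errors of the pairwise differences and the critical t value are already available (PROC MIXED reports the former; the latter can be taken from a t table or, for example, from scipy.stats.t.ppf(0.975, df_error)). The BLUEs and standard errors are hypothetical, and t = 1.98 is only an approximation for about 120 error degrees of freedom; the function names are our own.

```python
from statistics import fmean


def lsd(t_crit, ased):
    """Least significant difference: critical t value times ASED."""
    return t_crit * ased


def cv_percent(ased, grand_mean):
    """CV as defined in the text: ASED relative to the grand mean, in percent."""
    return ased / grand_mean * 100


def blup(grand_mean, random_effect):
    """BLUP of a genotype: grand mean plus its predicted random effect."""
    return grand_mean + random_effect


genotype_blues = [4.2, 4.9, 3.7, 4.0]   # hypothetical BLUEs (Mg/ha)
se_differences = [0.80, 0.85, 0.78]     # hypothetical SEs of pairwise differences
grand_mean = fmean(genotype_blues)      # simple mean of the BLUEs
ased = fmean(se_differences)            # average SE of a difference
t_crit = 1.98                           # assumed t(0.975, df_error), df_error near 120

print(f"grand mean {grand_mean:.2f}, LSD {lsd(t_crit, ased):.2f}, "
      f"CV {cv_percent(ased, grand_mean):.1f}%")
print(f"BLUP for a genotype with predicted effect +0.15: {blup(grand_mean, 0.15):.2f}")
```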

Next, BLUEs, LSD, CV, and heritability are calculated for all other traits. The same equations as for the MRV are used, except no traits are adjusted by a covariate, and genotypes are always considered as fixed effects for BLUEs and as random effects for the calculation of heritability. Also, only BLUEs are calculated, not BLUPs. Covariance parameter estimates and Type 3 tests of fixed effects are printed. Heritability, LSD, and CV are calculated as above for the MRV. Finally, all estimates and statistics are printed to the screen and to a .csv file.


Results

Example 1: Sample Lattice Data Set, Drought Tolerance Data

For the drought tolerance data, boxplots and frequency histograms were generated to visualize the data and identify outliers. Most of the sample data met our expectations; for example, with anthesis date, the dates under stressed conditions were generally later than under optimal conditions (Fig. 3a). For ear height, however, we were able to identify an outlier that was probably a data recording error for the Tlaltizapan, Mexico, optimal conditions location (Fig. 3b). This data point was removed from later analyses.
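
A rough Python analogue of the summary that the histogram option prints, including the five lowest and five highest observations that make a value like this outlier easy to spot, is sketched below. The ear-height values are hypothetical (one deliberately mistyped as 1390), and Python's statistics.quantiles uses a different default quantile definition from PROC Univariate, so the interquartile range may differ slightly from SAS output. The function name is our own.

```python
from statistics import fmean, median, multimode, quantiles, stdev, variance


def describe(values, k=5):
    """Basic statistics plus the k lowest and k highest observations."""
    ordered = sorted(values)
    q1, _, q3 = quantiles(values, n=4)
    return {
        "mean": fmean(values),
        "median": median(values),
        "mode": multimode(values),  # all values if none repeats
        "sd": stdev(values),
        "variance": variance(values),
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
        "lowest": ordered[:k],
        "highest": ordered[-k:],
    }


# Hypothetical ear heights (cm) at one location; 1390 is a mistyped 139
ear_height = [128, 135, 139, 141, 133, 1390, 137, 145, 130, 142, 136, 140]
summary = describe(ear_height)
print("lowest:", summary["lowest"])
print("highest:", summary["highest"])
print("median:", summary["median"], "mean:", round(summary["mean"], 1))
```

A mean far above the median, together with a single extreme value in the "highest" list, is the kind of pattern that flags a recording error.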

Fig. 3.

Boxplots showing the distribution of observations (a) for anthesis date at five experimental locations (Kenya; Tlaltizapan, Mexico, optimum environment = Tlalti_Optim; Tlaltizapan, Mexico, stress = Tlalti_Stress; Zimbabwe optimum = Zimba_Optim; Zimbabwe stress = Zimba_Stress), and (b) for ear height at four experimental locations. Data for ear height were not taken at the Kenya location.

 

We analyzed the patterns of phenotypic and genetic correlations among locations (Table 2) and created a dendrogram of the location clusters (Supplemental Fig. 1a) and a PCA plot of the first two principal components of the distance matrix (Supplemental Fig. 1b). Although the original data set contained five locations, one location had a heritability below our threshold of 0.05 (Table 1), so the program deleted that location from all analyses that combined data across locations. Before adjustment for anthesis date, all locations had a heritability greater than the cutoff (0.05), so all locations were used in the combined analyses of the unadjusted data (Supplemental Table 4).


Table 2.

Phenotypic correlations (above the diagonal) and genetic correlations (below the diagonal) for grain yield between all location pairs in the sample lattice data set. Locations with heritability below 0.05 were excluded from this analysis.

 
Location                Kenya   Tlaltizapan,   Tlaltizapan,   Zimbabwe,
                                optimum        stressed       optimum
Kenya                   1       0.21           0.07           0.00
Tlaltizapan, optimum    0.37    1              0.30           0.29
Tlaltizapan, stressed   0.14    0.47           1              0.07
Zimbabwe, optimum       0.00    0.46           0.13           1

Next, the data were analyzed using Options 13, 14, and 15, because this was a lattice design in which yield was adjusted by anthesis date as a covariate. Given that 100 genotypes were analyzed, it would not be practical to present the results for every genotype; instead, the results for the first three genotypes and the summary statistics are shown as examples (Tables 3, 4, and 5). Because of space limitations, Table 3 includes only the results from the first two locations. Full results are available in Supplemental Tables 1 to 3. As shown in the tables, the program output is organized as follows: each column is a different trait, with two columns for the MRV (yield in this example) and one for each of the other traits. For each location and trait, the number of replications, estimates of variance components, grand mean, LSD, CV, and heritability are provided.


Table 3.

Results of an analysis of the sample lattice data done individually for each location. Only the results for the first three genotypes (B-1, B-105, and B-106) are shown. Best linear unbiased estimators (BLUEs) and best linear unbiased predictors (BLUPs) for yield are shown, as well as BLUEs for anthesis date in days after planting (AD), anthesis to silking interval (ASI), ear height (EH), and plant height (PH). The statistics listed for every trait are number of replications, genotype and residual variance calculated by restricted maximum likelihood, grand mean for the trait, CV, LSD, and broad-sense heritability.

 
Location / statistic    Genotype  Yield BLUE  Yield BLUP  AD       ASI     EH       PH
                                  (Mg ha−1)   (Mg ha−1)   (d)      (d)     (cm)     (cm)
Kenya                   B-1       4.23        4.07        68.51    2.13
Kenya                   B-105     4.89        4.29        70.52    1.47
Kenya                   B-106     3.65        3.84        71.46    3.6
  Replications                    2                       2        2
  Genotype variance               0.14                    0.75     0.34
  Residual variance               0.33                    1.08     0.85
  Grand mean                      3.99                    69.42    2.48
  LSD                             1.64                    2.31     2.23
  CV                              18.52                   1.51     40.86
  Heritability                    0.46                    0.58     0.45
Tlaltizapan, optimum    B-1       8.21        8.21        57.44    0       133.5    231.5
Tlaltizapan, optimum    B-105     8.35        7.86        60.39    –1      139.5    244.11
Tlaltizapan, optimum    B-106     9.11        8.55        59.55    0       153      245.53
  Replications                    2                       2        2       2        2
  Genotype variance               0.87                    1.12     0.11    6        65.71
  Residual variance               0.75                    0.72     0.72    5027.46  48.84
  Grand mean                      7.78                    58.18    –0.73   142.72   231.11
  LSD                             1.95                    2.1      1.87    156.06   19.29
  CV                              11.27                   1.64     –116.51 49.68    3.79
  Heritability                    0.7                     0.76     0.23    0        0.73

Table 4.

Results of an analysis of the sample lattice data combined by management conditions (optimum or drought). Only the results for the first three genotypes (B-1, B-105, and B-106) are shown. Best linear unbiased estimators (BLUEs) and best linear unbiased predictors (BLUPs) for yield are shown, as well as BLUEs for anthesis date in days after planting (AD), anthesis to silking interval (ASI), ear height (EH), and plant height (PH). The statistics listed are number of replications per location, number of locations, location, genotype, genotype × location, and residual variance components calculated by restricted maximum likelihood, grand mean for the trait, CV, LSD, and broad-sense heritability.

 
Management / statistic          Genotype  Yield BLUE  Yield BLUP  AD       ASI     EH       PH
                                          (Mg ha−1)   (Mg ha−1)   (d)      (d)     (cm)     (cm)
optimum                         B-1       9.39        9.51        62.00    0.00    124.00   225.00
optimum                         B-105     11.88       10.33       64.00    0.00    135.00   246.00
optimum                         B-106     9.79        9.63        63.00    0.00    148.00   254.00
  Replications                            2                       2        2       2        2
  Locations                               2                       2        2       2        2
  Location variance                       8.51                    34.27    0.12    168.62   0.00
  Genotype variance                       0.60                    0.71     0.04    65.53    54.92
  Location × genotype variance            0.76                    0.24     0.00    0.00     7.93
  Residual variance                       2.09                    1.22     1.01    2566.63  119.84
  Grand mean                              9.55                    62.32    –0.42   133.16   232.48
  LSD                                     2.81                    1.92     1.41    70.95    16.96
  CV                                      14.81                   1.56     –169.4  26.85    3.68
  Heritability                            0.40                    0.62     0.14    0.09     0.62
stress                          B-1       4.91        4.85        82.00    3.00    104.00   182.00
stress                          B-105     5.56        4.99        85.00    5.00    111.00   199.00
stress                          B-106     5.08        4.89        85.00    4.00    102.00   171.00
  Replications                            2                       2        2       2        2
  Locations                               2                       3        3       2        2
  Location variance                       5.69                    138.01   41.76   171.03   1602.79
  Genotype variance                       0.08                    0.79     0.00    19.61    12.98
  Location × genotype variance            0.19                    0.11     2.33    27.62    76.15
  Residual variance                       0.47                    2.50     4.57    62.46    98.18
  Grand mean                              4.83                    83.17    3.57    103.43   178.76
  LSD                                     1.40                    2.25     4.31    17.23    24.71
  CV                                      14.58                   1.37     61.21   8.40     6.97
  Heritability                            0.27                    0.64     0.00    0.40     0.17

Table 5.

Results of an analysis of the sample lattice data combined across all locations. Only the results for the first three genotypes (B-1, B-105, and B-106) are shown. Best linear unbiased estimators (BLUEs) and best linear unbiased predictors (BLUPs) for yield (Mg ha−1) are shown, as well as BLUEs for anthesis date in days after planting (AD), anthesis to silking interval (ASI), ear height (EH), and plant height (PH). The statistics listed are number of replications per location, number of locations, location, genotype, genotype × location, and residual variance components calculated by restricted maximum likelihood, grand mean for the trait, CV, LSD, and broad-sense heritability.

 
Genotype   BLUE for yield   BLUP for yield   AD      ASI    EH       PH
           (Mg ha−1)        (Mg ha−1)        (d)     (d)    (cm)     (cm)
B-1        7.15             7.18             74.00   2.00   113.00   203.00
B-105      8.72             7.81             76.00   3.00   123.00   223.00
B-106      7.41             7.28             77.00   2.00   124.00   212.00

Statistic                      Yield   AD       ASI     EH        PH
Replications                   2       2        2       2         2
Locations                      4       5        5       4         4
Location variance              8.52    207.32   25.59   414.53    1484.07
Genotype variance              0.24    0.75     0.00    34.53     41.87
Location × genotype variance   0.59    0.14     1.31    4.81      36.66
Residual variance              1.27    2.00     3.20    1207.78   110.44
Grand mean                     7.19    74.81    1.99    118.27    205.71
LSD                            1.63    1.55     2.52    36.21     14.37
CV                             11.48   1.05     64.48   15.56     3.55
Heritability                   0.44    0.77     0.00    0.18      0.65
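The heritability values in Table 5 are consistent with the usual entry-mean broad-sense heritability for a combined analysis, H² = σ²_G / (σ²_G + σ²_GL/l + σ²_e/(r·l)), where σ²_G, σ²_GL, and σ²_e are the genotype, location × genotype, and residual variance components, r is the number of replications per location, and l is the number of locations. This section does not state the formula META uses, so treat it as an assumption; the minimal sketch below (function name hypothetical) recomputes the heritability row from the variance components reported in Table 5.

```python
def broad_sense_heritability(var_g, var_gl, var_e, reps, locs):
    """Entry-mean broad-sense heritability for a combined analysis (assumed formula)."""
    # Phenotypic variance of a genotype mean across locs locations with reps replications each.
    denom = var_g + var_gl / locs + var_e / (reps * locs)
    return var_g / denom


# (genotype, location x genotype, residual variance, replications, locations)
# as reported in Table 5.
table5 = {
    "Yield": (0.24, 0.59, 1.27, 2, 4),
    "AD": (0.75, 0.14, 2.00, 2, 5),
    "ASI": (0.00, 1.31, 3.20, 2, 5),
    "EH": (34.53, 4.81, 1207.78, 2, 4),
    "PH": (41.87, 36.66, 110.44, 2, 4),
}

for trait, components in table5.items():
    print(trait, round(broad_sense_heritability(*components), 2))
```

Rounded to two decimals, these values (0.44, 0.77, 0.00, 0.18, 0.65) match the heritability row of Table 5.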

Example 2: Sample Randomized Complete Block Design Data Set

The sample RCBD data set differed from the lattice data set in that it evaluated a subset of the genotypes across more locations. Because it includes many locations, the sample RCBD data set illustrates the program's ability to cluster locations by genetic distance. Three locations behaved differently from the others: Gwoza, Tsafe, and Zuru formed a separate cluster, and Tsafe and Zuru grouped together in both the cluster analysis and the principal component analysis (PCA) (Fig. 4). This analysis could not determine why these locations are so distinct. Full results for each trait are available in Supplemental Tables 9 to 13 and Supplemental Fig. 2a and 2b.

Fig. 4.

(a) Dendrogram of a cluster analysis of the genetic correlations for yield between locations in the sample randomized complete block design (RCBD) data set, and (b) plot of the first two principal components from a principal component analysis of the same genetic correlations.
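The clustering of locations described above can be sketched as agglomerative clustering of a genetic correlation matrix. This section does not state which distance measure or linkage method META uses; this minimal sketch assumes a distance of 1 − r between two locations with genetic correlation r and average linkage (UPGMA), uses only the Python standard library, and works on four hypothetical locations with made-up correlations. The function name is also hypothetical.

```python
from itertools import combinations


def average_linkage(names, corr):
    """Agglomerative clustering of locations with average linkage (UPGMA).

    Assumes the distance between two locations is 1 - r, where r is their
    genetic correlation. Returns the merges in order as
    (members_a, members_b, distance) tuples, i.e., the dendrogram heights.
    """
    n = len(names)
    dist = [[1.0 - corr[i][j] for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # each location starts in its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
            d = sum(pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append(([names[i] for i in clusters[a]],
                       [names[i] for i in clusters[b]],
                       round(d, 4)))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical genetic correlations for yield among four locations.
names = ["L1", "L2", "L3", "L4"]
corr = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
for step in average_linkage(names, corr):
    print(step)
```

Here L1 and L2 merge first, then L3 and L4, and the two pairs join only at a much larger distance (0.8). A group of locations that stands apart from the rest, as Gwoza, Tsafe, and Zuru do in Fig. 4, would show up in the same way as a late, high merge in the dendrogram.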

 


Discussion

Sample Lattice Data

Analysis using the META suite of SAS programs revealed some interesting patterns in our data. For example, although we might expect environments to cluster by management, they instead clustered most strongly by location (Supplemental Fig. 1a and 1b). The analysis also showed the value of adjusting yield for anthesis date, because the adjustment changed the yield estimates. For the combined analysis across all locations, the BLUP for the highest-yielding genotype (B-292) was 8.13 Mg ha−1 with adjustment and 6.40 Mg ha−1 without it (Supplemental Tables 4 and 9). The analysis by management shows that the adjustment had its largest effect under stressed conditions. For the same genotype, the BLUPs for yield with and without the covariate were 10.78 vs. 10.69 Mg ha−1 under optimal conditions and 5.11 vs. 3.41 Mg ha−1 under stressed conditions. This agrees with previous work showing that adjusting for anthesis date is important when analyzing data from water-stressed environments (Banziger et al., 2004).
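The effect of adjusting yield for anthesis date can be illustrated with a simplified fixed-effects analysis of covariance: each genotype's mean yield is shifted to the overall mean anthesis date using a regression slope pooled within genotypes. This is a minimal sketch, not META's model (this section does not detail how the covariate enters META's PROC MIXED analyses). The function name and the plot data are hypothetical, constructed so that a later-flowering genotype looks worse before adjustment.

```python
def covariate_adjusted_means(data):
    """Adjust genotype means of a response to the overall covariate mean.

    data maps genotype -> list of (covariate, response) plot values, e.g.,
    (anthesis date in days, yield in Mg ha-1). The slope is the pooled
    within-genotype regression of response on covariate (fixed-effects ANCOVA).
    """
    sxx = sxy = 0.0
    all_x = []
    means = {}
    for genotype, plots in data.items():
        xs = [x for x, _ in plots]
        ys = [y for _, y in plots]
        x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
        # Accumulate within-genotype sums of squares and cross-products.
        sxx += sum((x - x_bar) ** 2 for x in xs)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in plots)
        all_x.extend(xs)
        means[genotype] = (x_bar, y_bar)
    slope = sxy / sxx
    x_grand = sum(all_x) / len(all_x)
    # Move each genotype mean along the common slope to the grand covariate mean.
    adjusted = {g: y_bar - slope * (x_bar - x_grand)
                for g, (x_bar, y_bar) in means.items()}
    return slope, adjusted


# Hypothetical plots: (anthesis date, yield) for two genotypes.
plots = {
    "G1": [(60, 6.0), (62, 5.6)],  # early flowering
    "G2": [(66, 5.8), (68, 5.4)],  # late flowering (hypothetically more drought exposure)
}
slope, adjusted = covariate_adjusted_means(plots)
print(round(slope, 3), {g: round(v, 2) for g, v in adjusted.items()})
```

The raw means favor G1 (5.8 vs. 5.6), but after adjustment to a common anthesis date the ranking reverses (about 6.2 for G2 vs. 5.2 for G1), which is the kind of shift that makes the adjustment matter under stress.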

The program calculates both BLUEs and BLUPs for the MRV (yield in our examples); which of the two to use has been debated extensively in the literature (Piepho and Mohring, 2006; Smith et al., 2005). If the data are balanced and orthogonal, BLUPs and BLUEs rank genotypes identically, because the BLUPs are then the BLUEs shrunk toward the grand mean by a common factor; however, METs are rarely balanced, especially when a lattice design is used. The choice of estimator can therefore make a real difference: in the drought data set, the BLUP and BLUE rankings for yield are identical down to the sixth-ranked genotype and first differ at the seventh, although the yield differences involved are within the LSD.
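To see why balance makes the two estimators agree on rankings: for a balanced combined analysis with genotype, location × genotype, and residual variance components, the BLUP of a genotype equals μ + H²(BLUE − μ), where μ is the grand mean and H² is the entry-mean heritability. Every deviation from μ is multiplied by the same factor, so the order cannot change. This is a textbook result for balanced data, used here as an assumption; the sketch is not presented as META's computation. The function names and the three BLUEs are hypothetical, and the variance components are borrowed from the yield column of Table 5 only for illustration.

```python
def entry_mean_heritability(var_g, var_gl, var_e, reps, locs):
    """Broad-sense heritability on a genotype-mean basis (balanced data)."""
    return var_g / (var_g + var_gl / locs + var_e / (reps * locs))


def blups_from_blues(blues, var_g, var_gl, var_e, reps, locs):
    """Shrink balanced-data genotype means (BLUEs) toward the grand mean."""
    mu = sum(blues.values()) / len(blues)  # grand mean under balance
    h2 = entry_mean_heritability(var_g, var_gl, var_e, reps, locs)
    return {g: mu + h2 * (b - mu) for g, b in blues.items()}


def ranking(estimates):
    """Genotypes ordered from highest to lowest estimate."""
    return sorted(estimates, key=estimates.get, reverse=True)


# Hypothetical BLUEs (Mg ha-1); variance components, r = 2, and l = 4
# borrowed from the yield column of Table 5 purely for illustration.
blues = {"G1": 8.0, "G2": 7.0, "G3": 6.0}
blups = blups_from_blues(blues, 0.24, 0.59, 1.27, reps=2, locs=4)
print({g: round(v, 2) for g, v in blups.items()})
print(ranking(blues) == ranking(blups))  # same order, smaller spread
```

With unbalanced data, such as a lattice with missing plots, genotypes are measured with different precision, the amount of shrinkage differs among them, and BLUP and BLUE rankings can diverge, as they do in the drought data set.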

Sample Randomized Complete Block Design Data

In the RCBD data set, clustering based on the genetic correlations for grain yield between locations separated three locations from all the others into their own cluster. All three are in Nigeria; however, the other Nigerian locations did not form a distinct group and were mixed with other locations in one large cluster. We were unable to identify the cause of this clustering. When we selected the top 20% of genotypes for yield, we obtained the same genotypes whether we used BLUPs or BLUEs, because the data were balanced within each trait and replicated across many locations. Within individual countries, however, BLUEs and BLUPs identified different best-yielding genotypes, so the list of highest-yielding genotypes for each country depends on which estimator is used.
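The comparison of top-20% selections can be repeated outside META once BLUEs and BLUPs have been exported. A minimal sketch, assuming the estimates are available as dictionaries keyed by genotype; the function names and the five genotypes are hypothetical, and the number selected is rounded to the nearest whole genotype.

```python
def top_fraction(estimates, fraction=0.2):
    """Return the set of genotypes in the top `fraction` by estimated yield."""
    k = max(1, int(fraction * len(estimates) + 0.5))  # round to nearest count
    ranked = sorted(estimates, key=estimates.get, reverse=True)
    return set(ranked[:k])


def selection_agreement(blue, blup, fraction=0.2):
    """Share of genotypes selected on BLUEs that are also selected on BLUPs."""
    by_blue = top_fraction(blue, fraction)
    by_blup = top_fraction(blup, fraction)
    return len(by_blue & by_blup) / len(by_blue)


# Hypothetical yield estimates (Mg ha-1) for five genotypes.
blue = {"G1": 6.1, "G2": 5.9, "G3": 5.2, "G4": 5.0, "G5": 4.8}
blup = {"G1": 5.8, "G2": 5.7, "G3": 5.0, "G4": 5.05, "G5": 4.9}
print(selection_agreement(blue, blup, 0.4))  # top 2 agree
print(selection_agreement(blue, blup, 0.6))  # top 3 differ by one genotype
```

Applying the same functions to the subset of estimates for one country gives a per-country comparison; agreement for the whole set does not guarantee agreement within each subset.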

Value of the Program

The META suite is a valuable tool for plant breeders because it allows them to rapidly obtain, from METs, phenotypic and genetic correlations between locations, BLUEs and BLUPs for an MRV, BLUEs for all other traits, heritability, LSD, and CV. It handles common designs with or without a covariate, and the user’s manual explains how to expand the program quickly to accommodate multiple covariates. BLUEs, BLUPs, and adjustment by a covariate can also be obtained for any trait by selecting a menu option. Despite all these options, no changes need to be made to the SAS code; everything is run through a menu-driven interface. This flexibility, power, and ease of use make META a useful addition to a breeder’s toolbox. The SAS code for META and the user’s manual are available for free download as supplemental material.

Acknowledgments

We greatly appreciate the dedication and detailed work of the associate editor, Dr. Nicolas Martin, and of the anonymous reviewers, whose constructive, precise, and positive contributions significantly improved the quality of the manuscript.

 

References



