Figure 1.

A. Left: Sequence of steps by which imputation methods would be applied in practice, from the selection of a reference panel through the use of imputed marker scores in association studies. Right: Parallel view of the simulation process used to verify that this approach might be fruitful. The conceptual experimental population in which imputation would be performed is in beige, with the reference panel selected from it in red. Key steps common to the real and simulation processes are highlighted in blue. Simulation steps that allow verification of imputation accuracy are in green. Steps in the real process that would benefit from further research are in red. B. Illustration of a small portion of the structure of reference and experimental panels for the imputation of markers that have not been scored on experimental lines. Markers (S01 to S17) are in columns and barley lines in rows. Markers in gray have not been scored in the experimental panel (denoted by line names “Exp_##”) and will be imputed on the basis of information in the reference panel (denoted by line names “Ref_##”) and of tag marker (S01, S03, S05, S09, S12, and S15) data in the experimental panel.

 


Figure 2.

Lines of the barley CAP core plotted according to the first two eigenvectors determined by principal component analysis. The left and right ovals surround, respectively, the 2- and 6-row subpopulations used as parents in simulating populations for evaluation of imputation accuracy.

 


Figure 3.

Fraction of masked marker scores correctly imputed by fastPHASE as a function of the number of subpopulations assumed in the barley CAP core and the number of haplotype clusters modeled in the analysis. Note that the best and worst correct fractions differ by less than 1%.

 


Figure 4.

Probability density function of the distribution across markers of imputation correctness when marker scores missing at random were imputed by fastPHASE. Each point in the distribution represents, for one marker, the frequency with which that marker's scores were correctly imputed.
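The per-marker quantity summarized in this distribution can be illustrated with a minimal sketch. It assumes a lines-by-markers layout and the hypothetical names `true_scores`, `imputed_scores`, and `masked`; none of these come from the original analysis, which used fastPHASE output directly.

```python
import numpy as np

def per_marker_correct_fraction(true_scores, imputed_scores, masked):
    """Fraction of masked scores imputed correctly, one value per marker.

    All inputs are arrays of shape (lines, markers); this layout is an
    assumption made for illustration. `masked` is True where a score was
    hidden before imputation. Markers with no masked scores get NaN.
    """
    true_scores = np.asarray(true_scores)
    imputed_scores = np.asarray(imputed_scores)
    masked = np.asarray(masked, dtype=bool)

    # A score counts as correct only if it was masked and recovered exactly.
    correct = (true_scores == imputed_scores) & masked
    n_masked = masked.sum(axis=0)
    n_correct = correct.sum(axis=0)

    fractions = np.full(masked.shape[1], np.nan)
    has_masked = n_masked > 0
    fractions[has_masked] = n_correct[has_masked] / n_masked[has_masked]
    return fractions
```

A density estimate or histogram of the returned values across markers would then give a curve of the kind shown in the figure.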

 


Figure 5.

A. Fraction of non-tag markers predicted with a prediction r² greater than or equal to each value in a range of prediction r². Prediction r² calculated for: Solid line – fastPHASE imputing markers based on Haploview-selected tag SNPs; Dashed line – fastPHASE imputing markers based on randomly selected tag SNPs; Dotted line – Haploview-determined proxy tests. B. Fraction of non-tag markers predicted with a prediction r² greater than or equal to each value in a range of prediction r². Prediction r² calculated for: Solid line – 95% of markers retained as tags; Dotted line – greedy algorithm (de Bakker et al., 2005) used with tag selection r² set to 0.4; Dashed line – bestN algorithm (de Bakker et al., 2005) used with tag selection r² set to 0.8. C. Fraction of non-tag markers predicted with a prediction r² greater than or equal to each value in a range of prediction r². Prediction r² calculated for: Solid line, no symbols – 20% of markers randomly selected as tags; Circles – ten supplemental markers per chromosome; Crosses – twenty supplemental markers per chromosome; Solid lines with symbols – supplemental markers chosen because fastPHASE predicted them poorly; Dotted lines with symbols – supplemental markers chosen by the bestN algorithm (de Bakker et al., 2005) with tag selection r² set to 0.8. Corresponding solid and dotted vertical arrows indicate the increase in the fraction of markers predicted with a prediction r² of at least 0.8 when ten markers were added per chromosome. D. Fraction of non-tag markers predicted with a prediction r² greater than or equal to each value in a range of prediction r². Prediction r² calculated for all OPA markers when the DArT markers in the dataset were used as tag markers.
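The curves in all four panels share one construction: for each threshold, the fraction of markers whose prediction r² meets or exceeds it. The sketch below illustrates that calculation; it assumes the per-marker prediction r² values have already been computed, and the function name `fraction_at_or_above` is hypothetical rather than taken from the original analysis.

```python
import numpy as np

def fraction_at_or_above(r2_values, thresholds):
    """For each threshold, the fraction of markers with r2 >= threshold.

    `r2_values` holds one prediction r2 per non-tag marker (assumed to be
    precomputed); `thresholds` is the range of r2 values to evaluate.
    """
    r2 = np.asarray(r2_values, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    # Compare every marker against every threshold, then average over markers.
    return (r2[:, None] >= thresholds[None, :]).mean(axis=0)
```

Evaluating the function at a threshold of 0.8 for two tag sets, and taking the difference, gives the kind of gain that the vertical arrows in panel C mark.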

 


Figure 6.

When Haploview and fastPHASE disagreed in their marker score imputation, the fraction of correct Haploview imputations was predicted by multiple regression (see Materials and Methods). Each point in the graph represents one marker. The line is the linear regression of observed on predicted Haploview correct fraction.