# Improving Genetic Gain with Genomic Selection in Autotetraploid Potato

The Plant Genome, Original Research, Vol. 9 No. 3 (Open Access)

Accepted: Mar 23, 2016
Published: August 18, 2016

doi:10.3835/plantgenome2016.02.0021

Anthony T. Slater (a, *), Noel O.I. Cogan (a), John W. Forster (a, b), Benjamin J. Hayes (a, b), and Hans D. Daetwyler (a, b)

(a) AgriBio, Centre for AgriBioscience, Department of Economic Development, Jobs, Transport and Resources, Bundoora, Vic. 3083, Australia
(b) La Trobe University, Bundoora, Vic. 3086, Australia
(*) Corresponding author

Core Ideas:
• Progress in conventional potato breeding can be complex and slow.
• Genomic selection can be applied to autotetraploid potato.
• A number of factors that will affect genomic selection need to be considered.
• Genomic selection will accelerate genetic gain in potato.

## Abstract

Potato (Solanum tuberosum L.) breeders consider a large number of traits during cultivar development and progress in conventional breeding can be slow. There is accumulating evidence that some of these traits, such as yield, are affected by a large number of genes with small individual effects. Recently, significant efforts have been applied to the development of genomic resources to improve potato breeding, culminating in a draft genome sequence and the identification of a large number of single nucleotide polymorphisms (SNPs). The availability of these genome-wide SNPs is a prerequisite for implementing genomic selection for improvement of polygenic traits such as yield. In this review, we investigate opportunities for the application of genomic selection to potato, including novel breeding program designs. We have considered a number of factors that will influence this process, including the autotetraploid and heterozygous genetic nature of potato, the rate of decay of linkage disequilibrium, the number of required markers, the design of a reference population, and trait heritability. Based on estimates of the effective population size derived from a potato breeding program, we have calculated the expected accuracy of genomic selection for four key traits of varying heritability and propose that it will be reasonably accurate. We compared the expected genetic gain from genomic selection with the expected gain from phenotypic and pedigree selection, and found that genetic gain can be substantially improved by using genomic selection.

### Abbreviations

BLUP, best linear unbiased prediction; DR, double reduction; GBS, genotyping-by-sequencing; LD, linkage disequilibrium; MAS, marker-assisted selection; Ne, effective population size; QTL, quantitative trait locus; SNP, single nucleotide polymorphism

Potato is a significant and highly versatile human food crop produced in nearly 150 countries (Bradshaw and Ramsey, 2009) and is the major source of carbohydrate in the diets of hundreds of millions of people. Potatoes also supply significant amounts of protein, vitamins, and minerals (Bradshaw and Ramsey, 2009; Storey, 2007). These attributes have seen potato rapidly rise to be the fourth most important human food crop on a global basis, although it has only been 500 yr since its emergence from its center of origin in South America.

Since emerging, potato has been selected and bred for higher levels of local adaptation to various environments in production areas. This outcome was achieved relatively quickly because of the high genetic diversity of potato, allowing identification of genotypes that perform better under various conditions. Such genetic heterogeneity is a result of both the obligate outbreeding habit of its progenitor species and the additional genotypic variation arising from the autotetraploid chromosomal constitution of cultivated varieties (Slater et al., 2014a). Despite this, genetic gain in complex traits, such as yield, has been slow to nonexistent (Jansky, 2009), especially when compared to other crops such as maize (Zea mays L.), wheat (Triticum aestivum L.), and rice (Oryza sativa L.) (Fischer and Edmeades, 2010). The development of superior potato cultivars is demanding, as breeders must consider a large number of traits during cultivar development. A number of these traits are under simple genetic control, whereas others are under the influence of several to a large number of genetic effects (Slater et al., 2014a). Breeding for improvements in the simple traits is relatively straightforward but obtaining improvements in the more complex traits is more difficult.

A significant amount of effort has focused on developing genomic resources to improve potato breeding, which has culminated in a draft potato genome (Potato Genome Sequencing Consortium, 2011). This has enabled the identification of over 39,000 genes, including a number that control biotic stress resistance (Bakker et al., 2011; Jupe et al., 2012; Jupe et al., 2013) and quantitative trait loci (QTLs) for quality traits (D’hoop et al., 2014; Uitdewilligen et al., 2013). However, for traits such as yield, QTLs of large effect are unlikely, even though these traits exhibit reasonable levels of heritability (Slater et al., 2014b). A likely genetic architecture for these traits is many mutations of small effect, as observed for yield in rice and maize (Huang et al., 2010; Laurie et al., 2004). Genomic selection, where genome-wide SNPs that are in linkage disequilibrium (LD) with the many QTLs affecting yield and other traits are used to derive genomic estimated breeding values for these traits (Meuwissen et al., 2001), is now possible in potato, given the large number of SNPs discovered by genome sequencing (Uitdewilligen et al., 2013). Genomic selection is being successfully applied in a number of animal and plant improvement programs (Crossa et al., 2010; Daetwyler et al., 2010a; Grattapaglia et al., 2011; Lin et al., 2014; Resende et al., 2012; Riedelsheimer et al., 2012; VanRaden et al., 2009; Wiggans et al., 2011; Wolc et al., 2011, 2015). In this review, we investigate the feasibility of applying genomic selection to potato, including the extent of increase in genetic gain expected as a result of using genomic selection in novel breeding schemes.

### Improving Potato Breeding Models

Potato breeding is challenging, as ∼40 traits are considered during new cultivar development (Gebhardt, 2013). These traits can be broadly divided into yield-related and tuber quality characteristics, as well as tolerances to biotic and abiotic stresses (Slater et al., 2014a). Knowledge of the genetic control of each trait and the degree of environmental influence on the expression of the target traits is important and will influence which method should be used for selecting superior phenotypes and genotypes. Some traits are controlled by single genes, whereas others are under more complex control (Slater et al., 2014a). Breeding is made more challenging in potatoes than in some grain and forage crops, not only because of potato’s highly heterozygous and autotetraploid nature but also because more market-specific traits are considered. Many target traits such as yield, tuber number, tuber size, specific gravity, and processing quality can be highly affected by the growing environment, which can vary significantly (Jansky, 2009). Consequently, a conventional breeding scheme will involve screening genotypes across a number of clonal generations and in a number of appropriate locations for a range of desirable characters, which can take over 10 yr (Jansky, 2009).

Over recent years, significant advances have been made in understanding the genetics of potato to improve breeding for more rapid genetic gain. A conventional breeding program creates a large population, then employs phenotypic recurrent selection over a number of generations, using a progression of selection pressures to reduce the population size while concurrently increasing the number of plants under evaluation of each genotype (Bradshaw and Mackay, 1994; Jansky, 2009). To reduce the size of a breeding population, the majority of programs practice early-generation visual selection to enable a more thorough assessment of fewer clones in later trials. Prior to the mid-1980s, most potato breeding programs planted a very large number of seedlings and then used intensive selection to reduce the population to a manageable size. Since then, several studies have concluded that this practice is ineffective, leading to the elimination of superior as well as inferior genotypes (Anderson and Howard, 1981; Bradshaw and Mackay, 1994; Brown et al., 1988, 1984, 1987; Tai and Young, 1984), largely because the expression of visual traits is strongly influenced by seed tuber weight (Maris, 1986). Some programs that still practice intense phenotypic selection (Haynes et al., 2012) have failed to obtain any improvement in yield over time, despite 150 yr of breeding (Jansky, 2009), which is likely to be a result of intense selection for visual attributes that are not highly correlated with final field performance.

To overcome the low heritability of these visual traits, progeny tests have been employed to determine better parental combinations (Bradshaw, 2007). Progeny testing of all members of a full-sib family will determine the value of the parent for these traits, without knowledge of the location or number of genes that regulate the expression of the trait, which will be caused by the accumulation of many minor genetic effects. Progeny tests for multiple traits (visual preference, late blight tuber and foliar resistances, white potato cyst nematode resistance, and fry color for processing families) have been used by a Scottish breeding program (Bradshaw, 2007; Bradshaw et al., 2003), resulting in the identification of superior clones and parental types with improved disease resistance, as well as a 14% increase in yield (Bradshaw et al., 2009).

More recently, the use of pedigree information in the analysis has allowed the phenotypic values of all relatives (including full-sibs, half-sibs, and any other partial sibling) to be combined, maximizing the information available for deriving a more accurate genotypic value. This allowed the estimation of breeding values for the selection of candidates on the basis of their genetic merit using best linear unbiased prediction (BLUP) (Henderson, 1975) to exploit the additive genetic variance. The use of BLUP estimated breeding values in potato breeding clearly demonstrated the advantage of using estimated breeding values over progeny means in cross-generation prediction of progeny performance, particularly for traits with low heritability (Slater et al., 2014b).

The use of molecular genetic markers provides the opportunity to improve breeding significantly by reducing both the duration and costs of a breeding cycle. Marker-assisted selection (MAS) has the ability to select for traits several years earlier in a program than would be practical using conventional screening methods. Marker-assisted selection will be effective for qualitative traits that are controlled by major genes but it may also be valuable for quantitative characteristics, if QTLs of large effect contribute to the measured trait. Although a substantial number of genetic markers linked to genes for important traits have been identified, only a few reports have been made of their use in commercial potato breeding programs (Dalla Rizza et al., 2006; Ortega and Lopez-Vizcon, 2012; Ottoman et al., 2009; Schultz et al., 2012). For potato breeders to adopt MAS, the use of markers must be cost-effective compared to conventional screening, as has been shown to be the case for pest and disease resistance screening (Slater et al., 2013).

As MAS can be cost-effectively applied at the second field generation (Slater et al., 2013) and estimated breeding values can be calculated for a number of more complex traits at the same stage (Slater et al., 2014b), combined use of the two methods could reduce the breeding cycle from >10 to 4 yr (Slater et al., 2014a). These advances will greatly accelerate the breeding cycle and therefore increase genetic gain over conventional breeding methods. When used together with a weighted selection index, they will ensure that improvement is made in all measured traits, from those under simple genetic control to those under far more complex control. Further reductions in the cycle time should be possible with genomic selection.

### Considerations for Genomic Selection in Potato

Genomic selection was first proposed by Meuwissen et al. (2001), and uses genome-wide marker effects estimated in a phenotyped and genotyped reference population to predict the genetic merit (future phenotypes) of otherwise uncharacterized selection candidates. Genomic selection differs from MAS in that it jointly analyzes all marker data and can therefore capture all the genetic variance, whereas MAS will only capture the variance from a limited number of QTLs. In addition, genomic selection does not experience the problems associated with genome-wide association studies and QTL analysis, such as the overestimation of marker effects (Beavis, 1998). With the completion of the potato genome sequence and the identification of a large number of genome-wide SNPs (Uitdewilligen et al., 2013), application of genomic selection in potato can be confidently expected in the near future, although the heterozygous nature of potato (Slater et al., 2014a) will require some considered strategies.

### Dense Marker Maps, LD, and Marker Numbers

Genomic selection requires a large number of markers that are spread across the entire potato genome. A number of potato studies have addressed this issue, with successive studies increasing the number of markers identified across the genome for the development of genome-wide marker maps (Bonierbale et al., 1988; Dong et al., 2000; Gebhardt et al., 1991, 1989; Milbourne et al., 1998; Tanksley et al., 1992). This process culminated in the development of a dense genetic linkage map populated with 10,000 amplified fragment length polymorphism markers (Van Os et al., 2006), which was used to construct a physical map to assist the assembly of the potato genome sequence.

The completion of the genome sequence allowed the identification of multiple SNPs in any set of sequenced genotypes that were aligned to it (Uitdewilligen et al., 2013). These SNPs can be used as a densely spaced set of molecular markers. The SNP frequency in potato has been estimated to be approximately 1 per 24 bp within exons (Uitdewilligen et al., 2013), illustrating the extent of sequence diversity within potato. The assembly of the potato genome sequence has also led to the development of an 8303-feature SNP chip (http://solcap.msu.edu/potato_infinium.shtml, accessed 13 June 2016) (Felcher et al., 2012; Hamilton et al., 2011). Application of this tool has been limited, potentially because of the limited number of markers that it assays, as SNP chips in many other species have recorded a large percentage of unusable data as a result of missing data, SNPs not segregating in the populations under investigation, or through issues of polyploid SNP calling (∼50% usable data only; Jan et al. (2016)). Problems with SNP cluster calling in polyploids, particularly of the heterozygous genotypic classes, have been partly addressed through the development of custom software packages (Voorrips et al., 2011; http://www.illumina.com/Documents/products/technotes/technote_genomestudio_polyploid_genotyping.pdf, accessed 13 June 2016) that now allow the calling of Infinium (Illumina, San Diego, CA) data into the five genotypic classes (Pembleton et al., 2013). The genome sequence will also provide a reference for resequencing for the delivery of large numbers of molecular genetic markers in a rapid timeframe. As the price of sequencing continues to reduce, low-coverage whole-genome sequencing will become an option in the near future.

Although the SNP chips will allow identification of genes with a large effect through genome-wide association studies, they may not capture the entire allele frequency spectrum (i.e., ascertainment bias) and may therefore not detect some relevant effects. Single nucleotide polymorphism chip systems will potentially be superseded in the near future by genotyping-by-sequencing (GBS) methods, which could deliver genome-wide SNP profiles at significantly reduced cost (Elshire et al., 2011; Xu et al., 2012). The genome-wide profiles will allow the computational analysis of partial or whole genomic SNP complements for their combined effect on complex phenotypic traits, thus enabling genomic selection applications and strategies in potato. To achieve this, the main issues faced with GBS methods currently applied are the number of features assayed, the volume of missing data that have to be compensated for, and the number of dominant marker types that are included in the data. The large number of fragments assayed is a positive attribute when sequencing costs are not considered; however, if a limited budget or a price point needs to be reached to be economically viable and beneficial, fewer sequence reads will be generated and there will be a higher volume of missing data. Handling datasets with up to 90% missing data is complex and challenging and relies heavily on imputation, the accuracy of which with high missing genotypes can be variable (Fu, 2014). An alternative option is to target specific portions of the genome through a capture-based system (Uitdewilligen et al., 2013). However, issues arising from sequence depth may incorrectly identify the specific allelic class present. The data presented in Uitdewilligen et al. (2013) indicate a high sequence depth is needed to identify the correct genotypic class accurately (60–80× coverage leading to 98.4% accuracy in genotypic calls). 
Empirical data from a genomic potato breeding program are needed to model the effect of miscalling the allelic status, including double reduction (DR), and to quantify the resulting reduction in prediction accuracy. The reduction in prediction accuracy caused by miscalling a number of allelic classes must also be weighed against the increase in genome coverage and the increase in SNP marker numbers achieved by implementing such an approach.
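The dependence of dosage calling on sequence depth can be illustrated with a simple binomial read model. The following is a minimal sketch under stated assumptions, not the calling method of Uitdewilligen et al. (2013): reads are assumed to sample the four homologs at random, the per-read error rate of 0.01 is hypothetical, and the function names are illustrative only.

```python
from math import comb

def allele_fractions(error=0.01):
    """Expected fraction of B reads for dosages 0..4 (AAAA..BBBB).

    `error` is a hypothetical per-read miscall rate, not a value from the text.
    """
    return [(k / 4) * (1 - error) + (1 - k / 4) * error for k in range(5)]

def binom_pmf(depth, b_reads, fraction):
    """Binomial probability of observing b_reads B-allele reads out of depth."""
    return comb(depth, b_reads) * fraction**b_reads * (1 - fraction)**(depth - b_reads)

def call_dosage(b_reads, depth, error=0.01):
    """Maximum-likelihood dosage call (0..4) from B-allele read counts."""
    likelihoods = [binom_pmf(depth, b_reads, f) for f in allele_fractions(error)]
    return max(range(5), key=likelihoods.__getitem__)

def prob_correct(true_dosage, depth, error=0.01):
    """Probability that the ML call equals the true dosage at a given depth."""
    fraction = allele_fractions(error)[true_dosage]
    return sum(
        binom_pmf(depth, x, fraction)
        for x in range(depth + 1)
        if call_dosage(x, depth, error) == true_dosage
    )

for depth in (10, 30, 60, 80):
    per_class = [prob_correct(k, depth) for k in range(5)]
    print(depth, [round(p, 3) for p in per_class])
```

Under this simplified model the three heterozygous classes (AAAB, AABB, ABBB) are the hardest to separate at low depth, which is the behavior the paragraph above describes.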

The advantage of a large number of SNPs is that this provides coverage of the entire genome, ensuring that all QTL are in LD with at least one marker and therefore capturing the majority of the genetic variance. This would entail thousands of markers for those crops in which LD decays slowly but millions for those with rapid LD decay (Xu et al., 2012). The key question for genomic selection is the extent of LD in potato breeding populations and, in turn, how many SNPs are needed to capture the majority of QTLs through LD with markers. Calus et al. (2008) proposed that a mean LD of 0.25 between neighboring markers was sufficient for successful genomic selection in their simulation study. In potato, high LD has been shown to exist at distances less than 1 cM and to decay quickly to less than 0.2 at inter-marker distances greater than 1 cM (D’hoop et al., 2010). This would imply that approximately four SNPs per cM would suffice to achieve adequate genomic selection accuracy. Thus, at least 8000 SNPs would be required, assuming a genome map length of 18.72 Morgans (1872 cM) (Sharma et al., 2013). Another way to approximate the number of markers that are required has been proposed by Meuwissen (2009) through the estimation of the effective population size (Ne) of a breeding population and calculation of the number of markers needed using the formula 10NeL, where L is the genome length in Morgans. We have estimated the Ne of one potato breeding population to be 79 (details given below), which would require 14,789 SNPs for successful genomic selection. These marker estimates should be understood as the numbers needed when the genotype call rate is very high and all SNPs are segregating in the target population. If genotype call rates are only moderate or populations are more diverse, the number of SNPs required will increase.
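The two marker-number estimates in the preceding paragraph are simple arithmetic and can be reproduced as follows. This is only a sketch; the function names are illustrative, and the inputs (four SNPs per cM, Ne = 79, L = 18.72 Morgans) are the values quoted in the text.

```python
GENOME_LENGTH_MORGANS = 18.72  # map length assumed in the text (Sharma et al., 2013)
SNPS_PER_CM = 4                # density implied by the LD decay argument
NE = 79                        # effective population size estimated in the text

def markers_from_ld(length_morgans, snps_per_cm):
    """Markers needed for a fixed density per centimorgan."""
    return length_morgans * 100 * snps_per_cm

def markers_from_ne(ne, length_morgans):
    """Markers needed under the 10 * Ne * L rule of Meuwissen (2009)."""
    return 10 * ne * length_morgans

print(round(markers_from_ld(GENOME_LENGTH_MORGANS, SNPS_PER_CM)))  # 7488
print(round(markers_from_ne(NE, GENOME_LENGTH_MORGANS)))           # 14789
```

The density-based figure of 7488 is what the text rounds to roughly 8000 SNPs, and the Ne-based rule gives the 14,789 SNPs quoted above.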

These studies on LD in potato indicate that the worldwide population is genetically diverse. However, it is the diversity within specific breeding populations that is of most direct relevance to genomic selection. This diversity has often been greatly reduced by resource-induced bottlenecking events (such as selection of relatively few parents). In such populations, the moderate SNP numbers described above should be sufficient.

### Genomic Selection Approaches in Autopolyploids

Working with the cultivated autotetraploid form of potato adds a degree of complexity, as the genotyping method ideally needs to determine the dosage of the different marker alleles, where the possible genotypes are AAAA, AAAB, AABB, ABBB, and BBBB. Uitdewilligen et al. (2013) and Ashraf et al. (2014) described a method based on GBS, whereas Gidskehaug et al. (2011) described a method of quantifying allele dosage in autotetraploids using intensity information from genotyping on SNP arrays.

One model for genomic selection (ignoring fixed effects other than the mean) is:

$$\mathbf{y} = \mathbf{1}_n\mu + \mathbf{X}\mathbf{b} + \mathbf{e} \qquad [1]$$

where y is a (number of phenotypes × 1) vector of phenotypes, 1n is a vector of ones, μ is the population mean, X is a matrix of genotypes, b is a vector of marker effects, and e is the vector of random normal deviates. There are at least two possible assumptions regarding the effect of marker allele dosage on phenotype for genomic selection. One assumption would be a pseudodiploid model, where all heterozygous genotypes have an equal effect on the phenotype, and the effect of the heterozygotes is at the midpoint of the two homozygotes. In this instance, there is one effect per marker (b is a number of markers (m) × 1 vector) and the X matrix has the dimensions number of phenotypes (p) × m, coded as in Table 1.

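Table 1. Possible coding of the genotype design matrix for autotetraploid potato.

| Cultivar genotype | Pseudodiploid | Additive autotetraploid | Full autotetraploid, including nonadditive (effect 1) | (effect 2) | (effect 3) | (effect 4) | (effect 5) |
|---|---|---|---|---|---|---|---|
| Number of effects per marker | 1 | 1 | 1 | 2 | 3 | 4 | 5 |
| AAAA | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| AAAB | 1 | 1 | 0 | 1 | 0 | 0 | 0 |
| AABB | 1 | 2 | 0 | 0 | 1 | 0 | 0 |
| ABBB | 1 | 3 | 0 | 0 | 0 | 1 | 0 |
| BBBB | 2 | 4 | 0 | 0 | 0 | 0 | 1 |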

This model could be adapted to estimate the additive marker effect by accounting for tetraploid allele dosage. In this case, X still has the dimensions p × m, but is now coded 0, 1, 2, 3, and 4, for AAAA, AAAB, AABB, ABBB, and BBBB, respectively (Table 1).

An alternative approach, which accounts for additive and nonadditive effects in a full autotetraploid model, is to assume that each genotype has its own effect. In this case, there are five possible effects per marker, assuming that the markers are fitted as random effects, so b is a vector with the dimension 5m × 1 and X is a matrix with p rows and 5m columns (Table 1).

These X matrices can be used to implement genomic selection in either BLUP methods (assuming that all marker effects are derived from the same normal distribution) or a method that assumes nonlinear distributions (e.g., BayesB, BayesA, and BayesR) (Erbe et al., 2012; Meuwissen et al., 2001). A similar approach to the full autotetraploid model as described in Table 1 has been used to estimate haplotype effects (Pryce et al., 2010). It is noteworthy that the pseudodiploid model could also be used for a genotyping system designed for diploids (where only AA, AB, and BB genotypes are called).
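The three codings of the design matrix X can be sketched in a few lines. This is a minimal illustration, assuming genotypes are supplied as B-allele dosages (0 for AAAA through 4 for BBBB) in a p × m array; the function names are hypothetical and not part of any published software.

```python
import numpy as np

def code_pseudodiploid(dosage):
    """All heterozygotes share one code: AAAA -> 0, AAAB/AABB/ABBB -> 1, BBBB -> 2."""
    d = np.asarray(dosage)
    return np.where(d == 0, 0, np.where(d == 4, 2, 1))

def code_additive(dosage):
    """Additive autotetraploid coding: the B-allele dosage itself (0..4)."""
    return np.asarray(dosage).copy()

def code_full(dosage):
    """Full autotetraploid coding: five indicator columns per marker (p x 5m)."""
    d = np.asarray(dosage)
    p, m = d.shape
    X = np.zeros((p, 5 * m), dtype=int)
    rows = np.repeat(np.arange(p), m)   # row index for every (individual, marker) pair
    cols = np.arange(m) * 5 + d         # column of the observed genotype class
    X[rows, cols.ravel()] = 1
    return X

# Three individuals genotyped at three markers (illustrative values only).
dosage = np.array([[0, 4, 2],
                   [1, 3, 2],
                   [4, 0, 1]])
print(code_pseudodiploid(dosage))
print(code_additive(dosage))
print(code_full(dosage).shape)  # (3, 15)
```

In the full coding each row contains exactly one 1 per marker, so b holds 5m effects, as described above.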

Fitting the above models results in either a single effect for each marker or five effects (one for each genotype), which can be then used to calculate the genomic estimated breeding values for selection candidates as:

$$\mathrm{GEBV} = \mathbf{X}\hat{\mathbf{b}} \qquad [2]$$

where GEBV is the genomic estimated breeding value, X is the design matrix for selection candidates constructed from their genotypes, and $\hat{\mathbf{b}}$ is the vector of estimated genotype effects.
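As an illustration of the BLUP variant, marker effects can be estimated by ridge regression, which is equivalent to BLUP when all marker effects are drawn from the same normal distribution, and GEBVs are then obtained as the product of the candidates' design matrix and the estimated effects. The sketch below uses simulated data with the additive coding; the shrinkage parameter `lam` (the ratio of residual to marker-effect variance) is assumed known, and all names and simulation settings are hypothetical.

```python
import numpy as np

def fit_snp_blup(X, y, lam):
    """Jointly estimate the mean and shrunken marker effects.

    lam is the ratio sigma_e^2 / sigma_b^2, assumed known in this sketch.
    """
    n, m = X.shape
    W = np.hstack([np.ones((n, 1)), X])   # [1_n, X]
    penalty = lam * np.eye(m + 1)
    penalty[0, 0] = 0.0                   # the mean is fitted without shrinkage
    solution = np.linalg.solve(W.T @ W + penalty, W.T @ y)
    return solution[0], solution[1:]

def gebv(X_candidates, b_hat):
    """Genomic estimated breeding values for the selection candidates."""
    return X_candidates @ b_hat

# Simulated reference population and candidates (illustrative only).
rng = np.random.default_rng(1)
n_ref, n_cand, m = 300, 50, 100
b_true = rng.normal(0.0, 0.1, size=m)
X_ref = rng.integers(0, 5, size=(n_ref, m)).astype(float)
X_cand = rng.integers(0, 5, size=(n_cand, m)).astype(float)
y = 10.0 + X_ref @ b_true + rng.normal(0.0, 1.0, size=n_ref)

mu_hat, b_hat = fit_snp_blup(X_ref, y, lam=100.0)
accuracy = np.corrcoef(gebv(X_cand, b_hat), X_cand @ b_true)[0, 1]
print(f"correlation of GEBV with simulated breeding value: {accuracy:.2f}")
```

Bayesian alternatives such as BayesA or BayesB replace the common shrinkage term with marker-specific priors and are not shown here.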

As described by Habier et al. (2007), VanRaden (2008), and Hayes et al. (2009c), an alternative implementation of the BLUP genomic selection is to first construct the genomic relationship matrix G among individuals using the SNP data, then fit the model in Eq. [3]:

$$\mathbf{y} = \mathbf{1}_n\mu + \mathbf{Z}\mathbf{g} + \mathbf{e} \qquad [3]$$

where Z is a design matrix mapping phenotypes to individuals and g is a vector (individuals × 1) of genomic breeding values. The genomic breeding values are assumed to be distributed as:

$$\mathbf{g} \sim N(\mathbf{0}, \mathbf{G}\sigma_g^2) \qquad [4]$$

where $\sigma_g^2$ is the genetic variance.

For the pseudodiploid model, G can be constructed via Eq. [5] (following Yang et al. (2010)) for the elements jk, which indicate the genomic relationship between the individuals j and k:

$$G_{jk} = \frac{1}{M}\sum_{i=1}^{M}\frac{(x_{ji}-2p_i)(x_{ki}-2p_i)}{2p_i(1-p_i)} \qquad [5]$$

For the diagonal element for the individual j, G could be constructed as Eq. [6]:

$$G_{jj} = 1 + \frac{1}{M}\sum_{i=1}^{M}\frac{x_{ji}^2-(1+2p_i)x_{ji}+2p_i^2}{2p_i(1-p_i)} \qquad [6]$$

where M is the number of markers, xji and xki are elements of the design matrix X for the pseudodiploid model described above, and pi for the ith marker is obtained as:

$$p_i = \frac{4n_{bbbb}+3n_{abbb}+2n_{aabb}+n_{aaab}}{4N} \qquad [7]$$

where N is the number of individuals and nbbbb, nabbb, naabb, and naaab indicate the number of individuals carrying the BBBB genotype, the ABBB genotype, the AABB genotype, and the AAAB genotype, respectively.
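A pseudodiploid genomic relationship matrix of this kind can be sketched as follows. The code assumes the standard Yang et al. (2010) form, with off-diagonal elements averaging (x_j − 2p)(x_k − 2p) / (2p(1 − p)) over markers and diagonal elements 1 + the average of (x_j² − (1 + 2p)x_j + 2p²) / (2p(1 − p)), and it assumes that p is the B-allele frequency computed from tetraploid dosages. Both are assumptions of this illustration, as are the function name and the decision to drop monomorphic markers.

```python
import numpy as np

def pseudodiploid_grm(dosage):
    """Genomic relationship matrix from B-allele dosages (individuals x markers)."""
    d = np.asarray(dosage, dtype=float)
    p = d.mean(axis=0) / 4.0              # (4n_BBBB + 3n_ABBB + 2n_AABB + n_AAAB) / (4N)
    keep = (p > 0.0) & (p < 1.0)          # monomorphic markers carry no information
    d, p = d[:, keep], p[keep]
    x = np.where(d == 0, 0.0, np.where(d == 4, 2.0, 1.0))  # pseudodiploid codes
    n_markers = p.size
    scale = 2.0 * p * (1.0 - p)
    centered = x - 2.0 * p
    G = (centered / scale) @ centered.T / n_markers
    diagonal = 1.0 + ((x**2 - (1.0 + 2.0 * p) * x + 2.0 * p**2) / scale).sum(axis=1) / n_markers
    G[np.diag_indices_from(G)] = diagonal
    return G

# Four individuals genotyped at four markers (illustrative values only).
dosage = np.array([[0, 1, 2, 4],
                   [4, 3, 2, 0],
                   [2, 2, 1, 1],
                   [1, 0, 4, 3]])
print(np.round(pseudodiploid_grm(dosage), 2))
```

The resulting matrix can be supplied as the covariance structure of g when fitting the GBLUP model with standard mixed-model software.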

For the full autotetraploid model, G can be constructed as shown in Eq. [8] for the elements jk, which indicate the genomic relationship between the individuals j and k, where M is now the number of markers × 5. For the diagonal element for the individual j, G can be constructed as in Eq. [9], where pi is now the frequency of each genotype (e.g., for the first marker, nbbbb N–1, where nbbbb is the number of individuals carrying the BBBB genotype, naaaa is the number of individuals carrying the AAAA genotype, and so on).

The model which best fits the data may be determined on the basis of maximum likelihood. For example, the model shown in Eq. [3] could be fitted to the data and the maximum likelihood obtained with the pseudodiploid G compared to that obtained with the full autotetraploid G (ASReml [Gilmour et al., 2009] could be used to fit these models). The log-likelihoods of tetraploid BLUP models with varying degrees of DR were compared in Slater et al. (2014b), where the rate of DR with the highest model likelihood varied across traits.

The genomic relationship matrices described above are based on identity-by-state and simply measure the similarity of alleles between individuals. A more complex problem is the modeling of identical-by-descent probabilities within a genomic relationship context, with identical-by-descent measuring the likelihood that two alleles are the same and have originated from a common ancestor. In cattle (diploid), one comparison of genomic relationship matrices based on identity-by-state and identical-by-descent failed to yield a significant difference in the accuracy of genomic prediction (Luan et al., 2012). In autopolyploids, this issue may be more important due to the phenomenon of DR (Huang et al., 2015, 2014), which allows the combination of alleles from sister chromatids in the same gamete and adds additional complexity. Slater et al. (2014b) compared heritability estimates in potato across a range of traits, with a relationship matrix derived from pedigrees that either considered or ignored DR and concluded that the estimated values were very similar in both cases. However, it would be valuable to repeat the construction of G when considering DR. Genetic markers should make it possible to identify accurately when DR has occurred, provided that all three possible heterozygote genotypes can be accurately distinguished.

### Potential Reference Populations and Trait Heritability

D’hoop et al. (2010) used a substantial number of markers distributed across the potato genome to investigate the population structure within a large germplasm collection of tetraploid cultivars and progenitor clones. These data were analyzed using three methods, which all obtained very similar results. They determined that the population resolved into six groups that correlated with historic breeding aims and activities. The three main groups contained cultivars that separately related to the starch, processing, and fresh market sectors and exhibited a gradation in average dry matter content from very high to high then medium, respectively. The smallest group (called SH) contained three diploid clones. A group called ‘Ancient’ contained cultivars bred between 1850 and 1950; the last group, termed ‘the rest’, contained progenitor clones used for introgressing disease resistance genes, a diploid cultivar, and miscellaneous European cultivars. Previous studies had not been able to resolve a population structure but had been limited by a low number of markers (Gebhardt et al., 2004; Simko et al., 2004, 2006).

This population structure in potato would assist in the identification of the germplasm that should be represented in the reference or training populations for genomic selection. As breeding programs have undertaken directional selection toward their individual breeding targets, germplasm from each of these targets should form the nucleus of the reference population. This would entail different reference populations for the starch, crisp (potato chip) processing, French fry processing, and fresh market classes. These reference populations would also need to be grown in appropriate conditions, such as tropical, temperate, Mediterranean, and hot, dry production environments, for the identification of appropriate locally adapted genetic backgrounds. The ultimate reference population could include germplasm from all of these subpopulations for each target, as well as germplasm from related species that could be used for the introgression of genes of interest. However, this latter reference population would require a very large number of markers to assess the extent of genetic variation, because of the rapid LD decay in potato across such populations.

The diversity of the target population will largely dictate the size of the reference required to achieve adequate genomic prediction accuracy. More diverse populations will require a larger reference population, as well as denser markers, given the lower extent of LD. Programs specific to particular breeding targets could be based on smaller reference populations than those programs aimed at multiple end uses. The reference population should include the desired phenotypic variation (Jannink et al., 2010).

The heritability of the traits of interest will also affect the size required of a reference population to achieve accurate genomic prediction. When breeding populations were analyzed allowing for tetrasomic inheritance, as compared to the (incorrect) assumption of disomic inheritance, the two analyses produced very similar results (Slater et al., 2014b) and therefore patterns of inheritance should not be a major confounding factor. A more influential effect will be the heritability of the traits themselves, which were found to vary from highly heritable (0.83) to those with low heritability (0.21). Those traits with low heritability will require a much larger reference population to obtain meaningful prediction accuracies, although population sizes of only 952 to 1137 individuals were large enough to improve the accuracy using the estimated breeding values derived from pedigrees (Slater et al., 2014b).

### Accuracy of Genomic Selection and Expected Genetic Gain

The accuracy of genomic selection can be predicted using population and trait parameters (Daetwyler et al., 2010b, 2008; Erbe et al., 2013; Goddard, 2009). We have used the formula of Daetwyler et al. (2010b) (Eq. [10]) to estimate the accuracy of genomic selection (r) in this potato pedigree (Table 2):

$$r = \sqrt{\frac{N_p h^2}{N_p h^2 + M_e}} \qquad [10]$$

where Np is the number of phenotyped and genotyped individuals in the reference population, h2 is the heritability of the trait, and Me is the number of independent chromosome segments. There are many potential derivations of Me and it is unclear which is most suitable. The two derivations most often applied are Eq. [11] (Hayes et al., 2009b) and Eq. [12] (Goddard, 2009):

$$M_e = 2N_eL \qquad [11]$$

$$M_e = \frac{2N_eL}{\ln(4N_eL)} \qquad [12]$$

where L is the length of the genome in Morgans. There is evidence that the first derivation is more appropriate for populations with a high degree of relatedness (Wientjes et al., 2013), whereas the second is better suited to less related populations (Clark et al., 2012). Results are only presented for the Eq. [11] approximation, which is expected to result in more conservative estimates of the accuracy than are achievable with genomic selection (Table 2). The assumed genome length (L) was double that of the 9.36-Morgans diploid linkage map reported in Sharma et al. (2013) (i.e., 18.72 Morgans) in order to account for the constitution of autotetraploid populations.
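The expected accuracy can be explored numerically. The sketch below assumes the commonly used deterministic form r = sqrt(Np·h² / (Np·h² + Me)) together with two approximations for Me, 2·Ne·L and 2·Ne·L / ln(4·Ne·L). These forms, the function names, and the reference population sizes are assumptions of this illustration; Ne = 79, L = 18.72 Morgans, and the heritabilities 0.21 and 0.83 are taken from the text.

```python
import math

def me_linear(ne, length_morgans):
    """Independent chromosome segments, Me = 2 * Ne * L (the more conservative form)."""
    return 2.0 * ne * length_morgans

def me_log_scaled(ne, length_morgans):
    """Independent chromosome segments, Me = 2 * Ne * L / ln(4 * Ne * L)."""
    return 2.0 * ne * length_morgans / math.log(4.0 * ne * length_morgans)

def expected_accuracy(n_ref, h2, me):
    """Expected correlation between GEBV and true breeding value."""
    return math.sqrt(n_ref * h2 / (n_ref * h2 + me))

NE, L = 79, 18.72
me = me_linear(NE, L)
for h2 in (0.21, 0.83):                 # lowest and highest heritability quoted in the text
    for n_ref in (500, 1000, 2000):     # hypothetical reference population sizes
        r = expected_accuracy(n_ref, h2, me)
        print(f"h2={h2:.2f}  Np={n_ref:5d}  r={r:.2f}")
```

Because Me appears only in the denominator, the larger value from the 2·Ne·L form always yields the lower, more conservative accuracy for a given Np and h².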

Table 2.

Number of lines in each generation of potato, estimates of cumulative inbreeding, rate of inbreeding, and effective population size (Ne) for 0.0 and 0.25 double reduction (DR).

| Generation | Number of lines | Inbreeding (no DR) | Inbreeding (0.25 DR) | Inbreeding rate (no DR) | Inbreeding rate (0.25 DR) | Ne (no DR) | Ne (0.25 DR) |
|---|---|---|---|---|---|---|---|
| 1 | 97 | NA† | 0.25 | NA | NA | NA | NA |
| 2 | 196 | 0 | 0.319 | 0.000 | 0.092 | NA | NA |
| 3 | 1797 | 0.029 | 0.366 | 0.029 | 0.069 | 17.2 | 7.2 |
| 4 | 1218 | 0.034 | 0.376 | 0.005 | 0.016 | 97.1 | 31.7 |
| 5 | 396 | 0.042 | 0.384 | 0.008 | 0.013 | 60.4 | 39.0 |
| Total | 3704 | | | | | | |

† NA, not available.

To calculate the size of the reference population required for genomic selection, we have estimated the effective population size of our potato breeding population, which was breeding for fresh, crisp, and French fry processing, by calculating the inbreeding level per generation in a five-generation pedigree going back to the early 1900s. A numerator relationship matrix (NRM) was calculated with a proportion of DR of 0.0 or 0.25 caused by autotetraploidy, as in Slater et al. (2014b). Inbreeding (F) was approximated from the diagonal of this relationship matrix as F = NRMjj – 1, for individual j. The effective population size was then estimated from the rate of inbreeding (dF) from one generation to the next (Falconer and Mackay, 1996):

$$N_e = \frac{1}{2\,dF}$$

Inbreeding levels and Ne estimates per generation are listed in Table 2. There are five generations in the pedigree, which overlapped because of the use of parents in multiple generations; therefore, our separation into discrete generations only resulted in approximations of dF and Ne. There is a wealth of published studies on the estimation of Ne in pedigree and genetic marker data (Charlesworth, 2009; Hayes et al., 2003; Leroy et al., 2013). The relationships and degree of inbreeding of the foundation germplasm were unknown; subsequent generations showed increasing inbreeding, although the rate slowly decreased between Generations 3 and 5. This indicates that the greatest breeding bottleneck occurred in Generation 2 and that subsequent selections produced smaller effects on levels of diversity. Accounting for 0.25 DR with a tetraploid numerator relationship matrix dramatically increased inbreeding but the rate of inbreeding also followed a decreasing trend after Generation 3. Estimates of Ne were lowest in Generation 3 and highest in Generations 4 and 5, depending on the level of DR. As expected from the higher inbreeding levels obtained with the 0.25-DR scenarios, the Ne in these instances was lower than when assuming no DR. We could have used the harmonic mean of Ne across the three generations for which it was estimated (Ne ∼36) for predictions. However, to be conservative in our predictions of accuracy, we chose the mean of Generations 4 and 5 of the no-DR scenarios in our equation (Ne ∼79). Although the ancestral Ne of potato was large, the recent Ne in this population was much smaller and comparable to that of cattle breeds (e.g., Holstein, Jersey) in which genomic selection has been successfully implemented. It has been shown that the recent Ne is the most relevant indicator of diversity for genomic prediction accuracy (Hayes et al., 2009b; Luan et al., 2012).
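The Ne estimates in Table 2 can be approximated from the cumulative inbreeding values with a short calculation. The sketch below assumes that the rate of inbreeding is the standard dF = (F_t − F_{t−1}) / (1 − F_{t−1}) and that Ne = 1 / (2 dF); the first form is not stated explicitly in the text and is inferred because it reproduces the tabulated rates. The function and variable names are illustrative only.

```python
# Cumulative inbreeding (F) per generation, copied from Table 2.
F_NO_DR = {2: 0.000, 3: 0.029, 4: 0.034, 5: 0.042}
F_DR_025 = {2: 0.319, 3: 0.366, 4: 0.376, 5: 0.384}


def inbreeding_rate(f_prev, f_curr):
    """Assumed form of the per-generation rate: (F_t - F_{t-1}) / (1 - F_{t-1})."""
    return (f_curr - f_prev) / (1.0 - f_prev)


def effective_size(d_f):
    """Effective population size from the rate of inbreeding: Ne = 1 / (2 dF)."""
    return 1.0 / (2.0 * d_f)


def ne_by_generation(f_by_gen, generations=(3, 4, 5)):
    """Ne for each generation that has a non-zero inbreeding rate."""
    return {
        g: effective_size(inbreeding_rate(f_by_gen[g - 1], f_by_gen[g]))
        for g in generations
    }


ne_no_dr = ne_by_generation(F_NO_DR)
ne_dr_025 = ne_by_generation(F_DR_025)

# Arithmetic mean of Generations 4 and 5 (no DR), the value carried forward.
ne_mean_g4_g5 = (ne_no_dr[4] + ne_no_dr[5]) / 2

# Harmonic mean across Generations 3 to 5 (no DR), the alternative mentioned.
ne_harmonic = len(ne_no_dr) / sum(1.0 / ne for ne in ne_no_dr.values())

for gen in (3, 4, 5):
    print(gen, round(ne_no_dr[gen], 1), round(ne_dr_025[gen], 1))
```

Because the inputs are the rounded values from Table 2, the results agree with the table only to about one decimal place; the harmonic mean computed this way is about 35, close to the ∼36 reported from unrounded values.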

The predicted accuracy using Eq. [11] ranged from 0.19 to 0.77, depending on h2 and reference population size (Table 3). The accuracies identified were encouraging, even for relatively small reference populations and traits of low heritability. Moderate accuracy, though insufficient to select a small number of elite cultivars, would be capable of distinguishing the top 50% of selection candidates and substantially increase the likelihood of success. The level of prediction accuracy would allow for a gradual progression toward a full-scale genomic selection program, in which the general breeding pool would be initially selected on the basis of genomic predictions from relatively small reference populations that leverage historical cultivar-specific information. Although this historical phenotypic information would provide good information on each cultivar, it would not indicate the superior additive genetics that could be identified from progeny testing or breeding values. The reference populations could be further enhanced through additional targeted genotyping and phenotyping to increase accuracy and, in turn, permit prediction of elite parents.

Table 3.

Predicted accuracy of genomic selection for four traits of potato, in which BVP is breeders’ visual preference, h2 is narrow-sense heritability, Acc BLUP is the prediction accuracy of pedigree best linear unbiased prediction, and Np is the reference population size.

| Trait | h2† | Acc BLUP‡ | Np = 500 | Np = 1000 | Np = 2000 | Np = 5000 |
|---|---|---|---|---|---|---|
| BVP | 0.23 | 0.33 | 0.19 | 0.27 | 0.37 | 0.53 |
| Yield | 0.56 | 0.19 | 0.29 | 0.40 | 0.52 | 0.70 |
| Boiling color | 0.73 | 0.44 | 0.33 | 0.44 | 0.57 | 0.74 |
| Maturity | 0.86 | 0.78 | 0.36 | 0.47 | 0.61 | 0.77 |

† Data from Slater et al. (2014b).
‡ Data from Slater et al. (2014b), reflecting the fact that only a subset of the pedigree was phenotyped for some traits.
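The accuracies in Table 3 can be checked numerically. The sketch below assumes the Daetwyler et al. (2010b) form r = sqrt(Np h2 / (Np h2 + Me)) with Me = 2 Ne L, using Ne = 79 and L = 18.72 Morgans as given in the text. The logarithmic alternative Me = 2 Ne L / ln(4 Ne L) is included only for comparison and is the form usually attributed to Goddard (2009). Function names are illustrative.

```python
import math

NE = 79.0               # effective population size used in the text
GENOME_MORGANS = 18.72  # twice the 9.36-Morgan diploid map

# Narrow-sense heritabilities from Table 3.
HERITABILITY = {"BVP": 0.23, "Yield": 0.56, "Boiling color": 0.73, "Maturity": 0.86}
REFERENCE_SIZES = (500, 1000, 2000, 5000)


def independent_segments(ne, genome_morgans, logarithmic=False):
    """Number of independent chromosome segments (Me).

    logarithmic=False: Me = 2 Ne L (the conservative form used for Table 3).
    logarithmic=True:  Me = 2 Ne L / ln(4 Ne L) (assumed alternative form).
    """
    two_ne_l = 2.0 * ne * genome_morgans
    if logarithmic:
        return two_ne_l / math.log(2.0 * two_ne_l)
    return two_ne_l


def gs_accuracy(n_ref, h2, me):
    """Expected accuracy of genomic selection for a given reference size."""
    return math.sqrt(n_ref * h2 / (n_ref * h2 + me))


me_conservative = independent_segments(NE, GENOME_MORGANS)
accuracy = {
    trait: {n: gs_accuracy(n, h2, me_conservative) for n in REFERENCE_SIZES}
    for trait, h2 in HERITABILITY.items()
}

for trait, by_size in accuracy.items():
    print(trait, [round(by_size[n], 2) for n in REFERENCE_SIZES])
```

Under these assumptions the computed values agree with Table 3 to two decimal places. Passing `logarithmic=True` gives a much smaller Me and therefore higher accuracies, which is why the 2 Ne L form is the more conservative choice.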

## Strategies for Including Genomic Selection in Potato Breeding

### Rapid Improvement of Breeding Germplasm for Currently Phenotyped Traits

Genomic selection would readily fit with current potato breeding activities and can be described in terms of three distinct activities: the tasks required to establish genomic selection, the genomic selection cycle itself, and routine phenotypic testing for variety development, which also maintains the accuracy of genomic selection (Fig. 1).

Fig. 1.

Genomic selection (GS) scheme for potato illustrating the establishment activities, the routine genomic selection cycle, the routine testing and selection program, and how new germplasm would be introduced into the scheme.

The establishment activities include assembly of a reference population of the genotyped germplasm with phenotypes for the target traits that are relevant to the breeding program. The lowest risk option would be to use relevant historical cultivars and potential parents to ensure that the prediction accuracies would be high. This strategy has worked remarkably well in breeds of dairy cattle, which exhibit similar levels of population diversity to the potato population shown above. This strong focus will also reduce the size of the reference population that is required (e.g., Table 3). As a potato breeding program has access to a large amount of historic phenotypic data on cultivars, which are propagated clonally, these existing cultivars would simply require genotypic analysis to develop the algorithms to correlate the genotypic and phenotypic data. Further phenotyping could be undertaken, when necessary, to improve the phenotypic information and to increase prediction accuracy. New phenotyping would be on individuals rather than progeny tests, which is more cost-effective in most traits unless h2 is low.

The algorithms could then be used to predict which cultivars should be used as parents to create the next breeding populations.

The routine genomic selection cycle would see crosses made, seed germinated, and the seedlings grown in the glasshouse. Leaf samples from the seedling population could then be taken for genomic analysis to identify which seedlings should then be used as parents. As potato breeding programs typically create a number of full-sib families within the breeding population, genomic information from parents, full-sibs, and part-sib families will also help to rapidly identify the superior 10% of individuals to use as parents for the next cycle. This genomic data from the first year could be used with the prediction equations to predict a future phenotype that would otherwise take a number of years to attain through conventional phenotyping trials. The best crossing strategy for the selected parents that yields the maximum genetic gain should be investigated using stochastic simulation of breeding options.

If the analysis can be conducted fast enough, the superior seedlings could be potted up to allow these plants to flower in their first growth cycle. If not, it would be necessary to wait for a second growth cycle before these genotypes could be used as parents. As a potato generation can be completed rapidly in the glasshouse, two cycles could easily be completed within 1 yr. This rapid cycling would therefore allow rapid enrichment of desirable genes for the 40+ traits that are currently the focus of breeding programs, while using the same marker resource. This early prediction of all desirable genetic effects would permit the identification of the best combination for the majority of traits. The strategy avoids the disadvantage of early conventional phenotyping or MAS screening for only a few traits, which neglects the possibility that seedlings may also harbor other very beneficial characteristics. Therefore, the prediction of data would not just be for the genes of major effect, such as those controlling qualitative disease resistance, but also for the traits that are controlled by a large number of small effects, such as yield. Therefore, genomic selection should concentrate on the improvement of superior cultivars for a range of traits.

The genomic data would also identify which seedlings should be taken to the field for routine phenotypic testing and selection. The superior seedlings would proceed to the field, albeit in smaller numbers (20–50%), as only the predicted superior seedlings need to proceed, thus providing savings to the program through the ability to conduct smaller field trials (Table 4). Selection rates in the G1 stage would then be higher, only eliminating very poor phenotypes, resulting in ∼80% retention for better phenotypic evaluation at the G2 stage. They would then continue for successive cycles of routine phenotypic assessment and confirmation of the superior cultivars. Although the phenotypic assessment may be through conventional methods, selection of superior cultivars should be much faster, as the conventional data will be supported by the genomic data. This phenotypic assessment could then be used to refine the algorithms correlating the phenotypic and genomic data (Fig. 1). If the field evaluation reveals that a superior potato line has been missed when selecting parents in the rapid genomic selection cycles, it can still be added as a parent in the next cycle.

Table 4.

Comparison of the cost of potato breeding under three selection regimes with two genomic selection models.

| Generation | Details | Intense | Moderate | Mild | GS 5000¶ | GS 2000¶ |
|---|---|---|---|---|---|---|
| G0 | No. of seedlings† | 100,000 | 20,000 | 6,667 | 5,000 | 2,000 |
| | Seedling tuber production | $300,000 | $60,000 | $20,001 | $15,000 | $6,000 |
| | G0 selection rate | 0% | 0% | 0% | 50% | 50% |
| G1 | No. of seedlings | 100,000 | 20,000 | 6,667 | 2,500 | 1,000 |
| | G1 cost‡ | $164,731 | $32,946 | $10,983 | $32,946 | $10,983 |
| | G1 selection rate | 2% | 10% | 30% | 80% | 80% |
| G2 | No. of genotypes | 2,000 | 2,000 | 2,000 | 2,000 | 800 |
| | G2 cost§ | $119,552 | $119,552 | $119,552 | $119,552 | $47,821 |
| | Total cost | $633,040 | $257,248 | $194,618 | $138,671 | $55,468 |
| | Cost savings with GS 5000 | 489,453 | 117,668 | 55,705 | – | – |
| | Breakeven genotyping cost per genotype | 98 | 24 | 11 | – | – |
| | Cost savings with GS 2000 | 572,656 | 200,870 | 138,908 | – | – |
| | Breakeven genotyping cost per genotype | 286 | 100 | 69 | – | – |

† The number of seedlings varies because of the selection intensity applied at G1 to arrive at 2000 genotypes in the G2 generation, except in the GS 2000 model.
‡ The G1 trial cost (in Australian dollars) is based on the cost per hectare taken from Slater et al. (2013).
§ The G2 trial cost (in Australian dollars) is based on the cost per hectare taken from Slater et al. (2013).
¶ GS = genomic selection seedling population size.

As the adoption of new techniques requires additional financial outlay by a breeding program, we have modeled two genomic selection programs to determine when genomic selection could be cost-effectively applied. As the genomic selection algorithms would identify the parents for the crossing program, the breeding populations created would be smaller and more targeted. Savings would also be found through smaller phenotyping trials and a reduced need to repeat trials to confirm results. These savings have allowed us to model the breakeven costs for genotyping, in order for genomic selection to be cost-effective (Table 4). Further savings that we did not model would be gained from the reduced number of trials needed to identify superior cultivars. In addition, revenue would arise from the superior genetic gain achieved through genomic selection.
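The breakeven genotyping costs in Table 4 appear to follow from dividing each cost saving by the number of seedlings that would be genotyped (5000 or 2000). That relationship is inferred from the tabulated values rather than stated in the text, so the sketch below should be read as an illustration under that assumption; the names are hypothetical.

```python
# Cost savings (A$) of each genomic selection model relative to the three
# conventional regimes, copied from Table 4. Keys are the number of
# seedlings genotyped in each genomic selection model.
SAVINGS = {
    5000: {"intense": 489_453, "moderate": 117_668, "mild": 55_705},
    2000: {"intense": 572_656, "moderate": 200_870, "mild": 138_908},
}


def breakeven_cost_per_genotype(saving, n_genotyped):
    """Highest genotyping cost per seedling at which the saving is not exceeded."""
    return saving / n_genotyped


breakeven = {
    n_genotyped: {
        regime: round(breakeven_cost_per_genotype(saving, n_genotyped))
        for regime, saving in by_regime.items()
    }
    for n_genotyped, by_regime in SAVINGS.items()
}

for n_genotyped, by_regime in breakeven.items():
    print(f"GS {n_genotyped}:", by_regime)
```

Read this way, genomic selection pays for itself against an intense conventional program even at fairly high genotyping prices, whereas against a mild program the genotyping cost per seedling has to be much lower.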

Most genomic selection breeding programs have focused on additive effects, as these are readily passed to offspring. The widespread use of clonal propagation in potato makes the prediction of nonadditive effects such as dominance and epistasis attractive, because they are fixed in the clonal cultivar. Prediction of the final cultivar phenotype would be more accurate if it included nonadditive effects; this could reduce evaluation time substantially. Furthermore, crosses could be guided to maximize the additive and nonadditive genetic merit in potential offspring.

Establishing the reference population in different production environments may identify germplasm with adaptation to different local conditions, but could also lead to the identification of germplasm that performs well over different production environments. Analysis of genotype × environment effects would permit the identification of widely adapted germplasm. This will allow a consistent supply of the same cultivar from various growers to processing factories, so that the factories do not need to change production parameters, and a consistent supply to fresh market outlets, so that the retailers can provide familiar cultivars to consumers. Compiling different reference populations in multiple locations will quickly increase cost. Therefore, to save expense, the same set of genotyped individuals would be phenotyped in multiple locations. Genomic prediction would explicitly model genotype × environment effects and a specific prediction equation could be developed for each location (Crossa et al., 2010; Heslot et al., 2014; Jarquín et al., 2014; Resende et al., 2012).

We have predicted the expected genetic gain per generation (dG) using the equation dG = i × acc × SD(g), where i is the selection coefficient, acc is the accuracy of selection, and SD(g) is the genetic SD (Falconer and Mackay, 1996). The accuracy of phenotypic selection was approximated as √h2, whereas the accuracy of BLUP was taken directly from the analyses in Slater et al. (2014b) using the same pedigree that was used for estimation of Ne. This comparison may underestimate the value of BLUP because not all individuals were phenotypically assessed, leading to an accuracy of <√h2. To predict genetic gain from genomic selection, the selection coefficient was assumed to be 1.755, which reflects the level of 10% for selection candidates to be chosen as parents (Falconer and Mackay, 1996). The genetic SD [SD(g)] was assumed to be 1 for all traits, making them comparable.

As a simplification, we assumed that the size of the reference population stayed constant over time. However, additional phenotyping would occur in the routine testing and selection program (Fig. 1) and these phenotypes would update the reference population, increasing the accuracy of genomic selection. However, there will be a delay, as phenotypic information will only be gathered after G2 (or later), which results in at least two (or more) genomic selection cycles occurring before the feedback. This means that recombination events would have occurred, thereby slowly deteriorating information from older reference population entries and limiting accuracy. In Holstein cattle, with a similar Ne to our potato estimates, the reduction in accuracy from predicting progeny from great-grandparents rather than parents was ∼0.1 to 0.2 (Habier et al., 2010). We have accounted for this in our prediction of genetic gain by subtracting 0.1 from all genomic selection accuracies in Table 3. Yield is only reliably measured at the G3 stage and hence we discounted its predicted accuracy by 0.2.

It is clear that genomic selection can lead to higher levels of genetic gain than either BLUP or phenotypic selection. When the reference population size is lower (i.e., Np = 500), the genetic gain per cycle of genomic selection is less than for BLUP and phenotypic methods (Fig. 2). However, the cycle time is greatly accelerated, leading to more genetic gain per year and over time. Factoring in an increase in the reference population over time may increase the superiority of genomic selection over BLUP. Genetic gain is expected to be higher for traits with higher heritability (Fig. 3). If a low-heritability trait (e.g., breeders’ visual preference) is judged to be very important, it may, to a large extent, dictate the size of the reference populations that are required to achieve adequate gain.
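The per-cycle comparison described above can be illustrated for breeders’ visual preference (BVP). The sketch below applies dG = i × acc × SD(g) with the values given in the text and Table 3 (h2 = 0.23, BLUP accuracy 0.33, genomic accuracy 0.19 at Np = 500, lag discount 0.1). It also checks the quoted selection coefficient against the standard truncation-selection intensity for a normally distributed trait, which is an assumption about how 1.755 was obtained. Function names are illustrative.

```python
import math
from statistics import NormalDist


def selection_intensity(proportion_selected):
    """Standardized selection differential under truncation selection.

    Assumes normally distributed selection criteria: i = pdf(z) / p,
    where z is the truncation point leaving a fraction p above it.
    """
    z = NormalDist().inv_cdf(1.0 - proportion_selected)
    return NormalDist().pdf(z) / proportion_selected


def gain_per_cycle(intensity, accuracy, genetic_sd=1.0):
    """Expected genetic gain per cycle: dG = i * acc * SD(g)."""
    return intensity * accuracy * genetic_sd


I_TEXT = 1.755             # selection coefficient quoted in the text (top 10%)
H2_BVP = 0.23              # heritability of BVP (Table 3)
ACC_BLUP_BVP = 0.33        # pedigree BLUP accuracy for BVP (Table 3)
ACC_GS_BVP_NP500 = 0.19    # genomic accuracy for BVP at Np = 500 (Table 3)
LAG_DISCOUNT = 0.1         # discount for delayed reference population updates

gains_bvp = {
    "phenotypic": gain_per_cycle(I_TEXT, math.sqrt(H2_BVP)),
    "blup": gain_per_cycle(I_TEXT, ACC_BLUP_BVP),
    "genomic_np500": gain_per_cycle(I_TEXT, ACC_GS_BVP_NP500 - LAG_DISCOUNT),
}

print("intensity for 10% selected:", round(selection_intensity(0.10), 3))
for method, gain in gains_bvp.items():
    print(method, round(gain, 3))
```

Per cycle, genomic selection with Np = 500 gives the smallest gain of the three methods for this trait, in line with the text. Gain per year is the per-cycle gain divided by the cycle length in years, so the comparison over time depends on how much shorter the genomic selection cycle is; cycle lengths are not given in this section and are therefore left out of the sketch.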

Fig. 2.

Expected genetic gain in genetic SD of potato over a 20-yr period of phenotypic, best linear unbiased prediction (BLUP), and genomic selection for breeders’ visual preference (BVP) when 10% of candidates are selected as parents for the next cycle. GS ref, genomic selection reference population size.

Fig. 3.

Expected genetic gain in genetic SD in potato for breeders’ visual preference (BVP), yield, boiling color, and maturity using genomic selection with a reference population size of 500 over a 20-yr period.

We have used a linear prediction of genetic gain in which the rate of gain stays constant over time. In the long term, the rate of gain would slow because selected loci increase in frequency and slowly explain a smaller proportion of the genetic variance, as demonstrated in several simulation studies (Goddard, 2009; Jannink, 2010). This would also be the case in potato. Our linear predictions of genetic gain are deterministic and do not account for the change in allele frequencies caused by selection. The main way to guard against reduced long-term gain is to put more weight on rare alleles to drive them to intermediate frequencies, where they account for more of the genetic variance. This can be accomplished by explicitly up-weighting rare alleles using an index and by fitting a pedigree effect in genomic selection models to account for variation that is not captured by genetic markers (Goddard, 2009; Hayes et al., 2009a; Jannink, 2010). Additional research using stochastic simulation will be necessary to model all these factors in more detail.

### Rapid Introduction of New Parents and Genes from Wild Relatives

There are currently 110 wild species recognized in a recent classification of tuber-bearing Solanum species (Ovchinnikova et al., 2011; Spooner et al., 2009) and four cultivated species (Ovchinnikova et al., 2011; Spooner et al., 2007). These species are found in a wide diversity of geographic locations and climates. Some are found close to the vegetation limits in the high Andean regions, which will be subjected to freezing temperatures; others originate from semidesert regions that are subjected to high temperatures and drought; and others are found in temperate conditions or rainforests. These wild relatives will contain genes for numerous traits that are not present in commercial cultivars and represent a rich source for improvements in disease resistance, abiotic stress resistance, and tuber quality characteristics (Hanneman, 1989; Jansky, 2009; Spooner and Bamberg, 1994) and are consequently a major resource for potato breeding (Hawkes, 1994). The introgression of genes from wild species began in 1909 and has been limited, although significant, for the introduction of resistance genes against pests and pathogens such as late blight, viruses, and cyst nematodes (Bradshaw, 2007). However, introgression of desirable genes from wild species will require many generations of breeding to develop a commercial cultivar.

Similarly, potato breeding programs may have access to a wide array of potential parents with desirable characteristics from outside their program. The inclusion of both sources of variation into a genomic selection breeding program would be quite similar. In essence, the establishment phase needs to be completed for this new material, just as the initial reference population had to be evaluated to initiate genomic selection (Fig. 1). New material could be phenotyped and genotyped along with the reference population to identify whether desirable genetic variants could be introduced into the scheme (Fig. 1). Depending on the level of similarity to the reference population, it may be necessary to create seedlings from crosses to increase the number of plants that can be phenotypically evaluated and added to the reference population, allowing for more accurate prediction of new attributes. This would also allow for a statistical validation to ensure that the performance of new lines could be adequately predicted. Eventually, it should be possible to predict how effectively genomic selection can predict the performance of new lines from their genomic similarity to the reference population. If new commercial lines are moderately similar, it may be possible to initially predict them on their genotype alone. Addition of germplasm from a different breeding pool is expected to be simpler, as these lines would be relatively genetically similar to the reference population compared to wild germplasm.

The process for including attributes from wild relatives will differ depending on the genetic architecture of the trait. For quantitative traits, where genetic variation is contributed by many loci, the process described above for new cultivars would also apply. In such cases, rapid genomic selection cycling of seedlings would rapidly incorporate the desirable variation. If major QTLs or genes contribute a large proportion of the trait, more targeted and optimized introgression algorithms leading to low levels of linkage drag could be combined with genomic selection. Rather than taking many generations of ∼10 yr in a conventional program, genomic selection would enable more accurate introgression in a much shorter time frame.

### Increasing the Suite of Traits That Can Be Evaluated or Selected

Genomic selection will also allow expansion of the range of traits that can be improved. These might include traits that are evaluated late in the breeding process or that have recently become a priority, such as low acrylamide levels, which have been the subject of recent attention (Bethke and Bussan, 2013), or they could be responses to new abiotic or biotic stresses including drought, heat, salinity, or specific diseases. The reference population could also be used to examine traits for identification of more productive genotypes, such as efficient nutrient or water use. In this strategy, a reference population could be developed for a subset of genotypes to develop specific prediction equations for the new trait of interest. These specific predictions could then be incorporated within the larger set to develop a selection index that covers the expanded suite of traits. A further advantage is that selections would be made on the combined set of traits before the breeding pool is reduced by selection for a narrower set of traits. This strategy is attractive when the new trait is expensive to measure, as it allows for targeted phenotypic assessment of key lines to maximize the prediction accuracy that is then applied to genotyped seedlings.

Genomic selection may also be enhanced by high-throughput phenomics, which are expected to result in a large number of indicator traits that are substantially correlated with the characteristics used in conventional potato breeding. This ‘systems biology continuum’ extends through the transcriptome for global RNA synthesis, the proteome for total protein content, the interactome for protein–protein interactions, the glycome and ionome for carbohydrates and small charged molecules respectively, the hormonome for phytohormone signaling, and the metabolome for metabolite analysis for understanding and measuring phenotypes (Mochida and Shinozaki, 2011). The continuum further extends into phenomics for whole plants, based on real-time measurements of plant growth and development using glasshouse-based imaging facilities (Furbank, 2009; Furbank and Tester, 2011; Houle et al., 2010) or through in-field use of imaging equipment (Araus and Cairns, 2014). Metabolomic methods would enable the sugar and starch profiles of tubers to be more closely examined, whereas high-throughput imaging will allow plant biomass accumulation to be monitored under various biotic and abiotic stress conditions. The development of high-throughput phenomic platforms will enable phenotype data to be captured at a rate to match the advances that have occurred with genomic selection (Cobb et al., 2013; Mackay et al., 2015), although there will be a requirement to optimize approaches for individual plant species. In this context, a focus on species-specific traits for potato, particularly subterranean stolon and tuber characters, will be essential.

## Conclusion

Genomic selection will enable accelerated genetic gain for a combination of traits, including those under complex genetic control. The Ne of a representative potato breeding program is comparable to that of Holstein cattle, for which genomic selection has been successfully implemented in many countries. Conservatively predicted genomic selection accuracies ranged from 0.2, under conditions of low h2 and small reference populations, to 0.8 in larger reference populations. Fewer than 20,000 SNPs would be required to achieve useful predictions of genetic merit that could be applied in rapid breeding cycles to substantially increase genetic gain compared to phenotype- or even BLUP-based selection. Finally, many ancestral clones are still available, which greatly simplifies the mining of historical phenotype data for assembly of genomic selection reference populations. Taken together, these factors make genomic selection in potato thoroughly feasible and attractive.

## Acknowledgments

This work was supported by funding from the Victorian Department of Economic Development, Jobs, Transport, and Resources.