Figure 1.

Illustration of a recent single-gene duplication event that produces highly similar paralogs, and of how the paralog distinguishing list (PDL) distinguishes alleles from paralogs when calling SNPs. (A) The PDL method assumes that a pair of duplicated genes fixed in the extant maize population most likely originated from a single duplication event, which in many cases was the ancient tetraploidization event. If the duplication event is sufficiently old, virtually all differences between paralogs result from mutations that have occurred since the genome duplication, and distinguishing paralogs is easy. However, if the duplication was recent and the ancestral gene was polymorphic, alternative alleles at the paralogous loci may become fixed in the population, so the number of fixed differences between the donor and derived loci may be similar to the average pairwise difference between alleles observed in maize. In these cases it is very difficult to distinguish alleles from paralogs on the basis of alignment scores alone. (B) An intra-reference alignment of B73 reference sequences discovers putative fixed differences (T/A and G/A) that differentiate the paralogs (B73 A and B73 B); these are recorded as context sequences in the PDL. Next, HpaII consensus sequences of Mo17 are aligned to the B73 reference sequences. Both the correct allelic alignment (B73 B vs. Mo17 B) and the erroneous paralogous alignment (B73 B vs. Mo17 A) detect a single-nucleotide mismatch and thus cannot be distinguished from each other on the basis of alignment scores alone. The context sequences of both single-nucleotide mismatches (A/G and C/G) are therefore searched against the PDL. The context sequence of the A/G mismatch matches a context sequence in the PDL, so the mismatch is correctly recognized as a putative fixed difference and is not called a SNP. In contrast, the context sequence of the C/G mismatch does not match any context sequence in the PDL, so that mismatch is correctly called a SNP. When B73 carries a derived allele (B73 A), the context sequence of the T/A mismatch in the allelic B73 A vs. Mo17 A comparison is also found in the PDL. This true SNP is therefore not called, because it is incorrectly scored as a putative fixed difference, which ultimately reduces SNP detection power.
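The filtering logic of panel B can be sketched in a few lines of Python. This is a minimal illustration under assumptions that the caption does not state: a "context sequence" is modeled here as a fixed number of flanking reference bases (the hypothetical constant `FLANK = 3`) plus the unordered pair of mismatched bases, alignments are assumed to be gap-free and of equal length, and the toy sequences built by the hypothetical helper `toy_locus` are invented stand-ins for the B73 and Mo17 paralogs, not real data.

```python
from typing import FrozenSet, List, Set, Tuple

# Assumption: a context key is (left flank, right flank, unordered base pair).
ContextKey = Tuple[str, str, FrozenSet[str]]

FLANK = 3  # hypothetical flank length; the caption does not specify one


def context_key(ref: str, pos: int, other_base: str, flank: int = FLANK) -> ContextKey:
    """Context of a mismatch at `pos`: flanking reference bases plus the base pair."""
    left = ref[max(0, pos - flank):pos]
    right = ref[pos + 1:pos + 1 + flank]
    return (left, right, frozenset({ref[pos], other_base}))


def mismatches(ref: str, query: str) -> List[int]:
    """Positions where two equal-length, gap-free aligned sequences differ."""
    return [i for i, (a, b) in enumerate(zip(ref, query)) if a != b]


def build_pdl(paralog_a: str, paralog_b: str, flank: int = FLANK) -> Set[ContextKey]:
    """Record putative fixed differences found by an intra-reference paralog alignment."""
    pdl: Set[ContextKey] = set()
    for i in mismatches(paralog_a, paralog_b):
        # Store the context as seen from each paralog, since a read may align to either.
        pdl.add(context_key(paralog_a, i, paralog_b[i], flank))
        pdl.add(context_key(paralog_b, i, paralog_a[i], flank))
    return pdl


def call_snps(ref: str, query: str, pdl: Set[ContextKey], flank: int = FLANK) -> List[int]:
    """Call a SNP only at mismatches whose context is NOT in the PDL."""
    return [i for i in mismatches(ref, query)
            if context_key(ref, i, query[i], flank) not in pdl]


def toy_locus(x1: str, x2: str, x3: str) -> str:
    """Hypothetical 19-bp locus with variable sites at positions 3, 9 and 15."""
    return "AAC" + x1 + "GGTCA" + x2 + "TTGCA" + x3 + "CAG"


# Toy paralogs mirroring panel B (invented sequences, not real maize data).
B73_A = toy_locus("T", "G", "C")   # carries the derived T at site 3
B73_B = toy_locus("A", "A", "C")
MO17_A = toy_locus("A", "G", "C")  # allele of B73 A
MO17_B = toy_locus("A", "A", "G")  # allele of B73 B with a true C/G SNP at site 15

pdl = build_pdl(B73_A, B73_B)  # fixed differences T/A (site 3) and G/A (site 9)

if __name__ == "__main__":
    # Allelic and paralogous alignments each show exactly one mismatch.
    print(len(mismatches(B73_B, MO17_B)), len(mismatches(B73_B, MO17_A)))  # 1 1
    print(call_snps(B73_B, MO17_B, pdl))  # [15]: C/G is not in the PDL, so it is called
    print(call_snps(B73_B, MO17_A, pdl))  # []: A/G matches the PDL, so it is filtered
    print(call_snps(B73_A, MO17_A, pdl))  # []: true T/A SNP is lost to the PDL
```

The last call reproduces the trade-off described in the caption: because the derived T in B73 A shares its context with a recorded fixed difference, the true allelic T/A SNP is filtered out along with the paralogous mismatch.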