1
Incorporating Mutations
Previously we allowed for gene variants (alleles), but without a model of how they came into being.
Rather than the coalescence of a single gene, we next consider successive generations of gene sets.
Two things to consider:
  Variants of a gene (alleles)
  Variants in allele combinations (sequences)
We begin by treating each independently.
[Figure: gene sets across successive generations Gn, Gn+1, Gn+2, Gn+3, Gn+4]
4/17/2017 Comp 790– Genealogies to Sequences
2
Infinite Alleles Model
Assumes that all that is knowable is whether two alleles are identical or different.
No spatial (i.e. sequence position) or quantitative information about the observed differences is kept.
Only keeps track of how many copies of each allele type exist.
The number of mutations that resulted in a variant is lost.
Two event types: splits and mutations.
Labels are arbitrary.
[Figure: example genealogy with successive allele configurations (A), (A,A), (B)(A), (B)(A), (B)(A,A), (B)(A)(C), (B)(A)(C,C), (B,B)(A)(C,C), (B)(D)(A)(C,C), (B)(D)(A)(C,C); leaves B, D, A, C, C]
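The two event types can be illustrated with a minimal simulation sketch. It is not taken from the slides: the function name `simulate_infinite_alleles`, the integer allele labels, and the fixed per-step mutation probability are all assumptions made for illustration. Only the count of each allele type is kept, as the model prescribes.

```python
import random
from collections import Counter

def simulate_infinite_alleles(steps, mutation_prob, seed=0):
    """Hypothetical sketch: apply `steps` random split/mutation events
    and return only how many gene copies carry each allele label."""
    rng = random.Random(seed)
    genes = [0]            # a single gene copy carrying allele 0
    next_label = 1         # labels are arbitrary; integers are used here
    for _ in range(steps):
        i = rng.randrange(len(genes))      # pick one gene copy at random
        if rng.random() < mutation_prob:
            # Mutation: the chosen copy becomes a brand-new allele type.
            genes[i] = next_label
            next_label += 1
        else:
            # Split: the chosen copy is duplicated with the same allele.
            genes.append(genes[i])
    return Counter(genes)

counts = simulate_infinite_alleles(steps=10, mutation_prob=0.3, seed=1)
```

Note that the returned counts say nothing about how many mutations separate two allele types, which is exactly the information the model discards.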
3
Infinite Sites Model
Assumes mutations are rare events.
Assumes DNA sequences are large.
Under these assumptions, multiple mutations at the same site are extremely rare.
The Infinite Sites Model assumes that multiple mutations never occur at the same sequence position.
Thus, all genes are “biallelic”: each variable position has exactly two alleles.
[Figure: example genealogy, including a lost haplotype]
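A small sketch can make the "one mutation per site" assumption concrete. The function name `simulate_infinite_sites` and the event probabilities are hypothetical, not from the slides. Every mutation opens a new sequence position, so each column of the result holds at most two alleles (0 = ancestral, 1 = derived).

```python
import random

def simulate_infinite_sites(steps, mutation_prob, seed=0):
    """Hypothetical sketch: each mutation hits a never-before-used site."""
    rng = random.Random(seed)
    population = [[]]      # one gene copy, no mutated sites yet
    for _ in range(steps):
        i = rng.randrange(len(population))
        if rng.random() < mutation_prob:
            # Mutation: append a new site; only copy i carries the derived allele.
            for k, seq in enumerate(population):
                seq.append(1 if k == i else 0)
        else:
            # Split: copy i is duplicated, inheriting all of its mutations.
            population.append(list(population[i]))
    return population

haplotypes = simulate_infinite_sites(steps=12, mutation_prob=0.5, seed=2)
```

Because no copies are ever removed in this sketch, it does not model lost haplotypes; it only shows that the number of columns equals the number of mutation events.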
4
SNP Panels
Observed haplotypes and SNPs from the previous example.
Under the Infinite Sites Model, the haplotype size (number of SNP positions) equals the number of historical mutations.
While sequences can be lost, alleles cannot, in contrast to the Infinite Alleles Model.
SNP Diversity Patterns (SDPs) can be repeated (e.g. S1 and S2).
Since the assignment of 1s and 0s is arbitrary, a SNP and its complement share the same SDP.
For N haplotypes, there are at most 2^(N-1) – 1 “possible” SDPs.
[Table: SNP panel, columns S1 to S5, rows H1 to H4; cell values not recoverable]
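The count of possible SDPs can be checked by brute force. The sketch below uses hypothetical helper names (`canonical_sdp`, `count_possible_sdps`). It treats a SNP column and its complement as the same pattern and ignores columns where every haplotype carries the same allele.

```python
from itertools import product

def canonical_sdp(column):
    """Map a 0/1 column and its complement to one representative:
    the version whose first entry is 0."""
    column = tuple(column)
    if column[0] == 1:
        return tuple(1 - x for x in column)
    return column

def count_possible_sdps(n):
    """Count distinct polymorphic patterns over n haplotypes, up to complement."""
    sdps = set()
    for column in product((0, 1), repeat=n):
        if len(set(column)) == 2:          # skip all-0 and all-1 columns
            sdps.add(canonical_sdp(column))
    return len(sdps)

# For N = 4 haplotypes this gives 2**(4-1) - 1 = 7 patterns.
n_patterns = count_possible_sdps(4)
```

The reasoning behind the formula: there are 2^N binary columns, halved by the complement symmetry, minus the one class that is not polymorphic.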
5
A Different Kind of Tree
Unrooted “perfect” phylogeny.
Nodes correspond to haplotypes (both visible and historical).
Edges correspond to SNPs.
Removal of an edge creates a bipartition of the haplotypes.
Tree leaves correspond to mutations (allele variants) that are unique to a sequence, i.e. an SDP with only one minority allele instance, a singleton.
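The link between an edge (a SNP) and a bipartition can be read directly off a SNP column. The sketch below uses hypothetical helpers `bipartition` and `is_singleton` and a made-up column. The two sides of the bipartition are the haplotypes carrying each allele, and a singleton is a column whose minority allele appears once.

```python
def bipartition(column, names):
    """Split haplotype names by the allele they carry at one SNP."""
    side0 = {name for name, allele in zip(names, column) if allele == 0}
    side1 = {name for name, allele in zip(names, column) if allele == 1}
    return side0, side1

def is_singleton(column):
    """True when the minority allele occurs in exactly one haplotype."""
    ones = sum(column)
    return min(ones, len(column) - ones) == 1

names = ["H1", "H2", "H3", "H4"]           # illustrative labels
split = bipartition((0, 1, 1, 0), names)   # ({'H1', 'H4'}, {'H2', 'H3'})
leaf_edge = is_singleton((0, 0, 1, 0))     # True: only H3 carries allele 1
```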
6
Build a Phylogenetic Tree
Assume we only have direct access to observed haplotypes.
Construct a pairwise distance matrix between haplotypes using Hamming distances.
Add the edge with the smallest distance among all node pairs whose connection does not introduce a loop.
If that smallest distance d is greater than 1, add d-1 “hidden” nodes between the pair so that adjacent nodes have a Hamming distance of 1.
Augment the distance matrix with the new nodes and claim the introduced edges.
Repeat finding the smallest distance and augmenting until the graph is fully connected.
[Tables: SNP panel (columns S1 to S5, rows H1 to H4) and the successive distance matrices over H1 to H4, augmented with hidden nodes HA and HB; cell values not recoverable]
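The first steps above can be sketched as follows. This is a simplification, not the full procedure: it computes Hamming distances and greedily adds the smallest loop-free edges, but it only reports how many hidden nodes (d-1) each edge would need. It does not insert those nodes into the distance matrix, so it can give a different topology when a hidden node is a branching point. The panel and the function names are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))

def greedy_tree_edges(haplotypes):
    """haplotypes: dict name -> tuple of 0/1 alleles.
    Returns (name_a, name_b, distance, hidden_nodes) for each accepted edge."""
    parent = {name: name for name in haplotypes}   # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]          # path compression
            x = parent[x]
        return x

    # All pairs, smallest Hamming distance first (ties keep name order).
    pairs = sorted(
        combinations(sorted(haplotypes), 2),
        key=lambda p: hamming(haplotypes[p[0]], haplotypes[p[1]]),
    )
    edges = []
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:                       # edge does not close a loop
            parent[root_a] = root_b
            d = hamming(haplotypes[a], haplotypes[b])
            edges.append((a, b, d, d - 1))
    return edges

# Illustrative panel, not the one from the slide.
panel = {
    "H1": (0, 0, 0, 0),
    "H2": (1, 0, 0, 0),
    "H3": (1, 1, 0, 0),
    "H4": (0, 0, 1, 1),
}
tree = greedy_tree_edges(panel)
```

On this panel the edge H1 to H4 has distance 2, so one hidden node would sit between them; the total edge length (4) matches the number of SNPs.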
7
Four-Gamete Test
Under the infinite sites model, every pair of SNPs exhibits no more than 3 of the 4 possible allele combinations (00, 01, 10, 11).
This is a direct consequence of there being only one mutation per site.
Showing that all SNP pairs satisfy the four-gamete test is a necessary and sufficient condition for a perfect phylogeny tree to exist.
[Table: SNP panel, columns S1 to S5, rows H1 to H4; cell values not recoverable]
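A direct way to apply the test is to collect, for every pair of SNP columns, the set of allele combinations seen across haplotypes. The sketch below is a minimal version with a hypothetical function name and made-up panels. It returns the column pairs that show all four gametes.

```python
from itertools import combinations

def four_gamete_violations(panel):
    """panel: list of haplotype rows, each an equal-length 0/1 sequence.
    Returns the (i, j) column index pairs in which all four allele
    combinations 00, 01, 10 and 11 occur."""
    n_snps = len(panel[0])
    violations = []
    for i, j in combinations(range(n_snps), 2):
        gametes = {(row[i], row[j]) for row in panel}
        if len(gametes) == 4:
            violations.append((i, j))
    return violations

# Illustrative panels, not the one from the slide.
compatible = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1)]
incompatible = [(0, 0), (0, 1), (1, 0), (1, 1)]

passes = four_gamete_violations(compatible) == []   # no pair shows 4 gametes
failing_pairs = four_gamete_violations(incompatible)
```

An empty result means every SNP pair passes, which is the condition stated above for a perfect phylogeny to exist.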
8
Hard Questions
Which SDPs are compatible with any other SNP?
  Singleton SNPs are compatible with any other SNP.
Given N distinct haplotype sequences resulting from an infinite sites model, what is the minimum number of SDPs?
  N-1 edges are the fewest necessary to connect N haplotypes into a “linear” tree. How many singleton SNPs occur in such a tree? 2.
Given N distinct haplotype sequences resulting from an infinite sites model, what is the maximum number of SDPs?
  2N-3 edges, the number of edges in an unrooted binary tree with N leaves.
9
Exercise
Consider the following SNP panel.
Does it satisfy the four-gamete test?
Construct the tree.
Is the SDP 11001^T (read as a column over H1 to H5) possible?
[Table: SNP panel, columns S1 to S5, rows H1 to H5; cell values not recoverable]
4/17/2017 Comp 790– Continuous-Time Coalescence