Published by Violet Holt. Modified over 9 years ago.
1
A Linear-Time Algorithm for the Perfect Phylogeny Haplotyping (PPH) Problem
Zhihong Ding, Vladimir Filkov, Dan Gusfield
RECOMB 2005, pp. 585–600
Date: Nov. 23, 2005
Introducer: Hsing-Yen Ann
Modified from: http://wwwcsif.cs.ucdavis.edu/~gusfield/LPPH_RECOMB05.ppt
2
Abstract
Since the introduction of the Perfect Phylogeny Haplotyping (PPH) Problem in RECOMB 2002, the problem of finding a linear-time (deterministic, worst-case) solution for it has remained open, despite broad interest in the PPH problem and a series of papers on various aspects of it. In this paper we solve the open problem, giving a practical, deterministic linear-time algorithm based on a simple data structure and simple operations on it. The method is straightforward to program and has been fully implemented. Simulations show that it is much faster in practice than prior methods. The value of a linear-time solution to the PPH problem is partly conceptual and partly for use in the inner loop of algorithms for more complex problems, where the PPH problem must be solved repeatedly.
3
Haplotypes to Genotypes
Each individual has two haplotypes. Experimental results report only their merge, the genotype for the individual.

Sites:        1 2 3 4 5 6 7 8 9
Haplotype 1:  0 1 1 1 0 0 1 1 0
Haplotype 2:  1 1 0 1 0 0 1 0 0
Genotype:     2 1 2 1 0 0 1 2 0

Merge rule at each site:
- two 0s give 0
- two 1s give 1
- one 0 and one 1 give 2
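The merge rule on this slide can be sketched in a few lines of Python. The function name `merge_haplotypes` is hypothetical and chosen only for illustration; the two haplotypes are the ones shown on the slide.

```python
def merge_haplotypes(h1, h2):
    """Combine two 0/1 haplotypes into a genotype over {0, 1, 2}."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same sites")
    # Equal alleles keep their value; differing alleles become 2.
    return [a if a == b else 2 for a, b in zip(h1, h2)]


h1 = [0, 1, 1, 1, 0, 0, 1, 1, 0]
h2 = [1, 1, 0, 1, 0, 0, 1, 0, 0]
genotype = merge_haplotypes(h1, h2)
print(genotype)  # [2, 1, 2, 1, 0, 0, 1, 2, 0]
```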
4
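Genotypes to Haplotypes
Sites:        1 2 3 4 5 6 7 8 9
Haplotype 1:  0 1 1 1 0 0 1 1 0
Haplotype 2:  1 1 0 1 0 0 1 0 0
Genotype:     2 1 2 1 0 0 1 2 0

Going back from the genotype to the two haplotypes, each site resolves as:
- 0 gives (0, 0)
- 1 gives (1, 1)
- 2 gives (1, 0) or (0, 1)

A genotype with k sites of value 2 therefore has 2^k possible solutions (2^(k-1) if the two haplotypes are treated as an unordered pair).

Haplotype Inference Problem: Given a set of n genotypes (on the same sites), determine the original set of n haplotype pairs that generated the n genotypes.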
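To make the count of candidate solutions concrete, here is a minimal sketch that enumerates every haplotype pair consistent with one genotype. The name `explaining_pairs` is an assumption made for this example. The sketch counts ordered pairs, which gives 2^k for k sites of value 2, and then collapses them to unordered pairs.

```python
from itertools import product


def explaining_pairs(genotype):
    """Yield every ordered haplotype pair that merges to `genotype`."""
    het = [i for i, g in enumerate(genotype) if g == 2]
    for bits in product((0, 1), repeat=len(het)):
        # Sites with value 0 or 1 are copied unchanged into both haplotypes.
        h1 = list(genotype)
        h2 = list(genotype)
        # Each site with value 2 gets complementary alleles.
        for site, b in zip(het, bits):
            h1[site], h2[site] = b, 1 - b
        yield tuple(h1), tuple(h2)


genotype = [2, 1, 2, 1, 0, 0, 1, 2, 0]
pairs = list(explaining_pairs(genotype))
unordered = {frozenset(p) for p in pairs}
print(len(pairs), len(unordered))  # 8 4
```

The exponential growth in k is why an extra constraint, such as the perfect phylogeny model, is needed to pick out plausible solutions.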
5
The Perfect Phylogeny Model of Haplotype Evolution
[Figure: a tree over sites 1 2 3 4 5. The ancestral haplotype 00000 is at the root, site mutations 1 to 5 label the edges, and the extant haplotypes are at the leaves. Haplotypes shown: 10100, 10000, 01011, 00010, 01010.]
Perfect: a site never mutates twice.
6
The Perfect Phylogeny Haplotyping (PPH) Problem
Given a set of genotypes, find an explaining set of haplotypes that fits a perfect phylogeny.

Genotype matrix
Site:  1 2
a      2 2
b      0 2
c      1 0

Haplotype matrix
Site:  1 2
a      1 0
a      0 1
b      0 0
b      0 1
c      1 0
c      1 0

[Figure: perfect phylogeny with root 00 (b), an edge labeled site 1 leading to 10 (a, c, c), and an edge labeled site 2 leading to 01 (a, b).]
7
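The Perfection
An example that does not fit a perfect phylogeny.

Genotype matrix
Site:  1 2
a      2 2
b      0 2
c      1 0

Haplotype matrix
Site:  1 2
a      1 1
a      0 0
b      0 0
b      0 1
c      1 0
c      1 0

Here the haplotypes 00, 01, 10 and 11 all occur, so some site would have to mutate twice.
[Figure: attempted tree over the haplotypes 00, 10, 01 and 11, marked "Not Perfect!!".]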
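One way to check an example like this mechanically is the standard four-gamete condition for binary characters: a 0/1 haplotype matrix admits a perfect phylogeny only if no pair of sites shows all four combinations 00, 01, 10 and 11. The sketch below is an illustration, not the method of the paper. The function name is hypothetical, and the two matrices are the haplotype matrices as read from this slide and the previous one.

```python
from itertools import combinations


def passes_four_gamete_test(haplotypes):
    """Return True if no pair of sites shows all of 00, 01, 10, 11."""
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct allele combinations seen at sites i and j.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


fits = [(1, 0), (0, 1), (0, 0), (0, 1), (1, 0), (1, 0)]
fails = [(1, 1), (0, 0), (0, 0), (0, 1), (1, 0), (1, 0)]
print(passes_four_gamete_test(fits), passes_four_gamete_test(fails))  # True False
```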
8
Prior Work
Several existing algorithms:
- A complex nearly-linear-time algorithm, with a small bug, runs in O(nm α(nm)) time.
- Two simpler but slower algorithms run in O(nm^2) time.
Contribution of this paper:
- A linear-time (O(nm)) algorithm.
- It uses a simple data structure, the shadow tree, and some simple operations on it.
9
Shadow Tree (1/7 to 7/7)
[Figure, animated over slides 9 to 15: a shadow tree with a root and edges 1 to 5, each with a matching shadow edge. Legend: tree edge, shadow edge, class, free link, fixed link, flipping, classes merge.]
16
The Algorithm
- Process the genotype matrix one row at a time, starting at the first row, and modify the shadow tree.
- While processing an element in one row, there are at most 4+3 cases, and each case can be handled in constant time.
- Assumption: the genotype matrix contains only entries of value 0 and 2.
17
OldEntryList
Genotype Matrix
Column:  1 2 3 4 5
Row 1:   2 2 2 0 0
Row 2:   2 0 0 2 2
Row 3:   2 2 2 0 2
Row 4:   2 0 0 2 0

OldEntryList: the column indices that have an entry of value 2 in this row and also have an entry of value 2 in some previous row.
OldEntryList for row 3: 1, 2, 3, 5
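The OldEntryList definition can be sketched directly from the genotype matrix on this slide. The function name `old_entry_list` and the 1-based row and column convention are assumptions made for this example.

```python
def old_entry_list(matrix, row):
    """Columns (1-based) with a 2 in `row` (1-based) and in an earlier row."""
    seen = set()
    for earlier in matrix[: row - 1]:
        # Record every column that already held a 2 in a previous row.
        seen.update(c for c, v in enumerate(earlier, start=1) if v == 2)
    return [
        c
        for c, v in enumerate(matrix[row - 1], start=1)
        if v == 2 and c in seen
    ]


genotypes = [
    [2, 2, 2, 0, 0],
    [2, 0, 0, 2, 2],
    [2, 2, 2, 0, 2],
    [2, 0, 0, 2, 0],
]
print(old_entry_list(genotypes, 3))  # [1, 2, 3, 5]
```

This version rescans the earlier rows on every call, which is fine for illustration. An implementation aiming at linear total time would more likely keep the set of seen columns up to date as rows are processed.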
18
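Shadow Tree After Processing the First Two Rows
[Figure: the shadow tree built from rows 1 and 2 of the genotype matrix, with a root and tree and shadow edges 1 to 5, shown next to the matrix with row 3 highlighted.]

Genotype Matrix
Column:  1 2 3 4 5
Row 1:   2 2 2 0 0
Row 2:   2 0 0 2 2
Row 3:   2 2 2 0 2
Row 4:   2 0 0 2 0

OldEntryList for row 3: 1, 2, 3, 5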
19
Algorithm – FirstPath
[Figure, animated: the shadow tree with tree and shadow edges 1 to 5 while FirstPath consumes the OldEntryList and fills the CheckList; the CheckList is shown as 3, 2.]
Edges 4 and 5 cannot be on the same path to the root in any PPH solution.
20
Algorithm – SecondPath
[Figure, animated: the shadow tree with tree and shadow edges 1 to 5 while SecondPath runs.]
OldEntryList: 1, 2, 3, 5
CheckList: 3
21
Shadow Tree to PPH Solutions (1/2)
Genotype Matrix
Site:  1 2 3 4 5
a      2 2 2 0 0
b      2 0 0 2 2
c      2 2 2 0 2
d      2 0 0 2 0

[Figure: the final shadow tree, with a root and tree and shadow edges 1 to 5, next to one PPH solution drawn as a tree over edges 1 to 5.]
22
Shadow Tree to PPH Solutions (2/2)
[Figure: the final shadow tree, with a root and tree and shadow edges 1 to 5, next to a second PPH solution drawn as a tree over edges 1 to 5 with leaves labeled a,d / b,c / b,d / a,c.]
23
The End
24
A P-Class of PPH Solutions
- P-Class: maximum common subgraph in all PPH solutions.
- Each P-Class consists of two subtrees.

Genotype Matrix
Site:  1 2 3 4 5
a      2 2 2 0 0
b      2 0 0 2 2
c      2 2 2 0 2
d      2 0 0 2 0

[Figure: one PPH solution for genotypes a, b, c, d, drawn as a rooted tree over edges 1 to 5 with leaves labeled a,d / a,c / b,d / b,c.]
25
P-Class Property of PPH Solutions
All PPH solutions can be obtained by choosing how to flip each P-Class.
[Figure: one PPH solution and a second PPH solution side by side, each a rooted tree over edges 1 to 5 with leaves labeled a,d / a,c / b,c / b,d; the switching points between the two are marked.]
26
The Key Theorem
- Every PPH solution can be obtained by choosing a flip for each P-Class.
- Conversely, after fixing one P-Class, every distinct choice of flips of the P-Classes leads to a distinct PPH solution.
- If there are k P-Classes, there are 2^(k-1) distinct PPH solutions.
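The count in the theorem follows from holding one P-Class fixed and letting each of the remaining k - 1 classes be flipped or not. A minimal sketch of that enumeration, with a hypothetical function name and a plain tuple of booleans standing in for the flip of each class:

```python
from itertools import product


def flip_choices(k):
    """Yield flip vectors for k P-Classes with the first class held fixed."""
    if k < 1:
        raise ValueError("need at least one P-Class")
    # Only the remaining k - 1 classes are free to flip.
    for rest in product((False, True), repeat=k - 1):
        yield (False,) + rest


choices = list(flip_choices(3))
print(len(choices))  # 4
```

Turning each flip vector into an actual haplotype assignment needs the P-Class structure itself, which this sketch does not model.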
27
Shadow Tree
- Contains classes.
- Each class in the shadow tree is a subgraph of a P-Class.
- Merging classes results in larger classes; classes are never split.
- Contains tree edges and shadow edges.
28
Overview of the Algorithm for One Row
- Procedure FirstPath
- Procedure SecondPath
- Procedure FixTree
- Procedure NewEntries
29
Procedures FirstPath and SecondPath
- FirstPath: Construct a first path towards the root of the shadow tree that passes through the tree edges of as many columns in OldEntryList as possible.
- SecondPath: Construct a second path towards the root of the shadow tree that passes through the tree edges of the columns in OldEntryList that are not on the first path.