Reconstructing Ancestral DNA (at least the gaps) Using unrooted phylogeny, multiple alignment, and affine gap cost function. Work in progress.

Presentation transcript:

1 Reconstructing Ancestral DNA (at least the gaps) Using unrooted phylogeny, multiple alignment, and affine gap cost function. Work in progress.

2 Overview: Introduction; Examples; Gap graph construction; Theory; Algorithm; Results; Next steps

3 Example. N3: ???

4 Example. (a) Two long indels. N3: nnn

5 Example. (a) Two long indels. (b) Three short indels. N3: n-n

6 Example. (a) Two long indels. (b) Three short indels. Which is more parsimonious depends on the gap cost function: the cost of an indel of length k is g(k) = a + b*k. N3: nnn or n-n
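
The dependence on the gap cost function can be illustrated with a small sketch. The indel lengths below are hypothetical (the slides do not give them), as are the helper names `gap_cost` and `total_cost`; only the formula g(k) = a + b*k comes from the talk.

```python
def gap_cost(k, a, b):
    """Affine cost g(k) = a + b*k of a single indel of length k."""
    return a + b * k


def total_cost(indel_lengths, a, b):
    """Total cost of an explanation given as a list of indel lengths."""
    return sum(gap_cost(k, a, b) for k in indel_lengths)


# Hypothetical lengths, chosen only for illustration:
# explanation (a) uses two indels of length 3,
# explanation (b) uses three indels of length 1.
long_indels = [3, 3]
short_indels = [1, 1, 1]

for a, b in [(5, 3), (5, 0.5)]:
    cost_a = total_cost(long_indels, a, b)
    cost_b = total_cost(short_indels, a, b)
    cheaper = "(a)" if cost_a < cost_b else "(b)"
    print(f"a={a}, b={b}: (a) costs {cost_a}, (b) costs {cost_b}, cheaper: {cheaper}")
```

With these made-up lengths, a high extension cost b favors the three short indels, while a low b favors the two long ones.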

7 Harder Example. N8, N9, N10, N11, N12, N13: ??? Problem: find the optimal explanation for the gaps in terms of indels.

8 Gap Representation. 1. Find gap intervals. 2. Create a minimal tree covering in each gap interval: the minimal number of subtrees with gaps in all leaves.
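
Step 1 might be sketched as follows. The slides do not define a gap interval precisely, so this assumes it is a maximal run of consecutive alignment columns in which the same set of sequences has a gap; the function name `gap_intervals` and the choice to skip gap-free runs are also assumptions.

```python
from itertools import groupby


def gap_intervals(alignment):
    """Split the columns into maximal runs with an identical gap pattern.

    Returns (start, end, gapped_rows) triples with `end` exclusive.
    Runs in which no row has a gap are skipped (an assumption).
    """
    n_cols = len(alignment[0])
    # For every column, the set of row indices that hold a gap character.
    patterns = [
        frozenset(r for r, seq in enumerate(alignment) if seq[c] == "-")
        for c in range(n_cols)
    ]
    intervals = []
    col = 0
    for pattern, run in groupby(patterns):
        length = len(list(run))
        if pattern:
            intervals.append((col, col + length, pattern))
        col += length
    return intervals


# Toy alignment, invented for illustration.
alignment = [
    "AC--GT",
    "AC---T",
    "ACGTGT",
]
for start, end, rows in gap_intervals(alignment):
    print(start, end, sorted(rows))
```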

9 Gap Representation. 1. Find gap intervals. 2. Create a minimal tree covering in each gap interval: the minimal number of subtrees with gaps in all leaves. Vertex: (a) subtree with gaps in all leaves; (b) section of alignment.
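
Step 2 can be sketched under a simplifying assumption: the talk works on an unrooted phylogeny, whereas this example uses a rooted tree written as nested tuples, where the minimal covering is the set of maximal clades whose leaves are all gapped. The names `leaves` and `minimal_cover` are hypothetical.

```python
def leaves(tree):
    """Leaf set of a tree given as nested tuples with integer leaves."""
    if isinstance(tree, tuple):
        return frozenset().union(*(leaves(child) for child in tree))
    return frozenset([tree])


def minimal_cover(tree, gapped):
    """Maximal subtrees of `tree` whose leaves all lie in `gapped`.

    Taking each all-gap subtree as high in the tree as possible gives the
    smallest number of subtrees covering the gapped leaves (rooted case).
    """
    clade = leaves(tree)
    if clade <= gapped:
        return [clade]
    if not isinstance(tree, tuple):
        return []  # an ungapped leaf contributes nothing
    cover = []
    for child in tree:
        cover.extend(minimal_cover(child, gapped))
    return cover


# Toy tree and gap pattern, invented for illustration.
tree = ((0, 1), (2, (3, 4)))
gapped = frozenset({0, 1, 3})
print([sorted(c) for c in minimal_cover(tree, gapped)])
```

Each returned leaf set, paired with its gap interval, would then play the role of one vertex in the gap graph.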

16 Gap Graph Construction. 3. Create connections between neighbors v and w if one is contained in the other.
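
A minimal sketch of step 3, assuming that a vertex is a leaf set plus the index of its gap interval, that "neighbors" means vertices in adjacent gap intervals, and that "contained" refers to the leaf sets. The `Vertex` type and `containment_edges` function are hypothetical names.

```python
from typing import FrozenSet, NamedTuple


class Vertex(NamedTuple):
    leaves: FrozenSet[int]  # leaves of the all-gap subtree
    interval: int           # index of the gap interval (section of alignment)


def containment_edges(vertices):
    """Connect vertices in adjacent intervals when one leaf set contains the other."""
    edges = []
    for v in vertices:
        for w in vertices:
            adjacent = w.interval == v.interval + 1
            nested = v.leaves <= w.leaves or w.leaves <= v.leaves
            if adjacent and nested:
                edges.append((v, w))
    return edges


# Toy vertices, invented for illustration.
vertices = [
    Vertex(frozenset({0, 1}), 0),
    Vertex(frozenset({2}), 0),
    Vertex(frozenset({0, 1, 2}), 1),
    Vertex(frozenset({3}), 1),
]
for v, w in containment_edges(vertices):
    print(sorted(v.leaves), "->", sorted(w.leaves))
```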

20 What is a vertex? Either one indel created all gaps in the subtree, or the vertex (subtree) is decomposed into several indels. Algorithm goal: confirm or decompose vertices using the gap cost function.

21 Flashback: ~ Jotun’s Algorithm. This example can be solved optimally: using a=5, b=3, all vertices are confirmed, i.e., all gaps are created ‘as high as possible’ in the tree.

22 Horrific Counter Example. At first sight: confirm all vertices. (0,1), (0,1,2,3), (1,2,3,4)

23 Horrific Counter Example. At first sight: confirm all vertices: 6 indels. (0,1), (0,1,2,3), (1,2,3,4)

24 Horrific Counter Example. At first sight: confirm all vertices: 6 indels. BUT: a solution with 5 indels can be found! Depending on the gap cost function, this may be cheaper, so the first solution may not be optimal. Problem: the indel (2) is invisible. (0,1), (0,1,2,3), (1,2,3,4)

25 New Type of Connection Needed! 3. (revised) Create connections between neighbors v and w if they share leaves (previously: if one is contained in the other). The indel (2) lies in the intersection of the cousins. (0,1), (1,2,3,4), (0,1,2,3)
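
The two kinds of connection can be told apart with a small sketch. It assumes that the tuples in the counter example are leaf sets of vertices and that a "cousin" connection is one where the leaf sets overlap without either containing the other; `connection_type` is a hypothetical helper.

```python
def connection_type(v_leaves, w_leaves):
    """Classify the link between two neighboring vertices by their leaf sets."""
    v, w = frozenset(v_leaves), frozenset(w_leaves)
    if v <= w or w <= v:
        return "containment"
    if v & w:
        return "cousin"  # shared leaves, but neither contains the other
    return None          # disjoint leaf sets: no connection


# Leaf sets taken from the counter example.
print(connection_type((0, 1), (0, 1, 2, 3)))
print(connection_type((0, 1, 2, 3), (1, 2, 3, 4)))

# The leaves shared by the two cousins include 2, the "invisible" indel.
shared = frozenset((0, 1, 2, 3)) & frozenset((1, 2, 3, 4))
print(sorted(shared))
```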

26 Now The(st)ory Begins. By construction of the gap graph, we can prove two theorems. Theorem 1: Each optimal indel either corresponds directly to a vertex, or it crosses a cousin connection. Only possible optimal indels: (0,1), (3), (0,1,2,3), (1,2,3,4), (1), (4), (2). Vertices shown: (0,1), (0,1,2,3), (1,2,3,4)

27 Now Theory Begins. By construction of the gap graph, we can prove two theorems. Theorem 2: If a vertex v is decomposed in the optimal solution, all decomposing indels extend beyond v’s section of the alignment, and they do not all extend in the same direction. Thus we have to decompose none or both of (0,1,2,3) and (1,2,3,4): otherwise (2) doesn’t extend beyond the region of (0,1,2,3). Vertices shown: (0,1,2,3), (1,2,3,4)

28 Now Theory Begins. From the theorems we can prove some lemmas: 1. Leaf vertices can be confirmed. 2. Orphans / end vertices can be confirmed. 3. Patriarchs can be confirmed and trimmed.

29 Solving Earlier Example. 1. Leaf vertices can be confirmed.

30 Solving Earlier Example. 1. Leaf vertices can be confirmed. 2. Orphans / end vertices can be confirmed.

31 Solving Earlier Example. 1. Leaf vertices can be confirmed. 2. Orphans / end vertices can be confirmed. 3. Patriarchs can be confirmed and trimmed.

33 Solving Earlier Example. 1. Leaf vertices can be confirmed. 2. Orphans / end vertices can be confirmed. 3. Patriarchs can be confirmed and trimmed. 4. Mono-chain vertices can be decided locally.

34 End of Pre-Processing. In longer examples there will be undecided vertices (purple) after pre-processing. Find the possible decompositions for each vertex and check all combinations in each chain.
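
Checking all combinations in a chain amounts to an exhaustive search over the Cartesian product of each vertex's options. The sketch below is illustrative only: the option names, the cost table and the coupling penalty in `toy_cost` are invented, and counting the total as a sum over chains of per-chain products is an assumption about how the program estimates its workload.

```python
from itertools import product
from math import prod


def best_combination(chain_options, cost):
    """Try every combination of per-vertex choices in one chain; return the cheapest."""
    return min(product(*chain_options), key=cost)


def combination_count(chains):
    """Chains are searched independently, so per-chain products are summed."""
    return sum(prod(len(options) for options in chain) for chain in chains)


# Toy cost, invented for illustration: a per-choice price plus a penalty
# whenever two adjacent vertices in the chain make different choices.
PER_CHOICE = {"confirm": 8, "decompose": 5}


def toy_cost(combo):
    base = sum(PER_CHOICE[choice] for choice in combo)
    mismatches = sum(x != y for x, y in zip(combo, combo[1:]))
    return base + 10 * mismatches


chain = [["confirm", "decompose"]] * 3
best = best_combination(chain, toy_cost)
print(best, toy_cost(best))
print(combination_count([chain, [["confirm", "decompose"]]]))
```

Because chains are independent, the work grows with the largest chain rather than with the product over all undecided vertices, which is why shrinking chains during pre-processing matters.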

35 9 sequences, 60% gaps, preproc. time < 4 s
---------------------
Alignment length 3936, divided in 3922 gap intervals.
---------------------
1497 vertices undecided before trimming.
1112 vertices undecided after trimming.
---------------------
Created 8912 vertices, 871 connections.
Confirmed 5469 leaf vertices, 2285 patriarchs, 210 end vertices,
217 locally confirmed non-cousin chain vertices,
37 locally confirmed cousin chain vertices,
and 487 mono-chain decomposed vertices.
---------------------
207 vertices undecided after all preprocessing.
#chains with undecided: 89, max #undecided in same chain (C31): 7
estimated number of combinations: 2788, max in same chain: 1152
---------------------
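
The reported counts are internally consistent, as a quick check shows. The sketch assumes every decided vertex is counted in exactly one of the six categories; the dictionary keys are shortened labels, not program output.

```python
created = 8912

# Counts copied from the log above.
decided_by_category = {
    "leaf vertices": 5469,
    "patriarchs": 2285,
    "end vertices": 210,
    "non-cousin chain vertices": 217,
    "cousin chain vertices": 37,
    "mono-chain decomposed vertices": 487,
}

decided = sum(decided_by_category.values())
undecided = created - decided
print(decided, undecided)
```

The remainder matches the 207 vertices reported as undecided after all preprocessing.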

37 Is Pre-Processing Important?

9 sequences, 60% gaps; no pre-processing:
---------------------
Created 10082 vertices, 7121 connections.
---------------------
1497 vertices undecided with no preprocessing.
#chains with undecided: 950, max #undecided in same chain (C40): 10
estimated number of combinations: 71950, max in same chain: 34560

9 sequences, 60% gaps; with pre-processing:
---------------------
Created 8912 vertices, 871 connections.
---------------------
207 vertices undecided after all preprocessing.
#chains with undecided: 89, max #undecided in same chain (C31): 7
estimated number of combinations: 2788, max in same chain: 1152
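
To put the two runs side by side, the reduction factors can be computed directly from the figures above. This is plain arithmetic on the reported numbers; the dictionary keys are shortened labels chosen here.

```python
# Figures copied from the two logs above.
without_pre = {
    "connections": 7121,
    "undecided vertices": 1497,
    "chains with undecided": 950,
    "combinations": 71950,
    "max combinations in one chain": 34560,
}
with_pre = {
    "connections": 871,
    "undecided vertices": 207,
    "chains with undecided": 89,
    "combinations": 2788,
    "max combinations in one chain": 1152,
}

# Ratio of each quantity without vs. with pre-processing.
factors = {name: without_pre[name] / with_pre[name] for name in without_pre}
for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x fewer with pre-processing")
```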

38 Next Steps. Make poster for Recomb (suggestions?). Finish program. Run it on real data. Ideas for applications? (Score ranks alignment; use to find alignment.) Demo.

39 Screenshots (in case demo doesn’t work)

