Published by Gervase Williamson. Modified over 8 years ago.
Multiple Sequence Alignment by Iterative Tree-Neighbor Alignments
Susan Bibeault
June 9, 2000
2 / 29 Outline
Problem Statement and Importance
Terminology
Current Approaches
Our Alignment Heuristic
Performance Results
Conclusions
Future Work
4 / 29 Multiple Sequence Alignment Problem
Given Sequence Set:
VLSPADNVKAAWGKVGAHAGEYGAEALERMF
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVY
GLSDGEWQLVLNVWGKVEADIPGHVLIRLFK
–Insert gaps into sequences so that evolutionarily conserved regions are aligned
One alignment:
V-LSPADN--VKAAWGKVGAHAGEYGAEALERM---F-
VHLTPEEKSAVTALWGKVNVD--EVGGEALGRLLVVYP
G-LSDGEWQLVLNVWGKVEA---DIPGHVLIRL---FK
An alternative alignment of the same set:
-VLSPADN--VKAAWGKVGAHAGEYGAEALERMF----
VHLTPEEKSAVTALWGKVNVD--EVGGEALGRLLVVYP
-GLSDGEWQLVLNVWGKVEA---DIPGHVLIRLFK---
Important tool
–Relate Homologous Proteins
–Discover Conserved Regions
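6 / 29 Scoring Multiple Alignments
Sum of Pairs: $\sum_{i<j} \mathrm{cost}(i,j)$; in the example, $\sum \mathrm{cost}(i,j) = 6$
Tree based: $\sum_{\mathrm{edge}} \mathrm{cost}(\mathrm{edge})$; in the example, $\sum \mathrm{cost}(\mathrm{edge}) = 1$
[Figure: example tree over gorilla, orangutan, gibbon, chimpanzee, human]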
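The two scoring schemes on this slide can be compared on a single alignment column. The sketch below is illustrative only: the residue column, the tree topology, and the unit mismatch cost are assumptions chosen so that the totals match the slide's example values (6 and 1). Fitch's small-parsimony count stands in for the tree-based cost; the slide does not say how edge costs were computed.

```python
from itertools import combinations


def sum_of_pairs(column, cost):
    """Sum cost(a, b) over every unordered pair of residues in one column."""
    return sum(cost(a, b) for a, b in combinations(column, 2))


def fitch_cost(tree, residues):
    """Fitch small parsimony on a rooted binary tree.

    tree is a leaf name or a (left, right) tuple; residues maps leaf -> residue.
    Returns (candidate residue set at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {residues[tree]}, 0
    left_set, left_cost = fitch_cost(tree[0], residues)
    right_set, right_cost = fitch_cost(tree[1], residues)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint candidate sets force one change on an edge below this node.
    return left_set | right_set, left_cost + right_cost + 1


def unit(a, b):
    """Assumed cost function: 0 for identical residues, 1 otherwise."""
    return 0 if a == b else 1


# Hypothetical column and topology, not taken from the slide's figure.
residues = {"human": "A", "chimpanzee": "A", "gorilla": "A",
            "orangutan": "G", "gibbon": "G"}
tree = ((("human", "chimpanzee"), "gorilla"), ("orangutan", "gibbon"))

sp = sum_of_pairs(list(residues.values()), unit)   # 6: every A/G pair counts
_, tree_cost = fitch_cost(tree, residues)          # 1: a single change on the tree
```

A single substitution is counted six times by sum of pairs but only once along the tree, which is why the two schemes can rank alignments differently.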
7
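7 / 29 Alignment Scoring
Cost Matrix: $C(aa_1, aa_2)$
Gap Penalties:
–Simple: $C(aa, -)$
–Affine: $C(-) + \mathrm{Len} \cdot C(aa, -)$
Pairwise recurrence, with gap cost $g$:
$$\mathrm{Cost}(s[1..i], t[1..j]) = \min \begin{cases} \mathrm{Cost}(s[1..i], t[1..j-1]) + g \\ \mathrm{Cost}(s[1..i-1], t[1..j-1]) + C(s[i], t[j]) \\ \mathrm{Cost}(s[1..i-1], t[1..j]) + g \end{cases}$$
[Figure: dynamic programming matrix for VLSPADNVKA against GLSDGEWQLVL]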
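The pairwise recurrence on this slide can be sketched as a small dynamic program. It is read here as a cost minimization in which gap and substitution costs are added, using the simple (per-residue) gap penalty rather than the affine one. The function name, the unit substitution cost, and the gap value of 2 are assumptions for illustration.

```python
def alignment_cost(s, t, sub_cost, gap):
    """Minimum global alignment cost with a fixed per-residue gap cost."""
    # cost[i][j] is the cheapest alignment of s[:i] with t[:j].
    cost = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        cost[i][0] = i * gap          # s[:i] aligned against gaps only
    for j in range(1, len(t) + 1):
        cost[0][j] = j * gap          # t[:j] aligned against gaps only
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            cost[i][j] = min(
                cost[i][j - 1] + gap,                               # gap in s
                cost[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),  # match or substitution
                cost[i - 1][j] + gap,                               # gap in t
            )
    return cost[len(s)][len(t)]


def unit(a, b):
    """Assumed substitution cost: 0 for identical residues, 1 otherwise."""
    return 0 if a == b else 1


# The two sequence prefixes shown on the slide's matrix axes.
example = alignment_cost("VLSPADNVKA", "GLSDGEWQLVL", unit, 2)
```

A real scoring scheme would replace `unit` with a lookup into a cost matrix such as a PAM-derived table, and an affine penalty would need separate tables for open and extended gaps.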
9 / 29 Current Approaches
Global Methods
–Optimal Algorithms (MSA, MWT, MUSEQAL)
–Progressive (MULTALIGN, PILEUP, CLUSTAL, MULTAL, AMULT, DFALIGN, MAP, PRRP, AMPS)
Local methods
–PIMA, DIALIGN, PRALIGN, MACAW, BlockMaker, Iteralign
Combined (GENALIGN, ASSEMBLE, DCA)
Statistical (HMMT, SAGA, SAM, Match Box)
Parsimony (MALIGN, TreeAlign)
Global Alignment
ABCDEFGHI
::::
ABCD-FGHI
Local Alignment
XXXABCDYYY
   ::::
ZZZABCDEEEE
11 / 29 Our Heuristic
Distance Estimation
Tree Construction
Node Initialization
Tree Partitioning
Iteration
12 / 29 Estimation of Protein Distance
Aligned Sequences
PEAAALYGRFT---IKSDVW
PESAALYGRFT---IKSDVW
PESLALYNKF---SIKSDVW
PEALNYGRY----SSESDVW
PEALNYGWY----SSESDVW
PEVIRMQDDNPFSFSQSDVY
Estimated Pair Distances
PEALNYGWY----SSESDVW
PEVIRMQDDNPFSFSQSDVY
Issue: Implied vs. Optimal Pair Alignments
Implied pair alignment (shared gap columns removed):
PESLALYNKFSIKSDVW
PEALNYGRY-SSESDVW
Pair alignment computed directly:
PESLALYNKFSIKSDVW
PEAL-NYGRYSSESDVW
Rows 3 and 4 of the multiple alignment:
PESLALYNKF---SIKSDVW
PEALNYGRY----SSESDVW
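The implied-versus-optimal issue can be made concrete with the slide's own sequences. The sketch below projects two rows of the multiple alignment onto a pairwise alignment and compares a simple distance estimate against the second pair alignment shown on the slide. The mismatch-fraction estimator and both function names are assumptions; the slide does not state which distance measure the method uses.

```python
def implied_pair(row_a, row_b, gap="-"):
    """Project two rows of a multiple alignment onto a pairwise alignment
    by dropping the columns where both rows hold a gap."""
    kept = [(a, b) for a, b in zip(row_a, row_b) if not (a == gap and b == gap)]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


def mismatch_fraction(row_a, row_b, gap="-"):
    """Fraction of differing residues over columns where neither row has a gap."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    return sum(a != b for a, b in pairs) / len(pairs)


# Rows 3 and 4 of the slide's multiple alignment.
a, b = implied_pair("PESLALYNKF---SIKSDVW", "PEALNYGRY----SSESDVW")
# a == "PESLALYNKFSIKSDVW", b == "PEALNYGRY-SSESDVW"

implied_estimate = mismatch_fraction(a, b)                                   # 0.5
direct_estimate = mismatch_fraction("PESLALYNKFSIKSDVW", "PEAL-NYGRYSSESDVW")  # 0.4375
```

Under this estimator the pair implied by the multiple alignment looks more distant than the directly aligned pair, so distances read off an existing multiple alignment can overstate divergence.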
13 / 29 Optimal Pair vs. Implied Pair
14 / 29 Interior Node Classification
Interior Nodes Classified by Percent Identity
–PID = (# matched residues) / (# total residues)
–User Specified Tiers
–User Specified Cost Criterion
Example:
–PID > 60%: PAM 40, High Gap Penalties
–PID > 40%: PAM 120, Medium Gap Penalties
–PID < 40%: PAM 200, Low Gap Penalty
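The tier lookup in the example might be sketched as below. Two points are assumptions, not statements from the slide: "total residues" is taken to mean the columns in which both rows hold a residue, and a PID of exactly 40% falls into the lowest tier because the slide leaves that boundary unspecified. The names `TIERS`, `DEFAULT_TIER`, and `classify` are hypothetical.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Matched residues divided by compared residues (gap columns skipped)."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    return sum(a == b for a, b in pairs) / len(pairs)


# (lower PID bound, cost matrix, gap penalty level), highest tier first.
TIERS = [(0.60, "PAM 40", "high"), (0.40, "PAM 120", "medium")]
DEFAULT_TIER = ("PAM 200", "low")


def classify(pid):
    """Return the (matrix, gap level) of the first tier whose bound pid exceeds."""
    for bound, matrix, gap_level in TIERS:
        if pid > bound:
            return matrix, gap_level
    return DEFAULT_TIER


pid = percent_identity("PESLALYNKFSIKSDVW", "PEAL-NYGRYSSESDVW")  # 9 of 16 = 0.5625
tier = classify(pid)                                               # ("PAM 120", "medium")
```

Keeping the tiers in a plain list mirrors the "user specified" wording: thresholds and matrices can be changed without touching the lookup.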
15 / 29 Ordering Alignments
Isolate Sub Trees
Threshold PID
Order Alignments
1. Sub Tree
2. Border Nodes
3. Integrate All
16 / 29 Interior Alignments
Sum of Pairs
Bounded Search
Implementation
–Modular
–Reentrant
–Flexible Cost Criterion
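17 / 29 Generating Consensus
Alignment (A1, A2, A3)
Consensus X: $\min \sum_{k} D_k(A_k, X)$
For Each Position i: $X_i = \arg\min_{x} \left( \mathrm{cost}(x, A1_i) + \mathrm{cost}(x, A2_i) + \mathrm{cost}(x, A3_i) \right)$
[Figure: node X joined to neighbors A1, A2, A3 by distances D1, D2, D3]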
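The per-position rule on this slide can be sketched as a column-wise minimization. This is a minimal illustration, assuming the three neighbor rows are already aligned to a common length, that the gap character is a legal consensus symbol, and that a unit mismatch cost is acceptable; the function name, the alphabet, and the example rows are hypothetical.

```python
def consensus(rows, cost, alphabet):
    """For each column, pick the symbol with the smallest summed cost to the
    symbols the neighbor rows hold in that column (ties go to the symbol
    that comes first in alphabet)."""
    return "".join(
        min(alphabet, key=lambda x: sum(cost(x, a) for a in column))
        for column in zip(*rows)
    )


def unit(a, b):
    """Assumed cost function: 0 for identical symbols, 1 otherwise."""
    return 0 if a == b else 1


# Hypothetical aligned neighbor rows standing in for A1, A2, A3.
x = consensus(["ACG-", "ACT-", "AGT-"], unit, "ACGT-")   # "ACT-"
```

With a unit cost this reduces to a majority vote per column; a substitution matrix as the cost function can instead select a symbol that none of the neighbors holds.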
19 / 29 Testing the Method
BAliBASE benchmark
–“Correct” Alignments
–Core Blocks of Conserved Motifs
–Typical “Hard Problem” Sets
Protein Parsimony
–Measures “Evolutionary Steps” of Alignment
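20 / 29 Baseline BAliBASE SP [Figure: results chart, direction marked "better"]
21 / 29 Baseline BAliBASE TC [Figure: results chart, direction marked "better"]
22 / 29 Baseline - ProtPars [Figure: results chart, direction marked "better"]
23 / 29 Orphans/Families BAliBASE SP [Figure: results chart, direction marked "better"]
24 / 29 Orphans/Families ProtPars [Figure: results chart, direction marked "better"]
25 / 29 Larger Families [Figure: results chart, direction marked "better"]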
27 / 29 Conclusions
Solution Quality
Captures Evolutionary Information
Iterations Converge Quickly
Useful Tool
29 / 29 Future Work
Improved Alignment Consensus
Multiple Partitioning Thresholds
Multiple Solutions
Integrated Phylogeny Modifications
Parallel Implementation