Multiple Sequence Alignment by Iterative Tree-Neighbor Alignments Susan Bibeault June 9, 2000.

1 Multiple Sequence Alignment by Iterative Tree-Neighbor Alignments Susan Bibeault June 9, 2000

2 Outline
Problem Statement and Importance
Terminology
Current Approaches
Our Alignment Heuristic
Performance Results
Conclusions
Future Work

4 Multiple Sequence Alignment Problem
Given Sequence Set:
VLSPADNVKAAWGKVGAHAGEYGAEALERMF
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVY
GLSDGEWQLVLNVWGKVEADIPGHVLIRLFK
–Insert gaps into sequences so that evolutionarily conserved regions are aligned
V-LSPADN--VKAAWGKVGAHAGEYGAEALERM---F-
VHLTPEEKSAVTALWGKVNVD--EVGGEALGRLLVVYP
G-LSDGEWQLVLNVWGKVEA---DIPGHVLIRL---FK
Same sequences with the end gaps placed differently:
-VLSPADN--VKAAWGKVGAHAGEYGAEALERMF----
VHLTPEEKSAVTALWGKVNVD--EVGGEALGRLLVVYP
-GLSDGEWQLVLNVWGKVEA---DIPGHVLIRLFK---
Important tool
–Relate Homologous Proteins
–Discover Conserved Regions

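6 Scoring Multiple Alignments
Sum of Pairs: $\sum \mathrm{cost}(i,j)$
Tree based: $\sum \mathrm{cost}(\mathrm{edge})$
Example: $\sum \mathrm{cost}(i,j) = 6$, $\sum \mathrm{cost}(\mathrm{edge}) = 1$
Figure: tree with leaves gorilla, orangutan, gibbon, chimpanzee, human (label "m" in the figure)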
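
The two scoring schemes can be compared on a single alignment column. The sketch below is illustrative only: the residues assigned to each species, the tree topology, and the unit substitution cost are all assumptions, chosen because they reproduce the values 6 and 1 shown on the slide. The tree cost is counted with Fitch's small-parsimony rule, which may or may not be what the original figure used.

```python
from itertools import combinations


def unit_cost(a, b):
    """Hypothetical cost function: 0 for identical symbols, 1 otherwise."""
    return 0 if a == b else 1


def sum_of_pairs(column, cost):
    """Sum cost(i, j) over every unordered pair of symbols in one column."""
    return sum(cost(a, b) for a, b in combinations(column, 2))


def fitch_cost(tree, states):
    """Return (candidate state set, substitution count) for a nested-tuple tree.

    Leaves are species names; internal nodes are 2-tuples of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_cost(left, states)
    right_set, right_cost = fitch_cost(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets imply one substitution on an edge below this node.
    return left_set | right_set, left_cost + right_cost + 1


# Assumed column: three species share one residue, two share another.
states = {"human": "A", "chimpanzee": "A", "gorilla": "A",
          "orangutan": "G", "gibbon": "G"}
# Assumed topology, for illustration only.
tree = ((("human", "chimpanzee"), "gorilla"), ("orangutan", "gibbon"))

sp_score = sum_of_pairs(list(states.values()), unit_cost)
tree_score = fitch_cost(tree, states)[1]
print(sp_score, tree_score)  # 6 1
```

A single substitution on one edge is counted once by the tree-based score but 3 x 2 = 6 times by sum of pairs, because every cross-group pair pays for it.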

7 Alignment Scoring
Cost Matrix: $C(aa_1, aa_2)$
Gap Penalties:
–Simple: $C(aa, -)$
–Affine: $C(-) + \mathrm{Len} \cdot C(aa, -)$
$\mathrm{Cost}(s[1..i], t[1..j]) = \min\big(\mathrm{Cost}(s[1..i], t[1..j-1]) + g,\ \mathrm{Cost}(s[1..i-1], t[1..j-1]) + C(s[i], t[j]),\ \mathrm{Cost}(s[1..i-1], t[1..j]) + g\big)$
Figure: matrix with VLSPADNVKA along one axis and GLSDGEWQLVL along the other
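
A minimal sketch of the pairwise recurrence as a cost-minimizing dynamic program. It assumes the simple gap model (a fixed cost g added per gap column) and uses a hypothetical unit substitution cost in place of a real cost matrix such as PAM; the affine model from the slide is not implemented here.

```python
def unit_cost(a, b):
    """Hypothetical stand-in for the cost matrix C(aa1, aa2)."""
    return 0 if a == b else 1


def alignment_cost(s, t, sub_cost, gap):
    """Minimum global alignment cost of s and t with a per-column gap cost."""
    rows, cols = len(s) + 1, len(t) + 1
    cost = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per residue.
    for i in range(1, rows):
        cost[i][0] = i * gap
    for j in range(1, cols):
        cost[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            cost[i][j] = min(
                cost[i][j - 1] + gap,                               # gap in s
                cost[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),  # (mis)match
                cost[i - 1][j] + gap,                               # gap in t
            )
    return cost[rows - 1][cols - 1]


# The two sequences shown along the matrix axes on the slide.
print(alignment_cost("VLSPADNVKA", "GLSDGEWQLVL", unit_cost, gap=2))
```

The table has (len(s) + 1) x (len(t) + 1) cells, so time and memory both grow with the product of the two sequence lengths.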

9 Current Approaches
Global Methods
–Optimal Algorithms (MSA, MWT, MUSEQAL)
–Progressive (MULTALIGN, PILEUP, CLUSTAL, MULTAL, AMULT, DFALIGN, MAP, PRRP, AMPS)
Local methods
–PIMA, DIALIGN, PRALIGN, MACAW, BlockMaker, Iteralign
Combined (GENALIGN, ASSEMBLE, DCA)
Statistical (HMMT, SAGA, SAM, Match Box)
Parsimony (MALIGN, TreeAlign)
Global Alignment
ABCDEFGHI
::::
ABCD-FGHI
Local Alignment
XXXABCDYYY
   ::::
ZZZABCDEEEE

11 Our Heuristic
Distance Estimation
Tree Construction
Node Initialization
Tree Partitioning
Iteration

12 Estimation of Protein Distance
Aligned Sequences:
PEAAALYGRFT---IKSDVW
PESAALYGRFT---IKSDVW
PESLALYNKF---SIKSDVW
PEALNYGRY----SSESDVW
PEALNYGWY----SSESDVW
PEVIRMQDDNPFSFSQSDVY
Estimated Pair Distances
Issue: Implied vs. Optimal Pair Alignments
Pair as it appears in the multiple alignment:
PESLALYNKF---SIKSDVW
PEALNYGRY----SSESDVW
Implied pair alignment (shared gap columns removed):
PESLALYNKFSIKSDVW
PEALNYGRY-SSESDVW
Optimal pair alignment:
PESLALYNKFSIKSDVW
PEAL-NYGRYSSESDVW
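
The implied pair alignment can be read off a multiple alignment by taking two rows and dropping the columns where both rows have a gap. The sketch below does that for the pair on this slide and counts identical columns. The function names are illustrative, and identity counting is used here only as a simple stand-in for whatever distance estimate the method actually applies.

```python
def implied_pair(row_a, row_b, gap="-"):
    """Project two rows of a multiple alignment onto a pairwise alignment.

    Columns in which both rows hold a gap carry no pairwise information
    and are removed.
    """
    kept = [(a, b) for a, b in zip(row_a, row_b)
            if not (a == gap and b == gap)]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


def identities(row_a, row_b, gap="-"):
    """Count columns where both rows hold the same non-gap residue."""
    return sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)


implied = implied_pair("PESLALYNKF---SIKSDVW", "PEALNYGRY----SSESDVW")
optimal = ("PESLALYNKFSIKSDVW", "PEAL-NYGRYSSESDVW")

print(implied)               # ('PESLALYNKFSIKSDVW', 'PEALNYGRY-SSESDVW')
print(identities(*implied))  # 8
print(identities(*optimal))  # 9
```

The implied pair has one fewer identical column than the optimal pair, so a distance estimated from the multiple alignment overstates how far apart these two sequences are.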

13 Optimal Pair vs. Implied Pair

14 Interior Node Classification
Interior Nodes Classified by Percent Identity
–PID = (# matched residues) / (# total residues)
–User Specified Tiers
–User Specified Cost Criterion
Example:
–PID > 60%: PAM 40, High Gap Penalties
–PID > 40%: PAM 120, Medium Gap Penalties
–PID < 40%: PAM 200, Low Gap Penalties
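
A small sketch of tiered classification by percent identity, using the example tiers from this slide. Two points are assumptions rather than something the slide states: "total residues" is taken to mean the number of aligned columns that are not gap-only, and a PID of exactly 40% falls into the lowest tier. The names `TIERS`, `percent_identity`, and `classify` are hypothetical.

```python
# (lower bound on PID, cost matrix, gap penalty level), highest tier first.
TIERS = [
    (0.60, "PAM 40", "high"),
    (0.40, "PAM 120", "medium"),
]
DEFAULT_TIER = ("PAM 200", "low")


def percent_identity(row_a, row_b, gap="-"):
    """Fraction of non-gap-only columns in which both rows agree."""
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap or b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return matches / len(columns)


def classify(pid, tiers=TIERS, default=DEFAULT_TIER):
    """Return (matrix, gap penalty level) for the first tier that pid exceeds."""
    for lower_bound, matrix, gap_level in tiers:
        if pid > lower_bound:
            return matrix, gap_level
    return default


pid = percent_identity("PESLALYNKFSIKSDVW", "PEAL-NYGRYSSESDVW")
print(round(pid, 3), classify(pid))  # 0.529 ('PAM 120', 'medium')
```

Keeping the tiers in a plain list mirrors the "user specified" nature of the thresholds: changing the classification means editing data, not code.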

15 Ordering Alignments
Isolate Sub Trees
–Threshold PID
Order Alignments
1. Sub Tree
2. Border Nodes
3. Integrate All

16 Interior Alignments
Sum of Pairs
Bounded Search
Implementation
–Modular
–Reentrant
–Flexible Cost Criterion

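17 Generating Consensus
Alignment $(A_1, A_2, A_3)$
Consensus $X$: $\min \sum_i D_i(A_i, X)$
For Each Position $i$: $X_i = \arg\min_{\alpha} \big(\mathrm{cost}(\alpha, A_{1,i}) + \mathrm{cost}(\alpha, A_{2,i}) + \mathrm{cost}(\alpha, A_{3,i})\big)$
Figure: $X$ at the center, joined to $A_1$, $A_2$, $A_3$ by distances $D_1$, $D_2$, $D_3$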
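
The per-position rule can be sketched as a column-wise minimization. This is a simplified illustration under stated assumptions: each of the three neighbors is reduced to a single already-aligned row of equal length, the candidate alphabet and the unit cost function are hypothetical, and ties go to the first candidate in the alphabet. The original method may treat whole alignments and weight them by the distances shown in the figure.

```python
def unit_cost(a, b):
    """Hypothetical cost function: 0 for identical symbols, 1 otherwise."""
    return 0 if a == b else 1


def consensus(rows, alphabet, cost):
    """Pick, for each column, the symbol with the lowest summed cost.

    rows     -- equal-length aligned sequences (one per neighbor)
    alphabet -- candidate symbols, including the gap character if allowed
    cost     -- function cost(candidate, observed) -> number
    """
    result = []
    for column in zip(*rows):
        best = min(alphabet,
                   key=lambda symbol: sum(cost(symbol, obs) for obs in column))
        result.append(best)
    return "".join(result)


neighbors = ["AC-T", "ACGT", "GCGT"]  # illustrative rows, not from the slides
print(consensus(neighbors, "ACGT-", unit_cost))  # ACGT
```

Because each column is minimized independently, the result minimizes the total summed cost only when the cost of a row is the sum of its column costs, which holds for the simple gap model but not for affine gaps.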

19 Testing the Method
BAliBASE benchmark
–“Correct” Alignments
–Core Blocks of Conserved Motifs
–Typical “Hard Problem” Sets
Protein Parsimony
–Measures “Evolutionary Steps” of Alignment

20 Baseline: BAliBASE SP (chart; axis annotation: "better")

21 Baseline: BAliBASE TC (chart; axis annotation: "better")

22 Baseline: ProtPars (chart; axis annotation: "better")

23 Orphans/Families: BAliBASE SP (chart; axis annotation: "better")

24 Orphans/Families: ProtPars (chart; axis annotation: "better")

25 Larger Families (chart; axis annotation: "better")

27 Conclusions
Solution Quality
Captures Evolutionary Information
Iterations Converge Quickly
Useful Tool

29 Future Work
Improved Alignment Consensus
Multiple Partitioning Thresholds
Multiple Solutions
Integrated Phylogeny Modifications
Parallel Implementation

