1 James A. Foster and Luke Sheneman, 1 October 2008, Initiative for Bioinformatics and Evolutionary Studies (IBEST): Guide Trees and Progressive Multiple Sequence Alignment

2 Multiple Sequence Alignment Abstract representation of sequence homology Homologous molecular characters (nucleotides/residues) organized in columns Gaps (-) represent sequence indels

3 Multiple Sequence Alignment Many bioinformatics analyses depend on MSA. First step in inferring phylogenetic trees  MSA technique is at least as important as inference method and model parameters (Morrison & Ellis, 1997) Structural and functional sequence analyses

4 Progressive Alignment Idea: align “closely related” sequences first, two at a time with “optimal” subalignments (dynamic programming) Problem: once a gap, always a gap Advantage: fast
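
The "two at a time" step with "optimal" subalignments can be illustrated with a minimal sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch recurrence). This is not taken from the slides: the function name and the scoring values (match, mismatch, linear gap penalty) are assumptions chosen for illustration, and a real progressive aligner applies the same idea to profiles of already-aligned sequences while walking the guide tree.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not values from the talk.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Calling `needleman_wunsch("ACGT", "AGT")` aligns the two strings with a single gap in the shorter one. In a progressive aligner, a gap placed at this stage is carried into every later merge, which is the "once a gap, always a gap" problem.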

5 Guide Trees and Alignment Quality How important is it to find “good” guide trees? How much time should be spent looking for “better” guide trees?

6 Hypothesis Guide trees that are closer to the true phylogeny lead to better sequence alignments  Guide trees that are further from the true tree produce less accurate alignments.  The effect is measurable.  The correlation is significant.

7 Previous Work Folk wisdom, intuition: it matters, a lot  Basis for Clustal, and most other pMSA implementations Nelesen et al. (PSB ’08): doesn’t matter, much  No strong correlation  No large effect Edgar (2004): bad trees are sometimes better  UPGMA guide trees ultrametric but outperform NJ

8 Experimental Design: strategy For both natural data and simulation data, with reliable alignments and phylogenies: Explore the space of possible guide trees, moving outward from the “true tree”  Use each tree as a guide tree, perform pMSA  Compare quality of resulting alignment with known optimal value

9 Experimental Design: Naturally Evolved Case

10 Experimental Design: Degrading Guide Trees Random Nearest Neighbor Interchange (NNI)  Swaps two neighboring internal branches Random Tree Bisect/Reconnect (TBR) Randomly bisect tree Randomly reconnect two trees Images: hyphy.org
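
A minimal sketch of degrading a guide tree by random NNI moves, under stated assumptions: the tree is treated as a rooted binary tree stored as nested 2-tuples with string leaves, and the names `nni_neighbors` and `degrade` are hypothetical. It is meant to show the operation, not the code used in the study. TBR is not covered here.

```python
import random


def nni_neighbors(tree):
    """Yield every tree one NNI move away from a rooted binary tree.

    Trees are nested 2-tuples with string leaves, e.g. (("A", "B"), ("C", "D")).
    """
    if isinstance(tree, str):
        return
    left, right = tree
    # NNI across the edge above `left`: swap one child of `left` with `right`.
    if not isinstance(left, str):
        a, b = left
        yield ((a, right), b)
        yield ((right, b), a)
    # NNI across the edge above `right`: swap one child of `right` with `left`.
    if not isinstance(right, str):
        c, d = right
        yield (c, (left, d))
        yield (d, (c, left))
    # Moves deeper in the tree: change one subtree, keep the other as is.
    for new_left in nni_neighbors(left):
        yield (new_left, right)
    for new_right in nni_neighbors(right):
        yield (left, new_right)


def degrade(tree, steps, rng=random):
    """Apply `steps` uniformly chosen NNI moves, starting from `tree`."""
    for _ in range(steps):
        neighbors = list(nni_neighbors(tree))
        if not neighbors:  # fewer than three leaves: no NNI move exists
            break
        tree = rng.choice(neighbors)
    return tree


def leaves(tree):
    """Return the set of leaf names, used to check that moves keep all taxa."""
    if isinstance(tree, str):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])
```

For example, `degrade(((("A", "B"), "C"), ("D", "E")), 5, random.Random(0))` returns a tree on the same five leaves. Because random moves can undo each other, the number of steps applied is only an upper bound on the true NNI distance from the starting tree.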

11 TreeBASE (“natural”) Input Datasets

12 Experimental Design: Simulated Evolution Case

18 Conclusions Statistically significant correlation between guide tree quality and alignment quality  Independent of tree transformation operator  Independent of alignment distance metric But very small absolute change in quality Non-linear / logarithmic  Largest alignment quality effect 5-10 steps from phylogeny The lesson: it helps to improve a really good guide tree, otherwise it helps but only a little

19 Acknowledgements  Dr. Luke Sheneman (mostly his slides!)  Faculty, staff, and students of BCB  Jason Evans  Darin Rokyta  Funding sources:  NIH P20 RR16454  NIH NCRR 1P20 RR16448  NSF EPS 00809035

20 Experimental Design: metrics Â = pmsa(S, T)  where S is the set of input sequences  where T is the guide tree  (hidden parameters: pairwise algorithm, tie breaking strategy) A_Q = CompareAlignments(A*, Â)  QSCORE(A*, Â) -> TC-error, SP-error  Nelesen had a nicer metric: error of estimated phylogeny T_dist = TreeDistance(T*, T)  Upper bound estimate of edit distance via NNI or TBR
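
A minimal sketch of how an alignment comparison of this kind can be computed, assuming the usual definitions: the SP score is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and the TC score is the fraction of reference columns reproduced exactly. The function names are hypothetical, this is not the QSCORE implementation, and treating the slide's "error" as one minus the score is an assumption. Both alignments must contain the same sequences in the same row order, and the reference must contain at least one aligned pair.

```python
from itertools import combinations


def column_maps(alignment):
    """For each column, map row index -> index of the residue in that row.

    `alignment` is a list of equal-length gapped strings; "-" marks a gap.
    """
    counters = [0] * len(alignment)
    columns = []
    for col in range(len(alignment[0])):
        mapping = {}
        for row, seq in enumerate(alignment):
            if seq[col] != "-":
                mapping[row] = counters[row]
                counters[row] += 1
        columns.append(mapping)
    return columns


def aligned_pairs(columns):
    """Return the set of residue pairs that share a column."""
    pairs = set()
    for mapping in columns:
        for (r1, i1), (r2, i2) in combinations(sorted(mapping.items()), 2):
            pairs.add((r1, i1, r2, i2))
    return pairs


def sp_tc_scores(reference, test):
    """Return (SP, TC) of `test` measured against `reference`."""
    ref_cols = column_maps(reference)
    test_cols = column_maps(test)
    ref_pairs = aligned_pairs(ref_cols)
    sp = len(ref_pairs & aligned_pairs(test_cols)) / len(ref_pairs)
    # Assumption: only reference columns with at least two residues are scored.
    scored = [frozenset(m.items()) for m in ref_cols if len(m) >= 2]
    test_set = {frozenset(m.items()) for m in test_cols}
    tc = sum(col in test_set for col in scored) / len(scored)
    return sp, tc
```

With `reference = ["AC-GT", "ACTGT"]` and `test = ["ACG-T", "ACTGT"]`, three of the four reference pairs and three of the four scored reference columns are recovered, so both scores are 0.75.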

21 Alternative Scoring metric Idea: “quality” of an alignment is distance from the phylogeny it produces to the “true” phylogeny A_Q = KTreeDist(ML_est(A*), ML_est(Â))  ML_est(A): max likelihood estimate of the phylogeny behind MSA A (we used RAxML)  KTreeDist(T1, T2): scales T2 to T1, measures Branch Length Distance (Soria-Carrasco et al. 07; Kuhner & Felsenstein 94) Data sets: from L1 sequences in mammals, bats, humans, hand aligned A*
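
A minimal sketch of a scaled branch length distance, assuming the scaling step is the least-squares factor K that brings the second tree's branch lengths as close as possible to the first's. Each tree is represented here as a dictionary from a bipartition key to its branch length, with a bipartition missing from a tree counted as length 0. That representation and the name `k_scaled_bld` are illustrative assumptions, not the interface of the Ktreedist tool.

```python
import math


def k_scaled_bld(t1, t2):
    """Return (K, distance): the least-squares scale factor for t2 and the
    branch length distance between t1 and the rescaled t2.

    t1, t2: dicts mapping a hashable bipartition key to a branch length.
    Assumes t2 has at least one branch of non-zero length.
    """
    splits = set(t1) | set(t2)
    # K minimizes sum((b1 - K * b2) ** 2) over all bipartitions.
    numerator = sum(t1.get(s, 0.0) * t2.get(s, 0.0) for s in splits)
    denominator = sum(t2.get(s, 0.0) ** 2 for s in splits)
    k = numerator / denominator
    distance = math.sqrt(
        sum((t1.get(s, 0.0) - k * t2.get(s, 0.0)) ** 2 for s in splits)
    )
    return k, distance
```

If `t2` is just `t1` with every branch doubled, the function returns K = 0.5 and a distance of 0, so trees that differ only in overall rate are treated as identical. Bipartitions found in only one tree always contribute to the distance, which makes the measure sensitive to topology as well as branch lengths.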

22 All methods are pretty good

