Catherine S. Grasso Christopher J. Lee Multiple Sequence Alignment Construction, Visualization, and Analysis Using Partial Order Graphs.

Slides:



Advertisements
Similar presentations
Multiple Sequence Alignment (MSA) I519 Introduction to Bioinformatics, Fall 2012.
Advertisements

Multiple Sequence Alignment
درس بیوانفورماتیک December 2013 مدل ‌ مخفی مارکوف و تعمیم ‌ های آن به نام خدا.
COFFEE: an objective function for multiple sequence alignments
Multiple alignment: heuristics. Consider aligning the following 4 protein sequences S1 = AQPILLLV S2 = ALRLL S3 = AKILLL S4 = CPPVLILV Next consider the.
Introduction to Bioinformatics Algorithms Multiple Alignment.
Multiple sequence alignment Conserved blocks are recognized Different degrees of similarity are marked.
Multiple Sequence Alignment Algorithms in Computational Biology Spring 2006 Most of the slides were created by Dan Geiger and Ydo Wexler and edited by.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez.
Multiple alignment June 29, 2007 Learning objectives- Review sequence alignment answer and answer questions you may have. Understand how the E value may.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2005.
Introduction to Bioinformatics Algorithms Multiple Alignment.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2004.
Performance Optimization of Clustal W: Parallel Clustal W, HT Clustal and MULTICLUSTAL Arunesh Mishra CMSC 838 Presentation Authors : Dmitri Mikhailov,
Multiple sequence alignments and motif discovery Tutorial 5.
Multiple alignment: heuristics
Multiple sequence alignment
Pairwise Alignment Global & local alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 20, 2003.
Multiple sequence alignment Conserved blocks are recognized Different degrees of similarity are marked.
Phylogenetic Shadowing Daniel L. Ong. March 9, 2005RUGS, UC Berkeley2 Abstract The human genome contains about 3 billion base pairs! Algorithms to analyze.
Protein Multiple Sequence Alignment Sarah Aerni CS374 December 7, 2006.
Multiple Sequence Alignments
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 10, 2005.
Pairwise Alignment of Metamorphic Computer Viruses Student:Scott McGhee Advisor:Dr. Mark Stamp Committee:Dr. David Taylor Dr. Teng Moh.
Deepak Verghese CS 6890 Gene Finding With A Hidden Markov model Of Genomic Structure and Evolution. Jakob Skou Pedersen and Jotun Hein.
Introduction to Bioinformatics Algorithms Multiple Alignment.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
Introduction to Bioinformatics Algorithms Multiple Alignment.
Chapter 5 Multiple Sequence Alignment.
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
Developing Pairwise Sequence Alignment Algorithms
Multiple Alignment Modified from Tolga Can’s lecture notes (METU)
Sequence Alignment and Phylogenetic Prediction using Map Reduce Programming Model in Hadoop DFS Presented by C. Geetha Jini (07MW03) D. Komagal Meenakshi.
Multiple sequence alignment
Biology 4900 Biocomputing.
Multiple Sequence Alignments and Phylogeny.  Within a protein sequence, some regions will be more conserved than others. As more conserved,
Multiple Sequence Alignment May 12, 2009 Announcements Quiz #2 return (average 30) Hand in homework #7 Learning objectives-Understand ClustalW Homework#8-Due.
Multiple sequence alignment (MSA) Usean sekvenssin rinnastus Petri Törönen Help contributed by: Liisa Holm & Ari Löytynoja.
1 Generalized Tree Alignment: The Deferred Path Heuristic Stinus Lindgreen
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Eric C. Rouchka, University of Louisville SATCHMO: sequence alignment and tree construction using hidden Markov models Edgar, R.C. and Sjolander, K. Bioinformatics.
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
Multiple Alignments Motifs/Profiles What is multiple alignment? HOW does one do this? WHY does one do this? What do we mean by a motif or profile? BIO520.
Using the T-Coffee Multiple Sequence Alignment Package I - Overview Cédric Notredame Comparative Bioinformatics Group Bioinformatics and Genomics Program.
Multiple Sequence Alignments Craig A. Struble, Ph.D. Department of Mathematics, Statistics, and Computer Science Marquette University.
Using Traveling Salesman Problem Algorithms to Determine Multiple Sequence Alignment Orders Weiwei Zhong.
Bioinformatics Multiple Alignment. Overview Introduction Multiple Alignments Global multiple alignment –Introduction –Scoring –Algorithms.
Input Sensitive Algorithms for Multiple Sequence Alignment Pankaj Yonatan University Rachel
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight.
Multiple sequence alignment
HMMs for alignments & Sequence pattern discovery I519 Introduction to Bioinformatics.
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Motif discovery and Protein Databases Tutorial 5.
Intro to Alignment Algorithms: Global and Local Intro to Alignment Algorithms: Global and Local Algorithmic Functions of Computational Biology Professor.
Introduction to Bioinformatics Algorithms Multiple Alignment Lecture 20.
Sequence Alignment.
Sequence Alignment Abhishek Niroula Department of Experimental Medical Science Lund University
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
Clustering [Idea only, Chapter 10.1, 10.2, 10.4].
Pairwise Sequence Alignment. Three modifications for local alignment The scoring system uses negative scores for mismatches The minimum score for.
The ideal approach is simultaneous alignment and tree estimation.
Sequence comparison: Dynamic programming
Intro to Alignment Algorithms: Global and Local
In Bioinformatics use a computational method - Dynamic Programming.
CIS595: Lecture 5 Acknowledgement:
Sequence Alignment Algorithms Morten Nielsen BioSys, DTU
MULTIPLE SEQUENCE ALIGNMENT
Clustering.
Presentation transcript:

Catherine S. Grasso Christopher J. Lee Multiple Sequence Alignment Construction, Visualization, and Analysis Using Partial Order Graphs

Overview of Talk Intro to Partial Order Multiple Sequence Alignment Representation Multiple Sequence Alignment Construction Using Partial Order Graphs Conclusions

Q: Why Do Multiple Sequence Alignment? A: To model the process which constructed a set of sequences from a common source sequence.

A multiple sequence alignment allows biologists to infer: Protein Structure Protein Function Protein Domains Protein Active Sites Splice Sites Regulatory Motifs Single Nucleotide Polymorphisms mRNA Isoforms For example, protein sequences that are >30% identical often have the same structure and function.

Row Column Multiple Sequence Alignment RC-MSA

While it is easy to interpret single residue changes in this format, Large scale changes are not easy to interpret. RC-MSA Representation Does Not Reveal Large Scale Features

The Scale of Features of Interest Should Inform MSA Representation Features from single residue changes can be easily seen in RC-MSA Representation:  Regulatory Motifs  Single Nucleotide Polymorphisms  Promoter Binding Sites Features from large scale changes cannot:  Protein Domain Differences  Alternative Splicing  Genome Duplications

A:A’:.....ACATGTCGAT.....AGGTG TGCAC.....TCGATACATAAGGTG ACATG.....TCGAT.....AGGTG.....TGCACTCGATACATAAGGTG Degeneracy of RC-MSA Representation Alignment A is biologically equivalent to alignment A’. However, they look different solely due to representation degeneracy. We’d like a representation that is not degenerate.

What do we really want to know about an MSA? 1.The order of letters within a sequence. 5’ to 3’ or N-terminal to C-terminal. 2.Which letters are aligned between sequences. One sequence can impose its order on another sequence only through alignment. What do we really want to do with an MSA? We want to use it as an object in multiple sequence alignment method. We want to analyze it for biologically interesting features.

Partial Order Multiple Sequence Alignment PO-MSA Conventional Format Draw each sequence as a directed graph: node for each letter, connect by directed edges Fuse aligned, identical letters (RC-MSA) (PO-MSA)

Returning to the previous example….....ACATGTCGAT.....AGGTG TGCAC.....TCGATACATAAGGTG ACATG.....TCGAT.....AGGTG.....TGCACTCGATACATAAGGTG In the PO-MSA format, both A and A’ Can be represented as A:A’: ACATG TCGAT ACAT AGGTG TGCACA

RC-MSA in Text Format Hand Rendered PO-MSA Showing Domain Structure POAVIZ Rendered PO-MSA Reflects Domain Structure Real Example: Human SH2 Domain Containing Proteins

We want to analyze it for biologically interesting features. We want to use it as an object for building multiple sequence alignments. What do we really want to do with an MSA?

Multiple Sequence Alignment Construction Using PO-MSAs

Pair-wise Sequence Alignment Using Dynamic Programming Finding a PSA = Finding a path through a 2-Dim matrix. It’s O(L 2 ), where L is the sequence length.

Multiple Sequence Alignment Using Dynamic Programming Finding an MSA = Finding a path through an N-Dim matrix. It’s O(L N ), where N is the number of sequences and L is the sequence length. Note: More than 5 sequences takes a prohibitive amount of time. Heuristic methods, such as those used by CLUSTAL W, are used instead.

Progressive Alignment (CLUSTAL W) Approach 1. Compute pairwise distances of all N sequences. 2. Build Guide Tree seqB seqD seqC seqA seqE seqF ACDBEF 3. Align N sequences using guide tree. a. Use standard PSA to align leaf sequence. b. Profile multiple sequence alignments at branch nodes. c. Use standard PSA on profiles. d. Recurse.

Pair-wise Sequence Alignment of Leaf Nodes V. Branch Nodes PSA of sequences at leaf nodes: Requires a scoring function which can score a match between residues. PSA of profiles at branch nodes: Requires a scoring function which can score a match between profiles of columns of residues and gaps.

Problem with Aligning Profiles: Gap Artifacts! TGCACTCGATACATAAGGTG Alignment A is biologically equivalent to alignment A’. If we try to align another sequence which is identical to the second sequence in the alignment… We find that Score(S,A) not equal to Score(S,A’), but it should be. A:A’: S:.....ACATGTCGAT.....AGGTG TGCAC.....TCGATACATAAGGTG ACATG.....TCGAT.....AGGTG.....TGCACTCGATACATAAGGTG

In doing pair-wise sequence alignment on RC-MSA profiles: Each column is treated in isolation. But interpreting what’s a true gap requires looking outside of column. We can try to solve this problem by adjusting the scoring process. This results in a non-local scoring function, which violates dynamic programming.

We can instead replace the profile RC-MSA representation with the PO-MSA representation......ACATGTCGAT.....AGGTG TGCAC.....TCGATACATAAGGTG ACATG.....TCGAT.....AGGTG.....TGCACTCGATACATAAGGTG In the PO-MSA representation, both A and A’ Can be represented as A:A’: ACATG TCGAT ACAT AGGTG TGCACA

We can align S to A using Sequence to PO- MSA alignment algorithm. ACATG TCGAT ACAT AGGTG TGCACA TCGATACATAGGTGTGCACA A: S:

Partial Order Alignment of a Sequence to an Alignment. Sequence to PO-MSA Alignment Algorithm Conventional Alignment of Two Sequences

Considering all predecessor nodes that have a directed edge from p  n. Note: MATCH and INSERT moves may have more than one incoming edge p. Simply extend dynamic programming move set to include partial order moves: at each position (n,m) in the matrix, choose best move by: Sequence to PO-MSA Alignment Algorithm Requires a Simple Extension of Sequence to Sequence Alignment Algorithm... pNpN... p1p1 p2p2 p3p3 q nm

Recall Progressive Multiple Sequence Alignment with Profile Intermediates

Progressive Multiple Sequence Alignment with PO- MSA Intermediates Requires PO-MSA to PO-MSA Alignment Algorithm

Considering all predecessor nodes that have a directed edge from p  n and q  m. Note: MATCH and INSERT moves may have more than one incoming edge p or q. Simply extend dynamic programming move set to include partial order moves: at each position (n,m) in the matrix, choose best move by:... pNpN p1p1 p2p2 p3p3 q1q1 q2q2 q3q3 qMqM nm PO-MSA to PO-MSA Alignment Algorithm Requires a Simple Extension of Sequence to PO-MSA Alignment Algorithm

PO-MSA to PO-MSA Alignment Algorithm Finds Best Linear Match Between the PO-MSAs Can be extended heuristically to find best match

Thesis Work Developed partial order alignment visualizer Combined partial order alignment and progressive alignment Applied POA to detect alternative splicing events in expressed sequence data Formalized relationship between PO- MSAs and HMMs

Acknowledgements I’d like to thank:  Chris Lee for all of his guidance and support.  Michael Quist for all of his help with this project.  Barmak Modrek with whom I worked on annotating alternative splicing using PO-MSAs and POA.  Everyone in the Lee Lab for hours of helpful discussion.  DOE CSGF for supporting the work. To use or download POA or POAVIZ go to: