Protein Structure Alignment by Incremental Combinatorial Extension (CE) of the Optimal Path Ilya N. Shindyalov, Philip E. Bourne.


Why Align Structures?
- Structural comparison gives an additional measure of protein similarity.
- Structure is generally preserved better than sequence over the course of evolution.
- It may help in protein fold identification.
- It is an interesting combinatorial problem.

The Structural Alignment Problem
- We know how to optimally superimpose two proteins of the same length so as to minimize RMSD (Hendrickson, 1979).
- However, there is no obvious way to compare objects of different lengths, or to optimally add or remove gaps.
- Heuristic methods for structural alignment are the best we can do at the moment.

Alignment Fragment Pairs
- For a pair of proteins A and B, an alignment fragment pair (AFP) is defined as a continuous segment of A aligned against a continuous segment of B of the same size (without gaps).
- If $n_1$ and $n_2$ are the lengths of A and B, and the AFP length is set to $m$, then there is a total of $(n_1 - m) \times (n_2 - m)$ AFPs.
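To make the AFP definition concrete, here is a minimal sketch that enumerates every gap-free pairing of a length-m window in A with a length-m window in B. The function name `enumerate_afps` and the 0-based start positions are assumptions made for illustration. Counting every valid start position gives $(n_1 - m + 1)(n_2 - m + 1)$ pairs, which differs from the slide's figure only by boundary terms.

```python
def enumerate_afps(n1, n2, m):
    """Return all (start_a, start_b) pairs of gap-free, length-m fragments.

    n1, n2: lengths of proteins A and B; m: fixed AFP length.
    Start positions are 0-based (an assumption of this sketch).
    """
    if m <= 0 or m > n1 or m > n2:
        return []  # no fragment of length m fits in at least one protein
    return [(pa, pb)
            for pa in range(n1 - m + 1)
            for pb in range(n2 - m + 1)]


if __name__ == "__main__":
    afps = enumerate_afps(10, 12, 8)
    print(len(afps), afps[:3])
```

The quadratic number of candidate pairs is what motivates building the alignment incrementally instead of searching all combinations of AFPs.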

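Defining an Alignment
An alignment is defined as a continuous path of AFPs of fixed length $m$ such that, between any two consecutive AFPs, gaps may be inserted into either A or B, but not into both. That is, for every two consecutive AFPs $i$ and $i+1$, we have
1) $p_{i+1}^A = p_i^A + m$ and $p_{i+1}^B = p_i^B + m$ (no gap), or
2) $p_{i+1}^A > p_i^A + m$ and $p_{i+1}^B = p_i^B + m$ (gap in A), or
3) $p_{i+1}^A = p_i^A + m$ and $p_{i+1}^B > p_i^B + m$ (gap in B),
where $p_i^A$ represents the starting position of AFP $i$ in protein A, and $p_i^B$ its starting position in protein B.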
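The path rule can be checked mechanically. The sketch below assumes each AFP is stored as a `(start_a, start_b)` tuple and reads the rule as follows: the next AFP must begin exactly m residues later in at least one protein, and may begin later than that in the other. `is_valid_path` is a hypothetical helper, not part of any CE implementation.

```python
def is_valid_path(path, m):
    """Check that consecutive AFPs are gapped in at most one protein.

    path: ordered list of (start_a, start_b) tuples; m: AFP length.
    """
    for (a0, b0), (a1, b1) in zip(path, path[1:]):
        gap_a = a1 - (a0 + m)  # residues skipped in A between the two AFPs
        gap_b = b1 - (b0 + m)  # residues skipped in B between the two AFPs
        if gap_a < 0 or gap_b < 0:
            return False       # fragments overlap or run backwards
        if gap_a > 0 and gap_b > 0:
            return False       # gaps in both proteins are not allowed
    return True


if __name__ == "__main__":
    print(is_valid_path([(0, 0), (8, 8)], 8))    # no gap
    print(is_valid_path([(0, 0), (10, 8)], 8))   # gap in A only
    print(is_valid_path([(0, 0), (10, 9)], 8))   # gaps in both
```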

The CE Algorithm
Goal: Find a “good” local alignment for structures of proteins A and B.
Basic idea:
1. Select some initial AFP.
2. Build an alignment path by incrementally adding AFPs in a way that satisfies the conditions on the previous slide.
3. Repeat step (2) until the length of each protein is traversed, or until no “good” AFPs remain.
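The three steps can be sketched as a simple greedy loop. This is an illustration under stated assumptions rather than the published procedure: `build_path`, the `accept` callback, and the `max_gap` limit are hypothetical names, and the loop takes the first acceptable candidate instead of scoring candidates and keeping the best one.

```python
def build_path(n1, n2, m, start, accept, max_gap=30):
    """Greedily extend an alignment path from a starting AFP.

    n1, n2: protein lengths; m: AFP length; start: (start_a, start_b).
    accept(path, candidate) -> bool decides whether a candidate is "good".
    max_gap: largest gap tried in either protein (hypothetical parameter).
    """
    path = [start]
    while True:
        a, b = path[-1]
        next_a, next_b = a + m, b + m
        # Candidates: no gap first, then growing gaps in A or in B (never both).
        candidates = [(next_a, next_b)]
        for g in range(1, max_gap + 1):
            candidates.append((next_a + g, next_b))
            candidates.append((next_a, next_b + g))
        # Keep only fragments that still fit inside both proteins.
        candidates = [(pa, pb) for pa, pb in candidates
                      if pa + m <= n1 and pb + m <= n2]
        chosen = next((c for c in candidates if accept(path, c)), None)
        if chosen is None:
            return path  # proteins traversed, or no "good" AFP remains
        path.append(chosen)


if __name__ == "__main__":
    print(build_path(20, 20, 8, (0, 0), lambda path, c: True))
```

The loop always terminates because every accepted AFP starts at least m residues further along both proteins.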

Algorithm Specifics
- How do we choose the starting AFP?
- What are the criteria for adding AFPs to our alignment path?
- How do we know when to stop? That is, at what point do we know that there are no “good” AFPs left?
There are various heuristics that could be used to supply answers to the above questions.

Sample Heuristics: AFP Distances
We can define the distance $D_{ij}$ between two different AFPs $i$ and $j$ in terms of inter-residue distances [equation not preserved in the transcript]. Here, $d^A(p,q)$ represents the distance between the alpha carbon atoms at positions $p$ and $q$ in protein A. Setting $i=j$ and using the same formula, we can define the distance $D_{ii}$ between the two fragments of the same AFP.
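The formula itself is missing from the text above, so the following is only one plausible reading: it assumes $D_{ij}$ is the mean absolute difference between corresponding alpha-carbon distances in A and in B, taken over all m × m residue pairs drawn from the two AFPs. The names `ca_distance` and `afp_distance`, and the coordinate lists of (x, y, z) tuples, are assumptions of this sketch.

```python
import math


def ca_distance(coords, p, q):
    """Euclidean distance between alpha carbons at positions p and q."""
    return math.dist(coords[p], coords[q])


def afp_distance(coords_a, coords_b, afp_i, afp_j, m):
    """Mean |d_A - d_B| over all residue pairs of AFPs i and j (assumed form).

    afp_i, afp_j: (start_a, start_b) tuples; m: AFP length.
    Passing the same AFP twice gives the within-AFP distance D_ii.
    """
    (ia, ib), (ja, jb) = afp_i, afp_j
    total = 0.0
    for k in range(m):
        for l in range(m):
            d_a = ca_distance(coords_a, ia + k, ja + l)
            d_b = ca_distance(coords_b, ib + k, jb + l)
            total += abs(d_a - d_b)
    return total / (m * m)


if __name__ == "__main__":
    # Two toy straight chains; B is stretched to twice the spacing of A.
    chain_a = [(3.8 * i, 0.0, 0.0) for i in range(20)]
    chain_b = [(7.6 * i, 0.0, 0.0) for i in range(20)]
    print(afp_distance(chain_a, chain_a, (0, 0), (8, 8), 8))
    print(afp_distance(chain_a, chain_b, (0, 0), (0, 0), 2))
```

Because the measure compares internal distances only, it needs no superposition of the two structures, which keeps each evaluation cheap.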

Sample Heuristics: Extending the Alignment Path
Suppose our alignment path already consists of AFPs $0 \ldots n-1$, and we are trying to decide whether to add AFP $n$ to the path. We will do so only if conditions (4), (5), and (6) hold [equations not preserved in the transcript].

Extending Alignment Path (Cont)
Here $D_0$ and $D_1$ are specified cut-off distances.
- The decision whether AFP $n$ is “fit” is based on (4).
- The decision whether AFP $n$ “works” with all the other AFPs in the path is based on (5).
- The decision whether we should extend the alignment path at all is based on (6).
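Since conditions (4) to (6) are not reproduced in the text, the sketch below assumes one form for each that matches the three descriptions: (4) the within-AFP distance of the candidate is below $D_0$; (5) the mean distance between the candidate and the AFPs already in the path is below $D_1$; (6) the mean distance over all AFP pairs in the extended path is below $D_1$. The function `should_extend` and the `dist` callback are hypothetical, and the thresholds in the usage lines are toy values.

```python
def should_extend(path, candidate, dist, d0, d1):
    """Decide whether to append `candidate` to `path` (assumed criteria).

    dist(i, j) returns the AFP distance D_ij; dist(i, i) returns D_ii.
    d0, d1: cut-off distances.
    """
    # (4) Is the candidate AFP itself a good fit?
    if dist(candidate, candidate) >= d0:
        return False
    # (5) Does the candidate agree with the AFPs already in the path?
    if path:
        mean_to_path = sum(dist(p, candidate) for p in path) / len(path)
        if mean_to_path >= d1:
            return False
    # (6) Is the extended path as a whole still acceptable?
    extended = path + [candidate]
    n = len(extended)
    mean_all = sum(dist(a, b) for a in extended for b in extended) / (n * n)
    return mean_all < d1


if __name__ == "__main__":
    # Toy distance: AFPs are plain numbers and D_ij = |i - j|.
    toy = lambda i, j: float(abs(i - j))
    print(should_extend([0, 1], 2, toy, d0=3.0, d1=4.0))
    print(should_extend([0, 1], 2, toy, d0=3.0, d1=1.0))
```

Ordering the checks from cheapest to most expensive lets most poor candidates be rejected before the all-pairs average is computed.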

Alignment Assessment and Post-alignment Optimization
- To assess how good the alignment produced by CE is, we can compare it to the alignment of a random pair of structures, and compute a Z-score based on the RMSD and the number of gaps in the final alignment.
- Since CE does not penalize gaps, we can perform additional optimization after CE has completed, using dynamic programming to remove excess gaps.
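A Z-score expresses how far an observed value lies from the mean of a reference distribution, in units of standard deviation. The text does not say how RMSD and gap count are combined, so this sketch handles a single quantity only; `z_score` and the reference list of values from random structure pairs are assumptions for illustration.

```python
from statistics import mean, stdev


def z_score(value, random_values):
    """Z-score of `value` against a sample from random structure pairs."""
    mu = mean(random_values)
    sigma = stdev(random_values)  # sample standard deviation
    if sigma == 0:
        raise ValueError("reference values have zero variance")
    return (value - mu) / sigma


if __name__ == "__main__":
    # Toy RMSD values (in angstroms) for random pairs, then one alignment.
    random_rmsds = [4.0, 6.0, 8.0]
    print(z_score(2.0, random_rmsds))
```

For a quantity such as RMSD, where smaller is better, a strongly negative score indicates an alignment well below what random pairs produce.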

Results and Conclusion
The CE method is highly configurable, which is at once its strength and its weakness. Adjusting its parameters, such as the AFP length $m$, the cut-off distances $D_0$ and $D_1$, and the definition of AFP distance, can result in varying alignments and execution speeds.

Results and Conclusion (Cont)
In general, CE does not outperform previously existing structural alignment methods, such as Dali and VAST: it does better for some pairs of structures, and worse for others. Since it is fairly straightforward and easy to implement, CE provides an interesting addition to the toolbox of structural alignment algorithms.