FAST: A Novel Protein Structure Alignment Algorithm Jianhua Zhu and Zhiping Weng PROTEINS: Structure, Function, and Bioinformatics 58:618–627 (2005) Created.


Slides created by Yu-Chieh Lo.

ABSTRACT
We present a novel algorithm named FAST for aligning protein three-dimensional structures. FAST uses a directionality-based scoring scheme to compare the intramolecular residue–residue relationships in two structures. It employs an elimination heuristic to promote sparseness in the residue-pair graph and to facilitate detection of the global optimum. To test the overall accuracy of FAST, we determined its sensitivity and specificity using the SCOP classification (version 1.61) as the gold standard. FAST achieved higher sensitivities than several existing methods (DaliLite, CE, and K2) at all specificity levels. We also tested FAST against 1033 manually curated alignments in the HOMSTRAD database; the overall agreement was 96%. Close inspection of examples from broad structural classes indicated the high quality of FAST alignments. Moreover, FAST is an order of magnitude faster than other algorithms that attempt to establish residue–residue correspondence.
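"Sensitivity at a given specificity level" can be made concrete with a small sketch. It assumes each protein pair has one similarity score (higher means more similar) and a boolean gold-standard label saying whether the pair shares a SCOP classification; the exact SCOP criterion and the function name `sensitivity_at_specificity` are illustrative assumptions, not the paper's evaluation code.

```python
def sensitivity_at_specificity(scores, same_class, target_specificity):
    """Best sensitivity reachable while specificity stays >= target_specificity.

    scores: one similarity score per protein pair (higher = more similar).
    same_class: True if the pair is related under the gold standard (assumed SCOP label).
    """
    positives = sum(1 for label in same_class if label)
    negatives = len(same_class) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both related and unrelated pairs")
    best = 0.0  # a cutoff above every score predicts nothing: sensitivity 0
    # Try each observed score as a cutoff; pairs scoring >= cutoff are called related.
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, lab in zip(scores, same_class) if s >= cutoff and lab)
        fp = sum(1 for s, lab in zip(scores, same_class) if s >= cutoff and not lab)
        sensitivity = tp / positives            # TP / (TP + FN)
        specificity = (negatives - fp) / negatives  # TN / (TN + FP)
        if specificity >= target_specificity:
            best = max(best, sensitivity)
    return best
```

Sweeping `target_specificity` from 0 to 1 traces the curve along which the abstract compares FAST with DaliLite, CE, and K2.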

METHODS
FAST comprises four steps:
1. We compare the local geometric properties of the two proteins and select a small subset of the MN possible residue pairs (M and N being the lengths of the two proteins) as the vertex set of the product graph G(V,E).
2. We assign edges by comparing intramolecular relationships, using a directionality-based scoring scheme that promotes sparseness of the graph.
3. We iteratively prune the graph to eliminate “bad vertices,” i.e., residue pairs that are unlikely to belong to the globally optimal alignment, giving the correct alignment a better chance to survive. With the substantially simplified product graph, an initial alignment is easily detected using dynamic programming.
4. We fine-tune the initial alignment by finding additional equivalent pairs and eliminating bad pairs.
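The dynamic-programming step in item 3 is not spelled out in these slides. A minimal sketch, assuming each surviving vertex (i, j) carries an additive score (for example a T score), is the generic weighted-chain DP below: it picks residue pairs whose indices increase in both proteins, so the result is a valid sequential alignment. The function name and the additive-score assumption are hypothetical.

```python
def best_sequential_alignment(vertices, score):
    """Order-preserving subset of residue pairs with maximal total score.

    vertices: (i, j) residue pairs that survived pruning.
    score: dict mapping (i, j) -> vertex score (assumed additive).
    Returns the chosen pairs with both i and j strictly increasing.
    """
    if not vertices:
        return []
    order = sorted(vertices)
    best = [score[v] for v in order]  # best chain total ending at each vertex
    prev = [-1] * len(order)          # back-pointers for reconstruction
    for k, (i, j) in enumerate(order):
        for p in range(k):
            pi, pj = order[p]
            # Extend the chain ending at p only if both indices increase.
            if pi < i and pj < j and best[p] + score[(i, j)] > best[k]:
                best[k] = best[p] + score[(i, j)]
                prev[k] = p
    k = max(range(len(order)), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(order[k])
        k = prev[k]
    return chain[::-1]
```

This runs in O(|V|²) time, which stays cheap precisely because pruning has already shrunk the vertex set.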

Step 1: Local Geometric Comparison
$L_{ij}$ denotes the similarity between a segment centered on residue i of protein A and a segment centered on residue j of protein B.
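The text gives no formula for $L_{ij}$, so the sketch below substitutes a simple, hypothetical measure rather than FAST's own: compare the internal Cα–Cα distances of two windows of half-width 2 and negate the mean absolute difference, so identical local geometry scores 0. The top fraction of the M × N pairs is then kept as the vertex set. The window size, the `keep` fraction, and both function names are assumptions.

```python
import math


def local_similarity(coords_a, coords_b, i, j, half_width=2):
    """Stand-in for L_ij on C-alpha coordinates given as (x, y, z) tuples.

    Returns the negated mean |d_A - d_B| over corresponding intra-window
    distances (0 is a perfect match), or None if a window runs off a chain end.
    """
    if min(i, j) - half_width < 0:
        return None
    if i + half_width >= len(coords_a) or j + half_width >= len(coords_b):
        return None
    win_a = coords_a[i - half_width:i + half_width + 1]
    win_b = coords_b[j - half_width:j + half_width + 1]
    diffs = []
    for p in range(len(win_a)):
        for q in range(p + 1, len(win_a)):
            diffs.append(abs(math.dist(win_a[p], win_a[q]) - math.dist(win_b[p], win_b[q])))
    return -sum(diffs) / len(diffs)


def select_vertices(coords_a, coords_b, keep=0.1, half_width=2):
    """Keep the top `keep` fraction of all residue pairs ranked by local similarity."""
    scored = []
    for i in range(len(coords_a)):
        for j in range(len(coords_b)):
            s = local_similarity(coords_a, coords_b, i, j, half_width)
            if s is not None:
                scored.append((s, i, j))
    scored.sort(reverse=True)  # most similar pairs first
    n_keep = max(1, int(len(scored) * keep))
    return [(i, j) for _, i, j in scored[:n_keep]]
```

Because only intra-window distances are compared, the measure needs no prior superposition of the two structures.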

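Step 2: Scoring Scheme for Edge Computation
The edge in G(V,E) connecting two vertices (i,j) and (m,n) is assigned a weight that compares the intramolecular relationship between residues i and m in protein A with that between residues j and n in protein B.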
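FAST's actual edge-weight formula is not reproduced in this text. To illustrate what "directionality-based" can mean, here is a hypothetical stand-in: express the vector from residue i to residue m in a local frame built at residue i (and likewise j to n in protein B), then score how close the two local vectors are. This makes the weight invariant to rotation and translation while still sensitive to direction. The 3.0 Å `cutoff`, the frame construction, and all names are assumptions.

```python
import math


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def _unit(p):
    n = math.sqrt(_dot(p, p))  # fails on zero-length vectors (collinear C-alphas)
    return tuple(a / n for a in p)


def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1], p[2] * q[0] - p[0] * q[2], p[0] * q[1] - p[1] * q[0])


def local_frame(coords, i):
    """Orthonormal frame at residue i from C-alphas i-1, i, i+1 (needs 0 < i < len - 1)."""
    e1 = _unit(_sub(coords[i + 1], coords[i]))
    back = _sub(coords[i - 1], coords[i])
    # Remove the e1 component of the backward bond to get an orthogonal second axis.
    e2 = _unit(_sub(back, tuple(_dot(back, e1) * c for c in e1)))
    return e1, e2, _cross(e1, e2)


def local_vector(coords, i, m):
    """Vector CA(i) -> CA(m) expressed in residue i's local frame."""
    v = _sub(coords[m], coords[i])
    return tuple(_dot(v, e) for e in local_frame(coords, i))


def edge_weight(coords_a, coords_b, v1, v2, cutoff=3.0):
    """Hypothetical weight of the edge between vertices v1 = (i, j) and v2 = (m, n).

    Returns None (no edge) when the pairs share a residue or cross each other;
    otherwise cutoff minus the distance between the two local vectors, so
    consistent geometry gives a positive weight.
    """
    (i, j), (m, n) = v1, v2
    if i == m or j == n or (i < m) != (j < n):
        return None
    va = local_vector(coords_a, i, m)
    vb = local_vector(coords_b, j, n)
    return cutoff - math.dist(va, vb)
```

Returning None for crossing or overlapping pairs is one simple way to keep the graph sparse, in the spirit of the slide's emphasis on sparseness.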

Step 3: Further Pruning and Initial Alignment
Step 4: Alignment Refinement
We expect the score T of a vertex (i,j) to be high if (i,j) is contained in the optimal alignment. Three empirical rules are used to define bad pairs:
(a) A vertex receiving a low T score is eliminated.
(b) If the T score of a vertex is due to scattered contributors that do not form stretches, the vertex is eliminated.
(c) If the two residues of a vertex are isolated
To measure how close the graph is to a clique, we define the degree of unanimity of G(V,E) as the number of edges with positive weights divided by the total number of possible edges. We expect the degree of unanimity to increase as we iteratively eliminate more bad vertices.
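The pruning loop and the degree of unanimity can be sketched together. The text does not define T, so this sketch assumes T is the sum of positive edge weights incident to a vertex, and it implements only rule (a), removing the worst vertex while its T falls below a hypothetical `fraction` of the current maximum. Rules (b) and (c) are omitted. With |V| vertices there are |V|(|V| − 1)/2 possible edges, which is the denominator of the degree of unanimity.

```python
from itertools import combinations


def t_scores(vertices, weight):
    """Assumed T score: sum of positive edge weights incident to each vertex.

    weight(u, v) returns an edge weight, or None when there is no edge.
    """
    t = {v: 0.0 for v in vertices}
    for u, v in combinations(vertices, 2):
        w = weight(u, v)
        if w is not None and w > 0:
            t[u] += w
            t[v] += w
    return t


def degree_of_unanimity(vertices, weight):
    """Positive-weight edges divided by all |V|(|V|-1)/2 possible edges."""
    n = len(vertices)
    possible = n * (n - 1) // 2
    if possible == 0:
        return 1.0  # zero or one vertex is trivially a clique
    positive = 0
    for u, v in combinations(vertices, 2):
        w = weight(u, v)
        if w is not None and w > 0:
            positive += 1
    return positive / possible


def prune(vertices, weight, fraction=0.5):
    """Iteratively drop the lowest-T vertex while T < fraction * max T (rule (a) only)."""
    current = list(vertices)
    while len(current) > 1:
        t = t_scores(current, weight)
        worst = min(current, key=t.__getitem__)
        if t[worst] >= fraction * max(t.values()):
            break
        current.remove(worst)  # scores are recomputed on the next pass
    return current
```

On a toy graph where vertices a, b, c agree with each other and d disagrees with all of them, pruning removes d and the degree of unanimity rises from 0.5 to 1.0, matching the expectation stated above.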