Rapid Protein Side-Chain Packing via Tree Decomposition Jinbo Xu Department of Mathematics Computer Science and AI Lab MIT.

Outline: Background; Motivation; Method; Results

Protein Side-Chain Packing
Problem: given the backbone coordinates of a protein, predict the coordinates of the side-chain atoms.
Insight: a protein structure is a geometric object with special features.
Method: decompose a protein structure into very small blocks.

Motivations of Structure Prediction
Protein functions are determined by 3D structures.
About 30,000 protein structures are in the PDB (Protein Data Bank).
Experimental determination of protein structures is time-consuming and expensive.
Many protein sequences are available.
[Diagram labels: sequence, protein structure, function, medicine]

Protein Structure Prediction
Stage 1: Backbone Prediction
–Ab initio folding
–Homology modeling
–Protein threading
Stage 2: Loop Modeling
Stage 3: Side-Chain Packing
Stage 4: Structure Refinement
The picture is adapted from

Side-Chain Packing
Each residue has many possible side-chain positions.
Each possible position is called a rotamer.
Atomic clashes must be avoided.
[Figure: side-chain positions, with an atomic clash labeled]

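Energy Function
Minimize the energy function to obtain the best side-chain packing.
Assume rotamer A(i) is assigned to residue i.
The side-chain packing quality is measured by an energy with two labeled terms: an occurring preference and a clash penalty. [The formula was an image and is not preserved in the transcript.]
Occurring preference: the higher the occurring probability of a rotamer, the smaller the value.
Clash penalty: defined in terms of the distance between two atoms and the atom radii. [Formula not preserved.]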
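The slide's formula did not survive, so the following is only a minimal sketch of how such an energy could be evaluated. It assumes the energy is additive: one occurring-preference score per residue plus one clash-penalty score per interacting residue pair. The names `packing_energy`, `singleton`, and `pairwise` are hypothetical, and the numbers are made up for illustration.

```python
def packing_energy(assignment, singleton, pairwise):
    """Evaluate an assumed additive energy for one rotamer assignment.

    assignment: dict residue -> chosen rotamer index, i.e. A(i)
    singleton:  dict residue -> list of preference scores, one per rotamer
    pairwise:   dict (i, j) -> dict (rotamer_i, rotamer_j) -> clash penalty
    """
    # Occurring-preference term: one score per residue.
    total = sum(singleton[i][rot] for i, rot in assignment.items())
    # Clash-penalty term: one score per interacting residue pair.
    for (i, j), table in pairwise.items():
        total += table[(assignment[i], assignment[j])]
    return total


# Toy data (hypothetical): two residues with two rotamers each.
singleton = {0: [0.0, 1.0], 1: [0.5, 0.2]}
pairwise = {(0, 1): {(0, 0): 10.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 3.0}}
energy = packing_energy({0: 0, 1: 1}, singleton, pairwise)
```

Minimizing this quantity over all assignments is the optimization problem the rest of the talk addresses.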

Related Work
NP-hard [Akutsu, 1997; Pierce et al., 2002] and NP-complete to achieve an approximation ratio O(N) [Chazelle et al, 2004]
Dead-End Elimination: eliminate rotamers one-by-one
SCWRL: biconnected decomposition of a protein structure [Dunbrack et al., 2003]
–One of the most popular side-chain packing programs
Linear integer programming [Althaus et al, 2000; Eriksson et al, 2001; Kingsford et al, 2004]
Semidefinite programming [Chazelle et al, 2004]

Algorithm Overview
Model the potential atomic clash relationships using a residue interaction graph.
Decompose the residue interaction graph into many small subgraphs.
Do side-chain packing on each subgraph almost independently.

Residue Interaction Graph
Each residue is a vertex.
Two residues interact if there is a potential clash between their rotamer atoms.
Add one edge between any two residues that interact.
[Figure: example residue interaction graph with labeled vertices]

Key Observations
A residue interaction graph is a geometric neighborhood graph.
–Each rotamer lies within a constant distance of its backbone position.
–There is no interaction edge between two residues if their distance is beyond D. D is a constant depending on rotamer diameter.
A residue interaction graph is sparse!
–Any two residue centers cannot be too close. Their distance is at least a constant C.
No previous algorithms exploit these features!
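The distance cutoff D suggests a simple, conservative way to build a candidate graph: connect two residues whenever their centers are within D. This is a sketch under that assumption only. The talk defines an edge by a potential clash between rotamer atoms, so a pure distance test may include extra edges. The function name and the coordinates are hypothetical.

```python
import math
from itertools import combinations


def residue_interaction_graph(centers, cutoff):
    """Build an adjacency map from residue centers and a distance cutoff.

    centers: dict residue name -> (x, y, z) coordinates
    cutoff:  the constant D; pairs farther apart than this get no edge
    """
    graph = {name: set() for name in centers}
    for u, v in combinations(centers, 2):
        # Assumption: center-to-center distance stands in for the clash test.
        if math.dist(centers[u], centers[v]) <= cutoff:
            graph[u].add(v)
            graph[v].add(u)
    return graph


# Toy coordinates (hypothetical), cutoff of 4 units.
centers = {"a": (0.0, 0.0, 0.0), "b": (3.0, 0.0, 0.0), "c": (10.0, 0.0, 0.0)}
graph = residue_interaction_graph(centers, 4.0)
```

The all-pairs loop is quadratic in the number of residues; a spatial grid with cell size D would avoid that, since residue centers are at least C apart.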

Tree Decomposition [Robertson & Seymour, 1986]
Greedy: minimum degree heuristic
1. Choose the vertex with minimal degree
2. The chosen vertex and its neighbors form a component
3. Add an edge between every pair of neighbors of the chosen vertex
4. Remove the chosen vertex
5. Repeat the above steps until the graph is empty
[Figure: the example graph on vertices a to m before and after one elimination step, which produces the component abd]
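The five greedy steps can be sketched as follows. This is an illustrative version, not the talk's implementation: it returns only the list of components in elimination order, and it breaks degree ties by vertex label, which is an added assumption.

```python
def min_degree_components(graph):
    """Greedy minimum degree heuristic.

    graph: dict vertex -> set of neighbors (undirected, comparable labels)
    Returns the components (bags) in the order they are created.
    """
    adj = {v: set(ns) for v, ns in graph.items()}  # work on a copy
    bags = []
    while adj:
        # Step 1: vertex of minimal degree (ties broken by label, an assumption).
        v = min(adj, key=lambda u: (len(adj[u]), u))
        nbrs = adj[v]
        # Step 2: the chosen vertex and its neighbors form a component.
        bags.append(frozenset(nbrs | {v}))
        # Step 3: connect every pair of neighbors; step 4: remove the vertex.
        for a in nbrs:
            adj[a] |= nbrs - {a}
            adj[a].discard(v)
        del adj[v]
    return bags


def width(bags):
    """Maximal component size minus 1."""
    return max(len(b) for b in bags) - 1


# A 4-cycle a-b-c-d-a as a small test graph.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
bags = min_degree_components(cycle)
```

Components that are subsets of a later component can be dropped without changing the width. The heuristic gives an upper bound on the treewidth, not necessarily the optimum.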

Tree Decomposition (Cont’d)
Tree width is the maximal component size minus 1.
[Figure: the example graph on vertices a to m and its tree decomposition, with components including abd, acd, clk, cdem, defm, fgh, eij]

Side-Chain Packing Algorithm
1. Bottom-to-top: calculate the minimal energy function
2. Top-to-bottom: extract the optimal assignment
3. Time complexity: exponential in the tree width, linear in the graph size
[Figure: a tree decomposition rooted at X_r, with components X_p, X_q, X_i, X_j, X_l and labels X_ir, X_ji, X_li. The score of the subtree rooted at X_i combines the score of component X_i with the scores of the subtrees rooted at X_j and X_l.]
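The two passes can be sketched as a dynamic program over a rooted tree decomposition. This is a minimal illustration under several assumptions: the energy is a sum of per-residue and per-pair scores, the decomposition is valid and already rooted, and each score term is charged to the first component (in preorder) that contains it. All names and the toy data are hypothetical.

```python
from itertools import product


def pack_side_chains(bags, children, root, n_rot, singleton, pairwise):
    """bags: id -> tuple of residues; children: id -> list of child ids;
    n_rot: residue -> rotamer count; singleton: residue -> list of scores;
    pairwise: (i, j) -> {(rot_i, rot_j): score}. Returns (energy, assignment)."""
    # Preorder; each term is owned by the first bag covering it (no double count).
    order, stack = [], [root]
    while stack:
        b = stack.pop()
        order.append(b)
        stack.extend(children.get(b, []))
    own_single = {b: [] for b in bags}
    own_pair = {b: [] for b in bags}
    seen = set()
    for b in order:
        for v in bags[b]:
            if v not in seen:
                seen.add(v)
                own_single[b].append(v)
    for i, j in pairwise:
        owner = next(b for b in order if i in bags[b] and j in bags[b])
        own_pair[owner].append((i, j))

    tables = {}  # id -> (separator, {separator rotamers: (score, bag assignment)})

    def bottom_up(b, parent_bag):
        sep = [v for v in bags[b] if v in parent_bag]
        rest = [v for v in bags[b] if v not in parent_bag]
        for c in children.get(b, []):
            bottom_up(c, set(bags[b]))
        table = {}
        for sep_rot in product(*(range(n_rot[v]) for v in sep)):
            best = None
            for rest_rot in product(*(range(n_rot[v]) for v in rest)):
                a = dict(zip(sep + rest, sep_rot + rest_rot))
                score = sum(singleton[v][a[v]] for v in own_single[b])
                score += sum(pairwise[i, j][a[i], a[j]] for i, j in own_pair[b])
                for c in children.get(b, []):
                    c_sep, c_table = tables[c]
                    score += c_table[tuple(a[v] for v in c_sep)][0]
                if best is None or score < best[0]:
                    best = (score, a)
            table[sep_rot] = best
        tables[b] = (sep, table)

    def top_down(b, key, assignment):
        a = tables[b][1][key][1]
        assignment.update(a)
        for c in children.get(b, []):
            top_down(c, tuple(a[v] for v in tables[c][0]), assignment)

    bottom_up(root, set())
    assignment = {}
    top_down(root, (), assignment)
    return tables[root][1][()][0], assignment


# Toy instance (hypothetical): path 0-1-2 with components (0,1) and (1,2).
bags = {"A": (0, 1), "B": (1, 2)}
children = {"A": ["B"]}
n_rot = {0: 2, 1: 2, 2: 2}
singleton = {0: [0.0, 1.0], 1: [0.5, 0.0], 2: [0.0, 0.3]}
pairwise = {(0, 1): {(0, 0): 0.0, (0, 1): 5.0, (1, 0): 0.0, (1, 1): 0.0},
            (1, 2): {(0, 0): 4.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}}
energy, assignment = pack_side_chains(bags, children, "A", n_rot, singleton, pairwise)
```

Each component enumerates every rotamer combination of its own residues, which is where the exponential dependence on the tree width comes from. The number of components visited grows linearly with the graph.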

Theoretical Treewidth Bounds
For a general graph, it is NP-hard to determine its optimal treewidth.
The residue interaction graph has a treewidth of O(N^{2/3} log N).
–A decomposition of this width can be found by a low-degree polynomial-time algorithm, based on the Sphere Separator Theorem [G.L. Miller et al., 1997], a generalization of the Planar Separator Theorem
It has a treewidth lower bound of Ω(N^{2/3}).
–The residue interaction graph is a cube
–Each residue is a grid point

Empirical Component Size Distribution Tested on the 180 proteins used by SCWRL 3.0. Components with size ≤ 2 ignored.

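Result (1)
CPU time (seconds)
Table columns: protein, size, SCWRL, SCATD, speedup. [Numeric values are not preserved in the transcript; surviving row labels: 1gai, a8i, b0p, bu, xwl]
Five times faster on average, tested on the 180 proteins used by SCWRL.
Same prediction accuracy as SCWRL 3.0.
Theoretical time complexity: [formula not preserved; the slide states that it is much smaller (<<) than a second expression, and defines a symbol as the average number of rotamers per residue]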

Accuracy
A prediction is judged correct if its deviation from the experimental value is within 40 degrees.
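A minimal sketch of this test, assuming the predicted quantity is a side-chain dihedral angle in degrees, so that deviation is measured around the circle (170 and -170 are 20 degrees apart). The slide does not say which value is compared; the function name is hypothetical.

```python
def is_correct(predicted_deg, experimental_deg, tolerance_deg=40.0):
    """True if the circular deviation between two angles is within tolerance."""
    diff = abs(predicted_deg - experimental_deg) % 360.0
    # Take the shorter way around the circle.
    return min(diff, 360.0 - diff) <= tolerance_deg


near_wraparound = is_correct(170.0, -170.0)
too_far = is_correct(0.0, 41.0)
```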

Result (2)
Has a PTAS if one of the following conditions is satisfied:
–All the energy terms are non-positive
–All the pairwise energy terms have the same sign, and the lowest system energy is away from 0 by a certain amount
An optimization problem admits a PTAS if, given an error ε (0<ε<1), there is a polynomial-time algorithm that obtains a solution within a factor of (1±ε) of the optimum.
Chazelle et al. proved that it is NP-complete to approximate this problem within a factor of O(N), without considering the geometric characteristics of a protein structure.

Summary
A novel tree-decomposition-based algorithm for protein side-chain prediction
Exploits the geometric features of a protein structure
Efficient in practice
Good accuracy
Theoretical bound on time complexity
Polynomial-time approximation scheme
Available at

Acknowledgements
Ming Li (Waterloo)
Bonnie Berger (MIT)

Thank You

Tree Decomposition [Robertson & Seymour, 1986]
Greedy: minimum degree heuristic
[Figure: the original graph on vertices a to m and the graph after elimination steps, with components abd and acd]

Sphere Separator Theorem [G.L. Miller et al, 1997]
K-ply neighborhood system
–A set of balls in three-dimensional space
–No point is within more than k balls
Sphere separator theorem
–If N balls form a k-ply system, then there is a sphere separator S such that
–At most 4N/5 balls are totally inside S
–At most 4N/5 balls are totally outside S
–At most O(k^{1/3} N^{2/3}) balls intersect S
–S can be calculated in randomized linear time
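To make the three counts concrete, here is a small sketch that classifies balls against a given candidate sphere S. It does not find a good separator, which is the hard part of the theorem; it only checks one. The function name and the sample balls are hypothetical.

```python
import math


def classify_balls(balls, sep_center, sep_radius):
    """Count balls totally inside, totally outside, and intersecting sphere S.

    balls: list of (center, radius) with center an (x, y, z) tuple
    """
    inside = outside = cut = 0
    for center, r in balls:
        d = math.dist(center, sep_center)
        if d + r < sep_radius:
            inside += 1      # farthest point of the ball is still within S
        elif d - r > sep_radius:
            outside += 1     # nearest point of the ball is beyond S
        else:
            cut += 1         # not separated by S; counted as intersecting
    return inside, outside, cut


balls = [((0.0, 0.0, 0.0), 1.0), ((5.0, 0.0, 0.0), 1.0), ((2.0, 0.0, 0.0), 1.0)]
counts = classify_balls(balls, (0.0, 0.0, 0.0), 2.5)
```

A candidate S is balanced in the theorem's sense when both the inside and outside counts are at most 4N/5.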

Residue Interaction Graph Separator
Construct a ball with radius D/2 centered at each residue.
All the balls form a k-ply neighborhood system, where k is a constant depending on D and C.
All the residues in the green circles form a balanced separator with size O(N^{2/3}).

Separator-Based Decomposition
Each S_i is a separator with size O(N^{2/3}).
Each S_i corresponds to a component.
–All the separators on a path from this S_i to S_1 form a tree decomposition component.
[Figure: tree of separators S_1 to S_12, with height O(log N)]

A PTAS for Side-Chain Packing
Partition the residue interaction graph into two parts and do side-chain assignment separately.

A PTAS (Cont’d)
To obtain a good solution:
–Cycle-shift the shaded area by iD (i=1, 2, …, k-1) units to obtain k different partition schemes
–At least one partition scheme can generate a good side-chain assignment

Tree Decomposition [Robertson & Seymour, 1986]
Let G=(V,E) be a graph. A tree decomposition (T, X) satisfies the following conditions.
–T=(I, F) is a tree with node set I and edge set F
–Each element in X is a subset of V and is also a component in the tree decomposition. The union of all elements is equal to V.
–There is a one-to-one mapping between I and X
–For any edge (v,w) in E, there is at least one X(i) in X such that v and w are in X(i)
–In tree T, if node j is a node on the path from i to k, then the intersection between X(i) and X(k) is a subset of X(j)
Tree width is defined to be the maximal component size minus 1.
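The conditions above can be checked mechanically. The sketch below assumes the caller already supplies a tree T (it does not verify that `tree_edges` is acyclic and connected). It tests the path condition in an equivalent form: for every vertex, the components containing it must be connected in T. Names are hypothetical.

```python
def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """bags: dict tree node -> set of graph vertices, i.e. X(i)
    tree_edges: list of (i, j) pairs over tree nodes, assumed to form a tree
    """
    # The union of all components must equal V.
    if set().union(*bags.values()) != set(vertices):
        return False
    # Every graph edge must lie inside at least one component.
    for v, w in edges:
        if not any(v in X and w in X for X in bags.values()):
            return False
    # Path condition: components containing a vertex form a connected subtree.
    nbrs = {i: set() for i in bags}
    for i, j in tree_edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    for v in vertices:
        holders = {i for i, X in bags.items() if v in X}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in nbrs[i] & holders:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != holders:
            return False
    return True


def tree_width(bags):
    """Maximal component size minus 1."""
    return max(len(X) for X in bags.values()) - 1


# Path graph a-b-c with components {a,b} and {b,c}.
ok = is_tree_decomposition("abc", [("a", "b"), ("b", "c")],
                           {1: {"a", "b"}, 2: {"b", "c"}}, [(1, 2)])
```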