Algorithms Exploiting the Chain Structure of Proteins Itay Lotan Computer Science.

Presentation transcript:

Algorithms Exploiting the Chain Structure of Proteins Itay Lotan Computer Science

Proteins 101 Involved in all functions of our body: metabolism, motion, defense, etc. Michael Levitt

Protein representation  Torsion angle model:  Cα model:

Structure determination Bernhard Rupp X-ray crystallography

Outline
1. Fast energy computation during Monte Carlo simulation
2. Model completion for protein X-ray crystallography
3. Large scale computation of similarity
Exploit specific properties of proteins to perform the computation efficiently

Outline
1. Fast energy computation during Monte Carlo simulation
2. Model completion for protein X-ray crystallography
3. Large scale computation of similarity
Lotan, Schwarzer, Halperin* and Latombe. J. Comput. Biol. (to appear)
* CS Department, Tel-Aviv University

Monte Carlo simulation (MCS)
Popular method for sampling the conformation space of proteins:
- Estimate thermodynamic quantities
- Search for low-energy conformations and the folded structure

MCS: How it works
1. Propose random change in conformation
2. Compute energy E of new conformation
3. Accept with probability:
Requires >> 10^6 steps to sample adequately
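The acceptance formula did not survive in the transcript. The sketch below assumes the standard Metropolis criterion, min(1, exp(-ΔE / kT)), which may or may not be exactly what the slide showed. The energy function, the proposal move, and all names (`toy_energy`, `toy_propose`, `mc_step`) are hypothetical stand-ins for illustration only.

```python
import math
import random

def metropolis_accept_prob(delta_e, kT):
    """Acceptance probability under the Metropolis criterion (an assumption here)."""
    if delta_e <= 0.0:
        return 1.0
    return math.exp(-delta_e / kT)

def mc_step(conf, energy, propose, kT, rng):
    """One MCS step: propose a change, compute its energy, accept or reject."""
    candidate = propose(conf, rng)
    delta_e = energy(candidate) - energy(conf)
    if rng.random() < metropolis_accept_prob(delta_e, kT):
        return candidate
    return conf

def toy_energy(angles):
    # Hypothetical energy: one quadratic well per angle.
    return sum(a * a for a in angles)

def toy_propose(angles, rng):
    # Perturb a single randomly chosen angle (a 1-DoF change).
    i = rng.randrange(len(angles))
    new = list(angles)
    new[i] += rng.uniform(-0.5, 0.5)
    return new

rng = random.Random(0)
conf = [2.0, -1.5, 1.0]
for _ in range(2000):
    conf = mc_step(conf, toy_energy, toy_propose, kT=0.1, rng=rng)
```

Note that this naive version recomputes the full energy at every step, which is the cost the rest of this part of the talk sets out to reduce.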

Energy function
- Bonded terms: bond lengths, bond angles, dihedral angles
- Non-bonded terms: Van der Waals; electrostatic; heuristic (Go models, HP models, etc.)

Pair-wise interactions
- Cutoff distance (6–12 Å)
- Linear number of interactions contribute to energy (Halperin & Overmars ’98)
Challenge: find all interacting pairs without enumerating all pairs

Related work
Computer Science
- Bounding volume hierarchies for collision detection: Gottschalk et al. ’96, Larsen et al. ’00, Guibas et al. ’02
- Space partition methods for collision detection: Faverjon ’84, Halperin & Overmars ’98
- Collision detection for chains: Halperin et al. ’97, Guibas et al. ’02
Biology
- Neighbor lists: Verlet ’67, Brooks et al. ’83
- Grid: Quentrec & Brot ’73, Hockney et al. ’74, Van Gunsteren et al. ’84
- Neighbor lists + grid: Yip & Elber ’89, Petrella ’02

Grid method (d: cutoff distance)
- Linear complexity
- Optimal in worst case
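A minimal sketch of the grid idea, assuming cubic cells whose side equals the cutoff distance d, so that every interacting pair lies in the same cell or in adjacent cells. The function name `grid_pairs` and the dictionary-based cell storage are illustrative choices, not taken from the talk.

```python
from collections import defaultdict
from itertools import product
import math

def grid_pairs(points, cutoff):
    """Return index pairs (i, j), i < j, whose distance is below `cutoff`."""
    # Bin every atom into a cubic cell of side `cutoff`.
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        key = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[key].append(idx)

    pairs = set()
    for (cx, cy, cz), members in cells.items():
        # Only the 27 cells around (and including) this one can hold partners.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbor = cells.get((cx + dx, cy + dy, cz + dz))
            if neighbor is None:
                continue
            for i in members:
                for j in neighbor:
                    if i < j and math.dist(points[i], points[j]) < cutoff:
                        pairs.add((i, j))
    return pairs
```

The grid is rebuilt from scratch here, which matches the linear cost quoted on the slide but does nothing to exploit how little of the chain changes between MCS steps.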

Contributions
- Efficient maintenance and self-collision detection for kinematic chains
- Efficient computation of pair-wise interactions in MCS of proteins
- Scheme for caching and reusing partial energy sums during MCS
- MCS software*
Much faster than existing algorithm (grid method)
*Download at:

Properties of kinematic chains
- Small changes → large effects
- Local changes → global effects
- Few DoF changes → long rigid sub-chains

ChainTree: A tale of two hierarchies
- Transform hierarchy: approximates kinematics of protein backbone at successive resolutions
- Bounding volume hierarchy: approximates geometry of protein at successive resolutions

Hierarchy of transforms

[Figure: binary tree of transforms over links A to I. Leaves: T_AB, T_BC, T_CD, T_DE, T_EF, T_FG, T_GH, T_HI. Internal nodes: T_AC, T_CE, T_EG, T_GI, T_AE, T_EI. Root: T_AI.]

Hierarchy of bounding volumes
[Figure: binary tree of bounding volumes. Leaves: B_A, B_B, B_C, B_D, B_E, B_F, B_G, B_H. Internal nodes: B_AB, B_CD, B_EF, B_GH, B_AD, B_EH. Root: B_AH.]

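The ChainTree
[Figure: the two hierarchies combined over links A to I. Each node pairs a transform with a bounding volume: leaves (T_AB, B_A) through (T_HI, B_H); then (T_AC, B_AB), (T_CE, B_CD), (T_EG, B_EF), (T_GI, B_GH); then (T_AE, B_AD), (T_EI, B_EH); root (T_AI, B_AH).]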

Updating the ChainTree
[Figure: the same ChainTree over links A to I, with nodes (T_AB, B_A) through (T_AI, B_AH), shown during an update.]
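The update step can be illustrated with a simplified transform hierarchy. This sketch assumes a planar chain, with each rigid transform stored as a pair of complex numbers (rotation, translation), and an array-based binary tree. The actual ChainTree works in 3D and also maintains bounding volumes, so names such as `TransformTree` and `joint` are hypothetical.

```python
import cmath

def compose(a, b):
    """Compose 2D rigid transforms so the result maps x to a(b(x))."""
    ra, ta = a
    rb, tb = b
    return (ra * rb, ra * tb + ta)

def joint(angle, length=1.0):
    """Transform across one link: rotate by the joint angle, advance one link length."""
    r = cmath.exp(1j * angle)
    return (r, r * length)

class TransformTree:
    def __init__(self, transforms):
        self.size = 1
        while self.size < len(transforms):
            self.size *= 2
        identity = (1 + 0j, 0j)
        # node[1] is the root; leaves start at index self.size.
        self.node = [identity] * (2 * self.size)
        for i, t in enumerate(transforms):
            self.node[self.size + i] = t
        for k in range(self.size - 1, 0, -1):
            self.node[k] = compose(self.node[2 * k], self.node[2 * k + 1])

    def update(self, i, transform):
        """Replace leaf i and recompute only its ancestors; return nodes touched."""
        k = self.size + i
        self.node[k] = transform
        touched = 1
        k //= 2
        while k >= 1:
            self.node[k] = compose(self.node[2 * k], self.node[2 * k + 1])
            touched += 1
            k //= 2
        return touched

    def root(self):
        return self.node[1]
```

With 8 leaves, changing one joint touches 4 nodes (the leaf and its 3 ancestors) rather than all 15, which is the logarithmic behaviour the hierarchy is meant to provide.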

Computing the energy
Recursively search ChainTree for interactions
Pruning rules:
1. Prune search when distance between bounding volumes is more than cutoff distance
2. Do not search inside rigid sub-chains
[Figure: ChainTree with leaves A to H, internal nodes J to M and N, O, and root P]
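The two pruning rules can be sketched on a simplified hierarchy. This version assumes axis-aligned bounding boxes over a balanced binary split of the chain (not necessarily the bounding volumes used in the talk), and it applies rule 2 conservatively: a sub-chain with no changed DoF inside is skipped, but every pair of sibling sub-chains under a changed node is still searched. `Node`, `search_self`, and `search_pair` are illustrative names.

```python
import math

class Node:
    def __init__(self, lo, hi, points, moved):
        self.lo, self.hi = lo, hi  # atoms [lo, hi) along the chain
        self.left = self.right = None
        if hi - lo > 1:
            mid = (lo + hi) // 2
            self.left = Node(lo, mid, points, moved)
            self.right = Node(mid, hi, points, moved)
        pts = points[lo:hi]
        self.box_min = tuple(min(p[a] for p in pts) for a in range(3))
        self.box_max = tuple(max(p[a] for p in pts) for a in range(3))
        # moved[i] is True if the DoF between atoms i and i+1 changed.
        self.moved = any(moved[lo:hi - 1])

def box_distance(a, b):
    """Lower bound on the distance between any atom of a and any atom of b."""
    gaps = (max(a.box_min[k] - b.box_max[k], b.box_min[k] - a.box_max[k], 0.0)
            for k in range(3))
    return math.sqrt(sum(g * g for g in gaps))

def search_self(node, points, cutoff, out):
    if node.left is None or not node.moved:
        return  # rule 2: nothing changed inside this sub-chain
    search_self(node.left, points, cutoff, out)
    search_self(node.right, points, cutoff, out)
    search_pair(node.left, node.right, points, cutoff, out)

def search_pair(a, b, points, cutoff, out):
    if box_distance(a, b) > cutoff:
        return  # rule 1: bounding volumes farther apart than the cutoff
    if a.left is None and b.left is None:
        if math.dist(points[a.lo], points[b.lo]) <= cutoff:
            out.add((a.lo, b.lo))
        return
    # Descend into the larger node (ties descend into a).
    if b.left is None or (a.left is not None and a.hi - a.lo >= b.hi - b.lo):
        search_pair(a.left, b, points, cutoff, out)
        search_pair(a.right, b, points, cutoff, out)
    else:
        search_pair(a, b.left, points, cutoff, out)
        search_pair(a, b.right, points, cutoff, out)
```

Calling `search_self(root, points, cutoff, found)` with an empty set `found` collects the atom pairs within the cutoff that may have changed. When every DoF is marked as moved, the result equals the brute-force pair list.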

Computing the energy
[Figure: animation of the recursive search over the ChainTree with leaves A to H, internal nodes J to M and N, O, and root P. Bracketed labels such as [P], [N-O], [J-K], [A-C] and [C-D] mark the nodes and node pairs visited at each step; the final frame highlights E(O).]

Computing the energy
- Only changed interactions are found
- Reuse unaffected partial sums
- Better performance for longer proteins and fewer simultaneous changes

Computational complexity
- Updating:
- Searching:
Worst case bound; much faster in practice

Test
[Figure: results for proteins of 68, 144, 374 and 755 residues; panels: 1-DoF change, 5-DoF change]

Simulation of α-Synuclein  140 res. protein implicated in Parkinson’s disease  Multi-canonical Replica-exchange MC regime  Over 1000 CPU days of simulation  Study conformations at room temp.  Joint work with Vijay Pande

Outline
1. Fast energy computation during Monte Carlo simulation
2. Model completion for protein X-ray crystallography
3. Large scale computation of similarity
Lotan, van den Bedem*, Deacon* and Latombe, WAFR 2004
van den Bedem*, Lotan, Latombe and Deacon*, submitted to Acta Cryst. D
* Joint Center for Structural Genomics (JCSG) at SSRL

Protein Structure Initiative 152K sequenced genes (30K/year) 25K determined structures (3.6K/year)  Reduce cost and time to determine protein structure  Develop software to automatically interpret the electron density map (EDM)

EDM 3-D “image” of atomic structure High value (electron density) at atom centers Density falls off exponentially away from center

Automated model building
- ~90% built at high resolution (2Å)
- ~66% built at medium to low resolution (2.5–2.8Å)
- Gaps left at noisy areas in EDM (blurred density)
Gaps need to be resolved manually

The fragment completion problem
- Input: EDM; partially resolved structure; 2 anchor residues; length of missing fragment
- Output: a small number of candidate structures for missing fragment
A robotics inverse kinematics (IK) problem

Related work
Computer Science
- Exact IK solvers: Manocha & Canny ’94, Manocha et al. ’95
- Optimization IK solvers: Wang & Chen ’91
- Redundant manipulators: Khatib ’87, Burdick ’89
- Motion planning for closed loops: Han & Amato ’00, Yakey et al. ’01, Cortes et al. ’02, ’04
Biology/Crystallography
- Exact IK solvers: Wedemeyer & Scheraga ’99, Coutsias et al. ’04
- Optimization IK solvers: Fine et al. ’86, Canutescu & Dunbrack Jr. ’03
- Ab-initio loop closure: Fiser et al. ’00, Kolodny et al. ’03
- Database search loop closure: Jones & Thirup ’86, van Vlijmen & Karplus ’97
- Semi-automatic tools: Jones & Kjeldgaard ’97, Oldfield ’01

Contributions
- Sampling of gap-closing fragments biased by the EDM
- Refinement of fit to density without breaking closure
- Fully automatic fragment completion software for X-ray crystallography
Novel application of a combination of inverse kinematics techniques

Two-stage IK method
1. Candidate generation: optimize density fit while closing the gap
2. Refinement: optimize closed fragments without breaking closure

Stage 1: candidate generation
- Generate random conformation
- Close using Cyclic Coordinate Descent (CCD) (Wang & Chen ’91, Canutescu & Dunbrack ’03)
- CCD moves biased toward high density
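To show what a CCD closure pass looks like, here is a sketch on a planar chain with revolute joints. It is a simplification under stated assumptions: the talk's version operates on protein torsion angles in 3D and biases moves toward high density, neither of which is modeled here. `forward` and `ccd_close` are hypothetical names.

```python
import math

def forward(angles, lengths):
    """Joint positions of a planar chain anchored at the origin."""
    pts = [(0.0, 0.0)]
    heading = 0.0
    for a, l in zip(angles, lengths):
        heading += a
        x, y = pts[-1]
        pts.append((x + l * math.cos(heading), y + l * math.sin(heading)))
    return pts

def ccd_close(angles, lengths, target, tol=1e-4, max_sweeps=500):
    """Adjust one joint at a time so the chain end moves toward `target`."""
    angles = list(angles)
    for _ in range(max_sweeps):
        for i in reversed(range(len(angles))):
            pts = forward(angles, lengths)
            jx, jy = pts[i]
            ex, ey = pts[-1]
            # Rotate joint i so the joint-to-end direction points at the target.
            to_end = math.atan2(ey - jy, ex - jx)
            to_target = math.atan2(target[1] - jy, target[0] - jx)
            angles[i] += to_target - to_end
        if math.dist(forward(angles, lengths)[-1], target) < tol:
            break
    return angles
```

Each single-joint update is the exact minimizer of the end-to-target distance for that joint, so the distance never increases from one update to the next.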

Stage 2: refinement
- Target function T (goodness of fit to EDM)
- Minimize T while retaining closure
- Closed conformations lie on self-motion manifold of lower dimension
[Figure label: 1-D manifold]

Stage 2: null-space minimization Jacobian: linear relation between joint velocities and end-effector linear and angular velocity. Compute minimizing move using: N – orthonormal basis of null space

Stage 2: minimization with closure
1. Choose sub-fragment with n > 6 DOFs
2. Compute N using SVD
3. Project the gradient of T onto the null space
4. Move until minimum is reached or closure is broken
Escape from local minima using Monte Carlo with simulated annealing
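The projection step can be sketched without external libraries. The slide computes the null-space basis N with an SVD; the version below swaps in Gram-Schmidt on the rows of the Jacobian, which yields the same projection (the gradient minus its row-space component) when the rows are well conditioned. The function names and the small example Jacobian are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def row_space_basis(J, eps=1e-12):
    """Orthonormal basis of the row space of J (modified Gram-Schmidt)."""
    basis = []
    for row in J:
        v = list(row)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > eps:  # skip rows that are linearly dependent
            basis.append([vi / norm for vi in v])
    return basis

def project_to_null_space(J, g):
    """Remove from g every component that would move the end-effector."""
    p = list(g)
    for b in row_space_basis(J):
        c = dot(p, b)
        p = [pi - c * bi for pi, bi in zip(p, b)]
    return p

# Hypothetical 2x4 Jacobian (2 end-effector coordinates, 4 DoFs) and gradient of T.
J = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]
grad_T = [1.0, 2.0, 3.0, 4.0]
step = [-x for x in project_to_null_space(J, grad_T)]  # descent direction
```

To first order, moving along `step` leaves the end-effector in place (J times `step` is zero), which is why closure is only broken after a finite move and must be checked as in step 4.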

Test: artificial gaps
- Completed structure (gold standard)
- Good density (1.6Å res.)
- Remove fragment and rebuild
Length | High (2.0Å) | Medium (2.5Å) | Low (2.8Å)
4 | 100% (0.14Å) | 100% (0.19Å) | 100% (0.32Å)
8 | 100% (0.18Å) | 100% (0.23Å) | 100% (0.36Å)
12 | 91% (0.51Å) | 96% (0.41Å) | 91% (0.52Å)
15 | 91% (0.53Å) | 88% (0.63Å) | 83% (0.76Å)
Produced by H. van den Bedem

Test: true gaps
- Completed structure (gold standard)
- O.K. density (2.4Å res.)
- 6 gaps left by model builder (RESOLVE)
Length | Top scorer | Lowest error
4 | 0.44Å | 0.40Å
4 | 0.22Å |
5 | 0.78Å |
5 | 0.36Å |
7 | 0.72Å | 0.66Å
… | …Å |
Produced by H. van den Bedem

Example: TM0423 PDB: 1KQ3, 376 res. 2.0Å resolution 12 residue gap Best: 0.3Å aaRMSD

Example: TM0813 GLU-83 GLY-96 PDB: 1J5X, 342 res. 2.8Å resolution 12 residue gap Best: 0.6Å aaRMSD

Outline
1. Fast energy computation during Monte Carlo simulation
2. Model completion for protein X-ray crystallography
3. Large scale computation of similarity
Lotan and Schwarzer, J. Comput. Biol. 11(2–3): 299–317, 2004

Large scale similarity
- Analysis of simulation trajectories: molecular dynamics simulation, Monte Carlo simulation
- Clustering of decoy sets (e.g., Shortle et al. ’98)
- Stochastic Roadmap Simulation (Apaydin et al. ’03)
Fast similarity measures are needed for analyzing large sets of conformations

Contributions
- Uniform simplification of protein structure for similarity computation
- Speed-up existing similarity measures
- Method offers trade-off between speed and precision
- Efficient computation of nearest neighbors

m-Averaged approximation
- Cut chain into pieces of length m
- Replace each sequence of m Cα atoms by its centroid
3n coordinates → 3n/m coordinates
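The averaging step is short enough to sketch directly. The function name `m_average` is hypothetical, and the handling of a final piece shorter than m (averaged over the atoms it has) is an assumption, since the slide does not say what happens when m does not divide the chain length.

```python
def m_average(coords, m):
    """Replace each run of m consecutive C-alpha positions by its centroid."""
    reduced = []
    for start in range(0, len(coords), m):
        chunk = coords[start:start + m]
        # Centroid of the chunk, coordinate by coordinate.
        reduced.append(tuple(sum(p[k] for p in chunk) / len(chunk) for k in range(3)))
    return reduced
```

For a chain of n atoms this leaves about n/m points, so a measure that is linear in the number of points gets roughly m times cheaper and one that is quadratic gets roughly m² times cheaper.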

Chains and distances
- Proximity along the chain entails spatial proximity
- Far away links along the chain are spatially distant (on average)
[Figure labels: c_i, c_j]

Similarity measures
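The formulas on this slide did not survive in the transcript. As a hedged illustration, the sketch below uses one common definition of dRMS: the root mean square difference between corresponding intra-molecular distances of two conformations. The normalization by the number of atom pairs is an assumption and may differ from the slide. cRMS, which needs an optimal rigid superposition first, is not covered.

```python
import math
from itertools import combinations

def drms(c1, c2):
    """Root mean square difference of the pairwise distance matrices of c1 and c2."""
    n = len(c1)
    total = 0.0
    for i, j in combinations(range(n), 2):
        diff = math.dist(c1[i], c1[j]) - math.dist(c2[i], c2[j])
        total += diff * diff
    return math.sqrt(total / (n * (n - 1) / 2))
```

Because it compares internal distances only, this measure needs no superposition of the two conformations, at the price of a cost that is quadratic in the number of atoms.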

Evaluation: test sets
8 structurally diverse proteins ( residues)
1. Decoy sets: conformations from the Park-Levitt set (Park et al. ’97), N = 10,000
2. Random sets: conformations generated by the program FOLDTRAJ (Feldman & Hogue ’00), N = 5000

Evaluation results: decoy sets
[Table with columns m, cRMS, dRMS; values not preserved]
- 9x speed-up for cRMS (m = 9)
- 36x speed-up for dRMS (m = 6)
Higher correlation for random sets!

Nearest-neighbors problem
Given a set S of conformations of a protein and a query conformation c, find the k conformations in S most similar to c
Brute force complexity: for all k
N – size of S
L – time to compute similarity
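A brute-force baseline for one query can be sketched as follows. The `similarity` argument is a placeholder for whichever measure is in use (for example cRMS or dRMS), and `k_nearest` is a hypothetical name.

```python
import heapq

def k_nearest(query, conformations, k, similarity):
    """Indices of the k conformations closest to `query` under `similarity`."""
    # One similarity evaluation per member of the set: N evaluations per query.
    scored = ((similarity(query, c), i) for i, c in enumerate(conformations))
    return [i for _, i in heapq.nsmallest(k, scored)]
```

Each query evaluates the similarity measure once per conformation in S, so the per-query cost grows with both N and L.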

Efficient nearest neighbor search
kd-tree: time per query
Limitations:
1. Requires Minkowski metric:
2. Less efficient when d > 20
cRMS is not a Minkowski metric
dRMS has dimensionality of
Reduce dRMS dimensionality using SVD

Reduction using SVD
1. Stack m-averaged distance matrices as vectors
2. Compute the SVD of entire set
3. Project onto principal components
dRMS is reduced to  20 dimensions
Complexity of SVD ~

Testing the method
- Use decoy sets (N = 10,000) and random sets (N = 5,000)
- m-averaging with m = 4
- Project onto 16 PCs for decoys, 12 PCs for random sets
- Find k = 10, 25, 100 NNs for 250 conformations in each set

Results
- Decoy sets: ~77% correct; furthest NN off by 10%–15% (0.7 Å–1.5 Å); ~4k approximate NNs contain all true k NNs
- Random sets: slightly better results
Use reduction as fast filter
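Using the reduction as a fast filter amounts to a filter-and-refine search. In this sketch `cheap_dist` stands in for the distance in the reduced space and `exact_dist` for the full measure. Both are caller-supplied placeholders, and the default `factor=4` follows the ~4k figure reported above for decoy sets.

```python
import heapq

def filtered_knn(query, items, k, cheap_dist, exact_dist, factor=4):
    """Shortlist factor*k candidates with cheap_dist, then re-rank with exact_dist."""
    shortlist = heapq.nsmallest(factor * k, range(len(items)),
                                key=lambda i: cheap_dist(query, items[i]))
    # Only the shortlisted items pay for the expensive measure.
    return heapq.nsmallest(k, shortlist, key=lambda i: exact_dist(query, items[i]))
```

The result is exact only when the shortlist happens to contain all true neighbors, so `factor` trades accuracy against the number of expensive evaluations.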

Running time
N = 100,000, m = 4, PC = 16; find k = 100 for each conformation
- Brute-force: ~84 hours
- Brute-force + m-averaging: ~4.8 hours
- Brute-force + m-averaging + SVD: 41 minutes
- kd-tree + m-averaging + SVD: 19 minutes
kd-tree has more impact for larger sets

Contributions
- Energy computation in MCS
  - Efficient maintenance and self-collision detection for kinematic chains
  - Efficient computation of pair-wise interactions in MCS of proteins
  - Caching scheme for partial energy sums during MCS
  - MCS software
- Model completion in X-ray crystallography
  - Sampling of gap-closing fragments biased towards the EDM
  - Refinement of fit to density without breaking closure
  - Fully automatic fragment completion software
- Similarity computation for large conformation sets
  - Uniform simplification of protein structure for similarity computation
  - Speed-up existing similarity measures
  - Method offers trade-off between speed and precision
  - Efficient computation of nearest neighbors

Take-home message Taking into account physical properties of proteins can lead to efficient algorithms for a wide variety of applications in structural biology

Outlook
- Models that simplify the physics and chemistry of proteins
- Algorithms that exploit properties of protein models
[Figure labels: computer scientist, biophysicist/biochemist]
Develop simplified protein models that lend themselves to efficient computations

Acknowledgements  Jean-Claude Latombe  Vijay Pande  Michael Levitt  Leo Guibas  Axel Brunger, Balaji Prabhakar, Serafim Batzoglou  Fabian Schwarzer, Henry van den Bedem, Dan Halperin  Carlo Tomasi  Daniel Russakoff, Rachel Kolodny  Latombe group Serkan Apaydin, Tim Bretl, Joel Brown, Phil Fong, Mitul Saha, Pekka Isto, Kris Hauser  Pande group Bojan Zagrovic, Stefan Larson, Lillian Chong, Young Min Rhee, Sidney Elmer, Chris Snow, Guha Jayachandran, Eric Sorin, Sung-Joo Lee, Jim Cladwell, Michael Shirts, Nina Singhal, Relly Brandman, Vishal Vaidyanathan, Nick Kelley, Mark Engelhardt  Levitt Group Patrice Koehl, Tanya Raschke, Erik Lindahl

Thank you!