Structure and Motion Jean-Claude Latombe Computer Science Department Stanford University NSF-ITR Meeting on November 14, 2002.

Stanford's Participants
- PIs: L. Guibas, J.C. Latombe, M. Levitt
- Research Associate: P. Koehl
- Postdocs: F. Schwarzer, A. Zomorodian
- Graduate students: S. Apaydin (EE), S. Ieong (CS), R. Kolodny (CS), I. Lotan (CS), A. Nguyen (Sc. Comp.), D. Russel (CS), R. Singh (CS), C. Varma (CS)
- Undergraduate students: J. Greenberg (CS), E. Berger (CS)
- Collaborating faculty: A. Brunger (Molecular & Cellular Physiology), D. Brutlag (Biochemistry), D. Donoho (Statistics), J. Milgram (Math), V. Pande (Chemistry)

Problem Domains
Biological functions derive from the structures (shapes) that molecules achieve through motion.
- Determination, classification, and prediction of 3D protein structures
- Modeling of molecular energy and simulation of folding and binding motions

What's New/Interesting for Computer Science?
- Massive amount of experimental data
- Importance of similarities
- Multiple representations of structure
- Continuous energy functions
- Many objects forming deformable chains
- Many degrees of freedom
- Ensemble properties of pathways

Importance of Similarities
- Segmentation, matching, and scoring techniques (figure: data set → clustered data → small library)
- E.g., libraries of protein fragments [Kolodny, Koehl, Guibas, Levitt, JMB (2002)]
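
A minimal sketch of reducing a set of fragments to a small library, assuming each fragment is an (n, 3) NumPy array of C-alpha coordinates of the same length. It uses cRMS after optimal superposition (Kabsch) and a simple greedy "leader" clustering with a cRMS threshold; this is a stand-in for illustration, not the clustering procedure of Kolodny et al., and `crms`, `build_library`, and the threshold are hypothetical names and parameters.

```python
import numpy as np

def crms(p, q):
    """cRMS between two (n, 3) coordinate arrays after optimal superposition (Kabsch)."""
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, s, vt = np.linalg.svd(p.T @ q)
    # If the best orthogonal map is a reflection, use the best proper rotation instead.
    s[-1] *= np.sign(np.linalg.det(u @ vt))
    msd = (np.sum(p ** 2) + np.sum(q ** 2) - 2.0 * s.sum()) / len(p)
    return float(np.sqrt(max(msd, 0.0)))

def build_library(fragments, threshold):
    """Greedy 'leader' clustering (hypothetical helper): a fragment is covered if some
    library entry lies within `threshold` cRMS; otherwise it becomes a new entry."""
    library = []
    for frag in fragments:
        if all(crms(frag, rep) > threshold for rep in library):
            library.append(frag)
    return library
```

Every input fragment ends up within the threshold of some library entry, and library entries are pairwise farther apart than the threshold. The result depends on visiting order; an iterative, k-means-style refinement would be a natural next step.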

Figure: approximations of protein 1tim compared with the real protein, built with complexity 10 (100 fragments of length 5) and complexity 2.26 (50 fragments of length 7); accuracy of each approximation measured in Å cRMS.

Alignment of Structural Motifs [Singh and Saha; Kolodny and Linial]
Problem: determine whether two structures share common motifs. Given two (labelled) structures in $\mathbb{R}^3$, $A = \{a_1, a_2, \ldots, a_n\}$ and $B = \{b_1, b_2, \ldots, b_m\}$, find subsequences $s_a$ and $s_b$ such that the substructures $\{a_{s_a(1)}, a_{s_a(2)}, \ldots, a_{s_a(l)}\}$ and $\{b_{s_b(1)}, b_{s_b(2)}, \ldots, b_{s_b(l)}\}$ are similar.
- Twofold problem: alignment and correspondence
- Score
- Approximation
- Complexity

Alignment via Iterative Closest Point (Besl-McKay) [R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan.]
- Score: RMSD distance
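
A minimal Iterative Closest Point sketch in the spirit of Besl-McKay, assuming point sets given as NumPy arrays, brute-force nearest-neighbor correspondences, and a fixed iteration count. `best_rigid_transform` and `icp` are hypothetical names, and this is not Singh and Saha's implementation. Each iteration matches every moving point to its closest target point, solves for the best rigid transform (Kabsch), and applies it; the returned score is the RMSD over the final closest-point correspondences.

```python
import numpy as np

def best_rigid_transform(p, q):
    """Rotation r and translation t minimizing sum ||r @ p_i + t - q_i||^2 (Kabsch)."""
    pc, qc = p.mean(axis=0), q.mean(axis=0)
    u, _, vt = np.linalg.svd((p - pc).T @ (q - qc))
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 would mean a reflection; correct it
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, qc - r @ pc

def icp(moving, target, iterations=50):
    """Rigidly align `moving` onto `target`; return aligned points and RMSD score."""
    current = moving.copy()
    for _ in range(iterations):
        # Closest target point for every moving point (brute force, O(n * m)).
        d2 = ((current[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        matched = target[d2.argmin(axis=1)]
        r, t = best_rigid_transform(current, matched)
        current = current @ r.T + t
    d2 = ((current[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
    rmsd = float(np.sqrt(d2.min(axis=1).mean()))
    return current, rmsd
```

ICP only finds a local optimum, so it needs a reasonable initial placement; motif search adds the correspondence problem on top of this alignment step.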

Figure: trypsin and the trypsin active site [R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan.]

Figure: the trypsin active site matched against 42 trypsin-like proteins [R. Singh and M. Saha. Identifying Structural Motifs in Proteins. Pacific Symp. on Biocomputing, Jan.]

Multiple Representations of Structure: ProShape software [Koehl, Levitt (Stanford), Edelsbrunner (Duke)]

Statistical Potentials for Proteins Based on the Alpha Complex [Guibas, Koehl, Zomorodian]
- Decoys are generated using "physical" potentials
- The best decoys are selected using distance information

Continuous Energy Function; Many Objects in Deformable Chains
- There are many pairs of objects, but relatively few are close enough to interact
- Needed: data structures that capture proximity but undergo only small or rare changes
- During motion simulation:
  - detect steric clashes (self-collisions)
  - find pairs of atoms closer than a cutoff distance
  - find which energy terms can be reused
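
A common baseline for the "pairs of atoms closer than a cutoff" query is a uniform grid: hash atoms into cubic cells whose side equals the cutoff, so any pair within the cutoff lies in the same or an adjacent cell and only the 27 surrounding cells need checking. This is a minimal sketch assuming atom centers as an (n, 3) NumPy array; `pairs_within_cutoff` is a hypothetical name, and the grid method used for comparison in this talk may differ in detail.

```python
import itertools
from collections import defaultdict

import numpy as np

def pairs_within_cutoff(coords, cutoff):
    """Return sorted pairs (i, j), i < j, with ||coords[i] - coords[j]|| < cutoff."""
    cells = defaultdict(list)
    keys = np.floor(coords / cutoff).astype(int)
    for idx, key in enumerate(map(tuple, keys.tolist())):
        cells[key].append(idx)
    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    pairs = []
    for key, members in cells.items():
        for off in offsets:
            other = (key[0] + off[0], key[1] + off[1], key[2] + off[2])
            for i in members:
                for j in cells.get(other, ()):
                    # i < j ensures each pair is reported exactly once.
                    if i < j and np.linalg.norm(coords[i] - coords[j]) < cutoff:
                        pairs.append((i, j))
    return sorted(pairs)
```

Rebuilding the grid costs O(n) per step regardless of how little the chain moved, which is the motivation for structures that exploit small or rare changes.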

Other application domains:
- Modular reconfigurable robots
- Reconstructive surgery

- Fixed bounding-volume (BV) hierarchies don't work well, because the spatial arrangement of a deforming chain keeps changing
- Instead, exploit what doesn't change: the chain topology
- Adaptive BV hierarchies [Guibas, Nguyen, Russel, Zhang]; [Lotan, Schwarzer, Halperin, Latombe] (SoCG'02)
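
To illustrate exploiting chain topology, here is a sketch of a bounding-sphere hierarchy built over contiguous index ranges of the chain, so the tree structure never changes as the chain deforms (only sphere centers and radii are recomputed), plus a recursive self-collision query. It assumes equal atom radii and uses spheres that enclose their two child spheres, a simpler variant than wrapped spheres (which enclose the atoms themselves); it is not the ChainTree or WBSH implementation, and all names are hypothetical.

```python
import numpy as np

def merge(s1, s2):
    """Smallest sphere (center, radius) enclosing spheres s1 and s2."""
    (c1, r1), (c2, r2) = s1, s2
    d = float(np.linalg.norm(c2 - c1))
    if d + r2 <= r1:
        return s1
    if d + r1 <= r2:
        return s2
    r = 0.5 * (d + r1 + r2)
    return c1 + (c2 - c1) * ((r - r1) / d), r

def build(coords, radius, lo, hi):
    """Node (lo, hi, sphere, left, right) over atoms lo..hi-1, split by index."""
    if hi - lo == 1:
        return (lo, hi, (coords[lo], radius), None, None)
    mid = (lo + hi) // 2
    left, right = build(coords, radius, lo, mid), build(coords, radius, mid, hi)
    return (lo, hi, merge(left[2], right[2]), left, right)

def clashes(coords, radius, min_sep=2):
    """Pairs (i, j), j - i >= min_sep, whose atom spheres overlap."""
    root = build(coords, radius, 0, len(coords))
    found = []

    def overlap(a, b):
        (ca, ra), (cb, rb) = a[2], b[2]
        return np.linalg.norm(ca - cb) <= ra + rb + 1e-9  # small slack for rounding

    def pair(a, b):
        if not overlap(a, b):
            return  # prune: no atom of a can touch an atom of b
        if a[3] is None and b[3] is None:
            i, j = sorted((a[0], b[0]))
            if j - i >= min_sep and np.linalg.norm(coords[i] - coords[j]) < 2 * radius:
                found.append((i, j))
            return
        # Descend into the node covering more atoms.
        if b[3] is None or (a[3] is not None and a[1] - a[0] >= b[1] - b[0]):
            pair(a[3], b)
            pair(a[4], b)
        else:
            pair(a, b[3])
            pair(a, b[4])

    def self_check(node):
        if node[3] is None:
            return
        self_check(node[3])
        self_check(node[4])
        pair(node[3], node[4])

    self_check(root)
    return sorted(found)
```

Because the hierarchy follows the chain, a motion that moves only part of the chain changes only the spheres on the paths from the affected leaves to the root.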

Wrapped Bounding Sphere Hierarchies [Guibas, Nguyen, Russel, Zhang] (SoCG 2002)
- A WBSH undergoes only a small number of changes during motion
- Self-collision detection: $O(n \log n)$ in $\mathbb{R}^2$; $O(n^{2-2/d})$ in $\mathbb{R}^d$ for $d \ge 3$

ChainTrees [Lotan, Schwarzer, Halperin, Latombe] (SoCG’02)

- Updating
- Finding interacting pairs (in practice, sublinear)
- Assumption: only a few degrees of freedom change at each motion step (e.g., Monte Carlo simulation)
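
The assumption that only a few degrees of freedom change per step is what makes incremental updating pay off. In a toy planar chain parameterized by joint angles (an assumption for illustration, not the molecular model of the talk), changing joint angle k rigidly rotates every atom after atom k about atom k, so only pairs (i, j) with i < k < j can change distance. The sketch checks this numerically; `chain_coords` and `affected_pairs` are hypothetical names.

```python
import numpy as np

def chain_coords(angles, bond=1.0):
    """Planar chain: atom 0 at the origin; bond s has heading angles[0] + ... + angles[s]."""
    headings = np.cumsum(angles)
    steps = bond * np.stack([np.cos(headings), np.sin(headings)], axis=1)
    return np.vstack([[0.0, 0.0], np.cumsum(steps, axis=0)])

def affected_pairs(n_atoms, k):
    """Pairs whose distance can change when only joint angle k changes."""
    return {(i, j) for i in range(n_atoms) for j in range(i + 1, n_atoms) if i < k < j}
```

With m changed joints, the chain splits into m + 1 rigid pieces and only pairs that span a changed joint need re-examination.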

Figure: ChainTrees applied to MC simulation, compared with the grid method, for proteins labelled (68), (144), (374), and (755); panels for m = 1 and m = 5.

Many Degrees of Freedom
Tools to explore high-dimensional conformational (structure) spaces:
- Structure sampling [Kolodny, Levitt]
- Finding nearest neighbors [Lotan, Schwarzer]

Sampling Structures by Combining Fragments [Kolodny, Levitt]
Library of protein fragments → discrete set of candidate structures (figure: library fragments labelled a, b, c, d, combined into label strings such as "cabcab" and "bbc").
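
A toy sketch of why a fragment library yields a discrete set of candidate structures. It assumes, hypothetically, that each library fragment is a short tuple of planar turn angles, so every string of k fragment labels over a library of size L defines one chain and there are L^k candidates. Real fragment assembly works with 3D backbone geometry; the library values and function names are invented for illustration.

```python
import itertools

import numpy as np

# Hypothetical library: each fragment is a tuple of planar turn angles (radians).
LIBRARY = {"a": (0.0, 0.3), "b": (0.5, -0.5), "c": (1.0, 0.2), "d": (-0.4, -0.4)}

def assemble(labels, bond=1.0):
    """Concatenate the fragments' turn angles and trace the resulting planar chain."""
    turns = [t for label in labels for t in LIBRARY[label]]
    headings = np.cumsum(turns)
    steps = bond * np.stack([np.cos(headings), np.sin(headings)], axis=1)
    return np.vstack([[0.0, 0.0], np.cumsum(steps, axis=0)])

def candidates(k):
    """All chains made of k fragments: len(LIBRARY) ** k candidate structures."""
    return {"".join(labels): assemble(labels)
            for labels in itertools.product(LIBRARY, repeat=k)}
```

The exponential count L^k is why small libraries (low complexity per residue) matter when sampling.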

Nearest Neighbors in High-Dimensional Space [Lotan, Schwarzer]
Problem: find the k nearest neighbors of a given protein conformation in a set of n conformations, using cRMS or dRMS as the distance.
Idea: cut the backbone into m equal subsequences (figure: backbone points $a_0, a_1, \ldots, a_m$).
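
A sketch of the two distance measures and of the reduction idea, assuming a conformation is an (n, 3) NumPy array of C-alpha coordinates. cRMS uses optimal superposition (Kabsch); dRMS compares intra-chain distance matrices and needs no superposition; the reduced representation replaces each of m contiguous backbone segments by the mean of its coordinates. Taking dRMS over all pairs i < j and the function names are assumptions; `np.array_split` gives segment lengths differing by at most one when m does not divide n.

```python
import numpy as np

def crms(x, y):
    """Coordinate RMSD after optimal rigid superposition (Kabsch)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    u, s, vt = np.linalg.svd(x.T @ y)
    s[-1] *= np.sign(np.linalg.det(u @ vt))  # exclude reflections
    msd = (np.sum(x ** 2) + np.sum(y ** 2) - 2.0 * s.sum()) / len(x)
    return float(np.sqrt(max(msd, 0.0)))

def drms(x, y):
    """RMS difference between intra-chain distance matrices over pairs i < j."""
    dx = np.linalg.norm(x[:, None] - x[None, :], axis=2)
    dy = np.linalg.norm(y[:, None] - y[None, :], axis=2)
    iu = np.triu_indices(len(x), k=1)
    return float(np.sqrt(np.mean((dx[iu] - dy[iu]) ** 2)))

def averaged(x, m):
    """Reduced representation: mean coordinate of each of m contiguous segments."""
    return np.array([seg.mean(axis=0) for seg in np.array_split(x, m)])
```

Computing dRMS on the m averaged points costs O(m^2) instead of O(n^2) per comparison.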

Nearest Neighbors in High-Dimensional Space [Lotan and Schwarzer]
Computing the 100 NNs of each conformation in 100,000 decoys of 1CTF (Park-Levitt set):
- Full representation, dRMS (brute force): ~84 h
- Averaged representation, dRMS (brute force): ~4.8 h
- SVD-reduced representation, dRMS (brute force): 41 min
- SVD-reduced representation, dRMS (kd-tree): 19 min
~80% of the computed NNs are true NNs. kd-tree software from the ANN library (U. Maryland).
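
A sketch of this pipeline, assuming each conformation is turned into the vector of its pairwise distances (Euclidean distance between such vectors is proportional to dRMS) and that SciPy is available; `scipy.spatial.cKDTree` stands in for the ANN library used in the reported results, and the function names are hypothetical. Projecting onto the leading principal axes only approximately preserves distances, which is why some computed neighbors need not be true neighbors.

```python
import numpy as np
from scipy.spatial import cKDTree

def distance_features(conformations):
    """(N, n, 3) conformations -> (N, n(n-1)/2) upper-triangle distance vectors.
    Euclidean distance between rows is proportional to dRMS."""
    n = conformations.shape[1]
    iu = np.triu_indices(n, k=1)
    diffs = conformations[:, :, None, :] - conformations[:, None, :, :]
    return np.linalg.norm(diffs, axis=3)[:, iu[0], iu[1]]

def knn_reduced(features, k, dims):
    """k nearest neighbors of every row after projecting onto `dims` principal axes."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:dims].T
    # Ask for k + 1 neighbors because each point's nearest neighbor is itself.
    _, idx = cKDTree(reduced).query(reduced, k=k + 1)
    return idx[:, 1:]
```

Keeping all axes reproduces exact dRMS neighbors; shrinking `dims` trades accuracy for speed, and kd-trees are effective only once the dimension is small.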

Ensemble Properties of Pathways
- The stochastic nature of molecular motion requires characterizing the average properties of many pathways
- Probabilistic conformational roadmaps
- Applications to protein folding and ligand-protein binding [Apaydin, Brutlag, Guestrin, Hsu, Latombe]

Example: Probability of Folding ($p_{fold}$)
Figure: from a given conformation, the molecule reaches the folded set before the unfolded set with probability $p_{fold}$, and the unfolded set first with probability $1 - p_{fold}$. Example: HIV integrase [Du et al. '98].
"We stress that we do not suggest using p_fold as a transition coordinate for practical purposes as it is very computationally intensive." (Du, Pande, Grosberg, Tanaka, and Shakhnovich, "On the Transition Coordinate for Protein Folding," Journal of Chemical Physics, 1998)
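
The direct way to estimate p_fold for one conformation is to launch many independent simulations from it and count how many reach the folded set first. This toy sketch does so for a hypothetical one-dimensional chain of states 0..n, where state 0 stands for the unfolded set, state n for the folded set, and each step moves left or right with equal probability. For this symmetric walk p_fold(i) = i/n exactly, which makes a convenient check; `estimate_pfold` is a hypothetical name.

```python
import random

def estimate_pfold(start, n, trials, seed=0):
    """Fraction of symmetric random walks from `start` that hit state n before state 0."""
    rng = random.Random(seed)
    folded = 0
    for _ in range(trials):
        state = start
        while 0 < state < n:
            state += rng.choice((-1, 1))
        folded += state == n
    return folded / trials
```

Each estimate needs many complete trajectories, and the statistical error only shrinks like one over the square root of the number of trials, which is what makes this approach so computationally intensive.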

Probabilistic Roadmap [Apaydin, Brutlag, Hsu, Guestrin, Latombe] (RECOMB'02, ECCB'02)
Idea: capture the stochastic nature of molecular motion with a network of randomly selected conformations (nodes such as $v_i$ and $v_j$), assigning a probability $P_{ij}$ to each edge.
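
A sketch of how edge probabilities might be assigned on such a roadmap, assuming a Metropolis-style rule on node energies: from node i with d_i neighbors, move to neighbor j with probability (1/d_i) * min(1, exp(-(E_j - E_i)/kT)), and keep the remaining probability as the self-loop P_ii. This is one common choice and an assumption here, not necessarily the exact rule of the cited papers; the energies, neighbor lists, and kT are hypothetical inputs.

```python
import math

def transition_probabilities(energies, neighbors, kT=1.0):
    """Return P as a dict of dicts: P[i][j] on roadmap edges plus self-loops P[i][i].
    neighbors[i] lists the roadmap neighbors of i and must not contain i itself."""
    P = {}
    for i, nbrs in neighbors.items():
        row = {}
        for j in nbrs:
            delta = energies[j] - energies[i]
            # Downhill moves are always accepted; uphill ones with Boltzmann probability.
            accept = 1.0 if delta <= 0 else math.exp(-delta / kT)
            row[j] = accept / len(nbrs)
        row[i] = 1.0 - sum(row.values())  # probability of staying at i
        P[i] = row
    return P
```

Each row sums to 1, so the roadmap defines a Markov chain over sampled conformations.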

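Probabilistic Roadmap
F: folded set; U: unfolded set. Node i has a self-loop with probability $P_{ii}$ and edges to neighbors j, k, l, m with probabilities $P_{ij}, P_{ik}, P_{il}, P_{im}$.
Let $f_i = p_{fold}(i)$. After one step:
$$f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m,$$
with $f = 1$ for nodes in F and $f = 0$ for nodes in U.
- One linear equation per node
- Solution gives $p_{fold}$ for all nodes
- No explicit simulation run
- All pathways are taken into account
- Sparse linear system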
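
The linear system on this slide can be sketched directly: for every node i outside F and U, f_i - sum_j P_ij f_j = 0, with f fixed to 1 on the folded set and to 0 on the unfolded set. This version assumes a dense NumPy transition matrix whose rows sum to 1 and uses `numpy.linalg.solve`; a real roadmap would call for a sparse solver. `pfold` is a hypothetical name.

```python
import numpy as np

def pfold(P, folded, unfolded):
    """Solve f = P f on non-boundary nodes, with f = 1 on `folded` and f = 0 on `unfolded`."""
    n = len(P)
    A = np.eye(n) - P  # row i encodes f_i - sum_j P_ij f_j = 0
    b = np.zeros(n)
    for nodes, value in ((folded, 1.0), (unfolded, 0.0)):
        for s in nodes:
            # Replace the equation of a boundary node by f_s = value.
            A[s] = 0.0
            A[s, s] = 1.0
            b[s] = value
    return np.linalg.solve(A, b)
```

As a check, on a symmetric random walk over states 0..n with U = {0} and F = {n}, the solution is f_i = i/n, with or without self-loops; one solve replaces all the trajectories a sampling estimate would need.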

Correlation with the MC Approach: 1ROP (repressor of primer), 2 α-helices, 6 DOF

Probabilistic Roadmap Computation Times (1ROP)
- Monte Carlo: 49 conformations; over 11 days of computer time; over $10^6$ energy computations
- Roadmap: 5000 conformations; hours of computer time; ~15,000 energy computations
- ~4 orders of magnitude speedup!
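
One way to read the "~4 orders of magnitude" figure is per conformation, using the counts on this slide (this interpretation is an assumption). Since the Monte Carlo count is a lower bound ("over $10^6$"), the ratio of energy computations per conformation is at least roughly:

$$
\frac{10^6 / 49}{15{,}000 / 5000} \approx \frac{2.04 \times 10^4}{3} \approx 6.8 \times 10^3 \approx 10^{3.8}
$$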

Summary
Results: interpretation of electron density maps; statistical potential; library of protein fragments; self-collision and energy maintenance; structure alignment; ProShape software; tools for high-dimensional spaces; probabilistic roadmaps.
- Biology: structure determination
- Modeling: shape representation, hierarchies
- Algorithms: deformation, motion planning, shape organization
- Software: alpha shapes

Future Work
- Perform more substantial experiments, e.g., more realistic potentials in ChainTree and probabilistic roadmaps
- Extend tools to solve more relevant problems, e.g., encode Molecular Dynamics into probabilistic roadmaps
- Combine results, e.g., use the library of fragments to sample probabilistic roadmaps
- Develop new algorithms/data structures, e.g., sparse spanners to capture proximity information

Our Future: The BioX – Clark Center June 2003