Bio-CS Exploration of Molecular Conformational Spaces Jean-Claude Latombe Computer Science Department Robotics Laboratory & Bio-X Clark Center.

Range of Bio-CS Research
Scales: Gene, Molecules, Cells, Tissue/Organs, Body system
- Molecular structures, similarities and motions
- Simulation of cell interaction
- Soft-tissue simulation and surgical training
- Robotic surgery (Accuray)

Motion, Structure, Function
Develop efficient algorithms and data structures to explore protein conformational spaces:
- Sampling
- Similarities
- Pathways

Vision for the Future
- In-silico experiments
- Drugs on demand
- “Interactive” Biology

Analogy with Robotics (figure label: free space) [Kavraki, Svestka, Latombe, Overmars, 95]

But Biology ≠ Robotics …
- Energy field, instead of joint control
- Continuous energy field, instead of binary free and in-collision spaces
- Multiple pathways, instead of single collision-free path
- Potentially many more degrees of freedom
- Relation to real world is more complex

Overview
Part I. Probabilistic Roadmaps: A Tool for Computing Ensemble Properties of Molecular Motions
M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe, and C. Varma. Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion. J. Computational Biology, 10(3-4):257-281, 2003.
Part II. ChainTree: A Data Structure for Efficient Monte Carlo Simulation of Proteins
I. Lotan, F. Schwarzer, J.C. Latombe. Efficient Energy Computation for Monte Carlo Simulation of Proteins. 3rd Workshop on Algorithms in Bioinformatics (WABI), Budapest, Hungary, Sept. 2003.

Part I. Probabilistic Roadmaps: A Tool for Computing Ensemble Properties of Molecular Motions
Serkan Apaydin, Doug Brutlag (1), Carlos Guestrin, David Hsu (2), Jean-Claude Latombe, Chris Varma
Computer Science Department, Stanford University
(1) Department of Biochemistry, Stanford University
(2) Computer Science Department, Nat. Univ. of Singapore

Initial Work [Singh, Latombe, Brutlag, 99]
- Study of ligand-protein binding
- Probabilistic roadmaps with edges weighted by energetic plausibility (the edge between nodes $v_i$ and $v_j$ carries weight $w_{ij}$)
- Search of most plausible path

Initial Work [Singh, Latombe, Brutlag, 99]
- Study of energy profiles along most plausible paths (figure labels: catalytic site, energy)
- Extensions to protein folding [Song and Amato, 01] [Apaydin et al., 01]
But: Molecules fold/bind along a myriad of pathways. Any single pathway is of limited interest.

New Idea: Capture the stochastic nature of molecular motion by assigning probabilities to edges (the edge from $v_i$ to $v_j$ carries probability $P_{ij}$)

Edge probabilities
- Transition probability $P_{ij}$ from $v_i$ to $v_j$: follows the Metropolis criterion
- Self-transition probability: $P_{ii}$
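The formulas on this slide did not survive in the transcript. The sketch below is therefore only an illustration of one common way to apply a Metropolis rule to roadmap edges. It assumes each of a node's N_i neighbors is proposed with probability 1/N_i and accepted with probability min(1, exp(-ΔE/kT)), with the leftover mass assigned to the self-transition. The names `transition_probabilities`, `energy`, `neighbors`, and `kT` are hypothetical, and the exact form used in the talk may differ.

```python
import math

def transition_probabilities(energy, neighbors, kT=1.0):
    """Return P as a dict of dicts: P[i][j] is the probability of moving i -> j.

    energy:    dict node -> energy (hypothetical units, same as kT)
    neighbors: dict node -> list of adjacent roadmap nodes
    Assumed rule (not recovered from the slide):
        P_ij = (1 / N_i) * min(1, exp(-(E_j - E_i) / kT)),  P_ii = 1 - sum_j P_ij
    """
    P = {}
    for i, nbrs in neighbors.items():
        row = {}
        for j in nbrs:
            dE = energy[j] - energy[i]
            # Downhill moves are always accepted; uphill moves are
            # accepted with probability exp(-dE / kT).
            accept = 1.0 if dE <= 0 else math.exp(-dE / kT)
            # Each neighbor is proposed with equal probability 1 / N_i.
            row[j] = accept / len(nbrs)
        # The self-transition absorbs the remaining probability mass.
        row[i] = 1.0 - sum(row.values())
        P[i] = row
    return P

# Toy three-node roadmap with made-up energies.
energy = {"a": 0.0, "b": 1.0, "c": 0.5}
neighbors = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
P = transition_probabilities(energy, neighbors)
```

Each row of `P` sums to one, so the roadmap can be treated as a Markov chain.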

Stochastic Roadmap Simulation
Stochastic simulation on the roadmap (with transition probabilities $P_{ij}$) and Monte Carlo simulation converge to the same Boltzmann distribution

Problems with Monte Carlo Simulation
- Much time is wasted escaping local minima
- Each run generates a single pathway

Proposed Solution
Treat a roadmap as a Markov chain (with transition probabilities $P_{ij}$) and use the First-Step Analysis tool

Example #1: Probability of Folding $p_{\text{fold}}$
From a given conformation, the folded state is reached first with probability $p_{\text{fold}}$ and the unfolded state with probability $1 - p_{\text{fold}}$.
“We stress that we do not suggest using $p_{\text{fold}}$ as a transition coordinate for practical purposes as it is very computationally intensive.” Du, Pande, Grosberg, Tanaka, and Shakhnovich, “On the Transition Coordinate for Protein Folding,” Journal of Chemical Physics (1998).
Figure: HIV integrase [Du et al. ‘98]

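First-Step Analysis
F: Folded set; U: Unfolded set
Node i has self-transition probability $P_{ii}$ and neighbors j, k, l, m reached with probabilities $P_{ij}$, $P_{ik}$, $P_{il}$, $P_{im}$.
Let $f_i = p_{\text{fold}}(i)$. After one step:
$f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$, with $f = 1$ for any node in F and $f = 0$ for any node in U
- One linear equation per node
- Solution gives $p_{\text{fold}}$ for all nodes
- No explicit simulation run
- All pathways are taken into account
- Sparse linear system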
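As a minimal sketch of first-step analysis, the code below assembles one linear equation per node outside F and U and solves the system directly. The function names `p_fold` and `solve_linear` and the five-node chain are hypothetical illustrations. A dense Gaussian elimination stands in for the sparse solver that a roadmap with thousands of nodes would call for.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def p_fold(P, folded, unfolded):
    """P[i][j]: transition probability i -> j. Returns dict node -> p_fold."""
    transient = [i for i in P if i not in folded and i not in unfolded]
    index = {node: r for r, node in enumerate(transient)}
    n = len(transient)
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i in transient:
        r = index[i]
        A[r][r] += 1.0                    # left-hand side f_i
        for j, pij in P[i].items():
            if j in folded:
                b[r] += pij               # f_j = 1 for nodes in F
            elif j not in unfolded:       # f_j = 0 for nodes in U
                A[r][index[j]] -= pij     # move P_ij f_j to the left side
    f = solve_linear(A, b) if n else []
    result = {i: 1.0 for i in folded}
    result.update({i: 0.0 for i in unfolded})
    result.update({i: f[index[i]] for i in transient})
    return result

# Lazy random walk on a chain 0-1-2-3-4: node 0 is unfolded, node 4 is folded.
P = {0: {0: 1.0}, 4: {4: 1.0}}
for i in (1, 2, 3):
    P[i] = {i - 1: 0.25, i: 0.5, i + 1: 0.25}
f = p_fold(P, folded={4}, unfolded={0})
```

A single solve yields the value for every node at once, which is the point the slide makes about avoiding explicit simulation runs.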

In Contrast …
Computing $p_{\text{fold}}$ with MC simulation requires, for every conformation c of interest:
- Perform many MC simulation runs from c
- Count number of times F is attained first

Computational Tests
- 1ROP (repressor of primer): 2 α helices, 6 DOF
- 1HDD (Engrailed homeodomain): 3 α helices, 12 DOF
H-P energy model with steric clash exclusion [Sun et al., 95]

1ROP Correlation with MC Approach

Computation Times (1ROP)
Monte Carlo: 49 conformations; over 11 days of computer time; over $10^6$ energy computations
Roadmap: 5000 conformations; 1.5 hours of computer time; ~15,000 energy computations
~4 orders of magnitude speedup per conformation!
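The speedup figure can be checked as a ratio of time per conformation, using only the numbers on the slide. Treating "over 11 days" as exactly 11 days makes the result a lower bound:

```latex
\[
\frac{t_{\mathrm{MC}} / N_{\mathrm{MC}}}{t_{\mathrm{roadmap}} / N_{\mathrm{roadmap}}}
= \frac{(11 \times 24\ \mathrm{h}) / 49}{1.5\ \mathrm{h} / 5000}
\approx \frac{5.39\ \mathrm{h}}{3.0 \times 10^{-4}\ \mathrm{h}}
\approx 1.8 \times 10^{4}
\]
```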

Example #2: Ligand-Protein Interaction
Computation of escape time from funnels of attraction around potential binding sites
funnel = ball of 10Å rmsd [Camacho, Vajda, 01]

Similar Computation Through Simulation [Sept, Elcock and McCammon ’99]
10K to 30K independent simulations

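Computing Escape Time with Roadmap
Node i inside the funnel of attraction has self-transition probability $P_{ii}$ and neighbors j, k, l, m reached with probabilities $P_{ij}$, $P_{ik}$, $P_{il}$, $P_{im}$.
$\tau_i = 1 + P_{ii}\tau_i + P_{ij}\tau_j + P_{ik}\tau_k + P_{il}\tau_l + P_{im}\tau_m$, with $\tau = 0$ for any node outside the funnel
(escape time is measured as number of steps of stochastic simulation)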
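The escape-time equations have the same shape as the folding-probability ones, with an extra 1 per step. As a minimal sketch, the code below solves them by Gauss-Seidel iteration, a choice made here for brevity and not stated in the talk. The function name `escape_times` and the toy chain are hypothetical.

```python
def escape_times(P, funnel, tol=1e-12, max_iter=100000):
    """Expected number of steps to leave `funnel`, for every node.

    P[i][j] is the transition probability i -> j and must list every node.
    Nodes outside the funnel keep an escape time of 0.
    """
    tau = {i: 0.0 for i in P}
    for _ in range(max_iter):
        change = 0.0
        for i in funnel:
            # One step is always spent, then the walk continues from
            # wherever it lands (possibly node i itself).
            new = 1.0 + sum(p * tau[j] for j, p in P[i].items())
            change = max(change, abs(new - tau[i]))
            tau[i] = new
        if change < tol:
            break
    return tau

# Lazy random walk on a chain 0-1-2-3-4; the funnel is {1, 2, 3}.
P = {0: {0: 1.0}, 4: {4: 1.0}}
for i in (1, 2, 3):
    P[i] = {i - 1: 0.25, i: 0.5, i + 1: 0.25}
tau = escape_times(P, funnel={1, 2, 3})
```

For this chain the exact answers are 6, 8, and 6 steps for nodes 1, 2, and 3, which follow from solving the three equations by hand.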

Distinguishing Catalytic Site
Given several potential binding sites, which one is the catalytic site?
Energy: electrostatic + van der Waals + solvation free energy terms

Complexes Studied
ligand          protein   # random nodes   # DOFs
oxamate         1ldm      8000             7
Streptavidin    1stp      8000             11
Hydroxylamine   4ts1      8000             9
COT             1cjw      8000             21
THK             1aid      8000             14
IPM             1ao5      8000             10
PTI             3tpi      8000             13

Distinction Based on Energy (kcal/mol)
Protein   Bound state   Best potential binding site
1stp      -15.1         -14.6
4ts1      -19.4         -14.6
3tpi      -25.2         -16.0
1ldm      -11.8         -13.6
1cjw      -11.7         -18.0
1aid      -11.2         -22.2
1ao5      -7.5          -13.1
Able to distinguish catalytic site (bound state has the lower energy): 1stp, 4ts1, 3tpi. Not able: 1ldm, 1cjw, 1aid, 1ao5.

Distinction Based on Escape Time (# steps)
Protein   Bound state   Best potential binding site
1stp      3.4E+9        1.1E+7
4ts1      3.8E+10       1.8E+6
3tpi      1.3E+11       5.9E+5
1ldm      8.1E+5        3.4E+6
1cjw      5.4E+8        4.2E+6
1aid      9.7E+5        1.6E+8
1ao5      6.6E+7        5.7E+6
Able to distinguish catalytic site (bound state has the larger escape time): 1stp, 4ts1, 3tpi, 1cjw, 1ao5. Not able: 1ldm, 1aid.

Conclusion
Probabilistic roadmaps are a promising tool for computing ensemble properties of molecular pathways
Current work:
- Non-uniform sampling strategies to handle more complex molecules
- More realistic energetic models
- Extension to molecular dynamics simulation
- Connection to in-vitro experiments (interaction of two proteins)

Part II. ChainTree: A Data Structure for Efficient Monte Carlo Simulation of Proteins
Itay Lotan, Fabian Schwarzer, Dan Halperin (1), Jean-Claude Latombe
Computer Science Department, Stanford University
(1) Computer Science Department, Tel Aviv University

Monte Carlo Simulation (MCS)
- Used to study thermodynamic and kinetic properties of proteins
- Random walk through conformation space
- At each attempted step:
  – Perturb current conformation at random
  – Accept step with a probability given by the Metropolis criterion
- Problem: How to maintain energy efficiently?
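The acceptance formula on this slide was lost in extraction. The sketch below assumes the standard Metropolis rule, accepting with probability min(1, exp(-ΔE/kT)), which may differ in detail from what the slide showed. The names `metropolis_step`, `energy`, and `perturb`, along with the toy cosine energy, are hypothetical.

```python
import math
import random

def metropolis_step(conf, energy, perturb, kT, rng):
    """Attempt one MC step; return (new_conformation, accepted)."""
    trial = perturb(conf, rng)
    # The energy is recomputed from scratch here, which is exactly the
    # cost that an incremental data structure would try to avoid.
    dE = energy(trial) - energy(conf)
    if dE <= 0 or rng.random() < math.exp(-dE / kT):
        return trial, True
    return conf, False

def energy(conf):
    # Hypothetical toy energy: each torsion angle prefers 0.
    return sum(1.0 - math.cos(a) for a in conf)

def perturb(conf, rng):
    # Change one randomly chosen angle by a small Gaussian amount.
    trial = list(conf)
    k = rng.randrange(len(trial))
    trial[k] += rng.gauss(0.0, 0.3)
    return trial

rng = random.Random(0)
conf = [2.0, -1.5, 0.7]
accepted = 0
for _ in range(1000):
    conf, ok = metropolis_step(conf, energy, perturb, kT=0.5, rng=rng)
    accepted += ok
```

Only one angle changes per step in this toy run, which mirrors the setting where few DOFs are perturbed and most of the energy terms stay the same.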

Energy Function
- E = Σ bonded terms + Σ non-bonded terms
- Bonded terms, e.g. bond length: easy to compute
- Non-bonded terms, e.g. van der Waals, depend on distances between pairs of atoms: expensive to compute, $O(n^2)$

Energy Function
- Non-bonded terms
- Use cutoff distance (6 - 12Å)
- Only O(n) interacting pairs [Halperin & Overmars ’98]
Problem: How to find interacting pairs without enumerating all atom pairs?

Grid Method
- Subdivide space into cubic cells (cell side: $d_{\text{cutoff}}$)
- Compute cell that contains each atom center
- Store results in hash table
Θ(n) time to update grid
O(1) time to find interactions for each atom
Θ(n) to find all interactions
Asymptotically optimal in worst-case!
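The three steps of the grid method can be sketched as follows, assuming the cell side equals the cutoff distance so that every interacting partner of an atom lies in its own cell or one of the 26 adjacent cells. The function name `interacting_pairs` and the sample coordinates are hypothetical.

```python
from collections import defaultdict
from itertools import product
import math

def interacting_pairs(centers, cutoff):
    """Return the set of index pairs (i, j), i < j, with distance <= cutoff."""
    def cell_of(p):
        # Integer coordinates of the cubic cell containing point p.
        return tuple(math.floor(x / cutoff) for x in p)

    # Hash table from cell coordinates to the atoms whose centers fall in it.
    grid = defaultdict(list)
    for idx, p in enumerate(centers):
        grid[cell_of(p)].append(idx)

    pairs = set()
    for i, p in enumerate(centers):
        cx, cy, cz = cell_of(p)
        # Only the 27 cells around atom i can hold atoms within the cutoff.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                if i < j and math.dist(p, centers[j]) <= cutoff:
                    pairs.add((i, j))
    return pairs

centers = [(0, 0, 0), (1, 0, 0), (5, 0, 0), (5.5, 0.5, 0), (-0.5, 0, 0)]
pairs = interacting_pairs(centers, cutoff=1.2)
```

The constant-time bound per atom relies on atoms not packing arbitrarily densely, so each cell holds a bounded number of centers. The sketch rebuilds the whole grid on every call, which is the Θ(n) update cost the slide refers to.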

Can We Do Better on Average?
- Proteins are long kinematic chains

Protein’s Kinematic Structure
- Angles φ, ψ for backbone and χ for side-chains (torsional DOFs)
- Conformational space

Can We Do Better on Average?
- Proteins are long kinematic chains
- Few DOFs are perturbed at each MC step
- Long sub-chains stay rigid at each step
- Many partial energy sums remain constant
How to retrieve unchanged partial sums?

Two New Data Structures
1. ChainTree: fast detection of interacting atom pairs
2. EnergyTree: reuse of unchanged partial energy sums

ChainTree
Combination of two hierarchies:
- Transform hierarchy: approximates kinematics of protein backbone at successive resolutions
- Bounding volume hierarchy: approximates geometry of protein at successive resolutions (Larsen et al., ’00)

ChainTree

Updating the ChainTree
Update path to root:
– Recompute transforms that shortcut the change
– Recompute bounding volumes that contain the change
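The update-path-to-root idea can be illustrated with a much simplified stand-in for the transform hierarchy. It is a balanced binary tree over planar rigid transforms in which each internal node stores the composition of its children. This is a sketch under those assumptions, not the ChainTree itself: bounding volumes are omitted, the chain is 2-D, and the names `TransformTree`, `compose`, and `link` are hypothetical.

```python
import cmath
from functools import reduce

def compose(t1, t2):
    """Compose planar rigid transforms z -> a*z + b; t2 is applied first."""
    a1, b1 = t1
    a2, b2 = t2
    return (a1 * a2, a1 * b2 + b1)

def link(theta, length):
    """Transform of one link: rotate by theta, then advance by length."""
    a = cmath.exp(1j * theta)
    return (a, length * a)

class TransformTree:
    def __init__(self, link_transforms):
        size = 1
        while size < len(link_transforms):
            size *= 2
        self.size = size
        # Leaves hold per-link transforms; unused leaves hold the identity.
        self.node = [(1 + 0j, 0j)] * (2 * size)
        for i, t in enumerate(link_transforms):
            self.node[size + i] = t
        # Each internal node shortcuts the sub-chain spanned by its children.
        for v in range(size - 1, 0, -1):
            self.node[v] = compose(self.node[2 * v], self.node[2 * v + 1])

    def update(self, i, t):
        """Change link i and recompute only its ancestors; return their count."""
        v = self.size + i
        self.node[v] = t
        recomputed = 0
        v //= 2
        while v >= 1:
            self.node[v] = compose(self.node[2 * v], self.node[2 * v + 1])
            recomputed += 1
            v //= 2
        return recomputed

    def end_to_end(self):
        return self.node[1]

links = [link(0.1 * i, 1.0) for i in range(8)]
tree = TransformTree(links)
links[3] = link(1.0, 1.0)
recomputed = tree.update(3, links[3])   # touches only the ancestors of leaf 3
```

With eight links, a single-DOF change recomputes three internal nodes, while the transforms of all untouched sub-chains are reused as they are.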

Finding Interacting Pairs
- Do not search inside rigid sub-chains (unmarked nodes)
- Do not test two nodes with no marked node between them

Computational Complexity
n: total number of DOFs in protein backbone
k: number of simultaneous DOF changes at each step of MCS
Updating complexity:
Worst-case complexity of finding all interacting pairs:
but performs much better in practice!!!

EnergyTree (figure; node labels: E(N,N), E(N,O), E(P,P), E(O,O))

Experimental Setup
Energy function:
– Van der Waals
– Electrostatic
– Attraction between native contacts
– Cutoff at 12Å
300,000 steps MCS
Early rejection for large vdW terms

Results: 1-DOF change (plot; data labels: 68, 144, 374, 755)

Results: 5-DOF change (plot; data labels: 68, 144, 374, 755)

Two-Pass ChainTree (plot; data labels: 68, 144, 374, 755)

Conclusion
- Chain/EnergyTree reduces average time per step in MCS of proteins (vs. grid)
- Exploits chain kinematics of proteins
- Larger speed-up for bigger proteins and for smaller numbers of simultaneous DOF changes

What is Computational Biology?
- Using computers in Biology?
- Designing efficient algorithms for analyzing biological data and simulating biological processes?
- Using Biology to design new algorithms and computing hardware?
Cultural clash: Biology: classification; Computer Science: abstraction
In any case, Computational Biology will be a critical domain for the next 20 years, probably the next “big thing” after the Internet
