1
Bio-CS Exploration of Molecular Conformational Spaces
Jean-Claude Latombe
Computer Science Department
Robotics Laboratory & Bio-X Clark Center
2
Range of Bio-CS Research
Gene, Molecules, Cells, Tissue/Organs, Body system
– Molecular structures, similarities and motions
– Simulation of cell interaction
– Soft-tissue simulation and surgical training
– Robotic surgery
3
Range of Bio-CS Research
Gene, Molecules, Cells, Tissue/Organs, Body system
– Molecular structures, similarities and motions
– Simulation of cell interaction
– Soft-tissue simulation and surgical training
– Robotic surgery
Accuray
5
Structure, Motion
[Figure panels 1 to 4]
6
Structure, Motion, Function
Develop efficient algorithms and data structures to explore protein conformational spaces:
– Sampling
– Similarities
– Pathways
7
Vision for the Future
– In-silico experiments
– Drugs on demand
– “Interactive” Biology
8
Analogy with Robotics
[Figure: robot configuration space with its free space]
[Kavraki, Svestka, Latombe, Overmars, 95]
9
But Biology differs from Robotics:
– Energy field, instead of joint control
– Continuous energy field, instead of binary free and in-collision spaces
– Multiple pathways, instead of a single collision-free path
– Potentially many more degrees of freedom
– Relation to the real world is more complex
10
Overview
Part I: Probabilistic Roadmaps: A Tool for Computing Ensemble Properties of Molecular Motions
M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe, and C. Varma. Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion. J. Computational Biology, 10(3-4):257-281, 2003.
Part II: ChainTree: A Data Structure for Efficient Monte Carlo Simulation of Proteins
I. Lotan, F. Schwarzer, J.C. Latombe. Efficient Energy Computation for Monte Carlo Simulation of Proteins. 3rd Workshop on Algorithms in Bioinformatics (WABI), Budapest, Hungary, Sept. 2003.
11
Part I
Probabilistic Roadmaps: A Tool for Computing Ensemble Properties of Molecular Motions
Serkan Apaydin, Doug Brutlag (1), Carlos Guestrin, David Hsu (2), Jean-Claude Latombe, Chris Varma
Computer Science Department, Stanford University
(1) Department of Biochemistry, Stanford University
(2) Computer Science Department, Nat. Univ. of Singapore
12
Initial Work [Singh, Latombe, Brutlag, 99]
– Study of ligand-protein binding
– Probabilistic roadmaps with edges weighted by energetic plausibility
[Figure: roadmap edge between nodes v_i and v_j with weight w_ij]
13
Initial Work [Singh, Latombe, Brutlag, 99]
– Study of ligand-protein binding
– Probabilistic roadmaps with edges weighted by energetic plausibility
– Search for the most plausible path
[Figure: roadmap edge between nodes v_i and v_j with weight w_ij]
14
Initial Work [Singh, Latombe, Brutlag, 99]
– Study of energy profiles along most plausible paths
– Extensions to protein folding [Song and Amato, 01] [Apaydin et al., 01]
But: molecules fold/bind along a myriad of pathways. Any single pathway is of limited interest.
[Figure: energy profile along a path to the catalytic site]
15
New Idea: Capture the stochastic nature of molecular motion by assigning probabilities to edges
[Figure: roadmap edge from v_i to v_j with probability P_ij]
16
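Edge probabilities
Follow the Metropolis criterion:
$P_{ij} = \frac{1}{N_i}\exp\left(-\frac{\Delta E_{ij}}{k_B T}\right)$ if $\Delta E_{ij} > 0$, and $P_{ij} = \frac{1}{N_i}$ otherwise,
where $\Delta E_{ij} = E(v_j) - E(v_i)$ and $N_i$ is the number of neighbors of $v_i$ in the roadmap.
Self-transition probability: $P_{ii} = 1 - \sum_{j \neq i} P_{ij}$
[Figure: nodes v_i and v_j with edge probability P_ij and self-loop probability P_ii]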
17
Stochastic Roadmap Simulation
Stochastic simulation on the roadmap (moving along edges with probabilities P_ij) and Monte Carlo simulation converge to the same Boltzmann distribution.
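A minimal Python sketch of the edge probabilities and their stationary distribution, under stated assumptions: the roadmap is stored as a dict of dicts (our layout), $N_i$ is taken to be the number of roadmap neighbors of node i, and the ring-shaped roadmap, its energies, and kT = 1 are hypothetical. Because every node on a ring has the same degree, this chain satisfies detailed balance, so power iteration should land exactly on the Boltzmann distribution.

```python
import math

def transition_matrix(energies, neighbors, kT=1.0):
    """Metropolis edge probabilities on a roadmap stored as a dict of dicts.

    P[i][j] = min(1, exp(-(E_j - E_i) / kT)) / N_i for each neighbor j of i,
    P[i][i] = 1 - sum of the edge probabilities (self-transition).
    """
    P = {}
    for i, nbrs in neighbors.items():
        n_i = len(nbrs)
        row = {j: min(1.0, math.exp(-(energies[j] - energies[i]) / kT)) / n_i for j in nbrs}
        row[i] = 1.0 - sum(row.values())
        P[i] = row
    return P

def stationary(P, iters=5000):
    """Power iteration pi <- pi P, starting from the uniform distribution."""
    pi = {i: 1.0 / len(P) for i in P}
    for _ in range(iters):
        new = {i: 0.0 for i in P}
        for i, row in P.items():
            for j, p in row.items():
                new[j] += pi[i] * p
        pi = new
    return pi

# Hypothetical roadmap: a ring of 6 nodes, so every node has N_i = 2 neighbors.
energies = {0: 0.0, 1: 1.0, 2: 0.5, 3: 2.0, 4: 0.2, 5: 1.5}
neighbors = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
P = transition_matrix(energies, neighbors)
pi = stationary(P)

Z = sum(math.exp(-e) for e in energies.values())   # partition function, kT = 1
boltzmann = {i: math.exp(-e) / Z for i, e in energies.items()}
for i in sorted(pi):
    print(i, round(pi[i], 4), round(boltzmann[i], 4))
```

With this particular formula, the exact match relies on equal node degrees; the ring makes that hold trivially.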
18
Problems with Monte Carlo Simulation
– Much time is wasted escaping local minima
– Each run generates a single pathway
19
Proposed Solution
Treat a roadmap (edge probabilities P_ij) as a Markov chain and use the First-Step Analysis tool
20
Example #1: Probability of Folding p_fold
[Figure: from a given conformation, the folded state is reached first with probability p_fold and the unfolded state with probability 1 - p_fold]
“We stress that we do not suggest using p_fold as a transition coordinate for practical purposes as it is very computationally intensive.”
Du, Pande, Grosberg, Tanaka, and Shakhnovich, “On the Transition Coordinate for Protein Folding,” Journal of Chemical Physics (1998).
[Figure: HIV integrase [Du et al. ‘98]]
21
First-Step Analysis
F: Folded set, U: Unfolded set
[Figure: node i with self-loop P_ii and edges P_ij, P_ik, P_il, P_im to neighbors j, k, l, m]
Let $f_i = p_{fold}(i)$, with $f_i = 1$ for $i \in F$ and $f_i = 0$ for $i \in U$.
After one step: $f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$
– One linear equation per node
– Solution gives p_fold for all nodes
– No explicit simulation run
– All pathways are taken into account
– Sparse linear system
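A minimal Python sketch of first-step analysis, assuming the roadmap is stored as a dict of dicts of edge probabilities (a hypothetical layout). It solves the sparse linear system by Gauss-Seidel sweeps, which is only one possible solver; the slides do not say which one was used. On the small line-shaped chain below the exact answer is known (f_i = i/4, the classic gambler's-ruin result), which makes the sketch easy to check.

```python
def p_fold(P, folded, unfolded, tol=1e-12, max_sweeps=100_000):
    """First-step analysis on a roadmap P (dict of dicts of edge probabilities).

    Solves f_i = sum_j P_ij f_j for every node outside F and U, with f = 1 on F
    and f = 0 on U, by Gauss-Seidel sweeps. The self-transition term is moved to
    the left-hand side: (1 - P_ii) f_i = sum_{j != i} P_ij f_j, so P_ii < 1 is required.
    """
    f = {i: (1.0 if i in folded else 0.0) for i in P}
    free = [i for i in P if i not in folded and i not in unfolded]
    for _ in range(max_sweeps):
        delta = 0.0
        for i in free:
            rhs = sum(p * f[j] for j, p in P[i].items() if j != i)
            new = rhs / (1.0 - P[i].get(i, 0.0))
            delta = max(delta, abs(new - f[i]))
            f[i] = new
        if delta < tol:
            break
    return f

# Hypothetical 5-node roadmap on a line: node 0 is unfolded, node 4 is folded,
# interior nodes stay put with probability 1/2 and step left or right with 1/4.
P = {0: {0: 1.0}, 4: {4: 1.0}}
for i in range(1, 4):
    P[i] = {i: 0.5, i - 1: 0.25, i + 1: 0.25}
f = p_fold(P, folded={4}, unfolded={0})
print({i: round(v, 6) for i, v in f.items()})
```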
22
In Contrast …
Computing p_fold with MC simulation requires, for every conformation c of interest:
– Perform many MC simulation runs from c
– Count the number of times F is attained first
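For contrast, a sketch of the simulation-based estimate described on this slide, for a single start conformation and a hypothetical dict-of-dicts chain (layout, chain, and run count are our assumptions). Each estimate needs thousands of runs and carries sampling error of order sqrt(p(1 - p)/runs), and it covers only one start node, whereas a linear solve yields p_fold for every node at once.

```python
import random

def p_fold_mc(P, start, folded, unfolded, runs=20_000, seed=0):
    """Estimate p_fold(start) by running the Markov chain until it hits F or U."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        i = start
        while i not in folded and i not in unfolded:
            r, acc = rng.random(), 0.0
            for j, p in P[i].items():      # sample the next node from row i
                acc += p
                if r < acc:
                    i = j
                    break
        if i in folded:                     # F was attained first
            hits += 1
    return hits / runs

# Hypothetical 5-node line: node 0 unfolded, node 4 folded, lazy symmetric steps.
P = {0: {0: 1.0}, 4: {4: 1.0}}
for i in range(1, 4):
    P[i] = {i: 0.5, i - 1: 0.25, i + 1: 0.25}
estimate = p_fold_mc(P, start=2, folded={4}, unfolded={0})
print(estimate)   # near the exact value 0.5, up to sampling noise
```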
23
Computational Tests
– 1ROP (repressor of primer): 2 helices, 6 DOF
– 1HDD (Engrailed homeodomain): 3 helices, 12 DOF
H-P energy model with steric clash exclusion [Sun et al., 95]
24
1ROP Correlation with MC Approach
25
Computation Times (1ROP)
– Monte Carlo: 49 conformations, over 11 days of computer time, over 10^6 energy computations
– Roadmap: 5000 conformations, 1.5 hours of computer time, ~15,000 energy computations
~4 orders of magnitude speedup!
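The slide does not say how the speedup is measured. One plausible reading, sketched here as an assumption, normalizes by the number of conformations for which p_fold is obtained. The arithmetic uses only the figures on the slide; since the Monte Carlo figures are lower bounds ("over"), so are the ratios.

```python
import math

# Figures as reported on the slide for 1ROP ("over" values taken at face value).
mc_hours, mc_confs, mc_energy = 11 * 24, 49, 1e6
rm_hours, rm_confs, rm_energy = 1.5, 5000, 15_000

# Speedup per conformation, in computer time and in energy evaluations.
time_per_conf = (mc_hours / mc_confs) / (rm_hours / rm_confs)
energy_per_conf = (mc_energy / mc_confs) / (rm_energy / rm_confs)

print(f"time per conformation:  {time_per_conf:,.0f}x (10^{math.log10(time_per_conf):.1f})")
print(f"energy evals per conf.: {energy_per_conf:,.0f}x (10^{math.log10(energy_per_conf):.1f})")
```

Both ratios come out at roughly four orders of magnitude under this reading.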
26
Example #2: Ligand-Protein Interaction
Computation of escape time from funnels of attraction around potential binding sites
Funnel = ball of 10 Å RMSD [Camacho, Vajda, 01]
27
Similar Computation Through Simulation [Sept, Elcock and McCammon ’99]
10K to 30K independent simulations
28
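Computing Escape Time with Roadmap
[Figure: funnel of attraction containing node i, with self-loop P_ii and edges P_ij, P_ik, P_il, P_im to nodes j, k, l, m]
$\tau_i = 1 + P_{ii}\tau_i + P_{ij}\tau_j + P_{ik}\tau_k + P_{il}\tau_l + P_{im}\tau_m$, with $\tau = 0$ for nodes outside the funnel
(escape time is measured as the number of steps of stochastic simulation)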
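A minimal Python sketch of the escape-time computation, assuming the roadmap is a dict of dicts of edge probabilities and solving the linear system by Gauss-Seidel sweeps (a solver choice of ours). The 1-D roadmaps and V-shaped energy wells are hypothetical stand-ins for a funnel of attraction. On a flat landscape the exact answer i(4 - i) is known, and a deeper well should give a longer escape time, which is the property the catalytic-site test relies on.

```python
import math

def escape_times(P, funnel, tol=1e-10, max_sweeps=1_000_000):
    """Expected number of stochastic-simulation steps to leave `funnel`.

    Solves tau_i = 1 + sum_j P_ij tau_j for i in the funnel, with tau = 0 outside,
    by Gauss-Seidel sweeps (self-transition term moved to the left-hand side).
    """
    tau = {i: 0.0 for i in P}
    for _ in range(max_sweeps):
        delta = 0.0
        for i in funnel:
            rhs = 1.0 + sum(p * tau[j] for j, p in P[i].items() if j != i)
            new = rhs / (1.0 - P[i].get(i, 0.0))
            delta = max(delta, abs(new - tau[i]))
            tau[i] = new
        if delta < tol:
            break
    return tau

def line_roadmap(energies, kT=1.0):
    """Hypothetical 1-D roadmap: node i linked to i-1 and i+1, Metropolis rule with N_i = 2."""
    n, P = len(energies), {}
    for i in range(n):
        row = {j: min(1.0, math.exp(-(energies[j] - energies[i]) / kT)) / 2
               for j in (i - 1, i + 1) if 0 <= j < n}
        row[i] = 1.0 - sum(row.values())
        P[i] = row
    return P

def well(depth):
    """V-shaped energy on nodes 0..6 with its minimum -depth at node 3."""
    return [-depth * (1 - abs(i - 3) / 3) for i in range(7)]

# Flat energy on nodes 0..4 with funnel {1, 2, 3}: exactly tau_i = i * (4 - i).
flat = escape_times(line_roadmap([0.0] * 5), funnel=range(1, 4))
# Funnel = nodes 1..5 around the well; the deeper well should hold longer.
shallow = escape_times(line_roadmap(well(1.0)), funnel=range(1, 6))
deep = escape_times(line_roadmap(well(3.0)), funnel=range(1, 6))
print(flat[2], shallow[3], deep[3])
```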
29
Distinguishing Catalytic Site
Given several potential binding sites, which one is the catalytic site?
Energy: electrostatic + van der Waals + solvation free energy terms
30
Complexes Studied
ligand | protein | # random nodes | # DOFs
oxamate | 1ldm | 8000 | 7
Streptavidin | 1stp | 8000 | 11
Hydroxylamine | 4ts1 | 8000 | 9
COT | 1cjw | 8000 | 21
THK | 1aid | 8000 | 14
IPM | 1ao5 | 8000 | 10
PTI | 3tpi | 8000 | 13
31
Distinction Based on Energy (kcal/mol)
protein | bound state | best potential binding site | catalytic site distinguished?
1stp | -15.1 | -14.6 | able
4ts1 | -19.4 | -14.6 | able
3tpi | -25.2 | -16.0 | able
1ldm | -11.8 | -13.6 | not able
1cjw | -11.7 | -18.0 | not able
1aid | -11.2 | -22.2 | not able
1ao5 | -7.5 | -13.1 | not able
32
Distinction Based on Escape Time (# steps)
protein | bound state | best potential binding site | catalytic site distinguished?
1stp | 3.4E+9 | 1.1E+7 | able
4ts1 | 3.8E+10 | 1.8E+6 | able
3tpi | 1.3E+11 | 5.9E+5 | able
1ldm | 8.1E+5 | 3.4E+6 | not able
1cjw | 5.4E+8 | 4.2E+6 | able
1aid | 9.7E+5 | 1.6E+8 | not able
1ao5 | 6.6E+7 | 5.7E+6 | able
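A short check of the two criteria using only the numbers on slides 31 and 32 (the dictionary layout is ours). A criterion picks out the catalytic site when the bound state is lower in energy, or takes longer to escape, than the best other potential binding site.

```python
# (bound state, best other potential binding site), copied from slides 31 and 32.
energy = {  # kcal/mol: lower is more favorable
    "1stp": (-15.1, -14.6), "4ts1": (-19.4, -14.6), "3tpi": (-25.2, -16.0),
    "1ldm": (-11.8, -13.6), "1cjw": (-11.7, -18.0), "1aid": (-11.2, -22.2),
    "1ao5": (-7.5, -13.1),
}
escape = {  # number of stochastic-simulation steps: higher is harder to escape
    "1stp": (3.4e9, 1.1e7), "4ts1": (3.8e10, 1.8e6), "3tpi": (1.3e11, 5.9e5),
    "1ldm": (8.1e5, 3.4e6), "1cjw": (5.4e8, 4.2e6), "1aid": (9.7e5, 1.6e8),
    "1ao5": (6.6e7, 5.7e6),
}
by_energy = sorted(p for p, (bound, best) in energy.items() if bound < best)
by_escape = sorted(p for p, (bound, best) in escape.items() if bound > best)
print("energy distinguishes:", by_energy)
print("escape time distinguishes:", by_escape)
```

On these numbers, energy singles out the catalytic site in 3 of the 7 complexes and escape time in 5 of 7.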
33
Conclusion
Probabilistic roadmaps are a promising tool for computing ensemble properties of molecular pathways.
Current work:
– Non-uniform sampling strategies to handle more complex molecules
– More realistic energetic models
– Extension to molecular dynamics simulation
– Connection to in-vitro experiments (interaction of two proteins)
34
Part II
ChainTree: A Data Structure for Efficient Monte Carlo Simulation of Proteins
Itay Lotan, Fabian Schwarzer, Dan Halperin (1), Jean-Claude Latombe
Computer Science Department, Stanford University
(1) Computer Science Department, Tel Aviv University
35
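Monte Carlo Simulation (MCS)
Used to study thermodynamic and kinetic properties of proteins
Random walk through conformation space
At each attempted step:
– Perturb the current conformation at random
– Accept the step with probability $\min\left(1, \exp\left(-\frac{\Delta E}{k_B T}\right)\right)$, where $\Delta E$ is the energy of the perturbed conformation minus that of the current one
Problem: How to maintain energy efficiently?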
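A minimal Python sketch of one MCS step with the Metropolis acceptance rule. The energy function, the single-DOF perturbation, and all parameter values are hypothetical; the point is the control flow, and that `energy` is recomputed from scratch at every step, which is the cost the rest of Part II tries to reduce.

```python
import math
import random

def metropolis_step(conf, energy, perturb, kT, rng):
    """One attempted MC step: random perturbation, accepted with prob min(1, exp(-dE/kT))."""
    new = perturb(conf, rng)
    dE = energy(new) - energy(conf)      # full recomputation: the cost the slides target
    if dE <= 0 or rng.random() < math.exp(-dE / kT):
        return new, True
    return conf, False

def toy_energy(angles):
    """Hypothetical energy, minimal when every torsion angle is 0."""
    return sum(1.0 - math.cos(a) for a in angles)

def perturb_one(angles, rng, step=0.3):
    """Change a single randomly chosen DOF (hypothetical move set)."""
    new = list(angles)
    k = rng.randrange(len(new))
    new[k] += rng.gauss(0.0, step)
    return new

rng = random.Random(1)
conf = [math.pi] * 10                     # start at the energy maximum
start_energy = toy_energy(conf)
accepted = 0
for _ in range(5000):
    conf, ok = metropolis_step(conf, toy_energy, perturb_one, kT=0.1, rng=rng)
    accepted += ok
print(round(start_energy, 3), round(toy_energy(conf), 3), accepted / 5000)
```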
36
Energy Function
E = bonded terms + non-bonded terms
– Bonded terms, e.g. bond length: easy to compute
– Non-bonded terms, e.g. van der Waals, depend on distances between pairs of atoms: expensive to compute, O(n^2)
37
Energy Function
Non-bonded terms:
– Use a cutoff distance (6–12 Å)
– Only O(n) interacting pairs [Halperin & Overmars ’98]
Problem: How to find interacting pairs without enumerating all atom pairs?
38
Grid Method
– Subdivide space into cubic cells of side d_cutoff
– Compute the cell that contains each atom center
– Store the results in a hash table
Θ(n) time to update the grid
O(1) time to find interactions for each atom
Θ(n) to find all interactions
Asymptotically optimal in the worst case!
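A minimal Python sketch of the grid method, assuming coordinates are plain (x, y, z) tuples and a Python dict plays the role of the hash table (both our choices). Cells have side d_cutoff, so every partner within the cutoff lies in one of the 27 cells around an atom's own cell. The random test points are hypothetical and only serve to check the result against brute force.

```python
import math
import random
from collections import defaultdict
from itertools import product

def build_grid(coords, d_cutoff):
    """Hash each atom center into a cubic cell of side d_cutoff: Theta(n) time."""
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        cell = (math.floor(x / d_cutoff), math.floor(y / d_cutoff), math.floor(z / d_cutoff))
        grid[cell].append(idx)
    return grid

def interacting_pairs(coords, d_cutoff):
    """All pairs (i, j), i < j, within d_cutoff, scanning each cell's 27 neighbors.

    Each cell holds O(1) atoms when atoms cannot overlap, so this is Theta(n) overall.
    """
    grid = build_grid(coords, d_cutoff)
    pairs = []
    for (cx, cy, cz), members in grid.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):   # .get does not insert
                for i in members:
                    if i < j and math.dist(coords[i], coords[j]) <= d_cutoff:
                        pairs.append((i, j))
    return pairs

# Hypothetical check against brute force on random points in a 30 Å box.
rng = random.Random(0)
coords = [(rng.uniform(0, 30), rng.uniform(0, 30), rng.uniform(0, 30)) for _ in range(300)]
pair_list = interacting_pairs(coords, 6.0)
slow = {(i, j) for i in range(300) for j in range(i + 1, 300)
        if math.dist(coords[i], coords[j]) <= 6.0}
print(len(pair_list), set(pair_list) == slow)
```

The `i < j` test reports each pair exactly once, whether the two atoms share a cell or sit in adjacent cells.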
39
Can We Do Better on Average? Proteins are long kinematic chains
40
Protein’s Kinematic Structure
Angles φ, ψ for the backbone and χ for the side-chains
Conformational space: torsional DOFs
41
Can We Do Better on Average?
– Proteins are long kinematic chains
– Few DOFs are perturbed at each MC step
– Long sub-chains stay rigid at each step
– Many partial energy sums remain constant
How to retrieve unchanged partial sums?
42
Two New Data Structures
1. ChainTree: fast detection of interacting atom pairs
2. EnergyTree: reuse of unchanged partial energy sums
43
ChainTree
Combination of two hierarchies:
– Transform hierarchy
– Bounding volume hierarchy
44
ChainTree
Combination of two hierarchies:
– Transform hierarchy: approximates the kinematics of the protein backbone at successive resolutions
45
ChainTree
Combination of two hierarchies:
– Bounding volume hierarchy: approximates the geometry of the protein at successive resolutions (Larsen et al., ’00)
46
ChainTree
47
Updating the ChainTree
Update the path to the root:
– Recompute transforms that shortcut the change
– Recompute bounding volumes that contain the change
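A minimal sketch of the transform half of the ChainTree, simplified to a planar chain with one rotation per link; the real structure uses 3-D rigid transforms and also maintains the bounding-volume hierarchy, both omitted here. All class and function names are hypothetical. Each internal node stores the transform that shortcuts its sub-chain, so changing one DOF recomputes only the O(log n) transforms on the leaf-to-root path, and the root still equals the composition of all link transforms.

```python
import math

IDENTITY = (0.0, 0.0, 0.0)   # planar rigid transform: (rotation angle, tx, ty)

def compose(a, b):
    """Apply b in the frame of a: rotations add, b's translation is rotated by a."""
    ang, ax, ay = a
    c, s = math.cos(ang), math.sin(ang)
    return (ang + b[0], ax + c * b[1] - s * b[2], ay + s * b[1] + c * b[2])

def link(theta, length=1.0):
    """Leaf transform of one link: rotate by the joint angle, then advance `length`."""
    return (theta, length * math.cos(theta), length * math.sin(theta))

class TransformHierarchy:
    """Balanced binary tree; each internal node stores the transform that
    shortcuts its whole sub-chain (the in-order composition of its leaves)."""

    def __init__(self, angles):
        self.size = 1
        while self.size < len(angles):
            self.size *= 2
        self.tree = [IDENTITY] * (2 * self.size)   # padding leaves stay identity
        for i, theta in enumerate(angles):
            self.tree[self.size + i] = link(theta)
        for v in range(self.size - 1, 0, -1):
            self.tree[v] = compose(self.tree[2 * v], self.tree[2 * v + 1])

    def set_angle(self, i, theta):
        """Change one DOF and recompute only the transforms on its path to the root."""
        v = self.size + i
        self.tree[v] = link(theta)
        recomputed = 0
        while v > 1:
            v //= 2
            self.tree[v] = compose(self.tree[2 * v], self.tree[2 * v + 1])
            recomputed += 1
        return recomputed

    def end_frame(self):
        """Pose of the chain's last frame relative to its first: the root transform."""
        return self.tree[1]

def brute_end_frame(angles):
    frame = IDENTITY
    for theta in angles:
        frame = compose(frame, link(theta))
    return frame

angles = [0.1 * k for k in range(100)]
h = TransformHierarchy(angles)
n_updated = h.set_angle(37, -0.4)   # 100 leaves padded to 128, so 7 levels above a leaf
angles[37] = -0.4
print(n_updated, h.end_frame())
```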
48
Finding Interacting Pairs
[Figure: two ChainTree nodes tested against each other]
– Do not search inside rigid sub-chains (unmarked nodes)
– Do not test two nodes with no marked node between them
50
Computational Complexity
n: total number of DOFs in the protein backbone
k: number of simultaneous DOF changes at each step of MCS
Updating complexity:
Worst-case complexity of finding all interacting pairs:
but performs much better in practice!
51
EnergyTree
[Figure: tree of partial energy sums E(N,N), E(N,O), E(P,P), E(O,O) between ChainTree nodes N, O, P]
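A simplified Python sketch of the EnergyTree idea, assuming a planar chain with unit links, one joint angle per atom, and a hypothetical pair energy exp(-distance) for non-bonded pairs. Each cached partial sum E(N, O) covers pairs between two contiguous atom ranges. Changing joint j moves only the atoms after j, rigidly about atom j, so E(N, O) can be reused whenever no changed joint lies strictly inside the span from N's first atom to O's last atom. Positions are recomputed naively here (a job the real ChainTree does incrementally), and staleness is tested on atom ranges rather than on marked ChainTree nodes; all names are ours.

```python
import math
import random

LEAF = 4  # hypothetical: atom ranges of at most 4 atoms are summed directly

def positions(angles):
    """Planar chain of unit links; angle j turns every link after atom j."""
    pts, x, y, heading = [(0.0, 0.0)], 0.0, 0.0, 0.0
    for theta in angles:
        heading += theta
        x, y = x + math.cos(heading), y + math.sin(heading)
        pts.append((x, y))
    return pts

def pair_energy(p, q):
    """Hypothetical smooth non-bonded pair term."""
    return math.exp(-math.dist(p, q))

class EnergyTree:
    """Cache of partial sums E(N, O) over pairs a in N, b in O (a < b, b - a >= 2),
    where N and O are contiguous atom ranges and N == O or N precedes O."""

    def __init__(self, angles):
        self.angles = list(angles)
        self.n = len(self.angles) + 1          # number of atoms
        self.cache, self.changed, self.evals = {}, set(), 0
        self.pts = positions(self.angles)
        self.total = self.block(0, self.n, 0, self.n)

    def _stale(self, lo, hi):
        # Joint j rotates atoms j+1.. rigidly about atom j, so only pairs with
        # a < j < b change; the span [lo, hi) contains such a pair iff lo < j < hi - 1.
        return any(lo < j < hi - 1 for j in self.changed)

    def block(self, nlo, nhi, olo, ohi):
        key = (nlo, nhi, olo, ohi)
        if key in self.cache and not self._stale(nlo, ohi):
            return self.cache[key]                  # reuse an unchanged partial sum
        if max(nhi - nlo, ohi - olo) <= LEAF:       # small block: sum pairs directly
            val = 0.0
            for a in range(nlo, nhi):
                for b in range(max(olo, a + 2), ohi):
                    val += pair_energy(self.pts[a], self.pts[b])
                    self.evals += 1
        elif (nlo, nhi) == (olo, ohi):              # E(N,N) = E(L,L) + E(R,R) + E(L,R)
            mid = (nlo + nhi) // 2
            val = (self.block(nlo, mid, nlo, mid) + self.block(mid, nhi, mid, nhi)
                   + self.block(nlo, mid, mid, nhi))
        elif nhi - nlo >= ohi - olo:                # split the larger range
            mid = (nlo + nhi) // 2
            val = self.block(nlo, mid, olo, ohi) + self.block(mid, nhi, olo, ohi)
        else:
            mid = (olo + ohi) // 2
            val = self.block(nlo, nhi, olo, mid) + self.block(nlo, nhi, mid, ohi)
        self.cache[key] = val
        return val

    def set_angle(self, j, theta):
        """Change one DOF and return the new total energy, reusing cached sums."""
        self.angles[j] = theta
        self.pts = positions(self.angles)           # naive here; ChainTree avoids this
        self.changed, self.evals = {j}, 0
        self.total = self.block(0, self.n, 0, self.n)
        self.changed = set()
        return self.total

def brute_energy(pts):
    return sum(pair_energy(pts[a], pts[b])
               for a in range(len(pts)) for b in range(a + 2, len(pts)))

rng = random.Random(3)
tree = EnergyTree([rng.uniform(-1.0, 1.0) for _ in range(63)])   # 64 atoms
total_pairs = 64 * 63 // 2 - 63                                    # 1953 non-bonded pairs
e1 = tree.set_angle(60, 0.7)
evals_60, err_60 = tree.evals, abs(e1 - brute_energy(tree.pts))
e2 = tree.set_angle(10, -0.3)
evals_10, err_10 = tree.evals, abs(e2 - brute_energy(tree.pts))
print(evals_60, evals_10, total_pairs, err_60, err_10)
```

A change near the chain's end touches few cached sums, in line with the slides' point that fewer simultaneous DOF changes leave more partial sums reusable.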
53
Experimental Setup
Energy function:
– Van der Waals
– Electrostatic
– Attraction between native contacts
– Cutoff at 12 Å
300,000 steps of MCS
Early rejection for large vdW terms
54
Results: 1-DOF change
[Plot: proteins labeled (68), (144), (374), (755)]
55
Results: 5-DOF change
[Plot: proteins labeled (68), (144), (374), (755)]
56
Two-Pass ChainTree
[Plot: proteins labeled (68), (144), (374), (755)]
57
Conclusion
– Chain/EnergyTree reduces the average time per step in MCS of proteins (vs. grid)
– Exploits the chain kinematics of proteins
– Larger speed-up for bigger proteins and for smaller numbers of simultaneous DOF changes
58
What is Computational Biology?
– Using computers in Biology?
– Designing efficient algorithms for analyzing biological data and simulating biological processes?
– Using Biology to design new algorithms and computing hardware?
Cultural clash: Biology (classification) vs. Computer Science (abstraction)
In any case, Computational Biology will be a critical domain for the next 20 years, probably the next “big thing” after the Internet.