Stochastic Roadmap Simulation: An efficient representation and algorithm for analyzing molecular motion. Mehmet Serkan Apaydın, May 27th, 2004
Molecular motion is an essential process of life Stanford bio-x cluster An NMR spectrometer (CS273) Bovine Spongiform Encephalopathy (BSE) protein (mis)-folding Drug molecules act by binding to proteins Ligand-protein binding
Computing p_fold, the best order parameter in protein folding, is expensive using classical simulation techniques. [Figure: a conformation reaches the folded set first with probability p_fold and the unfolded set first with probability 1 - p_fold.] “We stress that we do not suggest using p fold as a transition coordinate for practical purposes as it is very computationally intensive.” Du, Pande, Grosberg, Tanaka, and Shakhnovich, “On the Transition Coordinate for Protein Folding,” Journal of Chemical Physics (1998). HIV integrase [Du et al. ‘98]
Stochastic Roadmap Simulation (SRS): Develop efficient computational representations and algorithms to study molecular motion pathways for protein folding and ligand-protein binding
Contributions: New computational framework for studying molecular motion –Transition probabilities –Correspondence to Monte Carlo –First-step analysis –Extension to non-uniform sampling. Computation of ensemble properties: –Protein folding: p_fold parameter, comparison with Monte Carlo, quantitative predictions of experimental values –Ligand-protein binding: escape time, qualitative predictions about the role of amino acids in the active site of a protein, application to distinguish the catalytic site from a set of potential binding sites
Outline Background Stochastic Roadmap Simulation Applications –Protein folding –Ligand-protein binding Extension of basic framework Quantitative prediction of experimental results on protein folding
Proteins and their structure Macromolecule Building block of life.
Ligand-Protein Binding
Simulating molecular motion Monte Carlo (MC) or Molecular Dynamics
Molecular Representations Atomistic model Linkage model –Internal parameter representation (bond angles, lengths, torsional angles) –Each secondary structure element as a vector [Lotan `04]
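Analogy with Robotics [Figure: articulated linkage with frames X0, Y0, X1, X2, X3 and joint parameters indexed 1, 2, 3]
Molecular Energetics: $E = E_{S} + E_{\theta} + E_{S\text{-}B} + E_{Tor} + E_{vdW} + E_{dipole}$ (the first four are bonded terms, the last two are non-bonded terms). Force fields, Gō models, Hydrophobic-Polar models (cs273)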
MC simulation
Problems with Monte Carlo Simulation Each run generates a single pathway Much time is wasted in local minima
A path planning technique: Probabilistic Roadmaps (PRM) [Kavraki et al. `96] [Figure: configuration space with C-obstacles; roadmap nodes and edges built during preprocessing; a query connecting Qinit to Qgoal]
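The two PRM phases named on this slide can be sketched in a few lines. This is a minimal illustration and not the planner of [Kavraki et al. `96]: the 2D unit-square configuration space, the single circular C-obstacle, the connection radius, and all function names are assumptions made for the example.

```python
import math
import random
from collections import deque

def is_free(q, center=(0.5, 0.5), radius=0.2):
    """True if configuration q lies outside a hypothetical circular C-obstacle."""
    return math.dist(q, center) > radius

def edge_is_free(a, b, steps=20):
    """Approximate collision check: test evenly spaced points on segment a-b."""
    return all(
        is_free((a[0] + (b[0] - a[0]) * t / steps,
                 a[1] + (b[1] - a[1]) * t / steps))
        for t in range(steps + 1)
    )

def sample_free(n, rng):
    """Sample n collision-free configurations uniformly in the unit square."""
    nodes = []
    while len(nodes) < n:
        q = (rng.random(), rng.random())
        if is_free(q):
            nodes.append(q)
    return nodes

def build_roadmap(nodes, connect_radius=0.2):
    """Preprocessing: link nearby nodes whose connecting segment is free."""
    adj = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            close = math.dist(nodes[i], nodes[j]) <= connect_radius
            if close and edge_is_free(nodes[i], nodes[j]):
                adj[i].append(j)
                adj[j].append(i)
    return adj

def query(adj, start, goal):
    """Query: breadth-first search; returns a list of node indices or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None

rng = random.Random(0)
q_init, q_goal = (0.05, 0.05), (0.95, 0.95)
nodes = [q_init, q_goal] + sample_free(300, rng)  # indices 0 and 1 are the query
roadmap = build_roadmap(nodes)
path = query(roadmap, 0, 1)  # None if the roadmap does not connect them
```

The roadmap is built once and can then answer many queries, which is the property the later slides exploit when the edges carry probabilities instead of plain connectivity.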
Application of PRM to molecular motion Study of ligand-protein binding Probabilistic roadmaps with edges weighted by energetic plausibility Search for the minimum weight paths Extensions to protein folding [Singh, Latombe, Brutlag, `99] [Song and Amato, `01] [Apaydın et al., `01]
How many pathways are there in a roadmap? [Table: Number of Self-Avoiding Walks on a 2D Grid, indexed by n/m; surviving entries: 2, 12, 184, 8512; the entries for larger grids, up to (10x10), (11x11), (12x12), are missing]
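The growth in the number of pathways can be checked by brute force on small grids. The sketch below assumes the table counts self-avoiding paths between opposite corners of a grid with n x n nodes; under that assumption the first values reproduce the entries 2, 12, 184, 8512 that survive on the slide. The function name is hypothetical.

```python
def count_self_avoiding_paths(n):
    """Count self-avoiding corner-to-corner paths on a grid of n x n nodes."""
    goal = (n - 1, n - 1)
    visited = {(0, 0)}

    def dfs(x, y):
        if (x, y) == goal:
            return 1
        total = 0
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < n and 0 <= ny < n and (nx, ny) not in visited:
                visited.add((nx, ny))      # extend the walk
                total += dfs(nx, ny)
                visited.remove((nx, ny))   # backtrack
        return total

    return dfs(0, 0)

counts = [count_self_avoiding_paths(n) for n in range(2, 6)]
print(counts)  # [2, 12, 184, 8512]
```

Enumeration becomes impractical within a few more grid sizes, which is why analyzing all pathways at once, rather than one at a time, matters.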
Outline Background Stochastic Roadmap Simulation Applications –Protein folding –Ligand-protein binding Future work
New Idea: Stochastic Conformational Roadmaps. Capture the stochastic nature of molecular motion by assigning a transition probability P_ij to each edge from node v_i to node v_j [Apaydın et al., RECOMB `02, WAFR `02] Collaborators: C. Guestrin, D. Hsu
Edge probabilities: The probabilities P_ij on the edge from v_i to v_j follow the Metropolis criterion and correspond to the probabilities in Monte Carlo simulation. Self-transition probabilities: P_ii.
Relationship to MC simulation: Each path on the graph = a path of MC simulation. The roadmap represents many MC simulation paths simultaneously. Stochastic Roadmap Simulation and Monte Carlo simulation converge to the same distribution (the Boltzmann distribution).
Using SRS to compute ensemble properties: Treat the roadmap as a Markov chain with transition probabilities P_ij and use First-Step Analysis
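A sketch of what the Metropolis criterion could look like for roadmap edges, assuming a neighbor is proposed uniformly among the N_i neighbors of v_i and accepted with the usual Metropolis probability at temperature T. The symbols N_i, ΔE_ij, k_B and T are notation introduced for this example and are not taken from the slide.

```latex
\[
P_{ij} =
\begin{cases}
\dfrac{1}{N_i}\,\exp\!\left(-\dfrac{\Delta E_{ij}}{k_B T}\right) & \text{if } \Delta E_{ij} > 0,\\[2ex]
\dfrac{1}{N_i} & \text{otherwise,}
\end{cases}
\qquad
\Delta E_{ij} = E_j - E_i,
\qquad
P_{ii} = 1 - \sum_{j \neq i} P_{ij}.
\]
```

With this form, each row of the transition matrix sums to 1, and the self-transition probability absorbs the rejected moves, as in a Monte Carlo run that stays in place when a move is rejected.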
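The convergence claim can be illustrated numerically on a toy roadmap. The sketch below assumes Metropolis edge probabilities with uniform neighbor proposal, k_B T = 1, and a ring-shaped roadmap in which every node has exactly two neighbors; the energies and function names are made up for the example.

```python
import math

def metropolis_row(i, energies, neighbors, kT=1.0):
    """Transition probabilities out of node i (assumed Metropolis form)."""
    row = {}
    for j in neighbors[i]:
        dE = energies[j] - energies[i]
        row[j] = min(1.0, math.exp(-dE / kT)) / len(neighbors[i])
    row[i] = 1.0 - sum(row.values())  # self-transition takes the remainder
    return row

energies = [0.0, 1.0, 0.5, 2.0, 0.2, 1.5]   # hypothetical node energies
n = len(energies)
neighbors = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
P = [metropolis_row(i, energies, neighbors) for i in range(n)]

def step(dist):
    """Propagate a probability distribution over nodes by one Markov step."""
    new = [0.0] * n
    for i, row in enumerate(P):
        for j, p in row.items():
            new[j] += dist[i] * p
    return new

dist = [1.0] + [0.0] * (n - 1)   # start with all probability on node 0
for _ in range(5000):
    dist = step(dist)

z = sum(math.exp(-e) for e in energies)
boltzmann = [math.exp(-e) / z for e in energies]
max_gap = max(abs(a - b) for a, b in zip(dist, boltzmann))
```

On this ring, `max_gap` shrinks toward zero as the number of steps grows. When nodes have different numbers of neighbors, the simple form used here no longer satisfies detailed balance exactly, so the example should be read as an illustration of the claim rather than a proof.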
Application of SRS to protein folding: Probability of Folding p_fold. [Figure: a conformation reaches the folded set first with probability p_fold and the unfolded set first with probability 1 - p_fold.] “We stress that we do not suggest using p fold as a transition coordinate for practical purposes as it is very computationally intensive.” Du, Pande, Grosberg, Tanaka, and Shakhnovich, “On the Transition Coordinate for Protein Folding,” Journal of Chemical Physics (1998). HIV integrase [Du et al. ‘98]
First-Step Analysis (F: folded set, U: unfolded set). Let f_i = p_fold(i). For a node i with neighbors j, k, l, m, after one step: $f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$, where f = 1 for a node in F and f = 0 for a node in U. One linear equation per node. Solution gives p_fold for all nodes. No explicit simulation run. All pathways are taken into account. Sparse linear system.
In contrast, computing p_fold with MC simulation requires: performing many MC simulation runs; counting the number of times F is attained first, for every conformation of interest.
Comparison: SRS vs. MC (on synthetic landscape) [Plot axes: number of nodes, L1 distance]
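First-step analysis can be sketched on a toy roadmap. The example assumes a five-node chain with node 0 in U and node 4 in F, and solves the one-equation-per-node system with a simple Gauss-Seidel iteration in place of a sparse linear solver. The chain, its probabilities, and the function name are hypothetical.

```python
def solve_pfold(P, folded, unfolded, tol=1e-12, max_iter=100000):
    """Solve f_i = sum_j P_ij f_j with f = 1 on F and f = 0 on U.

    P maps each node to a dict {neighbor: probability}; a row may include
    a self-transition, and each row sums to 1.
    """
    f = {i: (1.0 if i in folded else 0.0) for i in P}
    for _ in range(max_iter):
        delta = 0.0
        for i in P:
            if i in folded or i in unfolded:
                continue  # boundary values stay fixed
            p_self = P[i].get(i, 0.0)
            # Move the P_ii f_i term to the left-hand side and divide.
            new = sum(p * f[j] for j, p in P[i].items() if j != i) / (1.0 - p_self)
            delta = max(delta, abs(new - f[i]))
            f[i] = new
        if delta < tol:
            break
    return f

# Toy chain: stay with probability 0.5, move to either side with 0.25.
P = {
    0: {0: 1.0},
    1: {0: 0.25, 1: 0.5, 2: 0.25},
    2: {1: 0.25, 2: 0.5, 3: 0.25},
    3: {2: 0.25, 3: 0.5, 4: 0.25},
    4: {4: 1.0},
}
pfold = solve_pfold(P, folded={4}, unfolded={0})
# For this symmetric chain, pfold[i] is i / 4.
```

A single solve returns p_fold for every node at once, which is the contrast with running a separate batch of simulations from each starting conformation.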
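For comparison, the counting procedure can be sketched as follows. The example assumes a five-node chain with node 0 in U and node 4 in F, and estimates p_fold for a single starting node by running many random walks and counting how often F is reached before U. The chain and all names are hypothetical.

```python
import random

def mc_pfold(P, start, folded, unfolded, runs, rng):
    """Estimate p_fold(start) as the fraction of runs that reach F before U."""
    hits = 0
    for _ in range(runs):
        node = start
        while node not in folded and node not in unfolded:
            targets = list(P[node].keys())
            weights = list(P[node].values())
            node = rng.choices(targets, weights=weights)[0]
        hits += node in folded
    return hits / runs

# Toy chain: stay with probability 0.5, move to either side with 0.25.
P = {
    0: {0: 1.0},
    1: {0: 0.25, 1: 0.5, 2: 0.25},
    2: {1: 0.25, 2: 0.5, 3: 0.25},
    3: {2: 0.25, 3: 0.5, 4: 0.25},
    4: {4: 1.0},
}
rng = random.Random(0)
estimate = mc_pfold(P, start=2, folded={4}, unfolded={0}, runs=20000, rng=rng)
# By symmetry the exact value for the middle node is 0.5.
```

Each call gives a noisy estimate for one conformation only, and the statistical error falls off slowly with the number of runs.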
Computational Tests on two real proteins 1ROP (repressor of primer) 2 helices 6 DOF 1HDD (Engrailed homeodomain) 3 helices 12 DOF H-P energy model with steric clash exclusion [Sun et al., `95]
Differences in p_fold values obtained by SRS and MC for 1ROP and 1HDD [Plot axes: number of nodes, L1 distance]
p_fold on a real protein: β hairpin. Immunoglobulin binding protein (Protein G), last 16 amino acids; C-α based representation; Gō model based energy; 42 DOFs [Zhou and Karplus, `99]
Comparison between SRS and MC for β hairpin [Plot axes: number of nodes, L1 distance]
Computation Times (β hairpin). Monte Carlo (30 simulations): 1 conformation, ~10 hours of computer time, over 10^7 energy computations. Roadmap: 2000 conformations, 23 seconds of computer time, ~50,000 energy computations. ~6 orders of magnitude speedup per conformation!
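The speedup can be checked per conformation from the figures on this slide. The arithmetic below uses only those figures and treats the approximate values as exact.

```latex
\begin{align*}
t_{\mathrm{MC}} &\approx 10\ \text{h} = 3.6\times 10^{4}\ \text{s per conformation},\\
t_{\mathrm{SRS}} &\approx \frac{23\ \text{s}}{2000} = 1.15\times 10^{-2}\ \text{s per conformation},\\
\frac{t_{\mathrm{MC}}}{t_{\mathrm{SRS}}} &\approx 3.1\times 10^{6},\\
\frac{\text{energy computations (MC)}}{\text{energy computations (SRS)}}
  &\approx \frac{10^{7}}{5\times 10^{4}/2000} = \frac{10^{7}}{25} = 4\times 10^{5}.
\end{align*}
```

The time ratio is between 6 and 7 orders of magnitude per conformation, while the ratio of energy computations is between 5 and 6.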
Outline Background Stochastic Roadmap Simulation Applications –Protein folding –Ligand-protein binding Extension of basic framework Quantitative prediction of experimental results on protein folding
Application of SRS to Ligand-Protein Interactions. Distinguishing the catalytic site: among several potential binding sites, which one is the catalytic site? Studying the effect of catalytic amino acids upon binding/unbinding [Apaydın et al., ECCB ‘02] Collaborators: C. Guestrin, C. Varma
Funnels of attraction and escape time from a funnel. Funnel = energy gradient around a potential binding site that guides the ligand to that site, defined as all ligand conformations within 10 Å RMSD of the site [Camacho and Vajda `01]. Computation of escape time from funnels of attraction around potential binding sites
Computing Escape Time with Roadmap. For a node i in the funnel of attraction with neighbors j, k, l, m: $\tau_i = 1 + P_{ii}\tau_i + P_{ij}\tau_j + P_{ik}\tau_k + P_{il}\tau_l + P_{im}\tau_m$, with $\tau = 0$ for a node outside the funnel (escape time is measured as the number of steps of stochastic simulation)
Results on lactate dehydrogenase [Figure: active site with GLN-101, ARG-106, ASP-195, HIS-193, ASP-166, ARG-169, THR, NADH, the loop, and the substrate] [Table columns: Mutant, Escape Time, Change. Row: Wildtype, escape time …E6, change N/A]
Results on lactate dehydrogenase [Figure: active site with GLN-101, ALA-106, ASP-195, ALA-193, ASP-166, ARG-169, NADH, the loop, and the substrate] [Table columns: Mutant, Escape Time, Change. Rows: Wildtype (change N/A); His193→Ala-Arg106→Ala. Escape times: …E…, …E6]
Results on lactate dehydrogenase [Figure: active site with GLN-101, ARG-106, ASP-195, HIS-193, ASP-166, ARG-169, GLY-245, NADH, the loop, and the substrate] [Table columns: Mutant, Escape Time, Change. Mutants: Wildtype, His193→Ala-Arg106→Ala, His193→Ala, Arg106→Ala, Asp195→Asn, Gln101→Arg, Thr245→Gly. Surviving entries: escape times 4.607E… and …E6; changes N/A and No change]
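The escape-time recurrence can be sketched on a toy roadmap in the same way as first-step analysis. The example assumes a five-node chain whose three interior nodes form the funnel, and uses a simple Gauss-Seidel iteration in place of a sparse linear solver. The chain, its probabilities, and the function name are hypothetical.

```python
def solve_escape_time(P, funnel, tol=1e-12, max_iter=100000):
    """Solve tau_i = 1 + sum_j P_ij tau_j inside the funnel, tau = 0 outside.

    P maps each node to a dict {neighbor: probability}; rows sum to 1.
    The result is the expected number of steps to leave the funnel.
    """
    tau = {i: 0.0 for i in P}
    for _ in range(max_iter):
        delta = 0.0
        for i in sorted(funnel):
            p_self = P[i].get(i, 0.0)
            # Move the P_ii tau_i term to the left-hand side and divide.
            rest = sum(p * tau[j] for j, p in P[i].items() if j != i)
            new = (1.0 + rest) / (1.0 - p_self)
            delta = max(delta, abs(new - tau[i]))
            tau[i] = new
        if delta < tol:
            break
    return tau

# Toy chain: stay with probability 0.5, move to either side with 0.25.
# Nodes 0 and 4 lie outside the funnel.
P = {
    0: {0: 1.0},
    1: {0: 0.25, 1: 0.5, 2: 0.25},
    2: {1: 0.25, 2: 0.5, 3: 0.25},
    3: {2: 0.25, 3: 0.5, 4: 0.25},
    4: {4: 1.0},
}
tau = solve_escape_time(P, funnel={1, 2, 3})
# For this chain: tau[1] = 6, tau[2] = 8, tau[3] = 6 steps.
```

The values can be verified by substitution, for example tau[2] = 1 + 0.5 * 8 + 0.25 * 6 + 0.25 * 6 = 8.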
A non-uniform sampling strategy: sampling local minima and saddles of the landscape [Henkelman, Jonsson ’99]
Adding critical points to the roadmap yields the same quality of p_fold values with fewer nodes [Plot axes: number of nodes, L1 distance]
Using p_fold to make quantitative predictions. Connecting theory with experiment: –Rates –Φ values. Transition state computation using: –Energy barriers considering monotonic pathways –p_fold considering all pathways [Garbuzynskiy, Finkelstein, Galzitskaya `04] Collaborators: TH Chiang, D. Hsu (N.U. Singapore) [Fersht `99]
Φ value results using p_fold are better for 3 (out of 5) proteins [Table columns: Protein; Correlation to experiment in [Garbuzynskiy et al., `04]; Correlation to experiment with p_fold. Proteins: B1 IgG-binding domain of protein G, Src SH3 domain, SH3 domain of α-spectrin, Sso7d, CI. Correlation values are missing]
Computing rates with p_fold results in better correlation with experiment [Plots: experimental rate and computed rate, log(k_f) vs. protein #; panels: [Garbuzynskiy et al., `04] and using p_fold; correlations: 0.67 and (value missing)]
Contributions: New computational framework for studying molecular motion –Transition probabilities –Correspondence to Monte Carlo –First-step analysis –Extension to non-uniform sampling. Computation of ensemble properties: –Protein folding: p_fold parameter, comparison with Monte Carlo, quantitative predictions of experimental values –Ligand-protein binding: escape time, qualitative predictions about the role of amino acids in the active site of a protein, application to distinguish the catalytic site from a set of potential binding sites
Future work: Non-uniform sampling on high-dimensional examples. Computing and reducing the error in the computed parameters. Estimating the number of nodes needed. Exploring larger systems and pushing the experiment
SRS code available! Visit:
Acknowledgements My advisors: Prof. Latombe, Prof. Brutlag Prof. Van Roy Prof. McCluskey My committee: Prof. Motwani, Prof. Vuckovic Coauthors: D. Hsu, C. Guestrin, S. Kasif, A. Singh, C. Varma Collaborators: TH Chiang, J. Greenberg, S. Ieong, F. Schwarzer, R. Singh, A. Tellez Faculty: Prof. Altman, Prof. Baldwin, Prof. Guibas, Prof. Pande Prof. Kavraki (Rice) Prof. Zell (Tuebingen) Prof. Snoeyink (UNC) Funding: David L. Cheriton Stanford Graduate Fellowship NSF Biogeometry grant Stanford’s Bio-X program Resources: Bio-X SGI Supercomputer, Bio-X PC computer cluster Colleagues: N. Batada, A. Ben-Hur, S. Bennett, E. Boas, T. Bretl, J. Brown, F. Buron, L. Chong, A. Collins, S. Elmer, P. Fong, A. Garg, S. Gokturk, H. Gonzales-Banos, K. Hauser, G. Henkelman, P. Isto, G. Jayachandran, J. Kuffner, S. Larson, M. Liang, B. Naughton, X. Liu, I. Lotan, H. Mandyam, N. Mitra, S. Mitra, A. Nguyen, YM Rhee, D. Russel, M. Saha, G. Sanchez-Ante, S. Saxonov, S. Schmidler, J. Shapiro, J. Shin, P. Shirvani, M. Shirts, C. Snow, C. Yu, B. Zagrovic, A. Zomorodian Staff: I. Contreras, P. Cook, J. Engelson, K. Hedjasi, J. McCormick, H. Nguyen, N. Riewerts, D. Shankle Friends and family
Thank you!