Stochastic Roadmap Simulation: An efficient representation and algorithm for analyzing molecular motion. Mehmet Serkan Apaydın, May 27th, 2004.


1 Stochastic Roadmap Simulation: An efficient representation and algorithm for analyzing molecular motion. Mehmet Serkan Apaydın, May 27th, 2004

2 Molecular motion is an essential process of life. Protein (mis)folding: Bovine Spongiform Encephalopathy (BSE) (http://www.usd.edu/eric/). Ligand-protein binding: drug molecules act by binding to proteins (http://www.the-scientist.com). [Images: Stanford Bio-X cluster; an NMR spectrometer (CS273)]

3 Computing p_fold, the best order parameter in protein folding, is expensive using classical simulation techniques. “We stress that we do not suggest using p_fold as a transition coordinate for practical purposes as it is very computationally intensive.” Du, Pande, Grosberg, Tanaka, and Shakhnovich, “On the Transition Coordinate for Protein Folding,” Journal of Chemical Physics (1998). [Figure: HIV integrase conformation between the unfolded set and the folded set, with probability p_fold of reaching the folded set first and 1 - p_fold of reaching the unfolded set first; Du et al. '98]

4 Stochastic Roadmap Simulation (SRS): Develop efficient computational representations and algorithms to study molecular motion pathways for protein folding and ligand-protein binding.

5 Contributions
New computational framework for studying molecular motion
– Transition probabilities (P_ij)
– Correspondence to Monte Carlo
– First-step analysis
– Extension to non-uniform sampling
Computation of ensemble properties:
– Protein folding: p_fold parameter
  · Comparison with Monte Carlo
  · Quantitative predictions of experimental values
– Ligand-protein binding: escape time
  · Qualitative predictions about the role of amino acids in the active site of a protein
  · Application to distinguish the catalytic site from a set of potential binding sites

6 Outline Background Stochastic Roadmap Simulation Applications –Protein folding –Ligand-protein binding Extension of basic framework Quantitative prediction of experimental results on protein folding

7 Proteins and their structure: Macromolecule. Building block of life.

8 Ligand-Protein Binding

9 Simulating molecular motion Monte Carlo (MC) or Molecular Dynamics http://folding.stanford.edu

10 Molecular Representations: Atomistic model. Linkage model: – Internal parameter representation (bond angles, lengths, torsional angles) – Each secondary structure element as a vector [Lotan '04]

11 Analogy with Robotics [Figure: articulated linkage with coordinate axes X0, Y0, X1, X2, X3 and three joint angles]

12 Molecular Energetics: $E = E_{S} + E_{\theta} + E_{S\text{-}B} + E_{Tor} + E_{vdW} + E_{dipole}$, where the first four are bonded terms and the last two are non-bonded terms. Energy models: force fields, Gō models, Hydrophobic-Polar models (CS273)

13 MC simulation

14

15 Problems with Monte Carlo Simulation: • Each run generates a single pathway • Much time is wasted in local minima

16 A path planning technique: Probabilistic Roadmaps (PRM) [Kavraki et al. '96]. [Figure: configuration space with C-obstacles; preprocessing builds the roadmap nodes and edges; a query connects Qinit and Qgoal through the roadmap]

17 Application of PRM to molecular motion: Study of ligand-protein binding. Probabilistic roadmaps with edges weighted by energetic plausibility. Search for the minimum weight paths. [Singh, Latombe, Brutlag, '99]

18 Application of PRM to molecular motion: Study of ligand-protein binding. Probabilistic roadmaps with edges weighted by energetic plausibility. Search for the minimum weight paths. Extensions to protein folding. [Singh, Latombe, Brutlag, '99] [Song and Amato, '01] [Apaydın et al., '01]

19 How many pathways are there in a roadmap? Number of Self-Avoiding Walks on a 2D Grid (http://mathworld.wolfram.com/Self-AvoidingWalk.html)
n/m | 2  | 3   | 4    | 5     | 6
2   | 2  |
3   | 4  | 12  |
4   | 8  | 38  | 184  |
5   | 16 | 125 | 976  | 8512  |
6   | 32 | 414 | 5382 | 79384 | 1262816
Square grids (n = m = 1, 2, 3, ...): 1, 2, 12, 184, 8512, 1262816, 575780564, 789360053252, 3266598486981642, (10x10) 41044208702632496804, (11x11) 1568758030464750013214100, (12x12) 182413291514248049241470885236
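The small entries of this table can be reproduced by brute force, which also shows why enumerating pathways one at a time does not scale. The sketch below assumes the table counts self-avoiding paths between opposite corners of a grid of nodes; that reading is consistent with the listed values but is not stated on the slide, and `count_corner_paths` is a hypothetical name.

```python
def count_corner_paths(rows, cols):
    """Count self-avoiding paths between opposite corners of a rows x cols grid of nodes."""
    target = (rows - 1, cols - 1)
    visited = [[False] * cols for _ in range(rows)]

    def walk(r, c):
        # Reaching the far corner completes exactly one path.
        if (r, c) == target:
            return 1
        visited[r][c] = True
        total = 0
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc]:
                total += walk(nr, nc)
        visited[r][c] = False  # backtrack so other paths may reuse this node
        return total

    return walk(0, 0)


# Square grids of increasing size: the count grows very quickly.
square_counts = [count_corner_paths(n, n) for n in range(2, 5)]
```

Exhaustive enumeration like this is only practical for very small grids, which is the motivation for treating all pathways at once rather than one by one.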

20 Outline Background Stochastic Roadmap Simulation Applications –Protein folding –Ligand-protein binding Future work

21 New Idea: Stochastic Conformational Roadmaps. Capture the stochastic nature of molecular motion by assigning probabilities to edges: the edge from node v_i to node v_j carries transition probability P_ij. [Apaydın et al., RECOMB '02, WAFR '02] Collaborators: C. Guestrin, D. Hsu

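22 Edge probabilities: The edge from v_i to v_j carries transition probability P_ij, and node v_i carries self-transition probability P_ii. The probabilities follow the Metropolis criterion and correspond to the probabilities in Monte Carlo simulation. [Equation image not preserved]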
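As an illustration of Metropolis-style edge probabilities, here is a minimal sketch and not the formula from the slide (which is not preserved in the transcript). It assumes each neighbor is proposed with the same probability 1/d_max, where d_max is the largest node degree in the roadmap, and that uphill moves are accepted with probability exp(-ΔE/kT). The function name, the `kT` parameter, and the toy energies are all hypothetical.

```python
import numpy as np


def metropolis_transition_matrix(energies, edges, kT=1.0):
    """Build a row-stochastic matrix P for a roadmap with undirected edges."""
    energies = np.asarray(energies, dtype=float)
    n = len(energies)
    neighbors = [set() for _ in range(n)]
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    # Assumption: a uniform proposal probability shared by every node.
    max_degree = max(len(nb) for nb in neighbors)
    P = np.zeros((n, n))
    for i in range(n):
        for j in neighbors[i]:
            delta = energies[j] - energies[i]
            accept = min(1.0, np.exp(-delta / kT))  # Metropolis acceptance
            P[i, j] = accept / max_degree
        # Whatever probability is left over stays at node i (self-transition P_ii).
        P[i, i] = 1.0 - P[i].sum()
    return P


# Toy roadmap: three nodes on a line with made-up energies.
P = metropolis_transition_matrix([0.0, 1.0, 0.5], [(0, 1), (1, 2)], kT=1.0)
```

Downhill moves get the full proposal probability, uphill moves are damped exponentially, and the self-transition absorbs the rejected proposals so that every row sums to one.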

23 Relationship to MC simulation: Each path on the graph = a path of MC simulation. The roadmap represents many MC simulation paths simultaneously. Stochastic Roadmap Simulation and Monte Carlo simulation converge to the same distribution (the Boltzmann distribution).
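One way to see why a Metropolis-type chain converges to the Boltzmann distribution is a detailed-balance check. The derivation below is a sketch under an assumption not stated on the slide: every edge is proposed with the same probability q in both directions. The symbols π_i (Boltzmann weight of node i), Z (normalizing constant), and q are introduced here for illustration.

```latex
\begin{align*}
\pi_i &= \frac{1}{Z}\, e^{-E_i / k_B T}, \qquad
P_{ij} = q \,\min\!\left(1,\; e^{-(E_j - E_i)/k_B T}\right) \quad (i \neq j),\\
\text{for } E_j > E_i:\quad
\pi_i P_{ij} &= \frac{q}{Z}\, e^{-E_i/k_B T}\, e^{-(E_j - E_i)/k_B T}
             = \frac{q}{Z}\, e^{-E_j/k_B T} = \pi_j P_{ji}.
\end{align*}
```

The last equality uses P_ji = q, since the move from j to i is downhill. Summing π_i P_ij = π_j P_ji over i gives Σ_i π_i P_ij = π_j, so π is stationary for the chain.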

24 Using SRS to compute ensemble properties: Treat the roadmap as a Markov chain with transition probabilities P_ij and use First-Step Analysis.

25 Application of SRS to protein folding: Probability of Folding p_fold. “We stress that we do not suggest using p_fold as a transition coordinate for practical purposes as it is very computationally intensive.” Du, Pande, Grosberg, Tanaka, and Shakhnovich, “On the Transition Coordinate for Protein Folding,” Journal of Chemical Physics (1998). [Figure: HIV integrase conformation between the unfolded set and the folded set, with probability p_fold of reaching the folded set first and 1 - p_fold of reaching the unfolded set first; Du et al. '98]

26 First-Step Analysis. F: folded set; U: unfolded set. Let f_i = p_fold(i), and let node i have neighbors j, k, l, m. After one step: $f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$, where f = 1 for any node in F (the slide marks one such neighbor with "= 1") and f = 0 for any node in U. • One linear equation per node • Solution gives p_fold for all nodes • No explicit simulation run • All pathways are taken into account • Sparse linear system
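The first-step equations can be solved directly as a linear system. The sketch below is a minimal illustration, not the SRS code: it uses a dense NumPy solve for brevity (a sparse solver would be the natural choice for a large roadmap), and the function name `pfold` and the five-node test chain are hypothetical.

```python
import numpy as np


def pfold(P, folded, unfolded):
    """Solve f_i = sum_j P_ij f_j with f = 1 on the folded set and f = 0 on the unfolded set."""
    n = P.shape[0]
    f = np.zeros(n)
    f[sorted(folded)] = 1.0
    boundary = set(folded) | set(unfolded)
    interior = [i for i in range(n) if i not in boundary]
    # Move the unknowns to the left-hand side: (I - P_II) f_I = P_IF * 1.
    A = np.eye(len(interior)) - P[np.ix_(interior, interior)]
    b = P[np.ix_(interior, sorted(folded))].sum(axis=1)
    f[interior] = np.linalg.solve(A, b)
    return f


# Toy example: symmetric random walk on a chain 0-1-2-3-4,
# with node 0 as the unfolded set and node 4 as the folded set.
n = 5
P = np.zeros((n, n))
for i in range(1, n - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[0, 0] = P[n - 1, n - 1] = 1.0
f = pfold(P, folded={4}, unfolded={0})
```

For this symmetric chain the solution rises linearly from 0 at the unfolded end to 1 at the folded end, and a single solve yields the value at every node at once.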

27 In Contrast … Computing p_fold with MC simulation requires: • Performing many MC simulation runs • Counting the number of times F is attained first, for every conformation of interest
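For comparison, a simulation-based estimate of p_fold for a single starting conformation might be sketched as follows. This is an illustrative toy, assuming a symmetric random walk on a chain of integer states as the stand-in for an MC move; `pfold_monte_carlo`, `chain_step`, the seed, and the run count are all hypothetical choices.

```python
import random


def pfold_monte_carlo(step, start, folded, unfolded, runs, rng):
    """Estimate p_fold(start) as the fraction of runs that reach the folded set first."""
    hits = 0
    for _ in range(runs):
        state = start
        # Each run follows one pathway until it lands in F or U.
        while state not in folded and state not in unfolded:
            state = step(state, rng)
        if state in folded:
            hits += 1
    return hits / runs


def chain_step(state, rng):
    """Toy MC move: step left or right with equal probability."""
    return state + rng.choice((-1, 1))


rng = random.Random(0)
estimate = pfold_monte_carlo(chain_step, start=2, folded={4}, unfolded={0},
                             runs=20000, rng=rng)
```

Each starting conformation needs its own batch of runs, and the estimate carries sampling noise that shrinks only as the square root of the number of runs.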

28 Comparison: SRS vs. MC (on synthetic landscape) [Plot: L1 distance vs. number of nodes]

29 Computational tests on two real proteins: 1ROP (repressor of primer), 2 α helices, 6 DOF; 1HDD (Engrailed homeodomain), 3 α helices, 12 DOF. H-P energy model with steric clash exclusion [Sun et al., '95]

30 Differences in p_fold values obtained by SRS and MC for 1ROP and 1HDD [Plot: L1 distance vs. number of nodes]

31 p_fold on a real protein: β hairpin. Immunoglobulin binding protein (Protein G), last 16 amino acids. C-α based representation. Gō model based energy. 42 DOFs. [Zhou and Karplus, '99]

32 Comparison between SRS and MC for β hairpin [Plot: L1 distance vs. number of nodes]

33 Computation Times (β hairpin). Monte Carlo (30 simulations): 1 conformation, ~10 hours of computer time, over 10^7 energy computations. Roadmap: 2000 conformations, 23 seconds of computer time, ~50,000 energy computations. ~6 orders of magnitude speedup!
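The quoted speedup follows from comparing time per conformation. A short worked calculation, taking "~10 hours" as exactly 10 h for the arithmetic (the symbols t_MC and t_SRS are introduced here for illustration):

```latex
\begin{align*}
t_{\mathrm{MC}} &\approx 10\ \mathrm{h} = 3.6\times 10^{4}\ \mathrm{s\ per\ conformation},\\
t_{\mathrm{SRS}} &= \frac{23\ \mathrm{s}}{2000\ \text{conformations}} \approx 1.15\times 10^{-2}\ \mathrm{s\ per\ conformation},\\
\frac{t_{\mathrm{MC}}}{t_{\mathrm{SRS}}} &\approx \frac{3.6\times 10^{4}}{1.15\times 10^{-2}} \approx 3\times 10^{6}.
\end{align*}
```

A ratio of about 3 × 10^6 per conformation is what the slide rounds to roughly six orders of magnitude.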

34 Outline Background Stochastic Roadmap Simulation Applications –Protein folding –Ligand-protein binding Extension of basic framework Quantitative prediction of experimental results on protein folding

35 Application of SRS to Ligand-Protein Interactions: • Distinguishing the catalytic site: among several potential binding sites, which one is the catalytic site? • Studying the effect of catalytic amino acids upon binding/unbinding. [Apaydın et al., ECCB '02] Collaborators: C. Guestrin, C. Varma

36 Funnels of attraction and escape time from a funnel. Funnel = energy gradient around a site that guides the ligand to that site; defined as all ligand conformations within 10 Å RMSD of the site [Camacho and Vajda '01]. Computation of escape time from funnels of attraction around potential binding sites. [Figure: potential binding sites]

37 Computing Escape Time with Roadmap. For a node i inside the funnel of attraction with neighbors j, k, l, m: $\tau_i = 1 + P_{ii}\tau_i + P_{ij}\tau_j + P_{ik}\tau_k + P_{il}\tau_l + P_{im}\tau_m$, with τ = 0 for any node outside the funnel (escape time is measured as the number of steps of stochastic simulation).
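The escape-time equations form a linear system of the same shape as the p_fold system, with a constant 1 on the right-hand side. The sketch below is a minimal illustration using a dense NumPy solve; the function name `escape_times` and the five-node chain are hypothetical, and the result is in units of simulation steps.

```python
import numpy as np


def escape_times(P, funnel):
    """Solve tau_i = 1 + sum_j P_ij tau_j inside the funnel, with tau = 0 outside it."""
    funnel = sorted(funnel)
    # Nodes outside the funnel contribute tau = 0, so only the funnel block remains.
    A = np.eye(len(funnel)) - P[np.ix_(funnel, funnel)]
    tau = np.zeros(P.shape[0])
    tau[funnel] = np.linalg.solve(A, np.ones(len(funnel)))
    return tau


# Toy example: symmetric random walk on a chain 0-1-2-3-4,
# where nodes 1..3 form the funnel and nodes 0 and 4 lie outside it.
n = 5
P = np.zeros((n, n))
for i in range(1, n - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[0, 0] = P[n - 1, n - 1] = 1.0
tau = escape_times(P, funnel={1, 2, 3})
```

In this toy chain the middle node takes longest to escape, as expected, since it is farthest from both exits.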

38 Results on lactate dehydrogenase
[Figure: active site with GLN-101, ARG-106, ASP-195, HIS-193, ASP-166, ARG-169, THR-245, NADH, the loop, and the bound ligand atoms (C, C, O, O, O, CH3)]
Mutant   | Change | Escape Time
Wildtype | N/A    | 3.216E6

39 Results on lactate dehydrogenase
[Figure: active site with GLN-101, ALA-106, ASP-195, ALA-193, ASP-166, ARG-169, NADH, the loop, and the bound ligand atoms (C, C, O, O, O, CH3)]
Mutant                  | Change | Escape Time
Wildtype                | N/A    | 3.216E6
His193→Ala, Arg106→Ala  | ↓      | 4.126E2

40 Results on lactate dehydrogenase
[Figure: active site with GLN-101, ARG-106, ASP-195, HIS-193, ASP-166, ARG-169, GLY-245, NADH, the loop, and the bound ligand atoms (C, C, O, O, O, CH3)]
Mutant                  | Change    | Escape Time
Wildtype                | N/A       | 3.216E6
His193→Ala, Arg106→Ala  | ↓         | 4.126E2
His193→Ala              | ↓         | 3.381E3
Arg106→Ala              | ↓         | 2.550E2
Asp195→Asn              | ↑         | 5.221E7
Gln101→Arg              | No change | 1.669E6
Thr245→Gly              | ↓         | 4.607E5

41 Outline Background Stochastic Roadmap Simulation Applications –Protein folding –Ligand-protein binding Extension of basic framework Quantitative prediction of experimental results on protein folding

42 A non-uniform sampling strategy: sampling local minima and saddles of the landscape [Henkelman, Jonsson '99]

43 Adding critical points to the roadmap achieves the same quality of p_fold values with fewer nodes [Plot: L1 distance vs. number of nodes]

44 Outline Background Stochastic Roadmap Simulation Applications –Protein folding –Ligand-protein binding Extension of basic framework Quantitative prediction of experimental results on protein folding

45 Using p_fold to make quantitative predictions. Connecting theory with experiment: – Rates – Φ values. Transition state computation using: – Energy barriers, considering monotonic pathways – p_fold, considering all pathways. [Garbuzynskiy, Finkelstein, Galzitskaya '04] Collaborators: TH Chiang, D. Hsu (N.U. Singapore) [Fersht '99]

46 Φ value results using p_fold are better for 3 (out of 5) proteins
Protein                            | Correlation to experiment in [Garbuzynskiy et al., '04] | Correlation to experiment with p_fold
B1 IgG-binding domain of protein G | 0.74 | 0.78
Src SH3 domain                     | 0.63 | 0.65
SH3 domain of α-spectrin           | 0.81 | 0.78
Sso7d                              | 0.58 | 0.28
CI2                                | 0.35 | 0.51

47 Computing rates with p_fold results in better correlation with experiment. Correlation: 0.67 for [Garbuzynskiy et al., '04]; 0.83 using p_fold. [Plots: experimental rate and computed rate, log(k_f) vs. protein #]

48 Contributions
New computational framework for studying molecular motion
– Transition probabilities (P_ij)
– Correspondence to Monte Carlo
– First-step analysis
– Extension to non-uniform sampling
Computation of ensemble properties:
– Protein folding: p_fold parameter
  · Comparison with Monte Carlo
  · Quantitative predictions of experimental values
– Ligand-protein binding: escape time
  · Qualitative predictions about the role of amino acids in the active site of a protein
  · Application to distinguish the catalytic site from a set of potential binding sites

49 Future work: • Non-uniform sampling on high-dimensional examples • Computing and reducing the error in the computed parameters • Estimating the number of nodes needed • Exploring larger systems and pushing the experiment [Figure: linkage with parameters q1 to q5]

50 SRS code available! Visit: http://robotics.stanford.edu/~apaydin/software.html

51 Acknowledgements
• My advisors: Prof. Latombe, Prof. Brutlag; Prof. Van Roy; Prof. McCluskey
• My committee: Prof. Motwani, Prof. Vuckovic
• Coauthors: D. Hsu, C. Guestrin, S. Kasif, A. Singh, C. Varma
• Collaborators: TH Chiang, J. Greenberg, S. Ieong, F. Schwarzer, R. Singh, A. Tellez
• Faculty: Prof. Altman, Prof. Baldwin, Prof. Guibas, Prof. Pande; Prof. Kavraki (Rice); Prof. Zell (Tuebingen); Prof. Snoeyink (UNC)
• Funding: David L. Cheriton Stanford Graduate Fellowship; NSF Biogeometry grant; Stanford’s Bio-X program
• Resources: Bio-X SGI Supercomputer, Bio-X PC computer cluster
• Colleagues: N. Batada, A. Ben-Hur, S. Bennett, E. Boas, T. Bretl, J. Brown, F. Buron, L. Chong, A. Collins, S. Elmer, P. Fong, A. Garg, S. Gokturk, H. Gonzales-Banos, K. Hauser, G. Henkelman, P. Isto, G. Jayachandran, J. Kuffner, S. Larson, M. Liang, B. Naughton, X. Liu, I. Lotan, H. Mandyam, N. Mitra, S. Mitra, A. Nguyen, YM Rhee, D. Russel, M. Saha, G. Sanchez-Ante, S. Saxonov, S. Schmidler, J. Shapiro, J. Shin, P. Shirvani, M. Shirts, C. Snow, C. Yu, B. Zagrovic, A. Zomorodian
• Staff: I. Contreras, P. Cook, J. Engelson, K. Hedjasi, J. McCormick, H. Nguyen, N. Riewerts, D. Shankle
• Friends and family

52 Thank you!

