1 Protein Folding Atlas F. Cook IV & Karen Tran
2 Overview What is Protein Folding? Motivation Experimental Difficulties Simulation Models: Configuration Spaces Triangular Lattice models Pull Moves Probabilistic Roadmaps Map-Based Master Equation (MME) Map-Based Monte Carlo (MMC) Conclusion
3 Motivation What is protein folding? Folding/Morphing process 1D Amino Acid Chain 3D Folded protein
4 Motivation Why study protein folding? Proteins regulate almost all cellular functions 1D chain dictates 3D shape (NP-Hard) 3D Shape determines protein’s function 1D amino acid chain 3D folded protein
5 Motivation Holy grail of Protein Folding Build amino acid chain that: folds into a desired shape and has a nice function (e.g., kill cancer cells) How would we do this? Kill Cancer Cells Desired Function Required Shape Required Amino Acid Chain
6 Motivation Another reason to study protein folding: Unfolded protein = vulnerable protein
7 Motivation Misfolded proteins cause diseases: Alzheimer’s Mad Cow Parkinson’s Understand protein folding cure diseases!
8 Terminology Primary Structure 1D Amino Acid Chain (string) MGDVEKGKKIFIMKCSQCH Secondary Structure Local patterns in a global folding Helices and Strands Tertiary Structure Global 3D folded shape
9 Experimental Difficulties Levinthal Paradox Exponentially many ways to fold, yet folding occurs rapidly (milliseconds to seconds) Why is folding so fast? Unfolded protein = vulnerable protein Experimental observation Too slow to capture all significant motions Our Goal: Simulate protein folding on computer!
10 Simulation Models HP Lattice Model: [Böckenhauer08] HP = Hydrophobic-Polar Models forces between Hydrophobic amino acids
11 Simulation Models HP Lattice Model: [Böckenhauer08] Amino acid vertex in a grid Protein self-avoiding chain in a grid Amino AcidChain
12 Simulation Models HP Lattice Model: [Böckenhauer08] Spring-like forces are modeled between neighboring amino acids. Sum of forces for a state Energy Energy = Energy =
13 Simulation Models HP Lattice Model: [Böckenhauer08] Global min energy “native state” = final folded state Native state is stable. Global minimum is MUCH smaller than local minima Global min Energy =
14 Simulation Models HP Lattice Model: [Böckenhauer08] A state is defined by the position of every amino acid in the chain A StateAnother State
15 Simulation Models HP Lattice Model: [Böckenhauer08] Configuration space = set of all possible states Exponential to protein length Protein folding simulation: “Move” from start state goal state.
16 Simulation Models HP Lattice Model: [Böckenhauer08] Move Properties: Complete – moves can reach all feasible states Reversible – every move has an inverse
17 Simulation Models HP Lattice Model: [Böckenhauer08] Forward Pull Move Pull vertex 5 to a new position Before moveAfter move
18 Simulation Models HP Lattice Model: [Böckenhauer08] Tabu Search Greedy, heuristic search Simulates protein folding Pull moves transform start state local minimum Records recent moves in a Tabu list Fast backtracking to different paths Summary of HP Lattice Model: Input: Amino acid sequence Output: Heuristically folded protein
19 Probabilistic Roadmap Model
20 Simulation Models Probabilistic Roadmap [Song04] 2D Graph (Configuration space): Each point represents an entire state (all amino acids). Obstacles are infeasible states
21 Simulation Models Probabilistic Roadmap [Song04] Goal: Given start & goal states Find “best path” from start goal
22 Simulation Models Probabilistic Roadmap [Song04] 3 Steps: 1.Node generation: Generate points randomly (dense near the goal state) 2.Roadmap Construction Connect nearest neighbors graph 3.Query roadmap Dijkstra’s algorithm shortest path Shortest path = set of states Describes the dynamic folding process
23 Simulation Models Probabilistic Roadmap [Song04] 1.Node generation: Generate random points “Obstacles” are infeasible (self-overlapping) states
24 Simulation Models Probabilistic Roadmap [Song04] 2.Roadmap Construction Connect nearest neighbors graph
25 Simulation Models Probabilistic Roadmap [Song04] 3.Query roadmap Dijkstra’s algorithm shortest path Path = set of states that describes the folding process
26 Molecular Dynamics Model
27 Simulation Models Molecular Dynamics Models [Tapia07] Model forces based on Newton’s laws of motion Very accurate Very slow! Simulating one microsecond of folding for a 36 residue protein = Months of supercomputer time! Cannot handle full length proteins
28 Simulation Models Map-based Master Equation (MME) [Tapia07] Fast enough to study full length proteins More accurate than simplistic lattice models MME is an extension of a Probabilistic Roadmap Probabilistic roadmap ≈ Viterbi algorithm returns one optimal path MME ≈ Baum-Welch algorithm Maintains transition probabilities for every state Learning is executed until probabilities stabilize. Can return the probability of any state at time t.
29 Simulation Models Map-based Monte-Carlo (MMC) [Tapia07] MMC = Probabilistic Roadmap + Monte-Carlo Monte-Carlo [Wiki08_MC] random sampling + algorithms = result Example: Battleship Make random shots Apply prior knowledge Battleship = 4 vertical/horizontal dots Apply algorithms to quickly sink the ship
30 Simulation Models Map-based Monte-Carlo (MMC) [Tapia07] Fast & reasonably accurate Models the protein as an articulated figure Each joint = set of angles Movement-based (kinetic) statistics Results suggest that: Local helix structures form first Folding occurs around hydrophobic core
31 Conclusion Protein Folding: 1D Amino acid chain folds into 3D structure Misfolding Alzheimer’s, Parkinson’s, Mad Cow diseases Folding is too fast to observe experimentally Four Simulation Models: 1.Triangular Lattice model (2D Graph) Vertex = one amino acid “Moves” transition between states
32 Conclusion Four Simulation Models (cont.) 2.Probabilistic Roadmaps Vertex represents state of entire protein Random sampling + Dijkstra’s alg Best folding route ≈ Viterbi (returns one path) 3.Map-Based Master Equation (MME) Learn probabilities ≈ Baum-Welch (confidence level for each state) 4.Map-Based Monte Carlo (MMC) Articulated figures with joints model proteins
33 References: [Böckenhauer08] Hans-Joachim Böckenhauer, Abu Zafer M. Dayem Ullah, Leonidas Kapsokalivas, and Kathleen Steinhöfel. A local move set for protein folding in triangular lattice models. In Keith A. Crandall and Jens Lagergren, editors, WABI, volume 5251 of Lecture Notes in Computer Science, pages 369–381. Springer, [Dobson99] C. Dobson and M. Karplus. The fundamentals of protein folding: bringing together theory and experiment. Current Opinion in Structural Biology, 9:928–101, 1999.
34 References: [Song04] G. Song and N. M. Amato. A motion planning approach to folding: From paper craft to protein folding. Proc. IEEE Transactions on Robotics and Automatics, 20:60–71, [Tapia07] Lydia Tapia, Xinyu Tang, Shawna Thomas, and Nancy M. Amato. Kinetics analysis methods for approximate folding landscapes. Bioinformatics, 23(13):i539–i548, 2007.
35 References: [˘Sali94] A., E. Shakhnovich, and M. Karplus. How does a protein fold? Nature, 369:248–251, [Wiki08] Wikipedia. Protein folding — Wikipedia, the free encyclopedia, [Wiki08_MC] Wikipedia. Monte-Carlo method — Wikipedia, the free encyclopedia,
36 Thank you for your attention. Questions
37 Extra Slides
38 Simulation Models Map-based Master Equation (MME) [Tapia07] MME = Probabilistic roadmap + Master Equation Master Equation – set of equations defining the probability of a system to be in a discrete set of states at a given time.