Modeling molecular dynamics from simulations Nina Singhal Hinrichs Departments of Computer Science and Statistics University of Chicago January 28, 2009.



Motivation
Proteins are essential parts of living organisms
– enzymes, cell signaling, membrane transport...
Composed of a chain of amino acids
Fold to a unique 3-dimensional structure
Misfolding can cause diseases
– Alzheimer’s, Mad cow, Huntington’s...
How do proteins fold?

Molecular dynamics
Represent atoms of molecule and solvent
Model forces on atoms
Integrate laws of motion
Small integration time step compared to motion timescales
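A minimal sketch of the "integrate laws of motion" step, assuming the velocity Verlet scheme (the slide does not name an integrator) and a toy one-dimensional harmonic oscillator in place of a molecular force field. The names `velocity_verlet` and `harmonic_force` are illustrative only.

```python
import numpy as np

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion with the velocity Verlet scheme."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new positions
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
    return x, v

# Toy system (assumption): harmonic oscillator with k = m = 1, period 2*pi.
k = 1.0
harmonic_force = lambda x: -k * x
x0, v0 = np.array([1.0]), np.array([0.0])
dt = 0.01                                  # small compared to the period
n_steps = int(round(2 * np.pi / dt))       # roughly one full period
x1, v1 = velocity_verlet(x0, v0, harmonic_force, 1.0, dt, n_steps)

energy0 = 0.5 * v0**2 + 0.5 * k * x0**2
energy1 = 0.5 * v1**2 + 0.5 * k * x1**2
```

The time step has to stay well below the fastest motion in the system, which is why reaching folding timescales takes so many integration steps.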

Distributed computing for biomolecular simulation
Perform multiple simulations in parallel
Total simulation times – hundreds of microseconds (hundreds of CPU-years)
Very powerful computational resource
– ~200 teraflops sustained performance
– >1,000,000 total CPUs; 200,000 active

Challenge: How to analyze?
Enormous datasets
– Describe dynamics in microscopic detail
Questions we want to answer
– Rate of folding, mechanism of folding...
How can we extract these properties from our data?

Outline
Markovian state model for molecular motion
– Model description, uses, examples
New algorithms for building these models
– Defining states and transition probabilities
New methods for dealing with finite sampling
– Model complexity, uncertainty analysis, targeted sampling

Chemical intuition
Chemical reactions often exhibit stochastic behavior
Example: n-butane (Chandler, Journal of Chemical Physics (1977))

Markovian state model
Define states in the conformation space
Define transition probabilities, or edges, between states

Uses of the model
Populations of states over time
Eigenvalues and eigenvectors – conformational changes
Kinetic properties – virtually any kinetic property
Mechanistic properties – most likely path and probability of transitions, computed with graph algorithms
Chodera et al., Multiscale Modeling and Simulation (2006)
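A short sketch of two of these uses, assuming a hypothetical 3-state row-stochastic transition matrix `T` and an arbitrary lag time. The matrix values are invented for illustration and do not come from the talk. Populations evolve by repeated multiplication with `T`, and the eigenvalues below 1 give relaxation (implied) timescales.

```python
import numpy as np

# Hypothetical 3-state transition matrix for one lag time (rows sum to 1).
T = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.10, 0.90]])
lag_time = 1.0  # arbitrary time unit (assumption)

def propagate(p0, T, n_steps):
    """Return state populations at steps 0..n_steps using p(t+1) = p(t) T."""
    history = [np.asarray(p0, dtype=float)]
    for _ in range(n_steps):
        history.append(history[-1] @ T)
    return np.array(history)

# Start with all population in state 0 and watch it relax.
populations = propagate([1.0, 0.0, 0.0], T, 200)

# Left eigenvectors of T are right eigenvectors of T transposed.
eigvals, eigvecs = np.linalg.eig(T.T)
order = np.argsort(-eigvals.real)
eigvals, eigvecs = eigvals.real[order], eigvecs.real[:, order]

# The eigenvalue-1 eigenvector is the equilibrium distribution.
stationary = eigvecs[:, 0] / eigvecs[:, 0].sum()
# Each remaining eigenvalue corresponds to a relaxation timescale.
implied_timescales = -lag_time / np.log(eigvals[1:])
```

The eigenvectors paired with the slow eigenvalues indicate which states exchange population in the corresponding conformational change.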

Example models
alanine peptide – Chodera et al., Multiscale Modeling and Simulation (2006)
lipid vesicle fusion – Kasson et al., PNAS (2006)
alpha helix – Sorin and Pande, Biophysical Journal (2005)
villin headpiece – Jayachandran et al., Journal of Structural Biology (2006)

Computational and statistical challenges
Building Markovian state model
– Defining states that are Markovian
– Calculating the transition probabilities
Refining Markovian state model
– Finding the best model
– Determining model uncertainty
– Designing new simulations

Automatic state decomposition
Challenge: Find appropriate states
Using individual conformations as states does not scale
Group conformations into discrete states
Structural clustering is insufficient
Basic algorithm – combine structural and kinetic similarity
J. D. Chodera*, N. Singhal*, V. S. Pande, K. A. Dill, and W. C. Swope. Automatic discovery of metastable states for the construction of Markov models of macromolecular conformational dynamics. Journal of Chemical Physics, 126, (2007). (*These authors contributed equally to this work)

Comparison of structural and kinetic clustering
[Figure panels: structural clustering; kinetic clustering. System: trpzip2, Cochran et al. PNAS 98:5578, 2001.]

State decomposition – splitting
Cluster conformations by root mean square distance (RMSD)
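As an illustration of the splitting step, here is a sketch that clusters synthetic conformations by RMSD. It makes two loud assumptions: greedy k-centers stands in for whatever clustering method the authors actually used, and the RMSD is computed without first superimposing the structures. The names `rmsd` and `k_centers` are hypothetical.

```python
import numpy as np

def rmsd(a, b):
    """RMSD between two (n_atoms, 3) conformations, without superposition."""
    return np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1)))

def k_centers(conformations, n_states, seed_index=0):
    """Greedy k-centers: repeatedly promote the point farthest from all centers."""
    centers = [seed_index]
    dist = np.array([rmsd(c, conformations[seed_index]) for c in conformations])
    labels = np.zeros(len(conformations), dtype=int)
    while len(centers) < n_states:
        new_center = int(np.argmax(dist))
        centers.append(new_center)
        d_new = np.array([rmsd(c, conformations[new_center]) for c in conformations])
        closer = d_new < dist              # points that prefer the new center
        labels[closer] = len(centers) - 1
        dist = np.minimum(dist, d_new)
    return labels, centers

# Synthetic data (assumption): two well-separated groups of 4-atom "conformations".
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, size=(20, 4, 3))
group_b = rng.normal(5.0, 0.1, size=(20, 4, 3))
conformations = np.concatenate([group_a, group_b])
labels, centers = k_centers(conformations, n_states=2)
```

Structural clusters like these are only the starting point; the lumping and resplitting steps then bring in kinetic information.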

State decomposition – lumping
Group states that interconvert quickly

State decomposition – resplitting
Cluster conformations, restricted to each state

Blocked alanine peptide
Chodera et al., Multiscale Modeling and Simulation (2006)

Automatic state decomposition of alanine peptide
Black state sits on top of multiple other states!
Benefit of automatic algorithm
These conformations had an unusual peptide bond

Stability of decomposition

TrpZip peptide

Transition probabilities
Discretize trajectories into series of states (e.g. 1 2 2 3 4 3 5)
Count number of transitions between all pairs of states
Normalize the transition counts to obtain transition probabilities
N. Singhal, C. D. Snow, and V. S. Pande. Using path sampling to build better Markovian state models: Predicting the folding rate and mechanism of a trp zipper beta hairpin. Journal of Chemical Physics, 121(1), (2004).
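The count-and-normalize procedure can be sketched as follows. The example reads the slide's digit string as the state sequence 1 2 2 3 4 3 5, which is an interpretation, and the function names and the `lag` parameter are illustrative rather than taken from the talk.

```python
import numpy as np

def count_transitions(trajectory, n_states, lag=1):
    """Count observed transitions i -> j separated by `lag` steps."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(trajectory[:-lag], trajectory[lag:]):
        counts[i, j] += 1
    return counts

def row_normalize(counts):
    """Convert counts to transition probabilities, one row per starting state."""
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions are left as zeros rather than guessed.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# State sequence 1 2 2 3 4 3 5, shifted to 0-based indices.
trajectory = [s - 1 for s in [1, 2, 2, 3, 4, 3, 5]]
counts = count_transitions(trajectory, n_states=5)
T = row_normalize(counts)
```

With a single short trajectory most rows rest on one or two counts, which is the finite-sampling problem the later slides address.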

Model selection
Challenge: How many states should we have?
– More states are more Markovian
– More states have more parameters
How do we evaluate this tradeoff?
N. S. Hinrichs and V. S. Pande. Bayesian metrics for validating and improving Markovian state models for molecular dynamics simulations. (In preparation)

Hidden Markov Model formulation
Formulate the problem as a Hidden Markov Model structure scoring question
Different discretizations of continuous space
Benefits of Bayesian scores
– Naturally handles tradeoff between complexity of model and amount of data
– Avoids over-fitting of parameters
[Figure labels: States; Observations]

Alanine peptide results
Score of Hidden Markov models for different lag times
Last model is worse at shorter times but preferred at longer times
No previous evaluation methods could distinguish these models

Uncertainty analysis
Goal: Once we have the states, what is the uncertainty in the model?
Uncertainty caused by finite sampling
Both are reasonable but give different transition probabilities
– Different MFPT, P_fold, eigenvalues, eigenvectors...
N. Singhal and V. S. Pande. Error analysis and efficient sampling in Markovian state models for protein folding. Journal of Chemical Physics, 123, (2005).
N. S. Hinrichs and V. S. Pande. Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics. Journal of Chemical Physics, 126, (2007).

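Transition probabilities
Recall that we calculate transition probabilities by counting: $p_{ij} = n_{ij} / \sum_k n_{ik}$, where $n_{ij}$ is the number of observed transitions from state $i$ to state $j$
Instead of getting a single value, we can talk about the distribution of transition probabilities
Bayes’ Rule: $P(p_{i*} \mid n_{i*}) \propto P(n_{i*} \mid p_{i*})\, P(p_{i*})$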
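One common way to make this distribution concrete, offered here as an assumption since the slide's formulas did not survive extraction, is to place a Dirichlet prior with parameters $\alpha_{ij}$ on each row $p_{i*}$. The transition counts $n_{ij}$ are multinomial, so the posterior is again Dirichlet and its moments are available in closed form:

```latex
% Multinomial likelihood of the counts out of state i, times a Dirichlet prior (assumed)
P(n_{i*} \mid p_{i*}) \propto \prod_j p_{ij}^{\,n_{ij}},
\qquad
P(p_{i*}) \propto \prod_j p_{ij}^{\,\alpha_{ij} - 1}

% Posterior: Dirichlet with updated parameters u_{ij}
p_{i*} \mid n_{i*} \sim \mathrm{Dirichlet}(u_{i1}, \dots, u_{iK}),
\qquad u_{ij} = \alpha_{ij} + n_{ij}

% Posterior mean and variance, with w_i = \sum_j u_{ij}
\mathbb{E}[p_{ij}] = \frac{u_{ij}}{w_i},
\qquad
\mathrm{Var}[p_{ij}] = \frac{u_{ij}\,(w_i - u_{ij})}{w_i^{2}\,(w_i + 1)}
```

The variance shrinks roughly as $1/w_i$, so rows with few observed transitions carry most of the uncertainty.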

Sampling approach
Possible solution to get distribution of eigenvalues: repeatedly sample a transition matrix [p_ij] and solve for its eigenvalue
Problem:
– sampling can be expensive
– solving per sample can be expensive
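A sketch of this sampling approach, under the assumption that each row of the transition matrix is drawn from a Dirichlet posterior with a uniform prior. The counts are hypothetical, and `sample_eigenvalue_distribution` is an illustrative name.

```python
import numpy as np

def sample_eigenvalue_distribution(counts, n_samples=1000, prior=1.0, which=1, seed=0):
    """Sample transition matrices and collect one eigenvalue from each."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    samples = np.empty(n_samples)
    for s in range(n_samples):
        # Draw each row of the transition matrix from its Dirichlet posterior.
        T = np.vstack([rng.dirichlet(row + prior) for row in counts])
        # Sampled matrices need not be reversible, so keep only the real parts.
        eigvals = np.sort(np.linalg.eigvals(T).real)[::-1]
        samples[s] = eigvals[which]        # which=1: second-largest eigenvalue
    return samples

# Hypothetical transition counts for a 3-state model.
counts = np.array([[90, 10, 0], [5, 90, 5], [0, 10, 90]])
samples = sample_eigenvalue_distribution(counts)
print(samples.mean(), samples.std())
```

Every sample needs a full eigenvalue solve, so the cost grows with both the number of samples and the number of states.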

Closed-form solution
Idea: trade exact distribution for efficient approximation
Eigenvalue equation (formula not preserved in this transcript)
Taylor series expansion: efficient to calculate using adjoint systems
Multivariate normal approximation of p_i*
Closed-form normal distribution for the eigenvalue
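The following is a sketch of one way such a closed-form estimate can be computed, not the authors' exact derivation. It assumes Dirichlet posteriors on the rows, a first-order Taylor expansion of the eigenvalue about the posterior mean matrix, and real, distinct eigenvalues. The sensitivity uses the left eigenvector, obtained here from the transposed (adjoint) problem.

```python
import numpy as np

def eigenvalue_variance(counts, prior=1.0, which=1):
    """First-order estimate of an eigenvalue's mean and variance."""
    alpha = np.asarray(counts, dtype=float) + prior   # Dirichlet posterior parameters
    w = alpha.sum(axis=1)                             # total weight per state
    P = alpha / w[:, None]                            # posterior mean transition matrix

    eigvals, right = np.linalg.eig(P)
    order = np.argsort(-eigvals.real)
    lam = eigvals.real[order][which]
    x = right.real[:, order[which]]                   # right eigenvector
    # Left eigenvector from the transposed (adjoint) eigenproblem.
    eigvals_l, left = np.linalg.eig(P.T)
    y = left.real[:, np.argmin(np.abs(eigvals_l - lam))]

    # Sensitivity d(lambda)/d(p_ij) = y_i x_j / (y . x).
    S = np.outer(y, x) / (y @ x)

    # Per-state term s_i^T Cov(p_i*) s_i, with the Dirichlet covariance
    # Cov(p_i*) = (diag(p_i) - p_i p_i^T) / (w_i + 1).
    q = np.einsum("ij,ij->i", S**2, P) - np.einsum("ij,ij->i", S, P) ** 2
    contributions = q / (w + 1.0)
    return lam, contributions.sum(), contributions

# Hypothetical transition counts for a 3-state model.
counts = np.array([[90, 10, 0], [5, 90, 5], [0, 10, 90]])
lam, var, contributions = eigenvalue_variance(counts)
```

Only one eigen-decomposition is needed, and the variance comes out as a sum of per-state terms, which is what makes the later sampling decisions cheap to evaluate.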

Uncertainty results
5000 trajectories from each state
[Figure labels: Alanine System; Transition Counts]
Running times (6 states)
– Sampling-based: 40 seconds
– Closed-form: < 0.01 seconds
Running times (87 states)
– Sampling-based: 3600 seconds
– Closed-form: < 0.07 seconds

Sampling strategies
Problem: Simulations are expensive. Even with distributed computing, we run simulations for months
How to intelligently allocate our resources?
Common approaches:
– equilibrium sampling – sample each conformation from its equilibrium distribution
– even sampling – sample equally from each state
New sequential approaches
N. Singhal and V. S. Pande. Error analysis and efficient sampling in Markovian state models for protein folding. Journal of Chemical Physics, 123, (2005).
N. S. Hinrichs and V. S. Pande. Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics. Journal of Chemical Physics, 126, (2007).

Adaptive sampling
Goal: Reduce uncertainty of eigenvalue
Uncertainty analysis decomposes by transitions from each state
Variance depends on both the uncertainty of the transition probabilities and the eigenvalue's sensitivity to them
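A sketch of how this decomposition could drive the choice of the next simulation. It assumes each state's variance contribution has the form q_i / (w_i + 1), where q_i collects the sensitivity-weighted terms and w_i is the sample weight already collected from state i, and that q_i stays roughly fixed when a few samples are added. The values below are hypothetical, and `choose_state_to_sample` is an illustrative name.

```python
import numpy as np

def choose_state_to_sample(q, w, n_new=1):
    """Pick the starting state whose extra samples most reduce the variance.

    q[i] / (w[i] + 1) is state i's current contribution to the eigenvalue
    variance; adding n_new samples from state i changes it to
    q[i] / (w[i] + n_new + 1), assuming q[i] stays roughly constant.
    """
    q, w = np.asarray(q, dtype=float), np.asarray(w, dtype=float)
    reduction = q / (w + 1.0) - q / (w + n_new + 1.0)
    return int(np.argmax(reduction)), reduction

# Hypothetical values: states 0 and 2 are equally sensitive, but state 2
# has far fewer samples; state 1 is poorly sampled but barely matters.
q = [0.03, 0.001, 0.03]
w = [1000, 50, 100]
state, reduction = choose_state_to_sample(q, w)
```

In this toy case the sensitive but under-sampled state 2 is selected, rather than the least-sampled state 1.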

Adaptive sampling – alanine
On 6-state alanine system, select trajectories randomly for 3 sampling strategies
[Figure label: Transition Counts]

Adaptive sampling – villin
Villin headpiece, 2454 states (Jayachandran, et al., Journal of Chemical Physics (2006))
Benefits
– Very quickly reduce the variance
– Reduce the total number of simulations
– Need less computational power
– Can study more complex systems

Summary
Markovian state models are convenient methods to describe molecular motion
Automatic state decomposition
– Scalable to large size systems
Model selection
– Evaluate tradeoff between model complexity and amount of data
Uncertainty analysis
– Efficient and decomposable
Adaptive sampling
– Reduce number of simulations

Acknowledgements
Vijay Pande – Stanford University adviser
Bill Swope, Jed Pitera – IBM collaborators
John Chodera – state decomposition work