Protein Design CS273: Final Project Charles Kou Crystal structure of top7 – A novel protein structure created with RosettaDesign.



What is Protein Design
• The opposite of structure prediction: determine a low-energy sequence that yields a given structure.
• Computationally difficult:
  – The search space has size $20^n$, where $n$ is the sequence length (20 amino acids per position).
• Major algorithms: dead-end elimination, genetic algorithms, Monte Carlo, branch and bound.
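To give a sense of scale for the $20^n$ search space, the short sketch below counts the candidate sequences for a few chain lengths. The function name is illustrative and does not come from the slides.

```python
def sequence_space_size(n_residues: int, n_amino_acids: int = 20) -> int:
    """Number of distinct sequences of the given length (one choice per position)."""
    return n_amino_acids ** n_residues


if __name__ == "__main__":
    for n in (5, 10, 100):
        size = sequence_space_size(n)
        # Print the digit count as well, since the raw number quickly gets unwieldy.
        print(f"n = {n:3d}: {len(str(size))} digits")
```

Even a 100-residue protein has more candidate sequences than could ever be enumerated, which is why the algorithms on the following slides either sample the space or prune it.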

Major Algorithms
• Trade-off between thoroughness and computational speed.
• Monte Carlo / genetic algorithm:
  – Can sample a space with an infinite number of solutions.
  – Side-chain identity, side-chain orientation, and backbone structure can be varied continuously.
  – No guarantee of reaching the global energy minimum.
• Dead-end elimination:
  – Allows only discrete conformations.
  – Rejection criteria are used to prune the search space.
Desjarlais JR, Clarke ND. Computer search algorithms in protein modification and design. Curr Opin Struct Biol Aug;8(4).

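Review: Energy Landscape
• Energy is defined over a high-dimensional conformation space with coordinates $q_1, q_2, \ldots, q_i, q_j, \ldots, q_{N-1}, q_N$.
[Figure: energy landscape over the conformation coordinates]
JC Latombe, Energy2.ppt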

Review: Example Energy Function
$E = \sum \text{bonded terms} + \sum \text{non-bonded terms} + \sum \text{solvation terms}$
$E = (E_S + E_Q + E_{S\text{-}B} + E_{Tor}) + (E_{vdW} + E_{dipole})$
• Bonded terms: relatively few.
• Non-bonded terms: depend on distances between pairs of atoms, so there are $O(n^2)$ of them and they are expensive to compute.
• Solvation terms: may require computing the molecular surface.
JC Latombe, Energy2.ppt
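The quadratic cost of the non-bonded terms can be seen in a minimal sketch that loops over every pair of atoms. The Lennard-Jones 12-6 form and the reduced units (`epsilon = sigma = 1`) are assumptions chosen for illustration; the slides do not say which functional form is used for $E_{vdW}$.

```python
import itertools
import math


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Assumed 12-6 pair potential; stands in for the vdW term on the slide."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def nonbonded_energy(coords):
    """Sum the pair potential over all n*(n-1)/2 atom pairs, hence O(n^2)."""
    total = 0.0
    for a, b in itertools.combinations(coords, 2):
        total += lennard_jones(math.dist(a, b))
    return total


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(nonbonded_energy(atoms))
```

Practical codes usually cut this cost with distance cut-offs or neighbour lists, which is outside the scope of this sketch.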

Review: Monte Carlo Simulation (MCS)
• Random walk through conformation space.
• At each cycle:
  – Perturb the current conformation at random.
  – Accept the step with probability $P = \min\left(1, e^{-\Delta E / kT}\right)$ (Metropolis acceptance criterion).
• The conformations generated by an arbitrarily long MCS are Boltzmann distributed, i.e., #conformations in $V \sim \int_V e^{-E(q)/kT}\,dq$.
JC Latombe, Energy2.ppt
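A minimal sketch of the Metropolis loop follows: downhill moves are always accepted, uphill moves are accepted with probability $e^{-\Delta E / kT}$. The names `energy`, `perturb`, and `monte_carlo` are placeholders for illustration, and the toy 1-D energy in the usage example is an assumption rather than a protein energy function.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=random):
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def monte_carlo(energy, perturb, start, n_steps, kT, seed=0):
    """Random walk with Metropolis acceptance; returns the best state seen."""
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    for _ in range(n_steps):
        candidate = perturb(current, rng)
        e_candidate = energy(candidate)
        if metropolis_accept(e_candidate - e_current, kT, rng):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


if __name__ == "__main__":
    # Toy problem: a single coordinate with a quadratic well at x = 2.
    best, e_best = monte_carlo(
        energy=lambda x: (x - 2.0) ** 2,
        perturb=lambda x, rng: x + rng.uniform(-0.5, 0.5),
        start=10.0,
        n_steps=5000,
        kT=0.1,
    )
    print(best, e_best)
```

The energy is re-evaluated at every step, which is where most of the run time goes for a realistic energy function.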

Monte Carlo Simulation
• Tends to waste time in local minima.
• May consist of millions of steps.
• Energy must be evaluated frequently (computationally expensive).
• Use ChainTree to improve performance.
Koehl, P. and Levitt, M. De novo protein design. I. In search of stability and specificity. Journal of Molecular Biology, 293 (1999).
Lotan, I., Schwarzer, F., Halperin, D., Latombe, J.C. Efficient maintenance and self-collision testing for kinematic chains. In: Symposium on Computational Geometry (2002) 43–52.

Genetic Algorithm
• Starts with a first-generation pool.
• Iteratively applies genetic operators (selection, recombination, mutation).
• Evolves toward a better solution (a lower value of the energy function).
S. M. Larson, J. England, J. DesJarlais, and V. S. Pande. Thoroughly sampling sequence space: large-scale protein design of structural ensembles. Protein Science (2002).

Selection
The selection function takes into account the value of the fitness function. This gives priority to “fit” organisms but still gives “less fit” organisms a chance.

Selection Method
• Roulette method: the probability of selection is proportional to the value of the fitness function.
• Tournament: pick k individuals at random (k is the tournament size) and choose the best of them with probability p, the second best with probability p*(1-p), the third best with probability p*(1-p)^2, and so on. A higher k leaves less chance for weaker individuals.
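Both selection schemes can be sketched in a few lines. The code assumes that fitness values are non-negative and that higher is better (for an energy-based design one might use something like a Boltzmann weight of the energy); the function names and the toy population are illustrative only.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    weights = [fitness(ind) for ind in population]
    return rng.choices(population, weights=weights, k=1)[0]


def tournament_select(population, fitness, k, p, rng):
    """Draw k individuals, then take the best with probability p,
    the second best with p*(1-p), and so on down the ranking."""
    contestants = sorted(rng.sample(population, k), key=fitness, reverse=True)
    for individual in contestants[:-1]:
        if rng.random() < p:
            return individual
    # Nobody was chosen on the way down, so the weakest contestant wins.
    return contestants[-1]


if __name__ == "__main__":
    rng = random.Random(0)
    population = [1.0, 2.0, 3.0, 4.0]
    print(roulette_select(population, lambda x: x, rng))
    print(tournament_select(population, lambda x: x, k=3, p=0.8, rng=rng))
```

With `p = 1.0` the tournament is deterministic and always returns the fittest of the k contestants, which is the common textbook variant.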

Recombination, Mutation
• Recombination: different segments of the structure, optimized in parallel, can be recombined into the same model. Recombination occurs with a set probability; otherwise, the population is propagated unchanged to the next generation.
• Mutation: helps avoid local minima by mutating the child with a set probability.
• As with MC, there is no guarantee of convergence to the global minimum.
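The two operators might look as follows when individuals are plain amino-acid strings. One-point crossover on the sequence is a simplifying assumption; the slide describes recombining separately optimized structural segments, which this sketch only approximates. All names here are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def recombine(parent_a, parent_b, p_crossover, rng):
    """With probability p_crossover, splice the head of one parent onto the
    tail of the other; otherwise pass parent_a through unchanged."""
    if len(parent_a) > 1 and rng.random() < p_crossover:
        cut = rng.randrange(1, len(parent_a))
        return parent_a[:cut] + parent_b[cut:]
    return parent_a


def mutate(sequence, p_mutation, rng):
    """Replace each position by a random amino acid with probability p_mutation."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < p_mutation else aa
        for aa in sequence
    )


if __name__ == "__main__":
    rng = random.Random(1)
    child = recombine("AAAAAAAA", "CCCCCCCC", p_crossover=0.9, rng=rng)
    print(mutate(child, p_mutation=0.1, rng=rng))
```

The mutation rate controls how strongly the search is pushed out of local minima: too low and the population stagnates, too high and good solutions are destroyed.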

Uses distributed computing and a genetic algorithm. It also incorporates backbone flexibility using Monte Carlo (random perturbations with RMSD < 1.0 Å), which improves the results.

Dead-end Elimination
• Discrete conformational search.
• Functionally equivalent to exhaustive search.
• Uses rejection criteria to prune the search space.
• Robustness depends on the discretization and on the rejection criteria used.
• Guaranteed convergence to the global minimum of the discrete problem.
• Initially used for side-chain placement. More difficult for protein design because of the high number of degrees of freedom.
Looger LL, Hellinga HW. Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: implications for protein design and structural genomics. J Mol Biol Mar 16;307(1).

Energy of conformation
• Reformulation of the side-chain placement problem: amino acid identity is used instead of rotamer.
• The general DEE allows up to 300 residues.
• The energy of a conformation is defined as the sum of the interactions among side chains plus the sum of the interactions between each side chain and the backbone.
• The rejection criteria are applied iteratively until no more rotamers can be eliminated. Either the search converges, or the problem is reduced enough for exhaustive search.

DEE filter: Rejection Criteria
• Simple criterion: if the lowest-energy structure that can be found using a given side-chain rotamer (a low-energy side-chain conformation) is higher in energy than the highest-energy structure with a different rotamer, the first rotamer is eliminated.
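The simple criterion can be sketched under the usual pairwise-decomposable energy assumption: rotamer r at position i is dead-ending if its best case, $E(i_r) + \sum_{j \ne i} \min_s E(i_r, j_s)$, is still worse than the worst case of some competitor t, $E(i_t) + \sum_{j \ne i} \max_s E(i_t, j_s)$. The data layout (`self_e[i][r]` for side-chain/backbone energies, `pair_e[(i, j)][r][s]` for side-chain/side-chain energies with i < j) and the toy numbers are assumptions made for this example.

```python
def pair_energy(pair_e, i, r, j, s):
    """Look up a pairwise energy stored once per unordered position pair."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def simple_dee_eliminates(self_e, pair_e, i, r, t):
    """True if rotamer r at position i can be eliminated in favour of rotamer t."""
    others = [j for j in range(len(self_e)) if j != i]
    # Best case for r: pick the most favourable partner rotamer at every other position.
    best_r = self_e[i][r] + sum(
        min(pair_energy(pair_e, i, r, j, s) for s in range(len(self_e[j])))
        for j in others
    )
    # Worst case for t: pick the least favourable partner rotamer at every other position.
    worst_t = self_e[i][t] + sum(
        max(pair_energy(pair_e, i, t, j, s) for s in range(len(self_e[j])))
        for j in others
    )
    return best_r > worst_t


if __name__ == "__main__":
    # Two positions with two rotamers each (toy values).
    self_e = [[0.0, 5.0], [0.0, 0.0]]
    pair_e = {(0, 1): [[1.0, 2.0], [0.0, 1.0]]}
    print(simple_dee_eliminates(self_e, pair_e, i=0, r=1, t=0))
```

In a full implementation this test would be applied to every (position, rotamer, competitor) triple and repeated until no further rotamer is removed.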

DEE filter: Rejection Criteria
• Goldstein criterion: if the energy of any structure containing one rotamer is always lowered by changing to a second rotamer, the first one is eliminated.
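The Goldstein test compares the two rotamers in the same environment rather than best case against worst case: r is eliminated in favour of t when $E(i_r) - E(i_t) + \sum_{j \ne i} \min_s \left[E(i_r, j_s) - E(i_t, j_s)\right] > 0$. The sketch below assumes a pairwise-decomposable energy; the data layout (`self_e[i][r]`, `pair_e[(i, j)][r][s]` with i < j) and the toy numbers are hypothetical.

```python
def pair_energy(pair_e, i, r, j, s):
    """Look up a pairwise energy stored once per unordered position pair."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def goldstein_eliminates(self_e, pair_e, i, r, t):
    """True if switching rotamer r -> t at position i lowers the energy
    no matter which rotamers the other positions adopt."""
    gap = self_e[i][r] - self_e[i][t]
    for j in range(len(self_e)):
        if j == i:
            continue
        # Smallest possible advantage of t over r given position j's choice.
        gap += min(
            pair_energy(pair_e, i, r, j, s) - pair_energy(pair_e, i, t, j, s)
            for s in range(len(self_e[j]))
        )
    return gap > 0


if __name__ == "__main__":
    self_e = [[0.0, 1.0], [0.0, 0.0]]
    pair_e = {(0, 1): [[0.0, 10.0], [0.5, 10.5]]}
    print(goldstein_eliminates(self_e, pair_e, i=0, r=1, t=0))
```

In this toy case the best-case versus worst-case comparison of the simple criterion would not remove rotamer 1 (1.5 against 10.0), while the Goldstein test does, which illustrates why it prunes more aggressively.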

DEE filter: Rejection Criteria
• Generalized criterion: residues are added in groups. Clusters of rotamers in a group that have already been eliminated may be excluded from the minimum operator, in addition to those that form dead-end clusters with c.

Mean Field Theory
• Reduces the search space.
• Self-consistency is sought by placing amino acids at pre-selected positions in a given structure.
• The energy function is minimized in the mean-field approximation.
Voigt CA, Mayo SL, Arnold FH, Wang ZG. Computational method to reduce the search space for directed protein evolution. Proc Natl Acad Sci U S A Mar 27;98(7).
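A generic self-consistent mean-field iteration is sketched below as one way to read the slide; it is not claimed to be the exact procedure of the cited paper. Each position keeps a probability over its states, each state feels the probability-weighted average interaction with all other positions, and the probabilities are updated as Boltzmann weights of that field until they stop changing. The data layout, the synchronous update, and the fixed iteration count are assumptions.

```python
import math


def pair_energy(pair_e, i, r, j, s):
    """Look up a pairwise energy stored once per unordered position pair."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def mean_field(self_e, pair_e, kT=1.0, n_iter=50):
    """Return per-position state probabilities after n_iter mean-field updates."""
    n = len(self_e)
    probs = [[1.0 / len(e)] * len(e) for e in self_e]  # start from uniform
    for _ in range(n_iter):
        new_probs = []
        for i in range(n):
            fields = []
            for r in range(len(self_e[i])):
                field = self_e[i][r]
                for j in range(n):
                    if j != i:
                        field += sum(
                            probs[j][s] * pair_energy(pair_e, i, r, j, s)
                            for s in range(len(self_e[j]))
                        )
                fields.append(field)
            lowest = min(fields)  # shift for numerical stability
            weights = [math.exp(-(f - lowest) / kT) for f in fields]
            z = sum(weights)
            new_probs.append([w / z for w in weights])
        probs = new_probs
    return probs


if __name__ == "__main__":
    self_e = [[0.0, 5.0], [0.0, 0.0]]
    pair_e = {(0, 1): [[1.0, 2.0], [0.0, 1.0]]}
    print(mean_field(self_e, pair_e, kT=0.5))
```

States whose converged probability is negligible can then be dropped, which is one way such a calculation reduces the search space.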

Review: Branch & Bound
• The set of solutions can be partitioned into subsets (branch).
• An upper limit on a subset’s solution can be computed fast (bound).
• Branch & Bound:
  1. Select the subset with the best possible bound.
  2. Subdivide it, and compute a bound for each subset.
S. Batzoglou, Threading2.ppt
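The two steps can be sketched as a best-first search over partial rotamer assignments. Because the goal here is to minimize energy, the bound is a lower bound on the energy of any completion (the mirror image of the upper limit used when maximizing a score). The pairwise energy layout, the particular bound, and the toy numbers are assumptions made for this example.

```python
import heapq
import itertools


def pair_energy(pair_e, i, r, j, s):
    """Look up a pairwise energy stored once per unordered position pair."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def total_energy(self_e, pair_e, assignment):
    """Energy of the first len(assignment) positions with the given rotamers."""
    e = sum(self_e[i][r] for i, r in enumerate(assignment))
    for i, j in itertools.combinations(range(len(assignment)), 2):
        e += pair_energy(pair_e, i, assignment[i], j, assignment[j])
    return e


def lower_bound(self_e, pair_e, partial):
    """Energy of the fixed part plus an optimistic estimate for each open position."""
    n, k = len(self_e), len(partial)
    bound = total_energy(self_e, pair_e, partial)
    for j in range(k, n):
        bound += min(
            self_e[j][s]
            + sum(pair_energy(pair_e, i, partial[i], j, s) for i in range(k))
            + sum(
                min(pair_energy(pair_e, j, s, m, u) for u in range(len(self_e[m])))
                for m in range(j + 1, n)
            )
            for s in range(len(self_e[j]))
        )
    return bound


def branch_and_bound(self_e, pair_e):
    """Best-first search: always expand the subset with the best (lowest) bound."""
    heap = [(lower_bound(self_e, pair_e, ()), ())]
    while heap:
        bound, partial = heapq.heappop(heap)
        if len(partial) == len(self_e):
            return partial, bound  # complete assignment with the lowest bound
        j = len(partial)
        for s in range(len(self_e[j])):  # branch on the next position
            child = partial + (s,)
            heapq.heappush(heap, (lower_bound(self_e, pair_e, child), child))
    return None


if __name__ == "__main__":
    self_e = [[0.0, 1.0], [2.0, 0.0], [0.5, 0.0]]
    pair_e = {
        (0, 1): [[0.0, 3.0], [1.0, 0.0]],
        (0, 2): [[1.0, 0.0], [0.0, 2.0]],
        (1, 2): [[0.0, 1.0], [2.0, 0.0]],
    }
    print(branch_and_bound(self_e, pair_e))
```

Since the bound of a complete assignment equals its true energy, the first complete assignment popped from the queue is optimal for this energy model.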

Rosetta Design
• The initial backbone is designed without regard to side-chain packing.
• Iterates between sequence design and backbone optimization using Monte Carlo.
• Each perturbation is either a random change in the torsional angles of 1-5 random residues, or a substitution of the backbone torsional angles of 1-3 consecutive residues with torsional angles from a structure in the PDB.
• Side-chain optimization.
• Accept/reject using the Metropolis criterion.
• Backbone-atom RMSD between model and structure.
Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D. Design of a novel globular protein fold with atomic-level accuracy. Science Nov 21;302(5649).
Figure: crystal structure of top7, a novel protein structure created with RosettaDesign.

Using Rosetta Design
• Red: PDB 1A1M, MHC class I molecule B*5301 complexed with peptide TPYDINQML from Gag protein of HIV2.
• Blue: Rosetta Stone Designed.
• Visualized with Deep View / Swiss-PdbViewer.

b.e.a.n.s.
A simple OpenGL-based program was developed to test Monte Carlo and genetic algorithms on designing a “chain of jelly beans.” The user can vary the initial structure of the “beans” and compare the efficiency of the algorithms via a built-in timer.