Protein Folding in the 2D HP Model Alexandros Skaliotis – King’s College London Joint work with: Andreas Albrecht (University of Hertfordshire) Kathleen Steinhöfel (King’s College London)


Protein Folding in the 2D HP Model Alexandros Skaliotis – King’s College London Joint work with: Andreas Albrecht (University of Hertfordshire) Kathleen Steinhöfel (King’s College London)

Overview
1. Proteins
2. Protein Folding
3. 2D HP Model
4. Simple Example
5. Local Search for Protein Folding
6. Set of Moves
7. Logarithmic Cooling Schedule
8. Selected Benchmarks
9. Experiment

1. Proteins
A protein is a sequence of amino acids encoded by a gene in a genome. There are 20 different amino acids. The length of the sequence can range from about 20 to […]. The function of a protein is determined by its three-dimensional structure. Predicting this structure is daunting and very expensive.

2. Protein Folding
Protein folding is the process by which a sequence of amino acids takes on its three-dimensional shape. Anfinsen’s hypothesis suggests that proteins fold to a minimum energy state, so our goal is to find a conformation with minimum energy. We want to investigate algorithmic aspects of simulating the folding process, and to do that we need to simplify it.

3.1 2D HP Model [Dill et al. 1985]
1. Classify each amino acid as hydrophobic (H) or hydrophilic (P).
2. Confine consecutive amino acids to adjacent nodes in a lattice (treat the search space as a grid).
3. Flatten the search onto a 2D lattice.
Function HH_c: number of new HH contacts
Parameter ξ < 0: influence ratio of the new HH contacts (usually ξ = -1)
Objective function = ξ · HH_c = -HH_c (for ξ = -1)
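The objective function above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a square lattice and reads "new HH contacts" as pairs of H residues that sit on adjacent lattice nodes without being consecutive in the chain. The name `hp_energy` and the coordinate-list representation are hypothetical.

```python
def hp_energy(sequence, coords, xi=-1):
    """Energy of a 2D HP conformation: xi times the number of HH contacts.

    sequence: string over {"H", "P"}
    coords:   list of (x, y) lattice positions, one per residue, in chain order
    xi:       weight of each HH contact (assumed -1, as on the slide)
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    # Consecutive residues must occupy adjacent lattice nodes.
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues are not lattice neighbours")

    index_at = {pos: i for i, pos in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each adjacent pair is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return xi * contacts


# A four-residue chain folded into a unit square brings its two ends together.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))
```

Covalent neighbours along the chain are excluded from the count because they are adjacent in every conformation and so carry no information about the fold.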

3.2 2D HP Model [Dill et al. 1985]
Protein folding in the 2D HP model is NP-hard for a variety of lattice structures [Paterson/Przytycka 1996; Hart/Istrail 1997; Berger/Leighton 1998; Atkins/Hart 1999]. Constant-factor approximations exist that run in linear time, but they are not helpful for predicting real protein sequences [Hart/Istrail 1997]. Exact methods work only for sequences of up to double-digit length.
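To see why exact methods stop scaling, consider the most naive one: enumerate every self-avoiding walk on the square lattice and keep the best energy. The sketch below is an assumption-laden illustration (the function names are hypothetical, ξ is fixed to -1, and no symmetry pruning or branch-and-bound is applied); it is not one of the exact methods the slide refers to.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def count_hh_contacts(sequence, coords):
    """Count H-H pairs on adjacent nodes that are not chain neighbours."""
    index_at = {pos: i for i, pos in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for neighbour in ((x + 1, y), (x, y + 1)):  # each pair counted once
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts


def min_energy_exhaustive(sequence):
    """Return (best_energy, best_coords) over all self-avoiding walks."""
    if not sequence:
        return 0, []
    best_energy, best_coords = 0, None

    def extend(coords, occupied):
        nonlocal best_energy, best_coords
        if len(coords) == len(sequence):
            energy = -count_hh_contacts(sequence, coords)
            if best_coords is None or energy < best_energy:
                best_energy, best_coords = energy, list(coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                coords.append(nxt)
                occupied.add(nxt)
                extend(coords, occupied)
                coords.pop()
                occupied.remove(nxt)

    extend([(0, 0)], {(0, 0)})
    return best_energy, best_coords


energy, fold = min_energy_exhaustive("HPPH")
print(energy, fold)
```

The number of self-avoiding walks grows exponentially with chain length, so this approach is only practical for very short sequences, which is the motivation for the heuristic search that follows.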

4. Simple Example
Normally the energy is a positive number, but we have a minimisation problem, so we talk about negative energies.
[Figure: example conformations with Energy = 0 and Energy = -3; H shown in red, P in pink]

5. Local Search for Protein Folding
A wide range of heuristics has been applied to find optimal HP structures, especially evolutionary algorithms. Lesh et al. (2003) and Blazewicz et al. (2005) applied tabu search to the problem. We apply logarithmic simulated annealing. To move in the search space we employ a complete and reversible set of moves proposed by Lesh et al. in 2003 and Blazewicz et al. in 2005.

6. Set of Moves
[Figure: lattice diagrams of the moves; nodes labelled L and C]

7. Logarithmic Cooling Schedule
Following Hajek’s theorem (1988), we are guaranteed to find the optimal solution after an infinite number of steps if and only if […]. […] is the maximum value of the minimal escape heights from local minima. Albrecht et al. show that after […] transitions, the probability of being in a minimum energy conformation is at least […], where n is the maximum size of the neighbourhood of sequences.
Cooling function: […]
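As a rough illustration of annealing under a logarithmic schedule, the sketch below assumes the commonly used form c(k) = Γ / ln(k + 2) with Metropolis acceptance. The exact cooling function shown on the slide is not preserved in this transcript, so this form, the parameter name `gamma`, and all function names here are assumptions rather than the authors' implementation.

```python
import math
import random


def log_temperature(k, gamma):
    """Temperature at step k >= 0 under the assumed schedule gamma / ln(k + 2)."""
    return gamma / math.log(k + 2)


def acceptance_probability(delta_e, temperature):
    """Metropolis rule: always accept improvements, sometimes accept worsenings."""
    if delta_e <= 0:
        return 1.0
    return math.exp(-delta_e / temperature)


def logarithmic_annealing(initial, energy, random_neighbour, gamma, steps, rng=None):
    """Generic annealing loop; returns the best state seen and its energy.

    energy:           callable mapping a state to a number (lower is better)
    random_neighbour: callable (state, rng) -> neighbouring state
    gamma:            schedule parameter, ideally an upper bound on the
                      deepest local minimum of the landscape
    """
    rng = rng or random.Random(0)
    current, e_current = initial, energy(initial)
    best, e_best = current, e_current
    for k in range(steps):
        candidate = random_neighbour(current, rng)
        e_candidate = energy(candidate)
        p = acceptance_probability(e_candidate - e_current, log_temperature(k, gamma))
        if rng.random() < p:
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


# Toy usage on a 1D landscape (not a protein): minimise x^2 over the integers.
best, e_best = logarithmic_annealing(
    initial=10,
    energy=lambda x: x * x,
    random_neighbour=lambda x, rng: x + rng.choice((-1, 1)),
    gamma=1.0,
    steps=2000,
)
print(best, e_best)
```

For protein folding, the state would be a conformation, `energy` the HP objective, and `random_neighbour` a draw from the move set of section 6. The temperature falls very slowly under this schedule, which is what the convergence guarantee relies on and also why choosing a tight value for the schedule parameter matters in practice.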

8. Selected Benchmarks
S36: 3P 2H 2P 2H 5P 7H 2P 2H 4P 2H 2P 1H 2P
S60: 2P 3H 1P 8H 3P 10H 1P 1H 3P 12H 4P 6H 1P 2H 1P 1H 1P
S64: 12H 1P 1H 1P 1H 2P 2H 2P 2H 2P 1H 2P 2H 2P 2H 2P 1H 2P 2H 2P 2H 2P 1H 1P 1H 1P 12H
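The benchmarks are written in run-length notation (for example, 3P means PPP). A small helper, sketched here with hypothetical names, expands them into plain H/P strings and confirms that each sequence has the length its label suggests.

```python
import re

# Run-length encodings copied from the slide.
BENCHMARKS = {
    "S36": "3P 2H 2P 2H 5P 7H 2P 2H 4P 2H 2P 1H 2P",
    "S60": "2P 3H 1P 8H 3P 10H 1P 1H 3P 12H 4P 6H 1P 2H 1P 1H 1P",
    "S64": ("12H 1P 1H 1P 1H 2P 2H 2P 2H 2P 1H 2P 2H 2P 2H 2P "
            "1H 2P 2H 2P 2H 2P 1H 1P 1H 1P 12H"),
}


def expand(run_length):
    """Expand a run-length string such as '3P 2H' into 'PPPHH'."""
    tokens = re.findall(r"(\d+)([HP])", run_length)
    return "".join(letter * int(count) for count, letter in tokens)


for name, encoded in BENCHMARKS.items():
    sequence = expand(encoded)
    print(name, len(sequence), sequence)
```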

9.1 Experiment
Estimate […] experimentally. 20 runs.
Sequence / Time Frame:
S36: Optimal, 10 min
S60: Optimal, No, 30 min
S64: Optimal, No, 90 min
Processor: 2.2 GHz AMD Athlon

9.2 Experiment
We found that […] is a good estimated upper bound for […]. We checked this against S85 and obtained the best known results in 10 out of 10 runs. We need more benchmarks, of course, but this can be a good starting point for developing a formal proof of the value of […].