An Extended Dead-End Elimination Algorithm to Determine Gap-Free Lists of Low Energy States EDDA KLOPPMANN, G. MATTHIAS ULLMANN, TORSTEN BECKER.

Improved Pruning Algorithms and Divide-and-Conquer Strategies for Dead-End Elimination, with Application to Protein Design. Ivelin Georgiev, Ryan H. Lilien, Bruce R. Donald (2006)

Dead End Elimination Motivation: Structure determines function. By the laws of thermodynamics, the state of lowest free energy is the most probable. Direct calculation is rarely possible, so conformation space is discretized, which in principle allows for exhaustive search. The goal is an algorithm which deterministically finds the lowest energy state while circumventing the combinatorial explosion of an exhaustive search.

DEE Overview (Desmet et al., 1992): Originally applied to predict side chain positions in homology modeling. Views a protein as a set of residues (sites), each of which may adopt a finite number of rotamers (forms). DEE identifies high energy forms of sites which are incompatible with the state of lowest energy. These high energy forms are considered dead ends and pruned from consideration.

DEE Overview Continued: DEE solves the combinatorial problem of identifying the global energy minimum of a discrete system with pairwise interactions. Energy is expressed in terms of intrinsic energies of sites and pairwise interactions between sites. Each site adopts a discrete form that determines its contribution to the total energy.
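Written out, an energy function of this kind might look as follows. The symbols are notation chosen here for illustration and do not appear on the slides: $x_\mu$ is the form adopted by site $\mu$ in state $\mathbf{x}$, $E_\mu$ is the intrinsic energy of a site, and $W_{\mu\nu}$ is the pairwise interaction between two sites.

```latex
E(\mathbf{x}) \;=\; \sum_{\mu=1}^{N} E_\mu(x_\mu)
\;+\; \sum_{\mu=1}^{N} \sum_{\nu<\mu} W_{\mu\nu}(x_\mu, x_\nu)
```

Each pair of sites is counted once, so the total energy is fully determined by the per-site and per-pair tables.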

DEE Theory: DEE identifies and eliminates forms of sites which cannot contribute to the lowest energy conformation, in order to circumvent an exhaustive search. The DEE criterion employs rotameric energy interactions to identify and prune rotamers that are provably not part of the global minimum energy conformation (GMEC). The DEE criterion compares the energy of two forms of a site μ, dμ and cμ. If all states that contain dμ are higher in energy than the corresponding states that contain cμ, dμ is a dead end and is removed from consideration.
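One common way to test the condition above without enumerating states is a pairwise-difference check (often attributed to Goldstein): form d is a dead end if its energy disadvantage relative to c stays positive even when every other site picks the form most favorable to d. The sketch below assumes this variant; the toy energies and the names `INTRINSIC`, `PAIR`, `FORMS`, `w`, `energy`, and `is_dead_end` are hypothetical and not taken from the slides.

```python
from itertools import product

# Hypothetical toy system: 3 sites with 2 forms each.
FORMS = [[0, 1], [0, 1], [0, 1]]
INTRINSIC = [[0.0, 1.0], [0.0, 3.0], [0.5, 0.0]]   # INTRINSIC[site][form]
PAIR = {                                            # PAIR[(mu, nu)][f][g] with mu < nu
    (0, 1): [[0.0, 0.5], [-0.5, 0.0]],
    (0, 2): [[0.0, 0.2], [0.1, 0.0]],
    (1, 2): [[0.0, -0.3], [0.4, 0.0]],
}


def w(mu, f, nu, g):
    """Pairwise interaction between form f of site mu and form g of site nu."""
    return PAIR[(mu, nu)][f][g] if mu < nu else PAIR[(nu, mu)][g][f]


def energy(state):
    """Total energy: intrinsic terms plus every pairwise term counted once."""
    n = len(state)
    e = sum(INTRINSIC[m][state[m]] for m in range(n))
    e += sum(w(m, state[m], k, state[k]) for m in range(n) for k in range(m + 1, n))
    return e


def is_dead_end(mu, d, c, forms):
    """True if swapping form d for form c at site mu lowers the energy of every state."""
    gap = INTRINSIC[mu][d] - INTRINSIC[mu][c]
    for nu in range(len(forms)):
        if nu != mu:
            # Worst case for the competitor c: the form at site nu that favors d most.
            gap += min(w(mu, d, nu, f) - w(mu, c, nu, f) for f in forms[nu])
    return gap > 0


# Exhaustive search is only feasible for tiny systems; it serves as a cross-check.
gmec = min(product(*FORMS), key=energy)
print("lowest energy state:", gmec)
print("form 1 of site 1 is a dead end:", is_dead_end(1, 1, 0, FORMS))
```

The check costs a sum over sites rather than a product over states, which is what lets DEE avoid the exhaustive search.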

Motivation for X-DEE (Kloppmann et al., 2007): Proteins are flexible systems which may adopt several functionally relevant states, so a more complete picture of the available low energy states is desirable. X-DEE produces a gap-free list of low energy states (i.e., a list that is complete up to a given distance from the global energy minimum). It was implemented to determine the lowest energy protonation states of proteins.

X-DEE Intuition: The general idea is to exclude a list of states from the search space explored by DEE in order to construct a gap-free list. Basic idea: if a gap-free list of k low energy states {x1, · · ·, xk} is already known, the (k + 1)th state can be found by restricting the search for the lowest energy state to the set of all states M excluding the set of already known states. More generally: restrict the DEE search space to M \ L, where M is the complete set of states and L is any given list of states to be excluded. – In case L is not gap-free, repeatedly identify the state of lowest energy not included in L and add it to L until a gap-free list of low energy states is obtained.
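The exclusion idea can be illustrated with a brute-force reference: build the list one state at a time, each time taking the lowest energy state of M \ L. This is only a sketch of what X-DEE computes, not of how it computes it; exhaustive enumeration stands in for the DEE search, and the function name `gap_free_list` and the toy energy function are hypothetical.

```python
from itertools import product


def gap_free_list(energy, forms, k):
    """Return the k lowest energy states, found one at a time by exclusion."""
    all_states = list(product(*forms))        # the complete set M
    known = []                                # the gap-free list L built so far
    while len(known) < min(k, len(all_states)):
        excluded = set(known)
        # Exhaustive search stands in for a DEE search restricted to M \ L.
        candidates = [s for s in all_states if s not in excluded]
        known.append(min(candidates, key=energy))
    return known


# Toy example: three two-form sites with an additive energy (assumed for illustration).
toy_forms = [[0, 1], [0, 1], [0, 1]]
toy_energy = lambda s: 1.0 * s[0] + 2.0 * s[1] + 4.0 * s[2]
print(gap_free_list(toy_energy, toy_forms, 3))
```

Because every step searches all of M \ L, no state between two list entries can be missed, which is the sense in which the resulting list is gap-free.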

Excluding a list of states from consideration: There is no straightforward way to exclude an arbitrary list of states L from the search space explored by DEE. Instead, the DEE search is restricted to a specific type of subset of M: – Fixing a number of sites during a DEE search yields the state of lowest energy of a subset S of M characterized by the forms of the fixed sites. – For example, applying DEE to the subset S of those states that have form f at site s determines the state of lowest energy with form f at site s. The remaining question is how to choose such subsets.

Constructing a Search Basis: The idea of X-DEE is to derive a search basis B composed of a set of search keys bS, such that L is excluded from the search and the complete set M \ L is searched. The authors present a recursive procedure, CreateSearchBasis, which, given the list of states L to be excluded, constructs the search keys of the basis. Initial conditions: – A list L of states to be excluded from the search. – An associated list vector T that contains an element for each site and keeps track of the sites which are already fixed to specific forms. – Initially, all sites are unfixed (i.e., undefined).

Constructing a Search Basis: Overview. With each recursion, L is divided into sublists and one additional site is fixed in the associated list vectors. CreateSearchBasis terminates when all sites of a list vector are fixed. With each recursion, search keys can be generated that differ from the list vector in one form. The search keys are added to the search basis B. CreateSearchBasis generates a set of search keys bS characterizing subsets S whose union represents M \ L.

Introducing Search Keys: A subset S obtained by fixing sites can be represented by a so-called search key, e.g. bS = (h1, ∗2, · · ·, ∗μ, · · ·, ∗N), where: – h is the specified form of site 1 and ∗ indicates that a site is undefined (the idea being that undefined sites will be determined during the DEE search). – For each site μ of the system, a search key has a component bμ which is either fixed to a specific form or undefined. X-DEE defines search keys bS = (b1, · · ·, bμ, · · ·, bN) such that the subsets S represented by the individual search keys together represent M \ L. Determining the state of lowest energy of all subsets via the DEE algorithm yields the desired state of lowest energy of M \ L.
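A search key maps naturally onto a tuple with a placeholder for undefined sites. The sketch below is one possible representation, assuming `None` plays the role of ∗; the names `UNDEFINED`, `matches`, and `subset` are hypothetical.

```python
from itertools import product

UNDEFINED = None  # stands for the '*' component of a search key


def matches(state, key):
    """True if the state agrees with the key at every fixed site."""
    return all(b is UNDEFINED or b == x for x, b in zip(state, key))


def subset(key, forms):
    """Enumerate the subset S of states represented by a search key."""
    choices = [forms[m] if b is UNDEFINED else [b] for m, b in enumerate(key)]
    return list(product(*choices))


# Three two-form sites; the key fixes site 0 to form 1 and leaves the rest undefined.
forms = [[0, 1], [0, 1], [0, 1]]
key = (1, UNDEFINED, UNDEFINED)
for state in subset(key, forms):
    print(state, matches(state, key))
```

In X-DEE the subset is never enumerated like this; the key only tells the DEE search which sites to hold fixed.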

Recursive CreateSearchBasis(L, T): 1. Base case: return if T does not contain any undefined sites. 2. Find a site μ with unused forms (i.e., forms which are not present in any of the state vectors in L). If no such site exists, choose the first undefined site and jump to step 4.

Recursive CreateSearchBasis(L, T): 3. Create search keys: for each unused form h of site μ, a search key b is defined by copying the list vector t to b and fixing site μ to form h in b; bμ = h. So, each search key differs from the current list vector only at site μ. Fixing site μ to a form h that does not occur in L guarantees that the subset represented by b and L are disjoint, i.e., b represents a subset of M \ L. Now add b to the search basis B.

Recursive CreateSearchBasis(L, T): 4. Divide the state vectors in L into sublists L_sub such that site μ has the same form g in all state vectors x in L_sub, i.e., xμ = g for all states in L_sub. To each sublist L_sub, a separate list vector t_sub is assigned by copying list vector t to t_sub and fixing site μ to the form g common to all state vectors in L_sub; tμ = g. 5. Recurse on each sublist L_sub and its list vector t_sub.
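Steps 1 to 5 can be sketched as a short recursive function. This is a minimal reading of the slides, not the authors' implementation: it assumes that step 2 only considers sites still undefined in the list vector, that `None` stands for an undefined site, and that states and list vectors are tuples. The names `create_search_basis`, `covered`, and `UNDEFINED` are hypothetical.

```python
from itertools import product

UNDEFINED = None  # plays the role of '*' in a search key or list vector


def create_search_basis(states, t, forms, basis):
    """Append search keys to `basis` that cover the subset of t minus `states`."""
    undefined = [m for m, b in enumerate(t) if b is UNDEFINED]
    if not undefined:                       # step 1: every site is already fixed
        return

    def unused(m):
        """Forms of site m that occur in none of the excluded states."""
        return [f for f in forms[m] if all(x[m] != f for x in states)]

    # Step 2: prefer an undefined site with unused forms, else the first undefined site.
    mu = next((m for m in undefined if unused(m)), undefined[0])
    for h in unused(mu):                    # step 3: one search key per unused form
        basis.append(t[:mu] + (h,) + t[mu + 1:])
    for g in forms[mu]:                     # step 4: split the list by the form at site mu
        sub = [x for x in states if x[mu] == g]
        if sub:                             # step 5: recurse with site mu fixed to g
            create_search_basis(sub, t[:mu] + (g,) + t[mu + 1:], forms, basis)


def covered(key, forms):
    """Set of states represented by a search key (for checking only)."""
    choices = [forms[m] if b is UNDEFINED else [b] for m, b in enumerate(key)]
    return set(product(*choices))


forms = [[0, 1], [0, 1], [0, 1]]
excluded = [(0, 0, 0), (0, 1, 1)]
basis = []
create_search_basis(excluded, (UNDEFINED,) * 3, forms, basis)
searched = set().union(*(covered(b, forms) for b in basis))
print("search keys:", basis)
print("covers exactly M minus L:", searched == set(product(*forms)) - set(excluded))
```

When a site has no unused forms, the loop for step 3 simply produces no keys, which corresponds to the jump to step 4 described above.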

Using the Search Keys: All search keys in B are subjected to a DEE search, yielding the states of lowest energy of the represented subsets S. These states include the state of lowest energy of M \ L. The completeness of the search basis B is provable. – The basic idea is to show (i) that all subsets of states S represented by the search keys are subsets of M \ L and (ii) that the union of all subsets S represents the complete set M \ L.

X-DEE Application Domain: Example (figure on the slide): light absorption triggers bacteriorhodopsin's pumping cycle, during which a proton is transferred from the cytoplasm to the extracellular space. Basic idea: proteins contain protonatable residues whose charge state depends on their interaction with the protein environment. These protonatable residues are treated as sites, with each site adopting one of two forms (protonated, unprotonated).

X-DEE Application Domain Charge distribution of a protein is essential to its function – In proteins, not only the state of lowest energy but also the next higher protonation states are commonly significantly populated and often play a functional role

X-DEE Performance Characteristics: The total number of search keys generated depends approximately linearly on the number of states in L, which influences the number of search keys in two different ways: – Each additional state in L increases the number of states to be excluded from the search and thereby tends to increase the number of generated keys. – Each additional state in L decreases the search space M \ L and thereby tends to decrease the number of generated keys. Ultimately, once L covers a large part of M, the number of search keys will decrease with the number of states in L. However, as long as L is small compared to M \ L, an approximately linear increase of the total number of search keys can be observed.

X-DEE Performance Characteristics Computational cost of X-DEE depends approximately linearly on the size of the system and the number of states to be excluded from the search For low energy states which are built up one after the other, the computational cost to determine an additional state remains on average constant.

Improved Pruning Algorithms and Divide-and-Conquer Strategies for Dead-End Elimination, with Application to Protein Design. Ivelin Georgiev, Ryan H. Lilien, Bruce R. Donald (2006)

DACS Motivation DACS: a provably-accurate divide-and-conquer enhancement to traditional-DEE. Protein design for a rigid backbone and using rotamers and a pairwise energy function is provably NP-hard Desire for provable, deterministic algorithms which make real guarantees (as opposed to heuristic methods, Monte Carlo, genetic algorithms, etc)

Traditional DEE: The DEE criterion uses rotameric energy interactions to identify and prune rotamers that are provably not part of the GMEC. A target rotamer is pruned if a competitor rotamer is found such that the lowest possible energy among conformations containing the target rotamer is higher than the highest (worst) possible energy among conformations containing the competitor. DEE does not guarantee a unique solution: multiple unpruned conformations may remain once pruning with DEE is exhausted. If this happens, the DEE pruning stage is followed by an enumeration stage, in which the remaining conformations are examined and the GMEC is identified; in the worst case this takes exponential time. One improvement is to partition the search space.
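In symbols, the pruning condition (the target's best case is still worse than the competitor's worst case) is commonly written as below. The notation is assumed here for illustration: $i_r$ is the target rotamer r at residue position i, $i_t$ is the competitor, $E(i_r)$ is a self energy, and $E(i_r, j_s)$ is the pairwise energy with rotamer s at position j.

```latex
E(i_r) + \sum_{j \neq i} \min_{s} E(i_r, j_s)
\;>\;
E(i_t) + \sum_{j \neq i} \max_{s} E(i_t, j_s)
```

Energy terms that do not involve position i are identical on both sides of the comparison and therefore drop out.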

split-DEE and DACS: By partitioning the conformational search space, split-DEE enhances the pruning efficiency of traditional-DEE. In split-DEE, the conformational space can be divided into several partitions, such that for each partition there is some competitor that has better conformational energies than a given rotamer within that partition. The advantage of split-DEE is that no single competitor is required to outperform a rotamer for every conformation: as long as there exists a dominant competitor for each partition (possibly a different one each time), the rotamer can be pruned. We can still do better: DACS enhances split-DEE by performing DEE pruning within individual partitions.
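The difference between a single-competitor test and a split test can be seen on a small example. The sketch below assumes partitions are defined by the rotamer chosen at one split position k and uses a pairwise-difference (Goldstein-style) test inside each partition; the toy energies and the names `w`, `gap`, `single_competitor_prunes`, and `split_prunes` are hypothetical and not taken from the slides.

```python
def w(pair, i, r, j, s):
    """Pairwise energy between rotamer r at position i and rotamer s at position j."""
    return pair[(i, j)][r][s] if i < j else pair[(j, i)][s][r]


def gap(intrinsic, pair, forms, i, r, t, skip=()):
    """Worst-case energy advantage of competitor t over target r, ignoring positions in skip."""
    total = intrinsic[i][r] - intrinsic[i][t]
    for j in range(len(forms)):
        if j != i and j not in skip:
            total += min(w(pair, i, r, j, s) - w(pair, i, t, j, s) for s in forms[j])
    return total


def single_competitor_prunes(intrinsic, pair, forms, i, r):
    """One competitor must beat the target in every conformation."""
    return any(gap(intrinsic, pair, forms, i, r, t) > 0 for t in forms[i] if t != r)


def split_prunes(intrinsic, pair, forms, i, r, k):
    """Each partition (rotamer v at split position k) may use its own competitor."""
    def beaten_in_partition(v):
        return any(
            gap(intrinsic, pair, forms, i, r, t, skip=(k,))
            + w(pair, i, r, k, v) - w(pair, i, t, k, v) > 0
            for t in forms[i] if t != r
        )
    return all(beaten_in_partition(v) for v in forms[k])


# Toy system: position 0 has three rotamers, position 1 has two.
forms = [[0, 1, 2], [0, 1]]
intrinsic = [[0.0, 0.0, 0.0], [0.0, 0.0]]
pair = {(0, 1): [[0.0, 0.0], [-1.0, 1.0], [1.0, -1.0]]}

print(single_competitor_prunes(intrinsic, pair, forms, 0, 0))
print(split_prunes(intrinsic, pair, forms, 0, 0, 1))
```

In this toy system neither competitor beats rotamer 0 everywhere, but rotamer 1 wins when position 1 holds rotamer 0 and rotamer 2 wins when it holds rotamer 1, so only the split test can prune the target.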

DACS as an enhancement to split-DEE (Divide-And-Conquer Splitting): As in split-DEE, the conformational space is divided into partitions. Within each partition, DEE pruning is applied to determine if there is a competitor rotamer at a residue that always outperforms our original rotamer. If DEE pruning does not produce a unique solution, enumeration of the conformations in the current partition must be performed by A*. The lowest-energy conformation among the local rigid-GMECs for all partitions is the overall rigid-GMEC.
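The overall control flow can be sketched as follows. This is a simplified illustration rather than the published algorithm: partitions are formed by fixing the rotamer at a single split position, a pairwise-difference (Goldstein-style) test stands in for the full DEE cycle, and exhaustive enumeration stands in for A*. All names and the toy energies are hypothetical.

```python
from itertools import product


def w(pair, i, r, j, s):
    """Pairwise energy between rotamer r at position i and rotamer s at position j."""
    return pair[(i, j)][r][s] if i < j else pair[(j, i)][s][r]


def energy(conf, intrinsic, pair):
    n = len(conf)
    e = sum(intrinsic[i][conf[i]] for i in range(n))
    return e + sum(w(pair, i, conf[i], j, conf[j]) for i in range(n) for j in range(i + 1, n))


def dee_prune(intrinsic, pair, forms):
    """Repeat pairwise-difference pruning until no further rotamer can be removed."""
    forms = [list(f) for f in forms]
    changed = True
    while changed:
        changed = False
        for i in range(len(forms)):
            for r in list(forms[i]):
                for t in forms[i]:
                    if t == r:
                        continue
                    gap = intrinsic[i][r] - intrinsic[i][t] + sum(
                        min(w(pair, i, r, j, s) - w(pair, i, t, j, s) for s in forms[j])
                        for j in range(len(forms)) if j != i
                    )
                    if gap > 0:              # r is provably worse than t in this partition
                        forms[i].remove(r)
                        changed = True
                        break
    return forms


def dacs(intrinsic, pair, forms, split_pos):
    """Prune and solve each partition separately, then keep the best local minimum."""
    best = None
    for v in forms[split_pos]:
        part = [list(f) for f in forms]
        part[split_pos] = [v]                # one partition per rotamer at split_pos
        part = dee_prune(intrinsic, pair, part)
        # Exhaustive enumeration stands in for the A* enumeration stage.
        local = min(product(*part), key=lambda c: energy(c, intrinsic, pair))
        if best is None or energy(local, intrinsic, pair) < energy(best, intrinsic, pair):
            best = local
    return best


forms = [[0, 1], [0, 1], [0, 1]]
intrinsic = [[0.0, 1.0], [0.0, 3.0], [0.5, 0.0]]
pair = {
    (0, 1): [[0.0, 0.5], [-0.5, 0.0]],
    (0, 2): [[0.0, 0.2], [0.1, 0.0]],
    (1, 2): [[0.0, -0.3], [0.4, 0.0]],
}
print(dacs(intrinsic, pair, forms, split_pos=0))
```

Because each partition is pruned with its split position fixed, the pruning test sees fewer rotamers and can eliminate candidates that survive in the full space. The partitions are independent, so the loop could be run in parallel.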

split-Flags: The general advantage of DACS over split-DEE is the ability to prune an additional combinatorial subset of the conformational space by exploiting partition-specific prunings. The DEE pruning stage in DACS can incorporate any combination of the available provably-accurate traditional-DEE techniques. The split-flags algorithm (Gordon et al., 2003) has similar intent: – If a target rotamer cannot be pruned for all partitions, the partitions in which it can be pruned are flagged as dead-ending. – Like DACS, split-flags uses pruning information discarded by split-DEE.

split-Flags vs DACS: One advantage of DACS over split-flags stems from the divide-and-conquer paradigm. – The cost of expanding the A* search tree depends combinatorially on the number of rotamers for each residue position. – A divide-and-conquer approach (which reduces the number of rotamers in each partition) is more efficient than directly finding the global solution. A bonus of divide-and-conquer approaches is that they are naturally parallelizable, reducing real-world running time.

MinDEE Overview: Used when the protein design process incorporates rotameric energy minimization, in which case traditional DEE is no longer provably accurate. MinDEE is similar to traditional-DEE in that rotameric energy interactions are used to determine which rotamers are provably not part of the minGMEC and can be pruned. MinDEE guarantees that no rotamers are pruned which belong to the conformation with the lowest energy among all energy-minimized conformations. Since rotamers are allowed to energy-minimize, lower and upper bounds on the self- and pairwise rotamer energies must be used instead of the rigid-energy terms.

MinDEE vs. DEE: Without energy minimization, a rotamer stays in the same rigid conformation, independent of the rotamer identities for the remaining residues. With energy minimization, a rotamer may minimize from its initial conformation in order to accommodate a change in another rotamer. So that one rotamer does not minimize into another, rotameric movement is constrained to a voxel of conformation space. The most significant difference between traditional-DEE and MinDEE is the accounting for possible energy changes during minimization.

DACS and minDEE It’s straightforward to modify DACS to incorporate energy minimization To only prune rotamers that are provably not part of the minGMEC, the traditional-DEE criteria in the DEE cycle of DACS must be discarded and their MinDEE equivalents used instead

MinDEE/A*: Incorporates splitting, MinBounds (a pruning approach that remains provably correct with energy minimization, analogous to that of Gordon et al., 2003, for traditional-DEE), and DACS for MinDEE. A* is then applied in the enumeration stage to extract the minGMEC from the set of remaining conformations. Similar to DACS, the lowest-energy conformation among the rigid-GMECs for all mutation sequences is identified as the overall rigid-GMEC.

DACS / MinDEE-A* Performance: Partition-specific prunings – By using a divide-and-conquer approach to partition the conformational space and identify partition-specific prunings, DACS allows for additional elimination after pruning with the original split-DEE and split-flags techniques is exhausted. Reduced cost of expanding A* search trees – The improved execution times of DACS stem from the reduced cost of expanding the A* search trees for each partition, resulting from the divide-and-conquer approach, as opposed to expanding the single A* tree for the full conformational space. Increased pruning efficiency – MinDEE/A* benefits from increased pruning efficiency, and so works best on larger systems where the cost of expanding the search tree in the enumeration stage dominates the computation (rather than the energy minimization).