The Side-Chain Positioning Problem
Carl Kingsford, Princeton University
Joint work with Bernard Chazelle and Mona Singh

Proteins
Many functions: structural, messaging, catalytic, …
A protein is a sequence of amino acids strung together on a backbone.
Each amino acid has a flexible side-chain.
Proteins fold, and their function depends highly on their 3D shape.

Protein Structure
[Figure: a protein structure with the backbone and the side-chains labeled]

Side-chain Positioning Problem
Given: a fixed backbone and an amino acid sequence (for example, IILVPACW…).
Find: the 3D positions for the side-chains that minimize the energy of the structure.
Assumption: the lowest-energy structure is best.

Side-chain Positioning Applications
Homology modeling: use the known backbone of a similar protein to predict a new structure.
Unknown: KNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHII
          NV CKNG  NCY S S + ITDCR  G+SKYPNC YKT+   KHII
Known:   ENVTCKNGKKNCYKSTSALHITDCRLKGNSKYPNCDYKTSDYQKHII

Rotamers
Each amino acid has some number of statistically preferred side-chain positions, called rotamers.
The continuum of positions is well approximated by rotamers.
[Figure: 3 rotamers of arginine]

An Equivalent Graph Problem
For a protein with p side-chains, build a p-partite graph with n nodes:
- a part $V_i$ for each side-chain i
- a node u for each rotamer
- an edge {u,v} if u interacts with v
Weights: E(u) = self-energy, E(u,v) = interaction energy.
[Figure: two parts $V_1$ and $V_2$, with a rotamer, a position, and an interaction labeled]

Feasible Solution
A feasible solution picks one node from each part.
cost(feasible) = cost of the induced subgraph.
Hard to approximate within a factor of cn, where n is the number of nodes.
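The cost of a feasible solution can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the dictionary layout, and the toy energies are all assumptions made for the example.

```python
from itertools import combinations


def solution_cost(choice, self_energy, pair_energy):
    """Cost of the subgraph induced by one chosen rotamer per position.

    choice      -- dict mapping position -> chosen node (hypothetical layout)
    self_energy -- dict mapping node -> E(u)
    pair_energy -- dict mapping frozenset({u, v}) -> E(u, v);
                   a missing key means u and v do not interact
    """
    nodes = list(choice.values())
    # Self-energies of the chosen nodes.
    cost = sum(self_energy[u] for u in nodes)
    # Interaction energies of every edge between two chosen nodes.
    for u, v in combinations(nodes, 2):
        cost += pair_energy.get(frozenset((u, v)), 0.0)
    return cost


# Hypothetical toy instance: positions A and B with two rotamers each.
self_energy = {"a1": 1.0, "a2": 0.5, "b1": 0.25, "b2": 2.0}
pair_energy = {frozenset(("a1", "b1")): -1.0, frozenset(("a2", "b2")): 3.0}
toy_cost = solution_cost({"A": "a1", "B": "b1"}, self_energy, pair_energy)
```

Enumerating every feasible solution this way takes time exponential in the number of positions, which is why the talk turns to relaxations.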

Determining the Energy
The energy of a protein conformation is the sum of several energy terms:
- van der Waals
- electrostatics
- bond lengths
- bond angles
- dihedral angles
- hydrogen bonds
No Δ-inequality (triangle inequality).

Plan of Attack
1. Formulate as a quadratic integer program
2. Relax into a semidefinite program
3. Solve the SDP in polynomial time
4. Round solution vectors to choice of rotamers

Quadratic Integer Program
[Formulas not preserved: an objective to minimize, subject to one constraint for each position j and one constraint for each position j and node v]

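Relax Into Vector Program
Use $x_u = x_u^2$ for $x_u \in \{0,1\}$ to write the program as a pure quadratic program.
Variables become n-dimensional vectors, so each product of two variables becomes an inner product.
[Formulas not preserved: an objective to minimize, subject to one constraint for each position j and one constraint for each node v and position j]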

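Rewrite As Semidefinite Program
$X = (x_{uv})$ is PSD $\iff$ $x_{uv} = x_u^T x_v$.
[Formulas not preserved: an objective to minimize, subject to one constraint for each position j and one constraint for each node v and position j]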
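The formulas on the three program slides are not in the transcript. The block below is one reconstruction that is consistent with the rest of the deck: the objective is the cost of the induced subgraph, the two constraint families are the position and flow constraints described on the constraints slide, and the $x_{uv} \ge 0$ constraints are the ones mentioned under future work. It should be read as an assumption about the slides, not as their exact content.

```latex
% Quadratic integer program (assumed form)
\begin{align*}
\min\quad & \sum_{u} E(u)\,x_u + \sum_{\{u,v\}} E(u,v)\,x_u x_v \\
\text{s.t.}\quad
  & \sum_{u \in V_j} x_u = 1        && \text{for each position } j, \\
  & \sum_{u \in V_j} x_u x_v = x_v  && \text{for each position } j,\ \text{node } v, \\
  & x_u \in \{0,1\}.
\end{align*}

% Semidefinite relaxation (assumed form), with x_{uv} = x_u^T x_v
\begin{align*}
\min\quad & \sum_{u} E(u)\,x_{uu} + \sum_{\{u,v\}} E(u,v)\,x_{uv} \\
\text{s.t.}\quad
  & \sum_{u \in V_j} x_{uu} = 1        && \text{for each position } j, \\
  & \sum_{u \in V_j} x_{uv} = x_{vv}   && \text{for each node } v,\ \text{position } j, \\
  & x_{uv} \ge 0, \qquad X = (x_{uv}) \succeq 0.
\end{align*}
```

The step from the first program to the second uses $x_u = x_u^2$ to replace each linear term $x_u$ by $x_{uu}$, and then replaces every product $x_u x_v$ by the matrix entry $x_{uv}$.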

Constraints & Dummy Position
Position constraints: the sum of the node variables $x_{vv}$ in each position $V_i$ is 1.
Flow constraints: the sum of the edge variables $x_{uv}$ from a node to the nodes of a position $V_j$ equals that node variable.
Dummy position: insert a new position $V_0$ with a single node $u_0$ (vector $x_{u_0}$). No edges, no node cost.
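As a sanity check on the two constraint families, the sketch below builds the matrix of an integral solution and verifies the position and flow constraints on it. The function names and the data layout (a list of positions, each a list of node names, with the dummy position first) are assumptions made for this illustration.

```python
def integral_matrix(positions, chosen):
    """X[u, v] = chi[u] * chi[v] for the 0/1 characteristic vector chi of `chosen`."""
    nodes = [u for part in positions for u in part]
    chi = {u: 1.0 if u in chosen else 0.0 for u in nodes}
    return {(u, v): chi[u] * chi[v] for u in nodes for v in nodes}


def satisfies_constraints(X, positions, tol=1e-9):
    """Check the position and flow constraints on a matrix X given as a dict."""
    nodes = [u for part in positions for u in part]
    # Position constraints: node variables in each position sum to 1.
    for part in positions:
        if abs(sum(X[u, u] for u in part) - 1.0) > tol:
            return False
    # Flow constraints: for each node v and each position, the edge
    # variables from v into that position sum to the node variable of v.
    for v in nodes:
        for part in positions:
            if abs(sum(X[u, v] for u in part) - X[v, v]) > tol:
                return False
    return True


# Hypothetical toy instance: dummy position plus two real positions.
positions = [["u0"], ["a1", "a2"], ["b1", "b2"]]
feasible = integral_matrix(positions, {"u0", "a1", "b2"})
infeasible = integral_matrix(positions, {"u0", "a1", "a2", "b1"})
```

Here `feasible` passes both checks, while `infeasible` picks two nodes in one position and fails the position constraint.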

Geometry of the Solution Vectors

Geometry of Solution Vectors
Let $y = \sum_{u \in V_j} x_u$ for a position j.
Lemma. $y = x_{u_0}$.
Proof. Simple algebra shows that:
- the length of y is 1
- the length of $x_{u_0}$ is 1
- the length of the projection of y onto $x_{u_0}$ is 1

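Solution Vectors Lie on a Sphere
Note. The length of the projection of $x_u$ onto $x_{u_0}$ is the squared length of $x_u$, because $x_u^T x_{u_0} = x_{uu} = \|x_u\|^2$.
Each solution vector lies on a sphere of radius ½ centered at $x_{u_0}/2$: $a^2 = \|x_u - x_{u_0}/2\|^2 = 1/4$.
[Figure: the sphere through the origin O, with $x_{u_0}$, $x_u$, and the radius a labeled]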
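The sphere claim follows from a short calculation. The derivation below is a reconstruction under two assumptions drawn from the constraints slide: the flow constraint from node u into the dummy position gives $x_u^T x_{u_0} = x_{uu}$, and the position constraint on the dummy position gives $\|x_{u_0}\|^2 = 1$.

```latex
\begin{align*}
\left\| x_u - \tfrac{1}{2} x_{u_0} \right\|^2
  &= \|x_u\|^2 - x_u^T x_{u_0} + \tfrac{1}{4}\,\|x_{u_0}\|^2 \\
  &= x_{uu} - x_{uu} + \tfrac{1}{4} \\
  &= \tfrac{1}{4}.
\end{align*}
```

So every $x_u$ is at distance ½ from the point $x_{u_0}/2$, whatever the value of $x_{uu}$.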

How do we round the solution of the SDP relaxation?
Convert fractional solutions into feasible 0/1 solutions:
- Projection rounding
- Perron-Frobenius rounding

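Projection Rounding
Since the node variables in each position sum to 1, the $x_{uu}$ give a probability distribution at each position.
Pick node u with probability $x_{uu}$.
$x_{uu}$ = length of the projection of $x_u$ onto $x_{u_0}$.
[Figure: vectors $x_u$ and $x_v$ and their projections onto $x_{u_0}$, drawn from the origin O]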
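A minimal sketch of projection rounding, assuming the diagonal entries $x_{uu}$ of the SDP solution are already available as a dictionary. The function name and data layout are illustrative, not taken from the talk.

```python
import random


def projection_round(positions, x_diag, rng=random):
    """Pick one node per position, choosing node u with probability x_diag[u].

    positions -- list of lists of node names (one list per position)
    x_diag    -- dict mapping node -> x_uu; assumed to sum to 1 in each position
    rng       -- the random module or a random.Random instance
    """
    choice = {}
    for j, part in enumerate(positions):
        weights = [x_diag[u] for u in part]
        # random.choices samples proportionally to the given weights.
        choice[j] = rng.choices(part, weights=weights, k=1)[0]
    return choice


# Hypothetical fractional solution on two positions.
positions = [["a1", "a2"], ["b1", "b2"]]
x_diag = {"a1": 0.7, "a2": 0.3, "b1": 0.5, "b2": 0.5}
rounded = projection_round(positions, x_diag, rng=random.Random(0))
```

Each position is sampled independently, so the result is always feasible: exactly one node per position.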

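Drift for Projection Rounding
Drift $\Delta$ = expected difference between the fractional and rounded solutions.
It comes entirely from pairwise interactions.
Contribution of the pair {u,v}: $\Delta_{uv} = E(u,v)\,(x_{uv} - \Pr[uv])$.
[Formulas not preserved: an identity introduced by "In fact", a bound obtained by Cauchy-Schwarz, and a bound that holds because the $x_u$ are on a sphere]
[Figure: vectors $x_u$, $x_v$ and $y_u$, $y_v$]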
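The bounds on this slide are missing from the transcript. One derivation that matches the surviving hints (the vectors $y_u$, $y_v$ in the figure, Cauchy-Schwarz, and the sphere) is sketched below. It assumes that positions are rounded independently, so that $\Pr[uv] = x_{uu}\,x_{vv}$ for u and v in different positions, and it takes $y_u$ to be the component of $x_u$ orthogonal to $x_{u_0}$. Both are assumptions, and the constant obtained need not be the one stated on the slide.

```latex
\begin{align*}
x_u &= x_{uu}\,x_{u_0} + y_u, \qquad y_u \perp x_{u_0}, \\
x_{uv} &= x_u^T x_v = x_{uu}\,x_{vv} + y_u^T y_v, \\
\Delta_{uv} &= E(u,v)\,\bigl(x_{uv} - x_{uu}\,x_{vv}\bigr) = E(u,v)\; y_u^T y_v, \\
|\Delta_{uv}| &\le |E(u,v)|\;\|y_u\|\,\|y_v\|
  && \text{(Cauchy-Schwarz)}, \\
\|y_u\|^2 &= \|x_u\|^2 - x_{uu}^2 = x_{uu}\,(1 - x_{uu}) \le \tfrac{1}{4}
  && \text{(sphere of radius } \tfrac{1}{2}\text{)}, \\
|\Delta_{uv}| &\le \tfrac{1}{4}\,|E(u,v)|.
\end{align*}
```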

Perron-Frobenius Rounding
$\chi$ = 0/1 characteristic n-vector of the optimal solution.
Optimal integral $X^* = \chi\chi^T$, so rank($X^*$) = 1.
Idea: approximate the fractional X by a rank 1 matrix $qq^T$.
We want to sample from $\chi$, but settle for q.
q needs to contain probability distributions for each position. How do we choose q?

Possible Choices for q
Lemma. Any nonnegative vector q with $L_1$-norm p in the image space of X contains the required set of probability distributions.
Proof. $X = W^T W$, where $W = [x_1\ x_2\ \dots\ x_n]$.
Let $1_i$ be the characteristic vector for position i.
Suppose q = Xy for some y. Then [formula not preserved].
The final value is independent of i, so each position sums to 1.
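The displayed equation in the proof is missing from the transcript. A reconstruction consistent with the surrounding text is given below. It assumes that the sum of the solution vectors in a position is the same vector for every position, namely $x_{u_0}$, which is how the geometry slides read; this is an assumption about the lost formula rather than a quotation.

```latex
\begin{align*}
1_i^T q
  &= 1_i^T X y
   = 1_i^T W^T W y
   = (W 1_i)^T W y \\
  &= \Bigl(\sum_{u \in V_i} x_u\Bigr)^{T} W y
   = x_{u_0}^T\, W y .
\end{align*}
```

The right-hand side does not depend on i, so every position receives the same total mass. Since q is nonnegative with $L_1$-norm p spread over p positions, each position sums to 1.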

A Choice for q
By spectral decomposition, X = [formula not preserved], where [condition not preserved].
Take q = [formula not preserved].
$z_1$ is in the image space of X.
By the Perron-Frobenius theorem for nonnegative matrices, q ≥ 0.
By the Lemma, q contains the needed probability distributions.
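The formula for q is not in the transcript. The sketch below assumes q is the dominant eigenvector $z_1$ of X rescaled to have $L_1$-norm p, which fits the statements that $z_1$ lies in the image space of X and that q ≥ 0. It uses plain power iteration in place of a full spectral decomposition, and all names are hypothetical.

```python
def dominant_eigenvector(X, iters=200):
    """Power iteration on a nonnegative symmetric matrix X (list of lists).

    Returns an approximation of the dominant eigenvector scaled to L1 norm 1.
    Starting from the all-ones vector keeps every iterate nonnegative.
    """
    n = len(X)
    z = [1.0 / n] * n
    for _ in range(iters):
        w = [sum(X[i][k] * z[k] for k in range(n)) for i in range(n)]
        norm = sum(abs(t) for t in w)
        if norm == 0.0:
            raise ValueError("X maps the iterate to zero")
        z = [t / norm for t in w]
    return z


def perron_q(X, p):
    """Assumed choice of q: dominant eigenvector rescaled to L1 norm p."""
    return [p * t for t in dominant_eigenvector(X)]


# Integral solution chi = [1, 0, 0, 1] on two positions of two nodes each:
# X = chi chi^T has rank 1 and the rounding vector q recovers chi.
X_integral = [[1, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 1]]
q_integral = perron_q(X_integral, p=2)

# Equal mixture of two integral solutions: q is fractional, but each
# position (entries 0-1 and entries 2-3) still sums to 1.
X_mixed = [[0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5], [0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5]]
q_mixed = perron_q(X_mixed, p=2)
```

Once q is available, a rounded solution can be drawn by sampling one node per position with the probabilities that q assigns within that position.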

Computational Results
Compare solutions from:
- Simple LP
- SDP fractional
- Projection rounded
- Perron-Frobenius rounded
30 random graphs:
- 60 nodes, 15 positions
- edge probability ½
- weights uniformly from [0,1]
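A test instance of the kind described above could be generated as follows. The slide does not say how nodes are divided among positions or whether node weights are also random, so the even split and the uniform node weights here are assumptions, as are the function name and return layout.

```python
import random


def random_instance(num_nodes=60, num_positions=15, edge_prob=0.5, seed=None):
    """Random p-partite instance: returns (positions, self_energy, pair_energy)."""
    if num_nodes % num_positions != 0:
        raise ValueError("this sketch assumes an even split of nodes")
    rng = random.Random(seed)
    per = num_nodes // num_positions
    # Assumption: nodes are split evenly across positions.
    positions = [list(range(j * per, (j + 1) * per)) for j in range(num_positions)]
    part_of = {u: j for j, part in enumerate(positions) for u in part}
    # Assumption: node weights are uniform on [0, 1], like the edge weights.
    self_energy = {u: rng.uniform(0.0, 1.0) for u in range(num_nodes)}
    pair_energy = {}
    for u in range(num_nodes):
        for v in range(u + 1, num_nodes):
            # Edges only join nodes in different positions.
            if part_of[u] != part_of[v] and rng.random() < edge_prob:
                pair_energy[(u, v)] = rng.uniform(0.0, 1.0)
    return positions, self_energy, pair_energy


positions, self_energy, pair_energy = random_instance(seed=1)
```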

Future Work
Can the rounding schemes be applied to other problems?
Can the semidefinite program be sped up?
- We can only routinely solve graphs with ≤ 120 nodes (reasonable protein problems contain 1000 to 5000 nodes).
- The $x_{uv} \ge 0$ constraints are the bottleneck.
Can the requirement of a fixed backbone be relaxed?
We've worked quite a bit with real proteins using an LP approach.
It seems an SDP formulation might be useful.

More Information
The Side-Chain Positioning Problem: A Semidefinite Programming Formulation with New Rounding Schemes, B. Chazelle, C. Kingsford, M. Singh, Proc. ACM FCRC'2003, Principles of Computing and Knowledge: Paris Kanellakis Memorial Workshop (2003).