The Side-Chain Positioning Problem
Carl Kingsford, Princeton University
Joint work with Bernard Chazelle and Mona Singh
Proteins
Many functions: structural, messaging, catalytic, …
A protein is a sequence of amino acids strung together on a backbone.
Each amino acid has a flexible side-chain.
Proteins fold, and their function depends highly on their 3D shape.
[Figure labels: V, C, R, R]
Protein Structure
[Figure labels: Backbone, Side-chains]
Side-chain Positioning Problem
Given: a fixed backbone and an amino acid sequence (IILVPACW…)
Find: the 3D positions for the side-chains that minimize the energy of the structure
Assume the lowest-energy structure is the best one.
Side-chain Positioning Applications
Homology modeling: use the known backbone of a similar protein to predict a new structure.
Unknown: KNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHII
          NV CKNG  NCY S S + ITDCR  G+SKYPNC YKT+   KHII
Known:   ENVTCKNGKKNCYKSTSALHITDCRLKGNSKYPNCDYKTSDYQKHII
Rotamers
Each amino acid has some number of statistically preferred side-chain positions, called rotamers.
The continuum of positions is well approximated by rotamers.
[Figure: 3 rotamers of arginine]
An Equivalent Graph Problem
For a protein with p side-chains, build a p-partite graph with n nodes:
part $V_i$ for each side-chain i
node u for each rotamer
edge {u,v} if u interacts with v
Weights: E(u) = self-energy, E(u,v) = interaction energy
[Figure: parts $V_1$ and $V_2$; legend: rotamer, position, interaction]
Feasible Solution
Feasible solution: one node from each part.
cost(feasible) = cost of the induced subgraph (its node weights plus its edge weights).
Hard to approximate within a factor of cn, where n is the number of nodes.
[Figure: parts $V_1$ and $V_2$; legend: rotamer, position, interaction]
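The cost of a feasible solution can be sketched in a few lines. This is only an illustration on a hypothetical toy instance: the node names, the energies, and the helper names `cost` and `brute_force` are assumptions, not part of the talk. The brute-force search is exponential in the number of positions and is only meant to make the definition concrete.

```python
from itertools import product

def cost(choice, node_energy, edge_energy):
    """Cost of the subgraph induced by one chosen node per part."""
    total = sum(node_energy[u] for u in choice)
    for i, u in enumerate(choice):
        for v in choice[i + 1:]:
            # Edges are stored once, keyed by an unordered pair of nodes.
            total += edge_energy.get(frozenset((u, v)), 0.0)
    return total

def brute_force(parts, node_energy, edge_energy):
    """Try every feasible solution (exponential in the number of parts)."""
    return min(product(*parts), key=lambda c: cost(c, node_energy, edge_energy))

# Hypothetical toy instance: 2 positions with 2 rotamers each.
parts = [["a1", "a2"], ["b1", "b2"]]
node_energy = {"a1": 1.0, "a2": 0.5, "b1": 0.2, "b2": 0.1}
edge_energy = {
    frozenset(("a1", "b1")): -2.0,
    frozenset(("a2", "b2")): 3.0,
}
best = brute_force(parts, node_energy, edge_energy)
print(best, cost(best, node_energy, edge_energy))
```

Here the favourable interaction between `a1` and `b1` outweighs their higher self-energies, so picking the cheapest node in each part independently would miss the optimum.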
Determining the Energy
The energy of a protein conformation is the sum of several energy terms:
van der Waals, electrostatics, bond lengths, bond angles, dihedral angles, hydrogen bonds
The energies do not satisfy the triangle inequality.
[Figure labels: A, B]
Plan of Attack
1. Formulate as a quadratic integer program
2. Relax into a semidefinite program
3. Solve the SDP in polynomial time
4. Round solution vectors to a choice of rotamers
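Quadratic Integer Program
minimize $\sum_{u} E(u)\,x_u + \sum_{\{u,v\}} E(u,v)\,x_u x_v$
subject to
$\sum_{u \in V_j} x_u = 1$ for each posn j
$\sum_{u \in V_j} x_u x_v = x_v$ for each posn j, node v
$x_u \in \{0,1\}$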
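Relax Into Vector Program
Use $x_u = x_u^2$ for $x_u \in \{0,1\}$ to write as a pure quadratic program.
Variables become n-dimensional vectors $x_u$.
minimize $\sum_{u} E(u)\,x_u^T x_u + \sum_{\{u,v\}} E(u,v)\,x_u^T x_v$
subject to
$\sum_{u \in V_j} x_u^T x_u = 1$ for each posn j
$\sum_{u \in V_j} x_u^T x_v = x_v^T x_v$ for each node v, posn j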
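Rewrite As Semidefinite Program
$X = (x_{uv})$ is PSD, where $x_{uv} = x_u^T x_v$.
minimize $\sum_{u} E(u)\,x_{uu} + \sum_{\{u,v\}} E(u,v)\,x_{uv}$
subject to
$\sum_{u \in V_j} x_{uu} = 1$ for each posn j
$\sum_{u \in V_j} x_{uv} = x_{vv}$ for each node v, posn j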
Constraints & Dummy Position
Position constraints: the sum of the node variables $x_{vv}$ in each position $V_i$ is 1.
Flow constraints: for each node v and position $V_j$, the sum of the edge variables $x_{uv}$ over u in $V_j$ equals the node variable $x_{vv}$.
Dummy position: insert a new position $V_0$ with a single node $u_0$ (vector $x_{u_0}$). No edges, no node cost.
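As a sanity check that these constraints are a valid relaxation, the sketch below lifts a feasible 0/1 choice to the rank-1 matrix X = x xᵀ and verifies the position and flow constraints, dummy position included. The node numbering (dummy node is index 0, real nodes start at 1) and the helper names `lift` and `satisfies_constraints` are assumptions made for this illustration.

```python
def lift(parts, choice):
    """Build X = x x^T for the 0/1 vector x of a choice of nodes.

    Node 0 is the dummy node (always chosen); real nodes are numbered from 1.
    """
    n = sum(len(part) for part in parts)
    x = [1] + [0] * n
    for u in choice:
        x[u] = 1
    return [[x[u] * x[v] for v in range(n + 1)] for u in range(n + 1)]

def satisfies_constraints(X, parts, tol=1e-9):
    """Check the position and flow constraints, dummy position included."""
    all_parts = [[0]] + parts          # V_0 = {dummy node}
    for part in all_parts:
        # Position constraint: node variables in a position sum to 1.
        if abs(sum(X[u][u] for u in part) - 1) > tol:
            return False
        # Flow constraint: edge variables from v into this position sum to x_vv.
        for v in range(len(X)):
            if abs(sum(X[u][v] for u in part) - X[v][v]) > tol:
                return False
    return True

parts = [[1, 2], [3, 4, 5]]            # hypothetical: 2 positions, 5 rotamers
X = lift(parts, choice=[2, 4])
print(satisfies_constraints(X, parts))
```

A choice that takes two nodes from one position and none from another fails the position constraint, so the same check rejects it.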
Geometry of the Solution Vectors
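Geometry of Solution Vectors
Lemma. For each position j, $\sum_{u \in V_j} x_u = x_{u_0}$.
Proof. Let $y = \sum_{u \in V_j} x_u$. Simple algebra shows that:
Length of y is 1: $y^T y = \sum_{u,v \in V_j} x_{uv} = \sum_{v \in V_j} x_{vv} = 1$.
Length of $x_{u_0}$ is 1 (position constraint for $V_0$).
Length of the projection of y onto $x_{u_0}$ is 1: $y^T x_{u_0} = \sum_{u \in V_j} x_{u u_0} = x_{u_0 u_0} = 1$.
Two unit vectors with inner product 1 are equal, so $y = x_{u_0}$.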
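Solution Vectors Lie on a Sphere
Note. The length of the projection of $x_u$ onto $x_{u_0}$ is the squared length of $x_u$, because $x_u^T x_{u_0} = x_{uu}$ (flow constraint for node u and the dummy position).
Each solution vector lies on a sphere of radius ½ centered at $x_{u_0}/2$:
$a^2 = \|x_u - x_{u_0}/2\|^2 = x_{uu} - x_u^T x_{u_0} + \tfrac{1}{4} = \tfrac{1}{4}$
[Figure: origin O, vectors $x_{u_0}$ and $x_u$, radius a]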
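The sphere property holds for any feasible point of the relaxation, not only the optimum, so it can be checked numerically without an SDP solver. The sketch below builds vectors whose Gram matrix is a convex combination of lifted 0/1 solutions (which satisfies the constraints by linearity) and measures their distance from $x_{u_0}/2$. The instance, the weights, and the helper name `gram_vectors` are hypothetical.

```python
import math

def gram_vectors(n, solutions, weights):
    """Vectors x_0..x_n whose Gram matrix is sum_k w_k * s_k s_k^T.

    Each s_k is the 0/1 vector of a feasible choice with the dummy node 0
    always set, so coordinate k of x_u is sqrt(w_k) * s_k[u].
    """
    vectors = [[0.0] * len(solutions) for _ in range(n + 1)]
    for k, (choice, w) in enumerate(zip(solutions, weights)):
        for u in [0] + list(choice):
            vectors[u][k] = math.sqrt(w)
    return vectors

# Hypothetical instance: positions {1,2} and {3,4,5}; three feasible choices.
solutions = [(1, 3), (2, 4), (1, 5)]
weights = [0.5, 0.3, 0.2]              # convex weights, summing to 1
vecs = gram_vectors(5, solutions, weights)

centre = [c / 2 for c in vecs[0]]      # x_{u_0} / 2
for u, x in enumerate(vecs):
    print(u, round(math.dist(x, centre), 6))
```

Every coordinate of $x_u - x_{u_0}/2$ is $\pm\sqrt{w_k}/2$, which is why the distance does not depend on u.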
How do we round the solution of the SDP relaxation?
Convert fractional solutions into feasible 0/1 solutions:
Projection rounding
Perron-Frobenius rounding
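Projection Rounding
Since $\sum_{u \in V_j} x_{uu} = 1$ and $x_{uu} \ge 0$, the $x_{uu}$ give a probability distribution at each position.
Pick node u with probability $x_{uu}$.
$x_{uu}$ = length of the projection of $x_u$ onto $x_{u_0}$.
[Figure: origin O, vectors $x_{u_0}$, $x_u$, $x_v$; the matrix X]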
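A minimal sketch of the sampling step, assuming the positions are rounded independently of one another. The function name `projection_round`, the toy diagonal, and the 0-based node numbering without a dummy node are assumptions for this illustration; only the diagonal of X is used.

```python
import random

def projection_round(X, parts, rng=random):
    """Pick one node per position, node u with probability X[u][u]."""
    choice = []
    for part in parts:
        probs = [X[u][u] for u in part]
        # random.choices draws one item using the given relative weights.
        choice.append(rng.choices(part, weights=probs, k=1)[0])
    return choice

# Hypothetical fractional solution: only the diagonal matters here.
parts = [[0, 1], [2, 3, 4]]
diag = [0.7, 0.3, 0.2, 0.5, 0.3]
X = [[diag[u] if u == v else 0.0 for v in range(5)] for u in range(5)]

rng = random.Random(0)                 # seeded for reproducibility
print(projection_round(X, parts, rng))
```

Because exactly one node is drawn per position, the result is always a feasible solution.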
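Drift for Projection Rounding
Drift: the expected difference between the fractional and rounded solutions.
It comes entirely from pairwise interactions: $\delta_{uv} = E(u,v)\,(x_{uv} - \Pr[uv])$, with $\Pr[uv] = x_{uu}\,x_{vv}$ when positions are rounded independently.
In fact, $\delta_{uv} = E(u,v)\,y_u^T y_v$, where $y_u = x_u - x_{uu}\,x_{u_0}$ is the component of $x_u$ orthogonal to $x_{u_0}$.
By Cauchy-Schwarz, $|\delta_{uv}| \le |E(u,v)|\,\|y_u\|\,\|y_v\|$.
Because the $x_u$ are on a sphere, $\|y_u\|^2 = x_{uu} - x_{uu}^2 \le \tfrac{1}{4}$, so $|\delta_{uv}| \le |E(u,v)|/4$.
[Figure: vectors $x_u$, $x_v$ and their components $y_u$, $y_v$]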
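The drift can be evaluated directly from a fractional X. The sketch below assumes that positions are rounded independently, so that Pr[uv] = x_uu · x_vv for nodes in different positions; that independence assumption, the toy matrix, and the function name `drift` are illustrative and not taken from the talk.

```python
def drift(X, parts, edge_energy):
    """Sum of E(u,v) * (x_uv - Pr[uv]) over the weighted edges.

    Assumes positions are rounded independently, so Pr[uv] = x_uu * x_vv
    for u and v in different positions.
    """
    position = {u: j for j, part in enumerate(parts) for u in part}
    total = 0.0
    for (u, v), energy in edge_energy.items():
        assert position[u] != position[v], "edges join different positions"
        total += energy * (X[u][v] - X[u][u] * X[v][v])
    return total

# Hypothetical fractional X: an equal mix of the choices (0, 2) and (1, 3).
parts = [[0, 1], [2, 3]]
X = [[0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5]]
edge_energy = {(0, 2): 1.0, (0, 3): 2.0}
print(drift(X, parts, edge_energy))
```

In this example the fractional solution never pairs node 0 with node 3, but independent rounding does so a quarter of the time, which is where the negative drift comes from.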
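Perron-Frobenius Rounding
Optimal integral solution: $X^* = x^* {x^*}^T$, where $x^*$ is the 0/1 characteristic n-vector of the optimal solution, so rank($X^*$) = 1.
Idea: approximate the fractional X by a rank-1 matrix $qq^T$.
Ideally we would use the optimal characteristic vector itself, but we settle for a vector q.
q needs to contain a probability distribution for each position.
How do we choose q?
[Figure: the vector q split by position, with the entries of each position summing to 1]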
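Possible Choices for q
Lemma. Any nonnegative vector q with $L_1$-norm p in the image space of X contains the required set of probability distributions.
Proof. $X = W^T W$, where $W = [x_1\ x_2\ \dots\ x_n]$. Let $1_i$ be the characteristic vector for position i. Suppose $q = Xy$ for some y. Then
$1_i^T q = 1_i^T X y = \sum_{v} y_v \sum_{u \in V_i} x_{uv} = \sum_{v} y_v\, x_{vv}$,
using the flow constraints in the last step.
The final value is independent of i, so every position has the same sum. Since q is nonnegative with $L_1$-norm p and there are p positions, each position sums to 1.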
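A Choice for q
By spectral decomposition, $X = \sum_{i} \lambda_i z_i z_i^T$, where $\lambda_1 \ge \lambda_2 \ge \dots \ge 0$ and the $z_i$ are orthonormal eigenvectors.
Take $q = p\,z_1 / \|z_1\|_1$.
By the Perron-Frobenius theorem for nonnegative matrices, $q \ge 0$.
$z_1$ is in the image space of X.
By the Lemma, q contains the needed probability distributions.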
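One way to obtain such a q without a linear algebra library is power iteration, sketched below. Power iteration is a stand-in chosen for this illustration (the talk does not say how the eigenvector is computed), and the toy matrix, the iteration count, and the names `perron_vector` and `choose_q` are assumptions. Every iterate after the first multiplication lies in the image space of X, so the per-position sums are already equal before convergence.

```python
def perron_vector(X, iterations=500):
    """Leading eigenvector of a nonnegative PSD matrix by power iteration."""
    n = len(X)
    z = [1.0] * n
    for _ in range(iterations):
        z = [sum(X[u][v] * z[v] for v in range(n)) for u in range(n)]
        norm = sum(z)                  # entries stay nonnegative
        z = [value / norm for value in z]
    return z

def choose_q(X, num_positions):
    """Scale the leading eigenvector so that its L1-norm is p."""
    z = perron_vector(X)
    return [num_positions * value for value in z]

# Hypothetical X: a convex mix of the lifted choices (0,2), (1,3) and (0,3),
# so the position and flow constraints hold by linearity.
parts = [[0, 1], [2, 3]]
mix = [((0, 2), 0.5), ((1, 3), 0.3), ((0, 3), 0.2)]
X = [[0.0] * 4 for _ in range(4)]
for choice, w in mix:
    for u in choice:
        for v in choice:
            X[u][v] += w

q = choose_q(X, num_positions=len(parts))
for part in parts:
    print(part, round(sum(q[u] for u in part), 6))
```

Each block of q can then be used as the sampling distribution for its position.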
Computational Results
Compare solutions from:
Simple LP
SDP
Fractional
Projection rounded
Perron-Frobenius rounded
Test set: 30 random graphs, each with 60 nodes and 15 positions, edge probability ½, weights drawn uniformly from [0,1]
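A test instance of this shape might be generated as follows. Several details are assumptions, since the slide does not state them: nodes are split evenly across positions (4 per position), node weights as well as edge weights are drawn uniformly from [0,1], and edges are only placed between different positions. The function name `random_instance` and the seed are hypothetical.

```python
import random

def random_instance(num_nodes=60, num_positions=15, edge_prob=0.5, seed=0):
    """Random p-partite instance; see the stated assumptions above."""
    rng = random.Random(seed)
    per_position = num_nodes // num_positions   # assumed: equal-sized positions
    parts = [list(range(j * per_position, (j + 1) * per_position))
             for j in range(num_positions)]
    # Assumed: self-energies are also uniform on [0, 1].
    node_energy = {u: rng.random() for part in parts for u in part}
    edge_energy = {}
    for j, part in enumerate(parts):
        for other in parts[j + 1:]:
            for u in part:
                for v in other:
                    # Each cross-position pair becomes an edge with prob. 1/2.
                    if rng.random() < edge_prob:
                        edge_energy[(u, v)] = rng.random()
    return parts, node_energy, edge_energy

parts, node_energy, edge_energy = random_instance()
print(len(parts), len(node_energy), len(edge_energy))
```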
Future Work
Can the rounding schemes be applied to other problems?
Can the semidefinite program be sped up?
─ Can only routinely solve graphs with ≤ 120 nodes (reasonable protein problems contain 1000 to 5000 nodes)
─ The $x_{uv} \ge 0$ constraints are the bottleneck
Can the requirement of a fixed backbone be relaxed?
We’ve worked quite a bit with real proteins using an LP approach.
It seems an SDP formulation might be useful.
More Information
The Side-Chain Positioning Problem: A Semidefinite Programming Formulation with New Rounding Schemes, B. Chazelle, C. Kingsford, M. Singh, Proc. ACM FCRC'2003, Principles of Computing and Knowledge: Paris Kanellakis Memorial Workshop (2003).