Doug Raiford Lesson 19
Protein Conformation Prediction (Part III), 10/26/2015
Framework model
  Secondary structure forms first
  Secondary structure segments are then assembled
Hydrophobic collapse
  Molten globule: compact but denatured
  Secondary structure forms afterward, as the chain settles in
  van der Waals forces and hydrogen bonds require close proximity
Two main approaches (focus of this lesson: de novo)
Structure prediction methods: De novo; Ab initio; Molecular dynamics; Knowledge based; Lattice; Off-lattice; Local structure; Comparative / Homology modeling
Did a quick look at threading (homology based). Chou-Fasman (frequency of occurrence of amino acids at specific locations in structure). Looked at HMMs (HMMER and Protein Families, PFAM). Looked at ROSETTA (de novo, knowledge based).
(Table: Chou-Fasman propensities; columns Name, P(a), P(b), P(turn); rows shown: Alanine, Arginine, Aspartic Acid, Valine; values not recovered)
Lattice approach
  Abstraction: take a problem of extreme complexity and simplify it
  Levinthal's paradox (Levinthal: physicist; Berkeley, MIT, Columbia)
  A protein with 100 amino acids has an astronomically large number of possible structures
  Even if each structure could be sampled in a tiny fraction of a second, it would take about 1.6 × 10^27 years to go through all structures
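The arithmetic behind this estimate can be sketched as below. The two inputs are assumptions rather than figures taken from the slide: 3 conformations per residue and 10^-13 seconds per sample, the values commonly quoted for Levinthal's argument. With those inputs the result lands at roughly 1.6 × 10^27 years.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_years(n_residues, states_per_residue=3, seconds_per_sample=1e-13):
    """Years needed to sample every conformation once.

    states_per_residue and seconds_per_sample are assumed values,
    not numbers recovered from the slide.
    """
    conformations = states_per_residue ** n_residues
    return conformations * seconds_per_sample / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{levinthal_years(100):.2e} years")
```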
Premise: proteins fold into the lowest-energy conformation
  Reduce complexity by restricting amino acid locations to evenly spaced lattice points
  Generate all possible conformations (within certain constraints)
  Lowest-energy models should be representative
Only occupy nodes of a lattice. Globular: limit the number of nodes to 50. Ellipsoidal bounding volume. No nodes without at least 2 connecting edges (no dead ends). Fewer nodes than amino acids in the sequence (n/2), so the sequence must be aligned after the fact. From 0 to 3 residues between nodes.
Limit to sequence length of 100 (n). Energy function statistically derived (versus computationally expensive energy calculations). Minimal edge lattice: diamond lattice. Between 10^5 and 10^7 enumerated conformations.
“We are able to do exhaustive searches of compact, bounded lattice structures with up to approximately 40 vertices. These searches take on the order of a few hours on a fast workstation, and can easily be executed in parallel over several machines.”
At most 3 choices at each node. Self-avoiding, therefore much pruning. Constrained to a small volume (ellipsoid). Probably recursive enumeration with self-avoidance. Filter with a symmetry check: remove conformations that differ only in their orientation.
(Figure note: 26 already; remember, total of 50)
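A minimal sketch of recursive, self-avoiding enumeration on a diamond lattice is shown below. It is an illustration under stated assumptions, not the method used in the work being described: the integer coordinate scheme, the function names, and the origin-centered ellipsoid test are all choices made here, and the dead-end and symmetry filters are left out.

```python
# Bond vectors of the diamond (tetrahedral) lattice. Nodes with even
# coordinates use these offsets; nodes with odd coordinates use the negatives.
EVEN_STEPS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def neighbors(node):
    """Return the four lattice neighbors of a node."""
    x, y, z = node
    sign = 1 if x % 2 == 0 else -1
    return [(x + sign * dx, y + sign * dy, z + sign * dz) for dx, dy, dz in EVEN_STEPS]


def inside_ellipsoid(node, semi_axes):
    """True if the node lies inside an origin-centered ellipsoid."""
    return sum((c / a) ** 2 for c, a in zip(node, semi_axes)) <= 1.0


def enumerate_walks(n_steps, semi_axes=None):
    """Enumerate self-avoiding walks of n_steps bonds starting at the origin.

    After the first bond there are at most 3 choices per node, because the
    node just left is already occupied.
    """
    walks = []

    def extend(path, visited):
        if len(path) == n_steps + 1:
            walks.append(tuple(path))
            return
        for nxt in neighbors(path[-1]):
            if nxt in visited:
                continue  # self-avoidance prunes this branch
            if semi_axes is not None and not inside_ellipsoid(nxt, semi_axes):
                continue  # bounding volume prunes this branch
            path.append(nxt)
            visited.add(nxt)
            extend(path, visited)
            path.pop()
            visited.remove(nxt)

    start = (0, 0, 0)
    extend([start], {start})
    return walks


if __name__ == "__main__":
    for steps in range(1, 7):
        print(steps, len(enumerate_walks(steps)))
```

Because the walks are generated from a fixed start node, many of them differ only by rotation or reflection; a symmetry filter would be applied afterward to keep one representative of each.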
How to align the sequence? Remember there are more amino acids than nodes (from 0 to 3 residues between nodes). How to score the overall energy of a conformation? How to judge similarity to the known (native) conformation of a protein?
Iterative/Dynamic
  Start out evenly spaced
  For each node, determine the seven possible residues
  Choose the lowest-energy one not taken previously
  Rinse and repeat
  Converges in 3 to 5 iterations
  (Figure: sequence position plotted against nodal position)
(Figure: residues m-1, m, m+1 and n-1, n, n+1)
Energy associated with an m,n contact: weighted average of 5 adjacent energies
  The m,n pair itself is given double weight
  The rest are given single weight
  Average of all energies (divide by 6)
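The weighting above can be sketched as follows. Which four neighboring pairs enter the average is an assumption made here, namely (m-1, n), (m+1, n), (m, n-1), and (m, n+1); the slide only states that there are five energies with weights 2, 1, 1, 1, 1. The `pair_energy` argument is a hypothetical lookup returning the energy for two residue types.

```python
def node_contact_energy(seq, m, n, pair_energy):
    """Weighted average energy for a contact between positions m and n.

    The (m, n) pair gets double weight; four neighboring pairs (an assumed
    choice) get single weight; the sum is divided by 6.
    """
    last = len(seq) - 1
    if not (1 <= m < last and 1 <= n < last):
        raise ValueError("m and n must be interior sequence positions")
    weighted_pairs = [
        (m, n, 2),
        (m - 1, n, 1),
        (m + 1, n, 1),
        (m, n - 1, 1),
        (m, n + 1, 1),
    ]
    total = 0.0
    for i, j, weight in weighted_pairs:
        total += weight * pair_energy(seq[i], seq[j])
    return total / 6.0


if __name__ == "__main__":
    # Toy energy function with made-up values, for illustration only.
    def toy(a, b):
        return -3.0 if a == "L" and b == "L" else 0.0

    print(node_contact_energy("ALAALA", 1, 4, toy))
```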
But where did e(r_m, r_n) come from? Statistically derived.
Given a database of proteins, the energy of any given combination of two amino acids u and v is based on a ratio of observed to expected contacts
  Observed: across all proteins, the number of v's next to u's
  Expected: the expected number of u,v contacts, which depends on how "contacty" a given protein is
  If the ratio is 1, then across all proteins there are about as many u,v contacts as expected
  If >1, then more
  If <1, then fewer
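One way such a ratio could be computed is sketched below. The exact formula from the slide is not available, so two assumptions are made here: the expected count for a protein is its total number of contacts times the composition frequencies of u and v, and the ratio is turned into an energy as the negative logarithm. The input layout (a sequence plus a list of contacting index pairs) and the toy data are also made up for illustration.

```python
import math
from collections import Counter


def contact_ratios(proteins):
    """Return observed/expected contact ratios for unordered residue pairs.

    proteins is a list of (sequence, contacts) tuples, where contacts is a
    list of (i, j) index pairs that are in contact.
    """
    observed = Counter()
    expected = Counter()
    for seq, contacts in proteins:
        for i, j in contacts:
            observed[tuple(sorted((seq[i], seq[j])))] += 1
        freq = {aa: count / len(seq) for aa, count in Counter(seq).items()}
        kinds = sorted(freq)
        for a, u in enumerate(kinds):
            for v in kinds[a:]:
                ways = 1 if u == v else 2  # u,v and v,u count as one pair
                expected[(u, v)] += len(contacts) * ways * freq[u] * freq[v]
    return {pair: observed[pair] / exp for pair, exp in expected.items() if exp > 0}


def pair_energy(ratio):
    """Assumed conversion: ratio > 1 is favorable (negative energy)."""
    return math.inf if ratio == 0 else -math.log(ratio)


if __name__ == "__main__":
    toy_db = [("AABB", [(0, 2), (1, 3)])]  # made-up example data
    for pair, ratio in sorted(contact_ratios(toy_db).items()):
        print(pair, ratio, pair_energy(ratio))
```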
Instead of limiting residues to regularly spaced lattice nodes in space, limit the phi and psi angles to a reduced set of discrete angles.
Off-lattice models often attempt to minimize total energy
  G: free energy; H: enthalpy; S: entropy
  ΔE = q - w
  ΔH = ΔE + Δ(PV)
  S = k ln Ω
  ΔG = ΔG(van der Waals) + ΔG(H-bonds) + ΔG(solvent) + ΔG(Coulomb)
Backbone RMSD (root mean square deviation)
  Usually choose the top 100 or so predictions and show that the actual (native) structure resides in the set
  (Figure: top 100 conformations, with the actual structure marked among them)
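A minimal sketch of the RMSD calculation and the top-100 check follows. It assumes the two structures are already superimposed and have matching atom order; an optimal superposition step (for example the Kabsch algorithm) is deliberately omitted. The function names and the (energy, coordinates) layout are choices made here.

```python
import math


def backbone_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) backbone coordinates.

    Assumes the structures are already superimposed.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


def closest_in_top(scored, native, top=100):
    """Smallest RMSD to the native among the `top` lowest-energy predictions.

    scored is a list of (energy, coords) tuples.
    """
    best = sorted(scored, key=lambda item: item[0])[:top]
    return min(backbone_rmsd(coords, native) for _, coords in best)


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(backbone_rmsd(native, model))
```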
PDB ATOM records; numeric columns X, Y, Z, Occu, Temp are followed by Element (numeric values not recovered)
  ATOM  1  N    THR  A    N
  ATOM  2  CA   THR  A    C
  ATOM  3  C    THR  A    C
  ATOM  4  O    THR  A    O
  ATOM  5  CB   THR  A    C
  ATOM  6  OG1  THR  A    O
  ATOM  7  CG2  THR  A    C
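Records like these can be read by slicing fixed character columns, as sketched below using the standard PDB ATOM column layout. The helper names are chosen here, and the coordinates, occupancy, temperature factor, and residue number in the demo are made-up values, since the slide's numbers were not recovered.

```python
def parse_atom_line(line):
    """Parse one fixed-width PDB ATOM record into a dict."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "temp_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z, occ, temp, element):
    """Build an ATOM record with each field in its fixed column range."""
    return (
        "ATOM  "
        + f"{serial:>5}"
        + " "
        + f"{name:<4}"
        + " "  # alternate location indicator, left blank
        + f"{res_name:>3}"
        + " "
        + f"{chain:1}"
        + f"{res_seq:>4}"
        + " " * 4  # insertion code plus three unused columns
        + f"{x:8.3f}{y:8.3f}{z:8.3f}"
        + f"{occ:6.2f}{temp:6.2f}"
        + " " * 10
        + f"{element:>2}"
    )


if __name__ == "__main__":
    # Made-up values for illustration; not taken from the slide.
    line = format_atom_line(2, "CA", "THR", "A", 1, 11.104, 6.134, -6.504, 1.0, 0.0, "C")
    print(line)
    print(parse_atom_line(line))
```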
(Table: Chou-Fasman parameters; columns Name, P(a), P(b), P(turn), f(i), f(i+1), f(i+2), f(i+3); values not recovered)
  Rows: Alanine, Arginine, Aspartic Acid, Asparagine, Cysteine, Glutamic Acid, Glutamine, Glycine, Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Proline, Serine, Threonine, Tryptophan, Tyrosine, Valine