Protein Folding
Bioinformatics Ch. 7 (with a little of Ch. 8)
The Protein Folding Problem
Central question of molecular biology: given a particular sequence of amino acid residues (primary structure), what will the tertiary/quaternary structure of the resulting protein be?
Input: AAVIKYGCAL…
Output: $\phi_1, \psi_1, \phi_2, \psi_2, \ldots$ = backbone conformation (no side chains yet)
Disulfide Bonds
Two cysteines in close proximity can form a covalent bond by oxidation of their –SH groups: a disulfide bond, disulfide bridge, or dicysteine bond.
Disulfide bonds significantly stabilize tertiary structure.
Protein Folding – Biological perspective
Central dogma: sequence specifies structure
Denature – to unfold a protein back to a random coil configuration
  – β-mercaptoethanol – breaks disulfide bonds
  – Urea or guanidine hydrochloride – denaturants
  – Also heat or pH
Anfinsen's experiments
  – Denatured ribonuclease
  – Spontaneously regained enzymatic activity once the denaturant and reducing agent were removed
  – Evidence that it re-folded to its native conformation
Folding intermediates
Levinthal's paradox
  – Consider a 100-residue protein. If each residue can take only 3 positions, there are $3^{100} \approx 5 \times 10^{47}$ possible conformations.
  – If it takes $t$ seconds to convert from one structure to another, exhaustive search would take $3^{100}\,t$ seconds; for any realistic $t$ that is an astronomical number of years!
Folding must proceed by progressive stabilization of intermediates
  – Molten globules – most secondary structure formed, but much less compact than the native conformation.
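The Levinthal arithmetic can be checked in a few lines of Python. The slide's time per conformational change did not survive in this copy, so `SECONDS_PER_CHANGE = 1e-13` below is an assumed, illustrative value, not the lecture's number; any realistic choice still gives an absurd search time.

```python
# Levinthal-style back-of-the-envelope estimate.
# Assumptions (not from the slide): 1e-13 s per conformational change,
# and a year of 365.25 days.
N_RESIDUES = 100
STATES_PER_RESIDUE = 3
SECONDS_PER_CHANGE = 1e-13  # assumed, illustrative value
SECONDS_PER_YEAR = 365.25 * 24 * 3600

n_conformations = STATES_PER_RESIDUE ** N_RESIDUES  # exact integer 3^100
search_seconds = n_conformations * SECONDS_PER_CHANGE
search_years = search_seconds / SECONDS_PER_YEAR

print(f"conformations: {n_conformations:.2e}")        # conformations: 5.15e+47
print(f"exhaustive search: {search_years:.1e} years")  # exhaustive search: 1.6e+27 years
```

Because the search time scales linearly with the assumed interval, even a thousandfold faster interconversion only removes three orders of magnitude from an exponent of about 27.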
Forces driving protein folding
Hydrophobic collapse is believed to be a key driving force for protein folding
  – Hydrophobic core
  – Polar surface interacting with solvent
Minimum volume (no cavities)
Disulfide bond formation stabilizes the fold
Hydrogen bonds
Polar and electrostatic interactions
Folding help
Proteins are, in fact, only marginally stable
  – The native state is typically only 5 to 10 kcal/mol more stable than the unfolded form
Many proteins help other proteins fold
  – Protein disulfide isomerase – catalyzes shuffling of disulfide bonds
  – Chaperones – break up aggregates and (in theory) unfold misfolded proteins
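To see what a 5 to 10 kcal/mol stability gap means at equilibrium, a minimal sketch can convert it into an unfolded fraction. It assumes a simple two-state model (only native and unfolded states) at 25 °C; the temperature and the two-state assumption are ours, not the slide's.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)
T = 298.15         # assumed temperature: 25 degrees C

def unfolded_fraction(delta_g_kcal: float, temp: float = T) -> float:
    """Two-state model: delta_g_kcal = G(unfolded) - G(native), positive if native is favoured.

    K = [native]/[unfolded] = exp(dG / RT); the unfolded fraction is 1 / (1 + K).
    """
    k_native_over_unfolded = math.exp(delta_g_kcal / (R_KCAL * temp))
    return 1.0 / (1.0 + k_native_over_unfolded)

for dg in (5.0, 10.0):
    # prints about 2.2e-04 for 5 kcal/mol and 4.7e-08 for 10 kcal/mol
    print(f"dG = {dg:4.1f} kcal/mol -> unfolded fraction {unfolded_fraction(dg):.1e}")
```

The gap is small compared with the many individual interactions that make up a fold, which is why single mutations can tip the balance, yet at equilibrium it still leaves only a tiny fraction of molecules unfolded.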
The Hydrophobic Core
Hemoglobin A is the protein in red blood cells (erythrocytes) responsible for binding oxygen.
The mutation E6V (Glu → Val at position 6) in the β chain places a hydrophobic Val on the surface of hemoglobin.
The resulting sticky patch causes hemoglobin S to agglutinate (stick together) and form fibers, which deform the red blood cells so that they do not carry oxygen efficiently.
Sickle cell anemia was the first identified molecular disease.
Sickle Cell Anemia
Sequestering hydrophobic residues in the protein core protects proteins from hydrophobic agglutination.
Computational Problems in Protein Folding
Two key questions:
  – Evaluation – how can we tell a correctly folded protein from an incorrectly folded one?
      H-bonds, electrostatics, hydrophobic effect, etc.
      Derive a function and see how well it does on real proteins
  – Optimization – once we have an evaluation function, can we optimize it?
      Simulated annealing/Monte Carlo
      EC (evolutionary computation)
      Heuristics
Fold Optimization
Simple lattice models (HP models)
  – Two types of residues: hydrophobic and polar
  – 2-D or 3-D lattice
  – The only force is hydrophobic collapse
  – Score = number of H-H contacts
Scoring Lattice Models
H/P model scoring: count noncovalent hydrophobic interactions, i.e. pairs of H residues that are lattice neighbors but not adjacent in the chain.
Sometimes:
  – Penalize buried polar or surface hydrophobic residues
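A minimal sketch of the basic H/P score on a 2-D square lattice, assuming the fold is given as a self-avoiding list of `(x, y)` tuples in chain order. The function name `hp_contacts` is our own, and the optional penalty terms for buried polar or exposed hydrophobic residues are not implemented.

```python
from typing import Sequence, Tuple

Coord = Tuple[int, int]

def hp_contacts(seq: str, coords: Sequence[Coord]) -> int:
    """Count H-H pairs that are lattice neighbours but not chain neighbours.

    Assumes coords is self-avoiding (no two residues on the same site).
    """
    if len(seq) != len(coords):
        raise ValueError("sequence and coordinates must have the same length")
    index = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            j = index.get(nb)
            # j > i + 1 counts each pair once and skips the covalent neighbour
            if j is not None and j > i + 1 and seq[j] == "H":
                contacts += 1
    return contacts

# HPPH folded into a square: residues 0 and 3 touch, so the score is 1.
print(hp_contacts("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))
```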
What can we do with lattice models?
For smaller polypeptides, exhaustive search can be used
  – Looking at the best fold, even in such a simple model, can teach us interesting things about the protein folding process
For larger chains, other optimization and search methods must be used
  – Greedy, branch and bound
  – Evolutionary computing, simulated annealing
  – Graph-theoretical methods
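Exhaustive search for a short chain can be sketched as a depth-first enumeration of self-avoiding walks on a 2-D square lattice, pruning any branch as soon as it bumps. This is one straightforward way to do it, not the lecture's own code; the first bond is fixed to `R` because every other first move is just a rotation. The number of walks grows roughly like $3^n$, which is why this only works for small polypeptides.

```python
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int]
STEPS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def hp_contacts(seq: str, coords: List[Coord]) -> int:
    """Non-bonded H-H lattice contacts (the HP-model score)."""
    index: Dict[Coord, int] = {c: i for i, c in enumerate(coords)}
    total = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in STEPS.values():
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":
                total += 1
    return total

def best_fold(seq: str) -> Tuple[int, str]:
    """Enumerate every self-avoiding walk; return (best score, absolute moves)."""
    if not seq:
        return 0, ""
    best_score, best_path = -1, ""
    coords: List[Coord] = [(0, 0)]
    occupied: Set[Coord] = {(0, 0)}

    def extend(path: str) -> None:
        nonlocal best_score, best_path
        if len(coords) == len(seq):
            score = hp_contacts(seq, coords)
            if score > best_score:
                best_score, best_path = score, path
            return
        x, y = coords[-1]
        # Fix the first bond to "R": other first moves are rotations of it.
        moves = "R" if len(coords) == 1 else "UDLR"
        for m in moves:
            dx, dy = STEPS[m]
            nxt = (x + dx, y + dy)
            if nxt in occupied:  # prune: this branch already has a bump
                continue
            coords.append(nxt)
            occupied.add(nxt)
            extend(path + m)
            coords.pop()
            occupied.remove(nxt)

    extend("")
    return best_score, best_path

print(best_fold("HPPHPPH"))  # best score is 2 for this string
```

Ties are broken by whichever optimal walk is found first, so the returned move string is one optimum among possibly several.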
Learning from Lattice Models
The hydrophobic zipper effect (Ken Dill, ~1997)
Representing a lattice model
Absolute directions
  – UURRDLDRRU
Relative directions
  – LFRFRRLLFL
  – Advantage: an immediate reversal (UD or RL in absolute directions) cannot be encoded
  – Only three directions: L, R, F
What about bumps? LFRRR
  – Bad score
  – Use a better representation
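Both encodings can be decoded into lattice coordinates with a short sketch. The starting conventions are assumptions: the chain begins at `(0, 0)`, and the relative decoder initially faces +x (the absolute `R`). Under that assumption the two example strings on the slide describe the same chain, and `LFRRR` shows that the relative encoding still permits bumps.

```python
from typing import List, Tuple

Coord = Tuple[int, int]
ABSOLUTE = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def decode_absolute(moves: str) -> List[Coord]:
    """Absolute directions: each letter is one unit step in a fixed direction."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = ABSOLUTE[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords

def decode_relative(moves: str, heading: Coord = (1, 0)) -> List[Coord]:
    """Relative directions: turn Left/Right or go Forward, then take one step.

    Assumption: the chain starts at (0, 0) facing +x.
    """
    coords = [(0, 0)]
    dx, dy = heading
    for m in moves:
        if m == "L":
            dx, dy = -dy, dx   # 90-degree counter-clockwise turn
        elif m == "R":
            dx, dy = dy, -dx   # 90-degree clockwise turn
        elif m != "F":
            raise ValueError(f"unknown relative move: {m!r}")
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords

def has_bump(coords: List[Coord]) -> bool:
    """A bump is any lattice site occupied by more than one residue."""
    return len(set(coords)) < len(coords)

# Under the starting-heading assumption, both slide strings give the same chain.
print(decode_relative("LFRFRRLLFL") == decode_absolute("UURRDLDRRU"))  # True
# LFRRR contains no reversal, yet its last residue lands on residue 1.
print(has_bump(decode_relative("LFRRR")))  # True
```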
Preference-order representation
Each position has two preferences
  – If it can't have either of the two, it will take the least favorite path if possible
Example: {LR}, {FL}, {RL}, {FR}, {RL}, {RL}, {FR}, {RF}
Can still cause bumps: {LF}, {FR}, {RL}, {FL}, {RL}, {FL}, {RF}, {RL}, {FL}
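The slide states the decoding rule informally, so the sketch below fixes one hypothetical reading of it: for each gene, try the first preference, then the second, then the remaining third direction; if all three sites are occupied, place the residue on the first preference anyway and count a bump. The start position and heading are assumptions. Under this reading, the first example decodes without bumps (two genes fall back to a less-preferred direction), while the second gets boxed in at its last residue.

```python
from typing import List, Tuple

Coord = Tuple[int, int]

def turn(heading: Coord, move: str) -> Coord:
    """Rotate a unit heading for a relative move L, R or F."""
    dx, dy = heading
    return {"L": (-dy, dx), "R": (dy, -dx), "F": (dx, dy)}[move]

def decode_preferences(genes: List[str]) -> Tuple[List[Coord], int]:
    """Decode preference pairs such as "LR" into coordinates and a bump count.

    Hypothetical rule: first preference, then second, then the third direction;
    if every option collides, use the first preference and record a bump.
    """
    heading: Coord = (1, 0)  # assumed initial heading (+x)
    coords: List[Coord] = [(0, 0)]
    occupied = {(0, 0)}
    bumps = 0
    for gene in genes:
        third = next(m for m in "LFR" if m not in gene)
        x, y = coords[-1]
        for move in (gene[0], gene[1], third):
            h = turn(heading, move)
            site = (x + h[0], y + h[1])
            if site not in occupied:
                break
        else:
            # all three neighbours are taken: fall back to the favourite
            h = turn(heading, gene[0])
            site = (x + h[0], y + h[1])
            bumps += 1
        heading = h
        coords.append(site)
        occupied.add(site)
    return coords, bumps

example_1 = ["LR", "FL", "RL", "FR", "RL", "RL", "FR", "RF"]
example_2 = ["LF", "FR", "RL", "FL", "RL", "FL", "RF", "RL", "FL"]
print(decode_preferences(example_1)[1])  # 0 bumps
print(decode_preferences(example_2)[1])  # 1 bump (last residue is boxed in)
```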
Decoding the representation
The optimizer works on the representation, but to score a candidate we have to decode it into a structure that lets us check for bumps and compute the score.
Example: How many bumps in URDDLLDRURU?
We can do it on graph paper
  – Start at (0,0)
  – Fill in the graph
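The graph-paper exercise can be automated. This sketch assumes the chain starts at `(0, 0)` and counts a bump each time a residue is placed on a site already occupied by an earlier residue (one reasonable convention; others count colliding pairs instead). With that convention, URDDLLDRURU has 3 bumps: the last three residues land back on sites visited earlier.

```python
from typing import List, Tuple

Coord = Tuple[int, int]
STEP = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def decode_absolute(moves: str) -> List[Coord]:
    """Start at (0, 0) and take one unit step per absolute direction."""
    x, y = 0, 0
    coords = [(x, y)]
    for m in moves:
        dx, dy = STEP[m]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords

def count_bumps(coords: List[Coord]) -> int:
    """Count residues placed on a site that an earlier residue already occupies."""
    seen = set()
    bumps = 0
    for c in coords:
        if c in seen:
            bumps += 1
        seen.add(c)
    return bumps

path = decode_absolute("URDDLLDRURU")
print(path)
print(count_bumps(path))  # 3
```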
More realistic models
Higher-resolution lattices (45° lattice, etc.)
Off-lattice models
  – Local moves
  – Optimization/search methods and $\phi/\psi$ representations
      Greedy search
      Branch and bound
      EC, Monte Carlo, simulated annealing, etc.
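As an illustration of one listed method, here is a simulated-annealing sketch. For brevity it stays on the 2-D HP lattice rather than an off-lattice model, and every parameter (`steps`, `t_start`, `t_end`, `bump_penalty`, the geometric cooling schedule) is a hypothetical choice. The local move changes one relative direction, which pivots the rest of the chain; bumps are allowed during the search but penalized, and only bump-free conformations are reported.

```python
import math
import random
from typing import List, Tuple

Coord = Tuple[int, int]

def decode(moves: str) -> List[Coord]:
    """Relative moves (L/R/F) -> coordinates; start at (0, 0) facing +x."""
    x, y, dx, dy = 0, 0, 1, 0
    coords = [(x, y)]
    for m in moves:
        if m == "L":
            dx, dy = -dy, dx
        elif m == "R":
            dx, dy = dy, -dx
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords

def evaluate(seq: str, moves: str) -> Tuple[int, int]:
    """Return (H-H contacts, bumps) for a relative-move string."""
    coords = decode(moves)
    bumps = len(coords) - len(set(coords))
    contacts = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip chain neighbours
            if seq[i] == "H" and seq[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    contacts += 1
    return contacts, bumps

def anneal(seq: str, steps: int = 5000, t_start: float = 2.0,
           t_end: float = 0.05, bump_penalty: float = 2.0,
           seed: int = 0) -> Tuple[int, str]:
    """Simulated annealing; energy = -contacts + bump_penalty * bumps.

    Returns (best contact count, moves) among the bump-free states visited.
    """
    if len(seq) < 2:
        return 0, ""
    rng = random.Random(seed)

    def energy(moves: str) -> float:
        contacts, bumps = evaluate(seq, moves)
        return -contacts + bump_penalty * bumps

    current = "F" * (len(seq) - 1)  # straight chain: no contacts, no bumps
    e_cur = energy(current)
    best_score, best_moves = 0, current
    for step in range(steps):
        # geometric cooling from t_start down to t_end
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(len(current))
        new_move = rng.choice([m for m in "LRF" if m != current[pos]])
        candidate = current[:pos] + new_move + current[pos + 1:]
        e_new = energy(candidate)
        # Metropolis criterion: accept downhill/neutral moves, uphill with prob exp(-dE/T)
        if e_new <= e_cur or rng.random() < math.exp(-(e_new - e_cur) / t):
            current, e_cur = candidate, e_new
            contacts, bumps = evaluate(seq, current)
            if bumps == 0 and contacts > best_score:
                best_score, best_moves = contacts, current
    return best_score, best_moves

print(anneal("HPHPPHHPHPPHPHHPPHPH"))
```

Unlike exhaustive search, annealing gives no guarantee of finding the optimum; the result depends on the schedule and the random seed.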
Threading: Fold recognition
Given:
  – Sequence: IVACIVSTEYDVMKAAR…
  – A database of molecular coordinates
Map the sequence onto each fold
Evaluate
  – Objective 1: improve the scoring function
  – Objective 2: folding
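The "map, then evaluate" loop can be illustrated with a deliberately toy version of threading. It assumes a hypothetical fold library in which each template is reduced to one burial label per position (`B` buried, `E` exposed) and the query is already reduced to an H/P string. Placements are gapless and scored only by environment agreement, which is far simpler than the gapped alignments and pairwise potentials that real threading programs use.

```python
from typing import Dict, Optional, Tuple

def placement_score(seq: str, env: str) -> int:
    """+1 when H sits on a buried site (B) or P on an exposed site (E), else -1."""
    return sum(1 if (res == "H") == (site == "B") else -1
               for res, site in zip(seq, env))

def best_threading(seq: str, folds: Dict[str, str]) -> Optional[Tuple[str, int, int]]:
    """Try every gapless offset of seq on every fold; return (fold, offset, score)."""
    best: Optional[Tuple[str, int, int]] = None
    for name, env in folds.items():
        for offset in range(len(env) - len(seq) + 1):
            score = placement_score(seq, env[offset:offset + len(seq)])
            if best is None or score > best[2]:
                best = (name, offset, score)
    return best

# Hypothetical fold library: one burial label per template position.
folds = {"fold_a": "EEBBEEBB", "fold_b": "BEBEBEBE"}
print(best_threading("PHHP", folds))  # ('fold_a', 1, 4)
```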
X-Ray Crystallography
Crystal size: ~0.5 mm
The crystal is a mosaic of millions of copies of the protein.
As much as 70% is solvent (water)!
May take months (and a green thumb) to grow.
X-Ray diffraction
The image is averaged over:
  – Space (many copies)
  – Time (of the diffraction experiment)
The Protein Data Bank
ATOM 1 N ALA E APR 213
ATOM 2 CA ALA E APR 214
ATOM 3 C ALA E APR 215
ATOM 4 O ALA E APR 216
ATOM 5 CB ALA E APR 217
ATOM 6 N GLY E APR 218
ATOM 7 CA GLY E APR 219
ATOM 8 C GLY E APR 220
ATOM 9 O GLY E APR 221
ATOM 10 N VAL E APR 222
ATOM 11 CA VAL E APR 223
ATOM 12 C VAL E APR 224
ATOM 13 O VAL E APR 225
ATOM 14 CB VAL E APR 226
ATOM 15 CG1 VAL E APR 227
ATOM 16 CG2 VAL E APR 228
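PDB ATOM records use fixed column positions, so a single record can be parsed with string slices. The sketch below follows the published column layout for the fields shown on the slide plus the x/y/z coordinates; the function name is our own, and the example line is synthetic with placeholder coordinates, not data from any real entry. For real work, a maintained parser such as Biopython's `Bio.PDB` handles the many edge cases this sketch ignores.

```python
from typing import Dict, Union

Field = Union[str, int, float]

def parse_atom_record(line: str) -> Dict[str, Field]:
    """Parse one ATOM/HETATM line by PDB fixed-column positions (0-based slices)."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {record!r}")
    return {
        "record": record,
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),       # columns 13-16, e.g. N, CA, CB
        "res_name": line[17:20].strip(),   # columns 18-20, e.g. ALA
        "chain": line[21].strip(),         # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
    }

# Synthetic line built column by column; the coordinates are placeholders.
example = ("ATOM  " + f"{1:>5}" + "  N   ALA E" + f"{1:>4}" + "    "
           + "".join(f"{v:8.3f}" for v in (1.0, 2.0, 3.0)))
print(parse_atom_record(example))
```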