Protein Structure Prediction: Sequence to Structure
Dr. G.P.S. Raghava
Protein Structure Prediction
Experimental techniques: X-ray crystallography and NMR.
Limitations of current experimental techniques: the Protein Data Bank (PDB) holds about 23,000 protein structures, whereas SwissProt holds about 100,000 proteins and the non-redundant (NR) database about 1,000,000 proteins.
Importance of structure prediction: filling the gap between known sequences and known structures; protein engineering, to alter the function of a protein; rational drug design.
Different Levels of Protein Structure
Protein Architecture
Proteins consist of amino acids linked by peptide bonds. Each amino acid consists of a central carbon atom (Cα), an amino group, a carboxyl group and a side chain. Differences in the side chains distinguish the various amino acids.
Amino Acid Side Chains
Side chains vary in size, shape and polarity.
Peptide Bond
Peptide Bonds
Dihedral Angles
Conformational Flexibility
The backbone is the main chain of atoms linked by peptide bonds, excluding the side chains. Its conformation is set by torsion (rotation) angles around the N-Cα bond (φ) and the Cα-C bond (ψ).
Steric hindrance: most restrictive for Pro, least for Gly.
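The φ and ψ angles above are ordinary dihedral angles, so they can be computed from four atom positions. Below is a minimal Python sketch that assumes plain (x, y, z) tuples in Å; the helper names are illustrative. By the usual convention, φ of residue i uses the atoms C(i-1), N(i), Cα(i), C(i), and ψ uses N(i), Cα(i), C(i), N(i+1).

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Signed torsion angle in degrees (-180..180) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane p0-p1-p2
    n2 = _cross(b2, b3)  # normal to the plane p1-p2-p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# phi(i) = dihedral(C[i-1], N[i], CA[i], C[i]); psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
```

A real analysis would first read backbone coordinates from a PDB file, for example with a structure-parsing library. That step is left out here.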
Ramachandran Plot
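A Ramachandran plot places each residue at its (φ, ψ) pair, and the allowed combinations cluster around helical and extended conformations. As a rough illustration only, the sketch below assigns a (φ, ψ) pair to the nearest of a few textbook reference points, with angles wrapping at ±180°. Several values here are assumptions: (-57°, -47°) for the right-handed α-helix, (-119°, 113°) for parallel β, (57°, 47°) for the left-handed helix, and the 60° cutoff. Real Ramachandran analysis uses empirically derived region boundaries, not single points.

```python
import math

# Assumed reference (phi, psi) points in degrees; real allowed regions are broad areas.
REFERENCE = {
    "alpha_R": (-57.0, -47.0),  # right-handed alpha-helix
    "beta": (-119.0, 113.0),    # parallel beta-strand
    "alpha_L": (57.0, 47.0),    # left-handed helix, mostly seen for Gly
}


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def classify(phi: float, psi: float, cutoff: float = 60.0) -> str:
    """Label a residue by its nearest reference point, or 'other' if none is within cutoff."""
    best, best_dist = "other", float("inf")
    for name, (phi0, psi0) in REFERENCE.items():
        dist = math.hypot(angular_diff(phi, phi0), angular_diff(psi, psi0))
        if dist < best_dist:
            best, best_dist = name, dist
    return best if best_dist <= cutoff else "other"
```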
Protein Secondary Structure
Regular secondary structure: α-helices, β-sheets.
Irregular secondary structure: tight turns, random coils, bulges.
Secondary Structure: Helices
α-helix: formed by hydrogen bonds between the backbone carbonyl (C=O) group of residue i and the amino (N-H) group of residue i+4, repeated along the length of the polypeptide chain.
Helix formation is local: each hydrogen bond joins residues that are close in sequence. Example: thyroid hormone receptor (PDB ID 2NLL).
Secondary Structure: Beta Sheets
β-pleated sheet: formed by hydrogen bonds between adjacent β-strands. The strands may belong to different polypeptide chains or to distant parts of the same chain.
β-sheet formation is not local.
Definition of β-turn
A β-turn is defined by four consecutive residues i, i+1, i+2 and i+3 that meet three conditions: they do not form a helix, their Cα(i)-Cα(i+3) distance is less than 7 Å, and they reverse the direction of the protein chain (Richardson, 1981). The conformation of a β-turn is defined by the φ and ψ angles of the two central residues, i+1 and i+2. β-turns are classified into different types on the basis of these angles.
[Figure: residues i to i+3 of a β-turn, with the i to i+3 hydrogen bond and the Cα(i)-Cα(i+3) distance D < 7 Å]
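The distance part of this definition translates directly into a scan over Cα coordinates. This is a minimal sketch under two assumptions: Cα positions are given as (x, y, z) tuples in Å, and secondary structure is given as a per-residue string in which 'H' marks helix (a hypothetical input format). It checks only the helix and 7 Å criteria. It does not check the chain-reversal condition or assign a turn type from φ and ψ.

```python
import math
from typing import List, Sequence, Tuple


def find_beta_turns(ca: Sequence[Tuple[float, float, float]], ss: str,
                    max_dist: float = 7.0) -> List[int]:
    """Return start indices i of candidate beta-turns (residues i..i+3)."""
    if len(ca) != len(ss):
        raise ValueError("need one secondary-structure letter per residue")
    starts = []
    for i in range(len(ca) - 3):
        if "H" in ss[i:i + 4]:
            continue  # the four residues must not form a helix
        if math.dist(ca[i], ca[i + 3]) < max_dist:  # Calpha(i)-Calpha(i+3) < 7 A
            starts.append(i)
    return starts
```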
Tight turns
Type      No. of residues   H-bonding
δ-turn    2                 NH(i)-CO(i+1)
γ-turn    3                 CO(i)-NH(i+2)
β-turn    4                 CO(i)-NH(i+3)
α-turn    5                 CO(i)-NH(i+4)
π-turn    6                 CO(i)-NH(i+5)
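The table can be read as a lookup from hydrogen-bond span to turn type. The hypothetical helper below classifies a backbone hydrogen bond from the residue numbers of its CO and NH groups. Treating spans of six or more residues as non-local, β-sheet-like pairing is an assumption made for illustration. A CO(i)-NH(i+4) bond repeated along a segment forms an α-helix rather than an isolated α-turn.

```python
# Offset = residue number of the NH group minus residue number of the CO group,
# following the H-bonding column of the tight-turn table.
TURN_BY_OFFSET = {
    -1: "delta-turn",  # NH(i)-CO(i+1)
    2: "gamma-turn",   # CO(i)-NH(i+2)
    3: "beta-turn",    # CO(i)-NH(i+3)
    4: "alpha-turn",   # CO(i)-NH(i+4)
    5: "pi-turn",      # CO(i)-NH(i+5)
}


def classify_hbond(co_res: int, nh_res: int, nonlocal_min: int = 6) -> str:
    """Name the tight-turn type implied by one backbone CO...HN hydrogen bond."""
    offset = nh_res - co_res
    if offset in TURN_BY_OFFSET:
        return TURN_BY_OFFSET[offset]
    if abs(offset) >= nonlocal_min:  # assumed threshold for "non-local"
        return "non-local (e.g. beta-sheet pairing)"
    return "other"
```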
Secondary Structure shortcuts
Tertiary Structure: Hexokinase (6000 atoms, 48 kDa, 457 amino acids)
Polypeptides with a tertiary level of structure are usually referred to as globular proteins, since their shape is irregular and globular in form.
Quaternary Structure: Haemoglobin
What Determines Fold?
Anfinsen's experiments in 1957 demonstrated that proteins can fold spontaneously into their native conformations under physiological conditions. This implies that the primary structure determines the fold, i.e. the 3-D structure.
Some exceptions exist: chaperone proteins assist folding, and abnormally folded prion proteins can catalyze the misfolding of normal prion proteins, which then aggregate.
Levels of Description of Structural Complexity
Primary structure: the amino acid sequence.
Secondary structure: the spatial arrangement of a polypeptide's backbone atoms without regard to side-chain conformations; α, β, coil and turns (Venkatachalam, 1968).
Super-secondary structure: combinations of secondary structure elements, e.g. α/β and α+β (Rao and Rossmann, 1973).
Tertiary structure: the 3-D structure of an entire polypeptide.
Quaternary structure: the spatial arrangement of subunits (two or more polypeptide chains).
Techniques of Structure Prediction
Computer simulation based on energy calculation: rests on physico-chemical principles. It assumes the native protein is at thermodynamic equilibrium with minimum free energy and searches for the global minimum of the free-energy surface.
Knowledge-based approaches: the homology-based approach, threading the protein sequence onto known folds, and hierarchical methods.
Energy Minimization Techniques
Energy minimization methods, in their pure form, make no a priori assumptions and attempt to locate the global minimum.
Static minimization methods: these are classical methods, and many potential functions can be constructed. The atoms of the protein are assumed to be static. Problems: a large number of variables and minima, and the validity of the potentials.
Dynamical minimization methods: the motions of atoms are also considered. Monte Carlo simulation is stochastic in nature and does not consider time. Molecular dynamics follows the system in time using classical (or quantum-mechanical) equations of motion.
Limitations: the large number of degrees of freedom exceeds available CPU power, and the interaction potentials are not accurate enough to model proteins reliably.
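The Monte Carlo idea (random moves with no notion of time) can be shown on a toy problem. The sketch below runs a Metropolis Monte Carlo search on a hypothetical one-dimensional double-well energy rather than a real protein force field. The energy function, step size and kT are illustrative assumptions. The search occasionally accepts uphill moves, which lets it escape the local minimum it starts in.

```python
import math
import random
from typing import Tuple


def energy(x: float) -> float:
    # Toy double-well "energy landscape" (an assumption, not a force field):
    # local minimum near x = +0.96, global minimum near x = -1.04.
    return (x * x - 1.0) ** 2 + 0.3 * x


def metropolis(start: float, steps: int = 20000, step_size: float = 0.5,
               kT: float = 0.5, seed: int = 0) -> Tuple[float, float]:
    """Metropolis Monte Carlo walk; returns the lowest-energy point visited."""
    rng = random.Random(seed)
    x, e = start, energy(start)
    best_x, best_e = x, e
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)  # random trial move
        e_new = energy(x_new)
        # Accept downhill moves always; uphill moves with probability exp(-dE/kT).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e
```

Started in the shallower well at x = 1.0, the walk typically crosses the barrier and reports a best point in the deeper well. A protein has thousands of coupled variables instead of one.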
Molecular Dynamics
Molecular dynamics is a dynamic simulation: it lets us observe the motion of large molecules such as proteins at the atomic level. Newton's second law, F = ma, is applied to every atom. A potential energy function of the molecular coordinates gives the force on every atom, as the negative gradient of the potential. Integrating the equations of motion step by step then determines the trajectory of the molecule.
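The force-then-trajectory loop can be sketched with the velocity Verlet integrator, a common choice in MD codes. This minimal Python example assumes a hypothetical one-dimensional "molecule": two atoms joined by a harmonic bond with V = 0.5 * k * (r - r0)^2, in arbitrary units. Real MD uses full force fields in three dimensions.

```python
from typing import List, Tuple


def potential(x: List[float], k: float = 1.0, r0: float = 1.0) -> float:
    """Harmonic bond energy between atom 0 and atom 1 (toy potential)."""
    return 0.5 * k * ((x[1] - x[0]) - r0) ** 2


def forces(x: List[float], k: float = 1.0, r0: float = 1.0) -> List[float]:
    """Force on each atom, F = -dV/dx; equal and opposite by construction."""
    f = k * ((x[1] - x[0]) - r0)
    return [f, -f]


def velocity_verlet(x: List[float], v: List[float], masses: List[float],
                    dt: float, steps: int) -> Tuple[List[float], List[float]]:
    x, v = list(x), list(v)
    f = forces(x)
    for _ in range(steps):
        # Half-step velocity update, then full-step position update.
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        f = forces(x)  # forces from the potential at the new positions
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]
    return x, v


def total_energy(x: List[float], v: List[float], masses: List[float]) -> float:
    kinetic = sum(0.5 * m * vi * vi for vi, m in zip(v, masses))
    return kinetic + potential(x)
```

Total energy should stay nearly constant over the run. This is a standard sanity check for an MD integrator.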
Knowledge-Based Approaches
Homology modelling: needs a homologue of known structure. It models the backbone, then the side chains, and fails in the absence of homology.
Threading-based methods: a new approach to fold recognition in which the sequence is fitted onto known structures. It involves motif recognition and loop and side-chain modelling, and fails in the absence of a known example of the fold.
Homology Modeling
The simplest and most reliable approach. Basis: proteins with similar sequences tend to fold into similar structures. It has been observed that even proteins with 25% sequence identity fold into similar structures. It does not work for remote homologues (< 25% pairwise identity).
Homology Modeling
Given: a query sequence Q and a database of known protein structures.
Find a protein P with high sequence similarity to Q, and return P's structure as an approximation to Q's structure.
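The "find P with high sequence similarity" step can be sketched as a global alignment followed by a percent-identity ranking. This illustrative Python sketch differs from production tools in three ways:
- It uses a plain Needleman-Wunsch alignment with hypothetical scores (match +1, mismatch -1, gap -1).
- Real searches use substitution matrices such as BLOSUM62 and affine gap penalties.
- It measures identity over the length of the shorter sequence, which is only one of several conventions.

```python
from typing import Dict, Tuple


def align(a: str, b: str, match: int = 1, mismatch: int = -1,
          gap: int = -1) -> Tuple[str, str]:
    """Global (Needleman-Wunsch) alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned residues as a percentage of the shorter sequence."""
    if not a or not b:
        return 0.0
    aa, bb = align(a, b)
    identical = sum(1 for x, y in zip(aa, bb) if x == y and x != "-")
    return 100.0 * identical / min(len(a), len(b))


def best_template(query: str, templates: Dict[str, str]) -> Tuple[str, float]:
    """Return (name, identity) of the template most identical to the query."""
    return max(((name, percent_identity(query, seq)) for name, seq in templates.items()),
               key=lambda item: item[1])
```

A best identity below about 25% would fall in the remote-homologue range where homology modelling does not work.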
Threading
Given: the sequence of a protein P with unknown structure, and a database of known folds.
Find: the most plausible fold for P, and evaluate the quality of that arrangement.
Threading places the residues of P along the backbone of a known structure and determines the stability of the side chains in that arrangement.
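A toy version of threading scores how well each residue fits the environment of the fold position it is placed on. This hypothetical sketch makes several simplifications:
- Each fold is reduced to a string of buried ('B') and exposed ('E') positions.
- Hydrophobic residues are rewarded in buried sites and polar residues in exposed sites.
- Only ungapped placements are tried.
- The residue set and the +1/-1 scores are assumptions.

Real threading methods use statistical pair or environment potentials and allow gaps.

```python
from typing import Dict, List, Tuple

HYDROPHOBIC = set("AVILMFWC")  # assumption: a crude hydrophobic residue set


def placement_score(seq: str, env: str) -> int:
    """+1 when residue and site agree (hydrophobic/buried or polar/exposed), else -1."""
    score = 0
    for aa, site in zip(seq, env):
        buried = site == "B"
        hydrophobic = aa in HYDROPHOBIC
        score += 1 if buried == hydrophobic else -1
    return score


def thread(seq: str, folds: Dict[str, str]) -> List[Tuple[int, str]]:
    """Best ungapped placement score for each fold, highest score first."""
    results = []
    for name, env in folds.items():
        if len(env) < len(seq):
            continue  # the sequence must fit on the fold
        best = max(placement_score(seq, env[k:k + len(seq)])
                   for k in range(len(env) - len(seq) + 1))
        results.append((best, name))
    results.sort(reverse=True)
    return results
```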
Hierarchical Methods
Instead of predicting the tertiary structure directly from the amino acid sequence, these methods predict intermediate structures first:
- the backbone structure
- secondary structure (helix, sheet, coil)
- β-turns
- super-secondary structure

Tertiary structure prediction follows.
Limitations: accuracy is only 75-80%, and only a three-state prediction is made.
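Secondary-structure accuracy figures such as the 75-80% above are usually reported as per-residue three-state accuracy (Q3). Q3 is the percentage of residues whose predicted state (helix H, strand E or coil C) matches the observed one. Below is a minimal sketch. It assumes one common convention for reducing the eight DSSP states to three: H, G, I to H; E, B to E; everything else to C.

```python
# One common 8-to-3 state reduction of DSSP codes (assumed convention).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(states: str) -> str:
    """Map DSSP letters to H/E/C; anything not listed becomes coil."""
    return "".join(DSSP_TO_3.get(s, "C") for s in states)


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```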
Thanks