Protein structure Anne Mølgaard, Center for Biological Sequence Analysis
“Could the search for ultimate truth really have revealed so hideous and visceral-looking an object?” Max Perutz, 1964, on protein structure. (Image: John Kendrew, 1959, with myoglobin model)
Holdings of the Protein Data Bank (PDB). Table columns: Sep. 2001, Feb. […]; rows: X-ray, NMR, theoretical, total. Only one value, 3380, survives.
Methods for structure determination: X-ray crystallography; Nuclear Magnetic Resonance (NMR); modeling techniques
Modeling: only applicable to ~50% of sequences; fast; accuracy poor at low sequence identity. There is still a need for experimental structure determination!
Structural Genomics Consortium (SGC): The SGC deposited its 275th structure into the Protein Data Bank in August 2006 and is currently operating at a pace of 170 structures per year, at a cost of USD 125,000 per structure. Scientific highlights include: several (> 1!!) novel structures of protein kinases; completion of the structural descriptions of the human adenylate kinase and cytosolic sulfotransferase protein families; human chromatin-modifying enzymes; human inositol phosphate signaling; and a significant number of structures from human parasites.
Amino acids
Livingstone & Barton, CABIOS, 9, 1993. A – Ala, C – Cys, D – Asp, E – Glu, F – Phe, G – Gly, H – His, I – Ile, K – Lys, L – Leu, M – Met, N – Asn, P – Pro, Q – Gln, R – Arg, S – Ser, T – Thr, V – Val, W – Trp, Y – Tyr
Levels of protein structure: primary, secondary, tertiary, quaternary
Primary structure MKTAALAPLFFLPSALATTVYLA GDSTMAKNGGGSGTNGWGEYL ASYLSATVVNDAVAGRSAR… (etc)
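The primary structure is simply the residue sequence written in one-letter codes. As a minimal sketch, the code table from the "Amino acids" slide can be used to spell out a sequence in three-letter codes and to count its composition. The function names are illustrative, and only the first line of the fragment shown on the slide is used, since the rest of the sequence is elided there.

```python
# One-letter to three-letter codes, copied from the "Amino acids" slide.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(sequence):
    """Translate one-letter codes into a hyphen-separated three-letter string."""
    return "-".join(ONE_TO_THREE[residue] for residue in sequence.upper())


def composition(sequence):
    """Count how often each residue type occurs in the sequence."""
    counts = {}
    for residue in sequence.upper():
        counts[residue] = counts.get(residue, 0) + 1
    return counts


# First 23 residues of the fragment on the slide (hypothetical usage).
fragment = "MKTAALAPLFFLPSALATTVYLA"
print(to_three_letter(fragment[:5]))
print(composition(fragment))
```

A lookup with `ONE_TO_THREE[residue]` raises `KeyError` on any character outside the 20 standard codes, which is a deliberate choice here so that typos in a sequence are not silently ignored.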
Ramachandran plot: α-helix, β-sheet, left-handed α-helix
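A Ramachandran plot charts the backbone dihedral angles φ and ψ of each residue: φ is defined by the atoms C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below shows one common way to compute a signed dihedral angle from four atom positions. The helper names and the test coordinates are made up for illustration and are not taken from any real structure.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees around the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: the first three points are fixed, the fourth is rotated.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # cis arrangement
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # trans arrangement
```

For φ, the four points would be the coordinates of C(i-1), N(i), Cα(i) and C(i) read from a structure file; for ψ, those of N(i), Cα(i), C(i) and N(i+1).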
Hydrophobic core: Hydrophobic side chains go into the core of the molecule, but the main chain is highly polar. The polar groups (C=O and NH) are neutralized through formation of H-bonds.
Secondary structure: α-helix, C=O (n) … HN (n+4); β-sheet (anti-parallel)
… and all the rest: 3₁₀ helices (C=O (n) … HN (n+3)), π-helices (C=O (n) … HN (n+5)), β-turns and loops (in old textbooks sometimes referred to as random coil)
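The three helix types differ only in how far along the chain the hydrogen bond reaches: C=O of residue n bonds to HN of residue n+3, n+4 or n+5. A small sketch of that rule follows; the function name and the list of hydrogen bonds are hypothetical, and real assignment programs use additional criteria (for example, runs of consecutive bonds) that are not modeled here.

```python
# Offset k in the pattern C=O(n) ... HN(n+k), mapped to the helix type.
HELIX_BY_OFFSET = {
    3: "3-10 helix",
    4: "alpha-helix",
    5: "pi-helix",
}


def classify_hbond(acceptor_index, donor_index):
    """Name the helix type implied by a single C=O(n) ... HN(n+k) bond."""
    offset = donor_index - acceptor_index
    return HELIX_BY_OFFSET.get(offset, "other")


# Hypothetical (C=O residue, HN residue) pairs, for illustration only.
hbonds = [(1, 5), (2, 6), (3, 6), (10, 15), (4, 30)]
for acceptor, donor in hbonds:
    print(acceptor, donor, classify_hbond(acceptor, donor))
```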
The α-helix has a dipole moment: partial positive charge (+) at the N-terminus, partial negative charge (−) at the C-terminus
Two types of β-sheet: anti-parallel and parallel
Tertiary structure (domains, modules): Rhamnogalacturonan lyase (1nkg); Rhamnogalacturonan acetylesterase (1k7c)
Quaternary structure: B. caldolyticus UPRTase (1i5e); B. subtilis PRPP synthase (1dkr)
Protein structure and water: A. aculeatus RG acetylesterase
Classification schemes: SCOP – manual classification (A. Murzin); CATH – semi-manual classification (C. Orengo); FSSP – automatic classification (L. Holm)
Levels in SCOP. Table columns: Class, # Folds, # Superfamilies, # Families. Rows: All alpha proteins; All beta proteins; Alpha and beta proteins (a/b); Alpha and beta proteins (a+b); Multi-domain proteins; Membrane and cell surface proteins; Small proteins; Total.
Major classes in SCOP: All alpha proteins; All beta proteins; Alpha and beta proteins (a/b); Alpha and beta proteins (a+b); Multi-domain proteins; Membrane and cell surface proteins; Small proteins
All α: Hemoglobin (1bab)
All β: Immunoglobulin (8fab)
a/b: Triose phosphate isomerase (1hti)
a+b: Lysozyme (1jsf)
Folds*: Proteins which have >~50% of their secondary structure elements arranged in the same order in the protein chain and in three dimensions are classified as having the same fold. No evolutionary relationship between the proteins is implied. *Confusingly also called fold classes.
Superfamilies: Proteins which are (remotely) evolutionarily related: sequence similarity is low; they share function; they share special structural features. Relationships between members of a superfamily may not be readily recognizable from the sequence alone.
Families: Proteins whose evolutionary relationship is readily recognizable from the sequence (>~25% sequence identity). Families are further subdivided into Proteins. Proteins are divided into Species; the same protein may be found in several species.
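The >~25% sequence identity rule of thumb for families can be made concrete with a small sketch. It assumes the two sequences have already been aligned (gaps written as "-") and counts identity over the columns where both sequences have a residue. Other definitions of the denominator exist, so the numbers are illustrative only; the function names and the toy alignment are hypothetical.

```python
FAMILY_IDENTITY_THRESHOLD = 25.0  # percent, the ">~25%" rule of thumb above


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over the gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def suggests_same_family(aligned_a, aligned_b):
    """True if identity exceeds the family rule-of-thumb threshold."""
    return percent_identity(aligned_a, aligned_b) > FAMILY_IDENTITY_THRESHOLD


# Toy alignment, invented for illustration.
print(percent_identity("MKTAALAPLF", "MKSA-LGPIF"))
print(suggests_same_family("MKTAALAPLF", "MKSA-LGPIF"))
```

Members of the same superfamily would typically fall below this threshold, which is why their relationship has to be inferred from structure and function rather than from the sequence alone.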
Links: PDB (protein structure database); SCOP (protein classification database) – scop.berkeley.edu; CATH (protein classification database); FSSP (protein classification database)
Why are protein structures so interesting? They provide a detailed picture of interesting biological features, such as active site, substrate specificity, allosteric regulation, etc. They aid in rational drug design and protein engineering. They can elucidate evolutionary relationships undetectable by sequence comparisons.
Inferring biological features from the structure (1deo). Figure labels: NH2, COOH, Asp, His, Ser, topological switchpoint.
Inferring biological features from the structure: active site of triose phosphate isomerase (1ag1) (Verlinde et al. (1991) Eur. J. Biochem. 198, 53)
Engineering thermostability in serpins: overpacking, buried polar groups, cavities. Im, Ryu & Yu (2004) Engineering thermostability in serine protease inhibitors. PEDS, 17.
Evolution... Structure is conserved longer than both sequence and function
Rhamnogalacturonan acetylesterase (A. aculeatus) (1k7c); Platelet activating factor acetylhydrolase (Bos taurus) (1wab); Serine esterase (S. scabies) (1esc)
Platelet activating factor acetylhydrolase; Serine esterase; Rhamnogalacturonan acetylesterase. Mølgaard, Kauppinen & Larsen (2000) Structure, 8.
"We wish to suggest a structure for the salt of deoxyribose nucleic acid (D.N.A.). This structure has novel features which are of considerable biological interest…. …It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material." J.D. Watson & F.H.C. Crick (1953) Nature, 171, 737.