BMMB597E Protein Evolution Protein classification 1.


Protein families
– The first protein structures determined by X-ray crystallography, myoglobin and haemoglobin, were solved (in 1959-60) before their amino acid sequences were determined
– It came as a surprise that the structures were quite similar
– Soon it became clear, on the basis of both sequences and structures, that there were families of proteins

Figure: myoglobin and haemoglobin

50 years earlier, there were some hints …
– E.T. Reichert & A.P. Brown. The differentiation and specificity of corresponding proteins and other vital substances in relation to biological classification and organic evolution: the crystallography of hemoglobins. (Carnegie Institution of Washington, 1909)
– Crystallography 3 years before the discovery of X-ray diffraction?

Reichert and Brown studied interfacial angles in haemoglobin crystals
– Steno's law (1669): different crystals of the same substance may have different sizes and shapes, but the angles between faces are constant for each substance
– They found that the angles differed from species to species
– Similarities in the values of interfacial angles were consistent with the classical taxonomic tree
– They even found differences between oxy- and deoxyhaemoglobin

Most premature scientific result ever?
These results implied:
– That proteins adopted (or at least could adopt) unique structures, to form a crystal
– That protein structures varied between species
– That this variation was parallel with the evolution of the species
– That proteins could change structure as a result of changes in state of ligation
In 1909!

M.O. Dayhoff
– Pioneer of bioinformatics
– Collected protein sequences
– First curated 'database'
– Recognized that proteins form families, on the basis of amino acid sequences
– Computational sequence alignments
– First evolutionary tree
– First amino-acid substitution matrix (later replaced by BLOSUM)

Can relationships among proteins be extended beyond families?
– Families = sets of proteins with such obvious similarities that we assume that they are related
– One question: how much similarity do we need to believe in a relationship?
– How far can evolution go?
– Convergent evolution?
– Cautionary tale: chymotrypsin / subtilisin

Chymotrypsin-subtilisin
Both are proteolytic enzymes:
– Chymotrypsin: mammalian
– Subtilisin: from B. subtilis
Both have catalytic triads
Same function, same mechanism
Sequences 12% similar (near noise level)
However, structures show them to be unrelated
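
A figure such as "12% similar" is usually obtained by counting identical residues over the aligned columns of a pairwise alignment. The following is a minimal sketch of that count, assuming the two sequences are already aligned with "-" as the gap character; the function name and the toy fragments are made up for illustration and are not real chymotrypsin or subtilisin sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of gap-free aligned columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # columns containing a gap are not counted
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy aligned fragments (hypothetical, for illustration only)
toy_a = "ACDE-GHIK"
toy_b = "ACDFWGH-K"
print(f"{percent_identity(toy_a, toy_b):.1f}% identical")
```

Conventions differ on the denominator (alignment length, shorter sequence length, or gap-free columns as here), so reported percentages are only comparable when the same convention is used.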

Figure: chymotrypsin / subtilisin

Figure: catalytic triad in serine proteinases

Figure: chymotrypsin and subtilisin have similar catalytic triads

How can we classify proteins that belong to families?
– Align sequences
– Calculate a phylogenetic tree (there are various ways to do this, all of which depend on the sequence alignment)
– Usually, the phylogenetic tree of homologous proteins from different species follows the phylogenetic tree based on classical taxonomy
– That is reassuring
– But what happens as divergence proceeds?
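
The align-then-build-a-tree procedure can be sketched as follows. This is one simple choice among the "various ways" mentioned above, not necessarily the method used in the lecture: distances are the fraction of differing aligned positions, and the tree is built by UPGMA (average-linkage clustering). The species labels and sequences are invented toy data.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over gap-free aligned columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, dist):
    """Return a tree as nested tuples, built by average-linkage clustering.

    dist maps frozenset({name_i, name_j}) to a distance.
    """
    clusters = {name: (name, 1) for name in names}  # id -> (subtree, size)
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
    return next(iter(clusters.values()))[0]


# Toy pre-aligned sequences (hypothetical, for illustration only)
alignment = {
    "human": "ACDEFGHIKL",
    "mouse": "ACDEFGHIKM",
    "chicken": "ACDQFGHLKM",
    "fish": "TCNQFAHLRM",
}
names = list(alignment)
dist = {
    frozenset((x, y)): p_distance(alignment[x], alignment[y])
    for x, y in combinations(names, 2)
}
tree = upgma(names, dist)
print(tree)
```

With this toy data the nesting mirrors the expected taxonomy (the two mammals group first, then the bird, then the fish), which is the "reassuring" agreement described above. UPGMA assumes a constant rate of change along all branches, so real analyses often prefer other tree-building methods.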

How can we classify proteins that do not obviously belong to families?
– Base this on structure rather than sequence
– Structural similarities are maintained as divergence proceeds, better than sequence similarities
– For closely related proteins, expect no difference between sequence-based and structure-based classification
– How far can classification be extended?

SCOP: Structural Classification of Proteins
– Idea of A.G. Murzin, based on old work by C. Chothia and M. Levitt
– Even if two proteins are not obviously homologous, they may share structural features, to a greater or lesser degree
– For instance, the secondary structures of some proteins are only α-helices
– Others have β-sheets but no α-helices

SCOP
– SCOP is a database that gives a hierarchical classification of all protein domains
– Recall that a domain is a compact subunit of a protein structure that 'looks as if' it would have independent stability
Figure: fragment of fibronectin

Dissection of structure into domains
– It is not always quite so obvious how to divide a protein into domains
– There is some (not a lot of) room for argument
– Note that sometimes the chain passes back and forth between domains
– In these cases one or both domains do not consist entirely of a consecutive set of residues
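
Because the chain can pass back and forth between domains, a domain is more safely represented as a list of residue ranges than as a single start and end. A minimal sketch of such a representation follows; the class name, the domain names and the residue numbers are all hypothetical and do not describe any particular protein.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Domain:
    name: str
    segments: List[Tuple[int, int]]  # inclusive (start, end) residue ranges

    def is_contiguous(self) -> bool:
        """True if the domain is one consecutive run of residues."""
        return len(self.segments) == 1

    def n_residues(self) -> int:
        return sum(end - start + 1 for start, end in self.segments)


# Hypothetical two-domain chain: the chain starts in domain 1, crosses into
# domain 2, then returns to finish domain 1.
domain_1 = Domain("domain 1", [(1, 90), (250, 330)])
domain_2 = Domain("domain 2", [(91, 249)])

for dom in (domain_1, domain_2):
    kind = "contiguous" if dom.is_contiguous() else "discontinuous"
    print(dom.name, dom.n_residues(), "residues,", kind)
```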

Figure: lactoferrin

SCOP, CATH and the DALI Database classify protein structures
– SCOP (Structural Classification of Proteins)
– CATH (Class, Architecture, Topology, Homologous superfamily)
– DALI Database
These web sites have many useful features:
– information-retrieval engines, including search by keyword or sequence
– presentation of structure pictures
– links to other related sites, including bibliographical databases

SCOP
– SCOP organizes protein structures in a hierarchy according to evolutionary origin and structural similarity.
– Domains are extracted from the Protein Data Bank entries.
– Sets of domains are grouped into families: sets of domains for which similarities in structure, function and sequence imply a common evolutionary origin.

The SCOP hierarchy
– Families that share a common structure, or even a common structure and a common function, but lack adequate sequence similarity (so that the evidence for evolutionary relationship is suggestive but not compelling) are grouped into superfamilies.
– Superfamilies that share a common folding topology, for at least a large central portion of the structure, are grouped as folds.
– Finally, each fold group falls into one of the general classes.

Major classes in SCOP
– α: secondary structure all helical
– β: secondary structure all sheet
– α/β: contain β-α-β supersecondary structure
– α+β: helices and sheets, but in different parts of the structure
– 'small proteins': often have little secondary structure and are held together by disulphide bridges or ligands (for instance, wheat-germ agglutinin)

Summary of SCOP hierarchy
– Class
– Fold
– Superfamily
– Family
– Domain

SCOP classification of flavodoxin
Protein: Flavodoxin from Clostridium beijerinckii [TaxId: 1520]
Lineage:
– Root: scop
– Class: Alpha and beta proteins (a/b) [51349]. Mainly parallel beta sheets (beta-alpha-beta units)
– Fold: Flavodoxin-like [52171]. 3 layers, a/b/a; parallel beta-sheet of 5 strands, order 21345
– Superfamily: Flavoproteins [52218]
– Family: Flavodoxin-related [52219]. Binds FMN
– Protein: Flavodoxin [52220]
– Species: Clostridium beijerinckii [TaxId: 1520] [52226]
PDB Entry Domains:
– 5nul: complexed with fmn; mutant; chain a [31191]
– 2fax: complexed with fmn; mutant; chain a [31194]
– … many others

Figure: Clostridium beijerinckii flavodoxin (stereo pair)

Figure: flavodoxin and NADPH-cytochrome P450 reductase (same superfamily, different family)

Figure: flavodoxin and CHEY (same fold, different superfamily)

Figure: flavodoxin and spinach ferredoxin reductase (same class, different folds)

Flavodoxin in the SCOP hierarchy
To give some idea of the nature of the similarities expressed by the different levels of the hierarchy:
– Flavodoxin from Clostridium beijerinckii and NADPH-cytochrome P450 reductase are in the same superfamily, but different families.
– Flavodoxin and the signal transduction protein CHEY are in the same fold category, but different superfamilies.
– Flavodoxin and spinach ferredoxin reductase are in the same class (α/β) but have different folds.
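
The three comparisons above amount to walking two lineages from the top of the hierarchy and reporting the last level at which they still agree. The sketch below illustrates this. The flavodoxin lineage uses the names from the SCOP listing above; the entries in parentheses for the other proteins are placeholders (assumptions) that stand for "some different group", not real SCOP names.

```python
LEVELS = ("class", "fold", "superfamily", "family")

flavodoxin = {
    "class": "a/b",
    "fold": "Flavodoxin-like",
    "superfamily": "Flavoproteins",
    "family": "Flavodoxin-related",
}
# Placeholder labels in parentheses are hypothetical, not real SCOP names.
p450_reductase = dict(flavodoxin, family="(a different family)")
chey = dict(
    flavodoxin,
    superfamily="(a different superfamily)",
    family="(a family within it)",
)
ferredoxin_reductase = {
    "class": "a/b",
    "fold": "(a different fold)",
    "superfamily": "(a superfamily within it)",
    "family": "(a family within it)",
}


def deepest_shared_level(a, b):
    """Return the lowest hierarchy level at which two lineages still agree."""
    shared = None
    for level in LEVELS:
        if a[level] != b[level]:
            break
        shared = level
    return shared


for name, other in [
    ("NADPH-cytochrome P450 reductase", p450_reductase),
    ("CHEY", chey),
    ("spinach ferredoxin reductase", ferredoxin_reductase),
]:
    print(name, "->", deepest_shared_level(flavodoxin, other))
```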

CATH presents a classification scheme similar to that of SCOP
– CATH = Class, Architecture, Topology, Homologous superfamily, the levels of its hierarchy.
– In CATH, proteins with very similar structures, sequences and functions are grouped into sequence families.
– A homologous superfamily contains proteins for which similarity of sequence and structure gives evidence of common ancestry.
– A topology or fold family comprises sets of homologous superfamilies that share the spatial arrangement and connectivity of helices and strands.
– Architectures are groups of proteins with similar arrangements of helices and sheets, but with different connectivity. For instance, four α-helix bundles with different connectivities would share the same architecture but not the same topology in CATH.
– General classes of architectures in CATH are: α, β, α-β (subsuming the α/β and α+β classes of SCOP), and domains of low secondary structure content.

Do different classification schemes agree?
– To classify protein structures (or any other set of objects) you need to be able to measure the similarities among them.
– The measure of similarity induces a tree-like representation of the relationships.
– CATH, SCOP, DALI and the others agree, for the most part, on what is similar, and the tree structures of their classifications are therefore also similar.
– However, even an objective measure of similarity does not specify how to define the different levels of the hierarchy.
– These are interpretative decisions, and any apparent differences in the names and distinctions between the levels disguise the underlying general agreement about what is similar and what is different.
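
The point that a similarity measure fixes the tree but not the levels can be illustrated with a small sketch: the same distances, cut at different thresholds, give coarser or finer groupings, and nothing in the distances says which cut deserves to be called "family" or "fold". The code below uses single-linkage grouping as an assumed, deliberately simple clustering rule; the structure labels, distances and thresholds are made up for illustration and are not taken from SCOP, CATH or DALI.

```python
def single_linkage_groups(items, dist, threshold):
    """Group items linked, directly or through a chain, by distances <= threshold."""
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in dist.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical structural distances between five toy structures
items = ["A", "B", "C", "D", "E"]
dist = {
    ("A", "B"): 0.10, ("A", "C"): 0.30, ("B", "C"): 0.35, ("D", "E"): 0.25,
    ("A", "D"): 0.80, ("A", "E"): 0.85, ("B", "D"): 0.80, ("B", "E"): 0.90,
    ("C", "D"): 0.70, ("C", "E"): 0.75,
}

for threshold in (0.2, 0.5, 1.0):
    print(threshold, single_linkage_groups(items, dist, threshold))
```

Two schemes that share these distances but choose different cutoffs would name and draw their levels differently while still agreeing on which structures are closest to which.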