Chapter 14 Protein Structure Classification


A classification of structures is useful for several reasons:
- It helps in understanding evolution.
- It is useful for describing protein fold space:
  - Which of the possible folds exist in nature?
  - How many different folds exist? If the number of folds is finite, the structure prediction problem becomes easier.
- It is useful for classifying a new structure as having a known or a new fold.
- It helps in understanding the relationship between structure and function.
- Classification makes protein 3D structure data more accessible and understandable.
- Classification on the basis of common fold and function informs hypotheses about how proteins evolve new functions.
- What is the relationship between protein fold and folding pathway?
Chapter 14 Structure classification

Protein Structure Classification
Three main systems exist for structure classification:
- CATH: Class, Architecture, Topology, Homologous superfamily
- SCOP: Structural Classification of Proteins
- Dali-FSSP and Dali-DD (FSSP: Fold classification based on Structure-Structure alignment of Proteins)
Most of them use protein domains as the unit of classification.

Protein domains
There is no generally accepted definition of a domain, but some commonly cited properties are:
- A domain is part of a polypeptide chain of a protein, or the whole chain.
- It need not be a contiguous region of the polypeptide chain.
- It can fold independently into its stable fold.
- It has its own function.
- It contains at least one hydrophobic core.
- It is locally compact.

Identifying protein domains
Different classification methods use different properties to define domains, so the domains identified for a protein can vary with the method. The most common concepts used for domain identification are:
- Local compactness: a domain makes more intra-domain contacts than contacts to residues in the remainder of the structure.
- A domain must have at least one hydrophobic core.
- Minimizing the number of chain breaks needed to separate domains, while also measuring the degree of contact between the separated units.
- Solvent-accessible area calculation.
- Secondary structure elements should rarely cross between different domains.
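The local compactness criterion can be illustrated with a small sketch that counts contacts inside and between proposed domains. Everything here is an assumption made for illustration: the function name `count_contacts`, the 8 Å distance cutoff, the use of one point per residue, and the toy coordinates. It is not the procedure of any particular classification system.

```python
import math

def count_contacts(coords, labels, cutoff=8.0):
    """Count residue pairs closer than `cutoff` (assumed value, in angstroms).

    coords: one (x, y, z) point per residue (e.g. C-alpha positions).
    labels: proposed domain label for each residue.
    Returns (intra_domain_contacts, inter_domain_contacts).
    """
    intra = 0
    inter = 0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                if labels[i] == labels[j]:
                    intra += 1
                else:
                    inter += 1
    return intra, inter

# Toy structure: two well-separated clusters of three residues each.
coords = [(0, 0, 0), (3, 0, 0), (0, 3, 0),
          (20, 0, 0), (23, 0, 0), (20, 3, 0)]

good_split = [0, 0, 0, 1, 1, 1]   # follows the spatial clusters
bad_split = [0, 1, 0, 1, 0, 1]    # cuts through both clusters

print(count_contacts(coords, good_split))  # (6, 0)
print(count_contacts(coords, bad_split))   # (2, 4)
```

A split that follows the spatial clusters keeps all contacts inside domains, while a split that cuts through them turns most contacts into inter-domain contacts, which is what the compactness criterion penalizes.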

An Ising model for identifying protein domains
- An Ising model consists of nodes, each of which can be in one of several states.
- Each node has an initial state.
- The states are changed iteratively until all nodes belonging to a "group" are in the same state.
- Whether a node changes depends on the states of its neighbours.
For domain identification:
- The nodes are the residues.
- A group is a domain.
- The states are specified by numerical values.
- The average value of the neighbouring states (neighbouring in space) is used to decide whether to change a state, and to what.

An Ising model for identifying protein domains, cont.
We must then decide:
- The initial value: let it be the residue number.
- The neighbourhood: define a radius around the residue.
- How to update (change) the state of residue i: let $s_i^t$ be the state of residue i after t iterations. Then $s_i^{t+1} = s_i^t + k$, where k is
  - 1 if the neighbourhood has "greater states" than residue i,
  - -1 if the neighbourhood has "lower states" than residue i,
  - 0 otherwise.
- The state of the neighbourhood depends on the states of its residues and on their distances to residue i.
- There must be a method for ensuring that the iteration terminates.
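A minimal sketch of the update rule, under several simplifying assumptions that are not in the slides: the neighbourhood state is the plain (unweighted) mean of the neighbours' states, so distances only decide who is a neighbour; a tolerance `tol` defines when the neighbourhood counts as "greater" or "lower"; residues are updated in place, one after another; and termination is forced by a sweep limit `max_iter`. The function name and the toy coordinates are hypothetical.

```python
import math

def ising_domains(coords, radius=8.0, tol=0.5, max_iter=1000):
    """Iterate s_i <- s_i + k until no state changes (or max_iter sweeps)."""
    n = len(coords)
    # Neighbourhood: all other residues within `radius` of residue i.
    neighbours = [
        [j for j in range(n)
         if j != i and math.dist(coords[i], coords[j]) <= radius]
        for i in range(n)
    ]
    states = list(range(n))  # initial state = residue number
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            if not neighbours[i]:
                continue  # isolated residue keeps its state
            mean = sum(states[j] for j in neighbours[i]) / len(neighbours[i])
            if mean - states[i] > tol:
                k = 1      # neighbourhood has "greater states"
            elif states[i] - mean > tol:
                k = -1     # neighbourhood has "lower states"
            else:
                k = 0
            if k != 0:
                states[i] += k
                changed = True
        if not changed:
            break  # every residue agrees with its neighbourhood
    return states

# Toy structure: two well-separated clusters of three residues each.
coords = [(0, 0, 0), (3, 0, 0), (0, 3, 0),
          (20, 0, 0), (23, 0, 0), (20, 3, 0)]
states = ising_domains(coords)
print(states)  # [1, 1, 1, 4, 4, 4]
```

Residues that end in the same state are read as one domain. The in-place update and the tolerance were chosen here because a fully synchronous update with k restricted to ±1 can oscillate; other choices are possible.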

Domain classes
- The core of a protein is formed by packing the secondary structure elements (SSEs).
- Two types of SSE (alpha helices and beta strands) take part in the packing, hence there are only three types of pairwise connection:
  - alpha with alpha
  - beta with beta
  - alpha with beta
- All these connections may exist in a domain, but very often one of them dominates.
- Domains can therefore be classified according to the dominant connection:
  - Mainly alpha
  - Mainly beta
  - Alpha-beta, which can be divided into alpha/beta and alpha+beta

Folds
- A fold is a particular arrangement of SSEs.
- An open question is how many different folds exist in nature.
- Proteins with the same fold are either homologous or have converged to the same fold.
(Automatic) classification
- SCOP is constructed entirely manually.
- CATH is constructed partly automatically.
- FSSP/DaliDD is constructed fully automatically.
- A representative set of nonredundant (unrelated) structures (less than 25% sequence identity) is constructed from the PDB.
- A distribution of the scores of all pairwise alignments between the unrelated structures is constructed.
- The mean value m and the standard deviation s are calculated.
- Two structures with a score larger than m + 2s are said to have the same fold.
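The m + 2s rule can be sketched as follows. The background scores are invented toy numbers, the function names are hypothetical, and the population standard deviation is used only because the slides do not say which estimator is meant.

```python
import statistics

def fold_threshold(background_scores):
    """Return m + 2s for the scores of unrelated structure pairs."""
    m = statistics.mean(background_scores)
    s = statistics.pstdev(background_scores)  # assumption: population SD
    return m + 2 * s

def same_fold(score, threshold):
    """A pair scoring above the threshold is taken to share a fold."""
    return score > threshold

# Invented pairwise alignment scores between unrelated structures.
background = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
threshold = fold_threshold(background)  # mean 5, SD 2, so threshold is 9

print(same_fold(9.5, threshold))  # True
print(same_fold(8.0, threshold))  # False
```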

Comparison of the different classification methods

Classification by CATH
The structures are first divided into domains. Three different methods for domain identification are used first; if they do not agree, the decision is made manually. The domains are then classified.
Class assignment (three classes):
1. Assign an SSE type to each residue (alpha, beta, loop).
2. Represent the SSEs as sticks.
3. Count the number of residues in each SSE type.
4. Count the number of alpha-alpha and beta-beta contacts.
5. Use the results of steps 2 to 4 to decide the class.
Architecture assignment:
- Based on how the SSEs are organized, independent of topology.
- Performed manually.
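The residue-counting part of class assignment can be sketched as below. This is only an illustration: it ignores the stick representation and the contact counts, the per-residue string encoding (`H` for alpha, `E` for beta, `C` for loop) is an assumption, and the 75% dominance cutoff is a made-up value, not the threshold CATH uses.

```python
from collections import Counter

def assign_class(sse_string, dominance=0.75):
    """Assign one of three classes from per-residue SSE labels.

    sse_string: one character per residue, 'H' (alpha), 'E' (beta), 'C' (loop).
    dominance: hypothetical fraction of SSE residues one type must reach.
    """
    counts = Counter(sse_string)
    alpha = counts["H"]
    beta = counts["E"]
    structured = alpha + beta
    if structured == 0:
        return "unassigned"  # no regular secondary structure at all
    if alpha / structured >= dominance:
        return "mainly alpha"
    if beta / structured >= dominance:
        return "mainly beta"
    return "alpha-beta"

print(assign_class("HHHHHHCCHHHHHH"))  # mainly alpha
print(assign_class("EEEECCEEEE"))      # mainly beta
print(assign_class("HHHHCCEEEE"))      # alpha-beta
```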

Classification by CATH, cont.
Fold assignment (Topology):
- SSAP is used for comparison.
- Main rule: an SSAP score greater than 70, with 60% of the smaller structure matching the larger, is interpreted as a similar fold.
Homologous superfamily:
- Uses SSAP score and sequence identity.
Sequence families:
- Sequence identity greater than 35%. (How should sequence similarity be measured?)
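The fold rule and one possible answer to the sequence-similarity question can be sketched together. The assumptions are: overlap is read as matched residues divided by the length of the smaller structure, "60%" is read as "at least 60%", and percent identity is computed over alignment columns without gaps (other denominators are also in use). The function names and inputs are hypothetical, and the SSAP score is taken as given rather than computed.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)

def similar_fold(ssap_score, matched_residues, len_a, len_b):
    """Main rule from the slide: SSAP score > 70 and overlap of at least 60%."""
    overlap = 100.0 * matched_residues / min(len_a, len_b)
    return ssap_score > 70 and overlap >= 60

def same_sequence_family(aligned_a, aligned_b):
    """Sequence family criterion: identity greater than 35%."""
    return percent_identity(aligned_a, aligned_b) > 35

print(percent_identity("ACD-EF", "ACDGEY"))   # 80.0
print(similar_fold(75, 70, 100, 150))         # True
print(similar_fold(75, 50, 100, 150))         # False
```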

Classification by CATH, the procedure