Chapter 14 Protein Structure Classification
A classification of structures is useful for several reasons:
- It helps in understanding evolution.
- It is useful for describing protein fold space: which of the possible folds exist in nature, and how many different folds exist. If only a finite number of folds exists, the structure prediction problem becomes easier.
- It is useful for classifying a new structure as having a known or a new fold.
- It helps in understanding the relationship between structure and function.
- It makes protein 3D structure data more accessible and understandable.
- Classification on the basis of common fold and function informs hypotheses about how proteins evolve new functions.
Open question: what is the relationship between a protein's fold and its folding pathway?
Chapter 14 Structure classification
Protein Structure Classification
Three main systems exist for structure classification:
- CATH: Class, Architecture, Topology, Homologous superfamily
- SCOP: Structural Classification of Proteins
- Dali-FSSP and Dali-DD (FSSP: Fold classification based on Structure-Structure alignment of Proteins)
Most of them use protein domains as the unit of classification.
Protein domains
There is no generally accepted definition of a domain, but some common properties are:
- A domain is part of a protein's polypeptide chain, or the whole chain.
- It need not be a contiguous region of the polypeptide chain.
- It can fold independently into a stable fold.
- It has its own function.
- It contains at least one hydrophobic core.
- It is locally compact.
Identifying protein domains
Different classification methods use different properties to define domains, so the domains identified for a protein can vary with the method. The most common concepts used for domain identification are:
- Local compactness: a domain makes more intra-domain contacts than contacts with residues in the rest of the structure.
- A domain must have at least one hydrophobic core.
- Minimizing the number of chain breaks needed to separate domains, while also measuring the degree of contact between the separated units.
- Solvent-accessible surface area calculation.
- Secondary structure elements should rarely cross between different domains.
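The local-compactness criterion can be made concrete with a small sketch. It assumes one representative 3D point per residue (for example the C-alpha atom), a distance cutoff for a "contact" (8 Å by default), and a given domain label per residue; these choices and the function names `contacts` and `compactness` are illustrative assumptions, not part of any particular classification method.

```python
import math

# Hypothetical representation: one (x, y, z) point per residue, e.g. its C-alpha atom.
Coord = tuple[float, float, float]


def contacts(coords: list[Coord], cutoff: float = 8.0) -> list[tuple[int, int]]:
    """Return all residue pairs (i, j), i < j, at distance <= cutoff (in Å)."""
    pairs = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


def compactness(coords: list[Coord], labels: list[int], domain: int,
                cutoff: float = 8.0) -> float:
    """Ratio of intra-domain contacts to contacts between the domain and the rest.

    A value well above 1 means the candidate domain is locally compact.
    """
    intra = inter = 0
    for i, j in contacts(coords, cutoff):
        in_i = labels[i] == domain
        in_j = labels[j] == domain
        if in_i and in_j:
            intra += 1      # both residues inside the candidate domain
        elif in_i or in_j:
            inter += 1      # contact crosses the domain boundary
    return intra / inter if inter else math.inf


# Two tiny three-residue "domains" placed along the x-axis.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
          (9.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
labels = [0, 0, 0, 1, 1, 1]
print(compactness(coords, labels, 0, cutoff=7.5))  # 3.0
```

A real method would compare many candidate partitions and keep the one whose domains all score high; the sketch only scores a single, given partition.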
An Ising model for identifying protein domains
An Ising model consists of nodes, each of which can be in one of several states.
- Each node has an initial state.
- The states are updated iteratively until all nodes belonging to a "group" are in the same state.
- How a node's state changes depends on the states of its neighbours.
For domain identification:
- The nodes are the residues.
- A group is a domain.
- The states are numerical values.
- The average of the neighbouring states (neighbours in space) is used to decide whether a state changes, and to what.
An Ising model for identifying protein domains, cont’
We must then decide:
- The initial value: let it be the residue number.
- The neighbourhood: define a radius around the residue.
- How to update (change) the state of residue i. Let $s_i^t$ be the state of residue i after t iterations; then $s_i^{t+1} = s_i^t + k$, where
  k = 1 if the neighbourhood has "greater states" than residue i,
  k = -1 if the neighbourhood has "lower states" than residue i,
  k = 0 otherwise.
  The state of the neighbourhood depends on the states of its residues and on their distances to residue i.
- We also need a method that guarantees the iteration terminates.
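A minimal sketch of this update scheme follows. The slides leave open how the "state of the neighbourhood" is computed, so the sketch assumes a 1/distance-weighted mean of the neighbours' states, a synchronous update of all residues, a small tolerance for treating the mean as equal to the residue's own state, and termination when no state changes or after `max_iter` iterations. The function name `ising_domains` and all default values are hypothetical.

```python
import math


def ising_domains(coords: list[tuple[float, float, float]],
                  radius: float = 7.0, max_iter: int = 1000,
                  tol: float = 1e-9) -> list[int]:
    """Iterate s_i <- s_i + k until residues in the same group share a state."""
    n = len(coords)
    # Initial state of residue i is its residue number.
    states = list(range(n))

    # Precompute each residue's spatial neighbourhood with 1/distance weights
    # (assumption); coincident points are guarded against division by zero.
    neighbours = []
    for i in range(n):
        nb = []
        for j in range(n):
            if j != i:
                d = math.dist(coords[i], coords[j])
                if d <= radius:
                    nb.append((j, 1.0 / max(d, 1e-6)))
        neighbours.append(nb)

    for _ in range(max_iter):           # hard cap guarantees termination
        new_states = states[:]          # synchronous update from old states
        for i in range(n):
            if not neighbours[i]:
                continue                # isolated residue keeps its state
            total_w = sum(w for _, w in neighbours[i])
            mean = sum(states[j] * w for j, w in neighbours[i]) / total_w
            if mean > states[i] + tol:
                new_states[i] += 1      # k = +1: neighbourhood has greater states
            elif mean < states[i] - tol:
                new_states[i] -= 1      # k = -1: neighbourhood has lower states
        if new_states == states:
            break                       # converged: no state changed
        states = new_states
    return states


# Two well-separated clusters of three residues each.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
          (20.0, 0.0, 0.0), (21.0, 0.0, 0.0), (22.0, 0.0, 0.0)]
print(ising_domains(coords, radius=5.0))  # [1, 1, 1, 4, 4, 4]
```

Residues that end with the same state form one domain. Because each step moves a state by at most 1, convergence can be slow for long chains and states may oscillate; in this sketch the `max_iter` cap is what ensures termination.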
Domain classes
The core of a protein is formed by packing its SSEs (secondary structure elements). Two types of SSE, alpha helices and beta strands, take part in the packing, so there are only three types of pairwise connections:
- alpha with alpha
- beta with beta
- alpha with beta
All of these connections may exist in a domain, but very often one of them dominates. Domains can therefore be classified according to which connection dominates:
- Mainly alpha
- Mainly beta
- Alpha-beta, which can be divided into alpha/beta and alpha+beta
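The "dominant connection" idea can be sketched by counting connection types. The input format (one pair of SSE types per packing contact) and the rule "the most frequent connection type wins" are assumptions for illustration; the sketch does not attempt the further split of alpha-beta into alpha/beta and alpha+beta.

```python
from collections import Counter

# Hypothetical input: one (type, type) pair per SSE-SSE packing contact,
# where 'H' marks an alpha helix and 'E' a beta strand.
CLASS_OF = {
    ("H", "H"): "mainly alpha",
    ("E", "E"): "mainly beta",
    ("E", "H"): "alpha-beta",   # pairs are sorted, so alpha-with-beta is ("E", "H")
}


def dominant_class(sse_contacts: list[tuple[str, str]]) -> str:
    """Classify a domain by its most frequent type of SSE-SSE connection."""
    counts = Counter(tuple(sorted(pair)) for pair in sse_contacts)
    if not counts:
        return "unclassified"
    # max() returns the first key with the largest count, so ties are
    # resolved in the order of CLASS_OF (an arbitrary choice).
    best = max(CLASS_OF, key=lambda pair: counts[pair])
    return CLASS_OF[best]


print(dominant_class([("H", "H"), ("H", "H"), ("H", "E")]))  # mainly alpha
```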
Folds
A fold is a particular arrangement of SSEs. An open question is how many different folds exist in nature. Proteins with the same fold are either homologous or have converged to the same fold.
(Automatic) classification:
- SCOP is constructed entirely manually.
- CATH is constructed partly automatically.
- FSSP/Dali-DD is constructed fully automatically:
  - A representative set of nonredundant (unrelated) structures, with less than 25% sequence identity, is selected from the PDB.
  - A distribution is built from the scores of all pairwise alignments between these unrelated structures.
  - The mean m and the standard deviation s of this distribution are calculated.
  - Two structures whose alignment score is larger than m + 2s are said to have the same fold.
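The m + 2s rule amounts to a Z-score cutoff: with Z = (score - m)/s, two structures share a fold when Z > 2. A minimal sketch, assuming the background alignment scores between unrelated structures are already available as a list of numbers (the values below are made up) and using the population standard deviation; the slides do not say whether the population or the sample estimate is used. The function names are hypothetical.

```python
import statistics


def fold_threshold(background_scores: list[float]) -> float:
    """m + 2s over the scores of pairwise alignments between unrelated structures."""
    m = statistics.mean(background_scores)
    s = statistics.pstdev(background_scores)   # population standard deviation
    return m + 2 * s


def same_fold(score: float, background_scores: list[float]) -> bool:
    """True if an alignment score exceeds m + 2s, i.e. its Z-score is above 2."""
    return score > fold_threshold(background_scores)


background = [1.0, 2.0, 3.0, 4.0, 5.0]       # made-up scores for illustration
print(round(fold_threshold(background), 3))  # 5.828
print(same_fold(7.0, background))            # True
```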
Comparison of the different classification methods
Classification by CATH
The structures are first divided into domains. Three different domain identification methods are applied, and if they do not agree, the decision is made manually. The domains are then classified.
Class assignment (three classes):
1. Assign a secondary structure type (alpha, beta, loop) to each residue.
2. Represent the SSEs as sticks.
3. Count the number of residues of each SSE type.
4. Count the number of alpha-alpha and beta-beta contacts.
5. Use steps 2 to 4 to decide the class.
Architecture assignment:
- Based on how the SSEs are organized in space, independent of topology (connectivity).
- Performed manually.
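Steps 3 to 5 of the class assignment can be sketched as follows. The slides do not give the decision rule, so the 0.8 cutoff, the use of the alpha fraction of SSE residues and of SSE contacts, the per-residue string format ('H', 'E', 'L'), and the function name are all hypothetical; steps 1 and 2 and the contact counting of step 4 are assumed to have been done already.

```python
def cath_like_class(ss: str, aa_contacts: int, bb_contacts: int,
                    cut: float = 0.8) -> str:
    """Assign one of three classes from per-residue SSE labels and contact counts.

    ss: one character per residue, 'H' (alpha), 'E' (beta) or 'L' (loop)
    aa_contacts: number of alpha-alpha stick contacts
    bb_contacts: number of beta-beta stick contacts
    cut: hypothetical dominance threshold
    """
    n_alpha = ss.count("H")     # step 3: residues per SSE type
    n_beta = ss.count("E")
    if n_alpha + n_beta == 0:
        return "unclassified"   # no secondary structure to classify by
    alpha_frac = n_alpha / (n_alpha + n_beta)

    total = aa_contacts + bb_contacts
    # Without alpha-alpha or beta-beta contacts, fall back on composition alone.
    contact_alpha = aa_contacts / total if total else alpha_frac

    # Step 5: both measures must agree for a "mainly" class.
    if alpha_frac >= cut and contact_alpha >= cut:
        return "mainly alpha"
    if alpha_frac <= 1 - cut and contact_alpha <= 1 - cut:
        return "mainly beta"
    return "alpha-beta"


print(cath_like_class("LHHHHHHHLLHHHHHHL", aa_contacts=3, bb_contacts=0))  # mainly alpha
```

Requiring both composition and contacts to agree is a design choice of this sketch: a domain with many helical residues but mostly beta-beta packing falls into the mixed class rather than being forced into "mainly alpha".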
Classification by CATH, cont’
Fold assignment (Topology):
- SSAP is used for structure comparison.
- Main rule: an SSAP score greater than 70, with at least 60% of the smaller structure matching the larger one, is interpreted as a similar fold.
Homologous superfamily:
- Uses the SSAP score together with sequence similarity.
Sequence families:
- Sequence identity greater than 35%. (How should sequence similarity be measured?)
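The fold rule and the family cutoff can be written down directly. A minimal sketch, assuming the SSAP score and the number of matched residues come from an existing SSAP comparison. Sequence identity has several common definitions; the sketch assumes identical aligned positions divided by the length of the shorter ungapped sequence, which is only one possible answer to the question of how to measure it. The function names are hypothetical.

```python
def similar_fold(ssap_score: float, n_matched: int, len_a: int, len_b: int) -> bool:
    """Main topology rule: SSAP > 70 and at least 60% of the smaller structure matched."""
    smaller = min(len_a, len_b)
    return ssap_score > 70 and n_matched >= 0.6 * smaller


def sequence_identity(aln_a: str, aln_b: str) -> float:
    """Identity of two aligned sequences ('-' = gap), relative to the shorter sequence."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    shorter = min(len(aln_a.replace("-", "")), len(aln_b.replace("-", "")))
    return identical / shorter if shorter else 0.0


def same_sequence_family(aln_a: str, aln_b: str) -> bool:
    """Sequence family cutoff: identity greater than 35%."""
    return sequence_identity(aln_a, aln_b) > 0.35


print(similar_fold(75.0, n_matched=70, len_a=100, len_b=150))  # True
print(sequence_identity("AC-DE", "ACKDQ"))                     # 0.75
```

Dividing by the number of aligned (non-gap) positions instead of the shorter length would give higher identities for gappy alignments, which is why the choice of denominator matters when applying a fixed 35% cutoff.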
Classification by CATH, the procedure