Published by Arleen Flora Daniel. Modified over 8 years ago.
1
PDBe-fold (SSM)
A web-based service for protein structure comparison and structure searches
Sanchayita Sen, Ph.D., PDB Depositions
EBI is an Outstation of the European Molecular Biology Laboratory.
2
PROTEIN DATA BANK EUROPE www.ebi.ac.uk/pdbe
Structure alignment
- Structure alignment may be defined as the identification of residues occupying "equivalent" geometrical positions.
- Unlike in sequence alignment, residue type is neglected.
- Used for:
  - measuring structural similarity
  - protein classification and functional analysis
  - database searches
3
Methods
Many methods are known:
- Distance matrix alignment (DALI, Holm & Sander, EBI)
- Vector alignment (VAST, Bryant et al., NCBI)
- Depth-first recursive search on SSEs (DEJAVU, Madsen & Kleywegt, Uppsala)
- Combinatorial extension (CE, Shindyalov & Bourne, SDSC)
- Dynamic programming on Cα (Gerstein & Levitt)
- Dynamic programming on SSEs (SSA, Singh & Brutlag, Stanford University)
- many others
SSM employs a 2-step procedure:
A. Initial structure alignment and superposition using SSE graph matching
B. Cα-alignment
4
Three-dimensional graph matching
- Protein secondary structure elements (SSEs) are natural and convenient objects for building three-dimensional graphs.
- Secondary structure provides most of the functionality and is conserved through evolution.
- The details of a protein fold are expressed in terms of two types of SSE: helices and strands.
5
Graph representation of SSEs
- SSEs are represented by vectors.
- Each SSE is used as a graph vertex, labelled (T_i, ρ_i).
- Any two vertices are connected by an edge, labelled L, that describes the position and orientation of the connected SSEs.
- Each edge is labelled with a property vector: the angles α1 and α2 between the edge and the two vertices, the torsion angle between the vertices, and the length of the edge L.
[Figure: vertices V_i and V_j connected by an edge e of length L]
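A minimal Python sketch of one way to compute such an edge label. It assumes each SSE is reduced to a straight segment, that T is the SSE type and ρ its length, and that the edge joins the segment midpoints. The names `SSE` and `edge_label` are hypothetical, and the exact definitions used inside SSM may differ.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec = Tuple[float, float, float]

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a: Vec) -> float:
    return math.sqrt(dot(a, a))

def angle(a: Vec, b: Vec) -> float:
    """Unsigned angle between two vectors, in radians."""
    c = dot(a, b) / (norm(a) * norm(b))
    return math.acos(max(-1.0, min(1.0, c)))

@dataclass(frozen=True)
class SSE:
    kind: str    # "H" (helix) or "S" (strand); assumed meaning of T
    start: Vec   # first end of the SSE segment
    end: Vec     # second end of the SSE segment

    @property
    def direction(self) -> Vec:
        return sub(self.end, self.start)

    @property
    def length(self) -> float:   # assumed meaning of rho
        return norm(self.direction)

    @property
    def midpoint(self) -> Vec:
        return tuple((s + e) / 2.0 for s, e in zip(self.start, self.end))

def edge_label(vi: SSE, vj: SSE):
    """Return (alpha1, alpha2, torsion, L) for the edge from vi to vj."""
    e = sub(vj.midpoint, vi.midpoint)   # edge vector (assumed: midpoint to midpoint)
    L = norm(e)
    alpha1 = angle(vi.direction, e)
    alpha2 = angle(vj.direction, e)
    # Signed torsion of vj relative to vi about the edge; the sign
    # flips under mirror reflection.
    n1 = cross(vi.direction, e)
    n2 = cross(vj.direction, e)
    torsion = math.atan2(dot(cross(n1, n2), e) / L, dot(n1, n2))
    return alpha1, alpha2, torsion, L
```

Because the torsion is signed, a structure and its mirror image give edge labels that differ only in the sign of the torsion term.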
6
- The sets of vertices, edges and their labels provide a full definition of the graph.
- A graph matching algorithm requires a set of rules for comparing individual vertices and edges, with tolerances chosen empirically.
- Both relative and absolute vertex and edge lengths are used for comparison, which allows larger absolute differences for longer vertices and edges.
- Torsion angle comparison distinguishes mirror symmetry mates.
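The comparison rules can be illustrated with a short Python sketch. The tolerance values and the way the relative and absolute terms are combined are assumptions made for illustration only; the slides say only that the tolerances were chosen empirically. The function names are hypothetical.

```python
import math

def lengths_match(l1: float, l2: float,
                  abs_tol: float = 2.0, rel_tol: float = 0.2) -> bool:
    """Compare two vertex or edge lengths.

    The allowed difference is the larger of an absolute tolerance and a
    fraction of the longer length, so longer elements may differ by more
    in absolute terms. Both tolerance values are illustrative.
    """
    allowed = max(abs_tol, rel_tol * max(l1, l2))
    return abs(l1 - l2) <= allowed

def torsions_match(t1: float, t2: float,
                   tol: float = math.radians(45)) -> bool:
    """Compare two signed torsion angles (radians), handling wrap-around.

    A mirror image has the opposite torsion sign, so it fails this test
    unless the torsion is close to 0 or 180 degrees.
    """
    d = abs(t1 - t2) % (2 * math.pi)
    return min(d, 2 * math.pi - d) <= tol

print(lengths_match(30.0, 35.0), lengths_match(5.0, 10.0))
print(torsions_match(math.pi / 2, -math.pi / 2))
```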
7
SSE graph matching
[Figure: SSE graphs of structures A and B, built from helices (H) and strands (S), with the matched subgraphs highlighted]
- Matching the SSE graphs yields a correspondence between secondary structure elements, that is, groups of residues.
- The correspondence may be used as an initial guess for structure superposition and alignment of individual residues.
8
What next?
- So far we have considered the three-dimensional arrangement of secondary structure elements (SSEs) regardless of their ordering in the protein chain.
- Connectivity of SSEs is significant (although it can be neglected when comparing mutated or engineered proteins).
- In previous methods, connectivity was either preserved or neglected.
9
PDBefold (SSM) approach: a more flexible way
There are three options:
1) Connectivity of SSEs neglected: structures with different SSE connectivity still match if their SSE graphs are geometrically identical.
10
2) Soft connectivity: the general order of SSEs along their protein chains is the same in both structures, but any number of missing or unmatched SSEs is allowed between matched ones.
3) Strict connectivity: matched SSEs follow the same order along their protein chains and are separated only by an equal number of matched or unmatched SSEs in both structures.
To obtain a 3D alignment of individual residues, represent them by their C-alpha atoms and use the results of graph matching as a starting point.
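The three connectivity options can be sketched as a check on a candidate SSE match. This is an illustrative Python sketch, not the SSM implementation: it assumes a match is a one-to-one list of `(i, j)` pairs, where `i` and `j` are the positions of the matched SSEs along chains A and B, and the function name and mode strings are hypothetical.

```python
from typing import List, Tuple

def connectivity_ok(pairs: List[Tuple[int, int]], mode: str) -> bool:
    """Check a one-to-one SSE match against a connectivity option.

    mode "none":   connectivity neglected, every match is accepted.
    mode "soft":   matched SSEs appear in the same order in both chains;
                   any number of unmatched SSEs may lie between them.
    mode "strict": as soft, and consecutive matched SSEs are separated
                   by the same number of SSEs in both chains.
    """
    if mode == "none":
        return True
    ordered = sorted(pairs)  # order by position in chain A
    steps = [(i2 - i1, j2 - j1)
             for (i1, j1), (i2, j2) in zip(ordered, ordered[1:])]
    if mode == "soft":
        return all(dj > 0 for _, dj in steps)
    if mode == "strict":
        return all(di == dj for di, dj in steps)
    raise ValueError(f"unknown connectivity mode: {mode}")

match = [(0, 0), (1, 2), (3, 3)]
for mode in ("none", "soft", "strict"):
    print(mode, connectivity_ok(match, mode))
```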
11
Cα-alignment
- SSE-alignment is used as an initial guess for Cα-alignment.
- Cα-alignment is an iterative procedure based on the expansion of shortest contacts at the best superposition of the structures.
- Cα-alignment is a compromise between the alignment length N_align and the r.m.s.d.
- Longest contacts are unmapped in order to maximise the Q-score (the formula shown on the slide is not preserved in this transcript).
[Figure: chains A and B with matched helices and matched strands marked]
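The Q-score formula itself did not survive in this transcript. The Python sketch below uses the definition published for SSM (Krissinel & Henrick, 2004), Q = N_align² / ((1 + (RMSD/R0)²) · N1 · N2), where N1 and N2 are the residue counts of the two structures and R0 = 3 Å; treat the constant and the helper `best_contact_count` as assumptions rather than as the slide's content. The helper illustrates the trade-off only: it keeps the shortest contacts and does not recompute the superposition, which the real procedure does iteratively.

```python
import math

R0 = 3.0  # Angstrom; value taken from the SSM paper, not from the slides

def q_score(n_align: int, rmsd: float, n1: int, n2: int,
            r0: float = R0) -> float:
    """Q = N_align^2 / ((1 + (RMSD / R0)^2) * N1 * N2)."""
    return n_align ** 2 / ((1.0 + (rmsd / r0) ** 2) * n1 * n2)

def best_contact_count(distances, n1: int, n2: int):
    """Pick how many of the shortest contacts to keep so that Q is maximal.

    `distances` are C-alpha pair distances at a fixed superposition.
    Returns (number of contacts kept, resulting Q).
    """
    best_k, best_q, sq_sum = 0, 0.0, 0.0
    for k, d in enumerate(sorted(distances), start=1):
        sq_sum += d * d
        q = q_score(k, math.sqrt(sq_sum / k), n1, n2)
        if q > best_q:
            best_k, best_q = k, q
    return best_k, best_q

# Four close contacts and one long one: dropping the long contact raises Q.
print(best_contact_count([1.0, 1.0, 1.0, 1.0, 10.0], n1=5, n2=5))
```

Q reaches 1 only for identical structures (full-length alignment with zero r.m.s.d.) and decreases as either the r.m.s.d. grows or the alignment shortens.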
12
Multiple structure alignment
- More than 2 structures are aligned simultaneously.
- Multiple alignment is not equal to the set of all-to-all pairwise alignments.
- Helps to identify common structure motifs for a whole family of structures.
13
If you have to ask...
- Are there any structures in the PDB that are similar to mine?
- What SCOP and/or CATH family could my structure belong to?
- Can I get some idea about the possible function of my protein based on its structural similarity with others?
- Multiple alignment of many of my structures?
USE PDBefold
Upload your own PDB file for analysis!
Macromolecular Structure Database, 31.10.07
14
SSM server map
http://www.ebi.ac.uk/msd-srv/ssm/
15
SSM output
- Table of matched Secondary Structure Elements
- Table of matched backbone Cα-atoms, with the distances between them at best structure superposition
- Rotation-translation matrix of best structure superposition
- Visualisation in Jmol and Rasmol
- r.m.s.d. of Cα-alignment
- Length of Cα-alignment, N_align
- Number of gaps in Cα-alignment
- Quality score Q
- Statistical significance scores P(S), Z
- Sequence identity
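As an illustration of how the rotation-translation matrix and the r.m.s.d. relate, here is a minimal Python sketch. It assumes the matrix is given as three rows of four numbers, `[Rxx, Rxy, Rxz, Tx]` and so on, applied as `R · p + T`; check this layout against the actual server output. The function names are hypothetical.

```python
import math

def transform(matrix, point):
    """Apply a 3x4 rotation-translation matrix [R | T] to one point."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix)

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over matched C-alpha pairs."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((a - b) ** 2
                for p, q in zip(coords_a, coords_b)
                for a, b in zip(p, q))
    return math.sqrt(total / len(coords_a))

# Illustrative matrix: rotate 90 degrees about z, then shift by 1 along x.
m = [[0, -1, 0, 1],
     [1,  0, 0, 0],
     [0,  0, 1, 0]]
moved = [transform(m, p) for p in [(1, 0, 0), (0, 1, 0)]]
print(moved, rmsd(moved, [(1, 1, 0), (0, 0, 0)]))
```

Transforming the matched Cα atoms of one structure with the reported matrix and then calling `rmsd` against their partners should give a value comparable to the reported r.m.s.d., provided the same atom pairs are used.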
16
Sequence and Structure Alignments
Sequence alignment
- Based on residue identity, sometimes with a modified alphabet:
    --AARNEDDDGKMPSTF-L
    E-AARNFG-DGK--STFIL
- Algorithms: dynamic programming + heuristics
- Applications: BLAST, FASTA, FLASH and others
- Used for: evolution studies, protein function analysis, guessing at structure similarity
Structure alignment
- Based on geometrical equivalence of residue positions; residue type disregarded
- Algorithms: dynamic programming, graph theory, MC, geometric hashing and others
- Applications: DALI, VAST, CE, MASS, SSM and others
- Used for: protein function analysis, some aspects of evolution studies
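For the example alignment on this slide, sequence identity can be counted with a few lines of Python. The sketch assumes identity is measured over columns where neither row has a gap; other conventions (for example dividing by the shorter sequence length) give different values. The function name is hypothetical.

```python
def identity(row_a: str, row_b: str, gap: str = "-"):
    """Return (identical columns, aligned columns) for two alignment rows."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    aligned = [(a, b) for a, b in zip(row_a, row_b)
               if a != gap and b != gap]
    matches = sum(a == b for a, b in aligned)
    return matches, len(aligned)

matches, aligned = identity("--AARNEDDDGKMPSTF-L",
                            "E-AARNFG-DGK--STFIL")
print(f"{matches} identical out of {aligned} aligned columns")
```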
17
The PDBefold Search Interface
18
The Results Page For Pairwise Alignment
19
Analyzing the result from a particular pairwise alignment
20
Residue by Residue Structural alignment result
21
Multiple 3D alignment using MSDfold
22
Results from multiple 3D alignment
23
Conclusion
- It is quite possible that residue identity plays a much less significant role in protein structure than is often believed.
- As a consequence, the role of residue identity in protein function may often be overestimated.
- Using sequence identity to assess structural or functional features may give more false negatives than expected.
- Physical-chemical properties of residues should be given preference over residue identity in structure and function analysis.
- Modern methods for structure alignment are efficient; there is little sense in using sequence alignment in structure-related studies.