Structural alignments of Proteins using by TOPOFIT method Vitkup D., Melamud E., Moult J., Sander C. Completeness in structural genomics. Nature Struct.

Slides:



Advertisements
Similar presentations
1 Introduction to Sequence Analysis Utah State University – Spring 2012 STAT 5570: Statistical Bioinformatics Notes 6.1.
Advertisements

11/9/99ICTAI-99, Chicago1 Protein Secondary Structure Prediction Using Data Mining Tool C5 Meiliu Lu †, Du Zhang †, Hongjun Xu †, Ken Tse-yau Lau ‡, and.
Direct-Coupling Analysis (DCA) and Its Applications in Protein Structure and Protein-Protein Interaction Prediction Wang Yang
Profile-profile alignment using hidden Markov models Wing Wong.
Department of Computer Science, University of California, Santa Barbara August 11-14, 2003 CTSS: A Robust and Efficient Method for Protein Structure Alignment.
Finding Compact Structural Motifs Presented By: Xin Gao Authors: Jianbo Qian, Shuai Cheng Li, Dongbo Bu, Ming Li, and Jinbo Xu University of Waterloo,
1 1. BLAST (Basic Local Alignment Search Tool) Heuristic Only parts of protein are frequently subject to mutations. For example, active sites (that one.
FLEX* - REVIEW.
Identifying functional residues of proteins from sequence info Using MSA (multiple sequence alignment) - search for remote homologs using HMMs or profiles.
Appendix: Automated Methods for Structure Comparison Basic problem: how are any two given structures to be automatically compared in a meaningful way?
The Protein Data Bank (PDB)
QSD – Quadratic Shape Descriptors Surface Matching and Molecular Docking Using Quadratic Shape Descriptors Goldman BB, Wipke WT. Quadratic Shape Descriptors.
Similar Sequence Similar Function Charles Yan Spring 2006.
A unified statistical framework for sequence comparison and structure comparison Michael Levitt Mark Gerstein.
1 Alignment of Flexible Protein Structures Based on: FlexProt: Alignment of Flexible Protein Structures Without a Pre-definition of Hinge Regions / M.
Sequence comparisons June 23, 2009 Learning objectives-Understand the concept of sliding window programs. Understand difference between identity, similarity.
Statistical Shape Models Eigenpatches model regions –Assume shape is fixed –What if it isn’t? Faces with expression changes, organs in medical images etc.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Protein Structures.
Whole genome alignments Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
Protein Structure Prediction and Analysis
TM Biological Sequence Comparison / Database Homology Searching Aoife McLysaght Summer Intern, Compaq Computer Corporation Ballybrit Business Park, Galway,
Protein Structure Alignment by Incremental Combinatorial Extension (CE) of the Optimal Path Ilya N. Shindyalov, Philip E. Bourne.
By: Z. S. Rezaei. Structural comparison  Structural alignment  spectrum of structural alignment methods  The properties of output  Types of comparison.
Protein Tertiary Structure Prediction
Chapter 12 Protein Structure Basics. 20 naturally occurring amino acids Free amino group (-NH2) Free carboxyl group (-COOH) Both groups linked to a central.
Chapter 9 Superposition and Dynamic Programming 1 Chapter 9 Superposition and dynamic programming Most methods for comparing structures use some sorts.
Protein Sequence Alignment and Database Searching.
Structure superposition ≠ Structure alignment Lecture 11 Chapter 16, Du and Bourne “Structural Bioinformatics”
PDBe-fold (SSM) A web-based service for protein structure comparison and structure searches Gaurav Sahni, Ph.D.
A computational study of protein folding pathways Reducing the computational complexity of the folding process using the building block folding model.
1 Randomized Algorithms for Three Dimensional Protein Structures Comparison Yaw-Ling Lin Dept Computer Sci and Info Engineering, Providence University,
 Four levels of protein structure  Linear  Sub-Structure  3D Structure  Complex Structure.
BLAST: A Case Study Lecture 25. BLAST: Introduction The Basic Local Alignment Search Tool, BLAST, is a fast approach to finding similar strings of characters.
RNA Secondary Structure Prediction Spring Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Database Searches BLAST. Basic Local Alignment Search Tool –Altschul, Gish, Miller, Myers, Lipman, J. Mol. Biol. 215 (1990) –Altschul, Madden, Schaffer,
Ozgur Ozturk, Ahmet Sacan, Hakan Ferhatosmanoglu, Yusu Wang The Ohio State University LFM-Pro: a tool for mining family-specific sites in protein structure.
Protein Classification II CISC889: Bioinformatics Gang Situ 04/11/2002 Parts of this lecture borrowed from lecture given by Dr. Altman.
Bioinformatics Ayesha M. Khan 9 th April, What’s in a secondary database?  It should be noted that within multiple alignments can be found conserved.
DALI Method Distance mAtrix aLIgnment
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Pharm 201 Lecture 10, Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne.
November 18, 2000ICTCM 2000 Introductory Biological Sequence Analysis Through Spreadsheets Stephen J. Merrill Sandra E. Merrill Marquette University Milwaukee,
Pair-wise Structural Comparison using DALILite Software of DALI Rajalekshmy Usha.
MINRMS: an efficient algorithm for determining protein structure similarity using root-mean-squared-distance Andrew I. Jewett, Conrad C. Huang and Thomas.
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Sequence Alignment.
Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features 王荣 14S
Polish Infrastructure for Supporting Computational Science in the European Research Space EUROPEAN UNION Examining Protein Folding Process Simulation and.
Structural classification of Proteins SCOP Classification: consists of a database Family Evolutionarily related with a significant sequence identity Superfamily.
Lecture 11 CS5661 Structural Bioinformatics – Structure Comparison Motivation Concepts Structure Comparison.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
Levels of Protein Structure. Why is the structure of proteins (and the other organic nutrients) important to learn?
EMBL-EBI Eugene Krissinel SSM - MSDfold. EMBL-EBI MSDfold (SSM)
An Efficient Index-based Protein Structure Database Searching Method 陳冠宇.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Levels of Protein Structure. Why is the structure of proteins (and the other organic nutrients) important to learn?
We propose an accurate potential which combines useful features HP, HH and PP interactions among the amino acids Sequence based accessibility obtained.
Find the optimal alignment ? +. Optimal Alignment Find the highest number of atoms aligned with the lowest RMSD (Root Mean Squared Deviation) Find a balance.
Structural Bioinformatics Elodie Laine Master BIM-BMC Semester 3, Genomics of Microorganisms, UMR 7238, CNRS-UPMC e-documents:
EBI is an Outstation of the European Molecular Biology Laboratory. PDBe-fold (SSM) A web-based service for protein structure comparison and structure searches.
Chapter 14 Protein Structure Classification
Bayesian Refinement of Protein Functional Site Matching
Extra Tree Classifier-WS3 Bagging Classifier-WS3
Finding Functionally Significant Structural Motifs in Proteins
Protein Structures.
DALI Method Distance mAtrix aLIgnment
Protein Structure Alignment
Presentation transcript:

Structural alignments of Proteins using by TOPOFIT method Vitkup D., Melamud E., Moult J., Sander C. Completeness in structural genomics. Nature Struct. Biol. 2001;8: Venkatesan Ravichandran,

Contents  Introduction  Structural Alignments  TOPOFIT method  Delaunay tessellation of Protein Structures  The Comparison Algorithm  Structural comparison of active and binding sites  Conclusion

Introduction  Protein Structure  Bio molecular Structure of protein molecule Primary Linear sequence of amino acids Secondary Highly regular local sub-structures Tertiary Structure 3 Dimensional structure Quaternary 3 Dimensional structure of multi sub unit protein

Structural Alignments  CE method  DALI method  TOPOFIT method using Delaunay tessellation (DT)

CE and DALI method  (CE)Combinatorial extension of alignment path defined by AFP’s, are used to identify the final alignment  (DALI)Distance alignment matrix method, which breaks the input structures into fragments and calculates a distance matrix by evaluating the contact patterns between successive fragments

TOPOFIT  The structurally related proteins have a common spatial invariant part, a set of tetrahedrons.  Mathematically : common spatial subgraph volume of the three-dimensional contact graph  Based on the above property, A Superimposition (TOPOFIT) method is introduced to produce structural alignments of proteins.

TOPOFIT method  To compute Similarity  Parameters  RMSD  where δ is the distance between N pairs of equivalent atoms  Ne (Number of equivalent residues)

Topomax point RMSD/Ne curve  The Feature point in RMSD/Ne curve, which reflects the beginning of the topological mismatch  One can see that the number of aligned positions (N e ) is growing along with an increase of the joint distance.  The divergence begins at RMSD 1–1.5 Å  After the small mismatch the topology Starts to deviate drastically (mismatch increases by 50%)  Not used as Threshold I/p in algorithm

Evaluation of the TOPOFIT method to identify structural neighbors  The test set of 2905 proteins was detected among 10,731 structurally nonrelated protein pairs (DALI and CE)  The majority of the alignments are distributed from 1 to 2 Å of RMSD while few alignments are found at RMSD equals 2 Å or more.  In area A, an exact match  In area B, structural subfamily  in area C, structural superfamily

Materials and Methods used in TOPOFIT  Delaunay tessellation (DT) of protein structures  The comparison algorithm  Statistical significance of TOPOFIT alignments

Delaunay tessellation  Mathematical tool for reconstructing a volume-covering and continuous density or intensity field from a discrete point set

Delaunay tessellation - Cont  DTFE consists of three main steps :  Step 1 :  Discrete point distribution  Delaunay triangle

Delaunay tessellation - Cont  Step 2 : (Heart of DFTE)  The size of the triangles is therefore a measure of the local density of the point distribution

Delaunay tessellation - Cont  Step 3 :  These density estimates are interpolated to any other point, by assuming that inside each Delaunay triangle the density field varies linearly

The Comparison Algorithm  Consider two proteins labeled A and B  The first step is a Delaunay tessellation of points representing the proteins  set of tetrahedrons T A and T B  The second step includes initial classification of the tetrahedrons(shape, volume)  For each tetrahedron t a of a particular type from the protein A  the best matched tetrahedron t b of the same type from the protein B is found  The pair of tetrahedrons (t a,t b ) is called a “seed”.

The Comparison Algorithm - cont  The goal of the third step is to iteratively add as a many points to the seed as possible (they can always be superimposed and the lowest RMSD is recalculated)  If there are pairs of points (x a,x b ), x a ∈ t a, x b ∈ t b with superimposition distance d′x a,x b ) > D, then they are excluded from t a and t b.  The algorithm stops when there are no more new points or when the number of mismatches exceeds the threshold (15%)

Structural comparison of active and binding sites  Usually amino acids are tightly packed in protein structure with an approximate distance between two neighboring residues of 4–7 Å.  The active site residues from one protein can be spatially misaligned with the residues surrounding the active site in the other protein  TOPOFIT alignments with RMSD less than 2 Å might provide an insight on the functional comparison

Statistical significance of TOPOFIT alignments  To derive Z- score using Gaussian approximation  The obtained relation between N e and RMSD for a Z-score equal to 3 is shown. Alignments above Z-score 3 are considered to have a statistically significant structural correlation.

Conclusion  Similar proteins have a common volume with invariant three-dimensional tessellation patterns. These patterns can be used as a measure of structural similarity between proteins.  The topology in these common volumes are equivalent between proteins until the topomax point; therefore, the common subgraph volume, defined at the topomax point by the TOPOFIT method, represents a topological invariant.  It is shown that the topological invariants defined by TOPOFIT have a good geometrical correlation with low RMSD <2Å for the resulting structural alignments.

References    Chan, H.S. and Dill, K.A Polymer principles in protein structure and stability. Annu. Rev. Biophys. Biophys. Chem. 20: 447–490.  Chothia, C Structural invariants in protein folding. Nature 254: 304– 308.  Delaunay, B Sur la sphere vide. Bull. Acad. Sci. USSR VII: Class Sci. Mat. Nat. 7: 793–800.  Finney, J.L Volume occupation, environment and accessibility in proteins. The problem of the protein surface. J. Mol. Biol. 96: 721–732.  Finney, J.L The organization and function of water in protein crystals. Philos. Trans. R. Soc. Lond. B Biol. Sci. 278: 3–32.

Thank you