By: Z. S. Rezaei. Structural comparison: structural alignment, spectrum of structural alignment methods, properties of the output, types of comparison.


By: Z. S. Rezaei

Structural comparison
- Structural alignment
- Spectrum of structural alignment methods
- Properties of the output
- Types of comparison
- Algorithmic complexity
- Representation of structures
- Distance matrix
- Methods
- Alignment of large RNA molecules
- The classes of scoring

Structural alignment
- Establishes homology between two or more polymer structures based on their shape and three-dimensional conformation (2)
- Offers a window into the distant past of protein evolution (1)
- Helps identify homologous proteins (1)
- Can imply evolutionary relationships between proteins that share very little common sequence (2)
- Supports prediction of the function and family of a query protein (2)

- Relies on information about the structures' conformations (from X-ray crystallography, NMR spectroscopy, or structure prediction methods)
- Is used for evaluating structure prediction methods

Spectrum of structural alignment methods (1)

Properties of the output
- A superposition of the atomic coordinate sets with a minimal RMSD (root mean square deviation)
- The existence of multiple protein domains complicates structural alignment
- A set of superposed three-dimensional coordinates for each input structure (2)

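The root mean square (RMS) (3): the square root of the mean of the squared values. For N paired atoms with positions a_i and b_i, RMSD = sqrt((1/N) Σ |a_i − b_i|²).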
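A minimal sketch of the RMS idea applied to structures: the Python function below computes the RMSD between two lists of paired atoms. It assumes the atoms are already superposed and given as (x, y, z) tuples in Å. The function name and input format are illustrative and are not taken from any particular tool.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two paired, already-superposed lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    # Mean of the squared deviations, then the square root.
    return math.sqrt(total / len(coords_a))


# Two three-atom "structures" that differ only by a 1 Å shift along x.
a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
b = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(rmsd(a, b))  # 1.0
```

The two toy structures differ only by a translation, yet the RMSD is nonzero. This is why structural alignment superposes the coordinates before reporting an RMSD.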

Coordinate system: a geometric system that uniquely determines the position of a spatial element

Coordinate system
- Spatial coordinates
- Planar coordinates

Types of comparison
- Structural superposition: used to compare multiple conformations of the same protein; uses a simple least-squares fitting algorithm (2)
- Alignment: algorithms based on multidimensional rotations and modified quaternions identify topological relationships between structures without a predetermined alignment (2)
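Least-squares superposition looks for the rotation and translation that minimize the RMSD between paired atoms. One closed-form way to do this uses quaternions (Horn's method). After centring both coordinate sets, the minimal RMSD follows from the largest eigenvalue of a 4×4 "key" matrix built from their 3×3 correlation matrix.

The sketch below has these properties:
- It assumes paired (x, y, z) tuples.
- It finds the eigenvalue with a plain Jacobi iteration so that it needs no external libraries.
- Its helper names are illustrative.

It is not the implementation used by any specific alignment program.

```python
import math


def _centered(coords):
    """Subtract the centroid so that only the rotation remains to be optimized."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def _largest_eigenvalue(matrix, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix, using cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def superposed_rmsd(coords_a, coords_b):
    """Minimal RMSD between paired coordinates over all rigid rotations and translations."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    xs, ys = _centered(coords_a), _centered(coords_b)
    # 3x3 correlation matrix: s[i][j] = sum over atoms of x_i * y_j
    s = [[sum(x[i] * y[j] for x, y in zip(xs, ys)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 key matrix; its largest eigenvalue is the best achievable overlap.
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    e0 = sum(v * v for point in xs + ys for v in point)
    residual = max(0.0, e0 - 2.0 * _largest_eigenvalue(key))  # clamp rounding noise
    return math.sqrt(residual / len(xs))


a = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
# Rotate a by 90 degrees about z, then translate: a rigid-body copy of the same shape.
b = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in a]
print(superposed_rmsd(a, b))  # close to 0 (rounding error only)
```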

Definition of quaternion
- In mathematics, a number system that extends the complex numbers
- Hamilton defined a quaternion as the quotient of two directed lines in three-dimensional space
- Can be represented as the sum of a scalar and a vector (6)
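This sketch treats a quaternion as a scalar plus a three-component vector, as in the definition above. It multiplies quaternions and rotates a point with the standard formula v′ = q v q*, where q is a unit quaternion and q* is its conjugate. The (w, x, y, z) tuple layout and the function names are illustrative assumptions.

```python
import math


def q_mul(q1, q2):
    """Hamilton product of two quaternions stored as (w, x, y, z) = scalar + vector."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def axis_angle_quaternion(axis, angle):
    """Unit quaternion for a rotation by `angle` radians about `axis`."""
    norm = math.sqrt(sum(c * c for c in axis))
    s = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)


def rotate(vector, q):
    """Rotate a 3-vector by the unit quaternion q using v' = q v q*."""
    w, x, y, z = q
    conjugate = (w, -x, -y, -z)
    _, rx, ry, rz = q_mul(q_mul(q, (0.0, *vector)), conjugate)
    return (rx, ry, rz)


q = axis_angle_quaternion((0.0, 0.0, 1.0), math.pi / 2)  # 90 degrees about z
print(rotate((1.0, 0.0, 0.0), q))  # approximately (0.0, 1.0, 0.0)
```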

Algorithmic complexity
- Optimal solution
- Approximate solution (2)

Optimal solution
- The optimal "threading" of a protein sequence onto a known structure has been shown to be NP-complete
- Strictly speaking, an optimal solution to the structural alignment problem is known only for certain protein structure similarity measures
- Even then, the algorithm for the optimal solution is not practical (2)

Approximate solution
- Approximate polynomial-time algorithms for structural alignment theoretically classify approximate protein structure alignment as tractable, although they remain computationally expensive for large-scale analysis (2)

Representation of structures
- Protein structures can be represented in a coordinate-independent space by constructing a series of matrices (2)

Distance matrix
- A two-dimensional matrix of the distances between some subset of the atoms in each molecule (such as the alpha carbons)
- Alternatively, the protein can be reduced to a coarser representation such as secondary structure elements (SSEs) (2)
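This is a minimal sketch of the distance matrix described above. It assumes C-alpha coordinates are supplied as (x, y, z) tuples in Å, and the function name is illustrative.

```python
import math


def distance_matrix(coords):
    """Symmetric matrix of pairwise Euclidean distances, e.g. between C-alpha atoms."""
    n = len(coords)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = math.dist(coords[i], coords[j])
    return d


ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
for row in distance_matrix(ca):
    print([round(v, 2) for v in row])
```

Translating or rotating `ca` leaves every entry unchanged. That invariance is what makes this representation coordinate-independent.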

Methods (2)
- DALI
- Combinatorial extension (CE)
- GANGSTA+
- MAMMOTH
- ProBiS
- RAPIDO
- SABERTOOTH
- SSAP
- SPalign
- TOPOFIT
- SSM

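DALI
- Distance alignment matrix method
- Breaks the input structures into hexapeptide fragments and calculates a distance matrix
- Features contiguous in sequence appear on the main diagonal of the matrix; other diagonals reflect spatial contacts between residues that are far apart in sequence
- Comparison is conducted via a series of overlapping submatrices of size 6×6
- Submatrix matches are reassembled into a final alignment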
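The sketch below compares overlapping 6×6 submatrices of two distance matrices. It rests on several simplifications that are assumptions, not DALI's method:
- It compares only diagonal blocks, that is, each hexapeptide's internal distances. DALI also compares off-diagonal blocks that describe contacts between fragments.
- It scores blocks by mean absolute difference. DALI uses its own similarity score.
- The 1 Å threshold and all function names are hypothetical.

```python
import math


def distance_matrix(coords):
    """Pairwise C-alpha distance matrix."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def block_difference(d_a, d_b, i, j, size=6):
    """Mean absolute difference between the size x size diagonal block of d_a starting at
    residue i and the one of d_b starting at residue j (hypothetical score, not DALI's)."""
    total = 0.0
    for r in range(size):
        for c in range(size):
            total += abs(d_a[i + r][i + c] - d_b[j + r][j + c])
    return total / (size * size)


def matching_hexapeptides(d_a, d_b, threshold=1.0, size=6):
    """All hexapeptide pairs (i, j) whose internal distance patterns differ by < threshold Å."""
    return [
        (i, j)
        for i in range(len(d_a) - size + 1)
        for j in range(len(d_b) - size + 1)
        if block_difference(d_a, d_b, i, j, size) < threshold
    ]


# A straight 8-residue chain compared with itself: every hexapeptide matches every other.
line = [(3.8 * k, 0.0, 0.0) for k in range(8)]
d = distance_matrix(line)
print(len(matching_hexapeptides(d, d)))  # 9
```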

DALI (continued)
- The original version used a Monte Carlo simulation to maximize a structural similarity score
- The DALI method has also been used to construct a database known as FSSP (Fold classification based on Structure-Structure alignment of Proteins, or Families of Structurally Similar Proteins)
- A searchable database based on DALI is available, as well as a downloadable program and web search based on a standalone version known as DaliLite

Monte Carlo method
- A class of computational algorithms that rely on repeated random sampling to compute their results
- Especially useful for simulating systems with many coupled degrees of freedom, such as fluids, disordered materials, strongly coupled solids, and cellular structures (4)
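A common way to use repeated random sampling for optimization is a Metropolis-style search:
- Propose a random move.
- Always accept the move if it lowers the energy.
- Otherwise, accept it with probability exp(−ΔE/T).

The sketch below minimizes a toy one-dimensional function. The step size, temperature, step count and seed are arbitrary illustrative values. In an alignment setting, the moves would act on candidate residue correspondences instead. That is an assumption for illustration, not a description of DALI's exact procedure.

```python
import math
import random


def metropolis_minimize(energy, x0, step=0.5, temperature=0.1, n_steps=5000, seed=0):
    """Minimize energy(x) by random moves accepted with the Metropolis criterion."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        candidate = x + rng.uniform(-step, step)
        e_new = energy(candidate)
        # Downhill moves are always accepted; uphill moves with probability exp(-dE / T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


# Toy energy with its minimum at x = 3 (illustrative only).
best_x, best_e = metropolis_minimize(lambda x: (x - 3.0) ** 2, x0=0.0)
print(round(best_x, 2), round(best_e, 4))
```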

Combinatorial extension (CE)
- Similar to DALI in that it also breaks each structure into a series of fragments
- Uses aligned fragment pairs (AFPs) to define a similarity matrix
- A number of similarity metrics are possible

Combinatorial extension (CE), continued
- An initial AFP nucleates the alignment
- The alignment then proceeds with the next AFP
- The RCSB PDB has released an updated version of CE and FATCAT as part of the RCSB PDB Protein Comparison Tool
- The tool provides a new variation of CE that can detect circular permutations in protein structures

Circular permutations
- A circular permutation is a relationship between proteins in which the order of amino acids in their peptide sequences has changed. The result is a protein structure with different connectivity but an overall similar three-dimensional (3D) shape (7)
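Consider the idealized case where two sequences are identical apart from the cut point. There, a circular permutation can be detected by searching for one sequence inside the other concatenated with itself. This is only a sequence-level sketch with a hypothetical function name. Real circular permutants usually also differ in sequence, which is why structure-based detection such as the CE variant mentioned above is needed.

```python
def circular_permutation_offset(seq_a, seq_b):
    """Return k such that seq_b == seq_a[k:] + seq_a[:k], or None if no such rotation exists.

    Idealized: only exact rotations of the same sequence are detected.
    """
    if len(seq_a) != len(seq_b):
        return None
    k = (seq_a + seq_a).find(seq_b)
    return k if k != -1 else None


print(circular_permutation_offset("ACDEFGHIK", "FGHIKACDE"))  # 4
```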

GANGSTA+
- A combinatorial algorithm for non-sequential structural alignment of proteins
- Used for searching for similarity in databases
- Evaluates alignments based on contact maps and secondary structure
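A contact map records which residue pairs are close in space. The sketch below builds one from representative-atom coordinates and counts how many contacts a given alignment preserves. The 8 Å cutoff, the function names, and the contact-overlap count are illustrative assumptions, not GANGSTA+'s actual scoring.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Boolean matrix: True where two different residues lie within `cutoff` Å (assumed value)."""
    n = len(coords)
    return [[i != j and math.dist(coords[i], coords[j]) <= cutoff for j in range(n)]
            for i in range(n)]


def shared_contacts(map_a, map_b, pairs):
    """Number of contacts present in both maps under an alignment of (i_a, i_b) residue pairs."""
    count = 0
    for k, (ia, ib) in enumerate(pairs):
        for ja, jb in pairs[k + 1:]:
            if map_a[ia][ja] and map_b[ib][jb]:
                count += 1
    return count


chain = [(3.8 * k, 0.0, 0.0) for k in range(5)]  # idealized straight chain
cm = contact_map(chain)
print(shared_contacts(cm, cm, [(i, i) for i in range(5)]))  # 7
```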

MAMMOTH
- MAtching Molecular Models Obtained from Theory
- Designed for comparing models coming from structure prediction
- Decomposes the protein structure into heptapeptides
- The similarity score between two heptapeptides is calculated using a unit-vector RMS (URMS) method
- These scores are stored in a similarity matrix
- The final score is derived from the likelihood of obtaining a given structural alignment by chance

MAMMOTH-mult
- An extension of the MAMMOTH algorithm
- Is very fast
- Produces consistent, high-quality structural alignments
- Produces structurally implied sequence alignments that can be further used for multiple-template homology modeling

ProBiS (Protein Binding Sites)
- Detects structurally similar sites on protein surfaces
- Compares the query protein to members of a database of protein 3D structures
- Uses an efficient maximum clique algorithm
- Structural similarity scores are calculated for the query protein's surface residues and are expressed as different colors
- Has been used successfully for the detection of protein–protein, protein–small ligand and protein–DNA binding sites
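In clique-based structure comparison, graph nodes can represent candidate pairs of matching residues, one from each protein. Edges join pairs whose mutual geometry is consistent, so the largest clique is the largest self-consistent set of matches. The sketch below runs the generic Bron-Kerbosch algorithm with pivoting on a toy graph. It is not ProBiS's own implementation, and the function name is illustrative.

```python
def maximum_clique(adjacency):
    """Largest clique of an undirected graph given as {node: set(neighbors)}.

    Generic Bron-Kerbosch with pivoting (not ProBiS's own implementation).
    """
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda u: len(adjacency[u] & p))
        for v in list(p - adjacency[pivot]):
            expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adjacency), set())
    return best


graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(maximum_clique(graph))  # {1, 2, 3}
```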

RAPIDO
- Rapid Alignment of Proteins In terms of Domains
- A web server for the 3D alignment of crystal structures
- Uses an approach based on difference distance matrices
- The Matching Fragment Pairs (MFPs) are then represented as nodes in a graph
- The nodes are chained together to form an alignment by an algorithm that identifies the longest path on a DAG (directed acyclic graph)
- A final step improves the quality of the alignment
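The chaining step can be sketched as a longest-path search in a DAG. Each node here is a hypothetical fragment pair (start_a, start_b, length, score). A node may follow another only if it starts after the other ends in both proteins. Sorting by start positions therefore gives a topological order. How RAPIDO actually scores and connects its MFPs is not modeled here.

```python
def chain_fragments(fragments):
    """Highest-scoring chain of fragment pairs that is consistent in both proteins.

    fragments: hypothetical (start_a, start_b, length, score) tuples with positive lengths.
    Fragment g may follow f only if g starts after f ends in both proteins, so the graph is
    a DAG and the best chain is a longest (highest-score) path found by dynamic programming.
    """
    if not fragments:
        return []
    # Sorting by start positions gives a topological order of the DAG.
    order = sorted(range(len(fragments)), key=lambda k: (fragments[k][0], fragments[k][1]))
    best, prev = {}, {}
    for pos, k in enumerate(order):
        a_k, b_k, _, score_k = fragments[k]
        best[k], prev[k] = score_k, None
        for m in order[:pos]:
            a_m, b_m, len_m, _ = fragments[m]
            follows = a_m + len_m <= a_k and b_m + len_m <= b_k
            if follows and best[m] + score_k > best[k]:
                best[k], prev[k] = best[m] + score_k, m
    # Walk back from the best end point to recover the chain.
    k = max(best, key=best.get)
    chain = []
    while k is not None:
        chain.append(fragments[k])
        k = prev[k]
    return chain[::-1]


mfps = [(0, 0, 5, 5.0), (6, 7, 4, 4.0), (3, 20, 5, 5.0), (12, 12, 6, 6.0)]
print(chain_fragments(mfps))  # [(0, 0, 5, 5.0), (6, 7, 4, 4.0), (12, 12, 6, 6.0)]
```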

SABERTOOTH
- Uses structural profiles to perform structural alignments
- Has favourable scaling of computation time with chain length
- SABERTOOTH can be used online at

SSAP (Sequential Structure Alignment Program)
- Uses double dynamic programming
- Constructs its vectors from the beta carbons for all residues except glycine
- A series of matrices is constructed, containing the vector differences between neighbors for each pair of residues
- Dynamic programming applied to each resulting matrix gives a series of optimal local alignments
- These local alignments are summed into a "summary" matrix
- Dynamic programming is applied again to the summary matrix to determine the overall structural alignment
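The final stage of the procedure above, dynamic programming over a summary matrix, can be sketched as a global alignment that maximizes the summed matrix entries. The following are assumptions of this sketch:
- The summary matrix is given directly, with hypothetical values.
- The linear gap penalty of −1 is arbitrary.
- Building the matrix from residue-level vector comparisons (the first level of SSAP's double dynamic programming) is not shown.

```python
def align_from_scores(score, gap=-1.0):
    """Global alignment maximizing the summed score[i][j] of aligned pairs, linear gap penalty.

    Returns the aligned (i, j) index pairs in order.
    """
    if not score or not score[0]:
        return []
    n, m = len(score), len(score[0])
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap
    for j in range(1, m + 1):
        f[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(
                f[i - 1][j - 1] + score[i - 1][j - 1],  # align residue i-1 with j-1
                f[i - 1][j] + gap,                      # gap in the second structure
                f[i][j - 1] + gap,                      # gap in the first structure
            )
    # Trace back from the bottom-right corner, recomputing which move produced each cell.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if f[i][j] == f[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif f[i][j] == f[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


summary = [[5.0, 0.0], [0.0, 0.0], [0.0, 5.0]]  # hypothetical summary-matrix values
print(align_from_scores(summary))  # [(0, 0), (2, 1)]
```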

SSAP (continued)
- Originally produced only pairwise alignments, but has since been extended to multiple alignments as well
- Applied in an all-to-all fashion to produce a hierarchical fold classification scheme known as CATH (Class, Architecture, Topology, Homology)
- Used to construct the CATH Protein Structure Classification database

SPalign
- Based on a new size-independent score called SPscore
- The source code for SPalign and the server are available at server/SPalign/

TOPOFIT
- Based on Delaunay tessellation (DT)
- Identifies a feature point on the RMSD/Ne curve, called the topomax point
- Used to detect conformational changes and topological differences in variable parts

SSM
- Secondary Structure Matching (SSM), also known as PDBeFold, at the Protein Data Bank in Europe
- Uses graph matching followed by C-alpha alignment to compute alignments

Recent developments
- TM-align uses a novel method for weighting its distance matrix; the weighting corrects for effects arising from alignment lengths
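TM-align's length correction comes from the TM-score. The score weights each aligned pair by 1/(1 + (d_i/d_0)²) and normalizes by the target length L. It uses a length-dependent scale d_0 = 1.24·(L − 15)^(1/3) − 1.8, so scores for proteins of different sizes stay comparable.

A minimal sketch, assuming the structures are already superposed and the per-pair distances are known. TM-align itself also searches for the superposition that maximizes the score. The 0.5 Å floor on d_0 for very short chains is a choice made for this sketch.

```python
def tm_score(distances, target_length):
    """TM-score of an alignment, from the distances (Å) between aligned residue pairs
    after superposition, normalized by the length of the target structure."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    d0 = max(d0, 0.5)  # floor for very short chains: a choice made for this sketch
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Four aligned pairs of a 120-residue target (illustrative distances).
print(tm_score([0.5, 1.0, 2.0, 8.0], 120))
```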

RNA structural alignment
- Large RNA molecules also form characteristic tertiary structures
- A recent method for pairwise structural alignment of RNA sequences is implemented in the program FOLDALIGN
- Intended for cases of low sequence identity

(1)

References
1. Hitomi Hasegawa and Liisa Holm: Advances and pitfalls of protein structural alignment, Current Opinion in Structural Biology 2009, 19:341–
2. en.wikipedia.org/wiki/structural_alignment software
3. Cartwright, Kenneth V (Fall 2007). "Determining the Effective or RMS Voltage of Various Waveforms without Calculus". Technology Interface 8 (1): 20 pages
4. Anderson, H.L. (1986). "Metropolis, Monte Carlo and the MANIAC". Los Alamos Science 14: 96–
5. Weisstein, Eric W., "Coordinate System" from MathWorld
6. Boris Abramovich Rozenfelʹd (1988). The history of non-euclidean geometry: evolution of the concept of a geometric space. Springer. p
7. Cunningham, B. A.; Hemperly, J. J.; Hopp, T. P.; Edelman, G. M. (1979). "Favin versus concanavalin A: Circularly permuted amino acid sequences". Proceedings of the National Academy of Sciences of the United States of America 76 (7): 3218–3222

I am ready to answer your questions