Structure superposition ≠ Structure alignment. Lecture 11. Chapter 16, Gu and Bourne, “Structural Bioinformatics”


Why?
A. Study the conformational changes of the same protein with or without ligands
   -- Same protein sequences
B. Study the effect of mutations on protein structure
   -- Highly similar protein sequences
C. Assess protein structure prediction
   -- How accurate are the predicted models?
   -- Same protein sequences
D. Remote homolog detection. Structures are generally preserved better than sequences over the course of evolution, e.g. myoglobin and β-hemoglobin are homologous and have similar structures, but their sequence identity can be as low as 8.5%!
E. Classification of protein folds

Why? Structure conservation > sequence conservation
Structures may align well even if their sequence similarity is low. For example, myoglobin and beta-hemoglobin are structural neighbors and superimpose well, yet their sequence identity is only 8.5%!
[Figure: optimal superposition of myoglobin and beta-hemoglobin]

Why? Structure conservation > sequence conservation
[Figure: receiver operating characteristic (ROC) curves; x-axis: false positive rate (%). Source: Chothia and Lesk]

ASIDE: Making sense of a ROC curve
ROC (receiver operating characteristic) experiment:
- For each pair P of proteins in the dataset, perform the alignment and record its score, S(P).
- Rank all pairs by score, from highest to lowest.
- Scan the ranked pairs and, at each position, record the true positive rate and the false positive rate.
[Figure: ROC curve; x-axis: false positive rate (%)]

ASIDE: Making sense of a ROC curve
Prediction (score)   Benchmark (true match?)
1.00                 Yes
0.99                 Yes
0.98                 Yes
0.97                 Yes
0.96                 No
0.95                 No
0.93                 Yes
0.91                 Yes
0.89                 No
0.87                 No
0.85                 No
0.83                 No
0.83                 Yes
0.81                 No
0.77                 No
0.74                 No
0.73                 No
0.70                 No
0.69                 No
0.67                 Yes
0.62                 No
0.56                 No
0.54                 No
0.53                 No
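The scan described for the ROC experiment can be sketched in a few lines of Python. This is only an illustration: the function names `roc_points` and `area_under_curve` are made up here, the data are the 24 ranked pairs from the slide, and the two pairs tied at 0.83 are simply taken in the order listed.

```python
# Ranked benchmark from the slide: (prediction score, is it a true match?).
RANKED = [
    (1.00, True), (0.99, True), (0.98, True), (0.97, True),
    (0.96, False), (0.95, False), (0.93, True), (0.91, True),
    (0.89, False), (0.87, False), (0.85, False), (0.83, False),
    (0.83, True), (0.81, False), (0.77, False), (0.74, False),
    (0.73, False), (0.70, False), (0.69, False), (0.67, True),
    (0.62, False), (0.56, False), (0.54, False), (0.53, False),
]


def roc_points(ranked):
    """Scan pairs from highest to lowest score; return (FPR, TPR) after each pair."""
    n_pos = sum(1 for _, is_true in ranked if is_true)
    n_neg = len(ranked) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, is_true in ranked:
        if is_true:
            tp += 1  # a true match above the threshold: curve moves up
        else:
            fp += 1  # a false match above the threshold: curve moves right
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under the (FPR, TPR) curve; 0.5 is random, 1.0 is perfect."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    pts = roc_points(RANKED)
    for fpr, tpr in pts:
        print(f"FPR {100 * fpr:5.1f}%   TPR {100 * tpr:5.1f}%")
    print("area under curve:", round(area_under_curve(pts), 3))
```

For this list the first four pairs are all true matches, so the curve rises to a 50% true positive rate before any false positive is recorded; the area under the curve works out to 106/128, about 0.83.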

Alignment vs. Superposition
Structural alignment attempts to establish homology between two or more polymer structures based on their shape and 3D structure. It requires no a priori knowledge of equivalent positions. It is a valuable tool for comparing proteins with low sequence similarity, where evolutionary relationships cannot be easily detected by standard sequence alignment techniques.
Conversely, simple structural superposition uses knowledge of at least some equivalent residues to guide a rigid-body superposition. This most basic comparison between protein structures makes no attempt to align the input structures; it requires a precalculated alignment as input to determine which residues are to be considered in the RMSD calculation.

Structural superposition of two CheY orthologs
In pairwise structure superposition, a correspondence set of residue pairs is established by a pairwise sequence alignment.

Pairwise structure superposition
Superposition algorithms optimize the orientation and spatial position of the two molecules with respect to each other. Superposition usually starts with a sequence comparison, which establishes the one-to-one relationships between pairs of atoms from which the RMSD is computed. This is typically a good assumption at appreciable pairwise sequence identity, but it breaks down in the Twilight Zone of low sequence identity.
Once atom-to-atom relationships between two structures are established, the task of the algorithm is to achieve an optimal superposition with the smallest possible RMSD. It is usually impossible to achieve perfect overlap of all atom pairs, even for structures with 100% identical sequence: overlaying one pair of atoms perfectly may push another pair further apart. Also, as in sequence alignment, there is a tension between global and local matching that must be considered.

Global similarity ≠ local similarity
Global alignment
Images and content from Patrice Koehl at UC Davis

Global similarity ≠ local similarity
Local alignment: structural motif
Images and content from Patrice Koehl at UC Davis

Choosing an appropriate description of structure
Structure comparisons can be done at several different levels:
- Individual atoms (disadvantages?)
- Residue positions, which can be specified by the coordinates of Cα, Cβ, and the center of mass of the side chains. What are the advantages and disadvantages of using different residue representations?
- Small fragments
- Secondary structure elements (SSE)

Choosing an appropriate description of structure
Only when the structures to be aligned are highly similar or even identical is it meaningful to align side-chain atom positions. In that case the RMSD reflects not only the conformation of the protein backbone but also the rotameric states of the side chains.
Other comparison criteria that reduce noise and bolster positive matches include:
- Secondary structure assignment
- Native contact maps or residue interaction patterns
- Measures of side chain packing
- Measures of hydrogen bond retention

Contact map

Choosing an objective function to extremize
Structure superposition requires minimizing the error within the framework of some objective function. Which one?
- Torsion angle comparison
- Distance matrices
- Structure superposition (RMSD, TM-score, etc.): the most obvious and most common
- Secondary structure superposition (SHEBA)
This decision must also be made for structure alignment, since superposition is used (many times over) in that harder problem.

Torsion angles
Torsion angles (φ, ψ) are:
- local by nature
- invariant upon rotation and translation of the molecule
- compact (O(n) angles for a protein of n residues)
But… [Figure: effect of adding 1 degree to all φ, ψ]
Images and content from Patrice Koehl at UC Davis
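The invariance claim can be checked numerically. The sketch below computes a signed torsion angle from four points with a standard atan2 formulation and confirms that a rigid-body move leaves it unchanged. The helper names (`dihedral_deg`, `rotate_z_then_shift`) and the four test points are made up for illustration; they are not real backbone atoms.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond; 0 is cis, 180 is trans."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def rotate_z_then_shift(p, angle_deg, shift):
    """Rigid-body move: rotate about the z axis, then translate."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    x, y, z = p
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])


if __name__ == "__main__":
    # Hypothetical four-atom chain with a quarter-turn twist about the middle bond.
    atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 1.0)]
    moved = [rotate_z_then_shift(p, 40.0, (5.0, -3.0, 2.0)) for p in atoms]
    print("before the move:", dihedral_deg(*atoms))
    print("after the move: ", dihedral_deg(*moved))
```

Because the angle depends only on the relative geometry of the four points, no superposition is needed before comparing torsion angles between two structures.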

Distance matrices
Advantages:
- invariant with respect to rotation and translation
- can be used to compare proteins
Disadvantages:
- the distance matrix is O(n^2) for a protein with n residues
- comparing distance matrices is a difficult problem
- insensitive to chirality
Images and content from Patrice Koehl at UC Davis
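Two of these properties are easy to demonstrate. The sketch below builds an all-against-all distance matrix and shows that it is unchanged by a rigid-body move (an advantage) and also by a mirror reflection (the chirality blind spot). The five coordinates are made-up points, not a real protein, and the helper names are illustrative.

```python
import math


def distance_matrix(coords):
    """All-against-all distances: an n x n matrix for n points, so O(n^2) entries."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def max_abs_difference(m1, m2):
    """Largest element-wise difference between two equally sized matrices."""
    return max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))


def rotate_z_then_shift(p, angle_deg, shift):
    """Rigid-body move: rotate about the z axis, then translate."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    x, y, z = p
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])


def mirror_x(p):
    """Reflect through the yz plane; this inverts the handedness of the point set."""
    return (-p[0], p[1], p[2])


# Hypothetical, non-planar point set standing in for five CA atoms.
COORDS = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0),
          (8.0, 4.0, 2.5), (9.0, 1.0, 4.5)]

if __name__ == "__main__":
    original = distance_matrix(COORDS)
    moved = distance_matrix([rotate_z_then_shift(p, 70.0, (10.0, -4.0, 1.0))
                             for p in COORDS])
    mirrored = distance_matrix([mirror_x(p) for p in COORDS])
    print("rigid move changes the matrix by:", max_abs_difference(original, moved))
    print("mirror image changes the matrix by:", max_abs_difference(original, mirrored))
```

Both differences are zero up to rounding, which is why a method based only on distance matrices cannot tell a structure from its mirror image.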

Scoring DM similarity (or in this case, contact map)

Scoring DM similarity (or in this case, contact map): introduce a gap
In superposition, gap location is defined by an alignment! In alignment, different gap positions are tried until the best overlap is identified.

Root mean squared deviation (RMSD)
The most common parameter that expresses the difference between two protein structures is the RMSD, or root mean squared deviation (distance), in atomic positions between the two structures. RMSD can be calculated over all atoms or over some subset of the atoms, such as the backbone or CA atoms. Using a subset of the protein atoms is common because two protein structures being compared are unlikely to be identical in sequence, so the only atoms between which a one-to-one comparison of positions can be made are the backbone atoms.

RMSD calculation
[Figure: distances d1, d2, d3, d4, d5 between corresponding atoms of the two structures]
The two structures must first be superimposed to calculate a meaningful RMSD value, because they are initially in different coordinate systems!

RMSD calculation (with a gap)
[Figure: distances d1, d2, d4, d5 between corresponding atoms; position 3 is unpaired]
Blue: 1 – 2 – 3 – 4 – 5
Red:  1 – 2 – x – 4 – 5
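A minimal sketch of the RMSD calculation over a correspondence set, with and without a gap. It assumes the two coordinate sets are already superimposed in the same frame; the coordinates and the `rmsd` helper are invented for illustration and only mimic the blue/red layout of the figure.

```python
import math


def rmsd(coords_a, coords_b, pairs):
    """RMSD over the correspondence set `pairs` = [(index in A, index in B), ...]."""
    if not pairs:
        raise ValueError("the correspondence set is empty")
    squared = [math.dist(coords_a[i], coords_b[j]) ** 2 for i, j in pairs]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy coordinates, already in a common coordinate system.
BLUE = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
        (3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
RED_FULL = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 3.0, 0.0),
            (3.0, 1.0, 0.0), (4.0, 1.0, 0.0)]
# Same as RED_FULL but with position 3 missing, as in "Red: 1 - 2 - x - 4 - 5".
RED_GAPPED = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (3.0, 1.0, 0.0), (4.0, 1.0, 0.0)]

FULL_PAIRS = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
# The alignment, not the geometry, says that blue 4 and 5 pair with red entries 3 and 4.
GAPPED_PAIRS = [(0, 0), (1, 1), (3, 2), (4, 3)]

if __name__ == "__main__":
    print("five pairs:", rmsd(BLUE, RED_FULL, FULL_PAIRS))
    print("four pairs (gap at position 3):", rmsd(BLUE, RED_GAPPED, GAPPED_PAIRS))
```

In this toy case the unpaired position was also the worst-fitting one, so the gapped RMSD is lower even though fewer residues are compared, which is one reason an RMSD should be reported together with the number of aligned residues.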

RMSD vs. average D as a function of n
Estimating RMSD by averaging the distances generally gets better as the size n of the correspondence set increases. However, RMSD can never be smaller than the average distance D.
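The bound follows from a one-line identity. The notation below is an assumption made for this note: d_i is the distance between the i-th pair of corresponding atoms and \bar{d} is their average.

```latex
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i ,
\qquad
\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_i^{2}}

\mathrm{RMSD}^{2} - \bar{d}^{\,2}
  = \frac{1}{n}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^{2} \;\ge\; 0
\quad\Longrightarrow\quad
\mathrm{RMSD} \;\ge\; \bar{d}
```

The two are equal only when every d_i is the same. For example, distances of 1, 1, 3, 1 and 1 Å give an average of 1.4 Å but an RMSD of sqrt(13/5), about 1.61 Å, because squaring gives the single large deviation extra weight.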

Using RMSD to find the optimal superposition

Superposition is too complicated for manual optimization

Using RMSD to find the optimal superposition
Simplified problem (compared to structure alignment): we know the correspondence between set A and set B. We wish to compute the rigid transformation T that best aligns a_1 with b_1, a_2 with b_2, …, a_N with b_N. The error to minimize is the RMSD defined above. This is an old problem, solved in Statistics, Robotics, Medical Image Analysis, etc.
Images and content from Patrice Koehl at UC Davis

Using RMSD to find the optimal superposition
A rigid-body transformation T is a combination of a translation t and a rotation R, thus: T(x) = Rx + t. The quantity to be minimized is the mean squared deviation between the transformed points of A and their partners in B (minimizing it also minimizes the RMSD): E(R, t) = (1/N) Σ_i ||R a_i + t − b_i||^2. The algorithm includes a fair amount of linear algebra (and a little bit of calculus) that is outside the scope of this class. Believe it or not, the algorithm is O(n)!
[Figure: representation of the 6 “trivial” degrees of freedom (DOF)]
Images and content from Patrice Koehl at UC Davis

Using RMSD to find the optimal superposition
Pseudocode: superposition algorithm in reality
1.) Define the error function (RMSD)
2.) Determine the correspondence set (pairwise sequence alignment)
3.) Translation = align the centers of mass (COM)
4.) Rotation = use matrix methods to solve for the rotation that minimizes the error function (a variety of methods is available)
5.) Evaluate the resultant superposition
6.) Refine the superposition (because COM to COM may not be the best translation)
7.) Iterate until convergence
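Steps 3 to 5 can be sketched with NumPy using the Kabsch method, which is one of the matrix methods available for step 4; the slides do not say which method the lecture had in mind. The sketch assumes equal weights for all atoms (so the center of mass is the plain centroid) and a correspondence set already applied, meaning row i of `A` pairs with row i of `B`. The function name `kabsch_superpose` is illustrative.

```python
import numpy as np


def kabsch_superpose(A, B):
    """Return (R, t, rmsd) such that R @ a_i + t best matches b_i in the least-squares sense."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.shape != B.shape or A.ndim != 2 or A.shape[1] != 3:
        raise ValueError("A and B must both be N x 3 arrays of paired coordinates")

    # Step 3: translation, by moving both centroids to the origin.
    centroid_a = A.mean(axis=0)
    centroid_b = B.mean(axis=0)
    A0 = A - centroid_a
    B0 = B - centroid_b

    # Step 4: rotation from the SVD of the 3 x 3 covariance matrix.
    H = A0.T @ B0
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (a reflection), which has determinant -1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = centroid_b - R @ centroid_a

    # Step 5: evaluate the resulting superposition.
    A_fit = A @ R.T + t
    rmsd = float(np.sqrt(np.mean(np.sum((A_fit - B) ** 2, axis=1))))
    return R, t, rmsd


if __name__ == "__main__":
    # Hypothetical test: B is a rotated and shifted copy of A, so the RMSD should be ~0.
    theta = np.radians(40.0)
    true_R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                       [np.sin(theta), np.cos(theta), 0.0],
                       [0.0, 0.0, 1.0]])
    A = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], dtype=float)
    B = A @ true_R.T + np.array([5.0, -3.0, 2.0])
    R, t, rmsd = kabsch_superpose(A, B)
    print("recovered translation:", t)
    print("RMSD after superposition:", rmsd)
```

For a fixed, equally weighted correspondence set this closed-form solution is already the least-squares optimum, so the refinement and iteration of steps 6 and 7 come into play when the correspondence set or the weights are revised between rounds, for example by dropping poorly fitting pairs.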

Back to our toy model…
1) Generate pairwise alignment
2) Find optimal superposition
   - Translation
   - Rotation

Superposition of a pair of CuZnSOD structures
Sequence identity = 83%; RMSD = 1.0 Å

Superposition of several CuZnSOD structures
Sequence identity = 68% ± 35%; RMSD = 1.6 Å ± 0.6 Å

Global vs. local superposition in Calmodulin
[Figure: calmodulin, ligand free and complexed with trifluoperazine]

Global vs. local superposition in Calmodulin
Global alignment: RMSD = 15 Å (143 residues)
Local alignment: RMSD = 0.9 Å (62 residues)

By itself, RMSD is not a very useful error function
For example, consider a series of fragments all generated from the blue structure…
Fragment   RMSD (Å)   Aligned   Z-score
1          0.0        95        17.3
2          0.0        101       18.4
3          0.0        40        3.7

Up-weighting secondary structure, etc.
Based on the assumption that secondary structure elements should match up better than coil, we can easily modify the RMSD calculation to reflect that. That is, a multiplier is applied to each term (x_1 for secondary structure, x_2 for coil, where x_1 > x_2) to up-weight the important parts. For example, assuming the red dots correspond to secondary structures in the figure above, RMSD’ < RMSD, which might be expected to be a more accurate reflection of the similarity between the pair.
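One way to write such a weighted RMSD is sketched below. The normalization by the sum of the weights, the default values x1 = 2 and x2 = 1, and the example distances are all assumptions made for illustration; the formula on the original slide did not survive in this transcript.

```python
import math


def plain_rmsd(distances):
    """Ordinary RMSD from a list of per-pair distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def weighted_rmsd(distances, in_sse, x1=2.0, x2=1.0):
    """sqrt(sum(w_i * d_i^2) / sum(w_i)), with w_i = x1 inside SSEs and x2 in coil."""
    if x1 <= x2:
        raise ValueError("expected x1 > x2 so that secondary structure is up-weighted")
    if len(distances) != len(in_sse):
        raise ValueError("need one secondary-structure flag per distance")
    weights = [x1 if flag else x2 for flag in in_sse]
    total = sum(w * d * d for w, d in zip(weights, distances))
    return math.sqrt(total / sum(weights))


# Hypothetical pair distances (in Å): three well-matched SSE residues, two loose coil residues.
DISTANCES = [0.5, 0.5, 0.5, 3.0, 3.0]
IN_SSE = [True, True, True, False, False]

if __name__ == "__main__":
    print("RMSD :", plain_rmsd(DISTANCES))
    print("RMSD':", weighted_rmsd(DISTANCES, IN_SSE))
```

With these numbers RMSD’ comes out below RMSD, as on the slide, because the well-matched secondary structure positions count double. The opposite would happen if the secondary structure elements were the poorly matched part.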

Template Modeling Score (TM-score)
The TM-score is a measure of similarity between two protein structures. It is intended as a more accurate measure of the quality of full-length protein structures than the often-used RMSD. The TM-score expresses the similarity of two structures as a value in (0, 1], where 1 indicates a perfect match. Generally, scores below 0.20 correspond to randomly chosen unrelated proteins, whereas structures with a score higher than 0.5 assume roughly the same fold. The TM-score is designed to be independent of protein length.
d_0 = normalization factor; d_i = distance between the i-th pair of residues; L_xxx = lengths of the target protein and of the alignment
Y. Zhang, J. Skolnick, Scoring function for automated assessment of protein structure template quality, Proteins, :
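In the symbols of the legend, the published definition is TM-score = max[ (1/L_target) Σ_{i=1..L_ali} 1 / (1 + (d_i/d_0)^2) ], with d_0 = 1.24 (L_target − 15)^(1/3) − 1.8 Å, where the maximum is taken over superpositions. The sketch below evaluates only the bracketed sum for one given superposition and does not perform the search. The function names and the 0.5 Å floor on d_0 for very short chains are assumptions added here so the sketch stays well defined.

```python
def tm_d0(l_target, floor=0.5):
    """Length-dependent distance scale d0 in Å: 1.24 * (L - 15)^(1/3) - 1.8."""
    if l_target <= 15:
        return floor  # assumption: guard, since the formula is not usable here
    return max(floor, 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances, l_target):
    """TM-score term for ONE superposition; `distances` holds d_i for the aligned pairs."""
    if l_target <= 0:
        raise ValueError("l_target must be positive")
    if len(distances) > l_target:
        raise ValueError("cannot align more residue pairs than the target has residues")
    d0 = tm_d0(l_target)
    # Unaligned target residues contribute nothing, which penalizes short alignments.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target


if __name__ == "__main__":
    length = 100
    print("d0 for a 100-residue target:", tm_d0(length))
    print("perfect match:", tm_score_fixed([0.0] * length, length))
    print("only half the target aligned, perfectly:", tm_score_fixed([0.0] * 50, length))
```

Unlike RMSD, each pair contributes at most 1/L_target no matter how far apart it is, so a few badly placed residues cannot dominate the score, and a perfect match over a short fragment cannot score as well as a match over the full length.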

RMSD vs TM-score
Example 1: RMSD = 12.1 Å, TM-score = 0.81
Example 2: RMSD = 12.5 Å, TM-score = 0.22
Images from Dr. Zhang at KU