Pharm 201 Lecture 10, 20091 Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne.

Slides:



Advertisements
Similar presentations
PhyCMAP: Predicting protein contact map using evolutionary and physical constraints by integer programming Zhiyong Wang and Jinbo Xu Toyota Technological.
Advertisements

PROTEOMICS 3D Structure Prediction. Contents Protein 3D structure. –Basics –PDB –Prediction approaches Protein classification.
A 3-D reference frame can be uniquely defined by the ordered vertices of a non- degenerate triangle p1p1 p2p2 p3p3.
Structural bioinformatics
Chapter 9 Structure Prediction. Motivation Given a protein, can you predict molecular structure Want to avoid repeated x-ray crystallography, but want.
Protein-DNA interactions: amino acid conservation and the effects of mutations on binding specificity Nicholas M. Luscombe and Janet M. Thornton JMB (2002)
Department of Computer Science, University of California, Santa Barbara August 11-14, 2003 CTSS: A Robust and Efficient Method for Protein Structure Alignment.
Agenda A brief introduction The MASS algorithm The pairwise case Extension to the multiple case Experimental results.
Appendix: Automated Methods for Structure Comparison Basic problem: how are any two given structures to be automatically compared in a meaningful way?
The Protein Data Bank (PDB)
Similar Sequence Similar Function Charles Yan Spring 2006.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Model Database. Scene Recognition Lamdan, Schwartz, Wolfson, “Geometric Hashing”,1988.
Blast heuristics Morten Nielsen Department of Systems Biology, DTU.
Detecting the Domain Structure of Proteins from Sequence Information Niranjan Nagarajan and Golan Yona Department of Computer Science Cornell University.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Protein Structures.
Or, What is a correspondence set anyway?! Topic 12 Chapter 16, Du and Bourne “Structural Bioinformatics”
Protein Structure Prediction and Analysis
Alignment Statistics and Substitution Matrices BMI/CS 576 Colin Dewey Fall 2010.
Protein Structure Alignment by Incremental Combinatorial Extension (CE) of the Optimal Path Ilya N. Shindyalov, Philip E. Bourne.
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
IBGP/BMI 705 Lab 4: Protein structure and alignment TA: L. Cooper.
BIONFORMATIC ALGORITHMS Ryan Tinsley Brandon Lile May 9th, 2014.
Protein Tertiary Structure Prediction
Chapter 12 Protein Structure Basics. 20 naturally occurring amino acids Free amino group (-NH2) Free carboxyl group (-COOH) Both groups linked to a central.
Structural alignment Protein structure Every protein is defined by a unique sequence (primary structure) that folds into a unique.
Structural alignments of Proteins using by TOPOFIT method Vitkup D., Melamud E., Moult J., Sander C. Completeness in structural genomics. Nature Struct.
Chapter 9 Superposition and Dynamic Programming 1 Chapter 9 Superposition and dynamic programming Most methods for comparing structures use some sorts.
Protein Sequence Alignment and Database Searching.
CRB Journal Club February 13, 2006 Jenny Gu. Selected for a Reason Residues selected by evolution for a reason, but conservation is not distinguished.
Structure superposition ≠ Structure alignment Lecture 11 Chapter 16, Du and Bourne “Structural Bioinformatics”
PDBe-fold (SSM) A web-based service for protein structure comparison and structure searches Gaurav Sahni, Ph.D.
1 Randomized Algorithms for Three Dimensional Protein Structures Comparison Yaw-Ling Lin Dept Computer Sci and Info Engineering, Providence University,
Gapped BLAST and PSI- BLAST: a new generation of protein database search programs By Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui.
RNA Secondary Structure Prediction Spring Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
PROTEIN STRUCTURE CLASSIFICATION SUMI SINGH (sxs5729)
Protein Folding Programs By Asım OKUR CSE 549 November 14, 2002.
Protein Structure Comparison. Sequence versus Structure The protein sequence is a string of letters: there is an optimal solution (DP) to the problem.
Multiple alignment: Feng- Doolittle algorithm. Why multiple alignments? Alignment of more than two sequences Usually gives better information about conserved.
Chapter 3 Computational Molecular Biology Michael Smith
Protein Classification II CISC889: Bioinformatics Gang Situ 04/11/2002 Parts of this lecture borrowed from lecture given by Dr. Altman.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Protein Tertiary Structure. Protein Data Bank (PDB) Contains all known 3D structural data of large biological molecules, mostly proteins and nucleic acids:
A data-mining approach for multiple structural alignment of proteins WY Siu, N Mamoulis, SM Yiu, HL Chan The University of Hong Kong Sep 9, 2009.
PROTEIN STRUCTURE SIMILARITY CALCULATION AND VISUALIZATION CMPS 561-FALL 2014 SUMI SINGH SXS5729.
Pair-wise Structural Comparison using DALILite Software of DALI Rajalekshmy Usha.
1 1 Slide Simulation Professor Ahmadi. 2 2 Slide Simulation Chapter Outline n Computer Simulation n Simulation Modeling n Random Variables and Pseudo-Random.
MINRMS: an efficient algorithm for determining protein structure similarity using root-mean-squared-distance Andrew I. Jewett, Conrad C. Huang and Thomas.
EMBL-EBI MSDfold (SSM) A web service for protein structure comparison and structure searches Eugene Krissinel
The statistics of pairwise alignment BMI/CS 576 Colin Dewey Fall 2015.
Jürgen Sühnel Supplementary Material: 3D Structures of Biological Macromolecules Exercise 1:
Final Report (30% final score) Bin Liu, PhD, Associate Professor.
Surflex: Fully Automatic Flexible Molecular Docking Using a Molecular Similarity-Based Search Engine Ajay N. Jain UCSF Cancer Research Institute and Comprehensive.
Lecture 11 CS5661 Structural Bioinformatics – Structure Comparison Motivation Concepts Structure Comparison.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
EMBL-EBI Eugene Krissinel SSM - MSDfold. EMBL-EBI MSDfold (SSM)
An Efficient Index-based Protein Structure Database Searching Method 陳冠宇.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Local Flexibility Aids Protein Multiple Structure Alignment Matt Menke Bonnie Berger Lenore Cowen.
Protein Structure Prediction: Threading and Rosetta BMI/CS 576 Colin Dewey Fall 2008.
4.2 - Algorithms Sébastien Lemieux Elitra Canada Ltd.
Using the Fisher kernel method to detect remote protein homologies Tommi Jaakkola, Mark Diekhams, David Haussler ISMB’ 99 Talk by O, Jangmin (2001/01/16)
EBI is an Outstation of the European Molecular Biology Laboratory. PDBe-fold (SSM) A web-based service for protein structure comparison and structure searches.
Chapter 14 Protein Structure Classification
Protein Structures.
Protein structure prediction.
Presentation transcript:

Pharm 201 Lecture 10, Reductionism and Classification Require Detailed Comparison Consider 3D Comparison Pharm 201/Bioinformatics I Philip E. Bourne Department of Pharmacology, UCSD Reading Chapter 16, Structural Bioinformatics

Pharm 201 Lecture 10, Consider this Course a Workflow Data InUnderstand the scope and complexity of the data Understand the experiment to understand the errors Understand how to best represent (model) the data Understand the methods to physically instantiate the model From initial analysis understand how to control data in Recognize redundancy In the data Classify the dataVisualize the data Analyze the data

Pharm 201 Lecture 10, From Last Time We established the complex relationship between: –Sequence and Structure –Structure and Structure –Structure and Function Today we analyze how the relationships between structure and structure are established

Pharm 201 Lecture 10, Agenda Understand why structure comparison is important Understand why it is not a solved problem Understand the basics of the methods used to address the problem Understand one method (CE) in more detail Review an example where structure comparison has revealed new biological insights

Pharm 201 Lecture 10, Why Structure Comparison is Important Reductionism – needed to classify protein structures Functional assignment and hopefully new biology Alignment of predicted structure against structural templates Establish improved sequence relationships not possible from sequence alone Protein engineering

Pharm 201 Lecture 10, Distinctions - Structure Superposition and Structure Comparison and Alignment are Different Structure superposition assumes you already know which atoms to superimpose – it merely optimizes for the atoms chosen (relatively simple) Structure alignment must first determine what atoms to align (difficult). We are concerned with alignment

Pharm 201 Lecture 10, Distinctions – Pair-wise Alignments are Different from Multiple Structure Alignments Multiple structure alignment algorithms are rare and of questionable quality (see for example Nucleic Acids Research (2004), 32 W100-W103 Multiple structure alignments should not be confused with multiple pair-wise alignments Here we focus on single pair-wise comparison and alignment

Pharm 201 Lecture 10, Why is it Not a Solved Problem?

Pharm 201 Lecture 10, Current State of the Art There are many papers published on this, but relatively few have code to download or Web sites from which to perform comparisons All methods can identify obvious similarities between two structures Remote similarities are detected by a subset of methods – different remote similarities are recognized by different methods Good alignments are much harder to come by Speed is a serious issue with some algorithms

Pharm 201 Lecture 10, Desirables Biologically meaningful alignments not just geometrically meaningful Complete database of all alignments Ability to apply to structures not in the PDB

Pharm 201 Lecture 10, Maintain 9 of 10 interactions RMSD 1.5 Å Maintain 5 of 10 interactions RMSD 0.5 Å Biological vs Geometric Alignments Plastocyanin versus Azurin (from Godzik 1996)

Pharm 201 Lecture 10, Literature Alignments - Flavodoxin vs Che Y Protein From Godzik (1996) Protein Science, 5,

Pharm 201 Lecture 10, Understand the basics of the methods used to address the problem

Pharm 201 Lecture 10, See also

Pharm 201 Lecture 10, How to Compare Structures? Structure 1 Structure 2 Feature extraction Structure description 1Structure description 2 Comparison algorithm Scores Statistical significance Similarity, classification

Pharm 201 Lecture 10, Components of Structure Alignment 1.Structure Description Local geometry Side chain contacts Geometric hashing Distance matrix (Dali, 1993) Properties (secondary structure, hydrophobic clusters (Comparer, 1990) Secondary structure elements (VAST, 1996) Distances of inter & intra aligned fragment pairs (CE, 1998) Contact map (Celera, 2004) Geometry invariants (Jia et al, 2004)

Pharm 201 Lecture 10, Components of Structure Alignment 2. Alignment algorithms –Monte Carlo (Dali, VAST) – Heuristics (CE) – Dynamic Programming (CE) – Probabilistic 3.Statistical significance

Components of Structure Alignment 2. Alignment algorithms  Dynamic programming, Integer programming, Monte Carlo… 3. Statistical significance  Levitt and Gerstein, PNAS, 1998  Random Model and CE scoring function (Jia et al, 2004)  Input & output of alignment algorithm Input: two proteins: and Output: An alignment and scores Constraints: min rmsd: max L min Gaps: 18

Pharm 201 Lecture 10, Understand one method (CE) in more detail I.N. Shindyalov and P.E. Bourne Protein Engineering 1998, 11(9) Protein Structure Alignment by Incremental Combinatorial Extension of the Optimum Path. [PDF File] 793 citations![PDF File]

Pharm 201 Lecture 10, Basic Approach Compare octameric fragments – an aligned fragment pair (AFP) (local alignments) Stitch together AFPs Find the optimal path through the AFPs Optimize the alignment through dynamic programming Measure the statistical significance of the alignment

Pharm 201 Lecture 10, Why This Approach? Alignment Space is Very Large and Must be Constrained Without Loosing Meaningful Alignments Similarity Matrix S where: S=(n A -m).(n B -m) m = Length of AFP n A = Length of protein A This is very large to compute – constraints are needed

Pharm 201 Lecture 10,

Pharm 201 Lecture 10, Definition of the Alignment Path p A i = AFPs starting residue position in protein A at the ith position of the alignment path m = longest continual path – set as 8 One of the conditions (1)-(3) should be satisfied for 2 consecutive AFPs i and i+1 in the path (1)= 2 consecutive AFPs aligned without gaps (2)= Two consecutive AFPs with a gap in protein A (3)= Two consecutive AFPs with a gap in protein B

Pharm 201 Lecture 10, Extension of the Alignment Path Gap sizes are limited to G – heuristically set as 30 residues

Pharm 201 Lecture 10, Distance calculated from independent set of inter-residue distances where each distance is used only once - used for combinations of 2 AFPs 2. Full set of inter-residue distances - used for a single AFP 3. RMSD from least squares superposition - used to select few best fragments Evaluation based upon the following three distance similarity measures

Pharm 201 Lecture 10, Distance calculated from independent set of inter-residue distances where each distance is used only once 2. Full set of inter-residue distances 3. RMSD from least squares superposition Evaluation Based Upon the Following Three Distance Similarity Measures

Pharm 201 Lecture 10, How to Extend the Path? 1. Consider all possible AFPs that extend the path 2. Consider only the best AFP 3. Use some intermediate strategy

Pharm 201 Lecture 10, How to Extend the Path? 1. Consider all possible AFPs that extend the path Computationally expensive 2. Consider only the best AFP Works well with the right heuristics 3. Use some intermediate strategy

Pharm 201 Lecture 10, What Heuristics? Candidate AFPs are based upon (9) D 0 = 3Å The best AFP is based upon (10) D 1 = 4Å The decision to extend or terminate the path is based upon (11)

Pharm 201 Lecture 10, Z-Score Evaluate the probability of finding an alignment path of the same length or smaller gaps and distance from a random set of non-redundant structures

Pharm 201 Lecture 10, The 20 best alignments with a Z score above 3.5 are assessed based on RMSD and the best kept. This produces approx. one error in 1000 structures Each gap in this alignment is assessed for relocation up to m/2 Iterative optimization using dynamic programming is performed using residues for the superimposed structures Optimization of the Final Path

Pharm 201 Lecture 10, Test Case: Phycocyanin versus Colicin A

Pharm 201 Lecture 10, Cyclin-dependent kinases Open (purple) Closed (blue) Pavelitch et al. (1997)

Pharm 201 Lecture 10, Limitations Will not find non- topological alignments (outside the bounds of the dotted lines) What are the correct “units” to be comparing? CE works on chains – as we shall see in future weeks domains are the correct units, but definition of the domains is not straightforward

Pharm 201 Lecture 10, Took 11,748 chain in the PDB (1/98) Computed for 1868 representatives 24,000 Cray T3E processor hours Loaded pairwise alignments into database Computation of All x All

Pharm 201 Lecture 10, Years Ago 40,000 proteins ~ 70,000 chains 70,000 2 /2 * 30 seconds = 2330 yrs Options: –Use a redundant set of chains –Use parallel architectures D. Pekurovsky, I.N. Shindyalov, P.E. Bourne 2004 High Throughput Biological Data Processing on Massively Parallel Computers. A Case Study of Pairwise Structure Comparison and Alignment Using the Combinatorial Extension (CE) Algorithm. Bioinformatics, 20(12) [PDF].[PDF].

Now Using egrid to compute all by all for CE and FatCat Pharm 201 Lecture 10,

Pharm 201 Lecture 10, One Criteria for Redundancy Remove highly homologous chains; The RMSD between two chains is less than 2Å; The length difference between two chains is less than 10%; The number of gap positions in alignment between two chains is less than 20% of aligned residue positions; At least 2/3 of the residue positions in the represented chain are aligned with the representing chain.

Pharm 201 Lecture 10, Review example where structure comparison has revealed new biological insights

Pharm 201 Lecture 10, Example CE revealed putative Ca++ binding domain in acetylcholinesterase Sequence similarity to neuroligins predicts Ca++ binding too – confirmed experimentally Members of the a/b hydrolase family bind Ca++ which may be important for heterologous cell associations Structural similarity between Acetylcholinesterase and Calmodulin found using CE (Tsigelny et al, Prot Sci, 2000, 9:180)

Pharm 201 Lecture 10, The Future (also a general rule) Gold standards are important For structure comparison a human generated alignment standard is important Algorithms are then challenged to meet the standard Eventually those algorithms highlight problems with the standard The cycle continues