Lecture 11 CS566, Slide 1: Structural Bioinformatics: Structure Comparison (Motivation, Concepts, Structure Comparison)


Slide 2: Motivation
Given structures A and B, are they similar?
– Implication: A and B might share the same set of functions
Given structure A, is a similar structure already known?
– Implication: each newly solved experimental structure can be placed in the context of the existing body of structural knowledge

Slide 3: Concepts
For n sequences and s corresponding structural equivalence classes, n >> s. Possible reasons:
– Structural divergence is slower than sequence divergence in evolution (à la RNA sequence alignment)
– Convergent evolution: some structures are preferred for a functional reason
– Coincidence: only so many structures are possible for a given threshold of similarity
Terms used to describe structure:
– Architecture, Class, Fold, Super-family, Family, ...

Slide 4: Superposition versus Alignment
Structural superposition versus structural alignment
– Superposition
  • Residue correspondence is already known, based on a statistically significant sequence alignment
  • The problem is to find the optimal rigid-body fit (rotation and translation) between two sets of points, given a subset of equivalent points between the two sets
  • Optimality is measured by the lowest value of the Root Mean Square Deviation (RMSD)
– Alignment
  • Residue correspondence is unknown
  • Needs a structure-based scoring function, evaluated over all possible superpositions of the structures
  • The optimal solution is frequently impractical because of high complexity (NP-hard. Why?)
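The superposition case can be sketched in code. The example below is a minimal illustration, not taken from the lecture: it computes the lowest RMSD over all rigid-body fits for two point sets whose correspondence is already known (point i of one set matches point i of the other). It assumes Horn's quaternion formulation, in which the minimum is obtained from the largest eigenvalue of a 4x4 symmetric matrix; the function names `min_rmsd` and `max_eigenvalue_symmetric` are hypothetical.

```python
import math

def centered(points):
    """Translate a list of (x, y, z) points so their centroid is at the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [tuple(p[k] - c[k] for k in range(3)) for p in points]

def max_eigenvalue_symmetric(matrix, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix (cyclic Jacobi rotations)."""
    a = [row[:] for row in matrix]
    n = len(a)
    scale = sum(x * x for row in a for x in row)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-26 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))

def min_rmsd(a_points, b_points):
    """Lowest RMSD over rigid-body fits, assuming point i matches point i."""
    if not a_points or len(a_points) != len(b_points):
        raise ValueError("point sets must be non-empty and of equal length")
    a, b = centered(a_points), centered(b_points)
    # 3x3 correlation matrix S[i][j] = sum over points of a_i * b_j
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n_mat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = max_eigenvalue_symmetric(n_mat)
    ga = sum(x * x for p in a for x in p)
    gb = sum(x * x for p in b for x in p)
    # Clamp at zero to absorb floating-point round-off for near-perfect fits.
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam) / len(a)))
```

A rigidly rotated and translated copy of a structure should give an RMSD close to zero, while the plain coordinate-by-coordinate RMSD of the same two sets can be large. The alignment case is harder precisely because the pairing passed to `min_rmsd` is not known in advance.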

Slide 5: Heuristic Structure Alignment
General strategy
– Summarized/reduced representation of each structure
  • Consider only subsets of atoms (just Cα or Cβ)
  • Use summarized vectors to represent organized sub-structures
– Approaches
  • Dynamic programming with empirical scoring functions
  • Distance-matrix correspondence in internal coordinate space
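The reduced representation can be illustrated with a short sketch that keeps only the Cα atoms of a structure. It assumes the input is text in the fixed-column PDB format (atom name in columns 13-16, x/y/z in columns 31-54); the function `ca_coordinates` and the helper `atom_line`, which only builds sample records for the demonstration, are hypothetical names and not part of the lecture.

```python
def ca_coordinates(pdb_lines):
    """Return (x, y, z) tuples for every C-alpha ATOM record, in file order.

    Assumes fixed-column PDB format. Alternate locations and multiple
    chains or models are not handled in this sketch.
    """
    coords = []
    for line in pdb_lines:
        # " CA " (with the leading blank) is C-alpha; "CA  " would be calcium.
        if line.startswith("ATOM") and line[12:16] == " CA ":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords

def atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Hypothetical helper: format one ATOM record for the example below."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

# Made-up coordinates, for illustration only.
example = [
    atom_line(1, " N  ", "ALA", "A", 1, 11.104, 6.134, -6.504),
    atom_line(2, " CA ", "ALA", "A", 1, 11.639, 6.071, -5.147),
    atom_line(3, " C  ", "ALA", "A", 1, 13.129, 6.362, -5.207),
    atom_line(4, " CA ", "GLY", "A", 2, 15.287, 5.782, -4.189),
]
ca_trace = ca_coordinates(example)
```

Here `ca_trace` holds one point per residue, which is the kind of input the heuristic methods on the following slides work from.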

Slide 6: Heuristic Structure Alignment
Vector-based strategies
– VAST (Vector Alignment Search Tool): compare summarized vector representations
– SSAP (Sequential Structure Alignment Program): compare nearest-neighbor vectors by double dynamic programming
Distance-matrix comparisons (à la dot-matrix)
– DALI (Distance mAtrix aLIgnment): subset of internal coordinate space

Slide 7: VAST alignment (Fig )
Use only a subset of atom coordinates
Replace atom coordinates with vector coordinates corresponding to secondary structural elements (“structural words”)
Compare sets of vectors to assess similarity
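The idea of replacing atoms with one vector per secondary structural element can be sketched as follows. This is an illustration of the representation only, not VAST's actual procedure: it assumes each element is given as a start/end index range into a Cα trace, approximates the element by the vector from its first to its last Cα, and compares two elements by the angle between their vectors. The names `sse_vector` and `angle_deg` are hypothetical.

```python
import math

def sse_vector(ca_coords, start, end):
    """Vector from the first to the last C-alpha of the element [start, end)."""
    if end - start < 2:
        raise ValueError("an element needs at least two residues")
    first, last = ca_coords[start], ca_coords[end - 1]
    return tuple(last[k] - first[k] for k in range(3))

def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so round-off cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

# Two made-up elements: one running along x, one along y.
trace = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0),
         (3.0, 1.0, 0.0), (3.0, 2.5, 0.0), (3.0, 4.0, 0.0)]
helix_like = sse_vector(trace, 0, 3)
strand_like = sse_vector(trace, 3, 6)
inter_element_angle = angle_deg(helix_like, strand_like)
```

Comparing the pairwise angles (and, in a fuller treatment, distances) between elements of two structures gives a much smaller problem than comparing all atoms.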

Slide 8: Double dynamic programming (SSAP/CATH Fig )
“First level”:
– Represent each residue by the neighborhood vector for its Cβ
– Compare n versus m neighborhood vectors
– Generate optimal alignment based on vector differences and dynamic programming
“Second level”:
– Add matrix scores if paths cross in a cumulative matrix
– Generate optimal alignment based on the cumulative matrix
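Both levels rely on the same building block: a dynamic programming pass that finds the best path through a residue-by-residue score matrix. The sketch below shows only that single pass, in the style of global (Needleman-Wunsch) alignment with a linear gap penalty; it does not implement SSAP's two-level scheme. The function name `align_by_scores` and the gap value are assumptions made for illustration.

```python
def align_by_scores(score, gap=-1.0):
    """Global alignment over a score matrix; returns (total score, matched pairs).

    score[i][j] is the similarity of residue i in structure A and residue j
    in structure B (for example, derived from neighborhood-vector differences).
    """
    n, m = len(score), len(score[0])
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score[i - 1][j - 1],  # match i with j
                           dp[i - 1][j] + gap,                      # skip residue in A
                           dp[i][j - 1] + gap)                      # skip residue in B
    # Trace back from the bottom-right corner to recover the matched pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or dp[i][j] == dp[i - 1][j] + gap):
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs

# Made-up scores: residue 0 of A resembles residue 0 of B, residue 1 of A resembles residue 2 of B.
total, matched = align_by_scores([[2.0, -1.0, -1.0],
                                  [-1.0, -1.0, 2.0]])
```

In the double dynamic programming scheme, the scores accumulated from many first-level passes form the cumulative matrix, and a final pass of this kind over that matrix yields the alignment.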

Slide 9: Distance-matrix-based alignment (DALI Fig )
Generate a dot-matrix of inter-Cα distances, using a threshold
Pick out secondary structure elements based on matrix patterns
Compare the two matrices to generate a structural alignment
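The first step can be sketched directly: compute all pairwise Cα distances within one structure, then apply a threshold to obtain a dot-matrix (contact map). The sketch also includes a crude comparison of two distance matrices under a known residue pairing, to show why internal distances are convenient: they do not change when a structure is rotated or translated. This is not DALI's algorithm, which matches sub-matrices without a known pairing; the function names and the 4.0 threshold in the example are assumptions.

```python
import math

def distance_matrix(ca_coords):
    """All pairwise C-alpha distances within one structure."""
    return [[math.dist(p, q) for q in ca_coords] for p in ca_coords]

def contact_map(dist, threshold):
    """Dot-matrix: True where two residues lie within the threshold distance."""
    return [[d <= threshold for d in row] for row in dist]

def mean_distance_difference(dist_a, dist_b):
    """Mean absolute difference of two equal-sized distance matrices."""
    n = len(dist_a)
    if n != len(dist_b):
        raise ValueError("matrices must have the same size")
    total = sum(abs(dist_a[i][j] - dist_b[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)

# Made-up coordinates; the second set is the first rotated 90 degrees about z and shifted.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
moved = [(-y + 10.0, x - 2.0, z + 1.0) for (x, y, z) in coords]

dm = distance_matrix(coords)
dots = contact_map(dm, threshold=4.0)
difference = mean_distance_difference(dm, distance_matrix(moved))
```

Because `difference` stays at zero for a rigidly moved copy, no superposition step is needed before comparing matrices, which is the appeal of working in internal coordinate space.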

Slide 10: Structure Comparison Databases
Several databases (CATH, MMDB, FSSP) maintain a hierarchical classification of known structures, based on pairwise structural alignment scores
The high complexity of the algorithms requires incremental additions
The actual classification is algorithm-dependent: there is some consensus, but significant differences exist

Slide 11: Summary
Sequence similarity (> 50% identity) implies structural similarity. The converse is not necessarily true (evolutionary convergence/information convergence)
Structural similarity algorithms are heuristic ways to assess structural similarity, independent of sequence similarity
Structural variation is smaller than that suggested by the number of possible sequences