Sequence order independent structural alignment Joe Dundas, Andrew Binkowski, Bhaskar DasGupta, Jie Liang Department of Bioengineering/Bioinformatics,


Sequence order independent structural alignment Joe Dundas, Andrew Binkowski, Bhaskar DasGupta, Jie Liang Department of Bioengineering/Bioinformatics, University of Illinois at Chicago

Background
o Extended Central Dogma of molecular biology: DNA → RNA → primary structure → 3D structure → function.
o Evolution conserves 3D structure more than amino acid sequence.
o Structural similarity often reflects a common function or origin of proteins. [1]
o It is useful to classify proteins based on their structures (SCOP, CATH, FSSP).
o Many methods for structure alignment have been reported (CE, DALI, FAST, Matchprot).

Circular Permutation
o Ligation of the N and C termini, and subsequent cleavage elsewhere.
o In 1979, the first natural circular permutation was observed in favin vs. concanavalin A. [2]
o In 1983, the first engineered circular permutation was performed on bovine pancreatic trypsin inhibitor. [3]
o Since then, studies have shown that artificially permuted proteins are able to fold into stable structures that are similar to the native protein. [4]
o Circular permutations have been discovered in lectins, β-glucanases, swaposin… [5]
Uliel S., Fliess A., Amir A., Unger R. (1999) [6]
Uliel S., Fliess A., Unger R. (2001) [7]
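At the sequence level, joining the two termini and cutting the chain at a new position amounts to rotating the residue string. The following is a minimal sketch of that idea; the function names and the toy sequences are illustrative assumptions and are not taken from the slides.

```python
def circular_permutation(sequence: str, cut: int) -> str:
    """Join the two ends of `sequence` and reopen the chain just before index `cut`."""
    if not 0 <= cut < len(sequence):
        raise ValueError("cut must be a valid index into the sequence")
    return sequence[cut:] + sequence[:cut]


def is_circular_permutation(a: str, b: str) -> bool:
    """True if `b` is some rotation of `a` (exact string match, no mutations)."""
    return len(a) == len(b) and b in a + a


if __name__ == "__main__":
    original = "ABCDEFG"  # toy sequence, not a real protein
    permuted = circular_permutation(original, 3)
    print(permuted)
    print(is_circular_permutation(original, permuted))
```

Real circular permutants also accumulate substitutions and insertions, so an exact rotation test like this only illustrates the topology change. A sequence-order-dependent aligner cannot follow the break where the old termini were joined, which is why this case matters for structural alignment.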

Alignment Problem
o Most structural alignment methods rely on the structural units of each protein aligning sequentially, e.g. CE, FAST.
o Some newer methods perform non-sequential alignments, e.g. Dali, Matchprot.
After explaining our method, we will compare the results against Dali and Matchprot.

Our Method
We exhaustively fragment protein A and protein B into fragments of 4 to 7 residues.
Notation:
o A fragment $\lambda_a = (a_1, a_2)$, where $a_1$ and $a_2$ are the beginning and ending positions relative to the N-terminus of protein A.
o $\Pi_a = \{\lambda_{a,1}, \lambda_{a,2}, \ldots, \lambda_{a,n}\}$ is the set of all fragments from protein A.
o $L_{a,i}$ is the length of fragment $\lambda_{a,i}$.
Each fragment from protein A is aligned to all fragments of protein B for which $L_{a,i} = L_{b,j}$, forming a set of Aligned Fragment Pairs ($\Lambda \subseteq \Pi_a \times \Pi_b$).
A similarity function $\sigma$ maps each aligned fragment pair in $\Lambda$ to a similarity score.
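The exhaustive fragmentation and the equal-length pairing can be sketched as follows. This is only an illustration: fragments are represented as 1-based inclusive `(start, end)` tuples, the function names are hypothetical, and no structural superposition or scoring is performed.

```python
from itertools import product


def fragments(n_residues, min_len=4, max_len=7):
    """All (start, end) fragments, 1-based inclusive, of length min_len..max_len."""
    return [
        (start, start + length - 1)
        for length in range(min_len, max_len + 1)
        for start in range(1, n_residues - length + 2)
    ]


def aligned_fragment_pairs(n_a, n_b, min_len=4, max_len=7):
    """Pair every fragment of protein A with every equal-length fragment of protein B."""
    frags_a = fragments(n_a, min_len, max_len)
    frags_b = fragments(n_b, min_len, max_len)
    return [
        (fa, fb)
        for fa, fb in product(frags_a, frags_b)
        if fa[1] - fa[0] == fb[1] - fb[0]  # same fragment length
    ]


if __name__ == "__main__":
    # Toy chain lengths; real proteins are far longer.
    pairs = aligned_fragment_pairs(8, 9)
    print(len(pairs), pairs[:3])
```

The number of candidate pairs grows roughly with the product of the two chain lengths, which is why the next step keeps only pairs whose similarity exceeds a threshold.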

Similarity Function
All $\Lambda_i$ with $\sigma(\Lambda_i) >$ threshold are used to create a conflict graph.

Conflict Graph
Two fragment pairs $\Lambda_i$ and $\Lambda_j$ are in conflict if any residue in $\lambda_{i,A}$ is also in $\lambda_{j,A}$ or any residue in $\lambda_{i,B}$ is also in $\lambda_{j,B}$. Conflicts can be found by a vertex sweep.
[Figure: simplified example with fragment pairs δ1, δ2, δ3, δ4; axes: query protein residues, reference protein residues]
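The conflict rule can be sketched directly from its definition. For brevity this example uses a plain all-pairs comparison rather than a sweep, and it assumes each aligned fragment pair is stored as `((a1, a2), (b1, b2))` with 1-based inclusive residue ranges; the names are illustrative.

```python
def overlaps(f, g):
    """True if two inclusive residue ranges share at least one residue."""
    return f[0] <= g[1] and g[0] <= f[1]


def in_conflict(p, q):
    """Two aligned fragment pairs conflict if they reuse a residue in A or in B."""
    return overlaps(p[0], q[0]) or overlaps(p[1], q[1])


def conflict_graph(pairs):
    """Return the set of edges (i, j), i < j, between conflicting pairs."""
    edges = set()
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            if in_conflict(pairs[i], pairs[j]):
                edges.add((i, j))
    return edges


if __name__ == "__main__":
    # Toy data: ((start_A, end_A), (start_B, end_B))
    pairs = [
        ((1, 4), (10, 13)),
        ((3, 6), (20, 23)),   # shares residues 3-4 of A with the first pair
        ((8, 11), (21, 24)),  # shares residues 21-23 of B with the second pair
        ((20, 23), (30, 33)),
    ]
    print(sorted(conflict_graph(pairs)))
```

A non-conflicting subset of vertices in this graph corresponds to an alignment in which every residue is used at most once, with no requirement that the fragments appear in the same order in both chains.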

LP Formulation
x is a relaxed integer variable between 0 and 1 (0 = don't use fragment, 1 = use fragment).
Subject to:
o No conflicting residues in query or reference protein.
o Consistency between variables.
o All variables are between 0 and 1.
Solve using a linear programming package.
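As a rough illustration of what such a relaxation looks like, the sketch below builds the data for a simplified LP: maximize the total similarity of the selected pairs, with one constraint $x_i + x_j \le 1$ per conflict edge and $0 \le x_i \le 1$. This edge-based form is an assumption made for the example; the formulation in the slides also involves consistency constraints between variables and may differ. All names and numbers are hypothetical.

```python
def build_lp(scores, conflict_edges):
    """Return (objective, rows, rhs, bounds) for: max sum(s_i * x_i), rows @ x <= rhs."""
    n = len(scores)
    objective = [float(s) for s in scores]
    rows, rhs = [], []
    for i, j in sorted(conflict_edges):
        row = [0.0] * n
        row[i] = row[j] = 1.0  # x_i + x_j <= 1: at most one of two conflicting pairs
        rows.append(row)
        rhs.append(1.0)
    bounds = [(0.0, 1.0)] * n  # relaxed from {0, 1} to the interval [0, 1]
    return objective, rows, rhs, bounds


def is_feasible(x, rows, rhs, bounds, tol=1e-9):
    """Check a candidate solution against the bounds and the conflict constraints."""
    in_bounds = all(lo - tol <= xi <= hi + tol for xi, (lo, hi) in zip(x, bounds))
    rows_ok = all(
        sum(a * xi for a, xi in zip(row, x)) <= b + tol for row, b in zip(rows, rhs)
    )
    return in_bounds and rows_ok


def objective_value(x, objective):
    return sum(c * xi for c, xi in zip(objective, x))


if __name__ == "__main__":
    scores = [5.0, 3.0, 2.0, 1.0]     # made-up similarity scores
    edges = {(0, 1), (1, 2), (1, 3)}  # made-up conflict edges
    objective, rows, rhs, bounds = build_lp(scores, edges)
    candidate = [1.0, 0.0, 1.0, 1.0]
    print(is_feasible(candidate, rows, rhs, bounds), objective_value(candidate, objective))
```

The returned lists are in the usual inequality form that general-purpose LP solvers accept. A solver that minimizes would be given the negated objective.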

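Local Conflict Number
o LP will assign a number between 0 and 1 to each $x_\delta$.
o For each $\Lambda$, compute a local conflict number $\Theta$.
o Define $\delta_{min}$ as the vertex with the smallest local conflict number.
o Assign a new $\sigma$.
o Remove all vertices with $\sigma \le 0$ from $\Lambda$ and push them onto a stack $\Omega$ in descending order of $\sigma$.
Example (vertices δ1, δ2, δ3, δ4):
o $\sigma(\Lambda_1) = 50$, $x_{\Lambda_1} = 0.85$, $\Theta_{\Lambda_1} = 1.10$
o $\sigma(\Lambda_2) = 20$, $x_{\Lambda_2} = 0.25$, $\Theta_{\Lambda_2} = 1.46$
o $\sigma(\Lambda_3) = 20$, $x_{\Lambda_3} = 0.6$, $\Theta_{\Lambda_3} = 0.85$
o $\sigma(\Lambda_4) = 15$, $x_{\Lambda_4} = 0.01$, $\Theta_{\Lambda_4} = 0.26$ ($\delta_{min}$)
After assigning the new $\sigma$: $\sigma(\Lambda_1) = 50$, $\sigma(\Lambda_2) = 15$, $\sigma(\Lambda_3) = 20$, $\sigma(\Lambda_4) = 0$.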
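One plausible reading of the local conflict number, used here purely as an assumption, is the LP value of a vertex plus the LP values of all vertices it conflicts with. The sketch below computes that quantity and picks the vertex with the smallest value. The toy graph and LP values are invented for the example, and the rule for assigning the new $\sigma$ is not reproduced.

```python
def local_conflict_numbers(x, conflict_edges):
    """Theta[v] = x[v] + sum of x[u] over neighbours u of v (assumed definition)."""
    theta = list(x)
    for i, j in conflict_edges:
        theta[i] += x[j]
        theta[j] += x[i]
    return theta


def min_conflict_vertex(theta, active):
    """Vertex among `active` with the smallest local conflict number."""
    return min(active, key=lambda v: theta[v])


if __name__ == "__main__":
    x = [0.5, 0.5, 0.0, 1.0]  # made-up fractional LP solution
    edges = {(0, 1), (1, 2)}  # made-up conflict edges
    theta = local_conflict_numbers(x, edges)
    print(theta, min_conflict_vertex(theta, range(len(x))))
```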

Repeat
Repeat the LP formulation until all vertices have been pushed onto the stack $\Omega$.
Begin with 5 empty alignments. While the stack is not empty, retrieve an aligned pair by popping the stack. Insert it into each non-empty alignment if and only if:
1. No residue conflicts occur.
2. The global RMSD does not change by more than some threshold.
If it cannot be inserted into any alignment, insert it into an available empty alignment.
Determine which alignment has the highest similarity score.
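The stack-driven assembly described above can be sketched as follows. Several things here are assumptions for illustration: `rmsd_of` and `score_of` are hypothetical callables supplied by the caller (the RMSD computation itself is not shown), pairs are `((a1, a2), (b1, b2))` tuples, and the RMSD tolerance value is arbitrary.

```python
def overlaps(f, g):
    return f[0] <= g[1] and g[0] <= f[1]


def in_conflict(p, q):
    return overlaps(p[0], q[0]) or overlaps(p[1], q[1])


def assemble(stack, rmsd_of, max_alignments=5, rmsd_tol=0.5):
    """Pop pairs off the stack and grow up to `max_alignments` alignments."""
    alignments = [[] for _ in range(max_alignments)]
    while stack:
        pair = stack.pop()
        placed = False
        for aln in alignments:
            if not aln:
                continue  # empty alignments are only used as a fallback
            if any(in_conflict(pair, other) for other in aln):
                continue  # rule 1: no residue may be reused
            if abs(rmsd_of(aln + [pair]) - rmsd_of(aln)) > rmsd_tol:
                continue  # rule 2: global RMSD must stay within the tolerance
            aln.append(pair)
            placed = True
        if not placed:
            empty = next((aln for aln in alignments if not aln), None)
            if empty is not None:
                empty.append(pair)  # start a new alignment if one is available
    return alignments


def best_alignment(alignments, score_of):
    """Alignment with the highest total similarity score."""
    return max(alignments, key=lambda aln: sum(score_of(p) for p in aln))


if __name__ == "__main__":
    p1, p2, p3 = ((1, 4), (1, 4)), ((3, 6), (10, 13)), ((10, 13), (20, 23))
    scores = {p1: 50, p2: 20, p3: 15}          # made-up similarity scores
    result = assemble([p3, p2, p1], rmsd_of=lambda aln: 0.0)  # dummy RMSD
    print(best_alignment(result, scores.get))
```

Keeping several candidate alignments lets conflicting high-scoring pairs seed separate solutions, and the final comparison picks whichever accumulated the most similarity.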

Results – Circular Permutation?
1jqsC: 70S ribosome functional complex. Fold: Ribosome & Ribosomal fragments
2pii: PII (product of glnB). Fold: Ferredoxin-like
RMSD:

Results – Circular Permutation
1iudA: Aspartate racemase. Fold: ATC-like
1h0rA: Type II 3-dehydroquinate dehydratase. Fold: Flavodoxin

Results
1fe0: ATX1 metallochaperone. Fold: Ferredoxin-like
1vet: Mitogen-activated protein kinase

Results
1e50: Core binding factor. Fold: Core binding factor beta
1pkv: Riboflavin synthase