Identification of Domains using Structural Data
Niranjan Nagarajan, Department of Computer Science, Cornell University

Assorted Definitions of Domains
– Subsequences that can fold independently into a stable structure.
– Structurally compact substructures.
– Functionally well-defined building blocks.
– Evolutionarily conserved and reused fragments.

Protein Structural Domain Identification
William R. Taylor

Basic Algorithm
– Initial assignment of labels
  – Sequential residue numbering
– Update of labels
– Termination condition
  – Mean squared deviation of average between successive cycles (length of protein)/2

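Update Formula
$$S_i^{t+1} = S_i^{t} + \mathrm{step}(t+1)\,\mathrm{sign}\Big(\sum_{j} f\big(S_i^{t}, S_j^{t}\big)\Big) \qquad \forall i$$
$$\mathrm{sign}(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \end{cases}$$
$$f\big(S_i^{t}, S_j^{t}\big) = \begin{cases} r/d_{ij} & \text{if } S_j^{t} > S_i^{t} \text{ and } d_{ij} < r \\ -r/d_{ij} & \text{if } S_j^{t} < S_i^{t} \text{ and } d_{ij} < r \\ 0 & \text{otherwise} \end{cases}$$
$$\mathrm{step}(x) = \begin{cases} 1 & \text{if } x < N/2 \\ 2(N-x)/N & \text{if } N/2 \le x < N \\ 0 & \text{otherwise} \end{cases}$$
Here $S_i^{t}$ is the label of residue $i$ after cycle $t$, $d_{ij}$ is the distance between residues $i$ and $j$, and $r$ is the neighborhood radius. Because $\mathrm{step}(x) = 0$ for $x \ge N$, labels stop changing after cycle $N$.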
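
The update rule can be sketched in Python as below. This is a minimal sketch under stated assumptions, not Taylor's implementation: each residue is one 3D point (e.g. its C-alpha, an assumption), all labels are updated synchronously from the cycle-t labels as the formula implies, and a fixed N cycles are run instead of the slide's mean-squared-deviation test (after cycle N the step is 0 anyway). The function names and the `weighted` flag are hypothetical; `weighted=False` replaces r/d_ij by ±1, which is how the Example slide describes its label evolution.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def step(x: float, n_cycles: int) -> float:
    """Step-size schedule: 1 for x < N/2, then 2(N - x)/N, and 0 once x >= N."""
    if x < n_cycles / 2:
        return 1.0
    if x < n_cycles:
        return 2.0 * (n_cycles - x) / n_cycles
    return 0.0


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def f(s_i: float, s_j: float, d_ij: float, r: float, weighted: bool = True) -> float:
    """Contribution of residue j to the update of residue i's label."""
    # Only neighbours strictly inside radius r count; a zero distance is skipped
    # to avoid dividing by zero (an assumption, not from the slides).
    if d_ij >= r or d_ij <= 0.0 or s_j == s_i:
        return 0.0
    w = r / d_ij if weighted else 1.0  # weighted=False drops inverse-distance weighting
    return w if s_j > s_i else -w


def evolve_labels(coords: Sequence[Point], r: float, n_cycles: int,
                  weighted: bool = True) -> List[float]:
    n = len(coords)
    labels = [float(i) for i in range(n)]  # initial labels: sequential residue numbering
    dist = [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]
    for t in range(n_cycles):
        s = step(t + 1, n_cycles)
        # Synchronous update: every new label is computed from the cycle-t labels.
        labels = [
            labels[i] + s * sign(sum(f(labels[i], labels[j], dist[i][j], r, weighted)
                                     for j in range(n) if j != i))
            for i in range(n)
        ]
    return labels


if __name__ == "__main__":
    # Two compact groups of five residues, far apart: each group collapses to one label.
    coords = [(float(x), 0.0, 0.0) for x in range(5)] + [(100.0 + x, 0.0, 0.0) for x in range(5)]
    print(evolve_labels(coords, r=5.0, n_cycles=10, weighted=False))
    # -> [2.0, 2.0, 2.0, 2.0, 2.0, 7.0, 7.0, 7.0, 7.0, 7.0]
```

The decaying step matters: two neighbouring residues labelled 0 and 1 simply swap labels while the step is 1, and only meet (at 0.5 each, with N = 4) once the step drops to 0.5.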

Example
Full lines indicate the protein backbone. Neighboring residues within radius r are connected by dashed lines. Connections between i and i + 2 have been omitted for clarity. Label evolution is done without inverse distance weighting.

Refinements
– Median-based smoothing with a window size of 21 reclaims short loops of 10 or fewer residues.
– Small domains are reassigned using the weighted mean of their neighbors' labels (weights are given by f).
– Domain recalculation is repeated at most five times.
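
Why a window of 21 pairs with loops of 10 or fewer residues: 21 = 2 × 10 + 1, so a run of at most 10 residues flanked on both sides by the same label fills at most 10 of the 21 positions in any window, and the flanking label (at least 11 copies) is always the median. The sketch below assumes labels are numeric, clips the window at the chain ends, and uses `statistics.median_low` so every output is one of the input labels; these choices and the name `median_smooth` are assumptions, not details from the slides.

```python
from statistics import median_low
from typing import List, Sequence


def median_smooth(labels: Sequence[float], window: int = 21) -> List[float]:
    """Replace each label by the median of a window centred on it.

    The window is clipped at the chain ends (assumption); median_low returns
    an actual input value even when a clipped window has an even length.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    labels = list(labels)
    return [median_low(labels[max(0, i - half): i + half + 1]) for i in range(len(labels))]


if __name__ == "__main__":
    # A 10-residue insertion inside domain 1 is absorbed ...
    print(set(median_smooth([1] * 30 + [2] * 10 + [1] * 30)))  # -> {1}
    # ... but an 11-residue one survives at its centre.
    print(median_smooth([1] * 30 + [2] * 11 + [1] * 30)[35])  # -> 2
```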

Preserving  -sheets Matrix B of possible  -sheet interactions between residues generated based on distance data and heuristics. Weighted mean heuristic used to generate initial assignment of labels with the averaging being iterated to convergence. Post-processing also done to badly broken  -sheets.

Self-testing with fake homologs
– Fake homologs are generated by smoothing:
  – Replace the central atom of each triple by the average.
  – Repeat the process five times.
– Domain assignments are compared and their similarity is evaluated with an overlap score.
– r is optimized for the best overlap score.
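
The smoothing that produces a fake homolog can be sketched as follows, assuming one point per residue (e.g. C-alpha), that "the average" means the mean of the three atoms in the triple, that each pass uses the previous pass's coordinates, and that the two chain ends (which have no full triple) stay fixed. These are all assumptions, `smooth_chain` is a hypothetical name, and the overlap score is not reproduced because the slides do not define it.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def smooth_chain(coords: Sequence[Point], passes: int = 5) -> List[Point]:
    """Make a 'fake homolog' by repeatedly smoothing a chain of residue points.

    Each pass replaces every interior point by the mean of itself and its two
    chain neighbours (the triple), using the previous pass's coordinates.
    The two terminal points have no full triple and are left unchanged (assumption).
    """
    pts: List[Point] = [tuple(map(float, p)) for p in coords]
    for _ in range(passes):
        new = list(pts)
        for i in range(1, len(pts) - 1):
            a, b, c = pts[i - 1], pts[i], pts[i + 1]
            new[i] = tuple((a[k] + b[k] + c[k]) / 3.0 for k in range(3))
        pts = new
    return pts


if __name__ == "__main__":
    zigzag = [(0, 0, 0), (1, 1, 0), (2, 0, 0), (3, 1, 0), (4, 0, 0)]
    for p in smooth_chain(zigzag):
        print(p)  # x stays 0..4 while the zigzag in y is flattened
```

The smoothed chain keeps the overall fold but perturbs local geometry, so running the domain assignment on both versions and comparing the results gives a self-consistency check for tuning r.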

Extension to Multiple Structures
– The algorithm is run simultaneously on the structures corresponding to a multiple sequence alignment.
– After each iteration, the labels at each alignment position are synchronized to the average of the labels at that position.
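
The synchronization step can be sketched as a column-wise average over the alignment. The data layout is an assumption: `alignment[s][c]` holds the residue index of structure s in column c, or `None` for a gap, and gapped residues simply do not take part in that column's average. `synchronize_labels` is a hypothetical name; it would be called once after every update cycle of the per-structure algorithm.

```python
from typing import List, Optional, Sequence


def synchronize_labels(labels: Sequence[Sequence[float]],
                       alignment: Sequence[Sequence[Optional[int]]]) -> List[List[float]]:
    """Set every aligned residue's label to the mean label of its alignment column.

    labels[s][k]    -- current label of residue k in structure s
    alignment[s][c] -- residue index of structure s in column c, or None for a gap
    """
    out = [list(row) for row in labels]
    n_cols = len(alignment[0]) if alignment else 0
    for c in range(n_cols):
        members = [(s, row[c]) for s, row in enumerate(alignment) if row[c] is not None]
        if not members:
            continue
        mean = sum(labels[s][k] for s, k in members) / len(members)
        for s, k in members:
            out[s][k] = mean
    return out


if __name__ == "__main__":
    labels = [[0.0, 1.0, 2.0], [0.0, 3.0]]
    alignment = [[0, 1, 2], [0, None, 1]]  # structure 1 has a gap in column 1
    print(synchronize_labels(labels, alignment))  # -> [[0.0, 1.0, 2.5], [0.0, 2.5]]
```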