Segment Alignment (SEA) Yuzhen Ye Adam Godzik The Burnham Institute.



Outline: A new look at local structure prediction; Network matching problem; Practical issues; Applications

Description of local structure: one or many answers?
Sequence: GSDKKGNGVALMTTLFADN
A prediction: LLHHHHHHHHLLLEEEEEE
Real structure: HHHHHHHHLLLLLHHHHHH
[Figure: candidate local structure segments for this sequence, including EEEEEE, HHHHHHHHHHHHHH, EEEEEE, LLHHHHHHHHLLL, LHHHHHLLLL, LLLEEEEEEEEE and LLLLL]

Motivation
A natural description of local structures: keep the segment information of local structures.
Keep the uncertainties in local structure predictions: these arise from the drawbacks of prediction programs and from the intrinsic uncertainty of local structures in the absence of global interactions.
Incorporating protein local structure into protein sequence comparison may help to detect distant homologies and to improve their alignments (for homology modeling).

Proteins are described as a network of PLSSs (predicted local structure segments)
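The network description can be illustrated with a minimal Python sketch. It is not the SEA implementation: the `PLSS` tuple, the `build_network` helper and the rule that one segment may follow another only when it starts exactly where the other ends (`max_skip=0`) are all assumptions made for illustration. The candidate segments are read off the two label strings shown on the "Description of local structure" slide.

```python
from typing import NamedTuple


class PLSS(NamedTuple):
    """Hypothetical record for one predicted local structure segment."""
    start: int  # index of the first residue (0-based)
    end: int    # index one past the last residue
    kind: str   # "H" (helix), "E" (strand) or "L" (loop)


def build_network(segments, max_skip=0):
    """Return segments sorted by position and, for each, its predecessors.

    Assumption: segment q may follow segment p when q starts at most
    `max_skip` residues after p ends, so overlapping segments are
    alternatives that never lie on the same path.
    """
    ordered = sorted(segments, key=lambda s: (s.start, s.end))
    preds = {q: [] for q in range(len(ordered))}
    for q, seg_q in enumerate(ordered):
        for p, seg_p in enumerate(ordered):
            if 0 <= seg_q.start - seg_p.end <= max_skip:
                preds[q].append(p)
    return ordered, preds


# Segments of LLHHHHHHHHLLLEEEEEE and HHHHHHHHLLLLLHHHHHH over 19 residues.
candidates = [
    PLSS(0, 2, "L"), PLSS(2, 10, "H"), PLSS(10, 13, "L"), PLSS(13, 19, "E"),
    PLSS(0, 8, "H"), PLSS(8, 13, "L"), PLSS(13, 19, "H"),
]
ordered, preds = build_network(candidates)
for q, seg in enumerate(ordered):
    print(q, seg, "can follow", preds[q])
```

In this toy network the two final segments (strand or helix over residues 13 to 18) can each follow either loop, so contradictory descriptions of the same region are kept side by side rather than resolved in advance.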

The protein comparison problem is equivalent to a network matching problem. Given two networks of PLSSs, find one optimal path from the source to the sink in each network, such that the corresponding PLSSs on the two paths are most similar to each other. Unlike typical alignment methods, the comparison does not proceed position by position.

Solving the network matching problem: dynamic programming
[Figure: the dynamic programming score V(i,j) for a pair of segments i and j, drawn together with the scores of preceding segment pairs such as V(i1,j1), V(i1,j2), V(i3,j1) and V(i3,j2)]
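A dynamic program over pairs of segments could look roughly like the Python sketch below. It is a simplified illustration, not the published SEA recurrence: the scoring function `segment_similarity`, the penalty `gap_cost`, the virtual source and sink nodes (stored as `None`) and the example networks are all assumptions. V[i][j] holds the best score of any pair of paths that end at segment i in one network and segment j in the other.

```python
NEG_INF = float("-inf")


def segment_similarity(a, b):
    """Hypothetical score for pairing two segments given as (kind, length)."""
    if a is None or b is None:  # virtual source/sink pair only with each other
        return 0.0 if a is None and b is None else NEG_INF
    shared = float(min(a[1], b[1]))
    return shared if a[0] == b[0] else -shared


def gap_cost(seg, per_residue=-0.5):
    """Hypothetical penalty for leaving a segment unpaired."""
    return 0.0 if seg is None else per_residue * seg[1]


def match_networks(segs_a, preds_a, segs_b, preds_b):
    """Nodes are numbered in topological order: 0 is the source, last is the sink."""
    na, nb = len(segs_a), len(segs_b)
    V = [[NEG_INF] * nb for _ in range(na)]
    back = [[None] * nb for _ in range(na)]
    V[0][0] = 0.0
    for i in range(na):
        for j in range(nb):
            if i == 0 and j == 0:
                continue
            s = segment_similarity(segs_a[i], segs_b[j])
            # Pair i with j, coming from any pair of predecessors.
            cands = [(V[pi][pj] + s, (pi, pj), True)
                     for pi in preds_a[i] for pj in preds_b[j]]
            # Leave i unpaired, or leave j unpaired.
            cands += [(V[pi][j] + gap_cost(segs_a[i]), (pi, j), False)
                      for pi in preds_a[i]]
            cands += [(V[i][pj] + gap_cost(segs_b[j]), (i, pj), False)
                      for pj in preds_b[j]]
            if cands:
                V[i][j], prev, paired = max(cands, key=lambda c: c[0])
                back[i][j] = (prev, paired)
    # Trace back from the two sinks to recover the paired segments.
    pairs, (i, j) = [], (na - 1, nb - 1)
    while back[i][j] is not None:
        prev, paired = back[i][j]
        if paired and segs_a[i] is not None:
            pairs.append((i, j))
        i, j = prev
    return V[na - 1][nb - 1], pairs[::-1]


# Network A offers helix/strand alternatives; network B is a single path.
segs_a = [None, ("H", 8), ("E", 8), ("L", 3), ("H", 6), ("E", 6), None]
preds_a = [[], [0], [0], [1, 2], [3], [3], [4, 5]]
segs_b = [None, ("H", 8), ("L", 5), ("H", 6), None]
preds_b = [[], [0], [1], [2], [3]]
score, pairs = match_networks(segs_a, preds_a, segs_b, preds_b)
print(score, pairs)
```

The traceback returns the paired segments, which also identifies the path chosen in each network; in the toy example the helix alternatives of network A are selected because they pair best with network B.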

Example: (1e68A, 1nkl)
Each protein is represented as a collection of potentially overlapping and contradictory PLSSs (a network). SEA finds an optimal alignment between these two proteins. Simultaneously, SEA identifies the optimal subset of PLSSs (a path in the network) describing each protein.
1e68A: Bacteriocin As-48
1nkl: Nk-lysin

General performance of SEA incorporating different local structure diversities

Keeping local structure diversity helps improve alignment quality. Example: alignment between λ-repressor from E. coli (1lliA) and 434 repressor (1r69)

Stable region / Variable region
Local structure information is crucial for improving alignments, especially in the more divergent regions.
1esfA: staphylococcal enterotoxin
2tssA: toxic shock syndrome toxin-1

Practical issue: local structure prediction
Searching the I-site database (web server or standalone program)
Our solution: FragLib, which uses the sensitive profile-profile alignment program FFAS to predict local structures

Applications: Distant homology detection; Local structure prediction; Improving alignments for protein modeling

Reference: A segment alignment approach to protein comparison (Bioinformatics, April issue)
Web server

Related work
Spliced sequence alignment
–Gelfand et al., 1996, PNAS; Novichkov et al., 2001
–Assembling genes from alternative exons
Jumping alignment
–Spang R, Rehmsmeier M, Stoye J. JCB, 2002
–Computes a local alignment of a single sequence and a multiple alignment
–At each position, the sequence is aligned to one sequence of the multiple alignment (the reference sequence) instead of to a profile
Partial order alignment
–Lee C, Grasso C, Sharlow MF, Bioinformatics, 2002
–Multiple alignment

Acknowledgements: Dariusz Plewczyński, Iddo Friedberg, Łukasz Jaroszewski, Weizhong Li
This project is supported by SPAM grant GM63208