Novel computational methods for large scale genome comparison PhD Director: Dr. Xavier Messeguer Departament de Llenguatges i Sistemes Informàtics Universitat.

Slides:



Advertisements
Similar presentations
Pairwise Sequence Alignment Sushmita Roy BMI/CS 576 Sushmita Roy Sep 10 th, 2013 BMI/CS 576.
Advertisements

Parallel BioInformatics Sathish Vadhiyar. Parallel Bioinformatics  Many large scale applications in bioinformatics – sequence search, alignment, construction.
Hidden Markov Models (1)  Brief review of discrete time finite Markov Chain  Hidden Markov Model  Examples of HMM in Bioinformatics  Estimations Basic.
A Lite Introduction to (Bioinformatics and) Comparative Genomics Chris Mueller August 10, 2004.
Finding regulatory modules from local alignment - Department of Computer Science & Helsinki Institute of Information Technology HIIT University of Helsinki.
Lecture 8 Alignment of pairs of sequence Local and global alignment
Systems Biology Existing and future genome sequencing projects and the follow-on structural and functional analysis of complete genomes will produce an.
Structural bioinformatics
Introduction to Bioinformatics Burkhard Morgenstern Institute of Microbiology and Genetics Department of Bioinformatics Goldschmidtstr. 1 Göttingen, March.
. Class 1: Introduction. The Tree of Life Source: Alberts et al.
Optimatization of a New Score Function for the Detection of Remote Homologs Kann et al.
Designing Multiple Simultaneous Seeds for DNA Similarity Search Yanni Sun, Jeremy Buhler Washington University in Saint Louis.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez.
Whole Genome Alignment using Multithreaded Parallel Implementation Hyma S Murthy CMSC 838 Presentation.
Bioinformatics and Phylogenetic Analysis
Master Course MSc Bioinformatics for Health Sciences H15: Algorithms on strings and sequences Xavier Messeguer Peypoch (
Comparative ab initio prediction of gene structures using pair HMMs
Comparative Genomics Bio Informatics Scott Gulledge.
Discovery of RNA Structural Elements Using Evolutionary Computation Authors: G. Fogel, V. Porto, D. Weekes, D. Fogel, R. Griffey, J. McNeil, E. Lesnik,
Parallel Computation in Biological Sequence Analysis Xue Wu CMSC 838 Presentation.
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
Phylogenetic Tree Construction and Related Problems Bioinformatics.
Incorporating Bioinformatics in an Algorithms Course Lawrence D’Antonio Ramapo College of New Jersey.
“Multiple indexes and multiple alignments” Presenting:Siddharth Jonathan Scribing:Susan Tang DFLW:Neda Nategh Upcoming: 10/24:“Evolution of Multidomain.
Master Course MSc Bioinformatics for Health Sciences H15: Algorithms on strings and sequences Xavier Messeguer Peypoch (
Sequence comparison: Local alignment
Developing Pairwise Sequence Alignment Algorithms
Sequence Alignment and Phylogenetic Prediction using Map Reduce Programming Model in Hadoop DFS Presented by C. Geetha Jini (07MW03) D. Komagal Meenakshi.
Multiple Sequence Alignment
Masquerade Detection Mark Stamp 1Masquerade Detection.
Phylogeny Estimation: Traditional and Bayesian Approaches Molecular Evolution, 2003
Protein Evolution and Sequence Analysis Protein Evolution and Sequence Analysis.
Molecular evidence for endosymbiosis Perform blastp to investigate sequence similarity among domains of life Found yeast nuclear genes exhibit more sequence.
String Matching String matching: definition of the problem (text,pattern) depends on what we have: text or patterns Exact matching: Approximate matching:
Content of the previous class Introduction The evolutionary basis of sequence alignment The Modular Nature of proteins.
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Genome Alignment. Alignment Methods Needleman-Wunsch (global) and Smith- Waterman (local) use dynamic programming Guaranteed to find an optimal alignment.
Amino Acid Scoring Matrices Jason Davis. Overview Protein synthesis/evolution Protein synthesis/evolution Computational sequence alignment Computational.
BIOINFORMATICS IN BIOCHEMISTRY Bioinformatics– a field at the interface of molecular biology, computer science, and mathematics Bioinformatics focuses.
Interactive tools and programming environments for sequence analysis Bernardo Barbiellini Northeastern University TATACATAAAGACCCAAATGGAACTGTTCTAGA TGATACACTAGCATTAAGAGAAAAATTCGAAGA.
Pairwise Sequence Alignment BMI/CS 776 Mark Craven January 2002.
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
Hugh E. Williams and Justin Zobel IEEE Transactions on knowledge and data engineering Vol. 14, No. 1, January/February 2002 Presented by Jitimon Keinduangjun.
Discovering the Correlation Between Evolutionary Genomics and Protein-Protein Interaction Rezaul Kabir and Brett Thompson
Multiple Sequence Alignments Craig A. Struble, Ph.D. Department of Mathematics, Statistics, and Computer Science Marquette University.
Construction of Substitution Matrices
Bioinformatics Multiple Alignment. Overview Introduction Multiple Alignments Global multiple alignment –Introduction –Scoring –Algorithms.
Efficient multiple genome comparison Mario Huerta
BioPerf: A Benchmark Suite to Evaluate High- Performance Computer Architecture on Bioinformatics Applications David A. Bader, Yue Li Tao Li Vipin Sachdeva.
Chapter 3 Computational Molecular Biology Michael Smith
Protein Structure & Modeling Biology 224 Instructor: Tom Peavy Nov 18 & 23, 2009
Basic terms:  Similarity - measurable quantity. Similarity- applied to proteins using concept of conservative substitutions Similarity- applied to proteins.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
COT 6930 HPC and Bioinformatics Multiple Sequence Alignment Xingquan Zhu Dept. of Computer Science and Engineering.
Biocomputation: Comparative Genomics Tanya Talkar Lolly Kruse Colleen O’Rourke.
A Hardware Accelerator for the Fast Retrieval of DIALIGN Biological Sequence Alignments in Linear Space Author: Azzedine Boukerche, Jan M. Correa, Alba.
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Sequence Alignment.
Bioinformatics Computing 1 CMP 807 – Day 2 Kevin Galens.
Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features 王荣 14S
Copyright OpenHelix. No use or reproduction without express written consent1.
1 Repeats!. 2 Introduction  A repeat family is a collection of repeats which appear multiple times in a genome.  Our objective is to identify all families.
Project Management Tools Meenakshi Lakshmikanthan 02/22/2010.
Introduction to Sequence Alignment. Why Align Sequences? Find homology within the same species Find clues to gene function Practical issues in experiments.
Bioinformatic PhD. course Bioinformatics Xavier Messeguer Peypoch ( LSI Dep. de Llenguatges i Sistemes Informàtics BSC Barcelona.
Database Scanning/Searching FASTA/BLAST/PSIBLAST G P S Raghava.
WABI: Workshop on Algorithms in Bioinformatics
Sequence comparison: Local alignment
PatternHunter: faster and more sensitive homology search
Presentation transcript:

Novel computational methods for large scale genome comparison PhD Director: Dr. Xavier Messeguer Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya Novel computational methods for large scale genome comparisons of related species DOCTORAL THESIS, 2008 Todd James Treangen Novel computational methods for large scale genome comparison Todd James Treangen The current wealth of available genomic data embodies a wide variety of species, spanning all domains of life, thus providing an unprecedented opportunity to compare and contrast evolutionary histories of closely and distantly related organisms. The identification of homologous DNA is a fundamental building block of comparative genomic and molecular evolution studies. To date, sequence alignment has proven to be a versatile tool for comparing closely and distantly related organisms, nucleotide at a time. For pairwise comparisons, optimal global and local alignment for a given scoring scheme requires O (m x n) time and space with respect to the lengths m,n of the genomes. This computational bottleneck becomes exponential for multiple genomes, O (n k ) for k genomes, making large scale multiple genome alignment a daunting task using traditional methods. The focus of this dissertation is on developing novel algorithms and software for efficient global and local comparison of multiple genomes and the application of these methods for a biologically relevant case study. The thesis research is organized following the main focus into three successive phases, specifically: (1) multiple genome alignment of closely related species, (2) local multiple alignment of interspersed repeats, and finally, (3) a comparative genomics case study of DNA uptake sequences (DUS) in Nesseria. In Phase 1, we first develop an efficient algorithm and data structure for maximal substring match search in multiple genome sequences. Our approach for searching for maximal unique substrings yields significant improvements, in both time and space, over existing methods when dealing with multiple genome sequences. Specifically, given S 1 … S m genome sequences (where S 1 is the smallest genome), we are able to find the maximal unique substrings among all of the sequence in linear time O(|S 1 |+ … + S m |) and linear space O(|S 1 |). We then implement these contributions into an interactive multiple genome comparison and alignment tool, M-GCAT, which can efficiently construct multiple genome comparison frameworks in closely related species. In Phase 2, we present a novel computational method for local multiple alignment of interspersed repeats. Our method for local alignment of interspersed repeats features a novel method for gapped extensions of chained seed matches, joining global multiple alignment with a homology test based on a hidden Markov model. We implement our method for local alignment of interspersed repeats into the open source procrastAlign software. In Phase 3, using the results from the previous two phases we perform a case study of neisserial genomes by tracking the propagation of repeat sequence elements in attempt to understand why the important pathogens of the neisserial group have sexual exchange of DNA by natural transformation. In conclusion, our global contributions in this dissertation have focused on comparing and contrasting evolutionary histories of related organisms via their genomes. We have proposed novel data structures, algorithms, and software for active areas of research in comparative genomics. We have experimentally demonstrated the efficiency and accuracy of the computational methods presented in this thesis for multiple alignment, and have implemented all data structures and algorithms in freely available software.