A new way of seeing genomes
Combining sequence- and signal-based genome analyses
Maik Friedel, Thomas Wilhelm, Jürgen Sühnel (FLI)


Introduction: So far, genome analysis has been done almost exclusively by treating the sequence as a character string. We developed a new approach that may lead to an improved understanding of nucleotide sequences. Our genome browser DiProGB encodes the sequence by geometrical or physicochemical dinucleotide properties. The values of these properties are plotted as a dinucleotide-based sequence graph. This type of visualization makes it possible to recognize sequence patterns that are hidden in the usual character string representation. The graph can be manipulated in real time by zooming in and out, by changing the amplitude, and by smoothing it with a shifting window technique. GenBank annotations such as exons and introns can be shown in different colors. The browser also supports searching for motifs in general and for repeats in particular, at both the character-based sequence level and the signal level. Finally, it offers a number of options for statistical analysis. In summary, the new genome browser is a powerful tool for enhanced genome analysis that can lead to deeper insights into the organization and function of the genome. To provide a reliable basis of dinucleotide property sets, we have collected more than 100 of them in the dinucleotide property database DiProDB.

Conclusion: The genome browser DiProGB is a powerful new tool for motif discovery in genomes. In addition to the standard sequence representation, the DNA is also analyzed in terms of thermodynamical and geometrical dinucleotide properties. More than 100 property sets are available in the new database DiProDB. This makes it possible to identify and visualize a broad range of both known and unknown genome patterns. The new way of seeing the genome can lead to a better understanding of its organization and function.
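The encoding and smoothing steps described in the Introduction can be sketched as follows. This is a minimal illustration, not DiProGB's actual implementation (the browser itself is written in C++). The function names `gc_property_table`, `encode`, and `smooth` are hypothetical, and the property used here (the fraction of G/C in each dinucleotide) is a simple letter-based stand-in chosen for the example. The "shifting window" is assumed to be a plain moving average.

```python
from typing import Dict, List


def gc_property_table() -> Dict[str, float]:
    """Illustrative letter-based property: fraction of G/C in each dinucleotide."""
    bases = "ACGT"
    return {a + b: ((a in "GC") + (b in "GC")) / 2 for a in bases for b in bases}


def encode(sequence: str, table: Dict[str, float]) -> List[float]:
    """Map every overlapping dinucleotide to its property value.

    A sequence of n bases yields n - 1 signal values. Characters outside
    A/C/G/T (for example N) raise a KeyError in this sketch.
    """
    seq = sequence.upper()
    return [table[seq[i:i + 2]] for i in range(len(seq) - 1)]


def smooth(signal: List[float], window: int) -> List[float]:
    """Moving average over a shifting window (assumed smoothing method).

    Returns len(signal) - window + 1 values, one per window position.
    """
    if window < 1 or window > len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    total = sum(signal[:window])
    out = [total / window]
    for i in range(window, len(signal)):
        # Slide the window by one position: add the new value, drop the oldest.
        total += signal[i] - signal[i - window]
        out.append(total / window)
    return out


if __name__ == "__main__":
    raw = encode("ACGT", gc_property_table())
    print(raw)             # [0.5, 1.0, 0.5]
    print(smooth(raw, 2))  # [0.75, 0.75]
```

Any other property set, such as twist or stacking energy values exported from DiProDB, could be passed to `encode` in place of the illustrative table. A larger window gives a smoother curve at the cost of positional detail.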
Applications

1. Visualization of evolutionary events
The DiPro Genome Browser can be used to distinguish between 3 types of rRNA gene clusters in chloroplast genomes. The patterns are seen best with the free energy change dataset for the DNA double strand.
1.) Inverted Repeats (25 kb): 79 of 88 genomes
2.) Inverted Repeat Lacking Clade: 7 of 88 genomes
3.) 3 Directed Repeats: 2 of 88 genomes (subclass: Euglenozoa)
Figure panels: Euglena gracilis chloroplast; Pinus thunbergii chloroplast; Saccharum officinarum chloroplast.

2. Visualization of gene and exon/intron organization
With DiProGB it can be shown that genes tend to be purine-rich. In the poster figures, the sequence (positive strand) is encoded by the purine content. On the left side all genes of the + strand, and on the right side all genes of the - strand, are shown in red. The exon (red) and intron (green) structure of a given gene can be seen in a GC content representation. Exons tend to have a higher GC content than introns.
Figure panel: Ureaplasma parvum serovar 3 str.

3. Repeats which cannot be found by standard repeat search methods
We have shown this by hiding DNA sequence repeats in an artificial sequence with only 50% alignment identity. The new sequence contains the same repeats, but they are visible only in the signal representation.
1.) Original sequence repeats
2.) The same repeats hidden in an artificial sequence with only 50% sequence identity

Genome Browser: DiProGB

The genome browser is a computer program that converts DNA sequences into a signal representation by applying dinucleotide parameters and smoothing the signal using a shifting window technique.

Basic features:
- standalone computer program written in C++
- uploads nucleotide sequences of any size and type as GenBank, (multiple) FASTA or text files
- uploads different types of feature files (*.gff, *.ptt, *.bed, ...)
- colors annotated features of a feature or GenBank file
- manipulates the signal in real time (smoothing, changing amplitude, zooming)

Implemented tools:
- motif and repeat search at the signal and sequence levels (Motif finder, Repeat finder)
- statistical tools for average statistics
- random sequence generator
- dinucleotide properties editor
- editor for searching and sorting the list of annotated features
- editor for adding features and qualifiers to an existing GenBank file
- export functions for signal information and for the character-based sequence

The main window of the genome browser consists of three panels:
(1) Control panel: uploading and manipulating sequence information and coding parameters
(2) Main window: signal curve display
(3) Position panel: position information for the currently displayed sequence range
(Figure: full genome of Euglena gracilis chloroplast; applied dinucleotide property: stacking energy)

Data Base: DiProDB

Basic features:
- includes more than 100 dinucleotide property sets
- full references for all sets
- all sets are classified according to:
  - nucleic acid type (DNA, RNA, ...)
  - strand (double, single)
  - mode of property determination (experimental, calculated)
  - property type (thermodynamical, conformational, letter-based)
- all information is shown in one table which can be customized
- users can submit their own datasets

Implemented tools:
- search and sorting functions
- data export as text file or input file for the Genome Browser
- Pearson's correlation and Spearman's rank correlation

Example: twist (B-DNA) [degree] (Gorin et al. J. Mol. Biol. 247, (1995)).

AA 35.8    AC 35.8    AG 30.5    AT 33.4
CA 36.9    CC 33.4    CG 31.1    CT 30.5
GA 39.3    GC 38.3    GG 33.4    GT 35.8
TA 40      TC 39.3    TG 36.9    TT 35.8

(Figure: main table showing a list of twist parameter sets)

References:
Friedel et al. Bioinformatics 2009; doi: /bioinformatics/btp436
Friedel et al. Nucleic Acids Res Jan;37(Database issue):D37-40.
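The third application (repeats that are visible only at the signal level) can be illustrated with the twist values from the DiProDB example. The sketch below is not DiProGB's search algorithm, which the poster does not describe. It assumes a simple criterion: a window matches if every signal value lies within a tolerance of the motif's signal. The names `TWIST`, `encode`, and `find_signal_matches` are hypothetical.

```python
from typing import Dict, List

# Twist (B-DNA) in degrees, as listed in the poster's DiProDB example
# (Gorin et al.). Several dinucleotides share the same value.
TWIST: Dict[str, float] = {
    "AA": 35.8, "AC": 35.8, "AG": 30.5, "AT": 33.4,
    "CA": 36.9, "CC": 33.4, "CG": 31.1, "CT": 30.5,
    "GA": 39.3, "GC": 38.3, "GG": 33.4, "GT": 35.8,
    "TA": 40.0, "TC": 39.3, "TG": 36.9, "TT": 35.8,
}


def encode(sequence: str, table: Dict[str, float] = TWIST) -> List[float]:
    """Convert a sequence into its overlapping-dinucleotide signal."""
    seq = sequence.upper()
    return [table[seq[i:i + 2]] for i in range(len(seq) - 1)]


def find_signal_matches(sequence: str, motif: str, tolerance: float) -> List[int]:
    """Return 0-based start positions where the sequence signal matches the motif signal.

    Assumed criterion: every value in the window differs from the
    corresponding motif value by at most `tolerance`.
    """
    if len(motif) < 2:
        raise ValueError("motif must contain at least one dinucleotide")
    target = encode(motif)
    signal = encode(sequence)
    hits = []
    for start in range(len(signal) - len(target) + 1):
        window = signal[start:start + len(target)]
        if all(abs(a - b) <= tolerance for a, b in zip(window, target)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    genome = "CGAAATCG"
    motif = "AACC"
    print(motif in genome)                                     # False
    print(find_signal_matches(genome, motif, tolerance=0.1))   # [2]
```

In this example the character string `AACC` does not occur in the sequence, yet `AAAT` at position 2 shares only two of four letters with it and produces the same twist signal (35.8, 35.8, 33.4). A real search would presumably operate on the smoothed signal and use a more tolerant similarity measure.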