TimeSearcher: Interactive Querying for Identification of Patterns in Genetic Data
Harry Hochheiser, Eric Baehrecke, Stephen Mount, Ben Shneiderman

Presentation transcript:

1 TimeSearcher: Interactive Querying for Identification of Patterns in Genetic Data
Harry Hochheiser, Eric Baehrecke, Stephen Mount, Ben Shneiderman
Harry Hochheiser is supported by a fellowship from America Online.

2 Time Series Data
Real-valued function over time
Goal: find patterns
  – “Starts Low, Ends High”
  – Outliers
  – Periodic Patterns
  – Laggards and Leaders
Hypothesis generation

3 Microarray Data
Chu, et al. The transcriptional program of sporulation in budding yeast. Science 1998 Oct 23; 282(5389):

4 Timeboxes
Rectangular query regions
Value must be in range for all time points in region
Combine multiple timeboxes for conjunctive query
Figure labels: Sharp Rise; Panic Reversal
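The timebox semantics on this slide can be sketched in a few lines of Python. This is a minimal illustration, not TimeSearcher's implementation: the `Timebox` class, the `query` function, and the sample expression profiles are all hypothetical names and data chosen for the example. A series satisfies a timebox if every value inside the box's time range lies within the box's value range, and a query is the conjunction of all its timeboxes.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Timebox:
    """Hypothetical rectangular query region over (time index, value)."""
    t_start: int    # first time index covered (inclusive)
    t_end: int      # last time index covered (inclusive)
    v_min: float    # lowest accepted value
    v_max: float    # highest accepted value

    def matches(self, series: Sequence[float]) -> bool:
        window = series[self.t_start:self.t_end + 1]
        # Reject series that do not cover the whole time range of the box.
        if len(window) != self.t_end - self.t_start + 1:
            return False
        # The value must be in range at every time point in the region.
        return all(self.v_min <= v <= self.v_max for v in window)


def query(data: Dict[str, Sequence[float]], boxes: List[Timebox]) -> List[str]:
    """Return the names of all series that satisfy every timebox (conjunction)."""
    return [name for name, series in data.items()
            if all(box.matches(series) for box in boxes)]


# Made-up expression profiles, for illustration only.
genes = {
    "geneA": [0.1, 0.2, 0.3, 2.0, 2.5],
    "geneB": [1.5, 1.4, 1.6, 1.5, 1.4],
    "geneC": [0.0, 0.1, 0.4, 0.3, 0.2],
}

# "Starts Low, Ends High": low at time points 0-1, high at time points 3-4.
starts_low_ends_high = [Timebox(0, 1, 0.0, 0.5), Timebox(3, 4, 1.5, 3.0)]
print(query(genes, starts_low_ends_high))  # ['geneA']
```

Moving or resizing a timebox in the interface corresponds here to building a new `Timebox` and running `query` again.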

5 TimeSearcher/Microarray demo

6 TimeSearcher
Interactive exploration of time-series data
Dynamic queries (<100 ms)
Linear display of individual items
Create queries on graph area
Move, scale timeboxes to modify query
Drag-and-drop for query-by-example

7 Other Applications
“Time”: linear ordered sequence
Use TimeSearcher for general sequences
  – E.g., DNA

8 TimeSearcher for analysis of weak signals in nucleotide sequences: Application to the case of the Arabidopsis thaliana branch site consensus splicing signal.
Steve Mount, Cell Biology and Molecular Genetics
Harry Hochheiser and Ben Shneiderman, Human Computer Interaction Lab
Steven Salzberg, The Institute for Genomic Research
Splicing signals are recognized during early steps in the biochemical process of splicing.
Figure labels: Exon 1; U1; SF1; Branch Site; U2AF65; (Y)n; U2AF35; AG; Exon 2

9 Two-step pre-mRNA splicing mechanism with branched intermediate
Diagram courtesy of Dr. Martinez Hewlett
Consensus sequences:
  – Yeast (Saccharomyces cerevisiae) Invariant: TACTAAC
  – Humans (Homo sapiens) Consensus: TNYTRAYY
  – Fruit flies (Drosophila melanogaster) Invariant: WCTAATY
  – Weeds (Arabidopsis thaliana) Invariant: CTRAY
Y = C or T; W = A or T; R = A or G; N = A, C, G or T
Here we sought to verify and extend the experimentally determined branch site consensus CTRAY determined by Simpson et al. (2002). Our long-term goal is the characterization of an even weaker signal, the ‘exonic splicing enhancer.’
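The ambiguity codes in the legend (Y, W, R, N) map directly onto character classes, so a consensus such as CTRAY can be searched for with a regular expression. The sketch below is illustrative only: `consensus_to_regex`, `find_sites`, and the test sequence are hypothetical and not part of TimeSearcher. It covers only the four ambiguity codes listed on the slide.

```python
import re
from typing import List

# Ambiguity codes as given in the slide legend, plus the four plain bases.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]",    # C or T
    "W": "[AT]",    # A or T
    "R": "[AG]",    # A or G
    "N": "[ACGT]",  # any base
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate a consensus string such as 'CTRAY' into a compiled regex."""
    return re.compile("".join(CODES[c] for c in consensus.upper()))


def find_sites(consensus: str, sequence: str) -> List[int]:
    """Return 0-based start positions of all matches, including overlapping ones."""
    pattern = consensus_to_regex(consensus).pattern
    # A zero-width lookahead lets the scan report overlapping occurrences.
    lookahead = re.compile(f"(?=({pattern}))")
    return [m.start() for m in lookahead.finditer(sequence.upper())]


# Made-up sequence containing CTAAC at position 2 and CTGAT at position 9.
print(find_sites("CTRAY", "GGCTAACTTCTGATAA"))  # [2, 9]

# The yeast sequence TACTAAC from the slide contains a CTRAY match.
print(find_sites("CTRAY", "TACTAAC"))           # [2]
```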

10

11

12

13

14

15

16 ACTAA ACTGA ATAAC ATTGA CTAAA CTAAC CTAAT CTCAT CTGAC TAACG TAACT TCTAA TGACT TGATT TTAAC
WYTRAY
Figure labels: Branch site; Pyrimidines; Distance to 3’ splice site; Number of over-represented words; one sigma; two sigma
Y = C or T; W = A or T; R = A or G; N = A, C, G or T
Conclusions:
TimeSearcher can be used to identify weak signals in aligned nucleotide sequences.
Analysis of 8,550 exons from Arabidopsis supports the branch site consensus WYTRAY.
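The slide does not say how over-representation was computed, so the following is only one plausible reading, sketched under stated assumptions: sequences are aligned so that the same index means the same distance to the 3’ splice site, each word's count is tallied per position, and a word is flagged at a position when its count there exceeds its own mean across positions by a chosen number of standard deviations. The function names, the `min_count` filter, and the sample sequences are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List


def positional_counts(seqs: List[str], k: int) -> List[Counter]:
    """Count every k-letter word at each aligned start position."""
    length = min(len(s) for s in seqs)
    return [Counter(s[i:i + k] for s in seqs) for i in range(length - k + 1)]


def over_represented(seqs: List[str], k: int, n_sigma: float,
                     min_count: int = 2) -> Dict[int, List[str]]:
    """Map position -> words whose count there exceeds mean + n_sigma * sd.

    Assumption: mean and sd are taken over the word's own per-position
    counts. min_count (hypothetical) suppresses words seen only once, which
    would otherwise pass the threshold trivially in small samples.
    """
    counts = positional_counts(seqs, k)
    words = set().union(*counts)
    hits: Dict[int, List[str]] = {}
    for word in words:
        profile = [c[word] for c in counts]
        mu, sd = mean(profile), pstdev(profile)
        if sd == 0:
            continue  # flat profile: no position stands out
        for pos, n in enumerate(profile):
            if n >= min_count and n > mu + n_sigma * sd:
                hits.setdefault(pos, []).append(word)
    return {pos: sorted(ws) for pos, ws in sorted(hits.items())}


# Made-up aligned sequences that share CTAAC at position 3.
aligned = ["GGGCTAACGG", "TTTCTAACTT", "AAACTAACAA", "CCCCTAACCC"]
print(over_represented(aligned, k=5, n_sigma=2.0))  # {3: ['CTAAC']}
```

Counting the flagged words per position for `n_sigma=1.0` and `n_sigma=2.0` would give two curves comparable in spirit to the one-sigma and two-sigma labels in the figure, though the original statistic may differ.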

17 Future Work: Extensions to query model
Leaders and Laggards
  – Identification of regulatory genes
Multiple time-varying values
Variable Time timeboxes
Collaborations with biologists inform design
What sort of queries are of interest?

18 Conclusions
TimeSearcher: interactive tool for graphical exploration of time series data
Ongoing use for analyzing microarray data and sequence data
We’re interested in working with motivated users and real data sets