TimeSearcher: Interactive Querying for Identification of Patterns in Genetic Data
Harry Hochheiser, Eric Baehrecke, Stephen Mount, Ben Shneiderman


1 TimeSearcher: Interactive Querying for Identification of Patterns in Genetic Data
Harry Hochheiser, Eric Baehrecke, Stephen Mount, Ben Shneiderman
Harry Hochheiser is supported by a fellowship from America Online.

2 Time Series Data
Real-valued function over time
Goal: find patterns
– “Starts Low, Ends High”
– Outliers
– Periodic Patterns
– Laggards and Leaders
Hypothesis generation

3 Microarray Data
Chu et al., The transcriptional program of sporulation in budding yeast, Science 1998 Oct 23; 282(5389): 699-705.

4 Timeboxes
Rectangular query regions
Value must be in range for all time points in region
Combine multiple timeboxes for conjunctive query
[Figure panels: Sharp Rise, Panic Reversal]
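The timebox rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not TimeSearcher's actual implementation: the `Timebox` class, the `query` function, and the gene profiles are all hypothetical names and made-up values. Each box constrains a range of time indices to a range of values, and a query keeps only the items that satisfy every box.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Timebox:
    """Rectangular query region: a time-index range and a value range (both inclusive)."""
    t_start: int
    t_end: int
    v_min: float
    v_max: float

    def matches(self, series: Sequence[float]) -> bool:
        # Every time point inside the box must have a value inside the value range.
        return all(self.v_min <= v <= self.v_max
                   for v in series[self.t_start:self.t_end + 1])


def query(data: Dict[str, Sequence[float]], boxes: List[Timebox]) -> List[str]:
    """Conjunctive query: keep the items that satisfy every timebox."""
    return [name for name, series in data.items()
            if all(box.matches(series) for box in boxes)]


# Hypothetical expression profiles (made-up values for illustration only).
profiles = {
    "geneA": [0.1, 0.2, 0.3, 2.5, 2.8],
    "geneB": [0.2, 0.1, 2.0, 2.2, 0.3],
    "geneC": [1.5, 1.4, 1.6, 1.5, 1.4],
}

# "Starts low, ends high": low over the first two points, high over the last two.
starts_low_ends_high = [Timebox(0, 1, 0.0, 0.5), Timebox(3, 4, 2.0, 3.0)]
result = query(profiles, starts_low_ends_high)
print(result)  # ['geneA']
```

Moving or resizing a timebox corresponds to changing its four fields and re-running `query`. An interactive tool would need to do this fast enough to update the display as the user drags.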

5 TimeSearcher/Microarray demo

6 TimeSearcher
Interactive exploration of time-series data
Dynamic queries (<100 ms)
Linear display of individual items
Create queries on graph area
Move, scale timeboxes to modify query
Drag-and-Drop for query-by-example

7 Other Applications
“Time”: linear ordered sequence
Use TimeSearcher for general sequences
– E.g., DNA

8 TimeSearcher for analysis of weak signals in nucleotide sequences: Application to the case of the Arabidopsis thaliana branch site consensus splicing signal.
Steve Mount, Cell Biology and Molecular Genetics
Harry Hochheiser and Ben Shneiderman, Human Computer Interaction Lab
Steven Salzberg, The Institute for Genomic Research
Splicing signals are recognized during early steps in the biochemical process of splicing.
[Figure labels: Exon 1, U1, Branch Site, SF1, (Y)n, U2AF65, AG, U2AF35, Exon 2]

9 Two-step pre-mRNA splicing mechanism with branched intermediate (diagram courtesy of Dr. Martinez Hewlett)
Consensus sequences:
– Yeast (Saccharomyces cerevisiae) Invariant: TACTAAC
– Humans (Homo sapiens) Consensus: TNYTRAYY
– Fruit flies (Drosophila melanogaster) Invariant: WCTAATY
– Weeds (Arabidopsis thaliana) Invariant: CTRAY
Here we sought to verify and extend the branch site consensus CTRAY determined experimentally by Simpson et al. (2002). Our long-term goal is the characterization of an even weaker signal, the ‘exonic splicing enhancer.’
Y = C or T; W = A or T; R = A or G; N = A, C, G or T
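The degenerate letters in these consensus sequences can be made concrete with a small matcher. The sketch below is only an illustration of how the legend (Y, W, R, N) expands into allowed bases. The function names `matches_consensus` and `find_sites` are hypothetical and are not part of TimeSearcher.

```python
from typing import List

# IUPAC codes from the slide legend, plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT", "W": "AT", "R": "AG", "N": "ACGT",
}


def matches_consensus(word: str, consensus: str) -> bool:
    """True if `word` has the same length as `consensus` and every base is allowed."""
    return len(word) == len(consensus) and all(
        base in IUPAC[code] for base, code in zip(word, consensus)
    )


def find_sites(sequence: str, consensus: str) -> List[int]:
    """Return 0-based start positions where `consensus` matches `sequence`."""
    k = len(consensus)
    return [i for i in range(len(sequence) - k + 1)
            if matches_consensus(sequence[i:i + k], consensus)]


# The yeast sequence TACTAAC contains an instance of the Arabidopsis pattern CTRAY.
sites = find_sites("TACTAAC", "CTRAY")
print(sites)  # [2]
```

The match at position 2 is the substring CTAAC, which satisfies CTRAY because A is an allowed R and C is an allowed Y.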

16 Words listed: ACTAA ACTGA ATAAC ATTGA CTAAA CTAAC CTAAT CTCAT CTGAC TAACG TAACT TCTAA TGACT TGATT TTAAC
Consensus: WYTRAY
[Figure: number of over-represented words against distance to 3’ splice site, with one sigma and two sigma marked; regions labeled Branch site and Pyrimidines]
Y = C or T; W = A or T; R = A or G; N = A, C, G or T
Conclusions: TimeSearcher can be used to identify weak signals in aligned nucleotide sequences. Analysis of 8,550 exons from Arabidopsis supports the branch site consensus WYTRAY.
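The slide does not say how over-representation was computed, so the following is an assumed procedure for illustration only, not the authors' method. It counts each k-letter word at each position across sequences aligned at a common anchor, then flags positions where a word's count exceeds that word's mean count across positions by a chosen number of standard deviations. The function names, the `min_count` guard, and the toy sequences are all hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def positional_counts(aligned: List[str], k: int) -> Dict[str, List[int]]:
    """Count each k-letter word at each start position across equal-length aligned sequences."""
    n_pos = len(aligned[0]) - k + 1
    per_pos = [Counter(seq[i:i + k] for seq in aligned) for i in range(n_pos)]
    words = set().union(*per_pos)
    return {w: [c[w] for c in per_pos] for w in words}


def over_represented(counts: Dict[str, List[int]], n_sigma: float,
                     min_count: int = 2) -> List[Tuple[str, int]]:
    """(word, position) pairs whose count exceeds the word's mean by n_sigma std devs.

    `min_count` is an assumed guard so that words seen only once are not flagged.
    """
    hits = []
    for word, series in sorted(counts.items()):
        mu, sigma = mean(series), pstdev(series)
        if sigma == 0:
            continue  # word is equally common everywhere; nothing stands out
        hits.extend((word, pos) for pos, c in enumerate(series)
                    if c >= min_count and c > mu + n_sigma * sigma)
    return hits


# Toy aligned sequences (made up); three of four carry CTAAC at position 2.
aligned = ["GGCTAACGG", "TTCTAACTT", "AACTAACAA", "CCCTGACCC"]
counts = positional_counts(aligned, k=5)
hits = over_represented(counts, n_sigma=1.0)
print(hits)  # [('CTAAC', 2)]
```

Plotting, for each position, how many words are flagged would give a curve of the same general kind as the one described on the slide, with position playing the role of time.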

17 Future Work: Extensions to query model
Leaders and Laggards
– Identification of regulatory genes
Multiple time-varying values
Variable Time timeboxes
Collaborations with biologists inform design
What sort of queries are of interest?

18 Conclusions
TimeSearcher: interactive tool for graphical exploration of time series data
Ongoing use for analyzing microarray data and sequence data
We’re interested in working with motivated users & real data sets
www.cs.umd.edu/hcil/timesearcher

