Published by Elisabeth McDaniel. Modified over 5 years ago.
1
ClueGene: An Online Search Engine for Querying Gene Regulation
David M. Ng 2008 January 16
2
System Overview
Every operation generates a “working set” that can be modified and used as the query in the next search iteration
Common structure for all search and test operations with no dead ends
4
New Features
Coexpression test
Dataset ranking and heat map
Heat map for expression data
5
Coexpression Test
A coexpression search is performed using half of the working set, selected at random
AUC is computed based on finding the held-out half of the working set
The coexpression test score is the average of ten such searches
The test score is displayed as a “thermometer”, in the context of representative pathways whose scores are computed the same way
Precision-recall curves are also displayed
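The held-out procedure on this slide can be sketched in Python. This is a minimal illustration, not ClueGene's code: the `search` callable is a hypothetical stand-in for the coexpression search and is assumed to return a ranked list of gene ids (best first) that includes the held-out genes; how ClueGene treats query genes in the ranking is not stated in the slides.

```python
import random
from typing import Callable, Iterable, List, Optional, Sequence, Set


def auc_for_held_out(ranked: Sequence[str], held_out: Set[str]) -> float:
    """Area under the ROC curve for recovering held_out genes from a ranking.

    Computed as the fraction of (held-out, other) gene pairs in which the
    held-out gene is ranked above the other gene.
    """
    positives = sum(1 for gene in ranked if gene in held_out)
    negatives = len(ranked) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ranking needs at least one held-out and one other gene")
    positives_seen = 0
    ordered_pairs = 0
    for gene in ranked:
        if gene in held_out:
            positives_seen += 1
        else:
            # Every held-out gene seen so far outranks this gene.
            ordered_pairs += positives_seen
    return ordered_pairs / (positives * negatives)


def coexpression_test_score(
    working_set: Iterable[str],
    search: Callable[[List[str]], Sequence[str]],  # hypothetical search hook
    trials: int = 10,
    seed: Optional[int] = None,
) -> float:
    """Average AUC over `trials` random half/half splits of the working set."""
    genes = list(working_set)
    if len(genes) < 2:
        raise ValueError("working set needs at least two genes")
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        rng.shuffle(genes)
        half = len(genes) // 2
        query, held_out = genes[:half], set(genes[half:])
        scores.append(auc_for_held_out(search(query), held_out))
    return sum(scores) / len(scores)
```

Scoring representative pathways with the same function, as the slide describes, would give the reference points against which a working set's score is read.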
6
Dataset Ranking and Heat Map
Datasets are ranked by their contribution to the scores of the working set genes
The ranking is displayed as a heat map
Future work: allow the user to provide dataset feedback
7
Expression Data Heatmap
Displays the expression data for a dataset for the following genes:
  Result genes
  Query genes
  Contrast genes
  Randomly selected non-query and non-result genes (same number as the number of result genes)
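The random comparison group in the last item could be drawn as follows. This is a sketch under stated assumptions: `genome` is a hypothetical list of all gene ids available for the dataset, and only query and result genes are excluded, as the slide says (it does not say whether contrast genes are excluded too).

```python
import random
from typing import Iterable, List, Optional


def pick_random_genes(
    genome: Iterable[str],
    query_genes: Iterable[str],
    result_genes: Iterable[str],
    seed: Optional[int] = None,
) -> List[str]:
    """Pick as many random non-query, non-result genes as there are result genes."""
    results = set(result_genes)
    excluded = set(query_genes) | results
    candidates = [gene for gene in genome if gene not in excluded]
    # Fall back to all candidates if the genome is too small to match the count.
    count = min(len(results), len(candidates))
    return random.Random(seed).sample(candidates, count)
```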
8
Expression Data Heat Map Script
Generate a heat map as a Web page for specified query, result, and contrast genes for a given dataset.
Usage:
  Invoke as a URL:
  Specify parameters following a ?
  Parameters are name-value pairs separated by ampersands
9
Expression Data Heat Map Script Parameters
species=<species code>
ds=<dataset name>
transactionId=<transaction id>
<result gene id>=resultGene
<query gene id>=queryGene
<contrast gene id>=contrastGene
10
Expression Data Heat Map Example
ds=Segal03&species=sce&transactionId= &
YJR123W=resultGene&YLR340W=resultGene&YNL301C=resultGene&
YJR123W=queryGene&YLR340W=queryGene&YBL072C=queryGene&
YNL232W=contrastGene&YDL175C=contrastGene&YDL104C=contrastGene
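A query string in this format can be assembled with Python's standard library. The sketch below is illustrative only: the function name is hypothetical, the script's URL is not given in the slides and must be prepended by the caller, and the transaction id is left empty because the example above does not show one.

```python
from typing import List, Sequence, Tuple
from urllib.parse import urlencode


def heat_map_query(
    species: str,
    dataset: str,
    transaction_id: str,
    result_genes: Sequence[str] = (),
    query_genes: Sequence[str] = (),
    contrast_genes: Sequence[str] = (),
) -> str:
    """Build the part of the URL that follows the '?'."""
    # A list of pairs (not a dict) is required: the gene id is the parameter
    # NAME, so a gene that is both a result and a query gene appears twice.
    pairs: List[Tuple[str, str]] = [
        ("ds", dataset),
        ("species", species),
        ("transactionId", transaction_id),
    ]
    pairs += [(gene, "resultGene") for gene in result_genes]
    pairs += [(gene, "queryGene") for gene in query_genes]
    pairs += [(gene, "contrastGene") for gene in contrast_genes]
    return urlencode(pairs)


query_string = heat_map_query(
    species="sce",
    dataset="Segal03",
    transaction_id="",  # not shown in the slide example
    result_genes=["YJR123W", "YLR340W", "YNL301C"],
    query_genes=["YJR123W", "YLR340W", "YBL072C"],
    contrast_genes=["YNL232W", "YDL175C", "YDL104C"],
)
```

Because role is carried in the value and the gene id in the name, any code that parses these parameters should also keep repeated names (for example with `urllib.parse.parse_qsl`) rather than collapsing them into a dictionary keyed by gene id.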
11
Invoking ClueGene via URL
ClueGene provides a GET interface
12
Future Work
Dataset selection
Reimplement
Set-based user model
13
Reimplement ClueGene
Current ClueGene
  10,000+ lines of Perl in 20 files
  800+ lines of HTML and JavaScript
Hard to maintain
  Old CGI technology
14
Set-Based User Model
Generalization of Greg’s Gene Sets and Gene Set Families
Set members can be atomic or sets
Set members have attributes
  Intrinsic to the element
  Dependent on the set under consideration
Issue: combining duplicate attributes
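One way to picture this model is a small pair of Python classes. All names here (`Atom`, `GeneSet`, `attributes_of`) are hypothetical, and letting the set-dependent value override the intrinsic one is only one possible answer to the duplicate-attribute issue the slide raises, not a decision taken from the presentation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass
class Atom:
    """An atomic member, such as a gene, with its intrinsic attributes."""
    id: str
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class GeneSet:
    """A set whose members are atoms or other sets."""
    id: str
    attrs: Dict[str, Any] = field(default_factory=dict)  # intrinsic to the set
    members: Dict[str, Union[Atom, GeneSet]] = field(default_factory=dict)
    # Attributes that depend on THIS set, keyed by member id.
    member_attrs: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def add(self, member: Union[Atom, GeneSet], **set_attrs: Any) -> None:
        self.members[member.id] = member
        self.member_attrs[member.id] = dict(set_attrs)

    def attributes_of(self, member_id: str) -> Dict[str, Any]:
        """Intrinsic attributes overlaid with set-dependent ones.

        Assumption: on a name clash the set-dependent value wins.
        """
        merged = dict(self.members[member_id].attrs)
        merged.update(self.member_attrs[member_id])
        return merged


gene = Atom("YJR123W", {"display_name": "example gene"})
result = GeneSet("result")
result.add(gene, rank=1, score=0.9)   # rank and score exist only within `result`
family = GeneSet("family")
family.add(result)                    # a set can itself be a member of a set
```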
15
Benefits of Set Model
A single, consistent model for all aspects of gene search engines
  Easier understanding of inputs, operations, and results
  More straightforward user interface implementation
More general manipulation of sets supports
  saving/loading of sets
  combining result sets via set operations such as intersection and union
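Combining result sets might look like the sketch below, in which a result set is represented, purely for illustration, as a dictionary from gene id to that set's attributes. To avoid silently merging attributes that share a name, each gene's attributes are kept under the name of the set they came from; this is an assumed convention, not something the slides specify.

```python
from typing import Any, Dict

ResultSet = Dict[str, Dict[str, Any]]  # gene id -> attributes within that set


def union(a: ResultSet, b: ResultSet, a_name: str = "a", b_name: str = "b"):
    """Genes in either set; attributes are grouped by originating set."""
    combined: Dict[str, Dict[str, Dict[str, Any]]] = {}
    for name, result_set in ((a_name, a), (b_name, b)):
        for gene, attrs in result_set.items():
            combined.setdefault(gene, {})[name] = dict(attrs)
    return combined


def intersection(a: ResultSet, b: ResultSet, a_name: str = "a", b_name: str = "b"):
    """Genes in both sets; attributes are grouped by originating set."""
    return {
        gene: {a_name: dict(a[gene]), b_name: dict(b[gene])}
        for gene in a.keys() & b.keys()
    }


# Hypothetical results of two searches (scores are made-up values).
coexpression = {"YJR123W": {"score": 0.9}, "YLR340W": {"score": 0.7}}
motif = {"YLR340W": {"score": 0.4}, "YNL301C": {"score": 0.2}}
both = intersection(coexpression, motif, "coexpression", "motif")
either = union(coexpression, motif, "coexpression", "motif")
```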
16
ClueGene Sets
Gene: atom
  Attributes such as unique id, display name, aliases
Cluster: set of genes
Dataset: set of cluster sets
Cluster compendium: set of dataset sets
Query set: set of genes
Expected set: set of genes
17
ClueGene Query
Inputs
  Cluster compendium set
  Query set
Output
  Set of all genes in the genome
  Set-specific attributes for rank and score
Computing AUC
  Additional input: expected set
  Result AUC: attribute of result set
18
Other Operations
Known and Novel Motif Search
  Input: Working set
  Output: Set of {set for each result motif containing the genes with the motif}
GO Category Search
19
Clustering
Expression data: set of genes
  Set-specific attributes for expression data for each gene
Clustering
  Input expression data: set of genes with expression data
  Output dataset: set of cluster sets
Issue: handling operations that take a really long time