UCB BioText TREC 2003 Participation. Participants: Marti Hearst, Gaurav Bhalotia, Preslav Nakov, Ariel Schwartz. Track: Genomics, tasks 1 and 2.


TREC Task 1: Overview
- Search 525,938 MEDLINE records: titles, abstracts, MeSH category terms, citation information.
- Topics: taken from the GeneRIF portion of the LocusLink database. We are supplied with gene names.
- Definition of a GeneRIF: For gene X, find all MEDLINE references that focus on the basic biology of the gene or its protein products from the designated organism. Basic biology includes isolation, structure, genetics and function of genes/proteins in normal and disease states.

TREC Task 1: Sample Query
    Homo sapiens    OFFICIAL_GENE_NAME    ets variant gene 6 (TEL oncogene)
    Homo sapiens    OFFICIAL_SYMBOL       ETV
    Homo sapiens    ALIAS_SYMBOL          TEL
    Homo sapiens    PREFERRED_PRODUCT     ets variant gene
    Homo sapiens    PRODUCT               ets variant gene
    Homo sapiens    ALIAS_PROT            TEL1 oncogene
- The first column is the official topic number (1-50).
- The second column contains the LocusLink ID for the gene.
- The third column contains the name of the organism.
- The fourth column contains the gene name type.
- The fifth column contains the gene name.

TREC Task 1: Approach
Two main components:
- Retrieve relevant docs. May miss many because of variation in how gene names are expressed.
- Rank order them.

TREC Task 1: Approach
Retrieval
- Normalization of query terms: special characters are replaced with spaces in both queries and documents.
- Term expansion: a set of pattern-based rules is applied to the original list of query terms to expand the original set and increase recall. Rules with lower confidence get a lower weight in the ranking step.
- Stop word removal.
- Organism identification: gene names are often shared across different organisms. Developed a method to automatically determine which MeSH terms correspond to LocusLink organism terms:
  - Retrieved MEDLINE docs indicated by LocusLink links corresponding to a given organism.
  - Organism terms were the most frequent MeSH categories among the selected docs.
  - Used these terms to identify the organism term in MEDLINE.
  - An example of playing two databases off each other.
- MeSH concepts: when an exact match is found between one of the query terms and a MeSH term assigned to a document, the document is retrieved.

Gene Name Expansion
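
The normalization and expansion steps described on the retrieval slide can be sketched roughly as follows. The slides do not list the actual pattern rules or their weights, so the two rules below (splitting letters from digits, dropping a trailing number) and the weights 1.0 and 0.5 are purely illustrative assumptions, as are the function names.

```python
import re

def normalize(term):
    """Replace special characters with spaces, collapse runs, lowercase."""
    return re.sub(r"[^0-9A-Za-z]+", " ", term).strip().lower()

def expand(term):
    """Return a dict mapping each variant of a gene name to a weight.

    The rules and weights here are hypothetical examples, not the
    rules used in the original system.
    """
    base = normalize(term)
    variants = {base: 1.0}
    # Assumed high-confidence rule: separate letters from digits ("etv6" -> "etv 6").
    split = re.sub(r"(?<=[a-z])(?=[0-9])|(?<=[0-9])(?=[a-z])", " ", base)
    variants.setdefault(split, 1.0)
    # Assumed low-confidence rule: drop a trailing number ("tel 1" -> "tel").
    stripped = re.sub(r"\s*[0-9]+$", "", split).strip()
    if stripped:
        variants.setdefault(stripped, 0.5)
    return variants

if __name__ == "__main__":
    for name in ["ETV6", "TEL-1", "ets variant gene 6 (TEL oncogene)"]:
        print(name, "->", expand(name))
```

Keeping a weight next to each variant is one way to let the ranking step give less credit to matches produced by low-confidence rules.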

Organism Filtering
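
A minimal sketch of the organism identification idea, assuming the LocusLink-linked documents and their MeSH terms are already available as in-memory Python structures. The function names, the `top_k` cutoff, and the toy data are assumptions made for illustration; the slides only say that organism terms were the most frequent MeSH categories among the linked documents.

```python
from collections import Counter

def organism_mesh_terms(linked_pmids, mesh_by_pmid, top_k=1):
    """Pick the most frequent MeSH terms among docs linked to one organism."""
    counts = Counter()
    for pmid in linked_pmids:
        # Count each term once per document.
        counts.update(set(mesh_by_pmid.get(pmid, ())))
    return [term for term, _ in counts.most_common(top_k)]

def filter_by_organism(candidates, mesh_by_pmid, organism_terms):
    """Keep candidate docs that carry at least one of the organism terms."""
    wanted = set(organism_terms)
    return [p for p in candidates if wanted & set(mesh_by_pmid.get(p, ()))]

if __name__ == "__main__":
    # Toy data; the document IDs and term assignments are made up.
    mesh_by_pmid = {
        1: ["Mice", "Apoptosis"],
        2: ["Mice", "DNA"],
        3: ["Humans", "DNA"],
        4: ["Mice"],
    }
    mouse_terms = organism_mesh_terms([1, 2, 4], mesh_by_pmid)
    print(mouse_terms)
    print(filter_by_organism([1, 3, 4], mesh_by_pmid, mouse_terms))
```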

TREC Task 1: Approach
Relevance ranking
- IBM's DB2 Net Search Extender was used as the text search engine.
- Scoring: each query is a union of 5 different sub-queries:
  - titles,
  - abstracts,
  - titles using low confidence expansion rules,
  - abstracts using low confidence expansion rules, and
  - MeSH concepts.
- Each sub-query returns a set of documents with a relevance score from the text search engine (or a fixed value for MeSH matches).
- The aggregated score is the weighted SUM of the individual scores, with optional weights applied to each sub-query score.
  - SUM performs better than MAX, since it gives higher confidence to documents found in multiple sub-queries.
- Scores are normalized to be in the (0,1) range by dividing the score by the highest aggregated score achieved for the query.
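
The weighted-SUM aggregation and per-query normalization can be illustrated with a short sketch. It assumes each sub-query result is a dictionary from document ID to score; the sub-query names, weights, and scores in the example are invented, and the search engine itself is not modeled.

```python
def aggregate(subquery_scores, weights=None):
    """Combine sub-query scores by weighted sum, then divide by the top score.

    subquery_scores: {subquery_name: {doc_id: score}}
    weights: optional {subquery_name: weight}; missing names default to 1.0.
    """
    weights = weights or {}
    totals = {}
    for name, scores in subquery_scores.items():
        w = weights.get(name, 1.0)
        for doc_id, score in scores.items():
            # A document found by several sub-queries accumulates score.
            totals[doc_id] = totals.get(doc_id, 0.0) + w * score
    top = max(totals.values(), default=0.0)
    if top <= 0:
        return totals
    return {doc_id: s / top for doc_id, s in totals.items()}

if __name__ == "__main__":
    results = {
        "title": {"a": 2.0, "b": 1.0},
        "abstract": {"a": 1.0, "c": 1.0},
        "mesh": {"b": 0.5},  # fixed value for a MeSH match (assumed)
    }
    print(aggregate(results, weights={"abstract": 0.5}))
```

With summation, document "a" benefits from appearing in both the title and abstract sub-queries, which is the behavior the slide gives as the reason for preferring SUM over MAX.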

TREC Task 1: Approach
GeneRIF classification
- A Naïve Bayes model is used to assign to each document the probability that it is a GeneRIF. MeSH terms are used as features.
- Combination of text retrieval score and GeneRIF classification score: we tried both an additive and a multiplicative approach. Both behave similarly, with slightly better performance achieved with the additive one.
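
As a rough illustration of this step, the sketch below trains a small Bernoulli Naive Bayes model over MeSH terms and combines its output with a retrieval score. The slides do not say which Naive Bayes variant, smoothing, or combination formula was used, so the Bernoulli event model, the add-one smoothing, the `alpha` mixing weight, and all class and function names are assumptions.

```python
import math
from collections import Counter

class MeshNaiveBayes:
    """Bernoulli Naive Bayes over MeSH terms with add-one smoothing (assumed)."""

    def fit(self, docs, labels):
        # docs: list of sets of MeSH terms; labels: list of bools (is a GeneRIF doc).
        self.vocab = set().union(*docs)
        self.n = Counter(labels)
        self.term_counts = {True: Counter(), False: Counter()}
        for terms, y in zip(docs, labels):
            self.term_counts[y].update(set(terms))
        return self

    def prob_generif(self, terms):
        """Posterior probability that a document with these terms is a GeneRIF doc."""
        total = sum(self.n.values())
        log_post = {}
        for y in (True, False):
            lp = math.log((self.n[y] + 1) / (total + 2))
            for t in self.vocab:
                p = (self.term_counts[y][t] + 1) / (self.n[y] + 2)
                lp += math.log(p if t in terms else 1 - p)
            log_post[y] = lp
        m = max(log_post.values())
        z = sum(math.exp(v - m) for v in log_post.values())
        return math.exp(log_post[True] - m) / z

def combine(retrieval, generif_prob, alpha=0.5, additive=True):
    """Additive (weighted sum) or multiplicative combination of the two scores."""
    if additive:
        return alpha * retrieval + (1 - alpha) * generif_prob
    return retrieval * generif_prob

if __name__ == "__main__":
    # Toy training data; the terms and labels are made up.
    docs = [{"Humans", "Apoptosis"}, {"Humans", "Mutation"}, {"Review"}, {"Review", "Humans"}]
    labels = [True, True, False, False]
    model = MeshNaiveBayes().fit(docs, labels)
    p = model.prob_generif({"Apoptosis", "Humans"})
    print(p, combine(0.8, p), combine(0.8, p, additive=False))
```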

TREC Task 1: Results
Performance is measured using the standard trec_eval program.
On training data:
- Best published result:
- With GeneRIF classifier:
- Without GeneRIF classifier:
On testing data (turned in 8/4/03):
- With GeneRIF classifier:
- Without GeneRIF classifier:

TREC Task 2
Problem definition: given GeneRIFs formatted as:
    J Biol Chem 2002 Sep 13;277(37): the death effector domain of FADD is involved in interaction with Fas
    Nucleic Acids Res 2002 Aug 15;30(16): In the case of Fas-mediated apoptosis, when we transiently introduced these hybrid-ribozyme libraries into Fas-expressing HeLa cells, we were able to isolate surviving clones that were resistant to or exhibited a delay in Fas-mediated apoptosis w …
reproduce the GeneRIF from the MEDLINE record.

TREC Task 2: What we did
TBA