Overview of Collaborative Information Retrieval (CIR) at FIRE 2012
Debasis Ganguly, Johannes Leveling, Gareth Jones
School of Computing, CNGL, Dublin City University

Outline
- TREC-style evaluation
- Beyond TREC: Collaborative/Personalized IR task objectives
- Data
- Information retrieval with logs
- Baseline experiments and results
- Conclusions and future work

TREC-style Evaluation
- Information Retrieval evaluation campaigns: TREC, CLEF, NTCIR, INEX, FIRE
- Evaluation methodology:
  - Organisers create topics (queries)
  - Participating systems index the document collection, process the topics, and submit results
  - Organisers (and participants) evaluate the submissions

Example TREC topic
- Number: 401
- Title: foreign minorities, Germany
- Description: What language and cultural differences impede the integration of foreign minorities in Germany?
- Narrative: A relevant document will focus on the causes of the lack of integration in a significant way; that is, the mere mention of immigration difficulties is not relevant. Documents that discuss immigration problems unrelated to Germany are also not relevant.
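In the TREC topic files, such a topic is distributed in an SGML-like markup roughly as shown below (a sketch of the customary layout; line breaks are illustrative):

```
<top>
<num> Number: 401
<title> foreign minorities, Germany
<desc> Description:
What language and cultural differences impede the integration
of foreign minorities in Germany?
<narr> Narrative:
A relevant document will focus on the causes of the lack of
integration in a significant way; that is, the mere mention of
immigration difficulties is not relevant. Documents that discuss
immigration problems unrelated to Germany are also not relevant.
</top>
```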

CIR Task Motivation (beyond TREC)
- Limitations of the ad-hoc IR evaluation paradigm:
  - One topic (query) fits all users
  - One result set fits all users
- CIR task: log the topic development process to enable research in personalisation
  - Different users have different ways of expressing the same information need
  - Different users formulate topics for the same broad search categories (e.g. “Bollywood movies”)
  - Users have incomplete knowledge about the area of their interest; querying is an iterative process, e.g. “education in India” -> “top engineering institutes of India” -> “research in IIT Bombay”

CIR Task Motivation (contd.)
- Elements of personalization:
  - Different query formulations and relevant documents
  - The same user develops a topic and judges relevance
  - Topic development and evaluation on the same corpus (→ reproducible results)
- Elements of collaboration:
  - Users choose a search category as their starting point
  - Two users with the same category indicate users with similar information interests
- Research question: Can we tune IR systems to address individual, user-specific information needs?

Task Based Navigation (example session)
- Select a category: Indian paintings and painters
- Form and execute a query: “Indian painters”
- Read documents and reformulate the query:
  1. “Kangra paintings”
  2. “Hussain paintings”
  3. “Hussain controversy”
  4. “Hussain controversial paintings”
- Enter a final test query, which will be assigned to you for relevance judgement. The resulting topic:
  - Number: 23
  - User: debforit
  - Category: Indian paintings and painters
  - Title: M.F. Hussain controversy paintings
  - Description: Find a detailed information on M.F. Hussain's controversial paintings
  - Narrative: Information about the reasons for controversy, Hussain's reactions to the controversies are relevant here. A third party critic's view over the matter is also relevant.

Difference with ad-hoc topics
- TREC / ad-hoc IR: Q_1 -> D_1, ..., D_m (a single query with a single result set)
- CIR: (Q_k -> D_{1,k}, ..., D_{m_k,k}) for k = 1, ..., n-1, leading to the final test query Q_n (multiple users, multiple related queries, multiple related result sets)

CIR track activity flow

Data Preparation (1/3)
- Document collection: the English FIRE 2011 ad-hoc collection (articles from Indian and Bangladeshi newspapers)
- Also worked on indexing Wikipedia articles to make the task more attractive
- Indexed the collection with Lucene
- Identified 15 broad news categories (domains)
- Java Servlet based search interface supporting user interactions: registration, category selection, retrieving/viewing/bookmarking documents, submission of summaries and results

Data Preparation (2/3)
Each line in the CSV-formatted log contains:
- User name (U) and category name (C) – to identify the current session
- Query string (Q) – to identify the current query
- Document ID (D) – to identify the document on which an action is performed
- Action – click to view, or bookmark
- Timestamp – to compute relative viewing times
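As an illustration, a session log in this form could be read along the following lines (a minimal sketch: the column order, the file name cir_log.csv, and the field names are assumptions for illustration, not the task's actual specification):

```python
import csv
from collections import defaultdict

def load_log(path):
    """Group log entries by (user, category) session.
    Assumes columns: user, category, query, doc_id, action, timestamp."""
    sessions = defaultdict(list)
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.DictReader(
            f, fieldnames=['user', 'category', 'query', 'doc_id', 'action', 'timestamp'])
        for row in reader:
            sessions[(row['user'], row['category'])].append(row)
    return sessions

# Example: collect the documents a user bookmarked during one session
sessions = load_log('cir_log.csv')
bookmarks = [r['doc_id']
             for r in sessions[('debforit', 'Indian paintings and painters')]
             if r['action'] == 'bookmark']
```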

Data Preparation (3/3)
- Queries in an extended TREC format (final test topics)
- Additional fields:
  - User name of the topic developer
  - Topic category
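For example, the final test topic from the navigation example above might be represented roughly as follows (a sketch only: the extra tag names <user> and <category> are assumptions about the extended format, not its actual definition):

```
<top>
<num> Number: 23
<user> debforit
<category> Indian paintings and painters
<title> M.F. Hussain controversy paintings
<desc> Description:
Find a detailed information on M.F. Hussain's controversial paintings
<narr> Narrative:
Information about the reasons for controversy, Hussain's reactions to
the controversies are relevant here. A third party critic's view over
the matter is also relevant.
</top>
```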

Information Retrieval using Logs
- Question: Can we tune IR systems to address individual, user-specific information needs?
- Objective: investigate the benefit of using additional information about a user (the topic developer) on IR performance
- Data:
  - Ad-hoc document collection
  - User information (search history + search categories)
  - Final topics
- Evaluation metric: Mean Average Precision (MAP)
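For reference, MAP over a set of topics can be computed along these lines (a minimal sketch assuming binary relevance judgements; function and variable names are illustrative only):

```python
def average_precision(ranked_doc_ids, relevant_doc_ids):
    """Average precision for one topic: mean of the precision values at the
    ranks where relevant documents are retrieved."""
    relevant = set(relevant_doc_ids)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

def mean_average_precision(runs, qrels):
    """MAP over all topics; `runs` and `qrels` map topic id -> document ids."""
    ap_values = [average_precision(runs[t], qrels.get(t, [])) for t in runs]
    return sum(ap_values) / len(ap_values) if ap_values else 0.0
```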

Not everything goes according to plan
- FIRE 2011:
  - 26 registered participants, but no run submissions
  - 25 topics: enough for standard IR evaluation, but not enough for CIR
  - 10 topic developers with different interests
  - Very small overlap between categories and documents
- FIRE 2012:
  - Provided a baseline implementation (as source code)
  - 2 expressions of interest/registrations
  - 50 topics
  - 5 other topic developers for in-house experiments
  - -> CIR task cancelled

Updated Last Year's Conclusions
- Baseline results show that search logs can be used to improve precision at top ranks
- If two simple approaches work reasonably well, then more complex methods may work even better, for example (see the sketch below):
  - Using viewing times to predict relevance
  - Using bookmarked documents as pseudo-relevant documents
  - Using recommender-system (RS) techniques, such as the popularity of a document across users
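As an illustration of the second idea, bookmarked documents could feed a simple Rocchio-style query expansion (a sketch of one possible approach under assumed tokenisation and parameters, not the baseline actually used in the task):

```python
from collections import Counter

def expand_query(query, bookmarked_docs, num_terms=5):
    """Expand a query with the most frequent terms from a user's bookmarked
    documents, treated as pseudo-relevant documents.
    Sketch only: naive whitespace tokenisation and raw term frequencies."""
    query_terms = query.lower().split()
    counts = Counter()
    for text in bookmarked_docs:
        counts.update(t for t in text.lower().split() if t not in query_terms)
    expansion = [term for term, _ in counts.most_common(num_terms)]
    return query_terms + expansion

# Hypothetical usage with the texts of documents bookmarked during a session
expanded = expand_query("Hussain controversial paintings",
                        ["text of a bookmarked news article",
                         "text of another bookmarked article"])
```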

Updated Last Year's Future Work
- Identify reasons for the lack of submissions
  - Feedback from registered participants: the entry hurdle was too high
  - -> provided a baseline implementation
- Generate more logs from more users
  - -> performed additional experiments with this system with 5 users
  - The users were not paid, so the system must be reasonably usable
- Make the search task more interesting by using Wikipedia instead of news
  - -> worked on indexing Wikipedia and on improving the graphical user interface
- Too many related IR evaluation tasks? (e.g. session retrieval)

Thank you for your attention
Contact us if you would like to see a CIR task in 2013