Presentation transcript:

1 The INFILE project: a crosslingual filtering systems evaluation campaign
Romaric Besançon, Stéphane Chaudiron, Djamel Mostefa, Ismaïl Timimi, Khalid Choukri
LIST – DTSI – Interfaces, Cognitics and Virtual Reality Unit

2 Overview
- Goals and features of the INFILE campaign
- Test collections: documents, topics, assessments
- Evaluation protocol: evaluation procedure, evaluation metrics
- Conclusions

3 Goals and features of the INFILE campaign
- Information filtering evaluation: filter documents according to long-term information needs (user profiles, or topics)
- Adaptive: uses simulated user feedback, following the TREC adaptive filtering task
- Crosslingual: three languages (English, French, Arabic)
- Close to the real activity of competitive intelligence (CI) professionals; in particular, the profiles were developed by CI professionals (STI)
- Pilot track in CLEF 2008

4 Test collection
- Built from a corpus of news from the AFP (Agence France Presse): almost 1.5 million news items in French, English and Arabic
- For the information filtering task: a set of documents to filter in each language
- NewsML format: the standard XML format for news defined by the IPTC

5 Document example
(Screenshot of a NewsML document, with callouts for the document identifier, keywords and headline.)

6 Document example
(Continued: callouts for the location, IPTC category, AFP category and document content.)

7 Profiles
- 50 interest profiles:
  - 20 profiles in the domain of science and technology, developed by CI professionals from INIST, ARIST, Oto Research and Digiport
  - 30 profiles of general interest

8 Profiles
- Each profile contains 5 fields:
  - title: a few words
  - description: a one-sentence description
  - narrative: a longer description of what is considered a relevant document
  - keywords: a set of keywords, key phrases or named entities
  - sample: an excerpt (one paragraph) from a relevant document
- Participants may use any subset of the fields for their filtering (a sketch follows below)
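A minimal sketch of how a participant might represent such a profile in code. The field names follow the slide; the identifiers and example values are invented for illustration and are not part of the INFILE specification.

```python
# Hypothetical in-memory representation of an INFILE profile.
# Field names follow the slide; the content below is invented.
profile = {
    "id": 101,
    "title": "electric vehicle batteries",
    "description": "Documents about battery technology for electric vehicles.",
    "narrative": "A relevant document discusses research, manufacturing or market "
                 "news on batteries used in electric or hybrid vehicles.",
    "keywords": ["lithium-ion", "electric vehicle", "battery pack"],
    "sample": "One paragraph excerpted from a relevant news item would go here.",
}

# Participants may build their filtering query from any subset of the fields,
# e.g. title + keywords only:
query_terms = profile["title"].split() + profile["keywords"]
```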

9 Constitution of the corpus
- To build the corpus of documents to filter, relevant documents for the profiles are found in the original corpus using a pooling technique on the results of IR tools:
  - the whole corpus is indexed with 4 IR engines (Lucene, Indri, Zettair and the CEA search engine)
  - each search engine is queried independently using each of the 5 profile fields, plus all fields, plus all fields except the sample
  - this gives 28 runs

10 Constitution of the corpus (2)
- Pooling using a "Mixture of Experts" model (sketched below):
  - the first 10 documents of each run are taken to form the first pool
  - the first pool is assessed
  - a score is computed for each run and each topic according to the assessments of the first pool
  - the next pool is created by merging the runs with a weighted sum, with weights proportional to the scores
  - assessments are ongoing
- All assessed documents are kept: documents returned by the IR systems but judged not relevant form a set of difficult documents
- Random documents are added as noise
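A minimal sketch of this two-stage pooling for a single topic. The slide only says that runs are merged with a weighted sum whose weights are proportional to the run scores; the per-document contribution used here (1/rank), the pool depth and the pool size are assumptions made for illustration.

```python
from collections import defaultdict

# runs:     {run_id: [doc_id, doc_id, ...]} -- one ranked list per run, for one topic
# relevant: set of doc_ids judged relevant in the first pool

def first_pool(runs, depth=10):
    """Union of the top-`depth` documents of every run: the first pool to be assessed."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool

def run_scores(runs, relevant, depth=10):
    """Score each run by the number of its top-`depth` documents judged relevant."""
    return {run_id: sum(1 for d in ranking[:depth] if d in relevant)
            for run_id, ranking in runs.items()}

def next_pool(runs, scores, size=100):
    """Merge the runs with a weighted sum; weights are proportional to the run scores."""
    total = sum(scores.values()) or 1.0
    merged = defaultdict(float)
    for run_id, ranking in runs.items():
        weight = scores[run_id] / total
        for rank, doc_id in enumerate(ranking, start=1):
            merged[doc_id] += weight / rank   # reciprocal-rank contribution (an assumption)
    return sorted(merged, key=merged.get, reverse=True)[:size]
```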

11 Evaluation procedure
- One-pass test
- Interactive protocol using a client-server architecture (web service communication); a sketch of the client side follows below:
  - the participant registers
  - retrieves one document
  - filters the document
  - asks for feedback (on kept documents only)
  - retrieves the next document
- Limited number of feedbacks (50)
- A new document is available only once the previous one has been filtered
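A minimal sketch of a participant-side loop for this protocol. The client object and its method names (register, get_next_document, send_decision, request_feedback) are hypothetical stand-ins for the INFILE web service interface, whose exact API is not given on the slide.

```python
MAX_FEEDBACKS = 50  # the campaign limits the number of feedback requests

def filtering_loop(client, profiles, decide):
    """Minimal participant-side loop.

    client   -- hypothetical wrapper around the evaluation web service
    profiles -- the filtering profiles (topics)
    decide   -- decide(doc, profile) -> True if the document matches the profile
    """
    client.register()
    feedbacks_used = 0
    while True:
        doc = client.get_next_document()       # only released once the previous
        if doc is None:                        # document has been filtered
            break
        kept_for = [p for p in profiles if decide(doc, p)]
        client.send_decision(doc, kept_for)    # report the filtering decision
        if kept_for and feedbacks_used < MAX_FEEDBACKS:
            client.request_feedback(doc)       # feedback is allowed on kept documents
            feedbacks_used += 1
```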

12 Evaluation metrics
- Precision / Recall / F-measure
- Utility (from TREC)
- P = a / (a + b), R = a / (a + c), F = 2PR / (P + R), u = w1 * a - w2 * b (see the sketch below)
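Here a, b and c are read as the usual filtering contingency counts (a: relevant documents retrieved, b: non-relevant documents retrieved, c: relevant documents missed). A small sketch of the formulas above; the utility weights w1 and w2 are campaign parameters, and the default values below are only illustrative.

```python
def precision(a, b):
    return a / (a + b) if (a + b) else 0.0

def recall(a, c):
    return a / (a + c) if (a + c) else 0.0

def f_measure(a, b, c):
    p, r = precision(a, b), recall(a, c)
    return 2 * p * r / (p + r) if (p + r) else 0.0

def linear_utility(a, b, w1=2.0, w2=1.0):
    # TREC-style linear utility: reward for relevant documents kept,
    # penalty for non-relevant documents kept. Illustrative weights only.
    return w1 * a - w2 * b
```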

13 Evaluation metrics (2)
- Detection cost (from TDT)
- Uses the probability of missed documents and of false alarms (a sketch follows below)
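The slide does not give the formula; the sketch below follows the usual TDT formulation of the detection cost, combining the miss and false-alarm probabilities. The cost constants and the prior probability of a target are campaign choices; the defaults shown are the values commonly used in TDT, not necessarily those of INFILE.

```python
def detection_cost(misses, false_alarms, n_relevant, n_non_relevant,
                   c_miss=1.0, c_fa=0.1, p_target=0.02):
    """TDT-style detection cost (assumed formulation, illustrative constants).

    p_miss = missed relevant documents / relevant documents
    p_fa   = false alarms / non-relevant documents
    """
    p_miss = misses / n_relevant if n_relevant else 0.0
    p_fa = false_alarms / n_non_relevant if n_non_relevant else 0.0
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
```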

14 Evaluation metrics
- Computed per profile and averaged over all profiles
- Adaptivity: score evolution curve (values computed after each document)
- Two experimental measures (sketched below):
  - originality: number of relevant documents a system is the only one to retrieve
  - anticipation: inverse rank of the first relevant document detected
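Small sketches of the two experimental measures, assuming that for each system we know the set of relevant documents it retrieved and the position in the stream at which it first retrieved a relevant document. The data structures are assumptions; the slide only defines the measures informally.

```python
def originality(system, retrieved_relevant):
    """Number of relevant documents retrieved by `system` and by no other system.

    retrieved_relevant: {system_name: set of relevant doc_ids retrieved}
    """
    others = set().union(*(docs for name, docs in retrieved_relevant.items()
                           if name != system))
    return len(retrieved_relevant[system] - others)

def anticipation(first_relevant_rank):
    """Inverse rank of the first relevant document detected (0 if none was found)."""
    return 1.0 / first_relevant_rank if first_relevant_rank else 0.0
```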

15 Conclusions
- INFILE campaign: an information filtering evaluation that is adaptive, crosslingual and close to real usage
- Ongoing pilot track in CLEF 2008:
  - constitution of the corpus currently in progress
  - dry run mid-June
  - evaluation campaign in July
  - workshop in September
- Work in progress: modelling of the filtering task as carried out by CI practitioners