Published by Easter Robyn Copeland. Modified over 9 years ago.

The INFILE project: a crosslingual filtering systems evaluation campaign
Romaric Besançon, Stéphane Chaudiron, Djamel Mostefa, Ismaïl Timimi, Khalid Choukri
LIST – DTSI – Interfaces, Cognitics and Virtual Reality Unit
13/05/07

Overview
  Goals and features of the INFILE campaign
  Test collections: documents, topics, assessments
  Evaluation protocol
  Evaluation procedure
  Evaluation metrics
  Conclusions

Goals and features of the INFILE campaign
  Information filtering evaluation
    filter documents according to long-term information needs (user profiles, also called topics)
  Adaptive: uses simulated user feedback
    follows the TREC adaptive filtering task
  Crosslingual
    three languages: English, French, Arabic
  Close to the real activity of competitive intelligence (CI) professionals
    in particular, profiles are developed by CI professionals (STI)
  Pilot track in CLEF 2008

Test collection
  Built from a corpus of news from the AFP (Agence France Presse)
    almost 1.5 million news items in French, English and Arabic
  For the information filtering task: 100 000 documents to filter in each language
  NewsML format
    standard XML format for news (IPTC)

Document example
  [Figure: sample document, with the document identifier, keywords and headline marked]

Document example (continued)
  [Figure: sample document, with the location, IPTC category, AFP category and content marked]

Profiles
  50 interest profiles
    20 profiles in the domain of science and technology
      developed by CI professionals from INIST, ARIST, Oto Research, Digiport
    30 profiles of general interest

Profiles
  Each profile contains 5 fields:
    title: a description in a few words
    description: a one-sentence description
    narrative: a longer description of what is considered a relevant document
    keywords: a set of key words, key phrases or named entities
    sample: a sample of a relevant document (one paragraph)
  Participants may use any subset of the fields for their filtering
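
As an illustration of the five-field profile and of "any subset of the fields", here is a minimal Python sketch. The class name `Profile`, the helper `build_query` and the sample values are hypothetical and chosen only for this example; the slides do not describe how participants store or use profiles.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """One interest profile with the five fields listed on the slide."""
    title: str
    description: str
    narrative: str
    keywords: list[str]
    sample: str


def build_query(profile: Profile, use: tuple[str, ...]) -> str:
    """Concatenate the chosen subset of fields into a single query string."""
    parts = []
    for name in use:
        value = getattr(profile, name)
        # keywords is a list; every other field is plain text
        parts.append(" ".join(value) if isinstance(value, list) else value)
    return " ".join(parts)


# Made-up profile content, for illustration only
example = Profile(
    title="doping in sport",
    description="Documents about doping cases in professional sport.",
    narrative="Relevant documents report tests, sanctions or investigations.",
    keywords=["EPO", "WADA"],
    sample="A cyclist was suspended after a positive test.",
)
query = build_query(example, ("title", "keywords"))
```

A participant using only the title and keywords would call `build_query(example, ("title", "keywords"))`.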

Constitution of the corpus
  To build the corpus of documents to filter:
    find relevant documents for the profiles in the original corpus
    use a pooling technique with the results of IR tools
      the whole corpus is indexed with 4 IR engines (Lucene, Indri, Zettair and the CEA search engine)
      each search engine is queried independently using each of the 5 profile fields, all fields, and all fields but the sample
      4 engines × 7 queries = 28 runs
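
The count of 28 runs can be checked by enumerating the combinations. The sketch below assumes that a run is one (engine, query variant) pair per profile; the variable names are hypothetical.

```python
from itertools import product

ENGINES = ["Lucene", "Indri", "Zettair", "CEA"]
FIELDS = ["title", "description", "narrative", "keywords", "sample"]

# Seven query variants: each field alone, all fields, all fields but the sample
variants = [(f,) for f in FIELDS]
variants.append(tuple(FIELDS))
variants.append(tuple(f for f in FIELDS if f != "sample"))

# One run per (engine, query variant) pair
runs = list(product(ENGINES, variants))
print(len(runs))
```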

Constitution of the corpus (2)
  Pooling using a "Mixture of Experts" model
    the first 10 documents of each run are taken
    the first pool is assessed
    a score is computed for each run and each topic according to the assessments of the first pool
    the next pool is created by merging runs using a weighted sum
      weights are proportional to the score
    assessments are ongoing
  Keep all assessed documents
    documents returned by IR systems but judged not relevant form a set of difficult documents
  Choose random documents (noise)
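
The pooling steps above can be sketched in Python for a single topic. This is only one possible reading of the slide: the run score (fraction of a run's top documents judged relevant) and the per-document contribution (weight divided by rank) are assumptions, since the slide states only that runs are merged with a weighted sum whose weights are proportional to the score. All function names are hypothetical.

```python
def first_pool(runs, depth=10):
    """Union of the first `depth` documents of every run."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def run_scores(runs, assessments, depth=10):
    """Assumed score: share of a run's top documents judged relevant."""
    scores = {}
    for run_id, ranking in runs.items():
        top = ranking[:depth]
        relevant = sum(1 for doc in top if assessments.get(doc, False))
        scores[run_id] = relevant / len(top) if top else 0.0
    return scores


def next_pool(runs, scores, assessments, size=10):
    """Merge runs with a weighted sum and return the best unassessed documents."""
    total = sum(scores.values())
    # Weights proportional to the run score; uniform if every score is zero
    weights = {r: (s / total if total else 1 / len(runs)) for r, s in scores.items()}
    merged = {}
    for run_id, ranking in runs.items():
        for rank, doc in enumerate(ranking, start=1):
            if doc in assessments:
                continue  # already judged in an earlier pool
            # Assumed contribution: run weight discounted by rank
            merged[doc] = merged.get(doc, 0.0) + weights[run_id] / rank
    ranked = sorted(merged, key=lambda d: (-merged[d], d))
    return ranked[:size]
```

With two toy runs `{"A": ["d1", "d2", "d3"], "B": ["d4", "d5", "d6"]}` and a depth of 2, the first pool is `d1, d2, d4, d5`; if only the documents from run A are judged relevant, run A receives all the weight and `d3` is the first document proposed for the next pool.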

Evaluation procedure
  One-pass test
  Interactive protocol using a client-server architecture (web service communication)
    the participant registers
    retrieves one document
    filters the document
    asks for feedback (on kept documents)
    retrieves a new document
  Limited number of feedbacks (50)
  A new document is available only if the previous one has been filtered
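
The interaction loop can be illustrated with an in-process simulation. This is a sketch under stated assumptions, not the campaign's actual web service: the class `SimulatedServer`, its method names and the `keep_rule` callback are hypothetical, and registration and network transport are left out. It enforces the two constraints from the slide: a new document is served only after the previous one has been filtered, and feedback is limited (50 by default) and given only for kept documents.

```python
class SimulatedServer:
    """In-memory stand-in for the evaluation server (hypothetical API)."""

    def __init__(self, documents, relevant_ids, max_feedback=50):
        self._documents = list(documents)   # (doc_id, text) pairs in stream order
        self._relevant = set(relevant_ids)
        self._cursor = 0
        self._pending = None                # document served but not yet filtered
        self._kept = set()
        self.feedback_left = max_feedback

    def next_document(self):
        if self._pending is not None:
            raise RuntimeError("filter the current document first")
        if self._cursor >= len(self._documents):
            return None                     # end of the stream
        self._pending = self._documents[self._cursor]
        self._cursor += 1
        return self._pending

    def submit(self, doc_id, keep):
        if self._pending is None or self._pending[0] != doc_id:
            raise RuntimeError("unexpected document")
        if keep:
            self._kept.add(doc_id)
        self._pending = None

    def feedback(self, doc_id):
        """Return True/False for a kept document, or None if no feedback is given."""
        if doc_id not in self._kept or self.feedback_left == 0:
            return None
        self.feedback_left -= 1
        return doc_id in self._relevant


def run_client(server, keep_rule):
    """One pass over the stream: retrieve, filter, optionally ask for feedback."""
    decisions = {}
    while (doc := server.next_document()) is not None:
        doc_id, text = doc
        keep = keep_rule(text)
        server.submit(doc_id, keep)
        decisions[doc_id] = keep
        if keep:
            judgement = server.feedback(doc_id)
            # An adaptive system would update its profile model from `judgement` here
    return decisions
```

A non-adaptive system simply ignores the value returned by `feedback`; an adaptive one would use it to adjust its decision rule before the next document.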

Evaluation metrics
  Precision / Recall / F-measure
    $P = \frac{a}{a+b}$, $R = \frac{a}{a+c}$, $F = \frac{2PR}{P+R}$
  Utility (from TREC)
    $u = w_1 \cdot a - w_2 \cdot b$
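
A short Python sketch of these formulas follows. It assumes the usual contingency-table reading of the symbols, which the slide does not spell out: a is the number of relevant documents kept, b the number of non-relevant documents kept, and c the number of relevant documents rejected. The default weights `w1` and `w2` are placeholders, not values given in the presentation.

```python
def precision(a, b):
    """P = a / (a + b); defined as 0 when nothing was kept."""
    return a / (a + b) if a + b else 0.0


def recall(a, c):
    """R = a / (a + c); defined as 0 when there is no relevant document."""
    return a / (a + c) if a + c else 0.0


def f_measure(p, r):
    """F = 2PR / (P + R); defined as 0 when both P and R are 0."""
    return 2 * p * r / (p + r) if p + r else 0.0


def utility(a, b, w1=1.0, w2=0.5):
    """u = w1 * a - w2 * b, with placeholder weights."""
    return w1 * a - w2 * b


# Toy counts: 6 relevant kept, 2 non-relevant kept, 4 relevant missed
p = precision(6, 2)
r = recall(6, 4)
f = f_measure(p, r)
u = utility(6, 2)
```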

Evaluation metrics (2)
  Detection cost (from TDT)
    uses the probabilities of missed documents and false alarms
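
The formula itself did not survive in this transcript. The sketch below assumes the standard TDT form of the detection cost, a weighted combination of the miss probability and the false-alarm probability; whether the campaign uses exactly this form, and with which parameters, is not confirmed by the slide. The default values of `c_miss`, `c_fa` and `p_topic` are placeholders, and d (non-relevant documents correctly rejected) is a symbol introduced here for the example.

```python
def detection_cost(a, b, c, d, c_miss=1.0, c_fa=0.1, p_topic=0.02):
    """Assumed TDT-style cost: c_miss*P_miss*P_topic + c_fa*P_fa*(1 - P_topic).

    a: relevant kept, b: non-relevant kept,
    c: relevant rejected (misses), d: non-relevant rejected.
    """
    p_miss = c / (a + c) if a + c else 0.0   # probability of a missed document
    p_fa = b / (b + d) if b + d else 0.0     # probability of a false alarm
    return c_miss * p_miss * p_topic + c_fa * p_fa * (1 - p_topic)


# Toy counts with equal costs and a 0.5 prior, to keep the arithmetic visible
cost = detection_cost(6, 2, 4, 88, c_miss=1.0, c_fa=1.0, p_topic=0.5)
```

Unlike utility, a lower detection cost is better, and a system that keeps nothing is penalised through the miss term.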

Evaluation metrics
  Computed per profile and averaged over all profiles
  Adaptivity: score evolution curve (values computed every 10 000 documents)
  Two experimental measures
    originality: number of relevant documents a system uniquely retrieves
    anticipation: inverse rank of the first relevant document detected
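
The two experimental measures can be sketched for a single profile as follows. The slide does not say over which list the rank is counted; this example assumes it is the position among the profile's relevant documents in arrival order, so a system that catches the very first relevant document scores 1. Function and variable names are hypothetical.

```python
def originality(kept_by_system, relevant, system):
    """Number of relevant documents kept by `system` and by no other system."""
    others = set()
    for name, docs in kept_by_system.items():
        if name != system:
            others |= docs
    return len((kept_by_system[system] & relevant) - others)


def anticipation(relevant_in_order, kept):
    """Inverse rank of the first relevant document detected (0 if none is)."""
    for rank, doc in enumerate(relevant_in_order, start=1):
        if doc in kept:
            return 1 / rank
    return 0.0


# Toy data: documents kept by two systems for one profile
kept = {"s1": {"d1", "d2", "d9"}, "s2": {"d2", "d3"}}
relevant = {"d1", "d2", "d3"}
stream_order = ["d3", "d1", "d2"]   # relevant documents in arrival order

orig_s1 = originality(kept, relevant, "s1")
antic_s1 = anticipation(stream_order, kept["s1"])
```

Here `s1` is the only system to keep `d1`, and the first relevant document it detects is the second one to arrive.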

Conclusions
  INFILE campaign
    information filtering evaluation: adaptive, crosslingual, close to real usage
  Ongoing pilot track in CLEF 2008
    the corpus is currently being constituted
    dry run in mid-June
    evaluation campaign in July
    workshop in September
  Work in progress
    modelling of the filtering task as assumed by the CI practitioners