
The INFILE project: a crosslingual filtering systems evaluation campaign
Romaric Besançon, Stéphane Chaudiron, Djamel Mostefa, Ismaïl Timimi, Khalid Choukri
LIST – DTSI – Interfaces, Cognitics and Virtual Reality Unit
13/05/07

Overview
• Goals and features of the INFILE campaign
• Test collections
  - Documents
  - Topics
  - Assessments
• Evaluation protocol
  - Evaluation procedure
  - Evaluation metrics
• Conclusions

Goals and features of the INFILE campaign
• Information Filtering Evaluation
  - filter documents according to long-term information needs (user profiles, i.e. topics)
  - adaptive: uses simulated user feedback
  - follows the TREC adaptive filtering task
• Crosslingual
  - three languages: English, French, Arabic
• Close to the real activity of competitive intelligence (CI) professionals
  - in particular, profiles developed by CI professionals (STI)
• Pilot track in CLEF 2008

Test collection
• Built from a corpus of news from AFP (Agence France-Presse)
  - almost 1.5 million news items in French, English and Arabic
• For the information filtering task: 100,000 documents to filter in each language
• NewsML format
  - standard XML format for news (IPTC)

Document example
[Figure: sample document with annotated fields: document identifier, keywords, headline]

Document example
[Figure: sample document with annotated fields: location, IPTC category, AFP category, content]

Profiles
• 50 interest profiles
  - 20 profiles in the domain of science and technology
    · developed by CI professionals from INIST, ARIST, Oto Research, Digiport
  - 30 profiles of general interest

Profiles
• Each profile contains 5 fields:
  - title: a description in a few words
  - description: a one-sentence description
  - narrative: a longer description of what is considered a relevant document
  - keywords: a set of key words, key phrases or named entities
  - sample: a sample relevant document (one paragraph)
• Participants may use any subset of the fields for their filtering

Constitution of the corpus
• To build the corpus of documents to filter:
  - find relevant documents for the profiles in the original corpus
  - use a pooling technique with the results of IR tools
• The whole corpus is indexed with 4 IR engines (Lucene, Indri, Zettair and the CEA search engine)
• Each search engine is queried independently with 7 query variants: each of the 5 profile fields alone, all fields together, and all fields but the sample
• 4 engines × 7 query variants = 28 runs
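
The count of 28 runs follows from crossing the 4 engines with the 7 query variants. A minimal sketch of that enumeration; the field and variant names are illustrative labels, not identifiers taken from the campaign's tooling:

```python
from itertools import product

# Engine and field names as listed on the slides; the string labels are illustrative.
ENGINES = ["Lucene", "Indri", "Zettair", "CEA"]
FIELDS = ["title", "description", "narrative", "keywords", "sample"]


def query_variants(fields=FIELDS):
    """Return the 7 query variants: each field alone, all fields, all but the sample."""
    variants = {name: [name] for name in fields}
    variants["all"] = list(fields)
    variants["all_but_sample"] = [f for f in fields if f != "sample"]
    return variants


def enumerate_runs():
    """Pair every engine with every query variant (one run per pair)."""
    return list(product(ENGINES, query_variants()))


print(len(enumerate_runs()))  # 28
```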

Constitution of the corpus (2)
• Pooling using a "Mixture of Experts" model
  - the first 10 documents of each run are taken
  - the first pool is assessed
  - a score is computed for each run and each topic from the assessments of the first pool
  - the next pool is created by merging the runs using a weighted sum
  - weights are proportional to the run scores
  - assessments are ongoing
• Keep all assessed documents
  - documents returned by the IR systems but judged not relevant form a set of difficult documents
• Choose random documents (noise)
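
The pooling loop above can be sketched for a single topic as follows. The slides do not say how a run is scored or what exactly is summed, so two assumptions are made here: a run's score is the precision of its documents assessed so far, and each run contributes a reciprocal-rank vote for every unassessed document. The function names are hypothetical.

```python
def first_pool(runs, depth=10):
    """Union of the top `depth` documents of every run."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def run_precision(ranking, assessments):
    """ASSUMED run score: share of the run's assessed documents judged relevant."""
    judged = [assessments[d] for d in ranking if d in assessments]
    return sum(judged) / len(judged) if judged else 0.0


def next_pool(runs, assessments, size=10):
    """Merge runs by a weighted sum and return the best unassessed documents."""
    weights = {name: run_precision(r, assessments) for name, r in runs.items()}
    total = sum(weights.values())
    if total == 0:  # no run has a relevant document yet: fall back to equal weights
        weights = {name: 1.0 for name in runs}
        total = float(len(runs))
    scores = {}
    for name, ranking in runs.items():
        weight = weights[name] / total  # weights proportional to the run score
        for rank, doc in enumerate(ranking, start=1):
            if doc in assessments:
                continue  # already assessed, not pooled again
            # ASSUMED per-document vote: reciprocal rank in the run
            scores[doc] = scores.get(doc, 0.0) + weight / rank
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:size]
```

Here `runs` maps a run name to its ranked list of document identifiers and `assessments` maps an assessed document to `True` (relevant) or `False`. Calling `next_pool` again after each assessment round reproduces the iterative behaviour described on the slide.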

Evaluation procedure
• One-pass test
• Interactive protocol using a client-server architecture (web service communication)
  - the participant registers
  - retrieves one document
  - filters the document
  - asks for feedback (on kept documents)
  - retrieves a new document
• Limited number of feedback requests (50)
• A new document is available only once the previous one has been filtered
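
The protocol constraints can be illustrated with an in-process stand-in for the server. This is only a sketch of the rules stated on the slide: the class and method names are hypothetical, documents are reduced to identifiers, and no web service call is made.

```python
class FilteringSession:
    """Hypothetical stand-in for the evaluation server (no network involved)."""

    def __init__(self, documents, relevant_ids, max_feedback=50):
        self._docs = list(documents)
        self._relevant = set(relevant_ids)
        self._pos = 0
        self._pending = None
        self._kept = set()
        self.feedback_left = max_feedback

    def next_document(self):
        """Return the next document, or None when the stream is exhausted."""
        if self._pending is not None:
            raise RuntimeError("the previous document has not been filtered yet")
        if self._pos >= len(self._docs):
            return None
        self._pending = self._docs[self._pos]
        self._pos += 1
        return self._pending

    def submit(self, keep):
        """Record the filtering decision for the pending document."""
        if self._pending is None:
            raise RuntimeError("no document is pending")
        if keep:
            self._kept.add(self._pending)
        self._pending = None

    def feedback(self, doc_id):
        """Return the relevance of a kept document, or None once the budget is spent."""
        if doc_id not in self._kept:
            raise ValueError("feedback is only available on kept documents")
        if self.feedback_left == 0:
            return None
        self.feedback_left -= 1
        return doc_id in self._relevant


def run_client(session, decide, learn):
    """One-pass client loop: retrieve, filter, optionally ask for feedback, repeat."""
    while (doc := session.next_document()) is not None:
        keep = decide(doc)
        session.submit(keep)
        if keep:
            label = session.feedback(doc)
            if label is not None:
                learn(doc, label)
```

`decide` and `learn` are the participant's own callbacks: the first returns the keep/discard decision, the second updates the filter from simulated feedback.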

Evaluation metrics
• Precision / Recall / F-measure
  $P = \frac{a}{a+b}$, $R = \frac{a}{a+c}$, $F = \frac{2PR}{P+R}$
• Utility (from TREC)
  $u = w_1 \cdot a - w_2 \cdot b$
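
A small sketch of these four measures. The slide does not define a, b and c; the formulas imply that a counts relevant documents kept, b non-relevant documents kept, and c relevant documents missed, and that reading is assumed here. The weights w1 and w2 are left as required arguments because the slides give no values.

```python
def filtering_metrics(a, b, c, w1, w2):
    """Precision, recall, F-measure and linear utility for one profile.

    a: relevant documents kept, b: non-relevant documents kept,
    c: relevant documents missed (ASSUMED reading of the slide's symbols).
    """
    precision = a / (a + b) if a + b else 0.0
    recall = a / (a + c) if a + c else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    utility = w1 * a - w2 * b
    return {"P": precision, "R": recall, "F": f_measure, "u": utility}


# Illustrative counts and weights only, not campaign data.
print(filtering_metrics(a=6, b=2, c=4, w1=2, w2=1))
```

Returning 0.0 when a denominator is empty is a convention chosen for this sketch; the slides do not say how such cases are handled.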

Evaluation metrics (2)
• Detection cost (from TDT)
  - uses the probabilities of missed documents and of false alarms
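
The formula on this slide did not survive in the transcript. The sketch below assumes the usual TDT form of the detection cost, a weighted sum of the miss and false-alarm probabilities, C_det = C_miss · P_miss · P_target + C_fa · P_fa · (1 − P_target). The cost constants and the prior P_target are required arguments because the slides do not give the values used in INFILE; d (non-relevant documents correctly rejected) is an extra count needed for the false-alarm rate.

```python
def detection_cost(a, b, c, d, c_miss, c_fa, p_target):
    """Detection cost in the usual TDT form (ASSUMED; formula missing from the slide).

    a: relevant kept, b: non-relevant kept (false alarms),
    c: relevant missed, d: non-relevant rejected.
    """
    p_miss = c / (a + c) if a + c else 0.0  # probability of missing a relevant document
    p_fa = b / (b + d) if b + d else 0.0    # probability of a false alarm
    return c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)


# Illustrative counts and constants only.
cost = detection_cost(a=6, b=2, c=4, d=88, c_miss=1.0, c_fa=0.1, p_target=0.02)
```

Unlike utility, a lower detection cost is better.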

Evaluation metrics
• Computed per profile and averaged over all profiles
• Adaptivity: score evolution curve (values computed every 10,000 documents)
• Two experimental measures
  - originality: number of relevant documents a system uniquely retrieves
  - anticipation: inverse rank of the first relevant document detected
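
The two experimental measures can be sketched for one profile as follows. The slides do not say which list the rank refers to; this sketch assumes it is the 1-based position in the document stream. Function and argument names are hypothetical.

```python
def originality(kept_by_system, relevant, system):
    """Number of relevant documents kept by `system` and by no other system."""
    others = set().union(
        *(docs for name, docs in kept_by_system.items() if name != system)
    )
    return len((set(kept_by_system[system]) & set(relevant)) - others)


def anticipation(stream, kept, relevant):
    """Inverse rank of the first relevant document the system kept.

    ASSUMPTION: rank is the 1-based position in the document stream.
    Returns 0.0 if the system kept no relevant document.
    """
    for rank, doc in enumerate(stream, start=1):
        if doc in kept and doc in relevant:
            return 1.0 / rank
    return 0.0


# Illustrative data only.
kept = {"s1": {"d1", "d2", "d5"}, "s2": {"d2", "d3"}}
relevant = {"d1", "d2", "d3"}
print(originality(kept, relevant, "s1"))                             # 1
print(anticipation(["d1", "d2", "d3", "d4"], kept["s2"], relevant))  # 0.5
```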

Conclusions
• INFILE campaign
  - Information Filtering Evaluation: adaptive, crosslingual, close to real usage
• Ongoing pilot track in CLEF 2008
  - the corpus is currently being built
  - dry run in mid-June
  - evaluation campaign in July
  - workshop in September
• Work in progress
  - modelling of the filtering task as carried out by CI practitioners
