Overview of the INFILE track at CLEF 2009: multilingual INformation FILtering Evaluation
CEA LIST, ELDA, Univ. Lille 3 - Geriico
01/10/09

Slide 1: Overview of the INFILE track at CLEF 2009 (multilingual INformation FILtering Evaluation)
Romaric Besançon (1), Djamel Mostefa, Olivier Hamon, Khalid Choukri (2), Stéphane Chaudiron, Ismaïl Timimi (3)
(1) CEA LIST, (2) ELDA, (3) Univ. Lille 3 - Geriico

Slide 2: Presentation of the INFILE track
- Information filtering evaluation
  - filter documents from a document stream according to long-term information needs (user profiles)
- Second edition of the INFILE track in CLEF
  - 1 participant in 2008
  - same data used in 2009

Slide 3: Presentation of the INFILE track
- Multilingual: English, French, Arabic for both documents and topics
- Two tasks
  - batch filtering: the whole corpus is given to the participants, who must return a list of filtered documents for each topic
  - adaptive filtering: documents are provided to the participants one at a time through an interactive procedure, with possible automated feedback to adapt the filtering system; closer to real usage in a context of competitive intelligence

Slide 4: Document collection
- Built from a corpus of news from the AFP (Agence France Presse)
  - almost 1.5 million news items in French, English and Arabic
- For the information filtering task: 100 000 documents to filter, in each language
- NewsML format
  - standard XML format for news (IPTC)

Slide 5: Document example
[Figure: example document, with callouts for the document identifier, keywords and headline]

Slide 6: Document example
[Figure: example document, with callouts for the IPTC category, AFP category and content]

Slide 7: Topics
- 50 interest profiles
  - 20 profiles in the domain of science and technology, developed by CI (competitive intelligence) professionals from French institutes: INIST, ARIS, Oto Research, Digiport
  - 30 profiles of general interest
- Profiles developed in French/English
- Translated into Arabic

Slide 8: Topics
- Each profile contains 5 fields:
  - title: a description in a few words
  - description: a one-sentence description
  - narrative: a longer description of what is considered a relevant document
  - keywords: a set of key words, key phrases or named entities
  - sample: a sample of a relevant document (one paragraph)
- Participants may use any subset of the fields for their filtering
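
As an illustration of the five-field profile and of the rule that any subset of fields may be used, here is a minimal Python sketch. The class name `Profile`, the method `query_text` and the default field subset are assumptions made for this example; they are not part of the INFILE data format or tools.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Profile:
    """One interest profile with the five fields listed on the slide."""
    title: str                # a few words
    description: str          # one sentence
    narrative: str            # longer statement of what counts as relevant
    keywords: list[str] = field(default_factory=list)  # key words, phrases, named entities
    sample: str = ""          # one paragraph taken from a relevant document

    def query_text(self, fields: Iterable[str] = ("title", "description")) -> str:
        """Concatenate the chosen subset of fields into a single query string."""
        parts = []
        for name in fields:
            value = getattr(self, name)
            # keywords is a list; every other field is a plain string
            parts.append(" ".join(value) if isinstance(value, list) else value)
        return " ".join(p for p in parts if p)
```

A system that only wants the title and keywords would call `profile.query_text(["title", "keywords"])`; which subset works best is left to each participant.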

Slide 9: Topic example
[Figure: example topic]

Slide 10: Some topic examples
- in the general domain
- in the scientific information domain

Slide 11: Constitution of the corpus
- Same corpus as INFILE@CLEF 2008
- With simulated feedback, we need the ground truth before the campaign
- To build the corpus of documents to filter:
  - find relevant documents for the profiles in the original corpus
  - use a pooling technique with results of IR tools
  - 4 IR engines (Lucene, Indri, Zettair and CEA search engine), on several combinations of query fields
  - iterative pooling using a Mixture-of-Experts model

Slide 12: Constitution of the corpus (2)
- keep all assessed documents
  - documents returned by IR systems but judged not relevant form a set of difficult documents
- choose random documents (noise)
[Diagram labels: collection, retrieved, assessed, relevant, test collection, random]
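
The construction described on these two slides (pool the results of several engines, assess the pool, then add random noise) can be sketched as follows. This is a simplified illustration, not the organizers' procedure: it uses a plain top-k union instead of the iterative Mixture-of-Experts pooling, and the function names, the `depth` parameter and the engine names in the usage note are hypothetical.

```python
import random


def build_pool(runs, depth):
    """Union of the top `depth` document ids from each engine's ranked list."""
    pool = set()
    for ranked_ids in runs.values():
        pool.update(ranked_ids[:depth])
    return pool


def build_test_collection(all_ids, judgments, n_noise, seed=0):
    """Combine the assessed documents with randomly drawn unassessed ones.

    `judgments` maps an assessed document id to True (relevant) or False
    (returned by an engine but judged not relevant: a difficult document).
    """
    assessed = set(judgments)
    relevant = {d for d, is_rel in judgments.items() if is_rel}
    difficult = assessed - relevant
    # Noise is drawn only from documents that were never assessed.
    candidates = sorted(set(all_ids) - assessed)
    rng = random.Random(seed)
    noise = set(rng.sample(candidates, min(n_noise, len(candidates))))
    return {"relevant": relevant, "difficult": difficult, "noise": noise}
```

For example, `build_pool({"engine_a": [...], "engine_b": [...]}, depth=100)` gives the set of documents to send to the assessors; once they are judged, `build_test_collection` assembles the three parts of the test collection.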

Slide 13: Corpus
[Figure: number of relevant documents for each topic, in each language]

Slide 14: Tasks
- Batch filtering (02/04 - 30/05)
  - documents and topics available to participants
  - return list of filtered documents per topic (unordered)
- Adaptive filtering (03/06 - 10/07)
  - topics available to participants
  - documents available one at a time (one-pass test)
  - interactive protocol using a client-server architecture (web service communication)
  - new document available only if the previous one has been filtered
  - simulated user feedback available for adaptation
  - limited number of feedbacks (200)
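
The adaptive protocol can be illustrated with a small in-process simulation. This is only a sketch under stated assumptions: the real campaign used a web service whose interface is not given in these slides, so `SimulatedServer`, `next_document`, `submit` and `run_adaptive` are hypothetical names, and asking for feedback only on kept documents is a choice made by this example client.

```python
class SimulatedServer:
    """In-process stand-in for the track's web service (hypothetical interface)."""

    def __init__(self, documents, relevant_ids, max_feedback=200):
        self._docs = iter(documents)        # (doc_id, text) pairs, each served once
        self._relevant = set(relevant_ids)  # ground truth used to simulate the user
        self.feedback_left = max_feedback   # limited feedback budget
        self._pending = None                # document still awaiting a decision

    def next_document(self):
        # A new document is released only once the previous one has been filtered.
        if self._pending is not None:
            raise RuntimeError("previous document has not been filtered yet")
        self._pending = next(self._docs, None)
        return self._pending

    def submit(self, doc_id, keep, ask_feedback=False):
        """Record the decision; optionally spend one feedback request."""
        if self._pending is None or self._pending[0] != doc_id:
            raise ValueError("decision does not match the pending document")
        self._pending = None
        if ask_feedback and self.feedback_left > 0:
            self.feedback_left -= 1
            return doc_id in self._relevant
        return None  # no feedback asked, or budget exhausted


def run_adaptive(server, decide, update):
    """One pass over the stream: decide, submit, adapt when feedback is returned."""
    kept = []
    while (doc := server.next_document()) is not None:
        doc_id, text = doc
        keep = decide(text)
        if keep:
            kept.append(doc_id)
        # Client-side policy of this sketch: only ask about documents we kept.
        feedback = server.submit(doc_id, keep, ask_feedback=keep)
        if feedback is not None:
            update(text, feedback)
    return kept
```

`decide` and `update` are callables supplied by the filtering system; the loop itself never sees a document twice, which mirrors the one-pass constraint.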

Slide 15: Evaluation metrics
- Standard precision / recall / F-measure
- Utility (from TREC filtering tracks)
- per profile and averaged over all profiles
- adaptivity: evolution curve (values computed every 10 000 documents)
- two experimental measures
  - originality: number of relevant documents a system uniquely retrieves
  - anticipation: inverse rank of first relevant document detected
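
The per-profile measures above can be sketched in Python as follows. Several details are assumptions, since the slide does not spell them out: utility is taken to be a linear combination of relevant and non-relevant retrieved documents with caller-supplied weights (the weights used in the campaign are not stated here), and the rank in anticipation is counted over the profile's relevant documents in stream order. All function names are illustrative.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall and balanced F-measure for one profile."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def linear_utility(retrieved, relevant, w_rel, w_nonrel):
    """w_rel * (relevant retrieved) - w_nonrel * (non-relevant retrieved)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    return w_rel * hits - w_nonrel * (len(retrieved) - hits)


def originality(run_name, runs, relevant):
    """Number of relevant documents retrieved by `run_name` and by no other run."""
    others = set()
    for name, docs in runs.items():
        if name != run_name:
            others.update(docs)
    return len((set(runs[run_name]) & set(relevant)) - others)


def anticipation(relevant_in_stream_order, retrieved):
    """Inverse rank of the first relevant document the system detected.

    Assumption: the rank is the position of that document among the profile's
    relevant documents, in the order they appear in the stream (0.0 if none).
    """
    retrieved = set(retrieved)
    for rank, doc_id in enumerate(relevant_in_stream_order, start=1):
        if doc_id in retrieved:
            return 1.0 / rank
    return 0.0
```

Averaging any of these over the 50 profiles gives the campaign-level figure; recomputing them on successive slices of the stream gives the points of an evolution curve.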

Slide 16: INFILE participants
- 9 registered
- 5 submitted runs
  - batch filtering: 3 participants, 12 runs
  - interactive filtering: 2 participants, 3 runs
[Unlabelled value on the slide: 27]

Slide 17: INFILE results
[Figure: distribution of runs by task (batch, adaptive) and language (ara, fre, eng)]

Slide 18: INFILE results – monolingual batch filtering

Slide 19: INFILE results – crosslingual / adaptive filtering
- 57% best mono
- 90% same team mono
- crosslingual better than monolingual

Slide 20: INFILE results – anticipation/originality
- strongly correlated with recall
- too few participants

Slide 21: Approaches
- Filtering
  - adapted Information Retrieval tools (Lucene)
  - SVM classifier with external resources (GoogleNews)
  - textual similarity measures with thresholds
  - reasoning model (human plausible reasoning)
- Adaptation
  - adaptation of selection thresholds
  - user feedback as parameter in reasoning model
- Crosslingual
  - bilingual dictionaries
  - machine translation
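
To make "textual similarity measures with thresholds" and "adaptation of selection thresholds" concrete, here is a minimal sketch of one possible scheme. It is not any participant's actual system: the bag-of-words cosine similarity, the class name `ThresholdFilter`, the starting threshold and the fixed step size are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ThresholdFilter:
    """Keep a document when its similarity to the profile reaches a threshold."""

    def __init__(self, profile_text, threshold=0.3, step=0.05):
        self.profile = Counter(profile_text.lower().split())
        self.threshold = threshold  # illustrative starting value
        self.step = step            # illustrative adaptation step

    def decide(self, text):
        doc = Counter(text.lower().split())
        return cosine(self.profile, doc) >= self.threshold

    def update(self, text, relevant):
        # One simple adaptation rule: be stricter after a false positive,
        # more permissive after a confirmed relevant document.
        if relevant:
            self.threshold = max(0.0, self.threshold - self.step)
        else:
            self.threshold = min(1.0, self.threshold + self.step)
```

With only a limited number of feedbacks available, a fixed step like this is crude; a system could instead shrink the step over time or also update the profile vector, but those variants go beyond what the slide states.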

Slide 22: Conclusion and after…
- Increasing participation, reasonable results, but not enough…
- Currently, no INFILE track planned for next year
  - interest in multilingual filtering?
    - 2/3 of the runs on monolingual English
    - not enough participants for crosslingual to have comparative results
  - no funding
- INFILE evaluation kit will be made available
  - corpus of documents / topics / relevance assessments
  - tools for the interactive adaptive filtering procedure
  - tools for the evaluation
  - distributed by ELDA

