Download presentation
Presentation is loading. Please wait.
Published byCalvin Hill Modified over 9 years ago
1
1 01/10/09 CLEF @ 1 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Overview of the INFILE track at CLEF 2009 multilingual INformation FILtering Evaluation Romaric Besançon (1), Djamel Mostefa, Olivier Hamon, Khalid Choukri (2), Stéphane Chaudiron,Ismaïl Timimi (3) (1)(2)(3)
2
2 01/10/09 CLEF @ 2 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Presentation of the INFILE track Information Filtering Evaluation Filter documents from a document stream according to long-term information needs (user profiles) Second edition of the INFILE track in CLEF 1 participant in 2008 use same data in 2009
3
3 01/10/09 CLEF @ 3 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Presentation of the INFILE track Mutlilingual English, French, Arabic for both documents and topics Two tasks batch filtering the whole corpus is given to the participants, which must return a list of filtered documents for each topic adaptive filtering documents are provided to the participants one at a time through an interactive procedure, with possible automated feedback to adapt the filtering system closer to real usage in a context of competitive intelligence
4
4 01/10/09 CLEF @ 4 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Document Collection Built from a corpus of news from the AFP (Agence France Presse) almost 1.5 million news in French, English and Arabic For the information filtering task: 100 000 documents to filter, in each language NewsML format standard XML format for news (IPTC)
5
5 01/10/09 CLEF @ 5 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Document example document identifier keywords headline
6
6 01/10/09 CLEF @ 6 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Document example IPTC category AFP category content
7
7 01/10/09 CLEF @ 7 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Topics 50 interest profiles 20 profiles in the domain of science and technology developped by CI professionals from French institutes INIST, ARIS, Oto Research, Digiport 30 profiles of general interest Profiles developed in French/English Translated into Arabic
8
8 01/10/09 CLEF @ 8 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Topics Each profile contains 5 fields: title: a few words description description: a one-sentence description narrative: a longer description of what is considered a relevant document keywords: a set of key words, key phrases or named entities sample: a sample of relevant document (one paragraph) Participants may use any subset of the fields for their filtering
9
9 01/10/09 CLEF @ 9 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Topic Example
10
10 01/10/09 CLEF @ 10 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Some topic examples in general domain in scientific information domain
11
11 01/10/09 CLEF @ 11 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Constitution of the corpus Same corpus as INFILE@CLEF 2008 With simulated feedback, we need the ground truth before the campaign To build the corpus of documents to filter: find relevant documents for the profiles in the original corpus use a pooling technique with results of IR tools 4 IR engines (Lucene, Indri, Zettair and CEA search engine), on several query fields combinations iterative pooling using Mixture-of-Experts model
12
12 01/10/09 CLEF @ 12 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Constitution of the corpus (2) keep all documents assessed documents returned by IR systems by judged not relevant form a set of difficult documents choose random documents (noise) collection retrieved assessed relevant test collection random
13
13 01/10/09 CLEF @ 13 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Corpus Number of relevant documents for each topic, in each language
14
14 01/10/09 CLEF @ 14 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Tasks Batch filtering (02/04 - 30/05) documents and topics available to participants return list of filtered documents per topic (unordered) Adaptive filtering (03/06 - 10/07) topics available to participants documents available one at a time (one pass test) interactive protocol using a client-server architecture (webservice communication) new document available only if previous one has been filtered available simulated user feedback for adapatation limited number of feedbacks (200)
15
15 01/10/09 CLEF @ 15 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Evaluation metrics Standard precision / recall / F-measure Utility (from TREC filtering tracks) per profile and averaged on all profiles adaptivity: evolution curve (values computed each 10000 documents) two experimental measures originality number of relevant documents a system uniquely retrieves anticipation inverse rank of first relevant document detected
16
16 01/10/09 CLEF @ 16 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico INFILE Participants 9 registered 5 submitted runs batch filtering 3 participants, 12 runs interactive filtering 2 participants, 3 runs 27
17
17 01/10/09 CLEF @ 17 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico INFILE results Repartition of runs by task and languages ara fre eng ara fre batch adaptive
18
18 01/10/09 CLEF @ 18 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico INFILE results – monolingual batch filtering
19
19 01/10/09 CLEF @ 19 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico INFILE results – crosslingual / adaptive filtering 57% best mono 90% same team mono crosslingual better than monolingual
20
20 01/10/09 CLEF @ 20 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico INFILE results – anticipation/originality strongly correlated with recall too few pariticipants
21
21 01/10/09 CLEF @ 21 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Approaches Filtering adapted Information Retrieval tools (Lucene) SVM classifier with external ressources (GoogleNews) textual similarity measures with thresholds reasoning model (human plausible reasoning) Adaptation adaptation of selection thresholds user feedback as parameter in reasoning model Crosslingual bilingual dictionaries machine translation
22
22 01/10/09 CLEF @ 22 INFILE CEA LIST ELDA Univ. Lille 3 - Geriico Conclusion and after… Increasing participation, reasonable result, but not enough… Currently, no INFILE track planned for next year interest in multilingual filtering ? 2/3 runs on monolingual English not enough participants for crosslingual to have comparative results no funding INFILE evaluation kit will be made available corpus of documents / topics / relevance assessments tools for the interactive adaptive filtering procedure tools for the evaluation distributed by ELDA
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.