1 The Domain-Specific Track at CLEF 2008 Vivien Petras & Stefan Baerisch GESIS Social Science Information Centre, Bonn, Germany Aarhus, Denmark, September.


1 The Domain-Specific Track at CLEF 2008
  Vivien Petras & Stefan Baerisch
  GESIS Social Science Information Centre, Bonn, Germany
  Aarhus, Denmark, September 17, 2008

2 Outline
  The Domain-Specific Task
  Collections & Controlled Vocabularies
  Participants, Runs & Relevance Assessments
  Themes
  Outlook

3 The Domain-Specific Task
  CLIR on structured scientific document collections:
    social science domain
    bibliographic metadata
    controlled vocabularies for subject description
  Leverage for:
    search
    query expansion
    translation

4 The Domain-Specific Task
  Tasks:
    Monolingual: against German, English or Russian
    Bilingual: against German, English or Russian
    Multilingual: against combined collection
  Topics:
    25 topics in standard TREC format (title, desc, narr):
      suggestions from 28 subject specialties in the Social Sciences
      translated from German -> English, Russian

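5 Collections
  GIRT-DE (German)
    Description: German social science literature & projects
    Coverage: 1990-2000
    Docs: 151,319
    Abstracts: 96%
  GIRT-EN (English)
    Description: GIRT-DE translated
    Coverage: 1990-2000
    Docs: 151,319
    Abstracts: 17%
  CSA-SA (English)
    Description: Sociological Abstracts
    Coverage: 1994-1996
    Docs: 20,000
    Abstracts: 94%
  ISISS (Russian)
    Description: Institute of Scientific Information for Social Sciences of the Russian Academy of Science
    Docs: 145,802
    Abstracts: 27%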

6 Controlled Vocabularies
                        GIRT   CSA-SA   INION
  Descriptors / doc       10      6.4     3.9
  Class. codes / doc       2      1.3     n/a
  5 different subject-describing terminologies:
    Thesaurus for the Social Sciences (GIRT-DE, -EN)
    Thesaurus of Sociological Indexing Terms (CSA-SA)
    INION Thesaurus (ISISS)
    Social Sciences Classification (GIRT-DE, -EN)
    Sociological Abstracts Classification (CSA-SA)

7 Controlled Vocabularies – Mapping Tools
  Translation:
    GIRT German -> GIRT English, GIRT Russian
    INION Russian -> INION English
  Term mappings: equivalent terms in vocabularies
    GIRT German / English -> CSA-SA English
    GIRT German -> INION Russian
    Example: counseling for the aged -> Counseling + Elderly
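
A term mapping of this kind can be pictured as a lookup table from one vocabulary to another, where a single source term may map to a combination of target terms. The sketch below is only an illustration: the table `TERM_MAPPINGS` and the functions `map_term` and `map_query` are hypothetical names, and the only entry taken from the slide is the counseling example.

```python
# Hypothetical mapping table. Only this one entry comes from the slide;
# a real mapping would be loaded from the vocabulary mapping data.
TERM_MAPPINGS = {
    "counseling for the aged": ["Counseling", "Elderly"],
}


def map_term(term, mappings=TERM_MAPPINGS):
    """Return the target-vocabulary terms for one source term.

    Terms without a mapping are passed through unchanged, so the
    query does not silently lose them.
    """
    key = term.strip().lower()
    return mappings.get(key, [term])


def map_query(terms, mappings=TERM_MAPPINGS):
    """Map every term of a query and drop duplicate target terms."""
    mapped = []
    for term in terms:
        for target in map_term(term, mappings):
            if target not in mapped:
                mapped.append(target)
    return mapped


print(map_query(["counseling for the aged", "migration"]))
```

Passing unmapped terms through unchanged is a design choice of this sketch, not something the slide specifies.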

8 Participants
  6 groups
  Group       Institution                                             Country
  Amsterdam   University of Amsterdam                                 The Netherlands
  Chemnitz    Media Informatics, Chemnitz University of Technology    Germany
  Cheshire    School of Information, UC Berkeley                      USA
  Darmstadt   Technical University Darmstadt                          Germany
  Hug         University Hospitals Geneva                             Switzerland
  Unine       Computer Science Department, University of Neuchatel    Switzerland

9 Runs
  Task                   Runs 2008   Runs 2007   Runs 2006
  Monolingual
    - against German            10          13          13
    - against English           12          15           8
    - against Russian            9          11           1
  Bilingual
    - against German            12          14           6
    - against English            9          15           3
    - against Russian            8           9           3
  Multilingual                   9           9           2
  Total                         69          86          36

10 Relevance Assessments
                     German   English   Russian
  Pool size           14793     14835     13930
  Rel. Docs 2008        15%       14%       2%*
  Rel. Docs 2007        22%       25%      10%**
  Rel. Docs 2006        39%       26%       n/a
  * In Russian collection: 1 topic without relevant docs
  ** 3 topics without relevant docs

11 Relevance Assessments – Best MAP
  Task                   Best MAP 2008   Best MAP 2007   Best MAP 2006
  Monolingual
    - against German     0.4537          0.5051          0.5454
    - against English    0.3891          0.3534          0.4576
    - against Russian    0.1815          0.1971          0.2542
  Bilingual
    - against German     0.3702 (82%)    0.4568 (90%)    0.2448 (45%)
    - against English    0.3385 (87%)    0.3341 (95%)    0.3301 (72%)
    - against Russian    0.0882 (49%)    0.1348 (68%)    0.1648 (62%)
  Multilingual           0.2816*         0.0884          0.0753
  * German topics; English = 0.2751; Russian = 0.2357
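
For readers unfamiliar with the measure: MAP is the mean, over topics, of the average precision of each ranked result list. The sketch below shows the standard textbook computation, not the track's evaluation software. It also reproduces the 2008 percentages under the assumption that they express the best bilingual MAP as a share of the best monolingual MAP for the same target language, which is a reading of the table rather than something the slide states. All function names are illustrative.

```python
def average_precision(ranking, relevant):
    """Average precision of one ranked list of document ids.

    A topic without relevant documents scores 0.0 here, which is a
    simplifying assumption; evaluation tools may treat it differently.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(topics):
    """Mean of average precision over (ranking, relevant) pairs."""
    if not topics:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in topics) / len(topics)


def percent_of_monolingual(bilingual_map, monolingual_map):
    """Bilingual MAP as a rounded percentage of monolingual MAP."""
    return round(100 * bilingual_map / monolingual_map)


# Best MAP 2008 values from the table: (bilingual, monolingual).
best_2008 = {
    "German": (0.3702, 0.4537),
    "English": (0.3385, 0.3891),
    "Russian": (0.0882, 0.1815),
}
for language, (bilingual, monolingual) in best_2008.items():
    print(language, percent_of_monolingual(bilingual, monolingual))
```

With the 2008 figures this yields 82, 87 and 49, matching the percentages shown in the table for that year.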

12 Themes – Retrieval models
  Lucene (Xtrieval Chemnitz, Darmstadt)
  Semantic relatedness: Wikipedia / Wiktionary (Darmstadt)
  Language Models (Amsterdam)
  Vector space (EasyIR, Hug)
  Probabilistic – Logistic Regression (Cheshire)
  Comparison: Vector Space, LM, Probabilistic, DFR (Unine)
  Data fusion
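
The slide lists data fusion without naming a method. As one common illustration, the sketch below uses CombSUM with min-max score normalization; this is a swapped-in textbook technique, not necessarily what any participant ran. The function names and the toy scores are assumptions made for the example.

```python
def min_max_normalize(scores):
    """Rescale one run's scores to [0, 1] so runs become comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All scores equal: treat every document as equally good.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def comb_sum(runs):
    """Fuse several runs by summing each document's normalized scores.

    `runs` is a list of dicts mapping document id to retrieval score.
    Returns (doc_id, fused_score) pairs, best first; ties are broken
    by document id to keep the output deterministic.
    """
    fused = {}
    for run in runs:
        for doc_id, score in min_max_normalize(run).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + score
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Toy scores from two hypothetical retrieval models.
run_a = {"a": 3.0, "b": 1.0, "c": 2.0}
run_b = {"b": 10.0, "c": 5.0}
print(comb_sum([run_a, run_b]))
```

Normalizing before summing matters because different retrieval models produce scores on different scales.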

13 Themes – Query Expansion
  Blind Feedback (Rocchio)
  idf-window BF (infrequent terms near search term)
  Thesaurus Lookup
  Thesaurus as pivot language: double translation
  Google (text snippets)
  Wikipedia (frequent terms from top-ranked articles)
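
Blind feedback with Rocchio can be sketched as follows: the top-ranked documents of a first retrieval pass are assumed to be relevant, and the query vector is moved toward their centroid. The version below is a minimal positive-only variant that uses raw term frequencies; real systems typically use tf-idf weights and a tuned number of feedback documents. The parameter values (`alpha`, `beta`, `n_new_terms`) and the toy documents are assumptions, not settings reported on the slides.

```python
from collections import Counter


def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.75, n_new_terms=5):
    """Positive-only Rocchio: q' = alpha * q + beta * centroid(feedback docs).

    `feedback_docs` is a list of tokenized documents (lists of terms),
    taken from the top of an initial ranking. Returns a dict of term
    weights with the original terms first, then the best new terms.
    """
    query_vec = Counter(query_terms)
    centroid = Counter()
    for doc in feedback_docs:
        centroid.update(doc)
    n_docs = len(feedback_docs) or 1  # avoid division by zero

    weights = {term: alpha * freq for term, freq in query_vec.items()}
    for term, freq in centroid.items():
        weights[term] = weights.get(term, 0.0) + beta * freq / n_docs

    # Keep every original term and add only the strongest new ones.
    new_terms = sorted(
        (term for term in weights if term not in query_vec),
        key=lambda term: (-weights[term], term),
    )[:n_new_terms]
    return {term: weights[term] for term in list(query_vec) + new_terms}


top_docs = [
    ["youth", "unemployment", "labor", "market"],
    ["unemployment", "labor", "policy"],
]
print(rocchio_expand(["youth", "unemployment"], top_docs, n_new_terms=1))
```

In this toy run the single added term is `labor`, the most frequent term in the feedback documents that is not already in the query.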

14 Themes – Translation
  Google AJAX language API
  Commercial Software (Systran, LEC)
  Bilingual thesaurus look-up
  ML retrieval -> thesaurus look-up
  Wikipedia (Cross-language links)

15 Summary & Outlook
  Enough interest for 2009?
  Different corpora
  Different tasks
    full topic run (125 topics)
    result: controlled vocabulary terms (not documents)
    robust task
  Full-text retrieval with open access literature

16
  Domain-Specific Track: http://www.gesis.org/en/research/information_technology/clef_ds.htm
  Vocabulary Mappings: http://www.gesis.org/en/research/information_technology/komohe.htm
  Email: vivien.petras@gesis.org

