1 The Domain-Specific Track at CLEF 2007 Vivien Petras, Stefan Baerisch & Max Stempfhuber GESIS Social Science Information Centre, Bonn, Germany Budapest,


1 The Domain-Specific Track at CLEF 2007
Vivien Petras, Stefan Baerisch & Max Stempfhuber
GESIS Social Science Information Centre, Bonn, Germany
Budapest, September 19, 2007

2 Outline
- The Domain-Specific Task
- Collections & Controlled Vocabularies
- Topics
- Participants, Runs & Relevance Assessments
- Themes
- Summary & Outlook

3 The Domain-Specific Task
- CLIR on structured scientific document collections:
  - social science domain
  - bibliographic metadata
  - controlled vocabularies for subject description
- Leverage bibliographic metadata & controlled vocabularies for:
  - search
  - translation

4 The Domain-Specific Task
Tasks:
- Monolingual against German, English or Russian
- Bilingual against German, English or Russian
- Multilingual against combined collection

5 Collections

Name     Language  Coverage   Docs     Abstracts  Description
GIRT-DE  German    1990-2000  151,319  96%        German social science literature & projects
GIRT-EN  English   1990-2000  151,319  17%        GIRT-DE translated
CSA-SA   English   1994-1996  20,000   94%        Sociolog. Abstracts
ISISS    Russian              145,802  27%        Inst. of Scientific Inf. for Soc. Sc. of the Ru. Acad. of Science

6 Controlled Vocabularies

                    GIRT  CSA-SA  ISISS
Descriptors / doc   10    6.4     3.9
Class. codes / doc  2     1.3     n/a

5 different subject-describing terminologies:
- Thesaurus for the Social Sciences (GIRT-DE, -EN)
- Thesaurus of Sociological Indexing Terms (CSA-SA)
- INION Thesaurus (ISISS)
- Social Sciences Classification (GIRT-DE, -EN)
- Sociological Abstracts Classification (CSA-SA)

7 Controlled Vocabularies – Mapping Tools
- Translation:
  - GIRT German - GIRT English
- Intellectual term mappings (cross-walks): equivalent terms in vocabularies
  - GIRT German - CSA-SA English
  - GIRT English - CSA-SA English
- Example:
  - original-term: agricultural area
  - mapped-term: Rural areas
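A cross-walk can be pictured as a lookup table from terms in one vocabulary to equivalent terms in another. The sketch below is only an illustration under assumptions: the table layout and the function name `map_terms` are hypothetical, and the only entry taken from the slide is the pair "agricultural area" / "Rural areas".

```python
# Hypothetical cross-walk table: source-vocabulary term -> equivalent target terms.
# Only this single pair comes from the slide; the data structure is an assumption.
CROSSWALK = {
    "agricultural area": ["Rural areas"],
}


def map_terms(query_terms, crosswalk=CROSSWALK):
    """Replace each term by its mapped equivalents; keep unmapped terms unchanged."""
    mapped = []
    for term in query_terms:
        key = term.strip().lower()  # normalise before lookup
        mapped.extend(crosswalk.get(key, [term]))
    return mapped


if __name__ == "__main__":
    print(map_terms(["Agricultural area", "migration"]))
```

Terms without a mapping are passed through unchanged here, which is one possible design choice; a real system might instead drop them or fall back to translation.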

8 Topics
25 topics in standard TREC format (title, desc, narr):
- 15 volunteers (social scientists)
- 2-5 suggestions from 28 subject specialties
- checked for:
  - coverage in collections
  - variance from previous years
- translated into English, Russian
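As a rough illustration of the three-field topic layout, the sketch below parses a TREC-style topic into a dictionary. The tag names, the sample topic text, and the function `parse_topics` are assumptions made for this example; the actual CLEF topic files may use different tags (for instance language-specific ones).

```python
import re

# Made-up placeholder topic; it is not one of the 25 track topics.
SAMPLE = """
<top>
<num> 000 </num>
<title> Example title </title>
<desc> Example one-sentence description of the information need. </desc>
<narr> Example narrative stating what counts as relevant. </narr>
</top>
"""

FIELDS = ("num", "title", "desc", "narr")


def parse_topics(text):
    """Return one dict per <top> block, with whitespace-trimmed field values."""
    topics = []
    for block in re.findall(r"<top>(.*?)</top>", text, flags=re.DOTALL):
        topic = {}
        for field in FIELDS:
            match = re.search(rf"<{field}>(.*?)</{field}>", block, flags=re.DOTALL)
            topic[field] = match.group(1).strip() if match else ""
        topics.append(topic)
    return topics


if __name__ == "__main__":
    for topic in parse_topics(SAMPLE):
        print(topic["num"], "|", topic["title"])
```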

9 Participants
5 groups

Group     Institution                                            Country
Chemnitz  Media Informatics, Chemnitz University of Technology   Germany
Cheshire  School of Information, UC Berkeley                     USA
Moscow    Moscow State University                                Russia
Unine     Computer Science Department, University of Neuchatel   Switzerland
Xerox     Data Mining Group, Xerox Research Centre Europe        France

10 Runs

Task                Runs 2007  Runs 2006  Runs 2005
Monolingual
- against German    13         13         17
- against English   15         8          15
- against Russian   11         1          8
Bilingual
- against German    14         6          15
- against English   15         3          13
- against Russian   9          3          5
Multilingual        9          2          3
Total               86         36         76

11 Relevance Assessments

                German  English  Russian
Pool size       16,288  17,867   14,473
Rel. docs 2007  22%     25%      10%*
Rel. docs 2006  39%     26%      n/a
Rel. docs 2005  20%     21%      9% (RSSC)

* In Russian collection: 3 topics without relevant documents
All assessments done with Univ. of Padova's DIRECT System.

12 Relevance Assessments – Best MAP

Task                MAP 2007      MAP 2006      MAP 2005
Monolingual
- against German    0.5051        0.5454        0.4936
- against English   0.3534        0.4576        0.5065
- against Russian   0.1971        0.2542        0.3038
Bilingual
- against German    0.4568 (90%)  0.2448 (45%)  0.4201 (85%)
- against English   0.3341 (95%)  0.3301 (72%)  0.4743 (94%)
- against Russian   0.1348 (68%)  0.1648 (62%)  0.2331 (77%)
Multilingual        0.0884        0.0753        0.0532
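The slide does not say how the percentages in parentheses are derived. One plausible reading, assumed here, is that each gives the best bilingual MAP as a share of the best monolingual MAP for the same target language and year. The sketch below recomputes that share for the 2007 column; the dictionary layout and the name `bilingual_share` are hypothetical.

```python
# Best MAP values for 2007, copied from the table above.
BEST_MAP_2007 = {
    "German": {"monolingual": 0.5051, "bilingual": 0.4568},
    "English": {"monolingual": 0.3534, "bilingual": 0.3341},
    "Russian": {"monolingual": 0.1971, "bilingual": 0.1348},
}


def bilingual_share(scores):
    """Bilingual best MAP as a rounded percentage of monolingual best MAP.

    Assumption: this is how the parenthesised percentages were obtained.
    """
    return {
        language: round(100 * maps["bilingual"] / maps["monolingual"])
        for language, maps in scores.items()
    }


if __name__ == "__main__":
    print(bilingual_share(BEST_MAP_2007))
```

Under this reading the 2007 values come out as 90, 95 and 68 percent, matching the figures shown for that year.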

13 Themes - Retrieval models
- Lucene
- Language Modelling
- Logistic Regression
- Comparison: Vector Space, LM, Probabilistic - Okapi, DFR
- Data fusion
- Russian
  - word-based vs. N-gram retrieval
  - new light-weight stemmer
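To make the word-based versus N-gram contrast concrete, the sketch below splits text into overlapping character n-grams that could be indexed instead of whole words. It is a generic illustration: the n-gram length of 4, the handling of short words, and the name `char_ngrams` are assumptions, not details reported by the participants.

```python
def char_ngrams(text, n=4):
    """Split text into overlapping character n-grams, word by word.

    Words no longer than n characters are kept whole (an assumption of this sketch).
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


if __name__ == "__main__":
    print(char_ngrams("social science"))
    print(char_ngrams("наука"))
```

Because n-grams need no language-specific stemmer, they are a common alternative for morphologically rich languages such as Russian.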

14 Themes – Query Expansion
- Entry Vocabulary Modules
  - query terms associated with thesaurus terms from documents
- Thesaurus Lookup
  - combined thesaurus from all CVs
  - GIRT Thesaurus Index
- Lexical Entailment
  - find document terms in relation to query terms
- Blind Feedback
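Blind feedback, the last item above, can be sketched as follows: take the top-ranked documents of an initial search, pick the terms that occur in most of them, and append those terms to the query. This is a minimal generic version under assumptions; the term-selection rule (document frequency within the feedback set), the parameter defaults, and the name `blind_feedback` are illustrative and do not describe any participant's system.

```python
from collections import Counter


def blind_feedback(query_terms, ranked_docs, top_docs=5, new_terms=3):
    """Expand a query with frequent terms from the top-ranked documents.

    ranked_docs is a list of token lists, best-ranked first.
    """
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(set(doc))  # count each term once per feedback document
    known = set(query_terms)
    candidates = [(term, freq) for term, freq in counts.items() if term not in known]
    # Most frequent first; ties broken alphabetically for a stable result.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return list(query_terms) + [term for term, _ in candidates[:new_terms]]


if __name__ == "__main__":
    docs = [
        ["migration", "policy", "labour"],
        ["migration", "policy"],
        ["policy", "integration"],
    ]
    print(blind_feedback(["migration"], docs, top_docs=2, new_terms=1))
```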

15 Themes – Translation
- Lucene plug-in
  - Babelfish, Google, PROMT, Reverso
- Bilingual thesaurus mapping
- Dictionary adaptation
  - disambiguate term translation given language context of feedback documents
- Statistical machine translation
  - MATRAX
- Commercial Software

16 Summary & Outlook
- Extension of Russian materials
  - Translation table DE-EN-RU for GIRT Thesaurus
  - Translation table RU-EN for INION Thesaurus
  - Mapping between GIRT – INION Thesaurus
- More tools for terminology mapping
  - different relationships (0T, SYN, BT, NT, RT)
  - GESIS-IZ project:
    - > 40 mappings
    - 25 controlled vocabularies / 11 disciplines
    - ~ 125,000 terms & phrases
    - ~ 400,000 relations

17 Domain-Specific Track: http://www.gesis.org/en/research/information_technology/clef_ds_2007.htm
Vocabulary Mappings: http://www.gesis.org/en/research/information_technology/komohe.htm
Email: vivien.petras@gesis.org

