1 The Domain-Specific Track at CLEF 2008 Vivien Petras & Stefan Baerisch GESIS Social Science Information Centre, Bonn, Germany Aarhus, Denmark, September.

Presentation transcript:

1 The Domain-Specific Track at CLEF 2008 Vivien Petras & Stefan Baerisch GESIS Social Science Information Centre, Bonn, Germany Aarhus, Denmark, September 17, 2008

2 Outline: The Domain-Specific Task; Collections & Controlled Vocabularies; Participants, Runs & Relevance Assessments; Themes; Outlook

3 The Domain-Specific Task
CLIR on structured scientific document collections: social science domain; bibliographic metadata; controlled vocabularies for subject description.
Leverage for: search; query expansion; translation.

4 The Domain-Specific Task
Tasks: monolingual (against German, English or Russian); bilingual (against German, English or Russian); multilingual (against the combined collection).
Topics: 25 topics in standard TREC format (title, desc, narr); suggestions from 28 subject specialties in the social sciences; translated from German → English, Russian.

5 Collections
Language | German | English | English | Russian
Name | GIRT-DE | GIRT-EN | CSA-SA | ISISS
Description | German social science literature & projects | GIRT-DE translated | Sociolog. Abstracts | Inst. of Scientific Inf. for Soc. Sc. of the Ru. Acad. of Science
Coverage | [...] | [...] | [...] | [...]
Docs | 151,319 | 151,319 | 20,000 | 145,802
Abstracts | 96% | 17% | 94% | 27%

6 Controlled Vocabularies
Measure | GIRT | CSA-SA | INION
Descriptors / doc | [...] | [...] | [...]
Class. codes / doc | 2 | 1.3 | n/a
5 different subject-describing terminologies: Thesaurus for the Social Sciences (GIRT-DE, -EN); Thesaurus of Sociological Indexing Terms (CSA-SA); INION Thesaurus (ISISS); Social Sciences Classification (GIRT-DE, -EN); Sociological Abstracts Classification (CSA-SA)

7 Controlled Vocabularies – Mapping Tools
Translation: GIRT German → GIRT English, GIRT Russian; INION Russian → INION English.
Term mappings (equivalent terms in vocabularies): GIRT German / English → CSA-SA English; GIRT German → INION Russian.
Example: counseling for the aged → Counseling + Elderly
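The term-mapping idea on slide 7 can be pictured as a lookup from a term in one vocabulary to one or more terms in another. The following is a minimal Python sketch under stated assumptions: the `MAPPINGS` table and the `map_term` / `map_query` helpers are hypothetical names invented for illustration, not the track's actual tools, and only the "counseling for the aged" entry is taken from the slide.

```python
# Hypothetical mapping table: source term -> list of target vocabulary terms.
# Only this single entry comes from the slide; the data structure is an assumption.
MAPPINGS = {
    "counseling for the aged": ["Counseling", "Elderly"],
}


def map_term(term, mappings=MAPPINGS):
    """Return the mapped target terms, or the term itself if no mapping exists."""
    return mappings.get(term.lower().strip(), [term])


def map_query(terms, mappings=MAPPINGS):
    """Map every query term and drop duplicates while keeping order."""
    mapped = []
    for term in terms:
        for target in map_term(term, mappings):
            if target not in mapped:
                mapped.append(target)
    return mapped
```

A one-to-many entry such as the slide's example shows why the lookup returns a list: a single source descriptor may correspond to a combination of target descriptors.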

8 Participants (6 groups)
Group | Institution | Country
Amsterdam | University of Amsterdam | The Netherlands
Chemnitz | Media Informatics, Chemnitz University of Technology | Germany
Cheshire | School of Information, UC Berkeley | USA
Darmstadt | Technical University Darmstadt | Germany
Hug | University Hospitals Geneva | Switzerland
Unine | Computer Science Department, University of Neuchatel | Switzerland

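9 Runs
Task | Runs 2008 | Runs 2007 | Runs 2006
Monolingual against German | [...] | [...] | [...]
Monolingual against English | [...] | [...] | [...]
Monolingual against Russian | 9 | 11 | 1
Bilingual against German | [...] | [...] | [...]
Bilingual against English | [...] | [...] | [...]
Bilingual against Russian | 8 | 9 | 3
Multilingual | 9 | 9 | 2
Total | 69 | 86 | 36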

10 Relevance Assessments
Measure | German | English | Russian
Pool size | [...] | [...] | [...]
Rel. Docs | [...]% | 14% | 2%*
Rel. Docs | [...]% | 25% | 10%**
Rel. Docs | [...]% | 26% | n/a
* In Russian collection: 1 topic without relevant docs
** 3 topics without relevant docs

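11 Relevance Assessments – Best MAP
Task | Best MAP 2008 | Best MAP 2007 | Best MAP 2006
Monolingual against German | [...] | [...] | [...]
Monolingual against English | [...] | [...] | [...]
Monolingual against Russian | [...] | [...] | [...]
Bilingual against German | [...] (82%) | [...] (90%) | [...] (45%)
Bilingual against English | [...] (87%) | [...] (95%) | [...] (72%)
Bilingual against Russian | [...] (49%) | [...] (68%) | [...] (62%)
Multilingual | 0.2816* | [...] | [...]
* German topics; English = [...]; Russian = [...]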
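For readers unfamiliar with the measure on slide 11, mean average precision (MAP) averages, over topics, the precision observed at each rank where a relevant document is retrieved. The Python sketch below is illustrative only: the function names and the dictionary-based input format are assumptions, and skipping topics that have no relevant documents is a choice made for this sketch rather than a statement about how the track was scored.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one topic, given a ranked list of document ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels):
    """MAP over all topics with at least one relevant document.

    runs:  topic id -> ranked list of document ids
    qrels: topic id -> set of relevant document ids
    Topics without relevant documents are skipped (an assumption of this sketch).
    """
    topics = [topic for topic in qrels if qrels[topic]]
    if not topics:
        return 0.0
    total = sum(average_precision(runs.get(t, []), qrels[t]) for t in topics)
    return total / len(topics)
```

For example, a ranking that returns relevant documents at ranks 1 and 3 out of two relevant documents in total scores (1/1 + 2/3) / 2 for that topic.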

12 Themes - Retrieval models: Lucene (Xtrieval Chemnitz, Darmstadt); Semantic relatedness: Wikipedia / Wiktionary (Darmstadt); Language Models (Amsterdam); Vector space (EasyIR, Hug); Probabilistic – Logistic Regression (Cheshire); Comparison: Vector Space, LM, Probabilistic, DFR (Unine); Data fusion
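Slide 12 names data fusion without saying which method the participants used. As one common illustration, the Python sketch below implements CombSUM with min-max score normalization; the choice of CombSUM, the function names, and the input format (one dict of document scores per retrieval system) are all assumptions made for this example.

```python
def min_max_normalize(scores):
    """Rescale one system's scores to [0, 1]; constant lists map to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def comb_sum(result_lists):
    """Fuse several result lists by summing normalized scores per document.

    result_lists: iterable of dicts mapping document id -> raw score.
    Returns (document id, fused score) pairs, best first; ties break by id.
    """
    fused = {}
    for scores in result_lists:
        for doc, score in min_max_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + score
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

Normalizing before summing matters because different retrieval models (for instance a vector space model and a language model) produce scores on unrelated scales.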

13 Themes – Query Expansion: Blind Feedback (Rocchio); idf-window BF (infrequent terms near search term); Thesaurus Lookup; Thesaurus as pivot language: double translation; Google (text snippets); Wikipedia (frequent terms from top-ranked articles)
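Blind feedback in the Rocchio style, the first item on slide 13, treats the top-ranked documents of an initial search as if they were relevant and moves the query toward their centroid. The Python sketch below is a simplified illustration: it uses raw term counts, omits the negative (non-relevant) component of the full Rocchio formula, and its function name, parameters and default values are assumptions rather than any participant's settings.

```python
from collections import Counter


def rocchio_expand(query_terms, ranked_docs, k=3, n_terms=2, alpha=1.0, beta=0.75):
    """Expand a query with terms from the top-k documents (blind feedback).

    query_terms: list of query tokens
    ranked_docs: list of token lists from an initial retrieval, best first
    Returns a dict of term -> weight holding the original terms plus the
    n_terms highest-weighted new terms.
    """
    query_vec = Counter(query_terms)
    top_docs = ranked_docs[:k]  # assumed relevant without user judgment

    centroid = Counter()
    for doc in top_docs:
        centroid.update(doc)

    # weight = alpha * query count + beta * mean count in the top-k documents
    weights = {term: alpha * count for term, count in query_vec.items()}
    if top_docs:
        for term, count in centroid.items():
            weights[term] = weights.get(term, 0.0) + beta * count / len(top_docs)

    candidates = [term for term in weights if term not in query_vec]
    new_terms = sorted(candidates, key=lambda t: (-weights[t], t))[:n_terms]
    kept = set(query_vec) | set(new_terms)
    return {term: weights[term] for term in kept}
```

The expanded, reweighted query would then be submitted for a second retrieval pass.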

14 Themes – Translation: Google AJAX language API; Commercial Software (Systran, LEC); Bilingual thesaurus look-up; ML retrieval → thesaurus look-up; Wikipedia (Cross-language links)

15 Summary & Outlook: Enough interest for 2009? Different corpora. Different tasks: full topic run (125 topics); result: controlled vocabulary terms (not documents); robust task. Full-text retrieval with open access literature.

16 Domain-Specific Track: information_technology/clef_ds.htm Vocabulary Mappings: information_technology/komohe.htm