Thomas Mandl: GeoCLEF Track Overview 2008. 9th Workshop of the Cross-Language Evaluation Forum (CLEF), Århus, 18th Sept. 2008.

Presentation transcript:


GeoCLEF Administration
Joint effort of
– Fredric Gey, Ray Larson (U. California at Berkeley)
– Diana Santos (Linguateca, SINTEF ICT, Norway)
– Paula Carvalho (Linguateca, U. Lisbon)
– Nicola Ferro, Giorgio Di Nunzio (U. Padua)
– Christa Womser-Hacker (U. Hildesheim)
– many relevance assessors
– and others


Content
Introduction
Geographic Search Task
Topic Development
Relevance Assessment
Results in a Nutshell
Giorgio Di Nunzio: Results and Statistical Analysis
Diana Santos: GikiP task

Initial Aim of GeoCLEF
Aim: to evaluate retrieval of multilingual documents with an emphasis on geographic search (GIR)
– Example query: "find me news stories about riots near Dublin" (Fred CLEF Workshop 2005)
– Content part: "news stories about riots"; geo part: "near Dublin"

Interesting Issues
Ambiguity
– Santos, Neustadt, Albertville
– Galizien, Galicien (Spain, Poland)
– Oder (a river, but also a stop word in German)
Different translations
– Peking, Beijing
– Deutschland, Allemagne, Germany
Name changes
– Bombay -> Mumbai
– St. Petersburg -> Leningrad -> St. Petersburg
Multi-word groups
– Rio Grande do Sul, Newcastle upon Tyne

Search Task
How much and which geo knowledge and reasoning is necessary?
– Spatial reasoning is necessary to solve some information needs, e.g. "demonstrations in cities in Northern Germany" -> the phrase "Northern Germany" may not appear in relevant documents
Often, keyword-based systems do well on the task
– E.g. blind relevance feedback may lead to expansion with names of cities
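
The remark that "Northern Germany" may not appear in documents can be illustrated with a minimal sketch of gazetteer-based query expansion. The region table, the city list and the function name `expand_query` are illustrative assumptions; they are not taken from any GeoCLEF system.

```python
# Hypothetical mini-gazetteer: region name -> some cities inside it.
# The entries are illustrative only and far from complete.
REGION_TO_CITIES = {
    "northern germany": ["Hamburg", "Bremen", "Kiel", "Hanover"],
}

def expand_query(query: str) -> list[str]:
    """Return the query terms plus city names for every known region mentioned."""
    terms = query.split()
    lowered = query.lower()
    for region, cities in REGION_TO_CITIES.items():
        if region in lowered:
            # Add the cities so documents naming only a city can still match.
            terms.extend(cities)
    return terms

print(expand_query("demonstrations in cities in Northern Germany"))
```

A keyword system with blind relevance feedback may end up with similar city names in the query without any gazetteer, which is one plausible reason such systems perform well on the task.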

Search Task 2008
Three languages
– English, Portuguese, German
600,000+ docs
25 topics
– so far 100 in four years
– + 26 geo topics from previous CLEF campaigns
Test collection is available for future use
– Do experiments with the whole set and publish them

Search Task 2008
Monolingual Retrieval
– Topic and document language identical
– English, Portuguese, German
Bilingual Retrieval
– Topic and document language different
– {English, Portuguese, German} -> {English, Portuguese, German}
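
As a small sketch of the task setup, the possible topic/document language combinations can be enumerated. It assumes that a bilingual run is one where the topic language differs from the document language; the function name `language_pairs` is hypothetical.

```python
from itertools import product

LANGUAGES = ["English", "Portuguese", "German"]

def language_pairs(languages):
    """Split all (topic language, document language) pairs into mono- and bilingual."""
    mono = [(lang, lang) for lang in languages]
    bilingual = [
        (topic, doc)
        for topic, doc in product(languages, repeat=2)
        if topic != doc  # assumption: bilingual means the two languages differ
    ]
    return mono, bilingual

mono, bilingual = language_pairs(LANGUAGES)
print(len(mono), len(bilingual))  # 3 6
```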

Topic Development
Topics are meant to express a natural information need which a user of the collection might have
Goal: creation of a geographically challenging topic set
Geographic knowledge should be necessary to be successful

Topic Development
Each group devised a set of candidate topics in its own language and checked their appropriateness against the text collection available for that language.
The candidate topics were subsequently translated into English and checked for relevant documents in the other collections.
Some candidate topics were modified or refined because of the absence of relevant documents in one of the languages, the complexity of topic interpretation and/or the translation into the other languages.
The final topic set was agreed upon after intensive discussion, and all topics were translated into Portuguese and German.
Final translation and check (thanks to Sven Hartrumpf)

Topic Development
Topic development is hard for multilingual collections
Geo entities below the country level are interesting
But these geo entities below the country level may not appear in newspapers in other countries
Relevant documents are required in all three languages

Topic Development
Several issues were explicitly included:
– vague geographic regions (Sub-Saharan Africa, Western Europe)
– geographical relations beyond IN (forest fires on Spanish islands)
– granularity below the country level (industrial or cultural fairs in Lower Saxony)
– terms which do not occur in documents (Portuguese communities in other countries, demonstrations in German cities)

Topic Modifications
Endangered animal species in Iberian Peninsula
– Subject modification: Agriculture
Nobel Prize winners in Physics from Northern European countries
– Subject extension: Nobel Prize winners
Most visited sights in the capital of France
– Topic refinement: Most visited sights in the capital of France and its vicinity

Topic Creation (spatial parameters)
The majority of the topics specify complex (multiply defined) geographical relations, which may represent:
– Inclusion (e.g. Attacks in Japanese subways)
– Exclusion (e.g. Portuguese immigrant communities in the world) [the generic geographical term "world" must be interpreted, in this context, as the entire world excluding Portugal]
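
The inclusion and exclusion relations can be sketched as a simple scope check over the place names found in a document. This is only an illustration under strong assumptions: the place names are taken to be already extracted and disambiguated, and `in_scope` and the example place sets are hypothetical.

```python
def in_scope(doc_places, include=None, exclude=None):
    """Check a document's place names against an inclusion or exclusion scope.

    include: the document must mention at least one of these places.
    exclude: the document must mention at least one place outside this set.
    """
    places = set(doc_places)
    if include is not None:
        return bool(places & set(include))
    if exclude is not None:
        return bool(places - set(exclude))
    return True  # no geographic constraint given

# Inclusion: "Attacks in Japanese subways"
japan = {"Tokyo", "Osaka", "Japan"}
print(in_scope({"Tokyo"}, include=japan))

# Exclusion: "the world" read as everywhere except Portugal
portugal = {"Portugal", "Lisbon", "Porto"}
print(in_scope({"Lisbon"}, exclude=portugal))
print(in_scope({"Lisbon", "Paris"}, exclude=portugal))
```

A real system would need a gazetteer to decide which places belong to a region; the hard-coded sets stand in for that step.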

Example
/89-GC
Trade fairs in Lower Saxony
Documents reporting about industrial or cultural fairs in Lower Saxony.
Relevant documents should contain information about trade or industrial fairs which take place in the German federal state of Lower Saxony, i.e. name, type and place of the fair. The capital of Lower Saxony is Hanover. Other cities include Braunschweig, Osnabrück, Oldenburg and Göttingen.

Reliability?
25 topics are sufficient under most circumstances to reliably order systems (Sanderson & Zobel 2005)
Analysis of the results of GeoCLEF 2007 hints that the results are reliable

Participation Main Task
CLEF Year | Nr. of Participants | Newcomers | Nr. of submitted Experiments

Approaches
No geographic components
– Elaborated weighting (U Berkeley)
Specific geographic processing
– Geo filter and gazetteer (Imperial College)
– GeoWordNet and distance function for geo entities (U Valencia)
– Expansion by geo coordinates (U Chengdu & U Pittsburgh)
– NER and disambiguation, fusion by Fuzzy Borda (U Jaén & U Valencia)
– Ontology-based approach (DFKI)
– Deep natural language processing (U Hagen)
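
To give a rough idea of rank fusion, here is a sketch of a plain Borda count over several ranked lists. It is not the Fuzzy Borda variant named above, whose details the slides do not give; the function name `borda_fuse` and the document ids are hypothetical.

```python
from collections import defaultdict

def borda_fuse(rankings):
    """Fuse ranked lists of document ids with a plain (non-fuzzy) Borda count."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, doc_id in enumerate(ranking):
            # Top of a list of length n earns n points, the last entry earns 1.
            scores[doc_id] += n - position
    # Highest total first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

run_a = ["d1", "d2", "d3"]
run_b = ["d2", "d3", "d1"]
print(borda_fuse([run_a, run_b]))  # ['d2', 'd1', 'd3']
```

Documents missing from a list simply earn no points from it, which is one of several possible conventions.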

Relevance Assessment
Different range of meanings
– Portuguese "monumentos"
– English "sights"
– German "Sehenswürdigkeiten"
Euro Disney might be a sight, but it cannot be considered a monumento

Relevance Assessment
Indirect information
– "foreign aid in Sub-Saharan Africa": is a document on the kidnapping of an aid worker relevant?
– "natural disasters in the Western USA": is a document on the insurance costs caused by a natural disaster relevant?

Relevance Assessment
Hints at problems of the systems
– The German word for fairs (Messe) was matched against similar words which have a different meaning
– angemessen -> appropriate
– Messer -> knife
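
The "Messe" problem can be reproduced with a short sketch that contrasts naive substring matching with whole-token matching. How the participating systems actually matched terms is not stated in the slides, so both functions are illustrative assumptions.

```python
import re

def substring_match(term: str, text: str) -> bool:
    """Naive matching: the term may occur anywhere inside a longer word."""
    return term.lower() in text.lower()

def token_match(term: str, text: str) -> bool:
    """Stricter matching: the term must equal a whole token of the text."""
    tokens = re.findall(r"\w+", text.lower())
    return term.lower() in tokens

for word in ["Messe", "angemessen", "Messer"]:
    print(word, substring_match("Messe", word), token_match("Messe", word))
```

Substring matching accepts all three words, while token matching accepts only "Messe". Aggressive stemming or n-gram indexing can cause similar false matches.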

Relevant Docs per Topic
Language Min Max
English German Portuguese 2 158

Results in a Nutshell
How much and which geo knowledge and reasoning is necessary?
Often, keyword-based systems do well on the task
Best system in the most competitive task (many runs) uses specific geo reasoning
– Significant?
For most other tasks (esp. cross-lingual), the best system uses no specific geo components
– Significant?

More on GeoCLEF
Parallel Session on Thursday 14:30 – 16:00
Please come to the Breakout Session on Friday and help us to form GeoCLEF 2009

GeoCLEF 2009
Continuation of GikiP
– Search for Wiki entries with geographic constraints
Query Classification Task
– Find geo queries in a search engine log
Breakout Session on Friday

Overview
More on the geographic search task …
Giorgio Di Nunzio: Results and Statistical Analysis
Diana Santos: GikiP Task

Acknowledgments
This work was partly done in the scope of Linguateca, contract nº339/1.3/C/NAC, a project jointly funded by the Portuguese Government and the European Union.