Log-Based Evaluation Resources for Question Answering
Thomas Mandl, Julia Maria Schulz
LREC 2010, Web Logs & QA, 22.05.2010



Information Retrieval Logs and Question Answering
• Users are not always aware that different kinds of systems (information retrieval and question answering) exist.
• The short query is a preferred way of asking for information, but phrases or complete sentences are sometimes entered as well.
• This creates a demand for query-specific treatment (Mandl & Womser-Hacker 2005).

Logfile resources at CLEF

Information Retrieval Evaluation Resources
• GeoCLEF 2007
  - Investigated and provided evaluation resources for geographic information retrieval (Mandl et al. 2008)
  - The query identification task was based on a query set from MSN, which is no longer distributed by Microsoft
• LogCLEF 2009
  - "Action logs" from The European Library (TEL) portal, covering the period from 1 January 2007 to 30 June 2008
  - Web search engine query log from the Tumba! search engine
• LogCLEF 2010
  - Extended TEL query and action logs
  - DIPF query logs: a raw server log representing three months of activity on the portal is made available; the files total 5 GB

TEL
The most significant columns of the TEL action log table are:
• a numeric id identifying registered users, or "guest" otherwise;
• the user's IP address;
• an automatically generated alphanumeric id identifying sequential actions of the same user (sessions);
• the query contents;
• the name of the action that the user performed;
• the corresponding collection's alphanumeric id;
• the date and time of the action's occurrence.
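The column list above can be sketched as a small record type. This is only an illustration: the slides do not specify the file format, so the comma delimiter, the column order, the timestamp format, the field names, and the sample rows (including the action names) are all assumptions made for this example.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, Optional


@dataclass
class TelAction:
    """One row of the action log; field names are hypothetical."""
    user_id: Optional[int]   # None stands for an unregistered "guest"
    ip_address: str
    session_id: str          # groups sequential actions of the same user
    query: str
    action: str
    collection_id: str
    timestamp: datetime


def parse_tel_rows(text: str) -> Iterator[TelAction]:
    """Parse comma-separated rows in the assumed column order."""
    for row in csv.reader(io.StringIO(text)):
        user, ip, session, query, action, collection, when = row
        yield TelAction(
            user_id=None if user == "guest" else int(user),
            ip_address=ip,
            session_id=session,
            query=query,
            action=action,
            collection_id=collection,
            # Assumed timestamp format, not documented in the slides.
            timestamp=datetime.strptime(when, "%Y-%m-%d %H:%M:%S"),
        )


# Invented sample rows, not taken from the real TEL log.
SAMPLE = (
    'guest,192.0.2.1,abc123,"where is corfu",search,a0001,2007-01-01 10:15:00\n'
    '42,192.0.2.7,def456,mozart,view_record,a0002,2008-06-30 23:59:59\n'
)

records = list(parse_tel_rows(SAMPLE))
```

Grouping `records` by `session_id` would then give the per-session action sequences that reformulation analyses work on.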

Question Style Queries in Query Logs I
Examples of queries from the MSN query logfile.

Question Style Queries in Query Logs II
Examples of queries from the TEL logfile.

Stop Words in Query Reformulations
• Over 1/4 of all reformulations in the TEL log are additions or deletions of stop words (Ghorab et al. 2009).
• Question words like "where" or "when" are also common stop words in information retrieval systems.
• Prepositions are typical in the reformulation set, too.
• Prepositions are used frequently in the Tumba! search engine log.
• Prepositions are among the most frequent terms in the MSN log.
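A minimal sketch of how a reformulation could be labeled as a stop-word addition or deletion is given here. It is not the procedure used by Ghorab et al.; the tiny stop-word list, the function name, and the whitespace tokenization are assumptions chosen for illustration.

```python
# Hypothetical, deliberately small stop-word list (prepositions, articles,
# question words); a real system would use a full language-specific list.
STOP_WORDS = frozenset({
    "a", "an", "the", "of", "in", "on", "at", "to", "for",
    "where", "when", "what", "who", "how", "is",
})


def reformulation_type(previous: str, current: str) -> str:
    """Label the change between two consecutive queries of one session."""
    before = set(previous.lower().split())
    after = set(current.lower().split())
    added = after - before
    removed = before - after
    changed = added | removed
    if not changed:
        return "unchanged"
    if changed <= STOP_WORDS:
        # Only stop words differ between the two queries.
        if added and not removed:
            return "stop-word addition"
        if removed and not added:
            return "stop-word deletion"
        return "stop-word substitution"
    return "content change"


print(reformulation_type("history of rome", "history rome"))
print(reformulation_type("rome", "where rome"))
print(reformulation_type("rome history", "paris history"))
```

Because the comparison uses sets of tokens, it ignores word order and repeated terms, which is a simplification.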

Outlook
• CLEF has created evaluation resources for logfile analysis which can be used for comparative system evaluation.
• The available files contain queries which could be interesting for question answering systems.
• They contain full sentences phrased as questions, as well as phrases, which cannot be processed appropriately by the "bag of words" approach.
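The last point can be illustrated with a rough sketch. The heuristic below (a leading English question word or a trailing question mark) and the example queries are assumptions for illustration, not a method described in the slides.

```python
from collections import Counter

# Hypothetical list of English question words used by the heuristic.
QUESTION_WORDS = ("who", "what", "when", "where", "why", "how", "which")


def is_question_style(query: str) -> bool:
    """Heuristic: the query starts with a question word or ends with '?'."""
    stripped = query.strip()
    tokens = stripped.lower().split()
    if not tokens:
        return False
    return tokens[0] in QUESTION_WORDS or stripped.endswith("?")


def bag_of_words(query: str) -> Counter:
    """Represent a query as term counts, discarding word order."""
    return Counter(query.lower().split())


# Two invented queries that ask different things but share one bag of words.
q1 = "what is older than rome"
q2 = "rome is older than what"
same_bag = bag_of_words(q1) == bag_of_words(q2)

print(is_question_style("where is corfu"), is_question_style("corfu hotels"))
print(same_bag)
```

Since both queries map to the same term counts, a pure bag-of-words matcher cannot tell them apart, which is why such queries may be better candidates for question answering systems.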

References
Ghorab, M.R.; Leveling, J.; Zhou, D.; Jones, G.; Wade, V.: TCD-DCU at LogCLEF 2009: An Analysis of Queries, Actions, and Interface Languages. In: Peters, C.; Di Nunzio, G.; Kurimo, M.; Mandl, T.; Mostefa, D.; Peñas, A.; Roda, G. (Eds.): Multilingual Information Access Evaluation Vol. I Text Retrieval Experiments: Proceedings 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, Corfu, Greece. Revised Selected Papers. Berlin et al.: Springer [Lecture Notes in Computer Science] to appear. Preprint in Working Notes: campaign.org/2009/working_notes/
Li, Z.; Wang, C.; Xie, X.; Ma, W.-Y. (2008): Query Parsing Task for GeoCLEF2007 Report. In: Working Notes 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, Budapest, Hungary.
Mandl, T.; Gey, F.; Di Nunzio, G.; Ferro, N.; Larson, R.; Sanderson, M.; Santos, D.; Womser-Hacker, C.; Xie, X. (2008): GeoCLEF 2007: the CLEF 2007 Cross-Language Geographic Information Retrieval Track Overview. In: Peters, C.; Jijkoun, V.; Mandl, T.; Müller, H.; Oard, D.; Peñas, A.; Petras, V.; Santos, D. (Eds.): Advances in Multilingual and Multimodal Information Retrieval: 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, Budapest, Hungary. Revised Selected Papers. Berlin et al.: Springer [Lecture Notes in Computer Science 5152] pp.
Mandl, T.; Womser-Hacker, C. (2005): The Effect of Named Entities on Effectiveness in Cross-Language Information Retrieval Evaluation. In: Proceedings of 2005 ACM SAC Symposium on Applied Computing (SAC). Santa Fe, New Mexico, USA. March. pp.
Mandl, T.; Agosti, M.; Di Nunzio, G.; Yeh, A.; Mani, I.; Doran, C.; Schulz, J.M. (2010): LogCLEF 2009: the CLEF 2009 Cross-Language Logfile Analysis Track Overview. In: Peters, C.; Di Nunzio, G.; Kurimo, M.; Mandl, T.; Mostefa, D.; Peñas, A.; Roda, G. (Eds.): Multilingual Information Access Evaluation Vol. I Text Retrieval Experiments: Proceedings 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, Corfu, Greece. Revised Selected Papers. Berlin et al.: Springer [Lecture Notes in Computer Science] to appear. Preprint in Working Notes: campaign.org/2009/working_notes/LogCLEF-2009-Overview-Working-Notes pdf