Log-Based Evaluation Resources for Question Answering
Thomas Mandl, Julia Maria Schulz. LREC 2010, Web Logs & QA, 22.05.2010

Presentation transcript:

1 Log-Based Evaluation Resources for Question Answering
Thomas Mandl, Julia Maria Schulz
LREC 2010, Web Logs & QA, 22.05.2010

2 Information Retrieval Logs and Question Answering
- Users are not always aware that different kinds of systems (information retrieval and question answering) exist.
- Short queries are the preferred way of asking for information, but users sometimes also enter phrases or complete sentences.
- This creates a demand for query-specific treatment (Mandl & Womser-Hacker 2005).

3 Logfile resources at CLEF

4 Information Retrieval Evaluation Resources
- GeoCLEF 2007
  - Investigated and provided evaluation resources for geographic information retrieval (Mandl et al. 2008).
  - The query identification task was based on a query set from MSN, which is no longer distributed by Microsoft.
- LogCLEF 2009
  - “Action logs” from The European Library (TEL) portal, covering the period from 1 January 2007 to 30 June 2008.
  - Web search engine query log from the Tumba! search engine.
- LogCLEF 2010
  - Extended TEL query and action logs.
  - DIPF query logs: a raw server log representing three months of activity on the portal is made available; the files total 5 GB.

5 TEL
The most significant columns of the table are:
- A numeric id identifying a registered user, or “guest” otherwise;
- The user’s IP address;
- An automatically generated alphanumeric id identifying sequential actions of the same user (a session);
- The query contents;
- The name of the action that the user performed;
- The corresponding collection’s alphanumeric id;
- The date and time of the action’s occurrence.
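The columns listed above can be read into simple records for analysis. The following is a minimal sketch only: the slide does not specify the file layout, so the tab delimiter, the column order, the timestamp format, the field names, and the sample rows (including the action names) are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TelAction:
    """One log row; field names are hypothetical, chosen to mirror the slide."""
    user_id: str        # numeric id, or "guest" for unregistered users
    ip: str             # user's IP address
    session_id: str     # generated alphanumeric session identifier
    query: str          # query contents (may be empty for non-search actions)
    action: str         # name of the action performed
    collection_id: str  # alphanumeric id of the collection
    timestamp: datetime


def parse_tel_log(text, delimiter="\t"):
    """Parse delimited log text; rows without exactly seven fields are skipped."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        if len(row) != 7:
            continue
        user_id, ip, session_id, query, action, collection_id, stamp = row
        records.append(TelAction(
            user_id, ip, session_id, query, action, collection_id,
            datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"),  # assumed format
        ))
    return records


def queries_by_session(records):
    """Collect the non-empty queries of each session in log order."""
    sessions = defaultdict(list)
    for record in records:
        if record.query:
            sessions[record.session_id].append(record.query)
    return dict(sessions)


# Invented sample rows, not taken from the real TEL log.
SAMPLE = (
    "guest\t192.0.2.10\ts01\tmozart letters\tsearch\tc001\t2007-01-01 10:15:00\n"
    "guest\t192.0.2.10\ts01\tletters of mozart\tsearch\tc001\t2007-01-01 10:16:30\n"
    "1234\t192.0.2.77\ts02\t\tview_record\tc002\t2007-01-02 08:00:00\n"
)

records = parse_tel_log(SAMPLE)
sessions = queries_by_session(records)
```

Grouping by the session id gives the sequence of queries one user issued, which is the unit needed to study reformulations.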

6 Question Style Queries in Query Logs I
Examples of queries from the MSN query logfile.

7 Question Style Queries in Query Logs II
Examples of queries from the TEL logfile.
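Question-style queries like those referred to on the two slides above could be picked out of a log with a simple heuristic. This is a rough sketch, not the authors' method: the question-word list, the function name `is_question_style`, and the example queries are assumptions, and the rule (leading question word or trailing question mark) only covers English.

```python
import re

# Assumed English question words; a real study would need one list per language.
QUESTION_WORDS = frozenset(
    {"who", "what", "when", "where", "why", "how", "which"}
)


def is_question_style(query):
    """Return True if the query starts with a question word or ends with '?'."""
    tokens = re.findall(r"[a-z']+", query.lower())
    if not tokens:
        return False
    return query.strip().endswith("?") or tokens[0] in QUESTION_WORDS


# Invented example queries, not taken from the MSN or TEL logs.
examples = [
    "where is the eiffel tower",
    "eiffel tower",
    "Is Corfu an island?",
]
flags = [is_question_style(q) for q in examples]
```

A heuristic this simple will miss questions phrased without a question word and will flag keyword queries that happen to start with one, so its output is best treated as a candidate set for manual inspection.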

8 Stop Words in Query Reformulations
- Over a quarter of all reformulations in the TEL log are additions or deletions of stop words (Ghorab et al. 2009).
- Question words such as “where” or “when” are also common stop words in information retrieval systems.
- Prepositions are typical in the reformulation set, too.
- Prepositions are used frequently in the Tumba! search engine log.
- Prepositions are among the most frequent terms in the MSN log.
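One way to count reformulations of the kind described above is to compare two consecutive queries from the same session and check whether only stop words changed. The sketch below is illustrative and is not the procedure of Ghorab et al. (2009): the stop word list, the category labels, and the function name `classify_reformulation` are assumptions.

```python
from collections import Counter

# Small assumed stop word list; it includes question words and prepositions.
STOP_WORDS = frozenset({
    "a", "an", "the", "of", "in", "on", "at", "for", "to", "from", "by",
    "with", "and", "or", "is", "was", "where", "when", "who", "what",
})


def classify_reformulation(before, after):
    """Label how `after` differs from `before` at the term level."""
    old_terms = Counter(before.lower().split())
    new_terms = Counter(after.lower().split())
    added = new_terms - old_terms      # terms present only in the new query
    removed = old_terms - new_terms    # terms present only in the old query
    if not added and not removed:
        return "same_terms"
    if set(added) | set(removed) <= STOP_WORDS:
        if added and not removed:
            return "stop_word_addition"
        if removed and not added:
            return "stop_word_deletion"
        return "stop_word_change"
    return "other"


# Invented query pairs for illustration.
label_add = classify_reformulation("history europe", "history of europe")
label_del = classify_reformulation("the history of europe", "history europe")
label_other = classify_reformulation("history europe", "history asia")
```

Applied to every pair of consecutive queries in a session, the share of `stop_word_addition` and `stop_word_deletion` labels gives a figure comparable in spirit to the one quoted on the slide.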

9 Outlook
- CLEF has created evaluation resources for logfile analysis that can be used for comparative system evaluation.
- The available files contain queries that could be interesting for question answering systems.
- They contain full-sentence questions and phrases that cannot be processed appropriately by the “bag of words” approach.
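The limitation of the “bag of words” approach for question-style queries can be made concrete with a small sketch. It assumes a typical stop word list that contains question words; the list, the function name `bag_of_words`, and the two example questions are invented for illustration.

```python
import re
from collections import Counter

# Assumed stop word list that, as in many IR systems, includes question words.
STOP_WORDS = frozenset({
    "a", "an", "the", "of", "in", "is", "was",
    "where", "when", "who", "what",
})


def bag_of_words(query):
    """Lowercase, tokenize, drop stop words, and ignore word order."""
    tokens = re.findall(r"[a-z]+", query.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


# Two invented questions with different information needs (place vs. date).
where_bag = bag_of_words("Where was Goethe born?")
when_bag = bag_of_words("When was Goethe born?")
same_representation = where_bag == when_bag
```

After stop word removal both questions reduce to the same two terms, so a retrieval system working on this representation cannot tell that one asks for a place and the other for a date.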

10 References
Ghorab, M.R.; Leveling, J.; Zhou, D.; Jones, G.; Wade, V.: TCD-DCU at LogCLEF 2009: An Analysis of Queries, Actions, and Interface Languages. In: Peters, C.; Di Nunzio, G.; Kurimo, M.; Mandl, T.; Mostefa, D.; Peñas, A.; Roda, G. (Eds.): Multilingual Information Access Evaluation Vol. I Text Retrieval Experiments: Proceedings 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, Corfu, Greece. Revised Selected Papers. Berlin et al.: Springer [Lecture Notes in Computer Science] to appear. Preprint in Working Notes: http://www.clef-campaign.org/2009/working_notes/
Li, Z.; Wang, C.; Xie, X.; Ma, W.-Y. (2008): Query Parsing Task for GeoCLEF2007 Report. In: Working Notes 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, Budapest, Hungary. http://www.clef-campaign.org/2007/working_notes/LI_OverviewCLEF2007.pdf
Mandl, T.; Gey, F.; Di Nunzio, G.; Ferro, N.; Larson, R.; Sanderson, M.; Santos, D.; Womser-Hacker, C.; Xing, X. (2008): GeoCLEF 2007: the CLEF 2007 Cross-Language Geographic Information Retrieval Track Overview. In: Peters, C.; Jijkoun, V.; Mandl, T.; Müller, H.; Oard, D.; Peñas, A.; Petras, V.; Santos, D. (Eds.): Advances in Multilingual and Multimodal Information Retrieval: 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, Budapest, Hungary. Revised Selected Papers. Berlin et al.: Springer [Lecture Notes in Computer Science 5152] pp. 745-772.
Mandl, T.; Womser-Hacker, C. (2005): The Effect of Named Entities on Effectiveness in Cross-Language Information Retrieval Evaluation. In: Proceedings of 2005 ACM SAC Symposium on Applied Computing (SAC). Santa Fe, New Mexico, USA. March 13-17. pp. 1059-1064.
Mandl, T.; Agosti, M.; Di Nunzio, G.; Yeh, A.; Mani, I.; Doran, C.; Schulz, J.M. (2010): LogCLEF 2009: the CLEF 2009 Cross-Language Logfile Analysis Track Overview. In: Peters, C.; Di Nunzio, G.; Kurimo, M.; Mandl, T.; Mostefa, D.; Peñas, A.; Roda, G. (Eds.): Multilingual Information Access Evaluation Vol. I Text Retrieval Experiments: Proceedings 10th Workshop of the Cross-Language Evaluation Forum, CLEF 2009, Corfu, Greece. Revised Selected Papers. Berlin et al.: Springer [Lecture Notes in Computer Science] to appear. Preprint in Working Notes: http://www.clef-campaign.org/2009/working_notes/LogCLEF-2009-Overview-Working-Notes-2009-09-14.pdf

