A PROMISE for Experimental Evaluation
Nicola Ferro, Allan Hanbury, Jussi Karlgren, Maarten de Rijke, and Giuseppe Santucci
CLEF 2010, 20th Sept. 2010, Padova

Multilingual and Multimedia Information Access Systems

Challenges for Experimental Evaluation
- Heterogeneity and volume of the data: much is done to provide realistic document collections
- Diversity of users and tasks: evaluation tasks/tracks are often too “monolithic”
- Complexity of the systems: systems are usually dealt with as “black boxes”

Experimental Evaluation Needs
- To increase automation in the evaluation process: reduce the effort necessary for carrying out evaluation and increase the number of experiments conducted, in order to analyse evolving user habits and tasks in depth
- To study systems component-by-component: better understanding of systems’ behaviour, also with respect to different tasks
- To increase the usage of the produced experimental data: improve collaboration and user involvement to achieve unforeseen exploitation and enrichment of the experimental data

PROMISE Approach

Evaluation: Labs and Metrics
Maarten de Rijke, UvA

Information access is changing
- New breeds of users
  - performing an increasingly broad range of tasks within varying domains
  - acting within communities to find information for themselves and to share with others
- Re-orientation of methodology and goals in the evaluation of information access systems

Mapping the evaluation landscape
- Generating ground truth from log files (see the sketch below)
- Generating ground truth from annotations
- Alternative retrieval scenarios and metrics
- Living labs
- Evaluation in the wild
- Ranking analysis
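The first bullet hints at mining interaction logs for relevance evidence. The following is a minimal, illustrative sketch, not part of PROMISE or the talk, of how graded relevance judgments could be aggregated from a simplified click log; the log format, thresholds, and function name are assumptions made for illustration only.

```python
# Illustrative sketch: deriving graded relevance judgments from a click log.
# Assumed log format (one interaction per line, tab-separated):
#   query_id <TAB> doc_id <TAB> clicked (0/1) <TAB> dwell_seconds
from collections import defaultdict

def qrels_from_click_log(path, min_impressions=10):
    """Aggregate per-(query, doc) click statistics into relevance grades."""
    stats = defaultdict(lambda: [0, 0, 0.0])  # impressions, clicks, total dwell
    with open(path, encoding="utf-8") as log:
        for line in log:
            qid, doc, clicked, dwell = line.rstrip("\n").split("\t")
            s = stats[(qid, doc)]
            s[0] += 1
            s[1] += int(clicked)
            s[2] += float(dwell)

    qrels = {}
    for (qid, doc), (impressions, clicks, dwell) in stats.items():
        if impressions < min_impressions:   # too little evidence to judge
            continue
        ctr = clicks / impressions
        # Crude graded judgment: frequent, long-dwell clicks count as highly relevant.
        if ctr > 0.5 and dwell / max(clicks, 1) > 30:
            grade = 2
        elif ctr > 0.2:
            grade = 1
        else:
            grade = 0
        qrels[(qid, doc)] = grade
    return qrels
```

In practice such click-derived judgments are noisy and position-biased, which is exactly why they need to be compared against judgments obtained from annotations and from living-lab style evaluation.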

Use Cases – a bridge to application
Jussi Karlgren, SICS

Two legs of evaluation
- Benchmarking
- Validation
(well … at least two)
Each with separate craft and practice. How can they communicate?

Use cases – a conduit
To communicate starting points of evaluation practice, we suggest the formulation of use cases, based on practice in the field.
Interviews, think tanks, hypothesis-driven as well as empirically driven.
Contact us! Suggest stakeholders!

IP Search
Allan Hanbury, IRF

IR Evaluation Campaigns today …
- are mostly based on the TREC organisation model,
- which is based on the Cranfield Paradigm,
- which was developed for …

You can do a lot with index cards...
The Mundaneum: begun in 1919 in Belgium; by April 1934 there were millions of cross-referenced index cards.

Disadvantages of the Evaluation Campaign Approach
- Fixed timelines and cyclic nature of events
- Evaluation at system level only
- Difficulty in comparing systems and elucidating reasons for their performance
- Viewing the campaign as a competition
Are IR systems getting better? It is not clear from results in published papers that IR systems have improved over the last decade [Fuhr, this morning; Armstrong et al., CIKM 2009]

Search for Innovation
Patent search is an interesting problem because:
- Very high recall is required, but precision should not be sacrificed
- Many types of search are done: from narrow to wide
- Searches also cover non-patent literature
- Classification is required
- It is multi-lingual
- Non-text information is important
- A different style is used in different parts of patents

Visual Analytics
Giuseppe Santucci, Sapienza Università di Roma

Data!
PROMISE has to manage and explore large and/or complex datasets:
- Topics
- Experiment submissions
- Creation of pools
- Relevance assessments
- Log files
- Measures
- Derived data
- Statistics
- …
And PROMISE foresees the managed data growing by about one order of magnitude during the project.

Challenges
What are the challenges arising from the management of such datasets?
- Not the storage (even if it requires an engineered database design)
- Not the retrieval (if you just need to retrieve a measure)
Challenges come from effectively using such an immense wealth of data (without being overloaded). It means:
- understanding it
- discovering patterns, insights, and trends
- making decisions
- sharing and reusing results

Rescuing information
In different situations, people need to exploit and use hidden information resting in large, unexplored data sets:
- decision-makers
- analysts
- engineers
- emergency response teams
- ...
Several techniques are devoted to this aim:
- Automatic analysis techniques (e.g., data mining, statistics)
- Manual analysis techniques (e.g., information visualization)
Large and complex datasets require a joint effort:

Visual Analytics

A simple Visual Analytics example
How to visually compare Jack London’s and Mark Twain’s books?
VA steps (a sketch of these steps follows below):
- Split the book into several text blocks (e.g., pages, paragraphs)
- Measure, for each text block, a relevant feature (e.g., average sentence length, word usage, etc.)
- Associate the relevant feature with a visual attribute (e.g., colour)
- Visualize it
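As a concrete illustration of these four steps, here is a minimal sketch, not from the talk, that splits two books into blocks, measures the average sentence length per block, maps it to a colour, and draws one coloured stripe per block; the file names and block size are hypothetical.

```python
# Illustrative sketch of the VA steps above. Assumes two plain-text files
# (e.g., Project Gutenberg downloads); "london.txt" and "twain.txt" are hypothetical.
import re
import matplotlib.pyplot as plt

def block_features(text, block_size=2000):
    """Split the book into fixed-size character blocks and measure, per block,
    the average sentence length (in words)."""
    blocks = [text[i:i + block_size] for i in range(0, len(text), block_size)]
    features = []
    for block in blocks:
        sentences = [s for s in re.split(r"[.!?]+", block) if s.strip()]
        words = block.split()
        if sentences and words:
            features.append(len(words) / len(sentences))
    return features

def plot_stripes(ax, values, title):
    """Associate the feature with a visual attribute (colour) and draw
    one coloured stripe per text block."""
    ax.imshow([values], aspect="auto", cmap="coolwarm")
    ax.set_title(title)
    ax.set_yticks([])

fig, axes = plt.subplots(2, 1, figsize=(10, 2.5))
for ax, (author, path) in zip(axes, [("Jack London", "london.txt"),
                                     ("Mark Twain", "twain.txt")]):
    with open(path, encoding="utf-8") as f:
        plot_stripes(ax, block_features(f.read()), author)
plt.tight_layout()
plt.show()
```

The same loop could compute other features named on the next slide, such as the share of hapax legomena per block, and map them to a second row of stripes.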

Jack London vs Mark Twain
[Visualization comparing the two authors on average sentence length (short vs. long sentences) and on hapax legomena (HL, words appearing only once): many HL (rich vocabulary) vs. few HL.]

Visual!
One of the innovative aspects of PROMISE, acknowledged by the European Commission, is the idea of providing Visual Analytics techniques for exploring the available datasets:
- Specific algorithms
- Suitable visualizations
- Sharing and collaboration mechanisms

The European Vismaster CA project

Where next?
- What can PROMISE deliver to future CLEF labs?
- How will PROMISE contribute to the field as a whole, outside direct CLEF activities?
- How can PROMISE provide experimental infrastructure for other projects?