Towards Methods for the Collective Gathering and Quality Control of Relevance Assessments (SIGIR '09, July 2009)


Towards Methods for the Collective Gathering and Quality Control of Relevance Assessments (SIGIR '09, July 2009)

Summary
- Motivation
- Overview
- Related Work
- Methodology
- Pilot Study
- Analysis and Findings
- Conclusions

Motivation
- With the advent of new technology, more and more interest has turned to digital media such as digital books, audio, and video.
- These digital items present new challenges for the construction of test collections, specifically for collecting the relevance assessments used to tune system performance. This is due to:
  - the length and cohesion of the digital item
  - the dispersion of topics within it
- Proposal: develop a method for the collective gathering of relevance assessments, using a social game model to stimulate participants' engagement.

Overview
Test collections consist of:
- a corpus of documents
- a set of search topics
- relevance assessments collected from human judges

Example document (TREC, Wall Street Journal): "AT&T Unveils Services to Upgrade Phone Networks Under Global Plan", Janet Guyon, New York. "American Telephone & Telegraph Co. introduced the first of a new..."

Example topic (TREC, Number 168, "Financing AMTRAK"):
- Description: A document will address the role of the Federal Government in financing the operation of the National Railroad Transportation Corporation (AMTRAK).
- Narrative: A relevant document must provide information on the government's responsibility to make AMTRAK an economically viable entity. It could also discuss...
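To make these ingredients concrete: in TREC-style collections the relevance assessments are distributed as plain-text "qrels" lines of the form `topic iteration docno relevance`. The minimal Python sketch below parses such a file into a per-topic lookup table; the file name and the example document id are hypothetical, only the four-column layout is the standard TREC convention.

```python
from collections import defaultdict

def load_qrels(path):
    """Parse TREC-style qrels lines: '<topic> <iteration> <docno> <relevance>'."""
    qrels = defaultdict(dict)  # topic id -> {docno: relevance}
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            topic, _iteration, docno, rel = line.split()
            qrels[topic][docno] = int(rel)
    return qrels

# Usage (hypothetical file and document id, for topic 168 "Financing AMTRAK"):
# qrels = load_qrels("qrels.trec")
# print(qrels["168"].get("WSJ-EXAMPLE-DOC", 0))
```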

Overview
Test collection construction (in TREC):
- A set of documents and a set of topics are given to the TREC participants.
- Each participant runs the topics against the documents using their retrieval system.
- A ranked list of the top k documents per topic is returned to TREC.
- TREC forms pools (the union of the top k documents from each submission), which are judged by the relevance assessors.
- Each submission is then evaluated using the resulting relevance judgments, and the evaluation results are returned to the participants.
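The pooling step is the algorithmic core of this process, so here is a minimal sketch of depth-k pooling under assumed input structures (the run/topic dictionary layout and the default k are illustrative; the real TREC tooling differs in detail).

```python
def build_pools(runs, k=100):
    """runs: {run_id: {topic_id: [docno, ...] ranked best-first}}.
    Returns {topic_id: set of docnos to be judged} -- the depth-k pool."""
    pools = {}
    for run in runs.values():
        for topic, ranking in run.items():
            # union of the top-k documents from every submitted run
            pools.setdefault(topic, set()).update(ranking[:k])
    return pools
```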

Related work
Gathering relevance judgments:
- Single judge: usually the topic author assesses the relevance of documents to the given topic.
- Multiple judges: assessments are collected from multiple judges and are typically converted to a single score per document (see the sketch below).
- In web search, judgments are collected from a representative sample of the user population; user logs are also often mined for indicators of user satisfaction with the retrieved documents.
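As a hedged illustration of the multiple-judges case, the sketch below collapses several binary judgments into a single score per document via the label mean and a majority vote; the paper itself does not prescribe this particular aggregation rule.

```python
def aggregate_judgments(labels):
    """labels: binary relevance judgments (0/1) from different judges.
    Returns (mean_score, majority_label)."""
    mean_score = sum(labels) / len(labels)
    majority_label = 1 if mean_score >= 0.5 else 0
    return mean_score, majority_label

# e.g. three judges disagree on a document:
# aggregate_judgments([1, 0, 1]) -> (0.666..., 1)
```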

Related work
In their approach, the authors extend the use of multiple assessors per topic by:
- facilitating the review and re-assessment of relevance judgments
- enabling communication between judges
- providing an enriched collection of relevance labels that incorporates different user profiles and user needs; this also enables the preservation and promotion of diversity of opinions.

Related Work

Methodology
The Collective Relevance Assessment (CRA) method involves three phases, covered on this and the next two slides:
- Phase 1: preparation of data and setting of CRA objectives

Methodology
- Phase 2: design of the game

Methodology
- Phase 3: the relevance assessment system

Pilot Study
- Two rounds: the first lasted 2 weeks, the second lasted 4 weeks.
- Data: INEX 2008 track collection (50,000 digitized books, 17 million scanned pages, 70 TREC-style topics).
- Participants: 17 participants.
- Collected data (one possible record layout is sketched below):
  - highlighted document regions
  - a binary relevance level per page
  - notes and comments
  - a relevance degree assigned to each book
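One possible way to represent a single collected assessment record, covering the four kinds of data listed above; the field names and types are my own assumptions, not the CRA system's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class BookAssessment:
    """One assessor's judgments for one (topic, book) pair -- illustrative only."""
    topic_id: str
    book_id: str
    assessor_id: str
    highlighted_regions: List[Tuple[int, str]] = field(default_factory=list)  # (page, highlighted text)
    page_relevance: Dict[int, int] = field(default_factory=dict)              # page -> 0/1
    notes: List[str] = field(default_factory=list)                            # free-text comments
    book_relevance_degree: int = 0                                            # graded, book level
```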

Analysis and Findings
Properties of the methodology:
- Feasibility: engagement level comparable to that of INEX 2003.
- Completeness and exhaustiveness: 17.6% maximum completeness level.
- Semantic unit and cohesion: relevant information forms a minor theme of the book; relevant content is dispersed.
- Browsing and relevance decisions: assessors require contextual information to make a decision.
- Influence of incentive structures
- Exploring vs. reviewing
- Assessment strategies
Quality of the collected data:
- Assessor agreement: the level of agreement is higher than in TREC and INEX (a generic way to compute agreement is sketched below).
- Annotations
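For readers who want to reproduce an agreement figure on their own data, the sketch below computes raw observed agreement and Cohen's kappa for two assessors' binary page judgments; this is a generic calculation, not necessarily the exact measure reported in the paper.

```python
def pairwise_agreement(judge_a, judge_b):
    """judge_a, judge_b: dicts mapping page id -> 0/1.
    Returns (observed agreement, Cohen's kappa) over the commonly judged pages;
    assumes at least one page was judged by both."""
    common = set(judge_a) & set(judge_b)
    n = len(common)
    observed = sum(judge_a[p] == judge_b[p] for p in common) / n
    # chance agreement from each judge's marginal label distribution
    pa1 = sum(judge_a[p] for p in common) / n
    pb1 = sum(judge_b[p] for p in common) / n
    expected = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return observed, kappa
```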

Conclusions
- The CRA method successfully expanded traditional methods and introduced new concepts for gathering relevance assessments.
- It encourages personalized and diverse perspectives on the topics.
- It promotes the collection of rich contextual data that can assist with interpreting relevance assessments and their use for system optimization.