The CLEF 2005 interactive track (iCLEF)
Julio Gonzalo 1, Paul Clough 2 and Alessandro Vallin 3
1 Departamento de Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia
2 Department of Information Studies, University of Sheffield, UK
3 ITC-irst, Trento, Italy

Overview
- Consolidation of two pilot user studies at iCLEF 2004:
  - Interactive question answering task
  - Interactive image retrieval task
- iCLEF provides resources and experiment design
- Participants select a research question to investigate (by comparing the behaviour and search results of users with a reference and a contrastive system)
- Five research groups submitted results:
  - 2 groups for image retrieval
  - 3 groups for QA

Agenda
- Image retrieval task
- Question Answering task
- Ideas for 2006: the flickr task

Cross-language image retrieval

Overview
- Evaluation with ranked lists alone is limited: image retrieval systems are highly interactive
- Appealing for CLIR research
- Language-independent: the object to be retrieved is an image
- Approaches to image retrieval:
  - Purely visual (query by example), e.g. "find images like this one"
  - Text-based, e.g. web image search
  - A combination of both

Interactive task
- Areas of interaction to study include:
  - Query formulation (visual and textual)
  - Query re-formulation (relevance feedback)
  - Browsing/navigating results
  - Identifying/selecting relevant images
- Based on the iCLEF methodology:
  - Participants require a minimum of 8 users
  - Within-subject experimental design
  - 16 search tasks (5 minutes per task)
  - Participants select the area to investigate

Target search task
- Clear goal for the user (easy to describe the task)
- Can be achieved without knowledge of the collection
- Clearly defined measures of success
- Invokes different searching strategies

Search tasks

Participants
- 11 groups signed up; 2 submitted results
- University of Sheffield:
  - Compared two versions of the same system (with Italian as the query language)
  - Aimed to test whether automatically-generated menus are better than a ranked list for presenting results
- Miracle:
  - Compared Spanish versus English query formulation
  - Aimed to test whether Boolean AND or OR is better

Results
- Miracle:
  - 69% of images found with English queries; 66% with Spanish
  - Domain-specific terminology caused problems for users (and the system)
- University of Sheffield:
  - 53% of images found using the ranked list; 47% using menus
  - Users preferred the menus
- Comparison between groups (limited):
  - Miracle: 86/128 images found overall
  - Sheffield: 82/128 images found overall

Interactive CL Q&A

Q&A task
[Diagram: a question in the user's native language goes to a Q&A search assistant, which searches a text collection in a foreign language; the user gives the answer in the native language]

Q&A vs. interactive Q&A
- People know some of the answers: questions must be carefully selected
- People can draw inferences:
  - Answers from multiple documents were considered in 2004, but caused assessment problems
  - Combining document evidence with user knowledge: avoid definition and other open questions
- People answer in the question language: high-quality manual translations are needed for assessment
- People get tired: exclude NIL questions, limit question types

Experimental design
- 8 users (native speakers of the query language)
- 16 evaluation questions (+ 4 for training)
- 5 minutes per search (~3 hours per user)
- Independent variable: CLIR system design (reference vs. contrastive)
- Dependent variable: accuracy
- Latin square to block user/question effects (see the sketch below)
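The Latin-square blocking can be made concrete with a short sketch (a minimal illustration under assumptions, not the official iCLEF assignment matrix; user and question identifiers are hypothetical). The idea is to rotate the question order per user and alternate which system a user starts with, so that every question is searched with each system by the same number of users:

```python
# Minimal sketch of a Latin-square-style rotation for the Q&A design:
# 8 users, 16 questions, 2 systems. Each user sees every question once;
# which system handles a given question is rotated across users, so user
# and question effects are balanced across the two conditions.

USERS = [f"user{i}" for i in range(1, 9)]        # hypothetical user IDs
QUESTIONS = [f"Q{i:02d}" for i in range(1, 17)]  # 16 evaluation questions
SYSTEMS = ["reference", "contrastive"]

def latin_square_assignment(users, questions, systems):
    """Rotate the question order per user and split each session between the
    two systems, so every (question, system) pair is searched by the same
    number of users."""
    assignment = []
    n_q = len(questions)
    for u_idx, user in enumerate(users):
        # rotate the question list by two positions per user
        shift = (2 * u_idx) % n_q
        order = questions[shift:] + questions[:shift]
        # which system comes first alternates between consecutive users
        first, second = systems if u_idx % 2 == 0 else list(reversed(systems))
        for q_idx, question in enumerate(order):
            system = first if q_idx < n_q // 2 else second
            assignment.append((user, question, system))
    return assignment

if __name__ == "__main__":
    for row in latin_square_assignment(USERS, QUESTIONS, SYSTEMS)[:8]:
        print(row)
```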

Evaluation measures
- Official score: accuracy (same as the Q&A track; a small computation sketch follows below)
- Additional quantitative data: searching time, number of interactions, log analysis in general
- Additional data: questionnaires (initial, 2 post-system, final), observational information
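As a small illustration of the official measure, here is a sketch of how per-system accuracy could be computed from an assessed search log (the log format and field names are assumptions for illustration, not the actual iCLEF scoring script):

```python
# Minimal sketch of per-system accuracy. Each record is
# (user, question, system, assessment), where assessment is "correct" or
# "incorrect" after comparing the user's answer against the gold answer.

from collections import defaultdict

def accuracy_by_system(records):
    counts = defaultdict(lambda: [0, 0])          # system -> [correct, total]
    for user, question, system, assessment in records:
        counts[system][1] += 1
        if assessment == "correct":
            counts[system][0] += 1
    return {sys: correct / total for sys, (correct, total) in counts.items()}

# Example: two users, two questions each, on reference vs. contrastive systems.
log = [
    ("user1", "Q01", "reference",   "correct"),
    ("user1", "Q02", "contrastive", "incorrect"),
    ("user2", "Q01", "contrastive", "correct"),
    ("user2", "Q02", "reference",   "correct"),
]
print(accuracy_by_system(log))   # {'reference': 1.0, 'contrastive': 0.5}
```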

Experiments
- Alicante: how much context do users need to correctly identify answers? (clauses vs. full paragraphs in a QA-based system)
- Salamanca: how useful is MT for the task? (with/without MT) × (poor/good target language skills) × (EN/FR as target language)
- UNED: is it better to search paragraphs than full documents?

Official results
[Accuracy table per group and condition shown on slide]

Remarkable facts
- UNED & Alicante: accuracy increases with larger contexts
- Salamanca: MT is not very helpful!
- Implications for CL-QA systems?

Ideas for 2006

Participation
- Conclusion: terminate the track!

Failure analysis (1)
- High cost of entry:
  - Long, boring guidelines
  - User recruitment, scheduling, training, monitoring
  - Can't really do experiment variations
  - Made a programming mistake? Start recruiting volunteers again

Failure analysis (2)
- "Users screw everything up" (XXX, IR competition organizer). Recruiting, training, monitoring, sometimes even paying... just to see how users ruin your hypothesis.
- Is your search assistant at least good for demonstration purposes? No, because:
  1) Cross-language search implies a cross-cultural need
  2) But cross-cultural needs are infrequent! (Experiment: show it to your mother)

[Flickr screenshots: a photo page with title, description, comments, sets and tags (folksonomies); example tags in Spanish, Italian, English and Japanese]

Advantages of flickr
- Naturally multilingual, and a new IR challenge (folksonomies)
- You can show it to your mother! (it is cross-language but not cross-cultural)
- Can avoid recruiting users: study the behaviour of real web/flickr users (log analysis)
- Challenges of web scenarios (social network effects) plus the advantages of a controlled scenario (unlike Google or Yahoo image search)

Interactive flickr task (2006)
- Target language: Portuguese
- Data: Flickr images (a local copy or via the Flickr API; see the sketch after this list)
- Search tasks:
  - Illustrate this text (open)
  - What's behind this house? (focused, Q&A type)
  - Sunsets in Mangue Seco (ad-hoc type)
  - Pictures where the Nike logo appears (ad-hoc, content oriented)
- Track real users with real information needs (log analysis!)
- Experiment design: open!! (let's also compare evaluation methodologies!)
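To illustrate the "via the Flickr API" option, here is a minimal sketch of a free-text search against the public Flickr REST API (method flickr.photos.search). The API key is a placeholder and the static-image URL pattern follows Flickr's current convention, so this is an assumption about the setup rather than the task's actual harness:

```python
# Minimal sketch: free-text search on Flickr via the REST API.
# YOUR_API_KEY is a placeholder; the query wrapper used by the task is not
# specified here.

import json
import urllib.parse
import urllib.request

API_URL = "https://api.flickr.com/services/rest/"

def search_photos(query, api_key, per_page=10):
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "text": query,           # free-text search over title, description and tags
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,     # return plain JSON instead of a JSONP wrapper
    }
    with urllib.request.urlopen(API_URL + "?" + urllib.parse.urlencode(params)) as resp:
        data = json.load(resp)
    # Each photo record can be turned into a static image URL
    return [
        f"https://live.staticflickr.com/{p['server']}/{p['id']}_{p['secret']}.jpg"
        for p in data["photos"]["photo"]
    ]

if __name__ == "__main__":
    # e.g. a Portuguese query for one of the example tasks above
    for url in search_photos("pôr do sol Mangue Seco", api_key="YOUR_API_KEY"):
        print(url)
```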

Plans for 2007
- Make the task compulsory for CLEF participants
- Terminate all other tracks
- Task coordinators hired by Yahoo!

Acknowledgments
- People who helped organize iCLEF 2005: Richard Sutcliffe, Christelle Ayache, Víctor Peinado, Fernando López, Javier Artiles, Jianqiang Wang, Daniela Petrelli
- People already helping us shape the flickr task: Javier Artiles, Peter Anick, Jussi Karlgren, Doug Oard, William Hersh, Donna Harman, Daniela Petrelli, Henning Müller
- All participating groups