Presentation transcript:

Overview of QAST 2009: Question Answering on Speech Transcriptions
CLEF 2009 Workshop, Corfu, September 30, 2009
J. Turmo, P. R. Comas (TALP Research Centre, UPC, Barcelona, Spain)
S. Rosset, O. Galibert (LIMSI, Paris, France)
N. Moreau, D. Mostefa (ELDA/ELRA, Paris, France)
P. Rosso, D. Buscaldi (NLE Lab., ELiRF Research Group, UPV, Spain)
QAST website:

Objectives of QAST #3 (after 2007 & 2008)
- Development of robust QA for speech transcripts
- Measure the loss due to ASR inaccuracies: manual transcriptions vs. ASR transcriptions
- Measure the loss at different ASR word error rates
- New in 2009: test with oral questions (written questions vs. spontaneous oral questions)

Evaluation Data & Tasks

Corpus  | Lang.   | Description                         | Transcriptions                              | Tasks
EPPS-EN | English | 3h (6 European Parliament sessions) | 1 manual, 3 ASR (WER = 10.6%, 14.0%, 24.1%) | T1(a): Written Questions, T1(b): Oral Questions
EPPS-ES | Spanish | 3h (6 European Parliament sessions) | 1 manual, 3 ASR (WER = 11.5%, 12.7%, 13.7%) | T2(a): Written Questions, T2(b): Oral Questions
ESTER   | French  | 10h (18 Broadcast News shows)       | 1 manual, 3 ASR (WER = 11.9%, 23.9%, 35.4%) | T3(a): Written Questions, T3(b): Oral Questions

For each task, 4 different transcriptions: 1 manual transcription and 3 ASR transcriptions with different WERs.
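The tasks differ mainly in the word error rate (WER) of the ASR output. As background (this is not part of the QAST evaluation tools), a minimal sketch of how such word-level WER percentages are typically computed; the function name and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: (substitutions + deletions + insertions) / reference length,
    computed as a word-level Levenshtein edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: one substitution over five reference words -> WER = 0.2 (20%)
print(word_error_rate("the session is now open", "the session is not open"))
```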

Oral Questions

Procedure to create spontaneous oral questions:
1. Random selection of passages in the collections
2. Humans read the passages and ask a few questions
3. Exact manual transcription of the spontaneous questions
4. Question filtering (removal of invalid ones)
5. Creation of written versions of the oral questions

Example:
ORAL: When did the bombing of Fallujah t() take took place?
WRITTEN: When did the bombing of Fallujah take place?

Average question length (words): T1 (EPPS, English) 9.1, T2 (EPPS, Spanish) 7.7, T3 (ESTER, French) 7.1.

Final Question Sets

Final selection:
- Factual questions, 5 answer types: Person, Location, Organization, Measure, Time
- Definition questions, 3 answer types: Person, Organization, Other
- 'NIL' questions (no answer in the collection)

Task               | #dev questions | #test questions | %Fact. | %Def. | %NIL
T1 (EPPS, English) | 50             | 100             | 75%    | 25%   | 18%
T2 (EPPS, Spanish) | 50             | 100             | 55%    | 45%   | 23%
T3 (ESTER, French) | 50             | 100             | 68%    | 32%   | 21%

Submissions

Participants could submit up to:
- 2 submissions per task and transcript (max. 48 submissions)
- Up to 5 ranked answers per question

Answer format:
- 'Manual transcriptions' tasks: Answer_string + Doc_ID
- 'Automatic transcriptions' tasks: Answer_string + Doc_ID + Time_start + Time_end
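To make the two answer formats concrete, here is a small illustrative sketch of how a ranked answer could be represented and serialized. This is not the official QAST submission syntax (which is defined in the track guidelines); the field names, document IDs, and tab-separated layout are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RankedAnswer:
    """One of up to 5 ranked answers for a question (illustrative structure)."""
    question_id: str
    rank: int                           # 1..5
    answer_string: str
    doc_id: str
    time_start: Optional[float] = None  # only for 'automatic transcriptions' tasks
    time_end: Optional[float] = None

def to_line(a: RankedAnswer) -> str:
    """Serialize as a tab-separated line (hypothetical layout, not the official one)."""
    fields = [a.question_id, str(a.rank), a.answer_string, a.doc_id]
    if a.time_start is not None and a.time_end is not None:
        fields += [f"{a.time_start:.2f}", f"{a.time_end:.2f}"]
    return "\t".join(fields)

# Manual-transcription task: answer string + document ID only
print(to_line(RankedAnswer("q001", 1, "Josep Borrell", "EPPS_EN_DOC_01")))
# ASR-transcription task: answer string + document ID + time span in the audio
print(to_line(RankedAnswer("q001", 1, "Josep Borrell", "EPPS_EN_DOC_01", 123.40, 125.00)))
```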

Assessments

Four possible judgments: Correct / Incorrect / Inexact / Unsupported
- 'Manual transcriptions' tasks: manual assessment with the QASTLE interface
- 'Automatic transcriptions' tasks: automatic assessment (script) + manual check

Two metrics:
- Mean Reciprocal Rank (MRR): measures how well right answers are ranked on average
- Accuracy: fraction of correct answers ranked in the first position
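A minimal sketch of the two metrics, assuming each question comes with the judgments of its (up to 5) ranked answers. The data layout and function name are assumptions; this is not the QAST scoring script.

```python
def mrr_and_accuracy(judgments_per_question):
    """judgments_per_question: one list per question, ordered by answer rank, with
    values 'correct', 'incorrect', 'inexact', or 'unsupported'.
    Returns (MRR, accuracy) over all questions."""
    reciprocal_ranks = []
    first_correct = 0
    for judgments in judgments_per_question:
        rr = 0.0
        for rank, judgment in enumerate(judgments, start=1):
            if judgment == "correct":
                rr = 1.0 / rank          # reciprocal rank of the first correct answer
                if rank == 1:
                    first_correct += 1   # counts toward accuracy
                break
        reciprocal_ranks.append(rr)      # stays 0 if no correct answer was returned
    n = len(judgments_per_question)
    return sum(reciprocal_ranks) / n, first_correct / n

# Three questions: correct at rank 1, correct at rank 3, no correct answer
mrr, acc = mrr_and_accuracy([["correct"], ["incorrect", "inexact", "correct"], ["incorrect"]])
print(f"MRR = {mrr:.2f}, Accuracy = {acc:.2f}")  # MRR = 0.44, Accuracy = 0.33
```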

Participants

86 submissions from 4 participants, spread over the six tasks (T1a/T1b English, T2a/T2b Spanish, T3a/T3b French):
- LIMSI (France)
- UPC (Spain)
- TOK (Japan)
- INAOE (Mexico)

Best results for ASR transcriptions

[Table: best MRR and accuracy per transcription condition (manual, plus each ASR WER level: T1 10.6% / 14.0% / 24.1%, T2 11.5% / 12.7% / 13.7%, T3 11.9% / 23.9% / 35.4%), reported separately for (a) written questions and (b) oral questions; the individual scores are not recoverable from this transcript.]

Conclusion
- 4 participants (5 in 2007 and 2008)
- New methodology for creating "spontaneous" questions
- Loss in accuracy compared to 2008: an even harder evaluation, but closer to real applications
- QAST 2010?
  - Difficult task, but promising
  - Find more participants
  - Find new data (manual + ASR transcriptions)…