Evaluation Protocol and Tools for Question-Answering on Speech Transcripts
N. Moreau, O. Hamon, D. Mostefa (ELDA/ELRA)
LREC 2010, Malta, May 20, 2010

Presentation transcript:

Slide 1: Evaluation Protocol and Tools for Question-Answering on Speech Transcripts
N. Moreau, O. Hamon, D. Mostefa (ELDA/ELRA, Paris, France)
S. Rosset, O. Galibert, L. Lamel (LIMSI, Paris, France)
J. Turmo, P. R. Comas (UPC, Barcelona, Spain)
P. Rosso, D. Buscaldi (UPV, Valencia, Spain)

Slide 2: Outline
- What is QAST?
- QAST evaluations
- Evaluation data and tasks
- QASTLE evaluation interface
- Overview of main results
- Conclusions and perspectives

Slide 3: What is QAST?
- QAST stands for Question-Answering on Speech Transcripts
- Four QAST evaluation campaigns (2006, 2007, 2008, 2009)
- Organized by UPC, UPV, LIMSI and ELDA
- Goals:
  - Development of robust QA for speech
  - Measure the loss due to ASR inaccuracies
  - Measure the loss at different ASR word error rates
  - Measure the loss when using spontaneous oral questions (in 2009)

Slide 4: QAST evaluations

Year | Corpora                                               | Transcripts                         | Questions
2006 | CHIL (EN)                                             | Manual + 1 ASR output               | Written
2007 | CHIL (EN), AMI (EN)                                   | Manual + 1 ASR output + word graphs | Written
2008 | CHIL (EN), AMI (EN), ESTER (FR), EPPS (EN), EPPS (ES) | Manual + 3 ASR outputs              | Written
2009 | ESTER (FR), EPPS (EN), EPPS (ES)                      | Manual + 3 ASR outputs              | Written + oral

Slide 5: QAST Data Sets

Corpus | Language | Description          | Transcript | WER   | Campaigns
CHIL   | English  | 25 lectures (~25h)   | Manual     | -     | 2006, 2007, 2008
CHIL   | English  |                      | ASR        | 20%   | 2006, 2007, 2008
AMI    | English  | 168 meetings (~100h) | Manual     | -     | 2007, 2008
AMI    | English  |                      | ASR        | 38%   | 2007, 2008
ESTER  | French   | 18 BN shows (~10h)   | Manual     | -     | 2008, 2009
ESTER  | French   |                      | ASR        | 11.9% | 2008, 2009
ESTER  | French   |                      | ASR        | 23.9% | 2008, 2009
ESTER  | French   |                      | ASR        | 35.4% | 2008, 2009
EPPS   | English  | 6 sessions (~3h)     | Manual     | -     | 2008, 2009
EPPS   | English  |                      | ASR        | 10.6% | 2008, 2009
EPPS   | English  |                      | ASR        | 14.0% | 2008, 2009
EPPS   | English  |                      | ASR        | 24.1% | 2008, 2009
EPPS   | Spanish  | 6 sessions (~3h)     | Manual     | -     | 2008, 2009
EPPS   | Spanish  |                      | ASR        | 11.5% | 2008, 2009
EPPS   | Spanish  |                      | ASR        | 12.7% | 2008, 2009
EPPS   | Spanish  |                      | ASR        | 13.7% | 2008, 2009

Slide 6: Questions and Evaluation Tasks

Different evaluation tasks:
- QA on manual transcriptions
- QA on automatic transcriptions (ASR)
- QA using written questions
- QA using transcriptions of oral questions

Question sets created each year for each data set:
- 100 questions for training + 50 questions for testing
- Question types: factual and definitional
- New in 2009: spontaneous oral questions

Slide 7: Creation of oral questions
- People were presented with short text excerpts taken from the corpus
- After reading each excerpt, they had to ask a few 'spontaneous' questions
- The oral questions were recorded
- The oral questions were manually transcribed (including speech disfluencies)
- A canonical written version was created for each question

Example:
  Oral: "When did the bombing of Fallujah t() take euh took place?"
  Written: "When did the bombing of Fallujah take place?"

Slide 8: Submissions
- Up to 5 ranked answers per question
- Answers for 'manual transcriptions' tasks: Answer_string + Doc_ID
- Answers for 'automatic transcriptions' tasks: Answer_string + Doc_ID + Time_start + Time_end (the time slot of the answer)
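The slides do not give the actual file syntax of a QAST submission; the sketch below only illustrates the two answer shapes described above, using hypothetical Python names of our own choosing:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RankedAnswer:
    """One of up to 5 ranked answers for a question (hypothetical representation)."""
    rank: int                            # 1 (best) to 5
    answer_string: str                   # extracted answer text
    doc_id: str                          # document containing the answer
    time_start: Optional[float] = None   # seconds; required for ASR tasks only
    time_end: Optional[float] = None

def validate(answers: List[RankedAnswer], asr_task: bool) -> None:
    """Check the constraints stated on the slide."""
    assert len(answers) <= 5, "at most 5 ranked answers per question"
    if asr_task:
        for a in answers:
            # ASR tasks must locate each answer by its time slot
            assert a.time_start is not None and a.time_end is not None, \
                "ASR-task answers need a time slot"
```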

Slide 9: Assessments
- Four possible judgments: Correct / Incorrect / Inexact / Unsupported
- QA on manual transcriptions: manual assessment with the QASTLE interface
- QA on automatic (ASR) transcriptions: automatic assessment (script) + manual check with QASTLE
- Two metrics:
  - Mean Reciprocal Rank (MRR): measures how highly the correct answers are ranked, on average
  - Accuracy: the fraction of questions whose first-ranked answer is correct
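The slide does not spell out the formulas; the standard definitions, which match the descriptions above, are:

```latex
% rank_i = rank of the first correct answer for question i,
% with 1/rank_i taken as 0 when no correct answer is returned.
\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}
\qquad
\mathrm{Accuracy} = \frac{|\{\, i \mid \mathrm{rank}_i = 1 \,\}|}{|Q|}
```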

Slide 10: QASTLE interface
(screenshot of the QASTLE assessment interface)

Slide 11: Semi-automatic assessments
- An automatic script assesses QA on ASR transcriptions
- The script compares the time slot boundaries of:
  - the reference time slot (created beforehand)
  - the hypothesis time slot (submitted answer)
- The overlap is compared to a predefined threshold:
  - overlap > threshold => answer is CORRECT
  - overlap <= threshold (but non-zero) => answer is INEXACT
  - no overlap => answer is INCORRECT
- Second pass: manual check with QASTLE
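A minimal Python sketch of this decision rule (our own reconstruction; ELDA's actual script, its threshold value, and its exact overlap measure are not given on the slide):

```python
def assess(ref_start: float, ref_end: float,
           hyp_start: float, hyp_end: float,
           threshold: float = 0.5) -> str:
    """Classify a submitted time slot against the reference slot.

    Sketch of the rule on the slide. The overlap measure and the default
    threshold are assumptions: here, overlap is the length of the
    intersection relative to the length of the reference slot.
    """
    intersection = max(0.0, min(ref_end, hyp_end) - max(ref_start, hyp_start))
    if intersection == 0.0:
        return "INCORRECT"                  # no overlap at all
    overlap = intersection / (ref_end - ref_start)
    return "CORRECT" if overlap > threshold else "INEXACT"

# Examples against a 10-second reference slot [100.0, 110.0]:
print(assess(100.0, 110.0, 102.0, 109.0))   # CORRECT:   overlap = 7/10 = 0.70
print(assess(100.0, 110.0, 109.5, 112.0))   # INEXACT:   overlap = 0.5/10 = 0.05
print(assess(100.0, 110.0, 115.0, 118.0))   # INCORRECT: no overlap
```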

Slide 12: Best results (Accuracy %)

Corpus  | Transcript  | Written Q. | Oral Q.
CHIL    | Manual      |            |
CHIL    | ASR (20.0%) |            |
AMI     | Manual      |            |
AMI     | ASR (38.0%) |            |
ESTER   | Manual      |            |
ESTER   | ASR (11.9%) |            |
ESTER   | ASR (23.9%) |            |
ESTER   | ASR (35.4%) |            |
EPPS-EN | Manual      |            |
EPPS-EN | ASR (10.6%) |            |
EPPS-EN | ASR (14.0%) |            |
EPPS-EN | ASR (24.1%) |            |
EPPS-ES | Manual      |            |
EPPS-ES | ASR (11.5%) |            |
EPPS-ES | ASR (12.7%) |            |
EPPS-ES | ASR (13.7%) |            |

Slide 13: Conclusion & perspectives (1/2)
- We presented four evaluation campaigns of QA on speech data
- Evaluations were carried out for several languages and on different types of data (seminars, meetings, broadcast news, parliamentary speeches)
- A new methodology was introduced for the semi-automatic evaluation of QA on ASR transcriptions
- The QASTLE interface is free to download

Slide 14: Conclusion & perspectives (2/2)
- Future evaluation campaigns:
  - multilingual / cross-lingual QA
  - oral questions, with ASR transcription of the questions
- The QAST evaluation package will soon be available through the ELRA Catalogue of Language Resources

Slide 15: Thank you for your attention
QAST:
QASTLE:
ELRA Catalogue of Language Resources: