Slide 1: Evaluation Protocol and Tools for Question-Answering on Speech Transcripts
N. Moreau, O. Hamon, D. Mostefa (ELDA/ELRA, Paris, France); S. Rosset, O. Galibert, L. Lamel (LIMSI, Paris, France); J. Turmo, P. R. Comas (UPC, Barcelona, Spain); P. Rosso, D. Buscaldi (UPV, Valencia, Spain)
Contact: moreau@elda.org
LREC 2010, Malta, May 20, 2010
Slide 2: Outline
- What is QAST?
- QAST evaluations
- Evaluation data and tasks
- QASTLE evaluation interface
- Overview of main results
- Conclusions and perspectives
Slide 3: What is QAST?
- QAST stands for Question-Answering on Speech Transcripts.
- Four QAST evaluation campaigns were held (2006, 2007, 2008, 2009), organized by UPC, UPV, LIMSI and ELDA.
- Goals:
  - develop robust QA for speech;
  - measure the loss due to ASR inaccuracies;
  - measure the loss at different ASR word error rates;
  - measure the loss when using spontaneous oral questions (in 2009).
Slide 4: QAST Evaluations

Year | Corpora | Transcripts | Questions
2006 | CHIL (EN) | manual transcripts, 1 ASR output | written
2007 | CHIL (EN), AMI (EN) | manual transcripts, 1 ASR output + word graph | written
2008 | CHIL (EN), AMI (EN), ESTER (FR), EPPS (EN), EPPS (ES) | manual transcripts, 3 ASR outputs | written
2009 | ESTER (FR), EPPS (EN), EPPS (ES) | manual transcripts, 3 ASR outputs | written and oral
Slide 5: QAST Data Sets

Corpus | Language | Description | Transcripts | WER | Campaigns
CHIL | English | 25 lectures (~25h) | Manual | - | 2006, 2007, 2008
CHIL | English | | ASR | 20.0% | 2006, 2007, 2008
AMI | English | 168 meetings (~100h) | Manual | - | 2007, 2008
AMI | English | | ASR | 38.0% | 2007, 2008
ESTER | French | 18 broadcast news shows (~10h) | Manual | - | 2008, 2009
ESTER | French | | ASR | 11.9% | 2008, 2009
ESTER | French | | ASR | 23.9% | 2008, 2009
ESTER | French | | ASR | 35.4% | 2008, 2009
EPPS | English | 6 sessions (~3h) | Manual | - | 2008, 2009
EPPS | English | | ASR | 10.6% | 2008, 2009
EPPS | English | | ASR | 14.0% | 2008, 2009
EPPS | English | | ASR | 24.1% | 2008, 2009
EPPS | Spanish | 6 sessions (~3h) | Manual | - | 2008, 2009
EPPS | Spanish | | ASR | 11.5% | 2008, 2009
EPPS | Spanish | | ASR | 12.7% | 2008, 2009
EPPS | Spanish | | ASR | 13.7% | 2008, 2009
Slide 6: Questions and Evaluation Tasks
Different evaluation tasks:
- QA on manual transcriptions
- QA on automatic transcriptions (ASR)
- QA using written questions
- QA using transcriptions of oral questions
Question sets were created each year for each data set:
- 100 questions for training + 50 questions for the test
- question types: factual and definitional
- new in 2009: spontaneous oral questions
Slide 7: Creation of Oral Questions
- People were presented with short text excerpts taken from the corpus.
- After reading each excerpt, they had to ask a few 'spontaneous' questions.
- The oral questions were recorded and manually transcribed (including speech disfluencies).
- A canonical written version was created for each question.
Example:
- Oral: "When did the bombing of Fallujah t() take euh took place?"
- Written: "When did the bombing of Fallujah take place?"
Slide 8: Submissions
- Up to 5 ranked answers per question.
- Answers for 'manual transcriptions' tasks: Answer_string + Doc_ID
- Answers for 'automatic transcriptions' tasks: Answer_string + Doc_ID + Time_start + Time_end (the time slot of the answer)
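The slide gives the required answer fields but not a concrete submission syntax. As a minimal sketch (in Python, with hypothetical class and field names), one ranked answer could be represented like this:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SubmittedAnswer:
    """One ranked answer for a question (up to 5 per question).

    The time slot fields are only required for the 'automatic
    transcriptions' (ASR) tasks. All names here are illustrative
    assumptions; the official QAST submission syntax may differ.
    """
    rank: int                            # 1..5
    answer_string: str                   # Answer_string
    doc_id: str                          # Doc_ID
    time_start: Optional[float] = None   # Time_start (ASR tasks only)
    time_end: Optional[float] = None     # Time_end (ASR tasks only)

# Hypothetical example for an ASR task, with the answer's time slot:
answer = SubmittedAnswer(rank=1, answer_string="in Fallujah",
                         doc_id="EPPS_EN_20060615",
                         time_start=123.4, time_end=125.1)
```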
Slide 9: Assessments
Four possible judgments: Correct / Incorrect / Inexact / Unsupported.
- QA on manual transcriptions: manual assessment with the QASTLE interface.
- QA on automatic (ASR) transcriptions: automatic assessment (script) followed by a manual check with QASTLE.
Two metrics:
- Mean Reciprocal Rank (MRR): measures how well right answers are ranked on average.
- Accuracy: the fraction of questions whose first-ranked answer is correct.
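The slide names the two metrics without giving formulas; MRR is conventionally the mean over questions of 1/rank of the first correct answer (0 if none was returned). A minimal sketch in Python, assuming the judgments have already been collected into a per-question list ordered by rank (this input format is an assumption, not the official QAST result format):

```python
def mrr_and_accuracy(judgments):
    """Compute MRR and accuracy over a set of judged questions.

    `judgments` maps a question id to the ordered list of judgments
    for its (up to 5) ranked answers, e.g. ["incorrect", "correct"].
    """
    reciprocal_ranks = []
    correct_at_rank_one = 0
    for answers in judgments.values():
        rr = 0.0  # stays 0 if no correct answer was returned
        for rank, verdict in enumerate(answers, start=1):
            if verdict == "correct":
                rr = 1.0 / rank  # reciprocal rank of first correct answer
                break
        reciprocal_ranks.append(rr)
        if answers and answers[0] == "correct":
            correct_at_rank_one += 1
    n = len(judgments)
    return sum(reciprocal_ranks) / n, correct_at_rank_one / n

# Toy example with three questions:
judged = {
    "q1": ["correct", "incorrect"],                 # RR = 1
    "q2": ["incorrect", "correct", "inexact"],      # RR = 1/2
    "q3": ["inexact", "unsupported", "incorrect"],  # RR = 0
}
mrr, acc = mrr_and_accuracy(judged)
print(mrr, acc)  # 0.5 0.333...
```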
Slide 10: QASTLE Interface
(Screenshot of the QASTLE assessment interface.)
Slide 11: Semi-Automatic Assessments
An automatic script assesses QA on ASR transcriptions. The script compares the time slot boundaries of:
- the reference time slot (created beforehand), and
- the hypothesis time slot (the submitted answer).
The overlap between the two slots is compared to a predefined threshold:
- overlap > threshold: the answer is CORRECT
- 0 < overlap <= threshold: the answer is INEXACT
- no overlap: the answer is INCORRECT
A second pass consists of a manual check with QASTLE.
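As a rough illustration of the rule above, here is a sketch in Python. The slide does not specify the threshold value or whether the overlap is measured in absolute time or relative to the reference slot, so the relative measure and the 0.5 default below are assumptions:

```python
def assess_time_slot(ref_start, ref_end, hyp_start, hyp_end, threshold=0.5):
    """Judge a submitted answer by its time slot overlap with the reference.

    overlap > threshold      -> CORRECT
    0 < overlap <= threshold -> INEXACT
    no overlap               -> INCORRECT
    """
    overlap = min(ref_end, hyp_end) - max(ref_start, hyp_start)
    if overlap <= 0:
        return "INCORRECT"
    # Assumption: overlap measured as the fraction of the reference slot covered.
    ratio = overlap / (ref_end - ref_start)
    return "CORRECT" if ratio > threshold else "INEXACT"

# The hypothesis covers most of the reference slot -> CORRECT
print(assess_time_slot(10.0, 14.0, 10.5, 13.8))
# Only a marginal overlap -> INEXACT
print(assess_time_slot(10.0, 14.0, 13.9, 16.0))
```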
Slide 12: Best Results (Accuracy)

Corpus | Transcripts | 2007 | 2008 | 2009 (written Q.) | 2009 (oral Q.)
CHIL | Manual | 0.51 | 0.41 | - | -
CHIL | ASR (20.0%) | 0.36 | 0.31 | - | -
AMI | Manual | 0.25 | 0.33 | - | -
AMI | ASR (38.0%) | 0.21 | 0.18 | - | -
ESTER | Manual | - | 0.45 | 0.28 | 0.26
ESTER | ASR (11.9%) | - | 0.41 | 0.26 | 0.25
ESTER | ASR (23.9%) | - | 0.25 | 0.21 | 0.21
ESTER | ASR (35.4%) | - | 0.21 | 0.21 | 0.20
EPPS-EN | Manual | - | 0.34 | 0.36 | 0.36
EPPS-EN | ASR (10.6%) | - | 0.30 | 0.27 | 0.26
EPPS-EN | ASR (14.0%) | - | 0.20 | 0.25 | 0.25
EPPS-EN | ASR (24.1%) | - | 0.19 | 0.23 | 0.24
EPPS-ES | Manual | - | 0.31 | 0.28 | 0.28
EPPS-ES | ASR (11.5%) | - | 0.24 | 0.29 | 0.29
EPPS-ES | ASR (12.7%) | - | 0.20 | 0.27 | 0.25
EPPS-ES | ASR (13.7%) | - | 0.23 | 0.23 | 0.22
Slide 13: Conclusion & Perspectives (1/2)
- We presented a series of evaluation campaigns for QA on speech data.
- Evaluations were run for several languages and on different data types (seminars, meetings, broadcast news, parliament speeches).
- A new methodology was introduced for the semi-automatic evaluation of QA on ASR transcriptions.
- The QASTLE interface is free for download.
Slide 14: Conclusion & Perspectives (2/2)
Future evaluation campaigns:
- multilingual / cross-lingual QA;
- oral questions with ASR transcriptions of the questions.
The QAST 2007-2009 evaluation package will soon be available through the ELRA Catalogue of Language Resources.
Slide 15: Thank You for Your Attention
- QAST: http://www.lsi.upc.edu/~qast/2009
- QASTLE: http://elda.org/qastle/
- ELRA Catalogue of Language Resources: http://catalog.elra.info/