LREC 2008, Marrakech, May 29, 2008

Question Answering on Speech Transcriptions: the QAST Evaluation in CLEF

L. Lamel (1), S. Rosset (1), C. Ayache (2), D. Mostefa (2), J. Turmo (3) and P. Comas (3)
(1) LIMSI-CNRS, France (2) ELDA, France (3) UPC, Spain

QAST website: http://www.lsi.upc.edu/~qast/
Outline
1. Motivations
2. Objectives
3. QAST 2007: tasks, participants, results
4. QAST 2008
5. Conclusion
QAST Organization
The evaluation campaign is jointly organized by:
- UPC, Spain (J. Turmo, P. Comas), coordinator
- ELDA, France (N. Moreau, C. Ayache, D. Mostefa)
- LIMSI, France (S. Rosset, L. Lamel)
Motivations
Much of human interaction is via spoken language.
QA research has developed techniques for written texts with correct syntactic and semantic structures.
Spoken data is very different from textual data:
- speech phenomena: false starts, speech repairs, truncated words, etc.
- the grammatical structure of spontaneous speech is very particular
- no punctuation and no capitalization
- in meetings, interaction creates run-on sentences where the distance between the first part and the last can be very long
Objectives
In general, motivating and driving the design of novel and robust factual QA architectures for automatic speech transcriptions.
Comparing the performance of systems dealing with both types of transcriptions and both types of questions (factual and definitional).
Measuring the loss of each system due to ASR.
Measuring the loss of each system due to ASR output degradation (increasing WER).
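The "loss due to ASR" objective can be expressed as a simple relative drop between a system's score on manual transcripts and its score on the corresponding automatic transcripts. A minimal sketch, using illustrative accuracy figures (the function name is a hypothetical helper, not part of the campaign's tooling):

```python
def asr_loss(manual_acc: float, auto_acc: float) -> float:
    """Relative accuracy loss when moving from manual to automatic transcripts."""
    return (manual_acc - auto_acc) / manual_acc

# Illustrative figures: 0.51 accuracy on manual transcripts, 0.36 on ASR output
print(f"relative loss: {asr_loss(0.51, 0.36):.0%}")
```

The same formula applies to MRR, and comparing the loss across ASR outputs with different WERs shows how gracefully a system degrades.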
QAST 2007: Resources and tasks
Corpora:
- The CHIL corpus: 25 seminars of 1 hour each
  - spontaneous speech; English spoken by non-native speakers
  - domain of the lectures: speech and language processing
  - manual transcription by ELDA; automatic transcription by LIMSI
- The AMI corpus: 168 meetings (100 hours)
  - spontaneous speech; English
  - domain of the meetings: design of a television remote control
  - manual and automatic transcriptions by AMI
4 tasks:
- T1: QA in manual transcriptions of lectures
- T2: QA in automatic transcriptions of lectures
- T3: QA in manual transcriptions of meetings
- T4: QA in automatic transcriptions of meetings
QAST 2007: Development and evaluation sets
For each task, 2 sets of questions were provided:
- Development set: lectures: 10 seminars, 50 questions; meetings: 50 meetings, 50 questions
- Evaluation set: lectures: 15 seminars, 100 questions; meetings: 118 meetings, 100 questions
Factual questions only, no definition questions. Expected answers are named entities.
List of NE types: person, location, organization, language, system/method, measure, time, color, shape, material.
QAST 2007: Human judgment
Assessors used QASTLE, an evaluation tool developed by ELDA, to judge the submitted answers.
Scoring
Four possible judgments:
- Correct
- Incorrect
- Non-exact
- Unsupported
Two metrics were used:
- Mean Reciprocal Rank (MRR): measures how well ranked a right answer is.
- Accuracy: the fraction of questions with a correct answer ranked first in the list of 5 possible answers.
Participants could submit up to 2 submissions per task and 5 answers per question.
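The two metrics above can be sketched in a few lines. This is a minimal illustration, assuming each question is reduced to the 1-based rank of its first correct answer (or None when no answer in the 5-answer list was judged correct); the function names are hypothetical, not from the campaign's scoring tools:

```python
def mrr(ranks: list) -> float:
    """Mean Reciprocal Rank: ranks[i] is the 1-based rank of the first
    correct answer for question i, or None if no returned answer was correct."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

def accuracy(ranks: list) -> float:
    """Fraction of questions whose top-ranked (rank 1) answer is correct."""
    return sum(1 for r in ranks if r == 1) / len(ranks)

# Four questions: correct answers at ranks 1, 3, none, 2
ranks = [1, 3, None, 2]
print(mrr(ranks))       # (1 + 1/3 + 0 + 1/2) / 4
print(accuracy(ranks))  # 1/4
```

Accuracy only credits answers in first position, so it is always at most the MRR, which also rewards correct answers ranked 2 through 5.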
Participants
Five teams submitted results for one or more QAST tasks:
- CLT, Center for Language Technology, Australia
- DFKI, Germany
- LIMSI, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, France
- TOKYO, Tokyo Institute of Technology, Japan
- UPC, Universitat Politècnica de Catalunya, Spain
In total, 28 submission files were evaluated:

              CHIL corpus            AMI corpus
Task          T1         T2          T3         T4
Submissions   8          9           5          6
Results for CHIL lectures (T1 and T2)

          Manual            Automatic
System    MRR    Accuracy   MRR    Accuracy
S1        0.09   0.06       0.03   -
S2        0.09   0.05       0.02   -
S3        0.17   0.15       0.09   -
S4        0.37   0.32       0.23   0.20
S5        0.46   0.39       0.24   0.21
S6        0.19   0.14       0.12   0.08
S7        0.20   0.14       0.12   0.08
S8        0.53   0.51       0.37   0.36
S9        -      -          0.25   0.24
Results for AMI meetings (T3 and T4)

          Manual            Automatic
System    MRR    Accuracy   MRR    Accuracy
S1        0.23   0.16       0.10   0.06
S2        0.25   0.20       0.13   0.08
S3        0.28   0.25       0.19   0.18
S4        0.31   0.25       0.19   0.17
S5        0.26   0.25       0.22   0.21
S6        -      -          0.15   0.13
QAST 2008
Extension of QAST 2007:
- 3 languages: French, English, Spanish
- 4 domains: broadcast news, parliament speeches, lectures, meetings
- different levels of WER (10%, 20% and 30%)
- factual and definition questions
5 corpora:
- CHIL lectures
- AMI meetings
- TC-STAR05 EPPS English corpus
- TC-STAR05 EPPS Spanish corpus
- ESTER French broadcast news corpus
Evaluation from June 15 to June 30
QAST 2008 tasks
- T1a: QA in manual transcriptions of lectures (CHIL corpus)
- T1b: QA in automatic transcriptions of lectures (CHIL corpus)
- T2a: QA in manual transcriptions of meetings (AMI corpus)
- T2b: QA in automatic transcriptions of meetings (AMI corpus)
- T3a: QA in manual transcriptions of broadcast news for French (ESTER corpus)
- T3b: QA in automatic transcriptions of broadcast news for French (ESTER corpus)
- T4a: QA in manual transcriptions of European Parliament Plenary sessions in English (EPPS English corpus)
- T4b: QA in automatic transcriptions of European Parliament Plenary sessions in English (EPPS English corpus)
- T5a: QA in manual transcriptions of European Parliament Plenary sessions in Spanish (EPPS Spanish corpus)
- T5b: QA in automatic transcriptions of European Parliament Plenary sessions in Spanish (EPPS Spanish corpus)
QAST 2008 schedule
- 11 March 2008: development sets released
- 15 June 2008: evaluation set released
- 30 June 2008: submission deadline
- 30 July 2008: release of individual results
- 15 August 2008: paper submission deadline
- 17-19 September 2008: CLEF workshop in Aarhus
Conclusion and future work (1/2)
We presented the framework of the Question Answering on Speech Transcripts (QAST) evaluation campaigns.
QAST 2007:
- 5 participants from 5 different countries (France, Germany, Spain, Australia and Japan), 28 runs
- encouraging results
- high loss in accuracy on ASR output
Conclusion and future work (2/2)
QAST 2008 is an extension of QAST 2007 (3 languages, 4 domains, definition and factual questions, multiple ASR outputs with different WERs).
There is still time to join QAST 2008 (participation is free).
Future work aims at including:
- cross-lingual tasks,
- oral questions,
- other domains.
For more information
The QAST website: http://www.lsi.upc.edu/~qast/