LREC 2008, Marrakech, May 29, 2008. Question Answering on Speech Transcriptions: the QAST evaluation in CLEF. L. Lamel, S. Rosset, C. Ayache, D. Mostefa, J. Turmo and P. Comas.

Presentation transcript:

Question Answering on Speech Transcriptions: the QAST evaluation in CLEF
L. Lamel (1), S. Rosset (1), C. Ayache (2), D. Mostefa (2), J. Turmo (3) and P. Comas (3)
(1) LIMSI-CNRS, France (2) ELDA, France (3) UPC, Spain
QAST Website:

Outline
1. Motivations
2. Objectives
3. QAST 2007
   1. Tasks
   2. Participants
   3. Results
4. QAST 2008
5. Conclusion

QAST Organization
The evaluation campaign is jointly organized by:
- UPC, Spain (J. Turmo, P. Comas), coordinator
- ELDA, France (N. Moreau, C. Ayache, D. Mostefa)
- LIMSI, France (S. Rosset, L. Lamel)

Motivations
Much of human interaction is via spoken language.
QA research has developed techniques for written texts with correct syntactic and semantic structures.
Spoken data is very different from textual data:
- Speech phenomena: false starts, speech repairs, truncated words, etc.
- The grammatical structure of spontaneous speech is very particular.
- No punctuation and no capitalization.
- In meetings, the interaction creates run-on sentences in which the first part and the last part can be very far apart.

Objectives
In general: motivating and driving the design of novel and robust factual QA architectures for automatic speech transcriptions.
More specifically:
- Comparing the performance of systems dealing with both types of transcriptions (manual and automatic) and both types of questions (factual and definitional).
- Measuring the loss of each system when moving from manual transcriptions to ASR output.
- Measuring the loss of each system as the ASR output degrades.
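As an illustration of the last two points (our formulation, not one prescribed by the campaign), the ASR-induced loss of a system can be quantified as the relative accuracy drop: loss = (Accuracy_manual - Accuracy_ASR) / Accuracy_manual.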

QAST 2007: Resources and tasks
Corpora:
- The CHIL corpus: 25 seminars of 1 hour each
  - Spontaneous speech; English spoken by non-native speakers
  - Domain of the lectures: speech and language processing
  - Manual transcriptions done by ELDA
  - Automatic transcriptions provided by LIMSI
- The AMI corpus: 168 meetings (100 hours)
  - Spontaneous speech; English
  - Domain of the meetings: design of a television remote control
  - Manual transcriptions done by AMI
  - Automatic transcriptions provided by AMI
4 tasks:
- T1: QA in manual transcriptions of lectures
- T2: QA in automatic transcriptions of lectures
- T3: QA in manual transcriptions of meetings
- T4: QA in automatic transcriptions of meetings

QAST 2007: Development and evaluation sets
For each task, 2 sets of questions were provided.
Development set:
- Lectures: 10 seminars, 50 questions
- Meetings: 50 meetings, 50 questions
Evaluation set:
- Lectures: 15 seminars, 100 questions
- Meetings: 118 meetings, 100 questions
Factual questions only; no definition questions.
Expected answers = named entities. List of NE types: person, location, organization, language, system/method, measure, time, color, shape, material.
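For illustration, a hypothetical question in this style (not taken from the actual question sets) could be "Which organization provided the automatic transcriptions of the lectures?", expecting an answer of type organization.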

QAST 2007: Human judgment
Assessors used QASTLE, an evaluation tool developed by ELDA, to judge the submitted answers.

QAST 2007: Scoring
Four possible judgments:
- Correct
- Incorrect
- Non-exact
- Unsupported
Two metrics were used:
- Mean Reciprocal Rank (MRR): the average over all questions of the reciprocal of the rank of the first correct answer.
- Accuracy: the fraction of questions for which a correct answer is ranked first in the list of 5 possible answers.
Participants could submit up to 2 submissions per task and 5 answers per question.
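A minimal sketch of how these two metrics can be computed from the human judgments follows. This is not the official QASTLE scorer; the function name and data layout are our own illustrative assumptions.

```python
# Minimal sketch (not the official QASTLE scorer) of MRR and accuracy
# computation from human judgments. Each question contributes the
# judgments of its up-to-5 ranked answers, in rank order.

def score_run(judged_run):
    """judged_run: one list per question, giving the judgments
    ("correct", "incorrect", "non-exact" or "unsupported") of the
    ranked answers; element 0 is the rank-1 answer."""
    rr_sum = 0.0
    top1 = 0
    for judgments in judged_run:
        for rank, judgment in enumerate(judgments, start=1):
            if judgment == "correct":   # only "correct" counts as right
                rr_sum += 1.0 / rank    # reciprocal rank of first correct answer
                if rank == 1:
                    top1 += 1
                break                   # answers after the first correct one are ignored
    n = len(judged_run)
    return rr_sum / n, top1 / n         # (MRR, accuracy)

# Two hypothetical questions:
run = [["incorrect", "correct", "unsupported"],  # RR = 1/2, not counted for accuracy
       ["correct"]]                              # RR = 1, counted for accuracy
print(score_run(run))                            # (0.75, 0.5)
```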

Participants
Five teams submitted results for one or more QAST tasks:
- CLT, Center for Language Technology, Australia
- DFKI, Germany
- LIMSI, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, France
- TOKYO, Tokyo Institute of Technology, Japan
- UPC, Universitat Politècnica de Catalunya, Spain
In total, 28 submission files were evaluated:
- CHIL corpus: 8 submissions for T1, 9 submissions for T2
- AMI corpus: 5 submissions for T3, 6 submissions for T4

Results for CHIL lectures (T1 and T2)
[Table: MRR and Accuracy of each system on manual (T1) and automatic (T2) transcriptions; the numeric values were not preserved in this transcript.]

Results for AMI meetings (T3 and T4)
[Table: MRR and Accuracy of each system on manual (T3) and automatic (T4) transcriptions; the numeric values were not preserved in this transcript.]

QAST 2008
Extension of QAST 2007:
- 3 languages: French, English, Spanish
- 4 domains: broadcast news, parliament speeches, lectures, meetings
- Different levels of WER (10%, 20% and 30%)
- Factual and definition questions
5 corpora:
- CHIL lectures
- AMI meetings
- TC-STAR05 EPPS English corpus
- TC-STAR05 EPPS Spanish corpus
- ESTER French broadcast news corpus
Evaluation period: June 15 to June 30, 2008
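(For reference, WER is the word error rate: the number of substitutions, deletions and insertions made by the recognizer, divided by the number of words in the reference transcript.)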

QAST 2008 tasks
- T1a: Question Answering in manual transcriptions of lectures (CHIL corpus)
- T1b: Question Answering in automatic transcriptions of lectures (CHIL corpus)
- T2a: Question Answering in manual transcriptions of meetings (AMI corpus)
- T2b: Question Answering in automatic transcriptions of meetings (AMI corpus)
- T3a: Question Answering in manual transcriptions of broadcast news for French (ESTER corpus)
- T3b: Question Answering in automatic transcriptions of broadcast news for French (ESTER corpus)
- T4a: Question Answering in manual transcriptions of European Parliament plenary sessions in English (EPPS English corpus)
- T4b: Question Answering in automatic transcriptions of European Parliament plenary sessions in English (EPPS English corpus)
- T5a: Question Answering in manual transcriptions of European Parliament plenary sessions in Spanish (EPPS Spanish corpus)
- T5b: Question Answering in automatic transcriptions of European Parliament plenary sessions in Spanish (EPPS Spanish corpus)

QAST 2008 schedule
- 11 March 2008: development sets released
- 15 June 2008: evaluation sets released
- 30 June 2008: submission deadline
- 30 July 2008: release of individual results
- 15 August 2008: paper submission deadline
- September 2008: CLEF workshop in Aarhus

Conclusion and future work (1/2)
We presented the framework of the Question Answering on Speech Transcripts (QAST) evaluation campaigns.
QAST 2007:
- 5 participants from 5 different countries (France, Germany, Spain, Australia and Japan), for a total of 28 runs
- Encouraging results
- A high loss in accuracy on ASR output

Conclusion and future work (2/2)
QAST 2008 is an extension of QAST 2007 (3 languages, 4 domains, definition and factual questions, multiple ASR outputs with different WERs).
There is still time to join QAST 2008 (participation is free).
Future work aims at including:
- Cross-lingual tasks
- Oral questions
- Other domains

For more information
The QAST Website: