CLEF 2007 Workshop, Budapest, September 19, 2007 (© ELDA)
Overview of QAST 2007 - Question Answering on Speech Transcriptions
J. Turmo, P. Comas (1), C. Ayache, D. Mostefa (2), L. Lamel and S. Rosset (3)
(1) UPC, Spain  (2) ELDA, France  (3) LIMSI, France
QAST Website:

Outline
1. Task
2. Participants
3. Results
4. Conclusion and future work

Task: QAST 2007 Organization
The task was jointly organized by:
- UPC, Spain (J. Turmo, P. Comas), coordinator
- ELDA, France (C. Ayache, D. Mostefa)
- LIMSI-CNRS, France (S. Rosset, L. Lamel)

Task: Evaluation Protocol
Four tasks were proposed:
- T1: QA in manual transcriptions of lectures
- T2: QA in automatic transcriptions of lectures
- T3: QA in manual transcriptions of meetings
- T4: QA in automatic transcriptions of meetings
Two data collections were used:
- The CHIL corpus: around 25 hours of lectures (1 hour per lecture); lecture domain: speech and language processing
- The AMI corpus: around 100 hours of meetings (168 meetings); meeting domain: design of a television remote control
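To make the task matrix explicit, the sketch below encodes the four task configurations as data; the dictionary layout and field names are an illustration only, not an official format from the campaign.

```python
# Illustrative encoding of the four QAST 2007 tasks (assumed field names).
TASKS = {
    "T1": {"corpus": "CHIL (lectures)", "transcription": "manual"},
    "T2": {"corpus": "CHIL (lectures)", "transcription": "automatic (ASR)"},
    "T3": {"corpus": "AMI (meetings)",  "transcription": "manual"},
    "T4": {"corpus": "AMI (meetings)",  "transcription": "automatic (ASR)"},
}
```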

Task: Questions and answer types
For each task, two sets of questions were provided:
Development set (released 1 February 2007):
- Lectures: 10 lectures, 50 questions
- Meetings: 50 meetings, 50 questions
Evaluation set (released 18 June 2007):
- Lectures: 15 lectures, 100 questions
- Meetings: 118 meetings, 100 questions

Task: Questions and answer types
Factual questions only, e.g. "Who is a guru in speech recognition?"
Expected answers are named entities. List of NE types: person, location, organization, language, system/method, measure, time, color, shape, material.
No definition questions.
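For illustration, the example question above expects an answer of type person. Below is a minimal sketch of the answer-type inventory and of a question record; the field names are hypothetical and not the official question format.

```python
# The ten expected-answer (named entity) types listed on the slide.
ANSWER_TYPES = {
    "person", "location", "organization", "language",
    "system/method", "measure", "time", "color", "shape", "material",
}

# Illustrative factual question record (field names are assumptions).
question = {
    "id": "q001",
    "text": "Who is a guru in speech recognition?",
    "expected_answer_type": "person",  # answers are named entities
}
assert question["expected_answer_type"] in ANSWER_TYPES
```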

Task: Human judgment
Assessors used QASTLE, an evaluation tool developed in Perl by ELDA, to judge the submitted answers.
Four possible judgments:
- Correct
- Incorrect
- Inexact (too short or too long)
- Unsupported (correct answer but wrong supporting document)
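These four categories feed directly into the scoring on the next slide: only answers judged Correct count towards MRR and accuracy. A minimal sketch of that mapping follows; the enum and helper are illustrative and not part of QASTLE (which is a Perl tool).

```python
from enum import Enum

class Judgment(Enum):
    CORRECT = "correct"
    INCORRECT = "incorrect"
    INEXACT = "inexact"          # answer string too short or too long
    UNSUPPORTED = "unsupported"  # right answer, but wrong supporting document

def counts_for_scoring(judgment: Judgment) -> bool:
    """Only strictly correct, supported answers count towards MRR/accuracy."""
    return judgment is Judgment.CORRECT
```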

Task: Scoring
Two metrics were used:
- Mean Reciprocal Rank (MRR): measures how highly the first correct answer is ranked.
- Accuracy: the fraction of questions whose correct answer is ranked first in the list of 5 possible answers.
Participants could submit up to 2 submissions per task, with up to 5 ranked answers per question.
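As a worked example of how the two scores are computed, the sketch below derives MRR and accuracy from per-question judgments of ranked answer lists. The question IDs, judgments and function names are made up for illustration and are not part of the official scoring tools.

```python
# Each entry: ranked list of up to 5 answers; True marks an answer judged Correct.
runs = {
    "q001": [False, True, False, False, False],   # first correct answer at rank 2
    "q002": [True, False, False, False, False],   # correct answer at rank 1
    "q003": [False, False, False, False, False],  # no correct answer returned
}

def reciprocal_rank(judgments):
    """1/rank of the first correct answer, or 0.0 if none is correct."""
    for rank, is_correct in enumerate(judgments, start=1):
        if is_correct:
            return 1.0 / rank
    return 0.0

def mrr(all_runs):
    """Mean Reciprocal Rank over all evaluated questions."""
    return sum(reciprocal_rank(j) for j in all_runs.values()) / len(all_runs)

def accuracy(all_runs):
    """Fraction of questions whose top-ranked answer is correct."""
    return sum(1 for j in all_runs.values() if j and j[0]) / len(all_runs)

print(f"MRR = {mrr(runs):.3f}, Accuracy = {accuracy(runs):.3f}")
# MRR = (1/2 + 1/1 + 0) / 3 = 0.500; Accuracy = 1/3 = 0.333
```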

Participants
Five teams submitted results for one or more QAST tasks:
- CLT, Center for Language Technology, Australia
- DFKI, Germany
- LIMSI-CNRS, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, France
- Tokyo Institute of Technology, Japan
- UPC, Universitat Politècnica de Catalunya, Spain
In total, 28 submission files were evaluated:
- CHIL corpus (lectures): T1 (manual), 8 submissions; T2 (ASR), 9 submissions
- AMI corpus (meetings): T3 (manual), 5 submissions; T4 (ASR), 6 submissions

Results
Due to some problems (typos, wrong answer types, and missing word-level time information for some AMI meetings), some questions were removed from the test set for scoring.
Final counts:
- T1 and T2: 98 questions
- T3: 96 questions
- T4: 93 questions

Results for T1: QA on CHIL manual transcriptions
Reported per system: number of questions returned, number of correct answers, MRR, and accuracy.
Systems evaluated: clt1_t, clt2_t, dfki1_t, limsi1_t, limsi2_t, tokyo1_t, tokyo2_t, upc1_t

Results for T2: QA on CHIL automatic transcriptions
Reported per system: number of questions returned, number of correct answers, MRR, and accuracy.
Systems evaluated: clt1_t, clt2_t, dfki1_t, limsi1_t, limsi2_t, tokyo1_t, tokyo2_t, upc1_t, upc2_t

Results for T3: QA on AMI manual transcriptions
Reported per system: number of questions returned, number of correct answers, MRR, and accuracy.
Systems evaluated: clt1_t, clt2_t, limsi1_t, limsi2_t, upc1_t

Results for T4: QA on AMI automatic transcriptions
Reported per system: number of questions returned, number of correct answers, MRR, and accuracy.
Systems evaluated: clt1_t, clt2_t, limsi1_t, limsi2_t, upc1_t, upc2_t

Conclusion and future work
- 5 participants from 5 different countries (France, Germany, Spain, Australia and Japan) => 28 runs
- Very encouraging results
- QA technology can be useful for dealing with spontaneous speech transcripts
- High loss in accuracy on automatically transcribed speech

Conclusion and future work
Future work aims at including:
- Languages other than English
- Oral questions
- Other question types: definition, list, etc.
- Other domains for the data collections: European Parliament, broadcast news, etc.

For more information
The QAST Website: