An Evaluation Framework for Natural Language Understanding in Spoken Dialogue Systems
Joshua B. Gordon and Rebecca J. Passonneau, Columbia University
LREC, Malta, May 19-21, 2010

Outline
– Motivation: evaluating NLU during the design phase
– Comparative evaluation of two SDSs built with CMU's Olympus/RavenClaw framework
  – Let's Go Public! and CheckItOut
  – Differences in language and database characteristics
  – Varying WER for the two domains
– Two NLU approaches
– Conclusions

Motivation
For our SDS CheckItOut, we anticipated high WER:
– VOIP telephony
– Minimal speech engineering
  – Acoustic models trained on WSJ read speech
  – Adaptation with ~12 hours of spontaneous speech for certain types of utterances
  – 0.49 WER in recent tests
– Related experience: Let's Go Public! had 17% WER for native speakers under laboratory conditions, but 60% in real-world conditions

CheckItOut
Andrew Heiskell Braille & Talking Book Library
– Branch of the New York Public Library; part of the National Library Service
– One of the first users of the Kurzweil Reading Machine
Book transactions by phone
– Callers order cassettes, braille books, and large-type books by telephone
– Orders are sent and returned by U.S. mail
CheckItOut dialogue system
– Based on the Loqui Human-Human Corpus: 82 recorded patron/librarian calls, transcribed and aligned with the speech signal
– Replica of the Heiskell Library catalogue (N = 71,166)
– Mockup of patron data for 5,028 active patrons

ASR Challenges
– Speech phenomena: disfluencies, false starts, ...
– Intended users comprise a diverse population of accents, ages, and native languages
– Large vocabulary
– Variable telephony: users call from land lines, cell phones, and VOIP
– Background noise

The Olympus Architecture
[Slide: diagram of the Olympus architecture]

CheckItOut
– Callers order books by title, author, or catalogue number
– Size of catalogue: 70,000
– Vocabulary: 50K words
– Title/author overlap: 10% of the vocabulary, 15% of title words, 25% of author words

Natural Language Understanding
Utterance: DO YOU HAVE THE DIARY OF .A. ANY FRANK
– Dialogue act identification: book request by title vs. book request by author
– Concept identification: book-title-name, author-name
– Database query: partial match based on phonetic similarity
  – THE LANGUAGE OF .ISA. COME WARS → The Language of Sycamores

Comparative Evaluation
1. Load a corpus, or bootstrap one from representative examples, with labels for dialogue acts and concepts
2. Generate real ASR output (in the case of an audio corpus), or simulate ASR at various levels of WER
3. Pipe the ASR output through one or more NLU modules
4. Run voice search against the backend
5. Evaluate using F-measure
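To make the pipeline concrete, here is a minimal Python sketch of steps 3 to 5; nlu() and voice_search() are hypothetical stand-ins for the actual modules, and the corpus format is assumed:

```python
# A toy version of pipeline steps 3-5; nlu() and voice_search() are
# hypothetical placeholders, not the authors' code.
def f_measure(gold, predicted):
    """F1 over sets of (concept, value) pairs."""
    tp = len(gold & predicted)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def evaluate(corpus, nlu, voice_search):
    scores = []
    for example in corpus:
        concepts = nlu(example["asr"])       # step 3: NLU on the ASR output
        resolved = voice_search(concepts)    # step 4: match against the backend
        scores.append(f_measure(example["gold"], resolved))
    return sum(scores) / len(scores)         # step 5: mean F-measure
```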

Bootstrapping a Corpus
– Manually tag a small corpus into:
  – Concept strings, e.g., book titles
  – Preamble/postamble strings bracketing the concept
– Sort preambles/postambles into mutually substitutable sets
– Permute: (PREAMBLE) CONCEPT (POSTAMBLE), as sketched in the code below

Sample bootstrapping for book requests by title:

  Preamble                     Title string
  It's called                  T1, T2, T3, ..., TN
  I'm wondering if you have    T1, T2, T3, ..., TN
  Do you have                  T1, T2, T3, ..., TN
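The permutation step is simple enough to sketch directly; the preamble and title lists below are illustrative stand-ins for the mutually substitutable sets described above:

```python
from itertools import product

# Illustrative substitutable sets (stand-ins, not the actual corpus).
preambles = ["it's called", "i'm wondering if you have", "do you have"]
titles = ["the diary of a young girl", "the language of sycamores"]

def bootstrap(preambles, concepts, postambles=("",)):
    """Permute (PREAMBLE) CONCEPT (POSTAMBLE) into labeled utterances."""
    corpus = []
    for pre, concept, post in product(preambles, concepts, postambles):
        text = " ".join(part for part in (pre, concept, post) if part)
        corpus.append({"text": text,
                       "dialogue_act": "book-request-by-title",
                       "concept": concept})
    return corpus

for example in bootstrap(preambles, titles)[:3]:
    print(example["text"])
```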

Evaluation Corpora
Two corpora, with distinct language characteristics and distinct backend characteristics:
– Actual: Let's Go
– Bootstrapped: CheckItOut

  Total corpus   Mean utt. length   Vocab. size
  CheckItOut     … words            6,209
  Let's Go       … words            1,825

                          Grammar   Backend
  CheckItOut: Titles      4,000     70,000
  CheckItOut: Authors     2,315     30,000
  Let's Go: Bus Routes    70
  Let's Go: Place Names   1,300

ASR
Simulated ASR: NLU performance over varying WER
– Simulation procedure adapted from (Stuttle, 2004) and (Rieser, 2005)
– Four levels of WER for bootstrapped CheckItOut
– Two levels of WER based on the Let's Go transcriptions
Real ASR: two levels of WER based on the Let's Go audio corpus
– Piped through the PocketSphinx recognizer with the Let's Go acoustic and language models
– Noise introduced into the language model to increase WER
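The cited procedures use phonetically informed confusions; purely as an illustration of the idea, the sketch below corrupts a word string with random substitutions, deletions, and insertions at a target rate (a simplification, not the authors' procedure):

```python
import random

def corrupt(words, target_wer, vocabulary, rng=random.Random(0)):
    """Inject word errors at roughly the target WER (toy simulation)."""
    noisy = []
    for word in words:
        r = rng.random()
        if r < target_wer / 3:              # substitution
            noisy.append(rng.choice(vocabulary))
        elif r < 2 * target_wer / 3:        # deletion
            continue
        elif r < target_wer:                # insertion after the word
            noisy.extend([word, rng.choice(vocabulary)])
        else:                               # word survives intact
            noisy.append(word)
    return noisy

print(corrupt("do you have the diary of a young girl".split(),
              0.4, ["a", "any", "frank", "come", "wars"]))
```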

Semantic versus Statistical NLU
Semantic parsing
– Phoenix: a robust parser for noisy input
– Helios: a confidence annotator that combines information from the recognizer, the parser, and the dialogue manager
Supervised machine learning
– Dialogue acts: SVM classifier
– Concepts: a statistical tagger, YamCha, trained on a sliding five-word window of features

Phoenix
A robust semantic parser:
– Parses a string into a sequence of frames
– A frame is a set of slots
– Each slot type has its own CFG
– Can skip words (noise) between frames or between slots
Let's Go grammar: provided by CMU
CheckItOut grammar:
– Manual CFG rules for all but book titles
– CFG rules mapped from MICA parses for book titles
Example slots, or concepts:
  [AreaCode]   (Digit Digit Digit)
  [Confirm]    (yeah) (yes) (sure) ...
  [TitleName]  ([_in_phrase])
  [_in_phrase] ([_in] [_dt] [_nn]) ...
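Phoenix compiles a CFG per slot; as a toy illustration of the skip-noise-between-slots behavior only (regular expressions standing in for the per-slot grammars), one might write:

```python
import re

# Toy stand-ins for per-slot grammars; Phoenix uses compiled CFGs.
SLOTS = {
    "Confirm": re.compile(r"\b(yeah|yes|sure)\b"),
    "AreaCode": re.compile(r"\b\d \d \d\b"),
}

def parse(utterance):
    """Collect slot matches, ignoring (skipping) the words between them."""
    matches = []
    for slot, pattern in SLOTS.items():
        for m in pattern.finditer(utterance.lower()):
            matches.append((slot, m.group(0)))
    return matches

print(parse("um yes i would like uh 2 1 2 please"))
# [('Confirm', 'yes'), ('AreaCode', '2 1 2')]
```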

Using MICA Dependency Parses
– Parsed all book titles using MICA
– Automatically builds linguistically motivated constraints on constituent structure and word order into Phoenix productions

Frame: BookRequest
Slot:  [Title]
       [Title] ( [_in_phrase] )
Parse: ( Title [_in] ( IN ) [_dt] ( THE ) [_nn] ( COMPANY ) [_in] ( OF ) [_nns] ( HEROES ) )

Dialogue Act Classification
– Robust to noisy input
– Requires a training corpus, which is often unavailable for a new SDS domain; solution: bootstrap
– Sample features: acoustic confidence, bag-of-words, n-grams, LSA, length features, POS, TF/IDF
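A minimal sketch of this supervised route, with scikit-learn as a stand-in for the authors' SVM setup; it uses word n-gram features only, where the slide also lists acoustic confidence, LSA, length, POS, and TF/IDF, and the training examples are invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented bootstrapped examples; real training data would come from
# the permutation procedure sketched earlier.
train_texts = ["do you have the diary of a young girl",
               "i'm wondering if you have anything by anne frank",
               "yes that one please"]
train_acts = ["request-by-title", "request-by-author", "confirm"]

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_texts, train_acts)
print(clf.predict(["do you have the language of sycamores"]))
```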

Concept Recognition
– Concept identification cast as a named entity recognition problem
– YamCha: a statistical tagger based on SVMs
– YamCha labels each word in an utterance as beginning (B), falling within (I), or ending (E) the relevant concept, or as none (N):

  I  WOULD  LIKE  THE  DIARY  A   ANY  FRANK  ON  TAPE
  N  N      N     BT   IT     IT  IT   ET     N   N
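YamCha classifies each token from features over its neighbors; a sketch of the five-word window described above (the feature names are invented for illustration):

```python
def window_features(tokens, i, size=2):
    """Features for token i: the five-word window centered on it."""
    features = {}
    for offset in range(-size, size + 1):
        j = i + offset
        word = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        features[f"w[{offset:+d}]"] = word.lower()
    return features

tokens = "I WOULD LIKE THE DIARY A ANY FRANK ON TAPE".split()
print(window_features(tokens, 3))  # features for "THE", tagged BT above
```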

Voice Search
– A partial-matching database query operating at the phonetic level
– Search terms are scored by Ratcliff/Obershelp similarity:
    similarity = 2 × |matched characters| / |total characters in both strings|
  where matched characters are found by recursively taking the longest contiguous match of two or more characters

Query: "THE DIARY A ANY FRANK"
  Anne Frank, the Diary of a Young Girl   .73
  The Secret Diary of Anne Boleyn         .67
  Anne Frank                              .58
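Python's difflib.SequenceMatcher implements this Ratcliff/Obershelp ratio, so the scoring step can be sketched directly; note that this sketch works on characters rather than the phonetic representations the system uses, and difflib has no two-character minimum, so it will not reproduce the scores above:

```python
from difflib import SequenceMatcher

def score(query, title):
    """Ratcliff/Obershelp similarity between a query and a catalog title."""
    return SequenceMatcher(None, query.lower(), title.lower()).ratio()

query = "THE DIARY A ANY FRANK"
catalog = ["Anne Frank, the Diary of a Young Girl",
           "The Secret Diary of Anne Boleyn",
           "Anne Frank"]
for title in sorted(catalog, key=lambda t: score(query, t), reverse=True):
    print(f"{score(query, title):.2f}  {title}")
```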

Dialogue Act Identification (F-measure)
[Table: F-measure for the CFG and ML approaches at WER = 0.20, 0.40, 0.60, and 0.80, for Let's Go and CheckItOut; the cell values were lost in transcription]
– Difference between the semantic grammar and ML: small for Let's Go, large for CheckItOut
– Difference between Let's Go and CheckItOut: CheckItOut gains more from ML

Concept Identification (F-measure)
[Table: F-measure for the CFG and YamCha approaches at WER = 0.20, 0.40, 0.60, and 0.80, for the Title, Author, Place, and Bus concepts; the cell values were lost in transcription]
– Difference between the semantic grammar and the learned model: small for Let's Go, large for CheckItOut, and larger for Author than for Title
– As WER increases, the difference shrinks

Conclusions
– The short mean utterance length of Let's Go results in less difference between the NLU approaches
– The lengthier utterances and larger vocabulary of CheckItOut provide a diverse feature set that potentially enables recovery from higher WER
– The rapid decline in semantic parsing performance for dialogue act identification illustrates the difficulty of writing a robust grammar by hand
– The title CFG performed well and did not degrade as fast