Spoken Language Understanding for Conversational Dialog Systems

Spoken Language Understanding for Conversational Dialog Systems Michael McTear University of Ulster IEEE/ACL 2006 Workshop on Spoken Language Technology Aruba, December 10-13, 2006

Overview Introductory definitions Task-based and conversational dialog systems Spoken language understanding Issues for spoken language understanding Coverage Robustness Overview of spoken language understanding Hand-crafted approaches Data-driven methods Conclusions

Basic dialog system architecture (diagram): Audio → Speech Recognition (HMM acoustic model, N-gram language model) → Words → Spoken Language Understanding → Semantic representation (Concepts) → Dialogue Manager ↔ Back end; Dialogue Manager → Language Generation → Text-to-Speech Synthesis → Audio

Task-based Dialog Systems Mainly interact with databases to get information or support transactions SLU module creates a database query from user’s spoken input by extracting relevant concepts System initiative: constrains user input Keyword / keyphrase extraction User-initiative: less constrained input Call-routing: call classification with named entity extraction Question answering

Conversational Dialog AI (agent-based systems) e.g. TRIPS User can take initiative, e.g. raise new topic, ask for clarification (TRIPS) More complex interactions involving recognition of the user’s intentions, goals, beliefs or plans Deep understanding of the user’s utterance, taking into account contextual information Information State Theory, Planning Theory, User Modelling, Belief Modelling… Simulated conversation e.g. CONVERSE Conversational companions, chatbots, help desk Does not require deep understanding SLU involves identifying system utterance type and determining a suitable response

Defining Spoken Language Understanding Extracting the meaning from speech utterances: a transduction of the recognition result to an interpretable representation. Meaning (in human–computer interactive systems): a representation that can be executed by an interpreter in order to change the state of the system (Bangalore et al. 2006)

SLU for task-based systems a flight from Belfast to Malaga uh I’d like uh um could you uh is there a flight from Bel- uh Belfast to um Gran- I mean Malaga I would like to find a flight from Pittsburgh to Boston on Wednesday and I have to be in Boston by one so I would like a flight out of here no later than 11 a.m. Topic: Flight Origin: BFS Destination: AGP
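
The mapping from such utterances to the Topic/Origin/Destination frame can be sketched as simple concept spotting. This is a minimal illustration only, assuming a hypothetical city-to-airport-code table (BFS and AGP are the codes shown on the slide); real task-based systems use richer grammars or statistical models:

```python
import re

# Hypothetical lookup table for this sketch only.
CITY_CODES = {"belfast": "BFS", "malaga": "AGP",
              "pittsburgh": "PIT", "boston": "BOS"}

def extract_flight_frame(utterance):
    """Fill a flat semantic frame by spotting 'from X' / 'to Y' patterns."""
    frame = {"topic": "flight", "origin": None, "destination": None}
    text = utterance.lower()
    for role, pattern in (("origin", r"from\s+(\w+)"),
                          ("destination", r"to\s+(\w+)")):
        for m in re.finditer(pattern, text):
            code = CITY_CODES.get(m.group(1))
            if code:  # skip false hits, e.g. 'to find', truncated 'Bel-'
                frame[role] = code
                break
    return frame
```

Note that on the second sentence the scan correctly skips "to find" and settles on "to Boston"; disfluent input such as "from Bel- uh Belfast" would need normalisation before such patterns apply, which is exactly the robustness issue raised later.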

SLU for advanced conversational systems (TRIPS) Interpretation requires intention recognition: can we use a helicopter to get the people from Abyss? (request to modify plan) … from Barnacle? (include sub-goal and suggest solution) … from Delta? (extension of a solution) Six possible interpretations with only a change of city name Requires reasoning about task and current context to identify the most plausible interpretation Requires more than concept spotting to identify the structure and meaning of the utterance as a basis for reasoning

Role of syntax in deep understanding List all employees of the companies who/which are based in the city centre employees based in city centre companies based in city centre I would like to know where to mail the check. I would like you to mail the check to me I mail check to ?where You mail check to me Stop word removal will transform both strings to “mail check”
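
The "mail check" collapse above is easy to demonstrate. A toy illustration, with a stop-word list hand-picked for these two sentences:

```python
# Toy stop-word list chosen for this example only.
STOP_WORDS = {"i", "would", "like", "to", "know", "where", "the", "you", "me"}

def content_words(sentence):
    """Strip stop words, keeping only 'content' words."""
    tokens = sentence.lower().replace(".", "").split()
    return [w for w in tokens if w not in STOP_WORDS]

# The two requests mean different things but reduce to the same string:
a = content_words("I would like to know where to mail the check.")
b = content_words("I would like you to mail the check to me")
```

Both `a` and `b` come out as `["mail", "check"]`, even though one asks for an address and the other asks the hearer to act — the distinction carried by the syntax is gone.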

SLU for simulated dialog C: Are you still a friend of XXX? H: I am not sure if he has any real friends. He has achieved a lot and has left a lot of people behind. C: You really don’t like the guy - well, at least that’s definite - I thought most people were just indifferent or numb by now. ?

Coverage Possible requirement: “The system should be able to understand everything the user might say” Predicting user input Analysis of corpora and iterative design of hand-crafted grammars Use of carefully designed prompts so that user input is constrained Learning grammars from data

Robustness Characteristics of spontaneous spoken language Disfluencies and filled pauses – not just errors, reflect cognitive aspects of speech production and interaction management Output from speech recognition component Words and word boundaries not known with certainty Recognition errors Approaches Use of semantic grammars and robust parsing for concept spotting Data-driven approaches – learn mappings between input strings and output structures

Developing the SLU component Hand-crafted approaches Grammar development Parsing Data-driven approaches Learning from data Statistical models rather than grammars Efficient decoding

Hand-crafting grammars (pipeline: ASR n-best list / word lattice → Parsing → parse tree → Frame Generation → semantic frame → Discourse Processing → frame in context → DB Query → SQL query) Traditional software engineering approach of design and iterative refinement Decisions about type of grammar required Chomsky hierarchy Flat vs hierarchical representations Processing issues (parsing) Dealing with ambiguity Efficiency

Semantic Grammar and Robust Parsing: PHOENIX (CMU/CU) The Phoenix parser maps input word strings onto a sequence of semantic frames. A frame is a named set of slots, where the slots represent related pieces of information. Each slot has an associated context-free grammar that specifies the word string patterns that match the slot. Chart parsing with path pruning: e.g. a path that accounts for fewer words is pruned (ASR → word string → Semantic Parser → meaning representation)
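
The per-slot matching idea can be sketched as follows. The slot names and word patterns here are invented stand-ins for Phoenix's per-slot CFGs, and the left-to-right scan with word skipping is a much-simplified stand-in for its chart parsing:

```python
# Illustrative per-slot patterns; slot names and patterns are invented.
SLOT_PATTERNS = {
    "Depart_Loc": [("from", "belfast"), ("from", "pittsburgh")],
    "Arrive_Loc": [("to", "malaga"), ("to", "boston")],
}

def match_at(words, i):
    """Return (slot, length) for a pattern matching at position i, else (None, 1)."""
    for slot, patterns in SLOT_PATTERNS.items():
        for pat in patterns:
            if tuple(words[i:i + len(pat)]) == pat:
                return slot, len(pat)
    return None, 1

def parse_slots(utterance):
    """Scan left to right, emitting matched slots and skipping other words."""
    words = utterance.lower().split()
    slots, i = [], 0
    while i < len(words):
        slot, n = match_at(words, i)
        if slot:
            slots.append(slot)
        i += n
    return slots
```

The skipping of unmatched words ("uh is there a …") is what gives this style of parser its robustness to fillers and fragments.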

Deriving Meaning directly from ASR output: VoiceXML Uses finite state grammars as language models for recognition and semantic tags in the grammars for semantic parsing (ASR → meaning representation) I would like a coca cola and three large pizzas with pepperoni and mushrooms { drink: "coke", pizza: { number: "3", size: "large", topping: [ "pepperoni", "mushrooms" ] } }

Deep understanding Requirements for deep understanding advanced grammatical formalisms syntax–semantics issues parsing technologies Example: TRIPS Uses feature-based augmented CFG with agenda-driven best-first chart parser Combined strategy: combining shallow and deep parsing (Swift et al.)

Combined strategies: TINA (MIT) Grammar rules include mix of syntactic and semantic categories Context free grammar using probabilities trained from user utterances to estimate likelihood of a parse Parse tree converted to a semantic frame that encapsulates the meaning Robust parsing strategy Sentences that fail to parse are parsed using fragments that are combined into a full semantic frame When all things fail, word spotting is used

Problems with hand-crafted approaches Hand-crafted grammars are not robust to spoken language input require linguistic and engineering expertise to develop if grammar is to have good coverage and optimised performance time consuming to develop error prone subject to designer bias difficult to maintain

Statistical modelling for SLU SLU as pattern matching problem Given word sequence W, find semantic representation of meaning M that has maximum a posteriori probability P(M|W) P(M): semantic prior model – assigns probability to underlying semantic structure P(W|M): lexicalisation model – assigns probability to word sequence W given the semantic structure
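
The decomposition on this slide is Bayes' rule applied to the maximum a posteriori decoding problem, with the denominator dropped because it does not depend on M:

```latex
\hat{M} = \arg\max_{M} P(M \mid W)
        = \arg\max_{M} \frac{P(W \mid M)\,P(M)}{P(W)}
        = \arg\max_{M} \underbrace{P(W \mid M)}_{\text{lexicalisation model}}\;\underbrace{P(M)}_{\text{semantic prior}}
```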

Early Examples CHRONUS (AT&T: Pieraccini et al, 1992; Levin & Pieraccini, 1995) Finite state semantic tagger ‘Flat-concept’ model: simple to train but does not represent hierarchical structure HUM (Hidden Understanding Model) (BBN: Miller et al, 1995) Probabilistic CFG using tree structured meaning representations Grammatical constraints represented in networks rather than rules Ordering of constituents unconstrained - increases robustness Transition probabilities constrain over-generation Requires fully annotated treebank data for training
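
A flat-concept model in the CHRONUS style can be sketched as a small HMM whose states are semantic concepts and whose emissions are words, decoded with Viterbi search. All states, words, and probabilities below are invented for illustration:

```python
import math

# Toy flat-concept HMM: states are concepts, words are emissions.
STATES = ["DUMMY", "FROMLOC", "TOLOC"]
TRANS = {  # P(next state | state); "<s>" is the start symbol
    "<s>":     {"DUMMY": 0.8, "FROMLOC": 0.1, "TOLOC": 0.1},
    "DUMMY":   {"DUMMY": 0.4, "FROMLOC": 0.3, "TOLOC": 0.3},
    "FROMLOC": {"FROMLOC": 0.5, "TOLOC": 0.3, "DUMMY": 0.2},
    "TOLOC":   {"TOLOC": 0.5, "FROMLOC": 0.2, "DUMMY": 0.3},
}
EMIT = {  # P(word | state); unseen words get a small floor probability
    "DUMMY":   {"a": 0.5, "flight": 0.5},
    "FROMLOC": {"from": 0.5, "belfast": 0.5},
    "TOLOC":   {"to": 0.5, "malaga": 0.5},
}
FLOOR = 1e-6

def decode(words):
    """Viterbi search for argmax_M P(W|M)P(M) under the toy model."""
    paths = {"<s>": (0.0, [])}  # state -> (log prob, concept sequence)
    for w in words:
        new = {}
        for s in STATES:
            e = math.log(EMIT[s].get(w, FLOOR))
            score, hist = max(
                (lp + math.log(TRANS[prev][s]) + e, seq)
                for prev, (lp, seq) in paths.items()
            )
            new[s] = (score, hist + [s])
        paths = new
    return max(paths.values())[1]
```

Because the state sequence is flat, each word is simply tagged with one concept; this is what makes the model simple to train but unable to represent hierarchical structure.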

Using Hidden State Vectors (He & Young) Extends ‘flat-concept’ HMM model Represents hierarchical structure (right-branching) using hidden state vectors Each state expanded to encode stack of a push down automaton Avoids computational tractability issues associated with hierarchical HMMs Can be trained using lightly annotated data Comparison with FST model and with hand-crafted SLU systems using ATIS test sets and reference parse results

Which flights arrive in Burbank from Denver on Saturday? Problem with long-distance dependency between ‘Saturday’ and ‘arrive’ In the flat-concept model ‘Saturday’ is associated with ‘FROMLOC’ The hierarchical model allows ‘Saturday’ to be associated with ‘ARRIVE’ Also: more expressive, allows sharing of sub-structures
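
The difference can be pictured with the stack-like state vectors of the HVS model. The concept names below are simplified for illustration; the real model derives such vectors from lightly annotated data:

```python
# Illustrative right-branching stack states for
# "Which flights arrive in Burbank from Denver on Saturday?"
hvs_states = [
    ("FLIGHT",),
    ("FLIGHT", "ARRIVE"),
    ("FLIGHT", "ARRIVE", "TOLOC"),    # "in Burbank"
    ("FLIGHT", "ARRIVE", "FROMLOC"),  # "from Denver": pop TOLOC, push FROMLOC
    ("FLIGHT", "ARRIVE", "DATE"),     # "on Saturday": still nested under ARRIVE
]

def parent_concept(stack):
    """The concept a stack-top attaches to is the element beneath it."""
    return stack[-2] if len(stack) > 1 else None
```

Because ARRIVE stays on the stack while TOLOC and FROMLOC come and go, DATE ends up attached to ARRIVE rather than to the immediately preceding FROMLOC, capturing the long-distance dependency a flat tagger misses.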

SLU Evaluation: Performance Statistical models competitive with approaches based on handcrafted rules Hand-crafted grammars better for full understanding and for users familiar with system’s coverage, statistical model better for shallow and more robust understanding for naïve users Statistical systems more robust to noise and more portable

SLU Evaluation: Software Development “Cost of producing training data should be less than cost of hand-crafting a semantic grammar” (Young, 2002) Issues Availability of training data Maintainability Portability Objective metrics? e.g. time, resources, lines of code, … Subjective issues e.g. designer bias, designer control over system Few concrete results, except … HVS model (He & Young) can be robustly trained from only minimally annotated corpus data Model is robust to noise and portable to other domains

Additional technologies Named entity extraction Rule-based methods: e.g. using grammars in form of regular expressions compiled into finite state acceptors (AT&T SLU system) – higher precision Statistical methods e.g. HMIHY, learn mappings between strings and NEs – higher recall as more robust Call routing Question Answering
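
Rule-based named entity extraction of the kind described above can be sketched with regular expressions standing in for grammars compiled into finite state acceptors. The patterns below are invented for illustration, not the actual AT&T rules:

```python
import re

# Illustrative named-entity patterns; high precision, low recall.
NE_PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[- ]\d{3}[- ]\d{4}\b"),
    "TIME":  re.compile(r"\b(?:1[0-2]|[1-9])\s*[ap]\.?m\.?"),
}

def extract_entities(text):
    """Return (label, surface string) pairs for every pattern match."""
    text = text.lower()
    return [(label, m.group(0))
            for label, pat in NE_PATTERNS.items()
            for m in pat.finditer(text)]
```

Such rules yield high precision but miss anything outside their patterns, which is the recall gap that statistical methods like HMIHY's learned string-to-NE mappings close.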

Additional Issues 1 ASR/SLU coupling Post-processing results from ASR noisy channel model of ASR errors (Ringger & Allen) Combining shallow and deep parsing major gains in speed, slight gains in accuracy (Swift et al.) Use of context, discourse history, prosodic information re-ordering n-best hypotheses determining dialog act based on combinations of features at various levels: ASR and parse probabilities, semantic and contextual features (Purver et al, Lemon)

Additional Issues 2 Methods for learning from sparse data or without annotation e.g. AT&T system uses ‘active learning’ (Tur et al, 2005) to reduce effort of human data labelling – uses only those data items that improve classifier performance the most Development tools e.g. SGStudio (Wang & Acero) – build semantic grammar with little linguistic knowledge

Additional Issues 3 Some issues addressed in poster session Using SLU for: Dialog act tagging Prosody labelling User satisfaction analysis Topic segmentation and labelling Emotion prediction

Conclusions 1 SLU approach is determined by type of application finite state dialog with single word recognition frame based dialog with topic classification and named entity extraction advanced dialog requiring deep understanding simulated conversation, …

Conclusions 2 SLU approach is determined by type of output required syntactic / semantic parse trees semantic frames speech / dialog acts, … intentions, beliefs, emotions, …

Conclusions 3 SLU approach is determined by Deployment and usability issues applications requiring accurate extraction of information applications involving complex processing of content applications involving shallow processing of content (e.g. conversational companions, interactive games)

Selected References Bangalore, S., Hakkani-Tür, D., Tur, G. (eds) (2006) Special Issue on Spoken Language Understanding in Conversational Systems. Speech Communication 48. Gupta, N., Tur, G., Hakkani-Tür, D., Bangalore, S., Riccardi, G., Gilbert, M. (2006) The AT&T Spoken Language Understanding System. IEEE Transactions on Speech and Audio Processing 14:1, 213-222. Allen, J.F., Byron, D.K., Dzikovska, M.O., Ferguson, G., Galescu, L., Stent, A. (2001) Towards conversational human-computer interaction. AI Magazine 22:4, 27-35. Jurafsky, D., Martin, J. (2000) Speech and Language Processing. Prentice-Hall. Huang, X., Acero, A., Hon, H.-W. (2001) Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Prentice-Hall.