Spoken Dialogue Systems Prof. Alexandros Potamianos Dept. of Electrical & Computer Engineering Technical University of Crete, Greece May 2003.


Outline
- Discourse
  - Research issues
- Spoken dialogue systems
  - Pragmatics (dialogue acts)
  - Dialogue management
- Multimodal systems
- Examples

Definitions
- Discourse
  - Monologue
  - Dialogue

Discourse: Research Issues
- Reference resolution, e.g., "That was a lie"
  - Anaphora, e.g., "John left ... He was bored."
  - Co-reference, e.g., "John" and "He" refer to the same entity
- Text coherence, e.g.,
  - Coherent: "John left early. He was tired."
  - Incoherent: "John left early. He likes spinach."

Spoken Dialogue Systems: Concepts
- Turn-taking
  - Dialogue segmentation
- Grounding
  - Backchannel, e.g., "Mm-hmm"
  - Acknowledgment
  - Explicit/implicit confirmation
- Implicature
  - "What time are you flying?"
  - "Well, I have a meeting at three."
- Initiative
  - "What time are you flying?"
  - "I don't feel like booking the flight right now. Let's look at hotels."

Speech, Dialogue and Application Acts
- Speech acts (Austin 1962, Searle 1975)
  - Assertive (conclude), Directive (ask, order), Commissive (promise), Expressive (apologize, thank), Declarations
- Dialogue acts
  - Statement, Info-Request, Wh-Question, Yes-No-Question, Opening, Closing, Open-Option, Action-Directive, Offer, Commit, Agree, etc.
- Application acts
  - Domain-specific but general, e.g., Info-Request into the system's semantic state, Info-Request into the database, Info-Request into the database results

Dialogue/Application Act Classification
- Semantic parsing followed by deterministic rules, e.g., "what", "when", "where", "who" starts a Wh-Question
- Bayesian formulation
  - Given a sentence W, the most probable dialogue act is A* = argmax_A P(A|W) = argmax_A P(W|A) P(A)
  - P(W|A) can be an n-gram model, one for each dialogue act
  - P(A) can also be an n-gram model of dialogue acts
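The Bayesian formulation above can be sketched as a toy classifier. For brevity this uses unigram (bag-of-words) likelihoods with add-one smoothing instead of full n-gram models, and the tiny training set and act labels are invented:

```python
import math
from collections import defaultdict

# Invented toy training pairs of (dialogue act, tokenized sentence).
TRAIN = [
    ("wh-question", "what time are you flying".split()),
    ("wh-question", "where are you flying to".split()),
    ("statement", "i am flying next tuesday".split()),
    ("statement", "i would like a morning flight".split()),
]

def train_models(data):
    """Estimate act priors P(A) and per-act word counts for P(W|A)."""
    word_counts = defaultdict(lambda: defaultdict(int))
    act_counts = defaultdict(int)
    vocab = set()
    for act, words in data:
        act_counts[act] += 1
        for w in words:
            word_counts[act][w] += 1
            vocab.add(w)
    return word_counts, act_counts, vocab

def classify(words, word_counts, act_counts, vocab):
    """Return argmax_A log P(W|A) + log P(A), with add-one smoothing."""
    total_acts = sum(act_counts.values())
    best_act, best_score = None, float("-inf")
    for act, n in act_counts.items():
        score = math.log(n / total_acts)  # log P(A)
        act_total = sum(word_counts[act].values())
        for w in words:
            score += math.log((word_counts[act][w] + 1)
                              / (act_total + len(vocab) + 1))
        if score > best_score:
            best_act, best_score = act, score
    return best_act

wc, ac, vocab = train_models(TRAIN)
print(classify("what time is the flight".split(), wc, ac, vocab))  # → wh-question
```

Replacing the unigram likelihood with per-act n-gram models, as the slide suggests, only changes how the per-word term is computed.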

Dialogue Management 1
- Frame-based, e.g.,
  - DeptCity: "From what city are you leaving?" (GRM_CITY)
  - ArrCity: "Where are you flying to?" (GRM_CITY)
  - DeptTime: "What time would you like to fly?" (GRM_TIME)
  - DeptDate: "When are you flying?" (GRM_DATETIME)
- Finite-state-machine dialogue manager
- Mostly system-initiated dialogue
- VXML-like dialogue structure (forms and frames)
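The frame-based, system-initiated flow above can be sketched as a minimal slot-filling loop. The slot names, prompts, and grammar labels come from the slide; `get_user_value` is a hypothetical stand-in for ASR plus grammar-constrained parsing:

```python
# Each slot pairs a prompt with the grammar used to parse the reply.
SLOTS = [
    ("DeptCity", "From what city are you leaving?", "GRM_CITY"),
    ("ArrCity", "Where are you flying to?", "GRM_CITY"),
    ("DeptTime", "What time would you like to fly?", "GRM_TIME"),
    ("DeptDate", "When are you flying?", "GRM_DATETIME"),
]

def run_dialogue(frame, get_user_value):
    """System-initiated loop: prompt for each unfilled slot in order."""
    for name, prompt, grammar in SLOTS:
        while frame.get(name) is None:
            # get_user_value stands in for recognition + parsing with `grammar`
            frame[name] = get_user_value(prompt, grammar)
    return frame

# A scripted "user" for illustration.
answers = iter(["Boston", "Newark", "morning", "August fifth"])
frame = run_dialogue({}, lambda prompt, grammar: next(answers))
print(frame["ArrCity"])  # → Newark
```

A VXML form interpretation loop behaves much like this: visit the first unfilled field, prompt, and collect against that field's grammar.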

Dialogue Management 2
- Application-independent flow-chart structure
- Generic dialogue/application manager (really this is more like a controller)

Dialogue Management 3
- Generalized finite-state-machine dialogue management
- Application-dependent but general dialogue superstates
  - Fill: adaptive dialogue module; uses dynamic e-forms to elicit attribute-value pairs from the user; resolves value and tree-position ambiguities
  - Navigate: presents database results and lets the user select the appropriate ones
[Diagram: Fill -> "Is Full?" -> Verify -> "Is Correct?" -> Create Query -> Navigate, with "No" branches looping back to Fill]
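The superstate cycle on this slide (Fill until the form is full, Verify until correct, then Create Query and Navigate) can be sketched as a small state loop. The control structure follows the slide's diagram; the function names and toy predicates are illustrative:

```python
def run_superstates(fill, is_full, verify, is_correct, create_query, navigate):
    """Drive the Fill -> Verify -> CreateQuery -> Navigate superstate cycle."""
    state = "Fill"
    trace = []
    while state != "Done":
        trace.append(state)
        if state == "Fill":
            fill()                                   # elicit AV pairs via e-forms
            state = "Verify" if is_full() else "Fill"
        elif state == "Verify":
            verify()                                 # confirm values with the user
            state = "CreateQuery" if is_correct() else "Fill"
        elif state == "CreateQuery":
            create_query()                           # build the database query
            state = "Navigate"
        elif state == "Navigate":
            navigate()                               # present results for selection
            state = "Done"
    return trace

# Toy run: the form becomes full after two Fill turns, verification succeeds.
calls = {"fill": 0}
def fill(): calls["fill"] += 1
trace = run_superstates(fill, lambda: calls["fill"] >= 2,
                        lambda: None, lambda: True,
                        lambda: None, lambda: None)
print(trace)  # → ['Fill', 'Fill', 'Verify', 'CreateQuery', 'Navigate']
```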

Advanced Dialogue Systems
- Mixed initiative:
  - Allow the user to say anything (global grammar active in all states), e.g., "What date are you flying?" / "I am flying next Tuesday in the morning"
  - Allow the user to navigate the system's state machine, e.g., "I would like to look at hotels first"
  - Open prompts give the user the initiative, e.g., "What next?"
- Advanced dialogue features
  - Corrections, e.g., "No, not Boston, Atlanta"
  - Negation, e.g., "Anything but Olympic"
  - Complex semantic expressions, e.g., "tomorrow evening or Sunday morning"
  - Ambiguity representation and resolution, e.g., "next Tuesday"
  - Persistent semantics, e.g., "Info about his organization"

Ambiguity Representation
- Position ambiguity
- Value ambiguity

Error Correction Mechanisms
Sys: How can I help you?
Usr: I'd like to fly from Austin to Newark on August fifth in the morning.
ASR: I'd like to fly from Boston to Newark on August fifth in the morning.
Sys: I've got you leaving Boston on Sunday, August fifth, in the morning and arriving at Newark airport. What is your preferred airline?

Usr: Leaving from Austin, Texas.
Sys: Sorry ..., are you leaving from Austin, Texas, or from Boston?
Usr 1: Austin, Texas.
Sys: Leaving from Austin, Texas.
Usr 2 (alternate, using error correction): Change the departure city to Austin, Texas.

Spoken Dialogue System Architecture
[Architecture diagram: a Controller connects the platform (ASR, TTS, Telephony) with the Parser, Interpreter/Context Tracker, DM/Initiative, Generation, App. Controller, Database, AI, ...]

System Architecture and Portability
[Diagram: a Controller links application-independent modules (Parser; Semantics: Semantic Interpreter, Context Tracker; Pragmatics: Pragmatic Interpreter; Generation: Utterance Planner, Surface Realizer) with application-dependent ones (Dialogue Manager, Expert Domain Knowledge, Initiative Tracking); ambiguity representation and pragmatic confidence scores flow between them]

Advantages of application-centric system design:
- Increased modularity
- Flexible multi-stage data collection
- Extensible to multimodal input (universal access)

Multimodal Systems
- Definition
- Input modalities / output media
- Research issues
  - User interface design
  - Semantic module
- Examples

Input Modalities/Output Media
- Unimodal:
  - Speech input / speech output
- Multimodal:
  - Speech + DTMF input / speech output
  - Speech input / speech and GUI output
  - Speech and pen input / speech and GUI output
- Definitions:
  - Pen input: buttons, pull-down menus, graffiti, pen gestures
  - GUI output: text and graphics

Issues
- Semantic/pragmatic module:
  - Merging semantic information from different modalities, e.g., "Draw a line from here to there"
  - Ambiguity representation and resolution
- User interface:
  - Synergies between input modalities
  - Turn-taking and an appropriate mix of modalities
  - Maintaining interface consistency
  - Focus/context visualization
- System issues:
  - Synchronization and latency

Semantic and Pragmatic Module
[Diagram: the spoken input "July fifth" passes through the NL parser and NL interpreter, the GUI input "7/10" through the GUI parser and GUI interpreter; each yields a {"date", ...} pair ("Jul 5, 2002" and "Jul 10, 2002"); context tracking maps both to "travel.flight.leg1.departure.date"; pragmatic analysis updates the semantic tree with pragmatic scores (0.4 for "Jul 5, 2002", 0.9 for "Jul 10, 2002")]
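The merge performed by this module can be sketched with scored attribute-value triples. The attribute path, dates, and pragmatic scores are the slide's example; the data structures themselves are an assumption:

```python
from collections import defaultdict

# Each modality yields (attribute path, value, pragmatic score) triples.
speech = [("travel.flight.leg1.departure.date", "Jul 5, 2002", 0.4)]
gui = [("travel.flight.leg1.departure.date", "Jul 10, 2002", 0.9)]

def merge(*streams):
    """Collect competing values per attribute and rank them by score."""
    tree = defaultdict(list)
    for stream in streams:
        for path, value, score in stream:
            tree[path].append((value, score))
    for path in tree:
        tree[path].sort(key=lambda vs: vs[1], reverse=True)
    return dict(tree)

merged = merge(speech, gui)
# The top-ranked value is the current best hypothesis; lower-ranked ones
# remain recorded as a value ambiguity for later resolution.
print(merged["travel.flight.leg1.departure.date"][0][0])  # → Jul 10, 2002
```

Keeping the losing hypotheses around, rather than discarding them, is what lets the GUI display and resolve the ambiguity later.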

[Diagram: semantic tree travel -> flight -> leg 1, with departure (city {"BOS", 0.5}; date {"Jul 5, 2002", 0.4} and {"Jul 10, 2002", 0.9}) and arrival (city {"NYC", 0.5})]

Multi-Modal User Interface
Emphasis on synergies between modalities:
- Values of attributes are displayed graphically
- Erroneous values can be easily corrected via the GUI
- Focus (aka context) of the speech modality is highlighted
- Position and value ambiguity are shown (and typically resolved) via the GUI
- Voice prompts are significantly shorter and mostly used to emphasize information that is already displayed graphically
- The GUI takes full advantage of the intelligence of the voice UI, e.g., "round trip" speech input will gray out the third-leg button in the GUI
- Seamless integration of semantics from the two modalities using modality-specific pragmatic scores

Example 1: Flight First Leg
ASR: "I want to fly from Boston to New York on September 6th."
[Screenshot: new focus highlighted, a disabled field, navigation buttons]

Example 2: Flight Second Leg
ASR: "round trip"
[Screenshot: value induction, a disabled button]

Example 3: Car Rental
ASR: "I want a compact car from AVIS"
GUI: "rental" button pressed

Example 4: Ambiguity and Errors

Mixing the Modalities: Turn-Taking
- "Click to talk" vs. "open mike"
  - "Click to talk" can be restrictive
  - "Open mike" can be confusing (speaking out of turn)
  - Both have limitations
- Often there is a dominant modality, based on:
  - Type of input, e.g., "select from menu" vs. enter free text
  - Recent input history
  - User preferences
- The system automatically selects the dominant modality and the user can click to change it
  - The dominant-modality selection algorithm is adaptive
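The dominant-modality selection described above might look like the following weighted-feature sketch; the feature names, weights, and scores are entirely invented for illustration:

```python
# Relative importance of the cues the slide lists: input type,
# recent input history, and user preference (invented weights).
WEIGHTS = {"input_type": 0.5, "recent_use": 0.3, "preference": 0.2}

def dominant_modality(features):
    """Pick the modality with the highest weighted feature score.

    `features` maps modality name -> {feature: value in [0, 1]}.
    """
    def score(modality):
        f = features[modality]
        return sum(WEIGHTS[k] * f.get(k, 0.0) for k in WEIGHTS)
    return max(features, key=score)

# "Select from a menu" favors the GUI; free-text entry would favor speech.
menu_ctx = {
    "speech": {"input_type": 0.2, "recent_use": 0.5, "preference": 0.5},
    "gui":    {"input_type": 0.9, "recent_use": 0.5, "preference": 0.5},
}
print(dominant_modality(menu_ctx))  # → gui
```

Making the algorithm adaptive, as the slide notes, would mean updating the weights or the recent-use feature after each turn, and the user's click-to-change action overrides the computed choice.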