Spoken Dialogue Systems Prof. Alexandros Potamianos Dept. of Electrical & Computer Engineering Technical University of Crete, Greece May 2003.


1 Spoken Dialogue Systems Prof. Alexandros Potamianos Dept. of Electrical & Computer Engineering Technical University of Crete, Greece May 2003

2 Outline Discourse  Research Issues Spoken Dialogue Systems  Pragmatics (dialogue acts)  Dialogue management Multimodal Systems Examples

3 Definitions Discourse  Monologue  Dialogue

4 Discourse: Research Issues
Reference resolution, e.g., “That was a lie”
  - Anaphora, e.g., “John left …. He was bored.”
  - Co-reference, e.g., “John” and “He” refer to the same entity
Text coherence, e.g.,
  - Coherence: “John left early. He was tired”
  - Incoherence: “John left early. He likes spinach”

5 Spoken Dialogue Systems: Concepts
Turn-taking
  - Dialogue Segmentation
Grounding
  - Backchannel, e.g., ‘Mm Hmm’
  - Acknowledgment
  - Explicit/implicit confirmation
Implicature
  - “What time are you flying?”
  - “Well, I have a meeting at three”
Initiative
  - “What time are you flying?”
  - “Don’t feel like booking the flight right now. Let’s look at hotels”

6 Speech, Dialogue and Application Acts
Speech Acts (Austin 1962, Searle 1975)
  - Assertive (conclude), Directive (ask, order), Commissive (promise), Expressive (apologize, thank), Declarations
Dialogue Acts
  - Statement, Info-Request, Wh-Question, Yes-No Question, Opening, Closing, Open-Option, Action-Directive, Offer, Commit, Agree, etc.
Application Acts
  - Domain specific but general, e.g., Info-Request into system’s semantic state, Info-Request into database, Info-Request into database results

7 Dialogue/Application Act Classification
Semantic parsing followed by deterministic rules, e.g., a sentence starting with ‘what’, ‘when’, ‘where’ or ‘who’ is a Wh-Question
Bayesian Formulation
  - Given a sentence W, the most probable dialogue act A is argmax_A P(A|W) = argmax_A P(W|A) P(A)
  - P(W|A) can be an n-gram model, one for each dialogue act
  - P(A) can also be an n-gram model of dialogue acts
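The Bayesian formulation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-act model P(W|A) is a unigram model (n = 1) with add-one smoothing, the prior P(A) is a simple relative frequency rather than an n-gram over the act history, and the training sentences and the names `TRAIN`, `train` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy labelled data (hypothetical, for illustration only).
TRAIN = [
    ("what time are you flying", "Wh-Question"),
    ("where are you flying to", "Wh-Question"),
    ("are you flying tomorrow", "Yes-No-Question"),
    ("is that correct", "Yes-No-Question"),
    ("i am flying next tuesday", "Statement"),
    ("i have a meeting at three", "Statement"),
]

def train(data):
    """Collect word counts per dialogue act, act counts, and the vocabulary."""
    word_counts = defaultdict(Counter)  # act -> word -> count
    act_counts = Counter()
    vocab = set()
    for sentence, act in data:
        words = sentence.split()
        word_counts[act].update(words)
        act_counts[act] += 1
        vocab.update(words)
    return word_counts, act_counts, vocab

def classify(sentence, model):
    """Return argmax_A [log P(A) + sum_w log P(w|A)]."""
    word_counts, act_counts, vocab = model
    total_acts = sum(act_counts.values())
    v = len(vocab) + 1  # one extra slot for unseen words
    best_act, best_score = None, -math.inf
    for act in act_counts:
        score = math.log(act_counts[act] / total_acts)  # prior P(A)
        n = sum(word_counts[act].values())
        for w in sentence.split():
            # Add-one smoothed unigram estimate of P(w|A).
            score += math.log((word_counts[act][w] + 1) / (n + v))
        if score > best_score:
            best_act, best_score = act, score
    return best_act

MODEL = train(TRAIN)
```

For example, `classify("where are you flying", MODEL)` picks the act whose word model and prior together give the highest log-probability. Swapping in higher-order n-grams changes only the inner loop.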

8 Dialogue Management 1
Frame-based, e.g.,
  - DeptCity  “From what city are you leaving?”  GRM_CITY
  - ArrCity  “Where are you flying to?”  GRM_CITY
  - DeptTime  “What time would you like to fly?”  GRM_TIME
  - DeptDate  “When are you flying?”  GRM_DATETIME  DeptTime
Finite state machine dialogue manager
Mostly system-initiated dialogue
VXML-like dialogue structure (forms and frames)
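A frame-based, system-initiated manager of this kind can be sketched as a loop over unfilled slots. The slot names, prompts and grammar labels are taken from the slide. Everything else is an assumption: the `Slot` class, the helper names, and the shortcut of treating each user reply as the slot value, which stands in for recognition against the named grammar.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Slot:
    name: str
    prompt: str
    grammar: str                 # grammar label only; no grammar is loaded here
    value: Optional[str] = None

def make_frame():
    """Build the flight frame with the four slots listed on the slide."""
    return [
        Slot("DeptCity", "From what city are you leaving?", "GRM_CITY"),
        Slot("ArrCity", "Where are you flying to?", "GRM_CITY"),
        Slot("DeptTime", "What time would you like to fly?", "GRM_TIME"),
        Slot("DeptDate", "When are you flying?", "GRM_DATETIME"),
    ]

def next_unfilled(frame):
    """Return the first slot without a value, or None when the frame is full."""
    for slot in frame:
        if slot.value is None:
            return slot
    return None

def run_dialogue(frame, replies):
    """System-initiated loop: prompt for each empty slot in order.

    `replies` is a sequence of user answers, a stand-in for ASR output.
    """
    transcript = []
    replies = iter(replies)
    while (slot := next_unfilled(frame)) is not None:
        transcript.append(("Sys", slot.prompt))
        reply = next(replies)
        transcript.append(("Usr", reply))
        slot.value = reply       # assumption: reply is accepted as-is
    return transcript
```

Because the system always asks for the first empty slot, the dialogue is fully system-initiated, which matches the finite-state behaviour described on the slide.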

9 Dialogue Management 2 Application Independent Flow Chart structure Generic dialogue/application manager (really this is more like a controller)

10 Dialogue Management 3
Generalized Finite State Machine Dialogue Management
Application Dependent but General Dialogue Superstates
  - Fill: adaptive dialogue module, uses dynamic e-forms to elicit AV pairs from the user; resolves value and tree-position ambiguities
  - Navigate: presents database results and lets the user select the appropriate ones
[Figure: state diagram with superstates Fill, Verify, Create Query and Navigate; decision nodes “Is Full” and “Is Correct” with Yes/No branches]
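The superstate machine can be sketched as a small transition function. The state names come from the slide; the transitions are an assumed reading of the flattened diagram (Fill loops until “Is Full” is yes, Verify returns to Fill when “Is Correct” is no), not a confirmed specification. The function names are hypothetical.

```python
def next_state(state, answer=True):
    """Return the next superstate.

    `answer` is the outcome of the decision attached to the current state:
    "Is Full" after Fill, "Is Correct" after Verify. It is ignored elsewhere.
    """
    if state == "Fill":
        return "Verify" if answer else "Fill"
    if state == "Verify":
        return "Create Query" if answer else "Fill"
    if state == "Create Query":
        return "Navigate"
    if state == "Navigate":
        return "Navigate"      # assumed terminal superstate
    raise ValueError(f"unknown state: {state}")

def run(answers, start="Fill"):
    """Trace the sequence of superstates visited for a list of yes/no answers."""
    state = start
    path = [state]
    for answer in answers:
        state = next_state(state, answer)
        path.append(state)
    return path
```

A call such as `run([False, True, True, True])` traces one extra pass through Fill before the frame is verified and the query is created.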

11 Advanced Dialogue Systems
Mixed Initiative:
  - Allow user to say anything (global grammar active at all states), e.g., “What date are you flying?” “I am flying next Tuesday in the morning”
  - Allow user to navigate the system’s state machine, e.g., “I would like to look at hotels first”
  - Open prompts, give user the initiative, e.g., “What next?”
Advanced dialogue features
  - Corrections, e.g., “No, not Boston, Atlanta”
  - Negation, e.g., “Anything but Olympic”
  - Complex semantic expressions, e.g., “tomorrow evening or Sunday morning”
  - Ambiguity resolution and representation, e.g., “next Tuesday”
  - Persistent Semantics, e.g., “Info about his organization”

12 Ambiguity Representation Position ambiguity Value ambiguity

13 Error Correction Mechanisms
Sys: How can I help you?
Usr: I’d like to fly from Austin to Newark on August fifth in the morning
Asr: I’d like to fly from Boston to Newark on August fifth in the morning
Sys: I’ve got you leaving Boston on Sunday, August fifth in the morning and arriving in Newark airport. What is your preferred airline?

14 Usr: Leaving from Austin, Texas
Sys: Sorry …, are you leaving from Austin, Texas, or from Boston?
Usr 1: Austin, Texas
Sys: Leaving from Austin, Texas.
Usr 2: Change the departure city to Austin, Texas
Alternate: use error correction

15 Spoken Dialogue System Architecture
[Figure: architecture block diagram. Component labels: Controller, Database, Parser, TTS, Platform, ASR, Telephony, Generation, App. Controller, DM/Initiative, Interpreter/Context Tr., AI, …]

16 System Architecture and Portability
Ambiguity representation
Pragmatic Confidence Scores
[Figure: architecture diagram split into application-dependent and application-independent parts. Labels: Dialogue Manager, Semantics, Pragmatics, Generation, Parser, Semantic Interpreter, Context Tracker, Pragmatic Interpreter, Expert Domain Knowledge, Initiative Tracking, Utterance Planner, Surface Realizer, Controller]

17 Advantages of application-centric system design:
  - Increased modularity.
  - Flexible multi-stage data collection.
  - Extensible to multi-modal input (universal access).

18 Multimodal Systems Definition Input Modalities/Output Media Research Issues  User Interface Design  Semantic Module Examples

19 Input Modalities/Output Media
Unimodal:
  - Speech input/Speech output.
Multimodal:
  - Speech+DTMF input/Speech output.
  - Speech input/Speech and GUI output.
  - Speech and pen input/Speech and GUI output.
Definitions:
  - Pen input: buttons, pull-down menus, graffiti, pen gestures.
  - GUI output: text and graphics

20 Issues
Semantic/Pragmatic Module:
  - Merging semantic information from different modalities, e.g., “Draw a line from here to there”
  - Ambiguity representation and resolution
User Interface:
  - Synergies between input modalities
  - Turn-taking and appropriate mix of modalities
  - Maintain interface consistency
  - Focus/context visualization
System issues:
  - Synchronization and latency

21 Semantic and Pragmatic Module
[Figure: processing pipeline for two inputs, spoken “July fifth” and GUI “7/10”]
  - NL Parser: “July” “fifth”; GUI Parser: “7” “/” “10”
  - NL Interpreter: {“date”, “Jul 5, 2002”}; GUI Interpreter: {“date”, “Jul 10, 2002”}
  - Context Tracking: {“travel.flight.leg1.departure.date”, “Jul 5, 2002”}; {“travel.flight.leg1.departure.date”, “Jul 10, 2002”}
  - Pragmatic Analysis: {“travel.flight.leg1.departure.date”, “Jul 5, 2002”, 0.4}; {“travel.flight.leg1.departure.date”, “Jul 10, 2002”, 0.9}
  - Update Semantic Tree & Pragmatic Scores

22 [Figure: semantic tree with pragmatic scores]
travel
  flight
    leg 1
      departure
        city: {“BOS”, 0.5}
        date: {“Jul 5, 2002”, 0.4} {“Jul 10, 2002”, 0.9}
      arrival
        city: {“NYC”, 0.5}
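A semantic tree that keeps several scored candidates per attribute can be sketched as a mapping from dotted paths to value/score pairs. The paths, values and scores below are taken from the slides. The `SemanticTree` class, its method names, and the rule of keeping the higher score when a value is seen twice are assumptions made for illustration.

```python
class SemanticTree:
    """Stores competing values for each attribute path with a pragmatic score."""

    def __init__(self):
        self._candidates = {}  # path -> {value: score}

    def update(self, path, value, score):
        """Add a candidate; if the value is already present keep the higher score."""
        values = self._candidates.setdefault(path, {})
        values[value] = max(values.get(value, 0.0), score)

    def candidates(self, path):
        """Return (value, score) pairs for a path, best score first."""
        values = self._candidates.get(path, {})
        return sorted(values.items(), key=lambda kv: kv[1], reverse=True)

    def best(self, path):
        """Return the top-scoring (value, score) pair, or None if the path is empty."""
        ranked = self.candidates(path)
        return ranked[0] if ranked else None

    def is_ambiguous(self, path):
        """Value ambiguity: more than one candidate value for the same path."""
        return len(self._candidates.get(path, {})) > 1


DATE = "travel.flight.leg1.departure.date"
tree = SemanticTree()
tree.update(DATE, "Jul 5, 2002", 0.4)    # from speech: "July fifth"
tree.update(DATE, "Jul 10, 2002", 0.9)   # from GUI: "7/10"
tree.update("travel.flight.leg1.departure.city", "BOS", 0.5)
tree.update("travel.flight.leg1.arrival.city", "NYC", 0.5)
```

Keeping both date candidates instead of overwriting lets a dialogue manager or GUI show the ambiguity and ask the user to resolve it.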

23 Multi-Modal User Interface
Emphasis on synergies between modalities:
  - Value(s) of attributes are displayed graphically
  - Erroneous values can be easily corrected via the GUI
  - Focus (aka context) of speech modality is highlighted
  - Position and value ambiguity are shown (and typically resolved) via the GUI
  - Voice prompts are significantly shorter and mostly used to emphasize information that is already displayed graphically
  - GUI takes full advantage of intelligence of voice UI, e.g., ‘round trip’ speech input will ‘gray out’ the third leg button in the GUI
  - Seamless integration of semantics from the two modalities using modality-specific pragmatic scores

24 Example 1: Flight First Leg
ASR: I want to fly from Boston to New York on September 6th.
[Screenshot callouts: new focus, field disabled, navigation buttons]

25 Example 2: Flight Second Leg
ASR: round trip
[Screenshot callouts: value induction, button disabled]

26 Example 3: Car Rental
ASR: I want a compact car from AVIS
GUI: “rental” button pressed

27 Example 4: Ambiguity and Errors

28 Mixing the Modalities: Turn-Taking
“Click to talk” vs. “Open mike”
  - “Click to talk” can be restrictive
  - “Open mike” can be confusing (falling out of turn)
  - Both have limitations
Often there is a dominant modality based on
  - Type of input, e.g., “select from menu” vs. “enter free text”
  - Recent input history
  - User preferences
System automatically selects the dominant modality and the user can click to change it
  - Dominant modality selection algorithm is adaptive
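The slide does not describe the selection algorithm itself, so the following is only a hypothetical sketch of how the three cues (type of input, recent input history, user preferences) might be combined. The class name, the weights and the two modality labels `"speech"` and `"gui"` are all assumptions.

```python
from collections import deque

MODALITIES = ("speech", "gui")

class ModalitySelector:
    """Scores each modality from input type, recent history and user preference."""

    def __init__(self, preferred="speech", history_size=5):
        if preferred not in MODALITIES:
            raise ValueError(f"unknown modality: {preferred}")
        self.preferred = preferred
        self.history = deque(maxlen=history_size)  # most recent inputs only

    def record(self, modality):
        """Remember which modality the user actually used; this is the adaptive part."""
        if modality not in MODALITIES:
            raise ValueError(f"unknown modality: {modality}")
        self.history.append(modality)

    def dominant(self, input_type):
        """Pick the dominant modality for the next turn.

        input_type: "menu" favours the GUI, "free_text" favours speech
        (assumed mapping); any other value adds no evidence.
        """
        scores = {m: 0.0 for m in MODALITIES}
        if input_type == "menu":
            scores["gui"] += 1.0
        elif input_type == "free_text":
            scores["speech"] += 1.0
        for m in self.history:                      # recent input history
            scores[m] += 1.0 / self.history.maxlen
        scores[self.preferred] += 0.5               # user preference
        return max(scores, key=scores.get)
```

Calling `record` after every user input shifts the scores over time. A user override ("click to change it") could simply be recorded as another input in the chosen modality.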

