
1 Spoken Dialogue Systems
Julia Hirschberg, CS 4706

2 Today
Basic Conversational Agents: ASR, NLU, Generation, Dialogue Manager
Dialogue Manager Design: Finite State, Frame-based, Initiative (User, System, Mixed), Information-State
Dialogue-Act Detection
Dialogue-Act Generation
Evaluation
Utility-based conversational agents: MDP, POMDP

3 Conversational Agents
AKA: Interactive Voice Response Systems, Dialogue Systems, Spoken Dialogue Systems
Applications:
Travel arrangements (Amtrak, United Airlines)
Telephone call routing
Tutoring
Communicating with robots
Anything with limited screen/keyboard

4 A travel dialog: Communicator

5 Call routing: AT&T HMIHY (How May I Help You)

6 A tutorial dialogue: ITSPOKE

7 Conversational Structure
Telephone conversations:
Stage 1: Enter a conversation
Stage 2: Identification
Stage 3: Establish joint willingness to converse
Stage 4: The first topic is raised, usually by the caller

8 Why is this customer confused?
Customer: (rings)
Operator: Directory Enquiries, for which town please?
Customer: Could you give me the phone number of um: Mrs. um: Smithson?
Operator: Yes, which town is this at please?
Customer: Huddleston.
Operator: Yes. And the name again?
Customer: Mrs. Smithson

9 Why is this customer confused?
A: And, what day in May did you want to travel?
C: OK, uh, I need to be there for a meeting that's from the 12th to the 15th.
Note that the client did not answer the question. The meaning of the client's sentence: there is a meeting; it starts on the 12th; it ends on the 15th. It says nothing about flying! How does the agent infer that the client is informing him/her of the travel dates?

10 Will this client be confused?
A: … there are 3 non-stops today.
This is true even if there are in fact 7 non-stops today. But the agent means: 3 and only 3. How can the client infer that the agent means only 3?

11 Grice: conversational implicature
Implicature means a particular class of licensed inferences. Grice (1975) proposed that what enables hearers to draw correct inferences is the Cooperative Principle: a tacit agreement by speakers and listeners to cooperate in communication.

12 4 Gricean Maxims
Relevance: Be relevant
Quantity: Do not make your contribution more or less informative than required
Quality: Try to make your contribution one that is true (don't say things that are false or for which you lack adequate evidence)
Manner: Avoid ambiguity and obscurity; be brief and orderly

13 Relevance
A: Is Regina here?
B: Her car is outside.
Implication: yes.
The hearer thinks: Why would he mention the car? It must be relevant. How could it be relevant? If her car is here, she is probably here.
Client: I need to be there for a meeting that's from the 12th to the 15th.
The hearer thinks: The speaker is following the maxims, and would only have mentioned the meeting if it was relevant. How could the meeting be relevant? If the client meant me to understand that he had to depart in time for the meeting.

14 Quantity
A: How much money do you have on you?
B: I have 5 dollars.
Implication: not 6 dollars.
Similarly, "3 non-stops" can't mean 7 non-stops (the hearer thinks: if the speaker meant 7 non-stops, she would have said 7 non-stops).
A: Did you do the reading for today's class?
B: I intended to.
Implication: No. B's answer would also be true if B intended to do the reading AND did it, but it would then violate the maxim of Quantity.

15 Dialogue System Architecture

16 Speech recognition
Input: an acoustic waveform. Output: a string of words.
Basic components:
A recognizer for phones, small sound units like [k] or [ae]
A pronunciation dictionary, e.g., cat = [k ae t]
A grammar telling us what words are likely to follow what words
A search algorithm to find the best string of words

17 Natural Language Understanding
Also called "NLU" or "computational semantics." There are many ways to represent the meaning of sentences. For spoken dialogue systems, the most common is frame-and-slot semantics.

18 An example of a frame
Show me morning flights from Boston to SF on Tuesday.
SHOW:
  FLIGHTS:
    ORIGIN:
      CITY: Boston
      DATE: Tuesday
      TIME: morning
    DEST:
      CITY: San Francisco
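The same frame can be rendered directly as a nested data structure. A minimal Python sketch (key names are illustrative, not from any particular system):

# The frame above as a nested dictionary; slot names are illustrative.
frame = {
    "SHOW": {
        "FLIGHTS": {
            "ORIGIN": {"CITY": "Boston", "DATE": "Tuesday", "TIME": "morning"},
            "DEST": {"CITY": "San Francisco"},
        }
    }
}
print(frame["SHOW"]["FLIGHTS"]["ORIGIN"]["CITY"])  # -> Boston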

19 How to generate this semantics?
Many methods. The simplest: "semantic grammars." We'll come back to these after we've seen parsing, but here is a quick teaser for those who may have already seen parsing: a CFG in which the LHS of each rule is a semantic category:
LIST -> show me | I want | can I see | …
DEPARTTIME -> (after | around | before) HOUR | morning | afternoon | evening
HOUR -> one | two | three … | twelve (am | pm)
FLIGHTS -> (a) flight | flights
ORIGIN -> from CITY
DESTINATION -> to CITY
CITY -> Boston | San Francisco | Denver | Washington
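For illustration, here is a toy slot-filler in the same spirit, implemented with regular expressions rather than a real CFG parser (the patterns and slot names are simplified assumptions):

import re

# Toy semantic "grammar": one regex per semantic category. Real systems
# compile full CFG rules; this sketch only matches a few patterns.
CITY = r"(boston|san francisco|denver|washington)"
RULES = {
    "ORIGIN":      rf"from (?P<ORIGIN>{CITY})",
    "DESTINATION": rf"to (?P<DESTINATION>{CITY})",
    "DEPARTTIME":  r"(?P<DEPARTTIME>morning|afternoon|evening)",
}

def parse(utterance):
    """Fill slots by matching each semantic category against the input."""
    slots = {}
    for name, pattern in RULES.items():
        m = re.search(pattern, utterance.lower())
        if m:
            slots[name] = m.group(name)
    return slots

print(parse("Show me morning flights from Boston to San Francisco"))
# -> {'ORIGIN': 'boston', 'DESTINATION': 'san francisco', 'DEPARTTIME': 'morning'}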

20 Semantics for a sentence
LIST: Show me
FLIGHTS: flights
ORIGIN: from Boston
DESTINATION: to San Francisco
DEPARTDATE: on Tuesday
DEPARTTIME: morning

21 Generation and TTS
Generation component:
Chooses concepts to express to the user
Plans out how to express these concepts in words
Assigns any necessary prosody to the words
TTS component:
Takes words and prosodic annotations
Synthesizes a waveform

22 Generation Component
Content Planner: decides what content to express to the user (ask a question, present an answer, etc.). Often merged with the dialogue manager.
Language Generation: chooses syntactic structures and words to express the meaning.
Simplest method: all words in the sentence are prespecified! "Template-based generation." Templates can have variables:
What time do you want to leave CITY-ORIG?
Will you return to CITY-ORIG from CITY-DEST?
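Template-based generation is simple enough to sketch in a few lines of Python (the template names and slot keys here are made up for the example):

# Template-based generation: fixed strings with slot variables
# filled in from the current frame. Names are illustrative.
TEMPLATES = {
    "ask_depart_time": "What time do you want to leave {CITY_ORIG}?",
    "ask_return":      "Will you return to {CITY_ORIG} from {CITY_DEST}?",
}

def generate(template_name, frame):
    return TEMPLATES[template_name].format(**frame)

frame = {"CITY_ORIG": "Boston", "CITY_DEST": "San Francisco"}
print(generate("ask_return", frame))
# -> Will you return to Boston from San Francisco?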

23 More sophisticated language generation component
Natural Language Generation approach:
The dialogue manager builds a representation of the meaning of the utterance to be expressed and passes it to a "generator."
Generators have three components:
Sentence planner
Surface realizer
Prosody assigner

24 Architecture of a generator for a dialogue system (after Walker and Rambow 2002)

25 HCI constraints on generation for dialogue: “Coherence”
Discourse markers and pronouns ("coherence"):
(1) Please say the date. Please say the start time. Please say the duration… Please say the subject… (Bad!)
(2) First, tell me the date. Next, I'll need the time it starts. Thanks. <pause> Now, how long is it supposed to last? Last of all, I just need a brief description. (Good!)

26 HCI constraints on generation for dialogue: coherence (II): tapered prompts
Prompts which get incrementally shorter:
System: Now, what's the first company to add to your watch list?
Caller: Cisco
System: What's the next company name? (Or, you can say, "Finished.")
Caller: IBM
System: Tell me the next company name, or say, "Finished."
Caller: Intel
System: Next one?
Caller: America Online.
System: Next?
Caller: …

27 Dialogue Manager
Controls the architecture and structure of the dialogue:
Takes input from the ASR/NLU components
Maintains some sort of state
Interfaces with the Task Manager
Passes output to the NLG/TTS modules

28 Architectures for dialogue management
Finite state
Frame-based
Information state
Markov Decision Processes
AI planning

29 Finite-State Dialogue Management
Consider a trivial airline travel system that asks the user for:
A departure city
A destination city
A time
Whether the trip is round-trip or not
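A minimal sketch of such a finite-state manager (the prompts, slot names, and strict question order are illustrative; a real system would add confirmation and error states):

# Finite-state dialogue management: a fixed sequence of questions;
# each user answer simply advances the machine to the next state.
STATES = [
    ("origin",      "What city are you leaving from?"),
    ("destination", "Where are you going?"),
    ("time",        "What time do you want to leave?"),
    ("round_trip",  "Is this a round trip?"),
]

def run_dialogue(get_user_input=input):
    frame = {}
    for slot, question in STATES:   # the system controls the order completely
        frame[slot] = get_user_input(question + " ")
    return frame                    # anything off-script is ignored

if __name__ == "__main__":
    print(run_dialogue())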

30 Finite State Dialogue Manager

31 Finite-state Dialogue Managers
The system completely controls the conversation with the user:
It asks the user a series of questions
It ignores (or misinterprets) anything the user says that is not a direct answer to its question

32 Dialogue Initiative
"Initiative": who has control of the conversation. Systems that control the conversation like this are system-initiative or single-initiative. In normal human-human dialogue, initiative shifts back and forth between the participants.

33 System Initiative
Systems which completely control the conversation at all times are called system-initiative.
Advantages:
Simple to build
The user always knows what they can say next
The system always knows what the user can say next (known words: better ASR performance; known topic: better NLU performance)
OK for VERY simple tasks (entering a credit card number, or a login name and password)
Disadvantage: too limited

34 User Initiative
The user directs the system. Generally, the user asks a single question and the system answers. The system can't ask questions back, or engage in clarification or confirmation dialogue. Used for simple database queries: the user asks a question, the system gives an answer. Web search is user-initiative dialogue.

35 Problems with System Initiative
Real dialogue involves give and take! In travel planning, users might want to say something that is not the direct answer to the question, for example answering more than one question in a sentence:
Hi, I'd like to fly from Seattle Tuesday morning.
I want a flight from Milwaukee to Orlando, one way, leaving after 5 p.m. on Wednesday.

36 Single initiative + universals
We can give users a little more flexibility by adding universal commands. Universals: commands you can say anywhere, as if we augmented every state of the FSA with:
Help
Start over
Correct
This describes many implemented systems, but still doesn't allow the user to say what they want to say.

37 Mixed Initiative
Conversational initiative can shift between system and user. The simplest kind of mixed initiative: use the structure of the frame itself to guide the dialogue.

Slot        Question
ORIGIN      What city are you leaving from?
DEST        Where are you going?
DEPT DATE   What day would you like to leave?
DEPT TIME   What time would you like to leave?
AIRLINE     What is your preferred airline?

38 Frames are mixed-initiative
The user can answer multiple questions at once. The system asks questions of the user, filling any slots the user specifies; when the frame is filled, it does a database query. If the user answers 3 questions at once, the system has to fill the corresponding slots and not ask those questions again! This avoids the strict ordering constraints of the finite-state architecture, as the sketch below shows.
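A rough sketch of that control loop, with a hypothetical nlu_fill standing in for the NLU component:

# Frame-based dialogue management: ask only about unfilled slots, and
# let a single user utterance fill several slots at once.
QUESTIONS = {
    "origin": "What city are you leaving from?",
    "dest":   "Where are you going?",
    "date":   "What day would you like to leave?",
}

def nlu_fill(utterance):
    """Toy NLU stand-in: read 'slot=value' pairs, e.g.
    'origin=Boston dest=Denver'. A real system would parse with
    a semantic grammar like the one sketched earlier."""
    return dict(tok.split("=", 1) for tok in utterance.split() if "=" in tok)

def dialogue(get_user_input=input):
    frame = dict.fromkeys(QUESTIONS)
    while None in frame.values():
        slot = next(s for s, v in frame.items() if v is None)
        answer = get_user_input(QUESTIONS[slot] + " ")
        # one answer may legitimately fill slots we have not asked about yet
        frame.update({s: v for s, v in nlu_fill(answer).items() if s in frame})
    return frame  # frame full: ready for the database query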

39 Multiple frames
A real system needs frames for flights, hotels, and rental cars, plus frames for:
Flight legs: each flight can have multiple legs, which might need to be discussed separately
Presenting the flights (if multiple flights meet the user's constraints): slots like 1ST_FLIGHT or 2ND_FLIGHT, so the user can ask "How much is the second one?"
General route information: Which airlines fly from Boston to San Francisco?
Airfare practices: Do I have to stay over Saturday to get a decent airfare?

40 Multiple Frames
The system needs to be able to switch from frame to frame based on what the user says: disambiguate which slot of which frame an input is supposed to fill, then switch dialogue control to that frame. The main implementation is production rules: different types of inputs cause different productions to fire, each of which can flexibly fill in different frames and can also switch control to a different frame.

41 Defining Mixed Initiative
"Mixed initiative" could mean:
The user can arbitrarily take or give up the initiative in various ways. This is really only possible in very complex plan-based dialogue systems; there are no commercial implementations, but it is an important research area.
Or something simpler and quite specific, which we will define in the next few slides.

42 True Mixed Initiative

43 How mixed initiative is usually defined
First we need to define two other factors:
Open vs. directive prompts
Restrictive vs. non-restrictive grammars

44 Open vs. Directive Prompts
Open prompt: the system gives the user very few constraints; the user can respond however they please:
"How may I help you?"
"How may I direct your call?"
Directive prompt: explicitly instructs the user how to respond:
"Say yes if you accept the call; otherwise, say no."

45 Restrictive vs. Non-restrictive grammars
Restrictive grammar: a language model which strongly constrains the ASR system, based on dialogue state.
Non-restrictive grammar: an open language model which is not restricted to a particular dialogue state.

46 Definition of Mixed Initiative
Grammar          Open Prompt         Directive Prompt
Restrictive      Doesn't make sense  System Initiative
Non-restrictive  User Initiative     Mixed Initiative

47 VoiceXML
Voice eXtensible Markup Language:
An XML-based dialogue design language
Makes use of ASR and TTS
Deals well with simple, frame-based mixed-initiative dialogue
Most common in the commercial world (too limited for research systems), but useful for getting a handle on the concepts

48 VoiceXML
Each dialogue is a <form>. (Form is the VoiceXML word for frame.) Each <form> generally consists of a sequence of <field>s, along with other commands.

49 Sample VoiceXML document
<form>
  <field name="transporttype">
    <prompt>
      Please choose airline, hotel, or rental car.
    </prompt>
    <grammar type="application/x=nuance-gsl">
      [airline hotel "rental car"]
    </grammar>
  </field>
  <block>
    <prompt>
      You have chosen <value expr="transporttype"/>.
    </prompt>
  </block>
</form>

50 VoiceXML interpreter
Walks through a VXML form in document order, iteratively selecting each item. If there are multiple fields, it visits each one in order. Special commands handle events.

51 Another VoiceXML document (1)
<noinput>
  I'm sorry, I didn't hear you. <reprompt/>
</noinput>
<nomatch>
  I'm sorry, I didn't understand that. <reprompt/>
</nomatch>
"noinput" means silence exceeded a timeout threshold; "nomatch" means the confidence value for the utterance was too low. Notice the <reprompt/> command.

52 Another VoiceXML document (2)
<form>
  <block> Welcome to the air travel consultant. </block>
  <field name="origin">
    <prompt> Which city do you want to leave from? </prompt>
    <grammar type="application/x=nuance-gsl">
      [(san francisco) denver (new york) barcelona]
    </grammar>
    <filled>
      <prompt> OK, from <value expr="origin"/> </prompt>
    </filled>
  </field>
The <filled> element is executed by the interpreter as soon as the field has been filled by the user.

53 Another VoiceXML document (3)
  <field name="destination">
    <prompt> And which city do you want to go to? </prompt>
    <grammar type="application/x=nuance-gsl">
      [(san francisco) denver (new york) barcelona]
    </grammar>
    <filled>
      <prompt> OK, to <value expr="destination"/> </prompt>
    </filled>
  </field>
  <field name="departdate" type="date">
    <prompt> And what date do you want to leave? </prompt>
    <filled>
      <prompt> OK, on <value expr="departdate"/> </prompt>
    </filled>
  </field>

54 Another VoiceXML document (4)
  <block>
    <prompt>
      OK, I have you departing from <value expr="origin"/> to <value expr="destination"/> on <value expr="departdate"/>.
    </prompt>
    send the info to book a flight...
  </block>
</form>

55 Summary: VoiceXML
Voice eXtensible Markup Language:
An XML-based dialogue design language
Makes use of ASR and TTS
Deals well with simple, frame-based mixed-initiative dialogue
Most common in the commercial world (too limited for research systems), but useful for getting a handle on the concepts

56 Information-State and Dialogue Acts
If we want a dialogue system to be more than just form-filling, it needs to:
Decide when the user has asked a question, made a proposal, or rejected a suggestion
Ground the user's utterance, ask clarification questions, suggest plans
This suggests that a conversational agent needs sophisticated models of interpretation and generation (in terms of speech acts and grounding) and a more sophisticated representation of dialogue context than just a list of slots.

57 Information-state architecture
A dialogue act interpreter
A dialogue act generator
A set of update rules, which update the dialogue state as acts are interpreted and generate dialogue acts
A control structure to select which update rules to apply

58 Information-state

59 Dialogue acts
Also called "conversational moves": an act with (internal) structure related specifically to its dialogue function. The notion incorporates ideas of grounding, and other dialogue and conversational functions that Austin and Searle didn't seem interested in.

60 Verbmobil task
Two-party scheduling dialogues: speakers were asked to plan a meeting at some future date. The data were used to design conversational agents which would help with this task (a cross-language, translating, scheduling assistant).

61 Verbmobil Dialogue Acts
THANK: Thanks
GREET: Hello Dan
INTRODUCE: It's me again
BYE: All right, bye
REQUEST-COMMENT: How does that look?
SUGGEST: June 13th through 17th
REJECT: No, Friday I'm booked all day
ACCEPT: Saturday sounds fine
REQUEST-SUGGEST: What is a good day of the week for you?
INIT: I wanted to make an appointment with you
GIVE_REASON: Because I have meetings all afternoon
FEEDBACK: Okay
DELIBERATE: Let me check my calendar here
CONFIRM: Okay, that would be wonderful
CLARIFY: Okay, do you mean Tuesday the 23rd?

62 Automatic Interpretation of Dialogue Acts
How do we automatically identify dialogue acts? Given an utterance, decide whether it is a QUESTION, STATEMENT, SUGGEST, or ACK. Recognizing this illocutionary force will be crucial to building a dialogue agent. Perhaps we can just look at the form of the utterance to decide?

63 Can we just use the surface syntactic form?
YES-NO questions have auxiliary-before-subject syntax: Will breakfast be served on USAir 1557?
STATEMENTs have declarative syntax: I don't care about lunch.
COMMANDs have imperative syntax: Show me flights from Milwaukee to Orlando on Thursday night.

64 Surface form != speech act type
Utterance                               Locutionary force  Illocutionary force
Can I have the rest of your sandwich?   Question           Request
I want the rest of your sandwich        Declarative        Request
Give me your sandwich!                  Imperative         Request

65 Dialogue act disambiguation is hard! Who’s on First?
Abbott: Well, Costello, I'm going to New York with you. Bucky Harris the Yankees' manager gave me a job as coach for as long as you're on the team.
Costello: Look Abbott, if you're the coach, you must know all the players.
Abbott: I certainly do.
Costello: Well you know I've never met the guys. So you'll have to tell me their names, and then I'll know who's playing on the team.
Abbott: Oh, I'll tell you their names, but you know it seems to me they give these ball players now-a-days very peculiar names.
Costello: You mean funny names?
Abbott: Strange names, pet names... like Dizzy Dean...
Costello: His brother Daffy.
Abbott: Daffy Dean...
Costello: And their French cousin.
Abbott: French?
Costello: Goofé.
Abbott: Goofé Dean. Well, let's see, we have on the bags, Who's on first, What's on second, I Don't Know is on third...
Costello: That's what I want to find out.
Abbott: I say Who's on first, What's on second, I Don't Know's on third.

66 Dialogue act ambiguity
Who's on first? INFO-REQUEST or STATEMENT?

67 Dialogue Act ambiguity
Can you give me a list of the flights from Atlanta to Boston?
This looks like an INFO-REQUEST; if so, the answer is YES. But really it's a DIRECTIVE or REQUEST, a polite form of: Please give me a list of the flights… What looks like a QUESTION can be a REQUEST.

68 Dialogue Act ambiguity
Similarly, what looks like a STATEMENT can be a QUESTION:
User (OPEN-OPTION): I was wanting to make some arrangements for a trip that I'm going to be taking, uh, to LA, uh, the beginning of the week after next.
Agent (HOLD): OK, uh, let me pull up your profile and I'll be right with you here. [pause]
Agent (CHECK): And you said you wanted to travel next week?
User (ACCEPT): Uh, yes.

69 Indirect speech acts
Utterances which use a surface statement to ask a question, or a surface question to issue a request.

70 DA interpretation as statistical classification
Lots of clues in each sentence can tell us which DA it is:
Words and collocations: please or would you is a good cue for REQUEST; are you is a good cue for INFO-REQUEST
Prosody: rising pitch is a good cue for INFO-REQUEST; loudness/stress can help distinguish yeah/AGREEMENT from yeah/BACKCHANNEL
Conversational structure: yeah following a proposal is probably AGREEMENT; yeah following an INFORM is probably a BACKCHANNEL

71 Statistical classifier model of dialogue act interpretation
Our goal is to decide, for each sentence, which dialogue act it is. This is a classification task (a 1-of-N decision for each sentence, with N = the number of dialogue acts). We use three probabilistic models corresponding to the three kinds of cues from the input sentence:
Conversational structure: probability of one dialogue act following another, e.g., P(Answer | Question)
Words and syntax: probability of a sequence of words given a dialogue act, e.g., P("do you" | Question)
Prosody: probability of prosodic features given a dialogue act, e.g., P(rise at end of sentence | Question)
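Combining the three cue models (a sketch, assuming the cues are conditionally independent given the act, in the style of HMM-based dialogue-act tagging): d* = argmax_d P(d | d_prev) × P(W | d) × P(F | d), where d_prev is the preceding dialogue act, W the word sequence, and F the prosodic features.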

72 An example of dialogue act detection: Correction Detection
Despite all these clever confirmation/rejection strategies, dialogue systems still make mistakes (surprise!). If the system misrecognizes an utterance and either rejects it or, via confirmation, displays its misunderstanding, then the user has a chance to make a correction:
Repeating themselves
Rephrasing
Saying "no" to the confirmation question

73 Corrections
Unfortunately, corrections are harder to recognize than normal sentences! Swerts et al. (2000): corrections are misrecognized twice as often (in terms of WER) as non-corrections. Why? Prosody seems to be the largest factor: hyperarticulation, as in Liz Shriberg's English example "NO, I am DE-PAR-TING from Jacksonville" (there is also a German example from Bettina Braun, from a talking elevator).

74 A Labeled dialogue (Swerts et al)

75 Machine Learning and Classifiers
Given a labeled training set, we can build a classifier to label observations into classes:
Decision trees
Regression
SVMs
I won't introduce the algorithms here, but they are at the core of NLP/computational linguistics/speech/dialogue. You can learn them in AI (CS 121/221) or Machine Learning (CS 229).

76 Machine learning to detect user corrections
Build classifiers using features like:
Lexical information (the words "no," "correction," "I don't," swear words)
Prosodic features (increases in F0 range, pause duration, and word duration that correlate with hyperarticulation)
Length
ASR confidence
LM probability
Various dialogue features (e.g., repetition)
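A minimal sketch of such a correction detector (the feature names and numbers are hypothetical stand-ins for a labeled corpus; assumes scikit-learn):

# Train a classifier to flag user corrections from simple lexical,
# prosodic, and ASR features. All numbers below are made up.
from sklearn.linear_model import LogisticRegression

# each row: [says_no, f0_range, pause_duration, asr_confidence]
X = [[1, 48.0, 0.9, 0.31],   # hyperarticulated "NO, I am DE-PAR-TING..."
     [0, 21.0, 0.2, 0.88],   # ordinary answer
     [1, 39.0, 0.7, 0.42],
     [0, 18.0, 0.1, 0.91]]
y = [1, 0, 1, 0]             # 1 = correction, 0 = normal utterance

clf = LogisticRegression().fit(X, y)
print(clf.predict([[1, 45.0, 0.8, 0.35]]))  # -> [1], likely a correction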

77 Generating Dialogue Acts
Two dialogue acts to generate: confirmation and rejection.

78 Confirmation
Another reason for grounding: errors. Speech is a pretty errorful channel, even for humans, so humans use grounding to confirm that they heard correctly. ASR is far worse than humans, so dialogue systems need to do even more grounding and confirmation than humans do.

79 Explicit confirmation
S: Which city do you want to leave from?
U: Baltimore
S: Do you want to leave from Baltimore?
U: Yes

80 Explicit confirmation
U: I'd like to fly from Denver, Colorado to New York City on September 21st in the morning on United Airlines.
S: Let's see then. I have you going from Denver, Colorado to New York on September 21st. Is that correct?
U: Yes

81 Implicit confirmation: display
U: I'd like to travel to Berlin.
S: When do you want to travel to Berlin?

U: Hi, I'd like to fly to Seattle Tuesday morning.
S: Traveling to Seattle on Tuesday, August eleventh in the morning. Your name?

82 Implicit vs. Explicit
The two have complementary strengths:
Explicit: easier for users to correct the system's mistakes (they can just say "no"), but cumbersome and long
Implicit: much more natural, quicker, simpler (if the system guesses right)

83 Implicit and Explicit
Early systems were all-implicit or all-explicit; modern systems are adaptive. How to decide? The ASR system can give a confidence metric expressing how convinced the system is of its transcription of the speech:
If confidence is high, use implicit confirmation
If confidence is low, use explicit confirmation

84 Computing confidence
Simplest: use the acoustic log-likelihood of the user's utterance. More features:
Prosodic: utterances with longer pauses, F0 excursions, longer durations
Backoff: did we have to back off in the LM?
Cost of an error: use explicit confirmation before moving money or booking flights

85 Rejection
E.g., VoiceXML <nomatch>: "I'm sorry, I didn't understand that."
Reject when:
ASR confidence is low
The best interpretation is semantically ill-formed
Might have a four-tiered level of confidence:
Below the confidence threshold: reject
Above the threshold: explicit confirmation
Even higher: implicit confirmation
Higher still: no confirmation
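A sketch of that four-tiered policy (the threshold values are invented; real systems tune them on data):

# Map ASR confidence to a confirmation strategy. Thresholds are made up.
def confirmation_action(confidence):
    if confidence < 0.30:
        return "reject"                 # "I'm sorry, I didn't understand"
    elif confidence < 0.60:
        return "explicit_confirmation"  # "Do you want to leave from X?"
    elif confidence < 0.85:
        return "implicit_confirmation"  # "Traveling to X. Your name?"
    return "no_confirmation"

print(confirmation_action(0.72))  # -> implicit_confirmation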

86 Dialogue System Evaluation
A key point about spoken language processing: whenever we design a new algorithm or build a new application, we need to evaluate it. Two kinds of evaluation:
Extrinsic: embedded in some external task
Intrinsic: some more local evaluation
How do we evaluate a dialogue system? What constitutes success or failure for a dialogue system?

87 Dialogue System Evaluation
It turns out we need an evaluation metric for two reasons:
1) The normal reason: we need a metric to help us compare different implementations. We can't improve a system if we don't know where it fails, and we can't decide between two algorithms without a goodness metric.
2) A new reason: we will need a metric for "how good a dialogue went" as an input to reinforcement learning, to automatically improve conversational-agent performance via learning.

88 Evaluating Dialogue Systems
PARADISE framework (Walker et al. 2000): the "performance" of a dialogue system is affected both by what gets accomplished by the user and the dialogue agent, and by how it gets accomplished:
Maximize task success
Minimize costs: efficiency measures, qualitative measures
(Slide from Julia Hirschberg)

89 PARADISE evaluation again:
Maximize task success
Minimize costs: efficiency measures, quality measures
PARADISE = PARAdigm for DIalogue System Evaluation

90 Task Success
% of subtasks completed
Correctness of each question/answer/error message
Correctness of the total solution:
Attribute-Value Matrix (AVM)
Kappa coefficient
Users' perception of whether the task was completed

91 Task Success
Task goals are seen as an Attribute-Value Matrix (AVM). The ELVIS retrieval task (Walker et al. '97): "Find the time and place of your meeting with Kim."

Attribute            Value
Selection Criterion  Kim or Meeting
Time                 10:30 a.m.
Place                2D516

Task success can be defined by the match between the AVM values at the end of the task and the "true" values for the AVM. (Slide from Julia Hirschberg)
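One simple way to score this is the fraction of AVM attributes whose final values match the key (a simplification: PARADISE actually uses the kappa coefficient, which corrects for chance agreement):

# Score task success as the fraction of correctly filled AVM attributes.
def avm_match(result, key):
    correct = sum(result.get(attr) == value for attr, value in key.items())
    return correct / len(key)

key    = {"criterion": "Kim or Meeting", "time": "10:30 a.m.", "place": "2D516"}
result = {"criterion": "Kim or Meeting", "time": "10:30 a.m.", "place": "2D517"}
print(avm_match(result, key))  # -> 0.666..., two of three attributes correct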

92 Efficiency Cost
(Polifroni et al. 1992; Danieli and Gerbino 1995; Hirschman and Pao 1993)
Total elapsed time in seconds or turns
Number of queries
Turn correction ratio: the number of system or user turns used solely to correct errors, divided by the total number of turns

93 Quality Cost
# of times the ASR system failed to return any sentence
# of ASR rejection prompts
# of times the user had to barge in
# of time-out prompts
Inappropriateness (verbose, ambiguous) of the system's questions, answers, and error messages

94 Another key quality cost
"Concept accuracy" or "concept error rate": the % of semantic concepts that the NLU component returns correctly.
User: I want to arrive in Austin at 5:00
System understood: DESTCITY: Boston; TIME: 5:00. Concept accuracy = 50%.
Average this across the entire dialogue: "How many of the sentences did the system understand correctly?"

95 PARADISE: Regress against user satisfaction

96 Regressing against user satisfaction
A questionnaire assigns each dialogue a "user satisfaction rating": this is the dependent measure. The set of cost and success factors are the independent measures. Use regression to train a weight for each factor.

97 Experimental Procedures
Subjects are given specified tasks
Spoken dialogues are recorded
Cost factors, states, and dialogue acts are automatically logged; ASR accuracy and barge-in are hand-labeled
Users specify the task solution via a web page
Users complete user-satisfaction surveys
Use multiple linear regression to model user satisfaction as a function of task success and costs; test for significant predictive factors
(Slide from Julia Hirschberg)
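A sketch of the regression step (all numbers invented; assumes scikit-learn):

# Fit user satisfaction as a weighted sum of task success and costs,
# as PARADISE does with multiple linear regression. Data are made up.
from sklearn.linear_model import LinearRegression

# columns: [task completion COMP, mean recognition score MRS, elapsed time ET]
X = [[1, 0.90, 210],
     [1, 0.75, 300],
     [0, 0.60, 420],
     [1, 0.95, 180],
     [0, 0.50, 500]]
y = [4.5, 3.8, 2.1, 4.9, 1.7]   # survey-based satisfaction ratings

model = LinearRegression().fit(X, y)
print(model.coef_)   # learned weight for each factor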

98 User Satisfaction: Sum of Many Measures
Was the system easy to understand? (TTS Performance)
Did the system understand what you said? (ASR Performance)
Was it easy to find the message/plane/train you wanted? (Task Ease)
Was the pace of interaction with the system appropriate? (Interaction Pace)
Did you know what you could say at each point of the dialogue? (User Expertise)
How often was the system sluggish and slow to reply to you? (System Response)
Did the system work the way you expected it to in this conversation? (Expected Behavior)
Do you think you'd use the system regularly in the future? (Future Use)
(Adapted from Julia Hirschberg)

99 Performance Functions from Three Systems
ELVIS: User Sat. = .21*COMP + .47*MRS - .15*ET
TOOT: User Sat. = .35*COMP + .45*MRS - .14*ET
ANNIE: User Sat. = .33*COMP + .25*MRS + .33*Help
COMP: user perception of task completion (task success)
MRS: mean (concept) recognition score (cost)
ET: elapsed time (cost)
Help: help requests (cost)
(Slide from Julia Hirschberg)

100 Performance Model
Perceived task completion and mean recognition score (concept accuracy) are consistently significant predictors of user satisfaction. The performance model is useful for system development:
Making predictions about system modifications
Distinguishing "good" dialogues from "bad" dialogues
As part of a learning model

101 Now that we have a success metric
Could we use it to help drive learning? In recent work, this metric is used to learn an optimal policy or strategy for how the conversational agent should behave.

102 New Idea: Modeling a dialogue system as a probabilistic agent
A conversational agent can be characterized by:
The current knowledge of the system:
A set of states S the agent can be in
A set of actions A the agent can take
A goal G, which implies:
A success metric that tells us how well the agent achieved its goal
A way of using this metric to create a strategy or policy π for what action to take in any particular state

103 What do we mean by actions A and policies ?
Kinds of decisions a conversational agent needs to make: When should I ground/confirm/reject/ask for clarification on what the user just said? When should I ask a directive prompt, when an open prompt? When should I use user, system, or mixed initiative? 9/19/2018

104 A threshold is a human-designed policy!
Could we learn what the right action is (rejection, explicit confirmation, implicit confirmation, or no confirmation) by learning a policy which, given various information about the current state, dynamically chooses the action which maximizes dialogue success?

105 Another strategy decision
Open versus directive prompts; when to do mixed initiative.

106 Outline
The Linguistics of Conversation
Basic Conversational Agents: ASR, NLU, Generation, Dialogue Manager
Dialogue Manager Design: Finite State, Frame-based, Initiative (User, System, Mixed), VoiceXML, Information-State
Dialogue-Act Detection
Dialogue-Act Generation
Evaluation
Utility-based conversational agents: MDP, POMDP

107 END OF TODAY'S LECTURE
The following slides are an optional advanced discussion of Markov-decision-process dialogue systems.

108 Review: Open vs. Directive Prompts
Open prompt: the system gives the user very few constraints; the user can respond however they please:
"How may I help you?"
"How may I direct your call?"
Directive prompt: explicitly instructs the user how to respond:
"Say yes if you accept the call; otherwise, say no."

109 Review: Restrictive vs. Non-restrictive grammars
Restrictive grammar: a language model which strongly constrains the ASR system, based on dialogue state.
Non-restrictive grammar: an open language model which is not restricted to a particular dialogue state.

110 Kinds of Initiative
How do I decide which of these initiatives to use at each point in the dialogue?

Grammar          Open Prompt         Directive Prompt
Restrictive      Doesn't make sense  System Initiative
Non-restrictive  User Initiative     Mixed Initiative

111 Modeling a dialogue system as a probabilistic agent
A conversational agent can be characterized by:
The current knowledge of the system:
A set of states S the agent can be in
A set of actions A the agent can take
A goal G, which implies:
A success metric that tells us how well the agent achieved its goal
A way of using this metric to create a strategy or policy π for what action to take in any particular state

112 Goals are not enough
Goal: user satisfaction. OK, that's all very well, but:
Many things influence user satisfaction
We don't know the user's satisfaction until after the dialogue is done
How do we know, state by state and action by action, what the agent should do?
We need a more helpful metric that can apply to each state.

113 Utility
A utility function maps a state or state sequence onto a real number describing the goodness of that state, i.e., the resulting "happiness" of the agent.
Principle of Maximum Expected Utility: a rational agent should choose an action that maximizes the agent's expected utility.

114 Maximum Expected Utility
Principle of Maximum Expected Utility: a rational agent should choose an action that maximizes the agent's expected utility. An action A has possible outcome states Result_i(A); E is the agent's evidence about the current state of the world. Before doing A, the agent estimates the probability of each outcome, P(Result_i(A) | Do(A), E), and from this can compute the expected utility:
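In symbols (the standard Russell and Norvig definition): EU(A | E) = Σ_i P(Result_i(A) | Do(A), E) × U(Result_i(A)).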

115 Utility (Russell and Norvig)

116 Markov Decision Processes
Or MDP. Characterized by:
A set of states S an agent can be in
A set of actions A the agent can take
A reward r(a,s) that the agent receives for taking an action in a state
(plus some other things I'll come back to: gamma, and the state transition probabilities)

117 A brief tutorial example
Levin et al. (2000): a Day-and-Month dialogue system. Goal: fill in a two-slot frame (Month: November; Day: 12th) via the shortest possible interaction with the user.

118 What is a state?
In principle, an MDP state could include any possible information about the dialogue, e.g., the complete dialogue history so far. Usually we use a much more limited set:
Values of the slots in the current frame
The most recent question asked of the user
The user's most recent answer
ASR confidence
etc.

119 State in the Day-and-Month example
The state records the values of the two slots, day and month. In total:
2 special states: initial s_i and final s_f
365 states with both a day and a month, plus 1 state for the leap-year date
12 states with a month but no day
31 states with a day but no month
= 411 total states

120 Actions in MDP models of dialogue
Speech acts!
Ask a question
Explicit confirmation
Rejection
Give the user some database information
Tell the user their choices
Do a database query

121 Actions in the Day-and-Month example
a_d: a question asking for the day
a_m: a question asking for the month
a_dm: a question asking for the day and month together
a_f: a final action submitting the form and terminating the dialogue

122 A simple reward function
For this example, let's use a cost function for the entire dialogue. Let:
N_i = number of interactions (duration of the dialogue)
N_e = number of errors in the obtained values (0-2)
N_f = expected distance from the goal (0 for a complete date, 1 if either the day or the month is missing, 2 if both are missing)
Then the (weighted) cost is: C = w_i N_i + w_e N_e + w_f N_f
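A worked toy instance, with invented weights w_i = 1, w_e = 3, w_f = 2: a 2-turn dialogue that ends with a correct month but a missing day costs C = 1(2) + 3(0) + 2(1) = 4, while asking one more question and getting the day right costs C = 1(3) + 3(0) + 2(0) = 3, so the longer dialogue is cheaper under these weights.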

123 3 possible policies
[state diagrams for the three policies: a "dumb" policy, an open-prompt policy, and a directive-prompt policy]
P1 = probability of error in the open prompt
P2 = probability of error in a directive prompt

124 3 possible policies
[state diagrams: open-prompt vs. directive-prompt policies]
Strategy 3 is better than strategy 2 when the improved error rate justifies the longer interaction, where:
P1 = probability of error in the open prompt
P2 = probability of error in a directive prompt

125 That was an easy optimization
Only two actions, and only a tiny number of policies. In general, the number of actions, states, and policies is quite large, so finding the optimal policy π* is harder. We need reinforcement learning. Back to MDPs:

126 MDP
We can think of a dialogue as a trajectory in state space. The best policy π* is the one with the greatest expected reward over all trajectories. How do we compute the reward for a state sequence?

127 Reward for a state sequence
One common approach: discounted rewards. The cumulative reward Q of a sequence is the discounted sum of the utilities of the individual states, with a discount factor γ between 0 and 1. This makes the agent care more about current than future rewards: the further in the future a reward is, the more its value is discounted.
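In symbols, for a sequence of states and actions with rewards R(s_i, a_i): Q = R(s_0, a_0) + γ R(s_1, a_1) + γ² R(s_2, a_2) + …, with discount factor 0 ≤ γ ≤ 1.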

128 The Markov assumption
An MDP assumes that state transitions are Markovian: the next state depends only on the current state and action, P(s_{t+1} | s_t, a_t, s_{t-1}, a_{t-1}, …) = P(s_{t+1} | s_t, a_t).

129 Expected reward for an action
The expected cumulative reward Q(s,a) for taking a particular action from a particular state can be computed by the Bellman equation: the expected cumulative reward for a given state/action pair is the immediate reward for the current state, plus the expected discounted utility of all possible next states s', weighted by the probability of moving to each state s', and assuming that once there we take the optimal action a'.
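Written out, this is the standard form: Q(s, a) = R(s, a) + γ Σ_s' P(s' | s, a) max_a' Q(s', a').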

130 What we need for the Bellman equation
A model of P(s'|s,a) and an estimate of R(s,a). How do we get these? With labeled training data we can estimate P(s'|s,a) = C(s,s',a) / C(s,a), and we would know the final reward for a whole dialogue, R(s1,a1,s2,a2,…,sn). Given these parameters, we can use the value iteration algorithm to learn Q values (pushing reward values back over state sequences) and hence the best policy.
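A compact sketch of tabular value iteration (the tiny two-state MDP below is invented, standing in for transition and reward estimates learned from a dialogue corpus):

# Tabular value iteration: repeatedly apply the Bellman backup until
# the Q values settle. The MDP here is a made-up toy.
GAMMA = 0.9
states, actions = ["ask", "done"], ["question", "submit"]
P = {("ask", "question"):  {"ask": 0.2, "done": 0.8},
     ("ask", "submit"):    {"done": 1.0},
     ("done", "question"): {"done": 1.0},
     ("done", "submit"):   {"done": 1.0}}
R = {("ask", "question"): -1.0, ("ask", "submit"): -3.0,
     ("done", "question"): 0.0, ("done", "submit"): 0.0}

Q = {(s, a): 0.0 for s in states for a in actions}
for _ in range(100):  # iterate the Bellman backup to convergence
    Q = {(s, a): R[(s, a)] + GAMMA * sum(
             p * max(Q[(s2, a2)] for a2 in actions)
             for s2, p in P[(s, a)].items())
         for s in states for a in actions}

print(max(actions, key=lambda a: Q[("ask", a)]))  # best action in state "ask"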

131 Final reward
What is the final reward for a whole dialogue, R(s1,a1,s2,a2,…,sn)? This is exactly what an automatic evaluation metric like PARADISE computes: the overall goodness of a whole dialogue!

132 How to estimate p(s’|s,a) without labeled data
Have random conversations with real people:
Carefully hand-tune a small number of states and policies
Build a dialogue system which explores state space by generating a few hundred random conversations with real humans
Set the probabilities from this corpus
Have random conversations with simulated people:
Now you can have millions of conversations, so you can afford a slightly larger state space

133 An example
Singh, S., D. Litman, M. Kearns, and M. Walker. 2002. Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System. Journal of AI Research.
NJFun: a system where people asked questions about recreational activities in New Jersey. The idea of the paper: use reinforcement learning to make a small set of optimal policy decisions.

134 Very small # of states and acts
States: specified by the values of 8 features, including:
Which slot in the frame is being worked on (1-4)
ASR confidence value (0-5)
How many times the current slot's question has been asked
Restrictive vs. non-restrictive grammar
Result: 62 states
Actions: each state has only 2 possible actions:
When asking questions: system versus user initiative
When receiving answers: explicit versus no confirmation

135 Ran system with real users
311 conversations, with a simple binary reward function: 1 if the task was completed (finding museums, theater, or winetasting in the NJ area), 0 if not. The system learned a good dialogue strategy. Roughly:
Start with user initiative
Back off to mixed or system initiative when re-asking for an attribute
Confirm only at lower confidence values

136 State of the art
Only a few such systems, from (former) AT&T Laboratories researchers (now dispersed) and the Cambridge (UK) lab. Hot topics:
Partially Observable MDPs (POMDPs): we don't REALLY know the user's state (we only know what we THOUGHT the user said), so we need to take actions based on our BELIEF, i.e., a probability distribution over states, rather than the "true state."

