Dialogue Systems Julia Hirschberg CS 4705 11/14/2018.

1 Dialogue Systems Julia Hirschberg CS 4705 11/14/2018

2 Today
Dialogue Systems and Human Conversation
Turns and Turn-taking
Speech Acts and Dialogue Acts
Grounding and Intentional Structure
Pragmatics
  Presupposition
  Conventional Implicature
  Conversational Implicature

3 Dialogue System Applications
Information providing: 800-BING-411, Google Mobile App, Amtrak’s Julie
Customer Care: T-Mobile’s Call Center, AT&T Call Routing
Training
Language tutoring: e.g. Carnegie Speech, KTH Ville
Other research platforms: e.g. ItSpoke at UPitt
Fun and games…
Goal: Emulate Human-Human Behavior?

4 Today
Dialogue Systems and Human Conversation
Turns and Turn-taking
Speech Acts and Dialogue Acts
Grounding and Intentional Structure
Pragmatics
  Presupposition
  Conventional Implicature
  Conversational Implicature

5 Turn-taking Behavior
Dialogue characterized by turn-taking
How do speakers know what to say and when to say it?
Conversational partners expect certain patterns of behavior in normal conversation
  Pat: You got an A? That’s great!
  Chris: Yeah, I’m really smart you know.
  Chris: Well, I was just lucky I happened to read the chapter on dialogue systems right before the test. Otherwise I never would have squeaked through.
Deviation is significant: dispreferred utterances

6 Children learn turn taking within first 2 years (Stern ’74)
General individual differences
  Shy people pause longer and speak less and less often (Pilkonis ’77)
  Schizophrenics, neurotics, depressed people less skilled in turn-taking

7 Cultural Differences in Turn-Taking
Chinese telephone conversations
  Openings (Zhu ’04): Mandarin vs. British
    Identification differences
      British self-report
      Chinese callees ask the caller
  Closings (Sun ’05)
    39 female-female Mandarin telephone conversations
    Closings initiated through matter-of-fact statement of intention to end conversation
    Verbalized thanking occurs except in mother/daughter closings – not the standard English model
Finnish business calls (Halmari ’93) vs. American
  Americans get right to the point
  Finns chat

8 Conversational Analysis (Sacks et al ’74)
Can we characterize expectations of ‘what to say’ more generally?
‘Rules’ of turn-taking
  If, during this turn, the current speaker has selected A as the next speaker, then A must speak next
  If the current speaker does not select the next speaker, any other speaker may take the next turn
  If no one else takes the next turn, the current speaker may take the next turn
Rules apply at Transition Relevance Places (TRPs), where something allows speaker changes to occur
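The three turn-taking rules can be read as an ordered decision procedure applied at each TRP. The following is a minimal sketch under that reading, not an implementation from Sacks et al.; the function name `next_speakers` and its parameters are hypothetical, and the speaker names are taken from the earlier example only for illustration.

```python
from typing import List, Optional, Sequence


def next_speakers(current: str,
                  selected: Optional[str],
                  self_selectors: Sequence[str]) -> List[str]:
    """Return who may take the floor at a TRP under the three rules."""
    # Rule 1: the current speaker selected A, so A must speak next.
    if selected is not None:
        return [selected]
    # Rule 2: nobody was selected, so any other speaker may take the turn.
    others = [s for s in self_selectors if s != current]
    if others:
        return others
    # Rule 3: no one else took the turn, so the current speaker may continue.
    return [current]


print(next_speakers("Pat", "Chris", ["Lee"]))  # rule 1 applies
print(next_speakers("Pat", None, ["Lee"]))     # rule 2 applies
print(next_speakers("Pat", None, []))          # rule 3 applies
```

The ordering of the checks is the point of the sketch: a later rule is only consulted when the earlier ones do not apply.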

9 Where Can Speaker Shifts Occur
Adjacency pairs
  Question/answer
  Greeting/greeting
  Compliment/downplayer
Dispreferred responses
  Silence
  ‘No’ to a simple request without explanation
  Changing the topic abruptly without transition
Important for Spoken Dialogue Systems

10 Diarization: Automatic Speaker Identification/Segmentation
Segment audio corpora (Broadcast News, meetings, telephone conversations) into speaker segments Speaker segmentation Speaker identification Speech and music Speaker segmentation (Diarization) Initial segmentation Segment clustering based on acoustic features State-of-the-art: 8.47% error 11/14/2018

11 Speaker identification
Linguistic information to identify speaker types and speaker names (LIMSI ’04)
  Templates (“<name> has this report from <location>”)
Results: 10.9% error on test set
  But only 10% of segments contain relevant patterns
  Estimate 25% error on broadcast news if segmentation and clustering is done to id all of each speaker’s segments

12 Turn-taking Behaviors Important for SDS
System understanding:
  Is the user backchanneling or is she taking the turn (does ‘ok’ mean ‘I agree’ or ‘I’m listening’)?
  Is this a good place for a system backchannel?
System generation:
  How to signal to the user that the system’s turn is over?
  How to signal to the user that a backchannel might be appropriate?

13 Types of Behavior
Smooth Switch: S1 is speaking and S2 speaks and takes and holds the floor
Hold: S1 is speaking, pauses, and continues to speak
Backchannel: S1 is speaking and S2 speaks -- to indicate continued attention -- not to take the floor (e.g. mhmm, ok, yeah)
How do people coordinate these behaviors with their interlocutor?
Acoustic-prosodic and lexical cues…
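To make the three labels concrete, here is a rough sketch that tags pause-separated units in a two-party transcript as hold, backchannel, or smooth switch. It uses only a small word list and the identity of the next speaker, which is an assumption made for illustration; the slide itself points to acoustic-prosodic and lexical cues, and a word like 'ok' is ambiguous in real dialogue. The names `BACKCHANNEL_TOKENS` and `label_turn_events` are hypothetical.

```python
# Assumed, deliberately tiny backchannel lexicon (illustration only).
BACKCHANNEL_TOKENS = {"mhmm", "uh-huh", "ok", "okay", "yeah"}


def label_turn_events(units):
    """units: list of (speaker, text), one entry per pause-separated unit.

    Returns (index, label) for every unit after the first.
    """
    events = []
    floor_holder = None
    for i, (speaker, text) in enumerate(units):
        if floor_holder is None:
            floor_holder = speaker  # the first unit just establishes the floor
            continue
        token = text.lower().strip(" .?!")
        next_speaker = units[i + 1][0] if i + 1 < len(units) else None
        if speaker == floor_holder:
            # Same speaker continues after a pause.
            events.append((i, "hold"))
        elif token in BACKCHANNEL_TOKENS and next_speaker == floor_holder:
            # Short acknowledgment, and the floor holder carries on afterwards.
            events.append((i, "backchannel"))
        else:
            # The other speaker takes and keeps the floor.
            events.append((i, "smooth switch"))
            floor_holder = speaker
    return events


dialogue = [
    ("S1", "I went to the station"),
    ("S1", "and bought a ticket"),
    ("S2", "mhmm"),
    ("S1", "then the train was late"),
    ("S2", "How late was it?"),
]
for index, label in label_turn_events(dialogue):
    print(index, label)
```

Looking ahead to the next speaker is only possible offline; a live system would have to decide from cues available at the pause itself.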

14 Smooth Switch, Backchannel, and Hold Differences

15 Today
Dialogue Systems and Human Conversation
Turns and Turn-taking
Speech Acts and Dialogue Acts
Grounding and Intentional Structure
Pragmatics
  Presupposition
  Conventional Implicature
  Conversational Implicature

16 Speech Act Theory (Austin, Searle)
Locutionary acts: the act of uttering (semantic meaning)
Illocutionary acts: the act S intends to convey by the utterance (e.g. request, promise, statement)
Perlocutionary acts: the rhetorical act S intends the utterance to produce on H (e.g. regret, fear, hope)
Indirect Speech Acts (a type of illocutionary act):
  It’s cold in here.
  Can you tell me the time.

17 NLP Speech Acts Often identified with illocutionary force
Can be indicated by performative verbs
  E.g. promise, order, ask, beseech, deny, apologize, curse
  NB: Perlocutionary force cannot (I convince you to vote for me for president)
Searle’s ’75 taxonomy (assertives, directives, commissives, expressives, declarations), now vastly expanded
  assertives = speech acts that commit a speaker to the truth of the expressed proposition, e.g. reciting a creed
  directives = speech acts that are to cause the hearer to take a particular action, e.g. requests, commands and advice
  commissives = speech acts that commit a speaker to some future action, e.g. promises and oaths
  expressives = speech acts that express the speaker's attitudes and emotions towards the proposition, e.g. congratulations, excuses and thanks
  declarations = speech acts that change the reality in accord with the proposition of the declaration, e.g. baptisms, pronouncing someone guilty or pronouncing someone husband and wife

18 Dialogue Acts in SDS Roughly correspond to Illocutionary acts
Motivation: Improving Spoken Dialogue Systems
Many coding schemes (e.g. DAMSL)
Many-to-many mapping between DAs and words
  Agreement DA can be realized by Okay, Um, Right, Yeah, …
  But each of these can express multiple DAs, e.g.
    S: You should take the 10pm flight.
    U: Okay
      …that sounds perfect.
      …but I’d prefer an earlier flight.
      …(I’m listening)

19 DA recognition important for
Turn recognition (which grammar to use when)
Turn disambiguation, e.g.
  S: What city do you want to go to?
  U1: Boston. (reply)
  U2: Boston? (request for information)
  S: Do you want to go to Boston?
  U1: Boston. (confirmation)
  U2: Boston? (question)
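The Boston example shows that the same word gets a different dialogue act depending on the preceding system prompt and on whether it is produced as a question. A small lookup table is enough to sketch that dependency. The labels come from the slide; the prompt-type names (`wh-question`, `yes-no-question`), the function name, and the use of a trailing question mark as a stand-in for rising intonation are all assumptions made for this illustration.

```python
# (type of the preceding system prompt, user turn is question-like) -> DA label
DISAMBIGUATION = {
    ("wh-question", False): "reply",
    ("wh-question", True): "request for information",
    ("yes-no-question", False): "confirmation",
    ("yes-no-question", True): "question",
}


def user_dialogue_act(prev_system_act: str, utterance: str) -> str:
    """Pick a DA label for a one-word user turn such as 'Boston.'"""
    # A trailing '?' stands in for rising intonation in this text-only sketch.
    question_like = utterance.rstrip().endswith("?")
    return DISAMBIGUATION[(prev_system_act, question_like)]


print(user_dialogue_act("wh-question", "Boston."))
print(user_dialogue_act("yes-no-question", "Boston?"))
```

In a spoken system the question-like flag would have to come from prosody rather than punctuation, since ASR output does not reliably carry a question mark.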

20 Automatic DA Detection
Rosset & Lamel ’04: Can we detect DAs automatically w/ minimal reliance on lexical content?
  Lexicons are domain-dependent
  ASR output is errorful
Corpora (3912 utts total)
  Agent/client dialogues in a French bank call center, in a French web-based stock exchange customer service center, in an English bank call center

21 DA tags (44)
Conventional (openings, closings)
Information level (items related to the semantic content of the task)
Forward Looking Function:
  statement (e.g. assert, commit, explanation)
  influence on Hearer (e.g. confirmation, offer, request)
Backward Looking Function:
  Agreement (e.g. accept, reject)
  Understanding (e.g. backchannel, correction)
Communicative Status (e.g. self-talk, change-mind)
NB: Each utt could receive a tag for each class, so utts represented as vectors
But… only 197 combinations observed

22 Method: Memory-based learning (TIMBL)
Uses all examples for classification
Useful for sparse data
Features
  Speaker identity
  First 2 words of each turn
  # utts in turn
  Previously proposed DA tags for utts in turn
Results
  With true utt boundaries:
    ~83% accuracy on test data from same domain
    ~75% accuracy on test data from different domain
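Memory-based learning keeps every training example and classifies a new instance by comparing it with the stored ones. The sketch below shows that idea with a plain nearest-neighbour lookup over the kinds of features listed above. It is not TiMBL's API and omits TiMBL's feature weighting; the overlap distance, the function names, the tag names, and the four stored examples are all made up for illustration.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions on which two instances disagree."""
    return sum(x != y for x, y in zip(a, b))


def classify(instance, memory, k=1):
    """memory: list of (features, label). Majority label among the k closest examples."""
    ranked = sorted(memory, key=lambda ex: overlap_distance(instance, ex[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Features: (speaker, first word, second word, # utts in turn, previous DA tag)
# Hypothetical examples, not taken from the Rosset & Lamel corpora.
MEMORY = [
    (("agent", "hello", "this", 1, "none"), "opening"),
    (("client", "yes", "please", 1, "request"), "accept"),
    (("client", "mhmm", "", 1, "assert"), "backchannel"),
    (("agent", "what", "is", 2, "opening"), "request"),
]

print(classify(("client", "yes", "thanks", 1, "request"), MEMORY))
```

Because nothing is abstracted away at training time, even a tag seen only once can still be retrieved, which is one reason this family of methods is described as useful for sparse data.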

23 Which DAs are easiest/hardest to detect?
On automatically identified utt units: 3.3% ins, 6.6% del, 13.5% sub
DA          GE.fr   CAP.fr  GE.eng
Resp-to     52.0%   33.0%   55.7%
Backch      75.0%   72.0%   89.2%
Accept      41.7%   26.0%   30.3%
Assert      66.0%   56.3%   50.5%
Expression  89.0%   69.3%   56.2%
Comm-mgt    86.8%   70.7%   59.2%
Task        85.4%   81.4%   78.8%

24 Conclusions
Strong ‘grammar’ of DAs in Spoken Dialogue systems
A few initial words perform as well as more

25 Today
Dialogue Systems and Human Conversation
Turns and Turn-taking
Speech Acts and Dialogue Acts
Grounding and Intentional Structure
Pragmatics
  Presupposition
  Conventional Implicature
  Conversational Implicature

26 Grounding (Stalnaker ’78, Clark & Schaefer ’89)
Common Ground: the set of propositions mutually believed by S and H
Principle of Closure: agents performing an action require evidence that they have succeeded – and S needs to know when s/he has succeeded in communicating
  Presentation of utterance by S
  Acceptance of utterance by H
How does grounding take place in conversation?

27 Grounding Strategies from Weak to Strong
S: I need to get your homework by Monday.
Continued attention
Next contribution: I should be finished Sunday night.
Acknowledgment: Mhmm…
Demonstration: You need this soon.
Display: You need to get my homework Monday.

28 Discourse Structure and Intention
Welcome to word processing. That’s using a computer to type letters and reports. Make a typo? No problem. Just back up, type over the mistake, and it’s gone. And, it eliminates retyping. And, it eliminates retyping. 11/14/2018

29 Structures of Discourse Structure (Grosz & Sidner ‘86)
Leading alternative to Rhetorical Structure Theory
Provides for multiple levels of analysis: S’s purpose as well as content of utterances and S and H’s attentional state
Identifies only a few, general relations that hold among intentions
Three components:
  Linguistic structure
  Intentional structure
  Attentional structure

30 Linguistic Structure What is actually said/written
How is this represented? Assume discourse is segmented into Discourse Segments (DS) -- how? what is basic unit of analysis? segmentation agreement automatic segmentation Embedding relations: topic structure Cue phrases Stopped here 11/14/2018

31 Intentional Structure
Discourse purpose (DP): basic purpose of the discourse
Discourse segment purposes (DSPs): how this segment contributes to the overall DP
Segment relations:
  Satisfaction-precedence: DSP1 must be satisfied before DSP2
  Dominance: DSP1 dominates DSP2 if fulfilling DSP2 constitutes part of fulfilling DSP1
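The two segment relations can be sketched as a small data structure: dominance as a tree of purposes, and satisfaction-precedence as a set of ordered pairs. This is only an illustration under stated assumptions. The class `DSP`, the helper `respects_precedence`, and the tea-making purposes are hypothetical, and treating dominance as transitive is a simplification of the sketch rather than something stated on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class DSP:
    name: str
    children: List["DSP"] = field(default_factory=list)  # DSPs this one dominates

    def dominates(self, other: "DSP") -> bool:
        """True if `other` sits anywhere below this DSP in the hierarchy."""
        return any(c is other or c.dominates(other) for c in self.children)


def respects_precedence(order: List[str], precedes: Set[Tuple[str, str]]) -> bool:
    """Check that for each (a, b) pair, a is satisfied before b in `order`."""
    position = {name: i for i, name in enumerate(order)}
    return all(position[a] < position[b] for a, b in precedes)


# Hypothetical task: fulfilling the two sub-purposes is part of fulfilling the DP.
boil = DSP("boil water")
steep = DSP("steep tea")
make_tea = DSP("make tea", children=[boil, steep])
precedes = {("boil water", "steep tea")}

print(make_tea.dominates(steep))
print(respects_precedence(["boil water", "steep tea"], precedes))
```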

32 Attentional State
Focus stack:
  Stack of focus spaces, each containing objects, properties and relations salient during each DS, plus the DSP (content plus purpose)
  State changes modeled by transition rules controlling the addition/deletion of focus spaces
  Information at lower levels may or may not be available at higher levels
  Focus spaces are
    pushed onto the stack when new DS or embedded DS (e.g. DS that are dominated by other DS) are begun
    popped when they are completed
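The push and pop behavior of the focus stack maps directly onto an ordinary stack. The following is a minimal sketch, with a hypothetical class `FocusStack` and made-up segment purposes and entities. It assumes that every lower focus space stays accessible from the top, which is one possible choice; the slide notes that lower-level information may or may not be available.

```python
class FocusStack:
    """Stack of focus spaces; each space holds a DSP and its salient entities."""

    def __init__(self):
        self._spaces = []

    def push(self, dsp):
        # A new (possibly embedded) discourse segment begins.
        self._spaces.append({"dsp": dsp, "entities": set()})

    def pop(self):
        # The current discourse segment is completed.
        return self._spaces.pop()

    def add_entity(self, entity):
        # Record an entity as salient in the current segment.
        self._spaces[-1]["entities"].add(entity)

    def salient(self):
        """Entities reachable from the top of the stack, newest space first."""
        seen, ordered = set(), []
        for space in reversed(self._spaces):
            for entity in sorted(space["entities"]):
                if entity not in seen:
                    seen.add(entity)
                    ordered.append(entity)
        return ordered


stack = FocusStack()
stack.push("make tea")
stack.add_entity("kettle")
stack.push("boil water")      # embedded segment dominated by "make tea"
stack.add_entity("water")
print(stack.salient())
stack.pop()                   # "boil water" is completed
print(stack.salient())
```

Searching the stack from the top down is what lets a pronoun or definite description prefer entities from the most recently opened segment.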

33 Limits of G&S ’86
Assumes that discourses are task-oriented
Assumes there is a single, hierarchical structure shared by S and H
How do we identify entities that are salient (on the focus stack)? Grammatical function?
Do people really build such structures when they converse? Use them in interpreting what others say?

34 How are these structures recognized from a discourse?
Linguistic markers:
  tense and aspect
  cue phrases
  intonational variation
Inference of S intentions
Inference from task structure
Intonational Information

35 Today
Dialogue Systems and Human Conversation
Turns and Turn-taking
Speech Acts and Dialogue Acts
Grounding and Intentional Structure
Pragmatics
  Presupposition
  Conventional Implicature
  Conversational Implicature

36 Implicit Information Question interpretation in SDS
S: Are you traveling to La Guardia? U: I’m going to New York. U: When does the 5 o’clock train leave from Newark? S : <U believes there is a 5 o’clock train from Newark.> S: I heard you say New York City? U: New York City? 11/14/2018

37 Cooperative responses in SDS
Correcting misconceptions
  U: When does the 5 o’clock train leave from Newark?
  S (thinks): <U believes there is a 5 o’clock train from Newark>
  S: There is no 5 o’clock train from Newark; there is a 5:20 tho.
Providing more information than is asked for
  U: Do I have the $500 minimum in that account?
  S1: Yes.
  S2: You have $739.
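Correcting a misconception amounts to checking the user's presupposition against what the system knows before answering. Here is a rough sketch of that check for the train example. The function names, the 24-hour `HH:MM` time format, and the timetable values are assumptions made for illustration; a real system would query its own schedule back end.

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def corrective_response(presupposed, departures, station):
    """Return a correction if no train departs at `presupposed`, else None."""
    if presupposed in departures:
        return None  # the presupposition holds, so answer the question normally
    if not departures:
        return f"There are no trains from {station}."
    # Offer the closest real departure as the cooperative alternative.
    nearest = min(departures,
                  key=lambda d: abs(to_minutes(d) - to_minutes(presupposed)))
    return f"There is no {presupposed} train from {station}; there is a {nearest}."


timetable = ["16:10", "17:20", "18:45"]  # hypothetical departures
print(corrective_response("17:00", timetable, "Newark"))
```

Returning `None` when the presupposition holds keeps the correction step separate from ordinary question answering.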

38 Discourse Pragmatics
Context-dependent meaning, invited inference, intended meaning – vs. “propositional content”
Indirect Speech Acts
Presupposition
Implicature
  Conversational
  Conventional

39 Presupposition
What is ‘taken for granted’, given some linguistic expression X
  The King of France is bald. (Is there a King of France?)
  All of Herman’s children are bright. (Does Herman have children?)
Linguistic Test: Negative, interrogative, and embedded X preserve the same assumption
  The King of France is not bald.
  Is the King of France bald?
  I thought that the King of France was bald.

40 Presuppositions can be suspended but they cannot be felicitously denied
All of Herman’s children are bright, if he indeed has children.
*All of Herman’s children are bright, though he has no children.

41 Presupposition and SDS
Presuppositional information adds facts/beliefs to the dialogue history
Information to store and check for accuracy
  My wife will also be a driver (S has a spouse)
  My number is (S has a telephone account)
  I’ll take the red-eye (S believes there is a red-eye)
  I’m upset about being charged for a call to Ethiopia (S was charged for a call to Ethiopia)
  I’m a bachelor. (S is an unmarried male person)

42 Conversational Implicature
H. Paul Grice: Conversation is not formal logic
  ‘and’ is not ‘^’, ‘or’ is not ‘v’, ‘some’ is not ‘∃’
    George got married and had a baby.
    Was it a boy or a girl?
    Some people sent baby gifts.
Principles of Cooperative Conversation: Make your conversational contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged

43 Maxims of Cooperative Conversation
Maxim of Quantity:
  1. Make your contribution as informative as is required (for the current purposes of the exchange)
  2. Do not make your contribution more informative than is required.
Maxim of Quality: Try to make your contribution one that is true.
  1. Do not say what you believe to be false.
  2. Do not say that for which you lack adequate evidence.
Maxim of Relation: Be relevant

44 Maxim of Manner: Be perspicuous
  1. Avoid obscurity of expression.
  2. Avoid ambiguity.
  3. Be brief (avoid unnecessary prolixity).
  4. Be orderly.
Maxims may be
  Observed: John got into Columbia and won a scholarship.
  Violated quietly: I never said that.
  Flouted: He has excellent handwriting….

45 Speakers may not be able to observe all maxims simultaneously
Implicature interpretation requires both S and H to understand the CP and Maxims
That which S licenses and H infers via the CP and the Maxims
  A. I got an A on that exam.
  B. And I’m Queen Marie of Rumania.
  A. Where did you go?
  B. Out.

46 A: Where does Arnold live? B: Somewhere in southern California.

47 Other Implicatures
Generalized Conversational, e.g. indefinites
  A car ran over John’s foot. (not John’s car)
  John broke a foot yesterday. (John’s foot)
  John broke a nose yesterday. (not his own)
Conventional
  George is short but brave.
  George is short; therefore he is brave.

48 Summary
Dialogue Systems and Human Conversation
Turns and Turn-taking
Speech Acts and Dialogue Acts
Grounding and Intentional Structure
Pragmatics
  Presupposition
  Conventional Implicature
  Conversational Implicature

49 Spoken Language Processing
These are only a few of the challenges of Spoken Language Processing (CS 4706)
How does it go beyond CS 4705?
  Speech analysis tools and techniques
  Deception, charisma, emotional speech, medical states
  Speech technologies
    Text-to-Speech
    Automatic Speech Recognition
    Speaker ID
    Language and dialect ID

50 Project Build a Spoken Dialogue System of your own
Choose the domain and task
Build a speech recognizer, a text-to-speech synthesis system, and a dialogue manager (from libraries)
Demo your system and maybe win a prize

51 Next Class Review for the Final Exam 11/14/2018

