Dialogue Acts Julia Hirschberg LSA07 353 11/29/2018.

Today
- Recognizing structural information: Dialogue Acts vs. Discourse Structure
- Speech Acts → Dialogue Acts
- Coding schemes (DAMSL)
- Practical goals
- Identifying DAs
- Direct and indirect DAs: experimental results
- Corpus studies of DA disambiguation
- Automatic DA identification
- More corpus studies

Speech Acts
Wittgenstein '53, Austin '62, and Searle '75: contributions to dialogue are actions performed by speakers, e.g. "I promise to make you very very sorry for that." (performative verbs)
- Locutionary act: the act of conveying the 'meaning' of the sentence uttered (e.g. committing the Speaker to making the Hearer sorry)
- Illocutionary act: the act associated with the verb uttered (e.g. promising)
- Perlocutionary act: the act of producing an effect on the Hearer (e.g. threatening)

Searle's Classification Scheme
- Assertives: commit S to the truth of X (e.g. "The world is flat")
- Directives: attempt by S to get H to do X (e.g. "Open the window please")
- Commissives: commit S to do X (e.g. "I'll do it tomorrow")
- Expressives: S's description of his/her own feelings about X (e.g. "I'm sorry I screamed")
- Declarations: S brings about a change in the world by virtue of uttering X (e.g. "I divorce you", "I resign")

Dialogue Acts
- Roughly correspond to illocutionary acts
- Motivation: modeling spoken dialogue
- Many coding schemes (e.g. DAMSL)
- Many-to-many mapping between DAs and words: the Agreement DA can be realized by Okay, Um, Right, Yeah, … but each of these can express multiple DAs, e.g.
  S: You should take the 10pm flight.
  U: Okay …that sounds perfect.
     …but I'd prefer an earlier flight.
     …(I'm listening)
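The many-to-many mapping between cue words and DAs can be sketched as a small lookup table; the words and DA labels below are illustrative, not a full coding scheme:

```python
# Illustrative sketch of the many-to-many word/DA mapping described
# above; the word lists and DA labels are invented for this example.
CUE_TO_DAS = {
    "okay":  {"Agreement", "Backchannel", "CueBeginning"},
    "yeah":  {"Agreement", "Backchannel", "YesAnswer", "IncipientSpeaker"},
    "right": {"Agreement", "Backchannel"},
}

def possible_das(word):
    """All dialogue acts a cue word could realize (empty if unknown)."""
    return CUE_TO_DAS.get(word.lower(), set())

def realizations(da):
    """Invert the mapping: which cue words can express a given DA?"""
    return {w for w, das in CUE_TO_DAS.items() if da in das}
```

Disambiguating which DA a given token of "okay" actually realizes is the task the rest of these slides address.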

A Possible Coding Scheme for 'ok'
- Ritualistic? Closing / You're welcome / Other / No
- 3rd-Turn-Receipt? Yes / No
If Ritualistic == No, code all of these as well:
- Task Management: I'm done / I'm not done yet / None

- Topic Management: Starting new topic / Finished old topic / Pivot: finishing and starting
- Turn Management: Still your turn (= traditional backchannel) / Still my turn (= stalling for time) / I'm done, it is now your turn / None
- Belief Management: I accept your proposition / I entertain your proposition / I reject your proposition / Do you accept my proposition? (= ynq)

Practical Goals
In spoken dialogue systems:
- Disambiguate the current DA: represent user input correctly; respond appropriately
- Predict the next DA: switch language models for ASR; switch states in semantic processing
- Produce the DA for the next system turn appropriately
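Predicting the next DA (e.g. to pick the ASR language model for the upcoming user turn) can be sketched with a simple bigram model over DA sequences; the training dialogues and tag names here are invented for illustration:

```python
from collections import Counter, defaultdict

# Toy DA sequences (invented) standing in for a labeled dialogue corpus.
dialogues = [
    ["greeting", "question", "answer", "ack", "closing"],
    ["greeting", "question", "answer", "question", "answer", "closing"],
]

# Count DA bigrams: how often each DA follows each other DA.
bigrams = defaultdict(Counter)
for das in dialogues:
    for prev, nxt in zip(das, das[1:]):
        bigrams[prev][nxt] += 1

def predict_next_da(current_da):
    """Most frequent DA to follow current_da in the training dialogues."""
    counts = bigrams[current_da]
    return counts.most_common(1)[0][0] if counts else None
```

A real system would condition on more context than one previous DA, but even this bigram "grammar" is enough to switch recognition models between, say, question-expecting and answer-expecting states.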

Disambiguating Ambiguous DAs Intonationally
- Modal (can / would / would…be willing) questions:
  Can you move the piano?
  Would you move the piano?
  Would you be willing to move the piano?
- Nickerson & Chu-Carroll '99: Can info-requests be reliably disambiguated from action-requests? By prosodic information? What is the role of politeness?

Production Studies Design
- Subjects read ambiguous questions in disambiguating contexts
- Controlled for given/new and contrastiveness
- Polite/neutral/impolite readings
- ToBI-style labeling
Problems:
- Cells imbalanced; little data
- No pretesting; no distractors
- Same speaker reads both contexts
- No perception checks

Results
- Indirect requests (e.g. for action): if L%, more likely (73%) to be indirect; if H%, 46% were indirect. Differences in the height of the boundary tone?
- Politeness: can differs in impolite (higher rise) vs. neutral cases
- Speaker variability
- Some production differences, but limited utility in production of indirect DAs. Beware too steep a rise.

Corpus Studies: Jurafsky et al. '98
- Can we distinguish different DA functions for affirmative words?
- Lexical, acoustic/prosodic, and syntactic differentiators for yeah, ok, uhuh, mhmm, um, …
- Functional categories to distinguish:
  - Continuers: Mhmm (not taking the floor)
  - Assessments: Mhmm (tasty)
  - Agreements: Mhmm (I agree)
  - Yes answers: Mhmm (That's right)
  - Incipient speakership: Mhmm (taking the floor)

Questions
- Are these terms important cues to dialogue structure?
- Does prosodic variation help to disambiguate them?
- Is there any difference in the syntactic realization of certain DAs, compared to others?

Switchboard telephone conversation corpus
- Hand segmented and labeled with DA information (initially from text) using the SWBD-DAMSL dialogue tagset
- ~60 labels that could be combined in different dimensions
- 84% inter-labeler agreement on tags; tagset reduced to 42
- 7 CU-Boulder linguistics grad students labeled Switchboard conversations of human-to-human interaction

Relabeling from speech
- Only 2% of labels changed (114/5757)
- 43/987 continuers → agreements. Why? Shorter duration, lower F0, lower energy, longer preceding pause
- DAs analyzed for: lexical realization; F0 and intensity features; syntactic patterns

Results: Lexical Differences
- Agreements: yeah (36%), right (11%), …
- Continuers: uhuh (45%), yeah (27%), …
- Incipient speaker: yeah (59%), uhuh (17%), right (7%), …
- Yes-answers: yeah (56%), yes (17%), uhuh (14%), …

Prosodic and Lexico-Syntactic Cues
- Over all DAs, duration is the best differentiator (highly correlated with DA length in words)
- Assessments: ProTerm + Copula + (Intensifier) + Assessment Adjective, e.g. "That's X" (good, great, fine, …)
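The assessment template above (pro-term + copula + optional intensifier + assessment adjective) might be approximated with a regular expression; the word lists below are my own illustrative choices, not Jurafsky et al.'s actual lexicon:

```python
import re

# Rough sketch of the assessment template: pro-term + copula +
# optional intensifier + assessment adjective. All word lists are
# illustrative placeholders, not the paper's lexicon.
ASSESSMENT = re.compile(
    r"^(that|it|this)\s*'?s?\s+(is\s+)?"       # pro-term + copula ("that's", "it is")
    r"((really|very|pretty)\s+)?"              # optional intensifier
    r"(good|great|fine|tasty|right|cool)\b",   # assessment adjective
    re.IGNORECASE,
)

def is_assessment(utt):
    """True if the utterance matches the assessment pattern."""
    return bool(ASSESSMENT.match(utt.strip()))
```

In practice such patterns were one cue among many; duration and prosody still dominated overall DA discrimination.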

Observations
- Yeah (and variants) is ambiguous: agreement at 36%, incipient speaker at 59%, yes-answer at 86%
- Uh-huh (with its variants): a continuer at 45% (vs. yeah at 27%)
- Continuers (compared to agreements) are: shorter in duration; less intonationally 'marked'; preceded by longer pauses

Hypothesis
Prosodic information may be particularly helpful in distinguishing DAs with less lexical content

Automatic DA Detection
- Rosset & Lamel '04: Can we detect DAs automatically with minimal reliance on lexical content? (Lexicons are domain-dependent; ASR output is errorful)
- Corpora (3912 utts total): agent/client dialogues in a French bank call center, a French web-based stock exchange customer service center, and an English bank call center

DA tags (44), similar to DAMSL:
- Conventional (openings, closings)
- Information level (items related to the semantic content of the task)
- Forward Looking Function: statement (e.g. assert, commit, explanation); influence on Hearer (e.g. confirmation, offer, request)
- Backward Looking Function: agreement (e.g. accept, reject); understanding (e.g. backchannel, correction)
- Communicative Status (e.g. self-talk, change-mind)
NB: each utterance could receive a tag for each class, so utterances are represented as vectors. But only 197 combinations were observed.

Method: Memory-based learning (TiMBL)
- Uses all examples for classification; useful for sparse data
- Features: speaker identity; first 2 words of each turn; # utts in turn; previously proposed DA tags for utts in turn
Results:
- With true utt boundaries: ~83% accuracy on test data from the same domain; ~75% accuracy on test data from a different domain
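Memory-based learning in the spirit of TiMBL keeps every training example and labels a new instance by its nearest stored neighbor; below is a toy sketch with a simple feature-overlap metric and invented features (speaker, first two words, # utts in turn):

```python
# Toy instance-based (memory-based) classifier: store all training
# examples, label a new instance by maximal feature overlap (1-NN).
# Feature tuples and DA tags are invented for illustration.
train = [
    # (speaker, first_word, second_word, n_utts_in_turn) -> DA tag
    (("agent",  "can",  "you",  1), "request"),
    (("client", "yes",  "sure", 1), "accept"),
    (("client", "no",   "",     1), "reject"),
    (("agent",  "good", "bye",  1), "closing"),
]

def overlap(a, b):
    """Number of feature positions on which two instances agree."""
    return sum(x == y for x, y in zip(a, b))

def classify(instance):
    """Tag of the stored example with maximal overlap to the instance."""
    return max(train, key=lambda ex: overlap(ex[0], instance))[1]
```

Because every example is kept, rare but distinctive feature combinations are never smoothed away, which is why this family of methods suits sparse data.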

Which DAs are easiest/hardest to detect? On automatically identified utt units (3.3% ins, 6.6% del, 13.5% sub):

DA          GE.fr   CAP.fr  GE.eng
Resp-to     52.0%   33.0%   55.7%
Backch      75.0%   72.0%   89.2%
Accept      41.7%   26.0%   30.3%
Assert      66.0%   56.3%   50.5%
Expression  89.0%   69.3%   56.2%
Comm-mgt    86.8%   70.7%   59.2%
Task        85.4%   81.4%   78.8%

Conclusions
- Strong 'grammar' of DAs in spoken dialogue systems
- A few initial words perform as well as more

Phonetic, Prosodic, and Lexical Context Cues to DA Disambiguation
- Hypothesis: prosodic information may be important for disambiguating shorter DAs
- Observation: ASR errors suggest it would be useful to limit the role of lexical content in DA disambiguation as much as possible, and that this is feasible
- Experiment: Can people distinguish one (short) DA from another purely from phonetic/acoustic/prosodic cues? Are they better with lexical context?

The Columbia Games Corpus: Collection
- 12 spontaneous task-oriented dyadic conversations in Standard American English
- 2 subjects (a Describer and a Follower) playing a computer game, with no eye contact

The Columbia Games Corpus: Affirmative Cue Words
- alright, gotcha, huh, mm-hm, okay, right, uh-huh, yeah, yep, yes, yup
- Functions: Acknowledgment / Agreement; Backchannel; Cue beginning discourse segment; Cue ending discourse segment; Check with the interlocutor; Stall / Filler; Back from a task; Literal modifier; Pivot beginning; Pivot ending
- Word counts: the (4565), of (1534), okay (1151), and (886), like (753), …

Perception Study: Selection of Materials
Examples of the three functions studied (Cue beginning discourse segment; Acknowledgment / Agreement; Backchannel):
  Speaker 1: yeah um there's like there's some space there's
  Speaker 2: okay I think I got it okay
  Speaker 1: but it's gonna be below the onion
  Speaker 2: okay
  Speaker 1: okay alright I'll try it okay
  Speaker 2: okay the owl is blinking

Perception Study: Experiment Design
- 54 instances of 'okay' (18 for each function)
- 2 tokens for each 'okay':
  - Isolated condition: only the word 'okay'
  - Contextualized condition: 2 full speaker turns: the turn containing the target 'okay', and the previous turn by the other speaker

Perception Study: Experiment Design
- Two parts: Part 1, 54 isolated tokens; Part 2, 54 contextualized tokens
- Subjects were asked to classify each token of 'okay' as: Acknowledgment / Agreement, Backchannel, or Cue beginning discourse segment

Perception Study: Definitions Given to the Subjects
- Acknowledgment / Agreement: the function of okay that indicates "I believe what you said" and/or "I agree with what you say"
- Backchannel: the function of okay in response to another speaker's utterance that indicates only "I'm still here" or "I hear you and please continue"
- Cue beginning discourse segment: the function of okay that marks a new segment of a discourse or a new topic. This use of okay could be replaced by now.

Perception Study: Subjects and Procedure
- 20 paid subjects (10 female, 10 male); ages between 20 and 60
- Native speakers of English; no hearing problems
- GUI on a laboratory workstation with headphones

Results: Inter-Subject Agreement
Kappa measure of agreement with respect to chance (Fleiss '71):

                          Isolated   Contextualized
Overall                    .120       .294
Ack / Agree vs. Other      .089       .227
Backchannel vs. Other      .118       .164
Cue beginning vs. Other    .157       .497
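The kappa values above can be computed with Fleiss' (1971) formula; a standard implementation sketch, where ratings[i][j] is the number of raters who assigned item i to category j (every row must sum to the same number of raters):

```python
# Fleiss' kappa: chance-corrected agreement among a fixed number of
# raters classifying N items into k categories.
def fleiss_kappa(ratings):
    N = len(ratings)          # number of items
    n = sum(ratings[0])       # raters per item
    k = len(ratings[0])       # number of categories
    # Mean per-item agreement P-bar.
    p_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings
    ) / N
    # Expected chance agreement from marginal category proportions.
    p_j = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

Kappa of 1 means perfect agreement; 0 means agreement no better than chance, which is why the contextualized-condition values of .294 (overall) and .497 (cue beginning) represent a real improvement over the isolated condition.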

Results: Cues to Interpretation
- Phonetic transcription of okay, isolated condition: strong correlation between the realization of the initial vowel and the chosen function (one variant associated with Backchannel, another with Ack/Agree and Cue Beginning)
- Contextualized condition: no strong correlations found for phonetic variants

Results: Cues to Interpretation
- Ack / Agree: shorter /k/ (isolated); shorter latency between turns, shorter pause before okay (contextualized)
- Backchannel: higher final pitch slope, longer 2nd syllable, lower intensity (isolated); more words by S2 before okay, fewer words by S1 after okay (contextualized)
- Cue beginning: lower final pitch slope, lower overall pitch slope (isolated); longer latency between turns, more words by S1 after okay (contextualized)
(Pearson's r for % of subjects choosing an interpretation vs. each feature; t-tests to determine significance. S1 = utterer of the target 'okay'; S2 = the other speaker.)
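Pearson's r, used above to relate each feature to the percentage of subjects choosing an interpretation, can be computed directly; the helper below is a generic sketch (significance testing omitted), with invented data:

```python
import math

# Pearson product-moment correlation between two equal-length samples.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical use: latency before each token vs. % of subjects
# labeling it Cue beginning (values invented for illustration).
latencies = [0.1, 0.4, 0.8, 1.2]
pct_cue_beginning = [10, 35, 60, 85]
r = pearson_r(latencies, pct_cue_beginning)
```

An r near +1 or -1 would mark a feature as a strong cue to that interpretation, as the table claims for latency and pitch slope.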

Conclusions
- Agreement: availability of context improves inter-subject agreement; cue beginnings are easier to disambiguate than the other two functions
- Cues to interpretation: contextual features override word features, with one exception: the final pitch slope of okay matters in both conditions
- A guide to generation…

Summary: Dialogue Act Modeling for SDS
- DA identification looks potentially feasible, even when transcription is errorful; prosodic and lexical cues are both useful
- DA generation: descriptive results may, ironically, be more useful for generation than for recognition, guiding the choice of DA realization, lexical and prosodic

Next Class
- J&M 22.5
- Hirschberg et al. '04
- Goldberg et al. '03
- Krahmer et al. '01