Recognizing Structure: Dialogue Acts and Segmentation


Julia Hirschberg, CS 6998, 9/19/2018

Today
- Recognizing structural information from speech
  - Topic structure
  - Speech/dialogue acts
- Applications
  - Speech browsing and search of large corpora
    - Broadcast News (NIST TREC SDR track)
    - Topic Detection and Tracking (NIST/DARPA TDT)
    - Customer care, focus groups, voicemail
  - Spoken Dialogue Systems

SCAN

SCANMail Demo: Basic Layout

SCANMail Demo: Number Extraction

Discourse Structure and Topic Structure
- Intention-based accounts: Grosz & Sidner ’86
- Conversational moves (games): Edinburgh Map Task dialogues
- Adjacency pairs: Schegloff, Sacks, Jefferson

Indicators of Topic Structure
- Cue phrases: now, well, first
- Pronominal reference
- Orthography and formatting (in text)
- Lexical information (Hearst ’94, Reynar ’98, Beeferman et al. ’99)
- In speech?
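To make "lexical information" as a topic cue concrete, here is a minimal sketch in the spirit of Hearst’s lexical-cohesion approach: compare word overlap in windows on either side of each gap and treat deep similarity valleys as topic shifts. It is a simplification, not the published algorithm: the whitespace-level tokenizer, the `window` size, and the fixed `threshold` are assumptions (Hearst derives the cutoff from the depth-score distribution), and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased word tokens (no stopword removal or stemming in this sketch)."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[w] for w, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_scores(sentences, window=2):
    """Similarity across each gap between sentence i-1 and sentence i,
    pooling up to `window` sentences on each side."""
    bags = [Counter(tokenize(s)) for s in sentences]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores):
    """Depth of each valley: climb left and right while scores keep rising,
    then sum the drops from the two peaks down to the gap's score."""
    depths = []
    for i, s in enumerate(scores):
        lpeak, j = s, i - 1
        while j >= 0 and scores[j] >= lpeak:
            lpeak, j = scores[j], j - 1
        rpeak, j = s, i + 1
        while j < len(scores) and scores[j] >= rpeak:
            rpeak, j = scores[j], j + 1
        depths.append((lpeak - s) + (rpeak - s))
    return depths


def boundaries(sentences, window=2, threshold=0.3):
    """Indices of sentences that start a new topic segment (hypothetical cutoff)."""
    depths = depth_scores(gap_scores(sentences, window))
    return [i + 1 for i, d in enumerate(depths) if d >= threshold]
```

For example, `boundaries(["the cat sat on the mat", "a cat likes the mat", "stock prices fell on monday", "investors sold stock shares"])` places a single boundary before the third sentence, where vocabulary overlap collapses. The "In speech?" question on the slide is exactly where this breaks down: ASR errors and the absence of orthography weaken these lexical cues, which motivates the prosodic cues that follow.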

Prosodic Correlates of Discourse/Topic Structure
- Pitch range: Lehiste ’75, Brown et al. ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al. ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96
- Preceding pause: Lehiste ’79, Chafe ’80, Brown et al. ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96

Brown et al. ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
- Rate: Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
- Amplitude: Brown et al. ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
- Contour: Brown et al. ’83, Woodbury ’87, Swerts et al. ’92

Prosodic Cues to Sentence and Topic Boundaries: Shriberg et al. ’00
- Prosodic cues perform as well as or better than text-based cues at topic segmentation. Do they also generalize better?
- Goal: identify sentence and topic boundaries at ASR-defined word boundaries
- CART decision trees provided prosodic boundary predictions
- An HMM combined these with lexical (language model) boundary predictions
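The slide says an HMM combined the decision-tree predictions with lexical ones. The core idea behind such a combination can be sketched per boundary: if the words and the prosody are assumed conditionally independent given the boundary label, then P(B | W, F) ∝ P(B | W) · P(B | F) / P(B). This is a simplified stand-in, not Shriberg et al.’s HMM (which decodes over the whole word sequence with a hidden-event language model); the function names, the clamping constant `eps`, and the 0.5 decision threshold are assumptions.

```python
def combine_posteriors(p_lm, p_pros, prior, eps=1e-6):
    """P(boundary | words, prosody) from two per-boundary posteriors.

    Assumes lexical and prosodic evidence are conditionally independent
    given the boundary label, so the prior is divided out once.
    """
    # Clamp to avoid 0/0 when one source says 0 and the other says 1.
    p_lm = min(max(p_lm, eps), 1 - eps)
    p_pros = min(max(p_pros, eps), 1 - eps)
    yes = p_lm * p_pros / prior
    no = (1 - p_lm) * (1 - p_pros) / (1 - prior)
    return yes / (yes + no)


def segment(p_lm_seq, p_pros_seq, prior, threshold=0.5):
    """Indices of word boundaries whose combined posterior exceeds `threshold`."""
    return [i for i, (a, b) in enumerate(zip(p_lm_seq, p_pros_seq))
            if combine_posteriors(a, b, prior) > threshold]
```

A useful sanity property: when both sources simply report the prior, the combined posterior is the prior again. When boundaries are rare (a low prior), two moderately confident sources can reinforce each other past the threshold, e.g. `segment([0.9, 0.1, 0.6], [0.8, 0.2, 0.3], prior=0.2)` marks boundaries 0 and 2.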

Features
- For each potential boundary location:
  - Pause at boundary (raw and normalized by speaker)
  - Pause at word before boundary (is this a new ‘turn’ or part of a continuous speech segment?)
  - Phone and rhyme duration, normalized by inherent duration (phrase-final lengthening?)
  - F0 (smoothed and stylized): reset, range (topline, baseline), slope and continuity
- Raw pause worked better than normalized pause
- Speaker ID/change apparently hand-marked
- F0 reset is measured before and after the potential boundary; range is defined over the preceding word and compared to speaker-specific parameters
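Several of these boundary features are simple to compute once words carry time stamps. The sketch below derives raw and speaker-normalized pause, pause before the preceding word, turn change, and a crude F0 reset. Assumptions, all hypothetical: the `Word` record, a single mean F0 per word (the study used smoothed, stylized F0 over windows), z-scoring pauses per speaker, and measuring reset as the log ratio of F0 after versus before the boundary.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float    # seconds
    end: float      # seconds
    speaker: str
    f0: float       # hypothetical per-word mean F0 in Hz; 0 if unvoiced


def boundary_features(words):
    """One feature dict per inter-word boundary (the boundary after words[i])."""
    pauses = [max(0.0, b.start - a.end) for a, b in zip(words, words[1:])]
    # Per-speaker pause statistics, keyed by the speaker before the boundary.
    by_speaker = {}
    for a, p in zip(words, pauses):
        by_speaker.setdefault(a.speaker, []).append(p)
    stats = {s: (statistics.mean(ps), statistics.pstdev(ps)) for s, ps in by_speaker.items()}

    feats = []
    for i, p in enumerate(pauses):
        a, b = words[i], words[i + 1]
        mu, sd = stats[a.speaker]
        feats.append({
            "pause": p,                                   # raw pause at boundary
            "pause_norm": (p - mu) / sd if sd > 0 else 0.0,
            "prev_pause": pauses[i - 1] if i > 0 else 0.0,  # pause before the preceding word
            "turn_change": a.speaker != b.speaker,
            # Log F0 ratio across the boundary; positive means pitch goes up (reset).
            "f0_reset": math.log(b.f0 / a.f0) if a.f0 > 0 and b.f0 > 0 else 0.0,
        })
    return feats
```

Both raw and normalized pause are returned because the slide reports that raw pause worked better; a classifier such as a CART tree can then choose between them.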

- Voice quality (pitch halving/doubling estimates as correlates of creak or glottalization)
- Speaker change, time from start of turn, number of turns in conversation, and gender
- Trained/tested on Switchboard and Broadcast News
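The voice-quality feature relies on the fact that creaky or glottalized phonation often makes pitch trackers jump by an octave. A minimal sketch of that idea flags frames whose F0 is close to half or double the previous voiced frame and reports the jump rate. The relative tolerance `tol`, the convention that unvoiced frames are `<= 0`, and the function names are assumptions; this is not the exact estimator used in the study.

```python
def octave_jumps(f0_track, tol=0.1):
    """Indices of voiced frames whose F0 is within `tol` (relative) of half or
    double the previous voiced frame's F0. Frames with f0 <= 0 are unvoiced."""
    jumps = []
    prev = None
    for i, f0 in enumerate(f0_track):
        if f0 <= 0:
            continue  # skip unvoiced frames; compare against the last voiced one
        if prev is not None:
            ratio = f0 / prev
            if any(abs(ratio / target - 1.0) <= tol for target in (0.5, 2.0)):
                jumps.append(i)
        prev = f0
    return jumps


def jump_rate(f0_track, tol=0.1):
    """Fraction of voiced frames that are halving/doubling jumps."""
    voiced = sum(1 for f in f0_track if f > 0)
    return len(octave_jumps(f0_track, tol)) / voiced if voiced else 0.0
```

For instance, `octave_jumps([200, 202, 100, 0, 98, 200, 201])` returns `[2, 5]`: a halving at frame 2 and a doubling at frame 5, with the unvoiced frame skipped.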

Sentence segmentation results
- Prosodic features:
  - Better than the LM for BN
  - Worse (on true transcripts) and about the same (on ASR output) for SB
  - All models better than chance
- Useful features for BN: pause at boundary, turn/no turn, F0 difference across boundary, rhyme duration
- ASR = recognized words; Trans = true transcripts

BN sentence boundaries
Model                  ASR     Trans
Chance (no boundary)   13.3    6.2
Combined HMM           11.7    3.3
LM only                11.8    4.1
Prosody only           10.9    3.6

SB sentence boundaries
Model                  ASR     Trans
Chance (no boundary)   25.8    11.0
Combined HMM           22.5    4.0
LM only                22.8    4.3
Prosody only           22.9    6.7

BN topic boundaries
Model                  ASR     Trans
Chance (no boundary)   .3
Combined HMM           .1438   .1377
LM only                .1897   .1895
Prosody only           .1731   .1657
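One way to read the BN sentence-boundary table is as the fraction of the chance error that each model removes. The snippet below does that arithmetic on the figures from that table, treating them as error rates (chance is the highest value in each column). The dictionary layout and function name are illustrative only.

```python
# BN sentence-boundary figures from the results slide, as (ASR, Trans) pairs.
BN_SENTENCE = {
    "chance": (13.3, 6.2),
    "combined": (11.7, 3.3),
    "lm_only": (11.8, 4.1),
    "prosody_only": (10.9, 3.6),
}


def relative_reduction(err, chance):
    """Fraction of the chance error removed by a model."""
    return (chance - err) / chance


c_asr, c_trans = BN_SENTENCE["chance"]
for name in ("lm_only", "prosody_only", "combined"):
    asr, trans = BN_SENTENCE[name]
    print(f"{name:13s} ASR {relative_reduction(asr, c_asr):.1%}  "
          f"Trans {relative_reduction(trans, c_trans):.1%}")
```

This gives roughly 11% (ASR) and 34% (Trans) for the LM alone versus 18% and 42% for prosody alone, which is the sense in which prosody beats the LM on BN. On ASR words the combined model (about 12%) does not reach prosody alone.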

Useful features for SB: phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn

Topic segmentation results (BN only)
- Useful features: pause at boundary, F0 range, turn/no turn, gender, time in turn
- Prosody alone better than LM
- Combined model improves significantly

Speech Act Theory (John Searle)
- Locutionary acts: semantic meaning
- Illocutionary acts: ask, promise, answer, threaten
- Perlocutionary acts: effect intended to be produced on the hearer, e.g. regret, fear
- Dialogue acts: many tagging schemes (e.g. DAMSL)

Practical Motivations: Spoken Dialogue Systems
- Add more information about speaker intentions
- Disambiguate ambiguous utterances: okay, um, right

Experimental Evidence: Nickerson & Chu-Carroll ’99
- Can/would/would...willing questions:
  - Can you move the piano?
  - Would you move the piano?
  - Would you be willing to move the piano?
- À la Sag & Liberman ’75: can intonation disambiguate?

Experiments
- Production studies: subjects read ambiguous questions in disambiguating contexts
- Control for given/new and contrastiveness
- Polite/neutral/impolite
- Problems:
  - Cells imbalanced
  - No pretesting
  - Same speaker reads both contexts
  - No distractors

Results
- Indirect requests: with L%, more likely (73%) to be indirect; with H%, 46%. Differences in height of boundary tone?
- Politeness: "can" differs in impolite (higher rise) vs. neutral
- Variation in speaker strategy

Corpus Studies: Jurafsky et al. ’98
- Lexical, acoustic/prosodic, and syntactic differentiators for yeah, ok, uhuh, mhmm, um...
- Continuers: Mhmm (not taking floor)
- Assessments: Mhmm (tasty)
- Agreements: Mhmm (I agree)
- Yes answers: Mhmm (That’s right)
- Incipient speakership: Mhmm (taking floor)

Corpus Study
- Switchboard telephone conversation corpus
- Hand-segmented and labeled with DA information (initially from text)
- Relabeled for this study
- Analyzed for:
  - Lexical realization
  - F0 and RMS features
  - Syntactic patterns
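The "RMS features" above are energy measurements. As a minimal sketch, frame-level RMS energy is the square root of the mean squared sample value in each analysis frame. The frame length and hop are assumptions: 200 samples with an 80-sample hop correspond to 25 ms frames every 10 ms at the 8 kHz sampling rate of telephone speech such as Switchboard.

```python
import math


def frame_rms(samples, frame_len=200, hop=80):
    """RMS energy of each full frame; trailing samples that do not fill a
    frame are ignored."""
    out = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        out.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return out
```

Utterance-level features such as mean or maximum energy over a backchannel token can then be taken from this sequence, alongside the corresponding F0 statistics.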

Results: Lexical Differences
- Agreements: yeah (36%), right (11%), ...
- Continuers: uhuh (45%), yeah (27%), ...
- Incipient speakers: yeah (59%), uhuh (17%), right (7%), ...
- Yes-answers: yeah (56%), yes (17%), uhuh (14%), ...

Results: Prosodic and Syntactic Cues
- Relabeling from speech changes only 2% of labels overall (114/5757)
  - 43/987 continuers --> agreements
  - Why? Shorter duration, lower F0, lower energy, longer preceding pause
- Across all DAs, duration is the best differentiator, but it is highly correlated with length in words
- Assessments: That’s X (good, great, fine, ...)

Future Work
- Speaker differences?
- Higher-level prosodic differences among the DAs of ambiguous words?

A Coding Scheme for ‘ok’
- Ritualistic? Closing / You’re Welcome / Other / No
- 3rd-Turn-Receipt? Yes
- If Ritualistic == No, code all of these as well:

  - Task Management: I’m done / I’m not done yet / None
  - Topic Management: Starting new topic / Finished old topic / Pivot: finishing and starting
  - Turn Management: Still your turn (= traditional backchannel) / Still my turn (= stalling for time) / I’m done, it is now your turn
  - Belief Management: I accept your proposition / I entertain your proposition / I reject your proposition / Do you accept my proposition? (= y/n question)
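A labeling tool for this scheme could encode the categories as enumerations and check the conditional rule "if Ritualistic == No, code all of these as well". The sketch below is one hypothetical encoding. The class and member names are invented for illustration, and treating 3rd-Turn-Receipt as a yes/no flag is an assumption (the slide lists only "Yes").

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Ritualistic(Enum):
    CLOSING = "Closing"
    YOURE_WELCOME = "You're Welcome"
    OTHER = "Other"
    NO = "No"


class TaskMgmt(Enum):
    DONE = "I'm done"
    NOT_DONE = "I'm not done yet"
    NONE = "None"


class TopicMgmt(Enum):
    NEW = "Starting new topic"
    FINISHED = "Finished old topic"
    PIVOT = "Pivot: finishing and starting"


class TurnMgmt(Enum):
    STILL_YOURS = "Still your turn"             # traditional backchannel
    STILL_MINE = "Still my turn"                # stalling for time
    HANDOVER = "I'm done, it is now your turn"


class BeliefMgmt(Enum):
    ACCEPT = "I accept your proposition"
    ENTERTAIN = "I entertain your proposition"
    REJECT = "I reject your proposition"
    QUERY = "Do you accept my proposition?"     # y/n question


@dataclass
class OkLabel:
    ritualistic: Ritualistic
    third_turn_receipt: bool                    # assumption: coded as yes/no
    task: Optional[TaskMgmt] = None
    topic: Optional[TopicMgmt] = None
    turn: Optional[TurnMgmt] = None
    belief: Optional[BeliefMgmt] = None

    def is_complete(self) -> bool:
        """Ritualistic tokens need no further coding; otherwise all four
        management dimensions must be filled in."""
        if self.ritualistic is not Ritualistic.NO:
            return True
        return all(v is not None for v in (self.task, self.topic, self.turn, self.belief))
```

Keeping the four management dimensions as separate optional fields, rather than one combined label, mirrors the slide’s point that a single ‘ok’ can do task, topic, turn, and belief work at the same time.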

Next Week
- Turn-taking and disfluencies