Recognizing Structure: Dialogue Acts and Segmentation


Recognizing Structure: Dialogue Acts and Segmentation Julia Hirschberg CS 6998 2/25/2019

Today
Recognizing structural information from speech
  Topic structure
  Speech/dialogue acts
Applications
  Speech browsing and search of large corpora
    Broadcast News (NIST TREC SDR track)
    Topic Detection and Tracking (NIST/DARPA TDT)
    Customer care call recordings, focus groups, voicemail

SCAN

SCAN demo

Discourse Structure and Topic Structure
Intention-based accounts: Grosz & Sidner ’86
Conversational moves (games): Edinburgh map task dialogues
Adjacency pairs: Schegloff, Sacks, Jefferson

Indicators of Topic Structure
Cue phrases: now, well, first
Pronominal reference
Orthography and formatting (in text)
Lexical information (Hearst ’94, Reynar ’98, Beeferman et al. ’99)
In speech?

Prosodic Correlates of Discourse/Topic Structure
Pitch range: Lehiste ’75, Brown et al. ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al. ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96
Preceding pause: Lehiste ’79, Chafe ’80, Brown et al. ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96

Brown et al. ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
Rate: Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
Amplitude: Brown et al. ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
Contour: Brown et al. ’83, Woodbury ’87, Swerts et al. ’92
Add Audix tree??

Prosodic Cues to Sentence and Topic Boundaries: Shriberg et al. ’00
Prosodic cues perform as well as or better than text-based cues at topic segmentation, and generalize better?
Goal: identify sentence and topic boundaries at ASR-defined word boundaries
  CART decision trees provided boundary predictions
  HMM combined these with lexical boundary predictions
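
The slide says an HMM combined the tree's boundary predictions with lexical ones. As a much simpler stand-in (plainly not the HMM described), the sketch below linearly interpolates two per-boundary posterior estimates and thresholds the result. The function names, the weight `lam`, the threshold, and the example probabilities are all made up for illustration.

```python
def combine_posteriors(p_prosody, p_lm, lam=0.5):
    """Linearly interpolate two lists of P(boundary) estimates.

    lam is the weight on the prosodic model (hypothetical value);
    this is a simplification, not the HMM combination from the slides.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    if len(p_prosody) != len(p_lm):
        raise ValueError("both models must score the same boundaries")
    return [lam * pp + (1.0 - lam) * pl for pp, pl in zip(p_prosody, p_lm)]


def decide(posteriors, threshold=0.5):
    """Mark a boundary wherever the combined posterior reaches the threshold."""
    return [p >= threshold for p in posteriors]


# Invented scores for four candidate word boundaries.
p_prosody = [0.10, 0.80, 0.30, 0.90]
p_lm = [0.05, 0.40, 0.60, 0.95]

combined = combine_posteriors(p_prosody, p_lm, lam=0.5)
boundaries = decide(combined)
print(boundaries)
```

In practice the weight would be tuned on held-out data; it is fixed here only to keep the example short.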

Features
For each potential boundary location:
  Pause at boundary (raw and normalized by speaker)
  Pause at word before boundary (is this a new ‘turn’ or part of a continuous speech segment?)
  Phone and rhyme duration (normalized by inherent duration) (phrase-final lengthening?)
  F0 (smoothed and stylized): reset, range (topline, baseline), slope and continuity
  Voice quality (halving/doubling estimates as correlates of creak or glottalization)
  Speaker change, time from start of turn, # turns in conversation and gender
Raw pause worked better than normalized
Speaker ID/change apparently hand-marked
F0 reset is measured before and after the potential boundary; range is defined over the preceding word and compared to speaker-specific parameters
Trained/tested on Switchboard and Broadcast News
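
As a rough illustration of the pause feature, the sketch below computes the pause at each candidate boundary from word start and end times and, as one possible normalization, divides it by the same speaker's mean pause. The `Word` record, the `pause_features` name, the example timings, and the mean-based normalization are assumptions; the slides do not say how the normalization was done.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Word:
    speaker: str
    text: str
    start: float  # seconds
    end: float    # seconds


def pause_features(words):
    """Return one dict per inter-word boundary with raw and normalized pause."""
    rows = []
    for prev, nxt in zip(words, words[1:]):
        rows.append({
            "after": prev.text,
            "speaker": prev.speaker,
            "speaker_change": prev.speaker != nxt.speaker,
            # Silence between the two words, clamped at 0 for overlaps.
            "pause": max(0.0, nxt.start - prev.end),
        })

    # Hypothetical normalization: divide by the speaker's own mean pause.
    by_speaker = defaultdict(list)
    for row in rows:
        by_speaker[row["speaker"]].append(row["pause"])
    for row in rows:
        avg = mean(by_speaker[row["speaker"]])
        row["pause_norm"] = row["pause"] / avg if avg > 0 else 0.0
    return rows


# Invented word timings for two speakers.
words = [
    Word("A", "okay", 0.00, 0.30),
    Word("A", "so", 0.35, 0.50),
    Word("A", "today", 1.10, 1.50),
    Word("B", "right", 1.60, 1.90),
]
feats = pause_features(words)
for row in feats:
    print(row)
```

The long pause after "so" stands out in both the raw and the normalized value, which is the kind of cue a boundary classifier could pick up.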

Sentence segmentation results
Prosodic features
  Better than LM for BN
  Worse (on transcription) and same for ASR transcript on SB
  All better than chance

Sentence boundaries, BN (error rate, %)
Model           ASR    Trans
Chance (nonb)   13.3   6.2
Comb HMM        11.7   3.3
LM only         11.8   4.1
Pros only       10.9   3.6

Sentence boundaries, SB (error rate, %)
Model           ASR    Trans
Chance (nonb)   25.8   11.0
Comb HMM        22.5   4.0
LM only         22.8   4.3
Pros only       22.9   6.7

Useful features for BN: pause at boundary, turn/no turn, F0 difference across boundary, rhyme duration
Useful features for SB: phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn

Topic segmentation results (BN only)
Useful features: pause at boundary, F0 range, turn/no turn, gender, time in turn
Prosody alone better than LM
Combined model improves significantly

Topic boundaries, BN (error rate)
Chance (nonb): .3
Model       ASR     Trans
Comb HMM    .1438   .1377
LM only     .1897   .1895
Pros only   .1731   .1657
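
To make the sentence-boundary comparison concrete, the relative error reduction of the combined HMM over the LM-only model can be computed from the figures quoted in the results above. The sketch takes them to be error rates in percent, with the first value of each pair from ASR output and the second from transcripts (an assumption about the flattened table). The helper name `rel_reduction` is illustrative.

```python
def rel_reduction(baseline, system):
    """Relative error reduction of `system` over `baseline`, as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline - system) / baseline


# (ASR, Trans) pairs as quoted in the sentence-boundary results.
results = {
    "BN": {"LM only": (11.8, 4.1), "Comb HMM": (11.7, 3.3)},
    "SB": {"LM only": (22.8, 4.3), "Comb HMM": (22.5, 4.0)},
}

reductions = {}
for corpus, rows in results.items():
    for i, condition in enumerate(("ASR", "Trans")):
        r = rel_reduction(rows["LM only"][i], rows["Comb HMM"][i])
        reductions[(corpus, condition)] = r
        print(f"{corpus} {condition}: {100 * r:.1f}% relative reduction")
```

On these numbers the gain from adding prosody is larger on transcripts than on ASR output for both corpora.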

Speech Act Theory
John Searle
Locutionary acts: semantic meaning
Illocutionary acts: ask, promise, answer, threaten
Perlocutionary acts: effect intended to be produced on the hearer: regret, fear
Dialogue acts
  Many tagging schemes (e.g. DAMSL)

Practical Motivations: Spoken Dialogue Systems
Add more information about speaker intentions
Disambiguate ambiguous utterances
  Okay
  Um
  Right

Experimental Evidence: Nickerson & Chu-Carroll ’99
Can/would/would..willing questions
  Can you move the piano?
  Would you move the piano?
  Would you be willing to move the piano?
À la Sag & Liberman ’75: can intonation disambiguate?

Experiments
Production studies:
  Subjects read ambiguous questions in disambiguating contexts
  Control for given/new and contrastiveness
  Polite/neutral/impolite
Problems:
  Cells imbalanced
  No pretesting
  No distractors
  Same speaker reads both contexts

Results
Indirect requests
  If L%, more likely (73%) to be indirect
  46% H%: differences in height of boundary tone?
Politeness: "can" differs in impolite (higher rise) vs. neutral
Variation in speaker strategy

Corpus Studies: Jurafsky et al. ’98
Lexical, acoustic/prosodic/syntactic differentiators for yeah, ok, uhuh, mhmm, um…
  Continuers: Mhmm (not taking floor)
  Assessments: Mhmm (tasty)
  Agreements: Mhmm (I agree)
  Yes answers: Mhmm (That’s right)
  Incipient speakership: Mhmm (taking floor)

Corpus Study
Switchboard telephone conversation corpus
  Hand segmented and labeled with DA information (initially from text)
  Relabeled for this study
Analyzed for
  Lexical realization
  F0 and rms features
  Syntactic patterns

Results: Lexical Differences
Agreements: yeah (36%), right (11%),...
Continuer: uhuh (45%), yeah (27%),…
Incipient speaker: yeah (59%), uhuh (17%), right (7%),…
Yes-answer: yeah (56%), yes (17%), uhuh (14%),...
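
The percentages above are word distributions within each dialogue act. A small sketch shows why the word alone is a weak cue: inverting them with Bayes' rule under a uniform prior over the four DAs leaves "yeah" spread across all four classes. The uniform prior is an assumption (the slide gives no DA priors), words hidden behind the "…" are treated as probability 0 purely for simplicity, and the names `p_word_given_da` and `da_posterior` are illustrative.

```python
# P(word | DA) as listed on the slide; elided words are treated as 0 here.
p_word_given_da = {
    "agreement": {"yeah": 0.36, "right": 0.11},
    "continuer": {"uhuh": 0.45, "yeah": 0.27},
    "incipient speaker": {"yeah": 0.59, "uhuh": 0.17, "right": 0.07},
    "yes-answer": {"yeah": 0.56, "yes": 0.17, "uhuh": 0.14},
}


def da_posterior(word, prior=None):
    """P(DA | word) by Bayes' rule; uniform prior unless one is supplied."""
    das = list(p_word_given_da)
    if prior is None:
        prior = {da: 1.0 / len(das) for da in das}  # assumed, not from the slides
    joint = {da: p_word_given_da[da].get(word, 0.0) * prior[da] for da in das}
    total = sum(joint.values())
    if total == 0:
        return {}  # word not listed for any DA
    return {da: p / total for da, p in joint.items()}


for word in ("yeah", "uhuh", "yes"):
    post = da_posterior(word)
    print(word, {da: round(p, 2) for da, p in post.items()})
```

With real DA priors the posteriors would shift, but "yeah" would still be compatible with every class, which is the motivation for looking at prosodic and syntactic cues as well.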

Results: Prosodic and Syntactic Cues
Relabeling from speech produces only 2% changed labels overall (114/5757)
  43/987 continuers --> agreements
  Why? Shorter duration, lower F0, lower energy, longer preceding pause
Over all DAs, duration is the best differentiator, but…
  Highly correlated with length in words
Assessments: That’s X (good, great, fine,…)

Future Work
Speaker differences?
Higher-level prosodic differences among ambiguous-word DAs?

Next Week
Turn-taking and disfluencies