Recognizing Structure: Dialogue Acts and Segmentation

Julia Hirschberg CS 6998 9/19/2018

Today
Recognizing structural information from speech:
- Topic structure
- Speech/dialogue acts
Applications:
- Speech browsing and search of large corpora
  - Broadcast News (NIST TREC SDR track)
  - Topic Detection and Tracking (NIST/DARPA TDT)
  - Customer care, focus groups, voicemail
- Spoken Dialogue Systems

SCAN

SCANMail Demo: Basic Layout

SCANMail Demo: Number Extraction

Discourse Structure and Topic Structure
- Intention-based accounts: Grosz & Sidner ’86
- Conversational moves (games): Edinburgh Map Task dialogues
- Adjacency pairs: Schegloff, Sacks, Jefferson

Indicators of Topic Structure
- Cue phrases: now, well, first
- Pronominal reference
- Orthography and formatting (in text)
- Lexical information (Hearst ’94, Reynar ’98, Beeferman et al. ’99)
- In speech?

Prosodic Correlates of Discourse/Topic Structure
- Pitch range: Lehiste ’75, Brown et al. ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al. ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96
- Preceding pause: Lehiste ’79, Chafe ’80, Brown et al. ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96

- Rate: Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
- Amplitude: Brown et al. ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
- Contour: Brown et al. ’83, Woodbury ’87, Swerts et al. ’92
Add Audix tree??

Prosodic Cues to Sentence and Topic Boundaries: Shriberg et al ’00
- Prosodic cues perform as well as or better than text-based cues at topic segmentation. Do they also generalize better?
- Goal: identify sentence and topic boundaries at ASR-defined word boundaries
- CART decision trees provided prosodic boundary predictions
- An HMM combined these with lexical boundary predictions
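The combination step can be sketched as a two-state HMM over the event after each word (boundary `B` or no boundary `N`), decoded with Viterbi. This is a simplified sketch under stated assumptions, not Shriberg et al.'s hidden-event language model: both the lexical and the prosodic model are assumed to output per-position posteriors P(event | evidence), which are divided by the event prior to give scaled likelihoods, and a hypothetical event-bigram table (`trans`) supplies the transitions. The function name `viterbi_events` and all numbers are illustrative.

```python
import math

EVENTS = ("N", "B")  # N = no boundary, B = boundary after the word


def viterbi_events(lex_post, pros_post, prior, trans):
    """Most likely sequence of boundary events, one per word boundary.

    lex_post, pros_post: per-position dicts {event: P(event | evidence)}
    prior: {event: P(event)}, used to turn posteriors into scaled likelihoods
    trans: {(prev, cur): P(cur | prev)}; the first position uses the prior
    """
    def emit(i, e):
        # Treat lexical and prosodic evidence as conditionally independent
        # given the event, and add their log scaled likelihoods P(e|x)/P(e).
        return (math.log(lex_post[i][e] / prior[e])
                + math.log(pros_post[i][e] / prior[e]))

    score = [{e: math.log(prior[e]) + emit(0, e) for e in EVENTS}]
    back = [{}]
    for i in range(1, len(lex_post)):
        score.append({})
        back.append({})
        for e in EVENTS:
            prev = max(EVENTS, key=lambda p: score[i - 1][p] + math.log(trans[(p, e)]))
            score[i][e] = score[i - 1][prev] + math.log(trans[(prev, e)]) + emit(i, e)
            back[i][e] = prev
    # Trace back from the best-scoring final event.
    path = [max(EVENTS, key=lambda e: score[-1][e])]
    for i in range(len(lex_post) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy numbers, purely illustrative.
prior = {"N": 0.9, "B": 0.1}
trans = {("N", "N"): 0.88, ("N", "B"): 0.12, ("B", "N"): 0.97, ("B", "B"): 0.03}
lex = [{"N": 0.7, "B": 0.3}, {"N": 0.95, "B": 0.05}, {"N": 0.4, "B": 0.6}]
pros = [{"N": 0.2, "B": 0.8}, {"N": 0.9, "B": 0.1}, {"N": 0.5, "B": 0.5}]
print(viterbi_events(lex, pros, prior, trans))  # ['B', 'N', 'B']
```

Dividing each posterior by the prior keeps the boundary prior from being counted once per model when the two models are multiplied; in this sketch the prior enters only through the first position and the transition table.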

Features: for each potential boundary location
- Pause at boundary (raw and normalized by speaker)
- Pause at the word before the boundary (is this a new 'turn' or part of a continuous speech segment?)
- Phone and rhyme duration, normalized by inherent duration (phrase-final lengthening?)
- F0 (smoothed and stylized): reset, range (topline, baseline), slope and continuity
Notes:
- Raw pause worked better than normalized pause
- Speaker ID/change was apparently hand-marked
- F0 reset is measured before and after the potential boundary; range is defined over the preceding word and compared to speaker-specific parameters
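A few of these features can be computed from a time-aligned word sequence. This is a minimal sketch under stated assumptions: each word is assumed to carry start/end times, a speaker label, and a mean F0 value in Hz; the `Word` type, the z-score form of speaker normalization, and the log ratio used as an F0 reset measure are illustrative choices, not the exact definitions used by Shriberg et al. Both raw and normalized pause are computed, since the slide compares them.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds
    speaker: str
    f0: float     # hypothetical: mean F0 over the word, in Hz


def boundary_features(words):
    """Features for each potential boundary between words[i] and words[i + 1]."""
    # Raw pause at each boundary (0 if the words abut or overlap).
    pauses = [max(0.0, words[i + 1].start - words[i].end) for i in range(len(words) - 1)]

    # Per-speaker pause statistics, keyed by the speaker before the boundary.
    by_speaker = {}
    for i, p in enumerate(pauses):
        by_speaker.setdefault(words[i].speaker, []).append(p)
    stats = {s: (mean(ps), pstdev(ps)) for s, ps in by_speaker.items()}

    feats = []
    for i, p in enumerate(pauses):
        before, after = words[i], words[i + 1]
        mu, sd = stats[before.speaker]
        feats.append({
            "pause": p,
            "pause_norm": (p - mu) / sd if sd > 0 else 0.0,  # z-score within speaker
            "prev_pause": pauses[i - 1] if i > 0 else 0.0,    # pause before the preceding word
            "f0_reset": math.log(after.f0 / before.f0),       # > 0 means F0 jumps up
            "turn_change": before.speaker != after.speaker,
        })
    return feats


words = [
    Word("so", 0.00, 0.20, "A", 180.0),
    Word("yeah", 0.25, 0.60, "A", 150.0),
    Word("okay", 1.40, 1.80, "B", 210.0),
]
for f in boundary_features(words):
    print(f)
```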

- Voice quality (F0 halving/doubling estimates as correlates of creak or glottalization)
- Speaker change, time from start of turn, number of turns in conversation, and gender
Trained/tested on Switchboard and Broadcast News
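Halving and doubling can be flagged from a frame-level F0 track by counting frame-to-frame jumps close to a factor of 0.5 or 2. A minimal sketch; the function name, the 10% tolerance, and the convention that 0 marks an unvoiced frame are assumptions for illustration, not the feature definition used in the study.

```python
def halving_doubling_rate(f0_track, tol=0.1):
    """Fraction of consecutive voiced frame pairs whose F0 ratio is near 0.5 or 2.

    f0_track: F0 estimates in Hz per frame; 0 marks an unvoiced frame (assumed).
    tol: relative tolerance around the ratios 0.5 and 2.0 (hypothetical default).
    """
    pairs = jumps = 0
    for prev, cur in zip(f0_track, f0_track[1:]):
        if prev <= 0 or cur <= 0:
            continue  # skip pairs that involve an unvoiced frame
        pairs += 1
        ratio = cur / prev
        if abs(ratio - 2.0) <= 2.0 * tol or abs(ratio - 0.5) <= 0.5 * tol:
            jumps += 1
    return jumps / pairs if pairs else 0.0


# One halving (202 -> 101) and one doubling (98 -> 196) among 4 voiced pairs.
print(halving_doubling_rate([200, 202, 101, 100, 0, 98, 196]))  # 0.5
```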

Sentence segmentation results
- Prosodic features alone: better than the LM for BN; worse than the LM for SB on true transcripts, and about the same on ASR transcripts
- All models better than chance
- Useful features for BN: pause at boundary, turn/no turn, F0 difference across boundary, rhyme duration
- Useful features for SB: phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn

BN sentence boundaries (error rate, %):
Model                  ASR     Transcript
Chance (no boundary)   13.3    6.2
Combined HMM           11.7    3.3
LM only                11.8    4.1
Prosody only           10.9    3.6

SB sentence boundaries (error rate, %):
Model                  ASR     Transcript
Chance (no boundary)   25.8    11.0
Combined HMM           22.5    4.0
LM only                22.8    4.3
Prosody only           22.9    6.7

BN topic boundaries (segmentation cost):
Model                  ASR     Transcript
Chance (no boundary)   .3      .3
Combined HMM           .1438   .1377
LM only                .1897   .1895
Prosody only           .1731   .1657
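The sentence-segmentation tables compare each model with a chance baseline that never predicts a boundary. Assuming the score is the percentage of candidate word boundaries that are misclassified, that baseline's error equals the percentage of word boundaries that are true sentence boundaries, which is why it differs between corpora and conditions. A minimal sketch (the function name and toy labels are illustrative):

```python
def boundary_error_rate(reference, hypothesis):
    """Percentage of candidate word boundaries whose label is wrong.

    Each list holds one bool per candidate boundary (True = sentence boundary).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must cover the same boundaries")
    errors = sum(r != h for r, h in zip(reference, hypothesis))
    return 100.0 * errors / len(reference)


ref = [False, False, True, False, False, False, False, True, False, False]
always_no = [False] * len(ref)  # the "chance (no boundary)" baseline
model = [False, False, True, False, True, False, False, True, False, False]
print(boundary_error_rate(ref, always_no))  # 20.0: both true boundaries are missed
print(boundary_error_rate(ref, model))      # 10.0: one false alarm, no misses
```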

Topic segmentation results (BN only):
- Useful features: pause at boundary, F0 range, turn/no turn, gender, time in turn
- Prosody alone better than LM
- Combined model improves significantly over either model alone
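Topic segmentation is scored with a cost rather than a raw error rate. A sketch of a TDT-style segmentation cost, assuming the usual windowed formulation: a window of k words slides over the text; a miss is a window whose ends lie in different reference segments but the same hypothesized segment; a false alarm is the reverse; and C_seg = C_miss * P_miss * P_seg + C_fa * P_fa * (1 - P_seg). The parameter values (C_miss = C_fa = 1, P_seg = 0.3, k = 3) and the function name are assumptions for illustration. Under them, a hypothesis with no topic boundaries has P_miss = 1 and P_fa = 0, so its cost is exactly P_seg = 0.3.

```python
def segmentation_cost(ref_starts, hyp_starts, n_words, k=3,
                      c_miss=1.0, c_fa=1.0, p_seg=0.3):
    """TDT-style segmentation cost over a sliding window of k words.

    ref_starts, hyp_starts: sets of word indices at which a new topic begins.
    k, c_miss, c_fa, p_seg: illustrative defaults, not values from the slides.
    """
    def segment_ids(starts):
        ids, cur = [], 0
        for i in range(n_words):
            if i > 0 and i in starts:
                cur += 1
            ids.append(cur)
        return ids

    ref, hyp = segment_ids(ref_starts), segment_ids(hyp_starts)
    misses = ref_diff = false_alarms = ref_same = 0
    for i in range(n_words - k):
        same_in_ref = ref[i] == ref[i + k]
        same_in_hyp = hyp[i] == hyp[i + k]
        if same_in_ref:
            ref_same += 1
            false_alarms += int(not same_in_hyp)  # hypothesis splits a reference segment
        else:
            ref_diff += 1
            misses += int(same_in_hyp)  # hypothesis misses a reference boundary
    p_miss = misses / ref_diff if ref_diff else 0.0
    p_fa = false_alarms / ref_same if ref_same else 0.0
    return c_miss * p_miss * p_seg + c_fa * p_fa * (1 - p_seg)


ref = {6}  # one topic change, at word 6 of 12
print(segmentation_cost(ref, set(), 12))          # 0.3: no boundaries at all
print(segmentation_cost(ref, {6}, 12))            # 0.0: exact match
print(round(segmentation_cost(ref, {7}, 12), 3))  # 0.217: off by one word
```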

Speech Act Theory (John Searle)
- Locutionary acts: semantic meaning
- Illocutionary acts: ask, promise, answer, threaten
- Perlocutionary acts: effect intended to be produced on the hearer: regret, fear
Dialogue acts
- Many tagging schemes (e.g. DAMSL)

Practical Motivations: Spoken Dialogue Systems
- Add more information about speaker intentions
- Disambiguate ambiguous utterances: Okay, Um, Right

Experimental Evidence: Nickerson & Chu-Carroll ’99
- Can/would/would...willing questions:
  Can you move the piano?
  Would you move the piano?
  Would you be willing to move the piano?
- À la Sag & Liberman ’75: can intonation disambiguate?

Experiments: Production studies
- Subjects read ambiguous questions in disambiguating contexts
- Control for given/new and contrastiveness
- Polite/neutral/impolite
- Problems: cells imbalanced; no pretesting; no distractors; same speaker reads both contexts

Results
- Indirect requests: if L%, more likely (73%) to be indirect
- 46% H%: differences in height of boundary tone?
- Politeness: 'can' questions differ between impolite (higher rise) and neutral
- Variation in speaker strategy

Corpus Studies: Jurafsky et al. ’98
- Lexical, acoustic/prosodic, and syntactic differentiators for yeah, ok, uhuh, mhmm, um, ...
- Continuers: Mhmm (not taking floor)
- Assessments: Mhmm (tasty)
- Agreements: Mhmm (I agree)
- Yes answers: Mhmm (That's right)
- Incipient speakership: Mhmm (taking floor)

Corpus Study: Switchboard telephone conversation corpus
- Hand-segmented and labeled with DA information (initially from text)
- Relabeled for this study
- Analyzed for lexical realization, F0 and RMS features, and syntactic patterns

Results: Lexical Differences
- Agreement: yeah (36%), right (11%), ...
- Continuer: uhuh (45%), yeah (27%), ...
- Incipient speaker: yeah (59%), uhuh (17%), right (7%), ...
- Yes-answer: yeah (56%), yes (17%), uhuh (14%), ...
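These percentages can be read as word likelihoods P(word | DA). Combining them by Bayes' rule with a uniform prior over the four acts (an assumption; real class frequencies differ) shows how little the word alone decides: yeah is the most frequent form for three of the four acts, so its best posterior stays near one third. A minimal sketch; only the listed percentages are used, unlisted word/DA pairs are treated as 0 (a simplification), and `da_posterior` is a hypothetical name.

```python
# P(word | dialogue act), from the percentages above; unlisted pairs count as 0.
LIKELIHOOD = {
    "Agreement":         {"yeah": 0.36, "right": 0.11},
    "Continuer":         {"uhuh": 0.45, "yeah": 0.27},
    "Incipient speaker": {"yeah": 0.59, "uhuh": 0.17, "right": 0.07},
    "Yes-answer":        {"yeah": 0.56, "yes": 0.17, "uhuh": 0.14},
}


def da_posterior(word, prior=None):
    """P(DA | word) by Bayes' rule; uniform prior over DAs unless one is given."""
    acts = list(LIKELIHOOD)
    prior = prior or {a: 1 / len(acts) for a in acts}
    joint = {a: prior[a] * LIKELIHOOD[a].get(word, 0.0) for a in acts}
    total = sum(joint.values())
    if total == 0:
        return {}  # word not covered by the table
    return {a: p / total for a, p in joint.items()}


for w in ("yeah", "uhuh", "right"):
    post = da_posterior(w)
    best = max(post, key=post.get)
    print(w, best, round(post[best], 2))
# yeah Incipient speaker 0.33
# uhuh Continuer 0.59
# right Agreement 0.61
```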

Results: Prosodic and Syntactic Cues
- Relabeling from speech changed only 2% of labels overall (114/5757)
- 43 of 987 continuers were relabeled as agreements. Why? Shorter duration, lower F0, lower energy, longer preceding pause
- Over all DAs, duration is the best differentiator, but it is highly correlated with length in words
- Assessments: That's X (good, great, fine, ...)

Future Work
- Speaker differences?
- Higher-level prosodic differences among the DAs realized by the same ambiguous word?

A Coding Scheme for ‘ok’
- Ritualistic? Closing / You're Welcome / Other / No
- 3rd-Turn-Receipt? Yes
If Ritualistic == No, code all of these as well:

- Task Management: I'm done / I'm not done yet / None
- Topic Management: Starting new topic / Finished old topic / Pivot: finishing and starting
- Turn Management: Still your turn (= traditional backchannel) / Still my turn (= stalling for time) / I'm done, it is now your turn
- Belief Management: I accept your proposition / I entertain your proposition / I reject your proposition / Do you accept my proposition? (= y/n question)
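The scheme's conditional structure ("if Ritualistic == No, code all of these") maps naturally onto a small data model. A sketch under stated assumptions: each dimension takes exactly one of the values listed on the slide; 3rd-Turn-Receipt is treated as a yes/no flag (only "Yes" survives in the slide text); the four management dimensions are simply left empty for ritualistic tokens; and all class, enum, and method names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Ritualistic(Enum):
    CLOSING = "Closing"
    YOURE_WELCOME = "You're Welcome"
    OTHER = "Other"
    NO = "No"


class TaskMgmt(Enum):
    DONE = "I'm done"
    NOT_DONE = "I'm not done yet"
    NONE = "None"


class TopicMgmt(Enum):
    NEW = "Starting new topic"
    FINISHED = "Finished old topic"
    PIVOT = "Pivot: finishing and starting"


class TurnMgmt(Enum):
    YOUR_TURN = "Still your turn"          # traditional backchannel
    MY_TURN = "Still my turn"              # stalling for time
    HANDOVER = "I'm done, it is now your turn"


class BeliefMgmt(Enum):
    ACCEPT = "I accept your proposition"
    ENTERTAIN = "I entertain your proposition"
    REJECT = "I reject your proposition"
    QUERY = "Do you accept my proposition?"  # y/n question


@dataclass
class OkToken:
    ritualistic: Ritualistic
    third_turn_receipt: bool  # assumed to be a yes/no judgment
    task: Optional[TaskMgmt] = None
    topic: Optional[TopicMgmt] = None
    turn: Optional[TurnMgmt] = None
    belief: Optional[BeliefMgmt] = None

    def validate(self):
        """Enforce: a non-ritualistic 'ok' must be coded on all four dimensions."""
        dims = (self.task, self.topic, self.turn, self.belief)
        if self.ritualistic is Ritualistic.NO and any(d is None for d in dims):
            raise ValueError("non-ritualistic 'ok' must be coded on all four dimensions")


tok = OkToken(Ritualistic.NO, third_turn_receipt=False,
              task=TaskMgmt.NONE, topic=TopicMgmt.PIVOT,
              turn=TurnMgmt.HANDOVER, belief=BeliefMgmt.ACCEPT)
tok.validate()  # passes: all four dimensions are coded
```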

Next Week: Turn-taking and disfluencies
