Published by Maria Teresa Massari. Modified over 6 years ago.
1
Recognizing Structure: Dialogue Acts and Segmentation
Julia Hirschberg CS 6998 9/19/2018
2
Today
Recognizing structural information from speech
  Topic structure
  Speech/dialogue acts
Applications
  Speech browsing and search of large corpora
    Broadcast News (NIST TREC SDR track)
    Topic Detection and Tracking (NIST/DARPA TDT)
    Customer care, focus groups, voicemail
  Spoken Dialogue Systems
3
SCAN
4
SCANMail Demo: Basic Layout
5
SCANMail Demo: Number Extraction
6
Discourse Structure and Topic Structure
Intention-based accounts
  Grosz & Sidner ’86
Conversational moves (games)
  Edinburgh map task dialogues
Adjacency pairs
  Schegloff, Sacks, Jefferson
7
Indicators of Topic Structure
Cue phrases: now, well, first
Pronominal reference
Orthography and formatting -- in text
Lexical information (Hearst ’94, Reynar ’98, Beeferman et al ’99)
In speech?
8
Prosodic Correlates of Discourse/Topic Structure
Pitch range
  Lehiste ’75, Brown et al ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96
Preceding pause
  Lehiste ’79, Chafe ’80, Brown et al ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96
9
Rate
  Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
Amplitude
  Brown et al ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
Contour
  Brown et al ’83, Woodbury ’87, Swerts et al ’92
Add Audix tree??
10
Prosodic Cues to Sentence and Topic Boundaries: Shriberg et al ’00
Prosody cues perform as well as or better than text-based cues at topic segmentation -- and generalize better?
Goal: identify sentence and topic boundaries at ASR-defined word boundaries
  CART decision trees provided boundary predictions
  HMM combined these with lexical boundary predictions
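The slide names an HMM as the combiner but gives no details. As a much simpler stand-in (not the model from Shriberg et al.), the sketch below merges a prosodic boundary posterior, such as a decision tree might output, with a lexical one by log-linear interpolation at a single candidate boundary. The function name `combine_posteriors` and the `weight` parameter are hypothetical.

```python
import math


def combine_posteriors(p_prosody, p_lexical, weight=0.5):
    """Log-linear interpolation of two P(boundary) estimates.

    weight is the share given to the prosodic estimate (assumed, not from
    the slides). The result is renormalized over {boundary, no boundary}.
    """
    eps = 1e-12  # keep log() away from 0 and 1

    def clip(p):
        return min(max(p, eps), 1.0 - eps)

    p_prosody, p_lexical = clip(p_prosody), clip(p_lexical)

    # Weighted log scores for the two outcomes.
    log_b = weight * math.log(p_prosody) + (1 - weight) * math.log(p_lexical)
    log_n = weight * math.log(1 - p_prosody) + (1 - weight) * math.log(1 - p_lexical)

    # Subtract the max before exponentiating for numerical stability.
    m = max(log_b, log_n)
    b, n = math.exp(log_b - m), math.exp(log_n - m)
    return b / (b + n)


# Prosody strongly suggests a boundary, the lexical model is undecided.
p = combine_posteriors(0.9, 0.5)
print(f"combined P(boundary) = {p:.3f}")
```

Unlike an HMM, this treats every candidate boundary independently and ignores sequence context, so it only illustrates the idea of pooling the two knowledge sources.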
11
Features
For each potential boundary location:
  Pause at boundary (raw and normalized by speaker)
  Pause at word before boundary (is this a new ‘turn’ or part of continuous speech segment?)
  Phone and rhyme duration (normalized by inherent duration) (phrase-final lengthening?)
  F0 (smoothed and stylized): reset, range (topline, baseline), slope and continuity
Raw pause worked better than normalized
Speaker id/change apparently hand-marked
F0 reset measured before and after potential boundary; range is defined over preceding word and compared to speaker-specific parameters
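The pause features listed above can be sketched as follows, assuming time-aligned words with speaker labels are available. The `Word` record, the feature names, and the choice to normalize by each speaker's mean pause are illustrative assumptions; the slide does not say how normalization was done.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds
    speaker: str


def pause_features(words):
    """Return one feature dict per boundary between consecutive words."""
    # Raw pause: gap between the end of one word and the start of the next.
    raw = [max(0.0, nxt.start - cur.end) for cur, nxt in zip(words, words[1:])]

    # Mean pause per speaker. Assumption: a pause belongs to the speaker
    # of the word that precedes the boundary.
    totals = {}
    for cur, pause in zip(words, raw):
        total, count = totals.get(cur.speaker, (0.0, 0))
        totals[cur.speaker] = (total + pause, count + 1)
    mean_pause = {spk: total / count for spk, (total, count) in totals.items()}

    feats = []
    for i, (cur, nxt) in enumerate(zip(words, words[1:])):
        mean = mean_pause[cur.speaker]
        feats.append({
            "pause_raw": raw[i],
            "pause_norm": raw[i] / mean if mean > 0 else 0.0,
            # Pause at the preceding word boundary (0.0 at the first boundary).
            "pause_prev": raw[i - 1] if i > 0 else 0.0,
            "speaker_change": cur.speaker != nxt.speaker,
        })
    return feats


words = [
    Word("so", 0.0, 0.2, "A"),
    Word("yeah", 0.3, 0.5, "A"),
    Word("right", 1.0, 1.2, "B"),
]
feats = pause_features(words)
for f in feats:
    print(f)
```

Duration and F0 features would need phone alignments and a pitch tracker, so they are left out of this sketch.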
12
Voice quality (halving/doubling estimates as correlates of creak or glottalization)
Speaker change, time from start of turn, # turns in conversation and gender
Trained/tested on Switchboard and Broadcast News
13
Sentence segmentation results
Prosodic features
  Better than LM for BN
  Worse (on transcription) and same for ASR transcript on SB
  All better than chance
Useful features for BN
  Pause at boundary, turn/no turn, f0 diff across boundary, rhyme duration
BN sentence boundaries vs SB
  BN
    Model          ASR     Trans
    Chance(nonb)   13.3    6.2
    Comb HMM       11.7    3.3
    LM only        11.8    4.1
    Pros only      10.9    3.6
  SB
    Model          ASR     Trans
    Chance(nonb)   25.8    11.0
    Comb HMM       22.5    4.0
    LM only        22.8    4.3
    Pros only      22.9    6.7
BN topic
    Model          ASR     Trans
    Chance(nonb)   .3 (single value given)
    Comb HMM       .1438   .1377
    LM only        .1897   .1895
    Pros only      .1731   .1657
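To make the sentence-boundary figures easier to compare across conditions, the sketch below computes each model's relative reduction over chance. It assumes the numbers are error scores where lower is better, which fits the slide's "All better than chance" remark but is not stated outright. The dictionary layout and the names `SENTENCE_RESULTS`, `relative_reduction`, and `summarize` are illustrative.

```python
# Figures copied from the slide; treated here as error scores (lower is better).
SENTENCE_RESULTS = {
    ("BN", "ASR"):   {"chance": 13.3, "comb_hmm": 11.7, "lm_only": 11.8, "pros_only": 10.9},
    ("BN", "Trans"): {"chance": 6.2,  "comb_hmm": 3.3,  "lm_only": 4.1,  "pros_only": 3.6},
    ("SB", "ASR"):   {"chance": 25.8, "comb_hmm": 22.5, "lm_only": 22.8, "pros_only": 22.9},
    ("SB", "Trans"): {"chance": 11.0, "comb_hmm": 4.0,  "lm_only": 4.3,  "pros_only": 6.7},
}


def relative_reduction(chance, score):
    """Fraction of the chance score removed by a model."""
    if chance <= 0:
        raise ValueError("chance score must be positive")
    return (chance - score) / chance


def summarize(results):
    """Map each (corpus, word source) condition to per-model reductions."""
    summary = {}
    for condition, scores in results.items():
        chance = scores["chance"]
        summary[condition] = {
            model: relative_reduction(chance, score)
            for model, score in scores.items()
            if model != "chance"
        }
    return summary


summary = summarize(SENTENCE_RESULTS)
for (corpus, source), models in summary.items():
    for model, reduction in models.items():
        print(f"{corpus} {source:5s} {model:9s} {reduction:6.1%}")
```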
14
Useful features for SB
  Phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn
15
Topic segmentation results (BN only):
Useful features
  Pause at boundary, f0 range, turn/no turn, gender, time in turn
Prosody alone better than LM
Combined model improves significantly
16
Speech Act Theory
John Searle
  Locutionary acts: semantic meaning
  Illocutionary acts: ask, promise, answer, threat
  Perlocutionary acts: effect intended to be produced on hearer: regret, fear
Dialogue acts
  Many tagging schemes (e.g. DAMSL)
17
Practical Motivations: Spoken Dialogue Systems
Add more information about speaker intentions
Disambiguate ambiguous utterances
  Okay
  Um
  Right
18
Experimental Evidence: Nickerson & Chu-Carroll ‘99
Can/would/would..willing questions
  Can you move the piano?
  Would you move the piano?
  Would you be willing to move the piano?
A la Sag & Liberman ’75: can intonation disambiguate?
19
Experiments
Production studies:
  Subjects read ambiguous questions in disambiguating contexts
  Control for given/new and contrastiveness
  Polite/neutral/impolite
Problems:
  Cells imbalanced
  No pretesting
20
No distractors
Same speaker reads both contexts
21
Results
Indirect requests
  If L%, more likely (73%) to be indirect
  46% H%: differences in height of boundary tone?
Politeness: can differs in impolite (higher rise) vs. neutral
Variation in speaker strategy
22
Corpus Studies: Jurafsky et al ‘98
Lexical, acoustic/prosodic/syntactic differentiators for yeah, ok, uhuh, mhmm, um…
  Continuers: Mhmm (not taking floor)
  Assessments: Mhmm (tasty)
  Agreements: Mhmm (I agree)
  Yes answers: Mhmm (That’s right)
  Incipient speakership: Mhmm (taking floor)
23
Corpus Study
Switchboard telephone conversation corpus
  Hand segmented and labeled with DA information (initially from text)
  Relabeled for this study
Analyzed for
  Lexical realization
  F0 and rms features
  Syntactic patterns
24
Results: Lexical Differences
Agreements: yeah (36%), right (11%), ...
Continuer: uhuh (45%), yeah (27%), ...
Incipient speaker: yeah (59%), uhuh (17%), right (7%), ...
Yes-answer: yeah (56%), yes (17%), uhuh (14%), ...
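The lexical percentages above can be read as a small lookup table. The sketch below stores them per dialogue act and, for a given word, ranks the acts by the share of their tokens realized with that word. It covers only the words listed on the slide, ignores how frequent each act is overall, and is not a classifier from the study; `WORD_SHARE` and `rank_dialogue_acts` are hypothetical names.

```python
# Share of each dialogue act realized by a given word, as listed on the slide.
# Words elided on the slide ("...") are simply absent here.
WORD_SHARE = {
    "agreement": {"yeah": 0.36, "right": 0.11},
    "continuer": {"uhuh": 0.45, "yeah": 0.27},
    "incipient speaker": {"yeah": 0.59, "uhuh": 0.17, "right": 0.07},
    "yes-answer": {"yeah": 0.56, "yes": 0.17, "uhuh": 0.14},
}


def rank_dialogue_acts(word):
    """Return (dialogue act, share) pairs for a word, largest share first."""
    scored = [
        (act, shares[word])
        for act, shares in WORD_SHARE.items()
        if word in shares
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for word in ("yeah", "uhuh", "yes"):
    print(word, rank_dialogue_acts(word))
```

"yeah" appears under all four acts with sizeable shares, which is the kind of lexical ambiguity that prosodic and syntactic cues are meant to resolve.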
25
Results: Prosodic and Syntactic Cues
Relabeling from speech produces only 2% changed labels overall (114/5757)
  43/987 continuers --> agreements
  Why? Shorter duration, lower F0, lower energy, longer preceding pause
Over all DAs, duration best differentiator but…
  Highly correlated with length in words
Assessments: That’s X (good, great, fine,…)
26
Future Work
Speaker differences?
Higher level prosodic differences among ambiguous word DAs?
27
A Coding Scheme for ‘ok’
Ritualistic?
  Closing
  You're Welcome
  Other
  No
3rd-Turn-Receipt?
  Yes
If Ritualistic==No, code all of these as well:
28
Task Management:
  I'm done
  I'm not done yet
  None
Topic Management:
  Starting new topic
  Finished old topic
  Pivot: finishing and starting
Turn Management:
  Still your turn (=traditional backchannel)
  Still my turn (=stalling for time)
  I'm done, it is now your turn
Belief Management:
  I accept your proposition
  I entertain your proposition
  I reject your proposition
  Do you accept my proposition? (=y/n question)
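One way to hold an annotation under this coding scheme is a small record type, sketched below. The enum members mirror the options on the two slides; the class and member names, the boolean for 3rd-Turn-Receipt, and the decision to enforce only the stated rule (all four dimensions required when Ritualistic is No) are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Ritualistic(Enum):
    CLOSING = "Closing"
    YOURE_WELCOME = "You're Welcome"
    OTHER = "Other"
    NO = "No"


class Task(Enum):
    DONE = "I'm done"
    NOT_DONE = "I'm not done yet"
    NONE = "None"


class Topic(Enum):
    START_NEW = "Starting new topic"
    FINISHED_OLD = "Finished old topic"
    PIVOT = "Pivot: finishing and starting"


class Turn(Enum):
    STILL_YOURS = "Still your turn"
    STILL_MINE = "Still my turn"
    YOUR_TURN_NOW = "I'm done, it is now your turn"


class Belief(Enum):
    ACCEPT = "I accept your proposition"
    ENTERTAIN = "I entertain your proposition"
    REJECT = "I reject your proposition"
    QUERY = "Do you accept my proposition?"


@dataclass
class OkCode:
    ritualistic: Ritualistic
    third_turn_receipt: bool
    task: Optional[Task] = None
    topic: Optional[Topic] = None
    turn: Optional[Turn] = None
    belief: Optional[Belief] = None

    def __post_init__(self):
        # Rule from the slide: if Ritualistic == No, code all four dimensions.
        dims = (self.task, self.topic, self.turn, self.belief)
        if self.ritualistic is Ritualistic.NO and any(d is None for d in dims):
            raise ValueError("non-ritualistic 'ok' needs task, topic, turn and belief codes")


closing = OkCode(Ritualistic.CLOSING, third_turn_receipt=False)
backchannel = OkCode(
    Ritualistic.NO,
    third_turn_receipt=False,
    task=Task.NONE,
    topic=Topic.FINISHED_OLD,
    turn=Turn.STILL_YOURS,
    belief=Belief.ACCEPT,
)
print(closing)
print(backchannel)
```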
29
Next Week
Turn-taking and disfluencies