Recognizing Structure: Dialogue Acts and Segmentation


1 Recognizing Structure: Dialogue Acts and Segmentation
Julia Hirschberg CS 6998 9/19/2018

2 Today
Recognizing structural information from speech
  Topic structure
  Speech/dialogue acts
Applications
  Speech browsing and search of large corpora
    Broadcast News (NIST TREC SDR track)
    Topic Detection and Tracking (NIST/DARPA TDT)
    Customer care, focus groups, voicemail
  Spoken Dialogue Systems

3 SCAN

4 SCANMail Demo: Basic Layout

5 SCANMail Demo: Number Extraction

6 Discourse Structure and Topic Structure
Intention-based accounts
  Grosz & Sidner ’86
Conversational moves (games)
  Edinburgh map task dialogues
Adjacency pairs
  Schegloff, Sacks, Jefferson

7 Indicators of Topic Structure
Cue phrases: now, well, first
Pronominal reference
Orthography and formatting (in text)
Lexical information (Hearst ’94, Reynar ’98, Beeferman et al ’99)
In speech?

8 Prosodic Correlates of Discourse/Topic Structure
Pitch range
  Lehiste ’75, Brown et al ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96
Preceding pause
  Lehiste ’79, Chafe ’80, Brown et al ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96

9 Prosodic Correlates of Discourse/Topic Structure (continued)
Rate
  Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
Amplitude
  Brown et al ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
Contour
  Brown et al ’83, Woodbury ’87, Swerts et al ’92
Add Audix tree??

10 Prosodic Cues to Sentence and Topic Boundaries: Shriberg et al ’00
Prosodic cues perform as well as or better than text-based cues at topic segmentation (and generalize better?)
Goal: identify sentence and topic boundaries at ASR-defined word boundaries
CART decision trees provided boundary predictions
HMM combined these with lexical boundary predictions

11 Features
For each potential boundary location:
  Pause at boundary (raw and normalized by speaker)
  Pause at word before boundary (is this a new ‘turn’ or part of a continuous speech segment?)
  Phone and rhyme duration (normalized by inherent duration) (phrase-final lengthening?)
  F0 (smoothed and stylized): reset, range (topline, baseline), slope and continuity
Raw pause worked better than normalized
Speaker ID/change apparently hand-marked
F0 reset is measured before and after the potential boundary; range is defined over the preceding word and compared to speaker-specific parameters
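
The pause features listed on this slide can be illustrated with a minimal sketch. It assumes word-level start/end times and a speaker label per word (for example from a forced alignment); the `Word` class, the function names, and the choice to normalize by each speaker's mean pause are all hypothetical and are not taken from Shriberg et al.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Word:
    speaker: str   # speaker label, assumed to be available for every word
    start: float   # word start time in seconds
    end: float     # word end time in seconds


def raw_pauses(words):
    """Silence in seconds at the boundary after each word except the last."""
    return [max(0.0, nxt.start - cur.end) for cur, nxt in zip(words, words[1:])]


def normalized_pauses(words):
    """Raw pause divided by the mean pause of the speaker of the preceding word.

    Dividing by the speaker mean is an assumed normalization for illustration.
    """
    pauses = raw_pauses(words)
    per_speaker = {}
    for word, pause in zip(words, pauses):
        per_speaker.setdefault(word.speaker, []).append(pause)
    means = {spk: mean(vals) for spk, vals in per_speaker.items()}
    return [
        pause / means[word.speaker] if means[word.speaker] > 0 else 0.0
        for word, pause in zip(words, pauses)
    ]


def boundary_features(words):
    """One feature dict per candidate boundary (between word i and word i+1)."""
    raw = raw_pauses(words)
    norm = normalized_pauses(words)
    feats = []
    for i, (r, n) in enumerate(zip(raw, norm)):
        feats.append({
            "pause": r,
            "pause_norm": n,
            # Pause at the preceding word boundary; 0.0 at the first boundary.
            "prev_pause": raw[i - 1] if i > 0 else 0.0,
            "speaker_change": words[i].speaker != words[i + 1].speaker,
        })
    return feats
```

Each dict could then be one input row for a boundary classifier; duration and F0 features would need an acoustic front end and are left out here.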

12 Features (continued)
Voice quality (halving/doubling estimates as correlates of creak or glottalization)
Speaker change, time from start of turn, # turns in conversation and gender
Trained/tested on Switchboard and Broadcast News

13 Sentence segmentation results
Prosodic features
  Better than LM for BN
  Worse (on transcription) and same for ASR transcript on SB
  All better than chance
Useful features for BN
  Pause at boundary, turn/no turn, F0 diff across boundary, rhyme duration

BN sentence boundaries
Model          ASR     Trans
Chance(nonb)   13.3    6.2
Comb HMM       11.7    3.3
LM only        11.8    4.1
Pros only      10.9    3.6

SB sentence boundaries
Model          ASR     Trans
Chance(nonb)   25.8    11.0
Comb HMM       22.5    4.0
LM only        22.8    4.3
Pros only      22.9    6.7

BN topic
Model          ASR     Trans
Chance(nonb)       .3
Comb HMM       .1438   .1377
LM only        .1897   .1895
Pros only      .1731   .1657
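
To make the sentence-boundary comparison easier to read, the following sketch computes each model's relative improvement over chance. It assumes the figures are error rates (lower is better) and that each row of the flattened table reads as ASR value, transcript value, then model label; the dictionary layout and function names are illustrative only.

```python
# Sentence-boundary figures transcribed from the slide, keyed by (corpus, input).
# Assumption: values are error rates, so lower is better.
SENTENCE_ERROR = {
    ("BN", "ASR"):   {"chance": 13.3, "combined": 11.7, "lm": 11.8, "prosody": 10.9},
    ("BN", "Trans"): {"chance": 6.2,  "combined": 3.3,  "lm": 4.1,  "prosody": 3.6},
    ("SB", "ASR"):   {"chance": 25.8, "combined": 22.5, "lm": 22.8, "prosody": 22.9},
    ("SB", "Trans"): {"chance": 11.0, "combined": 4.0,  "lm": 4.3,  "prosody": 6.7},
}


def relative_reduction(chance, model):
    """Fraction of the chance error removed by a model."""
    return (chance - model) / chance


def best_model(condition):
    """Name of the non-chance model with the lowest error for a condition."""
    scores = {k: v for k, v in SENTENCE_ERROR[condition].items() if k != "chance"}
    return min(scores, key=scores.get)


if __name__ == "__main__":
    for condition, scores in SENTENCE_ERROR.items():
        for name in ("combined", "lm", "prosody"):
            gain = relative_reduction(scores["chance"], scores[name])
            print(condition, name, f"{gain:.1%}")
```

Read this way, the gains over chance are much larger on reference transcripts than on ASR output for both corpora.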

14 Useful features for SB
Phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn

15 Topic segmentation results (BN only):
Useful features
  Pause at boundary, F0 range, turn/no turn, gender, time in turn
Prosody alone better than LM
Combined model improves significantly

16 Speech Act Theory
John Searle
Locutionary acts: semantic meaning
Illocutionary acts: ask, promise, answer, threat
Perlocutionary acts: effect intended to be produced on hearer: regret, fear
Dialogue acts
  Many tagging schemes (e.g. DAMSL)

17 Practical Motivations: Spoken Dialogue Systems
Add more information about speaker intentions
Disambiguate ambiguous utterances
  Okay
  Um
  Right

18 Experimental Evidence: Nickerson & Chu-Carroll ’99
Can/would/would..willing questions
  Can you move the piano?
  Would you move the piano?
  Would you be willing to move the piano?
A la Sag & Liberman ’75: can intonation disambiguate?

19 Experiments
Production studies:
  Subjects read ambiguous questions in disambiguating contexts
  Control for given/new and contrastiveness
  Polite/neutral/impolite
Problems:
  Cells imbalanced
  No pretesting

20 Problems (continued)
No distractors
Same speaker reads both contexts

21 Results
Indirect requests
  If L%, more likely (73%) to be indirect
  46% H%: differences in height of boundary tone?
Politeness: "can" differs in impolite (higher rise) vs. neutral
Variation in speaker strategy

22 Corpus Studies: Jurafsky et al ’98
Lexical, acoustic/prosodic/syntactic differentiators for yeah, ok, uhuh, mhmm, um…
  Continuers: Mhmm (not taking floor)
  Assessments: Mhmm (tasty)
  Agreements: Mhmm (I agree)
  Yes answers: Mhmm (That’s right)
  Incipient speakership: Mhmm (taking floor)

23 Corpus Study
Switchboard telephone conversation corpus
  Hand segmented and labeled with DA information (initially from text)
  Relabeled for this study
Analyzed for
  Lexical realization
  F0 and rms features
  Syntactic patterns

24 Results: Lexical Differences
Agreements: yeah (36%), right (11%),...
Continuer: uhuh (45%), yeah (27%),…
Incipient speaker: yeah (59%), uhuh (17%), right (7%),…
Yes-answer: yeah (56%), yes (17%), uhuh (14%),...
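
The lexical percentages above show why the word alone does not settle the dialogue act. As a rough illustration, the sketch below treats each percentage as P(word | act) and applies Bayes' rule. Two loud assumptions: the prior over the four acts is uniform, and any word whose share is elided on the slide (the "…") is treated as 0. Neither comes from Jurafsky et al; both are simplifications for the example.

```python
# Share of each dialogue act realized by a given word, as listed on the slide.
# Assumption: shares elided on the slide ("...") are treated as 0.0 here.
LEXICAL_SHARE = {
    "agreement":         {"yeah": 0.36, "right": 0.11},
    "continuer":         {"uhuh": 0.45, "yeah": 0.27},
    "incipient_speaker": {"yeah": 0.59, "uhuh": 0.17, "right": 0.07},
    "yes_answer":        {"yeah": 0.56, "yes": 0.17, "uhuh": 0.14},
}


def posterior(word, prior=None):
    """P(act | word) by Bayes' rule; uniform prior unless one is supplied."""
    acts = list(LEXICAL_SHARE)
    if prior is None:
        prior = {act: 1.0 / len(acts) for act in acts}  # assumed, not from the study
    scores = {act: prior[act] * LEXICAL_SHARE[act].get(word, 0.0) for act in acts}
    total = sum(scores.values())
    if total == 0:
        return {act: 0.0 for act in acts}  # word not listed for any act
    return {act: score / total for act, score in scores.items()}


if __name__ == "__main__":
    for word in ("yeah", "uhuh", "right", "yes"):
        ranked = sorted(posterior(word).items(), key=lambda kv: -kv[1])
        print(word, [(act, round(p, 2)) for act, p in ranked])
```

Under these assumptions "yeah" stays spread across all four acts, which is the ambiguity that the prosodic and syntactic cues are meant to resolve.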

25 Results: Prosodic and Syntactic Cues
Relabeling from speech produces only 2% changed labels overall (114/5757)
  43/987 continuers --> agreements
  Why? Shorter duration, lower F0, lower energy, longer preceding pause
Over all DAs, duration is the best differentiator, but…
  Highly correlated with length in words
Assessments: That’s X (good, great, fine,…)

26 Future Work
Speaker differences?
Higher level prosodic differences among ambiguous word DAs?

27 A Coding Scheme for ‘ok’
Ritualistic?
  Closing
  You're Welcome
  Other
  No
3rd-Turn-Receipt?
  Yes
If Ritualistic==No, code all of these as well:

28 Task Management:
  I'm done
  I'm not done yet
  None
Topic Management:
  Starting new topic
  Finished old topic
  Pivot: finishing and starting
Turn Management:
  Still your turn (=traditional backchannel)
  Still my turn (=stalling for time)
  I'm done, it is now your turn
Belief Management:
  I accept your proposition
  I entertain your proposition
  I reject your proposition
  Do you accept my proposition? (=y/n question)
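
One way to make the ‘ok’ coding scheme from the last two slides checkable by machine is to encode it as a small record type. This is only a sketch: the class and field names are invented for illustration, 3rd-Turn-Receipt is assumed to be a yes/no flag, and the only rule enforced is the one stated on the slide (if Ritualistic is No, all four management dimensions must be coded).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Ritualistic(Enum):
    CLOSING = "Closing"
    YOURE_WELCOME = "You're Welcome"
    OTHER = "Other"
    NO = "No"


class Task(Enum):
    DONE = "I'm done"
    NOT_DONE = "I'm not done yet"
    NONE = "None"


class Topic(Enum):
    START_NEW = "Starting new topic"
    FINISHED_OLD = "Finished old topic"
    PIVOT = "Pivot: finishing and starting"


class Turn(Enum):
    STILL_YOURS = "Still your turn"
    STILL_MINE = "Still my turn"
    DONE_YOURS_NOW = "I'm done, it is now your turn"


class Belief(Enum):
    ACCEPT = "I accept your proposition"
    ENTERTAIN = "I entertain your proposition"
    REJECT = "I reject your proposition"
    QUESTION = "Do you accept my proposition?"


@dataclass(frozen=True)
class OkCode:
    """One coded token of 'ok' (hypothetical record layout)."""
    ritualistic: Ritualistic
    third_turn_receipt: bool = False  # assumed to be a yes/no flag
    task: Optional[Task] = None
    topic: Optional[Topic] = None
    turn: Optional[Turn] = None
    belief: Optional[Belief] = None

    def __post_init__(self):
        # Rule from the slide: non-ritualistic tokens need all four dimensions.
        dims = (self.task, self.topic, self.turn, self.belief)
        if self.ritualistic is Ritualistic.NO and any(d is None for d in dims):
            raise ValueError("Ritualistic == No requires task, topic, turn and belief")
```

For example, `OkCode(Ritualistic.CLOSING)` is accepted as is, while `OkCode(Ritualistic.NO)` raises `ValueError` until all four dimensions are supplied.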

29 Next Week
Turn-taking and disfluencies

