1 Adapting Dialogue Models
Discourse & Dialogue
CMSC 35900-1
November 19, 2006

2 Roadmap
The Problem: Portability
Task domain: Call-routing
Porting:
 – Speech recognition
 – Call-routing
 – Dialogue management
Conclusions

3 SLS Portability
Spoken language system design
 – Record or simulate user interactions
 – Collect vocabulary, sentence style, sequencing
Transcribe/label
 – Expert creates vocabulary, language model, dialogue model
Problem: costly, time-consuming, and requires expert effort

4 Call-routing
Goal: given an utterance, identify its type
 – Dispatch to the right operator
Classification task:
 – Manual rules or data-driven methods
Feature-based classification (boosting)
 – Pre-defined types, e.g.:
   Hello? -> hello
   I have a question. -> request(info)
   I would like to know my balance. -> request(balance)
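The feature-based classification above can be sketched as a weighted vote over word n-gram features, in the spirit of a boosted classifier. This is a minimal illustration, not the actual Boostexter model: the call types, rules, and weights below are invented, whereas a real booster learns them from labeled data over many iterations.

```python
import re

def ngram_features(utterance, n_max=2):
    """Return the set of word 1..n_max-grams in an utterance."""
    words = re.findall(r"[a-z']+", utterance.lower())
    feats = set()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            feats.add(" ".join(words[i:i + n]))
    return feats

# Each weak hypothesis votes for a call type when its n-gram is present.
# These rules and weights are illustrative only; boosting would learn
# many such rules (1,100 iterations in the system described).
WEAK_HYPOTHESES = [
    ("hello", "hello", 1.0),
    ("balance", "request(balance)", 2.0),
    ("a question", "request(info)", 1.5),
]

def classify(utterance):
    """Score each call type by summing weights of matching hypotheses."""
    scores = {}
    feats = ngram_features(utterance)
    for ngram, call_type, weight in WEAK_HYPOTHESES:
        if ngram in feats:
            scores[call_type] = scores.get(call_type, 0.0) + weight
    return max(scores, key=scores.get) if scores else "reject"
```

An utterance matching no rule falls through to a reject class, mirroring the reject-yn decision mentioned later in the slides.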

5 Dialogue Management
Flow controller
 – Pluggable dialogue strategy modules
   ATN: call-flow, easy to augment, manages context
 – Inputs: context, semantic representation of utterance
ASR
 – Language models: trigrams, in a probabilistic framework

6 Adaptation: ASR
ASR language models
 – Usually trained from in-domain transcriptions
Here: out-of-domain transcriptions
 – Switchboard, spoken dialogue (telecom, insurance)
 – In-domain web pages
New domain: pharmaceuticals
 – Style differences: spoken dialogue data best matches pronoun use; web pages cover medical OOV terms best
Best accuracy: spoken dialogue + web
 – Switchboard too big/slow

7 Adaptation: Call-routing
Manual tagging: slow, expensive
Here: existing out-of-domain labeled data
 – Meta call-types: library
   Generic: all applications
   Re-usable: in-domain, but already exist
   Specific: only this application
 – Grouping done by experts
Bootstrap: start with generic and reusable types

8 Call-type Classification
Boostexter: word n-gram features; 1,100 iterations
 – ASR output as basis
Telecom-based call-type library
Two classifications: reject-yn; call-type classification
 – In-domain: true transcripts: 78%; ASR: 62%
 – Generic: tested on generic: 95%; 91%
 – Bootstrap: generic + reusable + rules: 79%; 68%

9 Dialogue Model
Build dialogue strategy templates
 – Based on call-type classification
Generic types:
 – E.g., yes, no, hello, repeat, help
 – Trigger generic, context-dependent replies
Tag as vague/concrete:
 – Vague: "I have a question" -> clarification
 – Concrete: clear routing; attributes trigger sub-dialogues

10 Dialogue Model Porting
Evaluation:
 – Compare to original transcribed dialogues
Task 1: DM categories: 32 clusters of calls
 – Bootstrap covers 16 categories, 70% of instances
Using call-type classifiers: get class, confidence, concreteness
 – If confident/concrete/correct -> correct; if incorrect, error
 – Also classify vague/generic
67-70% accuracy for DM and routing tasks

11 Conclusions
Portability:
 – Bootstrapping of ASR, call-type, and DM models
 – Generally effective
   Call-type success high
   Others: potential

12 Turn-taking
Discourse and Dialogue
CS 35900-1
November 16, 2004

13 Agenda
Motivation
 – Silence in human-computer dialogue
Turn-taking in human-human dialogue
 – Turn-change signals
 – Back-channel acknowledgments
 – Maintaining contact
Exploiting these cues to improve human-computer communication
 – Automatic identification of disfluencies, jump-in points, and jump-ins

14 Turn-taking in HCI
Human turn end:
 – Detected by 250 ms of silence
System turn end:
 – Signaled by end of speech
Barge-in:
 – Indicated by any human sound
Continued attention:
 – No signal
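The silence-based end-of-turn detection above can be sketched as a run-length check over per-frame voice-activity decisions. The frame size below is an assumption; only the 250 ms threshold comes from the slide.

```python
FRAME_MS = 10          # assumed duration of one audio frame
SILENCE_MS = 250       # silence needed to declare end of turn (per slide)

def find_turn_end(frames_are_speech):
    """Given per-frame speech/silence booleans, return the frame index
    at which the user's turn is declared over, or None if it never is."""
    needed = SILENCE_MS // FRAME_MS
    run = 0
    for i, is_speech in enumerate(frames_are_speech):
        run = 0 if is_speech else run + 1   # count consecutive silent frames
        if run >= needed:
            return i
    return None
```

The weakness the slides go on to illustrate: a hearer pausing mid-turn (a within-turn silence) trips exactly this detector, producing a missed turn.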

15 Missed turn example

16 Yielding & Taking the Floor
Turn change signal
 – Offers the floor to the auditor/hearer
 – Cues: pitch fall, lengthening, sociocentric phrase ("but uh"), end of gesture, amplitude drop + "uh", end of clause
 – Likelihood of change increases with more cues
 – Negated by any gesticulation
Speaker-state signal:
 – Shift in head direction and/or start of a gesture
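The cue-counting behavior above (more displayed cues make a turn change more likely; any gesticulation cancels the signal) can be sketched as follows. The linear cue-count-to-probability mapping is an illustrative assumption; the slide only states that likelihood increases with the number of cues.

```python
# Turn-yield cues from the slide; names are paraphrased identifiers.
TURN_CHANGE_CUES = {
    "pitch_fall", "lengthening", "sociocentric_phrase",  # e.g. "but uh"
    "end_of_gesture", "amplitude_drop_uh", "clause_end",
}

def turn_change_likelihood(observed_cues, gesticulating=False):
    """Return a 0..1 score proportional to the number of cues displayed."""
    if gesticulating:                       # gesticulation negates the signal
        return 0.0
    n = len(observed_cues & TURN_CHANGE_CUES)
    return n / len(TURN_CHANGE_CUES)
```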

17 Retaining the Floor
Within-turn signal
 – Speaker keeps the floor: looks at hearer at end of clause
Continuation signal
 – Speaker keeps the floor: looks away after within-turn signal/back-channel
Back-channel:
 – "mmhm"/"okay" etc.; nods, sentence completion, clarification request, restatement
 – NOT a turn: signals attention, agreement, or confusion

18 Segmenting Turns
Speaker alone:
 – Within-turn signal -> end of one unit
 – Continuation signal -> beginning of next unit
Joint signals:
 – Speaker turn signal (end); auditor -> speaker; speaker -> auditor
 – Within-turn + back-channel + continuation
 – Back-channels signal understanding
 – Early back-channel + continuation

19 Regaining Attention
Gaze & disfluency
 – Disfluency: "perturbation" in speech
   Silent pause, filled pause, restart
 – Gaze: conversants don't stare at each other constantly
   However, the speaker expects to meet the hearer's gaze
   Confirms the hearer's attention
Disfluency occurs when the speaker realizes the hearer is NOT attending
 – Pauses until the hearer begins gazing, or to request attention

20 Improving Human-Computer Turn-taking
Identifying cues to turn change and turn start
Meeting conversations:
 – Recorded, natural research meetings
 – Multi-party
 – Overlapping speech
 – Units = "spurts" between 500 ms silences
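The spurt segmentation above can be sketched over time-aligned words: a new spurt starts wherever the gap since the last word is at least 500 ms. The word/time tuples are illustrative; only the 500 ms threshold comes from the slide.

```python
SPURT_GAP_S = 0.5   # minimum silence (s) separating spurts, per slide

def spurts(timed_words):
    """timed_words: list of (word, start_s, end_s) for one speaker.
    Returns a list of spurts, each a list of words."""
    result, current, last_end = [], [], None
    for word, start, end in timed_words:
        if last_end is not None and start - last_end >= SPURT_GAP_S:
            result.append(current)   # long silence: close current spurt
            current = []
        current.append(word)
        last_end = end
    if current:
        result.append(current)
    return result
```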

21 Tasks
Sentence/disfluency/non-boundary identification
 – End of sentence, break-off, or continuation
Jump-in points
 – Times when other speakers "jump in"
Jump-in words
 – Interruption vs. start from silence
Off- and on-line
Language model and/or prosodic cues

22 Text + Prosody
Text sequence:
 – Modeled as an n-gram language model
 – Hidden event prediction, e.g., boundary as hidden state
 – Implemented as an HMM
Prosody:
 – Duration, pitch, pause, energy
 – Decision trees: classify + output a probability
Integrate LM + DT
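The LM + DT integration above can be sketched as a log-linear combination of the two boundary probabilities at a candidate position. The interpolation weight and the per-source probabilities below are invented for illustration; the actual system integrates decision-tree posteriors into an HMM-based hidden-event language model rather than making independent pointwise decisions.

```python
import math

def combined_boundary_score(p_lm, p_prosody, lm_weight=0.5):
    """Log-linear combination of LM and prosodic boundary probabilities."""
    return lm_weight * math.log(p_lm) + (1 - lm_weight) * math.log(p_prosody)

def decide_boundary(p_lm_yes, p_pros_yes, lm_weight=0.5):
    """Compare the combined score for 'boundary' vs 'no boundary'."""
    yes = combined_boundary_score(p_lm_yes, p_pros_yes, lm_weight)
    no = combined_boundary_score(1 - p_lm_yes, 1 - p_pros_yes, lm_weight)
    return yes > no
```

Setting `lm_weight` to 1 or 0 recovers the LM-only and prosody-only cases, which is how such combinations are typically ablated.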

23 Interpreting Breaks
For each inter-word position:
 – Is it a disfluency, sentence end, or continuation?
Key features:
 – Pause duration, vowel duration
62% accuracy wrt a 50% chance baseline
 – ~90% overall
Best results combine LM & DT

24 Jump-in Points
Possible turn changes
 – Points WITHIN a spurt where a new speaker starts
Key features:
 – Pause duration, low energy, pitch fall
 – No lexical/punctuation features used
 – Forward-looking features useless: these points look like sentence boundaries but aren't
Accuracy: 65% wrt a 50% baseline
 – Performance depends only on preceding prosodic features

25 Jump-in Features
Do people speak differently when they jump in?
 – Do jump-ins differ from regular turn starts?
Examine only first words of turns
 – No LM
Key features:
 – Raised pitch, raised amplitude
Accuracy: 77% wrt a 50% baseline
 – Prosody only

26 Summary
Prosodic features signal conversational moves
 – Pause and vowel duration distinguish sentence end, disfluency, and fluent continuation
 – Jump-ins occur at locations that sound like sentence ends
 – Speakers raise pitch and amplitude when jumping in

