1 Adapting Dialogue Models
Discourse & Dialogue
CMSC 35900-1
November 19, 2006

2 Roadmap
The Problem: Portability
Task domain: Call-routing
Porting:
 – Speech recognition
 – Call-routing
 – Dialogue management
Conclusions

3 SLS Portability
Spoken language system design
 – Record or simulate user interactions
 – Collect vocabulary, sentence style, sequencing
Transcribe/label
 – Expert creates vocabulary, language model, dialogue model
Problem: costly, time-consuming, and requires expert effort

4 Call-routing
Goal: given an utterance, identify its type
 – Dispatch to the right operator
Classification task:
 – Manual rules or data-driven methods
Feature-based classification (boosting)
 – Pre-defined types, e.g.:
   Hello? -> hello
   I have a question. -> request(info)
   I would like to know my balance. -> request(balance)
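The feature-based classification above can be sketched as a weighted vote over word n-gram features, in the spirit of a boosted classifier. This is a minimal illustration, not the actual Boostexter model: the call types, rules, and weights below are invented, whereas a real booster learns them from labeled data over many iterations.

```python
import re

def ngram_features(utterance, n_max=2):
    """Return the set of word 1..n_max-grams in an utterance."""
    words = re.findall(r"[a-z']+", utterance.lower())
    feats = set()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            feats.add(" ".join(words[i:i + n]))
    return feats

# Each weak hypothesis votes for a call type when its n-gram is present.
# These rules and weights are illustrative only; boosting would learn
# many such rules (1,100 iterations in the system described).
WEAK_HYPOTHESES = [
    ("hello", "hello", 1.0),
    ("balance", "request(balance)", 2.0),
    ("a question", "request(info)", 1.5),
]

def classify(utterance):
    """Score each call type by summing weights of matching hypotheses."""
    scores = {}
    feats = ngram_features(utterance)
    for ngram, call_type, weight in WEAK_HYPOTHESES:
        if ngram in feats:
            scores[call_type] = scores.get(call_type, 0.0) + weight
    return max(scores, key=scores.get) if scores else "reject"
```

An utterance matching no rule falls through to a reject class, mirroring the reject-yn decision mentioned later in the slides.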

5 Dialogue Management
Flow controller
 – Pluggable dialogue strategy modules
   ATN: call-flow, easy to augment, manages context
 – Inputs: context, semantic representation of utterance
ASR
 – Language models: trigrams, in a probabilistic framework

6 Adaptation: ASR
ASR language models
 – Usually trained from in-domain transcriptions
Here: out-of-domain transcriptions
 – Switchboard, spoken dialogue (telecom, insurance)
 – In-domain web pages
New domain: pharmaceuticals
 – Style differences: spoken dialogue data best matches pronoun use; web pages cover medical OOV terms best
Best accuracy: spoken dialogue + web
 – Switchboard too big/slow

7 Adaptation: Call-routing
Manual tagging: slow, expensive
Here: existing out-of-domain labeled data
 – Meta call-types: library
   Generic: all applications
   Re-usable: in-domain, but already exist
   Specific: only this application
 – Grouping done by experts
Bootstrap: start with generic and reusable types

8 Call-type Classification
Boostexter: word n-gram features; 1,100 iterations
 – ASR output as basis
Telecom-based call-type library
Two classifications: reject-yn; call-type classification
 – In-domain: true transcripts: 78%; ASR: 62%
 – Generic: tested on generic: 95%; 91%
 – Bootstrap: generic + reusable + rules: 79%; 68%

9 Dialogue Model
Build dialogue strategy templates
 – Based on call-type classification
Generic types:
 – E.g., yes, no, hello, repeat, help
 – Trigger generic, context-dependent replies
Tag as vague/concrete:
 – Vague: "I have a question" -> clarification
 – Concrete: clear routing; attributes trigger sub-dialogues

10 Dialogue Model Porting
Evaluation:
 – Compare to original transcribed dialogues
Task 1: DM categories: 32 clusters of calls
 – Bootstrap covers 16 categories, 70% of instances
Using call-type classifiers: get class, confidence, concreteness
 – If confident/concrete/correct -> correct; if incorrect, error
 – Also classify vague/generic
67-70% accuracy for DM and routing tasks

11 Conclusions
Portability:
 – Bootstrapping of ASR, call-type, and DM models
 – Generally effective
   Call-type success high
   Others: potential

12 Turn-taking
Discourse and Dialogue
CS 35900-1
November 16, 2004

13 Agenda
Motivation
 – Silence in human-computer dialogue
Turn-taking in human-human dialogue
 – Turn-change signals
 – Back-channel acknowledgments
 – Maintaining contact
Exploiting these cues to improve human-computer communication
 – Automatic identification of disfluencies, jump-in points, and jump-ins

14 Turn-taking in HCI
Human turn end:
 – Detected by 250 ms of silence
System turn end:
 – Signaled by end of speech
Barge-in:
 – Indicated by any human sound
Continued attention:
 – No signal
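The silence-based end-of-turn detection above can be sketched as a run-length check over per-frame voice-activity decisions. The frame size below is an assumption; only the 250 ms threshold comes from the slide.

```python
FRAME_MS = 10          # assumed duration of one audio frame
SILENCE_MS = 250       # silence needed to declare end of turn (per slide)

def find_turn_end(frames_are_speech):
    """Given per-frame speech/silence booleans, return the frame index
    at which the user's turn is declared over, or None if it never is."""
    needed = SILENCE_MS // FRAME_MS
    run = 0
    for i, is_speech in enumerate(frames_are_speech):
        run = 0 if is_speech else run + 1   # count consecutive silent frames
        if run >= needed:
            return i
    return None
```

The weakness the slides go on to illustrate: a hearer pausing mid-turn (a within-turn silence) trips exactly this detector, producing a missed turn.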

15 Missed turn example

16 Yielding & Taking the Floor
Turn change signal
 – Offers the floor to the auditor/hearer
 – Cues: pitch fall, lengthening, sociocentric phrase ("but uh"), end of gesture, amplitude drop + "uh", end of clause
 – Likelihood of change increases with more cues
 – Negated by any gesticulation
Speaker-state signal:
 – Shift in head direction and/or start of a gesture
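The cue-counting behavior above (more displayed cues make a turn change more likely; any gesticulation cancels the signal) can be sketched as follows. The linear cue-count-to-probability mapping is an illustrative assumption; the slide only states that likelihood increases with the number of cues.

```python
# Turn-yield cues from the slide; names are paraphrased identifiers.
TURN_CHANGE_CUES = {
    "pitch_fall", "lengthening", "sociocentric_phrase",  # e.g. "but uh"
    "end_of_gesture", "amplitude_drop_uh", "clause_end",
}

def turn_change_likelihood(observed_cues, gesticulating=False):
    """Return a 0..1 score proportional to the number of cues displayed."""
    if gesticulating:                       # gesticulation negates the signal
        return 0.0
    n = len(observed_cues & TURN_CHANGE_CUES)
    return n / len(TURN_CHANGE_CUES)
```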

17 Retaining the Floor
Within-turn signal
 – Speaker keeps the floor: looks at hearer at end of clause
Continuation signal
 – Speaker keeps the floor: looks away after within-turn signal/back-channel
Back-channel:
 – "mmhm"/"okay" etc.; nods, sentence completion, clarification request, restatement
 – NOT a turn: signals attention, agreement, or confusion

18 Segmenting Turns
Speaker alone:
 – Within-turn signal -> end of one unit
 – Continuation signal -> beginning of next unit
Joint signals:
 – Speaker turn signal (end); auditor -> speaker; speaker -> auditor
 – Within-turn + back-channel + continuation
 – Back-channels signal understanding
 – Early back-channel + continuation

19 Regaining Attention
Gaze & disfluency
 – Disfluency: "perturbation" in speech
   Silent pause, filled pause, restart
 – Gaze: conversants don't stare at each other constantly
   However, the speaker expects to meet the hearer's gaze
   Confirms the hearer's attention
Disfluency occurs when the speaker realizes the hearer is NOT attending
 – Pauses until the hearer begins gazing, or to request attention

20 Improving Human-Computer Turn-taking
Identifying cues to turn change and turn start
Meeting conversations:
 – Recorded, natural research meetings
 – Multi-party
 – Overlapping speech
 – Units = "spurts" between 500 ms silences
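The spurt segmentation above can be sketched over time-aligned words: a new spurt starts wherever the gap since the last word is at least 500 ms. The word/time tuples are illustrative; only the 500 ms threshold comes from the slide.

```python
SPURT_GAP_S = 0.5   # minimum silence (s) separating spurts, per slide

def spurts(timed_words):
    """timed_words: list of (word, start_s, end_s) for one speaker.
    Returns a list of spurts, each a list of words."""
    result, current, last_end = [], [], None
    for word, start, end in timed_words:
        if last_end is not None and start - last_end >= SPURT_GAP_S:
            result.append(current)   # long silence: close current spurt
            current = []
        current.append(word)
        last_end = end
    if current:
        result.append(current)
    return result
```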

21 Tasks
Sentence/disfluency/non-boundary identification
 – End of sentence, break-off, or continuation
Jump-in points
 – Times when other speakers "jump in"
Jump-in words
 – Interruption vs. start from silence
Off- and on-line
Language model and/or prosodic cues

22 Text + Prosody
Text sequence:
 – Modeled as an n-gram language model
 – Hidden event prediction, e.g., boundary as hidden state
 – Implemented as an HMM
Prosody:
 – Duration, pitch, pause, energy
 – Decision trees: classify + output a probability
Integrate LM + DT
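The LM + DT integration above can be sketched as a log-linear combination of the two boundary probabilities at a candidate position. The interpolation weight and the per-source probabilities below are invented for illustration; the actual system integrates decision-tree posteriors into an HMM-based hidden-event language model rather than making independent pointwise decisions.

```python
import math

def combined_boundary_score(p_lm, p_prosody, lm_weight=0.5):
    """Log-linear combination of LM and prosodic boundary probabilities."""
    return lm_weight * math.log(p_lm) + (1 - lm_weight) * math.log(p_prosody)

def decide_boundary(p_lm_yes, p_pros_yes, lm_weight=0.5):
    """Compare the combined score for 'boundary' vs 'no boundary'."""
    yes = combined_boundary_score(p_lm_yes, p_pros_yes, lm_weight)
    no = combined_boundary_score(1 - p_lm_yes, 1 - p_pros_yes, lm_weight)
    return yes > no
```

Setting `lm_weight` to 1 or 0 recovers the LM-only and prosody-only cases, which is how such combinations are typically ablated.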

23 Interpreting Breaks
For each inter-word position:
 – Is it a disfluency, sentence end, or continuation?
Key features:
 – Pause duration, vowel duration
62% accuracy wrt a 50% chance baseline
 – ~90% overall
Best results combine LM & DT

24 Jump-in Points
Possible turn changes
 – Points WITHIN a spurt where a new speaker starts
Key features:
 – Pause duration, low energy, pitch fall
 – No lexical/punctuation features used
 – Forward-looking features useless: these points look like sentence boundaries but aren't
Accuracy: 65% wrt a 50% baseline
 – Performance depends only on preceding prosodic features

25 Jump-in Features
Do people speak differently when they jump in?
 – Do jump-ins differ from regular turn starts?
Examine only first words of turns
 – No LM
Key features:
 – Raised pitch, raised amplitude
Accuracy: 77% wrt a 50% baseline
 – Prosody only

26 Summary
Prosodic features signal conversational moves
 – Pause and vowel duration distinguish sentence end, disfluency, and fluent continuation
 – Jump-ins occur at locations that sound like sentence ends
 – Speakers raise pitch and amplitude when jumping in

