1
Adapting Dialogue Models Discourse & Dialogue CMSC 35900-1 November 19, 2006
2
Roadmap The Problem: Portability Task domain: Call-routing Porting: –Speech recognition –Call-routing –Dialogue management Conclusions
3
SLS Portability Spoken language system design –Record or simulate user interactions –Collect vocabulary, sentence style, sequence Transcribe/label –Expert creates vocabulary, language model, dialogue model Problem: Costly, time-consuming, requires experts
4
Call-routing Goal: Given an utterance, identify type –Dispatch to right operator Classification task: –Manual rules or data-driven methods Feature-based classification (Boosting) –Pre-defined types, e.g.: Hello? -> Hello; I have a question -> request(info) I would like to know my balance. -> request(balance)
5
Dialogue Management Flow Controller –Pluggable dialogue strategy modules ATN: call-flow, easy to augment, manage context –Inputs: context, semantic rep. of utterance ASR –Language models Trigrams, in probabilistic framework
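A minimal sketch of the "flow controller with pluggable dialogue strategy modules" idea above, assuming a hypothetical dictionary-based semantic representation; the class and handler names are illustrative, not from the original system.

class FlowController:
    """Routes each utterance's semantic representation to a strategy module."""
    def __init__(self):
        self.strategies = {}   # call-type -> handler module
        self.context = {}      # persistent dialogue context

    def register(self, call_type, handler):
        self.strategies[call_type] = handler

    def handle(self, sem):
        """sem is e.g. {'call_type': 'request(balance)', 'slots': {...}}."""
        handler = self.strategies.get(sem["call_type"], self.default)
        reply, self.context = handler(sem, self.context)
        return reply

    @staticmethod
    def default(sem, context):
        return "Sorry, could you rephrase that?", context

def balance_handler(sem, context):
    # A concrete call type: route directly and remember the topic in context.
    context["last_topic"] = "balance"
    return "Transferring you to the balance desk.", context

controller = FlowController()
controller.register("request(balance)", balance_handler)
print(controller.handle({"call_type": "request(balance)", "slots": {}}))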
6
Adaptation: ASR ASR: Language models –Usually trained from in-domain transcriptions Here: out-of-domain transcriptions –Switchboard, spoken dialogue (telecomm, insurance) –In-domain web pages New domain: pharmaceuticals Style differences: SLS transcriptions match spoken style (pronouns); web pages best for OOV medical terms Best accuracy: spoken dialogue + web –Switchboard too big/slow
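A minimal sketch of building a language model for a new domain by interpolating models estimated from out-of-domain corpora (e.g. Switchboard, other dialogue apps) and in-domain web text, as described above. The toy corpora, unigram-only models, and weights are assumptions for illustration; the real system uses trigrams.

from collections import Counter

def unigram_lm(tokens):
    counts = Counter(tokens)
    total = sum(counts.values())
    return lambda w: counts[w] / total if total else 0.0

# Toy stand-ins for the out-of-domain and web training sources.
swbd   = unigram_lm("i would like to check my account".split())
dialog = unigram_lm("what is my balance please".split())
web    = unigram_lm("refill prescription dosage pharmacy".split())

weights = [0.2, 0.4, 0.4]   # would be tuned on held-out in-domain data

def interpolated(w):
    return sum(lam * lm(w) for lam, lm in zip(weights, (swbd, dialog, web)))

print(interpolated("prescription"), interpolated("balance"))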
7
Adaptation: Call-routing Manual tagging: Slow, expensive Here: Existing out-of-domain labeled data –Meta call-types: Library Generic: all apps Re-usable: in-domain, but already exist Specific: only this app –Grouping done by experts Bootstrap: Start with generic, reusable
8
Call-type Classification BoosTexter: word n-gram features; 1,100 iterations –Classify from ASR output Telecomm-based call-type library Two tasks: reject (yes/no); call-type classification –In-domain: true transcripts: 78%; ASR output: 62% –Generic: test on generic: 95%; 91% –Bootstrap: generic+reuse+rules: 79%, 68%
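A rough analogue of BoosTexter-style call-type classification: boosted weak learners over word n-gram features. This uses scikit-learn rather than BoosTexter itself, and the tiny training set and labels are made up for illustration.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline

utterances = [
    "hello",
    "i have a question",
    "i would like to know my balance",
    "what is my account balance",
]
labels = ["hello", "request(info)", "request(balance)", "request(balance)"]

clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),                   # word unigrams + bigrams
    AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                       n_estimators=200),                   # decision stumps as weak learners
)
clf.fit(utterances, labels)
print(clf.predict(["could you tell me my balance"]))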
9
Dialogue Model Build dialogue strategy templates –Based on call-type classification Generic: –E.g. yes, no, hello, repeat, help Trigger generic, context-dependent reply Tag as vague/concrete: –Vague: “I have a question” -> clarification –Concrete: clear routing, attributes – sub-dialogs
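A minimal sketch of the vague/concrete dispatch idea: generic call types get a context-dependent canned reply, vague ones trigger a clarification sub-dialogue, concrete ones route directly. The call-type names and confidence threshold are assumptions for illustration.

CONCRETE = {"request(balance)", "request(refill)"}
VAGUE    = {"request(info)"}
GENERIC  = {"hello", "yes", "no", "repeat", "help"}

def next_action(call_type, confidence, threshold=0.6):
    if confidence < threshold:
        return "reprompt"                      # low confidence: ask again
    if call_type in GENERIC:
        return "generic_reply"                 # context-dependent canned reply
    if call_type in VAGUE:
        return "clarification_subdialogue"     # e.g. "What is your question about?"
    if call_type in CONCRETE:
        return f"route:{call_type}"            # clear routing, fill attributes
    return "reject"

print(next_action("request(info)", 0.8))       # -> clarification_subdialogue
print(next_action("request(balance)", 0.9))    # -> route:request(balance)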
10
Dialogue Model Porting Evaluation: –Compare to original transcribed dialogue Task 1: DM category: 32 clusters of calls –Bootstrap 16 categories – 70% of instances Using call-type classifiers: get class, confidence, concreteness If confident, concrete, and correct -> correct; –If incorrect -> error Also classify vague/generic 67-70% accuracy for DM, routing task
11
Conclusions Portability: –Bootstrapping of ASR, Call-type, DM –Generally effective Call-type success high Others: show potential
12
Turn-taking Discourse and Dialogue CS 35900-1 November 16, 2004
13
Agenda Motivation –Silence in Human-Computer Dialogue Turn-taking in human-human dialogue –Turn-change signals –Back-channel acknowledgments –Maintaining contact Exploiting these cues to improve HCC –Automatic identification of disfluencies, jump-in points, and jump-ins
14
Turn-taking in HCI Human turn end: –Detected by 250ms silence System turn end: –Signaled by end of speech Barge-in: –Indicated by any human sound Continued attention: –No signal
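A minimal sketch of the silence-based end-of-turn rule above: the system takes the human turn to be over after roughly 250 ms of silence. The frame size and energy threshold are illustrative assumptions.

def detect_turn_end(frames, frame_ms=10, silence_ms=250, energy_thresh=0.01):
    """frames: iterable of per-frame energies; returns index where turn ends."""
    needed = silence_ms // frame_ms            # consecutive silent frames required
    silent = 0
    for i, energy in enumerate(frames):
        silent = silent + 1 if energy < energy_thresh else 0
        if silent >= needed:
            return i                           # human turn taken to end here
    return None                                # still speaking / no end detected

# 30 frames of speech followed by 30 frames of silence -> end detected at frame 54
energies = [0.2] * 30 + [0.001] * 30
print(detect_turn_end(energies))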
15
Missed turn example
16
Yielding & Taking the Floor Turn change signal –Offer floor to auditor/hearer –Cues: pitch fall, lengthening, “but uh”, end gesture, amplitude drop+’uh’, end clause Likelihood of change increases with more cues Negated by any gesticulation Speaker-state signal: Shift in head direction AND/OR Start of gesture
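A minimal sketch of the cue-counting view of turn-yielding above: the more turn-change cues the speaker displays, the more likely the listener is to take the floor, and any ongoing gesticulation negates the signal. The per-cue increment is an assumed illustrative value, not an empirical estimate.

TURN_CUES = {"pitch_fall", "drawl", "but_uh", "gesture_end",
             "amplitude_drop_uh", "clause_end"}

def turn_change_likelihood(observed_cues, gesticulating, per_cue=1/6):
    if gesticulating:                          # gesticulation negates the signal
        return 0.0
    n = len(TURN_CUES & set(observed_cues))
    return min(1.0, n * per_cue)               # roughly increases with number of cues

print(turn_change_likelihood({"pitch_fall", "clause_end"}, gesticulating=False))
print(turn_change_likelihood({"pitch_fall", "clause_end"}, gesticulating=True))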
17
Retaining the Floor Within-turn signal –Still speaker: Look at hearer as clause ends Continuation signal –Still speaker: Look away after within-turn signal/back-channel Back-channel: –‘mmhm’/okay/etc; nods, sentence completion. Clarification request; restate –NOT a turn: signal attention, agreement, confusion
18
Segmenting Turns Speaker alone: –Within-turn signal -> end of one unit; –Continuation signal -> beginning of next unit Joint signal: –Speaker turn signal (end); auditor->speaker; speaker->auditor –Within-turn + back-channel + continuation Back-channels signal understanding –Early back-channel + continuation
19
Regaining Attention Gaze & Disfluency –Disfluency: “perturbation” in speech Silent pause, filled pause, restart –Gaze: Conversants don’t stare at each other constantly However, speaker expects to meet hearer’s gaze –Confirm hearer’s attention Disfluency occurs when speaker realizes hearer is NOT attending –Pause until hearer begins gazing, or to request attention
20
Improving Human-Computer Turn-taking Identifying cues to turn change and turn start Meeting conversations: –Recorded, natural research meetings –Multi-party –Overlapping speech –Units = “Spurts”: speech bounded by 500ms silence
21
Tasks Sentence/disfluency/non-boundary ID –End of sentence, break off, continue Jump-in points –Times when others “jump in” Jump-in words –Interruption vs start from silence Off- and on-line Language model and/or prosodic cues
22
Text + Prosody Text sequence: –Modeled as n-gram language model Hidden event prediction – e.g. boundary as hidden state –Implement as HMM Prosody: –Duration, Pitch, Pause, Energy –Decision trees: classify + probability Integrate LM + DT
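A minimal sketch of integrating the two knowledge sources above: a hidden-event language-model score and a decision-tree posterior over prosodic features, combined here by log-domain interpolation. Both component models are stubbed out; in the original work the LM is an n-gram HMM over hidden boundary events and the tree is trained on duration/pitch/pause/energy features.

import math

def lm_posterior(left_words, right_words, event):
    """Stub for P(event | word context) from a hidden-event n-gram LM."""
    return {"sentence_end": 0.5, "disfluency": 0.2, "continue": 0.3}[event]

def dt_posterior(prosody, event):
    """Stub for P(event | prosodic features) from a decision tree."""
    return {"sentence_end": 0.6, "disfluency": 0.1, "continue": 0.3}[event]

def combined(left_words, right_words, prosody, weight=0.5):
    events = ("sentence_end", "disfluency", "continue")
    scores = {e: weight * math.log(lm_posterior(left_words, right_words, e))
                 + (1 - weight) * math.log(dt_posterior(prosody, e))
              for e in events}
    return max(scores, key=scores.get)

print(combined(["so", "that's", "it"], ["okay"], {"pause_ms": 600}))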
23
Interpreting Breaks For each inter-word position: –Is it a disfluency, sentence end, or continuation? Key features: –Pause duration, vowel duration 62% accuracy wrt 50% chance baseline –~90% overall Best combines LM & DT
24
Jump-in Points (Used) Possible turn changes –Points WITHIN spurt where new speaker starts Key features: –Pause duration, low energy, pitch fall –No lexical/punctuation features used –Forward features useless Look like sentence boundaries but aren’t Accuracy: 65% wrt 50% baseline Performance depends only on preceding prosodic features
25
Jump-in Features Do people speak differently when jumping in? –Differ from regular turn starts? Examine only first words of turns –No LM Key features: –Raised pitch, raised amplitude Accuracy: 77% wrt 50% baseline –Prosody only
26
Summary Prosodic features signal conversational moves –Pause and vowel duration distinguish sentence end, disfluency, or fluent continuation –Jump-ins occur at locations that sound like sentence ends –Speakers raise their voice when jumping in