Identifying Local Corrections in Human-Computer Dialogue
Gina-Anne Levow, University of Chicago
October 5, 2004
Roadmap
– Problem
– Data collection & analysis
– Identifying local corrections
– Conclusions & future work
The Problem
U: October eleventh
S: Okay, leaving October fifth…
U: October eleventh
Goal: Pinpoint WHAT is being corrected
– Builds on recognition of corrections (Kirchhoff, 2001; SHL, 2000; Levow, 1998)
Why Identify Local Corrections?
Miscommunication is inevitable
– SER still high for conversational speech
Error resolution is crucial
– Easy recovery matters more than WER alone (Walker et al., 2001; Shriberg et al., 1992)
– Pinpointing the correction facilitates recovery
Enables an adaptive dialogue strategy
Challenge & Response
Few lexical/syntactic cues
– Cue phrases are rare, e.g. “No, I meant…”
– Correction may be identical to legal original input
Near repetitions are common
– E.g. departure and return dates
Approach: Exploit prosodic cues
– Wizard-of-Oz study found significant contrasts: increases in duration, pitch, and intensity (Oviatt et al., 1998)
Data Collection
Corpus: 2000, 2001 Communicator Evaluations
– Telephone-only interface to travel information (air, hotel, car)
– >160 hours of interactions, ~43K utterances
Local corrections
– Single focus of correction
– Error identifiable from system response
Local Correction Set
Lexically matched:
– U: October eleventh
– S: Okay, leaving October fifth…
– U: October eleventh
Lexically unmatched:
– U: October eleventh
– S: Okay, leaving October fifth…
– U: The eleventh of October
57 utterances: 200 total words, 57 corrective
– Automatically identified from logs, manually checked
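The matched/unmatched split above can be sketched as a simple token comparison. This is an illustrative reconstruction, not the actual log-processing procedure; the function name and lowercase normalization are assumptions:

```python
def is_lexically_matched(original, correction):
    """True when the correction repeats the original turn word for word.

    Illustrative sketch: real log processing would also need to strip
    punctuation and cope with recognizer output, which is glossed over here.
    """
    return original.lower().split() == correction.lower().split()
```

Under this check, repeating “October eleventh” verbatim lands in the matched set, while rephrasing it as “The eleventh of October” falls into the unmatched set.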
Prosodic Features & Analysis
Pitch, intensity
– Maximum, minimum, mean, range
– Extracted with Praat (Boersma, 2001), smoothed
– Utterance-normalized, per word
Duration
– Normalized against ATIS-based phoneme durations (Chung & Seneff, 1997)
Significant increases in duration for local correction words ONLY
No other measure reaches significance (cf. Oviatt et al., 1998)
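The per-word, utterance-level normalization described above can be sketched roughly as a z-score against the utterance's own statistics. The function and feature names are illustrative assumptions; actual extraction would come from Praat, which this sketch takes as given:

```python
import statistics

# Pitch/intensity measures per word, as named on the slide (assumed field names)
FEATURES = ("pitch_max", "pitch_min", "pitch_mean", "pitch_range",
            "int_max", "int_min", "int_mean", "int_range")

def normalize_features(word_features):
    """Z-normalize each word's feature values against the utterance mean/std.

    word_features: list of dicts, one per word in a single utterance.
    Returns a dict mapping each feature to its list of normalized values.
    """
    normalized = {}
    for feat in FEATURES:
        values = [w[feat] for w in word_features]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values) or 1.0  # guard against zero variance
        normalized[feat] = [(v - mean) / std for v in values]
    return normalized
```

Normalizing within the utterance lets the classifier ask whether a word is prominent *relative to its neighbors*, rather than comparing raw Hz or dB values across speakers and channels.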
Local Correction
Local Correction II
Local Correction Classification
Classifier: BoosTexter (Schapire & Singer, 2000)
Feature selection to avoid overfitting
– 5-way cross-validation; report average over runs
Features:
– Duration
– Pitch, intensity (max, min, mean, range): normalized values and within-utterance ranks
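The within-utterance rank features listed above can be sketched as follows: each word's raw feature value is replaced by its rank among the words of the same utterance, with rank 1 for the largest (most prominent) value. This is an illustrative reconstruction, not BoosTexter's or the paper's actual code:

```python
def within_utterance_ranks(values):
    """Rank feature values within one utterance; 1 = largest value.

    values: one prosodic measure (e.g. pitch max) for each word in
    an utterance. Ties are broken by word position (an assumption).
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

# A three-word utterance whose middle word carries the highest pitch:
print(within_utterance_ranks([180.0, 240.0, 200.0]))  # → [3, 1, 2]
```

Ranks discard the magnitude of the differences and keep only the ordering, which may explain their robustness here: a correction word only needs to be the *most* prominent word in its utterance, however large the margin.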
Localizing Corrections
Baseline (most common class): 71.5%
Overall accuracy: 85.5%
– Lexically matched: 81.25% (baseline: 59%)
– Lexically unmatched: 87% (baseline: 80%)
Rank-based features are crucial
– Using normalized values instead degrades performance
Key features:
– Pitch range alone approaches the best result
– Maximum pitch, maximum intensity
Duration is less useful
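The 71.5% baseline follows directly from the data-set counts (57 corrective out of 200 words): it is the accuracy of always predicting the majority class. A minimal check, with the label names assumed for illustration:

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy of always guessing the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# 200 words, 57 of them corrective, as in the data slide
labels = ["other"] * 143 + ["corrective"] * 57
print(majority_baseline(labels))  # → 0.715, i.e. the 71.5% baseline
```

The per-condition baselines differ (59% matched vs. 80% unmatched) because the class balance differs between the two subsets, which is why the classifier's gain over baseline is largest on the lexically matched set.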
Conclusion & Future Work
Prosodic cues identify the focus of correction
– Pitch range; pitch and intensity maxima
– Rank-based features are key
– Correspond to utterance-level prominence: increased pitch maximum, range, intensity, and duration
Extend beyond a single correction point
– Phrasal units, multi-point corrections
Integrate correction recognition with dialogue management