1
Prosodic Cues to Discourse Segment Boundaries in Human-Computer Dialogue
SIGDial 2004
Gina-Anne Levow
April 30, 2004
2
Roadmap
Motivation
Data Collection
  –Segment Boundary Selection
Feature Extraction & Analysis
Cues to Segment Boundaries
Preliminary Classification Study
Conclusion
3
Why Segment?
Enables language understanding tasks
  –Reference resolution
      Anaphors typically refer to entities in current segment
  –Summarization
      Identify and represent range of topics
  –Conversational understanding
      Constrain recognition
      Different interpretations in different contexts
4
Approaches to Segmentation
Monologue
  –Text similarity: vector space, language model, cue phrases
  –(Hearst 1994, Beeferman et al 1999, Marcu 2000)
  –Prosodic cues (with text): pitch, amplitude, duration, pause
  –(Nakatani et al 1995, Swerts 1997, Tur et al 2001)
Human dialogue
  –Dialogue act classification (Shriberg et al, 1998; Taylor et al, 1998)
      Text: language models; Prosody: contour, accent type
Multi-party segmentation
  –Text + silence (Galley et al, 2003)
5
Prosody in Human-Computer Dialogue
Errors in speech recognition
  –Prosody provides additional source of evidence
Topic change can be expensive
Possible contrasts to human-human dialogue
  –More stilted speaking style
  –Slow conversation
6
Data Collection
System:
  –SpeechActs (Sun Microsystems, 1993-1996)
      Voice-only interface to desktop applications
  –Email, calendar, weather, stock quotes, time, currency
  –Data: 60 hours, collected during field trial
      19 subjects: 4 expert, 14 novice, guest
      Recorded: 8 kHz, 8-bit μ-law
      Logged
      Manually transcribed
      > 7500 user utterances
7
Discourse Segment Boundary Data
Focus:
  –High-level discourse segment boundaries
      Not fine-grained subtopic analysis (future work)
  –More reliably coded and extracted
      »(Swerts, 1997; Nakatani et al 1995)
  –Task-based correspondence
      Align with changes from application to application
  –Reliably extractable from current data set
8
Data Set
Paired data set:
  –Discourse segment-final and segment-initial pairs
      User utterances
      Last command in current application, and application change command
  –U: What’s the price for Sun? (segment-final)
  –S: …
  –U: Switch to mail. (segment-initial)
  –473 pairs
      Extracted automatically
      Alignment, content verified manually
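As an illustration of the pairing step, here is a minimal sketch of how segment-final / segment-initial pairs might be pulled from a log of user utterances. The record layout (`call_id`, `text`, `is_switch`) is hypothetical and is not the SpeechActs log format; the rule "pair each application-change command with the user utterance immediately before it in the same call" is an assumption about how "extracted automatically" could work, not the procedure used in the study.

```python
from typing import NamedTuple


class Utterance(NamedTuple):
    call_id: str
    text: str
    is_switch: bool  # hypothetical flag: True for application-change commands


def boundary_pairs(utterances):
    """Pair each application-change command with the user utterance before it.

    Returns (segment_final, segment_initial) tuples. Pairs are skipped when the
    two utterances come from different calls or the earlier one is itself an
    application-change command.
    """
    pairs = []
    for prev, cur in zip(utterances, utterances[1:]):
        if cur.is_switch and not prev.is_switch and prev.call_id == cur.call_id:
            pairs.append((prev, cur))
    return pairs


# Toy log; the third utterance is invented for illustration.
log = [
    Utterance("call-1", "What's the price for Sun?", False),
    Utterance("call-1", "Switch to mail.", True),
    Utterance("call-1", "Read the first message.", False),
]
pairs = boundary_pairs(log)
```

Pairs produced this way would still need the manual check of alignment and content that the slide mentions.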
9
Acoustic Analysis
Features:
  –Pitch and intensity
      Extracted automatically (Praat)
  –5-point median smoothed
  –Normalized per-speaker/call
Scalar measures:
  –Maximum, minimum, mean
      Full utterance
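The feature pipeline can be sketched roughly as follows, starting from a pitch (or intensity) track that has already been extracted, for example with Praat. The slide does not say how normalization was done, so the z-score against the speaker's (or call's) own mean and standard deviation is an assumption here; the function names and the shrinking window at track edges are also illustrative choices.

```python
from statistics import mean, median, pstdev


def median_smooth(values, window=5):
    """Running median over a centered window; the window shrinks at the edges."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def normalization_params(tracks):
    """Mean and standard deviation over all frames of one speaker or call."""
    frames = [v for track in tracks for v in track]
    return mean(frames), pstdev(frames)


def utterance_features(track, mu, sigma):
    """Max, min, and mean of the smoothed, normalized track for one utterance."""
    sigma = sigma or 1.0  # guard against a constant track (assumed fallback)
    normed = [(v - mu) / sigma for v in median_smooth(track)]
    return {"max": max(normed), "min": min(normed), "mean": mean(normed)}


# Toy pitch values in Hz; the 400 is a simulated tracking error.
smoothed = median_smooth([100, 102, 400, 104, 106])
mu, sigma = normalization_params([[100, 110, 120], [130, 140]])
feats = utterance_features([120, 120, 120, 120, 120], mu, sigma)
```

Median smoothing is used here because it discards isolated outliers such as pitch-halving or pitch-doubling errors without blurring the contour the way a moving average would.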
10
Acoustic Contrasts
Pitch:
  –Segment-initial vs segment-final:
      Maximum, minimum, and mean significantly higher
  –Lower final fall in segment-final
Intensity:
  –Segment-initial vs segment-final:
      Mean intensity significantly higher
  –No other measures significant
11
Acoustic Contrasts
12
Discussion
Segment-initial utterances
  –Significantly higher in pitch and intensity
Largest contrast
  –Dramatically lower pitch in segment-final
  –Low pitch as cue to topic finality
Robust cues to discourse segment boundaries
13
Classification: Preliminary Experiments
Automatic prosody-based identification of segment boundaries
  –Question: Does a pair of utterances span a segment boundary?
Data:
  –Ordered utterance pairs:
      Half segment-final + segment-initial
      Half non-boundary
14
Classifier and Features
Decision tree classifier (C4.5)
Features:
  –Pitch and intensity
      Maximum, minimum, mean
  –Values for each utterance
  –Differences across pair
Preliminary classification results:
  –70-80% accuracy
  –Key features:
      Minimum pitch, average intensity
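One way to assemble the feature vector for an ordered utterance pair is sketched below: the six scalar measures for each utterance, plus their differences across the pair. The feature names, the dictionary layout, and the direction of the difference (second utterance minus first) are assumptions made for illustration. The sketch stops at feature construction; the resulting vectors could then be passed to any decision-tree learner, and no C4.5 implementation is shown here.

```python
MEASURES = ("pitch_max", "pitch_min", "pitch_mean",
            "int_max", "int_min", "int_mean")


def pair_features(first, second):
    """Per-utterance values plus cross-pair differences for one ordered pair.

    `first` and `second` map each name in MEASURES to a normalized value.
    Differences are computed as second minus first (an assumed convention).
    """
    feats = {}
    for m in MEASURES:
        feats[f"first_{m}"] = first[m]
        feats[f"second_{m}"] = second[m]
        feats[f"diff_{m}"] = second[m] - first[m]
    return feats


# Toy normalized values, invented for illustration only.
first = dict.fromkeys(MEASURES, 0.0)
second = dict(first, pitch_min=0.5, int_mean=0.25)
vec = pair_features(first, second)
```

With six measures this yields 18 features per pair: 12 per-utterance values and 6 differences.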
15
Classifier Tree
16
Conclusions & Future Work
Discourse segment boundaries in HCI
  –Segment-initial utterances
      Significant increases in pitch and intensity
  –Relative to segment-final
  –Robust contrastive use of pitch and intensity
  –Preliminary classification efforts: 70-80%
      Difference in pitch minimum, intensity
Extend to subdialogue structure
  –Richer feature set, data set