Low Level Cues to Emotion Julia Hirschberg CS 4995/6998 8/6/2019
Liscombe et al. ’05a
Domain: phone account information, How May I Help You? system
Motivation: improve customer satisfaction
Emotions examined: negative vs. non-negative (collapsed from 7 classes)
Corpus: 5,690 dialogs, 20,013 user turns
Training-test split: 75% - 25%
ML method: BoosTexter, which combines multiple weak classifiers
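To make "combines multiple weak classifiers" concrete, here is a minimal sketch of boosting over word-presence decision stumps. It is plain binary AdaBoost written for illustration, not BoosTexter's actual implementation; the toy turns, the labels, and the names `stump`, `train_adaboost`, and `predict` are all hypothetical.

```python
import math

NEG, NONNEG = 1, -1  # assumed label encoding for this sketch


def stump(word, polarity, doc):
    """Weak classifier: predict `polarity` if `word` is in the turn, else the opposite."""
    return polarity if word in doc else -polarity


def train_adaboost(docs, labels, rounds=3):
    """docs: list of word sets; labels: +1/-1. Returns [(word, polarity, alpha)]."""
    n = len(docs)
    weights = [1.0 / n] * n
    vocab = sorted(set().union(*docs))
    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best = None
        for word in vocab:
            for polarity in (1, -1):
                err = sum(w for w, d, y in zip(weights, docs, labels)
                          if stump(word, polarity, d) != y)
                if best is None or err < best[0]:
                    best = (err, word, polarity)
        err, word, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((word, polarity, alpha))
        # Increase the weight of misclassified turns, then renormalize.
        weights = [w * math.exp(-alpha * y * stump(word, polarity, d))
                   for w, d, y in zip(weights, docs, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, doc):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump(word, pol, doc) for word, pol, alpha in ensemble)
    return NEG if score >= 0 else NONNEG


# Hypothetical toy turns (not from the corpus).
toy_docs = [{"this", "is", "ridiculous"}, {"thank", "you"},
            {"cancel", "my", "account", "now"}, {"check", "my", "balance"},
            {"ridiculous", "charges"}, {"yes", "please"}]
toy_labels = [NEG, NONNEG, NEG, NONNEG, NEG, NONNEG]
toy_model = train_adaboost(toy_docs, toy_labels, rounds=3)
```

No single stump separates the toy turns, but the weighted vote of three does, which is the point of combining weak learners.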
~80 Features
Lexical: bag of words from transcripts (1-, 2-, and 3-grams)
Prosodic:
  Energy: min, max, median, s.d.
  F0: min, max, median, s.d., mean slope
  Ratio of voiced frames to total frames (rate)
  Slope after final vowel (turn-final pitch contour)
  Mean F0 and energy over longest normalized vowel (accent)
  Syllables per second (rate)
  Mean vowel length
  Percent internal silence (hesitation)
  Local jitter over longest normalized vowel (voice quality, VQ)
  Last 7 are semi-automatically extracted
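A rough sketch of how the turn-level F0 statistics and the voiced-frame ratio listed above might be computed from a frame-level pitch track. The input format (one F0 value per frame, 0.0 for unvoiced frames), the 10 ms frame step, and the use of a least-squares fit for "mean slope" are assumptions; the slides do not say how the original features were extracted. The function name `f0_features` is hypothetical.

```python
import statistics


def f0_features(f0_track, frame_step=0.01):
    """Summarize a per-frame F0 track (Hz); 0.0 marks an unvoiced frame."""
    voiced = [(i * frame_step, f) for i, f in enumerate(f0_track) if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    times = [t for t, _ in voiced]
    values = [f for _, f in voiced]

    # Least-squares slope of F0 over time (Hz per second), voiced frames only.
    t_mean = statistics.fmean(times)
    f_mean = statistics.fmean(values)
    num = sum((t - t_mean) * (f - f_mean) for t, f in voiced)
    den = sum((t - t_mean) ** 2 for t in times)

    return {
        "f0_min": min(values),
        "f0_max": max(values),
        "f0_median": statistics.median(values),
        "f0_sd": statistics.stdev(values),
        "f0_slope": num / den,
        "voiced_ratio": len(values) / len(f0_track),
    }


# Hypothetical 8-frame track: rising pitch with three unvoiced frames.
demo_track = [0.0, 100.0, 110.0, 120.0, 0.0, 140.0, 150.0, 0.0]
demo_feats = f0_features(demo_track)
```

The same pattern (min, max, median, s.d.) would apply to a frame-level energy track.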
Results
Baseline (majority class): 73.1%
Lexical + prosodic features: 76.1%
Lexical + prosodic + dialog act features: 77.0%
Lexical + prosodic + dialog act + context features: 79.0%
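One way to read these numbers is as a relative reduction in error over the majority baseline, a derived figure that is not on the slide. The sketch below assumes the percentages are classification accuracies; the function name `relative_error_reduction` is hypothetical. For the full feature set, the gain of 5.9 points over a baseline error of 26.9 points works out to roughly 22% fewer errors.

```python
BASELINE = 73.1  # majority-class result from the slide, in percent

RESULTS = {
    "lexical + prosodic": 76.1,
    "lexical + prosodic + dialog act": 77.0,
    "lexical + prosodic + dialog act + context": 79.0,
}


def relative_error_reduction(accuracy, baseline=BASELINE):
    """Fraction of the baseline's errors removed by a system (both in percent)."""
    return (accuracy - baseline) / (100.0 - baseline)


for name, acc in RESULTS.items():
    print(f"{name}: +{acc - BASELINE:.1f} points, "
          f"{relative_error_reduction(acc):.1%} relative error reduction")
```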
Dialogue Act (DA) of current turn
Context:
  Change in value of prosodic features from n-1 to n and n to n+1
  Bag of words from two previous turns
  Edit distance between turns n-1 and n, and between n and n-2
  DAs of turns n-1 and n-2
  DAs of system prompts eliciting turns n and n-1
  Hand-labeled emotion of turns n-1 and n-2
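A minimal sketch of three of the context features above: the change in prosodic values from turn n-1 to n, the edit distance between the current turn and each of the two previous turns, and the bag of words from the two previous turns. The turn representation (a dict with `words` and `prosody`), the word-level Levenshtein distance, and all feature names are assumptions made for illustration; only the n-1 to n prosodic change is shown.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def context_features(turns, n):
    """Context features for user turn n, given the list of turns so far."""
    if n < 2:
        raise ValueError("need two previous turns")
    cur, prev1, prev2 = turns[n], turns[n - 1], turns[n - 2]
    feats = {}
    # Change in each prosodic feature from turn n-1 to turn n.
    for name, value in cur["prosody"].items():
        feats[f"delta_{name}"] = value - prev1["prosody"][name]
    # Lexical overlap with the two previous turns (repetition, rephrasing).
    feats["edit_n1_n"] = edit_distance(prev1["words"], cur["words"])
    feats["edit_n2_n"] = edit_distance(prev2["words"], cur["words"])
    # Bag of words from the two previous turns.
    feats["prev_words"] = sorted(set(prev1["words"]) | set(prev2["words"]))
    return feats


# Hypothetical three-turn exchange in which the caller repeats a request.
demo_turns = [
    {"words": "i need my balance".split(), "prosody": {"f0_max": 200.0}},
    {"words": "i said my balance".split(), "prosody": {"f0_max": 220.0}},
    {"words": "i said my account balance".split(), "prosody": {"f0_max": 260.0}},
]
demo_context = context_features(demo_turns, 2)
```

A small edit distance combined with a rising F0 maximum is the kind of pattern these features are meant to expose: the caller is repeating the request more emphatically.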