Emotional Speech Julia Hirschberg CS 6998 2/16/2019.


1 Emotional Speech Julia Hirschberg CS 6998

2 Today
- Defining emotional speech
- Emotional categories
- Eliciting judgments
- Producing emotional speech
- Detecting emotional speech
- A subclass: deceptive speech

3 Cowie '00
- Is there a good theoretical or practical definition of emotional speech?
- "Full-blown" emotion vs. emotional state
- Cause-and-effect descriptions
- Primary and secondary (second-order)
- Everyday descriptions
- Representations
  - Biological

4 Dimensions in continuous space, e.g.:
- Valence: positive or negative
- Activation level: how disposed the agent is to take action
Structural models: different ways of appraising the situation that evokes the emotion, e.g.:
- Is it positive or negative?
- Does the situation help the agent achieve his/her goals?
Timing as a key variable: sadness vs. grief vs. depression vs. gloominess
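In the dimensional view, an emotional state is a point in a continuous space rather than a discrete label. The sketch below assumes only the two dimensions named above and a few made-up prototype coordinates. `PROTOTYPES` and `nearest_category` are hypothetical names, not taken from Cowie '00. The sketch shows that a category label can still be recovered as the nearest prototype:

```python
import math

# Hypothetical (valence, activation) coordinates in [-1, 1]; placeholders for
# illustration only, not values from Cowie '00 or any rating study.
PROTOTYPES = {
    "elation": (0.8, 0.8),
    "contentment": (0.7, -0.5),
    "anger": (-0.7, 0.8),
    "sadness": (-0.7, -0.6),
}

def nearest_category(valence: float, activation: float) -> str:
    """Map a point in valence-activation space to the closest prototype label."""
    return min(
        PROTOTYPES,
        key=lambda label: math.dist((valence, activation), PROTOTYPES[label]),
    )

# Negative valence with high activation lands nearest the "anger" prototype.
print(nearest_category(-0.5, 0.9))  # anger
```

The coordinates are placeholders. The design point is that categories become regions of the space, so intermediate or mixed states remain representable.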

5 How are emotions expressed?
- Display rules?
- In speech?
- Mixing
- Simulation

6 Schroeder '01: Emotion in Synthesis
- How is a given emotion expressed in speech?
- What are the properties of the emotion to be expressed?
- How are they related to those of other emotions?
- What kind of synthesizer works best?
  - Formant
  - Diphone
  - Unit selection
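In rule-based emotional synthesis, "how a given emotion is expressed" is often operationalized as adjustments to global prosodic parameters such as pitch level, pitch range, and speaking rate. The sketch below applies one such rule to a baseline F0 contour and phone durations. `ProsodyRule`, `RULES`, `apply_rule`, and every numeric value are hypothetical placeholders, not rules from Schroeder '01:

```python
from dataclasses import dataclass

@dataclass
class ProsodyRule:
    """Hypothetical emotion-specific prosody settings (illustrative values only)."""
    f0_level: float  # multiplier on the mean F0
    f0_range: float  # multiplier on excursions around the mean
    rate: float      # multiplier on speaking rate (>1 means faster)

# Placeholder rules; real systems derive such values from analyses of emotional speech.
RULES = {
    "neutral": ProsodyRule(1.0, 1.0, 1.0),
    "joy": ProsodyRule(1.2, 1.4, 1.1),
    "sadness": ProsodyRule(0.9, 0.6, 0.8),
}

def apply_rule(f0_hz, durations_s, emotion):
    """Return a modified (F0 contour, phone durations) pair for the given emotion."""
    rule = RULES[emotion]
    mean = sum(f0_hz) / len(f0_hz)
    new_mean = mean * rule.f0_level
    # Scale each point's deviation from the mean, then re-center on the new mean.
    new_f0 = [new_mean + (f - mean) * rule.f0_range for f in f0_hz]
    # A faster speaking rate means proportionally shorter phone durations.
    new_durations = [d / rule.rate for d in durations_s]
    return new_f0, new_durations

f0, durs = apply_rule([100.0, 120.0, 110.0, 90.0], [0.08, 0.12, 0.10], "sadness")
print([round(x, 1) for x in f0], [round(d, 3) for d in durs])
```

Here "sadness" lowers and flattens the contour and slows the tempo. A formant synthesizer exposes such parameters directly, while concatenative systems (diphone, unit selection) must impose them by signal modification or unit choice.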

7 Prosody rules: what to modify?
- How do we evaluate the results?
  - Forced choice
  - Free response
  - Recognition rate
  - Perceived naturalness
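For forced-choice evaluation, the recognition rate for an emotion is the share of its stimuli that listeners labeled with the intended emotion. A minimal sketch, assuming responses arrive as (intended, chosen) pairs; the data and the `recognition_rates` name are hypothetical:

```python
from collections import Counter

def recognition_rates(trials):
    """Per-emotion recognition rate from forced-choice listening trials.

    `trials` is a list of (intended_emotion, listener_choice) pairs; the rate
    for an emotion is the fraction of its trials where listeners chose it.
    """
    totals = Counter(intended for intended, _ in trials)
    hits = Counter(intended for intended, chosen in trials if intended == chosen)
    return {emotion: hits[emotion] / totals[emotion] for emotion in totals}

# Hypothetical listener responses, for illustration only.
trials = [
    ("anger", "anger"), ("anger", "anger"), ("anger", "joy"), ("anger", "anger"),
    ("sadness", "sadness"), ("sadness", "neutral"),
]
print(recognition_rates(trials))  # {'anger': 0.75, 'sadness': 0.5}
```

With a forced choice among k labels, chance performance is 1/k, so rates are compared against that baseline. Free-response answers must first be mapped onto categories before the same computation applies.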

8 Ten Bosch '00: Emotion Recognition
- How hard is the problem?
- Is 'standard' ASR technology well suited to it?
  - Acoustic and language models target short, local events
  - Feature extraction normalizes or excludes, e.g., pitch, rate, amplitude. Why?
- Interaction: emotional speech and ASR performance
- Synthesis needs one good example, but...
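One way to see why standard ASR processing can discard emotional information: front ends typically normalize features per utterance or per speaker. That removes exactly the global level differences (louder, higher) that may signal emotion. The sketch below uses plain mean/variance normalization of a made-up energy contour as a simplified stand-in for those front-end steps, not any particular recognizer's pipeline:

```python
import statistics

def mean_variance_normalize(contour):
    """Per-utterance mean/variance normalization (a simplified front-end step)."""
    mu = statistics.fmean(contour)
    sigma = statistics.pstdev(contour)
    return [(x - mu) / sigma for x in contour]

# Hypothetical frame energies (dB) for one utterance spoken calmly and loudly.
calm = [50.0, 54.0, 58.0, 52.0]
loud = [x + 12.0 for x in calm]  # uniformly 12 dB louder

# The raw level difference is a potential emotion cue...
print(statistics.fmean(loud) - statistics.fmean(calm))  # 12.0
# ...but after normalization the two versions are indistinguishable.
print(mean_variance_normalize(calm) == mean_variance_normalize(loud))  # True
```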

9 Ang et al.
- Challenges:
  - Use output from an ASR system
  - Use automatic prosodic features
  - Find good speaker normalization
  - Combine with lexical features
- Pioneered the approach of "direct modeling": no use of intermediate phonological units
- Applications: detecting frustration, disappointment/tiredness, amusement/surprise
- Results: prediction comparable to human accuracy (70-75%)
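Speaker normalization makes prosodic features comparable across speakers with different natural ranges. Z-scoring within each speaker is one common choice; the slide does not say which normalization Ang et al. used, so treat this as an assumed example. `speaker_normalize` and the data are hypothetical:

```python
import statistics
from collections import defaultdict

def speaker_normalize(samples):
    """Z-score pitch values within each speaker.

    `samples` is a list of (speaker_id, pitch_hz) pairs. Each value is expressed
    relative to its speaker's own mean and standard deviation, so a high- and a
    low-pitched speaker become comparable. Assumes each speaker has at least
    two distinct values (otherwise the standard deviation is zero).
    """
    by_speaker = defaultdict(list)
    for speaker, pitch in samples:
        by_speaker[speaker].append(pitch)
    stats = {
        s: (statistics.fmean(v), statistics.pstdev(v)) for s, v in by_speaker.items()
    }
    return [(s, (p - stats[s][0]) / stats[s][1]) for s, p in samples]

# Hypothetical pitch values (Hz) for two speakers with different ranges.
samples = [("A", 100.0), ("A", 120.0), ("B", 200.0), ("B", 240.0)]
print(speaker_normalize(samples))  # [('A', -1.0), ('A', 1.0), ('B', -1.0), ('B', 1.0)]
```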

10 Method: Prosodic Models
- Extract pitch from the signal
- Speech recognizer outputs word and phone alignments (duration features)
- Utterance-level features extracted (e.g., maximum speaker-normalized pitch in the longest phone-normalized vowel)
- Decision trees created to provide posterior probabilities of emotion classes given the features
- Feature selection on a development test set
- Separate test set used for evaluation
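The example feature above combines two normalizations. The slide does not state the exact formulas, so this sketch assumes two things: vowel duration is divided by the mean duration of that phone, and pitch is z-scored against the speaker's F0 statistics. `Vowel`, `PHONE_MEAN_DURATION`, `max_norm_pitch_in_longest_vowel`, and all numbers are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Vowel:
    phone: str         # phone label from the recognizer's alignment
    duration_s: float  # aligned duration in seconds
    max_f0_hz: float   # maximum F0 within the vowel

# Hypothetical corpus-wide mean durations per phone (seconds).
PHONE_MEAN_DURATION = {"aa": 0.12, "iy": 0.09, "ae": 0.11}

def max_norm_pitch_in_longest_vowel(vowels, speaker_mean_f0, speaker_sd_f0):
    """Illustrative version of the example feature on the slide.

    1. Normalize each vowel's duration by the mean duration of its phone.
    2. Pick the vowel that is longest after that normalization.
    3. Return its maximum F0, z-scored against the speaker's F0 statistics.
    """
    longest = max(vowels, key=lambda v: v.duration_s / PHONE_MEAN_DURATION[v.phone])
    return (longest.max_f0_hz - speaker_mean_f0) / speaker_sd_f0

utterance = [
    Vowel("aa", 0.15, 180.0),   # 1.25x its phone mean
    Vowel("iy", 0.135, 210.0),  # 1.5x its phone mean: longest after normalization
    Vowel("ae", 0.11, 170.0),   # 1.0x its phone mean
]
print(max_norm_pitch_in_longest_vowel(utterance, 150.0, 20.0))  # 3.0
```

The vowel that is longest in raw terms ("aa") is not the one selected. Phone normalization is what makes an intrinsically short vowel like "iy" count as lengthened.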

11 Prosodic Features
- Duration features
  - Phone/vowel/syllable durations, normalized by phone/vowel means and by speaker
  - Speaking rate features (vowels/time)
- Pause features
  - Speech-to-pause ratio, number of long pauses
  - Maximum pause length
- Energy features (RMS energy)
- Pitch features
  - Used pitch stylization algorithm (Sonmez et al.)
  - LTM model of F0 to estimate speaker range
  - Pitch ranges, slopes, locations of interest
- Spectral tilt features
- Other (non-prosodic) features
  - Position of utterance in dialog
  - Repeat or correction
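The pause and speaking-rate features listed above can be computed directly from a time-aligned segmentation. This sketch makes two assumptions. The 0.5 s threshold for a "long" pause is illustrative. "Time" in the speaking rate means speech time only, since the slide leaves this open. The segment format and function names are hypothetical:

```python
def pause_features(segments, long_pause_s=0.5):
    """Pause features from a time-aligned segmentation.

    `segments` is a list of (label, start_s, end_s) tuples where label is
    "speech" or "pause". The long-pause threshold is an assumed value.
    """
    speech = [end - start for label, start, end in segments if label == "speech"]
    pauses = [end - start for label, start, end in segments if label == "pause"]
    total_pause = sum(pauses)
    return {
        "speech_to_pause_ratio": sum(speech) / total_pause if total_pause else float("inf"),
        "num_long_pauses": sum(1 for p in pauses if p >= long_pause_s),
        "max_pause_s": max(pauses, default=0.0),
    }

def speaking_rate(num_vowels, speech_time_s):
    """Speaking rate as vowels per second of speech."""
    return num_vowels / speech_time_s

segments = [("speech", 0.0, 2.0), ("pause", 2.0, 2.8), ("speech", 2.8, 4.0),
            ("pause", 4.0, 4.2), ("speech", 4.2, 5.0)]
print(pause_features(segments))
print(speaking_rate(num_vowels=16, speech_time_s=4.0))  # 4.0
```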

12 Emotion in Deception
Motivation: why might such cues exist?
- Deception evokes emotion in deceivers (e.g., Ekman '85-'92)
- Fear of discovery: higher pitch, faster, louder, pauses, disfluencies, indirect speech
- Elation at successful deceiving: higher pitch, faster, louder, greater elaboration

13 Acoustic/Prosodic/Lexical Cues
- Are deceivers less forthcoming?
  - Shorter speech with fewer details
- Are lies less compelling than truths?
  - Less plausible, less logical, more discrepancies
  - Less verbal and vocal 'involvement'
  - Less verbal 'immediacy': more passives, negations, indirect speech
  - More uncertainty (subjective)
  - More repetitions
- Are liars less positive, pleasant?

14 More negative statements, complaints
- Are liars more tense?
  - Nervous overall
  - Vocal tension
  - High pitch
- Do lies contain fewer 'imperfections'?
  - Fewer self-repairs
  - Fewer admissions of forgetfulness
  - Fewer scene descriptions, details
  - More mention of peripheral events or relationships

15 Current State of the Art
- No single cue to deceptive speech; the most studied cues are visual
- Other acoustic/prosodic features have been proposed, but the evidence so far is mixed:
  - Loudness/intensity
  - Speaking rate
  - Response latency
  - Disfluencies
- No attested method detects deception automatically using acoustic/prosodic/lexical cues
  - All current findings are descriptive, suggestive
  - All proposed methods require human intervention
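Of the proposed cues, response latency is the simplest to make concrete: it is the gap between the end of a question and the start of the answer. A minimal sketch, assuming a time-aligned, turn-level transcript. The tuple format and `response_latencies` are hypothetical, and clamping overlaps to zero is a design choice, not an established convention:

```python
def response_latencies(turns):
    """Gaps between each interviewer turn's end and the next subject turn's start.

    `turns` is a time-ordered list of (speaker, start_s, end_s) tuples with
    speaker "interviewer" or "subject".
    """
    latencies = []
    for prev, nxt in zip(turns, turns[1:]):
        if prev[0] == "interviewer" and nxt[0] == "subject":
            # Overlapping speech would give a negative gap; clamp it to zero.
            latencies.append(max(0.0, nxt[1] - prev[2]))
    return latencies

turns = [
    ("interviewer", 0.0, 3.0), ("subject", 3.5, 9.0),
    ("interviewer", 9.2, 11.0), ("subject", 12.5, 15.0),
]
print(response_latencies(turns))  # [0.5, 1.5]
```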

16 Our Approach
- Elicit a corpus of deceptive and non-deceptive speech
  - Motivation: identity-relevant (self-image) and instrumental (monetary) incentives
  - "Real" deception vs. acted
  - Good recording conditions
  - Tasks/interview paradigm
- Transcription/annotation
- Acoustic/prosodic/lexical analysis to identify features of interest and test the validity of the paradigm
- Automatic feature extraction and analysis to train models of deceptive and non-deceptive speech

17 Corpus Collection
- Subjects asked to perform tasks for comparison with a target profile of 25 top entrepreneurs
- Performance manipulated so that it matched or differed from the target
- Monetary incentive to convince an interviewer that they matched the target
- Recorded interview/interrogation
  - Biographical information (t/f)
  - "Big lie" on task performance
  - "Local lie": pedal indicators of truth/falsehood (t/f) for each answer

18 Collection
- To date: 15 subjects, totaling ~3 hours of subject speech
- Planned: 7-8 hours of subject speech

19 Results of Prosodic/Acoustic Analysis
- On the Arizona Mock Theft data subset: 32 interviews/72 min (required segmentation; recording issues), with a further 50 interviews/160 min being segmented
- Significant pitch feature differences between deceptive and non-deceptive speech, but...
  - Highly motivated speakers lower pitch when lying
  - Low-motivation speakers raise pitch when lying
  - Males lower pitch when lying
  - Females raise pitch when lying
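The "but..." matters for modeling. When subgroups shift pitch in opposite directions, a pooled comparison can weaken or, in the extreme, hide real effects, so features may need to be conditioned on speaker group or normalized per speaker. A toy illustration; the numbers are invented to mirror the reported directions and are not the study's results, and `mean_diff` is a hypothetical helper:

```python
from statistics import fmean

# Invented per-speaker mean-pitch differences (lie minus truth, Hz).
diffs = [
    {"speaker": "s1", "sex": "M", "diff_hz": -8.0},
    {"speaker": "s2", "sex": "M", "diff_hz": -6.0},
    {"speaker": "s3", "sex": "F", "diff_hz": 7.0},
    {"speaker": "s4", "sex": "F", "diff_hz": 7.0},
]

def mean_diff(rows, **conditions):
    """Average lie-minus-truth pitch difference over rows matching all conditions."""
    selected = [r["diff_hz"] for r in rows
                if all(r[k] == v for k, v in conditions.items())]
    return fmean(selected)

print(mean_diff(diffs))           # 0.0: pooled, the effect seems to vanish
print(mean_diff(diffs, sex="M"))  # -7.0: males lower pitch
print(mean_diff(diffs, sex="F"))  # 7.0: females raise pitch
```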

20 On the Columbia corpus:
- Preliminary analyses of 8 speakers for 'local' t/f
  - Significant differences in pitch range for six subjects, but the pattern differs from the Mock Theft findings with respect to gender
- Lexical findings: preliminary analyses of Columbia data using LIWC show negative words are more prevalent in deceptive speech
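LIWC-style analysis reduces to counting what fraction of a text's words fall into a dictionary category. The real LIWC dictionaries are much larger and also match word stems, so the tiny `NEGATIVE_WORDS` set here is a hypothetical stand-in. The sentences and `category_rate` are also illustrative assumptions:

```python
import re

# Tiny hypothetical stand-in for a LIWC-like "negative" category.
NEGATIVE_WORDS = {"no", "not", "never", "bad", "hate", "wrong", "worried"}

def category_rate(text, lexicon=NEGATIVE_WORDS):
    """Percentage of word tokens in `text` that belong to `lexicon`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 100.0 * sum(t in lexicon for t in tokens) / len(tokens)

# Made-up sentences, for illustration only.
truthful = "I scored well on the reasoning task"
deceptive = "No I did not do badly I never get things wrong"
print(category_rate(truthful), category_rate(deceptive))
```

Comparing such rates between segments labeled true and false (for example, via the pedal-based 'local' t/f labels) is the kind of comparison the lexical finding describes.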
