Inducing and Detecting Emotion in Voice
Aaron S. Master, Peter X. Deng, Kristin L. Richards
Advisor: Clifford Nass
Overview
Experiment
  Have each subject establish a neutral baseline by reading the weather.
  Induce a positive, aroused emotion by showing the subject a “happy” video, followed by a happy reading and speech.
  After a distracter test, repeat with a “sad” (negative, unaroused) video, reading, and speech.
Analysis
  Compare subjects to each other in general.
  Compare subjects to their individual baselines.
  Detect emotions using individual or group baselines.
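The two baseline comparisons in the analysis can be illustrated with a minimal sketch. This is not the authors' analysis code: the subject IDs, the feature values, and both function names are hypothetical, and a simple difference is assumed as the measure of change.

```python
from statistics import mean

# Hypothetical data: one feature value (e.g. words per minute) per subject
# for the neutral weather reading and for the "happy" reading.
neutral = {"s1": 150.0, "s2": 180.0, "s3": 120.0}
happy = {"s1": 165.0, "s2": 189.0, "s3": 138.0}


def individual_baseline_delta(condition, baseline):
    """Change of each subject relative to that subject's own neutral value."""
    return {sid: condition[sid] - baseline[sid] for sid in condition}


def group_baseline_delta(condition, baseline):
    """Change of each subject relative to the mean neutral value of all subjects."""
    group_mean = mean(baseline.values())
    return {sid: condition[sid] - group_mean for sid in condition}


ind = individual_baseline_delta(happy, neutral)
grp = group_baseline_delta(happy, neutral)
```

In this made-up data every subject speeds up relative to their own baseline, but the naturally slow speaker "s3" still falls below the group mean, which is one way an individual baseline could help detection.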
Key Questions Answered
Can video and readings cause changes in self-reported emotion?
  Yes, according to the PANAS data.
Do detected features of speech correlate with stimuli?
  Yes, both between and within subjects.
Do detected features of speech correlate with self-reported emotion?
  Yes.
What are the distinguishing characteristics of a positive aroused voice versus a negative calm voice?
  More words per minute, intensity (variations), pitch variation, voiced frames.
Does having an individual emotion baseline (a computer “trained” to each subject) help?
  Yes, greatly.
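Three of the distinguishing features named above can be computed along the following lines. This is a sketch under stated assumptions, since the slides do not say how the features were extracted: the pitch track is assumed to hold one estimate in Hz per frame with None marking unvoiced frames, pitch variation is assumed to be a standard deviation, and all function names are hypothetical.

```python
from statistics import pstdev


def words_per_minute(word_count, duration_seconds):
    """Speaking rate from a word count and the recording length in seconds."""
    return 60.0 * word_count / duration_seconds


def voiced_fraction(pitch_track):
    """Fraction of frames with a pitch estimate (None marks an unvoiced frame)."""
    voiced = [f0 for f0 in pitch_track if f0 is not None]
    return len(voiced) / len(pitch_track)


def pitch_variation(pitch_track):
    """Population standard deviation of pitch (Hz) over voiced frames only."""
    voiced = [f0 for f0 in pitch_track if f0 is not None]
    return pstdev(voiced)


# Hypothetical six-frame pitch track: four voiced frames, two unvoiced.
track = [None, 200.0, 210.0, None, 190.0, 200.0]
rate = words_per_minute(90, 30)
```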
Detailed Procedure
  Participant comes into the lab.
  Participant reads a neutral article out loud, recorded for data.
  Participant silently reads a primer article.
  Participant watches an excerpt of “Miracle.”
  Participant reads an article about the 1980 Olympics, recorded for data.
  Participant tells the story in own words, recorded for data.
  Participant takes the PANAS test.
  Participant takes a distracter test.
  Participant silently reads a primer article.
  Participant watches an excerpt about Ugandan children.
  Participant reads an article about a sick girl, recorded for data.
  Participant tells the story in own words, recorded for data.
  Participant takes the PANAS test.
  Participant gets reward.
Stimuli
Video clips:
  “Miracle” excerpt – Disney film about the U.S. hockey victory
  “Invisible Children” excerpt – independent film about child soldiers
Readings:
  “Mike Eruzione Stakes the U.S. to a Lead and a Miracle on Ice”
  “After Battle with Fatal Disease, Little Girl Dies”
Significant Results for Each Condition
Significant differences were found in some features of the voice recordings:
  Average loudness
  Speaking rate (words per minute)
  Pitch fluctuation
  Relative proportion of voiced speech frames
The effect of individual baselines can be seen in the DET curves for the above features (curves closer to the lower left are “better”).
Using these features, a neural net classifier estimated emotion with 67% accuracy using general baselines and 89% using individual baselines.
Example: Mean Relative Loudness Data (dB) – Individual Baseline
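A relative loudness value in dB against an individual baseline could be computed roughly as shown here. The slides do not state how loudness was measured, so the RMS-based level and the function names are assumptions made for illustration only.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of audio samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def relative_loudness_db(condition_samples, baseline_samples):
    """Level of a condition recording relative to the neutral baseline, in dB."""
    return 20.0 * math.log10(rms(condition_samples) / rms(baseline_samples))


# Hypothetical samples: the condition recording has twice the baseline amplitude.
baseline = [0.1, -0.1, 0.1, -0.1]
condition = [0.2, -0.2, 0.2, -0.2]
level = relative_loudness_db(condition, baseline)
```

Doubling the amplitude corresponds to an increase of about 6 dB, and a recording identical to the baseline gives 0 dB.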
DET Curves
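A DET (detection error tradeoff) curve plots the miss rate against the false alarm rate as the decision threshold varies, so curves nearer the lower left mean fewer errors of both kinds. The sketch below computes such points from detector scores. The scores and the function name are hypothetical, and the normal-deviate axis scaling usually applied when plotting DET curves is omitted.

```python
def det_points(target_scores, nontarget_scores):
    """Return (false_alarm_rate, miss_rate) pairs, one per candidate threshold.

    A trial is accepted as the target emotion when its score >= threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for t in thresholds:
        misses = sum(1 for s in target_scores if s < t)
        false_alarms = sum(1 for s in nontarget_scores if s >= t)
        points.append((false_alarms / len(nontarget_scores),
                       misses / len(target_scores)))
    return points


# Hypothetical detector scores for "happy" (target) and "sad" (non-target) trials.
happy_scores = [0.9, 0.8, 0.6, 0.4]
sad_scores = [0.7, 0.3, 0.2, 0.1]
pts = det_points(happy_scores, sad_scores)
```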
Psychological Implications
  Manipulations affect speech data when the subjects are not “acting.”
  Under investigation: speech data may reflect responses to stimuli that even self-reporting does not.
  It seems possible to change people’s emotions in a short period of time (less than 10 minutes).
Open Questions and Current Directions
  Do everyday events have the same effect on speech as manipulative videos and readings?
  Which stimulus had a greater effect on the emotional condition?
  What about positive unaroused (“pleased”) and negative aroused (“angry”) conditions?
  Do non-U.S. citizens respond the same way? Nissan discovered similar manipulation effects on Japanese subjects, but with different significant features.