Vocalic Markers of Deception and Cognitive Dissonance for Automated Emotion Detection Systems
Dr. Aaron C. Elkins, The University of Arizona

Emotional Voice

Can computers perceive vocal emotion?
Yes, but:
- The science of the emotional voice is young
- Communication is complex and dynamic
- Moods and emotions switch with context
- Emotion is computationally ill-defined
- Measuring emotion may inform theory

Emotional Dimensions
[Figure: emotional dimensions, annotated “DISGUST?”]

Four Components of Speech
- Voiced vs. unvoiced sounds: [v] vs. [f]
- Airstream through mouth or nose: [m] vs. [o]

Speech Sounds
- Speech sounds vary in (1) pitch, (2) loudness, and (3) quality
- Sound consists of small variations in air pressure that occur rapidly in succession
- For voiced sounds, the vocal folds superimpose vibration on the outgoing airstream
- The vocal folds vibrate periodically (100–250 Hz)
- We measure these features digitally

Recording “Father”: Digital Audio
- The waveform shows the pulses of the vocal folds
- It is based on air pressure disturbance (dB)
- Voiced vs. unvoiced (low pressure) segments are visible
- Each peak occurs every 100th of a second (100 Hz)
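
The 100 Hz figure follows from the period. One peak every 1/100 s means F0 = 1/T = 100 Hz. The sketch below recovers that number from samples. It assumes a plain, unnormalized autocorrelation search over 75 to 500 Hz on a synthetic signal. The functions `synthetic_voice` and `estimate_f0` are illustrative only. Praat's own pitch tracker is considerably more refined.

```python
import math


def synthetic_voice(f0, sample_rate, duration_s):
    """Periodic test signal: a fundamental plus two weaker harmonics (a stand-in for voicing)."""
    n = int(sample_rate * duration_s)
    return [
        sum(math.sin(2 * math.pi * k * f0 * i / sample_rate) / k for k in (1, 2, 3))
        for i in range(n)
    ]


def estimate_f0(signal, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate F0 as sample_rate / lag, where lag maximizes the raw autocorrelation.

    Only lags corresponding to f0_min..f0_max are searched, so the first
    period peak is found rather than the trivial peak at lag 0.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


voice = synthetic_voice(f0=100.0, sample_rate=8000, duration_s=0.1)
print(estimate_f0(voice, 8000))  # lag of 80 samples -> 100.0
```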

Vowel Articulation
Source-Filter Theory (Müller, 1848)
- The vocal folds can vibrate at the same rate (pitch) across vowels
- Resonances of the vocal tract change to filter frequencies (formants)
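
Source-filter theory separates what sets the pitch (the source) from what shapes the spectrum (the filter). The sketch below illustrates that separation under simplifying assumptions:
- the source is an impulse train at F0 = 100 Hz;
- the only formant is a single two-pole digital resonator with an assumed 80 Hz bandwidth;
- the sample rate is 8 kHz.

All function names are hypothetical. Moving the formant changes which harmonic is loudest, while the harmonics themselves stay at multiples of 100 Hz.

```python
import cmath
import math


def impulse_train(f0, sample_rate, n_samples):
    """Source: one pulse per glottal period (an idealized voicing source)."""
    period = round(sample_rate / f0)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(x, freq, bandwidth, sample_rate):
    """Filter: a two-pole digital resonator standing in for a single formant."""
    r = math.exp(-math.pi * bandwidth / sample_rate)
    a1 = 2 * r * math.cos(2 * math.pi * freq / sample_rate)
    a2 = -r * r
    y = []
    for n, xn in enumerate(x):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(xn + a1 * y1 + a2 * y2)
    return y


def component_magnitude(y, freq, sample_rate):
    """Magnitude of the single DFT component of y at freq (Hz)."""
    w = -2j * math.pi * freq / sample_rate
    return abs(sum(v * cmath.exp(w * n) for n, v in enumerate(y)))


def strongest_harmonic(f0, formant_hz, sample_rate=8000, n_harmonics=20):
    """Return the harmonic (Hz) of the filtered source with the largest magnitude."""
    source = impulse_train(f0, sample_rate, 2 * sample_rate)
    output = resonator(source, formant_hz, bandwidth=80.0, sample_rate=sample_rate)
    steady = output[sample_rate:]  # drop the first second (filter start-up transient)
    harmonics = [k * f0 for k in range(1, n_harmonics + 1)]
    return max(harmonics, key=lambda h: component_magnitude(steady, h, sample_rate))


# Same source (F0 = 100 Hz), two different "vocal tract" settings:
print(strongest_harmonic(100, 500))   # 500
print(strongest_harmonic(100, 1500))  # 1500
```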

Vocalics
- Vocalic analysis examines how it was said: amplitude, pitch (frequency), response latency, and tempo
- Linguistics examines what was said
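
Of the vocalic measures listed, response latency is the simplest to compute. The sketch below assumes latency is measured from a known question end time to the first 10 ms frame whose RMS energy crosses a fixed threshold. The threshold, frame size, and function names are illustrative assumptions, not the measurement procedure used in these studies.

```python
import math


def frame_rms(samples, frame_len):
    """RMS amplitude of consecutive, non-overlapping frames."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def response_latency(samples, sample_rate, question_end_s, threshold, frame_ms=10):
    """Seconds from question end to the first frame whose RMS exceeds threshold (None if none)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    start = int(question_end_s * sample_rate)
    for idx, value in enumerate(frame_rms(samples[start:], frame_len)):
        if value > threshold:
            return idx * frame_len / sample_rate
    return None


# Synthetic recording: 1 s of silence, then 0.5 s of a 200 Hz tone standing in for speech.
sr = 8000
audio = [0.0] * sr + [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 2)]
print(response_latency(audio, sr, question_end_s=0.2, threshold=0.05))  # 0.8
```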

Sound Production Is Complex
- When we tense our muscles, such as during stress, our larynx tenses, producing a higher pitch
- The process is complex, and emotions affect its normal operation
- Deception takes away cognitive resources and is stressful, resulting in more mistakes, lower quality, and increased average and variation in pitch
- Sympathetic nervous system response: increased auditory acuity and heightened arousal
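
Average pitch and pitch variation, the two quantities this slide links to deception, are commonly summarized as the mean and standard deviation of F0 over voiced frames. The sketch below assumes an F0 track in which unvoiced frames are coded as 0 Hz. The track itself is hypothetical.

```python
import statistics


def pitch_summary(f0_track):
    """Mean and standard deviation of F0 over voiced frames.

    Assumes unvoiced frames are coded as 0 Hz, so they are excluded.
    """
    voiced = [f for f in f0_track if f > 0]
    return statistics.mean(voiced), statistics.stdev(voiced)


# Hypothetical frame-by-frame F0 track (Hz); zeros mark unvoiced frames.
mean_f0, sd_f0 = pitch_summary([0, 120, 125, 0, 130, 0])
print(mean_f0, sd_f0)  # mean 125 Hz, SD 5 Hz
```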

Standard Vocal Measures
Calculated with Praat and custom signal processing software

Nemesysco LVA 6.50
Commercial vocalic software evaluated

Five Vocalic Studies Summarized
- Study One: Deception Experiment
- Study Two: Cognitive Dissonance
- Study Three: Embodied Conversational Agent and Trust
- Study Four: Embodied Conversational Agent Security Screening (Bomber)
- Study Five: Embodied Conversational Agent Security Screening (Imposter)

Vocal Deception (Study 1): Experimental Design
- N = 96
- $10 reward for appearing credible to a professional interviewer
- Two sequences:
  First sequence: DT DDTT TD TTDD T
  Second sequence: DT TTDD TD DDTT T
- 13 short-answer questions; only 8 had variation both within and between subjects
- Two types of questions: charged and neutral
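
The slide does not define the codes, so this assumes D means a deceptive answer and T a truthful one. Under that reading, each sequence contains 13 conditions, and the two sequences differ at exactly 8 positions. That count is consistent with the statement that only 8 questions varied both within and between subjects. A quick check:

```python
# Condition codes per question: D = deceptive answer, T = truthful answer (assumed reading).
SEQUENCE_1 = "DT DDTT TD TTDD T".replace(" ", "")
SEQUENCE_2 = "DT TTDD TD DDTT T".replace(" ", "")


def differing_positions(seq_a, seq_b):
    """1-based question positions whose condition differs between two sequences."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]


positions = differing_positions(SEQUENCE_1, SEQUENCE_2)
print(len(SEQUENCE_1), positions)  # 13 [3, 4, 5, 6, 9, 10, 11, 12]
```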

Results
- The built-in classification performed at chance level
- Individual vocal measures, used independently of the system's classification, discriminated deception: FMain, AVJ, and SOS
- These are possible latent variables measuring conflicting thoughts, cognitive effort, and emotional fear
- Logistic regression performed best on charged questions
- Higher pitch, cognitive effort, and hesitations are predictive of deception in more stressful interactions
- The claim that the vocal analysis software measures stress, cognitive effort, or emotion cannot be completely dismissed
- Deception and stress can be predicted by acoustic measures of voice quality and pitch when controlling for speaker characteristics
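
The slide does not give the fitted model. For readers unfamiliar with the technique, the sketch below is a minimal logistic regression fitted by batch gradient descent on a small synthetic dataset. The two standardized features (labeled pitch and hesitations) and all values are hypothetical and do not come from the study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a response is deceptive."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical standardized features: [pitch, hesitations]; label 1 = deceptive.
X = [[-1.0, -1.0], [-1.0, 0.0], [0.0, -1.0], [-0.5, -0.5], [0.5, 0.5],
     [1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-0.5, -0.5]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, [1.5, 1.5]) > 0.5)    # True: high pitch, many hesitations
print(predict_proba(w, b, [-1.5, -1.5]) < 0.5)  # True
```

The classes overlap on purpose, so the maximum-likelihood weights are finite and the fit converges rather than growing without bound.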

Vocal Dissonance (Study 2): Experimental Design
- Modified induced-compliance paradigm
- Participants (N = 52) made two vocal counter-attitudinal arguments for cutting funding for services for the disabled
- Choice was manipulated, high vs. low (IV): high N = 24, low N = 28
- Participants reported their attitude toward the argument issue (DV)

Arousal (Vocal Pitch)
- High choice had a 10 Hz higher pitch, F(1, 50) = 4.43, p = .04
- All participants reduced their pitch over time, F(1, 50) = 4.90, p = .03

Cognitive Difficulty
- High choice had nearly 2x the response latency on argument two, F(1, 50) = 4.53, p = .04
- Arousal moderation

Cognitive Difficulty
- Participants spoke with 33% more nonfluencies on the second argument, F(1, 50) = 4.03, p = .05

The Importance of Language (Imagery as Abstract Language)

Vocal Dissonance Model
χ²(1, N = 51), p = .49; SRMR = .02
R²: attitude change = .17, imagery = .11

From the Lab to the AVATAR

First Kiosk

Kiosk from Last Year

Third-Generation Kiosk

Gender and Demeanor

Vocal Trust (Study 3): Experimental Design
- Participants completed a pre-survey
- Packed a bag before the ECA screening interview
- Completed the security screening
- All responses to the ECA were recorded for vocal analysis

ECA Demeanor and Gender
- Question blocks 1, 2, 3, and 4
- Repeated-measures Latin square design: all participants interacted with all demeanor and gender ECA combinations
- 4 questions per block, 16 total questions
- N = 88 participants (53 males, 35 females)

Trust and Time
Multilevel growth model specified with trust as the DV (N = 218) and subject as a random effect (N = 60)
Main effects:
- Initial trust = 4.09
- Trust rate of change: .04 increase per second, p < .01
- Duration: .05 decrease in trust for every second spent answering the ECA beyond the 7.6-second average, p < …
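
Read as a linear prediction equation, the reported fixed effects combine as trust ≈ 4.09 + .04 × elapsed seconds − .05 × (answer duration − 7.6 s). The sketch below makes three assumptions: the effects are additive, elapsed time is measured in seconds from the start of the interaction, and the subject-level random effects can be ignored. `predicted_trust` is a hypothetical helper.

```python
INITIAL_TRUST = 4.09        # predicted trust at the start (elapsed = 0)
TRUST_PER_SECOND = 0.04     # change in trust per elapsed second
DURATION_EFFECT = -0.05     # change in trust per second of answer duration above the mean
MEAN_DURATION_S = 7.6       # average answer duration reported on the slide


def predicted_trust(elapsed_s, answer_duration_s):
    """Fixed-effects-only prediction; subject random effects are ignored (assumption)."""
    return (INITIAL_TRUST
            + TRUST_PER_SECOND * elapsed_s
            + DURATION_EFFECT * (answer_duration_s - MEAN_DURATION_S))


print(round(predicted_trust(10, 7.6), 2))  # 4.49: ten seconds in, average-length answer
print(round(predicted_trust(10, 9.6), 2))  # 4.39: same time, answer two seconds longer
```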

Vocal Pitch, Time, and Trust
- Main effect of pitch: for every 1 Hz increase in pitch above 156 Hz, trust drops by .01, p = .03
- Interaction of pitch and time: Pitch × Time b = 9.3e-05, p = .03
- Over time, pitch predicts trust less and less
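
A positive Pitch × Time interaction offsets the negative pitch main effect. As a result, the slope of trust on pitch at time t is −.01 + 9.3e-05 × t, which shrinks toward zero as t grows. That is what "pitch predicts trust less and less" means. The sketch below assumes pitch is centered at 156 Hz and time at the start of the interaction. The slide states neither the centering nor the time unit, and all names are hypothetical.

```python
PITCH_MAIN_EFFECT = -0.01   # change in trust per 1 Hz above 156 Hz, at time 0
PITCH_X_TIME = 9.3e-05      # Pitch x Time interaction coefficient
PITCH_CENTER_HZ = 156.0


def pitch_slope(elapsed):
    """Simple slope of trust on pitch (per Hz) at a given elapsed time."""
    return PITCH_MAIN_EFFECT + PITCH_X_TIME * elapsed


def pitch_contribution(pitch_hz, elapsed):
    """Contribution of pitch to predicted trust at a given elapsed time."""
    return pitch_slope(elapsed) * (pitch_hz - PITCH_CENTER_HZ)


zero_point = -PITCH_MAIN_EFFECT / PITCH_X_TIME
print(round(zero_point, 1))                     # 107.5
print(round(pitch_contribution(166.0, 0), 3))   # -0.1
```

Under these assumptions, pitch stops predicting trust at about t = .01 / 9.3e-05 ≈ 107.5 time units.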

Results
- Human perceptions of trust transfer to the ECA
- Time plays an important role in the interaction
- All participants trusted the ECA more over time, particularly when it smiled: 48 increase in trust when the ECA smiles
- Vocal measures of pitch predicted trust, but only early on: for every 1 Hz increase in pitch above 156 Hz, trust drops by .01
- Over time, pitch predicts trust less and less

Vocalics of a Bomber (Study 4): Experimental Design
- 29 EU border guards were randomly assigned to build a bomb (N = 16) or to a control condition (N = 13), then pack a bag
- Identical to Study 3, but with no breaks in the interview
- Only the male, neutral-demeanor ECA interviewed participants
- Bomb makers were instructed to successfully smuggle the bomb past the ECA

Vocal Analysis
- Recorded responses to the question: “Has anyone given you a prohibited substance to transport through this checkpoint?”
- Average response 2.68 sec (SD = 1.66)
- Responses such as “No” or “Of course not”
- Vocal measures of pitch and pitch variation

Results of Vocal Pitch
- Voice quality, gender, and intensity included as covariates
- No difference in mean vocal pitch, F(1, 22) = 0.38, p = .54
- Main effect of pitch variation: bomb makers had 25.34% more variation, F(1, 22) = 4.79, p = .04

Pitch Contours

Eye Gaze: Guilty

Eye Gaze: Innocent

Vocalics of an Imposter (Study 5): Experimental Design
- EU border guards
- All were required to present a visa and passport through multiphase screening: e-gate, manual processing, and AVATAR screening interview
- Four participants, randomly assigned as imposters, carried false documents with hostile intentions through screening

AVATAR Interaction Example

iPad Output for Screener

Voice Quality Change from Baseline Question (“What is your full name?”)

Vocalic Classification Model

Vocalic Resulting Classification
- 7 innocents falsely classified as terrorists
- 27 correctly classified as innocent
- All “guilty” referred to secondary
- Overall accuracy = 81%; TPR = 100%; TNR = 79%; FPR = 20%; FNR = 0%
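
The rates follow from the confusion matrix. This assumes the 4 imposters from Study 5 are the true positives. All were referred to secondary, so there are no false negatives. The counts are then TP = 4, FN = 0, FP = 7, and TN = 27. A short check (`classification_metrics` is a hypothetical helper):

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard rates from confusion-matrix counts (positive = flagged as guilty)."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "TPR": tp / (tp + fn),   # sensitivity
        "TNR": tn / (tn + fp),   # specificity
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }


metrics = classification_metrics(tp=4, fn=0, fp=7, tn=27)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

This prints accuracy 81.6%, TPR 100.0%, TNR 79.4%, FPR 20.6%, and FNR 0.0%. The slide's 81%, 79%, and 20% match these values truncated to whole percents.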

Eye Fixations on Visa

Date of Birth Results: Correct?

Final Decision Model

Vocalic Resulting Classification
- 3 innocents falsely classified as terrorists; one of these three was actually lying, making it a true positive
- 31 correctly classified as innocent
- All “guilty” referred to secondary
- Overall accuracy = 94.47%; TPR = 100%; TNR = 88.24%; FPR = 5.8% (reduced by 3/4); FNR = 0%

Questions?
Isn’t the voice amazing?