
A Generative Audio-Visual Prosodic Model for Virtual Actors
Modelling Virtual Humans
Cas Laugs (4140613), Max Meijers (6493769)

Authors
Adela Barbulescu and Rémi Ronfard (Inria, the French national research institute for digital science)
Gérard Bailly (GIPSA-lab, University of Grenoble Alpes)

Overview and Motivation
Published in IEEE Computer Graphics and Applications, December 2017
Only a single citation so far

Overview and Motivation
Expressing complex mental states during conversation in animation is hard; it involves:
- Prosody of the voice (rhythm, tone)
- Facial animation
- Gaze
Note: this is NOT speech-to-motion!

Problem Statement
Create a method for generating natural speech and facial animation that:
- expresses various attitudes
- is based on neutral input speech and animation
Note the distinction between attitude and emotion: attitudes are voluntary, emotions involuntary.

Previous Work
Three types of expressive facial animation:
- Text-driven
- Speech-driven
- Expressive conversion
Plus: the SFC model

Previous Work - Types
Text-driven:
- Use text-to-speech (TTS) to obtain speech audio and phoneme durations
- Joint-driven facial animation
- Rule-based approach for attitude
Speech-driven:
- Generate face motion from the speech signal
- Also uses TTS and rule-based semantics

Previous Work - Types
Expressive conversion:
- A mapping function into an emotional space (speech)
- Statistical mapping of motion-capture data (animation)

Previous Work - SFC
Superposition of Functional Contours (SFC):
- By Gérard Bailly (2 citations)
- Generates the prosody of an utterance (intonation, tone, rhythm)
- Based on linguistic and non-linguistic information

Approach
- Train the model on various dramatic performances
- Input: a neutral sentence performance (video + audio) and an attitude label
- Output: an animated 3D character whose speech and animation match the attitude label
- Three main steps

A Visual Example In this video, the attitude labels are shown above the characters

Method
- Models 10 attitudes, taken from Simon Baron-Cohen's Mind Reading project (412 attitudes in the whole corpus)
- Captured by 'semi-professional' actors
More detail on their method follows.

Method
Extend the existing SFC model to generate speech and facial animation:
- Neural networks operating on prosodic features
- Acoustic features
- Visual features
- Virtual syllables (supportive)
A minimal sketch of such a network is given below.
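The paper does not spell out the network architecture, so as a hedged illustration here is a minimal PyTorch-style feed-forward regressor from per-syllable descriptors to prosodic targets; the layer sizes and feature layout are assumptions, not taken from the paper.

```python
# Hypothetical sketch (PyTorch): a small feed-forward network mapping
# per-syllable context descriptors to prosodic targets (pitch, duration,
# energy, blendshape offsets). All dimensions are assumed, not from the paper.
import torch
import torch.nn as nn

class ProsodyNet(nn.Module):
    def __init__(self, n_inputs=16, n_outputs=12, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, hidden),
            nn.Tanh(),
            nn.Linear(hidden, n_outputs),  # e.g. 3 samples x 4 features per syllable
        )

    def forward(self, x):
        return self.net(x)

model = ProsodyNet()
dummy = torch.randn(8, 16)    # batch of 8 syllable descriptors
print(model(dummy).shape)     # torch.Size([8, 12])
```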

Method - Acoustic
Acoustic features:
- Voice pitch contours
- Rhythm
- Energy value
Acoustic features are extracted with Praat (see the sketch below).
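The authors extract features with Praat; as an assumption for illustration, the sketch below uses parselmouth, a Python interface to Praat, to pull pitch and intensity contours (the file name is hypothetical).

```python
# Hedged sketch: pitch and intensity contours via parselmouth (Praat in Python).
import parselmouth

snd = parselmouth.Sound("utterance.wav")     # hypothetical input file

pitch = snd.to_pitch()                       # default time step and pitch range
f0 = pitch.selected_array["frequency"]       # Hz, 0 where unvoiced
pitch_times = pitch.xs()

intensity = snd.to_intensity()               # dB contour, i.e. the "energy value"
energy = intensity.values[0]
energy_times = intensity.xs()

print(f0.shape, energy.shape)
```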

Method - Visual
...and visual features:
- Verbal motion: the shapes the mouth makes to articulate words
- Non-verbal motion: gestures, facial expressions, etc.
- The two are combined linearly by adding them to a 'neutral' motion
- The face is split into upper and lower parts (19 and 29 blendshapes respectively)
A sketch of this combination follows.
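A minimal sketch of the linear combination described above, assuming blendshape weights in [0, 1] and the 19/29 upper/lower split; the array contents are invented for illustration.

```python
# Hedged sketch: verbal + non-verbal blendshape offsets added to a neutral
# pose, with the face split into upper (19) and lower (29) blendshape groups.
import numpy as np

N_UPPER, N_LOWER = 19, 29

neutral = np.zeros(N_UPPER + N_LOWER)                 # neutral pose weights
verbal = np.random.rand(N_UPPER + N_LOWER) * 0.3      # articulation offsets
nonverbal = np.random.rand(N_UPPER + N_LOWER) * 0.2   # expression offsets

# Additive (linear) combination, clamped to the valid blendshape range.
frame = np.clip(neutral + verbal + nonverbal, 0.0, 1.0)

upper_face, lower_face = frame[:N_UPPER], frame[N_UPPER:]
print(upper_face.shape, lower_face.shape)             # (19,) (29,)
```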

Method - Virtual Syllables
Supporting 'virtual' syllables:
- Inserted before and after each utterance (250 ms each)
- Indicate 'talking turns' using non-verbal gestures
A small sketch of this padding is shown below.
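A hedged sketch of padding an utterance's syllable timeline with 250 ms virtual syllables; the timeline representation is an assumption for illustration.

```python
# Hedged sketch: add 250 ms "virtual" syllables before and after an
# utterance, as placeholders for turn-taking gestures.
VIRTUAL_MS = 250

def add_virtual_syllables(syllables):
    """syllables: list of (label, start_ms, end_ms), sorted by time."""
    start = syllables[0][1]
    end = syllables[-1][2]
    lead_in = ("<virtual>", max(start - VIRTUAL_MS, 0), start)  # clamp at 0
    lead_out = ("<virtual>", end, end + VIRTUAL_MS)
    return [lead_in] + syllables + [lead_out]

utterance = [("hel", 0, 180), ("lo", 180, 420)]
print(add_virtual_syllables(utterance))
```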

Method
These features (pitch contour, motion, rhythm, energy) are sampled and reconstructed at 20%, 50%, and 80% of each syllable (see the sketch below).
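A minimal sketch of this per-syllable sampling using linear interpolation; the contour and syllable boundaries are invented for illustration.

```python
# Hedged sketch: sample a feature contour at 20/50/80% of each syllable.
import numpy as np

times = np.linspace(0.0, 1.0, 101)               # s, feature timestamps
contour = 120 + 20 * np.sin(2 * np.pi * times)   # e.g. an F0 contour in Hz

syllables = [(0.0, 0.3), (0.3, 0.65), (0.65, 1.0)]  # (start_s, end_s)
FRACTIONS = (0.2, 0.5, 0.8)

samples = np.array([
    [np.interp(start + f * (end - start), times, contour) for f in FRACTIONS]
    for start, end in syllables
])
print(samples.shape)   # (n_syllables, 3): one triplet per syllable
```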

Experiment
Perceptual test, to assess the perceived expressiveness of the results:
- Short dialogue videos, with only one actor shown
- Participants were asked to identify the attitude

Experiment
Participants were shown three classes of video:
a) Original video
b) Motion-captured animation
c) Animation generated by the method

Experiment
- 26 different short dialogue exchanges, totalling about 20 minutes
- Participants chose 1 of 6 attitudes per exchange
- Some attitudes were actor-exclusive
- 36 evaluations per participant
- 51 (French) participants

Results
- Compare attitude evaluations between video types per participant; consistency across types is the desired outcome
- Hypothesis: no significant differences in results between video classes

Results Hypothesis: no significant differences between classes

Results
(Hypothesis: no significant differences between classes.)
- Male actor: p = 0.43 → fail to reject the null hypothesis; no statistical difference between video types
- Female actor: p < 1e-3 → reject the null hypothesis
- After removing the attitudes 'tender' and 'fascinated': p = 0.32 → fail to reject
A sketch of this kind of test is given below.
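The slides do not name the exact statistical test; a chi-square test of independence over an attitude-response contingency table is one plausible choice, sketched here with invented counts.

```python
# Hedged sketch: chi-square test comparing attitude-label choices across
# video classes. The counts are invented for illustration only.
import numpy as np
from scipy.stats import chi2_contingency

# rows: video classes (original, mocap, generated)
# cols: how often each attitude label was chosen
counts = np.array([
    [30, 12,  8, 10],
    [28, 14,  9,  9],
    [25, 15, 11,  9],
])

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
if p < 0.05:
    print("Reject the null hypothesis: classes differ.")
else:
    print("Fail to reject: no significant difference between classes.")
```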

Critical review

Positives

Results look promising
- Participants recognise most attitudes correctly
- Animation and speech feel 'expressive', if a bit 'stiff'

Large expansion
- A huge jump from SFC to a fully animated character
- Multiple innovations: blinking, gaze, facial animation, rhythm
The authors could have been content with furthering research in one area, but they worked on many fronts at once.

Negatives

Mediocre structuring
- Topic structure and naming are sometimes confusing
- The focus changes within topics without any writing signalling the change
- This makes the paper hard to piece together on a first read

'High-level' explanation lacking
- The paper lacks a solid explanation of how everything comes together
- It is encapsulated in a single image, referenced only once
- It is a good figure, but the text does not follow its structure

Lacking self-reflection
- No analysis of the quality of the animation/speech
- How does the network's output differ from the ground truth? Perfect? Flawed? Unexpected behaviour?
- They present everything in an overly rosy light

Lacking integrity
- Is 'trial and error' a valid way to structure a neural network? What about reproducibility?
- The authors disregard certain attitudes to obtain significant results
- The supposed 'cause' could easily have been corrected
- They present everything in an overly rosy light

Future work
- An innovative project: a huge jump from SFC to fully animated; perhaps too ambitious?
- The training set is quite small; time constraints?
- Focus on improving/studying subsystems (blinking, gaze, etc.)
- Invest in professional actors, also for analysis
Ask about analysis.

Questions / Discussion

Discussion #1 Is the reason for excluding the badly performing attitudes of the female actor valid? If not, what should the authors have done?

Discussion #2 Could making certain attitudes actor-exclusive influence the results?