Building & Applying Emotion Recognition

Presentation transcript:

Building & Applying Emotion Recognition. Cristian Canton, Microsoft (@CristianCanton); Anna S. Roth, Microsoft (@AnnaSRoth)

Microsoft Cognitive Services (We’re hiring!) Vision: Computer Vision | Emotion | Face | Video. Speech: Custom Recognition | Speech. Language: Bing Spell Check | Language Understanding | Linguistic Analysis | Text Analytics | Web Language Model. Knowledge: Academic Knowledge | Entity Linking | Knowledge Exploration | Recommendations. Search: Bing Autosuggest | Bing Image Search | Bing News Search | Bing Video Search | Bing Web Search.

Goals: Emotion as a subjective problem; Building an image classifier end-to-end

The Recipe: Data collection, Tagging, Aggregation, Data preprocessing, Architecture selection, Cost function, Training

“Emotion”

1) Sample 2) Why 3) Switch to demo

Machine Learning, Analytics, & Data Science Conference 8/7/2018 2:13 AM. Basic Emotions: Neutral, Happiness, Surprise, Sadness, Anger, Contempt, Disgust, Fear. © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

FACS
Emotion: Action Units
Happiness: 6+12
Sadness: 1+4+15
Surprise: 1+2+5B+26
Fear: 1+2+4+5+7+20+26
Anger: 4+5+7+23
Disgust: 9+15+16
Contempt: R12A+R14A
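The FACS table can be read as a lookup from each emotion to its prototype set of action units (AUs). The sketch below is only an illustration: it assumes some upstream AU detector (not described in the talk) returns AU codes as strings, keeps the intensity and side markers (5B, R12A, R14A) exactly as the slide writes them, and uses simple subset matching, which is an assumption rather than the presenters' method.

```python
# Prototype action-unit (AU) combinations, copied from the FACS slide.
FACS_PROTOTYPES = {
    "Happiness": {"6", "12"},
    "Sadness": {"1", "4", "15"},
    "Surprise": {"1", "2", "5B", "26"},
    "Fear": {"1", "2", "4", "5", "7", "20", "26"},
    "Anger": {"4", "5", "7", "23"},
    "Disgust": {"9", "15", "16"},
    "Contempt": {"R12A", "R14A"},
}


def matching_emotions(detected_aus):
    """Return emotions whose full prototype is contained in detected_aus.

    Subset matching is an assumption made for illustration only.
    """
    detected = set(detected_aus)
    return sorted(e for e, aus in FACS_PROTOTYPES.items() if aus <= detected)


# Hypothetical detector output: AUs 6 and 12 plus an unrelated AU 25.
print(matching_emotions(["6", "12", "25"]))
```

A lookup like this is only as good as the AU annotations feeding it, which is why this route needs certified taggers.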

Circumplex: Arousal axis (High to Low), Valence axis (Negative to Positive)

Lots of models of emotion: PAD, Lövheim Cube, Plutchik wheel. Lövheim cube image is CC-BY-SA-4.0 from Wikimedia user “Fred The Oyster”: https://en.wikipedia.org/wiki/File:L%C3%B6vheim_cube_of_emotion.svg Plutchik wheel image is public domain, from: https://en.wikipedia.org/wiki/File:Plutchik-wheel.svg

Basic Emotions: Neutral, Happiness, Surprise, Sadness, Anger, Contempt, Disgust, Fear

Subjective. Dog image is CC-BY-SA-4.0 from Wikimedia user “Edmontcz”

Subjective Your cat may be dead or alive, but it’s still a cat

Very Subjective Your cat may be dead or alive, but it’s still a cat

Other Subjective Problems: Attractiveness, Personality traits, Style

The Recipe: Data collection, Tagging, Aggregation, Data preprocessing, Architecture selection, Cost function, Training

FER Data: used for early academic work. 28k training images, 7k validation + test images. 71.73% with augmentation. http://www-etud.iro.umontreal.ca/~goodfeli/fer2013.html

In-house Data Collection: 4.5 million web-crawled images; Emotional keywords; Names; Images preprocessed with a face detector before tagging

The Recipe: Data collection, Tagging, Aggregation, Data preprocessing, Architecture selection, Cost function, Training

Tagging. FACS: More accurate and less subjective. Easy to expand to more emotions. Con: Expensive and requires a certified tagger. Appearance based: Cheap and doesn’t require a certified tagger. Con: Crowdsourcing is very noisy.

Crowd Sourced Tagging: Each tagger chooses one of the 8 emotions, “unknown”, or “not a face”. We started by requiring at least 2 taggers to agree, using up to 5 taggers. Quality was very bad, especially with subtle emotions. We retagged all our data with 10 taggers. Quality improved drastically (detailed next). Even after using gold standard, amount of time taken by each tagger…etc.
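The tallying step described on this slide might be sketched as follows. The option names “unknown” and “not a face” come from the slide; the `min_agree` parameter and the rule of discarding an image when a non-emotion option wins are assumptions made for illustration, not the presenters' documented pipeline.

```python
from collections import Counter

# Options a tagger can pick that are not emotions (names from the slide).
NON_EMOTION = {"unknown", "not a face"}


def tally_votes(votes, min_agree=2):
    """Count the tags for one image and apply a simple agreement rule.

    Returns (counts, label). label is None when the top choice is a
    non-emotion option or when fewer than min_agree taggers chose it.
    """
    counts = Counter(votes)
    top, n = counts.most_common(1)[0]
    if top in NON_EMOTION or n < min_agree:
        return counts, None
    return counts, top


# Ten hypothetical taggers looking at one image.
votes = ["happiness"] * 5 + ["surprise"] * 4 + ["fear"]
counts, label = tally_votes(votes)
print(dict(counts), label)
```

Keeping the full `counts` alongside the single label matters later, because the per-emotion counts can be used as a distribution instead of being collapsed to one tag.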

How many taggers do we need?
3 taggers: 45.4
4 taggers: 59
5 taggers: 66.4
6 taggers: 74.2
7 taggers: 80.6
8 taggers: 85.8
9 taggers: 92.6

FER+: https://github.com/Microsoft/FERPlus

Unreliable?

The Recipe: Data collection, Tagging, Aggregation, Data preprocessing, Architecture selection, Cost function, Training

Input data Face detection

Input data

The Recipe: Data collection, Tagging, Aggregation, Data preprocessing, Architecture selection, Cost function, Training. Pipeline: Input Data, Data Preprocessing, DNN Architecture, Cost Function, Training

Data pre-processing: Present the data to the system in a more or less homogeneous way. Reduce the variability of the input data by exploiting any known characteristics. In our case: Grayscale conversion; Image cropping and scaling to the input size; No frontalization (DeepFace: Ranzato et al.; Taigman et al., 2014)
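The three steps listed for this case (grayscale conversion, cropping to the detected face, scaling to the input size) could look roughly like the sketch below. It is a dependency-free illustration only: the image is a nested list of (r, g, b) tuples, the BT.601 luma weights, nearest-neighbour scaling, the `face_box` layout and the default `input_size=64` are all assumptions, since the talk does not give these details.

```python
def to_gray(rgb):
    """Convert rows of (r, g, b) pixels to rows of luma values (BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]


def crop(img, top, left, height, width):
    """Cut out the face box reported by a face detector."""
    return [row[left:left + width] for row in img[top:top + height]]


def resize_nearest(img, out_h, out_w):
    """Scale to the network input size with nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def preprocess(rgb, face_box, input_size=64):
    """Grayscale, crop to face_box = (top, left, height, width), then scale."""
    top, left, height, width = face_box
    face = crop(to_gray(rgb), top, left, height, width)
    return resize_nearest(face, input_size, input_size)


# Hypothetical 4x4 white image with a 2x2 face box, scaled up to 4x4.
image = [[(255, 255, 255)] * 4 for _ in range(4)]
out = preprocess(image, (1, 1, 2, 2), input_size=4)
print(len(out), len(out[0]))
```

In practice an image library would replace the nested lists, but the order of operations stays the same.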

Data pre-processing: Augmentation Rotation

Data pre-processing: Augmentation Translation

Data pre-processing: Augmentation Scaling

Data pre-processing: Augmentation: Flip. Other augmentations: affine and projective transformations, lens distortion; noise; be creative
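Two of the augmentations named in these slides, flip and translation, are simple enough to sketch without any image library. This is an illustration under stated assumptions: images are nested lists of grayscale values, `max_shift` and the zero `fill` value are hypothetical choices, and rotation and scaling are left out because they need interpolation.

```python
import random


def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def translate(img, dy, dx, fill=0):
    """Shift the image by (dy, dx) pixels, padding uncovered pixels with fill."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def augment(img, rng, max_shift=2):
    """Apply a random flip and a random small translation."""
    if rng.random() < 0.5:
        img = flip_horizontal(img)
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    return translate(img, dy, dx)


# A seeded generator keeps the augmentation reproducible across runs.
sample = augment([[1, 2], [3, 4]], random.Random(0))
print(sample)
```

Passing the random generator in explicitly makes it easy to reproduce a training run or to test each transform in isolation.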

The Recipe: Data collection, Tagging, Aggregation, Data preprocessing, Architecture selection, Cost function, Training

DNN Architecture: It is very difficult to predict the performance of a given DNN architecture for a particular problem. Explored several deep architectures: VGG16, VGG19, ResNet-50, ResNet-101. Commodity architectures.

The Recipe: Data collection, Tagging, Aggregation, Data preprocessing, Architecture selection, Cost function, Training

Cost Function: Links the information distilled from the tags to the cost function. Softmax and cross-entropy.
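For reference, softmax and cross-entropy can be written in a few lines. This is a generic sketch of the two standard formulas, not the presenters' training code; the `eps` guard against `log(0)` is an added assumption.

```python
import math


def softmax(logits):
    """Turn raw network outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Hypothetical logits for three classes and a soft target over the same classes.
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy([0.5, 0.4, 0.1], probs)
print(probs, loss)
```

Because `target` may be any probability distribution, the same function covers both one-hot labels and soft labels derived from several taggers.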

Emotion Probability Distribution. Example: Happiness 5, Surprise 4, Fear 1. Majority Voting (MV): Each face is associated with one emotion, the one that has the majority vote. Multi-Label Learning (ML): All emotions above a certain threshold are treated as valid emotions. Probabilistic Drawing (PLD): During training, draw the target emotion according to its probability. Cross-entropy loss (CEL): Learn the actual probability distribution.
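The four schemes differ only in how the tagger votes become a training target. The sketch below builds each target from the slide's example, reading “Happiness 5, Surprise 4, Fear 1” as vote counts from 10 taggers (an assumption). The emotion ordering, the `threshold=0.3` value and the function names are hypothetical, and the loss applied to each target is not shown.

```python
import random

EMOTIONS = ["neutral", "happiness", "surprise", "sadness",
            "anger", "contempt", "disgust", "fear"]


def distribution(counts):
    """Normalize per-emotion vote counts into a probability distribution."""
    total = sum(counts.values())
    return [counts.get(e, 0) / total for e in EMOTIONS]


def majority_vote(counts):
    """MV: one-hot target on the emotion with the most votes."""
    probs = distribution(counts)
    best = probs.index(max(probs))
    return [1.0 if i == best else 0.0 for i in range(len(EMOTIONS))]


def multi_label(counts, threshold=0.3):
    """ML: every emotion whose vote share exceeds threshold is valid."""
    return [1.0 if p > threshold else 0.0 for p in distribution(counts)]


def probabilistic_draw(counts, rng):
    """PLD: sample one target emotion according to the vote distribution."""
    probs = distribution(counts)
    drawn = rng.choices(range(len(EMOTIONS)), weights=probs, k=1)[0]
    return [1.0 if i == drawn else 0.0 for i in range(len(EMOTIONS))]


def soft_target(counts):
    """CEL: use the full distribution as the cross-entropy target."""
    return distribution(counts)


votes = {"happiness": 5, "surprise": 4, "fear": 1}
print(majority_vote(votes))
print(multi_label(votes))
print(probabilistic_draw(votes, random.Random(0)))
print(soft_target(votes))
```

With this example, majority voting discards the four “surprise” votes entirely, while the other three schemes keep some of that information in the target.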

Emotion Probability Distribution: Training result (on FER+)
Scheme: Accuracy
MV: 83.85±0.63%
ML: 83.97±0.36%
PLD: 84.99±0.37%
CEL: 84.72±0.24%

The Recipe: Data collection, Tagging, Aggregation, Data preprocessing, Architecture selection, Cost function, Training

The Recipe: Data collection, Tagging, Aggregation, Data preprocessing, Architecture selection, Cost function, Training ... Future work??

Video

Emotion in Video. Difficulties: Temporal component of expressions; Necessity to track the face over time; Data tagging. Potential approaches: Frame-by-frame analysis + temporal aggregation; Fully train an RNN or LSTM (data hungry!)
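The first approach, frame-by-frame analysis followed by temporal aggregation, can be sketched as a moving average over per-frame probabilities. The slide does not say which aggregation is meant, so the trailing window, the `window=5` default and the clip-level arg max are assumptions; the per-frame probabilities are taken to come from an image classifier run on each tracked face.

```python
def aggregate_frames(frame_probs, window=5):
    """Smooth per-frame emotion probabilities with a trailing moving average."""
    smoothed = []
    for t in range(len(frame_probs)):
        recent = frame_probs[max(0, t - window + 1): t + 1]
        n = len(recent)
        smoothed.append([sum(p[k] for p in recent) / n
                         for k in range(len(frame_probs[0]))])
    return smoothed


def clip_label(frame_probs, labels):
    """Label a whole clip by averaging all frames and taking the arg max."""
    n = len(frame_probs)
    mean = [sum(p[k] for p in frame_probs) / n for k in range(len(labels))]
    return labels[mean.index(max(mean))]


# Three hypothetical frames scored over two emotions.
frames = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
print(aggregate_frames(frames, window=2))
print(clip_label(frames, ["happiness", "sadness"]))
```

Averaging damps single-frame flicker but ignores the order of frames, which is the gap a recurrent model would address at the cost of needing far more tagged video.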

Multimodal Future

Multimodal Emotion: Combine audio+video in sequences to improve the recognition rate of emotions. Combine audio+text to improve the recognition rate.

Microsoft Cognitive Services (We’re hiring!) Vision: Computer Vision | Emotion | Face | Video. Speech: Custom Recognition | Speech. Language: Bing Spell Check | Language Understanding | Linguistic Analysis | Text Analytics | Web Language Model. Knowledge: Academic Knowledge | Entity Linking | Knowledge Exploration | Recommendations. Search: Bing Autosuggest | Bing Image Search | Bing News Search | Bing Video Search | Bing Web Search.