Building & Applying Emotion Recognition
Cristian Canton, Microsoft, @CristianCanton
Anna S. Roth, Microsoft, @AnnaSRoth
Microsoft Cognitive Services (We’re hiring!)
Vision: Computer Vision | Emotion | Face | Video
Speech: Custom Recognition | Speech
Language: Bing Spell Check | Language Understanding | Linguistic Analysis | Text Analytics | Web Language Model
Knowledge: Academic Knowledge | Entity Linking | Knowledge Exploration | Recommendations
Search: Bing Autosuggest | Bing Image Search | Bing News Search | Bing Video Search | Bing Web Search
Goals
- Emotion as a subjective problem
- Building an image classifier end-to-end
The Recipe
- Data collection
- Tagging
- Aggregation
- Data preprocessing
- Architecture selection
- Cost function
- Training
“Emotion”
1) Sample 2) Why 3) Switch to demo
Machine Learning, Analytics, & Data Science Conference 8/7/2018 2:13 AM
Basic Emotions
- Neutral
- Happiness
- Surprise
- Sadness
- Anger
- Contempt
- Disgust
- Fear
© 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
FACS
Emotion     Action Units
Happiness   6+12
Sadness     1+4+15
Surprise    1+2+5B+26
Fear        1+2+4+5+7+20+26
Anger       4+5+7+23
Disgust     9+15+16
Contempt    R12A+R14A
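The action-unit table above can be read as a lookup from AU combinations to emotions. The following is a minimal sketch of that reading, not a FACS coder: the names `FACS_PROTOTYPES`, `au_number`, and `match_emotions` are hypothetical, and ignoring the side and intensity letters (as in "R12A" or "5B") when matching is a simplifying assumption.

```python
# Prototype action-unit (AU) combinations copied from the FACS slide.
# Codes stay as strings because some carry side or intensity letters.
FACS_PROTOTYPES = {
    "Happiness": ["6", "12"],
    "Sadness": ["1", "4", "15"],
    "Surprise": ["1", "2", "5B", "26"],
    "Fear": ["1", "2", "4", "5", "7", "20", "26"],
    "Anger": ["4", "5", "7", "23"],
    "Disgust": ["9", "15", "16"],
    "Contempt": ["R12A", "R14A"],
}


def au_number(code):
    """Keep only the digits of an AU code: 'R12A' -> 12, '5B' -> 5."""
    return int("".join(ch for ch in code if ch.isdigit()))


def match_emotions(active_aus):
    """Hypothetical helper: emotions whose whole AU prototype is active.

    Assumption: side and intensity letters are ignored when matching.
    """
    active = set(active_aus)
    return [
        emotion
        for emotion, codes in FACS_PROTOTYPES.items()
        if {au_number(code) for code in codes} <= active
    ]


if __name__ == "__main__":
    print(match_emotions({6, 12}))
```

Because the Surprise prototype is a subset of the Fear prototype once the intensity letter is dropped, a face showing every Fear action unit matches both emotions, which hints at why expressions are hard to separate.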
Circumplex
- Arousal axis: Low to High
- Valence axis: Negative to Positive
Lots of models of emotion
- PAD
- Lövheim Cube
Lövheim cube image is CC-BY-SA-4.0 from Wikimedia user “Fred The Oyster” - https://en.wikipedia.org/wiki/File:L%C3%B6vheim_cube_of_emotion.svg
Plutchik wheel image public domain from: https://en.wikipedia.org/wiki/File:Plutchik-wheel.svg
Basic Emotions
- Neutral
- Happiness
- Surprise
- Sadness
- Anger
- Contempt
- Disgust
- Fear
Subjective
Dog image is CC-BY-SA-4.0 from Wikimedia user “Edmontcz”
Subjective Your cat may be dead or alive, but it’s still a cat
Very Subjective Your cat may be dead or alive, but it’s still a cat
Other Subjective Problems
- Attractiveness
- Personality traits
- Style
FER Data – used for early academic work
- 28k training, 7k val+test
- 71.73% with augmentation
- http://www-etud.iro.umontreal.ca/~goodfeli/fer2013.html
In-house Data Collection
- 4.5 million web-crawled images
- Emotional keywords
- Names
- Images preprocessed with a face detector before tagging
Tagging
FACS:
- More accurate and less subjective.
- Easy to expand to more emotions.
- Con: Expensive and requires a certified tagger.
Appearance based:
- Cheap and doesn’t require a certified tagger.
- Con: Crowdsourcing is very noisy.
Crowd Sourced Tagging
- Each tagger chooses one of the 8 emotions, "unknown", or "not a face".
- We started by requiring at least 2 taggers to agree, with up to 5 taggers per image.
- Quality was very bad, especially with subtle emotions.
- We retagged all our data with 10 taggers.
- Quality improved drastically (detailed next).
- Even after using gold standard, amount of time taken by each tagger…etc.
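The "at least 2 taggers agree" rule above could look roughly like the sketch below. It is an illustration only: the function name `aggregate_tags`, the `min_agree` parameter, the tie handling, and the rule that drops an image when more than half of its tags are "unknown" or "not a face" are assumptions, not details given on the slide.

```python
from collections import Counter

EMOTIONS = [
    "neutral", "happiness", "surprise", "sadness",
    "anger", "contempt", "disgust", "fear",
]
REJECT_TAGS = {"unknown", "not a face"}


def aggregate_tags(tags, min_agree=2):
    """Return the agreed emotion for one image, or None if there is none.

    Assumptions (not from the slide): images are dropped when more than
    half of the tags are reject tags, and ties for first place count as
    "no agreement".
    """
    counts = Counter(tags)
    rejected = sum(counts[tag] for tag in REJECT_TAGS)
    if 2 * rejected > len(tags):
        return None
    votes = {e: counts[e] for e in EMOTIONS if counts[e] > 0}
    if not votes:
        return None
    top = max(votes.values())
    winners = [e for e, n in votes.items() if n == top]
    if top < min_agree or len(winners) > 1:
        return None
    return winners[0]


if __name__ == "__main__":
    print(aggregate_tags(["happiness", "happiness", "surprise"]))
```

With only a handful of taggers, ties and single votes are common, so many images end up without a usable label under a rule like this.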
How many taggers do we need?
- 3 taggers: 45.4
- 4 taggers: 59
- 5 taggers: 66.4
- 6 taggers: 74.2
- 7 taggers: 80.6
- 8 taggers: 85.8
- 9 taggers: 92.6
FER+
https://github.com/Microsoft/FERPlus
Unreliable?
Input data Face detection
Input data
The Recipe
Input Data -> Data Preprocessing -> DNN Architecture -> Cost Function -> Training
Data pre-processing
- Present the data to the system in a more or less homogeneous way
- Reduce the variability of the input data by exploiting any known characteristics
- In our case:
  - Grayscale conversion
  - Image cropping and scaling to the input size
  - No frontalization (DeepFace: Taigman, Yang, Ranzato, and Wolf, 2014)
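The three steps listed for "our case" (grayscale conversion, cropping, scaling) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the pipeline used for the talk: images are nested lists, the face box is assumed to come from an external face detector, the BT.601 luma weights and nearest-neighbour resizing are choices made here, and the `size` argument stands in for the network input size, which the slide does not give.

```python
def to_grayscale(rgb):
    """Rows of (r, g, b) tuples in 0..255 -> rows of ints (BT.601 luma)."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb
    ]


def crop(img, top, left, height, width):
    """Cut out a box, clamping the top-left corner to the image."""
    bottom, right = top + height, left + width
    top, left = max(0, top), max(0, left)
    return [row[left:right] for row in img[top:bottom]]


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a non-empty grayscale image."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def preprocess(rgb, box, size):
    """Grayscale, crop to the face box (top, left, height, width), rescale.

    `box` is assumed to come from a face detector; `size` is a placeholder
    for the network input size.
    """
    top, left, height, width = box
    face = crop(to_grayscale(rgb), top, left, height, width)
    return resize_nearest(face, size, size)


if __name__ == "__main__":
    image = [[(x * 20, x * 20, x * 20) for x in range(4)] for _ in range(4)]
    print(preprocess(image, box=(1, 1, 2, 2), size=4))
```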
Data pre-processing: Augmentation Rotation
Data pre-processing: Augmentation Translation
Data pre-processing: Augmentation Scaling
Data pre-processing: Augmentation
Flip
Other augmentations:
- Affine, projective transformations, lens distortion
- Noise
- Be creative
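Rotation, translation, scaling, and flip can all be expressed as one warp of the image. The sketch below does this with inverse mapping and nearest-neighbour sampling on nested lists. The function names `warp` and `random_augment` and all parameter ranges are assumptions chosen for illustration; the slides do not state the ranges actually used.

```python
import math
import random


def warp(img, angle_deg=0.0, scale=1.0, tx=0.0, ty=0.0, flip=False, fill=0):
    """Flip horizontally, rotate and scale about the center, then translate.

    Each output pixel looks up its source pixel (inverse mapping) with
    nearest-neighbour sampling; pixels mapped from outside get `fill`.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Undo the translation, then the rotation and scaling.
            dx, dy = x - cx - tx, y - cy - ty
            sx = (cos_a * dx + sin_a * dy) / scale + cx
            sy = (-sin_a * dx + cos_a * dy) / scale + cy
            if flip:
                sx = (w - 1) - sx
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def random_augment(img, rng, max_angle=10.0, max_shift=2.0,
                   scale_range=(0.9, 1.1)):
    """Draw one random augmentation; all ranges are placeholder values."""
    return warp(
        img,
        angle_deg=rng.uniform(-max_angle, max_angle),
        scale=rng.uniform(*scale_range),
        tx=rng.uniform(-max_shift, max_shift),
        ty=rng.uniform(-max_shift, max_shift),
        flip=rng.random() < 0.5,
    )


if __name__ == "__main__":
    sample = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(warp(sample, flip=True))
    print(random_augment(sample, random.Random(0)))
```

Applying a fresh random draw every time an image is presented means the network rarely sees exactly the same pixels twice.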
DNN Architecture
- It is very difficult to predict the performance of a given DNN architecture for a particular problem
- Explored several deep architectures: VGG16, VGG19, ResNet-50, ResNet-101
- Commodity architectures
Cost Function
- Link the information distilled from the tags to the cost function.
- Softmax and cross-entropy
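For reference, softmax turns the network's raw scores into a probability distribution, and cross-entropy measures how far that distribution is from the target derived from the tags. A minimal sketch in plain Python follows; the function names and the `eps` clamp are choices made here.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (max-shifted for stability)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """-sum(t * log(p)); `eps` guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


if __name__ == "__main__":
    probs = softmax([2.0, 1.0, 0.1])
    print(probs)
    print(cross_entropy([1.0, 0.0, 0.0], probs))
```

The target does not have to be one-hot: passing a soft distribution as `target` is what lets the tag statistics flow into the loss.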
Emotion Probability Distribution
Happiness 5, Surprise 4, Fear 1
- Majority Voting (MV): Each face is associated with one emotion, the one that has the majority vote.
- Multi-Label Learning (ML): All emotions above a certain threshold are treated as valid emotions.
- Probabilistic Drawing (PLD): During training, draw the target emotion according to its probability.
- Cross-entropy loss (CEL): Learn the actual probability distribution.
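To make the four schemes concrete, the sketch below builds the training target each one would produce for the example on the slide, reading "Happiness 5, Surprise 4, Fear 1" as vote counts from ten taggers (an assumption). The function names and the 0.3 threshold for ML are hypothetical; the slide does not give the threshold used.

```python
import random


def distribution(counts):
    """Normalize vote counts into a probability distribution."""
    total = sum(counts.values())
    return {emotion: n / total for emotion, n in counts.items()}


def majority_vote(counts):
    """MV: one-hot target on the most-voted emotion."""
    winner = max(counts, key=counts.get)
    return {emotion: 1.0 if emotion == winner else 0.0 for emotion in counts}


def multi_label(counts, threshold=0.3):
    """ML: every emotion whose vote share exceeds `threshold` is valid.

    The threshold value is a placeholder, not taken from the talk.
    """
    shares = distribution(counts)
    return {emotion for emotion, p in shares.items() if p > threshold}


def probabilistic_draw(counts, rng):
    """PLD: sample one target emotion in proportion to its votes."""
    emotions = list(counts)
    weights = [counts[emotion] for emotion in emotions]
    return rng.choices(emotions, weights=weights, k=1)[0]


def cross_entropy_target(counts):
    """CEL: the full vote distribution is the soft target."""
    return distribution(counts)


if __name__ == "__main__":
    votes = {"happiness": 5, "surprise": 4, "fear": 1}
    print(majority_vote(votes))
    print(multi_label(votes))
    print(probabilistic_draw(votes, random.Random(0)))
    print(cross_entropy_target(votes))
```

MV discards the four surprise votes entirely, while PLD and CEL both keep them: PLD by sampling a different hard label on each pass, CEL by training directly against the soft distribution.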
Emotion Probability Distribution: Training result (on FER+)
Scheme   Accuracy
MV       83.85±0.63%
ML       83.97±0.36%
PLD      84.99±0.37%
CEL      84.72±0.24%
The Recipe Data collection Tagging Aggregation Data preprocessing Architecture selection Cost function Training ... Future work??
Video
Emotion in Video
Difficulties:
- Temporal component of expressions
- Necessity to track the face over time
- Data tagging
Potential approaches:
- Frame-by-frame analysis + temporal aggregation
- Fully train an RNN or LSTM (data hungry!)
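One simple form of the frame-by-frame approach with temporal aggregation is to average the per-frame probability vectors over a short sliding window before picking a label. The sketch below assumes that an image classifier has already produced one distribution per frame; the function name `smooth_predictions`, the moving-average choice, and the window length are illustrative assumptions, not the method from the talk.

```python
def smooth_predictions(frame_probs, window=5):
    """Label each frame from the mean of the last `window` distributions.

    `frame_probs` is a list of per-frame probability lists, one entry
    per emotion class, in frame order.
    """
    labels = []
    for i in range(len(frame_probs)):
        recent = frame_probs[max(0, i - window + 1): i + 1]
        n_classes = len(recent[0])
        mean = [
            sum(probs[c] for probs in recent) / len(recent)
            for c in range(n_classes)
        ]
        labels.append(max(range(n_classes), key=mean.__getitem__))
    return labels


if __name__ == "__main__":
    frames = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1]]
    print(smooth_predictions(frames, window=1))
    print(smooth_predictions(frames, window=3))
```

With `window=1` the third frame flips to the second class on its own; with `window=3` the single-frame flicker is averaged away. The trade-off is that a longer window also delays the response to genuine changes in expression.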
Multimodal Future
Multimodal Emotion
- Combine audio+video in sequences to improve the emotion recognition rate
- Combine audio+text to improve the recognition rate