Building & Applying Emotion Recognition

1 Building & Applying Emotion Recognition
Cristian Canton, Microsoft, @CristianCanton
Anna S. Roth, Microsoft, @AnnaSRoth

2 Microsoft Cognitive Services
Vision: Computer Vision | Emotion | Face | Video
Speech: Custom Recognition | Speech
Language: Bing Spell Check | Language Understanding | Linguistic Analysis | Text Analytics | Web Language Model
Knowledge: Academic Knowledge | Entity Linking | Knowledge Exploration | Recommendations
Search: Bing Autosuggest | Bing Image Search | Bing News Search | Bing Video Search | Bing Web Search
We’re hiring!

3 Goals
Emotion as a subjective problem
Building an image classifier end-to-end

4 The Recipe
Data collection | Tagging | Aggregation | Data preprocessing | Architecture selection | Cost function | Training

5 “Emotion”

6 Microsoft Confidential - Internal Only
1) Sample 2) Why 3) Switch to demo

7 Basic Emotions
Neutral | Happiness | Surprise | Sadness | Anger | Contempt | Disgust | Fear
Machine Learning, Analytics, & Data Science Conference, 8/7/2018 2:13 AM. © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

8

9 FACS
Emotion | Action Units
Happiness | 6+12
Sadness | 1+4+15
Surprise | 1+2+5B+26
Fear |
Anger |
Disgust |
Contempt | R12A+R14A

10 CIRCUMPLEX
[Figure: circumplex model. Axes: arousal (low to high) and valence (negative to positive).]

11 Lots of models of emotion
PAD | Lövheim Cube | Plutchik wheel
Lövheim Cube image is CC-BY-SA-4.0 from Wikimedia user “Fred The Oyster”. Plutchik wheel image public domain from:

12 Basic Emotions
Neutral | Happiness | Surprise | Sadness | Anger | Contempt | Disgust | Fear

13 Subjective
Dog image is CC-BY-SA-4.0 from Wikimedia user “Edmontcz”

14 Subjective
Your cat may be dead or alive, but it’s still a cat

15 Very Subjective
Your cat may be dead or alive, but it’s still a cat

16 Other Subjective Problems
Attractiveness | Personality traits | Style

17 The Recipe
Data collection | Tagging | Data preprocessing | Architecture selection | Aggregation | Cost function | Training

18 FER Data – used for early academic work
28k training images, 7k validation + test images
71.73% with augmentation

19 In-house Data Collection
4.5 million web-crawled images
Emotional keywords
Names
Images preprocessed with a face detector before tagging

20 The Recipe
Data collection | Tagging | Data preprocessing | Architecture selection | Aggregation | Cost function | Training

21 Tagging
FACS: More accurate and less subjective. Easy to expand to more emotions. Con: expensive and requires a certified tagger.
Appearance based: Cheap and doesn’t require a certified tagger. Con: crowdsourcing is very noisy.

22 Crowd Sourced Tagging
Each tagger chooses one of the 8 emotions, "unknown", or "not a face".
We started by requiring at least 2 taggers to agree, with up to 5 taggers. Quality was very bad, especially for subtle emotions.
We retagged all our data with 10 taggers. Quality improved drastically (detailed next).
Even after using gold standard, amount of time taken by each tagger…etc.
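
The agreement rule described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' tooling: the helper names `tally_votes` and `agreed_label`, the lowercase label spellings, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter

# The eight emotions from the "Basic Emotions" slide, plus the two escape
# options the tagging slide mentions. Spellings are an assumption.
EMOTIONS = ["neutral", "happiness", "surprise", "sadness",
            "anger", "contempt", "disgust", "fear"]
ESCAPES = ["unknown", "not a face"]


def tally_votes(votes):
    """Count how many taggers chose each label (hypothetical helper)."""
    allowed = set(EMOTIONS) | set(ESCAPES)
    for vote in votes:
        if vote not in allowed:
            raise ValueError(f"unexpected label: {vote!r}")
    return Counter(votes)


def agreed_label(votes, min_agree=2):
    """Return the most-voted label if at least `min_agree` taggers chose it.

    Returns None when there are no votes or no label reaches the bar.
    Ties go to the label that was seen first, an arbitrary choice.
    """
    counts = tally_votes(votes)
    if not counts:
        return None
    label, n = counts.most_common(1)[0]
    return label if n >= min_agree else None


# Ten taggers, matching the retagging pass described on the slide.
votes = ["happiness"] * 5 + ["surprise"] * 4 + ["fear"]
print(tally_votes(votes))
print(agreed_label(votes))
```

Keeping the full tally rather than only the winning label is what lets later steps treat the votes as a distribution.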

23 How many taggers do we need?
3 taggers: 45.4
4 taggers: 59
5 taggers: 66.4
6 taggers: 74.2
7 taggers: 80.6
8 taggers: 85.8
9 taggers: 92.6

24 FER++
https://github.com/Microsoft/FERPlus

25 Unreliable?

26 The Recipe
Data collection | Tagging | Data preprocessing | Architecture selection | Aggregation | Cost function | Training

27 Input data
Face detection

28 Input data

29 The Recipe
Input Data | Data Preprocessing | DNN Architecture | Cost Function | Training
Data collection | Tagging | Data preprocessing | Architecture selection | Aggregation | Cost function | Training

30 Data pre-processing
Present the data in a more or less homogeneous way to the system

31 Data pre-processing
Present the data in a more or less homogeneous way to the system
Reduce variability of the input data exploiting any known characteristics

32 Data pre-processing
Present the data in a more or less homogeneous way to the system
Reduce variability of the input data exploiting any known characteristics
In our case:
- Grayscale conversion
- Image cropping and scaling to the input size
- No frontalization (DeepFace: Ranzato et al.; Taigman et al., 2014)
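
As a rough illustration of the three steps listed for "our case" (grayscale conversion, cropping, scaling to the input size), here is a dependency-free sketch on nested lists. The talk does not say which grayscale weights or interpolation were used: the BT.601 luma weights, nearest-neighbour resizing, and the `face_box` tuple layout below are assumptions, and all function names are hypothetical.

```python
def to_grayscale(rgb):
    """Convert rows of (r, g, b) pixels to rows of luma values.

    Uses the common BT.601 weights; this choice is an assumption.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]


def crop(img, top, left, height, width):
    """Cut out a height x width window whose corner is at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def resize_nearest(img, out_h, out_w):
    """Scale to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def preprocess(rgb, face_box, size):
    """Grayscale, crop to the detected face, and scale to size x size.

    `face_box` is (top, left, height, width), as a face detector might
    return it; the layout is hypothetical.
    """
    top, left, height, width = face_box
    gray = to_grayscale(rgb)
    face = crop(gray, top, left, height, width)
    return resize_nearest(face, size, size)


# A 4x4 white image with a 2x2 "face" box, scaled up to 4x4.
image = [[(255, 255, 255)] * 4 for _ in range(4)]
patch = preprocess(image, (1, 1, 2, 2), 4)
print(len(patch), len(patch[0]))
```

In practice an image library would do this work; the point is only that every face reaches the network in the same size and colour space.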

33 Data pre-processing: Augmentation
Rotation

34 Data pre-processing: Augmentation
Translation

35 Data pre-processing: Augmentation
Scaling

36 Data pre-processing: Augmentation
Flip
Other augmentations:
- Affine, projective transformations, lens distortion
- Noise
- Be creative
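
Two of the augmentations named on these slides, flip and translation, can be sketched on a nested-list image as below. This is an illustrative assumption of how they might be applied at training time: the function names, the zero fill value, the 50% flip probability, and the shift range are not from the talk.

```python
import random


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def translate(img, dx, dy, fill=0):
    """Shift the image by (dx, dy) pixels, padding exposed pixels with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def random_augment(img, rng, max_shift=2):
    """Apply a random flip and a random small translation.

    `rng` is a random.Random instance so that runs can be reproduced.
    """
    if rng.random() < 0.5:
        img = flip_horizontal(img)
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return translate(img, dx, dy)


sample = [[1, 2], [3, 4]]
print(flip_horizontal(sample))   # [[2, 1], [4, 3]]
print(translate(sample, 1, 0))   # [[0, 1], [0, 3]]
```

Rotation and scaling follow the same pattern but need interpolation, so they are left out of this sketch.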

37 The Recipe
Data collection | Tagging | Data preprocessing | Architecture selection | Aggregation | Cost function | Training

38 DNN Architecture
It is very difficult to predict the performance of a given DNN architecture for a particular problem.
Explored several deep architectures: VGG16, VGG19, ResNet-50, ResNet-101
Commodity architectures

39 The Recipe
Data collection | Tagging | Data preprocessing | Architecture selection | Aggregation | Cost function | Training

40 Cost Function
Link the information distilled from the tags to the cost function.
Softmax and entropy
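
For reference, a minimal sketch of a softmax output layer paired with a cross-entropy cost, which is the usual reading of "softmax and entropy". The talk gives no formulas, so the function names, the `eps` guard, and the example numbers are assumptions.

```python
import math


def softmax(logits):
    """Turn raw network outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a prediction.

    `eps` avoids log(0); its value is an arbitrary small constant.
    """
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Target built from tagger votes (for example 5, 4, 1 out of 10).
target = [0.5, 0.4, 0.1]
predicted = softmax([2.0, 1.5, 0.1])
print(cross_entropy(target, predicted))
```

Because the target is allowed to be a full distribution rather than a one-hot vector, the same cost works whether the tags are collapsed to a single label or kept as vote fractions.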

41 Emotion Probability Distribution
Happiness: 5 | Surprise: 4 | Fear: 1
- Majority Voting (MV): each face is associated with one emotion, the one that has the majority vote.
- Multi-Label Learning (ML): all emotions above a certain threshold are treated as valid emotions.
- Probabilistic Drawing (PLD): during training, draw the target emotion according to its probability.
- Cross-entropy loss (CEL): learn the actual probability distribution.
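
To make the four schemes concrete, the sketch below builds a training target from the slide's example counts (Happiness 5, Surprise 4, Fear 1) under each one. It only illustrates how the targets differ; the function names and the 0.3 threshold for ML are assumptions, and the actual loss used with each target is not shown.

```python
import random


def to_distribution(counts):
    """Normalize vote counts into probabilities."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_vote_target(counts):
    """MV: one-hot target on the label with the most votes."""
    winner = max(counts, key=counts.get)
    return {label: 1.0 if label == winner else 0.0 for label in counts}


def multi_label_target(counts, threshold=0.3):
    """ML: every label whose vote share exceeds `threshold` counts as valid."""
    dist = to_distribution(counts)
    return {label for label, p in dist.items() if p > threshold}


def probabilistic_draw_target(counts, rng):
    """PLD: draw one label at random, weighted by its votes (redo each epoch)."""
    labels = list(counts)
    weights = [counts[label] for label in labels]
    winner = rng.choices(labels, weights=weights, k=1)[0]
    return {label: 1.0 if label == winner else 0.0 for label in counts}


def cross_entropy_target(counts):
    """CEL: use the full vote distribution as the target."""
    return to_distribution(counts)


counts = {"happiness": 5, "surprise": 4, "fear": 1}
print(majority_vote_target(counts))
print(multi_label_target(counts))
print(probabilistic_draw_target(counts, random.Random(0)))
print(cross_entropy_target(counts))
```

With these counts, MV discards the four "surprise" votes entirely, while ML, PLD, and CEL each keep some of that information in a different form.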

42 Emotion Probability Distribution: Training result (on FER+)
Scheme | Accuracy
MV | 83.85±0.63%
ML | 83.97±0.36%
PLD | 84.99±0.37%
CEL | 84.72±0.24%

43 The Recipe
Data collection | Tagging | Data preprocessing | Architecture selection | Aggregation | Cost function | Training

44 The Recipe
Data collection | Tagging | Aggregation | Data preprocessing | Architecture selection | Cost function | Training
... Future work??

45 Video

46 Emotion in Video
Difficulties:
- Temporal component of expressions
- Necessity to track the face over time
- Data tagging
Potential approaches:
- Frame-by-frame analysis + temporal aggregation
- Fully train an RNN or LSTM (data hungry!)
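
The first approach, frame-by-frame analysis plus temporal aggregation, could look roughly like the sketch below. The slide does not say which aggregation was meant, so the trailing moving average, the window size, and the function names are assumptions; the per-frame probabilities are made-up example values standing in for a classifier's output on a tracked face.

```python
def smooth_probabilities(frames, window=3):
    """Average each frame's probability vector with up to window-1 earlier frames."""
    smoothed = []
    for i in range(len(frames)):
        start = max(0, i - window + 1)
        chunk = frames[start:i + 1]
        n = len(chunk)
        smoothed.append([sum(col) / n for col in zip(*chunk)])
    return smoothed


def label_per_frame(frames, labels, window=3):
    """Pick the most probable label for each frame after smoothing."""
    result = []
    for probs in smooth_probabilities(frames, window):
        best = max(range(len(probs)), key=probs.__getitem__)
        result.append(labels[best])
    return result


labels = ["happiness", "neutral"]
frames = [[0.9, 0.1], [0.2, 0.8], [0.9, 0.1]]  # the middle frame is an outlier
print(label_per_frame(frames, labels, window=1))
print(label_per_frame(frames, labels, window=3))
```

Smoothing suppresses single-frame flickers at the cost of reacting more slowly to genuine changes in expression, which is one reason a trained sequence model is listed as the alternative.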

47 Multimodal Future

48 Multimodal Emotion
Combine audio+video in sequences to improve the recognition rate of emotions
Combine audio+text to improve the recognition rate

49 Microsoft Cognitive Services
Vision: Computer Vision | Emotion | Face | Video
Speech: Custom Recognition | Speech
Language: Bing Spell Check | Language Understanding | Linguistic Analysis | Text Analytics | Web Language Model
Knowledge: Academic Knowledge | Entity Linking | Knowledge Exploration | Recommendations
Search: Bing Autosuggest | Bing Image Search | Bing News Search | Bing Video Search | Bing Web Search
We’re hiring!

