Diversity meets Deep Networks: Inference, Ensembles, and Applications


1 Diversity meets Deep Networks: Inference, Ensembles, and Applications
Alexander Kirillov, Bogdan Savchynskyy, Carsten Rother, Stefan Lee, Dhruv Batra
Technische Universität Dresden, Indiana University, Virginia Tech

2 Schedule
2:15 – 3:00  Opening Remarks + Need for Multiple Diverse Solutions (Dhruv)
3:00 – 3:15  Coffee Break
3:15 – 4:45  Generating Diverse Solutions from a Single Model (Alex & Bogdan)
4:45 – 5:00
5:00 – 5:45  Training Diverse Deep Ensembles (Stefan)
(C) Dhruv Batra

3 Schedule
1. Please interrupt & ask questions!
2. All slides available online.

4 Image Classification
ImageNet Large Scale Visual Recognition Challenge (ILSVRC): 1000 object classes, 1.4M/50k/100k images
Example labels: Person, Dalmatian

5 Image Classification
Image Credit: [He et al. CVPR16]

6 Revolution of Depth
Image Credit: [He et al. CVPR16]

7 Revolution of Depth
Image Credit: [He et al. CVPR16]

8 Image Captioning
Image Credit: [Vinyals et al. CVPR15]

9 Visual Question Answering (VQA)

10 Visual Question Answering (VQA)
Slide Credit: Stan Antol

11 AI far from perfect

12 AI far from perfect

13 A Brief History of AI
Image Credit: Joseph Mehling

14 A Brief History of AI
“We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. … [Our] conjecture is that every aspect of learning or intelligence can be so precisely described that a machine can be made to simulate it. An attempt will be made … to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.”

15 Why is AI hard?
“The three biggest challenges to a computer being able to build knowledge from data are Ambiguity, Ambiguity, Ambiguity.” (Ray Mooney, CS, UT-Austin, quoted in reference to IBM Watson)

16 Linguistic Ambiguity: “I saw her duck”
Image Credit: Liang Huang

17 Linguistic Ambiguity: “I saw her duck”
Image Credit: Liang Huang

18 Linguistic Ambiguity: “I saw her duck”
Image Credit: Liang Huang

19 What is this tutorial about?
Multiple Diverse Predictions in ML and AI

20 Classical Machine Learning
Image Classification: “Person”

21 Classical Machine Learning
Semantic Segmentation

22 Classical Machine Learning
Pose Estimation

23 Classical Machine Learning
Image Captioning: “Two people are petting horses.”

24 Classical Machine Learning
VQA: “How many people are there?” → “2”

25 Classical Machine Learning
Dialogue System: “Count us in!”
Image Credit: Google Research Blog

26 Classical Machine Learning
Input → Output

27 Classical Machine Learning
Input → Output

28 Classical Machine Learning
Classical: Input → Machine Learning → Output
This Tutorial: Input → Machine Learning → Multiple Outputs

29 Example: Segmentation
[Batra et al. ECCV12], [Guzman-Rivera et al. NIPS12], [Yadollahpour et al. CVPR13], [Gimpel et al. EMNLP13], [Guzman-Rivera AISTATS13], [Premachandran et al. CVPR14], [Prasad et al. NIPS14], [Guzman-Rivera et al. AISTATS14], [Sun et al. CVPR15], [Ahmed et al. ICCV15], [Sun et al. NIPS15]
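A minimal sketch of the greedy DivMBest idea from [Batra et al. ECCV12]: find the MAP solution, then repeatedly re-solve MAP on an objective that adds a reward of λ for every pixel whose label differs from each earlier solution (Hamming diversity). To stay short and exact, it assumes a toy model with only per-pixel (unary) scores, so MAP decomposes into an independent argmax per pixel; real segmentation models also have pairwise terms and need a proper MAP solver. The function name `div_m_best` and the parameter `lam` are illustrative.

```python
import numpy as np


def div_m_best(unary, num_solutions, lam):
    """Greedy diverse M-best on a toy unary-only model (illustrative sketch).

    unary: (num_pixels, num_labels) scores, higher is better.
    lam: weight on Hamming dissimilarity to previously found solutions.
    Returns a list of `num_solutions` label vectors of shape (num_pixels,).
    """
    num_pixels = unary.shape[0]
    rows = np.arange(num_pixels)
    solutions = []
    for _ in range(num_solutions):
        augmented = unary.astype(float)  # astype returns a fresh copy
        for prev in solutions:
            # Rewarding "label != prev" by +lam equals (up to a per-pixel
            # constant) penalizing the label that prev chose by -lam.
            augmented[rows, prev] -= lam
        # Without pairwise terms, MAP is an independent argmax per pixel.
        solutions.append(augmented.argmax(axis=1))
    return solutions


unary = np.array([[2.0, 1.5, 0.0],
                  [0.0, 3.0, 2.8]])
for sol in div_m_best(unary, num_solutions=2, lam=1.0):
    print(sol.tolist())  # [0, 1] then [1, 2]
```

With `lam=0` every iteration returns the same MAP labeling; a larger `lam` trades model score for diversity.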

30 Exponentially Large Item Set
Semantic Segmentation

31 Exponentially Large Item Set
Pose Estimation

32 Exponentially Large Item Set
Image Captioning:
“Two people are standing next to two horses.”
“A man pets a horse while a woman looks on.”
“There is a man sitting on a horse.”
“Two people and two horses standing in a field.”

33 Neural Image Captioning
Image Embedding (VGGNet): Convolution Layer + Non-Linearity → Pooling Layer → Fully-Connected MLP → 4096-dim

34 Neural Image Captioning
Image Embedding (VGGNet): Convolution Layer + Non-Linearity → Pooling Layer → Fully-Connected MLP → 4096-dim

35 Neural Image Captioning
The RNN is unrolled over the caption “<start> Two people and two horses.”; at each step it outputs P(next), a distribution over the next word.

36 Beam Search Demo: Classical Beam Search vs. Diverse Beam Search
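The demo contrasts classical beam search with Diverse Beam Search. Below is a simplified sketch of the group-based idea, under stated assumptions: beams are split into groups decoded one after another at each time step, and a group is penalized by `lam` for every earlier group that already chose the same word at that step (Hamming diversity). With one group and `lam=0` it reduces to classical beam search. The bigram table `TOY_LM` is a made-up stand-in for the RNN's P(next), and decoding runs a fixed number of steps with no end token, for brevity; this is not the demo's actual implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[str, ...]

# Hypothetical bigram "language model" standing in for the captioning RNN.
TOY_LM: Dict[str, Dict[str, float]] = {
    "<start>": {"a": math.log(0.6), "the": math.log(0.4)},
    "a": {"train": math.log(0.7), "locomotive": math.log(0.3)},
    "the": {"train": math.log(0.7), "engine": math.log(0.3)},
}


def toy_next_log_probs(seq: Seq) -> Dict[str, float]:
    return TOY_LM[seq[-1]]


def diverse_beam_search(next_log_probs: Callable[[Seq], Dict[str, float]],
                        num_groups: int, beam_per_group: int,
                        steps: int, lam: float) -> List[List[Seq]]:
    # Each group keeps (log_prob, sequence) pairs; all start from <start>.
    groups = [[(0.0, ("<start>",))] for _ in range(num_groups)]
    for _ in range(steps):
        used: Dict[str, int] = {}  # words chosen at this step by earlier groups
        for g in range(num_groups):
            candidates = []
            for logp, seq in groups[g]:
                for word, lp in next_log_probs(seq).items():
                    # Rank by log-prob minus the diversity penalty, but keep the
                    # un-penalized log-prob as the beam's running score.
                    ranked = logp + lp - lam * used.get(word, 0)
                    candidates.append((ranked, logp + lp, seq + (word,)))
            candidates.sort(key=lambda c: c[0], reverse=True)
            groups[g] = [(raw, seq) for _, raw, seq in candidates[:beam_per_group]]
            for _, seq in groups[g]:
                used[seq[-1]] = used.get(seq[-1], 0) + 1
    return [[seq for _, seq in group] for group in groups]


# Classical beam search: one group of width 2, no penalty.
print(diverse_beam_search(toy_next_log_probs, 1, 2, steps=2, lam=0.0))
# Diverse: two groups of width 1 with a strong penalty.
print(diverse_beam_search(toy_next_log_probs, 2, 1, steps=2, lam=10.0))
```

In the toy run, both classical beams end in "train", while the second diverse group is pushed to "the engine".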

37 What is this tutorial about?
[What?]: Multiple Diverse Predictions in ML and AI
[Why?]: Need for Diversity: overcoming poor models; hedging against ambiguity; don’t be boring
[How?]: Techniques: Diverse Solutions from a Single (Deep) Model (Part 1: Alex/Bogdan); Training Diverse Deep Ensembles (Part 2: Stefan)
[Now what?]: What do I do with multiple predictions?

38 What is this tutorial about?
[What?]: Multiple Diverse Predictions in ML and AI
[Why?]: Need for Diversity: overcoming poor models; hedging against ambiguity; don’t be boring
[How?]: Techniques: Diverse Solutions from a Single (Deep) Model (Part 1: Alex/Bogdan); Training Diverse Deep Ensembles (Part 2: Stefan)
[Now what?]: What do I do with multiple predictions?

39 Need #1: Poor Models
Approximation Error: the model class is wrong! Human Body ≠ Tree
Figure Courtesy: [Yang & Ramanan ICCV ’11]
Unfortunately, we often run into a number of problems with MAP. Most often, our model is simply wrong, so even if we predict the most probable state under our model, it could be very far from the ground truth. For example, a tree model assumes that we walk around like this, with our limbs always unoccluded.

40 Need #1: Poor Models
Approximation Error: the model class is wrong!
Image → Embedding (VGGNet: Convolution Layer + Non-Linearity, Pooling Layer, Fully-Connected MLP; 4096-dim)
Question “How many horses are in this image?” → Embedding (LSTM)
Neural Network → Softmax over top K answers

41 Need #1: Poor Models
So how can we make multiple predictions? Well, it’s a probabilistic model, so we could sample from the distribution. Unfortunately, sampling is rather wasteful, since we observe the same modes of the distribution over and over again. And if there is a low-probability mode, we will have to wait a long time to observe a sample from it.
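The cost of sampling can be made concrete with two standard formulas: a mode with probability p is missed entirely by N i.i.d. samples with probability (1 - p)^N, and on average it takes 1/p draws (the mean of a geometric distribution) to see it once. The three-answer distribution below is a made-up example, not output from any model in the talk, and the helper names are hypothetical.

```python
import random
from collections import Counter


def sample_answers(probs, num_samples, seed=0):
    """Draw i.i.d. samples from a categorical distribution {answer: prob}."""
    rng = random.Random(seed)
    answers = list(probs)
    weights = [probs[a] for a in answers]
    return Counter(rng.choices(answers, weights=weights, k=num_samples))


def prob_mode_never_seen(p, num_samples):
    """Chance that a mode of probability p never appears in num_samples draws."""
    return (1.0 - p) ** num_samples


def expected_draws_to_see(p):
    """Mean number of draws until the mode first appears (geometric distribution)."""
    return 1.0 / p


# Hypothetical answer distribution with one rare mode.
answers = {"1": 0.70, "2": 0.29, "3": 0.01}
print(sample_answers(answers, 10))               # a Counter over the 10 draws
print(round(prob_mode_never_seen(0.01, 10), 3))  # 0.904
print(expected_draws_to_see(0.01))               # about 100 draws
```

Ten samples miss the rare answer about 90% of the time, and most of the draws are spent re-observing the two common modes.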

42 What is this tutorial about?
[What?]: Multiple Diverse Predictions in ML and AI
[Why?]: Need for Diversity: overcoming poor models; hedging against ambiguity; don’t be boring
[How?]: Techniques: Diverse Solutions from a Single (Deep) Model (Part 1: Alex/Bogdan); Training Diverse Deep Ensembles (Part 2: Stefan)
[Now what?]: What do I do with multiple predictions?

43 Need #2: Ambiguity
Bayes Error: not enough information
“I saw her duck” / One instance or two instances?
Even if you can compute MAP, there may simply be multiple acceptable answers. For example, this woman could be rotating left or rotating right. This could be a young woman looking away or an old lady looking left. When we have a user in the loop, different users may expect different outputs from the same input.

44 Need #2: Ambiguity
Dialogue System:
“We’ll be there!”
“Sorry, we won’t be able to make it”
“Count us in!”
“Thanks so much, but we’re out of town.”
“Can I bring my dog?”
Image Credit: Google Research Blog

45 Need #2: Ambiguity
Image Captioning:
“Single engine train rolling down the tracks”
“A steam locomotive is blowing steam”
“A locomotive drives along the tracks among trees and bushes”
“An engine is coming down the tracks”
“An old fashioned train with steam coming out of its pipe”

46 Need #2: Ambiguity
“An old fashioned train with steam coming out of its pipe”
“A steam locomotive is blowing steam”
“Single engine train rolling down the tracks”
“An engine is coming down the tracks”

47 What is this tutorial about?
[What?]: Multiple Diverse Predictions in ML and AI
[Why?]: Need for Diversity: overcoming poor models; hedging against ambiguity; don’t be boring
[How?]: Techniques: Diverse Solutions from a Single (Deep) Model (Part 1: Alex/Bogdan); Training Diverse Deep Ensembles (Part 2: Stefan)
[Now what?]: What do I do with multiple predictions?

48 Need #3: Don’t be boring
“An old fashioned train with steam coming out of its pipe”
“A steam locomotive is blowing steam”
“Single engine train rolling down the tracks”
“An engine is coming down the tracks”

49 Need #3: Don’t be boring
“[One] bizarre feature of our early prototype was its propensity to respond with ‘I love you’ to seemingly anything. As adorable as this sounds, it wasn’t really what we were hoping for. [It] turns out that responses like ‘Thanks’, ‘Sounds good’, and ‘I love you’ are super common -- so the system would lean on them as a safe bet if it was unsure.”

50 Need #3: Don’t be boring

51 Need #3: Don’t be boring
“[It] just said everything was awesome all the time: ‘all the people had a great time; everybody had an awesome time; it was a great day.’” (Meg Mitchell)

52 Need #3: Don’t be boring
“I don’t know” / “no” / “I love you” / “yes”

53 Classical Machine Learning
Classical: Input → Machine Learning → Boring Output (“I love you”)
This Tutorial: Input → Machine Learning → Multiple Outputs

54 What is this tutorial about?
[What?]: Multiple Diverse Predictions in ML and AI
[Why?]: Need for Diversity: overcoming poor models; hedging against ambiguity; don’t be boring
[How?]: Techniques: Diverse Solutions from a Single (Deep) Model (Part 1: Alex/Bogdan); Training Diverse Deep Ensembles (Part 2: Stefan)
[Now what?]: What do I do with multiple predictions?

55 Diverse Predictions: Now what?
[Batra et al. ECCV12], [Guzman-Rivera et al. NIPS12], [Yadollahpour et al. CVPR13], [Gimpel et al. EMNLP13], [Guzman-Rivera AISTATS13], [Premachandran et al. CVPR14], [Prasad et al. NIPS14], [Guzman-Rivera et al. AISTATS14], [Sun et al. CVPR15], [Ahmed et al. ICCV15], [Sun et al. NIPS15]

56 Your Options (Increasing Side Information)
User-in-the-loop [ECCV12]. Additional information: none
Tracking [ECCV12]. Additional information: time
(Approximate) Min Bayes Risk [CVPR14]. Additional information: loss function
Re-ranking [CVPR13]. Additional information: higher-order constraints
Holistic Scene Understanding

57 User-in-the-loop

58 Your Options (Increasing Side Information)
User-in-the-loop [ECCV12]. Additional information: none
Tracking [ECCV12]. Additional information: time
(Approximate) Min Bayes Risk [CVPR14]. Additional information: loss function
Re-ranking [CVPR13]. Additional information: higher-order constraints
Holistic Scene Understanding

59 Pose Estimation Setup
Model: Mixture of Parts Tree [Park & Ramanan, ICCV ’11]. Inference: Dynamic Programming. Dataset: PARSE.
Next, we applied our approach to pose tracking in videos. We replicated the setup of Park & Ramanan, who use a mixture-of-parts tree model; exact inference can be performed by dynamic programming.
Image Credit: [Yang & Ramanan, ICCV ’11]

60 Pose Estimation: 10 guesses/frame
[Premachandran, Tarlow, Batra, CVPR14]

61 Pose Tracking
Chain CRF with M states at each frame: we compute M solutions (DivMBest solutions) in each frame of the video, and then choose a smooth trajectory using the Viterbi algorithm.
Image Credit: [Yang & Ramanan, ICCV ’11]
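A minimal sketch of the tracking step: treat the M candidate poses in each frame as the states of a chain CRF and run Viterbi to find the sequence that maximizes the per-frame model scores plus a smoothness term. The smoothness term used here, minus `weight` times the mean joint displacement between consecutive candidates, is an illustrative assumption; the talk does not specify the pairwise term, and the helper names are hypothetical.

```python
import numpy as np


def smoothness_scores(cands, weight=1.0):
    """cands: (T, M, J, 2) joint coordinates of M candidate poses per frame.
    Returns (T-1, M, M): score for going from candidate i at t to j at t+1
    (hypothetical choice: -weight * mean Euclidean joint displacement)."""
    diff = cands[:-1][:, :, None] - cands[1:][:, None, :]  # (T-1, M, M, J, 2)
    return -weight * np.linalg.norm(diff, axis=-1).mean(axis=-1)


def viterbi_track(unary, pairwise):
    """unary: (T, M) per-frame scores; pairwise: (T-1, M, M).
    Returns the candidate index per frame maximizing the total score."""
    T, M = unary.shape
    best = unary[0].astype(float)       # best path score ending at each state
    back = np.zeros((T, M), dtype=int)  # backpointers
    for t in range(1, T):
        scores = best[:, None] + pairwise[t - 1] + unary[t][None, :]
        back[t] = scores.argmax(axis=0)
        best = scores.max(axis=0)
    path = [int(best.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Toy example: 3 frames, 2 candidates per frame, a single joint per pose.
cands = np.array([[[[0.0, 0.0]], [[20.0, 0.0]]],
                  [[[10.0, 0.0]], [[1.0, 0.0]]],
                  [[[0.0, 0.0]], [[20.0, 0.0]]]])
unary = np.array([[1.0, 0.0], [1.0, 0.9], [1.0, 0.0]])
print(unary.argmax(axis=1).tolist())                   # per-frame MAP: [0, 0, 0]
print(viterbi_track(unary, smoothness_scores(cands)))  # smooth track: [0, 1, 0]
```

Per-frame MAP jumps from x=0 to x=10 and back; Viterbi prefers the slightly lower-scoring second candidate in frame 2 because it keeps the trajectory smooth.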

62 Pose Tracking
MAP / 1-Best (left) vs. DivMBest + Viterbi
Here, on the left, I am showing you the MAP pose on each frame. We can see that it is quite noisy and jumps around, while the DivMBest solution is smooth.
[Batra, Yadollahpour, Guzman-Rivera, Shakhnarovich, ECCV12]

63 Your Options
User-in-the-loop [ECCV12]. Additional information: none
Tracking [ECCV12]. Additional information: time
(Approximate) Min Bayes Risk [CVPR14]. Additional information: loss function
Re-ranking [CVPR13]. Additional information: higher-order constraints
Holistic Scene Understanding

64 Pose Estimation: 10 guesses/frame
[Premachandran, Tarlow, Batra, CVPR14]

65 Statistics 101
Loss: Hamming, Jaccard Index, …
“True” Distribution: fit model (CRF, etc.) to mimic
Expected Loss: Min Bayes Risk
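The Min Bayes Risk rule picks the candidate with the lowest expected loss, ŷ = argmin_y Σ_{y'} P(y') L(y, y'), rather than the single most probable one. A minimal sketch, assuming (as an approximation) that both the candidates and the expectation are restricted to M diverse solutions, with P given by a softmax over their model scores at temperature `temperature`, and Hamming loss. These choices and the helper names are illustrative, not the exact setup of [Premachandran, Tarlow, Batra, CVPR14].

```python
import numpy as np


def hamming(a, b):
    """Number of positions where two labelings differ."""
    return int(np.sum(np.asarray(a) != np.asarray(b)))


def mbr_select(solutions, scores, temperature=1.0, loss=hamming):
    """Pick the solution with minimum expected loss under a softmax over scores."""
    s = np.asarray(scores, dtype=float) / temperature
    probs = np.exp(s - s.max())  # subtract the max for numerical stability
    probs /= probs.sum()
    risks = [sum(p * loss(y, y2) for p, y2 in zip(probs, solutions))
             for y in solutions]
    return int(np.argmin(risks)), risks


solutions = [np.array([0, 0, 0, 0]),  # highest-scoring (MAP) candidate
             np.array([1, 1, 1, 0]),
             np.array([1, 1, 1, 1])]
scores = [1.0, 0.9, 0.9]
best, risks = mbr_select(solutions, scores)
print(int(np.argmax(scores)), best)  # MAP picks 0, MBR picks 1
```

The MAP candidate loses here because the other two candidates agree with each other, so together they put more probability mass near candidate 1 than candidate 0 carries on its own.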

66 Pose Estimation
Figure: performance vs. #Solutions / Frame (higher is better).
DivMBest (Oracle): 22% gain possible.
MBR [PTB, CVPR14]: ~7% gain with the same features and the same model; no new information!
Baseline: state of the art 2012 [Yang & Ramanan PAMI12].
[Premachandran, Tarlow, Batra, CVPR14]
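The “Oracle” curve is an upper bound: for each test instance, an oracle that knows the ground truth picks the most accurate of the first k solutions. A minimal sketch, assuming a hypothetical matrix `acc` that holds the accuracy of each solution on each instance (the numbers are made up):

```python
import numpy as np


def oracle_curve(acc):
    """acc[n, m]: accuracy of the m-th solution (in generation order) on instance n.
    Returns mean oracle accuracy when picking the best of the first k solutions, k = 1..M."""
    best_so_far = np.maximum.accumulate(np.asarray(acc, dtype=float), axis=1)
    return best_so_far.mean(axis=0)


acc = np.array([[0.5, 0.9, 0.6],
                [0.8, 0.7, 0.9]])
curve = oracle_curve(acc)
print(curve.tolist())        # approximately [0.65, 0.85, 0.9]
print(curve[-1] - curve[0])  # oracle gain over the 1-best, about 0.25
```

A selection method such as MBR must choose without ground truth, so for the same k its accuracy is at or below this curve.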

67 Your Options
User-in-the-loop [ECCV12]. Additional information: none
Tracking [ECCV12]. Additional information: time
(Approximate) Min Bayes Risk [CVPR14]. Additional information: loss function
Re-ranking [CVPR13]. Additional information: higher-order constraints
Holistic Scene Understanding

68 Re-ranking Diverse Segmentations
[Yadollahpour, Batra, Shakhnarovich, CVPR13]

69 Your Options
User-in-the-loop [ECCV12]. Additional information: none
Tracking [ECCV12]. Additional information: time
(Approximate) Min Bayes Risk [CVPR14]. Additional information: loss function
Re-ranking [CVPR13]. Additional information: higher-order constraints
Holistic Scene Understanding

70 PASCAL Sentence Dataset
Caption: “A dog is standing next to a woman on a couch”
Ambiguity: (dog next to woman) on couch vs. dog next to (woman on couch)
Vision: Semantic Segmentation (labels: chairs, desks, etc.)
NLP: Sentence Parsing (output: parse tree)
Figure labels: Person, Couch, Dog; Hypothesis #1 … Hypothesis #M; Consistent
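One simple way to use the two hypothesis lists together is a brute-force search over all M × M (segmentation, parse) pairs for the pair with the best combined model score plus a bonus for agreement. The sketch below is hypothetical: each hypothesis is reduced to a score and a set of (object, relation, object) facts it implies, and agreement is the number of shared facts weighted by `weight`. It is not the method of any cited paper.

```python
from itertools import product


def most_consistent_pair(seg_hyps, parse_hyps, weight=1.0):
    """Each hypothesis is (model_score, set_of_facts). Returns (seg_index, parse_index)."""
    def joint_score(ij):
        (s_seg, facts_seg), (s_parse, facts_parse) = seg_hyps[ij[0]], parse_hyps[ij[1]]
        return s_seg + s_parse + weight * len(facts_seg & facts_parse)

    return max(product(range(len(seg_hyps)), range(len(parse_hyps))), key=joint_score)


# Toy hypotheses for "A dog is standing next to a woman on a couch".
seg_hyps = [(1.0, {("woman", "on", "couch")}),
            (0.7, {("dog", "on", "couch")})]
parse_hyps = [(1.0, {("dog", "on", "couch")}),    # (dog next to woman) on couch
              (0.8, {("woman", "on", "couch")})]  # dog next to (woman on couch)
print(most_consistent_pair(seg_hyps, parse_hyps))  # (0, 1)
```

Taking the top-1 of each list independently would give the inconsistent pair (0, 0); the agreement bonus instead selects the second-ranked parse.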

71 Schedule
2:15 – 3:00  Opening Remarks + Need for Multiple Diverse Solutions (Dhruv)
3:00 – 3:15  Coffee Break
3:15 – 4:45  Generating Diverse Solutions from a Single Model (Alex & Bogdan)
4:45 – 5:00
5:00 – 5:45  Training Diverse Deep Ensembles (Stefan)

