Published by Мома Недић. Modified over 6 years ago.
1
Diversity meets Deep Networks: Inference, Ensembles, and Applications
Alexander Kirillov, Bogdan Savchynskyy, Carsten Rother (Technische Universität Dresden)
Stefan Lee (Indiana University, Virginia Tech)
Dhruv Batra (Virginia Tech)
2
Schedule
Time | Topic | Presenter
2:15 – 3:00 | Opening Remarks + Need for Multiple Diverse Solutions | Dhruv
3:00 – 3:15 | Coffee Break |
3:15 – 4:45 | Generating Diverse Solutions from a Single Model | Alex & Bogdan
4:45 – 5:00 | |
5:00 – 5:45 | Training Diverse Deep Ensembles | Stefan
(C) Dhruv Batra
3
Schedule
1. Please interrupt & ask questions!
2. All slides available online.
4
Image Classification
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
1000 object classes
1.4M/50k/100k images
Labels shown: Person, Dalmatian
5
Image Classification
Image Credit: [He et al. CVPR16]
6
Revolution of Depth
Image Credit: [He et al. CVPR16]
8
Image Captioning
Image Credit: [Vinyals et al. CVPR15]
9
Visual Question Answering (VQA)
10
Visual Question Answering (VQA)
Slide Credit: Stan Antol
11
AI far from perfect
13
A Brief History of AI
Image Credit: Joseph Mehling;
14
A Brief History of AI
“We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. … [Our] conjecture is that every aspect of learning or intelligence can be so precisely described that a machine can be made to simulate it. An attempt will be made … to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.”
15
Why is AI hard?
“The three biggest challenges to a computer being able to build knowledge from data are Ambiguity, Ambiguity, Ambiguity.”
-- Ray Mooney, CS, UT-Austin (quote in reference to IBM Watson)
16
Linguistic Ambiguity
“I saw her duck”
Image Credit: Liang Huang
19
What is this tutorial about?
Multiple Diverse Predictions in ML and AI
20
Classical Machine Learning
Image Classification: “Person”
21
Classical Machine Learning
Semantic Segmentation
22
Classical Machine Learning
Pose Estimation
23
Classical Machine Learning
Image Captioning: “Two people are petting horses.”
24
Classical Machine Learning
VQA: “How many people are there?” Answer: “2”
25
Classical Machine Learning
Dialogue System: “Count us in!”
Image Credit: Google Research Blog
26
Classical Machine Learning
Input → Output
28
Classical Machine Learning: Input → Output
This Tutorial: Input → Machine Learning → Multiple Outputs
29
Example: Segmentation
[Batra et al. ECCV12], [Guzman-Rivera et al. NIPS12], [Yadollahpour et al. CVPR13], [Gimpel et al. EMNLP13], [Guzman-Rivera AISTATS13], [Premachandran et al. CVPR14], [Prasad et al. NIPS14], [Guzman-Rivera et al. AISTATS14], [Sun et al. CVPR15], [Ahmed et al. ICCV15], [Sun et al. NIPS15]
30
Exponentially-Large Item Set
Semantic Segmentation
31
Exponentially-Large Item Set
Pose Estimation
32
Exponentially-Large Item Set
Image Captioning
“Two people are standing next to two horses.”
“A man pets a horse while a woman looks on.”
“There is a man sitting on a horse.”
“Two people and two horses standing in a field.”
33
Neural Image Captioning
Image Embedding (VGGNet), 4096-dim
Figure legend: Convolution Layer + Non-Linearity; Pooling Layer; Fully-Connected MLP
35
Neural Image Captioning
Figure: a chain of RNN cells starts from <start> and outputs P(next) at each step, producing the caption “Two people and two horses.”
36
Beam Search Demo
Classical Beam Search; Diverse Beam Search
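As a rough illustration of the classical variant named on this slide, here is a minimal beam search sketch. The word table `TOY_MODEL`, the `<end>` token, and the helper names are hypothetical stand-ins for the RNN's P(next); they are not taken from the tutorial, and the diverse variant is not shown.

```python
import math

# Hypothetical next-word table standing in for the RNN's P(next | previous word).
TOY_MODEL = {
    "<start>": {"two": 0.6, "a": 0.4},
    "two": {"people": 0.5, "horses": 0.5},
    "a": {"man": 0.9, "horse": 0.1},
    "people": {"<end>": 1.0},
    "horses": {"<end>": 1.0},
    "man": {"<end>": 1.0},
    "horse": {"<end>": 1.0},
}

def next_log_probs(tokens):
    """Log P(next) given the prefix; this toy model only looks at the last token."""
    return {tok: math.log(p) for tok, p in TOY_MODEL[tokens[-1]].items()}

def beam_search(beam_width, max_len=10):
    """Keep the beam_width highest-scoring prefixes at every decoding step."""
    beams = [(["<start>"], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == "<end>":
                candidates.append((tokens, score))  # finished captions carry over
                continue
            for tok, logp in next_log_probs(tokens).items():
                candidates.append((tokens + [tok], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if all(tokens[-1] == "<end>" for tokens, _ in beams):
            break
    return beams

greedy = beam_search(beam_width=1)[0]       # width 1 is greedy decoding
best_of_two = beam_search(beam_width=2)[0]  # width 2 keeps a second prefix alive
```

In this toy table, greedy decoding commits to "two" and ends at probability 0.3, while a beam of width 2 recovers "a man" at probability 0.36. The beam entries still tend to share prefixes, which is the lack of diversity the tutorial goes on to address.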
37
What is this tutorial about?
[What?]: Multiple Diverse Predictions in ML and AI
[Why?]: Need for Diversity
  Overcoming poor models
  Hedging against ambiguity
  Don’t be boring
[How?]: Techniques
  Diverse Solutions from a Single (Deep) Model (Part 1: Alex/Bogdan)
  Training Diverse Deep Ensembles (Part 2: Stefan)
[Now what?]: What do I do with multiple predictions?
39
Need#1: Poor Models
Approximation Error -- Model-Class is Wrong!
Human Body ≠ Tree
Figure Courtesy: [Yang & Ramanan ICCV ‘11]
Unfortunately, we often run into a number of problems with MAP (maximum a posteriori) inference. Most often, our model is simply wrong, so even the most probable state under the model can be very far from the ground truth. For example, a tree model assumes that we walk around with our limbs always un-occluded.
40
Need#1: Poor Models
Approximation Error -- Model-Class is Wrong!
Figure: VQA model. Image → Embedding (VGGNet), 4096-dim; Question (“How many horses are in this image?”) → Embedding (LSTM); Neural Network → Softmax over top K answers.
Figure legend: Convolution Layer + Non-Linearity; Pooling Layer; Fully-Connected MLP
41
Need#1: Poor Models
So how can we make multiple predictions? Well, it’s a probabilistic model, so we could sample from the distribution. Unfortunately, sampling is rather wasteful: we observe the same high-probability modes over and over again, and if there is a low-probability mode, we will have to wait a long time to observe a sample from it.
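To make the point about wasted samples concrete, here is a small sketch with a made-up three-mode distribution (the mode names and probabilities are illustrative assumptions, not numbers from the tutorial). For independent draws, a mode with probability p takes 1/p draws on average before it first appears.

```python
import random
from collections import Counter

# Hypothetical distribution over three modes; the values are illustrative only.
probs = {"mode_a": 0.90, "mode_b": 0.099, "mode_c": 0.001}

def draw_samples(probs, n, seed=0):
    """Draw n independent samples and count how often each mode appears."""
    rng = random.Random(seed)
    modes = list(probs)
    weights = [probs[m] for m in modes]
    return Counter(rng.choices(modes, weights=weights, k=n))

def expected_draws_until_seen(p):
    """Mean of a geometric distribution: average draws until the first hit."""
    return 1.0 / p

counts = draw_samples(probs, n=100)
wait_for_rare_mode = expected_draws_until_seen(probs["mode_c"])
```

With these numbers, roughly nine out of ten draws repeat `mode_a`, and `mode_c` needs about 1000 draws on average before it shows up once.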
43
Need#2: Ambiguity
Bayes Error -- Not enough information
“I saw her duck”
One instance / Two instances?
Even if you can compute MAP, there may simply be multiple acceptable answers. For example, this woman could be rotating left or rotating right. This could be a young woman looking away or an old lady looking left. When we have a user in the loop, different users may expect different outputs from the same input.
44
Need#2: Ambiguity
Dialogue System
“We’ll be there!”
“Sorry, we won’t be able to make it”
“Count us in!”
“Thanks so much, but we’re out of town.”
“Can I bring my dog?”
Image Credit: Google Research Blog
45
Need#2: Ambiguity
Image Captioning
“Single engine train rolling down the tracks”
“A steam locomotive is blowing steam”
“A locomotive drives along the tracks among trees and bushes”
“An engine is coming down the tracks”
“An old fashioned train with steam coming out of its pipe”
46
Need#2: Ambiguity
“An old fashioned train with steam coming out of its pipe”
“A steam locomotive is blowing steam”
“Single engine train rolling down the tracks”
“An engine is coming down the tracks”
48
Need#3: Don’t be boring
“An old fashioned train with steam coming out of its pipe”
“A steam locomotive is blowing steam”
“Single engine train rolling down the tracks”
“An engine is coming down the tracks”
49
Need#3: Don’t be boring
“[One] bizarre feature of our early prototype was its propensity to respond with ‘I love you’ to seemingly anything. As adorable as this sounds, it wasn’t really what we were hoping for. [It] turns out that responses like ‘Thanks’, ‘Sounds good’, and ‘I love you’ are super common -- so the system would lean on them as a safe bet if it was unsure.”
50
Need#3: Don’t be boring
51
Need#3: Don’t be boring
“[It] just said everything was awesome all the time -- ‘all the people had a great time; everybody had an awesome time; it was a great day.’”
-- Meg Mitchell
52
Need#3: Don’t be boring
“I don’t know” “no” “I love you” “yes”
53
Classical Machine Learning: Input → Boring Output (“I love you”)
This Tutorial: Input → Machine Learning → Multiple Outputs
55
Diverse Predictions: Now what?
[Batra et al. ECCV12], [Guzman-Rivera et al. NIPS12], [Yadollahpour et al. CVPR13], [Gimpel et al. EMNLP13], [Guzman-Rivera AISTATS13], [Premachandran et al. CVPR14], [Prasad et al. NIPS14], [Guzman-Rivera et al. AISTATS14], [Sun et al. CVPR15], [Ahmed et al. ICCV15], [Sun et al. NIPS15]
56
Your Options (ordered by increasing side information)
Nothing: User-in-the-loop [ECCV12]
  Additional Information: None
Tracking [ECCV12]
  Additional Information: Time
(Approximate) Min Bayes Risk [CVPR14]
  Additional Information: Loss function
Re-ranking [CVPR13]
  Additional Information: higher-order constraints
Holistic Scene Understanding
57
User-in-the-loop
59
Pose Estimation Setup
Model: Mixture of Parts Tree [Park & Ramanan, ICCV ‘11]
Inference: Dynamic Programming
Dataset: PARSE
Next, we applied our approach to pose tracking in videos. We replicated the setup of Park & Ramanan, who use a mixture-of-parts tree model. Exact inference can be performed by dynamic programming.
Image Credit: [Yang & Ramanan, ICCV ‘11]
60
Pose Estimation: 10 guesses/frame
[Premachandran, Tarlow, Batra, CVPR14]
61
Pose Tracking
Chain CRF with M states at each frame
DivMBest Solutions
We compute M solutions in each frame of the video, and then choose a smooth trajectory using the Viterbi algorithm.
Image Credit: [Yang & Ramanan, ICCV ‘11]
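The step of choosing a smooth trajectory from M candidates per frame can be sketched as a small Viterbi pass over a chain. Everything specific here is an assumption for illustration: each candidate pose is reduced to a single number, the unary costs are made up, and the transition cost is an absolute difference weighted by a hypothetical `smooth_weight`.

```python
def smooth_trajectory(candidates, unary_costs, smooth_weight=1.0):
    """Pick one candidate per frame, minimizing unary cost plus a jump penalty.

    candidates[t][m]  : candidate m in frame t (a scalar stand-in for a pose)
    unary_costs[t][m] : model cost of that candidate (lower is better)
    """
    best = list(unary_costs[0])  # best cost of any path ending at each candidate
    back_pointers = []
    for t in range(1, len(candidates)):
        new_best, pointers = [], []
        for j, pos in enumerate(candidates[t]):
            # Cost of arriving at candidate j from each candidate in frame t-1.
            arrive = [best[i] + smooth_weight * abs(pos - prev)
                      for i, prev in enumerate(candidates[t - 1])]
            i_star = min(range(len(arrive)), key=arrive.__getitem__)
            new_best.append(arrive[i_star] + unary_costs[t][j])
            pointers.append(i_star)
        best = new_best
        back_pointers.append(pointers)
    # Trace the cheapest path backwards from the last frame.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back_pointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path

# Three frames, two candidates each (toy numbers).
candidates = [[0.0, 5.0], [0.2, 5.1], [0.1, 9.0]]
unary_costs = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

per_frame_best = [min(range(len(c)), key=c.__getitem__) for c in unary_costs]
smooth_path = smooth_trajectory(candidates, unary_costs)
```

Here the per-frame best picks candidate 1 in the first frame and then jumps to candidate 0, while the chain solution stays on candidate 0 throughout, trading a slightly worse first frame for a smooth track.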
62
Pose Tracking
MAP / 1-Best; DivMBest + Viterbi
Here, on the left, I am showing you the MAP pose on each frame. We can see that it is quite noisy and jumps around, while the DivMBest solution is smooth.
[Batra, Yadollahpour, Guzman-Rivera, Shakhnarovich, ECCV12]
65
Statistics 101
Loss: Hamming, Jaccard Index, …
“True” Distribution: fit model (CRF, etc.) to mimic
Expected Loss: Min Bayes Risk
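A minimal sketch of minimum Bayes risk selection over a small set of candidate solutions, using Hamming loss. Treating the candidates themselves as the support of the distribution, and weighting them by a softmax of model scores with a `temperature` parameter, are assumptions made for this illustration; the labelings and scores are toy values.

```python
import math

def hamming(a, b):
    """Number of positions where two labelings disagree."""
    return sum(x != y for x, y in zip(a, b))

def mbr_select(solutions, scores, loss=hamming, temperature=1.0):
    """Return the index of the solution with the lowest expected loss, plus all risks."""
    top = max(scores)  # subtract the max before exponentiating, for stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Expected loss of predicting each candidate if the truth follows probs.
    risks = [sum(p * loss(cand, other) for p, other in zip(probs, solutions))
             for cand in solutions]
    best = min(range(len(solutions)), key=risks.__getitem__)
    return best, risks

# Toy labelings: index 0 has the highest score, but the other two agree with each other.
solutions = [(0, 0, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1)]
scores = [1.0, 0.9, 0.9]

map_index = max(range(len(scores)), key=scores.__getitem__)
mbr_index, risks = mbr_select(solutions, scores)
```

In this toy case the highest-scoring solution (index 0) is far from the other two, so its expected Hamming loss is the largest, and the selection moves to index 1, which sits close to most of the probability mass.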
66
Pose Estimation
Plot: results against #Solutions / Frame (an arrow marks the “Better” direction)
DivMBest (Oracle): 22%-gain possible
MBR [PTB, CVPR14]: ~7% gain. Same features, same model, no new information!
State of the art 2012: [Yang & Ramanan PAMI12]
[Premachandran, Tarlow, Batra, CVPR14]
68
Re-ranking Diverse Segmentations
[Yadollahpour, Batra, Shakhnarovich, CVPR13]
70
PASCAL Sentence Dataset
“A dog is standing next to a woman on a couch”
Ambiguity: (dog next to woman) on couch vs. dog next to (woman on couch)
Vision: Semantic Segmentation. Labels: Chairs, desks, etc.
NLP: Sentence Parsing. Output: Parse Tree
Figure labels: Couch, Person, Dog; Hypothesis #1 … Hypothesis #M; Consistent