Published by Keanu Rosten. Modified over 9 years ago.
1
A generic model to compose vision modules for holistic scene understanding
Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen
Cornell University, Ithaca, NY, USA
(* indicates equal contribution)
2
Outline: Motivation; Model; Algorithm; Results and Discussion; Conclusions
3
Motivation
4
Scene Understanding: Scene Categorization, Event Categorization, Depth Estimation, Saliency Detection, Geometric Layout, Object Detection, …
Vision tasks are highly related, but how do we connect them?
[Diagram: task nodes S, O, E, L, and D joined by a question mark]
5
Motivation: Li et al., CVPR’09; Hoiem et al., CVPR’08; Sudderth et al., CVPR’06; Saxena et al., IJCV’07
6
Motivation: A generic model that can treat each classifier as a “black box” and compose the classifiers to incorporate the additional information automatically.
[Diagram: task nodes S, O, E, L, and D joined by a question mark]
7
Motivation: Visual attributes. Farhadi et al., CVPR’09; Wang et al., ICCV’09; Ferrari et al., NIPS’07; Lampert et al., CVPR’09
8
Motivation: Attributes for scene understanding?
A model that can compose the “black-box” classifiers and automatically exploit attributes for scene understanding.
Example (Bocce): “opencountry-like scene” attribute, “salient region” attribute, “depth in the middle region” attribute
9
Motivation: A model where the first layer is trained not to achieve the best independent performance, but to achieve the best performance at the final output.
Cascaded classifier model (CCM): Heitz, Gould, Saxena and Koller, NIPS’08
[Diagram: features φ_S(X), φ_D(X), φ_E(X), φ_Sal(X) feed forward into the first level of classifiers (Scene, Depth, Event, Saliency), whose outputs feed the second-level classifier (Event), which produces the final output; an open question is marked “????”]
10
Model
11
Model: The proposed generic model enables composing “black-box” classifiers. Feedback results in the first layer learning “attributes” rather than labels.
[Diagram: features φ_S(X), φ_D(X), φ_E(X), φ_Sal(X) feed forward into the first level of classifiers (Scene, Depth, Event, Saliency), whose outputs feed the second-level classifier (Event), which produces the final output; a feed-back path is added. Diagram labels: Feed-forward, Feed-back, Final output, Attribute Learner]
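The feed-forward part of this two-level composition can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`linear_softmax`, `cascade_forward`), the toy dimensions, and the choice to feed the second level its own task features concatenated with all first-level outputs are assumptions made for the example. Every first-level module is treated as an opaque callable, which is the “black-box” idea.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def linear_softmax(weights):
    """Hypothetical black-box classifier: features -> class likelihoods."""
    return lambda x: softmax(x @ weights)

def cascade_forward(features, first_level, second_level, target):
    """Run every first-level classifier, then the second-level one.

    Assumption: the second level sees the target task's own features
    concatenated with the outputs of all first-level classifiers.
    """
    tasks = sorted(first_level)
    outputs = {t: first_level[t](features[t]) for t in tasks}
    second_input = np.concatenate([features[target]] + [outputs[t] for t in tasks])
    return second_level(second_input), outputs

# Toy setup (all sizes are made up for illustration).
rng = np.random.default_rng(0)
feat_dims = {"scene": 6, "depth": 5, "event": 4, "saliency": 3}
n_out = 3  # each toy first-level module emits a 3-dim output

first_level = {t: linear_softmax(rng.normal(size=(d, n_out)))
               for t, d in feat_dims.items()}
second_in = feat_dims["event"] + n_out * len(feat_dims)
second_level = linear_softmax(rng.normal(size=(second_in, n_out)))

features = {t: rng.normal(size=d) for t, d in feat_dims.items()}
final, outputs = cascade_forward(features, first_level, second_level, "event")
```

Because `cascade_forward` only calls the first-level modules, any of them could be swapped for a different classifier type without changing the composition code.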
12
Algorithm
13
Optimization Goal
First level of classifiers: Scene (θ_S), Depth (θ_D), Event (θ_E), Saliency (θ_Sal), with outputs T_S, T_D, T_E, T_Sal
Second level of classifier: Event (ω_E), with output Y_E^(k)
Feed-forward: features φ_S(X^(k)), φ_D(X^(k)), φ_E(X^(k)), φ_Sal(X^(k)) enter the first level, whose outputs enter the second level
Feed-back: from the second level to the first level
14
Algorithm: Our solution is motivated by the Expectation-Maximization (EM) algorithm.
Parameter learning: fix the required outputs and estimate the parameters (θ_S, θ_D, θ_E, θ_Sal, ω_E).
Latent variable estimation: fix the model parameters and estimate the latent variables (the first-level outputs T_S, T_D, T_E, T_Sal).
[Diagram: features φ_S(X^(k)), φ_D(X^(k)), φ_E(X^(k)), φ_Sal(X^(k)) feed forward into the first level of classifiers (Scene, Depth, Event, Saliency), whose outputs feed the second-level classifier (Event) with output Y_E^(k); a feed-back path is added]
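The alternation between parameter learning and latent variable estimation can be sketched on a toy problem. This is an illustrative simplification, not the objective from the slides: it assumes both levels are linear regressors sharing one feature matrix `X`, and it swaps in a squared-error objective `||Y - T·omega||² + lam·||T - X·theta||²`, where `lam`, `fit_feedback_cascade`, and `objective` are hypothetical names introduced here.

```python
import numpy as np

def objective(X, Y, T, theta, omega, lam):
    """Final-output error plus a penalty tying T to first-level predictions."""
    fit_out = np.sum((Y - T @ omega) ** 2)
    fit_first = np.sum((T - X @ theta) ** 2)
    return float(fit_out + lam * fit_first)

def fit_feedback_cascade(X, Y, T_init, lam=1.0, n_iters=10):
    """Alternate the two steps named above on an assumed linear model."""
    T = T_init.copy()
    m = T.shape[1]
    history = []
    for _ in range(n_iters):
        # Parameter learning: first-level outputs T are held fixed.
        theta, *_ = np.linalg.lstsq(X, T, rcond=None)  # first level: X -> T
        omega, *_ = np.linalg.lstsq(T, Y, rcond=None)  # second level: T -> Y
        # Latent variable estimation: parameters are held fixed.
        # Setting the gradient w.r.t. T to zero gives
        #   T (omega omega^T + lam I) = Y omega^T + lam X theta
        A = omega @ omega.T + lam * np.eye(m)
        B = Y @ omega.T + lam * (X @ theta)
        T = np.linalg.solve(A, B.T).T  # A is symmetric, so solve A T^T = B^T
        history.append(objective(X, Y, T, theta, omega, lam))
    return theta, omega, T, history

# Toy data (random, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))       # input features
T_init = rng.normal(size=(50, 3))  # initial first-level targets
Y = rng.normal(size=(50, 2))       # final-output targets
theta, omega, T, history = fit_feedback_cascade(X, Y, T_init)
```

Each step minimizes the same objective over one block of variables, so the recorded values in `history` do not increase from one iteration to the next. After the first iteration the first-level targets `T` are no longer the initial labels, which mirrors the idea that the first layer is trained for the final output rather than for its own task.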
15
Results and Discussion
16
Experiments:
Scene Categorization: Oliva et al., IJCV’01
Depth Estimation (Make3D): Saxena et al., NIPS’05
Saliency Detection: Achanta et al., CVPR’09
Event Categorization: Li et al., ICCV’07
[Diagram: for each task, first-level classifiers S, D, E, and Sal feed a second-level classifier for that task]
17
Results: Improvement on every task with the same algorithm!
18
Results: Visual improvements
Depth Estimation: Original image, Ground truth, Base-model, CCM [Heitz et al.], Our proposed
Saliency Detection: Original image, Ground truth, Base-model, CCM [Heitz et al.], Our proposed
19
Discussion – Attributes of the scene: Maps of weights given to depth maps for scene categorization task
[Diagram: first-level classifiers S, D, E, and Sal feed the second-level classifier S]
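One way such weight maps could be produced is to take the second-level weights that multiply the depth-map inputs and reshape them back onto the image grid, one map per class. The sketch below assumes a linear second-level classifier with weight matrix `omega` of shape (inputs, classes) and assumes the depth-map entries occupy a contiguous, row-major block of the input vector; the function name, the slice, and the grid shape are hypothetical.

```python
import numpy as np

def depth_weight_maps(omega, depth_slice, grid_shape):
    """Reshape the depth-related rows of omega into per-class weight maps.

    omega:       (n_inputs, n_classes) second-level weights (assumed linear).
    depth_slice: slice selecting the rows that multiply depth-map entries.
    grid_shape:  (H, W) of the depth map, flattened row-major (assumption).
    """
    block = omega[depth_slice]        # (H*W, n_classes)
    h, w = grid_shape
    return block.T.reshape(-1, h, w)  # (n_classes, H, W)

# Toy example: 8 non-depth inputs followed by a 3x4 depth map, 2 classes.
omega = np.arange(40, dtype=float).reshape(20, 2)
maps = depth_weight_maps(omega, slice(8, 20), (3, 4))
```

Each `maps[c]` can then be displayed as an image to see which regions of the depth map the classifier for class `c` weights most heavily.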
20
Discussion – Attributes of the scene: Weights given to event and scene attributes for event categorization
[Diagram: first-level classifiers S, D, E, and Sal feed the second-level classifier E]
21
Conclusions
22
Generic model to compose multiple vision tasks to aid holistic scene understanding
“Black-box” classifiers
Feedback results in learning meaningful “attributes” instead of just the “labels”
Handles heterogeneous datasets
Improved performance for each of the tasks over the state of the art using the same learning algorithm
Joint optimization of all the tasks?
Reference: Congcong Li, Adarsh Kowdle, Ashutosh Saxena, and Tsuhan Chen, Feedback Enabled Cascaded Classification Models for Scene Understanding, NIPS 2010
23
Thank you. Questions?
25
Implementation
Event Categorization: image feature vector 51-dim; 1st-layer output 8-dim class likelihood; layer-1 classifier multi-class logistic; layer-2 classifier multi-class logistic
Depth Estimation: image feature vector 104-dim; 1st-layer output pixel-level depth map; layer-1 classifier linear regression; layer-2 classifier linear regression
Scene Categorization: image feature vector 512-dim; 1st-layer output 8-dim class likelihood; layer-1 classifier RBF-kernel SVM; layer-2 classifier multi-class logistic
Saliency Detection: image feature vector 3-dim; 1st-layer output pixel-level saliency score; layer-1 classifier linear regression; layer-2 classifier linear regression
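The settings above could be kept in one place as a small configuration, so that code composing the tasks does not hard-code per-task choices. This is only a sketch: the `TaskConfig` class and its field names are hypothetical, and the values are copied from the implementation slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskConfig:
    """Per-task settings as listed on the implementation slide."""
    feature_dim: int          # dimension of the image feature vector
    first_layer_output: str   # what the first-level classifier emits
    layer1_classifier: str
    layer2_classifier: str

TASKS = {
    "event": TaskConfig(51, "8-dim class likelihood",
                        "multi-class logistic", "multi-class logistic"),
    "depth": TaskConfig(104, "pixel-level depth map",
                        "linear regression", "linear regression"),
    "scene": TaskConfig(512, "8-dim class likelihood",
                        "RBF-kernel SVM", "multi-class logistic"),
    "saliency": TaskConfig(3, "pixel-level saliency score",
                           "linear regression", "linear regression"),
}

# Example query: tasks whose two levels use different classifier types.
mixed = [name for name, cfg in TASKS.items()
         if cfg.layer1_classifier != cfg.layer2_classifier]
```

Here only scene categorization mixes classifier types across the two levels, which is one concrete sense in which the first-level modules are treated as interchangeable black boxes.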
26
Discussion: Sparse model learnt by our model. Weights for event categorization task
27
Discussion – Attributes of the scene: Maps of weights given to depth maps for event categorization task
28
Results: Improvement on every task with the same model!