A generic model to compose vision modules for holistic scene understanding Adarsh Kowdle , Congcong Li , Ashutosh Saxena, and Tsuhan Chen Cornell University,

Slides:

Advertisements

Similar presentations

O BJ C UT M. Pawan Kumar Philip Torr Andrew Zisserman UNIVERSITY OF OXFORD.

Advertisements

Automatic Photo Pop-up Derek Hoiem Alexei A.Efros Martial Hebert Carnegie Mellon University.

Attributes for Classifier Feedback Amar Parkash and Devi Parikh.

Face Alignment by Explicit Shape Regression

Foreground Focus: Finding Meaningful Features in Unlabeled Images Yong Jae Lee and Kristen Grauman University of Texas at Austin.

Interactively Co-segmentating Topically Related Images with Intelligent Scribble Guidance Dhruv Batra, Carnegie Mellon University Adarsh Kowdle, Cornell.

EE462 MLCV Lecture 5-6 Object Detection – Boosting Tae-Kyun Kim.

Simultaneous Image Classification and Annotation Chong Wang, David Blei, Li Fei-Fei Computer Science Department Princeton University Published in CVPR.

Large Scale Visual Recognition Challenge (ILSVRC) 2013: Detection spotlights.

Mixture of trees model: Face Detection, Pose Estimation and Landmark Localization Presenter: Zhang Li.

Optimization & Learning for Registration of Moving Dynamic Textures Junzhou Huang 1, Xiaolei Huang 2, Dimitris Metaxas 1 Rutgers University 1, Lehigh University.

Contour Based Approaches for Visual Object Recognition Jamie Shotton University of Cambridge Joint work with Roberto Cipolla, Andrew Blake.

Lecture 14 – Neural Networks

Good morning, everyone, thank you for coming to my presentation.

Unsupervised Learning of Categorical Segments in Image Collections *California Institute of Technology **Technion Marco Andreetto*, Lihi Zelnik-Manor**,

Training Regimes Motivation  Allow state-of-the-art subcomponents  With “Black-box” functionality  This idea also occurs in other application areas.

1 Integrating Vision Models for Holistic Scene Understanding Geremy Heitz CS223B March 4 th, 2009.

Predictive Automatic Relevance Determination by Expectation Propagation Yuan (Alan) Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani.

AN ANALYSIS OF SINGLE- LAYER NETWORKS IN UNSUPERVISED FEATURE LEARNING [1] Yani Chen 10/14/

Spatial Pyramid Pooling in Deep Convolutional

Kuan-Chuan Peng Tsuhan Chen

SVCL Automatic detection of object based Region-of-Interest for image compression Sunhyoung Han.

Object Detection Sliding Window Based Approach Context Helps

“Secret” of Object Detection Zheng Wu (Summer intern in MSRNE) Sep. 3, 2010 Joint work with Ce Liu (MSRNE) William T. Freeman (MIT) Adam Kalai (MSRNE)

I 3D: Interactive Planar Reconstruction of Objects and Scenes Adarsh KowdleYao-Jen Chang Tsuhan Chen School of Electrical and Computer Engineering Cornell.

Machine Learning Using Support Vector Machines (Paper Review) Presented to: Prof. Dr. Mohamed Batouche Prepared By: Asma B. Al-Saleh Amani A. Al-Ajlan.

Kernel Methods A B M Shawkat Ali 1 2 Data Mining ¤ DM or KDD (Knowledge Discovery in Databases) Extracting previously unknown, valid, and actionable.

BING: Binarized Normed Gradients for Objectness Estimation at 300fps

Jun-Won Suh Intelligent Electronic Systems Human and Systems Engineering Department of Electrical and Computer Engineering Speaker Verification System.

A Novel Local Patch Framework for Fixing Supervised Learning Models Yilei Wang 1, Bingzheng Wei 2, Jun Yan 2, Yang Hu 2, Zhi-Hong Deng 1, Zheng Chen 2.

An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes Members: Hung-Yi Lo, Kai-Wei.

Sharing Features Between Objects and Their Attributes Sung Ju Hwang 1, Fei Sha 2 and Kristen Grauman 1 1 University of Texas at Austin, 2 University of.

Extending the Multi- Instance Problem to Model Instance Collaboration Anjali Koppal Advanced Machine Learning December 11, 2007.

Visual Categorization With Bags of Keypoints Original Authors: G. Csurka, C.R. Dance, L. Fan, J. Willamowski, C. Bray ECCV Workshop on Statistical Learning.

Lecture 6: Classification – Boosting and SVMs CAP 5415 Fall 2006.

Topic Models Presented by Iulian Pruteanu Friday, July 28 th, 2006.

O BJ C UT M. Pawan Kumar Philip Torr Andrew Zisserman UNIVERSITY OF OXFORD.

VIP: Finding Important People in Images Clint Solomon Mathialagan Andrew C. Gallagher Dhruv Batra CVPR

© Devi Parikh 2008 Devi Parikh and Tsuhan Chen Carnegie Mellon University April 3, ICASSP 2008 Bringing Diverse Classifiers to Common Grounds: dtransform.

A TUTORIAL ON SUPPORT VECTOR MACHINES FOR PATTERN RECOGNITION ASLI TAŞÇI Christopher J.C. Burges, Data Mining and Knowledge Discovery 2, , 1998.

Object Recognition as Ranking Holistic Figure-Ground Hypotheses Fuxin Li and Joao Carreira and Cristian Sminchisescu 1.

Describing People: A Poselet-Based Approach to Attribute Classification.

Convolutional Restricted Boltzmann Machines for Feature Learning Mohammad Norouzi Advisor: Dr. Greg Mori Simon Fraser University 27 Nov

Multi-label Prediction via Sparse Infinite CCA Piyush Rai and Hal Daume III NIPS 2009 Presented by Lingbo Li ECE, Duke University July 16th, 2010 Note:

Max-Margin Training of Upstream Scene Understanding Models Jun Zhu Carnegie Mellon University Joint work with Li-Jia Li *, Li Fei-Fei *, and Eric P. Xing.

Extracting Adaptive Contextual Cues From Unlabeled Regions Congcong Li +, Devi Parikh *, Tsuhan Chen + + Cornell University * Toyota Technological Institute.

IEEE 2015 Conference on Computer Vision and Pattern Recognition Active Learning for Structured Probabilistic Models with Histogram Approximation Qing SunAnkit.

Gaussian Conditional Random Field Network for Semantic Segmentation

Parsing Natural Scenes and Natural Language with Recursive Neural Networks INTERNATIONAL CONFERENCE ON MACHINE LEARNING (ICML 2011) RICHARD SOCHER CLIFF.

Predictive Automatic Relevance Determination by Expectation Propagation Y. Qi T.P. Minka R.W. Picard Z. Ghahramani.

Recent developments in object detection

Hybrid Deep Learning for Reflectance Confocal Microscopy Skin Images

Deep Feedforward Networks

Object Detection based on Segment Masks

Convolutional Neural Fabrics by Shreyas Saxena, Jakob Verbeek

Data Driven Attributes for Action Detection

Nonparametric Semantic Segmentation

Saliency detection Donghun Yeo CV Lab..

J. Zhu, A. Ahmed and E.P. Xing Carnegie Mellon University ICML 2009

Object detection as supervised classification

Cheng-Ming Huang, Wen-Hung Liao Department of Computer Science

Image Classification.

Learning with information of features

Rob Fergus Computer Vision

Bilinear Classifiers for Visual Recognition

Cascaded Classification Models

Outline Background Motivation Proposed Model Experimental Results

Liyuan Li, Jerry Kah Eng Hoe, Xinguo Yu, Li Dong, and Xinqi Chu

RCNN, Fast-RCNN, Faster-RCNN

Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen

Presentation transcript:

A generic model to compose vision modules for holistic scene understanding Adarsh Kowdle *, Congcong Li *, Ashutosh Saxena, and Tsuhan Chen Cornell University, Ithaca, NY, USA *indicates equal contribution

Outline  Motivation  Model  Algorithm  Results and Discussions  Conclusions 2Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen

Motivation

Scene Understanding Scene Categorization Event Categorization Depth Estimation Saliency Detection Geometric Layout Object Detection … Vision tasks are highly related. But, how do we connect them? S S O O E E L L D D ? 4Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen

Motivation Li et al, CVPR’09 Hoiem et al, CVPR’08 Sudderth et al, CVPR’06 5Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen Saxena et al, IJCV’07

Motivation S S O O E E L L D D ? 6Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen  A generic model which can treat each classifier as a “black-box” and compose them to incorporate the additional information automatically

Farhadi et al, CVPR’09 Motivation 7Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen Visual attributes Wang et al, ICCV’09 Ferrari et al, NIPS’07 Lampert et al, CVPR’09

Motivation 8Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen  Attributes for scene understanding?  A model which can compose the “black-box” classifiers and automatically exploit attributes for scene understanding Bocce “opencountry-like scene” attribute “salient region” attribute “depth in the middle region” attribute

 A model where the first layer is not trained to achieve the best independent performance, but achieve the best performance at the final output. Motivation 9Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen First level of classifiers Second level of classifier Scene Depth Event Saliency Event Features Feed-forward φ S (X) φ D (X) φ E (X) φ Sal (X) Cascaded classifier model (CCM) Heitz, Gould, Saxena and Koller, NIPS’08 Final output ????

Model

 Proposed generic model enables composing “black-box” classifiers  Feedback results in the first layer learning “attributes” rather than labels Model 11Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen First level of classifiers Second level of classifier Scene Depth Event Saliency Event Features Feed-forward φ S (X) φ D (X) φ E (X) φ Sal (X) Feed-back Final output Attribute Learner

Algorithm

13Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen First level of classifiers Second level of classifier Scene; θ S Depth; θ D Event ; θ E Saliency; θ Sal Event; ω E Features Feed-forward φ S (X (k) ) φ D (X (k) ) φ E (X (k) ) φ Sal (X (k) ) Feed-back Y E (k) (Output) TSTS TDTD TETE T Sal Optimization Goal

Y E (k) (Output) TSTS TDTD TETE T Sal Algorithm 14Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen First level of classifiers Second level of classifier Scene; θ S Depth; θ D Event ; θ E Saliency; θ Sal Event; ω E Features Feed-forward φ S (X (k) ) φ D (X (k) ) φ E (X (k) ) φ Sal (X (k) ) Feed-back Y E (k) (Output) TSTS TDTD TETE T Sal  Our Solution: Motivated from Expectation – Maximization (EM) algorithm  Parameter Learning: fix the required outputs and estimate parameters  Latent Variable Estimation: fix the model parameters and estimate latent variables (first level outputs) θSθS θSθS θDθD θDθD θEθE θEθE θ Sal ωEωE ωEωE

Results and Discussion

Experiments Depth Estimation - Make3D Saxena et al, NIPS’05 Saliency Detection Achanta et al, CVPR’09 Event Categorization Li et al, ICCV’07 16 S S D D E E Sal S S S S D D E E D D S S D D E E E E S S D D E E Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen Scene Categorization Oliva et al, IJCV’01

Results Improvement on every task with the same algorithm! 17Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen

Our proposed Original image Ground truthBase – model CCM [Heitz et. al] Results: Visual improvements 18Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen Depth Estimation Saliency Detection Our proposed Original image Ground truthBase – model CCM [Heitz et. al]

Discussion – Attributes of the scene Maps of weights given to depth maps for scene categorization task 19Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen S S D D E E Sal S S

Weights given to event and scene attributes for event categorization Discussion – Attributes of the scene 20Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen S S D D E E Sal E E

Conclusions

 Generic model to compose multiple vision tasks to aid holistic scene understanding  “Black-box”  Feedback results in learning meaningful “attributes” instead of just the “labels”  Handles heterogeneous datasets  Improved performance for each of the tasks over state-of-art using the same learning algorithm  Joint optimization of all the tasks?  Congcong Li, Adarsh Kowdle, Ashutosh Saxena, and Tsuhan Chen, Feedback Enabled Cascaded Classification Models for Scene Understanding, NIPS Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen

Thank you Questions?

Event Categorization Depth Estimation Scene Categorization Saliency Detection Image Feature Vector 51 – dim104 – dim512 – dim3 – dim 1 st layer Output 8 – dim class likelihood Pixel level depth map 8 – dim class likelihood Pixel level saliency score Layer-1 Classifier Multi-class Logistic Linear Regression RBF – kernel SVM Linear Regression Layer-2 Classifier Multi-class Logistic Linear Regression Multi-class Logistic Linear Regression Implementation 25Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen

Discussion Sparse model learnt by our model Weights for event categorization task 26Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen

Maps of weights given to depth maps for event categorization task Discussion – Attributes of the scene Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen

Results Improvement on every task with the same model! 28Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen