Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen

A generic model to compose vision modules for holistic scene understanding
Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen
Cornell University, Ithaca, NY, USA
* indicates equal contribution

Outline: Motivation, Model, Algorithm, Results and Discussion, Conclusions

Motivation

Motivation: scene understanding. Vision tasks such as scene categorization, event categorization, object detection, depth estimation, saliency detection and spatial layout estimation are highly related. But how do we connect them?
[Figure: the tasks S, O, E, L, D drawn as nodes with unknown ("?") connections between them]

Motivation: prior work combining related tasks (Li et al., CVPR’09; Sudderth et al., CVPR’06; Hoiem et al., CVPR’08; Saxena et al., IJCV’07).

Motivation: a generic model that can treat each classifier as a “black box” and compose the classifiers so that the additional information from related tasks is incorporated automatically.
[Figure: the tasks S, O, E, L, D with unknown ("?") connections]

Motivation: visual attributes (Lampert et al., CVPR’09; Ferrari et al., NIPS’07; Wang et al., ICCV’09; Farhadi et al., CVPR’09).

Motivation: attributes for scene understanding? We want a model that composes the “black-box” classifiers and automatically exploits such attributes for scene understanding.
[Figure: a bocce image carrying an “opencountry-like scene” attribute]

Motivation: a model in which the first layer is trained not to achieve the best independent performance on each task, but to achieve the best performance at the final output. The cascaded classification model (CCM) of Heitz, Gould, Saxena and Koller (NIPS’08) composes classifiers in a feed-forward way.
[Figure: features φS(X), φD(X), φE(X), φSal(X) feed a first level of classifiers (scene, depth, event, saliency); their outputs pass feed-forward to a second-level event classifier that produces the final output. The first-level training targets are marked "?".]
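
A feed-forward cascade of this kind can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the CCM implementation of Heitz et al.: every task is reduced to binary classification with plain logistic regression, all tasks are labeled on the same samples, and the second-level classifier sees the final task's own features plus the first-level probabilities. The helper names (`train_logreg`, `train_ccm`, `predict_ccm`) are hypothetical. The first level is trained only on its own labels, which is the limitation the slide points out.

```python
import numpy as np

def sigmoid(z):
    # Clip scores to avoid overflow in np.exp for very confident predictions
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Binary logistic regression fitted by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        grad = sigmoid(X @ w + b) - y  # derivative of the log-loss w.r.t. the score
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def second_level_inputs(first, features, final_task):
    # First-level probabilities of every task, in a fixed (sorted) task order,
    # appended to the final task's own features.
    outputs = [sigmoid(features[t] @ w + b) for t, (w, b) in sorted(first.items())]
    return np.hstack([features[final_task], np.column_stack(outputs)])

def train_ccm(features, labels, final_task):
    """features: task -> (n, d_t) array; labels: task -> (n,) array of 0/1."""
    # First level: each classifier is trained independently on its own labels.
    first = {t: train_logreg(features[t], labels[t]) for t in features}
    # Second level: trained on top of the frozen first-level outputs (feed-forward only).
    X2 = second_level_inputs(first, features, final_task)
    second = train_logreg(X2, labels[final_task])
    return first, second

def predict_ccm(model, features, final_task):
    first, (w2, b2) = model
    return sigmoid(second_level_inputs(first, features, final_task) @ w2 + b2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    scene_x = rng.normal(size=(n, 2))
    scene_y = (scene_x[:, 0] > 0).astype(float)
    # Toy setup: the event label equals the scene label, while the event
    # features are pure noise.
    features = {"scene": scene_x, "event": rng.normal(size=(n, 2))}
    labels = {"scene": scene_y, "event": scene_y.copy()}
    model = train_ccm(features, labels, "event")
    acc = np.mean((predict_ccm(model, features, "event") > 0.5) == labels["event"])
    print(f"training accuracy of the event classifier: {acc:.2f}")
```

In the toy usage the event features carry no signal, so the second level can only do well by relying on the first-level scene output.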

Model

Model: the proposed generic model enables composing “black-box” classifiers. Feedback results in the first layer learning “attributes” rather than labels.
[Figure: features φS(X), φD(X), φE(X), φSal(X) feed the first level of classifiers (scene, depth, event, saliency), which acts as an attribute learner; its outputs pass feed-forward to a second-level event classifier that produces the final output, and a feedback path runs from the second level back to the first.]

Algorithm

Algorithm: for each training example k, the features φS(X(k)), φD(X(k)), φE(X(k)), φSal(X(k)) feed the first-level classifiers Scene (θS), Depth (θD), Event (θE) and Saliency (θSal), whose outputs are TS, TD, TE and TSal. These outputs pass feed-forward to the second-level Event classifier (ωE), which produces the output YE(k); feedback flows from the second level back to the first.
[Figure: the optimization goal]

Algorithm: our solution is motivated by the Expectation-Maximization (EM) algorithm and alternates between two steps.
Parameter learning: fix the required outputs and estimate the parameters (θS, θD, θE, θSal, ωE).
Latent variable estimation: fix the model parameters and estimate the latent variables, i.e. the first-level outputs (TS, TD, TE, TSal).
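
The two alternating steps can be sketched as block-coordinate descent, a hard-EM-style procedure. Everything specific here is our assumption, not the authors' formulation: each black-box classifier is replaced by a linear least-squares model, all tasks are labeled on the same samples (the paper handles heterogeneous datasets), the latent first-level outputs Z start at each task's own labels, and the objective is J = ||y - Zω - b||² + λ Σ_i ||Z_i - Φ_i θ_i||², where Φ_i is task i's feature matrix and λ (`lam`) trades off serving the final task against staying close to what the first level can actually predict. The function names are hypothetical.

```python
import numpy as np

def add_bias(A):
    return np.hstack([A, np.ones((A.shape[0], 1))])

def fit_lstsq(A, y):
    # Least-squares solution (minimum-norm if A is rank deficient)
    return np.linalg.lstsq(A, y, rcond=None)[0]

def train_feedback_ccm(feats, first_labels, y, lam=1.0, iters=10):
    """feats: list of (n, d_i) arrays, one per first-level task.
    first_labels: (n, T) first-level labels, used to initialise the latent Z.
    y: (n,) target of the final (second-level) task."""
    if iters < 1:
        raise ValueError("iters must be at least 1")
    Phi = [add_bias(F) for F in feats]
    Z = first_labels.astype(float).copy()
    history = []
    for _ in range(iters):
        # Parameter learning (Z fixed): refit each first-level model to its column of Z...
        thetas = [fit_lstsq(P, Z[:, i]) for i, P in enumerate(Phi)]
        M = np.column_stack([P @ th for P, th in zip(Phi, thetas)])
        # ...and the second-level model to predict y from Z.
        wb = fit_lstsq(add_bias(Z), y)
        w, b = wb[:-1], wb[-1]
        # Latent variable estimation (parameters fixed): exact row-wise minimiser of
        # (y - z.w - b)^2 + lam * ||z - m||^2, i.e. z = m + w * r / (lam + w.w).
        r = y - b - M @ w
        Z = M + np.outer(r / (lam + w @ w), w)
        history.append(float(np.sum((y - Z @ w - b) ** 2) + lam * np.sum((Z - M) ** 2)))
    return thetas, (w, b), history

def predict_feedback_ccm(thetas, wb, feats):
    # At test time only the first-level predictions are available.
    w, b = wb
    M = np.column_stack([add_bias(F) @ th for F, th in zip(feats, thetas)])
    return M @ w + b
```

Because each step exactly minimizes J over its own block of variables, the recorded objective never increases. In this sketch, the latent Z drifting away from the original labels toward whatever helps the final task plays the role of the first layer learning "attributes"; λ controls how far it may drift.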

Results and Discussion

Experiments: four tasks, each with its own dataset: scene categorization (Oliva et al., IJCV’01), event categorization (Li et al., ICCV’07), saliency detection (Achanta et al., CVPR’09) and depth estimation on Make3D (Saxena et al., IJCV’07).
[Figure: the S, D, E, Sal model drawn once per task, with that task as the final output]

Results: improvement on every task with the same algorithm!

Results: visual improvements.
[Figure: depth estimation and saliency detection examples; columns show the original image, the ground truth, the base model, CCM [Heitz et al.] and our proposed model]

Discussion: attributes of the scene.
[Figure: maps of the weights given to the depth maps for the scene categorization task (S, D, E, Sal model)]
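
Weight maps of this kind can be read off a linear second-level classifier whose input includes a flattened depth map: the weights that multiply depth pixels are reshaped back into image layout. A minimal sketch, assuming (hypothetically) a linear scene classifier whose input vector places the depth map, flattened in row-major order, after `offset` other features; the function name `depth_weight_map` is ours.

```python
import numpy as np

def depth_weight_map(weights, depth_shape, offset=0):
    """Return the weights that multiply a flattened depth map as a 2-D map.

    Assumes the classifier input is [other features (offset of them),
    depth.ravel()], i.e. the depth map was flattened in row-major order."""
    height, width = depth_shape
    block = np.asarray(weights)[offset:offset + height * width]
    if block.size != height * width:
        raise ValueError("weight vector too short for the given depth shape")
    # Row-major reshape inverts depth.ravel()
    return block.reshape(height, width)
```

For a multi-class classifier with weight matrix `W` (one row per scene class), calling `depth_weight_map(W[c], depth.shape, offset)` gives the map for class `c`, which can then be displayed as an image.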

Discussion: attributes of the scene.
[Figure: weights given to event and scene attributes for event categorization (S, D, E, Sal model)]

Conclusions

Conclusions:
Generic model to compose multiple vision tasks to aid holistic scene understanding, treating each classifier as a “black box”.
Feedback results in learning meaningful “attributes” instead of just the “labels”.
Handles heterogeneous datasets.
Improved performance on each of the tasks over the state of the art, using the same learning algorithm.
Joint optimization of all the tasks.
Reference: Congcong Li, Adarsh Kowdle, Ashutosh Saxena, and Tsuhan Chen, Feedback Enabled Cascaded Classification Models for Scene Understanding, NIPS 2010.

Thank you. Questions?