Max-Margin Training of Upstream Scene Understanding Models Jun Zhu Carnegie Mellon University Joint work with Li-Jia Li *, Li Fei-Fei *, and Eric P. Xing.

Presentation transcript:

Max-Margin Training of Upstream Scene Understanding Models
Jun Zhu, Carnegie Mellon University
Joint work with Li-Jia Li*, Li Fei-Fei*, and Eric P. Xing
* Stanford University

How to Represent a Scene Image?
Seeing the forest before the trees
– Fast scene categorization with gist features
– Oliva & Torralba: global properties (e.g., openness, mean depth, expansion) for scene gist
– Murphy et al., 2003: use the gist features to see the trees (i.e., recognizing objects)
But the trees compose the forest …
– Object recognition is critical for scene categorization
– Sudderth et al., 2005; Fei-Fei et al., 2005
Image labels: badminton, bocce, croquet
This is a forest scene.
CMU, March, 2010

Upstream Scene Understanding Models
Erik Sudderth’s “Scene, Object, and Parts” model (CVPR 2005)
Using MLE to estimate model

Upstream Scene Understanding Models
Kevin Murphy’s “Forest & Tree” model (NIPS 2003)
Using MLE to estimate model

Upstream Scene Understanding Models
Fei-Fei’s “Total Scene Understanding” model (CVPR 2009)
Using MLE to estimate model
Image labels: Athlete, Horse, Grass, Trees, Sky, Saddle; class: Polo

We want to answer …
Are we satisfied with the MLE method?
Can we learn scene understanding models

A Simple Working Example
Joint scene categorization and object annotation model
Global features:
– Can be arbitrary!
– Gist (Oliva & Torralba, 2001)
– Sparse SIFT codes (Yang, Yu, Gong, & Huang, 2009)

Problem with MLE
Model joint distribution
Prediction rules for scene

Problem with MLE
Model joint distribution
Maximum likelihood estimation
Decoupling! Scene classification, object annotation
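
The decoupling can be illustrated with a minimal sketch, assuming a factorization that the transcript does not spell out. The symbols below (scene label s, global features g, region-level observations r, and the parameter blocks θ and η) are illustrative assumptions, not the notation of the slides.

```latex
% Assumed factorization of the joint model (illustrative, not from the slides)
p(s, r \mid g; \theta, \eta) = p(s \mid g; \theta)\, p(r \mid s; \eta)

% The log-likelihood over N training images splits into two sums
\mathcal{L}(\theta, \eta)
  = \sum_{n=1}^{N} \log p(s_n \mid g_n; \theta)
  + \sum_{n=1}^{N} \log p(r_n \mid s_n; \eta)

% so MLE solves two independent problems
\hat{\theta} = \arg\max_{\theta} \sum_{n=1}^{N} \log p(s_n \mid g_n; \theta), \qquad
\hat{\eta} = \arg\max_{\eta} \sum_{n=1}^{N} \log p(r_n \mid s_n; \eta)

% while the joint prediction rule still uses both terms
\hat{s} = \arg\max_{s} \left[ \log p(s \mid g; \hat{\theta}) + \log p(r \mid s; \hat{\eta}) \right]
```

Under this assumption the two parameter blocks never interact during training, even though both contribute to the prediction.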

Problem with MLE
Model joint distribution
Weak coupling

Max-margin Training to Achieve Strong Coupling
Hint: although MLE decouples the scene model and the object model, the joint prediction rule couples them
Discriminant function & hinge loss

Max-margin Training to Achieve Strong Coupling
Hint: although MLE decouples the scene model and the object model, the joint prediction rule couples them
Regularized hinge loss minimization
– Hinge loss couples both scene & object models, while log-loss is defined on the scene model
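
A minimal Python sketch of regularized hinge loss minimization, to show how the loss can depend on both models at once. It assumes a linear scene score and a precomputed per-class object-model log-likelihood. The function names, the additive form of the discriminant, and the constant `c` are hypothetical choices for illustration, not the formulation used in the talk.

```python
import numpy as np


def joint_scores(scene_scores, object_loglik):
    """Hypothetical discriminant: scene score plus object-model log-likelihood, per class."""
    return np.asarray(scene_scores, dtype=float) + np.asarray(object_loglik, dtype=float)


def multiclass_hinge(scores, true_label, margin=1.0):
    """Hinge loss for one example: max over classes of (score + margin if wrong) minus true score."""
    scores = np.asarray(scores, dtype=float)
    augmented = scores + margin                 # add the margin to every class ...
    augmented[true_label] = scores[true_label]  # ... except the true one
    return float(augmented.max() - scores[true_label])


def regularized_hinge_objective(weights, features, object_loglik, labels, c=1.0):
    """0.5 * ||W||^2 + c * (sum of hinge losses on the joint scores)."""
    weights = np.asarray(weights, dtype=float)  # shape (num_classes, num_features)
    total = 0.0
    for x, loglik, y in zip(features, object_loglik, labels):
        scores = joint_scores(weights @ np.asarray(x, dtype=float), loglik)
        total += multiclass_hinge(scores, y)
    return 0.5 * float(np.sum(weights ** 2)) + c * total
```

Because the object-model term enters the scores, changing it changes the loss, which is the coupling the slide refers to. A log-loss defined on the scene model alone would not see that term.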

Solving the Optimization Problem
Approximation to the intractable log-likelihood
The optimization

EM-style Algorithm
Posterior inference (inner-max problem)
Parameter estimation (outer-min problem)
– Alternating minimization (next slide)

Alternating Minimization
Closed-form solutions
– Gaussian parameters (cf. MLE for a Gaussian mixture)
– Topic parameters
Loss-augmented SVM
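
The closed-form Gaussian step resembles the M-step of a Gaussian mixture, which can be sketched as below. This is an illustration under assumptions: diagonal covariances, and a matrix `resp` of nonnegative weights standing in for whatever posterior the inference step produces. The function name and shapes are hypothetical.

```python
import numpy as np


def weighted_gaussian_update(x, resp):
    """Closed-form weighted MLE of per-component means and diagonal variances.

    x:    (n, d) array of feature vectors
    resp: (n, k) array of nonnegative weights (e.g., posterior responsibilities)
    """
    x = np.asarray(x, dtype=float)
    resp = np.asarray(resp, dtype=float)
    totals = resp.sum(axis=0)                            # (k,) total weight per component
    means = (resp.T @ x) / totals[:, None]               # (k, d) weighted means
    diff_sq = (x[:, None, :] - means[None, :, :]) ** 2   # (n, k, d) squared deviations
    variances = np.einsum("nk,nkd->kd", resp, diff_sq) / totals[:, None]
    return means, variances
```

Each component must receive nonzero total weight, otherwise the division is undefined.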

Experiments
8-category sports data set (Li & Fei-Fei, 2007):
– 1574 images (50/50 split)
  Badminton, bocce, croquet, polo, rowing, snowboarding, sailing, rock climbing
– Pre-segment each image into regions
– Region features: color, texture, and location patches with SIFT features
– Global features:
  Gist (Oliva & Torralba, 2001)
  Sparse SIFT codes (Yang, Yu, Gong, & Huang, 2009)
67-category MIT indoor scene (Quattoni & Torralba, 2009):
– ~80 per category for training; ~20 per category for testing
– Same feature representation as above
– Gist global features

Scene Classification
Gist features
– Fei-Fei’s theme model: 0.65 (different image representation)
– SVM:

Scene Classification Loss

Scene Classification Confusion Matrix &
Blue for correct; red for wrong

MIT Indoor Scene Classification
ROI+Gist(annotation) used human-annotated interest regions.

MIT Indoor Scene

Object Annotation
kNN classifier with features
– Overall:
– Example
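
For readers unfamiliar with the classifier named above, a k-nearest-neighbor labeler can be sketched as follows. The transcript does not say which features or distance were used, so the Euclidean distance, majority vote, and all names here are assumptions for illustration.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    train_x = np.asarray(train_x, dtype=float)
    dists = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]  # indices of the k closest points
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

In an annotation setting, `train_x` would hold region feature vectors and `train_y` the corresponding object names.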

Conclusions & Future Work
Conclusions:
– MLE can result in weak coupling in upstream scene understanding models
– The max-margin approach can be applied to achieve a well-balanced prediction rule
Future work:
– Improve the performance of the object annotation model
  Incorporate global features with conditional models
  “Double direction” max-margin learning with supervision on object annotation for scene completion
– Systematic comparison with downstream scene understanding models
  Multi-class sLDA (Wang et al., 2009)
  MedLDA (Zhu et al., 2009)