LARGE-SCALE NONPARAMETRIC IMAGE PARSING Joseph Tighe and Svetlana Lazebnik University of North Carolina at Chapel Hill CVPR 2011 Workshop on Large-Scale Learning for Vision

Presentation transcript:

LARGE-SCALE NONPARAMETRIC IMAGE PARSING Joseph Tighe and Svetlana Lazebnik University of North Carolina at Chapel Hill CVPR 2011 Workshop on Large-Scale Learning for Vision [Figure labels: road, building, car, sky]

Small-scale image parsing: tens of classes, hundreds of images. He et al. (2004), Hoiem et al. (2005), Shotton et al. (2006, 2008, 2009), Verbeek and Triggs (2007), Rabinovich et al. (2007), Galleguillos et al. (2008), Gould et al. (2009), etc. [Figure from Shotton et al. (2009)]

Large-scale image parsing: hundreds of classes, tens of thousands of images; non-uniform class frequencies; evolving training set

Challenges
- What's considered important for small-scale image parsing?
  - Combination of local cues
  - Multiple segmentations, multiple scales
  - Context
  - Graphical model inference (CRFs, etc.)
- How much of this is feasible for large-scale, dynamic datasets?

Our first attempt: A nonparametric approach
- Lazy learning: do (almost) nothing at training time
- At test time:
  - Find a retrieval set of similar images for each query image
  - Transfer labels from the retrieval set by matching segmentation regions (superpixels)
- Related work: SIFT Flow (Liu et al. 2008, 2009)

Step 1: Scene-level matching
- Features: Gist (Oliva & Torralba, 2001), Spatial Pyramid (Lazebnik et al., 2006), Color Histogram
- Retrieval set: source of possible labels and source of region-level matches
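A minimal sketch of how a retrieval set might be built from several scene-level features. The slide names the features but not how their rankings are combined, so the minimum-rank rule below, the Euclidean distance, and the names `retrieval_set` and `euclidean` are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieval_set(query, training, size):
    """Return indices of the `size` training images most similar to the query.

    query:    dict mapping feature name (e.g. "gist") to a vector
    training: list of dicts with the same feature names
    Assumption: each image is scored by its best (minimum) rank over
    all feature types; ties are broken by index.
    """
    n = len(training)
    best_rank = [n] * n
    for name in query:
        # Rank all training images by distance under this one feature type.
        order = sorted(range(n),
                       key=lambda i: euclidean(query[name], training[i][name]))
        for rank, i in enumerate(order):
            best_rank[i] = min(best_rank[i], rank)
    return sorted(range(n), key=lambda i: (best_rank[i], i))[:size]
```

An image that is close under any single feature type enters the retrieval set under this rule, which favors recall of candidate labels over strict agreement between features.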

Step 2: Region-level matching
- Superpixels (Felzenszwalb & Huttenlocher, 2004)
- Superpixel features: pixel area (size), absolute mask (location), texture, color histogram
[Figures: one panel per feature type; region labels shown include snow, road, tree, building, sky, sidewalk]
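To make the superpixel features concrete, here is a rough sketch computing three of them (area, absolute mask, color histogram) for one region. The grid size, the bin count, the normalization, and the function name `region_features` are illustrative assumptions rather than the settings used in the talk; texture is omitted because it would need a filter bank.

```python
def region_features(pixels, image, height, width, mask_size=8, bins=4):
    """Compute simple features for one superpixel.

    pixels: list of (row, col) coordinates belonging to the region
    image:  2D list of (r, g, b) tuples with values in 0..255
    """
    # Pixel area, expressed as a fraction of the image (size cue).
    area = len(pixels) / (height * width)

    # Absolute mask: which cells of a coarse image grid the region touches
    # (location cue).
    mask = [[0] * mask_size for _ in range(mask_size)]
    for r, c in pixels:
        mask[r * mask_size // height][c * mask_size // width] = 1

    # Per-channel color histogram, normalized by the region's pixel count.
    hist = [0] * (3 * bins)
    for r, c in pixels:
        for channel, value in enumerate(image[r][c]):
            hist[channel * bins + min(value * bins // 256, bins - 1)] += 1
    hist = [h / len(pixels) for h in hist]

    return {"area": area, "mask": mask, "color_hist": hist}
```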

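Region-level likelihoods
- Nonparametric estimate of class-conditional densities for each class c and feature type k:
  $\hat{P}(f_k(r_i) \mid c) = \frac{\#(N_k(r_i), c)}{\#(D, c)}$
  where $f_k(r_i)$ is the kth feature type of the ith region, $\#(N_k(r_i), c)$ counts the features of class c within some radius of $r_i$, and $\#(D, c)$ is the total number of features of class c in the dataset.
- Per-feature likelihoods combined via Naïve Bayes:
  $\hat{P}(r_i \mid c) = \prod_k \hat{P}(f_k(r_i) \mid c)$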
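The likelihood estimate above can be sketched as a ratio of counts per feature type, multiplied across feature types (summed in log space). The additive smoothing constant `eps`, which avoids taking the log of zero, and the function name `region_log_likelihoods` are assumptions of this sketch, not details taken from the slides.

```python
import math
from collections import Counter


def region_log_likelihoods(neighbor_labels, class_totals, eps=1.0):
    """Naive Bayes log-likelihood of one query region under each class.

    neighbor_labels: dict mapping feature type to the class labels of the
                     training regions found within the radius of the query
    class_totals:    dict mapping class to its total count in the dataset
    """
    counts = {k: Counter(labels) for k, labels in neighbor_labels.items()}
    scores = {}
    for c, total in class_totals.items():
        # Sum of per-feature log ratios equals the log of their product.
        scores[c] = sum(
            math.log((per_feature[c] + eps) / (total + eps))
            for per_feature in counts.values()
        )
    return scores
```

Dividing by the class total keeps frequent classes from winning only because they contribute more neighbors, which matters when class frequencies are non-uniform.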

Region-level likelihoods [Figure panels: building, car, crosswalk, sky, window, road]

Step 3: Global image labeling
- Compute a global image labeling by optimizing a Markov random field (MRF) energy function over the vector of region labels. Terms annotated on the slide:
  - Likelihood score for region r_i and label c_i, over all regions
  - Smoothing penalty and co-occurrence penalty, over neighboring regions r_i, r_j
- Efficient approximate minimization using α-expansion (Boykov et al., 2002)
[Figure panels, first example: maximum likelihood labeling, edge penalties, final labeling, final edge penalties; labels: road, building, car, window, sky]
[Figure panels, second example: original image, maximum likelihood labeling, edge penalties, MRF labeling; labels: sky, tree, sand, road, sea]
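As an illustration of the MRF labeling step, the sketch below evaluates an energy of the general form "per-region data cost plus weighted pairwise penalty on neighboring regions" and lowers it with iterated conditional modes (ICM). ICM is swapped in here as a much simpler stand-in for α-expansion; the exact energy terms, the weight `lam`, and all function names are assumptions.

```python
def mrf_energy(labels, data_cost, edges, pair_cost, lam):
    """Energy of a labeling: unary costs plus weighted pairwise penalties."""
    unary = sum(data_cost[i][labels[i]] for i in range(len(labels)))
    pairwise = sum(pair_cost(labels[i], labels[j]) for i, j in edges)
    return unary + lam * pairwise


def icm(data_cost, edges, pair_cost, lam, sweeps=10):
    """Greedy local minimization (ICM), not alpha-expansion.

    data_cost: list of dicts, data_cost[i][c] = cost of label c for region i
    edges:     list of (i, j) pairs of neighboring regions
    pair_cost: symmetric function of two labels
    """
    n = len(data_cost)
    # Start from the maximum likelihood (minimum data cost) labeling.
    labels = [min(costs, key=costs.get) for costs in data_cost]
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            def local(c):
                # Cost of giving region i label c with its neighbors fixed.
                return data_cost[i][c] + lam * sum(
                    pair_cost(c, labels[j]) for j in neighbors[i])
            best = min(data_cost[i], key=local)
            if local(best) < local(labels[i]):
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels
```

With a co-occurrence style `pair_cost`, label pairs that rarely appear together in training images would receive a higher penalty than pairs that commonly co-occur.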

Joint geometric/semantic labeling
- Semantic labels: road, grass, building, car, etc.
- Geometric labels: sky, vertical, horizontal
- Gould et al. (ICCV 2009)
- Objective function for joint labeling: combines the cost of semantic labeling, the cost of geometric labeling, and a geometric/semantic consistency penalty, defined over the semantic labels and the geometric labels
[Figure panels: original image; semantic labeling (sky, tree, car, road); geometric labeling (sky, horizontal, vertical)]
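One way to read the joint objective is as a sum of the two labeling costs plus a weighted per-region consistency penalty. The sketch below assumes exactly that additive form; the weight `mu`, the per-region sum, and the function name `joint_cost` are hypothetical choices for illustration.

```python
def joint_cost(semantic, geometric, semantic_cost, geometric_cost,
               consistency, mu):
    """Joint objective for a pair of labelings of the same regions.

    semantic, geometric:           lists of labels, one per region
    semantic_cost, geometric_cost: functions scoring a whole labeling
    consistency:                   penalty for a (semantic, geometric) pair
    mu:                            weight of the consistency term (assumed)
    """
    penalty = sum(consistency(s, g) for s, g in zip(semantic, geometric))
    return semantic_cost(semantic) + geometric_cost(geometric) + mu * penalty
```

For example, a consistency function could charge a penalty whenever a region labeled "car" is assigned the geometric label "horizontal".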

Example of joint labeling

Understanding scenes on many levels (to appear at ICCV 2011)

Datasets
Columns: Training images, Test images, Labels
SIFT Flow (Liu et al., 2009): 2,
Barcelona (Russell et al., 2007): 14,
LabelMe+SUN: 50,

Overall performance
Columns: SIFT Flow (Semantic, Geom.), Barcelona (Semantic, Geom.), LabelMe + SUN (Semantic, Geom.)
Base: 73.2 (29.1) (8.0) (10.7) 81.5
MRF: 76.3 (28.8) (7.6) (9.1) 81.0
MRF + Joint: 76.9 (29.4) (7.6) (10.5) 82.2
Columns: LabelMe + SUN Indoor (Semantic, Geom.), LabelMe + SUN Outdoor (Semantic, Geom.)
Base: 22.4 (9.5) (11.0) 83.1
MRF: 27.5 (6.5) (8.6) 82.3
MRF + Joint: 27.8 (9.0) (10.8) 84.1
*SIFT Flow: 74.75

Per-class classification rates
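For reference, a small sketch of how per-class classification rates can be computed alongside an overall per-pixel rate. This is the generic definition (fraction of each class's pixels labeled correctly), not necessarily the exact evaluation protocol behind the reported numbers; the function name `classification_rates` is illustrative.

```python
from collections import defaultdict


def classification_rates(truth, predicted):
    """Return (overall per-pixel rate, dict of per-class rates).

    truth, predicted: equal-length lists of labels, one entry per pixel
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(truth, predicted):
        total[t] += 1
        if t == p:
            correct[t] += 1
    per_class = {c: correct[c] / total[c] for c in total}
    per_pixel = sum(correct.values()) / len(truth)
    return per_pixel, per_class
```

A labeling can score well per pixel while missing rare classes entirely, which is why the two numbers are worth reporting together when class frequencies are non-uniform.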

Results on SIFT Flow dataset

Results on LM+SUN dataset [Figure panels: image, ground truth, initial semantic, final semantic, final geometric]

Running times [Figure panels: SIFT Flow dataset, Barcelona dataset]

Conclusions
- Lessons learned
  - Can go pretty far with very little learning
  - Good local features and global (scene) context are more important than neighborhood context
- What's missing
  - A rich representation for scene understanding
  - The long tail
  - Scalable, dynamic learning