
LARGE-SCALE IMAGE PARSING Joseph Tighe and Svetlana Lazebnik University of North Carolina at Chapel Hill road building car sky

Small-scale image parsing Tens of classes, hundreds of images He et al. (2004), Hoiem et al. (2005), Shotton et al. (2006, 2008, 2009), Verbeek and Triggs (2007), Rabinovich et al. (2007), Galleguillos et al. (2008), Gould et al. (2009), etc. Figure from Shotton et al. (2009)

Large-scale image parsing Hundreds of classes, tens of thousands of images Non-uniform class frequencies

Large-scale image parsing Hundreds of classes, tens of thousands of images Evolving training set Non-uniform class frequencies

Challenges • What's considered important for small-scale image parsing? • Combination of local cues • Multiple segmentations, multiple scales • Context • How much of this is feasible for large-scale, dynamic datasets?

Our first attempt: A nonparametric approach • Lazy learning: do (almost) nothing up front • To parse (label) an image we will: • Find a set of similar images • Transfer labels from the similar images by matching pieces of the image (superpixels)

Finding Similar Images

What is depicted in this image? Which image is most similar? Then assign the label from the most similar image. Scene categories: Ocean, Open Field, Highway, Street, Forest, Mountain, Inner City, Tall Building.

Pixels are a bad measure of similarity. Most similar according to pixel distance vs. most similar according to "Bag of Words".

Origin of the Bag of Words model • Orderless document representation: frequencies of words from a dictionary, Salton & McGill (1983). Example: tag cloud of US Presidential Speeches.

What are words for an image?

Wing, Tail, Wheel, Building, Propeller

Wing, Tail, Wheel, Building, Propeller, Jet Engine

But where do the words come from?

Then where does the dictionary come from?

Example dictionary (Source: B. Leibe)

Another dictionary (Source: B. Leibe)

Fei-Fei et al. 2005

Outline of the Bag of Words method • Divide the image into patches • Assign a "word" for each patch • Count the number of occurrences of each "word" in the image
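To make the three steps concrete, here is a minimal bag-of-words sketch in Python (not code from the talk): raw grayscale patches stand in for the descriptors, a k-means vocabulary supplies the "words", and a normalized histogram counts them. The function names, patch size, and vocabulary size are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

def extract_patches(image, patch_size=8, stride=8):
    # Cut the (grayscale) image into non-overlapping patches and flatten each one.
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size].ravel())
    return np.array(patches, dtype=np.float32)

def build_vocabulary(training_images, num_words=256):
    # Learn the visual dictionary by clustering patches pooled from many images.
    all_patches = np.vstack([extract_patches(img) for img in training_images])
    return KMeans(n_clusters=num_words, n_init=4).fit(all_patches)

def bag_of_words(image, vocabulary):
    # Assign every patch to its nearest "word" and count the occurrences.
    words = vocabulary.predict(extract_patches(image))
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(np.float32)
    return hist / hist.sum()  # normalize so images of different sizes compare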

Does this work for our problem? (65,536 pixels reduced to 256 dimensions)

Which look the most similar?

Labels in the candidate images: building, road, car, sky (several images); tree, sky; tree, building; sand, mountain; car, road.

Step 1: Scene-level matching. Global image features: Gist (Oliva & Torralba, 2001), Spatial Pyramid (Lazebnik et al., 2006), Color Histogram. Retrieval set: source of possible labels and source of region-level matches.
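As a rough illustration of how such a retrieval set might be assembled (a sketch under assumptions, not the authors' implementation), one can rank the training images by distance under each global descriptor and keep the images with the best average rank:

import numpy as np

def retrieval_set(query_feats, train_feats, k=200):
    # query_feats: dict feature name -> 1-D descriptor of the query image
    # train_feats: dict feature name -> (num_images, dim) array of training descriptors
    num_train = len(next(iter(train_feats.values())))
    avg_rank = np.zeros(num_train)
    for name, query_vec in query_feats.items():
        dists = np.linalg.norm(train_feats[name] - query_vec, axis=1)
        avg_rank += np.argsort(np.argsort(dists))  # rank of each image under this feature
    return np.argsort(avg_rank)[:k]  # indices of the k best-ranked training images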

Step 2: Region-level matching

Superpixels (Felzenszwalb & Huttenlocher, 2004)
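The Felzenszwalb-Huttenlocher segmentation is available in scikit-image; a minimal call might look like the following, with the file name and parameter values chosen only for illustration:

from skimage import io, segmentation

image = io.imread('street_scene.jpg')  # hypothetical input image
superpixels = segmentation.felzenszwalb(image, scale=100, sigma=0.8, min_size=50)
# 'superpixels' is an integer label map: region i is the boolean mask (superpixels == i)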

Step 2: Region-level matching. Feature: pixel area (size). Example classes shown: Snow, Road, Tree, Building, Sky.

Step 2: Region-level matching. Feature: absolute mask (location). Example classes shown: Road, Sidewalk.

Step 2: Region-level matching. Feature: texture. Example classes shown: Road, Sky, Snow, Sidewalk.

Step 2: Region-level matching. Feature: color histogram. Example classes shown: Building, Sidewalk, Road.

Step 2: Region-level matching. Superpixels (Felzenszwalb & Huttenlocher, 2004) and per-superpixel features.
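A hedged sketch of per-superpixel descriptors along the lines of the listed cues (size, absolute location mask, and color histogram; the texture descriptor is omitted, and all binning choices are assumptions rather than the talk's settings):

import numpy as np

def superpixel_features(image, superpixels, region_id, grid=8, color_bins=8):
    mask = (superpixels == region_id)
    ys, xs = np.nonzero(mask)
    h, w = mask.shape
    # Size: fraction of image pixels covered by the region.
    size = mask.sum() / float(h * w)
    # Absolute mask (location): region occupancy on a coarse spatial grid.
    loc = np.zeros((grid, grid))
    np.add.at(loc, (ys * grid // h, xs * grid // w), 1.0)
    loc /= loc.sum()
    # Color histogram over the region's RGB values.
    pix = image[mask].reshape(-1, 3)
    color, _ = np.histogramdd(pix, bins=(color_bins,) * 3, range=((0, 256),) * 3)
    color = color.ravel() / color.sum()
    return {'size': np.array([size]), 'location': loc.ravel(), 'color': color}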

Region-level likelihoods • Nonparametric estimate of class-conditional densities for each class c and feature type k:

P(f_i^k | c) = N(f_i^k, c) / N(c)

where f_i^k is the kth feature type of the ith region r_i, N(f_i^k, c) counts the features of class c within some radius of r_i in the retrieval set, and N(c) is the total number of features of class c in the dataset.

• Per-feature likelihoods combined via Naive Bayes:

P(f_i | c) = Π_k P(f_i^k | c)
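For concreteness, a small sketch of this estimate (assumed function names, a single fixed Euclidean radius, and a small smoothing constant are illustration choices, not the paper's exact settings):

import numpy as np

def region_likelihood(region_feats, retrieval_feats, retrieval_labels,
                      class_totals, classes, radius=1.0, eps=1e-6):
    # region_feats: dict feature type -> vector for one superpixel
    # retrieval_feats: dict feature type -> (N, d) array over retrieval-set regions
    # retrieval_labels: (N,) class id per retrieval-set region
    # class_totals: dict class id -> total number of regions of that class in the dataset
    log_lik = {c: 0.0 for c in classes}
    for ftype, f in region_feats.items():
        dists = np.linalg.norm(retrieval_feats[ftype] - f, axis=1)
        neighbors = retrieval_labels[dists <= radius]
        for c in classes:
            count = np.sum(neighbors == c)
            # P(f^k | c) ≈ (#neighbors of class c) / (#features of class c)
            log_lik[c] += np.log((count + eps) / (class_totals[c] + eps))
    return log_lik  # naive Bayes: sum of per-feature log-likelihoods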

Region-level likelihoods: Building, Car, Crosswalk, Sky, Window, Road.

Step 3: Global image labeling • Compute a global image labeling by optimizing a Markov random field (MRF) energy function:

E(c) = Σ_{regions r_i} D(r_i, c_i) + λ Σ_{neighboring regions (r_i, r_j)} V(c_i, c_j)

where c is the vector of region labels, D(r_i, c_i) is the likelihood score for region r_i and label c_i, and V(c_i, c_j) is a smoothing penalty based on label co-occurrence. Efficient approximate minimization using α-expansion (Boykov et al., 2002).
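A minimal sketch of evaluating this energy for a candidate labeling (the data costs, neighbor list, co-occurrence penalty matrix, and weight lambda are assumed inputs; actual minimization would use α-expansion rather than this evaluation routine):

def mrf_energy(labels, data_cost, edges, cooccurrence_penalty, lam=1.0):
    # labels: label per region; data_cost: (R, C) negative log-likelihoods
    # edges: list of (i, j) index pairs of neighboring regions
    # cooccurrence_penalty: (C, C) cost for adjacent labels c_i, c_j
    unary = sum(data_cost[i, labels[i]] for i in range(len(labels)))
    pairwise = sum(cooccurrence_penalty[labels[i], labels[j]]
                   for i, j in edges if labels[i] != labels[j])
    return unary + lam * pairwise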

Step 3: Global image labeling • How do we resolve issues like this? Original image vs. maximum likelihood labeling (labels: sky, tree, sand, road, sea).

Step 3: Global image labeling • Compute a global image labeling by optimizing the MRF energy function above, combining the likelihood score for each region r_i and label c_i with the co-occurrence smoothing penalty over neighboring regions.

Step 3: Global image labeling • Optimizing the MRF energy on an example: maximum likelihood labeling, edge penalties, final labeling, final edge penalties (labels: road, building, car, window, sky).

Step 3: Global image labeling • MRF result on the earlier example: original image, maximum likelihood labeling, edge penalties, MRF labeling (labels: sky, tree, sand, road, sea).

Joint geometric/semantic labeling • Semantic labels: road, grass, building, car, etc. • Geometric labels: sky, vertical, horizontal • Gould et al. (ICCV 2009). Example: original image, semantic labeling (sky, tree, car, road), geometric labeling (sky, horizontal, vertical).

Joint geometric/semantic labeling • Objective function for joint labeling:

E(c_sem, c_geom) = E_sem(c_sem) + E_geom(c_geom) + μ Σ_{regions r_i} φ(c_i^sem, c_i^geom)

i.e. the cost of the semantic labeling plus the cost of the geometric labeling plus a geometric/semantic consistency penalty, where c_sem are the semantic labels and c_geom the geometric labels. Example: original image, semantic labeling (sky, tree, car, road), geometric labeling (sky, horizontal, vertical).
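A sketch of how the joint objective could be scored, assuming the two MRF energies are computed as in the earlier sketch and that a boolean semantic/geometric compatibility table is given (the table and the weight mu are placeholders, not the paper's exact penalty):

def joint_energy(sem_labels, geo_labels, sem_energy, geo_energy, consistent, mu=1.0):
    # sem_energy / geo_energy: MRF energies of the two labelings (see mrf_energy above)
    # consistent: table with consistent[c_sem, c_geom] == True for compatible label pairs
    penalty = sum(0.0 if consistent[cs, cg] else 1.0
                  for cs, cg in zip(sem_labels, geo_labels))
    return sem_energy + geo_energy + mu * penalty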

Example of joint labeling

Understanding scenes on many levels To appear at ICCV 2011

Datasets (training images, test images, labels):
SIFT Flow (Liu et al., 2009): 2,… training images
Barcelona (Russell et al., 2007): 14,… training images
LabelMe+SUN: 50,… training images

Overall performance (per-pixel rate, with per-class rate in parentheses):

SIFT Flow, Semantic: Base 73.2 (29.1); MRF 76.3 (28.8); MRF + Joint 76.9 (29.4)
Barcelona, Semantic: Base … (8.0); MRF … (7.6); MRF + Joint … (7.6)
LabelMe + SUN, Semantic: Base … (10.7); MRF … (9.1); MRF + Joint … (10.5); Geom.: Base 81.5; MRF 81.0; MRF + Joint 82.2

LabelMe + SUN Indoor, Semantic: Base 22.4 (9.5); MRF 27.5 (6.5); MRF + Joint 27.8 (9.0)
LabelMe + SUN Outdoor, Semantic: Base … (11.0); MRF … (8.6); MRF + Joint … (10.8); Geom.: Base 83.1; MRF 82.3; MRF + Joint 84.1

*SIFT Flow (Liu et al.): 74.75

Per-class classification rates

Results on SIFT Flow dataset

Results on LM+SUN dataset. Image, ground truth, initial semantic, final semantic, final geometric.

Results on LM+SUN dataset. Image, ground truth, initial semantic, final semantic, final geometric.

Results on LM+SUN dataset. Image, ground truth, initial semantic, final semantic, final geometric.

Results on LM+SUN dataset. Image, ground truth, initial semantic, final semantic, final geometric.

Running times on the SIFT Flow and Barcelona datasets.

Conclusions • Lessons learned: • Can go pretty far with very little learning • Good local features and global (scene) context matter more than neighborhood context • What's missing: • A rich representation for scene understanding • The long tail • Scalable, dynamic learning