Scene Parsing with Object Instances and Occlusion Ordering JOSEPH TIGHE, MARC NIETHAMMER, SVETLANA LAZEBNIK 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION


Outline
Introduction
Inference Formulation
Pixel-level Inference
Candidate Instance Generation
Instance Scores and Overlap Constraints
Instance-level Inference
Occlusion Ordering
Experiments
Conclusion

Introduction
Despite their rapidly increasing accuracy, state-of-the-art image parsing methods have several limitations:
1. They have no notion of object instances.
2. They tend to be more accurate for “stuff” classes that are characterized by local appearance rather than overall shape.
To do better on “thing” classes such as car, cat, person, and vase (as well as to gain the ability to represent object instances), it becomes necessary to incorporate detectors that model the overall object shape.

Introduction
In this work we interpret a scene in terms of both dense pixel labels and object instances defined by segmentation masks, rather than just bounding boxes.
1. Start with our earlier region- and detector-based parsing system [20] to produce an initial pixel labeling and a set of candidate object instances.
2. Select a subset of instances that explain the image well, learning overlap and occlusion ordering constraints from training data.
3. Alternate between using the instance predictions to refine the pixel labels and vice versa.
[20] J. Tighe and S. Lazebnik. Finding things: Image parsing with regions and per-exemplar detectors. In CVPR, June 2013.
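The alternation in step 3 can be sketched as a simple loop; `alternate_parsing` and the toy callables below are hypothetical stand-ins for illustration, not the authors' code:

```python
def alternate_parsing(init_labels, select_instances, refine_labels, max_rounds=3):
    """Generic alternation loop: instance selection and pixel refinement
    are passed in as callables. Stops early once the pixel labeling
    reaches a fixed point."""
    pixels = init_labels
    instances = []
    for _ in range(max_rounds):
        instances = select_instances(pixels)    # instance-level inference
        new_pixels = refine_labels(instances)   # pixel-level refinement
        if new_pixels == pixels:                # converged: labels stopped changing
            break
        pixels = new_pixels
    return pixels, instances

# Toy stand-ins: a "labeling" is just a string of class names.
pixels, instances = alternate_parsing(
    "car|road",
    select_instances=lambda p: p.split("|"),
    refine_labels=lambda inst: "|".join(sorted(inst)),
)
print(pixels, instances)  # car|road ['car', 'road']
```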

Introduction
Our method is related to that of Guo and Hoiem [7], who infer background classes at every pixel, even in regions where they are not visible. They learn the relationships between occluders and background classes to boost the accuracy of background prediction. We go further, inferring not only the occluded background but all the classes and their relationships, and we use a much larger number of labels.
[7] R. Guo and D. Hoiem. Beyond the line of sight: labeling the underlying surfaces. In ECCV, 2012.

Pixel-level Inference
Given a query or test image, the system first finds a retrieval set of globally similar training images. It then computes two pixel-level potentials: a region-based data term and a detector-based data term. The two data terms are combined using a one-vs-all SVM.
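A minimal sketch of how two per-class data terms might be fused by a one-vs-all linear scorer; `combine_data_terms` and all weights are illustrative assumptions, not the learned SVM from the paper:

```python
def combine_data_terms(region_pot, detector_pot, svm_params):
    """For each class, a one-vs-all linear model combines the region-based
    and detector-based potentials into a single score; the pixel is
    assigned the highest-scoring class. Weights here are illustrative."""
    scores = {}
    for cls, (w_region, w_det, bias) in svm_params.items():
        scores[cls] = (w_region * region_pot.get(cls, 0.0)
                       + w_det * detector_pot.get(cls, 0.0)
                       + bias)
    return max(scores, key=scores.get), scores

# Toy example at one pixel: regions favor "road", detectors favor "car".
label, scores = combine_data_terms(
    {"road": 0.9, "car": 0.2},          # region-based potentials
    {"road": 0.1, "car": 0.8},          # detector-based potentials
    {"road": (1.0, 0.5, 0.0), "car": (0.6, 1.2, 0.0)},  # per-class (w_region, w_det, bias)
)
print(label)  # car
```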

Pixel-level Inference
The pixel-level energy combines three terms: a unary data term, an object potential, and a smoothing term.
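In generic pairwise-CRF notation (symbols ours, not necessarily the paper's), an energy with these three terms reads:

```latex
E(\mathbf{c}) \;=\; \sum_{i} \psi_i(c_i)
\;+\; \sum_{i} \phi_i(c_i)
\;+\; \lambda \sum_{(i,j) \in \mathcal{N}} \psi_{ij}(c_i, c_j),
```

where the first sum is the unary (SVM) data term, the second the object potential induced by the selected instances, and the third a smoothing term over neighboring pixel pairs in the neighborhood system N.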

Candidate Instance Generation
It is worth pointing out the different nature of hypotheses for “thing” and “stuff” classes. It is highly desirable to correctly separate multiple “thing” instances even when they are close together or overlapping. For “stuff” classes, the notion of instances is much looser: in some cases it may be possible to identify separate instances of buildings and trees, but most of the time we are content to treat areas occupied by these classes as undifferentiated masses. Accordingly, we manually separate all classes in our datasets into “things” and “stuff” and use different procedures to generate candidates for each.

Candidate Instance Generation
For “things,” we get candidates from the per-exemplar masks transferred during the computation of the detector-based data term for pixel-level labeling. In that stage, the system scans the test image with per-exemplar detectors associated with every instance from the retrieval set.

Instance Scores and Overlap Constraints
It is important to use the non-exclusive SVM because we do not want to penalize an instance due to other classes that may occlude it. Note, however, that we still need the exclusive SVM to perform pixel-level inference, since the non-exclusive SVM tends to prefer the larger and more common classes.

Instance Scores and Overlap Constraints
Our instance-level inference attempts to select the combination of instance hypotheses that maximizes the total weighted instance score, while requiring that every pair of selected objects have a valid overlap relationship.

Instance Scores and Overlap Constraints
For all pairs of labels (c, c′), we use the training set to build a histogram H(c, c′, os) giving the probability for c to be behind c′ with overlap score os. Given instances m and n from the test image, we then determine whether their relationship is valid by thresholding the maximum of the histogram entries corresponding to both orderings. Again, we set a fairly conservative threshold t.
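The thresholding test can be sketched as follows; the histogram layout, the 0.1-wide overlap-score bins, and the default threshold value are assumptions for illustration:

```python
def valid_overlap(H, c_m, c_n, os_score, t=0.001):
    """Check whether instances of classes c_m and c_n may overlap with
    overlap score os_score: valid if, in either occlusion order, the
    training histogram H gives the configuration probability above t.
    H maps (behind_class, front_class, binned overlap score) -> probability."""
    os_bin = round(os_score, 1)  # coarse 0.1-wide bins (illustrative)
    p = max(H.get((c_m, c_n, os_bin), 0.0),
            H.get((c_n, c_m, os_bin), 0.0))
    return p > t

# Toy histogram: roads are often behind cars; car-on-car overlap is unseen.
H = {("road", "car", 0.5): 0.30}
print(valid_overlap(H, "car", "road", 0.52))  # True (order-symmetric lookup)
print(valid_overlap(H, "car", "car", 0.52))   # False
```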

Instance-level Inference
To infer the labels x we adopt two methods. The first is greedy inference, which works by adding one candidate object at a time. In each round, it searches for the object whose addition will produce the lowest energy M(x) while still respecting all overlap constraints. This continues until each remaining object either increases the energy or violates a constraint. This method is very fast and works well.
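A sketch of the greedy selection; as a simplification, a candidate's score gain is assumed independent of the already-selected set (the real energy M(x) may couple them), and all names and numbers are illustrative:

```python
def greedy_select(candidates, score, compatible):
    """Greedy instance-level inference: repeatedly add the candidate that
    most improves the total score while keeping every pairwise overlap
    constraint satisfied (maximizing score = minimizing the energy)."""
    selected = []
    remaining = list(candidates)
    while True:
        best, best_gain = None, 0.0
        for cand in remaining:
            if all(compatible(cand, s) for s in selected):
                gain = score(cand)
                if gain > best_gain:
                    best, best_gain = cand, gain
        if best is None:  # no addition improves the objective
            return selected
        selected.append(best)
        remaining.remove(best)

# Toy run: two overlapping "car" hypotheses and a "road"; the weaker car
# is excluded by a constraint against the stronger one.
scores = {"car_a": 2.0, "car_b": 1.5, "road": 1.0}
excl = {frozenset(("car_a", "car_b"))}
picked = greedy_select(
    scores,
    score=scores.get,
    compatible=lambda a, b: frozenset((a, b)) not in excl,
)
print(picked)  # ['car_a', 'road']
```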

Instance-level Inference
The second method is to use the integer quadratic programming solver CPlex [9]. CPlex tends to find solutions composed of a larger number of smaller instances; these solutions have lower energies and give slightly higher labeling accuracy overall. On the other hand, CPlex is much slower than the greedy method and cannot handle as many candidate objects.
[9] IBM. CPlex optimizer. optimization/cplex-optimizer/, Oct.

Occlusion Ordering
To infer the occlusion order, we need to remove one edge from each pair so as to generate a directed acyclic graph that retains the highest possible edge weights. To do so, we remove the lower-weight edge of each pair and check whether any cycles remain. If there are, we pick a random cycle and swap the edge pair with the smallest weight difference, continuing until no cycles remain. Finally, we perform a topological sort of the resulting graph to assign a depth plane to each object.
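The procedure can be sketched as below; this is a simplified version in which "pick a random cycle" becomes "pick the first cycle DFS finds", and all object names and weights are illustrative:

```python
def occlusion_order(pair_weights):
    """pair_weights maps an ordered pair (m, n) to the weight of edge
    m -> n ("m occludes n"); every unordered pair appears both ways.
    Keep the heavier direction of each pair, break remaining cycles by
    flipping the cycle edge with the smallest weight margin, then
    topologically sort front-to-back."""
    # 1. keep the heavier direction of each pair, remembering the margin
    edges, margin = set(), {}
    for (m, n), w in pair_weights.items():
        if w >= pair_weights[(n, m)] and (n, m) not in edges:
            edges.add((m, n))
            margin[(m, n)] = w - pair_weights[(n, m)]

    def find_cycle():
        """DFS for a back edge; returns the edges of one cycle, or None."""
        succ = {}
        for a, b in edges:
            succ.setdefault(a, []).append(b)
        state, stack = {}, []

        def dfs(v):
            state[v] = 1                       # on the current DFS path
            stack.append(v)
            for u in succ.get(v, []):
                if state.get(u) == 1:          # back edge: cycle found
                    cyc = stack[stack.index(u):] + [u]
                    return list(zip(cyc, cyc[1:]))
                if state.get(u, 0) == 0:
                    found = dfs(u)
                    if found:
                        return found
            stack.pop()
            state[v] = 2                       # fully explored
            return None

        for v in list(succ):
            if state.get(v, 0) == 0:
                found = dfs(v)
                if found:
                    return found
        return None

    # 2. break cycles: flip the cycle edge with the smallest margin
    cyc = find_cycle()
    while cyc:
        a, b = min(cyc, key=lambda e: margin[e])
        edges.remove((a, b))
        edges.add((b, a))
        margin[(b, a)] = margin.pop((a, b))
        cyc = find_cycle()

    # 3. topological sort: objects in front come first
    nodes = {v for e in edges for v in e}
    order, placed = [], set()
    while len(order) < len(nodes):
        for v in sorted(nodes - placed):
            if all(a in placed for (a, b) in edges if b == v):
                order.append(v)
                placed.add(v)
    return order

# Toy pairs: strong evidence a occludes b and b occludes c, weak evidence
# c occludes a; the weak c -> a edge is flipped to break the cycle.
pairs = {("a", "b"): 0.9, ("b", "a"): 0.1,
         ("b", "c"): 0.9, ("c", "b"): 0.1,
         ("c", "a"): 0.6, ("a", "c"): 0.4}
print(occlusion_order(pairs))  # ['a', 'b', 'c']
```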

Experiments: Measurement
For our pixel labeling, we report the number of pixels correctly classified (per-pixel rate) and the average of per-class rates. For our object instance predictions, we evaluate performance in two ways: Object P/R and Pixel P/R.

Experiments
Object P/R measures precision and recall of predicted instances. Precision is the number of correctly predicted objects divided by the total number of predicted objects, while recall is the number of correctly predicted objects divided by the number of ground-truth objects.

Experiments
Pixel P/R evaluates the pixel labeling generated by compositing the predicted instances back to front; any pixels not contained within a predicted object are considered unlabeled. Precision is the number of correct pixel labels divided by the total number of predicted pixels, and recall is the number of correct pixel labels divided by the total number of pixels in the test images.
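Both measures reduce to simple ratios; the helper names and the numbers in the toy calls below are illustrative, not results from the paper:

```python
def object_pr(num_correct, num_predicted, num_ground_truth):
    """Object P/R: precision over predicted instances, recall over
    ground-truth instances."""
    return num_correct / num_predicted, num_correct / num_ground_truth

def pixel_pr(num_correct_px, num_predicted_px, num_total_px):
    """Pixel P/R: pixels outside every predicted instance count as
    unlabeled, so they lower recall but not precision."""
    return num_correct_px / num_predicted_px, num_correct_px / num_total_px

# Toy numbers: 30 of 40 predictions correct, 60 ground-truth objects.
p, r = object_pr(30, 40, 60)
print(p, r)  # 0.75 0.5
```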

Experiments
The primary goal of our experiments is to validate our chief contribution, the instance-level inference framework. We have implemented two baselines to compare against; both replace the instance-level inference step.

Experiments
NMS Detector minimizes reliance on the pixel parsing when inferring objects: it takes per-exemplar detection masks as candidate instances for both “things” and “stuff” and scores each candidate with the detector responses. NMS SVM uses greedy non-maximum suppression instead of the inter-class overlap constraints.

Experiments
LM+SUN dataset: 45,176 training images, 500 test images, 232 class labels.

Experiments
LMO dataset: 2,488 training images, 200 test images, 33 class labels.

Experiments
As we have learned from our experiments, alternating between pixel-level and instance-level inference tends to improve performance only in complex scenes where overlap relationships can help correct errors. Worse, in some cases enforcing sensible overlap constraints can actually hurt pixel-level accuracy.

Discussion
One key direction for improving our system is generating better candidate instances. Currently, we manually divide our label set into “stuff” and “thing” classes and use different methods to generate candidate objects for each. But many classes, such as trees, can sometimes appear as “things” with well-defined boundaries and sometimes as diffuse “stuff.” We would like to explore ways of generating candidate object masks that do not rely on a hard things-vs-stuff split.

Conclusion
We have demonstrated a system capable of predicting object instances and their occlusion ordering in complex scenes. However, somewhat disappointingly, instance-level inference does not give us any significant gains in pixel accuracy over our own previous system.

Thanks for listening!