Holistic Scene Understanding Virginia Tech ECE6504 2013/02/26 Stanislaw Antol.


Holistic Scene Understanding Virginia Tech ECE6504 2013/02/26 Stanislaw Antol

What Does It Mean? The individual components of computer vision have been extensively developed; much less work has been done on their integration. Potential benefit: different components can compensate for and help one another.

Outline Gaussian Mixture Models Conditional Random Fields Paper 1 Overview Paper 2 Overview My Experiment

Gaussian Mixture
Bayes' rule: P(C_j | X) = P(X | C_j) P(C_j) / P(X), where P(X | C_j) is the PDF of class j evaluated at X, P(C_j) is the prior probability of class j, and P(X) is the overall PDF evaluated at X.
Each class-conditional PDF is a mixture of Gaussians: P(X | C_j) = sum_k w_k G_k(X; M_k, V_k), where w_k is the weight of the k-th Gaussian G_k (the weights sum to one), M_k is the mean of the Gaussian, and V_k is its covariance matrix. One such PDF model is produced for each class.
Slide credit: Kuei-Hsien

Composition of Gaussian Mixture: five weighted components (G_1, w_1), ..., (G_5, w_5) for Class 1.
Variables: the means μ_k, covariances V_k, and weights w_k. We use the EM (expectation-maximization) algorithm to estimate these variables; k-means can be used for initialization.
Slide credit: Kuei-Hsien
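The EM procedure described on this slide can be sketched in a few lines. This is a minimal, hypothetical 1-D implementation (the function name `em_gmm_1d` and the quantile-based initialization are my own choices, standing in for the k-means initialization mentioned above):

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=50):
    """Minimal EM for a 1-D Gaussian mixture; returns weights, means, variances."""
    # spread the initial means across the data range via quantiles
    means = np.quantile(x, np.linspace(0.1, 0.9, k))
    vars_ = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities r[n, j] proportional to w_j * N(x_n; mu_j, var_j)
        diff = x[:, None] - means[None, :]
        pdf = np.exp(-0.5 * diff**2 / vars_) / np.sqrt(2 * np.pi * vars_)
        r = w * pdf
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and variances
        nk = r.sum(axis=0)
        w = nk / len(x)
        means = (r * x[:, None]).sum(axis=0) / nk
        vars_ = (r * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-9
    return w, means, vars_

# two well-separated clusters around 0 and 10
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(10, 1, 500)])
w, mu, var = em_gmm_1d(x, k=2)
```

With well-separated modes this recovers weights near 0.5 and means near 0 and 10; for images, the same recipe extends to multivariate pixel features with full covariance matrices.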

Background on CRFs Figure from: “An Introduction to Conditional Random Fields” by C. Sutton and A. McCallum

Background on CRFs Figure from: “An Introduction to Conditional Random Fields” by C. Sutton and A. McCallum

Background on CRFs Equations from: “An Introduction to Conditional Random Fields” by C. Sutton and A. McCallum
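To make the CRF formalism above concrete, here is a small hypothetical sketch of the unnormalized log-score of a grid labelling with unary potentials and a Potts pairwise term (the partition function is omitted, since it is identical for every labelling of the same image and cancels when comparing them):

```python
import numpy as np

def crf_log_score(labels, unary, pairwise_weight):
    """Unnormalized log-score of a labelling on a 4-connected grid:
    sum of unary potentials, minus a Potts penalty for every pair of
    neighbouring pixels that disagree."""
    h, w = labels.shape
    score = unary[np.arange(h)[:, None], np.arange(w)[None, :], labels].sum()
    # Potts pairwise term over right and down neighbour pairs
    score -= pairwise_weight * np.count_nonzero(labels[:, 1:] != labels[:, :-1])
    score -= pairwise_weight * np.count_nonzero(labels[1:, :] != labels[:-1, :])
    return score
```

A smooth labelling pays no pairwise penalty, while a fragmented one is penalized once per disagreeing neighbour pair; this is the basic pressure toward coherent segmentations.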

Paper 1 “TextonBoost: Joint Appearance, Shape, and Context Modeling for Multi-class Object Recognition and Segmentation” – J. Shotton, J. Winn, C. Rother, and A. Criminisi

Introduction
Simultaneous recognition and segmentation
Explain every pixel (dense features)
Appearance + shape + context
Class generalities + image specifics
Contributions:
New low-level features
New texture-based discriminative model
Efficiency and scalability
Example results
Slide credit: J. Shotton

Image Databases
MSRC 21-Class Object Recognition Database: 591 hand-labelled images (45% train, 10% validation, 45% test)
Corel (7-class) and Sowerby (7-class) [He et al. CVPR 04]
Slide credit: J. Shotton

Sparse vs Dense Features
Successes using sparse features, e.g. [Sivic et al. ICCV 2005], [Fergus et al. ICCV 2005], [Leibe et al. CVPR 2005]
But they do not explain the whole image and cannot cope well with all object classes.
We use dense features ('shape filters'): local texture-based image descriptions.
These cope with textured and untextured objects and occlusions, whilst retaining high efficiency.
(Figure: problem images for sparse features?)
Slide credit: J. Shotton

Textons
Shape filters use texton maps [Varma & Zisserman IJCV 05] [Leung & Malik IJCV 01]: a compact and efficient characterisation of local texture.
Pipeline: input image → filter bank → clustering → texton map (colours correspond to texton indices).
Slide credit: J. Shotton
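The texton pipeline (filter bank → clustering → texton map) can be illustrated roughly as below. This is a toy sketch: real systems use a proper filter bank (e.g. Gaussians, Laplacians, and derivatives), whereas here each "filter" is just a shifted difference, and `texton_map` is a hypothetical name:

```python
import numpy as np

def texton_map(image, filters, k=8, iters=10, seed=0):
    """Assign each pixel a texton index: per-pixel filter responses are
    clustered with a few rounds of Lloyd's k-means."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    # toy 'filter bank': each (shift, axis) pair yields a shifted difference
    feats = np.stack([np.roll(image, s, axis=a) - image for s, a in filters],
                     axis=-1).reshape(-1, len(filters))
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():  # leave empty clusters where they are
                centers[j] = feats[assign == j].mean(axis=0)
    return assign.reshape(h, w)
```

The resulting integer map is what the shape filters of the next slides operate on: each pixel is summarized by a single discrete texture label.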

Shape Filters
A shape filter is a pair (r, t) of rectangle r and texton t.
The feature response v(i, r, t) counts the pixels with texton index t inside rectangle r offset relative to pixel i; e.g. v(i_1, r, t) = a, v(i_2, r, t) = 0, v(i_3, r, t) = a/2.
Large bounding boxes (up to 200 pixels) enable long-range interactions, capturing both appearance and context.
Responses are computed efficiently with integral images.
Slide credit: J. Shotton

Shape as Texton Layout
Feature response images v(i, r_1, t_1) and v(i, r_2, t_2) for two shape filters (r_1, t_1) and (r_2, t_2), computed on the texton map (shown alongside the ground truth).
Slide credit: J. Shotton

Shape as Texton Layout
The summed response images v(i, r_1, t_1) + v(i, r_2, t_2) combine the texton layout cues from both filters on the texton map.
Slide credit: J. Shotton
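The integral-image trick behind the shape filters can be sketched as follows. `shape_filter_response` is a hypothetical helper returning the raw texton count under the rectangle; the paper normalizes by area, and a real implementation would precompute one integral image per texton rather than per query, and clip rectangles to the image bounds:

```python
import numpy as np

def integral_image(mask):
    """Summed-area table with a zero top row and left column, so any
    rectangle sum is four lookups."""
    ii = np.zeros((mask.shape[0] + 1, mask.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = mask.cumsum(axis=0).cumsum(axis=1)
    return ii

def shape_filter_response(tmap, t, i, rect):
    """v(i, r, t): count of texton t inside rectangle r offset to pixel i.
    rect = ((y0, x0), (y1, x1)) relative to i, inclusive-exclusive."""
    ii = integral_image(tmap == t)
    (y0, x0), (y1, x1) = rect
    r0, c0 = i[0] + y0, i[1] + x0
    r1, c1 = i[0] + y1, i[1] + x1
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
```

This is why the large (up to 200-pixel) rectangles on the slide stay cheap: the response cost is constant regardless of rectangle size.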

Joint Boosting for Feature Selection
Using Joint Boost [Torralba et al. CVPR 2004].
Inferred segmentations on a test image after 30, 1000, and 2000 rounds; colour = most likely label; confidence: white = low, black = high.
The boosted classifier provides bulk segmentation/recognition only; edge-accurate segmentation will be provided by the CRF model.
Slide credit: J. Shotton
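As a rough illustration of boosted feature selection, here is plain binary AdaBoost over decision stumps on feature responses. Note this simplifies the slide's method: Joint Boost shares confidence-rated weak learners across classes, which this sketch does not do:

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted-error-minimizing decision stump (feature, threshold, polarity)."""
    best = (0, 0.0, 1, np.inf)
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - thr) > 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (f, thr, pol, err)
    return best

def adaboost(X, y, rounds=10):
    """A few rounds of discrete AdaBoost over decision stumps."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        f, thr, pol, err = fit_stump(X, y, w)
        err = max(err, 1e-12)  # avoid division by zero for perfect stumps
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(pol * (X[:, f] - thr) > 0, 1, -1)
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((f, thr, pol, alpha))
    return ensemble

def predict(ensemble, X):
    s = sum(a * np.where(p * (X[:, f] - t) > 0, 1, -1) for f, t, p, a in ensemble)
    return np.sign(s)
```

Each round re-weights the training pixels toward the current mistakes, so later rounds select shape filters that fix what earlier rounds got wrong; this is the mechanism behind the improving 30/1000/2000-round segmentations above.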

Accurate Segmentation?
The boosted classifier alone effectively recognises objects, but is not sufficient for pixel-perfect segmentation.
A Conditional Random Field (CRF) jointly classifies all pixels whilst respecting image edges: boosted classifier + CRF.
Slide credit: J. Shotton

Conditional Random Field Model
Log conditional probability of class labels c given the image x and learned parameters θ.
Slide credit: J. Shotton

Conditional Random Field Model
Shape-texture potentials: a broad intra-class appearance distribution, applied jointly across all pixels; the log boosted classifier, with parameters learned offline.
Slide credit: J. Shotton

Conditional Random Field Model
Colour potentials: a compact appearance distribution capturing intra-class appearance variations; a Gaussian mixture model with parameters learned at test time.
Slide credit: J. Shotton

Conditional Random Field Model
Location potentials capture a prior on absolute image location (e.g. tree, sky, road).
Slide credit: J. Shotton

Conditional Random Field Model
Edge potentials (Potts model, summed over neighbouring pixels): encourage neighbouring pixels to take the same label.
Contrast sensitivity: encourages the segmentation to follow image edges (using the image edge map).
Slide credit: J. Shotton

Conditional Random Field Model
The partition function normalises the distribution. For details of the potentials and learning, see the paper.
Slide credit: J. Shotton

CRF Inference
Find the most probable labelling by maximizing the CRF over the shape-texture, colour, location, and edge potentials.
Slide credit: J. Shotton
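TextonBoost performs this maximization with alpha-expansion graph cuts; as a much simpler stand-in, iterated conditional modes (ICM) greedily improves one pixel at a time under the same unary-plus-Potts score. This is a hypothetical sketch, not the paper's inference method:

```python
import numpy as np

def icm(unary, pairwise_weight, iters=5):
    """Iterated conditional modes: greedily re-label each pixel to maximize
    its unary score minus a Potts penalty for disagreeing neighbours."""
    h, w, k = unary.shape
    labels = unary.argmax(axis=-1)  # start from the unary-only labelling
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                scores = unary[y, x].copy()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        scores -= pairwise_weight * (np.arange(k) != labels[ny, nx])
                labels[y, x] = scores.argmax()
    return labels
```

ICM only finds a local optimum, which is exactly why graph-cut methods like alpha-expansion are preferred for this model; but it shows how the pairwise term overrules weak, isolated unary evidence.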

Learning Slide credit: Daniel Munoz

Results on 21-Class Database
Slide credit: J. Shotton

Segmentation Accuracy
Overall pixel-wise accuracy is 72.2%, roughly 15 times better than chance.
Confusion matrix:
Slide credit: J. Shotton
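The pixel-wise accuracy and confusion matrix reported here can be computed as follows (a small sketch; rows are ground-truth classes, columns are predictions):

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes):
    """cm[i, j] = number of pixels with ground truth i predicted as j."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt.ravel(), pred.ravel()), 1)
    return cm

def pixel_accuracy(cm):
    """Overall pixel-wise accuracy: correct pixels (trace) over total."""
    return cm.trace() / cm.sum()
```

Note that overall pixel accuracy is dominated by large classes (grass, sky); the per-class diagonal of the confusion matrix is the more informative view.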

Some Failures Slide credit: J. Shotton

Effect of Model Components (pixel-wise segmentation accuracies)
Shape-texture potentials only: 69.6%
+ edge potentials: 70.3%
+ colour potentials: 72.0%
+ location potentials: 72.2%
Slide credit: J. Shotton

Comparison with [He et al. CVPR 04]
Method | Sowerby accuracy | Corel accuracy | Sowerby speed (train - test) | Corel speed (train - test)
Our CRF model | 88.6% | 74.6% | 20 mins - secs | 30 mins - secs
He et al. mCRF | 89.5% | 80.0% | 1 day - 30 secs | -
Shape-texture potentials only | 85.6% | 68.4% | - | -
He et al. unary classifier only | 82.4% | 66.9% | - | -
(Our example results shown in figure.)
Slide credit: J. Shotton

Paper 2 “Describing the Scene as a Whole: Joint Object Detection, Scene Classification, and Semantic Segmentation” – Jian Yao, Sanja Fidler, and Raquel Urtasun

Motivation
Holistic scene understanding: object detection, semantic segmentation, scene classification.
Extends the idea behind TextonBoost: adds scene classification, object-scene compatibility, and more.

Main idea
Create a holistic CRF: a general framework that easily allows additions, utilizing other work as components.
Perform CRF inference not over pixels, but over segments and other higher-level variables.

Holistic CRF (HCRF) Model

HCRF Pre-cursors
Use their own scene classification (a one-vs-all SVM classifier using SIFT, colorSIFT, RGB histograms, and color moment invariants) to produce scene estimates.
Use [5] for object detection (over-detection), b_l, and to help create object masks, μ_s.
Use [20] at two different K_0 watershed threshold values to generate segments x_i and super-segments y_j, respectively.

HCRF: diagram connecting the potentials in their holistic CRF model.

Segmentation Potentials: TextonBoost unary potentials, averaged over segments.
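If "TextonBoost averaging" means averaging the per-pixel TextonBoost unary potentials over each segment (my reading of the slide), it could be sketched as follows, with `segment_unaries` a hypothetical helper name:

```python
import numpy as np

def segment_unaries(pixel_unaries, segments, num_segments):
    """Average per-pixel unary potentials over each segment, giving one
    unary vector per segment (the segment-level CRF variable's potential)."""
    k = pixel_unaries.shape[-1]
    sums = np.zeros((num_segments, k))
    counts = np.zeros(num_segments)
    np.add.at(sums, segments.ravel(), pixel_unaries.reshape(-1, k))
    np.add.at(counts, segments.ravel(), 1)
    return sums / counts[:, None]
```

Working at the segment level is what lets the HCRF run over far fewer variables than a pixel-level CRF.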

Object Reasoning Potentials

Class Presence Potentials: binary variables asking "is class k in the image?", with pairwise structure learned via the Chow-Liu algorithm.
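The Chow-Liu algorithm builds a maximum spanning tree over the pairwise mutual information of the class-presence indicators. A small hypothetical sketch (Kruskal's algorithm with union-find):

```python
import numpy as np

def mutual_info(a, b):
    """Mutual information of two binary indicator vectors (natural log)."""
    mi = 0.0
    for va in (0, 1):
        for vb in (0, 1):
            pab = np.mean((a == va) & (b == vb))
            pa, pb = np.mean(a == va), np.mean(b == vb)
            if pab > 0:
                mi += pab * np.log(pab / (pa * pb))
    return mi

def chow_liu_tree(presence):
    """Chow-Liu: maximum spanning tree over pairwise mutual information
    of per-image class-presence indicators (columns of `presence`)."""
    n, k = presence.shape
    edges = sorted(((mutual_info(presence[:, i], presence[:, j]), i, j)
                    for i in range(k) for j in range(i + 1, k)),
                   reverse=True)
    parent = list(range(k))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

The resulting tree is the best tree-structured approximation to the joint distribution of class co-occurrences, which keeps inference over class presence tractable.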

Scene Potentials: based on their own scene classification technique.

Experimental Results

My (TextonBoost) Experiment
Despite the paper's statement, the HCRF code is not available.
TextonBoost is only partially available: only the code prior to the CRF stage has been released, and it expects a very rigid format/structure for images.
PASCAL VOC2007 wouldn't run, even with changes; MSRCv2 was able to run (and is actually what they used).
No results processing is included, just segmented images.

My Experiment
Run the code on the (same) MSRCv2 dataset with default parameters, except the number of boosting rounds.
Wanted to look at effects up to 1000 rounds; computed up to 900. With limited time, only got output for values up to 300.
Evaluate the relationship between boosting rounds and segmentation accuracy.

Experimental Advice
Remember to compile in Release mode: classification seems to be ~3 times faster, and training (which took 26 hours) would likely have been faster too.
Take advantage of a multi-core CPU, if possible: the single-threaded program was not using much RAM, so I ran two classifications together.
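Running two single-threaded classification jobs side by side, as suggested above, can be automated; the command strings passed in are placeholders for the actual classification invocations:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_classifications(commands, workers=2):
    """Launch several single-threaded classification commands in parallel
    (the threads just wait on the subprocesses) and collect exit codes."""
    def run(cmd):
        return subprocess.run(cmd, shell=True).returncode
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))
```

Since the work happens in separate processes, Python threads are enough here; `workers` should match the number of jobs the CPU and RAM can comfortably hold.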

Experimental Results

Thank you for your time. Any more questions?