Learning Spatial Context: Using stuff to find things Geremy Heitz Daphne Koller Stanford University October 13, 2008 ECCV 2008.

Things vs. Stuff Stuff (n): Material defined by a homogeneous or repetitive pattern of fine-scale properties, with no specific or distinctive spatial extent or shape. Thing (n): An object with a specific size and shape. From: Forsyth et al., Finding pictures of objects in large collections of images. Object Representation in Computer Vision, 1996.

Finding Things Context is key!

Outline What is Context? The Things and Stuff (TAS) model Results

Satellite Detection Example D(W) = 0.8

Error Analysis Typically, false positives are OUT OF CONTEXT and true positives are IN CONTEXT. We need to look outside the bounding box!

Types of Context Scene-Thing: gist [Torralba et al., LNCS 2005]. Stuff-Stuff: [Gould et al., IJCV 2008]. Thing-Thing: car "likely", keyboard "unlikely" [Rabinovich et al., ICCV 2007].

Types of Context Stuff-Thing: based on spatial relationships. Intuition: trees = no cars; houses = cars nearby; road = cars here. "Cars drive on roads", "Cows graze on grass", "Boats sail on water". Goal: unsupervised.

Outline What is Context? The Things and Stuff (TAS) model Results

Things Detection “candidates” Low detector threshold -> “over-detect” Each candidate has a detector score

Things Candidate detections: image window W_i + detector score. Boolean R.V. T_i; T_i = 1 means the candidate is a positive detection.

Stuff Coherent image regions: coarse "superpixels". Feature vector F_j in R^n; cluster label S_j in {1…C}. Stuff model: naive Bayes.

Relationships Descriptive relations: "near", "above", "in front of", etc. Choose a set R = {r_1 … r_K}. R_ijk = 1: detection i and region j have relation k. Example: S_72 = trees, S_4 = houses, S_10 = road, R_{1,10,in} = 1.

The TAS Model Plate model over N candidate windows, J regions, and K relations. W_i: window (always observed). T_i: object presence (supervised in the training set). S_j: region label (always hidden). F_j: region features (always observed). R_ijk: relationship (always observed).

Unrolled Model Candidate windows T_1, T_2, T_3; image regions S_1 … S_5. Example relationship values: R_{1,1,left} = 1, R_{2,1,above} = 0, R_{3,1,left} = 1, R_{1,3,near} = 0, R_{3,3,in} = 1.

Learning the Parameters Assume we know R. S_j is hidden; everything else is observed. Expectation-Maximization: a form of "contextual clustering". The parameters are readily interpretable.
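The "everything observed except S_j" setting above means EM reduces to soft clustering of region features. The sketch below, with illustrative names, runs EM for a mixture of diagonal-Gaussian (naive Bayes) components over region feature vectors F_j. It covers only the stuff half of TAS; the full model also couples each S_j to the T_i's through the active relationship edges.

```python
import math

def em_cluster(features, C, iters=20):
    """EM for a C-component diagonal-Gaussian mixture over feature vectors."""
    n, d = len(features), len(features[0])
    # Deterministic init: spread initial means across the data.
    mus = [list(features[i * n // C]) for i in range(C)]
    var = [[1.0] * d for _ in range(C)]
    pi = [1.0 / C] * C
    for _ in range(iters):
        # E-step: responsibilities p(S_j = c | F_j).
        resp = []
        for f in features:
            logp = []
            for c in range(C):
                lp = math.log(pi[c])
                for k in range(d):
                    lp += -0.5 * math.log(2 * math.pi * var[c][k]) \
                          - (f[k] - mus[c][k]) ** 2 / (2 * var[c][k])
                logp.append(lp)
            m = max(logp)
            w = [math.exp(l - m) for l in logp]
            s = sum(w)
            resp.append([x / s for x in w])
        # M-step: update mixture weights, means, and variances.
        for c in range(C):
            wc = sum(r[c] for r in resp)
            pi[c] = wc / n
            for k in range(d):
                mus[c][k] = sum(r[c] * f[k] for r, f in zip(resp, features)) / wc
                var[c][k] = max(1e-6, sum(r[c] * (f[k] - mus[c][k]) ** 2
                                          for r, f in zip(resp, features)) / wc)
    # Hard cluster assignments from the final responsibilities.
    return mus, [max(range(C), key=lambda c: r[c]) for r in resp]
```

The interpretability claim on the slide corresponds to inspecting the learned means: each cluster's mean feature vector describes one kind of "stuff".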

Learned Satellite Clusters

Which Relationships to Use? R_ijk = spatial relationship between candidate i and region j. For example: R_ij1 = candidate in region; R_ij2 = candidate closer than 2 bounding boxes (BBs) to region; R_ij3 = candidate closer than 4 BBs to region; R_ij4 = candidate farther than 8 BBs from region; R_ij5 = candidate 2 BBs left of region; R_ij6 = candidate 2 BBs right of region; R_ij7 = candidate 2 BBs below region; R_ij8 = candidate more than 2 and less than 4 BBs from region; …; R_ijK = candidate near region boundary. How do we avoid overfitting?
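A few of the indicators listed above can be sketched as follows. The geometry is an assumption: the slide does not say how "2 bounding boxes away" is measured, so this sketch approximates each region by its centroid and uses the candidate's box diagonal as one BB unit; all names are illustrative.

```python
import math

def relationships(box, region_centroid):
    """Binary spatial-relationship indicators between a candidate box
    (x, y, w, h) and a region, approximated by its centroid (rx, ry)."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    rx, ry = region_centroid
    bb = math.hypot(w, h)               # one "bounding box" unit (assumed)
    d = math.hypot(rx - cx, ry - cy)    # centroid-to-centroid distance
    return {
        "in":             abs(rx - cx) < w / 2 and abs(ry - cy) < h / 2,
        "closer_than_2":  d < 2 * bb,
        "closer_than_4":  d < 4 * bb,
        "farther_than_8": d > 8 * bb,
        "left":           rx < cx - 2 * bb,
        "right":          rx > cx + 2 * bb,
        "below":          ry > cy + 2 * bb,   # image y grows downward
    }
```

Note that these indicators are highly correlated (e.g., "closer than 2" implies "closer than 4"), which is exactly why the model needs a way to select among them rather than use them all.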

Learning the Relationships Intuition: a "detached" R_ijk is an inactive relationship. Structural EM iterates: learn the parameters, then decide which edge to toggle. Candidate structures are evaluated with the likelihood ℓ(T | F, W, R), which requires inference; this gives better results than using the standard E[ℓ(T, S, F, W, R)].

Inference Goal: the posterior over detections. Block Gibbs sampling: it is easy to sample the T_i's given the S_j's and vice versa.
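The block alternation can be sketched generically. This is a toy scaffold, not the paper's equations: the two conditional distributions are passed in as black boxes (in TAS they would be derived from the learned CPDs and the relationship structure), and all names are illustrative.

```python
import random

def block_gibbs(p_t_given_s, p_s_given_t, n_t, n_s,
                sweeps=100, burn_in=20, seed=0):
    """Alternately resample all T_i given S, then all S_j given T,
    and tally post-burn-in samples to estimate p(T_i = 1)."""
    rng = random.Random(seed)
    T = [0] * n_t
    S = [0] * n_s
    counts = [0] * n_t
    for sweep in range(sweeps):
        for i in range(n_t):                  # block 1: T | S
            T[i] = 1 if rng.random() < p_t_given_s(i, S) else 0
        for j in range(n_s):                  # block 2: S | T
            probs = p_s_given_t(j, T)         # distribution over clusters
            u, acc = rng.random(), 0.0
            for c, pc in enumerate(probs):
                acc += pc
                if u < acc:
                    S[j] = c
                    break
        if sweep >= burn_in:
            for i in range(n_t):
                counts[i] += T[i]
    return [c / (sweeps - burn_in) for c in counts]
```

The returned per-candidate marginals play the role of the contextual posterior scores that replace the raw detector scores.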

Outline What is Context? The Things and Stuff (TAS) model Results

Base Detector - HOG [Dalal & Triggs, CVPR 2005] HOG detector: image window → feature vector X → SVM classifier.
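A toy version of the orientation-histogram idea behind HOG, assuming a grayscale image given as a list of rows. Real HOG additionally divides the window into cells, groups cells into overlapping blocks, and block-normalizes before the SVM sees the concatenated descriptor; this sketch computes only a single magnitude-weighted orientation histogram.

```python
import math

def orientation_histogram(img, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.atan2(gy, gx) % math.pi   # unsigned: [0, pi)
            b = min(bins - 1, int(ang / math.pi * bins))
            hist[b] += mag
    s = sum(hist) or 1.0
    return [v / s for v in hist]                 # L1-normalized
```

For example, an image with a single vertical edge puts essentially all of its mass in the horizontal-gradient bin.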

Results - Satellite Prior: Detector Only Posterior: Detections Posterior: Region Labels

Results - Satellite Recall rate vs. false positives per image: the TAS model gives a ~10% improvement in recall over the base detector at 40 FPPI.

PASCAL VOC Challenge 2005 Challenge: 2232 images split into {train, val, test}; cars, bikes, people, and motorbikes. 2006 Challenge: 5304 images split into {train, test}; 12 classes, of which we use cows and sheep.

Base Detector Error Analysis Cows

Discovered Context - Bicycles Cluster #3.

TAS Results – Bicycles Examples: discover "true positives", remove "false positives".

Results – VOC 2005

Results – VOC 2006

Conclusions Detectors can benefit from context The TAS model captures an important type of context We can improve any sliding window detector using TAS The TAS model can be interpreted and matches our intuitions We can learn which relationships to use

Thank you!

Object Detection Task: Find the things Example: Find all the cars in this image Return a “bounding box” for each Evaluation: Maximize true positives Minimize false positives

Sliding Window Detection Consider every bounding box: all shifts, all scales, possibly all rotations. Each such window gets a score D(W) (e.g., D = 1.5, D = -0.3); detections are the local peaks in D(W). Pros: covers the entire image; flexible enough to allow a variety of D(W)'s. Cons: brute force, so it can be slow; only considers features inside the box.
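The scan described above can be sketched as a triple loop over scales and shifts around a black-box scorer. This is a minimal sketch with illustrative names and parameters; it omits rotations and the local-peak (non-maximum suppression) step.

```python
def sliding_windows(img_w, img_h, D, threshold,
                    base=(32, 32), scales=(1.0, 1.5, 2.0), stride=8):
    """Score every window position at several scales with a black-box
    scorer D(x, y, w, h); keep windows scoring above the threshold."""
    detections = []
    for s in scales:
        w, h = int(base[0] * s), int(base[1] * s)
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                score = D(x, y, w, h)
                if score > threshold:
                    detections.append(((x, y, w, h), score))
    return detections
```

Because D is a parameter, any window scorer plugs in unchanged, which is the "flexible to allow a variety of D(W)'s" point, and also why TAS can wrap any sliding-window detector.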

Sliding Window Results PASCAL Visual Object Classes Challenge, Cows 2006. A window W is reported when D(W) > T. Overlap score between detection A and ground truth B: score(A, B) = |A∩B| / |A∪B|; score(A, B) > 0.5 is a TRUE POSITIVE, score(A, B) ≤ 0.5 is a FALSE POSITIVE. Recall(T) = TP / (TP + FN); Precision(T) = TP / (TP + FP).
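The overlap criterion above is straightforward to compute for axis-aligned boxes given as (x, y, w, h); a minimal sketch:

```python
def overlap(a, b):
    """score(A, B) = |A ∩ B| / |A ∪ B| for axis-aligned (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # intersection width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))   # intersection height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def is_true_positive(det, gt, thresh=0.5):
    """A detection is a true positive when its overlap exceeds the threshold."""
    return overlap(det, gt) > thresh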

Quantitative Evaluation Recall rate vs. false positives per image.

Detections in Context Task: identify all cars in the satellite image. Idea: the surrounding context (houses, road) adds information to the local window detector. Prior: detector only. Posterior: TAS model detections and region labels.

Equations

Features: Haar wavelets Haar filters and integral image [Viola and Jones, ICCV 2001]. The average intensity in a block is computed with four sums, independently of the block size. BOOSTING!
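The "four sums" claim above is the integral-image trick: after one pass over the image, the sum of any rectangular block comes from four table lookups, so Haar filter responses cost the same at every scale. A minimal sketch:

```python
def integral_image(img):
    """Summed-area table with a zero-padded first row/column:
    ii[y][x] = sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def block_sum(ii, x0, y0, x1, y1):
    """Sum of img[y0:y1][x0:x1] using exactly four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
```

A Haar feature is then just a difference of two or more such block sums, which is what makes the boosted cascade fast at detection time.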

Features: Edge fragments Weak detector = match of edge chain(s) from a training image to the edge map of a test image [Opelt, Pinz, Zisserman, ECCV 2006]. BOOSTING!

Histograms of oriented gradients [Dalal & Triggs, CVPR 2005]; SIFT [D. Lowe, ICCV 1999]. SVM!