Lecture 17: Parts-based models and context CS6670: Computer Vision Noah Snavely.

Slides:



Advertisements
Similar presentations
Presenter: Duan Tran (Part of slides are from Pedro’s)
Advertisements

Part Based Models Andrew Harp. Part Based Models Physical arrangement of features.
Face Alignment with Part-Based Modeling
1 Part 1: Classical Image Classification Methods Kai Yu Dept. of Media Analytics NEC Laboratories America Andrew Ng Computer Science Dept. Stanford University.
Lecture 31: Modern object recognition
Many slides based on P. FelzenszwalbP. Felzenszwalb General object detection with deformable part-based models.
Mixture of trees model: Face Detection, Pose Estimation and Landmark Localization Presenter: Zhang Li.
Face Detection, Pose Estimation, and Landmark Localization in the Wild
Multi-Local Feature Manifolds for Object Detection Oscar Danielsson Stefan Carlsson Josephine Sullivan
Automatic Identification of Ice Layers in Radar Echograms David Crandall, Jerome Mitchell, Geoffrey C. Fox School of Informatics and Computing Indiana.
Global spatial layout: spatial pyramid matching Spatial weighting the features Beyond bags of features: Adding spatial information.
Ghunhui Gu, Joseph J. Lim, Pablo Arbeláez, Jitendra Malik University of California at Berkeley Berkeley, CA
More sliding window detection: Discriminative part-based models Many slides based on P. FelzenszwalbP. Felzenszwalb.
In Search of Objects: 50 years of wondering : Learning-Based Methods in Vision A. Efros, CMU, Spring 2009.
Beyond bags of features: Part-based models Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
Statistical Recognition Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and Kristen Grauman.
1 Image Recognition - I. Global appearance patterns Slides by K. Grauman, B. Leibe.
Lecture 28: Bag-of-words models
Learning Spatial Context: Using stuff to find things Geremy Heitz Daphne Koller Stanford University October 13, 2008 ECCV 2008.
Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
Object Recognition: History and Overview Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce.
Object Recognition: Conceptual Issues Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and K. Grauman.
Lecture 10: Cameras CS4670: Computer Vision Noah Snavely Source: S. Lazebnik.
CS 223B Assignment 1 Help Session Dan Maynes-Aminzade.
Object Recognition: Conceptual Issues Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and K. Grauman.
Visual Object Recognition Rob Fergus Courant Institute, New York University
Lecture 11: Stereo and optical flow CS6670: Computer Vision Noah Snavely.
Lecture 18: Context and segmentation CS6670: Computer Vision Noah Snavely.
Opportunities of Scale, Part 2 Computer Vision James Hays, Brown Many slides from James Hays, Alyosha Efros, and Derek Hoiem Graphic from Antonio Torralba.
Learning Spatial Context: Can stuff help us find things? Geremy Heitz Daphne Koller April 14, 2008 DAGS Stuff (n): Material defined by a homogeneous or.
Object Recognition by Discriminative Methods Sinisa Todorovic 1st Sino-USA Summer School in VLPR July, 2009.
Lecture 29: Recent work in recognition CS4670: Computer Vision Noah Snavely.
Generic object detection with deformable part-based models
Review: Intro to recognition Recognition tasks Machine learning approach: training, testing, generalization Example classifiers Nearest neighbor Linear.
Efficient Algorithms for Matching Pedro Felzenszwalb Trevor Darrell Yann LeCun Alex Berg.
Layer-finding in Radar Echograms using Probabilistic Graphical Models David Crandall Geoffrey C. Fox School of Informatics and Computing Indiana University,
EADS DS / SDC LTIS Page 1 7 th CNES/DLR Workshop on Information Extraction and Scene Understanding for Meter Resolution Image – 29/03/07 - Oberpfaffenhofen.
Object Detection Sliding Window Based Approach Context Helps
Multiscale Symmetric Part Detection and Grouping Alex Levinshtein, Sven Dickinson, University of Toronto and Cristian Sminchisescu, University of Bonn.
“Secret” of Object Detection Zheng Wu (Summer intern in MSRNE) Sep. 3, 2010 Joint work with Ce Liu (MSRNE) William T. Freeman (MIT) Adam Kalai (MSRNE)
Recovering Surface Layout from a Single Image D. Hoiem, A.A. Efros, M. Hebert Robotics Institute, CMU Presenter: Derek Hoiem CS 598, Spring 2009 Jan 29,
Bag-of-features models. Origin 1: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures,
Lecture 29: Face Detection Revisited CS4670 / 5670: Computer Vision Noah Snavely.
Face detection Slides adapted Grauman & Liebe’s tutorial
Reading Between The Lines: Object Localization Using Implicit Cues from Image Tags Sung Ju Hwang and Kristen Grauman University of Texas at Austin Jingnan.
Object Detection with Discriminatively Trained Part Based Models
Lecture 31: Modern recognition CS4670 / 5670: Computer Vision Noah Snavely.
Deformable Part Models (DPM) Felzenswalb, Girshick, McAllester & Ramanan (2010) Slides drawn from a tutorial By R. Girshick AP 12% 27% 36% 45% 49% 2005.
Efficient Matching of Pictorial Structures By Pedro Felzenszwalb and Daniel Huttenlocher Presented by John Winn.
Sampling for Part Based Object Models Daniel Huttenlocher September, 2006.
Project 3 Results.
Methods for classification and image representation
Discussion of Pictorial Structures Pedro Felzenszwalb Daniel Huttenlocher Sicily Workshop September, 2006.
CS 1699: Intro to Computer Vision Detection II: Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 12, 2015.
Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11.
Recognition Using Visual Phrases
Lecture 15: Eigenfaces CS6670: Computer Vision Noah Snavely.
More sliding window detection: Discriminative part-based models
Object detection with deformable part-based models
Paper Presentation: Shape and Matching
Object detection as supervised classification
Lecture 25: Introduction to Recognition
CS 1674: Intro to Computer Vision Scene Recognition
“The Truth About Cats And Dogs”
Brief Review of Recognition + Context
Object DetectionII Ali Taalimi 01/08/2013.
SIFT keypoint detection
CS4670: Intro to Computer Vision
Lecture 29: Face Detection Revisited
Presentation transcript:

Lecture 17: Parts-based models and context CS6670: Computer Vision Noah Snavely

Announcements Project 3: Eigenfaces – due Wednesday, November 11 at 11:59pm – solo project

Object Bag of ‘words’

What about spatial info? ?

Problem with bag-of-words All have equal probability for bag-of-words methods Location information is important

Model: Parts and Structure

Deformable objects Images from D. Ramanan’s dataset 7

Deformable objects Images from Caltech-256 8

Part-based representation Objects are decomposed into parts and spatial relations among parts 9 Fischler and Elschlager ‘73

Pictorial structures Two components: – Appearance model How much does a given window look like a given part? – Spatial model How well do the parts match the expected shape?

Pictorial Structure Matching = Local part evidence + Global constraint m i (l i ): matching cost for part I d ij (l i,l j ): deformable cost for connected pairs of parts (v i,v j ): connection between part i and j 12

Part-based representation Tree model  Efficient inference by dynamic programming 14

Appearance model Each part has an associated appearance model – E.g., a reference patch, gradient histogram, etc. Left eyeRight eyeNose ChinRight mouthLeft mouth

Spatial model Each edge represents a spring with a certain relative offset, covariance

Matching on tree structure For each l 1, find best l 2 : Remove v 2, and repeat with smaller tree, until only a single part Complexity: O(nk 2 ): n parts, k locations per part 17

Putting it all together Left eyeRight eyeNose ChinRight mouthLeft mouth Marginal on Nose

Sample result on matching human 19

Sample result on matching human 20

Matching results

Learning the model parameters Easiest approach: supervised learning – Someone chooses the number and meaning of the parts, labels them in a bunch of training examples – Use this to learn the appearance and spatial models A lot of work has been done on unsupervised learning of these models

Some learned object models

Part-based representation K-fans model (D.Crandall, et.all, 2005) 24

How much does shape help? Crandall, Felzenszwalb, Huttenlocher CVPR’05 Shape variance decreases with increasing model complexity Do get some benefit from shape

3-minute break

Context: thinking outside the (bounding) box Slides courtesy Alyosha Efros © Oliva & Torralba

Eye of the Beholder Claude Monet Gare St.Lazare Paris, 1877

Eye of the Beholder where did it go?

Seeing less than you think…

“The Miserable Life of a Person Detector”

What the Detector Sees

What the Detector Does True Detection True Detections Missed False Detections Local Detector: [Dalal-Triggs 2005]

If we have 1000 categories (detectors), and each detector produces 1 FP every 10 images, we will have 100 false alarms per image… pretty much garbage… with hundreds of categories… Slide by Antonio Torralba

We know there is a keyboard present in this scene even if we cannot see it clearly. We know there is no keyboard present in this scene … even if there is one indeed. Slide by Antonio Torralba Context to the rescue!

When is context helpful? Distance Information Local features Contextual features Slide by Antonio Torralba

Is it just for small / blurry things? Slide by Antonio Torralba

Is it just for small / blurry things? Slide by Antonio Torralba

Is it just for small / blurry things? Slide by Antonio Torralba

Context is hard to fight! Thanks to Paul Viola for showing me these

more “Look-alikes”

1 Don’t even need to see the object Slide by Antonio Torralba

Don’t even need to see the object Chance ~ 1/30000 Slide by Antonio Torralba

The influence of an object extends beyond its physical boundaries But object can say a lot about the scene