Building local part models for category-level recognition C. Schmid, INRIA Grenoble Joint work with G. Dorko, S. Lazebnik, J. Ponce.



Introduction. Invariant local descriptors => robust recognition of specific objects or scenes. Recognition of textures and object classes => description of intra-class variation, selection of discriminant features, spatial relations. [Figures: texture recognition; car detection]

Overview
1. Affine-invariant texture recognition (CVPR’03)
2. A two-layer architecture for texture segmentation and recognition (ICCV’03)
3. Feature selection for object class recognition (ICCV’03)
4. Building affine-invariant part models for recognition

Affine-invariant texture recognition Texture recognition under viewpoint changes and non-rigid transformations Use of affine-invariant regions –invariance to viewpoint changes –spatial selection => more compact representation, reduction of redundancy in texton dictionary [A sparse texture representation using affine-invariant regions, S. Lazebnik, C. Schmid and J. Ponce, CVPR 2003]

Spatial selection: clustering each pixel vs. clustering selected pixels

Overview of the approach

Region extraction: Harris detector, Laplace detector

Descriptors – Spin images

Signature and EMD. Hierarchical clustering => signature: $S = \{(m_1, w_1), \ldots, (m_k, w_k)\}$. Earth mover's distance: $D(S, S') = \left[\sum_{i,j} f_{ij}\, d(m_i, m'_j)\right] / \left[\sum_{i,j} f_{ij}\right]$ –robust distance, optimizes the flow between distributions –can match signatures of different size –not sensitive to the number of clusters
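The slide gives only the normalized cost of the flow. As a sketch, and assuming the standard transportation formulation of the earth mover's distance (the constraints below are not stated on the slide), the flow $f_{ij}$ between a signature $S$ with weights $w_i$ and a signature $S'$ with weights $w'_j$ would be the minimizer of the total cost:

```latex
\begin{aligned}
D(S, S') &= \frac{\sum_{i,j} f_{ij}\, d(m_i, m'_j)}{\sum_{i,j} f_{ij}}, \\
f &= \arg\min_{f} \sum_{i,j} f_{ij}\, d(m_i, m'_j) \\
\text{subject to}\quad & f_{ij} \ge 0, \qquad \sum_{j} f_{ij} \le w_i, \qquad \sum_{i} f_{ij} \le w'_j, \\
& \sum_{i,j} f_{ij} = \min\Big(\sum_{i} w_i,\ \sum_{j} w'_j\Big).
\end{aligned}
```

Under these constraints the two signatures may have different numbers of clusters, which matches the slide's remark that signatures of different size can be compared.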

Database with viewpoint changes 20 samples of 10 different textures

Results: spin images, Gabor-like filters


A two-layer architecture Texture recognition + segmentation Classification of individual regions + spatial layout [A generative architecture for semi-supervised texture recognition, S. Lazebnik, C. Schmid, J. Ponce, ICCV 2003]

A two-layer architecture
Modeling:
1. Distribution of the local descriptors (affine invariants): Gaussian mixture model estimation with EM, which allows incorporating unsegmented images
2. Co-occurrence statistics of sub-class labels over affinely adapted neighborhoods
Segmentation + Recognition:
1. Generative model for initial class probabilities
2. Co-occurrence statistics + relaxation to improve labels
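To make the "Gaussian mixture model estimation with EM" step concrete, here is a minimal sketch of plain EM on one-dimensional toy data. It is an illustration only: the talk's descriptors are high-dimensional and its training also uses unsegmented images, neither of which is modeled here. The function names, the initial means, and the toy data are all assumptions made for the example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, means, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component received no mass; leave it unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, var)  # floor avoids a collapsed variance
    return weights, means, variances

# Hypothetical toy data: two well-separated groups of values.
data = [0.9, 1.0, 1.1, 1.2, 0.8, 4.9, 5.0, 5.1, 5.2, 4.8]
weights, means, variances = fit_gmm_1d(data, means=[0.0, 6.0])
print(weights, means, variances)
```

Each mixture component plays the role of one sub-class of local appearance; the responsibilities computed in the E-step are the soft assignments that a later relaxation stage could refine.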

Texture Dataset – Training Images: T1 (brick), T2 (carpet), T3 (chair), T4 (floor 1), T5 (floor 2), T6 (marble), T7 (wood)

Effect of relaxation + co-occurrence. Original image. Top: before relaxation (individual regions); bottom: after relaxation (co-occurrence)

Recognition + Segmentation Examples

Animal Dataset – Training Images: no manual segmentation, weakly supervised; 10 training images per animal (with background); no purely negative images

Recognition + Segmentation Examples


Object class detection/classification Description of intra-class variations of object parts [Selection of scale inv. regions for object class recognition, G. Dorko and C. Schmid, ICCV’03]

Object class detection/classification. Description of intra-class variations of object parts. Selection of discriminant features (weakly supervised)


Training the model. Training phase 1 –Input: images of the object with background (positive images); no normalization or alignment of the image/object –Extraction of local descriptors: Harris-Laplace, Kadir-Brady, SIFT –Clustering: estimation of Gaussian mixture with EM

Training the model Training phase 2 (selection) –Input : verification set, positive and negative images –Rank each cluster with likelihood (or mutual information) –MAP classifier with the n top clusters

Likelihood vs. mutual information [Figure: clusters selected by likelihood and by mutual information, for 5 and 25 clusters] –likelihood: more discriminant but very specific –mutual information: discriminant but not too specific
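The contrast between the two ranking criteria can be illustrated with a small sketch. The slides do not give the exact formulas, so the definitions below are assumptions: the likelihood score is taken as the ratio P(cluster | positive) / P(cluster | negative), and mutual information is computed between the binary event "descriptor falls in this cluster" and the class label. The counts, cluster names, and the helper `top_clusters` are hypothetical.

```python
import math

def likelihood_ratio(pos_hits, pos_total, neg_hits, neg_total, eps=1e-9):
    """Assumed score: P(cluster | positive) / P(cluster | negative), smoothed by eps."""
    return (pos_hits / pos_total + eps) / (neg_hits / neg_total + eps)

def mutual_information(pos_hits, pos_total, neg_hits, neg_total):
    """Mutual information (bits) between 'descriptor in cluster' and the class label."""
    n = pos_total + neg_total
    # Joint counts indexed by (in_cluster, is_positive).
    joint = {
        (1, 1): pos_hits, (0, 1): pos_total - pos_hits,
        (1, 0): neg_hits, (0, 0): neg_total - neg_hits,
    }
    mi = 0.0
    for (v, c), count in joint.items():
        if count == 0:
            continue  # a zero cell contributes nothing
        p_vc = count / n
        p_v = (joint[(v, 0)] + joint[(v, 1)]) / n
        p_c = (joint[(0, c)] + joint[(1, c)]) / n
        mi += p_vc * math.log2(p_vc / (p_v * p_c))
    return mi

def top_clusters(stats, pos_total, neg_total, score, n):
    """Return the n cluster ids with the highest score."""
    ranked = sorted(
        stats,
        key=lambda cid: score(stats[cid][0], pos_total, stats[cid][1], neg_total),
        reverse=True,
    )
    return ranked[:n]

# Hypothetical verification-set counts: cluster id -> (positive hits, negative hits).
stats = {"a": (5, 0), "b": (40, 10), "c": (20, 20)}
by_likelihood = top_clusters(stats, 100, 100, likelihood_ratio, 1)
by_mutual_info = top_clusters(stats, 100, 100, mutual_information, 1)
print(by_likelihood, by_mutual_info)
```

In this toy example, cluster "a" never fires on negatives but is rare, so the likelihood ratio ranks it first; cluster "b" fires often and mostly on positives, so mutual information prefers it. That is the "very specific" versus "not too specific" trade-off named on the slide.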

Results for test images
Detection: Harris-Laplace, 354 points; 25 Likelihood: 49 correct + 37 incorrect; 10 Mutual Information: 31 correct + 20 incorrect
Detection: Harris-Laplace, 277 points; 25 Likelihood: 43 correct + 36 incorrect; 10 Mutual Information: 26 correct + 20 incorrect

Relaxation – propagation of probabilities

Classification. Assign each test descriptor to the most probable cluster (MAP). Each descriptor assigned to one of the top n clusters is positive. If the number of positive descriptors is above a threshold p, classify the image as positive
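The decision rule on this slide can be sketched in a few lines. This is an illustration under assumptions: per-descriptor cluster posteriors are taken as given (a dict from cluster id to probability), "above a threshold p" is read as strictly greater than p, and the function names and example values are hypothetical.

```python
def map_cluster(posteriors):
    """Return the id of the most probable cluster for one descriptor (MAP)."""
    return max(posteriors, key=posteriors.get)

def classify_image(descriptor_posteriors, selected_clusters, p):
    """Classify an image as positive if more than p descriptors fall in selected clusters."""
    selected = set(selected_clusters)
    assignments = [map_cluster(post) for post in descriptor_posteriors]
    n_positive = sum(1 for cid in assignments if cid in selected)
    return n_positive > p

# Hypothetical posteriors for six test descriptors over three clusters.
descriptor_posteriors = [
    {"a": 0.1, "b": 0.7, "c": 0.2},
    {"a": 0.2, "b": 0.2, "c": 0.6},
    {"a": 0.3, "b": 0.6, "c": 0.1},
    {"a": 0.8, "b": 0.1, "c": 0.1},
    {"a": 0.1, "b": 0.3, "c": 0.6},
    {"a": 0.2, "b": 0.5, "c": 0.3},
]
selected_clusters = ["a", "b"]  # the top n clusters kept after selection

is_positive_p3 = classify_image(descriptor_posteriors, selected_clusters, p=3)
is_positive_p4 = classify_image(descriptor_posteriors, selected_clusters, p=4)
print(is_positive_p3, is_positive_p4)
```

Four of the six descriptors land in a selected cluster, so the image is positive for p=3 and negative for p=4, which shows how p trades detections against false alarms.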

Classification experiments. [Table: columns Airplanes, Motorbikes, Wild Cats; rows Training Phase 1 (# positive images), Training Phase 2 (# positive images, # negative images: 450), Testing (# positive images, # negative images: 450); sets: Training, Verification, Test; Image Library]

Results: Motorbikes Equal-Error-Rates as a function of p. Receiver-Operating-Characteristic p=6

Classification results: ROC equal error rates. [Table: columns Best p (%), Estimated p (%), p=6 (%), Fergus (%); rows Airplanes, Motorbikes, Wild Cats, each with Harris and Kadir detectors]


Affine-invariant part models. Matching collections of local affine-invariant regions that map with an affine transformation => part. Matching works for unsegmented images. Model = a collection of parts

Matching: Faces spurious match

Matching: 3D Objects closeup

Matching: Finding Repeated Patterns

Matching: Finding Symmetry

Modeling for Recognition Match multiple pairs of training images to produce several candidate parts. Use additional validation images to evaluate repeatability of parts and individual patches. Retain a fixed number of parts having the best repeatability score as class model. No background model

The Butterfly Dataset: 16 training images (8 pairs) per class; 10 validation images per class; 437 test images; 619 images total

Butterfly Models Top two rows: pairs of images used for modeling. Bottom two rows: closeup views of some of the parts making up the models of the seven butterfly classes.

Recognition Top 10 models per class used for recognition Multi-class classification results: total model size (smallest/largest)

Classification Rate vs. Number of Parts [Plot; x-axis: number of parts]

Successful Detection Examples. Panels: model part (yellow: detected in test image; blue: occluded in test image); test image, all ellipses; test image, matched ellipses. Note: only one of the two training images is shown

Successful Detection Examples (cont.)

Detection of Multiple Instances

Detection Failures

Future Work. Spatial relation –non-rigid models –relations between clusters and affine-invariant parts. Feature selection: dimensionality reduction. Shape information: appropriate descriptors. Rapid search: structuring of the data