Agenda Introduction Bag-of-words models Visual words with spatial location Part-based models Discriminative methods Segmentation and recognition Recognition-based.

Slides:



Advertisements
Similar presentations
The Layout Consistent Random Field for detecting and segmenting occluded objects CVPR, June 2006 John Winn Jamie Shotton.
Advertisements

A Bayesian Approach to Recognition Moshe Blank Ita Lifshitz Reverend Thomas Bayes
Agenda Introduction Bag-of-words models Visual words with spatial location Part-based models Discriminative methods Segmentation and recognition Recognition-based.
Part 4: Combined segmentation and recognition by Rob Fergus (MIT)
References 2. Parts and Structure. [Agarwal02] S. Agarwal and D. Roth. Learning a sparse representation for object detection. In Proceedings of the 7th.
Fitting: The Hough transform. Voting schemes Let each feature vote for all the models that are compatible with it Hopefully the noise features will not.
Object class recognition using unsupervised scale-invariant learning Rob Fergus Pietro Perona Andrew Zisserman Oxford University California Institute of.
Part 1: Bag-of-words models by Li Fei-Fei (Princeton)
1 Part 1: Classical Image Classification Methods Kai Yu Dept. of Media Analytics NEC Laboratories America Andrew Ng Computer Science Dept. Stanford University.
Recognition by Probabilistic Hypothesis Construction P. Moreels, M. Maire, P. Perona California Institute of Technology.
LOCUS (Learning Object Classes with Unsupervised Segmentation) A variational approach to learning model- based segmentation. John Winn Microsoft Research.
Contour Based Approaches for Visual Object Recognition Jamie Shotton University of Cambridge Joint work with Roberto Cipolla, Andrew Blake.
Pedestrian Detection in Crowded Scenes Dhruv Batra ECE CMU.
In Search of Objects: 50 years of wondering : Learning-Based Methods in Vision A. Efros, CMU, Spring 2009.
Bag-of-features models Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
Model: Parts and Structure. History of Idea Fischler & Elschlager 1973 Yuille ‘91 Brunelli & Poggio ‘93 Lades, v.d. Malsburg et al. ‘93 Cootes, Lanitis,
Beyond bags of features: Part-based models Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
Fitting: The Hough transform
Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
Part 2: part-based models by Rob Fergus (MIT). Problem with bag-of-words All have equal probability for bag-of-words methods Location information is important.
1 Image Recognition - I. Global appearance patterns Slides by K. Grauman, B. Leibe.
Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
Object Recognition: History and Overview Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce.
Object Recognition by Parts Object recognition started with line segments. - Roberts recognized objects from line segments and junctions. - This led to.
Transferring information using Bayesian priors on object categories Li Fei-Fei 1, Rob Fergus 2, Pietro Perona 1 1 California Institute of Technology, 2.
Object recognition Jana Kosecka Slides from D. Lowe, D. Forsythe and J. Ponce book, ICCV 2005 Tutorial Fei-Fei Li, Rob Fergus and A. Torralba.
CS294‐43: Visual Object and Activity Recognition Prof. Trevor Darrell Spring 2009 March 17 th, 2009.
Distinctive Image Feature from Scale-Invariant KeyPoints
Unsupervised discovery of visual object class hierarchies Josef Sivic (INRIA / ENS), Bryan Russell (MIT), Andrew Zisserman (Oxford), Alyosha Efros (CMU)
Object class recognition using unsupervised scale-invariant learning Rob Fergus Pietro Perona Andrew Zisserman Oxford University California Institute of.
Object Recognition: Conceptual Issues Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and K. Grauman.
Visual Object Recognition Rob Fergus Courant Institute, New York University
Object Class Recognition by Unsupervised Scale-Invariant Learning R. Fergus, P. Perona, and A. Zisserman Presented By Jeff.
Fitting: The Hough transform
Multiple Object Class Detection with a Generative Model K. Mikolajczyk, B. Leibe and B. Schiele Carolina Galleguillos.
1 Outline Overview Integrating Vision Models CCM: Cascaded Classification Models Learning Spatial Context TAS: Things and Stuff Descriptive Querying of.
Agenda Introduction Bag-of-words models Visual words with spatial location Part-based models Discriminative methods Segmentation and recognition Recognition-based.
Object Recognition by Parts Object recognition started with line segments. - Roberts recognized objects from line segments and junctions. - This led to.
The Beauty of Local Invariant Features
Unsupervised Learning of Categories from Sets of Partially Matching Image Features Kristen Grauman and Trevor Darrel CVPR 2006 Presented By Sovan Biswas.
Part 2: part-based models
Fitting: The Hough transform. Voting schemes Let each feature vote for all the models that are compatible with it Hopefully the noise features will not.
Bag-of-features models. Origin 1: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures,
A Statistically Selected Part-Based Probabilistic Model for Object Recognition Zhipeng Zhao, Ahmed Elgammal Department of Computer Science, Rutgers, The.
Category Discovery from the Web slide credit Fei-Fei et. al.
Classical Methods for Object Recognition Rob Fergus (NYU)
MSRI workshop, January 2005 Object Recognition Collected databases of objects on uniform background (no occlusions, no clutter) Mostly focus on viewpoint.
Fitting: The Hough transform
Topic Models Presented by Iulian Pruteanu Friday, July 28 th, 2006.
Grouplet: A Structured Image Representation for Recognizing Human and Object Interactions Bangpeng Yao and Li Fei-Fei Computer Science Department, Stanford.
O BJ C UT M. Pawan Kumar Philip Torr Andrew Zisserman UNIVERSITY OF OXFORD.
Discussion of Pictorial Structures Pedro Felzenszwalb Daniel Huttenlocher Sicily Workshop September, 2006.
CS 1699: Intro to Computer Vision Detection II: Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 12, 2015.
Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11.
Lecture 08 27/12/2011 Shai Avidan הבהרה: החומר המחייב הוא החומר הנלמד בכיתה ולא זה המופיע / לא מופיע במצגת.
Li Fei-Fei, UIUC Rob Fergus, MIT Antonio Torralba, MIT Recognizing and Learning Object Categories ICCV 2005 Beijing, Short Course, Oct 15.
Presented by David Lee 3/20/2006
Finding Clusters within a Class to Improve Classification Accuracy Literature Survey Yong Jae Lee 3/6/08.
Part 2: part-based models by Rob Fergus (MIT). Problem with bag-of-words All have equal probability for bag-of-words methods Location information is important.
Object Recognition by Parts
Object Recognition by Parts
Object detection as supervised classification
Finding Clusters within a Class to Improve Classification Accuracy
Object Recognition by Parts
Object Recognition by Parts
Brief Review of Recognition + Context
Unsupervised Learning of Models for Recognition
Unsupervised learning of models for recognition
Object Recognition by Parts
Object Recognition with Interest Operators
Presentation transcript:

Agenda Introduction Bag-of-words models Visual words with spatial location Part-based models Discriminative methods Segmentation and recognition Recognition-based image retrieval Datasets & Conclusions

Model: Parts and Structure

Representation Object as set of parts –Generative representation Model: –Relative locations between parts –Appearance of part Issues: –How to model location –How to represent appearance –Sparse or dense (pixels or regions) –How to handle occlusion/clutter Figure from [Fischler & Elschlager 73]

The correspondence problem Model with P parts Image with N possible assignments for each part Consider mapping to be 1-1 N P combinations!!! Model Image

1 – 1 mapping –Each part assigned to unique feature As opposed to: 1 – Many Bag of words approaches –Sudderth, Torralba, Freeman ’05 –Loeff, Sorokin, Arora and Forsyth ‘05 The correspondence problem Conditional Random Field - Quattoni, Collins and Darrell, 04

History of Parts and Structure approaches Fischler & Elschlager 1973 Yuille ‘91 Brunelli & Poggio ‘93 Lades, v.d. Malsburg et al. ‘93 Cootes, Lanitis, Taylor et al. ‘95 Amit & Geman ‘95, ‘99 Perona et al. ‘95, ‘96, ’98, ’00, ’03, ‘04, ‘05 Felzenszwalb & Huttenlocher ’00, ’04, ‘08 Crandall & Huttenlocher ’05, ’06 Leibe & Schiele ’03, ’04 Many papers since 2000

Sparse representation + Computationally tractable (10 5 pixels  parts) + Generative representation of class + Avoid modeling global variability + Success in specific object recognition - Throw away most image information - Parts need to be distinctive to separate from other classes

Connectivity of parts Complexity is given by size of maximal clique in graph Consider a 3 part model –Each part has set of N possible locations in image –Location of parts 2 & 3 is independent, given location of L –Each part has an appearance term, independent between parts. L 32 Shape Model S(L,2)S(L,3)A(L)A(2)A(3) L32 Variables Factors ShapeAppearance Factor graph S(L)

from Sparse Flexible Models of Local Features Gustavo Carneiro and David Lowe, ECCV 2006 Different connectivity structures O(N 6 )O(N 2 )O(N 3 ) O(N 2 ) Fergus et al. ’03 Fei-Fei et al. ‘03 Crandall et al. ‘05 Fergus et al. ’05 Crandall et al. ‘05 Felzenszwalb & Huttenlocher ‘00 Bouchard & Triggs ‘05Carneiro & Lowe ‘06 Csurka ’04 Vasconcelos ‘00

How much does shape help? Crandall, Felzenszwalb, Huttenlocher CVPR’05 Shape variance increases with increasing model complexity Do get some benefit from shape

Some class-specific graphs Articulated motion –People –Animals Special parameterisations –Limb angles Images from [Kumar, Torr and Zisserman 05, Felzenszwalb & Huttenlocher 05]

Hierarchical representations Pixels  Pixel groupings  Parts  Object Images from [Amit98,Bouchard05,Felzenszwalb’08] Multi-scale approach increases number of low- level features Amit and Geman ‘98 Bouchard & Triggs ’05 Felzenszwalb, McAllester & Ramanan ‘08

Stochastic Grammar of Images S.C. Zhu et al. and D. Mumford

animal head instantiated by tiger head animal head instantiated by bear head e.g. discontinuities, gradient e.g. linelets, curvelets, T- junctions e.g. contours, intermediate objects e.g. animals, trees, rocks Context and Hierarchy in a Probabilistic Image Model Jin & Geman (2006)

How to model location? Explicit: Probability density functions Implicit: Voting scheme Invariance –Translation –Scaling –Similarity/affine –Viewpoint Similarity transformation Translation and Scaling Translation Affine transformation

Cartesian –E.g. Gaussian distribution –Parameters of model,  and  –Independence corresponds to zeros in  –Burl et al. ’96, Weber et al. ‘00, Fergus et al. ’03 Polar –Convenient for invariance to rotation Explicit shape model Mikolajczyk et al., CVPR ‘06

Implicit shape model Spatial occurrence distributions x y s x y s x y s x y s Probabilistic Voting Interest Points Matched Codebook Entries Recognition Learning Learn appearance codebook –Cluster over interest points on training images Learn spatial distributions –Match codebook to training images –Record matching positions on object –Centroid is given Use Hough space voting to find object Leibe and Schiele ’03,’05

Multiple view points Thomas, Ferrari, Leibe, Tuytelaars, Schiele, and L. Van Gool. Towards Multi-View Object Class Detection, CVPR 06 Hoiem, Rother, Winn, 3D LayoutCRF for Multi-View Object Class Recognition and Segmentation, CVPR ‘07

Appearance representation Decision trees Figure from Winn & Shotton, CVPR ‘06 SIFT HoG detectors [Lepetit and Fua CVPR 2005]

Region operators –Local maxima of interest operator function –Can give scale/orientation invariance Figures from [Kadir, Zisserman and Brady 04]

Occlusion Explicit –Additional match of each part to missing state Implicit –Truncated minimum probability of appearance Log probability Appearance space µ part

Background clutter Explicit model –Generative model for clutter as well as foreground object Use a sub-window –At correct position, no clutter is present

Efficient search methods Interpretation tree (Grimson ’87) –Condition on assigned parts to give search regions for remaining ones –Branch & bound, A*

Distance transforms –O(N 2 P)  O(NP) for tree structured models –Potentials must take certain form, e.g. Gaussian Permits exhaustive search for each parts location –No need for feature detectors in recognition Felzenszwalb and Huttenlocher ’00 & ’05

Demo Web Page

Parts and Structure models Summary Correspondence problem Efficient methods for large # parts and # positions in image Challenge to get representation with desired invariance Future directions: Multiple views Approaches to learning Multiple category training

References 2. Parts and Structure

[Agarwal02] S. Agarwal and D. Roth. Learning a sparse representation for object detection. In Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, pages , [Agarwal_Dataset] Agarwal, S. and Awan, A. and Roth, D. UIUC Car dataset. ~cogcomp/Data/Car, [Amit98] Y. Amit and D. Geman. A computational model for visual selection. Neural Computation, 11(7): , [Amit97] Y. Amit, D. Geman, and K. Wilder. Joint induction of shape features and tree classi- ers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(11): , [Amores05] J. Amores, N. Sebe, and P. Radeva. Fast spatial pattern discovery integrating boosting with constellations of contextual discriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, volume 2, pages , [Bar-Hillel05] A. Bar-Hillel, T. Hertz, and D. Weinshall. Object class recognition by boosting a part based model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, volume 1, pages , [Barnard03] K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, and M. Jordan. Matching words and pictures. JMLR, 3: , February [Berg05] A. Berg, T. Berg, and J. Malik. Shape matching and object recognition using low distortion correspondence. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, volume 1, pages 26-33, June [Biederman87] I. Biederman. Recognition-by-components: A theory of human image understanding. Psychological Review, 94: , [Biederman95] I. Biederman. An Invitation to Cognitive Science, Vol. 2: Visual Cognition, volume 2, chapter Visual Object Recognition, pages MIT Press, 1995.

[Blei03] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3: , January [Borenstein02] E. Borenstein. and S. Ullman. Class-specic, top-down segmentation. In Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, pages , [Burl96] M. Burl and P. Perona. Recognition of planar object classes. In Proc. Computer Vision and Pattern Recognition, pages , [Burl96a] M. Burl, M. Weber, and P. Perona. A probabilistic approach to object recognition using local photometry and global geometry. In Proc. European Conference on Computer Vision, pages , [Burl98] M. Burl, M. Weber, and P. Perona. A probabilistic approach to object recognition using local photometry and global geometry. In Proceedings of the European Conference on Computer Vision, pages , [Burl95] M.C. Burl, T.K. Leung, and P. Perona. Face localization via shape statistics. In Int. Workshop on Automatic Face and Gesture Recognition, [Canny86] J. F. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6): , [Crandall05] D. Crandall, P. Felzenszwalb, and D. Huttenlocher. Spatial priors for part-based recognition using statistical models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, volume 1, pages 10-17, [Csurka04] G. Csurka, C. Bray, C. Dance, and L. Fan. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, pages 1-22, [Dalal05] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, pages , [Dempster76] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. JRSS B, 39:1-38, [Dorko04] G. Dorko and C. Schmid. Object class recognition using discriminative local features. IEEE Transactions on Pattern Analysis and Machine Intelligence, Review(Submitted), 2004.

[FeiFei03] L. Fei-Fei, R. Fergus, and P. Perona. A Bayesian approach to unsupervised one-shot learning of object categories. In Proceedings of the 9th International Conference on Computer Vision, Nice, France, pages , October [FeiFei04] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental bayesian approach tested on 101 object categories. In Workshop on Generative-Model Based Vision, [FeiFei05] L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, volume 2, pages , June [Felzenszwalb00] P. Felzenszwalb and D. Huttenlocher. Pictorial structures for object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages , [Felzenszwalb05] P. Felzenszwalb and D. Huttenlocher. Pictorial structures for object recognition. International Journal of Computer Vision, 61:55-79, January [Fergus_Datasets] R. Fergus and P. Perona. Caltech Object Category datasets. caltech.edu/html-files/archive.html, [Fergus03] R. Fergus, P. Perona, and P. Zisserman. Object class recognition by unsupervised scaleinvariant learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages , [Fergus04] R. Fergus, P. Perona, and A. Zisserman. A visual category lter for google images. In Proceedings of the 8th European Conference on Computer Vision, Prague, Czech Republic, pages Springer-Verlag, May [Fergus05 R. Fergus, P. Perona, and A. Zisserman. A sparse object category model for ecient learning and exhaustive recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, volume 1, pages , [Fergus_Technote] R. Fergus, M. Weber, and P. Perona. Ecient methods for object recognition using the constellation model. Technical report, California Institute of Technology, [Fischler73] M.A. Fischler and R.A. Elschlager. The representation and matching of pictorial structures. IEEE Transactions on Computer, c-22(1):67-92, Jan

[Grimson87] W. E. L. Grimson and T. Lozano-Perez. Localizing overlapping parts by searching the interpretation tree. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(4): , [Harris98] C. J. Harris and M. Stephens. A combined corner and edge detector. In Proceedings of the 4th Alvey Vision Conference, Manchester, pages , [Hart68] P.E. Hart, N.J. Nilsson, and B. Raphael. A formal basis for the determination of minimum cost paths. IEEE Transactions on SSC, 4: , [Helmer04] S. Helmer and D. Lowe. Object recognition with many local features. In Workshop on Generative Model Based Vision 2004 (GMBV), Washington, D.C., July [Hofmann99] T. Hofmann. Probabilistic latent semantic indexing. In SIGIR '99: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 15-19, 1999, Berkeley, CA, USA, pages ACM, [Holub05] A. Holub and P. Perona. A discriminative framework for modeling object classes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, volume 1, pages , [Kadir01] T. Kadir and M. Brady. Scale, saliency and image description. International Journal of Computer Vision, 45(2):83-105, [Kadir_Code] T. Kadir and M. Brady. Scale Scaliency Operator. ~timork/salscale.html, [Kumar05] M. P. Kumar, P. H. S. Torr, and A. Zisserman. Obj cut. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, San Diego, pages 18-25, [Leibe04] B. Leibe, A. Leonardis, and B. Schiele. Combined object categorization and segmentation with an implicit shape model. In Workshop on Statistical Learning in Computer Vision, ECCV, [Leung98] T. Leung and J. Malik. Contour continuity and region based image segmentation. In Proceedings of the 5th European Conference on Computer Vision, Freiburg, Germany, LNCS 1406, pages Springer-Verlag, [Leung95] T.K. Leung, M.C. Burl, and P. Perona. Finding faces in cluttered scenes using random labeled graph matching. Proceedings of the 5th International Conference on Computer Vision, Boston, pages , June 1995.

[Leung98] T.K. Leung, M.C. Burl, and P. Perona. Probabilistic ane invariants for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages , [Lindeberg98] T. Lindeberg. Feature detection with automatic scale selection. International Journal of Computer Vision, 30(2):77-116, [Lowe99] D. Lowe. Object recognition from local scale-invariant features. In Proceedings of the 7th International Conference on Computer Vision, Kerkyra, Greece, pages , September [Lowe01] D. Lowe. Local feature view clustering for 3D object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, pages Springer, December [Lowe04] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, [Mardia89] K.V. Mardia and I.L. Dryden. \Shape Distributions for Landmark Data". Advances in Applied Probability, 21: , [Sivic05] J. Sivic, B. Russell, A. Efros, A. Zisserman, and W. Freeman. Discovering object categories in image collections. Technical Report A. I. Memo , Massachusetts Institute of Technology, [Sivic03] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proceedings of the International Conference on Computer Vision, pages , October [Sudderth05] E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. Learning hierarchical models of scenes, objects, and parts. In Proceedings of the IEEE International Conference on Computer Vision, Beijing, page To appear, [Torralba04] A. Torralba, K. P. Murphy, and W. T. Freeman. Sharing features: ecient boosting procedures for multiclass object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, pages , 2004.

[Viola01] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 511{518, [Weber00] M.Weber. Unsupervised Learning of Models for Object Recognition. PhD thesis, California Institute of Technology, Pasadena, CA, [Weber00a] M. Weber, W. Einhauser, M. Welling, and P. Perona. Viewpoint-invariant learning and detection of human heads. In Proc. 4th IEEE Int. Conf. Autom. Face and Gesture Recog., FG2000, pages 20{27, March [Weber00b] M. Weber, M. Welling, and P. Perona. Towards automatic discovery of object categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2101{2108, June [Weber00c] M. Weber, M. Welling, and P. Perona. Unsupervised learning of models for recognition. In Proc. 6th Europ. Conf. Comp. Vis., ECCV2000, volume 1, pages 18{32, June [Welling05] M. Welling. An expectation maximization algorithm for inferring oset-normal shape distributions. In Tenth International Workshop on Articial Intelligence and Statistics, [Winn05] J. Winn and N. Joijic. Locus: Learning object classes with unsupervised segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Beijing, page To appear, 2005