1
A Project on Generic Object Recognition, by Yatharth Saraf
2
Problem Definition and Background
Recognizing the generic class or category of a given object, as opposed to recognizing specific, individual objects
  Humans are much better at generic recognition; machines are more competitive at specific object recognition
Early work by Marr led to the ‘reconstruction school’
  Advocates 3-D reconstruction and modeling before further reasoning about a scene
Current work in object categorization tends to fall in the ‘recognition school’
  Works in the 2-D domain, with 2-D image features and descriptors
  e.g. bag-of-features approaches, spatial 2-D geometry approaches as in the ‘constellation model’
3
Applications
Image database annotation and retrieval
Video surveillance
Driver assistance, autonomous robots
Cognitive support for disabled people
4
Related Work
Discriminative approaches
  SVM, subspace methods
Bag of features
  Representation of objects with point descriptors
Constellation model
  Representations that take into account spatial geometry (2-D) of key points
5
Assumptions
Images are scale-normalized
Images are clean, i.e. no background clutter/occlusion
  (-) Implies segmentation is necessary as a pre-processing step
  (+) Avoids the problem of exponential search
6
Outline of the Method (Training)
Detect salient regions in all training images using the Kadir-Brady feature detector
Extract X,Y coordinates, scale and 11x11 intensity patches around detected features
Reduce dimensionality of appearance patches from 121 to 16 using PCA
Estimate model parameters
  A single full Gaussian for location; one Gaussian per part
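The PCA and Gaussian-fitting steps of the training outline could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (fit_pca, project, fit_gaussian), the small ridge term added to the covariance, and the synthetic random patches are all assumptions made for the example. Feature detection is not shown.

```python
import numpy as np

def fit_pca(patches, n_components=16):
    """Fit PCA on flattened patches; return the mean and a (121 x k) basis."""
    X = patches.reshape(len(patches), -1).astype(float)
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T

def project(patches, mean, basis):
    """Map 11x11 patches (121 values each) to low-dimensional coefficients."""
    X = patches.reshape(len(patches), -1).astype(float)
    return (X - mean) @ basis

def fit_gaussian(samples, ridge=1e-6):
    """Estimate mean and full covariance; the ridge (an assumption) keeps it invertible."""
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False) + ridge * np.eye(samples.shape[1])
    return mu, cov

# Synthetic stand-in for patches extracted around detected features.
rng = np.random.default_rng(0)
patches = rng.integers(0, 256, size=(200, 11, 11))

mean, basis = fit_pca(patches)          # 121 -> 16 dimensions
coeffs = project(patches, mean, basis)  # shape (200, 16)
mu, cov = fit_gaussian(coeffs)          # Gaussian over appearance coefficients
```

In a part-based model, fit_gaussian would be called once per part on that part's coefficients, and once on the stacked part locations for the location model.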
7
Outline of the Method (Testing)
Extract features of test images in the same manner as in the training phase
Use the learnt model to estimate probability of detection
Use Bayes’ Decision Rule to classify
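As a rough illustration of the testing steps, the sketch below scores a feature vector under Gaussian class models and picks the class with the largest log-posterior. The two-class setup, the "background" model, the equal priors, and the 2-D toy vectors are assumptions for the example; the slides do not say which alternative hypothesis or priors were used.

```python
import numpy as np

def gaussian_logpdf(x, mu, cov):
    """Log-density of a multivariate Gaussian N(mu, cov) evaluated at x."""
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)  # squared Mahalanobis distance
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + maha)

def classify(x, models, log_priors):
    """Bayes decision rule: choose the class maximizing log-likelihood + log-prior."""
    scores = {c: gaussian_logpdf(x, mu, cov) + log_priors[c]
              for c, (mu, cov) in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical class models (mean, covariance); values are illustrative only.
models = {
    "motorbike": (np.array([0.0, 0.0]), np.eye(2)),
    "background": (np.array([5.0, 5.0]), 4.0 * np.eye(2)),
}
log_priors = {"motorbike": np.log(0.5), "background": np.log(0.5)}

label, scores = classify(np.array([0.5, -0.2]), models, log_priors)
```

Working in log-probabilities avoids numerical underflow when many part densities are multiplied together.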
8
Experiments
Careful tweaking of detector parameters needed
  A single set of parameter settings may not be suitable for all categories
9
[Figure panels: Starting scale: 23 | Starting scale: 3]
10
Experiments (contd.)
47 clean motorbike images used for training the motorbike model
Sorting the extracted patches by X-coordinate helped (as opposed to sorting by saliency)
Appearance model not doing as well
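The two feature orderings compared above can be illustrated with a small sketch. One plausible reading, which the slides do not confirm, is that a joint Gaussian over part locations needs each slot of the stacked location vector to correspond to the same part across images, and a left-to-right ordering is more stable for that than a saliency ranking. The Feature type and both helper functions are hypothetical.

```python
from typing import List, NamedTuple

class Feature(NamedTuple):
    x: float
    y: float
    scale: float
    saliency: float

def order_features(features: List[Feature], by: str = "x") -> List[Feature]:
    """Order features left-to-right by X, or by decreasing saliency."""
    if by == "x":
        return sorted(features, key=lambda f: f.x)
    if by == "saliency":
        return sorted(features, key=lambda f: f.saliency, reverse=True)
    raise ValueError(f"unknown ordering: {by!r}")

def location_vector(features: List[Feature]) -> List[float]:
    """Stack (x, y) of the ordered features into one vector for the location model."""
    out: List[float] = []
    for f in features:
        out.extend([f.x, f.y])
    return out

# Illustrative features only; not taken from the experiments.
feats = [
    Feature(40.0, 12.0, 5.0, 0.9),
    Feature(10.0, 30.0, 7.0, 0.4),
    Feature(25.0, 18.0, 3.0, 0.7),
]
by_x = order_features(feats, by="x")
by_saliency = order_features(feats, by="saliency")
```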
11
9 test images used (1-4 motorbikes, 5-7 cars, 8-9 faces)
12
[Figure: log-probabilities of the 9 test images from the location model. Panels: features sorted by saliency; features sorted by X-coordinate. Labeled points: Image 5, Image 9]
13
[Figures: appearance log-probabilities of the 9 test images; total log-probabilities of the 9 test images. Panel labels: features sorted by saliency; features sorted by X-coordinate]
14
Experiments (contd.)
Using a Mixture of Gaussians for the appearances of parts didn’t make much difference
  3 mixture components per part (EM initialized with k-means and sample covariances)
15
Experiments (contd.)
Levenshtein distances on the appearance patches worked quite nicely
  Each appearance patch is treated as a single character
  Matching cost was computed using a straight SSD (sum of squared differences)
  Cost of inserting a gap = matching cost of the patch with a canonical 11x11 patch having uniform intensity of 128
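The patch-level edit distance described above might look something like the following sketch. It uses the standard Levenshtein dynamic program with the substitution cost replaced by SSD and the gap cost defined against a uniform patch of intensity 128, as the slide states. Patches are represented here as flat lists of 121 intensities; the function names and the toy patches are assumptions for illustration.

```python
def ssd(a, b):
    """Sum of squared differences between two flattened patches."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def patch_edit_distance(seq_a, seq_b, gap_patch):
    """Weighted Levenshtein distance between two sequences of patches."""
    n, m = len(seq_a), len(seq_b)
    # Cost of aligning each patch against a gap (the canonical uniform patch).
    gap_a = [ssd(p, gap_patch) for p in seq_a]
    gap_b = [ssd(p, gap_patch) for p in seq_b]

    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap_a[i - 1]
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap_b[j - 1]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + ssd(seq_a[i - 1], seq_b[j - 1]),  # match
                dp[i - 1][j] + gap_a[i - 1],                         # gap in seq_b
                dp[i][j - 1] + gap_b[j - 1],                         # gap in seq_a
            )
    return dp[n][m]

GAP = [128] * 121     # canonical 11x11 patch with uniform intensity 128
dark = [0] * 121      # toy patches for illustration
bright = [255] * 121

same = patch_edit_distance([dark, bright], [dark, bright], GAP)
shorter = patch_edit_distance([dark, bright], [bright], GAP)
```

With this cost scheme, a patch close to mid-gray is cheap to skip, while a high-contrast patch is expensive to leave unmatched.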
16
Conclusions and Future Work
Strong dependence on feature detector
Appearance model doesn’t seem to be working too well
Levenshtein distances could be more promising
Experiments with more clean training and test data, multiple categories
Exponential search for dealing with clutter and occlusion
17
Questions? -- Thank You