1
Computer Vision, Part 2
Object recognition and scene “understanding”
2
What makes object recognition a hard task for computers?
3
HMAX
Riesenhuber, M. & Poggio, T. (1999), “Hierarchical Models of Object Recognition in Cortex”
Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., and Poggio, T. (2006), “Robust Object Recognition with Cortex-Like Mechanisms”
HMAX: A hierarchical neural-network model of object recognition.
Meant to model human vision at the level of the “immediate recognition” capabilities of the ventral visual pathway, independent of attention or other top-down processes.
Also called the “Standard Model” (because it incorporates the “standard model” of visual cortex).
Inspired by the earlier “Neocognitron” model of Fukushima (1980).
4
General ideas behind model
“Immediate” visual processing is feedforward and hierarchical: low levels detect simple features, which are combined hierarchically into increasingly complex features to be detected.
Layers of the hierarchy alternate between “sensitivity” (to detecting features) and “invariance” (to position, scale, orientation).
Size of receptive fields increases along the hierarchy.
Degree of invariance increases along the hierarchy.
12
The HMAX model for object recognition
(Riesenhuber, Poggio, Serre, et al.)
The job of HMAX is to produce a higher-level representation of an image that will be useful for classification.
Layers alternate between “specificity” and “invariance” over position, scale, orientation.
Layers, from top to bottom:
Classification layer: Object or image classification
C2 layer: Max activation over each prototype
S2 layer: Prototypes (small image patches)
C1 layer: Max over local S1 units
S1 layer: Edge detectors
Image (gray-scale)
13
S1 layer: Edge detectors (4 orientations, 16 scales)
Image (gray-scale)
14
One S1 receptive field [figure], and so on through all 16 scales
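The S1 bank of edge detectors can be sketched as a set of oriented filters. The sketch below assumes Gabor filters, which is the choice described in the Serre et al. paper cited earlier; the slide itself only says "edge detectors". The filter sizes (7 to 37 pixels in steps of 2), the `sigma_frac`, `wavelength_frac`, and `gamma` values, and all function names are illustrative assumptions, not values taken from these slides.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(size, theta, sigma_frac=0.4, wavelength_frac=0.5, gamma=0.3):
    """Zero-mean, unit-norm Gabor kernel of odd width `size` at angle `theta`.

    sigma_frac, wavelength_frac and gamma are illustrative assumptions.
    """
    sigma = sigma_frac * size
    wavelength = wavelength_frac * size
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate the coordinate frame so the carrier wave runs along x_r.
    x_r = xs * np.cos(theta) + ys * np.sin(theta)
    y_r = -xs * np.sin(theta) + ys * np.cos(theta)
    kernel = np.exp(-(x_r**2 + (gamma * y_r)**2) / (2 * sigma**2))
    kernel = kernel * np.cos(2 * np.pi * x_r / wavelength)
    kernel -= kernel.mean()                 # no response to uniform regions
    return kernel / np.linalg.norm(kernel)  # comparable responses across sizes

ORIENTATIONS = [0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]  # 4 orientations
SIZES = list(range(7, 39, 2))                              # 16 scales (assumed sizes)

FILTER_BANK = {
    (size, k): gabor_kernel(size, theta)
    for size in SIZES
    for k, theta in enumerate(ORIENTATIONS)
}

def s1_response(image, kernel):
    """Absolute filter response at every position where the kernel fits."""
    windows = sliding_window_view(image, kernel.shape)
    return np.abs(np.einsum("ijkl,kl->ij", windows, kernel))
```

Applying `s1_response` with each of the 64 kernels to a gray-scale image array gives one S1 map per (scale, orientation) pair.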
15
C1 layer: Max activation over local S1 units (local position, scale); 4 orientations, 8 scales
S1 layer: Edge detectors; 4 orientations, 16 scales
Image (gray-scale)
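The step from 16 S1 scales to 8 C1 scales can be read as a max over pairs of adjacent scales combined with a max over a local spatial neighborhood. The following is a minimal sketch under that reading. It assumes the two S1 maps being merged have the same shape (for example, computed with padding), and the `pool` and `stride` parameters and the function name are hypothetical.

```python
import numpy as np

def c1_pool(s1_a, s1_b, pool, stride):
    """Max over two adjacent-scale S1 maps, then over local pool x pool windows."""
    combined = np.maximum(s1_a, s1_b)  # invariance to small scale changes
    h, w = combined.shape
    rows = range(0, h - pool + 1, stride)
    cols = range(0, w - pool + 1, stride)
    # Invariance to small shifts: keep only the strongest unit in each window.
    return np.array([[combined[r:r + pool, c:c + pool].max() for c in cols]
                     for r in rows])
```

Running this once per orientation on each of the 8 pairs of adjacent S1 scales yields the 4 orientations by 8 scales listed for the C1 layer.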
18
S2 layer
Prototypes (~1000, chosen from image collection, translated to C1 features)
S2 layer: Calculate similarity to prototype (radial basis function); 4 orientations, 8 scales
Similarity: Radial basis function [equation image]
C1 layer: Max activation over local S1 units (local position, scale); 4 orientations, 8 scales
S2 unit: Calculate similarity to prototype for each “pooled” position in C1 layer.
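The similarity equation did not survive in this transcript. A common form of radial basis function, and the one assumed in this sketch, is the Gaussian r = exp(-beta * ||X - P||^2), where X is a patch of C1 values covering all four orientations and P is a stored prototype of the same shape. The value of `beta`, the array layout (rows, columns, orientations), and the function name are assumptions for illustration.

```python
import numpy as np

def s2_map(c1, prototype, beta=1.0):
    """Gaussian RBF similarity between a prototype and every C1 patch.

    c1:        array of shape (rows, cols, orientations) for one C1 scale
    prototype: array of shape (n, n, orientations)
    """
    n = prototype.shape[0]
    h, w, _ = c1.shape
    out = np.empty((h - n + 1, w - n + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            diff = c1[r:r + n, c:c + n, :] - prototype
            # 1.0 for a perfect match, falling toward 0 as the patch differs.
            out[r, c] = np.exp(-beta * np.sum(diff ** 2))
    return out
```

A prototype cut from the C1 features of some image scores exactly 1 at the position it was taken from and less than 1 wherever the local C1 pattern differs.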
19
C2 layer: Max activation over position, orientation, scale (1 value per prototype)
S2 layer (S21, S22, …): Calculate similarity to prototype (radial basis function); 4 orientations, 8 scales
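The C2 step reduces all S2 maps belonging to one prototype to a single number by taking the global maximum. A minimal sketch, assuming the S2 responses are stored as one list of maps per prototype (a hypothetical layout chosen for illustration):

```python
import numpy as np

def c2_features(s2_by_prototype):
    """One value per prototype: the max over every position in every S2 map."""
    return np.array([max(m.max() for m in maps) for maps in s2_by_prototype])
```

With roughly 1000 prototypes, the result is a feature vector of roughly 1000 values that no longer depends on where in the image, or at which scale, the best match occurred.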
20
Support Vector Machine: classification (e.g., dog / not dog)
C2 layer: Max over position, orientation, scale
Example C2 feature vector: .11 .78 … .32
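To make the final stage concrete, here is a minimal sketch of a linear support vector machine trained on C2 feature vectors by subgradient descent on the regularized hinge loss. This is an illustrative stand-in, not the solver used by the HMAX authors; the learning rate, regularization strength, epoch count, and the tiny hand-made data set are all assumptions.

```python
import numpy as np

def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Fit w, b by full-batch subgradient descent on the regularized hinge loss.

    X: (n_samples, n_features) C2 vectors; y: labels in {+1, -1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """+1 (e.g., dog) or -1 (not dog) for each row of X."""
    return np.where(X @ w + b >= 0, 1, -1)

# Hypothetical C2 vectors: the first two are "dog", the last two "not dog".
X_train = np.array([[0.9, 0.8, 0.1],
                    [0.8, 0.9, 0.2],
                    [0.1, 0.2, 0.9],
                    [0.2, 0.1, 0.8]])
y_train = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X_train, y_train)
```

In practice a library SVM would be used; the point here is only that the classifier sees nothing but the C2 vector.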
21
Streetscenes “scene understanding” system (Bileschi, 2006)
Use HMAX + SVM to identify object classes: Car, Pedestrian, Bicycle, Building, Tree
22
How Streetscenes Works (Bileschi, 2006)
1. Densely tile the image with windows of different sizes.
2. C1 and C2 features are computed in each window.
3. The features in each window are given as input to each of five trained support vector machines.
4. If any return a classification with score above a learned threshold, that object is said to be “detected”.
…
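The four steps above can be sketched as a sliding-window loop. Everything specific here is a hypothetical placeholder: square windows, a half-window stride, the `featurize` callback standing in for the C1/C2 computation, and the `scorers` and `thresholds` dictionaries standing in for the five trained SVMs and their learned thresholds.

```python
CLASSES = ["car", "pedestrian", "bicycle", "building", "tree"]

def tile_windows(img_w, img_h, sizes, stride_frac=0.5):
    """Yield (left, top, size) for square windows densely covering the image."""
    for size in sizes:
        step = max(1, int(size * stride_frac))
        for top in range(0, img_h - size + 1, step):
            for left in range(0, img_w - size + 1, step):
                yield (left, top, size)

def detect(windows, featurize, scorers, thresholds):
    """Score every window with every class; keep scores above the threshold."""
    detections = []
    for win in windows:
        feats = featurize(win)            # stands in for the C1/C2 features
        for cls in CLASSES:
            score = scorers[cls](feats)   # stands in for one trained SVM
            if score > thresholds[cls]:
                detections.append((cls, win, score))
    return detections
```

Because each class has its own classifier and threshold, a single window can be reported for more than one class, or for none.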
23
Object detection (here, “car”) with HMAX model (Bileschi, 2006)
24
Sample of results from HMAX model
(Serre et al., 2006)