
1 Robust Object Recognition with Cortex-Like Mechanisms Thomas Serre, Lior Wolf, Stanley Bileschi, Maximilian Riesenhuber, and Tomaso Poggio, Member, IEEE IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 29, NO. 3, MARCH 2007

2 Tomaso Poggio: Eugene McDermott Professor in Brain Sciences and Human Behavior at MIT. Thomas Serre: received the PhD degree in neuroscience from MIT in 2005. His main research focuses on object recognition in both brains and machines.

3 Outline Introduction Related work: The Standard Model of Visual Cortex Feature selection Detailed implementation Empirical evaluation Object Recognition in Clutter Object Recognition without Clutter Object Recognition of Texture-Based Objects Toward a Full System for Scene Understanding

4 Introduction We present a system that is based on a quantitative theory of the ventral stream of visual cortex. A key element in the approach is a new set of scale- and position-tolerant feature detectors, which agree quantitatively with the tuning properties of cells along the ventral stream of visual cortex.

5 Related work: The Standard Model of Visual Cortex Object recognition in cortex is thought to be mediated by the ventral visual pathway. Neurally interconnected stages: retina => Lateral Geniculate Nucleus (LGN) of the thalamus => primary visual cortex (V1) and extrastriate visual areas => V2 => V4 => IT => prefrontal cortex (PFC), linking perception to memory and action

6 Related work: The Standard Model of Visual Cortex Our system follows a recent theory of the feedforward path of object recognition in cortex that accounts for the first 100-200 milliseconds of processing.

7 Related work: The Standard Model of Visual Cortex A core of well-accepted facts about the ventral stream in the visual cortex 1) Visual processing is hierarchical, aiming to build invariance to position and scale first and then to viewpoint and other transformations. 2) Along the hierarchy, the receptive fields of the neurons (i.e., the part of the visual field that could potentially elicit a response from the neuron) as well as the complexity of their optimal stimuli (i.e., the set of stimuli that elicit a response of the neuron) increase.

8 Related work: The Standard Model of Visual Cortex 3) The initial processing of information is feedforward (for immediate recognition tasks, i.e., when the image presentation is rapid and there is no time for eye movements or shifts of attention). 4) Plasticity and learning probably occur at all stages, and certainly at the level of inferotemporal (IT) cortex and prefrontal cortex (PFC), the top-most layers of the hierarchy.

9 Related work: The Standard Model of Visual Cortex

10 Related work: Feature selection Appearance-based features: a patch of an image is very selective for a target shape but lacks invariance with respect to object transformations. There is a trade-off between invariance and selectivity.

11 Related work: Feature selection Histogram-based descriptors are very robust with respect to object transformations. The most popular are SIFT features, which excel at redetecting a previously seen object under new image transformations; however, it is very unlikely that these features could perform well on a generic object recognition task. The new appearance-based feature descriptors described here exhibit a balanced trade-off between invariance and selectivity.

12 Detailed implementation Along the hierarchy, from V1 to IT, two functional stages are interleaved: Simple (S) units build an increasingly complex and specific representation by combining the responses of several subunits with different selectivities with a TUNING operation. Complex (C) units build an increasingly invariant representation (to position and scale) by combining the responses of several subunits with the same selectivity but at slightly different positions and scales with a MAX-like operation.
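The two interleaved operations can be sketched as follows — a minimal illustration, not the paper's implementation: TUNING is modeled here as a Gaussian match between an input and a weight vector, and the C-unit pooling as a plain maximum.

```python
import numpy as np

def tuning(x, w, beta=1.0):
    """Simple (S) unit: Gaussian-like TUNING.
    The response peaks when the input x matches the weight vector w exactly."""
    return np.exp(-beta * np.sum((x - w) ** 2))

def max_pool(responses):
    """Complex (C) unit: MAX over subunits with the same selectivity
    at slightly different positions and scales."""
    return np.max(responses)

w = np.array([0.2, 0.8, 0.5])
print(tuning(w, w))            # exact match -> maximal response 1.0
print(tuning(w + 0.3, w) < 1)  # mismatch -> lower response
print(max_pool([0.2, 0.9, 0.5]))
```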

13 Detailed implementation

14

15 By interleaving these two operations, an increasingly complex and invariant representation is built. Two routes: The main route follows the hierarchy of cortical stages strictly. The bypass route skips some of the stages. Bypass routes may help provide a richer vocabulary of shape-tuned units with different levels of complexity and invariance.

16 Detailed implementation S 1 units: Correspond to the classical simple cells of Hubel and Wiesel found in the primary visual cortex (V1). S 1 units take the form of Gabor functions: F(x, y) = exp(-(x0^2 + gamma^2 * y0^2) / (2 * sigma^2)) * cos(2 * pi * x0 / lambda), where x0 = x cos(theta) + y sin(theta) and y0 = -x sin(theta) + y cos(theta). The parameters are the aspect ratio gamma, the orientation theta, the effective width sigma, and the wavelength lambda.

17 Detailed implementation 136 different types of S 1 units (2 phases x 4 orientations x 17 sizes): 17 spatial frequencies (= scales) and 4 orientations. Each portion of the visual field is analyzed by a full set of unit types.
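A sketch of building such a Gabor filter bank in NumPy. The size range (7x7 up to 39x39 in steps of 2, giving 17 sizes) and the size-dependent defaults for sigma and lambda are assumptions for illustration; the paper tabulates the exact values per filter size.

```python
import numpy as np

def gabor(size, theta, gamma=0.3, sigma=None, lam=None):
    """Gabor S1 filter: F(x,y) = exp(-(x0^2 + gamma^2*y0^2)/(2*sigma^2)) * cos(2*pi*x0/lam).
    sigma (effective width) and lam (wavelength) default to size-dependent
    values chosen here for illustration only."""
    sigma = sigma or 0.36 * size   # assumed scaling with filter size
    lam = lam or sigma / 0.8       # assumed sigma/lambda ratio
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x0 = x * np.cos(theta) + y * np.sin(theta)
    y0 = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(x0**2 + gamma**2 * y0**2) / (2 * sigma**2)) * np.cos(2 * np.pi * x0 / lam)
    return g - g.mean()  # zero-mean band-pass filter

# 17 sizes (7x7 ... 39x39 in steps of 2) x 4 orientations
bank = [gabor(s, t) for s in range(7, 40, 2)
        for t in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
print(len(bank))  # 68 filters (x2 phases would give the 136 unit types)
```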

18 Detailed implementation Perform a TUNING operation between the incoming pattern of inputs x and the weight vector w. The response of an S 1 unit is maximal when x matches w exactly.

19 A mini-column contains a set of units, all with the same selectivity. Each portion of the visual field is analyzed by a macro-column, which contains all types of mini-columns.

20 Detailed implementation C 1 units: Correspond to cortical complex cells, which show some tolerance to shift and size. Each complex C 1 unit receives the outputs of a group of simple S 1 units from the first layer with the same preferred orientation but at slightly different positions and sizes. The operation by which the S 1 unit responses are combined at the C 1 level is a nonlinear MAX-like operation.

21 Detailed implementation

22 This process is done for each of the four orientations and each scale band independently.

23 Detailed implementation For instance, the first band (S = 1) contains two S 1 maps: the ones obtained using filters of size 7x7 and 9x9. For each orientation, the C 1 unit responses are computed by subsampling these maps over N s x N s = 8x8 neighborhoods; one single measurement is obtained by taking the maximum of all 64 elements. As a last stage, we take a max over the two scales from within the same spatial neighborhood.
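The C1 pooling for one band and one orientation can be sketched as below. For simplicity this sketch uses non-overlapping 8x8 cells (the paper also allows overlapping neighborhoods) and crops the two pooled maps to a common size before the max over scales.

```python
import numpy as np

def c1_band(s1_maps, grid=8):
    """C1 pooling for one scale band and one orientation.
    s1_maps: list of S1 response maps in the band (e.g. from 7x7 and 9x9 filters).
    First take a spatial max over non-overlapping grid x grid neighborhoods
    (the max of all 64 elements per cell), then a max over scales in the band."""
    pooled = []
    for m in s1_maps:
        h, w = (m.shape[0] // grid) * grid, (m.shape[1] // grid) * grid
        blocks = m[:h, :w].reshape(h // grid, grid, w // grid, grid)
        pooled.append(blocks.max(axis=(1, 3)))
    # crop to a common size, then take the max over the scales within the band
    hmin = min(p.shape[0] for p in pooled)
    wmin = min(p.shape[1] for p in pooled)
    return np.max([p[:hmin, :wmin] for p in pooled], axis=0)

s1_7 = np.random.rand(120, 120)   # toy S1 maps standing in for filtered images
s1_9 = np.random.rand(118, 118)
print(c1_band([s1_7, s1_9]).shape)  # -> (14, 14)
```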

24 Detailed implementation S 2 units: A TUNING operation is taken over C 1 units at different preferred orientations to increase the complexity of the optimal stimulus. S 2 units become selective to more complex patterns, such as combinations of oriented bars forming contours or boundary conformations.

25 Detailed implementation Each S 2 unit's response depends in a Gaussian way on the Euclidean distance between a new input and a stored prototype: r = exp(-beta * ||X - P_i||^2), where P_i is one of the N features learned during training and X is a patch from the previous C 1 layer at a particular scale S.
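This Gaussian tuning is a one-liner; a minimal sketch with a toy prototype (the 4x4x4 patch shape is just an example, not the paper's fixed patch size):

```python
import numpy as np

def s2_response(patch, prototype, beta=1.0):
    """S2 unit: Gaussian tuning to a stored C1 prototype P_i.
    r = exp(-beta * ||X - P_i||^2); maximal (r = 1) when the patch equals the prototype."""
    return np.exp(-beta * np.linalg.norm(patch - prototype) ** 2)

P = np.random.rand(4, 4, 4)        # toy prototype: a 4x4 patch over 4 orientations
print(s2_response(P, P))           # exact match -> 1.0
print(s2_response(P + 0.5, P) < 1) # mismatch -> lower response
```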

26 Detailed implementation C 2 units: Our final set of shift- and scale-invariant C 2 responses is computed by taking a global maximum over all scales and positions for each S 2 type over the entire S 2 lattice, i.e., over units that are tuned to the same preferred stimulus but at slightly different positions and scales.
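The global maximum per S2 type can be sketched as below, assuming the S2 responses are stored as one array of shape (n_features, H, W) per scale:

```python
import numpy as np

def c2_features(s2_maps_per_scale):
    """C2: for each S2 feature type, take the global max over all positions
    and all scales, yielding one scalar per feature (shift- and scale-invariant)."""
    per_scale = [m.reshape(m.shape[0], -1).max(axis=1) for m in s2_maps_per_scale]
    return np.max(per_scale, axis=0)

maps = [np.random.rand(10, 12, 12), np.random.rand(10, 6, 6)]  # toy S2 maps at two scales
print(c2_features(maps).shape)  # -> (10,): one C2 value per S2 feature type
```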

27

28 Detailed implementation The learning stage: Corresponds to selecting a set of N prototypes P i for the S 2 units. The classification stage: The C 1 and C 2 standard model features (SMFs) are then extracted and further passed to a simple linear classifier.
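A toy end-to-end sketch of the classification stage: a minimal perceptron stands in for the simple linear classifier (the paper uses, e.g., an SVM), and random vectors stand in for the extracted C2 features.

```python
import numpy as np

def train_linear(features, labels, lr=0.1, epochs=50):
    """Minimal linear classifier (perceptron) over fixed SMF feature vectors,
    a stand-in for the simple linear classifier applied to C1/C2 features."""
    X = np.hstack([features, np.ones((len(features), 1))])  # append bias term
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, y in zip(X, labels):       # labels in {-1, +1}: object absent/present
            if y * (x @ w) <= 0:          # misclassified -> update
                w += lr * y * x
    return w

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 0.2, (20, 5))   # toy C2 vectors for "object present"
neg = rng.normal(-1.0, 0.2, (20, 5))  # toy C2 vectors for "object absent"
X = np.vstack([pos, neg])
y = np.array([1] * 20 + [-1] * 20)
w = train_linear(X, y)
preds = np.sign(np.hstack([X, np.ones((40, 1))]) @ w)
print((preds == y).mean())  # well-separated toy data -> perfect accuracy
```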

29 Empirical evaluation Object Recognition in Clutter Object Recognition without Clutter Object Recognition of Texture-Based Objects Toward a Full System for Scene Understanding

30 Empirical evaluation: Object Recognition in Clutter "In clutter" (also referred to as weakly supervised): the target object in both training and test sets appears at variable scales and positions within the unsegmented image. The task is a simple object present/absent recognition task. The number of C 2 features depends only on the number of patches extracted during training and is independent of the size of the input image.

31 Empirical evaluation: Object Recognition without Clutter Windowing approach: classify the target object in each fixed-size image window extracted from an input image at various scales and positions. This leaves only limited variability in scale and position within each window.

32 Empirical evaluation: Object Recognition without Clutter Top row: Sample StreetScenes examples Middle row: True hand-labeling. Bottom row: Results obtained with a system trained on examples like those in the second row.

33 Empirical evaluation: Object Recognition without Clutter Training the SMF-based systems: We trained on the classes car, pedestrian, and bicycle. Images are resized to 128x128 pixels and converted to gray level.

34 Empirical evaluation: Object Recognition of Texture-Based Objects Performance is measured by considering each pixel, rather than each instance of an object. We consider four texture-based objects: buildings, trees, roads, and skies.

35 Empirical evaluation: Object Recognition of Texture-Based Objects Training the SMF-based systems: To avoid errors due to overlap and loose polygonal labeling in the StreetScenes database, pixels with either multiple labels or no label were removed. Training samples were never drawn from within 15 pixels of any object's border.

36 Empirical evaluation: Toward a Full System for Scene Understanding The objects to be detected are divided into two distinct categories: texture-based objects and shape-based objects.

37 Empirical evaluation: Toward a Full System for Scene Understanding Shape-Based Object Detection in StreetScenes Shape-based objects are those objects for which there exists a strong part-to-part correspondence between examples. A standard windowing technique is used in conjunction with the classifier to keep track of the locations of objects.

38 Empirical evaluation: Toward a Full System for Scene Understanding Pixel-Wise Detection of Texture-Based Objects These objects (buildings, roads, trees, and skies) are better described by their texture than by the geometric structure of reliably detectable parts. Applying the classifiers to each pixel within the image, one obtains a detection confidence map of the original.

