Robust Object Recognition with Cortex-Like Mechanisms. Thomas Serre, Lior Wolf, Stanley Bileschi, Maximilian Riesenhuber, and Tomaso Poggio, Member, IEEE.


Robust Object Recognition with Cortex-Like Mechanisms. Thomas Serre, Lior Wolf, Stanley Bileschi, Maximilian Riesenhuber, and Tomaso Poggio, Member, IEEE. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 29, NO. 3, MARCH 2007.

Tomaso Poggio is the Eugene McDermott Professor in the Brain Sciences and Human Behavior at MIT. Thomas Serre received the PhD degree in neuroscience from MIT in 2005; his main research focuses on object recognition in both brains and machines.

Outline: Introduction; Related work: The Standard Model of Visual Cortex; Feature selection; Detailed implementation; Empirical evaluation (Object Recognition in Clutter, Object Recognition without Clutter, Object Recognition of Texture-Based Objects, Toward a Full System for Scene Understanding).

Introduction. We present a system based on a quantitative theory of the ventral stream of visual cortex. A key element of the approach is a new set of scale- and position-tolerant feature detectors, which agree quantitatively with the tuning properties of cells along the ventral stream of visual cortex.

Related work: The Standard Model of Visual Cortex. Object recognition in cortex is thought to be mediated by the ventral visual pathway, neurally interconnected as follows: retina => lateral geniculate nucleus (LGN) of the thalamus => primary visual cortex (V1) => extrastriate visual areas V2 and V4 => inferotemporal cortex (IT) => prefrontal cortex (PFC), linking perception to memory and action.

Related work: The Standard Model of Visual Cortex. Our system follows a recent theory of the feedforward path of object recognition in cortex, one that accounts for the first few hundred milliseconds of processing.

Related work: The Standard Model of Visual Cortex. A core of well-accepted facts about the ventral stream in the visual cortex: 1) Visual processing is hierarchical, aiming to build invariance to position and scale first and then to viewpoint and other transformations. 2) Along the hierarchy, the receptive fields of the neurons (i.e., the part of the visual field that could potentially elicit a response from the neuron) as well as the complexity of their optimal stimuli (i.e., the set of stimuli that elicit a response of the neuron) increase.

Related work: The Standard Model of Visual Cortex. 3) The initial processing of information is feedforward (for immediate recognition tasks, i.e., when the image presentation is rapid and there is no time for eye movements or shifts of attention). 4) Plasticity and learning probably occur at all stages, and certainly at the level of inferotemporal (IT) cortex and prefrontal cortex (PFC), the top-most layers of the hierarchy.

Related work: The Standard Model of Visual Cortex

Related work: Feature selection. Appearance-based patches of an image are very selective for a target shape but lack invariance with respect to object transformations: there is a trade-off between invariance and selectivity.

Related work: Feature selection. Histogram-based descriptors are very robust with respect to object transformations; the most popular are SIFT features. SIFT excels at the redetection of a previously seen object under new image transformations, but it is very unlikely that these features could perform well on a generic object recognition task. The new appearance-based feature descriptors described here exhibit a balanced trade-off between invariance and selectivity.

Detailed implementation. Along the hierarchy, from V1 to IT, two functional stages are interleaved: Simple (S) units build an increasingly complex and specific representation by combining the responses of several subunits with different selectivities via a TUNING operation. Complex (C) units build an increasingly invariant representation (to position and scale) by combining the responses of several subunits with the same selectivity but at slightly different positions and scales via a MAX-like operation.
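The two interleaved operations can be sketched in a few lines of code. This is a minimal illustration, not the paper's exact parameterization: the Gaussian form, the `sigma` value, and the toy vectors are all illustrative.

```python
import numpy as np

def tuning(x, w, sigma=1.0):
    # Simple (S) unit: Gaussian TUNING operation; the response is
    # maximal (1.0) when the input pattern x matches the weights w.
    return float(np.exp(-np.sum((x - w) ** 2) / (2 * sigma ** 2)))

def max_pool(responses):
    # Complex (C) unit: MAX-like pooling over subunits with the same
    # selectivity but slightly different positions/scales.
    return float(np.max(responses))

w = np.array([0.2, 0.8, 0.5])
r_match = tuning(w, w)                    # exact match -> response 1.0
r_shift = tuning(w + 0.3, w)              # perturbed input -> weaker
invariant = max_pool([r_match, r_shift])  # pooling keeps the best match
```

Interleaving these two steps is what builds selectivity (S layers) and invariance (C layers) along the hierarchy: the max survives small shifts of the input, while the tuning keeps the unit selective.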

Detailed implementation

By interleaving these two operations, an increasingly complex and invariant representation is built. There are two routes: the main route follows the hierarchy of cortical stages strictly, while bypass routes skip some of the stages. Bypass routes may help provide a richer vocabulary of shape-tuned units with different levels of complexity and invariance.

Detailed implementation. S1 units correspond to the classical simple cells of Hubel and Wiesel found in primary visual cortex (V1). S1 units take the form of Gabor functions, parameterized by the aspect ratio (gamma), the orientation (theta), the effective width (sigma), and the wavelength (lambda).

Detailed implementation. There are 136 different types of S1 units (2 phases x 4 orientations x 17 sizes). Each portion of the visual field is analyzed by a full set of unit types: 17 spatial frequencies (= scales) and 4 orientations.
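A sketch of such a Gabor filter bank follows. The filter sizes (7x7 up to 39x39) and orientations match the counts on the slide, but the proportions tying wavelength and effective width to filter size are illustrative assumptions, not the paper's exact values, and only one phase is built here.

```python
import numpy as np

def gabor(size, wavelength, theta, sigma, gamma=0.3):
    # One S1 Gabor filter; gamma = aspect ratio, theta = orientation,
    # sigma = effective width, wavelength = lambda (slide's parameters).
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x0 = x * np.cos(theta) + y * np.sin(theta)
    y0 = -x * np.sin(theta) + y * np.cos(theta)
    g = (np.exp(-(x0 ** 2 + (gamma * y0) ** 2) / (2 * sigma ** 2))
         * np.cos(2 * np.pi * x0 / wavelength))
    g -= g.mean()               # zero mean
    g /= np.linalg.norm(g)      # unit norm, comparable across sizes
    return g

# 17 sizes x 4 orientations (the slides add 2 phases on top of this,
# giving 136 unit types in total)
sizes = range(7, 41, 2)         # 7x7, 9x9, ..., 39x39
thetas = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
bank = [gabor(s, wavelength=0.8 * s, theta=t, sigma=0.4 * s)
        for s in sizes for t in thetas]
```

Each filter is zero-mean and unit-norm so that responses are comparable across the 17 sizes when the maps are later pooled.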

Detailed implementation. S1 units perform a TUNING operation between the incoming input pattern x and their weight vector w; the response of an S1 unit is maximal when x matches w exactly.

A mini-column contains a set of units, all with the same selectivity. Each portion of the visual field is analyzed by a macro-column, which contains all types of mini-columns.

Detailed implementation. C1 units correspond to cortical complex cells, which show some tolerance to shift and size. Each complex C1 unit receives the outputs of a group of simple S1 units from the first layer with the same preferred orientation but at slightly different positions and sizes. The operation by which the S1 unit responses are combined at the C1 level is a nonlinear MAX-like operation.

Detailed implementation

This process is done for each of the four orientations and each scale band independently.

Detailed implementation. For instance, the first band (S = 1) contains two S1 maps: those obtained using filters of size 7x7 and 9x9. For each orientation, the C1 unit responses are computed by subsampling these maps over cells of size N_S x N_S = 8x8; a single measurement is obtained by taking the maximum of all 64 elements. As a last stage, we take a max over the two scales within the same spatial neighborhood.
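The two-step pooling for one band and one orientation can be sketched as follows; this is a simplified version (non-overlapping 8x8 cells, toy 32x32 maps), assuming the S1 maps in the band have already been cropped to a common shape.

```python
import numpy as np

def c1_band(s1_maps, grid=8):
    # C1 pooling for one scale band and one orientation.
    # Step 1: max over each grid x grid spatial cell of each S1 map.
    # Step 2: max over the scales within the band.
    pooled = []
    for m in s1_maps:
        h = (m.shape[0] // grid) * grid
        w = (m.shape[1] // grid) * grid
        blocks = m[:h, :w].reshape(h // grid, grid, w // grid, grid)
        pooled.append(blocks.max(axis=(1, 3)))  # max of the 64 elements
    return np.maximum.reduce(pooled)            # max over the two scales

rng = np.random.default_rng(0)
maps = [rng.random((32, 32)), rng.random((32, 32))]  # e.g. 7x7 and 9x9 filter outputs
out = c1_band(maps)
```

This process would then be repeated independently for each of the four orientations and each scale band, as the slides describe.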

Detailed implementation. S2 units: a TUNING operation is taken over C1 units at different preferred orientations to increase the complexity of the optimal stimulus. S2-level units thereby become selective to more complex patterns, such as combinations of oriented bars forming contours or boundary conformations.

Detailed implementation. Each S2 unit's response depends in a Gaussian way on the Euclidean distance between a new input and a stored prototype: r = exp(-beta * ||X - P_i||^2), where X is a patch from the previous C1 layer at a particular scale S, and P_i is one of the N features (prototypes) learned during training.
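This Gaussian dependence is a one-liner; the patch shape (a 4x4 neighborhood over 4 orientations) and the sharpness parameter `beta` below are illustrative assumptions.

```python
import numpy as np

def s2_response(X, P, beta=1.0):
    # S2 unit: response falls off in a Gaussian way with the Euclidean
    # distance between the C1 patch X and a stored prototype P_i:
    #   r = exp(-beta * ||X - P_i||^2)
    return float(np.exp(-beta * np.sum((X - P) ** 2)))

P = np.full((4, 4, 4), 0.5)   # an illustrative prototype: 4x4 patch, 4 orientations
r_best = s2_response(P, P)    # X == P_i -> maximal response
r_off = s2_response(P + 0.1, P)
```

The response is exactly 1 when the input patch equals the stored prototype and decays smoothly as the patch moves away from it in Euclidean distance.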

Detailed implementation. C2 units: our final set of shift- and scale-invariant C2 responses is computed by taking, for each S2 type, a global maximum over all scales and positions across the entire S2 lattice, i.e., over units tuned to the same preferred stimulus but at slightly different positions and scales.

Detailed implementation. The learning stage corresponds to selecting the set of N prototypes P_i for the S2 units. In the classification stage, the C1 and C2 standard model features (SMFs) are extracted and passed to a simple linear classifier.
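The final C2 stage and its hand-off to a linear classifier can be sketched as below; the numbers of prototypes, scales, and positions, as well as the classifier weights, are hypothetical toy values.

```python
import numpy as np

def c2_features(s2_maps):
    # C2: for each S2 prototype, take the global max of its responses
    # over all positions and scales -> one invariant value per prototype.
    return np.array([m.max() for m in s2_maps])

rng = np.random.default_rng(1)
# toy data: N = 3 prototypes, S2 responses over 5 scales x 10x10 positions
s2_maps = [rng.random((5, 10, 10)) for _ in range(3)]
c2 = c2_features(s2_maps)

# the C2 vector then feeds a simple linear classifier
# (weights and bias here are illustrative, not trained)
w, b = np.array([1.0, -0.5, 0.2]), -0.3
score = float(c2 @ w + b)   # e.g. object "present" if score > 0
```

Because each C2 value is a global max, the feature vector has a fixed length N regardless of image size, which is what makes the clutter experiments below possible without segmentation.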

Empirical evaluation Object Recognition in Clutter Object Recognition without Clutter Object Recognition of Texture-Based Objects Toward a Full System for Scene Understanding

Empirical evaluation: Object Recognition in Clutter. "In clutter" (also referred to as weakly supervised) means the target object, in both the training and test sets, appears at variable scales and positions within an unsegmented image. The task is a simple object present/absent recognition task. The number of C2 features depends only on the number of patches extracted during training and is independent of the size of the input image.

Empirical evaluation: Object Recognition without Clutter. Windowing approach: the target object is classified in each fixed-size image window extracted from the input image at various scales and positions, so each window contains only limited variability in scale and position.

Empirical evaluation: Object Recognition without Clutter. Top row: sample StreetScenes examples. Middle row: true hand-labeling. Bottom row: results obtained with a system trained on examples like those in the middle row.

Empirical evaluation: Object Recognition without Clutter. Training the SMFs-based systems: we trained on the classes car, pedestrian, and bicycle; training images are resized to 128x128 pixels and converted to gray level.

Empirical evaluation: Object Recognition of Texture-Based Objects. Performance is measured by considering each pixel rather than each instance of an object. We consider four texture-based objects: buildings, trees, roads, and skies.

Empirical evaluation: Object Recognition of Texture-Based Objects. Training the SMFs-based systems: we avoid errors due to overlap and loose polygonal labeling in the StreetScenes database by removing pixels with either multiple labels or no label. Training samples were never drawn from within 15 pixels of any object's border.

Empirical evaluation: Toward a Full System for Scene Understanding. The objects to be detected are divided into two distinct categories: texture-based objects and shape-based objects.

Empirical evaluation: Toward a Full System for Scene Understanding. Shape-based object detection in StreetScenes: shape-based objects are those for which there exists a strong part-to-part correspondence between examples. The SMF-based detector, in conjunction with a standard windowing technique, is used to keep track of the locations of objects.

Empirical evaluation: Toward a Full System for Scene Understanding. Pixel-wise detection of texture-based objects: these objects (buildings, roads, trees, and skies) are better described by their texture than by the geometric structure of reliably detectable parts. Applying the texture classifiers to each pixel within the image yields a detection confidence map of the original image.