HMAX Models Architecture Jim Mutch March 31, 2010

Topics
Basic concepts: – Layers, operations, features, scales, etc. – One particular model will be used for illustration; the concepts apply generally. Introduce the software. Model variants: – Attempts to find the best parameters. Some current challenges.

Example Model
Our best-performing model for multiclass categorization (with a few simplifications). Similar to: – J. Mutch and D.G. Lowe. Object class recognition and localization using sparse features with limited receptive fields. IJCV 2008. – T. Serre, L. Wolf, and T. Poggio. Object recognition with features inspired by visual cortex. CVPR 2005. Results on the Caltech 101 database: around 62%. State of the art is in the high 70s, using multiple-kernel approaches.

Layers
A layer is a 3-D array of units which collectively represent the activity of some set of features (F) at each location in a 2-D grid of points in retinal space (X, Y). The number and kind of features change as you go higher in the model. – Input: only one feature (pixel intensity). – S1 and C1: responses to Gabor filters of various orientations. – S2 and C2: responses to more complex features.

Common Retinal Coordinate System for (X, Y)
The number of (X, Y) positions in a layer gets smaller as you go higher in the model. – (X, Y) indices aren’t meaningful across layers. However: each layer’s cells still cover the entire retinal space. – With wider spacing. – With some loss near the edges. Each cell knows its (X, Y) center in a real-valued retinal coordinate system that is consistent across layers. – Keeping track of this explicitly turns out to simplify some operations.
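A minimal sketch of this bookkeeping (`retinal_centers` is a hypothetical helper, not from the CBCL code): every cell stores a real-valued retinal center, so a cell in a higher layer can find its inputs geometrically rather than by index arithmetic.

```python
import numpy as np

def retinal_centers(n_cells, spacing, offset=0.0):
    """Centers of n_cells laid out on [0, 1] retinal space, with the given
    spacing (in retinal units) between adjacent cells."""
    start = offset + 0.5 * spacing
    return start + spacing * np.arange(n_cells)

# Finest-scale S1: 246 cells spanning the retina.
s1_x = retinal_centers(246, spacing=1.0 / 246)
# C1 after 5x subsampling: 47 cells, wider spacing, same retinal span.
c1_x = retinal_centers(47, spacing=5.0 / 246)

# A C1 cell finds the S1 cells within its pooling radius by comparing
# real-valued centers, even though the two grids don't align by index.
radius = 2.5 / 246
inputs = np.where(np.abs(s1_x - c1_x[0]) <= radius)[0]
print(inputs)  # [0 1 2 3 4]: S1 cells pooled by the first C1 cell
```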

Scale Invariance
Finer scales have more (X, Y) positions; each such position represents a smaller region of the visual field. There are 12 scales in total. In a single visual cortical area (e.g. V1) you will find cells tuned to different spatial scales. For simplicity, our computational models represent different spatial scales using multiple layers.
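A rough sketch of the scale pyramid, assuming it is built by repeated downsampling of the input image (`build_pyramid` is illustrative, and the 2^(1/4) spacing between scales is an assumption; the actual factor is a model parameter):

```python
import numpy as np
from scipy.ndimage import zoom

def build_pyramid(image, n_scales=12, factor=2 ** (1 / 4)):
    # Coarser scales have fewer (X, Y) positions, so each position
    # covers a larger region of the visual field.
    pyramid = [image]
    for _ in range(n_scales - 1):
        pyramid.append(zoom(pyramid[-1], 1 / factor, order=1))
    return pyramid

pyramid = build_pyramid(np.random.rand(256, 256))
print([p.shape for p in pyramid])  # finest to coarsest
```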

Operations
Every cell is computed using cells in the layer(s) immediately below as inputs. We always pool over a local region in (X, Y): – sometimes over one scale at a time; – sometimes over multiple scales (tricky!); – sometimes over multiple feature types.

S1 (Gabor Filter) Layers
The image (at the finest scale) is [256 x 256 x 1]: only 1 feature at each grid point, image intensity. Center 4 different Gabor filters over each pixel position. The resulting S1 layer (at the finest scale) is [246 x 246 x 4]: filters can’t be centered over pixels near the edges, since the actual Gabors are 11 x 11.

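A minimal S1 sketch, not the CBCL implementation: convolve the image with an 11 x 11 Gabor filter at each of 4 orientations, keeping only “valid” positions (the filter must fit entirely inside the image), so 256 x 256 becomes 246 x 246 x 4. The `gabor` helper and its parameters (wavelength 5.6, sigma 4.5) are plausible values for an 11 x 11 filter, but treat them as assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor(size=11, wavelength=5.6, sigma=4.5, theta=0.0):
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    xr = x * np.cos(theta) + y * np.sin(theta)
    g = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) \
        * np.cos(2 * np.pi * xr / wavelength)
    return g - g.mean()  # zero-mean, so flat regions give no response

image = np.random.rand(256, 256)
orientations = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
# "valid" convolution: can't center the filter over pixels near the edges.
s1 = np.stack([np.abs(convolve2d(image, gabor(theta=t), mode="valid"))
               for t in orientations], axis=-1)
print(s1.shape)  # (246, 246, 4)
```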

C1 (Local Invariance) Layers
The S1 layer (finest scale) is [246 x 246 x 4]. For each orientation we compute a local maximum over (X, Y) and scale, and also subsample by a factor of 5 in both X and Y. The resulting C1 layer (finest scale) is [47 x 47 x 4]. Pooling over scales is tricky to define because adjacent scales differ by non-integer multiples; the common, real-valued coordinate system helps.

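A minimal C1 sketch for a single scale (`c1_pool` and the window size 10 are illustrative assumptions): a local max over a spatial neighborhood for each orientation, then subsampling by 5. The second max over adjacent scales, taken cell by cell using the shared retinal coordinates, is omitted for brevity.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def c1_pool(s1, pool_size=10, stride=5):
    # Max over a pool_size x pool_size spatial window, per orientation...
    pooled = maximum_filter(s1, size=(pool_size, pool_size, 1))
    # ...then keep every stride-th position in X and Y.
    return pooled[::stride, ::stride, :]

s1 = np.random.rand(246, 246, 4)
c1 = c1_pool(s1)
print(c1.shape)  # (50, 50, 4) here; the slide's 47 x 47 reflects
                 # stricter handling of windows near the edges
```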

S2 (Intermediate Feature) Layers
The C1 layer (finest scale) is [47 x 47 x 4]. We now compute the response to (the same) large dictionary of learned features at each C1 grid position, separately for each scale. Each feature is looking for its preferred stimulus: a particular local combination of different Gabor filter responses (each of which is already locally invariant). Features can be of different sizes in (X, Y). The resulting S2 layer (finest scale) is [44 x 44 x 4000]. The dictionary of 4000 features is learned by sampling from the C1 layers of training images. – We can decide to ignore some orientations at each position.

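A minimal S2 sketch: each prototype is a patch sampled from the C1 layer of a training image, and its response at a grid position is a Gaussian radial basis function of the distance to the C1 patch there. `s2_responses`, the patch size, and sigma are illustrative assumptions, and the loop is deliberately unoptimized.

```python
import numpy as np

def s2_responses(c1, prototypes, sigma=1.0):
    """c1: (X, Y, F) array; prototypes: list of (n, n, F) patches.
    Returns (X - n + 1, Y - n + 1, len(prototypes))."""
    n = prototypes[0].shape[0]
    X, Y, _ = c1.shape
    out = np.zeros((X - n + 1, Y - n + 1, len(prototypes)))
    for i in range(X - n + 1):
        for j in range(Y - n + 1):
            patch = c1[i:i + n, j:j + n, :]
            for k, p in enumerate(prototypes):
                d2 = np.sum((patch - p) ** 2)   # distance to preferred stimulus
                out[i, j, k] = np.exp(-d2 / (2 * sigma ** 2))
    return out

c1 = np.random.rand(47, 47, 4)
# Dictionary learning = sampling patches from training C1 layers; here we
# sample 4 patches from random data just to illustrate shapes (not 4000).
prototypes = [c1[i:i + 4, j:j + 4, :].copy()
              for i, j in [(0, 0), (10, 5), (20, 20), (30, 7)]]
s2 = s2_responses(c1, prototypes)
print(s2.shape)  # (44, 44, 4)
```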

C2 (Global Invariance) Layers
Finally, we find the maximum response to each intermediate feature over all (X, Y) positions and all scales. The result is a 4000-D feature vector which can be used in the classifier of your choice.

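A minimal C2 sketch (`c2_vector` is a hypothetical helper): the global max over all positions and all scales yields one number per prototype, i.e. a fixed-length vector regardless of where or at what size the feature appeared.

```python
import numpy as np

def c2_vector(s2_layers):
    """s2_layers: one (X_s, Y_s, K) array per scale, where X_s and Y_s vary
    with scale and K is the dictionary size. Returns a length-K vector."""
    per_scale = [layer.max(axis=(0, 1)) for layer in s2_layers]  # max over (X, Y)
    return np.max(per_scale, axis=0)                             # max over scales

# Two scales of a toy 5-feature S2 (the real model has ~4000 features).
s2_layers = [np.random.rand(44, 44, 5), np.random.rand(20, 20, 5)]
c2 = c2_vector(s2_layers)
print(c2.shape)  # (5,) -- feed this to, e.g., a linear SVM
```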

CBCL Software
There are many different implementations, most now obsolete. – One reason: many different solutions to the “pooling over scales” problem. Two current implementations: 1. “hmin” – a simple C++ implementation of exactly what I’ve described here. 2. “CNS” – a much more general, GPU-based (i.e., fast) framework for simulating any kind of “cortically organized” network, i.e. a network consisting of n-dimensional layers of similar cells; it can support recurrent / dynamic models. – A technical report describing the framework. – Example packages implementing HMAX and other model classes. – A programming guide.

Some CNS Performance Numbers
Feedforward object recognition (static CBCL model): 256x256 input, 12 orientations, 4,075 “S2” features. Best CPU-based implementation: 28.2 sec/image. CNS (on an NVIDIA GTX 295): about 0.29 sec/image (97x speedup). Action recognition in streaming video (Jhuang et al.): 8 spatiotemporal filters (9x9x9), 300 S2 features. Best CPU-based implementation: 0.55 fps. CNS: 32 fps (58x speedup). Spiking neuron simulation (dynamic model): 9,808 Hodgkin-Huxley neurons and 330,295 synapses; 310,000 simulated time steps required 57 seconds.

Parameter Sets
The CNS “HMAX” package (“fhpkg”) contains parameter sets for several HMAX variants other than the one I described. – In particular, the more complex model used in the animal/no-animal task, which has two pathways and higher-order learned features. Current project: automatic searching of parameter space using CMA-ES (Covariance Matrix Adaptation Evolution Strategy) – Mattia Gazzola.
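A rough sketch of what such a search loop looks like, using the open-source `cma` Python package; this illustrates the ask/tell pattern of CMA-ES, not the actual CBCL search code. `evaluate_model` is a hypothetical function that would build an HMAX variant from a parameter vector and return its validation error.

```python
import cma

def evaluate_model(params):
    # Hypothetical: instantiate the model with these parameters, run the
    # benchmark, return error rate (lower is better). Stand-in objective:
    return sum((p - 0.5) ** 2 for p in params)

x0 = [0.5] * 6                           # initial parameter vector
es = cma.CMAEvolutionStrategy(x0, 0.2)   # initial step size 0.2
while not es.stop():
    candidates = es.ask()                # sample a population of parameter sets
    es.tell(candidates, [evaluate_model(c) for c in candidates])
es.result_pretty()                       # best parameters found
```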

Some Current Challenges
In practice, little or no benefit is seen in models using more than one layer of learned features (true for other hierarchical cortical models as well); this is clearly not true for the brain. It is hard to improve on our overly simple method of learning features (i.e. just sampling, and possibly selecting the ones the classifier finds useful). Loss of dynamic range: units at higher levels tend towards maximal activation, contrary to actual recordings. Better test datasets are needed.