A Neurodynamical Cortical Model of Visual Attention and Invariant Object Recognition
Gustavo Deco and Edmund T. Rolls
Vision Research, 2004
Outline
1. Deco and Rolls' network designed for visual object recognition and attention
   - Architecture
   - Low-level features
   - Dynamics of neuron activity
   - Learning the weights
   - How to bias attention
2. Experiments and results
3. Discussion
Architecture: General Features
- Hierarchical (multi-module)
- Areas: V1, V2, V4, IT, PP, PF46v and PF46d
- Two separate pathways:
  - V1 - V2 - V4 - IT - PF46v ("what" pathway)
  - (V1, V2) - MT - PP - PF46d ("where" pathway)
- Bottom-up connections: receptive fields grow larger up to IT
- Top-down connections: object and area biasing effects
- Lateral inhibitory connections within each layer
Architecture: Diagram
[Figure: network architecture]
Note: columnar feature stacking (depth) in all layers except PP
Low-Level (Input) Features: Gabor Filters
- Product of two functions:
  1. Complex plane wave
  2. Gaussian envelope
- Daugman (1985): general 2D form
- Lee (1996): "Image Representation Using 2D Gabor Wavelets", IEEE PAMI
  - Derived a constrained form (used in this paper) for a family of Gabor filters
  - Satisfies neurophysiological constraints and completeness
[Figure, left: family of 1.5-octave-bandwidth filters covering the spatial-frequency plane, satisfying Lee's constraints]
Low-Level (Input) Features: Gabor Filters (cont.)
- Remaining degrees of freedom:
  - Spatial center position (p, q)
  - Spatial resolution k
  - Orientation
- "Mother wavelet" of the filter family
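As a concrete illustration, the sketch below builds a complex 2D Gabor filter as the product of a plane wave and a Gaussian envelope, and assembles a small filter bank over orientations and spatial frequencies. The isotropic envelope, the value of sigma, and the particular (k, theta) values are illustrative choices, not Lee's constrained parameterization used in the paper.

```python
import numpy as np

def gabor_filter(size, k, theta, sigma=2.0):
    """Complex 2D Gabor: a plane wave at spatial frequency k and
    orientation theta, windowed by an isotropic Gaussian envelope.
    sigma and the isotropic envelope are illustrative choices, not
    Lee's (1996) constrained parameterization."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the carrier wave runs along orientation theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.exp(1j * k * x_r)
    return envelope * carrier

# Example: a small bank of filters at 3 spatial frequencies x 8 orientations.
filters = [gabor_filter(15, k, th)
           for k in (0.5, 1.0, 2.0)
           for th in np.linspace(0, np.pi, 8, endpoint=False)]
```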
Features of the V1 Module
- Nv1 x Nv1 hypercolumns covering the N x N scene
- Each hypercolumn has L orientation columns at different spatial frequencies
- Magnification factor: more high-spatial-resolution filters near the fovea
  - Modeled by a Gaussian centered at the fovea
- The input to the filters is the image with its DC component removed (see the sketch below)
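A minimal sketch of the two preprocessing ideas on this slide: removing the DC component from the input image, and a Gaussian magnification factor that concentrates high-resolution filters near the fovea. The function names and the value of sigma_m are assumptions for illustration, not quantities from the paper.

```python
import numpy as np

def remove_dc(image):
    """Subtract the mean so filter responses do not depend on the
    image's DC component (mean luminance)."""
    return image - image.mean()

def magnification(p, q, fovea=(0, 0), sigma_m=8.0):
    """Illustrative Gaussian magnification factor: the share of
    high-spatial-frequency filters allotted to hypercolumn (p, q)
    falls off with distance from the fovea. sigma_m is a guess."""
    d2 = (p - fovea[0])**2 + (q - fovea[1])**2
    return np.exp(-d2 / (2.0 * sigma_m**2))
```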
Neuron "Pool" Dynamics
- Mean-field approximation: track the average activity level of a pool of neurons
- Dynamical equations for the pools' activity levels
- Wilson and Cowan (1972); Gerstner (2000)
Non-linear response function F
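The paper derives F from a mean-field treatment of spiking pools (following Gerstner 2000); the sketch below uses a generic sigmoid purely as a stand-in for F, and shows one Euler step of a Wilson-Cowan-style pool equation, tau dA/dt = -A + F(I). The constants beta, threshold, tau, and dt are illustrative, not the paper's values.

```python
import numpy as np

def F(current, beta=1.0, threshold=1.0):
    """Non-linear response function. The paper derives F from a
    mean-field treatment of spiking pools (Gerstner 2000); a sigmoid
    is used here only as an illustrative stand-in."""
    return 1.0 / (1.0 + np.exp(-beta * (current - threshold)))

def pool_step(activity, input_current, dt=0.001, tau=0.01):
    """One Euler step of the Wilson-Cowan-style pool equation
    tau * dA/dt = -A + F(I). dt and tau are illustrative values."""
    return activity + dt / tau * (-activity + F(input_current))
```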
Temporal Evolution of the Entire System
- Notation: p, q = spatial position; k = spatial resolution (V1 only); l = pool index
- Coupled dynamical equations for the V1 pools
- Analogous equations for the pools in the other modules (see the sketch below)
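A schematic of how a module's pool activities could be integrated over time: each pool's input current collects bottom-up drive and a top-down bias plus a constant background current, subtracts Gaussian-weighted lateral inhibition from pools in the same module, and the activities are advanced with 1 ms Euler steps. The weight matrix, gain constants, and the sigmoid response function are placeholders rather than the paper's fitted parameters.

```python
import numpy as np

def evolve_module(activity, bottom_up, top_down, w_inhib,
                  steps=250, dt=0.001, tau=0.01, i0=0.025):
    """Schematic Euler integration of one module's pool activities:
    tau * dA/dt = -A + F(bottom_up + top_down - lateral inhibition + i0).
    All constants and weights are placeholders, not the paper's values."""
    F = lambda I: 1.0 / (1.0 + np.exp(-(I - 1.0)))   # stand-in response function
    for _ in range(steps):
        current = bottom_up + top_down - w_inhib @ activity + i0
        activity = activity + dt / tau * (-activity + F(current))
    return activity

# Toy usage: 16 pools with random bottom-up drive and uniform inhibition weights.
rng = np.random.default_rng(0)
A = np.zeros(16)
A = evolve_module(A, rng.random(16), np.zeros(16), 0.05 * np.ones((16, 16)))
```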
Top-Down Biasing
1. Spatial attention, e.g. weights from PP falling off as a Gaussian of spatial position (see the sketch below)
2. Feature-based attention is similar, but uses explicit weights
- Note: top-down input is not used during the learning phase
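A small sketch of the spatial-attention bias: a Gaussian bump of extra top-down input centered on the attended position, applied across the hypercolumn grid. The gain and width constants are guesses, not the values used in the paper.

```python
import numpy as np

def spatial_bias(grid_size, attended_pos, sigma_att=2.0, gain=0.07):
    """Illustrative top-down spatial bias: a Gaussian bump of extra
    input, centered on the attended position, for every hypercolumn
    (p, q). gain and sigma_att are placeholder values."""
    p, q = np.mgrid[0:grid_size, 0:grid_size]
    d2 = (p - attended_pos[0])**2 + (q - attended_pos[1])**2
    return gain * np.exp(-d2 / (2.0 * sigma_att**2))

bias = spatial_bias(33, attended_pos=(10, 20))   # 33 x 33 grid of bias currents
```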
Lateral Inhibition
- The only negative (inhibitory) term in the dynamics
- Decays with distance under a Gaussian modulation (see the sketch below)
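An illustrative construction of the lateral-inhibition weights: inhibition between two positions in the same module decays with their distance under a Gaussian modulation. The strength and width constants are placeholders.

```python
import numpy as np

def lateral_inhibition_weights(grid_size, sigma_inh=3.0, strength=0.1):
    """Illustrative lateral-inhibition kernel: the inhibitory weight
    between two hypercolumns decays with their spatial distance under
    a Gaussian modulation. Constants are placeholders."""
    p, q = np.mgrid[0:grid_size, 0:grid_size]
    coords = np.stack([p.ravel(), q.ravel()], axis=1)
    d2 = ((coords[:, None, :] - coords[None, :, :])**2).sum(-1)
    return strength * np.exp(-d2 / (2.0 * sigma_inh**2))

W = lateral_inhibition_weights(8)   # (64, 64) matrix of inhibitory weights
```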
"Trace" Learning Rule
- How is invariance learned?
  - Slowly change the object's appearance
  - Retain a trace of each pool's output (slow decay)
  - Hebbian-like update rule for the change in weight strength (see the sketch below)
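A sketch of the trace rule in the form used in Rolls' invariance-learning work: the post-synaptic term is a slowly decaying trace of recent activity, so a pool that responded to one view of an object keeps strengthening its weights onto the inputs produced by the next, transformed view. The learning rate alpha and trace constant eta are illustrative values.

```python
import numpy as np

def trace_update(w, x, y, y_trace, eta=0.8, alpha=0.01):
    """One step of a Hebbian-like trace learning rule.

    w        -- weight vector onto one output pool
    x        -- current pre-synaptic firing vector
    y        -- current post-synaptic firing rate
    y_trace  -- decaying trace of previous post-synaptic activity
    """
    y_trace = (1.0 - eta) * y + eta * y_trace   # update the activity trace
    w = w + alpha * y_trace * x                 # Hebbian-like weight change
    return w, y_trace

# Toy usage: five transformed "views" of the same object drive one pool.
w, y_tr = np.zeros(100), 0.0
for view in np.random.default_rng(1).random((5, 100)):
    w, y_tr = trace_update(w, view, y=float(view @ w + 0.1), y_trace=y_tr)
```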
How They Control Attention
- Two modes:
  1. Object recognition
  2. Visual search
Experiments: Simulated fMRI Data
- Simultaneously presented stimuli compete
- Attentional modulation: measure the effect of attention vs. no attention
- Two presentation conditions: simultaneous and sequential
- 250 ms stimulus presentation, 750 ms without a stimulus
- 1 ms per time step (equations solved with the Euler method); see the sketch below
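A minimal sketch of the presentation protocol, assuming the 250 ms on / 750 ms off timing and 1 ms Euler steps described on the slide; the step callbacks are hypothetical placeholders for advancing the network one step with and without the stimulus present.

```python
# Illustrative presentation protocol for the simulated-fMRI experiment:
# 250 ms with the stimulus on, then 750 ms without it, at 1 ms Euler steps.
DT_MS = 1
ON_MS, OFF_MS = 250, 750

def run_trial(step_with_stimulus, step_without_stimulus):
    for _ in range(ON_MS // DT_MS):
        step_with_stimulus()
    for _ in range(OFF_MS // DT_MS):
        step_without_stimulus()

run_trial(lambda: None, lambda: None)   # toy usage with no-op step callbacks
```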
Experiments: IT Receptive Field
- One object, learned so it is recognized at all positions
- Tested on a blank background vs. a cluttered background (natural scene)
- With and without object attention
Experiments: Distractor Object and Placement
Experiment: Visual Search and Object Attention
Stimuli: monkey face on a cluttered background
Discussion!
- Effective in performing object recognition and visual attention
- Straightforward implementation
- But many parameters
- Scalability has not been demonstrated (only two IT neurons)
- Thoughts?
Receptive Field Convergence