Understanding V1 --- Saliency map and pre-attentive segmentation. Li Zhaoping, University College London, Psychology. www.gatsby.ucl.ac.uk/~zhaoping

Lecture 2 at the EU Advanced Course in Computational Neuroscience, Arcachon, France, Aug. Reading materials at www.gatsby.ucl.ac.uk/~zhaoping

Outline. A theory: a saliency map in the primary visual cortex; a computer vision motivation. Testing the V1 theory. Part 1: visual saliencies in visual search and segmentation tasks, compared with V1 outputs simulated by a V1 model --- do they explain and predict? Part 2: psychophysical tests of the V1 theory's prediction that there are no individual feature maps, and hence no summation of them into a final master map, contrasting with previous theories of visual saliency.

First, the theory: “A saliency map in primary visual cortex”, Trends in Cognitive Sciences, Vol. 6, No. 1, pages 9-16, 2002.

What is in V1? Classical receptive fields: bar and edge detectors or filters, too small for global visual tasks. Contextual influences, and their confusing role: strong suppression, weak suppression, or facilitation of a cell's response, depending on the surrounding stimuli. Horizontal intra-cortical connections are the observed neural substrates.

Where is V1 in visual processing? Retina → LGN → V1 → V2, V3, V4, IT, MT, PP, etc. V1 has small receptive fields and limited effects from visual attention; the higher areas have larger receptive fields and are much affected by visual attention. V1 had traditionally been viewed as a lower visual area, contributing little to complex visual phenomena. But V1 is the largest visual area in the brain --- a hint towards my proposed role for it.

Psychophysical observations on visual saliency. Feature search: fast, parallel, pre-attentive, effortless; the target pops out. Conjunction search: slow, serial, effortful, needs attention; the target does not pop out.

Previous views on the saliency map (since the 1970s --- Treisman, Julesz, Koch, Ullman, Wolfe, Mueller, Itti, etc.): a saliency map serves to select stimuli for further processing. Visual stimuli activate separate feature maps --- red, blue, and green color maps, orientation maps, and others for motion, depth, etc. --- in V2, V3, V4, V?. These feed a master saliency map, a scalar map signaling saliency regardless of features, in some cortical area --- but not in V1.

Explaining faster feature search: the feature search stimulus activates the horizontal and vertical feature maps; the unique target stands out within its own feature map, and this activation carries through to the master map.

Explaining slower conjunction search: the conjunction search stimulus activates the horizontal, vertical, red, and blue feature maps, but the target is unique in none of them, so nothing singles it out in the master map.
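To make this feature-map account concrete, here is a minimal sketch (my toy implementation, not from the lecture): each map responds only to its own feature, a crude rarity weighting stands in for within-map competition, and the master map is the sum.

```python
import numpy as np

def feature_maps(orient, color):
    """Binary feature maps; each map sees only its own feature."""
    maps = {"horiz": orient == "H", "vert": orient == "V",
            "red": color == "r", "blue": color == "b"}
    # Crude within-map competition: an item is salient in a map
    # to the extent its feature is rare in that map.
    return {k: m.astype(float) * (1.0 - m.mean()) for k, m in maps.items()}

def master_map(orient, color):
    """Previous theories: the master map is the sum of the feature maps."""
    return sum(feature_maps(orient, color).values())

# Feature search: one vertical among horizontals -> the target (index 2)
# dominates its feature map and hence the master map.
print(master_map(np.array(list("HHVHHH")), np.array(list("rrrrrr"))))
# Conjunction search: red-vertical target (index 2) among red-horizontals and
# blue-verticals -> unique in no map, so no master-map peak singles it out.
print(master_map(np.array(list("HVVHVH")), np.array(list("rbrrbr"))))
```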

Under the traditional views, V1 physiology (receptive fields and contextual influences) and saliency psychophysics were kept apart: V1 did low-level vision, not complex visual phenomena, while saliency arose from feature maps (color, motion, etc.) summing into a master map computed from the visual stimuli elsewhere.

My proposal: V1's receptive fields and contextual influences are mechanisms to produce a saliency map from visual inputs. The firing rates of V1's output cells signal saliency regardless of the feature tuning of the cells. Details in Zhaoping Li, “A saliency map in primary visual cortex”, Trends in Cognitive Sciences, Vol. 6, No. 1, pages 9-16, 2002.

V1's output is viewed as a saliency map under the idealization that the top-down feedback to V1 is disabled, e.g., shortly after visual exposure or under anesthesia: saliency from bottom-up factors only. The map can be read out by, e.g., the superior colliculus. V1 producing a saliency map does not preclude it from contributing to other visual functions, such as object recognition, via the higher visual areas.

More details: visual stimuli provide contrast inputs to V1; contextual influences transform contrasts into saliencies; saliency outputs are carried by cells' firing rates (the universal currency for saliency). There are no separate feature maps, nor any summation of them: V1 cells' firing rates signal saliencies despite their feature tuning, and the strongest response at any visual location signals that location's saliency. This theory requires that conjunctive cells exist in V1.
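In symbols (my notation, not the slide's): writing O_i for the firing rate of V1 output cell i, and x_i for the retinotopic location of its receptive field, the proposed readout is

```latex
\mathrm{SMAP}(x) \;\propto\; \max_{i:\, x_i = x} O_i
```

i.e., the saliency of location x is the maximum response among all the cells covering x, whatever their feature preferences.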

The cartoon: attention is auctioned here, with no discrimination between feature preferences --- only spikes count! A motion-tuned V1 cell bids 1 $pike, an orientation-tuned V1 cell 2 $pikes, a color-tuned V1 cell 3 $pikes. (“Hmm… I am feature blind anyway.” “Capitalist… he only cares about money!!!”) See: Zhaoping Li, “A saliency map in primary visual cortex”, Trends in Cognitive Sciences, Vol. 6, No. 1, pages 9-16, 2002.

Implementing the saliency map in a V1 model: contrast input to V1 → the V1 model, a recurrent network with intra-cortical interactions that executes the contextual influences → saliency output from V1, highlighting important image locations.

Second: a computer vision motivation. The segmentation problem (which must be addressed for object recognition): to group the image pixels belonging to one object. Dilemma: segmentation presumes recognition, yet recognition presumes segmentation.

To start, focus on region segmentation. A region can be characterized by its smoothness, regularities, average luminance, and many more descriptions. Define segmentation as locating the border between regions. The usual approach is segmentation with (by) classification: (1) classify the image features, (2) compare the features to segment. This inherits the dilemma of segmentation vs. classification, and the problem of boundary precision vs. feature precision.

In biological vision: recognition (classification) is neither necessary nor sufficient for segmentation

Pre-attentive and attentive segmentation are very different. Pre-attentive: effortless, pop-out. Attentive: effortful, slow.

My proposal: pre-attentive segmentation without classification. Principle: detect the boundaries by detecting the breaking of translation invariance in the input, via V1's mechanisms of contextual influences. Homogeneous input: one region. Inhomogeneous input: two regions. Separating A from B without knowing what A and B are.
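A minimal sketch of this principle with toy numbers (my illustration; the suppression weight 0.3 and nearest-neighbor coupling are assumptions, not the model's actual parameters): each element is suppressed by its iso-feature neighbors, so border elements, having fewer such neighbors, keep the highest responses.

```python
import numpy as np

# A 1D ring of texture elements: region A then region B, e.g., two tilts.
texture = np.array(list("AAAAAABBBBBB"))
n = len(texture)
response = np.ones(n)                  # equal feedforward drive everywhere

# Iso-feature suppression from the two nearest neighbors (periodic boundary).
for i in range(n):
    for j in ((i - 1) % n, (i + 1) % n):
        if texture[j] == texture[i]:
            response[i] -= 0.3
print(response.round(2))
# Interior elements end at 0.4; the four elements flanking the two borders on
# the ring end at 0.7. Translation invariance is broken exactly at the borders,
# and they are found without classifying what A and B are.
```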

Testing the V1 saliency map. Theory statement 1: the strongest V1 output at each location signals saliency. Test 1: saliencies in visual search and segmentation tasks --- do the outputs of a V1 model (a software V1 born 8 years ago) explain and predict them?

Schematics of how the model works: original image → sampled by the receptive fields → V1 units and their initial responses → contextual influences → V1 outputs. The model is designed to agree with physiology and produce the desired behavior.

V1 model: a recurrent network with intra-cortical interactions.

dx_i/dt = -x_i + Σ_j J_ij g(x_j) - g(y_i) + I_i
dy_i/dt = -y_i + Σ_j W_ij g(x_j)

The behavior of the network is ensured by computationally designing the recurrent connection weights, using dynamical systems theory.
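A minimal Euler-integration sketch of these dynamics (the gain function g and all parameters here are toy assumptions; in the real model, J and W are the carefully designed horizontal connection weights the slide refers to):

```python
import numpy as np

def g(u, threshold=1.0, ceiling=2.0):
    """Rectified, saturating gain --- a toy stand-in for the model's g."""
    return np.clip(u - threshold, 0.0, ceiling)

def run_v1(I, J, W, steps=2000, dt=0.01):
    """Integrate dx_i/dt = -x_i + sum_j J_ij g(x_j) - g(y_i) + I_i
             and dy_i/dt = -y_i + sum_j W_ij g(x_j)."""
    x = np.zeros(len(I))               # excitatory (output) cells
    y = np.zeros(len(I))               # paired inhibitory interneurons
    for _ in range(steps):
        gx = g(x)
        x = x + dt * (-x + J @ gx - g(y) + I)
        y = y + dt * (-y + W @ gx)
    return g(x)                        # output firing rates

# In the full model the units are indexed by location and preferred orientation,
# and J, W implement iso-orientation suppression and co-linear facilitation;
# the outputs g(x) are then read out as saliencies by the max rule.
```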

Conditions on the intra-cortical interactions: highlight region boundaries and enhance smooth contours in the outputs, with no symmetry breaking (hallucination) and no gross extension beyond the input. Design techniques: mean-field analysis, stability analysis. The desired computation constrains the network architecture, connections, and dynamics; network oscillation is one of the dynamic consequences.

Make sure that the model can produce the usual contextual influences, relative to the response to a single bar: an iso-orientation surround gives the strongest suppression, a random surround less suppression, a cross-oriented surround the least suppression; and there is co-linear facilitation.

Proposal: V1 produces a saliency map. From the histogram of all responses S across the image, regardless of features, each item's saliency is measured by its z score, z = (S - ⟨S⟩)/σ. In the example, items with responses S = 0.2, 0.4, 0.12, 0.22 have z = 1.0, 7, -1.3, 1.7 respectively: the z = 7 item pops out.
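A sketch of the z-score computation (response values assumed for illustration; following the slide, the mean and σ are taken over the histogram of all item responses):

```python
import numpy as np

def z_scores(S):
    """z = (S - <S>) / sigma, over the population of item responses."""
    S = np.asarray(S, dtype=float)
    return (S - S.mean()) / S.std()

# Identical distractors (homogeneous background): small sigma, large target z.
print(z_scores([0.2] * 15 + [0.4]).round(1))
# Variable distractor responses inflate sigma, shrinking the same target's z
# (anticipating the background-inhomogeneity slide below).
rng = np.random.default_rng(0)
print(z_scores(np.append(0.2 + 0.05 * rng.standard_normal(15), 0.4)).round(1))
```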

The V1 saliency map agrees with visual search behavior (inputs and V1 outputs). Feature search (target = +): pops out, z = 7. Conjunction search: the target's z is low, and search is serial.

Texture segmentation simulation results (V1 model input and output): quantitative agreement with psychophysics (Nothdurft's data). Prediction: a bias in the perceptual estimate of the location of the texture border.

More complex patterns (V1 model inputs and output highlights): segmentation without classification.

A trivial example of search asymmetry: a target defined by an extra feature (target = +) pops out, z = 7; a target lacking a feature gives z = 0.8 and does not pop out.

What defines a basic feature? Psychophysical definition: a basic feature enables pop-out. Computational or mechanistic definition (new): two neural components or substrates are required for a basic feature: (1) tuning of the cells' receptive fields to the feature; (2) tuning of the horizontal connections to the feature --- the connections are selective to the optimal feature, e.g., orientation, of the pre- and post-synaptic cells.

There should be a continuum from pop-out to serial search: the ease of search is measured by a graded number, the z score. Treisman's original Feature Integration Theory may be seen as a discrete idealization of the search process.

Influence of background homogeneity (cf. Duncan & Humphreys, and Rubenstein & Sagi). Saliency measure: z = (S - ⟨S⟩)/σ. σ increases with background inhomogeneity; hence a homogeneous background makes the target more salient.

Explains spatial configuration and distractor effects: the same target in different backgrounds (inputs and V1 outputs). Homogeneous background, identical distractors regularly placed --- the easiest search: z = 3.4. Distractors irregularly placed: z = 0.22. Distractors dissimilar to each other: z = 0.25.

Another example of the background regularity effect (input and output). Homogeneous background, identical distractors regularly placed: target z = -0.83, but z = 3.7 next to the target --- the salient neighbors attract attention to the target. Distractors irregularly placed: target z = -0.63, z = 0.68 next to the target.

A more severe test of the saliency map theory, using subtler saliency phenomena: search asymmetries (Treisman). Open vs. closed: z = 0.41 vs. z = 9.7. Parallel vs. divergent: z = -1.4 vs. z = 1.8. Long vs. short: z = ? vs. z = 1.07. Curved vs. straight: z = 0.3 vs. z = 1.12. Ellipse vs. circle: z = 0.7 vs. z = 2.8.

Conjunction search revisited: some conjunction searches are easy, e.g., conjunctions of motion and form/orientation (McLeod, Driver & Crisp 1988), and conjunctions of depth with motion or color (Nakayama & Silverman). Why?

Recall the two neural components necessary for a basic feature: (1) tuning of the classical receptive field (CRF); (2) tuning of the horizontal connections. For a conjunction to be basic and pop out: (1) simultaneous, i.e., conjunctive, tuning of V1 cells to both feature dimensions (e.g., orientation & motion, or orientation & depth, but not orientation & orientation); (2) simultaneous, conjunctive, tuning of the horizontal connections to the optimal features in both feature dimensions of the pre- and post-synaptic cells.

Predicting from psychophysics to V1 anatomy: since conjunctions of motion and orientation, and of depth with motion or color, pop out, the horizontal connections must be selective simultaneously to both orientation & motion, and to both depth and motion (or color) --- this can be tested. Note that it is already known that V1 cells can be simultaneously tuned to orientation, motion direction, and depth (and sometimes even color).

What about color-orientation conjunctions? Prediction: color-orientation conjunction search can be made easier by adjusting the scale and/or density of the stimuli, since V1 cells conjunctively tuned to both orientation and color are mainly tuned to a specific spatial frequency band. (Compare the responses of a model with conjunction cells against one without, on stimuli for a conjunction search.)

Double feature search --- the opposite of conjunction search. How easy is a double-feature search compared with the single-feature searches? Responses to a red-vertical target come from three cell types: (1) orientation-tuned cells tuned to vertical; (2) color-tuned cells tuned to red; (3) conjunctively tuned cells tuned to red-vertical. The most responsive of them signals the target's saliency. Let (1) and (2) determine the ease of the single-feature searches; the existence of (3) can make the double-feature search easier. This explains Nothdurft's (2000) data: orientation-motion double-feature search is easier than orientation-color double-feature search.
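A toy worked example of this max-rule argument (all response values assumed for illustration):

```python
# Assumed responses of the three cell types to a red-vertical target:
r_orient = 2.0   # vertical-tuned cells -> ease of the orientation-only search
r_color  = 1.5   # red-tuned cells      -> ease of the color-only search
r_conj   = 2.6   # red-vertical conjunctive cells, if V1 has them

double_with_conj    = max(r_orient, r_color, r_conj)  # 2.6: beats either single
double_without_conj = max(r_orient, r_color)          # 2.0: no better than best single
# Conjunctive cells can only raise the maximum, never lower it. V1 has many
# orientation-motion conjunctive cells but fewer orientation-color ones, hence
# the asymmetry in Nothdurft's (2000) double-feature data.
```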

Testing the V1 saliency map. Theory statement 2: no separate feature maps, nor any summation of them, are needed; simply, the strongest response at a location signals its saliency. Contrast with previous theories, in which feature maps sum to give the master saliency map. Test 2: test predictions of the “strongest response” rule against predictions of the feature-map-summation rule of the previous theories/models. (Joint work with Keith May.)

To compare the “strongest response” rule with the “feature map summation” rule on the same footing, assume that no cells are tuned to more than one feature (no conjunctively tuned cells) and that the interactions between cells tuned to different features are approximately zero. Then the V1 theory --- no separate feature maps, strongest response regardless of feature tuning --- is approximately equivalent to a max rule over separate feature maps, whereas the previous theories give a summation rule over the same maps.

Consider texture segmentation tasks. For a border defined in a single feature dimension, the separate feature maps, their activity maps, and the combined map (summed or max-ed) lead to the same prediction: no difference between the two theories.

For a double-feature border, the activity maps combine differently: the sum rule predicts easier segmentation than either component alone, while the max rule predicts segmentation of the same ease as the easiest component.
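The two predictions can be read off a toy computation (activity values assumed; per the slide's setup, the single-feature border appears in one map only, the double-feature border in both):

```python
import numpy as np

bg, hi = 1.0, 3.0
flat   = np.full(5, bg)                    # feature map with no contrast
border = np.array([bg, bg, hi, bg, bg])    # feature map highlighting the border

def border_strength(map_a, map_b, rule):
    combined = rule(map_a, map_b)
    return combined[2] / combined[0]       # border salience vs. background

for name, rule in [("sum", np.add), ("max", np.maximum)]:
    single = border_strength(border, flat, rule)
    double = border_strength(border, border, rule)
    print(f"{name}: single {single:.2f}, double {double:.2f}")
# sum: single 2.00, double 3.00 -> double-feature border predicted easier.
# max: single 3.00, double 3.00 -> same ease as the easiest component.
```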

Test: measure reaction times in segmentation. Task: the subject answers whether the texture border is to the left or right. (Data: reaction times in ms, and error rates.)

Data from five subjects (AI, KM, AJ, AL, LZ): reaction times (ms) and error rates.

Not a floor effect --- the same observation holds for more difficult texture segmentations (subject LZ: reaction times in ms, and error rates).

Adding a task-irrelevant feature: the task-relevant feature map still highlights the border, but the irrelevant feature map has strong, homogeneous activations everywhere. Combining the activation maps by max lets the irrelevant responses mask the border highlight, predicting difficult segmentation; combining them by sum merely adds a constant, predicting easy segmentation.
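And a toy version of the interference prediction (values assumed): the task-irrelevant map responds strongly everywhere, and only the max rule lets it mask the border.

```python
import numpy as np

relevant   = np.array([1.0, 1.0, 3.0, 1.0, 1.0])  # relevant border at column 2
irrelevant = np.full(5, 3.0)                      # irrelevant feature: strong everywhere

print(relevant + irrelevant)             # sum: [4 4 6 4 4] -> border survives: easy
print(np.maximum(relevant, irrelevant))  # max: [3 3 3 3 3] -> border masked: difficult
```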

Test: measure reaction times in the same segmentation tasks (subject AI: reaction times in ms, and error rates).

Data from five subjects (AI, AJ, AL, KM, LZ): reaction times (ms) and error rates.

Comparing the saliency theories by their predictions on these stimuli: for the double-feature border, the V1 theory (max) predicts easy segmentation and the previous theories (sum) predict easier; for the border with a task-irrelevant feature, the V1 theory (max) predicts difficult segmentation while the previous theories (sum) predict easy. The latter is where the theories differ!

Reaction times (ms) and error rates, subject AI: the data support the V1 theory.

Analogous cases for visual search tasks: for a double-feature target, the V1 theory (max) predicts easy search and the previous theories (sum) predict easier; for a target with a task-irrelevant feature, the V1 theory (max) predicts the search is not easy while the previous theories (sum) predict easy. Again, this is where the theories differ!

Analogous cases for visual search tasks: preliminary data (subject LZ; another subject's data show a similar pattern) support the V1 theory. Reaction times (ms) for reporting whether the target is on the left or right side of the stimulus center.

Summary. Theory: V1 provides a saliency map for pre-attentive segmentation. The theory was “tested” or demonstrated on an imitation V1 (a model) --- a recurrent network going from local receptive fields to global behavior for visual tasks. It was then tested psychophysically, contrasting it with the previous frameworks of the saliency map and linking physiology with psychophysics. Other predictions: some already confirmed, others to be tested. See “A saliency map in primary visual cortex” by Zhaoping Li, Trends in Cognitive Sciences, Vol. 6, No. 1, pages 9-16, 2002, for more information.