Modeling Visual Attention and some other things

Developing and Evolving Neural Networks by David Northmore (24 February 2014)

Zenon Pylyshyn Multiple Object Tracking FINSTs
FINSTs: an interface between the world of stimuli and the world of concepts. Tracking depends on “...the figure's identity over time, or its persisting individuality. ... we have a mechanism that allows preconceptual tracking of a primitive perceptual individuality”. (?*#!?)
How could a FINST be implemented in the brain, e.g. by a neural network?
Uses of a pointer: direct eye movements; selection of features; conjunction of features; serial operations; save to working memory; construct subjective “panorama”.

IAC modeling: using a simple, yet plausible neural architecture
The IAC architecture is inspired by the Jets & Sharks model of McClelland.
Turing machine analogy: if IAC can do it, the brain certainly can too.
Zen of modeling: insight, enlightenment, etc. More of us should try it.
Other models: Itti & Koch (2001), Niebur et al. (1993), Kazanovich & Borisyuk (2006), Yilmaz (2012). These employ “advanced features” like oscillations, synchrony or computer science.

Model Units
IAC units: simple 1st-order dynamics, as used here. Synapses: excitatory & inhibitory.
[Figure: activation and output of a unit over time in response to excitatory and inhibitory input, with the levels MaxA, RestA and MinA marked]
The computational units are taken from Interactive Activation and Competition (IAC) models (McClelland & Rumelhart, 1991).
Update rule for activation, A:
∆A := (MaxA-A)*ExcitInp – (A-MinA)*InhibInp – (A-RestA)*DecayA
A := A + ∆A
Output = |A|+
I also used no-delay, linear threshold units (“AUX units”), usually for inhibitory layers.
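The update rule above can be written out as a minimal Python sketch. The parameter values (MaxA = 1.0, MinA = -0.2, RestA = -0.1, DecayA = 0.1) are illustrative assumptions, not values from the slides, and the output “|A|+” is assumed here to mean the positive part of A, i.e. max(A, 0).

```python
from dataclasses import dataclass


@dataclass
class IACUnit:
    # All default values below are assumptions chosen for illustration.
    max_a: float = 1.0    # MaxA: ceiling approached under excitation
    min_a: float = -0.2   # MinA: floor approached under inhibition
    rest_a: float = -0.1  # RestA: level the unit decays back to
    decay: float = 0.1    # DecayA: rate of return to rest
    a: float = -0.1       # current activation A, starting at rest

    def step(self, excit_inp: float, inhib_inp: float) -> float:
        """Apply one update of the rule and return the unit's output."""
        delta = ((self.max_a - self.a) * excit_inp
                 - (self.a - self.min_a) * inhib_inp
                 - (self.a - self.rest_a) * self.decay)
        self.a += delta
        return self.output()

    def output(self) -> float:
        # Assumption: "|A|+" denotes rectification (positive part of A).
        return max(self.a, 0.0)


unit = IACUnit()
for _ in range(20):
    unit.step(excit_inp=0.5, inhib_inp=0.0)  # sustained excitation
```

Because excitation is scaled by (MaxA-A) and inhibition by (A-MinA), the activation in this sketch saturates smoothly between the two bounds as long as the inputs stay small relative to 1.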

Connections
Units are all arranged in 2-D layers, typically 14 x 10. Connections are made topographically from layer to layer by generic, quasi-developmental rules. Two params: connection radius & weight. Gaussian spatial distribution of weights.
Scene: bitmap. Input layer: analyzes color, motion, or shape.
Connection specs (excitatory and inhibitory):
Connect ColorLayer to Layer0 //radius //weight
Connect Layer0 to Layer1 //radius // weight //radius -0.1 //weight
Connect Layer1 to Layer1 //radius // weight
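As an illustration of a topographic connection with a Gaussian weight profile, here is a minimal Python sketch. The function name `gaussian_weights` is hypothetical, and it assumes that the connection radius acts as the Gaussian standard deviation and that no hard cutoff is applied; the slides do not say how the radius is actually used.

```python
import math


def gaussian_weights(src_xy, width, height, radius, weight):
    """Weights from one source unit to every unit of a width x height layer.

    Assumption: `radius` is used as the Gaussian standard deviation and
    `weight` as the peak value (negative for an inhibitory connection).
    """
    sx, sy = src_xy
    weights = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d2 = (x - sx) ** 2 + (y - sy) ** 2
            weights[y][x] = weight * math.exp(-d2 / (2.0 * radius ** 2))
    return weights


# Excitatory projection from the unit at (7, 5) onto a 14 x 10 layer.
excit = gaussian_weights((7, 5), width=14, height=10, radius=2.0, weight=0.5)
# A wider, weaker inhibitory projection from the same unit.
inhib = gaussian_weights((7, 5), width=14, height=10, radius=6.0, weight=-0.1)
```

The topographically aligned target unit receives the full weight, and the weight falls off with distance, so a single (radius, weight) pair describes a whole layer-to-layer projection.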

A Pointer Network
The network should latch onto one stimulus, ignore others, and deal with motion. The attended stimulus is processed advantageously, e.g. amplified, faster RT, better recognition, etc.
This 3-layer pointer network (gain layer, inhibitory layer, pointer layer) “wants to work”.
[Figure: an attended/tracked object and distractor objects passing through the gain, inhibitory and pointer layers. The response to the attended object is amplified compared to distractors. One connection is annotated: “This connection improves faithfulness of tracking by inhibiting distractors”.]

Probe Stimulus Experiments
Flashed probe stimuli showed detection enhancement at a target, and detection inhibition at a non-target, both in human subjects performing MOT (Pylyshyn, 2006) and in the 3-layer pointer model tracking a single target.
[Figure: gain layer responses in the 3-layer pointer to probe stimuli (enhanced at the target, inhibited at a non-target), alongside results from Pylyshyn (2006)]

Multiple Object Tracking
Tracking of N objects can be achieved by arranging N pointer layers connecting to a common gain layer and inhibitory layer. However, it seems more efficient to employ all pointer resources all the time by partitioning them according to tracking demand. This is achieved with a “pointer slab” consisting of several layers (6 shown here). Different patterns of inter-layer inhibitory connections partition the slab for tracking 1-6 targets. Devoting more pointer resources to a given target should increase the accuracy or robustness of tracking, especially with noise present.
[Figure: input scene feeding a pointer slab of layers L1-L6, with intra-layer, inter-layer and long-range inhibitory connections; gain layers omitted for clarity]
Tracking 2 targets with 6 distractors, by enabling only the long-range inhibitory connections: the top half of the slab tracks one target; the bottom half another.
[Figure: randomly moving discs, with squares showing pointers; pointer error vs. time for L1-L6 over 500 time steps]
Tracking 4 targets with 4 distractors, by enabling half the short-range inhibitory connections, plus medium- and long-range connections: note that sublayers 1, 2, 3 & 6 track different targets, while 4 & 5 track the same target.
Tracking different numbers of targets requires intricate adjustment of intra- and inter-layer inhibition, presumably by top-down control guided by a representation of “intended targets” and modified by perceived performance. It didn’t seem fruitful to pursue MOT further without understanding top-down control, or change detection (next slide), which determines what gets to be tracked.

Change Detection
Change grabs attention, attention grabs change.
Here I compare two networks that detect change, e.g. in the color of a moving object. A seemingly minor alteration in the spread of one connection has important consequences.
Small-field, strictly retinotopic: retina to V1, exquisitely sensitive to change and movement over small regions.
Large-field, vaguely retinotopic: the change layer signals feature change, uncontaminated by movement. A useful step toward “objecthood”, but subject to crowding.
[Figure, for each network: current state, past state (memory) and change (onset) layers, with responses to movement and change over time t]

Change Detection: onset detector responses
This network has 3 color-detecting layers (Red, Green, Blue), 3 color-onset layers (Red+ etc.) and 3 color-memory layers (RedMem etc.). The latter are IAC units that have a long decay time. Compare the effect of doubling the spread of the inhibitory connections from the memory layers.
Input display, 3 phases:
1. Colored discs are stationary and change color one by one.
2. All discs rotate without switching colors.
3. Discs change color while rotating.
[Figure: onset detector responses across the three phases (stationary with colors switching; rotating; rotating with colors switching), shown for the original and the crowded condition]
In the crowded condition, the memory layers with wide inhibitory spread create inhibitory patches over the change-detector layers that merge together and silence the color change responses during movement. Color change responses are largely silenced. An explanation of the Suchow-Alvarez illusion, maybe? See next slide.

Suchow-Alvarez illusion
When fixating the central spot, with the colored discs stationary, changes in their color are easily seen; when the discs rotate about the center, changes in their color are largely unseen. The foregoing model suggests that the color-change detectors are silenced because of crowding of the dots.

Visual Short-term Memory - Luck & Vogel, 1997
“...visual working memory stores up to 4 integrated objects.”
Once I had the memory units and change detectors working, it was only a small step to implement the paradigm widely used to study visual short-term memory (see next slide).
In Luck & Vogel’s experiments a set of stimuli (e.g. left above) is presented to a subject, followed by a blank screen for a period of time, followed by the same set of stimuli in which one feature has been changed. The subject has to indicate whether a change has occurred. Performance was nearly perfect for up to 4 stimulus objects, and declined with more objects. Accuracy was unaffected by which feature dimension was changed, or by whether change occurred in one or two feature dimensions, leading the authors to conclude that “visual working memory stores integrated object percepts rather than individual features”.
However, the next network, which does not deal in “objects”, at least explicitly, yields results very similar to Luck & Vogel’s.

Visual Short-term Memory in a Model Network
[Figure: network with feature detectors (Vertical, Horizontal, White, Black), onset detectors (Vert+, Horiz+, White+, Black+) and memory units (VertMem, HorizMem, WhiteMem, BlackMem); the onset detectors feed a “Change?” readout]
The network was presented in sequence with (a) a variable number of objects differing in orientation and contrast, (b) a blank screen, and (c) a test screen in which one feature had been changed on 50% of the trials. The entire set of onset detectors was “questioned” as to whether a change had occurred. Stimulus position was varied from trial to trial.
[Figure: displays a, b and c; performance of the model compared with Luck & Vogel, including no-change trials]
Performance of the model resembled that of Luck & Vogel’s subjects. In the model, increasing the number of objects produces a crowding effect due to the spread of inhibition over the onset-detecting layers exerted by the memory units. Crowding the stimuli still further led to an overall degradation of performance by the model (not shown). Crowding effects may need to be considered in VSTM experiments.

Salience Processing Bottom-up attentional capture. Jan Theeuwes (2010)
In Theeuwes’s paradigm a subject has to search for a singleton shape (e.g. a diamond) and reaction time is measured. In the presence of a salient distractor, reaction time is increased. Theeuwes’s interpretation is that salience irresistibly captures attention by a bottom-up process; top-down processes can control feature selection, but only later. The following network models the bottom-up processes.

Salience Processing in a Model Network
[Figure: input scene of rotating discs; activity in the Green detector layer, the Blue detector layer and the Salience IAC layer]
Layers: color layers (Red, Green, Blue, Yellow, White, Black); IAC layers (RedIAC, GreenIAC, BlueIAC, YellowIAC, WhiteIAC, BlackIAC, SalienceIAC); inhibitory units (InhSalience, InhPtr); Pointer.
The inhibitory units sum the activity of each color. The more activity due to one color, the more inhibited the corresponding colorIAC layer. The least prevalent color is therefore represented most strongly in the Salience IAC layer. The pointer layer amplifies this.
The slow dynamics of the InhPtr layer give it a memory for recent pointer location. This “Inhibition of Return” makes the pointer visit other discs after visiting the most salient, in this case the yellow.
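The prevalence-based inhibition described for this network can be sketched in a few lines of Python. This is a static illustration only, not the IAC dynamics of the model: the divisive form of the inhibition, the `inhib_gain` parameter and the function names are all assumptions made for the example.

```python
def blank_layer(width=14, height=10):
    """A width x height layer of zero activity."""
    return [[0.0] * width for _ in range(height)]


def salience_map(color_layers, inhib_gain=0.2):
    """Combine color layers, inhibiting each one by its own total activity.

    Assumption: inhibition is divisive, gain = 1 / (1 + inhib_gain * total),
    so the more prevalent a color is, the weaker its contribution.
    """
    first = next(iter(color_layers.values()))
    height, width = len(first), len(first[0])
    salience = blank_layer(width, height)
    for layer in color_layers.values():
        total = sum(sum(row) for row in layer)  # summed activity of one color
        gain = 1.0 / (1.0 + inhib_gain * total)
        for y in range(height):
            for x in range(width):
                salience[y][x] += gain * layer[y][x]
    return salience


# Five green discs and a single yellow disc.
layers = {"green": blank_layer(), "yellow": blank_layer()}
for x in (1, 3, 5, 7, 9):
    layers["green"][2][x] = 1.0
layers["yellow"][6][4] = 1.0
salience = salience_map(layers)
```

In this example the lone yellow disc ends up with the largest salience value, which is the location a pointer layer would then amplify.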

Evolution
Novel network solutions for tracking targets are obtained by applying a genetic algorithm. The existing network-building system is readily adapted for a GA using “chromosomes” specifying layers and connections. Each individual network is evaluated with a 3-phase display:
1. a rotating target disc alone for the first 1/3 of the time;
2. addition of a distractor disc for the next 1/3;
3. addition of a second distractor disc for the remaining 1/3.
[Figure: target display, visual sensor layer, activity across layers, and a chromosome with segments labeled L and C]

Genetic Algorithm
Each layer is evaluated as a possible pointer layer over the 3 testing phases by calculating a fitness according to the formula:
Fitness = 1000 – dist – hops – # active units
“Hops” is the number of times a pointer switches nearest object. “# active units” penalizes layers that are active overall.
[Figure labels: Target, Dist]
Flowchart boxes, in the order they appear: Make Initial Population; Evaluate fitness; Copy Population; Cross-over with p=0.3 on Population; Mutate with p=0.05; Temporary Pop; Rank Temp Pop by fitness; New Population; Replace unchanged indivs in Population with top ranked indivs in Temp Pop; Crit Fitness achieved?; Finish.
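A minimal Python sketch of the fitness formula follows. Several details are assumptions, since the slides do not specify them: “dist” is taken as the mean pointer-to-target distance over the run, all three penalty terms are given unit weight, every object is present at every time step, and the function names are hypothetical.

```python
import math


def count_hops(pointer_path, object_paths):
    """Number of times the object nearest to the pointer changes."""
    hops = 0
    previous = None
    for t, p in enumerate(pointer_path):
        nearest = min(range(len(object_paths)),
                      key=lambda i: math.dist(p, object_paths[i][t]))
        if previous is not None and nearest != previous:
            hops += 1
        previous = nearest
    return hops


def fitness(pointer_path, object_paths, active_units, target=0):
    """Fitness = 1000 - dist - hops - # active units (unit weights assumed)."""
    # Assumption: "dist" is the mean pointer-to-target distance.
    mean_dist = sum(math.dist(p, object_paths[target][t])
                    for t, p in enumerate(pointer_path)) / len(pointer_path)
    return 1000.0 - mean_dist - count_hops(pointer_path, object_paths) - active_units


target_path = [(0, 0), (1, 0), (2, 0), (3, 0)]
distractor_path = [(10, 0)] * 4
objects = [target_path, distractor_path]

perfect = fitness(target_path, objects, active_units=3)
# A pointer that jumps to the distractor for one time step and back.
jumpy = fitness([(0, 0), (10, 0), (2, 0), (3, 0)], objects, active_units=3)
```

A pointer that stays on the target loses points only for its active units, while the jumping pointer is penalized both for its distance error and for its two hops.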

Cross-over
[Figure: cross-over between individuals; labels: Indiv A, Indiv B, Indiv C, Indiv D, L, C]
While other cross-over schemes could be used, this one was chosen because it allows one or more layers with associated connections to be transferred intact.
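One way to transfer layers intact together with their connections is to cut chromosomes only at layer boundaries. The Python sketch below assumes a hypothetical chromosome layout, a list of layer genes that each bundle a unit type with that layer's incoming connection specs, and uses single-point cross-over; the actual encoding and scheme used in the slides may differ.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Single-point cross-over that cuts only at layer boundaries.

    Each chromosome is a list of layer genes (hypothetical layout). Because
    a gene bundles a layer with its connections, every layer is passed on
    intact. Both parents must have the same number of layers (at least 2).
    """
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents need equal length of at least 2 layers")
    cut = rng.randrange(1, len(parent_a))
    child_c = parent_a[:cut] + parent_b[cut:]
    child_d = parent_b[:cut] + parent_a[cut:]
    return child_c, child_d


# Gene = (unit type, incoming connections as (source, radius, weight)).
# All names and numbers here are made up for illustration.
indiv_a = [("IAC", (("Sensor", 2.0, 0.5),)),
           ("AUX", (("Layer0", 4.0, 0.2),)),
           ("IAC", (("Layer1", 3.0, -0.1),))]
indiv_b = [("AUX", (("Sensor", 1.0, 0.3),)),
           ("IAC", (("Layer0", 6.0, -0.2),)),
           ("IAC", (("Layer1", 2.0, 0.4),))]

indiv_c, indiv_d = crossover(indiv_a, indiv_b, random.Random(0))
```

Since the cut never falls inside a gene and the layer positions are preserved, the connection specs in each child still refer to the same layer slots as in the parents.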

Some Results
In this evolution run, 2 families of individuals excelled. The first to emerge (#82), in the 11th generation, used Layer2 as pointer; Layer1 was inactive and Layer0 was an amplifying relay. A more efficient exemplar of the other family (#120) emerged at about generation 20 and employed the same connectional motif as #82, making Layer0 the pointer.
The graphs show the progress of evolution. On the left, the yellow symbols show the emergence of networks like #82; the blue symbols, networks like #120. On the right is shown the convergence to good solutions involving all IAC units and 2:1 inhibitory:excitatory synapses.
[Figure: network diagrams for #120 (Fitness=) and #82 (Fitness=920.1), with layers labeled L0, L1, L2]
[Graphs: mean & best fitness; #s of IAC & AUX units; #s of excitatory & inhibitory synapses; best fitness for each layer. Run: InfoFile21.txt, popsize=10]

How does the evolved network work?
Network #120 can be simplified to the network shown here, with even better tracking results (Fitness = 942.2).
// CONNECTIONS -----
Connect White to Layer0 //radius 0.25 // gain
Connect Layer0 to Layer0 24 0.150 1000 -0.050
Interpretation: The excitatory feedback connection amplifies and spreads the excitation over Layer0 due to the target input. The weak, widespread, inhibitory connection limits the spread of excitation and prevents the distractor inputs from driving Layer0 units above their threshold.
This network appears to track as well as my 3-layer pointer network. It remains to be seen how well it performs in the foregoing applications.
[Figure: tracking performance (Dist) during the 3 phases of testing; activity over Layer0 in the “training” condition, and with different numbers of distractors, where it still works well]
Triumph of Evolution over Intelligent Design!

Conclusions
Simple networks can clarify attentional mechanisms, yield insights, and suggest possible brain circuitry.
MOT: the basic function is adequately modeled without synchrony etc., but orchestrating inhibition for variable numbers of targets is complex and likely involves top-down control.
Networks show promise in clarifying the processing of change and saliency and their roles in attention and VSTM.
For discovering network mechanisms, evolution extends imagination.
