Adaptive Control of Gaze and Attention. Mary Hayhoe, University of Texas at Austin; Jelena Jovancevic, University of Rochester; Brian Sullivan, University of Texas at Austin.

Selecting information from visual scenes: What controls the selection process?

Fundamental Constraints. Acuity is limited: high acuity is available only in the central retina. Attention is limited: not all information in the image can be processed. Visual working memory is limited: only a small amount of information can be retained across gaze positions.

Neural Circuitry for Saccades. [Diagram of the saccadic circuit: target selection, saccade decision, saccade command, planning of movements, signals to the eye muscles, and inhibition of the superior colliculus (SC).]

Saliency and Attentional Capture. Image properties such as contrast, edges, and chromatic saliency can account for some fixations when viewing images of scenes (e.g., Itti & Koch, 2001; Parkhurst & Niebur, 2003; Mannan et al., 1997).

Saliency is computed from the image using feature maps (color, intensity, orientation) at different spatial scales, filtered with a center-surround mechanism, and then summed; gaze is predicted to go to the peak of the resulting saliency map. From Itti & Koch (2000).
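To make that pipeline concrete, here is a minimal numpy sketch of the idea, using a random array as a stand-in for a scene image. This is only a toy version: the full Itti & Koch model also uses Gabor orientation channels, multi-scale image pyramids, and an iterative normalization operator, all omitted here.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur using only numpy."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    # convolve rows, then columns; reflect padding keeps the image size fixed
    pad = np.pad(img, ((0, 0), (radius, radius)), mode="reflect")
    img = np.apply_along_axis(lambda row: np.convolve(row, kernel, mode="valid"), 1, pad)
    pad = np.pad(img, ((radius, radius), (0, 0)), mode="reflect")
    return np.apply_along_axis(lambda col: np.convolve(col, kernel, mode="valid"), 0, pad)

def center_surround(feature, center_sigma=2, surround_sigma=8):
    """Center-surround response: fine-scale map minus coarse-scale map."""
    return np.abs(gaussian_blur(feature, center_sigma) - gaussian_blur(feature, surround_sigma))

def saliency_map(rgb):
    """Toy saliency from an intensity map and two color-opponency maps."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    features = [
        (r + g + b) / 3.0,   # intensity
        r - g,               # red/green opponency
        b - (r + g) / 2.0,   # blue/yellow opponency
    ]
    maps = [center_surround(f) for f in features]
    # normalize each map to [0, 1] before summing, so no channel dominates
    maps = [(m - m.min()) / (m.max() - m.min() + 1e-9) for m in maps]
    return sum(maps) / len(maps)

image = np.random.rand(64, 64, 3)   # stand-in for a scene image
s = saliency_map(image)
y, x = np.unravel_index(np.argmax(s), s.shape)
print(f"predicted first fixation at pixel (x={x}, y={y})")
```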

Attentional Capture. Certain stimuli, such as sudden onsets and moving stimuli, are thought to capture attention or gaze in a bottom-up manner, by interrupting ongoing visual tasks (e.g., Theeuwes et al., 2001). This is conceptually similar to the idea of salience.

Limitations of Saliency Models. Important information may not be salient, e.g., an irregularity in the sidewalk. Salient information may not be important, e.g., retinal image transients from eye and body movements. Saliency doesn't account for many observed fixations, especially in natural behavior (see the previous lecture; for direct comparisons, see Rothkopf et al., 2007; Stirk & Underwood, 2007). Will this approach work in natural vision?

Need to Study Natural Behavior. Viewing pictures of scenes is different from acting within scenes, which demands visual control of foot placement, obstacle avoidance, and heading.

Dynamic Environments

The Problem. Any selective perceptual system must choose what to select and when to select it. How is this done, given that the natural world is unpredictable? (The "initial access" problem; Ullman, 1984.) Answer: the world is not all that unpredictable, and we are very good at learning it.

Is bottom-up capture effective in natural environments? Looming stimuli seem like good candidates for bottom-up attentional capture (Regan & Gray, 2000; Franconeri & Simons, 2003).
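One reason looming is such a plausible capture signal (not stated on the slide, but standard in this literature) is that the rate of angular expansion of an approaching object directly specifies time to contact, without requiring knowledge of its distance or speed. For an object subtending visual angle $\theta(t)$ whose retinal image expands at rate $\dot{\theta}(t)$, the time to contact is approximately

$$\tau(t) \approx \frac{\theta(t)}{\dot{\theta}(t)}$$

(the classic tau variable of Lee, 1976), so a rapidly expanding retinal image is, by itself, a cue that a collision is imminent.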

Human Gaze Distribution when Walking. Experimental question: How sensitive are subjects to unexpected salient events? General design: subjects walked along a footpath in a virtual environment while avoiding pedestrians. Do subjects detect unexpected potential collisions?

Virtual Walking Environment. Virtual Research V8 head-mounted display with a 3rdTech HiBall wide-area motion tracker; V8 optics with an ASL501 video-based eye tracker (left) and an ASL 210 limbus tracker (right).

Virtual Environment. [Bird's-eye view of the virtual walking environment, showing the monument.]

Experimental Protocol. Condition 1, Normal Walking: "Avoid the pedestrians while walking at a normal pace and staying on the sidewalk." Condition 2, Added Task: identical to Condition 1, with the additional instruction "Follow the yellow pedestrian."

Distribution of Fixations on Pedestrians Over Time. Pedestrians are fixated most when they first appear; there are fewer fixations on pedestrians in the leader trials. [Plot: probability of fixation vs. time since appearance onscreen (sec), for the Normal Walking and Follow Leader conditions.]

What Happens to Gaze in Response to an Unexpected Salient Event? The unexpected event: pedestrians veered onto a collision course for 1 second (10% frequency), with the change occurring during a saccade. Does a potential collision evoke a fixation? [Diagram: pedestrians' paths, with the colliding pedestrian's path highlighted.]

Fixation on Collider

No Fixation During Collider Period

Probability of Fixation During Collision Period. More fixations on colliders occur in normal walking; there is no effect in the Leader condition. [Plot: fixation probability for controls vs. colliders in the Normal Walking and Follow Leader conditions; diagram of pedestrians' paths with the colliding pedestrian's path highlighted.]

Why are colliders fixated? The small increase in the probability of fixating the collider could be caused either by a weak effect of attentional capture or by active, top-down search of the peripheral visual field. The failure of colliders to attract attention when a task is added (following the leader) suggests that detections result from active search.

Prior Fixation of Pedestrians Affects Probability of Collider Fixation. Fixated pedestrians may be monitored in the periphery following the first fixation, and this may increase the probability of fixating colliders. [Plot: conditional probabilities.]

Other evidence for detection of colliders: Do subjects slow down during the collider period? Subjects slow down, but only when they fixate the collider, which implies that fixation measures detection. Slowing is greater if the collider was not previously fixated, consistent with peripheral monitoring of previously fixated pedestrians.

Detecting a Collider Changes Fixation Strategy. Fixations on normal pedestrians are longer following the detection of a collider. [Plot: time spent fixating normal pedestrians following a detected ("hit") vs. undetected ("miss") collider, in the Normal Walking and Follow Leader conditions.]

Effect of Collider Speed (no-leader condition). Colliders are fixated with equal probability whether or not they increase speed (by 25%) when they initiate the collision path.

No systematic effects of stimulus properties on fixation.

Summary. Subjects fixate pedestrians more when they first appear in the field of view, perhaps to predict their future path. A potential collision can evoke a fixation, but the increase is modest, and potential collisions do not evoke fixations in the leader condition. Collider detection increases fixations on normal pedestrians.

To make a top-down system work, subjects need to learn the statistics of environmental events and distribute gaze and attention based on these expectations. Subjects rely on active search to detect potentially hazardous events like collisions, rather than reacting to bottom-up looming signals (attentional capture).

Possible reservation: perhaps the looming robots are not similar enough to real pedestrians to evoke a bottom-up response.

Walking in the Real World. Experimental question: Do subjects learn to deploy gaze in response to the statistics of environmental events?

Experimental Setup. System components: head-mounted optics (76 g), color scene camera, modified DVCR recorder, Eye Vision Software, and a PC with a Pentium 4 2.8 GHz processor. [Photo: a subject wearing the ASL Mobile Eye.]

Experimental Design (continued). Occasionally some pedestrians veered onto a collision course with the subject (for approximately 1 second). There were three types of pedestrians in Trial 1: a Rogue pedestrian (always collides), a Safe pedestrian (never collides), and an Unpredictable pedestrian (collides 50% of the time). In Trial 2, the Rogue becomes Safe, the Safe becomes the Rogue, and the Unpredictable pedestrian remains the same.

Fixation on Collider

Effect of Collision Probability. The probability of fixating a pedestrian increased with higher collision probability. (Probability is computed over the pedestrian's whole period in the field of view, not just the collision interval.)

Detecting Collisions: Proactive or Reactive? The probability of fixating the risky pedestrian is similar whether or not he or she actually collides on that trial.

Almost all of the fixations on the Rogue (92%) were made before the onset of the collision path. Thus gaze, and attention, are anticipatory.

Effect of Experience. The Safe and Rogue pedestrians interchange roles.

Learning to Adjust Gaze. Changes in fixation behavior are fairly fast, happening over 4-5 encounters: fixations on the Rogue get longer, and those on the Safe pedestrian get shorter (N = 5).

Shorter Latencies for Rogue Fixations. Rogues are fixated earlier after they appear in the field of view. This change is also rapid.

Effect of Behavioral Relevance. Fixations on all pedestrians go down when pedestrians STOP instead of COLLIDING, even though stopping and colliding should have comparable salience. Note that the Safe pedestrians behave identically in both conditions; only the Rogue changes behavior.

Summary. Fixation probability increases with the probability of a collision path, and is similar whether or not the pedestrian collides on that encounter; fixations are anticipatory. Changes in fixation behavior are fairly rapid: fixations on the Rogue get longer and earlier, and those on the Safe pedestrian get shorter and later.

Neural Substrate for Learning Gaze Patterns. Dopaminergic neurons in the basal ganglia signal expected reward, and neurons at all levels of the saccadic eye movement circuitry are sensitive to reward (e.g., Hikosaka et al., 2000, 2007; Platt & Glimcher, 1999; Sugrue et al., 2004; Stuphorn et al., 2000). This provides the neural substrate for learning gaze patterns in natural behavior, and for modelling these processes using reinforcement learning (e.g., Sprague, Ballard, & Robinson, 2007).

Neural Circuitry for Saccades. [The saccadic circuit diagram again: target selection, saccade decision, saccade command, planning of movements, signals to the eye muscles, and inhibition of the SC. Reward sensitivity is found at all levels of this circuitry.]

Modeling of Gaze Control: Walter the Virtual Humanoid (Sprague, Ballard, & Robinson, TAP, 2007). The virtual humanoid has a small library of simple visual behaviors: sidewalk following, picking up blocks, and avoiding obstacles. Each behavior uses a limited, task-relevant selection of visual information from the scene.

Walter learns where and when to direct gaze using a reinforcement learning algorithm. [Figure: Walter's sequence of fixations on obstacles, the sidewalk, and litter.]
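As a rough illustration of the arbitration idea behind Walter, here is a toy Python sketch in which gaze is deployed to whichever behavior has the largest expected cost of remaining unfixated. The behavior names, noise growth rates, and cost weights are invented for illustration only; in Sprague et al.'s actual model these quantities come from learned value functions and probabilistic state estimates, not fixed constants.

```python
class Behavior:
    """A micro-behavior that tracks one task-relevant variable.

    Uncertainty about the variable grows while gaze is deployed
    elsewhere and collapses when the behavior receives a fixation.
    """
    def __init__(self, name, noise_growth, cost_weight):
        self.name = name
        self.variance = 0.0                # current uncertainty
        self.noise_growth = noise_growth   # uncertainty accumulated per step
        self.cost_weight = cost_weight     # task cost per unit of uncertainty

    def expected_loss(self):
        # expected reward lost if this behavior stays unfixated this step
        return self.cost_weight * self.variance

    def step(self, fixated):
        self.variance = 0.0 if fixated else self.variance + self.noise_growth

behaviors = [
    Behavior("sidewalk_following", noise_growth=0.5, cost_weight=1.0),
    Behavior("obstacle_avoidance", noise_growth=1.0, cost_weight=3.0),
    Behavior("litter_pickup",      noise_growth=0.3, cost_weight=0.5),
]

for t in range(10):
    # deploy gaze to the behavior with the largest expected loss
    target = max(behaviors, key=lambda b: b.expected_loss())
    for b in behaviors:
        b.step(fixated=(b is target))
    print(f"t={t}: fixate {target.name}")
```

Even this stripped-down loop reproduces the qualitative pattern: behaviors whose uncertainty is most costly (here, obstacle avoidance) receive fixations most often.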

Conclusions. Subjects must learn the statistical structure of the world and allocate attention and gaze accordingly. Control of gaze and attention is proactive, not reactive, and is thus model based. Anticipatory use of gaze is probably necessary for much visually guided behavior, because of visuo-motor delays. Subjects behave very similarly despite the unconstrained environment and the absence of instructions. Reinforcement learning models are needed to account for the control of attention and gaze in the natural world.

Conclusions (continued). Task-based models can do a good job by learning scene statistics (real walking: Jovancevic & Hayhoe, 2007). Another solution: attention may be attracted to deviations from expectations based on a memory representation of the scene. How do subjects perceive unexpected events?

Hollingworth & Henderson (2002) argue that elaborate representations of scenes are built up in long-term memory. To detect a change, subjects may compare the current image with the learnt representation. If so, such representations might serve as a basis for attracting attention to changed regions of scenes (e.g., Brockmole & Henderson, 2005).

Thus subjects should be more sensitive to changes in familiar environments than in unfamiliar ones, because the memory representation is better defined.

Overview of the Experiment. Question: If subjects become familiar with an environment, are changes more likely to attract attention? (cf. Brockmole & Henderson, 2005). Design: subjects walked along a footpath in a virtual environment containing both stable and changing objects, while avoiding pedestrians.

Virtual Environment. [View of the virtual walking environment, showing the monument.]

Experimental Setup. Virtual Research V8 head-mounted display with a 3rdTech HiBall wide-area motion tracker; V8 optics with an ASL501 video-based eye tracker (left).

Object Changes: replaced objects, disappearances, new objects, and moved objects, alongside stable objects.

Procedure. Two groups, 19 subjects per group: an Inexperienced Group, which had one familiarization trial, and an Experienced Group, which had 19 familiarization laps before the changes occurred.

Total gaze duration on changed objects was much longer after experience in the environment, whereas fixation durations on stable objects were almost the same for the two groups. [Plot: average gaze duration per object per lap for stable vs. changing objects, in the Experienced and Inexperienced groups.]

Effects of Different Changes. [Plot: gaze durations for stable, replaced, disappeared, moved, and new objects, in the Experienced and Inexperienced groups.]

Distribution of Gaze. Object fixations account for only a small percentage of gaze allocation. [Plots: gaze distributions for the Inexperienced and Experienced groups.]

Change Blindness. The probability of being aware of the changes was correlated with gaze duration on the changing objects (rho = 0.59). Awareness of the changes was low, suggesting that fixations are a more sensitive indicator of detection. Change blindness in the natural world may be fairly uncommon, because most scenes are familiar.

This suggests that we learn the structure of natural scenes over time, and that attention is attracted by deviations from the normal state. These results are consistent with Brockmole & Henderson (2005) and generalize their result to immersive environments and long time scales. They are also consistent with predictive coding models of cortical function.

Predictive Coding: input is matched to a stored representation (Rao & Ballard, 1999). A top-down signal based on memory generates a prediction $Ur$ of the bottom-up input $I$ arriving from the retina/LGN, where $r$ is the cortical representation and $U$ the generative weights. The difference signal $e = I - Ur$ reveals the mismatch, and the residual is fed back to update the representation via $U^T e$. An unmatched residual signal prompts a re-evaluation of the image data and may thereby attract attention.
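A minimal numpy sketch of this residual computation, assuming fixed generative weights U (in the full Rao & Ballard model the weights are learned as well, and the architecture is hierarchical):

```python
import numpy as np

# Toy, single-level version of the Rao & Ballard update: the cortical
# representation r is adjusted until the top-down prediction U @ r
# accounts for as much of the input I as it can.
rng = np.random.default_rng(0)

n_input, n_hidden = 64, 16
U = rng.normal(scale=0.1, size=(n_input, n_hidden))  # generative weights (fixed here)
I = rng.normal(size=n_input)                         # bottom-up input (retina/LGN)
r = np.zeros(n_hidden)                               # cortical representation

eta = 0.1                                            # update rate
for _ in range(200):
    e = I - U @ r         # difference signal: input minus top-down prediction
    r += eta * (U.T @ e)  # residual is fed back to refine the representation

# Whatever residual remains is the mismatch signal that, on this account,
# could attract attention to the unpredicted part of the scene.
print("remaining residual magnitude:", np.linalg.norm(I - U @ r))
```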

"Surprise." A mechanism that attracts attention and gaze based on mismatch with a model is similar to the idea of Bayesian "surprise" (Itti & Baldi, 2005). One question is where the prior comes from: Itti & Baldi calculate surprise with respect to image changes over a short time scale, whereas here we suggest that surprise is measured with respect to a memory representation.
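For reference, Itti & Baldi quantify surprise as the degree to which data $D$ force a revision of the observer's beliefs over models $M$, measured by the Kullback-Leibler divergence between posterior and prior:

$$S(D) = \mathrm{KL}\big(P(M \mid D)\,\|\,P(M)\big) = \int P(M \mid D)\,\log \frac{P(M \mid D)}{P(M)}\,dM$$

On this reading, the two proposals differ only in where the prior $P(M)$ comes from: recent image history for Itti & Baldi, a long-term memory representation of the scene here.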

Conclusion. Familiarity with the visual environment increases the probability that gaze will be attracted to changes in the scene. A mechanism whereby attention is attracted by deviations from a learnt representation may serve as a useful adjunct to task-driven fixations when unexpected events occur in natural visual environments.

Thank You

Behaviors Compete for Gaze/Attentional Resources. The probability of fixation is lower for both Safe and Rogue pedestrians in the Leader conditions than in the baseline condition. Note that all pedestrians are allocated fewer fixations, even the Safe ones.