Control of Attention and Gaze in Natural Environments

Selecting information from visual scenes. What happens when we are in a visual scene like this? Natural scenes contain much more information than we can perceive in a brief exposure. If we view the scene for a second or two, we move our gaze around the image, perhaps looking at the bicycle in the center, or at large objects like the building. This process of selecting particular information from the scene is not random, but we really do not know what determines where we look and what we attend to. What controls the selection process?

What controls these processes? Fundamental constraints: acuity is spatially restricted, attention is limited, and visual working memory is limited. Humans must therefore select a limited subset of the available information in the environment, and can retain only a limited amount of it. What controls this selection and retention?

This is not a question we usually ask. Typically we ask slightly different questions; here is an example. You will see a sequence of two brief images of simple shapes, with one object changed in the second view. Your job is to identify the changed item.

Did anyone see the one that changed? If you happened to be looking at the right spot you may have seen it change, or you might have looked at a couple of the objects but then forgotten what they were like between the two presentations. When people do experiments like this, they find that you can remember about four items. This gives us a visceral sense of our limitations.

Saliency: bottom-up control. One approach to the problem is to try to predict where you look from an examination of the properties of the image. Image properties such as contrast, edges, and chromatic saliency can account for some fixations when viewing images of scenes (exogenous attention; gaze typically goes with attention).
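
To make this concrete, here is a minimal sketch of a bottom-up saliency computation in the spirit of such models, combining contrast, edge, and chromatic-distinctiveness maps. The equal feature weights and the global (rather than center-surround) statistics are simplifying assumptions for illustration, not the published models.

import numpy as np

def saliency_map(rgb):
    # rgb: H x W x 3 float array in [0, 1]; returns an H x W saliency map.
    gray = rgb.mean(axis=2)

    # Edge strength: gradient magnitude of the luminance channel.
    gy, gx = np.gradient(gray)
    edges = np.hypot(gx, gy)

    # Contrast: deviation from mean luminance (real models use
    # center-surround differences across several spatial scales).
    contrast = np.abs(gray - gray.mean())

    # Chromatic saliency: distance of each pixel's color from the scene mean.
    color = np.linalg.norm(rgb - rgb.mean(axis=(0, 1)), axis=2)

    def norm(m):
        # Rescale a feature map to [0, 1] before combining.
        return (m - m.min()) / (np.ptp(m) + 1e-9)

    return (norm(edges) + norm(contrast) + norm(color)) / 3.0

The predicted fixation would then be the peak of the map, for example np.unravel_index(saliency_map(img).argmax(), img.shape[:2]).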

Limitations of saliency models. There are important ways in which saliency models are inadequate to explain the distribution of gaze in a scene. Important information may not be salient: for example, stop signs in a cluttered environment. Salient information may not be important: for example, retinal image transients produced by eye and body movements. Saliency also fails to account for many observed fixations, especially in natural behavior (e.g., Land and colleagues).

The need to study natural behavior. Natural vision is not the same as viewing pictures: behavioral goals determine what information is needed, and task structure often allows us to interpret the role of individual fixations. We are inclined to think of vision as viewing a picture, but more often we are acting in the environment. When an observer views a 2D image we do not really know what they are doing (perhaps remembering objects, perhaps judging image quality), whereas in a task requiring overt actions we have a good idea of what the observer is doing from moment to moment. Not only is the stimulus different (2D versus 3D, field of view, and so on), the information needed for action is different as well.

Top-down factors. Viewing pictures of scenes is different from acting within scenes. Another problem with trying to explain fixation patterns or the distribution of attention from image viewing is that real vision differs from looking at an image of a scene: within a scene you need different kinds of information, for example for judging heading, avoiding obstacles, and placing the feet. Natural vision can be thought of as composed of a set of mini-tasks like these, and gaze must be doled out in the service of each task. When someone looks at an image, it is unclear what the observer is doing (recognition? memorization?).

To what extent is the selection of information from scenes determined by cognitive goals (i.e., top-down) and how much by the stimulus itself (i.e., salient regions, bottom-up effects)?

Modeling top-down control: Walter the Virtual Humanoid. Could a purely top-down system work? This question motivates the work of Sprague and Ballard (2003), who developed a model of gaze behavior in a walking context. Walter, a virtual agent, must walk through a virtual environment using a small library of simple visual behaviors: sidewalk following, picking up blocks, and avoiding obstacles. Each behavior uses a limited, task-relevant selection of visual information from the scene, which is computationally efficient. Through reinforcement learning the humanoid learns the policy by which to schedule the extraction of visual information; in this model a top-down scheduler is adequate for obstacle avoidance and Walter's other tasks.

Walter's sequence of fixations alternates among litter, obstacles, and the sidewalk. The model suggests that such a system is feasible: the agent has a set of sub-tasks to perform, and gaze reflects the performance of those sub-tasks. Walter learns where and when to direct gaze using a reinforcement learning algorithm.
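
A toy sketch may help convey the scheduling idea: gaze is a resource the behaviors compete for, and fixation goes to the behavior that stands to lose the most reward by acting on stale visual information. The behavior names match Walter's repertoire, but the reward rates, noise model, and loss rule are illustrative assumptions rather than Sprague and Ballard's implementation.

import random

class Behavior:
    def __init__(self, name, reward_rate):
        self.name = name
        self.reward_rate = reward_rate  # expected reward per step when informed
        self.uncertainty = 0.0          # grows while the behavior gets no gaze

    def expected_loss(self):
        # Stand-in for the expected reward lost to state uncertainty.
        return self.reward_rate * self.uncertainty

behaviors = [Behavior("sidewalk_following", 1.0),
             Behavior("obstacle_avoidance", 2.0),
             Behavior("litter_pickup", 0.5)]

for step in range(10):
    # Gaze goes to the behavior with the largest expected loss.
    target = max(behaviors, key=lambda b: b.expected_loss())
    target.uncertainty = 0.0  # fixation refreshes that behavior's state estimate
    for b in behaviors:
        if b is not target:
            b.uncertainty += random.uniform(0.5, 1.5)  # unobserved state drifts
    print(step, target.name)

Run for many steps, this produces the kind of alternation among sidewalk, obstacles, and litter seen in Walter's fixation sequence.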

What about unexpected events? (Sprague & Ballard, VSS 2004.) What Walter would not be able to handle is an unexpected salient event, such as the appearance of another pedestrian in the field of view. Walter would be in trouble because looking out for other pedestrians is not in his behavioral repertoire.

Dynamic environments pose exactly this problem: a purely top-down agent has no routine for salient events, like an approaching pedestrian, that fall outside its current task set.

Computational load and unexpected events. Bottom-up control is expensive but can handle unexpected salient events; top-down control is efficient but misses things not on the agenda. Top-down systems are more efficient because they select limited, task-specific information from the image. Bottom-up systems that do extensive pre-processing of the image can catch a wider variety of information, but are computationally expensive. How would a top-down system deal with unexpected events? Through learning, or through frequent checking?
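
The "frequent checking" option can be made concrete with a trivial hybrid scheduler: mostly run cheap task-specific routines, but occasionally pay for a full bottom-up sweep to catch off-agenda events. The 10% checking rate is an arbitrary illustration, not a measured value.

import random

def next_visual_computation(top_down_tasks, check_prob=0.1):
    # With small probability, spend a cycle on an expensive bottom-up sweep;
    # otherwise service one of the efficient top-down routines.
    if random.random() < check_prob:
        return "bottom_up_sweep"
    return random.choice(top_down_tasks)

for step in range(20):
    print(step, next_visual_computation(["sidewalk", "obstacles", "litter"]))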


Avatar path versus human path: reward weights can be estimated from human behavior using inverse reinforcement learning (Rothkopf, 2008).
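
As a hedged illustration of what estimating reward weights from behavior can involve, the sketch below uses a structured-perceptron-style update: weights on a linear reward are nudged until the human's demonstrated path scores at least as well as competing paths. This conveys the flavor of inverse reinforcement learning only; it is not Rothkopf's algorithm, and the feature function phi is assumed to be supplied by the caller.

import numpy as np

def feature_counts(path, phi):
    # Sum the feature vector phi(s) over the states s visited along a path.
    return sum(phi(s) for s in path)

def irl_weights(human_path, alt_paths, phi, dim, iters=100, lr=0.1):
    w = np.zeros(dim)
    mu_human = feature_counts(human_path, phi)
    for _ in range(iters):
        # Find the alternative path the current reward weights like best...
        best_alt = max(alt_paths, key=lambda p: w @ feature_counts(p, phi))
        # ...and move w toward the human's feature counts and away from it.
        w += lr * (mu_human - feature_counts(best_alt, phi))
    return w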

Driving Simulator

Gaze distribution is very different for different tasks (for example, in the time spent fixating the intersection).

The problem. Any selective perceptual system must choose the right visual computations and when to carry them out. This is the essential problem for top-down systems: how do you know what to look for, and when to look for it? The tight link between vision and task demands raises the problem of scheduling behaviors, since the visual system has limited capacity and computational ability. How does it balance current task goals against new stimuli that may change task demands? How does this selection occur: through learning, or through frequent checking? And how do we deal with the unpredictability of the natural world? Answer: it is not all that unpredictable, and we are really good at learning it.

Human gaze distribution when walking. Experimental question: how sensitive are subjects to unexpected salient events? General design: subjects walked along a footpath in a virtual environment while avoiding pedestrians. Do subjects detect unexpected potential collisions? To examine these tradeoffs we designed a walking experiment in virtual reality in which we could manipulate the bottom-up signal: what happens if a pedestrian suddenly starts to come at you, a looming stimulus?

Virtual walking environment. Equipment: a Virtual Research V8 head-mounted display with a 3rd Tech HiBall wide-area motion tracker; V8 optics with an ASL501 video-based eye tracker (left) and an ASL 210 limbus tracker (right). Our lab integrates several systems to allow such a virtual reality experiment. The head-mounted display has two eye trackers installed: a video-based tracker for point-of-gaze recording, complemented by a limbus tracker used for saccade-contingent updates. To let the subjects walk a sufficient distance, the wide-area motion tracking system updates the view inside the display while the subject walks the roughly 27-meter perimeter of a rectangular path in the lab.

Bird's-eye view of the virtual walking environment, showing the footpath that the subjects walked. Six subjects each performed six trials of walking: three in the no-following condition and three in the following condition. Each trial consisted of walking around the path six times, about 3-4 minutes.

Experimental protocol. Condition 1, normal walking: avoid the pedestrians while walking at a normal pace and staying on the sidewalk. Condition 2, added task: identical to condition 1, with the additional instruction to follow a yellow pedestrian (the leader). Subjects completed three blocks of six circuits in each condition.

What happens to gaze in response to an unexpected salient event? The unexpected event: pedestrians on a non-colliding path changed onto a collision course for 1 second (10% frequency). The change occurred during a saccade, contingent on the eye movement; the pedestrian had to be 3-5 meters away, and the angular change could be no greater than 30 degrees. Does a potential collision evoke a fixation?
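
The geometry of "changing onto a collision course" can be made explicit with a closest-point-of-approach test: extrapolate both trajectories at constant velocity and ask whether the minimum separation falls below a collision radius. The 0.5 m radius is an illustrative assumption; the actual distance and angle constraints were as described above.

import numpy as np

def on_collision_course(p_subj, v_subj, p_ped, v_ped, radius=0.5):
    # Positions (m) and velocities (m/s) as 2D numpy arrays.
    dp = p_ped - p_subj  # relative position
    dv = v_ped - v_subj  # relative velocity
    speed2 = float(dv @ dv)
    if speed2 < 1e-9:    # no relative motion: just check the current distance
        return bool(np.linalg.norm(dp) < radius)
    t_cpa = max(0.0, float(-(dp @ dv) / speed2))  # time of closest approach
    min_sep = np.linalg.norm(dp + t_cpa * dv)     # separation at that time
    return bool(min_sep < radius)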

Fixation on collider. In this clip a purple pedestrian appears in the visual field, shortly after which the pedestrian starts on a collision path. In this example the subject does not fixate the colliding pedestrian during its collision course.

No fixation during the collider period. The purple pedestrian turns the corner; the subject fixates the pedestrian, then looks back to the path, maintaining that fixation during the collision period and as the pedestrian passes.

Probability of fixation during the collision period. There were more fixations on colliders than on control pedestrians during normal walking, but no effect in the leader condition. So a collision event does seem to attract gaze, but only to a limited extent, and not when the subject has the added task of following a leader.

Why are colliders fixated? There is only a small increase in the probability of fixating the collider, and the failure of colliders to attract attention when there is an added task (following) suggests that detections result from top-down monitoring.

Detecting a collider changes fixation strategy. Time spent fixating normal pedestrians is longer following the detection of a collider (a "hit") than following a "miss", in both the normal walking and follow-leader conditions. Top-down systems rely on estimating the likelihood of environmental events, so detection of an unlikely or significant event like a potential collision might lead subjects to spend more time monitoring pedestrians. This indicates that subjects can quickly modify their fixation strategy in response to information that signals a need to change policy.

Subjects rely on active search to detect potentially hazardous events like collisions, rather than reacting to bottom-up looming signals. To make a top-down system work, subjects need to learn the statistics of environmental events and distribute gaze and attention based on these expectations.

Possible reservations: perhaps the looming robots were not similar enough to real pedestrians to evoke a bottom-up response.

Walking in the real world. Experimental question: do subjects learn to deploy gaze in response to the probability of environmental events? General design: subjects walked on an oval path and avoided pedestrians. This real-world version lets us ask whether the virtual-reality results generalize.

Experimental setup: a subject wearing the ASL Mobile Eye. System components: head-mounted optics (76 g), a color scene camera, a modified DVCR recorder, Eye Vision Software, and a PC with a 2.8 GHz Pentium 4 processor.

Experimental design (continued). Occasionally some pedestrians veered onto a collision course with the subject (for approximately 1 second). There were three types of pedestrians. Trial 1: the Rogue pedestrian always collides, the Safe pedestrian never collides, and the Unpredictable pedestrian collides 50% of the time. Trial 2: the Rogue and Safe pedestrians swap roles, while the Unpredictable remains the same.

Fixation on Collider

Effect of collision probability. The probability of fixating a pedestrian increased with higher collision probability.

Detecting collisions: proactive or reactive? The probability of fixating the risky pedestrian is similar whether or not he or she actually collides on that trial. This may seem obvious, but by contrast much work has tried to predict fixation locations by analyzing properties of the image, and it is not clear what role saliency might play in normal vision, since body motion generates image motion over the whole retina.

Learning to adjust gaze. Changes in fixation behavior are fairly fast, happening over 4-5 encounters (fixations on the Rogue get longer, those on the Safe shorter).
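
A simple delta-rule learner reproduces this timescale: the estimated collision probability for each pedestrian type is updated after every encounter, and fixation duration and latency are assumed to track the estimate. The learning rate of 0.4 is an assumption chosen so that estimates settle within roughly 4-5 encounters, as the data suggest.

def update(estimate, collided, lr=0.4):
    # Move the collision-probability estimate toward the observed outcome.
    return estimate + lr * ((1.0 if collided else 0.0) - estimate)

p_rogue, p_safe = 0.5, 0.5  # uninformed initial expectations
for encounter in range(1, 7):
    p_rogue = update(p_rogue, collided=True)   # the Rogue always collides
    p_safe = update(p_safe, collided=False)    # the Safe never collides
    print(encounter, round(p_rogue, 2), round(p_safe, 2))

After about five encounters the two estimates have nearly separated to their true values, matching the rapid adjustment seen in the fixation data.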

Shorter latencies for Rogue fixations. Rogues are fixated earlier after they appear in the field of view; this change is also rapid.

Effect of behavioral relevance. Fixations on all pedestrians go down when pedestrians stop instead of colliding, even though stopping and colliding should have comparable salience. Note that the Safe pedestrians behave identically in both conditions; only the Rogue changes behavior.

In summary: fixation probability increases with the probability of a collision; fixation probability is similar whether or not the pedestrian collides on that encounter; and changes in fixation behavior are fairly rapid (fixations on the Rogue get longer and earlier, those on the Safe shorter and later).

Our experiment: allocation of gaze when driving. What is the effect of task on gaze allocation, and does the task affect the ability to detect unexpected events? Subjects drove along a street with other cars and pedestrians under two instructions: drive normally, or follow a lead car. We measured fixation patterns in the two conditions. A competing task of following a leader diminished fixations on colliders, which is consistent with a top-down strategy (and a reprioritizing of resources).

Conclusions. Subjects must learn the probabilistic structure of the world and allocate gaze accordingly; that is, gaze control is model-based. Subjects behave very similarly despite the unconstrained environment and the absence of instructions. Control of gaze is proactive rather than reactive, which also points to its being model-based. Anticipatory use of gaze is probably necessary for much visually guided behavior.

Behaviors compete for gaze resources and, we infer, for attentional resources. The probability of fixation is lower for both Safe and Rogue pedestrians in the Leader conditions than in the baseline condition. Note that all pedestrians are allocated fewer fixations, even the Safe ones.

Conclusions. The data are consistent with task-driven sampling of visual information rather than bottom-up capture of attention: there was no effect of increasing the salience of the collision event, and colliders fail to attract gaze in the leader condition, suggesting that the competing task interferes with detection. Observers rapidly learn to deploy visual attention based on environmental probabilities; such learning is necessary in order to deploy gaze and attention effectively.

Certain stimuli are thought to capture attention bottom-up (e.g., Theeuwes et al., 2001). Looming stimuli seem like good candidates for bottom-up attentional capture (Regan & Gray, 2000; Franconeri & Simons, 2003). We all share the intuition that attention is attracted by certain stimuli, for example something about to hit you, and there is an extensive literature, with considerable debate, on what does and does not capture attention exogenously.

No effect of increased collider speed. To get more evidence on this issue we increased the saliency of the colliding pedestrian by increasing its speed at the moment the pedestrian turned onto a collision course, in both the normal walking (no leader) and follow-leader conditions. Greater saliency of the unexpected event did not increase fixations.

Other evidence for detection of colliders: do subjects slow down during the collider period? Subjects slow down, but only when they fixate the collider, which implies that fixation measures detection. Slowing is greater if the collider was not previously fixated, consistent with peripheral monitoring of previously fixated pedestrians.

Conclusions. Subjects learn the probabilities of events in the environment and distribute gaze accordingly. The findings from the Leader manipulation support the claim that different tasks compete for attention.

Effect of context. The probability of fixating the Safe pedestrian is higher in the context of a riskier environment.

Summary. Direct comparison between real and virtual collisions is difficult, but colliders are still not reliably fixated. Subjects appear to be sensitive to several parameters of the environment, notably experience: experience with the Rogue pedestrian elevated the fixation probability for the Safe pedestrian to 70% (50% without experience), while experience with the Safe led to an 80% fixation probability for the Rogue (89% without experience). Experience of the Safe thus carries less weight than experience of the Rogue. Presumably, with such a highly salient stimulus, one would expect a high detection rate for these colliders; our preliminary results show only a marginal increase in fixations on colliders in the real environment (0-20% depending on the condition) compared with the virtual-reality experiment described earlier. This favors active search as the source of information (colliders are missed if they do not coincide with an active search episode) rather than a bottom-up interpretation. Comparing fixations on the Risky pedestrian (62-70%) with colliders in the virtual environment (40-60%): this experiment contains many more collisions, so there is an overall context effect, and fixations on the Safe are higher with collisions than with stops.

What do we know? Previous work on the distribution of attention in natural environments: Shinoda et al. (2001) had subjects drive in a simulator under the instruction "Follow the car" or "Follow the car and obey traffic rules," and measured the time spent fixating the road, the lead car, the roadside, and intersections. Detection of signs at an intersection results from frequent looks.

How well do human subjects detect unexpected events? Shinoda et al. (2001) measured detection of briefly presented stop signs: detection probability was 1.0 at intersections and 0.3 mid-block. The greater probability of detection in probable locations suggests that subjects learn where to attend and look.

What do humans do? To try to answer this question it is worth looking at human behavioral data. Shinoda et al. (2001) found better detection of unexpected stop signs in a virtual driving task: subjects strategically deployed fixations at key moments, based on learning. What are the capabilities and limitations of a top-down scheduler? We would like to examine a more demanding situation and ask whether Shinoda's result generalizes. (Illustration: a stop sign at the intersection and in the middle of the block.)