
Laurent Itti: CS599 – Computational Architectures in Biological Vision, USC, Spring 2001.

Slide 1: Lecture 13. Scene Perception. Reading Assignments: None.


Slide 3: How much can we remember? Memory is incomplete: how many windows does the Taj Mahal have? We cannot say, despite a conscious experience of picture-perfect, iconic memorization.


Slide 5: But… we can recognize complex scenes that we have seen before, so we do have some form of iconic memory. In this lecture we examine:
- how we perceive scenes;
- what the representation is (that can be memorized);
- what the mechanisms are.

Slide 6: Extended Scene Perception. Attention-based analysis: scan the scene with attention, accumulating evidence from detailed local analysis at each attended location. Main issues:
- what is the internal representation?
- how detailed is memory?
- do we really have a detailed internal representation at all?
Gist: we can classify entire scenes or perform simple recognition tasks very quickly (~120 ms), yet we can only shift attention about twice in that much time!

Slide 7: Accumulating Evidence. Combine information across multiple eye fixations to build a detailed representation of the scene in memory, as in the sketch below.
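
A minimal sketch of this hypothesis, assuming a grayscale image and a list of fixation points; the function name, patch size, and buffer format are illustrative assumptions, not a model from the lecture.

```python
import numpy as np

def accumulate_evidence(image, fixations, patch_size=32):
    """Naive 'progressive buffer': paste a detailed patch into a scene-sized
    memory buffer at each fixation. Detail exists only where attention went."""
    buffer = np.zeros_like(image, dtype=float)       # hypothesized internal buffer
    h, w = image.shape
    r = patch_size // 2
    for (y, x) in fixations:
        y0, y1 = max(0, y - r), min(h, y + r)
        x0, x1 = max(0, x - r), min(w, x + r)
        buffer[y0:y1, x0:x1] = image[y0:y1, x0:x1]   # detailed local analysis
    return buffer
```

Later slides (change blindness, memory cost) argue against exactly this kind of buffer.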

Slide 8: Eye Movements. The same picture is scanned very differently depending on the observer's task (Yarbus's classic demonstration):
1) free examination;
2) estimate the material circumstances of the family;
3) give the ages of the people;
4) surmise what the family was doing before the arrival of the “unexpected visitor”;
5) remember the clothes worn by the people;
6) remember the positions of the people and objects;
7) estimate how long the “unexpected visitor” has been away from the family.

Slide 9: Clinical Studies. Studies of patients with visual deficits strongly argue that a tight interaction between the where and what/how visual streams is necessary for scene interpretation.
- Visual agnosia: can see objects, copy drawings of them, etc., but cannot recognize or name them!
- Dorsal agnosia: cannot recognize objects if more than two are presented simultaneously; a problem with localization.
- Ventral agnosia: cannot identify objects.

Slide 10: These studies suggest that:
- we bind features into objects (feature binding);
- we bind objects in space into some arrangement (space binding);
- and thus we perceive the scene.
Feature binding = what/how stream; space binding = where stream.

Slide 11: Schema-based Approaches. A schema (Arbib, 1989) describes objects in terms of their physical properties and spatial arrangements: an abstract representation of scenes, objects, actions, and other brain processes, at an intermediate level between neural firing and overall behavior. Schemas both cooperate and compete in describing the visual world.


Slide 13: VISOR. Leow & Miikkulainen, 1994: low-level features -> sub-schema activity maps (a coarse description of the components of objects) -> competition across several candidate schemas -> one schema wins and becomes the percept (see the sketch below).
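
A minimal sketch of the competition stage, assuming simple cooperative/competitive dynamics over candidate-schema activities; the update rule, rates, and the example evidence values are illustrative assumptions, not VISOR's actual equations.

```python
import numpy as np

def schema_competition(evidence, n_steps=50, inhibition=0.1, rate=0.2):
    """Each candidate schema's activity grows with its bottom-up evidence
    (cooperation) and is suppressed by its rivals' total activity
    (competition) until one schema dominates: the percept."""
    a = np.full(len(evidence), 1.0 / len(evidence))    # start with uniform activities
    e = np.asarray(evidence, dtype=float)
    for _ in range(n_steps):
        rivals = a.sum() - a                           # total activity of competitors
        a += rate * a * (e - inhibition * rivals)      # grow with evidence, shrink with rivalry
        a = np.clip(a, 0.0, None)
        a /= a.sum()                                   # keep a normalized activity profile
    return a

# e.g., sub-schema evidence for three candidate schemas:
print(schema_competition([0.9, 0.6, 0.4]))   # the best-supported schema wins
```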

Slide 14: Biologically-Inspired Models. Rybak et al., Vision Research, 1998: a “what & where” model using a feature-based frame of reference.


Slide 16: Algorithm.
- At each fixation, extract the central edge orientation, as well as a number of “context” edges.
- Transform those low-level features into more invariant “second-order” features, represented in a frame of reference attached to the central edge.
- Learning: manually select fixation points; store the sequence of second-order features found at each fixation in the “what” memory; also store the vector to the next fixation, based on context points and expressed in the second-order frame of reference (see the sketch below).
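
A minimal sketch of the invariant “second-order” encoding step, assuming 2-D points with orientations: context edges are translated to the central edge and rotated into a frame aligned with its orientation, so the features do not change when the whole pattern is translated or rotated. The function name and feature layout are illustrative assumptions, not Rybak et al.'s exact representation.

```python
import numpy as np

def second_order_features(center_xy, center_angle, context):
    """Re-express context edges relative to the central edge.
    context: list of (x, y, angle) for surrounding edges.
    Positions are translated to the center and rotated by -center_angle,
    and angles are taken relative to the central edge, making the result
    invariant to translation and rotation of the whole pattern."""
    c, s = np.cos(-center_angle), np.sin(-center_angle)
    rot = np.array([[c, -s], [s, c]])
    feats = []
    for (x, y, ang) in context:
        rel_pos = rot @ (np.array([x, y]) - np.asarray(center_xy))
        rel_ang = (ang - center_angle) % np.pi   # edge orientation is defined modulo 180°
        feats.append((*rel_pos, rel_ang))
    return feats
```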

Slide 17: Algorithm (continued). As a result, the sequence of retinal images is stored in the “what” memory, and the corresponding sequence of attentional shifts in the “where” memory.

Slide 18: Algorithm (continued).
- Search mode: look for an image patch that matches one of the patches stored in the “what” memory.
- Recognition mode: reproduce the scanpath stored in the “where” memory and determine whether we have a match (see the sketch below).
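
A minimal sketch of recognition mode, assuming the what/where memories from the previous slides are sequences of feature vectors and shift vectors; image_features_at, the distance measure, and the threshold are illustrative assumptions.

```python
import numpy as np

def recognize(image_features_at, start_xy, what_memory, where_memory, thresh=0.2):
    """Replay the stored scanpath: at each fixation, compare the features
    found in the image with the stored 'what' entry, then follow the stored
    'where' shift to the next fixation point."""
    xy = np.asarray(start_xy, dtype=float)
    for stored_feats, shift in zip(what_memory, where_memory):
        feats = image_features_at(*xy)                 # features at current fixation
        if np.linalg.norm(feats - stored_feats) > thresh:
            return False                               # scanpath fails to reproduce
        xy = xy + shift                                # next attentional shift
    return True                                        # all fixations matched
```

In the full model the shift vectors are expressed in the invariant frame of the current fixation, which is what buys the robustness claimed on the next slide.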

Slide 19: The model is robust to variations in scale, rotation, and illumination, but not to changes in 3D pose.

Slide 20: Schill et al., Journal of Electronic Imaging, 2001.


Slide 22: Dynamic Scenes. Extension to moving objects and dynamic environments.
Rizzolatti: mirror neurons in monkey area F5 respond when the monkey observes an action (e.g., grasping an object) as well as when it executes the same action.
Computer vision models: decompose complex actions using grammars of elementary actions and precise composition rules (see the sketch below). This resembles a temporal extension of schema-based systems. Is this what the brain does?
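
A minimal sketch of an action grammar of the kind such computer-vision models use: complex events as ordered compositions of elementary actions. The event names and rules here are hypothetical.

```python
# Hypothetical grammar: a complex event is an ordered sequence of
# elementary actions, possibly built from other complex events.
ELEMENTARY = {"approach", "reach", "grasp", "lift", "walk_away"}

GRAMMAR = {
    "pick_up":     ["reach", "grasp", "lift"],
    "take_object": ["approach", "pick_up", "walk_away"],
}

def expand(event):
    """Recursively expand a complex event into its elementary actions."""
    if event in ELEMENTARY:
        return [event]
    return [a for part in GRAMMAR[event] for a in expand(part)]

def matches(event, observed):
    """Check whether an observed action sequence realizes the event."""
    return expand(event) == list(observed)

print(expand("take_object"))
# ['approach', 'reach', 'grasp', 'lift', 'walk_away']
print(matches("take_object",
              ["approach", "reach", "grasp", "lift", "walk_away"]))  # True
```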

Slide 23: Human activity detection (Nevatia / Medioni / Cohen / Itti).

Slide 24: Low-level processing.

Slide 25: Spatio-temporal representation.


Slides 27-28: Modeling Events.


Slide 30: Several Problems… with the “progressive visual buffer” hypothesis:
- Change blindness: attention seems to be required for us to perceive changes in images, whereas they could easily be detected in a visual buffer!
- The amount of memory required is huge (see the back-of-envelope estimate below)!
- Interpretation of the buffer contents by high-level vision is very difficult if the buffer contains very detailed representations (Tsotsos, 1990)!
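
A back-of-envelope version of the memory argument, with all numbers assumed purely for illustration (the full visual field stored at foveal resolution):

```python
# All figures are rough, illustrative assumptions.
field_deg2       = 160 * 120       # visual field area, in squared degrees
samples_per_deg  = 120             # ~foveal sampling density (per degree)
bytes_per_sample = 4               # a few feature values per sample

snapshot = field_deg2 * samples_per_deg ** 2 * bytes_per_sample
print(f"{snapshot / 1e9:.1f} GB per scene snapshot")   # ~1.1 GB

fixations_per_day = 3 * 3600 * 16  # ~3 saccades/s over 16 waking hours
print(f"{snapshot * fixations_per_day / 1e12:.0f} TB/day if every fixation were kept")  # ~191 TB
```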

Slide 31: The World as an Outside Memory. Kevin O’Regan, early 1990s: why build a detailed internal representation of the world? It would be too complex… there is not enough memory… and it may be useless anyway! The world is the memory; attention and the eyes are a look-up tool.

Slide 32: The “Attention Hypothesis” (Rensink, 2000).
- No “integrative buffer.”
- Early processing extracts information up to “proto-object” complexity in a massively parallel manner.
- Attention is necessary to bind the different proto-objects into complete objects, as well as to bind object and location.
- Once attention leaves an object, the binding “dissolves.” This is not a problem: it can be formed again whenever needed, by shifting attention back to the object.
- Only a rather sketchy “virtual representation” is kept in memory, and attention/eye movements are used to gather details as needed (see the sketch below).
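
A minimal sketch of the resulting data structure, assuming a toy object model; the class, its fields, and the world.detail_at interface are illustrative assumptions, not Rensink's formalism.

```python
class VirtualScene:
    """Rensink-style 'virtual representation' sketch: only coarse proto-object
    summaries persist; a detailed binding exists only under current attention."""

    def __init__(self, proto_objects):
        self.protos = proto_objects   # {location: coarse summary}, computed in parallel
        self.bound = None             # at most one fully bound object at a time

    def attend(self, location, world):
        """Attention binds the proto-objects at a location into a coherent
        object, pulling details from the world itself (the 'outside memory')."""
        self.bound = (location, world.detail_at(location))  # hypothetical interface
        return self.bound

    def withdraw(self):
        """When attention leaves, the binding dissolves; the coarse summary
        remains, and the object can be re-bound on demand."""
        self.bound = None
```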


Slide 37: Back to accumulated evidence! Hollingworth et al., 2000 argue against the disintegration of coherent visual representations as soon as attention is withdrawn.
Experiment:
- line drawings of natural scenes;
- change one object (the target) during a saccadic eye movement away from that object;
- instruct subjects to examine the scene, telling them they will later be asked questions about what was in it;
- also instruct subjects to monitor for object changes and press a button as soon as a change is detected.
Hypothesis: it is known that attention precedes eye movements, so the change occurs outside the focus of attention. If subjects can notice it, some detailed memory of the object must be retained.

Slide 38: Hollingworth et al., 2000. Subjects do detect the change (26% correct overall), sometimes only noticing it long afterwards, on their next visit to the object.

Slide 39: Hollingworth et al. These results thus suggest that “the online representation of a scene can contain detailed visual information in memory from previously attended objects. Contrary to the proposal of the attention hypothesis (see Rensink, 2000), the results indicate that visual object representations do not disintegrate upon the withdrawal of attention.”

Slide 40: Gist of a Scene. Biederman, 1981: from a very brief exposure to a scene (120 ms or less), we can already extract a lot of information about its global structure, its category (indoors, outdoors, etc.), and some of its components.
“Riding the first spike”: 120 ms is about the time it takes the first spike to travel from the retina to IT (see the latency tally below)!
Thorpe, Van Rullen: very fast classification (down to 27 ms exposure, no mask), e.g., for tasks such as “was there an animal in the scene?”
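
A rough feedforward-latency tally behind the “first spike” argument, with assumed per-stage delays (the stage list and delay values are common approximations, used here only for illustration):

```python
# Assumed response latency added per stage along the ventral stream (ms).
stages   = ["retina", "LGN", "V1", "V2", "V4", "PIT", "AIT"]
delay_ms = [30, 10, 10, 10, 15, 15, 15]   # illustrative values only

print(f"~{sum(delay_ms)} ms from photoreceptors to IT")   # ~105 ms
```

With a single attention shift costing on the order of tens of milliseconds, a ~120 ms budget leaves room for at most a couple of shifts, which is the point of the slide.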


Slide 49: Gist of a Scene. Oliva & Schyns, Cognitive Psychology, 2000: investigate the effect of color on fast scene perception.
Idea: rather than looking at the properties of the constituent objects of a given scene, look at the global effect of color on recognition.
Hypothesis: diagnostic colors (predictive of scene category) will help recognition (see the sketch below).
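
A minimal sketch of the kind of coarse “colored blobs” descriptor this hypothesis points to: mean color over a small spatial grid, preserving layout but no object detail. The grid size and output format are illustrative assumptions, not Oliva & Schyns's actual stimuli or analysis.

```python
import numpy as np

def color_layout_gist(image, grid=4):
    """Coarse scene gist: mean color of each cell in a grid x grid partition.
    image: (H, W, 3) array. Returns a flattened 'colored blobs' summary that
    preserves spatial layout but none of the object-level detail."""
    h, w, _ = image.shape
    gist = np.zeros((grid, grid, 3))
    for i in range(grid):
        for j in range(grid):
            cell = image[i * h // grid:(i + 1) * h // grid,
                         j * w // grid:(j + 1) * w // grid]
            gist[i, j] = cell.mean(axis=(0, 1))     # mean color of this blob
    return gist.ravel()   # a 48-dim descriptor for grid=4
```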

Slides 50-51: Color & Gist.


Slide 53: Color & Gist. Conclusion from the Oliva & Schyns study: “colored blobs at a coarse spatial scale concur with luminance cues to form the relevant spatial layout that mediates express scene recognition.”


Slide 55: Outlook. It seems unlikely that we perceive scenes by building a progressive buffer and accumulating detailed evidence into it: it would take too many resources and be too complex to use.
Rather, we may only have the illusion of a detailed representation, with our eyes and attention available to fetch the details whenever they are needed: the world as an outside memory.
In addition to attention-based scene analysis, we are able to extract the gist of a scene very rapidly, much faster than we can shift attention around. This gist may be constructed by fairly simple processes that operate in parallel, and it can then be used to prime memory and attention.