
THE PROBLEM OF VISUAL RECOGNITION (Ch. 3, Farah)
- Why is it difficult to identify real-world objects from the retinal image?
- How many shape representations of each distinguishable object do we need in memory?
- What is the nature of shape representations in memory?
- How do we evaluate a proposed representation?
- What are the fundamental dimensions of a useful shape representation?

Just a small reminder

From image to object: A hard problem
- Why is it difficult to identify a distal object using only our 2D retinal image?
- The 2D retinal image is only partly determined by the shape of a distal object.

Shape of the 2D retinal image …
- The shape of an object's 2D retinal image varies with the spatial relation between the viewer (e.g., you) and the object.
- In a 2D image, a CUBE or SPHERE rarely appears square or round; more often it appears as a parallelogram or an oval, respectively.
- What else varies in a 2D retinal image?
  - Position and size of the object in the picture plane
  - Which surfaces are visible, foreshortened, or occluded
  - The presence or absence of shadows
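As a rough illustration of how viewing angle alone reshapes the image, here is a minimal sketch that assumes orthographic projection (a simplification of the eye's perspective optics) and a flat circular contour, say the rim of a cup. Tilting the circle in depth leaves its width unchanged but shrinks its image height by cos(tilt), so the circle becomes an oval. The function names are hypothetical.

```python
import math

def project_tilted_circle(tilt_deg, n=8):
    """Sample a unit circle, tilt it in depth about the x-axis, and project it orthographically."""
    t = math.radians(tilt_deg)
    points = []
    for k in range(n):
        a = 2 * math.pi * k / n
        x, y = math.cos(a), math.sin(a)
        # Tilting maps y -> y*cos(t) and z -> y*sin(t); the 2D image simply drops z.
        points.append((x, y * math.cos(t)))
    return points

def image_extent(points):
    """Width and height of the projected contour's bounding box."""
    xs = [px for px, _ in points]
    ys = [py for _, py in points]
    return max(xs) - min(xs), max(ys) - min(ys)

for tilt in (0, 45, 80):
    w, h = image_extent(project_tilted_circle(tilt))
    print(f"tilt {tilt:2d} deg: width {w:.2f}, height {h:.2f}")
```

The same object thus yields a family of different 2D shapes, one per tilt, which is the core of the recognition problem.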

Shape and Size Constancy

How do we identify the object shape from a 2D retinal image?
- Do we infer (compute) the shape from the image?
- Or do we learn a separate association between each view of an object and its identity?

Object recognition in normal humans: Two hypotheses
Human shape perception is:
- H1: Viewpoint dependent (Rock, Tarr)
- H2: Viewpoint independent (Marr, Biederman)

H1: Human shape perception is tied to viewing conditions.
- Novel perspectives of wire figures can be hard to identify (accuracy: 75% - 39% correct; Rock et al., 1981).
- But the same shapes with clay surfaces can be recognized from different perspectives (Farah et al., 1994).

H2: Human shape perception is independent of viewing conditions.
Mental rotation in the picture plane:
- We can name highly familiar letters and numbers equally fast and accurately at any orientation (Corballis, 1988).
- But orientation matters on first encounter (Jolicoeur, 1985; see figure).

Canonical perspective

Variation from canonical perspective
- Rotating an object in depth foreshortens its image and moves the view away from its canonical perspective.
- Time to name objects increases as they are rotated away from a canonical perspective (Palmer et al., 1981).
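Why does depth rotation matter more than picture-plane rotation? A minimal sketch, assuming orthographic projection and an arbitrary four-point object (both hypothetical): turning the object within the picture plane leaves every distance in its 2D image unchanged, whereas turning it in depth changes those distances, so the image itself becomes a different 2D shape.

```python
import math
from itertools import combinations

# Hypothetical 3D object: four points (x, y, z); the viewer looks along the z-axis.
OBJECT = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0), (0.0, 1.0, 1.0)]

def rotate_in_picture_plane(p, deg):
    """Rotate about the line of sight (z-axis)."""
    t = math.radians(deg)
    x, y, z = p
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t), z)

def rotate_in_depth(p, deg):
    """Rotate about the vertical (y) axis, turning parts toward or away from the viewer."""
    t = math.radians(deg)
    x, y, z = p
    return (x * math.cos(t) + z * math.sin(t), y, -x * math.sin(t) + z * math.cos(t))

def image_distances(points):
    """Sorted pairwise distances in the orthographic 2D image (z is dropped)."""
    image = [(x, y) for x, y, _ in points]
    return sorted(round(math.dist(a, b), 6) for a, b in combinations(image, 2))

base = image_distances(OBJECT)
plane = image_distances([rotate_in_picture_plane(p, 40) for p in OBJECT])
depth = image_distances([rotate_in_depth(p, 40) for p in OBJECT])
print(plane == base)  # same 2D shape, just turned
print(depth == base)  # image distances change (foreshortening)
```

A picture-plane rotation can in principle be undone by rotating the image; a depth rotation cannot be undone from the 2D image alone without knowing the hidden depth coordinates.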

Multiple Views Theory (Tarr, 1995)
- Shape representations in memory combine shape and viewpoint information (à la Rock).
- We transform perceptual representations to match stored shape representations across changes in viewpoint.
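The gist of a multiple-views account can be sketched as "store a few views, then pay a price to rotate the input to the nearest one". The sketch below is a toy under loud assumptions: the stored viewpoints, the linear cost, and `cost_per_deg` are all hypothetical and are not fitted to any data from Tarr or Palmer et al.

```python
# Hypothetical memory: for each object, the viewpoints (degrees) it has been seen from.
STORED_VIEWS = {"mug": [0, 90], "chair": [30, 150, 270]}

def angular_gap(a, b):
    """Smallest rotation (degrees) between two viewpoints on a circle."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def identification_cost(obj, view_deg, cost_per_deg=2.0):
    """Toy assumption: effort grows linearly with the rotation needed to
    bring the input into line with the nearest stored view."""
    nearest_gap = min(angular_gap(view_deg, v) for v in STORED_VIEWS[obj])
    return nearest_gap * cost_per_deg

for view in (90, 45, 300):
    print(view, identification_cost("mug", view))
```

Adding more stored views shrinks the worst-case gap, which is one way to see why such a theory favors several representations per object.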

General conclusions
- Our ability to identify "a familiar object from a novel image may depend strongly on the type, complexity, and familiarity of the object" (Farah, p. 68).
- Most likely, we have more than one shape representation in memory per distinguishable object.
- Two potential ways to identify an image:
  - Transform it to correspond to a familiar shape
  - Factor it into true object shape + viewing condition

Shape representation: a computational framework
- Some information about an object is EXPLICIT in the 2D retinal image (e.g., location in visual field, distance of parts from viewer).
- But much of the information important for visual recognition is only IMPLICIT in the retinal image (e.g., 3D shape, presence and shape of component parts).

What is the nature of the shape representation in memory?
- What are the criteria by which we can evaluate proposed shape representations?
- What goes into creating a shape representation?

Criteria for evaluating the usefulness of the internal shape representation for object recognition (Marr & Nishihara, 1978)
- Accessibility
- Scope
- Uniqueness
- Stability
- Sensitivity

Accessibility: Ease of deriving (recovering) shape information about an object from a 2D retinal image
- Human object perception is typically fast, effortless, and accurate.
- Hence, the relevant information should be recoverable from the 2D image with minimal demand on resources.

Scope: Range of stimuli over which a shape representation is effective
- Most machine vision representations are special-purpose systems that can only recognize stimuli in a limited domain (e.g., bank numbers, blocks world).
- In contrast, the human object recognition system is often viewed as a general-purpose system, capable of representing all types of stimuli (objects, faces, printed letters, handwriting).
From Palmer (1999)

Uniqueness: Assigning the same shape description to a given image of an object
- Describing an image of an object the same way on different occasions requires that the image always be coded using the same coordinate system.
- For example, assigning the same shape representation to a particular chair on different occasions requires that the chair be coded using the same coordinates on each occasion.

Stability: Assigning the same shape representation to images of the same object under different viewing conditions
- A stable representation captures the intrinsic shape of an object regardless of changes in image appearance due to shifts in location, perspective, lighting, or the position of moving parts (e.g., a cat in many positions).
- Stability also captures the similarity relations between images of similar objects (e.g., seeing a polar bear and a black bear both as bears, or seeing different black bears in different locations or on different occasions as bears).

Stability: Cats
- Cats have movable parts and can appear in different positions, colors, etc.
- A stable shape representation will capture the intrinsic shape of a cat, regardless of variation in the 2D retinal image.
From Kosslyn (1994)
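One way to make "stability" concrete is a descriptor that does not change when the image is shifted, turned in the picture plane, or scaled. The sketch below uses sorted pairwise distances normalized by the largest one; this is a textbook-style invariant chosen for illustration, not the representation Marr & Nishihara proposed, and the point sets are hypothetical.

```python
import math
from itertools import combinations

def descriptor(points):
    """Sorted pairwise distances, normalized by the largest one.

    Unchanged by translation, picture-plane rotation, and uniform scaling
    of the 2D points, so it is stable over those changes.
    """
    d = sorted(math.dist(a, b) for a, b in combinations(points, 2))
    return [round(x / d[-1], 6) for x in d]

def transform(points, deg, scale, dx, dy):
    """Rotate, scale, and shift a set of 2D points."""
    t = math.radians(deg)
    return [(scale * (x * math.cos(t) - y * math.sin(t)) + dx,
             scale * (x * math.sin(t) + y * math.cos(t)) + dy) for x, y in points]

triangle = [(0, 0), (4, 0), (0, 3)]
moved = transform(triangle, 33, 2.5, 10, -4)
other = [(0, 0), (4, 0), (2, 3)]
print(descriptor(triangle) == descriptor(moved))
print(descriptor(triangle) == descriptor(other))
```

The design trades sensitivity for stability: a mirror image also gets the same descriptor, and nothing here copes with depth rotation or a cat's moving parts.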

Sensitivity: The degree to which the shape representation codes (subtle) differences between similar shapes and different images of the same shape
- Making within-category discriminations: being able to distinguish between the shape representations of different bears (black bears, polar bears, grizzly bears), chairs (wooden chair, folding chair), and faces (your face, my face, your friend's face).

Four fundamental aspects of shape representation
- Marr: three dimensions of shape representation that must be specified in any computational model:
  - Coordinate system
  - Primitives
  - Organization
- Plaut & Farah: how the shape representation is implemented.

Coordinate system: A fundamental aspect of shape representation
- "… shape is nothing more than a set of locations occupied by an object" (Farah, 2000, p. 71), and hence these locations have to be represented relative to some coordinate system.
- Accessibility and stability trade off: highly accessible coordinate systems have low stability, and vice versa.

Three types of coordinate systems
- Viewer-centered
- Environment-centered
- Object-centered

Viewer-centered Coordinate System
- Locations are specified relative to the viewer: retina, head, hand, etc.
- Visual stimuli are initially represented in a retinotopic coordinate system (2D space with origin fixed with respect to the retina). If either the eyes or the object moves, the retinotopic representation changes.
- Very accessible, poor stability.

Viewer-centered photos

Environment-centered Coordinate System
- Locations of objects are specified relative to other objects in the environment.
- Stable over movements of the viewer, but not over movements of objects.
- Requires the viewer to continually update the spatial relationship of the environment to the viewer as the viewer moves about the environment. Accessibility is reduced.

Object-Centered Coordinate System
- Locations occupied by different parts of an object are represented in a coordinate system intrinsic to, or fixed relative to, the object.
- Mug: The handle is on the outside wall of a cylinder. This spatial relation stays the same regardless of viewing perspective.
- Position and orientation invariance yields perfect stability, but reduced accessibility.
- Interesting difficulty: How do you assign relations between parts before you recognize the object?
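The three coordinate systems can be compared by coding the same location, a mug's handle, in each frame and then moving first the viewer and then the mug. This is a 2D simplification; the scene positions, the table-corner landmark, and all function names are hypothetical. Note that the sketch is simply handed the mug's own position and heading, which sidesteps the "interesting difficulty" of finding the object frame before recognition.

```python
import math

def to_frame(point, origin, heading_deg):
    """Express a 2D world point in a frame with the given origin and x-axis direction."""
    t = math.radians(heading_deg)
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    # Rotate the offset by -heading so the frame's own x-axis lines up with (1, 0).
    return (round(dx * math.cos(t) + dy * math.sin(t), 6),
            round(-dx * math.sin(t) + dy * math.cos(t), 6))

TABLE_CORNER = (0.0, 0.0)  # hypothetical landmark anchoring the environment frame

def handle_position(mug_pos, mug_heading):
    """World position of a handle sitting one unit out along the mug's own x-axis."""
    t = math.radians(mug_heading)
    return (mug_pos[0] + math.cos(t), mug_pos[1] + math.sin(t))

def describe_handle(viewer_pos, viewer_heading, mug_pos, mug_heading):
    handle = handle_position(mug_pos, mug_heading)
    return {
        "viewer": to_frame(handle, viewer_pos, viewer_heading),
        "environment": to_frame(handle, TABLE_CORNER, 0),
        "object": to_frame(handle, mug_pos, mug_heading),
    }

start = describe_handle((0, -3), 90, (5, 5), 0)
viewer_moved = describe_handle((2, 0), 45, (5, 5), 0)
mug_moved = describe_handle((0, -3), 90, (3, 7), 120)
for name in ("viewer", "environment", "object"):
    print(name, start[name], viewer_moved[name], mug_moved[name])
```

The viewer-centered code changes whenever anything moves, the environment-centered code survives viewer movement but not mug movement, and the object-centered code stays fixed throughout, matching the stability ordering on the slides.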

Primitives: What is localized in space: contours, surfaces, or 3D shapes?
Contour-based primitives?
- Edges are extracted from the visual image early in cortical processing. They are relatively accessible, but have limited scope, and are not stable across viewing conditions, especially depth rotation.
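The accessibility of contour primitives can be illustrated by how little computation a crude edge detector needs: a few subtractions per pixel. The sketch below is a generic finite-difference gradient from image processing, offered as an illustration rather than a model of cortical edge detectors; the threshold and the toy image are assumptions.

```python
def edge_map(image, threshold=0.25):
    """Mark interior pixels whose central-difference gradient magnitude exceeds threshold."""
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2  # horizontal intensity change
            gy = (image[r + 1][c] - image[r - 1][c]) / 2  # vertical intensity change
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[r][c] = 1
    return edges

# A 5x6 grayscale image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
for row in edge_map(image):
    print(row)
```

The result is only a set of marked image locations: it says where intensity changes, not which surface or part of the object produced the change.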

Primitives cont.
Surface-based primitives?
- Evidence suggests simple cells in V1 actually code surfaces.
- Surfaces provide broader scope and better stability (Marr's 2½-D sketch).

Primitives cont…
Volume-based primitives:
- Although they are computationally difficult to derive from a 2D image, volume-based primitives seem ideal for object recognition.
- Marr's cylinders (upper figure)
- Biederman's geons (lower figure)

Biederman's GEON model
Some geons
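In Biederman's (1987) recognition-by-components proposal, geons are distinguished by a few contrastive attributes that stay constant over most viewpoints. The sketch below enumerates their combinations; the attribute names are a paraphrase, and the dictionary layout is a hypothetical encoding rather than any published implementation.

```python
from itertools import product

# Paraphrased geon attributes; each value is a viewpoint-robust (non-accidental) property.
GEON_ATTRIBUTES = {
    "cross_section_edge": ("straight", "curved"),
    "cross_section_symmetry": ("rotational+reflectional", "reflectional", "asymmetric"),
    "size_along_axis": ("constant", "expanding", "expanding+contracting"),
    "axis": ("straight", "curved"),
}

def all_geons():
    """Every combination of attribute values, one dict per candidate geon."""
    names = list(GEON_ATTRIBUTES)
    return [dict(zip(names, values)) for values in product(*GEON_ATTRIBUTES.values())]

CYLINDER = {"cross_section_edge": "curved", "cross_section_symmetry": "rotational+reflectional",
            "size_along_axis": "constant", "axis": "straight"}

geons = all_geons()
print(len(geons))          # 2 * 3 * 3 * 2 combinations
print(CYLINDER in geons)
```

Because each attribute is a coarse, categorical contrast, a small vocabulary of parts can be recovered from edges in many views, which is the accessibility argument for geons.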

Organization: Degree and type of relation among elements of shape representation
Are the elements:
- on the same scale, as in Biederman's geon model, or
- related hierarchically, as in Marr's model?
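The two kinds of organization can be contrasted as data structures: a flat list of same-scale parts plus relations, versus a tree in which each part can be decomposed into finer parts. Both descriptions below are hypothetical, the human figure loosely following the coarse-to-fine idea in Marr's model.

```python
# Hierarchical (Marr-style): each node is (part name, list of finer sub-parts).
HUMAN = ("body", [
    ("head", []),
    ("torso", []),
    ("arm", [("upper_arm", []), ("forearm", [("hand", [])])]),
    ("leg", [("thigh", []), ("shin", [("foot", [])])]),
])

# Flat (geon-style): parts at a single scale plus the relations between them.
MUG = {
    "parts": ["cylinder", "curved_handle"],
    "relations": [("curved_handle", "side-attached", "cylinder")],
}

def depth(node):
    """Number of levels of detail in a hierarchical description."""
    _, children = node
    return 1 + max((depth(c) for c in children), default=0)

def parts_at_level(node, level):
    """All parts at a given level of detail (0 = whole object)."""
    name, children = node
    if level == 0:
        return [name]
    return [p for c in children for p in parts_at_level(c, level - 1)]

print(depth(HUMAN))
print(parts_at_level(HUMAN, 1))
print(MUG["parts"], MUG["relations"])
```

A hierarchy lets a coarse level (body, arm, leg) be matched before finer detail (hand, foot) is available; a flat description has only one level to match.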

Recapping …
Have examined:
- Need for multiple shape representations in memory
- Criteria for evaluating shape representations
- Three coordinate systems
- Nature of the primitive elements
Taken together, the evidence suggests that object recognition may use an object-centered coordinate system, where volume-based primitive parts combine to represent objects.

Implementation
- Neural net modeling blurs the distinction between the algorithmic (computational processes involved in perception) and implementation (brain, machine) levels. Hence, consider two aspects here:
  - Nature of the computations underlying memory search differs between symbolic and neural net models.
  - Local vs. distributed representations

Models in Cognitive Psychology
Function:
- Help to organize what we know
- Help to identify gaps in our knowledge
- Are the source of testable hypotheses
- When implemented as a computer model, allow us to test the adequacy of the model

Main Types of Cognitive Models

Symbolic Models
- Parallel processing vs. serial processing
- Transformation of symbolic information from stage to stage

Nature of the computations underlying memory search
Symbolic model:
- The perceptual representation is separate from the stored shape representation in memory.
- The comparison process is separate from knowledge.
- Explicitly compares input (perceptual representation) to memory (stored shape representations).
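A symbolic model of this kind can be sketched with three separate pieces: a memory of stored descriptions, a perceptual description of the input, and a comparison procedure that searches the memory. The feature names, stored entries, and match-counting rule below are hypothetical, and the search is serial, which is only one of the options the slides mention.

```python
# Stored knowledge: a separate memory of shape descriptions (hypothetical features).
MEMORY = {
    "mug":   {"parts": 2, "has_handle": True,  "elongated": False},
    "pen":   {"parts": 1, "has_handle": False, "elongated": True},
    "chair": {"parts": 5, "has_handle": False, "elongated": False},
}

def compare(percept, stored):
    """Separate comparison process: count the feature values that match."""
    return sum(percept.get(k) == v for k, v in stored.items())

def recognize(percept):
    """Explicit search: compare the input with every stored entry in turn."""
    scores = {name: compare(percept, desc) for name, desc in MEMORY.items()}
    return max(scores, key=scores.get), scores

percept = {"parts": 2, "has_handle": True, "elongated": False}
best, scores = recognize(percept)
print(best, scores)
```

Perception (building `percept`), memory (`MEMORY`), and process (`compare`) are distinct objects here, which is exactly the separation that neural net models blur.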

Neural Net Models
- Simple units: nodes organized in layers (input, hidden, output)
- Activation level of unit
- Connections between units
- Connection weights

Computations underlying "memory search" in neural net models
- The pattern of activation across units corresponds to the recognized object, jointly determined by the input activation and the weights of the network (system knowledge).
- It is difficult to distinguish structure from process, or perception from memory.
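By contrast, in a neural net the "answer" is just the activation pattern that the input produces when it flows through the weights; there is no separate list to search. The sketch below is a tiny feedforward network with hand-set weights; the features, weight values, and the two output units are hypothetical, and no learning is shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Each unit's activation is a squashed weighted sum of the previous layer."""
    return [sigmoid(sum(w * a for w, a in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical hand-set weights (the network's "knowledge").
# Input units: [has_handle, curved_body, elongated]; output units: [mug, pen].
W_HIDDEN = [[4.0, 4.0, -4.0], [-4.0, -4.0, 4.0]]
B_HIDDEN = [-2.0, -2.0]
W_OUT = [[6.0, -6.0], [-6.0, 6.0]]
B_OUT = [0.0, 0.0]

def forward(features):
    """Input -> hidden -> output activations."""
    hidden = layer(features, W_HIDDEN, B_HIDDEN)
    return layer(hidden, W_OUT, B_OUT)

mug_pattern = forward([1, 1, 0])
pen_pattern = forward([0, 0, 1])
print([round(a, 3) for a in mug_pattern])
print([round(a, 3) for a in pen_pattern])
```

Changing the weights changes which output pattern the same input produces, so the stored knowledge and the recognition process live in the same place.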

Local vs. distributed representations
- Local: one-to-one mapping of the things doing the representing to that which is being represented (e.g., grandmother cells).
- Distributed: many-to-many mapping of the things doing the representing onto the things being represented; a pattern of activation over many units.

Distributed representations …

- Represent and retrieve information efficiently in a network of highly interconnected representational units (like neurons in the brain)
- Allow a greater number of entities to be represented within a given number of units
- Degrade gracefully
- Automatically generalize (but this can cause interference)
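Two of these properties, capacity and graceful degradation, can be shown with a handful of units. The sketch assumes ±1 distributed codes taken from a Hadamard matrix (chosen because its rows are mutually orthogonal, which makes the outcome predictable) and one dedicated unit per item for the local code; the item names and which units are "damaged" are hypothetical.

```python
N_UNITS = 8

def hadamard(n):
    """Sylvester construction: rows are mutually orthogonal +/-1 patterns (n a power of 2)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

ITEMS = ["grandmother", "cat", "mug", "chair"]
# Distributed: every item is a pattern over ALL units (hypothetical code assignment).
DISTRIBUTED = {name: row for name, row in zip(ITEMS, hadamard(N_UNITS)[1:])}
# Local: one dedicated unit per item ("grandmother cell" style).
LOCAL = {name: [1 if i == k else 0 for i in range(N_UNITS)] for k, name in enumerate(ITEMS)}

def damage(pattern, dead_units):
    """Silence the given units (set their activation to 0)."""
    return [0 if i in dead_units else a for i, a in enumerate(pattern)]

def decode(pattern, codes):
    """Return the item whose code best matches the pattern, or None if nothing matches."""
    scores = {name: sum(a * b for a, b in zip(pattern, code)) for name, code in codes.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

dead = {LOCAL["mug"].index(1), 5}  # knock out the mug's local unit plus one other unit
print(decode(damage(DISTRIBUTED["mug"], dead), DISTRIBUTED))  # still recovered
print(decode(damage(LOCAL["mug"], dead), LOCAL))              # lost entirely
print(N_UNITS, 2 ** N_UNITS)  # items codable locally vs. distinct on/off patterns
```

With orthogonal codes, silencing k of 8 units leaves the correct item a match score of 8 - k while any other item scores at most k, so recovery survives up to 3 dead units. The 2 ** N_UNITS figure counts distinct patterns, an upper bound rather than the number that can be told apart reliably.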

Onward to Object Recognition
- Chapter 4: Object Recognition
- Chapter 5: Face Recognition
- Chapter 6: Word Recognition