1 THE PROBLEM OF VISUAL RECOGNITION (Ch. 3, Farah)
Why is it difficult to identify real-world objects from the retinal image?
How many shape representations of each distinguishable object do we need in memory?
What is the nature of shape representations in memory?
How do we evaluate a proposed representation?
What are the fundamental dimensions of a useful shape representation?

2 Just a small reminder

3 From image to object: A hard problem
Why is it difficult to identify a distal object using only our 2D retinal image? Because the 2D retinal image is only partly determined by the shape of the distal object.

4 Shape of the 2D retinal image …
The shape of the 2D retinal image of an object varies with the spatial relation between the viewer (e.g., you) and the object. In a 2D image, the shape of a CUBE or SPHERE is rarely square or round; more often it appears as a parallelogram or an oval, respectively.
What else varies in a 2D retinal image? The position and size of the object in the picture plane; which surfaces are visible, foreshortened, or occluded; and the presence or absence of shadows.
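The point that one object yields many different images can be made concrete with a short Python sketch. It is a simplification, not a model of the retina: it assumes orthographic projection onto a flat picture plane (depth is simply dropped) and rotates a square face about a single vertical axis, so the square narrows into a rectangle; rotating about two axes at once would produce parallelograms. All names (`rotate_in_depth`, `project`, `image_size`) are illustrative.

```python
import math

def rotate_in_depth(point, degrees):
    """Rotate a 3D point (x, y, z) about the vertical y-axis (rotation in depth)."""
    x, y, z = point
    t = math.radians(degrees)
    return (x * math.cos(t) + z * math.sin(t), y, -x * math.sin(t) + z * math.cos(t))

def project(point):
    """Orthographic projection onto the picture plane: drop the depth coordinate."""
    x, y, _ = point
    return (x, y)

# A unit square facing the viewer (lying in the z = 0 plane), centered on the origin.
SQUARE = [(-0.5, -0.5, 0.0), (0.5, -0.5, 0.0), (0.5, 0.5, 0.0), (-0.5, 0.5, 0.0)]

def image_size(shape, degrees):
    """Width and height of the 2D image of `shape` after rotating it in depth."""
    pts = [project(rotate_in_depth(p, degrees)) for p in shape]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return max(xs) - min(xs), max(ys) - min(ys)

for angle in (0, 30, 60, 85):
    width, height = image_size(SQUARE, angle)
    print(f"rotated {angle:2d} deg: image is {width:.2f} wide x {height:.2f} high")
```

The object never changes, yet the width of its image shrinks with the cosine of the rotation angle (1.00, 0.87, 0.50, 0.09 for the angles above), which is the kind of variation a recognition system has to see through.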

5 Shape and Size Constancy

6 How do we identify the object shape from a 2D retinal image?
Do we infer (compute) the shape from the image? Or do we learn a separate association between each view of an object and its identity?

7 Object recognition in normal humans: Two hypotheses
Human shape perception is:
H1: Viewpoint dependent (Rock, Tarr)
H2: Viewpoint independent (Marr, Biederman)

8 H1: Human shape perception is tied to viewing conditions.
Novel perspectives of wire figures can be hard to identify (accuracy ranged from 75% to 39% correct; Rock et al., 1981). But the same shapes with clay surfaces can be recognized from different perspectives (Farah et al., 1994).

9 H2: Human shape perception is independent of viewing conditions.
Mental rotation in the picture plane: we can name highly familiar letters and numbers equally quickly and accurately at any orientation (Corballis, 1988). But orientation matters on first encounter (Jolicoeur, 1985, see figure).

10 Canonical perspective

11 Variation from canonical perspective
Rotating an object in depth foreshortens its image and moves it away from its canonical perspective. Time to name objects increases as they are rotated away from a canonical perspective (Palmer et al., 1981).

12 Multiple Views Theory (Tarr, 1995)
Shape representations in memory combine shape and viewpoint information (à la Rock). We transform perceptual representations to match shape representations across changes in viewpoint.
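A minimal sketch of the matching step in a multiple-views account, under assumptions labeled here because they are not from the source: each object is stored as a few views, each view is reduced to a single rotation angle, the angles in `STORED_VIEWS` are invented for illustration, and the size of the required transformation is taken to be the angular distance to the nearest stored view. That last assumption echoes the naming-time result on slide 11; it is not a reproduction of Tarr's model.

```python
def angular_distance(a, b):
    """Smallest difference between two viewing angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

# Hypothetical memory: each object is stored as a few familiar views (angles in degrees).
STORED_VIEWS = {
    "mug": [0, 90],
    "chair": [30, 210],
}

def nearest_view(obj, seen_angle):
    """Return the stored view closest to the current view, and how far the
    perceptual representation would have to be rotated to match it."""
    best = min(STORED_VIEWS[obj], key=lambda v: angular_distance(v, seen_angle))
    return best, angular_distance(best, seen_angle)

print(nearest_view("mug", 80))     # (90, 10): near a familiar view, small transformation
print(nearest_view("mug", 350))    # (0, 10): angles wrap around at 360 degrees
print(nearest_view("chair", 150))  # (210, 60): far from any stored view, larger transformation
```

Under this assumption, views close to a stored view should be matched quickly and views far from every stored view slowly, which is the viewpoint-dependent pattern H1 predicts.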

13 General conclusions
Our ability to identify “a familiar object from a novel image may depend strongly on the type, complexity, and familiarity of the object” (Farah, p. 68).
Most likely, we have more than one shape representation in memory per distinguishable object.
Two potential ways to identify an image: transform it to correspond to a familiar shape, or factor it into the true object shape plus the viewing condition.

14 Shape representation: a computational framework
Some information about an object is EXPLICIT in the 2D retinal image (e.g., location in the visual field, distance of parts from the viewer). But much of the information important for visual recognition is only IMPLICIT in the retinal image (e.g., 3D shape, presence and shape of component parts).

15 What is the nature of the shape representation in memory?
What are the criteria by which we can evaluate proposed shape representations? What goes into creating a shape representation?

16 Criteria for evaluating the usefulness of the internal shape representation for object recognition (Marr & Nishihara, 1978)
Accessibility
Scope
Uniqueness
Stability
Sensitivity

17 Accessibility: Ease of deriving (recovering) shape information about an object from a 2D retinal image
Human object perception is typically fast, effortless, and accurate. Hence, the relevant information should be recoverable from the 2D image with minimal demand on resources.

18 Scope: Range of stimuli over which a shape representation is effective
Most machine vision representations are special-purpose systems that can recognize stimuli only in a limited domain (e.g., bank numbers, blocks world). In contrast, the human object recognition system is often viewed as a general-purpose system, capable of representing all types of stimuli (objects, faces, printed letters, handwriting).
From Palmer (1999)

19 Uniqueness: Assigning the same shape description to a given image of an object
Describing an image of an object the same way on different occasions requires that the image always be coded in the same coordinate system. For example, assigning the same shape representation to a particular chair on different occasions requires that the chair be coded in the same coordinates on each occasion.

20 Stability: Assigning the same shape representation to images of the same object under different viewing conditions
A stable representation captures the intrinsic shape of an object regardless of changes in image appearance due to shifts in location, perspective, lighting, or the position of moving parts (e.g., a cat in many positions). Stability also captures the similarity relations between images of similar objects (e.g., seeing a polar bear and a black bear as bears, or seeing different black bears in different locations or on different occasions as bears).

21 Stability: Cats
Cats have movable parts and can appear in different positions, colors, etc. A stable shape representation will capture the intrinsic shape of a cat regardless of variation in the 2D retinal image.
From Kosslyn (1994)

22 Sensitivity: The degree to which the shape representation codes (subtle) differences between similar shapes and different images of the same shape
Making within-category discriminations: being able to distinguish between the shape representations of different bears (black bears, polar bears, grizzly bears), chairs (wooden chair, folding chair), and faces (your face, my face, your friend's face).

23 Four fundamental aspects of shape representation
Marr: three dimensions of shape representation that must be specified in any computational model: coordinate system, primitives, and organization.
Plaut & Farah: how the shape representation is implemented.

24 Coordinate system: A fundamental aspect of shape representation
“… shape is nothing more than a set of locations occupied by an object” (Farah, 2000, p. 71); hence, these locations have to be represented relative to some coordinate system.
Accessibility and stability trade off: highly accessible coordinate systems have low stability, and vice versa.

25 Three types of coordinate systems
Viewer centered
Environment centered
Object centered

26 Viewer-centered Coordinate System
Locations are specified relative to the viewer (retina, head, hand, etc.). Visual stimuli are initially represented in a retinotopic coordinate system (a 2D space with its origin fixed with respect to the retina). If either the eyes or the object move, the retinotopic representation changes. Very accessible, poor stability.
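A toy illustration of why a retinotopic code is both accessible and unstable. It assumes a flat 2D "retina" and eye movements that only translate the point of fixation; the scene coordinates and the function name `retinotopic` are hypothetical.

```python
def retinotopic(point, gaze):
    """Viewer-centered (retinotopic) location: position relative to the fixation point."""
    return (point[0] - gaze[0], point[1] - gaze[1])

# Hypothetical scene: the corners of a picture frame on a wall (arbitrary units).
FRAME = [(2, 1), (4, 1), (4, 3), (2, 3)]

before = [retinotopic(p, gaze=(0, 0)) for p in FRAME]
after = [retinotopic(p, gaze=(3, 2)) for p in FRAME]  # the eyes move; the frame does not
print(before)  # [(2, 1), (4, 1), (4, 3), (2, 3)]
print(after)   # [(-1, -1), (1, -1), (1, 1), (-1, 1)]
```

Computing each location takes one subtraction (high accessibility), but every eye movement produces a different description of the same unchanged frame (poor stability).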

27 Viewer-centered photos

28 Environment-centered Coordinate System
Locations of objects are specified relative to other objects in the environment. Stable over movements of the viewer, but not over movements of objects. Requires the viewer to continually update the spatial relationship of the environment to the viewer as the viewer moves about the environment, so accessibility is reduced.
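The stability pattern of an environment-centered code can be sketched in a few lines. This assumes a 2D room, a viewer who only translates (head and eye rotations are ignored), and a door chosen as the landmark; the room layout and function names are hypothetical. Note that the environment-centered location has to be computed from two viewer-centered measurements, which is the extra updating work that reduces accessibility.

```python
def viewer_centered(world_point, viewer):
    """Location of a point relative to the viewer's current position."""
    return (world_point[0] - viewer[0], world_point[1] - viewer[1])

def environment_centered(obj_view, landmark_view):
    """Location of an object relative to a landmark, computed from two
    viewer-centered locations measured from the same vantage point."""
    return (obj_view[0] - landmark_view[0], obj_view[1] - landmark_view[1])

# Hypothetical room layout (world coordinates, arbitrary units).
door, lamp = (0, 0), (5, 2)

for viewer in [(1, 1), (4, -3)]:  # the viewer walks around the room
    rel = environment_centered(viewer_centered(lamp, viewer), viewer_centered(door, viewer))
    print(viewer, rel)  # (5, 2) from both positions: stable over viewer movement

lamp = (6, 2)  # now the lamp itself is moved
print(environment_centered(viewer_centered(lamp, (1, 1)), viewer_centered(door, (1, 1))))  # (6, 2)
```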

29 Object-Centered Coordinate System
Locations occupied by different parts of an object are represented in a coordinate system intrinsic to, or fixed relative to, the object. Mug: the handle is on the outside wall of a cylinder, and this spatial relation stays the same regardless of viewing perspective. Position and orientation invariance yields perfect stability, but reduced accessibility. Interesting difficulty: how do you assign relations between parts before you recognize the object?
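A minimal sketch of an object-centered description, restricted to the picture plane (2D rotation and translation only, no rotation in depth). The frame is anchored on two reference points assumed to be the bottom and top of the mug's cylinder, and the coordinates are hypothetical. Building the frame requires already knowing which points are the bottom and top of the mug, which is exactly the difficulty raised on the slide.

```python
import math

def to_object_frame(point, origin, axis_point):
    """Express `point` in a frame fixed to the object: origin at `origin`,
    first axis pointing from `origin` towards `axis_point`."""
    ax, ay = axis_point[0] - origin[0], axis_point[1] - origin[1]
    length = math.hypot(ax, ay)
    ux, uy = ax / length, ay / length  # unit vector along the object's main axis
    px, py = point[0] - origin[0], point[1] - origin[1]
    # Coordinates along the main axis and along its perpendicular, rounded to hide float noise.
    return (round(px * ux + py * uy, 6), round(-px * uy + py * ux, 6))

def move(point, degrees, dx, dy):
    """Rotate a 2D point about the image origin, then translate it
    (a stand-in for the mug turning and shifting in the picture plane)."""
    t = math.radians(degrees)
    x, y = point
    return (x * math.cos(t) - y * math.sin(t) + dx, x * math.sin(t) + y * math.cos(t) + dy)

# Hypothetical 2D mug: bottom and top of the cylinder define the object's axis.
bottom, top, handle = (0.0, 0.0), (0.0, 2.0), (1.0, 1.0)
print(to_object_frame(handle, bottom, top))  # (1.0, -1.0)

# Turn and shift the whole mug: its image coordinates change ...
moved_bottom, moved_top, moved_handle = (move(p, 70, 5, -3) for p in (bottom, top, handle))
print(handle, "->", moved_handle)
# ... but the handle's object-centered description does not.
print(to_object_frame(moved_handle, moved_bottom, moved_top))  # (1.0, -1.0)
```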

30 Primitives: What is localized in space: Contours, surfaces, or 3D shapes?
Contour-based primitives? Edges are extracted from the visual image early in cortical processing. They are relatively accessible, but have limited scope, and are not stable across viewing conditions, especially depth rotation.
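Why edges count as accessible can be shown with a deliberately simplified stand-in for early edge extraction: a single row of hypothetical luminance values and a threshold on the difference between neighboring pixels. Real early vision uses oriented, multi-scale filters; the threshold value and the function name `edges` are assumptions made for illustration.

```python
def edges(row, threshold=0.5):
    """Return positions where luminance changes sharply between neighboring pixels."""
    return [i for i in range(1, len(row)) if abs(row[i] - row[i - 1]) > threshold]

# Hypothetical row of luminance values: dark background, bright object, dark background.
row = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1]
print(edges(row))  # [3, 7]
```

Each edge needs only a local comparison, so it is cheap to compute, but the positions returned are image positions: they shift whenever the object's image changes, which is why contour primitives are not stable across viewing conditions.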

31 Primitives cont.
Surface-based primitives? Evidence suggests simple cells in V1 actually code surfaces. Surfaces provide broader scope and better stability (Marr's 2½-D sketch).

32 Primitives cont…
Volume-based primitives: although they are computationally difficult to derive from a 2D image, volume-based primitives seem ideal for object recognition. Examples: Marr's cylinders (upper figure) and Biederman's geons (lower figure).

33 Biederman’s GEON model Some geons

34 Organization: Degree and type of relation among elements of shape representation
Are the elements on the same scale, as in Biederman's geon model, or related hierarchically, as in Marr's model?
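The two kinds of organization can be contrasted as data structures. This is a sketch in the spirit of the two models, not their actual formalisms: the part labels, the relation name "side-attached-to", and the decomposition of the human figure are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Part:
    name: str                                     # hypothetical part label
    children: list = field(default_factory=list)  # finer-scale parts, if any

# Flat organization (in the spirit of Biederman): all parts at one scale,
# linked by pairwise relations.
mug_flat = {
    "parts": ["cylinder", "curved tube"],
    "relations": [("curved tube", "side-attached-to", "cylinder")],
}

# Hierarchical organization (in the spirit of Marr): coarse parts are
# decomposed into finer parts at the next level down.
human = Part("body", [
    Part("head"),
    Part("torso"),
    Part("arm", [Part("upper arm"), Part("forearm", [Part("hand")])]),
    Part("leg", [Part("thigh"), Part("lower leg", [Part("foot")])]),
])

def depth(part):
    """Number of levels in a hierarchical description."""
    return 1 + max((depth(c) for c in part.children), default=0)

print(depth(human))  # 4
```

In the flat description every part sits at one level and structure lives in the relations; in the hierarchical one, structure lives in the nesting, so a coarse description (body, head, arms, legs) is available before the fine details.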

35 Recapping …
We have examined: the need for multiple shape representations in memory; criteria for evaluating shape representations; three coordinate systems; and the nature of the primitive elements.
Taken together, the evidence suggests that object recognition may use an object-centered coordinate system, where volume-based primitive parts combine to represent objects.

36 Implementation
Neural net modeling blurs the distinction between the algorithmic level (the computational processes involved in perception) and the implementation level (brain, machine). Hence, we consider two aspects here: the nature of the computations underlying memory search, which differs between symbolic and neural net models; and local vs. distributed representations.

37 Models in Cognitive Psychology
Function: models help to organize what we know, help to identify gaps in our knowledge, and are the source of testable hypotheses. When implemented as a computer program, a model allows us to test its adequacy.

38 Main Types of Cognitive Models

39 Symbolic Models
Parallel processing vs. serial processing
Transformation of symbolic information from stage to stage

40 Nature of the computations underlying memory search
Symbolic model: the perceptual representation is separate from the stored shape representations in memory, and the comparison process is separate from knowledge. The model explicitly compares the input (perceptual representation) with memory (shape representations in memory).
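A minimal sketch of memory search in a symbolic model. The feature names, stored values, and the mismatch measure (a sum of absolute feature differences) are hypothetical; the point is the architecture: the stored representations (`MEMORY`) are a separate structure from the comparison procedure, and recognition is an explicit, serial comparison of the input with each stored entry.

```python
# Stored shape representations, kept apart from the comparison procedure.
# Feature names and values are hypothetical.
MEMORY = {
    "mug":    {"parts": 2, "has_handle": 1, "elongation": 0.8},
    "chair":  {"parts": 5, "has_handle": 0, "elongation": 1.5},
    "bucket": {"parts": 2, "has_handle": 1, "elongation": 1.0},
}

def mismatch(percept, stored):
    """Explicit comparison: sum of absolute feature differences."""
    return sum(abs(percept[k] - stored[k]) for k in stored)

def recognize(percept):
    """Serially compare the perceptual representation with each stored representation."""
    best_name, best_score = None, float("inf")
    for name, stored in MEMORY.items():
        score = mismatch(percept, stored)
        if score < best_score:
            best_name, best_score = name, score
    return best_name

print(recognize({"parts": 2, "has_handle": 1, "elongation": 0.85}))  # mug
```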

41 Neural Net Models
Simple units: nodes organized in layers (input, hidden, output)
Activation level of each unit
Connections between units
Connection weights

42 Computations underlying “memory search” in neural net models
In neural net models, the pattern of activation across units corresponds to the recognized object and is jointly determined by the input activation and the weights of the network (system knowledge). It is difficult to distinguish structure from process, or perception from memory.
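For contrast with the symbolic sketch, here is a tiny feedforward network in which "recognition" is just the output activation pattern. The weights are hand-set hypothetical values rather than learned ones, and the three input features are invented; the point is that there is no separate memory store or comparison step, because the same weights that hold the network's knowledge also carry out the processing.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Activation of each unit = squashed weighted sum of its inputs plus a bias."""
    return [sigmoid(sum(w * a for w, a in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Hand-set (hypothetical) weights: this is the network's "knowledge".
# Input units: [round contour, handle-like curve, legs]
W_HIDDEN = [[2.0, 2.0, -2.0],    # hidden unit driven by cup-like input
            [-2.0, -1.0, 3.0]]   # hidden unit driven by chair-like input
B_HIDDEN = [-1.0, -1.0]
W_OUT = [[3.0, -3.0],            # output unit "mug"
         [-3.0, 3.0]]            # output unit "chair"
B_OUT = [0.0, 0.0]

def forward(x):
    """Input activation and weights jointly determine the output pattern."""
    return layer(layer(x, W_HIDDEN, B_HIDDEN), W_OUT, B_OUT)

print([round(a, 2) for a in forward([1, 1, 0])])  # "mug" unit high, "chair" unit low
print([round(a, 2) for a in forward([0, 0, 1])])  # "chair" unit high, "mug" unit low
```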

43 Local vs. distributed representations
Local: one-to-one mapping of the things doing the representing to the things being represented (e.g., grandmother cells).
Distributed: many-to-many mapping of the things doing the representing onto the things being represented; a pattern of activation over many units.
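The capacity difference between the two schemes can be counted directly. This sketch assumes binary (on/off) units, which is a simplification; real distributed codes typically use graded activations, and not every one of the possible patterns need be a useful code.

```python
from itertools import product

N_UNITS = 4

# Local ("grandmother cell") code: one dedicated unit per item.
local_codes = [tuple(1 if i == j else 0 for i in range(N_UNITS)) for j in range(N_UNITS)]

# Distributed code: each item is a pattern of activation over all the units.
distributed_codes = list(product([0, 1], repeat=N_UNITS))

print(len(local_codes))        # 4 items with 4 units
print(len(distributed_codes))  # 16 distinct patterns with the same 4 units
```

With n binary units a local code can represent n items, while a distributed code has up to 2^n distinct patterns available.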

44 Distributed representations …

45 Distributed representations represent and retrieve information efficiently in a network of highly interconnected representational units (like neurons in the brain), allow a greater number of entities to be represented within a given number of units, degrade gracefully, and automatically generalize (but this can cause interference).
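Graceful degradation can be illustrated with a small sketch. It assumes binary codes, "damage" that fixes one unit's activation at 0, and retrieval by nearest stored pattern (Hamming distance); the item codes are hypothetical and were chosen so that every pair differs on at least four units.

```python
def hamming(a, b):
    """Number of units on which two activation patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def damage(pattern, unit):
    """Simulate losing one unit: its activation is stuck at 0."""
    return tuple(0 if i == unit else v for i, v in enumerate(pattern))

def retrieve(pattern, stored):
    """Return the stored item whose pattern is closest to the (possibly damaged) input."""
    return min(stored, key=lambda name: hamming(stored[name], pattern))

# Hypothetical distributed codes: each item is spread over 4 of 6 units.
DISTRIBUTED = {
    "cat":  (1, 1, 1, 1, 0, 0),
    "dog":  (1, 1, 0, 0, 1, 1),
    "bear": (0, 0, 1, 1, 1, 1),
}

for name, code in DISTRIBUTED.items():
    print(name, "->", retrieve(damage(code, 0), DISTRIBUTED))  # each item is still retrieved

# Local code: "cat" lives entirely in unit 0, so losing that unit erases it.
local_cat = (1, 0, 0)
print(damage(local_cat, 0))  # (0, 0, 0): no trace of "cat" remains
```

Because each item is carried by several units, losing any one unit leaves the damaged pattern closer to its own item than to any other, whereas in the local code the same loss wipes the item out.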

46 Onward to Object Recognition
Chapter 4: Object recognition
Chapter 5: Face recognition
Chapter 6: Word recognition

