CS664, USC, Spring 2002. Lecture 6: Object Recognition (Michael Arbib & Laurent Itti). Reading Assignments: None.


Four stages of representation (Marr, 1982): 1) pixel-based (light intensity); 2) primal sketch (discontinuities in intensity); 3) 2½-D sketch (oriented surfaces, relative depth between surfaces); 4) 3-D model (shapes, spatial relationships, volumes). Problem: computationally intractable!
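As a minimal illustration of stage 2, a primal sketch can be approximated as thresholded intensity discontinuities. This is a toy sketch, assuming a grayscale array; the threshold value is arbitrary:

```python
import numpy as np

def primal_sketch(image, threshold=0.2):
    """Mark discontinuities in light intensity (a crude 'primal sketch')
    by thresholding the gradient magnitude of a pixel-based image."""
    gy, gx = np.gradient(image.astype(float))
    return np.hypot(gx, gy) > threshold

# Toy image: dark left half, bright right half -> edges along the boundary.
img = np.zeros((4, 4))
img[:, 2:] = 1.0
edges = primal_sketch(img)
```

Real primal-sketch computation also extracts blobs, bars, and terminations at multiple scales; this only shows the core idea of marking intensity discontinuities.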

Challenges of Object Recognition The binding problem: binding different features (color, orientation, etc.) into a unitary percept (see next slide). Bottom-up vs. top-down processing: how much is assumed top-down vs. extracted from the image? Perception vs. recognition vs. categorization: seeing an object vs. seeing it as something; matching views of known objects to memory vs. matching a novel object to object categories in memory. Viewpoint invariance: a major issue is recognizing objects irrespective of the viewpoint from which we see them.

Viewpoint Invariance A major problem for recognition. Biederman & Gerhardstein, 1994: we can recognize two views of an unfamiliar object as being the same object. Thus, viewpoint invariance cannot rely only on matching views to memory.

Models of Object Recognition (see Hummel, 1995, The Handbook of Brain Theory & Neural Networks). Direct Template Matching: a processing hierarchy yields activation of view-tuned units; a collection of view-tuned units is associated with one object. View-tuned units are built from V4-like units, using sets of weights that differ for each object. E.g., Poggio & Edelman, 1990; Riesenhuber & Poggio, 1999.
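A toy sketch of the view-tuned-unit idea (not the actual Poggio & Edelman implementation): each view-tuned unit is a Gaussian radial-basis unit centered on one stored view, and an object unit pools the view-tuned units for that object. The 2-D "feature vectors" and sigma below are made up for illustration:

```python
import numpy as np

def view_tuned_response(features, stored_view, sigma=1.0):
    """A view-tuned unit: Gaussian (RBF) activation peaked at one stored view."""
    d2 = np.sum((features - stored_view) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def object_response(features, stored_views, sigma=1.0):
    """An object unit pools the collection of view-tuned units for that object."""
    return max(view_tuned_response(features, v, sigma) for v in stored_views)

# Made-up 'V4-like' feature vectors representing two objects' stored views.
views_a = [np.array([1.0, 0.0]), np.array([0.8, 0.2])]
views_b = [np.array([0.0, 1.0])]
probe = np.array([0.9, 0.1])   # a new view, closer to object A's stored views
```

Because the probe lies near object A's stored views, `object_response(probe, views_a)` exceeds `object_response(probe, views_b)`: recognition falls out of which object's view collection matches best.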

Computational Model of Object Recognition (Riesenhuber and Poggio, 1999)

The model neurons are tuned to the size and 3-D orientation of the object.

Models of Object Recognition Hierarchical Template Matching: the image is passed through layers of units with progressively more complex features at progressively less specific locations. Hierarchical in that features at one stage are built from features at earlier stages. E.g., Fukushima & Miyake (1982)'s Neocognitron: several processing layers, comprising simple (S) and complex (C) cells. S-cells in one layer respond to conjunctions of C-cells in the previous layer; C-cells in one layer are excited by small neighborhoods of S-cells.
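The S/C alternation can be sketched in a few lines of NumPy. This is a toy version, not Fukushima & Miyake's actual network; the kernel, threshold, and pooling size are illustrative:

```python
import numpy as np

def s_layer(image, kernels, threshold=0.5):
    """S-cells: each map responds to a conjunction (template) of inputs
    from the previous layer at a specific location."""
    h, w = image.shape
    kh, kw = kernels[0].shape
    out = np.zeros((len(kernels), h - kh + 1, w - kw + 1))
    for k, kernel in enumerate(kernels):
        for i in range(h - kh + 1):
            for j in range(w - kw + 1):
                out[k, i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out - threshold, 0.0)   # rectify weak responses

def c_layer(s_maps, pool=2):
    """C-cells: excited by a small neighborhood of S-cells (max pooling),
    which buys tolerance to the exact stimulus position."""
    k, h, w = s_maps.shape
    out = np.zeros((k, h // pool, w // pool))
    for i in range(h // pool):
        for j in range(w // pool):
            out[:, i, j] = s_maps[:, i * pool:(i + 1) * pool,
                                  j * pool:(j + 1) * pool].max(axis=(1, 2))
    return out

# A vertical-bar detector applied to a tiny image containing a vertical bar.
img = np.zeros((4, 4))
img[:, 1] = 1.0
kernels = [np.array([[1.0, 0.0], [1.0, 0.0]])]
s = s_layer(img, kernels)   # responds only where the bar sits in the window
c = c_layer(s)              # the same C-cell fires if the bar shifts slightly
```

Stacking more S/C pairs yields the "more complex features at less specific locations" progression the slide describes.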

Models of Object Recognition Transform & Match: first take care of rotation, translation, scale, etc. invariances; then recognize based on a standardized pixel representation of objects. E.g., Olshausen et al., 1993, dynamic routing model. Template match: e.g., with an associative memory based on a Hopfield network.
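A minimal Hopfield-style associative memory for the template-match step might look like the following. The 6-bit patterns are made up for illustration; in the model they would be standardized pixel vectors:

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian outer-product weights storing +/-1 patterns."""
    n = patterns.shape[1]
    w = np.zeros((n, n))
    for p in patterns:
        w += np.outer(p, p)
    np.fill_diagonal(w, 0)            # no self-connections
    return w / len(patterns)

def recall(w, probe, steps=10):
    """Settle toward the nearest stored pattern (synchronous sign updates)."""
    state = probe.astype(float).copy()
    for _ in range(steps):
        state = np.sign(w @ state)
        state[state == 0] = 1.0       # break ties deterministically
    return state.astype(int)

patterns = np.array([[1, 1, 1, -1, -1, -1],
                     [1, -1, 1, -1, 1, -1]])
w = train_hopfield(patterns)
noisy = np.array([1, 1, 1, -1, -1, 1])   # pattern 0 with its last bit flipped
restored = recall(w, noisy)              # settles back to the stored template
```

The point of the pairing with "transform first" is that the Hopfield memory itself has no invariances: it only completes a corrupted version of a stored template, so the input must already be in standardized form.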

Recognition by Components Structural approach to object recognition (Biederman, 1987): complex objects are composed of simpler pieces. We can recognize a novel/unfamiliar object by parsing it in terms of its component pieces, then comparing the assemblage of pieces to those of known objects.

Recognition by components (Biederman, 1987) GEONS: geometric elements of which all objects are composed (cylinders, cones, etc.); on the order of 30 different shapes. Skips the 2½-D sketch: geons are recognized directly from edges, based on their nonaccidental properties (i.e., 3-D features that are usually preserved by the projective imaging process).

Basic Properties of GEONs They are sufficiently different from each other to be easily discriminated. They are view-invariant (look identical from most viewpoints). They are robust to noise (can be identified even with parts of the image missing).

Support for RBC: We can recognize partially occluded objects easily if the occlusions do not obscure the set of geons which constitute the object.

Potential difficulties (Edelman, 1997): A structural description is not enough; metric information is also needed. It is difficult to extract geons from real images. The structural description is ambiguous: most often there are several candidates. For some objects, deriving a structural representation can be difficult.

Geon Neurons in IT? These are preferred stimuli for some IT neurons.

Fusiform Face Area in Humans

Standard View on Visual Processing The intuition is quite simple: if the visual system needs to construct some highly abstracted representation for certain object-recognition tasks, which we believe it does, then it must do so via a number of stages. The intermediate result at each stage is effectively a representation. The entire processing pathway thus contains a hierarchy of representations, ranging from the most image-specific at the earliest stage to the most image-invariant at the latest stage. Image-specific representations support fine discrimination and are noise tolerant; image-invariant representations support generalization but are noise sensitive. (Tjan, 1999)

[Diagram: early visual processing feeding distinct pathways for faces, places, and common objects (e.g., Kanwisher et al.; Ishai et al.), contrasted with a single pathway carrying multiple memory/decision sites (Tjan, 1999).] [Work in progress. Please do not circulate.] A convergence of data seems to suggest that the brain recognizes a multitude of objects by means of several distinct processing pathways, each with a particular functional specialization. In this talk, I am going to propose an alternative, which is more parsimonious but can still explain the same set of data. This alternative relies on a single processing pathway; flexibility and self-adaptiveness are achieved by having multiple memory and decision sites along the processing pathway. (Tjan, 1999)

Tjan’s “Recognition by Anarchy” [Diagram: primary visual processing with a chain of memory sites making independent decisions “R1” … “Ri” … “Rn” after delays t1 … ti … tn; the homunculus takes the first-arriving response.] The central idea of our proposal is that the brain can tap this rich collection of representations, which are already there, by attaching memory modules along the visual-processing pathway. We further speculate that each memory site makes an independent decision about the identity of the incoming image. However, this response is not immediately sent out to the homunculus; it is delayed by an amount set by each memory site on a trial-by-trial basis, depending on the site's confidence about its current decision and the amount of memory it needs to consult before reaching the decision. The homunculus does nothing but simply take the first-arriving response as the system's response.
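The timing rule can be sketched as a race among independent decisions. The linear confidence-to-delay mapping below is a made-up placeholder; in the model the delay depends on the site's confidence and how much memory it must consult:

```python
def site_decision(label, confidence):
    """One memory site's independent decision, delayed by its (un)certainty.
    The linear confidence-to-delay rule here is a hypothetical placeholder."""
    delay = 1.0 - confidence
    return delay, label

def homunculus(responses):
    """The homunculus simply takes the first-arriving (shortest-delay) response."""
    return min(responses, key=lambda r: r[0])[1]

# Three sites decide independently about the same image.
responses = [site_decision("letter e", 0.9),   # Site 1: very confident, answers fast
             site_decision("letter c", 0.4),   # Site 2: unsure, answers late
             site_decision("letter e", 0.7)]   # Site 3
answer = homunculus(responses)   # the fastest (most confident) response wins
```

No site is in charge, hence "anarchy": the arbitration emerges entirely from each site's self-imposed delay.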

A toy visual system Task: identify letters (e.g., “e”) from arbitrary positions and orientations. For this purpose, we are going to build a simple toy visual system. The task for this toy system is to identify letters from arbitrary positions and orientations. A generic implementation would go something like this:

[Diagram: image → normalize position → normalize orientation → down-sampling → memory.] An image comes in. The target letter is first centered in the image by computing the centroid of its luminance profile. Once centered, the principal axis of the luminance profile is determined and the entire image is rotated so that this axis is vertical. At this point, we have a representation that is invariant in both position and orientation. The traditional view is that this will be stored in memory or compared against existing items in memory, leading to recognition.
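The two normalization steps can be sketched with image moments. This is a toy version: `normalize_position` and `principal_axis_angle` are hypothetical helper names, and real code would also resample the image when rotating:

```python
import numpy as np

def normalize_position(image):
    """Center the letter by shifting its luminance centroid to the image center."""
    ys, xs = np.nonzero(image)
    w = image[ys, xs]
    cy, cx = np.average(ys, weights=w), np.average(xs, weights=w)
    center = (np.array(image.shape) - 1) / 2
    return np.roll(np.roll(image, int(round(center[0] - cy)), axis=0),
                   int(round(center[1] - cx)), axis=1)

def principal_axis_angle(image):
    """Orientation of the principal axis of the luminance profile, from the
    eigenvectors of its second-moment (covariance) matrix."""
    ys, xs = np.nonzero(image)
    w = image[ys, xs]
    cy, cx = np.average(ys, weights=w), np.average(xs, weights=w)
    dy, dx = ys - cy, xs - cx
    cov = np.array([[np.average(dx * dx, weights=w), np.average(dx * dy, weights=w)],
                    [np.average(dx * dy, weights=w), np.average(dy * dy, weights=w)]])
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, np.argmax(eigvals)]   # direction of greatest spread
    return np.arctan2(major[1], major[0])    # rotate by -angle to make it vertical

# An off-center vertical bar: centering moves it to the middle column,
# and its principal axis comes out vertical (+/- pi/2 radians).
img = np.zeros((9, 9))
img[2:7, 1] = 1.0
centered = normalize_position(img)
angle = principal_axis_angle(img)
```

Down-sampling after these two steps would complete the pipeline on the slide.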

[Diagram: the same pipeline (image → normalize position → normalize orientation → down-sampling), but with memory Sites 1, 2, and 3 attached after successive stages.] In contrast, our proposal states that the intermediate results are also committed to some form of sensory memory.

Study stimuli: 5 orientations × 20 positions at high SNR. Test stimuli: 1) familiar (studied) views, 2) new positions, 3) new positions & orientations. We measure the performance of such a system by first exposing it to 5 orientations and 20 positions of each letter at high contrast. The system keeps these learning views in memory. We then test the system by presenting it with letters selected from either the views studied or views it hasn't seen before. The test stimuli are also presented at different SNR or contrast levels: SNR 1800 {30% RMS contrast}, 1500 {25%}, 800 {20%}, 450 {15%}, 210 {10%}.
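One way such stimuli might be generated is sketched below. This is a hedged sketch: the transcript does not give the exact SNR definition, so the `snr` formula (signal energy over per-pixel noise power) and the toy letter values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_stimulus(letter, rms_contrast, mean_lum=1.0):
    """Embed a letter image in Gaussian pixel noise; 'rms_contrast' is taken
    here as the noise s.d. relative to mean luminance (an assumption)."""
    noise = rng.normal(0.0, rms_contrast * mean_lum, letter.shape)
    return letter + noise

def snr(letter, rms_contrast, mean_lum=1.0):
    """Assumed SNR definition: signal energy over per-pixel noise power."""
    return float(np.sum(letter ** 2) / (rms_contrast * mean_lum) ** 2)

letter = np.zeros((8, 8))
letter[2:6, 3] = 0.5                       # a toy 'letter' (hypothetical values)
low_noise = noisy_stimulus(letter, 0.10)   # easy condition
high_noise = noisy_stimulus(letter, 0.30)  # hard condition
```

With the signal fixed, raising the RMS contrast of the noise lowers the SNR, matching the ordering of the five conditions on the slide.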

Processing speed for each recognition module depends on recognition difficulty for that module. [Diagram: raw image → norm. pos. → norm. ori., with colored arrows to Sites 1-3.] We are going to use this little icon to represent our toy system in the data plots that follow. The gray arrows indicate the primary visual pathway: starting with a raw image, its position is first normalized, then its orientation is also normalized. The colored arrows represent memory sites: the raw image (red, Site 1), a representation that is invariant to position (green, Site 2), and one that is both position- and orientation-invariant (blue, Site 3).

[Plots: proportion correct vs. contrast (%) for familiar views, novel positions, and novel positions & orientations, per site.] These three plots show the performance obtained at each individual site under the different stimulus conditions. The red curves indicate performance for Site 1, the green curves for Site 2, and the blue curves for Site 3. As expected, Site 1, which keeps raw images in memory, has the best accuracy when tested with studied views, but it cannot generalize to novel views. Site 3, on the other hand, maintains essentially the same level of performance regardless of the view condition, because it uses a representation invariant to position and orientation.

Black curve: full model, in which recognition is based on the fastest of the responses from the three stages. [Plots: same three conditions as before, with the full model overlaid.] The black curve indicates the system's (the homunculus') performance based on the first-arriving response. Clearly, it tracks the performance of the best-performing site in all conditions. Note that in the condition with novel positions, the system's performance is even better than that of the best-performing site: even though Sites 2 and 3 perform equally well, they make different kinds of errors, and the simple timing rule used to delay the responses effectively picks out the most reliable response on each trial.