Computational Architectures in Biological Vision, USC, Spring 2001


Computational Architectures in Biological Vision, USC, Spring 2001
Lecture 14. Object Recognition
Reading Assignments: None

Next Time
You will tell us about your project! Prepare a 15-minute presentation with:
1) Background: what problem are you attacking? Why is it important? What has been done previously?
2) Design: what is your approach? How is it different from previous ones? Explain what you did.
3) Results & conclusion: what did you find? Did it work? What did you learn? Was the project useful?

Next Time
Grading: we'll have a peer-review grading system ;-)
While others present, take critical notes on what is nice/interesting, what sounds bogus, and what is missing. Assign a score to the presentation. I will give you guidelines to evaluate:
- scientific value
- quality of presentation
- significance of results
- overall
Keep this information secret. Once all the presentations have been made, rank the presentations overall and give me your notes.
Final grade: quizzes (10%), white paper (20%), scientific value (20%), quality of presentation (20%), significance of results (20%), rank (10%).

Four stages of representation (Marr, 1982)
1) Pixel-based (light intensity)
2) Primal sketch (discontinuities in intensity)
3) 2½-D sketch (oriented surfaces, relative depth between surfaces)
4) 3D model (shapes, spatial relationships, volumes)
Problem: computationally intractable!
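The first two stages can be made concrete. Below is a minimal, hypothetical sketch (not Marr's own algorithm) that derives a primal sketch from the pixel-based stage by marking intensity discontinuities; the `primal_sketch` name and gradient threshold are illustrative choices.

```python
import numpy as np

def primal_sketch(image, threshold=0.25):
    """Stage 1 -> stage 2: threshold the gradient magnitude of the
    pixel-based intensity array to mark discontinuities."""
    gy, gx = np.gradient(image.astype(float))  # finite differences
    return np.hypot(gx, gy) > threshold

# A step edge between a dark and a bright half of the image.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = primal_sketch(img)  # True only near the intensity jump
```

Stages 3 and 4 would then build oriented surfaces and volumetric models on top of such discontinuity maps, which is where the computational cost explodes.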

Challenges of Object Recognition
The binding problem: binding different features (color, orientation, etc.) to yield a unitary percept (see next slide).
Bottom-up vs. top-down processing: how much is assumed top-down vs. extracted from the image?
Perception vs. recognition vs. categorization: seeing an object vs. seeing it as something; matching views of known objects to memory vs. matching a novel object to object categories in memory.
Viewpoint invariance: a major issue is recognizing objects irrespective of the viewpoint from which we see them.

Viewpoint Invariance
A major problem for recognition. Biederman & Gerhardstein, 1994: we can recognize two views of an unfamiliar object as being the same object. Thus, viewpoint invariance cannot rely only on matching stored views to memory.

Models of Object Recognition
See Hummel, 1995, The Handbook of Brain Theory & Neural Networks.
Direct template matching: a processing hierarchy yields activation of view-tuned units; a collection of view-tuned units is associated with one object. View-tuned units are built from V4-like units, using sets of weights that differ for each object.
e.g., Poggio & Edelman, 1990; Riesenhuber & Poggio, 1999
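A minimal sketch of this scheme, assuming Gaussian-tuned view units over a feature vector and a max pool at the object level (in the spirit of the cited models; the function names, tuning width, and toy feature vectors are illustrative, not the published models):

```python
import numpy as np

def view_tuned_response(features, view_weights, sigma=1.0):
    """One view-tuned unit: Gaussian tuning around its stored
    feature vector (the per-view set of weights)."""
    d2 = np.sum((features - view_weights) ** 2)
    return np.exp(-d2 / (2 * sigma ** 2))

def object_response(features, views):
    """Object unit: pools over its collection of view-tuned units,
    so a match to any learned view activates the object."""
    return max(view_tuned_response(features, v) for v in views)

# Two stored views of one object; a probe near the second view
# still drives the object unit strongly.
views = [np.array([1.0, 0.0, 0.5]), np.array([0.2, 0.9, 0.1])]
probe = np.array([0.25, 0.85, 0.1])
strength = object_response(probe, views)  # close to 1
```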

Computational Model of Object Recognition (Riesenhuber and Poggio, 1999)

The model's neurons are tuned for the size and 3D orientation of the object.

Models of Object Recognition
Hierarchical template matching: the image is passed through layers of units with progressively more complex features at progressively less specific locations; hierarchical in that features at one stage are built from features at earlier stages.
e.g., Fukushima & Miyake (1982)'s Neocognitron: several processing layers, comprising simple (S) and complex (C) cells. S-cells in one layer respond to conjunctions of C-cells in the previous layer; C-cells in one layer are excited by small neighborhoods of S-cells.
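The S/C alternation can be sketched as follows; this is a toy version assuming cross-correlation for S-cells and max pooling for C-cells (the layer sizes, threshold, and feature template are illustrative, not Fukushima & Miyake's parameters):

```python
import numpy as np

def s_layer(image, template, threshold=1.5):
    """S-cells: respond where a local patch matches a feature
    template (valid-mode cross-correlation plus a threshold)."""
    h, w = template.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * template)
    return (out >= threshold).astype(float)

def c_layer(smap, pool=2):
    """C-cells: max over a small neighborhood of S-cells, keeping
    the feature while discarding its exact position."""
    H, W = smap.shape
    return np.array([[smap[i:i + pool, j:j + pool].max()
                      for j in range(0, W - pool + 1, pool)]
                     for i in range(0, H - pool + 1, pool)])

# A 2x1 vertical template applied to an image containing a vertical bar.
img = np.zeros((6, 6))
img[1:5, 2] = 1.0
s = s_layer(img, np.array([[1.0], [1.0]]))  # fires along the bar
c = c_layer(s)                              # same feature, coarser grid
```

Stacking several such S/C pairs yields increasingly complex features at increasingly less specific locations, exactly the hierarchy described above.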

Models of Object Recognition
Transform & match: first take care of rotation, translation, scale, etc. invariances, then recognize based on a standardized pixel representation of objects.
e.g., Olshausen et al., 1993, dynamic routing model.
Template match: e.g., with an associative memory based on a Hopfield network.
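The template-match step can be illustrated with a tiny Hopfield associative memory; this is a standard Hebbian-weight sketch (the pattern and network size are illustrative, not from Olshausen et al.):

```python
import numpy as np

def train_hopfield(pattern):
    """Hebbian weights storing one +/-1 pattern (zero diagonal)."""
    W = np.outer(pattern, pattern).astype(float)
    np.fill_diagonal(W, 0)
    return W / len(pattern)

def recall(W, probe, steps=10):
    """Synchronous updates; the state falls into the stored attractor."""
    s = probe.copy()
    for _ in range(steps):
        s = np.where(W @ s >= 0, 1, -1)
    return s

# Store one 8-unit "standardized view"; recall it from a corrupted probe.
stored = np.array([1, -1, 1, 1, -1, -1, 1, -1])
W = train_hopfield(stored)
probe = stored.copy()
probe[0] *= -1                  # flip one unit
restored = recall(W, probe)     # settles back on the stored pattern
```

After the transform stage has standardized the pixel representation, a noisy or partial view acts as the probe and the attractor dynamics complete it to the stored template.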

Recognition by Components
Structural approach to object recognition (Biederman, 1987): complex objects are composed of simpler pieces. We can recognize a novel/unfamiliar object by parsing it in terms of its component pieces, then comparing the assemblage of pieces to those of known objects.

Recognition by Components (Biederman, 1987)
GEONs: geometric elements of which all objects are composed (cylinders, cones, etc.); on the order of 30 different shapes.
Skips the 2½-D sketch: geons are recognized directly from edges, based on their nonaccidental properties (i.e., 3D features that are usually preserved by the projective imaging process).

Basic Properties of GEONs
- They are sufficiently different from each other to be easily discriminated.
- They are view-invariant (look identical from most viewpoints).
- They are robust to noise (can be identified even with parts of the image missing).

Support for RBC: We can recognize partially occluded objects easily if the occlusions do not obscure the set of geons which constitute the object.

Potential Difficulties (Edelman, 1997)
- A structural description is not enough; we also need metric information.
- It is difficult to extract geons from real images.
- Ambiguity in the structural description: most often we have several candidates.
- For some objects, deriving a structural representation can be difficult.

Geon Neurons in IT? These are preferred stimuli for some IT neurons.

Fusiform Face Area in Humans

Standard View on Visual Processing (Tjan, 1999)
The intuition is quite simple. If the visual system needs to construct some highly abstracted representation for certain object-recognition tasks, which we believe it does, then it must do so via a number of stages. The intermediate result at each stage is effectively a representation. The entire processing pathway thus contains a hierarchy of representations, ranging from the most image-specific at the earliest stage (supporting fine discrimination, noise-tolerant) to the most image-invariant at the latest stage (supporting generalization, noise-sensitive).

Multiple Memory/Decision Sites (Tjan, 1999)
[Figure: early visual processing feeding distinct face, place, and common-object pathways (e.g., Kanwisher et al.; Ishai et al.), contrasted with a single pathway carrying multiple memory/decision sites.]
A convergence of data seems to suggest that the brain recognizes a multitude of objects by means of several distinct processing pathways, each with a particular functional specialization. In this talk, I am going to propose an alternative which is more parsimonious but can still explain the same set of data. This alternative relies on a single processing pathway; flexibility and self-adaptiveness are achieved by having multiple memory and decision sites along the processing pathway.

Tjan’s “Recognition by Anarchy”
[Figure: primary visual processing with sensory-memory sites attached along the pathway; each makes an independent decision “R1” … “Rn”, released after delays t1 … tn; the homunculus takes the first-arriving response.]
The central idea of our proposal is that the brain can tap this rich collection of representations, which are already there, by attaching memory modules along the visual-processing pathway. We further speculate that each memory site makes an independent decision about the identity of the incoming image. However, this response is not immediately sent out to the homunculus; it is delayed by an amount set by each memory site on a trial-by-trial basis, depending on the site's confidence in its current decision and the amount of memory it needed to consult before reaching the decision. The homunculus does nothing but simply take the first-arriving response as the system's response.
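The first-arriving-response rule can be sketched in a few lines. The linear mapping from confidence to delay below is an assumption; the slide only states that the delay depends on each site's confidence and the memory it consults.

```python
def site_delay(confidence, base=50.0, scale=200.0):
    """Delay grows as confidence drops; this linear rule is an
    assumption, not the model's specified mechanism."""
    return base + scale * (1.0 - confidence)

def system_response(site_outputs):
    """site_outputs: (confidence, label) per memory site. The
    homunculus simply takes the first-arriving response."""
    delays = [(site_delay(conf), label) for conf, label in site_outputs]
    return min(delays)[1]

# Site 1 (raw images) is confident on a familiar view and answers first;
# on a novel view its confidence collapses and a later site wins instead.
familiar = [(0.95, "e"), (0.70, "e"), (0.65, "e")]
novel = [(0.20, "c"), (0.80, "e"), (0.75, "e")]
```

No central arbiter compares the sites; the timing itself selects the most confident decision on each trial.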

A Toy Visual System
Task: identify letters (e.g., “e”) from arbitrary positions and orientations.
For this purpose, we are going to build a simple toy visual system. A generic implementation would go something like this:

[Pipeline: image → normalize position → normalize orientation → down-sampling → memory]
An image comes in. The target letter is first centered in the image by computing the centroid of its luminance profile. Once centered, the principal axis of the luminance profile is determined and the entire image is rotated so that this axis is vertical. At this point, we have a representation that is invariant in both position and orientation. The traditional view is that this will be stored in memory, or compared against existing items in memory, leading to recognition.
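These two normalization stages can be sketched directly, assuming a luminance-centroid shift for position and second central moments for the principal axis (the function names and the integer `np.roll` shift are illustrative):

```python
import numpy as np

def normalize_position(img):
    """Shift the image so the centroid of its luminance profile
    lands at the center (integer shift via np.roll)."""
    ys, xs = np.indices(img.shape)
    total = img.sum()
    cy, cx = (ys * img).sum() / total, (xs * img).sum() / total
    dy = int(round(img.shape[0] / 2 - cy))
    dx = int(round(img.shape[1] / 2 - cx))
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

def principal_axis_angle(img):
    """Angle of the principal axis of the luminance profile, from
    the second central moments; stage 2 would rotate by -angle."""
    ys, xs = np.indices(img.shape)
    total = img.sum()
    cy, cx = (ys * img).sum() / total, (xs * img).sum() / total
    mu20 = ((xs - cx) ** 2 * img).sum()
    mu02 = ((ys - cy) ** 2 * img).sum()
    mu11 = ((xs - cx) * (ys - cy) * img).sum()
    return 0.5 * np.arctan2(2 * mu11, mu20 - mu02)

# An off-center blob ends up centered after position normalization.
blob = np.zeros((9, 9))
blob[0:2, 0:2] = 1.0
centered = normalize_position(blob)
```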

[Pipeline: image → normalize position → normalize orientation → down-sampling, with memory Sites 1, 2 and 3 attached along the way]
In contrast, our proposal states that the intermediate results are also committed to some form of sensory memory.

Study stimuli: 5 orientations × 20 positions at high SNR.
Test stimuli: 1) familiar (studied) views, 2) new positions, 3) new positions & orientations.
We measure the performance of such a system by first exposing it to 5 orientations and 20 positions of each letter at high contrast. The system keeps these learning views in memory. We then test the system by presenting it with letters selected either from the views studied or from views it hasn't seen before. The test stimuli are also presented at different signal-to-noise ratios {RMS contrast}: 1800 {30%}, 1500 {25%}, 800 {20%}, 450 {15%}, 210 {10%}.

[Icon: raw image → norm. pos. → norm. ori., with memory Sites 1–3 attached]
We are going to use this little icon to represent our toy system in the data plots that follow. The gray arrows indicate the primary visual pathway: starting with a raw image, its position is first normalized, then its orientation is normalized. The colored arrows represent the memory sites: the raw image (red, Site 1), a representation that is invariant to position (green, Site 2), and one that is invariant to both position and orientation (blue, Site 3). Processing speed for each recognition module depends on the recognition difficulty faced by that module.

[Plots: proportion correct vs. contrast (%) under three conditions: familiar views, novel positions, and novel positions & orientations]
These three plots show the performance obtained at each individual site under the different stimulus conditions. The red curves indicate performance for Site 1, the green curves for Site 2, and the blue curves for Site 3. As expected, Site 1, which keeps raw images in memory, has the best accuracy when tested with studied views, but it cannot generalize to novel views. Site 3, on the other hand, maintains essentially the same level of performance regardless of the view condition, because it uses a representation that is invariant to position and orientation.

[Same plots, plus a black curve: the full model, in which recognition is based on the fastest of the responses from the three stages]
The black curve indicates the system's (i.e., the homunculus') performance based on the first-arriving response. Clearly, it tracks the performance of the best-performing site in all conditions. Note that in the condition with novel positions, the system's performance is even better than that of the best-performing sites: even though Sites 2 and 3 perform equally well, they make different kinds of errors, and the simple timing rule used to delay the responses effectively picks out the most reliable response on each trial.

Syllabus Overview
This is tentative and still open to suggestions!
- Course overview and fundamentals of neuroscience
- Neuroscience basics
- Experimental techniques in visual neuroscience
- Introduction to vision
- Low-level processing and feature detection
- Coding and representation
- Stereoscopic vision
- Perception of motion
- Color perception
- Visual illusions
- Visual attention
- Shape perception and scene analysis
- Object recognition
- Computer graphics, virtual reality and robotics

Syllabus Overview
Course Overview and Fundamentals of Neuroscience.
- Why is vision hard while it seems so naturally easy?
- Why is half of our brain primarily concerned with vision?
- Towards domestic robots: how far are we today?
- What can be learned from the interplay between biology and computer science?

Syllabus Overview
Neuroscience basics.
- The brain, its gross anatomy
- Major anatomical and functional areas
- The spinal cord and nerves
- Neurons, different types
- Support machinery and glial cells
- Action potentials
- Synapses and inter-neuron communication
- Neuromodulation
- Power consumption and supply
- Adaptability and learning

Syllabus Overview
Experimental techniques in visual neuroscience.
- Recording from neurons: electrophysiology
- Multi-unit recording using electrode arrays
- Stimulating while recording
- Anesthetized vs. awake animals
- Single-neuron recording in awake humans
- Probing the limits of vision: visual psychophysics
- Functional neuroimaging: techniques; experimental design issues
- Optical imaging
- Transcranial magnetic stimulation

Syllabus Overview
Introduction to vision.
- Biological eyes compared to cameras and VLSI sensors
- Different types of eyes
- Optics
- Theoretical signal processing limits
- Introduction to Fourier transforms, applicability to vision
- The sampling theorem
- Experimental probing of theoretical limits
- Phototransduction
- Retinal organization
- Processing layers in the retina
- Adaptability and gain control

Syllabus Overview
More introduction to vision.
- Leaving the eyes: optic tracts, optic chiasm
- Associated pathology and signal processing
- The lateral geniculate nucleus of the thalamus: the first relay station to cortical processing
- Image processing in the LGN
- Notion of receptive field
- Primary visual cortex
- Cortical magnification
- Retinotopic mapping
- Overview of higher visual areas
- Visual processing pathways

Syllabus Overview
Low-level processing and feature detection.
- Basis transforms; wavelet transforms; jets
- Optimal coding
- Texture segregation
- Grouping
- Edges and boundaries; optimal filters for edge detection
- Markov random fields and their relevance to biological vision
- Simple and complex cells
- Cortical gain control
- Columnar organization & short-range interactions
- Long-range horizontal connections and non-classical surround
- How can artificial vision systems benefit from these recent advances in neuroscience?

Syllabus Overview
Coding and representation.
- Spiking vs. mean-rate neurons
- Spike timing analysis
- Autocorrelation and power spectrum
- Population coding; optimal readout
- Neurons as random variables
- Statistically efficient estimators
- Entropy & mutual information
- Principal component analysis (PCA)
- Independent component analysis (ICA)
- Application of these neuroscience analysis tools to engineering problems where data are inherently noisy (e.g., consumer-grade video cameras, VLSI implementations, computationally efficient approximate implementations)

Syllabus Overview
Stereoscopic vision.
- Challenges in stereo vision
- The correspondence problem
- Inferring depth from several 2D views
- Several cameras vs. one moving camera
- Brief overview of epipolar geometry and depth computation
- Neurons tuned for disparity
- Size constancy
- Do we segment objects first and then match their projections on both eyes to infer distance?
- Random-dot stereograms ("magic eye"): how do they work and what do they tell us about the brain?

Syllabus Overview
Perception of motion.
- Optic flow
- Segmentation and regularization
- Efficient algorithms
- Robust algorithms
- The spatio-temporal energy model
- Computing the focus of expansion and time-to-contact
- Motion-selective neurons in cortical areas MT and MST

Syllabus Overview
Color perception.
- Color-sensitive photoreceptors (cones)
- Visible wavelengths and light absorption
- The color constancy problem: how can we build stable percepts of colors despite variations in illumination, shadows, etc.?

Syllabus Overview
Visual illusions.
- What can illusions teach us about the brain?
- Examples of illusions
- Which subsystems studied so far do various illusions tell us about?
- What computational explanations can we find for many of these illusions?

Syllabus Overview
Visual attention.
- Several kinds of attention: bottom-up and top-down; overt and covert
- Attentional modulation
- How can understanding attention contribute to computer vision systems?
- Biological models of attention
- Change blindness
- Attention and awareness
- Engineering applications of attention: image compression, target detection, evaluation of advertising, etc.

Syllabus Overview
Shape perception and scene analysis.
- Shape-selective neurons in cortex
- Coding: one neuron per object or population codes?
- Biologically-inspired algorithms for shape perception
- The "gist" of a scene: how can we get it in 100 ms or less?
- Visual memory: how much do we remember of what we have seen?
- The world as an outside memory and our eyes as a lookup tool

Syllabus Overview
Object recognition.
- The basic issues
- Translation and rotation invariance
- Neural models that achieve it
- 3D viewpoint invariance (data and models)
- Classical computer vision approaches: template matching and matched filters; wavelet transforms; correlation; etc.
- Examples: face recognition
- More examples of biologically-inspired object recognition systems that work remarkably well