Low Level Visual Processing
Information Maximization in the Retina Hypothesis: ganglion cells try to transmit as much information as possible about the image. What kind of receptive field maximizes the mutual information I(s, r) between the stimulus s and the response r?
Information Maximization in the Retina In this particular context, information is maximized by a factorial code, one in which the joint distribution of responses factorizes: P(r_1, ..., r_N) = Π_i P(r_i). For a factorial code, the mutual information between the responses of different cells is 0 (there are no redundancies).
Information Maximization in the Retina Independence is hard to achieve. Instead, we can look for codes that decorrelate the activity of the ganglion cells. This is much easier, because decorrelation can be achieved with a simple linear transformation.
Information Maximization in the Retina We assume that ganglion cells are linear: r_s = Σ_x D_s(x) I(x). The goal is to find a receptive-field profile, D_s(x), for which the ganglion cells are decorrelated (i.e., a whitening filter).
Information Maximization in the Retina Intuition: take a white-noise image and filter it with a Gaussian filter. To decorrelate the resulting image, you need to deconvolve. The deconvolving filter would be a Mexican-hat (band-pass) filter in this case.
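This intuition can be checked numerically with a 1-D sketch (the Gaussian width, signal length, and random seed are arbitrary choices for illustration, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024

# White-noise "image" (1-D for simplicity), blurred by a Gaussian filter.
white = rng.standard_normal(n)
sigma = 2.0
x = np.arange(n)
g = np.exp(-0.5 * ((x - n // 2) / sigma) ** 2)
g /= g.sum()
G = np.fft.fft(np.fft.ifftshift(g))          # spectrum of the blur

blurred = np.fft.ifft(np.fft.fft(white) * G).real

# Deconvolution: divide the spectrum by G.  In space, this inverse
# filter has a Mexican-hat (band-pass) profile.
whitened = np.fft.ifft(np.fft.fft(blurred) / G).real

def autocorr(s, lag):
    s = s - s.mean()
    return float(np.dot(np.roll(s, lag), s) / np.dot(s, s))

# Blurred signal: strongly correlated at neighbouring pixels.
# Whitened signal: decorrelated (it recovers the original white noise).
print(autocorr(blurred, 1), autocorr(whitened, 1))
```

With no added noise, inverting the blur spectrum is safe; the next slides explain why it fails once noise is present.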
Information Maximization in the Retina Correlations are captured by the cross-correlogram (all signals are assumed to be zero mean): Q(s, s') = ⟨ r_s r_{s'} ⟩ = Σ_{x,x'} D_s(x) D_{s'}(x') ⟨ I(x) I(x') ⟩. The cross-correlogram is also a convolution (in 4D).
Fourier Transform The Fourier transform of a convolution is equal to the product of the individual spectra: F[f * g] = F[f] · F[g]. The spectrum of a Dirac function is flat.
Information Maximization in the Retina To decorrelate, we need to ensure that the cross-correlogram is a Dirac function, i.e., that its Fourier transform is as flat as possible.
Information Maximization in the Retina
The amplitude of the whitening filter grows with spatial frequency k... it amplifies noise!
Information Maximization in the Retina If we assume that the retina adds noise on top of the signal to be transmitted to the brain, the previous filter is a bad idea, because it amplifies the noise. Solution: apply a noise filter first.
Information Maximization in the Retina The combined filter goes to 0 as k goes to infinity.
Information Maximization in the Retina The shape of the whitening filter depends on the noise level. For high contrast/low noise: a band-pass filter, i.e., a center-surround RF. For low contrast/high noise: a low-pass filter, i.e., a Gaussian RF.
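A sketch of how the filter shape changes with noise, assuming a 1/k² power spectrum for natural images and a Wiener-style noise suppressor in front of the whitening stage (the spectra and noise levels here are illustrative assumptions, not the lecture's exact formulas):

```python
import numpy as np

k = np.linspace(0.01, 10, 1000)      # spatial frequency
S = 1.0 / k**2                       # ~1/k^2 power spectrum of natural images

def retinal_filter(S, N):
    """Whitening filter times a Wiener-style noise suppressor
    (a sketch of the argument, not the exact retinal formula)."""
    whiten = 1.0 / np.sqrt(S)        # flattens the signal spectrum
    wiener = S / (S + N)             # suppresses frequencies dominated by noise
    return whiten * wiener

low_noise  = retinal_filter(S, N=0.04)   # high contrast / low noise
high_noise = retinal_filter(S, N=4.0)    # low contrast / high noise

k_peak_low  = k[np.argmax(low_noise)]    # peaks at intermediate k: band-pass
k_peak_high = k[np.argmax(high_noise)]   # peaks at low k: low-pass-like
```

At low noise the filter passes high spatial frequencies (center-surround behaviour); at high noise its peak slides down to low frequencies (Gaussian-like smoothing).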
Information Maximization in the Retina
(Figure: response curves vs. temporal frequency (Hz).)
Information Maximization beyond the Retina The bottleneck argument can only work once… The whitening filter only decorrelates. To find independent components, use ICA: it predicts oriented filters. One can also use other constraints besides infomax, such as sparseness.
Information Maximization beyond the Retina One big problem with the previous theory: receptive field size goes down at very low luminance levels… Alternative: the retina tries to transmit edges.
Center Surround Receptive Fields The center-surround receptive fields are decent edge detectors.
Center Surround Receptive Fields
At high luminance, small RFs are fine because the SNR is high. At intermediate luminance, large RFs are needed for signal averaging. At very low luminance, larger RFs would not be good because most of the information would be lost; use small RFs instead, and look for support for contiguous contours in the cortex.
Feature extraction: Energy Filters
2D Fourier Transform
Frequency and Orientation
2D Fourier Transform
Motion Energy Filters (space-time diagram)
Motion Energy Filters In a space-time diagram, 1st-order motion shows up as diagonal lines, and the slope of the line indicates the velocity. Therefore, a space-time Fourier transform can recover the speed of motion.
Motion Energy Filters
1st-order motion (space-time diagram)
Motion Energy Filters 2nd-order motion demo: http://www.psypress.com/mather/resources/swf/Demo11_9.swf
Motion Energy Filters In a space-time diagram, 2nd-order motion does not show up as a diagonal line… Methods based on linear filtering of the image followed by a nonlinearity cannot work: you need to apply a nonlinearity to the image first.
Motion Energy Filters A Fourier transform returns a set of complex coefficients: for each frequency j, a cosine coefficient a_j and a sine coefficient b_j.
Motion Energy Filters The power spectrum is given by c_j² = a_j² + b_j².
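In code, squaring and summing the two coefficients discards the phase of the input, which is the property the energy filters below exploit (the frequency and phases are arbitrary):

```python
import numpy as np

n = 64
t = np.arange(n)

def power_spectrum(signal):
    F = np.fft.fft(signal)
    a, b = F.real, F.imag         # cosine (a_j) and sine (b_j) coefficients
    return a**2 + b**2            # c_j^2 = a_j^2 + b_j^2

# Same grating at two different phases: identical power spectra.
p0 = power_spectrum(np.cos(2 * np.pi * 5 * t / n))
p1 = power_spectrum(np.cos(2 * np.pi * 5 * t / n + 1.2))
```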
Motion Energy Filters
(Diagram: the image I is passed through cos and sin filters, giving a_j and b_j; each output is squared (x²) and the two are summed to give c_j².)
Motion Energy Filters Therefore, taking a Fourier transform in space-time is sufficient to compute motion. To compute the velocity, just look at where the power is and compute the angle. Better still, use the Fourier spectrum as your observation and design an optimal estimator of velocity (tricky, because the noise is poorly defined)…
Motion Energy Filters How do you compute a Fourier transform with neurons? Use neurons with spatio-temporal filters that look like oriented sine and cosine functions. Problem: the receptive fields are nonlocal and would have a hard time dealing with multiple objects in space and multiple events in time…
Motion Energy Filters Solution: use oriented Gabor-like filters, or causal versions of Gabor-like filters. To recover the spectrum, take quadrature pairs, square them, and add them: this is what is called an energy filter.
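A 1-D sketch of a quadrature pair and its energy output (the envelope width, frequency, and test phases are arbitrary illustration choices): a single Gabor filter's response swings with the phase of a grating, while the pooled energy barely moves.

```python
import numpy as np

x = np.linspace(-3, 3, 301)
k = 2 * np.pi                        # preferred spatial frequency

# Quadrature pair: two Gabor filters with the same Gaussian envelope,
# 90 degrees apart in phase.
env = np.exp(-x**2 / 2)
g_cos = env * np.cos(k * x)
g_sin = env * np.sin(k * x)

def energy(image):
    """Energy filter: square the two quadrature outputs and sum them."""
    return np.dot(g_cos, image) ** 2 + np.dot(g_sin, image) ** 2

phases = np.linspace(0, np.pi, 7)
single = [np.dot(g_cos, np.cos(k * x + p)) for p in phases]   # phase dependent
pooled = [energy(np.cos(k * x + p)) for p in phases]          # ~phase invariant
```

This phase invariance is also how complex cells are modelled later in the lecture: quadrature pair, square, sum.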
Motion Energy Filters Local cosine filter: Gabor filter (output squared).
Motion Energy Filters Local sine filter: Gabor filter (output squared).
Motion Energy Filters Neurons do not have negative activity: use quadrature pairs, square each output, and sum.
V1 energy filters (space-time diagram: constant velocity)
Bank of filters
From V1 to MT V1 cells are tuned to velocity, but they are also tuned to spatial and temporal frequencies.
From V1 to MT (constant velocity)
From V1 to MT MT cells are tuned to velocity across a wide range of spatial and temporal frequencies.
MT Cells
Pooling across Filters Motion opponency: it is not possible to perceive transparent motion within the same spatial bandwidth. This suggests that the neural read-out mechanism for speed computes the difference between filters tuned to different spatial frequencies within the same temporal bandwidth.
Pooling across Filters (flicker)
Energy Filters For second-order motion, apply a nonlinearity to the image and then run a motion energy filter.
Energy Filters: Generalization The same technique can be used to compute orientation, disparity, etc.
Energy Filters: Generalization Stereopsis: constant disparities correspond to oriented lines in a right-eye/left-eye RF diagram.
Energy Filters: Generalization
Orientation Selectivity At first sight, a simple, if not downright stupid, problem: use an orientation energy filter. The only challenge: finding out exactly how the brain does it… Two classes of models: feedforward and lateral-connection models.
Orientation Selectivity: Feedforward (Hubel and Wiesel) (Figure: pooled LGN afferents feed the cortex; pooled LGN tuning curves and the cortical tuning curve, activity (spikes/s) vs. orientation (deg).)
Orientation Selectivity: The Lateral Connection Model (Figure: pooled LGN afferents and the output tuning curve; activity (spikes/s) vs. orientation (deg).)
Orientation Selectivity How does it work? It is a simple consequence of the Fourier transform…
Orientation Selectivity: Feedforward (Hubel and Wiesel)
Orientation Selectivity To get a complex cell, take a quadrature pair, rectify, square, and sum.
Orientation Selectivity The evidence in favor of the Hubel and Wiesel model is overwhelming: no further tuning over time; aspect ratio consistent with this model; LGN input to layer-4 cells is as tuned as the output; LGN/cortex connectivity is as predicted by the feedforward model. So why are there lateral connections?
Orientation Selectivity But which model is the most efficient?
Fisher information Fisher information for one neuron with tuning curve f(θ) and noise variance σ²(θ): I_F(θ) = f'(θ)² / σ²(θ). For a population of neurons with independent noise: I_F(θ) = Σ_i f_i'(θ)² / σ_i²(θ).
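The population formula can be sketched for a bank of Gaussian tuning curves (the preferred orientations, widths, gains, and noise variance below are toy values, not the lecture's):

```python
import numpy as np

def fisher_info(theta, prefs, width, gain, sigma2):
    """I_F(theta) = sum_i f_i'(theta)^2 / sigma2 for a population with
    Gaussian tuning curves and independent noise of variance sigma2."""
    d = theta - prefs
    f = gain * np.exp(-0.5 * (d / width) ** 2)
    fprime = -f * d / width ** 2          # analytic slope of each tuning curve
    return float(np.sum(fprime ** 2) / sigma2)

prefs = np.linspace(-90, 90, 37)          # preferred orientations (deg)
base = fisher_info(0.0, prefs, width=20.0, gain=10.0, sigma2=1.0)

# Doubling the gain doubles every slope, so it quadruples the information.
gain2 = fisher_info(0.0, prefs, width=20.0, gain=20.0, sigma2=1.0)
```

This is the "increase in gain" route to more information mentioned on the next slide; sharpening changes the widths instead.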
Current view Common conclusion: increasing the slope leads to better discrimination (or more information). Two ways to increase the slope: 1. sharpening; 2. an increase in gain.
Main Problems 1- Neurons are not independent. With correlated (Gaussian) noise of covariance matrix Q, the Fisher information becomes I_F(θ) = f'(θ)ᵀ Q⁻¹ f'(θ).
Source of the variability Where does the noise come from? The variability comes from the dynamics of balanced recurrent networks
Source of the variability Tuning curves and variability are intertwined. So what? Changing the tuning curves f_1(θ), f_2(θ), f_3(θ), … requires changing the synaptic weights, and changing the synaptic weights changes the noise distribution.
Orientation Selectivity (Figure: no-sharpening vs. sharpening models; pooled LGN tuning curves and output tuning curves, activity (spikes/s) vs. orientation (deg).) 1- Are the output representations equally informative? 2- Is this a bad representation of orientation? 3- Does sharpening help?
Orientation Selectivity The no-sharpening model extracts a lot more information (information measured in deg⁻²).
Orientation Selectivity Why are the models extracting different amounts of information? It's all in the output covariance matrices… (Figure: output covariance matrices of the sharpening (S) and no-sharpening (NS) models, indexed by preferred orientation (deg).)
Conclusions Fisher information: what matters is the ratio of the squared slope to the variance of the noise. However, in the presence of correlations, increasing the slope does not necessarily translate into better representations.
Object Recognition
Nature of the problem Given an image, recognize (identify) the objects that are currently seen. The hard part: the same object can look incredibly different in different images, due to differences in viewpoint, illumination, color, occlusion, etc. A robust recognizer must develop invariant recognition (translation, size, rotation?).
What do we know from imaging Are there specialized areas (face and scene areas), or areas for expertise? How distributed are the representations?
Matched Filter/Nearest Neighbor Store images of the objects under different viewpoints, scales, orientations, etc. To recognize, find the template that best matches the image. Optimal under a Gaussian noise assumption, which is clearly wrong in this case… Not invariant, but works surprisingly well… (especially with tangent distance)
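A minimal sketch of the matched-filter / nearest-neighbor scheme; the "objects" here are random stand-in images, and the object names and noise level are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stored templates: one 8x8 "view" per object (random stand-ins).
templates = {name: rng.standard_normal(64) for name in ("cup", "clip", "face")}

def recognize(image):
    """Nearest-neighbor / matched-filter recognition: return the name
    of the stored template closest to the image in pixel space."""
    return min(templates, key=lambda name: np.linalg.norm(image - templates[name]))

# A noisy view of the clip is still closest to the clip template...
noisy_clip = templates["clip"] + 0.3 * rng.standard_normal(64)
# ...but the scheme is not invariant: only the stored views are covered.
```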
Objects as nonlinear manifolds The set of points corresponding to the same object forms a nonlinear manifold in pixel space (or whatever space is used to represent the object). Tangent distance: compute the distance to the tangent plane of the manifold. Tangent distance is not unlike relaxation in line-attractor networks…
Wavelet Templates Filter the image with wavelets and store objects as vectors of wavelet activations. Slightly more invariant.
Radial Basis Functions A set of functions is said to form a basis set if most other functions can be approximated as linear combinations of the basis functions. Ex: sine and cosine functions (Fourier transform) or Gaussians (RBFs).
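A small sketch of function approximation with a Gaussian RBF basis (the target function, number of centers, and widths are arbitrary illustration choices):

```python
import numpy as np

# Gaussian radial basis functions centred on a grid
centers = np.linspace(-np.pi, np.pi, 25)
width = 0.5

def design_matrix(x):
    # One column per basis function, one row per sample point
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

# Approximate sin(2x) as a linear combination of the basis functions,
# fitting the mixing weights by least squares.
x_train = np.linspace(-np.pi, np.pi, 200)
y_train = np.sin(2 * x_train)
w, *_ = np.linalg.lstsq(design_matrix(x_train), y_train, rcond=None)
y_fit = design_matrix(x_train) @ w
```

Once the basis responses are available, the approximation itself is a simple linear read-out, which is the point made about RBF networks below.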
Radial Basis Functions (Figure: a direction tuning curve, activity vs. direction (deg), and a population of basis functions, activity vs. preferred direction (deg).)
Radial Basis Functions Paper-clip recognition: parameterize the object by the angles between adjacent segments. Recognition is a nonlinear mapping from angle space onto identity; use an RBF network to implement the mapping.
Radial Basis Functions (Figure: RBF units for Object 1 and Object 2.)
Radial Basis Functions (Figure: view-based units for Object 1, View 1, feeding an invariant unit for Object 1.)
Radial Basis Functions An RBF network is not unlike nearest neighbors, but it can be applied to high-level features, not just pixel values. It illustrates the value of an overcomplete set: good recognition can be performed with a simple linear computation.
Radial Basis Functions Learning: use a gradient-descent procedure to tune the network parameters. The RBF units can also be tuned with an unsupervised learning rule.
Radial Basis Functions Problems: how does the cortex extract angles? How does this generalize to other objects?
Radial Basis Functions Basis sets can be complete (sines) or overcomplete (wavelets, RBFs).
Radial Basis Functions Representations using complete sets are often very distributed (e.g., the Fourier transform) and global. This means that they can be hard to decode and poor at dealing with multiple objects. Overcomplete sets are memory-intensive, but they lead to sparse representations (linearly decodable), they work with multiple objects, and they are efficient for learning.
Convolutional Networks (LeNet) Multilayer networks with a hierarchical organization and built-in translation invariance. Fine-to-coarse analysis provides some scale and rotation invariance, plus noise resistance. No natural description of the features used for the decomposition… bad news for neurophysiologists…
Geons Two ideas: objects can be decomposed into combinations of simple shapes (geons); object representations involve a list of geons and their relative positions in object-centered coordinates. Finally, an approach that does not turn the brain into a lookup table…
Geons Numerous problems: no convincing experimental support, and lots of data against it; no good theory of how to build such representations (current implementations cannot deal with real images); the list of geons seems awfully arbitrary…