Gaussian Pyramid and Texture EECS 274 Computer Vision Gaussian Pyramid and Texture
Scaled representations Big bars (resp. spots, hands, etc.) and little bars are both interesting Stripes and hairs, say Inefficient to detect big bars with big filters And there is superfluous detail in the filter kernel Alternative: Apply filters of fixed size to images of different sizes Typically, a collection of images whose edge length changes by a factor of 2 (or root 2) This is a pyramid (or Gaussian pyramid) by visual analogy
A bar in the big images is a hair on the zebra’s nose; in smaller images, a stripe; in the smallest, the animal’s nose
Aliasing Can’t shrink an image by taking every second pixel If we do, characteristic errors appear In the next few slides Typically, small phenomena look bigger; fast phenomena can look slower Common phenomenon Wagon wheels rolling the wrong way in movies Checkerboards misrepresented in ray tracing Striped shirts look funny on colour television
Resample the checkerboard by taking one sample at each circle Resample the checkerboard by taking one sample at each circle. In the case of the top left board, new representation is reasonable. Top right also yields a reasonable representation. Bottom left is all black (dubious) and bottom right has checks that are too big. Sampling scheme is crucially related to frequency
Constructing a pyramid by taking every second pixel leads to layers that badly misrepresent the top layer
Open questions What causes the tendency of differentiation to emphasize noise? In what precise respects are discrete images different from continuous images? How do we avoid aliasing? General thread: a language for fast changes --- The Fourier Transform It’s worth elaborating the nature of the links to fast changes here; differentiation emphasizes noise, because derivatives are large at fast changes, and noise pixels tend to be different from their neighbours. The difference between discrete and continuous images is important here, because aliasing is related to passing from one to the other (the checkerboard example).
The Fourier Transform Represent function on a new basis Think of functions as vectors, with many components We now apply a linear transformation to transform the basis dot product with each basis element In the expression, u and v select the basis element, so a function of x and y becomes a function of u and v basis elements have the form Measure amount of the sinusoid with given frequency and orientation in the signal
as a function of x,y for some fixed u, v. To get some sense of what basis elements look like, we plot a basis element --- or rather, its real part --- as a function of x,y for some fixed u, v. We get a function that is constant when (ux+vy) is constant, e.g., (u,v)=(1,2) The magnitude of the vector (u, v) gives a frequency, and its direction gives an orientation. The function is a sinusoid with this frequency along the direction, and constant perpendicular to the direction. Spatial frequency component
Here u and v are larger than in the previous slide, (u,v)=(0,0.4)
And larger still...
Phase and Magnitude Fourier transform of a real function is complex difficult to plot, visualize instead, we can think of the phase and magnitude of the transform Phase is the phase of the complex transform Magnitude is the magnitude of the complex transform Curious fact all natural images have about the same magnitude transform hence, phase seems to matter, but magnitude largely doesn’t Demonstration Take two pictures, swap the phase transforms, compute the inverse - what does the result look like?
This is the magnitude transform of the cheetah pic
This is the phase transform of the cheetah pic
This is the magnitude transform of the zebra pic
This is the phase transform of the zebra pic
Reconstruction with zebra phase, cheetah magnitude Magnitude spctrum of an image is rather uninformative
Reconstruction with cheetah phase, zebra magnitude
Various Fourier Transform Pairs Important facts The Fourier transform is linear There is an inverse FT if you scale the function’s argument, then the transform’s argument scales the other way. This makes sense --- if you multiply a function’s argument by a number that is larger than one, you are stretching the function, so that high frequencies go to low frequencies The FT of a Gaussian is a Gaussian. The convolution theorem The Fourier transform of the convolution of two functions is the product of their Fourier transforms The inverse Fourier transform of the product of two Fourier transforms is the convolution of the two inverse Fourier transforms There’s a table in the book.
Sampling in 1D takes a continuous function and replaces it with a vector of values, consisting of the function’s values at a set of sample points. We’ll assume that these sample points are on a regular grid, and can place one at each integer for convenience.
Sampling in 2D does the same thing, only in 2D Sampling in 2D does the same thing, only in 2D. We’ll assume that these sample points are on a regular grid, and can place one at each integer point for convenience.
A continuous model for a sampled function We want to be able to approximate integrals sensibly Leads to the delta function model on right This slide needs the instructor to tell the story. The point is that modelling a sampled signal as a function that is zero everywhere except at the sample points, where it takes the value of the original function, doesn’t work, because in this model the integral of the sampled function is zero, which is a poor approximation to the original integral. If we use a delta function --- whose integral property you should show, as a limit of box functions (it’s in the book) --- then we can get something that integrates the right way. Zero everywhere except at integer points
The Fourier transform of a sampled signal The crucial missing bit here is that the Fourier transform of a bed-of-nails function is another bed-of-nails function. I mention this, but don’t prove it, as it takes some sophistication with limits and sums. It is usually a good idea to point out to students that the linearity of the Fourier transform doesn’t mean you can take an infinite sum of shifted versions of the Fourier transform of a single delta function and expect the right answer. F(u,v) is the Fourier transform of f(x,y)
Example Transform of box filter is sinc. Transform of Gaussian is Gaussian. (Trucco and Verri)
Fourier transform of the sampled signal consists of a sum of copies of the Fourier transform of the original signal, shifted with respect to each other by the sampling frequency. If the shifted copies do not overlap, the original signal can be reconstructed from the sampled signal. If they overlap, we cannot obtain a separate copy of the Fourier transform, and the signal is aliased
If they overlap, we cannot obtain a separate copy of the Fourier transform, and the signal is aliased This and the last slid constitute a “proof by drawing” of Nyquist’s theorem. It’s worth mentioning this, though we don’t use it explicitly. The crucial point to mention here is that sampling doesn’t work if you have frequency components that are too high.
Smoothing as low-pass filtering The message of the FT is that high frequencies lead to trouble with sampling. Solution: suppress high frequencies before sampling multiply the FT of the signal with something that suppresses high frequencies or convolve with a low-pass filter A filter whose FT is a box is bad, because the filter kernel has infinite support Common solution: use a Gaussian multiplying FT by Gaussian is equivalent to convolving image with Gaussian.
Sampling without smoothing Sampling without smoothing. Top row shows the images, sampled at every second pixel to get the next; bottom row shows the magnitude spectrum of these images.
Sampling with smoothing. Top row shows the images Sampling with smoothing. Top row shows the images. We get the next image by smoothing the image with a Gaussian with σ=1 pixel, then sampling at every second pixel to get the next; bottom row shows the magnitude spectrum of these images. Low pass filter suppresses high frequency components with less aliasing
Large σ less aliasing, but with little detail Sampling with smoothing. Top row shows the images. We get the next image by smoothing the image with a Gaussian with σ=1.4 pixels, then sampling at every second pixel to get the next; bottom row shows the magnitude spectrum of these images. The phenomena at the coarsest scale are related to the fact that a Gaussian is a fairly poor low-pass filter. If the sigma is big, then the reconstruction error will be smaller (because the function is flatter in the relevant region of FT space), but aliasing will be larger, because it doesn’t die off quickly enough. Large σ less aliasing, but with little detail Gaussian is not an ideal low-pass filter
Applications of scaled representations Search for correspondence look at coarse scales, then refine with finer scales, coarse-to-fine matching 4 × 4, 16 × 16, …, 1024 × 1024 versions of images Edge tracking a “good” edge at a fine scale has parents at a coarser scale Control of detail and computational cost in matching e.g. finding stripes terribly important in texture representation
The Gaussian pyramid Smooth with Gaussians, because Synthesis Analysis a Gaussian*Gaussian=another Gaussian Synthesis smooth and sample Analysis take the top image
Texture Key issue: representing texture Texture based matching little is known Texture segmentation key issue: representing texture Texture synthesis useful; also gives some insight into quality of representation Shape from texture cover superficially
Typical textured images. For materials such as brush, grass, foliage and water, our perception of what the material is is quite intimately related to the texture (for the figure on the left, what would the surface feel like if you ran your fingers over it? is it wet?, etc.). Notice how much information you are getting about the type of plants, their shape, etc. from the textures in the figure on the right. These textures are also made of quite stylized subelements, arranged in a rough pattern.
Representing textures Textures are made up of quite stylised subelements, repeated in meaningful ways Representation: find the subelements, and represent their statistics But what are the subelements, and how do we find them? recall normalized correlation find subelements by applying filters, looking at the magnitude of the response What filters? experience suggests spots and oriented bars at a variety of different scales details probably don’t matter What statistics? within reason, the more the merrier. At least, mean and standard deviation better, various conditional histograms.
Spots and bars at a fine scale for the butterfly; images show squared response for corresponding filter
ditto, for a coarser scale ditto, for a coarser scale. The size of the filter has stayed fixed, but the image is half the original size (this points to the utility of pyramids).
I’ve scaled up the coarse scale responses so they can be compared I’ve scaled up the coarse scale responses so they can be compared. Notice that these representations don’t respond uniquely --- i.e. there isn’t one peak at one orientation and one scale for a given bar --- but they do give quite a good idea of what is there. If they were tightly tuned, there would have to be a lot more filters. But looking at these pix gives quite a good idea of where there are big bars, where there are little ones, and at what orientation.
Gabor filters at different scales and spatial frequencies top row shows anti-symmetric (or odd) filters, bottom row the symmetric (or even) filters. Gabor filters are just another way to generate spots and bars.
The Laplacian Pyramid Synthesis Analysis preserve difference between upsampled Gaussian pyramid level and Gaussian pyramid level band pass filter - each level represents spatial frequencies (largely) unrepresented at other levels Analysis reconstruct Gaussian pyramid, take top layer
This is a Gaussian pyramid
And this a Laplacian pyramid And this a Laplacian pyramid. Notice that the lowest layer is the same in each case, and that if I upsample the lowest layer and add to the next, I get the Gauss layer. Notice also that each layer shows detail at a particular scale --- these are, basically, bandpass filtered versions of the image.
And this shows a (rough) representation of what happens in FT space when we form pyramids. In the Laplace pyramid, each layer represents a range of frequencies ---an annulus in FT space --- or, roughly, a scale. We’d actually like to break out each annulus into an orientation, which is equivalent to a wedge. We won’t discuss the design of the relevant filters in detail, just note that this is possible. If we do this, we have an oriented pyramid.
Oriented pyramids Laplacian pyramid is orientation independent Apply an oriented filter to determine orientations at each layer by clever filter design, we can simplify synthesis this represents image information at a particular scale and orientation
on Information Theory, 1992, copyright 1992, IEEE This is an example with four orientations at each layer Reprinted from “Shiftable MultiScale Transforms,” by Simoncelli et al., IEEE Transactions on Information Theory, 1992, copyright 1992, IEEE
Analysis
synthesis
Final texture representation Form an oriented pyramid (or equivalent set of responses to filters at different scales and orientations). Square the output Take statistics of responses e.g. mean of each filter output (are there lots of spots) std of each filter output mean of one scale conditioned on other scale having a particular range of values (e.g. are the spots in straight rows?)
Texture synthesis Use image as a source of probability model Choose pixel values by matching neighbourhood, then filling in Matching process look at pixel differences count only synthesized pixels
Figure from Texture Synthesis by Non-parametric Sampling, A Figure from Texture Synthesis by Non-parametric Sampling, A. Efros and T.K. Leung, Proc. Int. Conf. Computer Vision, 1999 copyright 1999, IEEE
Variations Texture synthesis at multiple scales Texture synthesis on surfaces Texture synthesis by tiles “Analogous” texture synthesis