CS 556 – Computer Vision Image Basics & Review
What is an Image? Image: a representation, resemblance, or likeness. An image is a signal: a function carrying information. Thus, an image is a function $f$ from $\mathbb{R}^2$ to $\mathbb{R}$: $f(x, y)$ gives the amount of some value at position $(x, y)$. In practice, an image is only defined over a finite rectangular domain and with a finite range: $f : [a, b] \times [c, d] \to [v_{\min}, v_{\max}]$
Images as Functions
An image is a signal: a function carrying information. Functions have domains and ranges. Domain: (t), (x, y), (x, y, t), (x, y, z), (x, y, z, t). Range: sound (air pressure), graylevel (light intensity), color (RGB, HSL), LANDSAT (7 bands).
Images as Functions A color image is a “vector-valued” function created by pasting three functions together: $f(x, y) = [\, r(x, y),\ g(x, y),\ b(x, y) \,]^T$. Spatial image: a function of two or three spatial dimensions. $f(x, y)$: images (grayscale, color, multi-spectral). $f(x, y, z)$: medical scans or image volumes (CT, MRI). Spatio-temporal image: 2/3-D space, 1-D time. $f(x, y, t)$: videos, movies, animations.
What do the Range Values Mean? May be visible light: intensity (gray-level), color (RGB). May be quantities we cannot sense: radio waves (e.g., doppler radar), magnetic resonance, range images, ultrasound, X-rays (e.g., CT). What intensity does the value “213” represent?
Digital Images: Domains & Ranges The real world is analog: continuous domain and range. Computers operate on digital (discrete) data. Converting from continuous to discrete: for domains, the selection of discrete points is called sampling; for ranges, the selection of discrete values is called quantization. [Figure: domain sampling and range quantization]
Digital Image Formation To create a digital image: sample the 2-D space on a regular grid, then quantize each sample (round to the nearest integer). If the samples are $\Delta$ apart, we can write this as: $f[i, j] = \mathrm{Quantize}\{ f(i\Delta, j\Delta) \}$. The image can now be represented as a matrix of integer values indexed by $i$ and $j$.
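The two steps can be sketched in a few lines of Python. This is only an illustration: the continuous image `ramp`, the function name `sample_and_quantize`, the grid spacing, and the clamping to [0, 255] are all assumptions made for the example, not part of the slides.

```python
import math

def sample_and_quantize(f, n_rows, n_cols, delta, v_min=0, v_max=255):
    """Sample f on a regular grid with spacing delta, then quantize to integers."""
    image = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            value = f(i * delta, j * delta)        # sampling: evaluate f at (i*delta, j*delta)
            q = math.floor(value + 0.5)            # quantization: round to the nearest integer
            row.append(max(v_min, min(v_max, q)))  # clamp to the finite range [v_min, v_max]
        image.append(row)
    return image

def ramp(x, y):
    # Hypothetical continuous image: brightness grows with x and y
    return 100.0 * x + 50.0 * y

digital = sample_and_quantize(ramp, n_rows=3, n_cols=4, delta=0.5)
for row in digital:
    print(row)
```

Each printed row is one row of the integer matrix f[i, j]; a smaller `delta` samples the same function more densely.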
Resolution Ability to discern detail, in both domain and range. Not simply the number of samples/pixels. Determined by the averaging or spreading of information when sampled or reconstructed.
Apertures Point measurements are impossible. We have to make measurements using a (weighted) average over some aperture: a time window, a spatial area, etc. Size determines resolution: smaller aperture → better resolution; larger aperture → worse resolution.
Apertures Lenses allow a physically larger aperture to act as an effectively smaller one. [Figure: sensor, lens, effective aperture, physical aperture]
Image Transformations An image processing operation typically defines a new image $g$ in terms of an existing image $f$. We can transform either the domain or the range of $f$. Range transformation (a.k.a. level operations): $g(x, y) = t(f(x, y))$. What kinds of operations can this perform?
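As one possible answer, a range transformation changes pixel values without moving them. The sketch below uses two illustrative choices of t, a negative and a threshold; the helper names, the cutoff of 128, and the 8-bit range are assumptions for the example.

```python
def apply_range_transform(image, t):
    """Return g with g[i][j] = t(f[i][j]); pixel positions are unchanged."""
    return [[t(v) for v in row] for row in image]

def negative(v, v_max=255):
    # Invert intensities: dark becomes bright and vice versa
    return v_max - v

def threshold(v, cutoff=128, v_max=255):
    # Map each value to either 0 or v_max depending on the cutoff
    return v_max if v >= cutoff else 0

f = [[0, 64, 128],
     [192, 213, 255]]

inverted = apply_range_transform(f, negative)
binary = apply_range_transform(f, threshold)
print(inverted)
print(binary)
```

Brightness and contrast adjustments, gamma correction, and histogram-based remapping fit the same pattern: only the function `t` changes.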
Image Transformations Some operations preserve the range but change the domain of $f$ (a.k.a. geometric operations): $g(x, y) = f(t_x(x, y), t_y(x, y))$. What kinds of operations can this perform? Many image transforms operate on both the domain and the range.
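A domain transformation can be sketched the same way: each output pixel looks up its value at a transformed source position. The function name, the zero fill outside the image, and the two example mappings (a horizontal flip and a one-pixel shift) are assumptions chosen for illustration.

```python
def apply_domain_transform(image, t_x, t_y, fill=0):
    """Return g with g[y][x] = f[t_y(x, y)][t_x(x, y)], using `fill` outside f."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            sx, sy = t_x(x, y), t_y(x, y)   # source position to read from
            inside = 0 <= sx < width and 0 <= sy < height
            row.append(image[sy][sx] if inside else fill)
        out.append(row)
    return out

f = [[1, 2, 3],
     [4, 5, 6]]

# Horizontal flip: read from the mirrored column
flipped = apply_domain_transform(f, lambda x, y: 2 - x, lambda x, y: y)
# Shift right by one pixel: read from the pixel one column to the left
shifted = apply_domain_transform(f, lambda x, y: x - 1, lambda x, y: y)
print(flipped)
print(shifted)
```

The pixel values themselves are never altered here, only where they end up, which is what distinguishes geometric operations from level operations.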
Linear Transforms A general and very useful class of transforms is the linear transforms. Properties of linear transforms: multiplying the input $f(x)$ by a constant multiplies the output by the same constant, $t(a f(x)) = a\, t(f(x))$; adding two inputs causes the corresponding outputs to add, $t(f(x) + h(x)) = t(f(x)) + t(h(x))$. Linearity: the transform $t$ is linear iff $t(a f(x) + b h(x)) = a\, t(f(x)) + b\, t(h(x))$.
Linear Transforms A linear transform of a discrete signal/image $f$ can be defined by a matrix $M$ using matrix multiplication: $g[i] = \sum_{j} M[i, j]\, f[j]$, i.e. $g = M f$. Note that matrix and vector indices start at 0 instead of 1. Does $M(a f + b h) = a M f + b M h$?
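The question can be checked numerically for a particular case. The matrix, the signals, and the constants below are arbitrary values picked for the example; a single check illustrates the property but does not prove it.

```python
def mat_vec(M, f):
    """Matrix-vector product: g[i] = sum over j of M[i][j] * f[j]."""
    return [sum(M[i][j] * f[j] for j in range(len(f))) for i in range(len(M))]

M = [[1, 2, 0],
     [0, 1, 3],
     [4, 0, 1]]
f = [1, 2, 3]
h = [0, -1, 5]
a, b = 2, -3

combined = [a * fi + b * hi for fi, hi in zip(f, h)]
left = mat_vec(M, combined)                                   # M(a f + b h)
right = [a * x + b * y
         for x, y in zip(mat_vec(M, f), mat_vec(M, h))]       # a M f + b M h
print(left, right)
```

Both sides agree because matrix multiplication distributes over addition and commutes with scalar multiplication, which is exactly the linearity condition.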
Linear Transforms: Examples Let’s start with a discrete 1-D image (a “signal”): $f[x]$. [Plot: $f[x]$ versus $x$]
Linear Transforms: Examples Identity transform: $M = I$, so $g = M f = I f = f$.
Linear Transforms: Examples Scale: $M = a I$, so $g = M f = a I f = a f$. [Figure: every sample of $f$ is multiplied by $a$]
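Linear Transforms: Examples Shift (translate). [Figure: shift matrix $M$ with plots of $f[x]$ and $g[x]$]
Linear Transforms: Examples Derivative (finite difference). [Figure: finite-difference matrix $M$ with plots of $f[x]$ and $g[x]$]
Linear Transforms: Examples The transformation matrix doesn’t have to be square. [Figure: non-square matrix $M$ with plots of $f[x]$ and $g[x]$]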
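The five examples can be written out as small matrices. The matrices shown on the original slides are not recoverable, so everything below is an assumption for illustration: the signal values, the scale factor, the shift direction, the backward-difference form, and pairwise averaging as the non-square case.

```python
def mat_vec(M, f):
    """Apply a transform matrix: g[i] = sum over j of M[i][j] * f[j]."""
    return [sum(row[j] * f[j] for j in range(len(f))) for row in M]

n = 4
f = [1, 3, 6, 10]   # hypothetical 1-D signal

identity = [[1 if j == i else 0 for j in range(n)] for i in range(n)]
scale = [[2 if j == i else 0 for j in range(n)] for i in range(n)]       # M = a I with a = 2
shift = [[1 if j == i - 1 else 0 for j in range(n)] for i in range(n)]   # g[x] = f[x - 1]
# Backward difference g[x] = f[x] - f[x - 1], treating f[-1] as 0
derivative = [[1 if j == i else -1 if j == i - 1 else 0 for j in range(n)]
              for i in range(n)]
# Non-square (2 x 4): average neighboring pairs, halving the number of samples
downsample = [[0.5, 0.5, 0, 0],
              [0, 0, 0.5, 0.5]]

for name, M in [("identity", identity), ("scale", scale), ("shift", shift),
                ("derivative", derivative), ("downsample", downsample)]:
    print(name, mat_vec(M, f))
```

Note how the shift and derivative matrices repeat the same pattern along a diagonal, while the non-square matrix maps four input samples to two outputs.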
Fourier Transform One important linear transform is the Fourier transform. Basic idea: any function can be written as the sum of (complex-valued) sinusoids of different frequencies. Euler’s equation: $e^{i 2\pi s x} = \cos(2\pi s x) + i \sin(2\pi s x)$. Note: $i$ is the imaginary unit, $i = \sqrt{-1}$. To get the weights (the amount of each frequency $s$): $F(s) = \int_{-\infty}^{\infty} f(x)\, e^{-i 2\pi s x}\, dx$
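Fourier Transform In matrix form: $F = M f$, where $M[s, x] = e^{-i 2\pi s x / n}$ for a signal of $n$ samples (normalization conventions vary). The frequency increases with the row number $s$.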
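A small sketch of the matrix form, assuming the common unnormalized convention M[s][x] = exp(-i 2π s x / n); the slides' exact normalization is not recoverable, and the test signals are chosen only for illustration.

```python
import cmath
import math

def dft_matrix(n):
    """M[s][x] = exp(-i 2 pi s x / n); row s holds the sinusoid of frequency s."""
    return [[cmath.exp(-2j * cmath.pi * s * x / n) for x in range(n)]
            for s in range(n)]

def mat_vec(M, f):
    """g[i] = sum over j of M[i][j] * f[j]."""
    return [sum(M[i][j] * f[j] for j in range(len(f))) for i in range(len(M))]

n = 8
M = dft_matrix(n)

constant = [1.0] * n                                            # zero frequency only
cosine = [math.cos(2 * math.pi * 2 * x / n) for x in range(n)]  # two cycles over n samples

constant_mag = [round(abs(v), 6) for v in mat_vec(M, constant)]
cosine_mag = [round(abs(v), 6) for v in mat_vec(M, cosine)]
print(constant_mag)
print(cosine_mag)
```

The constant signal puts all of its weight in row 0, while the two-cycle cosine shows up in rows 2 and n - 2 (the positive and negative frequency of a real sinusoid).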
Linear Shift-Invariant Transform A special class of linear transforms is shift invariant. Shift invariance: the operation is invariant to translation. Implication: shifting the input produces the same output with an equal shift: if $g(x) = t(f(x))$ then $t(f(x + x_0)) = g(x + x_0)$.
Filters Filter: a linear, shift-invariant transform. The term is often applied to operations that are not technically filters (e.g., the median “filter”). Transformation matrix $M$: a shifted copy of some pattern is applied to each row, and the pattern is (usually) centered on (or near) the diagonal. The pattern is called a filter, kernel, or mask and is represented by a vector $h$, e.g. $h[x] = [a\ b\ c]$.
Filters Filter operations can be written (for a kernel size of $2k + 1$) as: $g[i] = \sum_{j=-k}^{k} h[j]\, f[i + j]$. This assumes negative kernel indices; an actual implementation may need to use $h[j + k]$ instead of $h[j]$. Can think of it as a dot (or inner) product of $h$ with a portion of $f$. Since $2k + 1$ is often much less than $n$, this computation is more efficient than the full matrix product (it skips the terms that are multiplied by 0).
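The equivalence between the kernel sum and the banded transformation matrix can be sketched as follows, assuming the cross-correlation form g[i] = Σ h[j] f[i + j] for j from -k to k. The zero treatment of samples outside the signal, the kernel values, and the function names are assumptions for the example.

```python
def cross_correlate(f, h):
    """g[i] = sum_{j=-k..k} h[j] * f[i + j], with f taken as 0 outside its bounds."""
    k = len(h) // 2
    n = len(f)
    g = []
    for i in range(n):
        total = 0
        for j in range(-k, k + 1):
            if 0 <= i + j < n:
                total += h[j + k] * f[i + j]   # h[j + k]: shift to a non-negative index
        g.append(total)
    return g

def filter_matrix(h, n):
    """Banded n-by-n matrix whose row i holds the kernel centered on column i."""
    k = len(h) // 2
    M = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(-k, k + 1):
            if 0 <= i + j < n:
                M[i][i + j] = h[j + k]
    return M

def mat_vec(M, f):
    return [sum(row[j] * f[j] for j in range(len(f))) for row in M]

f = [1, 2, 3, 4, 5]
h = [1, 2, 1]          # hypothetical kernel, k = 1

direct = cross_correlate(f, h)
via_matrix = mat_vec(filter_matrix(h, len(f)), f)
print(direct, via_matrix)
```

The direct sum touches only 2k + 1 terms per output sample, whereas the matrix product multiplies through all n entries of each row, most of which are zero.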
Cross-Correlation & Convolution Filtering operations come in two (very similar) types. Cross-correlation (already seen): $g[i] = \sum_{j=-k}^{k} h[j]\, f[i + j]$. Convolution: $g[i] = \sum_{j=-k}^{k} h[j]\, f[i - j]$. Convolution is cross-correlation where either the kernel or the signal is flipped first. How do the results differ for cross-correlation and convolution if the kernel is symmetric? Anti-symmetric?
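The closing question can be explored with a short experiment. The sketch below assumes the forms g[i] = Σ h[j] f[i + j] (cross-correlation) and g[i] = Σ h[j] f[i - j] (convolution), with zeros outside the signal; the signal and kernels are illustrative values.

```python
def cross_correlate(f, h):
    """g[i] = sum_{j=-k..k} h[j] * f[i + j], with zeros outside f."""
    k, n = len(h) // 2, len(f)
    return [sum(h[j + k] * f[i + j] for j in range(-k, k + 1) if 0 <= i + j < n)
            for i in range(n)]

def convolve(f, h):
    """g[i] = sum_{j=-k..k} h[j] * f[i - j], with zeros outside f."""
    k, n = len(h) // 2, len(f)
    return [sum(h[j + k] * f[i - j] for j in range(-k, k + 1) if 0 <= i - j < n)
            for i in range(n)]

f = [1, 2, 4, 7, 11]
symmetric = [1, 2, 1]
antisymmetric = [-1, 0, 1]

# Symmetric kernel: flipping it changes nothing, so both operations agree
same = cross_correlate(f, symmetric) == convolve(f, symmetric)

# Anti-symmetric kernel: flipping negates it, so the results differ by sign
corr = cross_correlate(f, antisymmetric)
conv = convolve(f, antisymmetric)
print(same, corr, conv)
```

In general, convolving with h gives the same result as cross-correlating with the reversed kernel `h[::-1]`.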
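2-D Linear Transforms A 2-D discrete image (in matrix form) can be turned into a 1-D vector by concatenating its rows into one long vector, which a matrix $M$ then transforms as in the 1-D case. However, it is usually easier to think in terms of the computation for an individual value of $g[u, v]$: $g[u, v] = \sum_{x} \sum_{y} w_{u,v}[x, y]\, f[x, y]$, where $w_{u,v}[x, y]$ is the weight that input sample $f[x, y]$ contributes to output $g[u, v]$.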
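2-D Transforms: Fourier Transform The 2-D discrete Fourier transform is given by: $F[u, v] = \sum_{x=0}^{n_x - 1} \sum_{y=0}^{n_y - 1} f[x, y]\, e^{-i 2\pi (u x / n_x + v y / n_y)}$, where the weight values $w_{u,v}[x, y]$ have been replaced with $e^{-i 2\pi (u x / n_x + v y / n_y)}$ for an image of $n_x \times n_y$ samples (normalization conventions vary).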
Fourier Transform: Examples
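2-D Filtering A 2-D image $f[x, y]$ can be filtered by convolving (or cross-correlating) it with a 2-D kernel $h[x, y]$ to produce an output image $g[u, v]$. For cross-correlation with a $(2k + 1) \times (2k + 1)$ kernel: $g[u, v] = \sum_{i=-k}^{k} \sum_{j=-k}^{k} h[i, j]\, f[u + i, v + j]$ (convolution uses $f[u - i, v - j]$ instead). As with the 1-D case, an actual implementation may need to use $h[i + k, j + k]$ instead of $h[i, j]$ to adjust for negative indices. Filtering is useful for many reasons, such as noise reduction and edge detection.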
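A minimal 2-D cross-correlation might look like the sketch below. It assumes the form g[u, v] = Σ_i Σ_j h[i, j] f[u + i, v + j] with zeros outside the image; the step-edge image and the horizontal difference kernel are hypothetical inputs chosen to show edge detection.

```python
def filter2d(f, h):
    """g[u][v] = sum_{i,j=-k..k} h[i][j] * f[u + i][v + j], zero outside the image."""
    k = len(h) // 2
    rows, cols = len(f), len(f[0])
    g = [[0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0
            for i in range(-k, k + 1):
                for j in range(-k, k + 1):
                    if 0 <= u + i < rows and 0 <= v + j < cols:
                        # h[i + k][j + k] adjusts for negative kernel indices
                        total += h[i + k][j + k] * f[u + i][v + j]
            g[u][v] = total
    return g

# Image with a vertical step edge between columns 1 and 2
f = [[0, 0, 9, 9],
     [0, 0, 9, 9],
     [0, 0, 9, 9]]

# Horizontal central difference: g[u][v] = f[u][v + 1] - f[u][v - 1]
h = [[0, 0, 0],
     [-1, 0, 1],
     [0, 0, 0]]

edges = filter2d(f, h)
for row in edges:
    print(row)
```

The response is nonzero only next to the step; the value in the last column comes from the zero padding at the image border rather than from a real edge.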
Noise Unavoidable/undesirable fluctuation from the “correct” value: the nemesis of signal/image processing and computer vision. Usually random: modeled as a statistical distribution with its mean ($\mu$) at the “correct” value; a measured sample varies from $\mu$ according to the distribution ($\sigma$). Signal-to-Noise Ratio (SNR) = $\mu / \sigma$: measures how “noise free” the acquired signal is. “Signal” can refer to an absolute or relative value.
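As a rough illustration, SNR can be estimated from repeated measurements of the same quantity. The sketch assumes SNR is the ratio of the mean to the standard deviation (one common definition), uses the population standard deviation, and works on made-up measurement values.

```python
import statistics

def snr(samples):
    """Estimate SNR as mean / standard deviation of repeated measurements."""
    mu = statistics.mean(samples)        # estimate of the "correct" value
    sigma = statistics.pstdev(samples)   # spread of the noise around it
    return mu / sigma

# Hypothetical repeated readings of one pixel whose true value is near 100
measurements = [98, 102, 100, 101, 99]
print(snr(measurements))
```

A larger result means the fluctuations are small relative to the signal; halving the noise spread doubles the SNR.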
Noise Filtering is useful for noise reduction. Common types of noise: Salt and pepper: random occurrences of black and white pixels. Impulse: random occurrences of white pixels. Gaussian: intensity variations drawn from a normal distribution. What kind of filter (i.e., kernel) reduces noise? Why?
Noise Reduction: Mean Filter What does a 3 × 3 mean (i.e., averaging) kernel look like? What does it do to the salt and pepper noise? What does it do to the edges of the white box? [Figure: input $f[x, y]$ and output $g[x, y]$]
Noise Reduction: Mean Filter [Figure: input $f[x, y]$, kernel $h[x, y]$, and output $g[x, y]$]
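A 3 × 3 mean kernel has every weight equal to 1/9. The sketch below applies it to a tiny image containing a single bright "salt" pixel; the image values and the zero padding at the border are assumptions for the example.

```python
def mean_filter_3x3(f):
    """Replace each pixel with the average of its 3x3 neighborhood (zero-padded)."""
    rows, cols = len(f), len(f[0])
    g = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0
            for i in (-1, 0, 1):
                for j in (-1, 0, 1):
                    if 0 <= u + i < rows and 0 <= v + j < cols:
                        total += f[u + i][v + j]
            g[u][v] = total / 9   # every kernel weight is 1/9
    return g

# Dark image with one bright outlier in the center
f = [[0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 90, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0]]

g = mean_filter_3x3(f)
for row in g:
    print(row)
```

The outlier is not removed; its energy is spread over the surrounding 3 × 3 block. The same spreading is what blurs sharp edges.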
Mean Filter: Effect [Figure: 3 × 3, 5 × 5, and 7 × 7 mean filters applied to Gaussian noise and to salt and pepper noise]
Noise Reduction: Gaussian Filter A Gaussian kernel gives less weight to pixels further from the center of the window. The kernel $h[x, y]$ is an approximation of a Gaussian function: $h(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}$
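One way to build such a kernel is to sample the Gaussian function at integer offsets from the center and normalize the weights so they sum to 1. The function name, the kernel size, and the value of sigma are assumptions for this sketch.

```python
import math

def gaussian_kernel(size, sigma):
    """Sample exp(-(x^2 + y^2) / (2 sigma^2)) on a size-by-size grid and normalize."""
    k = size // 2
    raw = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            for x in range(-k, k + 1)]
           for y in range(-k, k + 1)]
    total = sum(sum(row) for row in raw)
    # Normalizing replaces the 1 / (2 pi sigma^2) factor and keeps brightness unchanged
    return [[v / total for v in row] for row in raw]

h = gaussian_kernel(3, 1.0)
for row in h:
    print([round(v, 4) for v in row])
```

The center weight is the largest, the four edge-adjacent weights are smaller, and the corner weights are smallest, unlike the mean kernel where all weights are equal.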
Mean vs. Gaussian Filtering
Non-Linear Operations They are often mistakenly called “filters”; strictly speaking, non-linear operators are not filters. They can be useful, though. Examples: order statistics (e.g., median filter), iterative algorithms (e.g., CLEAN), anisotropic diffusion, non-uniform convolution-like operations.
Median “Filter” Instead of a local neighborhood weighted average, compute the median of the neighborhood. Advantages: removes noise like low-pass filtering does; the output value is taken from actual image values; removes outliers rather than averaging (blurring) them into the result (“despeckling”); edge preserving. Disadvantages: not linear; slower to compute. (With a fixed window the median is still shift invariant: shifting the input shifts the output by the same amount.)
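A 1-D sketch makes the contrast with averaging concrete. The signal below is made up: a flat region with one outlier followed by a step edge. Edge replication at the borders and the function names are assumptions for the example.

```python
import statistics

def windows(f, k):
    """Yield the 2k+1 samples around each position, replicating the end values."""
    padded = [f[0]] * k + list(f) + [f[-1]] * k
    for i in range(len(f)):
        yield padded[i:i + 2 * k + 1]

def median_filter(f, k=1):
    return [statistics.median(w) for w in windows(f, k)]

def mean_filter(f, k=1):
    return [sum(w) / len(w) for w in windows(f, k)]

# Flat region with one outlier (255), then a step edge up to 200
signal = [10, 10, 255, 10, 10, 200, 200, 200]

med = median_filter(signal)
avg = mean_filter(signal)
print(med)
print([round(v, 1) for v in avg])
```

The median output drops the outlier entirely and keeps the step sharp, while the mean output smears the outlier into its neighbors and turns the step into a ramp.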
Comparison: Salt & Pepper Noise [Figure: Gaussian, mean, and median results for 3 × 3 and 7 × 7 windows]
Comparison: Gaussian Noise [Figure: Gaussian, mean, and median results for 3 × 3 and 7 × 7 windows]
Edge Detection: Differentiation Recall: let $F$ and $H$ be the Fourier transforms of $f$ and $h$. Convolution theorem: convolution in the spatial (image) domain is equivalent to multiplication in the frequency (Fourier) domain, $\mathcal{F}\{f * h\} = F \cdot H$. Symmetric theorem: convolution in the frequency domain is equivalent to multiplication in the spatial domain, $\mathcal{F}\{f \cdot h\} = F * H$. Why is this useful?
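The convolution theorem can be checked numerically on a small example. For discrete signals the theorem holds for circular convolution (indices wrap around), which is the assumption made here; the unnormalized DFT convention and the signal values are also choices made for this sketch.

```python
import cmath

def dft(f):
    """Unnormalized DFT: F[s] = sum over x of f[x] * exp(-i 2 pi s x / n)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * s * x / n) for x in range(n))
            for s in range(n)]

def circular_convolve(f, h):
    """g[i] = sum over j of h[j] * f[(i - j) mod n]."""
    n = len(f)
    return [sum(h[j] * f[(i - j) % n] for j in range(n)) for i in range(n)]

f = [1, 2, 3, 4, 0, 0, 0, 0]
h = [1, 1, 0, 0, 0, 0, 0, 0]   # kernel zero-padded to the same length as f

g = circular_convolve(f, h)
lhs = dft(g)                                     # transform of the convolution
rhs = [a * b for a, b in zip(dft(f), dft(h))]    # product of the transforms
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))
print(g, max_error)
```

The two sides agree up to floating-point rounding. One practical consequence is that a large convolution can be computed as a pointwise product between a forward and an inverse transform.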