CSE 415 -- (c) S. Tanimoto, 2007
Image Understanding

Outline:
Motivation
Human vision and illusions
Image representation: Sampling, Quantization, Thresholding
Stereo vision as an AI problem: Stereograms, Geometry of stereograms, Computing correspondences
Letting cues vote for hypotheses: Polar representation of a line, Hough transform
Gestalt grouping

Motivation
Allow computers and robots to read books.
Allow mobile robots to navigate using vision.
Support applications in industrial inspection, medical image analysis, security and surveillance, and remote sensing of the environment.
Permit computers to recognize users’ faces and fingerprints, and to track users in various environments.
Provide prostheses for the blind.
Develop artistic intelligence.

Human Vision
25% of brain volume is allocated to visual perception.
Human vision is a parallel and distributed system, involving 2 eyes, retinal processing, and multiple layers of processing in the striate cortex.
Most humans are trichromats and perceive color in a 3-D color space; the exceptions are dichromats and monochromats.
Vision provides a high-bandwidth input mechanism: “a picture is worth 1000 words.”

Visual Illusions
They provide insights about the nature of the human visual system, helping us understand how it works.
[Figure: Müller-Lyer illusion]

Hermann Grid Illusion

Hermann Grid Illusion (dark on light)

Subjective Contour (Triangle)

Image Representation
Sampling: Number and density of “pixel” measurements.
Quantization: Number of levels permitted in pixel values.

Image Representation (cont.)
Sampling: e.g., 4 by 4, square grid, 1 pixel/cm.
Quantization: e.g., binary, {0, 1}, 0 = black, 1 = white.
[Figure: example 4 by 4 binary image]

Aliasing due to Under-sampling
Here the apparent frequency is about 1/5 the true frequency.
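
The 1/5 ratio can be reproduced numerically. The following is a minimal sketch, assuming a sinusoid of 5 cycles per unit length sampled at 6 samples per unit length; these values and the function name `apparent_frequency` are illustrative assumptions, not taken from the slide.

```python
import math

def apparent_frequency(true_freq, sample_rate):
    """Fold true_freq into the band [0, sample_rate / 2] that sampling can represent."""
    folded = math.fmod(true_freq, sample_rate)
    return min(folded, sample_rate - folded)

true_freq = 5.0    # cycles per unit length (illustrative assumption)
sample_rate = 6.0  # samples per unit length; below the Nyquist rate of 10

alias = apparent_frequency(true_freq, sample_rate)
print(alias / true_freq)  # 0.2, i.e. the apparent frequency is 1/5 of the true one

# The samples of the true sinusoid and of its alias coincide at every sample point.
for n in range(12):
    t = n / sample_rate
    true_sample = math.cos(2 * math.pi * true_freq * t)
    alias_sample = math.cos(2 * math.pi * alias * t)
    assert abs(true_sample - alias_sample) < 1e-9
```

Sampling at more than twice the true frequency (here, more than 10 samples per unit length) would leave the frequency unchanged by the folding step.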

Quantization
Capturing a wide dynamic range of brightness levels or colors requires fine quantization. A common choice is 256 levels for each of red, green, and blue.
Segmentation is simplified by having a small number of levels, provided foreground and background pixels are reliably distinguished by their dark or light value.
Grayscale thresholding is typically used to reduce the number of quantization levels to 2.
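
Quantization and thresholding can be sketched in a few lines. This is a hedged illustration only: the helper names `quantize` and `threshold`, the sample image, and the threshold value 128 are assumptions made for the example; the 0 = black, 1 = white convention follows the earlier slide.

```python
def quantize(pixel, levels, max_value=255):
    """Map a pixel in [0, max_value] to one of `levels` equal-width bins, 0 .. levels - 1."""
    return min(pixel * levels // (max_value + 1), levels - 1)

def threshold(image, t):
    """Reduce a grayscale image to 2 levels: 1 (white) if pixel >= t, else 0 (black)."""
    return [[1 if p >= t else 0 for p in row] for row in image]

# A made-up 4 by 4 grayscale image with a bright region on a dark background.
image = [
    [12,  40, 200, 230],
    [30,  25, 210, 220],
    [15, 180, 190,  35],
    [20,  22,  28,  18],
]

binary = threshold(image, 128)
for row in binary:
    print(row)

print(quantize(200, 4))  # 3: the brightest of 4 levels
```

Thresholding separates the two regions here only because the foreground and background values are well apart; with overlapping brightness ranges no single threshold would work.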

Vision as Inferring Information from Clues
Deriving 3D structure from 2D info requires additional information: e.g., constraints.
Deriving global descriptions from local data requires information fusion, i.e., inference.

Stereo Vision as an AI Problem
Projection from 3 dimensions to 2 loses information.
With 2 projections, we can gain back some of that information.
Recovering the missing information is an inference problem.
The missing information is constrained by knowledge about the real world and assumptions about the scene.
The use of knowledge and assumptions to make inferences is a standard approach in artificial intelligence.

Stereograms
Two-view stereograms:
1. Spatially separated left-eye/right-eye pair (including virtual-reality goggles).
2. Superimposed, with separation using color filters.
3. Superimposed, with temporal shuttering.
4. Superimposed, with separation using polarizing filters.
Single-view stereograms:
1. Magic-eye pictures with depth-modulated carrier.
2. Wallpaper offering depth effects due to its periodicity.

Geometry of Stereograms

Computing Correspondence
Approach 1: Extract features and find a consistent matching of features in each view.
Approach 2: Directly compute a disparity map, performing local correlations of the views.
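
Approach 2 can be illustrated on a single scanline. The sketch below is not the method from the lecture: it assumes the two views are aligned so that matching points lie on the same row, and it uses the sum of absolute differences (SAD) as a simple stand-in for a local correlation measure. The function name `disparity_row` and the parameters `window` and `max_disp` are hypothetical.

```python
def disparity_row(left, right, window=1, max_disp=4):
    """For each pixel x of `left`, find the shift d in 0..max_disp that minimizes
    the SAD between the window around left[x] and the window around right[x - d]."""
    n = len(left)
    result = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            cost = 0
            for k in range(-window, window + 1):
                xl = min(max(x + k, 0), n - 1)      # clamp indices at the borders
                xr = min(max(x + k - d, 0), n - 1)
                cost += abs(left[xl] - right[xr])
            if cost < best_cost:                    # ties keep the smaller shift
                best_d, best_cost = d, cost
        result.append(best_d)
    return result

# The same pattern (9, 5, 7) appears 2 pixels further right in the left view.
right = [0, 0, 9, 5, 7, 0, 0, 0, 0, 0]
left  = [0, 0, 0, 0, 9, 5, 7, 0, 0, 0]

disparities = disparity_row(left, right)
print(disparities[4:7])  # [2, 2, 2]
```

In uniform regions every shift matches equally well, so the disparity there is ambiguous; this is one reason knowledge and assumptions about the scene are needed to resolve correspondences.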

Inferring Trends via Voting Methods
The classical Hough transform identifies prominent lines in a scene by letting each edge point vote for the line(s) it is on.
Voting methods can do well under noisy conditions.
Votes are tallied in an array of accumulators, indexed by θ and ρ (the polar parameters of a line):
ρ = x cos θ + y sin θ.

Letting a Point Vote for all the Lines that Pass Through It

Hough Transform: Polar representation
ρ = x cos θ + y sin θ.
[Figure: a line through the point (x, y), at perpendicular distance ρ from the origin (0, 0), with its normal at angle θ.]
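
The polar form can be checked on a small example. This is a minimal sketch with made-up points; the helper name `rho_for` is hypothetical. It shows that collinear points share one (θ, ρ) pair, and that a single point yields a different ρ for each θ, i.e. one vote per line through it.

```python
import math

def rho_for(x, y, theta):
    """Signed distance from the origin of the line through (x, y) whose normal is at angle theta."""
    return x * math.cos(theta) + y * math.sin(theta)

# Three points on the line x + y = 4, whose normal direction is 45 degrees.
points = [(0, 4), (1, 3), (4, 0)]
theta = math.pi / 4
rhos = [rho_for(x, y, theta) for x, y in points]
print(rhos)  # all approximately 2.828 (= 4 / sqrt(2))

# One point votes for many lines: a different rho for each theta.
for degrees in (0, 45, 90):
    print(degrees, rho_for(1, 3, math.radians(degrees)))
```

For the point (1, 3), θ = 0 gives ρ = 1 (the vertical line x = 1) and θ = 90 degrees gives ρ = 3 (the horizontal line y = 3).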

Hough Transform (cont.)
Nondirectional, unweighted Hough transform:
H(θ, ρ) = Σ_x Σ_y f(x, y) δ(x cos θ + y sin θ - ρ)
where δ(t) = 1 if |t| < 1, and 0 otherwise.
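
The accumulation above can be sketched directly. This is a simplified illustration, not a definitive implementation: it assumes the image f is a 2-D list of 0/1 values, samples θ in 1-degree steps over [0, π), and lets each pixel vote only for the nearest ρ bin rather than for every bin with |x cos θ + y sin θ - ρ| < 1. The function name `hough_transform` and its parameters are hypothetical.

```python
import math

def hough_transform(image, n_theta=180, rho_step=1.0):
    """Return (acc, max_rho), where acc[theta_index][rho_index] holds the vote counts."""
    height, width = len(image), len(image[0])
    max_rho = math.hypot(width, height)           # bound on |rho| for any pixel
    n_rho = int(2 * max_rho / rho_step) + 1
    acc = [[0] * n_rho for _ in range(n_theta)]
    for y in range(height):
        for x in range(width):
            if image[y][x] == 0:
                continue                          # only nonzero pixels vote
            for ti in range(n_theta):
                theta = math.pi * ti / n_theta
                rho = x * math.cos(theta) + y * math.sin(theta)
                ri = int(round((rho + max_rho) / rho_step))  # shift so the index is >= 0
                acc[ti][ri] += image[y][x]
    return acc, max_rho

# An 8 by 8 image containing the vertical line x = 3.
image = [[1 if x == 3 else 0 for x in range(8)] for y in range(8)]
acc, max_rho = hough_transform(image)

ri = int(round(3 + max_rho))   # bin for rho = 3
print(acc[0][ri])              # 8: every pixel of the line votes for theta = 0, rho = 3
```

Because the bins are coarse, neighboring angles can collect the same number of votes for a short line; finer bins or a longer line make the peak sharper.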

Gestalt Grouping

Gestalt Grouping
Texture element = “texel”
Texel directionality
Texel granularity
Alignments of endpoints
Spacing of texels
Groups serve as cues for surfaces and objects.