Lecture 3 Image Pyramids Antonio Torralba, 2012. What is a good representation for image analysis? Fourier transform domain tells you “ what ” (textural.

Slides:



Advertisements
Similar presentations
CSCE 643 Computer Vision: Template Matching, Image Pyramids and Denoising Jinxiang Chai.
Advertisements

3-D Computational Vision CSc Image Processing II - Fourier Transform.
November 12, 2013Computer Vision Lecture 12: Texture 1Signature Another popular method of representing shape is called the signature. In order to compute.
University of Ioannina - Department of Computer Science Wavelets and Multiresolution Processing (Background) Christophoros Nikou Digital.
Multiscale Analysis of Images Gilad Lerman Math 5467 (stealing slides from Gonzalez & Woods, and Efros)
Fourier Transform – Chapter 13. Image space Cameras (regardless of wave lengths) create images in the spatial domain Pixels represent features (intensity,
CS 691 Computational Photography
Pyramids and Texture. Scaled representations Big bars and little bars are both interesting Spots and hands vs. stripes and hairs Inefficient to detect.
Freeman Algorithms and Applications in Computer Vision Lihi Zelnik-Manor Lecture 5: Pyramids.
Lecture05 Transform Coding.
Reminder Fourier Basis: t  [0,1] nZnZ Fourier Series: Fourier Coefficient:
CSCE 641 Computer Graphics: Image Sampling and Reconstruction Jinxiang Chai.
CSCE 641 Computer Graphics: Image Sampling and Reconstruction Jinxiang Chai.
Digital Image Processing Chapter 4: Image Enhancement in the Frequency Domain.
Wavelet Transform A very brief look.
Paul Heckbert Computer Science Department Carnegie Mellon University
CSCE 641 Computer Graphics: Fourier Transform Jinxiang Chai.
Introduction to Computer Vision CS / ECE 181B  Handout #4 : Available this afternoon  Midterm: May 6, 2004  HW #2 due tomorrow  Ack: Prof. Matthew.
Basic Concepts and Definitions Vector and Function Space. A finite or an infinite dimensional linear vector/function space described with set of non-unique.
Texture Reading: Chapter 9 (skip 9.4) Key issue: How do we represent texture? Topics: –Texture segmentation –Texture-based matching –Texture synthesis.
Multi-Resolution Analysis (MRA)
2D Fourier Theory for Image Analysis Mani Thomas CISC 489/689.
Introduction to Wavelets
Lecture 2: Image filtering
1 Computer Science 631 Lecture 4: Wavelets Ramin Zabih Computer Science Department CORNELL UNIVERSITY.
Computer Vision - A Modern Approach Set: Pyramids and Texture Slides by D.A. Forsyth Scaled representations Big bars (resp. spots, hands, etc.) and little.
Computational Photography: Fourier Transform Jinxiang Chai.
4 – Image Pyramids. Admin stuff Change of office hours on Wed 4 th April – Mon 31 st March pm (right after class) Change of time/date of last.
Transforms: Basis to Basis Normal Basis Hadamard Basis Basis functions Method to find coefficients (“Transform”) Inverse Transform.
Multiscale transforms : wavelets, ridgelets, curvelets, etc.
Digital Image Processing, 2nd ed. © 2002 R. C. Gonzalez & R. E. Woods Chapter 4 Image Enhancement in the Frequency Domain Chapter.
(1) A probability model respecting those covariance observations: Gaussian Maximum entropy probability distribution for a given covariance observation.
ENG4BF3 Medical Image Processing
Image Representation Gaussian pyramids Laplacian Pyramids
Computer Vision Spring ,-685 Instructor: S. Narasimhan Wean 5409 T-R 10:30am – 11:50am.
Chapter 2. Image Analysis. Image Analysis Domains Frequency Domain Spatial Domain.
Computer vision.
The Wavelet Tutorial: Part3 The Discrete Wavelet Transform
CAP5415: Computer Vision Lecture 4: Image Pyramids, Image Statistics, Denoising Fall 2006.
INDEPENDENT COMPONENT ANALYSIS OF TEXTURES based on the article R.Manduchi, J. Portilla, ICA of Textures, The Proc. of the 7 th IEEE Int. Conf. On Comp.
Transforms. 5*sin (2  4t) Amplitude = 5 Frequency = 4 Hz seconds A sine wave.
Low Level Visual Processing. Information Maximization in the Retina Hypothesis: ganglion cells try to transmit as much information as possible about the.
Image Processing © 2002 R. C. Gonzalez & R. E. Woods Lecture 4 Image Enhancement in the Frequency Domain Lecture 4 Image Enhancement.
University of Texas at Austin CS384G - Computer Graphics Fall 2010 Don Fussell Image processing.
Image Processing Xuejin Chen Ref:
Image Processing Edge detection Filtering: Noise suppresion.
09/19/2002 (C) University of Wisconsin 2002, CS 559 Last Time Color Quantization Dithering.
Lecture 7: Features Part 2 CS4670/5670: Computer Vision Noah Snavely.
Computer Vision Spring ,-685 Instructor: S. Narasimhan Wean 5403 T-R 3:00pm – 4:20pm.
Dr. Scott Umbaugh, SIUE Discrete Transforms.
1 CMPB 345: IMAGE PROCESSING DISCRETE TRANSFORM 2.
1 Methods in Image Analysis – Lecture 3 Fourier CMU Robotics Institute U. Pitt Bioengineering 2630 Spring Term, 2004 George Stetten, M.D., Ph.D.
Machine Vision Edge Detection Techniques ENT 273 Lecture 6 Hema C.R.
Computer vision. Applications and Algorithms in CV Tutorial 3: Multi scale signal representation Pyramids DFT - Discrete Fourier transform.
Instructor: Mircea Nicolescu Lecture 7
Last Lecture photomatix.com. Today Image Processing: from basic concepts to latest techniques Filtering Edge detection Re-sampling and aliasing Image.
Chapter 13 Discrete Image Transforms
Digital Image Processing Lecture 8: Fourier Transform Prof. Charlene Tsai.
- photometric aspects of image formation gray level images
Multiresolution Analysis (Chapter 7)
Linear Filters and Edges Chapters 7 and 8
Wavelets : Introduction and Examples
Image Enhancement in the
Fourier Transform.
Outline Linear Shift-invariant system Linear filters
Jeremy Bolton, PhD Assistant Teaching Professor
Computer Vision Lecture 16: Texture II
Outline Linear Shift-invariant system Linear filters
Lecture 4 Image Enhancement in Frequency Domain
Review and Importance CS 111.
Presentation transcript:

Lecture 3 Image Pyramids Antonio Torralba, 2012

What is a good representation for image analysis? Fourier transform domain tells you “ what ” (textural properties), but not “ where ”. Pixel domain representation tells you “ where ” (pixel location), but not “ what ”. Want an image representation that gives you a local description of image events—what is happening where.

The image through the Gaussian window Too much Too little Probably still too little… …but hard enough for now 1

Analysis of local frequency (x 0, y 0 ) Fourier basis: Gabor wavelet: We can look at the real and imaginary parts:

Gabor wavelets u 0 =0 U 0 =0.1 U 0 =0.2

8 7 John Daugman, 1988

9 Outline Linear filtering Fourier Transform Phase Sampling and Aliasing Spatially localized analysis Quadrature phase Oriented filters Motion analysis Human spatial frequency sensitivity Image pyramids 9

+ (.) 2 Quadrature filter pairs A quadrature filter is a complex filter whose real part is related to its imaginary part via a Hilbert transform along a particular axis through origin of the frequency domain.

+ (.) 2 Contrast invariance!

14 How quadrature pair filters work

15 How quadrature pair filters work + (.) 2 15

16 Gabor filter measurements for iris recognition code 18 John Daugman,

Iris code Real Imaginary Iris codes are compared using Hamming distance Images from John Daugman

18 John Daugman,

19 Outline Linear filtering Fourier Transform Phase Sampling and Aliasing Spatially localized analysis Quadrature phase Oriented filters Motion analysis Human spatial frequency sensitivity Image pyramids 19

Gabor wavelet: Tuning filter orientation: Space Fourier domain Real Imag Real Imag

21 Simple example “Steerability”-- the ability to synthesize a filter of any orientation from a linear combination of filters at fixed orientations. Filter Set: 0o0o 90 o Synthesized 30 o Response: Raw Image Taken from: W. Freeman, T. Adelson, “The Design and Use of Sterrable Filters”, IEEE Trans. Patt, Anal. and Machine Intell., vol 13, #9, pp , Sept 1991

Steerable filters Derivatives of a Gaussian: cos(  ) +sin(  ) = Freeman & Adelson 92 An arbitrary orientation can be computed as a linear combination of those two basis functions: The representation is “shiftable” on orientation: We can interpolate any other orientation from a finite set of basis functions.

Steering theorem Change from Cartesian to polar coordinates f(x,y) H(r,  ) A convolution kernel can be written using Fourier series in polar angle as: Theorem : Let T be the number of nonzero coefficients a n (r). Then, the function f can be steer with T functions.

Steering theorem for polynomials For an Nth order polynomial with even symmetry N+1 basis functions are sufficient. f(x,y) = W(r) P(x,y)

26 Steerability Important example is 2 nd derivative of Gaussian (~Laplacian): Taken from: W. Freeman, T. Adelson, “The Design and Use of Steerable Filters”, IEEE Trans. Patt, Anal. and Machine Intell., vol 13, #9, pp , Sept 1991

Two equivalent basis These two basis can use to steer 2 nd order Gaussian derivatives

Approximated quadrature filters for 2 nd order Gaussian derivatives (this approximation requires 4 basis to be steerable)

Second directional derivative of a Gaussian and its quadrature pair

Orientation analysis High resolution in orientation requires many oriented filters as basis (high order gaussian derivatives).

Orientation analysis

33 Local energy Phase ~ 0 Phase ~ 90 edge detector output A contour detector 33

34

35

36

37

38 Outline Linear filtering Fourier Transform Phase Sampling and Aliasing Spatially localized analysis Quadrature phase Oriented filters Motion analysis Human spatial frequency sensitivity Image pyramids 38

39 The space time volume

40 The space time volume xyt- space-time volume xt-slice t x y x t

41 The space time volume xt-slice t x x Static objects- vertical lines Moving objects slanted lines, slope ~ motion velocity

42 Motion signals in space-time space-time domain space time spatio-temporal Fourier transform domain note locus of energy space time power in frequency domain cos phase filter sin phase filter power in frequency domain spatio- temporal filters

43 Evidence for filter-based analysis of motion in the human visual system

44 filters to analyze motion Approximation to a square wave using a sequence of odd harmonics

45 Space-time picture of translating square wave space time sin(w x)

46 space time Space-time picture of translating fluted square wave 1/3 sin(3w x)

47 Translating Square Wave (phase advances by 90 degrees each time step)

48 Translating Fluted Square Wave (phase of lowest remaining sinusoidal component advances by 270 degrees (-90) each time step)

49 Outline Linear filtering Fourier Transform Phase Sampling and Aliasing Spatially localized analysis Quadrature phase Oriented filters Motion analysis Human spatial frequency sensitivity Image pyramids 49

Local image representations V1 sketch: hypercolumns A pixel [r,g,b] An image patch J.G.Daugman, “Two dimensional spectral analysis of cortical receptive field profiles,” Vision Res., vol.20.pp L. Wiskott, J-M. Fellous, N. Kuiger, C. Malsburg, “Face Recognition by Elastic Bunch Graph Matching”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.19(7), July 1997, pp Gabor filter pair in quadrature Gabor jet

Gabor Filter Bank or = [ ]; or = [ ];

52 Image pyramids Gaussian pyramid Laplacian pyramid Wavelet/QMF pyramid Steerable pyramid 52

53 Image pyramids

54 The Gaussian pyramid Smooth with gaussians, because –a gaussian*gaussian=another gaussian Gaussians are low pass filters, so representation is redundant. 54

55 The computational advantage of pyramids

56

57

58 Convolution and subsampling as a matrix multiply (1-d case) (Normalization constant of 1/16 omitted for visual clarity.)

59 Next pyramid level

60 The combined effect of the two pyramid levels

61

62 Gaussian pyramids used for up- or down- sampling images. Multi-resolution image analysis –Look for an object over various spatial scales –Coarse-to-fine image processing: form blur estimate or the motion analysis on very low- resolution image, upsample and repeat. Often a successful strategy for avoiding local minima in complicated estimation tasks. 62

63 1-d Gaussian pyramid matrix, for [ ] low-pass filter full-band image, highest resolution lower-resolution image lowest resolution image

64 Image pyramids

65 The Laplacian Pyramid Synthesis –Compute the difference between upsampled Gaussian pyramid level and Gaussian pyramid level. –band pass filter - each level represents spatial frequencies (largely) unrepresented at other level. 65

66 Laplacian pyramid algorithm

67 Upsampling Insert zeros between pixels, then apply a low-pass filter, [ ]

68 Showing, at full resolution, the information captured at each level of a Gaussian (top) and Laplacian (bottom) pyramid.

69 Laplacian pyramid reconstruction algorithm: recover x 1 from L 1, L 2, L 3 and x 4 G# is the blur-and-downsample operator at pyramid level # F# is the blur-and-upsample operator at pyramid level # Laplacian pyramid elements: L1 = (I – F1 G1) x1 L2 = (I – F2 G2) x2 L3 = (I – F3 G3) x3 x2 = G1 x1 x3 = G2 x2 x4 = G3 x3 Reconstruction of original image (x1) from Laplacian pyramid elements: x3 = L3 + F3 x4 x2 = L2 + F2 x3 x1 = L1 + F1 x2

70 Laplacian pyramid reconstruction algorithm: recover x 1 from L 1, L 2, L 3 and g

71 Gaussian pyramid 71

72 Laplacian pyramid 72

73 1-d Laplacian pyramid matrix, for [ ] low-pass filter high frequencies mid-band frequencies low frequencies

74 Laplacian pyramid applications Texture synthesis Image compression Noise removal 74

75 Image blending

76 Szeliski, Computer Vision, 2010

77 Image blending Build Laplacian pyramid for both images: LA, LB Build Gaussian pyramid for mask: G Build a combined Laplacian pyramid: L(j) = G(j) LA(j) + (1-G(j)) LB(j) Collapse L to obtain the blended image 77

78 Image pyramids

79 Linear transforms Linear transform Vectorized image transformed image Note: not all important transforms need to have an inverse

80 Linear transforms U= Pixels

81 Linear transforms U= U= Pixels Derivative

82 Linear transforms U= U= U -1 = Pixels Derivative Integration - No locality for reconstruction - Needs boundary

83 Haar transform U=U -1 = The simplest set of functions:

84 Haar transform U=U -1 = To code a signal, repeat at several locations: U= The simplest set of functions: U -1 = ½

85 Haar transform Reordering rows Low pass High pass Apply the same decomposition to the Low pass component: = And repeat the same operation to the low pass component, until length 1. Note: each subband is sub-sampled and has aliased signal components.

86 Haar transform The entire process can be written as a single matrix: Average Multiscale derivatives

87 Haar transform U= U -1 = Properties: Orthogonal decomposition Perfect reconstruction Critically sampled

88 2D Haar transform Basic elements:

89 2D Haar transform Basic elements: = 2 Low pass

90 2D Haar transform Basic elements: = = = = 2 Low pass

91 2D Haar transform Basic elements: = = = = Low pass

92 2D Haar transform Basic elements: = = = = Low pass High pass vertical High pass horizontal High pass diagonal 92

93 2D Haar transform Sketch of the Fourier transform

94 2D Haar transform Horizontal high pass, vertical high pass Horizontal high pass, vertical low- pass Horizontal low pass, vertical high-pass Horizontal low pass, Vertical low-pass Sketch of the Fourier transform 94

95 Simoncelli and Adelson, in “Subband coding”, Kluwer, Pyramid cascade 95

96 Wavelet/QMF representation Same number of pixels!

97 Good and bad features of wavelet/QMF filters Bad: – Aliased subbands – Non-oriented diagonal subband Good: – Not overcomplete (so same number of coefficients as image pixels). – Good for image compression (JPEG 2000). – Separable computation, so it’s fast. 97

98 What is wrong with orthonormal basis? Input Decomposition coefficients

99 What is wrong with orthonormal basis? Input (shifted by one pixel) Decomposition coefficients The representation is not translation invariant. It is not stable.

100 Shifttable transforms The representation has to be stable under typical transformations that undergo visual objects: Translation Rotation Scaling … Shiftability under space translations corresponds to lack of aliasing

101 Image pyramids

10259 Steerable Pyramid Images from: 2 Level decomposition of white circle example: Low pass residual Subbands 102

10356 Steerable Pyramid We may combine Steerability with Pyramids to get a Steerable Laplacian Pyramid as shown below Images from: Decomposition Reconstruction … … 103

10457 Steerable Pyramid We may combine Steerability with Pyramids to get a Steerable Laplacian Pyramid as shown below Images from: Decomposition Reconstruction 104

105 Simoncelli and Freeman, ICIP 1995 But we need to get rid of the corner regions before starting the recursive circular filtering

106 Reprinted from “Shiftable MultiScale Transforms,” by Simoncelli et al., IEEE Transactions on Information Theory, 1992, copyright 1992, IEEE There is also a high pass residual…

107

108 Monroe 108

109 Dog or cat?

110 Almost no dog information 110

111 Steerable pyramids Good: – Oriented subbands – Non-aliased subbands – Steerable filters – Used for: noise removal, texture analysis and synthesis, super-resolution, shading/paint discrimination. Bad: – Overcomplete – Have one high frequency residual subband, required in order to form a circular region of analysis in frequency from a square region of support in frequency. 111

112 Simoncelli and Freeman, ICIP 1995

113 Summary of pyramid representations

114 Image pyramids Shows the information added in Gaussian pyramid at each spatial scale. Useful for noise reduction & coding. Progressively blurred and subsampled versions of the image. Adds scale invariance to fixed-size algorithms. Shows components at each scale and orientation separately. Non-aliased subbands. Good for texture and feature analysis. But overcomplete and with HF residual. Bandpassed representation, complete, but with aliasing and some non-oriented subbands. Gaussian Laplacian Wavelet/QMF Steerable pyramid 114

115 Schematic pictures of each matrix transform Shown for 1-d images The matrices for 2-d images are the same idea, but more complicated, to account for vertical, as well as horizontal, neighbor relationships. Fourier transform, or Wavelet transform, or Steerable pyramid transform Vectorized image transformed image 115

116 Fourier transform = * pixel domain image Fourier bases are global: each transform coefficient depends on all pixel locations. Fourier transform real imaginary color key 1 j

117 Gaussian pyramid = * pixel image Overcomplete representation. Low-pass filters, sampled appropriately for their blur. Gaussian pyramid

118 Laplacian pyramid = * pixel image Overcomplete representation. Transformed pixels represent bandpassed image information. Laplacian pyramid

119 Wavelet (QMF) transform = * pixel imageOrtho-normal transform (like Fourier transform), but with localized basis functions. Wavelet pyramid

120 = * pixel image Over-complete representation, but non-aliased subbands. Steerable pyramid Multiple orientations at one scale Multiple orientations at the next scale the next scale… Steerable pyramid

121 Matlab resources for pyramids (with tutorial)

122 Matlab resources for pyramids (with tutorial)

123 Why use these representations? Handle real-world size variations with a constant-size vision algorithm. Remove noise Analyze texture Recognize objects Label image features Image priors can be specified naturally in terms of wavelet pyramids. 123