Special Topic on Image Retrieval 2014-03
Popular Visual Features
Global features: color correlation histogram, shape context, GIST, color name
Local feature detectors: DoG, MSER, Hessian Affine, KAZE, FAST
Local feature descriptors: SIFT, SURF, LIOP, BRIEF, ORB, FREAK, BRISK, CARD
2-D color images – Color histograms
- Each color image is a 2-D array of pixels; each pixel has 3 color components (R, G, B)
- h colors, each denoting a point in 3-D color space (up to $2^{24}$ colors)
- For each image, compute the h-element color histogram: each component is the percentage of pixels that are most similar to that color
- The histogram of image I is defined, for a color $c_i$, as $H_{c_i}(I) \triangleq |\{\, p \in I : \mathrm{color}(p) = c_i \,\}|$, i.e. the number of pixels of color $c_i$ in image I; equivalently, $H_{c_i}(I)$ divided by the number of pixels is the probability that a pixel of I has color $c_i$
2-D color images – Color histograms
- Usually cluster similar colors together and choose one representative color for each 'color bin'
- Most commercial CBIR systems include the color histogram as one of their features
- No spatial information is captured
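A minimal NumPy sketch of such a quantized color histogram (the uniform 4-bins-per-channel quantization is an illustrative assumption; real systems often cluster colors into bins instead):

```python
import numpy as np

def color_histogram(image, bins_per_channel=4):
    """Quantized RGB color histogram (a sketch).

    image: H x W x 3 uint8 array.
    Returns a normalized histogram with bins_per_channel**3 entries,
    i.e. the fraction of pixels falling into each color bin.
    """
    # Map each 0..255 channel value to a bin index 0..bins_per_channel-1
    q = (image.astype(np.int64) * bins_per_channel) // 256
    # Combine the three per-channel indices into one bin id per pixel
    bin_ids = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(bin_ids.ravel(), minlength=bins_per_channel ** 3)
    return hist / hist.sum()
```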
Color histograms - distance
One method to measure the distance between two histograms x and y is the quadratic-form distance
$d_A(x, y) = \sqrt{(x - y)^{T} A\,(x - y)}$,
where the color-to-color similarity matrix A has entries $a_{ij}$ that describe the similarity between color i and color j.
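A small sketch of this quadratic-form distance, assuming the similarity matrix A is given:

```python
import numpy as np

def quadratic_form_distance(x, y, A):
    """d_A(x, y) = sqrt((x - y)^T A (x - y)) for histograms x, y (length h)
    and an h x h color-to-color similarity matrix A with entries a_ij."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    A = np.asarray(A, dtype=float)
    # max() guards against tiny negative values if A is not exactly PSD
    return float(np.sqrt(max(d @ A @ d, 0.0)))
```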
Color Correlation Histogram (Color Correlogram)
Given any pixel of color $c_i$ in the image, the correlogram gives the probability that a pixel at distance k away from it has color $c_j$:
$\gamma^{(k)}_{c_i, c_j}(I) \triangleq \Pr_{p_1 \in I_{c_i},\, p_2 \in I}\!\left[\, p_2 \in I_{c_j} \,\middle|\, |p_1 - p_2| = k \,\right]$
Color Auto-correlogram
The auto-correlogram of image I for color $c_i$ at distance k keeps only same-color pairs:
$\alpha^{(k)}_{c_i}(I) \triangleq \gamma^{(k)}_{c_i, c_i}(I)$
It integrates both color information and spatial information.
(Figure: two pixels P1 and P2 of the same color at distance k in image I.)
Color auto-correlogram
Implementation
Pixel distance measure: use the D8 (chessboard) distance
$|p_1 - p_2| \triangleq \max(|x_1 - x_2|,\ |y_1 - y_2|)$
Co-occurrence count:
$\Gamma^{(k)}_{c_i, c_j}(I) \triangleq |\{\, (p_1, p_2) : p_1 \in I_{c_i},\ p_2 \in I_{c_j},\ |p_1 - p_2| = k \,\}|$
Then:
$\gamma^{(k)}_{c_i, c_j}(I) = \dfrac{\Gamma^{(k)}_{c_i, c_j}(I)}{H_{c_i}(I) \cdot 8k}$
The denominator is the total number of pixels at distance k from any pixel of color $c_i$ (each pixel has 8k neighbors at D8 distance exactly k, ignoring image boundaries).
Computational complexity: the straightforward algorithm takes $O(n^2 d^2)$ time for an $n \times n$ image and distances $k = 1, \dots, d$.
Efficient Implementation with Dynamic Programming
Define $\lambda^{h}_{c}(x, y; k) \triangleq |\{\, (x+i, y) : 0 \le i \le k,\ I(x+i, y) = c \,\}|$ (and $\lambda^{v}_{c}$ analogously) to count the number of pixels of a given color within a given distance from a fixed pixel in the positive horizontal/vertical direction.
Then: $\lambda^{h}_{c}(x, y; k) = \lambda^{h}_{c}(x, y; k-1) + \lambda^{h}_{c}(x+k, y; 0)$
With initial condition: $\lambda^{h}_{c}(x, y; 0) = 1$ if $I(x, y) = c$, and $0$ otherwise.
Since we do $O(n^2)$ work for each k, the total time taken is $O(n^2 d)$. $\lambda^{v}_{c}$ can also be computed in a similar way.
Finally, the co-occurrence count is assembled from the boundary of the D8 square of radius k:
$\Gamma^{(k)}_{c_i, c_j}(I) = \sum_{(x,y) \in I_{c_i}} \big[\, \lambda^{h}_{c_j}(x-k, y+k; 2k) + \lambda^{h}_{c_j}(x-k, y-k; 2k) + \lambda^{v}_{c_j}(x-k, y-k+1; 2k-2) + \lambda^{v}_{c_j}(x+k, y-k+1; 2k-2) \,\big]$
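For reference, a direct (unoptimized) sketch of the auto-correlogram that simply scans the D8 ring of each radius; the distance set and the in-bounds normalization are illustrative choices, and the dynamic-programming scheme above is what makes this practical on large images:

```python
import numpy as np

def auto_correlogram(labels, num_colors, distances=(1, 3, 5, 7)):
    """Naive color auto-correlogram (a sketch).

    labels: H x W array of quantized color indices in [0, num_colors).
    Returns an array of shape (num_colors, len(distances)): entry (c, i)
    estimates Pr[pixel at D8 distance distances[i] has color c | center
    pixel has color c].
    """
    H, W = labels.shape
    result = np.zeros((num_colors, len(distances)))
    for ki, k in enumerate(distances):
        # Offsets on the square ring at D8 (chessboard) distance exactly k
        ring = [(dx, dy) for dy in range(-k, k + 1) for dx in range(-k, k + 1)
                if max(abs(dx), abs(dy)) == k]
        same = np.zeros(num_colors)   # same-color pairs at distance k
        total = np.zeros(num_colors)  # all in-bounds pairs at distance k
        for y in range(H):
            for x in range(W):
                c = labels[y, x]
                for dx, dy in ring:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        total[c] += 1
                        if labels[ny, nx] == c:
                            same[c] += 1
        result[:, ki] = same / np.maximum(total, 1)
    return result
```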
Distance Metric: Features and Distance Measures
Images $I_1$ and $I_2$ are similar when $D(f(I_1), f(I_2))$ is small.
Example: $f(a) = 1000$, $f(a') = 1050$; $f(b) = 100$, $f(b') = 150$. The absolute differences are equal (50), but relatively $a'$ is much closer to $a$ than $b'$ is to $b$, which motivates a relative (d1) distance.
For the histogram: $|I - I'|_{h, d_1} \triangleq \sum_{i} \dfrac{|H_{c_i}(I) - H_{c_i}(I')|}{1 + H_{c_i}(I) + H_{c_i}(I')}$
For the correlogram: $|I - I'|_{\gamma, d_1} \triangleq \sum_{i, j, k} \dfrac{|\gamma^{(k)}_{c_i, c_j}(I) - \gamma^{(k)}_{c_i, c_j}(I')|}{1 + \gamma^{(k)}_{c_i, c_j}(I) + \gamma^{(k)}_{c_i, c_j}(I')}$
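A tiny sketch of this relative ("d1") distance, reproducing the example numbers:

```python
import numpy as np

def d1_distance(f1, f2):
    """Relative L1 ('d1') distance between two feature vectors
    (histograms or flattened correlograms):
        sum_i |f1_i - f2_i| / (1 + f1_i + f2_i)
    """
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    return float(np.sum(np.abs(f1 - f2) / (1.0 + f1 + f2)))

# The motivating example: equal absolute gaps, very different relative gaps.
print(d1_distance([1000], [1050]))  # ~0.024
print(d1_distance([100], [150]))    # ~0.199
```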
Color Histogram vs Correlogram
If there is no difference between the query and the target images, both methods perform well.
(Figure: top-5 retrieval results of the correlogram method and the histogram method for a 512-color query image.)
Color Histogram vs Correlogram
The correlogram method is more stable to color change than the histogram method.
(Example: the target image is ranked 1st by the correlogram method but only 48th by the histogram method.)
Color Histogram vs Correlogram
The correlogram method is more stable to large appearance change than the histogram method.
(Example: the target image is ranked 1st by the correlogram method but only 31st by the histogram method.)
Color Histogram vs Correlogram
The correlogram method is more stable to contrast and brightness changes than the histogram method.
(Example ranks of the target image, correlogram (C) vs. histogram (H): Query 1: C 178th, H 230th; Query 2: C 1st, H 1st; Query 3: C 1st, H 3rd; Query 4: C 5th, H 18th.)
Color Histogram vs Correlogram
The color correlogram describes the global distribution of local spatial correlations of colors. It is easy to compute, and it is more stable than the color histogram method.
Popular Visual Features
Global features: color correlation histogram, shape context, GIST, color name
Local feature detectors: DoG, MSER, Hessian Affine, KAZE, FAST
Local feature descriptors: SIFT, GLOH, SURF, LIOP, BRIEF, ORB, FREAK, BRISK, CARD
Shape Context What points on these two sampled contours are most similar? How do you know?
Shape context descriptor [Belongie et al. '02]
For each point, count the number of other points inside each bin of a log-polar histogram of relative positions (e.g., Count = 4 in one bin, Count = 10 in another).
A compact representation of the distribution of points relative to each point.
(Shape context slides from Belongie et al.)
Shape context descriptor
Comparing shape contexts
Compute matching costs using the chi-squared distance between the two points' histograms $g_i$ and $h_j$:
$C_{ij} = \frac{1}{2} \sum_{k} \frac{\left[g_i(k) - h_j(k)\right]^2}{g_i(k) + h_j(k)}$
Recover correspondences by solving for the least-cost assignment, using the costs $C_{ij}$.
(Then use a deformable template match, given the correspondences.)
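A compact sketch of the pipeline up to the assignment step, using SciPy's Hungarian solver; the bin counts and the log-radius range are illustrative assumptions rather than the exact settings of Belongie et al.:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def shape_context(points, n_r=5, n_theta=12):
    """Log-polar shape context histograms for a set of 2-D points (a sketch).

    points: N x 2 array. Returns an N x (n_r * n_theta) array of
    normalized histograms, one per point.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    diff = points[None, :, :] - points[:, None, :]   # diff[i, j] = points[j] - points[i]
    r = np.linalg.norm(diff, axis=2)                 # pairwise distances
    theta = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)
    r_norm = r / (r.mean() + 1e-12)                  # normalize for scale invariance
    # Log-spaced radial bin edges and uniform angular bins
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    r_bin = np.clip(np.digitize(r_norm, r_edges) - 1, 0, n_r - 1)
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)
    hists = np.zeros((n, n_r * n_theta))
    for i in range(n):
        for j in range(n):
            if i == j or r_norm[i, j] > r_edges[-1]:
                continue                             # skip self and far-away points
            hists[i, r_bin[i, j] * n_theta + t_bin[i, j]] += 1
    return hists / np.maximum(hists.sum(axis=1, keepdims=True), 1)

def chi2_costs(G, H):
    """C_ij = 0.5 * sum_k (g_i(k) - h_j(k))^2 / (g_i(k) + h_j(k))."""
    num = (G[:, None, :] - H[None, :, :]) ** 2
    den = G[:, None, :] + H[None, :, :] + 1e-12
    return 0.5 * (num / den).sum(axis=2)

# Correspondences via least-cost assignment (A, B are N x 2 point sets):
# rows, cols = linear_sum_assignment(chi2_costs(shape_context(A), shape_context(B)))
```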
Invariance / Robustness
- Translation, scaling, rotation
- Modeling transformations: thin plate splines (TPS), a generalization of cubic splines to 2-D
- Matching cost = f(shape context distances, bending energy of the thin plate spline)
- Appearance information can be added too
- Outliers?
An example of shape context-based matching
Some retrieval results
Popular Visual Features
Global features: color correlation histogram, shape context, GIST, color name
Local feature detectors: DoG, MSER, Hessian Affine, KAZE, FAST
Local feature descriptors: SIFT, SURF, LIOP, BRIEF, ORB, FREAK, BRISK, CARD
GIST Feature
Definition and background:
- The essence, holistic characteristics of an image
- Context information obtained within an eye saccade (approx. 150 ms)
- Evidence of place-recognizing cells in the Parahippocampal Place Area (PPA)
- Biologically plausible models of gist are yet to be proposed
Nature of tasks done with gist:
- Scene categorization / context recognition
- Region priming / layout recognition
- Resolution / scale selection
Human Vision Architecture
- Visual Cortex: low-level filters, center-surround, and normalization
- Saliency Model: attend to pertinent regions
- Gist Model: compute general characteristics of the image
- High-Level Vision: object recognition, layout recognition, scene understanding
Gist Model
- Utilizes the same Visual Cortex raw features as the saliency model [Itti 2001]
- Gist is theoretically non-redundant with saliency
Gist vs. Saliency:
- Instead of looking at the most conspicuous locations in the image, gist looks at the scene as a whole
- Detection of regularities, not irregularities
- Cooperation (accumulation) vs. competition (winner-take-all, WTA) among locations
- More spatial emphasis in saliency
- Local vs. global/regional interaction
Gist Model Implementation: Raw Image Feature Maps
- Orientation channel: Gabor filters at 4 angles (0, 45, 90, 135 degrees) on 4 scales = 16 sub-channels
- Color channel: red-green and blue-yellow center-surround, each with 6 scale combinations = 12 sub-channels
- Intensity channel: dark-bright center-surround with 6 scale combinations = 6 sub-channels
- Total of 34 sub-channels
Gist Model Implementation: Gist Feature Extraction
Average the values of each sub-channel over a predetermined grid of image regions (4 x 4 = 16 regions per sub-channel).
Gist Model Implementation: Dimension Reduction
- Original: 34 sub-channels x 16 grid features = 544 features
- PCA/ICA reduction to 80 features, keeping >95% of the variance
- Rationale: the original features carry too much redundancy; a drawback is that the reduction matrix is too random to decipher
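A rough sketch of the orientation part of such a gist descriptor using OpenCV Gabor filters and grid averaging; the filter parameters and grid size are illustrative assumptions, and the color and intensity channels are omitted:

```python
import cv2
import numpy as np

def gist_orientation_channel(gray, n_orientations=4, n_scales=4, grid=4):
    """Gabor responses at several orientations/scales, averaged over a
    coarse grid of image regions (a minimal sketch of the orientation
    channel only)."""
    gray = np.float32(gray) / 255.0
    features = []
    for s in range(n_scales):
        # Larger kernels / wavelengths for coarser scales (an assumption)
        ksize = 11 + 8 * s
        lambd = 4.0 * (2 ** s)
        for o in range(n_orientations):
            theta = np.pi * o / n_orientations  # 0, 45, 90, 135 degrees
            # Arguments: ksize, sigma, theta, lambda, gamma, psi
            kernel = cv2.getGaborKernel((ksize, ksize), 0.5 * lambd, theta,
                                        lambd, 0.5, 0)
            response = np.abs(cv2.filter2D(gray, cv2.CV_32F, kernel))
            # Average the response over a grid x grid set of regions
            h, w = response.shape
            for gy in range(grid):
                for gx in range(grid):
                    cell = response[gy * h // grid:(gy + 1) * h // grid,
                                    gx * w // grid:(gx + 1) * w // grid]
                    features.append(cell.mean())
    # 4 scales x 4 orientations x 16 grid cells = 256 values
    return np.array(features)
```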
System Example Run
Popular Visual Features
Global features: color correlation histogram, shape context, GIST, color name
Local feature detectors: DoG, MSER, Hessian Affine, KAZE, FAST
Local feature descriptors: SIFT, GLOH, SURF, LIOP, BRIEF, ORB, FREAK, BRISK, CARD
Color Name: Chip-Based vs. Real-World
Basic Color Terms
The English language has 11 basic color terms. These are defined by the linguists Berlin and Kay as those color names:
- which are applied to diverse classes of objects;
- whose meaning is not subsumable under one of the other basic color terms;
- which are used consistently and with consensus by most speakers of the language.
Learning Color Names
Color names are learned with an adapted Probabilistic Latent Semantic Analysis (PLSA-bg).
Google set: 1100 images retrieved with Google Image search, 100 images per color name.
Popular Visual Features
Global features: color correlation histogram, shape context, GIST, color name
Local feature detectors: DoG, MSER, Hessian Affine, FAST
Local feature descriptors: SIFT, SURF, LIOP, BRIEF, ORB, FREAK, BRISK, CARD
Blob Detector: MSER (Maximally Stable Extremal Regions)
Blob Detector: MSER
Extremal/Maximal Regions
Definition: the set of connected components obtained by thresholding the image at every possible level and keeping the pixels whose intensity is above the threshold (every pixel inside such a region is brighter than the pixels on its outer boundary).
Extremal/Minimal Regions
Definition: the set of connected components obtained by thresholding the image at every possible level and keeping the pixels whose intensity is below the threshold (every pixel inside such a region is darker than the pixels on its outer boundary).
Maximally stable extremal regions (MSER)
Among all extremal regions, keep those whose area stays nearly constant over a range of thresholds ("maximally stable").
(Figure: examples of the image thresholded at a high and at a low threshold level.)
MSER
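A short usage sketch of MSER detection with OpenCV (default parameters; the image path is a placeholder):

```python
import cv2

# Detect MSER regions in a gray-scale image and draw their convex hulls.
gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)

mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)   # pixel lists + bounding boxes

vis = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
for region in regions:
    hull = cv2.convexHull(region.reshape(-1, 1, 2))
    cv2.polylines(vis, [hull], True, (0, 255, 0), 1)
cv2.imwrite("mser_regions.jpg", vis)
```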
Popular Visual Features
Global features: color correlation histogram, shape context, GIST, color name
Local feature detectors: DoG, MSER, Hessian Affine, FAST
Local feature descriptors: SIFT, GLOH, SURF, LIOP, BRIEF, ORB, FREAK, BRISK, CARD
GLOH: Gradient Location-Orientation Histogram (Mikolajczyk and Schmid 2005)
A SIFT-like descriptor computed on a log-polar location grid instead of SIFT's square grid: 17 location bins x 16 orientation bins = 272 dimensions, reduced to 128 dimensions by PCA.
(Figure: SIFT's square spatial grid vs. GLOH's log-polar grid.)
Popular Visual Features
Global features: color correlation histogram, shape context, GIST, color name
Local feature detectors: DoG, MSER, Hessian Affine, FAST
Local feature descriptors: SIFT, GLOH, SURF, LIOP, BRIEF, ORB, FREAK, BRISK, CARD
SURF: Speeded Up Robust Features (ECCV 2006, CVIU 2008)
- Uses integral images for a major speed-up
- An integral image (summed-area table) is an intermediate representation that stores, at each location, the sum of the gray-scale pixel values above and to the left of it
- Second-order derivative and Haar-wavelet responses then reduce to rectangular sums, each costing only three additions and four memory accesses, independent of the rectangle size
Detection: Hessian-based interest point localization
$\mathcal{H}(\mathbf{x}, \sigma) = \begin{pmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\ L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) \end{pmatrix}$
Here $L_{xx}(\mathbf{x}, \sigma)$ is the convolution of the image with the second-order Gaussian derivative $\frac{\partial^2}{\partial x^2} g(\sigma)$ (and similarly for $L_{xy}$, $L_{yy}$).
Lindeberg showed that the Gaussian is optimal for scale-space analysis; still, the Gaussian is arguably overrated, since the property that no new structures appear when going to lower resolutions has not been proven in the 2-D case.
Detection: approximate the second-order Gaussian derivatives with box filters (mean/average filters).
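A small sketch of the integral-image machinery that makes these box filters cheap:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[:y, :x] (one row/column of
    zero padding so that empty sums are handled cleanly)."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] from the integral image:
    three additions/subtractions, independent of the box size."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

# Sanity check: the building block of SURF's box-filter approximation
img = np.arange(36).reshape(6, 6)
ii = integral_image(img)
assert box_sum(ii, 1, 1, 4, 4) == img[1:4, 1:4].sum()
```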
Detection: scale analysis with a constant image size
Scale spaces are usually implemented as image pyramids: the images are repeatedly smoothed with a Gaussian and then sub-sampled to reach the next pyramid level. Thanks to box filters and integral images, we do not have to apply the same filter iteratively to the output of a previously filtered layer; instead, filters of any size can be applied at exactly the same speed directly on the original image.
Filter sizes: 9 x 9, 15 x 15, 21 x 21, 27 x 27 for the 1st octave; ..., 39 x 39, 51 x 51, ... for the 2nd octave.
Detection: non-maximum suppression and interpolation. The result is a blob-like feature detector.
Description: Orientation Assignment
- Compute Haar-wavelet responses in the x and y directions within a circular neighborhood of radius 6s around the interest point (s = the scale at which the point was detected)
- Wavelet side length = 4s
- With integral images, each response costs only six operations
Description: Dominant Orientation
- The Haar-wavelet responses are represented as vectors
- Sum all responses within a sliding orientation window covering an angle of 60 degrees
- The two summed responses yield a new vector
- The longest such vector gives the dominant orientation
- The second longest is ... ignored
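A sketch of the sliding-window idea (the number of window positions and the absence of Gaussian weighting are simplifications relative to the SURF paper; dx, dy, and their angles are assumed to be precomputed for the sampled neighborhood points):

```python
import numpy as np

def dominant_orientation(dx, dy, angles, window=np.pi / 3):
    """Sliding 60-degree window over the Haar responses of the sampled
    neighborhood points; returns the angle of the longest summed vector."""
    best_len, best_angle = 0.0, 0.0
    # 72 window positions (5-degree steps) is an illustrative choice
    for start in np.linspace(0, 2 * np.pi, 72, endpoint=False):
        # Select responses whose angle falls inside the window
        diff = (angles - start) % (2 * np.pi)
        mask = diff < window
        sx, sy = dx[mask].sum(), dy[mask].sum()   # the two summed responses
        length = np.hypot(sx, sy)                 # length of the new vector
        if length > best_len:
            best_len, best_angle = length, np.arctan2(sy, sx)
    return best_angle
```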
Description
- Split the interest region into 4 x 4 square sub-regions, each with 5 x 5 regularly spaced sample points inside
- Calculate the Haar-wavelet responses dx and dy at each sample point
- Weight the responses with a Gaussian kernel centered at the interest point
- Sum the responses over each sub-region for dx and dy separately: a feature vector of length 4 x 4 x 2 = 32
- To bring in information about the polarity of the intensity changes, also extract the sums of the absolute values |dx| and |dy|: a feature vector of length 64
- Normalize the vector to unit length
Description
Description: SURF-128
- The sums of dx and |dx| are computed separately for dy < 0 and dy > 0
- Similarly, the sums of dy and |dy| are split according to the sign of dx
- This doubles the length of the feature vector (64 to 128)
Matching
- Fast indexing through the sign of the Laplacian of the underlying interest point
- The sign of the trace of the Hessian matrix: trace = Lxx + Lyy
- Stored as a single bit, 0 or 1 (a hard thresholding, so it may have a boundary effect)
- In the matching stage, only compare features that have the same type of contrast (same sign)
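A sketch of matching that uses the stored Laplacian sign as a cheap filter before comparing descriptors; the distance threshold and the Euclidean metric are illustrative assumptions:

```python
import numpy as np

def match_with_sign(desc1, signs1, desc2, signs2, max_dist=0.3):
    """Brute-force matching that only compares descriptors whose Laplacian
    signs agree (the sign of trace(H) = Lxx + Lyy, stored as 0/1)."""
    matches = []
    for i, (d1, s1) in enumerate(zip(desc1, signs1)):
        # Keep only candidates with the same type of contrast
        candidates = [j for j, s2 in enumerate(signs2) if s2 == s1]
        if not candidates:
            continue
        dists = [np.linalg.norm(d1 - desc2[j]) for j in candidates]
        best = int(np.argmin(dists))
        if dists[best] < max_dist:            # accept only close matches
            matches.append((i, candidates[best]))
    return matches
```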
Experimental Results
Experimental Results Viewpoint change of 30 degrees
Popular Visual Features
Global features: color correlation histogram, shape context, GIST, color name
Local feature detectors: DoG, MSER, Hessian Affine, FAST
Local feature descriptors: SIFT, SURF, LIOP, BRIEF, ORB, FREAK, BRISK, CARD
LIOP: Local Intensity Order Pattern for Feature Description (2011)
Motivation: orientation estimation errors in SIFT.
(Figure: orientation assignment errors. (a) Between corresponding points, only 63.77% of the errors fall in the range [-20, 20] degrees. (b) Between corresponding points that are also matched by SIFT.)
LIOP: Local Intensity Order Pattern for Feature Description
LIOP: Local Intensity Order Pattern for Feature Description
Popular Visual Features
Global features: color correlation histogram, shape context, GIST, color name
Local feature detectors: DoG, MSER, Hessian Affine, FAST
Local feature descriptors: SIFT, SURF, LIOP, BRIEF, ORB, FREAK, BRISK, CARD
BRIEF: Binary Robust Independent Elementary Features (2010)
Binary test on a smoothed patch p:
$\tau(\mathbf{p};\, x, y) = \begin{cases} 1 & \text{if } \mathbf{p}(x) < \mathbf{p}(y) \\ 0 & \text{otherwise} \end{cases}$
BRIEF descriptor:
$f_{n_d}(\mathbf{p}) = \sum_{1 \le i \le n_d} 2^{\,i-1}\, \tau(\mathbf{p};\, x_i, y_i)$
For each S x S patch: smooth it, then pick pixel pairs using the pre-defined binary tests.
Smoothing kernels: de-noising with Gaussian kernels before the binary tests.
Spatial arrangement of the binary tests (sampling strategies):
- (X, Y) ~ i.i.d. Uniform over the patch
- (X, Y) ~ i.i.d. Gaussian centered at the patch center
- X ~ Gaussian centered at the patch center, Y ~ Gaussian centered at x_i
- (X, Y) randomly sampled from discrete locations of a coarse polar grid, introducing a spatial quantization
- x_i fixed at the patch center and y_i takes all possible values on a coarse polar grid containing n_d points
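A minimal sketch of a BRIEF-style descriptor using the Gaussian sampling strategy above; the patch size, number of tests, and variance are illustrative assumptions:

```python
import numpy as np

def make_tests(patch_size=31, n_tests=256, seed=0):
    """Pre-defined test locations, drawn i.i.d. from a Gaussian centered on
    the patch (the variance is an illustrative choice)."""
    rng = np.random.default_rng(seed)
    sigma = patch_size / 5.0
    pts = rng.normal(0.0, sigma, size=(n_tests, 4))
    half = patch_size // 2
    # Columns are (x1, y1, x2, y2) in patch coordinates 0..patch_size-1
    return np.clip(np.round(pts), -half, half).astype(int) + half

def brief_descriptor(image, keypoint, tests, patch_size=31):
    """Binary descriptor of one keypoint: tau(p; x, y) = 1 if p(x) < p(y).
    'image' is assumed to be pre-smoothed (e.g. Gaussian blur), and
    keypoints too close to the border are assumed to be filtered out."""
    x, y = keypoint
    half = patch_size // 2
    patch = image[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    x1, y1, x2, y2 = tests[:, 0], tests[:, 1], tests[:, 2], tests[:, 3]
    return (patch[y1, x1] < patch[y2, x2]).astype(np.uint8)  # n_tests bits

# Descriptors are compared with the Hamming distance:
# dist = np.count_nonzero(desc_a != desc_b)
```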
Distance Distributions
Experiments