Visual matching: distance measures


Metric and non-metric distances: what distance to use. It is generally assumed that visual data can be thought of as vectors (e.g. histograms) that are compared for similarity using the Euclidean distance or, more generally, metric distances. Given a set S of patterns, a distance d: S × S → R is a metric if it satisfies:
- Self-identity: ∀ x ∈ S, d(x,x) = 0
- Positivity: ∀ x ≠ y ∈ S, d(x,y) > 0
- Symmetry: ∀ x,y ∈ S, d(x,y) = d(y,x)
- Triangle inequality: ∀ x,y,z ∈ S, d(x,z) ≤ d(x,y) + d(y,z)
However, this may not be a valid assumption. A number of approaches in computer vision compare images using measures of similarity that are neither Euclidean nor even metric, in that they do not obey the triangle inequality or symmetry.

The most notable cases where non-metric distances are suited are:
- Recognition systems that attempt to faithfully reflect human judgments of similarity. Much research in psychology suggests that human similarity judgments are not metric and that perceived distances are not symmetric.
- Matching of subsets of the images while ignoring the most dissimilar parts. In this case non-metric distances are less affected by extreme differences than the Euclidean distance and are more robust to outliers; distance functions that are robust to outliers or to extremely noisy data will typically violate the triangle inequality.
- Comparison between data that are the output of a complex algorithm, such as image comparison with a deformable template matching scheme, where there is no obvious way of ensuring that the triangle inequality holds.

Human judgements of similarity: color histograms. Feature vectors are often in the form of histograms that collect the distribution of salient features. Several distances can be defined between histograms; the choice depends on the goals of matching and on the type of histogram. When histograms collect grey levels or colors and human perceptual similarity must be accounted for, non-metric distances are preferred. (Figure: example color histograms for which the Euclidean distance is, and is not, appropriate.)

Human judgements of similarity: perceptual symmetry. Symmetry does not always hold for human perception: for two stimuli A and B it may happen that d(A,B) < d(B,A).

Human judgements of similarity: subset matching. Partial similarity is a non-metric relation, since the triangle inequality may fail: d(A,B) + d(B,C) < d(A,C). In the centaur example (slide from M. Bronstein), a man (A) is partially similar to a centaur (B), which is partially human, and the centaur is partially similar to a horse (C), being partially equine, yet the man and the horse are not similar to each other.

Similarity from complex algorithms. Shape deformation similarity is non-metric: similarity can be assessed by minimizing the energy of deformation E spent while maximizing the matching M between edges.

Metric distances. Feature matching where image data are represented by vectors is well suited to metric distances. Many metric distance measures are possible, among them:
- Heuristic: Minkowski-form distances (L1, L2/Euclidean, Hamming)
- Geometric: cosine distance
- Working with distributions (histograms): Weighted-Mean-Variance (WMV)

Minkowski distance. The Lp metrics, also called Minkowski distances, are defined for two feature vectors A = (x1,…,xn) and B = (y1,…,yn) as dp(A,B) = (Σi |xi − yi|^p)^(1/p).
- L1: City Block or Manhattan distance, d1(A,B) = Σi |xi − yi|
- L2: Euclidean distance, d2(A,B) = (Σi |xi − yi|²)^(1/2)
- L∞: max, or chessboard, distance, d∞(A,B) = maxi |xi − yi|
The L1 and L2 norms are the most used because of their low computational cost.
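A minimal Python sketch of the Lp family (the example vectors are illustrative, not taken from the slides):

```python
import numpy as np

def minkowski(a, b, p):
    """L_p (Minkowski) distance between two feature vectors a and b."""
    d = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    if np.isinf(p):
        return d.max()                   # L_inf: chessboard distance
    return (d ** p).sum() ** (1.0 / p)

a, b = [2, 3, 5], [3, 7, 1]
print(minkowski(a, b, 1))        # L1 (Manhattan) = 9
print(minkowski(a, b, 2))        # L2 (Euclidean) ~ 5.74
print(minkowski(a, b, np.inf))   # L_inf = 4
```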

L1 and L2 distance. The L1 and L2 distances d(H1,H2) are widely used to compare histogram representations. When comparing color histograms with the Manhattan or Euclidean distance, two issues must be taken into account: both the L1 and the Euclidean distance produce many false negatives because neighboring bins are not considered, and the L2 distance is only suited to perceptually uniform color spaces such as Lab and Luv.

Hamming distance. The Hamming distance assumes that histograms are binary vectors; with color histograms it detects the presence or absence of colors by counting the bins in which the two vectors differ, e.g. d(1 0 0 0 0, 1 0 0 1 0) = 1.
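As a quick illustration (the binary presence/absence histograms below are made up):

```python
import numpy as np

h1 = np.array([1, 0, 0, 0, 0])   # hypothetical binary color-presence histogram
h2 = np.array([1, 0, 0, 1, 0])

hamming = int(np.sum(h1 != h2))  # number of bins whose presence/absence differs
print(hamming)                   # 1
```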

Cosine distance. The cosine distance derives from the definition of the dot product between two vectors: geometrically, the dot product multiplies the length of a by the length of the component of b that points in the same direction as a, so the cosine distance measures how much a is not aligned with b. Only the angle between the vectors is relevant, not their lengths, and the associated angular distance is a metric. Example: with F1 = 2x1 + 3x2 + 5x3, F2 = 3x1 + 7x2 + x3 and Q = 0x1 + 0x2 + 2x3, Q is closer to F1 than to F2.
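A small sketch reproducing the slide's example, with F1, F2 and Q written as coordinate triples:

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cos(angle between a and b); 0 when the vectors are aligned."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

F1 = [2, 3, 5]   # F1 = 2x1 + 3x2 + 5x3
F2 = [3, 7, 1]   # F2 = 3x1 + 7x2 + x3
Q  = [0, 0, 2]   # Q  = 0x1 + 0x2 + 2x3

print(cosine_distance(Q, F1))   # ~0.19 -> Q is closer to F1
print(cosine_distance(Q, F2))   # ~0.87
```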

Weighted-Mean-Variance. The Weighted-Mean-Variance (WMV) distance includes some minimal information about the data distribution, comparing the means and standard deviations of the two distributions normalized by the spread of these statistics over the data set. WMV is particularly fast because these values can be precomputed offline.

Non-metric distances.
With vector data:
- Heuristic: Minkowski-form with p < 1, Mahalanobis
Working with distributions (histograms):
- Nonparametric test statistics: Kolmogorov-Smirnov (KS), Cramer/von Mises (CvM), χ² (Chi Square)
- Ground distance measures: histogram intersection, Quadratic Form (QF), Earth Mover's Distance (EMD)
- Information-theory divergences: Kullback-Leibler (KL), Jeffrey divergence (JD)

Effects of variance and covariance on Euclidean distance. The ellipse shows the 50% contour of a hypothetical population. The Euclidean distance is not suited to account for differences in variance between the variables or for correlations between variables: points A and B have similar Euclidean distances from the mean, but point B is more different from the population than point A. This is particularly critical for effects connected to human perception in low-level feature image matching. In these cases the Mahalanobis distance should be used.

Mahalanobis (Quadratic) distance. The quadratic form distance accounts for correlation between features: d(A,B) = √((A − B)ᵀ W (A − B)). For the Mahalanobis distance, W is the inverse of the covariance matrix, whose diagonal terms are the variances in each dimension and whose off-diagonal terms indicate the dependency between variables. Properties: metric only if wij = wji and wii = 1, non-metric otherwise.
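A minimal sketch of the quadratic form / Mahalanobis computation; the 2-D covariance matrix and the points are invented for illustration:

```python
import numpy as np

def quadratic_form_distance(x, y, W):
    """d(x, y) = sqrt((x - y)^T W (x - y)).
    With W equal to the inverse covariance matrix this is the Mahalanobis
    distance; with W equal to a bin-similarity matrix it is the quadratic
    form distance used for color histograms."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ W @ d))

cov = np.array([[4.0, 1.8],   # hypothetical covariance of two correlated variables
                [1.8, 1.0]])
W = np.linalg.inv(cov)
print(quadratic_form_distance([2.0, 1.0], [0.0, 0.0], W))
```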

Geometric interpretation of metric distances

The Mahalanobis distance is used for color histogram similarity as it closely resembles human perception: d(H1,H2) = √((H1 − H2)ᵀ A (H1 − H2)), A = [aij] being the similarity matrix whose entries denote the similarity between bins i and j of the N feature vectors x1,…,xN, each of length n. In image retrieval the quadratic form distance results in false positives because it tends to overestimate the mutual similarity of color distributions without a pronounced mode: the same mass in a given bin of the first histogram is simultaneously made to correspond to masses contained in different bins of the other histogram.

Histogram intersection. Histogram intersection checks the occurrence of an object within a region even when Hobj[j] < Hreg[j]: d(Hobj,Hreg) = Σj min(Hobj[j], Hreg[j]) / Σj Hobj[j]. Because of the normalization by only one of the two histograms, histogram intersection is not symmetric. It is widely used because of its ability to handle partial matches when the areas of the two histograms are different.
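A sketch of the (asymmetric) intersection score, with made-up 4-bin histograms:

```python
import numpy as np

def histogram_intersection(h_reg, h_obj):
    """Fraction of the object histogram found in the region histogram;
    the normalisation by h_obj makes the measure asymmetric."""
    h_reg = np.asarray(h_reg, dtype=float)
    h_obj = np.asarray(h_obj, dtype=float)
    return np.minimum(h_reg, h_obj).sum() / h_obj.sum()

h_obj = [10, 0, 5, 5]   # hypothetical object histogram
h_reg = [8, 4, 9, 2]    # hypothetical region histogram
print(histogram_intersection(h_reg, h_obj))   # 0.75
print(histogram_intersection(h_obj, h_reg))   # ~0.65: not symmetric
```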

Cumulative difference distances. Kolmogorov-Smirnov distance (KS): dKS(I,J) = maxx |Fr(I;x) − Fr(J;x)|. Cramer/von Mises distance (CvM): dCvM(I,J) = Σx (Fr(I;x) − Fr(J;x))², where Fr(I;·) is the marginal cumulative histogram distribution of image I. Both Kolmogorov-Smirnov and Cramer/von Mises are statistical measures of the underlying similarity of two unbinned distributions; they work only for 1D data or cumulative histograms, and both are symmetric.

Cumulative histogram. The cumulative histogram describes the probability that a random variable X with a certain pdf will be found at a value less than or equal to x. (Figure: a normal histogram and the corresponding cumulative histogram.)

Cumulative difference example. (Figure: the cumulative histograms of two distributions, their difference, and the resulting CvM and K-S values.)
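A minimal sketch of both cumulative-difference measures on binned 1-D histograms (the histograms below are invented for illustration):

```python
import numpy as np

def ks_distance(h1, h2):
    """Kolmogorov-Smirnov: max absolute difference of the cumulative histograms."""
    c1 = np.cumsum(h1) / np.sum(h1)
    c2 = np.cumsum(h2) / np.sum(h2)
    return np.max(np.abs(c1 - c2))

def cvm_distance(h1, h2):
    """Cramer/von Mises: sum of squared differences of the cumulative histograms."""
    c1 = np.cumsum(h1) / np.sum(h1)
    c2 = np.cumsum(h2) / np.sum(h2)
    return np.sum((c1 - c2) ** 2)

h1 = [0, 2, 5, 3, 0]   # hypothetical 1-D histograms
h2 = [1, 4, 3, 1, 1]
print(ks_distance(h1, h2), cvm_distance(h1, h2))
```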

χ² distance. The χ² distance measures the underlying similarity of two samples while emphasizing their differences: dχ²(H1,H2) = Σi (H1[i] − mi)² / mi, where mi = (H1[i] + H2[i]) / 2 is the expected frequency of bin i. It measures how unlikely it is that one distribution was drawn from the population represented by the other. The major drawback of this measure is that it accounts only for the correspondence between bins with the same index and does not use information across bins.
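A sketch of the χ² histogram distance as defined above (the example histograms are made up):

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-square distance; m = (h1 + h2) / 2 is the expected frequency per bin."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    m = (h1 + h2) / 2.0
    return np.sum((h1 - m) ** 2 / (m + eps))   # eps guards against empty bins

print(chi2_distance([2, 5, 3, 0], [1, 4, 3, 2]))   # ~1.22
```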

Earth Mover's distance. The Earth Mover's Distance (EMD) between two distributions represents the minimum work needed to morph one distribution into the other. Given feature vectors with their associated weights, A = {(xi, wi)} and B = {(yj, uj)}, and a flow fij expressing the amount of mass that can flow from xi to yj over a ground distance dij, the Earth Mover's Distance is defined as
EMD(A,B) = minf Σi Σj fij dij / Σi Σj fij
provided that: fij ≥ 0, Σj fij ≤ wi, Σi fij ≤ uj, Σi Σj fij = min(W,U), with W = Σi wi and U = Σj uj.
Properties: it respects scaling; it is a metric if d is a metric and W = U; if W ≠ U there is no positivity (the surplus is not taken into account) and no triangle inequality.

Informally, if the two distributions represent two different ways of piling up the same amount of material over a region D, the Earth Mover's Distance is given by the amount of mass moved times the distance by which it is moved, where fij is the amount of mass moved from xi to yj and dij is the distance from xi to yj. In the example of the figure, one flow gives EMD = 0.23·155.7 + 0.51·252.3 + 0.26·316.3 = 246.7, while the optimal flow gives EMDopt = 0.23·155.7 + 0.25·252.3 + 0.26·198.2 + 0.26·277 = 224.4.

Earth Mover's distance with histograms. The Earth Mover's Distance is widely used for color, edge and motion vector histograms; it is the only measure that works on distributions with a different number of bins, but it has a high computational cost. Considering two histograms H1 and H2 defined, e.g., in a color space, pixels can be regarded as the unit of mass to be transported from one distribution to the other. The EMD has to be based on some metric of distance between the individual features (the ground distance).

Computing Earth Mover's Distance: the total cost is the sum over all flows of (amount moved) × (distance moved).

With variable-length representations. The Earth Mover's Distance can be applied to evaluate the distance between histograms with different numbers of bins (a signature P with m clusters and a signature Q with n clusters), subject to the few constraints listed below.

Constraints:
1. Move "earth" only from P to Q.
2. Cannot send more "earth" than there is.
3. Q cannot receive more "earth" than it can hold.
4. As much "earth" as possible must be moved.
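A minimal 1-D illustration, assuming NumPy and SciPy are available: for 1-D histograms with equal total mass the EMD coincides with the first Wasserstein distance, which scipy.stats provides, while general signatures (e.g. clusters in CIE Lab) require a transportation solver from a dedicated optimal-transport library. The bin positions and weights below are invented:

```python
import numpy as np
from scipy.stats import wasserstein_distance

bins = np.arange(4)                   # bin positions define the ground distance
p = np.array([0.1, 0.4, 0.3, 0.2])    # hypothetical normalized histograms
q = np.array([0.2, 0.2, 0.3, 0.3])

emd = wasserstein_distance(bins, bins, u_weights=p, v_weights=q)
print(emd)   # 0.3: minimum total (mass moved) x (distance moved)
```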

Kullback-Leibler distance. The Kullback-Leibler distance considers histograms as distributions and measures their similarity by calculating the relative entropy, i.e. the cost of encoding one distribution as another; in other words, it measures how well one distribution can be coded using the other as a codebook: dKL(H[Iq],H[Id]) = Σi Hi[Iq] log(Hi[Iq] / Hi[Id]), with Σi Hi[Iq] = Σi Hi[Id] = 1 and Hi[Iq], Hi[Id] ≥ 0. The Kullback-Leibler divergence is not symmetric. It can be used to determine "how far away" a probability distribution P is from another distribution Q, i.e. as a distance measure between two documents. However, it does not necessarily match perceptual similarity well and is sensitive to histogram binning.

Jeffrey divergence. The Jeffrey divergence is an empirical modification of the KL divergence that is numerically stable, symmetric and robust with respect to noise and the size of histogram bins: dJ(H,K) = Σi ( H[i] log(H[i]/mi) + K[i] log(K[i]/mi) ), with mi = (H[i] + K[i]) / 2.
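A sketch of both divergences on normalized histograms; the small eps used to avoid log(0) is an assumption of mine, not part of the slides:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q); not symmetric."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return np.sum(p * np.log(p / q))

def jeffrey_divergence(p, q, eps=1e-12):
    """Jeffrey divergence: symmetric variant built on m = (p + q) / 2."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = (p + q) / 2.0
    return np.sum(p * np.log(p / m) + q * np.log(q / m))

p = [0.1, 0.4, 0.3, 0.2]
q = [0.3, 0.3, 0.2, 0.2]
print(kl_divergence(p, q), kl_divergence(q, p))   # different: KL is asymmetric
print(jeffrey_divergence(p, q))                   # equals jeffrey_divergence(q, p)
```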

Distance properties summary. (Table comparing the properties of the Lp Minkowski-form distances, Weighted-Mean-Variance (WMV), χ² (Chi Square), Kolmogorov-Smirnov (KS), Cramer/von Mises (CvM), Kullback-Leibler (KL), Jeffrey divergence (JD), Quadratic Form (QF) and Earth Mover's Distance (EMD).)

Examples using color in the CIE Lab space. (Figure: retrieval results with the L1 distance, Jeffrey divergence, χ² statistics, quadratic form distance and Earth Mover's Distance.)

Image Lookup

Merging similarities. When several features are considered, the distances computed on each feature vector can be merged to evaluate the overall similarity. The combination of distances can be performed according to different policies (see the sketch below):
- Linear weighting (weighted average): combine the k feature distances di, e.g. color, texture and shape distances, as d = Σi wi di.
- Non-linear weighting (α-trimmed mean): average only the α percent highest of the k values.
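A minimal sketch of the two combination policies; the per-feature distances and the weights are invented:

```python
import numpy as np

d = np.array([0.2, 0.7, 0.4])   # hypothetical color, texture, shape distances
w = np.array([0.5, 0.3, 0.2])   # designer-chosen weights summing to 1

linear = float(np.sum(w * d))   # linear weighting (weighted average)

def alpha_trimmed_mean(dists, alpha=0.5):
    """Average only the alpha fraction of the largest per-feature distances."""
    dists = np.sort(np.asarray(dists, dtype=float))[::-1]   # largest first
    k = max(1, int(round(alpha * len(dists))))
    return float(dists[:k].mean())

print(linear, alpha_trimmed_mean(d, alpha=0.5))
```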

Distances for symbolic representations. In some cases features are represented as strings of symbols; this is the case of spatial relations, temporal features, semantic content… In these cases edit distances can be used, which count the number of changes required to transform one string into the other. The edit operations considered are:
- Insertion, where an extra character is inserted into the string
- Deletion, where a character is removed from the string
- Transposition, in which two characters are reversed in their sequence
- Substitution, which is equivalent to an insertion followed by a deletion

Hamming and Levenshtein distances. The Hamming distance (seen for histograms) is suited to computing edit distances between binary vectors, while the Needleman-Wunsch distance (a specialization of the Levenshtein edit distance) compares the components of the feature vectors, with a substitution cost given by their difference:
A: 1111111222111111111
B: 1111111444111111111
N-W distance: 6 = (4−2) + (4−2) + (4−2)
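A sketch of a Needleman-Wunsch style computation reproducing the slide's example; the substitution cost is the absolute difference between numeric symbols, and the gap (insertion/deletion) cost of 1 is an assumption, since the slides do not state it:

```python
def nw_distance(a, b, gap=1):
    """Dynamic-programming edit distance with |difference| substitution cost."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = abs(int(a[i - 1]) - int(b[j - 1]))
            D[i][j] = min(D[i - 1][j - 1] + sub,   # substitution (or match)
                          D[i - 1][j] + gap,       # deletion
                          D[i][j - 1] + gap)       # insertion
    return D[n][m]

A = "1111111222111111111"
B = "1111111444111111111"
print(nw_distance(A, B))   # 6 = (4-2) + (4-2) + (4-2)
```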