Image Modeling & Segmentation
Aly Farag and Asem Ali
Lecture #2
Intensity Model “Density Estimation”
Intensity Models
- The normalized histogram of the whole image represents the rate of occurrence of the gray levels over all classes, i.e., the empirical (mixed) density.
- Each class can be described by the histogram of the occurrences of the gray levels within that class, i.e., its marginal density.
- Intensity models describe the statistical characteristics of each class in the given image.
- The objective of the intensity model is to estimate the marginal density of each class from the mixed, normalized histogram of the occurrences of the gray levels.
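As a concrete illustration, the sketch below computes the normalized gray-level histogram of an image, which plays the role of the empirical (mixed) density from which the per-class marginal densities are later estimated. The random test image and the use of NumPy are assumptions for illustration, not part of the original slides.

```python
import numpy as np

def empirical_density(img, num_levels=256):
    """Normalized gray-level histogram of an image (empirical mixed density)."""
    # Count occurrences of each gray level.
    counts, _ = np.histogram(img.ravel(), bins=num_levels, range=(0, num_levels))
    # Normalize so the histogram sums to 1 and can be treated as a density.
    return counts / counts.sum()

# Hypothetical usage with a random 8-bit "image".
img = np.random.randint(0, 256, size=(128, 128))
p_mixed = empirical_density(img)
print(p_mixed.sum())  # 1.0
```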
Intensity Models
- Density estimation can be studied under two primary umbrellas: parametric methods and nonparametric methods.
- Nonparametric methods take the strong stance of letting the data (e.g., the pixels' gray levels) represent themselves.
- One of the core methods on which nonparametric density estimation is based is the k-nearest neighbors (k-NN) method. These approaches estimate the probability of a sample by combining the memorized responses of the k nearest neighbors of that sample in the training data.
- Nonparametric methods achieve good estimates for any input distribution as more data are observed. They are flexible: they can fit almost any data well, and no prior knowledge is required. However, they often have a high computational cost and many parameters that need to be tuned.
Nonparametric methods [1]
The probability P that a given sample x falls within a region (window) R is given by
  P = ∫_R p(x') dx'
This integral can be approximated either by the product of the value of p(x) with the area/volume V of the region,
  P ≈ p(x) V,
or by the fraction of samples that fall within the region: if we observe a large number n of fish and count the k whose length falls within the range defined by R, then k/n can be used as an estimate of P as n → ∞, i.e.,
  P ≈ k/n.
[1] Chapter 4, Duda, R., Hart, P., Stork, D., Pattern Classification, 2nd ed., John Wiley & Sons.
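A minimal sketch of this frequency approximation, assuming NumPy/SciPy, a standard normal as the (normally unknown) true density, and an arbitrary window R = [0, 0.5]: it counts how often samples fall in R and compares k/n to the true probability P.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Region R = [0.0, 0.5]; true probability P of falling in R under N(0, 1).
a, b = 0.0, 0.5
P_true = norm.cdf(b) - norm.cdf(a)

for n in (100, 10_000, 1_000_000):
    x = rng.standard_normal(n)                       # n observed samples
    k = np.count_nonzero((x >= a) & (x <= b))        # samples falling inside R
    print(f"n={n:>9}: k/n = {k / n:.4f}  (true P = {P_true:.4f})")
```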
Nonparametric methods
To get a good estimate of p(x) at each point, we need many data points (instances) in any given region R (or volume V). This can be done in two ways:
- We can fix V and take more and more samples in this volume. Then k/n → P; however, we then estimate only an average of p(x) over R, not p(x) itself, because p(x) can vary within any region of nonzero volume.
- Alternatively, we can fix n and let V → 0, so that p(x) is effectively constant in that region. However, in practice we have a finite number of training data, so as V → 0 the volume will eventually be so small that it contains no samples: k = 0 and hence p(x) = 0, a useless result!
Therefore V → 0 is not feasible; there will always be some variance in k/n and hence some averaging of p(x) within the finite, nonzero volume V. A compromise needs to be found for V so that:
- it is large enough to contain a sufficient number of samples, and
- it is small enough to justify our assumption that p(x) is constant within the chosen volume/region.
Nonparametric methods
To make sure that k/n is a good estimate of P, and consequently that p_n(x*) is a good estimate of p(x*), the following conditions need to be satisfied as n → ∞:
- V_n → 0 (the region shrinks, so the average converges to the point value),
- k_n → ∞ (the number of samples inside the region grows, so k/n converges to P),
- k_n / n → 0 (k_n grows more slowly than n, so the shrinking region captures a vanishing fraction of the data).
Nonparametric methods
There are two ways to ensure these conditions:
- Shrink an initial volume V_n as a function of n, e.g., V_n = V_1/√n. Then, as n increases, so does k_n, which can be determined from the training data. This is Parzen-window (PW) density estimation.
- Specify k_n as a function of n, e.g., k_n = √n, and let V_n grow until it encloses k_n samples. Then V_n can be determined from the training data. This is k-nearest-neighbor (k-NN) estimation (see the sketch below).
It can be shown that as n → ∞, both the k-NN and PW estimates approach the true density p(x*), provided that V_n shrinks and k_n grows proportionately with n.
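Since the k-NN estimator is only named here, the sketch below shows the idea in one dimension under assumed choices (NumPy, the schedule k_n = √n): the window around x* is grown until it contains k_n samples, and the density is k_n / (n V_n).

```python
import numpy as np

def knn_density_1d(x_star, samples, k=None):
    """k-NN density estimate at x_star: p ≈ k / (n * V_n)."""
    n = len(samples)
    k = k or max(1, int(np.sqrt(n)))             # assumed schedule k_n = sqrt(n)
    # Distance to the k-th nearest neighbor defines the window half-width.
    dists = np.sort(np.abs(samples - x_star))
    r = dists[k - 1]
    V = 2 * r                                     # 1-D "volume" (interval length)
    return k / (n * V)

# Hypothetical usage: samples from N(0, 1); p(0) should be near 1/sqrt(2*pi) ≈ 0.399.
rng = np.random.default_rng(0)
samples = rng.standard_normal(2000)
print(knn_density_1d(0.0, samples))
```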
Parzen Windows
The number of samples falling into the specified region is obtained with the help of a windowing function, hence the name Parzen windows.
We first assume that R is a d-dimensional hypercube with sides of length h, whose volume is then V = h^d.
We then define a window function φ(u), called a kernel function, to count the number of samples k that fall into R:
  φ(u) = 1 if |u_j| ≤ 1/2 for j = 1, …, d, and 0 otherwise,
  k = Σ_{i=1}^{n} φ((x − x_i)/h),
so the density estimate is
  p(x) = k/(nV) = (1/n) Σ_{i=1}^{n} (1/h^d) φ((x − x_i)/h).
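A minimal sketch of the hypercube window count, assuming NumPy; the kernel returns 1 when a sample lies inside the hypercube of side h centered on the query point x.

```python
import numpy as np

def hypercube_kernel(u):
    """phi(u) = 1 if every component of u lies within [-1/2, 1/2], else 0."""
    return np.all(np.abs(u) <= 0.5, axis=-1).astype(float)

def parzen_hypercube(x, samples, h):
    """Parzen-window estimate p(x) = k / (n * h^d) with a hypercube window."""
    samples = np.asarray(samples, dtype=float)
    samples = samples.reshape(len(samples), -1)        # shape (n, d)
    n, d = samples.shape
    k = hypercube_kernel((np.asarray(x) - samples) / h).sum()
    return k / (n * h**d)
```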
Parzen Windows
Example: given this image, D = {1, 2, 5, 1, 1, 2, 3, 5, 5, 1.5, 1.5}. For h = 1, compute p(2.5):
  d = ?
  V = ?
  n = ?
  k = ?
  p(2.5) = ?
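A minimal sketch of this computation, assuming NumPy and the hypercube window above (a sample counts if it lies within h/2 of the query point); the printed values fill in the blanks.

```python
import numpy as np

D = np.array([1, 2, 5, 1, 1, 2, 3, 5, 5, 1.5, 1.5])
h, x = 1.0, 2.5

d = 1                       # gray levels are one-dimensional
V = h ** d                  # hypercube volume
n = len(D)                  # number of samples
k = np.count_nonzero(np.abs((x - D) / h) <= 0.5)    # samples inside the window
p = k / (n * V)

print(d, V, n, k, p)        # 1 1.0 11 3 0.2727...
```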
Parzen Windows
Now consider φ(·) as a general function, typically smooth and continuous, instead of a hypercube. The general expression of the estimate remains unchanged:
  p(x) = (1/n) Σ_{i=1}^{n} (1/V_n) φ((x − x_i)/h_n).
Then p(x) is an interpolation of the φ(·)'s, where each φ(·) measures how far a given x_i is from x.
In practice, the x_i are the training data points, and we estimate p(x) by interpolating the contributions of each sample point x_i based on its distance from x, the point at which we want to estimate the density. The kernel function φ(·) provides the numerical value of this contribution.
If φ(·) is itself a density, then the estimate p_n(x) will converge to the true p(x) as n increases.
A typical choice for φ(·) is the Gaussian. The density p(x) is then estimated simply by a superposition of Gaussians, where each Gaussian is centered at a training data instance. The parameter h is then the standard deviation (width) of each Gaussian.
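A minimal sketch of the Gaussian Parzen-window estimator in one dimension, assuming NumPy; each training sample contributes a Gaussian bump of width h centered on it, and the estimate is their average.

```python
import numpy as np

def parzen_gaussian(x, samples, h):
    """p(x) = (1/n) * sum_i N(x; x_i, h^2) for 1-D samples."""
    samples = np.asarray(samples, dtype=float)
    u = (np.asarray(x)[..., None] - samples) / h           # pairwise (x - x_i)/h
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)      # standard normal kernel
    return kernel.mean(axis=-1) / h                        # average, rescaled by 1/h

# Hypothetical usage on the slide's data.
D = np.array([1, 2, 5, 1, 1, 2, 3, 5, 5, 1.5, 1.5])
print(parzen_gaussian(2.5, D, h=1.0))
```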
Image Modeling: Homework #2 (due Sept. 1st)
Assume you have n samples drawn from the normal distribution N(0,1). Use Parzen windows with a Gaussian kernel to estimate this distribution. Try different window widths and numbers of samples, i.e., try to generate a similar figure.
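A minimal sketch of one way to produce such a figure, assuming NumPy and Matplotlib; the particular grid of window widths and sample sizes is an arbitrary choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
xs = np.linspace(-4, 4, 400)
ns = [10, 100, 1000]           # numbers of samples (assumed values)
hs = [0.1, 0.5, 1.0]           # window widths (assumed values)

fig, axes = plt.subplots(len(ns), len(hs), figsize=(9, 9), sharex=True, sharey=True)
for i, n in enumerate(ns):
    samples = rng.standard_normal(n)
    for j, h in enumerate(hs):
        # Gaussian Parzen-window estimate evaluated on the grid xs.
        u = (xs[:, None] - samples) / h
        p = np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))
        axes[i, j].plot(xs, p)
        axes[i, j].set_title(f"n={n}, h={h}")
plt.tight_layout()
plt.show()
```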
Classification using Parzen Windows
In classifiers based on Parzen-window estimation, we estimate the density of each class and classify a test point by the label corresponding to the maximum posterior.
Example: one-dimensional training samples labeled by class, e.g., 10 and 20 from class c1 and 60 from class c2 (remaining values omitted).
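A minimal sketch of such a classifier, assuming NumPy; the training values below are hypothetical stand-ins for the slide's example, and equal class priors are assumed so the maximum posterior reduces to the maximum class-conditional density.

```python
import numpy as np

def parzen_gaussian(x, samples, h):
    """Gaussian Parzen-window estimate of p(x | class) for 1-D samples."""
    u = (x - np.asarray(samples, dtype=float)) / h
    return np.exp(-0.5 * u**2).mean() / (h * np.sqrt(2 * np.pi))

# Hypothetical labeled training data (stand-in for the slide's example).
train = {"c1": [10, 20, 15], "c2": [60, 55, 70]}
h = 5.0

def classify(x):
    # Equal priors assumed: pick the class with the largest estimated density.
    return max(train, key=lambda c: parzen_gaussian(x, train[c], h))

print(classify(18))   # expected: c1
print(classify(62))   # expected: c2
```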