
1 Methods in Medical Image Analysis Statistics of Pattern Recognition: Classification and Clustering
Some content provided by Milos Hauskrecht, University of Pittsburgh Computer Science

2 ITK Questions?

3 Classification What is classification?

4 Classification Classification is simply the problem of separating different classes of data in some feature space. The example shown uses a linear decision boundary; other types of boundaries are possible (and often used).

5 Classification Quadratic decision boundary
These figures depict decision boundaries in two dimensions; in general, feature space is n-dimensional.

6 Features Loosely stated, a feature is a value describing something about your data points (e.g., for pixels: intensity, local gradient, distance from a landmark, etc.). Multiple (n) features are put together to form a feature vector, which defines a data point's location in n-dimensional feature space.

7 Feature Space Feature space: the theoretical n-dimensional space occupied by n input raster objects (features). Each feature represents one dimension, and its values represent positions along one of the orthogonal coordinate axes in feature space. The set of feature values belonging to a data point defines a vector in feature space. In image analysis, these features are values computed from the image, typically per pixel or per region.
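As a rough illustration (the function name, feature choices, and NumPy usage here are assumptions, not from the slides), per-pixel feature vectors for an image might be assembled like this:

    import numpy as np

    def pixel_feature_vectors(image, landmark):
        """Stack simple per-pixel features into an (n_pixels, 3) array."""
        # Feature 1: raw intensity
        intensity = image.astype(float)

        # Feature 2: local gradient magnitude
        gy, gx = np.gradient(intensity)
        gradient_magnitude = np.hypot(gx, gy)

        # Feature 3: distance from a chosen landmark pixel (row, col)
        rows, cols = np.indices(image.shape)
        distance = np.hypot(rows - landmark[0], cols - landmark[1])

        # Each row is one pixel's location in 3-dimensional feature space.
        return np.stack([intensity.ravel(),
                         gradient_magnitude.ravel(),
                         distance.ravel()], axis=1)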

8 Statistical Notation Class probability distribution:
p(x,y) = p(x | y) p(y), where x is the feature vector {x1, x2, x3, …, xn}, y is the class label, p(x | y) is the probability of x given y, and p(x,y) is the joint probability of x and y.
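For example (with made-up numbers): if p(y = 1) = 0.3 and p(x = x0 | y = 1) = 0.5, then the joint probability is p(x = x0, y = 1) = 0.5 × 0.3 = 0.15.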

9 Example: Binary Classification

10 Example: Binary Classification
Two class-conditional distributions: p(x | y = 0) and p(x | y = 1). Priors: p(y = 0) + p(y = 1) = 1.

11 Modeling Class Densities
In the text, they choose to concentrate on methods that use Gaussians to model class densities

12 Modeling Class Densities
- Note that these are identical Gaussians (i.e., equal covariance)

13 Generative Approach to Classification
Represent and learn the distribution p(x,y), then use it to define probabilistic discriminant functions, e.g. g0(x) = p(y = 0 | x) and g1(x) = p(y = 1 | x). A discriminant function determines the class of a given data point.

14 Generative Approach to Classification
Typical model: p(x,y) = p(x | y) p(y), where p(x | y) are the class-conditional distributions (densities) and p(y) are the class priors (the probability of class y). We want the posteriors of the classes, p(y | x): the model gives us p(x,y), the joint probability of the data and the class, but classification requires the posterior p(y | x).

15 Class Modeling We model the class distributions as multivariate Gaussians: x ~ N(μ0, Σ0) for y = 0 and x ~ N(μ1, Σ1) for y = 1, where N(μ, Σ) denotes a normal (Gaussian) distribution. Priors are estimated from training data, or a distribution expected to fit the data well can be chosen (e.g., a Bernoulli distribution for a coin flip).
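A minimal sketch of this modeling step, assuming labeled training data in NumPy arrays and SciPy's multivariate_normal (variable names are illustrative):

    import numpy as np
    from scipy.stats import multivariate_normal

    def fit_class_gaussians(X, y):
        """Estimate mu, Sigma for each class and the class priors.

        X: (n_samples, n_features) feature vectors
        y: (n_samples,) class labels (0 or 1)
        """
        models, priors = {}, {}
        for c in (0, 1):
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            sigma = np.cov(Xc, rowvar=False)        # sample covariance matrix
            models[c] = multivariate_normal(mean=mu, cov=sigma)
            priors[c] = len(Xc) / len(X)            # p(y = c) from class frequencies
        return models, priors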

16 Making a class decision
We need to define discriminant functions gn(x). We have two basic choices: likelihood of the data, i.e. choose the class (Gaussian) that best explains the input data x; or posterior of the class, i.e. choose the class with the higher posterior probability.

17 Calculating Posteriors
Use Bayes' rule: p(y | x) = p(x | y) p(y) / p(x). In this case, p(x) = p(x | y = 0) p(y = 0) + p(x | y = 1) p(y = 1).
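Continuing the illustrative sketch above (an assumption, not code from the slides), the posteriors follow directly from Bayes' rule, with p(x) obtained by summing the joint probability over both classes:

    def posterior(x, models, priors):
        """Return p(y = 0 | x) and p(y = 1 | x) via Bayes' rule."""
        joint = {c: models[c].pdf(x) * priors[c] for c in (0, 1)}   # p(x | y) p(y)
        evidence = joint[0] + joint[1]                              # p(x)
        return joint[0] / evidence, joint[1] / evidence

    def classify(x, models, priors):
        # Decision rule: pick the class with the larger posterior.
        g0, g1 = posterior(x, models, priors)
        return 0 if g0 >= g1 else 1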

18 Linear Decision Boundary
When covariances are the same
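A standard derivation (not shown on the slide, but consistent with the Gaussian model above) makes the linearity explicit: with a shared covariance \(\Sigma\), the quadratic terms in the log-posterior ratio cancel,

\[
\log\frac{p(y=1\mid x)}{p(y=0\mid x)}
= \bigl(\Sigma^{-1}(\mu_1-\mu_0)\bigr)^{\top} x
  \;-\; \tfrac{1}{2}\mu_1^{\top}\Sigma^{-1}\mu_1
  \;+\; \tfrac{1}{2}\mu_0^{\top}\Sigma^{-1}\mu_0
  \;+\; \log\frac{p(y=1)}{p(y=0)},
\]

which is linear in x, so setting it to zero gives a hyperplane (a linear decision boundary).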

19 Linear Decision Boundary

20 Linear Decision Boundary

21 Quadratic Decision Boundary
When covariances are different
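For completeness (again a standard result rather than slide content), with distinct covariances the quadratic terms no longer cancel:

\[
\log\frac{p(y=1\mid x)}{p(y=0\mid x)}
= -\tfrac{1}{2}(x-\mu_1)^{\top}\Sigma_1^{-1}(x-\mu_1)
  + \tfrac{1}{2}(x-\mu_0)^{\top}\Sigma_0^{-1}(x-\mu_0)
  - \tfrac{1}{2}\log\frac{|\Sigma_1|}{|\Sigma_0|}
  + \log\frac{p(y=1)}{p(y=0)},
\]

so the decision boundary (the zero set of this expression) is quadratic in x.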

22 Quadratic Decision Boundary

23 Quadratic Decision Boundary
- OK, that's it for linear classifiers for now; on to more interesting stuff: clustering

24 Clustering Basic clustering problem: distribute data into k different groups such that data points similar to each other are in the same group, where similarity between points is defined in terms of some distance metric. Clustering is useful for: similarity/dissimilarity analysis (analyzing which data points in the sample are close to each other) and dimensionality reduction (high-dimensional data replaced with a group/cluster label). In many respects clustering is a similar problem to classification.

25 Clustering

26 Clustering

27 Distance Metrics Distance is measured in some space (for our purposes, probably a feature space). A distance metric must fulfill three properties: non-negativity, d(x, y) ≥ 0, with equality only when x = y; symmetry, d(x, y) = d(y, x); and the triangle inequality, d(x, z) ≤ d(x, y) + d(y, z).

28 Distance Metrics Common simple metrics:
Euclidean: d(x, y) = sqrt( Σi (xi − yi)² ). Manhattan: d(x, y) = Σi |xi − yi|. Both work for an arbitrary k-dimensional space.
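A minimal sketch of both metrics for arbitrary k-dimensional points (NumPy assumed; names are illustrative):

    import numpy as np

    def euclidean(a, b):
        # Square root of the sum of squared coordinate differences.
        return np.sqrt(np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2))

    def manhattan(a, b):
        # Sum of absolute coordinate differences.
        return np.sum(np.abs(np.asarray(a, float) - np.asarray(b, float)))

    # Works in any k-dimensional feature space:
    print(euclidean([0, 0, 0], [1, 2, 2]))   # 3.0
    print(manhattan([0, 0, 0], [1, 2, 2]))   # 5.0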

29 Clustering Algorithms
k-Nearest Neighbor k-Means Parzen Windows

30 k-Nearest Neighbor In essence, a classifier. Requires an input parameter k, which indicates the number of neighboring points to take into account when classifying a data point. Requires training data.

31 k-Nearest Neighbor Algorithm
For each data point xn, choose its class by finding the most prominent class among the k nearest data points in the training set, using any distance measure (usually Euclidean distance).
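A minimal sketch of the algorithm (the function name, Euclidean distance choice, and majority-vote rule are assumptions):

    import numpy as np
    from collections import Counter

    def knn_classify(x, train_X, train_y, k=5):
        """Classify x by majority vote among its k nearest training points."""
        train_X = np.asarray(train_X, dtype=float)
        train_y = np.asarray(train_y)
        distances = np.linalg.norm(train_X - np.asarray(x, float), axis=1)  # Euclidean
        nearest = np.argsort(distances)[:k]      # indices of the k closest training points
        votes = Counter(train_y[nearest])
        return votes.most_common(1)[0][0]        # most prominent class wins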

32 k-Nearest Neighbor Algorithm
[Figure: scatter of positive (+) and negative (−) training points around examples e1 and q1.] Note: k = 1 partitions feature space as a Voronoi diagram. 1-nearest neighbor: the concept represented by e1. 5-nearest neighbors: q1 is classified as negative.

33 k-Nearest Neighbor Advantages: simple; general (can work with any distance measure you want). Disadvantages: requires well-classified training data; can be sensitive to the chosen k value; all attributes are used in classification, even ones that may be irrelevant. Inductive bias: we assume that a data point should be classified the same as points near it.

34 k-Means Suitable only when data points have continuous values. Groups are defined in terms of cluster centers (means). Requires an input parameter k, which indicates the number of clusters to be created. Guaranteed to converge to at least a local optimum.

35 k-Means Algorithm Randomly initialize k mean values, then repeat the next two steps until the means no longer change: (1) partition the data using a similarity measure according to the current means (each data point is assigned to whichever mean it is closest to); (2) move each mean to the center of the data in its current partition.
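A minimal sketch of the loop just described (the initialization strategy, convergence test, and names are simple assumptions):

    import numpy as np

    def kmeans(X, k, seed=0):
        """Cluster X (n_samples, n_features) into k groups; returns (means, labels)."""
        rng = np.random.default_rng(seed)
        X = np.asarray(X, dtype=float)
        # Randomly initialize the k means by picking k distinct data points.
        means = X[rng.choice(len(X), size=k, replace=False)].copy()
        while True:
            # Partition step: assign each point to its closest mean (Euclidean distance).
            distances = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
            labels = distances.argmin(axis=1)
            # Update step: move each mean to the center of its current partition.
            new_means = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else means[j] for j in range(k)])
            if np.allclose(new_means, means):   # stop when the means no longer change
                return new_means, labels
            means = new_means

Note that the random initialization is exactly what makes the result sensitive to initial mean placement, as discussed on the disadvantages slide below.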

36 k-Means

37 k-Means Advantages: simple; general (can work with any distance measure you want); requires no training phase. Disadvantages: the result is very sensitive to initial mean placement; can perform poorly on overlapping regions; doesn't work on features with non-continuous values (cluster means can't be computed). Inductive bias: we assume that a data point should be classified the same as points near it.

38 Parzen Windows Similar to k-Nearest Neighbor, but instead of using the k closest training data points, it uses all points within a kernel (window), weighting their contribution to the classification based on the kernel. As with our classification algorithms, we will consider a Gaussian kernel as the window.

39 Parzen Windows Assume a region defined by a d-dimensional Gaussian of scale σ. We can then define a window density function that sums the kernel's weight over all training points. Note that we consider all points in the training set, but if a point is far outside the kernel, its weight will be (effectively) 0, negating its influence.
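A minimal sketch of a Gaussian-window density estimate and the resulting classifier (variable names and the per-class comparison are assumptions):

    import numpy as np

    def parzen_density(x, samples, sigma=1.0):
        """Kernel density estimate at x using a d-dimensional Gaussian window.

        Every training sample contributes, but points far outside the window
        receive a weight that is effectively zero.
        """
        samples = np.asarray(samples, dtype=float)
        n, d = samples.shape
        sq_dist = np.sum((samples - np.asarray(x, float)) ** 2, axis=1)
        weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
        norm = (2.0 * np.pi * sigma ** 2) ** (d / 2.0)   # Gaussian normalizing constant
        return weights.sum() / (n * norm)

    def parzen_classify(x, class_samples, sigma=1.0):
        # Choose the class whose window density at x is highest.
        densities = {c: parzen_density(x, s, sigma) for c, s in class_samples.items()}
        return max(densities, key=densities.get)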

40 Parzen Windows Left: a small window gives high accuracy on the given test sample, but is very specific and probably generalizes poorly to new data. Right: a larger window is much more general and, for a large data set, probably more accurate.

41 Parzen Windows Advantages: more robust than k-nearest neighbor; excellent accuracy and consistency. Disadvantages: how do you choose the size of the window? Alone, kernel density estimation techniques provide little insight into the data or the problem.

