Discrimination and Classification


The Optimal Classification Rule. Suppose that the data x1, …, xp have joint density function f(x1, …, xp; θ), where θ is either θ1 or θ2. Let g(x1, …, xp) = f(x1, …, xp; θ1) and h(x1, …, xp) = f(x1, …, xp; θ2). We want to make the decision D1: θ = θ1 (g is the correct distribution) against D2: θ = θ2 (h is the correct distribution).

Then the optimal regions (minimizing the ECM, the expected cost of misclassification) for making decisions D1 and D2 are, respectively,
C1 = { (x1, …, xp) : g(x1, …, xp) / h(x1, …, xp) ≥ k } and
C2 = { (x1, …, xp) : g(x1, …, xp) / h(x1, …, xp) < k },
where k = [c(1|2)/c(2|1)] · [P(π2)/P(π1)], c(i|j) is the cost of classifying an observation into population i when it actually comes from population j, and P(π1), P(π2) are the prior probabilities of the two populations.
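The rule above is simply a likelihood-ratio comparison against a cost- and prior-adjusted threshold. A minimal Python sketch (my own illustration, not part of the slides; the two bivariate normal densities, costs and priors in the example are arbitrary assumptions):

    import numpy as np
    from scipy.stats import multivariate_normal

    def ecm_classify(x, g_density, h_density, c_1_given_2, c_2_given_1, p1, p2):
        """Return 1 for decision D1 (population 1), 2 for decision D2 (population 2)."""
        k = (c_1_given_2 / c_2_given_1) * (p2 / p1)      # cost/prior threshold
        return 1 if g_density(x) / h_density(x) >= k else 2

    # Illustrative densities: two bivariate normals with assumed means and covariances
    g = multivariate_normal(mean=[0, 0], cov=np.eye(2)).pdf
    h = multivariate_normal(mean=[2, 2], cov=np.eye(2)).pdf
    print(ecm_classify(np.array([0.5, 0.2]), g, h,
                       c_1_given_2=1.0, c_2_given_1=1.0, p1=0.5, p2=0.5))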

Fisher's Linear Discriminant Function. Suppose that x1, …, xp is data from a p-variate Normal distribution with mean vector either μ1 (population π1) or μ2 (population π2). The covariance matrix Σ is the same for both populations π1 and π2.

We make the decision D1 (the population is π1) if
a′x ≥ ½ a′(μ1 + μ2) + ln{[c(1|2)/c(2|1)] · [P(π2)/P(π1)]},
where a = Σ⁻¹(μ1 − μ2) and x = (x1, …, xp)′. With equal misclassification costs and equal prior probabilities the cut-off reduces to ½ a′(μ1 + μ2), the midpoint between the projected means a′μ1 and a′μ2.

In the case where the population parameters are unknown, they are estimated from the data: the mean vectors μ1 and μ2 are replaced by the sample means x̄1 and x̄2, and Σ by the pooled sample covariance matrix S, giving the sample version of Fisher's linear discriminant function.

The linear function (x̄1 − x̄2)′S⁻¹x is called Fisher's linear discriminant function.
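A short sketch of this sample (plug-in) rule, assuming equal misclassification costs and equal priors so that the cut-off is the midpoint between the projected means (the data matrices X1 and X2 are hypothetical; this is an illustration, not code from the slides):

    import numpy as np

    def fisher_ldf(X1, X2):
        """Estimate Fisher's linear discriminant from two samples (rows = observations)."""
        xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
        n1, n2 = len(X1), len(X2)
        # pooled sample covariance matrix
        S = ((n1 - 1) * np.cov(X1, rowvar=False) +
             (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
        a = np.linalg.solve(S, xbar1 - xbar2)    # a = S^{-1}(xbar1 - xbar2)
        k = 0.5 * a @ (xbar1 + xbar2)            # cut-off: midpoint of the projected means
        return a, k

    def classify(x, a, k):
        """Decision D1 (population 1) if a'x >= k, otherwise population 2."""
        return 1 if a @ x >= k else 2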

Discrimination between p-variate Normal distributions (unequal covariance matrices). Suppose that x1, …, xp is data from a p-variate Normal distribution with mean vector either μ1 or μ2, and with covariance matrix Σ1 or Σ2 respectively.

The optimal rule states that we should classify into population π1 or π2 by comparing the log of the likelihood ratio, ℓ = ln[g(x1, …, xp)/h(x1, …, xp)], with the cut-off k = ln{[c(1|2)/c(2|1)] · [P(π2)/P(π1)]}. That is, we make the decision D1 (the population is π1) if ℓ ≥ k.

Writing the log-likelihood ratio out for the two Normal densities, this becomes: make the decision D1 if
−½ x′(Σ1⁻¹ − Σ2⁻¹)x + (μ1′Σ1⁻¹ − μ2′Σ2⁻¹)x ≥ k + ½ ln(|Σ1|/|Σ2|) + ½(μ1′Σ1⁻¹μ1 − μ2′Σ2⁻¹μ2).

Summarizing, we make the decision to classify into population π1 if
x′Ax + b′x ≥ c,
where A = −½(Σ1⁻¹ − Σ2⁻¹), b′ = μ1′Σ1⁻¹ − μ2′Σ2⁻¹, and c = k + ½ ln(|Σ1|/|Σ2|) + ½(μ1′Σ1⁻¹μ1 − μ2′Σ2⁻¹μ2). The decision boundary is quadratic rather than linear.
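A rough sketch of this quadratic rule with plug-in sample estimates (my own illustration rather than material from the slides; it assumes equal costs and equal priors, so k = 0):

    import numpy as np

    def quadratic_rule(X1, X2):
        """Build the quadratic discriminant from two training samples (rows = observations)."""
        m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
        S1, S2 = np.cov(X1, rowvar=False), np.cov(X2, rowvar=False)
        S1_inv, S2_inv = np.linalg.inv(S1), np.linalg.inv(S2)
        A = -0.5 * (S1_inv - S2_inv)
        b = S1_inv @ m1 - S2_inv @ m2
        c = (0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S2))
             + 0.5 * (m1 @ S1_inv @ m1 - m2 @ S2_inv @ m2))
        # decision D1 (population 1) if x'Ax + b'x >= c
        return lambda x: 1 if x @ A @ x + b @ x >= c else 2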

Discrimination of p-variate Normal distributions (unequal covariance matrices)

Classification or Cluster Analysis: here we have data from one or several populations.

Situation: we have multivariate (or univariate) data from one or several populations, where the number of populations is unknown. We want to determine the number of populations and to identify the populations.

Example

Hierarchical Clustering Methods. The following are the steps in the agglomerative hierarchical clustering algorithm for grouping N objects (items or variables):
1. Start with N clusters, each consisting of a single entity, and an N × N symmetric matrix (table) of distances (or similarities) D = (dij).
2. Search the distance matrix for the nearest (most similar) pair of clusters. Let the distance between the "most similar" clusters U and V be dUV.
3. Merge clusters U and V. Label the newly formed cluster (UV). Update the entries in the distance matrix by deleting the rows and columns corresponding to clusters U and V and adding a row and column giving the distances between cluster (UV) and the remaining clusters.

4. Repeat steps 2 and 3 a total of N − 1 times. (All objects will be in a single cluster at termination of the algorithm.) Record the identity of the clusters that are merged and the levels (distances or similarities) at which the mergers take place.
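These steps can be sketched directly in Python. The following minimal single-linkage implementation is my own illustration (the slides do not give code); the scipy example after the five-object illustration below runs the same computation:

    import numpy as np

    def single_linkage(D, labels):
        """Agglomerative clustering with single linkage on a symmetric distance matrix D."""
        D = D.astype(float).copy()
        np.fill_diagonal(D, np.inf)                         # ignore self-distances
        clusters = [frozenset([lab]) for lab in labels]
        merges = []
        while len(clusters) > 1:
            i, j = np.unravel_index(np.argmin(D), D.shape)  # nearest pair of clusters
            i, j = min(i, j), max(i, j)
            merges.append((clusters[i], clusters[j], D[i, j]))
            new_row = np.minimum(D[i], D[j])                # single-linkage update: keep the minimum
            D[i], D[:, i] = new_row, new_row
            D[i, i] = np.inf
            D = np.delete(np.delete(D, j, axis=0), j, axis=1)
            clusters[i] = clusters[i] | clusters[j]
            del clusters[j]
        return merges                                       # list of (cluster, cluster, merge distance)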

Different methods of computing inter-cluster distance
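For reference, the usual choices of inter-cluster distance between two clusters A and B (index sets into a distance matrix D) can be written as a small helper; this is an illustration of mine, not code from the slides:

    def cluster_distance(D, A, B, method="single"):
        """Distance between clusters A and B under a given linkage rule."""
        pair_dists = [D[i][j] for i in A for j in B]
        if method == "single":      # nearest neighbour
            return min(pair_dists)
        if method == "complete":    # farthest neighbour
            return max(pair_dists)
        if method == "average":     # average of all pairwise distances
            return sum(pair_dists) / len(pair_dists)
        raise ValueError("unknown linkage method: " + method)

    # e.g. cluster_distance(D, {0, 1}, {3}, method="complete") with D a square matrix (list of lists)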

Example. To illustrate the single linkage algorithm, we consider a hypothetical distance matrix between pairs of five objects; its entries appear in the minimum calculations that follow.

Treating each object as a cluster, the clustering begins by merging the two closest items (3 and 5). To implement the next level of clustering we need to compute the distances between cluster (35) and the remaining objects:
d(35)1 = min{3, 11} = 3
d(35)2 = min{7, 10} = 7
d(35)4 = min{9, 8} = 8
The distance matrix is then updated with these values.

In the updated matrix, the next two closest clusters, (35) and 1, are merged to form cluster (135).

Distances between this cluster and the remaining clusters become:
d(135)2 = min{7, 9} = 7
d(135)4 = min{8, 6} = 6
Continuing, the next two closest clusters, 2 and 4, are merged to form cluster (24).

The distance between the two remaining clusters becomes:
d(135)(24) = min{d(135)2, d(135)4} = min{7, 6} = 6
At the final step, clusters (135) and (24) are merged at this distance to form the single cluster (12345) of all five items.
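This worked example can be reproduced with scipy. The full 5 × 5 distance matrix is not shown in the transcript: the entries 3, 11, 7, 10, 9, 8, 9 and 6 are implied by the min{…} computations above, while d35 = 2 and d24 = 5 are assumed values consistent with the merge order:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram
    from scipy.spatial.distance import squareform

    # Distance matrix for objects 1-5; d35 = 2 and d24 = 5 are assumptions (see above).
    D = np.array([[ 0,  9,  3,  6, 11],
                  [ 9,  0,  7,  5, 10],
                  [ 3,  7,  0,  9,  2],
                  [ 6,  5,  9,  0,  8],
                  [11, 10,  2,  8,  0]], dtype=float)

    Z = linkage(squareform(D), method='single')   # single-linkage agglomeration
    print(Z)                                      # each row: the two clusters merged and the merge distance
    # dendrogram(Z, labels=[1, 2, 3, 4, 5])       # uncomment to draw the dendrogram (needs matplotlib)

With these assumed values the merge heights come out as 2, 3, 5 and 6, matching the order of merges described above.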

The results of this algorithm can be summarized graphically in the following dendrogram.

Dendrograms for clustering the 11 languages on the basis of the ten numerals.

Dendrogram: cluster analysis of N = 22 utility companies (Euclidean distance, average linkage).

Dendrogram: cluster analysis of N = 22 utility companies (Euclidean distance, single linkage).