6. Introduction to nonparametric clustering

6. Introduction to nonparametric clustering

Regard the feature vectors x1, …, xn as a sample from some density p(x).

Parametric approach (Cheeseman, McLachlan, Raftery):
- Based on the premise that each group g is represented by a density pg that is a member of some parametric family, so that p(x) is a mixture.
- Estimate the parameters of the group densities, the mixing proportions, and the number of groups from the sample.

Nonparametric approach (Wishart, Hartigan):
- Based on the premise that distinct groups manifest themselves as multiple modes of p(x).
- Estimate the modes from the sample.
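In the parametric view, p(x) is a finite mixture. Spelled out (with notation assumed here: G groups, mixing proportions pi_g, and component parameters theta_g, none of which are fixed by the slide):

```latex
p(x) \;=\; \sum_{g=1}^{G} \pi_g \, p_g(x \mid \theta_g),
\qquad \pi_g \ge 0, \quad \sum_{g=1}^{G} \pi_g = 1.
```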

6.1 Describing the modal structure of a density

Consider the feature vectors x1, …, xn as a sample from some density p(x). Define the level set L(c; p) as the subset of feature space on which the density p(x) is greater than c.

Note:
- Level sets with multiple connected components indicate multi-modality.
- There might not be a single level set that reveals all the modes.
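In symbols, the level set defined above is the super-level region of the density:

```latex
L(c;\, p) \;=\; \{\, x : p(x) > c \,\}.
```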

The cluster tree of a density

The modal structure of a density is described by its cluster tree. Each node N of the cluster tree:
- represents a subset D(N) of feature space;
- is associated with a density level c(N).

The root node represents the entire feature space and is associated with density level c(N) = 0.

The tree is defined recursively. To determine the descendants of node N:
- Find the lowest level c for which the intersection of D(N) with L(c; p) has two connected components.
- If there is no such c, then N is a leaf of the tree; the leaves of the tree correspond to the modes.
- Otherwise, create daughter nodes representing the connected components, with associated level c.

A sketch of this recursion is given below.
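A minimal Python sketch of the recursive definition. The helpers lowest_split_level and split_into_two are hypothetical placeholders: for an actual density these operations are rarely computable exactly, which is what motivates the sample-based heuristics of section 6.3.

```python
class Node:
    """One node of the cluster tree: a subset D(N) of feature space
    together with the density level c(N) at which it appeared."""
    def __init__(self, region, level):
        self.region = region          # D(N)
        self.level = level            # c(N)
        self.children = []

def build_cluster_tree(region, level, lowest_split_level, split_into_two):
    """Recursive definition from the slide: split D(N) at the lowest
    level c where its intersection with L(c; p) has two connected
    components; if no such c exists, N is a leaf (a mode)."""
    node = Node(region, level)
    c = lowest_split_level(region)
    if c is None:                     # no split: N is a leaf <=> a mode
        return node
    for component in split_into_two(region, c):
        node.children.append(
            build_cluster_tree(component, c,
                               lowest_split_level, split_into_two))
    return node
```

The root call would be build_cluster_tree(feature_space, 0.0, ...), matching the root node's level c(N) = 0.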

Goal: Estimate the cluster tree of the underlying density p(x) from the sample feature vectors x1, …, xn.
- First step: Estimate p(x) by a density estimate p*(x) (see below).
- Second step: Compute the cluster tree of p* (perhaps approximately).

6.2 Density estimation

Consider the feature vectors x1, …, xn as a sample from some density p(x). Goal: estimate p(x).

Simplest idea: Let S(x, r) denote a sphere in feature space with radius r, centered at x. Assuming the density is roughly constant over S(x, r), the expected number of sample points in S(x, r) is k ~ n * Volume(S(x, r)) * p(x), giving p(x) ~ k / (n * Volume(S(x, r))).

- Kernel estimate: Fix the radius r; k = number of sample feature vectors in S(x, r).
- K-near-neighbor estimate: Fix the count k; r = smallest radius for which S(x, r) contains k sample feature vectors.

Many refinements have been suggested. A sketch of both basic estimators follows.
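A minimal sketch of the two estimators, translating p(x) ~ k / (n * Volume(S(x, r))) directly into code (function names are made up for illustration, and none of the refinements are included):

```python
import numpy as np
from math import gamma, pi

def ball_volume(r, d):
    """Volume of a d-dimensional sphere S(x, r) of radius r."""
    return pi ** (d / 2) / gamma(d / 2 + 1) * r ** d

def kernel_estimate(x, sample, r):
    """Fix the radius r; k = number of sample points inside S(x, r)."""
    n, d = sample.shape
    k = np.sum(np.linalg.norm(sample - x, axis=1) <= r)
    return k / (n * ball_volume(r, d))

def knn_estimate(x, sample, k):
    """Fix the count k; r = smallest radius for which S(x, r) contains
    k sample points, i.e. the distance to the k-th nearest neighbor."""
    n, d = sample.shape
    r = np.sort(np.linalg.norm(sample - x, axis=1))[k - 1]
    return k / (n * ball_volume(r, d))
```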

Example: kernel density estimate in 2-d

Swept under the rug: the choice of the sphere radius r (for the kernel estimate) or the count k (for the near-neighbor estimate) is critical! There are automatic methods. Refinements include:
- Down-weighting observations depending on their distance from the query point (sketched below).
- Adaptive estimation: vary the radius r depending on the density.
- Other types of estimates, etc. (there is an extensive literature).
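The down-weighting refinement might look like the following with a Gaussian kernel (a sketch; the bandwidth h plays the role of r and still has to be chosen, which is exactly the issue swept under the rug above):

```python
import numpy as np

def gaussian_kernel_estimate(x, sample, h):
    """Kernel estimate that down-weights each observation smoothly
    by its distance from the query point x (Gaussian kernel,
    bandwidth h), instead of counting points inside a hard sphere."""
    n, d = sample.shape
    sq_dist = np.sum((sample - x) ** 2, axis=1)
    weights = np.exp(-sq_dist / (2 * h ** 2))
    norm = (2 * np.pi * h ** 2) ** (d / 2)   # Gaussian normalizing constant
    return weights.sum() / (n * norm)
```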

Computational complexity

Computing a kernel or near-neighbor estimate at a query point x requires finding nearest neighbors of x in the sample x1, …, xn. One can find the k nearest neighbors of x in time ~ log n using spatial partitioning schemes such as k-d trees, after ~ n log n pre-processing (see the sketch below).

However:
- Spatial partitioning is most effective if n is large relative to d.
- Theoretical analysis shows that the number of nearest neighbors should increase with n and decrease with the dimensionality d: k ~ n^(4 / (d + 4)). Relevance?
- In low dimensions (d <= 4) one can use histogram or average shifted histogram density estimates based on regular binning; evaluation at a query point then takes constant time, after pre-processing ~ n.
- High dimensionality may present a problem.
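For example, with SciPy's k-d tree on synthetic data (a sketch; the rule k ~ n^(4/(d+4)) is used here to pick k):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
sample = rng.normal(size=(10_000, 2))      # n = 10000 points in d = 2
n, d = sample.shape

tree = cKDTree(sample)                     # ~ n log n pre-processing
k = int(round(n ** (4 / (d + 4))))         # k ~ n^(4/(d+4))
dist, idx = tree.query(sample[0], k=k)     # k nearest neighbors, ~ log n
r = dist[-1]                               # radius of the k-NN sphere S(x, r)
```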

6.3 Recursive algorithms for constructing a cluster tree

For most density estimates p*(x), computing level sets and finding their connected components is a daunting problem, especially in high dimensions.

Idea: Compute a sample cluster tree instead. Each node N of the sample cluster tree:
- represents a subset X(N) of the sample;
- is associated with a density level c(N).

The root node represents the entire sample and is associated with density level c(N) = 0.

To determine the descendants of node N:
- Find the lowest level c for which the intersection of X(N) with L(c; p*) falls into two connected components. (@)
  Note: the intersection of X(N) with L(c; p*) consists of those feature vectors xi in node N for which the estimated density p*(xi) > c.
- If there is no such c, then N is a leaf of the tree.
- Otherwise, create daughter nodes representing the "connected components", with associated level c.

Notes:
- (@) is the critical step; in general one has to rely on a heuristic.
- The daughters of a node N do not define a partition of X(N). Assigning the low-density observations in X(N) to one of the daughters is a supervised learning problem.

Illustration

Critical step: Find the lowest level c for which the observations in X(N) with estimated density p*(xi) > c fall into two connected components of the level set L(c; p*).

Heuristic 1 (goes with the k-near-neighbor density estimate):
- Select the feature vectors xi in X(N) with p*(xi) > c.
- Generate a graph connecting each feature vector to its k nearest neighbors.
- Check whether the graph has 1 or 2 connected components.

Heuristic 2 (goes with the kernel density estimate):
- Generate a graph connecting feature vectors at distance < r, then check its connected components in the same way.

A sketch of both heuristics follows.
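A sketch of both graph-based heuristics, assuming SciPy for the nearest-neighbor and connected-components machinery (function names are hypothetical; points would be the high-density subset of X(N)):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree

def n_components_knn(points, k):
    """Heuristic 1: connect each point to its k nearest neighbors
    and count the connected components of the resulting graph."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k + 1)   # column 0 is the point itself
    rows = np.repeat(np.arange(len(points)), k)
    cols = idx[:, 1:].ravel()
    graph = csr_matrix((np.ones(len(rows)), (rows, cols)),
                       shape=(len(points), len(points)))
    n_comp, _ = connected_components(graph, directed=False)
    return n_comp

def n_components_radius(points, r):
    """Heuristic 2: connect all pairs of points at distance < r."""
    tree = cKDTree(points)
    graph = tree.sparse_distance_matrix(tree, r).tocsr()
    n_comp, _ = connected_components(graph, directed=False)
    return n_comp
```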

6.4 Related work / references

Looking for the connected components of a level set ("One-level Mode Analysis") was first suggested by David Wishart (1969). Wishart's paper appeared in an obscure place, the Proceedings of the Colloquium in Numerical Taxonomy, St. Andrews, 1968, and nobody in CS cites it. The idea has been re-invented multiple times, e.g. "sharpening" (Tukey & Tukey) and DBSCAN (Ester et al.); the methods differ in their heuristics for finding the connected components of a level set.

Wishart also realized that looking at a single level set might not be enough to detect all the modes, which led him to Hierarchical Mode Analysis. He did not think of it as estimating a cluster tree, and his algorithm is awkward: it is based on iterative merging instead of recursive partitioning.

The OPTICS method of Ankerst et al. also considers level sets at different levels.