DENCLUE 2.0: Fast Clustering based on Kernel Density Estimation
Alexander Hinneburg, Martin-Luther-University Halle-Wittenberg, Germany
Hans-Henning Gabriel, 101tec GmbH, Halle, Germany

Overview
–Density-based clustering and DENCLUE 1.0
–Hill climbing as EM algorithm
–Identification of local maxima
–Application of general EM accelerations
–Experiments

Density-Based Clustering
Assumption
–clusters are regions of high density in the data space
How to estimate density?
–parametric models: mixture models
–non-parametric models: histograms, kernel density estimation

Kernel Density Estimation
Idea
–the influence of a data point is modeled by a kernel
–the density is the normalized sum of all kernels
–smoothing parameter h
Gaussian kernel and kernel density estimate (formulas on the slide; written out below)
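The slide's formula images are not reproduced in this transcript; for reference, the standard Gaussian kernel density estimate the slide refers to can be written as follows (assuming the usual normalization, which the slides may state slightly differently):

\[
\hat{f}(x) \;=\; \frac{1}{N h^{d}} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right),
\qquad
K(u) \;=\; (2\pi)^{-d/2} \exp\!\left(-\tfrac{1}{2}\,\lVert u \rVert^{2}\right),
\]

where \(x_1,\dots,x_N \in \mathbb{R}^{d}\) are the data points and \(h\) is the smoothing parameter.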

DENCLUE 1.0 Framework
Clusters are defined by local maxima of the density estimate
–find all maxima by hill climbing
Gradient hill climbing with constant step size (formula on the slide)
Problem
–constant step size
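As a rough illustration of the gradient hill climbing described above, here is a minimal sketch (not the authors' code; the function names and the step length delta are made up for this example) of a constant-step climb on a Gaussian kernel density estimate:

import numpy as np

def kde_gradient(x, data, h):
    """Gradient of an (unnormalized) Gaussian kernel density estimate at x."""
    data = np.asarray(data, dtype=float)
    diff = data - x                                    # (N, d) differences x_i - x
    w = np.exp(-np.sum(diff**2, axis=1) / (2 * h**2))  # Gaussian kernel weights
    return np.sum(w[:, None] * diff, axis=0) / (len(data) * h**2)

def hill_climb_const_step(x0, data, h, delta=0.1, max_iter=100):
    """DENCLUE 1.0 style climb: fixed step of length delta along the gradient.
    Near a maximum it tends to oscillate instead of converging exactly."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = kde_gradient(x, data, h)
        norm = np.linalg.norm(g)
        if norm == 0:
            break
        x = x + delta * g / norm   # constant step size, regardless of how close we are
    return x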

Problem of Constant Step Size
Not efficient
–many unnecessarily small steps
Not effective
–does not converge to a local maximum, only comes close
Example (figure on the slide)

New Hill Climbing Approach
General approach
–differentiate the density estimate and set the gradient to zero
–no closed-form solution, but the resulting equation can be used as a fixed-point iteration (written out below for the Gaussian kernel)
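The slide's formulas are not part of this transcript; written out for the Gaussian kernel, setting the gradient of the density estimate to zero gives a weighted-mean equation that is used as a fixed-point iteration:

\[
\nabla \hat{f}(x) \;\propto\; \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)(x_i - x) \;=\; 0
\quad\Longrightarrow\quad
x^{(t+1)} \;=\; \frac{\sum_{i=1}^{N} K\!\left(\frac{x^{(t)} - x_i}{h}\right) x_i}{\sum_{i=1}^{N} K\!\left(\frac{x^{(t)} - x_i}{h}\right)}.
\]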

New DENCLUE 2.0 Hill Climbing
Efficient
–automatically adjusted step size at no extra cost
Effective
–converges to a local maximum (proof follows)
Example (figure on the slide; a code sketch follows below)
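A minimal sketch of this step-size-free hill climbing for the Gaussian kernel (illustrative only; the tolerance eps, the iteration cap, and the simple step-size stopping check are assumptions, not the slide's exact criterion):

import numpy as np

def denclue2_hill_climb(x0, data, h, eps=1e-4, max_iter=100):
    """Step-size-free hill climbing on a Gaussian KDE (DENCLUE 2.0 style update).
    Each iteration moves x to the kernel-weighted mean of the data, which adjusts
    the effective step size automatically and approaches a local density maximum."""
    data = np.asarray(data, dtype=float)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        diff = data - x
        w = np.exp(-np.sum(diff**2, axis=1) / (2 * h**2))  # Gaussian kernel weights
        x_new = w @ data / w.sum()                          # weighted mean of the data
        if np.linalg.norm(x_new - x) < eps:                 # steps shrink near a maximum
            return x_new
        x = x_new
    return x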

Proof of Convergence
Cast the problem of maximizing the kernel density as maximizing the likelihood of a mixture model
Introduce a hidden variable

Proof of Convergence
The complete likelihood is maximized by the EM algorithm
–this also maximizes the original likelihood, which is the kernel density estimate
Starting the EM iteration at the current point makes its updates perform exactly the hill climbing step
E-step and M-step (formulas on the slide; written out below)
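For the Gaussian kernel, the E-step and M-step referenced on the slide take the following standard form (written out here because the slide's formula images are not in the transcript); composing the two steps reproduces exactly the fixed-point update shown earlier:

\[
\text{E-step:}\quad r_i^{(t)} \;=\; \frac{K\!\left(\frac{x^{(t)} - x_i}{h}\right)}{\sum_{j=1}^{N} K\!\left(\frac{x^{(t)} - x_j}{h}\right)},
\qquad
\text{M-step:}\quad x^{(t+1)} \;=\; \sum_{i=1}^{N} r_i^{(t)}\, x_i.
\]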

Identification of Local Maxima
The EM algorithm iterates until an end point is reached
–stopping criterion: the sum of the last k step sizes falls below a threshold (given on the slide)
Assumption
–the true local maximum lies in a ball of a certain radius around the end point
Points whose end points lie close together (within the radius from the slide) belong to the same maximum
In case of a non-unique assignment, do a few extra EM iterations
A code sketch of the grouping step follows below.
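A minimal sketch of the end-point grouping just described, assuming a Euclidean merge radius (the parameter radius stands in for the ball size given on the slide, which is not reproduced here; the tie-breaking via extra EM iterations is omitted):

import numpy as np

def assign_to_maxima(end_points, radius):
    """Group hill-climbing end points: end points closer than `radius`
    to an already-found maximum are assigned to that maximum."""
    maxima, labels = [], []
    for p in end_points:
        p = np.asarray(p, dtype=float)
        for k, m in enumerate(maxima):
            if np.linalg.norm(p - m) < radius:
                labels.append(k)          # same local maximum as cluster k
                break
        else:
            maxima.append(p)              # a new local maximum
            labels.append(len(maxima) - 1)
    return np.array(maxima), np.array(labels)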

Acceleration
Sparse EM (see the sketch below)
–update only the p% of points with the largest posterior
–saves the kernel computations for the remaining (1-p)% of points after the first iteration
Data reduction
–use only p% of the data as representative points
–random sampling
–k-means
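A rough sketch of the sparse-EM idea, under the assumption that the active set is fixed after the first full iteration to the fraction p of points with the largest kernel weights (the parameter names and this particular policy are illustrative, not the authors' implementation):

import numpy as np

def sparse_hill_climb(x0, data, h, p=0.2, eps=1e-4, max_iter=100):
    """Sparse variant of the DENCLUE 2.0 climb: after the first iteration,
    keep only the fraction p of points with the largest kernel weights."""
    data = np.asarray(data, dtype=float)
    x = np.asarray(x0, dtype=float)
    active = np.arange(len(data))                # first iteration uses all points
    for it in range(max_iter):
        diff = data[active] - x
        w = np.exp(-np.sum(diff**2, axis=1) / (2 * h**2))
        x_new = w @ data[active] / w.sum()
        if it == 0:                              # shrink the active set once
            keep = max(1, int(p * len(data)))
            active = active[np.argsort(w)[-keep:]]
        if np.linalg.norm(x_new - x) < eps:
            return x_new
        x = x_new
    return x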

Experiments
Comparison of DENCLUE 1.0 (FS) vs. DENCLUE 2.0 (SSA)
–16-dimensional artificial data
–both methods are tuned to find the correct clustering

Experiments Comparison of acceleration methods

Experiments Clustering quality (normalized mutual information, NMI) vs. sample size (RS)

Experiments
Cluster quality (NMI) of DENCLUE 2.0 (SSA), its acceleration methods, and k-means on real data
–sample sizes 0.8, 0.4, 0.2

Conclusion
–New hill climbing for DENCLUE
–Automatic step size adjustment
–Convergence proof by reduction to EM
–Allows the application of general EM accelerations
Future work
–automatic setting of the smoothing parameter h (so far tuned manually)

Thank you for your attention!