Clustering methods: Part 10

Very large data sets
Pasi Fränti, 5.5.2014
Speech and Image Processing Unit, School of Computing, University of Eastern Finland

Methods for large data sets
- BIRCH
- CLARANS
- On-line EM
- Scalable EM
- GMG (studied in this lecture; no material is included for the other methods)

Gradual model generator (GMG) [Kärkkäinen & Fränti, 2007: Pattern Recognition]
- The problem is split into two parts: model generation and later processing of the model.
- Gather points into a buffer.
- Select a subset of the buffered points to generate a new component of the model.
- Points that fit the model are used to update the model directly.
- Repeat until every point has been used either in component generation or in a direct update.
- Post-processing can be done without the original data.
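A minimal Python/NumPy sketch of the single-pass loop listed above follows. It is not the published GMG implementation: the component representation (running sums plus a small regularization term `reg`), the fit test (squared Mahalanobis distance to the closest component compared with a hypothetical `threshold`), the parameters `k` and `buffer_size`, and the choice to return the final few leftover points (fewer than k+1) rather than force them into the model are all illustrative assumptions.

```python
import numpy as np


class Component:
    """Hypothetical Gaussian component stored as running sums (count, sum,
    sum of outer products), so absorbed points can be discarded after use.
    A Welford-style update would be numerically safer for large coordinates."""

    def __init__(self, points, reg=1e-3):
        d = points.shape[1]
        self.n, self.s, self.ss, self.reg = 0, np.zeros(d), np.zeros((d, d)), reg
        for x in points:
            self.add(x)

    def add(self, x):
        self.n += 1
        self.s += x
        self.ss += np.outer(x, x)

    def mean(self):
        return self.s / self.n

    def mahal2(self, x):
        # Squared Mahalanobis distance of x from this component (regularized covariance).
        mu = self.mean()
        cov = self.ss / self.n - np.outer(mu, mu) + self.reg * np.eye(len(mu))
        diff = x - mu
        return float(diff @ np.linalg.solve(cov, diff))


def most_compact_neighborhood(buf, k):
    """Indices of the buffered point with the smallest k-NN radius plus its k neighbors."""
    pts = np.array(buf)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nn = np.argsort(dist, axis=1)[:, :k]
    radius = np.take_along_axis(dist, nn, axis=1).max(axis=1)
    seed = int(np.argmin(radius))
    return {seed, *nn[seed].tolist()}


def gmg_sketch(stream, k=5, buffer_size=50, threshold=9.0):
    """Single pass over `stream` (assumes buffer_size > k).
    Returns (components, leftover buffered points)."""
    model, buf = [], []

    def try_update(x):
        # Update the closest component if x fits it; otherwise report failure.
        if not model:
            return False
        d2 = [c.mahal2(x) for c in model]
        j = int(np.argmin(d2))
        if d2[j] <= threshold:
            model[j].add(x)
            return True
        return False

    def generate():
        # New component from the most compact k-neighborhood in the buffer,
        # then re-test the remaining buffered points against the updated model.
        chosen = most_compact_neighborhood(buf, k)
        model.append(Component(np.array([buf[i] for i in sorted(chosen)])))
        rest = [x for i, x in enumerate(buf) if i not in chosen]
        buf.clear()
        buf.extend(x for x in rest if not try_update(x))

    for x in stream:
        x = np.asarray(x, dtype=float)
        if not try_update(x):
            buf.append(x)
            if len(buf) >= buffer_size:
                generate()

    # End of stream: keep generating while at least k+1 points remain buffered.
    while len(buf) > k:
        generate()
    return model, buf


# Usage: two Gaussian blobs streamed in random order.
rng = np.random.default_rng(0)
data = rng.permutation(np.vstack([rng.normal(0, 0.5, (200, 2)),
                                  rng.normal(10, 0.5, (200, 2))]))
model, leftover = gmg_sketch(data, k=5, buffer_size=30)
```

By construction every input point ends up either inside some component's running sums or in the returned leftover list, which holds at most `k` points; the raw points themselves are not retained in the model.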

Goal of the GMG algorithm
[Figure: models produced by EM and by GMG]
- Generate a model of the data in a single pass, without storing the entire data set.
- EM makes several passes over the data and GMG only one, yet the resulting probability density distributions have a quite similar form.

Contours of probability density distributions
[Figure: density contours of the EM model and the GMG model]

Model update
- New data points are mapped to the model immediately when they are input.
- Points too far from every component remain in the buffer.
- Buffered points are re-tested whenever new components are created.
[Figure panels: before update; after update]
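The update step can be sketched in Python with NumPy under a few stated assumptions: each component keeps a running mean and covariance updated one point at a time (Welford-style), so a point can be discarded once absorbed; "too far" is taken to mean a squared Mahalanobis distance above a hypothetical `threshold`; and `RunningGaussian`, `map_point` and `retest_buffer` are illustrative names, not part of the original method description.

```python
import numpy as np


class RunningGaussian:
    """Hypothetical component: mean and covariance updated incrementally."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros((dim, dim))  # sum of outer products of deviations

    def update(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += np.outer(delta, x - self.mean)  # uses the updated mean

    def cov(self, reg=1e-6):
        # Population covariance plus a small ridge so it stays invertible.
        return self.m2 / self.n + reg * np.eye(len(self.mean))


def map_point(x, components, buffer, threshold=9.0):
    """Update the closest component if x is close enough (squared Mahalanobis
    distance <= threshold); otherwise keep x in the buffer."""
    best, best_d2 = None, np.inf
    for c in components:
        diff = x - c.mean
        d2 = diff @ np.linalg.solve(c.cov(), diff)
        if d2 < best_d2:
            best, best_d2 = c, d2
    if best is not None and best_d2 <= threshold:
        best.update(x)
        return True
    buffer.append(x)
    return False


def retest_buffer(components, buffer, threshold=9.0):
    """Re-test all buffered points, e.g. after a new component was created."""
    pending = list(buffer)
    buffer.clear()
    for x in pending:
        map_point(x, components, buffer, threshold)
```

The incremental update yields the same mean and (population) covariance as a batch computation over the absorbed points, which is what lets the model be updated directly without keeping those points.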

Generating new components
- When the buffer is full, selected points are used to generate new components.
- The most compact k-neighborhood is selected as the seed for a new component: find the k nearest neighbors of every point in the buffer, and pick the point with the smallest maximum neighbor distance.
[Figure panels: data in buffer; selected points and a new component]
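The seed selection can be sketched in a few lines of Python with NumPy. Brute-force pairwise distances are an assumption made for brevity (adequate for a small buffer), the function name is hypothetical, and the returned indices are the chosen point followed by its k nearest neighbors.

```python
import numpy as np


def most_compact_neighborhood(points, k):
    """Return indices of the buffered point whose k nearest neighbors are
    closest (smallest maximum distance), followed by those k neighbors."""
    pts = np.asarray(points, dtype=float)
    if not 0 < k < len(pts):
        raise ValueError("need 0 < k < number of buffered points")
    # Pairwise Euclidean distances (brute force, O(n^2) for n buffered points).
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nn = np.argsort(dist, axis=1)[:, :k]  # k nearest neighbors of every point
    # Maximum distance within each k-neighborhood (= distance to the k-th neighbor).
    radius = np.take_along_axis(dist, nn, axis=1).max(axis=1)
    seed = int(np.argmin(radius))  # the most compact k-neighborhood
    return np.concatenate(([seed], nn[seed]))


buf = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (6, 4), (9, 0)]
idx = most_compact_neighborhood(buf, k=3)
seed_points = np.asarray(buf, dtype=float)[idx]
print(sorted(idx.tolist()))  # [0, 1, 2, 3]
```

Here the four tightly packed points near the origin are selected; their mean and covariance could then initialize the new component.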

Example
- Red pluses are objects that have been seen by the algorithm but not yet used.
- Ellipses represent clusters.
- Gray points are objects that have been used and then discarded by the algorithm.
- Objects are used to generate new clusters and to update existing ones.
- Data arrives from left to right.
- All objects are eventually used, one way or another.

Example (continued)
[Figures: further stages of the example as the data arrives]

Post-processing
[Figures: model before processing; updated model; updated model shown with the data]

Literature
- I. Kärkkäinen and P. Fränti, "Gradual model generator for single-pass clustering", Pattern Recognition, 40(3), 784-795, March 2007.
- P. Bradley, U. Fayyad and C. Reina, "Clustering Very Large Databases Using EM Mixture Models", Proc. 15th Int. Conf. on Pattern Recognition, vol. 2, 76-80, 2000.
- R. Ng and J. Han, "CLARANS: A Method for Clustering Objects for Spatial Data Mining", IEEE Trans. on Knowledge and Data Engineering, 14(5), 1003-1016, 2002.
- M. Sato and S. Ishii, "On-line EM Algorithm for the Normalized Gaussian Network", Neural Computation, 12(2), 407-432, 2000.
- T. Zhang, R. Ramakrishnan and M. Livny, "BIRCH: A New Data Clustering Algorithm and Its Applications", Data Mining and Knowledge Discovery, 1(2), 141-182, 1997.