K-MEANS ALGORITHM Jelena Vukovic 53/07


- Introduction
- Basic idea of the k-means algorithm
- Detailed explanation
- Most common problems of the algorithm
- Applications
- Possible improvements
Elektrotehnički fakultet u Beogradu (School of Electrical Engineering, University of Belgrade)

Basic principles of the algorithm
- Given a set of points (x_1, x_2, …, x_n)
- Partition the n points into k sets (S_1, S_2, …, S_k), with n > k
- The goal is to minimize the within-cluster sum of squares, where µ_i is the mean of the points in S_i
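The formula on this slide does not appear in the transcript. Assuming it was the standard within-cluster sum of squares, it can be written with the slide's own symbols as:

```latex
\underset{S}{\arg\min} \; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^{2},
\qquad
\mu_i = \frac{1}{|S_i|} \sum_{x \in S_i} x
```

Here \lVert \cdot \rVert is taken to be the Euclidean norm, which matches the distance measure named later in the slides.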

The algorithm
- Choose the number of means (k) and initialize their positions
- Iterate:
  1. Assign each point to the nearest mean
  2. Move each mean to the center of its cluster
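The two iterated steps can be sketched in plain Python as follows. This is a minimal illustration, not the presenter's implementation: the function names `nearest` and `kmeans`, the `max_iter` and `seed` parameters, the choice to draw the initial means from the input points, and the stopping rule (stop when assignments no longer change) are all assumptions.

```python
import math
import random


def nearest(point, means):
    """Return the index of the mean closest to point (Euclidean distance)."""
    return min(range(len(means)), key=lambda i: math.dist(point, means[i]))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of coordinate tuples; returns (means, labels)."""
    rng = random.Random(seed)
    # Initialization (an assumption): use k distinct input points as the means.
    means = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest mean.
        new_labels = [nearest(p, means) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the means will not move
        labels = new_labels
        # Step 2: move each mean to the center of its cluster.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous mean
                means[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, labels


# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
means, labels = kmeans(points, k=2)
```

On this toy data the first three points should end up sharing one label and the last three the other; which group receives label 0 depends on the random start.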

The algorithm (figure: assign points to the nearest mean; move the means)

The algorithm
The complexity is O(n * k * I * d), where:
- n is the number of points
- k is the number of clusters
- I is the number of iterations
- d is the number of attributes
(Figure: re-assign points)
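As a rough worked example of this bound, with made-up values chosen only for illustration (n = 10,000 points, k = 5 clusters, I = 20 iterations, d = 3 attributes):

```latex
n \cdot k \cdot I \cdot d = 10^{4} \cdot 5 \cdot 20 \cdot 3 = 3 \times 10^{6}
```

That is on the order of three million coordinate-level operations, and the count grows linearly in each of the four factors.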

K nearest neighbors
- A related distance-based algorithm, used for classifying labeled data rather than for clustering
- The decision is made by simple majority among the k closest neighbors
- In k-means, the Euclidean distance measure is used
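For contrast with k-means, the majority vote can be sketched as below. The function name `knn_classify`, the toy `training` data, and the use of Euclidean distance for the neighbor search are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_classify(query, labeled_points, k=3):
    """Label query by simple majority among its k nearest labeled points."""
    # Sort the labeled examples by Euclidean distance to the query point.
    by_distance = sorted(labeled_points,
                         key=lambda item: math.dist(query, item[0]))
    # Count the labels of the k closest examples.
    votes = Counter(label for _, label in by_distance[:k])
    # most_common(1) yields the label with the highest vote count.
    return votes.most_common(1)[0][0]


# Hypothetical labeled training data: (coordinates, class label).
training = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
            ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b")]
```

Unlike k-means, this needs labels for the training points, and k here counts neighbors rather than clusters.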

Some limitations of the algorithm
- The number of clusters needs to be known in advance
- Initialization of the means' positions
- Problems appear when clusters have different:
  - Shapes
  - Sizes
  - Densities

Initial centroids problem
- Random distribution (the most common)
- Multiple runs
- Testing on a data sample
- Analyzing the data
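The "multiple runs" option can be sketched as repeating k-means from different random starts and keeping the run with the lowest within-cluster sum of squares. The helper names `kmeans`, `wcss`, and `best_of_runs`, the `runs` and `seed` parameters, and the selection criterion are assumptions, not something the slides specify.

```python
import math
import random


def kmeans(points, k, rng, max_iter=100):
    """One k-means run from a random start; returns (means, labels)."""
    means = rng.sample(points, k)  # random initial means drawn from the data
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda i: math.dist(p, means[i]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous mean
                means[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, labels


def wcss(points, means, labels):
    """Within-cluster sum of squares of a clustering."""
    return sum(math.dist(p, means[lab]) ** 2 for p, lab in zip(points, labels))


def best_of_runs(points, k, runs=10, seed=0):
    """Run k-means several times; return (score, means, labels) of the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        means, labels = kmeans(points, k, rng)
        score = wcss(points, means, labels)
        if best is None or score < best[0]:
            best = (score, means, labels)
    return best


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
score, means, labels = best_of_runs(points, k=2)
```

Each run costs a full k-means pass, so the number of runs trades computing time against the risk of keeping a poor local optimum.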

Different density (figure: original points; 3 clusters)

Non-globular shapes (figure: original points; 2 clusters)

Pros and cons
Pros
- Simple to implement
- Fast
- Not highly demanding computationally
Cons
- K needs to be known
- A globular (roughly spherical) cluster shape is assumed
- Requires some knowledge about the data in advance
- Possibility of many iterations without significant changes in the clusters

Applications of the algorithm
Many different uses:
- Computer vision
- Market segmentation
- Geostatistics
- Astronomy
- etc.

Improvements
- Pre-process the data in order to better estimate k
- Run multiple instances in parallel with different centroid initializations
- Ignore possible errors in the data to avoid non-standard cluster shapes
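The slides do not say how k would be estimated. One common option, swapped in here purely as an assumption, is the elbow heuristic: compute the within-cluster sum of squares (WCSS) for several candidate values of k and look for the point where further increases stop paying off. The helper names `kmeans_wcss` and `wcss_by_k` and their parameters are hypothetical.

```python
import math
import random


def kmeans_wcss(points, k, rng, max_iter=100):
    """Run k-means once from a random start and return its final WCSS."""
    means = rng.sample(points, k)  # random initial means drawn from the data
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda i: math.dist(p, means[i]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous mean
                means[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(math.dist(p, means[lab]) ** 2 for p, lab in zip(points, labels))


def wcss_by_k(points, candidate_ks, runs=5, seed=0):
    """Map each candidate k to the lowest WCSS found over several runs."""
    rng = random.Random(seed)
    return {k: min(kmeans_wcss(points, k, rng) for _ in range(runs))
            for k in candidate_ks}


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
curve = wcss_by_k(points, candidate_ks=[1, 2, 3])
```

For this toy data the WCSS falls sharply between k = 1 and k = 2, which is the kind of "elbow" the heuristic looks for. On real data the bend is often less clear, so the result is a hint rather than a definitive answer.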

Thank you!