Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ


K-means

Adriano Cruz, NCE e IM - UFRJ

K-means algorithm
Based on the Euclidean distances among the elements of the cluster.
The centre of the cluster is the mean value of the objects in the cluster.
Classifies objects in a hard way: each object belongs to a single cluster.

K-means algorithm
Consider n objects (X = {x_1, x_2, ..., x_n}) and k clusters.
Each object x_i is defined by l characteristics, x_i = (x_i1, x_i2, ..., x_il).
Consider A, a set of k clusters (A = {A_1, A_2, ..., A_k}).

K-means properties
The union of all clusters makes the Universe.
No element belongs to more than one cluster.
There is no empty cluster.

Membership function

Membership matrix U
Matrix containing the values of inclusion of each element in each cluster (0 or 1).
The matrix has c (clusters) lines and n (elements) columns.
The sum of all elements in a column must be equal to one (each element belongs to only one cluster).
The sum of each line must be less than n and greater than 0: no empty cluster, and no cluster containing all elements.
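The constraints on U listed above can be checked mechanically. The following is a minimal sketch, assuming U is stored as a list of c lines with n entries each; the function name `is_hard_partition` is hypothetical and not part of the original slides.

```python
def is_hard_partition(U):
    """Return True if U (c lines, n columns) is a valid hard membership matrix."""
    c = len(U)
    n = len(U[0])
    # Every entry must be 0 or 1 (hard membership).
    if any(u not in (0, 1) for row in U for u in row):
        return False
    # Each column sums to one: every element belongs to exactly one cluster.
    for e in range(n):
        if sum(U[i][e] for i in range(c)) != 1:
            return False
    # Each line sums to more than 0 and less than n:
    # no empty cluster and no cluster holding all elements.
    return all(0 < sum(row) < n for row in U)


# Three clusters over six elements, two elements per cluster.
U_example = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
print(is_hard_partition(U_example))
```

Note that, under the rule as stated on the slide, a single cluster holding every element is rejected.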

Matrix examples
[Figure: membership matrices for the elements X1 to X6]
Two examples of clustering. What do the clusters represent?

Matrix examples cont.
[Figure: membership matrices for the elements X1 to X6]
U1 and U2 are the same matrices.

How many clusters?
The cardinality of any hard k-partition of n elements is given by a formula that is not preserved in this transcript.

How many clusters (example)?
Consider the matrix U2 (k=3, n=6).
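The counting formula itself did not survive in the transcript. A standard way to count the partitions of n elements into k non-empty clusters is the Stirling number of the second kind, (1/k!) * sum over i = 1..k of (-1)^(k-i) * C(k, i) * i^n. The sketch below assumes that this is the quantity the slide refers to; the function name is hypothetical.

```python
from math import comb, factorial


def count_hard_partitions(n, k):
    """Number of ways to split n elements into k non-empty, unlabeled clusters."""
    total = sum((-1) ** (k - i) * comb(k, i) * i ** n for i in range(1, k + 1))
    # The sum is always divisible by k!, so integer division is exact.
    return total // factorial(k)


# The slide's example: k = 3 clusters, n = 6 elements.
print(count_hard_partitions(6, 3))  # -> 90
```

Even for this small example there are 90 candidate partitions, which is why exhaustive search is impractical for realistic n.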

K-means inputs and outputs
Inputs: the number of clusters c and a database containing n objects with l characteristics each.
Output: a set of c clusters that minimises the square-error criterion.

Number of Clusters

K-means algorithm v1
Arbitrarily assign each object to a cluster (matrix U).
Repeat
  Update the cluster centres;
  Reassign objects to the clusters to which the objects are most similar;
Until no change;

K-means algorithm v2
Arbitrarily choose c objects as the initial cluster centres.
Repeat
  Reassign objects to the clusters to which the objects are most similar.
  Update the cluster centres.
Until no change
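The v2 loop above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the author's implementation: the function name `kmeans`, the `max_iter` safeguard and the choice to leave an empty cluster's centre unchanged are assumptions.

```python
import math


def kmeans(points, centres, max_iter=100):
    """K-means (v2): reassign objects, then update centres, until no change."""
    centres = [tuple(c) for c in centres]
    labels = None
    for _ in range(max_iter):
        # Reassign each object to the cluster with the nearest centre.
        new_labels = [
            min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
            for p in points
        ]
        if new_labels == labels:  # "until no change"
            break
        labels = new_labels
        # Update each centre as the mean of the objects assigned to it.
        for i in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # assumption: an empty cluster keeps its old centre
                centres[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centres


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centres = kmeans(points, centres=[(0, 0), (10, 10)])
print(labels, centres)
```

Here the initial centres are two of the objects themselves, as in the first step of v2.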

Algorithm details
The algorithm tries to minimise the function [equation not preserved in the transcript].
d_ie is the distance between the element x_e (l characteristics) and the centre of the cluster i (v_i).

Cluster Centre
The centre of the cluster i (v_i) is a vector of l characteristics.
The jth co-ordinate is calculated as [equation not preserved in the transcript].
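The two equations on these slides were images and are missing from the transcript. For reference, the usual square-error criterion and centre update for hard k-means can be written as follows. This is the textbook form, not necessarily the exact notation of the slides; the symbol \chi_{ie} for the 0/1 entry of U (line i, column e) is an assumption.

```latex
% Square-error criterion: sum of squared distances of each element to its own centre
J = \sum_{i=1}^{c} \sum_{e=1}^{n} \chi_{ie}\, d_{ie}^{2},
\qquad d_{ie} = \lVert x_e - v_i \rVert

% j-th co-ordinate of the centre of cluster i: mean over the members of the cluster
v_{ij} = \frac{\sum_{e=1}^{n} \chi_{ie}\, x_{ej}}{\sum_{e=1}^{n} \chi_{ie}}
```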

Detailed Algorithm
Choose c (number of clusters).
Set error (ε > 0) and step (r = 0).
Arbitrarily set matrix U(r). Do not forget: each element belongs to a single cluster, there is no empty cluster and no cluster has all elements.

Detailed Algorithm cont.
Repeat
  Calculate the centre of the clusters v_i(r)
  Calculate the distance d_i(r) of each point to the centre of the clusters
  Generate U(r+1) recalculating all characteristic functions using the equations
Until ||U(r+1) - U(r)|| < ε

Matrix norms
Consider a matrix U of n lines and n columns.
Column norm: [equation not preserved in the transcript]
Line norm: [equation not preserved in the transcript]
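The norm definitions are missing from the transcript. The sketch below assumes the usual meanings: the column norm is the largest sum of absolute values in any column, and the line norm is the largest sum of absolute values in any line. The function names are hypothetical.

```python
def column_norm(U):
    """Maximum over columns of the sum of absolute values in that column."""
    return max(sum(abs(row[j]) for row in U) for j in range(len(U[0])))


def line_norm(U):
    """Maximum over lines of the sum of absolute values in that line."""
    return max(sum(abs(u) for u in row) for row in U)


# Stopping test from the detailed algorithm: norm of the difference of two matrices.
U_old = [[1, 0], [0, 1]]
U_new = [[0, 0], [1, 1]]
diff = [[a - b for a, b in zip(r_new, r_old)] for r_new, r_old in zip(U_new, U_old)]
print(column_norm(diff), line_norm(diff))
```

Either norm can serve in the stopping test ||U(r+1) - U(r)|| < ε, since both are zero exactly when the two matrices are identical.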

K-means problems?
Suitable when clusters are compact, well-separated clouds.
Scalable because the computational complexity is O(nkr) (n objects, k clusters, r iterations).
The need to choose c is a disadvantage.
Not suitable for nonconvex shapes.
It is sensitive to noise and outliers because they influence the means.
Depends on the initial allocation.

Examples of results

K-means: Actual Data

K-means: Results

K-medoids

K-medoids Algorithm 1

K-medoids methods
K-means is sensitive to outliers, since an object with an extremely large value may distort the distribution of the data.
Instead of taking the mean value, the most centrally located object (medoid) is used as the reference point.
The algorithm minimizes the sum of dissimilarities between each object and the medoid of its cluster (similar to k-means).

K-medoids strategies
Choose k medoids arbitrarily.
Each remaining object is clustered with the medoid to which it is most similar.
Then iteratively replace one of the medoids by a non-medoid as long as the quality of the clustering is improved.
The quality is measured using a cost function that measures the average dissimilarity between each object and the medoid of its cluster.

Reassignment costs
Each time a reassignment occurs, it contributes a difference to the square-error J.
The cost function J calculates the total cost of replacing a current medoid by a non-medoid.
If the total cost is negative then m_j is replaced by m_random; otherwise the replacement is not accepted.

Source
Algorithm presented in: Finding Groups in Data: An Introduction to Cluster Analysis, L. Kaufman and P. J. Rousseeuw, John Wiley & Sons.

Phases
Build phase: an initial clustering is obtained by the successive selection of representative objects until k (number of clusters) objects have been found.
Swap phase: an attempt is made to improve the set of the k representative objects.

Build Phase
The first object is the one for which the sum of dissimilarities to all objects is as small as possible.
This is the most centrally located object.
At each subsequent step, another object that decreases the objective function (for instance, the average dissimilarity) is selected.

Build Phase next steps - I
Consider an object e_i which has not yet been selected.
Consider a nonselected object e_j; calculate the difference C_ji between its dissimilarity D_j = d(e_n, e_j) with the most similar previously selected object e_n, and its dissimilarity d(e_i, e_j) with object e_i.
If D_j - d(e_i, e_j) is positive, object e_j will contribute to the selection of object e_i:
C_ji = max(D_j - d(e_i, e_j), 0)

Build Phase next steps - II
Calculate the total gain obtained by selecting object e_i: G_i = sum_j(C_ji).
Choose the not yet selected object e_i which maximizes G_i = sum_j(C_ji).
The process continues until k objects have been found.
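The build phase described above can be sketched as follows, assuming the dissimilarities are given as a full matrix `d` (a list of lists) indexed by object number. The function name `build_phase` is hypothetical, and letting the candidate e_i count among the nonselected objects e_j (so that it contributes its own D_i to the gain) is an assumption of this sketch.

```python
def build_phase(d, k):
    """Select k initial representative objects (medoids) from dissimilarity matrix d."""
    n = len(d)
    # First object: smallest sum of dissimilarities to all objects.
    selected = [min(range(n), key=lambda i: sum(d[i]))]
    while len(selected) < k:
        best_i, best_gain = None, -1.0
        for i in range(n):
            if i in selected:
                continue
            gain = 0.0
            for j in range(n):
                if j in selected:
                    continue
                D_j = min(d[j][s] for s in selected)  # closest selected object
                gain += max(D_j - d[i][j], 0.0)       # C_ji
            if gain > best_gain:                      # maximise G_i
                best_i, best_gain = i, gain
        selected.append(best_i)
    return selected


points = [0, 1, 2, 10, 11, 12, 13]
d = [[abs(a - b) for b in points] for a in points]
print(build_phase(d, 2))  # -> [3, 1], i.e. the objects at 10 and 1
```

In the example, the object at 10 is the most central one, and the object at 1 then gives the largest total gain because it serves the three objects on the left.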

Swap Phase
An attempt is made to improve the set of representative elements.
Consider all pairs of elements (i, h) for which e_i has been selected and e_h has not.
What is the effect of swapping e_i and e_h?
Consider the objective function as the sum of dissimilarities between each element and the most similar representative object.

Swap Phase: possibility a
Consider a nonselected object e_j and calculate its contribution C_jih to the swap.
If e_j is more distant from both e_i and e_h than from one of the other representatives, e.g. e_k, then C_jih = 0.
So e_j belongs to object e_k, sometimes referred to as the medoid m_k, and the swap will not change the quality of the clustering.
Remember: positive contributions decrease the quality of the clustering.

Swap Phase possibility a
[Figure: objects e_h, e_k, e_i and e_j]
Object e_j belongs to medoid e_k (i <> k). If e_i is replaced by e_h and e_j is still closer to e_k, then C_jih = 0.

Swap Phase possibility b.1
If e_j is not further from e_i than from any one of the other representatives (d(e_i, e_j) = D_j), two situations must be considered.
e_j is closer to e_h than to the second closest representative e_n: d(e_j, e_h) < d(e_j, e_n), then C_jih = d(e_j, e_h) - d(e_j, e_i).
The contribution C_jih can be either positive or negative.
If element e_j is closer to e_i than to e_h, the contribution is positive and the swap is not favourable.

Swap Phase possibility b.1.+
[Figure: objects e_h, e_n, e_i and e_j]
Object e_j belongs to medoid e_i. If e_i is replaced by e_h and e_j is closer to e_i than to e_h, the contribution is positive: C_jih > 0.

Swap Phase possibility b.1.-
[Figure: objects e_h, e_k, e_i and e_j]
Object e_j belongs to medoid e_i. If e_i is replaced by e_h and e_j is closer to e_h than to e_i, the contribution is negative: C_jih < 0.

Swap Phase possibility b.2
e_j is at least as distant from e_h as from the second closest representative: d(e_j, e_h) >= d(e_j, e_n), then C_jih = d(e_j, e_n) - d(e_j, e_i).
The contribution is always positive, because it is not advantageous to replace e_i by an e_h that is further from e_j than the second closest representative object.

Swap Phase possibility b.2
[Figure: objects e_n, e_h, e_i and e_j]
Object e_j belongs to medoid e_i. If e_i is replaced by e_h and e_j is further from e_h than from e_n, the contribution is always positive: C_jih > 0.

Swap Phase possibility c
e_j is more distant from e_i than from at least one of the other representative objects (e_n) but closer to e_h than to any representative object, then C_jih = d(e_j, e_h) - D_j, where D_j = d(e_j, e_n) is the dissimilarity to the closest current representative.
[Figure: objects e_n, e_h, e_i and e_j]
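Possibilities a, b.1, b.2 and c can be folded into one small function. The sketch below assumes a full dissimilarity matrix `d`, at least two current medoids, and that in possibility c the contribution is measured against D_j, the dissimilarity to the closest current representative. The function names are hypothetical, and summing over every object that is not one of the remaining medoids (so that e_i and e_h themselves are included) is an assumption made so that the total equals the change in the objective function.

```python
def swap_contribution(d, medoids, i, h, j):
    """Contribution C_jih of object j to swapping medoid i with non-medoid h."""
    D_j = min(d[j][m] for m in medoids)  # dissimilarity to the closest medoid
    if d[j][i] > D_j:
        # j belongs to another medoid: 0 (possibility a) unless h is
        # closer than every current medoid (possibility c).
        return min(d[j][h] - D_j, 0)
    # j belongs to medoid i (possibility b): it moves to h or to the
    # second closest medoid, whichever is nearer.
    E_j = min(d[j][m] for m in medoids if m != i)
    return min(d[j][h], E_j) - d[j][i]


def total_swap_cost(d, medoids, i, h):
    """Sum of contributions; negative means the swap improves the clustering."""
    others = [m for m in medoids if m != i]
    return sum(
        swap_contribution(d, medoids, i, h, j)
        for j in range(len(d))
        if j not in others
    )


points = [0, 1, 2, 10, 11, 12, 13]
d = [[abs(a - b) for b in points] for a in points]
# Current medoids: objects 3 (at 10) and 1 (at 1); try replacing 3 by 4 (at 11).
print(total_swap_cost(d, [3, 1], i=3, h=4))  # -> -2
```

In the example the sum of dissimilarities drops from 8 to 6, which matches the total of -2.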

K-medoids Algorithm 2

K-medoid algorithm
Arbitrarily choose k objects as the initial medoids.
Repeat
  Assign each remaining object to the cluster with the nearest medoid;
  Randomly select a nonmedoid object, m_random;
  Compute the total cost J of swapping m_j with m_random;
  If J < 0 then swap m_j with m_random;
Until no change
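A runnable sketch of this loop follows. It deliberately departs from the slide in one respect: instead of drawing m_random at random, it tries every (medoid, non-medoid) pair in turn, which makes "until no change" well defined and the result reproducible. The function names and the use of the total dissimilarity as the cost are assumptions.

```python
def total_cost(d, medoids):
    """Sum of dissimilarities between each object and its nearest medoid."""
    return sum(min(d[p][m] for m in medoids) for p in range(len(d)))


def k_medoids(d, medoids):
    """Swap medoids with non-medoids while a swap has negative cost J."""
    medoids = list(medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(len(medoids)):
            for h in range(len(d)):
                if h in medoids:
                    continue
                candidate = medoids[:pos] + [h] + medoids[pos + 1:]
                J = total_cost(d, candidate) - total_cost(d, medoids)
                if J < 0:  # the swap improves the clustering: accept it
                    medoids = candidate
                    improved = True
    # Assign each object to the cluster of its nearest medoid.
    labels = [min(medoids, key=lambda m: d[p][m]) for p in range(len(d))]
    return medoids, labels


points = [0, 1, 2, 10, 11, 12, 13]
d = [[abs(a - b) for b in points] for a in points]
medoids, labels = k_medoids(d, medoids=[0, 1])
print(medoids, labels)
```

The loop terminates because every accepted swap strictly lowers the total cost and there are finitely many medoid sets.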

Replacing medoids case 1
[Figure: medoids m_i and m_j, candidate m_random and object p]
Object p belongs to medoid m_j. If m_j is replaced by m_random and p is closest to one of the m_i (i <> j), then p is reassigned to m_i.

Replacing medoids case 2
[Figure: medoids m_i and m_j, candidate m_random and object p]
Object p belongs to medoid m_j. If m_j is replaced by m_random and p is closest to m_random, then p is reassigned to m_random.

Replacing medoids case 3
[Figure: medoids m_i and m_j, candidate m_random and object p]
Object p belongs to medoid m_i (i <> j). If m_j is replaced by m_random and p is still closest to m_i, then the assignment does not change.

Replacing medoids case 4
[Figure: medoids m_i and m_j, candidate m_random and object p]
Object p belongs to medoid m_i (i <> j). If m_j is replaced by m_random and p is closest to m_random, then p is reassigned to m_random.

Comparisons?
K-medoids is more robust than k-means in the presence of noise and outliers.
K-means is less costly in terms of processing time.

Fuzzy C-means

Fuzzy C-means
Fuzzy version of K-means.
Elements may belong to more than one cluster.
Values of the characteristic function range from 0 to 1.
Each value is interpreted as the degree of membership of an element to a cluster relative to all other clusters.

Fuzzy C-means setup
Consider n objects (X = {x_1, x_2, ..., x_n}) and c clusters.
Each object x_i is defined by l characteristics, x_i = (x_i1, x_i2, ..., x_il).
Consider A, a set of c clusters (A = {A_1, A_2, ..., A_c}).

Fuzzy C-means properties
The union of all clusters makes the Universe.
There is no empty cluster.

Membership function

Membership matrix U
Matrix containing the values of inclusion of each element in each cluster, in the range [0,1].
The matrix has c (clusters) lines and n (elements) columns.
The sum of all elements in a column must be equal to one.
The sum of each line must be less than n and greater than 0: no empty cluster, and no cluster containing all elements.

Matrix examples
[Figure: membership matrices for the elements X1 to X6]
Two examples of clustering. What do the clusters represent?

Fuzzy C-means algorithm v1
Arbitrarily assign each object to a cluster (matrix U).
Repeat
  Update the cluster centres;
  Reassign objects to the clusters to which the objects are most similar;
Until no change;

Fuzzy C-means algorithm v2
Arbitrarily choose c objects as the initial cluster centres.
Repeat
  Reassign objects to the clusters to which the objects are most similar.
  Update the cluster centres.
Until no change

Algorithm details
The algorithm tries to minimise the function [equation not preserved in the transcript]; m is the nebulisation (fuzzification) factor.
d_ie is the distance between the element x_e (l characteristics) and the centre of the cluster i (v_i).

Nebulisation factor
m is the nebulisation factor.
This value has a range [1, ∞).
If m = 1 then the system is crisp.
If m → ∞ then all the membership values tend to 1/c.
The most common values are 1.25 and 2.0.

Cluster Centre
The centre of the cluster i (v_i) is a vector of l characteristics.
The jth co-ordinate is calculated as [equation not preserved in the transcript].
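The objective function and the centre formula were images on these slides and are not in the transcript. The standard fuzzy c-means expressions, given here for reference, are shown below. The symbol \mu_{ie} for the membership of element e in cluster i is an assumption; the slides may use a different letter.

```latex
% Fuzzy c-means objective with nebulisation (fuzzification) factor m
J_m = \sum_{i=1}^{c} \sum_{e=1}^{n} \mu_{ie}^{\,m}\, d_{ie}^{2},
\qquad d_{ie} = \lVert x_e - v_i \rVert

% j-th co-ordinate of the centre of cluster i: membership-weighted mean
v_{ij} = \frac{\sum_{e=1}^{n} \mu_{ie}^{\,m}\, x_{ej}}{\sum_{e=1}^{n} \mu_{ie}^{\,m}}
```

With memberships restricted to 0 or 1 these reduce to the hard k-means criterion and the plain cluster mean.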

Detailed Algorithm
Choose c (number of clusters).
Set error (ε > 0), nebulisation factor (m) and step (r = 0).
Arbitrarily set matrix U(r). Do not forget: each element belongs to a single cluster, there is no empty cluster and no cluster has all elements.

Detailed Algorithm cont.
Repeat
  Calculate the centre of the clusters v_i(r)
  Calculate the distance d_i(r) of each point to the centre of the clusters
  Generate U(r+1) recalculating all characteristic functions (How?)
Until ||U(r+1) - U(r)|| < ε

How to recalculate?
If the distances from the element to the centres are greater than zero, then the membership grade is a weighted average of the distances to all centres;
otherwise the element belongs to this cluster and to no other.
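The recalculation rule can be sketched with the standard fuzzy c-means membership update, mu_ie = 1 / sum over k of (d_ie / d_ke)^(2/(m-1)). The slide's own equation is missing from the transcript, so this form, the function name `fcm_memberships` and the equal split when an element coincides with several centres are all assumptions. The sketch requires m > 1.

```python
def fcm_memberships(distances, m=2.0):
    """distances[i][e] is the distance from element e to centre i; returns U."""
    c, n = len(distances), len(distances[0])
    U = [[0.0] * n for _ in range(c)]
    exponent = 2.0 / (m - 1.0)
    for e in range(n):
        zero = [i for i in range(c) if distances[i][e] == 0.0]
        if zero:
            # The element sits exactly on a centre: it belongs to that
            # cluster and to no other (shared equally if several coincide).
            for i in zero:
                U[i][e] = 1.0 / len(zero)
            continue
        for i in range(c):
            U[i][e] = 1.0 / sum(
                (distances[i][e] / distances[k][e]) ** exponent for k in range(c)
            )
    return U


# Two centres, two elements; the second element coincides with centre 0.
U = fcm_memberships([[1.0, 0.0], [3.0, 2.0]], m=2.0)
print(U)
```

Each column of the returned matrix sums to one, as the membership matrix slide requires.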

Example of clustering result

Example of clustering result

Possibilistic Clustering

Membership function

Membership function
The membership degree is the representativity or typicality of the datum x for the cluster i.

Algorithm details
The algorithm tries to minimise the function [equation not preserved in the transcript]; m is the nebulisation factor.
The first sum is the usual one and the second rewards high memberships.

How to calculate?
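The content of this slide is missing from the transcript. A commonly used possibilistic membership update has the form mu_ie = 1 / (1 + (d_ie^2 / eta_i)^(1/(m-1))), where eta_i is a positive scale parameter for cluster i. The sketch below assumes that form; the names `possibilistic_membership`, `d2` and `eta` are hypothetical, and how eta_i is estimated is left out.

```python
def possibilistic_membership(d2, eta, m=2.0):
    """Typicality of a datum at squared distance d2 from a centre with scale eta."""
    # Requires eta > 0 and m > 1.
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))


# Typicality falls from 1 at the centre to 0.5 where d2 equals eta.
for d2 in (0.0, 1.0, 4.0):
    print(d2, possibilistic_membership(d2, eta=1.0))
```

Unlike the fuzzy c-means update, the value depends only on the distance to one centre, so the memberships of an element need not sum to one across clusters.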

Detailed Algorithm
Choose c (number of clusters) and m.
Set error (ε > 0) and step (r = 0).
Execute FCM.

Detailed Algorithm cont.
For 2 times
  Initialize U(0) and the centre of the clusters v_i(0) with the previous results
  Initialize k and r = 0
  Repeat
    Calculate the distance d_i(r) of each point to the centre of the clusters
    Generate U(r+1) recalculating all characteristic functions using the equations
  Until ||U(r+1) - U(r)|| < ε
End for

Gustafson-Kessel Algorithm

Gustafson-Kessel method
This method (GK) is a fuzzy clustering method similar to the Fuzzy C-means (FCM).
The difference is the way the distance is calculated.
FCM uses Euclidean distances.
GK uses Mahalanobis distances.

Gustafson-Kessel method
The Mahalanobis distance is calculated as [equation not preserved in the transcript].
The matrices A_i are given by [equation not preserved in the transcript].

Gustafson-Kessel method
The Fuzzy Covariance Matrix is [equation not preserved in the transcript].
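The three equations on these slides are not in the transcript. The expressions usually given for the Gustafson-Kessel method are shown below for reference. The membership symbol \mu_{ie} and the cluster volume parameter \rho_i (often fixed to 1) are assumptions about notation.

```latex
% Distance of element x_e to the centre v_i, using the norm-inducing matrix A_i
d_{ie}^{2} = (x_e - v_i)^{T} A_i \,(x_e - v_i)

% A_i is derived from the fuzzy covariance matrix S_i; l is the number of characteristics
A_i = \left[ \rho_i \det(S_i) \right]^{1/l} S_i^{-1}

% Fuzzy covariance matrix of cluster i
S_i = \frac{\sum_{e=1}^{n} \mu_{ie}^{\,m}\,(x_e - v_i)(x_e - v_i)^{T}}
           {\sum_{e=1}^{n} \mu_{ie}^{\,m}}
```

The determinant factor fixes the volume of each cluster, which is why the resulting hyperellipsoids can differ in shape and orientation but not much in size.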

GK comments
The clusters are hyperellipsoids in the l-dimensional space.
The hyperellipsoids have approximately the same size.
For S^-1 to be computable, the number of samples n must be at least equal to the number of dimensions l plus 1.

Results of GK

Gath-Geva method
It is also known as Gaussian Mixture Decomposition.
It is similar to the FCM method.
The Gauss distance is used instead of the Euclidean distance.
The clusters no longer have a definite shape and have various sizes.

Gath-Geva Method
The Gauss distance is given by [equation not preserved in the transcript], with A_i = S_i^-1.

Gath-Geva Method
The term P_i is the probability of a sample belonging to cluster i.
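The Gauss distance formula is missing from the transcript. One common statement of the Gath-Geva distance, given here as a hedged reference rather than as the slide's exact equation, is shown below. The symbol \mu_{ie} and the estimate of P_i as the mean membership of cluster i are assumptions, and some texts include an additional constant factor.

```latex
% Gauss distance of element x_e to cluster i, with A_i = S_i^{-1}
d_{ie}^{2} = \frac{\sqrt{\det(S_i)}}{P_i}
             \exp\!\left( \tfrac{1}{2}\,(x_e - v_i)^{T} A_i \,(x_e - v_i) \right)

% Prior probability of cluster i, estimated from the memberships
P_i = \frac{1}{n} \sum_{e=1}^{n} \mu_{ie}
```

Because P_i appears in the denominator, a cluster with a larger P_i yields smaller distances and therefore attracts more elements.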

Adriano Cruz *NCE e IM - UFRJ Cluster 84 Gath-Geva Comments Pi is a parameter that influences the size of a cluster. Pi is a parameter that influences the size of a cluster. Bigger clusters attract more elements. Bigger clusters attract more elements. The exponential term makes more difficult to avoid local minima. The exponential term makes more difficult to avoid local minima. Usually another clustering method is used to initialise the partition matrix U. Usually another clustering method is used to initialise the partition matrix U.

Adriano Cruz *NCE e IM - UFRJ Cluster 85 GG Results – Random Centers

Adriano Cruz *NCE e IM - UFRJ Cluster 86 GG Results – Centers FCM

Adriano Cruz *NCE e IM - UFRJ Cluster 87 Clustering based on Equivalence Relations A crisp relation R on a universe X can be thought of as a relation from X to X. R is an equivalence relation if it has the following three properties: Reflexivity: $(x_i, x_i) \in R$. Symmetry: $(x_i, x_j) \in R \Rightarrow (x_j, x_i) \in R$. Transitivity: $(x_i, x_j) \in R$ and $(x_j, x_k) \in R \Rightarrow (x_i, x_k) \in R$.

Adriano Cruz *NCE e IM - UFRJ Cluster 88 Crisp tolerance relation R is a tolerance relation if it has the following two properties: Reflexivity: $(x_i, x_i) \in R$. Symmetry: $(x_i, x_j) \in R \Rightarrow (x_j, x_i) \in R$.

Adriano Cruz *NCE e IM - UFRJ Cluster 89 Composition of Relations R relates X to Y, S relates Y to Z, and the composition T = R°S relates X to Z.

Adriano Cruz *NCE e IM - UFRJ Cluster 90 Composition of Crisp Relations The operation ° is similar to matrix multiplication.

Adriano Cruz *NCE e IM - UFRJ Cluster 91 Transforming Relations A tolerance relation can be transformed into an equivalence relation by at most (n-1) compositions with itself. n is the cardinality of the set X on which R is defined.
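
A minimal sketch of this transformation for a crisp relation stored as a 0/1 matrix. The function names and the 4-element example relation are hypothetical, chosen only for illustration. Composition is taken as the Boolean analogue of matrix multiplication ("and" replaces the product, "or" replaces the sum).

```python
def compose(r, s):
    """Crisp composition T = R o S: T[i][k] = 1 if R[i][j] = S[j][k] = 1 for some j."""
    return [
        [int(any(r[i][j] and s[j][k] for j in range(len(s))))
         for k in range(len(s[0]))]
        for i in range(len(r))
    ]


def to_equivalence(r):
    """Compose a tolerance relation with itself until it stops changing.

    At most n - 1 compositions are attempted, n being the number of elements.
    """
    current = r
    for _ in range(len(r) - 1):
        nxt = compose(current, r)
        if nxt == current:  # already transitive, nothing left to add
            break
        current = nxt
    return current


# Hypothetical tolerance relation: 1~2 and 2~3 hold, but 1~3 is missing.
tolerance = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
equivalence = to_equivalence(tolerance)
```

In this example one composition is enough to add the missing pair (1, 3), which leaves the classes {1, 2, 3} and {4}.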

Adriano Cruz *NCE e IM - UFRJ Cluster 92 Example of crisp classification Let X={1,2,3,4,5,6,7,8,9,10}. Let R relate two elements of X when they leave the same remainder after division by 3. This relation is an equivalence relation.

Adriano Cruz *NCE e IM - UFRJ Cluster 93 Relation Matrix

Adriano Cruz *NCE e IM - UFRJ Cluster 94 Crisp Classification Consider equivalent (identical) columns of the relation matrix. It is possible to group the elements into the following classes: $R_0$ = {3, 6, 9}, $R_1$ = {1, 4, 7, 10}, $R_2$ = {2, 5, 8}.
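
The example can be reproduced with a short sketch that builds the relation matrix and groups elements whose columns are identical. Variable names are illustrative only.

```python
X = list(range(1, 11))

# relation[a][b] = 1 when the two elements leave the same remainder mod 3
relation = [[int(a % 3 == b % 3) for b in X] for a in X]

# Elements with identical columns belong to the same equivalence class.
classes = {}
for col, x in enumerate(X):
    column = tuple(row[col] for row in relation)
    classes.setdefault(column, []).append(x)

groups = sorted(classes.values())
print(groups)
```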

Adriano Cruz *NCE e IM - UFRJ Cluster 95 Clustering and Fuzzy Equivalence Relations A fuzzy relation R on a universe X can be thought of as a relation from X to X. R is an equivalence relation if it has the following three properties: Reflexivity: $(x_i, x_i) \in R$, or $\mu_R(x_i, x_i) = 1$. Symmetry: $(x_i, x_j) \in R \Rightarrow (x_j, x_i) \in R$, or $\mu_R(x_i, x_j) = \mu_R(x_j, x_i)$. Transitivity: $(x_i, x_j) \in R$ and $(x_j, x_k) \in R \Rightarrow (x_i, x_k) \in R$, or if $\mu_R(x_i, x_j) = \lambda_1$ and $\mu_R(x_j, x_k) = \lambda_2$ then $\mu_R(x_i, x_k) = \lambda$ with $\lambda \ge \min(\lambda_1, \lambda_2)$.

Adriano Cruz *NCE e IM - UFRJ Cluster 96 Fuzzy tolerance relation R is a tolerance relation if it has the following two properties: Reflexivity: $(x_i, x_i) \in R$. Symmetry: $(x_i, x_j) \in R \Rightarrow (x_j, x_i) \in R$.

Adriano Cruz *NCE e IM - UFRJ Cluster 97 Composition of Fuzzy Relations The operation ° is similar to matrix multiplication.
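
The slide's equation is not in the transcript. The sketch below assumes max-min composition, a common choice for fuzzy relations: it has the shape of a matrix product with min in place of multiplication and max in place of addition. The two small matrices are hypothetical illustrative values.

```python
def max_min_compose(r, s):
    """Fuzzy composition T = R o S with T[i][k] = max over j of min(R[i][j], S[j][k])."""
    return [
        [max(min(r[i][j], s[j][k]) for j in range(len(s)))
         for k in range(len(s[0]))]
        for i in range(len(r))
    ]


# Hypothetical membership matrices: R relates X to Y, S relates Y to Z.
R = [[0.7, 0.5],
     [0.8, 0.4]]
S = [[0.9, 0.6, 0.2],
     [0.1, 0.7, 0.5]]

T = max_min_compose(R, S)
print(T)
```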

Adriano Cruz *NCE e IM - UFRJ Cluster 98 Distance Relation Let X be a set of data in $\mathbb{R}^l$. The distance function is a tolerance relation that can be transformed into an equivalence relation. The relation R can be defined by the Minkowski distance formula. $\delta$ is a constant that ensures that $R \in [0,1]$ and is equal to the inverse of the largest distance in X.
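
The formula is missing from the transcript. A form consistent with the description above (membership 1 at zero distance, 0 at the largest distance) is sketched here as an assumption. The index k over the l coordinates and the symbol $\delta$ are assumed notation.

```latex
R(x_i, x_j) = 1 - \delta \left( \sum_{k=1}^{l} \lvert x_{ik} - x_{jk} \rvert^{q} \right)^{1/q},
\qquad \delta = \frac{1}{\max_{i,j} d(x_i, x_j)}
```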

Adriano Cruz *NCE e IM - UFRJ Cluster 99 Example of Fuzzy classification Let X={(0,0),(1,1),(2,3),(3,1),(4,0)} be a set of points in $\mathbb{R}^2$. Set q=2 (Euclidean distance). The largest distance is 4, between $x_1$ and $x_5$, so $\delta$=0.25. The relation R can then be calculated from the distance relation.

Adriano Cruz *NCE e IM - UFRJ Cluster 100 Points to be classified

Adriano Cruz *NCE e IM - UFRJ Cluster 101 Tolerance matrix The equation yields the tolerance matrix. This is a tolerance relation that needs to be transformed into an equivalence relation.
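
The matrix itself was an image and is not in the transcript. The sketch below recomputes it from the five points, assuming the relation R = 1 - delta * distance and rounding to two decimals (both are assumptions, chosen to match the cut levels 0.44, 0.5 and 0.65 used later).

```python
points = [(0, 0), (1, 1), (2, 3), (3, 1), (4, 0)]


def minkowski(a, b, q=2):
    """Minkowski distance of order q (q = 2 gives the Euclidean distance)."""
    return sum(abs(u - v) ** q for u, v in zip(a, b)) ** (1.0 / q)


# delta is the inverse of the largest pairwise distance (4, between x1 and x5).
largest = max(minkowski(a, b) for a in points for b in points)
delta = 1.0 / largest

# Assumed form of the relation: R = 1 - delta * distance, rounded to 2 decimals.
tolerance = [
    [round(1 - delta * minkowski(a, b), 2) for b in points]
    for a in points
]

for row in tolerance:
    print(row)
```

Under these assumptions the first row is 1.0, 0.65, 0.1, 0.21, 0.0. The matrix is reflexive and symmetric but not transitive: for example R(x1, x2) = 0.65 and R(x2, x4) = 0.5, yet R(x1, x4) = 0.21.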

Adriano Cruz *NCE e IM - UFRJ Cluster 102 Equivalence matrix The tolerance matrix is transformed into an equivalence matrix.
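
A sketch of the transformation, assuming max-min composition and repeated composition of the relation with itself (at most n - 1 times). The starting matrix is not in the transcript; the values below are an assumption, recomputed from the five example points with R = 1 - 0.25 * Euclidean distance, rounded to two decimals.

```python
# Assumed tolerance matrix for the five example points (recomputed, see lead-in).
tolerance = [
    [1.0, 0.65, 0.1, 0.21, 0.0],
    [0.65, 1.0, 0.44, 0.5, 0.21],
    [0.1, 0.44, 1.0, 0.44, 0.1],
    [0.21, 0.5, 0.44, 1.0, 0.65],
    [0.0, 0.21, 0.1, 0.65, 1.0],
]


def max_min_compose(r, s):
    """T[i][k] = max over j of min(R[i][j], S[j][k])."""
    n = len(r)
    return [
        [max(min(r[i][j], s[j][k]) for j in range(n)) for k in range(n)]
        for i in range(n)
    ]


def to_equivalence(r):
    """Compose r with itself (at most n - 1 times) until nothing changes."""
    current = r
    for _ in range(len(r) - 1):
        nxt = max_min_compose(current, r)
        if nxt == current:
            break
        current = nxt
    return current


equivalence = to_equivalence(tolerance)
for row in equivalence:
    print(row)
```

With these inputs the result stabilises after two compositions. Its distinct off-diagonal values are 0.44, 0.5 and 0.65, which are the cut levels used in the final slide.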

Adriano Cruz *NCE e IM - UFRJ Cluster 103 Results of clustering Taking $\alpha$-cuts of the fuzzy equivalence relation at $\alpha$ = 0.44, 0.5, 0.65 and 1.0 gives the following classes: $R_{0.44} = [\{x_1, x_2, x_3, x_4, x_5\}]$; $R_{0.5} = [\{x_1, x_2, x_4, x_5\}, \{x_3\}]$; $R_{0.65} = [\{x_1, x_2\}, \{x_3\}, \{x_4, x_5\}]$; $R_{1.0} = [\{x_1\}, \{x_2\}, \{x_3\}, \{x_4\}, \{x_5\}]$.
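
The cuts can be checked with a short sketch. The equivalence matrix is not in the transcript; the values below are an assumption, obtained as the max-min transitive closure of the recomputed tolerance matrix for the five points. The function name is illustrative.

```python
# Assumed fuzzy equivalence matrix for x1..x5 (recomputed, see lead-in).
equivalence = [
    [1.0, 0.65, 0.44, 0.5, 0.5],
    [0.65, 1.0, 0.44, 0.5, 0.5],
    [0.44, 0.44, 1.0, 0.44, 0.44],
    [0.5, 0.5, 0.44, 1.0, 0.65],
    [0.5, 0.5, 0.44, 0.65, 1.0],
]


def alpha_cut_classes(matrix, alpha):
    """Classes of the crisp relation obtained by keeping memberships >= alpha."""
    classes, assigned = [], set()
    for i in range(len(matrix)):
        if i in assigned:
            continue
        # For an equivalence relation, row i of the cut lists the whole class of x_i.
        members = [j for j in range(len(matrix)) if matrix[i][j] >= alpha]
        assigned.update(members)
        classes.append([f"x{j + 1}" for j in members])
    return classes


for alpha in (0.44, 0.5, 0.65, 1.0):
    print(alpha, alpha_cut_classes(equivalence, alpha))
```

Raising $\alpha$ only ever splits classes, so the four cuts form a hierarchy from one cluster down to five singletons.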