In each epoch, start by clustering those points whose numeric difference from the mean is minimum in every dimension. Finish the other points with HDkM.

Presentation transcript:

1

2 In each epoch, start by clustering those points whose numeric difference from the mean is minimum in every dimension. Finish the other points with HDkM.
Points (coordinates D1 D2 D3):
p1  2 7 6
p2  6 7 6
p3  3 7 5
p4  2 7 5
p5  3 2 1
p6  2 2 1
p7  7 0 1
p8  7 0 1
Means: m1 = p7 = (7, 0, 1), m2 = p1 = (2, 7, 6).
Coordinate-wise differences:
point  |p - m1|  |p - m2|
p1     5 7 5     0 0 0
p2     1 7 5     4 0 0
p3     4 7 4     1 0 1
p4     5 7 4     0 0 1
p5     4 2 0     1 5 5
p6     5 2 0     0 5 5
p7     0 0 0     5 7 5
p8     0 0 0     5 7 5
This method reduces the 8 HDkM comparisons to 3 (only p2, p5 and p6 are not closest to the same mean in every dimension). A possible modification: cluster a point if the condition holds in x% of the dimensions. With x% = 66%, all eight points are clustered and we get the same (correct?) classification. We are comparing to thresholds, so we can create the masks as pTrees with EIN formulas, as on the next slide.
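The per-dimension rule on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `POINTS`, `MEANS` and `vote_assign` are hypothetical, ties in a dimension are treated as giving no vote, and the HDkM step for the leftover points is not implemented.

```python
# Illustrative sketch; POINTS, MEANS and vote_assign are hypothetical names.
POINTS = {
    "p1": (2, 7, 6), "p2": (6, 7, 6), "p3": (3, 7, 5), "p4": (2, 7, 5),
    "p5": (3, 2, 1), "p6": (2, 2, 1), "p7": (7, 0, 1), "p8": (7, 0, 1),
}
MEANS = {"m1": (7, 0, 1), "m2": (2, 7, 6)}


def vote_assign(points, means, min_fraction=1.0):
    """Assign a point to a mean that is strictly closest in at least
    min_fraction of the dimensions; return (assigned, leftover)."""
    assigned, leftover = {}, []
    for name, p in points.items():
        votes = {m: 0 for m in means}
        for d in range(len(p)):
            diffs = {m: abs(p[d] - c[d]) for m, c in means.items()}
            best = min(diffs.values())
            winners = [m for m, v in diffs.items() if v == best]
            if len(winners) == 1:  # assumption: a tie gives no vote
                votes[winners[0]] += 1
        top = max(votes, key=votes.get)
        if votes[top] >= min_fraction * len(p):
            assigned[name] = top
        else:
            leftover.append(name)  # would be finished with HDkM
    return assigned, leftover


strict, rest = vote_assign(POINTS, MEANS)          # every dimension must agree
relaxed, rest66 = vote_assign(POINTS, MEANS, 0.66)  # the x% = 66% variant
print(rest)    # ['p2', 'p5', 'p6']
print(rest66)  # []
```

With the strict rule three points are left for HDkM; with the 66% rule every point is assigned, p1 to p4 to m2 and p5 to p8 to m1.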

3 pTree k-means clustering (puck-muck)? Or pTree assisted k-means affinity-neighborhood clustering (pakman)?
Points: p1 (2,7,6), p2 (6,7,6), p3 (3,7,5), p4 (2,7,5), p5 (3,2,1), p6 (2,2,1), p7 (7,0,1), p8 (7,0,1).
m1 = p7 = (7, 0, 1), m2 = p1 = (2, 7, 6); avg = (4.5, 3.5, 3.5), the coordinate-wise average of m1 and m2.
[Figure: bit-slice pTrees P_11, P_12, P_13, P_21, P_22, P_23, P_31, P_32, P_33 and the resulting mask pTrees.]
If 1st coord > 5 = 101_2 > 4.5 (avg), then that coordinate is green, else red. P_{x1>(101)_2} = P_13 ∧ P_12. 1st coord: 3,5,7,8 green; 1,2,4,6 red.
If 2nd > 4 = 100_2 > 3.5, then red, else green. P_{x2>(100)_2} = P_23 ∧ (P_22 ∨ P_21). 2nd coord: 1,2,3,4 red; 5,6,7,8 green.
If 3rd > 4 = 100_2 > 3.5, then red, else green. P_{x3>(100)_2} = P_33 ∧ (P_32 ∨ P_31). 3rd coord: 3,4 red; 1,2,5,6,7,8 green.
If 2 or more coordinates are green, the point is green, so 5,6,7,8 are green. If 2 or more coordinates are red, the point is red, so 1,2,3,4 are red.
This method gives the correct clustering in one epoch of puck-muck! One could try using the floor rather than the roof of the average.
How would this perform on IRIS (the classical example dataset from the UCI repository that people use to assess clustering algorithms)? Other classical UCI datasets? Bioinformatics datasets (microarray datasets)? Image classification datasets? GeoTIFF images and hyperspectral images from Dataminer?
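The threshold masks on this slide are Boolean combinations of bit slices. The sketch below is not a pTree implementation: it uses plain Python lists as uncompressed bit slices, and `bit_slices`, `gt5` and `gt4` are hypothetical helper names. It first checks the two formulas against a direct comparison for every 3-bit value, then applies the 2-of-3 vote to the point table. The per-coordinate masks are computed from the table as given and are not checked against the per-coordinate lists quoted on the slide; only the final vote is compared.

```python
def bit_slices(values, nbits=3):
    """Return {k: [bit k of each value]}; k = nbits is the high-order bit."""
    return {k: [(v >> (k - 1)) & 1 for v in values] for k in range(1, nbits + 1)}


def gt5(s):
    """Mask for x > 101_2, i.e. P3 AND P2."""
    return [a & b for a, b in zip(s[3], s[2])]


def gt4(s):
    """Mask for x > 100_2, i.e. P3 AND (P2 OR P1)."""
    return [a & (b | c) for a, b, c in zip(s[3], s[2], s[1])]


# Sanity check: the formulas agree with direct comparison on all 3-bit values.
ALL = list(range(8))
assert gt5(bit_slices(ALL)) == [int(v > 5) for v in ALL]
assert gt4(bit_slices(ALL)) == [int(v > 4) for v in ALL]

# Columns of the point table (p1..p8), one list per coordinate.
COLS = [
    [2, 6, 3, 2, 3, 2, 7, 7],
    [7, 7, 7, 7, 2, 2, 0, 0],
    [6, 6, 5, 5, 1, 1, 1, 1],
]
s1, s2, s3 = (bit_slices(c) for c in COLS)
green1 = gt5(s1)                   # coord 1: green when > 5
green2 = [1 - b for b in gt4(s2)]  # coord 2: red when > 4, else green
green3 = [1 - b for b in gt4(s3)]  # coord 3: red when > 4, else green

green_points = [
    i + 1
    for i, votes in enumerate(zip(green1, green2, green3))
    if sum(votes) >= 2             # 2 or more green coordinates
]
print(green_points)  # [5, 6, 7, 8]
```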

4 Same example: p1 (2,7,6), p2 (6,7,6), p3 (3,7,5), p4 (2,7,5), p5 (3,2,1), p6 (2,2,1), p7 (7,0,1), p8 (7,0,1).
If 1st col > 5 = 101_2 > 4.5, it is green, else red. P_{x1>101} = P_13 ∧ P_12.
If 2nd > 4 = 100_2 > 3.5, red, else green. P_{x2>100} = P_23 ∧ (P_22 ∨ P_21).
If 3rd > 4 = 100_2 > 3.5, red, else green. P_{x3>100} = P_33 ∧ (P_32 ∨ P_31). 3rd coord: 3,4 red; 1,2,5,6,7,8 green.
If 2 or more coordinates are green, the point is green, so 5,6,7,8 are green. If 2 or more coordinates are red, the point is red, so 1,2,3,4 are red. This method gives the correct clustering!
[Figure: the bit-slice pTrees P_11 ... P_33 drawn as multi-level pTrees, with lower-level pTrees p111, p112, p131, p132, p222, p321, p331.]

5 [Figure: multi-level pTrees for P_11 ... P_33. Level-1 catalog: p111, p112, p131, p132, p222, p321, p331. Level-0 catalog: p1111, p1312, p1321.]

6 Another example (attributes 1, 2, 4)
p1  2 7 1
p2  6 7 0
p3  3 7 1
p4  2 7 7
p5  3 2 4
p6  2 2 5
p7  7 0 4
p8  7 0 4
m1 = (7, 0, 4), m2 = (2, 7, 1); avg = (4.5, 3.5, 2.5).
[Figure: bit-slice pTrees for attributes 1, 2 and 4 and the mask pTrees P_{x2>4} and P_{x4>3}.]
If Attr1 > 5 = 101_2 > 4.5 (avg), then that coordinate is green, else red. P_{x1>(101)_2} = P_13 ∧ P_12. Attribute1: 1,2,3,4,5,6 red; 7,8 green.
If Attribute2 > 4 = 100_2, then red, else green. P_{x2>(100)_2} = P4 ∧ (P2 ∨ P1). Att2: 1,2,3,4 red; 5,6,7,8 green.
If Attribute4 > 3 = 011_2, then green, else red. P_{x4>(011)_2} = P4 ∧ (P2 ∨ P1). Att4: 1,2,3,4 green; 5,6,7,8 red.
If 2 or more are green, the point is green, so 3,5,7,8 are green and 1,2,4,6 are red.
I did an agglomerative clustering on the same 8 data points (in 3-space), using the means to represent multipoint clusters, and got the very same two clusters. The dendrogram is shown.
[Figure: dendrogram over points 1 2 3 4 5 6 7 8.]
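Agglomerative clustering with means as cluster representatives can be sketched as repeated merging of the two clusters whose means are closest. The transcript does not say which distance or tie-breaking rule was used, nor which attribute set the dendrogram was built from, so the sketch below makes assumptions: Euclidean distance, first-found tie-breaking, and the first example's table (attributes 1 to 3). The names `PTS`, `centroid` and `agglomerate` are hypothetical.

```python
import math

# Assumption: the first example's eight points (attributes 1 to 3).
PTS = [
    (2, 7, 6), (6, 7, 6), (3, 7, 5), (2, 7, 5),
    (3, 2, 1), (2, 2, 1), (7, 0, 1), (7, 0, 1),
]


def centroid(cluster, pts):
    """Mean of the points whose indices are in cluster."""
    n = len(cluster)
    return tuple(sum(pts[i][d] for i in cluster) / n for d in range(len(pts[0])))


def agglomerate(pts, k=2):
    """Merge the two clusters with the closest means until k clusters remain."""
    clusters = [[i] for i in range(len(pts))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(centroid(clusters[a], pts), centroid(clusters[b], pts))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    # Report 1-based point numbers, as on the slides.
    return sorted(sorted(i + 1 for i in c) for c in clusters)


print(agglomerate(PTS))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

On this table the two groups are well separated in the second and third coordinates, so the result does not depend on how ties among the early merges are broken.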

