1
Representatives for Visually Analyzing Cluster Hierarchies
Hans-Peter Kriegel, Stefan Brecheisen, Peer Kröger, Martin Pfeifle, Maximilian Viermetz
Database Group, Institute for Computer Science, University of Munich, Germany
MDM/KDD2003, Washington, DC, August 24 - 27, 2003
Washington, 08/27/03, Martin Pfeifle, Database Group, University of Munich
2
Outline of the Talk
- Introduction
- OPTICS
- Cluster Recognition
- Cluster Representatives
- BOSS
- Conclusion
3
Introduction
Problem:
- Larger and larger amounts of data are gathered automatically
- Too large for humans to analyze manually
Examples: Telecommunication Data, Market-Basket Data, Space Telescopes
Data analysis tools:
- Help the user to get an overview of large data sets
- Help companies to get a competitive advantage out of the data
4
Introduction
Solution based on Visual Data Mining:
DATA -> OPTICS -> Visualisation of the intermediate result (Reachability-Plot) -> BOSS (Cluster Recognition, Cluster Representatives) -> Knowledge
5
Outline of the Talk (next section: OPTICS)
6
OPTICS: Ordering Points to Identify the Clustering Structure [Ankerst, Breunig, Kriegel, Sander 99]
- Yields a density-based hierarchical clustering
- Insensitive to its two input parameters, ε and MinPts
- Result (the so-called reachability plot) can be easily visualized and is suitable for interactive exploration
Figure: Data Space and Reachability Plot (labels: A, A1, A2, B, ε1, ε2)
7
OPTICS Algorithm
Example Database (2-dimensional, 16 points: A, B, C, D, E, F, G, H, I, J, K, L, M, N, P, R), ε = 44, MinPts = 3
Each of the following slides shows the ordering produced so far (plotted on the reach axis) and the current seedlist as (point, reachability) pairs.
ordering: (empty); seedlist: (A, ∞)
8
ordering: A; seedlist: (B, 40) (I, 40); figure label: core-distance
9
ordering: A B; seedlist: (I, 40) (C, 40)
10
ordering: A B I; seedlist: (J, 20) (K, 20) (L, 31) (C, 40) (M, 40) (R, 43)
11
ordering: A B I J; seedlist: (L, 19) (K, 20) (R, 21) (M, 30) (P, 31) (C, 40)
12
ordering: A B I J L; seedlist: (M, 18) (K, 18) (R, 20) (P, 21) (N, 35) (C, 40)
13
ordering: A B I J L M; seedlist: (K, 18) (N, 19) (R, 20) (P, 21) (C, 40)
14
ordering: A B I J L M K; seedlist: (N, 19) (R, 20) (P, 21) (C, 40)
15
ordering: A B I J L M K N; seedlist: (R, 20) (P, 21) (C, 40)
16
ordering: A B I J L M K N R; seedlist: (P, 21) (C, 40)
17
ordering: A B I J L M K N R P; seedlist: (C, 40)
18
ordering: A B I J L M K N R P C; seedlist: (D, 22) (F, 22) (E, 30) (G, 35)
19
ordering: A B I J L M K N R P C D; seedlist: (F, 22) (E, 22) (G, 32)
20
ordering: A B I J L M K N R P C D F; seedlist: (G, 17) (E, 22)
21
ordering: A B I J L M K N R P C D F G; seedlist: (E, 15) (H, 43)
22
ordering: A B I J L M K N R P C D F G E; seedlist: (H, 43)
23
ordering: A B I J L M K N R P C D F G E H; seedlist: -
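The seedlist walk-through above can be reproduced with a compact sketch of the OPTICS main loop. This is a minimal illustration, not the authors' implementation: the function name `optics`, the brute-force neighbor search, and the convention that a point counts as its own neighbor when computing the core-distance are assumptions, and the coordinates of the slide example are not available, so the usage note uses made-up points.

```python
import heapq
import math


def optics(points, eps, min_pts):
    """Return (ordering, reach) for a list of coordinate tuples.

    ordering: point indices in the order OPTICS processes them.
    reach:    reachability value per point index (inf if never reached).
    """
    n = len(points)
    reach = [math.inf] * n
    processed = [False] * n
    ordering = []

    def dist(i, j):
        return math.dist(points[i], points[j])

    def neighbors(i):
        # Brute-force eps-neighborhood; includes i itself (assumption).
        return [j for j in range(n) if dist(i, j) <= eps]

    def core_distance(i, nbrs):
        # Distance to the min_pts-th nearest neighbor, or None if i is
        # not a core point within eps.
        if len(nbrs) < min_pts:
            return None
        return sorted(dist(i, j) for j in nbrs)[min_pts - 1]

    for start in range(n):
        if processed[start]:
            continue
        seeds = [(math.inf, start)]  # seedlist as a min-heap
        while seeds:
            _, i = heapq.heappop(seeds)
            if processed[i]:
                continue  # stale entry left over from an earlier update
            processed[i] = True
            ordering.append(i)
            nbrs = neighbors(i)
            core = core_distance(i, nbrs)
            if core is None:
                continue
            for j in nbrs:
                if processed[j]:
                    continue
                new_reach = max(core, dist(i, j))
                if new_reach < reach[j]:
                    reach[j] = new_reach
                    heapq.heappush(seeds, (new_reach, j))
    return ordering, reach
```

For example, `optics([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], eps=20, min_pts=2)` visits the first three points, then jumps to the second group; the reachability of the first point of that group is large, which is the peak that separates the two valleys in a reachability plot.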
25
Outline of the Talk (next section: Cluster Recognition)
26
Cluster Recognition: ξ-Clustering [Kriegel et al. 99]
Recognition of clusters via steepness.
Definition: Steep Elements
- UpPoint: the successor is ξ% higher than this point
- DownPoint: the successor is ξ% lower than this point
Definition: Steep Areas
- A steep area starts and ends with a steep point
- A steep area contains at most MinPts contiguous non-steep points
- A steep area must be maximal
Figure labels: Steep Downward Points, Steep Upward Points, Steep Down Area, Steep Upward Area, Cluster
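The steep-element definition can be sketched as a single pass over the reachability values. The slide only says "ξ% higher" and "ξ% lower"; the exact inequalities below follow the usual OPTICS formulation and are an assumption, as is the function name `steep_points`.

```python
def steep_points(reach, xi):
    """Return (up, down): indices of steep upward and downward points.

    reach: reachability values in OPTICS order.
    xi:    steepness as a fraction, e.g. 0.2 for 20%.
    """
    up, down = [], []
    for i in range(len(reach) - 1):
        if reach[i] <= reach[i + 1] * (1 - xi):
            up.append(i)      # successor is clearly higher
        elif reach[i] * (1 - xi) >= reach[i + 1]:
            down.append(i)    # successor is clearly lower
    return up, down
```

Steep areas would then be built by merging runs of such points that are interrupted by at most MinPts non-steep points; that merging step is not shown here.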
27
Cluster Recognition: Cluster-Tree [Sander et al. 03]
Algorithm:
- Find all local maxima and sort them in descending order
- Split data set
- Test for significance of split
- Decide where to attach the subclusters
- Call the method recursively for new subclusters
Figure labels: Root, 1, 2, 3, 4, 5; significant local maxima; insignificant local maxima
30
Cluster Recognition: Cluster-Tree [Sander et al. 03] (continued)
Figure annotation: similar reachability values => no new cluster hierarchy
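The Cluster-Tree steps (find local maxima, split, test significance, recurse) can be illustrated with a short recursive sketch. The slides do not state the significance test, so the rule used here (both sides must have an average reachability below `ratio` times the maximum) and the defaults `ratio=0.75` and `min_size=2` are assumptions for illustration only. The step "decide where to attach the subclusters" is simplified: subclusters are always attached to the node that was split.

```python
def cluster_tree(reach, lo, hi, min_size=2, ratio=0.75):
    """Build a nested dict describing clusters in reach[lo:hi].

    min_size must be at least 2 so that every side has an inner value.
    """
    node = {"span": (lo, hi), "children": []}

    def avg(a, b):
        # Average reachability inside (a, b), skipping the entry point a.
        inner = reach[a + 1:b]
        return sum(inner) / len(inner)

    # Local maxima strictly inside the span, largest first.
    maxima = [i for i in range(lo + 1, hi - 1)
              if reach[i - 1] < reach[i] >= reach[i + 1]]
    maxima.sort(key=lambda i: reach[i], reverse=True)

    for m in maxima:
        parts = [(lo, m), (m, hi)]
        significant = all(
            b - a >= min_size and avg(a, b) < ratio * reach[m]
            for a, b in parts
        )
        if significant:
            node["children"] = [
                cluster_tree(reach, a, b, min_size, ratio) for a, b in parts
            ]
            break
    return node
```

With `reach = [float("inf"), 1, 1, 1, 9, 1, 1, 1]` the peak at index 4 splits the span into two child clusters, while a nearly flat plot such as `[float("inf"), 2, 2.1, 2, 2]` yields no children, matching the "similar reachability values => no new cluster hierarchy" annotation.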
32
Cluster Recognition: Drop-Down Clustering: Motivation
Motivation: detection of narrowing clusters, e.g. cluster C
Cluster Definition:
- A set of elements whose reachability is smaller than a given value
- A set of elements which contains at least MinPts elements and at least MinPts elements fewer than its parent cluster
Figure labels: A, B, C
35
Cluster Recognition: Drop-Down Clustering: Algorithm
Phase 1 (initial clustering):
- Sort all elements by descending reachability value
- Find root clusters by scanning sorted list
Figure label: sorted list of reachability values
37
Cluster Recognition: Drop-Down Clustering: Algorithm
Phase 2 (recursive pool draining):
- Sort all elements by descending reachability value
- Scan sorted list
- if (non-adjacent border elements) or (inflexion point) then
  - Test if the cluster contains at least MinPts elements
  - Test if the cluster is at least MinPts elements smaller than its parent cluster
  - Call the method recursively for new subclusters
Figure labels: root-cluster, sorted elements of root-cluster, border points, cluster hierarchy
40
Cluster Recognition: Drop-Down Clustering: Algorithm, Phase 2 (recursive pool draining), continued
Figure labels: pred, succ, pred << succ; border points; cluster hierarchy
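The "pool draining" idea can be approximated by lowering a threshold through the reachability values and keeping the contiguous runs that satisfy the two size tests from the cluster definition. This is a simplified stand-in, not the algorithm from the talk: it does not model border elements or inflexion points, and the function name `drop_down` and the tuple layout are assumptions.

```python
def drop_down(reach, lo, hi, min_pts):
    """Return subclusters of reach[lo:hi] as nested (lo, hi, children) tuples."""
    parent_size = hi - lo
    # Drain the pool: try thresholds from the highest value downwards.
    for level in sorted(set(reach[lo:hi]), reverse=True):
        runs, start = [], None
        for i in range(lo, hi):
            if reach[i] < level:
                if start is None:
                    start = i
            elif start is not None:
                runs.append((start, i))
                start = None
        if start is not None:
            runs.append((start, hi))
        # Size tests from the slides: at least min_pts elements, and at
        # least min_pts elements fewer than the parent cluster.
        valid = [(a, b) for a, b in runs
                 if b - a >= min_pts and parent_size - (b - a) >= min_pts]
        if valid:
            return [(a, b, drop_down(reach, a, b, min_pts)) for a, b in valid]
    return []
```

On a narrowing cluster such as `[inf, 5, 5, 5, 5, 3, 3, 3, 3, 1, 1, 1, 1]` with `min_pts=3`, the sketch returns a chain of nested subclusters rather than a single flat one.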
48
Cluster Representatives: First Experimental Results
Algorithms compared: Drop-Down-Clustering, Tree-Clustering, ξ-Clustering
Figure annotations: many clusters and subclusters are recognized; some clusters are recognized; no clusters are recognized; detection of narrowing clusters
49
Outline of the Talk (next section: Cluster Representatives)
50
Cluster Representatives
Algorithms for Detecting Cluster Representatives:
- Medoid-Approach
Figure: Medoid-Approach, example with MinPts = 3 (point set A to V; extra label: I)
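The slide does not spell out the Medoid-Approach; under the standard definition, the medoid of a cluster is the member with the smallest total distance to all other members. A minimal sketch under that assumption (the function name `medoid` and the Euclidean distance are illustrative choices):

```python
import math


def medoid(points):
    """Return the point with the smallest summed distance to all others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))
```

For a cluster containing one outlier, e.g. `[(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]`, the medoid stays inside the dense part of the cluster.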
51
Cluster Representatives
Algorithms for Detecting Cluster Representatives:
- Medoid-Approach
- Core-Distance Approach (based on an OPTICS run)
Figure: Core-Distance Approach, example with MinPts = 3 (point set A to V; extra labels: I, P)
52
Cluster Representatives
Algorithms for Detecting Cluster Representatives:
- Medoid-Approach
- Core-Distance Approach (based on an OPTICS run)
Figure: Core-Distance Approach, example with MinPts = 5 (point set A to V; extra labels: I, P, L)
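The two Core-Distance slides differ only in MinPts, which suggests that the chosen representative depends on that parameter. The sketch below assumes the approach picks the cluster member with the smallest core-distance, i.e. the member in the densest region; this selection rule, the function names, and the convention that a point counts as its own neighbor are assumptions rather than details stated on the slides. In practice the core-distances would come from the OPTICS run instead of being recomputed.

```python
import math


def core_distance(p, points, min_pts):
    """Distance from p to its min_pts-th nearest neighbor (p itself included)."""
    return sorted(math.dist(p, q) for q in points)[min_pts - 1]


def core_distance_representative(points, min_pts):
    """Return the cluster member with the smallest core-distance."""
    return min(points, key=lambda p: core_distance(p, points, min_pts))
```

With the made-up cluster `[(0, 0), (4, 0), (5, 0), (6, 0), (20, 0)]`, `min_pts=3` selects `(5, 0)` while `min_pts=5` selects `(6, 0)`, so changing MinPts can change the representative.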
53
Cluster Representatives
Algorithms for Detecting Cluster Representatives:
- Medoid-Approach
- Core-Distance Approach (based on an OPTICS run)
- Maximizing Successors (based on an OPTICS run)
Figure: Maximizing Successors, example with MinPts = 3; OPTICS run with reachability plot (axis: reach) over E G B I K L P R N J M; narrowing cluster; extra labels: I, P, L
54
Cluster Representatives: First Experimental Results
55
Outline of the Talk (next section: BOSS)
56
BOSS (Browsing OPTICS-Plots for Similarity Search)
- Interactive data browsing tool based on reachability plots
- User-friendly method to support the time-consuming task of finding similar parts:
  - Revealing the hierarchical clustering structure of the dataset at a glance
  - Displaying suitable representatives for large clusters
57
BOSS: Architecture
58
BOSS: Screenshot
59
Outline of the Talk (next section: Conclusion)
60
Conclusions
Contribution:
- New algorithm for cluster recognition
- New algorithms for finding suitable cluster representatives
- BOSS: a new data analysis tool
Future Work:
- Detailed evaluation of the new algorithms
61
Thank you for your attention
Any questions?
62
OPTICS: Application Ranges
OPTICS yields an intermediate result which serves as a multi-purpose basis for further analysis:
- Similarity search
- Visual data mining
- Evaluation of similarity models
Figure labels: DATA, OPTICS, Visualisation of the intermediate result, Other Algorithms, Knowledge, k-nn query