Washington, 08/27/03. Martin Pfeifle, Database Group, University of Munich. Representatives for Visually Analyzing Cluster Hierarchies.

Presentation transcript:

Representatives for Visually Analyzing Cluster Hierarchies
Hans-Peter Kriegel, Stefan Brecheisen, Peer Kröger, Martin Pfeifle, Maximillian Viermetz
MDM/KDD2003, Washington, DC, August 2003
Database Group, Institute for Computer Science, University of Munich, Germany

Outline of the Talk: Introduction (current section), OPTICS, Cluster Recognition, Cluster Representatives, BOSS, Conclusion

Introduction
Problem:
- Larger and larger amounts of data are gathered automatically (e.g. telecommunication data, market-basket data, space telescopes)
- Too large for humans to analyze manually
Data analysis tools:
- Help the user to get an overview of large data sets
- Help companies to gain a competitive advantage from the data

Introduction: Solution Based on Visual Data Mining
[Diagram labels: data, OPTICS, visualisation of the intermediate result (reachability plot), BOSS (cluster recognition, cluster representatives), knowledge]

Outline of the Talk: Introduction, OPTICS (current section), Cluster Recognition, Cluster Representatives, BOSS, Conclusion

OPTICS: Ordering Points To Identify the Clustering Structure [Ankerst, Breunig, Kriegel, Sander 99]
- Yields a density-based hierarchical clustering
- Insensitive to its two input parameters, ε and MinPts
- The result (the so-called reachability plot) can be easily visualized and is suitable for interactive exploration
[Figure: data space and reachability plot showing clusters A (with A1 and A2) and B at the thresholds ε1 and ε2]

OPTICS Algorithm: Example
Example database: 16 two-dimensional points (A to N, P, R), ε = 44, MinPts = 3.
[Figure: the point set next to the growing reachability plot (axis "reach", with ε = 44 and ∞ marked); the core-distance of the first point is indicated.]
Each step lists the points processed so far and the current seed list as (point, reachability-distance):
- processed: (none); seedlist: (A, ∞)
- processed: A; seedlist: (B, 40) (I, 40)
- processed: A B; seedlist: (I, 40) (C, 40)
- processed: A B I; seedlist: (J, 20) (K, 20) (L, 31) (C, 40) (M, 40) (R, 43)
- processed: A B I J; seedlist: (L, 19) (K, 20) (R, 21) (M, 30) (P, 31) (C, 40)
- processed: A B I J L; seedlist: (M, 18) (K, 18) (R, 20) (P, 21) (N, 35) (C, 40)
- processed: A B I J L M; seedlist: (K, 18) (N, 19) (R, 20) (P, 21) (C, 40)
- processed: A B I J L M K; seedlist: (N, 19) (R, 20) (P, 21) (C, 40)
- processed: A B I J L M K N; seedlist: (R, 20) (P, 21) (C, 40)
- processed: A B I J L M K N R; seedlist: (P, 21) (C, 40)
- processed: A B I J L M K N R P; seedlist: (C, 40)
- processed: A B I J L M K N R P C; seedlist: (D, 22) (F, 22) (E, 30) (G, 35)
- processed: A B I J L M K N R P C D; seedlist: (F, 22) (E, 22) (G, 32)
- processed: A B I J L M K N R P C D F; seedlist: (G, 17) (E, 22)
- processed: A B I J L M K N R P C D F G; seedlist: (E, 15) (H, 43)
- processed: A B I J L M K N R P C D F G E; seedlist: (H, 43)
- processed: A B I J L M K N R P C D F G E H; seedlist: (empty)
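The seed-list walk-through above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distance, a brute-force neighbourhood search, a core-distance that counts the point itself among its MinPts neighbours, and a heap with lazy deletion in place of a decrease-key seed list. The function name `optics` and its return values are illustrative choices.

```python
import heapq
import math


def optics(points, eps, min_pts):
    """Return (order, reach): the processing order and each point's reachability."""
    n = len(points)
    reach = [math.inf] * n          # reachability-distance per point index
    processed = [False] * n
    order = []

    def dist(i, j):
        return math.dist(points[i], points[j])

    def neighbors(i):
        # eps-neighbourhood of point i (assumption: includes i itself)
        return [j for j in range(n) if dist(i, j) <= eps]

    def core_distance(i, nbrs):
        # distance to the MinPts-th nearest neighbour, or inf if i is no core point
        if len(nbrs) < min_pts:
            return math.inf
        return sorted(dist(i, j) for j in nbrs)[min_pts - 1]

    for start in range(n):
        if processed[start]:
            continue
        seeds = [(math.inf, start)]  # seed list as a heap of (reachability, index)
        while seeds:
            _, i = heapq.heappop(seeds)
            if processed[i]:
                continue             # stale heap entry, a smaller one was used
            processed[i] = True
            order.append(i)
            nbrs = neighbors(i)
            cd = core_distance(i, nbrs)
            if cd == math.inf:
                continue             # not a core point: nothing to expand
            for j in nbrs:
                if processed[j]:
                    continue
                new_reach = max(cd, dist(i, j))
                if new_reach < reach[j]:
                    reach[j] = new_reach
                    heapq.heappush(seeds, (new_reach, j))
    return order, reach
```

Plotting `reach` in the sequence given by `order` yields the reachability plot; the first point of each connected region keeps the value ∞, as point A does in the example.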

Outline of the Talk: Introduction, OPTICS, Cluster Recognition (current section), Cluster Representatives, BOSS, Conclusion

Cluster Recognition: ξ-Clustering [Kriegel et al. 99]
Recognition of clusters via steepness.
Definition: steep elements
- UpPoint: the successor is ξ% higher than this point
- DownPoint: the successor is ξ% lower than this point
Definition: steep areas
- A steep area starts and ends with a steep point
- A steep area contains at most MinPts contiguous non-steep points
- A steep area must be maximal
[Figure: reachability plot marking steep downward points, steep upward points, a steep down area, a steep upward area and the resulting cluster]
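As a rough illustration of the steep-element definitions above, the following sketch classifies the positions of a reachability plot. It follows the slide's wording literally (successor at least ξ higher or lower, with ξ given as a fraction such as 0.1 for 10%); the original OPTICS paper states the condition slightly differently, so treat the exact inequality as an assumption. The function name `steep_points` is hypothetical.

```python
def steep_points(reach, xi):
    """Return (up, down): indices of steep upward and steep downward points.

    reach is the list of reachability values in cluster order;
    xi is the steepness threshold as a fraction (assumption: 0.1 means 10%).
    """
    up, down = [], []
    for i in range(len(reach) - 1):
        cur, nxt = reach[i], reach[i + 1]
        if nxt > cur and nxt >= cur * (1 + xi):
            up.append(i)        # successor is at least xi higher
        elif nxt < cur and nxt <= cur * (1 - xi):
            down.append(i)      # successor is at least xi lower
    return up, down
```

Steep areas would then be built by merging runs of these points that are interrupted by at most MinPts non-steep points.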

Cluster Recognition: Cluster-Tree [Sander et al. 03]
Algorithm:
- Find all local maxima and sort them in descending order
- Split the data set
- Test the split for significance
- Decide where to attach the subclusters
- Call the method recursively for the new subclusters
Similar reachability values => no new cluster hierarchy
[Figure: reachability plot with significant and insignificant local maxima, and the cluster tree growing from the root]
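The first and third steps of the algorithm above can be sketched as follows. The significance test shown here (both sides of the split must have an average reachability below a fixed fraction of the value at the split point) and the default `ratio=0.75` are assumptions made for illustration; the slides do not spell out the test. Both function names are hypothetical.

```python
def local_maxima_desc(reach):
    """Indices of interior local maxima of a reachability plot, highest first."""
    maxima = [
        i for i in range(1, len(reach) - 1)
        if reach[i] > reach[i - 1] and reach[i] >= reach[i + 1]
    ]
    return sorted(maxima, key=lambda i: reach[i], reverse=True)


def is_significant(reach, lo, hi, i, ratio=0.75):
    """Assumed test: splitting the range [lo, hi) at index i is significant
    if both sides are clearly denser (lower on average) than the split point."""
    left, right = reach[lo:i], reach[i + 1:hi]
    if not left or not right:
        return False
    threshold = ratio * reach[i]
    return (sum(left) / len(left) < threshold
            and sum(right) / len(right) < threshold)
```

Under this test a maximum whose neighbours have similar reachability values is rejected, which matches the slide's remark that similar reachability values do not open a new level in the hierarchy.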

Cluster Recognition: Drop-Down Clustering (Motivation)
Motivation: detection of narrowing clusters, e.g. cluster C
Cluster definition:
- A set of elements whose reachability values are smaller than a given value
- A set of elements which contains at least MinPts elements and at least MinPts elements fewer than its parent cluster
[Figure: reachability plot and cluster hierarchy with clusters A, B and C]

Cluster Recognition: Drop-Down Clustering (Algorithm)
Phase 1 (initial clustering):
- Sort all elements by descending reachability value
- Find root clusters by scanning the sorted list
[Figure: sorted list of reachability values]

Cluster Recognition: Drop-Down Clustering (Algorithm)
Phase 2 (recursive pool draining):
- Sort all elements by descending reachability value
- Scan the sorted list
- If (non-adjacent border elements) or (inflexion point) then
  - Test whether the cluster contains at least MinPts elements
  - Test whether the cluster is at least MinPts elements smaller than its parent cluster
- Call the method recursively for new subclusters
[Figure: root cluster, sorted elements of the root cluster, border points, the quantities labelled pred and succ with the condition pred << succ, and the resulting cluster hierarchy]
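The two size tests of Phase 2 are simple enough to write down directly. The sketch below is only an illustration of those tests and of the initial sort; the function names and the use of plain integers for cluster sizes are assumptions, and the detection of border elements and inflexion points is not covered.

```python
def sort_by_descending_reachability(reach):
    """Return element indices ordered from highest to lowest reachability."""
    return sorted(range(len(reach)), key=lambda i: reach[i], reverse=True)


def is_valid_subcluster(size, parent_size, min_pts):
    """A candidate is kept if it has at least MinPts elements and
    at least MinPts elements fewer than its parent cluster."""
    return size >= min_pts and parent_size - size >= min_pts
```

For example, with MinPts = 3 a candidate of 10 elements inside a parent of 20 passes, while candidates of 2 or 18 elements do not.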

First Experimental Results
Methods compared: Drop-Down-Clustering, Tree-Clustering, ξ-Clustering
Observations shown on the slide:
- many clusters and subclusters are recognized
- some clusters are recognized
- no clusters are recognized
- detection of narrowing clusters

Outline of the Talk: Introduction, OPTICS, Cluster Recognition, Cluster Representatives (current section), BOSS, Conclusion

Cluster Representatives: Medoid Approach
Algorithms for detecting cluster representatives:
- Medoid approach
[Figure: example point set (A to V) with MinPts = 3]
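A medoid is the cluster member with the smallest total distance to all other members. A minimal sketch, assuming Euclidean distance and points given as coordinate tuples (the function name `medoid` is illustrative):

```python
import math


def medoid(points):
    """Return the point with the smallest summed distance to all other points."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))
```

This brute-force version compares every pair of points, so it is quadratic in the cluster size.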

Cluster Representatives: Core-Distance Approach
Algorithms for detecting cluster representatives:
- Medoid approach
- Core-distance approach (based on an OPTICS run)
[Figure: the example point set with MinPts = 3 and with MinPts = 5]
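The slides do not state the selection rule for the core-distance approach. One plausible reading, used in the sketch below as an explicit assumption, is to pick the cluster member with the smallest core-distance, i.e. the one in the densest neighbourhood. The sketch also recomputes core-distances inside the cluster with Euclidean distance instead of reusing the values from an OPTICS run; both function names are hypothetical.

```python
import math


def core_distance(p, points, min_pts):
    """Distance from p to its MinPts-th nearest point, counting p itself.
    Assumes len(points) >= min_pts."""
    return sorted(math.dist(p, q) for q in points)[min_pts - 1]


def core_distance_representative(cluster, min_pts):
    """Assumed rule: the member with the smallest core-distance represents the cluster."""
    return min(cluster, key=lambda p: core_distance(p, cluster, min_pts))
```

Because the core-distance depends on MinPts, the chosen representative can change with MinPts, which is consistent with the slides showing the example for both MinPts = 3 and MinPts = 5.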

Cluster Representatives: Maximizing Successors
Algorithms for detecting cluster representatives:
- Medoid approach
- Core-distance approach (based on an OPTICS run)
- Maximizing successors (based on an OPTICS run)
[Figure: example with MinPts = 3; OPTICS run with cluster ordering EGBIKLPRNJM and its reachability plot, with a narrowing cluster marked]

Cluster Representatives: First Experimental Results

Outline of the Talk: Introduction, OPTICS, Cluster Recognition, Cluster Representatives, BOSS (current section), Conclusion

BOSS (Browsing OPTICS-Plots for Similarity Search)
- Interactive data browsing tool based on reachability plots
- User-friendly method to support the time-consuming task of finding similar parts:
  - Revealing the hierarchical clustering structure of the dataset at a glance
  - Displaying suitable representatives for large clusters

BOSS: Architecture

BOSS: Screenshot

Outline of the Talk: Introduction, OPTICS, Cluster Recognition, Cluster Representatives, BOSS, Conclusion (current section)

Conclusions
Contribution:
- New algorithm for cluster recognition
- New algorithms for finding suitable cluster representatives
- BOSS: a new data analysis tool
Future work:
- Detailed evaluation of the new algorithms

Thank you for your attention. Any questions?

OPTICS: Application Ranges
OPTICS yields an intermediate result which serves as a multi-purpose basis for further analysis:
- Similarity search
- Visual data mining
- Evaluation of similarity models
[Diagram labels: data, OPTICS, visualisation of the intermediate result, other algorithms, knowledge]
k-nn query: