Efficient Density-Based Clustering of Complex Objects. Stefan Brecheisen, Hans-Peter Kriegel, Martin Pfeifle, University of Munich, Institute for Computer Science.

Presentation transcript:

Efficient Density-Based Clustering of Complex Objects
Stefan Brecheisen, Hans-Peter Kriegel, Martin Pfeifle
University of Munich, Institute for Computer Science
Brighton, UK, November 01-04, 2004

Martin Pfeifle, University of Munich, ICDM 2004, Brighton, UK
Outline
– Density-Based Clustering
  Core Object · Density-Reachability · DBSCAN · OPTICS
– Clustering of Complex Objects
– Experimental Evaluation

Data Mining
– Larger and larger amounts of data collected automatically
– Too large for humans to analyze manually
– Tools to assist analysis necessary → KDD / Data Mining
[Figures: Hubble Space Telescope, telecommunication data, market-basket data]

Clustering
– Efficiently grouping the database into sub-groups (clusters) such that similarity within clusters is maximized and similarity between clusters is minimized
– Flat clustering: one level of clusters, e.g. density-based clustering algorithm DBSCAN [KDD 96]
– Hierarchical clustering: nested clusters, e.g. density-based clustering algorithm OPTICS [SIGMOD 99]

Density-Based Clustering I
Parameters
– range ε and minimal weight MinPts
Definition: core object
– q is a core object if |rangeQuery(q, ε)| ≥ MinPts
Definition: directly density-reachable
– p is directly density-reachable from q if q is a core object and p ∈ rangeQuery(q, ε)
Definition: density-reachable
– density-reachable: transitive closure of "directly density-reachable"
[Figure: example points q, p, o, r with MinPts = 5]
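
The three definitions above can be written down almost literally. The following is a minimal Python sketch, not the authors' implementation. It assumes that objects are coordinate tuples, that the distance is Euclidean, and that a range query is answered by a linear scan; all function names are illustrative.

```python
from math import dist

def range_query(db, q, eps):
    """All objects of db within distance eps of q (q itself included)."""
    return [p for p in db if dist(p, q) <= eps]

def is_core(db, q, eps, min_pts):
    """q is a core object if its eps-range holds at least min_pts objects."""
    return len(range_query(db, q, eps)) >= min_pts

def directly_density_reachable(db, p, q, eps, min_pts):
    """p is directly density-reachable from q (p is assumed to be in db)."""
    return is_core(db, q, eps, min_pts) and dist(p, q) <= eps

def density_reachable(db, p, q, eps, min_pts):
    """Transitive closure: follow chains of core objects starting at q."""
    seen, frontier = {q}, [q]
    while frontier:
        cur = frontier.pop()
        if not is_core(db, cur, eps, min_pts):
            continue  # only core objects can extend the chain
        for nb in range_query(db, cur, eps):
            if nb == p:
                return True
            if nb not in seen:
                seen.add(nb)
                frontier.append(nb)
    return False

# Hypothetical toy data: four points on a line plus one far-away point.
db = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
print(density_reachable(db, (3, 0), (1, 0), eps=1.1, min_pts=3))
print(density_reachable(db, (1, 0), (3, 0), eps=1.1, min_pts=3))
```

The two calls at the end illustrate that density-reachability is not symmetric: a border object can be reached from a core object, but no chain can start at an object that is not itself a core object.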

Density-Based Clustering II
Core idea of hierarchical cluster ordering: order the objects linearly such that objects of a cluster are adjacent in the ordering.
Definition: core-distance
Definition: reachability-distance
[Figure: core-distance(o) of an object o and reachability-distance(p, o) of objects p, for range ε and MinPts = 5]
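
The slide gives the two definitions only as figure labels. The sketch below follows the usual OPTICS formulation (the [SIGMOD 99] paper cited earlier): the core-distance of o is the distance to its MinPts-th nearest neighbor within ε, and the reachability-distance of p with respect to o is the larger of that core-distance and the distance between o and p. As assumptions of this sketch, objects are coordinate tuples, the distance is Euclidean, o counts as its own neighbor, and an undefined distance is represented as infinity.

```python
from math import dist, inf

def core_distance(db, o, eps, min_pts):
    """Distance from o to its min_pts-th nearest neighbor, or inf if o is not core."""
    in_range = sorted(dist(o, p) for p in db if dist(o, p) <= eps)
    return in_range[min_pts - 1] if len(in_range) >= min_pts else inf

def reachability_distance(db, p, o, eps, min_pts):
    """Reachability-distance of p with respect to o (inf if o is not core)."""
    return max(core_distance(db, o, eps, min_pts), dist(o, p))

# Hypothetical toy data on a line.
db = [(0, 0), (1, 0), (3, 0), (6, 0)]
print(core_distance(db, (0, 0), eps=4, min_pts=3))
print(reachability_distance(db, (1, 0), (0, 0), eps=4, min_pts=3))
```

Objects closer to o than its core-distance all receive the same reachability-distance, which smooths the reachability plot inside dense regions.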

OPTICS Algorithm
Example database (2-dimensional, 16 points: A, B, C, D, E, F, G, H, I, J, K, L, M, N, P, R), ε = 44, MinPts = 3
[Figure: data points and reachability plot (axis "reach", mark at 44), built up step by step]
Seedlist after each processed object:
– start: (empty)
– A: (B, 40) (I, 40)
– B: (I, 40) (C, 40)
– I: (J, 20) (K, 20) (L, 31) (C, 40) (M, 40) (R, 43)
– J: (L, 19) (K, 20) (R, 21) (M, 30) (P, 31) (C, 40)
– L: (M, 18) (K, 18) (R, 20) (P, 21) (N, 35) (C, 40)
– …
– end: (empty)
Resulting cluster ordering: A B I J L M K N R P C D F G E H
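
The walk-through above keeps a seedlist of (object, reachability) pairs ordered by reachability and always processes the smallest entry next. A compact Python sketch of that loop follows. It is a simplification under stated assumptions: objects are coordinate tuples compared with the Euclidean distance, neighborhoods are found by a linear scan, and the seedlist is a binary heap with lazy deletion instead of a structure with a decrease-key operation. The coordinates of the slide's 16 points are not given, so the toy data is hypothetical.

```python
import heapq
from math import dist, inf

def optics(db, eps, min_pts):
    """Return the cluster ordering (as indices into db) and the reachability values."""
    n = len(db)
    reach = [inf] * n
    processed = [False] * n
    order = []
    for start in range(n):
        if processed[start]:
            continue
        seeds = [(inf, start)]  # seedlist: (reachability, object index)
        while seeds:
            _, i = heapq.heappop(seeds)
            if processed[i]:
                continue  # stale entry left behind by an earlier update
            processed[i] = True
            order.append(i)
            nbrs = [j for j in range(n) if dist(db[i], db[j]) <= eps]
            if len(nbrs) < min_pts:
                continue  # i is not a core object, nothing to expand
            core = sorted(dist(db[i], db[j]) for j in nbrs)[min_pts - 1]
            for j in nbrs:
                if processed[j]:
                    continue
                new_reach = max(core, dist(db[i], db[j]))
                if new_reach < reach[j]:
                    reach[j] = new_reach  # better predecessor found
                    heapq.heappush(seeds, (new_reach, j))
    return order, reach

# Hypothetical toy data: two well-separated groups on a line.
db = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
order, reach = optics(db, eps=3, min_pts=2)
print(order, reach)
```

Objects of one cluster end up adjacent in `order`, and each new cluster starts with an object whose reachability is still infinite, which is what produces the "valleys" in a reachability plot.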

Outline
– Foundations of Density-Based Clustering
  Core Object · Density-Reachability · DBSCAN · OPTICS
– Clustering of Complex Objects
  Direct Integration of the Multi-Step Query Processing Paradigm
– Experimental Evaluation

Complex Objects
– complex objects
– complex models
– complex distance measure

Single-Step Clustering Approach
[Figure: density-based clustering algorithms, like DBSCAN and OPTICS, send a query Q(q, ε) to the exact information (1) and receive the result R(q, ε) (2)]
Performance problems
– For each database object q, we perform one range query.
– Expensive exact distance computation d_o(o, q) for each object o of the database, independent of the ε range.

Multi-Step Query Processing
Multi-step similarity search
– Range queries (Faloutsos et al. 94)
– k-nearest neighbor queries (Korn et al. 96)
– Optimal k-nearest neighbor queries (Seidl, Kriegel 98)
[Figure: filter step (index-based) → candidates → refinement step (exact evaluation) → results]
No false drops? Lower-bounding property: filter distance ≤ object distance

Traditional Multi-Step Clustering Approach
[Figure: density-based clustering algorithms, like DBSCAN and OPTICS, send a query (q, ε) to a range query processor (e.g. Faloutsos et al. 94) (1); the query Q(q, ε) using d_f runs on the filter information and yields the candidates C(q, ε) (2, 3); the refinement step computes d_o(o, q) on the exact information for all o ∈ C(q, ε) (4); the result (q, ε) is returned (5)]
Performance problems
– For each database object q, we perform one range query (1).
– The range query is first performed on the filter information (2, 3).
– One expensive exact distance computation d_o(o, q) is performed for each object o of the candidate set C(q, ε) (4).
– This refinement step is very expensive for non-selective filters or high ε values.
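
The filter and refinement steps of the traditional approach can be sketched as follows. This is an illustration under assumptions, not the range query processor referenced on the slide: objects are coordinate tuples, the exact distance d_o is the full Euclidean distance, and the filter distance d_f is the Euclidean distance on the first k coordinates, which can never exceed the full distance and therefore satisfies the lower-bounding property. The filter step here is a linear scan, whereas the slides assume an index.

```python
from math import dist

def d_o(a, b):
    """Exact (object) distance: Euclidean distance on all coordinates."""
    return dist(a, b)

def d_f(a, b, k=2):
    """Filter distance: Euclidean distance on the first k coordinates only."""
    return dist(a[:k], b[:k])

def multi_step_range_query(db, q, eps):
    """Filter with d_f, then refine every candidate with d_o."""
    candidates = [o for o in db if d_f(o, q) <= eps]      # filter step
    results = [o for o in candidates if d_o(o, q) <= eps]  # refinement step
    return results, len(candidates)

# Hypothetical 4-dimensional toy data.
db = [(1, 0, 0, 0), (1, 1, 1, 1), (0, 1, 3, 0), (5, 0, 0, 0), (0, 0, 0, 1.5)]
q = (0, 0, 0, 0)
results, n_exact = multi_step_range_query(db, q, eps=2.5)
print(results, n_exact)
```

Because d_f never overestimates d_o, no true result is dropped by the filter. The second return value is the number of exact distance computations, one per candidate, which is the cost the slide identifies as the problem for non-selective filters or high ε values.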

Integrated Multi-Step Clustering Approach
Direct integration of the multi-step query processing paradigm into the clustering algorithm, postponing expensive exact distance computations as long as possible
[Figure: extended density-based clustering algorithms, like DBSCAN and OPTICS, run the query Q(q, ε) using d_f on the filter information (1) and obtain the candidates C(q, ε) (2); d_o(o, q) is computed on the exact information for the core-properties of q (3); the computations of d_o(o, q) for the reachability-properties of o are postponed (4)]
Proposed solution
– For each database object q, we perform one range query on the filter information (1, 2).
– Only those exact distances d_o(o, q) are computed which are necessary to determine the core-properties of q (3).
– A beneficial heuristic for determining the reachability-properties is applied which saves on exact distance computations (4).

Integrated Multi-Step Clustering Approach: Determination of Core-Properties
– First, we carry out a range query on the filter for each query object Q.
– Second, we order the resulting candidate set in ascending order according to the filter distance.
– Third, we walk through the candidate set and perform exact distance calculations until we can be sure that we have found the MinPts nearest neighbors.
Example (MinPts = 3, ε = 75)
– Sorted distance list: d_f(K,Q) = 10, d_f(Z,Q) = 12, d_f(R,Q) = 18, d_f(M,Q) = 55, d_f(A,Q) = 58, d_f(I,Q) = 65
– Exact distances: d_o(K,Q) = 53, d_o(Z,Q) = 69, d_o(R,Q) = 49 (the slide also carries a label d_o(R,Q) = 53)
– Core-distance of Q = 53
[Figure: query object Q with range ε and the points R, Z, K, M, A, I]
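
The three steps can be sketched in Python as follows. This is one reading of the slide, not the authors' code. It relies on the lower-bounding property d_f ≤ d_o, and it assumes that the query object counts toward MinPts (it is added to the candidates with distance 0), which is what makes the slide's numbers yield a core-distance of 53 when d_o(R,Q) = 49 is used. The function name and the dictionary lookup standing in for the exact distance function are illustrative.

```python
from bisect import insort

def core_distance_multistep(candidates, exact_dist, eps, min_pts):
    """candidates: (object, filter distance) pairs from the filter range query.

    Returns (core-distance or None, number of exact distance computations).
    """
    ordered = sorted(candidates, key=lambda c: c[1])  # ascending filter distance
    exact = []  # exact distances computed so far, kept sorted
    for obj, filter_dist in ordered:
        # Stop once the min_pts-th smallest exact distance is no larger than
        # the lower bound of this and therefore of every remaining candidate.
        if len(exact) >= min_pts and exact[min_pts - 1] <= filter_dist:
            break
        insort(exact, exact_dist(obj))
    if len(exact) < min_pts or exact[min_pts - 1] > eps:
        return None, len(exact)  # the query object is not a core object
    return exact[min_pts - 1], len(exact)

# Numbers from the slide; Q itself is added with distance 0 (assumption).
candidates = [("Q", 0), ("K", 10), ("Z", 12), ("R", 18),
              ("M", 55), ("A", 58), ("I", 65)]
known_exact = {"Q": 0, "K": 53, "Z": 69, "R": 49}
print(core_distance_multistep(candidates, known_exact.__getitem__,
                              eps=75, min_pts=3))
```

In this run the exact distances of M, A and I are never requested: after R has been refined, the third-smallest exact distance (53) is already below the filter distance of M (55), so no remaining candidate can be closer.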

Integrated Multi-Step Clustering Approach: Data Structure "List of Lists"
Additional information about possible predecessor objects is stored in order to postpone exact distance calculations as long as possible.
Extended seedlist
– the first elements of the lists are in ascending order
– each list of predecessor objects is in ascending order
Extended seedlist before the insertion
– R: d_f(R,B) = 18, d_f(R,D) = 34
– K: d_f(K,B) = 20, d_f(K,L) = 30, d_f(K,G) = 43, d_f(K,C) = 55
– M: d_o(M,C) = 65
Result list of the current query object Q, which has to be inserted into the extended seedlist
– d_o(K,Q) = 53, d_o(R,Q) = 53, d_o(Z,Q) = 69, d_f(M,Q) = 55, d_f(A,Q) = 58, d_f(I,Q) = 65
[Animation: the entries of the result list are inserted into the extended seedlist one by one]
Determination of next query object
– exact distances computed in this step: d_o(R,B) = 44, d_o(K,B) = 25
[Animation: the refined entries are re-inserted into the lists of R and K]
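
One way to read the "list of lists" is sketched below in Python. It is a hypothetical reconstruction, not the authors' data structure: every object in the seedlist owns a sorted list of (distance, kind, predecessor) entries, and the next query object is found by repeatedly refining the globally smallest entry until that entry is an exact distance. The names `EXACT`, `FILTER`, `insert` and `next_query_object` are illustrative, the minimum is found by a linear scan, and the pruning of entries that the animation suggests is left out. The data is taken from the slides; the exact distances d_o(R,B) = 44 and d_o(K,B) = 25 are the only ones this run needs.

```python
from bisect import insort

EXACT, FILTER = 0, 1  # exact entries sort before filter entries on ties

def insert(seedlist, obj, distance, kind, pred):
    """Add one (distance, kind, predecessor) entry to the sorted list of obj."""
    insort(seedlist.setdefault(obj, []), (distance, kind, pred))

def next_query_object(seedlist, exact_dist):
    """Refine the smallest entry until it is exact, then remove and return it."""
    while seedlist:
        obj = min(seedlist, key=lambda o: seedlist[o][0])
        distance, kind, pred = seedlist[obj][0]
        if kind == EXACT:
            del seedlist[obj]
            return obj, distance, pred
        # Smallest entry is only a lower bound: replace it by the exact value.
        seedlist[obj].pop(0)
        insert(seedlist, obj, exact_dist(obj, pred), EXACT, pred)
    return None

seedlist = {}
for obj, d, kind, pred in [
    ("R", 18, FILTER, "B"), ("R", 34, FILTER, "D"),
    ("K", 20, FILTER, "B"), ("K", 30, FILTER, "L"),
    ("K", 43, FILTER, "G"), ("K", 55, FILTER, "C"),
    ("M", 65, EXACT, "C"),
    # result list of the current query object Q
    ("K", 53, EXACT, "Q"), ("R", 53, EXACT, "Q"), ("Z", 69, EXACT, "Q"),
    ("M", 55, FILTER, "Q"), ("A", 58, FILTER, "Q"), ("I", 65, FILTER, "Q"),
]:
    insert(seedlist, obj, d, kind, pred)

known_exact = {("R", "B"): 44, ("K", "B"): 25}
print(next_query_object(seedlist, lambda o, p: known_exact[(o, p)]))
```

Under this sketch the run refines d_f(R,B) = 18 to 44 and d_f(K,B) = 20 to 25, after which the smallest entry is exact, so K would be processed next with predecessor B. Correctness again rests on the lower-bounding property: every remaining filter entry can only grow when it is refined, so none of them can undercut the exact value at the front.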


Experimental Evaluation: Test Data Sets
– High-dimensional feature vectors representing CAD objects [DASFAA 03]
  – not very selective filter used (Euclidean norm)
– Graphs representing images [DAWAK 03]
  – expensive exact distance function
  – selective filter used

Experimental Evaluation: DBSCAN
[Figures: runtime [sec.] vs. no. of objects, one panel for feature vectors and one for graphs]
– Even non-selective filters (feature vectors) help to accelerate DBSCAN by up to an order of magnitude when the new integrated multi-step query processing approach is used.
– The traditional multi-step query processing approach does not benefit from non-selective filters (feature vectors), as the cardinality of the candidate set is still high even when small ε values are used.
– When filters of high selectivity (graphs) are used, our new integrated multi-step query processing approach leads to a speed-up of two orders of magnitude compared to a full table scan.

Experimental Evaluation: OPTICS
[Figures: runtime [sec.] vs. no. of objects, one panel for feature vectors and one for graphs]
– When using filters of high selectivity (graphs), our new integrated multi-step query processing approach outperforms the traditional multi-step query processing approach and the full table scan by a factor of up to 30.
– For high ε values, as used with OPTICS, the full table scan performs even better than the traditional multi-step query processing approach.

Conclusions
Summary: "Efficient Density-Based Clustering of Complex Objects"
– direct integration of the multi-step query processing paradigm into the clustering algorithm
– MinPts-nearest neighbor queries on the exact information
– postponing expensive exact distance computations as long as possible
Future work
– integration of the multi-step query processing paradigm into other data mining algorithms