Published by Gillian Wilkins. Modified over 9 years ago.
1
GRAPH-BASED HIERARCHICAL CONCEPTUAL CLUSTERING
by Istvan Jonyer, Lawrence B. Holder and Diane J. Cook
The University of Texas at Arlington
2
Outline
  – What is hierarchical conceptual clustering?
  – Overview of Subdue
  – Conceptual clustering in Subdue
  – Evaluation of hierarchical clusterings
  – Experiments and results
  – Conclusions
3
What is clustering?
4
What is hierarchical conceptual clustering?
Unsupervised concept learning
Generating hierarchies to explain data
Applications
  – Hypothesis generation and testing
  – Prediction based on groups
  – Finding taxonomies
5
Example hierarchical conceptual clustering
Animals
  – HeartChamber: four, BodyTemp: regulated, Fertilization: internal
      – Name: mammal, BodyCover: hair
      – Name: bird, BodyCover: feathers
  – BodyTemp: unregulated
      – Name: reptile, BodyCover: cornified-skin, HeartChamber: imperfect-four, Fertilization: internal
      – Fertilization: external
          – Name: fish, BodyCover: scales, HeartChamber: two
          – Name: amphibian, BodyCover: moist-skin, HeartChamber: three
6
The Problem
Hierarchical conceptual clustering in discrete-valued structural databases
Existing systems:
  – Continuous-valued
  – Discrete but unstructured
  – We can do better! (Field underexplored)
7
Related Work
  – Cobweb
  – Labyrinth
  – AutoClass
  – Snob
  – In Euclidean space: Chameleon, Cure
Unsupervised learning algorithms
8
The Solution Take Subdue and extend it!
9
Overview of Subdue
Data mining in graph representations of structural databases
[Figure: example input graph with vertices labeled A to F and edges labeled a to g; the vertex group A, B, C, D appears twice]
10
Overview of Subdue
Iteratively searching for best substructure by MDL heuristic
[Figure: best substructure, with vertices A, B, C, D and edges a, b, c]
11
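Overview of Subdue
Compress using best substructure
[Figure: compressed graph in which each instance is replaced by a vertex S; remaining vertices E, F and edges d, e, f, g]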
12
Overview of Subdue
Fuzzy match
  – Inexact matching of subgraphs
  – Applications:
      Defining fuzzy concepts
      Evaluation of clusterings
13
Conceptual Clustering with Subdue
Use Subdue to identify clusters
  – The best subgraph in an iteration defines a cluster
When to stop within an iteration?
  1) Use –limit option
  2) Use –size option
  3) Use first minimum heuristic (new)
14
The First Minimum Heuristic
Use subgraph at first local minimum
  – Detect it using –prune2 option
15
The First Minimum Heuristic
Not a greedy heuristic!
  – Although first local minimum is usually the global minimum
  – First local minimum is caused by a smaller, more frequently occurring subgraph
  – Subsequent minima are caused by bigger, less frequently occurring subgraphs
=> First subgraph is more general
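Note: a minimal sketch of picking the first local minimum rather than the global one. It assumes the search records one evaluation value per successively larger subgraph; the function name and the sample values are hypothetical, and this does not reproduce how Subdue's own option detects the minimum.

```python
def first_local_minimum(values):
    """Return the index of the first local minimum in a sequence.

    Walks left to right and stops at the first value that is followed
    by a strictly larger one. Falls back to the last index if the
    sequence never increases.
    """
    if not values:
        raise ValueError("values must be non-empty")
    for i in range(len(values) - 1):
        if values[i + 1] > values[i]:
            return i
    return len(values) - 1


# Hypothetical evaluation values for successively larger subgraphs.
scores = [0.9, 0.7, 0.6, 0.8, 0.5, 0.65]
print(first_local_minimum(scores))  # 2
print(scores.index(min(scores)))    # 4
```

In this made-up sequence the first local minimum (index 2) is not the global minimum (index 4), which is the multi-minimum case the slide describes.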
16
The First Minimum Heuristic A multi-minimum search space:
17
Lattice vs. Tree
Previous work defined classification trees
  – Inadequate in structured domains
Better hierarchical description: classification lattice
  – A cluster can have more than one parent
  – A parent can be at any level (not only one level above)
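Note: the difference from a tree can be sketched as a data structure in which each cluster keeps a set of parents. The class, its method names, and the cluster names S1 to S3 are hypothetical and only illustrate the two lattice properties listed on this slide.

```python
class ClassificationLattice:
    """Clusters with any number of parents, each of which may sit at any level."""

    def __init__(self, root="Root"):
        self.root = root
        self.parents = {root: set()}  # cluster name -> set of parent names

    def add_cluster(self, name, parents):
        """Register a cluster under one or more already known parents."""
        missing = [p for p in parents if p not in self.parents]
        if missing:
            raise KeyError(f"unknown parent cluster(s): {missing}")
        self.parents[name] = set(parents)

    def children(self, name):
        """Return every cluster that lists `name` as a parent."""
        return {c for c, ps in self.parents.items() if name in ps}

    def level(self, name):
        """Length of the longest path from the root to the cluster."""
        ps = self.parents[name]
        return 0 if not ps else 1 + max(self.level(p) for p in ps)


lattice = ClassificationLattice()
lattice.add_cluster("S1", ["Root"])
lattice.add_cluster("S2", ["S1"])
lattice.add_cluster("S3", ["S1", "S2"])  # two parents, on different levels
print(sorted(lattice.parents["S3"]))     # ['S1', 'S2']
print(lattice.level("S3"))               # 3
```

A tree would force S3 to choose a single parent exactly one level above it; here S1 is a parent of S3 even though it sits two levels higher.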
18
Hierarchical Clustering in Subdue
Subdue can compress by a subgraph after each iteration
Subsequent clusters may be defined in terms of previously defined clusters
This results in a hierarchy
19
Hierarchical Conceptual Clustering of an Artificial Domain
20
Root
21
Evaluation of Clusterings
Traditional evaluation:
  – Not applicable to hierarchical domains
No known evaluation for hierarchical clusterings
  – Most hierarchical evaluations are anecdotal
22
New Evaluation Heuristic for Hierarchical Clusterings
Properties of a good clustering:
  – Small number of clusters
      Large coverage => good generality
  – Big cluster descriptions
      More features => more inferential power
  – Minimal or no overlap between clusters
      More distinct clusters => better defined concepts
23
New Evaluation Heuristic for Hierarchical Clusterings
Big clusters: bigger distance between disjoint clusters
Overlap: less overlap => bigger distance
Few clusters: averaging comparisons
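Note: the slide's formula is not preserved in this transcript, so the following is only a sketch of the three stated ideas, not the authors' heuristic. It assumes each instance is a set of features, uses the size of the symmetric difference as a stand-in distance, and averages over all pairs of clusters; every name here is hypothetical.

```python
from itertools import combinations


def instance_distance(a: frozenset, b: frozenset) -> int:
    """Stand-in distance: number of features present in exactly one instance."""
    return len(a ^ b)


def pair_distance(c1, c2):
    """Average distance over all instance pairs drawn from two clusters."""
    total = sum(instance_distance(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def clustering_quality(clusters):
    """Average inter-cluster distance over all cluster pairs (higher is better)."""
    pairs = list(combinations(clusters, 2))
    if not pairs:
        return 0.0
    return sum(pair_distance(a, b) for a, b in pairs) / len(pairs)


# Two disjoint clusters versus two clusters that share a feature.
disjoint = [[frozenset({"a", "b"})], [frozenset({"c", "d"})]]
overlapping = [[frozenset({"a", "b"})], [frozenset({"b", "c"})]]
print(clustering_quality(disjoint))     # 4.0
print(clustering_quality(overlapping))  # 2.0
```

Under these assumptions, larger descriptions and less overlap both raise the distance between clusters, and averaging over pairs keeps the score from growing simply because there are more clusters to compare.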
24
Experiments and Results
  – Validation in an artificial domain
  – Validation in unstructured domains
  – Comparison to existing systems
  – Real world applications
25
The Animal Domain

Name       Body Cover      Heart Chamber   Body Temp.   Fertilization
mammal     hair            four            regulated    internal
bird       feathers        four            regulated    internal
reptile    cornified-skin  imperfect-four  unregulated  internal
amphibian  moist-skin      three           unregulated  external
fish       scales          two             unregulated  external

[Figure: graph representation of the mammal row: an "animal" vertex with edges Name to mammal, BodyCover to hair, HeartChamber to four, BodyTemp to regulated, Fertilization to internal]
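Note: a minimal sketch of how the rows of the animal table could be turned into a labeled graph of the kind the mammal figure shows, with one "animal" vertex per row and one labeled edge per attribute. The vertex/edge list layout and the function name are assumptions for illustration, not Subdue's input file format.

```python
ANIMALS = [
    {"Name": "mammal", "BodyCover": "hair", "HeartChamber": "four",
     "BodyTemp": "regulated", "Fertilization": "internal"},
    {"Name": "bird", "BodyCover": "feathers", "HeartChamber": "four",
     "BodyTemp": "regulated", "Fertilization": "internal"},
    {"Name": "reptile", "BodyCover": "cornified-skin", "HeartChamber": "imperfect-four",
     "BodyTemp": "unregulated", "Fertilization": "internal"},
    {"Name": "amphibian", "BodyCover": "moist-skin", "HeartChamber": "three",
     "BodyTemp": "unregulated", "Fertilization": "external"},
    {"Name": "fish", "BodyCover": "scales", "HeartChamber": "two",
     "BodyTemp": "unregulated", "Fertilization": "external"},
]


def to_graph(rows):
    """Build (vertices, edges) with one star-shaped subgraph per row.

    vertices: list of labels, where the list index is the vertex id.
    edges: list of (source_id, target_id, edge_label) tuples.
    """
    vertices, edges = [], []
    for row in rows:
        hub = len(vertices)
        vertices.append("animal")  # central vertex for this row
        for attribute, value in row.items():
            vertices.append(value)  # one value vertex per attribute
            edges.append((hub, len(vertices) - 1, attribute))
    return vertices, edges


vertices, edges = to_graph(ANIMALS)
print(len(vertices), len(edges))  # 30 25
print(edges[0])                   # (0, 1, 'Name')
```

In this layout, attribute values shared across rows (for example BodyTemp: regulated) appear as repeated edge-and-vertex patterns, which is the kind of recurring substructure a graph-based clusterer can pick up.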
26
Hierarchical Clustering of the Animal Domain
Animals
  – HeartChamber: four, BodyTemp: regulated, Fertilization: internal
      – Name: mammal, BodyCover: hair
      – Name: bird, BodyCover: feathers
  – BodyTemp: unregulated
      – Name: reptile, BodyCover: cornified-skin, HeartChamber: imperfect-four, Fertilization: internal
      – Fertilization: external
          – Name: fish, BodyCover: scales, HeartChamber: two
          – Name: amphibian, BodyCover: moist-skin, HeartChamber: three
27
Hierarchical Clustering of the Animal Domain by Cobweb
animals
  – amphibian/fish
      – fish
      – amphibian
  – mammal/bird
      – mammal
      – bird
  – reptile
28
Comparison of Subdue and Cobweb
Quality of Subdue’s lattice (tree): 2.60
Quality of Cobweb’s tree: 1.74
Subdue’s clustering therefore scores higher under the new evaluation heuristic
Reasons for the higher score:
  – Better generalization, resulting in fewer clusters
  – Eliminating the overlap between (reptile) and (amphibian/fish)
29
Chemical Application: Clustering of a DNA sequence
30
Coverage
  – 61%
  – 68%
  – 71%
[Figure: cluster hierarchy rooted at DNA, showing the discovered molecular substructures (fragments built from C, N, O, P and CH2 groups)]
31
Conclusions
  – Goal of hierarchical conceptual clustering of structured databases was achieved
  – Synthesized classification lattice
  – Developed new evaluation heuristic for hierarchical clusterings
  – Good performance in comparison to other systems, even in unstructured domains
32
Future Work
  – More experiments on real-world domains
  – Comparison to other systems
  – Incorporation of evaluation tool into Subdue