1
Graph Mining
Sam Somuah, REU-DIMACS 2010
Mentor: James Abello
2
Outline
Converting data to graphs using similarity measures
Project data:
  REU participant surveys
  DIMACS Workshop abstracts
Challenges:
  Choosing “good” similarity measures
  Visualizing and detecting “interesting” clusters
3
REU-Participant Data
4
Procedure
Convert each record into a smaller representative vector:
  “Aggregate” similar weighted attributes.
  Convert the remaining weighted attributes into 3-digit numbers.
  Leave binary attributes alone.
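The three steps above can be sketched in Python as follows. The slides do not say how similar attributes are grouped or how a weighted value becomes a 3-digit number, so the choices here are assumptions:

- Each group of similar weighted attributes is averaged.
- A weight in [0, 1] is scaled to an integer from 0 to 999.
- The attribute names in the example (`math_interest`, `cs_interest`, `coding`, `biology`) are hypothetical.

```python
from typing import Dict, List, Sequence


def record_to_vector(
    record: Dict[str, float],
    groups: Dict[str, Sequence[str]],
    weighted: Sequence[str],
    binary: Sequence[str],
) -> List[int]:
    """Compress one survey record into a shorter representative vector.

    Hypothetical conventions (not taken from the slides):
      * weighted attributes hold values in [0, 1];
      * a group of similar weighted attributes is aggregated by its mean;
      * a weighted value w becomes the integer round(999 * w), i.e. 0..999;
      * binary attributes (0/1) are copied unchanged.
    """
    vector: List[int] = []
    grouped = set()
    # Step 1: aggregate each group of similar weighted attributes.
    for name in sorted(groups):
        members = groups[name]
        grouped.update(members)
        mean = sum(record[m] for m in members) / len(members)
        vector.append(round(999 * mean))
    # Step 2: encode the remaining (ungrouped) weighted attributes.
    for attr in weighted:
        if attr not in grouped:
            vector.append(round(999 * record[attr]))
    # Step 3: leave binary attributes alone.
    for attr in binary:
        vector.append(int(record[attr]))
    return vector


# Hypothetical record: two interest ratings, one skill rating, one yes/no field.
record = {"math_interest": 0.8, "cs_interest": 0.6, "coding": 0.25, "biology": 1}
vector = record_to_vector(
    record,
    groups={"quantitative": ["math_interest", "cs_interest"]},
    weighted=["math_interest", "cs_interest", "coding"],
    binary=["biology"],
)
print(vector)  # [699, 250, 1]
```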
5
Creating Graphs
Each record becomes one vertex in the graph, and every pair of vertices is joined by a weighted edge.
Edge weights: the calculated similarity between the two records.
Vertex weights: the distance between each record's vector and a reference vector (e.g., the zero vector or …).
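Below is a minimal sketch of this construction. It assumes the records are already numeric vectors. The similarity function is passed in as a parameter because the slides leave the choice of measure open.

The following are illustrative assumptions, not part of the original project:

- the helper names `build_graph` and `euclidean`;
- the `1 / (1 + distance)` similarity used in the example.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def build_graph(
    vectors: List[Vector],
    similarity: Callable[[Vector, Vector], float],
    reference: Optional[Vector] = None,
) -> Tuple[Dict[int, float], Dict[Tuple[int, int], float]]:
    """Turn a list of record vectors into a complete weighted graph.

    Vertex i stands for record i. Each pair (i, j) with i < j gets an edge
    weighted by similarity(vectors[i], vectors[j]). Each vertex is weighted
    by its Euclidean distance to `reference` (the zero vector by default).
    """
    if reference is None:
        reference = [0.0] * len(vectors[0])
    vertex_weights = {i: euclidean(v, reference) for i, v in enumerate(vectors)}
    edge_weights = {
        (i, j): similarity(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }
    return vertex_weights, edge_weights


# Hypothetical example: three 2-D records, similarity = 1 / (1 + distance).
vertices, edges = build_graph(
    [[3.0, 4.0], [0.0, 0.0], [3.0, 0.0]],
    similarity=lambda u, v: 1.0 / (1.0 + euclidean(u, v)),
)
print(vertices)  # {0: 5.0, 1: 0.0, 2: 3.0}
```

A graph on n records has n(n-1)/2 edges, which is why the pruning step on a later slide matters for larger datasets.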
6
Similarity Measures for Edge-Weights
Hellinger distance
Euclidean distance
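The slide names the two measures without their formulas. For reference:

- For vectors x and y, the Euclidean distance is sqrt(Σ (x_i - y_i)^2).
- For discrete probability distributions P and Q, the Hellinger distance is (1/√2)·sqrt(Σ (√p_i - √q_i)^2), which lies in [0, 1].

A minimal Python sketch follows. Two choices in it are assumptions rather than methods documented in the slides: normalizing each vector to sum to 1 before applying Hellinger, and the `to_similarity` mapping from a distance to an edge weight.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a non-negative vector so that its entries sum to 1."""
    total = sum(v)
    if total <= 0:
        raise ValueError("vector must have a positive sum")
    return [x / total for x in v]


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions, in [0, 1].

    H(P, Q) = (1 / sqrt(2)) * sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2).
    Inputs are normalized first (an assumption: raw vectors may not sum to 1).
    """
    p, q = normalize(p), normalize(q)
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance: sqrt(sum_i (x_i - y_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def to_similarity(distance: float) -> float:
    """Hypothetical mapping from a distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + distance)


print(hellinger([1, 0], [0, 1]))  # 1.0 (disjoint distributions)
print(euclidean([0, 0], [3, 4]))  # 5.0
```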
7
Similarity Measures
Jaccard coefficient example:
  Record A: Math:1, Comp.Sci:1, Biology:0
  Record B: Math:1, Comp.Sci:0, Biology:0
  Attributes present in both: Math
  J(A, B) = |A ∩ B| / |A ∪ B| = 1/2
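The Jaccard coefficient treats each binary record as the set of attributes equal to 1. It divides the size of the intersection by the size of the union. This short Python sketch reproduces the example on the slide; returning 1.0 for two all-zero records is an assumed convention.

```python
from typing import Dict


def jaccard(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Jaccard coefficient of two binary records: |A ∩ B| / |A ∪ B|.

    A and B are the sets of attributes whose value is 1 in each record.
    """
    set_a = {k for k, v in a.items() if v}
    set_b = {k for k, v in b.items() if v}
    union = set_a | set_b
    if not union:
        return 1.0  # assumed convention: two all-zero records count as identical
    return len(set_a & set_b) / len(union)


record_a = {"Math": 1, "Comp.Sci": 1, "Biology": 0}
record_b = {"Math": 1, "Comp.Sci": 0, "Biology": 0}
print(jaccard(record_a, record_b))  # 0.5
```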
8
Similarity Measures
Jaccard Coefficient
9
Creating the Graph
Use the GraphView software to visualize the graph.
Significance of colours
10
Pruning Graphs
One method is to use a minimum spanning tree (MST).
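The slides do not say how the MST is computed. One standard choice, sketched here, is Kruskal's algorithm with a union-find structure.

The sketch assumes edge weights are distances, where smaller means more similar. If the edges carry similarities instead, you can keep the most similar pairs either by taking a maximum spanning tree or by converting similarities to distances first. The edge list in the example is made up.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (u, v, distance)


def kruskal_mst(n: int, edges: List[Edge]) -> List[Edge]:
    """Minimum spanning tree (or forest) of an n-vertex graph, via Kruskal.

    Edge weights are assumed to be distances (smaller = more similar).
    """
    parent = list(range(n))

    def find(x: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree: List[Edge] = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two different components: keep it
            parent[ru] = rv
            tree.append((u, v, w))
            if len(tree) == n - 1:  # a spanning tree has n - 1 edges
                break
    return tree


example_edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0), (2, 3, 1.5)]
print(kruskal_mst(4, example_edges))  # [(0, 1, 1.0), (2, 3, 1.5), (1, 2, 2.0)]
```

The pruned graph keeps n - 1 of the original n(n-1)/2 edges while still connecting every record to its closest neighbours.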
11
Clustering
We will derive clusters from the MST.
A cluster C is a set of nodes that are more similar to each other than to the nodes in its complement.
Clusters
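Zahn (1971) derives clusters by deleting "inconsistent" MST edges, meaning edges much heavier than the average weight of nearby tree edges. A simpler criterion, swapped in here for illustration, cuts the k-1 heaviest MST edges, which yields single-linkage clusters.

For a connected graph, running Kruskal's algorithm and stopping after n-k merges leaves exactly those k components. The cluster count `k`, the function name `mst_clusters`, and the example edges are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, float]  # (u, v, distance)


def mst_clusters(n: int, edges: List[Edge], k: int) -> List[Set[int]]:
    """Split n vertices into k clusters by omitting the k-1 heaviest MST edges.

    Kruskal's algorithm adds MST edges in increasing weight order, so stopping
    after n - k successful merges keeps only the n - k lightest tree edges.
    """
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges = 0
    for u, v, _ in sorted(edges, key=lambda e: e[2]):
        if merges == n - k:
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            merges += 1

    # Each remaining connected component is one cluster.
    groups: Dict[int, Set[int]] = defaultdict(set)
    for x in range(n):
        groups[find(x)].add(x)
    return sorted(groups.values(), key=min)


example_edges = [(0, 1, 1.0), (1, 2, 1.2), (3, 4, 0.8), (2, 3, 5.0), (0, 4, 6.0)]
print(mst_clusters(5, example_edges, 2))  # [{0, 1, 2}, {3, 4}]
```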
12
Conclusion
We can transform attributed data (i.e., a collection of records over a set of attributes) into weighted graphs, using a variety of similarity measures among the records. Visualizations of the weighted graphs can then be used to locate similar records and to devise algorithms that automatically create clusters from such datasets. These methods will also be applied to larger datasets, such as the DIMACS Workshop Abstracts and Publications.
13
References
James Abello, Frank van Ham, and Neeraj Krishnan. "ASK-GraphView: A Large Scale Graph Visualization System." IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 5, September/October 2006.
Charles Zahn. "Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters." IEEE Transactions on Computers, vol. C-20, no. 1, January 1971.