Sam Somuah REU-DIMACS 2010 Mentor: James Abello


1 Graph Mining
Sam Somuah, REU-DIMACS 2010. Mentor: James Abello

2 Outline
Converting Data to Graphs using similarity measures
Project Data: REU Participant Surveys; DIMACS Workshop Abstracts
Challenges: choosing “good” similarity measures; visualizing and detecting “interesting clusters”

3 REU-Participant Data

4 Procedure
Convert each record into a smaller representative vector:
“Aggregate” similar weighted attributes.
Convert the remaining weighted attributes into 3-digit numbers.
Leave binary attributes alone.

5 Creating Graphs
Each record becomes one vertex in the graph, and each pair of vertices is joined by a weighted edge.
Edge-weights: the calculated similarity between two records.
Vertex-weights: the distance between each record's vector and a reference vector (e.g. the zero vector or …)
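The construction on this slide can be sketched roughly as follows. This is a minimal illustration, not the code used in the project: the function names, the sample records, and the choice of Euclidean distance for both edge and vertex weights are assumptions made for the example. Note that Euclidean distance measures dissimilarity, so a smaller edge weight here means two records are more alike.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def build_weighted_graph(records, reference=None):
    """Return (vertex_weights, edge_weights) for a dict of name -> vector.

    Assumption: the reference vector defaults to the zero vector.
    """
    dim = len(next(iter(records.values())))
    if reference is None:
        reference = [0.0] * dim
    # One vertex per record, weighted by its distance to the reference vector.
    vertex_weights = {name: euclidean(vec, reference) for name, vec in records.items()}
    # One edge per pair of records, weighted by the distance between them.
    edge_weights = {
        (a, b): euclidean(records[a], records[b])
        for a, b in combinations(records, 2)
    }
    return vertex_weights, edge_weights


# Hypothetical records, already reduced to short numeric vectors.
records = {"r1": [3.0, 4.0], "r2": [0.0, 0.0], "r3": [3.0, 0.0]}
vertex_weights, edge_weights = build_weighted_graph(records)
```

With n records this produces n(n-1)/2 edges, which is one reason the later slides prune the graph before clustering.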

6 Similarity Measures for Edge-Weights
Hellinger Distance Euclidean Distance
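The two measures named on this slide can be written out as a short sketch. The standard definitions are used: the Euclidean distance is the square root of the summed squared differences, and the Hellinger distance between two discrete probability distributions is the Euclidean distance between their element-wise square roots, divided by the square root of 2. The normalization step is an assumption of this example: it presumes the record vectors are non-negative and turns them into distributions first, which may differ from how the project prepared its data.

```python
import math


def euclidean_distance(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def normalize(v):
    """Scale a non-negative vector so its entries sum to 1 (assumed preprocessing)."""
    total = sum(v)
    if total <= 0 or any(x < 0 for x in v):
        raise ValueError("expected a non-negative vector with a positive sum")
    return [x / total for x in v]


def hellinger_distance(p, q):
    """Hellinger distance between two vectors, after normalizing each to sum to 1."""
    p, q = normalize(p), normalize(q)
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)
```

The Hellinger distance always lies between 0 (identical distributions) and 1 (distributions with no overlap), whereas the Euclidean distance is unbounded and depends on the scale of the attributes.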

7 Similarity Measures: Jaccard Coefficient
Example records: (Math:1, Comp.Sci:1, Biology:0) and (Math:1, Comp.Sci:0, Biology:0)
Shared attribute: Math

8 Similarity Measures Jaccard Coefficient
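For binary attributes, the Jaccard coefficient is the number of attributes set in both records divided by the number set in at least one of them. The sketch below applies that definition to the two example records from slide 7; the function and variable names are illustrative, and returning 0.0 when neither record has any attribute set is a convention chosen for this example.

```python
def jaccard_coefficient(a, b):
    """Jaccard coefficient of two binary attribute records (dicts of name -> 0/1)."""
    keys = set(a) | set(b)
    both = sum(1 for k in keys if a.get(k, 0) and b.get(k, 0))
    either = sum(1 for k in keys if a.get(k, 0) or b.get(k, 0))
    # Convention for this sketch: two all-zero records get similarity 0.0.
    return both / either if either else 0.0


record_a = {"Math": 1, "Comp.Sci": 1, "Biology": 0}
record_b = {"Math": 1, "Comp.Sci": 0, "Biology": 0}
similarity = jaccard_coefficient(record_a, record_b)
print(similarity)  # 0.5: one shared attribute (Math) out of two present (Math, Comp.Sci)
```

Attributes that are 0 in both records (Biology here) do not affect the result, which makes the measure suitable for sparse binary data.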

9 Creating the Graph Use the GraphView software to visualize the graph.
Significance of colours

10 Pruning Graphs One method is to use a Minimum Spanning Tree (MST)
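One way to compute such a tree is Kruskal's algorithm, sketched below. The slides do not say which MST algorithm was used, so this choice is an assumption, as are the function name and the sample graph. The sketch treats edge weights as distances; if the weights were similarities instead, one would likely want a maximum spanning tree, which can be obtained by negating the weights.

```python
def minimum_spanning_tree(vertices, edges):
    """Kruskal's algorithm. `edges` is a list of (u, v, weight) tuples."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the component root, compressing the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    # Consider edges from lightest to heaviest; keep those joining two components.
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


# Hypothetical distance-weighted graph on four records.
vertices = ["a", "b", "c", "d"]
edges = [
    ("a", "b", 1.0),
    ("b", "c", 2.0),
    ("a", "c", 2.5),
    ("c", "d", 4.0),
    ("b", "d", 5.0),
]
mst = minimum_spanning_tree(vertices, edges)
```

For a connected graph on n vertices the result has n-1 edges, so the pruned graph is far sparser than the complete graph built from all pairwise comparisons.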

11 Clustering We will derive clusters from the MST.
A cluster C is a set of nodes that are more similar to each other than to the nodes in its complement.
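A simple way to derive clusters from an MST is to delete its heaviest edges and take the connected components that remain. The sketch below uses a fixed distance threshold to decide which edges to cut. That rule is a simplification chosen for illustration: the slides do not state the cutting criterion, and Zahn's method compares each edge with the edges near it rather than with a global cutoff. All names and sample values are hypothetical.

```python
def clusters_from_mst(vertices, mst_edges, threshold):
    """Cut MST edges heavier than `threshold`; return the remaining components."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the component root, compressing the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Merge the endpoints of every edge that survives the cut.
    for u, v, w in mst_edges:
        if w <= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for v in vertices:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical MST over five records, with distances as edge weights.
vertices = ["a", "b", "c", "d", "e"]
mst_edges = [("a", "b", 1.0), ("b", "c", 1.5), ("c", "d", 6.0), ("d", "e", 1.2)]
clusters = clusters_from_mst(vertices, mst_edges, threshold=3.0)
print(clusters)  # [['a', 'b', 'c'], ['d', 'e']]
```

Cutting the single long edge between c and d separates the records into two groups whose internal distances are small compared with the distance between the groups, which matches the informal cluster definition on this slide.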

12 Conclusion We can transform attributed data (i.e., a collection of records over a set of attributes) into weighted graphs, using a variety of similarity measures among the records. Visualizations of the weighted graphs can then be used to locate similar records and to devise algorithms that automatically create clusters from such datasets. These methods will also be used on larger datasets such as the DIMACS Workshop Abstracts and Publications.

13 References
James Abello, Frank van Ham, and Neeraj Krishnan. ASK-GraphView: A Large Scale Graph Visualization System. IEEE Transactions on Visualization and Computer Graphics, Vol. 12, No. 5, September/October 2006.
Charles Zahn. Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters. IEEE Transactions on Computers, Vol. C-20, No. 1, January 1971.

