Sam Somuah REU-DIMACS 2010 Mentor: James Abello

Presentation transcript:

Graph Mining
Sam Somuah, REU-DIMACS 2010. Mentor: James Abello

Outline
Converting data to graphs using similarity measures
Project data: REU participant surveys; DIMACS Workshop abstracts
Challenges: choosing “good” similarity measures; visualizing and detecting “interesting clusters”

REU-Participant Data

Procedure
Convert each record into a smaller representative vector.
“Aggregate” similar weighted attributes.
Convert the remaining weighted attributes into 3-digit numbers.
Leave binary attributes alone.

Creating Graphs
Each record becomes one vertex in the graph, and each pair of vertices is joined by a weighted edge.
Edge-weights: the calculated similarity between the two records.
Vertex-weights: the distance between each record's vector and a reference vector (e.g., the zero vector or …).
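The construction above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the dictionary-based graph representation, and the choice of Euclidean distance for both edge and vertex weights are assumptions. Because a distance is used here, a smaller edge weight means the two records are more similar.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def build_weighted_graph(vectors, reference=None):
    """Build a complete weighted graph from record vectors.

    vectors: dict mapping a record name to its numeric vector (hypothetical layout).
    reference: reference vector for vertex weights; defaults to the zero vector.
    Returns (vertex_weights, edge_weights), where edge_weights is keyed by
    sorted name pairs.
    """
    dim = len(next(iter(vectors.values())))
    if reference is None:
        reference = [0.0] * dim
    # Vertex weight: distance from the record's vector to the reference vector.
    vertex_weights = {name: euclidean(vec, reference) for name, vec in vectors.items()}
    # Edge weight: distance between the two records' vectors.
    edge_weights = {
        (a, b): euclidean(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }
    return vertex_weights, edge_weights


records = {"r1": [3, 4], "r2": [0, 0], "r3": [3, 0]}
vertex_weights, edge_weights = build_weighted_graph(records)
```

With n records this produces n(n-1)/2 edges, which is why the deck goes on to prune the graph.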

Similarity Measures for Edge-Weights
Hellinger Distance
Euclidean Distance
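The two distances named on this slide can be written out as below. This is a sketch using the standard textbook definitions. The Hellinger distance is defined between probability distributions, so normalizing each non-negative vector to sum to 1 is an assumption about how it would be applied to record vectors here.

```python
import math


def normalize(v):
    """Scale a non-negative vector so its entries sum to 1."""
    total = sum(v)
    if total <= 0:
        raise ValueError("vector must have a positive sum")
    return [x / total for x in v]


def hellinger(p, q):
    """Hellinger distance between two non-negative vectors, after normalization.

    H(P, Q) = (1 / sqrt(2)) * sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2), in [0, 1].
    """
    p, q = normalize(p), normalize(q)
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


def euclidean(u, v):
    """Euclidean distance: sqrt(sum_i (u_i - v_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

The Hellinger distance is bounded between 0 and 1, while the Euclidean distance is unbounded, so the two are not directly comparable as edge weights without rescaling.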

Similarity Measures: Jaccard Coefficient
Example records: (Math:1, Comp.Sci:1, Biology:0) and (Math:1, Comp.Sci:0, Biology:0)

Similarity Measures Jaccard Coefficient
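For binary attributes, the Jaccard coefficient is the number of attributes set in both records divided by the number set in at least one. A minimal sketch follows. The function name and the convention of returning 1.0 for two all-zero records are assumptions. The sample vectors use the attribute order (Math, Comp.Sci, Biology).

```python
def jaccard(a, b):
    """Jaccard coefficient of two equal-length binary (0/1) vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)      # size of the intersection
    either = sum(1 for x, y in zip(a, b) if x or y)     # size of the union
    if either == 0:
        return 1.0  # assumption: two empty records count as identical
    return both / either


# Attribute order: Math, Comp.Sci, Biology
student_a = [1, 1, 0]
student_b = [1, 0, 0]
print(jaccard(student_a, student_b))  # 0.5
```

The two records share one attribute (Math) out of two attributes set in either record, giving 1/2.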

Creating the Graph
Use the GraphView software to visualize the graph.
Significance of colours

Pruning Graphs
One method is to use a Minimum Spanning Tree (MST).
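One way to compute an MST is Kruskal's algorithm with a union-find structure, sketched below. The slides do not say which MST algorithm was used, so this choice is an assumption, as are the function name and the (weight, u, v) edge layout. The sketch treats edge weights as distances. If the weights are similarities instead, a maximum spanning tree (sort in descending order) would keep the most similar pairs.

```python
def minimum_spanning_tree(vertices, edges):
    """Kruskal's algorithm.

    vertices: iterable of hashable vertex names.
    edges: list of (weight, u, v) tuples for an undirected graph.
    Returns the list of edges in a minimum spanning tree (or forest,
    if the graph is disconnected).
    """
    parent = {v: v for v in vertices}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges, key=lambda e: e[0]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # edge joins two different components
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


edges = [(1, "a", "b"), (2, "b", "c"), (3, "a", "c"), (4, "c", "d"), (5, "b", "d")]
mst = minimum_spanning_tree(["a", "b", "c", "d"], edges)
```

The tree keeps only n-1 of the original edges while leaving every record reachable from every other.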

Clustering
We will derive clusters from the MST.
A cluster C is a set of nodes that are more similar to each other than to the nodes in its complement.
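A simple way to derive clusters from an MST is to delete its heaviest edges and take the connected components that remain. The sketch below cuts the k-1 heaviest edges to obtain k clusters. This rule is a simplifying assumption, not necessarily the project's method. The Zahn reference uses a more refined criterion based on edges that are inconsistent with their neighbours. Edge weights are again treated as distances.

```python
def clusters_from_mst(vertices, mst_edges, k):
    """Split an MST into k clusters by removing its k-1 heaviest edges.

    vertices: iterable of vertex names.
    mst_edges: list of (weight, u, v) tuples forming a spanning tree.
    k: desired number of clusters (k >= 1).
    Returns a list of sets of vertices.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ordered = sorted(mst_edges, key=lambda e: e[0])
    kept = ordered[: max(len(ordered) - (k - 1), 0)]  # drop the k-1 heaviest

    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the endpoints of every kept edge.
    for _, u, v in kept:
        parent[find(u)] = find(v)

    # Group vertices by the root of their component.
    groups = {}
    for v in vertices:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


mst = [(1, "a", "b"), (1, "b", "c"), (9, "c", "d"), (2, "d", "e")]
clusters = clusters_from_mst(["a", "b", "c", "d", "e"], mst, k=2)
```

In this example the edge of weight 9 is removed, separating {a, b, c} from {d, e}.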

Conclusion
We can transform attributed data (i.e., a collection of records on a set of attributes) into weighted graphs, using a variety of similarity measures among the records. Visualizations of the weighted graphs can then be used to locate similar records and to devise algorithms that automatically create clusters from such datasets. These methods will also be used on larger datasets such as the DIMACS Workshop Abstracts and Publications.

References
James Abello, Frank van Ham, and Neeraj Krishnan. ASK-GraphView: A Large Scale Graph Visualization System. IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 5, September/October 2006.
Charles Zahn. Graph Theoretical Methods for Detecting and Describing Gestalt Clusters. IEEE Transactions on Computers, vol. C-20, no. 1, January 1971.