Mining Graphs.



Frequent Subgraph Mining
- Extend association rule mining to finding frequent subgraphs.
- Useful for computational chemistry, Web mining, spatial data sets, etc.
[Figure: example graph with nodes labeled Homepage, Teaching, Databases, Data Mining]

Example
- In drug discovery, the goal is to identify common parts of molecules that share similar chemical properties.
- Use the two-dimensional atom-bond structure of molecules.
- The database is searched for subgraphs that appear in at least a certain number of molecules.
- A famous example of a frequent molecular fragment is the so-called AZT, a well-known HIV-1 inhibitor (see the figure on the right).

Graph Definitions

Mining Subgraphs

Apriori-Like Approach
- Support: the number of graphs that contain a particular subgraph.
- The Apriori principle still holds.
- Level-wise (Apriori-like) approach:
  - Vertex growing: k is the number of vertices.
  - Edge growing: k is the number of edges.

Apriori-Like Algorithm
- Generate candidates: merge pairs of frequent (k-1)-subgraphs to obtain candidate k-subgraphs.
- Prune candidates: discard all candidate k-subgraphs that contain infrequent (k-1)-subgraphs.
- Count support: count the number of graphs in the DB that contain each candidate.
- Discard all candidate subgraphs whose support counts are less than minsup.
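The generate, prune, and count steps can be sketched in Python as below. This is a minimal illustration under a strong simplifying assumption that is not on the slides: every vertex label is unique within a graph, so a graph is just a set of labeled edges and "contains" reduces to a subset test, with no isomorphism test needed. The function names, the toy database `db`, and the labels H, T, D, M are hypothetical. It uses edge growing (k is the number of edges) and only keeps connected subgraphs.

```python
from itertools import combinations

def is_connected(edges):
    """Return True if the edges form a single connected component."""
    edges = list(edges)
    reached = set(edges[0])            # vertices reached so far
    pending = edges[1:]
    grew = True
    while grew and pending:
        grew = False
        for e in list(pending):
            if reached & set(e):       # edge touches the reached part
                reached |= set(e)
                pending.remove(e)
                grew = True
    return not pending

def support(candidate, db):
    """Number of graphs in db that contain every edge of the candidate."""
    return sum(1 for g in db if candidate <= g)

def survives_pruning(candidate, frequent_prev):
    """Apriori pruning: every connected (k-1)-subgraph must be frequent."""
    if not is_connected(candidate):
        return False
    subs = (candidate - {e} for e in candidate)
    return all(s in frequent_prev for s in subs if is_connected(s))

def frequent_subgraphs(db, minsup):
    """Level-wise search; returns {edge set: support count}."""
    level = {frozenset([e]) for g in db for e in g}
    level = {c for c in level if support(c, db) >= minsup}
    found = {}
    k = 1
    while level:
        found.update((c, support(c, db)) for c in level)
        k += 1
        # Generate: merge pairs of frequent (k-1)-subgraphs sharing k-2 edges.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune, then count support against minsup.
        candidates = {c for c in candidates if survives_pruning(c, level)}
        level = {c for c in candidates if support(c, db) >= minsup}
    return found

# Hypothetical toy database of three graphs over vertices H, T, D, M.
HT, HD, DM = ("H", "T"), ("H", "D"), ("D", "M")
db = [frozenset({HT, HD, DM}), frozenset({HT, HD}), frozenset({HD, DM})]
result = frequent_subgraphs(db, minsup=2)
for edges, count in sorted(result.items(), key=lambda item: len(item[0])):
    print(sorted(edges), count)
```

With general (repeated) labels the subset test must be replaced by a subgraph isomorphism test, which is where the later slides on isomorphism and canonical codes come in.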

Vertex Growing
- The resulting matrix is the first matrix, with the last row and last column of the second matrix appended.
- The remaining entries of the new matrix are either zero or replaced by all valid edge labels connecting the pair of vertices.
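A small Python sketch of this matrix merge follows. It assumes (this is not stated on the slide) that both adjacency matrices list the shared core vertices first and in the same order, that entries are 0 for "no edge" or an edge label, and that vertex labels are ignored. The name `vertex_grow` and the example matrices are hypothetical.

```python
def vertex_grow(m1, m2, edge_labels):
    """Merge two symmetric (k-1)x(k-1) matrices that share a (k-2)-vertex core.

    Returns one k x k candidate matrix per choice of the single undetermined
    entry (between the two non-core vertices): 0 or any valid edge label.
    """
    core = len(m1) - 1
    # The shared core must be identical in both matrices.
    assert [r[:core] for r in m1[:core]] == [r[:core] for r in m2[:core]]
    candidates = []
    for x in [0, *edge_labels]:
        # Core rows: first matrix plus the last column of the second matrix.
        merged = [row[:] + [m2[i][-1]] for i, row in enumerate(m1[:core])]
        # Last row of the first matrix, extended by the undetermined entry.
        merged.append(m1[core][:] + [x])
        # Last row of the second matrix, with the undetermined entry inserted.
        merged.append(m2[core][:core] + [x, 0])
        candidates.append(merged)
    return candidates

# Hypothetical example: a 2-vertex core joined by an edge labeled "p".
m1 = [[0, "p", "q"],
      ["p", 0, 0],
      ["q", 0, 0]]
m2 = [[0, "p", 0],
      ["p", 0, "r"],
      [0, "r", 0]]
grown = vertex_grow(m1, m2, edge_labels=["p", "q", "r"])
for row in grown[0]:
    print(row)
```

The first candidate leaves the two new vertices unconnected; the remaining candidates connect them with each of the given edge labels in turn.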

Edge Growing
- Edge growing inserts a new edge into an existing frequent subgraph during candidate generation.
- It does not necessarily increase the number of vertices in the original graphs.

Topological equivalence
- Two vertices are topologically equivalent if they have:
  - the same label, and
  - the same number and label of edges incident to them.
[Figure: example graphs with the captions "v1, v4 are topologically equivalent", "v2, v3 are topologically equivalent", "No topologically equivalent vertices", and "v1, v2, v3, v4 are topologically equivalent"]
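The definition above can be checked mechanically. The following Python sketch groups vertices by the pair (vertex label, sorted labels of incident edges), which is a direct reading of the two conditions on the slide. The function name and the two example graphs (a path and a square, all vertices labeled "a" and all edges labeled "p") are hypothetical and are not taken from the figure.

```python
from collections import defaultdict

def equivalence_classes(vertex_labels, edges):
    """Group vertices with the same label and the same incident edge labels.

    vertex_labels: dict mapping vertex -> label
    edges: list of (u, v, edge_label) triples for an undirected graph
    """
    incident = defaultdict(list)
    for u, v, label in edges:
        incident[u].append(label)
        incident[v].append(label)
    classes = defaultdict(list)
    for vertex, label in vertex_labels.items():
        signature = (label, tuple(sorted(incident[vertex])))
        classes[signature].append(vertex)
    return sorted(sorted(group) for group in classes.values())

labels = {"v1": "a", "v2": "a", "v3": "a", "v4": "a"}

# Path v1 - v2 - v3 - v4: the end vertices match, and so do the inner ones.
path = [("v1", "v2", "p"), ("v2", "v3", "p"), ("v3", "v4", "p")]
path_classes = equivalence_classes(labels, path)
print(path_classes)    # [['v1', 'v4'], ['v2', 'v3']]

# Square v1 - v2 - v3 - v4 - v1: every vertex has two incident "p" edges.
square = path + [("v4", "v1", "p")]
square_classes = equivalence_classes(labels, square)
print(square_classes)  # [['v1', 'v2', 'v3', 'v4']]
```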

Multiplicity of Candidates
Case 1a: v ≠ v', v1 ≠ v2 (Topologically in the (k-2)-graphs)
[Figure: two graphs joined on a shared core; labels a, b, c, d, e, p, q, r; vertices v, v', v1, v2]
- Core: the (k-2)-edge subgraph that is common to the joined graphs.
- We try to map the cores.

Multiplicity of Candidates
Case 1b: v ≠ v', v1 = v2 (Topologically in the (k-2)-graphs)
[Figure: two graphs joined on a shared core; labels a, b, c, e, p, q, r; vertices v, v', v1, v2]

Multiplicity of Candidates
Case 2a: v ≠ v', v1 ≠ v2 (Topologically in the (k-2)-graphs)
[Figure: two graphs joined on a shared core; labels a, b, c, d, e, p, q, r; vertices v, v', v2, v1]

Multiplicity of Candidates
Case 2b: v ≠ v', v1 = v2 (Topologically in the (k-2)-graphs)
[Figure: two graphs joined on a shared core; labels a, b, c, e, p, q, r; vertices v, v', v1, v2]

Multiplicity of Candidates
Case 2c: v ≠ v' (Topologically in the (k-2)-graphs)
[Figure: two graphs joined on a shared core; labels a, b, d, e, p, q, r; vertices v, v']
- We try to map the cores, and there are two ways to do this.

Multiplicity of Candidates
Case 2d: v ≠ v' (Topologically in the (k-2)-graphs)
[Figure: two graphs joined on a shared core; labels a, b, e, p, q, r; vertices v, v']
- We try to map the cores, and there are two ways to do this.

Multiplicity of Candidates
More than two topologically equivalent vertices
[Figure: two graphs with vertex labels a, b, c]
- Core: the (k-2) subgraph that is common to the joined graphs.

Adjacency Matrix Representation
[Figure: adjacency matrices of a graph on the vertices A(1), A(2), A(3), A(4), B(5), B(6), B(7), B(8)]
- The same graph can be represented in many ways.

Graph Isomorphism
- A graph G1 is isomorphic to another graph G2 if G1 is topologically equivalent to G2.
- A test for graph isomorphism is needed:
  - During candidate generation, to determine whether a candidate can be generated.
  - During candidate pruning, to check whether its (k-1)-subgraphs are frequent.
  - During candidate counting, to check whether a candidate is contained within another graph. Here we should use more specialized algorithms (possibly using indexes with each frequent (k-1)-subgraph).

Codes
[Figure: first adjacency matrix over A(1), A(2), A(3), A(4), B(5), B(6), B(7), B(8)]
Code = 1 10 011 1000 01001 001010 0001011
[Figure: second adjacency matrix over A(1), A(2), A(3), A(4), B(5), B(6), B(7), B(8)]
Code = 1011010010100000100110001110
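The grouping 1 10 011 1000 ... in the first code suggests that the code is read off the upper triangle of the adjacency matrix, column by column. The Python sketch below follows that reading; it is an assumption, since the matrices themselves are not recoverable from the transcript. The edge list `EDGES` is likewise an assumption: it is decoded from the first code under the same reading, and vertex labels (A, B) are ignored.

```python
def adjacency_matrix(n, edges):
    """Build a symmetric 0/1 adjacency matrix from 1-based vertex pairs."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u - 1][v - 1] = adj[v - 1][u - 1] = 1
    return adj

def matrix_code(adj):
    """Concatenate the upper-triangle entries, one column at a time."""
    n = len(adj)
    return "".join(str(adj[i][j]) for j in range(1, n) for i in range(j))

# Assumed edge list, decoded from the first code on the slide.
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (1, 5), (2, 6),
         (5, 6), (3, 7), (5, 7), (4, 8), (6, 8), (7, 8)]

code = matrix_code(adjacency_matrix(8, EDGES))
print(code)  # 1100111000010010010100001011

# Renumbering the vertices (here: swapping 1 and 2) describes the same graph
# but generally yields a different matrix and therefore a different code.
swap = {1: 2, 2: 1}
swapped_edges = [(swap.get(u, u), swap.get(v, v)) for u, v in EDGES]
swapped_code = matrix_code(adjacency_matrix(8, swapped_edges))
print(swapped_code != code)  # True
```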

Graph Isomorphism
- Use canonical labeling to handle isomorphism.
- Map each graph into an ordered string representation (known as its code) such that two isomorphic graphs are mapped to the same canonical encoding.
- Example: choose the string representation with the lowest lexicographical value.
- Then the graph isomorphism problem can be solved by string matching.
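As an illustration of the "lowest lexicographical value" rule, the Python sketch below computes a canonical code by brute force: it tries every vertex ordering and keeps the smallest code. This is only practical for very small graphs and is not meant as the method a real miner would use. The upper-triangle, column-by-column code and all names and example graphs are assumptions, and vertex and edge labels are ignored.

```python
from itertools import permutations

def adjacency_matrix(n, edges):
    """Build a symmetric 0/1 adjacency matrix from 0-based vertex pairs."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj

def matrix_code(adj, order):
    """Upper-triangle code of the matrix after reordering the vertices."""
    n = len(order)
    return "".join(str(adj[order[i]][order[j]])
                   for j in range(1, n) for i in range(j))

def canonical_code(adj):
    """Smallest code over all vertex orderings (brute force, n! orderings)."""
    return min(matrix_code(adj, order) for order in permutations(range(len(adj))))

def is_isomorphic(adj1, adj2):
    """Isomorphism test reduced to string comparison of canonical codes."""
    return len(adj1) == len(adj2) and canonical_code(adj1) == canonical_code(adj2)

# Two drawings of the same 4-vertex path, numbered differently, and a star.
path_a = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
path_b = adjacency_matrix(4, [(2, 0), (0, 3), (3, 1)])
star = adjacency_matrix(4, [(0, 1), (0, 2), (0, 3)])

print(is_isomorphic(path_a, path_b))  # True
print(is_isomorphic(path_a, star))    # False
```

Because a code determines its adjacency matrix, two graphs share a canonical code exactly when some renumbering of one gives the other.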