Association Analysis (7) (Mining Graphs)

Frequent Subgraph Mining
- Extend association rule mining to finding frequent subgraphs.
- Useful for Web mining, computational chemistry, spatial data sets, etc.
[Figure: example Web graph with the nodes Homepage, Teaching, Databases, Data Mining.]

Bio/Chem-Informatics
Each year, new chemical compounds are designed. We know that the structure of a compound plays a big role in its chemical properties. However, it is difficult to establish their exact relationship. Frequent subgraph mining can help by identifying the substructures commonly associated with certain properties of known compounds.

Web mining
E.g., mining the DBLP Web graph.
[Figure: a mined subgraph and two examples of matches; node labels: Garcia-Molina, Widom, Jeff Ullman, Alfred Aho, Lenzerini, Calvanese, Vardi, Kuperferman.]

Graph Definitions

Mining Subgraphs

The Exhaustive Way…Listing all...

Apriori-Like Approach
- Support: the number of graphs that contain a particular subgraph.
- The Apriori principle still holds: every subgraph of a frequent subgraph is also frequent.
- Level-wise (Apriori-like) approach:
  - Vertex growing: k is the number of vertices.
  - Edge growing: k is the number of edges.
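
The support definition above can be sketched in Python. This is a brute-force illustration, not an efficient method: it tries every injective mapping of pattern vertices into a graph, so it only suits tiny graphs. The graph encoding (a dict of vertex labels plus a dict of labeled undirected edges) and the names `contains` and `support` are assumptions made for this example.

```python
from itertools import permutations

def contains(graph, pattern):
    """Return True if `pattern` occurs in `graph` as a labeled subgraph.

    Both arguments are (labels, edges) pairs (an encoding assumed here):
      labels: dict vertex -> vertex label
      edges:  dict frozenset({u, v}) -> edge label (undirected)
    """
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_vertices = list(p_labels)
    # Try every injective assignment of pattern vertices to graph vertices.
    for image in permutations(g_labels, len(p_vertices)):
        mapping = dict(zip(p_vertices, image))
        if any(p_labels[v] != g_labels[mapping[v]] for v in p_vertices):
            continue  # vertex labels must match
        # Every pattern edge must exist in the graph with the same label.
        if all(
            g_edges.get(frozenset(mapping[v] for v in edge)) == label
            for edge, label in p_edges.items()
        ):
            return True
    return False

def support(db, pattern):
    """Support = number of graphs in the database that contain the pattern."""
    return sum(contains(g, pattern) for g in db)

# Toy database of three small labeled graphs (made-up data).
g1 = ({1: "a", 2: "b", 3: "c"}, {frozenset({1, 2}): "p", frozenset({2, 3}): "q"})
g2 = ({1: "b", 2: "a"}, {frozenset({1, 2}): "p"})
g3 = ({1: "a", 2: "b"}, {frozenset({1, 2}): "q"})
pattern = ({"x": "a", "y": "b"}, {frozenset({"x", "y"}): "p"})

print(support([g1, g2, g3], pattern))
```

The pattern (an a-b edge labeled p) occurs in g1 and g2 but not in g3, where the edge label differs.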

Apriori-Like Algorithm
- Generate candidates: merge pairs of frequent (k-1)-subgraphs to obtain candidate k-subgraphs.
- Prune candidates: discard all candidate k-subgraphs that contain infrequent (k-1)-subgraphs.
- Count support: count the number of graphs in the DB that contain each candidate.
- Discard all candidate subgraphs whose support counts are less than minsup.
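
The generate / prune / count loop above can be sketched under a strong simplifying assumption: every vertex label is unique within a graph, so a graph is just a set of labeled edges and "contains" becomes a subset test. Real frequent subgraph miners cannot assume this and need isomorphism tests. The sketch uses edge growing (k is the number of edges) and does not check that patterns are connected. The function name and the toy data are hypothetical.

```python
from itertools import combinations

def mine_frequent_edge_sets(graphs, minsup):
    """Level-wise mining of frequent edge sets.

    Assumption: vertex labels are unique within each graph, so each graph
    is a frozenset of edges (label_u, label_v) and containment is a subset test.
    """
    def support(candidate):
        return sum(1 for g in graphs if candidate <= g)

    all_edges = {e for g in graphs for e in g}
    level = {frozenset([e]) for e in all_edges
             if support(frozenset([e])) >= minsup}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Generate: merge pairs of frequent (k-1)-edge sets sharing k-2 edges.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == k}
        # Prune: drop candidates with an infrequent (k-1)-edge subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        # Count: keep candidates whose support reaches minsup.
        level = {c for c in candidates if support(c) >= minsup}
    return frequent

# Toy database (made-up data): three graphs over vertices A, B, C, D.
db = [
    frozenset({("A", "B"), ("B", "C"), ("B", "D")}),
    frozenset({("A", "B"), ("B", "C")}),
    frozenset({("A", "B"), ("B", "D")}),
]
result = mine_frequent_edge_sets(db, minsup=2)
for edges, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(edges), sup)
```

With minsup = 2, the three-edge candidate is never counted: its subset {(B, C), (B, D)} has support 1, so the prune step discards it.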

Vertex Growing
Two frequent (k-1)-vertex subgraphs are merged through their adjacency matrices. The resulting matrix is the first matrix, appended with the last row and last column of the second matrix. The remaining entries of the new matrix are either zero or replaced by all valid edge labels connecting the pair of vertices.
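
A minimal sketch of this matrix merge, assuming both adjacency matrices are symmetric lists of lists, list the shared core vertices first and in the same order, and use 0 for "no edge". Vertex labels are assumed to be tracked elsewhere. The function name `vertex_grow` is hypothetical.

```python
def vertex_grow(m1, m2, edge_labels):
    """Merge two (k-1)x(k-1) adjacency matrices that share a (k-2)-vertex core.

    Returns one k x k candidate matrix per choice for the undetermined entry
    between the last vertex of m1 and the last vertex of m2:
    0 (no edge) or any of the given edge labels.
    """
    n = len(m1)          # n = k - 1
    core = n - 1         # the first k - 2 vertices form the shared core
    if [row[:core] for row in m1[:core]] != [row[:core] for row in m2[:core]]:
        raise ValueError("the two matrices do not share the same core")

    candidates = []
    for label in [0] + list(edge_labels):
        # Start from the first matrix, padded with one extra row and column.
        merged = [row[:] + [0] for row in m1] + [[0] * (n + 1)]
        # Append the last row and column of the second matrix (core part).
        for i in range(core):
            merged[i][n] = m2[i][core]
            merged[n][i] = m2[core][i]
        # Remaining entry: edge between the two non-core vertices.
        merged[n - 1][n] = label
        merged[n][n - 1] = label
        candidates.append(merged)
    return candidates

# Two 3-vertex graphs sharing a 2-vertex core (vertices 0 and 1, joined by an edge).
m1 = [[0, 1, 1],
      [1, 0, 0],
      [1, 0, 0]]
m2 = [[0, 1, 0],
      [1, 0, 1],
      [0, 1, 0]]
candidates = vertex_grow(m1, m2, edge_labels=[1])
for c in candidates:
    print(c)
```

With a single edge label, two candidates come out: one where the two added vertices are not connected and one where they are.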

Edge Growing
Edge growing inserts a new edge into an existing frequent subgraph during candidate generation. It does not necessarily increase the number of vertices of the original graph.

Topological equivalence
Two vertices are topologically equivalent if they have:
- the same label, and
- the same number and labels of edges incident to them.
Figure captions:
- v1, v4 are topologically equivalent; v2, v3 are topologically equivalent.
- No topologically equivalent vertices.
- v1, v2, v3, v4 are topologically equivalent.
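
The definition above can be turned into a small grouping routine. This sketch applies the slide's definition literally (same vertex label, same multiset of incident edge labels). The input encoding and the function name are assumptions for illustration, and the example graphs (a path and a cycle) are made up.

```python
def topological_classes(labels, edges):
    """Group vertices that share a label and the same multiset of incident edge labels.

    labels: dict vertex -> vertex label
    edges:  list of (u, v, edge_label) for undirected edges
    """
    incident = {v: [] for v in labels}
    for u, v, edge_label in edges:
        incident[u].append(edge_label)
        incident[v].append(edge_label)

    groups = {}
    for v, label in labels.items():
        signature = (label, tuple(sorted(incident[v])))
        groups.setdefault(signature, []).append(v)
    return sorted(sorted(group) for group in groups.values())

labels = {"v1": "a", "v2": "a", "v3": "a", "v4": "a"}

# A path v1 - v2 - v3 - v4 with all edges labeled p.
path = [("v1", "v2", "p"), ("v2", "v3", "p"), ("v3", "v4", "p")]
print(topological_classes(labels, path))

# A 4-cycle: every vertex has the same label and two incident p edges.
cycle = path + [("v4", "v1", "p")]
print(topological_classes(labels, cycle))
```

On the path, the end vertices v1 and v4 fall into one class and the inner vertices v2 and v3 into another; on the cycle, all four vertices share one class.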

Multiplicity of Candidates
Case 1a: v  v’, v1 ≠ v2 (topologically, in the (k-2)-graphs)
[Figure: two graphs joined on their common core, and the resulting candidate.]
Core: the (k-2)-edge subgraph that is common between the joined graphs. We try to map the cores.

Multiplicity of Candidates
Case 1b: v  v’, v1 = v2 (topologically, in the (k-2)-graphs)
[Figure: two graphs joined on their common core, and the resulting candidates.]

Multiplicity of Candidates
Case 2a: v  v’, v1 ≠ v2 (topologically, in the (k-2)-graphs)
[Figure: two graphs joined on their common core, and the resulting candidate.]

Multiplicity of Candidates
Case 2b: v  v’, v1 = v2 (topologically, in the (k-2)-graphs)
[Figure: two graphs joined on their common core, and the resulting candidates.]

Multiplicity of Candidates
Case 2c: v  v’ (topologically, in the (k-2)-graphs)
[Figure: two graphs joined on their common core, and the resulting candidate.]
We try to map the cores, and there are two ways to do this.

Multiplicity of Candidates
Case 2d: v  v’ (topologically, in the (k-2)-graphs)
[Figure: two graphs joined on their common core, and the resulting candidate.]
We try to map the cores, and there are two ways to do this.

Multiplicity of Candidates
More than two topologically equivalent vertices.
[Figure: two graphs with vertex labels a, b, c joined on their common core, and the resulting candidates.]
Core: the (k-2) subgraph that is common between the joined graphs.

Adjacency Matrix Representation
[Figure: two adjacency matrices over the vertices A(1), A(2), A(3), A(4), B(5), B(6), B(7), B(8).]
The same graph can be represented in many ways.

Graph Isomorphism
A graph G1 is isomorphic to another graph G2 if G1 is topologically equivalent to G2.
A test for graph isomorphism is needed:
- During candidate generation, to determine whether a candidate has already been generated.
- During candidate pruning, to check whether its (k-1)-subgraphs are frequent.
- During candidate counting, to check whether a candidate is contained within another graph. For this we should use more specialized algorithms (possibly using indexes with each frequent (k-1)-subgraph).

Codes
[Figure: two adjacency matrices over the vertices A(1), A(2), A(3), A(4), B(5), B(6), B(7), B(8).]
First matrix: Code = 1 10 011 1000 01001 001010 0001011
Second matrix: Code = 1011010010100000100110001110
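
The grouping of the first code above (blocks of 1, 2, ..., 7 digits) is consistent with reading the upper triangle of an 8 x 8 adjacency matrix column by column. The sketch below assumes that reading and an unlabeled 0/1 matrix; the matrices are made-up 3-vertex examples, not the ones from the slide, and `matrix_code` is a hypothetical name.

```python
def matrix_code(adj):
    """Concatenate the upper-triangular entries of an adjacency matrix,
    column by column (assumed encoding)."""
    n = len(adj)
    return "".join(str(adj[i][j]) for j in range(1, n) for i in range(j))

# The same 3-vertex path, written with two different vertex orderings.
center_first = [[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]]
center_middle = [[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]]

print(matrix_code(center_first))   # 110
print(matrix_code(center_middle))  # 101
```

The two orderings of one graph give different codes, which is why a canonical choice among them is needed.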

Graph Isomorphism
- Use canonical labeling to handle isomorphism.
- Map each graph into an ordered string representation (known as its code) such that two isomorphic graphs will be mapped to the same canonical encoding.
- Example: choose the string representation with the lowest lexicographical value.
- Then, the graph isomorphism problem can be solved by string matching.
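
A brute-force sketch of canonical labeling: enumerate every vertex ordering, build a string for each, and keep the lexicographically smallest. The string format (vertex labels, a colon, then the column-wise upper triangle of the permuted matrix) is an assumption for this example, as are single-character labels and 0/1 entries. The cost grows factorially with the number of vertices, so this is only for tiny graphs.

```python
from itertools import permutations

def canonical_code(labels, adj):
    """Return the smallest code over all vertex orderings.

    labels: list of single-character vertex labels
    adj:    symmetric 0/1 adjacency matrix (list of lists)
    """
    n = len(labels)
    best = None
    for perm in permutations(range(n)):
        label_part = "".join(labels[p] for p in perm)
        # Upper triangle of the permuted matrix, read column by column.
        matrix_part = "".join(
            str(adj[perm[i]][perm[j]]) for j in range(1, n) for i in range(j)
        )
        code = label_part + ":" + matrix_part
        if best is None or code < best:
            best = code
    return best

def is_isomorphic(g1, g2):
    """Isomorphism test by comparing canonical codes (string matching)."""
    return canonical_code(*g1) == canonical_code(*g2)

# The same labeled path a - b - a under two vertex orderings, and a triangle.
path_1 = (["a", "b", "a"], [[0, 1, 0], [1, 0, 1], [0, 1, 0]])
path_2 = (["b", "a", "a"], [[0, 1, 1], [1, 0, 0], [1, 0, 0]])
triangle = (["a", "b", "a"], [[0, 1, 1], [1, 0, 1], [1, 1, 0]])

print(is_isomorphic(path_1, path_2))
print(is_isomorphic(path_1, triangle))
```

Once each graph carries its canonical code, duplicate candidates can be detected with a set or dictionary lookup on the code strings.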