Presentation is loading. Please wait.

Presentation is loading. Please wait.

Mining Frequent Subgraphs

Similar presentations


Presentation on theme: "Mining Frequent Subgraphs"— Presentation transcript:

1 Mining Frequent Subgraphs
COMP Seminar Spring 2011

2 FFSM: Fast Frequent Subgraph Mining -- An Overview:
How to solve graph isomorphism problem? A Novel Graph Canonical Form: CAM How to tackle subgraph isomorphism problem (NP-complete)? Incrementally maintained embeddings How to enumerate subgraphs: An Efficient Data Structure: CAM Tree Two Operations: CAM-join, CAM-extension. FSG: level wise search gSpan: depth first search 11/30/2018

3 Adjacency Matrix Every diagonal entry of adjacency matrix M corresponds to a distinct vertex in G and is filled with the label of this vertex. Every off-diagonal entry in the lower triangle part of M1 corresponds to a pair of vertices in G and is filled with the label of the edge between the two vertices and zero if there is no edge. p2 p5 a b d y x (P) p1 p3 p4 c M1 y b d c x a M2 y b c d x a M3 d b x a y c 1for an undirected graph, the upper triangle is always a mirror of the lower triangle 11/30/2018

4 Code A Code of n  n adjacency matrix M is defined as sequence of lower triangular entries (including the diagonal entries) in the order: M1,1 M2,1 M2,2 … Mn,1 Mn,2 …Mn,n-1 Mn,n M1 y b d c x a a M3 d b x a y c Code(M1): aybyxb0y0c00y0d < Code(M2): aybyxb00yd0y00c < Code(M3): bxby0d0y0cyy00a y b y x b y d y c M2 The Canonical Adjacency Matrix is the one produces the maximal code, using lexicographic order. 11/30/2018

5 MP Submatrix For an m  m matrix A, an n  n matrix B is A’s maximal proper submatrix (MP Submatrix), iff B is obtained by removing the last none-zero entry from A. M6 y b d c x a M5 b y c x a M2 b y a M1 M3 M4 b y x a We define a CAM is connected iff the corresponding graph is connected. Theorem I: A CAM’s MP submatrix is CAM Theorem II: A connected CAM’s MP submatrix is connected Also explain, the symmetric property and we also remove the row if there is no edge entry left. 11/30/2018

6 CAM Tree: Subgraphs y x p2 p5 a b d (P) p1 p3 p4 c 11/30/2018 b c d a
y a a a a d y b a c y b x b y y b b y b x b x x b b y c y d b x y a y a c b d y b a y a c b x d y b x a y a c b x d y b x a d y c b x The first blink object can be obtained by “superimposing” two objects above it. The second one can not. This is the motivation for suboptimal CAMs p2 p5 a b d y x (P) p1 p3 p4 c y a c b x y a d b x y b d c a y b d c x a y b c d x a 11/30/2018

7 CAM Tree: Frequent Subgraphs
x y a = 2/3 p2 p5 a b d y x (P) p1 p3 p4 c a b y x (Q) q1 q3 q2 a b y (S) s1 s3 s2 11/30/2018

8 How to Enumerate Nodes in a CAM Tree?
Two operations to explore CAM tree: CAM-Join CAM-Extension Augmenting CAM tree with Suboptimal CAMs Objectives: no false dismissal no redundancy Plus: We want to this efficiently! 11/30/2018

9 Suboptimal Tree We define a Suboptimal CAM as a matrix that its MP submatrix is a CAM. b c d a a b c y b b y b x b y d b x y a c d j e j e c y b x d j y a c b x d e j d y c b x p2 p5 a b d y x (P) p1 p3 p4 c May explain depth first search and using information from siblings. We don’t show the biggest one which has five edges in it due to space limitation y b d c x a j 11/30/2018

10 Summary Theorem: For a graph G, let CK-1 (Ck) be set of the suboptimal CAMs of all size-(K-1) (K) subgraphs of G (K ≥ 2). Every member of set CK can be enumerated unambiguously either by joining two members of set CK-1 or by extending a member in CK-1. 11/30/2018

11 Experimental Study Predictive Toxicology Evaluation Competition (PTE)
Contains: 337 compounds Each graph contains 27 nodes and 27 edges on average NIH DTP Anti-Viral Screen Test (DTP CA/CM) Chemicals are classified to be Confirmed Active (CA), Confirmed Moderate Active (CM) and Confirmed Inactive (CI). We formed a dataset contains CA (423) and CM (1083). Each graph contains 25 nodes and 27 edges on average 11/30/2018

12 Performance (PTE) Support Threshold (%) Support Threshold (%)
11/30/2018

13 Performance (DTP CACM)
Support Threshold (%) Support Threshold (%) 11/30/2018


Download ppt "Mining Frequent Subgraphs"

Similar presentations


Ads by Google