SPIN: Mining Maximal Frequent Subgraphs from Graph Databases Jun Huan, Wei Wang, Jan Prins, Jiong Yang KDD 2004.



2 Introduction
► Graphs model relations among data
  • Inter-disciplinary research
► Graph databases contain a huge number of recurring patterns
► Goal: mine only maximal frequent subgraphs
  • A frequent subgraph is maximal if none of its supergraphs is frequent

3 Advantages
► Reduces the total number of mined subgraphs
  • Saves space and analysis effort
► Reduces mining time
► Non-maximal frequent subgraphs can be reconstructed from the maximal ones
► Maximal frequent subgraphs are of most interest in some applications

4 Algorithm
► Mine all frequent trees from a general graph database
  • Tree normalization is simpler than graph normalization
  • In certain applications, most of the frequent subgraphs are actually trees
  • Can use current subgraph mining algorithms
  • Mining subtrees from a forest

5 Algorithm
► Reconstruct all maximal subgraphs from the mined trees
  • For each frequent tree T, find all frequent subgraphs whose canonical spanning tree is isomorphic to T
  • Enumerate the equivalence class of the tree T
  • Maximal subgraph mining

6 Tree-based Equivalence Classes
► A subtree T of a graph G is a spanning tree of G if T contains all nodes of G
  • The maximal spanning tree is the canonical spanning tree
► Group all frequent subgraphs into equivalence classes based on their canonical spanning trees
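The brute-force Python sketch below makes the idea of a canonical spanning tree concrete. SPIN defines its own canonical form and tree ordering. This sketch swaps in a simple stand-in: an AHU-style parenthesized string, maximized over all possible roots, with trees compared by that string.

Several pieces are hypothetical:
- the helper names;
- the graph encoding (a node-label dict plus an edge-label dict keyed by node pairs);
- the use of single-character labels.

The sketch enumerates every spanning tree and keeps the one with the largest key. It then groups graphs that share a canonical spanning tree into one class.

```python
from collections import defaultdict
from itertools import combinations


def is_spanning_tree(nodes, tree_edges):
    """|V| - 1 edges with no cycle means the edges connect every node."""
    if len(tree_edges) != len(nodes) - 1:
        return False
    parent = {v: v for v in nodes}  # union-find forest

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v in tree_edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    return True


def rooted_code(node, parent, adj, labels):
    # Stand-in canonical string (not SPIN's): node label, then sorted child
    # codes, each prefixed by the label of the edge leading to that child.
    children = sorted(el + rooted_code(c, node, adj, labels)
                      for c, el in adj[node] if c != parent)
    return "(" + labels[node] + "".join(children) + ")"


def tree_key(labels, tree_edges, edge_labels):
    adj = {v: [] for v in labels}
    for u, v in tree_edges:
        adj[u].append((v, edge_labels[(u, v)]))
        adj[v].append((u, edge_labels[(u, v)]))
    # Maximize over roots so the key does not depend on node numbering.
    return max(rooted_code(r, None, adj, labels) for r in labels)


def canonical_spanning_tree(labels, edge_labels):
    """Largest tree key over all spanning trees (small graphs only)."""
    best = None
    for subset in combinations(edge_labels, len(labels) - 1):
        if is_spanning_tree(labels, subset):
            key = tree_key(labels, subset, edge_labels)
            if best is None or key > best:
                best = key
    return best


def group_by_spanning_tree(graphs):
    """Map each canonical spanning tree key to the graphs in its class."""
    classes = defaultdict(list)
    for name, (labels, edge_labels) in graphs.items():
        classes[canonical_spanning_tree(labels, edge_labels)].append(name)
    return dict(classes)


graphs = {
    "triangle": ({1: "a", 2: "a", 3: "b"},
                 {(1, 2): "x", (1, 3): "x", (2, 3): "x"}),
    "path_aab": ({1: "a", 2: "a", 3: "b"}, {(1, 2): "x", (2, 3): "x"}),
    "path_aba": ({1: "a", 2: "b", 3: "a"}, {(1, 2): "x", (2, 3): "x"}),
}
print(group_by_spanning_tree(graphs))
# {'(bx(ax(a)))': ['triangle', 'path_aab'], '(bx(a)x(a))': ['path_aba']}
```

The triangle lands in the same class as its maximal spanning tree, the a-a-b path. Enumerating all spanning trees is exponential, so this only illustrates the definition.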

7 Spanning tree

8 Tree-based Equivalence Classes

9 [Figure: frequent subgraphs grouped into tree-based equivalence classes, 12 singleton classes and one larger group; node labels a, b and edge labels x, y]

10 Enumerating Graphs from Trees
► Graph G with candidate edge set C = {e1, e2, …, en}
  • An edge e is in C if G + e is frequent
► Search space of G: G:C = {G + y | y ⊆ C}
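Here is a minimal Python sketch of the candidate set and the search space G:C. It assumes G:C contains G extended by every subset y of C.

The encoding is an assumption: graphs are frozensets of labeled edges (i, j, label) over fixed node ids. `is_frequent` is a hypothetical support oracle that stands in for counting support over the database.

```python
from itertools import combinations


def candidate_set(graph, possible_edges, is_frequent):
    """Keep edge e as a candidate only if G + e is still frequent."""
    g = frozenset(graph)
    return [e for e in possible_edges if is_frequent(g | {e})]


def search_space(graph, candidates):
    """Yield G + y for every subset y of the candidate set C."""
    g = frozenset(graph)
    for k in range(len(candidates) + 1):
        for y in combinations(candidates, k):
            yield g | frozenset(y)


def is_frequent(g):
    # Hypothetical oracle: any graph using the "y"-labeled edge is infrequent.
    return (1, 3, "y") not in g


tree = {(1, 2, "x"), (2, 3, "x")}
possible = [(1, 3, "x"), (1, 3, "y")]
C = candidate_set(tree, possible, is_frequent)
print(C)                                 # [(1, 3, 'x')]
print(len(list(search_space(tree, C))))  # 2
```

The search space grows as 2^|C|, which is why the optimizations on the following slides prune it.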

11 Optimizations
► Remove from the search space the frequent subgraphs that cannot be maximal
► Locally maximal: a frequent subgraph G is maximal within its equivalence class
► Globally maximal: G is maximal frequent in the whole graph database
► Avoid enumerating subgraphs that are not locally maximal

12 Bottom-up Pruning
► Let G' = G ∪ C
  • If G' is frequent, every graph in the search space is a subgraph of G', so no graph other than G' can be maximal
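The pruning rule can be sketched as a depth-first set enumeration over the candidate edges, in the head/tail style of itemset miners. This is a hedged illustration, not SPIN's exact procedure.

The sketch rests on these assumptions:
- graphs are frozensets of candidate edges added to a fixed base G;
- `is_frequent` is a hypothetical anti-monotone support oracle;
- the function names are made up.

Whenever head ∪ tail is frequent, the whole subtree of the search is replaced by that single graph.

```python
def _mine(head, tail, is_frequent, out):
    full = head.union(tail)
    if is_frequent(full):
        # Bottom-up pruning: every graph below this node is a subgraph of
        # `full`, so none of them can be maximal; skip the whole subtree.
        out.append(full)
        return
    extended = False
    for i, e in enumerate(tail):
        child = head | {e}
        if not is_frequent(child):
            continue
        extended = True
        # Keep only later candidates that still extend `child` frequently.
        new_tail = [f for f in tail[i + 1:] if is_frequent(child | {f})]
        _mine(child, new_tail, is_frequent, out)
    if not extended:
        out.append(head)  # no frequent one-edge extension from the tail


def maximal_in_search_space(graph, candidates, is_frequent):
    """Frequent graphs among G + y (y a subset of C) with no frequent superset there."""
    out = []
    _mine(frozenset(graph), list(candidates), is_frequent, out)
    found = set(out)
    # Drop anything that is a proper subgraph of another result.
    return {g for g in found if not any(g < h for h in found)}


def is_frequent(g):
    # Hypothetical oracle: candidates "a" and "b" are never frequent together.
    return not {"a", "b"} <= g


print(maximal_in_search_space({"t"}, ["a", "b", "c"], is_frequent))
```

With the toy oracle, the result contains G + {a, c} and G + {b, c}. If every graph were frequent, the first pruning check would return G ∪ C immediately, without visiting any other node.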

13 Tail Shrink
► An embedding of G in G' is a subgraph isomorphism f from G to G'
  • Example: two embeddings of L in P
    l1 -> P1, l2 -> P2, l3 -> P3, l4 -> P4
    l1 -> P1, l2 -> P3, l3 -> P2, l4 -> P4
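The figure with L and P is not reproduced here. This sketch instead uses a hypothetical pattern L and host P, chosen so that they have exactly the two embeddings listed on the slide. L is a 4-cycle whose nodes l2 and l3 share a label.

Embeddings are found by brute force over injective node maps. A map counts as an embedding if it preserves node labels and sends every labeled pattern edge onto an equally labeled host edge. The graph encoding and function names are assumptions.

```python
from itertools import permutations


def edge_label(edges, u, v):
    """Label of undirected edge {u, v}, or None if absent."""
    return edges.get((u, v), edges.get((v, u)))


def embeddings(pattern, host):
    """All injective, label-preserving maps f that keep every pattern edge."""
    p_labels, p_edges = pattern
    h_labels, h_edges = host
    p_nodes = list(p_labels)
    found = []
    for image in permutations(h_labels, len(p_nodes)):
        f = dict(zip(p_nodes, image))
        if any(h_labels[f[u]] != p_labels[u] for u in p_nodes):
            continue
        if all(edge_label(h_edges, f[u], f[v]) == lbl
               for (u, v), lbl in p_edges.items()):
            found.append(f)
    return found


def cycle(names, labels):
    # Hypothetical 4-cycle n1-n2-n4-n3-n1 with every edge labeled "x".
    n1, n2, n3, n4 = names
    return (dict(zip(names, labels)),
            {(n1, n2): "x", (n1, n3): "x", (n2, n4): "x", (n3, n4): "x"})


L = cycle(["l1", "l2", "l3", "l4"], "abbc")
P = cycle(["P1", "P2", "P3", "P4"], "abbc")
for f in embeddings(L, P):
    print(f)
# {'l1': 'P1', 'l2': 'P2', 'l3': 'P3', 'l4': 'P4'}
# {'l1': 'P1', 'l2': 'P3', 'l3': 'P2', 'l4': 'P4'}
```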

14 Tail Shrink
► A candidate edge (i, j, e_l) is associative to a graph G if it appears in every embedding of G in the graph database
► If a tree T has a set of associative edges, any maximal frequent graph G that is a supergraph of T must contain all of these associative edges
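This sketch of the associativity test assumes the embeddings of G in each supporting database graph have already been computed, for example by a subgraph-isomorphism routine. The encoding and the function names are hypothetical: edge dicts are keyed by node pairs, and each embedding is a dict from pattern nodes to host nodes.

```python
def edge_label(edges, u, v):
    """Label of undirected edge {u, v}, or None if absent."""
    return edges.get((u, v), edges.get((v, u)))


def is_associative(candidate, embeddings_by_graph):
    """True if candidate (i, j, e_l) appears in every embedding of G.

    embeddings_by_graph: list of (host_edges, embeddings) pairs, one per
    database graph that contains G.
    """
    i, j, el = candidate
    return all(edge_label(host_edges, f[i], f[j]) == el
               for host_edges, embs in embeddings_by_graph
               for f in embs)


# Hypothetical tree T: path 1-2-3 with edge label "x", found in two graphs.
g1 = {("P1", "P2"): "x", ("P2", "P3"): "x", ("P1", "P3"): "y"}
g2 = {("Q1", "Q2"): "x", ("Q2", "Q3"): "x", ("Q3", "Q1"): "y"}
db = [(g1, [{1: "P1", 2: "P2", 3: "P3"}]),
      (g2, [{1: "Q1", 2: "Q2", 3: "Q3"}])]
candidates = [(1, 3, "y"), (1, 3, "x")]
print([c for c in candidates if is_associative(c, db)])  # [(1, 3, 'y')]
```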

15 Tail Shrink
► Remove associative edges from the candidate set and add them to T without missing any maximal subgraph
  • Reduces the search space
  • Prunes the entire equivalence class in certain cases
► A set C of associative edges of a tree T is lethal if G' = T ∪ C has a canonical spanning tree different from that of T
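Once associativity and canonical spanning trees can be computed, the tail-shrink step itself is simple bookkeeping. The slides do not specify those two computations, so this hedged sketch passes them in as hypothetical callbacks, `is_assoc` and `canonical_tree_of`. Graphs are frozensets of edges.

```python
def tail_shrink(tree, candidates, is_assoc, canonical_tree_of):
    """Move associative candidates into T.

    Returns (grown graph, remaining candidates), or None when the set of
    associative edges is lethal and T's whole equivalence class can be pruned.
    """
    tree = frozenset(tree)
    assoc = [c for c in candidates if is_assoc(c)]
    rest = [c for c in candidates if not is_assoc(c)]
    grown = tree | frozenset(assoc)
    # Lethal: T plus its associative edges no longer has T as its canonical
    # spanning tree, so no maximal graph can fall in T's class.
    if assoc and canonical_tree_of(grown) != canonical_tree_of(tree):
        return None
    return grown, rest


def is_a(c):
    return c == "a"  # hypothetical: only candidate "a" is associative


def same_class(g):
    return "T"  # hypothetical: adding edges never changes the spanning tree


def changes_class(g):
    return "T2" if "a" in g else "T"  # hypothetical: edge "a" is lethal


result = tail_shrink({"t"}, ["a", "b"], is_a, same_class)
print(sorted(result[0]), result[1])                         # ['a', 't'] ['b']
print(tail_shrink({"t"}, ["a", "b"], is_a, changes_class))  # None
```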

16 External-Edge Pruning
► Removes an entire equivalence class without any knowledge of its candidate edges
► An external edge of a graph G connects a node in G to a node not in G
► An external edge (i, e_l, v_l) is associative to a graph G if, for every embedding f of G in a graph G':
  • G' has a node v with label v_l
  • v connects to the node f(i) through an edge with label e_l in G'
  • There is no node j ∈ V[G] such that v = f(j)
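This is a direct transcription of the three conditions above into a Python check. It assumes the embeddings of G are already known for each database graph that contains it. The host encoding (node-label dict, edge dict, list of embeddings) and the function names are hypothetical.

```python
def edge_label(edges, u, v):
    """Label of undirected edge {u, v}, or None if absent."""
    return edges.get((u, v), edges.get((v, u)))


def is_associative_external(ext, hosts):
    """True if external edge (i, e_l, v_l) is associative to G.

    hosts: list of (node_labels, edges, embeddings) for every database
    graph G' that contains G; embeddings map G's nodes to G' nodes.
    """
    i, el, vl = ext
    for labels, edges, embs in hosts:
        for f in embs:
            image = set(f.values())
            # Need a node v outside f(G), labeled v_l, joined to f(i) by e_l.
            if not any(v not in image and labels[v] == vl
                       and edge_label(edges, f[i], v) == el
                       for v in labels):
                return False
    return True


# Hypothetical G: single edge 1-2; each host has an extra "c" node off f(1).
h1 = ({"P1": "a", "P2": "b", "P3": "c"},
      {("P1", "P2"): "x", ("P1", "P3"): "z"},
      [{1: "P1", 2: "P2"}])
h2 = ({"Q1": "a", "Q2": "b", "Q3": "c"},
      {("Q1", "Q2"): "x", ("Q3", "Q1"): "z"},
      [{1: "Q1", 2: "Q2"}])
print(is_associative_external((1, "z", "c"), [h1, h2]))  # True
print(is_associative_external((2, "z", "c"), [h1, h2]))  # False
```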

17 Associative external edges

18 Experiments
► 2.8 GHz Pentium Xeon
► 512 KB L2 cache, 2 GB main memory
► Red Hat Linux 7.3
► Implemented in C++

19 Synthetic Dataset D10KT30L200I11V4E4

20 DTP CA data set

21 DTP CM data set

