SPIN: Mining Maximal Frequent Subgraphs from Graph Databases Jun Huan, Wei Wang, Jan Prins, Jiong Yang KDD 2004.

Presentation transcript:

SPIN: Mining Maximal Frequent Subgraphs from Graph Databases Jun Huan, Wei Wang, Jan Prins, Jiong Yang KDD 2004

Introduction
► Graphs model relations among data
  - Inter-disciplinary research
► Huge number of recurring patterns
► Mine only the maximal frequent subgraphs
  - A frequent subgraph is maximal if none of its supergraphs is frequent

Advantages
► Reduces the total number of mined subgraphs
  - Saves space and analysis effort
► Reduces mining time
► Non-maximal frequent subgraphs can be reconstructed from the maximal ones
► Maximal frequent subgraphs are of most interest in some applications
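The idea of keeping only maximal patterns can be illustrated with a minimal sketch. It assumes a strong simplification that is not part of SPIN: each pattern is a plain set of labeled edges, and "subgraph of" is approximated by set inclusion instead of subgraph isomorphism. The names `maximal_patterns` and `covered_by` and the sample patterns are hypothetical.

```python
def maximal_patterns(frequent):
    """Keep only patterns that have no proper superset among the frequent ones."""
    patterns = [frozenset(p) for p in frequent]
    return [p for p in patterns if not any(p < q for q in patterns)]


def covered_by(pattern, maximal):
    """True if the pattern is contained in at least one maximal pattern."""
    return any(frozenset(pattern) <= m for m in maximal)


# Hypothetical frequent patterns, each given as a set of labeled edges
# (node label, edge label, node label). Set inclusion stands in for
# the subgraph relation, which is an assumption of this sketch.
frequent = [
    {("a", "x", "a")},
    {("a", "y", "b")},
    {("a", "x", "a"), ("a", "y", "b")},
    {("b", "x", "b")},
]
maximal = maximal_patterns(frequent)
```

In this toy model every frequent pattern is contained in some maximal one, which is why the non-maximal patterns need not be stored.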

Algorithm
► Mine all frequent trees from a general graph database
  - Tree normalization is simpler than graph normalization
  - In certain applications, most of the frequent subgraphs are really trees
  - Use a current subgraph mining algorithm
  - Mine subtrees from a forest

Algorithm
► Reconstruct all maximal subgraphs from the mined trees
  - For each frequent tree T, find all frequent subgraphs whose canonical spanning tree is isomorphic to T
  - Enumerate the equivalence class of a tree T
  - Maximal subgraph mining

Tree-based Equivalence Classes
► A subtree T is a spanning tree of G if T contains all nodes in G
  - The maximal one: the canonical spanning tree
► Group all frequent subgraphs into equivalence classes based on their canonical spanning trees
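The spanning-tree condition above can be checked mechanically. The following is a minimal sketch for a simple undirected graph without self-loops; it ignores node and edge labels and does not pick the canonical spanning tree, which requires a tree ordering not described in these slides. The function name `is_spanning_tree` is hypothetical.

```python
from collections import defaultdict


def is_spanning_tree(tree_edges, graph_nodes, graph_edges):
    """Return True if tree_edges is a spanning tree of the given graph."""
    tree = {frozenset(e) for e in tree_edges}
    graph = {frozenset(e) for e in graph_edges}
    nodes = set(graph_nodes)
    if not nodes:
        return False
    # Every tree edge must also be an edge of the graph.
    if not tree <= graph:
        return False
    # A tree on n nodes has exactly n - 1 edges.
    if len(tree) != len(nodes) - 1:
        return False
    # With n - 1 edges, reaching every node implies connected and acyclic.
    adjacency = defaultdict(set)
    for edge in tree:
        u, v = tuple(edge)
        adjacency[u].add(v)
        adjacency[v].add(u)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes
```

A graph usually has many spanning trees; grouping by a single canonical one is what makes the equivalence classes well defined.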

Spanning tree

Tree-based Equivalence Classes

[Figure: the 12 singleton groups; graphs drawn with the labels a, b, x, y]

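Enumerating Graphs from Trees
► G ⊕ C: the graph obtained by adding the edges C = {e_1, e_2, …, e_n} to G
  - If G ⊕ e is frequent, then edge e ∈ C (the candidate set)
► Search space of G: G:C = {G ⊕ y | y ∈ 2^C}, i.e. G extended by any subset y of C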
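The enumeration step can be sketched under simplifying assumptions that are not part of SPIN itself: all database graphs share the same node ids, a graph is a set of edges `(u, v)` with `u < v`, and a pattern occurs in a graph when its edge set is a subset, so no subgraph isomorphism test is needed. The sketch also assumes the search space contains G extended by every subset of the candidate edges. The names `support`, `candidate_edges`, and `search_space` are hypothetical.

```python
from itertools import combinations


def support(pattern, database):
    """Number of database graphs that contain every edge of the pattern."""
    return sum(1 for g in database if pattern <= g)


def candidate_edges(tree, database, min_support):
    """Edges between tree nodes whose addition keeps the graph frequent."""
    nodes = {v for edge in tree for v in edge}
    candidates = set()
    for u, v in combinations(sorted(nodes), 2):
        edge = (u, v)  # edges are stored with u < v
        if edge not in tree and support(tree | {edge}, database) >= min_support:
            candidates.add(edge)
    return frozenset(candidates)


def search_space(graph, candidates):
    """Yield the graph extended by every subset of the candidate edges."""
    edges = sorted(candidates)
    for k in range(len(edges) + 1):
        for extra in combinations(edges, k):
            yield frozenset(graph) | frozenset(extra)
```

With n candidate edges the search space holds 2^n graphs, which is why the pruning techniques on the following slides matter.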

Optimizations
► Remove from the search space sets of frequent subgraphs that cannot be maximal
► Locally maximal: a frequent subgraph G is maximal within its equivalence class
► Globally maximal: G is a maximal frequent subgraph in the graph database
► Avoid enumerating subgraphs that are not locally maximal

Bottom-up Pruning
► G' = G ⊕ C
  - If G' is frequent, each graph in the search space is a subgraph of G', so none of them other than G' itself can be maximal
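The bottom-up check amounts to a single support test on the largest graph in the search space. The sketch below makes the same toy assumptions as a set-based model would: graphs are sets of edges over shared node ids and containment is set inclusion, neither of which is how SPIN counts support in practice. The names `support` and `bottom_up_prune` are hypothetical.

```python
def support(pattern, database):
    """Number of database graphs that contain every edge of the pattern."""
    return sum(1 for g in database if pattern <= g)


def bottom_up_prune(graph, candidates, database, min_support):
    """Return G' = G plus all candidate edges if G' is frequent, else None.

    A non-None result means the rest of the search space can be skipped,
    because every other graph in it is a proper subgraph of G'.
    """
    full = frozenset(graph) | frozenset(candidates)
    if support(full, database) >= min_support:
        return full
    return None
```

When the function returns None, the search space still has to be explored, for example by the enumeration of subsets of the candidate edges.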

Tail Shrink
► An embedding of G in G' is a subgraph isomorphism f from G to G'
  - Two embeddings of L in P:
    l_1 -> P_1, l_2 -> P_2, l_3 -> P_3, l_4 -> P_4
    l_1 -> P_1, l_2 -> P_3, l_3 -> P_2, l_4 -> P_4

Tail Shrink
► A candidate edge (i, j, e_l) is associative to a graph G
  - It appears in every embedding of G in the graph database
► If a tree T has a set of associative edges, any maximal frequent graph G that is a supergraph of T must contain all of the associative edges

Tail Shrink
► Remove associative edges from the candidate set and add them to T, without missing any maximal subgraphs
  - Reduces the search space
  - Prunes the entire equivalence class in certain cases
► A set of associative edges C of a tree T is lethal
  - if G' = T ⊕ C has a canonical spanning tree different from that of T
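The tail-shrink step can be sketched as moving associative edges out of the candidate set and into the pattern. The sketch assumes a toy model that differs from SPIN: graphs are sets of edges over shared node ids, so each database graph containing T has exactly one embedding of T, and "appears in every embedding" reduces to "appears in every database graph that contains T". The lethal-edge test is omitted because it needs a canonical spanning tree. The name `tail_shrink` is hypothetical.

```python
def tail_shrink(tree, candidates, database):
    """Move associative candidate edges into the pattern.

    Returns (enlarged pattern, remaining candidates). The tree is assumed
    to be frequent, so at least one database graph contains it.
    """
    tree = frozenset(tree)
    candidates = frozenset(candidates)
    # Graphs that contain the tree; in this toy model, one embedding each.
    supporting = [g for g in database if tree <= g]
    # An edge is associative if it occurs wherever the tree occurs.
    associative = frozenset(
        e for e in candidates if all(e in g for g in supporting)
    )
    return tree | associative, candidates - associative
```

Each associative edge removed from the candidate set halves the number of subsets left to enumerate.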

External-Edge Pruning
► Removes an equivalence class without any knowledge of its candidate edges
► External edge for a graph G: connects a node in G to a node not in G
► (i, e_l, v_l) is associative to a graph G if, for every embedding f of G in a graph G':
  - G' has a node v with the label v_l
  - v connects to the node f(i) by an edge with label e_l in G'
  - There is no node j ∈ V[G] such that v = f(j)

Associative external edges

Experiments
► 2.8 GHz Pentium Xeon
► 512 KB L2 cache, 2 GB main memory
► Red Hat Linux 7.3
► C++ programming language

Synthetic Dataset D10KT30L200I11V4E4

DTP CA data set

DTP CM data set