Frequent Subgraph Discovery Michihiro Kuramochi and George Karypis ICDM 2001.

Slides:



Advertisements
Similar presentations
Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
Advertisements

Mining Frequent Patterns Using FP-Growth Method Ivan Tanasić Department of Computer Engineering and Computer Science, School of Electrical.
A distributed method for mining association rules
Graph Mining Laks V.S. Lakshmanan
 Data mining has emerged as a critical tool for knowledge discovery in large data sets. It has been extensively used to analyze business, financial,
gSpan: Graph-based substructure pattern mining
www.brainybetty.com1 MAVisto A tool for the exploration of network motifs By Guo Chuan & Shi Jiayi.
13 May 2009Instructor: Tasneem Darwish1 University of Palestine Faculty of Applied Engineering and Urban Planning Software Engineering Department Introduction.
Introduction to Graph Mining
Mining Graphs.
FP-Growth algorithm Vasiljevic Vladica,
FP (FREQUENT PATTERN)-GROWTH ALGORITHM ERTAN LJAJIĆ, 3392/2013 Elektrotehnički fakultet Univerziteta u Beogradu.
Data Mining Association Analysis: Basic Concepts and Algorithms
Rakesh Agrawal Ramakrishnan Srikant
Association Analysis (7) (Mining Graphs)
Spring 2003Data Mining by H. Liu, ASU1 5. Association Rules Market Basket Analysis and Itemsets APRIORI Efficient Association Rules Multilevel Association.
Data Mining Association Analysis: Basic Concepts and Algorithms
6/23/2015CSE591: Data Mining by H. Liu1 Association Rules Transactional data Algorithm Applications.
Fast Algorithms for Association Rule Mining
Data Mining Association Rules: Advanced Concepts and Algorithms
1 Fast Algorithms for Mining Association Rules Rakesh Agrawal Ramakrishnan Srikant Slides from Ofer Pasternak.
Mining Scientific Data Sets Using Graphs George Karypis Department of Computer Science & Engineering University of Minnesota (Michihiro Kuramochi & Mukund.
What Is Sequential Pattern Mining?
Slides are modified from Jiawei Han & Micheline Kamber
Graph Indexing: A Frequent Structure­ based Approach Authors:Xifeng Yan†, Philip S‡. Yu, Jiawei Han†
USpan: An Efficient Algorithm for Mining High Utility Sequential Patterns Authors: Junfu Yin, Zhigang Zheng, Longbing Cao In: Proceedings of the 18th ACM.
Advanced Association Rule Mining and Beyond. Continuous and Categorical Attributes Example of Association Rule: {Number of Pages  [5,10)  (Browser=Mozilla)}
Sequential PAttern Mining using A Bitmap Representation
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Efficient Data Mining for Calling Path Patterns in GSM Networks Information Systems, accepted 5 December 2002 SPEAKER: YAO-TE WANG ( 王耀德 )
Data Mining Association Analysis Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
Data Mining Association Rules: Advanced Concepts and Algorithms Lecture Notes for Chapter 7 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
An Efficient Algorithm for Discovering Frequent Subgraphs Michihiro Kuramochi and George Karypis ICDM, 2001 報告者:蔡明瑾.
1 Frequent Subgraph Mining Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY June 12, 2010.
Graph Indexing: A Frequent Structure- based Approach Alicia Cosenza November 26 th, 2007.
Xiangnan Kong,Philip S. Yu Multi-Label Feature Selection for Graph Classification Department of Computer Science University of Illinois at Chicago.
Frequent Item Mining. What is data mining? =Pattern Mining? What patterns? Why are they useful?
Data Mining Association Rules: Advanced Concepts and Algorithms Lecture Notes Introduction to Data Mining by Tan, Steinbach, Kumar.
Outline Introduction – Frequent patterns and the Rare Item Problem – Multiple Minimum Support Framework – Issues with Multiple Minimum Support Framework.
University at BuffaloThe State University of New York Lei Shi Department of Computer Science and Engineering State University of New York at Buffalo Frequent.
1 Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining -SIGKDD’03 Mohammad El-Hajj, Osmar R. Zaïane.
CanTree: a tree structure for efficient incremental mining of frequent patterns Carson Kai-Sang Leung, Quamrul I. Khan, Tariqul Hoque ICDM ’ 05 報告者:林靜怡.
Mining Turbulence Data Ivan Marusic Department of Aerospace Engineering and Mechanics University of Minnesota Collaborators: Victoria Interrante, George.
1 AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Hong.
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Data Mining - assoc. rules C. Faloutsos.
A Scalable Association Rules Mining Algorithm Based on Sorting, Indexing and Trimming Chuang-Kai Chiou, Judy C. R Tseng Proceedings of the Sixth International.
Reducing Number of Candidates Apriori principle: – If an itemset is frequent, then all of its subsets must also be frequent Apriori principle holds due.
COMP53311 Association Rule Mining Prepared by Raymond Wong Presented by Raymond Wong
Indexing and Mining Free Trees Yun Chi, Yirong Yang, Richard R. Muntz Department of Computer Science University of California, Los Angeles, CA {
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Mining Complex Data COMP Seminar Spring 2011.
Frequent Sub-Structure-Based Approaches for Classifying Chemical Compounds Mukund Deshpande, Michihiro Kuramochi, George Karypis University of Minnesota,
Searching for Pattern Rules Guichong Li and Howard J. Hamilton Int'l Conf on Data Mining (ICDM),2006 IEEE Advisor : Jia-Ling Koh Speaker : Tsui-Feng Yen.
Ning Jin, Wei Wang ICDE 2011 LTS: Discriminative Subgraph Mining by Learning from Search History.
Gspan: Graph-based Substructure Pattern Mining
Reducing Number of Candidates
Mining in Graphs and Complex Structures
Data Mining Association Rules: Advanced Concepts and Algorithms
Mining Frequent Subgraphs
A Parameterised Algorithm for Mining Association Rules
Mining Association Rules from Stars
Transactional data Algorithm Applications
Mining Complex Data COMP Seminar Spring 2011.
Graph Database Mining and Its Applications
Mining Frequent Subgraphs
COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong
Mining Frequent Subgraphs
Geometrically Inspired Itemset Mining*
Finding Frequent Itemsets by Transaction Mapping
Approximate Graph Mining with Label Costs
Presentation transcript:

Frequent Subgraph Discovery Michihiro Kuramochi and George Karypis ICDM 2001

Outline Introduction FSG algo. Canonical Labeling Conclusion

Introduction  Most of existing data mining algorithms assume that the data is represented via   Transactions (set of items)   Sequence of items or events  Multi-dimensional vectors   Time series  Scientific datasets with layers, hierarchy, geometry, and arbitrary relations can not be accurately modeled using such frameworks.  e.g. chemical compounds

Graphs can accurately model and represent scientific data sets Graphs are suitable for capturing arbitrary relations between the various elements

Example

FSG(Frequent Subgraph Discovery Algorithm) Level-by-level approach Incremental on the number of edges of the frequent subgraphs(like Apriori) Counting of frequent single and double edge subgraphs For finding frequent size k-subgraphs(k=3) Candidate generation Joining two size (k-1) subgraphs similar to each other Candidate pruning by downward closure property Frequency counting Check if a subgraph is contained in a transaction Repeat the steps for k = k + 1 Increase the size of subgraphs by one edge

FSG Approach: Candidate Generation Generate a size k-subgraph by merging two size (k-1) subgraphs

FSG Approach: Candidate Pruning Pruning of size k-candidates For all the (k-1)-subgraphs of a size k-canidate, check if downward closure property holds

FSG Approach: Frequency Counting For each subgraph to scan each one of the transaction graphs and determine if it is contained or not using subgraph isomorphism In Apriori,the frequency counting is performed substantially faster by building a hash-tree of candidate itemsets To scan each transaction to determine which of the itemsets in the hash-tree it supports

FSG Approach: Frequency Counting Keep track of the TID-List Perform subgraph isomorphism only on the intersection of the TID lists of the parent frequent subgraphs of size k-1

FSG Approach: Frequency Counting Perform only subgraph_isomorphism(c, T1)

Key to Performance: Canonical Labeling Given a graph, we want to find a unique order of vertices, by permuting rows and columns of its adjacency matrix

Canonical Labeling Find the vertex order so that the matrix becomes lexicographically the Largest when we compare in the column-wise way

Conclusion During both candidate generation and frequency counting, what FSG essentially does is to solve subgraph isomorphism Canonical Labeling can reduce our search space Using TID List Reduce the number of subgraph isomorphism checks Suitable for sparse graph transactions