Published by Derek Skinner. Modified over 9 years ago.
1
University at Buffalo, The State University of New York
Lei Shi
Department of Computer Science and Engineering
State University of New York at Buffalo
Frequent Subgraph/Substructure Mining
Seminar 2009
2
Outline
- Introduction
- Apriori-based Subgraph Mining
- Pattern Growth Subgraph Mining
- Summary
3
Graphs are everywhere
4
Graph Mining Problems
Graph Pattern Mining
- Frequent subgraph pattern mining
- Pattern summarization
- Optimal graph patterns
- Graph patterns with constraints
- Approximate graph patterns
- ...
Graph Classification
Graph clustering
Important node identification
Bridge and hub identification
Other Important Topics
- Graph compression
- Graph model
- Social network analysis
5
Subgraph Pattern Mining
Frequent subgraph
- A (sub)graph is frequent if its support (occurrence frequency) in a given dataset is no less than a minimum support threshold.
Applications of subgraph pattern mining
- Mining biochemical structures
- Program control flow analysis
- Mining XML structures or Web communities
- Building blocks for graph classification, clustering, compression, comparison, and correlation analysis
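The support definition above can be made concrete with a small sketch. It assumes a toy representation that is not from the slides: each graph is a pair of a vertex-label dictionary and a set of undirected, unlabeled edges, and containment is tested by brute force over all vertex assignments. The helper names `make_graph`, `contains`, and `support` are hypothetical, and the approach is only practical for very small graphs.

```python
from itertools import permutations

def make_graph(labels, edges):
    """Toy graph: (vertex -> label dict, set of undirected edges)."""
    return labels, {frozenset(e) for e in edges}

def contains(graph, pattern):
    """Brute-force test: does `pattern` occur in `graph` as a label-preserving subgraph?"""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    # Try every injective assignment of pattern vertices to graph vertices.
    for image in permutations(g_labels, len(p_nodes)):
        m = dict(zip(p_nodes, image))
        labels_ok = all(p_labels[v] == g_labels[m[v]] for v in p_nodes)
        edges_ok = all(frozenset(m[v] for v in e) in g_edges for e in p_edges)
        if labels_ok and edges_ok:
            return True
    return False

def support(dataset, pattern):
    """Number of graphs in the dataset that contain the pattern."""
    return sum(contains(g, pattern) for g in dataset)

dataset = [
    make_graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)]),
    make_graph({1: "A", 2: "B"}, [(1, 2)]),
    make_graph({1: "A", 2: "C"}, [(1, 2)]),
]
a_b = make_graph({"u": "A", "v": "B"}, [("u", "v")])
b_c = make_graph({"u": "B", "v": "C"}, [("u", "v")])

min_sup = 2  # assumed threshold, chosen only for this example
print(support(dataset, a_b) >= min_sup)  # A-B is frequent at this threshold
print(support(dataset, b_c) >= min_sup)  # B-C is not
```

Each call to `contains` is a subgraph isomorphism test, which is why the cost of support counting dominates the algorithms discussed in the rest of the deck.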
6
Frequent Subgraph Example
[Figure: three labeled graphs (1), (2), (3) with vertex labels A, B, and C, and a table of subgraphs with a Support row reading 3, 3, 1]
7
Key Challenges in Subgraph Mining
- Graph isomorphism: detecting whether two graphs are identical in structure
- Graph representation (canonical labeling): a canonical label is a unique code for a given graph. It should be the same no matter how the graph is represented, as long as the graphs have the same topological structure and the same labeling of edges and vertices.
- Subgraph candidate generation: generating candidate frequent subgraphs from the dataset
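To illustrate what "the same code no matter how the graph is represented" means, here is a minimal brute-force sketch. The encoding is an assumption made for this example (vertex labels followed by the upper triangle of the adjacency matrix, minimized over all vertex orderings); it is not claimed to be the exact code used by any of the cited algorithms. It assumes single-character vertex labels and unlabeled edges, and `canonical_label` is a hypothetical helper name.

```python
from itertools import permutations

def canonical_label(labels, edges):
    """Smallest string code over all vertex orderings (tiny graphs only)."""
    edge_set = {frozenset(e) for e in edges}
    best = None
    for order in permutations(labels):
        vertex_part = "".join(labels[v] for v in order)
        # Flatten the upper triangle of the adjacency matrix, column by column.
        matrix_part = "".join(
            "1" if frozenset((order[i], order[j])) in edge_set else "0"
            for j in range(len(order))
            for i in range(j)
        )
        code = vertex_part + "|" + matrix_part
        if best is None or code < best:
            best = code
    return best

# The same path A - B - A, written down with different vertex names and edge order.
g1 = canonical_label({1: "A", 2: "B", 3: "A"}, [(1, 2), (2, 3)])
g2 = canonical_label({"x": "B", "y": "A", "z": "A"}, [("x", "y"), ("z", "x")])
# A different graph: the path A - A - B.
g3 = canonical_label({1: "A", 2: "A", 3: "B"}, [(1, 2), (2, 3)])

print(g1 == g2)  # True: identical structure and labels give identical codes
print(g1 == g3)  # False: different structure gives a different code
```

Trying all orderings takes factorial time, so practical algorithms rely on vertex invariants or a restricted traversal order to shrink the search.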
8
Subgraph Mining Approaches
Apriori-based
- AGM/AcGM: Inokuchi, et al. (PKDD’00)
- FSG: Kuramochi and Karypis (ICDM’01)
  M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM’01, pages 313-320, Nov. 2001
- PATH#: Vanetik and Gudes (ICDM’02, ICDM’04)
- FFSM: Huan, et al. (ICDM’03) and SPIN: Huan et al. (KDD’04)
- FTOSM: Horvath et al. (KDD’06)
Pattern growth based
- Subdue: Holder et al. (KDD’94)
- MoFa: Borgelt and Berthold (ICDM’02)
- gSpan: Yan and Han (ICDM’02)
  Yan, X. and Han, J. 2002. gSpan: Graph-Based Substructure Pattern Mining. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02) (December 09-12, 2002). IEEE Computer Society, Washington, DC, 721
- Gaston: Nijssen and Kok (KDD’04)
- CMTreeMiner: Chi et al. (TKDE’05)
- LEAP: Yan et al. (SIGMOD’08)
9
Outline
- Introduction and Background
- Apriori-based Subgraph Mining
- Pattern Growth Subgraph Mining
- Summary
10
Apriori-based Approach
FSG: M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM’01, Nov. 2001
- Flattened representation as canonical labeling
- Apriori-based method to generate subgraph candidates
11
Graph Representation in FSG
Flattened Representation
12
Graph Representation in FSG
Flattened Representation
Lexicographic order (dictionary order)
13
Apriori-based method
Apriori Property
- If a graph is frequent, all of its subgraphs are frequent.
Candidate Generation
- Create a set of candidates of size k+1
- from two given frequent k-subgraphs
- that contain the same (k-1)-subgraph
- The join can result in several candidates of size k+1.
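The shape of the join step can be sketched under a strong simplifying assumption that is not in the slides: every subgraph is a set of edges over globally fixed vertex identifiers, so two subgraphs are equal only when their edge sets are equal. A real Apriori-based miner works up to isomorphism and uses canonical labels to detect the shared (k-1)-subgraph, which is exactly the hard part this sketch leaves out. The names `is_connected` and `join_candidates` are hypothetical.

```python
from itertools import combinations

def is_connected(edges):
    """True if the edges form a single connected component."""
    edges = list(edges)
    if not edges:
        return False
    seen = set(edges[0])
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            # Grow the component across edges with exactly one known endpoint.
            if (u in seen) != (v in seen):
                seen.update((u, v))
                changed = True
    return all(u in seen and v in seen for u, v in edges)

def join_candidates(frequent_k):
    """Join pairs of k-edge subgraphs sharing k-1 edges into (k+1)-edge candidates."""
    candidates = set()
    for g1, g2 in combinations(frequent_k, 2):
        if len(g1 & g2) == len(g1) - 1:
            merged = g1 | g2
            if is_connected(merged):
                candidates.add(merged)
    return candidates

# Three frequent 1-edge subgraphs over fixed vertex ids 1..4.
frequent_1 = [frozenset({(1, 2)}), frozenset({(2, 3)}), frozenset({(3, 4)})]
for candidate in sorted(join_candidates(frequent_1), key=sorted):
    print(sorted(candidate))
```

In this simplified setting each pair yields at most one candidate. With real graphs the shared core can be matched in several ways, which is why one join can produce several candidates of size k+1.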
14
Apriori-based method
Example of generated graph candidates
15
Apriori-based method
Flowchart
16
Apriori-based method
Experiment result
- Chemical compound dataset, which contains 340 compounds and 24 different atoms (vertices)
17
Outline
- Introduction
- Apriori-based Subgraph Mining
- Pattern Growth Subgraph Mining
- Summary
18
Motivation of gSpan
Weakness of the Apriori-based approach
- Generating size-(k+1) subgraph candidates from size-k frequent subgraphs is complicated and costly.
- Pruning false positives requires subgraph isomorphism tests, and subgraph isomorphism is an NP-complete problem, so the tests are costly.
gSpan: Graph-Based Substructure Pattern Mining
- Change the way a graph is represented (DFS: Depth First Search)
- Use pattern growth to generate new subgraph candidates
19
gSpan: Graph-Based Substructure Pattern Mining
DFS (Depth First Search) Code
- First step: traverse the graph depth-first and use the edges on the path to represent the graph.
- Second step: DFS lexicographic order
Pattern growth subgraph generation
20
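DFS code
An edge is represented by a 5-tuple (i, j, l_i, l_(i,j), l_j): the DFS indices of its two endpoints, the label of the first endpoint, the edge label, and the label of the second endpoint.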
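A short sketch shows how one DFS code can be produced from a labeled graph. Each edge becomes a 5-tuple (i, j, l_i, l_(i,j), l_j), where i and j are DFS discovery indices. The input format (a vertex-label dictionary plus an edge-label dictionary keyed by `frozenset` pairs) and the function name `dfs_code` are assumptions for illustration. The traversal visits neighbors in sorted order, so the result is just one valid DFS code for the chosen start vertex, not necessarily the minimum one.

```python
def dfs_code(vlabels, elabels, start):
    """Return one DFS code (list of 5-tuples) for a connected labeled graph."""
    adj = {v: [] for v in vlabels}
    for e in elabels:
        u, v = tuple(e)
        adj[u].append(v)
        adj[v].append(u)

    index = {start: 0}   # vertex -> DFS discovery index
    used = set()         # edges already written to the code
    code = []

    def emit(u, w):
        e = frozenset((u, w))
        used.add(e)
        code.append((index[u], index[w], vlabels[u], elabels[e], vlabels[w]))

    def visit(u):
        # Backward edges first: to already discovered vertices, smallest index first.
        back = [w for w in adj[u] if w in index and frozenset((u, w)) not in used]
        for w in sorted(back, key=index.get):
            emit(u, w)
        # Then forward edges: each one discovers a new vertex.
        for w in sorted(adj[u]):
            if w not in index:
                index[w] = len(index)
                emit(u, w)
                visit(w)

    visit(start)
    return code

# Triangle with vertex labels X, Y, X and edge labels a, b, a.
vlabels = {0: "X", 1: "Y", 2: "X"}
elabels = {frozenset((0, 1)): "a", frozenset((1, 2)): "b", frozenset((2, 0)): "a"}
for edge in dfs_code(vlabels, elabels, start=0):
    print(edge)
```

Forward edges have i < j and introduce a new vertex; backward edges have i > j and close a cycle. Different start vertices or neighbor orders give different codes for the same graph, which is what the DFS lexicographic order in the second step resolves.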
21
DFS code
Second Step: DFS Lexicographic Order
22
Pattern Growth Approach
Pattern Growth (free extension)
23
Pattern Growth Approach
Duplicate Graphs
24
Pattern Growth Approach
Free extension
25
Pattern Growth Approach
Rightmost extension
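Rightmost extension restricts where a pattern may grow. Backward edges may only leave the rightmost vertex (the last vertex discovered) and go to vertices on the rightmost path, and forward edges may only start from vertices on the rightmost path and introduce a new vertex. The sketch below computes those extension slots from a DFS code given as plain (i, j) index pairs; labels are omitted, and `rightmost_extensions` is a hypothetical helper. The minimality check that a full miner applies to each extended code is not shown.

```python
def rightmost_extensions(code):
    """Return (backward, forward) index pairs allowed by rightmost extension."""
    edges = {frozenset(e) for e in code}
    # A forward edge (i, j) with i < j discovers vertex j from vertex i.
    parent = {j: i for i, j in code if i < j}
    rightmost = max(parent) if parent else 0

    # Rightmost path: from the root (index 0) down to the rightmost vertex.
    path = [rightmost]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    path.reverse()

    # Backward slots: rightmost vertex to an earlier path vertex, if not already linked.
    backward = [(rightmost, v) for v in path[:-1]
                if frozenset((rightmost, v)) not in edges]
    # Forward slots: any path vertex to a brand new vertex.
    new_vertex = rightmost + 1
    forward = [(v, new_vertex) for v in path]
    return backward, forward

# Path 0 - 1 - 2: the whole pattern lies on the rightmost path.
print(rightmost_extensions([(0, 1), (1, 2)]))
# Vertex 3 hangs off the root: vertices 1 and 2 are no longer extendable.
print(rightmost_extensions([(0, 1), (1, 2), (0, 3)]))
```

In the second call, vertices 1 and 2 are off the rightmost path, so no slot touches them. This restriction is how rightmost extension cuts down the duplicate graphs that free extension generates.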
26
Pattern Growth Approach
Examples (cont.)
27
gSpan
28
gSpan
29
Pattern Growth Approach
Experimental result using chemical data
- 340 molecules
- 66 atom types and 4 bond types as labels
- On average, only 27 vertices with 28 edges per graph
30
Summary
- Graph representation: flattened representation vs. DFS code
- Generation of candidate patterns: apriori vs. pattern growth
32
Pattern-Growth Approach
33
Frequent Graph Pattern
Given a graph dataset D, find every subgraph g such that $\mathrm{freq}(g) \geq \theta$, where $\mathrm{freq}(g)$ is the percentage of graphs in D that contain g and $\theta$ is the minimum support threshold.
- Problem 1: Exponential Pattern Set
- Problem 2: Threshold Setting
34
Difference between frequent itemset and frequent subgraph discovery
35
Frequent itemset discovery
36
Subgraph Mining Algorithms
Apriori-based approach
- AGM/AcGM: Inokuchi, et al. (PKDD’00)
- FSG: Kuramochi and Karypis (ICDM’01)
- PATH#: Vanetik and Gudes (ICDM’02, ICDM’04)
- FFSM: Huan, et al. (ICDM’03) and SPIN: Huan et al. (KDD’04)
- FTOSM: Horvath et al. (KDD’06)
Pattern growth approach
- Subdue: Holder et al. (KDD’94)
- MoFa: Borgelt and Berthold (ICDM’02)
- gSpan: Yan and Han (ICDM’02)
- Gaston: Nijssen and Kok (KDD’04)
- CMTreeMiner: Chi et al. (TKDE’05)
- LEAP: Yan et al. (SIGMOD’08)
37
Framework of Subgraph Mining Algorithms
- Search order: breadth vs. depth; complete vs. incomplete
- Generation of candidate patterns: apriori vs. pattern growth
- Discovery order of patterns: DFS order; path, tree, graph
- Elimination of duplicate subgraphs: passive vs. active
- Support calculation: store embeddings or not
38
Frequent Subgraph Examples:
39
Example (cont.)
41
Outline
- Introduction and Background
- Apriori-based Subgraph Mining
- Pattern Growth Subgraph Mining
- Summary
DFS code: Yan, X. and Han, J. 2002. gSpan: Graph-Based Substructure Pattern Mining. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02) (December 09-12, 2002). IEEE Computer Society, Washington, DC, 721
42
Pattern Growth Approach