1
Mining Significant Graph Patterns by Leap Search
Xifeng Yan (IBM T. J. Watson), Hong Cheng and Jiawei Han (UIUC), Philip S. Yu (UIC)
© 2008 IBM Corporation
2
IBM T. J. Watson Research Center Graph Pattern Mining | © 2008 IBM Corporation
Graph Patterns
Interestingness measures / objective functions
Frequency: frequent graph pattern
Discriminative: information gain, Fisher score
Significance: G-test
…
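The measures listed on this slide can all be written as functions of a pattern's frequency in the positive and negative graph sets. The sketch below is illustrative only: the names `g_test` and `info_gain`, the parameters `p`, `q`, `m`, `n_pos`, `n_neg`, and the clamp `eps` are assumptions, and the G-test uses the common one-sample form with the negative frequency as the null model, which may differ in detail from the definition the authors use.

```python
import math

def g_test(p, q, m, eps=1e-9):
    """G-test score of a pattern (assumed one-sample form).

    p: frequency in the positive set, q: frequency in the negative set
    (treated as the null model), m: number of positive graphs.
    """
    # Clamp so the logarithms stay finite at frequency 0 or 1.
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return 2 * m * (p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q)))

def entropy(x):
    """Binary entropy in bits; defined as 0 at x = 0 and x = 1."""
    if x <= 0 or x >= 1:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def info_gain(p, q, n_pos, n_neg):
    """Information gain of splitting the graphs by 'contains the pattern'."""
    n = n_pos + n_neg
    with_pos, with_neg = p * n_pos, q * n_neg
    n_with = with_pos + with_neg
    n_without = n - n_with
    h_with = entropy(with_pos / n_with) if n_with > 0 else 0.0
    h_without = entropy((n_pos - with_pos) / n_without) if n_without > 0 else 0.0
    return entropy(n_pos / n) - (n_with / n) * h_with - (n_without / n) * h_without
```

Both scores are zero when the pattern is equally frequent in the two sets and grow as the two frequencies move apart.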
3
Frequent Graph Pattern
4
Optimal Graph Pattern (this work)
5
Objective Functions
Challenge: not anti-monotonic
6
Challenge: Non-Anti-Monotonic
Figure panels: Anti-Monotonic, Non-Monotonic
Non-monotonic: enumerate all subgraphs, then check their scores?
Enumerate subgraphs: small-size to large-size
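A small worked example may make the challenge concrete. It uses itemsets as a stand-in for graphs (a sub-itemset plays the role of a subgraph), and the toy data and the frequency-difference score are assumptions chosen for illustration, not taken from the slides.

```python
# Toy positive and negative "databases"; each transaction stands in for a graph.
pos = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a"}]
neg = [{"a"}, {"a", "c"}, {"a", "c"}, {"c"}]

def freq(pattern, db):
    """Fraction of transactions in db that contain the pattern."""
    return sum(pattern <= t for t in db) / len(db)

def score(pattern):
    """Illustrative discriminative score: absolute frequency difference."""
    return abs(freq(pattern, pos) - freq(pattern, neg))

small = frozenset({"a"})
large = frozenset({"a", "b"})  # a super-pattern of `small`

# Frequency is anti-monotonic: growing the pattern never raises it.
assert freq(large, pos) <= freq(small, pos)
assert freq(large, neg) <= freq(small, neg)

# The score is not: here the larger pattern scores higher (0.5 vs 0.25).
assert score(large) > score(small)
```

Because a low-scoring small pattern can grow into a high-scoring large one, pruning on the score alone is unsafe, which is what motivates an upper bound.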
7
Frequent Pattern Based Mining Framework
Graph Database -> Frequent Patterns -> Optimal Patterns -> Exploratory task, Graph clustering, Graph classification, Graph index
(SIGMOD’04, ’05) (ISMB’05, ’07)
1. Bottleneck: millions, even billions of patterns
2. No guarantee of quality
8
Direct Pattern Mining Framework
Graph Database -> (Direct) -> Optimal Patterns -> Exploratory task, Graph clustering, Graph classification, Graph index
How?
9
Upper-Bound
10
Upper-Bound: Anti-Monotonic (cont.)
Rule of thumb: if the difference between a graph pattern's frequency in the positive dataset and its frequency in the negative dataset increases, the pattern becomes more interesting.
Existing graph mining algorithms can be recycled to accommodate non-monotonic functions.
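One way to read the rule of thumb: a supergraph can only lose frequency in both datasets, so the best it can do is keep one frequency and drive the other to zero. The sketch below assumes that reading, giving the bound max(F(p, 0), F(0, q)), and checks it numerically on a grid for a clamped G-test score. The function names, the clamp `EPS`, and the default `m` are assumptions, and the bound is an illustration rather than the exact expression from the slides.

```python
import math

EPS = 1e-9  # assumed clamp so that log() stays finite at frequency 0 or 1

def g_test(p, q, m=100):
    """Clamped one-sample G-test score of a frequency pair (p, q)."""
    p = min(max(p, EPS), 1 - EPS)
    q = min(max(q, EPS), 1 - EPS)
    return 2 * m * (p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q)))

def upper_bound(score, p, q):
    """Bound on the score of any super-pattern, whose frequencies are <= (p, q)."""
    return max(score(p, 0.0), score(0.0, q))

def bound_holds(score, steps=10):
    """Brute-force check of the bound over a (steps+1) x (steps+1) grid."""
    grid = [i / steps for i in range(steps + 1)]
    for p in grid:
        for q in grid:
            bound = upper_bound(score, p, q)
            for p2 in grid:
                for q2 in grid:
                    if p2 <= p and q2 <= q and score(p2, q2) > bound + 1e-9:
                        return False
    return True
```

Unlike the score itself, this bound never increases as a pattern grows, so it can play the role that minimum support plays in a frequent-pattern miner.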
11
Vertical Pruning
Large <- small
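Vertical pruning can be sketched as a branch-and-bound search: while growing patterns from small to large, a whole branch is dropped once the upper bound of its super-patterns cannot beat the best score found so far. The example below again uses itemsets as a stand-in for graphs, with the frequency-difference score and its bound max(p, q); the data, names, and score are illustrative assumptions.

```python
def freq(pattern, db):
    """Fraction of transactions in db that contain the pattern."""
    return sum(pattern <= t for t in db) / len(db)

def best_pattern(pos, neg):
    """Depth-first search over itemsets with upper-bound (vertical) pruning."""
    items = sorted(set().union(*pos, *neg))
    best = (0.0, frozenset())
    visited = 0

    def dfs(pattern, start):
        nonlocal best, visited
        for i in range(start, len(items)):
            child = pattern | {items[i]}
            p, q = freq(child, pos), freq(child, neg)
            visited += 1
            s = abs(p - q)
            if s > best[0]:
                best = (s, child)
            # Any super-pattern has p' <= p and q' <= q, so |p' - q'| <= max(p, q).
            if max(p, q) <= best[0]:
                continue  # prune: nothing below this node can improve on best
            dfs(child, i + 1)

    dfs(frozenset(), 0)
    return best, visited

pos = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a"}]
neg = [{"a"}, {"a", "c"}, {"a", "c"}, {"c"}]
(best_score, best_items), visited = best_pattern(pos, neg)
```

On this toy data the search evaluates fewer candidates than the 7 non-empty itemsets over three items, while still returning a pattern with the maximum score.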
12
Horizontal Pruning: Structural Proximity
13
Structural Proximity: Another Perspective
# of frequent patterns >> # of possible frequency pairs
Many patterns share the same score
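The counting argument can be seen on a tiny example: with n positive and m negative graphs there are at most (n + 1)(m + 1) distinct support pairs, however many patterns exist. The sketch below enumerates itemsets as a stand-in for subgraphs; the data and names are assumptions for illustration.

```python
from itertools import combinations

pos = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}]
neg = [{"a"}, {"c"}]

def support(pattern, db):
    """Number of transactions in db that contain the pattern."""
    return sum(pattern <= t for t in db)

items = sorted(set().union(*pos, *neg))
patterns = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
]
# Keep patterns that occur at least once, then group them by support pair.
patterns = [g for g in patterns if support(g, pos) + support(g, neg) > 0]
pairs = {(support(g, pos), support(g, neg)) for g in patterns}

n_patterns, n_pairs = len(patterns), len(pairs)
max_pairs = (len(pos) + 1) * (len(neg) + 1)
```

Here several patterns collapse onto the same support pair, and any score that depends only on the pair is therefore identical for all of them.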
14
Frequency Envelope
15
Structural Leap Search
16
Frequency Association
Significant patterns often fall into the high quantile of frequency
Start with the most frequent patterns
17
Descending Leap Mine
1. Structural Leap Search with frequency threshold
2. Support-Descending Mining, until F(g*) converges
3. Structural Leap Search
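The three steps can be sketched as a driver loop around a search routine. Everything here is hypothetical: `leap_search(theta, seed_score)` stands for a structural leap search that returns the best `(pattern, score)` among patterns with frequency at least `theta`, and halving the threshold each round, the stopping rule, and the parameters `min_theta` and `tol` are assumptions rather than details given on the slide.

```python
def descending_leap_mine(leap_search, theta=1.0, min_theta=1e-3, tol=1e-9):
    """Support-descending driver around a hypothetical leap_search callable."""
    best_pattern, best_score = None, float("-inf")
    # Steps 1-2: search with a frequency threshold, lowering it each round,
    # until the best score F(g*) stops improving.
    while theta >= min_theta:
        pattern, score = leap_search(theta, best_score)
        if score > best_score + tol:
            best_pattern, best_score = pattern, score
        elif best_pattern is not None:
            break  # F(g*) has converged
        theta /= 2
    # Step 3: one more search without a frequency threshold, seeded with F(g*).
    pattern, score = leap_search(0.0, best_score)
    if score > best_score:
        best_pattern, best_score = pattern, score
    return best_pattern, best_score

# Toy stand-in for the search: (name, frequency, score) triples.
TOY = [("g1", 0.6, 1.0), ("g2", 0.3, 3.0), ("g3", 0.05, 2.0)]

def toy_search(theta, seed_score):
    found = [(s, name) for name, f, s in TOY if f >= theta]
    if not found:
        return None, float("-inf")
    s, name = max(found)
    return name, s

result = descending_leap_mine(toy_search)
```

The seed score passed to the final search is what lets it prune aggressively; the toy stand-in ignores it, but a real search would use it as the initial bound.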
18
Results: NCI Anti-Cancer Screen Datasets

Name      # of Compounds   Tumor Description
MCF-7     27,770           Breast
MOLT-4    39,765           Leukemia
NCI-H23   40,353           Non-Small Cell Lung
OVCAR-8   40,516           Ovarian
P388      41,472           Leukemia
PC-3      27,509           Prostate
SF-295    40,271           Central Nerve System
SN12C     40,004           Renal
SW-620    40,532           Colon
UACC257   39,988           Melanoma
YEAST     79,601           Yeast anti-cancer

Link: http://pubchem.ncbi.nlm.nih.gov
Chemical compounds: anti-cancer or not
# of vertices: 10 ~ 200
19
Efficiency
Vertical Pruning
Horizontal Pruning
20
Effectiveness (runtime)
frequency descending + leap mine
21
Effectiveness (accuracy)
slightly different
22
Graph Classification

Name            OA Kernel   LEAP   OA Kernel (6x)   LEAP (6x)
Average (AUC)   0.70        0.72   0.75             0.77

* OA Kernel: Optimal Assignment Kernel
  LEAP: LEAP search
  (6x)
23
Scalability Means Something!
Plot labels: LEAP, OA, LEAP(6X), OA(6X)
Runtimes: ~20sec, ~100sec, ~200sec, ~8000sec
Growth: Linear, Quadratic
24
Direct Pattern Mining Framework
Graph Database -> (Direct) -> Optimal Graph Patterns -> Exploratory task, Graph clustering, Graph classification, Graph index
25
Beyond Graph Patterns
Itemset/sequence/tree Database -> (Direct) -> Optimal Patterns -> Exploratory task, Clustering, Classification, Index
1. Direct mining can be applied to itemsets, sequences, and trees.
2. Existing algorithms can be recycled to mine patterns with sophisticated measures.
3. Pattern-based methods, including indexing and classification, are competitive.
26
Thank you
Direct Mining of Discriminative and Essential Graphical and Itemset Features via Model-based Search Tree, SIGKDD’08 @ Las Vegas
27
Graph Classification: Kernel Approach
Kernel-based Graph Classification
Optimal Assignment Kernel (Fröhlich et al. ICML’05)