Mining Significant Graph Patterns by Leap Search
Xifeng Yan (IBM T. J. Watson), Hong Cheng and Jiawei Han (UIUC), Philip S. Yu (UIC)
© 2008 IBM Corporation



IBM T. J. Watson Research Center Graph Pattern Mining | © 2008 IBM Corporation

Slide 2: Graph Patterns
Interestingness measures / objective functions:
Frequency: frequent graph pattern
Discriminative: information gain, Fisher score
Significance: G-test, …
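To make the measures concrete, here is a minimal Python sketch that scores a pattern from its frequency p in the positive set and q in the negative set. It assumes the G-test of independence on the 2x2 pattern/class contingency table and information gain measured in nats; `g_test` and `information_gain` are illustrative names, and the paper's exact formulation may differ.

```python
import math


def g_test(p: float, q: float, n_pos: int, n_neg: int) -> float:
    """G-statistic for independence of 'graph contains pattern' and class.

    p, q: frequency of the pattern in the positive / negative set.
    n_pos, n_neg: number of graphs in the positive / negative set.
    """
    n = n_pos + n_neg
    # 2x2 table: rows = pattern present / absent, columns = positive / negative.
    observed = [[p * n_pos, q * n_neg],
                [(1 - p) * n_pos, (1 - q) * n_neg]]
    col_totals = [n_pos, n_neg]
    g = 0.0
    for row in observed:
        row_total = sum(row)
        for o, col_total in zip(row, col_totals):
            if o > 0:  # convention: 0 * log 0 = 0
                expected = row_total * col_total / n
                g += o * math.log(o / expected)
    return 2.0 * g


def information_gain(p: float, q: float, n_pos: int, n_neg: int) -> float:
    """Information gain (mutual information, in nats) of the binary feature."""
    # G = 2 * N * mutual information, so IG is a rescaled G-statistic.
    return g_test(p, q, n_pos, n_neg) / (2.0 * (n_pos + n_neg))


print(g_test(0.3, 0.3, 100, 200))  # pattern independent of class: score ~0
print(g_test(0.6, 0.1, 100, 200))  # strongly associated: large positive score
```

Both scores are zero when the pattern is equally frequent in the two classes and grow as the frequencies diverge.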

Slide 3: Frequent Graph Pattern

Slide 4: Optimal Graph Pattern (this work)

Slide 5: Objective Functions
Challenge: not anti-monotonic

Slide 6: Challenge: Non-Anti-Monotonic
Anti-monotonic vs. non-monotonic objective functions. For a non-monotonic function, do we have to enumerate all subgraphs and then check their scores? Subgraphs are enumerated from small size to large size.
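The challenge can be seen on a toy example. In this Python sketch itemsets stand in for graphs (a superset plays the role of a supergraph), and the score is the frequency difference |p - q|, chosen only for simplicity; the data are hypothetical. Frequency never increases as the pattern grows, but the score can, so a zero-scoring pattern cannot be discarded together with its extensions.

```python
def frequency(pattern: frozenset, dataset: list) -> float:
    """Fraction of transactions in `dataset` that contain `pattern`."""
    return sum(pattern <= t for t in dataset) / len(dataset)


def score(pattern: frozenset, pos: list, neg: list) -> float:
    """Frequency difference |p - q|: a simple non-monotonic objective."""
    return abs(frequency(pattern, pos) - frequency(pattern, neg))


# Toy data: itemsets stand in for graphs; {a, b} is a "supergraph" of {a}.
pos = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
neg = [{"a"}, {"a"}, {"a", "c"}, {"c"}]
sub, sup = frozenset({"a"}), frozenset({"a", "b"})

# Frequency is anti-monotonic: growing the pattern never raises it ...
assert frequency(sup, pos) <= frequency(sub, pos)
assert frequency(sup, neg) <= frequency(sub, neg)
# ... but the score rises, so pruning on a low score would lose {a, b}.
print(score(sub, pos, neg), score(sup, pos, neg))  # 0.0 0.5
```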

Slide 7: Frequent-Pattern-Based Mining Framework
Graph database -> frequent patterns -> optimal patterns -> exploratory tasks: graph clustering, graph classification, graph index (SIGMOD'04, '05) (ISMB'05, '07)
Problems:
1. Bottleneck: millions, even billions, of frequent patterns.
2. No guarantee of quality.
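The two-step framework can be sketched as follows, again with itemsets standing in for graphs and |p - q| as an assumed objective. An Apriori-style level-wise miner (a stand-in for a frequent graph miner such as gSpan) first enumerates every pattern above `min_sup`, and only then is each one scored. Every frequent pattern must be materialized before the best one is known, which is the bottleneck the slide names.

```python
from itertools import combinations


def support(pattern, data):
    """Fraction of transactions in `data` that contain `pattern`."""
    return sum(pattern <= t for t in data) / len(data)


def frequent_patterns(data, min_sup):
    """Level-wise enumeration of all itemsets with support >= min_sup."""
    items = sorted({i for t in data for i in t})
    level = [frozenset([i]) for i in items]
    result = []
    while level:
        level = [c for c in level if support(c, data) >= min_sup]
        result.extend(level)
        # Join pairs of frequent k-itemsets into (k+1)-item candidates.
        level = list({a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1})
    return result


def two_step_best(pos, neg, min_sup):
    # Step 1: mine every frequent pattern in the whole database.
    candidates = frequent_patterns(pos + neg, min_sup)
    # Step 2: score all of them and keep the best one.
    best = max(candidates, key=lambda g: abs(support(g, pos) - support(g, neg)))
    return best, len(candidates)


pos = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
neg = [{"a"}, {"a"}, {"a", "c"}, {"c"}]
print(two_step_best(pos, neg, min_sup=0.25))  # (frozenset({'b'}), 4)
```

On real graph databases the candidate count, here 4, is what explodes into the millions.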

Slide 8: Direct Pattern Mining Framework
Graph database -> optimal patterns (mined directly) -> exploratory tasks: graph clustering, graph classification, graph index
How?

Slide 9: Upper-Bound

Slide 10: Upper-Bound: Anti-Monotonic (cont.)
Rule of thumb: the larger the frequency difference of a graph pattern between the positive and negative datasets, the more interesting the pattern.
With an anti-monotonic upper bound, existing graph mining algorithms can be recycled to handle non-monotonic objective functions.
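A minimal Python sketch of such a bound, assuming the objective F(p, q) is convex in the frequencies, non-negative, and zero when p = q. Information gain meets these conditions for fixed class sizes, since mutual information is convex in the class-conditional distribution. Because any supergraph g' of g has p' <= p and q' <= q, convexity gives F(g') <= max(F(p, 0), F(0, q)); this bound only shrinks as p and q shrink, so it is anti-monotonic. For the frequency-difference rule of thumb the same argument gives max(p, q). `info_gain` and `upper_bound` are illustrative names.

```python
import math


def info_gain(p, q, n_pos, n_neg):
    """Mutual information (nats) between 'contains pattern' and class label."""
    n = n_pos + n_neg
    present = p * n_pos + q * n_neg  # graphs that contain the pattern
    # (observed count, row total, column total) for each cell of the 2x2 table
    cells = [(p * n_pos, present, n_pos), (q * n_neg, present, n_neg),
             ((1 - p) * n_pos, n - present, n_pos),
             ((1 - q) * n_neg, n - present, n_neg)]
    return sum((o / n) * math.log(o * n / (r * c)) for o, r, c in cells if o > 0)


def upper_bound(p, q, n_pos, n_neg):
    """Bound on the score of every supergraph of a pattern with frequencies (p, q)."""
    return max(info_gain(p, 0.0, n_pos, n_neg), info_gain(0.0, q, n_pos, n_neg))


# Empirical check on a grid: every (p', q') with p' <= p and q' <= q stays under the bound.
n_pos, n_neg, p, q = 50, 150, 0.6, 0.3
bound = upper_bound(p, q, n_pos, n_neg)
grid = [i / 20 for i in range(21)]
assert all(info_gain(a, b, n_pos, n_neg) <= bound + 1e-12
           for a in grid if a <= p for b in grid if b <= q)
```

The grid check is only a sanity test, not a proof; the proof is the convexity argument above.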

Slide 11: Vertical Pruning
Large <- small
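Vertical pruning can be sketched as a depth-first branch-and-bound search. This Python version grows itemsets (a stand-in for gSpan-style graph growth) under the frequency-difference objective |p - q|, whose bound over all super-patterns is max(p, q). A branch whose bound cannot exceed the best score found so far is never expanded. The function name and toy data are hypothetical.

```python
def freq(pattern, data):
    """Fraction of transactions in `data` that contain `pattern`."""
    return sum(pattern <= t for t in data) / len(data)


def branch_and_bound(pos, neg):
    """Maximize |p - q| over itemsets with upper-bound (vertical) pruning."""
    items = sorted({i for t in pos + neg for i in t})
    best = (0.0, frozenset())
    visited = 0

    def dfs(pattern, start):
        nonlocal best, visited
        for k in range(start, len(items)):
            child = pattern | {items[k]}
            p, q = freq(child, pos), freq(child, neg)
            if p == 0 and q == 0:
                continue  # pattern occurs nowhere; neither do its extensions
            visited += 1
            if abs(p - q) > best[0]:
                best = (abs(p - q), child)
            # Every extension has p' <= p and q' <= q, so |p' - q'| <= max(p, q).
            if max(p, q) > best[0]:
                dfs(child, k + 1)

    dfs(frozenset(), 0)
    return best, visited


pos = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
neg = [{"a"}, {"a"}, {"a", "c"}, {"c"}]
(best_score, best_pattern), visited = branch_and_bound(pos, neg)
print(best_score, sorted(best_pattern), visited)  # 0.5 ['a', 'b'] 5
```

The search scores 5 of the 7 non-empty itemsets here; the savings grow with the depth of the pattern lattice.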

Slide 12: Horizontal Pruning: Structural Proximity

Slide 13: Structural Proximity: Another Perspective
# of frequent patterns >> # of possible frequency pairs, so many patterns share the same score.
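A small Python illustration of the counting argument, with itemsets standing in for graphs and hypothetical toy data. With n+ positive and n- negative graphs there are at most (n+ + 1)(n- + 1) distinct support-count pairs, so when patterns vastly outnumber pairs, many patterns must share a pair and therefore share any score F(p, q).

```python
from collections import Counter
from itertools import combinations

pos = [{"a", "b", "c", "d"}] * 3 + [{"e"}]
neg = [{"a", "b", "c", "d"}] + [{"e"}] * 3
items = sorted({i for t in pos + neg for i in t})

# Group every non-empty pattern that occurs somewhere by its support pair.
pairs = Counter()
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        pattern = frozenset(combo)
        support = (sum(pattern <= t for t in pos), sum(pattern <= t for t in neg))
        if support != (0, 0):
            pairs[support] += 1

n_patterns = sum(pairs.values())
print(n_patterns, len(pairs), (len(pos) + 1) * (len(neg) + 1))  # 16 2 25
print(pairs.most_common(1))  # [((3, 1), 15)]
```

All 15 subsets of {a, b, c, d} have the same support pair (3, 1), so evaluating one of them tells us the score of all 15.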

Slide 14: Frequency Envelope

Slide 15: Structural Leap Search
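The paper's exact leap criterion is not reproduced on these slides, so the following Python sketch uses a hypothetical proximity test. A sibling pattern is skipped, together with its extensions, when its supporting sets in both classes are within Jaccard distance `sigma` of those of a sibling whose subtree was already searched. Itemsets again stand in for graphs, and |p - q| is the assumed objective. With sigma = 0 only siblings with identical supports are skipped, which in this enumeration order should not change the result; larger sigma trades exactness for speed.

```python
def leap_search(pos, neg, sigma):
    """Maximize |p - q| with vertical pruning plus a hypothetical sibling leap."""
    items = sorted({i for t in pos + neg for i in t})
    best = (0.0, frozenset())

    def cover(pattern, data):
        # Indices of the transactions that contain `pattern`.
        return frozenset(i for i, t in enumerate(data) if pattern <= t)

    def close(c1, c2):
        # Jaccard distance between two supporting sets (both empty counts as close).
        union = c1 | c2
        return not union or len(c1 ^ c2) / len(union) <= sigma

    def dfs(pattern, start):
        nonlocal best
        expanded = []  # supporting sets of siblings whose subtrees were searched
        for k in range(start, len(items)):
            child = pattern | {items[k]}
            cp, cn = cover(child, pos), cover(child, neg)
            if not cp and not cn:
                continue
            p, q = len(cp) / len(pos), len(cn) / len(neg)
            if abs(p - q) > best[0]:
                best = (abs(p - q), child)
            # Horizontal pruning: leap over a sibling close to one already expanded.
            if any(close(cp, sp) and close(cn, sn) for sp, sn in expanded):
                continue
            # Vertical pruning: extensions cannot score above max(p, q).
            if max(p, q) > best[0]:
                expanded.append((cp, cn))
                dfs(child, k + 1)

    dfs(frozenset(), 0)
    return best


pos = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
neg = [{"a"}, {"a"}, {"a", "c"}, {"c"}]
print(leap_search(pos, neg, sigma=0.0))  # only exact duplicates are skipped
print(leap_search(pos, neg, sigma=0.5))  # may skip more branches
```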

Slide 16: Frequency Association
Significant patterns often fall into the high quantile of frequency, so start with the most frequent patterns.
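One way to see this association, under the convex-objective assumption behind the upper bound: a pattern with frequencies (p, q) can score at most max(F(p, 0), F(0, q)), and both terms grow with frequency. Once a pattern with score F* is known, no pattern whose positive frequency is below θ+ and whose negative frequency is below θ- can beat it, where F(θ+, 0) = F(0, θ-) = F*. This Python sketch finds those thresholds by bisection, using information gain as an example objective; the function names are illustrative.

```python
import math


def info_gain(p, q, n_pos, n_neg):
    """Mutual information (nats) between 'contains pattern' and class label."""
    n = n_pos + n_neg
    present = p * n_pos + q * n_neg
    cells = [(p * n_pos, present, n_pos), (q * n_neg, present, n_neg),
             ((1 - p) * n_pos, n - present, n_pos),
             ((1 - q) * n_neg, n - present, n_neg)]
    return sum((o / n) * math.log(o * n / (r * c)) for o, r, c in cells if o > 0)


def frequency_thresholds(best, n_pos, n_neg, tol=1e-9):
    """Return (theta_pos, theta_neg): below both, a pattern's bound is < best.

    A side is None if even frequency 1.0 on that side cannot reach `best`.
    """
    def solve(f):
        if f(1.0) < best:
            return None
        lo, hi = 0.0, 1.0  # invariant: f(lo) < best <= f(hi)
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) < best:
                lo = mid
            else:
                hi = mid
        return hi

    return (solve(lambda t: info_gain(t, 0.0, n_pos, n_neg)),
            solve(lambda t: info_gain(0.0, t, n_pos, n_neg)))


best = info_gain(0.5, 0.0, 100, 100)          # score of an already-found pattern
print(frequency_thresholds(best, 100, 100))   # both close to 0.5 by symmetry
```

Bisection is valid here because F(t, 0) is convex with its minimum 0 at t = 0, hence increasing on (0, 1].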

Slide 17: Descending Leap Mine
1. Structural leap search with a frequency threshold
2. Support-descending mining
3. Structural leap search
F(g*) converges
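A minimal Python sketch of the support-descending loop, under stated assumptions. Itemsets stand in for graphs. An exhaustive search over patterns with max(p, q) >= theta stands in for structural leap search. The threshold is halved each round, and "converges" is taken to mean that the best score stops improving between rounds. The final leap search of step 3 is omitted. `mine_best`, `descending_mine`, and the halving schedule are hypothetical choices.

```python
from itertools import combinations


def freq(pattern, data):
    """Fraction of transactions in `data` that contain `pattern`."""
    return sum(pattern <= t for t in data) / len(data)


def mine_best(pos, neg, theta):
    """Best |p - q| among patterns with max(p, q) >= theta (exhaustive stand-in)."""
    items = sorted({i for t in pos + neg for i in t})
    best = (0.0, None)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            g = frozenset(combo)
            p, q = freq(g, pos), freq(g, neg)
            if max(p, q) >= theta and abs(p - q) > best[0]:
                best = (abs(p - q), g)
    return best


def descending_mine(pos, neg, theta=0.5, min_theta=0.01):
    best = (0.0, None)
    while theta >= min_theta:
        score, pattern = mine_best(pos, neg, theta)
        if score > best[0]:
            best = (score, pattern)
        elif best[1] is not None:
            break  # F(g*) stopped improving: treat it as converged
        theta /= 2  # lower the support threshold and mine again
    return best, theta


pos = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
neg = [{"a"}, {"a"}, {"a", "c"}, {"c"}]
print(descending_mine(pos, neg))  # ((0.5, frozenset({'b'})), 0.25)
```

Stopping at the first non-improving round is a heuristic: it skips the expensive low-support rounds whenever the high-support patterns already contain the optimum.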

Slide 18: Results: NCI Anti-Cancer Screen Datasets

Name      # of Compounds   Tumor Description
MCF-7     27,770           Breast
MOLT-4    39,765           Leukemia
NCI-H23   40,353           Non-Small Cell Lung
OVCAR-8   40,516           Ovarian
P388      41,472           Leukemia
PC-3      27,509           Prostate
SF-295    40,271           Central Nervous System
SN12C     40,004           Renal
SW-620    40,532           Colon
UACC257   39,988           Melanoma
YEAST     79,601           Yeast anti-cancer

Chemical compounds: anti-cancer or not. # of vertices: 10 ~ 200.

Slide 19: Efficiency (chart: vertical pruning, horizontal pruning)

Slide 20: Effectiveness (runtime) (chart: frequency descending + leap mine)

Slide 21: Effectiveness (accuracy): slightly different

Slide 22: Graph Classification

Name           OA Kernel   LEAP   OA Kernel (6x)   LEAP (6x)
Average (AUC)

* OA Kernel: Optimal Assignment Kernel; LEAP: LEAP search

Slide 23: Scalability Means Something!
Runtime chart for LEAP, OA, LEAP (6x), and OA (6x), with annotated runtimes of ~20 s, ~100 s, ~200 s, and ~8000 s and growth curves labeled Linear and Quadratic.
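The Linear and Quadratic labels can be made concrete with a rough operation count, assuming (6x) denotes a training set six times larger (the slides do not define it). A kernel method such as the OA kernel fills an n x n symmetric kernel matrix, while a pattern-based classifier tests each graph against a fixed number k of mined patterns. Both helper functions are hypothetical cost models, not measurements.

```python
def kernel_pairs(n: int) -> int:
    """Entries of a symmetric n x n kernel matrix (upper triangle plus diagonal)."""
    return n * (n + 1) // 2


def pattern_tests(n: int, k: int) -> int:
    """Containment tests for n graphs against k mined patterns."""
    return n * k


n, k = 1000, 100
print(kernel_pairs(6 * n) / kernel_pairs(n))          # about 36: quadratic growth
print(pattern_tests(6 * n, k) / pattern_tests(n, k))  # exactly 6: linear growth
```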

Slide 24: Direct Pattern Mining Framework
Graph database -> optimal graph patterns (mined directly) -> exploratory tasks: graph clustering, graph classification, graph index

Slide 25: Beyond Graph Patterns
Itemset / sequence / tree database -> optimal patterns (mined directly) -> exploratory tasks: clustering, classification, index
1. Direct mining can be applied to itemsets, sequences, and trees.
2. Existing algorithms can be recycled to mine patterns with sophisticated measures.
3. Pattern-based methods, including indexing and classification, are competitive.

Slide 26: Thank you
Direct Mining of Discriminative and Essential Graphical and Itemset Features via Model-based Search Tree (Las Vegas)

Slide 27: Graph Classification: Kernel Approach
Kernel-based graph classification
Optimal Assignment Kernel (Fröhlich et al. ICML'05)
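For reference, the optimal assignment kernel of Fröhlich et al. scores two graphs by the best one-to-one matching of vertices under a vertex similarity. The brute-force Python sketch below uses a hypothetical vertex kernel that only compares atom labels. The original kernel also incorporates neighborhood structure into the vertex similarity and is computed with a polynomial-time assignment algorithm rather than enumeration.

```python
from itertools import permutations


def vertex_kernel(u: str, v: str) -> float:
    """Hypothetical vertex similarity: 1 if the labels match, else 0."""
    return 1.0 if u == v else 0.0


def optimal_assignment_kernel(labels_g, labels_h, kappa=vertex_kernel):
    """Max over injective maps of the smaller vertex set into the larger one
    of the summed vertex similarities (brute force: small graphs only)."""
    if len(labels_g) > len(labels_h):
        labels_g, labels_h = labels_h, labels_g
    best = 0.0
    for perm in permutations(range(len(labels_h)), len(labels_g)):
        total = sum(kappa(labels_g[i], labels_h[j]) for i, j in enumerate(perm))
        best = max(best, total)
    return best


print(optimal_assignment_kernel(["C", "C", "O"], ["C", "O", "N", "C"]))  # 3.0
```

Enumeration costs factorial time in the number of vertices, which is one reason kernel approaches become expensive on large molecule sets.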