COM (Co-Occurrence Miner): Graph Classification Based on Pattern Co-occurrence. Ning Jin, Calvin Young, Wei Wang. University of North Carolina at Chapel Hill.

Presentation transcript:

COM (Co-Occurrence Miner): Graph Classification Based on Pattern Co-occurrence. Ning Jin, Calvin Young, Wei Wang. University of North Carolina at Chapel Hill. 11/04/2009

What Are Graphs? Graph: a set of nodes connected by a set of edges. Nodes and edges can have labels. Edges can have directions.

Graph Classification: Example. [Figure: example graphs in the negative set and in the positive set.]

Graph Representation. Graphs are represented by [figure: representation not preserved].

Interesting Properties in Data: determined by structure.

Graph Classification. Classifying graphs becomes classifying their representations, each labeled positive or negative. Function is determined by structure.

Graph Classification Using Frequent Subgraph Patterns. The positive graphs should have some common subgraph patterns that negative graphs don't have. Generate classifiers: frequent subgraph mining in the positive set (frequency >= threshold), then feature selection, then classification of the resulting high-dimensional data points.

Graph Classification Using Discriminative Subgraph Patterns. Frequent subgraph mining in the positive set and feature selection merge into one step: mining discriminative/significant subgraph patterns. Scoring function: [formula not preserved]. Pattern redundancy: Pattern 1 is found in positive graphs P1, P2 and in negative graphs N1, N2. Pattern 2 is found in positive graphs P1, P2, P3 and in negative graph N1. Pattern 1 is redundant given pattern 2.
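The redundancy example on this slide can be checked mechanically. The sketch below is only an illustration: the general rule it encodes (pattern 2 covers at least the same positive graphs and at most the same negative graphs as pattern 1) is an assumption inferred from the slide's single example, and the function name `is_redundant` and the dictionary layout are hypothetical.

```python
def is_redundant(p, q):
    """Return True if pattern p is redundant given pattern q.

    Assumed rule (inferred from the slide's example, not stated by the
    authors): q occurs in every positive graph that p occurs in, and q
    occurs in no negative graph that p avoids.
    """
    return p["pos"] <= q["pos"] and q["neg"] <= p["neg"]


# Supports taken from the slide.
pattern1 = {"pos": {"P1", "P2"}, "neg": {"N1", "N2"}}
pattern2 = {"pos": {"P1", "P2", "P3"}, "neg": {"N1"}}

print(is_redundant(pattern1, pattern2))  # pattern 1 given pattern 2
print(is_redundant(pattern2, pattern1))  # pattern 2 given pattern 1
```

Under this assumed rule, pattern 1 is redundant given pattern 2, but not the other way around.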

Previous Discriminative Pattern Mining Methods. Each tree node represents a subgraph pattern. Each node is a supergraph of its parent node, with one more edge. One subgraph pattern corresponds to only one node. Scoring function: [formula not preserved]. Pattern redundancy: Pattern 1 is found in positive graphs G1, G2 and in negative graphs G4, G5. Pattern 2 is found in positive graphs G1, G2, G3 and in negative graph G4. Pattern 1 is redundant given pattern 2.

1. Heuristic Exploration Order. [Figure: Pattern 1 and Pattern 2.] Pattern redundancy: Pattern 1 is found in positive graphs G1, G2 and in negative graphs G4, G5. Pattern 2 is found in positive graphs G1, G2, G3 and in negative graph G4. Pattern 1 is redundant given pattern 2.

Heuristic Exploration Order: Delta Score. [Figure: Pattern p and Pattern p'.] Delta score of p = score of p - score of p'. It's like looking for the maximum of a function: large derivative, large absolute value.

Workflow of Pattern Exploration. Collect frequent edges in the positive set and insert them into a heap H. If H is empty, terminate. Otherwise, pop from H the pattern p with the highest delta score, extend pattern p, insert the new non-redundant patterns into H, and repeat. A frequency threshold t_p is needed.
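The loop on this slide can be sketched as a best-first search over a heap. This is a toy illustration under several loud assumptions, not the authors' implementation: each graph is reduced to a set of labeled edges (so containment replaces subgraph isomorphism), the score is a hypothetical "positive frequency minus negative frequency", p' is taken to be the pattern that p was extended from, and the redundancy test is reduced to skipping patterns already seen. The names `support` and `explore` are made up for this sketch.

```python
import heapq
from itertools import count


def support(pattern, graphs):
    """Indices of graphs whose edge set contains every edge of the pattern."""
    return frozenset(i for i, g in enumerate(graphs) if pattern <= g)


def explore(pos, neg, t_p):
    """Return patterns in the order they are popped from the heap H."""

    def score(p):
        # Hypothetical score: positive frequency minus negative frequency.
        return len(support(p, pos)) / len(pos) - len(support(p, neg)) / len(neg)

    def frequent(p):
        # Frequency threshold t_p applies to the positive set.
        return len(support(p, pos)) / len(pos) >= t_p

    edges = sorted({e for g in pos for e in g if frequent(frozenset([e]))})
    tie = count()  # tie-breaker so the heap never compares patterns
    heap, seen, order = [], set(), []
    for e in edges:
        p = frozenset([e])
        seen.add(p)
        # heapq is a min-heap, so the negated delta score is pushed.
        # A single edge has no parent here; its own score is used as delta.
        heapq.heappush(heap, (-score(p), next(tie), p))

    while heap:  # terminate once H is empty
        _, _, p = heapq.heappop(heap)  # highest delta score first
        order.append(p)
        for e in edges:  # extend p by one frequent edge
            q = p | {e}
            if q in seen or not frequent(q):
                continue
            seen.add(q)
            heapq.heappush(heap, (-(score(q) - score(p)), next(tie), q))
    return order


# Toy data: each graph is a set of labeled edges.
pos = [{"A-B", "B-C"}, {"A-B", "B-C"}, {"A-B", "B-C", "C-D"}]
neg = [{"A-B"}, {"B-C"}]
order = explore(pos, neg, t_p=0.6)
print(order)
```

A real miner would extend patterns by graph-aware edge growth and would use the authors' scoring function in place of the placeholder above.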

2. Use Co-occurrences of Patterns. [Figure: graphs G and G', with nodes labeled A, B, C, and D.] Can be approximated by co-occurrence.

When Co-occurrence Is Superior. Separately: A-B occurs in N1, N2, P1, P2, P3, P4; B-C occurs in N3, N4, P1, P2, P3, P4. Co-occurrence of A-B and B-C: P1, P2, P3, P4. No negative graphs.
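The example above amounts to a set intersection: a graph contains the co-occurrence only if it contains both patterns. A minimal sketch, using the graph names from the slide (the variable names are arbitrary):

```python
# Graphs in which each pattern occurs, as listed on the slide.
ab = {"N1", "N2", "P1", "P2", "P3", "P4"}
bc = {"N3", "N4", "P1", "P2", "P3", "P4"}

# A graph supports the co-occurrence only if it supports both patterns.
both = ab & bc
print(sorted(both))  # ['P1', 'P2', 'P3', 'P4']
```

Each pattern alone matches two negative graphs, while the pair matches none, which is why the co-occurrence is the better discriminator here.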

Co-occurrence Generation. A co-occurrence is a set of subgraph patterns: {p_1, p_2, …, p_m}. A list of candidate co-occurrences 1, 2, 3, 4, …, n is maintained. For each new pattern p: insert pattern p, and insert the union of pattern p and candidate co-occurrence k, where k is the candidate whose merge with pattern p improves the score of p most significantly.
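One way to read this step is as a greedy merge. The sketch below is an assumption-laden illustration rather than the authors' procedure: candidates are stored as (pattern names, positive support, negative support) tuples, the support of a merged co-occurrence is the intersection of supports, and the score is a hypothetical "positive count minus negative count". The name `add_pattern` is made up.

```python
def score(pos, neg):
    # Hypothetical score: positive graphs covered minus negative graphs covered.
    return len(pos) - len(neg)


def add_pattern(candidates, name, pos, neg):
    """Insert pattern `name`, plus its best-scoring merge with a candidate."""
    best, best_gain = None, 0
    for members, c_pos, c_neg in candidates:
        # A merged co-occurrence is supported only where both parts occur.
        m_pos, m_neg = pos & c_pos, neg & c_neg
        gain = score(m_pos, m_neg) - score(pos, neg)
        if gain > best_gain:
            best, best_gain = (members | {name}, m_pos, m_neg), gain
    candidates.append((frozenset([name]), pos, neg))
    if best is not None:
        candidates.append(best)
    return candidates


positives = {"P1", "P2", "P3", "P4"}
candidates = []
add_pattern(candidates, "A-B", positives, {"N1", "N2"})
add_pattern(candidates, "B-C", positives, {"N3", "N4"})
for members, c_pos, c_neg in candidates:
    print(sorted(members), len(c_pos), len(c_neg))
```

With the supports from the A-B and B-C example, the second call adds both the single pattern B-C and the merged co-occurrence {A-B, B-C}, which matches no negative graph.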

3. Use Association Rules to Classify. Association rule: {p_1, p_2, p_3, …, p_n} → "positive". Input of COM (Co-Occurrence rule Miner): a positive graph set, a negative graph set, a frequency threshold t_p for classification rules in the positive set, and a frequency threshold t_n in the negative set. Output of COM: a set of association rules.

Association Rule Generation. Each candidate co-occurrence corresponds to a candidate association rule. If a rule has frequency >= t_p in the positive set and frequency <= t_n in the negative set, it is a resulting rule. Terminate when each positive graph is covered. Remove redundant rules.
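The threshold test and the way a rule fires can be sketched as follows. Everything here is illustrative: the tuple layout, the names `select_rules` and `classify`, and the choice to label a graph "negative" when no rule fires are assumptions, and redundant-rule removal is omitted. The thresholds are expressed as fractions.

```python
def select_rules(candidates, n_pos, n_neg, t_p, t_n):
    """Keep candidates that are frequent in positives and rare in negatives."""
    rules, covered = [], set()
    for members, pos, neg in candidates:
        if len(pos) / n_pos >= t_p and len(neg) / n_neg <= t_n:
            rules.append(members)
            covered |= pos
        if len(covered) == n_pos:  # every positive graph is covered
            break
    return rules


def classify(rules, graph_patterns):
    """A rule fires when the graph contains every pattern in the rule."""
    fired = any(rule <= graph_patterns for rule in rules)
    # Assumption: a graph matched by no rule is labeled negative.
    return "positive" if fired else "negative"


positives = {"P1", "P2", "P3", "P4"}
candidates = [
    (frozenset({"A-B"}), positives, {"N1", "N2"}),
    (frozenset({"B-C"}), positives, {"N3", "N4"}),
    (frozenset({"A-B", "B-C"}), positives, set()),
]
rules = select_rules(candidates, n_pos=4, n_neg=4, t_p=0.3, t_n=0.0)
print(rules)
print(classify(rules, {"A-B", "B-C", "C-D"}))
print(classify(rules, {"A-B"}))
```

With t_n = 0, the single patterns are rejected because each occurs in negative graphs, and only the co-occurrence {A-B, B-C} survives as a rule.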

Experiments: Datasets. Protein datasets: six SCOP families. Chemical datasets: six PubChem bioassays.

Experiments: Parameters & Evaluation. Protein datasets: t_p = 30%, t_n = 0%. Chemical datasets: t_p = 1%, t_n = 0.4%.

Experimental Results: Protein Datasets

Experimental Results: Chemical Datasets

Conclusions. Using a heuristic pattern exploration order and co-occurrences can improve the runtime efficiency of mining discriminative patterns. Using association rules can achieve competitive classification accuracy.

Questions & Suggestions