1
COM (Co-Occurrence Miner): Graph Classification Based on Pattern Co-occurrence. Ning Jin, Calvin Young, Wei Wang. University of North Carolina at Chapel Hill. 11/04/2009
2
What Are Graphs? Graph: a set of nodes connected by a set of edges. Nodes and edges can have labels. Edges can have directions.
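A minimal sketch of such a graph as a data structure may help fix ideas. The class name `LabeledGraph`, its fields, and the example labels are illustrative assumptions, not part of the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """A small labeled graph; every name here is illustrative."""
    directed: bool = False
    node_labels: dict = field(default_factory=dict)  # node id -> label
    edge_labels: dict = field(default_factory=dict)  # (u, v) -> label

    def add_node(self, node, label=None):
        self.node_labels[node] = label

    def _key(self, u, v):
        # Undirected edges are stored under a canonical (sorted) key.
        return (u, v) if self.directed else tuple(sorted((u, v)))

    def add_edge(self, u, v, label=None):
        self.edge_labels[self._key(u, v)] = label

    def has_edge(self, u, v):
        return self._key(u, v) in self.edge_labels


# An undirected graph with two labeled nodes and one labeled edge.
g = LabeledGraph()
g.add_node(1, "C")
g.add_node(2, "O")
g.add_edge(2, 1, "double")

# A directed graph keeps the edge orientation.
d = LabeledGraph(directed=True)
d.add_edge(2, 1)
```

In the undirected graph, `g.has_edge(1, 2)` and `g.has_edge(2, 1)` agree; in the directed one, only the stored orientation is found.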
3
Graph Classification: Example Negative set: Positive set:
6
Graph Representation graphs Represented by
7
Interesting Properties in Data some most Determined by structure
8
Graph Classification: Classify graphs. Classify becomes: positive, negative, positive, negative. Function is determined by structure.
9
Graph Classification Using Frequent Subgraph Patterns: The positive graphs should have some common subgraph patterns that negative graphs don’t have. Generating classifiers: (1) frequent subgraph mining in the positive set (frequency >= threshold); (2) feature selection; (3) classification of the resulting high-dimensional data points.
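The pipeline above can be sketched on toy data, assuming subgraph matching has already been done and each pattern is known only by the set of graphs that contain it. The function names, pattern names, and graph ids below are hypothetical.

```python
def frequent_patterns(occurrences, positive_ids, threshold):
    """Keep patterns whose frequency in the positive set is >= threshold.

    occurrences: pattern name -> set of graph ids containing the pattern.
    """
    positive_ids = set(positive_ids)
    return {
        p: graphs
        for p, graphs in occurrences.items()
        if len(graphs & positive_ids) / len(positive_ids) >= threshold
    }


def to_feature_vectors(occurrences, graph_ids):
    """Turn each graph into a binary vector, one dimension per pattern."""
    patterns = sorted(occurrences)
    return {
        g: [1 if g in occurrences[p] else 0 for p in patterns]
        for g in graph_ids
    }


occurrences = {
    "A-B": {"P1", "P2", "N1"},
    "B-C": {"P1"},
    "C-D": {"P1", "P2", "P3"},
}
positives = ["P1", "P2", "P3"]

frequent = frequent_patterns(occurrences, positives, threshold=0.5)
vectors = to_feature_vectors(frequent, positives + ["N1"])
```

The resulting vectors could then be passed to any standard classifier; feature selection is omitted from this sketch.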
11
Graph Classification Using Discriminative Subgraph Patterns: Frequent subgraph mining in the positive set and feature selection merge into one step: mining discriminative/significant subgraph patterns. Scoring function: [formula not captured in this transcript]. Pattern redundancy: Pattern 1 is found in positive graphs P1, P2 and in negative graphs N1, N2. Pattern 2 is found in positive graphs P1, P2, P3 and in negative graphs N1. Pattern 1 is redundant given pattern 2.
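The redundancy example can be checked with plain set operations. The general rule used here (another pattern covers at least the same positive graphs and at most the same negative graphs) is an assumption read off the example; the slide gives only the one instance, and `is_redundant` is a hypothetical helper.

```python
def is_redundant(pattern, other):
    """Return True if `pattern` is redundant given `other`.

    Each pattern is a pair (positive support, negative support) of sets.
    Assumed rule: `other` covers every positive graph that `pattern`
    covers, and no negative graph that `pattern` does not cover.
    """
    pos, neg = pattern
    other_pos, other_neg = other
    return pos <= other_pos and other_neg <= neg


pattern_1 = ({"P1", "P2"}, {"N1", "N2"})
pattern_2 = ({"P1", "P2", "P3"}, {"N1"})

redundant_1_given_2 = is_redundant(pattern_1, pattern_2)
redundant_2_given_1 = is_redundant(pattern_2, pattern_1)
```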
13
Previous Discriminative Pattern Mining Methods: Each tree node represents a subgraph pattern. Each node is a supergraph of its parent node, with one more edge. One subgraph pattern corresponds to only one node. Scoring function: [formula not captured in this transcript]. Pattern redundancy: Pattern 1 is found in positive graphs G1, G2 and in negative graphs G4, G5. Pattern 2 is found in positive graphs G1, G2, G3 and in negative graphs G4. Pattern 1 is redundant given pattern 2.
14
1. Heuristic Exploration Order. [Figure: Pattern 1 and Pattern 2.] Pattern redundancy: Pattern 1 is found in positive graphs G1, G2 and in negative graphs G4, G5. Pattern 2 is found in positive graphs G1, G2, G3 and in negative graphs G4. Pattern 1 is redundant given pattern 2.
15
Heuristic Exploration Order: Delta Score. [Figure: Pattern p’ and Pattern p.] Delta score of p = score of p - score of p’. It’s like looking for the maximum of a function: a large derivative versus a large absolute value.
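A small worked example of the delta score follows. Two things are assumptions: the transcript does not preserve the scoring function, so `score` below is a hypothetical stand-in (positive frequency minus negative frequency), and p’ is taken to be the pattern that p was extended from.

```python
def score(pos_support, neg_support, n_pos, n_neg):
    # Hypothetical stand-in for the slide's scoring function:
    # relative frequency in the positive set minus that in the negative set.
    return len(pos_support) / n_pos - len(neg_support) / n_neg


def delta_score(p, p_prime, n_pos, n_neg):
    """Delta score of p = score of p - score of p'."""
    return score(*p, n_pos, n_neg) - score(*p_prime, n_pos, n_neg)


# p' occurs in every graph; p keeps three positives and one negative.
p_prime = ({"P1", "P2", "P3", "P4"}, {"N1", "N2", "N3", "N4"})
p = ({"P1", "P2", "P3"}, {"N1"})

delta = delta_score(p, p_prime, n_pos=4, n_neg=4)
```

Here the score rises from 0.0 for p’ to 0.5 for p, so the delta score of p is 0.5.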
17
Workflow of Pattern Exploration: (1) Collect frequent edges in the positive set and insert them into a heap H. (2) If H is empty, terminate. (3) Otherwise, pop from H the pattern p with the highest delta score. (4) Extend pattern p, insert the new non-redundant patterns into H, and return to step 2. A frequency threshold t_p is needed.
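The loop above can be sketched with Python's `heapq`. This is a heavily simplified toy, not the presented algorithm: each "graph" is just a set of edge names, a "pattern" is a set of edges (connectivity and isomorphism are ignored), the score is a hypothetical frequency difference, and "non-redundant" is reduced to "not generated before". The names `explore` and `max_size` are assumptions.

```python
import heapq
from itertools import count


def explore(positives, negatives, t_p, max_size=3):
    """positives / negatives: dict graph id -> set of edge names."""

    def support(pattern, graphs):
        return {g for g, edges in graphs.items() if pattern <= edges}

    def score(pattern):
        # Hypothetical score: positive frequency minus negative frequency.
        return (len(support(pattern, positives)) / len(positives)
                - len(support(pattern, negatives)) / len(negatives))

    def frequent(pattern):
        return len(support(pattern, positives)) / len(positives) >= t_p

    all_edges = set().union(*positives.values())
    tie = count()  # tie-breaker so the heap never compares patterns
    heap, seen, visited = [], set(), []

    # Step 1: collect frequent edges in the positive set.
    for e in sorted(all_edges):
        p = frozenset([e])
        if frequent(p):
            seen.add(p)
            # heapq is a min-heap, so the delta score is negated.
            heapq.heappush(heap, (-score(p), next(tie), p))

    # Steps 2 to 4: stop when H is empty, else pop, extend, insert.
    while heap:
        _, _, p = heapq.heappop(heap)
        visited.append(p)
        if len(p) >= max_size:
            continue
        for e in sorted(all_edges - p):
            q = p | {e}
            if q not in seen and frequent(q):
                seen.add(q)
                heapq.heappush(heap, (-(score(q) - score(p)), next(tie), q))
    return visited


positives = {"P1": {"A-B", "B-C"}, "P2": {"A-B", "B-C"}}
negatives = {"N1": {"A-B"}, "N2": {"B-C"}}
visited = explore(positives, negatives, t_p=1.0)
```

The integer tie-breaker keeps heap entries comparable when two patterns have the same delta score.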
18
2. Use Co-occurrences of Patterns. [Figure: Graph G and Graph G’, with nodes labeled A, B, C, D.] Can be approximated by co-occurrence.
19
When Co-occurrence Is Superior. Separately: A-B is found in N1, N2, P1, P2, P3, P4; B-C is found in N3, N4, P1, P2, P3, P4. Co-occurrence of A-B and B-C: P1, P2, P3, P4. No negative graphs.
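The example above amounts to a set intersection: a graph supports a co-occurrence only if it contains every pattern in it. The helper name `co_occurrence_support` is an illustrative assumption.

```python
def co_occurrence_support(*supports):
    """Graphs that contain every pattern of a co-occurrence."""
    return set.intersection(*supports)


a_b = {"N1", "N2", "P1", "P2", "P3", "P4"}
b_c = {"N3", "N4", "P1", "P2", "P3", "P4"}

both = co_occurrence_support(a_b, b_c)
negatives_hit = {g for g in both if g.startswith("N")}
```

Each pattern alone matches two negative graphs, while `both` contains only P1 to P4 and `negatives_hit` is empty.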
20
Co-occurrence Generation: A co-occurrence is a set of subgraph patterns: {p_1, p_2, …, p_m}. Candidate co-occurrences 1, 2, 3, 4, …, n are maintained. For each new pattern p: find the candidate co-occurrence k such that merging candidate k and pattern p improves the score of p most significantly, then insert the union of pattern p and candidate co-occurrence k.
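A minimal sketch of this step, under stated assumptions: the score is a hypothetical frequency difference (the slide's scoring function is not preserved), a merge is kept only if it strictly improves the score of p, and p is also kept as a single-pattern candidate. `add_pattern` and the tuple layout are illustrative choices.

```python
def score(pos, neg, n_pos, n_neg):
    # Hypothetical score: positive frequency minus negative frequency.
    return len(pos) / n_pos - len(neg) / n_neg


def add_pattern(candidates, name, pos, neg, n_pos, n_neg):
    """Merge new pattern `name` with the candidate that helps it most.

    candidates: list of (frozenset of pattern names, pos support, neg support).
    Returns the inserted union, or None if no merge improves the score.
    """
    base = score(pos, neg, n_pos, n_neg)
    best, best_gain = None, 0.0
    for names, c_pos, c_neg in candidates:
        # A graph supports the union only if it supports both parts.
        m_pos, m_neg = pos & c_pos, neg & c_neg
        gain = score(m_pos, m_neg, n_pos, n_neg) - base
        if gain > best_gain:
            best, best_gain = (names | {name}, m_pos, m_neg), gain
    # Assumption: the new pattern is also kept on its own.
    candidates.append((frozenset([name]), pos, neg))
    if best is not None:
        candidates.append(best)
    return best


all_pos = {"P1", "P2", "P3", "P4"}
candidates = []
first = add_pattern(candidates, "A-B", all_pos, {"N1", "N2"}, 4, 4)
second = add_pattern(candidates, "B-C", all_pos, {"N3", "N4"}, 4, 4)
```

With the A-B and B-C supports from the earlier example, the second call inserts the union {A-B, B-C}, which matches no negative graph.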
21
3. Use Association Rules to Classify. Association rule: {p_1, p_2, p_3, …, p_n} → “positive”. Input of COM (Co-Occurrence rule Miner): a positive graph set, a negative graph set, a frequency threshold t_p for classification rules in the positive set, and a frequency threshold t_n in the negative set. Output of COM: a set of association rules.
22
Association Rule Generation: Each candidate co-occurrence corresponds to a candidate association rule. If a rule has frequency >= t_p in the positive set and frequency <= t_n in the negative set, it is a resulting rule. Terminate when each positive graph is covered. Remove redundant rules.
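The rule filter and its use for classification can be sketched as follows. Several details are assumptions rather than the presented method: candidates are scanned in list order, redundancy removal is reduced to skipping rules that cover no new positive graph, and a graph that matches no rule is labeled "negative". The thresholds in the example are the protein-dataset settings (t_p = 30%, t_n = 0%).

```python
def select_rules(candidates, positives, n_neg, t_p, t_n):
    """candidates: list of (pattern names, pos support, neg support)."""
    rules, covered = [], set()
    for names, pos, neg in candidates:
        if covered >= positives:  # every positive graph is covered
            break
        if len(pos) / len(positives) >= t_p and len(neg) / n_neg <= t_n:
            # Simplified redundancy check: skip rules adding no coverage.
            if not pos <= covered:
                rules.append(frozenset(names))
                covered |= pos
    return rules


def classify(graph_patterns, rules):
    """Label a graph by the set of pattern names found in it."""
    if any(rule <= graph_patterns for rule in rules):
        return "positive"
    return "negative"  # assumption: default label when no rule fires


positives = {"P1", "P2", "P3", "P4"}
candidates = [
    ({"A-B"}, positives, {"N1", "N2"}),
    ({"A-B", "B-C"}, positives, set()),
]
rules = select_rules(candidates, positives, n_neg=4, t_p=0.3, t_n=0.0)

label_1 = classify({"A-B", "B-C", "C-D"}, rules)
label_2 = classify({"A-B"}, rules)
```

The single-pattern candidate is rejected because it occurs in half of the negative graphs, so only the co-occurrence {A-B, B-C} becomes a rule.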
23
Experiments: Datasets. Protein datasets: six SCOP families. Chemical datasets: six PubChem bioassays.
24
Experiments: Parameters & Evaluation. Protein datasets: t_p = 30%, t_n = 0%. Chemical datasets: t_p = 1%, t_n = 0.4%.
25
Experimental Results: Protein Datasets
26
Experimental Results: Chemical Datasets
27
Conclusions: Using a heuristic pattern exploration order and co-occurrences can improve the runtime efficiency of mining discriminative patterns. Using association rules can achieve competitive classification accuracy.
28
Questions & Suggestions