COM (Co-Occurrence Miner): Graph Classification Based on Pattern Co-occurrence. Ning Jin, Calvin Young, Wei Wang. University of North Carolina at Chapel Hill. 11/04/2009

What Are Graphs? Graph: a set of nodes connected by a set of edges. Nodes and edges can have labels. Edges can have directions.

Graph Classification: Example. [Figure: example graphs in the negative set and in the positive set.]

Graph Representation. [Figure: objects and the graphs that represent them.]

Interesting Properties in Data: some (or most) interesting properties are determined by structure.

Graph Classification: classifying graphs means labeling each graph as positive or negative. Function is determined by structure.

Graph Classification Using Frequent Subgraph Patterns: The positive graphs should have some common subgraph patterns that negative graphs don't have. Generating classifiers: (1) frequent subgraph mining in the positive set (frequency >= threshold); (2) feature selection; (3) classification of high-dimensional data points.
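
The three steps on this slide can be illustrated with a minimal sketch. It assumes the subgraph mining has already been done, so each pattern comes with the set of graphs that contain it. The names `frequent_patterns`, `feature_vector`, and the toy data are hypothetical and not from the slides; the feature selection step is reduced here to the frequency filter alone.

```python
from typing import Dict, List, Set

def frequent_patterns(occurrences: Dict[str, Set[str]],
                      positives: Set[str],
                      threshold: float) -> List[str]:
    """Keep patterns whose frequency in the positive set is >= threshold."""
    return sorted(
        p for p, graphs in occurrences.items()
        if len(graphs & positives) / len(positives) >= threshold
    )

def feature_vector(graph_id: str,
                   patterns: List[str],
                   occurrences: Dict[str, Set[str]]) -> List[int]:
    """One binary feature per pattern: 1 if the graph contains it, else 0."""
    return [1 if graph_id in occurrences[p] else 0 for p in patterns]

# Toy data (assumed): pattern -> ids of the graphs that contain it.
occurrences = {
    "A-B": {"P1", "P2", "N1"},
    "B-C": {"P1", "P2", "P3"},
    "C-D": {"P3", "N1", "N2"},
}
positives = {"P1", "P2", "P3"}

patterns = frequent_patterns(occurrences, positives, threshold=0.6)
vectors = {g: feature_vector(g, patterns, occurrences)
           for g in ["P1", "P2", "P3", "N1", "N2"]}
```

The resulting vectors are the high-dimensional data points that any standard classifier could then be trained on.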

Graph Classification Using Discriminative Subgraph Patterns: Frequent subgraph mining in the positive set and feature selection are merged into a single step: mining discriminative/significant subgraph patterns. Scoring function: [formula missing from the transcript]. Pattern redundancy: Pattern 1 is found in positive graphs P1, P2 and in negative graphs N1, N2. Pattern 2 is found in positive graphs P1, P2, P3 and in negative graphs N1. Pattern 1 is redundant given Pattern 2.

Previous Discriminative Pattern Mining Methods: Each tree node represents a subgraph pattern. Each node is a supergraph of its parent node, with one more edge. One subgraph pattern corresponds to only one node. Scoring function: [formula missing from the transcript]. Pattern redundancy: Pattern 1 is found in positive graphs G1, G2 and in negative graphs G4, G5. Pattern 2 is found in positive graphs G1, G2, G3 and in negative graphs G4. Pattern 1 is redundant given Pattern 2.
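
The redundancy example can be read as a set-containment test: Pattern 1 adds nothing if Pattern 2 covers at least the same positive graphs and at most the same negative graphs. That reading is an assumption drawn from the example, not a definition stated on the slide, and `is_redundant` is a hypothetical name.

```python
from typing import Set

def is_redundant(p1_pos: Set[str], p1_neg: Set[str],
                 p2_pos: Set[str], p2_neg: Set[str]) -> bool:
    """Assumed reading: pattern 1 is redundant given pattern 2 when
    pattern 2 matches every positive graph pattern 1 matches, and
    matches no negative graph that pattern 1 does not match."""
    return p1_pos <= p2_pos and p2_neg <= p1_neg

# The example from the slide.
pattern1 = ({"G1", "G2"}, {"G4", "G5"})
pattern2 = ({"G1", "G2", "G3"}, {"G4"})

p1_given_p2 = is_redundant(*pattern1, *pattern2)
p2_given_p1 = is_redundant(*pattern2, *pattern1)
```

With these inputs the test holds in one direction only: Pattern 1 is redundant given Pattern 2, but not the reverse.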

1. Heuristic Exploration Order. [Figure: Pattern 1 and Pattern 2 in the search tree.] Pattern redundancy: Pattern 1 is found in positive graphs G1, G2 and in negative graphs G4, G5. Pattern 2 is found in positive graphs G1, G2, G3 and in negative graphs G4. Pattern 1 is redundant given Pattern 2.

Heuristic Exploration Order: Delta Score. For patterns p and p', delta score of p = score of p - score of p'. It is like looking for the maximum of a function: large derivative, large absolute value.
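
A small worked example of the delta score follows. Two things here are assumptions: the scoring function is not recoverable from the transcript, so a stand-in (positive frequency minus negative frequency) is used, and p' is taken to be the pattern that p was extended from.

```python
from typing import Tuple

def score(pos_hits: int, neg_hits: int, n_pos: int, n_neg: int) -> float:
    """Stand-in score (assumed, not the authors' function):
    frequency in the positive set minus frequency in the negative set."""
    return pos_hits / n_pos - neg_hits / n_neg

def delta_score(p: Tuple[int, int], p_prime: Tuple[int, int],
                n_pos: int, n_neg: int) -> float:
    """Delta score of p = score of p - score of p'."""
    return score(*p, n_pos, n_neg) - score(*p_prime, n_pos, n_neg)

# (positive hits, negative hits) out of 4 positive and 4 negative graphs.
p_prime = (4, 3)   # assumed parent pattern: matches most negatives too
p = (4, 1)         # extension: keeps the positives, drops two negatives

d = delta_score(p, p_prime, n_pos=4, n_neg=4)
```

A large delta means the extension improved the score sharply, which is the signal the heuristic order uses to decide what to explore next.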

Workflow of Pattern Exploration: Collect frequent edges in the positive set and insert them into a heap H. If H is empty, terminate. Otherwise, pop from H the pattern p with the highest delta score, extend p, insert the new non-redundant patterns into H, and repeat. A frequency threshold t_p is needed.
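
The loop on this slide can be sketched as a best-first search over a max-heap. This is a toy illustration, not the authors' implementation: patterns are plain strings, and `extend`, `delta_score`, and `keep` (standing in for the frequency and non-redundancy checks) are hypothetical callables supplied by the caller.

```python
import heapq
from typing import Callable, Iterable, List

def explore(seeds: List[str],
            extend: Callable[[str], Iterable[str]],
            delta_score: Callable[[str], float],
            keep: Callable[[str], bool]) -> List[str]:
    """Pop the pattern with the highest delta score, extend it, and push
    the extensions that pass `keep`. Returns the exploration order."""
    # heapq is a min-heap, so scores are negated; the counter breaks ties.
    heap = [(-delta_score(p), i, p) for i, p in enumerate(seeds)]
    heapq.heapify(heap)
    counter = len(heap)
    seen = set(seeds)
    order = []
    while heap:                      # terminate when H is empty
        _, _, p = heapq.heappop(heap)
        order.append(p)
        for q in extend(p):
            if q not in seen and keep(q):
                seen.add(q)
                heapq.heappush(heap, (-delta_score(q), counter, q))
                counter += 1
    return order

# Toy setup (assumed): delta scores are looked up in a table, and any
# pattern missing from the table is treated as infrequent or redundant.
deltas = {"a": 0.1, "b": 0.5, "ba": 0.3, "bb": 0.05, "aa": 0.2}
order = explore(
    seeds=["a", "b"],
    extend=lambda p: [p + c for c in "ab"] if len(p) < 2 else [],
    delta_score=lambda p: deltas[p],
    keep=lambda q: q in deltas,
)
```

In this run the search follows "b" and its best extension before it returns to "a", which is the point of ordering by delta score rather than by depth or breadth.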

2. Use Co-occurrences of Patterns: a subgraph pattern can be approximated by a co-occurrence of patterns. [Figure: graphs G and G' with nodes labeled A, B, C, D.]

When Co-occurrence Is Superior. Separately: A-B is found in N1, N2, P1, P2, P3, P4; B-C is found in N3, N4, P1, P2, P3, P4. Co-occurrence of A-B and B-C: found in P1, P2, P3, P4, with no negative graphs.
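
The example on this slide amounts to a set intersection: a graph contains the co-occurrence only if it contains every pattern in it. A short sketch, with `co_occurrence_support` as a hypothetical helper name:

```python
from typing import Set

def co_occurrence_support(*supports: Set[str]) -> Set[str]:
    """Graphs that contain every pattern: the intersection of the supports."""
    result = set(supports[0])
    for s in supports[1:]:
        result &= s
    return result

# Supports taken from the slide.
a_b = {"N1", "N2", "P1", "P2", "P3", "P4"}
b_c = {"N3", "N4", "P1", "P2", "P3", "P4"}

both = co_occurrence_support(a_b, b_c)
negatives_hit = {g for g in both if g.startswith("N")}
```

Each pattern alone matches two negative graphs, while the pair together matches only the four positive graphs.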

Co-occurrence Generation: A co-occurrence is a set of subgraph patterns: {p_1, p_2, …, p_m}. A list of candidate co-occurrences 1, 2, …, n is maintained. For each new pattern p: insert pattern p, and insert the union of pattern p and candidate co-occurrence k, where k is the candidate whose merge with pattern p improves the score of p most significantly.
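
One way to read this slide is sketched below. The details are assumptions: the score is a stand-in (positive frequency minus negative frequency), a merge is inserted only if it strictly improves the score of p, and the new pattern is also kept as a single-pattern candidate. `add_pattern` and the toy data are hypothetical.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

Candidate = Tuple[FrozenSet[str], Set[str]]   # (patterns, supporting graphs)

def score(support: Set[str], positives: Set[str], negatives: Set[str]) -> float:
    """Stand-in score (assumed): positive frequency minus negative frequency."""
    return (len(support & positives) / len(positives)
            - len(support & negatives) / len(negatives))

def add_pattern(candidates: List[Candidate], name: str, support: Set[str],
                positives: Set[str], negatives: Set[str]) -> Optional[Candidate]:
    """Merge the new pattern with the candidate that improves its score most,
    then insert both the pattern and the best merge (if any)."""
    base = score(support, positives, negatives)
    best_gain, best = 0.0, None
    for members, cand_support in candidates:
        merged = support & cand_support        # graphs containing all patterns
        gain = score(merged, positives, negatives) - base
        if gain > best_gain:
            best_gain, best = gain, (members | {name}, merged)
    candidates.append((frozenset({name}), set(support)))
    if best is not None:
        candidates.append(best)
    return best

positives = {"P1", "P2", "P3", "P4"}
negatives = {"N1", "N2", "N3", "N4"}
candidates: List[Candidate] = []

first = add_pattern(candidates, "A-B",
                    {"N1", "N2", "P1", "P2", "P3", "P4"}, positives, negatives)
second = add_pattern(candidates, "B-C",
                     {"N3", "N4", "P1", "P2", "P3", "P4"}, positives, negatives)
```

Here the first pattern has nothing to merge with, and the second is merged with it because the union raises the stand-in score from 0.5 to 1.0.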

3. Use Association Rules to Classify. Association rule: {p_1, p_2, p_3, …, p_n} → "positive". Input of COM (Co-Occurrence rule Miner): a positive graph set and a negative graph set; a frequency threshold t_p for classification rules in the positive set and a frequency threshold t_n in the negative set. Output of COM: a set of association rules.
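
Applying such rules can be sketched as a subset test: a rule fires when the graph contains every pattern in it. The slides do not say how a graph that matches no rule is labeled; treating it as negative is an assumption here, and `classify` is a hypothetical name.

```python
from typing import FrozenSet, List, Set

def classify(graph_patterns: Set[str], rules: List[FrozenSet[str]]) -> str:
    """Label a graph 'positive' if it contains all patterns of at least one
    rule. Defaulting to 'negative' otherwise is an assumption."""
    if any(rule <= graph_patterns for rule in rules):
        return "positive"
    return "negative"

# Toy rules and graphs (assumed), each graph given as the patterns it contains.
rules = [frozenset({"A-B", "B-C"}), frozenset({"C-D"})]

label_1 = classify({"A-B", "B-C", "D-E"}, rules)   # matches the first rule
label_2 = classify({"A-B", "D-E"}, rules)          # only part of a rule
label_3 = classify({"C-D"}, rules)                 # matches the second rule
```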

Association Rule Generation: Each candidate co-occurrence corresponds to a candidate association rule. If a rule has frequency >= t_p in the positive set and frequency <= t_n in the negative set, it is a resulting rule. Terminate when each positive graph is covered. Remove redundant rules.
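
The selection step can be sketched as a single pass over the candidates. This is a simplified illustration under stated assumptions: candidates are examined in the order given, and redundancy removal is reduced to skipping any rule that covers no new positive graph. `select_rules` and the toy data are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

def select_rules(candidates: List[Tuple[FrozenSet[str], Set[str]]],
                 positives: Set[str], negatives: Set[str],
                 t_p: float, t_n: float) -> List[FrozenSet[str]]:
    """Keep candidates with positive frequency >= t_p and negative
    frequency <= t_n; stop once every positive graph is covered."""
    rules: List[FrozenSet[str]] = []
    covered: Set[str] = set()
    for patterns, support in candidates:
        pos = support & positives
        neg = support & negatives
        if len(pos) / len(positives) >= t_p and len(neg) / len(negatives) <= t_n:
            if not pos <= covered:      # simplified redundancy check (assumed)
                rules.append(patterns)
                covered |= pos
        if covered == positives:        # each positive graph is covered
            break
    return rules

positives = {"P1", "P2", "P3", "P4"}
negatives = {"N1", "N2"}
candidates = [
    (frozenset({"A-B"}), {"P1", "P2", "P3", "N1"}),      # too many negatives
    (frozenset({"A-B", "B-C"}), {"P1", "P2", "P3"}),     # accepted
    (frozenset({"C-D", "D-E"}), {"P1", "P2"}),           # covers nothing new
    (frozenset({"D-E", "E-F"}), {"P3", "P4"}),           # accepted, completes cover
    (frozenset({"X"}), {"P1", "P2", "P3", "P4"}),        # never examined
]

rules = select_rules(candidates, positives, negatives, t_p=0.5, t_n=0.0)
```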

Experiments: Datasets. Protein datasets: six SCOP families. Chemical datasets: six PubChem bioassays.

Experiments: Parameters & Evaluation. Protein datasets: t_p = 30%, t_n = 0%. Chemical datasets: t_p = 1%, t_n = 0.4%.

Experimental Results: Protein Datasets

Experimental Results: Chemical Datasets

Conclusions: Using a heuristic pattern exploration order and co-occurrences can improve the runtime efficiency of mining discriminative patterns. Using association rules can achieve competitive classification accuracy.

Questions & Suggestions
