Published by Isabel Page. Modified over 9 years ago.
1
Frequent Subgraph Pattern Mining on Uncertain Graph Data
Zhaonian Zou, Jianzhong Li, Hong Gao, Shuo Zhang
Harbin Institute of Technology, China
CIKM’09, Hong Kong, Nov 4, 2009
2
Outline: Background, Problem Definition, Algorithm, Experimental Results, Conclusions
3
Background
Graph mining has played an important role in a range of real-world applications:
medicine: structures of molecules
bioinformatics: biological networks
technology: the WWW
social science: social networks
and many others.
4
Directions of Graph Mining
Models of graphs, e.g., [Leskovec et al. KDD’05]
Patterns of graphs, e.g., [Yan et al. ICDM’02]
Uncertainties of graphs
Privacy of graphs, e.g., [Zou et al. VLDB’09]
Evolution of graphs, e.g., [Faloutsos et al. SIGMOD’07]
5
Uncertainties of Graphs: Example I
Protein-Protein Interaction (PPI) Networks
Vertices: proteins
Edges: interactions between proteins
Uncertainties: probabilities that the interactions really exist
[Figure: a fragment of a PPI network over the proteins TIF34, FET3, NTG1, SMT3, RAD59, and RPC40, with interaction probabilities such as 0.375, 0.639, 0.867, 0.651, 0.698, and 0.147 on its edges.]
The data are taken from the STRING Database (
6
Uncertainties of Graphs: Example II
Topologies of wireless sensor networks (WSNs)
Vertices: sensor nodes
Edges: wireless links between sensor nodes
Uncertainties: probabilities that the wireless links function at any given time
[Figure: a WSN topology whose links are labeled with probabilities 0.75, 0.95, 0.88, 0.92, and 0.69.]
7
The Goal of This Paper
Models of graphs, e.g., [Leskovec et al. KDD’05]
Patterns of graphs, e.g., [Yan et al. ICDM’02]
Uncertainties of graphs
Privacy of graphs, e.g., [Zou et al. VLDB’09]
Evolution of graphs, e.g., [Faloutsos et al. SIGMOD’07]
This paper combines two of these directions: it mines patterns (frequent subgraphs) from uncertain graphs.
8
Outline: Background, Problem Definition, Algorithm, Experimental Results, Conclusions
9
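Preliminaries
[Figure: a graph database of two graphs, G1 (vertex labels A, B; edge labels x, y) and G2 (vertex labels A, B; edge labels x, y, z), and example subgraph patterns with support 1.0 and 0.5.]
The support of a subgraph pattern S is the fraction of graphs in the database that contain S:
$$\mathrm{sup}(S) = \frac{\text{number of graphs containing } S}{\text{total number of graphs}}$$
A pattern contained in both G1 and G2 has support 1.0; a pattern contained in only one of them has support 0.5.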
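To make the support definition concrete, here is a minimal Python sketch. The graphs G1 and G2, the patterns S1 and S2, and the helper names (`graph`, `contains`, `support`) are all hypothetical: only the labels A, B, x, y, z come from the slide, and containment is tested by brute force over vertex mappings, which is only practical for tiny graphs.

```python
from itertools import permutations


def graph(vlabels, edges):
    """Hypothetical representation: (vertex -> label, {frozenset({u, v}): edge label})."""
    return vlabels, {frozenset((u, v)): lab for u, v, lab in edges}


# Hypothetical shapes; only the labels are borrowed from the slide.
G1 = graph({1: "A", 2: "B", 3: "B"}, [(1, 2, "x"), (1, 3, "y")])
G2 = graph({1: "A", 2: "B", 3: "B"}, [(1, 2, "x"), (1, 3, "y"), (2, 3, "z")])
DB = [G1, G2]


def contains(g, s):
    """Brute-force test for a label-preserving injective embedding of s in g."""
    gv, ge = g
    sv, se = s
    s_nodes = list(sv)
    for image in permutations(gv, len(s_nodes)):
        f = dict(zip(s_nodes, image))
        if any(sv[v] != gv[f[v]] for v in s_nodes):
            continue  # vertex labels must match
        if all(ge.get(frozenset(f[v] for v in e)) == lab for e, lab in se.items()):
            return True  # every pattern edge maps to a g edge with the same label
    return False


def support(s, db):
    """Fraction of graphs in db that contain s."""
    return sum(contains(g, s) for g in db) / len(db)


S1 = graph({1: "A", 2: "B"}, [(1, 2, "x")])
S2 = graph({1: "B", 2: "B"}, [(1, 2, "z")])
print(support(S1, DB), support(S2, DB))  # 1.0 0.5
```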
10
Frequent Subgraph Pattern Mining Problem
Input: a graph database D and a support threshold minsup
Output: all subgraph patterns with support no less than minsup
FSP mining on biological networks (e.g., PPI networks) is an important tool for discovering functional modules [Koyutürk et al. Bioinformatics’04; Turanalp et al. BMC Bioinformatics’08]. However, PPI networks are subject to uncertainties. How should support be defined on uncertain graphs?
11
Model of Uncertain Graphs
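[Figure: an uncertain graph with vertex labels A, B and edge labels x, y, whose four edges have existence probabilities 0.5, 0.6, 0.7, and 0.8, shown next to two of its implicated graphs.]
An uncertain graph G attaches an existence probability P(e) to each edge e. Each implicated graph I of G is one form in which G may exist: it keeps a subset of G's edges. Its probability is the product of P(e) over the kept edges and of 1 − P(e) over the missing edges:
$$\Pr(G \text{ implicates } I) = \prod_{e \in E(I)} P(e) \prod_{e \in E(G) \setminus E(I)} \bigl(1 - P(e)\bigr)$$
For example, the implicated graph that lacks only the edge with probability 0.5 exists with probability (1 − 0.5) × 0.6 × 0.7 × 0.8 = 0.168, and the one that lacks only the edge with probability 0.6 exists with probability 0.5 × (1 − 0.6) × 0.7 × 0.8 = 0.112.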
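A minimal sketch of the implicated-graph probability, assuming (as the products on the slide do) that edges exist independently. The edge names e1 to e4 and the function name `implicated_prob` are hypothetical.

```python
from math import prod

# Existence probabilities of the four edges of the example uncertain graph.
P = {"e1": 0.5, "e2": 0.6, "e3": 0.7, "e4": 0.8}


def implicated_prob(p, kept):
    """Probability of the implicated graph that keeps exactly the edges in
    `kept`, assuming edges exist independently."""
    return prod(p[e] if e in kept else 1.0 - p[e] for e in p)


print(round(implicated_prob(P, {"e2", "e3", "e4"}), 3))  # 0.168
print(round(implicated_prob(P, {"e1", "e3", "e4"}), 3))  # 0.112
```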
12
Model of Uncertain Graphs (Cont’d)
Theorem: An uncertain graph represents a probability distribution over all its implicated graphs.
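The theorem can be checked numerically on the same example: enumerating all 2^4 implicated graphs and summing their probabilities should give 1. A sketch, again assuming independent edges and using hypothetical edge names e1 to e4:

```python
from itertools import product
from math import isclose, prod

P = {"e1": 0.5, "e2": 0.6, "e3": 0.7, "e4": 0.8}


def implicated_graphs(p):
    """Yield (kept edge set, probability) for every implicated graph."""
    edges = list(p)
    for keep in product([False, True], repeat=len(edges)):
        kept = frozenset(e for e, k in zip(edges, keep) if k)
        yield kept, prod(p[e] if k else 1.0 - p[e] for e, k in zip(edges, keep))


dist = dict(implicated_graphs(P))
print(len(dist))                         # 16
print(isclose(sum(dist.values()), 1.0))  # True
```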
13
Uncertain Graph Databases
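[Figure: an uncertain graph database of two uncertain graphs, G1 (four edges with probabilities 0.5, 0.6, 0.7, 0.8) and G2 (three edges, one of them labeled z), with one implicated graph of each; together they form an implicated graph database.]
An implicated graph database of D consists of one implicated graph of each uncertain graph in D.
Theorem: An uncertain graph DB represents a probability distribution over all its implicated graph DBs.
In total, G1 and G2 have 2^4 × 2^3 = 128 implicated graph databases. The implicated graph database shown has probability
$$\bigl((1 - 0.5) \times 0.6 \times 0.7 \times 0.8\bigr) \times \bigl(0.8 \times 0.1 \times (1 - 0.7)\bigr) = 0.168 \times 0.024 = 4.032 \times 10^{-3}$$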
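The count and the worked probability above can be reproduced as follows. The edge probabilities of G2 (0.8, 0.1, 0.7) are read off the product on the slide, and the helper name `implicated_prob` is hypothetical.

```python
from math import prod

G1 = [0.5, 0.6, 0.7, 0.8]  # edge probabilities of uncertain graph G1
G2 = [0.8, 0.1, 0.7]       # edge probabilities of G2, read off the slide's product

# An uncertain graph with m edges has 2**m implicated graphs, and an implicated
# database picks one implicated graph per uncertain graph.
print(prod(2 ** len(g) for g in (G1, G2)))  # 128


def implicated_prob(probs, keep):
    """Probability of the implicated graph that keeps edge j iff keep[j] is True."""
    return prod(p if k else 1.0 - p for p, k in zip(probs, keep))


# The uncertain graphs are treated as independent, so the probabilities multiply.
p_db = implicated_prob(G1, [False, True, True, True]) * implicated_prob(G2, [True, True, False])
print(f"{p_db:.3e}")  # 4.032e-03
```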
14
Expected Support
An uncertain graph DB D implicates the graph databases d1, d2, …, dn. For each di, let
pi = Pr(D implicates di), and
si = the support of S in di.
The expected support of S is
$$\mathrm{esup}(S) = \sum_{i=1}^{n} p_i \, s_i$$
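Because the number of implicated graph databases grows exponentially, this definition can only be evaluated directly on tiny inputs. The sketch below does exactly that for a hypothetical two-graph database in which each uncertain graph lists its independent edge probabilities and the edge sets of all embeddings of a fixed pattern S. G1 reuses the edges and embeddings from the "Approximate Pr(S occurs in Gi) by [li, ui]" slide; G2 and its single embedding are made up. S is contained in an implicated graph iff every edge of some embedding is present.

```python
from itertools import product

# Hypothetical toy database: independent edge probabilities plus the edge sets
# of all embeddings of pattern S in each uncertain graph.
G1 = {"probs": {"x1": 0.5, "x2": 0.6, "x3": 0.7, "x4": 0.8},
      "embeddings": [{"x1", "x2"}, {"x1", "x4"}, {"x2", "x3"}, {"x3", "x4"}]}
G2 = {"probs": {"y1": 0.8, "y2": 0.1, "y3": 0.7},
      "embeddings": [{"y1", "y3"}]}
DB = [G1, G2]


def implicated_graphs(g):
    """Yield (kept_edges, probability) for every implicated graph of g."""
    edges = list(g["probs"])
    for keep in product([False, True], repeat=len(edges)):
        kept = {e for e, k in zip(edges, keep) if k}
        p = 1.0
        for e, k in zip(edges, keep):
            p *= g["probs"][e] if k else 1.0 - g["probs"][e]
        yield kept, p


def contains_s(g, kept):
    """S is contained in an implicated graph iff one embedding fully survives."""
    return any(emb <= kept for emb in g["embeddings"])


def expected_support(db):
    """esup(S) = sum over implicated databases d_i of p_i * s_i."""
    esup = 0.0
    # Each combination picks one implicated graph per uncertain graph.
    for combo in product(*(list(implicated_graphs(g)) for g in db)):
        p_i, hits = 1.0, 0
        for g, (kept, p) in zip(db, combo):
            p_i *= p
            hits += contains_s(g, kept)
        esup += p_i * hits / len(db)  # s_i = hits / |D|
    return esup


print(expected_support(DB))  # approximately 0.671
```

The result, about 0.671, equals the average of the per-graph occurrence probabilities (0.782 for G1 and 0.8 × 0.7 = 0.56 for G2), as linearity of expectation predicts.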
15
FSP Mining Problem on Uncertain Graphs
Input: an uncertain graph database D and an expected support threshold minsup
Output: all subgraph patterns with expected support no less than minsup
It is #P-hard to count the number of frequent subgraph patterns; the proof is by reduction from the problem of counting the satisfying truth assignments of a monotone k-CNF formula.
The FSP mining problem on uncertain graphs is NP-hard.
16
Outline: Background, Problem Definition, Algorithm, Experimental Results, Conclusions
17
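Approximation Method
It is #P-hard to compute the expected support of a subgraph pattern. We therefore develop an approximation method that finds an approximate set of frequent subgraph patterns. Let e (0 < e < 1) be a relative error tolerance. Then, for each subgraph pattern S:
if esup(S) ≥ minsup, S must be output;
if esup(S) lies between (1 − e)·minsup and minsup, S may be either output or discarded;
if esup(S) < (1 − e)·minsup, S must be discarded.
[Figure: a number line of expected support from 0 to 1, divided at (1 − e)·minsup and minsup into the Discard, Arbitrary, and Output regions.]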
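The three regions can be written down as a small specification of what an approximate answer may do with each pattern. A sketch; the function name, the return strings, and the treatment of the boundary points (inclusive at (1 − e)·minsup and at minsup) are assumptions:

```python
def required_action(esup, minsup, e):
    """What an approximate answer must do with a pattern of expected support esup."""
    if esup >= minsup:
        return "output"   # Output region: the pattern must be reported
    if esup >= (1 - e) * minsup:
        return "either"   # Arbitrary region: reporting it or not is acceptable
    return "discard"      # Discard region: the pattern must not be reported


print(required_action(0.47, 0.5, 0.1))  # either
```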
18
Objective I
Difficulty I: The number of frequent subgraph patterns can be exponentially large.
Objective I: Examine subgraph patterns as efficiently as possible to find all frequent ones.
19
Method for Objective I
Step 1: Build a search tree T of subgraph patterns.
Step 2: Examine the subgraph patterns in T in depth-first order. If S is infrequent, all of its descendants can be pruned: every descendant S′ of S contains S, so every implicated graph that contains S′ also contains S, and hence esup(S′) ≤ esup(S).
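The pruning rule can be illustrated on a deliberately simplified model (an assumption, not the paper's setting): every uncertain graph maps distinct edge labels to independent probabilities, and a "pattern" is just a set of edge labels that occurs iff all of its edges exist. Patterns are enumerated by a set-enumeration tree in depth-first order, and the whole subtree below an infrequent pattern is skipped. The data and names are hypothetical.

```python
from math import prod

# Hypothetical simplified uncertain graphs: edge label -> existence probability.
DB = [
    {"x": 0.5, "y": 0.6, "z": 0.7},
    {"x": 0.8, "y": 0.1},
]
LABELS = sorted({lab for g in DB for lab in g})


def esup(pattern):
    """Exact expected support under the simplified model."""
    total = 0.0
    for g in DB:
        if all(lab in g for lab in pattern):
            total += prod(g[lab] for lab in pattern)
    return total / len(DB)


def mine(minsup):
    """Depth-first search over a set-enumeration tree with anti-monotone pruning."""
    frequent = []

    def dfs(pattern, start):
        for i in range(start, len(LABELS)):
            child = pattern + (LABELS[i],)
            if esup(child) < minsup:
                continue  # infrequent: none of its descendants can be frequent
            frequent.append(child)
            dfs(child, i + 1)

    dfs((), 0)
    return frequent


print(mine(0.3))  # [('x',), ('y',), ('z',)]
```

With minsup = 0.3, the pattern ('x', 'y') has expected support 0.19 and is pruned, so ('x', 'y', 'z') is never examined.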
20
Objective II
Difficulty II: It is #P-hard to compute the expected support esup(S) of a subgraph pattern S.
Objective II: Make the following judgments without computing esup(S) exactly.
If esup(S) is surely not in the Output region (esup(S) < minsup), discard S.
If esup(S) may be in the Output region and is surely not in the Discard region (esup(S) ≥ (1 − e)·minsup), output S.
21
Method for Objective II
Step 1: Approximate esup(S) by an interval [l, u] such that esup(S) ∈ [l, u].
Step 2: Decide whether S can be output by testing the following conditions:
if u < minsup, discard S;
if u ≥ minsup and l ≥ (1 − e)·minsup, output S;
otherwise, shrink [l, u] and test again.
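The decision rule can be sketched as follows; the function name and return values are hypothetical:

```python
def decide(l, u, minsup, e):
    """Decision on an interval [l, u] known to contain esup(S)."""
    if u < minsup:
        return "discard"  # esup(S) <= u < minsup: surely not in the Output region
    if l >= (1 - e) * minsup:
        return "output"   # may reach minsup, surely not in the Discard region
    return "shrink"       # undecided: tighten [l, u] and test again


print(decide(0.40, 0.55, 0.5, 0.1))  # shrink
```

If u − l ≤ e·minsup, the "shrink" case cannot occur: u ≥ minsup implies l ≥ u − e·minsup ≥ (1 − e)·minsup. This is why intervals of width at most e·minsup suffice.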
22
Approximating esup(S) by [l,u]
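A subgraph pattern S occurs in an uncertain graph G if S is contained in an implicated graph of G; Pr(S occurs in G) is the total probability of the implicated graphs of G that contain S.
Algorithm: Approximate esup(S) by [l, u]
Step 1: For each uncertain graph Gi in D, approximate Pr(S occurs in Gi) by an interval [li, ui] of width at most e·minsup.
Step 2: By linearity of expectation, esup(S) is the average of Pr(S occurs in Gi) over the |D| uncertain graphs in D, so set
$$l = \frac{1}{|D|} \sum_{i=1}^{|D|} l_i, \qquad u = \frac{1}{|D|} \sum_{i=1}^{|D|} u_i$$
The resulting interval [l, u] also has width at most e·minsup.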
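Step 2 amounts to averaging the per-graph bounds. A minimal sketch (the function name is hypothetical):

```python
def esup_interval(intervals):
    """Average per-graph intervals [l_i, u_i] for Pr(S occurs in G_i) into an
    interval [l, u] that contains esup(S)."""
    n = len(intervals)
    l = sum(li for li, _ in intervals) / n
    u = sum(ui for _, ui in intervals) / n
    return l, u


print(esup_interval([(0.7, 0.8), (0.5, 0.6)]))
```

Since each [li, ui] has width at most e·minsup, the width of their average, u − l = (1/|D|) Σ (ui − li), is at most e·minsup as well.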
23
Approximate Pr(S occurs in Gi) by [li, ui]
[Figure: an uncertain graph Gi whose four edges x1, x2, x3, x4 have probabilities 0.5, 0.6, 0.7, and 0.8, and a pattern S with four embeddings in Gi.]
Step 1: Find all embeddings of S in Gi.
Step 2: Assign a Boolean variable to each edge in the embeddings: Pr(x1) = 0.5, Pr(x2) = 0.6, Pr(x3) = 0.7, Pr(x4) = 0.8.
Step 3: Construct a conjunctive clause for each embedding: C1 = x1 ∧ x2, C2 = x1 ∧ x4, C3 = x2 ∧ x3, C4 = x3 ∧ x4.
Step 4: Construct the DNF formula F = C1 ∨ C2 ∨ C3 ∨ C4. S occurs in an implicated graph of Gi exactly when F is TRUE, so Pr(S occurs in Gi) = Pr(F = TRUE).
Step 5: Estimate Pr(F = TRUE) by p using Karp and Luby’s Monte Carlo method with absolute error e·minsup/2 and confidence d (d ∈ [0, 1]).
Step 6: Set [li, ui] = [p − e·minsup/2, p + e·minsup/2].
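Steps 5 and 6 can be sketched with the textbook Karp-Luby estimator for a monotone DNF over independent variables: pick clause Ci with probability Pr(Ci)/U, where U = Σ Pr(Ci), sample an assignment conditioned on Ci being TRUE, and count the sample only if Ci is the first clause it satisfies; U times the hit rate estimates Pr(F = TRUE). This is a hedged sketch, not necessarily the paper's exact procedure. In particular, the sample size comes from a Hoeffding bound chosen here, the confidence d is interpreted as a failure probability `delta`, and the values minsup = 0.5 and e = 0.1 are hypothetical.

```python
import math
import random


def karp_luby(probs, clauses, abs_err, delta, seed=0):
    """Estimate Pr(F = TRUE) for F = C1 v ... v Cm, where each clause is a set of
    independent Boolean variables with Pr(x = TRUE) = probs[x]."""
    rng = random.Random(seed)
    weights = [math.prod(probs[x] for x in c) for c in clauses]  # Pr(C_i)
    total = sum(weights)  # U = sum_i Pr(C_i) >= Pr(F)
    if total == 0.0:
        return 0.0
    # Hoeffding: |estimate - Pr(F)| <= abs_err with probability >= 1 - delta.
    n = math.ceil(total ** 2 * math.log(2 / delta) / (2 * abs_err ** 2))
    hits = 0
    for _ in range(n):
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        # Sample an assignment conditioned on clause C_i being TRUE.
        value = {x: (x in clauses[i]) or rng.random() < p for x, p in probs.items()}
        # Count the sample only if C_i is the first clause it satisfies.
        first = next(j for j, c in enumerate(clauses) if all(value[x] for x in c))
        hits += first == i
    return total * hits / n


probs = {"x1": 0.5, "x2": 0.6, "x3": 0.7, "x4": 0.8}
clauses = [{"x1", "x2"}, {"x1", "x4"}, {"x2", "x3"}, {"x3", "x4"}]
minsup, e = 0.5, 0.1  # hypothetical thresholds
half = e * minsup / 2
p = karp_luby(probs, clauses, abs_err=half, delta=0.05)
l_i, u_i = p - half, p + half  # Step 6
print(round(p, 3), (round(l_i, 3), round(u_i, 3)))
```

For this small formula the exact value is known: F is equivalent to (x1 ∨ x3) ∧ (x2 ∨ x4), so Pr(F = TRUE) = (1 − 0.5 × 0.3) × (1 − 0.4 × 0.2) = 0.782, which the estimate should approach.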
24
Outline: Background, Problem Definition, Algorithm, Experimental Results, Conclusions
25
Experimental Results
Data: The STRING Database (
26
Time Efficiency
27
Approximation Quality
28
Scalability
29
Conclusions
A new model of uncertain graph data has been proposed.
The frequent subgraph pattern mining problem on uncertain graph data has been formalized.
The problem has been formally proved to be NP-hard.
An approximate mining algorithm has been proposed.
In the experiments, the proposed algorithm showed high efficiency, high approximation quality, and high scalability.
30
Thank you