Published by Damian Rose. Modified over 9 years ago.
1
Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms
Danai Koutra, U Kang, Hsing-Kuo Kenneth Pao, Tai-You Ke, Duen Horng (Polo) Chau, Christos Faloutsos
School of Computer Science, Carnegie Mellon University; National Taiwan University of Science & Technology
ECML PKDD, 5-9 September 2011, Athens, Greece
2
Problem Definition: guilt-by-association (GBA) techniques
Danai Koutra (CMU) - PKDD 2011
Given: a graph with N nodes and M edges; a few labeled nodes
Find: the class (red/green) of the remaining nodes
Assuming: network effects (homophily/heterophily)
3
Homophily and Heterophily
Figure: Step 1, Step 2
All methods handle homophily.
NOT all methods handle heterophily, BUT the proposed method does!
4
Why do we study these methods?
5
Motivation (1): Law Enforcement
[Tong+ ’06][Lin+ ’04][Chen+ ’11]…
6
Motivation (2): Cyber Security
Figure labels: bot; botnet members?; victims?
[Kephart+ ’95][Kolter+ ’06][Song+ ’08-’11][Chau+ ’11]…
7
Motivation (3): Fraud Detection
Figure labels: fraudster; fraudsters?; lax controls?
[Neville+ ’05][Chau+ ’07][McGlohon+ ’09]…
8
Motivation (4): Ranking
[Brin+ ’98][Tong+ ’06][Ji+ ’11]…
9
Our Contributions
Theory: correspondence BP ≈ RWR ≈ SSL; linearization for BP; convergence criteria for linearized BP
Practice: FaBP algorithm (fast, accurate, and scalable); experiments on DBLP, Web, and Kronecker graphs
10
Roadmap
Background: Belief Propagation; Random Walk with Restarts; Semi-supervised Learning
Linearized BP
Correspondence of Methods
Proposed Algorithm
Experiments
Conclusions
11
Background
Apologies for the diversion…
12
Background 1: Belief Propagation (BP)
Iterative message-based method: 1st round, 2nd round, ... until the stop criterion is fulfilled.
Figure: example node beliefs (0.9, 0.1), (0.2, 0.8), (0.3, 0.7), (0.9, 0.1).
“Propagation matrix” (homophily or heterophily): indexed by the class of the “sender” and the class of the “receiver”; example entries 0.9 and 0.1. The diagonal entries are usually the same = homophily factor h.
“About-half” homophily factor: h_h = h - 0.5; example entries 0.4 and -0.4.
13
Background 1: Belief Propagation Equations
[Pearl ’82][Yedidia+ ’02]…[Pandit+ ’07][Gonzalez+ ’09][Chechetka+ ’10]
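The equations on this slide did not survive as text. As a rough illustration only, the sketch below implements the standard two-class sum-product form of BP (messages combine the sender's prior, the propagation matrix, and all incoming messages except the one from the receiver). It assumes that is the form the slide shows; the function name, the toy path graph, and the priors are hypothetical.

```python
def belief_propagation(edges, priors, h=0.9, rounds=10):
    """Two-class sum-product BP on an undirected graph (illustrative sketch).

    edges:  list of (u, v) pairs
    priors: dict node -> [P(class 0), P(class 1)]
    h:      homophily factor on the diagonal of the propagation matrix
    """
    psi = [[h, 1.0 - h], [1.0 - h, h]]  # propagation matrix
    nodes = sorted(priors)
    nbrs = {v: set() for v in nodes}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    # msgs[(i, j)] is the message node i sends to node j; start uniform.
    msgs = {(i, j): [1.0, 1.0] for i in nodes for j in nbrs[i]}
    for _ in range(rounds):
        new_msgs = {}
        for (i, j) in msgs:
            out = [0.0, 0.0]
            for xi in (0, 1):
                # Prior of i times all messages into i, except the one from j.
                prod = priors[i][xi]
                for k in nbrs[i]:
                    if k != j:
                        prod *= msgs[(k, i)][xi]
                for xj in (0, 1):
                    out[xj] += prod * psi[xi][xj]
            total = out[0] + out[1]
            new_msgs[(i, j)] = [out[0] / total, out[1] / total]
        msgs = new_msgs
    beliefs = {}
    for i in nodes:
        b = list(priors[i])
        for j in nbrs[i]:
            b = [b[x] * msgs[(j, i)][x] for x in (0, 1)]
        total = b[0] + b[1]
        beliefs[i] = [b[0] / total, b[1] / total]
    return beliefs


# Toy path graph 0 - 1 - 2 - 3; only node 0 carries an informative prior.
edges = [(0, 1), (1, 2), (2, 3)]
priors = {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.5, 0.5], 3: [0.5, 0.5]}
homophily = belief_propagation(edges, priors, h=0.9)
heterophily = belief_propagation(edges, priors, h=0.1)
```

With h = 0.9 the belief in class 0 fades with distance from node 0; with h = 0.1 the preferred class alternates along the path, which is the heterophily case.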
14
Background 2: Semi-Supervised Learning
Graph-based SSL: use few labeled data and exploit neighborhood information.
Figure: Step 1 and Step 2, with node scores 0.8, -0.3, ?, ?, -0.1, 0.6, 0.8
[Zhou ’06][Ji, Han ’10]…
15
Background 3: Personalized Random Walk with Restarts (RWR)
[Brin+ ’98][Haveliwala ’03][Tong+ ’06][Minkov, Cohen ’07]…
16
Background
17
Qualitative Comparison of GBA Methods
GBA Method | Heterophily | Scalability | Convergence
RWR | ✗ | ✓ | ✓
SSL | ✗ | ✓ | ✓
BP | ✓ | ✓ | ?
FaBP | ✓ | ✓ | ✓
19
Roadmap
Background
Linearized BP
Correspondence of Methods
Proposed Algorithm
Experiments
Conclusions
Slide labels: “Previous work”, “New work”
20
Linearized BP (DETAILS!)
Theorem [Koutra+]: BP is approximated by the linear system [I + aD - c′A] b_h = φ_h, where b_h are the final beliefs, φ_h the prior beliefs, A the adjacency matrix, D the degree matrix (entries d1, d2, d3), and a, c′ scalar constants.
Sketch of proof: odds ratio; Maclaurin expansions.
Figure: priors p_i on a 0 to 1 scale, with 0.5 mapped to “0”; example centered priors -10^-2 and 10^-2; unknown beliefs “?”.
21
Linearized BP vs BP
Original [Yedidia+]: Belief Propagation, non-linear.
Our proposal: Linearized BP, linear. BP is approximated by a linear system over the adjacency matrix, the degrees d1, d2, d3, and centered priors such as -10^-2 and 10^-2.
22
Our Contributions
Theory: correspondence BP ≈ RWR ≈ SSL; linearization for BP; convergence criteria for linearized BP
Practice: FaBP algorithm (fast, accurate, and scalable); experiments on DBLP, Web, and Kronecker graphs
Covered so far: ✓
23
Linearized BP: convergence (DETAILS!)
Theorem: Linearized BP converges if the 1-norm < 1 OR the Frobenius norm < 1.
Formula annotation: degree of node n.
Sketch of proof
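A minimal sketch of how such a norm test could be checked numerically on a small graph. It assumes the linearized system is iterated as b ← φ + W b with W = c′A - aD, and that the scalar constants are a = 4h_h²/(1 - 4h_h²) and c′ = 2h_h/(1 - 4h_h²). The slide text does not spell either of these out, so both are assumptions, and all function names are illustrative.

```python
import math


def fabp_constants(hh):
    """Assumed scalar constants (a, c_prime) for about-half homophily hh."""
    denom = 1.0 - 4.0 * hh * hh
    return 4.0 * hh * hh / denom, 2.0 * hh / denom


def iteration_matrix(adj, hh):
    """W = c'A - aD for a dense 0/1 adjacency matrix given as list of lists."""
    a, c_prime = fabp_constants(hh)
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[c_prime * adj[i][j] - (a * deg[i] if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def one_norm(W):
    """Induced 1-norm: the largest absolute column sum."""
    n = len(W)
    return max(sum(abs(W[i][j]) for i in range(n)) for j in range(n))


def frobenius_norm(W):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in W for v in row))


def passes_norm_test(adj, hh):
    """True if either norm of W is below 1 (a sufficient condition only)."""
    W = iteration_matrix(adj, hh)
    return one_norm(W) < 1.0 or frobenius_norm(W) < 1.0


# Path graph on 4 nodes.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
small_ok = passes_norm_test(path, 0.002)  # small homophily factor
large_ok = passes_norm_test(path, 0.4)    # large homophily factor
```

Both norms are upper bounds on the spectral radius of W, so either one falling below 1 guarantees that the fixed-point iteration converges; failing both tests does not by itself prove divergence.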
24
Our Contributions
Theory: correspondence BP ≈ RWR ≈ SSL; linearization for BP; convergence criteria for linearized BP
Practice: FaBP algorithm (fast, accurate, and scalable); experiments on DBLP, Web, and Kronecker graphs
Covered so far: ✓ ✓
25
Roadmap
Background
Linearized BP
Correspondence of Methods
Proposed Algorithm
Experiments
Conclusions
26
Correspondence of Methods
Method | Matrix | Unknown | Known
RWR | [I - c A D^-1] × | x | = (1 - c) y
SSL | [I + a (D - A)] × | x | = y
FaBP | [I + a D - c′ A] × | b_h | = φ_h
Legend: A = adjacency matrix; D = degree matrix (entries d1, d2, d3); x, b_h = final labels/beliefs; y, φ_h = prior labels/beliefs
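The three systems on this slide share one shape: matrix × unknown = known. The sketch below builds each matrix for a toy path graph and solves it with a small Gaussian-elimination routine, purely to make that shared shape concrete. The graph, the parameter values (c = 0.85, a = 0.5, and the FaBP constants 0.01 and 0.1), and all function names are hypothetical choices, not values from the talk.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def degrees(adj):
    return [sum(row) for row in adj]


def rwr_system(adj, y, c):
    """[I - c A D^-1] x = (1 - c) y"""
    n, d = len(adj), degrees(adj)
    M = [[(1.0 if i == j else 0.0) - c * adj[i][j] / d[j]
          for j in range(n)] for i in range(n)]
    return M, [(1.0 - c) * v for v in y]


def ssl_system(adj, y, a):
    """[I + a (D - A)] x = y"""
    n, d = len(adj), degrees(adj)
    M = [[(1.0 + a * d[i] if i == j else 0.0) - a * adj[i][j]
          for j in range(n)] for i in range(n)]
    return M, list(y)


def fabp_system(adj, phi, a, c_prime):
    """[I + a D - c' A] b_h = phi_h"""
    n, d = len(adj), degrees(adj)
    M = [[(1.0 + a * d[i] if i == j else 0.0) - c_prime * adj[i][j]
          for j in range(n)] for i in range(n)]
    return M, list(phi)


# Path graph 0 - 1 - 2 - 3 with a single labeled node (node 0).
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
y = [1.0, 0.0, 0.0, 0.0]

M_rwr, rhs_rwr = rwr_system(adj, y, c=0.85)
M_ssl, rhs_ssl = ssl_system(adj, y, a=0.5)
M_fabp, rhs_fabp = fabp_system(adj, [0.001, 0.0, 0.0, 0.0], a=0.01, c_prime=0.1)

x_rwr = solve(M_rwr, rhs_rwr)
x_ssl = solve(M_ssl, rhs_ssl)
b_fabp = solve(M_fabp, rhs_fabp)
```

A dense direct solve is only practical for toy inputs; on large graphs an iterative method over the sparse edge list would be the natural substitute.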
27
RWR ≈ SSL (DETAILS!)
THEOREM: RWR and SSL are identical if a condition holds that ties the individual homophily strength of node i (SSL) to the fly-out probability (RWR).
Simplification: global homophily strength of nodes (SSL).
28
RWR ≈ SSL: example
Similar scores and identical rankings.
Figure: scatter plot of RWR scores and SSL scores against the line y = x, for individual and global homophily strength.
29
Our Contributions
Theory: correspondence BP ≈ RWR ≈ SSL; linearization for BP; convergence criteria for linearized BP
Practice: FaBP algorithm (fast, accurate, and scalable); experiments on DBLP, Web, and Kronecker graphs
Covered so far: ✓ ✓ ✓
30
Roadmap
Background
Linearized BP
Correspondence of Methods
Proposed Algorithm
Experiments
Conclusions
31
Proposed algorithm: FaBP
① Pick the homophily factor.
② Solve the linear system.
③ (optional) If accuracy is low, run BP with prior beliefs.
Figure: the linear system (adjacency matrix, degrees d1, d2, d3, unknown beliefs “?”) and a prior p_i on a 0 to 1 scale with 0.5 mapped to “0”.
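The first two steps can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the authors' implementation: it assumes the linear system is [I + aD - c′A] b_h = φ_h with a = 4h_h²/(1 - 4h_h²) and c′ = 2h_h/(1 - 4h_h²), solves it by the fixed-point iteration b ← φ + c′A b - aD b, and reads the class off the sign of each centered belief. The function name, the toy graph, and the iteration count are hypothetical; the values h_h = ±0.002 and priors of magnitude 0.001 mirror the settings quoted on the results slides.

```python
def fabp(edges, n, phi, hh, iters=100):
    """Iteratively solve the assumed linearized-BP system on an edge list.

    edges: list of undirected (u, v) pairs over nodes 0..n-1
    phi:   centered prior beliefs (0.0 for unlabeled nodes)
    hh:    about-half homophily factor (negative values model heterophily)
    """
    denom = 1.0 - 4.0 * hh * hh
    a = 4.0 * hh * hh / denom      # assumed constant on the degree matrix
    c_prime = 2.0 * hh / denom     # assumed constant on the adjacency matrix
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    b = list(phi)
    for _ in range(iters):
        # One pass touches every edge twice, so the cost is linear in |E|.
        b = [phi[i]
             + c_prime * sum(b[j] for j in nbrs[i])
             - a * len(nbrs[i]) * b[i]
             for i in range(n)]
    return b


# Path graph 0 - 1 - 2 - 3 - 4 - 5 with one labeled node at each end.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
phi = [0.001, 0.0, 0.0, 0.0, 0.0, -0.001]

beliefs = fabp(edges, 6, phi, hh=0.002)          # homophily
labels = [1 if v > 0 else -1 for v in beliefs]   # sign gives the class
het_beliefs = fabp(edges, 6, phi, hh=-0.002)     # heterophily
```

Under homophily each unlabeled node takes the class of the nearer labeled endpoint; with a negative h_h the neighbor of a positive node turns negative instead.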
32
Roadmap
Background
Linearized BP
Correspondence of Methods
Proposed Algorithm
Experiments
Conclusions
33
Datasets
p% labeled nodes initially
YahooWeb: .edu/others | DBLP: AI/not AI
Accuracy computed on hold-out set
Dataset | # nodes | # edges
YahooWeb | 1,413,511,390 | 6,636,600,779 (6 billion!)
Kronecker 1 | 177,147 | 1,977,149,596
Kronecker 2 | 120,552 | 1,145,744,786
Kronecker 3 | 59,049 | 282,416,924
Kronecker 4 | 19,683 | 40,333,924
DBLP | 37,791 | 170,794
34
Specs
Hadoop version 0.20.2
M45 Hadoop cluster (Yahoo!): 500 machines, 4000 cores, 1.5 PB total storage, 3.5 TB of memory
100 machines used for the experiments
35
Roadmap
Background
Linearized BP
Correspondence of Methods
Proposed Algorithm
Experiments: 1. Accuracy; 2. Convergence; 3. Sensitivity; 4. Scalability; 5. Parallelism
Conclusions
36
Results (1): Accuracy
All points lie on the diagonal: the scores are near-identical.
Figure: scatter plot of beliefs in FaBP vs. beliefs in BP for AI and non-AI nodes; 0.3% labels; (h, priors) = (0.5±0.002, 0.5±0.001).
37
Results (2): Convergence
FaBP achieves maximum accuracy within the convergence bounds.
Figure: % accuracy vs. h_h (priors = ±0.001, 0.3% labels), with the convergence bounds marked: Frobenius norm, |e_val| = 1, 1-norm.
38
Results (3): Sensitivity to the homophily factor
FaBP is robust to the homophily factor h_h within the convergence bounds.
Figure: % accuracy vs. h_h (priors = ±0.001, 0.3% labels), with the convergence bounds marked: Frobenius norm, |e_val| = 1, 1-norm.
39
(For all plots)
Note: average over 10 runs; error bars are tiny.
Axes: % accuracy vs. h_h; % accuracy vs. prior beliefs’ magnitude.
40
Results (3): Sensitivity to the prior beliefs
FaBP is robust to the prior beliefs φ_h.
Figure: % accuracy vs. prior beliefs’ magnitude (h_h = ±0.002), for p = 0.1%, 0.3%, 0.5%, and 5%.
41
Results (4): Scalability
FaBP is linear in the number of edges.
Figure: runtime (min) vs. # of edges (Kronecker graphs).
42
Results (5): Parallelism
FaBP is ~2x faster and wins/ties on accuracy.
Figures: runtime (min) vs. # of steps; % accuracy vs. runtime (min).
43
Roadmap
Background
Linearized BP
Correspondence of Methods
Proposed Algorithm
Experiments
Conclusions
44
Our Contributions
Theory: correspondence BP ≈ RWR ≈ SSL; linearization for BP; convergence criteria for linearized BP
Practice: FaBP algorithm: fast (~2x faster), accurate (same/better), and scalable (6 billion edges!); experiments on DBLP, Web, and Kronecker graphs
Covered: ✓ ✓ ✓ ✓ ✓
45
Thanks
Data and funding: NSC; ILLINOIS (Ming Ji, Jiawei Han)
46
Thank you!
Figure: % accuracy vs. runtime (min).
47
Q: Can we have multiple classes?
A: Yes!
Propagation matrix:
   | AI | ML | DB
AI | 0.7 | 0.2 | 0.1
ML | 0.2 | 0.6 | 0.2
DB | 0.1 | 0.2 | 0.7
48
Q: Which of the methods do you recommend?
A: (Fast) Belief Propagation
Reasons: solid Bayesian foundation; handles heterophily and multiple classes.
Propagation matrix:
0.7 | 0.2 | 0.1
0.2 | 0.6 | 0.2
0.1 | 0.2 | 0.7
49
Q: Why is FaBP faster than BP?
A: BP: 2|E| messages per iteration. FaBP: |V| records per “power method” iteration. |V| < 2|E|.