Semi-Supervised Learning With Graphs
William Cohen
HW7 Hints
The matrix factorization setup: V is an n x m ratings matrix (n users, m movies), with V[i,j] = user i's rating of movie j. Approximate V ~ W H, where W is n x r (one length-r row per user) and H is r x m (one length-r column per movie), for some small rank r.
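As a quick illustration of the update this factorization is trained with, here is a minimal numpy sketch of one SGD step on a single rating. The step size and regularization constant are arbitrary choices for the sketch, not HW7's required values.

    import numpy as np

    n, m, r = 4, 5, 2                        # users, movies, rank
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(n, r))   # one row per user
    H = rng.normal(scale=0.1, size=(r, m))   # one column per movie

    def sgd_step(i, j, rating, lr=0.05, lam=0.1):
        """One SGD update for squared error (rating - W[i].H[:,j])^2 plus L2."""
        wi, hj = W[i].copy(), H[:, j].copy()
        err = rating - wi.dot(hj)
        W[i]    = wi + lr * (err * hj - lam * wi)   # gradient step on user row
        H[:, j] = hj + lr * (err * wi - lam * hj)   # gradient step on movie column

    sgd_step(1, 3, 4.0)
    print(W[1].dot(H[:, 3]))   # prediction for user 1, movie 3 moves toward 4.0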
Homework 7 - Hints
What's parallel and what's sequential?
  for epoch t = 1, ..., T, in sequence
    for stratum s = 1, ..., K, in sequence
      for block b = 1, ... in stratum s, in parallel
        for triple (i, j, rating) in block b, in sequence
          do the SGD step for (i, j, rating)
Hint 1: you need some loops around RDD operations.
Hint 2: look at rdd.mapPartitions() to see how to control a parallel/sequential breakdown like this.
The inner loop only needs part of W and H, but it needs read/write access to that part. How?
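Read as code, the loop structure looks roughly like the skeleton below. The two stub functions are hypothetical placeholders, and the block-level loop is where a real implementation would fan out to workers.

    def ratings_in(block):        # stub: yield this block's (i, j, rating) triples
        return []

    def sgd_step(i, j, rating):   # stub: one gradient update on W and H
        pass

    T, K = 10, 4
    strata = {s: [(b, (b + s) % K) for b in range(K)] for s in range(K)}

    for t in range(T):                       # epochs: in sequence
        for s in range(K):                   # strata: in sequence
            for block in strata[s]:          # blocks: IN PARALLEL in a real run
                for (i, j, rating) in ratings_in(block):   # in sequence
                    sgd_step(i, j, rating)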
Homework 7 - Hints
What's in memory and what's on disk?
  - Ratings triples (i,j,r) on disk; W and H in memory as shared, synchronized memory.
  - Worry: transmission cost and blocking.
  - Abhinav's hint: Python dictionaries are thread-safe (in CPython, single dict operations are atomic under the GIL; compound read-modify-write sequences still need a lock).
Homework 7 - Hints
What's in memory and what's on disk?
  - Ratings triples (i,j,r) on disk; W and H in memory as broadcast memory: workers get unsynchronized copies of the data.
  - Worry: extra broadcast cost.
  - Hint: mapPartitions is roughly Hadoop's map.
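For reference, shipping a read-only copy of a variable to every worker uses Spark's broadcast API; this is a toy local-mode sketch with made-up variable names, not HW7 code.

    import numpy as np
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "broadcast-demo")
    W = np.ones((3, 2))
    bc_W = sc.broadcast(W)                  # ship one read-only copy per worker

    def use_copy(rows):
        Wlocal = bc_W.value                 # each task reads its worker's copy
        return (float(Wlocal.sum()) for _ in rows)

    print(sc.parallelize(range(4), 2).mapPartitions(use_copy).collect())
    # [6.0, 6.0, 6.0, 6.0]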
Homework 7 - Hints
What is rdd.mapPartitions(foo)?
Function foo takes an iterator that loops over the entries in a partition (aka shard) of the rdd, and yields a sequence of entries:

    def foo(iterator):
        doSomething()          # setup
        for x in iterator:
            ...
            yield ...
        doAnotherThing()       # teardown
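A minimal runnable example of this pattern (toy local-mode code, not part of the assignment): the setup runs once per partition, then one value is yielded per input entry.

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "mapPartitions-demo")

    def running_sum(iterator):
        total = 0                      # setup: once per partition
        for x in iterator:
            total += x
            yield total                # one output entry per input entry

    rdd = sc.parallelize(range(8), 2)  # two partitions
    print(rdd.mapPartitions(running_sum).collect())
    # [0, 1, 3, 6, 4, 9, 15, 22]  (each partition restarts its running total)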
Homework 7 - Hints
What is rdd.mapPartitions(foo)? For HW7, the input is the broadcast W and H plus the ratings Vij, and the output is the updated entries of W and H:

    def foo(iterator):
        doSomething()              # setup: modifiedEntries = set()
        for (i, j, r) in iterator:
            ...                    # update local copies of W[i,*] and H[*,j]
            yield ...
        doAnotherThing()           # output the modified entries of W and H
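Putting these hints together, here is one self-contained way they might combine: broadcast W and H each stratum, do SGD per block inside mapPartitions, and merge the touched rows and columns back on the driver. This is a sketch under assumptions (local mode, toy synthetic ratings, made-up hyperparameters and block layout), not the HW7 reference solution.

    import numpy as np
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "dsgd-sketch")

    n, m, r, K = 40, 30, 5, 4            # users, movies, rank, strata
    T, lr, lam = 3, 0.02, 0.1            # epochs, step size, regularization
    rng = np.random.default_rng(0)

    triples = [(i, j, float(rng.integers(1, 6)))
               for i in range(n) for j in range(m) if rng.random() < 0.1]
    row_block = lambda i: i * K // n
    col_block = lambda j: j * K // m
    cells = sc.parallelize(triples).map(
        lambda t: ((row_block(t[0]), col_block(t[1])), t)).cache()

    W = rng.normal(scale=0.1, size=(n, r))
    H = rng.normal(scale=0.1, size=(r, m))

    for t in range(T):                                  # epochs: sequential
        for s in range(K):                              # strata: sequential
            bc_W, bc_H = sc.broadcast(W), sc.broadcast(H)

            def sgd_block(iterator):
                Wl, Hl = bc_W.value.copy(), bc_H.value.copy()
                rows, cols = set(), set()
                for _, (i, j, v) in iterator:           # triples: sequential
                    wi, hj = Wl[i].copy(), Hl[:, j].copy()
                    err = v - wi.dot(hj)
                    Wl[i] = wi + lr * (err * hj - lam * wi)
                    Hl[:, j] = hj + lr * (err * wi - lam * hj)
                    rows.add(i); cols.add(j)
                yield ([(i, Wl[i]) for i in rows],      # only touched entries
                       [(j, Hl[:, j]) for j in cols])

            stratum = cells.filter(lambda kv, s=s: kv[0][1] == (kv[0][0] + s) % K)
            blocks = stratum.partitionBy(K, lambda key: key[0])  # one block per partition
            for w_rows, h_cols in blocks.mapPartitions(sgd_block).collect():
                for i, wi in w_rows: W[i] = wi           # merge on the driver
                for j, hj in h_cols: H[:, j] = hj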
Homework 7 - Hints
What's in memory and what's on disk? Ratings triples (i,j,r) are on disk, and for W and H the choices are:
  - W, H in memory: shared memory, or individual copies of W and H
  - W, H on disk, joined in with the (i,j,r) triples: worry about inflating the size of the ratings, since this scheme inserts many copies of each row of W
  - Hybrid schemes: turn the triples (i,j,r) into (i, {(j1,r1), (j2,r2), ...}, W[i,*]), with H in memory
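A small sketch of the hybrid scheme's data layout (toy data, hypothetical names): group each user's ratings, join in that user's row of W, and keep H in memory (e.g. broadcast) rather than joining it.

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "hybrid-join-sketch")

    ratings = sc.parallelize([(0, 1, 5.0), (0, 2, 3.0), (1, 0, 4.0)])
    W_rows  = sc.parallelize([(0, [0.1, 0.2]), (1, [0.3, 0.4])])   # (i, W[i,*])

    by_user = ratings.map(lambda t: (t[0], (t[1], t[2]))).groupByKey()
    hybrid = by_user.join(W_rows).map(
        lambda kv: (kv[0], list(kv[1][0]), kv[1][1]))  # (i, [(j, r), ...], W[i,*])
    print(hybrid.collect())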
Homework 7 - Hints
Change the grain size of the computation. Rather than streaming through (i,j,r) triples:
  - create SGD subtasks (W1,H1,V11), (W2,H2,V22), (W3,H3,V33), ...
  - each map task does SGD on one subtask
Worry: subtask size (part of the ratings has to fit in memory).
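A rough sketch of this coarser grain (hypothetical names, toy random blocks): each RDD element packages one diagonal block's slices of W, H, and V, and one map call runs SGD over the whole subtask in memory.

    import numpy as np
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "subtask-sketch")
    rng = np.random.default_rng(0)
    K, r = 3, 2

    def run_sgd(task):
        Wb, Hb, Vb = task                  # this subtask's slices, all in memory
        for i, j, v in Vb:                 # sequential SGD inside the subtask
            wi = Wb[i].copy()
            err = v - wi.dot(Hb[:, j])
            Wb[i] += 0.02 * err * Hb[:, j]
            Hb[:, j] += 0.02 * err * wi
        return Wb, Hb

    subtasks = sc.parallelize([
        (rng.normal(size=(4, r)), rng.normal(size=(r, 4)),
         [(i, j, 3.0) for i in range(4) for j in range(4)])
        for _ in range(K)])
    updated = subtasks.map(run_sgd).collect()   # one map task per subtask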
Homework 7 - Avuncular Advice
This is a really interesting assignment: think and learn! ...But don't stress out. We know this assignment is challenging and we'll curve it appropriately. HW8 is being reduced in scope.
PS: If you're an undergrad doing carnival stuff, talk to me about the (useless) extension.
Semi-Supervised Learning With Graphs
William Cohen
Flashback - Graph Algorithms so far...
  - PageRank, and how to scale it up
  - Personalized PageRank / Random Walk with Restart: how to implement it, and how to use it for extracting part of a graph
  - Other uses for graphs? Not so much.
Main topics today
  - Scalable semi-supervised learning on graphs: SSL with RWR; SSL with coEM/wvRN/HF
  - Scalable unsupervised learning on graphs: power iteration clustering; ...
Semi-supervised learning
  - A pool of labeled examples L
  - A (usually larger) pool of unlabeled examples U
  - Can you improve accuracy somehow using U?
Semi-Supervised Bootstrapped Learning / Self-training
(Figure: the task is to extract cities. Seeds like Paris, Pittsburgh, Seattle, and Cupertino connect to patterns such as "mayor of arg1" and "live in arg1"; the patterns in turn suggest San Francisco, Austin, and Berlin, but patterns like "arg1 is home of" and "traits such as arg1" also pull in non-cities such as denial, anxiety, and selfishness.)
Semi-Supervised Bootstrapped Learning via Label Propagation
(Figure: the same bipartite graph of noun phrases and patterns, now viewed as a graph for label propagation: city labels spread from seeds like Paris and Austin through shared patterns such as "live in arg1" and "mayor of arg1", while "traits such as arg1" links to anxiety, denial, and selfishness.)
Semi-Supervised Bootstrapped Learning via Label Propagation
(Figure: the same graph, annotated with nodes "near" the seeds vs. nodes "far from" the seeds.) Information from other categories, e.g. arrogance, denial, and selfishness under "traits such as arg1", tells you "how far" is too far, i.e. when to stop propagating.
ASONAM-2010 (Advances in Social Networks Analysis and Mining)
Network Datasets with Known Classes: UBMCBlog, AGBlog, MSPBlog, Cora, Citeseer
RWR: the fixpoint of v = (1 - d) u + d W v, the standard random-walk-with-restart recurrence (u: the seed distribution, W: the transition matrix, d: the damping factor).

Seed selection:
  1. order nodes by PageRank, degree, or randomly
  2. go down the list until you have at least k examples per class
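A compact numpy sketch of computing that fixpoint by iteration, on a made-up 4-node graph with an arbitrary damping factor; a sketch of the standard recurrence, not the paper's code.

    import numpy as np

    A = np.array([[0, 1, 1, 0],             # toy adjacency matrix
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    W = A / A.sum(axis=0)                   # column-stochastic transition matrix

    u = np.array([1.0, 0, 0, 0])            # restart (seed) distribution
    d, v = 0.85, np.full(4, 0.25)
    for _ in range(100):                    # iterate v = (1-d) u + d W v
        v = (1 - d) * u + d * W @ v
    print(v)                                # RWR scores relative to the seed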
Results - Blog data
(Figure: results for Random, Degree, and PageRank seed selection. We'll discuss these soon....)
Results - More blog data
(Figure: results for Random, Degree, and PageRank seed selection.)
Results - Citation data
(Figure: results for Random, Degree, and PageRank seed selection.)
Seeding - MultiRankWalk
(Figure only.)
Seeding - HF/wvRN
(Figure only.)
What is HF aka coEM aka wvRN?
CoEM/HF/wvRN
One definition [Macskassy & Provost, JMLR 2007]: ...
Another definition: a harmonic field, where the score of each node in the graph is the harmonic (linearly weighted) average of its neighbors' scores [X. Zhu, Z. Ghahramani, and J. Lafferty, ICML 2003].
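A small numpy sketch of the harmonic-field update on a toy graph (the adjacency matrix and seeds are made up): seed scores are clamped, and every other node repeatedly takes the weighted average of its neighbors' scores.

    import numpy as np

    A = np.array([[0, 1, 1, 0],            # toy symmetric adjacency matrix
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    f = np.zeros(4)
    seeds = {0: 1.0, 3: 0.0}               # clamped seed scores

    for _ in range(50):
        f = A @ f / A.sum(axis=1)          # each node: weighted neighbor average
        for i, y in seeds.items():
            f[i] = y                       # reset seeds to their initial value
    print(f)                               # scores of the unlabeled nodes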
CoEM/wvRN/HF
Another justification of the same algorithm: start with co-training with a naïve Bayes learner.
CoEM/wvRN/HF
One algorithm with several justifications. One is to start with co-training with a naïve Bayes learner, and compare it to an EM version of naïve Bayes:
  - E: soft-classify the unlabeled examples with the NB classifier
  - M: re-train the classifier with the soft-labeled examples
CoEM/wvRN/HF
A second experiment:
  - each + example: concatenate features from two documents, one of class A+, one of class B+
  - each - example: concatenate features from two documents, one of class A-, one of class B-
  - the features are prefixed with "A" or "B", so the two views are disjoint
NOW co-training outperforms EM.
CoEM/wvRN/HF
Co-training with a naïve Bayes learner (incremental hard assignments) vs. an EM version of naïve Bayes (iterative soft assignments):
  - E: soft-classify the unlabeled examples with the NB classifier
  - M: re-train the classifier with the soft-labeled examples
Co-Training Rote Learner
(Figure: a bipartite graph between pages and hyperlinks, e.g. the phrase "my advisor", with + and - labels spreading between the two views.)
Co-EM Rote Learner: equivalent to HF on a bipartite graph
(Figure: a bipartite graph between noun phrases and contexts, e.g. "Pittsburgh" and the context "lives in _", with + and - labels spreading between the two sides.)
What is HF aka coEM aka wvRN?
Algorithmically: HF propagates weights and then resets the seeds to their initial value; MRW propagates weights and does not reset the seeds.
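In code the difference is small. Using the same toy numpy setup as the harmonic-field sketch above (made-up graph, arbitrary damping), an MRW-style update keeps mixing the seed mass in rather than resetting the seeds:

    import numpy as np

    A = np.array([[0, 1, 1, 0], [1, 0, 1, 0],
                  [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
    u = np.array([1.0, 0, 0, 0])            # seed / restart distribution
    d, f = 0.85, np.zeros(4)

    for _ in range(50):
        # HF would do:  f = A @ f / A.sum(axis=1), then reset f at the seeds.
        # MRW instead blends in the restart vector, with no reset:
        f = (1 - d) * u + d * (A / A.sum(axis=0)) @ f
    print(f)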
MultiRankWalk vs HF/wvRN/CoEM
(Figure: side-by-side propagation on a small graph, with seeds marked S; MRW on the left, HF on the right.)
Back to Experiments: Network Datasets with Known Classes: UBMCBlog, AGBlog, MSPBlog, Cora, Citeseer
MultiRankWalk vs wvRN/HF/CoEM
(Figure only.)
How well does MRW work?
Parameter Sensitivity
Semi-supervised learning
  - A pool of labeled examples L
  - A (usually larger) pool of unlabeled examples U
  - Can you improve accuracy somehow using U?
These methods are different from EM, which optimizes Pr(Data | Model). How do SSL methods like label propagation relate to optimization?
SSL as optimization
(slides from Partha Talukdar)
(Yet another name for HF/wvRN/coEM.)
The objective trades off two terms: match the seeds, plus a smoothness prior over the graph. In the standard formulation, minimize over f the sum over seed nodes of (f_i - y_i)^2, plus mu times the sum over edges (i,j) of w_ij (f_i - f_j)^2.
How to do this minimization? First, differentiate to find where the minimum is; this gives a linear system. Then use the Jacobi method: to solve Ax = b for x, iterate

    x_i^(t+1) = (1 / A_ii) * (b_i - sum over j != i of A_ij x_j^(t))

or, read as a graph update: each x_i becomes a weighted combination of b_i and its neighbors' current values.
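A minimal numpy illustration of the Jacobi iteration, on a small made-up system (not tied to any particular graph):

    import numpy as np

    A = np.array([[4.0, 1.0, 0.0],         # diagonally dominant, so Jacobi converges
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 5.0]])
    b = np.array([1.0, 2.0, 3.0])

    x = np.zeros(3)
    D = np.diag(A)                          # diagonal entries A_ii
    R = A - np.diag(D)                      # off-diagonal part
    for _ in range(50):
        x = (b - R @ x) / D                 # x_i <- (b_i - sum_{j!=i} A_ij x_j) / A_ii
    print(x, A @ x)                         # A @ x should be close to b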
(Three results figures: methods compared by precision-recall break-even point, with the HF variant marked.)
(The classes come from mining patterns like "musicians such as Bob Dylan", and from HTML tables on the web that are used for data, not formatting.)
More recent work (AIStats 2014)
Propagating labels usually requires a small number of optimization passes, basically like label-propagation passes. Each pass is linear in:
  - the number of edges
  - the number of labels being propagated
Can you do better? Basic idea: store the labels in a count-min sketch, which is basically a compact approximation of an object-to-double mapping.
Flashback: CM Sketch Structure
(Figure: a d x w array of counters; each string s is mapped to one bucket per row by hash functions h_1(s), ..., h_d(s).)
  - Estimate A[j] by taking min over k of CM[k, h_k(j)]
  - Errors are always over-estimates
  - Sizes: d = log(1/delta), w = 2/epsilon; the error is usually less than epsilon * ||A||_1
(from: Minos Garofalakis)
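A minimal pure-Python count-min sketch, as a sketch of the idea (the hash construction and sizes are arbitrary choices). Note that two sketches built with the same hash functions combine linearly, which is the property the label-propagation work exploits.

    import numpy as np

    class CountMin:
        def __init__(self, d=4, w=64, seed=0):
            self.d, self.w = d, w
            self.table = np.zeros((d, w))
            # same seed -> same hash functions, so tables combine linearly
            self.salts = np.random.default_rng(seed).integers(1, 1 << 30, size=d)

        def _cols(self, key):
            return [hash((int(salt), key)) % self.w for salt in self.salts]

        def update(self, key, amount=1.0):
            for row, col in enumerate(self._cols(key)):
                self.table[row, col] += amount

        def query(self, key):                    # min over rows: always >= truth
            return min(self.table[row, col]
                       for row, col in enumerate(self._cols(key)))

    v, w = CountMin(), CountMin()
    v.update("city", 2.0); w.update("city", 1.0); w.update("person", 5.0)

    combo = CountMin()                           # sketches combine without unpacking:
    combo.table = 3 * v.table + 2 * w.table      # sketch(3v + 2w)
    print(combo.query("city"), combo.query("person"))   # ~8.0, ~10.0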
More recent work (AIStats 2014)
Propagating labels usually requires a small number of optimization passes, basically like label-propagation passes. With sketches, each pass is linear in:
  - the number of edges
  - the sketch size (instead of the number of labels being propagated)
Sketches can be combined linearly without "unpacking" them: sketch(av + bw) = a*sketch(v) + b*sketch(w). Sketches are good at storing skewed distributions.
More recent work (AIStats 2014)
Label distributions are often very skewed:
  - sparse initial labels
  - community structure: labels from other subcommunities have small weight
More recent work (AIStats 2014)
(Figures: results on Freebase and Flickr-10k; "self-injection" refers to the similarity computation.)
More recent work (AIStats 2014)
(Figure: Freebase results.)
More recent work (AIStats 2014)
(Figure: results with 100 GB available.)