Semi-Supervised Learning With Graphs William Cohen
Administrivia
– Last assignment (approximate PR / sweep / visualize): 48-hour extension for everyone.
– Final proposal: still due Tuesday 1:30. I've given everyone comments (often brief) and/or questions.
– Final project: we're asking everyone to keep in touch.
  – We're assigning each project a mentor (soon).
  – Feel free to discuss issues in office hours.
  – Send a few bullet points each Tuesday to your mentor outlining progress (or lack of it). These are required but won't be graded.
Graph topics so far
Algorithms:
– exact PageRank, with nodes in memory or with nothing in memory
– approximate personalized PageRank (APR), using repeated "push"
– a "sweep" to find a coherent subgraph from a personalized PageRank vector/ranking
– iterative averaging, clamping some nodes or with no clamping (used for SSL and power iteration clustering)
Applications:
– web search, etc.
– semi-supervised learning (the MRW algorithm), etc.
– sampling a graph (with APR)
Streaming PageRank
Assume we can store v but not W in memory.
Repeat until converged:
– let v_{t+1} = c u + (1 - c) W v_t
Store A as a row matrix: each line is "i: j_{i,1}, …, j_{i,d}" (the neighbors of i). Store v' and v in memory; v' starts out as c u.
For each line "i: j_{i,1}, …, j_{i,d}":
– for each j in j_{i,1}, …, j_{i,d}: v'[j] += (1 - c) v[i] / d
Everything needed for the update is right there in the row.
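Below is a minimal sketch of this streaming pass in Python. The file format (one adjacency row per line, integer node ids 0..n-1) and a uniform restart vector u are assumptions made for illustration, not from the slides.

```python
def streaming_pagerank_pass(adj_path, v, c=0.15):
    """One pass of v' = c*u + (1-c)*W*v, streaming the rows of W from disk."""
    n = len(v)
    v_new = [c / n] * n                        # v' starts out as c*u (u uniform)
    with open(adj_path) as f:
        for line in f:                         # each line: "i j1 j2 ... jd"
            nodes = [int(x) for x in line.split()]
            i, neighbors = nodes[0], nodes[1:]
            d = len(neighbors)
            for j in neighbors:                # push i's mass to its neighbors
                v_new[j] += (1 - c) * v[i] / d
    return v_new
```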
Recap: PageRank algorithm
Repeat until converged:
– let v_{t+1} = c u + (1 - c) W v_t
Pure streaming: use a table mapping nodes to degree + PageRank, with lines "i: degree=d, pr=v".
For each edge (i, j):
– send to i's entry in the degree/PageRank table: "outlink j"
For each line "i: degree=d, pr=v":
– send to i: "incrementVBy c"
– for each message "outlink j": send to j: "incrementVBy (1-c)*v/d"
For each line "i: degree=d, pr=v":
– sum up the incrementVBy messages to compute v'
– output the new row: "i: degree=d, pr=v'"
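The sketch below simulates one such round in Python, with in-memory dicts standing in for the sorted message streams; the data layout is an assumption made for illustration.

```python
from collections import defaultdict

def pagerank_round(table, edges, c=0.15):
    """table: {i: (degree, pr)}; edges: iterable of (i, j) pairs.
    One round of the message-passing PageRank recap above."""
    outlinks = defaultdict(list)               # route each edge to its source
    for i, j in edges:
        outlinks[i].append(j)
    increments = defaultdict(float)
    for i, (d, v) in table.items():
        increments[i] += c                     # "send to i: incrementVBy c"
        for j in outlinks[i]:
            increments[j] += (1 - c) * v / d   # "incrementVBy (1-c)*v/d"
    return {i: (d, increments[i]) for i, (d, _) in table.items()}
```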
Recap: Approximate Personalized PageRank
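The slide body is a figure; as a reference point, here is a minimal sketch of the repeated-"push" approximation in the Andersen-Chung-Lang style (the variable names, queue discipline, and eps threshold are illustrative assumptions). It keeps an estimate p and a residual r, and pushes mass from any node whose residual is large relative to its degree.

```python
from collections import defaultdict

def approx_personalized_pagerank(graph, seed, alpha=0.15, eps=1e-4):
    """graph: {node: [neighbors]}, with every node present as a key.
    Repeatedly push residual mass from nodes u with r[u] >= eps * deg(u)."""
    p, r = defaultdict(float), defaultdict(float)
    r[seed] = 1.0
    work = [seed]
    while work:
        u = work.pop()
        deg = len(graph[u])
        if deg == 0 or r[u] < eps * deg:
            continue
        p[u] += alpha * r[u]                   # keep a fraction at u
        share = (1 - alpha) * r[u] / deg       # spread the rest to neighbors
        r[u] = 0.0
        for w in graph[u]:
            r[w] += share
            if r[w] >= eps * len(graph[w]):
                work.append(w)
    return p
```

A "sweep" over the nodes ranked by p[u]/deg(u) can then extract a coherent subgraph, as in the outline above.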
Graph topics so far (outline slide repeated; see above).
[Figure: a fragment of a CALO email graph. Nodes include a person ("William"), dates (6/17/07, 6/18/07), and terms ("proposal", "CMU", "graph"), linked to the address einat@cs.cmu.edu by edges such as "Sent To" and "Term In Subject". From "Learning to Search Email" (SIGIR 2006, CEAS 2006, WebKDD/SNA 2007).]
Q: "what are Jason's email aliases?"
[Figure: a query graph for the term "Jason", reaching the addresses jernst@andrew.cmu.edu and jernst@cs.cmu.edu through messages (Msg 2, Msg 5, Msg 18) and edges such as "sent from", "sent to", "similar to", "has term" (inverse), and "email address of".]
Basic idea: searching personal information is querying a graph for information, and the query is done with personalized PageRank plus filtering the output on a type.
Tasks that are like similarity queries
– Person name disambiguation: [term "andy", file msgId] → "person"
– Threading (what are the adjacent messages in this thread? a proxy for finding "more messages like this one"): [file msgId] → "file"
– Alias finding (what are the email-addresses of Jason?): [term "Jason"] → "email-address"
– Meeting attendees finder (which email-addresses/persons should I notify about this meeting?): [meeting mtgId] → "email-address"
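Each of these is the same operation: run personalized PageRank from the query's seed nodes and keep only answers of the requested type. A sketch under that reading (the matrix layout and names are assumptions):

```python
import numpy as np

def typed_graph_query(W, node_type, seeds, want_type, c=0.15, iters=50):
    """W: column-stochastic transition matrix; node_type: list of type strings.
    Rank nodes by personalized PageRank from the seeds, filter by type."""
    n = W.shape[0]
    u = np.zeros(n)
    u[seeds] = 1.0 / len(seeds)                # restart on the query nodes
    v = u.copy()
    for _ in range(iters):
        v = c * u + (1 - c) * (W @ v)
    ranked = sorted(range(n), key=lambda i: -v[i])
    return [i for i in ranked if node_type[i] == want_type]
```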
[Results figures (four slides): person name disambiguation on the management-game corpus.]
Graph topics so far (outline slide repeated; see above).
Semi-Supervised Bootstrapped Learning via Label Propagation
[Figure: a graph of noun phrases (Paris, San Francisco, Austin, Pittsburgh, Seattle; anxiety, denial, selfishness, arrogance) connected to extraction contexts ("live in arg1", "mayor of arg1", "arg1 is home of", "traits such as arg1"). City seeds are "near" some nodes and "far from" others.]
Information from other categories tells you "how far" (i.e., when to stop propagating).
RWR: the fixpoint of v = c u + (1 - c) W v, where u is concentrated on the seeds.
Seed selection:
1. order nodes by PageRank, degree, or randomly
2. go down the list until you have at least k examples per class
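A sketch of MultiRankWalk in this style, with one RWR per class seeded as above (the dense matrices and helper names are illustrative assumptions):

```python
import numpy as np

def rwr(W, u, c=0.15, iters=100):
    """Random walk with restart: iterate to the fixpoint of v = c*u + (1-c)*W*v.
    W is column-stochastic; u is the seed distribution."""
    v = u.copy()
    for _ in range(iters):
        v = c * u + (1 - c) * (W @ v)
    return v

def multirankwalk(W, seeds_per_class):
    """seeds_per_class: {label: [node indices]}.  Label each node by the
    class whose RWR score is highest at that node."""
    n = W.shape[0]
    scores = {}
    for label, seeds in seeds_per_class.items():
        u = np.zeros(n)
        u[seeds] = 1.0 / len(seeds)
        scores[label] = rwr(W, u)
    labels = list(scores)
    return [max(labels, key=lambda l: scores[l][i]) for i in range(n)]
```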
MultiRankWalk vs wvRN/HF/CoEM
Graph topics so far
Algorithms:
– exact PageRank, with nodes in memory or with nothing in memory
– approximate personalized PageRank (APR), using repeated "push"
– a "sweep" to find a coherent subgraph from a personalized PageRank vector/ranking
– iterative averaging, clamping some nodes or with no clamping (SSL: HF/CoEM/wvRN; power iteration clustering)
Applications:
– web search
– semi-supervised learning (MRW algorithm)
– sampling a graph (with APR)
Repeated averaging with neighbors on a sample problem…
– create a graph connecting every point in the 2-D initial space to every other point, weighted by distance
– run power iteration for 10 steps
– plot node id x vs. v_10(x), with nodes ordered by their actual cluster number
[Figure: the plotted values, grouped into the blue, green, and red clusters.]
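A sketch of this toy experiment (the slides only say "weighted by distance"; the Gaussian kernel below is one assumed choice of weighting):

```python
import numpy as np

def power_iteration_demo(points, steps=10, sigma=1.0):
    """points: (n, 2) array.  Fully connected affinity graph, then `steps`
    rounds of averaging with neighbors; plot the result against node id."""
    diff = points[:, None, :] - points[None, :, :]
    A = np.exp(-np.sum(diff ** 2, axis=2) / (2 * sigma ** 2))  # affinities
    np.fill_diagonal(A, 0.0)
    W = A / A.sum(axis=1, keepdims=True)       # row-normalize: W = D^-1 A
    v = np.random.rand(len(points))
    for _ in range(steps):
        v = W @ v
        v /= np.abs(v).sum()                   # rescale to keep v readable
    return v
```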
[Figures (three slides): the same blue/green/red plot at successive iterations. The within-cluster spread of v_t shrinks from "smaller" to "very small" while the between-cluster gaps persist.]
PIC: Power Iteration Clustering
Run power iteration (repeated averaging with neighbors) with early stopping.
– v_0: a random start, or the "degree matrix" D, or …
– easy to implement, and efficient
– very easily parallelized
– experimentally, often better than traditional spectral methods
– surprising, since the embedded space is 1-dimensional!
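A minimal PIC sketch (the stopping rule below, watching for the per-entry velocity of v to flatten, follows the spirit of the slide; the constants are assumptions):

```python
import numpy as np

def pic_embedding(A, eps=1e-5, max_iters=1000):
    """A: (n, n) nonnegative affinity matrix.  Returns a 1-D embedding."""
    W = A / A.sum(axis=1, keepdims=True)       # W = D^-1 A
    v = A.sum(axis=1)
    v /= v.sum()                               # degree-based start
    prev_delta = None
    for _ in range(max_iters):
        v_new = W @ v
        v_new /= np.abs(v_new).sum()
        delta = np.abs(v_new - v)
        if prev_delta is not None and np.max(np.abs(delta - prev_delta)) < eps:
            return v_new                       # change has stabilized: stop early
        v, prev_delta = v_new, delta
    return v
```

The 1-D embedding is then clustered, e.g. with k-means over the values of v, to produce the final assignment.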
Harmonic Functions / CoEM / wvRN
Iteratively average each node's value with its neighbors', then replace v_{t+1}(i) with the seed value (+1/-1) for each labeled node i; run for 5-10 iterations. Classify data using the final values from v.
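A sketch of this clamped averaging (the matrix form and names are assumptions):

```python
import numpy as np

def harmonic_function_ssl(A, labels, iters=10):
    """A: (n, n) affinity matrix; labels: {node index: +1 or -1} seeds."""
    W = A / A.sum(axis=1, keepdims=True)       # row-normalized averaging matrix
    v = np.zeros(A.shape[0])
    for i, y in labels.items():
        v[i] = y
    for _ in range(iters):                     # 5-10 iterations, per the slide
        v = W @ v                              # average with neighbors
        for i, y in labels.items():
            v[i] = y                           # clamp the labeled seeds
    return np.sign(v)                          # classify by the sign of v
```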
Experiments
"Network" problems (natural graph structure):
– PolBooks: 105 political books, 3 classes, linked by co-purchaser
– UMBCBlog: 404 political blogs, 2 classes, blogroll links
– AGBlog: 1222 political blogs, 2 classes, blogroll links
"Manifold" problems (cosine distance between classification instances):
– Iris: 150 flowers, 3 classes
– PenDigits01, PenDigits17: 200 handwritten digits, 2 classes (0 vs. 1, or 1 vs. 7)
– 20ngA: 200 docs, misc.forsale vs. soc.religion.christian
– 20ngB: 400 docs, misc.forsale vs. soc.religion.christian
– 20ngC: 20ngB + 200 docs from talk.politics.guns
– 20ngD: 20ngC + 200 docs from rec.sport.baseball
Experimental results: best-case assignment of class labels to clusters
Why I'm talking about graphs
Lots of large data is graphs:
– Facebook, Twitter, citation data, and other social networks
– the web, the blogosphere, the semantic web, Freebase, Wikipedia, and other information networks
– text corpora (like RCV1), large datasets with discrete feature values, and other bipartite networks (nodes = documents or words; links connect documents to words and words to documents)
– computer networks; biological networks (proteins, ecosystems, brains, …); …
– heterogeneous networks with multiple types of nodes (people, groups, documents)
[Figure repeated: the CALO email graph from "Learning to Search Email" shown earlier.]
Simplest case: bipartite graphs
[Figure: the email graph reduced to a bipartite graph, with terms ("proposal", "CMU", "CALO", "graph") on one side and entities ("William") on the other.]
Outline
– background on spectral clustering
– "Power Iteration Clustering": motivation; experimental results
– analysis: PIC vs. spectral methods
– PIC for sparse bipartite graphs: "lazy" distance computation; "lazy" normalization; experimental results
Motivation: experimental datasets are…
"Network" problems with natural graph structure: PolBooks, UMBCBlog, AGBlog (as above); also Zachary's karate club, citation networks, …
"Manifold" problems with cosine distance between all pairs of classification instances: Iris, PenDigits, the 20ng subsets, … This gets expensive fast.
Spectral Clustering: Graph = Matrix
A v_1 = v_2 "propagates weights from neighbors".
[Figure: a 10-node graph (A..J) beside its 0/1 adjacency matrix A. Multiplying A by a vector v_1 sums each node's neighbors' values, e.g. v_2(A) = 2*1 + 3*1 + 0*1.]
Spectral Clustering: Graph = Matrix
W v_1 = v_2 "propagates weights from neighbors", where W = D^{-1} A is A normalized so that each row sums to 1 (D is the diagonal degree matrix, D[i,i] = degree(i)).
[Figure: the same graph with the weighted matrix W (entries such as .3 and .5); each node's new value is a weighted average of its neighbors' values, e.g. v_2(A) = 2*.5 + 3*.5 + 0*.3.]
Lazy computation of distances and normalizers
Recall PIC's update: v_t = W v_{t-1} = D^{-1} A v_{t-1}, where D is the diagonal degree matrix, with diagonal entries given by the row sums A 1 (1 is a column vector of 1's).
My favorite distance metric for text is length-normalized TFIDF:
– definition: A(i,j) = <v_i, v_j> / (||v_i|| ||v_j||), where <.,.> is the inner product and ||u|| is the L2 norm
– let N(i,i) = ||v_i||, and N(i,j) = 0 for i != j
– let F(i,k) = the TFIDF weight of word w_k in document v_i
– then A = N^{-1} F F^T N^{-1} (with F as the documents-by-words matrix)
Lazy computation of distances and normalizers (continued)
Recall PIC's update: v_t = W v_{t-1} = D^{-1} A v_{t-1}, where D is the diagonal degree matrix.
– let F(i,k) = the TFIDF weight of word w_k in document v_i
– compute N(i,i) = ||v_i|| (and N(i,j) = 0 for i != j)
– don't compute A = N^{-1} F F^T N^{-1} explicitly
– instead let D have diagonal N^{-1} F F^T N^{-1} 1, where 1 is an all-1's vector, computed as N^{-1}(F(F^T(N^{-1} 1))) for efficiency
– new update: v_t = D^{-1} A v_{t-1} = D^{-1} N^{-1} F F^T N^{-1} v_{t-1}
Equivalent to using TFIDF/cosine on all pairs of examples, but requires only sparse matrices.
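A sketch of this lazy step with a scipy.sparse TFIDF matrix (the orientation of F as documents-by-words, per its definition above, is the one assumption; A is never materialized):

```python
import numpy as np

def lazy_pic_step(F, v):
    """F: sparse (docs x words) TFIDF matrix; v: the current PIC vector.
    Computes v <- D^-1 N^-1 F F^T N^-1 v using only matrix-vector products."""
    norms = np.asarray(np.sqrt(F.multiply(F).sum(axis=1))).ravel()  # N(i,i)=||v_i||
    n_inv = 1.0 / norms
    d = n_inv * (F @ (F.T @ n_inv))            # degree vector: N^-1 F F^T N^-1 1
    return (n_inv * (F @ (F.T @ (n_inv * v)))) / d
```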
Experimental results
RCV1 text classification dataset:
– 800k+ newswire stories
– category labels from an industry vocabulary
– took single-label documents and categories with at least 500 instances
– result: 193,844 documents, 103 categories
Generated 100 random category pairs, each pair forming a task over all documents from the two categories:
– the pairs range in size and difficulty
– pick category 1, with m_1 examples; pick category 2 such that 0.5 m_1 < m_2 < 2 m_1
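A sketch of that pair-sampling procedure (names are illustrative; the slide specifies only the size constraint):

```python
import random

def sample_category_pairs(category_sizes, n_pairs=100):
    """category_sizes: {category: number of documents}.  Sample pairs whose
    second category has between 0.5x and 2x the examples of the first."""
    cats = list(category_sizes)
    pairs = []
    while len(pairs) < n_pairs:
        c1 = random.choice(cats)
        m1 = category_sizes[c1]
        ok = [c for c in cats
              if c != c1 and 0.5 * m1 < category_sizes[c] < 2 * m1]
        if ok:                                 # re-draw c1 if no partner fits
            pairs.append((c1, random.choice(ok)))
    return pairs
```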
Results
– NCUTevd: NCut with exact eigenvectors
– NCUTiram: NCut with the implicitly restarted Arnoldi method
– no statistically significant differences between NCUTevd and PIC
Results [figure].
Linear run-time implies a constant number of iterations.
The number of iterations to "acceleration-convergence" is hard to analyze:
– it is faster than a single complete run of power iteration to convergence
– on our datasets, 10-20 iterations is typical; 30-35 is exceptional
Implicit Manifolds on the NELL datasets
[Figure: the noun-phrase/context graph from earlier (Paris, San Francisco, Austin, Pittsburgh, Seattle; anxiety, denial, selfishness, arrogance; contexts like "live in arg1", "mayor of arg1", "arg1 is home of", "traits such as arg1"), with nodes "near" and "far from" the seeds.]
Using the Manifold Trick for SSL
A smoothing trick: [formula shown on the slide].
Using the Manifold Trick for SSL (continued)