Presentation is loading. Please wait.

Presentation is loading. Please wait.

Speeding up LDA.

Similar presentations


Presentation on theme: "Speeding up LDA."— Presentation transcript:

1 Speeding up LDA

2 The LDA Topic Model

3 Unsupervised NB vs LDA different class distrib θ for each doc
one class prior Y Nd D θd γk K α β Zdi Y Nd D π W γ K α β one Y per doc one Z per word Wdi

4 LDA’s view of a document
Mixed membership model

5 LDA topics: top words w by Pr(w|Z=k)

6 Parallel LDA

7 JMLR 2009

8 Observation How much does the choice of z depend on the other z’s in the same document? quite a lot How much does the choice of z depend on the other z’s in elsewhere in the corpus? maybe not so much depends on Pr(w|t) but that changes slowly Can we parallelize Gibbs and still get good results?

9 Question Can we parallelize Gibbs sampling?
formally, no: every choice of z depends on all the other z’s Gibbs needs to be sequential just like SGD

10 What if you try and parallelize?
Split document/term matrix randomly and distribute to p processors .. then run “Approximate Distributed LDA”

11 What if you try and parallelize?
D=#docs W=#word(types) K=#topics N=words in corpus

12

13

14

15 Update c. 2014 Algorithms: Distributed variational EM
Asynchronous LDA (AS-LDA) Approximate Distributed LDA (AD-LDA) Ensemble versions of LDA: HLDA, DCM-LDA Implementations: GitHub Yahoo_LDA not Hadoop, special-purpose communication code for synchronizing the global counts Alex Smola, YahooCMU Mahout LDA Andy Schlaikjer, CMUTwitter

16 Faster Sampling for LDA

17 RECAP Way way more detail

18 RECAP More detail

19 RECAP

20 RECAP

21 You spend a lot of time sampling
RECAP z=1 z=2 random z=3 unit height You spend a lot of time sampling There’s a loop over all topics here in the sampler

22 KDD 09

23 z=s+r+q

24 lookup U on line segment for r
If U<s: lookup U on line segment with tic-marks at α1β/(βV + n.|1), α2β/(βV + n.|2), … If s<U<r: lookup U on line segment for r Only need to check t such that nt|d>0 z=s+r+q

25 lookup U on line segment for r If s+r<U:
If U<s: lookup U on line segment with tic-marks at α1β/(βV + n.|1), α2β/(βV + n.|2), … If s<U<s+r: lookup U on line segment for r If s+r<U: lookup U on line segment for q Only need to check t such that nw|t>0 z=s+r+q

26 Only need to check occasionally (< 10% of the time)
Only need to check t such that nt|d>0 Only need to check t such that nw|t>0 z=s+r+q

27 Trick; count up nt|d for d when you start working on d and update incrementally
Only need to store (and maintain) total words per topic and α’s,β,V Only need to store nt|d for current d Need to store nw|t for each word, topic pair …??? z=s+r+q

28 1. Precompute, for each t, 2. Quickly find t’s such that nw|t is large for w Most (>90%) of the time and space is here… Need to store nw|t for each word, topic pair …??? z=s+r+q

29 1. Precompute, for each t, 2. Quickly find t’s such that nw|t is large for w map w to an int array no larger than frequency w no larger than #topics encode (t,n) as a bit vector n in the high-order bits t in the low-order bits keep ints sorted in descending order Most (>90%) of the time and space is here… Need to store nw|t for each word, topic pair …???

30

31 Other Fast Samplers for LDA

32 Alias tables O(K) Basic problem: how can we sample from a multinomial quickly? If the distribution changes slowly maybe we can do some preprocessing and then sample multiple times. Proof of concept: generate r~uniform and use a binary tree r in (23/40,7/10] O(log2K)

33 Alias tables Another idea… Simulate the dart with two drawn values:
rx  int(u1*K) ry  u1*pmax keep throwing till you hit a stripe

34 Alias tables An even more clever idea: minimize the brown space (where the dart “misses”) by sizing the rectangle’s height to the average probability, not the maximum probability, and cutting and pasting a bit. You can always do this using only two colors in each column of the final alias table and the dart never misses! mathematically speaking…

35 Alias sampling for LDA You can sample quickly….
But you need to regenerate the alias table after every topic switch Workaround: Sample from an old, stale alias table Update it periodically Compensate for the stale parameters by replacing Gibbs sampling with Metropolis-Hastings (MH)

36 Alias sampling for LDA MH sampler: q defined by stale alias sampler
we can sample quickly and easily from q we can evaluate p(i) easily p is based on actual counts here i and j are vectors of topic assignments

37

38 Yet More Fast Samplers for LDA

39

40 Fenwick Tree (1994) O(K) Basic problem: how can we sample from a multinomial quickly? If the distribution changes slowly maybe we can do some preprocessing and then sample multiple times. Proof of concept: generate r~uniform and use a binary tree r in (23/40,7/10] O(log2K)

41 Data structures and algorithms
LSearch: linear search

42 Data structures and algorithms
BSearch: binary search store cumulative probability

43 Data structures and algorithms
Alias sampling…..

44 Data structures and algorithms
Fenwick tree

45 Data structures and algorithms
Fenwick tree Binary search βq: dense, changes slowly, re-used for each word in a document r: sparse, a different one is needed for each uniq term in doc Sampler is:

46 Speedup vs std LDA sampler (1024 topics)

47 Speedup vs std LDA sampler (10k-50k topics)

48 And Parallelism….

49 Second idea: you can sample document-by-document or word-by-word …. or….
use a MF-like approach to distributing the data.

50

51 Multi-core NOMAD method

52 On Beyond LDA

53 Network Datasets UBMCBlog AGBlog MSPBlog Cora Citeseer

54 Motivation Social graphs seem to have some aspects of randomness
small diameter, giant connected components,.. some structure homophily, scale-free degree dist? How do you model it?

55 More terms “Stochastic block model”, aka “Block-stochastic matrix”:
Draw ni nodes in block i With probability pij, connect pairs (u,v) where u is in block i, v is in block j Special, simple case: pii=qi, and pij=s for all i≠j Question: can you fit this model to a graph? find each pij and latent nodeblock mapping

56 Not? football

57 Not? books Artificial graph

58 Stochastic Block models: assume 1) nodes w/in a block z and 2) edges between blocks zp,zq are exchangeable a b zp zp zq p apq N N2

59 Another mixed membership block model

60 Another mixed membership block model
z=(zi,zj) is a pair of block ids nz = #pairs z qz1,i = #links to i from block z1 qz1,. = #outlinks in block z1 δ = indicator for diagonal M = #nodes

61 Experiments Balasubramanyan, Lin, Cohen, NIPS w/s 2010


Download ppt "Speeding up LDA."

Similar presentations


Ads by Google