
1 CS246: Latent Dirichlet Analysis
Junghoo "John" Cho, UCLA

2 LSI
LSI uses SVD to find the best rank-K approximation. The result is difficult to interpret, especially with negative numbers. Q: Can we develop a more interpretable method?

3 Probabilistic Approach
Develop a probabilistic model of how users write a document based on topics. Q: How do we write a document? A: (1) Pick the topic(s). (2) Start writing on the topic(s) with related terms.

4 Two Probability Vectors
For every document d, we assume that the user first picks the topics to write about.
P(z|d): the probability of picking topic z when the user writes each word in document d, with $\sum_{z=1}^{T} P(z|d) = 1$. This is the document-topic vector of d.
We also assume that every topic is associated with certain words with certain probabilities.
P(w|z): the probability of picking the word w when the user writes on topic z, with $\sum_{w=1}^{W} P(w|z) = 1$. This is the topic-word vector of z.

5 Probabilistic Topic Model
There exist T topics. The topic-word vector for each topic is set before any document is written: P(w|z) is set for every z and w.
Then, for every document d:
- The user decides the topics to write on, i.e., P(z|d).
- For each word in d:
  - The user selects a topic z with probability P(z|d).
  - The user selects a word w with probability P(w|z).
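A minimal sketch of this word-by-word process for a single document, assuming P(z|d) and P(w|z) are already given; the specific numbers and array names are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
p_z_d = np.array([0.9, 0.1])              # P(z|d) over topics z1, z2
p_w_z = np.array([[0.8, 0.1, 0.1],        # P(w|z1) over words w1, w2, w3
                  [0.1, 0.2, 0.7]])       # P(w|z2)

words = []
for _ in range(10):                        # write a 10-word document
    z = rng.choice(2, p=p_z_d)             # pick a topic for this word
    w = rng.choice(3, p=p_w_z[z])          # pick a word from that topic
    words.append((w, z))
print(words)                               # list of (word index, topic index) pairs
```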

6 Probabilistic Document Model
[Figure: two topics with their topic-word distributions P(w|z) (Topic 1: bank, loan, money; Topic 2: river, stream, bank) and three documents with their document-topic mixtures P(z|d). DOC 1 (weight 1.0 on Topic 1): "money bank loan bank money ...". DOC 2 (0.5/0.5 mix): "money river bank stream bank ...". DOC 3 (weight 1.0 on Topic 2): "river stream river bank stream ...". The superscript on each word shows which topic generated it.]

7 Example: Calculating Probability
z1 = {w1: 0.8, w2: 0.1, w3: 0.1}, z2 = {w1: 0.1, w2: 0.2, w3: 0.7}. d's topics are {z1: 0.9, z2: 0.1}. d has three terms: w3 (generated from z2), w1 (from z1), w2 (from z1). Q: What is the probability that a user will write such a document? A: (0.1*0.7)*(0.9*0.8)*(0.9*0.1)
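Checking the arithmetic with a short sketch, using the probabilities given above (each word contributes P(z|d) * P(w|z) for its generating topic):

```python
# w3 from z2, w1 from z1, w2 from z1
p = (0.1 * 0.7) * (0.9 * 0.8) * (0.9 * 0.1)
print(p)   # 0.004536
```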

8 Corpus Generation Probability
T: # topics, D: # documents, M: # words per document. Probability of generating the corpus C:
$P(C) = \prod_{i=1}^{D} \prod_{j=1}^{M} P(w_{i,j} \mid z_{i,j})\, P(z_{i,j} \mid d_i)$
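In code this product is usually computed in log space to avoid underflow. A minimal sketch, assuming each document is a list of (word, assigned topic) pairs like the ones generated in the earlier sketch:

```python
import numpy as np

def corpus_log_prob(corpus, p_z_d, p_w_z):
    """log P(C) = sum over documents i and word positions j of
    log P(w_ij | z_ij) + log P(z_ij | d_i)."""
    total = 0.0
    for i, doc in enumerate(corpus):          # doc: list of (word, topic) pairs
        for w, z in doc:
            total += np.log(p_w_z[z, w]) + np.log(p_z_d[i, z])
    return total
```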

9 Generative Model vs Inference (1)
[Same figure as slide 6: with the topic-word vectors P(w|z) and document-topic vectors P(z|d) known, the three documents are generated word by word.]

10 Generative Model vs Inference (2)
[The same three documents, but every probability and topic assignment is replaced with "?": in inference we observe only the documents and must recover P(w|z), P(z|d), and the topic assignment of each word.]

11 Probabilistic Latent Semantic Index (pLSI)
Basic idea: we pick the P(z_j|d_i), P(w_k|z_j), and z_ij values that maximize the corpus generation probability. This is maximum-likelihood estimation (MLE). More discussion later on how to compute the P(z_j|d_i), P(w_k|z_j), and z_ij values that maximize the probability.
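As a rough illustration of what this maximization can look like in practice (my sketch, not the lecture's own derivation), below are the standard EM updates for pLSI over a document-term count matrix; unlike the slide's wording, this formulation marginalizes over the per-word topic assignments z_ij instead of fixing them:

```python
import numpy as np

def plsi_em(counts, T, n_iter=100, seed=0):
    """EM for pLSI. counts: D x W document-term count matrix, T: # topics.
    Returns P(z|d) as a D x T matrix and P(w|z) as a T x W matrix."""
    rng = np.random.default_rng(seed)
    D, W = counts.shape
    p_z_d = rng.dirichlet(np.ones(T), size=D)          # P(z|d), rows sum to 1
    p_w_z = rng.dirichlet(np.ones(W), size=T)          # P(w|z), rows sum to 1
    for _ in range(n_iter):
        # E-step: responsibility P(z|d,w) proportional to P(w|z) * P(z|d)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]  # D x T x W
        resp = joint / joint.sum(axis=1, keepdims=True)
        # M-step: re-estimate both distributions from expected counts
        expected = counts[:, None, :] * resp           # D x T x W
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    return p_z_d, p_w_z
```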

12 Problem of pLSI
Q: 1M documents, 1000 topics, a vocabulary of 1M words. How much input data do we have? How many variables do we have to estimate? Q: Too much freedom. How can we avoid the overfitting problem? A: Add constraints to reduce the degrees of freedom.
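A rough count with the numbers above (my illustration, treating the 1M words as the vocabulary size):

```latex
\[
\underbrace{D \times T}_{P(z|d)\ \text{values}} + \underbrace{T \times W}_{P(w|z)\ \text{values}}
  = 10^{6} \times 10^{3} + 10^{3} \times 10^{6}
  = 2 \times 10^{9}\ \text{free parameters}
\]
```

This is on the order of (or more than) the number of word occurrences in the corpus itself, so the unconstrained model can essentially memorize the training data.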

13 Latent Dirichlet Analysis (LDA)
When term probabilities are selected for each topic, the topic-term probability vector (P(w1|zj), …, P(wW|zj)) is sampled randomly from a Dirichlet distribution. When users select topics for a document, the document-topic probability vector (P(z1|d), …, P(zT|d)) is sampled randomly from a Dirichlet distribution.

14 What is Dirichlet Distribution?
Multinomial distribution: given the probability pi of each event ei, what is the probability that each event ei occurs αi times after n trials? We assume the pi's; the distribution assigns probability to the αi's. Dirichlet distribution: the "inverse" of the multinomial distribution. We assume the αi's; the distribution assigns probability to the pi's.
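For concreteness, this is the standard multinomial probability the slide refers to, with $n = \sum_i \alpha_i$ trials (the slide's own formula image is not in the transcript):

```latex
\[
P(e_1 \text{ occurs } \alpha_1 \text{ times}, \dots, e_k \text{ occurs } \alpha_k \text{ times})
  = \frac{n!}{\alpha_1! \cdots \alpha_k!} \; p_1^{\alpha_1} \cdots p_k^{\alpha_k},
  \qquad n = \alpha_1 + \dots + \alpha_k .
\]
```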

15 Dirichlet Distribution
Q: Given α1, α2, …, αk, what are the most likely p1, p2, …, pk values?
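The slide's plot and formula are not in the transcript; for reference, the answer is the mode of the Dirichlet distribution, which (assuming all $\alpha_i > 1$) is:

```latex
\[
p_i^{*} = \frac{\alpha_i - 1}{\sum_{j=1}^{k} \alpha_j - k}, \qquad i = 1, \dots, k .
\]
```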

16 Normalized Probability Vector and Simplex Plane
When $(p_1, \dots, p_n)$ satisfies $p_1 + \dots + p_n = 1$, the points lie on an "(n-1)-simplex plane". Remember that $\sum_{z=1}^{T} P(z|d) = 1$ and $\sum_{w=1}^{W} P(w|z) = 1$. Example: $(p_1, p_2, p_3)$ and their 2-simplex plane. [Figure: the triangle p1 + p2 + p3 = 1 with corners on the three axes.]

17 Effect of α values
[Two plots of Dirichlet samples on the (p1, p2, p3) simplex for a particular choice of α values.]

18 Effect of α values
[Two more simplex plots for a different choice of α values.]

19 Effect of α values
[Two more simplex plots for a different choice of α values.]

20 Effect of α values
[Two more simplex plots for a different choice of α values.]

21 Minor Correction
The formula used on the earlier slides is not a standard Dirichlet distribution. The "standard" Dirichlet distribution formula: I used the non-standard form to make the connection to the multinomial distribution clear. From now on, we use the standard formula.
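The slide's formula image is not in the transcript; the standard Dirichlet density over $(p_1, \dots, p_k)$ with parameters $(\alpha_1, \dots, \alpha_k)$ is:

```latex
\[
\mathrm{Dir}(p_1,\dots,p_k;\ \alpha_1,\dots,\alpha_k)
  = \frac{\Gamma\!\left(\sum_{i=1}^{k}\alpha_i\right)}{\prod_{i=1}^{k}\Gamma(\alpha_i)}
    \prod_{i=1}^{k} p_i^{\,\alpha_i - 1},
  \qquad p_i \ge 0,\ \ \sum_{i=1}^{k} p_i = 1 .
\]
```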

22 Back to LDA Document Generation Model
For each topic z: pick the word probability vector P(w|z) by taking a random sample from Dir(β1, …, βW).
For every document d:
- The user decides its topic vector P(z|d) by taking a random sample from Dir(α1, …, αT).
- For each word in d:
  - The user selects a topic z with probability P(z|d).
  - The user selects a word w with probability P(w|z).
Once all is said and done, we have:
- P(w|z): topic-term vector for each topic
- P(z|d): document-topic vector for each document
- Topic assignment for every word in each document
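A hedged NumPy sketch of this generative process (not the lecture's code; the corpus sizes and hyperparameter values are made-up defaults for illustration):

```python
import numpy as np

def generate_corpus(T=2, W=6, D=3, doc_len=20, alpha=0.5, beta=0.1, seed=0):
    """Sample a toy corpus from the LDA generative model described above."""
    rng = np.random.default_rng(seed)
    # One topic-word vector P(w|z) per topic, sampled from Dir(beta, ..., beta)
    p_w_z = rng.dirichlet(np.full(W, beta), size=T)          # T x W
    docs, p_z_d, assignments = [], [], []
    for _ in range(D):
        # Document-topic vector P(z|d), sampled from Dir(alpha, ..., alpha)
        theta = rng.dirichlet(np.full(T, alpha))             # length T
        zs = rng.choice(T, size=doc_len, p=theta)            # topic per word
        ws = np.array([rng.choice(W, p=p_w_z[z]) for z in zs])
        p_z_d.append(theta)
        assignments.append(zs)
        docs.append(ws)
    return p_w_z, np.array(p_z_d), docs, assignments
```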

23 Symmetric Dirichlet Distribution
In principle, we need to assume two vectors, (α1, …, αT) and (β1, …, βW), as input parameters. In practice, we often assume all αi's are equal to a single α and all βi's to a single β, i.e., we use two scalar values α and β, not two vectors. This is the symmetric Dirichlet distribution. Q: What is the implication of this assumption?

24 Effect of α value on Symmetric Dirichlet
Q: What does it mean? How will the sampled document-topic vectors change as α grows? Common choice: α = 50/T, β = 200/W. [Two plots of Dirichlet samples on the (p1, p2, p3) simplex for a small and a large α.]
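A small sketch (my own illustration, standing in for the missing plots) of what the question is getting at: samples from a symmetric Dirichlet concentrate near the simplex corners (sparse topic mixtures) for small α and near the uniform center for large α:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 3
for alpha in (0.1, 1.0, 50.0):
    samples = rng.dirichlet(np.full(T, alpha), size=5)
    print(f"alpha = {alpha}")
    print(np.round(samples, 2))   # small alpha: peaked vectors; large alpha: near-uniform
```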

25 Plate Notation
[Plate diagram for LDA: α generates each document's topic vector P(z|d); for each of the N words in a document, a topic z is drawn from P(z|d) and a word w is drawn from P(w|z); β generates the T topic-word vectors P(w|z); the document plate repeats M times.]

