Slide 1: CS246: Latent Dirichlet Allocation (LDA)
Junghoo "John" Cho, UCLA
Slide 2: LSI
LSI uses SVD to find the best rank-K approximation of the term-document matrix.
The result is difficult to interpret, especially because of the negative values in the factor matrices.
Q: Can we develop a more interpretable method?
Slide 3: Probabilistic Approach
Develop a probabilistic model of how users write a document based on topics.
Q: How do we write a document?
A: (1) Pick the topic(s). (2) Start writing on the topic(s) with related terms.
Slide 4: Two Probability Vectors
For every document d, we assume that the user first picks the topics to write about:
- P(z|d): the probability of picking topic z when the user writes each word in document d, with $\sum_{z=1}^{T} P(z|d) = 1$. This is the document-topic vector of d.
We also assume that every topic is associated with certain words with certain probabilities:
- P(w|z): the probability of picking word w when the user writes on topic z, with $\sum_{w=1}^{W} P(w|z) = 1$. This is the topic-word vector of z.
Slide 5: Probabilistic Topic Model
There exist T topics. The topic-word vector of each topic is set before any document is written: P(w|z) is fixed for every z and w.
Then, for every document d:
- The user decides the topics to write on, i.e., P(z|d).
- For each word in d:
  - The user selects a topic z with probability P(z|d).
  - The user selects a word w with probability P(w|z).
(A sketch of this generation procedure appears after this list.)
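As a minimal sketch of this procedure (my own illustration, not the course's code; the probability values are made-up assumptions), with fixed P(z|d) and P(w|z) vectors:

```python
import random

# Hand-picked topic-word vectors P(w|z) and a document-topic vector
# P(z|d); all numbers here are illustrative assumptions.
topic_word = {"z1": {"bank": 0.4, "loan": 0.3, "money": 0.3},
              "z2": {"river": 0.4, "stream": 0.3, "bank": 0.3}}
doc_topic = {"z1": 0.5, "z2": 0.5}

def generate_word():
    # Step 1: select a topic z with probability P(z|d).
    z = random.choices(list(doc_topic), weights=list(doc_topic.values()))[0]
    # Step 2: select a word w with probability P(w|z).
    words = topic_word[z]
    return random.choices(list(words), weights=list(words.values()))[0]

doc = [generate_word() for _ in range(5)]
print(doc)  # e.g. ['money', 'river', 'bank', 'bank', 'stream']
```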
Slide 6: Probabilistic Document Model
[Figure: two topics and three documents. Topic 1 has topic-word probabilities P(w|z) over {bank, loan, money}; Topic 2 over {river, stream, bank}. DOC 1 takes all of its words from Topic 1 (P(z|d) = 1.0): "money bank loan bank money ...". DOC 2 mixes the two topics equally (0.5 each): "money river bank stream bank ...". DOC 3 takes all of its words from Topic 2 (1.0): "river stream river bank stream ...". The superscript on each word in the original figure marks the topic that generated it.]
Slide 7: Example: Calculating Probability
z1 = {w1: 0.8, w2: 0.1, w3: 0.1}
z2 = {w1: 0.1, w2: 0.2, w3: 0.7}
d's topics are {z1: 0.9, z2: 0.1}. d has three terms: w3 (generated by z2), w1 (by z1), and w2 (by z1).
Q: What is the probability that a user will write such a document?
A: (0.1 × 0.7) × (0.9 × 0.8) × (0.9 × 0.1)
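The same calculation as a small Python sketch (the data-structure layout is mine; only the numbers come from the slide):

```python
# Topic-word vectors P(w|z) and the document-topic vector P(z|d)
# from the example above.
topic_word = {
    "z1": {"w1": 0.8, "w2": 0.1, "w3": 0.1},
    "z2": {"w1": 0.1, "w2": 0.2, "w3": 0.7},
}
doc_topic = {"z1": 0.9, "z2": 0.1}

# Each term is (word, topic that generated it), as in the slide.
doc = [("w3", "z2"), ("w1", "z1"), ("w2", "z1")]

prob = 1.0
for word, topic in doc:
    # Multiply P(z|d) * P(w|z) for every word in the document.
    prob *= doc_topic[topic] * topic_word[topic][word]

print(prob)  # (0.1*0.7) * (0.9*0.8) * (0.9*0.1) ≈ 0.004536
```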
Slide 8: Corpus Generation Probability
T: # of topics, D: # of documents, N: # of words per document.
Probability of generating the corpus C:
$P(C) = \prod_{i=1}^{D} \prod_{j=1}^{N} P(w_{i,j} \mid z_{i,j}) \, P(z_{i,j} \mid d_i)$
where w_{i,j} is the j-th word of document i and z_{i,j} is the topic assigned to it.
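For a whole corpus this product underflows quickly, so in practice one would compute it in log space. A minimal sketch under the same assumptions as the earlier example (the function and data layout are my own):

```python
import math

def corpus_log_prob(corpus, doc_topic, topic_word):
    """Log of P(C) = prod_i prod_j P(w_ij|z_ij) * P(z_ij|d_i).

    corpus[i] is a list of (word, assigned_topic) pairs for document i;
    doc_topic[i] is the P(z|d) vector of document i."""
    log_p = 0.0
    for i, doc in enumerate(corpus):
        for word, z in doc:
            log_p += math.log(doc_topic[i][z]) + math.log(topic_word[z][word])
    return log_p
```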
Slide 9: Generative Model vs Inference (1)
[Figure: the same two-topic, three-document picture as Slide 6. In the generative direction, P(w|z), P(z|d), and every word's topic assignment are known.]
Slide 10: Generative Model vs Inference (2)
[Figure: the same three documents, but every probability and topic assignment is now a "?". Given only the observed words, we must infer P(w|z), P(z|d), and the topic that generated each word.]
Slide 11: Probabilistic Latent Semantic Indexing (pLSI)
Basic idea: pick the P(z_k|d_i), P(w_j|z_k), and z_{i,j} values that maximize the corpus generation probability.
This is maximum-likelihood estimation (MLE).
More discussion later on how to compute the P(z_k|d_i), P(w_j|z_k), and z_{i,j} values that maximize this probability.
Slide 12: Problem of pLSI
Q: Say we have 1M documents, 1,000 topics, and a 1M-word vocabulary. How much input data do we have? How many variables do we have to estimate?
Q: That is too much freedom. How can we avoid the overfitting problem?
A: Add constraints to reduce the degrees of freedom.
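To make the counting concrete (the slide's words-per-document figure is garbled in this copy, so only the parameter count is shown; D, T, W follow the numbers above):

```latex
% D = 10^6 documents, T = 10^3 topics, W = 10^6 vocabulary words.
% Free parameters to estimate:
\[
\underbrace{D \cdot T}_{P(z_k \mid d_i)} + \underbrace{T \cdot W}_{P(w_j \mid z_k)}
  = 10^6 \cdot 10^3 + 10^3 \cdot 10^6 = 2 \times 10^9,
\]
% plus one topic assignment z_{i,j} per word token, so the model has
% roughly as many free variables as there are data points --
% hence the overfitting concern.
```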
Slide 13: Latent Dirichlet Allocation (LDA)
When the term probabilities are selected for each topic, the topic-term probability vector (P(w1|zj), …, P(wW|zj)) is sampled randomly from a Dirichlet distribution.
When users select the topics for a document, the document-topic probability vector (P(z1|d), …, P(zT|d)) is sampled randomly from a Dirichlet distribution.
Slide 14: What is the Dirichlet Distribution?
Multinomial distribution: given the probability p_i of each event e_i, what is the probability that each event e_i occurs α_i times after n trials? We assume the p_i's; the distribution assigns a probability to the α_i's.
Dirichlet distribution: the "inverse" of the multinomial distribution. We assume the α_i's; the distribution assigns a probability to the p_i's.
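A small numerical illustration of the two directions (my own, using NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)

# Multinomial: fix the probabilities p_i, sample random counts after n trials.
p = [0.8, 0.1, 0.1]
counts = rng.multinomial(10, p)       # e.g. [8, 1, 1]

# Dirichlet: fix the parameters alpha_i, sample a random probability vector.
alpha = [8.0, 1.0, 1.0]
probs = rng.dirichlet(alpha)          # e.g. [0.82, 0.08, 0.10]; sums to 1

print(counts, probs, probs.sum())
```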
Slide 15: Dirichlet Distribution
Q: Given α1, α2, …, αk, what are the most likely p1, p2, …, pk values?
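For reference (the original slide likely answered this with a plot): the most likely values are the mode of the Dirichlet distribution, which has a closed form when every αi > 1:

```latex
\[
p_i^{*} = \frac{\alpha_i - 1}{\sum_{j=1}^{k} \alpha_j - k},
\qquad \text{valid when } \alpha_i > 1 \text{ for all } i.
\]
```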
Slide 16: Normalized Probability Vector and Simplex Plane
When (p1, …, pn) satisfies p1 + … + pn = 1, the points lie on an "(n−1)-simplex plane".
Remember that $\sum_{z=1}^{T} P(z|d) = 1$ and $\sum_{w=1}^{W} P(w|z) = 1$, so both kinds of vectors live on such a plane.
Example: (p1, p2, p3) and their 2-simplex plane. [Figure: the triangle with vertices p1 = 1, p2 = 1, and p3 = 1.]
Slides 17–20: Effect of α values
[Figures: the Dirichlet density over the 2-simplex (axes p1, p2, p3) under several different α settings, one setting per slide.]
Slide 21: Minor Correction
The formula shown earlier is not the standard Dirichlet distribution. The standard Dirichlet density is
$f(p_1, \ldots, p_k; \alpha_1, \ldots, \alpha_k) = \frac{1}{B(\alpha)} \prod_{i=1}^{k} p_i^{\alpha_i - 1}$,
where B(α) is the multivariate Beta function. The non-standard form was used to make the connection to the multinomial distribution clear. From now on, we use the standard formula.
Slide 22: Back to the LDA Document Generation Model
For each topic z:
- Pick the word probability vector P(w|z) by taking a random sample from Dir(β1, …, βW).
For every document d:
- The user decides its topic vector P(z|d) by taking a random sample from Dir(α1, …, αT).
- For each word in d:
  - The user selects a topic z with probability P(z|d).
  - The user selects a word w with probability P(w|z).
Once all is said and done, we have:
- P(w|z): the topic-word vector of each topic
- P(z|d): the document-topic vector of each document
- a topic assignment for every word in each document
(A runnable sketch of this process follows below.)
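A runnable sketch of the generative process just described (my own code, not the course's; sizes and hyperparameters are arbitrary, and symmetric priors are used for brevity):

```python
import numpy as np

rng = np.random.default_rng(42)
T, W, D, N = 2, 5, 3, 8          # topics, vocabulary, documents, words/doc
alpha, beta = 0.5, 0.5           # symmetric Dirichlet hyperparameters

# One topic-word vector P(w|z) per topic, each sampled from Dir(beta).
topic_word = rng.dirichlet([beta] * W, size=T)   # shape (T, W)

corpus, assignments = [], []
for _ in range(D):
    doc_topic = rng.dirichlet([alpha] * T)       # P(z|d) ~ Dir(alpha)
    zs = rng.choice(T, size=N, p=doc_topic)      # a topic per word
    ws = [rng.choice(W, p=topic_word[z]) for z in zs]  # a word per topic
    corpus.append(ws)
    assignments.append(list(zs))
```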
Slide 23: Symmetric Dirichlet Distribution
In principle, we need to assume two vectors, (α1, …, αT) and (β1, …, βW), as input parameters.
In practice, we often assume that all αi's are equal to a single α and all βi's are equal to a single β, i.e., we use two scalar values α and β instead of two vectors. This is the symmetric Dirichlet distribution.
Q: What is the implication of this assumption?
Slide 24: Effect of the α value on a Symmetric Dirichlet
Q: What does it mean? How will the sampled document-topic vectors change as α grows? (See the sketch below.)
Common choice: α = 50/T, β = 200/W.
[Figures: Dirichlet densities over the 2-simplex (axes p1, p2, p3) for a small and a large symmetric α.]
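A quick experiment on that question (my own code): as α grows, samples from the symmetric Dirichlet move toward the uniform vector (1/T, …, 1/T); as α shrinks below 1, they concentrate on a few topics.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 5
for alpha in [0.1, 1.0, 50 / T]:
    sample = rng.dirichlet([alpha] * T)
    print(f"alpha = {alpha:4.1f} ->", np.round(sample, 3))
# Typical output: alpha = 0.1 puts almost all mass on one topic (sparse),
# while alpha = 10.0 (= 50/T) yields a nearly uniform topic vector.
```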
Slide 25: Plate Notation
[Plate diagram: α is the Dirichlet prior on each document's topic vector P(z|d), from which a topic z is drawn for each word; w is then drawn from the topic-word vector P(w|z), whose Dirichlet prior is β. The word plate (z, w) repeats N times per document, the document plate repeats M times, and the topic-word plate repeats T times.]