1
Introduction to LDA Jinyang Gao
2
Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter Setting
3
Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter Setting
4
Bayesian Analysis Suppose we have some coins that show FRONT with probability 0.75 on average. If we throw one of these coins, how should we estimate the outcome? FRONT: 0.75 BACK: 0.25 Prior Estimation
5
Bayesian Analysis Suppose we throw a coin 100 times and observe that 25 of the throws are FRONT. How should we estimate the next throw? FRONT: 0.25 BACK: 0.75 Maximum Likelihood Estimation
6
Bayesian Analysis Can we make a trade-off between the prior and the observations? The prior is NOT certain to be some fixed value. – Replace the point estimate 0.75 with a distribution Beta(u|15, 5) – Add the observations (5 FRONT, 15 BACK): Beta(u|15, 5) becomes Beta(u|20, 20) – Calculate the expectation, etc.
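A worked version of this update (standard Beta-Bernoulli conjugacy; the counts follow the slide's example):

```latex
\underbrace{\mathrm{Beta}(u \mid 15,\, 5)}_{\text{prior, mean }0.75}
\;\xrightarrow{\;5\ \text{FRONT},\ 15\ \text{BACK}\;}\;
\mathrm{Beta}(u \mid 15+5,\; 5+15) = \mathrm{Beta}(u \mid 20,\, 20),
\qquad
\mathbb{E}[u] = \frac{20}{20+20} = 0.5 .
```

The posterior mean 0.5 sits between the prior estimate 0.75 and the maximum likelihood estimate 0.25, which is exactly the trade-off the slide asks for.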
7
Bayesian Analysis Key idea: – Express the uncertainty of the prior estimate as a distribution. – The distribution converges to a single value as more and more observations arrive. – Few observations: the estimate stays close to the prior. – Many observations: the estimate is dominated by the data. – If we are completely confident in the prior (a single-value prior), no amount of observation will change the estimate.
8
Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter Setting
9
Dirichlet Distribution
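The density and the conjugate update below are the standard definitions, added here for reference rather than transcribed from the original slide:

```latex
\mathrm{Dir}(\theta \mid \alpha_1,\dots,\alpha_K)
  = \frac{\Gamma\!\bigl(\sum_{k} \alpha_k\bigr)}{\prod_{k} \Gamma(\alpha_k)}
    \prod_{k=1}^{K} \theta_k^{\,\alpha_k - 1},
\qquad \theta_k \ge 0,\quad \sum_{k} \theta_k = 1 .
```

Observing counts n_1, ..., n_K drawn from Mult(θ) gives the posterior Dir(θ | α_1 + n_1, ..., α_K + n_K): the α_k act as pseudo-counts, which is the smoothing role the summary slide attributes to the Dirichlet.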
11
Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter Setting
12
Evolution of Topic Model Here we trace a sequence of solutions, from naive to LDA. – K-means (TF vector version) – K-means with KL-divergence (language model version) – PLSA (fixed topic frequency prior) – LDA (topic frequency observations plus smoothing)
13
Evolution of Topic Model K-means with TF vectors: – We begin with the simplest model. – Just cluster the documents! – Each document is a vector of term frequencies. – How to cluster? K-means! – Each cluster is a topic. – Each topic is a TF vector (the cluster centroid).
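A minimal sketch of this baseline, assuming scikit-learn is available; the toy corpus and variable names are illustrative only:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

docs = ["the cat sat on the mat",
        "dogs and cats are pets",
        "stocks fell as markets closed",
        "the market rallied on earnings"]

# Each document becomes a term-frequency (TF) vector.
tf = CountVectorizer().fit_transform(docs)

# Cluster the TF vectors; each centroid is read as a "topic".
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(tf)
print(km.labels_)           # cluster (topic) assignment per document
print(km.cluster_centers_)  # each row: a "topic" as a TF vector
```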
14
Evolution of Topic Model Problems of K-means with TF vectors: – High-frequency words dominate (IDF, log-TF weighting and stop-word removal help somewhat). – Correlations among words are ignored. – Clusters often form around a single word rather than a topic (implement it and you will see).
15
Evolution of Topic Model K-means with KL-divergence: – A generative (language model) view of text. – Each document is a probability distribution over words. – We still just cluster the documents. – K-means, but with KL-divergence instead of cosine or Euclidean distance. – Each cluster is a topic. – Each topic is a distribution over words.
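A sketch of the assignment step with KL-divergence replacing Euclidean distance, assuming documents and cluster "topics" are already normalized word distributions; the smoothing constant is an illustrative choice to keep the KL finite:

```python
import numpy as np

def kl(p, q, eps=1e-9):
    # KL(p || q) between word distributions; eps avoids log(0) and division by zero.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def assign(doc_dists, topic_dists):
    # Assign each document to the topic (cluster) with the smallest KL-divergence.
    return np.array([np.argmin([kl(d, t) for t in topic_dists])
                     for d in doc_dists])

# The update step then averages the word distributions of each cluster's documents,
# so every topic remains a distribution over words.
```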
16
Evolution of Topic Model Problems of K-means with KL-divergence: – Much better: some topics appear. – But they are still not clear. – Does each document really have only one topic? – It is still just a good clustering method for documents.
17
Evolution of Topic Model PLSA/PLSI: – Each document is a probability distribution over words. – Each document is also a distribution over topics. – Topics and words are assigned probabilistically (fit with EM). – Each cluster is a topic (but no document belongs entirely to one cluster). – Each topic is a distribution over words.
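The corresponding generative equation (the standard PLSA formulation, not copied from the slide) makes the per-document "distribution of topics" explicit:

```latex
P(w \mid d) = \sum_{z=1}^{K} P(z \mid d)\, P(w \mid z),
\qquad
\log \mathcal{L} = \sum_{d} \sum_{w} n(d, w)\, \log P(w \mid d),
```

where both P(z|d) and P(w|z) are estimated by EM so as to maximize the log-likelihood.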
18
Evolution of Topic Model Problems of PLSA: – It is the first usable topic model in this evolution! – General (background) words? Context information? See the work of QZ Mei from 2005-2008. – What about the K from K-means? – Topics are not all the same size. – Can two topics with the same distribution merge? – Can a large topic split?
19
Evolution of Topic Model LDA: – Places a prior distribution over topic proportions. – Moves from maximum likelihood estimation (MLE) to Bayesian analysis of word-to-topic assignments. – The Dirichlet distribution is the easiest (conjugate) choice. – Gives a complete Bayesian treatment.
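A sketch of the standard LDA generative process with its Dirichlet priors (usual notation; α and β are the hyperparameters discussed in the parameter-setting section):

```latex
\theta_d \sim \mathrm{Dir}(\alpha), \qquad
\phi_k \sim \mathrm{Dir}(\beta), \qquad
z_{d,n} \sim \mathrm{Mult}(\theta_d), \qquad
w_{d,n} \sim \mathrm{Mult}(\phi_{z_{d,n}}).
```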
20
Evolution of Topic Model Analysis of LDA: – Small topics tend to disappear (even a document at a small topic's center has a higher probability of being claimed by a large nearby topic), so K is effectively self-adapting. – The topic-word distributions are smoothed.
21
What About Short Text? Consider the following: – Many documents have only one meaningful word. – How many words are enough to form a topic? – 'blue' and 'red' rarely co-occur in a short text; instead we see "blue plane" or "red car". – ……
22
Evolution of Topic Model These are only some milestones along this line of evolution. Small changes can give quite different results. – Text weighting – General words – Probabilistic clustering – Hyperparameters – Context information – Hierarchy
23
Evolution of Topic Model You SHOULD implement ALL of them if you want a deep understanding of topic models! – I implemented all of them, on both long and short text, during my undergraduate studies. The code is simple and the data is easy to obtain. – Inspect some topics (and how they change across iterations) and figure out why they work well or badly. – You will learn more about each modelling decision during inference, and much of the derivation is not difficult to turn into code.
24
Evolution of Topic Model You should understand why some models are RIGHT, rather than merely observe that they perform well in experiments. Otherwise you cannot tell which model is RIGHT for your own problem (where some features have usually changed). Study the characteristics of the models, the data and the targets carefully. Use Occam's Razor when developing your model.
25
Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter Setting
26
Gibbs Sampling Gibbs sampling: – Key idea: if all the other parameters are already decided, then deciding one new thing is easy. – Choose one variable (e.g. one word's topic). – Fix all the others. – Sample (do not optimize) it conditioned on the others. – Loop until convergence.
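As a concrete instance, a minimal sketch of one sweep of the collapsed Gibbs sampler for LDA, assuming the count matrices n_dk (document-topic), n_kw (topic-word) and the vector n_k (topic totals) are maintained alongside the assignments; the function and argument names are illustrative, not from the slides:

```python
import numpy as np

def gibbs_sweep(z, docs, n_dk, n_kw, n_k, alpha, beta, V):
    """One sweep of collapsed Gibbs sampling for LDA (illustrative sketch).

    z[d][i] : current topic of word i in document d
    docs[d] : list of word ids in document d
    """
    K = n_kw.shape[0]
    for d, words in enumerate(docs):
        for i, w in enumerate(words):
            k = z[d][i]
            # Remove this word's current assignment ("fix all the others").
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
            # Conditional distribution given every other assignment.
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            # Sample (do not optimize) a new topic and restore the counts.
            k = np.random.choice(K, p=p / p.sum())
            z[d][i] = k
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
```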
27
Gibbs Sampling Please read the paper carefully for the details; it is easy-to-follow material on Gibbs sampling for LDA.
28
Gibbs Sampling EM: – Fix all parameters or settings. – Compute the best (likelihood-maximizing) values for all parameters or settings. – Change to the new settings. – Loop until convergence.
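For concreteness, the EM updates for PLSA (the standard derivation, not taken from the slide) follow exactly this fix-compute-update loop:

```latex
\text{E-step:}\quad
P(z \mid d, w) = \frac{P(z \mid d)\, P(w \mid z)}{\sum_{z'} P(z' \mid d)\, P(w \mid z')},
\qquad
\text{M-step:}\quad
P(w \mid z) \propto \sum_{d} n(d, w)\, P(z \mid d, w),
\quad
P(z \mid d) \propto \sum_{w} n(d, w)\, P(z \mid d, w).
```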
29
Gibbs Sampling Neither Gibbs sampling nor EM gives the exact best estimate. The exact best estimate would compute the expectation of each random variable over all possible configurations (exponentially many), not its optimized or sampled value given the current state. But so far these are the best we can do; in my personal view neither method is strictly better than the other.
30
Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter Settings
31
Parameter Settings
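A minimal sketch of heuristic defaults commonly cited in the LDA literature (e.g. α = 50/K, β = 0.1); these are assumptions for illustration, not values taken from the original slides:

```python
# Heuristic LDA hyperparameter defaults often cited in the literature
# (e.g. alpha = 50/K, beta = 0.1); always tune per corpus and task.
K = 100            # number of topics: usually chosen via held-out likelihood/perplexity
alpha = 50.0 / K   # document-topic prior: smaller alpha -> fewer topics per document
beta = 0.1         # topic-word prior: smaller beta -> more peaked, fine-grained topics
```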
34
Summary Bayesian Analysis: Prior-Observation Trade-off Dirichlet Distribution: Smoothing Method Topic Model Evolution: Why It Works Well Gibbs and EM: Latent Variable Inference Methods Parameter Setting: How Many Topics, and How Many Words in a Topic
35
THANKS Q&A