Topic Model: Latent Dirichlet Allocation
Ouyang Ruofei, May
Introduction
Parameters: the latent pattern the model learns.
Inference: data = latent pattern + noise
Introduction
Parametric model: the number of parameters is fixed w.r.t. sample size.
Nonparametric model: the number of parameters grows with sample size; infinite-dimensional parameter space.

Problem and its parameter:
- Density estimation: distributions
- Regression: functions
- Clustering: partitions
Clustering
Clusters: 1. Ironman 2. Thor 3. Hulk
Indicator variable for each data point
Dirichlet process
Ironman: 3 times. Thor: 2 times. Hulk: 2 times.
Without the likelihood, we already know that:
1. There are three clusters.
2. The empirical distribution over the three clusters.
What happens when new data arrives?
Dirichlet process
Dirichlet distribution: $(\pi_1,\dots,\pi_K) \sim \mathrm{Dir}(\alpha_1,\dots,\alpha_K)$
pdf: $p(\pi_1,\dots,\pi_K) = \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)} \prod_{k=1}^{K} \pi_k^{\alpha_k - 1}$
mean: $\mathbb{E}[\pi_k] = \frac{\alpha_k}{\sum_j \alpha_j}$
Example: Dir(Ironman, Thor, Hulk)
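As a sketch of the mean formula above (the pseudo counts are the slide's Avengers example; sampling a Dirichlet by normalizing Gamma draws is a standard trick, not something the slides show):

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dir(alpha) by normalizing Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)
alpha = [3.0, 2.0, 2.0]  # pseudo counts for Ironman, Thor, Hulk
samples = [sample_dirichlet(alpha, rng) for _ in range(20000)]

# The empirical mean should approach alpha_k / sum(alpha) = (3/7, 2/7, 2/7)
means = [sum(s[k] for s in samples) / len(samples) for k in range(3)]
```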
Dirichlet process
The Dirichlet distribution is the conjugate prior of the multinomial distribution.
Posterior: after observing counts $(n_1,\dots,n_K)$, the posterior is $\mathrm{Dir}(\alpha_1 + n_1,\dots,\alpha_K + n_K)$.
Example: prior pseudo counts (Ironman, Thor, Hulk) = (3, 2, 2); adding the likelihood counts gives the posterior pseudo counts.
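A minimal sketch of this conjugate update (the observation counts are hypothetical, only the prior pseudo counts come from the slide):

```python
def dirichlet_posterior(prior, counts):
    """Posterior pseudo counts of a Dirichlet prior after multinomial counts."""
    return {k: prior[k] + counts.get(k, 0) for k in prior}

def posterior_mean(alpha):
    """Mean of a Dirichlet distribution: alpha_k / sum(alpha)."""
    total = sum(alpha.values())
    return {k: v / total for k, v in alpha.items()}

prior = {"Ironman": 3, "Thor": 2, "Hulk": 2}   # pseudo counts from the slide
counts = {"Ironman": 1, "Hulk": 2}             # hypothetical new observations
post = dirichlet_posterior(prior, counts)      # {"Ironman": 4, "Thor": 2, "Hulk": 4}
```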
Dirichlet process
In our Avengers model, K = 3 (Ironman, Thor, Hulk).
However, when a new, unseen character comes along, a Dirichlet distribution with fixed K cannot model him.
Dirichlet process: K = infinity.
Nonparametric here means an infinite number of clusters.
Dirichlet process
$G \sim \mathrm{DP}(\alpha, G_0)$
α: concentration parameter (pseudo counts in each cluster)
$G_0$: base distribution of each cluster (a distribution template)
A Dirichlet process is a distribution over distributions: given any partition $(A_1,\dots,A_K)$ of the space,
$(G(A_1),\dots,G(A_K)) \sim \mathrm{Dir}(\alpha G_0(A_1),\dots,\alpha G_0(A_K))$
Dirichlet process
Construct the Dirichlet process by the Chinese restaurant process (CRP).
In a restaurant, there are an infinite number of tables.
Customer 1 sits at an unoccupied table with probability 1.
Customer N sits at table k with probability $\frac{n_k}{N-1+\alpha}$, and at a new table with probability $\frac{\alpha}{N-1+\alpha}$.
Dirichlet process
Customers: data. Tables: clusters.
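The seating rule above can be simulated directly; this sketch (function name and parameters are illustrative) draws table assignments for N customers:

```python
import random

def crp(n_customers, alpha, seed=0):
    """Simulate a Chinese restaurant process.
    Customer n joins existing table k with prob n_k / (n - 1 + alpha),
    or starts a new table with prob alpha / (n - 1 + alpha)."""
    rng = random.Random(seed)
    tables = []       # tables[k] = number of customers at table k
    assignments = []  # table index chosen by each customer
    for n in range(n_customers):
        if n == 0:
            tables.append(1)        # first customer: new table with prob 1
            assignments.append(0)
            continue
        r = rng.random() * (n + alpha)
        acc = 0.0
        for k, nk in enumerate(tables):
            acc += nk
            if r < acc:             # join occupied table k
                tables[k] += 1
                assignments.append(k)
                break
        else:                       # open a new table
            tables.append(1)
            assignments.append(len(tables) - 1)
    return assignments, tables

assignments, tables = crp(100, alpha=1.0)
```

Larger alpha tends to open more tables, matching its role as a concentration parameter.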
Dirichlet process
Train the model by Gibbs sampling.
Gibbs sampling
Gibbs sampling is an MCMC method for obtaining a sequence of samples from a multivariate distribution.
The intuition is to turn one multivariate problem into a sequence of univariate problems:
Multivariate: $p(x_1, x_2, \dots, x_n)$
Univariate: $p(x_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$
In a Dirichlet process mixture, each data point's cluster indicator is resampled conditioned on all the other indicators.
Gibbs sampling
Gibbs sampling pseudocode:
1. Initialize all variables $x_1, \dots, x_n$.
2. For each iteration, resample each $x_i$ from its conditional $p(x_i \mid x_{-i})$.
3. Repeat; after mixing, the samples approximate the joint distribution.
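The pseudocode can be made concrete on a toy target that is not from the slides: a standard bivariate normal with correlation rho, whose conditionals are univariate normals, $x \mid y \sim N(\rho y,\, 1-\rho^2)$:

```python
import math
import random

def gibbs_bivariate_normal(n_iter, rho, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Each update draws one coordinate from its univariate conditional."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    sd = math.sqrt(1.0 - rho * rho)
    samples = []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, sd)  # sample x | y
        y = rng.gauss(rho * x, sd)  # sample y | x
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(5000, rho=0.8)
```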
Topic model
A document is a mixture of topics.
We can read the words, but the topics are latent variables: topics generate words.
Topic model
Collapsed Gibbs sampling maintains two count tables: a word/topic count table and a topic/doc count table.
The topic of each word $x_{ij}$ is resampled given the observed word, conditioned on the topics of all the other words.
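The resampling step can be sketched as follows; the count values, hyperparameters, and names here are hypothetical, and the weight formula is the standard collapsed-Gibbs update, $p(t) \propto (n_{t,d} + \alpha)\,\frac{n_{w,t} + \beta}{n_t + V\beta}$:

```python
def topic_probs(word, doc, wt, dt, alpha, beta, vocab_size):
    """Normalized collapsed-Gibbs probabilities for one word's topic.
    wt: word/topic counts, dt: topic/doc counts (word being resampled
    is assumed already removed from both tables)."""
    weights = []
    for t in range(len(dt[doc])):
        n_t = sum(wt[t].values())  # total words currently assigned to topic t
        w = (dt[doc][t] + alpha) * (wt[t].get(word, 0) + beta) \
            / (n_t + vocab_size * beta)
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical counts: 2 topics, vocabulary of 5 words
wt = [{"queen": 2, "mirror": 1}, {"queen": 1, "ipad": 3}]  # word/topic table
dt = {"d2": [2, 1]}                                        # topic/doc table
probs = topic_probs("queen", "d2", wt, dt, alpha=0.1, beta=0.01, vocab_size=5)
```

The sampled topic then increments both tables again, and the sweep moves to the next word.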
Topic model
Apply the Dirichlet process in the topic model:
Document: Topic 1, Topic 2, Topic 3 with probabilities P1, P2, P3 — learn the distribution over topics in a document.
Word: Topic 1, Topic 2, Topic 3 with probabilities Q1, Q2, Q3 — learn the distribution over topics for a word.
Topic model
Count tables maintained during sampling:
topic/doc table: rows d1, d2, d3; columns t1, t2, t3.
word/topic table: rows t1, t2, t3; columns w1, w2, w3, w4.
Topic model
Latent Dirichlet allocation: each document draws a topic distribution $\theta_d \sim \mathrm{Dir}(\alpha)$; each word draws its own topic $z \sim \theta_d$ and then the word $w \sim \phi_z$, where $\phi_t \sim \mathrm{Dir}(\beta)$ is topic t's word distribution.
Dirichlet mixture model: a single topic is drawn per document, and all of its words come from that one topic.
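The LDA generative process can be sketched end to end; the vocabulary is the one used in the example on the next slide, while the document sizes, hyperparameters, and helper names are assumptions:

```python
import random

def sample_dirichlet(alpha, rng):
    """Dir(alpha) sample via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(draws)
    return [d / s for d in draws]

def sample_categorical(probs, rng):
    """Index sampled proportionally to probs."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def generate_corpus(n_docs, doc_len, n_topics, vocab, alpha, beta, seed=0):
    """LDA generative process: phi_t ~ Dir(beta) per topic,
    theta_d ~ Dir(alpha) per document, then z ~ theta_d, w ~ phi_z."""
    rng = random.Random(seed)
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    docs = []
    for _ in range(n_docs):
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)
            words.append(vocab[sample_categorical(phi[z], rng)])
        docs.append(words)
    return docs

vocab = ["ipad", "apple", "itunes", "mirror", "queen", "joker", "ladygaga"]
docs = generate_corpus(4, 3, 3, vocab, alpha=0.5, beta=0.5)
```

In the Dirichlet mixture model, by contrast, z would be drawn once per document rather than once per word.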
LDA Example
Vocabulary w: ipad apple itunes mirror queen joker ladygaga
Topics: t1 = product, t2 = story, t3 = poker
d1: ipad apple itunes
d2: apple mirror queen
d3: queen joker ladygaga
d4: queen ladygaga mirror
In fact, the topics are latent.
LDA example
d1: ipad apple itunes
d2: apple mirror queen
d3: queen joker ladygaga
d4: queen ladygaga mirror

Assign each word a random topic and build two count tables: a word/topic table (rows t1, t2, t3; columns ipad, apple, itunes, mirror, queen, joker, ladygaga) and a topic/doc table:

     t1  t2  t3
d1    1   1   1
d2    1   2   0
d3    1   0   2
d4   (counts not shown)

One collapsed Gibbs step, resampling the topic of "queen" in d3:
1. Remove "queen" from d3: decrement its cell in the word/topic table and the corresponding cell in the topic/doc table.
2. For each topic t, compute p(t) ∝ (topic/doc count + α)(word/topic count + β) / (topic total + Vβ).
3. Sample a new topic for "queen" (here, topic t2) and increment both tables accordingly.
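Putting the whole walkthrough together, a minimal collapsed Gibbs sampler over the four example documents might look like this (random initialization, hyperparameter values, and function names are assumptions, not from the slides):

```python
import random

def lda_gibbs(docs, n_topics, alpha, beta, n_iter, seed=0):
    """Minimal collapsed Gibbs sampler for LDA on a toy corpus.
    Returns topic assignments plus the topic/doc and word/topic tables."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    dt = [[0] * n_topics for _ in docs]                     # topic/doc counts
    wt = [{w: 0 for w in vocab} for _ in range(n_topics)]   # word/topic counts
    nt = [0] * n_topics                                     # words per topic
    z = []                                                  # topic of each word
    for d, doc in enumerate(docs):                          # random init
        zd = []
        for w in doc:
            t = rng.randrange(n_topics)
            zd.append(t)
            dt[d][t] += 1; wt[t][w] += 1; nt[t] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                dt[d][t] -= 1; wt[t][w] -= 1; nt[t] -= 1    # remove the word
                weights = [(dt[d][k] + alpha) * (wt[k][w] + beta)
                           / (nt[k] + V * beta) for k in range(n_topics)]
                r = rng.random() * sum(weights)
                acc, t = 0.0, n_topics - 1
                for k, wk in enumerate(weights):            # sample new topic
                    acc += wk
                    if r < acc:
                        t = k
                        break
                z[d][i] = t
                dt[d][t] += 1; wt[t][w] += 1; nt[t] += 1    # add it back
    return z, dt, wt

docs = [["ipad", "apple", "itunes"],
        ["apple", "mirror", "queen"],
        ["queen", "joker", "ladygaga"],
        ["queen", "ladygaga", "mirror"]]
z, dt, wt = lda_gibbs(docs, n_topics=3, alpha=0.1, beta=0.01, n_iter=50)
```

After the sweeps, the count tables play the role of the word/topic and topic/doc tables in the slides above.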
Further
Dirichlet distribution prior: K topics, fixed in advance (K must be supplied, like a supervised choice).
Dirichlet process prior: an infinite number of topics (the number of topics is learned, unsupervised).
Alpha mainly controls the probability of a topic with little training data in the document.
Beta mainly controls the probability of a topic with little training data among the words.
Further
Limitations of LDA:
- Unrealistic bag-of-words assumption (addressed by TNG and biLDA)
- Loses power-law behavior (addressed by the Pitman-Yor language model)
David Blei has written an extensive survey on topic models.
Q&A