1 Tracking Dynamics of Topic Trends Using a Finite Mixture Model
Satoshi Morinaga, Kenji Yamanishi
KDD '04
2 Introduction
A wide range of businesses deal with text streams.
Goals: discover topic trends and analyze topic dynamics in real time.
A topic: an activity or a developing event.
3 Introduction
Tasks considered in topic analysis:
Topic structure identification
Topic emergence detection
Topic characterization
The topic structure is modeled with a finite mixture model, and changes of topic trends are tracked by learning it dynamically.
4 Model
W = {w_1, w_2, …, w_d} : vocabulary set
x : a document
tf(w_i) : frequency of word w_i in x
idf(w_i) : idf value of w_i, idf(w_i) = log(N / df(w_i))
N : total number of reference texts
df(w_i) : number of texts in which w_i appears
5 Model
A text is represented as x = (tf(w_1), …, tf(w_d)) or x = (tf-idf(w_1), …, tf-idf(w_d))
tf-idf(w_i) = tf(w_i) · log(N / df(w_i))
K : number of distinct topics
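As a concrete illustration of this representation, here is a minimal Python sketch that builds the tf-idf vector of a text. The function name tfidf_vector and its argument layout are illustrative choices, not from the paper.

```python
import math
from collections import Counter

def tfidf_vector(tokens, vocabulary, df, N):
    """Represent a text as x = (tf-idf(w_1), ..., tf-idf(w_d)).

    tokens     : list of word occurrences in the text
    vocabulary : the word list [w_1, ..., w_d]
    df         : dict mapping a word to the number of reference texts containing it
    N          : total number of reference texts
    """
    tf = Counter(tokens)
    # tf-idf(w_i) = tf(w_i) * log(N / df(w_i)); words missing from the
    # reference corpus get 0 here (a simplifying assumption).
    return [tf[w] * math.log(N / df[w]) if df.get(w, 0) > 0 else 0.0
            for w in vocabulary]
```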
6 Model
Suppose each text has exactly one topic.
A text with the i-th topic is distributed according to a probability distribution with density p_i(x | θ_i), i = 1, 2, …, K
θ_i : real-valued parameter vector
7 Model
x is distributed according to a finite mixture distribution with K components:
p(x | θ : K) = ∑_{i=1}^{K} π_i p_i(x | θ_i)
π_i > 0 (i = 1, 2, …, K), ∑_{i=1}^{K} π_i = 1
θ = (π_1, …, π_{K−1}, θ_1, …, θ_K)
π_i : degree to which the i-th topic is likely to appear in the text stream
8 Model
p_i(x | θ_i) takes the form of a Gaussian density, with d the dimension of each datum:
p_i(x | θ_i) = φ_i(x | μ_i, Σ_i) = (2π)^(−d/2) |Σ_i|^(−1/2) exp( −(1/2) (x − μ_i)^T Σ_i^(−1) (x − μ_i) )
μ_i : d-dimensional real-valued mean vector
Σ_i : d × d covariance matrix
θ_i = (μ_i, Σ_i)
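The mixture density of slides 7 and 8 can be evaluated directly. The sketch below uses SciPy's multivariate normal for φ_i and is only an illustration of the formulas above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, weights, means, covs):
    """p(x | theta : K) = sum_i pi_i * phi_i(x | mu_i, Sigma_i)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for pi, mu, cov in zip(weights, means, covs))
```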
9 Model
A topic structure is identified by:
the number of components K (how many topics exist)
the weight vector (π_1, …, π_K), indicating how likely each topic is to appear
the parameter values θ_i (i = 1, …, K), indicating how each topic is distributed
10 Model
Topic emergence detection : track changes of the main components in the mixture model.
Topic characterization : classify each text into the component with the largest posterior, then extract feature terms characterizing the classified texts.
Topic drift : track changes of the parameter value θ_i for each topic i.
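The posterior of component i given x is π_i φ_i(x | μ_i, Σ_i) / p(x | θ : K); since the denominator is the same for every component, classifying a text reduces to an argmax over the numerators. A sketch, reusing the quantities from the previous snippet:

```python
import numpy as np
from scipy.stats import multivariate_normal

def classify(x, weights, means, covs):
    """Assign x to the component with the largest posterior."""
    # The shared denominator p(x) cancels, so comparing the joint
    # terms pi_i * phi_i(x) is enough.
    joint = [pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
             for pi, mu, cov in zip(weights, means, covs)]
    return int(np.argmax(joint))
```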
11 Model (figure)
12 TOPIC STRUCTURE IDENTIFICATION WITH DISCOUNTING LEARNING
Algorithm for learning the topic structure: the time-stamp-based discounting topic learning algorithm
Designed as a variant of the incremental EM algorithm
13 TOPIC STRUCTURE IDENTIFICATION WITH DISCOUNTING LEARNING
Three key features:
Adaptive to changes of the topic structure
Makes use of time stamps on texts
Normalizes data of different dimensions
14 TOPIC STRUCTURE IDENTIFICATION WITH DISCOUNTING LEARNING
λ : discounting parameter
r_i : posterior probability of the i-th component
m : introduced for calculating the weights of old statistics
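A schematic view of one discounting update follows. This is a simplified sketch, not the paper's exact time-stamp-based algorithm: real time stamps would scale the decay by the gap between arrivals, and the statistic m would weight the old counts.

```python
import numpy as np
from scipy.stats import multivariate_normal

def discounting_em_step(x, weights, means, covs, lam=0.01):
    """One simplified discounting-EM update for a new text x (NumPy vector)."""
    K = len(weights)
    # E-step: posterior r_i of each component given x
    joint = np.array([weights[i] * multivariate_normal.pdf(x, mean=means[i], cov=covs[i])
                      for i in range(K)])
    r = joint / joint.sum()
    # M-step with discounting: old estimates decay geometrically at a
    # rate controlled by the discounting parameter lambda, so recent
    # texts dominate and the model adapts to topic-structure change.
    for i in range(K):
        weights[i] = (1 - lam) * weights[i] + lam * r[i]
        means[i] = means[i] + lam * r[i] * (x - means[i])
        diff = (x - means[i]).reshape(-1, 1)
        covs[i] = (1 - lam * r[i]) * covs[i] + lam * r[i] * (diff @ diff.T)
    return weights, means, covs
```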
16 TOPIC EMERGENCE DETECTION WITH DYNAMIC MODEL SELECTION
Selecting the optimal components in the mixture model dynamically is called dynamic model selection.
Dynamic model selection:
learn a finite mixture model with a relatively large number of components
select the main components dynamically from among them, on the basis of Rissanen's predictive stochastic complexity
17 TOPIC EMERGENCE DETECTION WITH DYNAMIC MODEL SELECTION
Initialization:
Kmax : maximum number of mixture components
W : window size
Set initial values of the model parameters
18 TOPIC EMERGENCE DETECTION WITH DYNAMIC MODEL SELECTION
1. Model Class Construction:
G_i^(t) = (γ_i^(t−W) + … + γ_i^(t)) / W : window average of the posterior probability
For each k = 1, …, Kmax, let l_1, …, l_k be the indices of the k highest scores, so that G_{l_1}^(t−1) ≥ … ≥ G_{l_k}^(t−1)
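Step 1 in code form: a sketch that computes the window averages G_i^(t) and picks the indices of the k largest scores. The array layout (one row per time step in the window) is an assumption for illustration.

```python
import numpy as np

def top_k_indices(gammas, k):
    """gammas : array of shape (window, Kmax) holding the posteriors
    gamma_i^(t-W), ..., gamma_i^(t) for every component i."""
    G = gammas.mean(axis=0)         # window average of the posteriors
    return np.argsort(G)[::-1][:k]  # l_1, ..., l_k with G_l1 >= ... >= G_lk
```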
19 TOPIC EMERGENCE DETECTION WITH DYNAMIC MODEL SELECTION
Mixture model with k components, for s = t − W, …, t:
p_k^(s)(x) = ∑_{j=1}^{k} π_{l_j}^(s) p(x | θ_{l_j}^(s)) + (1 − ∑_{j=1}^{k} π_{l_j}^(s)) U(x)
U : uniform distribution over the data, which receives the weight of the unselected components
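Under the reading above (the unselected components' weight goes to the uniform distribution U), the k-component density could be sketched as follows; uniform_pdf is a hypothetical constant-density function over the data domain.

```python
import numpy as np
from scipy.stats import multivariate_normal

def model_class_density(x, selected, weights, means, covs, uniform_pdf):
    """Density of the k-component model class: the selected components
    keep their weights; the leftover mass goes to the uniform U."""
    mass = sum(weights[i] for i in selected)
    main = sum(weights[i] * multivariate_normal.pdf(x, mean=means[i], cov=covs[i])
               for i in selected)
    return main + (1.0 - mass) * uniform_pdf(x)
```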
20 TOPIC EMERGENCE DETECTION WITH DYNAMIC MODEL SELECTION
2. Predictive Stochastic Complexity Calculation:
When the t-th input data x_t with dimension d_t is given, compute S^(t)(k) for each k
3. Model Selection:
Select k*_t minimizing S^(t)(k)
Let l_1, …, l_{k*_t} be the main components at time t
4. Estimation of Parameters
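Steps 2 and 3 in sketch form: accumulate, for each candidate k, the predictive code length of each incoming text, normalized by its dimension d_t (this per-dimension normalization is how the sketch reads "dimension d_t"; the paper's exact formula may differ), then pick the k with the smallest total.

```python
def select_k(code_lengths):
    """code_lengths : dict mapping k to the list of per-text values
    -(1/d_s) * log p_k^(s-1)(x_s) accumulated over the window.
    Returns k*_t = argmin_k S^(t)(k)."""
    S = {k: sum(v) for k, v in code_lengths.items()}
    return min(S, key=S.get)
```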
21 TOPIC CHARACTERIZATION WITH INFORMATION GAIN
4. Estimation of Parameters:
Learn a finite mixture model with Kmax components using the time-stamp-based discounting learning algorithm
Let the estimated parameters be (π_1^(t), …, π_Kmax^(t), θ_1^(t), …, θ_Kmax^(t))
22 Conclusion
23 Thank you very much~