1 Tracking Dynamics of Topic Trends Using a Finite Mixture Model
Satoshi Morinaga, Kenji Yamanishi
KDD '04
2 Introduction
A wide range of businesses deal with text streams.
Goals: discover topic trends and analyze topic dynamics in real time.
A topic: an activity or a developing event.
3 Introduction
Tasks considered in topic analysis:
Topic structure identification
Topic emergence detection
Topic characterization
The topic structure is modeled with a finite mixture model, and changes of topic trends are tracked by learning it dynamically.
4 Model
W = {w_1, w_2, …, w_d} : vocabulary set
x : a document
tf(w_i) : frequency of word w_i in x
idf(w_i) : idf value of w_i, idf(w_i) = log(N / df(w_i))
N : total number of reference texts
df(w_i) : number of texts in which w_i appears
5 Model
A text is represented as x = (tf(w_1), …, tf(w_d)) or x = (tf-idf(w_1), …, tf-idf(w_d))
tf-idf(w_i) = tf(w_i) · log(N / df(w_i))
K : number of distinct topics
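As a concrete illustration of this representation, here is a minimal Python sketch that builds the tf-idf vector of a text. The function name tfidf_vector and its argument layout are illustrative choices, not from the paper.

```python
import math
from collections import Counter

def tfidf_vector(tokens, vocabulary, df, N):
    """Represent a text as x = (tf-idf(w_1), ..., tf-idf(w_d)).

    tokens     : list of word occurrences in the text
    vocabulary : the word list [w_1, ..., w_d]
    df         : dict mapping a word to the number of reference texts containing it
    N          : total number of reference texts
    """
    tf = Counter(tokens)
    # tf-idf(w_i) = tf(w_i) * log(N / df(w_i)); words missing from the
    # reference corpus get 0 here (a simplifying assumption).
    return [tf[w] * math.log(N / df[w]) if df.get(w, 0) > 0 else 0.0
            for w in vocabulary]
```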
6 Model
Suppose each text has exactly one topic.
A text with the i-th topic is distributed according to a probability distribution with density p_i(x | θ_i), i = 1, 2, …, K
θ_i : real-valued parameter vector
7 Model
x is distributed according to a finite mixture distribution with K components:
p(x | θ : K) = ∑_{i=1}^{K} π_i p_i(x | θ_i)
π_i > 0 (i = 1, 2, …, K), ∑_{i=1}^{K} π_i = 1
θ = (π_1, …, π_{K−1}, θ_1, …, θ_K)
π_i : degree to which the i-th topic is likely to appear in the text stream
8 Model
p_i(x | θ_i) takes the form of a Gaussian density, with d the dimension of each datum:
p_i(x | θ_i) = φ_i(x | μ_i, Σ_i) = (2π)^(−d/2) |Σ_i|^(−1/2) exp( −(1/2) (x − μ_i)^T Σ_i^(−1) (x − μ_i) )
μ_i : d-dimensional real-valued mean vector
Σ_i : d × d covariance matrix
θ_i = (μ_i, Σ_i)
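The mixture density of slides 7 and 8 can be evaluated directly. The sketch below uses SciPy's multivariate normal for φ_i and is only an illustration of the formulas above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, weights, means, covs):
    """p(x | theta : K) = sum_i pi_i * phi_i(x | mu_i, Sigma_i)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for pi, mu, cov in zip(weights, means, covs))
```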
9 Model
A topic structure is identified by:
the number of components K (how many topics exist)
the weight vector (π_1, …, π_K), indicating how likely each topic is to appear
the parameter values θ_i (i = 1, …, K), indicating how each topic is distributed
10 Model
Topic emergence detection : track changes of the main components in the mixture model.
Topic characterization : classify each text into the component with the largest posterior, then extract feature terms characterizing the classified texts.
Topic drift : track changes of the parameter value θ_i for each topic i.
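The posterior of component i given x is π_i φ_i(x | μ_i, Σ_i) / p(x | θ : K); since the denominator is the same for every component, classifying a text reduces to an argmax over the numerators. A sketch, reusing the quantities from the previous snippet:

```python
import numpy as np
from scipy.stats import multivariate_normal

def classify(x, weights, means, covs):
    """Assign x to the component with the largest posterior."""
    # The shared denominator p(x) cancels, so comparing the joint
    # terms pi_i * phi_i(x) is enough.
    joint = [pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
             for pi, mu, cov in zip(weights, means, covs)]
    return int(np.argmax(joint))
```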
11 Model (figure)
12 TOPIC STRUCTURE IDENTIFICATION WITH DISCOUNTING LEARNING
Algorithm for learning the topic structure: the time-stamp-based discounting topic learning algorithm
Designed as a variant of the incremental EM algorithm
13 TOPIC STRUCTURE IDENTIFICATION WITH DISCOUNTING LEARNING
Three key features:
Adaptive to changes of the topic structure
Makes use of time stamps on texts
Normalizes data of different dimensions
14 TOPIC STRUCTURE IDENTIFICATION WITH DISCOUNTING LEARNING
λ : discounting parameter
r_i : posterior probability of the i-th component
m : introduced for calculating the weights of old statistics
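A schematic view of one discounting update follows. This is a simplified sketch, not the paper's exact time-stamp-based algorithm: real time stamps would scale the decay by the gap between arrivals, and the statistic m would weight the old counts.

```python
import numpy as np
from scipy.stats import multivariate_normal

def discounting_em_step(x, weights, means, covs, lam=0.01):
    """One simplified discounting-EM update for a new text x (NumPy vector)."""
    K = len(weights)
    # E-step: posterior r_i of each component given x
    joint = np.array([weights[i] * multivariate_normal.pdf(x, mean=means[i], cov=covs[i])
                      for i in range(K)])
    r = joint / joint.sum()
    # M-step with discounting: old estimates decay geometrically at a
    # rate controlled by the discounting parameter lambda, so recent
    # texts dominate and the model adapts to topic-structure change.
    for i in range(K):
        weights[i] = (1 - lam) * weights[i] + lam * r[i]
        means[i] = means[i] + lam * r[i] * (x - means[i])
        diff = (x - means[i]).reshape(-1, 1)
        covs[i] = (1 - lam * r[i]) * covs[i] + lam * r[i] * (diff @ diff.T)
    return weights, means, covs
```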
16 TOPIC EMERGENCE DETECTION WITH DYNAMIC MODEL SELECTION
Selecting the optimal components in the mixture model dynamically is called dynamic model selection.
Dynamic model selection:
learn a finite mixture model with a relatively large number of components
select the main components dynamically from among them, on the basis of Rissanen's predictive stochastic complexity
17 TOPIC EMERGENCE DETECTION WITH DYNAMIC MODEL SELECTION
Initialization:
Kmax : maximum number of mixture components
W : window size
Set initial values of the model parameters
18 TOPIC EMERGENCE DETECTION WITH DYNAMIC MODEL SELECTION
1. Model Class Construction:
G_i^(t) = (γ_i^(t−W) + … + γ_i^(t)) / W : window average of the posterior probability
For each k = 1, …, Kmax, let l_1, …, l_k be the indices of the k highest scores, so that G_{l_1}^(t−1) ≥ … ≥ G_{l_k}^(t−1)
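Step 1 in code form: a sketch that computes the window averages G_i^(t) and picks the indices of the k largest scores. The array layout (one row per time step in the window) is an assumption for illustration.

```python
import numpy as np

def top_k_indices(gammas, k):
    """gammas : array of shape (window, Kmax) holding the posteriors
    gamma_i^(t-W), ..., gamma_i^(t) for every component i."""
    G = gammas.mean(axis=0)         # window average of the posteriors
    return np.argsort(G)[::-1][:k]  # l_1, ..., l_k with G_l1 >= ... >= G_lk
```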
19 TOPIC EMERGENCE DETECTION WITH DYNAMIC MODEL SELECTION
Mixture model with k components, for s = t − W, …, t:
p_k^(s)(x) = ∑_{j=1}^{k} π_{l_j}^(s) p(x | θ_{l_j}^(s)) + (1 − ∑_{j=1}^{k} π_{l_j}^(s)) U(x)
U : uniform distribution over the data, which receives the weight of the unselected components
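Under the reading above (the unselected components' weight goes to the uniform distribution U), the k-component density could be sketched as follows; uniform_pdf is a hypothetical constant-density function over the data domain.

```python
import numpy as np
from scipy.stats import multivariate_normal

def model_class_density(x, selected, weights, means, covs, uniform_pdf):
    """Density of the k-component model class: the selected components
    keep their weights; the leftover mass goes to the uniform U."""
    mass = sum(weights[i] for i in selected)
    main = sum(weights[i] * multivariate_normal.pdf(x, mean=means[i], cov=covs[i])
               for i in selected)
    return main + (1.0 - mass) * uniform_pdf(x)
```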
20 TOPIC EMERGENCE DETECTION WITH DYNAMIC MODEL SELECTION
2. Predictive Stochastic Complexity Calculation:
When the t-th input data x_t with dimension d_t is given, compute S^(t)(k) for each k
3. Model Selection:
Select k*_t minimizing S^(t)(k)
Let l_1, …, l_{k*_t} be the main components at time t
4. Estimation of Parameters
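Steps 2 and 3 in sketch form: accumulate, for each candidate k, the predictive code length of each incoming text, normalized by its dimension d_t (this per-dimension normalization is how the sketch reads "dimension d_t"; the paper's exact formula may differ), then pick the k with the smallest total.

```python
def select_k(code_lengths):
    """code_lengths : dict mapping k to the list of per-text values
    -(1/d_s) * log p_k^(s-1)(x_s) accumulated over the window.
    Returns k*_t = argmin_k S^(t)(k)."""
    S = {k: sum(v) for k, v in code_lengths.items()}
    return min(S, key=S.get)
```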
21 TOPIC CHARACTERIZATION WITH INFORMATION GAIN
4. Estimation of Parameters:
Learn a finite mixture model with Kmax components using the time-stamp-based discounting learning algorithm
Let the estimated parameters be (π_1^(t), …, π_Kmax^(t), θ_1^(t), …, θ_Kmax^(t))
22 Conclusion
23 Thank you very much~