1 Tracking Dynamics of Topic Trends Using a Finite Mixture Model Satoshi Morinaga, Kenji Yamanishi KDD ’04

2 Introduction
A wide range of businesses deal with text streams.
Goals: discover topic trends and analyze topic dynamics in real time.
A topic: an activity or a developing event.

3 Introduction
Tasks considered in topic analysis:
topic structure identification
topic emergence detection
topic characterization
The topic structure is modeled with a finite mixture model, and changes of the topic trend are tracked by learning it dynamically.

4 Model
W = {w_1, w_2, ..., w_d}: vocabulary set
x: a document
tf(w_i): frequency of word w_i in x
idf(w_i): idf value of w_i, idf(w_i) = log(N / df(w_i))
N: total number of texts used for reference
df(w_i): number of texts in which w_i appears

5 Model
A text is represented as x = (tf(w_1), ..., tf(w_d)) or x = (tf-idf(w_1), ..., tf-idf(w_d))
tf-idf(w_i) = tf(w_i) * log(N / df(w_i))
K: number of different topics
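As a concrete illustration of this representation, here is a minimal Python sketch that builds tf-idf vectors x = (tf-idf(w_1), ..., tf-idf(w_d)) from whitespace-tokenized texts. The function name tfidf_vectors and the toy documents are illustrative, not from the paper.

```python
import math
from collections import Counter

def tfidf_vectors(texts, vocabulary):
    """Build tf-idf vectors x = (tf-idf(w_1), ..., tf-idf(w_d)) for each text."""
    N = len(texts)                                   # total number of reference texts
    tokenized = [t.split() for t in texts]
    df = Counter()                                   # df(w): number of texts containing w
    for tokens in tokenized:
        for w in set(tokens):
            df[w] += 1
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)                         # tf(w): frequency of w in this text
        vectors.append([tf[w] * math.log(N / df[w]) if df[w] else 0.0
                        for w in vocabulary])
    return vectors

# Toy usage
docs = ["topic trend topic", "trend detection stream", "topic stream stream"]
vocab = ["topic", "trend", "stream", "detection"]
X = tfidf_vectors(docs, vocab)
```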

6 Model
Suppose each text has only one topic.
A text with the i-th topic is distributed according to a probability distribution with density p_i(x|θ_i), i = 1, 2, ..., K
θ_i: real-valued parameter vector

7 Model
x is distributed according to a finite mixture distribution with K components:
p(x|θ : K) = Σ_{i=1}^{K} π_i p_i(x|θ_i), with π_i > 0 (i = 1, ..., K) and Σ_{i=1}^{K} π_i = 1
θ = (π_1, ..., π_{K−1}, θ_1, ..., θ_K)
π_i: how likely the i-th topic is to appear in the text stream

8 Model
Each p_i(x|θ_i) takes the form of a Gaussian density:
p_i(x|θ_i) = φ(x|μ_i, Σ_i) = (2π)^{−d/2} |Σ_i|^{−1/2} exp(−(x − μ_i)^T Σ_i^{−1} (x − μ_i) / 2)
d: dimension of each data vector
μ_i: d-dimensional real-valued mean vector
Σ_i: d × d covariance matrix
θ_i = (μ_i, Σ_i)
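A small sketch of evaluating the mixture density p(x|θ:K) with NumPy/SciPy. The function name mixture_density and the two-component toy parameters are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, weights, means, covariances):
    """p(x | theta : K) = sum_i pi_i * phi(x | mu_i, Sigma_i)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for pi, mu, cov in zip(weights, means, covariances))

# Two-component example in d = 2 dimensions
weights = [0.6, 0.4]
means = [np.zeros(2), np.ones(2) * 3]
covs = [np.eye(2), np.eye(2) * 0.5]
print(mixture_density(np.array([0.5, 0.5]), weights, means, covs))
```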

9 Model
A topic structure is identified by:
the number of components K (how many topics exist)
the weight vector (π_1, ..., π_K), indicating how likely each topic is to appear
the parameter values θ_i (i = 1, ..., K), indicating how each topic is distributed

10 Model
Topic emergence detection: track changes of the main components in the mixture model.
Topic characterization: classify each text into the component with the largest posterior, then extract feature terms characterizing the classified texts.
Topic drift: track changes of the parameter value θ_i for each topic i.
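A minimal sketch of the classification step in topic characterization: each text is assigned to the component with the largest posterior (the feature-term extraction step is not shown). The helper name assign_topic is illustrative; SciPy's multivariate normal is assumed for the component densities.

```python
import numpy as np
from scipy.stats import multivariate_normal

def assign_topic(x, weights, means, covariances):
    """Classify x into the mixture component with the largest posterior.

    Posterior P(i | x) is proportional to pi_i * phi(x | mu_i, Sigma_i).
    """
    joint = np.array([pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
                      for pi, mu, cov in zip(weights, means, covariances)])
    posterior = joint / joint.sum()
    return int(np.argmax(posterior)), posterior
```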

11 Model

12 TOPIC STRUCTURE IDENTIFICATION WITH DISCOUNTING LEARNING
Algorithm for learning the topic structure: the time-stamp-based discounting topic learning algorithm, designed as a variant of the incremental EM algorithm.

13 TOPIC STRUCTURE IDENTIFICATION WITH DISCOUNTING LEARNING
The algorithm has three features:
adaptivity to changes of the topic structure
use of the time stamps of texts
normalization of data of different dimensions

14 TOPIC STRUCTURE IDENTIFICATION WITH DISCOUNTING LEARNING
λ: discounting parameter
r_i: posterior density of the i-th component
m: introduced for the calculation of weights for old statistics
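The slides do not reproduce the exact update equations, so the following is only a generic sketch of one incremental EM step with exponential discounting (forgetting factor λ). It omits the time-stamp weighting via m and should not be read as the paper's precise algorithm.

```python
import numpy as np
from scipy.stats import multivariate_normal

def discounted_em_step(x, weights, means, covariances, lam=0.01):
    """One illustrative discounted (exponential-forgetting) incremental EM step.

    NOTE: a generic sketch only; the paper's time-stamp based algorithm
    additionally uses the statistic m to weight old sufficient statistics.
    """
    # E-step: posterior r_i of each component for the new text x
    joint = np.array([pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
                      for pi, mu, cov in zip(weights, means, covariances)])
    r = joint / joint.sum()

    # M-step: discount old estimates and mix in the new observation
    for i in range(len(weights)):
        weights[i] = (1 - lam) * weights[i] + lam * r[i]
        eta = lam * r[i] / weights[i]              # per-component learning rate
        diff = x - means[i]
        means[i] = means[i] + eta * diff
        covariances[i] = (1 - eta) * covariances[i] + eta * np.outer(diff, diff)
    return weights, means, covariances, r
```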

15

16 TOPIC EMERGENCE DETECTION WITH DYNAMIC MODEL SELECTION
Selecting the optimal components in the mixture model dynamically: dynamic model selection.
Dynamic model selection: learn a finite mixture model with a relatively large number of components, then dynamically select the main components from among them on the basis of Rissanen's predictive stochastic complexity.

17 TOPIC EMERGENCE DETECTION WITH DYNAMIC MODEL SELECTION
Initialization:
K_max: maximum number of mixture components
W: window size
Set initial values of the model parameters.

18 TOPIC EMERGENCE DETECTION WITH DYNAMIC MODEL SELECTION
1. Model Class Construction:
G_i^{(t)} = (γ_i^{(t−W)} + ... + γ_i^{(t)}) / W: window average of the posterior probability
For each k = 1, ..., K_max, let l_1, ..., l_k be the indices of the k highest scores, so that G_{l_1}^{(t−1)} ≥ ... ≥ G_{l_k}^{(t−1)}.

19 TOPIC EMERGENCE DETECTION WITH DYNAMIC MODEL SELECTION
The candidate mixture model with k components is evaluated for s = t − W, ..., t.
U: uniform distribution over the data
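A small sketch of the model-class construction step: average the posteriors γ_i over the last W steps and keep the k components with the highest averages (the remaining probability mass would be handled by the uniform distribution U in the candidate model, which is not coded here). The helper name candidate_components is illustrative.

```python
import numpy as np

def candidate_components(gamma_history, k):
    """Return indices l_1, ..., l_k of the k components with the highest
    window-averaged posteriors G_i, ordered so that G_{l_1} >= ... >= G_{l_k}."""
    G = np.asarray(gamma_history).mean(axis=0)   # G_i = window average of gamma_i
    return list(np.argsort(-G)[:k])

# Posteriors gamma_i over a window of 3 steps for K_max = 4 components
history = [[0.1, 0.6, 0.2, 0.1],
           [0.2, 0.5, 0.2, 0.1],
           [0.1, 0.7, 0.1, 0.1]]
print(candidate_components(history, k=2))        # -> [1, 2]
```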

20 TOPIC EMERGENCE DETECTION WITH DYNAMIC MODEL SELECTION
2. Predictive Stochastic Complexity Calculation: performed when the t-th input datum x_t, with dimension d_t, is given.
3. Model Selection: select the k*_t minimizing S^{(t)}(k); let the corresponding components be the main components at time t.
4. Estimation of Parameters
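A hedged sketch of steps 2 and 3: assuming each candidate order k has accumulated predictive log-likelihoods over the window, the predictive stochastic complexity S^(t)(k) is taken here as their negative sum, and the order minimizing it is selected. The function name select_model_order and the toy scores are hypothetical.

```python
import numpy as np

def select_model_order(pred_log_likelihoods):
    """Select k*_t minimizing the accumulated negative predictive log-likelihood.

    pred_log_likelihoods[k]: predictive log-likelihoods of the k-component
    candidate model over the window s = t-W, ..., t (assumed precomputed).
    """
    S = {k: -np.sum(ll) for k, ll in pred_log_likelihoods.items()}   # S^(t)(k)
    return min(S, key=S.get)

# Hypothetical example: candidate orders 1..3 with per-text log-likelihoods
scores = {1: [-5.2, -4.9, -5.5], 2: [-4.1, -4.3, -4.0], 3: [-4.2, -4.4, -4.1]}
k_star = select_model_order(scores)    # -> 2
```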

21 TOPIC CHARACTERIZATION WITH INFORMATION GAIN
4. Estimation of Parameters: learn a finite mixture model with K_max components using the time-stamp-based discounting learning algorithm.
Let the estimated parameters be (π_1^{(t)}, ..., π_{K_max}^{(t)}, θ_1^{(t)}, ..., θ_{K_max}^{(t)}).

22 Conclusion

23 Thank you very much~