Published by Vanessa Holt. Modified over 9 years ago.
1
UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization
Jaegul Choo (1, *), Changhyun Lee (1), Chandan K. Reddy (2), and Haesun Park (1)
(1) Georgia Institute of Technology, (2) Wayne State University
2
Intro: Topic Modeling
[Figure: Documents 1 to 4; example keywords: brain, evolve, dna, genetic, gene, nerve, neuron, life, organism]
3
Intro: Topic Modeling
Topic: a distribution over keywords
[Figure: Documents 1 to 4 and Topics 1 to 3; example keywords: brain, evolve, dna, genetic, gene, nerve, neuron, life, organism]
4
Intro: Topic Modeling
Document: a distribution over topics
Topic: a distribution over keywords
[Figure: example keywords: brain, evolve, dna, genetic, gene, nerve, neuron, life, organism]
5
Latent Dirichlet Allocation (LDA) in Visual Analytics
LDA has been widely used in visual analytics, for example in:
  TIARA [Wei et al. KDD10]
  iVisClustering [Lee et al. EuroVis12]
  ParallelTopics [Dou et al. VAST12]
  TopicViz [Eisenstein et al. CHI-WIP12]
  …
*Images courtesy of the original papers.
6
Overview of Our Work
Proposes nonnegative matrix factorization (NMF) for topic modeling.
Highlights advantages of NMF over LDA in visual analytics.
Presents UTOPIAN, an NMF-based interactive topic modeling system.
[Figure labels: topic merging; topic splitting; doc-induced topic creation; keyword-induced topic creation]
7
What is Nonnegative Matrix Factorization?
8
Nonnegative Matrix Factorization (NMF)
Lower-rank approximation with nonnegativity constraints:
  $A \approx WH$, obtained from $\min_{W \ge 0,\, H \ge 0} \| A - WH \|_F$
Why nonnegativity?
  Easy interpretation and semantically meaningful output
Algorithm:
  Alternating nonnegativity-constrained least squares [Kim et al., 2008]
(Speaker note: mention document vector.)
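To make the objective concrete, here is a minimal pure-Python sketch that factorizes a small nonnegative matrix. Note that it does not implement the alternating nonnegativity-constrained least squares algorithm cited on the slide; it swaps in the simpler multiplicative update rules of Lee and Seung, which target the same objective. The matrix `A`, the helper names, and the iteration count are illustrative assumptions.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Yt = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]

def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def frobenius_error(A, W, H):
    """Return ||A - WH||_F."""
    WH = matmul(W, H)
    return sum((a - b) ** 2
               for row_a, row_b in zip(A, WH)
               for a, b in zip(row_a, row_b)) ** 0.5

def update(M, numerator, denominator, eps):
    """Element-wise multiplicative update: M * numerator / denominator."""
    return [[m * p / (q + eps) for m, p, q in zip(rm, rp, rq)]
            for rm, rp, rq in zip(M, numerator, denominator)]

def nmf(A, k, iterations=200, seed=0, eps=1e-9):
    """Approximate A (m x n) by W (m x k) times H (k x n), all nonnegative.

    Uses multiplicative updates (NOT the ANLS algorithm from the slide).
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        Wt = transpose(W)
        # H <- H * (W^T A) / (W^T W H)
        H = update(H, matmul(Wt, A), matmul(matmul(Wt, W), H), eps)
        Ht = transpose(H)
        # W <- W * (A H^T) / (W H H^T)
        W = update(W, matmul(A, Ht), matmul(W, matmul(H, Ht)), eps)
    return W, H

# Hypothetical 6-keyword x 4-document count matrix with two obvious themes.
A = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 3, 2],
    [0, 0, 1, 1],
]
W, H = nmf(A, k=2)
print("approximation error:", frobenius_error(A, W, H))
```

Because the updates only multiply by nonnegative ratios, `W` and `H` stay nonnegative as long as the initial values are, which is the property the slide highlights for interpretability.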
9
NMF as Topic Modeling
$A \approx WH$
Topic: a distribution over keywords
Document: a distribution over topics
[Figure: A ≈ WH, shown with Documents 1 to 4, Topics 1 to 3, and example keywords: brain, evolve, dna, genetic, gene, nerve, neuron, life, organism]
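As a rough illustration of how the two factors can be read as topics and documents, the sketch below assumes that rows of A are keywords and columns are documents, so that each column of W describes one topic and each column of H describes one document. The numbers in `W` and `H`, the vocabulary subset, and the function names are all made up for this example.

```python
def top_keywords(W, vocabulary, topic, n=3):
    """Return the n highest-weighted keywords in column `topic` of W."""
    weights = [row[topic] for row in W]
    ranked = sorted(range(len(vocabulary)), key=lambda i: weights[i], reverse=True)
    return [vocabulary[i] for i in ranked[:n]]

def topic_distribution(H, doc):
    """Normalize column `doc` of H so that its topic weights sum to 1."""
    column = [row[doc] for row in H]
    total = sum(column)
    return [w / total for w in column] if total > 0 else column

# Hypothetical factors: 6 keywords, 2 topics, 3 documents.
vocabulary = ["brain", "nerve", "neuron", "dna", "gene", "genetic"]
W = [
    [0.9, 0.0],
    [0.7, 0.1],
    [0.8, 0.0],
    [0.0, 0.9],
    [0.1, 0.8],
    [0.0, 0.7],
]
H = [
    [2.0, 0.0, 1.0],
    [0.0, 3.0, 1.0],
]

for t in range(2):
    print("Topic", t + 1, top_keywords(W, vocabulary, t))
for d in range(3):
    print("Document", d + 1, topic_distribution(H, d))
```

Normalizing a column is only one way to turn nonnegative weights into something that reads like a distribution; the slides do not say which normalization, if any, was used.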
10
Why NMF in Visual Analytics?
11
Advantages of NMF in Visual Analytics
Reliable algorithmic behaviors
Flexible support for user interactions
12
NMF vs. LDA Consistency from Multiple Runs
Documents' topical membership changes among 10 runs
[Figure panels: InfoVis/VAST paper data set; 20 newsgroup data set]
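The slide does not state how membership changes across runs were measured. One possible measure, sketched below as an assumption, is the fraction of document pairs that are grouped the same way in two runs (the Rand index). It has the convenient property of ignoring topic numbering, which can differ arbitrarily from run to run. The function names and sample labels are hypothetical.

```python
from itertools import combinations

def hard_assignment(H):
    """Assign each document (column of H) to its highest-weighted topic."""
    k, n = len(H), len(H[0])
    return [max(range(k), key=lambda t: H[t][j]) for j in range(n)]

def pairwise_agreement(labels_a, labels_b):
    """Fraction of document pairs treated alike by both runs (Rand index).

    A pair counts as agreeing when both runs put the two documents in the
    same topic, or both runs put them in different topics.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agreeing = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreeing / len(pairs)

# Hypothetical topic labels of four documents from three runs.
run_1 = [0, 0, 1, 1]
run_2 = [1, 1, 0, 0]   # same grouping as run_1, topics merely renumbered
run_3 = [0, 1, 0, 1]   # a different grouping

print(pairwise_agreement(run_1, run_2))
print(pairwise_agreement(run_1, run_3))
```

Averaging this score over all pairs of runs would give a single consistency number per method and data set.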
13
NMF vs. LDA Empirical Convergence
Documents' topical membership changes between iterations
[Figure: InfoVis/VAST paper data set; labels: 48 seconds, 10 minutes, NMF, LDA]
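A simple way to track this kind of empirical convergence is to record each document's topic label after every iteration and count how many labels changed. The sketch below assumes hard topic labels per document and a hypothetical `history` list; it is not taken from the authors' implementation.

```python
def membership_change(previous, current):
    """Fraction of documents whose topic label differs between two iterations."""
    if len(previous) != len(current):
        raise ValueError("both label lists must cover the same documents")
    changed = sum(p != c for p, c in zip(previous, current))
    return changed / len(previous)

def change_curve(history):
    """Membership change between each pair of consecutive iterations."""
    return [membership_change(a, b) for a, b in zip(history, history[1:])]

# Hypothetical labels of four documents over three iterations.
history = [
    [0, 1, 2, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
print(change_curve(history))
```

A curve that drops to zero and stays there suggests that further iterations no longer change what the user would see.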
14
NMF vs. LDA Topic Summary (Top Keywords)
InfoVis/VAST paper data set
Topics are more consistent in NMF than in LDA.
Topic quality is comparable between NMF and LDA.
[Table: top keywords for Topics 1 to 7, NMF vs. LDA, Run #1 and Run #2; the cell layout was lost in extraction. Keywords in source order:
  NMF Run #1: visualization design information user analysis system graph layout visual analytics data sets color weaving
  Run #2 / LDA: documents similarities knowledge edge query collaborative social tree measures multivariate animation dimensions treemap analysts scatterplot spatial text multidimensional, high aggregation]
15
Advantages of NMF in Visual Analytics
Reliable algorithmic behaviors
Flexible support for user interactions
16
Weakly Supervised NMF [Choo et al., DMKD, accepted with rev.]
$\min_{W \ge 0,\, H \ge 0} \| A - WH \|_F^2 + \alpha \| (W - W_r) M_W \|_F^2 + \beta \| M_H (H - D_H H_r) \|_F^2$
$W_r$, $H_r$: reference matrices for W and H
$M_W$, $M_H$: diagonal matrices for weighting/masking columns/rows of W and H
Provides flexible yet intuitive means for user interaction.
Maintains the same computational complexity as original NMF.
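The sketch below only evaluates the weakly supervised objective for given matrices; it does not solve the minimization. It stores the diagonal matrices as plain lists of their diagonal entries, so M_W scales the columns of W and M_H scales the rows of H. The slide does not define D_H; treating it as a diagonal matrix that rescales the rows of H_r is an assumption, as are all function names and test values.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Yt = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]

def subtract(X, Y):
    """Element-wise X - Y."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def squared_frobenius(X):
    """Return ||X||_F^2."""
    return sum(x * x for row in X for x in row)

def scale_columns(X, diagonal):
    """Compute X @ diag(diagonal)."""
    return [[x * d for x, d in zip(row, diagonal)] for row in X]

def scale_rows(X, diagonal):
    """Compute diag(diagonal) @ X."""
    return [[d * x for x in row] for d, row in zip(diagonal, X)]

def ws_nmf_objective(A, W, H, W_ref, H_ref, m_w, m_h, d_h, alpha, beta):
    """||A - WH||^2 + alpha ||(W - W_ref) M_W||^2 + beta ||M_H (H - D_H H_ref)||^2.

    m_w, m_h, d_h hold the diagonals of M_W, M_H, D_H (D_H diagonal is assumed).
    """
    fit = squared_frobenius(subtract(A, matmul(W, H)))
    w_penalty = squared_frobenius(scale_columns(subtract(W, W_ref), m_w))
    h_penalty = squared_frobenius(
        scale_rows(subtract(H, scale_rows(H_ref, d_h)), m_h))
    return fit + alpha * w_penalty + beta * h_penalty

# Hypothetical example: perfect fit, but topic 2 of W is far from its reference.
W = [[1.0, 0.0], [0.0, 1.0]]
H = [[1.0, 2.0], [3.0, 4.0]]
A = matmul(W, H)
W_ref = [[0.0, 0.0], [0.0, 5.0]]

# Supervise only topic 1 (mask out topic 2), then only topic 2.
print(ws_nmf_objective(A, W, H, W_ref, H, [1, 0], [1, 1], [1, 1], 2.0, 1.0))
print(ws_nmf_objective(A, W, H, W_ref, H, [0, 1], [1, 1], [1, 1], 2.0, 1.0))
```

Setting a diagonal entry to zero removes the corresponding topic from the penalty, which is how masking lets a user constrain some topics while leaving the rest free.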
17
UTOPIAN: User-Driven Topic Modeling Based on Interactive NMF
[Figure labels: topic merging; topic splitting; doc-induced topic creation; keyword-induced topic creation]
18
UTOPIAN Overview
Supervised t-distributed stochastic neighbor embedding (t-SNE)
User interactions supported:
  Keyword refinement
  Topic merging/splitting
  Keyword-/document-induced topic creation
Real-time interaction via PIVE (Per-Iteration Visualization Environment)
[Figure labels: topic merging; topic splitting; doc-induced topic creation; keyword-induced topic creation]
Just like In-Spire, documents are represented as dots, and their colors represent their topic cluster membership.
19
Supervised t-SNE
Documents are often too noisy to work with.
Supervised t-SNE: $d(x_i, x_j) \leftarrow \alpha \cdot d(x_i, x_j)$ if $x_i$ and $x_j$ belong to the same topic cluster.
[Figure panels: original t-SNE; supervised t-SNE]
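The distance adjustment can be sketched as a preprocessing step on a pairwise distance matrix before it is handed to any t-SNE implementation. The function name, the sample distances, and the choice of α = 0.5 are assumptions for illustration; the slide does not give a value for α, though a value below 1 would pull same-topic documents closer together.

```python
def supervise_distances(distances, labels, alpha):
    """Scale d(x_i, x_j) by alpha when documents i and j share a topic cluster."""
    n = len(labels)
    return [
        [
            distances[i][j] * alpha
            if i != j and labels[i] == labels[j]
            else distances[i][j]
            for j in range(n)
        ]
        for i in range(n)
    ]

# Hypothetical pairwise distances among three documents and their topic labels.
distances = [
    [0, 2, 4],
    [2, 0, 6],
    [4, 6, 0],
]
labels = [0, 0, 1]

adjusted = supervise_distances(distances, labels, alpha=0.5)
for row in adjusted:
    print(row)
```

Only the distance between the first two documents changes here, because they are the only pair sharing a topic label.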
20
PIVE (Per-Iteration Visualization Environment) for Real-time Interaction [Choo et al., under revision]
[Figure panels: standard approach; PIVE approach]
21
Demo Video http://tinyurl.com/UTOPIAN2013
22
Usage Scenario: Hyundai Genesis Review Data
[Figure panels: initial result; after interaction]
23
Summary
Presented UTOPIAN, a user-driven topic modeling system based on interactive NMF.
Highlighted the advantages of NMF over LDA in visual analytics.
Reliable algorithmic behaviors:
  Consistency from multiple runs
  Early empirical convergence
Flexible support for user interactions:
  Keyword refinement
  Topic merging/splitting
  Keyword-/document-induced topic creation
24
More in the paper & On-going Work
More in the paper:
  A general taxonomy of user interactions with computational methods
    Keyword-based vs. document-based
    Template-based vs. from-scratch-based
  Algorithmic details about supported user interactions
  Implementation details
  More usage scenarios
On-going work:
  Scaling up the system with parallel distributed NMF
25
Thank you! http://tinyurl.com/UTOPIAN2013
Jaegul Choo
[Figure labels: topic merging; topic splitting; doc-induced topic creation; keyword-induced topic creation]
For more details, please find me at 'Meet the Candidate', A601 + A602, 6 PM today.