Context Analysis in Text Mining and Search
Qiaozhu Mei, Department of Computer Science, University of Illinois at Urbana-Champaign
Joint work with ChengXiang Zhai
2008 © Qiaozhu Mei, University of Illinois at Urbana-Champaign
Motivating Example: Personalized Search
Query: MSR
Possible meanings: Mountain safety research; Metropolis Street Racer; Molten salt reactor; Mars Sample Return; Magnetic Stripe Reader; …
Actually looking for Microsoft Research…
Motivating Example: Comparing Product Reviews
Unsupervised discovery of common topics and their variations
Inputs: IBM Laptop Reviews; APPLE Laptop Reviews; DELL Laptop Reviews
Common Themes | “IBM” specific | “APPLE” specific | “DELL” specific
Battery Life | Long, 4-3 hrs | Medium, 3-2 hrs | Short, 2-1 hrs
Hard disk | Large, GB | Small, 5-10 GB | Medium, GB
Speed | Slow, MHz | Very Fast, 3-4 GHz | Moderate, 1-2 GHz
Motivating Example: Discovering Topical Trends in Literature
Unsupervised discovery of topics and their temporal variations
[Figure: topic strength over time for SIGIR topics: TF-IDF Retrieval, IR Applications, Language Model, Text Categorization]
Motivating Example: Analyzing Spatial Topic Patterns
How do bloggers in different states respond to topics such as “oil price increase during Hurricane Katrina”?
Unsupervised discovery of topics and their variations in different locations
Motivating Example: Summarizing Sentiments
Unsupervised/semi-supervised discovery of topics and different sentiments of the topics
Query: Dell Laptops
Topic-sentiment summary (columns: positive, negative, neutral), sample sentences:
–Facet 1 (Price): “it is the best site and they show Dell coupon code as early as possible”; “Even though Dell's price is cheaper, we still don't want it.”; “mac pro vs. dell precision: a price comparis..”; “DELL is trading at $24.66”
–Facet 2 (Battery): “One thing I really like about this Dell battery is the Express Charge feature.”; “my Dell battery sucks”; “Stupid Dell laptop battery”; “i still want a free battery from dell..”
[Figure: topic-sentiment dynamics for Topic = Price; strength over time for Positive, Negative, Neutral]
Motivating Example: Analyzing Topics on a Social Network
Unsupervised discovery of topics and correlated research communities
Inputs: publications of Gerard Salton; publications of Bruce Croft
[Figure: network with communities Data mining, Machine learning, Information retrieval; Bruce Croft and Gerard Salton marked]
Research Questions
What do these problems have in common?
Can we model all these problems generally?
Can we solve these problems with a unified approach?
How can we bring humans into the loop?
Rest of Talk
Background: Language Models in Text Mining and Retrieval
Definition of context
General methodology to model context
–Models, example applications, results
Conclusion and Discussion
Generative Models of Text
Text as observations: words, tags, links, etc.
Use a unified probabilistic model to explain the appearance (generation) of observations
Documents are generated by sampling every observation from such a generative model
Different generation assumptions → different models
–Document Language Models
–Probabilistic Topic Models: PLSA, LDA, etc.
–Hidden Markov Models
–…
Multinomial Language Models
A multinomial distribution of words as a text representation
Example: retrieval 0.2, information 0.15, model 0.08, query 0.07, language 0.06, feedback 0.03, ……
Known as a topic model when there are k of them in text, e.g., semi-supervised learning; boosting; spectral clustering, etc.
Language Models in Information Retrieval (e.g., KL-Div. Method)
Document d: a text mining paper
–Doc Language Model (LM) θ_d : p(w|d): text 4/100 = 0.04; mining 3/100 = 0.03; clustering 1/100 = 0.01; …; data = 0; computing = 0; …
–Smoothed Doc LM θ_d' : p(w|d’): text = 0.039; mining = 0.028; clustering = 0.01; …; data and computing also listed (values not preserved)
Query q: “data mining”
–Query Language Model θ_q : p(w|q): data ½ = 0.5; mining ½ = 0.5
–p(w|q’): data = 0.4; mining = 0.4; clustering = 0.1; …
Similarity function between the query model and the smoothed doc LM
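The scoring step on this slide can be sketched in a few lines. This is a minimal illustration, not the talk's implementation: the slide does not say which smoothing method is used, so Jelinek-Mercer interpolation with a collection model is an assumption, and the function names (`mle`, `jm_smooth`, `neg_kl`) and the toy documents are hypothetical.

```python
import math
from collections import Counter

def mle(tokens):
    """Maximum-likelihood unigram model: p(w) = count(w) / number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def jm_smooth(doc_lm, collection_lm, lam=0.1):
    """Jelinek-Mercer smoothing (assumed here): (1 - lam) * p(w|d) + lam * p(w|collection)."""
    vocab = set(doc_lm) | set(collection_lm)
    return {w: (1 - lam) * doc_lm.get(w, 0.0) + lam * collection_lm.get(w, 0.0)
            for w in vocab}

def neg_kl(query_lm, doc_lm):
    """Score = -D(theta_q || theta_d'); closer to zero means a better match.
    Assumes every query word has nonzero mass in the smoothed document model."""
    return -sum(p * math.log(p / doc_lm[w]) for w, p in query_lm.items() if p > 0)

# Hypothetical toy data (not from the slides).
collection = "text mining data mining clustering data computing text".split()
doc_a = "text mining text clustering mining".split()
doc_b = "computing computing clustering".split()

coll_lm = mle(collection)
query_lm = mle("data mining".split())  # data 0.5, mining 0.5, as on the slide

score_a = neg_kl(query_lm, jm_smooth(mle(doc_a), coll_lm))
score_b = neg_kl(query_lm, jm_smooth(mle(doc_b), coll_lm))
print(score_a, score_b)
```

Without smoothing, any document that lacks the word "data" would have zero probability for it and the divergence would be undefined, which is why the slide scores against the smoothed model rather than the raw one.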
Probabilistic Topic Models for Text Mining
Text Collections → Probabilistic Topic Modeling → Topic models (multinomial distributions)
Example topics:
–web 0.21, search 0.10, link 0.08, graph 0.05, …
–term 0.16, relevance 0.08, weight 0.07, feedback 0.04, independ model 0.03, …
Models: PLSA [Hofmann 99]; LDA [Blei et al. 03]; Author-Topic [Steyvers et al. 04]; CPLSA [Mei & Zhai 06]; Pachinko allocation [Li & McCallum 06]; CTM [Blei et al. 06]; …
Applications: subtopic discovery; opinion comparison; summarization; topical pattern analysis; passage segmentation; …
Importance of Context
Science in the year 2000 and Science in the year 1500: Are we still working on the same topics?
For a computer scientist and a gardener: Does “tree, root, prune” mean the same?
“Football” means soccer in Europe. What about in the US?
Context affects topics!
Context Features of Text (Meta-data)
[Figure: a weblog article annotated with author, author’s occupation, location, time, communities, source]
Context = Partitioning of Text
[Figure: papers from WWW, SIGIR, ACL, KDD, SIGMOD, partitioned into contexts such as papers written in 1998, papers written by authors in US, and papers about Web]
Rich Context Information in Text
News articles: time, publisher, etc.
Blogs: time, location, author, …
Scientific literature: author, publication year, conference, citations, …
Query logs: time, IP address, user, clicks, …
Customer reviews: product, source, time, sentiments, …
Emails: sender, receiver, time, thread, …
Web pages: domain, time, click rate, etc.
More? Entity-relations, social networks, ……
Categories of Context
Some partitions of text are explicit → explicit context
–Time; location; author; conference; user; IP; etc.
–Similar to metadata
Some partitions are implicit → implicit context
–Sentiments; missions; goals; intents
Some partitions are at document level
Some are at a finer granularity
–Context of a word; an entity; a pattern; a query, etc.
–Sentences; sliding windows; adjacent words; etc.
Context Analysis
Use context to infer semantics
–Annotating frequent patterns; labeling of topic models
Use context to provide targeted service
–Personalized search; intent-based search; etc.
Compare contextual patterns of topics
–Evolutionary topic patterns; spatiotemporal topic patterns; topic-sentiment patterns; etc.
Use context to help other tasks
–Social network analysis; impact summarization; etc.
General Methodology to Model Context
Context Generative Model
–Observations in the same context are generated with a unified model
–Observations in different contexts are generated with different models
–Observations in similar contexts are generated with similar models
Text is generated with a mixture of such generative models
–Example Task; Model; Sample results
Model a unique context with a unified model (Generation)
Probabilistic Latent Semantic Analysis (Hofmann ’99)
Documents about “Hurricane Katrina”
Topics θ_1…k, each a word distribution P(w|θ_j):
–government: government 0.3, response …
–donation: donate 0.1, relief 0.05, help …
–New Orleans: city 0.2, new 0.1, orleans …
Generation of a document d:
–Choose a topic θ_i according to π_d : P(θ_i|d)
–Draw a word from θ_i (e.g., government, response; donate, aid, help; new, Orleans)
Example document: “Criticism of government response to the hurricane primarily consisted of criticism of its response to … The total shut-in oil production from the Gulf of Mexico … approximately 24% of the annual production and the shut-in gas production … Over seventy countries pledged monetary donations or other assistance. …”
[Figure: plate diagram with π_d, Z_d,n, W_d,n, θ_k; plates N, D, K]
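The generative story above (choose a topic from π_d, then draw a word from that topic) can be fitted with EM. The following is a small pure-Python sketch under stated assumptions: the function name `plsa`, the random initialization, the fixed iteration count, and the toy count matrix are illustrative choices, not details taken from the talk.

```python
import random

def plsa(counts, k, iters=50, seed=0):
    """Fit PLSA by EM. counts[d][w] is the count of word w in document d.
    Returns (pi, theta): pi[d][j] = p(theta_j | d), theta[j][w] = p(w | theta_j)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        s = sum(row)
        return [x / s for x in row]

    # Random positive initialization of both sets of multinomials.
    pi = [normalize([rng.random() + 0.1 for _ in range(k)]) for _ in range(n_docs)]
    theta = [normalize([rng.random() + 0.1 for _ in range(n_words)]) for _ in range(k)]

    for _ in range(iters):
        # A tiny floor keeps every probability strictly positive.
        new_pi = [[1e-12] * k for _ in range(n_docs)]
        new_theta = [[1e-12] * n_words for _ in range(k)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior p(z = j | d, w) for this (document, word) pair.
                post = normalize([pi[d][j] * theta[j][w] for j in range(k)])
                # Accumulate expected counts for the M-step.
                for j in range(k):
                    new_pi[d][j] += counts[d][w] * post[j]
                    new_theta[j][w] += counts[d][w] * post[j]
        # M-step: renormalize expected counts into distributions.
        pi = [normalize(row) for row in new_pi]
        theta = [normalize(row) for row in new_theta]
    return pi, theta

# Hypothetical toy counts over a six-word vocabulary borrowed from the slide.
vocab = ["government", "response", "donate", "relief", "new", "orleans"]
counts = [
    [4, 3, 0, 0, 0, 0],
    [0, 0, 5, 2, 0, 0],
    [0, 0, 0, 0, 3, 3],
    [2, 1, 2, 1, 0, 0],
]
pi, theta = plsa(counts, k=3)
```

Each row of `pi` plays the role of π_d on the slide and each row of `theta` is one topic's word distribution; which index ends up holding which topic depends on the random start.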
Latent Dirichlet Allocation (Blei ’03)
PLSA:
–No natural way to assign probability to an unseen document
–Number of parameters grows linearly with the size of the training set → overfits data
–Not a fully generative model
LDA solves these problems
–But need to infer p(topic|d) and p(w|topic)
–Parameter estimation using Gibbs sampling or variational inference
Example: Topics in Science (D. Blei 05)
Label a Multinomial Topic Model
A topic in SIGIR (Mei and Zhai 06): term, relevance, weight, feedback, independence, model, frequent, probabilistic, document, …
Candidate labels: iPod Nano; Pseudo-feedback; Information Retrieval; Retrieval models; じょうほうけんさく
Criteria for a label:
–Semantically close (relevance)
–Understandable – phrases?
–High coverage inside topic
–Discriminative across topics
Automatic Labeling of Topics
Collection (e.g., SIGIR) → topics:
–term 0.16, relevance 0.07, weight 0.07, feedback 0.04, independence 0.03, model 0.03, …
–filtering 0.21, collaborative 0.15, …
–trec 0.18, evaluation 0.10, …
1 Candidate label pool (NLP chunker, n-gram statistics): information retrieval, retrieval model, index structure, relevance feedback, …
2 Relevance score: information retrieval 0.26; retrieval models 0.19; IR models 0.17; pseudo feedback 0.06; ……
3 Discrimination: information retriev; retrieval models 0.20; IR models 0.18; pseudo feedback 0.09; ……
4 Coverage: retrieval models 0.20; IR models; pseudo feedback 0.09; ……; information retrieval
Label Relevance: Context Comparison
Intuition: expect the label with similar context (distribution)
Topic θ, P(w|θ): [bar chart over clustering, dimension, partition, algorithm, hash, …]
Good label (l1) “clustering algorithm”, p(w | clustering algorithm): [bar chart over clustering, hash, dimension, algorithm, partition, …]
l2 “hash join”, p(w | hash join): [bar chart over clustering, hash, dimension, key, algorithm, …]
–Sample contexts of “hash join”: key … hash join … code … hash table … search … hash join … map key … hash … algorithm … key … hash … key table … join …
Score(l, θ) = D(θ || l)
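The context-comparison intuition can be sketched as ranking candidate labels by how little their context distribution diverges from the topic. This is a hedged illustration: the probabilities below are made up, the `eps` floor for words missing from a label's context is an assumption, and `kl` and `rank_labels` are hypothetical helper names.

```python
import math

def kl(p, q, eps=1e-9):
    """D(p || q) over p's support; eps stands in for zero mass in q (an assumption)."""
    return sum(pw * math.log(pw / max(q.get(w, 0.0), eps))
               for w, pw in p.items() if pw > 0)

def rank_labels(topic, label_contexts):
    """Order candidate labels from smallest to largest divergence from the topic."""
    return sorted(label_contexts, key=lambda label: kl(topic, label_contexts[label]))

# Illustrative numbers only; the slide shows bar charts, not these values.
topic = {"clustering": 0.40, "dimension": 0.20, "partition": 0.15,
         "algorithm": 0.15, "hash": 0.10}
label_contexts = {
    "clustering algorithm": {"clustering": 0.35, "dimension": 0.20, "algorithm": 0.20,
                             "partition": 0.15, "hash": 0.10},
    "hash join": {"hash": 0.40, "key": 0.30, "join": 0.20,
                  "algorithm": 0.05, "clustering": 0.05},
}
ranking = rank_labels(topic, label_contexts)
print(ranking)
```

Here “clustering algorithm” is ranked first because its context distribution covers the same words as the topic, while “hash join” puts most of its mass on words (key, join) the topic never uses.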
Results: Sample Topic Labels
–Topic: tree 0.09, trees 0.08, spatial 0.08, b 0.05, r 0.04, disk 0.02, array 0.01, cache 0.01. Labels: r tree, b tree, …, indexing methods
–Topic: north 0.02, case 0.01, trial 0.01, iran 0.01, documents 0.01, walsh, reagan, charges. Labels: iran contra, …
–Topic: the, of, a, and, to, data (> 0.02), …, clustering 0.02, time 0.01, clusters 0.01, databases 0.01, large 0.01, performance 0.01, quality. Labels: clustering algorithm, clustering structure, …; large data, data quality, high data, data application, …
Model different contexts with different models (Discrimination, Comparison)
Example: Finding Evolutionary Patterns of Topics
Content Variations over Contexts
[Figure: evolution of KDD topics over time (T), 1999 onward. Topic word lists shown include: SVM, criteria, classification, linear, …; decision, tree, classifier, class, Bayes, …; classification, text, unlabeled, document, labeled, learning, …; information, web, social, retrieval, distance, networks, …; web, classification, features, topic, …; mixture, random, cluster, clustering, variables, …; topic, mixture, LDA, semantic, …]
Example: Finding Evolutionary Patterns of Topics (II)
Strength Variations over Contexts
[Figure from (Mei ’05)]
View of Topics: Context-Specific Version of Views
One context → one view
A document selects from a mix of views
Topic 1: Retrieval Model (retrieve, model, relevance, document, query)
–Context 1: 1998 ~ 2006 (e.g., after “Language Modeling”): language, model, smoothing, query, generation
–Context 2: 1977 ~ 1998 (i.e., before “Language Modeling”): vector, space, TF-IDF, Okapi, LSI, retrieval
Topic 2: Feedback (feedback, judge, expansion, pseudo, query)
–Context 1: 1998 ~ 2006: feedback, mixture, estimate, EM, model, pseudo
–Context 2: 1977 ~ 1998: vector, Rocchio, weighting, feedback, term
Coverage of Topics: Distribution over Topics
A coverage of topics: a (strength) distribution over the topics.
One context → one coverage
A document selects from a mix of multiple coverages.
Example document: “Criticism of government response to the hurricane primarily consisted of criticism of its response to … The total shut-in oil production from the Gulf of Mexico … approximately 24% of the annual production and the shut-in gas production … Over seventy countries pledged monetary donations or other assistance. …”
[Figure: coverage over Background, Oil Price, Government Response, Aid and donation, shown for Context: Texas and Context: Louisiana]
A General Solution: CPLSA
CPLSA = Contextual Probabilistic Latent Semantic Analysis
An extension of the PLSA model ([Hofmann 99]) by
–Introducing context variables
–Modeling views of topics
–Modeling coverage variations of topics
Process of contextual text mining
–Instantiation of CPLSA (context, views, coverage)
–Fit the model to text data (EM algorithm)
–Compare a topic from different views
–Compute strength dynamics of topics from coverages
–Compute other probabilistic topic patterns
The “Generation” Process
Context of document: Time = July 2005; Location = Texas; Author = Eric Brill; Occup. = Sociologist; Age = 45+; …
Views: View1, View2, View3, tied to contexts such as Texas, July 2005, sociologist
Topics: government (government 0.3, response …); donation (donate 0.1, relief 0.05, help …); New Orleans (city 0.2, new 0.1, orleans …)
Topic coverages: Texas; July 2005; document; sociologist; ……
Steps:
–Choose a view
–Choose a coverage
–Choose a theme
–Draw a word from θ_i (e.g., government, response; donate, aid, help; new, Orleans)
Example document: “Criticism of government response to the hurricane primarily consisted of criticism of its response to … The total shut-in oil production from the Gulf of Mexico … approximately 24% of the annual production and the shut-in gas production … Over seventy countries pledged monetary donations or other assistance. …”
An Intuitive Example
Two topics: web search; machine learning
I am writing a WWW paper. I will cover more about “web search” instead of “machine learning”. (Coverage)
–But of course I have my own taste.
I am from a search engine company, so when I write about “web search”, I will focus on “search engine” and “online advertisements”… (View)
The Probabilistic Model
A probabilistic model explaining the generation of a document D and its context features C: if an author wants to write such a document, he will
–Choose a view v_i according to the view distribution
–Choose a coverage κ_j according to the coverage distribution
–Choose a theme according to the coverage κ_j
–Generate a word using the chosen theme
The likelihood of the document collection is:
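The formula itself did not survive in this text. One way to write it, reconstructed directly from the four generation steps above, is given below. The notation is an assumption: n views, m coverages, k themes, c(w, D) for the count of word w in document D, V for the vocabulary, and θ_il for theme l as seen under view v_i.

```latex
\log p(\mathcal{D}) =
  \sum_{(D,C)\in\mathcal{D}} \sum_{w\in V} c(w,D)\,
  \log \sum_{i=1}^{n} p(v_i \mid D,C)
       \sum_{j=1}^{m} p(\kappa_j \mid D,C)
       \sum_{l=1}^{k} p(l \mid \kappa_j)\, p(w \mid \theta_{il})
```

With a single view and one coverage per document, this reduces to the PLSA log-likelihood, which is consistent with CPLSA being described as an extension of PLSA.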
Example results: Query Log Analysis
Context = Days of week
Query & Clicks: more queries/clicks on weekdays
Search Difficulty: more difficult to predict on weekends
Query Log Analysis
Context = Type of Query
Business Queries: clear day-week pattern; weekdays more frequent than weekends
Consumer Queries: no clear day-week pattern; weekends are comparable to, or even more frequent than, weekdays
Bursting Topics in SIGMOD: Context = Time (Years)
Spatiotemporal Text Mining: Context = Time & Location
About Government Response in Hurricane Katrina
Week1: The theme is the strongest along the Gulf of Mexico
Week2: The discussion moves towards the north and west
Week3: The theme distributes more uniformly over the states
Week4: The theme is again strong along the east coast and the Gulf of Mexico
Week5: The theme fades out in most states
Faceted Opinions
Context = Sentiments
Topic | Neutral | Positive | Negative
Topic 1: Movie | ... Ron Howards selection of Tom Hanks to play Robert Langdon. | Tom Hanks stars in the movie,who can be mad at that? | But the movie might get delayed, and even killed off if he loses.
Topic 1: Movie | Directed by: Ron Howard Writing credits: Akiva Goldsman... | Tom Hanks, who is my favorite movie star act the leading role. | protesting... will lose your faith by... watching the movie.
Topic 1: Movie | After watching the movie I went online and some research on... | Anybody is interested in it? | ... so sick of people making such a big deal about a FICTION book and movie.
Topic 2: Book | I remembered when i first read the book, I finished the book in two days. | Awesome book. | ... so sick of people making such a big deal about a FICTION book and movie.
Topic 2: Book | I’m reading “Da Vinci Code” now. … | So still a good book to past time. | This controversy book cause lots conflict in west society
Sentiment Dynamics
Context = Time & Sentiments
Example: “the da vinci code”
Facet: the book “the da vinci code” (bursts during the movie, Pos > Neg)
Facet: the impact on religious beliefs (bursts during the movie, Neg > Pos)
Event Impact Analysis: IR Research
Theme: retrieval models (SIGIR papers): term, relevance, weight, feedback, independence, model, frequent, probabilistic, document, …
Events:
–1992: Starting of the TREC conferences
–1998: Publication of the paper “A language modeling approach to information retrieval”
Word lists of the theme shown around the events:
–vector, concept, extend, model, space, boolean, function, feedback, …
–xml, model, collect, judgment, rank, subtopic, …
–probabilist, model, logic, ir, boolean, algebra, estimate, weight, …
–model, language, estimate, parameter, distribution, probable, smooth, markov, likelihood, …
Model similar context with similar models (Smoothing, Regularization)
Personalization with Backoff
Ambiguous query: MSG
–Madison Square Garden
–Monosodium Glutamate
Disambiguate based on user’s prior clicks
We don’t have enough data for everyone!
–Backoff to classes of users
Proof of Concept:
–Classes defined by IP addresses
Better:
–Market Segmentation (Demographics)
–Collaborative Filtering (Other users who click like me)
Context = IP
[Figure: hierarchy of IP prefixes, from full addresses up to 156.*.*.* and *.*.*.*]
Full personalization: every context has a different model: sparse data!
No personalization: all contexts share the same model
Personalization with backoff: similar contexts have similar models
Backing Off by IP
λs estimated with EM and CV
A little bit of personalization
–Better than too much (sparse data)
–Or too little (missed opportunity)
λ4: weights for first 4 bytes of IP
λ3: weights for first 3 bytes of IP
λ2: weights for first 2 bytes of IP
……
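The backoff idea can be sketched as a linear interpolation of click distributions estimated at each IP-prefix level. This is only an illustration: the slide says the λs are estimated with EM and cross-validation, whereas the weights below are fixed by hand, and the function names, URLs, and click log are all hypothetical.

```python
from collections import Counter, defaultdict

def ip_prefixes(ip):
    """'156.111.188.243' -> ['156.111.188.243', '156.111.188', '156.111', '156', '']."""
    parts = ip.split(".")
    return [".".join(parts[:n]) for n in range(4, -1, -1)]

def train(click_log):
    """click_log: iterable of (ip, url). Returns click counts for every IP prefix."""
    counts = defaultdict(Counter)
    for ip, url in click_log:
        for prefix in ip_prefixes(ip):
            counts[prefix][url] += 1
    return counts

def backoff_prob(url, ip, counts, lambdas):
    """Interpolate p(url | prefix) from the full IP (lambdas[0]) down to no context
    (lambdas[4]). Levels with no data contribute nothing, so the result is not
    renormalized; a fuller version would redistribute that weight."""
    prob = 0.0
    for lam, prefix in zip(lambdas, ip_prefixes(ip)):
        total = sum(counts[prefix].values()) if prefix in counts else 0
        if total:
            prob += lam * counts[prefix][url] / total
    return prob

# Hypothetical click log for the ambiguous query "MSG".
log = [
    ("156.111.188.243", "garden.example"),
    ("156.111.188.10", "garden.example"),
    ("156.111.20.5", "garden.example"),
    ("12.34.56.78", "glutamate.example"),
    ("12.34.56.79", "glutamate.example"),
    ("98.7.6.5", "glutamate.example"),
]
counts = train(log)
lambdas = [0.3, 0.3, 0.2, 0.1, 0.1]  # hand-picked for illustration, not estimated

new_user = "156.111.188.77"  # no click history of their own
p_garden = backoff_prob("garden.example", new_user, counts, lambdas)
p_glutamate = backoff_prob("glutamate.example", new_user, counts, lambdas)
print(p_garden, p_glutamate)
```

The new user has no clicks at the full-IP level, but neighbors sharing the first three, two, and one bytes all clicked the same result, so the backed-off estimate favors it over the global 50/50 split.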
Social Network as Correlated Contexts
Linked contexts are similar to each other
[Figure: network whose nodes carry paper titles such as “Optimization of Relevance Feedback Weights”, “Parallel Architecture in IR ...”, “Predicting query performance …”, “A Language Modeling Approach to Information Retrieval”]
Social Network Context for Topic Modeling
Context = author
Coauthor = similar contexts
Intuition: I work on similar topics to my neighbors
Smoothed topic distributions over context, e.g., coauthor network
Topic Modeling with Network Regularization (NetPLSA)
Basic assumption (e.g., co-author graph): related authors work on similar topics
Objective: PLSA likelihood combined with a graph harmonic regularizer (generalization of [Zhu ’03]); annotated terms:
–tradeoff between topic and smoothness
–importance (weight) of an edge
–difference of topic distribution on neighbor vertices
–topic distribution of a document
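The objective that these annotations point to is not preserved in this text. A reconstruction consistent with the annotated terms is sketched below; the symbols are assumptions: λ is the tradeoff, w(u,v) the edge weight, p(θ_j | u) the topic distribution at vertex (document or author) u, c(w,d) the count of word w in d, and E the edge set of the graph G. The objective is minimized.

```latex
O(C,G) = -(1-\lambda) \sum_{d}\sum_{w} c(w,d)\,
           \log \sum_{j=1}^{k} p(\theta_j \mid d)\, p(w \mid \theta_j)
         \;+\; \frac{\lambda}{2} \sum_{(u,v)\in E} w(u,v)
           \sum_{j=1}^{k} \bigl( p(\theta_j \mid u) - p(\theta_j \mid v) \bigr)^{2}
```

Setting λ = 0 recovers plain PLSA, while larger λ pushes linked vertices toward the same topic distribution.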
Topical Communities with PLSA
Topic 1 | Topic 2 | Topic 3 | Topic 4
term 0.02 | peer 0.02 | visual 0.02 | interface 0.02
question 0.02 | patterns 0.01 | analog 0.02 | towards 0.02
protein 0.01 | mining 0.01 | neurons 0.02 | browsing 0.02
training 0.01 | clusters 0.01 | vlsi 0.01 | xml 0.01
weighting 0.01 | stream 0.01 | motion 0.01 | generation 0.01
multiple 0.01 | frequent 0.01 | chip 0.01 | design 0.01
recognition 0.01 | e 0.01 | natural 0.01 | engine 0.01
relations 0.01 | page 0.01 | cortex 0.01 | service 0.01
library 0.01 | gene 0.01 | spike 0.01 | social
? | ? | ? | ?
Noisy community assignment
Topical Communities with NetPLSA
Topic 1 | Topic 2 | Topic 3 | Topic 4
retrieval 0.13 | mining 0.11 | neural 0.06 | web 0.05
information 0.05 | data 0.06 | learning 0.02 | services 0.03
document 0.03 | discovery 0.03 | networks 0.02 | semantic 0.03
query 0.03 | databases 0.02 | recognition 0.02 | services 0.03
text 0.03 | rules 0.02 | analog 0.01 | peer 0.02
search 0.03 | association 0.02 | vlsi 0.01 | ontologies 0.02
evaluation 0.02 | patterns 0.02 | neurons 0.01 | rdf 0.02
user 0.02 | frequent 0.01 | gaussian 0.01 | management 0.01
relevance 0.02 | streams 0.01 | network 0.01 | ontology 0.01
Information Retrieval | Data mining | Machine learning | Web
Coherent community assignment
Smoothed Topic Map
Map a topic on the network (e.g., using p(θ|a))
[Figure: PLSA vs. NetPLSA maps for topic “information retrieval”; legend: core contributors, intermediate, irrelevant]
Smoothed Topic Map: The Windy States
–Blog articles: “weather”
–US states network
–Topic: “windy”
[Figure: PLSA, NetPLSA, and real reference maps]
Related Work
Specific contextual text mining problems
–Multi-collection comparative mining (e.g., [Zhai et al. 04])
–Temporal theme pattern (e.g., [Mei et al. 05], [Blei et al. 06], [Wang et al. 06])
–Spatiotemporal theme analysis (e.g., [Mei et al. 06], [Wang et al. 07])
–Author-topic analysis (e.g., [Steyvers et al. 04], [Zhou et al. 06])
–…
Probabilistic topic models:
–Probabilistic latent semantic analysis (PLSA) (e.g., [Hofmann 99])
–Latent Dirichlet allocation (LDA) (e.g., [Blei et al. 03])
–Many extensions (e.g., [Blei et al. 05], [Li and McCallum 06])
2007 © ChengXiang Zhai, LLNL, Aug 15,
Conclusions
Context analysis in text mining and search
General methodology to model context in text
–A unified generative model for observations in the same context
–Different models for different contexts
–Similar models for similar contexts
–Generation → discrimination → smoothing
Many applications
Discussion: Context in Search
Not all contexts are useful
–E.g., personalized search vs. search by time of day
–How can we know which contexts are more useful?
Many contexts are useful
–E.g., personalized search; task-based search; localized search
–How can we combine them?
Can we do better than market segmentation?
–Backoff to users who search like me – Collaborative Search
–But who searches like you?
References
CPLSA
–Q. Mei, C. Zhai. A Mixture Model for Contextual Text Mining. In Proceedings of KDD’06.
NetPLSA
–Q. Mei, D. Cai, D. Zhang, C. Zhai. Topic Modeling with Network Regularization. In Proceedings of WWW’08.
Labeling
–Q. Mei, X. Shen, C. Zhai. Automatic Labeling of Multinomial Topic Models. In Proceedings of KDD’07.
Personalization
–Q. Mei, K. Church. Entropy of Search Logs: How Hard is Search? With Personalization? With Backoff? In Proceedings of WSDM’08.
Applications
–Q. Mei, C. Zhai. Discovering Evolutionary Theme Patterns from Text: An Exploration of Temporal Text Mining. In Proceedings of KDD’05.
–Q. Mei, C. Liu, H. Su, C. Zhai. A Probabilistic Approach to Spatiotemporal Theme Pattern Mining on Weblogs. In Proceedings of WWW’06.
–Q. Mei, X. Ling, M. Wondra, H. Su, C. Zhai. Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs. In Proceedings of WWW’07.
The End
Thank You!
Experiments
Bibliography data and coauthor networks
–DBLP: text = titles; network = coauthors
–Four conferences (expect 4 topics): SIGIR, KDD, NIPS, WWW
Blog articles and geographic network
–Blogs from spaces.live.com containing topical words, e.g., “weather”
–Network: US states (adjacent states)
Coherent Topical Communities
Semantics of community: “Data Mining (KDD)”
–NetPLSA: mining 0.11, data 0.06, discovery 0.03, databases 0.02, rules 0.02, association 0.02, patterns 0.02, frequent 0.01, streams 0.01
–PLSA: peer 0.02, patterns 0.01, mining 0.01, clusters 0.01, stream 0.01, frequent 0.01, e 0.01, page 0.01, gene 0.01
Semantics of community: “machine learning (NIPS)”
–NetPLSA: neural 0.06, learning 0.02, networks 0.02, recognition 0.02, analog 0.01, vlsi 0.01, neurons 0.01, gaussian 0.01, network 0.01
–PLSA: visual 0.02, analog 0.02, neurons 0.02, vlsi 0.01, motion 0.01, chip 0.01, natural 0.01, cortex 0.01, spike 0.01