Contextual Text Analysis with Probabilistic Topic Models (2008 © ChengXiang Zhai)

Presentation transcript:

Slide 1: Contextual Text Analysis with Probabilistic Topic Models
ChengXiang Zhai
Department of Computer Science, Graduate School of Library & Information Science, Institute for Genomic Biology, Statistics
University of Illinois, Urbana-Champaign
Joint work with Qiaozhu Mei
2008 © ChengXiang Zhai

Slide 2: Motivation
Documents are often associated with context (metadata):
– Direct context: time, location, source, authors, …
– Indirect context: events, policies, …
Many applications require “contextual text analysis”:
– Discovering topics from text in a context-sensitive way
– Analyzing variations of topics over different contexts
– Revealing interesting patterns (e.g., topic evolution, topic variations, topic communities)

Slide 3: Example 1: Comparing News Articles
Collections that can be compared:
– Vietnam War / Afghan War / Iraq War
– CNN / Fox / Blog
– Before 9/11 / During Iraq war / Current
– US blog / European blog / Others
What’s in common? What’s unique?
Common Themes | “Vietnam” specific | “Afghan” specific | “Iraq” specific
United nations | … | … | …
Death of people | … | … | …
… | … | … | …

Slide 4: More Contextual Analysis Questions
– What positive/negative opinions did people express about X (e.g., a person, an event)? Trends?
– How does an opinion/topic evolve over time?
– What are emerging topics? What topics are fading away?
– How can we characterize a social network?

Slide 5: Research Questions
– Can we model all these problems generally?
– Can we solve these problems with a unified approach?
– How can we bring humans into the loop?

Slide 6: Contextual Probabilistic Latent Semantic Analysis
Document context: Time = July 2005; Location = Texas; Author = xxx; Occup. = Sociologist; Age Group = 45+; …
Views: View1, View2, View3 (e.g., Texas, July 2005, sociologist)
Themes (word distributions):
– government: government 0.3, response, …
– donation: donate 0.1, relief 0.05, help, …
– New Orleans: city 0.2, new 0.1, orleans, …
Theme coverages: Texas, July 2005, document, …
Generating each word of a document:
– Choose a view
– Choose a coverage
– Choose a theme
– Draw a word from θ_i (e.g., government, response, donate, aid, help, new, Orleans)
Sample document text:
– “Criticism of government response to the hurricane primarily consisted of criticism of its response to …”
– “The total shut-in oil production from the Gulf of Mexico … approximately 24% of the annual production and the shut-in gas production …”
– “Over seventy countries pledged monetary donations or other assistance. …”
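
The generative steps on this slide (choose a view, choose a coverage, choose a theme, draw a word) can be sketched in Python roughly as follows. This is only a minimal illustration, not the authors' implementation: the names `VIEWS`, `COVERAGES`, `draw`, `generate_word`, and `generate_document`, as well as all probabilities, are hypothetical, and parameter estimation is not shown.

```python
import random

# Hypothetical toy parameters; the numbers are illustrative, not from the slides.
# Each view holds its own version of every theme (a distribution over words).
VIEWS = {
    "Texas": {
        "government": {"government": 0.5, "response": 0.5},
        "donation": {"donate": 0.4, "relief": 0.3, "help": 0.3},
    },
    "July 2005": {
        "government": {"government": 0.6, "response": 0.4},
        "donation": {"donate": 0.5, "aid": 0.5},
    },
}

# Each coverage is a distribution over themes tied to some context.
COVERAGES = {
    "Texas": {"government": 0.7, "donation": 0.3},
    "document": {"government": 0.2, "donation": 0.8},
}


def draw(dist, rng):
    """Sample one key from a {key: probability} dictionary."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def generate_word(view_probs, coverage_probs, rng):
    """Follow the four steps: view, coverage, theme, then word."""
    view = draw(view_probs, rng)              # choose a view
    coverage = draw(coverage_probs, rng)      # choose a coverage
    theme = draw(COVERAGES[coverage], rng)    # choose a theme under that coverage
    word = draw(VIEWS[view][theme], rng)      # draw a word from the view's theme
    return view, coverage, theme, word


def generate_document(n_words, view_probs, coverage_probs, seed=0):
    """Generate n_words words for a document with the given context weights."""
    rng = random.Random(seed)
    return [generate_word(view_probs, coverage_probs, rng)[3] for _ in range(n_words)]
```

For example, `generate_document(10, {"Texas": 0.5, "July 2005": 0.5}, {"Texas": 0.5, "document": 0.5})` returns ten words, each drawn from the version of a theme that belongs to the sampled view.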

Slide 7: Comparing News Articles
Iraq War (30 articles) vs. Afghan War (26 articles)
Theme | Cluster 1 | Cluster 2 | Cluster 3
Common Theme | united, nations 0.04, … | killed, month, deaths, … | …
Iraq Theme | n 0.03, Weapons, Inspections, … | troops, hoon, sanches, … | …
Afghan Theme | Northern 0.04, alliance 0.04, kabul 0.03, taleban, aid 0.02, … | taleban, rumsfeld 0.02, hotel, front, … | …
– The common theme indicates that “United Nations” is involved in both wars
– Collection-specific themes indicate different roles of “United Nations” in the two wars
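
One common way to realize the split between a common theme and collection-specific themes is to mix the two word distributions with a weight. The Python sketch below assumes that form; the slide does not confirm it. The names `COMMON_THEME`, `SPECIFIC_THEMES`, `word_prob`, and `lam`, and all probabilities, are hypothetical.

```python
# Hypothetical word distributions for one cluster; values are illustrative only.
COMMON_THEME = {"united": 0.04, "nations": 0.04}
SPECIFIC_THEMES = {
    "Iraq": {"weapons": 0.03, "inspections": 0.03},
    "Afghan": {"northern": 0.04, "alliance": 0.04},
}


def word_prob(word, collection, lam=0.5):
    """Mix the common theme and one collection-specific theme.

    lam is the (assumed) weight on the common theme; 1 - lam goes to the
    theme specific to the given collection.
    """
    p_common = COMMON_THEME.get(word, 0.0)
    p_specific = SPECIFIC_THEMES[collection].get(word, 0.0)
    return lam * p_common + (1.0 - lam) * p_specific
```

Under this sketch, a word such as "nations" receives probability in every collection through the common theme, while "alliance" receives probability only in the Afghan collection.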

Slide 8: Spatiotemporal Patterns in Blog Articles
Query = “Hurricane Katrina”
Topics in the results:
Spatiotemporal patterns

Slide 9: Theme Life Cycles (“Hurricane Katrina”)
– New Orleans: city, orleans, new, louisiana, flood, evacuate, storm, …
– Oil Price: price, oil, gas, increase, product, fuel, company, …

Slide 10: Theme Snapshots (“Hurricane Katrina”)
– Week 1: The theme is the strongest along the Gulf of Mexico
– Week 2: The discussion moves towards the north and west
– Week 3: The theme distributes more uniformly over the states
– Week 4: The theme is again strong along the east coast and the Gulf of Mexico
– Week 5: The theme fades out in most states

Slide 11: Theme Life Cycles (KDD Papers)
– gene, expressions, probability, microarray, …
– marketing, customer, model, business, …
– rules, association, support, …

Slide 12: Theme Evolution Graph: KDD
Time axis (T): 1999, …
Themes in the graph:
– SVM, criteria, classification, linear, …
– decision, tree, classifier, class, Bayes, …
– classification, text, unlabeled, document, labeled, learning, …
– information, web, social, retrieval, distance, networks, …
– web, classification, features 0.006, topic, …
– mixture, random, cluster, clustering, variables, …
– topic, mixture, LDA, semantic, …

Slide 13: Multi-Faceted Sentiment Summary (query = “Da Vinci Code”)
Facet 1: Movie
– Neutral: “... Ron Howards selection of Tom Hanks to play Robert Langdon.” / “Directed by: Ron Howard Writing credits: Akiva Goldsman...” / “After watching the movie I went online and some research on...”
– Positive: “Tom Hanks stars in the movie,who can be mad at that?” / “Tom Hanks, who is my favorite movie star act the leading role.” / “Anybody is interested in it?”
– Negative: “But the movie might get delayed, and even killed off if he loses.” / “protesting... will lose your faith by... watching the movie.” / “... so sick of people making such a big deal about a FICTION book and movie.”
Facet 2: Book
– Neutral: “I remembered when i first read the book, I finished the book in two days.” / “I’m reading “Da Vinci Code” now. …”
– Positive: “Awesome book.” / “So still a good book to past time.”
– Negative: “... so sick of people making such a big deal about a FICTION book and movie.” / “This controversy book cause lots conflict in west society.”

Slide 14: Separate Theme Sentiment Dynamics
Panels: “book”, “religious beliefs”

Slide 15: Event Impact Analysis: IR Research
Theme: retrieval models (SIGIR papers)
– term, relevance, weight, feedback, independence, model, frequent, probabilistic, document, …
Events on the year axis:
– 1992: Start of the TREC conferences
– 1998: Publication of the paper “A language modeling approach to information retrieval”
Theme variations around the events:
– vector, concept, extend, model, space, boolean, function, feedback, …
– xml, model, collect, judgment, rank, subtopic, …
– probabilist, model, logic, ir, boolean, algebra, estimate, weight, …
– model, language, estimate, parameter, distribution, probable, smooth, markov, likelihood, …

Slide 16: Topic Modeling + Social Networks
Authors writing about the same topic form a community
Panels: Topic Model Only vs. Topic Model + Social Network
Separation of 3 research communities: IR, ML, Web

Slide 17: On-Going Work
– Combining contextual text analysis with visualization
– More detailed semantic modeling (entities, relations, …)
– Integration of search and contextual text analysis to develop an analyst’s workbench:
  – Interactive semantic navigation and probing
  – Synthesis of information/knowledge
  – Personalized/customized service

Slide 18: The End
Thank You!