Event Intensity Tracking in Weblog Collections Viet Ha-Thuc, Yelena Mejova, Christopher Harris, Dr Padmini Srinivasan ICWSM 2009 Data Challenge Workshop.

Presentation transcript:

Presented by: Yelena Mejova

Outline
- Motivation: Topic Tracking
- Explore the weblog collection
- Event tracking approach
- Related work
- Results
  - Event tracking
  - Sub-event tracking
- Future directions

Event Tracking
What people talk about, when they talk about it, and how much.

Data Set
- Published by Spinn3r.com
- 44 million blog posts
- August 1, 2008 – October 1, 2008
- Blog posts only; comments are not included

Data Set: Languages (chart)

Data Set: Document Length (chart)

Data Set: Document Distribution by Date (chart)

Data Set: Popular Categories (chart)

Data Set: Our Subset
- 1 million documents (4% of all English posts)
- English only
- Inlink threshold of
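A minimal sketch of this kind of subset selection, assuming each post is a record with hypothetical `language` and `inlinks` fields (the real Spinn3r field names may differ). The inlink threshold value is cut off on the slide, so `min_inlinks` is left as a parameter rather than filled in.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Post:
    # Hypothetical record layout, for illustration only.
    post_id: str
    language: str
    inlinks: int
    text: str


def select_subset(posts: Iterable[Post], min_inlinks: int) -> List[Post]:
    """Keep English-only posts whose inlink count meets the (caller-supplied) threshold."""
    return [p for p in posts if p.language == "en" and p.inlinks >= min_inlinks]
```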

Tracking Approach
- Phase I: Estimate relevance-based topic models from training documents
- Phase II: Estimate topical intensity by applying the topic models to the collection documents
(Diagram: training docs → topic models; docs + topic models → intensities)

Relevance-based Topical Model (figure)

Relevance-based Topical Model (plate diagram)
- b: background topic (e.g., common English words)
- e_k: event topic
- t_o(d): other document-specific topic
- w: observed word token
- Plates: N word tokens per training document; D training documents per event; K training sets, one for each of the K events
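Read as a generative story, the diagram says each token of a training document for event k is drawn from one of three topics: b, e_k, or t_o(d). A minimal sampling sketch, assuming hypothetical fixed mixing weights `lam` (the slides do not say how the choice among the three topics is parameterized):

```python
import random
from typing import Dict, List, Sequence

Dist = Dict[str, float]


def sample_word(dist: Dist, rng: random.Random) -> str:
    """Draw one word from a discrete word distribution."""
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=1)[0]


def generate_document(n_tokens: int, background: Dist, event: Dist, doc_specific: Dist,
                      lam: Sequence[float], rng: random.Random) -> List[str]:
    """Each token first picks a source topic (b, e_k, or t_o(d)), then a word from it.

    lam = (weight of b, weight of e_k, weight of t_o(d)) is a hypothetical fixed mixture.
    """
    sources = [background, event, doc_specific]
    tokens = []
    for _ in range(n_tokens):
        topic = rng.choices(sources, weights=lam, k=1)[0]
        tokens.append(sample_word(topic, rng))
    return tokens
```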

Relevance-based Topical Model: Inference
Given a training set for each event considered:
- b is shared by all documents
- e_k is shared by the training documents of event k only, not the rest
- t_o(d) belongs to a single document d only, not the rest
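One way to carry out this inference is expectation-maximization (EM) over the three-way token mixture. The sketch below is an illustration under stated assumptions, not the authors' exact procedure: it fixes b as the maximum-likelihood unigram model over all training documents, uses hypothetical fixed mixing weights `lam`, and re-estimates only e_k and each t_o(d).

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Dist = Dict[str, float]


def normalize(weights: Dict[str, float]) -> Dist:
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def background_model(all_docs: List[List[str]]) -> Dist:
    """b: maximum-likelihood unigram model over all training documents of all events."""
    return normalize(Counter(w for doc in all_docs for w in doc))


def estimate_event_topic(event_docs: List[List[str]], background: Dist,
                         lam: Sequence[float] = (0.5, 0.3, 0.2),
                         iterations: int = 50) -> Tuple[Dist, List[Dist]]:
    """EM for the token mixture lam_b*p(w|b) + lam_e*p(w|e_k) + lam_t*p(w|t_o(d)).

    b stays fixed; e_k is shared by all of event k's training documents and
    t_o(d) is private to document d. lam is a hypothetical fixed weighting.
    """
    lam_b, lam_e, lam_t = lam
    doc_counts = [Counter(doc) for doc in event_docs]
    vocab = sorted({w for counts in doc_counts for w in counts})
    # Uniform starting points for e_k and for every t_o(d).
    event = {w: 1.0 / len(vocab) for w in vocab}
    doc_topics = [{w: 1.0 / len(counts) for w in counts} for counts in doc_counts]
    for _ in range(iterations):
        event_mass = {w: 0.0 for w in vocab}
        new_doc_topics = []
        for counts, topic_d in zip(doc_counts, doc_topics):
            doc_mass = {}
            for w, c in counts.items():
                # E-step: share of the occurrences of w explained by b, e_k, and t_o(d).
                pb = lam_b * background.get(w, 0.0)
                pe = lam_e * event[w]
                pt = lam_t * topic_d[w]
                z = pb + pe + pt
                event_mass[w] += c * pe / z
                doc_mass[w] = c * pt / z
            new_doc_topics.append(normalize(doc_mass))
        # M-step: the renormalized expected counts become the new distributions.
        event = normalize(event_mass)
        doc_topics = new_doc_topics
    return event, doc_topics
```

Because b already accounts for words that are frequent everywhere, a word like "the" ends up with less weight in e_k than an event word that occurs equally often in the event's training documents.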

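Estimating Intensities
Computed from a subset (slice) of the collection, with a time window of w = 5 days:

$$\mathrm{Intensity}(e_i, t) = \sum_{d \in [t,\, t+w]} \log p(d \mid e_i)$$

That is, the intensity of event e_i at time t is the sum of the log-likelihoods of the documents given the event, over the documents in the window [t, t+w].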
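The intensity formula translates almost directly into code. The slides do not say how p(d | e_i) is computed, so this sketch assumes, hypothetically, a unigram document likelihood in which each token's probability under the event topic is smoothed with the background model (weight `lam`). Times are days as floats, and the window is the closed interval [t, t + w] as in the formula.

```python
import math
from typing import Dict, List, Tuple

Dist = Dict[str, float]


def log_p_doc_given_event(doc: List[str], event: Dist, background: Dist,
                          lam: float = 0.5) -> float:
    """Assumed unigram likelihood: sum over tokens of log(lam*p(w|e) + (1-lam)*p(w|b)).

    Every token must have non-zero background probability, otherwise log(0) fails.
    """
    return sum(math.log(lam * event.get(w, 0.0) + (1 - lam) * background[w]) for w in doc)


def intensity(dated_docs: List[Tuple[float, List[str]]], event: Dist, background: Dist,
              t: float, window: float = 5.0, lam: float = 0.5) -> float:
    """Intensity(e, t): sum of log p(d|e) over documents whose day lies in [t, t + window]."""
    return sum(log_p_doc_given_event(doc, event, background, lam)
               for day, doc in dated_docs if t <= day <= t + window)
```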

Related Work
- Topic Evolution Extraction: Zhou et al. 2006; Mei & Zhai 2005
- Topic Detection and Tracking: Allan 2002; Allan et al. 1998
- Blog Mining: Attardi & Simi 2006; Aschenbrenner & Miksch 2005; Kumar et al. 2003; Glance, Hurst & Tomokiyo 2004
- Relevance Modeling: Robertson & Sparck Jones 1988; Lavrenko & Croft

Event Tracking: News Events
- Sources: wikipedia.org + news sites
- Training subsets: retrieved using Lucene [2]
- Events:
  - US Presidential Election
  - Economic/Financial Crisis
  - Hurricanes and Tropical Storms
  - US Open Tennis
  - Russia–Georgia Conflict
  - Beijing Olympics
  - China Milk Powder Scandal
  - Thai Political Crisis
  - Delhi, India Bomb Blast
  - Pakistan Impeachment

Event Tracking: Topic Estimation

Beijing Olympics (BO)
word        P(w|BO)
olymp       0.075
beij        0.071
phelp       0.043
china       0.041
game        0.040
gold        0.023
august      0.021
michael     0.021

US Presidential Election (USPE)
word        P(w|USPE)
obama       0.064
mccain      0.050
palin       0.041
democrat    0.034
republican  0.030
clinton     0.019
biden       0.018
convent
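The word lists above are the highest-probability terms of each estimated event topic (the terms appear to be stemmed, e.g. "olymp", "beij"). A small sketch of producing such a list from a topic distribution, using the Beijing Olympics values shown on the slide as input:

```python
from typing import Dict, List, Tuple


def top_words(topic: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k highest-probability words, breaking ties alphabetically."""
    return sorted(topic.items(), key=lambda item: (-item[1], item[0]))[:k]


beijing_olympics = {"olymp": 0.075, "beij": 0.071, "phelp": 0.043, "china": 0.041,
                    "game": 0.040, "gold": 0.023, "august": 0.021, "michael": 0.021}

for word, p in top_words(beijing_olympics, 3):
    print(f"{word}\t{p:.3f}")
```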

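Event Tracking: Intensity Plot Annotations
- US Presidential Election: running mate announcements, national conventions
- Beijing Olympics: Aug 8–24; Phelps' eighth medal: Aug 17
- Pakistan Impeachment: impeachment launched Aug 7; formal impeachment charges Aug 17; Musharraf's formal resignation Aug 18
- Hurricanes and Tropical Storms: several hurricanes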

Event Tracking: Open Questions
- Are the spikes due to the sampling process?
- Topic latency: how long does it take for discussion to start?
- What is the effect of topic interference? (e.g., "China" in both Beijing Olympics and China Milk Scandal)
- What kinds of subtopics contribute to the main topics?
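The slides raise topic latency as a question without defining a measure. One hypothetical way to quantify it is the number of days between a real-world event date and the first window whose intensity exceeds a chosen threshold. Both `threshold` and the input series here are illustrative assumptions; note that intensities defined as sums of log-likelihoods are negative, so a threshold would be negative too.

```python
from typing import Optional, Sequence, Tuple


def topic_latency(series: Sequence[Tuple[float, float]], event_day: float,
                  threshold: float) -> Optional[float]:
    """series: (window start day, intensity) pairs.

    Returns the days from event_day until the first window starting at or after
    event_day whose intensity exceeds threshold, or None if none does.
    """
    for day, value in sorted(series):
        if day >= event_day and value > threshold:
            return day - event_day
    return None
```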

Sub-Event Tracking
Training set: event-specific

Component        Was              Now
background       common English   common English + common topical
topic of focus   event            sub-event
other            doc-specific     doc-specific
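In the sub-event setting, the background also absorbs the common topical vocabulary of the parent event, so the sub-event topic only has to explain what distinguishes, say, the Democratic Convention from election coverage as a whole. A minimal sketch of such a combined background, assuming (hypothetically) a fixed linear interpolation weight `alpha` between common English and the parent event topic:

```python
from typing import Dict

Dist = Dict[str, float]


def sub_event_background(common_english: Dist, parent_event: Dist, alpha: float = 0.5) -> Dist:
    """Combined background: alpha * p(w | common English) + (1 - alpha) * p(w | parent event)."""
    vocab = set(common_english) | set(parent_event)
    return {w: alpha * common_english.get(w, 0.0) + (1 - alpha) * parent_event.get(w, 0.0)
            for w in vocab}
```

The resulting distribution would then take the place of the plain common-English background when estimating sub-event topics.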

Sub-Event Tracking (figure)

Sub-Event Tracking: Sub-topic Estimation

Democratic Convention (DC)
word        P(w|DC)
obama       0.041
dnc         0.040
democrat    0.038
clinton     0.034
biden       0.034
denver      0.027
barack      0.021
hillari     0.012

Republican Convention (RC)
word        P(w|RC)
palin       0.073
republican  0.063
mccain      0.050
sarah       0.029
rnc         0.025
song        0.009
paul        0.009
gop

Sub-Event Tracking: Intensity Plot Annotations
- Democratic Convention: August
- Republican Convention: September

Sub-Event Tracking: Hurricanes and Tropical Storms (intensity plot annotations)
- Storm 1: named August 15, landfall August 18
- Storm 2: named August 25, landfall September 1
- Storm 3: named September 1, landfall September 13

Sub-Event Tracking: Next Steps
- Deeper hierarchies
- Re-define sub-topics by opinion, locale, and other demographics
Example hierarchy (diagram): Financial Crisis, with nodes Federal Reserve Bailout, AIG, Goldman Sachs, Taxpayer Reaction, Congressional Reaction, Conflicts of Interest, Financial Market Reaction

Conclusions
- Topic modeling: excludes non-relevant background and document-specific terms
- Topic tracking: closely corresponds with real-world events; hierarchical
- Scalability

Future Directions
- Baseline: standard ad hoc retrieval approaches?
- Evaluation: gold standard?
- Dynamic topic tracking: moving time window
- Community dynamics
- Topical sentiment analysis

Thank You

Works Cited
[1] Blei, D. M., Ng, A. Y., Jordan, M. I. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3.
[2] Apache Lucene.