Flu Gone Viral: Syndromic Surveillance of Flu on Twitter using Temporal Topic Models (ICDM, Shenzhen, 2014). Liangzhe Chen, K. S. M. Tozammel Hossain, Patrick Butler, Naren Ramakrishnan, B. Aditya Prakash.

Presentation transcript:

Flu Gone Viral: Syndromic Surveillance of Flu on Twitter using Temporal Topic Models
Liangzhe Chen, K. S. M. Tozammel Hossain, Patrick Butler, Naren Ramakrishnan, B. Aditya Prakash
Computer Science at Virginia Tech

Introduction: Surveillance
How can we estimate and predict flu trends? Traditional surveillance draws on population surveys, hospital records, and lab surveys, which are compiled into a surveillance report.

Introduction: GFT & Twitter
Estimate flu trends using online electronic sources such as Google Flu Trends (GFT) and Twitter. Example tweets: “So cold today, I’m catching cold.” “I have headache, sore throat, I can’t go to school today.” “My nose is totally congested, I have a hard time understanding what I’m saying.”

Outline: Observations; HFSTM Model; Inference; Experiments; Conclusion; Future work

Observation 1: States
There are different states in an infection cycle. The SEIR model has four:
1. Susceptible
2. Exposed
3. Infected
4. Recovered
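For reference, the textbook SEIR dynamics behind Observation 1 can be simulated in a few lines. This is a minimal sketch using forward-Euler integration; the function name `simulate_seir` and the parameter values (transmission rate `beta`, incubation rate `sigma`, recovery rate `gamma`) are illustrative assumptions, not values from the talk.

```python
def simulate_seir(n, e0, i0, beta, sigma, gamma, days, dt=0.1):
    """Forward-Euler integration of the classic SEIR compartment model.

    Illustrative sketch: beta = transmission rate, sigma = 1 / incubation
    period, gamma = recovery rate. Returns one (S, E, I, R) tuple per day.
    """
    s, e, i, r = n - e0 - i0, float(e0), float(i0), 0.0
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infected = sigma * e * dt         # E -> I
            new_recovered = gamma * i * dt        # I -> R
            s -= new_exposed
            e += new_exposed - new_infected
            i += new_infected - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Hypothetical parameters chosen only to produce a visible epidemic curve.
history = simulate_seir(n=10_000, e0=10, i0=5, beta=0.6, sigma=1 / 2,
                        gamma=1 / 4, days=120)
peak_day = max(range(len(history)), key=lambda d: history[d][2])
print(f"infected count peaks on day {peak_day}")
```

HFSTM itself assigns tweets only to the S, E, and I states; the R compartment appears here because it is part of the standard SEIR model.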

Observation 2: Ep. & So. Gap
There is a gap between epidemiological (Ep.) and social-media (So.) dynamics. In epidemiology, infection cases drop exponentially (Hethcote 2000); in social media, keyword mentions drop following a power-law pattern (Matsubara 2012).
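The two decay patterns in Observation 2 can be written in their generic forms, which also shows how to tell them apart: an exponential is a straight line on a semi-log plot, while a power law is a straight line on a log-log plot. The symbols $I_0$, $m_0$, $\lambda$, and $\alpha$ are introduced here for illustration and are not taken from the cited papers.

```latex
\begin{aligned}
\text{Exponential (epidemiology):}\quad & I(t) = I_0\, e^{-\lambda t}
  &&\Longrightarrow\quad \log I(t) = \log I_0 - \lambda t \\
\text{Power law (social media):}\quad & m(t) = m_0\, t^{-\alpha}
  &&\Longrightarrow\quad \log m(t) = \log m_0 - \alpha \log t
\end{aligned}
```

Here $I_0$ and $m_0$ are the levels at the start of the fall and $\lambda, \alpha > 0$ are decay rates. For large $t$ the exponential falls much faster, so a power law leaves a heavier tail of mentions long after the peak.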

HFSTM Model
The Hidden Flu-State from Tweet Model (HFSTM): each word w in a tweet O_i can be generated by a background topic, a non-flu-related topic, or a state-related topic.
[Figure: HFSTM graphical model, with labels for the binary background switch, binary non-flu-related switch, word distribution, latent state, initial probability, transition probability, and transition switch.]

HFSTM Model: Generating tweets
First, generate the state for a tweet (State ∈ {S, E, I}); then generate the topic for each word (Topic ∈ {Background, Non-flu, State}). Examples:
S: “This restaurant is really good.”
E: “The movie was good but it was freezing.”
I: “I think I have flu.”
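The generative story above can be sketched in Python. This is a simplified, hypothetical rendering, not the authors' implementation: the names `generate_user_tweets`, `p_background`, and `p_nonflu` are illustrative, each topic is a toy `{word: probability}` dictionary, and the transition switch shown in the HFSTM diagram is omitted, so each tweet's state simply follows a first-order Markov chain.

```python
import random

STATES = ("S", "E", "I")


def sample_word(topic, rng):
    """Draw one word from a {word: probability} dictionary."""
    words, probs = zip(*topic.items())
    return rng.choices(words, weights=probs)[0]


def generate_user_tweets(num_tweets, words_per_tweet, init_prob, trans_prob,
                         p_background, p_nonflu, background, nonflu_topics,
                         state_topics, rng):
    """Sample (state, words) pairs for one user, HFSTM-style (simplified)."""
    tweets = []
    state = rng.choices(STATES, weights=init_prob)[0]
    for t in range(num_tweets):
        if t > 0:
            # Markov transition from the previous tweet's state
            # (the model's transition switch is not modeled here).
            state = rng.choices(STATES, weights=trans_prob[state])[0]
        words = []
        for _ in range(words_per_tweet):
            if rng.random() < p_background:    # background switch is on
                topic = background
            elif rng.random() < p_nonflu:      # non-flu switch is on
                topic = rng.choice(nonflu_topics)
            else:                              # word comes from the state topic
                topic = state_topics[state]
            words.append(sample_word(topic, rng))
        tweets.append((state, words))
    return tweets


# Toy parameters for illustration only.
rng = random.Random(42)
tweets = generate_user_tweets(
    num_tweets=5, words_per_tweet=4,
    init_prob=[0.8, 0.15, 0.05],
    trans_prob={"S": [0.8, 0.15, 0.05], "E": [0.2, 0.5, 0.3],
                "I": [0.3, 0.0, 0.7]},
    p_background=0.3, p_nonflu=0.4,
    background={"the": 0.5, "today": 0.5},
    nonflu_topics=[{"movie": 0.5, "restaurant": 0.5}, {"game": 1.0}],
    state_topics={"S": {"good": 1.0}, "E": {"freezing": 0.5, "cold": 0.5},
                  "I": {"flu": 0.6, "fever": 0.4}},
    rng=rng,
)
for state, words in tweets:
    print(state, " ".join(words))
```

Inference (below) runs this story in reverse: given only the words, it recovers the hidden state of each tweet.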

Inference: EM-based algorithm (HFSTM-FIT)
E-step, for each user u with tweet sequence $O_u = (O_1, \ldots, O_{T_u})$:
$A_t(i) = P(O_1, O_2, \ldots, O_t, S_t = i)$
$B_t(i) = P(O_{t+1}, \ldots, O_{T_u} \mid S_t = i)$
$\gamma_t(i) = P(S_t = i \mid O_u)$
M-step: learn the other parameters, such as the state transition probabilities and the topic distributions.
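The E-step quantities above are the standard forward-backward recursions for a hidden Markov model over one user's tweet sequence. The sketch below computes $A_t(i)$, $B_t(i)$, and $\gamma_t(i) = A_t(i) B_t(i) / \sum_j A_t(j) B_t(j)$, plus the pairwise posteriors used to re-estimate transition probabilities in the M-step. It assumes the per-tweet likelihoods $P(O_t \mid S_t = i)$ are already available as a matrix `emit`; in HFSTM these would come from the word and topic-switch model, which is not reproduced here. Function names are hypothetical.

```python
import numpy as np


def forward_backward(init_prob, trans_prob, emit):
    """Forward-backward pass for one user's tweet sequence.

    init_prob:  (K,)   initial state probabilities
    trans_prob: (K, K) trans_prob[i, j] = P(S_{t+1} = j | S_t = i)
    emit:       (T, K) emit[t, i] = P(O_t | S_t = i), assumed precomputed
    """
    T, K = emit.shape
    A = np.zeros((T, K))              # A[t, i] = P(O_1..O_t, S_t = i)
    B = np.ones((T, K))               # B[t, i] = P(O_{t+1}..O_T | S_t = i)
    A[0] = init_prob * emit[0]
    for t in range(1, T):
        A[t] = (A[t - 1] @ trans_prob) * emit[t]
    for t in range(T - 2, -1, -1):
        B[t] = trans_prob @ (emit[t + 1] * B[t + 1])
    likelihood = A[-1].sum()          # P(O_u)
    gamma = A * B / likelihood        # gamma[t, i] = P(S_t = i | O_u)
    # xi[t, i, j] = P(S_t = i, S_{t+1} = j | O_u)
    xi = (A[:-1, :, None] * trans_prob[None, :, :]
          * (emit[1:] * B[1:])[:, None, :]) / likelihood
    return A, B, gamma, xi


def update_transitions(xis):
    """M-step for transitions: normalize expected transition counts."""
    counts = sum(xi.sum(axis=0) for xi in xis)
    return counts / counts.sum(axis=1, keepdims=True)


# Toy example with states ordered (S, E, I); numbers are illustrative only.
init = np.array([0.8, 0.15, 0.05])
trans = np.array([[0.8, 0.15, 0.05],
                  [0.2, 0.5, 0.3],
                  [0.3, 0.0, 0.7]])
emit = np.array([[0.9, 0.1, 0.05],
                 [0.2, 0.6, 0.3],
                 [0.05, 0.3, 0.9],
                 [0.05, 0.2, 0.9]])
A, B, gamma, xi = forward_backward(init, trans, emit)
new_trans = update_transitions([xi])
print(np.round(gamma, 3))
```

For long sequences, a practical implementation would rescale `A` and `B` at each step or work in log space to avoid numerical underflow.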

Vocabulary & Dataset
Vocabulary (230 words): the flu-related keyword list from Chakraborty (SDM 2014), plus an extra list of state-related keywords.
Dataset (34,000 tweets): identify infected users and collect their tweets. Train on data from Jun 20, 2013 to Aug 06, 2013. Test on two time periods: (1) Dec 01, … to July 08, 2013; (2) Nov 10, 2013 to Jan 26, …

Learned word distributions
The most probable words learned in each state: S (probably healthy), E (having symptoms), I (definitely sick).
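Reading off the most probable words in each state amounts to sorting each row of a learned state-word distribution. A minimal sketch, with a hypothetical helper `top_words` and toy probabilities (not the distributions learned in the talk):

```python
import numpy as np


def top_words(word_dist, vocab, k=3):
    """Return the k most probable words for each state.

    word_dist: (num_states, vocab_size) array; row s holds P(w | state s).
    """
    order = np.argsort(-word_dist, axis=1)[:, :k]   # descending by probability
    return [[vocab[j] for j in row] for row in order]


vocab = ["good", "cold", "headache", "flu", "fever", "sore"]
word_dist = np.array([
    [0.60, 0.20, 0.08, 0.05, 0.04, 0.03],   # S (toy values)
    [0.05, 0.40, 0.30, 0.05, 0.05, 0.15],   # E
    [0.02, 0.08, 0.12, 0.40, 0.30, 0.08],   # I
])
for state, words in zip("SEI", top_words(word_dist, vocab)):
    print(state, words)
```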

Learned state transition
[Figure: transition probabilities learned by HFSTM, alongside transitions in real tweets; some tweets are not directly flu-related, yet their states are correctly identified.]

Flu trend fitting
Ground truth: case counts from the Pan American Health Organization (PAHO).
Algorithms:
Baseline: count the number of keywords weekly, use the counts as features, and regress them to the ground-truth curve.
Google Flu Trends: take the Google Flu Trends data as input and regress it to the PAHO curve.
HFSTM: distinguish the different states of keywords and use only the number of keywords in the I state; again, regress to PAHO.
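The baseline and HFSTM variants differ only in which keyword mentions are counted before the regression. The sketch below assumes each tweet has already been reduced to a (week, inferred state, keyword count) triple and uses a single aggregated count as the regression feature; the talk's baseline may use several keyword features, and all data values here are toy numbers, not PAHO data. Helper names are hypothetical.

```python
import numpy as np


def weekly_keyword_counts(tweets, num_weeks, keep_state=None):
    """Sum keyword mentions per week.

    tweets: iterable of (week_index, inferred_state, num_keywords).
    keep_state=None counts every tweet (baseline); keep_state="I" keeps only
    tweets inferred to be in the infected state (HFSTM-style feature).
    """
    counts = np.zeros(num_weeks)
    for week, state, num_keywords in tweets:
        if keep_state is None or state == keep_state:
            counts[week] += num_keywords
    return counts


def fit_linear(feature, target):
    """Ordinary least squares target ~ a * feature + b; returns (a, b, rmse)."""
    X = np.column_stack([feature, np.ones(len(feature))])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rmse = float(np.sqrt(np.mean((target - X @ coef) ** 2)))
    return coef[0], coef[1], rmse


# Toy data (not from the paper): (week, inferred state, keyword mentions).
tweets = [(0, "S", 5), (0, "I", 1), (1, "S", 5), (1, "I", 3),
          (2, "E", 4), (2, "I", 6), (3, "S", 6), (3, "I", 2)]
paho_cases = np.array([5.0, 11.0, 20.0, 8.0])   # toy ground-truth case counts

baseline = fit_linear(weekly_keyword_counts(tweets, 4), paho_cases)
infected_only = fit_linear(
    weekly_keyword_counts(tweets, 4, keep_state="I"), paho_cases)
print("baseline RMSE:", round(baseline[2], 3))
print("I-state RMSE:", round(infected_only[2], 3))
```

The toy target was constructed to be linear in the I-state counts, so that fit is exact by design; it demonstrates the mechanics, not the paper's results. A Google Flu Trends variant would call `fit_linear` with the weekly GFT series as the feature instead.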

Flu trend fitting
[Figure: linear regression fits to the case counts reported by PAHO (the ground truth).]

Bridging the Ep. & So. Gap
1. Select a flu-related keyword.
2. Plot its number of mentions over time.
3. Identify the fall part of the curve.
4. Fit the fall part with exponential and power-law functions.
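The fitting steps above can be approximated by least squares in log space: an exponential is linear in $t$ after taking logs, and a power law is linear in $\log t$. This is a minimal sketch under those assumptions; the talk does not say whether the fits were done in log space or on raw counts, and the helper `fit_fall_part` is hypothetical. Time is measured from the start of the fall part, beginning at 1 so that $\log t$ is defined.

```python
import numpy as np


def fit_fall_part(t, y):
    """Fit exponential and power-law decay to the fall part of a curve.

    Both fits are ordinary least squares on log(y); the sum of squared
    errors (SSE) is also measured in log space.
    """
    t = np.asarray(t, dtype=float)
    log_y = np.log(np.asarray(y, dtype=float))

    def sse(pred):
        return float(np.sum((log_y - pred) ** 2))

    # Exponential: log y = log c - rate * t
    slope_e, icpt_e = np.polyfit(t, log_y, 1)
    # Power law: log y = log c - exponent * log t
    slope_p, icpt_p = np.polyfit(np.log(t), log_y, 1)
    return {
        "exponential": {"rate": -slope_e, "sse": sse(icpt_e + slope_e * t)},
        "power_law": {"exponent": -slope_p,
                      "sse": sse(icpt_p + slope_p * np.log(t))},
    }


# Synthetic mention counts that decay as a power law (toy data).
t = np.arange(1, 31)
mentions = 500.0 * t ** -1.5
fits = fit_fall_part(t, mentions)
print(fits)
```

Comparing the two SSE values indicates which functional form better describes a keyword's decline.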

Bridging the Ep. & So. Gap
[Figure: fitting the fall part with power-law and exponential functions.]

Conclusions
HFSTM infers biological states for Twitter users, learns word distributions and state transitions, helps predict the flu trend, and reconciles the social-contagion activity profile with standard epidemiological models.

Future work
A possible issue: HFSTM may suffer from a large, noisy vocabulary. Semi-supervision for improvement: introduce weak supervision into HFSTM.

Questions?
Code at:
B. Aditya Prakash, Liangzhe Chen, Naren Ramakrishnan, K. S. M. Tozammel Hossain, Patrick Butler
Funding: