1
ICDM, Shenzhen, 2014
Flu Gone Viral: Syndromic Surveillance of Flu on Twitter using Temporal Topic Models
Liangzhe Chen, K. S. M. Tozammel Hossain, Patrick Butler, Naren Ramakrishnan, B. Aditya Prakash
Computer Science at Virginia Tech
2
Introduction: Surveillance
How to estimate and predict flu trends?
Population survey
Hospital record
Lab survey
Surveillance report
3
Introduction: GFT (Google Flu Trends) & Twitter
Estimate flu trends using online electronic sources.
Example tweets:
"So cold today, I’m catching cold."
"I have headache, sore throat, I can’t go to school today."
"My nose is totally congested, I have a hard time understanding what I’m saying."
4
Outline: Observations, HFSTM Model, Inference, Experiments, Conclusion, Future work
5
Observation 1: States
There are different states in an infection cycle.
SEIR model:
1. Susceptible
2. Exposed
3. Infected
4. Recovered
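As a reminder of what the four SEIR states mean dynamically, here is a minimal sketch of the standard textbook SEIR equations integrated with Euler steps. It is not part of the talk: the function name `simulate_seir` and the rate values (`beta`, `sigma`, `gamma`) are illustrative assumptions, and the compartments are expressed as population fractions.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, steps, dt=1.0):
    """Euler integration of the textbook SEIR model on population fractions.

    beta:  contact (infection) rate, S -> E   (assumed value, not from the talk)
    sigma: incubation rate,          E -> I   (assumed value)
    gamma: recovery rate,            I -> R   (assumed value)
    """
    s, e, i, r = s0, e0, i0, r0
    history = [(s, e, i, r)]
    for _ in range(steps):
        new_exposed = beta * s * i * dt     # susceptible people who get exposed
        new_infected = sigma * e * dt       # exposed people who become infected
        new_recovered = gamma * i * dt      # infected people who recover
        s -= new_exposed
        e += new_exposed - new_infected
        i += new_infected - new_recovered
        r += new_recovered
        history.append((s, e, i, r))
    return history


# Illustrative run: 1% of the population starts out infected.
history = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1,
                        s0=0.99, e0=0.0, i0=0.01, r0=0.0, steps=200)
```

Each step only moves mass between neighbouring compartments, so the four fractions always sum to one.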
6
Observation 2: Ep. & So. Gap (epidemiology vs. social media)
Infection cases drop exponentially in epidemiology (Hethcote 2000).
Keyword mentions drop in a power-law pattern in social media (Matsubara 2012).
8
HFSTM Model
Hidden Flu-State from Tweet Model (HFSTM)
Each word (w) in a tweet (O_i) can be generated by:
  a background topic
  non-flu-related topics
  state-related topics
Model diagram labels: binary background switch, binary non-flu-related switch, word distribution, latent state, initial probability, transition probability, transition switch
9
HFSTM Model: Generating tweets
Generate the state for a tweet.
Generate the topic for a word.
State: [S, E, I]
Topic: [Background, Non-flu, State]
Example tweets by state:
S: This restaurant is really good
E: The movie was good but it was freezing
I: I think I have flu
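The two generation steps above (a state per tweet, then a topic per word) can be sketched as a toy sampler. This is only an illustration under stated assumptions: the slides name the switches but do not give their exact order or semantics, so checking the background switch first and the non-flu switch second, and using the transition switch to decide whether the state may change between consecutive tweets, are guesses. All probabilities, word lists, and names (`generate_tweets`, `toy_params`) are made up.

```python
import random

STATES = ["S", "E", "I"]


def sample(dist, rng):
    """Draw one key from a {item: probability} dictionary."""
    items = list(dist)
    return rng.choices(items, weights=[dist[k] for k in items], k=1)[0]


def generate_tweets(n_tweets, words_per_tweet, params, rng):
    """Toy generative process: one latent state per tweet, one topic per word."""
    tweets = []
    state = sample(params["initial"], rng)
    for t in range(n_tweets):
        # Assumed role of the transition switch: allow a state change or not.
        if t > 0 and rng.random() < params["p_transit_switch"]:
            state = sample(params["transition"][state], rng)
        words = []
        for _ in range(words_per_tweet):
            # Assumed switch order: background first, then non-flu, else state.
            if rng.random() < params["p_background"]:
                topic = "background"
            elif rng.random() < params["p_non_flu"]:
                topic = "non_flu"
            else:
                topic = state
            words.append(sample(params["word_dist"][topic], rng))
        tweets.append((state, words))
    return tweets


# Hypothetical parameters, chosen only to make the sketch runnable.
toy_params = {
    "initial": {"S": 0.8, "E": 0.15, "I": 0.05},
    "p_transit_switch": 0.5,
    "transition": {
        "S": {"S": 0.7, "E": 0.3, "I": 0.0},
        "E": {"S": 0.2, "E": 0.4, "I": 0.4},
        "I": {"S": 0.3, "E": 0.0, "I": 0.7},
    },
    "p_background": 0.4,
    "p_non_flu": 0.5,
    "word_dist": {
        "background": {"the": 0.5, "is": 0.5},
        "non_flu": {"movie": 0.5, "restaurant": 0.5},
        "S": {"good": 1.0},
        "E": {"cold": 0.5, "freezing": 0.5},
        "I": {"flu": 0.6, "fever": 0.4},
    },
}

tweets = generate_tweets(5, 6, toy_params, random.Random(0))
```

Inference in the model runs this process in reverse: given only the words, recover the most likely state for each tweet.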
11
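Inference
EM-based algorithm: HFSTM-FIT
E-step:
$A_t(i) = P(O_1, O_2, \dots, O_t, S_t = i)$
$B_t(i) = P(O_{t+1}, \dots, O_{T_u} \mid S_t = i)$
$\gamma_t(i) = P(S_t = i \mid O_u)$
Here $O_u = (O_1, \dots, O_{T_u})$ appears to denote the tweet sequence of user $u$, and $S_t$ the latent state of tweet $t$.
M-step: other parameters such as state transition probabilities, topic distributions, etc.
Parameters learned: (not preserved in the transcript)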
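The E-step quantities have the same form as the forward, backward, and posterior variables of a standard hidden Markov model. The sketch below computes them for a plain HMM, assuming the per-tweet likelihoods P(O_t | S_t = i) are already available. It is not HFSTM-FIT itself, which also has to handle the topic switches. The function name and the toy numbers are illustrative.

```python
def forward_backward(initial, transition, emission_lik):
    """Unscaled forward-backward pass for a plain HMM.

    initial[i]         = P(S_1 = i)
    transition[i][j]   = P(S_{t+1} = j | S_t = i)
    emission_lik[t][i] = P(O_t | S_t = i), assumed to be precomputed
    Returns (A, B, gamma) matching A_t(i), B_t(i), gamma_t(i) on the slide.
    """
    T, K = len(emission_lik), len(initial)
    A = [[0.0] * K for _ in range(T)]
    B = [[1.0] * K for _ in range(T)]   # B_T(i) = 1 by convention

    # Forward pass: A_t(j) = P(O_t | j) * sum_i A_{t-1}(i) * P(j | i)
    for i in range(K):
        A[0][i] = initial[i] * emission_lik[0][i]
    for t in range(1, T):
        for j in range(K):
            A[t][j] = emission_lik[t][j] * sum(
                A[t - 1][i] * transition[i][j] for i in range(K))

    # Backward pass: B_t(i) = sum_j P(j | i) * P(O_{t+1} | j) * B_{t+1}(j)
    for t in range(T - 2, -1, -1):
        for i in range(K):
            B[t][i] = sum(
                transition[i][j] * emission_lik[t + 1][j] * B[t + 1][j]
                for j in range(K))

    # Posterior: gamma_t(i) = A_t(i) * B_t(i) / P(O_1, ..., O_T)
    likelihood = sum(A[T - 1])
    gamma = [[A[t][i] * B[t][i] / likelihood for i in range(K)]
             for t in range(T)]
    return A, B, gamma


# Toy example with two states and three observations (made-up numbers).
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.2, 0.8]]
emission_lik = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
A, B, gamma = forward_backward(initial, transition, emission_lik)
```

For long tweet sequences the unscaled products underflow, so a practical version would rescale each step or work in log space.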
13
Vocabulary & Dataset
Vocabulary (230 words):
  Flu-related keyword list by Chakraborty (SDM 2014)
  Extra state-related keyword list
Dataset (34,000 tweets):
  Identify infected users and collect their tweets
  Train on data from Jun 20, 2013 to Aug 06, 2013
  Test on two time periods:
    Dec 01, 2012 to July 08, 2013
    Nov 10, 2013 to Jan 26, 2014
14
Learned word distributions
The most probable words learned in each state:
S: probably healthy
E: having symptoms
I: definitely sick
15
Learned state transition
Panels: transition probabilities learned by HFSTM; transitions in real tweets
Annotation: not directly flu-related, yet correctly identified
16
Flu trend fitting
Ground truth: The Pan American Health Organization (PAHO)
Algorithms:
Baseline: count the number of keywords weekly as features, and regress to the ground-truth curve.
Google Flu Trends: take the Google Flu Trends data as input, and regress to the PAHO curve.
HFSTM: distinguish the different states of keywords, and use only the number of keywords in the I state. Again, regress to the PAHO curve.
17
Flu trend fitting
Linear regression to the case count reported by PAHO (the ground truth)
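The regression step can be illustrated with a one-feature ordinary least squares fit, for instance from weekly I-state keyword counts to reported case counts. This is a minimal sketch and not the authors' pipeline: the baseline on the slide uses several keyword counts as features, and every number below is a synthetic placeholder rather than PAHO or Twitter data.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic placeholders: weekly keyword counts and weekly case counts.
weekly_i_state_counts = [12, 30, 55, 80, 60, 25]
reported_cases = [30, 65, 118, 170, 125, 55]

slope, intercept = fit_line(weekly_i_state_counts, reported_cases)
fitted_cases = [slope * c + intercept for c in weekly_i_state_counts]
residuals = [obs - fit for obs, fit in zip(reported_cases, fitted_cases)]
```

Comparing the residuals of fits built from different inputs (raw keyword counts, Google Flu Trends, I-state counts) is one simple way to rank the three approaches.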
18
Bridging the Ep. & So. Gap
1. Select a flu-related keyword.
2. Plot its number of mentions over time.
3. Identify the falling part of the curve.
4. Fit the falling part with an exponential function and with a power law.
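The fitting step can be sketched by linearizing both candidate forms and applying least squares: an exponential y = a·exp(-b·t) is a straight line in (t, log y), while a power law y = a·t^(-b) is a straight line in (log t, log y). The slides do not say which fitting procedure was used, so log-space least squares is an assumption here, and the data below are synthetic.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_decay(times, counts):
    """Fit exponential and power-law decay in log space and compare errors.

    times must be positive (measured from the start of the fall) and
    counts must be positive, because both are passed through log().
    """
    log_y = [math.log(c) for c in counts]
    log_t = [math.log(t) for t in times]

    def log_sse(xs, slope, intercept):
        # Sum of squared errors between the fitted line and log(counts).
        return sum((ly - (slope * x + intercept)) ** 2
                   for x, ly in zip(xs, log_y))

    exp_slope, exp_icpt = fit_line(times, log_y)   # log y = log a - b * t
    pow_slope, pow_icpt = fit_line(log_t, log_y)   # log y = log a - b * log t
    return {
        "exponential": {"rate": -exp_slope,
                        "log_sse": log_sse(times, exp_slope, exp_icpt)},
        "power_law": {"exponent": -pow_slope,
                      "log_sse": log_sse(log_t, pow_slope, pow_icpt)},
    }


# Synthetic falling curve that follows a power law with exponent 1.5.
times = list(range(1, 21))
counts = [100.0 * t ** -1.5 for t in times]
result = fit_decay(times, counts)
```

On this synthetic input the power-law fit has the smaller log-space error by construction. On real mention counts, the same comparison indicates which decay form describes the falling part better.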
19
Bridging the Ep. & So. Gap
Fitting the falling part with power-law and exponential functions
21
Conclusions
HFSTM:
  infers biological states for Twitter users.
  learns word distributions and state transitions.
  helps predict the flu trend.
  reconciles the social contagion activity profile with standard epidemiological models.
23
Future work
A possible issue with HFSTM: it suffers from a large, noisy vocabulary.
Semi-supervision for improvement: introduce weak supervision into HFSTM.
24
Questions?
Code at: http://people.cs.vt.edu/~liangzhe
B. Aditya Prakash, Liangzhe Chen, Naren Ramakrishnan, K. S. M. Tozammel Hossain, Patrick Butler
Funding: (not preserved in the transcript)