Download presentation
Presentation is loading. Please wait.
Published byStanley Chaddock Modified over 10 years ago
1
FUNNEL: Automatic Mining of Spatially Coevolving Epidemics Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Willem G. van Panhuis (University of Pittsburgh) Christos Faloutsos (CMU) SIGKDD 2014Y. Matsubara et al.1
2
Motivation Given: Large set of epidemiological data Y. Matsubara et al.2SIGKDD 2014 e.g., Measles cases in the U.S. Linear (Weekly)
3
Motivation Given: Large set of epidemiological data Y. Matsubara et al.3SIGKDD 2014 e.g., Measles cases in the U.S. Linear Yearly periodicity (Weekly)
4
Motivation Given: Large set of epidemiological data Y. Matsubara et al.4SIGKDD 2014 e.g., Measles cases in the U.S. Linear Yearly periodicity Vaccine effect (Weekly)
5
Motivation Given: Large set of epidemiological data Y. Matsubara et al.5SIGKDD 2014 e.g., Measles cases in the U.S. Linear Yearly periodicity Shocks, e.g., 1941 Vaccine effect (Weekly)
6
Motivation Given: Large set of epidemiological data Y. Matsubara et al.6SIGKDD 2014 e.g., Measles cases in the U.S. Linear Goal: summarize all the epidemic time-series, “fully-automatically”
7
Data description Project Tycho: infectious diseases in the U.S. Y. Matsubara et al.7SIGKDD 2014 50 states 56 diseases Time (weekly) 1888 (> 125 years) X
8
50 states 56 diseases Time (weekly) 1888 (> 125 years) Data description Project Tycho: infectious diseases in the U.S. Y. Matsubara et al.8SIGKDD 2014 Element x : # of cases e.g., ‘measles’, ‘NY’, ‘April 1-7, 1931’, ‘4000’ X x
9
Problem definition Given: Tensor X (disease x state x time) Y. Matsubara et al.9SIGKDD 2014 X
10
Problem definition Given: Tensor X (disease x state x time) Find: Compact description of X, “automatically” Y. Matsubara et al.10SIGKDD 2014 XX
11
Problem definition Given: Tensor X (disease x state x time) Find: Compact description of X, “automatically” Y. Matsubara et al.11SIGKDD 2014 XX Discontinuities
12
Problem definition Given: Tensor X (disease x state x time) Find: Compact description of X, “automatically” Y. Matsubara et al.12SIGKDD 2014 XX
13
Roadmap SIGKDD 201413Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔
14
Modeling power of FUNNEL Y. Matsubara et al.14SIGKDD 2014 Questions about epidemics Q1 Q2 Q3 Q4 Q5 X
15
X Questions Y. Matsubara et al.15SIGKDD 2014 Are there any periodicities? If yes, when is the peak season? Q1 Q2 Q3 Q4 Q5 Q1
16
Answers Y. Matsubara et al.16SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201416Y. Matsubara et al. Seasonality P1 Angle: peak season Radius: seasonality strength FUNNEL: Polar plot
17
Answers Y. Matsubara et al.17SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201417Y. Matsubara et al. Seasonality P1 Questions ?
18
Answers Y. Matsubara et al.18SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201418Y. Matsubara et al. Seasonality P1 Questions ? Q: Does Influenza have seasonality? If yes, when?
19
Answers Y. Matsubara et al.19SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201419Y. Matsubara et al. Seasonality P1 Questions ?
20
Answers Y. Matsubara et al.20SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201420Y. Matsubara et al. Seasonality P1 Influenza in Feb. Detected by FUNNEL (strong seasonality) Detected!
21
Answers Y. Matsubara et al.21SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201421Y. Matsubara et al. Seasonality P1 Questions ?
22
Answers Y. Matsubara et al.22SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201422Y. Matsubara et al. Seasonality P1 Questions ? Q: How about measles ?
23
Answers Y. Matsubara et al.23SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201423Y. Matsubara et al. Seasonality P1 Questions ?
24
Answers Y. Matsubara et al.24SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201424Y. Matsubara et al. Seasonality P1 Measles (children’s) in spring Detected!
25
Answers Y. Matsubara et al.25SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201425Y. Matsubara et al. Seasonality P1 Questions ?
26
Answers Y. Matsubara et al.26SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201426Y. Matsubara et al. Seasonality P1 Questions ? Q: Which disease peaks in summer?
27
Answers Y. Matsubara et al.27SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201427Y. Matsubara et al. Seasonality P1 Questions ?
28
Answers Y. Matsubara et al.28SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201428Y. Matsubara et al. Seasonality P1 Detected! Lyme-disease (tick-borne) in summer
29
Answers Y. Matsubara et al.29SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201429Y. Matsubara et al. Seasonality P1 Questions ?
30
Answers Y. Matsubara et al.30SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201430Y. Matsubara et al. Seasonality P1 Questions ? Q: Which disease has no periodicity?
31
Answers Y. Matsubara et al.31SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201431Y. Matsubara et al. Seasonality P1 Questions ?
32
Answers Y. Matsubara et al.32SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD 201432Y. Matsubara et al. Seasonality P1 Detected! Gonorrhea (STD) no periodicity
33
Questions Y. Matsubara et al.33SIGKDD 2014 Can we see any discontinuities? Q1 Q2 Q3 Q4 Q5 Q2 X
34
Answers Y. Matsubara et al.34SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 Disease reduction effect P2 Measles 1965: Detected by FUNNEL 1963: Vaccine licensure Detected!
35
Questions Y. Matsubara et al.35SIGKDD 2014 Q3 What’s the difference between measles in NY and in FL? Q1 Q2 Q3 Q4 Q5
36
Answers Y. Matsubara et al.36SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 area sensitivity P3 FUNNEL’s guess of susceptibles ( measles ) CA TX NY, PA (more children) FL (fewer children) Detected!
37
X Questions Y. Matsubara et al.37SIGKDD 2014 Q4 Are there any external shock events, like wars? Q1 Q2 Q3 Q4 Q5
38
Answers Y. Matsubara et al.38SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 external shock events P4 Funnel can detect external shocks “fully-automatically” ! World war II Scarlet fever Detected by FUNNEL Detected!
39
X Questions Y. Matsubara et al.39SIGKDD 2014 Q5 How can we remove mistakes and incorrect values? Q1 Q2 Q3 Q4 Q5 TYPO!
40
Answers Y. Matsubara et al.40SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 mistakes P5 It can also detect typos, “automatically” !! Missing values Typhoid fever cases Detected! Mistake
41
Modeling power of FUNNEL Our model can capture 5 properties Y. Matsubara et al.41SIGKDD 2014 P1 P2 P3 P4 P5 Seasonality Disease reductions Area sensitivity External events Mistakes
42
Roadmap SIGKDD 201442Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔
43
Problem definition Given: Tensor X (disease x state x time) Find: Compact description of X, “automatically” Y. Matsubara et al.43SIGKDD 2014 XX
44
Two main ideas Idea #1: Grey-box model Idea #2: MDL for fitting SIGKDD 2014Y. Matsubara et al.44 X NO magic numbers ! (parameter-free)
45
Two main ideas Idea #1: Grey-box model - domain knowledge SIGKDD 2014Y. Matsubara et al.45 X (SIRS+) : 6 parameters Shocks Vaccine
46
Two main ideas Idea #2: Fitting with MDL -> parameter free! SIGKDD 2014Y. Matsubara et al.46 X
47
Roadmap SIGKDD 201447Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔
48
Proposed model: FUNNEL Y. Matsubara et al.48SIGKDD 2014 states diseases Time single epidemic Multi-evolving epidemics (a) FUNNEL-single (b) FUNNEL-full X
49
Proposed model: FUNNEL Y. Matsubara et al.49SIGKDD 2014 states diseases Time single epidemic Multi-evolving epidemics (a) FUNNEL-single (b) FUNNEL-full X
50
FUNNEL – with a single epidemic Given: “single” epidemic sequence Find: nonlinear equation, model parameters Y. Matsubara et al.50SIGKDD 2014 e.g., measles in NY FUNNEL
51
FUNNEL – with a single epidemic With a single epidemic: Funnel-RE Y. Matsubara et al.51SIGKDD 2014 S(t) I(t) V(t) Linear Log I(t) People of 3 classes S: Susceptible I : Infected V : Vigilant/ vaccinated Details
52
FUNNEL – with a single epidemic With a single epidemic: Funnel-RE Y. Matsubara et al.52SIGKDD 2014 S(t) : susceptible I (t) : Infected V(t) : Vigilant /Vaccinated S V I Details
53
FUNNEL – with a single epidemic With a single epidemic: Funnel-RE Y. Matsubara et al.53SIGKDD 2014 S V I Details : strength of infection (yearly periodic func)
54
FUNNEL – with a single epidemic With a single epidemic: Funnel-RE Y. Matsubara et al.54SIGKDD 2014 S V I Details : healing rate : disease reduction effect
55
FUNNEL – with a single epidemic With a single epidemic: Funnel-RE Y. Matsubara et al.55SIGKDD 2014 S V I Details : temporal susceptible rate
56
Proposed model: FUNNEL Y. Matsubara et al.56SIGKDD 2014 states diseases Time single epidemic Multi-evolving epidemics (a) FUNNEL-single (b) FUNNEL-full X
57
states diseases time X (P1, P2): global/country (P3): local/state M (P4, P5): extra - E : shocks & M : mistakes E Proposed model: FUNNEL-full Y. Matsubara et al.57 P1 P2 P3 P4 P5 SIGKDD 2014
58
states diseases time X (P1, P2): global/country Proposed model: FUNNEL-full Y. Matsubara et al.58 P1 P2 Details Global P1 P2 Base matrix (d x 6) Disease reduction matrix (d x 2) SIGKDD 2014
59
states diseases time X (P3): local/state Proposed model: FUNNEL-full Y. Matsubara et al.59 P3 Details Local P3 Geo-disease matrix (d x l) : potential population of disease i in state j SIGKDD 2014
60
states diseases time X M (P4, P5): extra - E : shocks & M : mistakes E Proposed model: FUNNEL-full Y. Matsubara et al.60 P4 P5 Extra P4 P5 External shock tensor E Mistake tensor M Details SIGKDD 2014
61
Roadmap SIGKDD 201461Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔ ✔
62
Challenges Q1. How to automatically - find “external shocks” ? - ignore “mistakes” (i.e., typos) ? Q2. How to efficiently estimate model parameters ? SIGKDD 2014Y. Matsubara et al.62 M E X
63
Challenges Q1. How to automatically - find “external shocks” ? - ignore “mistakes” (i.e., typos) ? Idea (1) : Model description cost Q2. How to efficiently estimate model parameters ? Idea (2): Multi-layer optimization (linear) SIGKDD 2014Y. Matsubara et al.63 M E X
64
Roadmap SIGKDD 201464Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔ ✔ ✔
65
Roadmap SIGKDD 201465Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔ ✔ ✔ ✔
66
Roadmap SIGKDD 201466Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔ ✔ ✔ ✔
67
(a) FUNNEL at work - forecasting SIGKDD 201467Y. Matsubara et al. Forecasting future epidemics Train: 2/3 sequences Forecast: 1/3 following years ?
68
(a) FUNNEL at work - forecasting SIGKDD 201468Y. Matsubara et al. Forecasting future epidemics Train: 2/3 sequences Forecast: 1/3 following years Funnel can capture future epidemics (AR: fail)
69
(a) FUNNEL at work - forecasting SIGKDD 201469Y. Matsubara et al. Forecasting future epidemics Train: 2/3 sequences Forecast: 1/3 following years Funnel can capture future epidemics (AR: fail)
70
(b) Generality of FUNNEL Epidemics on computer networks SIGKDD 201470Y. Matsubara et al. Funnel is general: it fits computer virus very well! (10 years) Spread via email attachment Spread through corporate networks
71
Roadmap SIGKDD 201471Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔ ✔ ✔ ✔ ✔
72
Conclusions FUNNEL has the following advantages Sense-making Captures all essential aspects: Fully-automatic No training set Scalable It scales linearly General Real epidemics (+ computer virus) SIGKDD 2014Y. Matsubara et al.72 ✔ ✔ ✔ ✔ P1 P2 P3 P4 P5
73
Thank you! SIGKDD 201473Y. Matsubara et al. Data : http://www.tycho.pitt.edu/ Code : http://www.cs.kumamoto-u.ac.jp/~yasuko/software.html
74
FUNNEL: Automatic Mining of Spatially Coevolving Epidemics Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Willem G. van Panhuis (University of Pittsburgh) Christos Faloutsos (CMU) SIGKDD 2014Y. Matsubara et al.74
75
Proposed model: FUNNEL-full Y. Matsubara et al.75 Details P4P5 M E vs. What’s the difference?? External shock vs. Mistake P4 P5 giardiasis SIGKDD 2014
76
Proposed model: FUNNEL-full Y. Matsubara et al.76 Details P4P5 M E vs. What’s the difference?? External shock vs. Mistake P4 P5 independent influenced SIGKDD 2014
77
Proposed model: FUNNEL-full Y. Matsubara et al.77 Details E (e 1 ) (e 2 ) (e k ) P4 E= Disease matrix Time matrix State matrix SIGKDD 2014
78
Idea (1): Model description cost Q1. How should we - find “external shocks” ? - ignore “mistakes” (i.e., typos) ? Idea (1) : Model description cost Minimize coding cost find “optimal” # of externals/mistakes “automatically” SIGKDD 2014Y. Matsubara et al.78 M E
79
Idea (1): Model description cost Idea: Minimize encoding cost! SIGKDD 2014Y. Matsubara et al.79 Good compression Good description (# of | E |+| M |) Cost M ( F ) + Cost c ( X | F ) Model cost Coding cost min ( )
80
Idea (1): Model description cost Total cost of tensor X, given F SIGKDD 2014Y. Matsubara et al.80 Details
81
Idea (1): Model description cost Total cost of tensor X, given F SIGKDD 2014Y. Matsubara et al.81 Details Model description cost of F Coding cost of X given F Dimensions of X
82
Idea (2): Multi-layer optimization Q2. How to efficiently estimate model parameters ? Idea (2): Multi-layer optimization Find “optimal” solution w.r.t. – level parameters SIGKDD 2014Y. Matsubara et al.82 X Local Global
83
Idea (2): Multi-layer optimization Find “optimal” solution w.r.t. SIGKDD 2014Y. Matsubara et al.83 P1 P2 P3 P4P5 M E FUNNEL LocalGlobal Local Global
84
Idea (2): Multi-layer optimization Find “optimal” solution w.r.t. SIGKDD 2014Y. Matsubara et al.84 P1 P2 P3 P4P5 M E FUNNEL LocalGlobal Local Global P1 P2 P4P5 P4P5P3
85
Idea (2): Multi-layer optimization Multi-layer fitting algorithm SIGKDD 2014Y. Matsubara et al.85 Global P1 P2 P4P5 Local P4P5P3 Global fitting Local fitting X X
86
Experiments We answer the following questions… Q1. Sense-making Can it help us understand the given epidemics? Q2. Accuracy How well does it match the data? Q3. Scalability How does it scale in terms of computational time? SIGKDD 201486Y. Matsubara et al.
87
Q1. Sense-making Our preliminary observations: SIGKDD 201487Y. Matsubara et al. P1 P2 P3 P4 P5 yearly periodicity disease reduction effects area specificity and sensitivity external shock events mistakes, incorrect values
88
Q1. Sense-making Disease seasonality SIGKDD 201488Y. Matsubara et al. P1 Radius: seasonality strength Angle: peak season
89
SIGKDD 201489Y. Matsubara et al. Q1. Sense-making Disease seasonality P1 Influenza in Feb. Children’s in spring Gonorrhea no periodicity Tick-borne in summer
90
SIGKDD 201490Y. Matsubara et al. Q1. Sense-making Disease reduction effect P2 Measles (vaccine licensure: 1963) 1967 1969 Linear Log
91
SIGKDD 201491Y. Matsubara et al. Q1. Sense-making area specificity and sensitivity P3 Potential population of susceptibles (measles) CA TX NY, PA FL (less children)
92
SIGKDD 201492Y. Matsubara et al. Q1. Sense-making area specificity and sensitivity P3 Measles in NY and PA Linear Log
93
SIGKDD 201493Y. Matsubara et al. Q1. Sense-making external shock events P4 We can detect external shocks “automatically” !! Linear Log
94
SIGKDD 201494Y. Matsubara et al. Q1. Sense-making mistakes, incorrect values P5 We can also detect typos “automatically” !! Linear Log Missing values Mistake
95
SIGKDD 201495Y. Matsubara et al. Q2. Accuracy Fitting accuracy for sequences (lower is better) Local Global LocalGlobal X X
96
SIGKDD 201496Y. Matsubara et al. Q3. Scalability Wall clock time vs. diseases, states, Time XXX FunnelFit is linear w.r.t. data size : O(dln)
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.