FUNNEL: Automatic Mining of Spatially Coevolving Epidemics Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Willem G. van Panhuis (University of Pittsburgh) Christos Faloutsos (CMU) SIGKDD 2014Y. Matsubara et al.1
Motivation Given: Large set of epidemiological data Y. Matsubara et al.2SIGKDD 2014 e.g., Measles cases in the U.S. Linear (Weekly)
Motivation Given: Large set of epidemiological data Y. Matsubara et al.3SIGKDD 2014 e.g., Measles cases in the U.S. Linear Yearly periodicity (Weekly)
Motivation Given: Large set of epidemiological data Y. Matsubara et al.4SIGKDD 2014 e.g., Measles cases in the U.S. Linear Yearly periodicity Vaccine effect (Weekly)
Motivation Given: Large set of epidemiological data Y. Matsubara et al.5SIGKDD 2014 e.g., Measles cases in the U.S. Linear Yearly periodicity Shocks, e.g., 1941 Vaccine effect (Weekly)
Motivation Given: Large set of epidemiological data Y. Matsubara et al.6SIGKDD 2014 e.g., Measles cases in the U.S. Linear Goal: summarize all the epidemic time-series, “fully-automatically”
Data description Project Tycho: infectious diseases in the U.S. Y. Matsubara et al.7SIGKDD states 56 diseases Time (weekly) 1888 (> 125 years) X
50 states 56 diseases Time (weekly) 1888 (> 125 years) Data description Project Tycho: infectious diseases in the U.S. Y. Matsubara et al.8SIGKDD 2014 Element x : # of cases e.g., ‘measles’, ‘NY’, ‘April 1-7, 1931’, ‘4000’ X x
Problem definition Given: Tensor X (disease x state x time) Y. Matsubara et al.9SIGKDD 2014 X
Problem definition Given: Tensor X (disease x state x time) Find: Compact description of X, “automatically” Y. Matsubara et al.10SIGKDD 2014 XX
Problem definition Given: Tensor X (disease x state x time) Find: Compact description of X, “automatically” Y. Matsubara et al.11SIGKDD 2014 XX Discontinuities
Problem definition Given: Tensor X (disease x state x time) Find: Compact description of X, “automatically” Y. Matsubara et al.12SIGKDD 2014 XX
Roadmap SIGKDD Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔
Modeling power of FUNNEL Y. Matsubara et al.14SIGKDD 2014 Questions about epidemics Q1 Q2 Q3 Q4 Q5 X
X Questions Y. Matsubara et al.15SIGKDD 2014 Are there any periodicities? If yes, when is the peak season? Q1 Q2 Q3 Q4 Q5 Q1
Answers Y. Matsubara et al.16SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Angle: peak season Radius: seasonality strength FUNNEL: Polar plot
Answers Y. Matsubara et al.17SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ?
Answers Y. Matsubara et al.18SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ? Q: Does Influenza have seasonality? If yes, when?
Answers Y. Matsubara et al.19SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ?
Answers Y. Matsubara et al.20SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Influenza in Feb. Detected by FUNNEL (strong seasonality) Detected!
Answers Y. Matsubara et al.21SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ?
Answers Y. Matsubara et al.22SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ? Q: How about measles ?
Answers Y. Matsubara et al.23SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ?
Answers Y. Matsubara et al.24SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Measles (children’s) in spring Detected!
Answers Y. Matsubara et al.25SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ?
Answers Y. Matsubara et al.26SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ? Q: Which disease peaks in summer?
Answers Y. Matsubara et al.27SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ?
Answers Y. Matsubara et al.28SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Detected! Lyme-disease (tick-borne) in summer
Answers Y. Matsubara et al.29SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ?
Answers Y. Matsubara et al.30SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ? Q: Which disease has no periodicity?
Answers Y. Matsubara et al.31SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ?
Answers Y. Matsubara et al.32SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Detected! Gonorrhea (STD) no periodicity
Questions Y. Matsubara et al.33SIGKDD 2014 Can we see any discontinuities? Q1 Q2 Q3 Q4 Q5 Q2 X
Answers Y. Matsubara et al.34SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 Disease reduction effect P2 Measles 1965: Detected by FUNNEL 1963: Vaccine licensure Detected!
Questions Y. Matsubara et al.35SIGKDD 2014 Q3 What’s the difference between measles in NY and in FL? Q1 Q2 Q3 Q4 Q5
Answers Y. Matsubara et al.36SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 area sensitivity P3 FUNNEL’s guess of susceptibles ( measles ) CA TX NY, PA (more children) FL (fewer children) Detected!
X Questions Y. Matsubara et al.37SIGKDD 2014 Q4 Are there any external shock events, like wars? Q1 Q2 Q3 Q4 Q5
Answers Y. Matsubara et al.38SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 external shock events P4 Funnel can detect external shocks “fully-automatically” ! World war II Scarlet fever Detected by FUNNEL Detected!
X Questions Y. Matsubara et al.39SIGKDD 2014 Q5 How can we remove mistakes and incorrect values? Q1 Q2 Q3 Q4 Q5 TYPO!
Answers Y. Matsubara et al.40SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 mistakes P5 It can also detect typos, “automatically” !! Missing values Typhoid fever cases Detected! Mistake
Modeling power of FUNNEL Our model can capture 5 properties Y. Matsubara et al.41SIGKDD 2014 P1 P2 P3 P4 P5 Seasonality Disease reductions Area sensitivity External events Mistakes
Roadmap SIGKDD Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔
Problem definition Given: Tensor X (disease x state x time) Find: Compact description of X, “automatically” Y. Matsubara et al.43SIGKDD 2014 XX
Two main ideas Idea #1: Grey-box model Idea #2: MDL for fitting SIGKDD 2014Y. Matsubara et al.44 X NO magic numbers ! (parameter-free)
Two main ideas Idea #1: Grey-box model - domain knowledge SIGKDD 2014Y. Matsubara et al.45 X (SIRS+) : 6 parameters Shocks Vaccine
Two main ideas Idea #2: Fitting with MDL -> parameter free! SIGKDD 2014Y. Matsubara et al.46 X
Roadmap SIGKDD Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔
Proposed model: FUNNEL Y. Matsubara et al.48SIGKDD 2014 states diseases Time single epidemic Multi-evolving epidemics (a) FUNNEL-single (b) FUNNEL-full X
Proposed model: FUNNEL Y. Matsubara et al.49SIGKDD 2014 states diseases Time single epidemic Multi-evolving epidemics (a) FUNNEL-single (b) FUNNEL-full X
FUNNEL – with a single epidemic Given: “single” epidemic sequence Find: nonlinear equation, model parameters Y. Matsubara et al.50SIGKDD 2014 e.g., measles in NY FUNNEL
FUNNEL – with a single epidemic With a single epidemic: Funnel-RE Y. Matsubara et al.51SIGKDD 2014 S(t) I(t) V(t) Linear Log I(t) People of 3 classes S: Susceptible I : Infected V : Vigilant/ vaccinated Details
FUNNEL – with a single epidemic With a single epidemic: Funnel-RE Y. Matsubara et al.52SIGKDD 2014 S(t) : susceptible I (t) : Infected V(t) : Vigilant /Vaccinated S V I Details
FUNNEL – with a single epidemic With a single epidemic: Funnel-RE Y. Matsubara et al.53SIGKDD 2014 S V I Details : strength of infection (yearly periodic func)
FUNNEL – with a single epidemic With a single epidemic: Funnel-RE Y. Matsubara et al.54SIGKDD 2014 S V I Details : healing rate : disease reduction effect
FUNNEL – with a single epidemic With a single epidemic: Funnel-RE Y. Matsubara et al.55SIGKDD 2014 S V I Details : temporal susceptible rate
Proposed model: FUNNEL Y. Matsubara et al.56SIGKDD 2014 states diseases Time single epidemic Multi-evolving epidemics (a) FUNNEL-single (b) FUNNEL-full X
states diseases time X (P1, P2): global/country (P3): local/state M (P4, P5): extra - E : shocks & M : mistakes E Proposed model: FUNNEL-full Y. Matsubara et al.57 P1 P2 P3 P4 P5 SIGKDD 2014
states diseases time X (P1, P2): global/country Proposed model: FUNNEL-full Y. Matsubara et al.58 P1 P2 Details Global P1 P2 Base matrix (d x 6) Disease reduction matrix (d x 2) SIGKDD 2014
states diseases time X (P3): local/state Proposed model: FUNNEL-full Y. Matsubara et al.59 P3 Details Local P3 Geo-disease matrix (d x l) : potential population of disease i in state j SIGKDD 2014
states diseases time X M (P4, P5): extra - E : shocks & M : mistakes E Proposed model: FUNNEL-full Y. Matsubara et al.60 P4 P5 Extra P4 P5 External shock tensor E Mistake tensor M Details SIGKDD 2014
Roadmap SIGKDD Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔ ✔
Challenges Q1. How to automatically - find “external shocks” ? - ignore “mistakes” (i.e., typos) ? Q2. How to efficiently estimate model parameters ? SIGKDD 2014Y. Matsubara et al.62 M E X
Challenges Q1. How to automatically - find “external shocks” ? - ignore “mistakes” (i.e., typos) ? Idea (1) : Model description cost Q2. How to efficiently estimate model parameters ? Idea (2): Multi-layer optimization (linear) SIGKDD 2014Y. Matsubara et al.63 M E X
Roadmap SIGKDD Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔ ✔ ✔
Roadmap SIGKDD Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔ ✔ ✔ ✔
Roadmap SIGKDD Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔ ✔ ✔ ✔
(a) FUNNEL at work - forecasting SIGKDD Y. Matsubara et al. Forecasting future epidemics Train: 2/3 sequences Forecast: 1/3 following years ?
(a) FUNNEL at work - forecasting SIGKDD Y. Matsubara et al. Forecasting future epidemics Train: 2/3 sequences Forecast: 1/3 following years Funnel can capture future epidemics (AR: fail)
(a) FUNNEL at work - forecasting SIGKDD Y. Matsubara et al. Forecasting future epidemics Train: 2/3 sequences Forecast: 1/3 following years Funnel can capture future epidemics (AR: fail)
(b) Generality of FUNNEL Epidemics on computer networks SIGKDD Y. Matsubara et al. Funnel is general: it fits computer virus very well! (10 years) Spread via attachment Spread through corporate networks
Roadmap SIGKDD Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔ ✔ ✔ ✔ ✔
Conclusions FUNNEL has the following advantages Sense-making Captures all essential aspects: Fully-automatic No training set Scalable It scales linearly General Real epidemics (+ computer virus) SIGKDD 2014Y. Matsubara et al.72 ✔ ✔ ✔ ✔ P1 P2 P3 P4 P5
Thank you! SIGKDD Y. Matsubara et al. Data : Code :
FUNNEL: Automatic Mining of Spatially Coevolving Epidemics Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Willem G. van Panhuis (University of Pittsburgh) Christos Faloutsos (CMU) SIGKDD 2014Y. Matsubara et al.74
Proposed model: FUNNEL-full Y. Matsubara et al.75 Details P4P5 M E vs. What’s the difference?? External shock vs. Mistake P4 P5 giardiasis SIGKDD 2014
Proposed model: FUNNEL-full Y. Matsubara et al.76 Details P4P5 M E vs. What’s the difference?? External shock vs. Mistake P4 P5 independent influenced SIGKDD 2014
Proposed model: FUNNEL-full Y. Matsubara et al.77 Details E (e 1 ) (e 2 ) (e k ) P4 E= Disease matrix Time matrix State matrix SIGKDD 2014
Idea (1): Model description cost Q1. How should we - find “external shocks” ? - ignore “mistakes” (i.e., typos) ? Idea (1) : Model description cost Minimize coding cost find “optimal” # of externals/mistakes “automatically” SIGKDD 2014Y. Matsubara et al.78 M E
Idea (1): Model description cost Idea: Minimize encoding cost! SIGKDD 2014Y. Matsubara et al.79 Good compression Good description (# of | E |+| M |) Cost M ( F ) + Cost c ( X | F ) Model cost Coding cost min ( )
Idea (1): Model description cost Total cost of tensor X, given F SIGKDD 2014Y. Matsubara et al.80 Details
Idea (1): Model description cost Total cost of tensor X, given F SIGKDD 2014Y. Matsubara et al.81 Details Model description cost of F Coding cost of X given F Dimensions of X
Idea (2): Multi-layer optimization Q2. How to efficiently estimate model parameters ? Idea (2): Multi-layer optimization Find “optimal” solution w.r.t. – level parameters SIGKDD 2014Y. Matsubara et al.82 X Local Global
Idea (2): Multi-layer optimization Find “optimal” solution w.r.t. SIGKDD 2014Y. Matsubara et al.83 P1 P2 P3 P4P5 M E FUNNEL LocalGlobal Local Global
Idea (2): Multi-layer optimization Find “optimal” solution w.r.t. SIGKDD 2014Y. Matsubara et al.84 P1 P2 P3 P4P5 M E FUNNEL LocalGlobal Local Global P1 P2 P4P5 P4P5P3
Idea (2): Multi-layer optimization Multi-layer fitting algorithm SIGKDD 2014Y. Matsubara et al.85 Global P1 P2 P4P5 Local P4P5P3 Global fitting Local fitting X X
Experiments We answer the following questions… Q1. Sense-making Can it help us understand the given epidemics? Q2. Accuracy How well does it match the data? Q3. Scalability How does it scale in terms of computational time? SIGKDD Y. Matsubara et al.
Q1. Sense-making Our preliminary observations: SIGKDD Y. Matsubara et al. P1 P2 P3 P4 P5 yearly periodicity disease reduction effects area specificity and sensitivity external shock events mistakes, incorrect values
Q1. Sense-making Disease seasonality SIGKDD Y. Matsubara et al. P1 Radius: seasonality strength Angle: peak season
SIGKDD Y. Matsubara et al. Q1. Sense-making Disease seasonality P1 Influenza in Feb. Children’s in spring Gonorrhea no periodicity Tick-borne in summer
SIGKDD Y. Matsubara et al. Q1. Sense-making Disease reduction effect P2 Measles (vaccine licensure: 1963) Linear Log
SIGKDD Y. Matsubara et al. Q1. Sense-making area specificity and sensitivity P3 Potential population of susceptibles (measles) CA TX NY, PA FL (less children)
SIGKDD Y. Matsubara et al. Q1. Sense-making area specificity and sensitivity P3 Measles in NY and PA Linear Log
SIGKDD Y. Matsubara et al. Q1. Sense-making external shock events P4 We can detect external shocks “automatically” !! Linear Log
SIGKDD Y. Matsubara et al. Q1. Sense-making mistakes, incorrect values P5 We can also detect typos “automatically” !! Linear Log Missing values Mistake
SIGKDD Y. Matsubara et al. Q2. Accuracy Fitting accuracy for sequences (lower is better) Local Global LocalGlobal X X
SIGKDD Y. Matsubara et al. Q3. Scalability Wall clock time vs. diseases, states, Time XXX FunnelFit is linear w.r.t. data size : O(dln)