FUNNEL: Automatic Mining of Spatially Coevolving Epidemics Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Willem G. van Panhuis (University of.

Slides:



Advertisements
Similar presentations
Bayesian Belief Propagation
Advertisements

Yasuko Matsubara (Kyoto University),
PARAMETER ESTIMATION FOR ODES USING A CROSS-ENTROPY APPROACH Wayne Enright Bo Wang University of Toronto.
School of Computer Science Carnegie Mellon University Duke University DeltaCon: A Principled Massive- Graph Similarity Function Danai Koutra Joshua T.
Packet Classification using Hierarchical Intelligent Cuttings
Deepayan ChakrabartiCIKM F4: Large Scale Automated Forecasting Using Fractals -Deepayan Chakrabarti -Christos Faloutsos.
Le Song Joint work with Mladen Kolar and Eric Xing KELLER: Estimating Time Evolving Interactions Between Genes.
Cost-effective Outbreak Detection in Networks Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie Glance.
Disease Dynamics in a Dynamic Social Network Claire Christensen 1, István Albert 3, Bryan Grenfell 2, and Réka Albert 1,2 Bryan Grenfell 2, and Réka Albert.
Data mining and statistical learning - lecture 6
Modeling the Ebola Outbreak in West Africa, 2014 August 11 th Update Bryan Lewis PhD, MPH Caitlin Rivers MPH, Stephen.
On the Spread of Viruses on the Internet Noam Berger Joint work with C. Borgs, J.T. Chayes and A. Saberi.
CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU.
WindMine: Fast and Effective Mining of Web-click Sequences SDM 2011Y. Sakurai et al.1 Yasushi Sakurai (NTT) Lei Li (Carnegie Mellon Univ.) Yasuko Matsubara.
1 Tutorials plan CS Tutorials plan. 2 Week by week W1+W2: Modularity Code generation via Rose Writing use cases Drawing use-cases Relationships.
Fully Automatic Cross-Associations Deepayan Chakrabarti (CMU) Spiros Papadimitriou (CMU) Dharmendra Modha (IBM) Christos Faloutsos (CMU and IBM)
Finding the best immunization strategy Yiping Chen Collaborators: Shlomo Havlin Gerald Paul Advisor: H.Eugene Stanley.
Modelling the control of epidemics by behavioural changes in response to awareness of disease Savi Maharaj (joint work with Adam Kleczkowski) University.
A Multiresolution Symbolic Representation of Time Series
Pandemic Influenza Preparedness Kentucky Department for Public Health Department for Public Health.
CMU SCS Big (graph) data analytics Christos Faloutsos CMU.
Modeling Information Diffusion in Networks with Unobserved Links Quang Duong Michael P. Wellman Satinder Singh Computer Science and Engineering University.
Kalman filtering techniques for parameter estimation Jared Barber Department of Mathematics, University of Pittsburgh Work with Ivan Yotov and Mark Tronzo.
Optimization for Operation of Power Systems with Performance Guarantee
General Tensor Discriminant Analysis and Gabor Features for Gait Recognition by D. Tao, X. Li, and J. Maybank, TPAMI 2007 Presented by Iulian Pruteanu.
Fast and Exact Monitoring of Co-evolving Data Streams Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Naonori Ueda (NTT) Masatoshi Yoshikawa (Kyoto.
RADAR STORM MOTION ESTIMATION AND BEYOND: A SPECTRAL ALGORITHM AND RADAR OBSERVATION BASED DYNAMIC MODEL Gang Xu* and V. Chandrasekar Colorado State University.
Annealing Paths for the Evaluation of Topic Models James Foulds Padhraic Smyth Department of Computer Science University of California, Irvine* *James.
Fan Guo 1, Chao Liu 2 and Yi-Min Wang 2 1 Carnegie Mellon University 2 Microsoft Research Feb 11, 2009.
V5 Epidemics on networks
Multiuser Detection (MUD) Combined with array signal processing in current wireless communication environments Wed. 박사 3학기 구 정 회.
Fast Mining and Forecasting of Complex Time-Stamped Events Yasuko Matsubara (Kyoto University), Yasushi Sakurai (NTT), Christos Faloutsos (CMU), Tomoharu.
CMU SCS Big (graph) data analytics Christos Faloutsos CMU.
AutoPlait: Automatic Mining of Co-evolving Time Sequences Yasuko Matsubara (Kumamoto University) Yasushi Sakurai (Kumamoto University) Christos Faloutsos.
Computer Vision Lab. SNU Young Ki Baik Nonlinear Dimensionality Reduction Approach (ISOMAP, LLE)
Showcase /06/2005 Towards Computational Epidemiology Using Stochastic Cellular Automata in Modeling Spread of Diseases Sangeeta Venkatachalam, Armin.
Efficient computation of Robust Low-Rank Matrix Approximations in the Presence of Missing Data using the L 1 Norm Anders Eriksson and Anton van den Hengel.
A Passive Approach to Sensor Network Localization Rahul Biswas and Sebastian Thrun International Conference on Intelligent Robots and Systems 2004 Presented.
Lei Li Computer Science Department Carnegie Mellon University Pre Proposal Time Series Learning completed work 11/27/2015.
Jaroslaw Kutylowski 1 HEINZ NIXDORF INSTITUTE University of Paderborn Algorithms and Complexity Robust Undetectable Interference Watermarks Ryszard Grząślewicz.
Class 23, 2001 CBCl/AI MIT Bioinformatics Applications and Feature Selection for SVMs S. Mukherjee.
Stream Monitoring under the Time Warping Distance Yasushi Sakurai (NTT Cyber Space Labs) Christos Faloutsos (Carnegie Mellon Univ.) Masashi Yamamuro (NTT.
A multiline LTE inversion using PCA Marian Martínez González.
DDM Kirk. LSST-VAO discussion: Distributed Data Mining (DDM) Kirk Borne George Mason University March 24, 2011.
L – Modelling and Simulating Social Systems with MATLAB © ETH Zürich | Lesson 3 – Dynamical Systems Anders Johansson and Wenjian.
GEO Meningitis Environmental Risk Consultative Meeting, sept 2007, WHO, Geneva Project : Long-term epidemiology of Meningococcal Meningitis in the.
Unsupervised Mining of Statistical Temporal Structures in Video Liu ze yuan May 15,2011.
Epidemiological Modeling of News and Rumors on Twitter Fang Jin, Edward Dougherty, Parang Saraf, Peng Mi, Yang Cao, Naren Ramakrishnan Virginia Tech Aug.
D YNA MM O : M INING AND S UMMARIZATION OF C OEVOLVING S EQUENCES WITH M ISSING V ALUES Lei Li joint work with Christos Faloutsos, James McCann, Nancy.
Facets: Fast Comprehensive Mining of Coevolving High-order Time Series Hanghang TongPing JiYongjie CaiWei FanQing He Joint Work by Presenter:Wei Fan.
Arizona State University1 Fast Mining of a Network of Coevolving Time Series Wei FanHanghang TongPing JiYongjie Cai.
Eawag: Swiss Federal Institute of Aquatic Science and Technology Analyzing input and structural uncertainty of deterministic models with stochastic, time-dependent.
Observability-Based Approach to Optimal Design of Networks Atiye Alaeddini Spring 2016.
Forecasting with Cyber-physical Interactions in Data Centers (part 3)
Supervised Time Series Pattern Discovery through Local Importance
Non-linear Mining of Competing Local Activities
Kabanikhin S. I., Krivorotko O. I.
INVERSE BUILDING MODELING
Unrolling: A principled method to develop deep neural networks
Pre Proposal Time Series Learning completed work
Kijung Shin1 Mohammad Hammoud1
Epidemiological Modeling to Guide Efficacy Study Design Evaluating Vaccines to Prevent Emerging Diseases An Vandebosch, PhD Joint Statistical meetings,
Graph and Tensor Mining for fun and profit
An Approach to Enhance Credibility of Decadal-Century Scale Arctic
Dimension reduction : PCA and Clustering
Cost-effective Outbreak Detection in Networks
Automatic Segmentation of Data Sequences
Large Graph Mining: Power Tools and a Practitioner’s guide
A. S. Weigend M. Mangeas A. N. Srivastava
Yingze Wang and Shi-Kuo Chang University of Pittsburgh
Presentation transcript:

FUNNEL: Automatic Mining of Spatially Coevolving Epidemics Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Willem G. van Panhuis (University of Pittsburgh) Christos Faloutsos (CMU) SIGKDD 2014Y. Matsubara et al.1

Motivation Given: Large set of epidemiological data Y. Matsubara et al.2SIGKDD 2014 e.g., Measles cases in the U.S. Linear (Weekly)

Motivation Given: Large set of epidemiological data Y. Matsubara et al.3SIGKDD 2014 e.g., Measles cases in the U.S. Linear Yearly periodicity (Weekly)

Motivation Given: Large set of epidemiological data Y. Matsubara et al.4SIGKDD 2014 e.g., Measles cases in the U.S. Linear Yearly periodicity Vaccine effect (Weekly)

Motivation Given: Large set of epidemiological data Y. Matsubara et al.5SIGKDD 2014 e.g., Measles cases in the U.S. Linear Yearly periodicity Shocks, e.g., 1941 Vaccine effect (Weekly)

Motivation Given: Large set of epidemiological data Y. Matsubara et al.6SIGKDD 2014 e.g., Measles cases in the U.S. Linear Goal: summarize all the epidemic time-series, “fully-automatically”

Data description Project Tycho: infectious diseases in the U.S. Y. Matsubara et al.7SIGKDD states 56 diseases Time (weekly) 1888 (> 125 years) X

50 states 56 diseases Time (weekly) 1888 (> 125 years) Data description Project Tycho: infectious diseases in the U.S. Y. Matsubara et al.8SIGKDD 2014 Element x : # of cases e.g., ‘measles’, ‘NY’, ‘April 1-7, 1931’, ‘4000’ X x

Problem definition Given: Tensor X (disease x state x time) Y. Matsubara et al.9SIGKDD 2014 X

Problem definition Given: Tensor X (disease x state x time) Find: Compact description of X, “automatically” Y. Matsubara et al.10SIGKDD 2014 XX

Problem definition Given: Tensor X (disease x state x time) Find: Compact description of X, “automatically” Y. Matsubara et al.11SIGKDD 2014 XX Discontinuities

Problem definition Given: Tensor X (disease x state x time) Find: Compact description of X, “automatically” Y. Matsubara et al.12SIGKDD 2014 XX

Roadmap SIGKDD Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔

Modeling power of FUNNEL Y. Matsubara et al.14SIGKDD 2014 Questions about epidemics Q1 Q2 Q3 Q4 Q5 X

X Questions Y. Matsubara et al.15SIGKDD 2014 Are there any periodicities? If yes, when is the peak season? Q1 Q2 Q3 Q4 Q5 Q1

Answers Y. Matsubara et al.16SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Angle: peak season Radius: seasonality strength FUNNEL: Polar plot

Answers Y. Matsubara et al.17SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ?

Answers Y. Matsubara et al.18SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ? Q: Does Influenza have seasonality? If yes, when?

Answers Y. Matsubara et al.19SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ?

Answers Y. Matsubara et al.20SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Influenza in Feb. Detected by FUNNEL (strong seasonality) Detected!

Answers Y. Matsubara et al.21SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ?

Answers Y. Matsubara et al.22SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ? Q: How about measles ?

Answers Y. Matsubara et al.23SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ?

Answers Y. Matsubara et al.24SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Measles (children’s) in spring Detected!

Answers Y. Matsubara et al.25SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ?

Answers Y. Matsubara et al.26SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ? Q: Which disease peaks in summer?

Answers Y. Matsubara et al.27SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ?

Answers Y. Matsubara et al.28SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Detected! Lyme-disease (tick-borne) in summer

Answers Y. Matsubara et al.29SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ?

Answers Y. Matsubara et al.30SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ? Q: Which disease has no periodicity?

Answers Y. Matsubara et al.31SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Questions ?

Answers Y. Matsubara et al.32SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 SIGKDD Y. Matsubara et al. Seasonality P1 Detected! Gonorrhea (STD) no periodicity

Questions Y. Matsubara et al.33SIGKDD 2014 Can we see any discontinuities? Q1 Q2 Q3 Q4 Q5 Q2 X

Answers Y. Matsubara et al.34SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 Disease reduction effect P2 Measles 1965: Detected by FUNNEL 1963: Vaccine licensure Detected!

Questions Y. Matsubara et al.35SIGKDD 2014 Q3 What’s the difference between measles in NY and in FL? Q1 Q2 Q3 Q4 Q5

Answers Y. Matsubara et al.36SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 area sensitivity P3 FUNNEL’s guess of susceptibles ( measles ) CA TX NY, PA (more children) FL (fewer children) Detected!

X Questions Y. Matsubara et al.37SIGKDD 2014 Q4 Are there any external shock events, like wars? Q1 Q2 Q3 Q4 Q5

Answers Y. Matsubara et al.38SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 external shock events P4 Funnel can detect external shocks “fully-automatically” ! World war II Scarlet fever Detected by FUNNEL Detected!

X Questions Y. Matsubara et al.39SIGKDD 2014 Q5 How can we remove mistakes and incorrect values? Q1 Q2 Q3 Q4 Q5 TYPO!

Answers Y. Matsubara et al.40SIGKDD 2014 Q1 Q2 Q3 Q4 Q5 mistakes P5 It can also detect typos, “automatically” !! Missing values Typhoid fever cases Detected! Mistake

Modeling power of FUNNEL Our model can capture 5 properties Y. Matsubara et al.41SIGKDD 2014 P1 P2 P3 P4 P5 Seasonality Disease reductions Area sensitivity External events Mistakes

Roadmap SIGKDD Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔

Problem definition Given: Tensor X (disease x state x time) Find: Compact description of X, “automatically” Y. Matsubara et al.43SIGKDD 2014 XX

Two main ideas Idea #1: Grey-box model Idea #2: MDL for fitting SIGKDD 2014Y. Matsubara et al.44 X NO magic numbers ! (parameter-free)

Two main ideas Idea #1: Grey-box model - domain knowledge SIGKDD 2014Y. Matsubara et al.45 X (SIRS+) : 6 parameters Shocks Vaccine

Two main ideas Idea #2: Fitting with MDL -> parameter free! SIGKDD 2014Y. Matsubara et al.46 X

Roadmap SIGKDD Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔

Proposed model: FUNNEL Y. Matsubara et al.48SIGKDD 2014 states diseases Time single epidemic Multi-evolving epidemics (a) FUNNEL-single (b) FUNNEL-full X

Proposed model: FUNNEL Y. Matsubara et al.49SIGKDD 2014 states diseases Time single epidemic Multi-evolving epidemics (a) FUNNEL-single (b) FUNNEL-full X

FUNNEL – with a single epidemic Given: “single” epidemic sequence Find: nonlinear equation, model parameters Y. Matsubara et al.50SIGKDD 2014 e.g., measles in NY FUNNEL

FUNNEL – with a single epidemic With a single epidemic: Funnel-RE Y. Matsubara et al.51SIGKDD 2014 S(t) I(t) V(t) Linear Log I(t) People of 3 classes S: Susceptible I : Infected V : Vigilant/ vaccinated Details

FUNNEL – with a single epidemic With a single epidemic: Funnel-RE Y. Matsubara et al.52SIGKDD 2014 S(t) : susceptible I (t) : Infected V(t) : Vigilant /Vaccinated S V I Details

FUNNEL – with a single epidemic With a single epidemic: Funnel-RE Y. Matsubara et al.53SIGKDD 2014 S V I Details : strength of infection (yearly periodic func)

FUNNEL – with a single epidemic With a single epidemic: Funnel-RE Y. Matsubara et al.54SIGKDD 2014 S V I Details : healing rate : disease reduction effect

FUNNEL – with a single epidemic With a single epidemic: Funnel-RE Y. Matsubara et al.55SIGKDD 2014 S V I Details : temporal susceptible rate

Proposed model: FUNNEL Y. Matsubara et al.56SIGKDD 2014 states diseases Time single epidemic Multi-evolving epidemics (a) FUNNEL-single (b) FUNNEL-full X

states diseases time X (P1, P2): global/country (P3): local/state M (P4, P5): extra - E : shocks & M : mistakes E Proposed model: FUNNEL-full Y. Matsubara et al.57 P1 P2 P3 P4 P5 SIGKDD 2014

states diseases time X (P1, P2): global/country Proposed model: FUNNEL-full Y. Matsubara et al.58 P1 P2 Details Global P1 P2 Base matrix (d x 6) Disease reduction matrix (d x 2) SIGKDD 2014

states diseases time X (P3): local/state Proposed model: FUNNEL-full Y. Matsubara et al.59 P3 Details Local P3 Geo-disease matrix (d x l) : potential population of disease i in state j SIGKDD 2014

states diseases time X M (P4, P5): extra - E : shocks & M : mistakes E Proposed model: FUNNEL-full Y. Matsubara et al.60 P4 P5 Extra P4 P5 External shock tensor E Mistake tensor M Details SIGKDD 2014

Roadmap SIGKDD Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔ ✔

Challenges Q1. How to automatically - find “external shocks” ? - ignore “mistakes” (i.e., typos) ? Q2. How to efficiently estimate model parameters ? SIGKDD 2014Y. Matsubara et al.62 M E X

Challenges Q1. How to automatically - find “external shocks” ? - ignore “mistakes” (i.e., typos) ? Idea (1) : Model description cost Q2. How to efficiently estimate model parameters ? Idea (2): Multi-layer optimization (linear) SIGKDD 2014Y. Matsubara et al.63 M E X

Roadmap SIGKDD Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔ ✔ ✔

Roadmap SIGKDD Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔ ✔ ✔ ✔

Roadmap SIGKDD Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔ ✔ ✔ ✔

(a) FUNNEL at work - forecasting SIGKDD Y. Matsubara et al. Forecasting future epidemics Train: 2/3 sequences Forecast: 1/3 following years ?

(a) FUNNEL at work - forecasting SIGKDD Y. Matsubara et al. Forecasting future epidemics Train: 2/3 sequences Forecast: 1/3 following years Funnel can capture future epidemics (AR: fail)

(a) FUNNEL at work - forecasting SIGKDD Y. Matsubara et al. Forecasting future epidemics Train: 2/3 sequences Forecast: 1/3 following years Funnel can capture future epidemics (AR: fail)

(b) Generality of FUNNEL Epidemics on computer networks SIGKDD Y. Matsubara et al. Funnel is general: it fits computer virus very well! (10 years) Spread via attachment Spread through corporate networks

Roadmap SIGKDD Y. Matsubara et al. -Motivation -Modeling power of FUNNEL -Overview – main ideas -Proposed model – idea #1 -Algorithm – idea #2 -Experiments -Discussion -Conclusions ✔ ✔ ✔ ✔ ✔ ✔ ✔

Conclusions FUNNEL has the following advantages Sense-making Captures all essential aspects: Fully-automatic No training set Scalable It scales linearly General Real epidemics (+ computer virus) SIGKDD 2014Y. Matsubara et al.72 ✔ ✔ ✔ ✔ P1 P2 P3 P4 P5

Thank you! SIGKDD Y. Matsubara et al. Data : Code :

FUNNEL: Automatic Mining of Spatially Coevolving Epidemics Yasuko Matsubara, Yasushi Sakurai (Kumamoto University) Willem G. van Panhuis (University of Pittsburgh) Christos Faloutsos (CMU) SIGKDD 2014Y. Matsubara et al.74

Proposed model: FUNNEL-full Y. Matsubara et al.75 Details P4P5 M E vs. What’s the difference?? External shock vs. Mistake P4 P5 giardiasis SIGKDD 2014

Proposed model: FUNNEL-full Y. Matsubara et al.76 Details P4P5 M E vs. What’s the difference?? External shock vs. Mistake P4 P5 independent influenced SIGKDD 2014

Proposed model: FUNNEL-full Y. Matsubara et al.77 Details E (e 1 ) (e 2 ) (e k ) P4 E= Disease matrix Time matrix State matrix SIGKDD 2014

Idea (1): Model description cost Q1. How should we - find “external shocks” ? - ignore “mistakes” (i.e., typos) ? Idea (1) : Model description cost Minimize coding cost find “optimal” # of externals/mistakes “automatically” SIGKDD 2014Y. Matsubara et al.78 M E

Idea (1): Model description cost Idea: Minimize encoding cost! SIGKDD 2014Y. Matsubara et al.79 Good compression Good description (# of | E |+| M |) Cost M ( F ) + Cost c ( X | F ) Model cost Coding cost min ( )

Idea (1): Model description cost Total cost of tensor X, given F SIGKDD 2014Y. Matsubara et al.80 Details

Idea (1): Model description cost Total cost of tensor X, given F SIGKDD 2014Y. Matsubara et al.81 Details Model description cost of F Coding cost of X given F Dimensions of X

Idea (2): Multi-layer optimization Q2. How to efficiently estimate model parameters ? Idea (2): Multi-layer optimization Find “optimal” solution w.r.t. – level parameters SIGKDD 2014Y. Matsubara et al.82 X Local Global

Idea (2): Multi-layer optimization Find “optimal” solution w.r.t. SIGKDD 2014Y. Matsubara et al.83 P1 P2 P3 P4P5 M E FUNNEL LocalGlobal Local Global

Idea (2): Multi-layer optimization Find “optimal” solution w.r.t. SIGKDD 2014Y. Matsubara et al.84 P1 P2 P3 P4P5 M E FUNNEL LocalGlobal Local Global P1 P2 P4P5 P4P5P3

Idea (2): Multi-layer optimization Multi-layer fitting algorithm SIGKDD 2014Y. Matsubara et al.85 Global P1 P2 P4P5 Local P4P5P3 Global fitting Local fitting X X

Experiments We answer the following questions… Q1. Sense-making Can it help us understand the given epidemics? Q2. Accuracy How well does it match the data? Q3. Scalability How does it scale in terms of computational time? SIGKDD Y. Matsubara et al.

Q1. Sense-making Our preliminary observations: SIGKDD Y. Matsubara et al. P1 P2 P3 P4 P5 yearly periodicity disease reduction effects area specificity and sensitivity external shock events mistakes, incorrect values

Q1. Sense-making Disease seasonality SIGKDD Y. Matsubara et al. P1 Radius: seasonality strength Angle: peak season

SIGKDD Y. Matsubara et al. Q1. Sense-making Disease seasonality P1 Influenza in Feb. Children’s in spring Gonorrhea no periodicity Tick-borne in summer

SIGKDD Y. Matsubara et al. Q1. Sense-making Disease reduction effect P2 Measles (vaccine licensure: 1963) Linear Log

SIGKDD Y. Matsubara et al. Q1. Sense-making area specificity and sensitivity P3 Potential population of susceptibles (measles) CA TX NY, PA FL (less children)

SIGKDD Y. Matsubara et al. Q1. Sense-making area specificity and sensitivity P3 Measles in NY and PA Linear Log

SIGKDD Y. Matsubara et al. Q1. Sense-making external shock events P4 We can detect external shocks “automatically” !! Linear Log

SIGKDD Y. Matsubara et al. Q1. Sense-making mistakes, incorrect values P5 We can also detect typos “automatically” !! Linear Log Missing values Mistake

SIGKDD Y. Matsubara et al. Q2. Accuracy Fitting accuracy for sequences (lower is better) Local Global LocalGlobal X X

SIGKDD Y. Matsubara et al. Q3. Scalability Wall clock time vs. diseases, states, Time XXX FunnelFit is linear w.r.t. data size : O(dln)