© 2004 University of Pittsburgh Bayesian Biosurveillance Using Multiple Data Streams Greg Cooper, Weng-Keen Wong, Denver Dash*, John Levander, John Dowling,


© 2004 University of Pittsburgh
Bayesian Biosurveillance Using Multiple Data Streams
Greg Cooper, Weng-Keen Wong, Denver Dash*, John Levander, John Dowling, Bill Hogan, Mike Wagner
RODS Laboratory, University of Pittsburgh
* Intel Research, Santa Clara

Outline
1. Introduction
2. Model
3. Inference
4. Conclusions

Over-the-Counter (OTC) Data Being Collected by the National Retail Data Monitor (NRDM)
19,000 stores
50% market share nationally
>70% market share in large cities

ED Chief Complaint Data Being Collected by RODS
Chief Complaint ED Records for Allegheny County
Date / Time Admitted | Age | Gender | Home Zip | Work Zip | Chief Complaint
Nov 1, :  Male  15213  Shortness of breath
Nov 1, :  Female  Fever
: | : | : | : | : | :

Objective
Using the ED and OTC data streams, detect a disease outbreak in a given region as quickly and accurately as possible.

Our Approach
Population-wide ANomaly Detection and Assessment (PANDA)
A detection algorithm that models each individual in the population
Combines ED and OTC data streams
The current prototype focuses on detecting an outdoor aerosolized release of an anthrax-like agent in Allegheny County

PANDA
Uses a causal Bayesian network
Bayesian network: a graphical model representing the joint probability distribution of a set of random variables
[Figure: network over Location of Anthrax Release, Home Location of Person, Anthrax Infection of Person, and Visit of Person to ED]

PANDA
Uses a causal Bayesian network
The arrows convey conditional independence relationships among the variables. They also represent causal relationships.
[Figure: network over Location of Anthrax Release, Home Location of Person, Anthrax Infection of Person, and Visit of Person to ED]
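
As a rough illustration of how a causal Bayesian network of this kind yields a posterior, the sketch below runs inference by enumeration on a toy version of the four-node network. The arrow directions (release location and home location into infection, infection into ED visit), the zip codes, and every probability are assumptions made up for this example; none of them are PANDA parameters.

```python
# Toy causal network: release location + home location -> infection -> ED visit.
# All probabilities below are illustrative assumptions, not values from PANDA.
P_RELEASE = {"none": 0.98, "15213": 0.01, "15217": 0.01}  # prior on release location


def p_infected(release, home):
    """P(infection = True | release location, home location)."""
    if release == "none":
        return 0.0
    return 0.30 if release == home else 0.05


def p_ed_visit(infected):
    """P(ED visit = True | infection)."""
    return 0.60 if infected else 0.01


def posterior_release(home, visited):
    """P(release location | home zip, ED visit), summing out the infection node."""
    scores = {}
    for release, prior in P_RELEASE.items():
        total = 0.0
        for infected in (True, False):
            p_inf = p_infected(release, home)
            p_i = p_inf if infected else 1.0 - p_inf
            p_v = p_ed_visit(infected)
            p_v = p_v if visited else 1.0 - p_v
            total += p_i * p_v
        scores[release] = prior * total
    norm = sum(scores.values())
    return {release: score / norm for release, score in scores.items()}


post = posterior_release(home="15213", visited=True)
print(post)
```

With these made-up numbers, one ED visit from a resident of 15213 shifts probability away from "none" and toward a release in 15213, which is the kind of evidence accumulation a per-person model allows.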

Outline
1. Introduction
2. Model
3. Inference
4. Conclusions

A Schematic of the Generic PANDA Model for Non-Contagious Diseases
[Figure: Population Risk Factors, Population Disease Exposure (PDE), Population-Wide Evidence, and multiple Person Model subnetworks]

A Special Case of the Generic Model
Each person in the population is represented as a subnetwork in the overall model
[Figure: Anthrax Release, Location of Release, Time of Release, OTC Sales for Region, and multiple Person Model subnetworks]

The Person Model
[Figure: network nodes: Location of Release, Time Of Release, Anthrax Infection, Home Zip, Respiratory from Anthrax, Other ED Disease, Gender, Age Decile, Respiratory CC From Other, Respiratory CC, Respiratory CC When Admitted, ED Admit from Anthrax, ED Admit from Other, ED Acute Respiratory Infection, Acute Respiratory Infection, Daily OTC Purchase, Last 3 Days OTC Purchase, Non-ED Acute Respiratory Infection, ED Admission, OTC Sales for Region]

Why Use a Population-Based Approach?
1. Representational power
   Spatial, temporal, demographic, and symptom knowledge of potential diseases can be coherently represented in a single model
   Spatial, temporal, demographic, and symptom evidence can be combined to derive a posterior probability of a disease outbreak
2. Representational flexibility
   New types of knowledge and evidence can be readily incorporated into the model
Hypothesis: A population-based approach will achieve better detection performance than non-population-based approaches.


The Person Model
[Figure: the Person Model network, annotated with one example equivalence class]
Equivalence Class Example:
Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
20-30 | Male | 15213 | Yes | Today

Outline
1. Introduction
2. Model
3. Inference
4. Conclusions

Inference
Derive P(Anthrax Release = true | OTC Sales Data & ED Data)
[Figure: Anthrax Release, Location of Release, Time of Release, OTC Sales for Region, and multiple Person Model subnetworks]

Inference
AR = Anthrax Release
ED = ED Data
PDE = Population Disease Exposure
OTC = OTC Counts
Key term in deriving P(AR | OTC, ED):
P(OTC, ED | PDE) = P(OTC | ED, PDE) P(ED | PDE)
P(OTC | ED, PDE): contribution of OTC counts
P(ED | PDE): contribution of ED data
Details in: Cooper GF, Dash DH, Levander J, Wong W-K, Hogan W, Wagner M. Bayesian Biosurveillance of Disease Outbreaks. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2004.
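
One way the key term could enter the posterior is sketched below. It assumes that the data depend on AR only through PDE, which the slides suggest but do not state outright; the cited paper has the actual derivation. The first line is the chain-rule factorization from the slide, and the second applies Bayes' rule and sums over PDE.

```latex
\begin{align*}
P(OTC, ED \mid PDE) &= P(OTC \mid ED, PDE)\, P(ED \mid PDE) \\
P(AR \mid OTC, ED) &=
  \frac{P(AR) \sum_{PDE} P(OTC, ED \mid PDE)\, P(PDE \mid AR)}
       {\sum_{AR'} P(AR') \sum_{PDE} P(OTC, ED \mid PDE)\, P(PDE \mid AR')}
\end{align*}
```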

Inference
AR = Anthrax Release
ED = ED Data
PDE = Population Disease Exposure
OTC = OTC Counts
Key term in deriving P(AR | OTC, ED):
P(OTC, ED | PDE) = P(OTC | ED, PDE) P(ED | PDE)
The focus of the remainder of this talk: P(OTC | ED, PDE)


Incorporating the Counts of OTC Purchases
Approximate the binomial distribution with a normal distribution
[Figure: Zip1 OTC counts for Person1 through Person4, for Eq Class 1 and Eq Class 2, and for Zip1 overall]

The PANDA OTC Model
P(OTC sales = X | ED, PDE)
Recall that: P(OTC, ED | PDE) = P(OTC | ED, PDE) P(ED | PDE)



Example
Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
50-60 | Male | 15213 | Yes | Today
Equivalence Class 1 ~ Normal(100, 100)
Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
50-60 | Female | 15213 | Yes | Today
Equivalence Class 2 ~ Normal(150, 225)
If these were the only 2 equivalence classes in the county, then
County Cough & Cold OTC ~ Normal(250, 325)

Example
Now suppose 260 units are sold in the county.
P(OTC Sales = 260 | ED Data, PDE) = Normal(260; 250, 325) ≈ 0.019
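
The arithmetic in this example can be checked with a few lines of Python. The sketch assumes the two equivalence classes are independent, so their means and variances add, and it reads Normal(x; mean, variance) as the normal density evaluated at x. The helper name `normal_pdf` is ours.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


# (mean, variance) of each equivalence class, taken from the example slides.
classes = [(100.0, 100.0), (150.0, 225.0)]

# Sum of independent normals: means add and variances add.
county_mean = sum(mean for mean, _ in classes)
county_var = sum(var for _, var in classes)

# Density of observing 260 units sold in the county.
density = normal_pdf(260.0, county_mean, county_var)
print(county_mean, county_var, density)
```

The county-level distribution comes out as Normal(250, 325), and the density at 260 is roughly 0.019.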

Inference Timing
Machine: P4 3 Gigahertz, 2 GB RAM
Columns: Initialization Time (seconds), Each hour of data (seconds)
ED model  555
ED and OTC model  2295

A Current Limitation
Problem: Currently we assume, unrealistically, that a person only makes OTC purchases in his or her home zip code
Approach 1: Aggregate OTC counts (e.g., at the county level)
Approach 2: For each home zip code, model the distribution of zip codes where OTC purchases are made
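
Approach 2 could look something like the sketch below: expected purchases by home zip code are redistributed to store zip codes through a conditional distribution P(purchase zip | home zip). The zip codes, counts, and probabilities are hypothetical, and the function name is ours; the slides do not say how PANDA would implement this.

```python
from collections import defaultdict

# Hypothetical expected OTC purchases by residents of each home zip code.
expected_purchases_by_home_zip = {"15213": 100.0, "15217": 150.0}

# Hypothetical P(purchase zip | home zip); each inner dict sums to 1.
purchase_zip_given_home = {
    "15213": {"15213": 0.7, "15217": 0.3},
    "15217": {"15213": 0.2, "15217": 0.8},
}


def expected_sales_by_store_zip(purchases, dist):
    """Spread each home zip's expected purchases over the zips where they are made."""
    sales = defaultdict(float)
    for home_zip, count in purchases.items():
        for store_zip, prob in dist[home_zip].items():
            sales[store_zip] += count * prob
    return dict(sales)


sales = expected_sales_by_store_zip(
    expected_purchases_by_home_zip, purchase_zip_given_home
)
print(sales)
```

The county total is unchanged by the redistribution, which is why Approach 1 (aggregating to the county level) avoids the problem without needing this extra distribution.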

Outline
1. Introduction
2. Model
3. Inference
4. Conclusions

Challenges in Population-Wide Modeling Include ...
Obtaining good parameter estimates to use in modeling (e.g., the probability of an OTC cough medication purchase given an acute respiratory illness)
Modeling time and space in a way that is both useful and computationally tractable
Modeling contagious diseases

Conclusions
PANDA is a multivariate algorithm that can combine multiple data streams
Modeling each individual in the population is computationally feasible (so far)
An evaluation of the PANDA approach to modeling multiple data streams is in progress using semi-synthetic test data

Thank you
Current funding: National Science Foundation, Department of Homeland Security
Earlier funding: DARPA



The PANDA OTC Model
Model the OTC purchases for each equivalence class $E_i$ as a binomial distribution:
$E_i \sim \mathrm{Binomial}(N_{E_i}, P_{E_i})$
$N_{E_i}$: number of people in equivalence class $E_i$
$P_{E_i}$: probability of an OTC cough medication purchase during the previous 3 days by each person in equivalence class $E_i$


The PANDA OTC Model
Model the OTC purchases for each equivalence class $E_i$ as a binomial distribution.
Approximate the binomial distribution as a normal distribution:
$E_i \sim \mathrm{Binomial}(N_{E_i}, P_{E_i}) \approx \mathrm{Normal}(\mu_{E_i}, \sigma^2_{E_i})$
$\mu_{E_i} = N_{E_i} \times P_{E_i}$
$\sigma^2_{E_i} = N_{E_i} \times P_{E_i} \times (1 - P_{E_i})$
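
To see how close the normal approximation on this slide is, the sketch below compares the exact binomial probability with the approximating normal density for a single equivalence class. The class size (1000) and purchase probability (0.1) are hypothetical values chosen for the example, and the helper names are ours.

```python
import math


def binomial_pmf(k, n, p):
    """Exact P(K = k) for K ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


# Hypothetical equivalence class: 1000 people, each buying with probability 0.1.
n, p = 1000, 0.1

mu = n * p               # mean of the approximating normal
var = n * p * (1.0 - p)  # variance of the approximating normal

exact = binomial_pmf(100, n, p)
approx = normal_pdf(100.0, mu, var)
print(mu, var, exact, approx)
```

For a class this large the two values agree to about three decimal places. The approximation is weaker for very small classes or for purchase probabilities near 0 or 1.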

Computational Cost of a Population-Wide Approach?
~1.4 million people in Allegheny County, Pennsylvania

Equivalence Classes
The ~1.4M people in the modeled population can be partitioned into approximately 24,240 equivalence classes
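
The partitioning idea can be illustrated with a small sketch: people who share the same values for every modeled attribute fall into one equivalence class, so inference can work with class counts instead of individuals. The attribute tuple follows the columns of the equivalence class example in the talk; the records themselves, including the "Never" value, are made up.

```python
from collections import Counter

# Hypothetical person records:
# (age decile, gender, home zip, respiratory chief complaint, date admitted).
people = [
    ("20-30", "Male", "15213", "Yes", "Today"),
    ("20-30", "Male", "15213", "Yes", "Today"),
    ("50-60", "Female", "15213", "Yes", "Today"),
    ("50-60", "Male", "15217", "No", "Never"),
]

# People with identical attribute tuples belong to the same equivalence class.
class_sizes = Counter(people)

for attributes, size in class_sizes.items():
    print(attributes, size)
```

Here four people collapse into three classes. At county scale the same grouping is what reduces roughly 1.4 million individuals to about 24,240 classes.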