
What’s Strange About Recent Events (WSARE)
Weng-Keen Wong (University of Pittsburgh), Andrew Moore (Carnegie Mellon University), Gregory Cooper (University of Pittsburgh), Michael Wagner (University of Pittsburgh)
This work was funded by DARPA, the State of Pennsylvania, and NSF.

Motivation
Suppose we have real-time access to Emergency Department data from hospitals around a city (with patient confidentiality preserved):

Primary Key | Date   | Time  | Hospital | ICD9 | Prodrome    | Gender | Age | Home Location | Work Location | Many more…
100         | 6/1/03 | 9:12  | 1        | 781  | Fever       | M      | 20s | NE            | ?             | …
101         | 6/1/03 | 10:45 | 1        | 787  | Diarrhea    | F      | 40s | NE            |               | …
102         | 6/1/03 | 11:03 | 1        | 786  | Respiratory | F      | 60s | NE            | N             | …
103         | 6/1/03 | 11:07 | 2        | 787  | Diarrhea    | M      | 60s | E             | ?             | …
104         | 6/1/03 | 12:15 | 1        | 717  | Respiratory | M      | 60s | E             | NE            | …
105         | 6/1/03 | 13:01 | 3        | 780  | Viral       | F      | 50s | ?             | NW            | …
106         | 6/1/03 | 13:05 | 3        | 487  | Respiratory | F      | 40s | SW            |               | …
107         | 6/1/03 | 13:57 | 2        | 786  | Unmapped    | M      | 50s | SE            | SW            | …
108         | 6/1/03 | 14:22 | 1        | 780  | Viral       | M      | 40s | ?             | ?             | …
:           | :      | :     | :        | :    | :           | :      | :   | :             | :             | :

The Problem
From this data, can we detect if a disease outbreak is happening? (We are talking about non-specific disease detection.)
How early can we detect it?
The question we’re really asking: What’s strange about recent events?

Traditional Approaches
What about using traditional anomaly detection?
– It typically assumes the data is generated by a model
– It finds individual data points that have low probability with respect to this model
– These outliers have rare attributes or combinations of attributes
We need to identify anomalous patterns, not isolated data points.

Traditional Approaches
What about monitoring aggregate daily counts of certain attributes? We’ve now turned multivariate data into univariate data, and lots of algorithms have been developed for monitoring univariate data:
– Time series algorithms
– Regression techniques
– Statistical Quality Control methods
But we need to know a priori which attributes to form daily aggregates for!

Traditional Approaches What if we don’t know what attributes to monitor? What if we want to exploit the spatial, temporal and/or demographic characteristics of the epidemic to detect the outbreak as early as possible?

Traditional Approaches
We would need to build a univariate detector to monitor each interesting combination of attributes:
– Diarrhea cases among children
– Respiratory syndrome cases among females
– Viral syndrome cases involving senior citizens from the eastern part of the city
– Number of children from the downtown hospital
– Number of cases involving people working in the southern part of the city
– Number of cases involving teenage girls living in the western part of the city
– Botulinic syndrome cases
– And so on…
You would need hundreds of univariate detectors! Instead, we would like to identify the groups with the strangest behavior in recent events.

One Possible Approach
Idea: use association rules to find patterns in today’s records that weren’t there in past data.

Primary Key | Date     | Time  | Gender | Age | Hospital | Many more…
(Today’s Records)
100         | 8/24/03  | 9:12  | M      | 20s | 1        | …
101         | 8/24/03  | 10:45 | F      | 40s | 1        | …
:           | :        | :     | :      | :   | :        | :
(Yesterday’s Records)
2243        | 8/17/03  | 11:07 | M      | 60s | 2        | …
2244        | 8/17/03  | 12:15 | M      | 60s | 1        | …
:           | :        | :     | :      | :   | :        | :
(Last Year’s Records)
…           | …/24/02  | 13:05 | F      | 40s | 3        | …
…           | …/24/02  | 13:57 | M      | 50s | 2        | …
:           | :        | :     | :      | :   | :        | :

One Possible Approach
Recent records (from today):

Primary Key | Date    | Time  | Gender | Age    | …
100         | 8/24/03 | 9:12  | M      | Child  | …
101         | 8/24/03 | 10:45 | M      | Senior | …
:           | :       | :     | :      | :      | :

Baseline records (from 7 days ago):

Primary Key | Date    | Time  | Gender | Age    | …
2164        | 8/17/03 | 13:05 | F      | Senior | …
2165        | 8/17/03 | 13:57 | F      | Senior | …
:           | :       | :     | :      | :      | :

Combined, with a Source column:

Primary Key | Date    | Time  | … | Source
100         | 8/24/03 | 9:12  | … | Recent
101         | 8/24/03 | 10:45 | … | Recent
:           | :       | :     | : | :
2164        | 8/17/03 | 13:05 | … | Baseline
2165        | 8/17/03 | 13:57 | … | Baseline
:           | :       | :     | : | :

Find which rules predict unusually different proportions in recent records when compared to the baseline, e.g.:
52/200 records from “recent” have Gender = Male AND Age = Senior
90/180 records from “baseline” have Gender = Male AND Age = Senior

Which rules do we report?
– Search over all rules up to a maximum number of components
– For each rule, form a 2×2 contingency table, e.g.:

                     | Count Recent | Count Baseline
Home Location = NW   | 48           | 45
Home Location ≠ NW   | 86           | 220

– Perform Fisher’s Exact Test to get a p-value for each rule (call this the score)
– Report the rule with the lowest score
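To make this step concrete, here is a minimal Python sketch (assuming scipy is available). The rule enumeration and the attribute names in the example are illustrative simplifications, not the exact WSARE implementation.

```python
from itertools import combinations
from scipy.stats import fisher_exact

def candidate_rules(records, max_components=2):
    """Enumerate rules: conjunctions of up to max_components attribute=value pairs."""
    pairs = sorted({(a, r[a]) for r in records for a in r})
    for k in range(1, max_components + 1):
        yield from combinations(pairs, k)

def matches(record, rule):
    return all(record.get(a) == v for a, v in rule)

def best_rule(recent, baseline, max_components=2):
    """Score each rule with Fisher's Exact Test on its 2x2 table and
    return the rule with the lowest p-value (the 'score')."""
    best, best_p = None, 1.0
    for rule in candidate_rules(recent, max_components):
        c_r = sum(matches(r, rule) for r in recent)
        c_b = sum(matches(r, rule) for r in baseline)
        _, p = fisher_exact([[c_r, c_b],
                             [len(recent) - c_r, len(baseline) - c_b]])
        if p < best_p:
            best, best_p = rule, p
    return best, best_p

# Tiny made-up example: recent vs. baseline records over two attributes
recent = [{"Gender": "M", "Age": "Senior"}, {"Gender": "F", "Age": "Child"}] * 10
baseline = [{"Gender": "M", "Age": "Senior"}] * 15 + [{"Gender": "F", "Age": "Child"}] * 5
print(best_rule(recent, baseline))
```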

Problems with the Approach
1. Multiple Hypothesis Testing
2. A Changing Baseline

Problem #1: Multiple Hypothesis Testing
We can’t interpret the rule scores as p-values.
Suppose we reject the null hypothesis when score < α, where α = 0.05.
For a single hypothesis test, the probability of making a false discovery is α.
But suppose we do 1000 tests, one for each possible rule. The probability of at least one false discovery could be as bad as:
1 - (1 - 0.05)^1000 ≈ 1 >> 0.05

Randomization Test
Take the recent cases and the baseline cases. Shuffle the date field to produce a randomized dataset called DB_Rand.
Find the rule with the best score on DB_Rand.
[Illustration: the Date column for cases C2–C15, before and after shuffling]

Randomization Test
Repeat the procedure on the previous slide for 1000 iterations. Determine how many scores from the 1000 iterations are better than the original score.
If the original score placed in the top 1% of the 1000 scores from the randomization test, we would be impressed and an alert should be raised.
The corrected p-value of the rule is: (# better scores) / (# iterations)
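For illustration, a minimal sketch of this randomization test, assuming a `best_score(recent, baseline)` function that runs the rule search and returns the lowest p-value (for example, `lambda r, b: best_rule(r, b)[1]` using the earlier sketch):

```python
import random

def randomization_test(recent, baseline, best_score, n_iter=1000, seed=0):
    """Corrected p-value = fraction of shuffled datasets whose best rule
    scores at least as well as the best rule on the real data."""
    rng = random.Random(seed)
    original = best_score(recent, baseline)
    pooled = list(recent) + list(baseline)
    better = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)                      # shuffle the Recent/Baseline assignment
        fake_recent = pooled[:len(recent)]
        fake_baseline = pooled[len(recent):]
        if best_score(fake_recent, fake_baseline) <= original:
            better += 1
    return better / n_iter                       # corrected p-value
```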

Reporting Multiple Rules on each Day
But reporting only the best scoring rule can hide other more interesting anomalous patterns! For example:
1. The best scoring rule is statistically significant but not a public health concern
2. The top 5 scoring rules indicate anomalous patterns in 5 neighboring zip codes, but individually their p-values do not cause an alarm to be raised

Our Solution: FDR
The False Discovery Rate [Benjamini and Hochberg] can determine which of these p-values are significant.
Specifically, given an α_FDR, FDR guarantees that the expected fraction of false discoveries among the reported rules is at most α_FDR.
Given an α_FDR, FDR produces a threshold below which any p-values in the history are considered significant.

Our Solution: FDR Once we have the set of all possible rules and their scores, use FDR to determine which ones are significant
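For illustration, a minimal sketch of the Benjamini-Hochberg step-up procedure (the exact FDR variant used in WSARE is not specified here): sort the p-values, find the largest rank k with p_(k) <= k * α_FDR / m, and report all rules at or below that rank.

```python
def bh_significant(p_values, alpha_fdr=0.05):
    """Benjamini-Hochberg: return the indices of p-values declared significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])    # indices in ascending p-value order
    threshold_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha_fdr / m:
            threshold_rank = rank                          # largest rank passing the test
    return [idx for rank, idx in enumerate(order, start=1) if rank <= threshold_rank]

# Example: only the smallest p-values survive the correction
print(bh_significant([0.001, 0.2, 0.03, 0.8, 0.004]))
```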

Problem #2: A Changing Baseline
From: Goldenberg, A., Shmueli, G., Caruana, R. A., and Fienberg, S. E. (2002). Early statistical detection of anthrax outbreaks by tracking over-the-counter medication sales. Proceedings of the National Academy of Sciences.

Problem #2: A Changing Baseline
The baseline is affected by temporal trends in health care data:
– Seasonal effects in temperature and weather
– Day of week effects
– Holidays
– Etc.
Choosing the wrong baseline distribution can affect the detection time and the false positive rate.

Generating the Baseline…
“Taking into account that today is a public holiday…”
“Taking into account that this is Spring…”
“Taking into account recent heatwave…”
“Taking into account recent flu levels…”
“Taking into account that there’s a known natural Food-borne outbreak in progress…”
Use a Bayes net to model the joint probability distribution of the attributes.

Obtaining Baseline Data
[Diagram: All Historical Data → learned Bayesian network, conditioned on Today’s Environment → Baseline]
1. Learn a Bayesian network using Optimal Reinsertion [Moore and Wong 2003]
2. Generate the baseline given today’s environment

Environmental Attributes
Divide the data into two types of attributes:
– Environmental attributes: attributes that cause trends in the data, e.g. day of week, season, weather, flu levels
– Response attributes: all other non-environmental attributes

Environmental Attributes
When learning the Bayesian network structure, do not allow environmental attributes (e.g. Season, Day of Week, Weather, Flu Level) to have parents.
Why? We are not interested in predicting their distributions; instead, we use them to predict the distributions of the response attributes.
Side benefit: we can speed up the structure search by avoiding DAGs that assign parents to the environmental attributes.

Generate Baseline Given Today’s Environment
Suppose we know the following for today:

Season | Day of Week | Weather | Flu Level
Winter | Monday      | Snow    | High

We fill in these values for the environmental attributes in the learned Bayesian network:
Season = Winter, Day of Week = Monday, Weather = Snow, Flu Level = High
We then sample records from the Bayesian network and make this data set the baseline. Sampling is easy because the environmental attributes are at the top of the Bayes net.
An alternate possible technique is to use inference, e.g. computing quantities such as P(Age = Senior, Gender = Male | Season = Winter, Day of Week = Monday) directly.
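To illustrate the sampling step, here is a toy sketch in Python. The attribute names and the conditional probability table are made-up assumptions, not values from the real system; the point is only that the environmental attributes are clamped to today’s values and the response attributes are sampled given them.

```python
import random

rng = random.Random(0)

todays_environment = {"Season": "Winter", "Day of Week": "Monday",
                      "Weather": "Snow", "Flu Level": "High"}

# Hypothetical CPT: P(Prodrome | Flu Level); the numbers are purely illustrative
prodrome_cpt = {
    "High": {"Respiratory": 0.4, "Viral": 0.3, "Fever": 0.2, "Diarrhea": 0.1},
    "Low":  {"Respiratory": 0.2, "Viral": 0.2, "Fever": 0.2, "Diarrhea": 0.4},
}

def sample_record(environment):
    """Ancestral sampling: environmental attributes are fixed to today's values,
    response attributes are drawn from their conditional distributions."""
    dist = prodrome_cpt[environment["Flu Level"]]
    prodrome = rng.choices(list(dist.keys()), weights=list(dist.values()))[0]
    return {**environment, "Prodrome": prodrome}

# Sample a large baseline data set conditioned on today's environment
baseline = [sample_record(todays_environment) for _ in range(10000)]
```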

What’s Strange About Recent Events (WSARE)
1. Obtain Recent and Baseline datasets from All Data
2. Search for the rule with the best score
3. Determine the p-value of the best scoring rule
4. If the p-value is less than the threshold, signal an alert
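Schematically, one day of WSARE glues these steps together. The sketch below assumes callables that behave like the `best_rule` and `randomization_test` sketches above; step 1, obtaining the recent and baseline datasets, is assumed to happen upstream (e.g. via the Bayes-net sampling).

```python
def wsare_one_day(recent, baseline, best_rule, randomization_test, alpha=0.05):
    """Schematic glue for one day of WSARE. Assumed signatures:
    best_rule(recent, baseline) -> (rule, lowest_p_value)            (step 2)
    randomization_test(recent, baseline, score_fn) -> corrected p    (step 3)."""
    rule, raw_score = best_rule(recent, baseline)                      # step 2
    corrected_p = randomization_test(recent, baseline,
                                     lambda r, b: best_rule(r, b)[1])  # step 3
    return {"rule": rule, "score": raw_score,
            "p_value": corrected_p, "alert": corrected_p < alpha}      # step 4
```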

Simulator

Simulation
– 100 different data sets
– Each data set covered a two-year period
– An anthrax release occurred at a random point during the second year
– Algorithms were allowed to train on data from the current day back to the first day in the simulation
– Any alerts before the actual anthrax release are counted as false positives
– Detection time is calculated as the time of the first alert after the anthrax release; if no alerts are raised, detection time is capped at 14 days
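As a small illustration of how these metrics could be computed for one simulated run (the day indexing is an assumption; the 14-day cap follows the slide):

```python
def evaluate_run(alert_days, release_day, cap=14):
    """Compute (false positives, detection time) for one simulated data set.
    alert_days: days on which the detector raised an alert;
    release_day: day of the simulated anthrax release."""
    false_positives = sum(1 for d in alert_days if d < release_day)
    detections = [d - release_day for d in alert_days if d >= release_day]
    detection_time = min(detections) if detections else cap   # cap at 14 days if never detected
    return false_positives, detection_time

# Example: alerts on days 40 and 372, release on day 370 -> 1 false positive, detected in 2 days
print(evaluate_run(alert_days=[40, 372], release_day=370))
```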

Other Algorithms Used in the Simulation
1. Control Chart: mean + multiplier × standard deviation
2. Moving Average: 7-day window
3. ANOVA Regression: linear regression with extra covariates for season, day of week, and the count from yesterday
4. WSARE 2.0: create the baseline using raw historical data
5. WSARE 2.5: use raw historical data that matches the environmental attributes
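For reference, a minimal sketch of the first two baseline detectors applied to a daily count series; the alert multiplier is an illustrative choice, not a value taken from the experiments:

```python
from statistics import mean, stdev

def control_chart_alert(history, today, multiplier=3.0):
    """Control chart: alert if today's count exceeds mean + multiplier * std of all history."""
    return today > mean(history) + multiplier * stdev(history)

def moving_average_alert(history, today, window=7, multiplier=3.0):
    """Moving average: same test, but using only the last `window` days."""
    recent = history[-window:]
    return today > mean(recent) + multiplier * stdev(recent)

# Example daily counts of one monitored aggregate
history = [30, 32, 28, 35, 31, 29, 33, 30, 34, 32]
print(control_chart_alert(history, today=50),
      moving_average_alert(history, today=50))
```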

Results on Simulation

Results on Actual ED Data
1. Sat : SCORE =  PVALUE =
   14.80% (74/500) of today’s cases have Viral Syndrome = True and Encephalitic Syndrome = False
   7.42% (742/10000) of baseline have Viral Syndrome = True and Encephalitic Syndrome = False
2. Sat : SCORE =  PVALUE =
   12.42% (58/467) of today’s cases have Respiratory Syndrome = True
   6.53% (653/10000) of baseline have Respiratory Syndrome = True
3. Wed : SCORE =  PVALUE =
   1.44% (9/625) of today’s cases have 100 <= Age <
   0.08% (8/10000) of baseline have 100 <= Age <
4. Sun : SCORE =  PVALUE =
   83.80% (481/574) of today’s cases have Unknown Syndrome = False
   74.29% (7430/10001) of baseline have Unknown Syndrome = False
5. Thu : SCORE =  PVALUE =
   14.71% (70/476) of today’s cases have Viral Syndrome = True and Encephalitic Syndrome = False
   7.89% (789/9999) of baseline have Viral Syndrome = True and Encephalitic Syndrome = False
6. Thu : SCORE =  PVALUE =
   8.58% (38/443) of today’s cases have Hospital ID = 1 and Viral Syndrome = True
   2.40% (240/10000) of baseline have Hospital ID = 1 and Viral Syndrome = True

Limitations of WSARE
– Works on categorical data
– Works on lower dimensional, dense data
– Cannot monitor aggregate counts; relies on changes in ratios
– Assumes that, given the environmental variables, the baseline ratios are fairly stationary over time

Related Work
– Contrast sets [Bay and Pazzani]
– Association Rules and Data Mining in Hospital Infection Control and Public Health Surveillance [Brossette et al.]
– Spatial Scan Statistic [Kulldorff]
– WRSARE: What’s Really Strange About Recent Events [Singh and Moore]

Bayesian Biosurveillance of Disease Outbreaks To appear in UAI04 [Cooper, Dash, Levander, Wong, Hogan, Wagner]

Conclusion
– One approach to biosurveillance: one algorithm monitoring millions of signals derived from multivariate data, instead of hundreds of univariate detectors
– WSARE is best used as a general-purpose safety net in combination with other detectors
– Careful evaluation of statistical significance
– Modeling historical data with Bayesian networks allows conditioning on unique features of today
– Software: