Copyright © 2006, Brigham S. Anderson FDA Project: Anomaly and Temporal Pattern Detection Brigham Anderson Robin Sabhnani Adam Goode Alice Zheng Artur Dubrawski

Presentation transcript:

Copyright © 2006, Brigham S. Anderson FDA Project: Anomaly and Temporal Pattern Detection Brigham Anderson Robin Sabhnani Adam Goode Alice Zheng Artur Dubrawski

2 OUTLINE Client Data Solutions Anomaly Detector Temporal Pattern Detector

3 The Players eLEXNET: Electronic Laboratory Exchange Network NBIS: National Bio-surveillance Integration System The Department of Homeland Security has “asked” the FDA to submit relevant eLEXNET data to NBIS.

4 Electronic Laboratory Exchange Network National Bio-surveillance Integration System Auton Lab SAIC?

5 Scenarios Scenario #1: Anchovy + Mercury

6 Anchovy/Mercury Summarization Report to FDA analyst…

7 Scenarios Scenario #1: Anchovy + Mercury Scenario #2: OJ + Salmonella

8 OK, so what does the data look like?

9 DATA Samples of food products Sample ID Collection Date Product Code Country Code Zip Code Reason Collected Human Illness? On order of 10,000 different products

10 DATA Each Sample consists of multiple Tests “Analyte” Detection Lab ID Test Method … Estimated 5,000 different analytes

11 Example Sample #223591: 2/18/2005 Coffee/Tea Analyte: Salmonella spp Detect: Negative Analyte: Staphylococcus aureus Detect: Negative Analyte: Bacillus cereus Detect: Negative
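One way to hold such a record in code is sketched below. The class and field names (`Sample`, `LabTest`, `triplets`) are illustrative assumptions, not the eLEXNET schema; only the fields shown on the slides are modeled, and the date is read as month/day/year.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class LabTest:
    """One test on a sample (hypothetical structure)."""
    analyte: str
    detect: bool  # True means the analyte was detected


@dataclass
class Sample:
    """One food sample carrying several tests (hypothetical structure)."""
    sample_id: int
    collection_date: date
    product: str
    tests: List[LabTest] = field(default_factory=list)

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Flatten the sample into (product, analyte, detect) records."""
        return [
            (self.product, t.analyte, "Y" if t.detect else "N")
            for t in self.tests
        ]


# The example sample from the slide above.
sample = Sample(
    sample_id=223591,
    collection_date=date(2005, 2, 18),
    product="Coffee/Tea",
    tests=[
        LabTest("Salmonella spp", False),
        LabTest("Staphylococcus aureus", False),
        LabTest("Bacillus cereus", False),
    ],
)
```

Flattening each sample into one record per test matches the "1 test = 1 record" view used by the anomaly detector later in the talk.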

12 Data (Show spreadsheet)

13 Data Time span: 1999-present Number of records: 300K to 1 M? Missing data? …Only a few in the sample datasets provided. Different types of tests: Microbials Mycotoxins Pesticides Dyes …

14 Data Stream About 1200 Microbial tests submitted per week Tests are not submitted regularly!

15 Anomaly Detector Temporal Pattern Detector

16 What is an Anomaly? An irregularity that cannot be explained by simple domain models and knowledge Anomaly detection only needs to learn from examples of “normal” system behavior. Classification, on the other hand, would need examples labeled “normal” and “not-normal”

17 Anomaly Detectors in Practice Monitoring computer networks for attacks. Looking for suspicious activity in bank transactions Detecting unusual eBay selling/buying behavior.

18 Simple FDA Anomaly Detection GIVEN: 1 test = 1 record The relevant features of a test are Product Analyte Detect PROBLEM: For each test, compute P(product,analyte,detect) and explain it.

19 Simple Anomaly Detector Suppose we estimate all the probabilities from data: P(Meat,EColi,N) = P(Meat,EColi,Y) = P(Meat,Salmonella,N) = P(Meat,Salmonella,Y) = P(Apple,Vibrosa,N) = P(Apple,Vibrosa,Y) = P(Apple,Listeria,N) = P(Apple,Listeria,Y) = P(Product,Analyte,Detect) =

20 Simple Anomaly Detector How likely is a given (product, analyte, detect) triplet? Could not be easier! Just look up the entry in the joint probability table (JPT)! Smaller numbers are more anomalous because the model is more surprised to see them.
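A minimal sketch of the table lookup, assuming the table is estimated as plain relative frequencies. The records are made-up toy data and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy records: one (product, analyte, detect) triplet per test.
tests = [
    ("Meat", "EColi", "N"),
    ("Meat", "EColi", "N"),
    ("Meat", "EColi", "Y"),
    ("Meat", "Salmonella", "N"),
    ("Apple", "Listeria", "N"),
    ("Apple", "Listeria", "N"),
    ("Apple", "Listeria", "N"),
    ("Apple", "Listeria", "Y"),
]


def build_jpt(records):
    """Estimate P(product, analyte, detect) as relative frequencies."""
    counts = Counter(records)
    total = len(records)
    return {triplet: n / total for triplet, n in counts.items()}


def anomaly_score(jpt, triplet):
    """Look up a triplet's probability; lower means more anomalous.

    Triplets never seen in the data get probability 0.0.
    """
    return jpt.get(triplet, 0.0)


jpt = build_jpt(tests)
print(anomaly_score(jpt, ("Meat", "EColi", "N")))      # seen twice in 8 tests
print(anomaly_score(jpt, ("Apple", "Vibrosa", "Y")))   # never seen
```

Note that every unseen triplet scores exactly zero here, so the raw table cannot rank unseen combinations against each other.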

21 Estimating P(product,analyte,detect) There are ~ 10,000 products. There are ~ 5,000 analytes. There are 2 detection outcomes. …so there are ~100M possible triplets. We cannot directly estimate P(product,analyte,detect) from the data…

22 P(product,analyte,detect) P(product,analyte,detect) = P(product) P(analyte | product) P(detect | product, analyte) e.g., P(Anchovy,Mercury,Y) = P(Anchovy) P(Mercury | Anchovy) P(Y | Anchovy, Mercury)
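The factorization can be sketched as three count-based lookups multiplied together. The records and function names below are hypothetical, and plain relative frequencies stand in for whatever estimator the project actually used.

```python
from collections import Counter

# Hypothetical toy records: (product, analyte, detect).
records = (
    [("Anchovy", "Mercury", "Y")] * 1
    + [("Anchovy", "Mercury", "N")] * 2
    + [("Anchovy", "Salmonella", "N")] * 1
    + [("OJ", "Salmonella", "N")] * 4
)


def fit_factored(records):
    """Collect the counts needed by the three factors."""
    n_product = Counter(p for p, _, _ in records)
    n_pair = Counter((p, a) for p, a, _ in records)
    n_triplet = Counter(records)
    return len(records), n_product, n_pair, n_triplet


def joint(model, product, analyte, detect):
    """P(product) * P(analyte | product) * P(detect | product, analyte)."""
    total, n_product, n_pair, n_triplet = model
    if n_pair[(product, analyte)] == 0:
        return 0.0  # pair never observed, so the conditionals are undefined
    p_product = n_product[product] / total
    p_analyte = n_pair[(product, analyte)] / n_product[product]
    p_detect = n_triplet[(product, analyte, detect)] / n_pair[(product, analyte)]
    return p_product * p_analyte * p_detect


model = fit_factored(records)
print(joint(model, "Anchovy", "Mercury", "Y"))  # (4/8) * (3/4) * (1/3)
```

With unsmoothed frequency estimates the product equals the raw triplet frequency, so the factorization by itself does not fix sparse data. What it provides is three separate tables that can each be coarsened or given a prior.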

23 Product (~10,000 values): 10,000 x 1 vector. Analyte (~5,000 values): 10,000 x 5,000 matrix. Detect (2 values): 10,000 x 5,000 x 2 matrix.

24 Two ways we handle insufficient data: Aggregate Products into “Industries” Dirichlet priors on CPTs

25 Industry (~50 values, aggregated from ~10,000 Product values): 50 x 1 vector. Analyte (~5,000 values): 50 x 5,000 matrix. Detect (2 values): 50 x 5,000 x 2 matrix.
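The effect of aggregating products into industries on table size is simple arithmetic, sketched here with the approximate cardinalities quoted on the slides. The function name is hypothetical.

```python
def table_sizes(n_parent, n_analyte=5_000, n_detect=2):
    """Number of entries in each table for a given parent cardinality."""
    return {
        "P(parent)": n_parent,
        "P(analyte | parent)": n_parent * n_analyte,
        "P(detect | parent, analyte)": n_parent * n_analyte * n_detect,
    }


by_product = table_sizes(10_000)   # parent = product
by_industry = table_sizes(50)      # parent = industry

for name in by_product:
    print(f"{name}: {by_product[name]:,} vs {by_industry[name]:,}")
```

Every table shrinks by the same factor of 200 (10,000 / 50), which takes the largest table from about 100M entries to 500K.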

26 Least Anomalous in 2005 Anomaly Score

27 Most Anomalous in 2005 Anomaly Score

28 Dirichlet priors How we add Dirichlet priors: 1. Before learning the conditional probability tables (CPTs), assume that we’ve seen every possible combination exactly “once”. 2. Continue learning the network.
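Seeing every combination "once" amounts to adding a pseudo-count of 1 to each cell before normalizing. A minimal sketch for the detect table follows; the function name and the example counts are hypothetical.

```python
def smoothed_probability(count, total, n_outcomes=2, pseudo_count=1.0):
    """Estimate P(outcome | context) with one pseudo-observation per outcome.

    count:      times this outcome was seen in the context
    total:      times the context was seen at all
    n_outcomes: number of possible outcomes (2 for detect Y/N)
    """
    return (count + pseudo_count) / (total + pseudo_count * n_outcomes)


# Hypothetical counts for P(Y | product, analyte):
print(smoothed_probability(0, 0))      # context never seen: falls back to 1/2
print(smoothed_probability(1, 4))      # 1 detect in 4 tests: (1+1)/(4+2)
print(smoothed_probability(0, 400))    # 0 detects in 400 tests: 1/402
```

With little data the estimate stays near the uniform prior, and with more data the observed frequency takes over. No combination is ever assigned probability zero.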

29 Which Abstraction Level? There are about three levels of detail for a given product… E.g., Seafood → Anchovy → Smoked Anchovy Currently, use P(Mercury | Seafood) …should we use P(Mercury | Anchovy) instead? …but what if we’ve only seen 4 Anchovy/Mercury tests? Do we use that to estimate P(Mercury | Anchovy) ?

30 Which Abstraction Level? There are about three levels of detail for a given product… E.g., Seafood → Anchovy → Smoked Anchovy IDEA: 1. Build one anomaly detector for each level. 2. Test each sample at all three levels. 3. Choose the most anomalous score. Are you insane? Maybe not… At the lower levels, the anomaly score will tend to be dominated by the prior (and thus produce high probabilities).
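The three-level idea can be sketched as follows. This is a simplification: it scores only the detect outcome at each level with add-one smoothing, and all counts and names are hypothetical.

```python
def smoothed(count, total, n_outcomes=2):
    """Add-one smoothed estimate of P(outcome | context)."""
    return (count + 1) / (total + n_outcomes)


# Hypothetical (detects, total tests) for Mercury at each product level.
mercury_counts = {
    "Seafood": (5, 1000),
    "Anchovy": (3, 4),
    "Smoked Anchovy": (0, 0),
}


def most_anomalous(counts_by_level, detect):
    """Score one test at every level and return the lowest probability."""
    scores = {}
    for level, (detects, total) in counts_by_level.items():
        count = detects if detect else total - detects
        scores[level] = smoothed(count, total)
    level = min(scores, key=scores.get)
    return level, scores[level]


print(most_anomalous(mercury_counts, detect=True))   # Seafood level wins
print(most_anomalous(mercury_counts, detect=False))  # Anchovy level wins
```

The level with no data scores 0.5 for either outcome, so it rarely supplies the minimum. That is the behavior the slide describes: sparse levels are dominated by the prior and seldom raise a false alarm on their own.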

31 Anomaly Detector Temporal Pattern Detector

32 What is a Temporal Pattern? How do we find the Orange Juice + Salmonella pattern? This is not a daily scan; it is “on-demand”.

33 What is a Temporal Pattern? BASIC PROBLEM #1: Check each product/analyte pair in the last t weeks against the previous t’ weeks for unusual “behavior”. BASIC SOLUTION: Chi-square test for each product/analyte pair:
            Detects   Non-Detects
Recent      O_11      O_12
Baseline    O_21      O_22
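A minimal sketch of the Pearson chi-square test on the 2 x 2 recent/baseline table, written with the standard library only. It applies no continuity correction, the example counts are made up, and the function name is hypothetical.

```python
import math


def chi_square_2x2(o11, o12, o21, o22):
    """Pearson chi-square statistic and p-value (1 degree of freedom).

    o11, o12: detects and non-detects in the recent window
    o21, o22: detects and non-detects in the baseline window
    """
    observed = [[o11, o12], [o21, o22]]
    row = [o11 + o12, o21 + o22]
    col = [o11 + o21, o12 + o22]
    n = sum(row)
    if 0 in row or 0 in col:
        return 0.0, 1.0  # a margin is empty, so the test is undefined
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 dof: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 8 of 100 recent tests detect vs 10 of 1000 baseline.
stat, p = chi_square_2x2(8, 92, 10, 990)
print(stat, p)
```

When expected counts are small the chi-square approximation is unreliable, and an exact test is usually preferred. Running the test for every product/analyte pair also raises a multiple-comparisons issue that this sketch does not address.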

vs Microbials only

35 What is a Temporal Pattern? BASIC PROBLEM #2: Check each product/analyte pair in the last t weeks for any interval of unusual behavior. BASIC SOLUTION: Chi-square test for each product/analyte pair for each interval (Bootstrap to get baseline distribution of best chi-square.)
            Detects                  Non-Detects                  Duration
Inside      #detects_inside, O_11    #non-detects_inside, O_12    #weeks_inside, O_13
Outside     #detects_outside, O_21   #non-detects_outside, O_22   #weeks_outside, O_23
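The interval scan can be sketched as below, with two simplifications that are assumptions of this example rather than the project's method: the duration column is dropped (a 2 x 2 inside/outside table is used), and the baseline distribution of the best chi-square comes from shuffling the order of weeks rather than from the bootstrap the slide mentions. All names and counts are hypothetical.

```python
import random


def chi2_stat(o11, o12, o21, o22):
    """Pearson chi-square statistic for a 2 x 2 table (0.0 if degenerate)."""
    observed = [[o11, o12], [o21, o22]]
    row = [o11 + o12, o21 + o22]
    col = [o11 + o21, o12 + o22]
    n = sum(row)
    if 0 in row or 0 in col:
        return 0.0
    return sum(
        (observed[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
        for i in range(2)
        for j in range(2)
    )


def best_interval(weeks):
    """Scan every contiguous interval of weeks.

    weeks: list of (detects, non_detects) per week.
    Returns (best_stat, start_index, end_index), indices inclusive.
    """
    total_d = sum(d for d, _ in weeks)
    total_n = sum(n for _, n in weeks)
    best = (0.0, 0, 0)
    for start in range(len(weeks)):
        in_d = in_n = 0
        for end in range(start, len(weeks)):
            in_d += weeks[end][0]
            in_n += weeks[end][1]
            stat = chi2_stat(in_d, in_n, total_d - in_d, total_n - in_n)
            if stat > best[0]:
                best = (stat, start, end)
    return best


def scan_p_value(weeks, n_rounds=200, seed=0):
    """Fraction of week-shuffled series whose best statistic matches or beats ours."""
    rng = random.Random(seed)
    observed = best_interval(weeks)[0]
    shuffled = list(weeks)
    hits = 0
    for _ in range(n_rounds):
        rng.shuffle(shuffled)
        if best_interval(shuffled)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_rounds + 1)


# Hypothetical series: 16 weeks of 20 tests, with a 4-week burst of detects.
weeks = [(0, 20)] * 6 + [(6, 14)] * 4 + [(0, 20)] * 6
stat, start, end = best_interval(weeks)
p_value = scan_p_value(weeks)
print(start, end, stat, p_value)
```

Comparing the observed best statistic against the best statistic of each shuffled series accounts for having searched over many intervals, which a single chi-square p-value would not.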

36 Patulin mycotoxin tests on Fruits

37 All years? Microbials only
