Online Detection of Change in Data Streams
Shai Ben-David, School of Computer Science, U. Waterloo

Some Change Detection Tasks
• Quality control – Factory products are regularly tested and scored. Can we detect when the distribution of scores changes?
• Real estate prices – Following selling prices of houses in K-W, can we tell when market trends change?

Problem Formalization
• Data points are generated sequentially and independently by some underlying probability distribution.
• Viewing the generated stream of data points, we wish to detect when the underlying data-generating distribution changes (and how it changes).

Detection in Sensor Networks
• We consider large-scale networks of sensors. Each sensor makes a local binary decision about the monitored physical phenomenon: RED/GREEN.
• An observer collects a random sample of the sensors' readings.

Change Detection in Sensor Networks
[Figure: sensor readings from a first and a second data collection]
Is there a change in the underlying data-generating distribution? If a change has been detected, what exactly has changed?

Similar Issues in Other Disciplines
• Ecology – Tracing the distribution of species over geographical locations.
• Public health – Tracing the spread of various diseases.
• Census data analysis.

Our basic paradigm
[Figure: two sliding windows, S_1 (earlier) and S_2 (most recent), over the stream's time axis]
Compare two sliding windows over the data stream. This reduces the change-detection problem to the "two-sample" problem: given two samples S_1, S_2, generated by distributions P_1, P_2, infer from S_1, S_2 whether P_1 = P_2.

Meta-Algorithm for Online Change Detection

Explanation
• The meta-algorithm actually runs k independent algorithms in parallel, one for each triplet (m_{1,i}, m_{2,i}, α_i).
• Each keeps a baseline window X_i, containing the m_{1,i} points following the last detected change, c_0, and a second window Y_i, containing the most recent m_{2,i} points in the stream.
• We declare CHANGE whenever d(X_i, Y_i) > α_i.
• At such a point we reset c_0 and X_i.
• The different α_i's reflect different levels of change sensitivity. The m_i's are computed from the α_i's using the theory outlined below.
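
To make the loop concrete, here is a minimal sketch of one way to implement this meta-algorithm, assuming a scalar stream and using the threshold-family discrepancy (the largest empirical CDF gap, i.e. d_F for F the family of initial segments) as the distance d. The names change_detector and d_stat, and the exact freezing/reset details, are illustrative and not taken from the talk:

```python
from collections import deque

def d_stat(x, y):
    # Largest empirical CDF gap between windows x and y: the empirical
    # d_F for F = { (-inf, t] : t real } (a Kolmogorov-Smirnov statistic).
    xs, ys = sorted(x), sorted(y)
    best, i, j = 0.0, 0, 0
    for t in sorted(xs + ys):
        while i < len(xs) and xs[i] <= t:
            i += 1
        while j < len(ys) and ys[j] <= t:
            j += 1
        best = max(best, abs(i / len(xs) - j / len(ys)))
    return best

def change_detector(stream, configs):
    # configs: list of triplets (m1, m2, alpha); one sub-detector each.
    history, c0 = [], 0                 # c0 = index of last detected change
    baselines = [None] * len(configs)   # baseline windows X_i
    recents = [deque(maxlen=m2) for _, m2, _ in configs]  # windows Y_i
    for t, point in enumerate(stream):
        history.append(point)
        for i, (m1, m2, alpha) in enumerate(configs):
            recents[i].append(point)
            if baselines[i] is None and t + 1 - c0 >= m1:
                baselines[i] = history[c0:c0 + m1]   # freeze X_i after c0
            if (baselines[i] is not None and len(recents[i]) == m2
                    and d_stat(baselines[i], recents[i]) > alpha):
                yield t                  # declare CHANGE at time t
                c0 = t                   # reset c0 and all baselines X_i
                baselines = [None] * len(configs)
                break
```

For example, alarms = list(change_detector(data, [(200, 200, 0.3), (500, 500, 0.2)])) runs two sub-detectors with different sensitivities; a detection by either one resets the shared baseline epoch, as on the slide.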

Statistical requirements
We wish to support our statistical tests with formal, finite-sample-size guarantees that:
• Control the rate of false positives ('false alarms').
• Control the rate of false negatives ('missed detections').
• Ensure the reliability of the change description.

Previous Work on the Two-Sample Problem
• Mostly within the context of parametric statistics (assuming the underlying distributions come from a known family of 'nice' distributions).
• Previous applications were not concerned with memory and computation-time limitations.
• Performance guarantees are asymptotic – they apply only in the limit as sample sizes go to infinity.
• Previous work focused on detection only – we also wish to describe the change.

The Need for a Probability-Distance Measure
• False-positive guarantees are straightforward: "If S_1, S_2 are samples of the same distribution, then the probability that the test declares CHANGE is small."
• False-negative guarantees are more delicate. The naive requirement – "If S_1, S_2 come from different distributions then, w.h.p., declare CHANGE" – is infeasible: distributions may differ by arbitrarily little, so no finite sample can detect every change.
• One therefore needs to quantify the difference, i.e. require "d(P_1, P_2) > ε" for some distance measure d.

Inadequacy of Common Measures
• The L_1 norm (or 'total variation') is too sensitive: for every sample-based test and every sample size m, there are P_1, P_2 with L_1(P_1, P_2) > 1/4 for which the test fails to detect the change from m-samples.
• The L_p norms for p > 1 are too insensitive.

A New Measure of Distance
Given a family F of domain subsets, we define

    d_F(P_1, P_2) = \sup_{A \in F} | P_1(A) - P_2(A) |

Note that this is a pseudo-metric over probability distributions. Intuitively, F is chosen as a family of sets that the user cares about; d_F measures the largest change in probability over the sets in F.
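
As an illustration, here is a brute-force sketch of the empirical version of d_F over a finite family of sets. The name empirical_dF and the grid-interval family are mine, for illustration only; the talk's algorithms exploit geometric structure rather than enumerating F:

```python
def empirical_dF(s1, s2, family):
    # Empirical d_F: largest gap in empirical mass over the sets in F.
    # family: iterable of predicates A(x) -> bool describing the sets.
    def mass(sample, A):
        return sum(1 for x in sample if A(x)) / len(sample)
    return max(abs(mass(s1, A) - mass(s2, A)) for A in family)

# Example family: intervals with endpoints on a coarse grid in [0, 1].
grid = [k / 10 for k in range(11)]
intervals = [lambda x, a=a, b=b: a <= x <= b
             for a in grid for b in grid if a < b]
```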

Major Merits of the F-distance
• If F is the family of disks or rectangles, d_F captures an intuitive notion of 'localized change'.
• If the family of sets F has a finite VC-dimension, then one gets finite sample-size guarantees against false negatives (w.r.t. d_F).

Background: VC-Dimension
The Vapnik-Chervonenkis dimension (VC-dim) is a parameter that measures the 'combinatorial complexity' of a family of sets. For algebraically defined families it is roughly the number of parameters needed to define a set in the family. So, VC-dim(planar disks) = 3 and VC-dim(axis-aligned rectangles) = 4.
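
For reference, the standard formal definition (this phrasing is mine, not from the slide): F shatters a set of m points if sets in F cut out all 2^m of its subsets, and the VC-dimension is the size of the largest shattered set:

```latex
% F shatters {x_1,...,x_m} iff F cuts out all 2^m subsets of it.
\operatorname{VCdim}(F) = \max\bigl\{\, m : \exists\, x_1,\dots,x_m \
  \text{with}\ \bigl|\{\, A \cap \{x_1,\dots,x_m\} : A \in F \,\}\bigr| = 2^m \,\bigr\}
```

For axis-aligned rectangles, for instance, four points placed at the extremes of a diamond are shattered, while for any five points the rectangle spanning the four x/y-extreme ones must also contain the fifth, giving VC-dim 4.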

VC-Based Guarantees
Let P_1, P_2 be any probability distributions over some domain set X, and let F be a family of subsets of X of finite VC-dimension d. For every 0 < ε < 1, if S_1, S_2 are i.i.d. samples of size m each, drawn from P_1, P_2 (respectively), then

    \Pr\Bigl[ \sup_{A \in F} \bigl| (S_1(A) - S_2(A)) - (P_1(A) - P_2(A)) \bigr| > \varepsilon \Bigr] \le c_1 (2m)^d e^{-c_2 m \varepsilon^2}

(stated here in standard VC form, with unspecified absolute constants c_1, c_2).

VC-Based Guarantees (2)
In particular, we get

    \bigl| d_F(S_1, S_2) - d_F(P_1, P_2) \bigr| \le \varepsilon \quad \text{with probability at least}\ 1 - c_1 (2m)^d e^{-c_2 m \varepsilon^2},

where S_i(A) = |\{ x \in S_i : x \in A \}| / |S_i| is the empirical measure of A under the sample S_i.
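
A rough worked consequence (my arithmetic, using the generic constants above): requiring c_1 (2m)^d e^{-c_2 m ε^2} ≤ δ and solving for m shows that, up to logarithmic factors in d, a per-sample size on the order of

```latex
m \;=\; O\!\left( \frac{d \,\log(1/\varepsilon) + \log(1/\delta)}{\varepsilon^{2}} \right)
```

suffices to estimate d_F(P_1, P_2) to within ε with confidence 1 - δ. This is the sense in which the guarantees are finite-sample rather than asymptotic.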

A Relativized Discrepancy
To focus on small-weight subsets, we define a variation of the d_F distance that normalizes the discrepancy on each set A by (roughly) the square root of A's weight, so that a change of fixed absolute size counts for more on low-weight sets.

Statistical Guarantees for the Relativized Discrepancy
Let P_1, P_2 be any probability distributions over some domain set X, and let F be a family of subsets of X of finite VC-dimension d. For every 0 < ε < 1, if S_1, S_2 are i.i.d. samples of size m each, drawn from P_1, P_2 (respectively), then, with probability exponentially close to 1 in mε², the relativized discrepancy of the samples is within ε of that of the distributions (the proof uses relative-deviation versions of the VC bounds).

Algorithms for computing d_F(S_1, S_2)
We developed several basic algorithms that take a pair of samples S_1, S_2 as input and output the sets A in F that exhibit maximal empirical discrepancy. Our focus is the computational complexity of these algorithms as a function of the input sample sizes.

Algorithms – The Basic Ideas (1)
We say that a collection H of subsets is F-complete w.r.t. a sample S if for every A in F there exists a set B in H such that A ∩ S = B ∩ S. It follows that, if H is F-complete w.r.t. S_1 ∪ S_2, then the empirical discrepancy (and likewise the relativized discrepancy) attains its maximum over F on some member of H, so it suffices to search over H.

Algorithms – The Basic Ideas (2)
The next step is to find finite collections of subsets that are F-complete for some natural families F. For example, the family of planar disks admits the (essentially) F-complete collection

    H = { D(s_1, s_2, s_3) : s_1, s_2, s_3 ∈ S },

where D(s_1, s_2, s_3) is the disk whose boundary is defined by this triple of points.
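
A brute-force sketch of this idea for disks (function names are mine; this is the exhaustive O(|S|^4) search mentioned on the running-times slide below, not the talk's optimized algorithm):

```python
import math
from itertools import combinations

def circumcircle(p, q, r):
    # Center and radius of the circle through three non-collinear
    # points: the boundary of the slide's D(s1, s2, s3).
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None  # collinear triple: no finite circumcircle
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy), math.hypot(ax - ux, ay - uy)

def max_disk_discrepancy(s1, s2):
    # Search the F-complete candidate set: disks through sample triples.
    # O(|S|^3) candidates x O(|S|) membership tests = O(|S|^4) overall.
    pts = list(s1) + list(s2)
    best = 0.0
    for p, q, r in combinations(pts, 3):
        c = circumcircle(p, q, r)
        if c is None:
            continue
        (cx, cy), rad = c
        def inside(z):
            return math.hypot(z[0] - cx, z[1] - cy) <= rad + 1e-9
        gap = abs(sum(map(inside, s1)) / len(s1)
                  - sum(map(inside, s2)) / len(s2))
        best = max(best, gap)
    return best
```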

Running Times of Our Algorithms
For real-valued data, we designed a data structure and an algorithm that require O(m_{1,i} + m_{2,i}) time at every initiation and O(log(m_{1,i} + m_{2,i})) time for an incremental update on every newly arriving data point.
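
A simplified illustration of the one-dimensional setting (my construction, not the talk's actual data structure), assuming values are bucketed onto a fixed grid: keep per-bucket counts for each window, and read off the interval-family discrepancy as the largest gap between the two running CDFs. The linear scan below would be replaced by an augmented balanced tree or segment tree to reach the O(log m) update time the slide claims:

```python
class ThresholdGap:
    # Bucketed counts for two windows over a fixed value grid; the
    # largest running-CDF gap approximates the threshold-family d_F.
    def __init__(self, bins):
        self.c1 = [0] * bins
        self.c2 = [0] * bins
        self.n1 = self.n2 = 0

    def insert(self, bucket, window):
        # O(1) here; the real structure also maintains subtree summaries
        # so that max_gap() stays cheap after each update.
        if window == 1:
            self.c1[bucket] += 1
            self.n1 += 1
        else:
            self.c2[bucket] += 1
            self.n2 += 1

    def max_gap(self):
        # O(bins) scan; with an augmented segment tree this becomes
        # O(log bins), which is the point of the slide's construction.
        gap = f1 = f2 = 0.0
        for a, b in zip(self.c1, self.c2):
            f1 += a / max(self.n1, 1)
            f2 += b / max(self.n2, 1)
            gap = max(gap, abs(f1 - f2))
        return gap
```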

Running Times of Our Algorithms (2)
For two-dimensional data points, we consider two basic families F of sets in the plane: axis-aligned rectangles and planar disks.
• For rectangles, we get computational complexity O(|S|^3).
• For disks, we get an exhaustive algorithm that runs in time O(|S|^4) and an approximation algorithm of complexity O(|S|^2 log|S|).

Summary
• We defined notions of spatial distance between probability distributions: changes that are detectable within local geometric regions (say, circles).
• We applied Vapnik-Chervonenkis theory to derive confidence guarantees.
• We developed efficient detection and estimation algorithms.

Novelty of Our Approach
• Non-parametric statistics: we make no prior assumptions about the underlying distribution.
• We provide performance guarantees for manageable (finite) sample sizes.
• We develop computationally efficient algorithms for change detection and change estimation.