N. Keriven1, D. Garreau2, I. Poli3


NEWMA: a new method for scalable model-free online change point detection
N. Keriven1, D. Garreau2, I. Poli3
1Ecole Normale Supérieure, Paris (CFM-ENS chair) 2Max Planck Institute, Tübingen 3LightOn, Paris
Séminaire SPOC, Dijon, June 27th 2018

Change Point Detection
Goal: detect "changes" in a multivariate time series in an online fashion.
Long history of applications in:
Brain activity detection
Network intrusion
Fault detection in engineering systems
Seismic detection
Epidemic outbreaks…
Basseville, Tartakovsky. Detection of abrupt changes. 1993
27/07/2018

Modern challenge: scalability, sensitive data
Modern data is heavy in memory, and often sensitive, collected by personal devices.
Need for methods that:
Are light and fast
Do not store raw data ("truly" online)
Can be performed in real time on embedded devices with low power and memory

Framework
Time series (x_t)_{t ≥ 1}.
Goal: detect changes in some quantity E[ψ(x_t)], where ψ is a feature map.
Examples:
Mean: ψ(x) = x
Moment of order p: ψ(x) = x^p
Histogram: ψ(x) = indicator of fixed bins
Coming: random nonlinearities…
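The feature maps listed above can be written out concretely. A minimal sketch (the names `psi_mean`, `psi_moment`, `psi_hist` and the bin edges are illustrative, not the paper's notation):

```python
import numpy as np

# Illustrative feature maps psi for the tracked statistic E[psi(x_t)]:
psi_mean = lambda x: np.atleast_1d(x)              # mean: psi(x) = x
psi_moment = lambda x, p=2: np.atleast_1d(x) ** p  # moment of order p: psi(x) = x^p

def psi_hist(x, edges):
    """Histogram: one-hot indicator of the bin of x over fixed edges."""
    onehot = np.zeros(len(edges) - 1)
    j = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(edges) - 2)
    onehot[j] = 1.0
    return onehot
```

An EWMA of any of these maps tracks the corresponding statistic of the stream.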

Existing methods
Context: parametric methods assume knowledge of the "normal" regime; non-parametric methods are "model-free".
"Scan"-type algorithms: no prior knowledge, but store raw data.
Moving Average (MA): prior knowledge, store raw data.
Exponentially Weighted Moving Average (EWMA): prior knowledge, store only a weighted average.
Proposed: NEWMA — no prior knowledge, store only weighted averages.

Outline
NEWMA: description & basic properties
Choice of mapping: random features
Experiments
Conclusion

Proposed algorithm: NEWMA
"No prior knowledge" EWMA.
Idea: compute two EWMAs with different forgetting factors, and raise an alarm when they differ too much.
NEWMA implicitly compares old and new samples, without the need to explicitly store them.
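The two-EWMA idea can be sketched in a few lines. This is an illustrative implementation, not the paper's reference code (see github.com/lightonai/newma for that); the forgetting factors and the threshold are placeholder values:

```python
import numpy as np

def newma_detect(stream, psi, lam_fast=0.2, lam_slow=0.05, threshold=1.5):
    """Two EWMAs of the feature map psi with different forgetting factors;
    an alarm is raised when they drift apart (illustrative fixed threshold)."""
    z_fast = np.zeros_like(psi(stream[0]), dtype=float)
    z_slow = np.zeros_like(psi(stream[0]), dtype=float)
    alarms = []
    for t, x in enumerate(stream):
        phi = psi(x)
        z_fast = (1 - lam_fast) * z_fast + lam_fast * phi  # short memory
        z_slow = (1 - lam_slow) * z_slow + lam_slow * phi  # long memory
        if np.linalg.norm(z_fast - z_slow) > threshold:
            alarms.append(t)  # raw samples are never stored
    return alarms

# toy stream: 1-d Gaussian with a mean shift at t = 200
rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])
alarms = newma_detect(stream, psi=np.atleast_1d)
```

Only the two averages (the size of psi's output each) live in memory, which is what makes the method "truly" online.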

Pseudo-metric & "pointwise" detection
Assumption: bounded feature map ψ.
Appropriate pseudo-metric between distributions: the distance between their expected feature maps.
Prop.: if ψ is bounded, then after an initialization phase of the order of the effective window size, a concentration inequality shows that w.h.p. the statistic exceeds the threshold after a change.
Prop.: if ψ is bounded, under the null the statistic stays below the threshold w.h.p.

Asymptotic distribution under the null
Null hyp.: assume all samples are drawn according to the same distribution.
Can we be more precise? We have seen: for bounded ψ the statistic is small w.h.p.
For the Scan algorithm, the asymptotic null distribution is known [Serfling 1980].
Thm.: under assumptions on the forgetting factors, and using the Mercer decomposition of the kernel, the statistic converges in distribution under the null; the limit depends on quantities that are known or can be estimated (future work).

Average Run Length
Null hyp.: assume all samples are drawn according to the same distribution. In expectation, how long before a false alarm?
Markov-chain based approach for (1-d) EWMA [Fu 2002]: discretize the domain with some precision, compute the transition matrix (easy), and bound the expected hitting time.
Adaptation for NEWMA: the pair of EWMAs is a finite-dimensional Markov chain; discretize and bound the domain (ε-coverings), compute the transition matrix, and derive the bound.
Useful insight, but limited usability in practice (beyond toy examples)…

Outline
NEWMA: description & basic properties
Choice of mapping: random features
Experiments
Conclusion

Good embedding?
With no prior knowledge, which ψ is "good"?
Recall the pseudo-metric between expected feature maps. Ideally it is a true metric: zero only when the two distributions are equal. Never true for finite-dimensional ψ…
Def: Maximum Mean Discrepancy (MMD) [Gretton 2006, Borgwardt 2006]: take ψ to be the canonical feature map of a positive definite kernel, in the RKHS associated to it.
The MMD is a true metric iff the kernel is "characteristic" [Gretton 2006, Sriperumbudur 2010]; many usual kernels are characteristic: Gaussian, Laplace…
But the RKHS is infinite-dimensional! Not directly usable in practice…

MMD and random features
Usual estimation of the MMD: compute the MMD between windows of old and recent samples via the kernel trick; used in the "Kernel Scan-B" algorithm [Li 2015].
Infeasible for NEWMA: no sample in memory = no kernel trick.
Random feature expansions of kernels [Rahimi 2007]: approximate the kernel by an inner product of finite-dimensional random feature maps, e.g. for any translation-invariant kernel.
Prop.: if the number of random features is large enough, then w.h.p. on the features and the samples, the finite-dimensional pseudo-metric approximates the MMD.
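The random-feature route to the MMD can be sketched as a plug-in estimator. This is illustrative (the bandwidth, number of features, and sample sizes are arbitrary), not the paper's code:

```python
import numpy as np

def rff(X, W, b):
    """Random Fourier features [Rahimi 2007]: psi(x) = sqrt(2/m) cos(Wx + b),
    so that psi(x) . psi(y) approximates the Gaussian kernel k(x, y)."""
    return np.sqrt(2.0 / W.shape[0]) * np.cos(X @ W.T + b)

def mmd_rff(X, Y, m=500, sigma=1.0, seed=0):
    """Plug-in MMD estimate || mean psi(x) - mean psi(y) || via m random features."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0 / sigma, size=(m, X.shape[1]))  # spectral measure of the kernel
    b = rng.uniform(0.0, 2 * np.pi, size=m)
    return np.linalg.norm(rff(X, W, b).mean(axis=0) - rff(Y, W, b).mean(axis=0))

rng = np.random.default_rng(1)
same = mmd_rff(rng.normal(0, 1, (500, 2)), rng.normal(0, 1, (500, 2)))
diff = mmd_rff(rng.normal(0, 1, (500, 2)), rng.normal(3, 1, (500, 2)))
```

Since the features are explicit, their EWMA can be maintained online, which is exactly what NEWMA needs: no samples in memory, no kernel trick.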

Outline
NEWMA: description & basic properties
Choice of mapping: random features
Experiments
Conclusion

Algorithms
Compared algorithms:
Kernel Scan-B [Li 2015]: computes the MMD between windows of old and recent samples; stores raw data.
NEWMA with:
Classical Random Fourier Features (NEWMA-RF): stores a dense random matrix
"FastFood" Random Features [Le 2013] (NEWMA-FF)
LightOn's Optical Random Features [Saade 2016] (coming soon…)
In practice, a numerical procedure dynamically adapts the threshold (see details in the paper).

Synthetic Data
Data: GMM with 10 components.
[Figures: detection result for varying threshold, lower left best; time of execution]

Voice activity detection
Pipeline: audio signal → Short-Time Fourier Transform → change point detection.

Video monitoring
Apply NEWMA on each block of pixels.
Real-time on a laptop, no GPU. No need to record everything!

Outline
NEWMA: description & basic properties
Choice of mapping: random features
Experiments
Conclusion

Conclusion, outlooks
Introduced "No prior knowledge"-EWMA (NEWMA):
Simple but novel idea inspired by existing methods (EWMA and Scan algorithms)
Compares old and recent samples without storing them in memory
Faster than existing non-parametric methods
Addition of random features to use the MMD between distributions
Fast random features for sublinear cost in dimension
Outlooks:
Optical Random Features (very soon)
Application to other types of data: graphs…
Non-i.i.d. data

Thank you!
Keriven, Garreau, Poli. NEWMA: a new method for scalable model-free online change-point detection. Submitted. arXiv:1805.08061
Code: github.com/lightonai/newma