Enabling Real Time Alerting through streaming pattern discovery Chengyang Zhang Computer Science Department University of North Texas 11/21/2016 CRI Group.

Presentation transcript:

Enabling Real Time Alerting through streaming pattern discovery Chengyang Zhang Computer Science Department University of North Texas 11/21/2016 CRI Group Meeting

Motivation: Real-time event detection (flood detection; sudden change of water quality). System malfunction detection (some sensors are dead; some sensors give incorrect readings). In any case, it is likely that there are some anomalous pattern changes in the data!

How to automatically identify such changes: the threshold-based approach. E.g., for a record of Temp; relative humidity; barometric pressure; sun radiation, we can easily set the temperature threshold to be, say, between 0°F and 150°F. Problems: the thresholds have to be assigned manually; they cannot reflect temporal patterns (records of Temp; relative humidity; barometric pressure; sun radiation; time, e.g. 50; …; …; …; …:00:00); they cannot reflect correlations between sensor readings (records of Time; UVB; Temp, e.g. …:00:00; 0.727; … / …:00:00; 0.827; … / …:00:…; …; 73.4).
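The threshold check described on this slide can be sketched as follows. This is a minimal illustration, not code from the presentation: the function name `out_of_range` and the sample readings are hypothetical, and only the 0°F to 150°F bounds come from the slide.

```python
def out_of_range(reading, low=0.0, high=150.0):
    """Return True when a temperature reading (in °F) falls outside [low, high]."""
    return not (low <= reading <= high)


# Hypothetical temperature readings, not taken from the slides.
readings = [72.5, 73.4, 200.0, -10.0]
alerts = [r for r in readings if out_of_range(r)]
print(alerts)
```

A sensor that is stuck at a plausible value such as 73.4 never trips this check, which is the weakness the slide points out: the rule looks at one reading at a time and ignores both history and the other sensors.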

How to automatically identify such changes: the simple statistics-based approach. Historical mean (over records of Temp; relative humidity; barometric pressure; sun radiation; time, e.g. 50; …; …; …; …:00:00). Mean of multiple sensors. Problem: still not able to reflect correlations among the data.
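A historical-mean check can be kept in constant memory with a running mean and variance. The sketch below is an assumption-laden illustration: the class name `RunningMean`, the three-sigma rule, and the sample history are all hypothetical, since the slide only names "historical mean" without giving a decision rule.

```python
import math


class RunningMean:
    """Running mean and variance of one sensor (Welford's method), O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_anomalous(self, x, n_sigmas=3.0):
        # Assumed rule: flag readings more than n_sigmas deviations from the mean.
        s = self.std()
        return self.n > 1 and s > 0.0 and abs(x - self.mean) > n_sigmas * s


stats = RunningMean()
for temp in [70.0, 71.0, 69.0, 70.0, 72.0, 71.0]:  # hypothetical history
    stats.update(temp)

print(stats.is_anomalous(95.0), stats.is_anomalous(71.0))
```

Each sensor is still judged in isolation here, so two sensors that normally move together can drift apart without either one being flagged.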

Objective: find hidden trends in the correlated data; on-the-fly (real-time) detection; limited memory.

Slides 7 to 17 are extracted from reference 2.

1. How to capture correlations? [Plot: temperature T1 of the first sensor over time, between 20°C and 30°C.]

1. How to capture correlations? [Plot: first sensor and second sensor; temperature T2 over time, between 20°C and 30°C.]

1. How to capture correlations: [Plot: temperature T2 against temperature T1, both from 20°C to 30°C, with the points for time=1, time=2, time=3.] The first three value-pairs lie (almost) on a line in the space of value-pairs: O(n) numbers for the slope, and one number for each value-pair (its offset on the line). The offset is the "hidden variable".

1. How to capture correlations: [Plot: temperature T2 against temperature T1, both from 20°C to 30°C.] Other pairs also follow the same pattern: they lie (approximately) on this line.

2. Incremental update: [Plot: temperature T2 against temperature T1, both from 20°C to 30°C, showing a new value and its error.] For each new point: project onto the current line; estimate the error.

2. Incremental update: [Plot: temperature T2 against temperature T1, both from 20°C to 30°C.] For each new point: project onto the current line; estimate the error; rotate the line in the direction of the error and in proportion to its magnitude.
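The three steps (project, estimate error, rotate) can be sketched for a single hidden variable as below. The update rule is written from the description in reference 1 and should be checked against the paper; the function name `update`, the forgetting factor `lam`, the starting values, and the synthetic stream are assumptions made for illustration.

```python
import math


def update(w, d, x, lam=1.0):
    """One incremental step for a new point x, given direction w and energy d."""
    y = sum(wi * xi for wi, xi in zip(w, x))          # project onto current line
    d = lam * d + y * y                               # running energy of the projections
    e = [xi - y * wi for xi, wi in zip(x, w)]         # error of the reconstruction y * w
    w = [wi + (y / d) * ei for wi, ei in zip(w, e)]   # rotate toward the error
    return w, d, y, e


# Arbitrary starting direction and a small positive starting energy (assumed).
w, d = [1.0, 0.0], 1e-3

# Hypothetical stream: two sensors reporting the same slowly rising temperature.
stream = [(20.0 + 0.5 * t, 20.0 + 0.5 * t) for t in range(50)]
for x in stream:
    w, d, y, e = update(w, d, x)

norm = math.hypot(*w)
direction = [wi / norm for wi in w]
print(direction)
```

Only `w` and `d` are carried from one point to the next, so memory use does not grow with the length of the stream. With identical sensors, the direction settles on the diagonal of the (T1, T2) plane.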

Stream correlations: Principal Component Analysis (PCA). The "line" is the first principal component (PC) vector. This line is optimal: it minimizes the sum of squared projection errors.
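The optimality claim can be checked numerically in two dimensions. The sketch below fits a line through the origin, matching the slope-only description in the slides (textbook PCA would subtract the mean first). The function names and the sample value-pairs are hypothetical.

```python
import math


def first_direction(points):
    """Unit vector of the best-fit line through the origin for 2-D points."""
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    # Closed-form principal axis of the 2x2 second-moment matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))


def squared_error(points, w):
    """Sum of squared distances from each point to the line spanned by w."""
    total = 0.0
    for x, y in points:
        h = x * w[0] + y * w[1]  # hidden variable: offset along the line
        total += (x - h * w[0]) ** 2 + (y - h * w[1]) ** 2
    return total


# Hypothetical (T1, T2) value-pairs in °C.
pairs = [(20.0, 20.5), (25.0, 24.6), (30.0, 30.2), (22.0, 21.9)]
w = first_direction(pairs)
best = squared_error(pairs, w)
print(w, best)
```

Rotating `w` slightly in either direction increases `squared_error`, which is what "optimal" means on this slide.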

3. Number of hidden variables: [Plot: value-tuple space with axes T1, T2, T3.] If we had three sensors with similar measurements, the points would again lie on a line (i.e., one hidden variable, k = 1), but in 3-D space.

3. Number of hidden variables: [Plot: value-tuple space with axes T1, T2, T3.] Assume one sensor intermittently gets stuck. Now, no line can give a good approximation.

3. Number of hidden variables: [Plot: value-tuple space with axes T1, T2, T3.] Assume one sensor intermittently gets stuck. Now, no line can give a good approximation. But a plane will do (two hidden variables, k = 2).
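The line-versus-plane argument can be illustrated by measuring how much of the total squared energy the best k directions retain. This batch sketch uses power iteration with deflation rather than the streaming method of reference 1. The helper names, the synthetic readings, and the choice of a sensor that reads 0.0 while stuck are all assumptions.

```python
import math


def second_moments(rows):
    """Uncentered second-moment matrix of the value-tuples."""
    n = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]


def leading_eigenpair(m, iters=500):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    n = len(m)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-zero start
    for _ in range(iters):
        mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in mv))
        if norm == 0.0:
            break
        v = [c / norm for c in mv]
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def energy_fraction(rows, k):
    """Fraction of total squared energy retained by the best k directions."""
    m = second_moments(rows)
    n = len(m)
    total = sum(m[i][i] for i in range(n))
    kept = 0.0
    for _ in range(k):
        lam, v = leading_eigenpair(m)
        kept += lam
        # Deflate: remove the direction just found before looking for the next.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return kept / total


temps = [20.0 + 0.5 * t for t in range(20)]  # hypothetical readings in °C
healthy = [(T, T, T) for T in temps]
stuck = [(T, T, 0.0 if t % 2 else T) for t, T in enumerate(temps)]  # third sensor stuck on odd steps

print(energy_fraction(healthy, 1), energy_fraction(stuck, 1), energy_fraction(stuck, 2))
```

With three healthy sensors one direction retains essentially all of the energy. Once the third sensor gets stuck intermittently, one direction is no longer enough but two are.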

Number of hidden variables (PCs): keep track of the energy maintained by the approximation with k variables (PCs), i.e., the reconstruction accuracy w.r.t. total squared error. Increment (or decrement) k if the fraction of energy maintained goes below (or above) a threshold: if below 95%, k ← k + 1; if above 98%, k ← k − 1.
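The adjustment rule on this slide amounts to a small function. In the sketch below the 95% and 98% thresholds come from the slide. The name `adjust_k`, the floor of one hidden variable, and the optional cap `k_max` are assumptions added to keep the example well defined.

```python
def adjust_k(k, energy_kept, energy_total, low=0.95, high=0.98, k_max=None):
    """Return the new number of hidden variables given the energy retained."""
    fraction = energy_kept / energy_total
    if fraction < low and (k_max is None or k < k_max):
        return k + 1  # approximation too coarse: add a hidden variable
    if fraction > high and k > 1:
        return k - 1  # more variables than needed: drop one
    return k


print(adjust_k(1, 91.0, 100.0))   # below 95%
print(adjust_k(2, 99.0, 100.0))   # above 98%
print(adjust_k(2, 96.5, 100.0))   # between the thresholds
```

The gap between the two thresholds acts as a dead band, so k does not flip back and forth when the retained energy hovers around a single cut-off.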

References: 1. Spiros Papadimitriou, Jimeng Sun, Christos Faloutsos. Streaming pattern discovery in multiple time-series. In VLDB'05. (Part of the slides are also borrowed from their VLDB presentation.)

Questions?