Enabling Real Time Alerting through streaming pattern discovery Chengyang Zhang Computer Science Department University of North Texas 11/21/2016 CRI Group.


1 Enabling Real Time Alerting through streaming pattern discovery Chengyang Zhang Computer Science Department University of North Texas 11/21/2016 CRI Group Meeting

2 Motivation
Real-time event detection
  - Flood detection
  - Sudden change in water quality
System malfunction detection
  - Some sensors are dead
  - Some sensors give incorrect readings
In any case, it is likely that there are some anomalous pattern changes in the data!

3 How to automatically identify such changes
Threshold-based approach
  - Example record (temp; relative humidity; barometric pressure; sun radiation):
      -6999.000000; 95.000000; 751.000000; 0.000000;
    We can easily set the valid temperature range to, say, between 0 °F and 150 °F.
  - Problems:
      - Thresholds have to be assigned manually
      - Cannot reflect temporal patterns
          Temp; relative humidity; barometric pressure; sun radiation; time
          50; 95.000000; 751.000000; 0.000000; 2008-10-22 13:00:00
      - Cannot reflect correlations between sensor readings
          Time; UVB; Temp
          2008-10-24 11:00:00; 0.727; 58.9
          2008-10-24 12:00:00; 0.827; 69.8
          2008-10-24 13:00:00; 0.774; 73.4
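
As an illustration of the threshold-based approach above, here is a minimal Python sketch. It assumes the record layout shown on the slide (temperature first, fields separated by semicolons) and uses the 0 °F to 150 °F range suggested there; the function names are hypothetical and not part of any existing system.

```python
# Assumed valid range, taken from the slide's "say, between 0 °F and 150 °F".
TEMP_RANGE_F = (0.0, 150.0)

def parse_reading(line):
    """Parse 'temp; humidity; pressure; radiation;' into a list of floats."""
    return [float(field) for field in line.split(";") if field.strip()]

def temperature_out_of_range(line, low=TEMP_RANGE_F[0], high=TEMP_RANGE_F[1]):
    """Return True when the first field (temperature, °F) lies outside [low, high]."""
    temp = parse_reading(line)[0]
    return not (low <= temp <= high)

bad = "-6999.000000; 95.000000; 751.000000; 0.000000;"
ok = "50; 95.000000; 751.000000; 0.000000;"
print(temperature_out_of_range(bad))
print(temperature_out_of_range(ok))
```

The check catches the -6999 reading but accepts any 50 °F reading regardless of time of day or of what the other sensors report, which is exactly the limitation the slide lists.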

4 How to automatically identify such changes
Simple statistics-based approach
  - Historical mean
      Temp; relative humidity; barometric pressure; sun radiation; time
      50; 95.000000; 751.000000; 0.000000; 2008-10-22 13:00:00
  - Mean of multiple sensors
  - Problem: still not able to reflect correlations among the data
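
A hedged sketch of the historical-mean idea in Python: a new reading is flagged when it lies too far from the mean of past readings. The three-standard-deviation cutoff and the sample temperatures are assumptions made for illustration; the slide does not specify either.

```python
from statistics import mean, stdev

def deviates_from_history(history, value, max_sigmas=3.0):
    """Flag value when it is more than max_sigmas standard deviations from the historical mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past readings identical: any different value counts as a deviation.
        return value != mu
    return abs(value - mu) > max_sigmas * sigma

# Made-up past temperatures (°F) for the same hour on earlier days.
past_temps = [48.0, 50.0, 52.0, 49.0, 51.0]
print(deviates_from_history(past_temps, 50.0))
print(deviates_from_history(past_temps, 90.0))
```

Each sensor is still judged in isolation here, so a reading that is plausible on its own but inconsistent with neighboring sensors goes unnoticed.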

5 Objective
  - Find hidden trends in the correlated data
  - On-the-fly (real-time) detection
  - Limited memory

6 Slides 7 to 17 are extracted from reference 2

7 1. How to capture correlations?
[Figure: first sensor; temperature T1 (axis ticks at 20 °C and 30 °C) plotted against time]

8 1. How to capture correlations?
[Figure: first sensor and second sensor; temperature T2 (axis ticks at 20 °C and 30 °C) plotted against time]

9 1. How to capture correlations
[Figure: value-pairs plotted as temperature T1 vs. temperature T2, both axes ticked at 20 °C and 30 °C; points labeled time=1, time=2, time=3; the offset along the line is labeled "hidden variable"]
The first three points lie (almost) on a line in the space of value-pairs:
  - O(n) numbers for the slope, and
  - One number for each value-pair (its offset on the line)

10 1. How to capture correlations
[Figure: temperature T1 vs. temperature T2, both axes ticked at 20 °C and 30 °C]
Other pairs also follow the same pattern: they lie (approximately) on this line

11 2. Incremental update
[Figure: temperature T1 vs. temperature T2, both axes ticked at 20 °C and 30 °C; a "new value" point and its "error" relative to the line are marked]
For each new point:
  - Project onto the current line
  - Estimate the error

12 2. Incremental update
[Figure: temperature T1 vs. temperature T2, both axes ticked at 20 °C and 30 °C]
For each new point:
  - Project onto the current line
  - Estimate the error
  - Rotate the line in the direction of the error and in proportion to its magnitude
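
The three steps above can be sketched in Python as follows. This is a minimal single-line (k = 1) version, loosely modeled on the kind of update described in reference 1; the starting direction, the small initial energy, the forgetting factor, and the synthetic temperature stream are all assumptions made for illustration.

```python
import math

def update_line(w, d, x, forgetting=1.0):
    """One incremental step for the tracked direction w and its running energy d."""
    y = sum(wi * xi for wi, xi in zip(w, x))            # project onto current line
    d = forgetting * d + y * y                           # accumulate energy along the line
    err = [xi - y * wi for wi, xi in zip(w, x)]          # estimate the error
    w = [wi + (y / d) * ei for wi, ei in zip(w, err)]    # rotate toward the error
    return w, d

def projection_error(w, x):
    """Distance between x and its reconstruction from the line."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return math.sqrt(sum((xi - y * wi) ** 2 for wi, xi in zip(w, x)))

w, d = [1.0, 0.0], 0.01          # assumed starting direction and initial energy
for t in range(200):             # made-up stream in which T2 tracks T1 exactly
    temp = 20.0 + (t % 11)
    w, d = update_line(w, d, [temp, temp])

print([round(c, 3) for c in w])
print(round(projection_error(w, [25.0, 25.0]), 3))   # consistent pair
print(round(projection_error(w, [25.0, 40.0]), 3))   # second sensor disagrees
```

Only the direction `w` and the scalar `d` are kept between steps, which fits the limited-memory objective; a large projection error on a new value-pair is the signal that the correlation has changed.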

13 Stream correlations
Principal Component Analysis (PCA)
  - The "line" is the first principal component (PC) vector
  - This line is optimal: it minimizes the sum of squared projection errors
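
To make the optimality claim concrete, the following Python sketch computes the first principal component of a handful of two-sensor readings in closed form and compares its squared projection error with that of other directions. The readings are made up, the data are mean-centered before fitting, and the function names are hypothetical.

```python
import math

def first_pc_2d(points):
    """First principal component of 2-D points, plus the mean-centered points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(p[0] - mx, p[1] - my) for p in points]
    sxx = sum(x * x for x, _ in centered)
    syy = sum(y * y for _, y in centered)
    sxy = sum(x * y for x, y in centered)
    # Angle of the direction of maximum variance for a 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta)), centered

def squared_projection_error(direction, centered):
    """Sum of squared distances from the points to the line along direction."""
    ux, uy = direction
    return sum(x * x + y * y - (x * ux + y * uy) ** 2 for x, y in centered)

# Made-up (T1, T2) readings in °C that roughly track each other.
pairs = [(20, 20.5), (22, 21.5), (24, 24.5), (26, 25.5), (28, 28.5), (30, 29.5)]
pc, centered = first_pc_2d(pairs)
best = squared_projection_error(pc, centered)
others = [
    squared_projection_error((math.cos(math.radians(a)), math.sin(math.radians(a))), centered)
    for a in range(0, 180, 5)
]
print(best <= min(others))
```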

14 3. Number of hidden variables
[Figure: value-tuple space with axes T1, T2, T3]
If we had three sensors with similar measurements, the points would again lie on a line (i.e., one hidden variable, k = 1), but in 3-D space

15 3. Number of hidden variables
[Figure: value-tuple space with axes T1, T2, T3]
Assume one sensor intermittently gets stuck
Now, no line can give a good approximation

16 3. Number of hidden variables
[Figure: value-tuple space with axes T1, T2, T3]
Assume one sensor intermittently gets stuck
Now, no line can give a good approximation
But a plane will do (two hidden variables, k = 2)
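
The stuck-sensor case can be checked numerically. The Python sketch below builds a made-up three-sensor stream in which the third sensor stays at 20 °C for a while, then measures how much of the total (mean-centered) energy one and two principal components retain. Power iteration with deflation is used here only to keep the example dependency-free; it is an illustrative choice, not the method of the cited work.

```python
import math

def scatter_matrix(rows):
    """Sum of outer products of the mean-centered rows."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) for j in range(dim)] for i in range(dim)]

def top_eigenpair(m, iters=200):
    """Largest eigenvalue and its eigenvector of a symmetric matrix, by power iteration."""
    v = [1.0 / (i + 1) for i in range(len(m))]   # arbitrary non-zero start vector
    for _ in range(iters):
        mv = [sum(a * b for a, b in zip(row, v)) for row in m]
        norm = math.sqrt(sum(c * c for c in mv))
        v = [c / norm for c in mv]
    mv = [sum(a * b for a, b in zip(row, v)) for row in m]
    return sum(a * b for a, b in zip(v, mv)), v

def energy_fractions(rows, max_k):
    """Fraction of total energy kept by the first 1..max_k principal components."""
    m = scatter_matrix(rows)
    dim = len(m)
    total = sum(m[i][i] for i in range(dim))
    kept, fractions = 0.0, []
    for _ in range(max_k):
        lam, v = top_eigenpair(m)
        kept += lam
        fractions.append(kept / total)
        # Deflate: remove the component just found before searching for the next.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(dim)] for i in range(dim)]
    return fractions

rows = []
for t in range(66):
    temp = 20.0 + (t % 11)
    t3 = 20.0 if 22 <= t < 44 else temp   # third sensor stuck at 20 °C for a while
    rows.append([temp, temp, t3])

one_pc, two_pcs = energy_fractions(rows, 2)
print(round(one_pc, 3), round(two_pcs, 3))
```

With this data a single component leaves a visible share of the energy unexplained, while two components reconstruct the stream essentially exactly, matching the line-versus-plane picture.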

17 Number of hidden variables (PCs)
Keep track of the energy maintained by the approximation with k variables (PCs):
  - Reconstruction accuracy, w.r.t. total squared error
  - Increment (or decrement) k if the fraction of energy maintained goes below (or above) a threshold
  - If below 95%, k ← k + 1
  - If above 98%, k ← k − 1
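
The rule above translates into a few lines of Python. The 95% and 98% thresholds come from the slide; the lower bound of one component, the optional `max_k` cap, and the function name are assumptions added so the sketch is safe to call.

```python
def adjust_k(k, kept_energy, total_energy, low=0.95, high=0.98, max_k=None):
    """Return the new number of hidden variables given the energy retained by k of them."""
    if total_energy <= 0:
        return k                      # nothing observed yet, leave k unchanged
    fraction = kept_energy / total_energy
    if fraction < low and (max_k is None or k < max_k):
        return k + 1                  # approximation too coarse: add a hidden variable
    if fraction > high and k > 1:
        return k - 1                  # more detail than needed: drop one
    return k

print(adjust_k(1, 90.0, 100.0))
print(adjust_k(2, 99.0, 100.0))
print(adjust_k(2, 96.5, 100.0))
```

The gap between the two thresholds acts as a dead band, so k does not flip back and forth when the retained energy hovers near a single cutoff.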

18 References
1. Spiros Papadimitriou, Jimeng Sun, Christos Faloutsos. Streaming pattern discovery in multiple time-series. In VLDB'05. (Part of the slides are also borrowed from their VLDB presentation.)
1. http://data.cs.washington.edu/
2. http://52north.org/index.php?option=com_projects&task=showProject&id=8&Itemid=127

19 Questions?

