Path-State Modeling for Time Series Anomaly Detection Matt Mahoney.

1 Path-State Modeling for Time Series Anomaly Detection Matt Mahoney

2 Outline
Review of time series anomaly detection
– Gecko
– Compression
– Path modeling
Piecewise linear approximation of path
Fast testing using state
Experimental results on NASA valve data

3 Problem: How to Detect Anomalies in Time Series Data
Marotta fuel valve solenoid current (used on Space Shuttle)
[Figure: two current traces, one normal and one abnormal (poppet partially blocked)]

4 Goal
Reduce human workload in specifying the “normal” model
Editable rule-based model (in SCL)
Real-time testing (1K-10K samples per second)

5 Manual Method
Identify features (zero crossings, peaks…)
Specify correct behavior using SCL rules

6 Gecko (Stan Salvador)
Identify model states (parabolic segments)
– Multiple training series are averaged by dynamic time warping
Classify points (x, dx, d²x) using RIPPER
Construct linear state machine
Pass/fail test result

7 Compression Model
[Figure: four panels (normal uncompressed, abnormal uncompressed, normal compressed, abnormal compressed), with series labeled Normal 1, Normal 2, Normal 1 or 2, and Abnormal]

8 TEK Compression Anomaly Scores

9 Goal Evaluation
                  Manual   Gecko   Compression
Reduce workload   No       Yes     Yes
Real time         Yes      Yes     Possible
Editable model    Yes      Yes     No

10 Problem with Gecko/RIPPER: State Machine May Underconstrain Model
Training
– Segment 1: x = 0, dx = 0
– Segment 2: 0 < x < 1, dx = 1
Test
– Segment 1: x = 0, dx = 0
– Segment 2: 0 < x < 1, dx = 3
State machine: State 1 → (dx > 0.5) → State 2 → Accept
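The example on this slide can be reproduced with a toy two-state machine. This is only a sketch: the function name `accepts` and the list-of-slopes input are made up for illustration and are not Gecko's actual rule format. It shows that a learned transition rule of dx > 0.5 accepts the test series even though its slope is three times the training slope.

```python
def accepts(segment_slopes, threshold=0.5):
    """Walk a hypothetical two-state machine: state 1 -> state 2 when dx > threshold."""
    state = 1
    for dx in segment_slopes:
        if state == 1 and dx > threshold:
            state = 2
    return state == 2  # accept only if state 2 was reached

training = [0.0, 1.0]  # segment 1: dx = 0, segment 2: dx = 1
test = [0.0, 3.0]      # segment 2 rises three times too fast

print(accepts(training))  # True
print(accepts(test))      # True: dx > 0.5 cannot tell dx = 3 from dx = 1
```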

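11 Path Model
[Figure: training path and test path in the (x, dx) plane, scaled to the unit cube, with points 1, 2, 3 marked on each; the test path lies at d² = 4 from the training path]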

12 Path Model Example
[Figure: x, dx, d²x, and anomaly score for a training series and for test series labeled Training, Normal, Too steep, and Too low]

13 Example TEK Results
[Figure: anomaly scores for TEK 0 (training), TEK 1 (normal), TEK 10, TEK 11, and TEK 12]

14 Problems with Path Modeling
Testing is slow, O(n²)
– Compares each of n test points to all n training points
Model is complex (stores n points)
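The O(n²) cost comes from the nested comparison. A minimal sketch follows, assuming the anomaly score of a test point is its squared Euclidean distance to the nearest training point. The function name and the tiny (x, dx) data are illustrative, and scaling to the unit cube is omitted.

```python
def nearest_point_scores(train, test):
    """For each test point, squared Euclidean distance to the nearest training point."""
    scores = []
    for p in test:                      # n test points ...
        best = min(                     # ... each compared to n training points
            sum((a - b) ** 2 for a, b in zip(p, q)) for q in train
        )
        scores.append(best)
    return scores

# (x, dx) points from the underconstrained state machine example
train = [(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)]
test = [(0.0, 0.0), (0.5, 3.0), (1.0, 3.0)]

scores = nearest_point_scores(train, test)
print(scores)                    # [0.0, 4.0, 4.0]
print(max(scores), sum(scores))  # maximum and total anomaly score
```

Unlike the dx > 0.5 rule, the path comparison flags the too-steep segment with a score of 4.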

15 Proposed Solution
Piecewise linear approximation of path
– Editable (k segments, k << n)
– Faster testing, O(kn)
State machine model (nearest segment)
– Fast testing, O(n) (same as Gecko)
– Local minima problem (same as Gecko)

16 Piecewise Approximation Algorithm
Repeat n – k times
– Remove vertex with lowest cost = dh²
Run time is O(n log n) using doubly linked heap
[Figure: diagram labeling the lengths d and h]
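A sketch of greedy vertex removal follows. The slide does not define d and h in text, so this assumes d is the distance between a vertex's two neighbors and h is the vertex's distance from the segment joining them. The names `simplify` and `point_segment_dist` are illustrative. A binary heap with stale-entry skipping, plus prev/next index arrays, stands in for the "doubly linked heap".

```python
import heapq
import math

def point_segment_dist(p, a, b):
    """Distance from point p to the line segment a-b (any dimension)."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(u * v for u, v in zip(ap, ab)) / denom))
    return math.dist(p, [ai + t * c for ai, c in zip(a, ab)])

def simplify(points, k):
    """Remove lowest-cost interior vertices until k vertices remain (n - k removals)."""
    n = len(points)
    prev = list(range(-1, n - 1))   # doubly linked list over vertex indices
    nxt = list(range(1, n + 1))
    alive = [True] * n
    version = [0] * n               # bumped whenever a vertex's cost changes

    def cost(i):                    # assumed cost: d * h^2
        a, b = points[prev[i]], points[nxt[i]]
        return math.dist(a, b) * point_segment_dist(points[i], a, b) ** 2

    heap = [(cost(i), 0, i) for i in range(1, n - 1)]  # endpoints are never removed
    heapq.heapify(heap)
    remaining = n
    while remaining > k and heap:
        _, ver, i = heapq.heappop(heap)
        if not alive[i] or ver != version[i]:
            continue                # stale heap entry, skip it
        alive[i] = False
        remaining -= 1
        p, q = prev[i], nxt[i]
        nxt[p], prev[q] = q, p      # unlink vertex i
        for j in (p, q):            # neighbors get new costs
            if 0 < j < n - 1:
                version[j] += 1
                heapq.heappush(heap, (cost(j), version[j], j))
    return [points[i] for i in range(n) if alive[i]]

path = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 1), (5, 2), (6, 3)]
print(simplify(path, 3))  # [(0, 0), (3, 0), (6, 3)]
```

Each removal pushes at most two new heap entries, so the total number of heap operations stays O(n log n).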

17 Test k: compare to all segments
[Figure: x, dx, anomaly score, and nearest segment (0-19) for TEK 0 (training), TEK 3 (near normal), TEK 12 (stuck poppet), and TEK 16 (late release)]

18 Paths (not segmented)
[Figure: paths in (x, dx, d²x) space for TEK 0, TEK 3, TEK 12, and TEK 16]

19 TEK 0 approximation with k = 20 segments

20 Test 2: compare only to current and next segment (fails)
TEK 0: training
TEK 3: OK
TEK 12: local minima
TEK 16: local minima

21 Test 4 segments (previous, current, next 2) succeeds
[Figure panels: Training; OK; Skips past minimum; Transitions back]

22 Test 4 fails with k = 50
[Figure panels: Training; OK; Not complete; Delayed completion]

23 Test 5 (previous, current, next 2, and one random segment) succeeds

24 Path Fitting (optimal if no sharp bends)
Repeat n – k times
– Remove lowest cost vertex (cost = dh²)
– Move adjacent vertices by h/4 toward removed vertex

25 Vertex Removal vs. Path Fitting
TEK 0 self anomaly scores
– Path fitting better for k > 50
– Vertex removal better for k < 50
       Vertex removal          Path fitting
k      Maximum    Total        Maximum    Total
200    0.000008   0.000656     0.000005   0.000350
100    0.000057   0.005802     0.000019   0.003903
50     0.000345   0.027968     0.000542   0.025327
20     0.010298   0.601229     0.015872   0.961845

26 Path Modeling vs. Gecko
Data: Voltage Test 1 at 14V, 16V, 18V... to 32V
– 10 x 20K points
– 31 sets of 1-3 training files
Gecko
– Transition threshold = 3
– Error threshold = 10 or 20
– Results: pass at 10 (P), pass at 20 (P/F) or fail
Path Modeling
– Filter delay 2 x 50 samples per dimension
– k = 50 segments
– Test 5 (last, current, next 2, and random)
– Results: maximum and total anomaly score

27 Typical Results
(+ = training file)
Test file                  Train   Maximum    Total        Gecko
V37898 V14 T21 R00s.txt            0.041018   58.254755
V37898 V16 T21 R00s.txt            0.021778   43.696323
V37898 V18 T21 R00s.txt            0.006596   26.814669
V37898 V20 T21 R00s.txt    +       0.000913   0.705107     P
V37898 V22 T21 R00s.txt            0.008819   48.095410    P/F
V37898 V24 T21 R00s.txt            0.006635   23.487464    P
V37898 V26 T21 R00s.txt    +       0.000361   0.593473     P
V37898 V28 T21 R00s.txt            0.009032   48.236476
V37898 V30 T21 R00s.txt            0.033475   194.134671
V37898 V32 T21 R00s.txt            0.076193   448.467580

28 Gecko Summary (Stan)
Gecko
– 1 training file: correct behavior
    10 self: 10 P (100% correct)
    90 others: 3 P/F, 87 F (97-100% correct)
– 2-3 training files: some generalization
    26 self: 23 P, 3 F (14V, 14V, 16V) (88% correct)
    – 14V is too different from the others
    22 “between”: 8 P, 6 P/F, 8 F (36-63% correct)
    162 others: 1 P/F, 161 F (99-100% correct)

29 Path Model Summary
Anomaly score proportional to training-test difference (correct)
Multiple training sets: no generalization (expected)

30 Run Time Performance
Tested on data set 1 (218 x 20K points)
– 50 training files = 10⁶ samples
– 168 test files = 3.36 x 10⁶ samples
750 MHz Duron, tsad4.cpp, g++ 2.95.2 with -O
– Read and filter 10⁶ points: 23 sec
– Approximate to k = 100 segments: 30 sec
– Test k: 162 sec (500 ns per point per segment)
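The per-point figure can be checked from the numbers on this slide, assuming Test k ran over all 168 test files against the k = 100 segment model. The variable names are illustrative.

```python
test_points = 168 * 20_000   # 3.36e6 test samples
segments = 100               # assumed: the k = 100 model from the previous step
seconds = 162                # reported Test k run time

ns_per_point_segment = seconds / (test_points * segments) * 1e9
print(round(ns_per_point_segment))  # 482, close to the quoted 500 ns
```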

31 Summary
                  Path Model                         Gecko
Meets all goals   Yes                                Yes
Output            Numeric                            Pass/fail
Training speed    O(n log n)                         O(n²) (DTW)
Test speed        O(n)                               O(n)
Parameters        Filter delay, number of segments   Transition and error thresholds
Local minima      Yes                                Yes
Generalization    No                                 Some

32 Future Work
Test path modeling with other data sets
– UCR archive, http://www.cs.ucr.edu/~eamonn/TSDMA/
– Power load profiles, http://www.delelect.com/pdfs/Del-Res.txt
Test with multiple dimensions
Generalization?

33 Thank You
Further reading: http://cs.fit.edu/~mmahoney/nasa/

