Path-State Modeling for Time Series Anomaly Detection Matt Mahoney.

Outline
Review of time series anomaly detection
– Gecko
– Compression
– Path modeling
Piecewise linear approximation of path
Fast testing using state
Experimental results on NASA valve data

Problem: How to Detect Anomalies in Time Series Data
Marotta fuel valve solenoid current (used on Space Shuttle)
[Figure panels: Normal; Abnormal (poppet partially blocked)]

Goal
Reduce human workload in specifying "normal" model
Editable rule-based model (in SCL)
Real-time testing (1K-10K samples per second)

Manual Method
Identify features (zero crossings, peaks…)
Specify correct behavior using SCL rules

Gecko (Stan Salvador)
Identify model states (parabolic segments)
– Multiple training series are averaged by dynamic time warping
Classify points (x, dx, d^2x) using RIPPER
Construct linear state machine
Pass/fail test result

Compression Model
[Figure panels: Normal, uncompressed; Abnormal, uncompressed; Normal, compressed; Abnormal, compressed]
[Figure labels: Normal 1; Normal 2; Normal 1 or 2; Abnormal]

TEK Compression Anomaly Scores

Goal Evaluation
                  Manual   Gecko   Compression
Reduce workload   No       Yes     Yes
Real time         Yes      Yes     Possible
Editable model    Yes      Yes     No

Problem with Gecko/RIPPER: State Machine May Underconstrain Model
Training
– Segment 1: x = 0, dx = 0
– Segment 2: 0 < x < 1, dx = 1
Test
– Segment 1: x = 0, dx = 0
– Segment 2: 0 < x < 1, dx = 3
[Diagram: State 1; State 2; transition rule dx > 0.5; Accept]
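The underconstraint can be shown in a few lines. This is only an illustration of the slide's example, assuming the learned rule for entering state 2 is the single threshold dx > 0.5; the function name `enters_state2` is hypothetical and not part of Gecko.

```python
def enters_state2(x, dx):
    """Threshold rule from the slide's diagram: move to state 2 when dx > 0.5.

    x is accepted but unused, because the rule on the slide tests only dx.
    """
    return dx > 0.5


train_point = (0.5, 1.0)  # training segment 2: 0 < x < 1, dx = 1
test_point = (0.5, 3.0)   # test segment 2: same x range, dx = 3

print(enters_state2(*train_point))  # True
print(enters_state2(*test_point))   # True, although the slope is three times the trained one
```

Both points satisfy the rule, so the test series is accepted even though it never follows the training path.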

Path Model
[Figure: training and test paths plotted as dx against x, axis ticks 1 to 3]
Training path (scaled to unit cube)
Test path (d^2 = 4)
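A minimal sketch of the path-model score, assuming the score of a test point is its squared Euclidean distance to the nearest training point (the slide's d^2). The function names are illustrative, and the data reuse the earlier example with dx = 1 in training and dx = 3 in test.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def path_anomaly_scores(train, test):
    """Score each test point by its squared distance to the nearest training point.

    Brute force: every test point is compared with every training point.
    """
    return [min(squared_distance(t, p) for p in train) for t in test]


# Points are (x, dx). Segment 1 sits at the origin, segment 2 ramps x from 0 to 1.
train = [(0.0, 0.0)] + [(i / 10, 1.0) for i in range(11)]
test = [(0.0, 0.0)] + [(i / 10, 3.0) for i in range(11)]

scores = path_anomaly_scores(train, test)
print(max(scores))  # 4.0
```

The maximum score of 4 matches the d^2 = 4 shown for the test path: the paths differ by 2 in dx and by nothing in x.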

Path Model Example
[Figure: training series and test series labeled Normal, Too steep, Too low; plots of x, dx, d^2x, and anomaly score]

Example TEK Results
[Figure: anomaly score for TEK 0, TEK 1, TEK 10, TEK 11, TEK 12; panel labels include (Training) and (Normal)]

Problems with Path Modeling
Testing is slow, O(n^2)
– Compares n test points to n training points each
Model is complex (stores n points)

Proposed Solution
Piecewise linear approximation of path
– Editable (k segments, k << n)
– Faster testing, O(kn)
State machine model (nearest segment)
– Fast testing, O(n) (same as Gecko)
– Local minima problem (same as Gecko)
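The O(kn) test can be sketched as a point-to-segment distance evaluated against every segment of the approximated path. This is an illustration under assumptions: the score is taken to be the squared Euclidean distance to the nearest segment, and the function names are hypothetical.

```python
def point_segment_sq_dist(p, a, b):
    """Squared Euclidean distance from point p to the line segment a-b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    if denom == 0.0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Position of the projection of p on the line a-b, clamped to the segment.
        t = max(0.0, min(1.0, sum(u * v for u, v in zip(ap, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return sum((pi - ci) ** 2 for pi, ci in zip(p, closest))


def score_all_segments(vertices, test):
    """Compare each of the n test points with all k segments: O(kn)."""
    segments = list(zip(vertices, vertices[1:]))
    return [min(point_segment_sq_dist(p, a, b) for a, b in segments) for p in test]


vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]  # 2 segments
print(score_all_segments(vertices, [(0.5, 0.0), (0.5, 0.5), (2.0, 0.0)]))
# [0.0, 0.25, 1.0]
```

The model now stores only the segment endpoints, which is what makes it small enough to edit by hand.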

Piecewise Approximation Algorithm
Repeat n - k times
– Remove vertex with lowest cost = dh^2
Run time is O(n log n) using doubly linked heap
[Figure: vertex geometry with labels d and h]
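A sketch of vertex removal follows. The slide's figure is not recoverable, so the geometry is an assumption: d is taken as the distance between a vertex's two neighbors and h as the vertex's perpendicular distance from the line joining them. The doubly linked heap is approximated with index arrays for the links plus Python's `heapq` with stale-entry skipping, which keeps the run time at O(n log n).

```python
import heapq
import math


def removal_cost(prev_pt, cur_pt, next_pt):
    """Assumed cost d * h^2 (d = |next - prev|, h = distance of cur from line prev-next)."""
    ab = [n - p for p, n in zip(prev_pt, next_pt)]
    ap = [c - p for p, c in zip(prev_pt, cur_pt)]
    d2 = sum(c * c for c in ab)
    if d2 == 0.0:
        return 0.0
    dot = sum(u * v for u, v in zip(ap, ab))
    h2 = max(0.0, sum(c * c for c in ap) - dot * dot / d2)
    return math.sqrt(d2) * h2


def simplify(points, k):
    """Remove the cheapest interior vertex n - k times, leaving k vertices."""
    n = len(points)
    if k < 2:
        raise ValueError("k must be at least 2 so both endpoints are kept")
    left = list(range(-1, n - 1))   # doubly linked list over surviving vertices
    right = list(range(1, n + 1))
    alive = [True] * n
    version = [0] * n               # marks older heap entries as stale
    heap = []

    def push(i):
        if 0 < i < n - 1 and alive[i]:
            version[i] += 1
            cost = removal_cost(points[left[i]], points[i], points[right[i]])
            heapq.heappush(heap, (cost, version[i], i))

    for i in range(1, n - 1):
        push(i)
    remaining = n
    while remaining > k and heap:
        _, ver, i = heapq.heappop(heap)
        if not alive[i] or ver != version[i]:
            continue                # stale entry: the cost was recomputed later
        alive[i] = False
        remaining -= 1
        p, q = left[i], right[i]
        right[p], left[q] = q, p    # unlink vertex i
        push(p)                     # neighbors get fresh costs
        push(q)
    return [points[i] for i in range(n) if alive[i]]


path = [(0, 0), (1, 0), (2, 0), (3, 5), (4, 0), (5, 0)]
print(simplify(path, 4))  # [(0, 0), (2, 0), (3, 5), (5, 0)]
```

The collinear vertex goes first because its cost is zero, while the peak at (3, 5) survives because removing it would cost far more.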

Test k: compare to all segments
TEK0 training
TEK3 near normal
TEK12 stuck poppet
TEK16 late release
[Figure: x, dx, anomaly score, and nearest segment (0-19) for each series]

Paths (not segmented)
[Figure: paths in (x, dx, d^2x) for TEK 0, TEK 3, TEK 12, TEK 16]

TEK 0 approximation with k = 20 segments

Test 2: compare only to current and next segment (fails)
TEK 0 training
TEK 3 OK
TEK 12 local minima
TEK 16 local minima

Test 4 segments (previous, current, next 2) succeeds
[Figure labels: Training; OK; Skips past minimum; Transitions back]

Test 4 fails with k = 50
[Figure labels: Training; OK; Not complete; Delayed completion]

Test 5 (previous, current, next 2, and one random segment) succeeds
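A sketch of the state-based test with the Test 5 candidate set. It rests on assumptions: the state is the index of the nearest segment found for the previous point, the score is the squared distance to the best candidate, and the random segment is drawn uniformly. `state_scores` is an illustrative name.

```python
import random


def point_segment_sq_dist(p, a, b):
    """Squared Euclidean distance from point p to the line segment a-b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0.0 else max(0.0, min(1.0, sum(u * v for u, v in zip(ap, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return sum((pi - ci) ** 2 for pi, ci in zip(p, closest))


def state_scores(vertices, test, rng=None):
    """Check at most 5 segments per point: previous, current, next 2, one random."""
    rng = rng or random.Random(0)
    segments = list(zip(vertices, vertices[1:]))
    k = len(segments)
    state = 0  # index of the current segment
    scores = []
    for p in test:
        candidates = {s for s in (state - 1, state, state + 1, state + 2) if 0 <= s < k}
        candidates.add(rng.randrange(k))  # random probe to escape a local minimum
        dists = {s: point_segment_sq_dist(p, *segments[s]) for s in candidates}
        state = min(dists, key=dists.get)
        scores.append(dists[state])
    return scores


vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
on_path = [(0.1, 0.0), (0.9, 0.0), (1.0, 0.5), (1.5, 1.0)]
print(state_scores(vertices, on_path))  # [0.0, 0.0, 0.0, 0.0]
```

Because the candidate set has a fixed size, testing costs O(n) regardless of k. The random probe gives the state a chance to jump to a distant segment when the local window is stuck.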

Path Fitting (optimal if no sharp bends)
Repeat n - k times
– Remove lowest cost vertex (cost = dh^2)
– Move adjacent vertices by h/4 toward removed vertex
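The adjustment step can be sketched as below. This is one reading of the slide, not a confirmed implementation: h is assumed to be the removed vertex's perpendicular distance from the line through its neighbors, and each neighbor is assumed to move a distance of h/4 along the straight line toward the removed vertex. The function names are hypothetical.

```python
import math


def perpendicular_height(prev_pt, cur_pt, next_pt):
    """Assumed h: distance of cur_pt from the line through prev_pt and next_pt."""
    ab = [n - p for p, n in zip(prev_pt, next_pt)]
    ap = [c - p for p, c in zip(prev_pt, cur_pt)]
    d2 = sum(c * c for c in ab)
    ap2 = sum(c * c for c in ap)
    if d2 == 0.0:
        return math.sqrt(ap2)  # neighbors coincide: fall back to point distance
    dot = sum(u * v for u, v in zip(ap, ab))
    return math.sqrt(max(0.0, ap2 - dot * dot / d2))


def path_fit_adjust(prev_pt, cur_pt, next_pt):
    """Return the two neighbors after moving each h/4 toward the removed vertex."""
    h = perpendicular_height(prev_pt, cur_pt, next_pt)

    def toward_removed(v):
        dist = math.dist(v, cur_pt)
        if dist == 0.0:
            return tuple(v)
        step = (h / 4) / dist  # fraction of the way to the removed vertex
        return tuple(vi + step * (ci - vi) for vi, ci in zip(v, cur_pt))

    return toward_removed(prev_pt), toward_removed(next_pt)


new_prev, new_next = path_fit_adjust((-3.0, 0.0), (0.0, 4.0), (3.0, 0.0))
print(new_prev, new_next)
```

Here h = 4 and each neighbor lies 5 units from the removed vertex, so each moves 1 unit along that line, to roughly (-2.4, 0.8) and (2.4, 0.8). Since h never exceeds the distance from a neighbor to the removed vertex, the step cannot overshoot it.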

Vertex Removal vs. Path Fitting
TEK 0 self anomaly scores
– Path fitting better for k > 50
– Vertex removal better for k < 50
        Vertex removal          Path fitting
k       Maximum     Total       Maximum     Total
200     0.000008    0.000656    0.000005    0.000350
100     0.000057    0.005802    0.000019    0.003903
50      0.000345    0.027968    0.000542    0.025327
20      0.010298    0.601229    0.015872    0.961845

Path Modeling vs. Gecko
Data: Voltage Test 1 at 14V, 16V, 18V... to 32V
– 10 x 20K points
– 31 sets of 1-3 training files
Gecko
– Transition threshold = 3
– Error threshold = 10 or 20
– Results: pass at 10 (P), pass at 20 (P/F) or fail
Path Modeling
– Filter delay 2 x 50 samples per dimension
– k = 50 segments
– Test 5 (last, current, next 2, and random)
– Results: maximum and total anomaly score

Typical Results (+ = training file)
Test file                  Train   Maximum    Total        Gecko
V37898 V14 T21 R00s.txt            0.041018   58.254755
V37898 V16 T21 R00s.txt            0.021778   43.696323
V37898 V18 T21 R00s.txt            0.006596   26.814669
V37898 V20 T21 R00s.txt    +       0.000913   0.705107     P
V37898 V22 T21 R00s.txt            0.008819   48.095410    P/F
V37898 V24 T21 R00s.txt            0.006635   23.487464    P
V37898 V26 T21 R00s.txt    +       0.000361   0.593473     P
V37898 V28 T21 R00s.txt            0.009032   48.236476
V37898 V30 T21 R00s.txt            0.033475   194.134671
V37898 V32 T21 R00s.txt            0.076193   448.467580

Gecko Summary (Stan)
Gecko
– 1 training file: correct behavior
    10 self: 10 P (100% correct)
    90 others: 3 P/F, 87 F (97-100% correct)
– 2-3 training files: some generalization
    26 self: 23 P, 3 F (14V, 14V, 16V) (88% correct)
      – 14V is too different from the others
    22 "between": 8 P, 6 P/F, 8 F (36-63% correct)
    162 others: 1 P/F, 161 F (99-100% correct)

Path Model Summary
Anomaly score proportional to training-test difference (correct)
Multiple training sets: no generalization (expected)

Run Time Performance
Tested on data set 1 (218 x 20K points)
– 50 training files = 10^6 samples
– 168 test files = 3.36 x 10^6 samples
750 MHz Duron, tsad4.cpp, g++ 2.95.2 with -O
– Read and filter 10^6 points: 23 sec
– Approximate to k = 100 segments: 30 sec
– Test k: 162 sec (500 ns per point per segment)
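The per-point figure can be checked from the numbers on the slide, assuming the Test k run used the same k = 100 segments as the approximation step. The variable names are illustrative.

```python
test_files = 168
points_per_file = 20_000
segments = 100       # assumption: Test k ran against the k = 100 model
test_seconds = 162

test_samples = test_files * points_per_file  # 3.36 x 10^6
ns_per_point_per_segment = test_seconds / (test_samples * segments) * 1e9

print(test_samples)                     # 3360000
print(round(ns_per_point_per_segment))  # 482
```

About 482 ns per point per segment is consistent with the rounded 500 ns quoted on the slide.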

Summary
                  Path Model                          Gecko
Meets all goals   Yes                                 Yes
Output            Numeric                             Pass/fail
Training speed    O(n log n)                          O(n^2) (DTW)
Test speed        O(n)                                O(n)
Parameters        Filter delay, number of segments    Transition and error thresholds
Local minima      Yes                                 Yes
Generalization    No                                  Some

Future Work
Test path modeling with other data sets
– UCR archive, http://www.cs.ucr.edu/~eamonn/TSDMA/
– Power load profiles, http://www.delelect.com/pdfs/Del-Res.txt
Test with multiple dimensions
Generalization?

Thank You
Further reading: http://cs.fit.edu/~mmahoney/nasa/
