Published by Mildred Simon. Modified over 9 years ago.
1
Path-State Modeling for Time Series Anomaly Detection
Matt Mahoney
2
Outline
Review of time series anomaly detection
– Gecko
– Compression
– Path modeling
Piecewise linear approximation of path
Fast testing using state
Experimental results on NASA valve data
3
Problem: How to Detect Anomalies in Time Series Data
[Figure: Marotta fuel valve solenoid current (used on Space Shuttle). Panels: Normal; Abnormal (poppet partially blocked)]
4
Goal
Reduce human workload in specifying “normal” model
Editable rule-based model (in SCL)
Real-time testing (1K-10K samples per second)
5
Manual Method
Identify features (zero crossings, peaks…)
Specify correct behavior using SCL rules
6
Gecko (Stan Salvador)
Identify model states (parabolic segments)
– Multiple training series are averaged by dynamic time warping
Classify points (x, dx, d²x) using RIPPER
Construct linear state machine
Pass/fail test result
7
Compression Model
[Figure: panels labeled Normal, uncompressed; Abnormal, uncompressed; Normal, compressed; Abnormal, compressed. Additional labels: Normal 1; Normal 2; Normal 1 or 2; Abnormal]
8
TEK Compression Anomaly Scores
9
Goal Evaluation
Methods compared: Manual, Gecko, Compression
Reduce workload: No, Yes
Real time: Yes, Possible
Editable model: Yes, No
10
Problem with Gecko/RIPPER: State Machine May Underconstrain Model
Training
– Segment 1: x = 0, dx = 0
– Segment 2: 0 < x < 1, dx = 1
Test
– Segment 1: x = 0, dx = 0
– Segment 2: 0 < x < 1, dx = 3
[Diagram: State 1, State 2; transition label dx > 0.5; Accept]
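The underconstraint on this slide can be reproduced with a toy two-state machine. This is only an illustrative sketch: the function name `accepts` and the sample points are hypothetical, and the single transition rule dx > 0.5 is the one shown on the slide.

```python
def accepts(points, threshold=0.5):
    """Run a two-state linear machine over (x, dx) points.

    The machine starts in state 1, moves to state 2 as soon as dx
    exceeds the threshold, and accepts if it ends in state 2.
    """
    state = 1
    for _x, dx in points:
        if state == 1 and dx > threshold:
            state = 2
    return state == 2

# Training: flat segment, then a ramp with dx = 1.
training = [(0.0, 0.0)] * 3 + [(0.25, 1.0), (0.5, 1.0), (0.75, 1.0)]
# Test: same flat segment, but the ramp has dx = 3.
test = [(0.0, 0.0)] * 3 + [(0.25, 3.0), (0.5, 3.0), (0.75, 3.0)]

print(accepts(training))  # True
print(accepts(test))      # True: dx = 3 also satisfies dx > 0.5
```

The rule only bounds dx from below, so a ramp three times steeper than the training ramp is still accepted.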
11
Path Model
[Figure: training path (scaled to unit cube) and test path plotted as dx against x, each with points 1, 2, 3 marked; test path d² = 4]
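A path of this kind might be built from a raw series as sketched below. The slides do not show the construction, so several things here are assumptions: derivatives are taken with plain finite differences (`np.gradient`) with no low-pass filtering, the path uses the three coordinates (x, dx, d²x), and the helper names `raw_path`, `fit_scale`, and `apply_scale` are hypothetical. The scaling is fitted on the training path and reused for the test path so both live in the same coordinates.

```python
import numpy as np

def raw_path(series):
    """Return an (n, 3) array of unscaled (x, dx, d2x) points."""
    x = np.asarray(series, dtype=float)
    dx = np.gradient(x)      # finite-difference first derivative
    d2x = np.gradient(dx)    # finite-difference second derivative
    return np.column_stack([x, dx, d2x])

def fit_scale(train_raw):
    """Offset and span that map the training path onto the unit cube."""
    lo = train_raw.min(axis=0)
    span = train_raw.max(axis=0) - lo
    span[span == 0] = 1.0    # avoid dividing by zero on a constant column
    return lo, span

def apply_scale(raw, lo, span):
    """Apply a previously fitted scaling to any path."""
    return (raw - lo) / span

t = np.linspace(0.0, 1.0, 200)
train_raw = raw_path(np.sin(2 * np.pi * t))
lo, span = fit_scale(train_raw)
train_path = apply_scale(train_raw, lo, span)
# A test series with larger amplitude leaves the unit cube.
test_path = apply_scale(raw_path(1.5 * np.sin(2 * np.pi * t)), lo, span)
print(train_path.min(axis=0), train_path.max(axis=0))
print(test_path.max(axis=0))
```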
12
Path Model Example
[Figure: paths labeled Training, Normal, Too steep, and Too low, with anomaly score plots; axes x, dx, d²x]
13
Example TEK Results
[Figure: anomaly score plots for TEK 0, TEK 1, TEK 10, TEK 11, and TEK 12; the labels "(Training)" and "(Normal)" appear on the panels]
14
Problems with Path Modeling
Testing is slow, O(n²)
– Compares each of n test points to all n training points
Model is complex (stores n points)
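The quadratic cost is easy to see in a brute-force scorer. The sketch below assumes the anomaly score of a test point is the squared Euclidean distance to its nearest training point, which the slides suggest but do not state outright; the function name `anomaly_scores` is hypothetical.

```python
import numpy as np

def anomaly_scores(train_path, test_path):
    """Squared distance from each test point to its nearest training point.

    Every test point is compared with every training point, so the cost
    is O(n^2) when both paths have n points.
    """
    train_path = np.asarray(train_path, dtype=float)
    scores = np.empty(len(test_path))
    for i, p in enumerate(np.asarray(test_path, dtype=float)):
        d2 = np.sum((train_path - p) ** 2, axis=1)
        scores[i] = d2.min()
    return scores

train = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
test = [[1.0, 0.0], [1.0, 2.0]]
scores = anomaly_scores(train, test)
print(scores.max(), scores.sum())  # maximum and total anomaly score
```

The whole training path has to be kept in memory, which is the "stores n points" complaint on the slide.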
15
Proposed Solution
Piecewise linear approximation of path
– Editable (k segments, k << n)
– Faster testing, O(kn)
State machine model (nearest segment)
– Fast testing, O(n) (same as Gecko)
– Local minima problem (same as Gecko)
16
Piecewise Approximation Algorithm
Repeat n – k times
– Remove vertex with lowest cost = dh²
Run time is O(n log n) using doubly linked heap
[Figure: path vertex with lengths d and h labeled]
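One way the vertex removal loop might look is sketched below. The geometric meaning of d and h comes from a figure that is not reproduced here, so the definitions used are assumptions: d is the distance between a vertex's two neighbors and h is the distance from the vertex to the line through them. The names `vertex_cost` and `simplify` are hypothetical, and the "doubly linked heap" is approximated with `heapq` plus prev/next index lists and lazily discarded stale entries.

```python
import heapq
import numpy as np

def vertex_cost(a, b, c):
    """Cost d * h**2 of removing b (assumed: d = |c - a|, h = distance
    from b to the line through a and c)."""
    ac = c - a
    d = float(np.linalg.norm(ac))
    if d == 0.0:
        return 0.0
    t = np.dot(b - a, ac) / (d * d)
    h = float(np.linalg.norm(b - (a + t * ac)))
    return d * h * h

def simplify(points, k):
    """Remove n - k interior vertices, cheapest first; endpoints stay."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    prev = list(range(-1, n - 1))   # doubly linked list over vertex indices
    nxt = list(range(1, n + 1))
    alive = [True] * n
    version = [0] * n               # bumped whenever a vertex's cost changes
    heap = [(vertex_cost(pts[i - 1], pts[i], pts[i + 1]), 0, i)
            for i in range(1, n - 1)]
    heapq.heapify(heap)
    removed = 0
    while removed < n - k and heap:
        _cost, ver, i = heapq.heappop(heap)
        if not alive[i] or ver != version[i]:
            continue                # stale heap entry, skip it
        alive[i] = False
        removed += 1
        p, q = prev[i], nxt[i]
        nxt[p], prev[q] = q, p      # unlink vertex i
        for j in (p, q):            # neighbors now have new costs
            if 0 < j < n - 1:
                version[j] += 1
                cost = vertex_cost(pts[prev[j]], pts[j], pts[nxt[j]])
                heapq.heappush(heap, (cost, version[j], j))
    return pts[[i for i in range(n) if alive[i]]]

path = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 0), (5, 0)]
print(simplify(path, 4))
```

Each removal pushes at most two new heap entries, so the number of heap operations stays O(n) and the total time O(n log n), matching the bound on the slide.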
17
Test k: compare to all segments
[Figure: dx against x and anomaly score plots for TEK0 (training), TEK3 (near normal), TEK12 (stuck poppet), and TEK16 (late release); nearest segment: 0-19]
18
Paths (not segmented)
[Figure: paths for TEK 0, TEK 3, TEK 12, and TEK 16; axes x, dx, d²x]
19
TEK 0 approximation with k = 20 segments
20
Test 2: compare only to current and next segment (fails)
[Figure: TEK 0 (training); TEK 3 (OK); TEK 12 (local minima); TEK 16 (local minima)]
21
Test 4 segments (previous, current, next 2) succeeds
[Figure: panels labeled Training; OK; Skips past minimum; Transitions back]
22
Test 4 fails with k = 50
[Figure: panels labeled Training; OK; Not complete; Delayed completion]
23
Test 5 (previous, current, next 2, and one random segment) succeeds
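A state-based test of this kind might be sketched as follows. The candidate set (previous, current, next 2, one random segment) is from the slide; the rest is assumed for illustration: the score is the squared distance to the nearest candidate segment, the state becomes that segment, and the names `seg_dist2` and `score_with_state` are hypothetical.

```python
import random
import numpy as np

def seg_dist2(p, a, b):
    """Squared distance from point p to the line segment from a to b."""
    ab = b - a
    denom = float(np.dot(ab, ab))
    t = 0.0 if denom == 0.0 else float(np.clip(np.dot(p - a, ab) / denom, 0.0, 1.0))
    q = a + t * ab
    return float(np.dot(p - q, p - q))

def score_with_state(vertices, test_path, rng=None):
    """Score each test point against five candidate segments only."""
    rng = rng or random.Random(0)
    v = np.asarray(vertices, dtype=float)
    k = len(v) - 1                       # number of segments
    state = 0                            # index of the current segment
    scores = []
    for p in np.asarray(test_path, dtype=float):
        candidates = {state - 1, state, state + 1, state + 2, rng.randrange(k)}
        best_d2, best_s = min((seg_dist2(p, v[s], v[s + 1]), s)
                              for s in candidates if 0 <= s < k)
        scores.append(best_d2)
        state = best_s                   # move to the nearest candidate
    return scores

vertices = [(0, 0), (1, 0), (1, 1), (0, 1)]
test_path = [(0.5, 0.0), (1.0, 0.5), (0.5, 1.1)]
print(score_with_state(vertices, test_path))
```

The candidate set has constant size, so testing n points costs O(n) regardless of k. The random candidate gives the state a chance to escape when it is stuck on a segment that is no longer the nearest one.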
24
Path Fitting (optimal if no sharp bends)
Repeat n – k times
– Remove lowest cost vertex (cost = dh²)
– Move adjacent vertices by h/4 toward removed vertex
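The path fitting variant could be sketched as below. As before, d and h are assumed to be the distance between a vertex's neighbors and the distance from the vertex to the line through them. The names `d_and_h` and `fit_path` are hypothetical, both neighbors are moved even when one is an endpoint (the slide does not say otherwise), and a plain linear scan replaces the heap to keep the sketch short.

```python
import numpy as np

def d_and_h(a, b, c):
    """Return (d, h) for vertex b with neighbors a and c (assumed:
    d = |c - a|, h = distance from b to the line through a and c)."""
    ac = c - a
    d = float(np.linalg.norm(ac))
    if d == 0.0:
        return 0.0, float(np.linalg.norm(b - a))
    t = np.dot(b - a, ac) / (d * d)
    return d, float(np.linalg.norm(b - (a + t * ac)))

def fit_path(points, k):
    """Remove n - k vertices, nudging neighbors toward each removed one."""
    pts = [np.asarray(p, dtype=float) for p in points]
    for _ in range(len(pts) - k):
        if len(pts) < 3:
            break                        # only endpoints are left
        best = None                      # (cost, index, h)
        for i in range(1, len(pts) - 1):
            d, h = d_and_h(pts[i - 1], pts[i], pts[i + 1])
            if best is None or d * h * h < best[0]:
                best = (d * h * h, i, h)
        _cost, i, h = best
        removed = pts.pop(i)
        for j in (i - 1, i):             # the two former neighbors
            step = removed - pts[j]
            dist = float(np.linalg.norm(step))
            if dist > 0.0:
                pts[j] = pts[j] + step * (h / 4.0) / dist
    return np.array(pts)

print(fit_path([(0, 0), (2, 2), (4, 0)], 2))
```

In this example the removed peak has h = 2, so each neighbor moves a distance of 0.5 toward it.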
25
Vertex Removal vs. Path Fitting
TEK 0 self anomaly scores
– Path fitting better for k > 50
– Vertex removal better for k < 50
        Vertex removal          Path fitting
K       Maximum    Total        Maximum    Total
200     0.000008   0.000656     0.000005   0.000350
100     0.000057   0.005802     0.000019   0.003903
50      0.000345   0.027968     0.000542   0.025327
20      0.010298   0.601229     0.015872   0.961845
26
Path Modeling vs. Gecko
Data: Voltage Test 1 at 14V, 16V, 18V... to 32V
– 10 x 20K points
– 31 sets of 1-3 training files
Gecko
– Transition threshold = 3
– Error threshold = 10 or 20
– Results: pass at 10 (P), pass at 20 (P/F) or fail
Path Modeling
– Filter delay 2 x 50 samples per dimension
– k = 50 segments
– Test 5 (last, current, next 2, and random)
– Results: maximum and total anomaly score
27
Typical Results (+ = Train)
Test file                  Train   Maximum     Total        Gecko
V37898 V14 T21 R00s.txt            0.041018     58.254755
V37898 V16 T21 R00s.txt            0.021778     43.696323
V37898 V18 T21 R00s.txt            0.006596     26.814669
V37898 V20 T21 R00s.txt    +       0.000913      0.705107   P
V37898 V22 T21 R00s.txt            0.008819     48.095410   P/F
V37898 V24 T21 R00s.txt            0.006635     23.487464   P
V37898 V26 T21 R00s.txt    +       0.000361      0.593473   P
V37898 V28 T21 R00s.txt            0.009032     48.236476
V37898 V30 T21 R00s.txt            0.033475    194.134671
V37898 V32 T21 R00s.txt            0.076193    448.467580
28
Gecko Summary (Stan)
Gecko
– 1 training file: correct behavior
  10 self: 10 P (100% correct)
  90 others: 3 P/F, 87 F (97-100% correct)
– 2-3 training files: some generalization
  26 self: 23 P, 3 F (14V, 14V, 16V) (88% correct)
  – 14V is too different from the others
  22 “between”: 8 P, 6 P/F, 8 F (36-63% correct)
  162 others: 1 P/F, 161 F (99-100% correct)
29
Path Model Summary
Anomaly score proportional to training-test difference (correct)
Multiple training sets: no generalization (expected)
30
Run Time Performance
Tested on data set 1 (218 x 20K points)
– 50 training files = 10⁶ samples
– 168 test files = 3.36 x 10⁶ samples
750 MHz Duron, tsad4.cpp, g++ -O 2.95.2
– Read and filter 10⁶ points: 23 sec
– Approximate to k = 100 segments: 30 sec
– Test k: 162 sec (500 ns per point per segment)
31
Summary
                  Path Model                          Gecko
Meets all goals   Yes                                 Yes
Output            Numeric                             Pass/fail
Training speed    O(n log n)                          O(n²) (DTW)
Test speed        O(n)                                O(n)
Parameters        Filter delay, number of segments    Transition and error thresholds
Local minima      Yes                                 Yes
Generalization    No                                  Some
32
Future Work
Test path modeling with other data sets
– UCR archive, http://www.cs.ucr.edu/~eamonn/TSDMA/
– Power load profiles, http://www.delelect.com/pdfs/Del-Res.txt
Test with multiple dimensions
Generalization?
33
Thank You
Further Reading
http://cs.fit.edu/~mmahoney/nasa/