Space Shuttle Engine Valve Anomaly Detection by Data Compression Matt Mahoney
Outline: Problem Statement; Related Work; Anomaly Detection by Data Compression; Future Work
Problem: How to Detect Anomalies in Space Shuttle Valves [Figure: solenoid current traces, normal and abnormal]
Current Method: Identify features (zero crossings, peaks…). Specify correct behavior using SCL rules.
Labeled Rising Edge Details
Goal: Reduce the human workload in specifying “normal” behavior of time-series data. Rule output should be in Space Command Language (SCL, an expert system language) to allow manual adjustments. Anomaly detection must run in real time (1K-10K samples per second).
Related Work: Automated waveform segmentation (Gecko, Stan Salvador). Segment characteristics (level, slope, curvature) identify states. Rules are specified as allowed state transitions. Problem: segmentation is slow.
Proposal: Modeling using Data Compression. Train model on “normal” time series. Test by measuring goodness of fit to the trained model.
Cross Entropy: Measures the fitness of a model M relative to a true (but unknown) probability distribution P. Minimized when M = P. Estimated by a data compressor that uses M. $H_M(P) = \sum_{x \in X} -P(x) \log M(x)$, where $H_M(P)$ = cross entropy (compressed data size), $X$ = set of all possible inputs (waveforms), $P(x)$ = true probability of x, $M(x)$ = probability estimated by model M.
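The claim that cross entropy is minimized when M = P can be checked numerically on a toy distribution. The sketch below is illustrative only: the three-symbol source and both models are made-up values, and base-2 logarithms are assumed so the result reads as bits per symbol.

```python
import math


def cross_entropy(p: dict, m: dict) -> float:
    """H_M(P) = sum over x of -P(x) * log2 M(x), in bits per symbol."""
    return sum(-p[x] * math.log2(m[x]) for x in p if p[x] > 0)


# Hypothetical 3-symbol source (illustrative numbers, not from the slides).
p = {"a": 0.5, "b": 0.25, "c": 0.25}

m_exact = dict(p)                              # model equal to the true distribution
m_wrong = {"a": 0.25, "b": 0.25, "c": 0.5}     # model that misjudges "a" and "c"

h_exact = cross_entropy(p, m_exact)   # equals the entropy of P
h_wrong = cross_entropy(p, m_wrong)   # larger: the mismatch costs extra bits

print(f"M = P : {h_exact:.2f} bits/symbol")
print(f"M != P: {h_wrong:.2f} bits/symbol")
```

A compressor driven by the mismatched model would spend the extra bits on every symbol, which is the effect the anomaly detector relies on.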
Measuring Cross Entropy [Figure: normal and abnormal waveforms, uncompressed and compressed; labels: Normal 1, Normal 2, Normal 1 or 2, Abnormal]
Anomaly Score: Score(y) = (C(xy) - C(x)) / C(y), where x = training (normal) waveform, y = test (possibly abnormal) waveform, xy = concatenation of x and y, and C(.) = size after compression. A higher score (worse compression after training) indicates an anomaly.
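The score can be sketched with any off-the-shelf compressor. The example below assumes Python's `zlib` (DEFLATE, the same LZ77 family as GZIP) as a stand-in for C(.), and uses synthetic 1000-byte waveforms rather than the TEK traces; the helper names `compressed_size`, `anomaly_score`, and `trace` are hypothetical.

```python
import math
import random
import zlib


def compressed_size(data: bytes) -> int:
    """C(.): size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def anomaly_score(train: bytes, test: bytes) -> float:
    """Score(y) = (C(xy) - C(x)) / C(y)."""
    return (compressed_size(train + test) - compressed_size(train)) / compressed_size(test)


# One exact period of a sine wave, quantized to byte values (synthetic data).
PERIOD = [round(127 + 100 * math.sin(2 * math.pi * k / 50)) for k in range(50)]


def trace(n: int = 1000, phase: int = 0) -> list:
    """Synthetic 8-bit waveform of n samples; NOT real valve data."""
    return [PERIOD[(i + phase) % 50] for i in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
train = bytes(trace())                                          # x: "normal" training trace
normal = bytes(trace(phase=7))                                  # y: same shape, shifted
abnormal = bytes(v + rng.randint(-20, 20) for v in trace())     # y: shape corrupted by jitter

score_normal = anomaly_score(train, normal)
score_abnormal = anomaly_score(train, abnormal)
print(f"normal: {score_normal:.2f}  abnormal: {score_abnormal:.2f}")
```

Dividing by C(y) normalizes for how compressible the test trace is on its own, so the score reflects how much the training data helped rather than the raw size of y.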
Data Compressors: GZIP (Gailly): LZ77, duplicate strings are replaced by pointers to the previous occurrence. PAQ3 (Mahoney): weighted context mixing; arithmetic coding of next-bit probability. RK 1.04 (Taylor): PPMZ (models longest matching context); delta coding option for analog data.
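To illustrate why a delta coding option helps with analog data, the sketch below applies a generic byte-wise delta transform before compression. This is an assumed textbook form of delta coding, not RK's actual implementation, and `zlib` stands in for the compressor; the slow ramp is synthetic.

```python
import zlib


def delta_encode(samples: bytes) -> bytes:
    """Replace each sample with its difference from the previous one (mod 256)."""
    out = bytearray()
    prev = 0
    for s in samples:
        out.append((s - prev) % 256)
        prev = s
    return bytes(out)


def delta_decode(deltas: bytes) -> bytes:
    """Invert delta_encode by accumulating the differences (mod 256)."""
    out = bytearray()
    prev = 0
    for d in deltas:
        prev = (prev + d) % 256
        out.append(prev)
    return bytes(out)


# Synthetic slowly rising signal: each level is held for 4 samples.
ramp = bytes(i // 4 for i in range(1000))

raw_size = len(zlib.compress(ramp, 9))
delta_size = len(zlib.compress(delta_encode(ramp), 9))
print(f"raw: {raw_size} bytes  delta-coded: {delta_size} bytes")
```

A smooth signal rarely repeats exact byte strings, but its differences do, so string-matching and context models see far more regularity after the transform.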
Data: TEK 0, TEK 1 = normal on/off cycle of Marotta valve S/N; TEK {2, 3, 5, 10, 11, 15, 16, 17} = various forced failures. 1000 solenoid current samples at 1 ms intervals. Range: -3.1 to 7.06 A at 0.04 A resolution. Converted to 8-bit values (1000-byte files).
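The stated range and resolution give (7.06 - (-3.1)) / 0.04 = 254 steps, so each sample fits in one byte and 1000 samples make a 1000-byte file. The sketch below assumes a simple linear mapping with -3.1 A at code 0; the slides do not say how the conversion was actually done, and `quantize` and `dequantize` are hypothetical helpers.

```python
LOW_AMPS = -3.1    # bottom of the stated range
STEP_AMPS = 0.04   # stated resolution


def quantize(current_amps) -> bytes:
    """Map currents in amperes to byte codes (assumed linear, clamped to 0..255)."""
    return bytes(
        min(255, max(0, round((a - LOW_AMPS) / STEP_AMPS))) for a in current_amps
    )


def dequantize(codes: bytes) -> list:
    """Map byte codes back to approximate currents in amperes."""
    return [LOW_AMPS + c * STEP_AMPS for c in codes]


endpoints = quantize([-3.1, 7.06])      # lowest and highest stated currents
one_trace = quantize([0.0] * 1000)      # placeholder 1000-sample trace
print(list(endpoints), len(one_trace))
```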
Experimental Procedure: Nor 0: train on TEK 0, test on TEK 1 (normal). Nor 1: train on TEK 1, test on TEK 0 (normal). Ab 0: train on TEK 0, average of tests on 8 abnormal traces. Ab 1: train on TEK 1, average of tests on 8 abnormal traces.
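The four runs can be written as one loop over the two normal traces. The sketch below is a rough outline under stated assumptions: `zlib` stands in for the compressor, the traces are synthetic placeholders keyed by TEK number rather than the real data, and `run_experiments` is a hypothetical helper.

```python
import math
import random
import zlib
from statistics import mean


def c(data: bytes) -> int:
    """Compressed size in bytes."""
    return len(zlib.compress(data, 9))


def score(train: bytes, test: bytes) -> float:
    """Score(y) = (C(xy) - C(x)) / C(y)."""
    return (c(train + test) - c(train)) / c(test)


def run_experiments(traces: dict, normal_ids=(0, 1)) -> dict:
    """Nor i: train on normal trace i, test on the other normal trace.
    Ab i: train on normal trace i, mean score over all abnormal traces."""
    abnormal_ids = [k for k in sorted(traces) if k not in normal_ids]
    results = {}
    for pos, train_id in enumerate(normal_ids):
        other_id = normal_ids[1 - pos]
        train = traces[train_id]
        results[f"Nor {train_id}"] = score(train, traces[other_id])
        results[f"Ab {train_id}"] = mean(score(train, traces[a]) for a in abnormal_ids)
    return results


# Synthetic placeholder traces (NOT the TEK recordings).
period = [round(127 + 100 * math.sin(2 * math.pi * k / 50)) for k in range(50)]
rng = random.Random(0)
traces = {
    0: bytes(period[i % 50] for i in range(1000)),
    1: bytes(period[(i + 7) % 50] for i in range(1000)),
}
for tek in (2, 3, 5, 10, 11, 15, 16, 17):
    traces[tek] = bytes(period[i % 50] + rng.randint(-20, 20) for i in range(1000))

results = run_experiments(traces)
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

Swapping the training trace between the two normal recordings checks that the result does not depend on which one happened to be used for training.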
Anomaly Scores
Anomaly Scores for TEK 0 [Table: one row per TEK trace; columns: GZIP, PAQ3, RK -mx3 -fd1]
Run Time Performance (750 MHz PC): Real time = 1K samples/sec. GZIP: 3000K samples/sec. PAQ3: 40K samples/sec. RK -mx3 -fd1: 78K samples/sec.
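The reported rates translate directly into headroom over the 1K samples/sec real-time requirement. The sketch below does that arithmetic and also shows one way throughput might be measured for `zlib` on a local machine, assuming one byte per sample; the measured number will differ from the 750 MHz figures, and the helper names are hypothetical.

```python
import time
import zlib


def samples_per_second(compress, data: bytes, repeats: int = 50) -> float:
    """Time repeated compression of one trace; one byte is taken as one sample."""
    start = time.perf_counter()
    for _ in range(repeats):
        compress(data)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return repeats * len(data) / elapsed


def realtime_headroom(rate: float, required: float = 1_000.0) -> float:
    """How many times faster than the required sample rate a compressor runs."""
    return rate / required


# Rates as reported on the slide, in samples per second.
reported = {"GZIP": 3_000_000, "PAQ3": 40_000, "RK -mx3 -fd1": 78_000}
headroom = {name: realtime_headroom(rate) for name, rate in reported.items()}

# Local measurement on a synthetic 1000-sample trace (not valve data).
synthetic_trace = bytes(i % 251 for i in range(1000))
measured = samples_per_second(zlib.compress, synthetic_trace)

for name, factor in headroom.items():
    print(f"{name}: {factor:.0f}x real time")
print(f"zlib on this machine: {measured:,.0f} samples/sec")
```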
Summary: Data compression detects anomalies in the TEK valve data (2 normal, 8 abnormal traces). GZIP and PAQ3 detect anomalies in 8 of 8 cases using either training set. RK detects 7 of 8 anomalies using either training set. (TEK 15 appears more “normal” to all 3 compressors.)
Future Work: Verify with more data sets (voltage, temperature, plunger blockage). Identify anomalous points within the trace. Improve modeling of analog data. Translate models to SCL. Work is preliminary. Much needs to be done.
Thank You For more information,