Space Shuttle Engine Valve Anomaly Detection by Data Compression Matt Mahoney.



Outline
- Problem Statement
- Related Work
- Anomaly Detection by Data Compression
- Future Work

Problem: How to Detect Anomalies in Space Shuttle Valves
[Figure: solenoid current traces, normal and abnormal]

Current Method
- Identify features (zero crossings, peaks…)
- Specify correct behavior using SCL rules

Labeled Rising Edge Details

Goal
- Reduce the human workload in specifying “normal” behavior of time-series data
- Rule output should be in Space Command Language (SCL, an expert system language) to allow manual adjustments
- Anomaly detection must be real time (1K-10K samples per second)

Related Work
- Automated waveform segmentation (Gecko, Stan Salvador)
- Segment characteristics (level, slope, curvature) identify states
- Rules are specified as allowed state transitions
- Problem: segmentation is slow

Proposal: Modeling using Data Compression
- Train model on “normal” time series
- Test by measuring goodness of fit to the trained model

Cross Entropy
- Measures fitness of a model M relative to a true (but unknown) probability distribution, P
- Minimized when M = P
- Estimated by a data compressor that uses M

$$H_M(P) = \sum_{x \in X} -P(x) \log M(x)$$

- $H_M(P)$ = Cross entropy (compressed data size)
- X = set of all possible inputs (waveforms)
- P(x) = true probability of x
- M(x) = estimated probability by model M
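As a small illustration of the "minimized when M = P" point, the sketch below evaluates the cross entropy formula for a toy three-symbol distribution. The distribution, the two candidate models, and the function name `cross_entropy` are all made up for this example, and log base 2 is assumed so the result is in bits.

```python
import math


def cross_entropy(p, m):
    """Cross entropy H_M(P) in bits: sum over x of -P(x) * log2 M(x)."""
    return sum(-p[x] * math.log2(m[x]) for x in p if p[x] > 0)


# Hypothetical "true" distribution P over three symbols (illustration only).
p = {"a": 0.5, "b": 0.25, "c": 0.25}

# Two candidate models: one identical to P, one uniform.
m_exact = dict(p)
m_uniform = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}

h_exact = cross_entropy(p, m_exact)      # equals the entropy of P
h_uniform = cross_entropy(p, m_uniform)  # larger, because M differs from P

print(f"H with M = P:      {h_exact:.3f} bits")
print(f"H with uniform M:  {h_uniform:.3f} bits")
```

The model that matches P gives the smaller value, which is why a compressor trained on normal data is expected to compress further normal data better than abnormal data.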

Measuring Cross Entropy
[Figure labels: Normal, uncompressed; Abnormal, uncompressed; Normal, compressed; Abnormal, compressed; Normal 1, Normal 2; Normal 1 or 2, Abnormal]

Anomaly Score
Score(y) = (C(xy) - C(x)) / C(y)
- x = Training (normal) waveform
- y = Test (possibly abnormal) waveform
- xy = Concatenation of x and y
- C(.) = Size after compression
- A higher score (worse compression after training) indicates an anomaly
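The score can be sketched in a few lines with Python's standard `zlib` module standing in for C(.). This is only an approximation of the GZIP case (zlib uses the same DEFLATE algorithm but a different container), and the waveforms below are synthetic placeholders, not the TEK traces. The function names are chosen for this example.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """C(.): size in bytes after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))


def anomaly_score(train: bytes, test: bytes) -> float:
    """Score(y) = (C(xy) - C(x)) / C(y) for training x and test y."""
    c_x = compressed_size(train)
    c_xy = compressed_size(train + test)
    c_y = compressed_size(test)
    return (c_xy - c_x) / c_y


# Synthetic stand-ins for 1000-sample, one-byte-per-sample traces.
train = bytes(100 + i % 40 for i in range(1000))          # "normal" sawtooth
normal = bytes(100 + (i + 7) % 40 for i in range(1000))   # same shape, shifted
rng = random.Random(0)
abnormal = bytes(rng.randrange(256) for _ in range(1000))  # unrelated noise

normal_score = anomaly_score(train, normal)
abnormal_score = anomaly_score(train, abnormal)
print(f"normal:   {normal_score:.3f}")
print(f"abnormal: {abnormal_score:.3f}")
```

A test trace that resembles the training trace adds little to the compressed concatenation, so its score stays low; a trace the model has not seen costs roughly its full compressed size.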

Data Compressors
- GZIP (Gailly)
  - LZ77: duplicate strings are replaced by pointers to the previous occurrence
- PAQ3 (Mahoney)
  - Weighted context mixing
  - Arithmetic coding of next-bit probability
- RK 1.04 (Taylor)
  - PPMZ (models longest matching context)
  - Delta coding option for analog data
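To show why delta coding can help with analog data, here is a generic byte-wise delta coder. It is not RK's implementation, whose internals are not described in the slides. The ramp signal and the function names are assumptions for illustration, and `zlib` is used only to compare sizes.

```python
import zlib


def delta_encode(samples: bytes) -> bytes:
    """Replace each byte with its difference from the previous one (mod 256)."""
    out = bytearray()
    prev = 0
    for s in samples:
        out.append((s - prev) % 256)
        prev = s
    return bytes(out)


def delta_decode(deltas: bytes) -> bytes:
    """Invert delta_encode by accumulating the differences (mod 256)."""
    out = bytearray()
    prev = 0
    for d in deltas:
        prev = (prev + d) % 256
        out.append(prev)
    return bytes(out)


# A slowly rising ramp: few repeated strings, but very regular differences.
ramp = bytes((i // 4) % 256 for i in range(1000))
deltas = delta_encode(ramp)

raw_size = len(zlib.compress(ramp, 9))
delta_size = len(zlib.compress(deltas, 9))
print(f"raw: {raw_size} bytes, delta-coded: {delta_size} bytes")
```

Smooth analog signals rarely repeat exact byte strings, but their sample-to-sample differences are small and repetitive, which string-matching and context models handle well.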

Data
- TEK 0, TEK 1 = Normal on/off cycle of Marotta valve S/N
- TEK {2, 3, 5, 10, 11, 15, 16, 17} = various forced failures
- 1000 solenoid current samples at 1 ms intervals
- Range: -3.1 to 7.06 A at 0.04 A resolution
- Converted to bit values (1000 byte files)
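One plausible reading of the conversion step is a linear mapping of current to one byte per sample. That is an assumption here, suggested by the 1000-sample, 1000-byte figures and by the stated range and resolution. The slides do not give the actual mapping, so the offset, step, clamping and the name `quantize` below are illustrative.

```python
# Assumed mapping (not stated in the slides): code = (amps - MIN_AMPS) / STEP_AMPS
MIN_AMPS = -3.1   # lower end of the stated range
STEP_AMPS = 0.04  # stated resolution


def quantize(samples_amps):
    """Map current readings in amperes to one unsigned byte per sample."""
    codes = []
    for amps in samples_amps:
        code = round((amps - MIN_AMPS) / STEP_AMPS)
        codes.append(min(255, max(0, code)))  # clamp out-of-range readings
    return bytes(codes)


# The two ends of the stated range map to codes 0 and 254.
print(list(quantize([-3.1, 7.06])))
```

Under this assumption the full stated range fits in 255 codes, so a 1000-sample trace becomes a 1000-byte file.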

Experimental Procedure
- Nor 0: Train on TEK 0, test on TEK 1 (normal)
- Nor 1: Train on TEK 1, test on TEK 0 (normal)
- Ab 0: Train on TEK 0, average of tests on 8 abnormal traces
- Ab 1: Train on TEK 1, average of tests on 8 abnormal traces
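The four runs above can be sketched as a small harness. `zlib` stands in for the compressor, the traces are synthetic placeholders rather than the TEK files, and `run_procedure` is a name invented for this example.

```python
import random
import zlib
from statistics import mean


def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))


def anomaly_score(train: bytes, test: bytes) -> float:
    """Score(y) = (C(xy) - C(x)) / C(y)."""
    return (compressed_size(train + test) - compressed_size(train)) / compressed_size(test)


def run_procedure(normal, abnormal):
    """Train on each of the two normal traces in turn; test on the other
    normal trace, and average the scores over all abnormal traces."""
    results = {}
    for i, train in enumerate(normal):
        other = normal[1 - i]
        results[f"Nor {i}"] = anomaly_score(train, other)
        results[f"Ab {i}"] = mean(anomaly_score(train, t) for t in abnormal)
    return results


# Synthetic placeholders: two similar "normal" traces, eight noisy "abnormal" ones.
rng = random.Random(0)
normal = [bytes(100 + (i + shift) % 40 for i in range(1000)) for shift in (0, 7)]
abnormal = [bytes(rng.randrange(256) for _ in range(1000)) for _ in range(8)]

results = run_procedure(normal, abnormal)
for name, score in results.items():
    print(f"{name}: {score:.3f}")
```

With real data, an anomaly would count as detected when an abnormal trace scores higher than the normal test trace for the same training set.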

Anomaly Scores

Anomaly Scores for TEK 0
[Table: columns GZIP, PAQ3, RK -mx3 -fd1; one row per TEK trace. Trace numbers and score values are missing from this transcript.]

Run Time Performance (750 MHz PC)
- Real time = 1K samples/sec
- GZIP: 3000K samples/sec
- PAQ3: 40K samples/sec
- RK -mx3 -fd1: 78K samples/sec
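A throughput figure of this kind can be estimated by timing repeated compression of a trace. The sketch below uses `zlib` and a synthetic 1000-byte trace, assumes one byte per sample, and will give numbers that depend entirely on the machine. The function name and repeat count are arbitrary.

```python
import time
import zlib


def samples_per_second(trace: bytes, repeats: int = 200) -> float:
    """Estimate compression throughput, assuming one byte per sample."""
    start = time.perf_counter()
    for _ in range(repeats):
        zlib.compress(trace, 9)
    elapsed = time.perf_counter() - start
    return len(trace) * repeats / max(elapsed, 1e-9)  # guard against a zero reading


trace = bytes(100 + i % 40 for i in range(1000))  # synthetic placeholder trace
rate = samples_per_second(trace)
print(f"{rate / 1000:.0f}K samples/sec")
```

Comparing the result against the 1K samples/sec requirement gives a rough idea of the real-time margin for a given compressor.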

Summary
- Data compression detects anomalies in the TEK valve data (2 normal, 8 abnormal traces)
- GZIP and PAQ3 detect anomalies in 8 of 8 cases using either training set
- RK detects 7 of 8 anomalies using either training set (TEK 15 appears more “normal” to all 3 compressors)

Future Work
- Verify with more data sets (voltage, temperature, plunger blockage)
- Identify anomalous points within the trace
- Improve modeling of analog data
- Translate models to SCL
- Work is preliminary. Much needs to be done.

Thank You For more information,