Process Monitoring with Supervised Learning and Artificial Contrasts
Wookyeon Hwang, Univ. of South Carolina
George Runger, Industrial Engineering; Industrial, Systems, and Operations Engineering; School of Computing, Informatics, and Decision Systems Engineering, Arizona State University
Eugene Tuv, Intel
QPRC June

Statistical Process Control / Anomaly Detection
Objective is to detect change in a system
–Transportation, environmental, security, health, processes, etc.
In the modern approach, leverage massive data
–Continuous, categorical, missing, outliers, nonlinear relationships
Goal is a widely applicable, flexible method
–Normal conditions and fault type unknown
Capture relationships between multiple variables
–Learn patterns, exploit patterns
–Traditional Hotelling's T² captures structure, provides control region (boundary), quantifies false alarms

Traditional Monitoring
Traditional approach is Hotelling's (1948) T² chart
Numerical measurements, based on multivariate normality
Simple elliptical pattern (Mahalanobis distance)
Time-weighted extensions: exponentially weighted moving average (EWMA) and cumulative sum (CUSUM)
–More efficient, but same elliptical patterns
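
The elliptical control region can be illustrated with a minimal two-variable sketch. It assumes the in-control mean and covariance are estimated from a reference sample and uses a chi-squared limit as an approximation; the helper names and the simulated data are illustrative, not taken from the talk.

```python
import math
import random

def mean_and_cov_2d(points):
    """Sample mean vector and 2x2 covariance (s11, s12, s22) of (x1, x2) pairs."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    return (m1, m2), (s11, s12, s22)

def t_squared(x, mean, cov):
    """Squared Mahalanobis distance of x from the mean (2-D case)."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = x[0] - mean[0], x[1] - mean[1]
    # d^T * inverse(cov) * d, written out for a 2x2 matrix
    return (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det

def chi2_limit_2df(alpha):
    """Chi-squared upper quantile with 2 degrees of freedom.
    For 2 df the quantile has the closed form -2 * ln(alpha)."""
    return -2.0 * math.log(alpha)

# Hypothetical in-control data: correlated bivariate normal sample
rng = random.Random(0)
reference = []
for _ in range(500):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    reference.append((z1, 0.8 * z1 + 0.6 * z2))  # correlation about 0.8

mean, cov = mean_and_cov_2d(reference)
ucl = chi2_limit_2df(0.05)   # 95% probability bound
on_pattern = (1.5, 1.5)      # follows the correlation structure
off_pattern = (1.5, -1.5)    # same Euclidean distance, breaks the pattern
signals = [t_squared(p, mean, cov) > ucl for p in (on_pattern, off_pattern)]
```

Both test points are equally far from the mean in Euclidean terms; only the one that violates the correlation structure falls outside the ellipse.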

Transform to Supervised Learning
Process monitoring can be transformed to a supervised learning problem
–One approach: supplement with artificial, contrasting data
–Any one of multiple learners can be used, without pre-specified faults
–Results can generalize monitoring in several directions, such as arbitrary (nonlinear) in-control conditions, fault knowledge, and categorical variables
–High-dimensional problems can be handled with an appropriate learner

Learn Process Patterns
Learn pattern compared to structureless alternative
Generate noise: artificial data without structure to differentiate
–For example, f(x) = f_1(x_1) f_2(x_2) …, joint distribution as product of marginals (enforce independence)
–Or f(x) = product of uniforms
Define and assign y = ±1 to actual and artificial data (artificial contrast)
Use supervised (classification) learner to distinguish the data sets
–Only simple examples used here
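
A minimal sketch of the two contrast generators and the labeling step follows. Permuting each column independently is one way to sample from the product of marginals; the slide does not say how the sampling was done, so treat that choice and all function names as assumptions.

```python
import random

def product_of_marginals(actual, rng):
    """Permute each column independently: keeps every marginal distribution
    but breaks the dependence between variables."""
    columns = [list(col) for col in zip(*actual)]
    for col in columns:
        rng.shuffle(col)
    return [tuple(row) for row in zip(*columns)]

def product_of_uniforms(actual, n, rng):
    """Draw each variable uniformly over its observed range."""
    columns = list(zip(*actual))
    bounds = [(min(c), max(c)) for c in columns]
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n)]

def labeled_contrast(actual, artificial):
    """Stack both sets and assign y = +1 (actual), y = -1 (artificial)."""
    X = list(actual) + list(artificial)
    y = [1] * len(actual) + [-1] * len(artificial)
    return X, y

# Hypothetical actual data with a strong pattern: x2 = 2 * x1
rng = random.Random(1)
actual = [(float(i), 2.0 * i) for i in range(50)]
shuffled = product_of_marginals(actual, rng)
uniform = product_of_uniforms(actual, 100, rng)
X, y = labeled_contrast(actual, uniform)
```

Any classifier can then be trained on `X` and `y`; the region it assigns to +1 serves as the learned in-control region.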

Learn Pattern from Artificial Contrast

Regularized Least Squares (Kernel Ridge) Classifier with Radial Basis Functions
Model with a linear combination of basis functions
Smoothness penalty controls complexity
–Closely related to Support Vector Machines (SVM)
–Regularized least squares allows a closed-form solution at the cost of sparsity; may not want to trade!
Previous example is a challenge for a generalized learner: multivariate normal data!
[Figure: f(x) plotted over x_1 and x_2]

RLS Classifier
[Equations: model, definitions with parameters, and solution]
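
The slide's equations are not available here. For orientation, one standard textbook formulation of a regularized least squares classifier with a radial basis kernel is given below; the symbols γ (complexity penalty) and σ² (kernel width) and the exact scaling are assumptions and may differ from the authors' notation.

```latex
f(\mathbf{x}) = \sum_{i=1}^{n} c_i \, K(\mathbf{x}, \mathbf{x}_i),
\qquad
K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^2}{2\sigma^2}\right)

\min_{f} \; \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - f(\mathbf{x}_i)\bigr)^2 + \gamma \, \lVert f \rVert_K^2

\mathbf{c} = (\mathbf{K} + \gamma n \mathbf{I})^{-1} \mathbf{y},
\qquad
\mathbf{K}_{ij} = K(\mathbf{x}_i, \mathbf{x}_j)
```

An observation is assigned to the class given by the sign of f(x).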

Patterns Learned from Artificial Contrast (RLSC)
True Hotelling's 95% probability bound
Red: learned contour function to assign ±1
Actual: n = 1000
Artificial: n = 2000
Complexity: 4/3000
σ² = 5
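
A toy version of such a fit can be sketched in plain Python. It assumes a common kernel ridge form, solving (K + γnI)c = y with K(a, b) = exp(-||a - b||² / (2σ²)); the scaling, the name `gamma`, and the six-point data set are assumptions for illustration, and the parameter values differ from those listed on the slide.

```python
import math

def rbf_kernel(a, b, sigma2):
    """Radial basis function exp(-||a - b||^2 / (2 * sigma2))."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma2))

def solve_linear(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    c = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (M[i][n] - s) / M[i][i]
    return c

def fit_rlsc(X, y, sigma2, gamma):
    """Coefficients c solving (K + gamma * n * I) c = y."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], sigma2) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += gamma * n
    return solve_linear(K, [float(v) for v in y])

def decision(x, X, c, sigma2):
    """f(x) = sum_i c_i K(x, x_i); the sign gives the assigned class."""
    return sum(ci * rbf_kernel(x, xi, sigma2) for ci, xi in zip(c, X))

# Toy data (hypothetical): actual points (+1) near the origin,
# artificial contrast points (-1) far away.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
y = [1, 1, 1, -1, -1, -1]
c = fit_rlsc(X, y, sigma2=1.0, gamma=0.01)
inside = decision((0.3, 0.3), X, c, sigma2=1.0)
outside = decision((5.5, 5.5), X, c, sigma2=1.0)
```

The zero contour of `decision` plays the role of the learned control boundary. The dense solve scales cubically with n, so a sample of 3000 points would call for a numerical library rather than this hand-written solver.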

More Challenging Example with Hotelling's Contour

Patterns Learned from Artificial Contrast (RLSC)
Actual: n = 1000
Artificial: n = 2000
Complexity: 4/3000
σ² = 5

Patterns Learned from Artificial Contrast (RLSC)
Actual: n = 1000
Artificial: n = 1000
Complexity: 4/2000
σ² = 5

RLSC for p = 10 Dimensions
[Table: mean and standard deviation, for shift = 1 and shift = 3, of training error (Type II error), testing error (Type II error), and chi-squared (99.5%) Type II error; values missing]

Tree-Based Ensembles, p = 10
Alternative learner
–works with mixed data
–elegantly handles missing data
–scale invariant
–outlier resistant
–insensitive to extraneous predictors
Provides an implicit ability to select key variables
[Table: mean and standard deviation, for shift = 1 and shift = 3, of training error (Type I error, OOB for training data), testing error (Type II error, OOB for test data), and chi-squared (99.5%) Type II error; values missing]

Nonlinear Patterns
Hotelling's boundary is not a good solution when patterns are not linear
Control boundaries from supervised learning capture the normal operating condition

Tuned Control
Extend to incorporate specific process knowledge of faults
Artificial contrasts generated from the specified fault distribution
–or from a mixture of samples from different fault distributions
Numerical optimization to design a control statistic can be very complicated
–maximizes the likelihood function under a specified fault (alternative)

Tuned Control
Fault: means of both variables x_1 and x_2 are known to increase
Artificial data (black) are sampled from 12 independent normal distributions
–Mean vectors are selected from a grid over the area [0, 3] x [0, 3]
Learned control region is shown in the right panel; it approximately matches the theoretical result in Testik et al., 2004.
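
A sketch of how such fault-specific contrasts could be generated is given below. The slide states only that 12 mean vectors come from a grid over [0, 3] x [0, 3]; the 4 x 3 layout, the unit standard deviation, the equal sample count per mean, and the function names are all guesses made for illustration.

```python
import random

def grid_means(low, high, n1, n2):
    """Mean vectors on an n1 x n2 grid over [low, high] x [low, high]."""
    step1 = (high - low) / (n1 - 1)
    step2 = (high - low) / (n2 - 1)
    return [(low + i * step1, low + j * step2)
            for i in range(n1) for j in range(n2)]

def sample_fault_contrast(means, per_mean, rng, sd=1.0):
    """Draw per_mean points from an independent normal at each mean vector."""
    out = []
    for m1, m2 in means:
        for _ in range(per_mean):
            out.append((rng.gauss(m1, sd), rng.gauss(m2, sd)))
    return out

rng = random.Random(2)
means = grid_means(0.0, 3.0, 4, 3)   # 12 mean vectors; the 4 x 3 layout is a guess
contrast = sample_fault_contrast(means, 50, rng)   # artificial class, y = -1
```

The pooled sample is a mixture over the fault distributions, so a learner trained against it places the boundary toward the directions in which the means are expected to increase.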

Incorporate Time-Weighted Rules
What form of statistic should be filtered and monitored?
–Log likelihood ratio
Some learners provide class probability estimates
Bayes' theorem (for equal sample sizes) gives [equation]
Log likelihood ratio for an observation x_t is estimated as [equation]
Apply EWMA (or CUSUM, etc.) to l_t
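
With equal numbers of actual and artificial samples the class priors cancel, so the density ratio equals the ratio of class probabilities, and the log likelihood ratio can be estimated by the log odds of the learner's probability estimate. The sketch below assumes that form, with the ratio taken as actual over artificial (the sign convention is an assumption), and smooths it with a standard EWMA recursion. The probability values are made up, and no control limit is shown because the slide does not give one.

```python
import math

def log_likelihood_ratio(p_actual, eps=1e-6):
    """l = ln(p / (1 - p)), with p clipped away from 0 and 1.
    Assumed sign convention: negative values point away from the actual data."""
    p = min(max(p_actual, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def ewma(values, lam=0.2, start=0.0):
    """z_t = lam * l_t + (1 - lam) * z_{t-1}."""
    z, out = start, []
    for v in values:
        z = lam * v + (1.0 - lam) * z
        out.append(z)
    return out

# Hypothetical class-probability estimates from a learner: the process
# resembles the reference data for 5 observations, then drifts away.
probs = [0.9, 0.8, 0.9, 0.85, 0.9, 0.3, 0.2, 0.1, 0.2, 0.1]
l_t = [log_likelihood_ratio(p) for p in probs]
z_t = ewma(l_t, lam=0.2)
```

The smoothed statistic `z_t` falls steadily once the probabilities drop, which is the behavior a time-weighted chart exploits to detect small sustained changes.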

Time-Weighted ARLs
ARLs for selected schemes applied to the l_t statistic
–10-dimensional, independent normal

Example: 50 Dimensions

Example: 50 Dimensions
Hotelling's: left
Artificial contrast: right

Example: Credit Data (UCI)
20 attributes: 7 numerical and 13 categorical
Associated class label of good or bad credit risk
Artificial data generated from continuous and discrete uniform distributions, respectively, independently for each attribute
Ordered as 300 good instances followed by 300 bad
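
Generating artificial data for mixed attributes can be sketched as follows. The attribute names and the three-record sample are invented for illustration and are not the actual credit data; sampling over the observed range and the observed levels is also an assumption.

```python
import random

def artificial_mixed(records, numeric_keys, n, rng):
    """Sample each attribute independently: continuous uniform over the
    observed range for numeric attributes, discrete uniform over the
    observed levels for categorical ones."""
    keys = list(records[0].keys())
    ranges, levels = {}, {}
    for k in keys:
        values = [r[k] for r in records]
        if k in numeric_keys:
            ranges[k] = (min(values), max(values))
        else:
            levels[k] = sorted(set(values))
    out = []
    for _ in range(n):
        row = {}
        for k in keys:
            if k in numeric_keys:
                row[k] = rng.uniform(*ranges[k])
            else:
                row[k] = rng.choice(levels[k])
        out.append(row)
    return out

# Hypothetical records with two numeric and one categorical attribute
records = [
    {"duration": 6.0, "amount": 1200.0, "purpose": "car"},
    {"duration": 24.0, "amount": 5000.0, "purpose": "education"},
    {"duration": 12.0, "amount": 800.0, "purpose": "furniture"},
]
rng = random.Random(3)
artificial = artificial_mixed(records, {"duration", "amount"}, 20, rng)
```

A learner that accepts mixed data, such as a tree-based ensemble, can then be trained on the actual records against these artificial ones.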

Artificial Contrasts for Credit Data
Plot of l_t over time

Diagnostics: Contribution Plots
50 dimensions: 2 contributors, 48 noise variables (scatter plot projections onto contributor variables)

Contributor Plots from PCA T²

Contributor Plots from PCA SPE

Contributor Plots from Artificial Contrast Ensemble (ACE)
Impurity importance weighted by means of split variable

Contributor Plots for Nonlinear System
Contributor plots from SPE, T², and ACE (left, center, and right, respectively)

Conclusions
Can/must leverage the automated, ubiquitous, data-computational environment
–Professional obsolescence
Employ flexible, powerful control solution for broad applications: environment, health, security, etc., as well as manufacturing
–Normal sensors not obvious, patterns not known
Include automated diagnosis
–Tools to filter to identify contributors
Computational feasibility in embedded software
This material is based upon work supported by the National Science Foundation under Grant No