
Process Monitoring with Supervised Learning and Artificial Contrasts
QPRC, June
Wookyeon Hwang, Univ. of South Carolina
George Runger, Industrial, Systems, and Operations Engineering / School of Computing, Informatics, and Decision Systems Engineering, Arizona State University
Eugene Tuv, Intel

Statistical Process Control / Anomaly Detection
Objective is to detect change in a system
– Transportation, environmental, security, health, processes, etc.
The modern approach leverages massive data
– Continuous, categorical, missing values, outliers, nonlinear relationships
Goal is a widely applicable, flexible method
– Normal conditions and fault type unknown
Capture relationships between multiple variables
– Learn patterns, exploit patterns
– Traditional Hotelling's T² captures structure, provides a control region (boundary), and quantifies false alarms

Traditional Monitoring
Traditional approach is Hotelling's (1948) T-squared chart
– Numerical measurements, based on multivariate normality
– Simple elliptical pattern (Mahalanobis distance)
Time-weighted extensions: exponentially weighted moving average (EWMA) and cumulative sum (CUSUM)
– More efficient, but same elliptical patterns
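For reference, a minimal sketch of the traditional T² statistic, assuming the mean and covariance are estimated from an in-control reference sample; the sample sizes and the shift below are illustrative, not taken from the slides:

    import numpy as np

    rng = np.random.default_rng(0)
    cov = [[1.0, 0.5], [0.5, 1.0]]
    X_ref = rng.multivariate_normal([0, 0], cov, size=1000)   # in-control reference sample
    X_new = rng.multivariate_normal([1.5, 0], cov, size=5)    # possibly shifted observations

    mu = X_ref.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X_ref, rowvar=False))

    d = X_new - mu
    t2 = np.einsum('ij,jk,ik->i', d, S_inv, d)   # squared Mahalanobis distance per observation
    print(t2)   # compare against a chi-squared or F-based control limit

Observations whose T² exceeds the control limit fall outside the elliptical in-control region described above.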

Transform to Supervised Learning
Process monitoring can be transformed into a supervised learning problem
– One approach: supplement the observed data with artificial, contrasting data
– Any one of multiple learners can be used, without pre-specified faults
– Results generalize monitoring in several directions, such as arbitrary (nonlinear) in-control conditions, fault knowledge, and categorical variables
– High-dimensional problems can be handled with an appropriate learner

Learn Process Patterns
Learn the pattern compared to a structureless alternative
Generate noise: artificial data without structure to differentiate from
– For example, f(x) = f1(x1) f2(x2) ⋯ fp(xp), the joint distribution as a product of marginals (enforces independence)
– Or f(x) = a product of uniforms
Define and assign y = ±1 to the actual and artificial data (the artificial contrast)
Use a supervised (classification) learner to distinguish the two data sets
– Only simple examples used here
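A minimal sketch of generating and labeling the artificial contrast data; the function name, sample sizes, and toy distribution are illustrative, but the two construction options mirror the slide (resampling each column independently to get the product of marginals, or independent uniforms over each variable's observed range):

    import numpy as np

    def artificial_contrast(X, n_art=None, method="marginals", seed=None):
        """Generate structureless contrast data for the actual sample X."""
        rng = np.random.default_rng(seed)
        n_art = n_art or len(X)
        if method == "marginals":
            # resample each column independently: a draw from the product of marginals
            cols = [rng.choice(X[:, j], size=n_art, replace=True) for j in range(X.shape[1])]
            return np.column_stack(cols)
        # otherwise: independent uniforms over each variable's observed range
        return rng.uniform(X.min(axis=0), X.max(axis=0), size=(n_art, X.shape[1]))

    # label actual data +1 and artificial data -1, then train any classifier on (X_all, y)
    rng = np.random.default_rng(1)
    X_actual = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=1000)
    X_art = artificial_contrast(X_actual, n_art=2000, seed=2)
    X_all = np.vstack([X_actual, X_art])
    y = np.concatenate([np.ones(len(X_actual)), -np.ones(len(X_art))])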

Learn Pattern from Artificial Contrast

Regularized Least Squares (Kernel Ridge) Classifier with Radial Basis Functions
Model with a linear combination of basis functions
A smoothness penalty controls complexity
– Tightly related to Support Vector Machines (SVMs)
– Regularized least squares allows a closed-form solution but gives up sparsity; one may not want that trade!
Previous example: a challenge for a generalized learner: multivariate normal data!

RLS Classifier
Model: f(x) = Σ_i c_i K(x, x_i), with radial basis kernel K(x, x') = exp(−‖x − x'‖² / σ²) and parameters λ (smoothness penalty) and σ² (kernel width)
Solution (closed form): c = (K + λ n I)⁻¹ y, where K is the n × n kernel matrix and y is the vector of ±1 labels
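A minimal sketch of this classifier, assuming the standard kernel ridge formulation above; the values of λ and σ² and the toy data are illustrative (σ² = 5 echoes the later slides):

    import numpy as np

    def rbf_kernel(A, B, sigma2):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / sigma2)

    def rlsc_fit(X, y, lam, sigma2):
        K = rbf_kernel(X, X, sigma2)
        # closed-form solution of the regularized least squares problem
        return np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)

    def rlsc_decision(X_train, c, X_new, sigma2):
        return rbf_kernel(X_new, X_train, sigma2) @ c   # sign(f) assigns +1 (actual) or -1 (artificial)

    # toy usage: actual (+1) vs. artificial (-1) data as in the previous sketch
    rng = np.random.default_rng(0)
    X_act = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=200)
    X_art = rng.uniform(X_act.min(0), X_act.max(0), size=(200, 2))
    X = np.vstack([X_act, X_art]); y = np.r_[np.ones(200), -np.ones(200)]
    c = rlsc_fit(X, y, lam=1e-3, sigma2=5.0)
    scores = rlsc_decision(X, c, X, sigma2=5.0)   # f(x) > 0: inside the learned in-control region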

Patterns Learned from Artificial Contrast (RLSC)
Figure: true Hotelling's 95% probability bound vs. the learned contour function (red) used to assign ±1
Actual: n = 1000; artificial: n = 2000; complexity: 4/3000; σ² = 5

More Challenging Example with Hotelling's Contour

Patterns Learned from Artificial Contrast (RLSC)
Actual: n = 1000; artificial: n = 2000; complexity: 4/3000; σ² = 5

Patterns Learned from Artificial Contrast (RLSC)
Actual: n = 1000; artificial: n = 1000; complexity: 4/2000; σ² = 5

RLSC for p = 10 Dimensions
Table: Type II error (mean and standard deviation) of the RLSC training error, the RLSC testing error, and a chi-squared (99.5%) control limit, for mean shifts of 1 and 3.

Tree-Based Ensembles, p = 10
Alternative learner:
– works with mixed data
– elegantly handles missing data
– scale invariant
– outlier resistant
– insensitive to extraneous predictors
Provides an implicit ability to select key variables
Table: out-of-bag (OOB) error on the training data (Type I error), OOB error on test data (Type II error), and a chi-squared (99.5%) limit (Type II error), for mean shifts of 1 and 3 (mean and standard deviation).
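A minimal sketch with a random forest standing in for the tree-based ensemble (the specific ensemble used in the talk may differ); the in-control and contrast samples are illustrative, and the OOB score approximates the classifier's error without a separate test set:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    p = 10
    X_actual = rng.normal(size=(1000, p))                                    # in-control sample
    X_art = rng.uniform(X_actual.min(0), X_actual.max(0), size=(1000, p))    # structureless contrast
    X = np.vstack([X_actual, X_art])
    y = np.r_[np.ones(1000), -np.ones(1000)]

    rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0).fit(X, y)
    print("OOB accuracy:", rf.oob_score_)                              # OOB error = 1 - OOB accuracy
    print("class probabilities [artificial, actual]:", rf.predict_proba(X_actual[:1])[0])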

Nonlinear Patterns
Hotelling's boundary is not a good solution when patterns are not linear
Control boundaries from supervised learning capture the normal operating condition

Tuned Control
Extend to incorporate specific process knowledge of faults
Artificial contrasts generated from the specified fault distribution
– or from a mixture of samples from different fault distributions
Numerical optimization to design a control statistic that maximizes the likelihood function under a specified fault (the alternative) can be very complicated

Tuned Control
Fault: the means of both variables x1 and x2 are known to increase
Artificial data (black) are sampled from 12 independent normal distributions
– Mean vectors are selected from a grid over the area [0, 3] x [0, 3]
The learned control region is shown in the right panel; it approximately matches the theoretical result in Testik et al., 2004
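A minimal sketch of a tuned contrast of this form: the artificial data are drawn from a mixture of normals whose means lie on a grid over the anticipated fault region. The exact 12-point grid layout and the per-component sample sizes used in the talk are not specified in the transcript, so the ones below are assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    # assumed 12-point grid of mean vectors over the fault region [0, 3] x [0, 3]
    grid = [(a, b) for a in (0.0, 1.0, 2.0, 3.0) for b in (1.0, 2.0, 3.0)]
    X_art = np.vstack([rng.multivariate_normal(m, np.eye(2), size=100) for m in grid])
    X_actual = rng.multivariate_normal([0, 0], np.eye(2), size=1000)
    X = np.vstack([X_actual, X_art])
    y = np.r_[np.ones(len(X_actual)), -np.ones(len(X_art))]
    # fit any classifier on (X, y); its decision boundary is the tuned control region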

Incorporate Time-Weighted Rules
What form of statistic should be filtered and monitored?
– The log likelihood ratio
Some learners provide class probability estimates
Bayes' theorem (for equal sample sizes) gives the log likelihood ratio for an observation x_t, estimated as
    l_t = log[ p̂(y = +1 | x_t) / (1 − p̂(y = +1 | x_t)) ]
Apply an EWMA (or CUSUM, etc.) to l_t
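A minimal sketch of this step, assuming a scikit-learn-style classifier (e.g., the random forest fitted above) and an assumed EWMA weight lam; the class-probability estimate is converted to l_t and smoothed:

    import numpy as np

    def llr_ewma(clf, X_stream, lam=0.1, eps=1e-6):
        """Log likelihood ratio l_t from the classifier's probability estimate, smoothed by an EWMA."""
        idx = list(clf.classes_).index(1)                     # column for the "actual" (+1) class
        p = np.clip(clf.predict_proba(X_stream)[:, idx], eps, 1 - eps)
        l = np.log(p / (1 - p))                               # l_t under equal sample sizes / priors
        z, out = 0.0, np.empty(len(l))
        for t, lt in enumerate(l):
            z = lam * lt + (1 - lam) * z                      # EWMA recursion
            out[t] = z
        return out                                            # persistently low values suggest a change

A change shifts new observations toward the artificial data, so p̂(y = +1 | x_t) and hence the smoothed l_t fall; the signal rule compares the EWMA to a control limit set for a desired in-control ARL.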

Time-Weighted ARLs
Average run lengths (ARLs) for selected schemes applied to the l_t statistic
– 10-dimensional, independent normal data

Example: 50 Dimensions

Example: 50 Dimensions
Hotelling's boundary: left; artificial contrast: right

Example: Credit Data (UCI)
20 attributes: 7 numerical and 13 categorical
Associated class label of good or bad credit risk
Artificial data generated independently for each attribute, from continuous uniform distributions for the numerical attributes and discrete uniform distributions for the categorical attributes
Data ordered as 300 good instances followed by 300 bad
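A minimal sketch of this mixed-type contrast generation for a pandas data frame; the helper name is illustrative, but the construction (continuous uniforms over each numeric range, discrete uniforms over each categorical attribute's observed levels) follows the slide:

    import numpy as np
    import pandas as pd

    def mixed_contrast(df, n_art, seed=None):
        """Artificial data for a mixed numeric/categorical frame."""
        rng = np.random.default_rng(seed)
        art = {}
        for col in df.columns:
            if pd.api.types.is_numeric_dtype(df[col]):
                art[col] = rng.uniform(df[col].min(), df[col].max(), size=n_art)   # continuous uniform
            else:
                art[col] = rng.choice(df[col].unique(), size=n_art, replace=True)  # discrete uniform
        return pd.DataFrame(art)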

Artificial Contrasts for Credit Data
Plot of l_t over time

Diagnostics: Contribution Plots
50 dimensions: 2 contributors, 48 noise variables (scatter plot projections to contributor variables)

Contributor Plots from PCA T²

Contributor Plots from PCA SPE (squared prediction error)

Contributor Plots from Artificial Contrast Ensemble (ACE)
Impurity importance weighted by means of the split variable
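A minimal sketch of an ACE-style contributor ranking, using scikit-learn's impurity-based (mean decrease in impurity) importance as a stand-in for the weighted impurity importance named on the slide; the 50-dimensional data with two shifted contributors mirror the earlier diagnostic example, but the sizes and shift are assumptions:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    p, n = 50, 500
    X_cur = rng.normal(size=(n, p))
    X_cur[:, :2] += 2.0                                        # variables 1 and 2 are the true contributors
    X_art = rng.uniform(X_cur.min(0), X_cur.max(0), size=(n, p))
    X = np.vstack([X_cur, X_art]); y = np.r_[np.ones(n), -np.ones(n)]

    rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
    for j in np.argsort(rf.feature_importances_)[::-1][:5]:
        print(f"x{j + 1}: importance {rf.feature_importances_[j]:.3f}")   # contributors should rank first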

Contributor Plots for Nonlinear System
Contributor plots from SPE, T², and ACE shown left, center, and right, respectively

Conclusions
Can and must leverage the automated, ubiquitous data and computational environment
– Professional obsolescence
Employ a flexible, powerful control solution for broad applications: environment, health, security, etc., as well as manufacturing
– Normal sensor behavior not obvious, patterns not known
Include automated diagnosis
– Tools to filter and identify contributors
Computational feasibility in embedded software
This material is based upon work supported by the National Science Foundation under Grant No.