Data Mining Applied To Fault Detection
Shinho Jeong, Jaewon Shim, Hyunsoo Lee {cinooco, poohut,
Digital Media Lab

Introduction
Aim of this work: a neural-network implementation of the non-linear PCA model using the principal curve algorithm, to improve both the rapidity and the accuracy of fault detection.
What is data mining? Extracting useful information from raw data using statistical methods and/or AI techniques.
Characteristics:
- Makes maximum use of the available data.
- Does not require rigorous theoretical knowledge.
- Efficient for systems where the actual process deviates from the first-principles model.
Applications:
- Process monitoring: fault detection, diagnosis, and isolation.
- Process estimation: soft sensors.

Fault Detection?
(Figure: illustration of fault introduction in a monitored process signal.)

Issues
Major concerns:
- Rapidity: the ability to detect a fault at an early stage after its introduction.
- Accuracy: the ability to distinguish a genuine fault from ordinary process variation.
These two goals trade off against each other. The trade-off is addressed through:
- frequent acquisition of process data, and
- derivation of an efficient process model through data analysis using data-mining methodologies.

Inherent Problems
- Multi-collinearity problem: due to high correlation among variables; likely to cause redundancy. New, uncorrelated feature variables must be derived.
- Dimensionality problem: due to having more variables than observations; likely to cause over-fitting in the model-building phase. Dimensionality reduction is required.
- Non-linearity problem: due to non-linear relations among variables. The degree of non-linearity must be determined in advance, and a non-linear model applied.
- Process-dynamics problem: due to operating conditions that change over time; likely to change the correlation structure among the variables.

Statistical Approach
Univariate SPC:
- Conventional charts: Shewhart, CUSUM, EWMA, etc. (an EWMA sketch follows this slide).
- Limitations: each process variable is monitored separately, which is inefficient for a multivariate system, where the main concern is how the variables co-vary. Hence the need for multivariate data analysis.
Multivariate SPC:
- PCA: the most popular multivariate data-analysis method, and the basis for regression models such as PLS and PCR.
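A minimal sketch of one of the univariate charts named above, the EWMA chart. The smoothing weight `lam` and limit width `L` are typical textbook choices, not values from the slides:

```python
import numpy as np

def ewma_chart(x, mu, sigma, lam=0.2, L=3.0):
    """EWMA chart for a single process variable.
    mu, sigma: in-control mean and standard deviation, e.g. estimated from NOC data."""
    z = np.empty(len(x))
    prev = mu
    for i, xi in enumerate(x):
        prev = lam * xi + (1 - lam) * prev  # exponentially weighted moving average
        z[i] = prev
    half_width = L * sigma * np.sqrt(lam / (2 - lam))  # asymptotic control limits
    return z, mu - half_width, mu + half_width

# usage: z, lcl, ucl = ewma_chart(x, x_noc.mean(), x_noc.std(ddof=1))
```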

Linear PCA (1)
Features: creates new feature variables (principal components) as linear combinations of the original variables. The components are:
- fewer than the original variables, which addresses the dimensionality problem, and
- orthogonal, which addresses the multi-collinearity problem.
PCA additionally performs noise reduction, and it is the basis for PCR and PLS (a short usage sketch follows this slide).
Limitation: it is a linear model, and therefore inefficient for a non-linear process.
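A minimal usage sketch of linear PCA, using scikit-learn for brevity. The array sizes mirror the case study's 120 observations of 6 variables and are otherwise illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.randn(120, 6)            # stand-in for 120 NOC observations of 6 variables

pca = PCA(n_components=2).fit(X)       # 2 new, orthogonal feature variables
scores = pca.transform(X)              # principal components (encoding)
X_hat = pca.inverse_transform(scores)  # reconstruction (decoding, noise-reduced)
residual = ((X - X_hat) ** 2).sum(axis=1)  # per-sample reconstruction error
```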

Linear PCA (2)
Theory: linear PCA is defined by a pair of mappings between the data space and the score space, an encoding mapping and a decoding mapping (see below).
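The slide's equations are reconstructed here from the standard linear-PCA formulation (an assumption: with $P$ the $m \times k$ loading matrix, $P^{\top}P = I_k$, and $x \in \mathbb{R}^m$ a mean-centered observation):

\[
t = P^{\top} x \;\; \text{(encoding)}, \qquad
\hat{x} = P\,t = P P^{\top} x \;\; \text{(decoding)}.
\]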

Linear PCA (3)
ERM inductive principle: linear PCA follows the empirical-risk-minimization principle, choosing the mappings that minimize the mean squared reconstruction error over the training data (see the formulation after this slide).
Limitation: the mappings are restricted to linear functions.
Alternatives: extend the linear functions to non-linear ones using
- neural networks, or
- statistical methods.
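Stated as empirical risk minimization (the standard formulation, assuming squared reconstruction error as the loss), linear PCA solves

\[
\min_{P \,:\, P^{\top}P = I_k} \; \frac{1}{N} \sum_{i=1}^{N} \bigl\| x_i - P P^{\top} x_i \bigr\|^{2},
\]

whose minimizer is spanned by the leading $k$ eigenvectors of the sample covariance matrix.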

Kramer's Approach
An auto-associative neural network trained to reproduce its own input through a low-dimensional bottleneck (a code sketch follows this slide).
Limitations:
- Difficult to train a network with three hidden layers.
- Difficult to determine the optimal number of hidden nodes.
- Difficult to interpret the meaning of the bottleneck layer.
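A minimal sketch of an auto-associative network of the kind Kramer proposed, with three hidden layers (mapping, bottleneck, de-mapping). scikit-learn is used for brevity, and the layer sizes are illustrative assumptions, not values from the slides:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.random.randn(120, 6)   # stand-in for process data

# Three hidden layers: mapping (10) -> bottleneck (2) -> de-mapping (10).
net = MLPRegressor(hidden_layer_sizes=(10, 2, 10), activation='tanh',
                   max_iter=5000, random_state=0)
net.fit(X, X)                 # auto-associative: the target is the input itself
spe = ((X - net.predict(X)) ** 2).sum(axis=1)  # reconstruction error per sample
```

The training difficulty the slide names, fitting all three hidden layers jointly, is what the two-MLP construction of the proposed approach avoids.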

Non-linear PCA (1)
Principal curve (Hastie & Stuetzle, 1989): a statistical, non-linear generalization of the first linear principal component.
Self-consistency principle, realized by alternating two steps (stated formally below):
- Projection step (encoding): project each observation onto the current curve.
- Conditional averaging (decoding): replace each point of the curve by the average of the observations projecting onto it.
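In the standard principal-curve notation (not shown on the slide), a curve $f$ is self-consistent if each of its points is the conditional mean of the observations that project onto it:

\[
f(\lambda) = \mathbb{E}\bigl[\, X \mid \lambda_f(X) = \lambda \,\bigr],
\]

where $\lambda_f(x)$ is the projection index, the arc-length parameter of the point on $f$ nearest to $x$. The algorithm alternates the projection step (compute $\lambda_f(x_i)$) with conditional averaging (re-estimate $f$).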

Non-linear PCA (2)
Limitations in practice:
- Only a finite amount of data is available.
- The density distribution is unknown.
- There is no a priori information about the data.
Additional consideration: the conditional-averaging step is therefore replaced by a scatterplot smoother such as locally weighted regression or kernel regression (a kernel-regression sketch follows this slide).
Span: the fraction of the data considered to be in the neighborhood. Decreasing the span increases flexibility; the span thus controls both the smoothness of the fit and the generalization capacity.
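A minimal sketch of the kernel-regression replacement for conditional averaging. A Gaussian kernel is assumed, and mapping the span to a bandwidth as a fraction of the index range is an illustrative choice:

```python
import numpy as np

def conditional_average(lam, X, span=0.3):
    """Kernel-regression stand-in for the conditional-averaging step.
    lam: (n,) projection indices; X: (n, d) observations.
    A smaller span means a narrower kernel, hence a more flexible fit."""
    h = span * (lam.max() - lam.min())                 # bandwidth from the span
    w = np.exp(-0.5 * ((lam[:, None] - lam[None, :]) / h) ** 2)
    return (w @ X) / w.sum(axis=1, keepdims=True)      # weighted average per index
```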

Proposed Approach (1)
LPCA vs. NLPCA (comparison figure).

Proposed Approach (1)
Creation of the non-linear principal scores.

Proposed Approach (2)
Implementation of the auto-associative neural network.

Case Study
Objective: fault detection during an operating-mode change, using 6 process variables.
Data acquisition and model building:
- NOC (normal operating condition) data: 120 observations, used to build the NLPCA model.
- Fault data: another 120 observations containing a drift fault.

Model Building
- Principal curve fitting (figures show the fit after 5, 30, and 50 iterations).
- Auto-associative neural network assembled from two MLPs, the 1st and 2nd MLP networks (a sketch follows this slide).
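A hedged sketch of the two-MLP construction, on the natural reading that the 1st MLP learns the encoding (observations to principal-curve scores) and the 2nd the decoding (scores back to observations). scikit-learn and the layer sizes are illustrative assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_two_mlp_nlpca(X_noc, scores):
    """X_noc: (n, 6) NOC observations; scores: (n, k) non-linear principal
    scores obtained from the principal-curve fit."""
    mlp1 = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000)  # x -> t
    mlp2 = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000)  # t -> x_hat
    mlp1.fit(X_noc, scores)
    mlp2.fit(scores, X_noc)
    return mlp1, mlp2

def reconstruct(mlp1, mlp2, X):
    t = mlp1.predict(X).reshape(len(X), -1)  # keep scores 2-D even when k == 1
    return mlp2.predict(t)
```

Splitting the network in two sidesteps the joint training of three hidden layers noted as a limitation of Kramer's approach.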

Monitoring Result
(Figure: monitoring charts for the LPCA and NLPCA models, with the point of fault introduction marked.)
The NLPCA model detects the fault more efficiently than the LPCA model. A sketch of the monitoring computation follows this slide.
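Continuing the sketch above (X_noc and X_fault stand in for the case study's two 120-observation sets), a monitoring chart can be produced by comparing each observation's squared prediction error against a control limit estimated from the NOC data. The slides do not state which limit was used, so an empirical percentile stands in here:

```python
import numpy as np

def spe(X, X_hat):
    """Squared prediction error (Q statistic) per observation."""
    return ((X - X_hat) ** 2).sum(axis=1)

spe_noc = spe(X_noc, reconstruct(mlp1, mlp2, X_noc))
limit = np.quantile(spe_noc, 0.99)       # empirical 99th-percentile control limit

spe_fault = spe(X_fault, reconstruct(mlp1, mlp2, X_fault))
alarms = spe_fault > limit               # True where the chart signals a fault
```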

Conclusion
Result: fault-detection performance was enhanced in terms of both speed and accuracy when the method was applied to a test case.
Future work: integration of fault-diagnosis and fault-isolation methods, to perform complete process monitoring on a single platform.