Data Mining Manufacturing Data Dave E. Stevens Eastman Chemical Company Kingsport, TN.

Presentation transcript:

Data Mining Manufacturing Data Dave E. Stevens Eastman Chemical Company Kingsport, TN

Presentation Outline
- Intro: Data Mining Manufacturing Data
- Data Preparation
- Principal Component Analysis
- Partial Least Squares
- PLS Discriminant Analysis

Manufacturing Data Then and Now
40 Years Ago
- Few Measurements
- Temp, Press., Flows
Today
- Many Measurements
- Very Often
- Creates Large Data Sets
Purposes For Measuring
- Process “State”
- Relationships (X, X to Y)
- Classification
- Optimization

Concerns With Current Manufacturing Data
Dimensionality: (Large)
- >1000 process variables every few seconds
- >10 quality variables every few hours
- Data overload: the analyst concentrates on only a few variables and ignores most of the information!
Collinearity:
- Not 1000 independent things at work.
- Only a few underlying events affect all variables.
- Variables are all highly correlated.
Noise:
Missing Data:
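The collinearity point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the three sensor names and their gains are hypothetical, and a single hidden "event" stands in for the few underlying events the slide mentions.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# One hidden "event" (for example a feed-rate change) observed 500 times.
event = [rng.gauss(0.0, 1.0) for _ in range(500)]

# Three hypothetical sensors that all respond to the same event, each with
# its own gain and a little independent measurement noise.
gains = {"temp": 2.0, "pressure": -1.5, "flow": 0.8}
sensors = {
    name: [g * e + rng.gauss(0.0, 0.1) for e in event]
    for name, g in gains.items()
}

r_temp_flow = pearson(sensors["temp"], sensors["flow"])
r_temp_pressure = pearson(sensors["temp"], sensors["pressure"])
print(round(r_temp_flow, 3), round(r_temp_pressure, 3))
```

Although three variables are recorded, they carry essentially one piece of information, which is why the pairwise correlations come out close to +1 and -1.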

Multivariate Data Concept
[Figure: BreakLoad Control Chart and Elongation Control Chart]
Is This Process In Control?
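One common reading of this slide is that a point can sit inside the limits of each individual control chart and still break the joint pattern of the two variables. The sketch below illustrates that idea with a Hotelling-style T² distance. The numbers are made up (standardized units), and the statistic is an assumption, since the slide does not say which statistic, if any, was used.

```python
import math

# Made-up reference (in-control) data for two strongly correlated variables.
break_load = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
noise = [0.2, -0.2, 0.2, -0.2, 0.0, 0.2, -0.2, 0.2, -0.2]
elongation = [x + e for x, e in zip(break_load, noise)]

n = len(break_load)
mx, my = sum(break_load) / n, sum(elongation) / n
sxx = sum((x - mx) ** 2 for x in break_load) / (n - 1)
syy = sum((y - my) ** 2 for y in elongation) / (n - 1)
sxy = sum((x - mx) * (y - my)
          for x, y in zip(break_load, elongation)) / (n - 1)

def z_scores(x, y):
    """Univariate view: distance from the mean in standard deviations."""
    return (x - mx) / math.sqrt(sxx), (y - my) / math.sqrt(syy)

def t_squared(x, y):
    """Multivariate view: d' S^-1 d for the 2x2 covariance matrix S."""
    dx, dy = x - mx, y - my
    det = sxx * syy - sxy ** 2
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

typical = (1.5, 1.5)   # follows the usual pattern (both high together)
odd = (1.5, -1.5)      # each value is ordinary, the combination is not

for name, point in (("typical", typical), ("odd", odd)):
    print(name, [round(z, 2) for z in z_scores(*point)],
          round(t_squared(*point), 1))
```

Both points are about one standard deviation from the mean on each chart, yet the T² value of the second point is far larger because it contradicts the correlation between the two variables.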

Data Preparation
- Data collected in a Process Data Historian will include the process up and down times recorded from the instrumentation.
- A software tool is needed that makes it easy to clean the data and do initial Exploratory Data Analyses.
JMP Software
– Interactive Graphing
– Removal of Outliers Graphically or by Variable Selection Criteria
– Join and/or Subset Data Tables
– Statistical Analyses
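The slides do this step interactively in JMP. As a rough, scripted equivalent, subsetting by a selection criterion might look like the sketch below. The record layout, the variable names and the feed-rate cut-off are all hypothetical.

```python
# Hypothetical historian export: one dict per timestamped record.
records = [
    {"time": "08:00", "feed_rate": 41.8, "reactor_temp": 182.1},
    {"time": "08:05", "feed_rate": 42.3, "reactor_temp": 181.7},
    {"time": "08:10", "feed_rate": 0.0, "reactor_temp": 96.4},   # unit down
    {"time": "08:15", "feed_rate": 0.2, "reactor_temp": 88.0},   # unit down
    {"time": "08:20", "feed_rate": 40.9, "reactor_temp": 180.3},
]

MIN_RUNNING_FEED = 5.0  # assumed cut-off; choose one that suits the process

def is_running(record, min_feed=MIN_RUNNING_FEED):
    """Selection criterion: the unit counts as 'up' when feed exceeds the cut-off."""
    return record["feed_rate"] > min_feed

# Keep only the records taken while the process was up.
uptime = [r for r in records if is_running(r)]
print([r["time"] for r in uptime])
```

Only the up-time records would then be passed on to the exploratory and multivariate analyses.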

Principal Components Analysis Understanding Relationships Between Process Variables

Principal Component Analysis
- Principal Component Analysis is a projection technique.
- Raw data are first “centered” and “scaled”.
- Each principal component represents a direction through the data that captures the maximum amount of the remaining raw data variation.
- For each principal component (a), a new data value is generated for each observation (i) as a linear combination of the raw X variables (k):
  $t_{i,a} = b_{a,1} X_{i,1} + b_{a,2} X_{i,2} + \dots + b_{a,k} X_{i,k}$ for each obs. $i$
  where the b’s are loadings (-1 to 1).
- Each successive principal component represents less and less of the raw data variation.
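The steps on this slide (center, scale, find a direction, form the linear combination) can be sketched in plain Python. This is an illustration only: the data are made up, the helper names are hypothetical, and power iteration is used to keep the sketch dependency-free. A production analysis would normally rely on a numerical library or on the packages named later in the deck.

```python
import math

def autoscale(rows):
    """Center each column to mean 0 and scale it to unit standard deviation."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]

def first_component(z, iterations=200):
    """Loadings and scores of the first principal component (power iteration)."""
    n, k = len(z), len(z[0])
    # Correlation matrix of the autoscaled data: C = Z'Z / (n - 1).
    corr = [[sum(row[r] * row[s] for row in z) / (n - 1) for s in range(k)]
            for r in range(k)]
    b = [1.0 / math.sqrt(k)] * k  # starting guess
    for _ in range(iterations):
        v = [sum(corr[r][s] * b[s] for s in range(k)) for r in range(k)]
        norm = math.sqrt(sum(x * x for x in v))
        b = [x / norm for x in v]
    # Score of observation i: t_i = b_1 * z_i1 + b_2 * z_i2 + ... + b_k * z_ik.
    t = [sum(bj * zj for bj, zj in zip(b, row)) for row in z]
    return b, t

# Made-up data: columns 1 and 2 move together, column 3 mostly does not.
data = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.1],
    [3.0, 6.2, 0.4],
    [4.0, 7.8, 0.2],
    [5.0, 10.1, 0.3],
]
loadings, scores = first_component(autoscale(data))
print([round(v, 3) for v in loadings])
print([round(v, 3) for v in scores])
```

Because the loading vector has unit length, every loading lies between -1 and 1, as the slide states. The two correlated columns receive the largest loadings.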

Principal Component Analysis Fundamentals
[Figure: observations in X1, X2, X3 space projected onto the 1st PC and 2nd PC]

PCA: Scores
[Figure: observation i in x1, x2, x3 space projected onto the 1st PC and 2nd PC, giving the scores $t_{i,1}$ and $t_{i,2}$]
The scores $t_{ia}$ (observation i, dimension a) are the places along the component lines where the observations are projected.

PCA: Loadings
The loadings $p_{ak}$ (dimension a, variable k) indicate the importance of variable k to the given dimension. $p_{ak}$ is the direction cosine of the given component line vs. the $x_k$ coordinate axis, i.e. the cosine of the angle between them.
[Figure: the 1st PC in x1, x2, x3 space with its angle to each coordinate axis]
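A short numeric check of the direction-cosine interpretation, using a hypothetical unit-length loading vector (the values are chosen for easy arithmetic, not taken from the presentation):

```python
import math

# Hypothetical unit-length loading vector for one component in (x1, x2, x3).
loadings = [0.6, 0.0, 0.8]

# Each loading is the cosine of the angle between the component line and
# the corresponding coordinate axis, so the angle follows from acos.
angles_deg = [math.degrees(math.acos(p)) for p in loadings]

# Direction cosines of any line satisfy cos^2 + cos^2 + cos^2 = 1.
sum_of_squares = sum(p * p for p in loadings)

print([round(a, 1) for a in angles_deg], round(sum_of_squares, 6))
```

A loading of 0 corresponds to a 90 degree angle: the component is perpendicular to that variable's axis, so the variable contributes nothing to it. Loadings near 1 or -1 mean the component runs almost along that axis.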

PCA Example
- 10 process responses obtained on each observation
- Data represented weekly process response averages
- Data spanned 10 months
- Objective: Determine if the system was stable.

PCA Score Plot
[Figure: PC #2 vs. PC #1 scores, annotated “Process Shift June 30 (5_30)”]

PCA Loadings Plot
[Figure: loadings on PC #2 vs. loadings on PC #1 for variables X1 to X10]

[Figure: PC #1 vs. PC #2, annotated “Process Shift June 30 (5_30)”]
Relative to the process shift, X1 and X5 were high in value and X4 and X8 were low in value.
- Positively correlated variables: X1 with X5, and X4 with X8
- Negatively correlated variables: X1 and X5 versus X4 and X8

Process variable X1 increased in value when the system shifted from the left side to the right side of the PCA score plot.

Variables X1 and X5 were positively correlated.

Partial Least Squares Technique Understanding Relationships Between Process & Response Variables

Partial Least Squares Fundamentals
[Figure: observations in X space (X1, X2, X3) and Y space (Y1, Y2, Y3) projected onto planes]

TA Filter Example
Objective: Relate Filtrate, TA Catalyst and Dryer Temp to Filter Speed, Vacuum, Wash Acid, Weir Level, Nash Discharge Pressure and Feed Tank Temperature
– Keep Filtrate High, TA Catalyst Low
Data: 12-hour averages from PI collected over a four-month period

TA Filter

TA Filter Relationships
[Figure: PLS weight plot, with label “Catalyst”]
- Higher filter speed and vacuum pressure increased the filtrate flow and catalyst content but lowered the dryer temp.
- Higher weir level, Nash discharge pressure and Op tank temp increased filtrate flow.
- Wash acid flow had no driving effect on the responses.

PLS Results
- Obtain Weight Plots (Previous Slide)
  – Show the inter-relationships between the Xs and Ys
- Obtain Regression Coefficients
  – Can be used to generate a response surface plot
- Display Variables Important to Prediction (VIP)
- Display Residual Plots and Distance to the Model Plot
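To make the weights and regression coefficients concrete, here is a minimal sketch of a one-component PLS fit for a single response. It is not the SIMCA-P or JMP implementation. The function name and data are hypothetical, and the data are only centered (not scaled) to keep the arithmetic easy to follow.

```python
import math

def center(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def pls1_one_component(x_cols, y):
    """One-component PLS for a single response.

    x_cols is a list of X columns and y is the response column.
    Returns weights w, scores t and regression coefficients b, all for
    centered data.
    """
    xc = [center(col) for col in x_cols]
    yc = center(y)
    # Weights: covariance of each X column with y, normalized to length 1.
    cov = [sum(xv * yv for xv, yv in zip(col, yc)) for col in xc]
    norm = math.sqrt(sum(c * c for c in cov))
    w = [c / norm for c in cov]
    # Scores: projection of each observation onto the weight vector.
    n = len(yc)
    t = [sum(w[j] * xc[j][i] for j in range(len(xc))) for i in range(n)]
    # Inner regression of y on the scores, then back to X coefficients.
    q = sum(tv * yv for tv, yv in zip(t, yc)) / sum(tv * tv for tv in t)
    b = [wj * q for wj in w]
    return w, t, b

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0 * v for v in x1]   # perfectly collinear with x1
y = [3.0 * v for v in x1]
w, t, b = pls1_one_component([x1, x2], y)
print([round(v, 3) for v in w], [round(v, 3) for v in b])
```

The two X columns are perfectly collinear, which would make ordinary least squares ill-defined, yet the PLS coefficients (0.6 and 1.2) reproduce y exactly: 0.6·x1 + 1.2·x2 = 3·x1. The weights show how strongly each X variable is tied to the response, which is the information a weight plot displays.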

Correlation Does Not Always Mean Causation

PLS Discriminant Technique Determine What Drives Data Groups To Be Different

Objective
Given groups of data from a particular process, determine what makes the groups different with respect to the given measurements.
Example: TA %T
– Measurements: 4-HMB, TMA, TPAD, 4-HBA, 4-CBA, IPA, BA, PTAD, p-TA, 2,7-DCF, 2,6-DCF, 4-4-DCB, 3,5-DCF, 9-F-2-CA, 9-F-4-CA, 2,6-DCA, 4,4-DCS, L*, a*, b*, .1%, .9%, Mean, %T
– Daily Numbers
– Data taken from Convey Line #1 and #2
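PLS discriminant analysis is commonly set up by coding group membership as a dummy response and running PLS against it. The sketch below shows only the first weight vector for two groups. The measurement names and values are hypothetical stand-ins, not the TA %T data.

```python
import math

def autoscale(values):
    """Center to mean 0 and scale to unit standard deviation."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]

# Hypothetical daily measurements; the first three rows belong to group 1.
impurity = [0.10, 0.12, 0.11, 0.30, 0.32, 0.31]   # differs between groups
moisture = [1.00, 1.20, 1.10, 1.10, 1.00, 1.20]   # similar in both groups
group = [1, 1, 1, 2, 2, 2]

# Dummy response: +1 for group 1, -1 for group 2 (already centered here
# because the groups are the same size).
dummy = [1.0 if g == 1 else -1.0 for g in group]

# First PLS weight vector: covariance of each scaled variable with the
# dummy response, normalized to unit length.
columns = {"impurity": autoscale(impurity), "moisture": autoscale(moisture)}
cov = {name: sum(x * d for x, d in zip(col, dummy))
       for name, col in columns.items()}
norm = math.sqrt(sum(c * c for c in cov.values()))
weights = {name: c / norm for name, c in cov.items()}

print({name: round(v, 3) for name, v in weights.items()})
```

The variable with the largest absolute weight is the one that separates the groups. Its sign says in which group it runs high: here the negative weight on the hypothetical impurity variable means it is high in group 2, the group coded -1.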

TA %T

PLS Discriminant Analysis
[Figure: High %T and Low %T groups]

What Measurements Separated the Groups?
- The high %T group ($DA1) was high in %T, 0.1, Mean and L.
- The low %T group ($DA2) had several measurements that were high in value and were positively correlated (see next slide for details).

The low %T group ($DA2) had several variables that were correlated and high in value: 4,4’-DCS, 4-CBA, TMA and p-TA

Cat

Computer Software JMP Software – SIMCA-P from Umetrics –