Published by Amy Sutton. Modified over 9 years ago.
2
Data Mining Manufacturing Data
Dave E. Stevens
Eastman Chemical Company, Kingsport, TN
3
Presentation Outline
- Intro: Data Mining Manufacturing Data
- Data Preparation
- Principal Component Analysis
- Partial Least Squares
- PLS Discriminant Analysis
4
Manufacturing Data Then and Now
40 Years Ago
- Few Measurements
- Temp, Press., Flows
Today
- Many Measurements
- Very Often
- Creates Large Data Sets
Purposes For Measuring
- Process “State”
- Relationships (X, X to Y)
- Classification
- Optimization
5
Concerns With Current Manufacturing Data
Dimensionality (Large):
- >1000 process variables every few seconds
- >10 quality variables every few hours
- Data Overload: the analyst concentrates on only a few variables and ignores most of the information!
Collinearity:
- Not 1000 independent things at work.
- Only a few underlying events affect all variables.
- Variables are all highly correlated.
Noise:
Missing Data:
6
Multivariate Data Concept
[Figure: scatter of observations with two univariate control charts, BreakLoad Control Chart and Elongation Control Chart.]
Is This Process In Control?
7
Data Preparation
Data collected in a Process Data Historian will have process up and down times recorded from the instrumentation.
Need a software tool that permits easy methods to clean the data and do initial Exploratory Data Analyses.
JMP Software
–Interactive Graphing
–Removal of Outliers Graphically or by Variable Selection Criteria
–Join and/or Subset Data Tables
–Statistical Analyses
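The cleaning step can be sketched in code. The slide names JMP for this work; the NumPy stand-in below is not JMP's workflow, only an illustration of criteria-based cleaning. The function name `clean_historian_data`, the "running" indicator column, the downtime threshold, and the robust z-score cutoff are all assumptions made for the example.

```python
import numpy as np

def clean_historian_data(data, run_col, run_min, mad_limit=5.0):
    """Keep rows recorded while the process was up and free of gross outliers.

    data      : 2-D array, rows = time stamps, columns = process variables
    run_col   : index of a column that shows the unit is running (assumed)
    run_min   : values at or below this are treated as downtime (assumed rule)
    mad_limit : robust z-score cutoff for outliers (illustrative default)
    """
    data = np.asarray(data, dtype=float)
    up = data[:, run_col] > run_min              # drop downtime rows
    up &= ~np.isnan(data).any(axis=1)            # drop rows with missing readings
    kept = data[up]
    med = np.median(kept, axis=0)
    mad = np.median(np.abs(kept - med), axis=0)  # median absolute deviation
    mad[mad == 0] = np.inf                       # constant columns are never flagged
    robust_z = 0.6745 * np.abs(kept - med) / mad
    return kept[(robust_z <= mad_limit).all(axis=1)]

# Hypothetical historian extract: column 0 = feed flow, column 1 = temperature
raw = np.array([
    [100.0, 250.1],
    [101.0, 249.8],
    [  0.0,  25.0],    # unit down
    [ 99.5, 250.3],
    [100.5, 900.0],    # instrument spike
    [100.2, np.nan],   # missing reading
    [ 99.8, 250.0],
    [100.1, 249.9],
])
clean = clean_historian_data(raw, run_col=0, run_min=10.0)
```

A median-based cutoff is used here because a single large spike inflates the ordinary standard deviation and can hide itself; graphical removal, as the slide describes, remains the more flexible option.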
8
Principal Components Analysis
Understanding Relationships Between Process Variables
9
Principal Component Analysis
Principal Component Analysis is a projection technique.
Raw data are first “centered” and “scaled”.
Each principal component represents a direction through the data that captures the maximum amount of the remaining raw data variation.
For each principal component (a), new data values (scores) are generated for each obs. (i) as a linear combination of the raw X variables (k):
$t_{i,a} = b_{a,1} X_{i,1} + b_{a,2} X_{i,2} + \dots + b_{a,k} X_{i,k}$ for each obs. $i$,
where the b’s are loadings (-1 to 1).
Each successive principal component represents less and less of the raw data variation.
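The projection described on this slide can be sketched with NumPy. This is a minimal illustration, assuming unit-variance scaling and a singular value decomposition to find the component directions; the function name `pca` and the synthetic data (five variables driven by two underlying events) are made up for the example and are not from the presentation.

```python
import numpy as np

def pca(X, n_components):
    """Return (scores, loadings, explained) for centered, unit-variance-scaled X."""
    X = np.asarray(X, dtype=float)
    # "Center" and "scale": subtract column means, divide by column std. dev.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are unit-length directions, ordered by decreasing variation
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components]                 # b[a, k], each between -1 and 1
    scores = Z @ loadings.T                      # t[i, a] = sum over k of b[a, k] * Z[i, k]
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained

# Synthetic data: 5 measured variables driven by 2 underlying events plus noise
rng = np.random.default_rng(0)
events = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = events @ mixing + 0.1 * rng.normal(size=(200, 5))

scores, loadings, explained = pca(X, n_components=3)
print(np.round(explained, 3))   # fraction of variation captured by each component
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives a score plot, and plotting `loadings[0]` against `loadings[1]` gives a loadings plot.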
10
Principal Component Analysis Fundamentals
[Figure: observations in X1, X2, X3 space with the 1st PC and 2nd PC lines and the projections onto them.]
11
PCA: Scores
The scores $t_{i,a}$ (observation i, dimension a) are the places along the component lines where the observations are projected.
[Figure: obs. i in x1, x2, x3 space projected onto the 1st PC and 2nd PC lines, giving scores $t_{i,1}$ and $t_{i,2}$.]
12
PCA: Loadings
The loadings $p_{ak}$ (dimension a, variable k) indicate the importance of the variable k to the given dimension.
$p_{ak}$ is the direction cosine (the cosine of the angle between the given component line and the $x_k$ coordinate axis).
[Figure: the 1st PC line in x1, x2, x3 space with its angle to each coordinate axis marked.]
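The direction-cosine reading of the loadings can be checked numerically. The sketch below assumes the first component direction is taken from an SVD of centered, scaled data; the three correlated variables are synthetic and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
# Three synthetic variables that share one underlying driver
driver = rng.normal(size=(100, 1))
X = np.hstack([driver + 0.2 * rng.normal(size=(100, 1)) for _ in range(3)])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

_, _, Vt = np.linalg.svd(Z, full_matrices=False)
p1 = Vt[0]                         # loadings of the 1st PC (a unit vector)

# Cosine of the angle between the 1st PC line and each coordinate axis
axes = np.eye(3)
cosines = axes @ p1 / (np.linalg.norm(axes, axis=1) * np.linalg.norm(p1))
angles_deg = np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))
```

Because the component direction has unit length, the cosines equal the loadings themselves and their squares sum to one, which is why a loading near +1 or -1 means the component runs almost parallel to that variable's axis.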
13
PCA Example 10 process responses obtained on each observation Data represented weekly process response averages Data spanned 10 months Objective: Determine if the system was stable.
14
PCA Score Plot
[Figure: score plot of PC #1 vs. PC #2, annotated "Process Shift June 30 (5_30)".]
15
PCA Loadings Plot
[Figure: Loadings PC #1 vs. Loadings PC #2 for variables X1 through X10.]
16
Process Shift June 30 (5_30)
[Figure: plot with axes PC #1 and PC #2.]
Relative to the process shift, X1 and X5 were high in value and X4 and X8 were low in value.
Positively correlated variables: X1 with X5, and X4 with X8.
Negatively correlated variables: X1 and X5 versus X4 and X8.
17
Process variable X1 increased in value when the system shifted from the left side to the right side on the PCA score plot.
18
Variables X1 and X5 were positively correlated.
19
Partial Least Squares Technique Understanding Relationships Between Process & Response Variables
20
Partial Least Squares Fundamentals
[Figure: X Space (axes X1, X2, X3) and Y Space (axes Y1, Y2, Y3), each showing a plane through the data and the projections onto it.]
21
TA Filter Example
Objective: Relate Filtrate, TA Catalyst and Dryer Temp to Filter Speed, Vacuum, Wash Acid, Weir Level, Nash Discharge Pressure and Feed Tank Temperature
–Keep Filtrate High, TA Catalyst Low
Data: 12-hour averages from PI collected over a four-month period
22
TA Filter
23
TA Filter Relationships
[Plot label: Catalyst]
Higher filter speed and vacuum pressure increased the filtrate flow and catalyst content but lowered the dryer temp.
Higher weir level, Nash discharge pressure and Op tank temp increased filtrate flow.
Wash acid flow had no driving effect on the responses.
24
PLS Results
Obtain Weight Plots (Previous Slide)
–Show the inter-relationships between the Xs and Ys
Obtain Regression Coefficients
–Can be used to generate a response surface plot
Display Variables Important to Prediction (VIP)
Display Residual Plots and Distance to the Model Plot
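The weights, regression coefficients and VIP values listed on this slide can be sketched for a single response. The presentation does not say how SIMCA-P computes them; the code below assumes the textbook NIPALS algorithm on centered, unit-variance data and the common VIP formula, and the function name `pls1_nipals` and the synthetic data are made up for the example.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Single-response PLS by NIPALS on centered, unit-variance data.

    Returns (W, B, vip): X weights (K x A), regression coefficients for the
    scaled variables (K,), and VIP values (K,).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    E = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # scaled X, deflated below
    f = (y - y.mean()) / y.std(ddof=1)                 # scaled y, deflated below
    n, K = E.shape
    W = np.zeros((K, n_components))
    P = np.zeros((K, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)        # unit-length weight vector
        t = E @ w                     # X scores for this component
        tt = t @ t
        p = E.T @ t / tt              # X loadings
        q[a] = f @ t / tt             # y loading
        E = E - np.outer(t, p)        # remove what this component explained
        f = f - q[a] * t
        W[:, a], P[:, a], T[:, a] = w, p, t
    B = W @ np.linalg.solve(P.T @ W, q)            # coefficients on scaled X
    ssy = q ** 2 * np.sum(T ** 2, axis=0)          # y sum of squares explained per component
    vip = np.sqrt(K * (W ** 2 @ ssy) / ssy.sum())
    return W, B, vip

# Synthetic process: the response is driven by the first two of four X variables
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.05 * rng.normal(size=300)
W, B, vip = pls1_nipals(X, y, n_components=2)
```

Plotting the columns of `W` against each other gives an X-weight plot; with this VIP definition the squared values sum to the number of X variables, so values above 1 mark variables of above-average importance.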
25
Correlation Does Not Always Mean Causation
26
PLS Discriminant Technique
Determine What Drives Data Groups To Be Different
27
Objective
Given groups of data from a particular process, determine what makes the groups different with respect to the given measurements.
Example: TA %T
–Measurements: 4-HMB, TMA, TPAD, 4-HBA, 4-CBA, IPA, BA, PTAD, p-TA, 2,7-DCF, 2,6-DCF, 4-4-DCB, 3,5-DCF, 9-F-2-CA, 9-F-4-CA, 2,6-DCA, 4,4-DCS, L*, a*, b*, .1%, .9%, Mean, %T
–Daily Numbers
–Data taken from Convey Line #1 and #2
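A two-group discriminant analysis of this kind can be sketched as a PLS regression on a dummy group indicator. This is an assumption about the general technique, not a reproduction of the SIMCA-P analysis in the presentation; the function name `plsda_two_groups`, the 0/1 coding, the 0.5 classification threshold, and the synthetic measurements are all illustrative.

```python
import numpy as np

def plsda_two_groups(X, in_group1, n_components=2):
    """PLS regression of a 0/1 group indicator on scaled X (two-group PLS-DA).

    Returns (T, W, y_hat): X scores, X weights, and the fitted indicator.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(in_group1, dtype=float)             # dummy response: 1 or 0
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # centered, scaled X
    E = Z.copy()
    f = y - y.mean()
    n, K = E.shape
    W = np.zeros((K, n_components))
    P = np.zeros((K, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)        # unit-length weight vector
        t = E @ w                     # scores that best separate the groups
        tt = t @ t
        p = E.T @ t / tt
        q[a] = f @ t / tt
        E = E - np.outer(t, p)
        f = f - q[a] * t
        W[:, a], P[:, a], T[:, a] = w, p, t
    B = W @ np.linalg.solve(P.T @ W, q)
    y_hat = Z @ B + y.mean()          # fitted group indicator
    return T, W, y_hat

# Synthetic daily data: two groups of 40 days, 6 measurements each
rng = np.random.default_rng(7)
n_per_group, K = 40, 6
X = rng.normal(size=(2 * n_per_group, K))
in_group1 = np.arange(2 * n_per_group) < n_per_group
X[in_group1, 0] += 3.0             # group 1 runs high in measurement 0
X[~in_group1, 3:5] += 3.0          # group 2 runs high in measurements 3 and 4

T, W, y_hat = plsda_two_groups(X, in_group1)
predicted_group1 = y_hat > 0.5
```

A plot of `T[:, 0]` against `T[:, 1]` shows the group separation, and the signs of the first column of `W` show which measurements run high in which group.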
28
TA %T
29
PLS Discriminant Analysis
[Figure: plot showing the High %T and Low %T groups.]
30
What Measurements Separated the Groups?
The high %T group ($DA1) was high in %T, 0.1, Mean and L.
The low %T group ($DA2) had several measurements that were high in value and were positively correlated (see next slide for details).
31
The low %T group ($DA2) had several variables that were correlated and high in value: 4,4’-DCS, 4-CBA, TMA and p-TA.
32
Cat
33
Computer Software
JMP Software
–http://www.jmpdiscovery.com
SIMCA-P from Umetrics
–http://www.umetrics.com