1 Statistical Tools for Multivariate Six Sigma Dr. Neil W. Polhemus CTO & Director of Development StatPoint, Inc. Revised talk: www.statgraphics.com\documents.htm.

Presentation transcript:

1 Statistical Tools for Multivariate Six Sigma Dr. Neil W. Polhemus CTO & Director of Development StatPoint, Inc. Revised talk:

2 The Challenge The quality of an item or service usually depends on more than one characteristic. When the characteristics are not independent, considering each characteristic separately can give a misleading estimate of overall performance.

3 The Solution Proper analysis of data from such processes requires the use of multivariate statistical techniques.

4 Important Tools
• Statistical Process Control
  - Multivariate capability analysis
  - Multivariate control charts
• Statistical Model Building*
  - Data mining: dimensionality reduction
  - DOE: multivariate optimization
* Regression and classification.

5 Example #1: Textile fiber
• Characteristic #1: tensile strength (115.0 ± 1.0)
• Characteristic #2: diameter (1.05 ± 0.01)

6 Individuals Charts

7 Capability Analysis (each separately)

8 Scatterplot

9 Multivariate Normal Distribution

10 Control Ellipse

11 Multivariate Capability Determines joint probability of being within the specification limits on all characteristics.
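As a rough illustration of the joint probability described on this slide, the sketch below estimates by simulation the fraction of items that meet both specifications of Example #1 at once. The specification limits come from the example; the process means, standard deviations, correlation, and the function name `joint_in_spec_fraction` are made-up assumptions, and a bivariate normal distribution is assumed.

```python
import math
import random


def joint_in_spec_fraction(mean, sd, rho, lower, upper, n=200_000, seed=1):
    """Estimate P(both characteristics within spec) for a bivariate normal."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Cholesky step: gives the second variable correlation rho with the first.
        x1 = mean[0] + sd[0] * z1
        x2 = mean[1] + sd[1] * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        if lower[0] <= x1 <= upper[0] and lower[1] <= x2 <= upper[1]:
            inside += 1
    return inside / n


# Spec limits from Example #1: strength 115.0 +/- 1.0, diameter 1.05 +/- 0.01.
spec_lower = (114.0, 1.04)
spec_upper = (116.0, 1.06)

# Hypothetical process parameters, chosen only for illustration.
p_joint = joint_in_spec_fraction((115.0, 1.05), (0.4, 0.004), 0.8,
                                 spec_lower, spec_upper)
dpm = (1.0 - p_joint) * 1_000_000
print(f"joint in-spec fraction = {p_joint:.4f}, estimated DPM = {dpm:.0f}")
```

Rerunning with `rho=0.0` gives the product of the two separate in-spec probabilities, which is the figure a one-characteristic-at-a-time analysis implicitly assumes.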

12 Multivariate Capability Indices Defined to give the same DPM (defects per million) as in the univariate case.
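One way to read "the same DPM as in the univariate case" is to ask which univariate index would produce the observed joint DPM. The sketch below does this under one specific convention, which is an assumption and not necessarily the definition used in the talk: a centered, two-sided normal process, for which DPM = 2(1 − Φ(3K)) × 10⁶. The function name is hypothetical.

```python
from statistics import NormalDist


def equivalent_capability_index(dpm):
    """Index K such that a centered univariate normal process with Cp = K has this DPM."""
    p_defect = dpm / 1_000_000
    # Centered two-sided case: p_defect = 2 * (1 - Phi(3K)), solved for K.
    z = NormalDist().inv_cdf(1.0 - p_defect / 2.0)
    return z / 3.0


# About 2700 DPM corresponds to an index close to 1 under this convention.
print(round(equivalent_capability_index(2700), 2))
```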

13 More than 2 Variables

14 Hotelling’s T-Squared Measures the distance of each point from the centroid of the data (or the assumed distribution).
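A minimal sketch of the distance this slide describes, for the two-variable case: T² = (x − x̄)′ S⁻¹ (x − x̄), where x̄ is the sample centroid and S the sample covariance matrix. The data points and the function name are invented for illustration; the centroid and covariance are estimated from the same points.

```python
def t_squared_2d(points):
    """Return Hotelling's T-squared of each 2-D point from the sample centroid."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] with divisor n - 1.
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    result = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d' S^-1 d written out with the closed-form inverse of a 2x2 matrix.
        result.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return result


# Hypothetical (strength, diameter) measurements.
sample = [(115.1, 1.051), (114.8, 1.047), (115.3, 1.053),
          (114.9, 1.049), (115.0, 1.050), (115.6, 1.044)]
for point, t2 in zip(sample, t_squared_2d(sample)):
    print(point, round(t2, 2))
```

A T-squared chart plots these values in time order against an upper control limit; the limit itself depends on the sample size and number of variables and is not computed here.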

15 T-Squared Chart

16 T-Squared Decomposition

17 Statistical Model Building
• Defining relationships (regression and ANOVA)
• Classifying items
• Detecting unusual events
• Optimizing processes
When the response variables are correlated, it is important to consider the responses together. When the number of variables is large, the dimensionality of the problem often makes it difficult to determine the underlying relationships.

18 Example #2

19 Matrix Plot

20 Multiple Regression

21 Reduced Models
MPG City = *Weight *Wheelbase (R² = 73.0%)
MPG City = *Horsepower *Passengers *Width (R² = 64.3%)

22 Dimensionality Reduction
Constructing linear combinations of the variables can often provide important insights.
• Principal components analysis (PCA) and principal components regression (PCR): construct linear combinations of the predictor variables X that capture the greatest variance, then use those combinations to predict the responses.
• Partial least squares (PLS): finds components that explain as much as possible of the variation in both the X's and the Y's simultaneously, by maximizing the covariance between the X and Y components.
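To make "the linear combination with the greatest variance" concrete, here is a small sketch that finds the first principal component by power iteration on the sample covariance matrix. It is an illustration only: the function name and data are invented, the variables are centered but not standardized (an assumption; many packages standardize first), and power iteration is a stand-in for the full eigen-decomposition a statistics package would use.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit weight vector, variance explained) of the first principal component."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the variables (divisor n - 1).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Start vector; assumed not to be orthogonal to the leading eigenvector.
    w = [1.0] * m
    for _ in range(iterations):
        # Power iteration: apply the covariance matrix, then renormalize.
        w = [sum(cov[i][j] * w[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    # Variance of the component scores equals w' * cov * w.
    variance = sum(w[i] * cov[i][j] * w[j] for i in range(m) for j in range(m))
    return w, variance


# Two perfectly correlated hypothetical predictors: x2 = 2 * x1.
data = [(t, 2.0 * t) for t in range(5)]
weights, variance = first_principal_component(data)
print([round(v, 3) for v in weights], round(variance, 3))
```

In principal components regression, the scores obtained from the first few such weight vectors replace the original correlated predictors in the regression.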

23 Principal Components Analysis

24 Scree Plot

25 Component Weights
C1 = 0.377*Engine Size *Horsepower *Passengers *Length *Wheelbase *Width *U Turn Space *Weight
C2 = *Engine Size – 0.593*Horsepower *Passengers *Length *Wheelbase – 0.042*Width – 0.026*U Turn Space – 0.030*Weight

26 Interpretation

27 PC Regression

28 Contour Plot

29 PLS Model Selection

30 PLS Coefficients Selecting to extract 3 components:

31 Interpretation

32 Neural Networks

33 Bayesian Classifier

34 Classification

35 Design of Experiments When more than one characteristic is important, finding the optimal operating conditions usually requires a tradeoff of one characteristic for another. One approach to finding a single solution is to use desirability functions.

36 Example #3
Myers and Montgomery (2002) describe an experiment on a chemical process (20-run central composite design):

Response variable        Goal
Conversion percentage    Maximize
Thermal activity         Maintain between 55 and 60

Input factor    Low          High
time            8 minutes    17 minutes
temperature     160 ˚C       210 ˚C
catalyst        1.5%         3.5%

37 Optimize Conversion

38 Optimize Activity

39 Desirability Functions: Maximization

40 Desirability Functions: Hit a target
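The two desirability shapes named on these slides can be sketched as follows. The functional forms are the commonly used Derringer-Suich style ramps, which is an assumption about what the slides plotted; the function names, shape exponents, and the numeric bounds in the usage lines (80 to 100 for conversion, a target of 57.5 for activity) are hypothetical.

```python
def desirability_maximize(y, low, high, shape=1.0):
    """Desirability for a larger-is-better response: 0 at or below low, 1 at or above high."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** shape


def desirability_target(y, low, target, high, shape_low=1.0, shape_high=1.0):
    """Desirability for hitting a target: 1 at the target, falling to 0 at low and high."""
    if y < low or y > high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** shape_low
    return ((high - y) / (high - target)) ** shape_high


# Hypothetical usage with the responses of Example #3.
print(desirability_maximize(90.0, low=80.0, high=100.0))
print(desirability_target(56.0, low=55.0, target=57.5, high=60.0))
```

Shape exponents above 1 make the function stricter near the bounds, and exponents below 1 make it more forgiving.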

41 Combined Desirability d_i = desirability of the i-th response given the settings of the m experimental factors X. D ranges from 0 (least desirable) to 1 (most desirable).
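The formula for D did not survive in this transcript. A common choice, assumed here and not confirmed by the slide, is the geometric mean of the individual desirabilities, optionally weighted by the importance of each response. The function name and the weights parameter are hypothetical.

```python
import math


def combined_desirability(d_values, weights=None):
    """Weighted geometric mean of individual desirabilities, each in [0, 1]."""
    if weights is None:
        weights = [1.0] * len(d_values)
    # Any completely undesirable response makes the overall result undesirable.
    if any(d == 0.0 for d in d_values):
        return 0.0
    log_sum = sum(w * math.log(d) for d, w in zip(d_values, weights))
    return math.exp(log_sum / sum(weights))


# Two hypothetical responses: one nearly ideal, one mediocre.
print(round(combined_desirability([0.95, 0.60]), 3))
```

Because the mean is geometric, D is 0 whenever any single response is unacceptable and reaches 1 only when every response is fully desirable, which matches the 0 to 1 range stated on the slide.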

42 Desirability Contours Max D = 0.959 at time = 11.14 minutes, temperature = 210.0 ˚C, and catalyst = 2.20%.

43 Desirability Surface

44 References
• Johnson, R. A. and Wichern, D. W. (2002). Applied Multivariate Statistical Analysis. Upper Saddle River: Prentice Hall.
• Mason, R. L. and Young, J. C. (2002). Multivariate Statistical Process Control with Industrial Applications. Philadelphia: SIAM.
• Montgomery, D. C. (2005). Introduction to Statistical Quality Control, 5th edition. New York: John Wiley and Sons.
• Myers, R. H. and Montgomery, D. C. (2002). Response Surface Methodology: Process and Product Optimization Using Designed Experiments, 2nd edition. New York: John Wiley and Sons.
Revised talk: