Chapter 11: Inferential methods in Regression and Correlation

Example: distribution of y
The relationship between age (x) and the change in systolic blood pressure (y, in mm Hg) 24 hours after a particular treatment follows the linear regression model y = ___ – 0.526x + e with σ = ___.
a) What is the mean value of y when x = 30? x = 50? x = 70?
b) What is the standard deviation of y when x = 30? x = 50? x = 70?
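Under the model y = α + βx + e with mean of e equal to 0 and standard deviation σ, the mean of y at a given x is α + βx, while the standard deviation of y is σ for every x. Because the intercept and σ are blank above, this minimal sketch takes them as arguments; `mean_y`, `sd_y` and the demo values for the intercept and σ are hypothetical, and only the slope –0.526 comes from the example.

```python
def mean_y(alpha: float, beta: float, x: float) -> float:
    """Mean of y at x under y = alpha + beta*x + e, where E(e) = 0."""
    return alpha + beta * x


def sd_y(sigma: float, x: float) -> float:
    """Standard deviation of y at x: equal to sigma for every x (constant variance)."""
    return sigma  # x is ignored on purpose: the spread of y does not depend on x


# Hypothetical intercept and sigma; only the slope -0.526 comes from the example.
ALPHA, BETA, SIGMA = 0.0, -0.526, 1.0

for x in (30, 50, 70):
    print(x, round(mean_y(ALPHA, BETA, x), 3), sd_y(SIGMA, x))
```

Whatever the intercept, the mean change in blood pressure drops by 0.526 × 20 = 10.52 mm Hg from x = 30 to x = 50, and again from x = 50 to x = 70.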

Example: Estimating α and β
The cetane number is a critical property in specifying the ignition quality of a fuel used in a diesel engine. Determining this number for a biodiesel fuel is expensive and time-consuming, so a way of predicting it is wanted. The data on the next slide are x = iodine value (g) and y = cetane number for a sample of 14 biofuels. The iodine value is the amount of iodine necessary to saturate a 100 g sample of oil.
a) What are the point estimates of α and β?
b) What is a point estimate of the true average cetane number for fuels whose iodine value is 100?

Example: Estimating α and β (cont)
[Data table: x = iodine value (g), y = cetane number]
a) What are the point estimates of α and β?

Example: Estimating α and β (cont)

Example: Estimating α and β (cont)
b) What is a point estimate of the true average cetane number for fuels whose iodine value is 100?
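The least-squares estimates are b = Sxy/Sxx for the slope and a = ȳ – b·x̄ for the intercept, and the point estimate of the mean response at x = 100 is a + b(100). A minimal Python sketch; the helper name `least_squares` is hypothetical and the numbers are made up, standing in for the cetane data:

```python
from statistics import fmean


def least_squares(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx          # slope estimate: b = Sxy / Sxx
    a = ybar - b * xbar    # intercept estimate: a = y-bar - b * x-bar
    return a, b


# Made-up illustration data (NOT the cetane data set).
x = [80, 90, 100, 110, 120]
y = [60, 57, 55, 52, 50]
a, b = least_squares(x, y)
print(a, b)            # intercept and slope estimates
print(a + b * 100)     # point estimate of the mean y at x = 100
```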

Example: Estimating α and β (cont)
c) Find the point estimate of the error standard deviation, σ.
d) What proportion of the observed variation in y can be attributed to the simple linear regression relationship between x and y?
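Part c) uses s_e = √(SSE/(n – 2)) with SSE = SST – b·Sxy, and part d) is the coefficient of determination r² = 1 – SSE/SST. A minimal sketch on made-up data (the function name `residual_summary` is hypothetical):

```python
from math import sqrt
from statistics import fmean


def residual_summary(x, y):
    """Return (s_e, r_squared) for the simple linear regression of y on x."""
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sst = sum((yi - ybar) ** 2 for yi in y)   # SST = Syy, total sum of squares
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    sse = max(sst - b * sxy, 0.0)             # SSE = SST - b*Sxy, guarded against rounding below 0
    s_e = sqrt(sse / (n - 2))                 # point estimate of sigma, df = n - 2
    r_squared = 1 - sse / sst                 # proportion of variation in y explained
    return s_e, r_squared


# Made-up illustration data (NOT the cetane data set).
s_e, r2 = residual_summary([80, 90, 100, 110, 120], [60, 57, 55, 52, 50])
print(round(s_e, 4), round(r2, 4))
```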

Example: Estimating α and β (SAS)
    The REG Procedure
    Model: MODEL1
    Dependent Variable: cetane
    Number of Observations Read                    15
    Number of Observations Used                    14
    Number of Observations with Missing Values      1

    Analysis of Variance
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model                                                          <.0001
    Error
    Corrected Total

    Root MSE          R-Square
    Dependent Mean    Adj R-Sq
    Coeff Var

    Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept                                                         <.0001
    iodine                                                            <.0001

Example: CI
e) What is the 95% CI for the true slope?

Example: Output (SAS)
    The SAS System                      09:20 Thursday, November 10,
    The REG Procedure
    Model: MODEL1
    Dependent Variable: cetane
    Number of Observations Read                    15
    Number of Observations Used                    14
    Number of Observations with Missing Values      1

    Analysis of Variance
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model                                                          <.0001
    Error
    Corrected Total

    Root MSE          R-Square
    Dependent Mean    Adj R-Sq
    Coeff Var

    Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   95% Confidence Limits
    Intercept                                                         <.0001
    iodine                                                            <.0001

    Sxx = ___
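The 95% confidence interval for the true slope β is b ± t*·s_b, where s_b = s_e/√Sxx and t* is the upper 0.025 t critical value with n – 2 df; the 95% Confidence Limits columns in PROC REG output report intervals of this form. A minimal sketch with made-up inputs (`slope_ci` is a hypothetical name); for the cetane data (n = 14, df = 12) t* would be 2.179:

```python
from math import sqrt


def slope_ci(b, s_e, sxx, t_crit):
    """Confidence interval b +/- t_crit * s_b for the true slope, with s_b = s_e / sqrt(Sxx)."""
    s_b = s_e / sqrt(sxx)          # estimated standard error of the slope
    margin = t_crit * s_b
    return b - margin, b + margin


# Made-up inputs; t_crit = 3.182 is the upper 0.025 t value for df = 3 (n = 5).
low, high = slope_ci(b=-0.25, s_e=sqrt(0.1), sxx=1000.0, t_crit=3.182)
print(low, high)
```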

Example: Hypothesis test
f) Is the model useful (that is, is there a useful linear relationship between x and y)?
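Part f) is the model utility test of H0: β = 0 versus Ha: β ≠ 0, with test statistic t = b/s_b and n – 2 df; the t Value and Pr > |t| entries for iodine in the SAS output correspond to this test. A sketch with made-up inputs (the function names are hypothetical):

```python
from math import sqrt


def slope_t_statistic(b, s_e, sxx, beta0=0.0):
    """t = (b - beta0) / s_b with s_b = s_e / sqrt(Sxx); compare with t on n - 2 df."""
    return (b - beta0) / (s_e / sqrt(sxx))


def reject_h0(t, t_crit):
    """Two-sided test: reject H0 when |t| exceeds the critical t value."""
    return abs(t) > t_crit


# Made-up inputs, df = 3, so the two-sided 5% critical value is 3.182.
t = slope_t_statistic(b=-0.25, s_e=sqrt(0.1), sxx=1000.0)
print(t, reject_h0(t, t_crit=3.182))
```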

Summary Slide: ANOVA table for simple linear regression
    Source               df       SS                    MS
    Model (Regression)   1        SSR = b·Sxy           MSR = SSR/1
    Error                n – 2    SSE = SST – b·Sxy     MSE = SSE/(n – 2)
    Total                n – 1    SST = Syy
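The ANOVA quantities can be computed directly from the data. In simple linear regression, F = MSR/MSE equals the square of the slope t statistic, so the F test and the t test for β reach the same conclusion. A minimal sketch (the function name and the made-up data are illustrative only):

```python
from statistics import fmean


def regression_anova(x, y):
    """ANOVA quantities for simple linear regression of y on x."""
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)  # SST = Syy, df = n - 1
    b = sxy / sxx
    ssr = b * sxy                            # SSR, df = 1
    sse = sst - ssr                          # SSE = SST - b*Sxy, df = n - 2
    msr = ssr / 1
    mse = sse / (n - 2)
    return {"SSR": ssr, "SSE": sse, "SST": sst, "MSR": msr, "MSE": mse, "F": msr / mse}


# Made-up illustration data.
table = regression_anova([80, 90, 100, 110, 120], [60, 57, 55, 52, 50])
print(table)
```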

Example: Hypothesis test for ρ
g) Using the population correlation coefficient, is the model useful (that is, is there a useful linear relationship between x and y)?
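The test of H0: ρ = 0 uses t = r√(n – 2)/√(1 – r²) with n – 2 df. In simple linear regression this value is algebraically identical to the slope statistic b/s_b, which is why both approaches answer the same question. A minimal sketch with a made-up r (`correlation_t` is a hypothetical name):

```python
from math import sqrt


def correlation_t(r, n):
    """t = r*sqrt(n - 2)/sqrt(1 - r^2) for H0: rho = 0, with df = n - 2."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# Made-up r and n for illustration.
r = -250 / sqrt(62800.0)
print(round(correlation_t(r, 5), 6))
```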

Example: Hypothesis test for ρ (2)
In some locations, there is a strong association between the concentrations of two different pollutants. The following data consist of concentrations of x = ozone (ppm) and y = secondary carbon (μg/m³).
[Data table: x = ozone (ppm), y = secondary carbon concentration (μg/m³)]

Example: Hypothesis test for ρ (2)
Using the population correlation coefficient, is this model useful? The summary statistics are:
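When only summary statistics are given, r can be computed from the raw sums via Sxx = Σx² – (Σx)²/n, Syy = Σy² – (Σy)²/n and Sxy = Σxy – (Σx)(Σy)/n, then r = Sxy/√(Sxx·Syy), followed by the t statistic for H0: ρ = 0. A sketch with made-up sums, not the ozone data (`r_from_sums` is a hypothetical name):

```python
from math import sqrt


def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy) computed from raw summary sums."""
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    return sxy / sqrt(sxx * syy)


# Made-up sums for x = 80, 90, ..., 120 and y = 60, 57, 55, 52, 50 (NOT the ozone data).
n = 5
r = r_from_sums(n=n, sum_x=500, sum_y=274, sum_x2=51000, sum_y2=15078, sum_xy=27150)
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)   # test statistic for H0: rho = 0, df = n - 2
print(round(r, 4), round(t, 2))
```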

Example: Hypothesis test for ρ (2)
    Analysis of Variance
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model
    Error
    Corrected Total

    Root MSE          R-Square
    Dependent Mean    Adj R-Sq
    Coeff Var

    Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept
    x

Example: Hypothesis test for ρ (2)

Multiple Linear Regression

Example: Multiple Linear Regression
In industrial settings it is important to know how long a tool will last (min). The cutting tool in this study is used to cut a particular type and size of cold-rolled steel. The predictors of interest are x1 = cutting speed (feet/min), x2 = feed rate (in/revolution), and x3 = depth of cut (in). The predicted model is y = ___ – ___x1 ___x2 – ___x3 + e.
a) What is the mean life of a tool used to cut to a depth of 0.03 inch at a cutting speed of 450 feet/min with a feed rate of 0.01 in/revolution?
b) What is the interpretation of β1 = ___? Of β2 = ___? Of β3 = ___?
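The mean response in part a) is found by plugging the predictor values into the fitted equation, and each βi is the change in mean tool life per one-unit increase in xi with the other predictors held fixed. A minimal sketch; `mlr_mean` and the coefficient values are hypothetical, since the fitted coefficients are blank above:

```python
def mlr_mean(coefs, xs):
    """Mean response b0 + b1*x1 + ... + bk*xk for coefficients (b0, b1, ..., bk)."""
    b0, *slopes = coefs
    if len(slopes) != len(xs):
        raise ValueError("need one coefficient per predictor")
    return b0 + sum(b * x for b, x in zip(slopes, xs))


# Hypothetical coefficients (b0, b1, b2, b3), NOT the fitted tool-life values.
coefs = (10.0, -0.01, -100.0, -50.0)
print(mlr_mean(coefs, (450, 0.01, 0.03)))  # speed, feed rate, depth of cut
```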

Example: Polynomial Regression
Suppose the mean daily peak load (MW) for a power plant and the maximum outdoor temperature (°F) for a sample of 10 days are given below.
a) What is the estimated regression line using a quadratic regression model? (Besides the equation of the line, include the values of adjusted r² and s_e.)
b) Using the line, predict the required peak power when the temperature is 98 °F.
[Data table: x_i = maximum outdoor temperature (°F), y_i = mean daily peak load (MW)]
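A quadratic regression is an ordinary multiple regression with predictors temp and temp², so k = 2, adjusted r² = 1 – (1 – r²)(n – 1)/(n – k – 1) and s_e = √(SSE/(n – k – 1)). The pure-Python sketch below solves the normal equations directly; `solve` and `fit_quadratic` are hypothetical helper names and the demo data are made up. Normal equations can be ill-conditioned when temp² is large, so for real data a library least-squares routine such as SAS PROC REG is preferable.

```python
from math import sqrt


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic(t, y):
    """Least-squares fit y-hat = b0 + b1*t + b2*t^2; returns (coefs, adj_r2, s_e)."""
    n, k = len(t), 2                                   # k = number of predictors (t, t^2)
    cols = [[1.0] * n, [float(ti) for ti in t], [float(ti * ti) for ti in t]]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * yi for u, yi in zip(ci, y)) for ci in cols]
    coefs = solve(xtx, xty)                            # normal equations X'X b = X'y
    fitted = [coefs[0] + coefs[1] * ti + coefs[2] * ti * ti for ti in t]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    s_e = sqrt(sse / (n - k - 1))
    return coefs, adj_r2, s_e


# Made-up data lying exactly on y = 2 + 3t + 0.5t^2 (NOT the power-plant data).
temps = [1, 2, 3, 4, 5]
loads = [2 + 3 * t + 0.5 * t * t for t in temps]
coefs, adj_r2, s_e = fit_quadratic(temps, loads)
print(coefs, adj_r2, s_e)
```

For part b), the prediction is coefs[0] + coefs[1]*98 + coefs[2]*98**2 once the model has been fitted to the temperature data.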

Example: Polynomial Regression (SAS)
    /* Add the squared temperature term needed for the quadratic model */
    data newpower;
       set power;
       temp2 = temp*temp;
    run;

    /* Fit load = b0 + b1*temp + b2*temp2 and save the residuals in data set fit */
    proc reg data=newpower;
       model load=temp temp2;
       output out=fit r=res;
    run;

    Analysis of Variance
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model                                                          <.0001
    Error
    Corrected Total

    Root MSE          R-Square
    Dependent Mean    Adj R-Sq
    Coeff Var

    Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept
    temp
    temp2

Example: Polynomial Regression (cont)
b) Using the line, predict the required peak power when the temperature is 98 °F.
    line         fitted equation                         adj r²    s_e
    linear       ŷ = ___ + ___temp
    quadratic    ŷ = ___ + ___temp + 0.27temp²

Residual Plots

Interaction Effect

I love statistics! Thank you for not eating me!

Example: Multiple Regression with Qualitative Predictors
A study is conducted to determine the effects of x1 = company size and x2 = the presence (1) or absence (0) of a safety program on y = the number of work hours lost due to work-related accidents (thousands). Twenty companies with no active safety program and 20 companies with an active safety program were randomly chosen. The SAS file (qualpred.txt) is on the class notes web site. The estimated regression line is ŷ = ___ + ___x1 – ___x2.
What are the interpretations of β1 = ___ and β2 = ___?
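With x2 coded 0/1, the coefficient of x2 is the estimated difference in mean hours lost between companies with and without a safety program of the same size, and the coefficient of x1 is the estimated change per unit of company size with program status held fixed (the two groups get parallel lines). A minimal sketch; the function name and coefficient values are hypothetical, not the fitted values from qualpred.txt:

```python
def predicted_hours_lost(b0, b1, b2, size, safety_program):
    """y-hat = b0 + b1*size + b2*x2, where x2 = 1 if a safety program is present, else 0."""
    x2 = 1 if safety_program else 0
    return b0 + b1 * size + b2 * x2


# Hypothetical coefficients (NOT the fitted values from qualpred.txt).
b0, b1, b2 = 1.5, 0.01, -1.2
gap = predicted_hours_lost(b0, b1, b2, 500, True) - predicted_hours_lost(b0, b1, b2, 500, False)
print(gap)  # equal to b2 (up to floating-point rounding) at any company size
```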

Conceptual Understanding
[Figure: total variation of Y, with the portions associated with X1, X2, and X3]

ANOVA table - MLR
    Source               df           SS                 MS
    Model (Regression)   k            SSM (from data)    MSM = SSM/k
    Error                n – k – 1    SSE (from data)    MSE = SSE/(n – k – 1)
    Total                n – 1        SST (from data)

Example: Multiple Linear Regression (cont)
a) Is there a useful linear relationship between the cutting tool lifetime and the predictors?
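This question is the global F test of H0: β1 = β2 = β3 = 0 versus Ha: at least one βi ≠ 0, using F = MSM/MSE with (k, n – k – 1) df; a small p-value indicates a useful linear relationship. A minimal sketch that also returns R² and adjusted R²; the function name and the sums of squares are made up for illustration:

```python
def mlr_anova(ssm, sse, n, k):
    """Global F test quantities for a multiple regression with k predictors and n observations."""
    msm = ssm / k                    # MSM = SSM / k
    mse = sse / (n - k - 1)          # MSE = SSE / (n - k - 1)
    sst = ssm + sse                  # SST = SSM + SSE, df = n - 1
    r2 = ssm / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return {"F": msm / mse, "df": (k, n - k - 1), "R2": r2, "adjR2": adj_r2}


# Made-up sums of squares for illustration.
result = mlr_anova(ssm=90.0, sse=10.0, n=24, k=3)
print(result)
```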

Example: MLR (cont)
    Analysis of Variance
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model                                                          <.0001
    Error
    Corrected Total

    Root MSE          R-Square
    Dependent Mean    Adj R-Sq
    Coeff Var

    Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept                                                         <.0001
    speed                                                             <.0001
    feed
    depth

Example: MLR (backwards elimination)
Backward elimination starts from the full model with speed, feed, and depth, whose output is the same as in the MLR example. The next step drops feed and refits the model with speed and depth.

Example: MLR (backwards elimination) (cont)
    Analysis of Variance
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model                                                          <.0001
    Error
    Corrected Total

    Root MSE          R-Square
    Dependent Mean    Adj R-Sq
    Coeff Var

    Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept                                                         <.0001
    speed                                                             <.0001
    depth

Example: MLR (backwards elimination) (cont)
                   full model                               w/o feed
    line           y = ___ – ___speed ___feed – ___depth     y = ___ – ___speed – ___depth
    R²
    adj R²
    ANOVA table    Model, Error, Total                      Model, Error, Total

Example: MLR (backwards elimination) (cont)
                     full     full – P    w/o feed    w/o feed – P
    F-test           20.93    <.0001                  <.0001
    t-tests: speed   –6.72    <.0001                  <.0001
    t-tests: feed
    t-tests: depth
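One common form of backward elimination repeatedly refits the model and removes the predictor with the largest t-test p-value, stopping when every remaining p-value is at or below a chosen cutoff α. The sketch below shows a single decision step only; the function name, the cutoff 0.05, and the p-values are hypothetical, and refitting after each removal is left to the regression software:

```python
def backward_elimination_step(p_values, alpha=0.05):
    """Return the predictor to drop (largest p-value above alpha), or None to stop.

    p_values maps predictor names to the p-values of their t-tests in the current model.
    """
    if not p_values:
        return None
    worst = max(p_values, key=p_values.get)
    return worst if p_values[worst] > alpha else None


# Hypothetical p-values, for illustration only.
print(backward_elimination_step({"speed": 0.0001, "feed": 0.40, "depth": 0.03}))
```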