Chapter 10: Inference for Regression

10.1: Simple Linear Regression
10.2: More Detail about Simple Linear Regression
Goals
– Describe the simple linear regression model (review of Ch. 2).
– Be able to perform the method (with the output from software packages, Lab 8).
– Use diagnostic plots to check the assumptions.
– Be able to perform inference on the slope (confidence interval and hypothesis test).
– Be able to determine whether there is an association between the response and explanatory variables.
– Be able to perform a hypothesis test using the correlation coefficient.
– Be able to state the similarities and differences between a confidence interval for a mean response and a prediction interval, and in which situations each would be used (if there is time).

Conditions for Linear Regression
We have n (x, y) pairs. For any fixed x, $y \sim N(\mu_y, \sigma)$. Each $y_i$ is independent of the other $y_j$'s. $\mu_y = \beta_0 + \beta_1 x$

Model for Linear Regression
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
Data = Fit + Error

Linear Regression
$\hat{y} = b_0 + b_1 x$
– $\hat{y}$ is an unbiased estimator for $\mu_y$
– $b_0$ is an unbiased estimator for $\beta_0$
– $b_1$ is an unbiased estimator for $\beta_1$

Linear Regression
$b_0 = \bar{y} - b_1 \bar{x}$
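The slope formula on this slide did not survive in the transcript. A minimal sketch of the usual least-squares computation, assuming the standard definition $b_1 = S_{xy}/S_{xx}$ together with the intercept formula above, is shown here. The function name `fit_line` and the five-point data set are hypothetical and are not the cetane data.

```python
from statistics import mean

def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    # S_xx and S_xy: sums of squared deviations and cross-products
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx            # slope (assumed standard formula)
    b0 = y_bar - b1 * x_bar     # intercept, as on the slide
    return b0, b1

# Hypothetical toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```

A point estimate of the mean response at a given x is then simply `b0 + b1 * x`.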

Other SS and df

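ANOVA table for Linear Regression
| Source | df | SS | MS |
|---|---|---|---|
| Model (Regression) | 1 | $\sum(\hat{y}_i - \bar{y})^2$ | |
| Error | $n - 2$ | $\sum(y_i - \hat{y}_i)^2$ | |
| Total | $n - 1$ | $\sum(y_i - \bar{y})^2$ | |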
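The three sums of squares in the table can be checked numerically. The sketch below assumes the usual conventions that each mean square is SS divided by its df, that the estimate of σ is the square root of the error mean square, and that the proportion of variation explained is the model SS over the total SS. Variable names and the toy data are hypothetical.

```python
from statistics import mean

# Hypothetical toy data, for illustration only (not the cetane data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssm = sum((yh - y_bar) ** 2 for yh in y_hat)            # Model SS, df = 1
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # Error SS, df = n - 2
sst = sum((yi - y_bar) ** 2 for yi in y)                # Total SS, df = n - 1

msm = ssm / 1
mse = sse / (n - 2)
s = mse ** 0.5            # estimate of sigma
r_squared = ssm / sst     # proportion of variation explained

print(f"SSM = {ssm:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}")
print(f"s = {s:.4f}, r^2 = {r_squared:.2f}")
```

The model and error rows add up to the total row, both in SS and in df.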

Conditions for Linear Regression
– The data are a simple random sample (SRS).
– Observations are independent of each other.
– The relationship is linear in the population.
– The response, y, is normally distributed around the population regression line.
– The standard deviation of the response is constant.
Important plots:
– Scatter plot
– Residual plot
– Histogram/Normal quantile plot of the residuals.
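The residual plot and the normal quantile plot are both drawn from the residuals $y_i - \hat{y}_i$. A minimal sketch of computing them, on hypothetical toy data rather than the cetane data, is given below; the plotting itself is left to whatever software is used in the lab.

```python
from statistics import mean

# Hypothetical toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Residual = observed response minus fitted response
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

for xi, e in zip(x, residuals):
    print(f"x = {xi}: residual = {e:+.2f}")
```

Plotted against x, residuals that scatter evenly around zero with no curve or funnel shape are consistent with the linearity and constant standard deviation conditions. Least-squares residuals always sum to zero, which is a quick sanity check on the fit.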

Residual Plots

Example: Linear Regression 1
The cetane number is a critical property in specifying the ignition quality of a fuel used in a diesel engine. Determining this number for a biodiesel fuel is expensive and time-consuming, so a way of predicting it is wanted. The data on the next slide are x = iodine value (g) and y = cetane number for a sample of 14 biofuels. The iodine value is the amount of iodine necessary to saturate a 100 g sample of oil.
a) Verify the assumptions required for linear regression.
b) Determine the equation of the fitted line.
c) What is a point estimate of the true average cetane number for biofuels whose iodine value is 100?
d) Estimate the value of σ.
e) What proportion of the observed variation in cetane number can be attributed to the iodine value?

Example: Linear Regression 1 (cont.)
[Data table of x (iodine value) and y (cetane number): values not preserved in this transcript.]

Example: SLR 1 - Scatterplot

Example: SLR 1 – Residual Plot

Example: SLR 1 – Normality

Example: SLR 1 – Fitted Line
[Data table and summary statistics (r, $s_x$, $s_y$, $\bar{y}$, $\bar{x}$): values not preserved in this transcript.]

Example: SLR – fitted line

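Example: SLR - s
[Data table of x and y: values not preserved in this transcript.]
Analysis of Variance
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | | | | | <.0001 |
| Error | | | | | |
| Corrected Total | | | | | |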

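Example: SLR 1
[Data table of x and y: values not preserved in this transcript.]
Analysis of Variance
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | | | | | <.0001 |
| Error | | | | | |
| Corrected Total | | | | | |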

Confidence Interval
Point estimates
– $b_0$ is an unbiased estimator for $\beta_0$
– $b_1$ is an unbiased estimator for $\beta_1$
Assumptions
– SRS
– Linearity
– Constant standard deviation of residuals
– Normality
If y is normal, then both $b_0$ and $b_1$ are normal.
If y is not normal, the Central Limit Theorem (CLT) still makes $b_0$ and $b_1$ approximately normal for large n.

Standard deviation for $b_1$
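The formula on this slide was not preserved in the transcript. The standard expression, written with the symbols used elsewhere in these slides ($S_{xx}$ for the sum of squared deviations of x, and s for the estimate of σ from the Error row of the ANOVA table), is assumed to be:

```latex
s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}, \qquad
S_{xx} = \sum (x_i - \bar{x})^2, \qquad
SE_{b_1} = \frac{s}{\sqrt{S_{xx}}}
```

The standard error shrinks when the residual scatter s is small or when the x values are widely spread (large $S_{xx}$).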

Confidence Interval for $\beta_1$
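The interval formula itself is missing from the transcript. A minimal sketch, assuming the usual form $b_1 \pm t^* \, SE_{b_1}$ with $t^*$ taken from a t distribution with n − 2 degrees of freedom, follows. The function name `slope_ci` and the toy data are hypothetical, and the critical value is passed in by the caller rather than computed.

```python
from math import sqrt
from statistics import mean

def slope_ci(x, y, t_star):
    """Return (lower, upper) for b1 +/- t_star * SE(b1).

    t_star is the t critical value with n - 2 degrees of freedom,
    supplied by the caller (for example from a t table).
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))          # estimate of sigma
    se_b1 = s / sqrt(s_xx)           # standard error of the slope
    margin = t_star * se_b1
    return b1 - margin, b1 + margin

# Hypothetical toy data (n = 5, so df = 3); 3.182 is the 95% t critical value for 3 df
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
lower, upper = slope_ci(x, y, t_star=3.182)
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

If the interval contains 0, the data are compatible with no linear relationship at that confidence level.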

Example: SLR 1 - Inference
For the cetane number data (x = iodine value, y = cetane number, n = 14 biofuels):
e) What is the 95% confidence interval for the population slope?
f) Is the model useful (that is, is there a useful linear relationship between iodine value and cetane number)?

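Example: SLR 1
[Data table of x and y: values not preserved in this transcript.]
Analysis of Variance
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | | | | | <.0001 |
| Error | | | | | |
| Corrected Total | | | | | |
$b_1$ = [value missing], $S_{xx}$ = [value missing]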

Example: SLR 1 – CI
We are 95% confident that the population slope is between [value missing] and [value missing].

Example: SLR – fitted line

LR Hypothesis Test: Summary
| Alternative | Hypothesis | P-Value |
|---|---|---|
| Upper-tailed | $H_a: \beta_1 > \Delta$ | $P(T \ge t)$ |
| Lower-tailed | $H_a: \beta_1 < \Delta$ | $P(T \le t)$ |
| Two-sided | $H_a: \beta_1 \ne \Delta$ | $2P(T \ge \lvert t \rvert)$ |
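The three rows of the table can be sketched in code. The test statistic is assumed to be the usual $t = (b_1 - \Delta)/SE_{b_1}$ with n − 2 degrees of freedom, since its formula is not in the transcript. To stay within the standard library, the tail area is obtained by numerically integrating the t density with Simpson's rule; that is a swapped-in technique for illustration, not what the course software does. All function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution (math.gamma limits df to a few hundred)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t), using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3      # integral of the density from 0 to |t|
    upper = 0.5 - area        # P(T >= |t|), by symmetry about 0
    return upper if t >= 0 else 1 - upper

def slope_p_value(b1, se_b1, df, delta=0.0, alternative="two-sided"):
    """Return (t, p) for H0: beta_1 = delta against the chosen alternative."""
    t = (b1 - delta) / se_b1
    if alternative == "upper":
        return t, t_upper_tail(t, df)
    if alternative == "lower":
        return t, 1 - t_upper_tail(t, df)
    return t, 2 * t_upper_tail(abs(t), df)

# Hypothetical numbers: slope 0.6, standard error 0.2828, df = 3
t_stat, p_value = slope_p_value(0.6, 0.2828427, df=3)
print(f"t = {t_stat:.3f}, two-sided P = {p_value:.3f}")
```

For the usual model utility test, Δ = 0 and the alternative is two-sided.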

Example: SLR 1 - HT
The data provide strong support (P = 2.13 × [exponent not preserved]) for the claim that there is a linear relationship between iodine value and cetane number.

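ANOVA table for Linear Regression
| Source | df | SS | MS |
|---|---|---|---|
| Model (Regression) | 1 | $\sum(\hat{y}_i - \bar{y})^2$ | |
| Error | $n - 2$ | $\sum(y_i - \hat{y}_i)^2$ | |
| Total | $n - 1$ | $\sum(y_i - \bar{y})^2$ | |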

LR Hypothesis Test: Summary
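The body of this summary slide is missing. Assuming it describes the ANOVA F test with F = MSM/MSE on 1 and n − 2 degrees of freedom, the sketch below shows that in simple linear regression this F statistic equals the square of the t statistic for the slope, so the two tests of $H_0: \beta_1 = 0$ agree. The data are a hypothetical toy set.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

msm = sum((yh - y_bar) ** 2 for yh in y_hat) / 1                 # Model mean square
mse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / (n - 2)  # Error mean square

f_stat = msm / mse                      # F statistic, df = (1, n - 2)
t_stat = b1 / (sqrt(mse) / sqrt(s_xx))  # slope t statistic for H0: beta_1 = 0

print(f"F = {f_stat:.3f}, t^2 = {t_stat ** 2:.3f}")
```

Because of this identity, the F test and the two-sided t test for the slope give the same P-value apart from rounding.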

Example: LR - Inference
For the cetane number data (x = iodine value, y = cetane number, n = 14 biofuels):
g) Perform the hypothesis test using the F test statistic.
h) Perform the hypothesis test using the population correlation coefficient.

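Example: LR – Inference - ANOVA
Parameter Estimates
| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | | | | | <.0001 |
| iodine | | | | | <.0001 |
Analysis of Variance
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | | | | | <.0001 |
| Error | | | | | |
| Corrected Total | | | | | |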

Example: LR – Inference (cont)
The data provide strong support (P = 2.09 × [exponent not preserved]) for the claim that there is a linear relationship between iodine value and cetane number.

Inference for Correlation: Assumptions
– The (x, y) pairs are independent.
– (x, y) is (jointly) normally distributed.
– There is a linear relationship between x and y.
– The residuals have constant variance.

LR Hypothesis Test: Summary
| Alternative | Hypothesis | P-Value |
|---|---|---|
| Upper-tailed | $H_a: \rho > \Delta$ | $P(T \ge t)$ |
| Lower-tailed | $H_a: \rho < \Delta$ | $P(T \le t)$ |
| Two-sided | $H_a: \rho \ne \Delta$ | $2P(T \ge \lvert t \rvert)$ |
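The test statistic for the correlation coefficient is not shown in the transcript. For the common case $H_0: \rho = 0$, the standard statistic is $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with n − 2 degrees of freedom, and the sketch below assumes that form. It is not valid for a nonzero null value of ρ. The function name `correlation_t` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean

def correlation_t(x, y):
    """Return (r, t) where t tests H0: rho = 0 with n - 2 degrees of freedom."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = s_xy / sqrt(s_xx * s_yy)            # sample correlation coefficient
    t = r * sqrt(n - 2) / sqrt(1 - r * r)   # assumed standard test statistic
    return r, t

# Hypothetical toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, t_stat = correlation_t(x, y)
print(f"r = {r:.4f}, t = {t_stat:.4f}")
```

This t value is algebraically the same as the slope t statistic for $H_0: \beta_1 = 0$, which is why the correlation test and the slope test reach the same conclusion.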

Example: LR - Inference
The data provide strong support (P = 2.12 × [exponent not preserved]) for the claim that there is a linear relationship between iodine value and cetane number.

$SE_{\hat{\mu}_h}$
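The formula for this standard error is missing from the transcript. Reading the subscript h as marking the x value of interest, $x_h$ (an assumption about the slide's notation), the standard expression for the standard error of the estimated mean response is:

```latex
\hat{\mu}_h = b_0 + b_1 x_h, \qquad
SE_{\hat{\mu}_h} = s \sqrt{\frac{1}{n} + \frac{(x_h - \bar{x})^2}{S_{xx}}}
```

The corresponding confidence interval is $\hat{\mu}_h \pm t^* \, SE_{\hat{\mu}_h}$ with n − 2 degrees of freedom. It is narrowest at $x_h = \bar{x}$ and widens as $x_h$ moves away from the mean of the x values.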

Example: LR - Inference
For the cetane number data (x = iodine value, y = cetane number, n = 14 biofuels):
i) What is the 95% confidence interval for the mean cetane number when the iodine value is 100?
j) Predict the cetane number for the next sample of biofuel that contains an iodine value of

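Example: LR – Inference
Analysis of Variance
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | | | | | <.0001 |
| Error | | | | | |
| Corrected Total | | | | | |
$\hat{\sigma}$ = [value missing], $S_{xx}$ = [value missing], $\bar{x}$ = [value missing]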

Example: SLR (cont)
We are 95% confident that the population mean cetane number is between [value missing] and [value missing] when the iodine value is [value missing].

$SE_{\hat{y}}$
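The formula for the prediction standard error was not preserved. Using $x_h$ for the x value of the new observation (notation assumed, not confirmed by the slide), the standard expression is:

```latex
\hat{y} = b_0 + b_1 x_h, \qquad
SE_{\hat{y}} = s \sqrt{1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{S_{xx}}}
```

The extra 1 under the square root accounts for the variability of a single new response around its mean, so a prediction interval $\hat{y} \pm t^* \, SE_{\hat{y}}$ is always wider than the confidence interval for the mean response at the same $x_h$.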

Example: SLR (cont)
We are 95% confident that the next cetane number is between [value missing] and [value missing] when the iodine value is 100.
Mean response: (52.754, [value missing])
Prediction interval: ([values missing])

CI for mean response / Prediction interval
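The side-by-side comparison on this slide is missing from the transcript. A minimal sketch of both intervals, assuming the standard formulas $s\sqrt{1/n + (x_h - \bar{x})^2/S_{xx}}$ for the mean response and $s\sqrt{1 + 1/n + (x_h - \bar{x})^2/S_{xx}}$ for a new observation, is given below. The function name `intervals_at`, the argument `x_h`, and the toy data are hypothetical; the critical value `t_star` is supplied by the caller.

```python
from math import sqrt
from statistics import mean

def intervals_at(x, y, x_h, t_star):
    """Return (ci, pi, se_mean, se_pred) at x = x_h.

    ci is the confidence interval for the mean response,
    pi is the prediction interval for one new observation.
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    s = sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

    fit = b0 + b1 * x_h                          # point estimate at x_h
    core = 1 / n + (x_h - x_bar) ** 2 / s_xx
    se_mean = s * sqrt(core)                     # SE for the mean response
    se_pred = s * sqrt(1 + core)                 # SE for a new observation
    ci = (fit - t_star * se_mean, fit + t_star * se_mean)
    pi = (fit - t_star * se_pred, fit + t_star * se_pred)
    return ci, pi, se_mean, se_pred

# Hypothetical toy data (df = 3, 95% critical value 3.182)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ci, pi, se_mean, se_pred = intervals_at(x, y, x_h=3, t_star=3.182)
print(f"CI for mean response: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"Prediction interval:  ({pi[0]:.3f}, {pi[1]:.3f})")
```

Both intervals are centered at the same fitted value. The confidence interval answers a question about the average response at $x_h$, while the prediction interval answers a question about a single future response, which is why it is wider.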

Example: Confidence/Prediction Band

Multiple Regression: Examples 1
1) A portrait studio operates in medium-sized cities and specializes in portraits of children. The company wants to open a studio in another, similar community and wants to be able to predict sales there.
2) So that only students who are likely to succeed are accepted into college, the registrar's office wants to be able to predict the GPA of students entering from high school.
3) A researcher studied the effects of the charge rate and the temperature on the life of a new type of power cell in a preliminary small-scale experiment.

Multiple Regression: Examples 2
4) An experiment was run to investigate the yield of tomato plants as a function of the water level. Plots were randomly assigned to different water levels, and at the end of the season the yield of the plants was determined.
5) Fernandez-Juricic et al. (2003) examined the effect of human disturbance on the nesting of house sparrows (Passer domesticus). They counted breeding sparrows per hectare in 18 parks in Madrid, Spain, and also counted the number of people per minute walking through each park (both measurement variables).