Objectives (BPS chapter 24)


Objectives (BPS chapter 24): Inference for regression

- Conditions for regression inference
- Estimating the parameters
- Using technology
- Testing the hypothesis of no linear relationship
- Testing lack of correlation
- Confidence intervals for the regression slope
- Inference about prediction
- Checking the conditions for inference

The data in a scatterplot are a random sample from a population that may exhibit a linear relationship between x and y. Different sample → different plot. Now we want to describe the population mean response μy as a function of the explanatory variable x: μy = α + βx. And we want to assess whether the observed relationship is statistically significant (not entirely explained by chance variation due to random sampling).

The regression model
The least-squares regression line ŷ = a + bx is a mathematical model of the form "sample data = fit + residual." For each data point in the sample, the residual is the difference (y − ŷ). At the population level, the model becomes yi = (α + βxi) + εi, with the deviations εi independent and Normally distributed N(0, σ). The population mean response μy is μy = α + βx.

μy = α + βx: the intercept α, the slope β, and the standard deviation σ of y are the unknown parameters of the regression model. We rely on the random sample data to provide unbiased estimates of these parameters. The value of ŷ from the least-squares regression line is really a prediction of the mean value of y (μy) for a given value of x. The least-squares regression line [ŷ = a + bx] obtained from sample data is the best estimate of the true population regression line [μy = α + βx].

- ŷ is an unbiased estimate of the mean response μy
- a is an unbiased estimate of the intercept α
- b is an unbiased estimate of the slope β

Conditions for inference
- The observations are independent.
- The relationship is indeed linear.
- The standard deviation of y, σ, is the same for all values of x.
- The response y varies Normally around its mean.

For any fixed x, the responses y follow a Normal distribution with standard deviation σ. Regression assumes equal variance of y (σ is the same for all values of x). The population standard deviation σ for y at any given value of x represents the spread of the Normal distribution of the εi around the mean μy. The regression standard error s for n sample data points is calculated from the residuals (yi − ŷi):

s = √( Σ(yi − ŷi)² / (n − 2) )

s is an unbiased estimate of the regression standard deviation σ.
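As a concrete illustration, here is a minimal Python sketch of this computation; the data and variable names are hypothetical, not from the slides:

```python
import numpy as np

# Hypothetical sample data: explanatory variable x and response y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([4.0, 7.0, 6.0, 10.0, 11.0, 14.0])
n = len(x)

# Least-squares estimates of the slope b and intercept a
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# Residuals (y_i - y_hat_i) and the regression standard error
residuals = y - (a + b * x)
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))   # df = n - 2
print(f"s = {s:.4f}")
```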

Confidence interval for the slope β
Estimating the slope parameter β is a case of one-sample inference with σ unknown, hence we rely on t distributions. The standard error of the slope b is:

SEb = s / √( Σ(xi − x̄)² )

(s is the regression standard error.) Thus, a level C confidence interval for the slope β is:

estimate ± t* SEestimate, that is, b ± t* SEb

where t* is the critical value for the t(df = n − 2) density curve with area C between −t* and +t*.
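A sketch of the corresponding computation in Python, with the same hypothetical data as above (scipy supplies the t critical value):

```python
import numpy as np
from scipy import stats

# Hypothetical sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([4.0, 7.0, 6.0, 10.0, 11.0, 14.0])
n = len(x)

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))

# Standard error of the slope and the 95% confidence interval b ± t* SE_b
se_b = s / np.sqrt(np.sum((x - x.mean()) ** 2))
t_star = stats.t.ppf(0.975, df=n - 2)   # C = 95%, so 2.5% in each tail
print(f"b = {b:.3f}, 95% CI = ({b - t_star * se_b:.3f}, {b + t_star * se_b:.3f})")
```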

Testing the hypothesis of no relationship
To test for the existence of a significant linear relationship, we test whether the slope parameter β is significantly different from zero, using a one-sample t-test procedure. With the standard error of the slope SEb = s / √Σ(xi − x̄)², we test the hypotheses

H0: β = 0 versus Ha: β ≠ 0 (or β > 0, or β < 0 for a one-sided test).

We calculate t = b / SEb, which has the t(n − 2) distribution under H0, to find the P-value of the test.
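Continuing the same hypothetical setup, a sketch of the test statistic and P-value:

```python
import numpy as np
from scipy import stats

# Hypothetical sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([4.0, 7.0, 6.0, 10.0, 11.0, 14.0])
n = len(x)

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))
se_b = s / np.sqrt(np.sum((x - x.mean()) ** 2))

# t statistic for H0: beta = 0 and its two-sided P-value from t(n - 2)
t_stat = b / se_b
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"t = {t_stat:.3f}, P = {p_value:.4f}")
```

For comparison, scipy.stats.linregress(x, y) reports this same two-sided P-value for the slope.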

Testing for lack of correlation
The regression slope b and the correlation coefficient r are related, and b = 0 ⟺ r = 0. Similarly, the population slope β is related to the population correlation coefficient ρ, and β = 0 ⟺ ρ = 0. Thus, testing the hypothesis H0: β = 0 is the same as testing the hypothesis of no correlation between x and y in the population from which our data were drawn.
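In fact the two tests use the same statistic: the t statistic for the slope can be rewritten in terms of the sample correlation as t = r √(n − 2) / √(1 − r²), still with df = n − 2, so the slope test and the correlation test always give the same P-value.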

Inference about prediction
One use of regression is to predict the value of y, ŷ, for any value of x within the range of the data: ŷ = a + bx. But the regression equation depends on the particular sample drawn; more reliable predictions require statistical inference. To estimate an individual response y for a given value of x, we use a prediction interval. If we sampled repeatedly, the values of y obtained for a particular x would vary Normally, N(0, σ), around the mean response μy.

The level C prediction interval for a single observation on y when x takes the value x* is:

ŷ ± t* SEŷ, with SEŷ = s √( 1 + 1/n + (x* − x̄)² / Σ(xi − x̄)² )

where t* is the critical value for the t(n − 2) distribution with area C between −t* and +t*. The prediction interval represents mainly the error from the Normal distribution of the deviations εi. Graphically, the series of prediction intervals across the whole range of x values appears as a continuous band on either side of ŷ.
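A minimal sketch of this computation, again with hypothetical data; x_star is the value of x at which we predict:

```python
import numpy as np
from scipy import stats

# Hypothetical sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([4.0, 7.0, 6.0, 10.0, 11.0, 14.0])
n = len(x)

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))

x_star = 4.5
y_hat = a + b * x_star

# SE for predicting a single new observation at x*; the leading 1 accounts
# for the individual's own variation around the mean response
se_pred = s * np.sqrt(1 + 1/n + (x_star - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
t_star = stats.t.ppf(0.975, df=n - 2)
print(f"95% PI: ({y_hat - t_star * se_pred:.2f}, {y_hat + t_star * se_pred:.2f})")
```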

Confidence interval for μy
We may also want to predict the population mean value of y, μy, for any value of x within the range of the data. Using inference, we calculate a level C confidence interval for the population mean μy of all responses y when x takes the value x*. This interval is centered on ŷ, the unbiased estimate of μy. The true value of the population mean μy at a given value of x will be contained in C% of all intervals calculated from many different random samples.

The level C confidence interval for the mean response μy at a given value x* of x is centered on ŷ (the unbiased estimate of μy):

ŷ ± t* SEμ̂, with SEμ̂ = s √( 1/n + (x* − x̄)² / Σ(xi − x̄)² )

where t* is the critical value for the t(n − 2) distribution with area C between −t* and +t*. A separate confidence interval is calculated for μy at each value that x takes. Graphically, the series of confidence intervals for the whole range of x values appears as a continuous band on either side of ŷ.
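The computation differs from the prediction-interval sketch above only in dropping the leading 1 inside the square root:

```python
import numpy as np
from scipy import stats

# Hypothetical sample data, as in the prediction-interval sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([4.0, 7.0, 6.0, 10.0, 11.0, 14.0])
n = len(x)

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
s = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))

x_star = 4.5
y_hat = a + b * x_star

# SE for the mean response at x*: no leading "1 +" term, so this interval
# is always narrower than the prediction interval at the same x*
se_mean = s * np.sqrt(1/n + (x_star - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
t_star = stats.t.ppf(0.975, df=n - 2)
print(f"95% CI for mu_y: ({y_hat - t_star * se_mean:.2f}, {y_hat + t_star * se_mean:.2f})")
```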

The confidence interval for μy contains, with C% confidence, the population mean μy of all responses at a particular value of x. The prediction interval contains C% of all the individual values taken by y at a particular value of x. [Figure: least-squares regression line with the 95% confidence interval for μy and the wider 95% prediction interval for ŷ.] Estimating μy uses a smaller interval than estimating an individual response because the sampling distribution of the mean is narrower than the population distribution.

Checking the conditions for inference (residual plots):
- Residuals randomly scattered → good!
- Curved pattern → the relationship is not linear.
- Change in variability across the plot → σ is not equal for all values of x.
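These checks are usually made by eye on a plot of the residuals against x; a minimal matplotlib sketch with the same hypothetical data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([4.0, 7.0, 6.0, 10.0, 11.0, 14.0])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
residuals = y - (a + b * x)

# A patternless horizontal scatter around 0 supports the linearity and
# equal-variance conditions; curvature or a funnel shape argues against them
plt.scatter(x, residuals)
plt.axhline(0, linestyle="--", color="gray")
plt.xlabel("x")
plt.ylabel("residual")
plt.show()
```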

Example
The annual bonuses (in $1000s) of six randomly selected employees and their years of service were recorded. We wish to analyze the relationship between the two variables. The data were analyzed using MINITAB; the output is shown below.

Years (X): 1 2 3 4 5 6
Bonus (Y): 9 17 12

Predictor   Coef    SE Coef   T      P
Constant    0.933   4.192     0.22   0.835
Years       2.114   1.076     1.96   0.121

S = 4.50291   R-Sq = 49.1%   R-Sq(adj) = 36.4%

Predicted Values for New Observations
New Obs   Fit     SE Fit   95% CI          95% PI
1         11.50   2.45     (4.71, 18.30)   (-2.72, 25.73)

Values of Predictors for New Observations
Obs   Years
1     5.00

Example (continued)
a. What is the equation of the least-squares regression line?
b. Calculate the 95% confidence interval for the true slope coefficient β.
c. Based on the output above, test at the 0.05 level of significance whether the slope β is significantly different from zero. → The test is not significant: we fail to reject the null hypothesis.
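Worked answers, reading from the MINITAB output above (t* = 2.776 is the critical value for t with df = 6 − 2 = 4):

a. ŷ = 0.933 + 2.114x.
b. b ± t* SEb = 2.114 ± 2.776 × 1.076 = 2.114 ± 2.99, i.e., (−0.87, 5.10). The interval contains 0, consistent with part c.
c. t = 2.114 / 1.076 = 1.96 with P = 0.121 > 0.05, so we fail to reject H0: β = 0.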

Example (continued)
d. What is the predicted annual bonus of an employee with 5 years of service?
e. What is the value of the residual for the data value (5, 17)?
f. Construct a 95% prediction interval for a single employee's bonus when years of service is 7.
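Worked answers (using x̄ = 3.5 and Σ(xi − x̄)² = 17.5, computed from the six years-of-service values):

d. ŷ = 0.933 + 2.114(5) = 11.50 thousand dollars, the "Fit" reported by MINITAB.
e. Residual = y − ŷ = 17 − 11.50 = 5.50.
f. ŷ = 0.933 + 2.114(7) = 15.73; SEŷ = s √(1 + 1/6 + (7 − 3.5)²/17.5) = 4.503 × 1.366 = 6.15, so the 95% PI is 15.73 ± 2.776 × 6.15 = 15.73 ± 17.08, i.e., (−1.35, 32.81).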

Example (continued)
g. Construct a 95% confidence interval for the mean bonus μy when years of service is 7.
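Worked answer: SEμ̂ = s √(1/6 + (7 − 3.5)²/17.5) = 4.503 × 0.931 = 4.19, so the 95% CI is 15.73 ± 2.776 × 4.19 = 15.73 ± 11.64, i.e., (4.09, 27.37). As expected, this interval for the mean response is narrower than the prediction interval for a single employee in part f.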