Inference for Regression


Inference for Regression
Today we will talk about the conditions necessary to make valid inference with regression. We will also discuss the various types of inference that can be made with regression. All of this will be discussed in the context of simple linear regression.

Conditions for Valid Inference
If these conditions hold:
- Independence
- Normal errors
- Errors have mean zero
- Errors have equal SD
then we can:
- Find CIs for the slope and intercept
- Find a CI for the mean of y at a given x
- Find a prediction interval (PI) for an individual y at a given x

Checking the Conditions
After performing regression, we can check the conditions for valid inference:
- Independence – good sampling and experimental techniques give independence.
- Mean zero errors – verify using a residual plot.
- Errors with constant standard deviation (homoscedasticity) – verify with a residual plot.
- Normal errors – verify with a normal quantile plot of the residuals.

Residual Plot
Residuals are sample estimates of the errors (in y) made by the line:
Residual = observed y – predicted y
Predicted y's are determined by the equation of the least squares line. A residual plot is a scatter plot of the residuals vs. x.
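As an illustrative sketch only (Python with statsmodels rather than the StataQuest tool used in these slides, and with simulated stand-in data), residuals and a residual plot could be produced like this:

```python
# Illustrative sketch: Python/statsmodels, not the slides' StataQuest workflow.
# 'height' and 'span' are simulated stand-ins for the hand-span data.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(0)
height = rng.uniform(60, 75, size=30)                   # heights in inches (simulated)
span = 0.5 * height - 14 + rng.normal(0, 1, size=30)    # hand spans in cm (simulated)

fit = sm.OLS(span, sm.add_constant(height)).fit()       # least squares line

residuals = span - fit.fittedvalues      # residual = observed y - predicted y
plt.scatter(height, residuals)           # residual plot: residuals vs. x
plt.axhline(0, linestyle="--")           # residuals should be centered at zero
plt.xlabel("Height (in)")
plt.ylabel("Residual (cm)")
plt.show()
```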

Examining Residual Plots
A residual plot indicates that the conditions of homoscedastic, mean-zero errors are met if the points appear randomly scattered with a constant spread in the y direction. There should not be an obvious pattern to the residuals, nor should they "fan out". Fanning out indicates nonconstant variability, called heteroscedasticity.

Residual Normal Quantile Plot
A normal quantile plot of the residuals should be examined to check for non-normal errors. Plots that indicate skewness or heavy tails show that regression inference using the normal distribution or the t-distribution will not be valid.
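As a rough sketch (again Python/statsmodels with simulated data, not the StataQuest plot the slides use), a normal quantile plot of the residuals might be produced like this:

```python
# Sketch: normal quantile (Q-Q) plot of regression residuals.
# Simulated data and assumed names -- not the slides' actual output.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(60, 75, size=30)                # simulated predictor
y = 0.5 * x - 14 + rng.normal(0, 1, size=30)    # simulated response

fit = sm.OLS(y, sm.add_constant(x)).fit()
sm.qqplot(fit.resid, line="s")   # points near the line suggest roughly normal errors
plt.show()
```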

What about independence?
Independence means that the (x, y) measurements do not depend on one another. A simple random sample will achieve independence. Common cases without independence: measuring the same individual twice or including related persons. You are responsible for ensuring independence by planning your study well. If achieving independence is impossible, a professional statistician can help you perform valid inference using more advanced techniques.

Examples of Unmet Conditions
Trying to fit a straight line to the yield data is an obvious mistake because the data show curvature. This shows up on the residual plot as curvature - the residuals are NOT CENTERED AT ZERO.

Examples of Unmet Conditions
This data set has unequal spread - thus the errors have unequal standard deviation. This shows up on the residual plot as a "horn" or "fanning out" pattern. If the residuals are not centered at zero or exhibit curvature - STOP - normality is inconsequential; seek the help of a professional statistician.

Back to Hand-Span Example
We fit a line to the data and check the conditions:
- Independence - used different people in a simple random sample - OK
- Errors have mean zero - checked residual plot - OK
- Errors have equal SD - checked residual plot - OK
- Errors are normal (light tails) - OK
Since the conditions hold we can:
- Form CIs for the population slope and intercept
- Form CIs for the mean span of all individuals of a given height
- Form a prediction interval for the hand-span of an individual of a given height

Complete Regression Output
The StataQuest output has the standard regression layout: an ANOVA table (Source, SS, df, MS for Model, Residual, and Total) with Number of obs, F(1, 10), Prob > F, R-squared, Adj R-squared, and Root MSE at the top right, followed by a coefficient table with columns Coef., Std. Err., t, P>|t|, and [95% Conf. Interval].
- Top right: Adjusted R² = .7685.
- In the bottom part, the Heightin row corresponds to the slope, the _cons row to the intercept.
- The Coef column gives the fitted equation: Span (cm) = intercept + slope × Height (in).
- t and P>|t| test H0: β1 (slope) = 0 and H0: β0 (intercept) = 0, giving the t statistic and p-value.
- The 95% Conf. Interval columns give 95% confidence intervals for the slope and intercept:
  - 95% confident that the population slope is in (.31, .71)
  - 95% confident that the population intercept is in (-27.63, -.38)
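For comparison, here is a minimal sketch of how output of this shape could be produced in Python with statsmodels (not the StataQuest tool used in the slides; the data are simulated stand-ins, so the numbers will not match the hand-span results):

```python
# Minimal sketch: regression output analogous to the table above, in Python/statsmodels.
# Simulated stand-in data -- the numbers will NOT match the slides' hand-span output.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
heightin = rng.uniform(60, 75, size=12)                      # 12 obs, matching F(1, 10)
spancm = 0.5 * heightin - 14 + rng.normal(0, 1, size=12)     # simulated spans (cm)

fit = sm.OLS(spancm, sm.add_constant(heightin)).fit()
print(fit.summary())              # ANOVA info, R-squared, coefficient table with t, P>|t|
print(fit.conf_int(alpha=0.05))   # 95% CIs: row 0 = intercept (_cons), row 1 = slope
```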

Inference on the Slope
Why test H0: β1 (slope) = 0? If the population slope is zero, then the line is useless - knowing x gives no additional information about y.
Example: if Span (cm) did not depend on Height (a slope of 0), then knowing someone's height would tell us nothing about their span; all we would know is the population mean of hand spans.
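The t statistic in the output is simply the estimated slope divided by its standard error, compared with a t distribution on n - 2 degrees of freedom. A hedged sketch in Python (simulated data, illustrative names only):

```python
# Sketch of the slope test "by hand": t = b1 / SE(b1), p-value from t on n - 2 df.
# Simulated data; not the slides' hand-span numbers.
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(60, 75, size=12)
y = 0.5 * x - 14 + rng.normal(0, 1, size=12)

fit = sm.OLS(y, sm.add_constant(x)).fit()
b1, se1 = fit.params[1], fit.bse[1]                     # slope estimate and its SE
t_stat = b1 / se1                                       # test statistic for H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=fit.df_resid)  # two-sided p-value, df = n - 2
print(t_stat, p_value)   # agrees with fit.tvalues[1] and fit.pvalues[1]
```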

Inference on the Intercept
Why test H0: β0 (intercept) = 0? This depends on the situation. Note that the estimate for y when x = 0 is the intercept: y = β0 + β1(0) = β0.
Example: the predicted hand span for a person with height 0 is the intercept - a negative number of cm. What? Regression is valid only within the range of observed data. We didn't observe anyone with height 0. The line fits in the observed region - it may or may not fit in other regions.

Confidence Interval for the Mean
At each value of x, we have a population of values for y. We would like to form a confidence interval for the mean of y for this population.
Example: estimate the mean hand-span of persons of height 70.5 inches. The center of the interval comes from the line: Span = b0 + b1(70.5) cm. The endpoints of the interval can be calculated; however, we will use a plot to estimate them.

Prediction Interval for an Individual
Suppose I have an individual of height 70.5 inches. Can I form an interval that I can be 95% confident will include this individual's hand span? This is called a prediction interval. By necessity, it is wider than a CI for the mean - predicting an individual is more challenging. The interval is centered at the line, with endpoints estimated from a plot (a code sketch for both intervals follows below).
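Both intervals can also be computed directly. A minimal sketch in Python/statsmodels (simulated stand-in data, so the intervals will not match the slides' values):

```python
# Sketch: 95% CI for the mean span and 95% PI for an individual's span at
# height = 70.5 inches. Simulated data; intervals will not match the slides.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
heightin = rng.uniform(60, 75, size=30)
spancm = 0.5 * heightin - 14 + rng.normal(0, 1.5, size=30)

fit = sm.OLS(spancm, sm.add_constant(heightin)).fit()
new_x = np.array([[1.0, 70.5]])                           # intercept term, height = 70.5
pred = fit.get_prediction(new_x).summary_frame(alpha=0.05)
print(pred[["mean", "mean_ci_lower", "mean_ci_upper"]])   # CI for the mean of y
print(pred[["obs_ci_lower", "obs_ci_upper"]])             # PI for an individual y (wider)
```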

Confidence and Prediction Intervals
- The CI for mean hand-span at height = 70.5 looks to be approximately (21, 23).
- The intervals widen as we approach the edge of the data - we are less certain at the edges.
- The PI for the hand-span of an individual of height 70.5 inches looks to be approximately (19, 25).
- The PI is wider than the CI, and the PI includes most or all of the data.

Crab Meat Example
It is easy to get a crab's total weight, but we really want the weight of the meat. We want to predict meat weight based on total weight. A scatterplot of (total, meat) shows linearity, so we fit a straight line; it looks like an appropriate thing to do.

Crab Meat Conditions
- The residual plot looks evenly spread and centered at zero.
- The residuals are normal (or light tailed).
- Independence - we took a simple random sample of crabs in the tank farm and never measured the same crab twice.
The conditions for inference are met.

Crab Meat Output
The StataQuest output has the same layout as before: an ANOVA table with Number of obs, F(1, 10), Prob > F, R-squared, Adj R-squared, and Root MSE, followed by a coefficient table for Meatmg with rows Totalwtg (slope) and _cons (intercept).
- Adjusted R² = .7285 (total weight explained 72% of the variability in meat weight).
- RMSE is the standard deviation of the errors.
- Equation: Meat weight (mg) = intercept + slope × Total weight (g).
- Reject the null of zero slope; the 95% CI for the slope is (8.88, 20.88).
- Fail to reject the null of zero intercept (good!); the 95% CI is (-68, 94).

Crab Meat CI’s and PI’s Confidence intervals for mean of meat weight given total weight Prediction intervals for individual crab’s meat weight given total weight For Total weight = 10, mean is in (100,200) and individual in (0,300)

Review and Preview We’re trying to find mathematical relationships between variables Start with scatterplots - if linear, correlation gives a good measure of the nature and strength of relationship Simple linear regression finds the equation of the least squares line Under certain conditions, an array of inferences can be performed

Review and Preview
Next time, we'll discuss other types of regression. The conditions for inference and the nature of the inferences will be the same.

Using StataQuest: Hand-Span Example
Getting a scatterplot:
- Open the data set heightspan.dta.
- Go to Graphs: Scatterplots: Plot Y vs. X.
- Y = Spancm, X = Heightin. Click OK.
Finding the correlation:
- Go to Statistics: Correlation: Pearson (regular).
- Choose Spancm and Heightin, click OK.
- r = .8767, p-val = .0002 for H0: ρ = 0.
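For readers without StataQuest, a rough Python equivalent of these two steps might look like the following (this assumes the heightspan.dta file is available and that its columns are named Heightin and Spancm, as in the output above):

```python
# Rough Python alternative to the StataQuest scatterplot + correlation steps.
# Assumes heightspan.dta exists with columns Heightin and Spancm.
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

df = pd.read_stata("heightspan.dta")

plt.scatter(df["Heightin"], df["Spancm"])    # Y = Spancm vs. X = Heightin
plt.xlabel("Height (in)")
plt.ylabel("Span (cm)")
plt.show()

r, p = stats.pearsonr(df["Heightin"], df["Spancm"])  # Pearson r and p-value for H0: rho = 0
print(r, p)
```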

StataQuest: Hand-Span Example
Performing regression:
- Go to Statistics: Simple Regression.
- Dependent = Spancm, Independent = Heightin. Click OK.
Checking conditions - we use these plots:
- Plot fitted model
- Plot residual vs. an X; choose Heightin
- Normal Quantile plot of residuals
For PIs and CIs, choose Plot fitted model.