Class 5: Thurs., Sep. 23
–Example of using regression to make predictions and understand the likely errors in the predictions: salaries of teachers and experience
–Normal distribution calculations
–R squared
–Checking the assumptions of the simple linear regression model: residual plots

Teachers’ Salaries and Dating
In U.S. culture, it is usually considered impolite to ask how much money a person makes. However, suppose that you are single and are interested in dating a particular person. Of course, salary isn’t the most important factor when considering whom to date, but it certainly is nice to know (especially if it is high!). In this case, the person you are interested in happens to be a high school teacher, so you know a high salary isn’t an issue. Still, you would like to know how much she or he makes, so you take an informal survey of 11 high school teachers that you know.

You happen to know that the person you are interested in has been teaching for 8 years. How can you use this information to better predict your potential date’s salary? Regression Analysis to the Rescue! You go back to each of the original 11 teachers you surveyed and ask them for their years of experience. Simple Linear Regression Model: E(Y|X) = β0 + β1X; that is, the distribution of Y given X is normal with mean β0 + β1X and standard deviation σ.
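
A minimal sketch of how such a model could be fit in Python (the course itself uses JMP for this). The experience and salary values below are hypothetical placeholders, since the actual data from the 11 surveyed teachers are not listed in these slides.

```python
# Fitting the simple linear regression of salary on years of experience.
# NOTE: the data below are made up for illustration; they are not the
# survey data used in the lecture.
import numpy as np
import statsmodels.api as sm

experience = np.array([1, 2, 3, 4, 5, 6, 7, 9, 10, 12, 15])        # years (hypothetical)
salary = np.array([42000, 44500, 45000, 47500, 49000, 50500,
                   52000, 55000, 57000, 60000, 66000])              # dollars (hypothetical)

X = sm.add_constant(experience)           # adds the intercept column
fit = sm.OLS(salary, X).fit()

b0, b1 = fit.params                       # estimated intercept and slope
rmse = np.sqrt(fit.mse_resid)             # root mean square error = estimated SD of Y|X
print(f"Estimated E(salary | experience) = {b0:.1f} + {b1:.1f} * experience")
print(f"Root mean square error = {rmse:.1f}")
```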

Predicted salary of your potential date, who has been a teacher for 8 years = estimated mean salary for teachers with 8 years of experience = 40612.135 + 1686.0674*8 ≈ $54,100. How far off will your estimate typically be? Root mean square error = estimated standard deviation of Y|X = $4,610.93. Notice that the typical error of your estimate of teacher salary using experience, $4,610.93, is less than the typical error, $6,491.20, of an estimate that uses only the mean teacher salary. Regression analysis enables you to better predict your potential date’s salary.
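
To check the arithmetic, the prediction can be reproduced from the intercept and slope reported above (a small Python sketch; the dollar figures come from the slide, not from re-fitting the model).

```python
# Reproducing the slide's prediction from the reported least squares estimates.
b0, b1 = 40612.135, 1686.0674      # intercept and slope from the lecture output
rmse = 4610.93                     # root mean square error from the lecture output

pred_8_years = b0 + b1 * 8
print(f"Predicted salary at 8 years of experience: ${pred_8_years:,.2f}")  # about $54,100.67
print(f"Typical prediction error (RMSE): ${rmse:,.2f}")
```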

More Information About Your Potential Date’s Salary
From the regression model, you predict that your potential date’s salary is $54,100, and the typical error you expect to make in your prediction is $4,611. Suppose you want an interval that will contain your date’s salary most of the time (say, 95% of the time). What’s the chance that your date will make more than $60,000? What’s the chance that your date will make less than $50,000? We can answer these questions by using the fact that, under the simple linear regression model, the distribution of Y|X is normal; here, the subpopulation of teachers with 8 years of experience has a normal distribution with mean $54,100 and standard deviation $4,611.

95% interval: For the subpopulation of teachers with 8 years of experience, 95% of the salaries will be within two SDs of the mean. An interval that will contain the salary of a randomly chosen teacher with 8 years of experience 95% of the time is $54,100 ± 2*$4,611 = ($44,878, $63,322). What’s the probability that your date will make more than $60,000? If you don’t have any additional information about your date other than his or her number of years of teaching, we can treat your date as a random draw from the subpopulation of teachers with 8 years of teaching experience. According to the simple linear regression model, the subpopulation of teachers with 8 years of experience is estimated to have a normal distribution with mean $54,100 and standard deviation $4,611.
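
The rough 95% interval is just the mean plus or minus two SDs; a two-line check in Python:

```python
# Mean plus/minus two standard deviations, using the rounded values from the slide.
mean, sd = 54100, 4611
lower, upper = mean - 2 * sd, mean + 2 * sd
print(f"Approximate 95% interval: (${lower:,}, ${upper:,})")   # ($44,878, $63,322)
```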

Properties of the Normal Distribution (Section 1.3)
Suppose a variable Y has a normal distribution with mean μ and standard deviation σ. Then Z = (Y − μ)/σ follows a standard normal distribution. The probability that Y is greater than a number c is P(Y > c) = P(Z > (c − μ)/σ), where Z has the standard normal distribution with mean 0 and SD 1. The probabilities for a standard normal distribution can be found in Table A. Review Section 1.3 on using the normal tables.

Probability that a teacher with 8 years of experience has salary > $60,000: P(Y > 60,000) = P(Z > (60,000 − 54,100)/4,611) = P(Z > 1.28) ≈ 0.10. Probability that a teacher with 8 years of experience has salary between $52,000 and $56,000: P(52,000 < Y < 56,000) = P(−0.46 < Z < 0.41) ≈ 0.34.
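
The same probabilities can be checked in Python with scipy (the lecture does this by standardizing and looking up Table A):

```python
# Normal probabilities for the subpopulation of teachers with 8 years of experience.
from scipy.stats import norm

mean, sd = 54100, 4611
p_over_60k = norm.sf(60000, loc=mean, scale=sd)                       # P(Y > 60,000)
p_52k_to_56k = norm.cdf(56000, mean, sd) - norm.cdf(52000, mean, sd)  # P(52,000 < Y < 56,000)
print(f"P(salary > $60,000)           = {p_over_60k:.2f}")    # about 0.10
print(f"P($52,000 < salary < $56,000) = {p_52k_to_56k:.2f}")  # about 0.34
```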

R Squared
How much better are the predictions of your potential date’s salary from the simple linear regression model than predictions that just use the mean teacher salary? This is the question that R squared addresses. R squared: a number between 0 and 1 that measures how much of the variability in the response the regression model explains. R squared close to 0 means that using the regression to predict Y|X isn’t much better than using the mean of Y; R squared close to 1 means that the regression is much better than the mean of Y for predicting Y|X.

R Squared Formula
Total sum of squares = Σ(Yi − Ȳ)² = the sum of squared prediction errors from using the sample mean of Y to predict Y. Residual sum of squares = Σ(Yi − Ŷi)², where Ŷi is the prediction of Yi from the least squares line. R squared = (Total sum of squares − Residual sum of squares) / Total sum of squares.
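
A short sketch of the computation behind the formula (the response values and fitted values here are hypothetical, just to show how the two sums of squares combine into R squared):

```python
# R squared from the total and residual sums of squares.
import numpy as np

y = np.array([42000, 47500, 49000, 52000, 55000, 60000], dtype=float)      # hypothetical salaries
y_hat = np.array([43000, 46500, 50000, 51500, 56000, 59500], dtype=float)  # hypothetical fitted values

tss = np.sum((y - y.mean()) ** 2)   # total sum of squares: errors from predicting with the mean
rss = np.sum((y - y_hat) ** 2)      # residual sum of squares: errors from the least squares line
r_squared = (tss - rss) / tss       # equivalently 1 - rss/tss
print(f"R squared = {r_squared:.3f}")
```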

What’s a good R squared?
As with correlation, it depends on the context. In precise laboratory work, R² values under 90% might be too low, but in social science contexts, where a single variable rarely explains a great deal of the variation in the response, R² values of 50% may be considered remarkably good. The best measure of whether the regression model is providing predictions of Y|X that are accurate enough to be useful is the root mean square error, which tells us the typical error in using the regression to predict Y from X.

Checking the model
The simple linear regression model is a great tool, but its answers will only be useful if it is the right model for the data. We need to check the assumptions before using the model. Assumptions of the simple linear regression model:
–Linearity: The mean of Y|X is a straight line.
–Constant variance: The standard deviation of Y|X is constant.
–Normality: The distribution of Y|X is normal.
–Independence: The observations are independent.

Checking that the mean of Y|X is a straight line
Scatterplot: Look at whether the mean of Y given X appears to increase or decrease in a straight line.
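
A rough matplotlib sketch of this check, using the same hypothetical experience and salary values as in the fitting sketch above:

```python
# Scatterplot of salary against experience with the least squares line overlaid.
import numpy as np
import matplotlib.pyplot as plt

experience = np.array([1, 2, 3, 4, 5, 6, 7, 9, 10, 12, 15])        # hypothetical
salary = np.array([42000, 44500, 45000, 47500, 49000, 50500,
                   52000, 55000, 57000, 60000, 66000])              # hypothetical

slope, intercept = np.polyfit(experience, salary, 1)   # least squares fit
plt.scatter(experience, salary)
plt.plot(experience, intercept + slope * experience)   # fitted line for reference
plt.xlabel("Years of experience")
plt.ylabel("Salary ($)")
plt.title("Scatterplot with least squares line")
plt.show()
```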

Residual Plot
Residuals: the prediction error from using the regression to predict Yi for observation i: ei = Yi − Ŷi, where Ŷi = b0 + b1Xi. Residual plot: a plot with the residuals on the y axis and the explanatory variable (or some other variable) on the x axis.

Residual Plot in JMP: After doing Fit Line, click the red triangle next to Linear Fit and then click Plot Residuals. What should the residual plot look like if the simple linear regression model holds? Under the simple linear regression model, the residuals should have approximately a normal distribution with mean zero and a standard deviation that is the same for all X. The residuals should appear as a “swarm” of points randomly scattered about their mean (which is always zero). A pattern in which, for a certain range of X, the residuals tend to be greater than zero or tend to be less than zero indicates that the mean of Y|X is not a straight line.
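
For readers not using JMP, a corresponding matplotlib sketch of the residual plot (again with the hypothetical data from above); under the model, the points should scatter randomly around the horizontal line at zero.

```python
# Residual plot: residuals (observed minus fitted) against the explanatory variable.
import numpy as np
import matplotlib.pyplot as plt

experience = np.array([1, 2, 3, 4, 5, 6, 7, 9, 10, 12, 15])        # hypothetical
salary = np.array([42000, 44500, 45000, 47500, 49000, 50500,
                   52000, 55000, 57000, 60000, 66000])              # hypothetical

slope, intercept = np.polyfit(experience, salary, 1)
residuals = salary - (intercept + slope * experience)   # observed minus fitted

plt.scatter(experience, residuals)
plt.axhline(0)                                          # residuals have mean zero
plt.xlabel("Years of experience")
plt.ylabel("Residual ($)")
plt.title("Residual plot")
plt.show()
```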

Summary
–The normal distribution can be used to calculate the probability that Y takes on certain values given X.
–R squared: a measure of how much the regression improves on ignoring X when predicting Y.
–The assumptions of the simple linear regression model must be checked in order for the model to be used.
–Residual plots can be used to check the linearity assumption.
–Tuesday’s class: Section 2.4 (more on checking assumptions, outliers and influential points, lurking variables).