Stat 112 -- Notes 4 Chapter 3.5 Chapter 3.7.


Teachers’ Salaries and Dating. In U.S. culture, it is usually considered impolite to ask how much money a person makes. However, suppose that you are single and are interested in dating a particular person. Of course, salary isn’t the most important factor when considering whom to date, but it certainly is nice to know (especially if it is high!). In this case, the person you are interested in happens to be a high school teacher, so you know a high salary isn’t an issue. Still, you would like to know how much she or he makes, so you take an informal survey of 11 high school teachers that you know.

You happen to know that the person you are interested in has been teaching for 8 years. How can you use this information to better predict your potential date’s salary? Regression Analysis to the Rescue! You go back to each of the original 11 teachers you surveyed and ask them for their years of experience. Simple Linear Regression Model: $E(Y|X) = \beta_0 + \beta_1 X$; the distribution of $Y$ given $X$ is normal with mean $\beta_0 + \beta_1 X$ and standard deviation $\sigma$.

Predicted salary of your potential date, who has been a teacher for 8 years = estimated mean salary for teachers with 8 years of experience = 40612.135 + 1686.0674 × 8 ≈ $54,100. How far off will your estimate typically be? Root mean square error = estimated standard deviation of Y|X = $4,610.93. Notice that the typical error of your estimate of teacher salary using experience, $4,610.93, is less than the typical error from using only the mean teacher salary, $6,491.20. Regression analysis enables you to better predict your potential date’s salary.
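The arithmetic for the prediction can be checked with a short sketch. The intercept and slope are the fitted values quoted in the notes; the function name `predict_salary` is only an illustrative choice and does not come from the notes or from JMP.

```python
# Fitted least squares coefficients quoted in the notes.
INTERCEPT = 40612.135  # estimated mean salary at 0 years of experience
SLOPE = 1686.0674      # estimated increase in mean salary per year of experience


def predict_salary(years_experience):
    """Estimated mean salary for a given number of years of experience."""
    return INTERCEPT + SLOPE * years_experience


prediction = predict_salary(8)
print(f"Predicted salary at 8 years: ${prediction:,.2f}")
```

Rounding the result to the nearest hundred dollars gives the $54,100 used in the notes.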

R Squared. How much better are the predictions of your potential date’s salary from the simple linear regression model than the predictions from just using the mean teacher’s salary? This is the question that R squared addresses. R squared: a number between 0 and 1 that measures how much of the variability in the response the regression model explains. R squared close to 0 means that using the regression to predict Y|X isn’t much better than using the mean of Y. R squared close to 1 means that the regression is much better than the mean of Y for predicting Y|X.

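R Squared Formula. $R^2 = \frac{\text{Total sum of squares} - \text{Residual sum of squares}}{\text{Total sum of squares}}$. Total sum of squares = $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$ = the sum of squared prediction errors for using the sample mean of $Y$ to predict $Y$. Residual sum of squares = $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$, where $\hat{Y}_i$ is the prediction of $Y_i$ from the least squares line.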
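As a rough illustration of the two sums of squares, the sketch below fits a least squares line and computes R squared as 1 minus the ratio of the residual sum of squares to the total sum of squares. The survey data are not reproduced in these notes, so the `experience` and `salary` values are hypothetical, and the function names are illustrative only.

```python
def fit_line(x, y):
    """Least squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    intercept, slope = fit_line(x, y)
    y_bar = sum(y) / len(y)
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - residual_ss / total_ss


# Hypothetical data for illustration only (not the survey in the notes).
experience = [1, 3, 5, 8, 12, 15]
salary = [43000, 44500, 51000, 52000, 63000, 64500]
print(f"R squared = {r_squared(experience, salary):.3f}")
```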

What’s a good R squared? A good $R^2$ depends on the context. In precise laboratory work, $R^2$ values under 90% might be too low, but in social science contexts, where a single variable rarely explains a great deal of the variation in the response, $R^2$ values of 50% may be considered remarkably good. The best measure of whether the regression model is providing predictions of Y|X that are accurate enough to be useful is the root mean square error, which tells us the typical error in using the regression to predict Y from X.

More Information About Your Potential Date’s Salary: Prediction Intervals. From the regression model, you predict that your potential date’s salary is $54,100, and the typical error you expect to make in your prediction is $4,611. Suppose you want an interval that will contain your date’s salary most of the time (say, 95% of the time). We can find such a prediction interval by using the fact that under the simple linear regression model, the distribution of Y|X is normal. Here, the subpopulation of teachers with 8 years of experience has a normal distribution with estimated mean $54,100 and estimated standard deviation $4,611.

Prediction Interval. A 95% prediction interval has the property that if we repeatedly take samples $y_1, \ldots, y_n$ from a population with the simple regression model, where $x_1, \ldots, x_n$ are fixed at their current values, and then sample $y_p$ with $X = x_p$, the prediction interval will contain $y_p$ 95% of the time.
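The repeated-sampling property can be illustrated with a small simulation. This is only a sketch under stated assumptions: the x values, the true intercept, slope, and standard deviation are all hypothetical, and the interval uses the standard textbook formula (prediction ± t × RMSE × sqrt(1 + 1/n + (x_p − x̄)²/Sxx)), which these notes do not state explicitly.

```python
import math
import random

# 97.5th percentile of the t distribution with 9 degrees of freedom (n - 2 for n = 11).
T_975_DF9 = 2.262


def prediction_interval(x, y, x_p, t_crit):
    """Standard 95% prediction interval for a new response at x_p."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    rmse = math.sqrt(rss / (n - 2))
    center = intercept + slope * x_p
    half_width = t_crit * rmse * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / sxx)
    return center - half_width, center + half_width


def simulate_coverage(reps=2000, seed=0):
    """Fraction of simulated samples whose interval contains the new y_p."""
    rng = random.Random(seed)
    x = [1, 2, 3, 5, 6, 8, 10, 12, 15, 18, 20]     # hypothetical, held fixed
    beta0, beta1, sigma = 40000.0, 1700.0, 4600.0  # hypothetical true model
    x_p = 8
    hits = 0
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
        y_p = beta0 + beta1 * x_p + rng.gauss(0, sigma)
        low, high = prediction_interval(x, y, x_p, T_975_DF9)
        hits += low <= y_p <= high
    return hits / reps


print(f"Simulated coverage: {simulate_coverage():.3f}")
```

The simulated coverage should land close to 0.95, up to simulation noise.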

Prediction Interval for Your Date’s Salary. Suppose your date has 8 years of experience.
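A rough version of this interval can be sketched from the numbers already given. The calculation below assumes the usual normal-distribution rule of thumb that about 95% of values fall within two standard deviations of the mean, and it ignores the uncertainty in the estimated regression line, so the exact interval reported by JMP will be somewhat wider. The function name is illustrative only.

```python
PREDICTED_SALARY = 54100.0  # predicted salary at 8 years of experience, from the notes
RMSE = 4610.93              # root mean square error, from the notes


def approx_prediction_interval(center, rmse, multiplier=2.0):
    """Approximate 95% prediction interval: center +/- multiplier * RMSE."""
    half_width = multiplier * rmse
    return center - half_width, center + half_width


low, high = approx_prediction_interval(PREDICTED_SALARY, RMSE)
print(f"Approximate 95% prediction interval: ${low:,.0f} to ${high:,.0f}")
```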

Prediction Intervals in JMP. After using Fit Line, click the red triangle next to Linear Fit and click Confid Curves Indiv. Use the crosshair tool (under Tools) to find the exact prediction interval for a particular x value.

Association vs. Causality. A high $R^2$ means that x has a strong linear relationship with y: there is a strong association between x and y. It does not imply that x causes y. Alternative explanations for a high $R^2$: (1) The reverse is true: y causes x. (2) There may be a lurking (confounding) variable related to both x and y that is the common cause of x and y.
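The lurking-variable explanation can be illustrated with simulated data. In this sketch, which uses entirely hypothetical variables, x and y are each generated from a common cause z and never from each other, yet they end up strongly associated. In simple linear regression, R squared equals the square of the correlation between x and y.

```python
import math
import random


def correlation(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(2000)]  # lurking variable (hypothetical)
x = [zi + rng.gauss(0, 0.3) for zi in z]    # x depends only on z
y = [zi + rng.gauss(0, 0.3) for zi in z]    # y depends only on z, not on x

r = correlation(x, y)
print(f"correlation = {r:.2f}, R squared = {r * r:.2f}")
```

Changing x in this setup would do nothing to y; only the common cause z links them.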

Example. A community in the Philadelphia area is interested in how crime rates affect property values. If low crime rates increase property values, the community may be able to cover the costs of increased police protection by gains in tax revenues from higher property values. Data on the average housing price and crime rate (per 1000 population) for communities in Pennsylvania near Philadelphia for 1996 are shown in housecrime.JMP.

Questions. Can you deduce a cause-and-effect relationship from these data? Besides high crime rates causing low housing prices, what other explanations are there for the association between housing prices and crime rate? Does the simple linear regression model appear to hold?

Extrapolation. When constructing estimates of $E(Y|X = x_0)$ or predicting individual values of a dependent variable based on $x_0$, caution must be used if $x_0$ is outside the range of the observed x’s. The data do not provide information about whether the simple linear regression model continues to hold outside the range of the observed x’s. Example: The crime rate in Center City Philadelphia is 366.1. Does the simple linear regression model fit from housecrimerate.JMP provide an accurate prediction of the average house price in Center City?
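One simple safeguard is to check whether a new x value lies inside the observed range before trusting a prediction. The sketch below shows the idea. The observed crime rates listed here are hypothetical placeholders, since the contents of the JMP file are not reproduced in these notes; only the Center City value of 366.1 comes from the notes.

```python
def is_extrapolation(x_new, observed_x):
    """True if x_new falls outside the range of the observed x values."""
    return x_new < min(observed_x) or x_new > max(observed_x)


# Hypothetical crime rates per 1000 population (placeholders, not the real data).
observed_crime_rates = [12.5, 20.1, 33.4, 45.0, 60.2, 71.8]
center_city_rate = 366.1  # value quoted in the notes

if is_extrapolation(center_city_rate, observed_crime_rates):
    print("Warning: prediction would extrapolate beyond the observed crime rates.")
```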