Lecture 17: Tues., March 16
Inference for simple linear regression (Ch. 7.3-7.4)
$R^2$ statistic (Ch. 8.6.2)
Association is not causation (Ch. 7.5.3)
Next class: Diagnostics for assumptions of simple linear regression model (Ch. 8.2-8.3)

Regression
Goal of regression: estimate the mean response Y for subpopulations X=x, $\mu(Y|X=x)$. Example: Y = catheter length required, X = height.
Simple linear regression model: $\mu(Y|X=x) = \beta_0 + \beta_1 x$.
Estimate $\beta_0$ and $\beta_1$ by least squares – choose the estimates $\hat\beta_0$ and $\hat\beta_1$ to minimize the sum of squared residuals (prediction errors), $\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$.
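As a concrete illustration, here is a minimal Python sketch of the least squares computation, $\hat\beta_1 = \sum_i (x_i-\bar x)(y_i-\bar y) / \sum_i (x_i-\bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The actual carprices.JMP values are not reproduced here, so the data below are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for carprices.JMP (made-up numbers, not the real data):
# odometer readings in miles and selling prices in dollars.
x = rng.uniform(19_000, 49_000, size=100)
y = 17_250 - 0.067 * x + rng.normal(0.0, 150.0, size=100)

# Least squares: b1 = Sxy / Sxx and b0 = ybar - b1 * xbar minimize the
# sum of squared residuals.
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar
print(f"intercept = {b0:.2f}, slope = {b1:.5f}")
```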

Car Price Example
A used-car dealer wants to understand how odometer reading affects the selling price of used cars. The dealer randomly selects 100 three-year-old Ford Tauruses that were sold at auction during the past month. Each car was in top condition and equipped with automatic transmission, AM/FM cassette tape player, and air conditioning. carprices.JMP contains the price and number of miles on the odometer of each car.

Inference for Simple Linear Regression
Inference is based on the ideal simple linear regression model holding, and on imagining repeated random samples (new values of $Y_1, \ldots, Y_n$) drawn from the same subpopulations (the observed $x_1, \ldots, x_n$) as in the observed data.
Types of inference:
Hypothesis tests for intercept and slope
Confidence intervals for intercept and slope
Confidence interval for mean of Y at $X = X_0$
Prediction interval for future Y for which $X = X_0$

Ideal Simple Linear Regression Model
Assumptions of the ideal simple linear regression model:
There is a normally distributed subpopulation of responses for each value of the explanatory variable.
The means of the subpopulations fall on a straight-line function of the explanatory variable.
The subpopulation standard deviations are all equal (to $\sigma$).
The selection of an observation from any of the subpopulations is independent of the selection of any other observation.
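A minimal sketch of what these assumptions mean generatively, with made-up parameter values: each response is an independent normal draw whose mean lies on a line and whose SD is the common $\sigma$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameter values chosen only for illustration.
beta0, beta1, sigma = 10.0, 2.0, 3.0
x = rng.uniform(0.0, 10.0, size=50)

# For each x_i there is a normal subpopulation with mean beta0 + beta1 * x_i
# (means fall on a straight line), common SD sigma, and each Y_i is drawn
# independently of every other observation.
y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)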

Sampling Distributions of $\hat\beta_0$ and $\hat\beta_1$
See handout and Display 7.7. For the slope, $SD(\hat\beta_1) = \sigma / \sqrt{(n-1)s_x^2}$, so the standard deviation is smaller for (i) larger n, (ii) smaller $\sigma$, (iii) larger spread in x (higher $s_x$).
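A quick simulation check of this behavior (made-up parameter values): holding the x's fixed and redrawing the responses many times, the empirical standard deviation of $\hat\beta_1$ should match $\sigma/\sqrt{\sum_i (x_i - \bar x)^2}$.

```python
import numpy as np

rng = np.random.default_rng(2)

beta0, beta1, sigma = 10.0, 2.0, 3.0   # hypothetical true values
n = 30
x = np.linspace(0.0, 10.0, n)          # x's held fixed across repeated samples
sxx = np.sum((x - x.mean()) ** 2)      # equals (n-1) * sx**2

b1_draws = []
for _ in range(10_000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    b1_draws.append(np.sum((x - x.mean()) * (y - y.mean())) / sxx)

print("empirical SD of b1:     ", np.std(b1_draws))
print("theory, sigma/sqrt(Sxx):", sigma / np.sqrt(sxx))
```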

Hypothesis tests for $\beta_0$ and $\beta_1$
Hypothesis test of $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$ is based on the t-test statistic $t = \hat\beta_1 / SE(\hat\beta_1)$. The p-value has the usual interpretation: the probability under the null hypothesis that |t| would be at least as large as its observed value; a small p-value is evidence against the null hypothesis.
Interpretation of the null hypothesis: X is not a useful predictor of Y; the mean of Y is not associated with X.
The test of $H_0: \beta_0 = 0$ vs. $H_a: \beta_0 \neq 0$ is based on an analogous test statistic.
Test statistics and p-values can be found in the JMP output under Parameter Estimates, obtained by using Fit Line after Fit Y by X. For the car price data, there is convincing evidence that both the intercept and slope are not zero (p-value < .0001 for both).
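A sketch of the same t-test that JMP reports in Parameter Estimates, computed directly from the standard formulas (not from JMP's internals; variable names are mine):

```python
import numpy as np
from scipy import stats

def slope_t_test(x, y):
    """t-statistic and two-sided p-value for H0: beta1 = 0."""
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    sigma_hat = np.sqrt(np.sum(resid ** 2) / (n - 2))  # residual SD, n-2 df
    t = b1 / (sigma_hat / np.sqrt(sxx))
    p = 2 * stats.t.sf(abs(t), df=n - 2)               # two-sided p-value
    return t, p
```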

Confidence Intervals for $\beta_0$ and $\beta_1$
Confidence intervals provide a range of plausible values for $\beta_0$ and $\beta_1$.
95% confidence intervals: $\hat\beta_j \pm t_{n-2}(.975) \times SE(\hat\beta_j)$. Table A.2 lists $t_{n-2}(.975)$; it is approximately 2.
Finding CIs in JMP: go to Parameter Estimates, right click, click Columns, and then click Lower 95% and Upper 95%. For the car price data set, the CIs appear in the JMP output.
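A sketch of the interval computation (standard formulas; the function and variable names are mine):

```python
import numpy as np
from scipy import stats

def coef_confints(x, y, level=0.95):
    """CIs for beta0 and beta1: estimate +/- t_{n-2} multiplier * SE."""
    n = len(x)
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b1 = np.sum((x - x_bar) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x_bar
    sigma_hat = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))
    se_b1 = sigma_hat / np.sqrt(sxx)
    se_b0 = sigma_hat * np.sqrt(1.0 / n + x_bar ** 2 / sxx)
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)  # ~2 (Table A.2)
    return ((b0 - t_crit * se_b0, b0 + t_crit * se_b0),
            (b1 - t_crit * se_b1, b1 + t_crit * se_b1))
```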

Two prediction problems
(a) The used-car dealer has an opportunity to bid on a lot of cars offered by a rental company. The rental company has 250 Ford Tauruses, all equipped with automatic transmission, air conditioning, and AM/FM cassette tape players. All of the cars in this lot have about 40,000 miles on the odometer. The dealer would like an estimate of the average selling price of all cars in this lot (or, virtually equivalently, the average selling price of the population of Ford Tauruses with the above equipment and 40,000 miles on the odometer).
(b) The used-car dealer is about to bid on a 3-year-old Ford Taurus equipped with automatic transmission, air conditioner, and AM/FM cassette tape player, and with 40,000 miles on the odometer. The dealer would like to predict the selling price of this particular car.

Prediction problem (a)
The goal is to estimate the conditional mean of selling price given odometer reading = 40,000, $\mu(Y|X=40{,}000)$. The point estimate is $\hat\mu(Y|X=40{,}000) = \hat\beta_0 + \hat\beta_1 \times 40{,}000$. What is a range of plausible values for $\mu(Y|X=40{,}000)$?

Confidence Intervals for Mean of Y at $X=X_0$
What is a plausible range of values for $\mu(Y|X_0)$? 95% CI for $\mu(Y|X_0)$: $\hat\mu(Y|X_0) \pm t_{n-2}(.975) \times SE[\hat\mu(Y|X_0)]$, where $SE[\hat\mu(Y|X_0)] = \hat\sigma\sqrt{\frac{1}{n} + \frac{(X_0-\bar X)^2}{(n-1)s_x^2}}$.
Note about the formula: precision in estimating $\mu(Y|X)$ is not constant for all values of X. Precision decreases as $X_0$ gets farther away from the sample average of the X's.
JMP implementation: use the Confid Curves Fit command under the red triangle next to Linear Fit after using Fit Y by X, Fit Line. Use the crosshair tool to find the exact values of the confidence interval endpoints for a given $X_0$.
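A sketch of this interval using the standard formula (note how se_fit grows as x0 moves away from the mean of the x's):

```python
import numpy as np
from scipy import stats

def mean_response_ci(x, y, x0, level=0.95):
    """CI for mu(Y | X = x0); widens as x0 moves away from x-bar."""
    n = len(x)
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b1 = np.sum((x - x_bar) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x_bar
    sigma_hat = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))
    fit = b0 + b1 * x0
    se_fit = sigma_hat * np.sqrt(1.0 / n + (x0 - x_bar) ** 2 / sxx)
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)
    return fit - t_crit * se_fit, fit + t_crit * se_fit
```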

Prediction Problem (b)
The goal is to predict the selling price of a given car with odometer reading = 40,000. What are likely values for a future value $Y_0$ at some specified value of X ($=X_0$)? The best prediction is the estimated mean response for $X=X_0$: $\hat Y_0 = \hat\beta_0 + \hat\beta_1 X_0$.
A prediction interval is an interval of likely values along with a measure of the likelihood that the interval will contain the response. 95% prediction interval at $X_0$: if repeated samples are obtained from the subpopulations and a prediction interval is formed each time, the prediction interval will contain the value of $Y_0$ for a future observation from the subpopulation $X_0$ 95% of the time.

Prediction Intervals Cont.
A prediction interval must account for two sources of uncertainty:
Uncertainty about the location of the subpopulation mean
Uncertainty about where the future value will be in relation to its mean
Prediction Error = Random Sampling Error + Estimation Error

Prediction Interval Formula
95% prediction interval at $X_0$: $\hat Y_0 \pm t_{n-2}(.975) \times \hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(X_0-\bar X)^2}{(n-1)s_x^2}}$.
Compare to the 95% CI for the mean at $X_0$: $\hat\mu(Y|X_0) \pm t_{n-2}(.975) \times SE[\hat\mu(Y|X_0)]$. The prediction interval is wider due to random sampling error in the future response. As the sample size n becomes large, the margin of error of the CI for the mean goes to zero, but the margin of error of the PI does not.
JMP implementation: use the Confid Curves Indiv command under the red triangle next to Linear Fit after using Fit Y by X, Fit Line. Use the crosshair tool to find the exact values of the prediction interval endpoints for a given $X_0$.
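The same computation as the mean-response CI, with the extra "1 +" term under the square root (a sketch using the standard formula; names are mine):

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, x0, level=0.95):
    """PI for one future Y at X = x0. The extra '1 +' term under the square
    root is the random sampling error, so the PI stays wide as n grows."""
    n = len(x)
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b1 = np.sum((x - x_bar) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x_bar
    sigma_hat = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))
    fit = b0 + b1 * x0
    se_pred = sigma_hat * np.sqrt(1.0 + 1.0 / n + (x0 - x_bar) ** 2 / sxx)
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)
    return fit - t_crit * se_pred, fit + t_crit * se_pred
```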

R-Squared
The R-squared statistic, also called the coefficient of determination, is the percentage of response variation explained by the explanatory variable – a unitless measure of the strength of the relationship between x and y.
Total sum of squares $= \sum_{i=1}^n (y_i - \bar y)^2$, the best sum of squared prediction errors without using x. Residual sum of squares $= \sum_{i=1}^n (y_i - \hat y_i)^2$. $R^2 = \frac{\text{Total SS} - \text{Residual SS}}{\text{Total SS}}$.
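A short sketch of the computation:

```python
import numpy as np

def r_squared(x, y):
    """R^2 = 1 - RSS/TSS: fraction of response variation explained by x."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
    rss = np.sum((y - b0 - b1 * x) ** 2)    # residual sum of squares
    return 1.0 - rss / tss
```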

R-Squared Example
$R^2 = .6501$. Read as: "65.01 percent of the variation in car prices was explained by the linear regression on odometer."

Interpreting $R^2$
$R^2$ takes on values between 0 and 1, with higher $R^2$ indicating a stronger linear association. If the residuals are all zero (a perfect fit), then $R^2$ is 1. If the least squares line has slope 0, $R^2$ will be 0. $R^2$ is useful as a unitless summary of the strength of linear association.

Caveats about $R^2$
$R^2$ is not useful for assessing model adequacy, i.e., whether the simple linear regression model holds (use residual plots), or whether or not there is an association (use the test of $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$).
What counts as a good $R^2$ depends on the context. In precise laboratory work, $R^2$ values under 90% might be too low, but in social science contexts, where a single variable rarely explains a great deal of the variation in a response, $R^2$ values of 50% may be considered remarkably good.

Association is not causation
A high $R^2$ means that x has a strong linear relationship with y – there is a strong association between x and y. It does not imply that x causes y.
Alternative explanations for a high $R^2$:
The reverse is true: y causes x.
There may be a lurking (confounding) variable related to both x and y which is the common cause of x and y.
No cause-and-effect relationship can be inferred unless X is randomly assigned to units in a randomized experiment.
Example: a researcher measures the number of television sets per person (X) and the average life expectancy (Y) for the world's nations. The regression line has a positive slope – nations with many TV sets have higher life expectancies. Could we lengthen the lives of people in Rwanda by shipping them TV sets?

Example
A community in the Philadelphia area is interested in how crime rates affect property values. If low crime rates increase property values, the community may be able to cover the costs of increased police protection by gains in tax revenues from higher property values. Data on the average housing price and crime rate (per 1000 population) for communities in Pennsylvania near Philadelphia for 1996 are shown in housecrime.JMP.

Questions
Can you deduce a cause-and-effect relationship from these data?
What are other explanations for the association between housing prices and crime rate, other than that high crime rates cause low housing prices?
Does the ideal simple linear regression model appear to hold?