Chapter 14: Inference for Regression

A brief review of Chapter 4... (Regression Analysis: Exploring Association Between Variables)  Bivariate data - relationships between two numeric, quantitative variables measured on the same individual  Each individual appears as a point (x, y) on the scatterplot  Explanatory variable; response variable

Scatterplot; label & scale; look for overall patterns (DOFS: Direction, Outliers, Form, Strength)

Measuring Linear Association: Correlation or “r”  Correlation (r) measures the direction and strength of a linear relationship between two quantitative variables  Correlation (r) is always between -1 and 1; it makes no sense to have r = -13 or r = 27  Correlation (r) is not resistant (look at the formula; it is based on means)  Correlation is for scatterplots (not the LSRL)  r is in standard units, so r doesn’t change if the units are changed; if we change from yards to feet, r is not affected

Measuring Linear Association: Correlation or “r”  r ≈ 0 → no strong linear relationship  r close to 1 → strong positive linear relationship  r close to -1 → strong negative linear relationship
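A minimal sketch of computing r in Python with NumPy; the x and y arrays below are hypothetical values used only for illustration, not data from the slides.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # explanatory variable (hypothetical values)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # response variable (hypothetical values)

r = np.corrcoef(x, y)[0, 1]               # Pearson correlation, always between -1 and 1
print(round(r, 3))

# Changing units (e.g., yards to feet) leaves r unchanged:
print(round(np.corrcoef(x * 3, y)[0, 1], 3))
```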

One better...Least Squares Regression Line (LSRL)

Least Squares Regression (predicts values)

 You may be asked to interpret the slope of the LSRL and the y-intercept, in context  Caution: interpret the slope of the LSRL as the predicted, average, or expected change in the response variable for a one-unit change in the explanatory variable  NOT the actual change in y for a unit change in x; the LSRL is a model, and models are not perfect
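A minimal sketch of fitting the LSRL with NumPy's polyfit, again using hypothetical x and y values; the comment restates how the slope is interpreted as a predicted change.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical explanatory values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical response values

b, a = np.polyfit(x, y, 1)                # slope b and intercept a of y-hat = a + b*x
print(f"y-hat = {a:.2f} + {b:.2f}x")

# Interpretation: for each one-unit increase in x, the *predicted* response
# changes by about b units; not the actual y for any individual point.
```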

Extrapolation... What is it again?? Using the LSRL to predict the response for x-values outside the range of the data used to create the line; such predictions are often unreliable

Outliers & Influential Points  All influential points are outliers, but not all outliers are influential points.  Influential points/observations: points that, if removed, would significantly change the LSRL (slope and/or y-intercept)

Coefficient of Determination, r²  r² tells us how well our LSRL describes our data; how well does this linear model fit the data  r² is always between 0 and 1: 0 ≤ r² ≤ 1  r² is the “fraction of the variation in the values of y that is explained by the LSRL”  VERSUS r, the correlation, -1 ≤ r ≤ 1, which describes the direction and strength of the linear relationship in a scatterplot
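A minimal sketch contrasting r and r² using scipy.stats.linregress, with the same kind of hypothetical data as above.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

result = stats.linregress(x, y)
print(f"r   = {result.rvalue:.3f}")        # direction and strength, -1 <= r <= 1
print(f"r^2 = {result.rvalue ** 2:.3f}")   # fraction of variation in y explained by the LSRL
```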

Chapter 14: Inference for Regression  We are now going to take all of that previous knowledge about bivariate data and apply it to inference (forming judgments about population parameters on the basis of random sampling, using a statistic)  Remember, ŷ = a + bx is just an estimate, a predictor, a statistic (like x̄ or p̂), based on a sample  Statistics vary from sample to sample

SRS BMW Cars (age & price)  What about another SRS of n = 7? Would data/points possibly be different?  So then would LSRL be different?  What about another SRS?  Data varies from sample to sample  Do we know the true population parameter? Do we have info on ALL BMW’s?

SRS BMW Cars (age & price)  So this LSRL is just based on THESE 7 pieces of data  We don’t know the true, unknown population parameter regression line, y = β₀ + β₁x  But we can estimate the true, unknown regression line using a confidence interval... OR... we can test a claim using a hypothesis test

Let’s talk about conditions...  We need to be aware of and check conditions before we perform inference (confidence intervals, hypothesis testing) in any situation (means, proportions, linear regression, one-sample, two-sample, Chi-Square, etc.)  If conditions are not met, our inference may be very inaccurate, even worthless

Conditions for Linear Regression Inference  1. Linearity: the trend is linear (use a residual plot to check)  2. Normality: the errors follow a Normal distribution with a mean of zero, N(0, σ) (use a QQ plot/Normal probability plot to check)  3. Constant standard deviation: the standard deviation σ of the errors must be the same for all values of the predictor variable (use a residual plot to check)  4. Independence: the errors must be independent of one another (review the raw data and the collection process)

Residuals... we look at these to determine if conditions 1 & 3 are met  The Least Squares Regression Line is not perfect, but it’s the best linear model we have  The points on the scatterplot don’t all fall exactly on the LSRL; this is very common  The vertical distances from the points to the LSRL are called “residuals,” or left-overs

Residuals: observed y value – predicted y value (ŷ)  The LSRL is the line that leaves the smallest “left-overs”: it minimizes the sum of the squared residuals

Graphical Tool: Residuals Plot  We plot the residuals (the left-overs: the vertical distances by which points on the scatterplot sit above or below the LSRL) to determine whether a line is the best model to describe our scatterplot of bivariate data  Perhaps a line isn’t the best model... maybe a quadratic curve, a log curve, or a square root function is a better model for the data

Residuals Plot (truck example)  On left is scatter plot & LSRL; on right is residuals plot

Graphical Tool: Residuals Plot  To check the linearity and constant standard deviation conditions, the residual plot should show no obvious pattern: random, unstructured scatter  In the case shown below, both conditions are met

Residuals Plot  If there is an obvious pattern, conditions 1 & 3 are not met

Condition #2: Normality...  Errors must follow a Normal distribution  Can examine a Normal Probability Plot (NPP) (or a QQ Plot) of the residuals (left-overs)  If NPP is fairly linear, then condition #2 is satisfied

NPP that shows that errors do not follow a Normal distribution

Condition #4... Independence  Errors must be independent of one another  Examine the data collection method if possible  In most cases, we must assume independence unless/until we discover otherwise
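Putting the graphical checks together, here is a minimal sketch using matplotlib and SciPy, assuming hypothetical x and y arrays: a residual plot for conditions 1 & 3 and a Normal probability (QQ) plot of the residuals for condition 2.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])      # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9])

b, a = np.polyfit(x, y, 1)
residuals = y - (a + b * x)                            # observed y minus predicted y-hat

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Conditions 1 & 3: the residual plot should show no obvious pattern and a
# roughly constant vertical spread across the range of x.
ax1.scatter(x, residuals)
ax1.axhline(0, linestyle="--")
ax1.set_title("Residual plot")

# Condition 2: the Normal probability (QQ) plot of the residuals should be
# roughly linear if the errors follow a Normal distribution with mean 0.
stats.probplot(residuals, dist="norm", plot=ax2)
ax2.set_title("Normal probability plot")

plt.tight_layout()
plt.show()
# Condition 4 (independence) is judged from how the data were collected.
```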

Equation... for the LSRL (sample statistic) Sample statistic: ŷ = a + bx, where x is the value of the explanatory variable, b is the estimated slope (sample statistic), a is the estimated y-intercept (sample statistic), and ŷ is the estimated value of the response variable (sample statistic)

Equation... for the true, unknown population parameter line Population parameter: y = β₀ + β₁x, where x is the value of the explanatory variable, β₁ is the true, actual (but unknown) population slope, β₀ is the true, actual (but unknown) population y-intercept, and y is the true, actual (but unknown) value of the population response variable
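A minimal sketch of how the sample statistics a and b are computed from summary statistics (b = r·s_y/s_x and a = ȳ − b·x̄), again with hypothetical data; these statistics estimate the unknown β₀ and β₁.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]
b = r * y.std(ddof=1) / x.std(ddof=1)     # estimated slope:      b = r * s_y / s_x
a = y.mean() - b * x.mean()               # estimated intercept:  a = y-bar - b * x-bar
print(f"y-hat = {a:.2f} + {b:.2f}x  (estimates the unknown line y = beta_0 + beta_1 x)")
```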

Hypothesis testing...  The majority of the time, we are most interested in performing a hypothesis test on the slope (not the y-intercept) H₀: slope = 0 (OR β₁ = 0 OR there is no linear association between the two variables OR correlation = 0) Hₐ: slope ≠ 0 (or > or <) (OR β₁ ≠ 0 OR there is a linear association between the two variables OR correlation ≠ 0)

Hypothesis testing...  Same 4 steps:  State null and alternative hypothesis  Check conditions  Do calculations  Interpret results in context

Random sample of 9th-grade students going on their annual backpacking trip each fall. Is there a linear relationship between body weight and backpack weight? [Data table: Body Weight (lbs) vs. Backpack Weight (lbs)]

H₀: There is no linear relationship between body weight & backpack weight (or β₁ = 0) Hₐ: There is a linear relationship between body weight & backpack weight (or β₁ ≠ 0) Conditions: assume all conditions have been checked & met. Calculations: enter the data into Minitab (one column for body weight & another for backpack weight); then go to regression, simple regression. Be careful not to swap the response & the predictor. Choose linear, 95% confidence. Interpretation: decision, α level, p-value, context.
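The slide describes the Minitab workflow; below is a rough Python equivalent using statsmodels. The body-weight and backpack-weight values are placeholders standing in for the class data, which is not reproduced here.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder values only; the class data table is not reproduced here.
body = np.array([120.0, 187.0, 109.0, 103.0, 131.0, 165.0, 158.0, 116.0])  # body weight (lbs)
pack = np.array([26.0, 30.0, 26.0, 24.0, 29.0, 35.0, 31.0, 28.0])          # backpack weight (lbs)

X = sm.add_constant(body)          # predictor (explanatory variable) plus an intercept column
model = sm.OLS(pack, X).fit()      # response first, then predictors: don't swap them
print(model.summary())             # the slope's t statistic and p-value appear in this table
```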

Construct a confidence interval at the 95% level. Conditions: assume all conditions have been checked & met. Calculations: enter the data into Minitab (one column for body weight & another for backpack weight); then go to regression, simple regression. Be careful not to swap the response & the predictor. Choose linear, 95% confidence. Interpretation: we are 95% confident that the true, unknown population parameter, the true slope β₁, is between...
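A minimal sketch of the confidence-interval calculation done by hand from the regression output (t* with n − 2 degrees of freedom times the standard error of the slope), again with placeholder values rather than the class data.

```python
import numpy as np
from scipy import stats

# Placeholder values only; the class data table is not reproduced here.
body = np.array([120.0, 187.0, 109.0, 103.0, 131.0, 165.0, 158.0, 116.0])
pack = np.array([26.0, 30.0, 26.0, 24.0, 29.0, 35.0, 31.0, 28.0])

res = stats.linregress(body, pack)
n = len(body)
t_star = stats.t.ppf(0.975, df=n - 2)      # critical value t* with n - 2 degrees of freedom
margin = t_star * res.stderr               # margin of error = t* x SE(b)
print(f"95% CI for the true slope beta_1: "
      f"({res.slope - margin:.3f}, {res.slope + margin:.3f})")
```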

Do customers who stay longer at buffets give larger/smaller tips? [Data table: Time (minutes), Tip ($)]

Do customers who stay longer at buffets give larger/smaller tips?  A statistics student investigated this question as part of her project. She obtained an SRS of receipts, which included this information.  Do these data provide convincing evidence that customers who stay longer tip differently than customers who stay shorter periods of time? H₀: β₁ = 0 (no linear relationship between the variables) Hₐ: β₁ ≠ 0 (customers who stay longer tip differently)

Do customers who stay longer at buffets give larger/smaller tips? H₀: β₁ = 0 (no linear relationship between the variables) Hₐ: β₁ ≠ 0 (customers who stay longer tip differently) Conditions: assume all conditions have been checked and met. Calculations: enter the data into Minitab and run the regression. Interpretation: decision, α level, p-value, context.
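A minimal sketch of the interpretation step: compare the p-value from the software output to the chosen α level; the numbers below are hypothetical.

```python
# alpha and the p-value below are hypothetical; in practice, read the p-value
# from the regression output.
alpha = 0.05
p_value = 0.012

if p_value < alpha:
    print("Reject H0: convincing evidence of a linear relationship between time and tip.")
else:
    print("Fail to reject H0: no convincing evidence of a linear relationship.")
```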

Homework...  Homework  Section Quiz  Our next test...