Simple Linear Regression: Least squares line, interpreting coefficients, prediction, cautions, the formal model. Sections 2.6, 9.1, 9.2. Professor Kari Lock Morgan, Duke University

Exam 2 Grades In Class: Lab: Total:

Comments on In-Class Exam: Testing whether the data provide evidence that melanoma is found significantly more often on the left side of the body involves one categorical variable, so a single proportion. 2011 Hollywood movies: if the sample is the same as the population, then there is no need for inference! The standard deviation of a bootstrap distribution is the standard error.

Comments on Lab Exam: Most common reason for points off: applying the wrong method. The first step should ALWAYS be asking yourself: What is/are the variable(s)? Are they categorical or quantitative? Always plot/visualize your data. Outliers can strongly affect the results; you should either explain why they are left in, or else remove them.

Simulation Methods: For any one or two variables, resample( ) gives a confidence interval. For any two variables, reallocate( ) tests for an association between the variables. No conditions to check! Automatically deals with missing data! Only two commands to remember! No distributions to remember!

MODELING

Crickets and Temperature: Can you estimate the temperature on a summer evening, just by listening to crickets chirp? Response variable, y: temperature. Explanatory variable, x: cricket chirp rate. We will fit a model to predict temperature based on cricket chirp rate.

Linear Model: A linear model predicts a response variable, y, using a linear function of explanatory variables. Simple linear regression predicts one response variable, y, as a linear function of one explanatory variable, x. We will create a model that predicts temperature as a linear function of cricket chirp rate.

Regression Line Goal: Find a straight line that best fits the data in a scatterplot

Predicted and Actual Values: The actual response value, y, is the response value observed for a particular data point. The predicted response value, ŷ, is the response value that would be predicted for a given x value, based on the model. In linear regression, the predicted values fall on the regression line directly above each x value. The best fitting line is the one that makes the predicted values closest to the actual values.

Predicted and Actual Values

Residual: The residual for each data point is y − ŷ, the vertical distance from the point to the line. We want to make all the residuals as small as possible. How would you measure this?

Least Squares Regression Least squares regression chooses the regression line that minimizes the sum of the squared residuals
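As a concrete illustration, here is a minimal R sketch using made-up chirp-rate and temperature values (not the class data): it fits the least squares line with lm() and checks that a nearby alternative line has a larger sum of squared residuals.
# Hypothetical data, for illustration only (not the cricket data from class)
chirps <- c(81, 97, 103, 123, 150, 182, 195)    # chirps per minute
temp   <- c(57, 60, 64, 65, 68, 81, 83)         # temperature in degrees F
fit <- lm(temp ~ chirps)                         # least squares fit
sum(resid(fit)^2)                                # sum of squared residuals for the fitted line
# Any other line does worse; here, the same slope with the intercept shifted up by 1
shifted <- (coef(fit)[1] + 1) + coef(fit)[2] * chirps
sum((temp - shifted)^2)                          # larger than the least squares value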

Least Squares Regression

Equation of the Line: The estimated regression line is ŷ = b0 + b1·x, where b0 is the intercept and b1 is the slope. Intercept: predicted y value when x = 0. Slope: increase in predicted y for every unit increase in x.

Regression in R
> lm(Temperature~Chirps)
Call:
lm(formula = Temperature ~ Chirps)
Coefficients:
(Intercept)       Chirps
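Continuing the hypothetical sketch from above (the real coefficient values from class are not reproduced here), the estimated intercept and slope can be pulled out of the fitted object:
coef(fit)                   # b0 (intercept) and b1 (slope on chirps)
summary(fit)$coefficients   # estimates plus standard errors, t statistics, and p-values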

Regression Model
Which is a correct interpretation?
(a) The average temperature is 
(b) For every extra 0.23 chirps per minute, the predicted temperature increases by 1 degree
(c) Predicted temperature increases by 0.23 degrees for each extra chirp per minute
(d) For every extra 0.23 chirps per minute, the predicted temperature increases by 

Units: It is helpful to think about units when interpreting a regression equation. y units: degrees. x units: chirps per minute. Slope units (y units per x unit): degrees per chirp per minute.

Prediction: The regression equation can be used to predict y for a given value of x. If you listen and hear crickets chirping about 140 times per minute, your best guess at the outside temperature is 
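A sketch of this prediction in R, again with the hypothetical fit from above (so the numeric answer is illustrative only):
predict(fit, newdata = data.frame(chirps = 140))  # predicted temperature at 140 chirps/min
coef(fit)[1] + coef(fit)[2] * 140                 # same thing, plugging into the equation by hand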

Prediction

If the crickets are chirping about 180 times per minute, your best guess at the temperature is (a) 60° (b) 70° (c) 80°

Exam Scores Calculate your residual.

Prediction The intercept tells us that the predicted temperature when the crickets are not chirping at all is . Do you think this is a good prediction? (a) Yes (b) No

Regression Caution 1 Do not use the regression equation or line to predict outside the range of x values available in your data (do not extrapolate!) If none of the x values are anywhere near 0, then the intercept is meaningless!

Duke Rank and Duke Shirts: Are the rank of Duke among schools applied to and the number of Duke shirts owned (a) positively associated, (b) negatively associated, (c) not associated, or (d) other?

Regression Caution 2: Computers will calculate a regression line for any two quantitative variables, even if they are not associated or if the association is not linear. ALWAYS PLOT YOUR DATA! The regression line/equation should only be used if the association is approximately linear.
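A quick way to make that check, sketched with the hypothetical data from above: plot the points and overlay the fitted line before trusting the equation.
plot(chirps, temp, xlab = "Chirps per minute", ylab = "Temperature (degrees F)")
abline(fit)   # overlay the least squares line on the scatterplot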

Regression Caution 3: Outliers (especially outliers in both variables) can be very influential on the regression line. ALWAYS PLOT YOUR DATA!

Life Expectancy and Birth Rate
Coefficients: (Intercept) LifeExpectancy
Which of the following interpretations is correct?
(a) A decrease of 0.89 in the birth rate corresponds to a 1 year increase in predicted life expectancy
(b) Increasing life expectancy by 1 year will cause the birth rate to decrease by 0.89
(c) Both
(d) Neither

Regression Caution 4: Higher values of x may lead to higher (or lower) predicted values of y, but this does NOT mean that changing x will cause y to increase or decrease. Causation can only be determined if the values of the explanatory variable were determined randomly (which is rarely the case for a continuous explanatory variable).

Explanatory and Response Unlike correlation, for linear regression it does matter which is the explanatory variable and which is the response

r = 0 Challenge: If the correlation between x and y is 0, what would the regression line be?
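One way to see the answer: the slope can be written as b1 = r · (sy / sx), so r = 0 forces b1 = 0 and the regression line is the horizontal line ŷ = ȳ. A tiny check in R with made-up values whose correlation is exactly 0:
x <- c(-2, -1, 0, 1, 2)
y <- c( 1, -1, 0, -1, 1)
cor(x, y)         # exactly 0
coef(lm(y ~ x))   # slope is 0 and the intercept equals mean(y)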

Simple Linear Model: The population (true) simple linear model is y = β0 + β1·x + ε, where β0 is the intercept, β1 is the slope, and ε is the random error. β0 and β1 are unknown parameters. We can use familiar inference methods!
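To make the formal model concrete, here is a small R sketch that simulates data from y = β0 + β1·x + ε using made-up parameter values (illustrative only, not the class estimates):
set.seed(1)
beta0 <- 40; beta1 <- 0.25; sigma <- 2           # made-up true parameters
x <- runif(50, 80, 200)                          # hypothetical chirp rates
y <- beta0 + beta1 * x + rnorm(50, 0, sigma)     # response = line + random error
coef(lm(y ~ x))                                  # estimates land near beta0 and beta1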

Inference for the Slope: Confidence intervals and hypothesis tests for the slope can be done using the familiar formulas. Population parameter: β1. Sample statistic: b1, the estimated slope. Confidence interval: b1 ± t* · SE(b1). Test statistic: t = b1 / SE(b1). Use a t-distribution with n − 2 degrees of freedom.
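In R, the pieces these formulas need (the slope estimate, its standard error, the t statistic, and the residual degrees of freedom) come straight from the fitted object; a sketch with the hypothetical fit from above:
summary(fit)$coefficients   # each row: estimate, standard error, t statistic, p-value
df.residual(fit)            # n - 2 degrees of freedom for the t-distribution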

Inference for Slope Give a 95% confidence interval for the true slope. Is the slope significantly different from 0? (a) Yes (b) No

Confidence Interval
> qt(.975, 5)
[1] 2.570582
We are 95% confident that the true slope, regressing temperature on cricket chirp rate, is between and degrees per chirp per minute.
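The same kind of interval for the hypothetical fit, computed as statistic ± t* × SE and checked against confint():
b1  <- coef(summary(fit))["chirps", "Estimate"]
se1 <- coef(summary(fit))["chirps", "Std. Error"]
b1 + c(-1, 1) * qt(0.975, df.residual(fit)) * se1   # 95% confidence interval for the slope
confint(fit)["chirps", ]                            # same interval, computed directly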

Hypothesis Test
> 2*pt(16.21, 5, lower.tail=FALSE)
[1] e-05
There is strong evidence that the slope is significantly different from 0, and that there is an association between cricket chirp rate and temperature.

Small Samples The t-distribution is only appropriate for large samples (definitely not n = 7)! We should have done inference for the slope using simulation methods...
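The course tools for this are resample( ) and reallocate( ); as a rough base-R sketch of the same bootstrap idea, resampling rows of the hypothetical data and refitting the line each time:
set.seed(2)
boot_slopes <- replicate(5000, {
  rows <- sample(length(temp), replace = TRUE)   # resample cases with replacement
  coef(lm(temp[rows] ~ chirps[rows]))[2]         # slope of the refit line
})
quantile(boot_slopes, c(0.025, 0.975), na.rm = TRUE)   # bootstrap 95% interval for the slope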

If results are very significant, it doesn’t really matter if you get the exact p-value… you come to the same conclusion!

Project 2: Group project on regression (modeling); details here. If you want to change groups, email me TODAY OR TOMORROW. If other people in your lab section want to change, I'll move people around. You need a data set with a quantitative response variable and multiple explanatory variables; the explanatory variables must include at least one categorical and at least one quantitative variable. Proposal due next Wednesday.