Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTION 2.6, 9.1 Least squares line Interpreting.

Slides:



Advertisements
Similar presentations
Simple Linear Regression Conditions Confidence intervals Prediction intervals Section 9.1, 9.2, 9.3 Professor Kari Lock Morgan Duke University.
Advertisements

11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Objectives 10.1 Simple linear regression
Inference for Regression
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
© 2010 Pearson Prentice Hall. All rights reserved Least Squares Regression Models.
The Simple Regression Model
Stat 217 – Day 25 Regression. Last Time - ANOVA When?  Comparing 2 or means (one categorical and one quantitative variable) Research question  Null.
CHAPTER 3 Describing Relationships
Correlation and Regression Analysis
Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals.
Review for Final Exam Some important themes from Chapters 9-11 Final exam covers these chapters, but implicitly tests the entire course, because we use.
1 Chapter 10 Correlation and Regression We deal with two variables, x and y. Main goal: Investigate how x and y are related, or correlated; how much they.
Correlation & Regression
Relationship of two variables
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on the Least-Squares Regression Model and Multiple Regression 14.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Multiple Regression SECTIONS 9.2, 10.1, 10.2 Multiple explanatory variables.
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Simple Linear Regression SECTION 9.1 Inference for correlation Inference for.
Multiple Regression I 4/9/12 Transformations The model Individual coefficients R 2 ANOVA for regression Residual standard error Section 9.4, 9.5 Professor.
Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 11/6/12 Simple Linear Regression SECTIONS 9.1, 9.3 Inference for slope (9.1)
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 3 Describing Relationships 3.2 Least-Squares.
WARM-UP Do the work on the slip of paper (handout)
CHAPTER 5 Regression BPS - 5TH ED.CHAPTER 5 1. PREDICTION VIA REGRESSION LINE NUMBER OF NEW BIRDS AND PERCENT RETURNING BPS - 5TH ED.CHAPTER 5 2.
Statistics: Unlocking the Power of Data Lock 5 Exam 2 Review STAT 101 Dr. Kari Lock Morgan 11/13/12 Review of Chapters 5-9.
Simple linear regression Tron Anders Moger
Agresti/Franklin Statistics, 1 of 88 Chapter 11 Analyzing Association Between Quantitative Variables: Regression Analysis Learn…. To use regression analysis.
June 30, 2008Stat Lecture 16 - Regression1 Inference for relationships between variables Statistics Lecture 16.
^ y = a + bx Stats Chapter 5 - Least Squares Regression
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Simple Linear Regression SECTION 2.6 Least squares line Interpreting coefficients.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 11/20/12 Multiple Regression SECTIONS 9.2, 10.1, 10.2 Multiple explanatory.
Lesson 14 - R Chapter 14 Review. Objectives Summarize the chapter Define the vocabulary used Complete all objectives Successfully answer any of the review.
Statistics: Unlocking the Power of Data Lock 5 Inference for Means STAT 250 Dr. Kari Lock Morgan Sections 6.4, 6.5, 6.6, 6.10, 6.11, 6.12, 6.13 t-distribution.
Chapters 8 Linear Regression. Correlation and Regression Correlation = linear relationship between two variables. Summarize relationship with line. Called.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 11/6/12 Simple Linear Regression SECTION 2.6 Interpreting coefficients Prediction.
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Simple Linear Regression SECTION 9.1 Inference for correlation Inference for.
1 Objective Given two linearly correlated variables (x and y), find the linear function (equation) that best describes the trend. Section 10.3 Regression.
CHAPTER 3 Describing Relationships
Linear Regression Special Topics.
CHAPTER 3 Describing Relationships
CHAPTER 29: Multiple Regression*
Two Quantitative Variables: Linear Regression
Chapter 3: Describing Relationships
CHAPTER 3 Describing Relationships
^ y = a + bx Stats Chapter 5 - Least Squares Regression
CHAPTER 3 Describing Relationships
Least-Squares Regression
BA 275 Quantitative Business Methods
Chapter 3 Describing Relationships Section 3.2
Chapter 3: Describing Relationships
Chapter 3: Describing Relationships
Least-Squares Regression
Chapter 3: Describing Relationships
Least-Squares Regression
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Warmup A study was done comparing the number of registered automatic weapons (in thousands) along with the murder rate (in murders per 100,000) for 8.
CHAPTER 3 Describing Relationships
Chapter 3: Describing Relationships
CHAPTER 3 Describing Relationships
CHAPTER 3 Describing Relationships
Chapter 3: Describing Relationships
CHAPTER 3 Describing Relationships
Algebra Review The equation of a straight line y = mx + b
Warm-up: Pg 197 #79-80 Get ready for homework questions
9/27/ A Least-Squares Regression.
Chapter 3: Describing Relationships
Homework: PG. 204 #30, 31 pg. 212 #35,36 30.) a. Reading scores are predicted to increase by for each one-point increase in IQ. For x=90: 45.98;
CHAPTER 3 Describing Relationships
Presentation transcript:

Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTION 2.6, 9.1 Least squares line Interpreting coefficients Prediction Cautions Inference for slope, correlation

Statistics: Unlocking the Power of Data Lock 5 ANOVA is used to test for an association between a)Two categorical variables b)One categorical and one quantitative variable c)Two quantitative variables Review

Statistics: Unlocking the Power of Data Lock 5 A  2 test is used to test for an association between a)Two categorical variables b)One categorical and one quantitative variable c)Two quantitative variables Review

Statistics: Unlocking the Power of Data Lock 5 MODELING

Statistics: Unlocking the Power of Data Lock 5 Can you estimate the temperature on a summer evening, just by listening to crickets chirp? We will fit a model to predict temperature based on cricket chirp rate Crickets and Temperature

Statistics: Unlocking the Power of Data Lock 5 Crickets and Temperature Response Variable, y Explanatory Variable, x

Statistics: Unlocking the Power of Data Lock 5 Linear Model A linear model predicts a response variable, y, using a linear function of explanatory variables Simple linear regression predicts on response variable, y, as a linear function of one explanatory variable, x

Statistics: Unlocking the Power of Data Lock 5 Regression Line Goal: Find a straight line that best fits the data in a scatterplot

Statistics: Unlocking the Power of Data Lock 5 Equation of the Line The estimated regression line is InterceptSlope Slope: increase in predicted y for every unit increase in x Intercept: predicted y value when x = 0

Statistics: Unlocking the Power of Data Lock 5 Regression in StatKey

Statistics: Unlocking the Power of Data Lock 5 Regression in RStudio

Statistics: Unlocking the Power of Data Lock 5 Regression Model Which is a correct interpretation? a) The average temperature is  b) For every extra 0.23 chirps per minute, the predicted temperate increases by 1 degree c) Predicted temperature increases by 0.23 degrees for each extra chirp per minute d) For every extra 0.23 chirps per minute, the predicted temperature increases by 

Statistics: Unlocking the Power of Data Lock 5 Units It is helpful to think about units when interpreting a regression equation y units x units degrees chirps per minute degrees/ chirps per min

Statistics: Unlocking the Power of Data Lock 5 Prediction The regression equation can be used to predict y for a given value of x If you listen and hear crickets chirping about 140 times per minute, your best guess at the outside temperature is

Statistics: Unlocking the Power of Data Lock 5 Prediction

Statistics: Unlocking the Power of Data Lock 5 Prediction If the crickets are chirping about 180 times per minute, your best guess at the temperature is (a) 60  (b) 70  (c) 80 

Statistics: Unlocking the Power of Data Lock 5 Prediction The intercept tells us that the predicted temperature when the crickets are not chirping at all is . Do you think this is a good prediction? (a) Yes (b) No

Statistics: Unlocking the Power of Data Lock 5 Regression Caution 1 Do not use the regression equation or line to predict outside the range of x values available in your data (do not extrapolate!) If none of the x values are anywhere near 0, then the intercept is meaningless!

Statistics: Unlocking the Power of Data Lock 5 Duke Rank and Duke Shirts a)positively associated b)negatively associated c)not associated d)other Are the rank of Duke among schools applied to and the number of Duke shirts owned

Statistics: Unlocking the Power of Data Lock 5 Regression Caution 2 Computers will calculate a regression line for any two quantitative variables, even if they are not associated or if the association is not linear ALWAYS PLOT YOUR DATA! The regression line/equation should only be used if the association is approximately linear

Statistics: Unlocking the Power of Data Lock 5 Regression Caution 3 Outliers (especially outliers in both variables) can be very influential on the regression line ALWAYS PLOT YOUR DATA! l.aspx?ID=L455

Statistics: Unlocking the Power of Data Lock 5 Life Expectancy and Birth Rate Coefficients: (Intercept) LifeExpectancy Which of the following interpretations is correct? (a)A decrease of 0.89 in the birth rate corresponds to a 1 year increase in predicted life expectancy (b)Increasing life expectancy by 1 year will cause birth rate to decrease by 0.89 (c)Both (d)Neither

Statistics: Unlocking the Power of Data Lock 5 Regression Caution 4 Higher values of x may lead to higher (or lower) predicted values of y, but this does NOT mean that changing x will cause y to increase or decrease Causation can only be determined if the values of the explanatory variable were determined randomly (which is rarely the case for a continuous explanatory variable)

Statistics: Unlocking the Power of Data Lock 5 Explanatory and Response Unlike correlation, for linear regression it does matter which is the explanatory variable and which is the response

Statistics: Unlocking the Power of Data Lock 5 Regression Line How do we find the best fitting line???

Statistics: Unlocking the Power of Data Lock 5 Predicted and Actual Values

Statistics: Unlocking the Power of Data Lock 5 Predicted and Actual Values

Statistics: Unlocking the Power of Data Lock 5 Residual The residual is also the vertical distance from each point to the line

Statistics: Unlocking the Power of Data Lock 5 Residual Want to make all the residuals as small as possible. How would you measure this?

Statistics: Unlocking the Power of Data Lock 5 Least Squares Regression Least squares regression chooses the regression line that minimizes the sum of squared residuals

Statistics: Unlocking the Power of Data Lock 5 Least Squares Regression

Statistics: Unlocking the Power of Data Lock 5 Sample to Population Everything we have done so far is based solely on sample data Now, we will extend from the sample to the population

Statistics: Unlocking the Power of Data Lock 5 Intercept Slope Simple Linear Model Random error

Statistics: Unlocking the Power of Data Lock 5 Inference for the Slope

Statistics: Unlocking the Power of Data Lock 5 Inference for Slope Give a 95% confidence interval for the true slope. Is the slope significantly different from 0? (a) Yes (b) No

Statistics: Unlocking the Power of Data Lock 5 Confidence Interval

Statistics: Unlocking the Power of Data Lock 5 Hypothesis Test

Statistics: Unlocking the Power of Data Lock 5 Test for a correlation between temperature and cricket chirps (r = ). Correlation Test

Statistics: Unlocking the Power of Data Lock 5 Two Quantitative Variables The t-statistic (and p-value) for a test for a non-zero slope and a test for a non-zero correlation are identical! They are equivalent ways of testing for an association between two quantitative variables.

Statistics: Unlocking the Power of Data Lock 5 Small Samples The t-distribution is only appropriate for large samples (definitely not n = 7)! We should have done inference for the slope using simulation methods...

Statistics: Unlocking the Power of Data Lock 5

r = 0 Challenge: If the correlation between x and y is 0, what would the regression line be?

Statistics: Unlocking the Power of Data Lock 5 To Do Read Section 2.6, 9.1 Do HW 7 (due Monday, 3/31)  NO LATE HOMEWORK ACCEPTED – SOLUTIONS WILL BE POSTED IMMEDIATELY AFTER CLASS TO HELP YOU PREPARE FOR EXAM 2