FPP 10 kind of Regression

Presentation transcript:

Plan of attack: introduce the regression model, correctly interpret the intercept and slope, make predictions, and note the pitfalls to avoid.

Regression line. The correlation coefficient is a nice numerical summary of the association between two quantitative variables: it indicates the direction and strength of the association. But does it quantify the association? Quantifying it would be useful for making predictions and for understanding phenomena.

Regression line. Correlation measures the direction and strength of the straight-line (linear) relationship between two quantitative variables. If a scatter plot shows a linear relationship, we would like to summarize the overall pattern by drawing a line on the scatter plot. This line represents a mathematical model. Later we will make the mathematical model a statistical one.

Slope-intercept form review.

Regression line. Slope-intercept form notation: y = mx + b. Regression form notation: y-hat = α + βx, where α is the intercept and β is the slope.

Regression. Price of homes based on square feet. [Figure: scatterplot of price against square feet with the fitted regression line of Price on SQFT.]

Which line is best? [Figure: three candidate lines for Price as a function of SQFT, drawn in red, blue, and green.]

Which model to use. Different people might draw different lines by eye on a scatterplot. How can we determine which model (line), out of all the possible models (lines), is the “best” one? How can we numerically rank the different lines? This will come later in the course.
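
One common way to rank candidate lines numerically is the sum of squared residuals: the smaller the total squared vertical distance from the points to the line, the better the line. The slides defer this topic, so the sketch below is an assumption about the criterion, and both the data and the three candidate lines are made up for illustration.

```python
# Hypothetical (x, y) data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def sum_squared_residuals(intercept, slope, xs, ys):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Three hypothetical lines drawn "by eye", stored as (intercept, slope).
candidates = {
    "red": (0.0, 2.0),
    "blue": (1.0, 1.5),
    "green": (-1.0, 2.5),
}

# Score every candidate; the smallest score is the best fit under this criterion.
scores = {
    name: sum_squared_residuals(a, b, xs, ys)
    for name, (a, b) in candidates.items()
}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("best line:", best)
```

The least-squares regression line is the one line that makes this score as small as possible among all lines, not only among the three tried here.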

Slope interpretation. The slope, β, of a regression line is almost always important for interpreting the data. The slope is a rate of change: it is the amount by which y-hat, the predicted mean of y, changes when x increases by 1.

Slope interpretation. Price of homes based on square feet: for every 1 sqft increase in the size of a home, the predicted house price increases by $159.80 on average.
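
A small sketch of why the slope reads as "dollars per extra square foot": whatever the intercept is, predictions for two homes that differ by 1 sqft differ by exactly the slope. The slope of 159.8 is the value quoted on the slide; the intercepts below are arbitrary placeholders, not the fitted value.

```python
SLOPE = 159.8  # dollars per square foot, as quoted on the slide


def predicted_price(sqft, intercept, slope=SLOPE):
    """Predicted price from a line of the form intercept + slope * sqft."""
    return intercept + slope * sqft


# Arbitrary placeholder intercepts (NOT the fitted intercept from the slides).
differences = []
for intercept in (0.0, -50_000.0):
    diff = predicted_price(2001, intercept) - predicted_price(2000, intercept)
    differences.append(diff)
    print(f"intercept={intercept:>10.1f}  change for +1 sqft = {diff:.2f}")
```

The intercept cancels in the subtraction, which is why the slope can be interpreted on its own.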

Intercept interpretation. The intercept, α, of the regression line is the value of y-hat when x = 0. Although we need the value of the intercept to draw the line, it is statistically meaningful only when x can actually take values close to zero.

Intercept interpretation. Price of homes based on square feet: if the square footage of a home were 0, the predicted house price would be about -$90,000. This doesn’t make much sense here because x (sqft) doesn’t take on values close to zero.

Prediction. Price of homes based on square feet: for a 3500 sqft home, substitute SQFT = 3500 into the fitted equation; the predicted selling price is about $469,000.

OECD data: income and unemployment in the U.S. What is the relationship between households’ disposable income and the nation’s unemployment rate? U.S. data for the years up to 1998 (data provided by the economics department at Duke).

Disposable income vs. unemployment rates. [Figure.]

Disposable income and unemployment rates: regression output. [Figure.]

Facts about regression. There is a close relationship between the correlation coefficient and the slope of a regression line: they have the same sign, and they are proportional to each other. The intercept has no such direct relationship with the correlation coefficient; it is computed from the slope and the means of x and y.
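
For reference, the standard least-squares expressions for the slope and intercept, written in the α, β notation of the earlier slides, are given below. The symbols s_x and s_y (sample standard deviations) and x-bar, y-bar (sample means) are notation assumed here rather than taken from the slides.

```latex
\[
  \beta = r \,\frac{s_y}{s_x},
  \qquad
  \alpha = \bar{y} - \beta\,\bar{x}
\]
```

Because s_y / s_x is always positive, β has the same sign as r and, for fixed standard deviations, is proportional to it.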

Facts about regression. The distinction between explanatory and response variables is essential in regression. If you have a slope and intercept computed with x as the explanatory variable and y as the response, you can’t “back solve” to get the slope and intercept for the regression with x as the response and y as the explanatory variable. If you want to predict x from a given y, you must fit the regression with y as the explanatory variable and x as the response.
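
A minimal sketch of this point, using made-up data and a hand-written least-squares helper (`least_squares` is a name introduced here for illustration): fitting y on x and then algebraically inverting the line does not give the same slope as fitting x on y directly.

```python
def least_squares(explanatory, response):
    """Return (intercept, slope) of the least-squares line for response on explanatory."""
    n = len(explanatory)
    mean_e = sum(explanatory) / n
    mean_r = sum(response) / n
    s_er = sum((e - mean_e) * (r - mean_r) for e, r in zip(explanatory, response))
    s_ee = sum((e - mean_e) ** 2 for e in explanatory)
    slope = s_er / s_ee
    intercept = mean_r - slope * mean_e
    return intercept, slope


# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

a_yx, b_yx = least_squares(xs, ys)  # y is the response, x explanatory
a_xy, b_xy = least_squares(ys, xs)  # x is the response, y explanatory

# "Back solving" y = a + b*x for x would give a slope of 1/b.
back_solved_slope = 1.0 / b_yx

print("slope of y on x:          ", b_yx)
print("slope of x on y (refit):  ", b_xy)
print("slope from back-solving:  ", back_solved_slope)
```

The two slopes agree only when the points lie exactly on a line (r = 1 or r = -1); in general the product of the two fitted slopes equals r squared.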

Facts about regression. R² (the coefficient of determination) provides a one-number summary of how well the regression line fits the data. R² is the proportion of the variation in the y values explained by the regression line, often quoted as a percentage. R² lies between 0 and 1. Values near 1 indicate that the regression predicts the y values in the data set very closely; values near 0 indicate that it does not.

Facts about regression. Example: the correlation coefficient between sale price and square feet was r = 0.8718. Thus the coefficient of determination is R² = (0.8718)² = 0.76, so 76% of the variability in sale price is explained by (taken into account by) the regression line with square feet.
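
The sketch below checks, on made-up data, that the "variation explained" definition of R² agrees with the square of the correlation coefficient for a simple regression line, and then repeats the slide's arithmetic for r = 0.8718. The data and variable names are illustrative assumptions.

```python
from math import sqrt

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in y
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Correlation coefficient and least-squares line.
r = sxy / sqrt(sxx * syy)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Variation left unexplained by the line (sum of squared residuals).
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1.0 - sse / syy

print("r squared from correlation:", r ** 2)
print("1 - SSE/SST:               ", r_squared)

# The slide's example: r = 0.8718 for sale price and square feet.
slide_r_squared = round(0.8718 ** 2, 2)
print("slide example R^2:", slide_r_squared)
```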

Does regression fit the data well? A regression line is reasonable if the association between the two variables is indeed linear and the points are randomly scattered around the line. The income/unemployment rate data are well described by a regression line.

Regression of AIDS rates per 1000 people on GNP per capita. The line is too low for GNP values near zero and too high for large GNP values. We shouldn’t use this line for predictions.

Changing the response variable. When the regression line fits the data badly, sometimes you can transform the variables to obtain a better-fitting line. With monetary variables, this can typically be accomplished by taking logarithms.

Regression of log(AIDS) on log(GNP). Much better fit. Predict log(AIDS) from log(GNP), then exponentiate to estimate AIDS.
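
A minimal sketch of the transform, fit, and back-transform steps. The numbers are synthetic (generated from an exact power law, not the AIDS/GNP data from the slides), and `predict_y` is a helper name introduced here for illustration.

```python
from math import exp, log

# Synthetic data following y = 2 * x**1.5 exactly (NOT real AIDS/GNP figures).
xs = [1.0, 10.0, 100.0, 1000.0]
ys = [2.0 * x ** 1.5 for x in xs]

# Step 1: take logarithms of both variables.
log_x = [log(x) for x in xs]
log_y = [log(y) for y in ys]

# Step 2: least-squares line of log(y) on log(x).
n = len(log_x)
mean_lx = sum(log_x) / n
mean_ly = sum(log_y) / n
slope = (
    sum((lx - mean_lx) * (ly - mean_ly) for lx, ly in zip(log_x, log_y))
    / sum((lx - mean_lx) ** 2 for lx in log_x)
)
intercept = mean_ly - slope * mean_lx


def predict_y(x):
    """Step 3: predict log(y) from log(x), then exponentiate back to the y scale."""
    return exp(intercept + slope * log(x))


print("slope on log-log scale:", slope)
print("prediction at x = 50:  ", predict_y(50.0))
```

On the log-log scale the slope is read as a relative rate: a 1% increase in x goes with roughly a slope-percent change in y.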

Birth and death rates in 74 countries. [Figure.]

Warnings about regression. Predicting y at values of x beyond the range of x in the data is called extrapolation. This is risky, because we have no evidence that the association between x and y remains linear for unseen x values. Extrapolated predictions can be badly wrong.
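
To illustrate the risk, the sketch below fits a straight line to made-up data that actually follow a curve (y = x squared) over a narrow range of x, then compares the line's prediction with the true value inside and far outside that range. All numbers are synthetic.

```python
# Synthetic data: the true relationship is y = x**2, observed only for x in 1..5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 2 for x in xs]

# Least-squares line fitted to the observed range.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    / sum((x - mean_x) ** 2 for x in xs)
)
intercept = mean_y - slope * mean_x


def line(x):
    """Prediction from the fitted straight line."""
    return intercept + slope * x


inside, outside = 3.0, 20.0
error_inside = abs(line(inside) - inside ** 2)     # within the data range
error_outside = abs(line(outside) - outside ** 2)  # extrapolation

print(f"x = {inside}: line predicts {line(inside):.1f}, truth is {inside ** 2:.1f}")
print(f"x = {outside}: line predicts {line(outside):.1f}, truth is {outside ** 2:.1f}")
```

Within the observed range the line is a tolerable summary; far outside it the prediction misses by a wide margin because the linear pattern does not continue.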

Extrapolation. Diamond price and carat: the explanatory variable is weight in carats and the response variable is price in dollars. Predict the price of the Hope Diamond.

Extrapolation. The relationship between diamond carat and price doesn’t remain linear beyond a certain carat size.

Extrapolation. The green line is a linear fit using only diamonds of less than 0.4 carats. The blue line is a linear fit using all carat sizes. The red curve is a quadratic fit.

Lurking variable. A variable not being considered could be driving the relationship. In practice this is a difficult issue to tackle, especially when everything seems OK.

Influential point. An outlier in either the x or y direction which, if removed, would markedly change the value of the slope and y-intercept.

Causality. On its own, regression only quantifies an association between x and y. It does not prove causality. With a carefully designed experiment (or, in some cases, an observational study), regression can be used to show causality.