Regression, Chapter 5, January 24-25, 2012, Part II

Presentation transcript:

Regression, Chapter 5, January 24-25, Part II

Mad Correlators and Regressors on the loose
– Regression lines and why they are important
– The least-squares line
– Don't even think about doing this without software
– Residuals and why they matter
– Influential observations
– Cautions about correlation and regression
– Association DOES NOT equal causation

Regression lines and why they are important
As social scientists, there are a few things we want to do, including:
– Describe phenomena and events
– Explain why phenomena and events are the way they are
– Predict what will happen
Regression lines, and the equations that produce them, help us do all of this.

What is a regression line?
A regression line is a straight line that describes how a dependent variable (y) changes in response to changes in an independent variable (x). It has the form: y = a + bx (+ e)
– "a" is where the line intercepts the y-axis
– "b" is the slope of the line
– "e" is the error term (you always have some)
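To make the notation concrete, here is a minimal Python sketch of what the equation does once a and b are known; the intercept and slope values below are hypothetical, chosen only for illustration.

```python
# Minimal sketch: using a fitted line y = a + b*x to predict.
# The values a = 2.0 and b = 0.5 are hypothetical, not taken from the book.

def predict(x, a=2.0, b=0.5):
    """Return the predicted y for a given x, using intercept a and slope b."""
    return a + b * x

print(predict(10))  # 2.0 + 0.5 * 10 = 7.0
```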

As we learned in the last lesson, it is best to start your exploration of the association with a scatterplot. You can usually fit a regression line right onto your scatterplot.

The least-squares regression line
The least-squares regression line is the line that comes closest to all of the points in the scatterplot, in the sense that it minimizes the sum of the squared vertical distances from the points to the line. As the book notes, fitting it is a relatively simple procedure and one that almost any statistical software package will do.

Here it is in Excel using data from Ex. 5.5 in your book

And here are the results
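If you would rather use a script than a spreadsheet, here is a rough Python equivalent of the fit. The data values below are placeholders for illustration, not the Example 5.5 data from the book; substitute your own.

```python
import numpy as np

# Placeholder data for illustration only (not the Example 5.5 values).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.2])

# np.polyfit with deg=1 returns the least-squares coefficients, slope first.
b, a = np.polyfit(x, y, deg=1)
print(f"least-squares line: y-hat = {a:.3f} + {b:.3f}x")
```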

Please remember
– When fitting a regression line, it matters which variable you identify as the response and which as the explanatory variable.
– The slope (b) tells us how much (y) changes when we have a one-unit change in (x).
– Beta simply expresses (b) in standardized, or "z", units.
– The strength of the association is given by "r".
– And something new is added: "goodness of fit".
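In simple (one-x) regression these pieces are linked: the slope is b = r(s_y / s_x), and the standardized slope (beta) works out to equal r. A quick numerical check, again using placeholder data:

```python
import numpy as np

# Placeholder data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.2])

r = np.corrcoef(x, y)[0, 1]                 # strength of association, r
b = r * y.std(ddof=1) / x.std(ddof=1)       # slope: b = r * s_y / s_x
beta = b * x.std(ddof=1) / y.std(ddof=1)    # slope in standardized units; equals r here
print(round(r, 3), round(b, 3), round(beta, 3))
```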

R² or R-sq
R² is a measure of "goodness of fit" used in regression. This statistic answers the question: how well does the least-squares line of best fit actually fit the data? It is the fraction of the variation in the values of y that is explained by the regression of y on x. R² varies between 0 (no meaningful fit) and 1 (perfect fit). In truth, you should always be suspicious of results that approach 1 too closely.
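A sketch of computing R² straight from that definition (the fraction of the variation in y explained by the line), with the same placeholder data as before:

```python
import numpy as np

# Placeholder data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.2])

b, a = np.polyfit(x, y, deg=1)
y_hat = a + b * x

ss_resid = np.sum((y - y_hat) ** 2)      # variation left over around the line
ss_total = np.sum((y - y.mean()) ** 2)   # total variation in y
r_squared = 1 - ss_resid / ss_total      # fraction of variation explained
print(f"R^2 = {r_squared:.3f}")          # equals r**2 in simple linear regression
```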

Regression requires certain assumptions be met to yield meaningful results
The y variable in an OLS (ordinary least squares) regression must be measured at the interval or ratio level; the x variable can be at any level of measurement.
– However, interval- and ratio-level (x) variables produce outputs that are easier to interpret.
– Ordinal- and nominal-level (x) variables must be restated in binary terms. If they are not already measured as yes/no (this or the other; 1, 0), they must be recoded into dummy variables.
For example, imagine a variable showing which party respondents to a survey voted for. There are no "no answers" or "other parties" involved. It is nominal, coded:
– Party (a) = 1
– Party (b) = 2
– Party (c) = 3
To create dummy variables we recode these as:
– Variable a: voted for party a = 1, did not vote for party a = 0
– Variable b: voted for party b = 1, did not vote for party b = 0
We don't need a dummy variable for party c because that information is already implied: if someone did not vote for party a or party b, then by default they must have voted for party c.
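Here is a small sketch of that recoding in Python with pandas, using a hypothetical column named "party"; note that pandas drops the first category as the reference rather than party c, but the idea is the same.

```python
import pandas as pd

# Hypothetical survey responses: which party each respondent voted for.
votes = pd.DataFrame({"party": ["a", "b", "c", "a", "c", "b"]})

# One category is always left out as the reference. get_dummies with
# drop_first=True drops party a; the slide drops party c instead.
# Either choice carries the same information.
dummies = pd.get_dummies(votes["party"], prefix="party", drop_first=True)
print(dummies)
```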

Regression assumptions and requirements, continued
– The relationship must be linear.
– All of the observations in an OLS regression must be independent of one another: including one case ought not to cause another to be automatically included.
These are things you will check before you even begin a regression.

Some things you need to check after doing a regression analysis
– (My favourite) For each value of the independent variable, the values of the dependent variable must be normally distributed.
– The variance of the distribution of the dependent variable must be the same for all values of the independent variable.

Residuals (errors)
We can check that our regression meets these requirements (independence of observations, a normal distribution of the dependent variable at each value of the independent variable, and a constant variance of the dependent variable across all values of the independent variable). We do this by plotting and analyzing the residuals.

Residuals are the error terms for each case in our scatterplot. The regression line predicts a value of y for each observed value of x. The vertical distance from that predicted value to where the observation really lies is the residual for that observation. Residuals can be calculated in different ways to suit different tasks. Later in the term we will look at how you can use plots of residuals to test that your analysis meets the regression assumptions.

An important point about the residuals of an OLS regression is that they always have a mean of zero. For now we just want to use them to check goodness of fit and to see whether any points deviate too far from the line.
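A sketch of computing residuals by hand and confirming the zero mean, again with placeholder data:

```python
import numpy as np

# Placeholder data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.2])

b, a = np.polyfit(x, y, deg=1)
residuals = y - (a + b * x)               # observed y minus predicted y

print(residuals.round(3))
print(round(residuals.mean(), 10))        # essentially zero for a least-squares fit
print(np.argmax(np.abs(residuals)))       # index of the point farthest from the line
```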

Influential observations
Observation 16 is clearly an outlier. It is an "influential observation" that is potentially distorting the analysis. The question is what to do with it. If you can formulate a methodologically viable reason why the influential observation ought to be removed from the analysis, you may remove it. The author of the textbook has provided an applet that shows how removing this outlier can change the resulting regression line.
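One way to see this effect without the applet is to refit the line with and without the suspect case and compare the slopes; a sketch with a hypothetical influential point tacked onto the placeholder data:

```python
import numpy as np

# Placeholder data, plus one hypothetical influential observation at index 6.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 12.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.2, 1.0])

b_all, a_all = np.polyfit(x, y, deg=1)        # fit using every observation

keep = np.arange(len(x)) != 6                 # drop the suspect observation
b_trim, a_trim = np.polyfit(x[keep], y[keep], deg=1)

print(f"with the outlier:    slope = {b_all:.3f}")
print(f"without the outlier: slope = {b_trim:.3f}")
```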

More cautions about regression
– Did I mention before that this only works if your data exhibit a linear relationship between the variables?
– Correlation and OLS regression are not resistant to extreme values of the variables.
– Beware of extrapolating too far. The book gives a nice example of growth rates for children: if you know the rate for 8- and 10-year-olds, don't assume the slope of that line will continue to hold for 25-year-olds.
– Beware of lurking (intervening) and possibly hidden variables that affect your analysis.

And the biggest warning
Association does not equal proof of causation. Having said that, some associations make a better case for causation than others:
– The association is strong and statistically significant
– The association is consistent and demonstrated repeatedly in different studies
– Higher doses of the explanatory variable are associated with stronger responses

– The alleged cause must precede the response in time
– The alleged cause is plausible (a theoretical argument can be made as to why the association ought to exist)