Linear Regression 1 Sociology 5811 Lecture 19 Copyright © 2005 by Evan Schofer Do not copy or distribute without permission

Announcements: Final project proposals due next week! Any questions? Today's Class: The linear regression model

Review: Linear Functions. Linear functions can summarize the relationship between two variables: –Formula: Happy = a + b(Income) Linear functions can also be used to "predict" (estimate) a case's value of one variable ($Y_i$) based on its value of another variable ($X_i$), if you know the constant and slope. "Y-hat" indicates an estimation function: $\hat{Y}_i = a + b_{YX} X_i$, where $b_{YX}$ denotes the slope of Y with respect to X
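To make the prediction idea concrete, here is a minimal Python sketch; the constant and slope defaults are hypothetical, chosen to match the Y = 2 + .5X example used below:

```python
# A minimal sketch of prediction with a known constant and slope.
# The default values a = 2, b = 0.5 are hypothetical illustrations.
def predict(x, a=2.0, b=0.5):
    """Return the estimate Y-hat = a + b*x."""
    return a + b * x

print(predict(3))  # 2 + 0.5 * 3 = 3.5
```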

Review: The Linear Regression Model. The value of any point ($Y_i$) can be modeled as: $Y_i = a + b_{YX} X_i + e_i$ The value of Y for case i is made up of: a constant (a); a sloping function of the case's value on variable X ($b_{YX}$); and an error term (e), the deviation from the line. By adding error (e), an abstract mathematical function can be applied to real data points

Review: The Linear Regression Model. Visually: $Y_i = a + bX_i + e_i$. For the line Y = 2 + .5X, the constant is a = 2. For case 7 (X = 3, Y = 5): bX = .5(3) = 1.5 and e = 1.5
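Working through case 7 with the numbers on the slide:

$$\hat{Y}_7 = a + bX_7 = 2 + 0.5(3) = 3.5, \qquad e_7 = Y_7 - \hat{Y}_7 = 5 - 3.5 = 1.5$$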

Review: Estimating Linear Equations. Question: How do we choose the best line to describe our real data? Idea: The best regression line is the one with the smallest amount of error; the line comes as close as possible to all points. Error is simply deviation from the regression line. Note: to make all deviations positive, we square them, producing the "sum of squares error": $\sum_i (Y_i - \hat{Y}_i)^2$

Review: Estimating Linear Equations. A poor estimate (big error): Y = 1.5 - 1X

Review: Estimating Linear Equations. A better estimate (less error): Y = 2 + .5X

Review: Estimating Linear Equations Look at the improvement (reduction) in error: High Error vs. Low Error
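A minimal Python sketch of this comparison, using small hypothetical data (not the lecture's dataset): the sum of squared errors is computed for both candidate lines shown above.

```python
# Compare the sum of squared errors (SSE) of two candidate lines
# on hypothetical data; the better-fitting line has the smaller SSE.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.3, 3.1, 3.4, 4.2]

def sse(a, b, xs, ys):
    """Sum of squared deviations of observed y from the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

print(sse(2.0, 0.5, xs, ys))   # low error:  Y = 2 + .5X
print(sse(1.5, -1.0, xs, ys))  # high error: Y = 1.5 - 1X
```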

Review: Estimating Linear Equations. Goal: Find values of the constant (a) and slope (b) that produce the lowest squared error –The "least squares" regression line The formula for the slope (b) that yields the least squares error is: $b_{YX} = \frac{s_{YX}}{s_X^2}$ Where $s_X^2$ is the variance of X, and $s_{YX}$ is the covariance of Y and X –A concept we must now define and discuss

Covariance. Variance: the sum of squared deviations about Y-bar, divided by N - 1: $s_Y^2 = \frac{\sum_i (Y_i - \bar{Y})^2}{N-1}$ Covariance ($s_{YX}$): the sum of deviations about Y-bar multiplied by deviations about X-bar, divided by N - 1: $s_{YX} = \frac{\sum_i (Y_i - \bar{Y})(X_i - \bar{X})}{N-1}$

Covariance. Covariance: A measure of how much deviation in X is accompanied by deviation in Y. It measures whether deviation (from the mean) in X tends to be accompanied by similar deviation in Y –Or whether cases with positive deviation in X have negative deviation in Y –This is summed up for all cases in the data The covariance is one numerical measure that characterizes the extent of linear association –As is the correlation coefficient (r)
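A minimal Python sketch of the covariance computation just defined, again on small hypothetical data:

```python
# Covariance: sum of (x - x_bar)*(y - y_bar), divided by N - 1.
def covariance(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar)
               for x, y in zip(xs, ys)) / (n - 1)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.3, 3.1, 3.4, 4.2]
print(covariance(xs, ys))  # positive: X and Y deviate in the same direction
```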

Covariance. Covariance: based on multiplying each case's deviation in X by its deviation in Y. With Y-bar = .5 and X-bar = -1: a point that deviates a lot from both means (deviations of 3 and 2.5) contributes (3)(2.5) = 7.5; a point that deviates very little from X-bar and Y-bar contributes (.4)(-.25) = -.1

Covariance. Some points fall above both means, others below both means (here Y-bar = .5, X-bar = -1). Points falling above both means (or below both means) contribute positively to the covariance: two positive (or two negative) deviations multiply to give a positive number

Covariance. Points falling above one mean but below the other yield one positive and one negative deviation (here Y-bar = .5, X-bar = -1). One positive and one negative deviation multiply to give a negative number

Covariance. Covariance is positive if cases cluster on the diagonal from lower-left to upper-right –Cases that deviate positively on X also deviate positively on Y (and negative X with negative Y) Covariance is negative if cases cluster on the opposite diagonal (upper-left to lower-right) –Cases with positive deviation on X have negative deviation on Y (and vice versa) If points are scattered all around, positives and negatives cancel out; the covariance is near zero

Covariance and Slope. Note that the covariance has properties similar to the slope. In fact, the covariance can be used to calculate a regression slope that minimizes error across all points –The "Ordinary Least Squares" (OLS) slope

Covariance and Slope. The slope formula can be written out as follows: $b_{YX} = \frac{s_{YX}}{s_X^2} = \frac{\sum_i (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_i (X_i - \bar{X})^2}$

Computing the Constant. Once the slope has been calculated, it is simple to determine the constant (a): $a = \bar{Y} - b_{YX}\bar{X}$ Simply plug in the values of Y-bar, X-bar, and b. Notes: the calculated value of b is called a "coefficient"; the value of a is called the constant
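Putting the slope and constant formulas together in a short Python sketch (hypothetical data; the helper name ols_fit is mine, not from the lecture):

```python
# Least-squares fit: b = cov(X, Y) / var(X), then a = y_bar - b * x_bar.
def ols_fit(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    b = cov_xy / var_x     # slope coefficient
    a = y_bar - b * x_bar  # constant
    return a, b

a, b = ols_fit([1.0, 2.0, 3.0, 4.0], [2.3, 3.1, 3.4, 4.2])
print(a, b)  # constant and slope for the hypothetical data
```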

Regression Example. Example: Study time and student achievement. –X variable: Average # hours spent studying per day –Y variable: Score on reading test (Data table and scatterplot omitted.) X-bar = 1.8, Y-bar = 18.8

Regression Example. Slope = covariance(X, Y) / variance of X –X-bar = 1.8, Y-bar = 18.8 (Table of X and Y deviations omitted.) Sum of X deviation * Y deviation = 51.73

Regression Example. Calculating the covariance: divide the summed cross-products by N - 1. Standard deviation of X = 1.4, so variance = square of S.D. = 1.96. Finally, the slope is the covariance divided by the variance of X.
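Filling in the arithmetic (the slide's data table is not reproduced here; N = 6 cases is an assumption consistent with the reported figures):

$$s_{YX} = \frac{51.73}{N - 1} = \frac{51.73}{5} \approx 10.35, \qquad b = \frac{s_{YX}}{s_X^2} = \frac{10.35}{1.96} \approx 5.3$$

$$a = \bar{Y} - b\bar{X} = 18.8 - 5.3(1.8) \approx 9.3$$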

Regression Example. Results: Slope b = 5.3, constant a = 9.3. Equation: TestScore = 9.3 + 5.3*HrsStudied. Question: What is the interpretation of b? Answer: For every hour studied, test scores increase by 5.3 points. Question: What is the interpretation of the constant? Answer: Individuals who studied zero hours are predicted to score 9.3 on the test.

Computing Regressions Regression coefficients can be calculated in SPSS –You will rarely, if ever, do them by hand SPSS will estimate: –The value of the constant (a) –The value of the slope (b) –Plus, a large number of related statistics and results of hypothesis testing procedures
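The lecture uses SPSS; purely as an illustration, here is how the same estimates could be obtained in Python with the statsmodels package (an assumption of this sketch, not the course software):

```python
# Fit an OLS regression and print the constant, slope, and related output.
import statsmodels.api as sm

x = [1.0, 2.0, 3.0, 4.0]    # hypothetical predictor values
y = [2.3, 3.1, 3.4, 4.2]    # hypothetical outcome values

X = sm.add_constant(x)      # include the constant (a) in the model
model = sm.OLS(y, X).fit()  # ordinary least squares estimation
print(model.params)         # constant and slope estimates
print(model.rsquared)       # R-square
print(model.summary())      # full output, incl. hypothesis tests
```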

Example: Education & Job Prestige. Years of education versus job prestige –Previously, we made an "eyeball" estimate of the line. Our estimate: Y = 5 + 3X

Example: Education & Job Prestige. The actual SPSS regression results for those data give estimates of a and b: the "Constant" in the output is a, and the slope for "Year of School" is b, roughly 2.5. Equation: Prestige = a + 2.5(Education). A year of education adds about 2.5 points of job prestige

Example: Education & Job Prestige. Comparing our "eyeball" estimate (Y = 5 + 3X) to the actual OLS regression line computed in SPSS

Example: Education & Job Prestige. Much more information is provided: it allows us to do hypothesis tests about the constant and slope, and the R and R-Square values indicate how well the line summarizes the data

R-Square. Issue: Even the "best" regression line misses data points; we still have some error. Question: How good is our line at summarizing the relationship between two variables? –Do we have a lot of error? –Or only a little? (i.e., the line closely estimates cases) Specifically, does knowledge of X help us accurately understand values of Y? Solution: The R-Square statistic –Also called the "coefficient of determination"

R-Square. Variance around Y-bar can be split into two parts: "explained variance" (the deviation of the regression line, here Y = 2 + .5X, from Y-bar) and "error variance" (the deviation of cases from the line)

R-Square. The total variation of a case $Y_i$ around Y-bar can be partitioned into two parts (like ANOVA): 1. Explained variance –Also called "regression variance" –The variance we predicted based on the line 2. Error variance –The variance not accounted for by the line Summing squared deviations for all cases gives us: $\sum_i (Y_i - \bar{Y})^2 = \sum_i (\hat{Y}_i - \bar{Y})^2 + \sum_i (Y_i - \hat{Y}_i)^2$ (total SS = regression SS + error SS)

R-Square. The R-Square statistic is computed as follows: $R^2 = \frac{SS_{regression}}{SS_{total}} = 1 - \frac{SS_{error}}{SS_{total}}$ Question: What is R-square if the line is perfect (i.e., it hits every point and there is no error)? Answer: R-square = 1.00. Question: What is R-square if the line is NO HELP in estimating points (lots of error)? Answer: R-square is zero

R-Square. Properties of R-square: 1. Tells us the proportion of all variance in Y that is explained as a linear function of X –It measures "how good" our line is at predicting Y 2. Ranges from 0 to 1 –1 indicates perfect prediction of Y by X –0 indicates that the line explains no variance in Y The R-square indicates how well a variable (or group of variables) accounts for variation in Y
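A minimal Python sketch of the variance partition and R-square, continuing the hypothetical data and the constant/slope from the ols_fit() sketch above:

```python
# R-square = explained (regression) variance / total variance.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.3, 3.1, 3.4, 4.2]
a, b = 1.75, 0.6                   # least-squares fit for these data

y_bar = sum(ys) / len(ys)
y_hat = [a + b * x for x in xs]

ss_total = sum((y - y_bar) ** 2 for y in ys)
ss_error = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)

print(ss_regression + ss_error)    # equals ss_total: the partition
print(ss_regression / ss_total)    # R-square
print(1 - ss_error / ss_total)     # same value, via the error share
```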

Interpreting R-Square. R-square is often used as an overall indicator of the "success" of a regression model; a higher R-square is considered "better" than a lower one. How high an R-square is "good enough"? –It varies depending on the dependent variable –Orderly phenomena can yield R-square > .9 –"Messy," random phenomena can yield values like .05 –Look at the literature to know what you should expect

Interpreting R-Square. But finding variables that produce a high R-square is not the only important goal –Not all variables that generate a high R-square are sensible to include in a regression analysis –Example: Suppose you want to predict annual income. Hourly wage is a very good predictor… because it is tautologically linked to the dependent variable. More sociologically interesting predictors would be social class background, education, race, etc. –Example: Conservatism predicts approval of Bush.