Lecture 8 Relationships between Scale variables: Regression Analysis


Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk

Notices: Register

Plan:
1. Linear & Non-linear Relationships
2. Fitting a line using OLS
3. Inference in Regression
4. Omitted Variables & R²
5. Types of Regression Analysis
6. Properties of OLS Estimates
7. Assumptions of OLS
8. Doing Regression in SPSS

1. Linear & Non-linear relationships between variables
Social scientists are often most interested in relationships between variables:
Is social class related to political perspective?
Is income related to education?
Is worker alienation related to job monotony?
We are also interested in the direction of causation, but this is more difficult to establish empirically: our empirical models are usually structured around an assumed theory of causation.

Relationships between scale variables
The most straightforward way to look for evidence of a relationship is a scatter plot. The convention is to:
put the dependent variable (i.e. the "effect") on the vertical or "y" axis;
put the explanatory variable (i.e. the "cause") on the horizontal or "x" axis.

Scatter plot of IQ and Income:
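A scatter plot like this can be produced through the Graphs menu or in syntax; a minimal sketch, assuming hypothetical variable names iq and income:

    * Explanatory variable (iq) on the x axis, dependent variable (income) on the y axis.
    GRAPH
      /SCATTERPLOT(BIVAR)=iq WITH income.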

We would like to find the line of best fit:

What does the output mean?

Sometimes the relationship appears non-linear:

… and so a straight line of best fit is not always very satisfactory:

Could try a quadratic line of best fit:

But we can simulate a non-linear relationship by first transforming one of the variables:
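For example, a logarithmic transformation can straighten out a curved relationship before fitting the line; a minimal SPSS sketch, again assuming hypothetical variable names iq and income:

    * Transform the dependent variable, then fit the straight line to the transformed values.
    COMPUTE ln_income = LN(income).
    EXECUTE.
    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA
      /DEPENDENT ln_income
      /METHOD=ENTER iq.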

… or a cubic line of best fit: (overfitted?)

Or could try two linear lines: “structural break”

2. Fitting a line using OLS
The most popular criterion for drawing the line of best fit is to minimise the sum of squared deviations from the line to each observation:

\min_{a,b} \sum_i (y_i - \hat{y}_i)^2

Where: y_i = observed value of y; \hat{y}_i = predicted value of y_i = the value on the line of best fit corresponding to x_i.

Regression estimates of a, b: Ordinary Least Squares (OLS)
This criterion yields estimates of the slope b and y-intercept a of the straight line:

b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad a = \bar{y} - b \bar{x}
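As a quick illustration of the arithmetic, with made-up toy data rather than the lecture's house-price data, take the three points (1, 2), (2, 4), (3, 5):

\bar{x} = 2, \quad \bar{y} = \tfrac{11}{3}

b = \frac{(-1)(-\tfrac{5}{3}) + (0)(\tfrac{1}{3}) + (1)(\tfrac{4}{3})}{(-1)^2 + 0^2 + 1^2} = \frac{3}{2}, \qquad a = \frac{11}{3} - \frac{3}{2} \cdot 2 = \frac{2}{3}

so the fitted line is \hat{y} = \tfrac{2}{3} + 1.5 x.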

3. Inference in Regression: Hypothesis tests on the slope coefficient
Regressions are usually run on samples, so what can we say about the population relationship between x and y? Repeated samples would yield a range of values for the estimate of the slope: \hat{b} \sim N(\beta, \sigma_b), i.e. \hat{b} is normally distributed with mean \beta = the population slope = the value of b if the regression were run on the population. If there is no relationship in the population between x and y, then \beta = 0, and this is our H0.
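The population standard deviation \sigma_b is unknown and must be estimated by the sample standard error s_{\hat{b}}, so the usual test statistic follows a t distribution (standard inference for simple regression, consistent with the df = 554 used in the example below). Under H0: \beta = 0,

t_c = \frac{\hat{b}}{s_{\hat{b}}} \sim t_{n-2}

where n - 2 is the degrees of freedom when there is one explanatory variable.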

What does the standard error mean?

Hypothesis test on b:
(1) H0: β = 0 (i.e. the slope coefficient, if the regression were run on the population, would be zero)
H1: β ≠ 0
(2) α = 0.05 or 0.01 etc.
(3) Reject H0 iff P < α (N.B. rule of thumb: P < 0.05 if |tc| ≥ 2, and P < 0.01 if |tc| ≥ 2.6)
(4) Calculate P and conclude.

Example using SPSS output:
(1) H0: no relationship between house price and floor area; H1: there is a relationship.
(2), (3), (4): P = 2 × (1 − CDF.T(24.469, 554)) = 0.000000, so reject H0.
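CDF.T is the SPSS cumulative t-distribution function used on the slide; the p-value can be reproduced in syntax with a COMPUTE (a minimal sketch; the variable name sig_b is a placeholder):

    * Two-tailed p-value for t = 24.469 on 554 degrees of freedom.
    COMPUTE sig_b = 2 * (1 - CDF.T(24.469, 554)).
    EXECUTE.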

4. Omitted Variables & R²
Q/ Is floor area the only factor? How much of the variation in Price does it explain?

R-square
R² tells you how much of the variation in y is explained by the explanatory variable x: 0 ≤ R² ≤ 1 (NB: you want R² to be near 1). If there is more than one explanatory variable, use the Adjusted R².
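In terms of sums of squares (a standard definition, not spelt out on the slide):

R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}, \qquad \text{Adjusted } R^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}

where k is the number of explanatory variables.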

Example: 2 explanatory variables

Scatter plot (with floor spikes)

3D Surface Plots: Construction, Price & Unemployment Q = -246 + 27P - 0.2P² - 73U + 3U²

Construction Equation in a Slump Q = 315 + 4P - 73U + 5U²

5. Types of regression analysis:
Univariate regression: one explanatory variable; what we've looked at so far.
Multivariate regression: more than one explanatory variable on the RHS of the equation.
Log-linear regression & log-log regression: taking logs of variables can deal with certain types of non-linearity and has useful properties (e.g. coefficients become elasticities; see the example below).
Categorical dependent variable regression: the dependent variable is dichotomous (the observation either has an attribute or not), e.g. MPPI take-up, unemployed or not, etc.
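For example, in the log-log specification the slope coefficient can be read directly as an elasticity (a standard result, not spelt out on the slide):

\ln y = a + b \ln x + e \quad\Rightarrow\quad b = \frac{d \ln y}{d \ln x} = \frac{dy / y}{dx / x}

i.e. b is the approximate percentage change in y for a 1% change in x.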

6. Properties of OLS estimators OLS estimates of the slope and intercept parameters have been shown to be BLUE (provided certain assumptions are met): Best Linear Unbiased Estimator

“Best” in that they have the minimum variance compared with other estimators (i.e. given repeated samples, the OLS estimates for α and β vary less between samples than any other sample estimates for α and β).
“Linear” in that a straight-line relationship is assumed.
“Unbiased” because, in repeated samples, the mean of all the estimates achieved will tend towards the population values for α and β.
“Estimates” in that the true values of α and β cannot be known, and so we are using statistical techniques to arrive at the best possible assessment of their values, given the information available.

7. Assumptions of OLS:
For the estimates of a and b to be BLUE and for regression inference to be correct:
1. The equation is correctly specified:
linear in parameters (variables can still be transformed);
contains all relevant variables;
contains no irrelevant variables;
contains no variables measured with error.
2. The error term has zero mean.
3. The error term has constant variance.

4. The error term is not autocorrelated, i.e. not correlated with the error term from previous time periods.
5. The explanatory variables are fixed: we observe a normal distribution of y for repeated fixed values of x.
6. No linear relationship between the RHS variables, i.e. no “multicollinearity”.
(A visual check on assumptions 2 and 3 is sketched below.)
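A common informal check on the error-term assumptions is to plot standardised residuals against standardised predicted values; a minimal SPSS sketch, assuming hypothetical variable names price and floorarea:

    REGRESSION
      /DEPENDENT price
      /METHOD=ENTER floorarea
      /SCATTERPLOT=(*ZRESID ,*ZPRED).
    * A patternless cloud around zero is consistent with assumptions 2 and 3.
    * A funnel shape suggests non-constant variance.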

8. Doing Regression analysis in SPSS To run regression analysis in SPSS, click on Analyse, Regression, Linear:

Select your dependent (i.e. ‘explained’) variable and independent (i.e. ‘explanatory’) variables:

e.g. Floor area and bathrooms: Floor area = a + b Number of bathrooms + e
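The same model can be run in syntax rather than through the menus; a minimal sketch, assuming hypothetical variable names floorarea and bathrooms:

    * Syntax equivalent of Analyse, Regression, Linear.
    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA
      /DEPENDENT floorarea
      /METHOD=ENTER bathrooms.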

Confidence Intervals for regression coefficients
Population slope coefficient CI:

\hat{b} \pm t_{\alpha/2,\, n-2} \times s_{\hat{b}}

Rule of thumb for a 95% CI: \hat{b} \pm 2 s_{\hat{b}}

e.g. regression of floor area on number of bathrooms, rule-of-thumb CI on the slope: 64.6 ± 7.6, so the 95% CI ≈ (57, 72).

Confidence Intervals in SPSS: Analyse, Regression, Linear, click on Statistics and select Confidence intervals:
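In syntax, ticking that box corresponds to adding CI(95) to the /STATISTICS subcommand (same hypothetical variable names as above):

    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA CI(95)
      /DEPENDENT floorarea
      /METHOD=ENTER bathrooms.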

Our rule of thumb said the 95% CI for the slope = (57, 72). How does this compare?

Past Paper: (C2) Relationships (30%)
Suppose you have a theory that suggests that time watching TV is determined by gregariousness (the less gregarious, the more time spent watching TV). Use a random sample of 60 observations from the TV watching data to run a statistical test for this relationship that also controls for the effects of age and gender. Carefully interpret the output from this model and discuss the statistical robustness of the results.

Reading: Regression Analysis:
*Field, A., chapters on regression.
*Moore, D. and McCabe, G., chapters on regression.
Kennedy, P., A Guide to Econometrics.
Bryman, A. and Cramer, D. (1999) Quantitative Data Analysis with SPSS for Windows: A Guide for Social Scientists, chapters 9 and 10.
Achen, C. H. (1982) Interpreting and Using Regression. London: Sage.