PO 141: INTRODUCTION TO PUBLIC POLICY
Summer I (2015)
Claire Leavitt, Boston University

TABLE OF CONTENTS
• Statistical inference: confidence intervals, hypothesis testing, errors, Bayesian inference
• Regression analysis: meaning, key terms, OLS, assumptions
• Problems with regression analysis
• Applying regression to policy analysis

CONFIDENCE INTERVALS
• How confident are we that the true population mean lies within a certain range? We use samples to answer this question
• The range in question is defined as a distance from the sample mean
• Confidence intervals are based on probabilistic assessments
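
A minimal sketch in Python of the interval itself, using made-up sample values (scipy assumed available): the range is the sample mean plus or minus a t-multiple of the standard error.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 25 observations
sample = np.array([12.1, 9.8, 11.4, 10.2, 13.0, 9.5, 10.8, 11.9, 10.1, 12.4,
                   11.2, 10.6, 9.9, 12.8, 11.5, 10.3, 11.0, 12.2, 10.7, 11.8,
                   9.6, 11.3, 10.9, 12.5, 11.1])

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))   # standard error of the mean

# 95% confidence interval: mean ± t* × SE, with n − 1 degrees of freedom
t_star = stats.t.ppf(0.975, df=len(sample) - 1)
print(mean - t_star * se, mean + t_star * se)
```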

HYPOTHESIS TESTING
• How do we test whether a hypothesis is true? We reject or fail to reject a hypothesis at a certain level of confidence
• Null hypothesis: the proposition/contention being tested
• Alternative hypothesis: what must be accepted if we reject the null hypothesis

HYPOTHESIS TESTING
• Come up with a null hypothesis (H₀) and an alternative hypothesis (H₁)
• Calculate a test statistic that quantifies how far your sample mean is from the population mean hypothesized under H₀
• The test statistic accounts for the distance between your sample mean and the hypothesized population mean, scaled by the standard error of your sample mean

HYPOTHESIS TESTING
• Each test statistic is also based on degrees of freedom: n − 1
• This accounts for the fact that we're estimating the standard error based on an estimate of the sample mean. How accurate is the estimate? The more degrees of freedom, the more accurate our estimate of the standard error
• High test statistic = significant results!

HYPOTHESIS TESTING
• With a high test statistic, either your sample is flawed or the null hypothesis is wrong. Which is more likely?
• Assess the p-value: how likely is it that your observed sample mean was the result of random chance?
• A p-value of .03 means that, if the null hypothesis were true, there would be only a 3% chance of observing a sample mean as extreme as yours; such a small chance is strong evidence against the null
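
A minimal sketch, with made-up data, of how the test statistic, degrees of freedom, and p-value fit together (scipy assumed available):

```python
import numpy as np
from scipy import stats

# Hypothetical sample: weekly hours of TV watched by 20 survey respondents
sample = np.array([10, 12, 9, 14, 11, 13, 8, 15, 10, 12,
                   11, 9, 13, 12, 10, 14, 11, 10, 13, 12])

mu_0 = 10  # null hypothesis: the population mean is 10 hours

x_bar = sample.mean()
s = sample.std(ddof=1)               # sample standard deviation
se = s / np.sqrt(len(sample))        # standard error of the mean
t_stat = (x_bar - mu_0) / se         # distance from H0, measured in standard errors
df = len(sample) - 1                 # degrees of freedom: n − 1

# Two-sided p-value: probability of a result this extreme if H0 were true
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(t_stat, p_value)

# scipy computes the same thing in one call
print(stats.ttest_1samp(sample, mu_0))
```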

TYPE I AND TYPE II ERRORS
• Type I error: convict when innocent
  – Reject the null hypothesis in favor of the alternative when the null hypothesis is true; false positive
• Type II error: acquit when guilty
  – Fail to reject the null hypothesis when the alternative hypothesis is true; false negative
• The goal of the justice system is to minimize Type I errors

BAYESIAN INFERENCE
• Bayesian inference is the consistent use of probability to minimize uncertainty: policymakers must update their assessments of likely outcomes based on what has already occurred (evidence); probabilities are conditional on new information/knowledge coming to light
• Example: the O.J. Simpson trial
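
The updating mechanics can be shown with Bayes' rule. A minimal sketch with hypothetical numbers (a screening-test example, not one from the course):

```python
# Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E)
# Hypothetical: a condition affects 1% of a population; a test catches
# 90% of true cases but also flags 5% of healthy people.
prior = 0.01            # P(H): belief before seeing the evidence
p_e_given_h = 0.90      # P(E|H): chance of a positive test if the condition is present
p_e_given_not_h = 0.05  # P(E|~H): false-positive rate

# Total probability of a positive test, P(E)
p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)

posterior = p_e_given_h * prior / p_e  # updated belief after the evidence
print(round(posterior, 3))  # ~0.154: the evidence raises 1% to about 15%
```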

REGRESSION ANALYSIS
• Regression analysis is a good (though not perfect) alternative to controlled experiments, especially for social scientists
• Regression analysis isolates the effect of a single explanatory variable on an outcome that researchers are trying to explain

REGRESSION ANALYSIS
• Dependent variable: the outcome researchers are trying to explain
• Independent variable: the factor that researchers believe might cause the outcome they're interested in
• Control variable: an explanatory/independent variable that probably affects the outcome of interest but is NOT the factor researchers are most interested in

EXPERIMENTS VS. REGRESSION
• Experiments:
  – Treatment and control groups: only one difference (the treatment) between two groups in an overall sample
  – Treatment is applied randomly to the sample
  – High internal validity: because the treatment is applied randomly, the sample does not need to be representative. (We know the treatment works.)
  – Low external validity: can the results be extrapolated to a larger population? (The treatment works for some people, but will it work for everyone?)

EXPERIMENTS VS. REGRESSION
• Regression:
  – Often, social-scientific experiments are impractical, impossible, or unethical; regression is the next best thing
  – Regression seeks to isolate the effect of a "treatment variable" (what researchers are testing) by controlling for other possible effects
  – "To control for" means to hold the control variable constant; control variables function as the statistical equivalent of control groups in experiments

LINEAR RELATIONSHIPS
• Linear regression is a method for fitting a straight line that "best" expresses the relationship between certain explanatory variables and the dependent variable (the outcome of interest)
• Linear relationships are easier to interpret than nonlinear ones

KEY TERMS: INTERCEPT
• The intercept is where the regression line crosses the y-axis; practically, the intercept can mean the "starting point" or baseline for a relationship
• E.g., height and weight: the intercept is the baseline weight from which we begin to assess the relationship (how height contributes to weight)

KEY TERMS: SLOPE
• Tells us how Y changes if you change X by one unit
• Represents the steepness of the line, indicating the strength of the relationship: is there a strong or weak relationship between X and Y?
• A slope of 0 means there is no relationship between X and Y
• Equation of a line: y = a + βx + ε
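
A minimal sketch of reading a fitted slope and intercept, using made-up height/weight numbers:

```python
import numpy as np

# Hypothetical data: height in inches, weight in pounds
height = np.array([62, 64, 66, 68, 70, 72, 74])
weight = np.array([120, 130, 138, 148, 158, 165, 176])

# Fit y = a + b*x by least squares; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(height, weight, 1)

print(f"intercept a = {intercept:.1f}")  # baseline weight (extrapolated to height 0)
print(f"slope b = {slope:.1f}")          # extra pounds per additional inch of height
```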

KEY TERMS: RESIDUALS
• Error terms, or residuals, are the vertical distances between the OLS regression line and the individual data points
• Error terms represent what the regression line does not explain
• Example: height explains some of weight, but not all; the error term represents the other factors that contribute to weight
• Example: party affiliation is a great predictor of vote choice, but other factors matter too


ORDINARY LEAST SQUARES (OLS)
• A linear relationship, expressed by an intercept, a slope, and an error term (y = a + βx + ε), means fitting the best line to the data. What do we mean by "best"?
• OLS minimizes the sum of the squared vertical distances from the data points to the regression line
• Why the squared distances? Squaring keeps positive and negative errors from canceling out and penalizes large misses more heavily (see the sketch below)
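
A minimal sketch, with made-up numbers, of the closed-form OLS solution and the quantity it minimizes:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical observations

# OLS slope: covariance of x and y divided by the variance of x
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()          # the line passes through (x̄, ȳ)

residuals = y - (alpha + beta * x)          # vertical distances to the line
print(beta, alpha, np.sum(residuals ** 2))  # the minimized sum of squares

# Any other line has a larger sum of squared residuals:
worse = y - ((alpha + 0.5) + beta * x)
assert np.sum(worse ** 2) > np.sum(residuals ** 2)
```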


LINEAR REGRESSION
• Linear regression can be bivariate (one independent variable) or multivariate (one explanatory variable and many control variables)
• Variables can be continuous or binary (dummy variables)
• y = a + β₁x₁ + β₂x₂ + β₃x₃ + … + βᵢxᵢ + ε
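
A minimal sketch of a multivariate fit: the variables (including a dummy) are stacked into a design matrix and solved by least squares. All data and coefficient values here are hypothetical.

```python
import numpy as np

n = 200
rng = np.random.default_rng(0)

x1 = rng.uniform(0, 20, n)        # continuous variable
x2 = rng.uniform(20, 60, n)       # continuous variable
x3 = rng.integers(0, 2, n)        # dummy variable (binary: 0 or 1)

# Hypothetical "true" relationship plus noise (the error term ε)
y = 140 + 1.5 * x1 + 0.3 * x2 - 10 * x3 + rng.normal(0, 5, n)

# Design matrix: a column of 1s for the intercept, then each variable
X = np.column_stack([np.ones(n), x1, x2, x3])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coefs)  # ≈ [140, 1.5, 0.3, -10]: the intercept a and slopes β₁, β₂, β₃
```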

LINEAR REGRESSION
• Example: a health policy analyst is trying to assess the relationship between the number of sodas consumed per week and weight, controlling for other factors that might affect weight: age, hours exercised per week, and sex (where 1 = female and 0 = male)
• What does this equation mean?
• y = a + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + ε, where y = weight, x₁ = sodas per week, x₂ = age, x₃ = hours exercised, x₄ = sex

LINEAR REGRESSION
• y = a + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + ε
• When looking at a regression equation, what do we care about?
  – Size: how big is the effect?
  – Direction: is the relationship positive or negative?
  – Significance: is this relationship true for the larger population of interest?
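
As a sketch of size, direction, and significance in practice, here is the soda-and-weight example with simulated data; every coefficient value below is hypothetical (statsmodels assumed available):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300

sodas = rng.uniform(0, 15, n)      # x1: sodas per week
age = rng.uniform(18, 70, n)       # x2: age in years
exercise = rng.uniform(0, 10, n)   # x3: hours exercised per week
female = rng.integers(0, 2, n)     # x4: 1 = female, 0 = male

# Hypothetical true effects: more soda -> heavier, more exercise -> lighter
weight = (150 + 1.2 * sodas + 0.1 * age - 3.2 * exercise
          - 20 * female + rng.normal(0, 8, n))

X = sm.add_constant(np.column_stack([sodas, age, exercise, female]))
model = sm.OLS(weight, X).fit()
print(model.params)   # size and direction of each effect
print(model.pvalues)  # significance of each effect
```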

SIGNIFICANCE
• If a relationship is statistically significant, there is only a very small chance that (a) the results we see (from a sample) are merely the result of chance/randomness, and (b) the relationship does not hold for the larger population of interest

SIGNIFICANCE
• Standard significance levels: .10, .05, and .01
• These levels are arbitrary, but the smaller, the better
• Hypothesis testing on slope coefficients:
  – H₀: β₁ = 0 (there is no relationship between X and Y)
  – H₁: β₁ > 0, β₁ < 0, or β₁ ≠ 0 (there is a relationship between X and Y)

SIGNIFICANCE
• We compute a test statistic just as we did for sample-means testing, based on:
  – The standard error of β₁ (what dispersion, or spread, would we see if we performed the same regression on multiple samples? How much variation would there be across our estimates of β₁?)

SIGNIFICANCE
• How big is the coefficient β₁ relative to its standard error? The test statistic quantifies this ratio: t = β₁ / SE(β₁)
• A p-value is associated with the test statistic, just as in sample-means testing, based on the number of degrees of freedom
• P-value = the probability of getting the observed value of β₁ if the null hypothesis were true
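
A minimal sketch of that ratio computed by hand for a bivariate regression with made-up data; the p-value comes from a t distribution with n − 2 degrees of freedom:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 50
x = rng.uniform(0, 10, n)
y = 3.0 + 0.8 * x + rng.normal(0, 2, n)   # hypothetical data with true slope 0.8

sxx = np.sum((x - x.mean()) ** 2)
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
beta0 = y.mean() - beta1 * x.mean()

resid = y - (beta0 + beta1 * x)
s2 = np.sum(resid ** 2) / (n - 2)       # residual variance (two parameters estimated)
se_beta1 = np.sqrt(s2 / sxx)            # standard error of the slope

t_stat = beta1 / se_beta1               # test statistic for H0: β1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(beta1, se_beta1, t_stat, p_value)
```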

LINEAR TRANSFORMATIONS
• Because linear relationships are easy to interpret, we may want to transform exponential relationships into linear terms using the common logarithm (base 10)
• Example: say you want to explain income
  – $100,000 can be transformed into 5, because log₁₀ 100,000 = 5
  – $101,000 can be transformed into 5.004, because log₁₀ 101,000 ≈ 5.004
  – $102,000 can be transformed into 5.008, because log₁₀ 102,000 ≈ 5.008, and so on in a linear fashion; the increase in log(y) for every one-unit change in x is constant
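
A one-line check of those transformations (numpy assumed available):

```python
import numpy as np

incomes = np.array([100_000, 101_000, 102_000])
print(np.log10(incomes).round(4))  # [5.     5.0043 5.0086]
```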

OLS ASSUMPTIONS
• In order to perform OLS regression, we must make assumptions about what the error terms (extraneous, non-explanatory factors) look like
• Linear regression assumes:
  1. The expected value (mean) of the error terms is equal to zero
  2. Error terms are independent of one another and of the independent variables
  3. Error terms are distributed around the regression line with the same variance
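
A minimal sketch, with simulated data, of why assumption checks focus on the unobserved errors ε rather than the fitted residuals: OLS residuals satisfy some of these properties by construction.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 100)
y = 1.0 + 2.0 * x + rng.normal(0, 1, 100)   # hypothetical data meeting the assumptions

slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

# By construction, OLS residuals average to zero and are uncorrelated
# with the explanatory variable; the assumptions concern the true errors ε
print(resid.mean())                  # ~0 (up to floating-point error)
print(np.corrcoef(x, resid)[0, 1])   # ~0
```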

OLS ASSUMPTIONS
1. The expected value (mean) of the error terms = 0

OLS ASSUMPTIONS
2. Error terms are independent of one another

OLS ASSUMPTIONS
3. Error terms are distributed around the regression line with the same variance

OTHER PROBLEMS
• Correlation ≠ causation
• Endogeneity (internal) problems: which direction does the causal arrow run? Does X explain Y, or does Y explain X?
• Example: do higher rates of adult literacy lead to greater economic output, or does greater economic output lead to higher rates of adult literacy?

OTHER PROBLEMS
• Omitted variable bias: is a significant variable masking some other variable and obscuring its effect?
• r² versus significance: r² tells us how much of the variation in Y can be explained by all the independent variables in the regression. But relationships can still be significant even if they explain only a portion of the overall variation in the outcome
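
A minimal sketch of omitted variable bias with simulated data: leaving out a correlated control inflates the coefficient on the variable of interest.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000
x1 = rng.normal(0, 1, n)
x2 = 0.8 * x1 + rng.normal(0, 0.6, n)   # omitted variable, correlated with x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(0, 1, n)

# Omitting x2 biases the estimated effect of x1 upward (≈ 1 + 0.8 = 1.8)
print(np.polyfit(x1, y, 1)[0])

# Including both recovers the true effects (≈ 1 and ≈ 1)
X = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.lstsq(X, y, rcond=None)[0][1:])
```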

OTHER PROBLEMS
• Multicollinearity: when two variables are highly correlated with one another
• Why is this a problem?
• If two variables are closely related and both are included in a regression, it becomes difficult to isolate the effects of only one of those variables (thus, both variables might appear insignificant)
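
A minimal sketch of the problem with simulated data: two nearly identical predictors can each look insignificant when included together, even though either one alone is clearly significant (statsmodels assumed available).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.05, n)   # x2 is almost a copy of x1
y = 2 * x1 + rng.normal(0, 1, n)

both = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
alone = sm.OLS(y, sm.add_constant(x1)).fit()

print(both.pvalues[1:])   # often large: hard to attribute the effect to either variable
print(alone.pvalues[1])   # small: x1 alone is clearly significant
```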

OTHER PROBLEMS
• Data mining: including too many independent variables in a regression
• Why is this a problem?
• If you include too many variables, your r² will be high and your model may have more predictive power, but you might obscure the significance of variables that actually do have important effects on the outcome
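
A minimal sketch with simulated data: adding columns of pure noise still pushes r² upward, even though the noise explains nothing real.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 60
x = rng.uniform(0, 10, n)
y = 2 * x + rng.normal(0, 5, n)

X = sm.add_constant(x)
print(sm.OLS(y, X).fit().rsquared)

# Add 30 columns of random noise as extra "independent variables"
junk = rng.normal(0, 1, (n, 30))
X_big = np.column_stack([X, junk])
print(sm.OLS(y, X_big).fit().rsquared)  # higher, despite the junk variables
```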