Published by Christal Carpenter. Modified over 8 years ago.
1
How to Interpret Regression Estimation Results
2
Class sizes and test scores
Empirical problem: class size and educational output.
Policy question: What is the effect of reducing class size by one student per class? By eight students per class?
3
What do the data say about class sizes and test scores?
The California Test Score Data Set: all K-6 and K-8 California school districts (n = 420).
Variables:
5th-grade test scores (Stanford-9 achievement test, combined math and reading), district average.
Student-teacher ratio (STR) = number of students in the district divided by the number of full-time-equivalent teachers.
4
Question: Do districts with smaller classes (lower STR) have higher test scores? And by how much?
5
The class size/test score policy question: What is the effect on test scores of reducing STR by one student per teacher? Object of policy interest: $\Delta\text{TestScore} / \Delta\text{STR}$. This is the slope of the line relating test score and STR.
6
This suggests that we want to draw a line through the scatterplot of Test Score vs. STR, but how?
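One standard way to draw such a line is ordinary least squares (OLS), which picks the intercept and slope that minimize the sum of squared vertical distances from the points to the line. The sketch below is only an illustration: the data are synthetic stand-ins generated inside the script (not the California data set), and the function name `ols_line` and the variable names are assumptions made for this example.

```python
import numpy as np

def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()
    # Slope = sample covariance of (x, y) divided by sample variance of x.
    slope = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
    # The fitted line always passes through the point of means.
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

# Synthetic stand-in data (NOT the California data set).
rng = np.random.default_rng(0)
str_ratio = rng.uniform(14.0, 26.0, size=420)
score = 700.0 - 2.0 * str_ratio + rng.normal(0.0, 15.0, size=420)

b0, b1 = ols_line(str_ratio, score)
print(f"intercept = {b0:.1f}, slope = {b1:.2f}")
```

The slope is the quantity of policy interest: the estimated change in test score associated with a one-unit change in STR.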
7
Interpretation of the estimated slope and intercept
The estimated line is $\widehat{\text{TestScore}} = 698.9 - 2.28 \times \text{STR}$.
Districts with one more student per teacher have, on average, test scores that are 2.28 points lower. That is, $\hat{\beta}_1 = -2.28$.
The intercept (taken literally) means that, according to this estimated line, districts with zero students per teacher would have a predicted test score of 698.9.
This interpretation of the intercept makes no sense: it extrapolates the line outside the range of the data. In this application, the intercept is not itself economically meaningful.
8
Predicted values and residuals: One of the districts in the data set is Antelope, CA, for which STR = 19.33 and Test Score = 657.8.
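As a worked illustration, the predicted value and residual for Antelope can be computed from the slope (-2.28) and intercept (698.9) quoted on the previous slide. The helper name `predict` is an assumption made for this sketch.

```python
# Estimated line from the slides: TestScore_hat = 698.9 - 2.28 * STR
INTERCEPT = 698.9
SLOPE = -2.28

def predict(str_ratio):
    """Predicted test score for a district with the given student-teacher ratio."""
    return INTERCEPT + SLOPE * str_ratio

antelope_str = 19.33
antelope_score = 657.8

predicted = predict(antelope_str)
# Residual = actual value minus predicted value.
residual = antelope_score - predicted

print(f"predicted = {predicted:.1f}, residual = {residual:.1f}")
# prints: predicted = 654.8, residual = 3.0
```

The positive residual says that Antelope scored about 3 points higher than the line predicts for a district with its STR.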
9
The initial policy question: Suppose new teachers are hired so that the student-teacher ratio falls by one student per class. What is the effect of this policy intervention (this “treatment”) on test scores? Does our regression analysis give a convincing answer? Not really. Districts with low STR tend to be ones with lots of other resources and higher-income families, which provide kids with more learning opportunities outside school. Those omitted advantages sit in the error term and are larger where STR is lower, which suggests that $\mathrm{corr}(u_i, STR_i) < 0$, so $E(u_i \mid X_i) \neq 0$.
10
Digression on Causality
The original question (what is the quantitative effect of an intervention that reduces class size?) is a question about a causal effect: the effect on Y of applying a unit of the treatment is $\beta_1$.
But what, precisely, is a causal effect? The common-sense definition of causality isn’t precise enough for our purposes. In this course, we define a causal effect as the effect that is measured in an ideal randomized controlled experiment.
11
Ideal Randomized Controlled Experiment
Ideal: subjects all follow the treatment protocol (perfect compliance, no errors in reporting, etc.).
Randomized: subjects from the population of interest are randomly assigned to a treatment or control group (so there are no confounding factors).
Controlled: having a control group permits measuring the differential effect of the treatment.
Experiment: the treatment is assigned as part of the experiment. The subjects have no choice, which means that there is no “reverse causality” in which subjects choose the treatment they think will work best.
12
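Controlled experiment
Example (Harris et al.): “A randomized controlled experiment: understanding the treatment outcomes of heart-disease patients who receive remote prayer”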
13
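Harris et al. experimental design principles: “randomized, controlled, double-blind, prospective, parallel-group experiment.”
Randomized: patients were randomly assigned to receive prayer or not.
Controlled: some patients received no prayer.
Double-blind: neither the patients nor the physicians knew who was in the treatment group or the control group.
Prospective: patients were randomly assigned before treatment.
Parallel: the groups were studied at the same time.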
14
The design of Harris et al.
16
The conclusion of Harris et al.: “Conclusion: remote prayer is effective.”
17
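Omitted variable bias
Suppose the true model is $Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i$, with $Cov(X_i, u_i) = 0$.
The estimated model is $Y_i = \beta_0 + \beta_1 X_i + v_i$, where the error term $v_i = \beta_2 Z_i + u_i$ absorbs the omitted variable.
The covariance between $X_i$ and the error term is $Cov(X_i, v_i) = Cov(X_i, \beta_2 Z_i + u_i) = \beta_2 \, Cov(X_i, Z_i)$.
18
Therefore, $\hat{\beta}_1 \xrightarrow{p} \beta_1 + \beta_2 \dfrac{Cov(X, Z)}{Var(X)}$.
So omitted variable bias disappears in only two cases: either $\beta_2$ equals zero (the simple OLS case), or $Cov(X, Z) = 0$.
If data from a randomized controlled experiment are available, $Cov(X, Z) = 0$ (X is randomized) and hence there is no omitted variable bias.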
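The two cases can be checked with a small simulation. The sketch below assumes the standard setup in which the true model is Y = β0 + β1·X + β2·Z + u and Z is left out of the regression, so that the OLS slope on X converges to β1 + β2·Cov(X, Z)/Var(X). All numbers (the coefficients, the sample size, the strength of the link between X and Z) are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta1, beta2 = -1.0, 2.0  # hypothetical true coefficients

def short_regression_slope(x, y):
    """OLS slope from regressing y on x alone (Z omitted)."""
    x_dev = x - x.mean()
    return np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)

z = rng.normal(size=n)

# Case 1: observational data, X is correlated with the omitted Z.
# In the population, Cov(X, Z) = 0.5 and Var(X) = 1.25, so the bias is 2 * 0.5 / 1.25 = 0.8.
x_obs = 0.5 * z + rng.normal(size=n)
y_obs = beta1 * x_obs + beta2 * z + rng.normal(size=n)
slope_obs = short_regression_slope(x_obs, y_obs)
predicted_bias = beta2 * np.cov(x_obs, z)[0, 1] / np.var(x_obs, ddof=1)

# Case 2: X is randomly assigned, independent of Z, so Cov(X, Z) = 0.
x_rand = rng.normal(size=n)
y_rand = beta1 * x_rand + beta2 * z + rng.normal(size=n)
slope_rand = short_regression_slope(x_rand, y_rand)

print(f"observational slope: {slope_obs:.3f} (true beta1 = {beta1})")
print(f"beta1 + predicted bias: {beta1 + predicted_bias:.3f}")
print(f"randomized slope: {slope_rand:.3f}")
```

With X correlated with Z, the estimated slope sits near β1 plus the bias term rather than near β1. With X randomized, the same short regression recovers β1 up to sampling noise.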
19
What is an ideal randomized controlled experiment for measuring the effect on Test Score of reducing STR? How does our regression analysis of observational data differ from this ideal?
The treatment is not randomly assigned.
In the US (in our observational data), districts with higher family incomes are likely to have both smaller classes and higher test scores.
As a result, it is plausible that $E(u_i \mid X_i = x) \neq 0$. If so, Least Squares Assumption #1 does not hold.
If so, $\hat{\beta}_1$ is biased: does an omitted factor make class size seem more important than it really is?
20
Common regression model assumptions
21
Multiple regression has some key virtues:
It provides an estimate of the effect on Y of arbitrary changes $\Delta X$.
It resolves the problem of omitted variable bias, if an omitted variable can be measured and included.
It can handle nonlinear relations (effects that vary with the X’s).
Still, multiple regression might yield a biased estimator of the true causal effect, so it might not yield “valid” inferences.
22
How should one conduct multiple regression? A general approach to variable selection and “model specification”
Specify a “base” or “benchmark” model.
Specify a range of plausible alternative models, which include additional candidate variables.
Does a candidate variable change the coefficient of interest ($\beta_1$)?
Is a candidate variable statistically significant?
Use judgment, not a mechanical recipe.
And don’t just maximize $R^2$.
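The check "does a candidate variable change the coefficient of interest?" can be sketched by fitting a base model and an alternative model and comparing the coefficient on the regressor of interest. The example below uses synthetic data generated inside the script (not the California data set). The variable names `str_ratio` and `pct_el`, the helper `ols_coefficients`, and all coefficient values are assumptions made for illustration.

```python
import numpy as np

def ols_coefficients(y, *regressors):
    """OLS coefficients, intercept first, then one per regressor in order."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic data: the candidate variable is correlated with STR
# and also affects the outcome directly.
rng = np.random.default_rng(2)
n = 420
pct_el = rng.uniform(0.0, 80.0, n)
str_ratio = 18.0 + 0.03 * pct_el + rng.normal(0.0, 1.5, n)
score = 690.0 - 1.0 * str_ratio - 0.6 * pct_el + rng.normal(0.0, 10.0, n)

base = ols_coefficients(score, str_ratio)          # base specification
alt = ols_coefficients(score, str_ratio, pct_el)   # adds the candidate variable

print(f"STR coefficient, base model:        {base[1]:.2f}")
print(f"STR coefficient, alternative model: {alt[1]:.2f}")
print(f"candidate coefficient:              {alt[2]:.2f}")
```

In this synthetic setup the STR coefficient moves substantially once the candidate variable is included, which is the pattern that signals omitted variable bias in the base model. A full comparison would also report standard errors, which this sketch leaves out.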
23
Variables we would like to see in the California data set
School characteristics: student-teacher ratio; teacher quality; computers (non-teaching resources) per student; measures of curriculum design.
Student characteristics: English proficiency; availability of extracurricular enrichment; home learning environment; parents’ education level.
24
Variables actually in the California class size data set
student-teacher ratio (STR)
percent English learners in the district (PctEL)
percent eligible for subsidized/free lunch
percent on public income assistance
average district income
25
A look at more of the California data
26
Digression: presentation of regression results in a table
Listing regressions in “equation” form can be cumbersome with many regressors and many regressions. Tables of regression results can present the key information compactly.
27
Information to include:
variables in the regression (dependent and independent)
estimated coefficients
standard errors
results of F-tests of joint hypotheses
some measure of fit (adjusted $R^2$)
number of observations
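Most of these ingredients can be computed directly from an OLS fit. The sketch below is a minimal illustration under stated assumptions: the data are synthetic, the function name `regression_summary` is hypothetical, and the standard errors are the homoskedasticity-only ones, chosen here for brevity. F-tests of joint hypotheses are left out.

```python
import numpy as np

def regression_summary(y, X, names):
    """OLS summary for a table row; X must already contain a column of ones."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ssr = float(resid @ resid)                 # sum of squared residuals
    tss = float(np.sum((y - y.mean()) ** 2))   # total sum of squares
    # Homoskedasticity-only standard errors (a simplifying assumption).
    cov = (ssr / (n - k)) * np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(cov))
    # Adjusted R^2 penalizes additional regressors through the degrees of freedom.
    adj_r2 = 1.0 - (ssr / (n - k)) / (tss / (n - 1))
    return {"names": names, "coef": coef, "se": se, "adj_r2": adj_r2, "n": n}

# Synthetic data (NOT the California data set).
rng = np.random.default_rng(3)
n = 420
x1 = rng.normal(20.0, 2.0, n)
x2 = rng.uniform(0.0, 80.0, n)
y = 690.0 - 1.0 * x1 - 0.6 * x2 + rng.normal(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x1, x2])

summary = regression_summary(y, X, ["const", "x1", "x2"])
for name, b, s in zip(summary["names"], summary["coef"], summary["se"]):
    print(f"{name:>6}  {b:9.3f}  ({s:.3f})")
print(f"adj. R^2 = {summary['adj_r2']:.3f}, n = {summary['n']}")
```

Printing each coefficient with its standard error in parentheses underneath or beside it mirrors the usual layout of a regression table, with one column per specification.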
29
Summary: Multiple Regression
Multiple regression allows you to estimate the effect on Y of a change in $X_1$, holding $X_2$ constant.
If you can measure a variable, you can avoid omitted variable bias from that variable by including it.
There is no simple recipe for deciding which variables belong in a regression; you must exercise judgment.
One approach is to specify a base model, relying on a priori reasoning, and then explore the sensitivity of the key estimate(s) in alternative specifications.