Published by Stuart Carpenter. Modified over 8 years ago.
1
Before the class starts:
- Log in to a computer
- Read Data analysis assignment 1 on MyCourses
- If you use Stata:
  - Start Stata
  - Start a new do-file
  - Open the PDF documentation about regression
- If you use RStudio:
  - Start RStudio
  - Start a new R script
  - Open R in Action, chapter 8
2
Linear model and its estimator(s)
3
Regression analysis
4
Idea of regression analysis
Concepts:
- Dependent variable (explained variable, response variable, predicted variable, regressand)
- Independent variable (explanatory variable, control variable, predictor variable, regressor)
Objective: Explain the variation in the dependent variable by using the variation in the independent variables
For example: Explain patient satisfaction with physician productivity, physician quality, and physician accessibility
5
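Model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
Example
$\text{Patient satisfaction} = \beta_0 + \beta_1\,\text{physician productivity} + \beta_2\,\text{physician quality} + \beta_3\,\text{physician accessibility} + u$
[Figure: diagram of the model showing $x_1, x_2, \dots, x_k$, $y$, and $u$ with the coefficients $\beta_0, \beta_1, \beta_2, \dots, \beta_k$]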
6
Graphical illustration of linear regression
- One dependent and one or more independent variables
- Explains the conditional mean of the dependent variable
- The dependent variable should be normally distributed around its conditional mean
- The variance (width) of the dependent variable should not depend on the independent variables
Wooldridge, J. M. (2009). Introductory econometrics: a modern approach (4th ed.). Mason, OH: South Western, Cengage Learning. (p. 26)
7
Interpretation of the model
$\text{Patient satisfaction} = \beta_0 + \beta_1\,\text{physician productivity} + \beta_2\,\text{physician quality} + \beta_3\,\text{physician accessibility} + u$
Ceteris paribus (holding the other variables constant), a one-unit increase in physician productivity is associated with a $\beta_1$-unit increase in patient satisfaction
8
Goodness of fit: $R^2$ and adjusted $R^2$
$R^2$
- The proportion of variance explained
- “Coefficient of determination”
- Positively biased; it can only go up when variables are added to the model
Adjusted $R^2$
- Penalizes for a large number of variables and a small sample size
- Not unbiased either
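As a rough illustration of how the two measures relate, the sketch below recomputes them from sums of squares. The input numbers are read from the Stata output on slide 20 (residual sum of squares, total mean square, 102 observations, 3 predictors). The function names are made up for this example and are not part of Stata or R.

```python
def r_squared(ss_residual, ss_total):
    """Proportion of the variance in y explained by the model."""
    return 1.0 - ss_residual / ss_total


def adjusted_r_squared(r2, n_obs, n_predictors):
    """R-squared penalized for many predictors relative to the sample size."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values read from the Stata output on slide 20 (income regressed on 3 predictors).
n_obs, n_predictors = 102, 3
ss_residual = 649654273.0
ss_total = 18027855.6 * (n_obs - 1)  # total mean square times its degrees of freedom

r2 = r_squared(ss_residual, ss_total)
adj_r2 = adjusted_r_squared(r2, n_obs, n_predictors)
print(f"R-squared = {r2:.4f}, adjusted R-squared = {adj_r2:.4f}")
```

The penalty factor (n - 1) / (n - k - 1) is always above 1 when there is at least one predictor, so adjusted $R^2$ sits below $R^2$ and the gap widens as predictors are added or the sample shrinks.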
9
Example data
10
Estimation
Model: $\text{prestige} = \beta_0 + \beta_1\,\text{education} + u$
- The estimates of $\beta_0$ and $\beta_1$ define the regression line
- The rule that is used to obtain estimates from the data is called an estimator
11
Estimation
Model: $\text{prestige} = \beta_0 + \beta_1\,\text{education} + u$
Properties of a good estimator of $\beta_0$ and $\beta_1$:
- Estimates using population data equal population values (consistency)
- Estimates are correct on average (unbiasedness)
- Variance of the estimates is smaller than variance of estimates from alternative estimators (efficiency)
- Estimates are normally distributed, or at least have a known distribution (normality)
12
Estimation
Model: $\text{prestige} = \beta_0 + \beta_1\,\text{education} + u$
- One good rule: “Choose $\beta_0$ and $\beta_1$ so that the sum of squared residuals is as small as possible”
- This is known as the ordinary least squares (OLS) estimator
- A linear model estimated with OLS is known as OLS regression
- A residual is the difference between the observed value and the fitted value: the part of the data not explained by the model
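The least squares rule can be sketched for the one-predictor case, where the minimizing values have a closed form (slope = covariation of x and y divided by the variation of x). This is a minimal illustration only: the data are hypothetical toy numbers, not the Prestige dataset, and the function names are invented for the example.

```python
def ols_simple(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals for y = b0 + b1*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared differences between observed and fitted values."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data, NOT the Prestige dataset used in class.
education = [8.0, 10.0, 12.0, 14.0, 16.0]
prestige = [30.0, 38.0, 50.0, 58.0, 74.0]

b0, b1 = ols_simple(education, prestige)
fitted = [b0 + b1 * xi for xi in education]
residuals = [yi - fi for yi, fi in zip(prestige, fitted)]
ssr = sum_squared_residuals(education, prestige, b0, b1)

print(f"intercept = {b0:.2f}, slope = {b1:.2f}, SSR = {ssr:.2f}")
```

Nudging either estimate away from the returned values and calling `sum_squared_residuals` again gives a larger sum, which is what "as small as possible" means in the rule above.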
13
Summary of the assumptions
1. All relationships are linear
2. Independence of observations
3. No perfect collinearity and non-zero variances of independent variables
4. Error term has expected value of zero given any values of independent variables
5. Error term has equal variance given any values of independent variables
6. Error term is normally distributed
Important to check after estimation (post-estimation diagnostics)
14
Regression with Excel
15
Data analysis assignment 1
16
Task
Do a regression analysis with the statistical software of your choice, using the Prestige dataset used in the class. Try to explain income with the other variables. You should first explain income itself and then, if you see it as necessary, explain the logarithm of income. The part about logarithm transformation in Wooldridge's book is really worth reading. Document your thought process: how you explored the data, how you checked the assumptions, and how the model evolved.
17
How to get your analysis file started
Stata:
- Load the data following the instructions
- Explore the data using e.g. describe, summarize, inspect, codebook, graph matrix, and stem
RStudio:
- Load the data following the instructions
- Load the psych, car, and texreg packages by adding library commands to the start of the R file (if a package is not found, you need to install it)
- Explore the data using e.g. describe, lowerCor, corr.test, and scatterplotMatrix
18
How to submit your answer
Stata:
- Set your working directory
- Start your do-file with: log using assignment1, replace text
- End your do-file with: log close
- After each graph add: graph export plotX.pdf
- Open the Word document template from MyCourses
- Copy-paste the content of assignment1.log to the document template and insert the exported figures in the right places
- In Word, write comments in normal style and use headings where appropriate
RStudio:
- Compile a notebook in MS Word format
- In Word, write comments in normal style and use headings where appropriate
19
Regress income on prestige, education, and share of women
Stata:
  regress income prestige educat percwomn
  estimates store m1
RStudio:
  m1 <- lm(income ~ education + prestige + women, data = Prestige)
  summary(m1)
20
Stata

      Source |       SS       df       MS              Number of obs =     102
-------------+------------------------------           F(  3,    98) =   58.89
       Model |  1.1712e+09     3   390386379           Prob > F      =  0.0000
    Residual |   649654273    98  6629125.24           R-squared     =  0.6432
-------------+------------------------------           Adj R-squared =  0.6323
       Total |  1.8208e+09   101  18027855.6           Root MSE      =  2574.7

------------------------------------------------------------------------------
      income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    prestige |   141.4353   29.90961     4.73   0.000      82.0807      200.79
      educat |   177.1991   187.6323     0.94   0.347    -195.1511    549.5492
    percwomn |   -50.8957   8.556185    -5.95   0.000    -67.87517   -33.91623
       _cons |  -253.8499   1086.157    -0.23   0.816    -2409.293    1901.593
------------------------------------------------------------------------------

RStudio

Call:
lm(formula = income ~ education + prestige + women, data = Prestige)

Residuals:
    Min      1Q  Median      3Q     Max
-7715.3  -929.7  -231.2   689.7 14391.8

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -253.850   1086.157  -0.234    0.816
education    177.199    187.632   0.944    0.347
prestige     141.435     29.910   4.729 7.58e-06 ***
women        -50.896      8.556  -5.948 4.19e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2575 on 98 degrees of freedom
Multiple R-squared:  0.6432,  Adjusted R-squared:  0.6323
F-statistic: 58.89 on 3 and 98 DF,  p-value: < 2.2e-16
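To make the columns of the coefficient table less of a black box, the sketch below recomputes the t statistics and 95% confidence intervals from the reported coefficients and standard errors. The critical value 1.9845 is an assumption: it approximates the 97.5th percentile of the t distribution with 98 degrees of freedom, so the interval bounds agree with the software output only up to rounding.

```python
# Coefficients and standard errors copied from the regression output above.
estimates = {
    "prestige": (141.4353, 29.90961),
    "educat": (177.1991, 187.6323),
    "percwomn": (-50.8957, 8.556185),
    "_cons": (-253.8499, 1086.157),
}

# Assumption: approximate 97.5th percentile of the t distribution with
# 98 degrees of freedom (102 observations minus 4 estimated coefficients).
T_CRITICAL = 1.9845


def t_and_interval(coef, std_err, t_critical=T_CRITICAL):
    """Return the t statistic and the bounds of the confidence interval."""
    t_stat = coef / std_err
    half_width = t_critical * std_err
    return t_stat, coef - half_width, coef + half_width


results = {name: t_and_interval(c, se) for name, (c, se) in estimates.items()}
for name, (t_stat, low, high) in results.items():
    print(f"{name:>9}: t = {t_stat:6.2f}, 95% CI = [{low:10.2f}, {high:10.2f}]")
```

An interval that contains zero, as for educat and _cons, goes together with a p-value above 0.05 in the same row of the table.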
21
Extract fitted values and residuals
Stata:
- Use the predict command
- Plot the distributions using the kdensity command
RStudio:
- Use the residuals() and fitted() functions
- Plot the distributions using the plot(density()) combination
22
Diagnose the model using the following list of plots

Plot                                | Stata command                                  | R command
Getting help                        | help regress postestimation diagnostics plots | Chapter 8 of R in Action
Q-Q plot of studentized residuals   | qnorm                                          | qqPlot
Residual-versus-fitted plot         | rvfplot                                        | residualPlot
Component-plus-residual plot        | cprplot                                        | crPlots
Added-variables plots               | avplots                                        | avPlots
Residual-versus-leverage plots      | lvr2plot                                       | influencePlot
23
Modify the model and/or data
Stata:
- Delete outliers with drop
- Apply log transformation of variables
- Repeat the regression model
- Apply diagnostic plots
RStudio:
- Delete outliers with subset
- Apply log transformation of variables
- Repeat the regression model
- Apply diagnostic plots
24
Extract fitted values and residuals
Stata:
- Use the predict command
- Plot the distributions using the kdensity command
RStudio:
- Use the residuals() and fitted() functions
- Plot the distributions using the plot(density()) combination
25
Optional: add categorical variable type
Stata:
- Add i.type to the regression model
RStudio:
- Add type to the regression model
26
Report several models as one table
Stata:
- Use estimates table m1 m2 m3…
RStudio:
- Use screenreg(list(m1, m2, m3, …))
27
Simulation demonstration: heteroskedasticity
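One possible way to set up such a simulation is sketched below; it is not the demonstration shown in class. It assumes a one-predictor model whose error standard deviation grows in proportion to x, with hypothetical "true" values for the coefficients, and checks two things: the residual spread differs between small and large x, and the OLS slope estimates still average out near the true slope.

```python
import random
import statistics

BETA0, BETA1 = 2.0, 1.5  # hypothetical "true" population values


def ols_simple(x, y):
    """Closed-form OLS estimates (b0, b1) for y = b0 + b1*x + u."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def simulate(n, rng):
    """Draw one sample whose error standard deviation grows with x."""
    x = [rng.uniform(1.0, 10.0) for _ in range(n)]
    y = [BETA0 + BETA1 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]
    return x, y


rng = random.Random(42)  # fixed seed so the run is repeatable

# 1) One large sample: compare the residual spread for small and large x.
x, y = simulate(2000, rng)
b0, b1 = ols_simple(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
x_median = statistics.median(x)
sd_low = statistics.pstdev([r for xi, r in zip(x, residuals) if xi <= x_median])
sd_high = statistics.pstdev([r for xi, r in zip(x, residuals) if xi > x_median])

# 2) Many small samples: the slope estimates still average out near BETA1.
slopes = [ols_simple(*simulate(100, rng))[1] for _ in range(500)]
mean_slope = statistics.fmean(slopes)

print(f"residual SD, low x: {sd_low:.2f}; high x: {sd_high:.2f}")
print(f"mean slope over 500 samples: {mean_slope:.3f} (true value {BETA1})")
```

The point of the comparison is that unequal error variance does not by itself bias the coefficient estimates; what it undermines is the usual standard errors, which rely on the equal-variance assumption.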