Before the class starts:
- Log in to a computer
- Read the Data analysis assignment 1 on MyCourses
If you use Stata:
- Start Stata
- Start a new do file
- Open the PDF documentation about regression
If you use RStudio:
- Start RStudio
- Start a new R script
- Open R in Action, chapter 8

Linear model and its estimator(s)

Regression analysis

Idea of regression analysis
Concepts:
- Dependent variable (explained variable, response variable, predicted variable, regressand)
- Independent variable (explanatory variable, control variable, predictor variable, regressor)
Objective: Explain the variation in the dependent variable by using the variation in the independent variables
Example: Explain patient satisfaction with physician productivity, physician quality, and physician accessibility

Model
y = β0 + β1 x1 + β2 x2 + … + βk xk + u
Example: Patient satisfaction = β0 + β1 physician productivity + β2 physician quality + β3 physician accessibility + u
[The slide also shows a path diagram: arrows from x1, x2, …, xk (with weights β1, β2, …, βk), the intercept β0, and the error u pointing into y.]

Graphical illustration of linear regression
- One dependent and one or more independent variables
- Explains the conditional mean of the dependent variable
- The dependent variable should be normally distributed around the mean
- The variance (width) of the dependent variable should not depend on the independent variables
Wooldridge, J. M. (2009). Introductory econometrics: A modern approach (4th ed.). Mason, OH: South Western, Cengage Learning. (p. 26)

Interpretation of the model
Patient satisfaction = β0 + β1 physician productivity + β2 physician quality + β3 physician accessibility + u
Ceteris paribus (holding the other variables constant), a one-unit increase in physician productivity is associated with a β1 increase in patient satisfaction.
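The ceteris paribus interpretation can be made concrete with a small numeric sketch (Python is used here for self-containment; the coefficients are made up, not estimates from any real data): moving one regressor up by one unit while holding the others fixed changes the prediction by exactly that regressor's coefficient.

```python
# Hypothetical coefficients for illustration only (not estimated from data).
b0, b1, b2, b3 = 1.0, 0.4, 0.8, 0.3

def satisfaction(productivity, quality, accessibility):
    # Linear model: b0 + b1*productivity + b2*quality + b3*accessibility
    return b0 + b1 * productivity + b2 * quality + b3 * accessibility

# Raise productivity by one unit, holding quality and accessibility constant.
diff = satisfaction(6.0, 3.0, 2.0) - satisfaction(5.0, 3.0, 2.0)
# diff equals b1: the ceteris paribus effect of productivity.
```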

Goodness of fit: R² and adjusted R²
R²
- the proportion of variance explained
- the "coefficient of determination"
- positively biased: adding variables can only make it go up
Adjusted R²
- penalizes for a large number of variables and a small sample size
- not unbiased either
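Both measures can be computed by hand from the sums of squares. A minimal Python sketch with made-up data and fitted values (the numbers are illustrative only); it shows that the adjusted version is always at most the unadjusted one:

```python
# Toy data: observed y and hypothetical fitted values from some model.
y = [4.0, 6.0, 7.0, 9.0, 11.0]
fitted = [4.2, 5.9, 7.3, 8.8, 10.8]
n = len(y)
k = 1  # number of regressors, excluding the intercept

y_bar = sum(y) / n
ss_total = sum((yi - y_bar) ** 2 for yi in y)                     # total variation
ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))       # unexplained variation

r2 = 1 - ss_resid / ss_total
# Adjusted R^2 divides each sum of squares by its degrees of freedom,
# which penalizes models with many regressors relative to the sample size.
adj_r2 = 1 - (ss_resid / (n - k - 1)) / (ss_total / (n - 1))
```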

Example data

Estimation
Model: prestige = β0 + β1 education + u
The estimates of β0 and β1 define the regression line. The rule that is used to obtain estimates given the data is called an estimator.

Estimation
Model: prestige = β0 + β1 education + u
Properties of a good estimator of β0 and β1:
- Estimates converge to the population values as the sample size grows (consistency)
- Estimates are correct on average (unbiasedness)
- Variance of the estimates is smaller than the variance of estimates from alternative estimators (efficiency)
- Estimates are normally distributed, or at least have a known distribution (normality)
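Unbiasedness ("correct on average") can be checked by simulation: draw many samples from a known model, estimate the slope in each, and average the estimates. A minimal Python sketch, with an assumed true model y = 2 + 3x + u (the numbers are arbitrary):

```python
import random

random.seed(1)
true_b0, true_b1 = 2.0, 3.0

def ols_slope(x, y):
    # Closed-form OLS slope for one regressor: Sxy / Sxx.
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Repeatedly sample from the true model and estimate the slope each time.
estimates = []
for _ in range(2000):
    x = [random.uniform(0, 10) for _ in range(50)]
    y = [true_b0 + true_b1 * xi + random.gauss(0, 2) for xi in x]
    estimates.append(ols_slope(x, y))

# Individual estimates vary, but their average is close to the true slope.
mean_slope = sum(estimates) / len(estimates)
```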

Estimation
Model: prestige = β0 + β1 education + u
One good rule: "Choose β0 and β1 so that the sum of squared residuals is as small as possible." This is known as the ordinary least squares (OLS) estimator; a linear model with the OLS estimator is known as OLS regression. A residual is the difference between the fitted value and the observed value: the part of the data not explained by the model.
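For one regressor the least-squares rule has a closed form: b1 = Sxy / Sxx and b0 = ȳ − b1·x̄. A minimal Python sketch on toy data (not the Prestige dataset); note that the residuals of an OLS fit with an intercept always sum to zero:

```python
# Toy data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Closed-form OLS estimates for the simple regression y = b0 + b1*x + u.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # data the model does not explain
```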

Summary of the assumptions
1. All relationships are linear
2. Independence of observations
3. No perfect collinearity and non-zero variances of independent variables
4. Error term has an expected value of zero given any values of the independent variables
5. Error term has equal variance given any values of the independent variables
6. Error term is normally distributed
Important to check after estimation (post-estimation diagnostics)

Regression with Excel

Data analysis assignment 1

Task
Do a regression analysis with a statistical software package of your choice using the Prestige dataset used in the class. Try to explain income with the other variables. You should first explain income itself and then, if you see it necessary, explain the logarithm of income. The part about the logarithm transformation in Wooldridge's book is well worth reading. Document your thought process: how you explored the data, how you checked the assumptions, and how the model evolved.
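One reason the logarithm of income is worth trying: in a model for log(income), a coefficient b on a regressor corresponds to roughly a 100·b percent change in income when b is small. A quick numerical check in Python (b = 0.05 is an arbitrary illustrative value):

```python
import math

b = 0.05  # hypothetical coefficient in a log(income) model

# Exact percent change in income implied by a log change of b ...
exact_pct = (math.exp(b) - 1) * 100
# ... versus the usual "100*b percent" approximation.
approx_pct = b * 100
```

For small b the two agree closely, which is why coefficients in log models are read directly as approximate percentage effects.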

How to get your analysis file started
Stata
- Load the data following the instructions
- Explore the data using e.g. describe, summarize, inspect, codebook, graph matrix, and stem
RStudio
- Load the data following the instructions
- Load the psych, car, and texreg packages by adding library commands to the start of the R file (if a package is not found, you need to install it)
- Explore the data using e.g. describe, lowerCor, corr.test, and scatterplotMatrix

How to submit your answer
Stata
- Set your working directory
- Start your do file with: log using assignment1, replace text
- End your do file with: log close
- After each graph add: graph export plotX.pdf
- Open the Word document template from MyCourses
- Copy-paste the content of assignment1.log into the document template and insert the exported figures into the right places
- In Word, write comments in the normal style and use headings where appropriate
RStudio
- Compile a notebook in MS Word format
- In Word, write comments in the normal style and use headings where appropriate

Regress income on prestige, education, and share of women
Stata
regress income prestige educat percwomn
estimates store m1
RStudio
m1 <- lm(income ~ education + prestige + women, data = Prestige)
summary(m1)

[The slide shows the Stata and RStudio outputs side by side; the numeric values were lost in the transcript.]
Stata: output of regress income prestige educat percwomn — the ANOVA table (Model, Residual, Total), R-squared, adjusted R-squared, Root MSE, and the coefficient table (Coef., Std. Err., t, P>|t|, 95% confidence interval) for prestige, educat, percwomn, and _cons; F(3, 98).
RStudio: output of summary(lm(formula = income ~ education + prestige + women, data = Prestige)) — residual quartiles and the coefficient table; prestige and women are highly significant (***), residual standard error 2575 on 98 degrees of freedom, F-statistic on 3 and 98 DF, p-value < 2.2e-16.

Extract fitted values and residuals
Stata
- Use the predict command
- Plot the distributions using the kdensity command
RStudio
- Use the residuals and fitted functions
- Plot the distributions using the plot(density()) combination

Diagnose the model using the following list of plots

Plot | Stata command | R command
Getting help | help regress postestimation diagnostics plots | Chapter 8 of R in Action
Q-Q plot of studentized residuals | qnorm | qqPlot
Residual-versus-fitted plot | rvfplot | residualPlot
Component-plus-residual plot | cprplot | crPlots
Added-variable plots | avplots | avPlots
Residual-versus-leverage plots | lvr2plot | influencePlot

Modify the model and/or data
Stata
- Delete outliers with drop
- Apply log transformation of variables
- Repeat the regression model
- Apply diagnostic plots
RStudio
- Delete outliers with subset
- Apply log transformation of variables
- Repeat the regression model
- Apply diagnostic plots

Optional: add categorical variable type
Stata
- Add i.type to the regression model
RStudio
- Add type to the regression model

Report several models as one table
Stata
- Use estimates table m1 m2 m3 …
RStudio
- Use screenreg(list(m1, m2, m3, …))

Simulation demonstration: heteroskedasticity
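The transcript does not include the demonstration itself; the following Python sketch shows one possible version. When the error standard deviation grows with x, the equal-variance assumption is violated, and this shows up as residual spread increasing with x (in the residual-versus-fitted plot, a widening "funnel"):

```python
import random

random.seed(42)

# Simulate y = 1 + 2x + u where the error sd grows with x (heteroskedasticity).
x = [random.uniform(0, 10) for _ in range(400)]
y = [1.0 + 2.0 * xi + random.gauss(0, 0.5 + 0.5 * xi) for xi in x]

# Fit OLS by the closed-form simple-regression formulas.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Compare residual variance in the low-x and high-x halves:
# under homoskedasticity these should be about equal.
low = [r for xi, r in zip(x, resid) if xi < 5]
high = [r for xi, r in zip(x, resid) if xi >= 5]
var_low = sum(r * r for r in low) / len(low)
var_high = sum(r * r for r in high) / len(high)
```

The point estimates remain unbiased here, but the usual OLS standard errors are no longer valid, which is why the diagnostic plots above matter.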