Covariance

Covariance [Figure: a point (x, y) plotted on the x and y axes, in the region where x - x̄ > 0 and y - ȳ > 0]

Covariance [Figure: a point (x, y) plotted on the x and y axes, in the region where x - x̄ < 0 and y - ȳ > 0]

Covariance: So what happens on balance?
[Figure: scatter plot divided into four regions by the mean of x and the mean of y]
Below-average values of x with above-average values of y: (x - x̄)(y - ȳ) < 0
Above-average values of x with above-average values of y: (x - x̄)(y - ȳ) > 0
Below-average values of x with below-average values of y: (x - x̄)(y - ȳ) > 0
Above-average values of x with below-average values of y: (x - x̄)(y - ȳ) < 0

Covariance: What happens on balance? Calculate the average of the products of the deviations, (x - x̄)(y - ȳ). [Figure: scatter plot of y against x]
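The sketch below makes "average of the products of the deviations" concrete on made-up data. The values of xs and ys are hypothetical and are not the wage data. It divides by n - 1, the usual sample convention. Dividing by n instead gives the population covariance, and the slide does not say which divisor it uses.

```python
def sample_covariance(xs, ys):
    """Average product of deviations from the means, using the n - 1 divisor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # A product is positive when x and y are on the same side of their means
    # and negative when they are on opposite sides; the sum nets these out.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data (not from the slides): y tends to rise with x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
print(sample_covariance(xs, ys))  # positive, about 2.0
```

A positive result means the same-side products (top-right and bottom-left regions) outweigh the opposite-side ones.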

Covariance Example [Figure: scatter plot of Wage and Aptitude] S_xy = 1.999

Correlation [Figure: scatter plot of Wage and Aptitude] r_xy = 0.476
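Correlation rescales covariance by the two standard deviations, r_xy = S_xy / (s_x s_y). This removes the units and bounds the result between -1 and 1. From the two slides, s_x s_y = 1.999 / 0.476 ≈ 4.2, although the individual standard deviations are not reported. Here is a minimal sketch on hypothetical data; the function names are illustrative.

```python
import math


def sample_covariance(xs, ys):
    """Average product of deviations from the means (n - 1 divisor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # cov(x, x) is the variance of x
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
print(round(correlation(xs, ys), 3))      # between -1 and 1
print(correlation([1, 2, 3], [2, 4, 6]))  # points on a rising line: perfect correlation
```

Rescaling y (for example, changing the currency unit of wages) changes the covariance but leaves the correlation unchanged.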

Perfect Correlation

Fit That Line! Candidate lines: y = 2,500 + 1,800x; y = 10,000 + 1,000x; y = 13,000 + 750x

Fit That Line! y = 8,135 + 1,233x minimizes the sum of squared errors (the least-squares line).
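The least-squares line has a closed form: slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², and intercept = ȳ - slope·x̄. The data behind y = 8,135 + 1,233x are not given, so the sketch below applies the formula to hypothetical data. It then checks that another candidate line has a larger sum of squared errors.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx                # covariance of x and y over variance of x
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data (not the wage/aptitude data behind the slide).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
b0, b1 = least_squares_line(xs, ys)
print(f"y = {b0:.2f} + {b1:.2f}x, SSE = {sum_squared_errors(xs, ys, b0, b1):.2f}")
# Any other candidate line has a larger SSE, e.g. y = 1 + 1x:
print(f"SSE for y = 1 + 1x: {sum_squared_errors(xs, ys, 1, 1):.2f}")
```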

Word Problem: Students in a small class were polled by a researcher attempting to establish a relationship between hours of study in the week preceding a test and the test result. If you get data on hours studied and exam results, which variable is the dependent variable? Why?

Word Problem: y = 39.401 + 2.122x

Word Problem: Excel Regression Output (Data Analysis Add-In)

Regression Statistics
Multiple R          0.770
R Squared           0.594
Adj. R Squared      0.543
Standard Error     10.710
Obs.                   10

ANOVA
             df        SS          MS         F     Significance F
Regression    1    1340.452    1340.452    11.686    0.009
Residual      8     917.648     114.706
Total         9    2258.100

            Coeff.   Std. Error   t stat   p value   Lower 95%   Upper 95%
Intercept   39.401     12.153      3.242    0.012      11.375      67.426
hours        2.122      0.621      3.418    0.009       0.691       3.554
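Every summary statistic in this output follows from the two sums of squares, with n = 10 observations and k = 1 regressor. The short sketch below recomputes them; the function name regression_summary is illustrative.

```python
import math


def regression_summary(ss_reg, ss_res, n, k):
    """Recompute R², adjusted R², F and the standard error from an ANOVA table."""
    ss_tot = ss_reg + ss_res
    df_reg, df_res = k, n - k - 1
    ms_reg = ss_reg / df_reg   # mean square = SS / df
    ms_res = ss_res / df_res
    r2 = ss_reg / ss_tot
    return {
        "multiple_R": math.sqrt(r2),
        "R2": r2,
        "adj_R2": 1 - ms_res / (ss_tot / (n - 1)),
        "standard_error": math.sqrt(ms_res),  # standard error of the estimate
        "F": ms_reg / ms_res,
    }


summary = regression_summary(ss_reg=1340.452, ss_res=917.648, n=10, k=1)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The results round to the printed values: Multiple R 0.770, R Squared 0.594, Adj. R Squared 0.543, Standard Error 10.710 and F 11.686.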

Word Problem: Excel Regression Output (StatPad Add-In)

Regression analysis to predict score from hours. The prediction equation is: Score = 39.401 + 2.122 hours

R squared                        0.594
Standard error of estimate      10.710
Number of observations              10
F statistic                     11.686
P value                          0.009

           Coeff.   95% LowerCI   95% UpperCI   StdErr     t       p     Significant
Constant   39.401      11.375        67.426     12.153   3.242   0.012   Yes (p<0.05)
hours       2.122       0.691         3.554      0.621   3.418   0.009   Yes (p<0.05)
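The slope's t statistic is the coefficient divided by its standard error. Its 95% interval is the coefficient plus or minus a t critical value times the standard error. This sketch assumes the table value t(0.025, 8) = 2.306, using residual df = n - 2 = 8. Because it starts from the rounded coefficient and standard error, it reproduces the printed t and interval only to within about 0.001. The 10-hour prediction uses a hypothetical input.

```python
# Two-sided 5% critical value of the t distribution with n - 2 = 8 df,
# taken from a standard t table (assumed; not printed on the slide).
T_CRIT_DF8 = 2.306

intercept, slope, slope_se = 39.401, 2.122, 0.621

t_stat = slope / slope_se
ci_lower = slope - T_CRIT_DF8 * slope_se
ci_upper = slope + T_CRIT_DF8 * slope_se


def predicted_score(hours):
    """Point prediction from the fitted line Score = 39.401 + 2.122 hours."""
    return intercept + slope * hours


print(f"t = {t_stat:.3f}, 95% CI = ({ci_lower:.3f}, {ci_upper:.3f})")
print(f"Predicted score after 10 hours (hypothetical input): {predicted_score(10):.3f}")
```

In a model with one regressor, t² equals the F statistic (3.418² ≈ 11.68). This is why the slope's p value matches the overall P value of 0.009.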

The Nine Lives of Goldfish

Regression Statistics
Multiple R          0.671
R Squared           0.450
Adj. R Squared      0.340
Standard Error     45.214
Obs.                    7

ANOVA
             df         SS           MS          F     Significance F
Regression    1     8360.048     8360.048     4.089     0.099
Residual      5    10221.667     2044.333
Total         6    18581.714

            Coeff.    Std. Error   t stat   p value   Lower 95%   Upper 95%
Intercept    91.500      22.607     4.047    0.010      33.387     149.613
filter      -69.833      34.533    -2.022    0.099    -158.603      18.936
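Suppose filter is a 0/1 indicator. This is an assumption, because the slide does not define it. A simple regression on such an indicator just compares two group means. The intercept (91.500) is the average outcome without a filter, and intercept + slope (91.500 - 69.833 = 21.667) is the average with one. Significance F is 0.099, which equals the slope's p value in a one-regressor model, and the 95% interval includes zero. Together these mean the difference is not significant at the 5% level. The sketch below checks the group-means property on made-up data.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the simple least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


def group_mean(xs, ys, group):
    values = [y for x, y in zip(xs, ys) if x == group]
    return sum(values) / len(values)


# Hypothetical data: 1 = filter, 0 = no filter (an assumed coding).
has_filter = [0, 0, 0, 1, 1, 1, 1]
outcome = [10, 12, 14, 3, 5, 4, 4]
b0, b1 = least_squares_line(has_filter, outcome)
print(b0, group_mean(has_filter, outcome, 0))       # intercept = mean of the 0 group
print(b0 + b1, group_mean(has_filter, outcome, 1))  # intercept + slope = mean of the 1 group
```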

Predicting Job Performance

Regression Statistics
R Squared           0.107
Adj. R Squared      0.107
Standard Error      1.955
Obs.                 3525

ANOVA
             df          SS          MS          F      Significance F
Regression      3    1620.806     540.269    141.287     0.000
Residual     3521   13463.982       3.824
Total        3524   15084.788

            Coeff.   Std. Error   t stat    p value   Lower 95%   Upper 95%
Intercept    4.865      0.171     28.423     0.000      4.529       5.200
Age         -0.037      0.002    -20.263     0.000     -0.041      -0.034
Seniority    0.011      0.003      3.325     0.001      0.004       0.017
Cognitive   -0.032      0.033     -0.983     0.326     -0.097       0.032

Simple Regression (age only): Perform = 3.956 - 0.022 age

Predicting Job Performance
Perform = 4.865 - 0.037 age + 0.011 seniority - 0.032 cognitive

Age   Seniority   Cognitive   Predicted Performance   Net Difference
35    10          1           3.626
36    10          1           3.589                   -0.037
45    10          1           3.251
46    10          1           3.214                   -0.037

Age   Seniority   Cognitive   Predicted Performance   Net Difference
35    20          1           3.731
35    21          1           3.742                    0.011

Note the importance of ceteris paribus (all else constant).
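This is ceteris paribus in code. In a linear model, changing one input by one unit moves the prediction by that input's coefficient, whatever values the other inputs are held at. The sketch uses the rounded coefficients shown, so its predicted levels (for example, 3.648 at age 35) differ from the slide's by a few hundredths. That is presumably because the slide used unrounded estimates. The differences still come out at -0.037 and 0.011.

```python
def predict_perform(age, seniority, cognitive):
    """Fitted model from the slide, using its rounded coefficients."""
    return 4.865 - 0.037 * age + 0.011 * seniority - 0.032 * cognitive


# One more year of age, everything else held fixed, at two different baselines:
for age in (35, 45):
    change = predict_perform(age + 1, 10, 1) - predict_perform(age, 10, 1)
    print(age, round(change, 3))

# One more year of seniority, age and cognitive held fixed:
print(round(predict_perform(35, 21, 1) - predict_perform(35, 20, 1), 3))
```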

Predicting Job Performance: Perform = 4.865 - 0.037 age + 0.011 seniority - 0.032 cognitive [Figure: predicted performance, holding seniority constant at 10 and cognitive constant at 1]

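Predicting Job Performance: Perform = 4.865 - 0.037 age + 0.011 seniority - 0.032 cognitive [Figure: predicted performance, holding seniority constant at 20 and cognitive constant at -1] With linear models, the particular values at which the other variables are held do not change a variable's effect; what matters is that they are held constant.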

Predicting Job Perf. With a Dummy Variable

Regression Statistics
R Squared           0.110
Adj. R Squared      0.109
Standard Error      1.953
Obs.                 3525

ANOVA
             df          SS          MS          F      Significance F
Regression      4    1657.286     414.321    108.614     0.000
Residual     3520   13427.502       3.815
Total        3524   15084.788

                  Coeff.   Std. Error   t stat    p value   Lower 95%   Upper 95%
Intercept          4.820      0.172     28.096     0.000      4.484       5.156
Age               -0.037      0.002    -20.231     0.000     -0.041      -0.034
Seniority          0.010      0.003      3.271     0.001      0.004       0.017
Cognitive         -0.025      0.033     -0.756     0.450     -0.090       0.040
Structured int.    2.850      0.922      3.092     0.002      1.043       4.658

Structured int. is a dummy variable: 1 = yes, 0 = no.

Predicting Job Perf. With a Dummy Variable
Perform = 4.820 - 0.037 age + 0.010 seniority - 0.025 cognitive + 2.850 structured interview

                              Predicted Performance
Age   Seniority   Cognitive   Structured Interview = 0   Structured Interview = 1   Net Difference
35    10          1           3.600                      6.450                      2.850
45     5          2           3.155                      6.005                      2.850

The dummy variable turns "on" and "off" with all else constant.
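The same idea applies to a dummy variable. Switching structured_interview from 0 to 1 adds exactly 2.850 to the prediction, whatever the other values are. Here is a sketch with the slide's coefficients; the function and argument names are illustrative.

```python
def predict_perform(age, seniority, cognitive, structured_interview):
    """Dummy-variable model from the slide; structured_interview is 1 = yes, 0 = no."""
    return (4.820 - 0.037 * age + 0.010 * seniority
            - 0.025 * cognitive + 2.850 * structured_interview)


for age, seniority, cognitive in [(35, 10, 1), (45, 5, 2)]:
    without = predict_perform(age, seniority, cognitive, 0)
    with_si = predict_perform(age, seniority, cognitive, 1)
    print(f"{without:.3f} {with_si:.3f} {with_si - without:.3f}")
```

The dummy shifts the intercept only. The lines for interviewed and non-interviewed candidates are parallel.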

Predicting Job Perf. With a Dummy Variable: Perform = 4.820 - 0.037 age + 0.010 seniority - 0.025 cognitive + 2.850 structured interview [Figure: predicted performance, holding seniority constant at 10 and cognitive constant at 1]

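Predicting Job Perf. With a Dummy Variable [Figure: predicted performance with and without a structured interview, seniority = 20, cognitive = 0] Note the new y-intercept: the structured-interview line is shifted up by 2.850.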

Multiple Dummy Variables

  Source |       SS       df       MS                  Number of obs =    3525
---------+------------------------------               F( 14,  3510) =  125.63
   Model |  5035.58483    14  359.684631               Prob > F      =  0.0000
Residual |  10049.2032  3510  2.86302087               R-squared     =  0.3338
---------+------------------------------               Adj R-squared =  0.3312
   Total |  15084.7881  3524  4.28058685               Root MSE      =   1.692

------------------------------------------------------------------------------
 perform |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     age |  -.0301543   .0016933    -17.808   0.000     -.0334742   -.0268344
seniorty |   .0016888    .002762      0.611   0.541     -.0037265     .007104
cognitve |   .0119113   .0286362      0.416   0.677     -.0442339    .0680565
strucint |   3.665569   .7995184      4.585   0.000      2.098001    5.233137
    job1 |   1.928286   .1277788     15.091   0.000      1.677758    2.178814
    job2 |    .426524   .1260009      3.385   0.001      .1794815    .6735664
    job3 |   .1407506   .1306411      1.077   0.281     -.1153896    .3968908
    job4 |   .2921016   .1347211      2.168   0.030      .0279621    .5562411
    job5 |  -1.069262   .1331017     -8.033   0.000     -1.330227   -.8082974
    job6 |  -1.179162   .1377497     -8.560   0.000     -1.449239   -.9090839
    job7 |  -1.304191   .1406734     -9.271   0.000     -1.580001   -1.028381
    job8 |  -.8530246   .1381293     -6.176   0.000     -1.123846   -.5822027
    job9 |  -.6652395   .1501504     -4.430   0.000     -.9596304   -.3708487
   job10 |  -1.012177   .1420816     -7.124   0.000     -1.290748   -.7336058
   _cons |   5.021799   .1643372     30.558   0.000      4.699593    5.344005
------------------------------------------------------------------------------
Note: job1-job10 are dummy variables representing 10 different job classes (job11 is the omitted reference category).
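With 11 job classes, only 10 dummies enter the model. Including all 11 alongside the constant would make them perfectly collinear. Each job coefficient is therefore a shift relative to job11. The sketch below shows that coding; job_dummies and job_shift are hypothetical helper names. It also compares job1 with job5 at the same age, seniority, cognitive score and interview status.

```python
# Job-class coefficients from the Stata output; job11 is the reference category.
JOB_COEFS = [1.928286, .426524, .1407506, .2921016, -1.069262,
             -1.179162, -1.304191, -.8530246, -.6652395, -1.012177]


def job_dummies(job_class, n_classes=11, reference=11):
    """Return the 0/1 dummies job1..job10 for one worker (reference class omitted)."""
    if not 1 <= job_class <= n_classes:
        raise ValueError("unknown job class")
    return [1 if job_class == j else 0
            for j in range(1, n_classes + 1) if j != reference]


def job_shift(job_class):
    """Contribution of the job dummies to predicted performance."""
    return sum(c * d for c, d in zip(JOB_COEFS, job_dummies(job_class)))


print(job_dummies(3))   # a single 1 in the job3 position
print(job_dummies(11))  # all zeros: the reference category
print(round(job_shift(1) - job_shift(5), 4))  # job1 vs job5, all else equal
```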

Interaction Variables

  Source |       SS       df       MS                  Number of obs =    3525
---------+------------------------------               F(  6,  3518) =  121.08
   Model |  2581.89927     6  430.316544               Prob > F      =  0.0000
Residual |  12502.8888  3518  3.55397635               R-squared     =  0.1712
---------+------------------------------               Adj R-squared =  0.1697
   Total |  15084.7881  3524  4.28058685               Root MSE      =  1.8852

------------------------------------------------------------------------------
 perform |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     age |      -.006   .0034204     -1.705   0.088     -.0125379    .0008743
seniorty |       .011   .0030589      3.559   0.000      .0048879    .0168827
cognitve |      -.005   .0318774     -0.167   0.867     -.0678283    .0571719
strucint |      2.129   .8937022      2.383   0.017      .3770909    3.881545
  manual |     -1.513   .2391962     -6.327   0.000     -1.982442   -1.044488
manl_age |      -.042    .004011    -10.439   0.000     -.0497349   -.0340066
   _cons |      6.009   .2354444     25.526   0.000      5.548275    6.471517
------------------------------------------------------------------------------
Note: manual is a dummy variable indicating a manual occupation; manl_age is age interacted with manual (i.e. manl_age = manual*age).
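Because manl_age = manual*age, the age slope depends on the occupation. It is -0.006 per year for non-manual workers and -0.006 + (-0.042) = -0.048 per year for manual workers. The sketch below uses the rounded coefficients in the output, and the function names are illustrative. The held values (seniority 20, cognitive 0, strucint 0) match those used for the interaction figure.

```python
def predict_perform(age, seniority, cognitive, strucint, manual):
    """Interaction model from the Stata output (rounded coefficients)."""
    manl_age = manual * age  # interaction term, built as described in the note
    return (6.009 - 0.006 * age + 0.011 * seniority - 0.005 * cognitive
            + 2.129 * strucint - 1.513 * manual - 0.042 * manl_age)


def age_slope(manual):
    """Change in predicted performance per extra year of age."""
    return -0.006 - 0.042 * manual


for manual in (0, 1):
    change = predict_perform(41, 20, 0, 0, manual) - predict_perform(40, 20, 0, 0, manual)
    print(manual, round(change, 3), round(age_slope(manual), 3))
```

Unlike a plain dummy, the interaction lets the two groups' lines have different slopes, not just different intercepts.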

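Interaction Variables [Figure: predicted performance for manual and non-manual occupations, Seniority = 20, Cognitive = 0, StrucInt = 0] Note the different slopes, too, not just a different intercept.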

Another Interaction Variable Example

      Source |       SS       df       MS              Number of obs =   15321
-------------+------------------------------           F(  5, 15315) =  800.50
       Model |   804247599     5   160849520           Prob > F      =  0.0000
    Residual |  3.0773e+09 15315  200936.252           R-squared     =  0.2072
-------------+------------------------------           Adj R-squared =  0.2069
       Total |  3.8816e+09 15320  253367.252           Root MSE      =  448.26

------------------------------------------------------------------------------
    earnwkly |      Coef.
-------------+----------------------------------------------------------------
     married |    136.003
      female |   -169.837
       exper |      2.946
    parttime |   -227.716
      exp_pt |     -1.896
       _cons |    700.802
------------------------------------------------------------------------------
exper is potential labor market experience (age - educ - 6); parttime is a dummy variable indicating a part-time worker; exp_pt is exper interacted with parttime (i.e. exp_pt = exper*parttime).
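The same logic applies to weekly earnings. For married women, full-time workers start at 700.802 + 136.003 - 169.837 = 666.968 and gain 2.946 per year of experience. Part-time workers start 227.716 lower, at 439.252, and gain 2.946 - 1.896 = 1.050 per year. The sketch below computes both lines; the function name is illustrative.

```python
def predict_earnwkly(married, female, exper, parttime):
    """Weekly earnings model from the Stata output; exp_pt = exper * parttime."""
    exp_pt = exper * parttime
    return (700.802 + 136.003 * married - 169.837 * female + 2.946 * exper
            - 227.716 * parttime - 1.896 * exp_pt)


# Married women (married = 1, female = 1).
for parttime in (0, 1):
    start = predict_earnwkly(1, 1, 0, parttime)
    slope = predict_earnwkly(1, 1, 1, parttime) - start
    print(f"parttime={parttime}: intercept {start:.3f}, slope per year {slope:.3f}")
```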

Interaction Variables [Figure: predicted weekly earnings for full-time and part-time workers, Married = 1, Female = 1]

Adjusted R2

Same regression as the Multiple Dummy Variables output: 3525 observations and 14 regressors (age, seniorty, cognitve, strucint, job1-job10). SS: Model 5035.58483 (14 df), Residual 10049.2032 (3510 df), Total 15084.7881 (3524 df). R-squared = 0.3338, Adj R-squared = 0.3312, Root MSE = 1.692.
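Adjusted R² penalizes each added regressor: Adj R² = 1 - [SS_residual/(n - k - 1)] / [SS_total/(n - 1)], where k counts regressors excluding the constant. For this model n = 3525 and k = 14, and the formula reproduces 0.3312 from the sums of squares. Starting from the rounded R² of 0.3338 instead gives 0.3311, so the sketch below works from the sums of squares.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² from R², n observations and k regressors (constant excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def adjusted_r2_from_ss(ss_res, ss_tot, n, k):
    """Same quantity written as one minus a ratio of mean squares."""
    return 1 - (ss_res / (n - k - 1)) / (ss_tot / (n - 1))


ss_res, ss_tot, n, k = 10049.2032, 15084.7881, 3525, 14
r2 = 1 - ss_res / ss_tot
print(round(r2, 4), round(adjusted_r2_from_ss(ss_res, ss_tot, n, k), 4))
```

Because (n - 1)/(n - k - 1) > 1, adjusted R² is always below R². A regressor that barely reduces SS_residual can therefore lower adjusted R² even though R² never falls.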

Causality? Workforce Optimization
Sue Bostrom, "Leadership on IT—What's It Worth?", September 10, 2001:
"For those who still doubt that Internet-related investments will pay off, consider this: A PricewaterhouseCoopers study released earlier this year found that productivity gains in 2000 were 2.7 times greater for Internet-enabled companies than for businesses that have not leveraged the Web."
http://business.cisco.com/prod/tree.taf%3Fpublic_view=true&kbns=1&asset_id=66966.html

Causality
Reasons for an estimated statistical relationship:
1. The explanatory variable is the direct cause of the response (dependent) variable.
2. The response variable is causing a change in the explanatory variable (reverse causality).
3. The explanatory variable is a contributing, but not sole, cause of the response variable.
4. Confounding variables may exist.
5. Both variables may stem from a common cause.
6. Both variables are changing over time.
7. Coincidence.
Source: Jessica M. Utts (1999), Seeing Through Statistics, 2nd ed., Pacific Grove, CA: Duxbury, p. 186.