Multiple Regression

Multiple regression
Typically, we want to use more than a single predictor (independent variable) to make predictions. Regression with more than one predictor is called “multiple regression”.

Motivating example: Sex discrimination in wages
In the 1970s, Harris Trust and Savings Bank was sued for discrimination on the basis of sex. An analysis of the salaries of employees of one type (skilled, entry-level clerical) was presented as evidence by the defense. Did female employees tend to receive lower starting salaries than similarly qualified and experienced male employees?

Variables collected
93 employees on data file (61 female, 32 male).
bsal: annual salary at time of hire.
sal77: annual salary in 1977.
educ: years of education.
exper: months of work experience prior to hire at the bank.
fsex: 1 if female, 0 if male.
senior: months worked at the bank since hired.
age: age in months.
So we have six x’s and one y (bsal). However, in what follows we won’t use sal77.

Comparison for males and females
The comparison shows that men started at higher salaries than women (t = 6.3, p < .0001). But it doesn’t control for other characteristics.

Relationships of bsal with other variables
Seniority and education predict bsal well. We want to control for them when judging the gender effect.

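Multiple regression model
For any combination of values of the predictor variables, the average value of the response (bsal) lies on a straight line:
average bsal = β0 + β1·fsex + β2·senior + β3·age + β4·educ + β5·exper
An individual salary equals this average plus an error ε. Just like in simple regression, assume that ε follows a normal curve within any combination of predictors.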
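As a rough illustration of how the coefficients of such a model can be estimated, the sketch below fits a least-squares plane by solving the normal equations. The function names and the tiny noise-free data set are assumptions made up for this example; they are not the bank data, and statistical software would normally do this step.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # the leading 1 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up, noise-free data: y = 100 + 50*x1 + 2*x2 exactly (x1 is a 0/1 indicator).
rows = [(0, 10), (1, 12), (0, 30), (1, 25), (0, 18), (1, 40)]
y = [100 + 50 * x1 + 2 * x2 for x1, x2 in rows]
coefs = fit_least_squares(rows, y)
print([round(c, 6) for c in coefs])
```

Because the toy data contain no noise, the fit recovers the generating coefficients up to rounding error. With real data the estimates would differ from the population values.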

Output from regression (fsex = 1 for females, 0 for males)
Columns: Term, Estimate, Std Error, t Ratio, Prob>|t|.
Terms: Int (Prob>|t| <.0001), Fsex (Prob>|t| <.0001), Senior (Prob>|t| <.0001), Age, Educ, Exper.

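Predictions
Example: Prediction of the beginning salary for a woman with 10 months of seniority, who is 25 years old (300 months), with 12 years of education and two years (24 months) of experience:
Pred. bsal = b0 + b_fsex*1 + b_senior*10 + b_age*300 + b_educ*12 + b_exper*24
where b0 is the estimated intercept and the other b’s are the estimated coefficients from the regression output.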
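The prediction is just the intercept plus each coefficient times the employee's value for that predictor. A minimal sketch follows. The coefficient values are HYPOTHETICAL placeholders chosen for illustration; they are not the estimates from the bank regression. The `predict` helper is likewise an assumption of this example.

```python
# HYPOTHETICAL coefficients for illustration only; these are NOT the
# estimates from the bank regression.
coefs = {
    "intercept": 5000.0,
    "fsex": -700.0,
    "senior": -20.0,
    "age": 1.0,
    "educ": 90.0,
    "exper": 0.5,
}

# The woman in the example. Age and experience are measured in months,
# so 25 years becomes 300 months and two years becomes 24 months.
employee = {"fsex": 1, "senior": 10, "age": 25 * 12, "educ": 12, "exper": 2 * 12}


def predict(coefs, x):
    """Intercept plus the sum of coefficient * predictor value."""
    return coefs["intercept"] + sum(coefs[name] * value for name, value in x.items())


pred_bsal = predict(coefs, employee)
print(pred_bsal)
```

The unit conversion is the step that is easiest to get wrong: plugging in 25 for age or 2 for experience would silently give a wrong prediction.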

Interpretation of coefficients in multiple regression
Each estimated coefficient is the amount Y is expected to increase when the value of its corresponding predictor is increased by one, holding constant the values of the other predictors.
Example: the estimated coefficient of education is about 92. For each additional year of education, we expect starting salary to increase by about 92 dollars, holding all other variables constant.
The estimated coefficient of fsex is about -767. For employees who started at the same time, had the same education and experience, and were the same age, women earned $767 less on average than men.

Which variable is the strongest predictor of the outcome?
The predictor that has the strongest linear association with the outcome variable is the one whose coefficient has the largest absolute value of T, which equals the coefficient divided by its SE. It is not the one with the largest coefficient, because the size of a coefficient is sensitive to the scale of its predictor. The T statistic is not, since it is a standardized measure.
Example: In the wages regression, seniority is a better predictor than education because it has a larger absolute T.

Hypothesis tests for coefficients
The reported t-stats (coef. / SE) and p-values are used to test whether a particular coefficient equals 0, given that all other predictors are in the model.
Examples:
1) The test of whether the coefficient of education equals zero gives a p-value small enough to reject the null hypothesis; it appears that education is a useful predictor of bsal when all the other predictors are in the model.
2) The test of whether the coefficient of experience equals zero gives a p-value too large to reject the null hypothesis; it appears that experience is not a particularly useful predictor of bsal when all other predictors are in the model.

Hypothesis tests for coefficients
The test statistics have the usual form (observed - expected)/SE. For the p-value, use the area under a t-curve with (n-k) degrees of freedom, where k is the number of terms in the model, including the intercept. In this problem, n = 93 and k = 6, so the degrees of freedom equal 93 - 6 = 87.
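The arithmetic can be sketched as follows, using the age estimate (0.63) and standard error (0.72) quoted in the confidence interval example of these slides. As an assumption of this sketch, the two-sided p-value is taken from the standard normal curve rather than the t-curve with 87 degrees of freedom; the two are close when the degrees of freedom are this large. The function names are illustrative.

```python
import math


def t_ratio(coef, se, null_value=0.0):
    """(observed - expected) / SE, with the expected value under the null."""
    return (coef - null_value) / se


def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal curve (an approximation
    to the t-curve that is reasonable when degrees of freedom are large)."""
    cdf = 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0)))
    return 2.0 * (1.0 - cdf)


n, k = 93, 6          # 93 employees; intercept plus five predictors
df = n - k            # degrees of freedom for the t-curve

t_age = t_ratio(0.63, 0.72)
p_age = two_sided_p_normal(t_age)
print(df, round(t_age, 3), round(p_age, 2))
```

A t ratio below 1 in absolute value, as here, is far from the usual rejection region, so the age coefficient is not distinguishable from zero in this model.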

CIs for regression coefficients
A 95% CI for a coefficient is obtained in the usual way: coef. ± (multiplier) × SE. The multiplier is obtained from the t-curve with (n-k) degrees of freedom. (If the degrees of freedom are greater than 26, use the normal table.)
Example: A 95% CI for the population regression coefficient of age equals (0.63 - 1.96*0.72, 0.63 + 1.96*0.72), or about (-0.78, 2.04).
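The interval for the age coefficient can be checked with a few lines of code. This is a minimal sketch; the function name is an assumption, and the multiplier 1.96 is the normal-table value used in the example.

```python
def confidence_interval(coef, se, multiplier=1.96):
    """Return (lower, upper) = coef -/+ multiplier * se."""
    half_width = multiplier * se
    return coef - half_width, coef + half_width


low, high = confidence_interval(0.63, 0.72)
print(round(low, 2), round(high, 2))  # prints -0.78 2.04
```

The interval contains zero, which agrees with a two-sided test at the 5% level failing to reject a zero coefficient for age.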

Warning about tests and CIs
Hypothesis tests and CIs are meaningful only when the data fit the model well. Remember, when the sample size is large enough, you will probably reject any null hypothesis of β = 0. When the sample size is small, you may not have enough evidence to reject a null hypothesis of β = 0. When you fail to reject a null hypothesis, don’t be too hasty to say that a predictor has no linear association with the outcome. It is likely that there is some association; it just isn’t a very strong one.

Checking assumptions
Plot the residuals versus the predicted values from the regression. Also plot the residuals versus each of the predictors. If there are non-random patterns in these plots, the assumptions might be violated.

Plot of residuals versus predicted values
This plot has a fan shape. It suggests non-constant variance (heteroscedasticity). We need to transform variables.
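A fan shape can also be checked numerically, as a rough complement to looking at the plot. The sketch below compares the variance of the residuals in the upper half of the predicted values with the variance in the lower half. The function name and the made-up data are assumptions for illustration; this crude ratio is not a formal test.

```python
def spread_ratio(fitted, residuals):
    """Residual variance for the larger fitted values divided by the
    residual variance for the smaller fitted values."""
    pairs = sorted(zip(fitted, residuals))  # order observations by fitted value
    half = len(pairs) // 2

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    low = variance([r for _, r in pairs[:half]])
    high = variance([r for _, r in pairs[half:]])
    return high / low


# Made-up residuals whose size grows with the fitted value (a fan shape).
fitted = [float(v) for v in range(1, 21)]
residuals = [0.1 * f * (1 if i % 2 == 0 else -1) for i, f in enumerate(fitted)]

ratio = spread_ratio(fitted, residuals)
print(round(ratio, 1))
```

A ratio well above 1 means the residuals spread out as the predicted value grows, which is what the fan shape shows. A ratio near 1 is what constant variance would produce.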

Plots of residuals vs. predictors

Summary of residual plots
There appears to be a non-random pattern in the plot of residuals versus experience, and also versus age. This model can be improved.

Modeling categorical predictors
When predictors are categorical and the categories are assigned arbitrary numbers, regressions using those numbers make no sense. Instead, we make “dummy variables” (0/1 indicators, like fsex) to stand in for the categorical variables.
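One common way to build dummy variables is to pick a baseline category and create one 0/1 indicator for each remaining category. The sketch below assumes that scheme. The helper name `make_dummies` and the three-level `region` variable are hypothetical and do not come from the bank data.

```python
def make_dummies(values, baseline):
    """One 0/1 indicator per category other than the baseline."""
    levels = sorted(set(values) - {baseline})
    return [{f"is_{level}": int(v == level) for level in levels} for v in values]


# Two categories give a single dummy, like fsex (1 if female, 0 if male).
sex = ["female", "male", "female"]
print(make_dummies(sex, baseline="male"))

# A hypothetical three-level variable needs two dummies; the baseline
# category is the one where both indicators are 0.
region = ["north", "south", "west", "south"]
print(make_dummies(region, baseline="north"))
```

Each dummy coefficient is then read as the average difference from the baseline category, holding the other predictors constant.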

Collinearity
When predictors are highly correlated, standard errors are inflated.
Conceptual example: Suppose two variables Z and X are exactly the same. Suppose the population regression line of Y on X has slope 5. Fit a regression using sample data of Y on both X and Z. We could plug in any values for the coefficients of X and Z, so long as they add up to 5, and get exactly the same fit. Equivalently, this means that the standard errors for the coefficients are huge.
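The conceptual example can be checked directly: when Z is an exact copy of X, every pair of coefficients that adds up to 5 gives the same sum of squared errors. In the sketch below the data and the intercept of 2 are made up for illustration.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = x[:]                      # Z is an exact copy of X
intercept = 2.0               # hypothetical intercept, chosen for illustration
y = [intercept + 5.0 * xi for xi in x]


def sse(b_x, b_z):
    """Sum of squared errors when predicting y from both X and Z."""
    return sum(
        (yi - (intercept + b_x * xi + b_z * zi)) ** 2
        for xi, zi, yi in zip(x, z, y)
    )


# Every pair below adds up to 5, so every pair fits the data equally well.
candidates = [(5.0, 0.0), (0.0, 5.0), (2.0, 3.0), (10.0, -5.0)]
errors = [sse(b_x, b_z) for b_x, b_z in candidates]
print(errors)

# A pair that does not add up to 5 fits worse.
print(sse(4.0, 0.0))
```

Since the data cannot distinguish between these pairs, the individual coefficients are not pinned down, which is what a huge standard error expresses.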

General warnings for multiple regression
Be even more wary of extrapolation. Because there are several predictors, you can extrapolate in many ways.
Multiple regression shows association. It does not prove causality. Only a carefully designed observational study or randomized experiment can show causality.