Occupational Factors Affecting the Income of Canada's Residents in the 1970s
Group 5: Ben Wright, Bin Ren, Hong Wang, Jake Stamper, James Rogers, Yuejing Wu
Data Source: Census of Canada
- Collected by the Canadian Government in 1971
- 102 different occupational categories
- 4 occupational categories had incomplete data
- Categories represent data aggregated over thousands of employees
Definition of variables
- Gender: % of women in occupation
- Years of Education: average number of years of education per worker
- Job prestige: rating assigned based on a social survey conducted in the mid-1960s
- Job types:
  - Blue collar (e.g. janitor)
  - Professional (e.g. lawyer)
  - White collar (e.g. insurance agent)
What factors affected the occupational income of Canada's residents in 1971?
Step 1: Data preparation
- Removal of incomplete observations (4 types of employment were not classified into a type: baby sitters, athletes, newsboys, and farmers)
- Removal of non-descriptive statistics (Census code)
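The preparation step can be sketched in Python as follows. This is a minimal sketch and not the group's actual code: the record layout, the field names (`type`, `census`) and every numeric value are assumptions invented for illustration.

```python
# Hypothetical occupation records; field names and all numbers are made up.
# A job type of None marks an occupation that was not classified into a type.
records = [
    {"occupation": "occupation A", "education": 13.1, "income": 12000,
     "women": 11.0, "prestige": 69.0, "census": 1001, "type": "prof"},
    {"occupation": "occupation B", "education": 8.5, "income": 4500,
     "women": 33.0, "prestige": 35.0, "census": 2002, "type": "bc"},
    {"occupation": "occupation C", "education": 9.0, "income": 600,
     "women": 96.0, "prestige": 26.0, "census": 3003, "type": None},
]


def prepare(rows, drop_fields=("census",)):
    """Drop rows with any missing value, then drop non-descriptive fields."""
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # incomplete observation
        cleaned.append({k: v for k, v in row.items() if k not in drop_fields})
    return cleaned


prepared = prepare(records)
print(len(prepared), sorted(prepared[0]))
```

Dropping the census code before modelling keeps an arbitrary identifier from being treated as a numeric predictor.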
Step 2: Exploratory data analysis
1. Professional occupations have higher average income, prestige scores, and years of education than blue and white collar jobs
2. White collar jobs (on average) employ a larger percentage of women
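Group comparisons of this kind come from averaging each variable within a job type. The sketch below shows the idea on a handful of rows. The rows are made up for illustration and do not reproduce the census figures; the function name `group_means` is likewise an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (job type, income, % women) rows; all values are made up.
rows = [
    ("prof", 10000, 20.0),
    ("prof", 12000, 10.0),
    ("wc", 5000, 60.0),
    ("wc", 6000, 50.0),
    ("bc", 4000, 15.0),
    ("bc", 5000, 25.0),
]


def group_means(rows):
    """Return {job type: (mean income, mean % women)}."""
    groups = defaultdict(list)
    for job_type, income, women in rows:
        groups[job_type].append((income, women))
    return {
        job_type: (mean(v[0] for v in values), mean(v[1] for v in values))
        for job_type, values in groups.items()
    }


summary = group_means(rows)
for job_type, (income, women) in sorted(summary.items()):
    print(f"{job_type}: mean income {income:.0f}, mean % women {women:.1f}")
```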
Step 3: Pairwise scatter plots to see the relationships between variables
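A numeric companion to the pairwise scatter plots is the Pearson correlation for each pair of variables. The sketch below computes it by hand. The column values are made up for illustration, and the slides do not say whether the group computed correlations at all.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical columns; every value is made up for illustration.
data = {
    "education": [8.0, 10.0, 12.0, 14.0, 16.0],
    "prestige": [30.0, 42.0, 50.0, 66.0, 80.0],
    "women": [50.0, 45.0, 30.0, 20.0, 5.0],
}

# Print the correlation for every unordered pair of columns.
names = list(data)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: r = {pearson(data[a], data[b]):+.2f}")
```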
Step 4: Linear regression
Data output: R^2 = , F-stat: 120, P-value: <
Table columns: Variable, Coefficient, StdDev, T-value, P-stat
Table rows: Education, Women, Prestige, Type (b.c.), Type (prof.), Type (w.c.)
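The regression step can be sketched without any statistics package by solving the normal equations directly. The code below is a sketch under stated assumptions, not the group's analysis. It assumes one indicator column per job type and no separate intercept (suggested by the three Type rows in the table). The observations and the "true" coefficients are invented, and income is generated exactly from those coefficients so that the fit can be checked. Standard errors, t-values and p-values are not computed here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(X, y, beta):
    """Share of the variance of y explained by the fitted values."""
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


TYPES = ("bc", "prof", "wc")


def design_row(education, women, prestige, job_type):
    """Three numeric predictors followed by one indicator per job type."""
    indicators = [1.0 if job_type == t else 0.0 for t in TYPES]
    return [float(education), float(women), float(prestige)] + indicators


# Hypothetical observations (education, % women, prestige, type); all made up.
observations = [
    (8, 10, 30, "bc"), (9, 40, 35, "bc"), (10, 20, 45, "bc"),
    (14, 5, 70, "prof"), (15, 30, 60, "prof"), (16, 50, 80, "prof"),
    (11, 60, 40, "wc"), (12, 90, 50, "wc"), (13, 70, 35, "wc"),
]
# Invented coefficients used only to generate a noise-free income column.
true_beta = [100.0, -20.0, 50.0, 1000.0, 3000.0, 1500.0]

X = [design_row(*obs) for obs in observations]
y = [sum(b * v for b, v in zip(true_beta, row)) for row in X]

beta = ols(X, y)
labels = ["Education", "Women", "Prestige", "Type (b.c.)", "Type (prof.)", "Type (w.c.)"]
for label, coef in zip(labels, beta):
    print(f"{label:13s} {coef:10.2f}")
print(f"R^2 = {r_squared(X, y, beta):.3f}")
```

Solving the normal equations is fine for a small, well-conditioned illustration like this one. For real data a QR-based routine from a numerical library is usually the safer choice.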
Step 5: Test the validity of linear regression: Normality?
- Data is skewed towards higher incomes
Step 5: Test the validity of linear regression: Heteroskedasticity?
- R^2 = 0.90
- Variance is not constant
- Data is heteroskedastic -> need to perform data transformation
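One rough way to put a number on "variance is not constant" is to sort the residuals by fitted value and compare their variance in the upper and lower halves. The sketch below does this. It is an informal check, loosely in the spirit of split-sample tests such as Goldfeld-Quandt, not a formal test and not necessarily what the group did. The fitted values and residuals are made up.

```python
from statistics import pvariance


def variance_ratio(fitted, residuals):
    """Residual variance in the upper half of fitted values divided by the lower half."""
    pairs = sorted(zip(fitted, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return pvariance(high) / pvariance(low)


# Hypothetical fitted incomes and residuals; values are made up so that
# the spread of the residuals grows with the fitted value.
fitted = [2000, 3000, 4000, 5000, 8000, 10000, 12000, 15000]
residuals = [-100, 120, -90, 110, -900, 1100, -1000, 950]

print(f"variance ratio (high / low): {variance_ratio(fitted, residuals):.1f}")
```

A ratio near 1 is consistent with constant variance. A ratio far above 1 points to the fan-shaped residual pattern that motivates a transformation.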
Step 6: Log transformation (log income)
- Log income approximates a normal distribution
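The effect of the log transformation on a right-skewed variable can be illustrated by comparing sample skewness before and after. The sketch below uses made-up incomes and the natural logarithm. The slides do not state which log base was used, so that choice is an assumption.

```python
from math import log


def skewness(values):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical incomes with a long right tail; values are made up.
incomes = [2000, 3000, 3500, 4000, 4500, 5000, 6000, 8000, 12000, 25000]
log_incomes = [log(v) for v in incomes]

print(f"skewness of income:     {skewness(incomes):.2f}")
print(f"skewness of log income: {skewness(log_incomes):.2f}")
```

Skewness closer to zero after the transformation is what "approximates a normal distribution" looks like in a single number, although a histogram or Q-Q plot remains the more informative check.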
Results of linear regression on log transformation
- Education is not a significant variable and can be removed from the model
Table columns: Variable, Coef., StdDev, T-value, P-stat
Table rows (surviving value fragments in parentheses): Education; Women (e-15); Prestige (e-09); Type (b.c.) (<2e-16); Type (prof.) (<2e-16); Type (w.c.) (<2e-16)
Are different models needed for different ranges of variables?
- Linear model explains the entire range of observations
- Linear relationship
- Variables: Women, Prestige, Type
Outliers affecting the model
- Possible outliers
- Model may not account for a variable which explains these data points
Model disregarding outliers
- The total sum of squared residuals is further reduced by removing outliers
Final Model
This means that, regardless of your job type, if you switched from one job to another with the same level of prestige (e.g. 62) but a lower percentage of women (e.g. from 57% to 10%), you could increase your income substantially (~$3,500).
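The arithmetic behind a statement like this can be sketched as follows, assuming the final model is the regression on natural-log income. In such a model, changing only the percentage of women multiplies predicted income by exp(coefficient x change). The coefficient and the starting income below are hypothetical placeholders, not the fitted values, so the printed figure is not expected to match the ~$3,500 quoted above.

```python
from math import exp


def income_change(base_income, coef_women, women_from, women_to):
    """Predicted income change in a log-income model when only % women changes."""
    ratio = exp(coef_women * (women_to - women_from))
    return base_income * (ratio - 1)


# Hypothetical inputs: the coefficient and the starting income are assumptions,
# not values taken from the fitted model. Prestige and job type are held fixed,
# so their terms cancel out of the ratio.
change = income_change(base_income=5000, coef_women=-0.01, women_from=57, women_to=10)
print(f"predicted change in income: {change:+.0f}")
```

Because the effect is multiplicative, the dollar change depends on the starting income, which is why an example prestige level has to be fixed before quoting a dollar figure.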
Conclusions
- The level of prestige (more than education) associated with a particular occupation best describes the income it will earn
- Occupations which employ a higher percentage of women will offer a lower income
- Job type (i.e. b.c., w.c., or prof) can be used to explain income differences between occupations