Updated Unified Category System for Census Occupations


1 Updated Unified Category System for 1960-2000 Census Occupations
Peter B. Meyer, US Bureau of Labor Statistics (but none of this represents official measurement or policy)
SSHA 2006, Minneapolis; Nov 4, 2006
The Social Science History Association conference is in Minneapolis, Nov 2-5, 2006
Outline
  Tentative standard categories
  Users and bug fixes
  How Census assigns occupation codes
  Imputation practice

2 Census Occupational Classifications
The U.S. Census Bureau defines a list of 3-digit occupation codes every ten years
It then assigns one to each employed respondent in the decennial Census and some other surveys
Vast data are available in these categories: CPS, ATUS, SIPP, NLS, ACS, decennial Census
But the categories are not always consistent over long time spans
Research efforts may require some standard

3 Tradeoffs in Classification Systems
Precise job distinctions vs. consistency, duration, and sample size
  High-tech occupations vs. other technical occupations
  blacksmith, database admin (shorter, more precise series)
  electrical engineer (longer, evolving series)
  “Superstar” jobs like athletes and musicians (need precision)
  Licensed jobs (need long comparable occupations)
Conformity with other data
Avoid “sparseness”: many missing year-occ cells
Meaning of occupation: function, tasks, skills, background, social class
There is no perfect classification, but there are tools & criteria for better ones.

Tradeoffs arise when mapping occupations together into unified categories. Many choices designed to achieve consistency must introduce error. No set of choices is ideal for all purposes; there are tradeoffs between precision and accuracy, for example. Different research questions will require different category groups, and links to other data sets. An adequate sample size in each occupation is needed to calculate a standard deviation. A number of comparable occupations is needed to determine that a particular occupation is significantly different from others.

Exact earlier slide:
Duration of category vs. accuracy and precision of category
  E.g. blacksmith, electrical engineer
Sample sizes of each occupation vs. number of comparable occupations
  The standard deviation of incomes within an occupation requires more cases
  But narrow distinctions may be the ones of interest
    English professors vs. other humanities professors
    High-tech occupations vs. other technical/mathematical occupations
Conformity with many other sources of data
  Past Censuses, CPS, NLSY, DOT, O*NET, ISCO, HISCO
So there is no one best classification
  Often (I think) poorly fitting ones are used.
  Researchers may need tools to make them from existing data

4 Baselines to improve on
IPUMS defined occ1950 for US workers recorded in ANY Census
A working paper (Meyer and Osborne, 2005) defined a classification of 3-digit occupation codes from 1960 to the present
It was adapted from the 500+ categories in the 1990 Census:
  379 categories have the same name as in 1990, or almost the same
  125 were eliminated to help harmonize with other years (example to follow)
  19 categories were expanded (changed name, or a not-elsewhere-classified category was given more scope)
  3 categories were added for 1960 data which doesn’t fit in

5 Some distinctions are lost in standardization
Census reports and IPUMS data show how many respondents would be coded in each of two classification systems.

1970 code and occupation title: 284 Sales workers, except clerks, retail trade

1980/90 code and component category title                     Civilian labor force   % of category
263  Sales workers, motor vehicles and boats                               185,160          37.06%
266  Sales workers, furniture and home furnishings                          98,941          19.80%
267  Sales workers; radio, television, hi fi, and appliances                76,674          15.35%
268  Sales workers, hardware and building supplies                          81,668          16.35%
269  Sales workers, parts                                                   39,120           7.83%
274  Sales workers, other commodities                                       16,008           3.20%
277  Street and door to door sales workers                                   2,082           0.42%

The largest 1980/90 component of 1970 code 284 is occupation 263. However, the title of 1980 occupation 263 is specifically restricted to motor vehicles and boats, while the 1970 title is not. If we were to use the 1980 category name and apply it to 1970 data, we would have a category that explicitly mislabeled most of its members. Instead, we combined the workers in category 284 in 1970 into the category called “Salespersons not elsewhere classified”.
Because occ1950 uses the predefined 1950 categories, no categories were renamed, or “n.e.c.” categories created or expanded, to extend consistency in definition across years.
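The "% of category" column can be recomputed from the counts as a quick check. The following is a minimal Python sketch, not part of the original programs; the function name `category_shares` is made up for illustration, and it assumes the seven listed components are the whole of 1970 code 284.

```python
# Civilian labor force counts copied from the slide:
# 1970 code 284 split across its 1980/90 component codes.
components = {
    263: 185_160,  # motor vehicles and boats
    266: 98_941,   # furniture and home furnishings
    267: 76_674,   # radio, television, hi fi, and appliances
    268: 81_668,   # hardware and building supplies
    269: 39_120,   # parts
    274: 16_008,   # other commodities
    277: 2_082,    # street and door to door sales workers
}


def category_shares(counts):
    """Return each component's percentage share of the combined category."""
    total = sum(counts.values())
    return {code: 100.0 * n / total for code, n in counts.items()}


shares = category_shares(components)
for code, pct in sorted(shares.items()):
    print(f"{code}: {pct:.2f}%")

# The largest component still covers well under half of the 1970 category,
# which is why its narrower 1980 title would mislabel most members.
largest = max(shares, key=shares.get)
print(largest, shares[largest] < 50.0)
```

The recomputed shares agree with the slide to within rounding, and no single 1980/90 title covers a majority of the 1970 category.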

6 User input and new data since 2005
Sent these programs to 19 people who expressed interest
  Open-source code idea (helps find errors; also is public property)
Corrections from users did come in
  Philip Cohen, UNC Sociology, identified some problems/mistakes.
  Sarah Porter, research assistant at U of Iowa working with Jennifer Glass, wrote a program to do some similar mappings. Comparing to that program, I found mistakes in mine.
Dual-coded 1990/2000 data sets highlighted some surprises
Experimented with imputations (example to follow)
Visited the Census office where they assign these codes.

7 Census Bureau's National Processing Center in Jeffersonville, IN
All occupation and industry codes are assigned there
Louisville, KY, is just south of it
Security was pretty high. I was not a sworn Census agent and this got in the way a bit.
I interviewed four specialists who assign occupation & industry codes.

8 Information used when coding
“what kind of work”
“most important activities or duties”
employer name
“what kind of industry”
city and state ("PSU") of respondent's home
industry type (manufacturing, service, other)
years of education, age, sex
not income, although it was available before the Jan '94 software
Tens of thousands of job titles are mapped to a code in a reference book they have, if industry also matches.
Some cases may be "autocoded" by software, and a coder checks them
After coding, public use samples have a 3-digit occupation code and a 3-digit industry code
Quality of assignments from public use samples is limited
coder cannot do this; referralist can

9 Imputation: Statisticians and Actuaries
These were separate categories in and after 1970
But in 1960 they were all in “statisticians and actuaries”
When standardizing (2005) they were put in “statisticians”

Counts of actuaries and statisticians in Census sample
                 1960   1970   1980   1990
Actuaries           .     45    129    182
Statisticians     199    237    352    338

I’m aware that no one has expressed interest in the question of how many actuaries there were in 1960. However, if you wanted time series on occupations in general going back to 1960, for comparative reasons, it helps to extend any of them back. That year is of interest because there were so few semiconductors and computers then; that data is my control for what computers have done. Also I thought if you were going to probabilistically assign anybody to an occupation, statisticians were the right group to start with. We can’t offend them; they do this kind of thing to other people all the time.
  65% of actuaries worked in industry 717 (the insurance industry), whereas only 10% of statisticians did.
  88% of actuaries worked in the private sector, whereas only 60% of statisticians did.
  10% of statisticians were foreign-born; only 4% of actuaries were.
  About half of statisticians were female. Only a third of actuaries were.
  The mean salary of actuaries was 50% higher than the mean salary of statisticians.
  Actuaries had much higher mean business income.
Will try to infer which of the 1960 people were actuaries.

10 Statisticians and Actuaries
Pooled all statisticians and actuaries
Good predictors of whether respondent is an actuary:
  Recorded in a later year
  Employed in insurance, accounting/auditing, or professional services industries
  Employed in private sector
  High salary income
  High business income, or earning mostly business income
  Is employed
  Lives in Connecticut, Minnesota, Nebraska, or Wisconsin
Ran many logistic regressions predicting the actuaries

11 Statisticians and Actuaries
For 1970 data, that logistic regression predicts occupation right 88% of the time

Revised counts of actuaries and statisticians after imputation
                 1960   1970   1980   1990
Actuaries          29     45    129    182
Statisticians     170    237    352    338
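Two simple checks sit behind figures like these: the share of cases a rule predicts correctly, and whether imputation merely re-labels people rather than adding or dropping any. A minimal Python sketch follows; the function name and the toy prediction lists are hypothetical, and only the 1960 counts come from the slides.

```python
def accuracy(predicted, actual):
    """Share of cases where the predicted label equals the recorded one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels (1 = actuary, 0 = statistician), NOT Census data.
toy_predicted = [1, 0, 0, 1]
toy_actual = [1, 0, 1, 1]
print(accuracy(toy_predicted, toy_actual))

# Counts from the slides: the 1960 sample had 199 people in the combined
# category; after imputation they are split into two occupations.
combined_1960 = 199
actuaries_1960, statisticians_1960 = 29, 170
print(actuaries_1960 + statisticians_1960 == combined_1960)
```

An accuracy figure of this kind would be computed on a year such as 1970, where the true occupation is recorded, before the rule is applied to 1960.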

12 Statisticians and Actuaries
Why work this arcane problem?
  More accurate standardized “statistician” category
  Longer actuary time series
  Reduces sparseness (empty cells)
  Builds a technique for this data mining
  Benefits scale up through IPUMS

13 Imputing judges
In the 1960 Census, lawyers and judges were one category
Later, they’re separate, and separate in the “standard” system
Without more info, we categorize all in 1960 as “lawyers”.
We wish to impute which ones are judges
Useful fact: private sector ones were all called lawyers
Predictors for the public sector ones, of who’s a judge:
  Older
  Employed in state government
  High salary income
  Low business income
  Educated less than 16 years
  Employed at time of survey

14 Logit regression predicting judges in 1970-90 Census
Dependent variable: maximum likelihood probability this individual is a judge.

Variable                                            Coefficient   Std error   p-value
Year                                                     -0.005       0.011     0.633
Age                                                       0.155       0.033     0.000
Age-squared                                              -0.001       0.040
Federal government employee                              -1.440       0.137
State government employee                                 0.499       0.263     0.058
Ln(salary)                                               -1.795       3.094     0.562
Ln(salary) squared                                        0.052       0.333     0.877
Ln(salary) cubed                                          0.003       0.012     0.798
Ln(business income)                                      -0.041       0.036     0.261
Fraction of earned income that is business income        -0.714       1.053     0.498
Education less than 16 years                              2.235       0.320
Years of formal education                                -0.044       0.046     0.336
Is employed at time of survey                             0.224       0.241     0.352
Constant                                                 13.017      23.428     0.578

Judges are likely to be OLDER,

15 Thus we assign judge occupation code
/* coefficients as rounded in the slide 14 table */
gen logitindex = -0.005*year + 0.155*age - 0.001*age*age
    - 1.440*indfed + 0.499*indstate
    - 1.795*lnwage + 0.052*lnwage*lnwage + 0.003*lnwage*lnwage*lnwage
    - 0.041*lnbus - 0.714*busfrac
    + 2.235*(educyrs<16) - 0.044*educyrs + 0.224*employed
    + 13.017 /* constant */ ;
…
gen logitval = exp(logitindex)/(1.0+exp(logitindex))
replace logitval = .0001 if !govtemployee   /* this is a perfect predictor */
replace logitval = .0001 if !indfed & !indstate & !indlocal   /* this too */
gen assigned = logitval > .46   /* Now ‘assigned’ has a 1 for imputed judges */

Threshold probability is chosen to match the number of judges expected to be there, based on annual trend.
Can get 83% accurate predictions from such a rule on 1970 data. This mis-assigns a few who should have stayed lawyers.
Having data-mined the coefficients, we forge this tool to cut between the lawyers and the judges.
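The step of choosing a threshold so that the number of assigned judges matches an expected count can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the author's program: the function names, the index values, and the expected count of 2 are all hypothetical, and the cutoff is taken as the midpoint between the last assigned and first unassigned probability.

```python
import math


def logistic(index):
    """Same transform as exp(x) / (1 + exp(x)) in the Stata code."""
    return 1.0 / (1.0 + math.exp(-index))


def threshold_for_count(probabilities, expected_count):
    """Pick a cutoff so that about `expected_count` cases score above it.

    With tied probabilities at the boundary, fewer cases may be assigned.
    """
    ranked = sorted(probabilities, reverse=True)
    if expected_count <= 0:
        return 1.0
    if expected_count >= len(ranked):
        return 0.0
    return (ranked[expected_count - 1] + ranked[expected_count]) / 2.0


# Hypothetical logit index values and government-employment flags.
indexes = [-2.0, -0.5, 0.3, 1.2, 2.5]
govt_employee = [True, True, False, True, True]

# Private-sector respondents are forced to a near-zero probability,
# mirroring the "perfect predictor" override on the slide.
probabilities = [
    logistic(x) if is_govt else 0.0001
    for x, is_govt in zip(indexes, govt_employee)
]

cutoff = threshold_for_count(probabilities, expected_count=2)
assigned = [p > cutoff for p in probabilities]
print(assigned)
```

In this toy run the third respondent has a middling index but is never assigned, because the private-sector override is applied before the cutoff is chosen.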

16 Newly Imputed Judges
Respondents in Census samples after imputation
           1960   1970   1980   1990
Lawyers    1971   2570   5082   7603
Judges       82    123    298    331

17 What's next?
Use dual-coded CPS datasets with 1990 and 2000 codes to make a few more imputations
Keep listening, seek more help, make it better.
Publish variable at IPUMS.org
Keep going? & 1980 dual coded data sets exist.

18 Industry and occupation coding
Industry codes and occupation codes are assigned by the same group of people, at the same time for each respondent.
Industry is almost always decided first.
The people who do that are “coders”
Procedures are carefully documented
I wasn’t a “sworn” Census agent and couldn’t see it done live

19 Desirable Attributes of a Classification
For each occupation, well-behaved time series of:
  mean wage
  wage variance
  fraction of the population
New criterion: SPARSENESS
  One prefers that a classification not be sparse, meaning it does not have many empty occ-year cells

A good table is the one on CV from 1990 to Technicians n.e.c. shows a change that we could maybe fix.
My input: For each occupation which exists in two consecutive Census samples, we calculate these summary statistics in both years and compare them. If something did jump around, there is not necessarily much we can do. We might change the definition, or think through whether the change is a real-world change or a redefinition of the category. Possibly we can document this.
Where possible we want the occupations to be a repeated cross section or panel.
THINK ABOUT THE EXAMPLE.
Table columns: PROPOSED_STANDARD, wfreqnum_obs1960, wfreqnum_obs1970, wfreqnum_obs1980, wfreqnum_obs1990, wfreqnum_obs2000
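The sparseness criterion can be made concrete as the fraction of occupation-year cells with no observations. Below is a minimal Python sketch, assuming that definition; the function name `sparseness` is made up here, and the counts are the actuary and statistician sample counts from the earlier slides, before and after imputation.

```python
def sparseness(cell_counts, occupations, years):
    """Fraction of occupation-year cells that have no observations."""
    empty = sum(
        1
        for occ in occupations
        for year in years
        if cell_counts.get((occ, year), 0) == 0
    )
    return empty / (len(occupations) * len(years))


occupations = ["actuary", "statistician"]
years = [1960, 1970, 1980, 1990]

# Sample counts from the slides; actuaries have no 1960 cell at first.
before = {
    ("actuary", 1970): 45, ("actuary", 1980): 129, ("actuary", 1990): 182,
    ("statistician", 1960): 199, ("statistician", 1970): 237,
    ("statistician", 1980): 352, ("statistician", 1990): 338,
}

# After imputation, 29 of the 199 are re-labelled as actuaries in 1960.
after = dict(before)
after[("actuary", 1960)] = 29
after[("statistician", 1960)] = 170

print(sparseness(before, occupations, years))
print(sparseness(after, occupations, years))
```

On these two occupations, imputation fills the one empty cell, so the measure drops from one in eight cells to zero.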

20 What new information would help referralists?
Information about a job title
Information about the employer's city and state, not the respondent’s
But asking more questions would extend the CPS interview

21 Problems faced by referralists
Too little information from respondent
  “Computer work" for “kind of work”
Exaggeration (example: dot com businesses)
Ambiguities:
  "water company" for industry or employer
  "surveyor" occupation
  "boot" vs "boat" in handwriting
Having to hurry
Referralists confer with each other routinely, but sometimes make different choices from one another
Does technological change go along with occupational ambiguity? YES.
  Problems with computer work, biotech. Still no nanotech in classification.

22 The information coders have

23 Who's Doing the Coding
There were about 12 coders and 14 referralists in October 2006
Referralists have been coders before and usually have 9+ years of experience
I interviewed three referralists and a supervisor
The ones I met handled referrals from several surveys: CPS, ATUS, SIPP, NLS, ACS, and others on contract
  All these use decennial Census occupation codes
  They DON’T handle the decennial Censuses

24 Information available to referralist
Can match employer name to a known employer from their Employer Name List (ENL), same as SSEL or Business Registry.
Can look on the web for that employer
Can study the “little red book” (SOC manual) or, less often, the giant Dictionary of Occupational Titles (1991)
  or, I’m told, look up the employer in Dun & Bradstreet data
They try to make a coherent choice for industry and occupation together.

25 “Coders” and “Referralists”
Coders follow carefully documented procedures from the Census headquarters in Suitland, MD
Coders with two years of experience are expected to assign 94 codes an hour, with 95% accuracy (which is checked)
If there is not enough information to assign industry and occupation codes by procedure, the case is forwarded electronically ("referred") to a “Referralist" (aka statistical assistant)

