2 Updated unified category system for Census occupations
Peter B. Meyer
Office of Productivity and Technology, U.S. Bureau of Labor Statistics
Western Economic Association meetings, San Diego, July 2, 2006
Nothing here is an official measurement, finding, view, or policy of the US Dept of Labor
3 Census occupational classifications
Census Bureau staff assign occupations (3-digit codes) to decennial Census respondents
But the list of 3-digit occupation codes changes every Census
Current Population Survey (CPS) uses these categories, moving from the 1960 system through each later Census system in turn; the latest has been in use from 2003-present
Vast amounts of data are available in these categories
Researchers do assign, with error, records from one classification system into another.
4 Tradeoffs in classification systems
Duration of category vs. accuracy and precision of category
  E.g. blacksmith, electrical engineer
Sample sizes of each occupation vs. number of comparable occupations
  Estimating the standard deviation of incomes within an occupation requires more cases
  But narrow distinctions may be the ones of interest
    English professors vs. other humanities professors
    High tech occupations vs. other technical/mathematical occupations
Conformity with many other sources of data
  Past Censuses, CPS, NLSY, DOT, O*NET, ISCO, HISCO
So there is no one best classification
  Often (I think) poorly fitting ones are used.
  Researchers may need tools to make them from existing data
5 Long term project goals and strategy
Create classification for decennial Census occupation categories since 1960, with an interest in high tech occupations
  Assign occupations from other years to the 1990 classification
    by similar titles or function
    supporting tables from Census and IPUMS
    IPUMS variable occ1950
  Change the 1990 classification if it seems useful
Test it and compare it to alternatives
Use it, share it
Accumulate resources, techniques, code, contacts. Listen.
  Census Bureau and BLS experts
  IPUMS
  National Crosswalk Center
  Dictionary of Occupational Titles
  ETA / O*NET experts
6 Example: difficult category
1970 code 284, 1970 occupation title: Sales workers, except clerks, retail trade
1980 component categories (1980 code, title, civilian labor force, % of 1970 category):
  263  Sales workers, motor vehicles and boats                   185,   %
  266  Sales workers, furniture and home furnishings             98,    %
  267  Sales workers; radio, television, hi fi, and appliances   76,    %
  268  Sales workers, hardware and building supplies             81,    %
  269  Sales workers, parts                                      39,    %
  274  Sales workers, other commodities                          16,    %
  277  Street and door to door sales workers                     2,     %
We invented a category, “salespersons not elsewhere classified,” to hold all of the 284s from 1970
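The assignment problem on this slide can be sketched as a small lookup. This is an illustrative Python sketch, not the project's code: the dictionary layout and the names `SPLITS_1970_TO_1980`, `INVENTED`, and `standardize_1970` are assumptions made for the example; only the codes and the invented category title come from the slide.

```python
# Illustrative sketch; names and layout are hypothetical, codes are from the slide.
SALES_NEC = "salespersons not elsewhere classified"

# 1970 code -> the 1980 codes it was split into.
SPLITS_1970_TO_1980 = {284: (263, 266, 267, 268, 269, 274, 277)}

# 1970 codes that cannot be divided among later categories get an invented one.
INVENTED = {284: SALES_NEC}


def standardize_1970(code):
    """Return a standardized category for a 1970 code, or None if not covered here."""
    components = SPLITS_1970_TO_1980.get(code, ())
    if len(components) == 1:
        return components[0]  # clean one-to-one match with a later code
    if code in INVENTED:
        return INVENTED[code]  # ambiguous split: fall back to the invented category
    return None
```

With this layout, `standardize_1970(284)` returns the invented category, because a 1970 record coded 284 carries no information about which of the seven 1980 components it belongs to.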
7 Desirable attributes of a classification
(1) Mean wage within groups should not jump far between periods
(2) Wage variance should not jump over time
(3) Fraction of the population in this occupation should not jump around between time periods
Such jumps usually signal a categorization problem, not a real-world phenomenon. We measured these.
New criterion: One prefers that a classification not be sparse, meaning it does not have many empty occ-year cells
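The criteria above can be computed from occupation-by-year cells. The following is a minimal Python sketch under stated assumptions: the toy records are invented for illustration (not Census data), and the function names `cell_stats`, `jumps`, and `empty_cells` are hypothetical, not taken from the project's code.

```python
from statistics import mean, pvariance

# Hypothetical toy records: (year, occupation, wage). Not real Census data.
records = [
    (1960, "A", 10.0), (1960, "A", 12.0), (1960, "B", 20.0),
    (1970, "A", 11.0), (1970, "A", 13.0),
    (1980, "A", 30.0), (1980, "B", 22.0), (1980, "B", 24.0),
]


def cell_stats(records):
    """Mean wage, wage variance, and population share for each (occupation, year) cell."""
    cells, year_totals = {}, {}
    for year, occ, wage in records:
        cells.setdefault((occ, year), []).append(wage)
        year_totals[year] = year_totals.get(year, 0) + 1
    return {
        (occ, year): {
            "mean": mean(wages),
            "var": pvariance(wages),
            "share": len(wages) / year_totals[year],
        }
        for (occ, year), wages in cells.items()
    }


def jumps(stats, occ, years, field):
    """Absolute change in `field` between consecutive years in which `occ` appears."""
    present = [y for y in years if (occ, y) in stats]
    return [abs(stats[(occ, b)][field] - stats[(occ, a)][field])
            for a, b in zip(present, present[1:])]


def empty_cells(stats, occs, years):
    """Sparseness: number of occupation-year cells with no records."""
    return sum((occ, year) not in stats for occ in occs for year in years)


stats = cell_stats(records)
years = [1960, 1970, 1980]
mean_jumps_a = jumps(stats, "A", years, "mean")   # criterion (1)
var_jumps_a = jumps(stats, "A", years, "var")     # criterion (2)
share_jumps_a = jumps(stats, "A", years, "share") # criterion (3)
n_empty = empty_cells(stats, ["A", "B"], years)   # sparseness criterion
```

In the toy data, occupation "A" shows a large mean-wage jump between 1970 and 1980, and occupation "B" has an empty cell in 1970; both are the kind of signal the slide treats as a likely classification problem.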
8 Statisticians and actuaries
Counts of actuaries and statisticians in Census sample (table rows: Actuaries, Statisticians)
These are separate categories in and after 1970
In 1960 they were all in “statisticians and actuaries”
When standardizing we put all these in “statisticians”
Now we try to infer which people in this population were actuaries.
9 Statisticians and actuaries
Combined all statisticians and actuaries
Predictors of actuary rather than statistician:
  Recorded in a later year
  Employed in insurance, accounting/auditing, or professional services
  In private sector
  High salary income
  High business income, or earns mostly business income
  Employed at time of Census
  Lives in Connecticut, Minnesota, Nebraska, or Wisconsin
10 Actuaries and statisticians
A logistic regression can predict the occupation correctly 88% of the time
Impute a prediction on 1960 data
Revised counts of actuaries and statisticians after imputation (table rows: Actuaries, Statisticians)
Why work this arcane problem?
  More accurate statistician category, by later definition
  Longer time series for actuaries
  Reduces sparseness
  Builds a technique
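A figure such as "right 88% of the time" is a share of correct predictions on records where the occupation is known (here, years in which actuaries and statisticians are coded separately). A minimal Python sketch of that calculation follows; the probabilities and labels are invented for illustration, the 0.5 cutoff is an assumption, and `accuracy` is a hypothetical helper name.

```python
def accuracy(probabilities, labels, threshold=0.5):
    """Share of records whose thresholded prediction matches the known label."""
    predictions = [p > threshold for p in probabilities]
    correct = sum(pred == bool(label) for pred, label in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical fitted probabilities of "actuary" and the known labels (1 = actuary).
probs = [0.9, 0.2, 0.7, 0.4, 0.1]
is_actuary = [1, 0, 0, 0, 0]

share_correct = accuracy(probs, is_actuary)
```

Once the model is judged accurate enough on the labeled years, the same thresholded prediction is applied to the 1960 records, where the label is missing.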
11 Lawyers and judges
Combine all lawyers and judges
Exclude all private sector employees because they are all lawyers
In the remainder, predictors of judge rather than lawyer:
  Older
  Works for state government
  High salary income
  Low or no business income
  Educated less than 16 years
  Is employed at time of survey
Can get 83% accurate predictions from such a regression
12 Logit regression on Census sample
Dependent variable is 1 for judges, 0 for lawyers
Table columns: Coefficient, Std error, p-value
Table rows:
  Year
  Age
  Age-squared
  Federal government employee
  State government
  Ln(salary)
  Ln(salary) squared
  Ln(salary) cubed
  Ln(business income)
  Fraction of earned income that is business income
  Education less than 16 years
  Years of formal education
  Is employed at time of survey
  Constant
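13 Can use those coefficients in Stata
#delimit ;
/* Each b_* is a placeholder scalar for the matching coefficient in the logit table; define them before running */
gen logitindex = b_year * year
  + b_age * age + b_age2 * age * age
  + b_fed * indfed + b_state * indstate
  + b_lnw * lnwage + b_lnw2 * lnwage * lnwage + b_lnw3 * lnwage * lnwage * lnwage
  + b_lnbus * lnbus + b_busfrac * busfrac
  + b_lt16 * (educyrs<16) + b_educ * educyrs
  + b_emp * employed
  + b_cons /* constant */ ;
#delimit cr
/* back to carriage return as statement delimiter not ; */
gen logitval=exp(logitindex)/(1.0+exp(logitindex))
replace logitval=.0001 if !govtemployee /* this is a perfect predictor */
replace logitval=.0001 if !indfed & !indstate & !indlocal /* this too */
/* Stata treats missing as larger than any number, so exclude missing values explicitly */
gen assigned = logitval>.46 & !missing(logitval)
/* Now ‘assigned’ has a 1 for imputed judges */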
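The same steps (linear index, logistic transform, perfect-predictor override, cutoff) can be written as a short Python sketch. The coefficient values below are hypothetical placeholders chosen only so the example runs; they are not the fitted values from the talk, and only three predictors are included for brevity. The 0.46 cutoff and the 0.0001 override come from the Stata code on the slide.

```python
import math

# HYPOTHETICAL coefficients for illustration only; not the fitted values.
COEF = {"age": 0.05, "indstate": 1.2, "employed": 0.3}
CONSTANT = -4.0
THRESHOLD = 0.46  # cutoff used in the Stata code


def judge_probability(person):
    """Logistic transform of the linear index, with the perfect-predictor override."""
    if not person["govtemployee"]:
        return 0.0001  # non-government respondents are treated as lawyers
    index = CONSTANT + sum(COEF[name] * person[name] for name in COEF)
    return math.exp(index) / (1.0 + math.exp(index))


def is_imputed_judge(person):
    """True when the predicted probability exceeds the cutoff."""
    return judge_probability(person) > THRESHOLD


older_state_employee = {"govtemployee": True, "age": 60, "indstate": 1, "employed": 1}
private_lawyer = {"govtemployee": False, "age": 60, "indstate": 0, "employed": 1}
```

Handling the perfect predictor before the regression index is a design choice in this sketch; it gives the same result as overwriting `logitval` afterwards, as the Stata code does.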
14 Newly imputed judges
Respondents in Census samples after imputation (table rows: Lawyers, Judges)
15 Research questions for occupational time series
Hypotheses for time series of consistently-defined occupations:
  Have high tech jobs had rising earnings inequality? [yes]
  Superstars effect? [yes]
  Is nurturing work valued less (England et al)?
  Have mathematical occupations grown in size or pay?
  Measuring payoffs to skills
  Have better job-search technologies reduced inequality within job categories? (as predicted by Stigler (1960))
Researchers sometimes use only industry, not occupation, or limit the time span of a study to keep occupations consistent
16 Preliminary findings
There are occasional opportunities to impute occupations with reasonable accuracy
The resulting records have “better-classified” occupations
  Slightly more accurate (in four categories)
  Slightly less sparse (293 empty cells, not 295)
Effects in a substantive regression not focused on these categories are tiny
17 What’s next?
Combine smallest occupations
Split farmers into fewer categories
In imputations, incorporate more information from
  Dual-coded CPS for
  Dual-coded “Treiman” sample from
Visit Census categorizers in Jeffersonville, Indiana.
Make next working paper and program code available
Publish at IPUMS
Accumulate more classification systems, techniques, criteria, and experts.
New wiki of all classifications.
18 Meaning of occupation
Worker’s tasks
Worker’s function (identified e.g. by inputs and outputs)
  example: blacksmiths vs forging machine operators
  example: teachers of different subjects and ages of students
Sometimes other distinctions
  Hierarchy (apprentices, foremen, supervisors)
  Certification
  Skills
  Industry (activity of the employing organization)
To some extent these are separate labor markets, with separate job search, wage setting, and unemployment experiences.
Diagram labels: Tasks, Inputs, Outputs
19 Occupation attributes I
Strength (1-5 from DOT)
Reasoning (1-6 from DOT)
Mathematical reasoning (1-6 from DOT)
Language use (1-6 from DOT)
Duration of specific training (from DOT)
Nurturing (0/1) (England et al, 1994)
Many others, potentially
20 Occupation attributes II
% urban (e.g. doctor in rural area)
Often involves traveling (or required mobility earlier)
Rate of growth
% of immigrants
Authority (0/1) (England et al, 1994)
High tech
Regulated
Unionized
Use of machines
Involves advocacy; or repair; or negotiation