Measuring the very long, fuzzy tail in the occupational distribution
InGRID workshop New skills new jobs: Tools for harmonising the measurement of occupations
Amsterdam, 11 February 2014
Kea Tijdens
What is the very long tail?
Distribution of the NL labour force over 193 occupational groups (3-digit level ISCO-08; source: CBS Statline)
Two ways of measuring occupations in surveys
Open-ended question
  ‘What is your occupation?’ is typically asked as an open-ended survey question with office recoding, because in a country the stock of job titles may exceed 100,000 and the occupational distribution has a very long tail
Closed survey question presenting items (occupations, job titles) for self-identification
  a limited list of items (e.g. in PAPI)
  a search tree with many items (only CAPI or CAWI)
  semantic matching using a lookup database with very many items
  no list or database will cover all possible responses, because the number of possible items is not known
  gives the respondent a hint of what kind of answer the survey is looking for (> no responses at various levels of aggregation)
21 November 2019
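The lookup-database option can be illustrated with a minimal sketch. It uses plain string similarity from Python's standard library (difflib), which is a simpler stand-in for semantic matching and not the method used in the survey. The titles, the placeholder codes and the function name match_title are assumptions made for illustration only.

```python
from difflib import get_close_matches

# Hypothetical lookup database: job title -> placeholder code.
# The codes are illustrative labels, NOT real ISCO-08 codes.
LOOKUP = {
    "estates manager": "CODE_A",
    "project leader": "CODE_B",
    "police officer": "CODE_C",
    "primary school teacher": "CODE_D",
}


def match_title(answer, cutoff=0.8):
    """Return (title, code) for the closest database title, or None.

    String similarity only: this catches typos and spacing or case
    differences, not synonyms.
    """
    # Normalise case and collapse repeated whitespace before matching.
    cleaned = " ".join(answer.lower().split())
    hits = get_close_matches(cleaned, list(LOOKUP), n=1, cutoff=cutoff)
    if not hits:
        return None  # would fall through to office coding
    return hits[0], LOOKUP[hits[0]]
```

An answer that matches nothing above the cutoff returns None, which corresponds to the case where no list or database covers the response.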
Research objective
Objectives
  What percentage of respondents can identify their occupation via a search tree, and what percentage uses the text box?
  Of those using the text box, what percentage could have identified their occupation in the search tree, and which occupations are absent from the search tree?
  To what extent should the search tree’s database include the long-tail occupations > what is the optimum size of a database?
Data and methods
Data from LISS panel …
  The LISS (Longitudinal Internet Studies for the Social Sciences) panel is administered by CentERdata (Tilburg University)
  The LISS panel is a probability-based online panel
  For our project, the LISS panel repeated the WageIndicator web survey on work and wages in October 2009; selection: workers in paid employment (n = 3444)
Occupation question with search tree
  Used a 3-step search tree for the question ‘What is your occupation?’
    step 1: 23 entries, for example ‘Guards, army, police’
    step 2: 207 entries
    step 3: approx. 1,700 occupational titles, all coded ISCO-08
  Each 3rd-level list allowed respondents to tick ‘other’, followed by a text box
  Note: the search tree hierarchy does not follow the ISCO hierarchy > ISCO is designed for classification, not for self-identification
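The 3-step structure with an ‘other’ option can be sketched as a nested dictionary. This is a minimal illustration, not the WageIndicator implementation: only the entry ‘Guards, army, police’ comes from the slides, and every other entry, the names TREE, options and record_answer, and the returned record layout are hypothetical.

```python
# Hypothetical three-level tree: step 1 -> step 2 -> list of titles.
# Only 'Guards, army, police' is taken from the slides; all other
# entries are invented for illustration.
TREE = {
    "Guards, army, police": {
        "Police": ["Police officer", "Police inspector"],
        "Security": ["Security guard"],
    },
    "Education": {
        "Primary education": ["Primary school teacher"],
    },
}
OTHER = "other"


def options(step1=None, step2=None):
    """List the choices shown to the respondent at the current step."""
    if step1 is None:
        return sorted(TREE)
    if step2 is None:
        return sorted(TREE[step1])
    # Every 3rd-level list ends with 'other', which opens a text box.
    return TREE[step1][step2] + [OTHER]


def record_answer(step1, step2, choice, free_text=""):
    """Store either a coded title or the free text typed after 'other'."""
    if choice == OTHER:
        return {"coded": False, "path": (step1, step2), "text": free_text.strip()}
    if choice not in TREE[step1][step2]:
        raise ValueError(f"{choice!r} is not listed under {step1} > {step2}")
    return {"coded": True, "path": (step1, step2), "title": choice}
```

Keeping the path for ‘other’ answers means the free text still arrives with the two higher-level choices attached, which narrows down later office coding.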
About the search tree
WageIndicator web survey
  WageIndicator web survey on work and wages (started in NL in 2001) =>> continuous, volunteer survey posted on all WageIndicator websites, currently in 75 countries
  The web survey uses a search tree for self-identification of occupation, because the alternative (open question) requires office coding =>> too expensive & too time-consuming
  Database adapted to ISCO-08 in the EurOccupations project (‘07-09)
WageIndicator websites
  WageIndicator websites publish information about wages, labour law and career, attracting large numbers of visitors (20 mln in 2012)
  Survey data are used for the websites’ Salary Check, providing wage estimates per occupation (or group of occupations)
  Salary Check uses the same search tree as the web survey
  WageIndicator irregularly receives emails from visitors: ‘why is my occupation not included?’ (recently for UK: ‘estates manager’)
Use of the text box
What percentage of LISS respondents can identify their occupation via a search tree, and what percentage uses the text box?
Of those using the text box, what percentage could have identified their occupation in the search tree?

                                               n      %
Initial                                     3444   100%
Drop out                                      18     1%
Identified occ in 3rd step                  2313    67%
Ticked other                                1113    32%
… of which could have identified occ         497    14%
… of which occ was absent in search tree     600    17%
… of which unidentifiable text                16
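The percentages can be recomputed from the reported counts as a consistency check. The sketch below uses only the numbers on the slide; the variable and function names are illustrative.

```python
# Counts as reported on the slide (LISS panel, n = 3444).
total = 3444
counts = {
    "Drop out": 18,
    "Identified occ in 3rd step": 2313,
    "Ticked other": 1113,
}
other_breakdown = {
    "could have identified occ": 497,
    "occ absent in search tree": 600,
    "unidentifiable text": 16,
}


def pct(n, base):
    """Share of n in base, as a whole-number percentage."""
    return round(100 * n / base)


# The three top-level rows add up to the initial sample,
# and the three sub-rows add up to the 'Ticked other' row.
assert sum(counts.values()) == total
assert sum(other_breakdown.values()) == counts["Ticked other"]

# Shares relative to the full sample, as in the table.
shares_of_total = {k: pct(v, total) for k, v in {**counts, **other_breakdown}.items()}

# Shares relative to the 1,113 respondents who ticked 'other'.
shares_of_other = {k: pct(v, counts["Ticked other"]) for k, v in other_breakdown.items()}
```

Relative to the 1,113 respondents who ticked ‘other’, about 45% could have identified their occupation in the search tree and about 54% had an occupation that was absent from it.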
The long tail in the search tree
The 2313 respondents who ticked an occupation in the search tree chose, in total, 584 occupations from the approx. 1,700 titles
The long tail ...
The 600 respondents who ticked ‘other’ and whose job title could not be coded into the database reported, together, 479 occupational titles
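A few ratios make the length of the tail concrete. The sketch below uses only figures reported on the slides; the size of the search tree is the approximate 1,700 titles, so the first ratio is approximate as well. Variable names are illustrative.

```python
TITLES_IN_TREE = 1700      # approximate, as stated for step 3 of the tree
used_titles = 584          # titles ticked by at least one respondent
tree_respondents = 2313    # respondents who ticked a title in the tree
uncoded_titles = 479       # distinct titles absent from the database
uncoded_respondents = 600  # respondents who reported those titles

# Share of the tree's titles that were chosen at all.
share_titles_used = used_titles / TITLES_IN_TREE

# Average number of respondents per title, inside and outside the tree.
resp_per_used_title = tree_respondents / used_titles
resp_per_uncoded_title = uncoded_respondents / uncoded_titles

print(f"titles used: {share_titles_used:.0%}")
print(f"respondents per used title: {resp_per_used_title:.2f}")
print(f"respondents per uncoded title: {resp_per_uncoded_title:.2f}")
```

Roughly a third of the tree's titles were used, and the uncoded titles average only about 1.25 respondents each, so most of them were reported by a single respondent or very few.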
Conclusion
A search tree and the very long fuzzy list of occupations …
  the current search tree reduces the coding load by 2/3
  the remaining 479 uncoded occupational titles pose a challenge: adding them would increase the database to almost 2,300 titles
  among these 479, some titles are too aggregated and need specification, e.g. project leader
  a joint search tree + semantic matching becomes a condition for using self-identification with a search tree in a survey
The end
Thank you for your attention!
Comments invited: K.G.Tijdens@uva.nl
For more information: www.wageindicator.org/main/Wageindicatorfoundation/researchlab/occupation-data-base