Published by Arthur White. Modified over 6 years ago.
1
ESSNet Pilot: Web Scraping for Job Vacancy Statistics
2
Rationale
Comparison: Current Official Estimates (Survey) vs. Web data
Frequency: Monthly vs. Real-time?
Breakdowns: Industry Sector, Enterprise Size, Job type / skills, Sub-national, National Totals
Web data: More frequent, More timely, More granular, Cheaper???
3
Participants (SGA-1): United Kingdom (lead), Germany, Sweden, Slovenia, Italy, Greece
4
Broad Approach
Understand the landscape of web-based job vacancy data in each country.
Focus first on job portals; later explore enterprise websites.
Try to replicate existing outputs, then investigate opportunities to produce new types of output.
Develop specific approaches that are appropriate to the circumstances in each country.
Develop common approaches where possible.
5
Data Access
1. Web scraping Job Portals
2. Job Portal APIs
3. Web scraping Enterprise Websites
4. Public Sector Agencies
5. Commercial Suppliers
6
Job Portals – Evaluation Criteria
What
1. Position
2. Occupation
3. Education
4. Type of job (temporary or permanent, full-time or part-time)
When
5. Date of advertised vacancy
6. Date of application deadline
7. Date to fill a vacancy
Where
8. Location of job
Who
9. Direct employer or agency
10. Economic activity of employer (NACE)
7
Classification of Job Portals
1. Job Boards
2. Job Search Engines
3. Hybrid
8
Conceptual Definitions
Job Ad Job Vacancy
11
Conceptual Definitions
Job Ad, Job Vacancy, “Ghost” Vacancy
12
Coverage Issues
Target population: all job vacancies
Employing business identifiable
Advertised through agency
Advertised on a job portal
Advertised on enterprise website
‘Ghost’ vacancies
13
Assessing Coverage
Diagram labels: Job Portal (x3), Enterprise, Enterprise Matching, Business Register, Job Vacancy Survey
Matching issues:
Advertising employer differs from reporting unit
Trading name differs from legal name
Duplicate names on business register
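To illustrate two of the matching issues on this slide (trading name differs from legal name, duplicate names on the business register), here is a minimal Python sketch. It is not the project's method. The register layout (`unit_id`, `legal_name`, `trading_names`), the suffix list, and all company names are hypothetical and chosen only for illustration. The case where the advertising employer differs from the reporting unit is not handled.

```python
import re
from collections import defaultdict

# Illustrative legal-form suffixes only; a real list would be country-specific.
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "llp", "gmbh", "ab"}


def normalise_name(name):
    """Lowercase, drop punctuation, and strip trailing legal-form suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def build_index(register):
    """Map each normalised legal or trading name to the set of unit ids using it."""
    index = defaultdict(set)
    for unit in register:
        names = [unit["legal_name"], *unit["trading_names"]]
        for name in names:
            index[normalise_name(name)].add(unit["unit_id"])
    return index


def match_employer(advertised_name, index):
    """Return (status, unit_ids); status is 'matched', 'ambiguous' or 'unmatched'."""
    ids = sorted(index.get(normalise_name(advertised_name), set()))
    if len(ids) == 1:
        return ("matched", ids)
    if len(ids) > 1:
        # Duplicate names on the register: leave for clerical review.
        return ("ambiguous", ids)
    return ("unmatched", ids)


# Hypothetical register extract.
register = [
    {"unit_id": "A1", "legal_name": "Acme Holdings Limited", "trading_names": ["Acme Widgets"]},
    {"unit_id": "B2", "legal_name": "Northern Bakery Ltd", "trading_names": []},
    {"unit_id": "C3", "legal_name": "Northern Bakery Limited", "trading_names": []},
]
index = build_index(register)
print(match_employer("ACME Widgets Ltd.", index))
print(match_employer("Northern Bakery", index))
```

Exact matching on normalised names is deliberately conservative. Names that remain unmatched or ambiguous would still need fuzzy matching or manual review.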
14
Removing Duplicates
Pipeline: Job Portal -> Concatenated list -> Deduplicate -> Final deduplicated list
1. Create common variable list: Job_title, Job_description, Location_city, Location_region, Date_posted, Enterprise name
2. Clean data, e.g. " .NET Developer - Stoke-On-Trent - £35-£40K "
3. Run dedup to produce candidate matches
4. Active learning step (manual coding of > 100 records)
5. Rerun to automatically remove “duplicate” job ads
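Steps 2 and 3 can be sketched in Python using only the standard library. This does not reproduce the dedup tool or the active learning step named on the slide. It swaps in a simple string-similarity comparison from `difflib`. The " - " separator convention, the 0.85 threshold, the blocking on city, and the sample ads are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def clean_title(raw):
    """Keep the text before the first ' - ' separator, then normalise case and spacing."""
    # Assumption: location and salary are appended after ' - ' separators.
    title = raw.strip().split(" - ")[0]
    return re.sub(r"\s+", " ", title).strip().lower()


def candidate_pairs(ads, threshold=0.85):
    """Return index pairs of ads in the same city whose cleaned titles are similar."""
    cleaned = [
        (clean_title(ad["Job_title"]), ad["Location_city"].strip().lower())
        for ad in ads
    ]
    pairs = []
    for i, j in combinations(range(len(ads)), 2):
        (title_i, city_i), (title_j, city_j) = cleaned[i], cleaned[j]
        if city_i != city_j:
            continue  # only compare ads within the same city
        if SequenceMatcher(None, title_i, title_j).ratio() >= threshold:
            pairs.append((i, j))
    return pairs


# Hypothetical ads scraped from two portals.
ads = [
    {"Job_title": " .NET Developer - Stoke-On-Trent - £35-£40K ", "Location_city": "Stoke-On-Trent"},
    {"Job_title": ".NET developer", "Location_city": "stoke-on-trent"},
    {"Job_title": "Staff Nurse", "Location_city": "Stoke-On-Trent"},
]
print(clean_title(ads[0]["Job_title"]))
print(candidate_pairs(ads))
```

The pairs returned here are only candidates. In the workflow on the slide, a manually coded sample (step 4) is what lets the tool decide which candidates are true duplicates before the automatic rerun (step 5).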
15
Conclusion
Job portal data is very rich, but complex and messy.
Difficult to align to established statistical concepts.
Need to understand coverage issues and how to tackle them.
Making progress, but a long way to go.
16
Future Steps
Produce measures of job portal coverage.
Explore approaches for enhancing coverage (including web scraping enterprise websites).
Develop methods for combining vacancy survey and job ads from the web.
Develop methods for feature extraction and coding/classifying textual data (to enrich existing survey data).
Explore other uses of online job vacancy data.
17
Future Steps
Additional ESS partners joining from July 2017: Portugal, Belgium, France, Denmark
… the beginnings of a longer-term network?
18
Thank you for your attention!