ESSNet Pilot: Web Scraping for Job Vacancy Statistics

1 ESSNet Pilot: Web Scraping for Job Vacancy Statistics

2 Rationale
Current Official Estimates (Survey): monthly frequency; breakdowns by industry sector and enterprise size; national totals.
Web data: real-time frequency?; breakdowns by job type / skills; sub-national.
Potential gains from web data: more frequent, more timely, more granular, cheaper???

3 Participants (SGA-1) United Kingdom (lead) Germany Sweden Slovenia Italy Greece

4 Broad Approach Understand the landscape of web-based job vacancy data in each country Focus first on job portals, later explore enterprise websites Try to replicate existing outputs, then investigate opportunities to produce new types of output. Develop specific approaches that are appropriate to the circumstances in each country Develop common approaches where possible

5 Data Access
1. Web scraping Job Portals
2. Job Portal APIs
3. Web scraping Enterprise Websites
4. Public Sector Agencies
5. Commercial Suppliers

6 Job Portals – Evaluation Criteria
What: 1. Position; 2. Occupation; 3. Education; 4. Type of job (temporary or permanent, full-time or part-time)
When: 5. Date of advertised vacancy; 6. Date of application deadline; 7. Date to fill a vacancy
Where: 8. Location of job
Who: 9. Direct employer or agency; 10. Economic activity of employer (NACE)

7 Classification of Job Portals
1. Job Boards
2. Job Search Engines
3. Hybrid

8 Conceptual Definitions
Job Ad Job Vacancy

11 Conceptual Definitions
Job Ad, Job Vacancy, “Ghost” Vacancy

12 Coverage Issues
Target Population: All job vacancies
‘Ghost’ Vacancies
Employing business identifiable
Advertised through agency
Advertised on a job portal
Advertised on enterprise website

13 Assessing Coverage
Diagram labels: Job Portal (×3), Enterprise, Enterprise Matching, Business Register, Job Vacancy Survey
Matching issues:
Advertising employer differs from reporting unit
Trading name differs from legal name
Duplicate names on business register
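The name-matching issues on this slide can be illustrated with a small sketch. It is not the pilot's method: it assumes a hypothetical register held as a dict of unit ID to legal name, an illustrative (incomplete) list of legal-form suffixes, and a simple string-similarity threshold from the standard library's `difflib`. It shows why duplicate names on the register return several candidates that need review. It would not resolve a trading name that is unrelated to the legal name.

```python
import re
from difflib import SequenceMatcher

# Assumption: a short, illustrative set of legal-form suffixes; a real register needs more.
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "llp", "gmbh", "ab", "spa"}


def normalise_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal-form suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def match_to_register(advertiser: str, register: dict, threshold: float = 0.85):
    """Return (unit_id, score) for every register entry similar to the advertiser name."""
    target = normalise_name(advertiser)
    hits = []
    for unit_id, legal_name in register.items():
        score = SequenceMatcher(None, target, normalise_name(legal_name)).ratio()
        if score >= threshold:
            hits.append((unit_id, round(score, 2)))
    # Best matches first; ties keep register order.
    return sorted(hits, key=lambda h: -h[1])


# Hypothetical register entries, invented for illustration only.
register = {
    "R001": "Acme Engineering Limited",
    "R002": "ACME ENGINEERING LTD.",
    "R003": "Northern Bakeries plc",
}
print(match_to_register("Acme Engineering", register))
```

Here two register units match the same advertiser name equally well, which is the "duplicate names" case: the ad cannot be assigned to a single reporting unit without further information.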

14 Removing Duplicates
Workflow: Job Portal → Concatenated list → Deduplicate → Final deduplicated list
1. Create common variable list: Job_title, Job_description, Location_city, Location_region, Date_posted, Enterprise name
2. Clean data, e.g. " .NET Developer - Stoke-On-Trent - £35-£40K "
3. Run dedup to produce candidate matches
4. Active learning step (manual coding of > 100 records)
5. Rerun to automatically remove “duplicate” job ads
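The cleaning and candidate-matching steps can be sketched as follows. This is a simplified stand-in, not the tool used in the pilot: it replaces the trained, active-learning matcher with a fixed similarity threshold from `difflib`, and it assumes that portals append location and salary to the title separated by " - ". The field name `Enterprise_name`, the threshold, and the sample records are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def clean_title(raw: str) -> str:
    """Strip location and salary fragments appended to a job title."""
    # Assumption: fragments are separated by " - " and the first one is the title.
    title = raw.strip().split(" - ")[0]
    # Collapse repeated whitespace and lowercase for comparison.
    return re.sub(r"\s+", " ", title).strip().lower()


def record_key(ad: dict) -> str:
    """Build one comparison string from a few of the common variables."""
    return " | ".join([
        clean_title(ad["Job_title"]),
        ad["Enterprise_name"].strip().lower(),
        ad["Location_city"].strip().lower(),
    ])


def candidate_matches(ads: list, threshold: float = 0.9) -> list:
    """Return (i, j, score) for every pair of ads whose keys are similar enough."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(ads), 2):
        score = SequenceMatcher(None, record_key(a), record_key(b)).ratio()
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs


# Hypothetical ads from two portals, invented for illustration only.
ads = [
    {"Job_title": " .NET Developer - Stoke-On-Trent - £35-£40K ",
     "Enterprise_name": "Acme Ltd", "Location_city": "Stoke-On-Trent"},
    {"Job_title": ".NET developer",
     "Enterprise_name": "Acme Ltd", "Location_city": "Stoke-on-Trent"},
    {"Job_title": "Staff Nurse",
     "Enterprise_name": "Example Care", "Location_city": "Leeds"},
]
print(candidate_matches(ads))  # [(0, 1, 1.0)]
```

Comparing every pair is quadratic in the number of ads, so a concatenated list of realistic size would need blocking (for example, comparing only ads in the same city) before a step like this. The manual coding step on the slide would then replace the fixed threshold with a learned decision rule.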

15 Conclusion
Job portal data is very rich, but complex and messy
Difficult to align to established statistical concepts
Need to understand coverage issues and how to tackle them
Making progress, but a long way to go

16 Future Steps
Produce measures of job portal coverage
Explore approaches for enhancing coverage (including web scraping enterprise websites)
Develop methods for combining vacancy survey and job ads from the web
Develop methods for feature extraction and coding/classifying textual data (to enrich existing survey data)
Explore other uses of online job vacancy data

17 Future Steps
Additional ESS partners joining from July 2017: Portugal, Belgium, France, Denmark
… the beginnings of a longer-term network?

18 Thank you for your attention!

