Published by Rocco Cott. Modified over 10 years ago.
1
On the use of internet robots for official statistics
Olav ten Bosch
MSIS, Dublin, 14-16 April 2014
2
Overview
– Why internet as a data source (IAD)?
– Internet robots, how do they work?
– Applications:
  - Airline tickets
  - Housing market
  - Clothing
  - Robot-assisted data collection
– Conclusion
3
Why IAD? (1)
Administrative sources
– Tax, social security services
– Municipalities / provinces
– Supermarkets
Surveys
Internet sources
Less!!!
Faster, better, more efficient
New indicators
4
5
Why IAD? (2)
Internet sources
– Internet prices for CPI?
– Real estate sites for housing statistics?
– Internet vacancies for job statistics?
– Social media sentiment for consumer confidence?
– Trade in second-hand goods as economic indicators?
– Travel activity for tourism statistics?
Which content is original, reliable, stable, representative and accessible?
6
Robots / crawlers / bots / spiders / scrapers: how do they work? (1)
[Diagram: You → Commands → Browser → Requests → Internet → Website; the website returns code, images, style, data, etc.; the browser shows graphical markup to you]
7
Robots / crawlers / bots / spiders / scrapers: how do they work? (2)
[Diagram: Robot / spider / crawler → Requests, Navigation → Internet → Website; the website returns code, images, style, data, etc.; the robot delivers Data to you]
8
Robots / crawlers / bots / spiders / scrapers: how do they work? (3)
[Diagram: Robot / spider / crawler → Requests, Navigation → Internet → Website; the website returns code, images, style, data, etc.; additional labels: Monitor actively, Data]
Generic software for:
- site navigation
- product details
- monitoring
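The robot flow on these slides (request a page, navigate its markup, keep only the data) can be illustrated with a minimal sketch. The slides do not show the actual software, so everything below is an assumption for illustration: the sample page, the `product`, `name` and `price` class names, and the `ProductParser` and `extract_products` helpers are hypothetical. The page is embedded as a string so the example runs without network access.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real robot would first request it from the site,
# for example with urllib.request.urlopen(url).read().decode().
SAMPLE_PAGE = """
<ul>
  <li class="product"><span class="name">Ticket A</span><span class="price">129.00</span></li>
  <li class="product"><span class="name">Ticket B</span><span class="price">89.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects (name, price) pairs from spans marked with assumed class names."""

    def __init__(self):
        super().__init__()
        self.field = None    # field currently being read: "name" or "price"
        self.current = {}    # partially filled product record
        self.products = []   # completed (name, price) tuples

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self.field = css_class

    def handle_data(self, data):
        # Text outside a name/price span (markup, whitespace) is ignored.
        if self.field:
            self.current[self.field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self.field = None
        elif tag == "li" and self.current:
            # End of a list item: store the finished record and start a new one.
            self.products.append(
                (self.current["name"], float(self.current["price"]))
            )
            self.current = {}


def extract_products(html):
    """Turn page markup into a list of (name, price) observations."""
    parser = ProductParser()
    parser.feed(html)
    return parser.products
```

Calling `extract_products(SAMPLE_PAGE)` yields one tuple per list item. Where a browser renders the markup for a person, the robot discards everything except the fields of interest.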
9
Airline tickets (1)
Robot collection versus manual collection
10
Airline tickets (2)
Price of a ticket over time
11
Housing market (1)
12
Housing market (2)
The dynamics of the underlying database become visible
13
Clothing (1)
14
Clothing (2)
2 sites: very volatile data
Seasonal pattern
Challenges:
- from volatile data to stable statistics
- how to classify multiple, less structured data sources
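The slide does not say how volatile observations are turned into stable statistics. As one possible illustration only, a trailing rolling median damps isolated spikes in a price series; the `rolling_median` helper, the window size and the sample values are assumptions, not the method used in the presentation.

```python
from statistics import median


def rolling_median(values, window=3):
    """Return the median of each trailing window of observations.

    Early positions use as many observations as are available so far.
    """
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(median(values[start:i + 1]))
    return smoothed


# Hypothetical daily prices with one outlier (50).
prices = [10, 12, 50, 11, 13]
smoothed = rolling_median(prices, window=3)
```

The outlier no longer dominates the smoothed series, although a real seasonal pattern, as seen for clothing, would need a method that preserves it.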
15
Robot-assisted data collection (1)
– Use case: few price observations on many sites
– Example: price of a cinema ticket
– Robot tool to automatically check whether prices have changed
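The price check described on this slide can be sketched as a comparison between the previously recorded price per site and the newly observed one, so that a human collector only needs to review the sites that differ. The `changed_prices` helper, the site names and the prices are hypothetical; the slides do not describe the actual tool.

```python
def changed_prices(previous, current):
    """Return {site: (old_price, new_price)} for sites whose price differs.

    Sites seen for the first time are reported with an old price of None.
    """
    changes = {}
    for site, new_price in current.items():
        old_price = previous.get(site)
        if old_price != new_price:
            changes[site] = (old_price, new_price)
    return changes


# Hypothetical observations of a cinema ticket price on three sites.
previous = {"cinema-a": 9.5, "cinema-b": 11.0}
current = {"cinema-a": 9.5, "cinema-b": 12.0, "cinema-c": 8.0}
to_review = changed_prices(previous, current)
```

Here only `cinema-b` and the new site `cinema-c` are flagged for review; unchanged sites need no manual visit.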
16
Robot-assisted data collection (2)
17
Conclusion
– Using the internet as a data source, we can measure statistical phenomena in a completely different way
– It is powerful to combine fast internet data with reliable (but slower) administrative data
– We should redesign statistics with the possibilities of internet data in mind
Challenges:
– Legal framework
– The internet changes continuously: how to turn volatile data sources into reliable statistics?
– We need advanced statistical methods, processes and IT