On the use of internet robots for official statistics
Olav ten Bosch
MSIS, Dublin, 14-16 April 2014



Overview
– Why internet as a data source (IAD)?
– Internet robots, how do they work?
– Applications:
   Airline tickets
   Housing market
   Clothing
   Robot-assisted data collection
– Conclusion

Why IAD? (1)
Administrative sources
– Tax, social security services
– Municipalities / provinces
– Supermarkets
Surveys
Internet sources
Less!!!
Faster, better, more efficient
New indicators


Why IAD? (2)
Internet sources
– Internet prices for CPI?
– Real estate sites for housing statistics?
– Internet vacancies for job statistics?
– Social media sentiment for consumer confidence?
– Trade in second-hand goods as economic indicators?
– Travel activity for tourism statistics?
Which content is original, reliable, stable, representative and accessible?

Robots / crawlers / bots / spiders / scrapers: how do they work? (1)
[Diagram: You give commands to a browser. The browser sends requests over the internet to a website, receives code, images, style, data, etc., and shows you the graphical markup.]

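Robots / crawlers / bots / spiders / scrapers: how do they work? (2)
[Diagram: A robot / spider / crawler takes the place of the browser. It sends requests and navigation over the internet to the website, receives code, images, style, data, etc., and delivers data to you.]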

Robots / crawlers / bots / spiders / scrapers: how do they work? (3)
[Diagram: As in (2), the robot / spider / crawler sends requests and navigation to the website and returns data, and it is now monitored actively.]
Generic software for:
– site navigation
– product details
– monitoring
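The request-and-extract cycle described on these slides can be illustrated with a minimal sketch. It is not the software used at Statistics Netherlands. The `price` class name, the `example-statistics-robot` user agent and the sample page are assumptions made for illustration only, and real sites usually need site-specific navigation and product-detail logic.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class PriceExtractor(HTMLParser):
    """Collects the text inside elements whose class list contains 'price'.

    Assumption: the (hypothetical) site marks prices with class="price"
    and does not nest further tags inside that element.
    """

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        # Any closing tag ends the price element in this simplified sketch.
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def fetch(url, user_agent="example-statistics-robot"):
    """Request step: download one page, identifying the robot to the site."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_prices(html):
    """Data step: turn the returned markup into a list of price strings."""
    parser = PriceExtractor()
    parser.feed(html)
    parser.close()
    return parser.prices


# Hypothetical page content, so the sketch runs without network access.
SAMPLE_PAGE = (
    '<ul><li>Ticket A <span class="price">EUR 9.50</span></li>'
    '<li>Ticket B <span class="price">EUR 11.00</span></li></ul>'
)

print(extract_prices(SAMPLE_PAGE))
```

In use, `fetch` would supply the markup that `extract_prices` processes. Keeping the two steps separate lets the extraction logic be tested on stored pages, which helps when a site changes its layout.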

Airline tickets (1) Robot collection versus manual collection

Airline tickets (2) Price of a ticket over time

Housing market (1)

Housing market (2)
The dynamics of the underlying database become visible

Clothing (1)

Clothing (2)
2 sites: very volatile data
Seasonal pattern
Challenges:
– from volatile data to stable statistics
– how to classify multiple less structured data sources

Robot-assisted data collection (1)
– Use case: few price observations on many sites
– Example: price of a cinema ticket
– Robot tool to automatically check whether prices have changed
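The change check mentioned above could look roughly like the following sketch. It assumes that the robot has already produced one price per site, and it compares those with the last stored observations so that a human collector only needs to look at sites that changed or could not be read. The function name `check_prices`, the site names and the prices are all hypothetical.

```python
def check_prices(previous, current):
    """Compare the last known price per site with the newly scraped price.

    previous: dict mapping site name to the last stored price string
    current:  dict mapping site name to the price string found today,
              or None / missing when the robot could not find a price
    """
    report = {"changed": [], "unchanged": [], "needs_review": []}
    for site, old_price in previous.items():
        new_price = current.get(site)
        if new_price is None:
            # The page layout may have changed: leave this to a human.
            report["needs_review"].append(site)
        elif new_price != old_price:
            report["changed"].append((site, old_price, new_price))
        else:
            report["unchanged"].append(site)
    return report


# Hypothetical observations for three cinema sites.
last_week = {"cinema-a": "9.50", "cinema-b": "11.00", "cinema-c": "8.00"}
today = {"cinema-a": "9.50", "cinema-b": "11.50"}

result = check_prices(last_week, today)
print(result["changed"])
print(result["needs_review"])
```

With many sites and few observations per site, most entries land in `unchanged`, so manual effort goes only to the short `changed` and `needs_review` lists.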

Robot-assisted data collection (2)

Conclusion
– Using the internet as a data source, we can measure statistical phenomena in a completely different way
– It is powerful to combine fast internet data with reliable (but slower) administrative data
– We should redesign statistics with the possibilities of internet data in mind
Challenges:
– Legal framework
– The internet changes continuously: how to turn volatile data sources into reliable statistics?
– We need advanced statistical methods, processes and IT