1
Strategies for collecting prices on the Internet
Olav ten Bosch June 20th 2013
2
Content
Why internet as a data source (IAD)?
Internet robots, how do they work?
Examples
Conclusion
3
Why IAD?
Administrative sources: tax, social security services, municipalities / provinces, supermarkets
Surveys
4
Why IAD?
Faster, better, more efficient
Internet sources
Administrative sources: tax, social security services, municipalities / provinces, supermarkets …
Surveys
New indicators
Less!!!
6
Google Trends (1) Search on “fever” from the Netherlands 2004 - today
(31 May 2013)
7
Google Trends (2) Search on “fever” from the Netherlands Last 90 days
(31 May 2013)
8
Original Content No added value ? Content enrichment
9
Robots / crawlers / bots / spiders / scrapers: how do they work? (1)
[Diagram: You, Browser, Internet, Website; labels: requests, commands, graphical markup, code, figures, style, data, etc.]
10
5 March 2013 - Internet Robots at CBS
11
Robots / crawlers / bots / spiders / scrapers: how do they work? (3)
[Diagram: Robot / spider / crawler (not you), Internet, Website; labels: navigation, requests, commands, graphical markup, code, figures, style, data, etc.; output: data]
12
Robots / crawlers / bots / spiders / scrapers: how do they work? (4)
[Diagram: Robot / spider / crawler (not you), Internet, Website; labels: navigation, requests, commands, graphical markup, code, figures, style, data, etc.; output: data; monitor actively]
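The robot in these diagrams receives the same markup a browser would, but extracts data from it instead of rendering it, and has to be monitored because the site can change. A minimal sketch of that idea follows, using only the Python standard library. The `price` class name, the `SAMPLE_PAGE` markup and the `monitor` threshold are illustrative assumptions, not taken from the presentation, and no real website is contacted.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects the text of every element whose class attribute contains 'price'.

    The class name 'price' is a hypothetical convention for this sketch.
    """

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        # Any closing tag ends the current price element in this simple sketch.
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def extract_prices(html):
    """Return the price strings found in one page of markup."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices


def monitor(html, min_expected=1):
    """Return (prices, ok); ok is False when fewer items than expected are found,
    which may indicate that the site layout has changed."""
    prices = extract_prices(html)
    return prices, len(prices) >= min_expected


# Hypothetical markup standing in for a downloaded page.
SAMPLE_PAGE = (
    '<ul><li><span class="price">12.50</span></li>'
    '<li><span class="price">8.00</span></li></ul>'
)
```

Calling `monitor` on every downloaded page gives a cheap signal that the extraction rule no longer matches the site.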
13
Robots / crawlers / bots / spiders / scrapers: how do they work? (5)
Many sites have the same structure / pattern:
  Search (e.g. region / category / price)
  List of results, 1 or more pages (previous / next)
  Short description for each item
  Click to go to detail view of item
Sites do have differences:
  Dynamics: “births” and “deaths” of items
  Comparability of items / articles / objects
  Categories (brands, colors, sizes)
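The shared pattern (result pages linked by "next") and the item dynamics ("births" and "deaths") can be sketched as below. This is an illustration under stated assumptions: `FAKE_SITE`, its page keys and the `items` / `next` fields are hypothetical stand-ins for a real site, and `fetch` is any function that returns one parsed result page.

```python
def crawl_result_pages(fetch, first_page):
    """Follow 'next' links from the first result page and collect item ids."""
    items, page, seen = [], first_page, set()
    while page is not None and page not in seen:  # 'seen' guards against link loops
        seen.add(page)
        result = fetch(page)
        items.extend(result["items"])
        page = result.get("next")
    return items


def births_and_deaths(yesterday, today):
    """Compare two daily snapshots of item ids.

    Returns (births, deaths): items that appeared and items that disappeared.
    """
    y, t = set(yesterday), set(today)
    return sorted(t - y), sorted(y - t)


# Hypothetical in-memory site: two result pages linked by a 'next' reference.
FAKE_SITE = {
    "results?page=1": {"items": ["A", "B"], "next": "results?page=2"},
    "results?page=2": {"items": ["C"], "next": None},
}

all_items = crawl_result_pages(FAKE_SITE.__getitem__, "results?page=1")
births, deaths = births_and_deaths(all_items, ["B", "C", "D"])
```

Here `all_items` is `["A", "B", "C"]`, and comparing it with the next day's snapshot gives one birth (`D`) and one death (`A`).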
14
Housing market (1)
15
Housing market (2) Difference in update speed between 2 housing sites calculated from robot data
Difference in days between the appearance of objects on site 1 versus site 2
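One way such a difference could be computed from robot data is sketched below. It assumes that each robot records the first date on which it saw an object and that objects on the two sites can be matched by a common key; both the matching key and the sample dates are hypothetical, and the presentation does not state how the matching was actually done.

```python
from datetime import date
from statistics import median


def appearance_lag_days(first_seen_site1, first_seen_site2):
    """Days between first appearance on site 1 and on site 2, per matched object.

    Positive values mean the object appeared on site 1 first.
    Objects seen on only one site are ignored.
    """
    common = first_seen_site1.keys() & first_seen_site2.keys()
    return {
        key: (first_seen_site2[key] - first_seen_site1[key]).days
        for key in sorted(common)
    }


# Hypothetical first-seen dates collected by two robots.
site1 = {"obj1": date(2013, 1, 1), "obj2": date(2013, 1, 5)}
site2 = {"obj1": date(2013, 1, 3), "obj2": date(2013, 1, 4), "obj3": date(2013, 1, 9)}

lags = appearance_lag_days(site1, site2)
typical_lag = median(lags.values())
```

A histogram of the values in `lags` would give the kind of distribution the slide describes.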
16
Airline tickets (2010)
17
Airline tickets (2010)
18
Airline tickets (2010)
19
Airline tickets (2010)
? Many differences
Both robots see high prices
Robot2 initialization phase
20
Airline tickets (2010)
21
Clothing:
22
Clothing: Site 1: 15 months, daily, very volatile
Site 2: 8 months, items per day, more stable
23
Clothing: from volatile data to statistics
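The presentation does not say which method turns the volatile daily series into a statistic. Purely as an illustration of the general idea, the sketch below damps single-day outliers with a trailing rolling median; the window length and the sample values are assumptions, and this should not be read as the method used in the pilot.

```python
from statistics import median


def rolling_median(values, window=3):
    """Median over a trailing window; shorter windows are used at the start."""
    return [
        median(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]


# Hypothetical daily average prices with one outlier on day 2.
daily_prices = [10, 50, 12, 11, 13]
smoothed = rolling_median(daily_prices, window=3)
```

With these sample values the outlier of 50 still affects the second day, where only two observations are available, but no longer dominates once the window is full.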
24
Pilot for EGR
Wikipedia as a secondary data source?
Wikipedia: company info for businesses
25
Cinema tickets: little information on many sites
27
Conclusion
IAD is useful to reduce response burden and for innovation
Many objects on few sites => generic robot software
Few objects on many sites => tool for semi-automated price collection
Legislation: we operate as transparently as possible
Challenges:
  The internet changes continuously!!!
  Which content is original, which is stable?
  From volatile data sources to stable statistics
We need advanced statistical methods, processes and IT