Strategies for collecting prices on Internet


1 Strategies for collecting prices on Internet
Olav ten Bosch June 20th 2013

2 Content
Why internet as a data source (IAD)?
Internet robots: how do they work?
Examples
Conclusion

3 Why IAD?
Administrative sources: tax, social security services, municipalities / provinces, supermarkets
Surveys

4 Why IAD?
Internet sources: faster, better, more efficient; new indicators
Administrative sources: tax, social security services, municipalities / provinces, supermarkets
Surveys: less!!!

5

6 Google Trends (1) Searches for “fever” from the Netherlands, 2004 to today
(31 May 2013)

7 Google Trends (2) Searches for “fever” from the Netherlands, last 90 days
(31 May 2013)

8 Original content. No added value? Content enrichment

9 Robots / crawlers / bots / spiders / scrapers: how do they work? (1)
[Diagram labels: You, Browser, Internet, Requests, Website, Commands (code, figures, style, data, etc.), Graphical markup]

10 5 March 2013 - Internet Robots at CBS

11 Robots / crawlers / bots / spiders / scrapers: how do they work? (3)
[Diagram labels: Robot / spider / crawler (not you), Navigation, Internet, Requests, Website, Commands (code, figures, style, data, etc.), Graphical markup, Data]

12 Robots / crawlers / bots / spiders / scrapers: how do they work? (4)
[Diagram labels: Robot / spider / crawler (not you), Navigation, Internet, Requests, Website, Commands (code, figures, style, data, etc.), Graphical markup, Data, Monitor actively]
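The robot flow in the diagrams above (navigate, request a page, read the markup, keep the data, monitor actively) can be sketched roughly as follows. This is a minimal illustration, not the software used in the presentation: the in-memory `FAKE_SITE`, its markup, the `item` and `next` class names and the `data-price` attribute are all invented so the example runs without network access.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "website": path -> HTML. A real robot would issue
# HTTP requests instead; the markup and class names here are invented.
FAKE_SITE = {
    "/search?page=1": '<ul><li class="item" data-price="250000">House A</li>'
                      '<li class="item" data-price="310000">House B</li></ul>'
                      '<a class="next" href="/search?page=2">next</a>',
    "/search?page=2": '<ul><li class="item" data-price="199000">House C</li></ul>',
}


class ListingParser(HTMLParser):
    """Collects (name, price) pairs and the 'next' link from one result page."""

    def __init__(self):
        super().__init__()
        self.items = []
        self.next_url = None
        self._price = None  # set while we are inside an item element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "item":
            self._price = int(attrs["data-price"])
        elif tag == "a" and attrs.get("class") == "next":
            self.next_url = attrs.get("href")

    def handle_data(self, data):
        if self._price is not None and data.strip():
            self.items.append((data.strip(), self._price))
            self._price = None


def crawl(start_url, fetch=FAKE_SITE.get, max_pages=50):
    """Follow 'next' links from start_url and return all extracted items."""
    results, url, pages = [], start_url, 0
    while url and pages < max_pages:
        html = fetch(url)
        if html is None:
            raise RuntimeError(f"page not found: {url}")
        parser = ListingParser()
        parser.feed(html)
        parser.close()
        if not parser.items:
            # Active monitoring: an empty page may mean the site layout changed.
            raise RuntimeError(f"no items extracted from {url}; check the site layout")
        results.extend(parser.items)
        url = parser.next_url
        pages += 1
    return results


items = crawl("/search?page=1")
```

Raising an error on an empty result page is one simple form of active monitoring: the robot stops and asks for attention instead of silently recording nothing.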

13 Robots / crawlers / bots / spiders / scrapers: how do they work? (5)
Many sites have the same structure / pattern:
Search (e.g. region / category / price)
List of results, one or more pages (previous / next)
Short description for each item
Click to go to the detail view of an item
Sites do have differences:
Dynamics: “births” and “deaths” of items
Comparability of items / articles / objects; categories (brands, colors, sizes)
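The “births” and “deaths” of items can be made concrete by comparing two snapshots of the same site. The sketch below assumes each item has a stable identifier across days; the identifiers and the helper name `births_and_deaths` are invented for illustration.

```python
def births_and_deaths(previous, current):
    """Compare two snapshots of item identifiers collected from the same site."""
    previous, current = set(previous), set(current)
    return {
        "births": sorted(current - previous),     # new today
        "deaths": sorted(previous - current),     # gone today
        "survivors": sorted(previous & current),  # present on both days
    }


# Hypothetical snapshots from two consecutive days.
day1 = ["shirt-01", "shirt-02", "jeans-07"]
day2 = ["shirt-02", "jeans-07", "coat-11"]
changes = births_and_deaths(day1, day2)
```

Only the survivors can be compared on price directly; births and deaths are what make the raw data volatile.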

14 Housing market (1)

15 Housing market (2) Difference in update speed between two housing sites, calculated from robot data
Difference in days between the appearance of objects on site 1 versus site 2
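A difference in update speed of this kind could be computed along the following lines. This is a sketch under assumptions: the slides do not say how objects were matched across sites, so a shared object key is assumed here, and all dates are made-up illustration data.

```python
from datetime import date
from statistics import median

# Hypothetical first-seen dates per object, as recorded by a robot on each site.
first_seen_site1 = {
    "obj-1": date(2013, 1, 3),
    "obj-2": date(2013, 1, 10),
    "obj-3": date(2013, 1, 12),
    "obj-4": date(2013, 1, 15),  # never appears on site 2
}
first_seen_site2 = {
    "obj-1": date(2013, 1, 5),
    "obj-2": date(2013, 1, 10),
    "obj-3": date(2013, 1, 9),
}


def appearance_lags(site_a, site_b):
    """Days by which each shared object appears on site_b later than on site_a."""
    shared = site_a.keys() & site_b.keys()
    return {key: (site_b[key] - site_a[key]).days for key in shared}


lags = appearance_lags(first_seen_site1, first_seen_site2)
typical_lag = median(lags.values())
```

A positive lag means site 1 listed the object first; a negative lag means site 2 was faster. Objects seen on only one site are left out of the comparison.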

16 Airline tickets (2010)

17 Airline tickets (2010)

18 Airline tickets (2010)

19 Airline tickets (2010)
[Chart annotations: “?”; many differences; both robots see high prices; Robot 2 initialization phase]

20 Airline tickets (2010)

21 Clothing:

22 Clothing: Site 1: 15 months, daily, very volatile
Site 2: 8 months, items per day, more stable

23 Clothing: from volatile data to statistics

24 Pilot for EGR: Wikipedia as a secondary data source?
Wikipedia: company info for businesses

25 Cinema tickets: little information on many sites

26

27 Conclusion
IAD is useful for reducing response burden and for innovation
Many objects on few sites => generic robot software
Few objects on many sites => tool for semi-automated price collection
Legislation: we operate as transparently as possible
Challenges:
The internet changes continuously!!!
Which content is original, which is stable?
From volatile data sources to stable statistics
We need advanced statistical methods, processes and IT
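The slides do not describe how the semi-automated tool works. One plausible approach, offered purely as an assumption, is to store a fingerprint of each page and ask a human price collector to look only when a page is new or has changed. The function names and the cinema page texts below are invented.

```python
import hashlib


def fingerprint(text):
    """Hash the page text after collapsing whitespace, so layout noise is ignored."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pages_to_review(stored, fetched):
    """Return the pages whose text is new or differs from the stored fingerprint."""
    return sorted(
        page for page, text in fetched.items()
        if stored.get(page) != fingerprint(text)
    )


# Hypothetical state from the previous collection round.
stored = {
    "cinema-a": fingerprint("Adult ticket 9.50"),
    "cinema-b": fingerprint("Adult ticket 10.00"),
}
# Hypothetical page texts fetched today.
fetched = {
    "cinema-a": "Adult  ticket 9.50",   # only whitespace differs
    "cinema-b": "Adult ticket 10.50",   # price changed
    "cinema-c": "Adult ticket 8.00",    # new site
}
todo = pages_to_review(stored, fetched)
```

With few objects spread over many sites, writing a dedicated parser per site is costly; a change detector like this keeps the human in the loop while sparing them the unchanged pages.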

