Strategies for collecting prices on Internet

Strategies for collecting prices on Internet Olav ten Bosch June 20th 2013

Content: Why internet as a data source (IAD)? Internet robots: how do they work? Examples. Conclusion.

Why IAD? Administrative sources (tax, social security services, municipalities / provinces, supermarkets) and surveys.

Why IAD? Internet sources: faster, better, more efficient; new indicators. Administrative sources: tax, social security services, municipalities / provinces, supermarkets, … Surveys: less!

Google Trends (1): search on “fever” from the Netherlands, 2004 to today (31 May 2013)

Google Trends (2): search on “fever” from the Netherlands, last 90 days (31 May 2013)

Original Content No added value ? Content enrichment

Robots / crawlers / bots / spiders / scrapers: how do they work? (1) You use a browser. The browser sends requests over the Internet to the website. The website returns graphical markup commands: code, figures, style, data, etc.

5 March 2013 - Internet Robots at CBS

Robots / crawlers / bots / spiders / scrapers: how do they work? (3) Navigation: a robot / spider / crawler (not you) sends requests over the Internet to the website. The website returns graphical markup commands (code, figures, style, data, etc.), and the robot turns them into data.

Robots / crawlers / bots / spiders / scrapers: how do they work? (4) Navigation: a robot / spider / crawler (not you) sends requests over the Internet to the website. The website returns graphical markup commands (code, figures, style, data, etc.), and the robot turns them into data. Monitor actively.
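A minimal sketch of the extraction step, under stated assumptions: the markup in `SAMPLE_HTML`, the class names `name` and `price`, and the helpers `extract_prices` and `looks_healthy` are all hypothetical and not taken from the presentation. A real robot would download the markup from a website; here it is an inline string so the example runs without network access. The health check is one possible reading of "monitor actively": flag a run that yields no data, which often means the site layout has changed.

```python
from html.parser import HTMLParser

# Hypothetical page markup; a real robot would download this from a website.
SAMPLE_HTML = """
<ul>
  <li class="item"><span class="name">Jeans</span> <span class="price">49.95</span></li>
  <li class="item"><span class="name">T-shirt</span> <span class="price">9.99</span></li>
</ul>
"""


class PriceExtractor(HTMLParser):
    """Collects (name, price) pairs from span elements with class name/price."""

    def __init__(self):
        super().__init__()
        self._field = None   # which span we are currently inside, if any
        self._name = None    # most recent item name seen
        self.items = []      # extracted (name, price) tuples

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field == "name":
            self._name = data.strip()
        elif self._field == "price":
            self.items.append((self._name, float(data.strip())))


def extract_prices(html):
    """Turn markup into data: a list of (name, price) tuples."""
    parser = PriceExtractor()
    parser.feed(html)
    return parser.items


def looks_healthy(items):
    """Very simple monitoring rule (an assumption): an empty result is suspicious."""
    return len(items) > 0


items = extract_prices(SAMPLE_HTML)
print(items)
print("healthy run:", looks_healthy(items))
```

The parser ignores everything outside the two span classes, so styling and figures in the markup do not affect the extracted data.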

Robots / crawlers / bots / spiders / scrapers: how do they work? (5) Many sites have the same structure / pattern: search (e.g. region / category / price); list of results on 1 or more pages (previous / next); short description for each item; click to go to the detail view of an item. Sites do have differences: dynamics (“births” and “deaths” of items); comparability of items / articles / objects; categories (brands, colors, sizes).
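The shared pattern (result list, previous / next paging, detail view per item) can be sketched as a small generic crawl loop. Everything here is an assumption made for illustration: the in-memory `PAGES` dictionary stands in for a website, the `item:`, `next:` and `price:` markers stand in for real markup, and `crawl` is a hypothetical helper, not the robot software used at Statistics Netherlands.

```python
import re

# Hypothetical site held in memory: URL -> page text.
# "item:" marks a link to a detail view, "next:" the link to the next result page.
PAGES = {
    "/search?page=1": "item:/detail/1 item:/detail/2 next:/search?page=2",
    "/search?page=2": "item:/detail/3 next:",
    "/detail/1": "price:100",
    "/detail/2": "price:250.50",
    "/detail/3": "price:80",
}


def crawl(start_url, fetch):
    """Follow result pages via 'next' links and read the price from each detail view."""
    prices = {}
    seen = set()
    url = start_url
    while url and url not in seen:      # the seen set guards against paging loops
        seen.add(url)
        page = fetch(url)
        for detail_url in re.findall(r"item:(\S+)", page):
            match = re.search(r"price:(\d+(?:\.\d+)?)", fetch(detail_url))
            if match:
                prices[detail_url] = float(match.group(1))
        next_match = re.search(r"next:(\S+)", page)
        url = next_match.group(1) if next_match else None
    return prices


result = crawl("/search?page=1", PAGES.__getitem__)
print(result)
```

Passing `fetch` as a parameter keeps the navigation logic independent of how pages are obtained, so the same loop could in principle be reused across sites that follow this pattern.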

Housing market (1)

Housing market (2): difference in update speed between 2 housing sites, calculated from robot data. Difference in days between the appearance of objects on site 1 versus site 2.
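A sketch of how such a difference could be computed from robot data, assuming each robot records the first date on which it saw an object and that objects can be matched across the two sites by a shared key (for example an address). The dates, keys and the `appearance_lags` helper are made up for illustration.

```python
from datetime import date

# Hypothetical robot output: object key -> first date the object was seen on the site.
first_seen_site1 = {
    "house-A": date(2013, 3, 1),
    "house-B": date(2013, 3, 4),
    "house-C": date(2013, 3, 10),
}
first_seen_site2 = {
    "house-A": date(2013, 3, 3),
    "house-B": date(2013, 3, 4),
    "house-D": date(2013, 3, 5),
}


def appearance_lags(site1, site2):
    """Days between first appearance on site 1 and on site 2, for objects on both sites.

    A positive value means the object appeared on site 1 first.
    """
    common = site1.keys() & site2.keys()
    return {key: (site2[key] - site1[key]).days for key in sorted(common)}


lags = appearance_lags(first_seen_site1, first_seen_site2)
print(lags)
```

Objects that appear on only one site are left out of the comparison; how to treat them is a separate design decision.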

Airline tickets (2010)

Airline tickets (2010)

Airline tickets (2010)

Airline tickets (2010): ? Many differences. Both robots see high prices. Robot 2 initialization phase.

Airline tickets (2010)

Clothing:

Clothing: Site 1: 15 months, daily, very volatile. Site 2: 8 months, 30 000 items per day, more stable.

Clothing: from volatile data to statistics
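The slide does not say which method was used to get from volatile data to statistics. As one illustrative possibility only, a trailing rolling median damps short-lived spikes and dips in a daily price series. The `rolling_median` helper, the window size and the sample prices are assumptions.

```python
from statistics import median


def rolling_median(values, window=3):
    """Median over a trailing window; the window is shorter at the start of the series."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(median(values[start:i + 1]))
    return smoothed


# Hypothetical daily prices for one item, with a one-day spike and a one-day dip.
daily_prices = [20.0, 20.0, 35.0, 20.0, 21.0, 5.0, 21.0]
print(rolling_median(daily_prices))
```

With a window of 3, a single outlying day never becomes the median, so one-day sales or scraping glitches do not show up in the smoothed series. Longer disturbances would need a wider window or a different method.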

Pilot for EGR: Wikipedia as a secondary data source? Wikipedia: company info for 41 000 businesses.

Cinema tickets: little information on many sites

Conclusion: IAD is useful to reduce response burden and for innovation. Many objects on few sites => generic robot software. Few objects on many sites => tool for semi-automated price collection. Legislation: we operate as transparently as possible. Challenges: The internet changes continuously! Which content is original, which is stable? From volatile data sources to stable statistics. We need advanced statistical methods, processes and IT.