1
How to download prices and track price changes: competitive price monitoring and price matching guerrillahub.com
2
Let’s pretend this is a valid intro where I tell you why price matching and price monitoring are important and get to the point ______________
3
Today you will learn about:
● Crawling
● Fetching data
● Parsing the right elements
● Storing and analyzing data
And more importantly, you will learn how to download price lists from your competitors’ websites.
4
Required tools
Netpeak Spider: a desktop website crawler we’ll need to fetch data from target websites. It costs $14/mo and there’s a two-week free trial.
Google Sheets: or Excel. I’m using Google Sheets because I need to share my projects, but Excel is more capable.
Formulas: depending on your competitor’s website architecture, you may need to remove duplicates, strip unnecessary data from cells, etc.
5
How to download price list from any website:
1. Inspect elements of the page where target data is stored
2. Analyze code to learn how this data is provided across all pages
3. Set up crawling to fetch information from identical code on other pages
4. Test run
5. Crawl entire website and fetch data
6. Create a spreadsheet, remove duplicates and unnecessary info
7. Save the list of remaining URLs to repeat crawling with same settings and track changes
6
Step 1 Inspect elements of the page where target data is stored
7
Open a product page, highlight the price, right-click it and click Inspect
8
The browser’s developer console will open with this element highlighted
9
Step 2 Analyze code to learn how this data is provided across all pages
10
You need to tell the crawler which elements to parse in order to fetch the data. It can be:
● XPath
● CSS Selector
● HTML
I’ll show you how to get data from an XPath, which works for most stores, and one example of a store that assigns unique IDs to products, which makes the process more complicated.
11
XPath The best way to test whether fetching data from an XPath will work is to copy the XPath from two different pages and compare the results. They should be identical:
● //*[@id="u-skip-anchor"]/span/span[1]
If the XPath contains a unique ID, this method won’t work:
● //*[@id="new-price-465333"]/span/span[1]
● //*[@id="new-price-244103"]/span/span[1]
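If you want a quick sanity check outside the browser, here is a minimal sketch, assuming Python with the requests and lxml packages; the two URLs and the XPath are placeholders, not real pages:

```python
# Minimal sketch: verify that one XPath returns a price on two different product pages.
import requests
from lxml import html

PRODUCT_PAGES = [
    "https://example.com/product-1",  # placeholder URLs
    "https://example.com/product-2",
]
PRICE_XPATH = '//*[@id="u-skip-anchor"]/span/span[1]'  # placeholder XPath

for url in PRODUCT_PAGES:
    tree = html.fromstring(requests.get(url, timeout=10).content)
    nodes = tree.xpath(PRICE_XPATH)
    price = nodes[0].text_content().strip() if nodes else None
    # The same XPath should return a price on both product pages
    print(url, "->", price)
```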
12
CSS Selector Fetching data from a CSS Selector works on all websites, but sometimes you’ll get a lot of unnecessary information along with what you’re looking for. In this case you’d fetch the price, the discount, how much you save, VAT and shipping. All of these can be removed in Excel or Google Sheets with formulas.
13
CSS Selector Fetching data from a CSS Selector works similarly to fetching it from an XPath, except in this case, after opening the console, you’ll have to hover over the div that contains the necessary information.
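For the same kind of sanity check with a CSS Selector, here is a minimal sketch, assuming Python with requests and BeautifulSoup; the URL and the .product-price selector are placeholders, not the real markup of any particular store:

```python
# Minimal sketch: fetch one product page and pull text from a CSS Selector.
import requests
from bs4 import BeautifulSoup

url = "https://example.com/product-1"           # placeholder URL
soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

node = soup.select_one(".product-price")        # placeholder CSS Selector
if node:
    # A broad selector often returns price, discount, VAT and shipping together;
    # that extra text is what you trim later in the spreadsheet.
    print(node.get_text(separator=" ", strip=True))
```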
14
Step 3 Set up crawling to fetch information from identical code on other pages
15
Netpeak Spider Download and install: https://netpeaksoftware.com/spider
16
Disable all parameters to speed up crawling Crawling settings → Parameters → Uncheck all boxes
17
Enable Custom Search Crawling settings → Custom Search → Use Custom Search
18
Custom Search Settings After you find out how product names and prices are housed on product pages, you can set up extraction from the corresponding elements. Select the extraction method that fits your requirements (XPath, CSS Selector, HTML)
19
Custom Search Settings Add another custom search field by clicking the green button and repeat the process for any other element from the page that you are interested in
20
Step 4 Test run
21
Analyze a few product pages from your target website with these parameters
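If you want to replicate the test run outside Netpeak Spider, a minimal sketch along these lines works, assuming Python with requests and lxml; the three URLs and both XPaths are placeholders you would swap for your own:

```python
# Minimal sketch of a "test run": check a few pages for both custom fields
# (product name and price) before crawling the whole site.
import requests
from lxml import html

TEST_PAGES = [                                   # placeholder URLs
    "https://example.com/product-1",
    "https://example.com/product-2",
    "https://example.com/product-3",
]
FIELDS = {                                       # placeholder XPaths
    "name": "//h1",
    "price": '//*[@id="u-skip-anchor"]/span/span[1]',
}

for url in TEST_PAGES:
    tree = html.fromstring(requests.get(url, timeout=10).content)
    for field, xpath in FIELDS.items():
        matches = tree.xpath(xpath)
        value = matches[0].text_content().strip() if matches else "NOT FOUND"
        print(url, field, value)
```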
22
Step 5 Crawl entire website and fetch data
23
If the test run was successful, start crawling the entire website
24
Track progress in the Search tab Found shows how many pages contained prices and product names in the corresponding elements. Not found shows the number of pages where prices and names were either not found (the contacts page, for example) or where prices were housed in different elements (category pages and lists)
25
Track progress in the Search tab It’s not unusual for the crawler to find no results on the first few hundred pages, since product pages are usually not the closest ones to the main page
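For reference, this is roughly what happens under the hood during a full crawl. A minimal sketch, assuming Python with requests and lxml; the start URL, the XPath and the page limit are placeholder assumptions, and a real crawl should also respect robots.txt and rate limits:

```python
# Minimal sketch of a full crawl: walk internal links from the home page, try
# the price XPath on every page, and keep a Found / Not found count like the
# Search tab does.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from lxml import html

START_URL = "https://example.com/"                       # placeholder
PRICE_XPATH = '//*[@id="u-skip-anchor"]/span/span[1]'    # placeholder
MAX_PAGES = 500                                          # keep the example small

domain = urlparse(START_URL).netloc
queue, seen = deque([START_URL]), {START_URL}
found, not_found, results = 0, 0, {}

while queue and (found + not_found) < MAX_PAGES:
    url = queue.popleft()
    try:
        tree = html.fromstring(requests.get(url, timeout=10).content)
    except requests.RequestException:
        continue

    matches = tree.xpath(PRICE_XPATH)
    if matches:
        found += 1
        results[url] = matches[0].text_content().strip()
    else:
        # e.g. the contacts page, or category pages that use other elements
        not_found += 1

    # queue new internal links
    for href in tree.xpath("//a/@href"):
        link = urljoin(url, href).split("#")[0]
        if urlparse(link).netloc == domain and link not in seen:
            seen.add(link)
            queue.append(link)

print("Found:", found, "Not found:", not_found)
```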
26
Export data
27
Step 6 Create a spreadsheet, remove duplicates and unnecessary info
28
Removing duplicates Duplicates appear when the crawler visits the same page twice, which can happen for a number of reasons. The best way to get rid of duplicates is to delete them from the URL list. That way you’ll only have one instance of each product page on your list. This Google Chrome add-on is great for removing duplicates from Google Sheets.
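If you prefer to deduplicate the export before importing it into Google Sheets, here is a minimal sketch, assuming Python with pandas and that the crawl was exported as export.csv with a URL column (adjust the file name and header to your actual export):

```python
# Minimal sketch: remove duplicate rows by URL from a crawler export.
import pandas as pd

df = pd.read_csv("export.csv")                           # assumed file name
deduped = df.drop_duplicates(subset="URL", keep="first")  # one row per product page
deduped.to_csv("export_deduped.csv", index=False)
print(f"Removed {len(df) - len(deduped)} duplicate rows")
```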
29
Removing unnecessary info Some websites have a complex structure, which means the only way to download prices is to fetch data from a larger CSS Selector. That means that, along with the price, the crawler will fetch everything within this field:
30
Removing unnecessary info To remove everything except the price, you need to trim the data in your cell. Here are formulas you can use to remove everything from cell A1 before or after a certain word, character or symbol:
=TRIM(LEFT(A1,FIND("word/character/symbol",A1)-1)) keeps only what comes before the match, i.e. removes everything from the match onward
=TRIM(MID(A1,FIND("word/character/symbol",A1)+LEN("word/character/symbol"),LEN(A1))) keeps only what comes after the match, i.e. removes everything up to and including it
Create a separate column next to the initial one and apply the formula to it.
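The same trimming can be done on the exported CSV instead of in the spreadsheet. A minimal sketch, assuming Python with pandas; the file name, the Price column and the "VAT" marker are placeholder assumptions:

```python
# Minimal sketch: keep only the part of each price cell that comes before a marker.
import pandas as pd

df = pd.read_csv("export_deduped.csv")  # assumed file from the previous step

marker = "VAT"  # placeholder: the word/character/symbol you want to cut at
# Split each cell at the marker, keep the part before it, strip spaces
df["price_clean"] = df["Price"].astype(str).str.split(marker).str[0].str.strip()
df.to_csv("export_clean.csv", index=False)
```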
31
Step 7 Save the list of remaining URLs to repeat crawling with same settings and track changes
32
After you’ve gone through the previous steps, you will have a table that looks like this:
33
Copy the list of URLs from the first column and set Netpeak Spider to crawl these URLs only
34
Recrawl these URLs with the same parameters whenever you want to get an update. Feel free to copy my spreadsheet with its formatting settings: LINK
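To actually track the changes between two recrawls, you can diff the old and new exports. A minimal sketch, assuming Python with pandas and that both exports share URL and Price columns (placeholder names):

```python
# Minimal sketch: compare two crawls of the same URL list and flag price changes.
import pandas as pd

old = pd.read_csv("prices_old.csv")   # previous crawl (assumed file name)
new = pd.read_csv("prices_new.csv")   # latest crawl (assumed file name)

# Join the two exports on URL and keep only rows where the price changed
merged = old.merge(new, on="URL", suffixes=("_old", "_new"))
changed = merged[merged["Price_old"] != merged["Price_new"]]
print(changed[["URL", "Price_old", "Price_new"]])
```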
35
That’s it. Thank you for your attention and good luck with your projects.