
1 Web Scraping with Python and Selenium

2 What is Web Scraping?
• A software technique for extracting information from websites
• Get information programmatically that would otherwise be inaccessible (i.e., there is no “Download data” button)
• Many modules exist in many languages
• We’ll be using Selenium (a module for Python)
More info: https://en.wikipedia.org/wiki/Web_scraping

3 Things to Consider
• What question do you want to ask?
• What data is needed?
• Where can we find this data?
• There’s no button to download a CSV file…
• What do we do now?
• This is where Web Scraping comes in.

4 Tools we will need: a Browser
• We’re using Google Chrome
• https://www.google.com/chrome/browser/desktop/index.html

5 Tools we will need: Programming Text Editor
• Pick your favorite programming text editor (NOT Notepad, Word, etc.)
• Sublime Text 2: http://www.sublimetext.com/2
• Select your operating system.
• Open the dmg file.
• Drag it to the Applications folder (Mac).

6 Tools we will need: Text Editors (other editors)
• Gedit: https://sourceforge.net/projects/gedit/?source=typ_redirect
• Xcode

7 Tools we will need: Python
• Install (or update) Python: https://www.python.org/downloads/
• Click “Download Python 2.7.11”
• Once the download finishes, click the package and run the installer

8 Tools we will need: a Web driver
• We’re using chromedriver
• Found here: http://chromedriver.storage.googleapis.com/index.html?path=2.21/
• Select the one for your OS
• Take note of where you save the driver on your file system
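The reason to note where the driver is saved is that Selenium needs that path to launch Chrome. Below is a minimal sketch of that step, not the presenter's code. It assumes Python 3 and a Selenium 4 style `Service` object; older Selenium releases accepted the path directly, as in `webdriver.Chrome(executable_path)`. The helper names `resolve_driver_path` and `start_chrome` and the example location are hypothetical.

```python
import os


def resolve_driver_path(path):
    """Expand ~ and confirm that a chromedriver file exists at `path`."""
    full_path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(full_path):
        raise FileNotFoundError("No chromedriver found at " + full_path)
    return full_path


def start_chrome(driver_path):
    """Start Chrome through chromedriver (assumes the Selenium 4 API)."""
    # Imported here so the path check above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    service = Service(executable_path=resolve_driver_path(driver_path))
    return webdriver.Chrome(service=service)


# Example usage (hypothetical location; use wherever you saved the driver):
# driver = start_chrome("~/Downloads/chromedriver")
# driver.quit()
```

Checking the path first gives a clear error message when the driver was saved somewhere else, which is a common first stumbling block.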

9 Last: Install Selenium
• Open Terminal (Mac) or Command Prompt (PC)
• Run the following command: pip install selenium
• Errors? Try: sudo -H pip install selenium
• NOTE: if your version of Python is older than 2.7.9, you may not have pip. Upgrade Python!
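One way to confirm that the install worked is to ask Python whether it can find the module. This is a small sketch assuming Python 3 (it uses `importlib.util.find_spec`); the helper names `module_available` and `report` are made up for this example.

```python
import importlib
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be imported."""
    return importlib.util.find_spec(name) is not None


def report(name="selenium"):
    """Describe whether `name` is installed and, if so, which version."""
    if not module_available(name):
        return "{0} is NOT installed; run: pip install {0}".format(name)
    module = importlib.import_module(name)
    version = getattr(module, "__version__", "unknown version")
    return "{} is installed ({})".format(name, version)


print(report())
```

If the message says Selenium is not installed even though pip reported success, pip may belong to a different Python installation than the one running the script.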

10 Before we get started, let’s talk about…
• HTML structure
• A page is just a giant tree of tags.
• The key is to focus on the right tags.
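To make the "tree of tags" idea concrete, here is a minimal sketch that uses only Python's standard-library `html.parser` rather than Selenium. It assumes Python 3. The sample HTML, the class name `price`, and the `TagTextCollector` and `collect` helpers are all invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, invented for this example.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Fruit prices</h1>
    <ul>
      <li class="item">Apple <span class="price">1.20</span></li>
      <li class="item">Pear <span class="price">0.95</span></li>
    </ul>
  </body>
</html>
"""


class TagTextCollector(HTMLParser):
    """Collect the text inside every <tag class="css_class"> element."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.inside = False  # True while we are inside a matching tag
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and dict(attrs).get("class") == self.css_class:
            self.inside = True
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.results[-1] += data.strip()


def collect(html, tag, css_class):
    """Return the text of each matching element, in document order."""
    parser = TagTextCollector(tag, css_class)
    parser.feed(html)
    return parser.results


prices = collect(SAMPLE_HTML, "span", "price")
print(prices)  # ['1.20', '0.95']
```

Everything except the two `span` tags with class `price` is ignored, which is what "focusing on the right tags" means in practice.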

11 Let’s get started!
• Think about your question!
• We will skip this step because we already know what page we want.
• Go to the webpage
• Find the data you want to scrape!
• Inspect element…
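The steps above (go to the page, find the data, inspect the element) might translate into Selenium code along these lines. This is a sketch under assumptions, not the code from the live session: it assumes Python 3, a Selenium version that provides `find_elements(by, value)`, and a chromedriver that Selenium can locate. The URL and the `h1` selector are placeholders, and `scrape_texts` and `main` are hypothetical helper names.

```python
def scrape_texts(driver, url, by, selector):
    """Open `url` and return the visible text of every matching element."""
    driver.get(url)
    elements = driver.find_elements(by, selector)
    return [element.text for element in elements]


def main():
    # Imported inside main() so the helper above can be read and tested
    # without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumes chromedriver can be found
    try:
        # Placeholder URL and selector: replace them with your own page and
        # the tag or class you found with "Inspect element".
        texts = scrape_texts(driver, "https://example.com", By.CSS_SELECTOR, "h1")
        for text in texts:
            print(text)
    finally:
        driver.quit()  # always close the browser, even after an error
```

Calling `main()` opens a Chrome window, loads the page, prints the matched text, and closes the browser. Keeping the scraping logic in a function that takes the driver as an argument makes it easy to reuse the same browser session for several pages.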

12 Questions?

13 Start Coding

14 References Selenium API

