Published by Tyrone Stewart. Modified over 9 years ago.
1
Web Scraping with Python and Selenium
2
What is Web Scraping?
A software technique for extracting information from websites
Get information programmatically that would otherwise be inaccessible (i.e., there is no “Download data” button)
Many modules exist in many languages
We’ll be using Selenium (a module for Python)
More info: https://en.wikipedia.org/wiki/Web_scraping
3
Things to Consider
What question do you want to ask?
What data is needed?
Where can we find this data?
There’s no button to download a CSV file… What do we do now?
This is where web scraping comes in.
4
Tools we will need: a Browser
We’re using Google Chrome
https://www.google.com/chrome/browser/desktop/index.html
5
Tools we will need: Programming Text Editor
Pick your favorite programming text editor (NOT Notepad, Word, etc.)
Sublime Text 2: http://www.sublimetext.com/2
Select your operating system
Open the dmg file
Drag it to the Applications folder (Mac)
6
Tools we will need: Text Editors (other editors)
Gedit: https://sourceforge.net/projects/gedit/?source=typ_redirect
Xcode
7
Tools we will need: Python
Install (or update) Python: https://www.python.org/downloads/
Click “Download Python 2.7.11”
Once the download finishes, click the package and run the installer
8
Tools we will need: a Web driver
We’re using chromedriver
Found here: http://chromedriver.storage.googleapis.com/index.html?path=2.21/
Select the one for your OS
Take note of where you save the driver on your file system
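Knowing where the driver lives matters because Selenium has to be able to find it. A minimal sketch of one way to check, using only the Python standard library. The helper name find_driver and its search_dirs parameter are made up for this illustration, and the sketch assumes a current Python 3 interpreter.

```python
import os
import shutil


def find_driver(name="chromedriver", search_dirs=None):
    """Return the full path to the driver executable, or None if not found.

    search_dirs is an optional list of extra folders to check first (for
    example, the folder where you saved the download) before falling back
    to the folders listed in the PATH environment variable.
    """
    if search_dirs:
        found = shutil.which(name, path=os.pathsep.join(search_dirs))
        if found:
            return found
    return shutil.which(name)


location = find_driver()
print(location or "chromedriver not found; note where you saved it")
```

If this prints a path, the driver is already reachable. If not, pass the folder you saved it in, e.g. find_driver(search_dirs=["/path/to/your/downloads"]), where that folder name is a placeholder.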
9
Last: Install Selenium
Open Terminal (Mac) or Command Prompt (PC)
Run the following command: pip install selenium
Errors? Try: sudo -H pip install selenium
NOTE: if your version of Python is older than 2.7.9, you may not have pip. Upgrade Python!
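To confirm the install worked, something like the following could be run. It is a sketch, not part of the original slides: the helper names has_module and pip_expected are invented here, and it assumes a current Python 3 interpreter. The version rule encodes the note above (pip ships with Python 2.7.9 and later in the 2.x line) plus the fact that Python 3 bundles pip from 3.4 onward.

```python
import importlib.util
import sys


def has_module(name):
    """Return True if this interpreter can import a top-level module `name`."""
    return importlib.util.find_spec(name) is not None


def pip_expected(version_info=sys.version_info):
    """Return True if this Python version normally ships with pip."""
    v = tuple(version_info[:3])
    return (2, 7, 9) <= v < (3, 0, 0) or v >= (3, 4, 0)


print("Python version:", ".".join(str(n) for n in sys.version_info[:3]))
print("pip expected:", pip_expected())
print("pip importable:", has_module("pip"))
print("selenium importable:", has_module("selenium"))
```

If the last line prints False, the pip command did not install into the interpreter you are running, which usually means more than one Python is installed.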
10
Before we get started, let’s talk about… HTML Structure
An HTML page is just a giant tree of tags
The key is to focus in on the right tags
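To make the "tree of tags" idea concrete, here is a small sketch that prints the nesting of a page using Python's built-in html.parser module (not Selenium). The class name TagTreePrinter and the sample page are invented for this illustration, and the list of void tags is deliberately incomplete.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag, so they must not deepen the tree.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagTreePrinter(HTMLParser):
    """Collect one indented line per opening tag to show the nesting."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)


SAMPLE_PAGE = """
<html>
  <body>
    <h1>Scores</h1>
    <table>
      <tr><td>Home</td><td>3</td></tr>
    </table>
  </body>
</html>
"""

printer = TagTreePrinter()
printer.feed(SAMPLE_PAGE)
print("\n".join(printer.lines))
```

In this sample the data sits in the td tags, nested under tr, table, and body. Those are "the right tags" to focus on for this page.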
11
Let’s get started!
Think about your question! (We will skip this step because we already know what page we want.)
Go to the webpage
Find the data you want to scrape!
Inspect element…
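The steps above (go to the page, find the tags, pull out the data) might look roughly like this in code. This is a sketch under assumptions, not the presenter's script: the function names scrape_text and scrape_with_chrome are invented here, it targets a current Selenium release on Python 3, and it assumes chromedriver can be found on your PATH.

```python
def scrape_text(driver, url, css_selector):
    """Open `url` and return the visible text of every matching element."""
    driver.get(url)
    # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
    elements = driver.find_elements("css selector", css_selector)
    return [element.text for element in elements]


def scrape_with_chrome(url, css_selector):
    """Run scrape_text in a real Chrome window.

    Assumes Selenium is installed and chromedriver can be found on PATH.
    """
    # Imported here so this file still loads when Selenium is missing.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        return scrape_text(driver, url, css_selector)
    finally:
        driver.quit()  # always close the browser, even after an error
```

For example, scrape_with_chrome("https://example.com", "h1") would return the text of each h1 heading on that page. The URL is a placeholder, and the selector is whatever tag you identified with Inspect element.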
12
Questions?
13
Start Coding
14
References Selenium API