Web Scraping Lecture 10 - Selenium


Web Scraping Lecture 10 - Selenium Topics Selenium Webdriver ChromeDriver, PhantomJS Readings: Chapter 10 January 26, 2017

Overview
Last Time: Lecture 8, Slides 1-29
Chapter 9: the Requests Library, filling out forms
    1-simpleForm.py
    2-fileSubmission.py
    3-cookies.py
    4-sessionCookies.py
    5-BasicAuth.py
Software Architecture of systems
Today: Chapter 13
References: Chapter 13, websites

Selenium WebDriver: Big Picture
Big Picture = Software Architecture: how the components of the software fit together

References
Windows Installation: YouTube video
Linux Installation: https://www.youtube.com/watch?v=V69wc4Tmwjc
Linux Installation: http://blog.likewise.org/2015/01/setting-up-chromedriver-and-the-selenium-webdriver-python-bindings-on-ubuntu-14-dot-04/
ChromeDriver: https://sites.google.com/a/chromium.org/chromedriver/getting-started
PhantomJS
Selenium Site

JavaScript
<script> alert("This creates a pop-up using JavaScript"); </script>
Mitchell, Ryan. Web Scraping with Python: Collecting Data from the Modern Web (Kindle Locations 3813-3814). O'Reilly Media. Kindle Edition.

Examples of JavaScript

jQuery
jQuery is an extremely common library, used by 70% of the most popular Internet sites and about 30% of the rest of the Internet. A site using jQuery is readily identifiable because it will contain an import of jQuery somewhere in its code, such as:
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
jQuery dynamically creates HTML content that appears only after the JavaScript is executed, so that content is missing from the raw page source a scraper downloads.
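To illustrate how a jQuery import can be spotted, here is a minimal sketch using only the Python standard library. The class and function names (ScriptSrcCollector, uses_jquery) and the sample page are made up for this example, and checking for the substring "jquery" in a script URL is only a heuristic, not a reliable detector.

```python
from html.parser import HTMLParser


class ScriptSrcCollector(HTMLParser):
    """Collect the src attribute of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def uses_jquery(html):
    """Return True if any script URL mentions jquery (heuristic only)."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    return any("jquery" in src.lower() for src in collector.sources)


# Hypothetical sample page containing the import shown on the slide.
sample_page = """
<html><head>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
</head><body><p>Hello</p></body></html>
"""
print(uses_jquery(sample_page))
```

If a check like this reports jQuery, that is a hint that some of the page content may only exist after JavaScript runs, which is the case a browser-driven tool is meant for.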

Google analytics

Google Maps Embedded in websites

Executing JavaScript with Selenium

Selenium Self Service Carolina Demo

Ajax and Dynamic HTML

Installation
Installing with pip is not enough here: you also need the separate ChromeDriver executable, which forms the interface between your Python program using Selenium and the browser (in this case Chrome).
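Because the driver executable is installed separately, it can help to check whether Python can find it before starting a browser session. The sketch below uses shutil.which from the standard library; the helper name find_driver is an assumption for this example.

```python
import shutil


def find_driver(name="chromedriver", search_path=None):
    """Return the full path of the driver executable, or None if not found.

    search_path is an optional string of directories separated by
    os.pathsep; when it is None, the PATH environment variable is used.
    """
    return shutil.which(name, path=search_path)


location = find_driver()
if location is None:
    print("chromedriver not found on PATH; note where you unzipped it")
else:
    print("chromedriver found at", location)
```

If the executable is not on PATH, the alternative is to remember where it was unzipped and hand that location to Selenium explicitly.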

ChromeDriver - WebDriver for Chrome
Latest Release: ChromeDriver 2.27
https://sites.google.com/a/chromium.org/chromedriver/downloads
Pick your OS
Unzip and remember where it is

PhantomJS: headless WebDriver
http://phantomjs.org/download.html

Setting Up ChromeDriver and the Selenium-WebDriver Python bindings on Ubuntu 14.04
Install Google Chrome for Debian/Ubuntu:
    sudo apt-get install libxss1 libappindicator1 libindicator7
    wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
    sudo dpkg -i google-chrome*.deb
    sudo apt-get install -f
Install xvfb so we can run Chrome headlessly:
    sudo apt-get install xvfb
Source: https://christopher.su/2015/selenium-chromedriver-ubuntu/

ChromeDriver on Ubuntu 14.04
    sudo apt-get install unzip
    wget -N http://chromedriver.storage.googleapis.com/2.26/chromedriver_linux64.zip
    unzip chromedriver_linux64.zip
    chmod +x chromedriver
    sudo mv -f chromedriver /usr/local/share/chromedriver
    sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
    sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver
Source: https://christopher.su/2015/selenium-chromedriver-ubuntu/

Install Selenium and pyvirtualdisplay
    pip install pyvirtualdisplay selenium
Now we can do things like this with Selenium in Python:
    from pyvirtualdisplay import Display
    from selenium import webdriver

    # Start a virtual (invisible) display so Chrome can run without a screen
    display = Display(visible=0, size=(800, 600))
    display.start()

    driver = webdriver.Chrome()
    driver.get('http://christopher.su')
    print(driver.title)

Selenium Selectors

You can still use BeautifulSoup

from selenium.webdriver.common.by import By

By Selection strategies

PhantomJS: headless WebDriver, again
http://phantomjs.org/download.html

XPath Syntax
XPath (short for XML Path) is a query language used for navigating and selecting portions of an XML document. It was published as a W3C Recommendation in 1999 and is used in languages such as Python, Java, and C# when dealing with XML documents. Although BeautifulSoup does not support XPath, many of the other libraries in this book do. It can often be used in the same way as CSS selectors (such as mytag#idname), although it is designed to work with more generalized XML documents rather than HTML documents in particular.
Mitchell, Ryan. Web Scraping with Python: Collecting Data from the Modern Web (Kindle Locations 4051-4056). O'Reilly Media. Kindle Edition.
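As a small illustration of XPath-style selection, the sketch below uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The sample document is invented for this example. With Selenium, similar expressions would be passed using the By.XPATH strategy, typically written with a leading // rather than .//.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample document (ElementTree needs valid XML).
doc = ET.fromstring("""
<html>
  <body>
    <div id="content">
      <a href="/page1">First</a>
      <a href="/page2">Second</a>
    </div>
    <div id="footer">
      <a href="/about">About</a>
    </div>
  </body>
</html>
""")

# All <a> elements that are direct children of the div whose id is "content".
content_links = doc.findall(".//div[@id='content']/a")
hrefs = [a.get("href") for a in content_links]
print(hrefs)

# Every <a> element anywhere in the document.
all_links = doc.findall(".//a")
print(len(all_links))
```

The first query selects by attribute value, much like the CSS selector div#content > a; the second matches a tag at any depth.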

XPATH

XPATH

Selenium Self Service Carolina Demo
if __name__ == "__main__":
    driver = init_driver()
    password = "MyPassword"
    #password = input("Enter MySC password: ")
    lookup(driver, "Selenium")
    time.sleep(5)
    driver.quit()
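The demo hard-codes the password and leaves an input() prompt commented out. One alternative, sketched here as an assumption rather than as part of the original demo, is to read the password from an environment variable and fall back to a hidden prompt. The function name get_password and the variable name MYSC_PASSWORD are hypothetical.

```python
import getpass
import os


def get_password(env_var="MYSC_PASSWORD", prompt="Enter MySC password: "):
    """Return the password from an environment variable, else prompt for it."""
    value = os.environ.get(env_var)
    if value:
        return value
    # getpass hides what is typed, unlike input().
    return getpass.getpass(prompt)
```

In the demo's main block, password = get_password() would then stand in for the hard-coded string, keeping the credential out of the source file.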

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup

def init_driver():
    # Path to the ChromeDriver executable; adjust for your machine
    driver = webdriver.Chrome("E:/chromedriver_win32/chromedriver.exe")
    # Attach a reusable wait object with a 5-second timeout
    driver.wait = WebDriverWait(driver, 5)
    return driver

def lookup(driver, query):
    driver.get("https://my.sc.edu/")
    print("SSC opened")
    try:
        link = driver.wait.until(EC.presence_of_element_located(
            (By.PARTIAL_LINK_TEXT, "Sign in to")))
        #https://ssb.onecarolina.sc.edu/BANP/twbkwbis.P_WWWLogin?pkg=twbkwbis.P_GenMenu%3Fname%3Dbmenu.P_MainMnu
        print("Found link", link)
        link.click()
        print("Clicked link")
        #button = driver.wait.until(EC.element_to_be_clickable(
        #    (By.NAME, "btnK")))
        #box.send_keys(query)
        #button.click()
    except TimeoutException:
        print("Houston we have a problem First Page")

    # Now try to login (continuation of lookup)
    try:
        user_box = driver.wait.until(EC.presence_of_element_located(
            (By.NAME, "username")))
        #https://ssb.onecarolina.sc.edu/BANP/twbkwbis.P_WWWLogin?pkg=twbkwbis.P_GenMenu%3Fname%3Dbmenu.P_MainMnu
        print("Found box", user_box)
        user_box.send_keys("01069379")
        print("ID entered")
        passwd_box = driver.wait.until(EC.presence_of_element_located(
            (By.ID, "vipid-password")))
        print("Found password box", passwd_box)
        passwd_box.send_keys(password)
        print("password entered")
        button = driver.wait.until(EC.element_to_be_clickable(
            (By.NAME, "submit")))
        print("Found submit button", button)
        #box.send_keys(query)
        button.click()
    except TimeoutException:
        print("Houston we have a problem Login Page")
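The demo relies on driver.wait.until(...) to pause until an element shows up or a timeout expires. The idea behind such a wait is plain polling. The sketch below is a pure-Python illustration of that idea, not Selenium's actual implementation; the names wait_until, WaitTimeout, and element_appears are hypothetical.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=5.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once the timeout has passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulate an element that "appears" on the third poll.
attempts = {"count": 0}


def element_appears():
    attempts["count"] += 1
    return "element" if attempts["count"] >= 3 else None


print(wait_until(element_appears, timeout=2.0, poll_interval=0.01))
```

The try/except TimeoutException blocks in the demo play the same role as catching WaitTimeout here: they handle the case where the page never produces the expected element.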