Web Scraping Lecture 9 - Requests


Web Scraping Lecture 9 - Requests
Topics: The Requests library
Readings: Chapter 9
January 26, 2017

Overview
Last Time: Lecture 8, Slides 1-29
Today: BeautifulSoup revisited; Crawling
Today: Chapter 3: Lecture 6, Slides 29-40
    3-crawlSite.py
    4-getExternalLinks.py
    5-getAllExternalLinks.py
    Warnings
Chapter 4: APIs, JSON, Javascript
References: Scrapy site:

References https://medium.mybridge.co/python-top-10-articles-for-the-past-year-v-2017-6033ae8c65c9#.60zvzhw7u

Logging into Websites
So far we have only issued GET requests. To log in, we need to pass information to the server, such as a user/password pair.

Forms example
http://pythonscraping.com/pages/files/form.html
    <h2>Tell me your name!</h2>
    <form method="post" action="processing.php">
        First name: <input type="text" name="firstname"><br>
        Last name: <input type="text" name="lastname"><br>
        <input type="submit" value="Submit" id="submit">
    </form>
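Before posting to a form, a scraper needs the form's method, its action, and the names of its input fields. The following is a minimal sketch of pulling those out of the HTML above using only the standard library; the class name FormFieldParser and the variable FORM_HTML are made up for this illustration and are not part of the lecture code.

```python
from html.parser import HTMLParser

# The form markup from the slide, pasted in as a string (no network access needed).
FORM_HTML = """
<h2>Tell me your name!</h2>
<form method="post" action="processing.php">
First name: <input type="text" name="firstname"><br>
Last name: <input type="text" name="lastname"><br>
<input type="submit" value="Submit" id="submit">
</form>
"""


class FormFieldParser(HTMLParser):
    """Hypothetical helper: records a form's method, action, and named inputs."""

    def __init__(self):
        super().__init__()
        self.method = None
        self.action = None
        self.field_names = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # HTML forms default to GET when no method is given.
            self.method = (attrs.get("method") or "get").lower()
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("name"):
            # Only inputs with a name attribute are submitted to the server.
            self.field_names.append(attrs["name"])


parser = FormFieldParser()
parser.feed(FORM_HTML)
print(parser.method, parser.action, parser.field_names)
```

The submit button has no name attribute, so it does not appear among the fields. The two names that do appear are the keys the server expects in the POST data.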

Old Code: urllib2 (Python 2; urllib.request in Python 3)
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import urllib2
    gh_url = 'https://api.github.com'
    req = urllib2.Request(gh_url)
    password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
    password_manager.add_password(None, gh_url, 'user', 'pass')
    auth_manager = urllib2.HTTPBasicAuthHandler(password_manager)
    opener = urllib2.build_opener(auth_manager)
    urllib2.install_opener(opener)
    handler = urllib2.urlopen(req)
    print handler.getcode()
    print handler.headers.getheader('content-type')
    # ------
    # 200
    # 'application/json'
http://engineering.hackerearth.com/2014/08/21/python-requests-module/

Logging into Websites Requests HTTP for Humans

Now with Requests
    import requests
    r = requests.get('https://api.github.com', auth=('user', 'pass'))
    print(r.status_code)
    print(r.headers['content-type'])
    # ------
    # 200
    # 'application/json'

Requests.get()
    >>> r = requests.get('https://github.com/timeline.json')
Now we have a Response object called r, from which we can get all the information we need:
    r.status_code
    r.headers
    r.text
    …

Posting, etc.
    >>> r = requests.post("http://httpbin.org/post")
Other HTTP request types: PUT, DELETE, HEAD and OPTIONS?
    >>> r = requests.put("http://httpbin.org/put")
    >>> r = requests.delete("http://httpbin.org/delete")
    >>> r = requests.head("http://httpbin.org/get")
    >>> r = requests.options("http://httpbin.org/get")

Real Signup – for O’Reilly Newsletter

Now Submitting
    import requests
    params = {'email_addr': 'ryan.e.mitchell@gmail.com'}
    r = requests.post("http://post.oreilly.com/client/o/oreilly/forms/quicksignup.cgi", data=params)
    print(r.text)
Mitchell, Ryan. Web Scraping with Python: Collecting Data from the Modern Web (Kindle Locations 3659-3664). O'Reilly Media. Kindle Edition.

Chapter 9: 1-simpleForm.py
    import requests
    params = {'firstname': 'Ryan', 'lastname': 'Mitchell'}
    r = requests.post("http://pythonscraping.com/files/processing.php", data=params)
    print(r.text)
Output:
    Hello there, Ryan Mitchell!

Radio Buttons, Checkboxes, Etc.
URL of the form: http://domainname.com?thing1=foo&thing2=bar
This corresponds to a form of this type:
    <form method="GET" action="someProcessor.php">
        <input type="someCrazyInputType" name="thing1" value="foo" />
        <input type="anotherCrazyInputType" name="thing2" value="bar" />
        <input type="submit" value="Submit" />
    </form>
Which corresponds to the Python parameter object:
    {'thing1': 'foo', 'thing2': 'bar'}
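The mapping between the query string and the Python dictionary can be checked in both directions with the standard library. This is a small sketch using urllib.parse rather than Requests itself; the variable names are chosen for this illustration only.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

# The parameter object from the slide.
params = {'thing1': 'foo', 'thing2': 'bar'}

# Dictionary -> query string, the way a GET form submission encodes its fields.
query = urlencode(params)
url = "http://domainname.com?" + query
print(url)

# Query string -> dictionary, which is how to recover the parameter
# object when all you have is the URL a browser produced.
recovered = dict(parse_qsl(urlsplit(url).query))
print(recovered)
```

With Requests, the same dictionary would normally be passed as params= for a GET request or as data= for a POST, and the library performs this encoding itself.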

Submitting Files

Chapter 9: 2-fileSubmission.py
    import requests
    files = {'uploadFile': open('../files/Python-logo.png', 'rb')}
    r = requests.post("http://pythonscraping.com/pages/processing2.php", files=files)
    print(r.text)

Staying logged in How do you stay logged into websites as you move from page to page?

Chapter 9: 3-cookies.py
    import requests
    params = {'username': 'Ryan', 'password': 'password'}
    r = requests.post("http://pythonscraping.com/pages/cookies/welcome.php", params)
    print("Cookie is set to:")
    print(r.cookies.get_dict())
    print("-----------")
    print("Going to profile page...")
    r = requests.get("http://pythonscraping.com/pages/cookies/profile.php", cookies=r.cookies)
    print(r.text)
Mitchell, Ryan. Web Scraping with Python

3-cookies.py output
    Cookie is set to:
    {'username': 'Ryan', 'loggedin': '1'}
    -----------
    Going to profile page...
    Hey Ryan! Looks like you're still logged into the site!
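What "staying logged in" means at the HTTP level is that the server sends Set-Cookie headers once and the client echoes the values back in a Cookie header on later requests. The sketch below imitates that round trip with the standard library's http.cookies module. The Set-Cookie header values are assumptions modelled on the cookie names in the output above, not a capture of what the site really sends.

```python
from http.cookies import SimpleCookie

# Assumed Set-Cookie header values (hypothetical; only the cookie names
# and values are taken from the output shown on the slide).
set_cookie_headers = ["username=Ryan; Path=/", "loggedin=1; Path=/"]

# Step 1: parse each header and remember name -> value, roughly what
# r.cookies.get_dict() exposes after the login POST.
jar = {}
for header in set_cookie_headers:
    cookie = SimpleCookie()
    cookie.load(header)
    for name, morsel in cookie.items():
        jar[name] = morsel.value

# Step 2: build the Cookie header a client would send with the next request.
cookie_header = "; ".join(f"{name}={value}" for name, value in jar.items())

print(jar)
print("Cookie:", cookie_header)
```

Passing cookies=r.cookies by hand, as 3-cookies.py does, performs step 2 explicitly for a single request.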

Sessions

Chapter 9: 4-sessionCookies.py
    import requests
    session = requests.Session()
    params = {'username': 'username', 'password': 'password'}
    s = session.post("http://pythonscraping.com/pages/cookies/welcome.php", params)
    print("Cookie is set to:")
    print(s.cookies.get_dict())
    print("-----------")
    print("Going to profile page...")
    s = session.get("http://pythonscraping.com/pages/cookies/profile.php")
    print(s.text)

4-sessionCookies.py output
    Cookie is set to:
    {'username': 'username', 'loggedin': '1'}
    -----------
    Going to profile page...
    Hey username! Looks like you're still logged into the site!

HTTP Basic Access Authentication
    import requests
    from requests.auth import AuthBase
    from requests.auth import HTTPBasicAuth
    auth = HTTPBasicAuth('ryan', 'password')
    r = requests.post(url="http://pythonscraping.com/pages/auth/login.php", auth=auth)
    print(r.text)
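Under HTTP Basic authentication the credentials travel in an Authorization header as the Base64 encoding of "username:password". The sketch below builds that header by hand with the standard library to show what is sent; basic_auth_header is a hypothetical helper written for this illustration, and it assumes ASCII-only credentials.

```python
import base64


def basic_auth_header(username, password):
    """Hypothetical helper: returns the value of a Basic Authorization header."""
    # Join the credentials with a colon, then Base64-encode the bytes.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token


header = basic_auth_header('ryan', 'password')
print("Authorization:", header)

# Base64 is an encoding, not encryption: anyone who sees the header
# can decode it straight back to the original credentials.
decoded = base64.b64decode(header.split(" ", 1)[1]).decode("utf-8")
print(decoded)
```

Because the credentials are trivially recoverable, Basic authentication is only reasonable over HTTPS.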

CAPTCHAs -- later

Software Architecture

Architecture of Systems
    Python
    re
    urllib.request
    BeautifulSoup
    Requests
    Scrapy
    Selenium WebDriver

Testing
26.4. unittest: Unit testing framework
Source code: Lib/unittest/__init__.py
The unittest unit testing framework was originally inspired by JUnit and has a similar flavor as major unit testing frameworks in other languages. It supports test automation, sharing of setup and shutdown code for tests, aggregation of tests into collections, and independence of the tests from the reporting framework. To achieve this, unittest supports some important concepts in an object-oriented way:

test fixture: A test fixture represents the preparation needed to perform one or more tests, and any associated cleanup actions. This may involve, for example, creating temporary or proxy databases, directories, or starting a server process.
test case: A test case is the individual unit of testing. It checks for a specific response to a particular set of inputs. unittest provides a base class, TestCase, which may be used to create new test cases.
test suite: A test suite is a collection of test cases, test suites, or both. It is used to aggregate tests that should be executed together.
test runner: A test runner is a component which orchestrates the execution of tests and provides the outcome to the user. The runner may use a graphical interface, a textual interface, or return a special value to indicate the results of executing the tests.
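The four concepts above can be seen together in a few lines. This is a minimal sketch, not code from the lecture: build_login_params is a hypothetical helper standing in for whatever scraper function is under test.

```python
import unittest


def build_login_params(username, password):
    """Hypothetical helper: builds the form data for a login POST."""
    return {'username': username, 'password': password}


class LoginParamsTest(unittest.TestCase):
    """Test case: each test_* method checks one specific behaviour."""

    def setUp(self):
        # Test fixture: preparation that runs before every test method.
        self.params = build_login_params('Ryan', 'password')

    def test_username_is_kept(self):
        self.assertEqual(self.params['username'], 'Ryan')

    def test_has_exactly_two_fields(self):
        self.assertEqual(set(self.params), {'username', 'password'})


# Test suite: a collection of the test cases to execute together.
suite = unittest.TestLoader().loadTestsFromTestCase(LoginParamsTest)

# Test runner: executes the suite and reports the outcome as text.
result = unittest.TextTestRunner(verbosity=2).run(suite)
print("All passed:", result.wasSuccessful())
```

Keeping the request-building logic in a plain function like this makes it testable without touching the network.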

Next time – Selenium Web Driver