Published by Laurence Murphy. Modified over 6 years ago.
1
Getting web pages
First we need to get the webpage by issuing an HTTP request. The best option for this is the requests library that comes with Anaconda:
    import requests
    r = requests.get(url, auth=('user', 'pass'))
Here url is a string holding the address of the page. The username and password are optional. To get the page:
    content = r.text
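The request above can be wrapped in a small helper. This is only a sketch: the name fetch_page and the 10-second timeout are assumptions made for illustration and are not from the slides. It fetches a page, optionally with a username and password, and raises an exception if the server reports an error.

```python
import requests


def fetch_page(url, auth=None, timeout=10):
    """Return the body of the page at `url` as text.

    `auth` is an optional (username, password) tuple.
    The helper name and the timeout value are illustrative choices.
    """
    r = requests.get(url, auth=auth, timeout=timeout)
    r.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return r.text
```

It would be called as content = fetch_page(url, auth=('user', 'pass')), or without auth for pages that need no login.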
2
Other variables and functions
r.status_code
HTTP status codes are returned by servers along with any HTML and files:
    200 OK
    204 No Content
    400 Bad Request
    401 Unauthorized
    403 Forbidden
    404 Not Found
    408 Request Timeout
    500 Internal Server Error
    502 Bad Gateway (for servers passing on requests elsewhere)
    504 Gateway Timeout (for servers passing on requests elsewhere)
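One way to turn a numeric code such as r.status_code into a readable label is the standard library's http.HTTPStatus. The helper names describe_status and is_success below are made up for this sketch.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short label such as '404 Not Found' for a numeric status code."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # Not a status code the standard library knows about.
        return f"{code} Unknown"


def is_success(code):
    """True for the 2xx codes, which mean the request worked."""
    return 200 <= code < 300


for code in (200, 204, 404, 502):
    print(describe_status(code), "success" if is_success(code) else "failure")
```

In practice the argument would be r.status_code from a real response.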
3
JSON
You can use requests to get JSON from the web and translate it into Python objects, the same mix of dicts and lists that the json library produces:
    json_object = r.json()
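To show what that mix of dicts and lists looks like without needing a server, the sketch below parses a made-up JSON string with the json library. The string and its keys are invented for illustration; r.json() on a real response returns the same kind of object.

```python
import json

# Made-up JSON text standing in for the body of a response.
text = '{"name": "example", "values": [1, 2, 3], "nested": {"ok": true}}'

data = json.loads(text)  # r.json() returns the same kind of object

print(type(data).__name__)    # the outer JSON object becomes a dict
print(data["values"][0])      # JSON arrays become lists
print(data["nested"]["ok"])   # JSON true/false become True/False
```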
4
Other options
    Ability to deal with cookies.
    Ability to pass parameters to servers in a variety of ways.
    Ability to maintain sessions with a server.
    Ability to issue custom headers representing different browsers ("user-agent"), etc.
    Ability to deal with streaming.
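A few of these options can be sketched together. The address, the parameter q, and the user-agent string below are placeholders chosen for illustration. The request is prepared but not sent, so the example shows what would go to the server without needing a network connection.

```python
import requests

# A Session keeps cookies and default headers between requests.
session = requests.Session()
session.headers.update({"User-Agent": "example-scraper/0.1"})  # made-up value

# Parameters are passed as a dict and encoded into the URL's query string.
request = requests.Request(
    "GET",
    "https://example.com/search",  # placeholder address
    params={"q": "python"},
)

# Prepare the request without sending it, to see what would be sent.
prepared = session.prepare_request(request)
print(prepared.url)
print(prepared.headers["User-Agent"])

# To actually send it: r = session.send(prepared, timeout=10)
```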
5
Processing webpages
The best library for this is beautifulsoup:
    import bs4
    soup = bs4.BeautifulSoup(content, 'html.parser')
6
How to get elements
Getting elements by ID or other attributes:
    table = soup.find(id="yxz")
    tds = soup.find_all(attrs={"class": "y"})
Getting all elements of a specific tag:
    trs = table.find_all('tr')
    for tr in trs:
        pass  # Do something with the "tr" variable.
Getting the elements inside another element and printing their text:
    tds = tr.find_all("td")
    for td in tds:
        print(td.text)
All tags are lowercased during search.
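Putting these calls together, the sketch below pulls the cell text out of a small table. The HTML is made up for the example; only the id "yxz" follows the slide. The uppercase tags are deliberate, to show that the lowercase search still finds them.

```python
import bs4

# A small made-up page; the id "yxz" follows the example above.
content = """
<html><body>
<TABLE id="yxz">
  <TR><TD>1</TD><TD>2</TD></TR>
  <TR><TD>3</TD><TD>4</TD></TR>
</TABLE>
</body></html>
"""

soup = bs4.BeautifulSoup(content, "html.parser")
table = soup.find(id="yxz")

# Build a list of rows, each a list of the text in its cells.
rows = []
for tr in table.find_all("tr"):  # lowercase matches the <TR> tags
    rows.append([td.text for td in tr.find_all("td")])

print(rows)
```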
7
Client side coding
Generally done in JavaScript. Very similar to Python. Each statement ends in a semicolon; blocks are defined by {}.
    function dragStart(ev) {}
    if (a < b) { } else { }
    for (a = 0; a < b; a++) {}
    var a = 12;
    var a = [1,2,3];
    // Comment
    /**
     * Comment
     **/
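For comparison, here is roughly how the same constructs look in Python, with the JavaScript version in a comment beside each. The value of b and the names smaller and count are made up so that the snippet runs on its own.

```python
# function dragStart(ev) {}
def drag_start(ev):
    pass  # an empty block needs "pass" where JavaScript uses {}


a = 12  # var a = 12;
b = 20  # made-up second value, for the comparison below

# if (a < b) { } else { }
if a < b:
    smaller = "a"
else:
    smaller = "b"

# for (a = 0; a < b; a++) {}
count = 0
for a in range(0, b):
    count += 1

a = [1, 2, 3]  # var a = [1,2,3];

# Comment  (JavaScript: // Comment)
```

The main differences are that Python uses indentation and a colon where JavaScript uses braces, and does not need semicolons or var.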
8
Getting elements in JavaScript
document is the root of the page.
    var a = document.getElementById("yxz");
    var a = document.getElementsByClassName("datatable");
    var tds = document.getElementsByTagName("TD");
Getting text:
    alert(tds[0].innerHTML);        // popup box
    console.log(tds[0].innerHTML);  // Browser console (F12 opens it in most browsers)
Setting text:
    tds[0].innerHTML = "2";
9
Connecting JavaScript
JavaScript is largely run through event-based programming. Each HTML element has specific events associated with it. We attach a function to run when an event fires, thus:
    <SPAN id="clickme" onclick="functionToRun()">Push</SPAN>
    <BODY onload="functionToRun()">
10
Where to put JavaScript
Functions are placed between <script> </script> tags in either the head or body. In the body, code that is not inside a function runs in the order the page loads. Alternatively, the code can be in an external script, linked to with a filename or URL from the body or head, thus:
    <script src="script.js"></script>
11
Example
<HTML>
<HEAD>
<SCRIPT>
function clicked() {
    var a = document.getElementById("clickme");
    a.innerHTML = "changed";
}
</SCRIPT>
</HEAD>
<BODY>
<SPAN id="clickme" onclick="clicked()">Push</SPAN>
</BODY>
</HTML>