Getting web pages

First we need to get the webpage by issuing an HTTP request. The best option for this is the requests library that comes with Anaconda:

    import requests
    r = requests.get('https://etc', auth=('user', 'pass'))

The username and password are optional. To get the page:

    content = r.text
Other variables and functions

    r.status_code

This holds the HTTP status code, which servers return alongside any HTML and files:

    200 OK
    204 No Content
    400 Bad Request
    401 Unauthorized
    403 Forbidden
    404 Not Found
    408 Request Timeout
    500 Internal Server Error
    502 Bad Gateway (for servers passing on requests elsewhere)
    504 Gateway Timeout (for servers passing on requests elsewhere)
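To make the groups of status codes concrete, here is a minimal sketch using only the standard library's http.HTTPStatus. The helper name describe_status is an assumption made for illustration and is not part of requests; in practice you would pass it r.status_code.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short summary of an HTTP status code (hypothetical helper)."""
    status = HTTPStatus(code)  # raises ValueError for codes it does not know
    if 200 <= status.value < 300:
        kind = "success"
    elif 300 <= status.value < 400:
        kind = "redirection"
    elif 400 <= status.value < 500:
        kind = "client error"
    elif 500 <= status.value < 600:
        kind = "server error"
    else:
        kind = "informational"
    return f"{status.value} {status.phrase} ({kind})"


# The codes below are taken from the list above.
for code in (200, 404, 502):
    print(describe_status(code))
```

requests also offers r.ok and r.raise_for_status() if you only need to know whether the request succeeded.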
JSON

You can use requests to get JSON files from the web and translate them into a Python object: the same mix of dicts and lists that the json library produces.

    json_object = r.json()
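As a sketch of what that object looks like, the example below parses a small JSON string with the standard json library, which gives the same kind of structure as r.json(). The sample text and its keys are invented for illustration and stand in for the body of a real response.

```python
import json

# Hypothetical JSON text, standing in for the body of a response (r.text).
sample = '{"name": "station-1", "readings": [1.5, 2.0], "active": true, "notes": null}'

data = json.loads(sample)

# JSON objects become dicts, arrays become lists,
# true/false become True/False, and null becomes None.
print(type(data).__name__)
print(data["readings"][1])
print(data["active"], data["notes"])
```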
Other options

    Ability to deal with cookies.
    Ability to pass parameters to servers in a variety of ways.
    Ability to maintain sessions with a server.
    Ability to issue custom headers representing different browsers ("user-agent"), etc.
    Ability to deal with streaming.
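A minimal sketch of two of these options, parameters and custom headers, used through a session. Nothing is sent over the network here: the request is only prepared so that we can inspect it. The URL, the "page" parameter, and the user-agent string are placeholders chosen for illustration.

```python
import requests

with requests.Session() as session:
    # Headers set on the session are reused for every request made through it.
    session.headers.update({"User-Agent": "my-scraper/0.1"})

    # Build the request without sending it; params become the query string.
    request = requests.Request(
        "GET",
        "https://example.com/search",
        params={"page": "2"},
    )
    prepared = session.prepare_request(request)

print(prepared.url)
print(prepared.headers["User-Agent"])
```

To actually send the request you would call session.send(prepared), or more simply session.get(url, params=...).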
Processing webpages

The best library for this is beautifulsoup:

    import bs4
    soup = bs4.BeautifulSoup(content, 'html.parser')
How to get elements

Getting elements by ID or other attributes:

    table = soup.find(id="yxz")
    tds = soup.find_all(attrs={"class" : "y"})

Getting all elements of a specific tag:

    trs = table.find_all('tr')
    for tr in trs:
        pass  # Do something with the "tr" variable.

Getting elements inside another and getting their text (the rough equivalent of innerHTML, without the tags):

    for tr in trs:
        tds = tr.find_all("td")
        for td in tds:
            print(td.text)

All tags are lowercased during search.
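The calls above can be tried without a web server by parsing a small HTML string. This is only a sketch: the table contents are made up for illustration, and the id "yxz" and class "y" simply reuse the placeholder names from the examples.

```python
import bs4

# Hypothetical page content; normally this would come from r.text.
html = """
<table id="yxz">
  <tr><td class="y">1</td><td>2</td></tr>
  <tr><td class="y">3</td><td>4</td></tr>
</table>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

# Find the table by its id, then collect the text of every cell, row by row.
table = soup.find(id="yxz")
rows = [[td.text for td in tr.find_all("td")] for tr in table.find_all("tr")]

# Find every element with class "y", wherever it is in the page.
class_y = [td.text for td in soup.find_all(attrs={"class": "y"})]

print(rows)
print(class_y)
```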
Client side coding

Generally done in JavaScript. Very similar to Python.

Each statement ends in a semicolon; blocks are defined by {}

    function dragStart(ev) {}

    if (a < b) {
    } else {
    }

    for (a = 0; a < b; a++) {}

    var a = 12;
    var a = [1,2,3];

    // Comment

    /**
     * Comment
     **/
Getting elements in JavaScript

document is the root of the page.

    var a = document.getElementById("yxz");
    var a = document.getElementsByClassName("datatable");
    var tds = document.getElementsByTagName("TD");

Getting text:

    alert(tds[0].innerHTML);        // popup box
    console.log(tds[0].innerHTML);  // Browser console (F12 to open with most)

Setting text:

    tds[0].innerHTML = "2";
Connecting JavaScript

JavaScript is largely run through Event Based Programming. Each HTML element has specific events associated with it. We attach a function to an event so that it runs when the event fires, thus:

    <SPAN id="clickme" onclick="functionToRun()">Push</SPAN>

    <BODY onload="functionToRun()">
Where to put JavaScript

Functions are placed between <script> </script> tags in either the head or the body. In the body, any code that is not inside a function runs in the order the page loads.

Alternatively, the code can be in an external script, linked to with a filename or URL in the body or head, thus:

    <script src="script.js"></script>
Example

    <HTML>
    <HEAD>
    <SCRIPT>
    function clicked() {
        var a = document.getElementById("clickme");
        a.innerHTML = "changed";
    }
    </SCRIPT>
    </HEAD>
    <BODY>
    <SPAN id="clickme" onclick="clicked()">Push</SPAN>
    </BODY>
    </HTML>