CS/COE 1520 Jarrett Billingsley HTTP
Today: HTTP Review!
HTTP
What is it?
HTTP stands for HyperText Transfer Protocol.
it was invented along with HTML to transfer… hypertext…
but it's very flexible and extensible, so we use it for just about everything on the web now.
it's an OSI Layer 7 (application) protocol, so it assumes the connection and data transfer are already handled for it.
HTTPS (Secure) is not a separate protocol: it's still HTTP, but on top of an encrypted connection.
- so many web/networking technologies end in "P" for protocol… HTTP, FTP, IP, TCP, UDP, aaaaaghhgh
- Tim Berners-Lee originally conceived of the web as a sort of distributed file system, where anyone could create/move documents.
How it works
at the most basic level, it's very simple. take the URL http://example.com/some/path.html:
    http://            how
    example.com        who/where
    /some/path.html    what
the client sends "GET /some/path.html" to the server at example.com, and the server sends back the document: <html><head>...
this is how the very first version worked. it's still essentially the same, just with some addons.
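to make the how / who-where / what split concrete, here is a minimal sketch using Python's standard urllib.parse module. the variable names (how, where, what, request_line) are just illustrative labels taken from the slide, not anything HTTP defines.

```python
from urllib.parse import urlsplit

url = "http://example.com/some/path.html"
parts = urlsplit(url)

how = parts.scheme     # "http": which protocol to speak
where = parts.netloc   # "example.com": which server to contact
what = parts.path      # "/some/path.html": which resource to ask for

# The start of the request the client would send to that server.
request_line = f"GET {what}"

print(how, where, what)
print(request_line)
```

the scheme and host decide where the connection goes; only the path ends up in the request itself.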
Don't Forget
in stateful protocols, the server keeps track of who is connected. for example, SSH and FTP are stateful.
the client logs in, the server remembers it ("jfb42 is logged in."), and then bidirectional data flows over that connection.
but what happens if the connection goes down?
or what if 400,000 people want to connect at once?
or what if someone connects and never disconnects?
Amnesiac
HTTP is stateless. every client-server exchange is independent.
the client sends request 1 and the server sends response 1; then the client sends request 2 and the server sends response 2.
this avoids all those previous problems. of course, it introduces problems of its own, but…
- connection goes down? no problem, the server doesn't care. it just sees requests.
- lots of people connect at once? no problem, the server doesn't need to allocate a bunch of resources for each connection.
- it scales much better, is what I mean.
- impossible for someone to connect and use up a connection.
HTTP/1.1 Requests
HTTP is primarily a textual protocol. a request (from client to server) looks like this:
    GET /some/path.html HTTP/1.1
    Host: example.com
    User-Agent: Mozilla/5.0
    Accept: */*
- GET: the HTTP method we want to perform.
- HTTP/1.1: identifies the version.
- User-Agent: what browser or program this request is coming from.
- Accept: what kind of data the client is expecting in return (in this case, anything at all).
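since a request is just text, it can be assembled with ordinary string operations. this is a minimal sketch, not a real client: build_request is a hypothetical helper, and it only handles requests with no body. the one detail the slide doesn't show is that HTTP/1.1 lines end in CRLF ("\r\n") and a blank line ends the headers.

```python
def build_request(method, path, headers):
    """Return the raw text of an HTTP/1.1 request that has no body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends in CRLF, and one extra blank line ends the headers.
    return "\r\n".join(lines) + "\r\n\r\n"

request_text = build_request("GET", "/some/path.html", {
    "Host": "example.com",
    "User-Agent": "Mozilla/5.0",
    "Accept": "*/*",
})
print(request_text)
```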
HTTP/1.1 Responses
the server replies with something like this:
    HTTP/1.1 200 OK
    Content-Type: text/html
    Content-Length: 483
    Server: nginx
    Connection: close

    <html>
    <head>...
- 200 OK: the response's status.
- Content-Type, Content-Length: describe the content.
- many other headers may be present.
- and after a blank line, the content!
binary files (images, videos, archives) are not sent as text; after the blank line, it's just a blob o' bytes.
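the "headers, blank line, then a blob of bytes" layout is easy to pull apart by hand. a minimal sketch, assuming the whole response is already in memory: parse_response is a hypothetical helper, and the body and its Content-Length of 18 are made up for the demo (the slide's 483-byte page isn't shown in full).

```python
def parse_response(raw):
    """Split raw HTTP/1.1 response bytes into status line parts, headers, and body."""
    # The first blank line (CRLF CRLF) separates the headers from the content.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body

raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 18\r\n"
    b"Server: nginx\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"<html>hello</html>"
)

version, status, reason, headers, body = parse_response(raw_response)
print(status, reason, headers["content-type"], len(body))
```

the body is kept as bytes on purpose: for an image or an archive there is nothing to decode.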
HTTP/2
HTTP/2 was standardized in 2015 (relatively recently!)
unlike HTTP/1, it's a binary protocol.
this greatly reduces the computing/parsing needed.
but interestingly, the client and server don't really need to know that.
it's handled more or less transparently behind the scenes.
the same information is presented to client/server; it's just translated between binary and text at the endpoints.
HTTP Methods
A kind of OOP
you can think of HTTP servers as objects with methods.
the original concept of the web was a distributed file system. that's not exactly how things panned out, but…
    data = server.get('/index.html')
    server.put('/myfile.txt', "hello!")
    server.delete('/yourfile.txt') // ha ha
- this is just pseudocode.
HTTP defines a number of methods that servers can implement (only GET and HEAD are required of general-purpose servers). each request is like a method call.
each method has several requirements in order to meet the specification.
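the pseudocode can be turned into a runnable toy. this is only a sketch of the analogy, not how a real server works: ToyServer is a hypothetical class that keeps "files" in a dictionary and has no networking at all.

```python
class ToyServer:
    """An in-memory stand-in for a server: each method mirrors one HTTP method."""

    def __init__(self):
        self.files = {}          # path -> content

    def get(self, path):
        # Like GET: return what is stored at the path (None if nothing is there).
        return self.files.get(path)

    def put(self, path, data):
        # Like PUT: store the data at exactly this path.
        self.files[path] = data

    def delete(self, path):
        # Like DELETE: remove whatever is at the path, if anything.
        self.files.pop(path, None)

server = ToyServer()
server.put('/myfile.txt', "hello!")
data = server.get('/myfile.txt')
server.delete('/myfile.txt')
print(data, server.get('/myfile.txt'))
```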
Methods to GET data
there are two: HEAD and GET. they're very similar.
GET gets the file at a path…
    GET /some/path.html HTTP/1.1
    Host: example.com
    User-Agent: Mozilla/5.0
    Accept: */*

    HTTP/1.1 200 OK
    Content-Type: text/html
    Content-Length: 483
    Server: nginx
    Connection: close

    <html>
    <head>...
HEAD only gets the headers.
    HEAD /some/path.html HTTP/1.1
    Host: example.com
    User-Agent: Mozilla/5.0
    Accept: */*

    HTTP/1.1 200 OK
    Content-Type: text/html
    Content-Length: 483
    Server: nginx
    Connection: close
- HEAD might be used to check whether a file exists or how big it is, or to prepare a download, or something like that.
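to see the difference without touching the real network, here is a minimal sketch that starts a throwaway server on the local machine with Python's http.server and asks it the same thing with GET and with HEAD via http.client. DemoHandler, PAGE, and fetch are hypothetical names for this demo, and the page content is made up.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><head>...</head></html>"   # made-up page content

class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)            # GET: headers, then the content

    def do_HEAD(self):
        self._send_headers()              # HEAD: same headers, no content

    def log_message(self, format, *args):
        pass                              # keep the demo output quiet

def fetch(port, method, path):
    """Send one request to the local demo server; return (status, length header, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, path)
    response = conn.getresponse()
    body = response.read()
    length = response.getheader("Content-Length")
    conn.close()
    return response.status, length, body

server = HTTPServer(("127.0.0.1", 0), DemoHandler)   # port 0: let the OS pick one
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

get_status, get_length, get_body = fetch(port, "GET", "/some/path.html")
head_status, head_length, head_body = fetch(port, "HEAD", "/some/path.html")

server.shutdown()
server.server_close()
print(get_status, get_length, len(get_body))
print(head_status, head_length, len(head_body))
```

both responses carry the same status and the same Content-Length header; only the GET response has any bytes after the blank line.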
Sending data to the server
if you, say, fill out a form or upload a file… you might use the POST method.
    POST /login.html HTTP/1.1
    Host: example.com
    User-Agent: Mozilla/5.0
    Accept: */*
    Content-Length: 33
    Content-Type: application/x-www-form-urlencoded

    username=jfb42&password=yeahright
the Content-Length and Content-Type headers plus the body look a lot like a response from a server, but we're sending data instead.
what the server actually does with the sent data is undefined…
- we'd like servers to treat POST as an idempotent method, but it's not required to be: sending the same POST twice may do the same thing twice.
the response usually looks the same as a GET response.
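the form body and its Content-Length can be produced with the standard library. a minimal sketch, assuming a plain form with two fields: urlencode does the application/x-www-form-urlencoded encoding, and the names form, headers, and request_bytes are just for this demo. nothing is actually sent anywhere.

```python
from urllib.parse import urlencode

form = {"username": "jfb42", "password": "yeahright"}

# Encode the form fields as key=value pairs joined by "&".
body = urlencode(form).encode("ascii")

headers = {
    "Host": "example.com",
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": str(len(body)),   # counted in bytes, not characters
}

lines = ["POST /login.html HTTP/1.1"] + [f"{k}: {v}" for k, v in headers.items()]
# Headers, a blank line, then the body: the same layout a response uses.
request_bytes = "\r\n".join(lines).encode("ascii") + b"\r\n\r\n" + body

print(request_bytes.decode("ascii"))
```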
Another way to send
PUT is kind of like a variable assignment: it's idempotent.
    PUT /profile/avatar HTTP/1.1
    ...skipping headers to save slide space...
the server is supposed to only update the resource at the URL. so if we send this request 2+ times, nothing should change after the first one.
on the other hand, each POST might create a new thing at some new URL.
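the PUT/POST difference shows up clearly in a toy model. this is a sketch under loud assumptions: ToyStore is a hypothetical in-memory class, the "/comments" URL and the numbering scheme for new URLs are invented for the demo, and real servers are free to handle POST differently.

```python
import itertools

class ToyStore:
    """In-memory model of a server's resources, keyed by URL."""

    def __init__(self):
        self.resources = {}
        self._ids = itertools.count(1)

    def put(self, url, data):
        # PUT: assign to exactly this URL, like a variable assignment.
        self.resources[url] = data
        return url

    def post(self, collection_url, data):
        # POST (in this toy): create a new resource at a fresh URL each time.
        new_url = f"{collection_url}/{next(self._ids)}"
        self.resources[new_url] = data
        return new_url

store = ToyStore()

store.put("/profile/avatar", "cat.png")
after_one_put = dict(store.resources)
store.put("/profile/avatar", "cat.png")
after_two_puts = dict(store.resources)

first = store.post("/comments", "hi")
second = store.post("/comments", "hi")

print(after_one_put == after_two_puts)   # repeating the PUT changed nothing
print(first, second)                     # repeating the POST made two things
```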
A full list of methods
there are only 9! but they're so abstract, it doesn't matter.

    Name      Sends data?   Safe?   Idempotent?
    HEAD                    ✔       ✔
    GET                     ✔       ✔
    PUT       ✔                     ✔
    DELETE                          ✔
    POST      ✔
    PATCH     ✔
    OPTIONS   maybe         ✔       ✔
    TRACE                   ✔       ✔
    CONNECT

safe methods will not alter the server's state.
OPTIONS, TRACE, and CONNECT are for sort of "administrative" tasks.
- DELETE is for… uh, creating things? ;)
- PATCH is for updating part of a resource. like if you only want to change a few bytes in a very large file.
- OPTIONS is used for negotiating what kinds of features the server supports.
- TRACE is used to trace the route of the request/response: the server echoes back the request it received.
- CONNECT is used to establish a direct "tunnel" to a server, usually through a proxy.
Response codes
these are 3-digit codes, and the first digit identifies the kind of code.
100s are informational: "btw here's some extra info."
200s are successful: "here's what you asked for."
    200 OK: yay!
    201 Created: your POST/PUT succeeded.
300s are redirections: "go here instead."
400s are client errors: "you screwed up."
    401 Unauthorized: "you haven't logged in yet."
    403 Forbidden: "you're logged in, but you can't access that."
    404 Not Found: "idk what that is."
    418 I'm a Teapot: "I'm a teapot."
500s are server errors: "I screwed up."
    500 Internal Server Error: "ohh no."
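because the first digit carries the meaning, a program can sort any code into its class with integer division. a minimal sketch: classify and STATUS_CLASSES are hypothetical names, while HTTPStatus is Python's standard enum of known codes and is used here only to look up the official phrases.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "successful",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def classify(code):
    """Return the class of a status code based on its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")

for code in (200, 201, 401, 403, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", classify(code))
```

this also means a client that has never heard of a particular code (say, 418) can still tell it is a client error.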