HTTP CS587x Lecture Department of Computer Science Iowa State University
What to Cover
- WWW
- HTTP/1.0
  - Protocol highlights
  - Problems
- HTTP/1.1
  - Highlights of improvement
World Wide Web (WWW)
- Core components
  - Servers: store files and execute remote commands
  - Browsers (i.e., clients): retrieve and display "pages" of content linked by hypertext
  - Networks: send information back and forth upon request
- Problems
  - How to identify an object
  - How to retrieve an object
  - How to interpret an object
Semantic Parts of WWW
- URI (Uniform Resource Identifier)
  - Format: protocol://hostname:port/directory/object
  - Example: ftp://popeye.cs.iastate.edu/welcome.txt
  - Implementation: extend the hierarchical namespace to include anything in a file system, as well as server-side processing
- HTTP (Hypertext Transfer Protocol)
  - An application protocol for sending and receiving information
- HTML (Hypertext Markup Language)
  - A language specification used to interpret the information received from the server
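As a rough illustration of the URI layout above, the following sketch splits the slide's example into its parts with the standard java.net.URI class. The class name UriParts and its describe helper are made up for this example and are not part of the lecture code.

```java
import java.net.URI;

// Hypothetical helper (not from the lecture): names the URI parts listed on the slide.
final class UriParts {
    static String describe(String text) {
        URI uri = URI.create(text);          // throws IllegalArgumentException on bad syntax
        return "protocol=" + uri.getScheme()
             + " hostname=" + uri.getHost()
             + " port=" + uri.getPort()      // -1 means no explicit port was given
             + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        System.out.println(describe("ftp://popeye.cs.iastate.edu/welcome.txt"));
    }
}
```

When the port is omitted, as in the slide's example, getPort() reports -1 and the client falls back to the protocol's default port.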
HTTP Properties
- Request-response exchange
  - Server runs over TCP, port 80
  - Client sends HTTP requests and gets responses from the server
  - Synchronous request/reply protocol
- Stateless
  - No state is maintained by clients or servers across requests and responses
  - Each request/response pair is treated as an independent message exchange
- Resource metadata
  - Information about resources is often included in web transfers and can be used in several ways
HTTP Commands
- GET: transfer the resource at the given URL
- HEAD: get resource metadata (headers) only
- PUT: store or modify the resource under the given URL
- DELETE: remove the resource
- POST: provide input for a process identified by the given URL (usually used to post CGI parameters)
Response Codes of HTTP 1.0
- 2xx: success
- 3xx: redirection
- 4xx: client error in the request
- 5xx: server error; can't satisfy the request
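The grouping by first digit can be expressed in a few lines of code. This is only a sketch: StatusClass and classify are hypothetical names, and the returned labels simply repeat the classes listed above.

```java
// Hypothetical helper: maps a numeric status code to the classes listed on the slide.
final class StatusClass {
    static String classify(int code) {
        switch (code / 100) {                // the first digit selects the class
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(404));
    }
}
```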
Steps of Processing an HTTP Request
- The client
  1. Contact its local DNS server to find out the IP address of the web server
  2. Initiate a TCP connection to the server's HTTP port (80 by default)
  3. Send the GET request via the established socket: GET /index.html HTTP/1.0
- The server
  4. Send its response containing the requested file
  5. Tell TCP to terminate the connection
- The browser
  6. Parse the file and display it accordingly
  7. Repeat the same steps for any embedded objects
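Steps 1 to 5 can be sketched from the client's side with plain sockets. This is a minimal illustration, not a full client: SimpleHttpClient, buildRequest and fetch are hypothetical names, and the host and path are supplied by the caller rather than taken from the slides.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical client sketch following the numbered steps above.
final class SimpleHttpClient {
    // Request line plus the empty line that ends an HTTP/1.0 request without headers.
    static String buildRequest(String path) {
        return "GET " + path + " HTTP/1.0\r\n\r\n";
    }

    static String fetch(String host, int port, String path) throws IOException {
        InetAddress address = InetAddress.getByName(host);       // step 1: DNS lookup
        try (Socket socket = new Socket(address, port)) {        // step 2: TCP connection
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(path).getBytes(StandardCharsets.US_ASCII)); // step 3
            out.flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {             // steps 4-5: read until the server closes
                response.append(line).append('\n');
            }
            return response.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SimpleHttpClient <host> <path>; nothing is fetched without arguments.
        if (args.length == 2) {
            System.out.print(fetch(args[0], 80, args[1]));
        }
    }
}
```

Reading until end of stream works here only because an HTTP/1.0 server closes the connection after each response.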
Server Response
    HTTP/ OK
    Content-Type: text/html
    Content-Length: 1234
    Last-Modified: Mon, 19 Nov :31:20 GMT

    CS Home Page …
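HTTP/1.0 Example
- Client -> Server: request file 1
- Server -> Client: transfer file 1
- Client -> Server: request file 2
- Server -> Client: transfer file 2
- ...
- Client -> Server: request file n
- Server -> Client: transfer file n
- Client: finish displaying the page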
HTTP Server Implementation
    import java.io.*;
    import java.net.*;

    public class WebServerDemo {
        public static void main(String[] args) throws IOException {
            ServerSocket ss = new ServerSocket(80); // listen on the default HTTP port
            for (;;) {
                // accept connection
                Socket accept = ss.accept();
                // Start a thread to process the request
                new Handler(accept).start();
            }
        }
    }
HTTP Server Implementation
    class Handler extends Thread { // Handler for an HTTP request
        Socket socket;
        BufferedReader br;
        PrintWriter pw;

        public Handler(Socket _socket) {
            socket = _socket;
        }

        public void run() {
            try {
                br = new BufferedReader(new InputStreamReader(socket.getInputStream()));
                pw = new PrintWriter(new OutputStreamWriter(socket.getOutputStream()));
                String line = br.readLine(); // Read HTTP request from user
                if (line != null && line.toUpperCase().startsWith("GET")) {
                    // parse the string to find the file name
                    // locate the file and send it back
                    // ...
                }
                // other commands: post, delete, put, etc.
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
HTTP/1.0 Caching
- Client
  - GET request header If-Modified-Since: the server returns a "not modified" response if the resource was not modified since the specified time
  - Request header Pragma: no-cache: ignore all caches and get the resource directly from the server
- Server
  - Response header Expires: tells the client until when it is safe to cache the resource
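The server-side decision behind If-Modified-Since is a single comparison. The sketch below shows it under the assumption that both timestamps are already parsed; ConditionalGet and status are hypothetical names, and a real server would first parse the header's date string.

```java
import java.time.Instant;

// Hypothetical helper: decides between a full response and "not modified".
final class ConditionalGet {
    // ifModifiedSince is null when the request carried no If-Modified-Since header.
    static int status(Instant lastModified, Instant ifModifiedSince) {
        if (ifModifiedSince != null && !lastModified.isAfter(ifModifiedSince)) {
            return 304; // Not Modified: headers only, the client reuses its cached copy
        }
        return 200;     // OK: send the full resource
    }

    public static void main(String[] args) {
        Instant cachedAt = Instant.ofEpochSecond(2_000);
        System.out.println(status(Instant.ofEpochSecond(1_000), cachedAt));
    }
}
```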
Issues with HTTP/1.0
- Each resource requires a new connection
  - Large number of embedded objects in a web page
  - Many short-lived connections
- Serial vs. parallel connections
  - Serial connections download one object at a time (e.g., Mosaic), causing long latency to display a whole page
  - Parallel connections (e.g., Netscape) open several connections at once (typically 4), contributing to network congestion
- HTTP uses TCP as the transport protocol
  - TCP is not optimized for the typical short-lived connections
  - Most Internet traffic fits in 10 packets (overhead: 7 out of 17)
  - Too slow for small objects
  - May never exit the slow-start phase
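The remark that a short transfer may never leave slow start can be made concrete with a small idealized model. The sketch assumes a congestion window that starts at one packet and doubles every round trip, with no loss and without counting the connection handshake; real TCP stacks differ. SlowStartModel and roundTrips are hypothetical names.

```java
// Idealized slow-start model (an assumption for illustration, not a TCP implementation).
final class SlowStartModel {
    // Number of round trips needed to send the given number of packets.
    static int roundTrips(int packets) {
        int window = 1;   // packets allowed in the current round trip
        int sent = 0;
        int trips = 0;
        while (sent < packets) {
            sent += window;
            window *= 2;  // exponential growth during slow start
            trips++;
        }
        return trips;
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(10));
    }
}
```

Under these assumptions a 10-packet transfer finishes in 4 round trips (1 + 2 + 4 + 8 packets), so the connection ends while the window is still growing.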
Highlights of HTTP/1.1
- Persistent connections
- Pipelined requests/responses
- Support for virtual hosting
- More explicit support for caching
  - Internet Cache Protocol (ICP)
- Content negotiation/adaptation
- Range requests
Persistent Connections
- The basic idea is to reduce the number of TCP connections opened and closed
  - Reduces TCP connection costs
  - Reduces latency by avoiding multiple TCP slow-starts
  - Avoids wasted bandwidth and reduces overall congestion
  - A longer-lived TCP connection learns more about network conditions (Why?)
- Proposed new GET-style methods (not part of the final HTTP/1.1 standard)
  - GETALL
  - GETLIST
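On the server side, persistence comes down to deciding whether to close the socket after a response. The sketch below captures the usual rule: HTTP/1.1 connections stay open unless the peer sends "Connection: close", while HTTP/1.0 connections close unless the peer asks for "keep-alive" (a common extension). KeepAlive and shouldKeepOpen are hypothetical names, and the sketch assumes the Connection header holds a single token.

```java
import java.util.Locale;

// Hypothetical helper: decides whether to keep a connection open after a response.
final class KeepAlive {
    // connectionHeader is the value of the Connection header, or null if it was absent.
    static boolean shouldKeepOpen(String httpVersion, String connectionHeader) {
        String value = connectionHeader == null
                ? ""
                : connectionHeader.trim().toLowerCase(Locale.ROOT);
        if (httpVersion.equals("HTTP/1.1")) {
            return !value.equals("close");   // persistent by default
        }
        return value.equals("keep-alive");   // HTTP/1.0: persistent only on request
    }

    public static void main(String[] args) {
        System.out.println(shouldKeepOpen("HTTP/1.1", null));
    }
}
```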
Pipelined Requests/Responses
- Buffer requests and responses to reduce the number of packets
- Multiple requests can be contained in one TCP segment
- Note: the order of responses has to be maintained
- Exchange: the client sends Request 1, Request 2, Request 3 back to back; the server returns Transfer 1, Transfer 2, Transfer 3 in the same order
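Buffering several requests before writing them to the socket might look like the sketch below. It only builds the request text; reading the responses back in request order is left out. PipelineSketch and buildRequests are hypothetical names, and the host name is whatever the caller passes in.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: concatenates GET requests so they can be sent in one write.
final class PipelineSketch {
    static String buildRequests(String host, List<String> paths) {
        StringBuilder buffer = new StringBuilder();
        for (int i = 0; i < paths.size(); i++) {
            buffer.append("GET ").append(paths.get(i)).append(" HTTP/1.1\r\n");
            buffer.append("Host: ").append(host).append("\r\n");
            if (i == paths.size() - 1) {
                buffer.append("Connection: close\r\n"); // last request: let the server close
            }
            buffer.append("\r\n");                      // empty line ends each request
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildRequests("www.example.com", Arrays.asList("/a.html", "/b.gif")));
    }
}
```

Because the requests are small, the whole buffer often fits in a single TCP segment, which is the packet saving the slide refers to.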
Support for Virtual Hosting
- Problem: outsourcing web content to some company
  - In HTTP/1.0, the request carries only the path, not the name of the site:
      GET /index.html HTTP/1.0
  - It is not possible to run two web servers at the same IP address, because the GET is ambiguous
- HTTP/1.1 addresses this by adding the "Host" header, which names the site the request is for
      GET /index.html HTTP/1.1
      Host:
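A server that hosts several sites on one IP address can use the Host header to pick a document root. The sketch below shows one way to do the lookup; VirtualHosts, the host names a.example and b.example, and the directory paths are all invented for illustration. It does not handle IPv6 literals, which contain colons themselves.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup table from Host header values to document roots.
final class VirtualHosts {
    private final Map<String, String> docroots = new HashMap<>();

    void add(String hostName, String docroot) {
        docroots.put(hostName.toLowerCase(Locale.ROOT), docroot);
    }

    // Returns the document root for a Host header value, or null if the host is unknown.
    String resolve(String hostHeader) {
        if (hostHeader == null) {
            return null;
        }
        String host = hostHeader.trim().toLowerCase(Locale.ROOT); // host names are case-insensitive
        int colon = host.indexOf(':');
        if (colon >= 0) {
            host = host.substring(0, colon);                      // drop an optional ":port"
        }
        return docroots.get(host);
    }

    public static void main(String[] args) {
        VirtualHosts hosts = new VirtualHosts();
        hosts.add("a.example", "/srv/a");
        hosts.add("b.example", "/srv/b");
        System.out.println(hosts.resolve("b.example:80"));
    }
}
```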
Content Negotiation/Adaptation
- A resource may have more than one representation
  - Different languages
  - Different sizes of images, etc.
- Example
      GET /index.html HTTP/1.1
      Host:
      Accept-Language: en-us, fr-BE
- Two approaches
  - Agent-driven: the client receives a set of alternative representations of the response, chooses the best one, and indicates its choice in a second request
  - Server-driven: the server chooses the representation based on what is available at the server, the headers in the request message, or information about the client, such as its IP address
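Server-driven negotiation on Accept-Language can be sketched as follows. The code picks the available language with the highest q-value, treating a missing q as 1.0; it matches tags exactly and ignores prefix matching and wildcards, which a complete implementation would need. LanguageNegotiation and choose are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: simplified server-driven choice of a language variant.
final class LanguageNegotiation {
    // Returns the preferred available language, or null when nothing matches.
    static String choose(String acceptLanguage, List<String> available) {
        String best = null;
        double bestQ = 0.0;
        for (String item : acceptLanguage.split(",")) {
            String[] parts = item.trim().split(";");
            String tag = parts[0].trim().toLowerCase(Locale.ROOT);
            double q = 1.0;                               // default weight without a q parameter
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0.0;                          // unreadable weight: treat as not acceptable
                    }
                }
            }
            for (String candidate : available) {
                if (candidate.toLowerCase(Locale.ROOT).equals(tag) && q > bestQ) {
                    best = candidate;
                    bestQ = q;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(choose("en-us, fr-BE", Arrays.asList("fr-BE", "de")));
    }
}
```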
Range Request
- A user may want to load only some portion of the content
  - E.g., retrieve only the newly appended portion
  - E.g., load some pages of a PDF file
- Example
      GET bigfile.html HTTP/1.1
      Host:
      Range:
      Range:
      Range: 2000-
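In HTTP/1.1 the Range header value names a unit, for example "bytes=2000-" for everything from offset 2000 onward. The sketch below resolves such a value against a resource length; it handles a single range only. ByteRange and resolve are hypothetical names.

```java
// Hypothetical helper: resolves a single byte range against a resource length.
final class ByteRange {
    // Returns {first, last} as inclusive byte offsets, or null if the range is unusable.
    static long[] resolve(String headerValue, long length) {
        if (headerValue == null || !headerValue.startsWith("bytes=")) {
            return null;
        }
        String spec = headerValue.substring("bytes=".length()).trim();
        int dash = spec.indexOf('-');
        if (dash < 0 || spec.indexOf(',') >= 0) {
            return null;                                 // malformed, or a multi-range request
        }
        String from = spec.substring(0, dash).trim();
        String to = spec.substring(dash + 1).trim();
        try {
            long first;
            long last;
            if (from.isEmpty()) {                        // "-N": the last N bytes
                if (to.isEmpty()) {
                    return null;
                }
                long suffix = Long.parseLong(to);
                if (suffix <= 0) {
                    return null;
                }
                first = Math.max(0, length - suffix);
                last = length - 1;
            } else {                                     // "A-" or "A-B"
                first = Long.parseLong(from);
                last = to.isEmpty() ? length - 1 : Math.min(Long.parseLong(to), length - 1);
            }
            if (first > last || first >= length) {
                return null;                             // nothing to send
            }
            return new long[] { first, last };
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        long[] range = resolve("bytes=2000-", 5000);
        System.out.println(range[0] + "-" + range[1]);
    }
}
```

The open-ended form covers the "newly appended portion" case: a client that already holds the first 2000 bytes asks only for the rest.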
Cache-Control Request Directives
- no-cache: force revalidation with the origin server
- only-if-cached: obtain the resource only from a cache
- no-store: don't allow caches to store the request/response
- max-age: the response's age should be no greater than this value
- max-stale: an expired response is OK, but not if it has been stale for longer than the stated value
- min-fresh: the response should remain fresh for at least the stated value
- no-transform: proxies should not change the media type
Cache-Control Response Directives
- public: OK to cache the response anywhere
- private: the response is for a specific user only
- no-cache: do not serve from cache without prior revalidation
  - Must revalidate regardless of when the response becomes stale
- no-store: caches are not permitted to store the response or the request
- no-transform: proxies should not change the media type
- must-revalidate: can be cached, but must be revalidated if stale
  - A file may be associated with an age (expiration)
- proxy-revalidate: force shared caches to revalidate a stale cached response
- max-age: the response's age should be no greater than this value
- s-maxage: shared caches use this value as the response's maximum age (overrides max-age)
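A shared cache could apply these response directives roughly as in the sketch below. It parses the header into a directive map and then decides whether a stored response may be served without revalidation. CacheControl, parse and sharedCacheMayServe are hypothetical names; quoted directive values that contain commas are not handled, and a response without an explicit lifetime is always revalidated here, which is a simplification.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: simplified handling of Cache-Control response directives.
final class CacheControl {
    // Parses "public, max-age=3600" into {public="", max-age="3600"}.
    static Map<String, String> parse(String headerValue) {
        Map<String, String> directives = new LinkedHashMap<>();
        for (String item : headerValue.split(",")) {
            String text = item.trim();
            if (text.isEmpty()) {
                continue;
            }
            int eq = text.indexOf('=');
            String name = (eq < 0 ? text : text.substring(0, eq)).trim().toLowerCase(Locale.ROOT);
            String value = eq < 0 ? "" : text.substring(eq + 1).trim();
            directives.put(name, value);
        }
        return directives;
    }

    // True if a shared cache may serve the stored response without revalidating it.
    static boolean sharedCacheMayServe(Map<String, String> directives, long ageSeconds) {
        if (directives.containsKey("no-store")
                || directives.containsKey("no-cache")
                || directives.containsKey("private")) {
            return false;
        }
        // s-maxage takes precedence over max-age for shared caches.
        String limit = directives.containsKey("s-maxage")
                ? directives.get("s-maxage")
                : directives.get("max-age");
        if (limit == null) {
            return false;
        }
        try {
            return ageSeconds < Long.parseLong(limit);
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(sharedCacheMayServe(parse("public, max-age=3600"), 100));
    }
}
```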
Factors to Consider for Cache Replacement
- Cost of storing the resource (size)
- Cost of fetching the resource (size + distance)
- The time since the last modification of the resource
- The number of accesses to the resource in the past
- The probability of the resource being accessed in the near future
  - May be known a priori or estimated from the past access pattern
- The heuristic expiration time
  - If there is no server-specified expiration time, the cache decides on a heuristic expiration time
  - If no expired resources are available as candidates, then resources that are close to their expiration time are prioritized as candidates for replacement
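As one concrete policy, least-recently-used (LRU) replacement uses the past access pattern as a stand-in for the probability of future access. It ignores the other factors listed above (size, fetch cost, expiration). The sketch relies on the access-order mode of java.util.LinkedHashMap; LruCache is a hypothetical class name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache: evicts the least recently used entry once over capacity.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);   // accessOrder=true: entries are ordered by most recent access
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // called after each insertion
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("/a.html", "A");
        cache.put("/b.html", "B");
        cache.get("/a.html");      // touch /a.html so /b.html becomes the eviction candidate
        cache.put("/c.html", "C");
        System.out.println(cache.keySet());
    }
}
```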
Summary
- HTTP 1.0
- HTTP 1.1
What we have covered so far (protocol stack, top to bottom)
- HTTP, DNS
- TCP, UDP
- IP
- Ethernet, FDDI, Token, etc.
FYI
HTTP Server (1)
    import java.io.*;
    import java.net.*;
    import java.util.*;

    public class WebServerDemo {
        protected String docroot;   // Directory of HTML pages and other files
        protected int port;         // Port number of web server
        protected ServerSocket ss;  // Socket for the web server

        class Handler extends Thread { // Handler for an HTTP request
            protected Socket socket;
            protected PrintWriter pw;
            protected BufferedOutputStream bos;
            protected BufferedReader br;
            protected File docroot;

            public Handler(Socket _socket, String _docroot) throws Exception {
                socket = _socket;
                docroot = new File(_docroot).getCanonicalFile(); // Canonical (absolute) document root
            }
HTTP Server (2)
            public void run() {
                try {
                    // Prepare our readers and writers
                    br = new BufferedReader(new InputStreamReader(socket.getInputStream()));
                    bos = new BufferedOutputStream(socket.getOutputStream());
                    pw = new PrintWriter(new OutputStreamWriter(bos));
                    String line = br.readLine(); // Read HTTP request from user
                    socket.shutdownInput();      // Shutdown any further input
                    if (line == null) {
                        socket.close();
                        return;
                    }
                    if (line.toUpperCase().startsWith("GET")) {
                        // Eliminate any trailing ? data, such as for a CGI GET request
                        StringTokenizer tokens = new StringTokenizer(line, " ?");
                        tokens.nextToken();
                        String req = tokens.nextToken();
                        String name; // ... form a full filename
                        if (req.startsWith("/") || req.startsWith("\\"))
                            name = this.docroot + req;
                        else
                            name = this.docroot + File.separator + req;
                        File file = new File(name).getCanonicalFile(); // Get canonical file path
                        // Reject requests that resolve outside the document root. Comparing whole
                        // path components keeps a sibling such as "/www-private" from passing as "/www".
                        if (!file.toPath().startsWith(this.docroot.toPath())) {
                            // Status line ends with CRLF; the empty line ends the headers
                            pw.print("HTTP/1.0 403 Forbidden\r\n\r\n");
                        }
HTTP Server (3)
                        // run() continued
                        else if (!file.canRead()) { // No access
                            pw.print("HTTP/1.0 403 Forbidden\r\n\r\n");
                        } else if (file.isDirectory()) { // Directory, not file
                            sendDir(bos, pw, file, req);
                        } else {
                            sendFile(bos, pw, file.getAbsolutePath());
                        }
                    } else { // Unsupported command
                        pw.print("HTTP/1.0 501 Not Implemented\r\n\r\n");
                    }
                    pw.flush();
                    bos.flush();
                } catch (Exception e) {
                    e.printStackTrace();
                }
                try {
                    socket.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            } // run()

            protected void sendFile(BufferedOutputStream bos, PrintWriter pw, String filename) throws Exception {
                // try-with-resources closes the file even if the client disconnects mid-transfer
                try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(filename))) {
                    byte[] data = new byte[10 * 1024];
                    int read = bis.read(data);
                    pw.print("HTTP/1.0 200 OK\r\n\r\n");
                    pw.flush();
                    bos.flush();
                    while (read != -1) {
                        bos.write(data, 0, read);
                        read = bis.read(data);
                    }
                    bos.flush();
                } catch (Exception e) {
                    pw.flush();
                    bos.flush();
                }
            }
HTTP Server (4)
            protected void sendDir(BufferedOutputStream bos, PrintWriter pw, File dir, String req) throws Exception {
                try {
                    pw.print("HTTP/1.0 200 OK\r\n\r\n");
                    pw.flush();
                    // Minimal HTML wrapper for the listing (assumed markup)
                    pw.print("<html><head><title>Directory of " + req + "</title></head><body><h1>Directory of " + req + "</h1>");
                    File[] contents = dir.listFiles();
                    for (int i = 0; i < contents.length; i++) {
                        pw.print("<a href=\"" + req + contents[i].getName());
                        if (contents[i].isDirectory()) pw.print("/");
                        pw.print("\">");
                        if (contents[i].isDirectory()) pw.print("Dir -> ");
                        pw.println(contents[i].getName() + "</a><br>");
                    }
                    pw.println("</body></html>");
                    pw.flush();
                } catch (Exception e) {
                    pw.flush();
                    bos.flush();
                }
            }
        } // class Handler

        protected void parseParams(String[] args) throws Exception {
            switch (args.length) { // Check that a filepath has been specified and a port number
                case 1:
                case 0:
                    System.err.println("Syntax: " + this.getClass().getName() + " docroot port");
                    System.exit(1); // non-zero exit status signals the usage error
                default:
                    this.docroot = args[0];
                    this.port = Integer.parseInt(args[1]);
                    break;
            }
        }
HTTP Server (5)
        public WebServerDemo(String[] args) throws Exception {
            System.out.println("Checking for parameters");
            parseParams(args); // Check for command line parameters
            System.out.print("Starting web server ");
            this.ss = new ServerSocket(this.port); // Create a new server socket
            System.out.println("OK");
            for (;;) { // Forever
                Socket accept = ss.accept(); // Accept connection via server socket
                // Start a new handler instance to process the request
                new Handler(accept, docroot).start();
            }
        }

        // Start an instance of the web server
        public static void main(String[] args) throws Exception {
            WebServerDemo webServerDemo = new WebServerDemo(args);
        }
    }