1 Lecture #7-8 HTTP – HyperText Transfer Protocol HAIT Summer 2005 Shimrit Tzur-David
2 Common Protocols In order for two remote machines to “understand” each other, they should: –“speak the same language” –coordinate their “talk” The solution is to use protocols. Examples: –FTP – File Transfer Protocol –SMTP – Simple Mail Transfer Protocol –HTTP – HyperText Transfer Protocol
3 [Figure: HTTP requests are forwarded through a Proxy Server to the Web Server, and HTTP responses return along the same path; a File System is also shown]
4 [Figure: a chain of proxies on the way to the server: Department Proxy Server, University Proxy Server, Israel Proxy Server, Web Server]
5 Terminology User agent: client which initiates a request (browser, editor, Web robot, …) Origin server: the server on which a given resource resides (Web server a.k.a. HTTP server) Proxy: acts as both a server and a client
6 Resources A resource is a chunk of information that can be identified by a URL (Uniform Resource Locator). A resource can be: –a file –a dynamically created page What we see in the browser can be a combination of several resources
7 Uniform Resource Locator protocol://host:port/path?parameters#anchor There are other types of URLs, for example: –mailto: –news:
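As a quick illustration of the URL parts, here is a small sketch using Python's standard `urllib.parse.urlsplit`. The URL is a made-up example (www.example.com, the path and the parameter names are hypothetical); note that in a real URL the ?parameters part comes before the #anchor part.

```python
from urllib.parse import urlsplit

# Hypothetical URL: host, path and parameter names are invented for illustration.
url = "http://www.example.com:8080/books/search.cgi?title=war%26peace&author=Tolstoy#results"
parts = urlsplit(url)
print(parts.scheme)     # http                (protocol)
print(parts.hostname)   # www.example.com     (host)
print(parts.port)       # 8080                (port, as an int)
print(parts.path)       # /books/search.cgi   (path)
print(parts.query)      # title=war%26peace&author=Tolstoy  (parameters, still encoded)
print(parts.fragment)   # results             (anchor)

# Other URL types are split the same way.
print(urlsplit("mailto:someone@example.com").scheme)   # mailto
```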
8 In a URL Spaces in the parameters are represented by “+” Characters such as &, + and % are encoded in the form “%xx”, where xx is the ASCII value in hexadecimal; for example, “&” = “%26” The inputs to the parameters are given as a list of parameter/value pairs: var1=value1&var2=value2&var3=value3
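The encoding rules above are implemented by Python's standard `urllib.parse` module. This is a minimal sketch; the parameter names `title` and `author` are hypothetical, chosen to match the “war&peace” / “Tolstoy” example.

```python
from urllib.parse import parse_qs, quote_plus, urlencode

print(quote_plus("war&peace"))   # war%26peace  ("&" = %26)
print(quote_plus("Ross 109"))    # Ross+109     (a space becomes "+")
print(quote_plus("100%"))        # 100%25       ("%" itself must be encoded)

# Build var1=value1&var2=value2 from hypothetical parameter names.
query = urlencode({"title": "war&peace", "author": "Tolstoy"})
print(query)                     # title=war%26peace&author=Tolstoy

# The server decodes the list back into name -> list of values.
print(parse_qs(query))           # {'title': ['war&peace'], 'author': ['Tolstoy']}
```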
9 [Figure: example form input with the values “war&peace” and “Tolstoy”]
11 Web Servers A Web server is a program that implements HTTP –It runs on some machine Serving dynamic Web content requires server-side programming The programmer must understand HTTP, and the code must manipulate HTTP messages
12 Important Features of HTTP Persistent connection (in HTTP 1.1) Stateless Proxy caching Content negotiation –For example, the client and server can agree on a gzip encoding of the HTML page
13 An HTTP 1.0 Session A basic HTTP session has four phases: 1.Client opens the connection (a TCP connection) 2.Client makes a request 3.Server sends a response 4.Server closes the connection
14 Nesting in Page [Figure: index.html is split into a left frame and a right frame, which contain further resources: Jumping fish, Fairy icon, HUJI icon] What we see in the browser can be a combination of several resources
15 Persistent Connections in HTTP 1.1 If a page has 10 inline images, then 11 HTTP 1.0 sessions are needed to display the page completely in a browser –Each session requires opening a new TCP/IP connection In HTTP 1.1, one persistent TCP/IP connection is sufficient –It takes less time to see the whole page
16 Stateless Protocol HTTP is a stateless protocol –Once a server has delivered the requested data to a client, the server retains no memory of what has just taken place (even if the connection is persistent) Server-side programming tools must provide a mechanism for maintaining state
17 The Format of HTTP Requests and Responses An initial line –In a request, the initial line contains the method –In a response, the initial line contains the status code Zero or more header lines A blank line, and An optional message body (e.g., a file, query data, or query output)
18 Headers HTTP 1.0 defines 16 headers –None are required HTTP 1.1 defines 46 headers –One header (Host:) is required in requests that are sent to Web servers –A response does not have to include any header How do we know who is the host when there is no host header?
19 Sending a Request > telnet 80 >GET /~dbi/index.html HTTP/1.0 [blank line]
20 The Response HTTP/ OK Date: Sun, 11 Mar :42:15 GMT Server: Apache/1.3.9 (Unix) Last-Modified: Sun, 25 Feb :42:15 GMT Content-Length: 479 Content-Type: text/html (html code …)
21 GET /~dbi/index.html HTTP/1.0 HTTP/ OK HTML code
22 GET /~dbi/no-such-page.html HTTP/1.0 HTTP/ Not Found HTML code
23 GET /index.html HTTP/1.1 HTTP/ Bad Request HTML code Why is it a Bad Request? HTTP/1.1 without Host Header
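The answer, sketched as code: in HTTP 1.1 one server (one IP address) may host several sites, so the Host header is mandatory, and a request without it is rejected with 400. The helper below is hypothetical and deliberately minimal; a real server performs many more checks.

```python
def status_for_request(raw_request: str) -> int:
    """Return 400 for an HTTP/1.1 request without a Host header, else 200.

    Hypothetical helper: only the version and the Host header are checked.
    """
    head = raw_request.split("\r\n\r\n", 1)[0]           # drop the body, if any
    request_line, *header_lines = head.split("\r\n")
    version = request_line.split(" ")[-1]                 # e.g. "HTTP/1.1"
    names = {line.split(":", 1)[0].strip().lower()        # header names are case-insensitive
             for line in header_lines if ":" in line}
    if version == "HTTP/1.1" and "host" not in names:
        return 400                                        # Bad Request
    return 200

print(status_for_request("GET /index.html HTTP/1.1\r\n\r\n"))   # 400
```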
24 HTTP Requests
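25 The Format of a Request [Figure: request layout] method sp URL sp version crlf | header : value crlf | (more header lines) | header : value crlf | crlf | Entity Body (sp = space, crlf = carriage return + line feed; the header : value lines are the header lines)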
26 Request Example GET /index.html HTTP/1.1 [CRLF] Accept: image/gif, image/jpeg [CRLF] User-Agent: Mozilla/4.0 [CRLF] Host: [CRLF] Connection: Keep-Alive [CRLF] [CRLF]
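The request above can be assembled programmatically. This is a minimal sketch with a hypothetical helper `build_request`; the host name www.example.com is an assumption, since the slide leaves the Host value blank.

```python
def build_request(method: str, path: str, headers: dict, body: bytes = b"") -> bytes:
    """Assemble an HTTP/1.1 request: request line, header lines, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; one extra CRLF (the blank line) ends the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

request = build_request("GET", "/index.html", {
    "Accept": "image/gif, image/jpeg",
    "User-Agent": "Mozilla/4.0",
    "Host": "www.example.com",   # hypothetical host; the slide leaves it blank
    "Connection": "Keep-Alive",
})
print(request.decode("ascii"))
```

To actually send it, one could open a TCP connection to port 80 (for example with `socket.create_connection`) and call `sendall(request)`, which is what the telnet session does by hand.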
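27 Request Example (annotated) GET /index.html HTTP/1.1 User-Agent: Mozilla/4.0 Host: Connection: Keep-Alive [blank line here] In the first line, GET is the method, /index.html is the request URL and HTTP/1.1 is the version; the lines that follow are headers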
28 Common Request Methods GET returns the contents of the indicated URL HEAD returns the header information for the indicated URL –Useful for finding out info about a URL without actually retrieving it (less time) POST treats the URL as an application and sends some data to it –Could be used to process a form
29 GET Request A request to get a resource from the Web The most frequently used method The request has no message body, but parameters can be sent in the request URL
30 HEAD Request A HEAD request asks the server to return the response headers only, and not the actual resource (i.e., no message body) This is useful for checking characteristics of a resource without actually downloading it, thus saving bandwidth Can be used for testing hypertext links for validity, accessibility and recent modification
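A sketch of how a server might answer GET and HEAD for the same page: both get identical headers, including the Content-Length the GET body would have, but the HEAD response stops after the blank line. The helper `make_response` and the page content are hypothetical.

```python
def make_response(method: str, body: bytes, content_type: str = "text/html") -> bytes:
    """Build a 200 response; for HEAD, send the same headers but no message body."""
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"   # size of the body a GET would receive
        "\r\n"
    ).encode("ascii")
    return head if method == "HEAD" else head + body

page = b"<html><body>Hello World</body></html>"
get_reply = make_response("GET", page)
head_reply = make_response("HEAD", page)
print(len(get_reply), len(head_reply))   # the HEAD reply is shorter by len(page) bytes
```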
31 POST Request A POST request sends data to the server POST is mostly used in form filling –The data filled into the form are translated by the browser into a special format (typically application/x-www-form-urlencoded) and sent to a program on the server using the POST method
32 POST Request (cont.) There is a block of data sent with the request, in the message body There are usually extra headers to describe this message body, like Content-Type: and Content-Length: The request URL is a URL of a program to handle the sent data, not a file The HTTP response is normally the output of a program, not a static file
33 POST Example Here's a typical form submission, using POST: POST /path/register.cgi HTTP/1.0 From: User-Agent: HTTPTool/1.0 Content-Type: application/x-www-form-urlencoded Content-Length: 35 home=Ross+109&favorite+flavor=flies
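The body and the Content-Length: 35 of this example can be reproduced with Python's `urllib.parse.urlencode`. A minimal sketch; the From header is left out because its value is not given in the example.

```python
from urllib.parse import urlencode

form = {"home": "Ross 109", "favorite flavor": "flies"}
body = urlencode(form).encode("ascii")   # application/x-www-form-urlencoded

request = (
    "POST /path/register.cgi HTTP/1.0\r\n"
    "User-Agent: HTTPTool/1.0\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"   # the size of the body in bytes
    "\r\n"
).encode("ascii") + body

print(body)        # b'home=Ross+109&favorite+flavor=flies'
print(len(body))   # 35
```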
34 HTTP 1.1 Request Headers The common request headers of HTTP 1.1 are described in the following slides –Accept –Accept-Encoding –Authorization –Connection –Cookie –Host –If-Modified-Since –Referer –User-Agent
35 Accept Request Headers Accept –Specifies the MIME types that the client can handle (e.g., text/html, image/gif) –The server can send different content to different clients Accept-Encoding –Indicates the encodings (e.g., gzip) that the client can handle
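A simplified sketch of server-side content negotiation with the Accept header. It assumes the standard q-value syntax (e.g. `text/html;q=0.5`, default q = 1) and lets the most specific matching range (exact type, then `type/*`, then `*/*`) decide; the helper `choose_type` is hypothetical and ignores other media-type parameters.

```python
from typing import List, Optional

def choose_type(accept: str, available: List[str]) -> Optional[str]:
    """Pick the available MIME type the Accept header rates highest (None if none fits)."""
    prefs = {}
    for item in accept.split(","):
        media, *params = [p.strip() for p in item.split(";")]
        q = 1.0                                     # default quality when no q= is given
        for param in params:
            if param.startswith("q="):
                q = float(param[2:])
        prefs[media.lower()] = q

    def quality(mime: str) -> float:
        main = mime.split("/")[0]
        for pattern in (mime, main + "/*", "*/*"):  # the most specific matching range wins
            if pattern in prefs:
                return prefs[pattern]
        return 0.0                                  # not acceptable at all

    if not available:
        return None
    best = max(available, key=quality)              # ties keep the server's order
    return best if quality(best) > 0 else None

# The server has an HTML and a plain-text version of the same page.
print(choose_type("text/html;q=0.5, text/plain", ["text/html", "text/plain"]))  # text/plain
```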
36 Authorization Request Header Authorization –User identification for password-protected pages –Instead of HTTP authorization, an application can use HTML forms to send the username/password and keep the login in server-side state (e.g., a session object)
37 Connection Request Header Connection –Connection: keep-alive means that the browser can handle persistent connections –Keep-alive is the default in HTTP 1.1 –In a persistent connection, the server can reuse the same socket for requests that are very close together from the same client –Connection: close means that the connection is closed after the current request/response
38 Content-Length Request Header This header is used when the request has a message body, typically a POST request It specifies the size of the body (e.g., the POST data) in bytes
39 Cookie Request Header Sends back to the server the cookies it previously sent to the client
40 Host Request Header Indicates host and port as given in the original URL –Required in HTTP 1.1
41 If-Modified-Since Request Header This header indicates that the client wants the page only if it has been changed after the specified date If-Unmodified-Since is the reverse of If-Modified-Since –It is used for PUT requests (“update this document only if nobody else has changed it since I generated it”)
42 Referer Request Header URL of referring Web page Useful for tracking traffic It is logged by many servers Can be easily spoofed Note the spelling error – correct spelling is Referrer, but use Referer
43 User-Agent Request Header The value of this header is a string identifying the browser making the request Servers should base their behavior on it only sparingly Again, can be easily spoofed
44 HTTP Responses
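45 The Format of a Response [Figure: response layout] version sp status code sp phrase crlf (the status line) | header : value crlf | (more header lines) | header : value crlf | crlf | Entity Body (sp = space, crlf = carriage return + line feed; the header : value lines are the header lines)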
46 The Initial Line of a Response The initial line of a response is also called the status line The initial line consists of –HTTP version –response status code –reason phrase that describes the status code
47 Response Example HTTP/ OK Date: Fri, 31 Dec :59:59 GMT Content-Type: text/html Content-Length: 1354 Hello World (more file contents)...
48 Response Example (annotated) HTTP/ OK Date: Fri, 31 Dec :59:59 GMT Content-Type: text/html Content-Length: 1354 Hello World (more file contents)... The status line holds the version, the status code and the reason phrase (OK); Date, Content-Type and Content-Length are headers; the rest is the message body
49 Status Codes in Responses The status code is a three-digit integer, and the first digit identifies the general category of response: –1xx indicates an informational message –2xx indicates success of some kind –3xx redirects the client to another URL –4xx indicates an error on the client's part Yes, the system blames it on the client if a resource is not found (i.e., 404) –5xx indicates an error on the server's part
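Since the category is just the first digit, classification is a one-line integer division. A sketch; the helper `status_category` is hypothetical, while `http.HTTPStatus` is Python's standard table of codes and reason phrases.

```python
from http import HTTPStatus

CATEGORIES = {1: "informational", 2: "success", 3: "redirection",
              4: "client error", 5: "server error"}

def status_category(code: int) -> str:
    """Classify a three-digit status code by its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return CATEGORIES[code // 100]

for code in (100, 200, 302, 404, 503):
    # HTTPStatus supplies the standard reason phrase for each code.
    print(code, HTTPStatus(code).phrase, "->", status_category(code))
```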
50 Status Codes 1xx The 100 (Continue) Status –Allows a client to determine whether the server is willing to accept the request (based on the request headers) before the client sends the request body –The client’s request must include the header Expect: 100-continue What is it good for?
51 Status Codes 2xx Status codes 2xx – Success The action was successfully received, understood, and accepted Usually upon success a status code 200 and a message OK are sent This is the default
52 More 2xx Codes 201 (Created) –Location header gives the URL of the new resource 202 (Accepted) –Processing is not yet complete 204 (No Content) –Browser should keep displaying previous document
53 More 2xx Codes 205 (Reset Content) –No new document, but the browser should reset the document view –It is used to force browsers to clear fields of forms –New in HTTP 1.1
54 Status Codes 3xx Status codes 3xx – Redirection Further action must be taken in order to complete the request The client is redirected to get the resource from another URL
55 More 3xx Codes 301 – Moved Permanently –The new URL is given in the Location header –Browsers should automatically follow the link to the new URL 302 – Moved Temporarily –In HTTP 1.1 the reason phrase is “Found” instead of “Moved Temporarily”, but “Moved Temporarily” is still used –Similar to 301, except that the URL given in the Location header is temporary –Most browsers treat 301 and 302 in the same way
56 More 3xx Codes 303 – See Other –Similar to 301 and 302, except that if the original request was POST, the new document (given in the Location header) should be retrieved with GET –New in HTTP 1.1
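57 More 3xx Codes 304 – Not Modified –This is a response to the If-Modified-Since request header: the page has not changed since the given date, so the client can use its cached copy (a 304 response has no message body) –If the page has been modified, then it should be returned with a 200 (OK) status code
58 More 3xx Codes 307 – Temporary Redirect –New URL is given in the Location header –Only GET (and HEAD) requests should follow the new URL automatically; a POST should be redirected only after the user confirms, and it is then repeated as a POST –In 303 (See Other), both GET and POST requests follow the new URL, and the new document is retrieved with GET –New in HTTP 1.1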
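The redirect rules can be summarized as a small decision function. This is a sketch of the HTTP 1.1 rules (the helper `redirect_method` is hypothetical): 303 always switches to GET, while 301, 302 and 307 may be followed automatically only for GET and HEAD.

```python
from typing import Optional

def redirect_method(status: int, method: str) -> Optional[str]:
    """Method to use for the Location URL, or None if the client should not follow automatically."""
    if status == 303:
        # See Other: the new document is always retrieved with GET (HEAD stays HEAD).
        return "HEAD" if method == "HEAD" else "GET"
    if status in (301, 302, 307):
        if method in ("GET", "HEAD"):
            return method          # safe to repeat automatically with the same method
        return None                # e.g. POST: HTTP 1.1 asks for user confirmation first
    return None                    # not one of the redirects handled here

print(redirect_method(303, "POST"))   # GET
print(redirect_method(307, "POST"))   # None
```

In practice many browsers follow a 301 or 302 after a POST by switching to GET, which is one reason 303 and 307 were added: they make the two behaviors explicit.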
59 Status Codes 4xx Status codes 4xx – Client error The request contains bad syntax or cannot be fulfilled For example, 404 (Not Found)
60 4xx Codes 400 – Bad Request –Syntax error in the request 401 – Unauthorized 403 – Forbidden –The server refuses to give access to the page (“permission denied”) 404 – Not Found
61 Status Codes 5xx Status codes 5xx – Server error The server failed to fulfill an apparently valid request For example, 502 (Bad Gateway)
62 5xx Codes 500 – Internal Server Error 501 – Not Implemented 502 – Bad Gateway 503 – Service Unavailable –The response may include a Retry-After header to indicate when the client might try again 505 – HTTP Version Not Supported –New in HTTP 1.1
63 Response Headers
64 The Purposes of Response Headers Give the forwarding location Specify cookies Supply the page modification date Instruct the browser to reload the page after a designated interval Give the document size so that persistent (keep-alive) connections can be used Designate the type of document being generated Etc.
65 Cache-Control (1.1) and Pragma (1.0) Response Headers A no-cache value prevents proxies and browsers from caching the page More on this header later, when we will talk about caching The meaning of “Pragma: no-cache” is specified only for requests, so it should not be relied on in responses A safer approach is therefore to send both the Pragma header and the Cache-Control header with the no-cache value, since some HTTP 1.0 caches understand only Pragma
66 Connection Response Header A value of close instructs the client not to use persistent HTTP connections In HTTP 1.1, persistent connections are the default
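67 Content-Length Response Header It specifies the number of bytes in the response body It is needed when a persistent (keep-alive) connection is used: the client reads exactly that many bytes and so knows where the response ends, whereas on a non-persistent connection the server marks the end by closing the connection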
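To see why Content-Length matters on a persistent connection, here is a sketch that splits two back-to-back responses read from the same connection. It assumes every response carries a Content-Length (chunked transfer coding is not handled), and the helper `split_responses` is hypothetical.

```python
from typing import List, Tuple

def split_responses(stream: bytes) -> List[Tuple[str, bytes]]:
    """Split back-to-back responses on one connection into (status line, body) pairs."""
    responses = []
    while stream:
        head, _, rest = stream.partition(b"\r\n\r\n")   # headers end at the blank line
        lines = head.decode("ascii").split("\r\n")
        length = 0
        for line in lines[1:]:
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                length = int(value.strip())
        responses.append((lines[0], rest[:length]))     # read exactly `length` bytes
        stream = rest[length:]                          # the next response starts here
    return responses

stream = (b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
          b"HTTP/1.1 404 Not Found\r\nContent-Length: 9\r\n\r\nnot found")
for status_line, body in split_responses(stream):
    print(status_line, body)
```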
68 Content-Type Response Header It gives the MIME (Multipurpose Internet Mail Extensions) type of the response document MIME types are of the form: –maintype/subtype for officially registered types –maintype/x-subtype for unregistered types Examples: text/html, image/jpeg, application/x-gzip
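A sketch of splitting a Content-Type value into its parts and spotting unregistered `x-` subtypes. The helper `parse_content_type` is hypothetical; the `charset` parameter in the example is a standard Content-Type parameter not mentioned above.

```python
from typing import Dict, Tuple

def parse_content_type(value: str) -> Tuple[str, str, Dict[str, str], bool]:
    """Split 'maintype/subtype; param=value' and flag unregistered 'x-' subtypes."""
    mime, *params = [part.strip() for part in value.split(";")]
    maintype, subtype = mime.lower().split("/", 1)
    parameters = {}
    for param in params:
        key, _, val = param.partition("=")
        parameters[key.strip().lower()] = val.strip().strip('"')
    return maintype, subtype, parameters, subtype.startswith("x-")

print(parse_content_type("application/x-gzip"))   # ('application', 'x-gzip', {}, True)
```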
69 Expires Response Header It gives the time at which the document should be considered out-of-date and thus should no longer be cached It can be used, for example, if the document is valid only for a short time
70 Last-Modified Response Header This header gives the time when the document was last changed The date that is given in the Last-Modified response header can be used in later requests in the If-Modified-Since request header
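The Last-Modified / If-Modified-Since round trip can be sketched with Python's standard `email.utils` date helpers, which produce and parse the same date format HTTP uses. The page modification time and the helper `conditional_status` are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional

def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """304 if the client's copy (dated if_modified_since) is still current, else 200."""
    if if_modified_since is None:
        return 200                                   # unconditional request
    try:
        since = parsedate_to_datetime(if_modified_since)
        return 304 if last_modified <= since else 200
    except (TypeError, ValueError):
        return 200                                   # unusable date: send the full page

# Hypothetical modification time of a page on the server (timezone-aware UTC).
page_time = datetime(2005, 7, 1, 12, 0, 0, tzinfo=timezone.utc)
last_modified_header = format_datetime(page_time, usegmt=True)
print(last_modified_header)                                  # Fri, 01 Jul 2005 12:00:00 GMT

# The browser later echoes that value back in If-Modified-Since.
print(conditional_status(page_time, last_modified_header))   # 304
```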
71 Location Response Header This header should be included in all responses with a redirection status code (e.g., 301, 302, 303, 307) The browser automatically retrieves the document from the new location that is given as the value of this header
72 Refresh Response Header The number of seconds until the browser should reload the page Can also include the URL of a document that should be loaded (instead of the original document) This header is not part of HTTP 1.1 but is an extension supported by Netscape and Internet Explorer
73 Retry-After Response Header This header can be used in conjunction with a 503 (Service Unavailable) response to tell the client how soon it can repeat its request
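In HTTP 1.1 the Retry-After value is either a number of seconds or an HTTP date. A client-side sketch that handles both forms; the helper `retry_delay` and the "current time" are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_delay(retry_after: str, now: datetime) -> float:
    """Seconds to wait before retrying, from a Retry-After value."""
    value = retry_after.strip()
    if value.isdigit():                      # form 1: delta-seconds, e.g. "120"
        return float(value)
    retry_at = parsedate_to_datetime(value)  # form 2: an HTTP date
    return max(0.0, (retry_at - now).total_seconds())

now = datetime(2005, 7, 1, 12, 0, 0, tzinfo=timezone.utc)   # hypothetical current time
print(retry_delay("120", now))                              # 120.0
print(retry_delay("Fri, 01 Jul 2005 12:02:00 GMT", now))    # 120.0
```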
74 Set-Cookie Response Header This header specifies a cookie associated with the page; it has several fields: Set-Cookie: name=value; expires=value; path=value; domain=value; secure Each cookie requires a separate header Servlets should use the special-purpose addCookie method of HttpServletResponse instead of setting the value of this header directly This header is not part of HTTP 1.1 but is widely supported
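Outside of servlets, Python's standard `http.cookies.SimpleCookie` plays a similar role: it formats one Set-Cookie header per cookie and parses the Cookie header the browser sends back. A minimal sketch; the cookie names and values are hypothetical.

```python
from http.cookies import SimpleCookie

# Server side: build Set-Cookie headers (cookie names and values are hypothetical).
outgoing = SimpleCookie()
outgoing["user"] = "dana"
outgoing["user"]["path"] = "/"
outgoing["theme"] = "dark"
outgoing["theme"]["secure"] = True
set_cookie_lines = outgoing.output().split("\r\n")   # one header line per cookie
for line in set_cookie_lines:
    print(line)

# Later request: parse the Cookie header the browser sends back.
incoming = SimpleCookie("user=dana; theme=dark")
print(incoming["user"].value)   # dana
```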
75 WWW-Authenticate Response Header This header is always included with a 401 (Unauthorized) status code It gives the authentication scheme(s) and parameters applicable to the URL that was requested
76 Server Response Header Indicates the name of the vendor of the HTTP server