HTTP – HyperText Transfer Protocol
Representation and Management of Data on the Internet



2. A Common Protocol
In order for two remote machines to "understand" each other, they should:
– "speak the same language"
– coordinate their "talk"
The solution is to use protocols. Examples:
– FTP – File Transfer Protocol
– SMTP – Simple Mail Transfer Protocol
– NNTP – Network News Transfer Protocol
– HTTP – HyperText Transfer Protocol

3. Why Was HTTP Needed?
According to Tim Berners-Lee (1991), a protocol was needed with the following features:
– A subset of the file transfer protocol
– The ability to request an index search
– Automatic format negotiation
– The ability to refer the client to another server

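4. [Figure: the client requests http://www.cs.huji.ac.il/~dbi; the HTTP request passes through a proxy server to the web server www.cs.huji.ac.il:80, which reads the resource from its file system, and the HTTP response returns along the same path]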

5. [Figure: a request passing through a chain of proxy servers (department, university, Israel) on its way to the web server www.w3.org:80]

6. Terminology
– User agent: the client that initiates a request (browser, editor, web robot, …)
– Origin server: the server on which a given resource resides (a web server, a.k.a. HTTP server)
– Proxy: acts as both a server and a client
– Gateway: a server that acts as an intermediary for other servers
– Tunnel: acts as a blind relay between two connections

7. Resources
A resource is a chunk of information that can be identified by a URL (Uniform Resource Locator). A resource can be:
– A file
– A dynamically created page
What we see in the browser can be a combination of several resources.

8. Uniform Resource Locator
General form: protocol://host:port/path?parameters#anchor
– http://www.cs.huji.ac.il/~dbi/index.html#info
– http://www.google.com/search?hl=en&q=blabla
There are other types of URLs, for example mailto: and news:
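As a quick check of the URL syntax above, Python's standard urllib.parse.urlsplit splits a URL into these components. The example URL and its lang=en parameter are made up for illustration. The last lines show why the order matters: if the anchor is written before the parameters, everything after # is treated as part of the anchor.

```python
from urllib.parse import urlsplit

# The query (?parameters) comes before the fragment (#anchor)
parts = urlsplit("http://www.cs.huji.ac.il:80/~dbi/index.html?lang=en#info")
print(parts.scheme)    # http
print(parts.hostname)  # www.cs.huji.ac.il
print(parts.port)      # 80
print(parts.path)      # /~dbi/index.html
print(parts.query)     # lang=en
print(parts.fragment)  # info

# When the port is omitted, an http client uses the default port 80
default_port = urlsplit("http://www.cs.huji.ac.il/~dbi").port or 80

# With the order reversed, everything after '#' becomes the fragment
wrong = urlsplit("http://www.cs.huji.ac.il/~dbi/index.html#info?lang=en")
print(repr(wrong.query), repr(wrong.fragment))  # '' 'info?lang=en'
```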

9. In a URL
– Spaces in the parameters are represented by "+"
– Characters such as &, + and % are encoded in the form "%xx", where xx is the ASCII value in hexadecimal; for example, "&" = "%26"
– The inputs to the parameters are given as a list of parameter/value pairs: var1=value1&var2=value2&var3=value3

10. [Screenshot: the search text war&peace Tolstoy typed into Google]

11. [Screenshot: the resulting URL] http://www.google.com/search?hl=en&q=war%26peace+Tolstoy
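The encoding rules above can be reproduced with Python's standard urllib.parse module. This sketch rebuilds the Google query from the screenshot and decodes it again; only the hl and q parameters shown there are used.

```python
from urllib.parse import urlencode, parse_qs

# Form-style encoding: space -> "+", "&" -> "%26" (hex ASCII value of '&')
query = urlencode({"hl": "en", "q": "war&peace Tolstoy"})
url = "http://www.google.com/search?" + query
print(url)  # http://www.google.com/search?hl=en&q=war%26peace+Tolstoy

# The %xx form is just the character's ASCII code in hexadecimal
print(f"%{ord('&'):02X}")  # %26

# The server reverses the encoding to recover the original parameters
print(parse_qs(query))  # {'hl': ['en'], 'q': ['war&peace Tolstoy']}
```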

12. Nesting in Page
[Figure: index.html split into a left frame and a right frame, containing a jumping fish, a fairy icon and a HUJI icon]
What we see in the browser can be a combination of several resources.

13. Nested Objects
Suppose a client accesses a page containing 10 inline images. How many sessions will be required to display the page completely?
The answer is 11 HTTP sessions. Why?
Some browsers/servers support a feature called keep-alive, which can keep the connection open until it is explicitly closed. How can this help?

14. HTTP Session
A basic HTTP session has four phases:
1. The client opens the connection (a TCP connection)
2. The client makes the request
3. The server sends a response
4. The server closes the connection
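A minimal sketch of the four phases over a raw TCP socket. To stay self-contained it starts a throwaway local server (Python's http.server bound to 127.0.0.1 on a random free port) instead of contacting a real web site; the HelloHandler class and its "Hello World" body are hypothetical. Because this server speaks HTTP/1.0, it closes the connection after the response, which the client sees as an empty read.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    # protocol_version defaults to "HTTP/1.0", so the server closes
    # the connection after each response (phase 4)
    def do_GET(self):
        body = b"Hello World"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def http_session(host, port, path):
    # Phase 1: the client opens a TCP connection
    with socket.create_connection((host, port)) as sock:
        # Phase 2: the client sends the request line, headers and a blank line
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # Phase 3: the server sends the response; phase 4: it closes
        # the connection, which recv() reports as an empty read
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
raw = http_session("127.0.0.1", server.server_address[1], "/index.html")
server.shutdown()
server.server_close()
print(raw.split(b"\r\n")[0])  # b'HTTP/1.0 200 OK'
```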

15. Stateless Protocol
HTTP is a stateless protocol, which means that once a server has delivered the requested data to a client, the connection is closed, and the server retains no memory of what has just taken place.
– What are the difficulties in working with a stateless protocol?
– How would you implement a site for buying some items?
– So why don't we have states in HTTP?

16. Format of Request and Response
Both consist of:
– An initial line
– Zero or more header lines
– A blank line (i.e., a CRLF by itself)
– An optional message body (e.g., a file, query data, or query output)
Note: CRLF = "\r\n" (ASCII 13 followed by ASCII 10)

17. Format of Request
method <sp> URL <sp> version <crlf>      (request line)
header: value <crlf>                      (header lines)
...
header: value <crlf>
<crlf>                                    (blank line)
Entity Body

18. Request Example
GET /index.html HTTP/1.1 [CRLF]
Accept: image/gif, image/jpeg [CRLF]
User-Agent: Mozilla/4.0 [CRLF]
Host: www.cs.huji.ac.il:80 [CRLF]
Connection: Keep-Alive [CRLF]
[CRLF]

19. Request Example (annotated)
In the request above, GET is the method, /index.html is the request URL and HTTP/1.1 is the version; the Accept, User-Agent, Host and Connection lines are headers, and the request ends with a blank line.
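A minimal sketch of how a server might split the request above into its parts, assuming the whole request has already been read into memory. The parse_request helper is hypothetical: it keeps only the last value of a repeated header and does no validation.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into (method, url, version, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")   # the blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    method, url, version = lines[0].split(" ")   # method SP URL SP version
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")     # split at the first colon only
        # header names are case-insensitive, so store them lower-cased
        headers[name.strip().lower()] = value.strip()
    return method, url, version, headers, body

raw = (b"GET /index.html HTTP/1.1\r\n"
       b"Accept: image/gif, image/jpeg\r\n"
       b"User-Agent: Mozilla/4.0\r\n"
       b"Host: www.cs.huji.ac.il:80\r\n"
       b"Connection: Keep-Alive\r\n"
       b"\r\n")
method, url, version, headers, body = parse_request(raw)
print(method, url, version)  # GET /index.html HTTP/1.1
print(headers["host"])       # www.cs.huji.ac.il:80
```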

20. Request Methods
– GET returns the contents of the indicated document
– HEAD returns the header information for the indicated document
   – Useful for finding out info about a resource without retrieving it
– POST treats the document as an application and sends some data to it

21. More Methods
– PUT replaces the content of the document with some data
– DELETE deletes the indicated document
– TRACE invokes a remote loop-back of the request; the final recipient SHOULD reflect the message back to the client
Web servers usually do not allow these methods.

22. GET Request
– A request to get a resource from the Web
– The most frequently used method
– The request has no message body, but parameters can be sent in the request URL (i.e., the URL without the host part)

23. HEAD Request
– A HEAD request asks the server to return the response headers only, and not the actual resource (i.e., no message body)
– This is useful for checking characteristics of a resource without actually downloading it, thus saving bandwidth
– It is used for testing hypertext links for validity, accessibility and recent modification

24. POST
– A POST request can send data to the server
– POST is mostly used in form filling
   – The data filled into the form are translated by the browser into a special format and sent to a program on the server using the POST command

25. POST (cont.)
– A block of data is sent with the request, in the message body
– There are usually extra headers to describe this message body, like Content-Type: and Content-Length:
– The request URL is the URL of a program that handles the sent data, not a file
– The HTTP response is normally the output of a program, not a static file

26. POST Example
Here's a typical form submission, using POST:
POST /path/register.cgi HTTP/1.0
From: frog@cs.huji.ac.il
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 35
[blank line]
home=Ross+109&favorite+flavor=flies
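The POST example can be rebuilt programmatically. This sketch uses urllib.parse.urlencode to produce the application/x-www-form-urlencoded body and computes Content-Length from it; the dictionary of form fields is an assumption chosen to reproduce the body shown above.

```python
from urllib.parse import urlencode

# Hypothetical form fields that reproduce the body in the example
form = {"home": "Ross 109", "favorite flavor": "flies"}
body = urlencode(form).encode("ascii")   # b'home=Ross+109&favorite+flavor=flies'

request = (
    "POST /path/register.cgi HTTP/1.0\r\n"
    "From: frog@cs.huji.ac.il\r\n"
    "User-Agent: HTTPTool/1.0\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"   # number of bytes in the body (35)
    "\r\n"                               # blank line before the message body
).encode("ascii") + body

print(request.decode("ascii"))
```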

27. Headers
– HTTP 1.0 defines 16 headers
   – none are required
– HTTP 1.1 defines 46 headers
   – one header (Host:) is required in requests that are sent to Web servers

28. Examples of Headers
If an HTTP message includes a body, there are usually header lines in the message that describe the body, for example:
– Content-Type: gives the MIME type of the data in the body, such as text/html or image/gif
– Content-Length: gives the number of bytes in the body
Why would we like to use these headers?

29. Another Header Example
Last-Modified:
– Gives the modification date of the resource that is being returned
– It is used in caching and other bandwidth-saving activities
– Greenwich Mean Time should be used, and the format is
Last-Modified: Fri, 31 Dec 1999 23:59:59 GMT
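Python's standard email.utils module formats and parses dates in this GMT format, so producing and reading back the Last-Modified value might look like this (the date is the one from the example).

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# HTTP dates are always expressed in GMT (UTC)
modified = datetime(1999, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
header_value = format_datetime(modified, usegmt=True)
print("Last-Modified:", header_value)  # Last-Modified: Fri, 31 Dec 1999 23:59:59 GMT

# Parsing gives back a timezone-aware datetime that can be compared
print(parsedate_to_datetime(header_value) == modified)  # True
```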

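30. Host Header
In HTTP 1.1:
– A request that is sent to a Web server must include a Host header
– A request that is sent to a proxy carries the full URL, including the host, in its request line, so the proxy does not depend on the Host header (the HTTP/1.1 specification still requires clients to send Host in every request)
In HTTP 1.0:
– A request does not have to include the Host header
How do we know who the host is when there is no Host header?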

31. Initial Line of a Response
The initial line of a response is also called the status line. It consists of:
– the HTTP version
– the response status code
– a reason phrase that describes the status code

32. Format of Response
version <sp> status code <sp> phrase <crlf>   (status line)
header: value <crlf>                          (header lines)
...
header: value <crlf>
<crlf>                                        (blank line)
Entity Body

33. Response Example
HTTP/1.0 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354
[blank line]
Hello World
(more file contents)...

34. Response Example (annotated)
In the response above, HTTP/1.0 is the version, 200 is the status code and OK is the reason phrase; Date, Content-Type and Content-Length are headers; and "Hello World (more file contents)..." is the message body.

35. Status Code
The status code is a three-digit integer, and the first digit identifies the general category of response:
– 1xx indicates an informational message
– 2xx indicates success of some kind
– 3xx redirects the client to another URL
– 4xx indicates an error on the client's part
   – Yes, the system blames it on the client if a resource is not found (i.e., 404)
– 5xx indicates an error on the server's part
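Since only the first digit determines the category, a client can classify even a status code it does not specifically know. The status_category function below is a hypothetical helper illustrating that rule.

```python
def status_category(code: int) -> str:
    """Map a three-digit HTTP status code to its general category."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }[code // 100]  # the first digit selects the category

for code in (100, 200, 302, 404, 503):
    print(code, status_category(code))
```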

36. Status Code 1xx
The 100 (Continue) status:
– Allows a client to determine whether the server is willing to accept the request (based on the request headers) before the client sends the request body
– The client's request must include the header Expect: 100-continue
What is this good for?

37. Status Code 2xx
Status codes 2xx – Success
– The action was successfully received, understood, and accepted
– Usually, upon success, the status code 200 and the reason phrase OK are sent

38. Status Code 3xx
Status codes 3xx – Redirection
– Further action must be taken in order to complete the request
– The client is redirected to get the resource from another URL

39. 3xx Codes
– 301 – Moved Permanently
– 302 – Moved Temporarily
   – The Location: header in the response gives the correct URL for either 301 or 302
   – Most browsers retry the new location automatically
– 304 – Not Modified
   – This is a response to an If-Modified-Since: header in the request

40. Status Code 4xx
Status codes 4xx – Client error
– The request contains bad syntax or cannot be fulfilled
– For example, 404 Not Found

41. 4xx Codes
– 400 – Bad Request (e.g., a syntax error)
– 401 – Unauthorized
– 403 – Forbidden ("permission denied")
– 404 – Not Found

42. Status Code 5xx
Status codes 5xx – Server error
– The server failed to fulfill an apparently valid request
– For example, 502 Bad Gateway

43. 5xx Codes
– 502 – Bad Gateway
– 503 – Service Unavailable
   – The response may include a Retry-After: header to indicate when the client might try again

44. Response Information
Description of information in the headers:
– Server: type of server
– Date: date and time
– Content-Length: number of bytes
– Content-Type: MIME type
– Content-Language: English, for example
– Content-Encoding: data compression
– Last-Modified: date when last modified
– Expires: date when the file becomes invalid

45. Manually Experimenting with HTTP
> host www
www.cs.huji.ac.il is a nickname for vafla.cs.huji.ac.il
vafla.cs.huji.ac.il has address 132.65.80.39
vafla.cs.huji.ac.il mail is handled (pri=10) by cs.huji.ac.il
> telnet www.cs.huji.ac.il 80
Trying 132.65.80.39...
Connected to vafla.cs.huji.ac.il.
Escape character is '^]'.

46. Sending a Request
> GET /~dbi/index.html HTTP/1.0
[blank line]

47. The Response
HTTP/1.1 200 OK
Date: Sun, 11 Mar 2001 21:42:15 GMT
Server: Apache/1.3.9 (Unix)
Last-Modified: Sun, 25 Feb 2001 21:42:15 GMT
Content-Length: 479
Content-Type: text/html
[blank line]
(html code …)

48. [Screenshot: the request GET /~dbi/index.html HTTP/1.0 and the response HTTP/1.1 200 OK, followed by the HTML code]

49. [Screenshot: the request GET /~dbi/no-such-page.html HTTP/1.0 and the response HTTP/1.1 404 Not Found, followed by HTML code]

50. [Screenshot: the request GET /index.html HTTP/1.1 and the response HTTP/1.1 400 Bad Request, followed by HTML code]
Why is it a Bad Request? It is an HTTP/1.1 request without a Host header.

51. Redirection Process
– The client asks for /foo, which is really a directory
– The server guesses that the client meant /foo/, and so it replies with
   – 302 Moved
   – Location: /foo/
– Most browsers retry the new location automatically

52. Advantages of Redirection
– Simple uses: fixing clients' naming errors
– Complex uses: the server can send the client dynamically to a different page depending on
   – who they are
   – what server is managing their session, etc.
– Note the changing URL in the browser

53. Why Is the Location Header Needed?
The client must know the new URL so that it will convert relative URLs correctly.
– Suppose there is a relative URL "a.html"
– The client should convert it to "/foo/a.html" and not to "/a.html"
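Relative URL resolution can be tried with Python's urllib.parse.urljoin. In this sketch the client first resolves the relative "Location: /foo/" against the URL it requested, then resolves a.html against both the old and the new URL; the host name is reused from the earlier examples.

```python
from urllib.parse import urljoin

requested = "http://www.cs.huji.ac.il/foo"   # the URL the client asked for
redirected = urljoin(requested, "/foo/")     # resolve "Location: /foo/" from the 302

print(redirected)                     # http://www.cs.huji.ac.il/foo/
print(urljoin(requested, "a.html"))   # http://www.cs.huji.ac.il/a.html (wrong base)
print(urljoin(redirected, "a.html"))  # http://www.cs.huji.ac.il/foo/a.html
```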

54. New Features in HTTP 1.1
– Persistent connections
– Virtual hosts
   – That is why the Host header is required

55. Persistent vs. Non-Persistent Connection
– A page that we see in the browser can include more than one resource
– The resources are sent from the server to the client one after the other
– The resources can be sent to the browser over a persistent connection or over non-persistent connections

56. Non-Persistent Connection
1. Browser opens a TCP connection to port 80 of the server (handshake)
2. Browser sends an HTTP request message
3. Server receives the request, locates the object, sends the response
4. Server closes the TCP connection
5. Browser receives the response, parses the object
6. Browser repeats steps 1-5 for each embedded object

57. Persistent Connection
1. Browser opens a TCP connection to port 80 of the server (handshake)
2. Browser sends an HTTP request message
3. Server receives the request, locates the object, sends the response
4. Browser receives the response, parses the object
5. Browser repeats steps 2-4 for each embedded object
6. The TCP connection closes on demand or on timeout

58. Advantages of Persistent Connection
– CPU time is saved in routers and hosts
– HTTP requests and responses can be pipelined on a connection
– Network congestion is reduced
– Latency on subsequent requests is reduced
What are the disadvantages of persistent connection?

59. Pipelining
There are two types of persistent connections:
– Without pipelining: the client issues a new request only after the previous response has arrived
– With pipelining: the client sends a request as soon as it encounters a reference, so multiple requests/responses can be sent
   – in the same IP packet, or
   – in back-to-back packets
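The benefit of persistent connections and pipelining can be estimated by counting round-trip times (RTTs). This back-of-the-envelope sketch assumes 1 RTT to open a TCP connection, 1 RTT per request/response, no parallel connections, and negligible transmission time; under these assumptions the page with 10 inline images from the Nested Objects question costs 22, 12 and 3 RTTs respectively.

```python
def page_load_rtts(num_embedded: int, mode: str) -> int:
    """Round-trip times needed to fetch a page and its embedded objects."""
    objects = 1 + num_embedded                # the page itself + embedded objects
    if mode == "non-persistent":
        return 2 * objects                    # a new TCP handshake for every object
    if mode == "persistent":
        return 1 + objects                    # one handshake, then 1 RTT per object
    if mode == "pipelined":
        # one handshake, 1 RTT for the page, then all embedded requests in one RTT
        return 2 + (1 if num_embedded else 0)
    raise ValueError(f"unknown mode: {mode}")

for mode in ("non-persistent", "persistent", "pipelined"):
    print(mode, page_load_rtts(10, mode))
```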

60. Virtual Hosts
With HTTP 1.1, one server at one IP address can be multi-homed:
– "www.cs.huji.ac.il" and "www.math.huji.ac.il" can live on the same server
– These are called virtual hosts
– Without this mechanism, we have to use 2 different IP addresses
It is like several people sharing one phone.
An HTTP request must specify the host name (and possibly port) for which the request is intended (this is done using the Host header).
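A sketch of how a server hosting both sites could use the Host header to decide which site a request is for. The host-to-directory mapping, the directory paths, and the fallback site for HTTP/1.0 requests without a Host header are all hypothetical; returning 400 for an HTTP/1.1 request without Host matches the Bad Request example earlier. IPv6 literal host names are not handled.

```python
# Hypothetical mapping from host name to document root on one shared server
DOC_ROOTS = {
    "www.cs.huji.ac.il": "/var/www/cs",
    "www.math.huji.ac.il": "/var/www/math",
}
DEFAULT_ROOT = "/var/www/default"  # hypothetical site for HTTP/1.0 without Host

def resolve_virtual_host(headers: dict, version: str):
    """Return (status, document root); header names are lower-cased."""
    host = headers.get("host")
    if host is None:
        # HTTP/1.1 requires Host; HTTP/1.0 falls back to a default site
        return (400, None) if version == "HTTP/1.1" else (200, DEFAULT_ROOT)
    name = host.split(":")[0].lower()   # drop an optional :port
    root = DOC_ROOTS.get(name)
    return (200, root) if root else (404, None)

print(resolve_virtual_host({"host": "www.math.huji.ac.il:80"}, "HTTP/1.1"))
```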

61. Virtual Hosting (cont.)
Virtual hosting:
– reduces hardware expenditures
– extends the ability to support additional servers
– makes load balancing and capacity planning much easier
Without it:
– each host name requires a unique IP address, and we are quickly running out of IP addresses with the explosion of new domains

62. Caching
Caching improves performance:
– It eliminates the need to send requests in many cases (reduces network round-trips), using an expiration mechanism
– It eliminates the need to send full responses in other cases (reduces network bandwidth), using a validation mechanism

63. For example, how much traffic is saved if the Google icon does not have to be sent with each search result?

64. Client Caching
[Figure: a client with a local cache exchanging messages with a server]
– Client sends GET /fruit/apple.gif
– Server responds with the object and its Last-Modified date: ...
– Client caches the object and the last-modified date
– Later, the client sends GET /fruit/apple.gif ... If-Modified-Since: ...
– Server returns either 304 Not Modified or the object

65. Network Caches
[Figure: a client, a proxy server and a server; the request GET /fruit/apple.gif goes through the proxy server]

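66. Benefit of Caching
[Figure: clients and a proxy server on a 10 Mbps LAN, connected through routers (R) by a 1.5 Mbps link to the Internet and the origin servers; 15 requests/sec, 100 Kbits per request, 40% cache hit rate]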
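The figure's numbers can be turned into a quick utilization estimate. This sketch assumes that every request which misses the proxy cache crosses the 1.5 Mbps access link, while cache hits are served over the LAN: without caching the access link is fully used (15 × 100 Kbits/sec = 1.5 Mbps), whereas a 40% hit rate brings it down to about 60%. The LAN itself carries only about 15% of its 10 Mbps.

```python
def link_utilization(req_per_sec, kbits_per_req, link_mbps, hit_rate=0.0):
    """Fraction of a link's capacity used by requests that miss the cache."""
    traffic_mbps = req_per_sec * kbits_per_req * (1 - hit_rate) / 1000  # Kbits -> Mbits
    return traffic_mbps / link_mbps

print(link_utilization(15, 100, 1.5))       # 1.0 (access link saturated)
print(link_utilization(15, 100, 1.5, 0.4))  # about 0.6 with a 40% hit rate
print(link_utilization(15, 100, 10))        # about 0.15 on the 10 Mbps LAN
```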

67. Expiration Model
– Servers may provide an expiration time using the Expires header
   – By checking the expiration time, the cache can return a fresh response without contacting the server
– If the expiration time is not specified, the cache can heuristically estimate the expiration time (e.g., using header values such as the Last-Modified time)
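A minimal sketch of the expiration model inside a cache. It assumes response headers stored with lower-cased names, measures age from the Date header only (ignoring the Age header and transmission delays), and, when Expires is missing, uses 10% of the time since Last-Modified as the heuristic lifetime; that fraction is a common choice, not something the slide specifies.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

HEURISTIC_FRACTION = 0.1  # assumption: 10% of the time since Last-Modified

def freshness_lifetime(headers: dict) -> timedelta:
    """How long a cached response may be reused without asking the server."""
    date = parsedate_to_datetime(headers["date"])
    if "expires" in headers:          # explicit expiration time from the server
        return parsedate_to_datetime(headers["expires"]) - date
    if "last-modified" in headers:    # heuristic estimate
        unchanged_for = date - parsedate_to_datetime(headers["last-modified"])
        return unchanged_for * HEURISTIC_FRACTION
    return timedelta(0)               # nothing to go on: treat as stale

def is_fresh(headers: dict, now: datetime) -> bool:
    # Simplification: the age is measured from the Date header only
    age = now - parsedate_to_datetime(headers["date"])
    return age < freshness_lifetime(headers)

cached = {"date": "Fri, 31 Dec 1999 12:00:00 GMT",
          "expires": "Fri, 31 Dec 1999 13:00:00 GMT"}
print(is_fresh(cached, datetime(1999, 12, 31, 12, 30, tzinfo=timezone.utc)))  # True
print(is_fresh(cached, datetime(1999, 12, 31, 13, 30, tzinfo=timezone.utc)))  # False
```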

68. The Risk in Caching
A cached response might not be "semantically transparent":
– the response is different from what would have been returned by the origin server
The cache should verify that the copy is fresh (i.e., its expiration time has not passed). The copy is stale if it is not fresh.

69. Validators
A validator is any mechanism that may help in determining whether a copy is fresh or stale:
– A strong validator is, for example, a counter that is incremented whenever the resource is changed
– A weak validator is, for example, a counter that is incremented only when a significant change is made
   – For example, if the only change in the site is the number of visitors…

70. Using the Cache
To check whether a copy is fresh, the cache must either:
– use the expiration model, or
– compare the Last-Modified time or some validator with the origin server
In the second case, the origin server either:
– responds with the message 304 (Not Modified), or
– sends a full response with the entity body

71. Some Cache-Control Headers
Cache-Control headers specify directives to the cache; they can be included in either requests or responses:
– max-age=[seconds] – the maximum amount of time that an object will be considered fresh
– s-maxage=[seconds] – like max-age, but applies only to proxy (shared) caches
– must-revalidate – tells caches that they must obey freshness information
– proxy-revalidate – like must-revalidate, but applies only to proxy caches
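A sketch of how a cache might read these directives. The parser is a simplified illustration: it splits on commas, so it does not handle quoted arguments that themselves contain commas, and the example header value is made up. The max_age_for helper shows s-maxage taking precedence only in a proxy (shared) cache.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or True}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        name = name.strip().lower()
        if sep:  # directive with an argument, e.g. max-age=3600
            arg = arg.strip().strip('"')
            directives[name] = int(arg) if arg.isdigit() else arg
        else:    # flag directive, e.g. must-revalidate
            directives[name] = True
    return directives

def max_age_for(directives: dict, shared_cache: bool):
    """s-maxage overrides max-age, but only for proxy (shared) caches."""
    if shared_cache and "s-maxage" in directives:
        return directives["s-maxage"]
    return directives.get("max-age")

cc = parse_cache_control("max-age=3600, s-maxage=600, must-revalidate")
print(cc)  # {'max-age': 3600, 's-maxage': 600, 'must-revalidate': True}
```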

72. The Old Way Not to Use the Cache
– The Pragma: no-cache request header indicates that the request should not be satisfied from a cache
– Same as the no-cache cache directive
– The directive applies to any recipient along the request/response chain
– Don't use Pragma: it only applies to requests and exists just for compatibility with HTTP 1.0

73. If-Modified-Since Header
– The If-Modified-Since: header is used with a GET request
– If the requested resource has been modified since the given date, the server returns the resource as it normally would (i.e., the header is ignored)
– Otherwise, the server returns a 304 Not Modified response, including the Date: header, but with no message body:
HTTP/1.1 304 Not Modified
Date: Fri, 31 Dec 1999 23:59:59 GMT
[blank line]

74. If-Unmodified-Since Header
– The If-Unmodified-Since: header can be used with any method
– If the requested resource has not been modified since the given date, the server returns the resource as it normally would
– Otherwise, the server returns a 412 Precondition Failed response:
HTTP/1.1 412 Precondition Failed
[blank line]
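The server-side decision for If-Modified-Since and If-Unmodified-Since can be sketched as follows. It assumes request headers with lower-cased names and a Last-Modified time stored as a timezone-aware, whole-second datetime (HTTP dates have one-second resolution); the returned 200 stands for "handle the request normally".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def evaluate_conditional(method: str, headers: dict, last_modified: datetime) -> int:
    """Status code for a conditional request, per the rules above."""
    ius = headers.get("if-unmodified-since")       # allowed with any method
    if ius is not None and last_modified > parsedate_to_datetime(ius):
        return 412                                 # Precondition Failed
    ims = headers.get("if-modified-since")         # used with GET
    if method == "GET" and ims is not None and last_modified <= parsedate_to_datetime(ims):
        return 304                                 # Not Modified, no message body
    return 200                                     # handle the request normally

modified = datetime(1999, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
print(evaluate_conditional(
    "GET", {"if-modified-since": "Fri, 31 Dec 1999 23:59:59 GMT"}, modified))  # 304
```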

75. If-None-Match Header
– ETag is a validator generated by the server (i.e., a unique identifier)
   – It is part of the HTTP 1.1 specification
– If the ETag matches when an If-None-Match header is specified, then the object is really the same, and it is not returned
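A sketch of ETag-based validation on the server side. Hashing the body is only one possible way to generate an ETag (an assumption here, as is truncating the hash to 16 hex digits), and the comparison is a plain string match, so weak ETags (W/"...") are not handled.

```python
import hashlib
from typing import Optional

def make_etag(body: bytes) -> str:
    """Hypothetical strong validator: a quoted, truncated SHA-256 of the body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond_to_get(body: bytes, if_none_match: Optional[str]):
    """Return (status, headers, body), honoring an If-None-Match request header."""
    etag = make_etag(body)
    if if_none_match is not None:
        tags = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in tags or etag in tags:
            return 304, {"ETag": etag}, b""  # same object: do not resend it
    return 200, {"ETag": etag}, body

page = b"<html>Hello World</html>"
status, headers, _ = respond_to_get(page, None)            # first request: 200 + ETag
status2, _, body2 = respond_to_get(page, headers["ETag"])  # revalidation: 304, no body
print(status, status2, body2)  # 200 304 b''
```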

76. Cooperative Caching

77. Cooperative Caching (cont.)
– A higher-level cache (e.g., a national cache) has
   – a larger user population
   – higher hit rates
– Multiple cooperating Web caches improve overall performance
– Cooperative caches are usually built from clusters, which
   – divide the traffic overhead
   – improve storage capacity

78. Cooperative Caching (cont.)
– Which caches should be asked for a particular document?
– Hash routing (of URLs): an object will not be present in more than one cache
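Hash routing can be sketched in a few lines: every client hashes the URL the same way, so each object is looked up in (and stored by) exactly one cache. The cache names are hypothetical, and this uses simple modulo hashing, under which adding or removing a cache remaps most URLs; schemes such as consistent hashing reduce that churn.

```python
import hashlib

# Hypothetical cluster of cooperating caches
CACHES = ["cache0.example", "cache1.example", "cache2.example"]

def cache_for(url: str, caches=CACHES) -> str:
    """Hash routing: every client maps a given URL to the same single cache."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(caches)
    return caches[index]

print(cache_for("http://www.w3.org/Protocols/"))
```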

79. Hop by Hop
HTTP/1.1 introduces the concept of hop-by-hop headers:
– Message headers that apply only to a given connection, and not to the entire path
– They give much more power to the use of proxies (caches)
– The headers give information that is directed to the proxies on the way to the client

80. Chunked Encoding
Music, video clips and other multimedia content are sent to the client in chunks of data. Among the difficulties:
– One data chunk varies in size and composition from the next
– The size of the chunks may not be specified in the headers, so it may be difficult to recognize the end of a chunk
– There should be a way to deal with "infinite" responses, i.e., data that is very big (or files that are created dynamically and never end)
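HTTP/1.1 addresses these problems with the chunked transfer coding (Transfer-Encoding: chunked): each chunk is preceded by its size in hexadecimal, and a chunk of size 0 marks the end, so the total length need not be known in advance. A minimal encoder/decoder sketch, which ignores chunk extensions and trailers:

```python
def encode_chunked(pieces) -> bytes:
    """Encode byte strings with HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for piece in pieces:
        if piece:  # an empty chunk would be read as the end of the body
            out += b"%x\r\n" % len(piece) + piece + b"\r\n"  # hex size, data
    out += b"0\r\n\r\n"  # last chunk (size 0) followed by an empty trailer
    return bytes(out)

def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body; chunk extensions and trailers are ignored."""
    body, pos = bytearray(), 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].split(b";")[0], 16)  # hex chunk size
        pos = line_end + 2
        if size == 0:
            return bytes(body)
        body += data[pos:pos + size]
        pos += size + 2  # skip the data and the CRLF after it

wire = encode_chunked([b"Hello ", b"World"])
print(wire)  # b'6\r\nHello \r\n5\r\nWorld\r\n0\r\n\r\n'
```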

81. Compression
– Most image formats (GIF, JPEG, MPEG) are precompressed
– Many other data types used in the Web are not precompressed
– Compression could save almost 40% of the bytes sent via HTTP
– There is a need for negotiating the type of encoding of the compressed resource

82. Compression (cont.)
The client sends the Accept-Encoding header:
– The header indicates the content-codings that the client can handle and the ones that the client prefers
The server sends:
– the Content-Encoding header, for end-to-end encoding indication
– the Transfer-Encoding header, for hop-by-hop encoding indication (supported only in HTTP/1.1)
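A simplified sketch of the negotiation on the server side, using Python's gzip module. It only chooses between gzip and identity (no compression), ignores the * wildcard, treats identity as acceptable unless the client excludes it, and the repeated HTML body is made up; real servers also consider content type and size before compressing.

```python
import gzip

def choose_encoding(accept_encoding: str) -> str:
    """Pick 'gzip' or 'identity' (no compression) from an Accept-Encoding value."""
    q_values = {}
    for item in accept_encoding.split(","):
        name, _, param = item.strip().partition(";")
        param = param.strip()
        q_values[name.strip().lower()] = float(param[2:]) if param.startswith("q=") else 1.0
    gzip_q = q_values.get("gzip", 0.0)
    identity_q = q_values.get("identity", 1.0)  # acceptable unless excluded
    return "gzip" if gzip_q > 0 and gzip_q >= identity_q else "identity"

def encode_response(body: bytes, accept_encoding: str):
    """Return (headers, payload), compressing the body if the client allows it."""
    if choose_encoding(accept_encoding) == "gzip":
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    return {}, body

html = b"<p>Hello World</p>\n" * 200
headers, payload = encode_response(html, "gzip, deflate")
print(headers, len(html), len(payload))  # the gzip payload is much smaller
```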

83. Authentication
Many sites require users to provide a username and password in order to access the documents housed on the server. This requirement provides a mechanism for keeping track of users (more than just a security mechanism).

84. Authentication
– The client sends an ordinary request message
– The server responds with
   – the 401 Authorization Required status code
   – a WWW-Authenticate header, which specifies how to perform authentication
– The client resends the request, but this time includes the Authorization header (e.g., user name and password)
– The client continues to add this header to each following request to that server

85. Authentication
[Figure: exchange between a client and the server www.cs.huji.ac.il]
– Client: GET /~dbi/getGrade.jsp
– Server: Authorization Required
– Client: GET /~dbi/getGrade.jsp with Authorization: user snoopy:passwordofsnoopy
– Server: Response
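The exchange above shows the credentials informally. With the common Basic scheme (announced by the server as WWW-Authenticate: Basic), the client sends user:password encoded in Base64, which is an encoding rather than encryption, so anyone who sees the request can recover the password. A sketch of building and decoding that header, with hypothetical helper names:

```python
import base64

def basic_authorization(user: str, password: str) -> str:
    """Build the Authorization header value for HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

def parse_basic_authorization(value: str):
    """Recover (user, password); shows that Basic offers no secrecy."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not Basic authentication")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password

header = basic_authorization("snoopy", "passwordofsnoopy")
print("Authorization:", header)
```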

86. Cookies
– Cookies are an alternative way to identify browsers
– Cookies are essentially small files that are saved in the file system of the client
– A cookie can store information on the client and thus helps in recognizing the client and getting required information about the client
How can cookies solve the problem that HTTP is stateless?

87. Cookies
The server response includes the Set-Cookie header, which has the attributes:
– name=VALUE
– expires=DATE STRING
– domain=DOMAIN NAME
– path=PATH
– secure
The client returns a cookie only to a server with a matching URL (the server that set the cookie).

88. Cookies
Example:
– The client contacts a web site for the first time
– The server response includes the header Set-Cookie: 1678453
– The client stores the cookie value and the server name in a special "cookie file"
– For each further request to that server, the client adds the header Cookie: 1678453
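Real Set-Cookie headers carry a name=value pair rather than a bare value, so this sketch pairs the example's value with a hypothetical cookie name, id, and a domain taken from the earlier examples. It relies on Python's standard http.cookies.SimpleCookie to produce the server's header and to parse the value the browser sends back.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header (the cookie name "id" is hypothetical)
outgoing = SimpleCookie()
outgoing["id"] = "1678453"
outgoing["id"]["domain"] = "cs.huji.ac.il"
outgoing["id"]["path"] = "/"
print(outgoing.output())  # Set-Cookie: id=1678453; Domain=cs.huji.ac.il; Path=/

# Later request: the browser sends back "Cookie: id=1678453"
incoming = SimpleCookie("id=1678453")
print(incoming["id"].value)  # 1678453
```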

89. Cookies (cont.)
Usage:
– The server requires authentication but doesn't want to hassle the user with a user name and password
– Remembering users' preferences for advertising
– Cookies enable creating a virtual shopping cart
Problems:
– Users who access the same site from different machines
– Privacy

90. Links
For specifications and additional information:
– http://www.w3.org/Protocols/
– http://www.w3.org/Protocols/Specs.html
– http://www.jmarshall.com/easy/http/
– http://wdvl.com/Internet/Protocols/HTTP/article.html

