Hyper Text Transfer Protocol (HTTP)

HTTP
HTTP defines how Web pages are requested and served on the Internet. Early servers and browsers used an ad-hoc approach. A standardized protocol, called HTTP/1.0, was derived from this; the earlier approach is now called HTTP/0.9. Later, HTTP/1.0 was extended to HTTP/1.1. The protocol versions are upwardly compatible: servers and browsers which can handle HTTP/1.1 can also handle HTTP/1.0 and HTTP/0.9.

History: “HTTP/0.9”
HTTP/0.9 was very simple. A browser would send a request like this to a server:
    GET /hobbies.html
In response, the server would send the contents of the requested file. Only GET requests were supported. Only a file path and name could appear in a GET request. The response had to be an HTML document.

History (contd.)
Different browsers and servers soon extended this basic scheme in various ways. To achieve some standardization, the HTTP/1.0 protocol was specified, in 1996, in a document called RFC 1945 (for historical reasons, an Internet standard spec is called a Request for Comments, or RFC). This was soon extended to HTTP/1.1, in RFC 2068, released in January 1997. An update to RFC 2068 was produced in June 1999, as RFC 2616. Various other protocols, based on HTTP, have been produced from time to time; we will see a “cookie” protocol, based on HTTP, which was specified in February 1997, in RFC 2109.

How HTTP Works
HTTP sits on TCP, which, in turn, sits on IP. Usually, HTTP servers are configured to listen on TCP port 80, although sometimes a different port is used, particularly if two HTTP servers are running on one machine. You can see how HTTP works by pretending to be a browser yourself: using telnet to connect to a server, you can issue a request and see the response.

Example
If you were to point a browser at the home-page URL of the server student.cs.ucc.ie, you would get an HTML home page which provides links to various pages for students, etc. The server on student.cs.ucc.ie uses the standard HTTP port, Port 80, so you can get the same page by telnetting to Port 80 on student.cs.ucc.ie and typing a GET request.

Connecting to the HTTP server on student.cs.ucc.ie
On any machine, say interzone, specify the address and port in a telnet command:
    interzone.ucc.ie> telnet student.cs.ucc.ie 80
You will get the following response:
    Trying ...
    Connected to student.cs.ucc.ie.
    Escape character is '^]'.
The HTTP server is now listening.

Requesting the home page
Issue the following HTTP/1.0 request, noting that you must press [RETURN] twice:
    GET / HTTP/1.0 [RETURN]
    [RETURN]
The response consists of a status line, a sequence of headers and the requested home page. Then you are told that the telnet connection was closed by the server, as you will see on the next slide.

The reply to your request:
The server’s response:
    HTTP/1.1 200 OK
    ...
    Content-Type: text/html

    <HTML> </HTML>
Then your local telnet program tells you that the connection was closed by the server:
    Connection closed by foreign host.
    interzone.ucc.ie>

Getting a different page:
Consider the page /cs1064/jabowen/ on the same server. Telnet to the server:
    interzone.ucc.ie> telnet student.cs.ucc.ie 80
When the server is listening, ask for the page like this:
    GET /cs1064/jabowen/ HTTP/1.0 [RETURN]
    [RETURN]

What was going on above:
Once connected to an HTTP server, we can send an HTTP request line, optionally followed by request headers. In the cases above,
    GET / HTTP/1.0
and
    GET /cs1064/jabowen/ HTTP/1.0
were request lines. Each request line was terminated by pressing [RETURN]. In each case, the second [RETURN] marked the end of an empty list of request headers.
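The request format just described (a request line, optional headers, then an empty line) can be sketched in Python. This is only an illustration: the function names `build_request` and `send_request` are assumptions introduced here, and the sketch uses the CRLF line endings that HTTP specifies in place of the [RETURN] key presses.

```python
import socket


def build_request(method, target, version="HTTP/1.0", headers=None):
    """Return the bytes of a request line, optional headers and the empty line."""
    lines = [f"{method} {target} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # The final empty line (the "second RETURN") ends the list of headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def send_request(host, port, request, timeout=10.0):
    """Send the request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `send_request("student.cs.ucc.ie", 80, build_request("GET", "/"))` would play the role of the telnet session, assuming that server is still reachable.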

GET requests
In
    GET / HTTP/1.0
the / is the resource the client wants to get, and the HTTP/1.0 tells the server that the client is using the HTTP/1.0 protocol.
In
    GET /cs1064/jabowen/ HTTP/1.0
the /cs1064/jabowen/ is the resource the client wants to get.
In each case, the server responds by sending a status line, a number of response headers and the content of the requested resource.

Consider the response:
    HTTP/1.1 200 OK
    ...
    Content-Type: text/html

    <HTML> </HTML>
The first line, HTTP/1.1 200 OK, is a status line. The next few lines, ending in the line Content-Type: text/html, are header lines. The lines bounded by <HTML> and </HTML> form the content of the requested resource.
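The three parts of a response can be separated mechanically. The following is a minimal sketch, assuming the whole response is already in memory as bytes; `split_response` is a name made up for this illustration.

```python
def split_response(raw):
    """Split raw response bytes into (status_line, headers, body)."""
    # The first empty line separates the header section from the content.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = []
    for line in lines[1:]:
        # Split on the first colon only: values such as dates contain colons.
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return status_line, headers, body
```

Applied to a response like the one above, the first element would be the status line, the second the list of header lines, and the third the bytes between <HTML> and </HTML> inclusive.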

HEAD requests
HEAD requests were new in HTTP/1.0. A HEAD request is similar to a GET, the only difference being the use of the word HEAD instead of the word GET, for example:
    HEAD /cs1064/jabowen/ HTTP/1.0 [RETURN]
    [RETURN]
The server sends the same status line and the same response headers as if it had received a GET request, but does not send the actual content of the resource mentioned in the request. Thus, human clients can use HEAD requests to easily access information about a resource on a server without being overwhelmed by the mass of detail that would be received if the resource content were sent in the response.

Example HEAD request
Suppose, for example, we wanted to see information about the resource /cs1064/jabowen/, such as its size, when it was last edited, etc. We can send the request
    HEAD /cs1064/jabowen/ HTTP/1.0

Response to example HEAD request:
    HTTP/1.1 200 OK
    Date: Wed, 13 Dec 2000 12:21:35 GMT
    Server: Apache/ (Unix) PHP/4.0.3pl1
    Last-Modified: Thu, 07 Dec :16:18 GMT
    ETag: " c6-3a2f8da2"
    Accept-Ranges: bytes
    Content-Length: 10694
    Connection: close
    Content-Type: text/html

Analysis of response:
The first line in the response,
    HTTP/1.1 200 OK
is the status line, in which:
HTTP/1.1 indicates that the server can use HTTP/1.1 (although it can accept requests in earlier HTTP forms);
200 is a code which indicates the status the request was given by the server;
OK is an English-language phrase giving the meaning of the status code.
The other lines in the response give information either about the server or about the resource.
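The three fields of a status line can be pulled apart with a single split. This is a small sketch; `parse_status_line` is a name chosen for this illustration, not part of any library.

```python
def parse_status_line(line):
    """Return (protocol_version, status_code, reason_phrase)."""
    # Split at most twice, because the reason phrase may itself contain spaces.
    version, code, phrase = line.split(" ", 2)
    return version, int(code), phrase
```

For the status line above this gives the version string HTTP/1.1, the integer 200 and the phrase OK.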

Analysis (contd.)
    Date: Wed, 13 Dec 2000 12:21:35 GMT
gives the date and time of the response.
    Server: Apache/ (Unix) PHP/4.0.3pl1
gives details of the server.
    Last-Modified: Thu, 07 Dec :16:18 GMT
says when the resource was last modified.
    ETag: " c6-3a2f8da2"
provides a supposedly unique string to identify this entity.
    Accept-Ranges: bytes
says that this server could serve up pieces of this resource, pieces specifiable to the nearest byte.
    Content-Length: 10694
gives the size of the resource in bytes.
    Connection: close
says that the server does not regard this as a persistent connection.
    Content-Type: text/html
gives the type of data in the resource.

Another example
Suppose we wanted to learn about the image /cs1064/jabowen/vh.gif on the same server. We can send the request
    HEAD /cs1064/jabowen/vh.gif HTTP/1.0
The response is:
    HTTP/1.1 200 OK
    Date: Wed, 13 Dec :23:04 GMT
    Server: Apache/ (Unix) PHP/4.0.3pl1
    Last-Modified: Fri, 24 Nov :46:00 GMT
    ETag: " a1e54f8"
    Accept-Ranges: bytes
    Content-Length: 865
    Connection: close
    Content-Type: image/gif

HTTP/1.1: A (fairly) detailed description

We have just seen some example HTTP/1.0 interactions
The same kinds of concepts we saw in these interactions will arise as we examine HTTP/1.1 in more detail. The versions of HTTP have a great deal in common, so, in what follows, much of what is said will be true of all three versions. Therefore, any mention of just “HTTP” will mean that the statement applies to HTTP/0.9, HTTP/1.0 and HTTP/1.1.

Overall Operation of HTTP
The HTTP protocol is a request/response protocol.
request: An HTTP message sent by a client to a server.
response: An HTTP message sent by a server to a client which has made a request.
client: A program that establishes connections for the purpose of sending requests.
server: A program that accepts connections in order to service requests by sending back responses.
As we shall see, a program may act as both a client and a server.

Message from a client:
A client sends, over a connection, to a server:
a request line, in the form of a request method, a URI (Uniform Resource Identifier) and a protocol version,
possibly followed by a message containing request modifiers, information about the client, and (possibly) body content.

Response from a server:
The server responds with:
a status line, in the form of the message's protocol version, a success or error code and an English phrase explaining the code,
possibly followed by a message containing server information, information about the entity in the body content (if any) and (possibly) body content.

HTTP Communication
Most communication is started by a user agent and consists of a request to be applied to a resource on some origin server.
user agent: A client (browser, spider, etc.) which initiates a request.
resource: A data object or service that can be identified by a URI.
origin server: The server on which a resource resides or is to be created.

Simple communication
Involves a single connection between the user agent (UA) and the origin server (O). This connection is denoted, in diagrams on this and future slides, by a line between the two:
    ====request chain==========>
    UA ---------------------- O
    <=========response chain====

More complicated case
Intermediaries are present in the request/response chain.
    ====request chain=======================>
    UA ------ A ------ B ------ C ------ O
    <======================response chain====
Above, 3 intermediaries (A, B, and C) lie between the user agent and the origin server. Intermediaries act as both clients and servers. A request or response message that travels the whole chain passes through 4 separate connections: the UA-A connection, the A-B connection, the B-C connection and the C-O connection.

Simple versus complicated
Distinction is important because some HTTP options may apply only to the connection with the nearest neighbour, only to the end-points of the chain, or to all connections along the chain.

3 forms of intermediary
proxy: an agent which receives a request for a resource whose URI is in its absolute form and, if necessary, rewrites all or part of the message and forwards the reformatted request toward the server identified by the URI.
gateway: an agent which acts as a translation interface to a server for another protocol, such as WAP.
tunnel: an agent which acts as a relay point between two connections without changing messages; tunnels are used, for example, in security firewalls.

Caching

User agents, proxies and gateways (but not tunnels) may use a local cache to handle requests, instead of forwarding them on to an origin server. A request/response chain is shortened if one of the parties along the chain has a cached response applicable to the request.

Example Network topology
The example caching scenarios in the next few slides will use this network:
    UA3____________D
                   |
    UA2_____       |
            |      |
    UA1_____A______B________C_________O

Caching Example 1
    ====request chain====================>
    UA1 ------ A ------ B ------ C ------ O
    <==================response chain=====
In the example above: the user has made a request for a resource on origin server O; neither UA1 nor any of the proxies A, B or C has an appropriate cached response, so the request has been forwarded all the way to O. Four connections are involved in servicing the request.

Caching Example 2
    request chain and response chain stay inside UA1
    UA1 ...... A ...... B ...... C ...... O
In the example above: the user has repeated the same request for a resource on O; UA1 has a cached response to the earlier request and gives this to the user without sending the request anywhere. No connection is involved in servicing the request.

Caching Example 3
    ===request chain=>
    UA2 -------+
               |
    UA1 ...... A ...... B ...... C ...... O
    <=response chain==
In the example above: the user at UA2 has requested the same resource on origin server O that was earlier requested by the user at UA1; UA2 has forwarded the request to proxy A; proxy A has an appropriate cached response, from when it serviced the earlier request from UA1. Only one connection is involved in servicing the request.

Caching Example 4
    ===request chain====>
    UA3 ---------- D ---+
                        |
    UA1 ...... A ...... B ...... C ...... O
    <===response chain===
In the example above: the user at UA3 has requested the same resource on origin server O that was earlier requested by the user at UA1; UA3 has forwarded the request to proxy D, which has forwarded it to proxy B; proxy B has an appropriate cached response, from when it serviced the earlier request from UA1. Two connections are involved in servicing the request.
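The four caching examples can be replayed with a toy model of the example network. This is a sketch under simplifying assumptions: every node except the origin server caches every response forever, the URI `/page.html` is hypothetical, and the class and function names are invented for the illustration.

```python
class Node:
    """A user agent, proxy or origin server in the example network."""

    def __init__(self, name, upstream=None, caches=True):
        self.name = name
        self.upstream = upstream  # next node toward the origin server, or None
        self.caches = caches
        self.cache = {}

    def fetch(self, uri):
        """Return (content, number_of_connections_used)."""
        if self.caches and uri in self.cache:
            return self.cache[uri], 0  # answered locally, chain ends here
        if self.upstream is None:
            return f"content of {uri}", 0  # this node is the origin server
        content, used = self.upstream.fetch(uri)
        if self.caches:
            self.cache[uri] = content
        return content, used + 1  # one more connection: this node to upstream


def run_examples(uri="/page.html"):
    """Replay caching examples 1 to 4 and return the connection counts."""
    o = Node("O", caches=False)
    c = Node("C", upstream=o)
    b = Node("B", upstream=c)
    a = Node("A", upstream=b)
    d = Node("D", upstream=b)
    ua1 = Node("UA1", upstream=a)
    ua2 = Node("UA2", upstream=a)
    ua3 = Node("UA3", upstream=d)
    # Order matters: UA1's first request fills the caches along A, B and C.
    return [node.fetch(uri)[1] for node in (ua1, ua1, ua2, ua3)]
```

`run_examples()` yields the connection counts 4, 0, 1 and 2, matching the four examples.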

To cache or not?
Not all responses are usefully cacheable. As we will see later, some requests may contain modifiers which place special requirements on cache behavior.

Caching/Proxy architectures
A wide variety of cache and proxy architectures/configurations exist, including: national hierarchies of proxy caches to save inter-national and/or inter-continental bandwidth, systems that broadcast or multicast cache entries, organizations that distribute subsets of cached data via CD-ROM, and so on.

Connections

Temporary Connections
In most implementations of HTTP/1.0, a server closed a connection after it had serviced the request received on that connection. We saw this earlier, when the server on student.cs.ucc.ie closed the telnet connection that we had established, after it had sent its response to the HTTP/1.0 GET request we had sent. The use of inline images, sound files, etc., in web pages often requires a client to make multiple requests of the same server when loading one document. Thus the temporary connections provided by HTTP/1.0 meant that loading even one web page required many separate TCP connections (one to fetch each inline image, each sound file, etc.). This imposed a significant unnecessary load on HTTP servers and caused congestion on the Internet.

Advantages of Persistent Connections
Persistent HTTP connections offer a number of advantages:
By opening and closing fewer TCP connections, CPU time is saved.
HTTP requests and responses can be pipelined on a connection, allowing a client to make multiple requests without waiting for each response.
Network congestion is reduced by reducing the number of packets caused by TCP opens.
Latency on subsequent requests is reduced, since there is no time spent in TCP's connection-opening handshake.

Persistent Connections in HTTP/1.1
Unlike in HTTP/1.0 and earlier, persistent connections are the default behavior of any HTTP/1.1 connection. This means that, in HTTP/1.1, when a connection has been opened to service a request, it is kept open for further possible requests from the same client. This is true even if the initial request triggered an error response from the server. But, when no further request has been received after some time-out period, the server may close the connection. However, a client can indicate, when making a request, that it wants the connection closed after the request is serviced.

Connection Persistency Negotiation
HTTP/1.1 provides a mechanism by which a client and a server can signal the close of a TCP connection: the Connection: header field. If an HTTP/1.1 client wants a connection closed after it receives a response to its request, it should include, in the request, a Connection: header containing the token "close". Similarly, if an HTTP/1.1 server intends to close a connection after it sends a response to a request, it should include, in the response, a Connection: header containing the token "close". If either the client or the server sends the close token in a Connection: header, that request becomes the last one for the connection.
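The negotiation rule can be written as a small predicate. This is a minimal sketch, assuming headers are held in plain dictionaries and that an HTTP/1.0 exchange is never persistent (as in these slides); the function names are invented for the illustration.

```python
def has_close_token(headers):
    """True if a Connection header in the dictionary carries the token "close"."""
    for name, value in headers.items():
        if name.lower() == "connection":  # header names are case-insensitive
            tokens = [token.strip().lower() for token in value.split(",")]
            if "close" in tokens:
                return True
    return False


def connection_stays_open(version, request_headers, response_headers):
    """Decide whether the connection persists after this request/response pair."""
    if version != "HTTP/1.1":
        return False  # assumption: only HTTP/1.1 connections are persistent
    return not (has_close_token(request_headers)
                or has_close_token(response_headers))
```

The version argument is the one used by the client in its request line, since that is what the server bases its assumption on.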

Example 1: Introduction
A human, using a telnet client, sends an HTTP/1.0 request to an HTTP/1.1 server. The server assumes that the client, because it is using HTTP/1.0, cannot handle persistent connections and, in its response, signals its intention to close the connection. After printing the response, the telnet client says that the connection was closed by the foreign host.

Example 1
    interzone.ucc.ie> telnet student.cs.ucc.ie 80
    Trying ...
    Connected to student.cs.ucc.ie.
    Escape character is '^]'.
    HEAD /cs1064/jabowen/ HTTP/1.0

    HTTP/1.1 200 OK
    Date: Sat, 06 Jan :56:44 GMT
    Server: Apache/ (Unix) PHP/4.0.3pl1
    Last-Modified: Wed, 20 Dec :34:46 GMT
    ETag: "2160-2dee-3a409956"
    Accept-Ranges: bytes
    Content-Length: 11758
    Connection: close
    Content-Type: text/html

    Connection closed by foreign host.

Example 2: Introduction
A human, using a telnet client, sends an HTTP/1.1 request to an HTTP/1.1 server. The server assumes that the client, because it is using HTTP/1.1, wants a persistent connection; thus, there is no Connection: header in the response. The telnet client prints the response for the human to see. After a significant delay (the time-out period), the server realizes the client has no further request and closes the connection. The telnet client then tells the human that the connection was closed by the foreign host.

Example 2:
    interzone.ucc.ie> telnet student.cs.ucc.ie 80
    Trying ...
    Connected to student.cs.ucc.ie.
    Escape character is '^]'.
    HEAD /cs1064/jabowen/ HTTP/1.1
    Host: student.cs.ucc.ie

    HTTP/1.1 200 OK
    Date: Sat, 06 Jan :57:08 GMT
    Server: Apache/ (Unix) PHP/4.0.3pl1
    Last-Modified: Wed, 20 Dec :34:46 GMT
    ETag: "2160-2dee-3a409956"
    Accept-Ranges: bytes
    Content-Length: 11758
    Content-Type: text/html

    (A time-out period elapses before the server closes the connection)
    Connection closed by foreign host.

Example 3: Introduction
A human, using a telnet client, sends an HTTP/1.1 request to an HTTP/1.1 server. The client knows that, because it is using HTTP/1.1, the server will think it wants a persistent connection. Since the client does not want a persistent connection, it sends a Connection: header with a close token in the request. Seeing this, the server indicates its intention to close the connection immediately, by including a Connection: header with a close token in its response. The telnet client prints the response for the human to see and, immediately thereafter, tells the human that the connection was closed by the foreign host.

Example 3:
    interzone.ucc.ie> telnet student.cs.ucc.ie 80
    Trying ...
    Connected to student.cs.ucc.ie.
    Escape character is '^]'.
    HEAD /cs1064/jabowen/ HTTP/1.1
    Host: student.cs.ucc.ie
    Connection: close

    HTTP/1.1 200 OK
    Date: Sat, 06 Jan :57:58 GMT
    Server: Apache/ (Unix) PHP/4.0.3pl1
    Last-Modified: Wed, 20 Dec :34:46 GMT
    ETag: "2160-2dee-3a409956"
    Accept-Ranges: bytes
    Content-Length: 11758
    Content-Type: text/html

    Connection closed by foreign host.
    (No time-out delay before this message from the telnet client)

Pipelining Requests
A client that supports persistent connections may "pipeline" its requests (i.e., send multiple requests without waiting for each response). A server must send its responses to those requests in the same order that the requests were received.
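Pipelining amounts to writing several complete requests to the connection back to back. The sketch below only builds the bytes that would be sent; the function name `build_pipeline` and the paths used in the usage note are hypothetical, and the last request carries a close token so the server knows no more requests will follow.

```python
def build_pipeline(targets, host):
    """Return the bytes of several HEAD requests sent without waiting for replies."""
    parts = []
    for index, target in enumerate(targets):
        lines = [f"HEAD {target} HTTP/1.1", f"Host: {host}"]
        if index == len(targets) - 1:
            # Only the final request asks for the connection to be closed.
            lines.append("Connection: close")
        parts.append("\r\n".join(lines) + "\r\n\r\n")
    return "".join(parts).encode("ascii")
```

For example, `build_pipeline(["/a.html", "/b.html"], "student.cs.ucc.ie")` produces two requests in one byte string. Because responses come back in request order, the first response read from the connection belongs to `/a.html` and the second to `/b.html`.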

Example 4: Introduction
A human, using a telnet client, sends two HTTP/1.1 requests to an HTTP/1.1 server, sending the second request before even receiving a response to the first request. Since there are only two requests, the client sends a Connection: header with a close token in the second request. The server responds to both requests and, because of the close token in the 2nd request, indicates its intention to close the connection immediately, by including a Connection: header with a close token in its response to the 2nd request. The telnet client prints the responses for the human to see and, immediately thereafter, tells the human that the connection was closed by the foreign host.

Example 4: the pipelined requests
    interzone.ucc.ie> telnet student.cs.ucc.ie 80
    Trying ...
    Connected to student.cs.ucc.ie.
    Escape character is '^]'.
    HEAD HTTP/1.1
    Host: student.cs.ucc.ie

    HEAD HTTP/1.1
    Connection: close

Example 4: the sequence of responses
    HTTP/1.1 200 OK
    Date: Wed, 31 Jan :01:41 GMT
    Server: Apache/ (Unix) PHP/4.0.3pl1
    Last-Modified: Thu, 25 Jan :26:32 GMT
    ETag: "2160-2e25-3a702988"
    Accept-Ranges: bytes
    Content-Length: 11813
    Content-Type: text/html

    Last-Modified: Wed, 20 Dec :42:39 GMT
    ETag: "13d3a-2b60-3a40a93f"
    Content-Length: 11104
    Connection: close

    Connection closed by foreign host.
    (No time-out delay before this message from the telnet client)

Pipelining Requests (contd.)
Clients which assume persistent connections and pipeline immediately after connection establishment should be prepared to retry their connection if the first pipelined attempt fails. If a client does such a retry, it must NOT pipeline before it knows the connection is persistent. Clients must also be prepared to resend their requests if the server closes the connection before sending all of the corresponding responses.

Pipelining Requests (contd.)
Care must be taken when pipelining, because some requests (called non-idempotent requests) may change the state of the server (for example, by changing a database used by the server). Clients should NOT pipeline such requests; otherwise, a premature termination of the transport connection could lead to indeterminate results. A client wishing to send a non-idempotent request should wait to send that request until it has received the response status for the previous request.
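One cautious way to follow this advice is to group requests into batches, where only the requests inside a batch are pipelined. This is a sketch, not a prescribed algorithm: the set of idempotent methods is taken from RFC 2616 rather than from these slides, and `plan_batches` is a name invented here. It is deliberately conservative in sending each non-idempotent request alone.

```python
# Methods that RFC 2616 treats as idempotent (an assumption brought in
# from the RFC; the slides themselves do not list them).
IDEMPOTENT = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}


def plan_batches(methods):
    """Group request methods so that non-idempotent requests are never pipelined."""
    batches = []
    current = []
    for method in methods:
        if method.upper() in IDEMPOTENT:
            current.append(method)
        else:
            if current:
                batches.append(current)
                current = []
            # A non-idempotent request goes alone: the client waits for all
            # earlier responses before sending it, and for its own response
            # before sending anything else.
            batches.append([method])
    if current:
        batches.append(current)
    return batches
```

A client would send each batch as one pipelined burst and wait for all of its responses before starting the next batch.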

Uniform Resource Identifiers

URIs have been known by many names: WWW addresses, Universal Document Identifiers, Universal Resource Identifiers, Uniform Resource Locators (URL) and Uniform Resource Names (URN). For HTTP, URIs are simply formatted strings which identify (by name, location, or any other characteristic) a resource.

General URI Syntax
URIs in HTTP can be represented in absolute form or relative to some known base URI. The two forms are differentiated by the fact that absolute URIs always begin with a scheme name followed by a colon.

http-scheme URIs
A URI which is based on the http scheme must be of the syntactic form
    "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
where items enclosed in [] are optional. If the port is not given, Port 80 is assumed.
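The syntactic form can be explored with Python's standard `urllib.parse.urlsplit`. This is a minimal sketch; `parse_http_url` is a name chosen for the illustration, and the URLs in the usage note are assumed examples.

```python
from urllib.parse import urlsplit


def parse_http_url(url):
    """Return (host, port, abs_path, query) for an http-scheme URI."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("not an http-scheme URI: " + url)
    # If the port is not given, Port 80 is assumed.
    port = parts.port if parts.port is not None else 80
    # An absent abs_path is treated as "/".
    abs_path = parts.path or "/"
    return parts.hostname, port, abs_path, parts.query
```

For instance, `parse_http_url("http://student.cs.ucc.ie:8080/form?hometown=cork")` separates the host, the explicit port 8080, the path `/form` and the query `hometown=cork`.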

Meaning of a http-scheme URI
The meaning of a http-scheme URL is that the identified resource is on the server at that port of the host, and the Request-URI for the resource is abs_path. Thus, for example, pointing a browser at the http-scheme URL with host student.cs.ucc.ie, no explicit port and abs_path /cs1064/jabowen/ is the same as opening a TCP/IP connection to Port 80 on student.cs.ucc.ie and sending either the HTTP/1.0 request
    GET /cs1064/jabowen/ HTTP/1.0
or the HTTP/1.1 request
    GET /cs1064/jabowen/ HTTP/1.1
    Host: student.cs.ucc.ie
(As we shall see later, all HTTP/1.1 requests must include a Host: header field.)
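The mapping from a URL to the request a browser would send can be sketched as follows. The function name `request_for` is an assumption of this illustration, and the URL in the usage note is assembled from the host and path named above.

```python
from urllib.parse import urlsplit


def request_for(url, version="HTTP/1.1"):
    """Return the text of the GET request that corresponds to an http URL."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [f"GET {target} {version}"]
    if version == "HTTP/1.1":
        # HTTP/1.1 requests must carry a Host: header field.
        lines.append(f"Host: {parts.netloc}")
    return "\r\n".join(lines) + "\r\n\r\n"
```

Calling `request_for("http://student.cs.ucc.ie/cs1064/jabowen/", "HTTP/1.0")` gives the one-line HTTP/1.0 request, while the default version adds the Host: line.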

Meaning of a http-scheme URI (contd.)
The lectures on HTML forms given earlier in this course used a method called POST to send user-supplied data to a server. The POST method was not defined in HTTP/0.9, which only provided one method, the GET method. The convention used in HTTP/0.9 to send data to a server was to encode the data in the Request-URI, in the form of a query at the end. The convention is still supported in HTTP/1.1. Consider the following form:
    <form action="..." method="get">
      Home town: <input type="text" name="hometown">
      <button type="submit"> Send data </button>
    </form>
If the user entered “cork” in the input box and submitted the form, the browser’s request would include this request line:
    GET ...?hometown=cork HTTP/1.1

The query in a URL can include several “equations”. Consider the following form:
    <form action="..." method="get">
      Surname: <input type="text" name="surname">
      Home town: <input type="text" name="hometown">
      <button type="submit"> Send data </button>
    </form>
If the user entered “sullivan” and “cork” in the input boxes and submitted the form, the browser’s request would include this request line:
    GET ...?surname=sullivan&hometown=cork HTTP/1.1
“Equations” in a query are separated by the & character.
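Python's standard library can assemble and take apart such queries. This is a brief sketch; the path `/handler` and the function name `build_target` are hypothetical.

```python
from urllib.parse import urlencode, parse_qsl


def build_target(path, fields):
    """Append the form fields to a path as a query of name=value "equations"."""
    return path + "?" + urlencode(fields)


def read_query(query):
    """Recover the (name, value) pairs from a query string."""
    return parse_qsl(query)
```

`build_target("/handler", [("surname", "sullivan"), ("hometown", "cork")])` joins the two pairs with the & character, and `read_query` reverses the step on the server side.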

Some characters in user-supplied data have to be specially handled when a browser is writing the query in a Request-URI. The following characters, called the “reserved” characters, have a special usage in URIs:
    : / @ ? & = ;
They have to be “URL encoded” in order to be sent in a URI query. Consider the following form:
    <form action="..." method="get">
      Name of company: <input type="text" name="company">
      Home town: <input type="text" name="place">
      <button type="submit"> Send data </button>
    </form>
If the user entered “Black&Decker” and “Cork” in the input boxes, the browser’s request would include this request line:
    GET ...?company=Black%26Decker&place=Cork HTTP/1.1
where the URL-encoded form of & is %26, the 26 being the hexadecimal ASCII code for &.

URL escape codes for the “reserved” characters:
    colon (":")            %3A
    slash ("/")            %2F
    at ("@")               %40
    question-mark ("?")    %3F
    equals ("=")           %3D
    ampersand ("&")        %26
    semi-colon (";")       %3B

The following characters, called the “unsafe” characters, should also be URL-encoded in URIs, using the hex codes specified:
    space                        %20
    quotation mark               %22
    less than                    %3C
    greater than                 %3E
    hash (“#”)                   %23
    percent                      %25
    left brace                   %7B
    right brace                  %7D
    pipe (“|”)                   %7C
    backslash                    %5C
    caret (“^”)                  %5E
    tilde                        %7E
    left square bracket          %5B
    right square bracket         %5D
    grave accent (“`”)           %60
These characters are unsafe for different reasons.
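The escape codes are simply a percent sign followed by the character's ASCII code in hexadecimal, which a few lines of Python can confirm. This is a sketch; `percent_encode` is a helper written for this illustration, while `quote` is the standard-library function from `urllib.parse`.

```python
from urllib.parse import quote


def percent_encode(text):
    """Encode every character of an ASCII string as %XX."""
    return "".join("%{:02X}".format(byte) for byte in text.encode("ascii"))


def encode_value(value):
    """Encode a form value for use in a query, leaving letters and digits alone."""
    # safe="" tells quote() not to exempt "/" (its default) from encoding.
    return quote(value, safe="")
```

`percent_encode` is useful for checking individual table entries, whereas `encode_value` is closer to what a browser does to a whole field such as the company name above.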

Length of URI
Since a browser which is sending user-supplied data to a server includes these data in the query part of a URL, URLs can get quite long. The HTTP protocol does not place any a priori limit on the length of a URI. Servers must be able to handle the URI of any resource they serve up. Servers should be able to handle URIs of unbounded length if they serve up GET-based forms that could generate such URIs. A server should return 414 (Request-URI Too Long) status if a URI is longer than the server can handle.
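A server-side length check might look like the sketch below. The limit of 2048 characters is a purely hypothetical value chosen for the illustration (HTTP itself sets no limit), and `check_request_uri` is an invented name.

```python
MAX_URI_LENGTH = 2048  # hypothetical limit of one particular server


def check_request_uri(uri, limit=MAX_URI_LENGTH):
    """Return a (code, phrase) error status, or None if the URI is acceptable."""
    if len(uri) > limit:
        return 414, "Request-URI Too Long"
    return None
```

A server would run this check on the Request-URI before doing any other processing and, on a non-None result, send back only the corresponding status line.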

Host names in http-scheme URIs
A fully-qualified host name of a host means either the fully-qualified domain name (i.e., a completely specified domain name ending in a top-level domain such as .com or .ie), or the numeric Internet Protocol (IP) address of the host. The fully-qualified domain name is preferred; use of numeric IP addresses in URIs is strongly discouraged and should be avoided whenever possible.

Proxy handling of host names
If a proxy receives a fully qualified domain name, the proxy must NOT change the host name. But, if a proxy receives a host name which is not a fully qualified domain name, it may add its domain to the host name it received.
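The proxy rule can be sketched as a small function. The test for "fully qualified" used here (the name contains a dot) is a simplifying assumption of this illustration, not something the protocol defines, and `forwarded_host` is an invented name.

```python
def forwarded_host(host, proxy_domain):
    """Return the host name a proxy would use when forwarding a request."""
    if "." in host:
        # Assumed to be fully qualified (or a numeric address): never changed.
        return host
    # Not fully qualified: the proxy may add its own domain.
    return host + "." + proxy_domain
```

So a proxy in the domain cs.ucc.ie that receives the bare name `student` could forward the request to `student.cs.ucc.ie`, while a name that is already fully qualified passes through untouched.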
