The HyperText Transfer Protocol
History
HTTP has been in use since 1990 (HTTP/0.9)
HTTP/1.0 was defined in RFC 1945 (May 1996) and included metainformation
HTTP/1.1 was defined in RFC 2068 (January 1997) and included caching and persistent connections
"Specifies an Internet standards track protocol for the Internet Community"
HTTP is "a generic, stateless, object-oriented protocol…"
Good site:
Overall Operation
Client sends a request consisting of a method, a URI (Uniform Resource Identifier), and the protocol version, followed by optional modifiers
Server responds with a status line, metainformation, and an entity
Default port is 80 (but others can be used)
In HTTP/1.0, a single connection is used for each request
In HTTP/1.1, a connection may be used for multiple requests
HTTP Version
HTTP uses a major.minor numbering scheme for versions:
–HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT
–Major and minor are compared as separate integers, so HTTP/2.4 is lower than HTTP/2.13
To follow the specification, you MUST send a version number!
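The comparison rule is easy to get wrong with plain string comparison, so here is a minimal Python sketch of it. The function names `parse_http_version` and `is_lower` are illustrative and not taken from any specification.

```python
import re

# HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT
_VERSION_RE = re.compile(r"HTTP/(\d+)\.(\d+)")


def parse_http_version(text):
    """Return (major, minor) as integers, or raise ValueError."""
    match = _VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not an HTTP-Version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_lower(a, b):
    """True if version a is lower than version b."""
    # Tuples compare element by element, so 4 < 13 decides the result
    # for HTTP/2.4 versus HTTP/2.13.
    return parse_http_version(a) < parse_http_version(b)


print(is_lower("HTTP/2.4", "HTTP/2.13"))
```

Comparing the raw strings would give the opposite answer, because "4" sorts after "1" character by character.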
URLs and URIs (Using Backus-Naur Form or BNF)
The BNF identifies several things:
http_URL = "http:" "//" host [":" port] [abs_path]
host = <a legal Internet host domain name or IP address>
port = *DIGIT
abs_path = "/" rel_path
rel_path = [path] [";" params] ["?" query]
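As a rough illustration of how the pieces of an http URL map onto this grammar, the sketch below uses Python's standard `urllib.parse.urlparse`. The helper name `split_http_url` and the example path, params, and query are made up for the demonstration; only the host name comes from these slides.

```python
from urllib.parse import urlparse


def split_http_url(url):
    """Split an http URL into host, port, path, params and query."""
    parts = urlparse(url)
    if parts.scheme != "http":
        raise ValueError(f"not an http URL: {url!r}")
    return {
        "host": parts.hostname,
        # An omitted port means the default port, 80.
        "port": parts.port if parts.port is not None else 80,
        # An omitted abs_path is treated as "/".
        "path": parts.path or "/",
        "params": parts.params,
        "query": parts.query,
    }


# Hypothetical URL, for illustration only.
print(split_http_url("http://kahuna.clayton.edu:8080/docs/index.html;v=1?lang=en"))
```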
Date and Time Format
Three formats:
Sun, 06 Nov 1994 08:49:37 GMT ; RFC 822, updated by RFC 1123
Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
Sun Nov  6 08:49:37 1994 ; ANSI C's asctime() format
It, too, has a long BNF!
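A receiver has to accept all three formats. One possible way to do that in Python is to try each format in turn with `datetime.strptime`; this is a sketch that assumes the default C locale for day and month names, and `parse_http_date` is an illustrative name.

```python
from datetime import datetime

HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 822, updated by RFC 1123
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850
    "%a %b %d %H:%M:%S %Y",       # ANSI C asctime()
)


def parse_http_date(text):
    """Return a naive datetime; HTTP dates are always in GMT."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognised HTTP date: {text!r}")


for sample in (
    "Sun, 06 Nov 1994 08:49:37 GMT",
    "Sunday, 06-Nov-94 08:49:37 GMT",
    "Sun Nov  6 08:49:37 1994",
):
    print(parse_http_date(sample))
```

All three samples describe the same instant, so the three printed values are identical.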
Content/Transfer Codings
Content: indicates an encoding transformation that has been, or can be, applied to an entity
–gzip (GNU zip)
–compress (UNIX)
–deflate (zlib format)
Transfer: used to ensure "safe transport" through the network
–If the server doesn't understand the transfer coding, it returns 501 (Not Implemented)
–Rarely used
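Two of the three content codings can be tried out directly with Python's standard library. The sketch below assumes the body is already in memory as bytes; `encode_body` and `decode_body` are illustrative names, and the UNIX compress coding is left out because the standard library has no module for it.

```python
import gzip
import zlib


def encode_body(body, coding):
    """Apply a content coding to an entity body."""
    if coding == "gzip":
        return gzip.compress(body)
    if coding == "deflate":
        return zlib.compress(body)  # the zlib format, as on the slide
    raise ValueError(f"unsupported content coding: {coding}")


def decode_body(body, coding):
    """Undo a content coding named in Content-Encoding."""
    if coding == "gzip":
        return gzip.decompress(body)
    if coding == "deflate":
        return zlib.decompress(body)
    raise ValueError(f"unsupported content coding: {coding}")


original = b"<html>hello</html>" * 20
packed = encode_body(original, "gzip")
print(len(original), len(packed))
print(decode_body(packed, "gzip") == original)
```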
Media Types
Used in the Content-Type and Accept header fields for type negotiation
media-type = type "/" subtype *(";" parameter)
type = token
subtype = token
Example: text/html or text/plain
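The media-type grammar can be followed almost literally in code. This is a simplified sketch: `parse_media_type` is an illustrative name, and it does not handle quoted parameter values that contain a semicolon.

```python
def parse_media_type(value):
    """Split 'type/subtype; name=value' into (type, subtype, parameters)."""
    main, *params = [part.strip() for part in value.split(";")]
    type_, _, subtype = main.partition("/")
    if not type_ or not subtype:
        raise ValueError(f"not a media type: {value!r}")
    parameters = {}
    for param in params:
        if not param:
            continue  # tolerate a trailing ';'
        name, _, val = param.partition("=")
        parameters[name.strip().lower()] = val.strip().strip('"')
    # Type and subtype names are case-insensitive, so normalise them.
    return type_.lower(), subtype.lower(), parameters


print(parse_media_type("text/html"))
print(parse_media_type("Text/Plain; charset=us-ascii"))
```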
Multipart Types Encapsulation of one or more entities within a single message body Must use CRLF to represent line breaks between body parts
Product Tokens Used to identify server and client by software name and version Examples: –User-Agent: CERN-LineMode/2.15 libwww/2.17 –Server: Apache/0.8.4
Language Tags Identifies the natural (spoken) language of the entity Examples: en, en-US, en-cockney, x-pig-latin
HTTP Messages Messages consist of two types HTTP-message = Request | Response generic-message = start-line *message-header CRLF [message-body]
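To make the generic-message shape concrete, here is a small Python sketch that splits a raw message into start-line, headers, and body. It is deliberately simplified: `parse_message` is an illustrative name, and header lines folded over several lines are not handled.

```python
def parse_message(raw):
    """Split raw bytes into (start_line, [(name, value), ...], body)."""
    # The empty line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = []
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


request = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: kahuna.clayton.edu:8080\r\n"
    b"\r\n"
)
print(parse_message(request))
```

The same function works for responses, since both message types share the generic form.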
Requests
A request for an entity, sent by a client to a server
Request = Request-Line *(general-header | request-header | entity-header) CRLF [ message-body ]
Request-Line
The actual request from the client
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
Method = "OPTIONS" | "GET" | "HEAD" | "POST" | "PUT" | "DELETE" | "TRACE"
Request-URI = "*" | absolute-URI | abs-path
Examples:
–GET /index.html HTTP/1.0
–GET HTTP/1.1
–OPTIONS * HTTP/1.0
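A server's first job is to take the Request-Line apart. The sketch below is one way to do that in Python; `parse_request_line` is an illustrative name, the method set is the one named across these slides, and the absolute-URI check is only a rough test for "://".

```python
METHODS = {"OPTIONS", "GET", "HEAD", "POST", "PUT", "DELETE", "TRACE"}


def parse_request_line(line):
    """Return (method, request_uri, http_version) or raise ValueError."""
    if not line.endswith("\r\n"):
        raise ValueError("Request-Line must end with CRLF")
    parts = line[:-2].split(" ")  # fields are separated by a single SP
    if len(parts) != 3:
        raise ValueError(f"expected three fields, got {len(parts)}")
    method, uri, version = parts
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    # Request-URI = "*" | absolute-URI | abs-path
    if not (uri == "*" or uri.startswith("/") or "://" in uri):
        raise ValueError(f"bad Request-URI: {uri}")
    if not version.startswith("HTTP/"):
        raise ValueError(f"bad HTTP-Version: {version}")
    return method, uri, version


print(parse_request_line("GET /index.html HTTP/1.0\r\n"))
print(parse_request_line("OPTIONS * HTTP/1.0\r\n"))
```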
OPTIONS Example
Request Header Fields Allow client to specify more information about itself request-header = Accept | Accept-Charset | Accept-Encoding | Accept-Language | Authorization | From | Host | If-Modified-Since | If-Match | If-None-Match | If-Range | If-Unmodified-Since | Max-Forwards | Proxy-Authorization | Range | Referer | User-Agent
Response After receiving a request, the server responds Response = Status-Line * (general-header | response-header | entity-header) CRLF [ message-body ]
Status-Line and Reason-Phrase
Status-line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
Status-Code, grouped by class:
Informational: "100" ; Continue | "101" ; Switching Protocols
Successful: "200" ; OK | "201" ; Created | "202" ; Accepted | "203" ; Non-Authoritative Information | "204" ; No Content | "205" ; Reset Content | "206" ; Partial Content
Redirection: "300" ; Multiple Choices | "301" ; Moved Permanently | "302" ; Moved Temporarily | "303" ; See Other | "304" ; Not Modified | "305" ; Use Proxy
Client Error: "400" ; Bad Request | "401" ; Unauthorized | "402" ; Payment Required | "403" ; Forbidden | "404" ; Not Found | "405" ; Method Not Allowed | "406" ; Not Acceptable | "407" ; Proxy Authentication Required | "408" ; Request Time-out | "409" ; Conflict | "410" ; Gone | "411" ; Length Required | "412" ; Precondition Failed | "413" ; Request Entity Too Large | "414" ; Request-URI Too Large | "415" ; Unsupported Media Type
Server Error: "500" ; Internal Server Error | "501" ; Not Implemented | "502" ; Bad Gateway | "503" ; Service Unavailable | "504" ; Gateway Time-out | "505" ; HTTP Version not supported
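Since the first digit of the Status-Code determines the class of the response, a client can classify a response it has never seen before. A minimal Python sketch, with `parse_status_line` as an illustrative name:

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line):
    """Return (http_version, status_code, reason_phrase, status_class)."""
    # Split on the first two spaces only: the Reason-Phrase may contain spaces.
    fields = line.rstrip("\r\n").split(" ", 2)
    if len(fields) != 3:
        raise ValueError(f"malformed Status-Line: {line!r}")
    version, code, reason = fields
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"bad Status-Code: {code!r}")
    status = int(code)
    return version, status, reason, STATUS_CLASSES.get(status // 100, "Unknown")


print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
print(parse_status_line("HTTP/1.0 200 OK\r\n"))
```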
Response Header Fields Allows server to pass additional information to client Response-header = Age | Location | Proxy-Authenticate | Public | Retry-After | Server | Vary | Warning | WWW-Authenticate
Entity Header Identify optional information about the entity entity-header = Allow | Content-Base | Content-Encoding | Content-Language | Content-Length | Content-Location | Content-MD5 | Content-Range | Content-Type | Expires | Last-Modified
Request Methods
OPTIONS – requests the options available for communication
GET – requests an entity
HEAD – requests only the header info for an entity
POST – sends info from the client as an entity to be processed by the resource named in the URI
PUT – asks the server to store the enclosed entity under the given URI
DELETE – deletes an entity
TRACE – asks the server to echo back the request it received
Persistent Connections
Open and close fewer TCP connections
–Saves time
–Saves CPU
Allows pipelining of requests (sending several without waiting for each response)
Reduces network congestion (fewer TCP opens)
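Connection reuse can be observed on a single machine. The sketch below starts a throwaway HTTP/1.1 server on the loopback interface with Python's `http.server` and sends several requests through one `http.client.HTTPConnection`. The names `EchoPathHandler` and `fetch_over_one_connection` are made up for this demo, and it shows reuse only, not pipelining.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        body = self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_over_one_connection(paths):
    """GET each path over the same TCP connection.

    Returns a list of (status, body, local_port) tuples.
    """
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPathHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        results = []
        for path in paths:
            conn.request("GET", path)
            # The client-side port identifies the underlying TCP connection.
            local_port = conn.sock.getsockname()[1]
            response = conn.getresponse()
            results.append((response.status, response.read().decode("ascii"), local_port))
        conn.close()
        return results
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for status, body, port in fetch_over_one_connection(["/a", "/b", "/c"]):
        print(status, body, port)
```

Every line of output should show the same local port, which means only one TCP connection was opened for all three requests.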
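Authorization
Protected resources on a server are grouped into 'realms'
Upon receipt of an unauthorized request, the server responds with 401 (Unauthorized) and:
–WWW-Authenticate: Basic realm="My Stuff"
Client sends userID and passwd, joined by a colon and base64-encoded (here bob:hi):
–Authorization: Basic Ym9iOmhp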
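In the Basic scheme the credentials travel as the base64 encoding of userID, a colon, and the password. Base64 is an encoding, not encryption, so anyone who sees the header can recover the password. A short Python sketch, with illustrative function names:

```python
import base64


def basic_authorization(user_id, password):
    """Build the Authorization header line for the Basic scheme."""
    raw = f"{user_id}:{password}".encode("ascii")
    token = base64.b64encode(raw).decode("ascii")
    return f"Authorization: Basic {token}"


def decode_basic(header_value):
    """Recover (user_id, password) from a value such as 'Basic Ym9iOmhp'."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError(f"not a Basic credential: {header_value!r}")
    # Split on the first colon only; the password itself may contain colons.
    user_id, _, password = base64.b64decode(token).decode("ascii").partition(":")
    return user_id, password


print(basic_authorization("bob", "hi"))
print(decode_basic("Basic Ym9iOmhp"))
```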
Content Negotiation
Can be either Server or Client side
–Accept: the types of media that will be accepted on the client side. Can specify quality as well:
Accept: audio/*; q=0.2, audio/basic
If there is no Accept header, the client is assumed to accept all media
–Accept-Charset: specifies the acceptable character sets
Accept-Charset: iso
–Accept-Encoding: allowable compressions
Accept-Encoding: compress, gzip
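Quality values let a server rank what the client will accept: entries are separated by commas, each may carry a q parameter between 0 and 1, and a missing q counts as 1. The sketch below ranks the entries of such a header; `parse_accept` is an illustrative name, and ties are left in their original order.

```python
def parse_accept(value):
    """Return [(range, q), ...] sorted from most to least preferred."""
    ranges = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        media_range, *params = [part.strip() for part in item.split(";")]
        quality = 1.0  # default when no q parameter is given
        for param in params:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                quality = float(val)
        ranges.append((media_range, quality))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)


print(parse_accept("audio/*; q=0.2, audio/basic"))
print(parse_accept("da, en-gb; q=0.8"))
```

The same parsing applies to Accept-Charset, Accept-Encoding, and Accept-Language, which use the same q syntax.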
Content Negotiation
–Accept-Language: restricts the set of languages
Accept-Language: da, en-gb; q=0.8
–Accept-Ranges: tells the client which range units the server accepts in range requests. Can be either bytes or none
Accept-Ranges: bytes
–Allow: the methods allowed on the resource
Allow: GET, HEAD, PUT
–Authorization: allows client to send authorization info
Authorization:
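Content Negotiation
–Connection: specifies options for this particular connection; for example, Connection: close signals that the connection will be closed after the response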
Content Negotiation
–Content-Encoding: when present, indicates which codings have been applied to the entity
Content-Encoding: gzip
–Content-Language: the language of the entity
Content-Language: en
–Content-Length: size of message body in bytes
Content-Length:
–Content-Range: can be used for partial entities
Content Negotiation
–Content-Type: indicates the type of media
Content-Type: text/html; charset=ISO
–Date: date of the message (described previously)
–Expires: when the response should be considered stale (same format as Date)
–From: can contain the e-mail address of the person who controls the requesting client
From:
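Content Negotiation
–Host: specifies the host and port number of the resource being requested (remember, there can be gateways)
Host: kahuna.clayton.edu:8080
–If-Modified-Since: the entity is returned only if it has been modified since the given date; otherwise the server responds with 304 (Not Modified)
If-Modified-Since: Sun, 9 Feb :00:00 GMT
–If-Unmodified-Since: the request is carried out only if the entity has not been modified since the given date; otherwise the server responds with 412 (Precondition Failed)
–Last-Modified: the time at which the server believes the file was last modified
Last-Modified: Sun, 9 Feb :00:00 GMT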
Content Negotiation
–Public: specifies methods supported by the server
Public: GET, OPTIONS, HEAD
–Referer: allows client to specify the address of the resource from which the request URI was obtained
–Retry-After: server can specify this with a 503 (Service Unavailable) response, either as a date or as a number of seconds
Retry-After: Sun, 9 Feb :00:00 GMT
Retry-After: 30
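Because Retry-After can carry either a number of seconds or an HTTP date, a client has to handle both forms. A possible sketch in Python follows. The name `retry_delay` and the 1999 date are illustrative, and only the RFC 1123 date format is handled here.

```python
from datetime import datetime, timedelta, timezone


def retry_delay(value, now):
    """Return how long to wait, given a Retry-After value and the current time."""
    value = value.strip()
    if value.isdigit():
        # Delta-seconds form, e.g. "30".
        return timedelta(seconds=int(value))
    # HTTP-date form; HTTP dates are always expressed in GMT.
    when = datetime.strptime(value, "%a, %d %b %Y %H:%M:%S GMT")
    when = when.replace(tzinfo=timezone.utc)
    # A date in the past means there is no need to wait.
    return max(when - now, timedelta(0))


now = datetime(1999, 12, 31, 23, 59, 29, tzinfo=timezone.utc)
print(retry_delay("30", now))
print(retry_delay("Fri, 31 Dec 1999 23:59:59 GMT", now))
```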
Content Negotiation
–Server: specifies the server software and any sub-products
Server: Apache/1.3.9 mod_perl/1.21
–Transfer-Encoding: shows what, if any, type of transformation was applied to the message body to deliver it safely
Transfer-Encoding: chunked
–Upgrade: allows client to specify additional protocols it supports
Upgrade: HTTP/2.0, IRC/6.9, SHTTP/1.3
If the server agrees to switch protocols, it must send 101 (Switching Protocols)
–User-Agent: identifies the client software (for example, the browser)
–WWW-Authenticate: sent with a 401 response to name the authentication scheme and realm
WWW-Authenticate: Basic realm="My Stuff"
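The chunked transfer coding sends the body as a series of chunks, each introduced by its size in hexadecimal, and ends with a chunk of size zero. The sketch below decodes a complete chunked body held in memory. `decode_chunked` is an illustrative name, and any trailer after the last chunk is ignored.

```python
def decode_chunked(data):
    """Reassemble the body of a message sent with Transfer-Encoding: chunked."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)  # raises ValueError if truncated
        # The size line may carry ';'-separated extensions, which are dropped.
        size_field = data[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk reached
        body += data[pos:pos + size]
        if data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += size + 2
    return bytes(body)


print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))
```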