HTTP WEB Risanuri Hidayat, Ir., M.Sc.
World Wide Web
  T. Berners-Lee, R. Fielding, H. Frystyk: “Hypertext Transfer Protocol - HTTP/1.0”, RFC 1945
  Naming scheme for resources: URL, URN, URI
  Multimedia documents: MIME encoding (RFC)
  Transfer protocol: HTTP/1.0, HTTP/1.1
  Implemented over TCP/IP
  Integrated with Internet infrastructure: DNS, SMTP
History
  Hypertext systems: no network access protocol
  Gopher, WAIS: no hyperlinks
  CERN (Tim Berners-Lee, 1990)
  HTTP/0.9 (1992)
Internet Applications
  Application               Application layer protocol         Underlying transport protocol
  e-mail                    smtp [RFC 821]                     TCP
  remote terminal access    telnet [RFC 854]                   TCP
  Web                       http [RFC 2068]                    TCP
  file transfer             ftp [RFC 959]                      TCP
  streaming multimedia      proprietary (e.g. RealNetworks)    TCP or UDP
  remote file server        NFS                                TCP or UDP
  Internet telephony        proprietary (e.g., Vocaltec)       typically UDP
What is HTTP
  HTTP stands for Hypertext Transfer Protocol. It's the network protocol used to deliver virtually all files and other data (collectively called resources) on the World Wide Web, whether they're HTML files, image files, query results, or anything else. Usually, HTTP takes place through TCP/IP sockets (and this tutorial ignores other possibilities).
  A browser is an HTTP client because it sends requests to an HTTP server (Web server), which then sends responses back to the client. The standard (and default) port for HTTP servers to listen on is 80, though they can use any port.
  HTTP is used to transmit resources, not just files. A resource is some chunk of information that can be identified by a URL.
HTTP
  Request message: method | URL or pathname | HTTP version | headers | message body
    e.g. method GET, HTTP version 1.1
  Reply message: HTTP version | status code | reason | headers | message body
    e.g. HTTP/1.1 200 OK, followed by the resource data
  Resource := MIME-encoded data
  Content negotiation
  Authentication
  Methods: GET, HEAD, POST, PUT, DELETE, TRACE, OPTIONS, CONNECT
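URL
  [Figure: resolving a URL. A DNS lookup turns the URL into a resource ID (IP number, port number 8888, pathname WebExamples/earth.html). The IP number maps to the network address 2:60:8c:2:b0:5a, the request reaches the Web server through a socket, and the pathname identifies the file.]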
HTTP Transactions
  HTTP uses the client-server model:
    An HTTP client opens a connection and sends a request message to an HTTP server;
    the server then returns a response message, usually containing the resource that was requested.
  After delivering the response, the server closes the connection (making HTTP a stateless protocol, i.e. not maintaining any connection information between transactions).
HTTP Protocol
  http: hypertext transfer protocol
  WWW’s application layer protocol
  client/server model
    client: browser that requests, receives, “displays” WWW objects
    server: WWW server sends objects in response to requests
  http1.0: RFC 1945
  http1.1: RFC 2068
  [Figure: a PC running Explorer and a SUN running Netscape Navigator each exchange http requests and http responses with a server running the Apache Web server.]
HTTP Protocol
  http: TCP transport service:
    client initiates TCP connection (creates socket) to server, port 80
    server accepts TCP connection from client
    http messages (application-layer protocol messages) exchanged between browser (http client) and WWW server (http server)
    TCP connection closed
  http is “stateless”
    server maintains no information about past client requests
  Protocols that maintain “state” are complex!
    past history (state) must be maintained
    if server/client crashes, their views of “state” may be inconsistent, must be reconciled
HTTP Protocol
  The formats of the request and response messages are similar, and English-oriented. Both kinds of messages consist of:
    an initial line,
    zero or more header lines,
    a blank line (i.e. a CRLF by itself), and
    an optional message body (e.g. a file, or query data, or query output).
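The four-part layout above can be sketched as a small parser. This is a minimal illustration rather than a complete HTTP parser: the function name `split_message` and the sample message are made up for the example, and it assumes CRLF line endings and at most one header line per name.

```python
def split_message(raw: bytes):
    """Split a raw HTTP message into (initial line, headers, body)."""
    # The first blank line (a CRLF by itself) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:                      # zero or more header lines
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body


# Hypothetical response message used only for illustration.
raw = (b"HTTP/1.0 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"<html></html>")
initial_line, headers, body = split_message(raw)
print(initial_line)
print(headers)
print(body)
```

The same function handles requests, since both kinds of messages share the layout; a request with no headers and no body simply yields an empty dictionary and an empty body.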
Request
  Initial Request Line
  A request line has three parts, separated by spaces: a method name, the local path of the requested resource, and the version of HTTP being used.
  A typical request line is:
    GET /path/to/file/index.html HTTP/1.0
  GET is the most common HTTP method; it says "give me this resource". Other methods include POST and HEAD (more on those later). Method names are always uppercase.
  The path is the part of the URL after the host name, also called the request URI (a URI is like a URL, but more general).
  The HTTP version always takes the form "HTTP/x.x", uppercase.
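As a rough sketch of the three-part rule above, the request line can be split and checked like this. The function name `parse_request_line` is hypothetical, and the checks cover only what the slide states (three space-separated parts, uppercase method, "HTTP/x.x" version).

```python
def parse_request_line(line: str):
    """Return (method, path, (major, minor)) or raise ValueError."""
    parts = line.split(" ")
    if len(parts) != 3:
        raise ValueError("a request line has exactly three parts")
    method, path, version = parts
    if not method.isupper():
        raise ValueError("method names are always uppercase")
    if not version.startswith("HTTP/"):
        raise ValueError("version must look like HTTP/x.x")
    major, _, minor = version[len("HTTP/"):].partition(".")
    if not (major.isdigit() and minor.isdigit()):
        raise ValueError("version must look like HTTP/x.x")
    return method, path, (int(major), int(minor))


print(parse_request_line("GET /path/to/file/index.html HTTP/1.0"))
```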
HTTP Request Header Format
  Two types of messages: request, response
  http request message: ASCII (human-readable format)

    GET /somedir/page.html HTTP/1.1
    Connection: close
    User-agent: Mozilla/4.0
    Accept: text/html, image/gif,image/jpeg
    Accept-language:en
    (extra carriage return, line feed)

  The first line is the request line (GET, POST, HEAD commands); the lines after it are header lines.
  A carriage return, line feed on a line by itself indicates the end of the message.
Response/Reply
  Initial Response Line (Status Line). The initial response line, called the status line, also has three parts separated by spaces:
    the HTTP version,
    a response status code that gives the result of the request, and
    an English reason phrase describing the status code.
  Typical status lines are:
    HTTP/1.0 200 OK
  or
    HTTP/1.0 404 Not Found
HTTP Reply Header Format

    HTTP/1.1 200 OK
    Connection: close
    Date: Thu, 06 Aug 1998 12:00:15 GMT
    Server: Apache/1.3.0 (Unix)
    Last-Modified: Mon, 22 Jun 1998 …...
    Content-Length: 6821
    Content-Type: text/html

    data data data data data...

  The first line is the status line (protocol, status code, status phrase); the lines after it are header lines.
  After the blank line comes the data, e.g., the requested html file.
HTTP Reply Status Code
  200 OK
    request succeeded, requested object later in this message
  301 Moved Permanently
    requested object moved, new location specified later in this message (Location:)
  400 Bad Request
    request message not understood by server
  404 Not Found
    requested document not found on this server
  505 HTTP Version Not Supported
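A client usually reads the status code from the status line and branches on its first digit. The sketch below assumes the usual grouping of HTTP codes into five classes by leading digit; the function names are made up for the example.

```python
# Status classes by first digit of the status code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str):
    """Split 'HTTP/x.x CODE reason phrase' into its three parts."""
    # The reason phrase may itself contain spaces, so split at most twice.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def status_class(code: int) -> str:
    return STATUS_CLASSES[code // 100]


for line in ("HTTP/1.0 200 OK",
             "HTTP/1.0 301 Moved Permanently",
             "HTTP/1.0 404 Not Found",
             "HTTP/1.0 505 HTTP Version Not Supported"):
    version, code, reason = parse_status_line(line)
    print(code, reason, "->", status_class(code))
```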
Sample HTTP Exchange
  To retrieve the file at a given URL, first open a socket to its host on port 80 (use the default port of 80 because none is specified in the URL). Then, send something like the following through the socket:

    GET /path/file.html HTTP/1.0
    From:
    User-Agent: HTTPTool/1.0
    [blank line here]
Sample HTTP Exchange
  The server should respond with something like the following, sent back through the same socket:

    HTTP/1.0 200 OK
    Date: Fri, 31 Dec 1999 23:59:59 GMT
    Content-Type: text/html
    Content-Length: 1354

    <html><body> Happy New Millennium! (more file contents)... </body></html>

  After sending the response, the server closes the socket.
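The exchange above can be tried end to end with plain sockets. This is a sketch under stated assumptions: instead of a real host on port 80 it starts a toy one-shot server on the loopback address with a port chosen by the operating system, and the names `serve_once` and `http_get` are invented for the example.

```python
import socket
import threading


def serve_once(listener: socket.socket, body: bytes) -> None:
    """Toy server: accept one connection, read the request, reply, close."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:       # read up to the blank line
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        head = ("HTTP/1.0 200 OK\r\n"
                "Content-Type: text/html\r\n"
                f"Content-Length: {len(body)}\r\n"
                "\r\n").encode("ascii")
        conn.sendall(head + body)
    # Leaving the with-block closes the socket, which ends the response.


def http_get(host: str, port: int, path: str) -> bytes:
    """Send a minimal HTTP/1.0 GET and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        request = (f"GET {path} HTTP/1.0\r\n"
                   "User-Agent: HTTPTool/1.0\r\n"
                   "\r\n")                       # blank line ends the request
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:                        # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


# Loopback stand-in for the remote host; port 0 lets the OS pick a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
page = b"<html><body>Happy New Millennium!</body></html>"
server = threading.Thread(target=serve_once, args=(listener, page))
server.start()

response = http_get("127.0.0.1", port, "/path/file.html")
server.join()
listener.close()
print(response.decode("ascii"))
```

Against a real server, the same `http_get` would be called with the host name and port 80; the client knows the response is complete when the server closes the socket, as the slide describes.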
User-server interaction: authentication
  Authentication goal: control access to server documents
  stateless: client must present authorization in each request
  authorization: typically name, password
    Authorization: header line in request
    if no authorization presented, server refuses access, sends a WWW-Authenticate: header line in response
  [Figure: client-server timeline. The client sends the usual http request msg; the server answers 401: authorization req. with a WWW-Authenticate: header line; the client then sends the usual http request msg plus an Authorization: line and receives the usual http response msg. Each later request again carries the Authorization: line.]
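The slide does not name an authentication scheme. The sketch below assumes the HTTP Basic scheme, where the name and password are joined with a colon and base64-encoded; the account, the realm string, and the function names are hypothetical.

```python
import base64


def basic_credentials(name: str, password: str) -> str:
    """Value of an Authorization: header under the Basic scheme."""
    token = base64.b64encode(f"{name}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def handle(request_headers: dict, users: dict):
    """Return (status code, reason, response headers) for one request."""
    scheme, _, token = request_headers.get("Authorization", "").partition(" ")
    if scheme == "Basic":
        try:
            decoded = base64.b64decode(token).decode("utf-8")
        except ValueError:                      # malformed base64 or bad UTF-8
            decoded = ""
        name, _, password = decoded.partition(":")
        if name in users and users[name] == password:
            return 200, "OK", {}
    # No (or wrong) authorization: refuse access and ask the client to authenticate.
    return 401, "Unauthorized", {"WWW-Authenticate": 'Basic realm="documents"'}


users = {"Aladdin": "open sesame"}              # hypothetical account
print(handle({}, users))                        # first request: no Authorization line
auth = {"Authorization": basic_credentials("Aladdin", "open sesame")}
print(handle(auth, users))                      # repeated request with Authorization
```

Because the server is stateless, the client has to send the same Authorization: line with every later request. Basic credentials are only encoded, not encrypted.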
User-server interaction: cookies
  Server sends “cookie” to client in response
    Set-cookie: #
  Client presents cookie in later requests
    cookie: #
  Server matches presented cookie with server-stored cookies
    authentication
    remembering user preferences, previous choices
  [Figure: client-server timeline. The client sends the usual http request msg; the server returns the usual http response plus Set-cookie: #. Each later request is the usual http request msg plus cookie: #, the server takes a cookie-specific action, and returns the usual http response msg.]
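The cookie round trip above can be simulated without a network. In this sketch the server hands out a numbered cookie, and the cookie-specific action is simply counting visits; the class name `CookieServer`, the cookie name `id`, and the helper `browse` are all assumptions made for the example.

```python
import itertools


class CookieServer:
    def __init__(self):
        self._next_id = itertools.count(1)
        self.visits = {}                        # cookie value -> state kept on the server

    def handle(self, request_headers: dict):
        """Return (response headers, body) for one request."""
        cookie = request_headers.get("Cookie", "")
        response_headers = {}
        if cookie not in self.visits:           # unknown client: issue a new cookie
            cookie = f"id={next(self._next_id)}"
            self.visits[cookie] = 0
            response_headers["Set-Cookie"] = cookie
        self.visits[cookie] += 1                # cookie-specific action
        return response_headers, f"visit {self.visits[cookie]}"


def browse(server: CookieServer, requests: int):
    """Toy client: store the cookie once, present it in later requests."""
    headers, bodies = {}, []
    for _ in range(requests):
        response_headers, body = server.handle(headers)
        if "Set-Cookie" in response_headers:
            headers["Cookie"] = response_headers["Set-Cookie"]
        bodies.append(body)
    return headers, bodies


server = CookieServer()
print(browse(server, 3))                        # one client, three requests
print(browse(server, 1))                        # a second client gets its own cookie
```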
User-server interaction: conditional GET
  Goal: don’t send object if client has up-to-date stored (cached) version
  client: specify date of cached copy in http request
    If-modified-since:
  server: response contains no object if cached copy up-to-date:
    HTTP/1.0 304 Not Modified
  [Figure: two client-server exchanges. Object not modified: http request msg with If-modified-since:, http response HTTP/1.0 304 Not Modified. Object modified: http request msg with If-modified-since:, http response HTTP/1.0 200 OK … with the object.]
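A server-side view of the conditional GET might look like the following sketch. It leans on Python's `email.utils` date helpers, which parse the same date format; the function name `conditional_get`, the timestamps, and the body are invented for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(request_headers: dict, last_modified: datetime, body: bytes):
    """Return (status code, reason, body) for a GET that may be conditional."""
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            cached = parsedate_to_datetime(since)
        except (TypeError, ValueError):         # unparseable date: ignore the header
            cached = None
        if cached is not None:
            if cached.tzinfo is None:           # treat dates without a zone as GMT
                cached = cached.replace(tzinfo=timezone.utc)
            if last_modified <= cached:
                return 304, "Not Modified", b""  # no object in the response
    return 200, "OK", body


# Hypothetical resource, last changed at this moment (1-second resolution).
changed = datetime(1998, 6, 22, 9, 0, 0, tzinfo=timezone.utc)
page = b"<html>...</html>"

fresh = {"If-Modified-Since": format_datetime(changed, usegmt=True)}
stale = {"If-Modified-Since": format_datetime(changed - timedelta(days=1), usegmt=True)}
print(conditional_get(fresh, changed, page))    # cached copy is up to date
print(conditional_get(stale, changed, page))    # object modified since the cached copy
print(conditional_get({}, changed, page))       # ordinary GET
```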
Message format: multimedia extensions
  MIME: Multipurpose Internet Mail Extensions, RFC 2045, 2046
  additional lines in msg header declare MIME content type

    From:
    To:
    Subject: Picture of yummy crepe.
    MIME-Version: 1.0
    Content-Transfer-Encoding: base64
    Content-Type: image/jpeg

    base64 encoded data .....

  MIME-Version: MIME version
  Content-Transfer-Encoding: method used to encode data
  Content-Type: multimedia data type, subtype, parameter declaration
  message body: encoded data
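Python's standard `email` package can produce a message with the same MIME header lines. The sketch below is only an illustration: the payload bytes are a placeholder rather than a real JPEG, and the From: and To: lines are left out because the slide does not give addresses.

```python
from email.message import EmailMessage

# Placeholder bytes standing in for image data (not a valid JPEG file).
fake_jpeg = b"\xff\xd8\xff\xe0 placeholder bytes, not a real picture"

msg = EmailMessage()
msg["Subject"] = "Picture of yummy crepe."
# For bytes content, set_content() declares the given type/subtype and
# base64-encodes the payload; it also adds the MIME-Version header.
msg.set_content(fake_jpeg, maintype="image", subtype="jpeg")

print(msg.as_string())

# A receiver reverses the encoding using the declared headers.
assert msg.get_content() == fake_jpeg
```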
MIME types
  Text
    example subtypes: plain, html
  Image
    example subtypes: jpeg, gif
  Audio
    example subtypes: basic (8-bit mu-law encoded), 32kadpcm (32 kbps coding)
  Video
    example subtypes: mpeg, quicktime
  Application
    other data that must be processed by reader before “viewable”
    example subtypes: msword, octet-stream
HTTP Headers (samples)
  User-Agent: Mozilla/4.0
  Accept: (client-side) text/html, image/*
  Content-type: (server-side) text/html
  Expires, Last-Modified, If-Modified-Since
    absolute time stamps (1-sec resolution)
    Eg: Thu, 03 Jun :16:34 GMT
  Accept-Language, Accept-Charset
  Content-encoding
  Mean #bytes per header: 300 (requests), 160 (responses)
  Headers require parsing!
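The absolute time stamps used by Expires, Last-Modified and If-Modified-Since can be produced and parsed with Python's `email.utils` helpers, which use the same date layout. The sketch below uses the Unix epoch as an arbitrary example instant and shows the 1-second resolution: fractions of a second do not survive formatting.

```python
from email.utils import formatdate, parsedate_to_datetime

# Format an instant (seconds since the Unix epoch) as an HTTP-style date.
stamp = formatdate(0, usegmt=True)
print(stamp)                                    # Thu, 01 Jan 1970 00:00:00 GMT

# Sub-second detail is dropped: 0.9 s after the epoch formats identically.
print(formatdate(0.9, usegmt=True) == stamp)    # True

# Parsing the header value gives back a timezone-aware datetime.
parsed = parsedate_to_datetime(stamp)
print(parsed.timestamp())                       # 0.0
```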
HTTP/1.1 Improvements
  B/W optimization
    persistent connections
    pipelining: does not block waiting for previous responses
    end-of-message mechanism
  Content-range: access only specified “range” of a resource
  Explicit cache control (Cache-control)
  Digest authentication (Content-MD5)
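To make the range mechanism concrete, here is a sketch of how a server might answer a request for part of a resource. It assumes the HTTP/1.1 `Range: bytes=first-last` request header and the 206 Partial Content status, handles only a single range, and uses a made-up function name and resource.

```python
def serve_range(resource: bytes, range_header: str):
    """Answer a single 'bytes=first-last' range with (status, headers, body)."""
    unit, _, spec = range_header.partition("=")
    first_text, _, last_text = spec.partition("-")
    if unit != "bytes" or not first_text.isdigit():
        raise ValueError("only simple 'bytes=first-last' ranges are handled here")
    first = int(first_text)
    # An open-ended range ("bytes=7-") runs to the end of the resource.
    last = int(last_text) if last_text else len(resource) - 1
    last = min(last, len(resource) - 1)
    if first > last:
        raise ValueError("range cannot be satisfied")
    body = resource[first:last + 1]             # both ends are inclusive
    headers = {
        "Content-Range": f"bytes {first}-{last}/{len(resource)}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body


resource = b"0123456789"                        # hypothetical 10-byte resource
print(serve_range(resource, "bytes=2-5"))
print(serve_range(resource, "bytes=7-"))
```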
Web Caches (proxy server)
  Goal: satisfy client request without involving origin server
  User sets browser: WWW accesses via web cache
  client sends all http requests to web cache
    if object at web cache, web cache immediately returns object in http response
    else requests object from origin server, then returns http response to client
  [Figure: two clients exchange http requests and http responses with the proxy server, which in turn exchanges http requests and http responses with two origin servers.]
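The if/else rule above is small enough to sketch directly. The class below is a toy: it never expires or evicts objects, and `fetch_from_origin` is a stand-in callable supplied by the caller rather than a real network request.

```python
class WebCache:
    """Toy proxy cache: answer from the store if possible, else ask the origin."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}                        # url -> cached object
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:                  # object at web cache: return it
            self.hits += 1
        else:                                   # else request it from the origin server
            self.misses += 1
            self._store[url] = self._fetch(url)
        return self._store[url]


origin_requests = []


def origin(url: str) -> bytes:
    """Hypothetical origin server that records every request it receives."""
    origin_requests.append(url)
    return f"object at {url}".encode("ascii")


cache = WebCache(origin)
for url in ("/a.html", "/b.gif", "/a.html", "/a.html"):
    cache.get(url)
print(cache.hits, cache.misses, origin_requests)
```

Only the two misses reach the origin server; the repeated requests for the same object are satisfied by the cache.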
Why WWW Caching?
  Assume: cache is “close” to client (e.g., in same network)
  smaller response time: cache “closer” to client
  decrease traffic to distant servers
    link out of institutional/local ISP network often bottleneck
  [Figure: an institutional network with a 10 Mbps LAN and an institutional cache, connected by a 1.5 Mbps access link to origin servers on the public Internet.]
Web caching (in)effectiveness
  Observed hit ratios below 50%
    even lower byte-weighted ratios!
  Possible remedies?
    Prefetching
    Delta-encoding
    HTML macros
    Duplicate suppression (digest-based)
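The difference between the plain hit ratio and the byte-weighted ratio can be seen on a tiny made-up trace. The sketch assumes an unbounded cache that starts empty, and the URLs and sizes are invented, so the numbers only illustrate the definitions and say nothing about real workloads.

```python
def hit_ratios(trace):
    """Return (hit ratio, byte-weighted hit ratio) for (url, size) requests."""
    seen = set()
    hits = hit_bytes = total = total_bytes = 0
    for url, size in trace:
        total += 1
        total_bytes += size
        if url in seen:                         # served from the cache
            hits += 1
            hit_bytes += size
        else:                                   # first request: fetched from the origin
            seen.add(url)
    return hits / total, hit_bytes / total_bytes


# Hypothetical trace: a small object requested often, a large one requested once.
trace = [("/logo.gif", 1000),
         ("/video.mpg", 50000),
         ("/logo.gif", 1000),
         ("/logo.gif", 1000)]
ratio, byte_ratio = hit_ratios(trace)
print(f"hit ratio {ratio:.2f}, byte-weighted {byte_ratio:.3f}")
```

In this trace half of the requests are hits, but they account for only 2000 of the 53000 bytes transferred, which is how a byte-weighted ratio can end up well below the request hit ratio.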
HTTP status & perspective
  J. C. Mogul, “What’s wrong with HTTP (and why it doesn’t matter)”, Proc. USENIX Technical Conference, 1999
    Definitely not optimal
    Probably adequate
  It works well enough
  It’s not the only game in town
    Two-way initiation of operations
    Real-time
    Deferred delivery
  Revising it again would be too hard
    HTTP/1.0 -> HTTP/1.1 evolution took 4+ years!