Hypertext Transfer Protocol (HTTP) Paul Amer CISC 856: TCP/IP and Upper Layer Protocols Fall 2003
HTTP Background
-- Created by Tim Berners-Lee at CERN (physicists, not computer scientists) to share data from physics experiments, because ftp was “too heavy”
-- Standardized and much expanded by the IETF
[Figure: Aerial View of CERN]
Basic HTTP Protocol
[Figure: client and server protocol stacks, each HTTP / TCP / IP / Link, joined at the Physical layer]
-- Goal: transfer data
-- Stateless Request-Response Protocol
Terminology: URL vs URN ?
URN : Uniform Resource Name
-- e.g., a book’s ISBN
-- a URN does not tell where to find a resource
URL : Uniform Resource Locator
-- e.g., www.freebooks.com/bookXYZ.html
-- a URL tells where a resource is
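The name/locator distinction can be sketched with Python's standard `urllib.parse`. This is only an illustration: the ISBN value and the `http://` scheme prefix are assumptions added for the example, and `describe` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def describe(identifier: str) -> str:
    """Classify an identifier as a URN (a name) or a URL (a location)."""
    parts = urlparse(identifier)
    if parts.scheme == "urn":
        # A URN names the resource but carries no host to contact.
        return "URN"
    if parts.netloc:
        # A URL includes a network location (host) where the resource lives.
        return "URL"
    return "unknown"


# Assumed example values, not taken from any real catalogue.
print(describe("urn:isbn:0451450523"))
print(describe("http://www.freebooks.com/bookXYZ.html"))
```

Without a scheme, `urlparse` cannot tell a host from a path, which is why the example adds `http://` to the slide's URL.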
HTTP Version
-- Format: HTTP/<major>.<minor>
-- Versions: HTTP/0.9, HTTP/1.0, HTTP/1.1
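A minimal sketch of reading the HTTP/<major>.<minor> format, assuming the version token has already been isolated from the request or status line. `parse_http_version` is a hypothetical helper name. The two numbers are compared as separate integers, not as a decimal fraction.

```python
import re

# HTTP/<major>.<minor>, each part a decimal integer.
VERSION_RE = re.compile(r"^HTTP/(\d+)\.(\d+)$")


def parse_http_version(text: str) -> tuple:
    """Return (major, minor) for a version token such as 'HTTP/1.1'."""
    match = VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"not an HTTP version token: {text!r}")
    return int(match.group(1)), int(match.group(2))


# Tuples compare element by element, so versions order naturally.
print(parse_http_version("HTTP/0.9") < parse_http_version("HTTP/1.0"))
print(parse_http_version("HTTP/1.1"))
```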
Overview of a browser
[Figure: User-Agent (browser/client), DNS Server, and Origin Server]
-- URL is given to the User-Agent (browser/client)
-- DNS query to the DNS Server, DNS response
-- TCP connection to the Origin Server
-- HTTP request, HTTP response
-- optional TCP connections
HTTP/0.9 – simple ftp
Client/Server, request/response:
-- Client says “GET /index.html”
-- Server returns file named index.html
Implementable in ~40 lines of C
No support for
-- variant kinds of text (languages)
-- variant kinds of pictures (gif, jpeg, png)
-- object versioning and caching
-- error codes
Simple, But Enough to Start a Revolution
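To show how little HTTP/0.9 asks of a server, here is a rough sketch of the request handling only (no sockets). The in-memory `DOCUMENTS` table and the name `handle_http09` are assumptions for illustration; a real server would read files from disk. The reply is the bare document, with no status line and no headers, and an unknown path simply yields nothing because the protocol has no error codes.

```python
# Hypothetical in-memory "document root"; a real server would read files.
DOCUMENTS = {"/index.html": b"<html><body>Hello</body></html>\n"}


def handle_http09(request_line: bytes, documents=DOCUMENTS) -> bytes:
    """Return the raw bytes an HTTP/0.9-style server would send back."""
    parts = request_line.decode("ascii", errors="replace").split()
    # The only valid request is: GET <path>
    if len(parts) != 2 or parts[0] != "GET":
        return b""
    # No status line, no headers: just the document, or nothing at all.
    return documents.get(parts[1], b"")


print(handle_http09(b"GET /index.html\r\n"))
```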
Request Message (A-PDU)
Structure: request line, headers, blank line, body
Example:
GET /pub/index.html HTTP/1.0
Date: Wed, 20 Mar 2002 10:00:02 GMT
Pragma: no-cache
From: amer@udel.edu
User-Agent: Mozilla/4.03
3 request methods: GET, HEAD, POST
-- GET: Retrieves the resource indicated by the URI
-- HEAD: Retrieves ONLY the metadata (headers) of the resource indicated by the URI
-- POST: Sends the entity (data part) to the resource indicated by the URI, e.g., as input to a program (ex: CGI script), possibly creating a new resource
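The request layout (request line, headers, blank line, body) can be sketched as a pair of helpers. `build_request` and `parse_request` are hypothetical names, and the parser is deliberately simplified: it assumes well-formed input and keeps only the last value of a repeated header.

```python
def build_request(method, uri, headers, body=b"", version="HTTP/1.0"):
    """Serialize a request: request line, headers, blank line, body."""
    lines = [f"{method} {uri} {version}"]
    lines += [f"{name}: {value}" for name, value in headers]
    head = "\r\n".join(lines) + "\r\n\r\n"  # the blank line ends the headers
    return head.encode("ascii") + body


def parse_request(raw: bytes):
    """Split a serialized request back into its parts."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, uri, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return method, uri, version, headers, body


raw = build_request("GET", "/pub/index.html", [
    ("Date", "Wed, 20 Mar 2002 10:00:02 GMT"),
    ("Pragma", "no-cache"),
    ("From", "amer@udel.edu"),
    ("User-Agent", "Mozilla/4.03"),
])
print(raw.decode("ascii"))
```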
Request Methods
GET, HEAD, POST, PATCH, LINK, UNLINK, PUT, DELETE, OPTIONS, TRACE, CONNECT
[Legend on the original slide, shown by color: methods present in HTTP/1.0 & HTTP/1.1 vs. new methods added in HTTP/1.1]
Response Message
Structure: status line, headers, blank line, body
Example:
HTTP/1.1 200 OK
Date: Tue, 08 Oct 2002 00:31:35 GMT
Server: Apache/1.3.27 tomcat/1.0
Last-Modified: Mon, 07 Oct 2002 23:40:01 GMT
ETag: "20f-6c4b-3da21b51"
Accept-Ranges: bytes
Content-Length: 27723
Keep-Alive: timeout=5, max=300
Connection: Keep-Alive
Content-Type: text/html
Why a human-readable phrase in the status line? To make extensibility easier: new status codes can be added without waiting for a protocol standard.
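A small sketch of reading the status line and headers of a response. `parse_response_head` is a hypothetical name and the parser assumes well-formed input. Note that the client acts on the numeric code; the reason phrase is carried along only for humans, so an unfamiliar phrase does no harm.

```python
def parse_response_head(head: str):
    """Parse 'status line + headers' (everything before the blank line)."""
    status_line, *header_lines = head.split("\r\n")
    version, _, rest = status_line.partition(" ")
    code, _, reason = rest.partition(" ")
    headers = {}
    for line in header_lines:
        if not line:
            continue  # tolerate a trailing empty line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers


head = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Length: 27723\r\n"
    "Connection: Keep-Alive\r\n"
    "Content-Type: text/html\r\n"
)
version, code, reason, headers = parse_response_head(head)
print(version, code, reason, headers["content-length"])
```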
Status Codes
-- 200 OK, 201 Created, 202 Accepted, 204 No Content
-- 301 Moved Permanently, 302 Moved Temporarily, 304 Not Modified
-- 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found
-- 500 Internal Server Error, 501 Not Implemented, 502 Bad Gateway, 503 Service Unavailable
Classes:
-- 1xx: Informational - not used in HTTP/1.0, reserved for future use
-- 2xx: Success - action was successfully received, understood, and accepted
-- 3xx: Redirection - further action needed to complete request
-- 4xx: Client Error - request contains bad syntax or cannot be fulfilled
-- 5xx: Server Error - server failed to fulfill an apparently valid request
404, 403, 200, 304
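Because the class is carried by the first digit, a client can handle a status code it has never seen by falling back to its class. A minimal sketch (`status_class` is a hypothetical helper name):

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Map a three-digit status code to its class via the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (404, 403, 200, 304):
    print(code, status_class(code))
```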
Headers
Request message: Request Line, General Headers, Request Headers, Entity Headers, A Blank Line, Body
Response message: Status Line, General Headers, Response Headers, Entity Headers, A Blank Line, Body
Headers (cont’d)
General Headers
-- present in HTTP/1.0 & HTTP/1.1: Date, Pragma
-- new in HTTP/1.1: Cache-Control, Connection, Trailer, Transfer-Encoding, Upgrade, Via, Warning
Request Headers
-- present in HTTP/1.0 & HTTP/1.1: Authorization, From, If-Modified-Since, Referer, User-Agent
-- new in HTTP/1.1: Accept, Accept-Charset, Accept-Encoding, Accept-Language, Expect, Host, If-Match, If-None-Match, If-Range, If-Unmodified-Since, Max-Forwards, Proxy-Authorization, Range, TE
Headers (cont’d)
Entity Headers
-- present in HTTP/1.0 & HTTP/1.1: Allow, Content-Encoding, Content-Length, Content-Type, Expires, Last-Modified, extension-header
-- new in HTTP/1.1: Content-Language, Content-Location, Content-MD5, Content-Range
Response Headers
-- present in HTTP/1.0 & HTTP/1.1: Location, Server, WWW-Authenticate
-- new in HTTP/1.1: Accept-Ranges, Age, ETag, Proxy-Authenticate, Retry-After, Vary
Performance Issues
FTP vs. HTTP
FTP vs. HTTP (cont’d)
HTTP/1.0 Nonpersistent Connections
[Figure: client-server timeline: 3-way handshake (SYN, SYN-ACK, ACK); GET URL HTTP/1.0; OK and DATA, each ACKed (web page transferred); client parses HTML; connection close (FIN, ACK, FIN, ACK)]
Nonpersistent Connection (short-lived connection)
-- use a new TCP connection for each request
-- seldom get past the “slow-start” region
-- fail to maximize their use of the available bandwidth
HTTP/1.0 Nonpersistent (cont’d)
[Figure: client-server timeline: first connection (3-way handshake; GET URL HTTP/1.0; OK; DATA web page; FIN, connection close); client parses the HTML file and issues GET for image 1 on a second connection (3-way handshake; GET IMAGE1 HTTP/1.0; OK; DATA image1; FIN, connection close); image 2 transfer follows on another connection]
Nonpersistent with pipelining (a.k.a. Parallel Connections Hack)
[Figure: three client-server timelines side by side, each with its own 3-way handshake (SYN, SYN-ACK, ACK), GET URL HTTP/1.0, OK, DATA transfer, and connection close (FIN)]
-- Client initiates new TCP connections for each embedded object after parsing the HTML file
-- Increases network congestion, burstiness
-- Is not fair to well-behaved connections
Nonpersistent with pipelining (a.k.a. Parallel Connections Hack) (cont’d)
[Figure: lifetimes of parallel connections by client port (1114 through 1121) against time in RTTs (0 to 12)]
HTTP Delay Estimation
Assume a web page with 2 images
-- Non Persistent: Time Delay in RTTs = 6
-- Non Persistent with Pipelining: Time Delay in RTTs = 4
[Figure: client-server timelines for both cases; legend: Delay Due to Connection Request/Handshake, Delay Due to HTML Page Request, Delay Due to Object Request]
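The 6 RTT and 4 RTT figures can be reproduced with a simple model. The assumptions here are mine, not stated on the slide: every connection costs 1 RTT for the handshake and 1 RTT for the request/response, transmission and server time are ignored, and the parallel case opens enough connections to fetch all images at once. The function names are hypothetical.

```python
HANDSHAKE_RTT = 1  # SYN / SYN-ACK before the request can be sent
REQUEST_RTT = 1    # request out, first response bytes back


def nonpersistent_serial_rtts(num_images: int) -> int:
    """One new connection per object, fetched one after another."""
    objects = 1 + num_images  # the HTML page plus its images
    return objects * (HANDSHAKE_RTT + REQUEST_RTT)


def nonpersistent_parallel_rtts(num_images: int) -> int:
    """HTML first, then all images on simultaneous connections."""
    per_fetch = HANDSHAKE_RTT + REQUEST_RTT
    image_phase = per_fetch if num_images > 0 else 0
    return per_fetch + image_phase


print(nonpersistent_serial_rtts(2))    # page with 2 images
print(nonpersistent_parallel_rtts(2))
```

Under this model the parallel variant stays at 4 RTTs however many images the page has, which is why browsers adopted it despite the cost to the network.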
Potential HTTP/1.0 Inefficiencies
-- HTTP/1.0 fetches a single URL per TCP connection
-- Mean size of responses is only a few thousand bytes
-- Inefficient use of available network bandwidth
-- Server resources wasted
-- User perceived latency is high
HTTP/1.1 Default: Persistent Connections
[Figure: client-server timeline on one TCP connection: 3-way handshake (SYN, SYN-ACK, ACK); GET URL, OK, DATA transfer; client parses HTML; GET image 1, GET image 2, GET image 3, each followed by OK and its DATA transfer before the next request; connection timeout; connection close (FIN, ACK)]
Either client or server can close the connection
Why Persistent Connections?
-- CPU time is saved in routers and hosts
-- Memory used for TCP PCBs (protocol control blocks) is saved in hosts
-- Reduced network congestion
-- Reduced perceived latency on subsequent requests
-- HTTP can evolve more gracefully
Persistent with pipelining
[Figure: client-server timeline on one TCP connection: 3-way handshake (SYN, SYN-ACK, ACK); GET URL, OK, DATA transfer; client parses HTML; GET Image 1, GET Image 2, GET Image 3 sent back to back without waiting for responses; server returns OK and DATA for Image 1, then Image 2]
Persistent with pipelining (cont’d)
[Figure, continued: DATA transfer for Image 3; connection timeout; connection close (FIN, ACK)]
-- Reduces user perceived latency even more than persistent connections
-- Encouraged, but not default
HTTP Delay Estimation (cont’d)
Same web page with 2 images
-- Persistent w/o pipelining: Time Delay in RTTs = 4
-- Persistent with pipelining: Time Delay in RTTs = 3
[Figure: client-server timelines for both cases; legend: Delay Due to Connection Request/Handshake, Delay Due to HTML Page Request, Delay Due to Object Request]
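For a page with 2 images, the persistent-connection estimates follow from the same kind of back-of-the-envelope model. The assumptions are mine: one handshake of 1 RTT for the whole page, 1 RTT per request/response exchange, pipelined image requests sharing a single RTT, and transmission time ignored. `persistent_rtts` is a hypothetical name.

```python
def persistent_rtts(num_images: int, pipelined: bool) -> int:
    """Estimate page-load delay in RTTs over one persistent connection."""
    handshake = 1  # paid once for the whole page
    html = 1       # request and response for the HTML page
    if num_images == 0:
        objects = 0
    elif pipelined:
        objects = 1           # all image requests sent back to back
    else:
        objects = num_images  # each image waits for the previous one
    return handshake + html + objects


print(persistent_rtts(2, pipelined=False))
print(persistent_rtts(2, pipelined=True))
```

For 2 images this gives 4 RTTs without pipelining and 3 RTTs with it, the two values shown on the slide.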
Questions ?
References:
-- Jeffrey C. Mogul (DEC)
-- John Heidemann (USC)
-- Balachander Krishnamurthy (AT&T)
-- James F. Kurose/Keith W. Ross
-- W. Richard Stevens
-- Behrouz A. Forouzan
-- Ali Yilmaz Kumcu (Univ of Delaware)
Why HTTP/1.1 ?
-- HTTP/1.0 fetches a single URL per TCP connection
-- Mean size of responses is only a few thousand bytes
-- TCP congestion control not used due to short transfers
-- Server resources wasted
-- User perceived latency is high
-- Naïve caching
Goal: Making HTTP a good Internet citizen, while improving performance for both clients and servers
What is different in HTTP/1.1 ?
-- Over 40 header fields as compared to 16 in HTTP/1.0
-- Host header
-- Reliable caching (semantically transparent)
   -- Use of Age header for cache control
   -- If-Modified-Since, If-Unmodified-Since
-- Range requests (Range, If-Range, Content-Range)
A cache behaves in a “semantically transparent” manner, with respect to a particular response, when its use affects neither the requesting client nor the origin server, except to improve performance. When a cache is semantically transparent, the client receives exactly the same response that it would have received had its request been handled directly by the origin server.
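The validation step behind If-Modified-Since can be sketched as a server-side decision between 200 and 304. This is a simplified illustration, not the full rule set of the specification: `conditional_get_status` is a hypothetical name, and a missing or unparseable header is assumed to mean "send the full response".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: datetime,
                           if_modified_since: Optional[str]) -> int:
    """Return 304 if the cached copy is still current, else 200."""
    if not if_modified_since:
        return 200  # unconditional request
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # Not Modified: the cache may reuse its copy
    return 200


modified = datetime(2002, 10, 7, 23, 40, 1, tzinfo=timezone.utc)
print(conditional_get_status(modified, "Mon, 07 Oct 2002 23:40:01 GMT"))
print(conditional_get_status(modified, "Sun, 06 Oct 2002 00:00:00 GMT"))
```

A 304 carries no body, so the client ends up with the same entity it would have received from the origin server while transferring far fewer bytes.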
What is different in HTTP/1.1 ? (cont’d)
Persistent Connections
-- Default behavior for HTTP/1.1
-- Server can indicate connection will be closed by: Connection: close
-- Request/responses can be pipelined
-- Major performance gain for users, and major goodness to the network
Chunked Encoding
-- Some entities do not have known length (e.g. those generated by scripts)
-- Chunked Encoding allows transmission of entities where the length is not known in advance
Hop-by-Hop behavior
-- Transfer-Encoding: hop-by-hop encoding (e.g., compression) of entities
New Methods
-- Trace, Put, Delete, Options
-- Trace: used in concert with Via: and Max-Forwards: for debugging
New header Upgrade:
-- Supports protocol switching on open connection
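Chunked encoding frames each piece of the body with its own length, so the sender never needs the total size up front. A minimal sketch of the framing follows; `encode_chunked` and `decode_chunked` are hypothetical names, and the decoder assumes well-formed input and ignores chunk extensions and trailers.

```python
def encode_chunked(pieces) -> bytes:
    """Frame each piece as: <hex length> CRLF <data> CRLF, then a 0 chunk."""
    out = bytearray()
    for piece in pieces:
        if not piece:
            continue  # a zero-length chunk would terminate the body early
        out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk, no trailers
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from chunked framing."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size_field = data[pos:line_end].split(b";", 1)[0]  # drop extensions
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2  # skip the data and its trailing CRLF
    return bytes(body)


wire = encode_chunked([b"Hello, ", b"world"])
print(wire)
print(decode_chunked(wire))
```

A script that produces output incrementally can pass each piece to the encoder as it becomes available, which is the case the slide describes.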
HTTP on top of TCP
Different Connection Types
-- HTTP/1.0 style connections (serial)
-- Extended HTTP/1.0 style (by Netscape): open 4 parallel connections for embedded objects
-- Persistent Connections: HTTP/1.1 default
-- Persistent with pipelining
Design Issues for P-HTTP
-- Effects on Reliability
-- Interaction with current proxy-servers
-- Connection Lifetimes
-- Server Resource Utilization
-- Server Congestion control
-- Network Resources
-- User Perceived Performance (UPP)
Quantifying TCP connection overhead
[Figure 3-2: Throughput (bits/second) vs. connection length (bytes), RTT = 70 msec]
Experimental Results
[Figure 6-1: Latencies for a remote server, image size = 2544 bytes; network latency (seconds) vs. number of inlined images; curve label: (NP HTTP/1.0)]
Experimental Results (cont’d)
[Figure 6-2: Latencies for a remote server, image size = 45566 bytes; network latency (seconds) vs. number of inlined images; curve label: (NP HTTP/1.0)]
Differences between HTTP/1.0 and HTTP/1.1
-- Persistent Connection and Pipelining
-- Support for Semantically Transparent Caching
-- Range Requests
-- Chunked Encoding
-- Expect/Continue
-- Host Header
Effects of changes in HTTP Protocol
HTTP/1.1 Feature: Implication
-- Persistent Connection: lowers number of connection setups
-- Pipelining: shortens interarrival time of requests
-- Expires: lowers number of validations
-- Entity Tags: lowers frequency of validations
-- Max-Age, Max-Stale, etc.: changes frequency of validations
-- Range Request: lowers bytes transferred
-- Chunked Encoding: lowers user perceived latency
-- Expect/Continue: lowers error response/bandwidth
-- Host Header: reduces proliferation of IP addresses
What’s Wrong with HTTP/1.1 ?
HTTP’s Existing Data Type Model
-- Resources
-- Entities and Entity Tags
Problems with Current Model
-- How to specify HTTP Caching
-- Consistent Handling of Partial Results
-- Categorization of Headers
Food For Thought !!!
What’s the Bottleneck?
-- Network
-- CPU
-- Hard Disk/Memory
[Figure: client-server timeline for one HTTP/1.0 exchange: 3-way handshake (SYN, SYN-ACK, ACK); GET URL HTTP/1.0; OK and DATA transfer, each ACKed; client parses HTML; connection close (FIN, ACK)]
TCP-PDUs and round-trip times for HTTP/1.0
[Figure: client-server timeline]
-- 0 RTT: client opens TCP connection (SYN); server replies SYN-ACK
-- 1 RTT: client sends ACK and the HTTP GET request for /index.html (DAT); server reads from disk (ACK, DAT)
-- 2 RTT: connection is closed (FIN, ACK, ACK, FIN); client parses /index.html; client opens a new TCP connection (SYN)
-- 3 RTT: SYN-ACK arrives; client sends ACK and the HTTP request for the image (DAT); server reads from disk (ACK, DAT)
-- 4 RTT: image begins to arrive
TCP-PDUs for HTTP/1.1 (persistent connections)
[Figure: client-server timeline]
-- 0 RTT: client opens TCP connection (SYN); server replies SYN-ACK
-- 1 RTT: client sends ACK and the HTTP GET request for /index.html (DAT); server reads from disk (ACK, DAT)
-- 2 RTT: client sends ACK and parses HTML; client sends HTTP request for image 1-n (DAT); server reads from disk (ACK, DAT)
-- 3 RTT: image begins to arrive
TCP-PDUs for HTTP/1.1 (persistent connections with pipelining)
[Figure: client-server timeline]
-- 0 RTT: client opens TCP connection (SYN); server replies SYN-ACK
-- 1 RTT: client sends ACK and the HTTP GET request for /index.html (DAT); server reads from disk (ACK, DAT)
-- 2 RTT: client sends ACK and parses HTML; client sends HTTP requests for image 1-n back to back (GET Image-1, GET Image-2, GET Image-3); server reads from disk (ACK)
-- 3 RTT: image begins to arrive (DAT)