15-441: Computer Networking The “Web” Thomas Harris (slides from Srini Seshan’s Fall ’01 course)

Slides:



Advertisements
Similar presentations
Nick Feamster CS 3251: Computer Networking I Spring 2013
Advertisements

1 Server Selection & Content Distribution Networks (slides by Srini Seshan, CS CMU)
Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang.
Internet Networking Spring 2006 Tutorial 12 Web Caching Protocols ICP, CARP.
Web basics HTTP – – URI/L/Ns – HTML –
Chapter 9 Application Layer, HTTP Professor Rick Han University of Colorado at Boulder
Distributed Systems Spring 2009
1 Spring Semester 2007, Dept. of Computer Science, Technion Internet Networking recitation #13 Web Caching Protocols ICP, CARP.
15-744: Computer Networking L-20 Applications. L -20; © Srinivasan Seshan, Application Networking HTTP APIs Assigned reading [BSR99] An Integrated.
15-744: Computer Networking
HTTP and Web Content Delivery COS 461: Computer Networks Spring 2011 Mike Freedman
Internet Networking Spring 2002 Tutorial 13 Web Caching Protocols ICP, CARP.
Web Content Delivery Reading: Section and COS 461: Computer Networks Spring 2010 (MW 3:00-4:20 in CS105) Mike Freedman
Chapter 2 Application Layer Computer Networking: A Top Down Approach Featuring the Internet, 3 rd edition. Jim Kurose, Keith Ross Addison-Wesley, July.
Web, HTTP and Web Caching
15-744: Computer Networking L-21: Caching and CDNs.
1 Computer Networks Application layer (Part 2). 2 Application layer protocols Last class –Application layer functions –Application protocols DNS ftp This.
15-744: Computer Networking L-20 Applications. L -20; © Srinivasan Seshan, Next Lecture: Application Networking HTTP Adaptive applications.
15-744: Computer Networking L-21 Applications. L -21; © Srinivasan Seshan, Next Lecture: Application Networking HTTP Adaptive applications.
2/9/2004 Web and HTTP February 9, /9/2004 Assignments Due – Reading and Warmup Work on Message of the Day.
Web Content Delivery Reading: Section and COS 461: Computer Networks Spring 2009 (MW 1:30-2:50 in CS105) Mike Freedman Teaching Assistants:
Rensselaer Polytechnic Institute CSC-432 – Operating Systems David Goldschmidt, Ph.D.
Network Protocols: Design and Analysis Polly Huang EE NTU
CS640: Introduction to Computer Networks Aditya Akella Lecture 18 - The Web, Caching and CDNs.
HTTP Protocol Specification
HTTP Reading: Section and COS 461: Computer Networks Spring
CP476 Internet Computing Lecture 5 : HTTP, WWW and URL 1 Lecture 5. WWW, HTTP and URL Objective: to review the concepts of WWW to understand how HTTP works.
Rensselaer Polytechnic Institute Shivkumar Kalvanaraman, Biplab Sikdar 1 The Web: the http protocol http: hypertext transfer protocol Web’s application.
Maryam Elahi University of Calgary – CPSC 441.  HTTP stands for Hypertext Transfer Protocol.  Used to deliver virtually all files and other data (collectively.
CS640: Introduction to Computer Networks Aditya Akella Lecture 4 - Application Protocols, Performance.
Protocol(TCP/IP, HTTP) 송준화 조경민 2001/03/13. Network Computing Lab.2 Layering of TCP/IP-based protocols.
1 Introductory material. This module illustrates the interactions of the protocols of the TCP/IP protocol suite with the help of an example. The example.
Hui Zhang, Fall Computer Networking Web, HTTP, Caching.
15-744: Computer Networking L-21: Caching and CDNs Amit Manjhi.
1 CS 4396 Computer Networks Lab TCP/IP Networking An Example.
HyperText Transfer Protocol (HTTP) RICHI GUPTA CISC 856: TCP/IP and Upper Layer Protocols Fall 2007 Thanks to Dr. Amer, UDEL for some of the slides used.
CSE 461 HTTP and the Web. This Lecture  HTTP and the Web (but not HTML)  Focus  How do Web transfers work?  Topics  HTTP, HTTP1.1  Performance Improvements.
CIS679: Lecture 13 r Review of Last Lecture r More on HTTP.
Application layer (Part 2)
CSE 524: Lecture 4 Application layer protocols. Administrative ● Reading assignment Chapter 2 ● Mid-term exam may be delayed to 11/2/2004 – Mostly on.
Operating Systems Lesson 12. HTTP vs HTML HTML: hypertext markup language ◦ Definitions of tags that are added to Web documents to control their appearance.
HTTP evolution - TCP/IP issues Lecture 4 CM David De Roure
Network Protocols: Design and Analysis Polly Huang EE NTU
2: Application Layer 1 Chapter 2: Application layer r 2.1 Principles of network applications  app architectures  app requirements r 2.2 Web and HTTP.
15-744: Computer Networking L-21: Caching and CDNs.
Web Technologies Lecture 1 The Internet and HTTP.
HTTP Here, we examine the hypertext transfer protocol (http) – originally introduced around 1990 but not standardized until 1997 (version 1.0) – protocol.
EE 122: Lecture 21 (HyperText Transfer Protocol - HTTP) Ion Stoica Nov 20, 2001 (*)
CS 6401 The World Wide Web Outline Background Structure Protocols.
Overview of Servlets and JSP
Computer Networking Lecture 25 – The Web.
1 COMP 431 Internet Services & Protocols HTTP Persistence & Web Caching Jasleen Kaur February 11, 2016.
Data Communications and Computer Networks Chapter 2 CS 3830 Lecture 7 Omar Meqdadi Department of Computer Science and Software Engineering University of.
Computer Networking The Web. 2 Web history 1945: Vannevar Bush, “As we may think”, Atlantic Monthly, July, describes the idea of a distributed.
HyperText Transfer Protocol (HTTP) Deepti Kulkarni CISC 856: TCP/IP and Upper Layer Protocols Fall 2008 Acknowledgements Professor Amer Richi Gupta.
Content Distribution Networks (CDNs)
Week 11: Application Layer 1 Web and HTTP r Web page consists of objects r Object can be HTML file, JPEG image, Java applet, audio file,… r Web page consists.
TCP/IP1 Address Resolution Protocol Internet uses IP address to recognize a computer. But IP address needs to be translated to physical address (NIC).
Aditya Akella The Web Aditya Akella 18 Apr, 2002.
Whole Page Performance Leeann Bent and Geoffrey M. Voelker University of California, San Diego.
Block 5: An application layer protocol: HTTP
Web Caching? Web Caching:.
Web Caching, Content Delivery Networks, Consistent Hashing
Internet Networking recitation #12
Lecture 6 – Web Optimizations
CSE 461 HTTP and the Web.
15-441: Computer Networking
EE 122: HyperText Transfer Protocol (HTTP)
Hypertext Transfer Protocol (HTTP)
CSCI-351 Data communication and Networks
Presentation transcript:

15-441: Computer Networking The “Web” Thomas Harris (slides from Srini Seshan’s Fall ’01 course)

Lecture 26: May 1, Overview HTTP Basics HTTP Fixes Web Caches Content Distribution Networks

Lecture 26: May 1, HTTP Basics HTTP layered over bidirectional byte stream Almost always TCP Interaction Client sends request to server, followed by response from server to client Requests/responses are encoded in text How to mark end of message? Size of message  Content-Length Must know size of transfer in advance Delimiter  MIME style Content-Type Server must “byte-stuff” Close connection Only server can do this

Lecture 26: May 1, HTTP Request Request line Method GET – return URI HEAD – return headers only of GET response POST – send data to the server (forms, etc.) URI E.g. with a proxyhttp:// E.g. /index.html if no proxy HTTP version

Lecture 26: May 1, HTTP Request Request headers Authorization – authentication info Acceptable document types/encodings From – user If-Modified-Since Referrer – what caused this page to be requested User-Agent – client software Blank-line Body

Lecture 26: May 1, HTTP Request Example GET / HTTP/1.1 Accept: */* Accept-Language: en-us Accept-Encoding: gzip, deflate User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Host: Connection: Keep-Alive

Lecture 26: May 1, HTTP Response Status-line HTTP version 3 digit response code 1XX – informational 2XX – success 3XX – redirection 4XX – client error 5XX – server error Reason phrase

Lecture 26: May 1, HTTP Response Headers Location – for redirection Server – server software WWW-Authenticate – request for authentication Allow – list of methods supported (get, head, etc) Content-Encoding – E.g x-gzip Content-Length Content-Type Expires Last-Modified Blank-line Body

Lecture 26: May 1, HTTP Response Example HTTP/ OK Date: Tue, 27 Mar :49:38 GMT Server: Apache/ (Unix) (Red-Hat/Linux) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2 mod_perl/1.24 Last-Modified: Mon, 29 Jan :54:18 GMT ETag: "7a11f-10ed-3a75ae4a" Accept-Ranges: bytes Content-Length: 4333 Keep-Alive: timeout=15, max=100 Connection: Keep-Alive Content-Type: text/html …..

Lecture 26: May 1, Typical Workload Multiple (typically small) objects per page Request sizes In one measurement paper  median 1946 bytes, mean bytes Why such a difference? Heavy-tailed distribution Pareto – p(x) = ak a x -(a+1) File sizes Why different than request sizes? Also heavy-tailed Pareto distribution for tail Lognormal for body of distribution

Lecture 26: May 1, Typical Workload Popularity Zipf distribution (P = kr -1 ) Surprisingly common Embedded references Number of embedded objects = pareto Temporal locality Modeled as distance into push-down stack Lognormal distribution of stack distances Request interarrival Bursty request patterns

Lecture 26: May 1, HTTP Caching Clients often cache documents Challenge: update of documents If-Modified-Since requests to check HTTP 0.9/1.0 used just date HTTP 1.1 has file signature as well When/how often should the original be checked for changes? Check every time? Check each session? Day? Etc? Use Expires header If no Expires, often use Last-Modified as estimate

Lecture 26: May 1, Example Cache Check Request GET / HTTP/1.1 Accept: */* Accept-Language: en-us Accept-Encoding: gzip, deflate If-Modified-Since: Mon, 29 Jan :54:18 GMT If-None-Match: "7a11f-10ed-3a75ae4a" User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Host: Connection: Keep-Alive

Lecture 26: May 1, Example Cache Check Response HTTP/ Not Modified Date: Tue, 27 Mar :50:51 GMT Server: Apache/ (Unix) (Red- Hat/Linux) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2 mod_perl/1.24 Connection: Keep-Alive Keep-Alive: timeout=15, max=100 ETag: "7a11f-10ed-3a75ae4a"

Lecture 26: May 1, HTTP 0.9/1.0 One request/response per TCP connection Simple to implement Disadvantages Multiple connection setups  three-way handshake each time Several extra round trips added to transfer Multiple slow starts

Lecture 26: May 1, Single Transfer Example ClientServer SYN ACK DAT FIN ACK 0 RTT 1 RTT 2 RTT 3 RTT 4 RTT Server reads from disk FIN Server reads from disk Client opens TCP connection Client sends HTTP request for HTML Client parses HTML Client opens TCP connection Client sends HTTP request for image Image begins to arrive

Lecture 26: May 1, More Problems Short transfers are hard on TCP Stuck in slow start Loss recovery is poor when windows are small Lots of extra connections Increases server state/processing Server also forced to keep TIME_WAIT connection state Why must server keep these? Tends to be an order of magnitude greater than # of active connections, why?

Lecture 26: May 1, Overview HTTP Basics HTTP Fixes Web Caches Content Distribution Networks

Lecture 26: May 1, Netscape Solution Use multiple concurrent connections to improve response time Different parts of Web page arrive independently Can grab more of the network bandwidth than other users Doesn’t necessarily improve response time TCP loss recovery ends up being timeout dominated because windows are small

Lecture 26: May 1, Persistent Connection Solution Multiplex multiple transfers onto one TCP connection Serialize transfers  client makes next request only after previous response How to demultiplex requests/responses Content-length and delimiter  same problems as before Block-based transmission – send in multiple length delimited blocks Store-and-forward – wait for entire response and then use content-length PM95 solution – use existing methods and close connection otherwise

Lecture 26: May 1, Persistent Connection Example Client Server ACK DAT ACK 0 RTT 1 RTT 2 RTT Server reads from disk Client sends HTTP request for HTML Client parses HTML Client sends HTTP request for image Image begins to arrive DAT Server reads from disk DAT

Lecture 26: May 1, Persistent Connection Solution Serialized requests do not improve interactive response Pipelining requests Getall – request HTML document and all embeds Requires server to parse HTML files Doesn’t consider client cached documents Getlist – request a set of documents Implemented as a simple set of GETs Prefetching Must carefully balance impact of unused data transfers Not widely used due to poor hit rates

Lecture 26: May 1, Persistent Connection Performance Benefits greatest for small objects Up to 2x improvement in response time Server resource utilization reduce due to fewer connection establishments and fewer active connections TCP behavior improved Longer connections help adaptation to available bandwidth Larger congestion window improves loss recovery

Lecture 26: May 1, Remaining Problems Application specific solution to transport protocol problems Stall in transfer of one object prevents delivery of others Serialized transmission Much of the useful information in first few bytes Can “packetize” transfer over TCP HTTP 1.1 recommends using range requests MUX protocol provides similar generic solution Solve the problem at the transport layer Fix TCP so it works well with multiple simultaneous connections

Lecture 26: May 1, Overview HTTP Basics HTTP Fixes Web Caches Content Distribution Networks

Lecture 26: May 1, Web Caching Why cache HTTP objects? Reduce client response time Reduce network bandwidth usage Wide area vs. local area use These two objectives are often in conflict May do exhaustive local search to avoid using wide area bandwidth Prefetching uses extra bandwidth to reduce client response time

Lecture 26: May 1, Web Proxies Also used for security Proxy is only host that can access Internet Administrators makes sure that it is secure Performance How many clients can a single proxy handle? Caching Provides a centralized coordination point to share information across clients How to index Early caches used file system to find file Metadata now kept in memory on most caches

Lecture 26: May 1, Caching Proxies – Sources for misses Capacity How large a cache is necessary or equivalent to infinite On disk vs. in memory  typically on disk Compulsory First time access to document Non-cacheable documents CGI-scripts Personalized documents (cookies, etc) Encrypted data (SSL) Consistency Document has been updated/expired before reuse Conflict  no such issue

Lecture 26: May 1, Cache Hierarchies Use hierarchy to scale a proxy to more than limited population Why? Larger population = higher hit rate Larger effective cache size Why is population for single proxy limited? Performance, administration, policy, etc. NLANR cache hierarchy Most popular 9 top level caches Internet Cache Protocol based (ICP) Squid/Harvest proxy

Lecture 26: May 1, ICP Simple protocol to query another cache for content Uses UDP – why? ICP message contents Type – query, hit, hit_obj, miss Other – identifier, URL, version, sender address (is this needed?) Special message types used with UDP echo port Used to probe server or “dumb cache” Transfers between caches still done using HTTP

Lecture 26: May 1, Squid Cache ICP Use Upon query that is not in cache Sends ICP_Query to each peer (or ICP_Decho to echo port of peer caches that do not speak ICP) May also send ICP_Secho to origin server’s echo port Sets time to short period (default 2 sec) Peer caches process queries and return either ICP_Hit or ICP_Miss Proxy begins transfer upon reception of ICP_Hit, ICP_Decho or ICP_Secho Upon timer expiration, proxy request object from closest (RTT) parent proxy Would be better to direct to parent that is towards origin server

Lecture 26: May 1, Squid Client Parent Child Web page request ICP Query

Lecture 26: May 1, Squid Client Parent Child ICP MISS

Lecture 26: May 1, Squid Client Parent Child Web page request

Lecture 26: May 1, Squid Client Parent Child Web page request ICP Query

Lecture 26: May 1, Squid Client Parent Child Web page request ICP MISS ICP HIT

Lecture 26: May 1, Squid Client Parent Child Web page request

Lecture 26: May 1, ICP vs HTTP Why not just use HTTP to query other caches? ICP is lightweight – positive and negative Makes it easy to process quickly Caches may process many more ICP requests than HTTP requests HTTP has many functions that are not supported by ICP ICP does not evolve with HTTP changes Adds extra RTT to any proxy-proxy transfer

Lecture 26: May 1, Optimal Cache Mesh Behavior Minimize number of hops through mesh Each hop add significant latency ICP hops can cost a 2 sec timeout each! Strict hierarchies cost disk lookup, etc. Especially painful for misses Share across many users and scale to many caches ICP does not scale to a large number of peers Cache and fetch data close to clients

Lecture 26: May 1, Problems Over 50% of all HTTP objects are uncacheable – why? Not easily solvable Dynamic data  stock prices, scores, web cams CGI scripts  results based on passed parameters Obvious fixes SSL  encrypted data is not cacheable Most web clients don’t handle mixed pages well  many generic objects transferred with SSL Cookies  results may be based on passed data Hit metering  owner wants to measure # of hits for revenue, etc. What will be the end result?

Lecture 26: May 1, Proxy Implementation Problems Aborted transfers Many proxies transfer entire document even though client has stopped  eliminates saving of bandwidth Making objects cacheable Proxy’s apply heuristics  cookies don’t apply to some objects, guesswork on expiration May not match client behavior/desires Client misconfiguration Many clients have either absurdly small caches or no cache How much would hit rate drop if clients did the same things as proxies

Lecture 26: May 1, Questions – Population Size How does population size affect hit rate? Critical to understand usefulness of hierarchy or placement of caches Issues: frequency of access vs. frequency of change (ignore working set size  infinite cache) UW/Msoft measurement  hit rate rises quickly to about 5000 people and very slowly beyond that Proxies/Hierarchies don’t make much sense for populations > 5000 Single proxies can easily handle such populations Hierarchies only make sense for policy/administrative reasons

Lecture 26: May 1, Questions – Common Interests Do different communities have different interests? I.e. do CS and English majors access same pages? IBM and Pepsi workers? Has some impact  UW departments have about 5% higher hit rate than randomly chosen UW groups Many common interests remain Is this true in general? UW students have more in common than IBM & Pepsi workers Some related observations Geographic caching – server traces have shown that there is geographic locality to interest UW & MS hierarchy performance is bad – could be due to size or interests?

Lecture 26: May 1, Overview HTTP Basics HTTP Fixes Web Caches Content Distribution Networks

Lecture 26: May 1, CDN Replicate content on many servers Challenges How to replicate content Where to replicate content How to find replicated content How to choose among know replicas How to direct clients towards replica Discussed in DNS/server selection lecture DNS, HTTP 304 response, anycast, etc. Akamai

Lecture 26: May 1, How Akamai Works Clients fetch html document from primary server E.g. fetch index.html from cnn.com URLs for replicated content are replaced in html E.g. replaced with Client is forced to resolve aXYZ.g.akamaitech.net hostname

Lecture 26: May 1, How Akamai Works How is content replicated? Akamai only replicates static content Modified name contains original file Akamai server is asked for content First checks local cache If not in cache, requests file from primary server and caches file

Lecture 26: May 1, How Akamai Works Root server gives NS record for akamai.net Akamai.net name server returns NS record for g.akamaitech.net Name server chosen to be in region of client’s name server TTL is large G.akamaitech.net nameserver choses server in region Should try to chose server that has file in cache - How to choose? Uses aXYZ name and consistent hash TTL is small

Lecture 26: May 1, Consistent Hash “view” = subset of all hash buckets that are visible Desired features Smoothness – little impact on hash bucket contents when buckets are added/removed Spread – small set of hash buckets that may hold an object regardless of views Load – across all views # of objects assigned to hash bucket is small

Lecture 26: May 1, Consistent Hash – Example Construction Assign each of C hash buckets to Klog(C) random points on unit interval Map object to random position on unit interval Hash of object = closest bucket Monotone  addition of bucket does not cause movement between existing buckets Spread & Load  small set of buckets that lie near object Balance  no bucket is responsible for large portion of unit interval

Lecture 26: May 1, How Akamai Works End-user cnn.com (content provider)DNS root serverAkamai server Akamai high-level DNS server Akamai low-level DNS server Closest Akamai server Get index. html Get /cnn.com/foo.jpg 12 Get foo.jpg 5

Lecture 26: May 1, Akamai – Subsequent Requests End-user cnn.com (content provider)DNS root serverAkamai server 12 Akamai high-level DNS server Akamai low-level DNS server Closest Akamai server Get index. html Get /cnn.com/foo.jpg