Published by Stephany Theodora Hodge. Modified over 8 years ago
1
The Web
Aditya Akella (aditya@cs.cmu.edu)
18 Apr, 2002
2
In Today’s Lecture…
- Web caches
- Content Distribution Networks
- Peer-to-Peer Networks
3
In Today’s Lecture…
- Web caches
  - Caching Proxies
  - Cache hierarchies, ICP
  - Towards Optimal Caches
  - Discussion
- Content Distribution Networks
- Peer-to-Peer Networks
4
Web Caching
- Why cache HTTP objects?
  - Reduce client response time
    - Serve locally
  - Reduce network bandwidth usage
    - Wide area vs. local area use
5
Caching Web Proxies
- Sharing the Internet connection
  - Small businesses: network connectivity can be shared among many workstations
- Filtering
  - Proxy is the only host that can access the Internet
  - Security: administrators make sure that it is secure
  - Policy: filter requests
- Caching
  - Provides a centralized coordination point to share information across clients
6
Caching Proxies: Sources of Misses
- Capacity
  - Can only cache a limited set of objects
  - Cache typically on disk
- Compulsory retrievals
  - First-time access to a document
- Non-cacheable documents
  - CGI scripts
  - Personalized documents (cookies, etc.)
  - Encrypted data (SSL)
- Consistency
  - Document has been updated/expired before reuse
7
Cache Hierarchies
- Population served by a single proxy is limited
  - Performance, administration, policy, etc.
- Use a hierarchy to scale a proxy beyond that limited population
- Why is a hierarchy scalable?
  - Larger population = higher hit rate
  - Larger effective cache size
- But caches need to talk to each other…
  - Internet Cache Protocol (ICP)
8
ICP
- Simple protocol to query another cache for content
- Uses UDP to avoid overhead
- ICP message contents
  - Type: query, hit, hit_obj, miss
  - Other: identifier, URL, version, sender address
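To make the message contents concrete, here is a minimal Python sketch that packs and parses ICP messages. It assumes the ICPv2 wire format as we read it from RFC 2186: a 20-byte header (opcode, version, message length, request number, options, option data, sender address) followed by a NUL-terminated URL, with queries also carrying a 4-byte requester address before the URL. The helper names `build_query`, `build_reply`, and `parse` are hypothetical, and only the opcodes named on the slide are defined.

```python
import socket
import struct

# Opcode values as listed in RFC 2186 (ICPv2); only those named on the slide.
ICP_OP_QUERY = 1
ICP_OP_HIT = 2
ICP_OP_MISS = 3
ICP_OP_HIT_OBJ = 23

ICP_VERSION = 2
# opcode, version, message length, request number, options, option data, sender address
HEADER_FMT = "!BBHIIII"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 20 bytes


def _pack(opcode: int, reqnum: int, payload: bytes, sender_ip: str) -> bytes:
    sender = struct.unpack("!I", socket.inet_aton(sender_ip))[0]
    length = HEADER_LEN + len(payload)  # length covers header + payload
    return struct.pack(HEADER_FMT, opcode, ICP_VERSION, length, reqnum, 0, 0, sender) + payload


def build_query(reqnum: int, url: str, sender_ip: str = "0.0.0.0") -> bytes:
    # Query payload: 4-byte requester host address, then the NUL-terminated URL.
    payload = socket.inet_aton(sender_ip) + url.encode() + b"\x00"
    return _pack(ICP_OP_QUERY, reqnum, payload, sender_ip)


def build_reply(opcode: int, reqnum: int, url: str, sender_ip: str = "0.0.0.0") -> bytes:
    # HIT/MISS payload: just the NUL-terminated URL; reqnum echoes the query's.
    return _pack(opcode, reqnum, url.encode() + b"\x00", sender_ip)


def parse(msg: bytes) -> dict:
    opcode, version, length, reqnum, _, _, _ = struct.unpack(HEADER_FMT, msg[:HEADER_LEN])
    body = msg[HEADER_LEN:length]
    if opcode == ICP_OP_QUERY:
        body = body[4:]  # skip the requester host address
    url = body.split(b"\x00", 1)[0].decode()
    return {"opcode": opcode, "version": version, "reqnum": reqnum, "url": url}
```

A cache would send each query in a single UDP datagram to every peer (Squid conventionally listens for ICP on port 3130) and match replies to outstanding queries by request number.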
9
Squid Cache ICP Use
- Upon a query for an object that is not in the cache
  - Sends ICP_Query to each peer and parent
  - Sets a timer to a short period (default 2 sec)
- Peer caches process queries and return either ICP_Hit or ICP_Miss
- Proxy begins the transfer upon reception of an ICP_Hit
- Upon timer expiration, the proxy requests the object from the closest (by RTT) parent proxy
  - Better: direct the request to the parent that is toward the origin server
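The decision rule on this slide can be sketched as a pure function. This is a simplified model, not Squid's code: replies arrive with timestamps, the first ICP_Hit before the timeout wins, and otherwise the parent with the lowest RTT is used. A real proxy can also stop waiting once every peer has answered; the sketch ignores that. `Peer`, `Reply`, and `choose_source` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Peer:
    name: str
    is_parent: bool
    rtt_ms: float


@dataclass
class Reply:
    peer: str
    kind: str          # "HIT" or "MISS"
    arrival_s: float   # seconds after the ICP queries were sent


def choose_source(peers: List[Peer], replies: List[Reply], timeout_s: float = 2.0) -> Optional[str]:
    """Return the peer to fetch from, or None to contact the origin server directly."""
    # The transfer starts from the first ICP_Hit that arrives before the timer fires.
    for r in sorted(replies, key=lambda r: r.arrival_s):
        if r.arrival_s > timeout_s:
            break
        if r.kind == "HIT":
            return r.peer
    # Timer expired without a hit: fall back to the closest parent by RTT, if any.
    parents = [p for p in peers if p.is_parent]
    if not parents:
        return None
    return min(parents, key=lambda p: p.rtt_ms).name
```

A hit arriving after the 2-second timer is ignored, which is exactly why a slow peer turns every miss into a 2-second penalty.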
10
[Figure: Squid example with a Client, a Child cache, and a Parent cache, animated over several slides. Steps: Web page request and ICP Query; ICP MISS; Web page request; Web page request and ICP Query; ICP MISS and ICP HIT; Web page request.]
16
Optimal Cache Mesh Behavior
- Minimize the number of hops through the mesh
  - Each hop adds significant latency
    - ICP hops can cost a 2 sec timeout each!
  - Especially painful for misses
- Share across many users and scale to many caches
  - ICP does not scale to a large number of peers
- Cache and fetch data close to clients
17
Hinting
- Have proxies store content as well as metadata about the contents of other proxies (hints)
  - Minimizes the number of hops through the mesh
- Size of the hint cache is a concern: size of key vs. size of document
- Having hints can help consistency
  - Makes it possible to push updated documents or invalidations to other caches
- How to keep hints up to date?
  - Not critical: an incorrect hint results in extra lookups, not incorrect behavior
  - Can send updates to peers at regular intervals
18
An Example: Summary Cache
- Typical cache has 8 GB of space and 8 KB objects
  - 8 GB / 8 KB = 1M objects
- Using 16-byte hashes (MD5)
  - 1M × 16 bytes = 16 MB per peer
- Solution: Bloom filters
  - Bloom filters can help save space
  - Proxy contents summarized as an M-bit value
  - Each page stored contributes k hash values in range [1..M]
    - Bits for the k hashes are set in the summary
  - Check for a page: if all of the page's k hash bits are set in the summary, it is likely that the proxy has the page
- Delayed propagation of hints
  - Waits until a threshold percentage of cached documents are not in the summary
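A minimal Bloom-filter summary in Python. The arithmetic on the slide motivates it: full MD5 keys cost 16 MB per peer, whereas an M-bit summary with M = 8n bits for n = 1M pages costs 1 MB. The price is false positives; for n items, M bits, and k hash functions the standard estimate is (1 − e^(−kn/M))^k, about 2.4% for M/n = 8 and k = 4. Deriving the k positions from one MD5 digest by double hashing is our illustrative choice, not necessarily what Summary Cache does, and positions here are 0-based ([0..M−1]) rather than [1..M].

```python
import hashlib


class BloomSummary:
    """M-bit summary of a proxy's cached URLs; each URL sets k bits."""

    def __init__(self, m_bits: int, k: int):
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, url: str):
        # Double hashing over one MD5 digest (illustrative choice of hash scheme).
        digest = hashlib.md5(url.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, url: str) -> None:
        for p in self._positions(url):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, url: str) -> bool:
        # True means "probably cached at that peer"; False is always correct.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(url))
```

A peer would build `BloomSummary(m_bits=8 * 1_000_000, k=4)` over its cached URLs and ship the 1 MB `bits` array to its neighbors, which then call `may_contain` before sending a query.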
19
Leases
- Only consistency mechanism in HTTP is for clients to poll the server for updates
- Should HTTP also support invalidations?
- Problem: server would have to keep track of many, many clients who may have the document
- Possible solution: leases
- Leases: the server promises to provide invalidations for a particular lease duration
- Server can adapt the time/duration of a lease as needed
  - To the number of clients, frequency of page change…
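A sketch of server-side lease bookkeeping, assuming a fixed lease length (the slide notes a server could adapt it). The server records an expiry time per (document, client); when the document changes, only clients with unexpired leases need an invalidation, and expired leases are simply forgotten, which bounds the state the server must keep. `LeaseServer` and its methods are hypothetical; the injectable clock keeps the example deterministic.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List


class LeaseServer:
    """Tracks which clients hold unexpired leases on which documents."""

    def __init__(self, lease_s: float, clock: Callable[[], float] = time.monotonic):
        self.lease_s = lease_s      # fixed here; a real server might adapt it per document
        self.clock = clock
        self.leases: Dict[str, Dict[str, float]] = defaultdict(dict)  # url -> {client: expiry}

    def grant(self, url: str, client: str) -> float:
        """Called when a client fetches url; returns the lease expiry time."""
        expiry = self.clock() + self.lease_s
        self.leases[url][client] = expiry
        return expiry

    def document_changed(self, url: str) -> List[str]:
        """Return the clients that must be sent an invalidation; all leases on url are dropped."""
        now = self.clock()
        holders = self.leases.pop(url, {})
        return sorted(c for c, expiry in holders.items() if expiry > now)
```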
20
How Useful Can Caching Be?
- Over 50% of all HTTP objects are uncacheable. Why?
- Many issues, not easily solvable
- Dynamic data
  - Stock prices, scores, web cams
- CGI scripts
  - Results based on passed parameters
- SSL
  - Encrypted data is not cacheable
  - Most web clients don’t handle mixed pages well, so many generic objects are transferred with SSL
- Hit metering
  - Owner wants to measure # of hits for revenue, etc.
- Are proxies the best solution?
21
Web Proxies: Problems
- Implementation issues
  - Aborted transfers
  - Cache size settings
- What if clients did what proxies already do?
  - Utility of proxy caching could be marginal
- Hierarchies: no longer useful
  - Faster processors => proxies can handle large populations
  - Hierarchy useful only for policy, administration
22
In Today’s Lecture…
- Web caches
- Content Distribution Networks
  - Server Selection
  - Akamai
- Peer-to-Peer Networks
23
CDN
- Replicate content on many servers
- Help serve data from the nearest replica
- Challenges, to name a few…
  - How to find replicated content
  - How to choose among known replicas
  - How to direct clients toward the best replica
    - DNS, HTTP redirect (e.g., 302 response), anycast, etc.
  - How to replicate content
  - Where to replicate content
24
CDN
- Replicate content on many servers
- Help serve data from the nearest replica
- Challenges, to name a few…
  - How to choose among known replicas
  - How to direct clients toward the best replica
    - DNS, HTTP redirect (e.g., 302 response), anycast, etc.
  - How to find replicated content
  - How to replicate content
  - Where to replicate content
- Akamai
25
Server Selection
- Service is replicated in many places in the network
- Which server to pick?
  - Lowest load: to balance load on servers
  - Best performance: to improve client performance
    - Based on geography? RTT? Throughput?
  - Any alive node: to provide fault tolerance
26
Redirecting Clients
- How to direct clients to a particular server?
  - As part of routing: anycast, cluster load balancing
  - As part of application: HTTP redirect
  - As part of naming: DNS
27
Redirection: Routing Based
- Anycast
  - Give the service a single IP address
  - Each node implementing the service advertises a route to the address
  - Packets get routed from the client to the “closest” service node
    - Closest is defined by routing metrics
    - May not mirror performance/application needs
  - What about the stability of routes?
28
Redirection: Routing Based
- Cluster load balancing
  - Router in front of a cluster of nodes directs packets to a server
  - Must be done on a connection-by-connection basis
    - Forces the router to keep per-connection state
  - How to choose a server
    - Easiest to decide based on arrival of the first packet in the exchange
      - Primarily based on local load
    - Can be based on later packets (e.g., HTTP GET request), but this makes the system more complex
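A toy model of the per-connection state a cluster front end must keep. Assumptions: a connection is identified by its 5-tuple, the first packet (`syn=True`) triggers a least-loaded choice, and every later packet of that connection goes to the same server. `ClusterBalancer` is a hypothetical name; real load balancers also handle timeouts and server failures, which this sketch omits.

```python
from typing import Dict, List, Tuple

FiveTuple = Tuple[str, int, str, int, str]  # src IP, src port, dst IP, dst port, protocol


class ClusterBalancer:
    def __init__(self, servers: List[str]):
        self.load: Dict[str, int] = {s: 0 for s in servers}  # active connections per server
        self.conn: Dict[FiveTuple, str] = {}                  # per-connection state

    def route(self, flow: FiveTuple, syn: bool = False) -> str:
        if flow in self.conn:              # later packet: must stick to the chosen server
            return self.conn[flow]
        if not syn:
            raise KeyError("mid-connection packet with no state")
        server = min(self.load, key=self.load.get)  # decide on the first packet, by local load
        self.conn[flow] = server
        self.load[server] += 1
        return server

    def close(self, flow: FiveTuple) -> None:
        server = self.conn.pop(flow)
        self.load[server] -= 1
```

The table grows with the number of open connections, which is the cost the slide points to; deciding on a later packet such as the HTTP GET would additionally require buffering or splicing the connection.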
29
Redirection: Application Based
- HTTP supports a simple way to indicate that a Web page has moved
- Server gets a GET request from the client
  - Chooses the server best suited for the particular client and object
  - Returns an HTTP redirect to that server
- Can make an informed, application-specific decision
- May introduce additional overhead
  - Multiple connection setups, name lookups, etc.
- While a good solution in general, HTTP redirect has some design flaws, especially with current browsers
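A minimal application-level redirector built on Python's standard `http.server`. It answers every GET with a 302 and a `Location` header naming a replica. The replica table and the prefix-matching policy in `pick_replica` are hypothetical stand-ins for the "best suited server" decision, and the hostnames are placeholders.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical replica table: client address prefix -> replica base URL.
REPLICAS = {
    "10.": "http://replica-east.example",
    "": "http://replica-default.example",   # catch-all
}


def pick_replica(client_ip: str) -> str:
    """Longest matching address prefix wins (a stand-in for a real selection policy)."""
    best = max((p for p in REPLICAS if client_ip.startswith(p)), key=len)
    return REPLICAS[best]


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = pick_replica(self.client_address[0]) + self.path
        self.send_response(302)                 # 302 Found: temporary redirect
        self.send_header("Location", target)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):       # silence per-request logging
        pass


def start_redirector() -> HTTPServer:
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)   # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    srv = start_redirector()
    conn = http.client.HTTPConnection("127.0.0.1", srv.server_address[1])
    conn.request("GET", "/index.html")
    resp = conn.getresponse()
    print(resp.status, resp.getheader("Location"))
    conn.close()
    srv.shutdown()
```

The overhead the slide mentions is visible here: after the 302, the client still has to resolve the replica's name and open a second connection to it.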
30
Redirection: Naming Based
- Client does a name lookup for the service
- Name server chooses an appropriate server address
- What information can it base the decision on?
  - Server load/location: must be collected
  - Name service client
    - Typically the local name server for the client
- Round-robin
  - Randomly choose replica
  - Avoid hot spots
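Round-robin and random replica selection at a name server take only a few lines. The resolver below returns all replica addresses for a name, rotated by one position per query, so successive clients that use the first address are spread across replicas. `RoundRobinResolver` and `random_replica` are hypothetical names, and no DNS wire protocol is involved.

```python
import itertools
import random
from typing import List, Optional


class RoundRobinResolver:
    """Answers every query for one name with all replica addresses, rotated each time."""

    def __init__(self, name: str, addrs: List[str]):
        self.name = name
        self.addrs = list(addrs)
        self._counter = itertools.count()

    def resolve(self) -> List[str]:
        i = next(self._counter) % len(self.addrs)
        return self.addrs[i:] + self.addrs[:i]


def random_replica(addrs: List[str], rng: Optional[random.Random] = None) -> str:
    """Alternative policy: pick one replica uniformly at random."""
    return (rng or random).choice(addrs)
```

Neither policy looks at load or location; they only avoid sending everyone to the same replica.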
31
Redirection: Naming Based
- Predicted application performance
  - How to predict?
  - Only have limited info at name resolution
- Multiple techniques
  - Static metrics to get a coarse-grained answer
  - Current performance among a smaller group
- How does this affect caching?
  - Typically want a low TTL to adapt to load changes
  - What do the first and subsequent lookups do?
32
How Akamai Works
- Clients fetch the HTML document from the primary server
  - E.g., fetch index.html from cnn.com
- URLs for replicated content are replaced in the HTML
  - E.g. replaced with
- Client is forced to resolve the aXYZ.g.akamaitech.net hostname
33
How Akamai Works
- How is content replicated?
- Akamai only replicates static content
- Modified name contains the original file
- Akamai server is asked for content
  - First checks local cache
  - If not in cache, requests file from primary server and caches file
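A sketch of the edge-server behavior on this slide: check the local cache, otherwise fetch from the primary server and keep a copy. It assumes the simplified modified-URL path form `/<origin host>/<path>`, as in the diagram label `Get /cnn.com/foo.jpg`; real Akamai URLs carried more components. The `fetch_origin` callback and the `EdgeServer` class are hypothetical.

```python
from typing import Callable, Dict, Tuple


class EdgeServer:
    def __init__(self, fetch_origin: Callable[[str, str], bytes]):
        self.fetch_origin = fetch_origin      # hypothetical: (origin host, path) -> file bytes
        self.cache: Dict[str, bytes] = {}

    @staticmethod
    def split(modified_path: str) -> Tuple[str, str]:
        """'/cnn.com/foo.jpg' -> ('cnn.com', '/foo.jpg'): the name embeds the original file."""
        host, _, rest = modified_path.lstrip("/").partition("/")
        return host, "/" + rest

    def get(self, modified_path: str) -> bytes:
        if modified_path in self.cache:          # 1. check the local cache
            return self.cache[modified_path]
        host, path = self.split(modified_path)   # 2. miss: ask the primary server
        data = self.fetch_origin(host, path)
        self.cache[modified_path] = data         # 3. keep a copy for later clients
        return data
```

Because the origin location is recoverable from the modified name itself, the edge server needs no per-customer configuration to fill a miss.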
34
How Akamai Works
- Root server gives NS record for akamai.net
- Akamai.net name server returns NS record for g.akamaitech.net
  - Name server chosen to be in the region of the client’s name server
  - TTL is large
- G.akamaitech.net name server chooses a server in the region
  - Should try to choose a server that has the file in cache. How to choose?
  - Uses the aXYZ name (consistent hash)
  - TTL is small
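Consistent hashing lets the low-level name server map an aXYZ name to a server deterministically, so repeated requests for the same content land on a server likely to have it cached, and removing a server remaps only the names that server owned. This is a generic ring-with-virtual-nodes sketch, not Akamai's implementation; SHA-1 as the hash, 100 virtual nodes per server, and the server names are assumptions.

```python
import bisect
import hashlib
from typing import List


def _hash(key: str) -> int:
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, servers: List[str], vnodes: int = 100):
        # Each server appears at `vnodes` pseudo-random points to even out the load.
        self.ring = sorted((_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self.points = [h for h, _ in self.ring]

    def lookup(self, name: str) -> str:
        """Map e.g. 'a123.g.akamaitech.net' to the first server clockwise on the ring."""
        i = bisect.bisect(self.points, _hash(name)) % len(self.ring)
        return self.ring[i][1]
```

If one server fails, only the names that hashed to its points move to neighbors; every other name keeps hitting the same, already warm, cache.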
35
How Akamai Works
[Figure: End-user, cnn.com (content provider), DNS root server, Akamai high-level DNS server, Akamai low-level DNS server, and closest Akamai server, connected by numbered steps 1–12, including Get index.html, Get /cnn.com/foo.jpg, and Get foo.jpg.]
36
Akamai: Subsequent Requests
[Figure: End-user, cnn.com (content provider), DNS root server, Akamai high-level DNS server, Akamai low-level DNS server, and closest Akamai server. Only steps 1, 2, and 7–10 appear, including Get index.html and Get /cnn.com/foo.jpg.]
37
In Today’s Lecture…
- Web caches
- Content Distribution Networks
- Peer-to-Peer Networks
  - Overview
  - Napster, Gnutella, Freenet
38
Peer-to-Peer Networks
- Each node in the network has identical capabilities
  - No notion of a server or a client
- Typically each member stores content that it desires
- Basically a replication system for files
  - Have multiple accessible sources of data
- Peer-to-peer networks allow files to be anywhere
  - Searching is the key challenge
  - Dynamic member list makes it more difficult
39
The Lookup Problem
[Figure: nodes N1–N6 connected through the Internet. A publisher stores Key=“title”, Value=MP3 data…; a client issues Lookup(“title”) and must find which node holds it.]
40
Centralized Lookup (Napster)
[Figure: nodes N1–N9 and a central DB. The publisher at N4 (Key=“title”, Value=MP3 data…) registers with SetLoc(“title”, N4); the client sends Lookup(“title”) to the DB.]
- Simple, but O(N) state and a single point of failure
41
Flooded Queries (Gnutella)
[Figure: nodes N1–N9; the client’s Lookup(“title”) is flooded across the nodes until it reaches the publisher at N4 (Key=“title”, Value=MP3 data…).]
- Robust, but worst case O(N) messages per lookup
42
Routed Queries (Freenet)
[Figure: nodes N1–N9; the client’s Lookup(“title”) is routed hop by hop toward the publisher at N4 (Key=“title”, Value=MP3 data…).]
43
Napster
- Simple centralized scheme motivated by ability to sell/control
- How to find a file:
  - On startup, client contacts the central server and reports its list of files
  - Query the index system, which returns a machine that stores the required file
    - Ideally this is the closest/least-loaded machine
  - Fetch the file directly from the peer
- Advantages:
  - Simplicity; easy to implement sophisticated search engines on top of the index system
- Disadvantages:
  - Robustness, scalability
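Napster's central index reduces to a map from titles to the peers that reported them. A minimal sketch, assuming "least loaded" is measured by a hypothetical per-peer upload count; the class and method names are illustrative, and the file transfer itself, which happens directly between peers, is not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class NapsterIndex:
    def __init__(self):
        self.files: Dict[str, Set[str]] = defaultdict(set)  # title -> peers holding it
        self.uploads: Dict[str, int] = defaultdict(int)      # peer -> active uploads (hypothetical metric)

    def register(self, peer: str, titles: Iterable[str]) -> None:
        """Called when a client starts up and reports its shared files."""
        for t in titles:
            self.files[t].add(peer)

    def unregister(self, peer: str) -> None:
        """Called when a client disconnects."""
        for peers in self.files.values():
            peers.discard(peer)

    def query(self, title: str) -> Optional[str]:
        """Return the least-loaded peer that has the title, or None."""
        peers = self.files.get(title)
        if not peers:
            return None
        return min(sorted(peers), key=lambda p: self.uploads[p])
```

Every peer's file list lives in this one object, which is both why search is easy to build on top and why the server is a single point of failure.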
44
Gnutella
- Distribute file location
- On startup, client contacts any servent (server + client) in the network
  - Servent interconnection used to forward control (queries, hits, etc.)
- Idea: multicast the request
- How to find a file:
  - Send request to all neighbors
  - Neighbors recursively multicast the request
  - Eventually a machine that has the file receives the request, and it sends back the answer
- Advantages:
  - Totally decentralized, highly robust
- Disadvantages:
  - Not scalable
45
Gnutella Details
- Basic message header
  - Unique ID, TTL, Hops
- Message types
  - Ping: probes network for other servents
  - Pong: response to Ping; contains IP addr, # of files, # of Kbytes shared
  - Query: search criteria + speed requirement of servent
  - QueryHit: successful response to Query; contains addr + port to transfer from, speed of servent, number of hits, hit results, servent ID
  - Push: request to servent ID to initiate connection; used to traverse firewalls
- Pings and Queries are flooded
- QueryHit, Pong, and Push follow the reverse path of the previous message
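A simulation of Gnutella-style flooding over an in-memory graph. Each servent remembers which neighbor first delivered the Query (the role the unique message ID plays), drops duplicates, stops forwarding when the remaining TTL reaches zero, and a QueryHit retraces that reverse path. Breadth-first order stands in for real concurrent delivery; the function name and the graph in the usage note are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def flood_query(graph: Dict[str, List[str]], files: Dict[str, Set[str]],
                origin: str, title: str, ttl: int = 7) -> List[Tuple[str, int, List[str]]]:
    """Flood a Query from origin; return QueryHits as (servent, hops, reverse path)."""
    # came_from stands in for each servent's table keyed by the message's unique ID:
    # it records which neighbor delivered the Query first.
    came_from: Dict[str, str] = {origin: ""}
    queue = deque([(origin, ttl, 0)])   # (servent, remaining TTL, hops so far)
    hits = []
    while queue:
        node, ttl_left, hops = queue.popleft()
        if node != origin and title in files.get(node, set()):
            # The QueryHit travels back along the reverse path of the Query.
            path, cur = [node], node
            while cur != origin:
                cur = came_from[cur]
                path.append(cur)
            hits.append((node, hops, path))
        if ttl_left == 0:
            continue                    # TTL exhausted: stop forwarding
        for nbr in graph.get(node, []):
            if nbr not in came_from:    # duplicate message ID: drop
                came_from[nbr] = node
                queue.append((nbr, ttl_left - 1, hops + 1))
    return hits
```

With a hypothetical topology matching the example that follows (m1 linked to m2 and m3, m3 linked to m4 and m5, m5 linked to m6, and m5 holding file E), `flood_query(graph, files, "m1", "E", ttl=2)` returns `[("m5", 2, ["m5", "m3", "m1"])]`, while `ttl=1` never reaches m5.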
46
Gnutella: Example
- Assume: m1’s neighbors are m2 and m3; m3’s neighbors are m4 and m5;…
[Figure: servents m1–m6 storing files A–F; a query “E?” is flooded through the neighbors and the servent holding E answers.]
47
Freenet
- Additional goals beyond file location/replication:
  - Provide publisher anonymity, security
  - Resistant to attacks
- Files are stored according to an associated key
  - Core idea: try to cluster information about similar keys
- Messages
  - Random 64-bit ID used for loop detection
  - TTL
    - Messages with TTL = 1 are forwarded with finite probability
    - Helps anonymity
  - Depth counter
    - Opposite of TTL: incremented with each hop
    - Depth counter initialized to a small random value
48
Data Structure
- Each node maintains a common stack
  - id: file identifier
  - next_hop: another node that stores the file id
  - file: file identified by id, being stored on the local node
- Forwarding:
  - Each message contains the file id it is referring to
  - If the file id is stored locally, then stop
    - Forward data back to the upstream requestor
    - Requestor adds the file to its cache and adds an entry in its routing table
  - If not, search for the “closest” id in the stack, and forward the message to the corresponding next_hop

  id | next_hop | file
  …  | …        | …
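The per-node stack and forwarding rule can be sketched as two functions. Assumptions: ids are integers and "closest" means smallest absolute difference (real Freenet compares hashed keys), rows are `(id, next_hop, file)` with `file` set to `None` when only routing information is kept, and a fixed capacity trims the bottom of the stack. The function names and the example rows are hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str, Optional[bytes]]   # (id, next_hop, file or None)


def route(stack: List[Row], file_id: int) -> Tuple[str, object]:
    """('found', data) if the file is stored locally, else ('forward', next hops, closest id first)."""
    for fid, _, data in stack:
        if fid == file_id and data is not None:
            return "found", data
    ordered = sorted(stack, key=lambda row: abs(row[0] - file_id))
    hops = list(dict.fromkeys(hop for _, hop, _ in ordered))   # drop repeats, keep order
    return "forward", hops


def cache_reply(stack: List[Row], file_id: int, source: str, data: bytes, capacity: int = 8) -> None:
    """Requestor side: cache the file and remember who supplied it, on top of the stack."""
    stack.insert(0, (file_id, source, data))
    del stack[capacity:]                 # the bottom (oldest) rows fall off
```

For a stack holding ids 4, 12, and 5, a request for id 10 is tried at the next hop for 12 first, then 5, then 4; after the reply comes back, the node can serve id 10 itself.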
49
Query Example
- Note: doesn’t show file caching on the reverse path
[Figure: nodes n1–n5, each with a stack of (id, next_hop, file) entries (e.g., 4 n1 f4; 12 n2 f12; 5 n3); query(10) is forwarded in numbered steps 1, 2, 3, 4, 4', 5.]
50
Freenet Requests
- Any node forwarding a reply may change the source of the reply (to itself or any other node)
  - Helps anonymity
- Each query is associated with a TTL that is decremented each time the query message is forwarded; to obscure the distance to the originator:
  - TTL can be initialized to a random value within some bounds
  - When TTL=1, the query is forwarded with a finite probability
- Each node maintains state for all outstanding queries that have traversed it; this helps avoid cycles
- If data is not found, failure is reported back
  - Requestor then tries the next closest match in its routing table
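Putting the request rules together gives a depth-first search that decrements the TTL at each hop, records visited nodes to avoid cycles, backtracks to the next-closest entry when a branch reports failure, and caches the file plus a routing entry on the reverse path. This sequential sketch assumes integer keys with absolute difference as closeness, omits probabilistic forwarding at TTL = 1 and source rewriting, and uses hypothetical names.

```python
from typing import Dict, List, Optional, Set, Tuple


def freenet_request(tables: Dict[str, List[Tuple[int, str]]],
                    store: Dict[str, Dict[int, bytes]],
                    node: str, key: int, ttl: int,
                    visited: Optional[Set[str]] = None) -> Tuple[Optional[bytes], Optional[List[str]]]:
    """Depth-first request; returns (data, path) on success or (None, None) on failure.

    tables: node -> routing entries (key, next_hop)
    store:  node -> locally stored files {key: data}
    """
    if visited is None:
        visited = set()
    visited.add(node)                           # state for the outstanding query: avoids cycles
    if key in store.get(node, {}):
        return store[node][key], [node]
    if ttl <= 0:
        return None, None                       # TTL exhausted: report failure
    for _, hop in sorted(tables.get(node, []), key=lambda e: abs(e[0] - key)):
        if hop in visited:
            continue
        data, path = freenet_request(tables, store, hop, key, ttl - 1, visited)
        if data is not None:
            store.setdefault(node, {})[key] = data          # cache on the reverse path
            tables.setdefault(node, []).append((key, hop))  # remember who had it
            return data, [node] + path
        # Failure reported back: fall through to the next-closest entry.
    return None, None
```

In a small hypothetical network where A's closest entry leads to a dead end through B, the request backtracks and succeeds through C, and both A and C end up with cached copies.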
51
Freenet Request
[Figure: nodes A–F; numbered steps 1–12 showing Data Request, Data Reply, and Request Failed messages.]
52
Freenet Search Features
- Nodes tend to specialize in searching for similar keys over time
  - They get queries from other nodes for similar keys
- Nodes store similar keys over time
  - Caching of files as a result of successful queries
- Similarity of keys does not reflect similarity of files
- Routing does not reflect network topology
53
Freenet File Creation
- Key for the file is generated and searched for
  - Helps identify collisions
  - Not found (“All clear”) result indicates success
- Source of the insert message can be changed by any forwarding node
- Creation mechanism adds files/info to locations with similar keys
- New nodes are discovered through file creation
- Erroneous/malicious inserts propagate the original file further