1
Squirrel: A peer-to-peer web cache. Sitaram Iyer (Rice University). Joint work with Ant Rowstron (MSR Cambridge) and Peter Druschel (Rice University). PODC 2002 / Sitaram Iyer / Tuesday July 23 / Monterey, CA
2
Web Caching: reduces 1. latency, 2. external traffic, and 3. load on web servers and routers. Deployed at corporate network boundaries, ISPs, web servers, etc.
3
Centralized Web Cache [diagram: clients with browser caches on the corporate LAN share a single web cache at the boundary between the LAN and the Internet; the web server sits on the Internet]
4
Cooperative Web Cache [diagram: clients with browser caches on the corporate LAN; several cooperating web caches at the boundary between the LAN and the Internet; the web server sits on the Internet]
5
Decentralized Web Cache [diagram: Squirrel pools the browser caches of the client machines themselves across the corporate LAN; the web server sits on the Internet]
6
Distributed Hash Table. Peer-to-peer routing and location substrate: Pastry. Completely decentralized and self-organizing; fault-tolerant, scalable, efficient. Operations: Insert(k,v), Lookup(k). [diagram: key-value pairs k1,v1 … k6,v6 distributed across the participating nodes]
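The Insert/Lookup interface can be illustrated with a toy sketch (assuming a static node list and SHA-1 hashing; this is not Pastry's actual prefix routing, and all names are hypothetical): each key is stored on the node whose hashed identifier is numerically closest.

```python
import hashlib

def _id(s: str, bits: int = 128) -> int:
    # Hash a string into a fixed-size numeric identifier.
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (1 << bits)

class ToyDHT:
    """Illustrative DHT: each key lives on the node whose id is numerically closest."""
    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}   # per-node local store
        self.ids = {name: _id(name) for name in node_names}

    def _home(self, key: str) -> str:
        kid = _id(key)
        return min(self.nodes, key=lambda n: abs(self.ids[n] - kid))

    def insert(self, key: str, value) -> None:
        self.nodes[self._home(key)][key] = value

    def lookup(self, key: str):
        return self.nodes[self._home(key)].get(key)

dht = ToyDHT(["node-%d" % i for i in range(8)])
dht.insert("http://example.com/index.html", b"<html>...</html>")
print(dht.lookup("http://example.com/index.html"))
```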
7
Why peer-to-peer? 1. Cost of a dedicated web cache → no additional hardware. 2. Administrative effort → self-organizing network. 3. Scaling implies upgrading → resources grow with clients.
8
Setting: a corporate LAN with 100 - 100,000 desktop machines, located in a single building or campus. Each node runs an instance of Squirrel and sets it as the browser's proxy.
9
Mapping Squirrel onto Pastry. Two approaches: Home-store and Directory.
10
Home-store model [diagram: the client hashes the URL to find the object's home node within the LAN; the home node reaches the origin server across the Internet]
11
Home-store model [diagram: client and home node] …that's how it works!
12
Directory model Client nodes always cache objects locally. Home-store: home node also stores objects. Directory: the home node only stores pointers to recent clients, and forwards requests.
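The difference between the two schemes can be sketched roughly as follows. This is a hedged illustration, not the paper's protocol; fetch_from_origin and fetch_from_peer are hypothetical stand-ins for HTTP transfers.

```python
import random

def fetch_from_origin(url):
    # Stub standing in for an HTTP GET to the origin web server.
    return "object(%s)" % url

def fetch_from_peer(peer, url):
    # Stub standing in for asking another Squirrel node for its locally cached copy.
    return "object(%s) from %s" % (url, peer)

class HomeStoreHome:
    """Home node under the home-store scheme: it keeps the objects themselves."""
    def __init__(self):
        self.store = {}

    def get(self, url):
        if url not in self.store:
            self.store[url] = fetch_from_origin(url)   # miss at the home: go to the origin
        return self.store[url]                          # otherwise serve the stored copy

class DirectoryHome:
    """Home node under the directory scheme: it keeps only pointers to recent clients."""
    def __init__(self):
        self.directory = {}

    def get(self, url, requesting_client):
        delegates = self.directory.get(url)
        if delegates:
            # Forward the request to a randomly chosen recent client.
            obj = fetch_from_peer(random.choice(sorted(delegates)), url)
        else:
            obj = fetch_from_origin(url)                # no pointers yet: fall back to the origin
        self.directory.setdefault(url, set()).add(requesting_client)  # remember this client
        return obj
```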
13
Directory model [diagram: client and home node on the corporate LAN; origin server on the Internet]
14
Directory model [diagram: the home node randomly chooses an entry from its directory table and forwards the client's request to that node]
15
Directory: Advantages Avoids storing unnecessary copies of objects. Rapidly changing directory for popular objects seems to improve load balancing. Home-store scheme can incur hotspots.
16
Directory: Disadvantages. Cache insertion only happens at clients, so active clients store all the popular objects while inactive clients waste most of their storage. Implications: 1. Reduced cache size. 2. Load imbalance.
17
Directory: Load spike example. A web page with many embedded images, or periods of heavy browsing: many home nodes point to such clients! Evaluate …
18
Trace characteristics (Microsoft in Redmond vs. Cambridge):
                              Redmond         Cambridge
Total duration                1 day           31 days
Number of clients             36,782          105
Number of HTTP requests       16.41 million   0.971 million
Peak request rate             606 req/sec     186 req/sec
Number of objects             5.13 million    0.469 million
Number of cacheable objects   2.56 million    0.226 million
Mean cacheable object reuse   5.4 times       3.22 times
19
Total external traffic, Redmond [plot: total external traffic (GB, 85-105) vs. per-node cache size (MB, 0.001-100); series: Directory, Home-store, No web cache, Centralized cache; lower is better]
20
Total external traffic, Cambridge [plot: total external traffic (GB, 5.5-6.1) vs. per-node cache size (MB, 0.001-100); series: Directory, Home-store, No web cache, Centralized cache; lower is better]
21
LAN hops, Redmond [plot: % of cacheable requests vs. total hops within the LAN (0-6); series: Centralized, Home-store, Directory]
22
LAN hops, Cambridge [plot: % of cacheable requests vs. total hops within the LAN (0-5); series: Centralized, Home-store, Directory]
23
Load in requests per second, Redmond [plot: number of times observed (1 to 100,000) vs. max objects served per node per second (0-50); series: Home-store, Directory]
24
Load in requests per second, Cambridge [plot: number of times observed (1 to 1e7) vs. max objects served per node per second (0-50); series: Home-store, Directory]
25
Load in requests per minute, Redmond [plot: number of times observed (1 to 100) vs. max objects served per node per minute (0-350); series: Home-store, Directory]
26
Load in requests per minute, Cambridge [plot: number of times observed (1 to 10,000) vs. max objects served per node per minute (0-120); series: Home-store, Directory]
27
Fault tolerance. Sudden node failures result in partial loss of cached content. Home-store: loss proportional to the number of failed nodes. Directory: more vulnerable.
28
Fault tolerance. If 1% of Squirrel nodes abruptly crash, the fraction of lost cached content is:
             Home-store                  Directory
Redmond      Mean 1%,  Max 1.77%         Mean 1.71%,  Max 19.3%
Cambridge    Mean 1%,  Max 3.52%         Mean 1.65%,  Max 9.8%
29
Conclusions. It is possible to decentralize web caching. Squirrel's performance is comparable to a centralized web cache, it is better in terms of cost, scalability, and administration effort, and under our assumptions the home-store scheme is superior to the directory scheme.
30
Other aspects of Squirrel: Adaptive replication (hotspot avoidance, improved robustness); Route caching (fewer LAN hops).
31
Thanks.
32
(backup) Storage utilization, Redmond:
                 Home-store    Directory
Total            97641 MB      61652 MB
Mean per-node    2.6 MB        1.6 MB
Max per-node: 1664 MB
33
(backup) Fault tolerance
             Home-store                      Directory
Equations    Mean: H/O,  Max: H_max/O        Mean: (H+S)/O,  Max: max(H_max, S_max)/O
Redmond      Mean 0.0027%,  Max 0.0048%      Mean 0.198%,  Max 1.5%
Cambridge    Mean 0.95%,  Max 3.34%          Mean 1.68%,  Max 12.4%
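As a hedged illustration of how these expressions might be applied (H is read here as the number of cached objects homed on the failed nodes, S as the objects those nodes held as clients, and O as all cached objects; the counts below are made up, not from the traces):

```python
# Illustration of the loss-fraction formulas above, with hypothetical numbers.
def home_store_loss(H, O):
    return H / O                      # mean = H/O (the max row uses H_max in place of H)

def directory_loss(H, S, O):
    return (H + S) / O                # mean = (H+S)/O (the max row uses max(H_max, S_max))

O = 1_000_000                         # hypothetical total number of cached objects
H, S = 2_700, 14_000                  # hypothetical counts for the failed nodes
print(f"home-store: {home_store_loss(H, O):.4%}")    # -> 0.2700%
print(f"directory:  {directory_loss(H, S, O):.4%}")  # -> 1.6700%
```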
34
(backup) Full home-store protocol [diagram: the client sends the request over the LAN to the home node; (a) the home returns the object or a not-modified response, or (b) the home sends the request over the WAN to the origin server, and the object or not-modified response from the origin is returned]
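The client side of this exchange might look roughly like the sketch below. All names are assumptions: route_to_home stands in for Pastry routing on the hashed URL, and the home node is assumed to expose a handle_request method as in the earlier home-store sketch.

```python
# Illustrative client-side sketch of the home-store protocol (not the paper's code).
import hashlib

def url_key(url: str) -> int:
    # Hash the URL to the key that identifies its home node.
    return int(hashlib.sha1(url.encode()).hexdigest(), 16)

class SquirrelClient:
    def __init__(self, local_cache, route_to_home):
        self.local_cache = local_cache       # the browser cache on this node
        self.route_to_home = route_to_home   # callable: key -> home node proxy

    def get(self, url):
        if url in self.local_cache:
            return self.local_cache[url]     # browser-cache hit: no network traffic
        home = self.route_to_home(url_key(url))
        obj = home.handle_request(url)       # (a) served from the home, or (b) via the origin
        self.local_cache[url] = obj          # client nodes always cache objects locally
        return obj
```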
35
(backup) Full directory protocol [diagram: client, home (directory) node, delegate node, and origin server; message labels include req, cGET req, not-modified, object or delegate, and "a: no directory, go to origin"]
36
(backup) Peer-to-peer Computing Decentralize a distributed protocol: – Scalable – Self-organizing – Fault tolerant – Load balanced Not automatic!!
37
Decentralized Web Cache [diagram: browser caches on LAN clients cooperate directly; the web server sits on the Internet]
38
Challenge Decentralized web caching algorithm: Need to achieve those benefits in practice! Need to keep overhead unnoticeably low. Node failures should not become significant.
39
Peer-to-peer routing, e.g., Pastry. Peer-to-peer object location and routing substrate = Distributed Hash Table. Reliably maps an object key to a live node. Routes in log_16(N) steps (e.g. 3-4 steps for 100,000 nodes).
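A quick back-of-the-envelope check of that figure (just the logarithm, not a simulation):

```python
import math

# Pastry with 16-ary digits routes in roughly log base 16 of N overlay hops.
N = 100_000
print(math.log(N, 16))   # ~4.15: only a handful of hops even for 100,000 nodes
```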
40
Home-store is better! Simpler home-store scheme achieves load balancing by hash function randomization. Directory scheme implicitly relies on access patterns for load distribution.
41
Directory scheme seems better… Avoids storing unnecessary copies of objects. Rapidly changing directory for popular objects results in load balancing.
42
Interesting difference. Consider: a web page with many images, or a heavily browsing node. Directory: many pointers to some node. Home-store: natural load balancing. Evaluate …
43
Fault tolerance. When a single Squirrel node crashes, the fraction of lost cached content is:
             Home-store                      Directory
Redmond      Mean 0.0027%,  Max 0.0048%      Mean 0.2%,  Max 1.5%
Cambridge    Mean 0.95%,  Max 3.34%          Mean 1.7%,  Max 12.4%