Squirrel: A decentralized peer-to- peer web cache Paper by Sitaram Iyer, Antony Rowstron and Peter Druschel (© 2002) Presentation* by Alexander Prohaska.

Slides:



Advertisements
Similar presentations
Peer-to-Peer Infrastructure and Applications Andrew Herbert Microsoft Research, Cambridge
Advertisements

Squirrel: A peer-to- peer web cache Sitaram Iyer (Rice University) Joint work with Ant Rowstron (MSR Cambridge) Peter Druschel (Rice University) PODC 2002.
Squirrel: A peer-to-peer web cache Sitaram Iyer Joint work with Ant Rowstron (MSRC) and Peter Druschel.
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
Peer to Peer and Distributed Hash Tables
Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors.
Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK
Scalable Content-Addressable Network Lintao Liu
Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility Antony Rowstron, Peter Druschel Presented by: Cristian Borcea.
Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors.
1 PASTRY Partially borrowed from Gabi Kliot ’ s presentation.
Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems Antony Bowstron & Peter Druschel Presented by: Long Zhang.
Scribe: A Large-Scale and Decentralized Application-Level Multicast Infrastructure Miguel Castro, Peter Druschel, Anne-Marie Kermarrec, and Antony L. T.
P2p, Spring 05 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems March 29, 2005.
Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang.
Peer to Peer File Sharing Huseyin Ozgur TAN. What is Peer-to-Peer?  Every node is designed to(but may not by user choice) provide some service that helps.
Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems Antony Rowstron and Peter Druschel Proc. of the 18th IFIP/ACM.
Storage Management and Caching in PAST, a large-scale, persistent peer- to-peer storage utility Authors: Antony Rowstorn (Microsoft Research) Peter Druschel.
Cis e-commerce -- lecture #6: Content Distribution Networks and P2P (based on notes from Dr Peter McBurney © )
Secure routing for structured peer-to-peer overlay networks Miguel Castro, Ayalvadi Ganesh, Antony Rowstron Microsoft Research Ltd. Peter Druschel, Dan.
Pastry Partially borrowed for Gabi Kliot. Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems  Antony Rowstron.
1 Pastry and Past Based on slides by Peter Druschel and Gabi Kliot (CS Department, Technion) Alex Shraer.
Spring 2003CS 4611 Peer-to-Peer Networks Outline Survey Self-organizing overlay network File system on top of P2P network Contributions from Peter Druschel.
Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems Presented by: Lin Wing Kai.
SCALLOP A Scalable and Load-Balanced Peer- to-Peer Lookup Protocol for High- Performance Distributed System Jerry Chou, Tai-Yi Huang & Kuang-Li Huang Embedded.
Chord-over-Chord Overlay Sudhindra Rao Ph.D Qualifier Exam Department of ECECS.
Squirrel: A decentralized peer- to-peer web cache Paul Burstein 10/27/2003.
Pastry And Squirrel Presented by Eirik T. Laberg Håvard Semundseth Orri G. Pálsson.
Topics in Reliable Distributed Systems Fall Dr. Idit Keidar.
Wide-area cooperative storage with CFS
An Evaluation of Scalable Application-level Multicast Using Peer-to-peer Overlays Miguel Castro, Michael B. Jones, Anne-Marie Kermarrec, Antony Rowstron,
1 Peer-to-Peer Networks Outline Survey Self-organizing overlay network File system on top of P2P network Contributions from Peter Druschel.
 Structured peer to peer overlay networks are resilient – but not secure.  Even a small fraction of malicious nodes may result in failure of correct.
Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems (Antony Rowstron and Peter Druschel) Shariq Rizvi First.
Mobile Ad-hoc Pastry (MADPastry) Niloy Ganguly. Problem of normal DHT in MANET No co-relation between overlay logical hop and physical hop – Low bandwidth,
Tapestry GTK Devaroy (07CS1012) Kintali Bala Kishan (07CS1024) G Rahul (07CS3009)
1 PASTRY. 2 Pastry paper “ Pastry: Scalable, decentralized object location and routing for large- scale peer-to-peer systems ” by Antony Rowstron (Microsoft.
PIC: Practical Internet Coordinates for Distance Estimation Manuel Costa joint work with Miguel Castro, Ant Rowstron, Peter Key Microsoft Research Cambridge.
Content Overlays (Nick Feamster). 2 Content Overlays Distributed content storage and retrieval Two primary approaches: –Structured overlay –Unstructured.
2: Application Layer1 Chapter 2 outline r 2.1 Principles of app layer protocols r 2.2 Web and HTTP r 2.3 FTP r 2.4 Electronic Mail r 2.5 DNS r 2.6 Socket.
An IP Address Based Caching Scheme for Peer-to-Peer Networks Ronaldo Alves Ferreira Joint work with Ananth Grama and Suresh Jagannathan Department of Computer.
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
Peer-to-Peer Supported Cache System for File Transfer Joonbok Lee
SIGCOMM 2001 Lecture slides by Dr. Yingwu Zhu Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
Scalable Content- Addressable Networks Prepared by Kuhan Paramsothy March 5, 2007.
Peer to Peer A Survey and comparison of peer-to-peer overlay network schemes And so on… Chulhyun Park
PROP: A Scalable and Reliable P2P Assisted Proxy Streaming System Computer Science Department College of William and Mary Lei Guo, Songqing Chen, and Xiaodong.
Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems Antony Rowstron and Peter Druschel, Middleware 2001.
1 Distributed Hash Table CS780-3 Lecture Notes In courtesy of Heng Yin.
Pastry Antony Rowstron and Peter Druschel Presented By David Deschenes.
Plethora: Infrastructure and System Design. Introduction Peer-to-Peer (P2P) networks: –Self-organizing distributed systems –Nodes receive and provide.
Peer to Peer Network Design Discovery and Routing algorithms
Algorithms and Techniques in Structured Scalable Peer-to-Peer Networks
LOOKING UP DATA IN P2P SYSTEMS Hari Balakrishnan M. Frans Kaashoek David Karger Robert Morris Ion Stoica MIT LCS.
INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1.
1 COMP 431 Internet Services & Protocols HTTP Persistence & Web Caching Jasleen Kaur February 11, 2016.
Large Scale Sharing Marco F. Duarte COMP 520: Distributed Systems September 19, 2004.
Overview on Web Caching COSC 513 Class Presentation Instructor: Prof. M. Anvari Student name: Wei Wei ID:
Plethora: A Locality Enhancing Peer-to-Peer Network Ronaldo Alves Ferreira Advisor: Ananth Grama Co-advisor: Suresh Jagannathan Department of Computer.
Incrementally Improving Lookup Latency in Distributed Hash Table Systems Hui Zhang 1, Ashish Goel 2, Ramesh Govindan 1 1 University of Southern California.
Peer-to-Peer Networks 05 Pastry Christian Schindelhauer Technical Faculty Computer-Networks and Telematics University of Freiburg.
Fabián E. Bustamante, Fall 2005 A brief introduction to Pastry Based on: A. Rowstron and P. Druschel, Pastry: Scalable, decentralized object location and.
Chapter 29 Peer-to-Peer Paradigm Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Distributed Web Systems Peer-to-Peer Systems Lecturer Department University.
Coral: A Peer-to-peer Content Distribution Network
Pastry Scalable, decentralized object locations and routing for large p2p systems.
Controlling the Cost of Reliability in Peer-to-Peer Overlays
Web Caching? Web Caching:.
Plethora: Infrastructure and System Design
PASTRY.
Presentation transcript:

Squirrel: A decentralized peer-to- peer web cache Paper by Sitaram Iyer, Antony Rowstron and Peter Druschel (© 2002) Presentation* by Alexander Prohaska *Slides partly taken from [2]

Squirrel - A decentralized P2P cache Little motivation for Squirrel Large organizations sometimes have clusters of more than 20 machines acting as web cache to handle peak loads  Expensive hardware and administration Decentralized peer-to-peer web cache  Every computer in the intranet takes a little part of the web caches job  Got the web cache for free! What?

Squirrel - A decentralized P2P cache Overview Web caching Pastry Squirrel Evaluation Conclusion

Squirrel - A decentralized P2P cache Web caching Goals Principle Web cache Cooperative web cache Decentralized web cache Centralized vs. decentralized web cache

Squirrel - A decentralized P2P cache Goals of web caching Goals  Reduce latency  Decrease external traffic  Reduce load on web servers and routers  Deployed at corporate network boundaries, ISPs, etc. Assumptions for corporate LAN  Located in a single geographical region Latency between two LAN PCs << latency to external servers Inner-LAN bandwidth >> external bandwidth

Squirrel - A decentralized P2P cache Principle of web caching Web browser makes HTTP GET requests for internet objects  Serviced from local browser cache web cache(s) or origin web server depending on which has a fresh copy of the object

Squirrel - A decentralized P2P cache Principle of web caching (2) Object is not in local browser cache (cache miss or object is uncacheable)  Forward request to next level towards origin server Object is found in local browser cache  Check for freshness If fresh, object is returned to the web browser, otherwise local browser cache issues a conditional GET (cGET) request to the next level for validation  Typically an If-Modified-Since request with timestamp  Responses to cGET are the new object or “not modified”

Squirrel - A decentralized P2P cache Web cache internetcorporate LAN browser cache browser cache client centralized web cache web server

Squirrel - A decentralized P2P cache Cooperative web cache internetcorporate LAN browser cache browser cache client web cache cluster web server

Squirrel - A decentralized P2P cache Decentralized web cache internetcorporate LAN browser cache browser cache client Squirrel web server

Squirrel - A decentralized P2P cache Centralized vs. decentralized Centralized web cache Administrative costs Dedicated hardware Scaling implies upgrading Single point of failure Decentralized (Squirrel) Self-organizing network No additional hardware Resources grow with clients Resilient of concurrent node failure

Squirrel - A decentralized P2P cache Overview Web caching Pastry Squirrel Evaluation Conclusion

Squirrel - A decentralized P2P cache Features of Pastry Self-organizing, decentralized & scalable DHT Circular 128-bit namespace for NodeIds and ObjectIds (uniformly, random) Objects are mapped to the live node numerically closest to ObjectIds A pastry node can route within log 2 b N routing steps to the numerically closest node for a given ObjectId Automatically adapts to the arrival, departure and failure of nodes NodeIds ObjectIds

Squirrel - A decentralized P2P cache Overview Web caching Pastry Squirrel Evaluation Conclusion

Squirrel - A decentralized P2P cache Squirrel Environment Scheme Mapping Squirrel onto Pastry  Home-store model  Directory model  Comparison

Squirrel - A decentralized P2P cache Environment of Squirrel desktop machines in a corporate LAN Located in a single geographical region, e.g. a building or campus Each node  runs an instance of Squirrel,  disables the browsers cache and  sets Squirrel as the browser’s proxy

Squirrel - A decentralized P2P cache Scheme of Squirrel Web browser issues requests to Squirrel proxy Squirrel checks if the object is cacheable If it’s cacheable and already in the cache, Squirrel checks freshness If it’s not fresh, Squirrel maps the object to another Squirrel node using Pastry That’s the home node for the object

Squirrel - A decentralized P2P cache Mapping Squirrel onto Pastry Client nodes always cache objects locally Two approaches  Home-store model Home node also stores object  Directory model Home node remembers a small directory of (up to e.g. 4) pointers to recent clients (delegates) Home node forwards subsequent requests to randomly chosen delegates with fresh object

Squirrel - A decentralized P2P cache Home-store model (first request) client home URL hash LANinternet web server = in cache after request

Squirrel - A decentralized P2P cache Home-store model (second request) home URL hash LANinternet web server = in cache before request client = in cache after request

Squirrel - A decentralized P2P cache Directory model (first request) client home 3. LANinternet 4. web server “not cached” pointer to node with object in cache = in cache after request

Squirrel - A decentralized P2P cache Directory model (second request) home 3. LANinternet web server pointer to node with object in cache = in cache after request = in cache before request client

Squirrel - A decentralized P2P cache More on home-store model Advantages  It’s simple  Hash function does mapping of objects to nodes, so popular files are uniformly distributed Disadvantages  Stores objects at home nodes and wastes storage  Copies of objects in the cache of non-home nodes are not shared in the network

Squirrel - A decentralized P2P cache More on directory model Advantages  Avoids storing unnecessary copies at the home node  Rapidly changing directory for popular objects seems to improve load balancing Disadvantages  Only requesting clients store objects in cache, so active clients store all the popular objects and inactive clients waste most of their storage Seems to decline load balancing Improving or declining load balancing?

Squirrel - A decentralized P2P cache Overview Web caching Pastry Squirrel Evaluation Conclusion

Squirrel - A decentralized P2P cache Evaluation Trace characteristics Total external traffic Latency Load Fault tolerance

Squirrel - A decentralized P2P cache Trace characteristics Microsoft located in : RedmondCambridge Total duration1 day31 days Number of clients Number of HTTP requests16.41 million0.971 million Mean request rate190 req/s0.362 req/s Peak request rate606 req/s186 req/s Number of objects5.13 million0.469 million Number of cacheable objects2.56 million0.226 million Mean cacheable object reuse 5,4 times3,22 times Total external bandwidth88,1GB5,7GB Hit ratio29%38%

Squirrel - A decentralized P2P cache Total external bandwidth RedmondCambridge The number of bytes transferred between Squirrel (LAN) and the internet (lower is better)

Squirrel - A decentralized P2P cache Latency (Redmond) 0% 20% 40% 60% 80% 100% Total hops within the LAN CentralizedHome-storeDirectory % of cacheable requests Home-store needs between 3 and 4 hops to locate home node + 1 hop for the return path (mean 4,11 hops) Directory needs as many hops as home-store + 1 for forwarding to delegate (mean 4,56 hops) Additional, there are other cases

Squirrel - A decentralized P2P cache Latency (Cambridge) 0% 20% 40% 60% 80% 100% Total hops within the LAN CentralizedHome-storeDirectory % of cacheable requests Like the Redmond trace, only shifted left by app. 2 Home-store: mean 1,8 hops Directory : mean 2,0 hops

Squirrel - A decentralized P2P cache Peak load on single nodes Home-store model LocationObjects/sObjects/mAv. objects/sAv. objects/m Redmond483886,660 Cambridge551250,0270,7 LocationObjects/sObjects/mAv. objects/sAv. objects/m Redmond8651,536 Cambridge9350,0381,13 Home-store model (hash function) causes drastically lower load compared to the directory model (access pattern) Directory model

Squirrel - A decentralized P2P cache Fault tolerance Sudden node failures result in partial loss of cached content  Home-store: proportional to failed nodes  Directory: more vulnerable If 1% of all Squirrel nodes abruptly crash, the fraction of lost cached content is: Home-storeDirectory Redmond Mean 1% Max 1.77% Mean 1.71% Max 19.3% Cambridge Mean 1% Max 3.52% Mean 1.65% Max 9.8%

Squirrel - A decentralized P2P cache Overview Web caching Pastry Squirrel Evaluation Conclusion

Squirrel - A decentralized P2P cache Conclusions Possible to decentralize web caching Performance comparable to a centralized web cache Better in terms of cost, scalability and administrative effort Under the made assumptions, the home-store scheme is superior to the directory scheme

Squirrel - A decentralized P2P cache References [1] Sitaram Iyer, Antony Rowstron, Peter Druschel, Squirrel: A decentralized peer to peer web cache, July 2002, [2] Sitaram Iyer, Presentation slides from PODC, July 2002, Monterey, CA., podc.pdfhttp://research.microsoft.com/PAST/squirrel- podc.pdf [3] Antony Rowstron, Peter Druschel, Pastry: Scalable, decentralized object location and routing for large scale peer-to-peer systems, November 2001, [4] Miguel Castro, Peter Druschel, Y. Charlie Hu, and Antony Rowstron, September 2002, Topology-aware routing in structured peer-to-peer overlay networks,

Squirrel - A decentralized P2P cache Finish Thanks for your attention Any questions?

Squirrel - A decentralized P2P cache Outtakes The following slides were taken out for timing reasons.

Squirrel - A decentralized P2P cache Pastry Features Node state Routing table Routing Routing example Self-organization

Squirrel - A decentralized P2P cache Pastry node state – an example Set of nodes with |L|/2 smaller and |L|/2 larger numerically closest NodeIds |M| “physically” closest nodes Prefix-based routing entries

Squirrel - A decentralized P2P cache Routing tables in Pastry NodeIds are in base 2 b One row for each prefix of local NodeId (log 2 b N populated on average) One column for each possible digit in the NodeId representation (2 b -1) Configuration parameter b defines the tradeoff:  (log 2 b N) * (2 b -1) entries to route within  log 2 b N routing hops  Typical value for b is 4

Squirrel - A decentralized P2P cache Routing with Pastry D B C A E No guarantee to find shortest path Gives rise to relatively good path Next node is chosen from an exponentially decreasing number of nodes Distance during each successive routing step is exponentially increasing  Dist(A,B)<Dist(B,C) ……

Squirrel - A decentralized P2P cache An example for routing in Pastry 0XXX 1XXX2XXX3XXX START 0112 routes a message to key First hop fixes first digit (2) Second hop fixes second digit (20) END 2001 closest live node to xxx1xxx2xxx 3xxx

Squirrel - A decentralized P2P cache Self-organization in Pastry Node arrival  Compute NodeId X (typically SHA-1 hash of IP address)  Find an nearby Pastry node automatically  Send special “join” message to X Pastry will route this message to the existing node numerically closest to the new node All nodes that route or receive the message update their state table and send it to X Build up own state table by taking the appropriate parts of the received state tables

Squirrel - A decentralized P2P cache Self-organization in Pastry (2) Node departure  Nodes may fail or depart without warning  Pastry guarantees that each node lazily repairs its leaf set unless |L|/2 nodes with adjacent NodeIds have failed simultaneously L is the second configuration parameter, typical value is between 16 and 32  Inaccurate routing table entries must be replaced to preserve the integrity of routing tables Meanwhile, messages are routed over a node that is numerically closer to the destination than the current node

Squirrel - A decentralized P2P cache More on directory model (2) Load spike example  Requests for web pages with many embedded images will all be served by a few recent clients  Many home nodes point to such clients Declining or improving load balance?  Evaluation will show

Squirrel - A decentralized P2P cache Hit Ratio Defined as the fraction of all objects that are serviced by the web cache Hit ratio of the Squirrel cache is indirectly related to the external bandwidth Squirrel approaches hit ratio of centralized cache with increasing per-node contribution At about 100 MB per node cache size, Squirrel achieves 28% and 37% for Redmond and Cambridge traces (out of possible 29% and 38%), respectively