Peer-to-Peer Networks 11 Past Christian Schindelhauer Technical Faculty Computer-Networks and Telematics University of Freiburg.



PAST
• PAST: A large-scale, persistent peer-to-peer storage utility
  - by Peter Druschel (Rice University, Houston; now Max-Planck-Institut, Saarbrücken/Kaiserslautern)
  - and Antony Rowstron (Microsoft Research)
• Literature
  - A. Rowstron and P. Druschel, "Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility", 18th ACM SOSP'01; all pictures are from this paper
  - P. Druschel and A. Rowstron, "PAST: A large-scale, persistent peer-to-peer storage utility", HotOS VIII, May

Goals of PAST
• Peer-to-peer based Internet storage
  - on top of Pastry
• Goals
  - File-based storage
  - High availability of data
  - Persistent storage
  - Scalability
  - Efficient usage of resources

Motivation
• Multiple, diverse nodes in the Internet can be used
  - safety through different locations
• No complicated backup
  - No additional backup devices
  - No mirroring
  - No RAID or SAN systems with special hardware
• Joint use of storage
  - for sharing files
  - for publishing documents
• Overcome local storage and data safety limitations

Interface of PAST
• Create: fileId = Insert(name, owner-credentials, k, file)
  - stores a file at a user-specified number k of diverse nodes within the PAST network
  - produces a 160-bit ID which identifies the file (via SHA-1)
• Lookup: file = Lookup(fileId)
  - reliably retrieves a copy of the file identified by fileId
• Reclaim: Reclaim(fileId, owner-credentials)
  - reclaims the storage occupied by the k copies of the file identified by fileId

Interface of PAST
• Other operations do not exist:
  - No erase, to avoid complex agreement protocols
  - No write or rename, to avoid write conflicts
  - No group rights management, to avoid user and group management
  - No file listing, file information, etc.
• Such operations must be provided by an additional layer

Relevant Parts of Pastry
• Leaf set
  - neighbors on the ring
• Routing table
  - nodes for each prefix + 1 other letter
• Neighborhood set
  - set of nodes which have small TTL

Interfaces of Pastry
• route(M, X):
  - route message M to the node with nodeId numerically closest to X
• deliver(M):
  - deliver message M to the application
• forwarding(M, X):
  - message M is being forwarded towards key X
• newLeaf(L):
  - report a change in leaf set L to the application
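The target of route(M, X) can be illustrated with a small sketch. It assumes a global view of all nodeIds, which a real Pastry node does not have, and it assumes that "numerically closest" is measured on a circular identifier space; the function names and the `bits` parameter are hypothetical.

```python
def ring_distance(a: int, b: int, bits: int = 128) -> int:
    """Distance between two ids on a circular id space of 2**bits values (assumed)."""
    size = 1 << bits
    d = abs(a - b) % size
    return min(d, size - d)


def numerically_closest(node_ids, key: int, bits: int = 128) -> int:
    """Return the nodeId closest to key; ties are broken towards the smaller id."""
    return min(node_ids, key=lambda n: (ring_distance(n, key, bits), n))


# Toy example on a 4-bit ring (ids 0..15): key 15 is closer to node 14 than to node 1.
print(numerically_closest([1, 6, 14], 15, bits=4))
```

Actual Pastry routing reaches this node hop by hop using the routing table and the leaf set; the sketch only shows which node the message should end up at.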

Insert Request Operation
• Compute fileId by hashing
  - file name
  - public key of client
  - some random numbers, called salt
• Storage (k x file size)
  - is debited against the client's quota
• File certificate
  - is produced and signed with the owner's private key
  - contains fileId, SHA-1 hash of the file's content, replication factor k, the random salt, creation date, etc.
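The fileId computation and the quota debit can be sketched as follows. The slide only states that the file name, the client's public key and a salt are hashed with SHA-1; the byte encoding, the length prefixes and the function names below are assumptions of this sketch.

```python
import hashlib
import os


def compute_file_id(name: str, owner_public_key: bytes, salt: bytes) -> bytes:
    """Return a 160-bit fileId as the SHA-1 hash of name, public key and salt."""
    h = hashlib.sha1()
    for part in (name.encode("utf-8"), owner_public_key, salt):
        # Length prefix (assumption) keeps the three fields unambiguous.
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


def storage_debit(file_size: int, k: int) -> int:
    """Storage charged against the client's quota: k replicas of the file."""
    return k * file_size


salt = os.urandom(8)  # fresh random salt per insert attempt
file_id = compute_file_id("report.pdf", b"<public key bytes>", salt)
print(len(file_id) * 8)  # 160 bits
```

Because the salt enters the hash, retrying with a new salt yields a different fileId and therefore a different set of responsible nodes.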

Insert Request Operation
• File and certificate are routed via Pastry to the node responsible for the fileId
• When they arrive at one of the k nodes close to the fileId
  - the node checks the validity of the file
  - the file is duplicated to the other k-1 nodes numerically close to the fileId
• When all k nodes have accepted a copy
  - each node sends a store receipt to the owner
• If something goes wrong, an error message is sent back and nothing is stored

Lookup
• The client sends a message with the requested fileId into the Pastry network
• The first node storing the file answers
  - no further routing
• The node sends back the file
• The locality property of Pastry helps to send a close-by copy of a file

Reclaim
• The client's node sends a reclaim certificate
  - allowing the storing nodes to check that the claim is authenticated
• Each storing node sends a reclaim receipt
• The client sends this receipt to the quota management to regain the storage

Security
• Smartcard
  - for PAST users who want to store files
  - generates and verifies all certificates
  - maintains the storage quotas
  - ensures the integrity of nodeId and fileId assignment
• Users/nodes without a smartcard
  - can read and serve as storage servers
• Randomized routing
  - prevents interception of messages
• Malicious nodes only have local influence

Storage Management
• Goals
  - Utilization of all storage
  - Storage balancing
  - Providing k file replicas
• Methods
  - Replica diversion: as an exception, a replica is stored on another node in the leaf set
  - File diversion: if the local nodes are full, all replicas are stored at a different location

Causes of Storage Load Imbalance
• Statistical variation
  - birthday paradox (on a weaker scale)
• High variance of the file size distribution
  - typically a heavy-tail distribution, e.g. Pareto distribution
• Different storage capacities of PAST nodes

Heavy Tail Distribution
• Discrete Pareto distribution for x ∈ {1,2,3,…}
  - $P[X = x] = c_\alpha \cdot x^{-\alpha}$
  - with constant factor $c_\alpha = 1 / \sum_{i \ge 1} i^{-\alpha}$
• Heavy tail
  - moments $E[X^k]$ are defined only for small k
  - the expectation is defined only if $\alpha > 2$
  - the variance and $E[X^2]$ exist only if $\alpha > 3$
  - $E[X^k]$ is defined only if $\alpha > k+1$
• Often observed
  - distribution of wealth, sizes of towns, frequency of words, length of molecules, ...
  - file length, WWW documents
  - Heavy-Tailed Probability Distributions in the World Wide Web, Crovella et al

Per-Node Storage
• Assumption:
  - the storage capacities of nodes differ by at most a factor of 100
• Large-scale storage
  - must be inserted as multiple PAST nodes
• Storage control:
  - if a node's storage is too large, it is asked to split and rejoin
  - if a node's storage is too small, it is rejected

Replica Diversion
• The first node close to the fileId checks whether it can store the file
  - if yes, it does so and sends the store receipt
• If a node A cannot store the file, it tries replica diversion
  - A chooses a node B in its leaf set which is not among the k closest and asks B to store the copy
  - if B accepts, A stores a pointer to B and sends a store receipt
• When A or B fails, the replica is inaccessible
  - the failure probability is doubled

Policies for Replica Diversion
• Acceptance of replicas at a node
  - if (size of the file)/(remaining free space) > t, then reject the file
  - with different thresholds t for close nodes (t_pri) and far nodes (t_div), where t_pri > t_div
  - discriminates against large files and far storage
• Selecting a node to store a diverted replica
  - in the leaf set and
  - not among the k nodes closest to the fileId and
  - does not already hold a diverted replica of the same file
• Deciding when to divert a file to a different part of the Pastry ring
  - if one of the k nodes does not find a proxy node
  - then it sends a reject message
  - and all nodes for the replicas discard the file
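The acceptance test and the choice of a diversion target can be sketched as below. The threshold values, the `Node` class and the preference for the eligible node with the most free space are assumptions made for illustration; the slide only fixes the ratio test, the condition t_pri > t_div, and the three eligibility rules.

```python
from dataclasses import dataclass, field

T_PRI = 0.1   # threshold for primary replicas (illustrative value, assumption)
T_DIV = 0.05  # threshold for diverted replicas (illustrative value, assumption)


@dataclass
class Node:
    node_id: int
    free_space: int
    diverted: set = field(default_factory=set)  # fileIds held as diverted replicas


def accepts(node: Node, file_size: int, threshold: float) -> bool:
    """Reject the file if file_size / free_space exceeds the threshold."""
    if node.free_space <= 0:
        return False
    return file_size / node.free_space <= threshold


def choose_diversion_target(leaf_set, k_closest_ids, file_id, file_size):
    """Pick a leaf-set node outside the k closest that holds no diverted copy yet."""
    eligible = [
        n for n in leaf_set
        if n.node_id not in k_closest_ids and file_id not in n.diverted
    ]
    # Trying the emptiest node first is an assumption of this sketch.
    for node in sorted(eligible, key=lambda n: n.free_space, reverse=True):
        if accepts(node, file_size, T_DIV):
            return node
    return None  # no proxy found: the node would send a reject message
```

Since t_div is smaller than t_pri, a node is more reluctant to take a diverted replica than a primary one, which is how the policy discriminates against far storage.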

File Diversion
• If the k nodes close to the chosen fileId
  - cannot store the file
  - nor divert the replicas locally in the leaf set
• then an error message is sent to the client
• The client generates a new fileId using a different salt
  - and repeats the insert operation up to 3 times
  - then the operation is aborted and a failure is reported to the application
• Possibly the application retries with smaller fragments of the file
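The client-side retry loop can be sketched as follows. The callback `try_insert`, the exception type and the salt generator are hypothetical names; the sketch reads "repeats up to 3 times" as one initial attempt plus at most three retries, which is an interpretation.

```python
import os


class InsertRejected(Exception):
    """Raised when the k nodes near a fileId can neither store nor divert the file."""


def insert_with_file_diversion(try_insert, name, max_retries=3,
                               new_salt=lambda: os.urandom(8)):
    """Call try_insert(name, salt) with a fresh salt until it succeeds or retries run out."""
    for _ in range(1 + max_retries):
        salt = new_salt()  # a new salt yields a new fileId, hence other target nodes
        try:
            return try_insert(name, salt)
        except InsertRejected:
            continue
    # All attempts failed: report the failure to the application.
    raise InsertRejected(f"insert of {name!r} failed after {1 + max_retries} attempts")
```

An application that catches the final `InsertRejected` could then split the file into smaller fragments and insert those individually.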

Maintaining Replicas
• The Pastry protocol checks the leaf set periodically
• A node failure is recognized if a node is unresponsive for a certain time
  - Pastry triggers an adjustment of the leaf set
  - PAST redistributes the replicas
  - if the new neighbor is too full, other nodes in the leaf set will be used via replica diversion
• When a new node arrives
  - files are not moved, but pointers are adjusted (replica diversion)
  - because of the ratio of storage to bandwidth

File Encoding
• k replicas is not the best redundancy strategy
• Using a Reed-Solomon encoding
  - with m additional checksum blocks for n original data blocks
  - reduces the storage overhead to (m+n)/n times the file size if all m+n shares are distributed over different nodes
  - possibly speeds up access
• PAST
  - does NOT use any such encoding techniques
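A short calculation makes the comparison concrete. The parameter values are chosen only for illustration, and the function names are hypothetical.

```python
def replication_overhead(k: int) -> float:
    """Storage used by k full replicas, as a multiple of the file size."""
    return float(k)


def erasure_overhead(n: int, m: int) -> float:
    """Storage used by n data blocks plus m check blocks, as a multiple of the file size."""
    return (m + n) / n


# Both configurations survive the loss of any 4 storing nodes:
# 5 replicas tolerate 4 failures; with n = 8 and m = 4, any 8 of the 12 shares suffice.
print(replication_overhead(5))   # 5.0
print(erasure_overhead(8, 4))    # 1.5
```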

Caching
• Goals:
  - Minimize fetch distance
  - Maximize query throughput
  - Balance the query load
• Replicas provide these features
  - highly popular files may demand many more replicas
  - this is provided by cache management
• PAST nodes use the "unused" portion of their storage to cache files
  - cached copies can be erased at any time
  - e.g. for storing primary or redirected replicas
• When a file is routed through a node during lookup or insert, it is inserted into the local cache
• Cache replacement policy: GreedyDual-Size
  - considers aging, file size and cost of a file
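A minimal sketch of the GreedyDual-Size replacement policy: every cached file d carries a value H(d) = L + cost(d)/size(d), the file with the smallest H is evicted, and L is raised to the evicted value so that entries which are not accessed age relative to new ones. The class name, the default cost of 1 and the linear search for the victim are simplifications assumed here, not details taken from the slides.

```python
class GreedyDualSizeCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = 0
        self.inflation = 0.0   # the aging value L
        self.entries = {}      # file_id -> (size, cost, H)

    def _priority(self, size: int, cost: float) -> float:
        return self.inflation + cost / size

    def access(self, file_id, size: int, cost: float = 1.0) -> bool:
        """Record an access to a file of positive size; return True on a cache hit."""
        if file_id in self.entries:
            s, c, _ = self.entries[file_id]
            self.entries[file_id] = (s, c, self._priority(s, c))  # refresh H on hit
            return True
        if size > self.capacity:
            return False  # file can never fit, do not cache it
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda f: self.entries[f][2])
            v_size, _, v_h = self.entries.pop(victim)
            self.inflation = v_h   # aging: L becomes the evicted H value
            self.used -= v_size
        self.entries[file_id] = (size, cost, self._priority(size, cost))
        self.used += size
        return False


cache = GreedyDualSizeCache(capacity=10)
cache.access("a", 6)
cache.access("b", 4)
cache.access("c", 2)          # evicts "a": the largest file has the lowest H
print(sorted(cache.entries))  # ['b', 'c']
```

With a uniform cost, large files are evicted first, which favors keeping many small files in the cache.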

Experimental Results: Caching
[Figure not reproduced in the transcript]

Summary
• PAST provides a distributed storage system
  - which allows full storage usage and locality features
• Storage management
  - is based on a smartcard system, which provides a hardware restriction
  - high utilization moderately increases failure rates and access times
