LH*RS P2P: A Scalable Distributed Data Structure for P2P Environment
W. LITWIN, CERIA Laboratory, Paris Dauphine University
H. YAKOUBEN, Paris Dauphine University
T. SCHWARZ, Santa Clara University (USA)
June 8th, 2007
Plan
- Objective
- Overview: SDDS & P2P
- LH*RS P2P: Architecture, Addressing, Properties
- Churn Management
- Conclusion
Objective
- Design a new SDDS for a structured P2P environment
- A highly available data structure that copes with churn
- At most one forwarding message for a key search, an insert, or a scan (the fastest known performance)
- Very large scalable files
SDDS (1993)
- A file of records identified by keys
- SDDS client nodes face the applications and send queries to SDDS server nodes
- No centralized addressing
- Servers store application or parity data in buckets
- Overflowing servers split onto new servers
- Servers do not notify clients about splits
SDDS (1993)
- Clients use images of the file state for addressing: key-based queries, range queries, scans, …
- Images are adjusted towards the file state during queries by Image Adjustment Messages (IAMs), triggered by incorrect addressing by the client
- IAMs reflect the file evolution by splits or, rarely, merges
- IAMs also reflect location changes caused by failures and recovery
SDDS Typology
[Figure: SDDS typology tree. Branch labels: Data Structures, Classics, SDDS (1993), Hash, Tree, 1-dimensional, d-dimensional, 1-d Tree, m-d Tree, High Availability, k-Availability, Security. Schemes shown: LH*, DDH, EH*, IH*, RP*, k-RP*, DRT, DRT*, LH*m, LH*g, LH*s, LH*sa, LH*rs. P2P data structures with logarithmic complexity: CHORD, BATON, VBI-Tree.]
SDDS Expansion
[Figure: growth through splits under inserts; shows clients, peers, and a new peer.]
SDDS Client Image Evolution
[Figure: clients and image adjustment messaging.]
SDDS 2007 Prototype
- Available at the CERIA site
- Announced at DbWorld
- Manages LH*RS and RP* files in distributed RAM
- Runs under Windows over 1 Gb/s Ethernet
- Various functions
- Response times reaching 30 microseconds
- Up to 300 times faster than disk files
P2P (1995 ?)
- Autonomous nodes store and search data
- By flooding in early systems: Freenet, Napster, Gnutella, …
- Structured P2P schemes reduce the flooding by using decentralized data structures, especially the Distributed Hash Table (DHT)
- Few people know that the DHT concept is due to R. Devine (FODO 93)
- Examples: Chord, P-tree, VBI, Baton, …
- Structured P2P schemes are specific SDDS schemes
LH*RS P2P Addressing
Hash functions: $h_i(C) = C \bmod 2^i$
Global Addressing Rule (algorithm):
$a \leftarrow h_i(C)$; /* a is the address of the destination peer of key C */
if $a < n$ then $a \leftarrow h_{i+1}(C)$;
/* (i, n) is the state of the SDDS file; it is known only to the file coordinator node */
Client Address Calculus (algorithm):
$a' \leftarrow h_{i'}(C)$; /* a' is the presumed address of the destination peer of key C */
if $a' < n'$ then $a' \leftarrow h_{i'+1}(C)$;
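The two rules above share one computation and differ only in the state they use: the true state (i, n) for the global rule, the image (i', n') for the client. A minimal Python sketch, assuming integer keys; the function names `h` and `lh_address` are illustrative and not part of the slides:

```python
def h(level: int, key: int) -> int:
    """LH hash function h_level(C) = C mod 2^level."""
    return key % (1 << level)


def lh_address(key: int, i: int, n: int) -> int:
    """Address of `key` for a file state, or a client image, (i, n)."""
    a = h(i, key)
    if a < n:
        # Bucket a has already split in this round: use the next hash function.
        a = h(i + 1, key)
    return a


# Global rule with the coordinator state (i, n) = (3, 2).
true_address = lh_address(9, 3, 2)
# Client calculus with an outdated image (i', n') = (2, 2).
presumed_address = lh_address(9, 2, 2)
print(true_address, presumed_address)
```

With an outdated image the presumed address can differ from the true one; the receiving peer then has to detect and correct the addressing error.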
LH*RS P2P File Expansion
- The file starts with i = 0, n = 0, and a single data bucket 0
- Every bucket m keeps its bucket level j, the index of the hash function $h_j$ it last used to split; j = 0 initially
- An overflowing bucket m alerts the coordinator
- The coordinator notifies bucket n to split
- Bucket n applies $h_{i+1}$
- About half of the keys migrate to the new bucket $n + 2^i$
- Bucket n and the new bucket set j = j + 1
- The coordinator performs n = n + 1; if $n = 2^i$ then i = i + 1 and n = 0
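The expansion steps can be sketched as a single-process Python model. This is a simplification under stated assumptions: the class `LHFile`, its methods, and the `capacity` overflow threshold are hypothetical, and messages between peers and the coordinator are replaced by direct method calls.

```python
class LHFile:
    """Toy model of the coordinator state (i, n) and of the data buckets."""

    def __init__(self) -> None:
        self.i = 0
        self.n = 0
        self.buckets = {0: []}   # bucket address -> keys
        self.levels = {0: 0}     # bucket address -> bucket level j

    def address(self, key: int) -> int:
        a = key % (1 << self.i)
        if a < self.n:
            a = key % (1 << (self.i + 1))
        return a

    def split(self) -> int:
        """Split bucket n with h_{i+1}; return the address of the new bucket."""
        m = self.n
        new = m + (1 << self.i)
        level = self.i + 1
        stay = [k for k in self.buckets[m] if k % (1 << level) == m]
        move = [k for k in self.buckets[m] if k % (1 << level) != m]
        self.buckets[m], self.buckets[new] = stay, move
        self.levels[m] = self.levels[new] = level
        self.n += 1
        if self.n == (1 << self.i):
            self.i += 1
            self.n = 0
        return new

    def insert(self, key: int, capacity: int = 4) -> None:
        a = self.address(key)
        self.buckets[a].append(key)
        if len(self.buckets[a]) > capacity:
            # The overflowing bucket alerts the coordinator, which splits
            # bucket n (not necessarily the overflowing bucket itself).
            self.split()


f = LHFile()
for key in range(40):
    f.insert(key)
print(f.i, f.n, sorted(f.buckets))
```

Because the split pointer n moves linearly, an overflowing bucket may stay over capacity until the pointer reaches it; the model keeps that behaviour.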
LH*RS P2P
Architecture based on LH*RS
[Figure: an LH*RS P2P peer combines a client part (an LH*RS client with image i', n') and a server part (an LH*RS data bucket, DB, or parity bucket, PB, with level j). A candidate peer acts as a client and offers spare storage. The figure also labels pupils.]
Local Client & Pupil Image Adjustment During Splits
Algorithm:
$i' = j$; /* i' is the presumed value of i in the client part of the peer image */
$n' = m + 1$; /* n' is the presumed value of n in the client part of the peer image */
if $n' = 2^{i'}$ then $i' = j + 1$; $n' = 0$;
/* m is the address of the peer that splits */
/* j is the bucket level before the split */
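The split-time adjustment is a pure function of the splitting peer's address m and its level j before the split. A minimal sketch; the function name `image_after_split` is illustrative:

```python
from typing import Tuple


def image_after_split(m: int, j: int) -> Tuple[int, int]:
    """Client image (i', n') that peer m installs when it splits.

    m is the address of the splitting peer, j its bucket level before the split.
    """
    i_img = j
    n_img = m + 1
    if n_img == (1 << i_img):
        # The split pointer wrapped around: the file level increases.
        i_img = j + 1
        n_img = 0
    return i_img, n_img


print(image_after_split(1, 1))
print(image_after_split(0, 1))
```

The first call uses m = 1 and j = 1, the values of the example slide in which peer P1 splits; the second shows a split that does not wrap the pointer.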
Example
Before splitting:
- Coordinator Peer (CP): i = 1, n = 1
- P0: j = 2, i' = 1, n' = 1
- P1: j = 1, i' = 1, n' = 0
- P2: j = 2, i' = 1, n' = 1
After splitting (P1 splits and the new peer P3 appears):
- CP: i = 2, n = 0
- P0: j = 2, i' = 1, n' = 1
- P1: j = 2, i' = 2, n' = 0
- P2: j = 2, i' = 1, n' = 1
- P3: j = 2, i' = 2, n' = 0
Image adjustment at P1: i' = j = 1; n' = m + 1 = 1 + 1 = 2; since $n' = 2^1$, n' = 0 and i' = i' + 1, so (i', n') = (2, 0).
Server Address Calculus
Algorithm:
/* a is the address of the peer that receives key C, j is its bucket level */
$a' \leftarrow h_j(C)$;
if $a' \neq a$ then /* addressing error -> forwarding */
  $a'' \leftarrow h_{j-1}(C)$;
  if $a'' > a$ and $a'' < a'$ then $a' \leftarrow a''$;
  /* send key C to peer a' */
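The server-side check can be sketched as a function that returns the peer's own address when the key belongs there and the forwarding address otherwise. The name `server_address` is illustrative, and message sending is left out:

```python
def server_address(a: int, j: int, key: int) -> int:
    """Address computed by peer `a` (bucket level j) for an incoming key.

    A result equal to `a` means the key is correctly addressed; any other
    result is the peer to forward the key to.
    """
    a1 = key % (1 << j)                 # a' = h_j(C)
    if a1 != a:                         # addressing error
        a2 = key % (1 << (j - 1))       # a'' = h_{j-1}(C)
        if a < a2 < a1:
            a1 = a2
    return a1


print(server_address(1, 4, 9))   # peer 1, level 4, key 9: must forward
print(server_address(9, 4, 9))   # peer 9, level 4, key 9: key stays
```

The inner test on a'' keeps the peer from forwarding to an address computed with a level that the target bucket may not have reached yet.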
Peer Image Adjustment
Algorithm:
$i' \leftarrow j - 1$, $n' \leftarrow a + 1$; /* a is the correct address of the peer */
if $n' > 2^{i'}$ then $n' \leftarrow 0$; $i' \leftarrow i' + 1$;
Example (figure): coordinator peer with i = 3, n = 2; peers P0 (j = 4, i' = 3, n' = 1), P1 (j = 4, i' = 3, n' = 2), P4 (j = 3, i' = 2, n' = 2), and P9 (j = 4, i' = 3, n' = 2). The arrows show key 9 and the IAM; checking and forwarding the key use the A2 addressing algorithm.
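A minimal sketch of the image adjustment applied when an IAM arrives, taking the level j and the address a carried by the IAM as inputs. The function name `adjust_image` is illustrative. One deliberate deviation, labeled as an assumption: the slide writes the wrap-around test with ">", while the sketch uses ">=" so that the stored image always satisfies n' < 2^i'. An image with n' = 2^i' selects the same addresses as (i' + 1, 0), so only the stored form differs.

```python
from typing import Tuple


def adjust_image(j: int, a: int) -> Tuple[int, int]:
    """New client image (i', n') computed from the IAM contents (j, a)."""
    i_img = j - 1
    n_img = a + 1
    if n_img >= (1 << i_img):   # assumption: ">=" instead of the slide's ">"
        n_img = 0
        i_img += 1
    return i_img, n_img


print(adjust_image(4, 1))
print(adjust_image(3, 3))
```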
Example of the File Expansion
[Figure: the coordinator peer moves from i = 2, n = 2 to i = 2, n = 3 as P2 (j = 2, i' = 1, n' = 1) splits, ending with j = 3, i' = 2, n' = 3, and the new peer P6 (j = 3, i' = 2, n' = 3) appears. Other peers: P0 (j = 3, i' = 2, n' = 1) and P5 (j = 3, i' = 2, n' = 2). A candidate peer starts with i' = 0, n' = 0; as a pupil its image is i' = 2, n' = 1.]
- Assign a tutor to the candidate peer: the tutor is found by the LH-hash of the candidate's IP address
- The tutor updates its pupil
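The tutor assignment can be sketched as follows. The slide only states that the tutor is found by the LH-hash of the IP address; treating the address as an integer key and evaluating the hash with the coordinator state (i, n) are assumptions of this sketch, and `tutor_for` is a hypothetical name.

```python
import ipaddress


def lh_address(key: int, i: int, n: int) -> int:
    """LH addressing rule for the file state (i, n)."""
    a = key % (1 << i)
    if a < n:
        a = key % (1 << (i + 1))
    return a


def tutor_for(ip: str, i: int, n: int) -> int:
    """Address of the peer that tutors the candidate with this IP address."""
    key = int(ipaddress.ip_address(ip))   # assumption: the IP as an integer key
    return lh_address(key, i, n)


# File state (i, n) = (2, 3): peers 0..6 exist.
print(tutor_for("10.0.0.7", 2, 3))
print(tutor_for("10.0.0.5", 2, 3))
```

Since the tutor address follows the usual LH rule, it always designates an existing peer of the current file.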
Properties of LH*RS P2P
1. The maximal number of forwarding messages for a key search is one.
2. The maximal number of rounds for a scan search is two.
3. The worst-case addressing performance of LH*RS P2P, as defined by Property 1, is the fastest possible for any SDDS or practical structured P2P addressing scheme.
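Property 1 can be checked empirically on a small model. The sketch below grows a file split by split, gives the splitting peer the split-time image, and sends every key from every image. It assumes that the new peer created by a split starts with the same image as the peer that split, and it ignores IAMs, failures, and messaging; all function names are illustrative. It is a sanity check on small files, not a proof.

```python
def h(level: int, key: int) -> int:
    return key % (1 << level)


def client_address(key: int, i_img: int, n_img: int) -> int:
    a = h(i_img, key)
    return h(i_img + 1, key) if a < n_img else a


def server_address(a: int, j: int, key: int) -> int:
    a1 = h(j, key)
    if a1 != a:
        a2 = h(j - 1, key)
        if a < a2 < a1:
            a1 = a2
    return a1


def max_forwardings(splits: int, keys) -> int:
    """Worst number of forwarding messages seen while the file grows."""
    i = n = 0
    level = {0: 0}       # bucket address -> bucket level j
    image = {0: (0, 0)}  # peer address -> client image (i', n')
    worst = 0
    for _ in range(splits):
        m, j = n, level[n]
        new = m + (1 << j)
        level[m] = level[new] = j + 1
        n += 1
        if n == (1 << i):
            i, n = i + 1, 0
        # Split-time image adjustment of the splitting peer.
        i_img, n_img = j, m + 1
        if n_img == (1 << i_img):
            i_img, n_img = j + 1, 0
        image[m] = image[new] = (i_img, n_img)  # assumption: new peer copies it
        for i_p, n_p in set(image.values()):
            for key in keys:
                first = client_address(key, i_p, n_p)
                target = server_address(first, level[first], key)
                # After at most one hop the key must be at its true address.
                assert target == client_address(key, i, n)
                worst = max(worst, 0 if target == first else 1)
    return worst


print(max_forwardings(40, range(256)))
```

The internal assertion fails if any key would need a second hop, so a normal return means no image in the model ever required more than one forwarding message.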
Proofs
Property 1, Case 1: i = i'
Peer a uses a client image (i', n') and has not received any IAM since its last split.
[Figure: address line from 0 to $2^i - 1$ with the positions of a, a', n, and $n + 2^i$, and the bucket levels j = i + 1 and j = i.]
Proofs
Property 1, Case 2: i = i' + 1
Peer a uses a client image (i', n') and has not received any IAM since its last split.
[Figure: address line from 0 to $2^i - 1$ with the positions of a, n, and $n + 2^i$, and the bucket levels j = i + 1 and j = i. Sub-cases for the position of a': a' < n; (1) n ≤ a' < a; (2) a < a' < $2^{i'}$.]
Proofs
Property 2:
- Peer a sends the scan to all buckets in its image
- Buckets that have split in the meantime resend the scan to their children
- No child can have a child of its own: the child would need to have split as well
- But then peer a would first have had to split again too
- Its client image would then have been re-adjusted
Proofs
Property 3:
- The only better worst-case performance is zero forwarding messages
- This requires notifying every peer of every split
- That would go against the scalability goal of every SDDS and structured P2P scheme
LH*RS P2P Churn Management
- A bucket reliability group with k parity buckets protects against up to k bucket failures per group
- Tutoring records are stored and treated in the same way as data records
[Figure labels: parity bucket, data bucket, rank, parity record, data record, tutoring records.]
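To illustrate how a parity record protects the data records of one reliability group, here is a deliberately simplified sketch for k = 1 that uses plain XOR parity. This is a stand-in technique chosen for brevity: LH*RS itself uses Reed-Solomon coding to tolerate k failures. The function names and the zero-padding of records are assumptions of the sketch.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_record(records: List[bytes]) -> bytes:
    """XOR parity of the data records that share one rank in a group."""
    width = max(len(r) for r in records)
    padded = [r.ljust(width, b"\0") for r in records]
    return reduce(xor_bytes, padded)


def recover(surviving: List[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing record from the survivors and the parity."""
    padded = [r.ljust(len(parity), b"\0") for r in surviving]
    return reduce(xor_bytes, padded, parity)


group = [b"alpha", b"bravo", b"delta"]    # same-rank records of three buckets
parity = parity_record(group)
print(recover([group[0], group[2]], parity))
```

A recovered record comes back padded with zero bytes when it is shorter than the longest record of the group, so a real scheme would also store the record length.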
LH*RS P2P Churn Management
Peer leaves with notice
[Figure: the coordinator peer, peers P0 … Pm … Pl with bucket levels j and a client image (i', n'), and a spare peer. Labels: notification, "say that's OK".]
LH*RS P2P Churn Management
Peer leaves without notice or fails
[Figure in two steps, showing the coordinator peer, peers Pl-1, Pl, Pm with bucket levels j and a client image (i', n'), and a parity bucket. Step 1 labels: query, forward, LH*RS bucket recovery. Step 2 labels: answer, LH*RS bucket recovery.]
LH*RS P2P Churn Management
Sure Search: protects against reading old data (communication failure)
[Figure: the coordinator peer, peers Pl-1, Pl, Pm with bucket levels j and client images (i', n'), and a parity bucket. Labels: query, answer.]
Conclusion
LH*RS P2P:
- Requires at most one forwarding message when an addressing error occurs
- Is the fastest known SDDS and P2P key-based addressing algorithm
- Protects efficiently against churn
- Allows the management of very large scalable files
Current & Future Work
- Implementation of the peer node architecture and of the tutoring functions, using the existing LH*RS prototype created by Rim Moussa and shown at VLDB 2004
- Performance analysis
- Variants
Work partly supported by the IST eGov-Bus project
References
[1] Crainiceanu, A., Linga, P., Gehrke, J., Shanmugasundaram, J. Querying Peer-to-Peer Networks Using P-Trees. Proc. of the Seventh International Workshop on the Web and Databases (WebDB 2004), June.
[2] Bolosky, W. J., Douceur, J. R., Howell, J. The Farsite Project: A Retrospective. Operating System Review, April 2007.
[3] Devine, R. Design and Implementation of DDH: A Distributed Dynamic Hashing Algorithm. Proc. of the 4th Intl. Foundation of Data Organisation and Algorithms (FODO).
[4] Litwin, W., Neimat, M-A., Schneider, D. LH*: Linear Hashing for Distributed Files. ACM-SIGMOD Int. Conf. on Management of Data, 93.
[5] Litwin, W., Neimat, M-A., Schneider, D. LH*: A Scalable Distributed Data Structure. ACM-TODS (Dec. 1996).
[6] Litwin, W., Neimat, M-A. High Availability LH* Schemes with Mirroring. Intl. Conf. on Cooperating Systems, IEEE Press.
[7] Litwin, W., Moussa, R., Schwarz, T. LH*rs: A Highly Available Distributed Data Storage. Proc. of the 30th VLDB Conference.
[8] Litwin, W., Moussa, R., Schwarz, T. LH*rs: A Highly Available Scalable Distributed Data Structure. ACM-TODS, Sept.
[9] Gribble, S. D., Brewer, E. A., Hellerstein, J. M., Culler, D. Scalable, Distributed Data Structures for Internet Service Construction. Proc. of the Fourth Symposium on Operating Systems Design and Implementation (OSDI 2000).
[10] Stoica, Morris, Karger, Kaashoek, Balakrishnan. CHORD: A Scalable Peer to Peer Lookup Service for Internet Applications. SIGCOMM, August 27-31, 2001.
Thank you for your attention