Persistence of Data in a Dynamic Unreliable Network
[Title-slide figure: a pool of nodes ranging from fastest to slowest and from flaky to stable, each contributing GB/node of idle cheap disk, feeding a Distributed Data Store w/ all the *ilities (High Availability, Good Scalability, High Reliability, Maintainability, Flexibility) built on a Reliable Substrate.]
Presented by Rachel Rubin and Hakim Weatherspoon
CS294-4: Peer-to-Peer Systems
Outline
- Motivation/desires
  - Reliable distributed data store with dynamic members
  - Harness the aggregate power of the system
- Questions about the data store
  - How data is structured
  - How to access data
  - Amount of resources used to keep data durable? Storage? Bandwidth?
- Branching
- Cost of maintaining redundancy
- Optimized implementation
- Conclusion
The Data Object
[Figure: two versions of an object, VGUIDi and VGUIDi+1, each a B-tree of indirect blocks over data blocks d1-d9; the new version shares unchanged blocks via copy-on-write, adds modified blocks d'8 and d'9, and keeps a back pointer to the previous version.]
- AGUID = hash{name + keys}
- GUID = cryptographically secure hash of a block's data; the data is therefore immutable/read-only
- GUIDs allow any node to store the data
- The red arrow in the figure (the pointer to the current head version) is the hard part
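For concreteness, a minimal sketch of these self-certifying names (SHA-256 stands in for whatever cryptographically secure hash the system actually uses; the function names are illustrative):

```python
import hashlib

def block_guid(data: bytes) -> str:
    """GUID of an immutable block: a secure hash of its contents.
    Any node may store the block, since readers verify it by re-hashing."""
    return hashlib.sha256(data).hexdigest()

def make_aguid(name: str, owner_key: bytes) -> str:
    """AGUID = hash{name + keys}: the permanent name of the whole object."""
    return hashlib.sha256(name.encode() + owner_key).hexdigest()

block = b"d1: first data block"
guid = block_guid(block)
assert block_guid(block) == guid   # verification is just re-hashing
```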
Mutable Data
- A real system needs mutable data.
- An entity in the network holds the AGUID-to-VGUID mapping:
  - Point of serialization for integrity
  - Verifies client privileges
  - Atomically applies updates
- Versioning system: each version is inherently read-only.
- End result: complex objects with mutability, as a trail of versions tied to one AGUID.
- The pointer to the head of the data is tricky; this limitation motivates heartbeats (a signed map from AGUID to the current VGUID), sketched below.
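A minimal sketch of such a heartbeat (an HMAC stands in for the serializer's signature, and all field names are assumptions, not the actual message format):

```python
import hashlib, hmac, json, time

SERIALIZER_KEY = b"stand-in-signing-key"   # a real system would use a public-key signature

def make_heartbeat(aguid: str, vguid: str, seq: int) -> dict:
    """Signed statement: 'the current head version of this AGUID is this VGUID'."""
    body = {"aguid": aguid, "vguid": vguid, "seq": seq, "ts": time.time()}
    payload = json.dumps(body, sort_keys=True).encode()
    body["sig"] = hmac.new(SERIALIZER_KEY, payload, hashlib.sha256).hexdigest()
    return body

def verify_heartbeat(hb: dict) -> bool:
    """Check the signature before trusting the AGUID -> VGUID mapping."""
    body = {k: v for k, v in hb.items() if k != "sig"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SERIALIZER_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, hb["sig"])

hb = make_heartbeat("aguid-123", "vguid-7", seq=7)
assert verify_heartbeat(hb)
```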
Branching and Versioning
- Modifying old versions of the data
  - Possible conflict when merging a modified old version with the current head
- Branching
  - Provides different data threads
  - Makes time travel more functional
  - Multiple data threads
- Operational benefits
  - Defer conflicts rather than abort updates
  - Disconnected operation
Overview of macrobranching
- Each branch is treated as its own object
- Branches are created from the main object
Macrobranching Story
[Figure: a timeline in which branch objects AGUID2, AGUID3, and AGUID4 are created from AGUID1 over time.]
Macrobranching Details
- Writing (sketched below)
  - Create the branch in the serializer
  - Record the new branch's creation in the main branch's metadata
  - Record in the new branch which object and version it was created from
  - The new AGUID needs to be managed
- Reading: from the new AGUID
- Close
  - The branch can no longer be written
  - Merge with the main branch if specified
- Recovery
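A sketch of that bookkeeping (the data structures and field names are hypothetical, not the actual implementation):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BranchObject:
    aguid: str
    parent_aguid: Optional[str] = None     # which object this branch came from
    parent_version: Optional[str] = None   # which version it was created at
    child_branches: List[str] = field(default_factory=list)
    closed: bool = False

def create_branch(main: BranchObject, at_version: str, new_aguid: str) -> BranchObject:
    """Create a branch: note it in the main branch's metadata and record
    the (object, version) it was created from in the new branch."""
    branch = BranchObject(aguid=new_aguid,
                          parent_aguid=main.aguid,
                          parent_version=at_version)
    main.child_branches.append(new_aguid)   # mark creation in the main branch
    return branch                            # the new AGUID must now be managed

def close_branch(branch: BranchObject) -> None:
    """After close, no further writes are accepted (merging is not shown)."""
    branch.closed = True
```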
Application: NFS w/ Branching and Time Travel
- Access data from a point in the past and modify from there
- Directories can be rolled back and modified
- Modifications do not automatically become the main branch head
- Organizationally cleaner
Dynamic Access
- Access data reliably in a dynamic network.
- How much does this cost?
DHT Advantages
- Spread the storage burden evenly
- Avoid hot spots
- Tolerate unreliable participants
- O(log N) algorithms
- Simple
- The DHT automatically and autonomously maintains data
  - Decides who, what, when, where, why, and how
Basic Assumptions
- P2P purist ideals: cooperation, symmetry, decentralization
- DHT assumptions
  - Simple redundancy-maintenance mechanisms on node enter and exit
  - Static data placement strategy (f: RB -> N)
  - Identical per-node space and bandwidth contributions
  - Constant rate of entering and exiting; independence of exit events
  - Constant steady-state number of nodes and total data size
- Maintenance bandwidth: average-case analysis
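One common reading of "static data placement" is a consistent-hashing-style rule that assigns each redundant block to nodes purely by ID, ignoring node quality. A hedged sketch (not necessarily the exact strategy the slide assumes):

```python
import hashlib
from typing import List

def node_key(node_id: str) -> str:
    return hashlib.sha1(node_id.encode()).hexdigest()

def place(block_guid: str, node_ids: List[str], k: int) -> List[str]:
    """Static placement f: redundant blocks -> nodes.
    The k replicas/fragments of a block go to the k nodes whose hashed IDs
    follow the block's hash on a circular ID space."""
    ring = sorted(node_ids, key=node_key)
    key = hashlib.sha1(block_guid.encode()).hexdigest()
    start = next((i for i, n in enumerate(ring) if node_key(n) >= key), 0)
    return [ring[(start + i) % len(ring)] for i in range(k)]

print(place("some-block-guid", ["alpha", "beta", "gamma", "delta", "eps"], k=3))
```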
Basic Cost Model
- N: number of hosts
- D: amount of unique data
- S: data plus redundancy (S = kD)
- λ: entering rate; μ: exiting rate (λ = μ in steady state)
- T: lifetime (T = N/λ)
- B: bandwidth
- τ: membership timeout, used to distinguish true departures from temporary downtime by delaying the response to failures
- a: availability; hosts serve data only a fraction of the time, so more redundancy is needed and effective bandwidth is reduced
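Under these assumptions there is a simple average-case estimate, consistent with the numbers on the next slide: each node must re-copy its share of the stored redundant data, S/N, roughly once per lifetime T, so

    B/N ≈ (S/N) / T = kD / (N·T)

(the exact constant depends on how joins and leaves are handled, so treat this as an approximation rather than the presenters' precise formula).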
BW for Redundancy Maintenance
- Maintenance BW: 200 Kbps
- Lifetime = median 2001 Gnutella session = 1 hour
- Served space = 90 MB/node << donatable storage!
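A quick numeric check of these figures using the estimate from the cost-model slide (a sketch; the decimal-MB convention is my assumption):

```python
served_bytes_per_node = 90e6      # 90 MB/node of served space (decimal MB assumed)
lifetime_seconds = 3600.0         # median 2001 Gnutella session: 1 hour

# Each node re-copies its served share roughly once per lifetime.
bandwidth_bps = served_bytes_per_node * 8 / lifetime_seconds
print(f"{bandwidth_bps / 1e3:.0f} Kbps")   # -> 200 Kbps, matching the slide
```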
Need Too Much BW to Maintain Redundancy
- High availability, scalable storage, dynamic membership: must pick two
- Wait! It gets worse… hardware trends
Hardware Trends
- The picture only gets worse
- Participation should be more stable for nodes to contribute a meaningful fraction of their disks
Solution: Indirection
Distributed directory (DD)
- Uses a level of indirection
- Decouples the networking layer from the data layer
- Controls data placement
- Exploits heterogeneity (availability, lifetime, and bandwidth)
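A sketch of the indirection, under my reading that pointer locations are still fixed by hashing while the data itself is steered toward reliable nodes; class and field names are illustrative:

```python
import hashlib
from typing import Dict, List

class DistributedDirectory:
    """Directory nodes hold pointers to data; the data lives on nodes
    chosen for their measured availability rather than by hash alone."""
    def __init__(self, directory_nodes: List[str], storage_nodes: List[dict]):
        self.directory_nodes = sorted(directory_nodes)
        self.storage_nodes = storage_nodes           # [{"id": ..., "availability": ...}]
        self.pointers: Dict[str, List[str]] = {}     # guid -> storage node ids

    def directory_node_for(self, guid: str) -> str:
        """The pointer's home is still determined DHT-style, by hashing."""
        h = int(hashlib.sha1(guid.encode()).hexdigest(), 16)
        return self.directory_nodes[h % len(self.directory_nodes)]

    def publish(self, guid: str, replicas: int) -> List[str]:
        """Data placement is now a free choice: prefer the most reliable nodes."""
        best = sorted(self.storage_nodes, key=lambda n: -n["availability"])[:replicas]
        self.pointers[guid] = [n["id"] for n in best]
        return self.pointers[guid]

    def lookup(self, guid: str) -> List[str]:
        return self.pointers.get(guid, [])
```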
Models for Comparison I: DHT vs DD
Data extracted from [Bhagwan, Savage, and Voelker 2003]
Models for Comparison II: DHT vs DD
- Reliable nodes: greater than 70% availability
- Model 1: DHT
- Model 2
  - Reliable nodes (Model 2.a): store data and DD pointers
  - Unreliable nodes (Model 2.b): store DD pointers only
- Model 3
  - Reliable nodes (Model 3.a): store all data and DD pointers
  - Unreliable nodes (Model 3.b): do nothing (i.e., free loaders)
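For concreteness, the three models can be written down as a role assignment (the 70% threshold comes from the slide; the wording of each role is mine):

```python
def role(model: int, availability: float) -> str:
    """What a node stores under each comparison model."""
    reliable = availability > 0.70
    if model == 1:     # Model 1: plain DHT, all nodes treated alike
        return "data + DHT routing state"
    if model == 2:     # Model 2: data only on reliable nodes
        return "data + DD pointers" if reliable else "DD pointers only"
    if model == 3:     # Model 3: unreliable nodes contribute nothing
        return "all data + DD pointers" if reliable else "nothing (free loader)"
    raise ValueError(f"unknown model {model}")

print(role(2, 0.95), "|", role(2, 0.30))
```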
BW/N vs Lifetime
[Graph: per-node maintenance bandwidth (BW/N) versus node lifetime.]
BW/N vs Data/Ptr
[Graph: per-node maintenance bandwidth (BW/N) versus the data-to-pointer ratio.]
Replication vs Coding
[Graph: comparison of replication and erasure coding.]
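The graph itself is not reproduced here, but the standard comparison behind it can be sketched: with per-node availability a, a block with k full replicas survives if any replica is reachable, while an (n, m) erasure-coded block needs any m of its n fragments; at equal storage overhead (n/m = k), coding gives far higher availability. A sketch under those assumptions:

```python
from math import comb

def avail_replication(a: float, k: int) -> float:
    """P(at least one of k replicas is on an available node)."""
    return 1 - (1 - a) ** k

def avail_coding(a: float, n: int, m: int) -> float:
    """P(at least m of n erasure-coded fragments are available)."""
    return sum(comb(n, i) * a**i * (1 - a)**(n - i) for i in range(m, n + 1))

a = 0.5                                  # pessimistic per-node availability
print(avail_replication(a, 4))           # 4 replicas   -> ~0.94
print(avail_coding(a, 32, 8))            # (32, 8) code -> ~0.999, same 4x overhead
```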
Problems with DD I
- Ratio of data to pointer size: need D/P > kp·m
- Memory leaks
  - No pointer to the data
  - Solved with redundancy in pointers
- Dangling pointers
  - The node is dead, or the node removed the data but not the pointer
  - Solved with heartbeats
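A back-of-the-envelope reading of the D/P condition, under the assumption (mine, not stated on the slide) that kp is the pointer replication factor and m the number of fragments per object, so that total pointer state stays a small fraction of the data:

```python
def pointer_overhead(data_bytes: float, ptr_bytes: float, kp: int, m: int) -> float:
    """Total pointer state relative to data size: kp copies of a pointer for
    each of an object's m fragments, divided by the object's data size."""
    return (kp * m * ptr_bytes) / data_bytes

# Hypothetical numbers: overhead is negligible only when D/P >> kp * m.
print(pointer_overhead(data_bytes=8e6, ptr_bytes=100, kp=5, m=32))   # ~0.002
```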
Problems with DD II
- Heartbeats: freshness/accuracy vs bandwidth
- Routing using pointers: infrastructure vs application
- Complexity: need to decide who, what, where, when, why, and how to maintain redundancy
Efficient Heartbeats
- Exploit the locality properties of Pastry/Tapestry: efficient detection, but slow
- O(N) host knowledge if objects per node > N
[Figure: a binary prefix tree over node IDs 000-111, grouped into progressively larger shared-prefix neighborhoods (00*, 01*, 10*, 11*, then 0**, 1**).]
- Global sweep and repair is not efficient; we want detection of node removal from the system, then reconstruction of fragments, but detection alone is not efficient.
- The system should automatically adapt to failure, repair itself, and incorporate new elements.
- Can we guarantee data is available for 1000 years? New servers are added and old servers removed from time to time, and everything just works; many components with geographic separation mean the system is not disabled by natural disasters and can adapt to changes in demand and regional outages; stability is gained through statistics.
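A toy illustration of using ID-prefix locality for monitoring (this is my reading of the figure, not the presenters' algorithm): a node exchanges heartbeats within its smallest shared-prefix group and escalates to enclosing groups only when something goes quiet.

```python
from collections import defaultdict
from typing import Dict, List

def prefix_groups(node_ids: List[str], level: int) -> Dict[str, List[str]]:
    """Group binary node IDs by their first `level` bits, e.g. '01*'."""
    groups = defaultdict(list)
    for nid in node_ids:
        groups[nid[:level] + "*" * (len(nid) - level)].append(nid)
    return dict(groups)

nodes = ["000", "001", "010", "011", "100", "101", "110", "111"]
print(prefix_groups(nodes, 2))   # {'00*': ['000', '001'], '01*': ['010', '011'], ...}
print(prefix_groups(nodes, 1))   # {'0**': [...four nodes...], '1**': [...]}
```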
When Triggers
- The root knows the object's redundancy level
- A threshold triggers repair
- Routing to the object: needs infrastructure
[Figure: a client routes through several hops (levels L1-L4) of a Tapestry-style mesh to the object's root node, which points to the nodes holding Fragment-1 and Fragment-2.]
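A sketch of the trigger logic at the root (the threshold value and method names are hypothetical):

```python
class ObjectRoot:
    """Tracks which hosts hold an object's fragments and triggers repair
    when the count of live fragments drops below a threshold."""
    def __init__(self, fragment_hosts, threshold: int):
        self.fragment_hosts = set(fragment_hosts)
        self.threshold = threshold

    def host_failed(self, host: str) -> bool:
        """Called when a heartbeat timeout declares a host gone.
        Returns True when a repair should be triggered."""
        self.fragment_hosts.discard(host)
        return len(self.fragment_hosts) < self.threshold

    def repaired(self, new_hosts) -> None:
        """Record hosts that now hold regenerated fragments."""
        self.fragment_hosts.update(new_hosts)

root = ObjectRoot(["hostA", "hostB", "hostC", "hostD"], threshold=3)
print(root.host_failed("hostD"))   # False: 3 fragments left, still at threshold
print(root.host_failed("hostC"))   # True: below threshold, trigger repair
root.repaired(["hostE", "hostF"])  # reconstruction itself is not shown
```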
Conclusions
- Immutable data assists in secure read-only data and caching infrastructures, and in continuous adaptation and repair
- DHTs do NOT consider the suitability of a peer for a specific task before delegating the task to that peer
- Differentiating between reliable and unreliable nodes saves bandwidth; the savings increase as the gap (e.g., the reliability gap) widens
- A Distributed Directory utilizes reliable nodes
  - Needs Data/Ptr > 10,000
  - Must prevent memory leaks with pointer redundancy and dangling pointers with heartbeats
  - Heartbeats require O(N) host knowledge