1
The Oceanstore Regenerative Wide-area Location Mechanism
Ben Zhao, John Kubiatowicz, Anthony Joseph
Endeavor Retreat, June 2000
2
Wide-area Location
Increasing scale more vital to distributed systems
Existing wide-area location mechanisms:
–SLP - WASRV extension
–LDAP centroids
–Berkeley SDS
–Globe location system
Unresolved issues:
–True scalability
–Fault tolerance against:
    Single/multiple node failures
    Network partitions
    Data corruption and malicious users
    Denial of Service attacks
–Support for high update rates / mobile data
Oceanstore approach:
–Wide-area location using Plaxton trees
3
Previous Work: Plaxton Trees
Distributed tree structure where every node is the root of a tree
Simple mapping from object ID to root ID of the tree it belongs to
Nodes keep “nearest” neighbor pointers differing in 1 ID digit
–Along with a referrer map of closest referrer nodes
Insertion:
–Insert object at local node
–Proceed to root node hop by hop
–Leave back-pointers to object at each hop
Benefits:
–Decouples tree traversal from any single node
–Exploits locality with short-cutting back-pointers
–Guaranteed # of hops O(Log(N)) where N = size of namespace
Query:
–Proceed to root node hop by hop
–If intersect node w/ desired back-pointer, follow it
–If root or best approximate node reached and no pointer to object, then it has not been inserted
[Figure: two panels. “Inserting Obj #62942” shows nodes 116, 479, 529, 629; “Searching Obj #62942” shows nodes 116, 479, 529, 629, 675, 109. Labels: Root Node, Search Client, Object Location.]
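The insertion and query steps above can be sketched roughly as follows. This is a minimal illustration, not the Oceanstore implementation: the `Network` and `Node` classes, the explicit neighbor lists, and the choice of 629 as the root for object 62942 are all assumptions read off the node labels in the slide's figure. The slide does not spell out how an object ID maps to a root ID, so the root ID is passed in directly.

```python
def shared_suffix_len(a: str, b: str) -> int:
    """Number of trailing digits that two IDs have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


class Node:
    def __init__(self, node_id, neighbors):
        self.node_id = node_id
        self.neighbors = neighbors  # hypothetical "nearest" neighbor IDs
        self.pointers = {}          # object ID -> ID of the node holding it


def next_hop(node, root_id):
    """Pick a neighbor that matches more trailing digits of the root ID."""
    level = shared_suffix_len(node.node_id, root_id)
    better = [n for n in node.neighbors
              if shared_suffix_len(n, root_id) > level]
    if not better:
        return None  # this node is the root or the best approximation
    return min(better, key=lambda n: shared_suffix_len(n, root_id))


class Network:
    def __init__(self, links):
        self.nodes = {nid: Node(nid, nbrs) for nid, nbrs in links.items()}

    def path_to_root(self, start_id, root_id):
        path = [start_id]
        while True:
            hop = next_hop(self.nodes[path[-1]], root_id)
            if hop is None:
                return path
            path.append(hop)

    def insert(self, object_id, root_id, holder_id):
        # Leave a back-pointer to the holder at every hop toward the root.
        for nid in self.path_to_root(holder_id, root_id):
            self.nodes[nid].pointers[object_id] = holder_id

    def locate(self, object_id, root_id, client_id):
        # Walk toward the root; follow the first back-pointer encountered.
        for hops, nid in enumerate(self.path_to_root(client_id, root_id)):
            holder = self.nodes[nid].pointers.get(object_id)
            if holder is not None:
                return holder, hops
        return None  # reached the root without a pointer: not inserted


# Assumed topology, loosely based on the node labels in the figure.
links = {"116": ["479"], "479": ["529"], "529": ["629"],
         "629": [], "675": ["109"], "109": ["529"]}
net = Network(links)
net.insert("62942", "629", holder_id="116")
print(net.path_to_root("116", "629"))               # ['116', '479', '529', '629']
print(net.locate("62942", "629", client_id="675"))  # ('116', 2)
```

In this sketch the query from 675 never reaches the root: it meets the back-pointer at 529 after two hops, which is the short-cutting behavior listed under Benefits.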
4
Introducing Sibling Meshes
Set of meta-tree structures that assist node insertion and fault recovery
–Every node keeps n ptrs to “similar” nodes w.r.t. 1 property
–Example: all nodes ending in 629 belong to a single mesh
–Each node belongs to Log(N) meshes, where N = number of unique IDs in namespace
–Meshes decrease in size as granularity becomes finer
Each mesh represents a single hop on the route to a given root.
Sibling nodes maintain pointers to each other.
(Optional) Each referrer has pointers to the desired node’s siblings.
[Figure: Plaxton Trees / Ground Level beneath the Oceanstore Canopy. Labels: 629, 29 Level, 9 Level, Single path to root, Sibling pointers, Single hops to root.]
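One way to picture mesh membership is to group node IDs by shared suffix, one mesh per suffix length. The sketch below assumes fixed-length decimal IDs and a global view of all nodes, which a real deployment would not have; `build_meshes` and `siblings` are hypothetical helper names, and the four-digit IDs are made up for illustration.

```python
from collections import defaultdict


def build_meshes(node_ids):
    """Map every ID suffix to the set of nodes whose IDs end with it."""
    meshes = defaultdict(set)
    for nid in node_ids:
        for k in range(1, len(nid) + 1):
            meshes[nid[-k:]].add(nid)
    return meshes


def siblings(meshes, node_id, suffix_len):
    """Other members of the mesh that node_id joins at this granularity."""
    return sorted(meshes[node_id[-suffix_len:]] - {node_id})


# Hypothetical four-digit node IDs.
ids = ["1629", "7629", "4529", "0029", "3339"]
meshes = build_meshes(ids)

print(sorted(meshes["629"]))        # ['1629', '7629']
print(siblings(meshes, "1629", 2))  # ['0029', '4529', '7629']
print(len(meshes["9"]), len(meshes["29"]), len(meshes["629"]))  # 5 4 2
```

Each node lands in one mesh per digit of its ID, and the last line shows the meshes shrinking as the matched suffix grows.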
5
Building a Plaxton Grove
Incremental node addition algorithm
For new node N_n to be added with nodeID D:
–Do a hop by hop search for D
–At each hop, visit X closest nodes
–For each node N_i in set X:
    Integrate next neighbor map from N_i neighbor map
    Take referrer list from N_i
    Measure distance between each referrer and N_n
    If new distance shorter, notify referrer to point to N_n
–Stop: when no exact match for each digit of ID found
    Redirect those referrers that are looking for IDs closer to you to point to you
[Figure: new node N_n; at each hop, aggregate neighbor and referrer maps from closest nodes. Labels: Domains of Influence, Neighbor nodes, New node N_n, ID Granularity.]
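The referrer-redirection step of the algorithm above (measure each referrer's distance to N_n and repoint it if the new distance is shorter) might look like this in isolation. The function name, the one-dimensional coordinates, and the `line_distance` metric are stand-ins chosen for illustration; the slides do not say how distance is measured.

```python
def referrers_to_redirect(referrers, existing_id, new_id, distance):
    """Return the referrers that are strictly closer to new_id than to existing_id."""
    return [r for r in referrers
            if distance(r, new_id) < distance(r, existing_id)]


# Hypothetical 1-D coordinates standing in for measured network distance.
position = {"N_i": 0.0, "N_n": 10.0, "r1": 2.0, "r2": 8.0, "r3": 5.0}


def line_distance(a, b):
    return abs(position[a] - position[b])


moved = referrers_to_redirect(["r1", "r2", "r3"], "N_i", "N_n", line_distance)
print(moved)  # ['r2']
```

Only `r2` is notified: `r1` stays closer to N_i, and `r3` is equidistant, so the "shorter" test leaves its pointer alone.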
6
Node Removal / Failure
Simple detection of pointer corruption and node failure
Recovery from node failure and data corruption
–Mark node pointer with invalid tag
–Use next closest sibling of failed node
Invalid pointers have a second-chance time-to-live
–Failures expected to recover within finite timeframe
–Entries marked invalid with countdown timer
–Each request has some chance of being forwarded to the invalid node, to check whether recovery has completed
–Referrer tracks traffic to failed node and assigns each packet a “validation” probability
–Restarted node notifies referrers to remove invalid tag
–Nodes that fail to recover within the timer period must reinsert as new nodes
Node removals = intentional exits from system
–Actively announce removal to referrers, invalidation skipped
–Referrers maintain backups by requesting another sibling ptr
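A rough sketch of the second-chance behavior described above: a referrer entry carries an invalid tag with a countdown, most traffic goes to the next closest sibling, and an occasional request is sent to the failed node as a probe. The `PointerEntry` class and its parameters are hypothetical. In particular, the slide says the validation probability is derived from tracked traffic but not how, so it is simply passed in here.

```python
import random


class PointerEntry:
    """A referrer's pointer to a node, plus a fallback sibling."""

    def __init__(self, primary, sibling, grace_period):
        self.primary = primary
        self.sibling = sibling
        self.grace_period = grace_period  # length of the countdown timer
        self.invalid_since = None         # None means the pointer is valid

    def mark_invalid(self, now):
        self.invalid_since = now

    def mark_recovered(self):
        # Called when the restarted node notifies its referrers.
        self.invalid_since = None

    def expired(self, now):
        return (self.invalid_since is not None
                and now - self.invalid_since > self.grace_period)

    def choose_target(self, now, probe_probability, rng=random):
        if self.invalid_since is None:
            return self.primary
        if self.expired(now):
            return self.sibling  # primary must reinsert as a new node
        if rng.random() < probe_probability:
            return self.primary  # probe: has the node recovered yet?
        return self.sibling


entry = PointerEntry(primary="629", sibling="1629", grace_period=30)
entry.mark_invalid(now=100)
print(entry.choose_target(now=110, probe_probability=0.0))  # 1629
entry.mark_recovered()
print(entry.choose_target(now=120, probe_probability=0.0))  # 629
```

Passing `now` and `rng` explicitly keeps the sketch deterministic under test; a real node would read its own clock.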
7
Self-tuning and Stability
Self-maintenance built into searches:
Self-tuning of non-optimal routes
–Keep running totals of subpath distances
–Inform nodes of better routes
Stability
–Overlay multiple mappings of nodeIDs onto physical nodes
–Single queries handled by multiplexing into one query per map
–Overlap provides additional security against single and multiple server failures, network partitions, and corrupted data
Temporary map pointers
–Referrer and neighbor entries have time-to-live fields
–Renewal by usage (implicit) or explicit renewal messages
–Implicit priority queue where least often used paths can be “forgotten” in favor of more vital paths
–Natural node recovery: wait for messages to renew maps
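The temporary map pointers above can be illustrated with a small table whose entries expire unless they are used. This is a sketch under assumptions: `ExpiringMap` is a hypothetical name, a single fixed TTL is used for every entry, and only the implicit renewal-by-usage path is shown.

```python
class ExpiringMap:
    """Neighbor or referrer entries that lapse unless renewed by use."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._entries = {}  # key -> (value, expiry time)

    def put(self, key, value, now):
        self._entries[key] = (value, now + self.ttl)

    def get(self, key, now):
        item = self._entries.get(key)
        if item is None:
            return None
        value, expiry = item
        if now >= expiry:
            del self._entries[key]  # the unused path is "forgotten"
            return None
        # Implicit renewal: a successful lookup extends the entry's life.
        self._entries[key] = (value, now + self.ttl)
        return value


table = ExpiringMap(ttl=10)
table.put("9", "0479", now=0)
table.put("29", "4529", now=0)
table.get("9", now=8)            # use renews the entry until time 18
print(table.get("9", now=15))    # 0479
print(table.get("29", now=15))   # None
```

Entries that carry traffic keep renewing themselves, while idle ones drop out, which gives the implicit priority ordering mentioned on the slide.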
8
Node Replication
Remove single node bottleneck
–Critical for load balancing
–Adds fault-tolerance at Root nodes
Node replication
–Copy Neighbor-mapping, then regenerate
–Redirect referrer traffic
Replicate Groups
–Share referrer mappings
–Use peer monitoring to detect node failure and redirect traffic as necessary
[Figure: replicated nodes; node labels 629 and 116.]
9
Malicious Users and DoS (Ongoing...)
Misleading/malicious advertisement
–Source validation at storage nodes
–Orthogonal mechanism ensures association between advertisement and principal of trust
Denial of Service attacks
–Overload of infrastructure nodes
    Routing and storage load distribution via node replication
–DoS source identification
    Probabilistic source packet stamping (Savage et al.)
Invalidation propagation
–Invalidations can be given by authoritative servers
–Can propagate as datagram to referrers
10
More Ongoing Issues
Support for mobility
–Mobile data
    All back-pointers point to initial node (N_init)
    Location updates sent to previous node and N_init
    All back-pointers other than from root can be updated to new position by current query
    Traveled node pointers time out via pointer expiration
    On failure, revert back to root, then to N_init for current position
–Mobile clients and asynchronous operations
    Chain location updates on visited nodes
    When expected asynchronous requests satisfied, node leaves chain, and informs previous node of forward link
Oceanstore location as routing infrastructure