1
FutureGrid Status Report
Steven Hand steven.hand@cl.cam.ac.uk
Joint project with Jon Crowcroft (CUCL), Tim Harris (CUCL), Ian Pratt (CUCL), Andrew Herbert (MSR), Andy Parker (CeSC)
2
Grid systems architecture
Common themes of self-organization and distribution.
Four motivating application areas:
1. Massively-scalable middleware
2. Advanced resource location mechanisms
3. Automatic s/w replication and distribution
4. Global data storage and publishing
Experimental test beds (PlanetLab, UK eScience centres / access grid / JANET).
3
Common Techniques
P2P (DHT) layer for distribution:
- Using the Bamboo routing substrate (Intel Research) for passing messages between peers
- Provides a fault-tolerant and scalable overlay net
- Can route a message to any node in an n-node network in O(ln n) hops, with O(ln n) routing information at each node
Location/Distance Service:
- Basic idea: Euclidean co-ordinate space for the Internet
- Using PCA + lighthouse/virtual landmark techniques
- Running PlanetLab measurements looking at sensitivity to #dimensions, etc.
- Building "plug in" service for Bamboo neighbour selection (sketched below)
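The "plug in" neighbour selection can be pictured with a small sketch: when several peers are eligible for the same Bamboo routing-table slot, prefer the one closest in the synthetic coordinate space. The function and data names below are illustrative assumptions, not Bamboo's actual interface.

```python
import math

def select_neighbour(candidates, coords, self_node):
    """Among peers eligible for one routing-table slot, pick the one with the
    smallest Euclidean distance to us in the virtual coordinate space."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(candidates, key=lambda peer: distance(coords[peer], coords[self_node]))

# Example: three peers compete for the same slot; peerB is nearest.
coords = {
    "self":  (0.0, 0.0, 0.0),
    "peerA": (12.0, 3.0, 1.0),
    "peerB": (2.0, 1.5, 0.5),
    "peerC": (30.0, 8.0, 4.0),
}
print(select_neighbour(["peerA", "peerB", "peerC"], coords, "self"))  # -> peerB
```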
4
1. P2P Group Communication
Built on Bamboo and the location service:
- Use the location service (co-ordinates) to get the best forward route
- Use the RPF (Scribe) algorithm to build the tree (sketched below)
- Tree is at most 2x the delay of the native IP multicast tree
- Can build per source, or a "centered" tree based on density of group, #senders, #receivers, ...
Current status:
- General system deployed and under test on PlanetLab
- Whiteboard demo program works on top of this
Next steps:
- IP multicast tunnels across multicast-incapable 'chasms'
- P2P overlay for vic/rat/access grid anticipated end '04
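A minimal sketch of the Scribe-style reverse-path tree construction: each member routes a join toward the overlay node owning the group id, and every hop records the previous hop as a child, stopping as soon as it reaches a node already in the tree. The node names and route lists are invented for illustration.

```python
# (group_id, node) -> set of children in the multicast tree
children = {}

def join_group(group_id, member, route_to_root):
    """Scribe-style join: walk the overlay route toward the rendezvous node,
    adding the previous hop as a child at each step; stop once we hit a node
    that is already forwarding for this group."""
    prev = member
    for node in route_to_root:
        kids = children.setdefault((group_id, node), set())
        already_forwarding = bool(kids)
        kids.add(prev)
        if already_forwarding:
            return            # grafted onto the existing tree
        prev = node

join_group("g1", "A", ["N1", "N2", "ROOT"])
join_group("g1", "B", ["N3", "N2", "ROOT"])   # stops at N2, already in the tree
print(children)
```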
5
2. Distributed resource location
1. Determine machine locations and resource availability
2. Translate to locations in a multi-dimensional search space
3. Partition/replicate the search space
4. Queries select portions of the search space (steps 2 and 4 are sketched below)
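Steps 2 and 4 can be sketched as follows: each machine's availability becomes a point in a multi-dimensional space, and a query selects an axis-aligned box of that space; in the real system the space itself would also be partitioned and replicated across peers (step 3). The attribute names and values below are invented.

```python
# Each node's availability becomes a point in a multi-dimensional space.
nodes = {
    "node1": {"free_cpu": 0.8, "free_mem_gb": 4.0, "net_mbps": 100},
    "node2": {"free_cpu": 0.2, "free_mem_gb": 1.0, "net_mbps": 10},
    "node3": {"free_cpu": 0.6, "free_mem_gb": 8.0, "net_mbps": 1000},
}
dims = ["free_cpu", "free_mem_gb", "net_mbps"]

def to_point(attrs):
    return tuple(attrs[d] for d in dims)

def range_query(lo, hi):
    """Return nodes whose point lies inside the axis-aligned box [lo, hi]."""
    return [name for name, attrs in nodes.items()
            if all(l <= v <= h for v, l, h in zip(to_point(attrs), lo, hi))]

print(range_query(lo=(0.5, 2.0, 50), hi=(1.0, 16.0, 10000)))  # -> ['node1', 'node3']
```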
6
Current Focus
Location-based resource co-allocation:
- Wish to choose a subset of available nodes according to resource availability and location
- First filter the set, then use a heuristic to solve constrained problems of the form far(near(S1,S2), near(S3,S4), C1)
- System built around a P2P spatial index
Three-phase algorithm:
1. Find an approximate solution in terms of clusters
2. Use simulated annealing to minimize the associated cost (sketched below)
3. Select representative machine(s) for each cluster
Results close to 'brute force' (average 10% error)
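The slides do not give the cost function, so the following is only a rough sketch of the annealing phase (phase 2), assuming invented cluster coordinates and a toy penalty that pulls "near" pairs together and pushes "far" pairs apart.

```python
import math, random

clusters = {"c1": (0, 0), "c2": (1, 1), "c3": (10, 10), "c4": (11, 9)}
near_pairs = [("S1", "S2"), ("S3", "S4")]   # invented constraints
far_pairs = [("S1", "S3")]
slots = ["S1", "S2", "S3", "S4"]

def dist(a, b):
    return math.dist(clusters[a], clusters[b])

def cost(assign):
    near_cost = sum(dist(assign[a], assign[b]) for a, b in near_pairs)
    far_cost = sum(1.0 / (1.0 + dist(assign[a], assign[b])) for a, b in far_pairs)
    return near_cost + far_cost

def anneal(steps=5000, temp=5.0, cooling=0.999):
    """Randomly perturb the slot-to-cluster assignment, accepting worse moves
    with a probability that shrinks as the temperature cools."""
    assign = {s: random.choice(list(clusters)) for s in slots}
    best, best_cost = dict(assign), cost(assign)
    for _ in range(steps):
        candidate = dict(assign)
        candidate[random.choice(slots)] = random.choice(list(clusters))
        delta = cost(candidate) - cost(assign)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            assign = candidate
            if cost(assign) < best_cost:
                best, best_cost = dict(assign), cost(assign)
        temp *= cooling
    return best, best_cost

print(anneal())
```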
7
3. P2P Computing
Attempt to use P2P communication principles to gain similar benefits for grid computing.
Proceeding on three axes, targeting core computation, bioinformatics and batch workloads.
Algorithm-specific load tolerance:
- Want to allow decentralized, independent load shedding
- Client submits a parallel computation to M > N nodes such that any N results suffice to produce the 'correct' result (sketched below)
- General case intractable: focus on algorithm-specific solutions
- Current focus on matrix operations using erasure codes
- Also considering sketches as an approximation technique
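One way to make "any N of M results suffice" concrete for matrix work is a real-valued erasure code over row-blocks: encode N blocks of A into M coded blocks with a Vandermonde generator, have each worker multiply its coded block by x, and decode from whichever N workers answer first. This is only an illustrative coding scheme under invented dimensions, not necessarily the project's design; it uses NumPy.

```python
import numpy as np

N, M = 3, 5                                   # N data blocks, M > N workers
rng = np.random.default_rng(0)
A = rng.standard_normal((12, 8))              # 12x8 matrix, split into N row-blocks
x = rng.standard_normal(8)
blocks = np.split(A, N)                       # the N systematic pieces (4 rows each)

# M x N Vandermonde generator: any N of its rows form an invertible matrix.
G = np.vander(np.arange(1.0, M + 1), N, increasing=True)
coded = [sum(G[i, j] * blocks[j] for j in range(N)) for i in range(M)]

# Each worker i computes coded[i] @ x; suppose only workers 0, 2 and 4 reply.
partials = {i: coded[i] @ x for i in (0, 2, 4)}

ids = sorted(partials)
decode = np.linalg.inv(G[ids, :])             # N x N, invertible
recovered = decode @ np.stack([partials[i] for i in ids])
result = np.concatenate(list(recovered))      # = A @ x, rebuilt from 3 of 5 replies
assert np.allclose(result, A @ x)
print("recovered A @ x from", len(ids), "of", M, "workers")
```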
8
P2P Computing (2)
Indexing genomic sequences (3 x 10^9):
- Based on suffix array indexes; supports string matching, motif deletion, sequence alignments, etc. (sketched below)
- Smaller memory requirements than state-of-the-art suffix trees
- Distributed on-line construction using the P2P overlay
- Caching issues (memory and swap) need investigation
Batch-aware 'spread spectrum' storage:
- Observe that many batches share considerable data
- Want to encourage client-driven distribution of data but avoid centralized quotas and pathological storage use
- Use the Palimpsest P2P storage system with 'soft guarantees'
- Data is discarded under load => needs refresh to be kept
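A toy, single-machine suffix-array example shows the string-matching side; the distributed on-line construction and the memory savings over suffix trees are not reproduced here. The binary search uses the `key=` form of `bisect`, available from Python 3.10.

```python
from bisect import bisect_left, bisect_right

def build_suffix_array(text):
    """Naive construction (sort all suffixes); fine for a small illustration."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """All positions where `pattern` occurs, via binary search over the
    lexicographically sorted suffixes (requires Python 3.10+ for key=)."""
    prefix = lambda i: text[i:i + len(pattern)]
    lo = bisect_left(sa, pattern, key=prefix)
    hi = bisect_right(sa, pattern, key=prefix)
    return sorted(sa[lo:hi])

genome = "ACGTACGTGACG"
sa = build_suffix_array(genome)
print(find_occurrences(genome, sa, "ACG"))    # -> [0, 4, 9]
```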
9
4. Global Data Storage
Global-scale distributed file system:
- Mutability; shared directories; random access
- Data permanence, quotas
- Aggressive, localized caching in proportion to demand, while maintaining coherence
Storage Nodes:
- Confederated, well connected, relatively stable
- Offer multiples of a unit of storage in return for quota they can distribute amongst users
Clients:
- Access via the nearest Storage Node
10
Basic Storage Technique
- Immutable data blocks, mutable index blocks
- Block Id is H(contents), or H(public key) for index blocks
- Insert a block by using Bamboo to route it to the node with Id nearest to the Id of the block (sketched below)
- Maintain replicas on adjacent nodes for redundancy
- Send storage vouchers to the user's accountant nodes
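A small sketch of the placement rule: compute the block's Id as a hash and map it to the node whose Id is numerically nearest. SHA-1, the integer Id form, and deriving node Ids from node names are illustrative assumptions; in the real system Bamboo does the routing and adjacent nodes hold the replicas.

```python
import hashlib

def block_id(payload):
    """Id is H(contents) for immutable blocks, or H(public key) for index
    blocks; SHA-1 and the 160-bit integer form are illustrative choices."""
    return int.from_bytes(hashlib.sha1(payload).digest(), "big")

def responsible_node(block_payload, node_ids):
    """The block is stored on the node whose Id is numerically nearest to the
    block's Id; replicas would go to the adjacent nodes."""
    bid = block_id(block_payload)
    return min(node_ids, key=lambda nid: abs(nid - bid))

nodes = [block_id(name.encode()) for name in ("storage-a", "storage-b", "storage-c")]
print(hex(responsible_node(b"some immutable data block", nodes)))
```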
11
Content-based chunking
- Blocks with the same Id are reference counted
- Clients split files with a content-based hash: a Rabin fingerprint over a 48-byte sliding window (sketched below)
- Similar files share blocks
- Reduces storage requirements
- Improves caching performance
[Figure: a file's blocks B1-B7. Read: return the data. Write: insert new block B8 and withdraw B2 and B3. Insert: insert new block B9 and withdraw B6; B7 unchanged.]
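A sketch of content-defined chunking over a 48-byte window: a simple polynomial rolling hash stands in for the true Rabin fingerprint, and a boundary is declared whenever the low bits of the window hash are zero, so boundaries (and hence shared blocks) depend only on content. The mask size and hash constants are invented.

```python
import os

WINDOW = 48                 # sliding-window size from the slide
MASK = (1 << 13) - 1        # ~8 KiB expected chunk size (illustrative)
BASE, MOD = 257, (1 << 61) - 1
BASE_TOP = pow(BASE, WINDOW - 1, MOD)

def chunk_boundaries(data):
    """Return byte offsets at which a new chunk starts (content-defined)."""
    boundaries, h = [], 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * BASE_TOP) % MOD   # drop byte leaving the window
        h = (h * BASE + byte) % MOD                       # bring in the new byte
        if i >= WINDOW - 1 and (h & MASK) == 0:
            boundaries.append(i + 1)
    return boundaries

data = os.urandom(100_000)
print(chunk_boundaries(data)[:5])    # first few boundaries, roughly 8 KiB apart
```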
12
Summary and Future Work
Attempt to push towards a 'Future GRID':
- Four 'strands' with common themes and (some) common infrastructure
- Group communication, resource co-allocation, load-flexible computing, global distributed storage
All four strands making progress:
- Early papers / tech reports in all cases
- Bamboo and location service deployed & under test
Next steps include:
- Move PlanetLab experiments to UK eScience infrastructure
- Analysis and test of prototype designs/software
14
Bamboo Route convergence
15
Caching
- Data is either returned directly, or via the previous hop if the block is "hot"
- Cached copies are "drawn out" from the primary store toward requestors (sketched below)
- Exploits local route convergence
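A small sketch of how cached copies might get "drawn out" along the request path: every lookup follows the overlay route toward the primary store, and once a block has been requested often enough at a hop, that hop keeps a copy on the return path, so the replica migrates one hop closer to the requestors as demand persists. The threshold, node names and data structures are all invented, not the system's actual policy.

```python
HOT_THRESHOLD = 3     # invented: how many requests make a block "hot" at a hop

caches = {}           # node -> {block_id: data}
hits = {}             # (node, block_id) -> requests seen at that node

def lookup(path, block_id, primary_store):
    """Follow the overlay route toward the primary store; on the way back, the
    hop nearest the requestor that finds the block hot keeps a copy."""
    for depth, node in enumerate(path):
        caches.setdefault(node, {})
        hits[(node, block_id)] = hits.get((node, block_id), 0) + 1
        if block_id in caches[node]:
            data = caches[node][block_id]
            break
    else:
        data = primary_store[block_id]
        depth = len(path)
    for node in reversed(path[:depth]):          # return path
        if hits[(node, block_id)] >= HOT_THRESHOLD:
            caches[node][block_id] = data        # replica drawn one hop closer
            break
    return data

store = {"blk42": b"payload"}
for _ in range(5):
    lookup(["client-edge", "mid", "near-primary"], "blk42", store)
print({node: list(cache) for node, cache in caches.items()})
```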
16
Mutable index blocks
- May describe an arbitrary file hierarchy
- Index block has an associated keypair (e_FS, d_FS)
- Insert an index block using the hash of its public key as the Id
- Authenticate an update by signing the insertion voucher with the private key (sketched below)
- May link to other index blocks: merge contents, organise according to access/update patterns
[Figure: linked index blocks; voucher = H(blk), repl_factor, signed under the (e_FS, d_FS) keypair]
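The authentication step can be sketched as: the filesystem keypair (e_FS, d_FS) signs the insertion voucher, and any storage node can check the signature against the public key whose hash is the index block's Id. Ed25519 via the third-party `cryptography` package is only an illustrative choice, and the voucher layout below is an assumption based on the slide.

```python
import hashlib
from cryptography.hazmat.primitives.asymmetric import ed25519

private_key = ed25519.Ed25519PrivateKey.generate()      # plays the role of d_FS
public_key = private_key.public_key()                    # plays the role of e_FS

# Assumed voucher layout: H(block) followed by the replication factor.
index_block = b"serialized directory listing ..."
repl_factor = 3
voucher = hashlib.sha1(index_block).digest() + repl_factor.to_bytes(1, "big")
signature = private_key.sign(voucher)

# A storage node verifies before accepting the update (raises if invalid).
public_key.verify(signature, voucher)
print("voucher accepted")
```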
17
Update dissemination
18
Shared file spaces
- Users can only update their own index blocks
- Sharing through overlaying: import another user's name space, modify, re-export
- Copy-on-write overlay
- Active delete markers (sketched below)
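A minimal copy-on-write overlay sketch: reads fall through to the imported (read-only) name space, writes go only to the local overlay, and an active delete marker hides a base entry without touching it. The class, method and path names are invented for illustration.

```python
DELETED = object()   # active delete marker

class OverlayNamespace:
    """Share by overlaying: the base namespace stays read-only; all local
    changes, including deletions, live in the copy-on-write layer."""
    def __init__(self, base):
        self.base = base        # another user's (imported) name space
        self.overlay = {}       # this user's copy-on-write layer

    def write(self, path, data):
        self.overlay[path] = data

    def delete(self, path):
        self.overlay[path] = DELETED

    def read(self, path):
        entry = self.overlay.get(path)
        if entry is DELETED:
            raise FileNotFoundError(path)
        if entry is not None:
            return entry
        return self.base[path]

shared = {"/alice/readme.txt": b"hello", "/alice/data.csv": b"1,2,3"}
mine = OverlayNamespace(shared)
mine.write("/alice/readme.txt", b"my modified copy")    # copy on write
mine.delete("/alice/data.csv")                          # delete marker, base untouched
print(mine.read("/alice/readme.txt"), shared["/alice/data.csv"])
```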