Fault tolerance in BlobSeer

Bogdan Nicolae, University of Rennes 1, bogdan.nicolae@irisa.fr
Jesús Montes Sánchez, CesViMa – Universidad Politécnica de Madrid, jmontes@cesvima.upm.es
Data Storage and Access Face New Challenges

Infrastructures
◦ Grids, clouds, petascale computing infrastructures, desktop grids

Access pattern: distributed apps with high throughput under concurrency
◦ Huge data sizes, fast data generation rates: PB-scale storage is necessary to cope with the size, and on the order of TB/week is more and more common
◦ Mutable data: poor support in massive storage systems such as HadoopFS
◦ Heavy access concurrency: synchronization and consistency with thousands of clients accessing data simultaneously
◦ Versioning: support for rollback and access to historic data
Our Approach: BlobSeer

Manipulates huge files as lightweight objects: blobs
◦ Simple API: read/write/append

A blob is fragmented into pages
◦ Allows huge amounts of data to be distributed among machines
◦ Avoids contention for simultaneous accesses to disjoint parts of the data

Metadata: locates the pages that make up a given blob
◦ Distributed in a fine-grained manner

Versioning
◦ Write/append: generate new pages rather than overwrite any existing data
◦ Metadata is extended to incorporate the update
◦ Both the old and the new version of the blob are accessible as if they were independent blobs
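The blob/page model above can be illustrated with a minimal in-memory sketch. All names here (`Blob`, `PAGE_SIZE`, the method signatures) are hypothetical illustrations, not the actual BlobSeer API; the point is only that writes create new pages instead of overwriting, so every version stays readable.

```python
# Minimal in-memory sketch of the blob/page model (hypothetical names,
# not the actual BlobSeer API). A blob is split into fixed-size pages;
# a write stores new pages and never mutates existing ones.

PAGE_SIZE = 4  # bytes per page, kept tiny for illustration

class Blob:
    def __init__(self):
        self.versions = []    # version v -> list of page ids (1-based versions)
        self.pages = {}       # page id -> bytes, immutable once stored
        self._next_page = 0

    def _store(self, data):
        pid = self._next_page
        self._next_page += 1
        self.pages[pid] = data
        return pid

    def write(self, offset, data):
        """Write `data` at `offset`; returns the new version number."""
        page_ids = list(self.versions[-1]) if self.versions else []
        needed = (offset + len(data) + PAGE_SIZE - 1) // PAGE_SIZE
        while len(page_ids) < needed:          # grow the blob if required
            page_ids.append(self._store(b"\x00" * PAGE_SIZE))
        for i in range(offset // PAGE_SIZE, needed):
            base = i * PAGE_SIZE
            page = bytearray(self.pages[page_ids[i]])
            for j in range(PAGE_SIZE):
                pos = base + j
                if offset <= pos < offset + len(data):
                    page[j] = data[pos - offset]
            page_ids[i] = self._store(bytes(page))  # new page, no overwrite
        self.versions.append(page_ids)
        return len(self.versions)

    def read(self, version, offset, size):
        buf = b"".join(self.pages[p] for p in self.versions[version - 1])
        return buf[offset:offset + size]
```

Because old page lists are never modified, reading version 1 after a later write still returns the original contents, mirroring the "old and new versions accessible as independent blobs" property.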
Architecture

Clients
◦ Perform fine-grained blob accesses
Providers
◦ Store the pages of the blobs
Provider manager
◦ Monitors the providers
◦ Favors data load balancing
Metadata providers
◦ Store information about page location
Version manager
◦ Ensures concurrency control

http://blobseer.gforge.inria.fr
How Do Writes Work?

◦ Pages are written concurrently by the clients (no synchronization needed)
◦ Versions are assigned by the version manager
◦ Metadata is written concurrently by the clients (no synchronization needed)
◦ Versions are published in the order they were assigned
6
ClientProviders Metadata providers Version manager I II III How Does a Read Work? I. Ask the version manager for the latest published version (optional) II. Fetch the corresponding metadata from the metadata providers III. Contact providers in parallel and fetch the pages in the local buffer Full R/R, R/W concurrency
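The three read steps can be sketched as one function against stand-in services. The dictionary-based "services" are hypothetical simplifications; the real system contacts remote processes and fetches pages in parallel.

```python
# Sketch of the three read steps (hypothetical in-memory "services",
# not BlobSeer's actual interfaces).

def read_blob(version_manager, metadata, providers, version=None):
    # I. ask the version manager for the latest published version (optional)
    if version is None:
        version = version_manager["latest_published"]
    # II. fetch the page list for that version from the metadata providers
    page_ids = metadata[version]
    # III. contact the providers (in parallel, in the real system) and
    #      assemble the pages into a local buffer
    return b"".join(providers[pid] for pid in page_ids)
```

Note that none of the three steps mutates shared state, which is why concurrent readers, and readers running alongside writers, need no synchronization.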
Metadata (1)

◦ Organized as a segment tree
◦ Each node covers a range of the blob identified by (offset, size)
◦ The first/second half of the range is covered by the left/right child
◦ Each leaf corresponds to a page and holds information about its location

Example tree for a 4-page blob: root [0, 4]; children [0, 2] and [2, 2]; leaves [0, 1], [1, 1], [2, 1], [3, 1].
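The segment-tree lookup can be sketched as a recursive descent that collects the leaves intersecting a requested range (a simplification; the real tree nodes also carry version and location information):

```python
# Sketch of a metadata segment-tree lookup. Each node is an
# (offset, size) pair; leaves have size 1 (one page). Descending from
# the root collects the leaves overlapping the requested range.

def leaves_for_range(offset, size, node=(0, 4)):
    """Return the size-1 leaves under `node` that intersect
    [offset, offset + size)."""
    n_off, n_size = node
    if n_off + n_size <= offset or offset + size <= n_off:
        return []                  # no overlap with this subtree: prune
    if n_size == 1:
        return [node]              # leaf: one page
    half = n_size // 2             # first/second half -> left/right child
    return (leaves_for_range(offset, size, (n_off, half)) +
            leaves_for_range(offset, size, (n_off + half, half)))
```

On the example 4-page tree, a request for range [1, 3) prunes the [3, 1] leaf and returns only the two pages actually needed.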
Metadata (2)

◦ Each node holds versioning information
◦ Write/append: add leaves and build a subtree up to the root; the tree may grow one level (e.g. an append extends the root from [0, 4] to [0, 8], adding a new subtree with nodes [4, 4], [4, 2], [4, 1])
◦ Read: descend from the root towards the leaves
◦ Tree nodes are distributed among the metadata providers
◦ Clients can fetch multiple nodes in parallel
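The versioned tree can be sketched with structural sharing: a write rebuilds only the path to the modified leaves and links to the previous version's untouched subtrees, which is why old versions stay readable at little extra cost. This is a simplified illustration (tuples for nodes, power-of-two sizes), not BlobSeer's actual data layout.

```python
# Sketch of versioned metadata with structural sharing. A node is
# ("node", off, size, left, right); a leaf is ("leaf", off, page_id).

def make_tree(off, size, page_of):
    """Build a tree covering (off, size); size must be a power of two."""
    if size == 1:
        return ("leaf", off, page_of(off))
    half = size // 2
    return ("node", off, size,
            make_tree(off, half, page_of),
            make_tree(off + half, half, page_of))

def update(old, write_set):
    """New tree where leaves in `write_set` (offset -> new page id) are
    replaced; untouched subtrees are shared with the old version."""
    if old[0] == "leaf":
        _, off, _ = old
        return ("leaf", off, write_set[off]) if off in write_set else old
    _, off, size, left, right = old
    half = size // 2
    touches = lambda o, s: any(o <= k < o + s for k in write_set)
    new_left = update(left, write_set) if touches(off, half) else left
    new_right = update(right, write_set) if touches(off + half, half) else right
    return ("node", off, size, new_left, new_right)
```

After updating page 2, the new version's left half is literally the same object as the old version's, while the old right subtree still records the old page.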
How Concurrent Writes Work: Example

◦ Initial version: v1
◦ Two concurrent writers: gray and black
◦ Both write their pages independently
◦ Gray is first: it is enqueued at the version manager and assigned version v2; black gets v3
◦ Both write their metadata tree nodes independently: black is faster and links to the not-yet-created node B2
◦ Black finishes first and is marked ready
◦ Gray finishes next: its root gets published and it is dequeued
◦ Finally black reaches the front of the queue and is published
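The publish ordering in this example can be sketched as a small state machine (a hypothetical structure, not BlobSeer's implementation): versions are assigned on request, but a version is published only once every earlier version has been published.

```python
# Sketch of in-order publishing at the version manager. A writer that
# finishes early (black, v3) waits until earlier versions (gray, v2)
# are published.

class VersionManager:
    def __init__(self):
        self.next_version = 1   # next version number to assign
        self.published = 0      # highest published version so far
        self.ready = set()      # versions whose pages + metadata are written

    def assign(self):
        v = self.next_version
        self.next_version += 1
        return v

    def mark_ready(self, v):
        """Writer finished pages and metadata for version v; publish in
        assignment order and return the highest published version."""
        self.ready.add(v)
        while self.published + 1 in self.ready:
            self.published += 1
            self.ready.remove(self.published)
        return self.published
```

Replaying the slide's scenario: two writers get versions in order, the second finishes first but stays queued, and publishing the first unblocks both.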
Impact of Metadata Distribution Under Heavy Concurrency

Metric
◦ Aggregated bandwidth
Configuration
◦ 90 data providers
◦ Fixed number of metadata providers
◦ Up to 90 clients
◦ 4 writers per client
◦ Each writer outputs 8 MB
◦ Page size: 128 KB

To be presented at Euro-Par 2009
BlobSeer: How About Fault Tolerance?

Metadata?
◦ Distributed in a DHT, so it already benefits from some DHT-inherent fault tolerance
Centralized entities?
◦ Version manager, provider manager
◦ First idea: Paxos-like, consensus-based solutions
Data?
◦ Simple replication policies are not enough
◦ Fault tolerance needs to be adapted both to the access pattern and to the running environment
Fault Tolerance in Grid Computing

Two theoretical visions of the grid:
◦ Multiple entities: the grid is a set of computational resources
◦ Single entity: the grid is a "black box" that provides a set of services
Two types of fault tolerance:
◦ Resource level: dependability issues in the individual grid resources
◦ Global level: dependability issues related to the whole grid (the services provided)
Global-Level Fault Tolerance

Improving the dependability of the services provided
◦ Low-level approach (multiple-entity point of view)
◦ High-level approach (single-entity point of view)
Is the "single entity" view possible?
◦ Maybe the grid is too large and complex to be understood as just one entity...
◦ ...or maybe it is just a matter of perspective
The key: abstraction
The Grid Seen as a Single Entity: Global-Level Fault Tolerance

GloBeM: Global Behavior Modeling
◦ Global QoS rather than dependability of individual resources
◦ Improving fault tolerance through global behavior modeling

J. Montes, A. Sanchez, J. J. Valdes, M. S. Perez, and P. Herrero, "The grid as a single entity: Towards a behavior model of the whole grid," in OTM Conferences (1), ser. Lecture Notes in Computer Science, R. Meersman and Z. Tari, Eds., vol. 5331. Springer, 2008, pp. 886-897.
GloBeM: Zooming In

◦ Based on historical monitoring information
◦ Uses knowledge discovery techniques (data mining, ...)
◦ Generates a behavior model in the form of a finite state machine
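The idea can be sketched in miniature: discretize historical monitoring samples into named states, then count observed transitions to obtain a finite state machine. The discretization rule and state names below are invented for illustration; GloBeM's actual knowledge-discovery techniques are far more sophisticated.

```python
# Toy sketch of behavior modeling from a monitoring trace (hypothetical
# thresholds and state names, not the actual GloBeM algorithm).

from collections import Counter

def to_state(sample):
    """Classify a (load, failure_rate) monitoring sample into a state."""
    load, failure_rate = sample
    if failure_rate > 0.1:
        return "degraded"
    return "busy" if load > 0.7 else "normal"

def build_fsm(history):
    """Count state transitions: {(state, next_state): occurrences}."""
    states = [to_state(s) for s in history]
    return Counter(zip(states, states[1:]))
```

The transition counts are the finite-state-machine edges; in a real deployment they would be estimated from long monitoring histories rather than a four-sample trace.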
Using GloBeM to Improve Fault Tolerance

Extracted from: Jesus Montes, Alberto Sanchez and Maria S. Perez, "Improving grid fault tolerance by means of global behavior modelling." Submitted to Grid'2009.
Applications to BlobSeer

Use GloBeM modeling techniques to improve BlobSeer's dependability and QoS
◦ Model behavior patterns
◦ Implement adaptive strategies (e.g. reactive and/or proactive fault tolerance)
Applications to BlobSeer (2)

Steps:
◦ Define relevant metrics to monitor: storage usage, effective bandwidth, resource failure rate
◦ Model BlobSeer using GloBeM techniques
◦ Analyze the states to understand behavior: each state may correspond to a certain level of QoS
◦ Define adapted fault-tolerance policies based on BlobSeer's dynamic changes
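The last step above can be sketched as a lookup from modeled state to fault-tolerance policy. The state names and policy parameters here are hypothetical illustrations of what "adapted policies" could look like, not proposals from the talk.

```python
# Sketch of state-driven, adaptive fault tolerance (hypothetical states
# and parameters): when the model reports a degraded global state, switch
# to a more aggressive policy.

POLICIES = {
    "normal":   {"replication": 2, "proactive_migration": False},
    "busy":     {"replication": 2, "proactive_migration": False},
    "degraded": {"replication": 3, "proactive_migration": True},
}

def policy_for(state):
    # fall back to the most conservative policy for unmodeled states
    return POLICIES.get(state, POLICIES["degraded"])
```

This is the "reactive and/or proactive" distinction from the previous slide: replication reacts to failures, while proactive migration moves pages away from resources the model expects to fail.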
Questions?