
Fault tolerance in BlobSeer Bogdan Nicolae University of Rennes 1 Jesús Montes Sánchez CesViMa – Universidad Politécnica de Madrid.




1 Fault tolerance in BlobSeer Bogdan Nicolae University of Rennes 1 bogdan.nicolae@irisa.fr Jesús Montes Sánchez CesViMa – Universidad Politécnica de Madrid jmontes@cesvima.upm.es

2 Data Storage and Access Face New Challenges Infrastructures ◦ Grids, clouds, petascale computing infrastructures, desktop grids? Access pattern: distributed apps with high throughput under concurrency ◦ Huge data size, fast data generation rates ▪ PB-scale storage is necessary to cope with size ▪ Order of TB/week more and more common ◦ Mutable data ▪ Poor support in massive storage systems: HadoopFS ◦ Heavy access concurrency: synchronization and consistency ▪ Thousands of clients accessing data simultaneously ◦ Versioning ▪ Support for rollback, access to historic data

3 Our Approach: BlobSeer Manipulates lightweight huge files: blobs Simple API: read/write/append Blob is fragmented into pages ◦ Allows huge data amounts to be distributed among machines ◦ Avoids contention for simultaneous accesses to disjoint parts of the data block Metadata: locate pages that make up a given blob ◦ Distributed in a fine-grain manner Versioning ◦ Write/append: generate new pages rather than overwrite any existing data ◦ Metadata is extended to incorporate the update ◦ Both the old and the new version of the blob are accessible as if they were independent blobs
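The versioning idea on this slide (updates create new pages and never overwrite old ones) can be illustrated with a minimal in-memory sketch. This is not BlobSeer's API: the class `VersionedBlob`, the tiny `PAGE_SIZE`, and the restriction to page-aligned updates are all assumptions made to keep the example short.

```python
PAGE_SIZE = 4  # bytes per page; deliberately tiny, hypothetical value


class VersionedBlob:
    """Toy model: every version is a list of references to immutable pages."""

    def __init__(self):
        self._versions = [[]]  # version 0 is the empty blob

    def write(self, offset, data):
        """Create a new version in which `data` replaces the bytes at `offset`."""
        if offset % PAGE_SIZE or len(data) % PAGE_SIZE:
            raise ValueError("this sketch only handles page-aligned updates")
        pages = list(self._versions[-1])  # copies references, not page contents
        first = offset // PAGE_SIZE
        if first > len(pages):
            raise ValueError("write would leave a hole in the blob")
        for i in range(0, len(data), PAGE_SIZE):
            index = first + i // PAGE_SIZE
            page = bytes(data[i:i + PAGE_SIZE])  # a fresh page, old one untouched
            if index < len(pages):
                pages[index] = page
            else:
                pages.append(page)
        self._versions.append(pages)
        return len(self._versions) - 1  # number of the new version

    def append(self, data):
        """Create a new version with `data` added at the end of the blob."""
        return self.write(len(self._versions[-1]) * PAGE_SIZE, data)

    def read(self, version, offset, size):
        """Read from any version as if it were an independent blob."""
        content = b"".join(self._versions[version])
        return content[offset:offset + size]


blob = VersionedBlob()
v1 = blob.append(b"aaaabbbb")
v2 = blob.write(4, b"XXXX")
print(blob.read(v1, 0, 8), blob.read(v2, 0, 8))
```

Both versions stay readable after the second update, and the unmodified first page is shared between them rather than copied.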

4 Architecture Clients ◦ Perform fine-grained blob accesses Providers ◦ Store the pages of the blob Provider manager ◦ Monitors the providers ◦ Favors data load balancing Metadata providers ◦ Store information about page location Version manager ◦ Ensures concurrency control [Figure: clients, providers, metadata providers, provider manager, version manager] http://blobseer.gforge.inria.fr

5 How Do Writes Work? Pages are written concurrently by the clients (no sync needed) Versions are assigned Metadata is written concurrently by the clients (no sync needed) Versions are published in the order they were assigned [Figure: Client #1 and Client #2 interacting with providers, metadata providers and the version manager, ending with Publish]

6 How Does a Read Work? I. Ask the version manager for the latest published version (optional) II. Fetch the corresponding metadata from the metadata providers III. Contact providers in parallel and fetch the pages into the local buffer Full R/R, R/W concurrency [Figure: client, providers, metadata providers and version manager, with steps I, II, III]
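The three read steps can be sketched as follows. The dictionaries standing in for the version manager, the metadata providers and the data providers, as well as the function name `read_blob`, are hypothetical; a real deployment would use remote calls instead of local lookups.

```python
from concurrent.futures import ThreadPoolExecutor


def read_blob(version_manager, metadata, providers, version=None):
    """Return the full content of a blob version.

    version_manager: dict with key "latest_published" (assumed stand-in)
    metadata: dict mapping version -> ordered list of (provider_id, page_id)
    providers: dict mapping provider_id -> dict of page_id -> bytes
    """
    # Step I (optional): ask for the latest published version.
    if version is None:
        version = version_manager["latest_published"]

    # Step II: fetch the metadata that locates the pages of this version.
    locations = metadata[version]

    # Step III: fetch all pages in parallel; map() keeps the page order.
    def fetch(location):
        provider_id, page_id = location
        return providers[provider_id][page_id]

    with ThreadPoolExecutor(max_workers=4) as pool:
        pages = list(pool.map(fetch, locations))
    return b"".join(pages)


providers = {"p1": {"a": b"hello "}, "p2": {"b": b"world"}}
metadata = {1: [("p1", "a")], 2: [("p1", "a"), ("p2", "b")]}
print(read_blob({"latest_published": 2}, metadata, providers))
```

Because a reader only touches pages of an already published version, it needs no coordination with concurrent writers in this model.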

7 Metadata (1) Organized as a segment tree Each node covers a range of the blob identified by (offset, size) The first/second half of the range is covered by the left/right child Each leaf corresponds to a page and holds information about its location [Figure: segment tree with root [0, 4], inner nodes [0, 2] and [2, 2], leaves [0, 1], [1, 1], [2, 1], [3, 1]]
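A minimal sketch of such a segment tree, assuming offsets and sizes are counted in pages and the number of pages is a power of two. The function names `build_tree` and `lookup` and the string page locations are illustrative only.

```python
def build_tree(page_locations):
    """Build a segment tree as a dict keyed by (offset, size).

    Inner nodes store the keys of their two children; leaves store the
    location of one page. The number of pages must be a power of two.
    """
    n = len(page_locations)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch needs a power-of-two number of pages")
    nodes = {}

    def build(offset, size):
        if size == 1:
            nodes[(offset, size)] = page_locations[offset]
            return
        half = size // 2
        # Left child covers the first half of the range, right child the second.
        nodes[(offset, size)] = ((offset, half), (offset + half, half))
        build(offset, half)
        build(offset + half, half)

    build(0, n)
    return nodes


def lookup(nodes, root, start, count):
    """Return the locations of pages start .. start + count - 1, in order."""
    found = []
    stack = [root]
    while stack:
        offset, size = stack.pop()
        if offset + size <= start or offset >= start + count:
            continue  # this node's range does not intersect the request
        if size == 1:
            found.append(nodes[(offset, size)])
        else:
            left, right = nodes[(offset, size)]
            stack.append(right)  # pushed first so that left is visited first
            stack.append(left)
    return found


tree = build_tree(["p0", "p1", "p2", "p3"])
print(sorted(tree))            # the seven (offset, size) nodes
print(lookup(tree, (0, 4), 1, 2))
```

For four pages this produces exactly the seven nodes of the slide's figure, from the root (0, 4) down to the four leaves.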

8 Metadata (2) Each node holds versioning information Write/Append ◦ Add leaves and build subtree up to the root ◦ The tree may grow one level Read: descend from the root towards the leaves Tree nodes are distributed among metadata providers Clients can fetch multiple nodes in parallel [Figure: the initial tree rooted at [0, 4]; a write adding nodes [1, 1], [2, 1], [0, 2], [2, 2] and a new root [0, 4]; an append adding nodes [4, 1], [4, 2], [4, 4] and a new root [0, 8]]
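The write/append rule (add leaves, rebuild only the path up to a new root, share everything else) can be sketched as below. This is an assumption-laden toy, not BlobSeer's implementation: nodes live in one local dict keyed by the hypothetical triple (version, offset, size), ranges are counted in pages, and versions are applied one after another rather than concurrently.

```python
def write_version(nodes, roots, version, first_page, locations):
    """Add the metadata of `version` on top of `version - 1`."""
    lo, hi = first_page, first_page + len(locations)
    old_root = roots.get(version - 1)            # None for the first write
    old_size = old_root[2] if old_root else 0
    size = max(old_size, 1)
    while size < hi:
        size *= 2
    if old_root and size > 2 * old_size:
        raise ValueError("this sketch lets the tree grow by one level at most")

    def build(offset, span, old_key):
        if offset + span <= lo or offset >= hi:
            return old_key                       # untouched: share the old subtree
        key = (version, offset, span)
        if span == 1:
            nodes[key] = locations[offset - lo]  # new leaf for a new page
            return key
        half = span // 2
        if old_key is not None:
            old_left, old_right = nodes[old_key]
        elif span == 2 * old_size and offset == 0:
            old_left, old_right = old_root, None  # grown root adopts the old root
        else:
            old_left, old_right = None, None
        nodes[key] = (build(offset, half, old_left),
                      build(offset + half, half, old_right))
        return key

    roots[version] = build(0, size, old_root if size == old_size else None)
    return roots[version]


def read_pages(nodes, root):
    """Descend from a root and return (page index, location) pairs in order."""
    out = []

    def walk(key):
        if key is None:
            return                               # range never written
        if key[2] == 1:
            out.append((key[1], nodes[key]))
        else:
            for child in nodes[key]:
                walk(child)

    walk(root)
    return out


nodes, roots = {}, {}
write_version(nodes, roots, 1, 0, ["a0", "a1", "a2", "a3"])  # initial blob
write_version(nodes, roots, 2, 1, ["b1", "b2"])              # overwrite pages 1-2
write_version(nodes, roots, 3, 4, ["c4"])                    # append page 4
print(read_pages(nodes, roots[2]))
print(read_pages(nodes, roots[3]))
```

In this run the overwrite adds five nodes and the append adds four, including a new root covering eight pages, while version 1 remains readable through its own root.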

9 How Concurrent Writes Work: Example Initial version: v1. Two concurrent writers: gray and black. Both write their pages independently. Gray is first: it is enqueued on the version manager and assigned version v2; black gets v3. Both independently write the metadata tree nodes: black is faster and links to the not yet created node B2. First to finish is black: it is marked ready. Next is gray: its root gets published and it is dequeued. Finally black becomes first in the queue and is published.
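The publication order in this example can be sketched with a small version manager that assigns version numbers on request and publishes a version only once every earlier version has been published. The class and method names are hypothetical, and the lock merely makes the toy safe to call from several threads.

```python
import threading


class VersionManager:
    """Toy model: versions are published strictly in assignment order."""

    def __init__(self, initial_version=1):
        self._lock = threading.Lock()
        self.published = initial_version      # latest version visible to readers
        self._next = initial_version + 1
        self._ready = set()                   # finished but not yet published

    def assign(self):
        """Enqueue a writer and hand out the next version number."""
        with self._lock:
            version = self._next
            self._next += 1
            return version

    def mark_ready(self, version):
        """Called when a writer has finished writing its metadata."""
        with self._lock:
            self._ready.add(version)
            # Publish as many consecutive versions as are ready.
            while self.published + 1 in self._ready:
                self.published += 1
                self._ready.remove(self.published)
            return self.published


vm = VersionManager()
gray = vm.assign()     # gray asks first
black = vm.assign()    # black asks second
print(vm.mark_ready(black))  # black finishes first, nothing new is published
print(vm.mark_ready(gray))   # gray finishes, both versions become published
```

Black finishing early does not make its version visible; it becomes visible as soon as gray, which holds the earlier version number, has been published.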


13 Impact of Metadata Distribution Under Heavy Concurrency Metric ◦ Aggregated bandwidth Configuration ◦ 90 data providers ◦ Fixed nr of metadata providers ◦ Up to 90 clients ◦ 4 writers per client ◦ Each writer outputs 8 MB ◦ Page size: 128 KB To be presented at Euro-Par 2009

14 BlobSeer: How About Fault Tolerance? Metadata? ◦ Distributed in a DHT, already benefits from some DHT-inherent FT Centralized entities? ◦ Version manager, provider manager ◦ First idea: PAXOS-like, consensus-based solutions Data? ◦ Simple replication policies not enough ◦ FT needs to be adapted both to access pattern and running environment

15 Fault tolerance in grid computing Two theoretical visions of the grid: ◦ Multiple entities: The grid is a set of computational resources ◦ Single entity: The grid is a “black box” that provides a set of services Two types of fault tolerance: ◦ Resource level: Dependability issues in the grid resources ◦ Global level: Dependability issues related to the whole grid (the services provided)

16 Global level fault tolerance Improving dependability of the services provided ◦ Low-level approach (multiple entity point of view) ◦ High-level approach (single entity point of view) Is the “single entity” view possible? ◦ Maybe the grid is too large and complex to be understood as just one entity... ◦ ...or maybe it is just a matter of perspective. The key: Abstraction

17 Improving Fault Tolerance through Global Behavior Modeling The Grid seen as a single entity: global-level fault tolerance GloBeM: Global Behavior Modeling ◦ Global QoS rather than dependability of individual resources J. Montes, A. Sanchez, J. J. Valdes, M. S. Perez, and P. Herrero, "The grid as a single entity: Towards a behavior model of the whole grid" in OTM Conferences (1), ser. Lecture Notes in Computer Science, R. Meersman and Z. Tari, Eds., vol. 5331. Springer, 2008, pp. 886-897.

18 GloBeM: Zooming In Based on historical monitoring information Uses knowledge discovery techniques (Data Mining…) Generates a behavior model in the form of a Finite State Machine
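To make the idea of turning monitoring history into a finite state machine concrete, here is a deliberately crude sketch. It is not GloBeM: a fixed threshold replaces the knowledge discovery step, and the state names, the bandwidth samples and the function `build_fsm` are all invented for illustration.

```python
from collections import Counter, defaultdict


def build_fsm(history, classify):
    """Estimate state transition frequencies from a monitoring history.

    history: ordered list of monitoring samples
    classify: function mapping one sample to a state name (assumed given)
    Returns a dict: state -> {next_state: observed transition frequency}.
    """
    states = [classify(sample) for sample in history]
    counts = defaultdict(Counter)
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


def by_bandwidth(sample):
    # Hypothetical rule: a sample is an aggregated bandwidth in MB/s.
    return "healthy" if sample >= 50 else "degraded"


fsm = build_fsm([90, 85, 40, 35, 80], by_bandwidth)
print(fsm)
```

Each state of the resulting model summarizes a level of global behavior, and the transition frequencies describe how the system as a whole tends to move between those levels.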

19 Using GloBeM to Improve Fault Tolerance Extracted from Jesus Montes, Alberto Sanchez and Maria S. Perez, « Improving grid fault tolerance by means of global behavior modelling ». Submitted to Grid’2009.

20 Applications to BlobSeer Use GloBeM modeling techniques to improve BlobSeer’s dependability and QoS ◦ Model behavior patterns ◦ Implement adaptive strategies (e.g. reactive and/or proactive fault tolerance)

21 Applications to BlobSeer (2) Steps: ◦ Define relevant metrics to monitor ▪ Storage usage, effective bandwidth, resource failure rate ◦ Model BlobSeer using GloBeM techniques ◦ Analyze states to understand behavior ▪ Each state may correspond to a certain level of QoS ◦ Define adapted fault-tolerance policies based on BlobSeer dynamic changes
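The last step, a fault-tolerance policy that adapts to the modeled state, could look like the sketch below. Everything in it is an assumption: the slides do not define a policy, so the replication factors, the risk threshold, the state names and the function `choose_replication` are purely illustrative.

```python
def choose_replication(current_state, transitions, bad_states,
                       base=1, boosted=3, risk_threshold=0.3):
    """Pick a page replication factor from a behavior model.

    transitions: dict state -> {next_state: probability} (assumed to come
    from a behavior model of the storage system)
    bad_states: set of states associated with poor QoS
    """
    # Reactive: the system is already in a poor-QoS state.
    if current_state in bad_states:
        return boosted
    # Proactive: a poor-QoS state is likely to follow the current one.
    risk = sum(p for state, p in transitions.get(current_state, {}).items()
               if state in bad_states)
    return boosted if risk >= risk_threshold else base


model = {
    "healthy": {"healthy": 0.9, "degraded": 0.1},
    "unstable": {"healthy": 0.5, "degraded": 0.5},
    "degraded": {"degraded": 0.6, "unstable": 0.4},
}
for state in ("healthy", "unstable", "degraded"):
    print(state, choose_replication(state, model, {"degraded"}))
```

The same lookup covers both strategies mentioned for BlobSeer: reacting to a degraded state and acting ahead of a likely degradation.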

22 Questions?

