Modularized Redundant Parallel Virtual System


1 Modularized Redundant Parallel Virtual System
Sheng-Kai Hung, HPCC Lab

2 Parallel Virtual File System Overview
Developed by Clemson University
Uses RAID-0-like striping to distribute files
Claims high read/write performance
Based on a TCP/IP client-server model
Centralized metadata server
POSIX- and MPI-IO-compliant
No fault-tolerance mechanism provided
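
To make the RAID-0-style striping concrete, here is a minimal C sketch of mapping a logical file offset to a server and a local offset; the stripe size, the server count, and the round-robin layout are illustrative assumptions, not PVFS's actual defaults or code.

```c
#include <stdint.h>
#include <stdio.h>

#define STRIPE_SIZE (64 * 1024)  /* assumed stripe size           */
#define NSERVERS    16           /* assumed number of I/O servers */

/* Map a logical file offset to (server index, offset within that
 * server's local file) under round-robin RAID-0-style striping. */
static void map_offset(uint64_t file_off, int *server, uint64_t *local_off)
{
    uint64_t stripe = file_off / STRIPE_SIZE;       /* global stripe index */
    *server    = (int)(stripe % NSERVERS);          /* round-robin server  */
    *local_off = (stripe / NSERVERS) * STRIPE_SIZE  /* full local stripes  */
               + file_off % STRIPE_SIZE;            /* plus the remainder  */
}

int main(void)
{
    int s;
    uint64_t off;
    map_offset(200 * 1024, &s, &off);           /* offset 200 KB ...      */
    printf("server %d, local offset %llu\n",    /* ... lands on server 3, */
           s, (unsigned long long)off);         /* local offset 8192      */
    return 0;
}
```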

3 Our Previous Design
Parity information is stored at the metadata server
A single point of failure
Read/write performance
Uses "delayed write" to reduce the parity overhead
A buffer stores the differences of the blocks being written (see the sketch below)
Reading the corresponding blocks is also needed
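
The "delayed write" idea described above can be sketched as follows in C; the buffering granularity and the function names are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* "Delayed write": instead of updating the parity on every write,
 * buffer the XOR difference between the old and the new block, then
 * fold the buffered differences into the parity later in one pass.
 * Reading the old block is still required, which is the extra read
 * overhead mentioned above. */
static void buffer_diff(uint8_t *diff, const uint8_t *old_data,
                        const uint8_t *new_data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        diff[i] = old_data[i] ^ new_data[i];
}

/* Later, fold a buffered difference into the parity block. */
static void apply_diff_to_parity(uint8_t *parity, const uint8_t *diff,
                                 size_t len)
{
    for (size_t i = 0; i < len; i++)
        parity[i] ^= diff[i];
}
```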

4 MTTF Formula
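
The formula on this slide did not survive as text. A plausible reconstruction, assuming the standard single-fault-tolerant array model of Patterson, Gibson, and Katz, with N total nodes and parity groups of size G:

```latex
% No redundancy: the system fails when any of the N nodes fails.
% With parity: a group of size G loses data only if a second node
% fails while the first is still being repaired (within MTTR).
\[
  \mathrm{MTTF}_{\mathrm{no\;redundancy}} = \frac{\mathrm{MTTF}_{\mathrm{node}}}{N},
  \qquad
  \mathrm{MTTF}_{\mathrm{parity}} \approx
    \frac{\mathrm{MTTF}_{\mathrm{node}}^{\,2}}{N\,(G-1)\,\mathrm{MTTR}}
\]
```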

5 Examples of MTTF
Assumptions:
MTTF_D is no less than 100,000 hours (around 10 years)
MTTF_S is 10,000 hours (around 1 year)
MTTR is usually shorter than 4 hours
The number of nodes is 16

Scheme     MTTF (hours)   Group Size
PVFS       528            -
PVFSraid   624            1
RPVFS      86088

6 MTTF Result

7 Overhead of Using Parity
Read: not involved in the process of parity construction
Read-modify-write: some blocks in a stripe are dirtied; needs 2 reads and 2 writes (see the sketch below)
Write: the whole striping unit is overwritten; 1 read and 2 writes
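
The read-modify-write case follows the classic RAID small-write rule, sketched here in C (the generic technique, not this system's code): the new parity is the old parity XORed with the old and the new data, which is exactly why 2 reads (old data, old parity) and 2 writes (new data, new parity) are needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Classic RAID small-write parity update:
 *   new_parity = old_parity XOR old_data XOR new_data
 * Costs 2 reads (old data, old parity) and 2 writes
 * (new data, new parity). */
static void rmw_parity_update(uint8_t *parity, const uint8_t *old_data,
                              const uint8_t *new_data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        parity[i] ^= old_data[i] ^ new_data[i];
}
```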

8 System Architecture

9 Parity Cache Table (1/3)
A pinned-down memory region within the metadata node
4K entries; each entry contains N data blocks plus an inode-number tag and a reference count (see the sketch below)
Can aggregate written blocks to reduce the number of parity writes
We delay the writing and generation of parity blocks
Several nearby blocks can be combined into a single write
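
A hedged sketch of what one table entry might look like in C; the field names, the block size, and the ready-mask are illustrative assumptions. Only the 4K entry count, the N data blocks, the inode tag, and the reference count come from the slide.

```c
#include <stdint.h>

#define BLOCK_SIZE 4096   /* assumed block size              */
#define N          4      /* assumed blocks per parity group */
#define TABLE_SIZE 4096   /* "4K entries", per the slide     */

/* One parity cache table entry: the N data blocks that feed one
 * parity block, tagged with the owning file's inode number and
 * guarded by a reference count.  In the real system this region
 * would be pinned (e.g., mlock'd) in the metadata node's memory. */
struct parity_cache_entry {
    uint32_t inode;                 /* inode-number tag                   */
    uint32_t refcount;              /* outstanding users of this entry    */
    uint32_t ready_mask;            /* bit i set when block i has arrived */
    uint8_t  blocks[N][BLOCK_SIZE]; /* buffered data blocks               */
};

static struct parity_cache_entry parity_cache[TABLE_SIZE];
```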

10 Parity Cache Table (2/3)

11 Parity Cache Table (3/3)
When to write back the cache? (sketched below)
Replacement: choose the bigger {N,N}
Ready: when all the blocks needed to compute a parity block are ready
Flush: a routine like bdflush runs every 30 secs
Potential data loss? On average 15 secs (half the flush period)
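
The write-back triggers can be sketched as predicates over a cache entry (C, illustrative only; the 30-second period is from the slide, the field names are assumed):

```c
#include <stdbool.h>
#include <stdint.h>

#define N 4  /* assumed blocks per parity group */

struct entry {
    uint32_t ready_mask;    /* bit i set when block i has arrived */
    uint64_t last_touched;  /* seconds, for the periodic flush    */
};

/* "Ready": all N blocks feeding the parity are present, so the parity
 * can be computed and written exactly once. */
static bool write_back_ready(const struct entry *e)
{
    return e->ready_mask == (1u << N) - 1;
}

/* "Flush": a bdflush-like daemon forces aged entries out every 30 s,
 * so unflushed data is at risk for 15 s on average. */
static bool flush_due(const struct entry *e, uint64_t now)
{
    return now - e->last_touched >= 30;
}
```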

12 Write Performance

13 Read-Modify-Write Performance

14 Mirrored Parity Scheme (1/3)
RAID-1
Cannot tolerate two faults in the same mirrored group
Across different groups, 3 faults can be tolerated
Disk overhead is 100%
RAID-4 (RAID-5)
Can only tolerate a single fault
Disk overhead is always less than 33.3%
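
The overhead figures follow from how much redundant storage each scheme adds; assuming a parity group holds G - 1 data blocks plus one parity block:

```latex
% RAID-1 duplicates every block; RAID-4/5 pays one parity
% block per group of G - 1 data blocks.
\[
  \mathrm{overhead}_{\mathrm{RAID\text{-}1}} = 100\%,
  \qquad
  \mathrm{overhead}_{\mathrm{RAID\text{-}4/5}} = \frac{1}{G-1}
  \;\le\; 33.3\% \;\;\text{for}\; G \ge 4
\]
```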

15 Mirrored Parity Scheme (2/3)
Can tolerate faults that occur in the same mirrored group
D1 and P12 fault; D0 and P01 fault
Can tolerate at most 3 faults, except one case
D1, P12, and D0 all fault
The concept of grouping disappears
Uses the same disk overhead as RAID-1

16 Mirrored Parity Scheme (3/3)
Pros
MTTF is higher
Can tolerate more simultaneous faults than RAID-1, with the same disk overhead
Cons
Needs at most N XOR operations to recover the corrupted data, where N is the number of nodes involved in a parity group
XOR is a cheap operation, but reading 3 blocks may be a problem (see the recovery sketch below)
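
The recovery step is the generic RAID reconstruction rule, sketched in C below (not this system's code): the lost block is the XOR of all surviving blocks in its parity group, so the XOR work is cheap but every surviving block must be read first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reconstruct one lost block by XOR-ing the surviving members of its
 * parity group (the other data blocks plus the parity block).  The
 * XOR work is trivial; the real cost is reading `nsurvivors` blocks
 * back from the other nodes. */
static void recover_block(uint8_t *out, const uint8_t *const *survivors,
                          size_t nsurvivors, size_t block_size)
{
    memset(out, 0, block_size);
    for (size_t i = 0; i < nsurvivors; i++)
        for (size_t j = 0; j < block_size; j++)
            out[j] ^= survivors[i][j];
}
```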

17 Separate Metadata Cache
Accessing metadata is a serialized process
Only a single metadata server with one disk
Separate the metadata cache from the real data cache
Either on clients or on servers
If on clients, a cache hit saves a socket connection
Distributed metadata
Handling the parity cache table
Parity information must also be distributed
Block-based parity needs to be modified
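
One common way to realize distributed metadata (a sketch of the general technique only; the talk leaves the actual scheme open) is to hash the inode number across metadata servers, which would also shard the parity cache table:

```c
#include <stdint.h>

/* Spread metadata, and with it a shard of the parity cache table,
 * across several metadata servers by hashing the inode number.
 * The mixing constants are the splitmix64 finalizer, used here to
 * avoid clustering of sequential inode numbers. */
static int metadata_server_for(uint64_t inode, int nservers)
{
    inode ^= inode >> 30; inode *= 0xbf58476d1ce4e5b9ULL;
    inode ^= inode >> 27; inode *= 0x94d049bb133111ebULL;
    inode ^= inode >> 31;
    return (int)(inode % (uint64_t)nservers);
}
```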

