Published by Lewis Banks. Modified over 9 years ago.
1
A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes
Yunfeng Zhu (1), Patrick P. C. Lee (2), Liping Xiang (1), Yinlong Xu (1), Lingling Gao (1)
(1) University of Science and Technology of China
(2) The Chinese University of Hong Kong
DSN’12
2
Fault Tolerance
- Fault tolerance becomes more challenging in modern distributed storage systems
  - Increase in scale
  - Usage of inexpensive but less reliable storage nodes
- Fault tolerance is ensured by introducing redundancy across storage nodes
  - Replication
  - Erasure codes (e.g., Reed-Solomon codes)
[Figure: replication stores copies of A and B; an erasure code stores A, B, A+B, and A+2B]
3
XOR-Based Erasure Codes
- Encoding/decoding involve XOR operations only
  - Low computational overhead
- Different redundancy levels
  - 2-fault tolerant: RDP, EVENODD, X-Code
  - 3-fault tolerant: STAR
  - General-fault tolerant: Cauchy Reed-Solomon (CRS)
4
Failure Recovery
- Recovering node failures is necessary
  - Preserve the required redundancy level
  - Avoid data unavailability
- Single-node failure recovery
  - Single-node failures occur more frequently than concurrent multi-node failures
5
Example: Recovery in RDP
[Figure: an RDP code example with 8 nodes (node 0 to node 7). Each node stores six symbols d_{i,j} (row i = 0..5, node j); nodes 0 to 5 hold data, node 6 holds the row parity symbols d_{i,6}, and node 7 holds the diagonal parity symbols d_{i,7}, all computed with XOR (⊕).]
Let’s say node 0 fails. How do we recover node 0?
6
Conventional Recovery
- Idea: use only row parity sets. Recover each lost data symbol (i.e., data chunk) independently
- Read symbols: 36
- Then how do we recover node 0 efficiently?
- Different metrics can be used to measure the efficiency of a recovery scheme
7
Minimize Number of Read Symbols
- Idea: use a combination of row and diagonal parity sets to maximize overlapping symbols [Xiang, ToS’11]
- Read symbols: 27
- Improvement over conventional recovery: 25% (27 instead of 36 symbols)
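The two read counts quoted for node 0 (36 and 27) can be checked with a short sketch. It assumes the standard RDP layout for p = 7: nodes 0 to 5 hold data, node 6 holds row parity, node 7 holds diagonal parity, and symbol d_{i,j} lies on diagonal (i + j) mod p. The function name and the particular choice of rows rebuilt from diagonals are illustrative and not taken from the slides.

```python
def symbols_read(p, diag_rows):
    """Symbols fetched to rebuild node 0 of an RDP array with prime p.

    diag_rows: rows whose lost symbol is rebuilt from its diagonal parity set;
    every other row is rebuilt from its row parity set.
    Returns a set of (row, node) pairs.
    """
    needed = set()
    for i in range(p - 1):
        if i in diag_rows:
            # Diagonal i holds the symbols d[r][j] with (r + j) % p == i.
            for j in range(1, p):
                r = (i - j) % p
                if r != p - 1:          # row p-1 is imaginary in RDP
                    needed.add((r, j))
            needed.add((i, p))          # diagonal parity symbol on node p
        else:
            # Row i: surviving data symbols plus the row parity on node p-1.
            for j in range(1, p):
                needed.add((i, j))
    return needed


conventional = symbols_read(7, diag_rows=set())
hybrid = symbols_read(7, diag_rows={0, 1, 2})
print(len(conventional), len(hybrid))          # 36 27
print(1 - len(hybrid) / len(conventional))     # 0.25
```

The saving comes from symbols that sit in both a chosen row set and a chosen diagonal set: they are stored once in the Python set, just as they are read once from disk.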
8
Need a New Metric?
- A modern storage system is naturally composed of heterogeneous types of storage nodes
  - System upgrades
  - New node addition
- A heterogeneous environment
[Figure: a proxy, storage nodes 0 to 7, and a new node, with link bandwidths of 26, 68, 109, 110, 113, 10, 110, and 86 Mbps]
- Need a new efficient failure recovery solution for heterogeneous environments!
9
Related Work
- Hybrid recovery
  - Minimize number of read symbols
  - RAID-6 XOR-based erasure codes, e.g., RDP [Xiang, ToS’11], EVENODD [Wang, Globecom’10]
- Enumeration recovery [Khan, FAST’12]
  - Enumerate all recovery possibilities to achieve optimal recovery for general XOR-based erasure codes
- Greedy recovery [Zhu, MSST’12]
  - Efficient search of recovery solutions for general XOR-based erasure codes
- Regenerating codes [Dimakis, ToIT’10]
  - Nodes encode data during recovery
  - Minimize recovery bandwidth
  - Heterogeneous case considered in [Li, Infocom’10], but requires node encoding and collaboration
10
Challenges
- How to enable efficient failure recovery for heterogeneous settings?
  - Minimizing # of read symbols suits homogeneous settings
  - Performance is bottlenecked by poorly performing nodes
- How to quickly find the recovery strategy?
  - Minimizing # of read symbols: deterministic metric
  - Minimizing general cost: non-deterministic metric
  - Recovery decision typically can’t be pre-determined
11
Our Contributions
- Target two RAID-6 codes: RDP and EVENODD
  - XOR-based encoding operations
- Goals:
  - Minimize search time
  - Minimize recovery cost
- Cost-based single-node failure recovery for heterogeneous distributed storage systems
12
Our Contributions
- Formulate an optimization problem for single-node failure recovery in heterogeneous settings
- Propose a cost-based heterogeneous recovery (CHR) algorithm
  - Narrow down search space
  - Suitable for online recovery
- Implement and experiment on a heterogeneous networked storage testbed
13
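Model Formulation
- Nodes: V_0, V_1, ..., V_k, ..., V_{p-1}, V_p (Node 0, Node 1, ..., Node k, ..., Node p-1, Node p)
- Weights: w_0, w_1, ..., w_{p-1}, w_p
- Download distribution: y_0, y_1, ..., y_{p-1}, y_p
- Our formulation: minimize the total recovery cost $C = \sum_{i \neq k} w_i y_i$, where V_k is the failed node and the sum runs over the surviving nodes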
14
Physical Meanings
- w_i = 1 for all i: C is the total number of symbols read from surviving nodes
- w_i = inverse of the transmission bandwidth of node V_i: C is the total transmission time to download symbols from surviving nodes
- w_i = monetary cost of migrating one unit of data outbound from node V_i: C is the total monetary cost of migrating data from surviving nodes (or clouds)
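The three readings of the weights can be seen as one cost function called with different inputs. The sketch below uses the download distribution (3, 2, 2, 3, 2) from the p = 5 RDP example in this deck; the bandwidth and price values are hypothetical and only serve to show how the meaning of C changes with w_i.

```python
def recovery_cost(weights, downloads):
    """Total cost C = sum of w_i * y_i over the surviving nodes."""
    return sum(w * y for w, y in zip(weights, downloads))


downloads = [3, 2, 2, 3, 2]        # y_i for the five surviving nodes

# w_i = 1: C counts the symbols read.
symbol_count = recovery_cost([1] * 5, downloads)
print(symbol_count)                # 12

# w_i = 1 / bandwidth: C is proportional to the total transfer time.
bandwidth_mbps = [100, 100, 1000, 1000, 1000]         # hypothetical values
time_cost = recovery_cost([1 / b for b in bandwidth_mbps], downloads)

# w_i = price per unit of outbound data: C is a monetary cost.
price_per_symbol = [0.02, 0.02, 0.05, 0.05, 0.01]     # hypothetical values
money_cost = recovery_cost(price_per_symbol, downloads)
```

With the hypothetical bandwidths, the two slow nodes account for most of `time_cost` even though they serve fewer than half of the symbols, which is the effect the cost-based formulation is meant to capture.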
15
Solving the Model
- Important: which symbols are fetched from surviving nodes must follow the inherent rules of the specific coding scheme
- To solve the model, we introduce the recovery sequence (x_0, x_1, …, x_{p-2}, 0)
  - x_i = 0: d_{i,k} is recovered from its row parity set
  - x_i = 1: d_{i,k} is recovered from its diagonal parity set
- An example:
  - recovery sequence: (0, 0, 1, 1, 0)
  - download distribution: (3, 2, 2, 3, 2)
[Figure: a 4 × 6 array of symbols d_{i,j} (rows 0 to 3, node 0 to node 5)]
- 1) Each recovery sequence represents a feasible recovery solution; 2) the download distribution can be represented by the recovery sequence
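The mapping from a recovery sequence to a download distribution can be sketched as follows. The code assumes node 0 is the failed node and the standard RDP layout (symbol d_{i,j} on diagonal (i + j) mod p, row parity on node p-1, diagonal parity on node p); the function name is illustrative. Under these assumptions it reproduces the example above.

```python
def download_distribution(p, seq):
    """Symbols read from nodes 1..p when node 0 of an RDP array fails.

    seq is (x_0, ..., x_{p-2}, 0): x_i = 0 rebuilds d[i][0] from row i,
    x_i = 1 rebuilds it from diagonal i. The trailing 0 is ignored.
    """
    needed = set()
    for i, x in enumerate(seq[:p - 1]):
        if x == 1:
            for j in range(1, p):
                r = (i - j) % p        # row of diagonal i on node j
                if r != p - 1:         # skip the imaginary row p-1
                    needed.add((r, j))
            needed.add((i, p))         # diagonal parity symbol
        else:
            for j in range(1, p):
                needed.add((i, j))
    # Count the distinct symbols requested from each surviving node.
    return [sum(1 for _, j in needed if j == node) for node in range(1, p + 1)]


print(download_distribution(5, [0, 0, 1, 1, 0]))   # [3, 2, 2, 3, 2]
```

Because the requested symbols are kept in a set, a symbol shared by a row parity set and a diagonal parity set is counted once, which is why the total here is 12 rather than 16.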
16
Solving the Model (2)
- Step 1: use the recovery sequence to represent downloads
- Step 2: narrow down the search space by considering only min-read recovery sequences (i.e., those that download the minimum number of symbols during recovery)
- Step 3: reformulate the model as minimizing the recovery cost over the min-read recovery sequences
17
Expensive Enumeration
p     Total # of recovery sequences    # of min-read recovery sequences    # of unique min-read recovery sequences
5     16                               6                                   2
7     64                               20                                  4
11    1024                             252                                 26
13    4096                             924                                 74
17    65536                            12870                               698
19    262144                           48620                               2338
23    4194304                          705432                              28216
29    268435456                        40116600                            1302688
- Challenge: too many min-read recovery sequences to enumerate even after we narrow down the search space
- Observation: many min-read recovery sequences return the same download distribution
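The first two count columns of the table are consistent with simple closed forms: 2^(p-1) sequences in total (each of x_0 to x_{p-2} is 0 or 1), and a binomial coefficient for the min-read ones. The sketch below assumes that a min-read sequence is one in which exactly (p-1)/2 of the p-1 rows use their diagonal parity set; that characterisation is not stated on the slide, but it matches every row of the table.

```python
from math import comb


def sequence_counts(p):
    """Return (total sequences, min-read sequences) for prime p."""
    total = 2 ** (p - 1)                    # free choice for x_0..x_{p-2}
    min_read = comb(p - 1, (p - 1) // 2)    # half of the rows use diagonals
    return total, min_read


for p in (5, 7, 11, 13, 17, 19, 23, 29):
    print(p, *sequence_counts(p))
```

The number of unique sequences in the last column is far smaller than either count, which is what motivates pruning by equivalence instead of evaluating every min-read sequence.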
18
Optimize Enumeration Process
- Two conditions under which different recovery sequences have the same download distribution:
  - Shift condition: (0, 0, 0, 1, 1, 1, 0), (0, 0, 1, 1, 1, 0, 0), (0, 1, 1, 1, 0, 0, 0), (1, 1, 1, 0, 0, 0, 0), …
  - Reverse condition: (0, 0, 0, 1, 1, 1, 0) and (0, 1, 1, 1, 0, 0, 0)
- Key idea: not all recovery sequences need to be enumerated (details in the paper)
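A minimal sketch of how the sequences equivalent to a given one could be generated. It treats a shift as a cyclic rotation of the full length-p sequence and keeps only the results whose last entry is still 0; the wrap-around is an assumption of this sketch (the slide lists only shifts that do not wrap), chosen because it yields the unique-sequence counts reported for p = 5 and p = 7. The function name is illustrative.

```python
def equivalent_sequences(seq):
    """Sequences reachable from seq by cyclic shifts and reversal
    whose last entry is 0, as required of a recovery sequence."""
    p = len(seq)
    found = set()
    for base in (tuple(seq), tuple(reversed(seq))):
        for s in range(p):
            shifted = base[s:] + base[:s]     # rotate left by s positions
            if shifted[-1] == 0:
                found.add(shifted)
    return found


for member in sorted(equivalent_sequences((0, 0, 0, 1, 1, 1, 0))):
    print(member)
# (0, 0, 0, 1, 1, 1, 0)
# (0, 0, 1, 1, 1, 0, 0)
# (0, 1, 1, 1, 0, 0, 0)
# (1, 1, 1, 0, 0, 0, 0)
```

Only one member of each such group needs its recovery cost computed, since all members share a download distribution.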
19
Cost-based Heterogeneous Recovery (CHR) Algorithm: Intuition
- Step 1: initialize a bitmap to track all possible min-read recovery sequences R
- Step 2: compute the recovery cost of R
- Step 3: mark all shifted and reversed sequences of R as enumerated
- Step 4: switch to another R; return the one with minimum cost
20
Example
[Figure: a proxy, storage nodes 0 to 7, and a new node, with link bandwidths of 26, 68, 109, 110, 113, 10, 110, and 86 Mbps; the recovery solution of our proposed CHR algorithm is shown next to that of the hybrid approach [Xiang, ToS’11]]
21
Recovery Cost Comparison
- The CHR approach reduces the recovery cost by 25.89% compared with the hybrid approach
- The CHR approach reduces the recovery cost by 40.91% compared with the conventional approach
22
Simulation Studies (1): Traverse Efficiency
- Evaluate the computational time of CHR
p     Naive traverse time (ms)    CHR’s traverse time (ms)    Improved rate (%)
5     0.0220                      0.0100                      54.55
7     0.0950                      0.0310                      67.37
11    2.3160                      0.3910                      83.12
13    11.9840                     1.6150                      86.52
17    107.7410                    10.0790                     90.65
19    455.2760                    40.5370                     91.10
23    9230.7800                   691.2800                    92.51
29    752296.2700                 45423.5570                  93.96
- CHR reduces the traverse time of the naive approach by over 90% once p reaches 17, and the saving keeps growing with p
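The improved-rate column is consistent with the relative saving in traverse time, (naive − CHR) / naive. A small sketch of that arithmetic, with a function name chosen here for illustration:

```python
def improved_rate(naive_ms, chr_ms):
    """Percentage of traverse time saved relative to naive enumeration."""
    return (naive_ms - chr_ms) / naive_ms * 100


print(round(improved_rate(0.0220, 0.0100), 2))            # 54.55
print(round(improved_rate(752296.2700, 45423.5570), 2))   # 93.96
```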
23
Simulation Studies (2): Robustness Efficiency
- Evaluate whether CHR achieves the global optimum among all feasible recovery sequences
p     Probability of hitting the global optimum (%)    Max improvement of the global optimum (%)
5     94.9                                             6.12
7     94.5                                             5.54
11    93.6                                             5.98
13    93.2                                             6.46
17    92.8                                             5.97
19    93.1                                             5.73
- CHR has a very high probability (92.8% to 94.9%) of hitting the globally optimal recovery cost
24
Simulation Studies (3): Recovery Efficiency
- Evaluate the recovery efficiency of CHR in a heterogeneous storage environment via 100 runs for each p
- CHR can reduce recovery cost by up to 50% over the conventional approach
- CHR can reduce recovery cost by up to 30% over the hybrid approach
25
Experiments
- Experiments on a networked storage testbed
  - Conventional vs. Hybrid vs. CHR
  - Default chunk size = 1MB
  - Communication via ATA over Ethernet (AoE)
- Consider two codes: RDP and EVENODD
  - Only RDP results shown in this talk
- Recovery operation:
  - Read chunks from surviving nodes
  - Reconstruct lost chunks
  - Write reconstructed chunks to a new node
[Figure: testbed with the recovery process and the storage nodes connected through a Gigabit switch]
26
Experiments
- Two types of Ethernet interface cards are installed in the physical storage devices
  - 100Mbps: set weight = 1/(100Mbps)
  - 1Gbps: set weight = 1/(1Gbps)
Configuration for RDP code:
p     Total # of nodes    # of nodes with 100Mbps    # of nodes with 1Gbps
5     6                   2                          4
7     8                   3                          5
11    12                  5                          7
13    14                  6                          8
17    18                  9                          9
27
Different Number of Storage Nodes
- Total recovery time for RDP
- CHR improves on conventional recovery by 21-31%
- CHR improves on hybrid recovery by 15-20%
28
Different Chunk Size
- Total recovery time for RDP (p = 11)
- CHR improves on conventional recovery by 18-26%
- CHR improves on hybrid recovery by 14-19%
29
Different Failed Nodes
- Total recovery time for RDP (p = 11)
- CHR still outperforms conventional and hybrid recovery
30
Conclusions
- Address single-node failure recovery in RAID-6 coded heterogeneous storage systems
- Formulate a computation-efficient optimization model
- Propose a cost-based heterogeneous recovery algorithm
- Validate the effectiveness of the CHR algorithm through extensive simulations and testbed experiments
- Future work:
  - Different cost formulations
  - Extension for general XOR-based erasure codes
  - Degraded reads
- Source code: http://ansrlab.cse.cuhk.edu.hk/software/chr/
31
Backup
32
Cost-based Heterogeneous Recovery (CHR) Algorithm
Notation:
- F: a bitmap that identifies whether a min-read recovery sequence has been enumerated
- R, C: a min-read recovery sequence and its recovery cost
- R*, C*: the min-cost recovery sequence and its (minimum) total recovery cost
Algorithm:
1. Initialize F[0 … 2^(p-1) - 1] with 0-bits; initialize R with 1-bits followed by 0-bits; initialize R* with R; initialize C* with MAX_VALUE
2. If R is null, go to Step 4. Convert R into its integer value v; if R has already been enumerated, go to Step 3. Otherwise, mark all the shifted and reversed recovery sequences of R as enumerated; calculate the recovery cost C of R; update R* and C* if necessary
3. Get the next min-read recovery sequence R and go to Step 2
4. Finally, set R to all 0-bits; calculate the recovery cost C of R; update R* and C* if necessary
33
Example
[Figure: a proxy, storage nodes 0 to 7, and a new node, with link bandwidths of 26, 68, 109, 110, 113, 10, 110, and 86 Mbps]
- Step 1: Initialize F[0..63] with 0-bits, R = {1110000}, and the recovery cost C* = MAX_VALUE
- Step 2: F[7] = 1; mark R’s shifted and reversed recovery sequences: F[56] = F[28] = F[14] = 1. Calculate the recovery cost for R: C is 0.7353α; R* and C* are updated with R and C
- Step 3: Get the next min-read recovery sequence R and go to Step 2
- Step 4: Finally, we find that R* = {1010100} and C* = 0.5449α
[Figure: per-node download counts for the seven surviving nodes: 3, 5, 4, 4, 5, 3, 3]
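The steps of this example can be sketched end to end for node 0 of an RDP array. Several things here are assumptions rather than content of the slides: the standard RDP layout (d_{i,j} on diagonal (i + j) mod p), cyclic shifts that wrap around, and above all the assignment of bandwidths to nodes 1 to 7, which the transcript does not preserve. The list below is one hypothetical assignment chosen because it reproduces the quoted costs 0.7353α and 0.5449α. All function names are illustrative.

```python
from itertools import combinations


def download_distribution(p, diag_rows):
    """Symbols read from nodes 1..p; diag_rows = rows rebuilt from diagonals."""
    needed = set()
    for i in range(p - 1):
        if i in diag_rows:
            for j in range(1, p):
                r = (i - j) % p
                if r != p - 1:                 # imaginary row
                    needed.add((r, j))
            needed.add((i, p))
        else:
            needed.update((i, j) for j in range(1, p))
    return [sum(1 for _, j in needed if j == n) for n in range(1, p + 1)]


def cost(p, diag_rows, weights):
    return sum(w * y for w, y in zip(weights, download_distribution(p, diag_rows)))


def equivalents(p, diag_rows):
    """Bitmap indices of the shifted and reversed versions of a sequence."""
    out = set()
    for rows in (diag_rows, {p - 1 - i for i in diag_rows}):
        for s in range(p):
            shifted = {(i + s) % p for i in rows}
            if p - 1 not in shifted:           # last position must stay 0
                out.add(sum(1 << i for i in shifted))
    return out


def chr_search(p, weights):
    seen = [False] * (2 ** (p - 1))            # bitmap F
    best_rows, best_cost, evaluated = None, float("inf"), 0
    for combo in combinations(range(p - 1), (p - 1) // 2):
        rows = set(combo)
        if seen[sum(1 << i for i in rows)]:
            continue                           # an equivalent was evaluated
        for v in equivalents(p, rows):
            seen[v] = True
        c = cost(p, rows, weights)
        evaluated += 1
        if c < best_cost:
            best_rows, best_cost = rows, c
    c = cost(p, set(), weights)                # Step 4: all-zero sequence
    if c < best_cost:
        best_rows, best_cost = set(), c
    return best_rows, best_cost, evaluated


# Hypothetical bandwidths (Mbps) for nodes 1..7; weights are their inverses.
bandwidth = [68, 109, 110, 86, 110, 10, 113]
rows, c, evaluated = chr_search(7, [1 / b for b in bandwidth])
sequence = "".join("1" if i in rows else "0" for i in range(7))
print(sequence, round(c, 4), evaluated)        # 1010100 0.5449 4
```

Only 4 of the 20 min-read sequences have their cost computed, in line with the unique-sequence count reported for p = 7.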
34
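Recovery Cost Comparison
- The CHR approach reduces the recovery cost by 25.89% compared with the hybrid approach
- The CHR approach reduces the recovery cost by 40.91% compared with the conventional approach
[Figure: per-node download counts for the seven surviving nodes: 5, 4, 3, 3, 4, 5, 3]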
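The two percentages can be recomputed from the download distributions shown in the backup slides, with weights set to the inverse of each node's bandwidth. The bandwidth-to-node assignment below is hypothetical (the transcript loses that mapping); it is one assignment that reproduces the costs 0.7353α and 0.5449α quoted in the example. Reading (5, 4, 3, 3, 4, 5, 3) as the hybrid solution and taking 6 symbols from each of nodes 1 to 6 for the conventional solution are likewise assumptions based on the standard RDP layout.

```python
def recovery_cost(bandwidth_mbps, downloads):
    """Cost in units of alpha: sum of y_i / bandwidth_i."""
    return sum(y / b for y, b in zip(downloads, bandwidth_mbps))


# Hypothetical bandwidth assignment (Mbps) for nodes 1..7.
bandwidth = [68, 109, 110, 86, 110, 10, 113]

conventional = recovery_cost(bandwidth, [6, 6, 6, 6, 6, 6, 0])
hybrid = recovery_cost(bandwidth, [5, 4, 3, 3, 4, 5, 3])
chr_cost = recovery_cost(bandwidth, [3, 5, 4, 4, 5, 3, 3])

vs_conventional = (1 - chr_cost / conventional) * 100
vs_hybrid = (1 - chr_cost / hybrid) * 100

print(round(hybrid, 4), round(chr_cost, 4))    # 0.7353 0.5449
print(round(vs_conventional, 2))               # 40.91
```

With unrounded costs the saving over the hybrid approach comes out at about 25.9%; the 25.89% on the slide matches what the rounded costs 0.7353 and 0.5449 give. The gain comes mainly from reading 3 symbols instead of 5 from the 10 Mbps node.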
35
Different Number of Storage Nodes
- Consider the overall performance of the complete recovery operation for EVENODD
36
Different Chunk Size
- Evaluate the impact of chunk size on the recovery time performance for EVENODD
37
Different Failed Nodes
- Evaluate the recovery time performance for EVENODD when the failed node is in a different column