
A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes
Yunfeng Zhu (1), Patrick P. C. Lee (2), Liping Xiang (1), Yinlong Xu (1), Lingling Gao (1)
(1) University of Science and Technology of China
(2) The Chinese University of Hong Kong
DSN’12

Fault Tolerance
- Fault tolerance becomes more challenging in modern distributed storage systems
  - Increase in scale
  - Usage of inexpensive but less reliable storage nodes
- Fault tolerance is ensured by introducing redundancy across storage nodes
  - Replication
  - Erasure codes (e.g., Reed-Solomon codes)
[Figure: blocks A and B stored under replication (copies of A and B) and under an erasure code (A, B, A+B, A+2B)]

XOR-Based Erasure Codes
- Encoding/decoding involve XOR operations only
  - Low computational overhead
- Different redundancy levels
  - 2-fault tolerant: RDP, EVENODD, X-Code
  - 3-fault tolerant: STAR
  - General-fault tolerant: Cauchy Reed-Solomon (CRS)

Failure Recovery
- Recovering node failures is necessary
  - Preserve the required redundancy level
  - Avoid data unavailability
- Single-node failure recovery
  - Single-node failure occurs more frequently than a concurrent multi-node failure

Example: Recovery in RDP
- An RDP code example with 8 nodes
[Figure: a 6 x 8 array of symbols d_{i,j} (row i, node j); nodes 0 to 5 hold data, node 6 holds the row parity, node 7 holds the diagonal parity]
- Let’s say node 0 fails. How do we recover node 0?

Conventional Recovery
- Idea: use only row parity sets. Recover each lost data symbol (i.e., data chunk) independently
[Figure: RDP array over node 0 to node 7 showing the symbols read]
- Read symbols: 36
- Then how do we recover node 0 efficiently?
  - Different metrics can be used to measure the efficiency of a recovery scheme
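A minimal sketch of the conventional scheme for the 8-node example (p = 7) follows. It assumes the usual RDP layout, in which node p-1 holds row parity, node p holds diagonal parity, symbol (r, j) lies on diagonal (r + j) mod p, and diagonal p-1 is not stored. The function names `rdp_encode` and `conventional_recover` are hypothetical, chosen only for this illustration.

```python
import random
from functools import reduce
from operator import xor

def rdp_encode(data, p):
    """Encode a (p-1) x (p-1) block of data symbols into a (p-1) x (p+1) RDP array."""
    arr = [row[:] + [0, 0] for row in data]
    for i in range(p - 1):
        arr[i][p - 1] = reduce(xor, data[i])      # row parity on node p-1
    for i in range(p - 1):                        # diagonal i holds cells with (r + j) mod p == i
        acc = 0
        for j in range(p):                        # nodes 0..p-1 (data and row parity)
            r = (i - j) % p
            if r != p - 1:                        # row p-1 is imaginary (all zeros)
                acc ^= arr[r][j]
        arr[i][p] = acc                           # diagonal parity on node p
    return arr

def conventional_recover(arr, p, failed):
    """Rebuild a failed data node from row parity sets only; return (column, symbols_read)."""
    column, reads = [], 0
    for i in range(p - 1):
        value = 0
        for j in range(p):                        # every surviving node except the diagonal parity
            if j != failed:
                value ^= arr[i][j]
                reads += 1
        column.append(value)
    return column, reads

random.seed(1)
p = 7
data = [[random.randrange(256) for _ in range(p - 1)] for _ in range(p - 1)]
arr = rdp_encode(data, p)
column, reads = conventional_recover(arr, p, failed=0)
print(column == [row[0] for row in data], reads)  # True 36
```

Each of the p-1 lost symbols costs p-1 reads, which gives the 36 reads quoted on the slide.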

Minimize Number of Read Symbols
- Idea: use a combination of row and diagonal parity sets to maximize overlapping symbols [Xiang, ToS’11]
[Figure: RDP array over node 0 to node 7 showing the symbols read]
- Read symbols: 27
- Improve rate: 25%
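The two read counts (36 and 27) can be checked with a short counting argument. This is a sketch for failed node 0 of RDP, and it is not taken from the slides. It assumes that every row or diagonal parity set needs p-1 surviving symbols, and that a row set and a diagonal set used for different lost symbols share exactly one surviving symbol. Here a is the number of lost symbols recovered through diagonal parity sets.

```latex
\begin{aligned}
\mathrm{reads}(a) &= (p-1)^2 - a\,(p-1-a), \\
\mathrm{reads}(0) &= (p-1)^2 = 36 \quad (p = 7), \\
\min_a \mathrm{reads}(a) &= \mathrm{reads}\!\left(\tfrac{p-1}{2}\right) = \tfrac{3}{4}(p-1)^2 = 27 \quad (p = 7), \\
1 - \tfrac{27}{36} &= 25\%.
\end{aligned}
```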

Need A New Metric?
- A modern storage system is naturally composed of heterogeneous types of storage nodes
  - System upgrades
  - New node addition
- A heterogeneous environment
[Figure: a proxy, node 0 to node 7, and a new node, with link bandwidths 26Mbps, 68Mbps, 109Mbps, 110Mbps, 113Mbps, 10Mbps, 110Mbps, 86Mbps]
- Need a new efficient failure recovery solution for heterogeneous environments!

Related Work
- Hybrid recovery
  - Minimize number of read symbols
  - RAID-6 XOR-based erasure codes, e.g., RDP [Xiang, ToS’11], EVENODD [Wang, Globecom’10]
- Enumeration recovery [Khan, FAST’12]
  - Enumerate all recovery possibilities to achieve optimal recovery for general XOR-based erasure codes
- Greedy recovery [Zhu, MSST’12]
  - Efficient search of recovery solutions for general XOR-based erasure codes
- Regenerating codes [Dimakis, ToIT’10]
  - Nodes encode data during recovery
  - Minimize recovery bandwidth
  - Heterogeneous case considered in [Li, Infocom’10], but requires node encoding and collaboration

Challenges
- How to enable efficient failure recovery for heterogeneous settings?
  - Minimizing # of read symbols → homogeneous settings
  - Performance bottlenecked by poorly performing nodes
- How to quickly find the recovery strategy?
  - Minimizing # of read symbols → deterministic metric
  - Minimizing general cost → non-deterministic metric
  - Recovery decision typically can’t be pre-determined

Our Contributions
- Target two RAID-6 codes: RDP and EVENODD
  - XOR-based encoding operations
- Goals:
  - Minimize search time
  - Minimize recovery cost
- Cost-based single-node failure recovery for heterogeneous distributed storage systems

Our Contributions
- Formulate an optimization problem for single-node failure recovery in heterogeneous settings
- Propose a cost-based heterogeneous recovery (CHR) algorithm
  - Narrow down search space
  - Suitable for online recovery
- Implement and experiment on a heterogeneous networked storage testbed

Model Formulation
- Our formulation:
  Node:                   V_0 (node 0), V_1 (node 1), ..., V_k (node k), ..., V_{p-1} (node p-1), V_p (node p)
  Weight:                 w_0, w_1, ..., w_{p-1}, w_p
  Download distribution:  y_0, y_1, ..., y_{p-1}, y_p
- Objective: minimizing total recovery cost
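The objective formula on this slide did not survive in the transcript. One form consistent with the notation above and with the "Physical Meanings" slide is the weighted sum below. The index set is an assumption: it takes V_k to be the failed node and sums over the surviving nodes only.

```latex
\min \; C = \sum_{\substack{i = 0 \\ i \neq k}}^{p} w_i \, y_i
```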

Physical Meanings (weight w_i and the resulting meaning of the cost C)
- w_i = 1 for all i: C is the total number of symbols being read from surviving nodes
- w_i = inverse of transmission bandwidth of node V_i: C is the total amount of transmission time to download symbols from surviving nodes
- w_i = monetary cost of migrating per unit of data outbound from node V_i: C is the total monetary cost of migrating data from surviving nodes (or clouds)

Solving the Model
- Important: which symbols are fetched from surviving nodes must follow the inherent rules of the specific coding scheme
- To solve the model, we introduce the recovery sequence (x_0, x_1, ..., x_{p-2}, 0)
  - x_i = 0: d_{i,k} is recovered from its row parity set
  - x_i = 1: d_{i,k} is recovered from its diagonal parity set
- An example:
  [Figure: a 4 x 6 array of symbols d_{i,j} over node 0 to node 5]
  - download distribution: (3, 2, 2, 3, 2)
  - recovery sequence: (0, 0, 1, 1, 0)
- 1) Each recovery sequence represents a feasible recovery solution; 2) the download distribution can be represented by the recovery sequence
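The mapping from a recovery sequence to a download distribution can be sketched as follows. This is an illustration and not the authors' code. It assumes the failed node is node 0 and that symbol (r, j) lies on diagonal (r + j) mod p, as in the standard RDP layout. The name `download_distribution` is hypothetical.

```python
def download_distribution(p, x):
    """Symbols read from nodes 1..p when node 0 of an RDP array fails.

    x[i] = 0 recovers the lost symbol d_{i,0} from row parity set i;
    x[i] = 1 recovers it from diagonal parity set i.
    Only the first p-1 entries of x are used (the final entry is fixed to 0).
    """
    needed = set()                                  # (row, node) pairs that must be read
    for i in range(p - 1):
        if x[i] == 0:
            needed.update((i, j) for j in range(1, p))   # rest of row i, including row parity
        else:
            for j in range(1, p):
                r = (i - j) % p                     # row where diagonal i crosses node j
                if r != p - 1:                      # imaginary all-zero row: nothing to read
                    needed.add((r, j))
            needed.add((i, p))                      # diagonal parity symbol on node p
    return [sum(1 for (_, j) in needed if j == n) for n in range(1, p + 1)]

print(download_distribution(5, (0, 0, 1, 1, 0)))   # [3, 2, 2, 3, 2]
```

Collecting the reads in a set counts a symbol shared by a row set and a diagonal set only once, which is where the savings over conventional recovery come from.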

Solving the Model (2)
- Step 1: use recovery sequence to represent downloads
- Step 2: narrow down search space by only considering min-read recovery sequences (i.e., download minimum number of read symbols during recovery)
- Step 3: reformulate the model as: Minimize [equation missing from the transcript]

Expensive Enumeration

p     Total # of recovery sequences   # of min-read recovery sequences   # of unique min-read recovery sequences
5     16                              6                                  2
7     64                              20                                 4
11    1024                            252                                26
13    4096                            924                                74
17    65536                           12870                              698
19    262144                          48620                              2338
23    4194304                         705432                             28216
29    268435456                       40116600                           1302688

- Challenge: too many min-read recovery sequences to enumerate even after we narrow down the search space
- Observation: many min-read recovery sequences return the same download distribution
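The first two count columns follow from closed forms, sketched below under two assumptions. First, each of the p-1 lost symbols independently picks its row or its diagonal parity set. Second, a min-read sequence is one in which exactly (p-1)/2 symbols use the diagonal. The helper name `sequence_counts` is hypothetical. The third column depends on which sequences share a download distribution and is not reproduced here.

```python
from math import comb

def sequence_counts(p):
    """Return (total recovery sequences, min-read recovery sequences) for prime p."""
    total = 2 ** (p - 1)                     # one binary choice per lost symbol
    min_read = comb(p - 1, (p - 1) // 2)     # half of the symbols use diagonal parity
    return total, min_read

for p in (5, 7, 11, 13, 17, 19, 23, 29):
    print(p, *sequence_counts(p))
```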

Optimize Enumeration Process
- Two conditions under which different recovery sequences have the same download distribution:
  - Shift condition, e.g.: (0, 0, 0, 1, 1, 1, 0), (0, 0, 1, 1, 1, 0, 0), (0, 1, 1, 1, 0, 0, 0), (1, 1, 1, 0, 0, 0, 0), ...
  - Reverse condition, e.g.: (0, 0, 0, 1, 1, 1, 0) and (0, 1, 1, 1, 0, 0, 0)
- Key idea: not all recovery sequences need to be enumerated (details in the paper)
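A small sketch of how the two conditions could be turned into code: it generates every cyclic shift of a sequence and of its reversal, keeping only those whose final entry is 0, since the recovery sequence fixes that position. Reading the shift as a cyclic one is an assumption, and the helper name `equivalents` is hypothetical.

```python
def equivalents(seq):
    """Valid sequences reachable from seq by cyclic shifts and by reversal."""
    p = len(seq)
    found = set()
    for base in (tuple(seq), tuple(reversed(seq))):
        for s in range(p):
            candidate = base[s:] + base[:s]       # cyclic shift by s positions
            if candidate[-1] == 0:                # last position must stay 0
                found.add(candidate)
    return found

for candidate in sorted(equivalents((0, 0, 0, 1, 1, 1, 0))):
    print(candidate)
```

For the sequence on the slide this yields exactly the four sequences listed under the shift condition, so only one of them needs its cost computed.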

Cost-based Heterogeneous Recovery (CHR) Algorithm: Intuition
- Step 1: initialize a bitmap to track all possible min-read recovery sequences R
- Step 2: compute recovery cost of R
- Step 3: mark all shifted and reverse sequences of R as being enumerated
- Step 4: switch to another R; return the one with minimum cost

Example
[Figure: a proxy, node 0 to node 7, and a new node, with link bandwidths 26Mbps, 68Mbps, 109Mbps, 110Mbps, 113Mbps, 10Mbps, 110Mbps, 86Mbps; recovery solutions of our proposed CHR algorithm and of the hybrid approach [Xiang, ToS’11]]

Recovery Cost Comparison
[Figure: recovery costs of the CHR approach, the hybrid approach, and the conventional approach; annotations: reduce by 25.89%, reduce by 40.91%]

Simulation Studies (1): Traverse Efficiency
- Evaluate the computational time of CHR

p     Naive traverse time (ms)   CHR’s traverse time (ms)   Improved rate (%)
5     0.0220                     0.0100                     54.55
7     0.0950                     0.0310                     67.37
11    2.3160                     0.3910                     83.12
13    11.9840                    1.6150                     86.52
17    107.7410                   10.0790                    90.65
19    455.2760                   40.5370                    91.10
23    9230.7800                  691.2800                   92.51
29    752296.2700                45423.5570                 93.96

- CHR significantly reduces the traverse time of the naive approach by over 90% as p increases!

Simulation Studies (2): Robustness Efficiency
- Evaluate whether CHR achieves the global optimum among all the feasible recovery sequences

p     Hit global optimal probability (%)   Global optimal max improvement (%)
5     94.9                                 6.12
7     94.5                                 5.54
11    93.6                                 5.98
13    93.2                                 6.46
17    92.8                                 5.97
19    93.1                                 5.73

- CHR has a very high probability (over 92%) of hitting the global optimal recovery cost!

Simulation Studies (3): Recovery Efficiency
- Evaluate, via 100 runs for each p, the recovery efficiency of CHR in a heterogeneous storage environment
- CHR can reduce recovery cost by up to 50% over the conventional approach
- CHR can reduce recovery cost by up to 30% over the hybrid approach

Experiments
- Experiments on a networked storage testbed
  - Conventional vs. Hybrid vs. CHR
  - Default chunk size = 1MB
  - Communication via ATA over Ethernet (AoE)
  - Consider two codes: RDP and EVENODD
  - Only RDP results shown in this talk
- Recovery operation:
  - Read chunks from surviving nodes
  - Reconstruct lost chunks
  - Write reconstructed chunks to a new node
[Figure: testbed with the recovery process and the storage nodes connected through a Gigabit switch]

Experiments
- Two types of Ethernet interface cards installed in the physical storage devices
  - 100Mbps → set weight = 1/(100Mbps)
  - 1Gbps → set weight = 1/(1Gbps)

Configuration for RDP code:
p     Total # of nodes   # of nodes with 100Mbps   # of nodes with 1Gbps
5     6                  2                         4
7     8                  3                         5
11    12                 5                         7
13    14                 6                         8
17    18                 9                         9

Different Number of Storage Nodes
- Total recovery time for RDP
  - CHR improves conventional by 21-31%
  - CHR improves hybrid by 15-20%

Different Chunk Size
- Total recovery time for RDP (p = 11)
  - CHR improves conventional by 18-26%
  - CHR improves hybrid by 14-19%

Different Failed Nodes
- Total recovery time for RDP (p = 11)
  - CHR still outperforms conventional and hybrid

Conclusions
- Address single-node failure recovery in RAID-6 coded heterogeneous storage systems
- Formulate a computation-efficient optimization model
- Propose a cost-based heterogeneous recovery algorithm
- Validate the effectiveness of the CHR algorithm through extensive simulations and testbed experiments
- Future work:
  - Different cost formulations
  - Extension for general XOR-based erasure codes
  - Degraded reads
- Source code: http://ansrlab.cse.cuhk.edu.hk/software/chr/

Backup

Cost-based Heterogeneous Recovery (CHR) Algorithm
Notation:
  F        A bitmap that identifies if a min-read recovery sequence has been enumerated
  R, C     A min-read recovery sequence with its recovery cost
  R*, C*   The min-cost recovery sequence with the minimum total recovery cost
Algorithm:
  1. Initialize F[0...2^(p-1) - 1] with 0-bits; initialize R with 1-bits followed by 0-bits; initialize R* with R; initialize C* with MAX_VALUE
  2. If R is null, then go to Step 4; convert R into integer value v; if R has already been enumerated, then go to Step 3; mark all the shifted and reverse recovery sequences of R as being enumerated; calculate the recovery cost C of R; update R* and C* if necessary
  3. Get the next min-read recovery sequence R and go to Step 2
  4. Finally, initialize R with all 0-bits; calculate the recovery cost C of R; update R* and C* if necessary
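The steps above can be sketched in Python as follows. This is an illustrative reading of the slide and not the authors' released code. It assumes RDP with failed node 0, the standard diagonal numbering (r + j) mod p, and cyclic shifts. The names `distribution` and `chr_search` are hypothetical, and `weights[n-1]` is taken to be the weight of surviving node n.

```python
from itertools import combinations

def distribution(p, x):
    """Symbols read from nodes 1..p for recovery sequence x (failed node 0)."""
    row_rows = {i for i in range(p - 1) if x[i] == 0}
    diag_rows = [i for i in range(p - 1) if x[i] == 1]
    dist = []
    for j in range(1, p):
        rows = row_rows | {(i - j) % p for i in diag_rows}
        rows.discard(p - 1)                    # imaginary all-zero row
        dist.append(len(rows))
    dist.append(len(diag_rows))                # one diagonal parity symbol per diagonal set
    return dist

def chr_search(p, weights):
    def cost(x):
        return sum(w * y for w, y in zip(weights, distribution(p, x)))

    def index(x):                              # integer value v: bit i of v is x[i]
        return sum(bit << i for i, bit in enumerate(x))

    F = [False] * (2 ** (p - 1))               # Step 1: bitmap of enumerated sequences
    best_seq, best_cost = None, float("inf")
    for ones in combinations(range(p - 1), (p - 1) // 2):   # min-read sequences
        R = tuple(1 if i in ones else 0 for i in range(p))
        if F[index(R)]:
            continue                           # an equivalent sequence was already costed
        for base in (R, R[::-1]):              # Step 2: mark shifted and reverse sequences
            for s in range(p):
                candidate = base[s:] + base[:s]
                if candidate[-1] == 0:
                    F[index(candidate)] = True
        C = cost(R)
        if C < best_cost:
            best_seq, best_cost = R, C
    R = (0,) * p                               # Step 4: the all-row (conventional) sequence
    C = cost(R)
    if C < best_cost:
        best_seq, best_cost = R, C
    return best_seq, best_cost

print(chr_search(7, [1] * 7))                  # unit weights: cost is the number of reads
```

With unit weights every min-read sequence costs the same, so the search returns the first one it tries, which is the "1-bits followed by 0-bits" sequence from Step 1.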

Example
[Figure: a proxy, node 0 to node 7, and a new node, with link bandwidths 26Mbps, 68Mbps, 109Mbps, 110Mbps, 113Mbps, 10Mbps, 110Mbps, 86Mbps]
- Step 1: Initialize F[0..63] with 0-bits, R = {1110000}, the recovery cost C = MAX_VALUE
- Step 2: F[7]=1, mark R’s shifted and reverse recovery sequences: F[56]=F[28]=F[14]=1; calculate the recovery cost for R, C will be 0.7353α; R*, C* will be updated by R, C
- Step 3: Get the next min-read recovery sequence R and go to Step 2
- Step 4: Finally, we can find that R* = {1010100} and C* = 0.5449α
[Figure: node 0 to node 7 with the number of symbols read per surviving node: 3, 5, 4, 4, 5, 3, 3]

Recovery Cost Comparison
[Figure: recovery costs of the CHR approach, the hybrid approach, and the conventional approach; annotations: reduce by 25.89%, reduce by 40.91%]
[Figure: node 0 to node 7 with the number of symbols read per surviving node: 5, 4, 3, 3, 4, 5, 3]
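The reported costs can be cross-checked from the two download distributions shown on the backup slides, using weights equal to the inverse of each node's bandwidth. The pairing of bandwidths to nodes is not preserved in this transcript. The assignment below is an assumption, chosen because it reproduces the reported 0.5449α and 0.7353α. Costs are in units of α per Mbps, with the symbol size α factored out.

```python
# Assumed bandwidth (Mbps) of surviving nodes 1..7. This pairing is an assumption:
# the figure's node-to-bandwidth mapping is lost in the transcript.
bandwidth = [68, 109, 110, 86, 110, 10, 113]
weights = [1 / b for b in bandwidth]           # w_i = inverse of transmission bandwidth

def cost(download_distribution):
    """Weighted sum of symbols read from nodes 1..7."""
    return sum(w * y for w, y in zip(weights, download_distribution))

chr_cost = cost([3, 5, 4, 4, 5, 3, 3])             # distribution for R* = {1010100}
hybrid_cost = cost([5, 4, 3, 3, 4, 5, 3])          # distribution for R = {1110000}
conventional_cost = cost([6, 6, 6, 6, 6, 6, 0])    # row parity only: 6 symbols from nodes 1..6

print(round(chr_cost, 4), round(hybrid_cost, 4))   # 0.5449 0.7353
print(f"CHR vs. hybrid: {100 * (1 - chr_cost / hybrid_cost):.2f}% lower")
print(f"CHR vs. conventional: {100 * (1 - chr_cost / conventional_cost):.2f}% lower")
```

Under this assumed pairing, the two reductions agree with the 25.89% and 40.91% annotations to within rounding. This suggests that the first annotation compares CHR with the hybrid approach and the second compares CHR with the conventional approach.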

Different Number of Storage Nodes
- Consider the overall performance of the complete recovery operation for EVENODD

Different Chunk Size
- Evaluate the impact of chunk size for EVENODD on the recovery time performance

Different Failed Nodes
- Evaluate the recovery time performance for EVENODD when the failed node is in a different column
