A Fault Tolerance Protocol for Uploads: Design and Evaluation. Leslie Cheung*, Cheng-Fu Chou#, Leana Golubchik*, Yan Yang*. *Internet Multimedia Lab, Computer Science Department & IMSC, University of Southern California. #Department of Computer Science and Information Engineering, National Taiwan University.
Background and Motivation Bistro: A scalable, secure, wide area upload architecture ISPA '04
Problem: When some intermediaries fail or are malicious, the original protocol does not perform well: clients need to be asked to retransmit lost data. Goals: Improve performance by reducing retransmissions; reduce the amount of redundant data.
Outline of Fault Tolerance Protocol. Erasure code: Let k be the number of data packets; an erasure code encoder adds (n-k) parity packets to produce a file of n packets. Erasure codes assume that received packets are correct. This assumption is invalid because data can be corrupted. Solution: Use checksums to detect corrupted packets, drop the corrupted packets, and treat them as losses.
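The idea of treating corrupted packets as losses can be sketched as follows. This is a minimal illustration, not the protocol's actual code: it assumes the simplest possible erasure code (a single XOR parity packet, so n = k + 1), equal-length packets, CRC-32 as the checksum, and checksums that reach the receiver intact. The function names `encode` and `decode` are hypothetical.

```python
import zlib


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_packets):
    """Append one XOR parity packet (n = k + 1) and attach a CRC-32 to each packet."""
    parity = bytes(len(data_packets[0]))  # all-zero packet of the same length
    for pkt in data_packets:
        parity = xor_bytes(parity, pkt)
    return [(pkt, zlib.crc32(pkt)) for pkt in data_packets + [parity]]


def decode(received):
    """Recover the k data packets, or return None if a retransmission is needed.

    `received` has n entries: (payload, checksum), or None for a lost packet.
    """
    payloads = []
    for entry in received:
        if entry is not None and zlib.crc32(entry[0]) == entry[1]:
            payloads.append(entry[0])
        else:
            payloads.append(None)  # lost or corrupted: both count as a loss

    missing = [i for i, p in enumerate(payloads) if p is None]
    if len(missing) > 1:
        return None  # more losses than this single-parity code can repair
    if missing:
        good = [p for p in payloads if p is not None]
        rebuilt = bytes(len(good[0]))
        for p in good:
            rebuilt = xor_bytes(rebuilt, p)
        payloads[missing[0]] = rebuilt
    return payloads[:-1]  # strip the parity packet
```

For example, if one payload in the output of `encode` is altered in transit, `decode` discards it on the checksum mismatch and rebuilds it from the remaining packets; with two or more bad packets it returns `None`, which corresponds to asking the client to retransmit.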
Outline of Fault Tolerance Protocol. Definition: Checksum groups. Generate one checksum for each group of packets.
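A checksum group can be sketched as a hash over the concatenation of the packets in the group. The sketch below assumes fixed-size packets, SHA-256 as the checksum function, and a group count Z that divides the number of packets; these choices and the function names are illustrative assumptions, not taken from the protocol. When a group's checksum does not match, every packet in that group is dropped.

```python
import hashlib


def group_checksums(packets, z):
    """Split `packets` into z equal checksum groups and hash each group."""
    if len(packets) % z != 0:
        raise ValueError("this sketch assumes z divides the number of packets")
    size = len(packets) // z
    return [
        hashlib.sha256(b"".join(packets[i:i + size])).hexdigest()
        for i in range(0, len(packets), size)
    ]


def verify_groups(received, checksums):
    """Replace every packet of a failing checksum group with None."""
    size = len(received) // len(checksums)
    result = []
    for g, start in enumerate(range(0, len(received), size)):
        group = received[start:start + size]
        ok = (
            all(p is not None for p in group)
            and hashlib.sha256(b"".join(group)).hexdigest() == checksums[g]
        )
        # One bad or missing packet invalidates the whole group.
        result.extend(group if ok else [None] * len(group))
    return result
```

Using one checksum per group rather than per packet reduces the number of checksums that must be sent, at the price of discarding good packets that share a group with a bad one.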
Analytical Models: Reliability Model, Performance Model, Cost Function. Reliability Model: Packets are lost independently with probability p. Metric (c1): probability of retransmissions. Performance Model: Performance at the first step; size of the timestamp request messages. Metric (c2): number of checksums per data packet. Cost Function: Cost = w1 * c1 + w2 * c2, where w1 and w2 are weights.
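The cost function can be evaluated numerically once c1 and c2 are given concrete forms. The sketch below is an illustrative stand-in, not the models from the paper. It assumes a single FEC group of n packets split into Z equal checksum groups, independent packet loss with probability p, a code that can decode from any k surviving packets, a checksum group that is dropped whenever any of its packets is lost, c1 taken as the probability that the FEC group cannot be decoded, and c2 taken as Z / k. All function names are hypothetical.

```python
from math import comb


def retransmission_probability(n, k, z, p):
    """P(one FEC group cannot be decoded) under the assumptions stated above."""
    if n % z != 0:
        raise ValueError("this sketch assumes z divides n")
    group_size = n // z
    # A checksum group survives only if every one of its packets arrives.
    q = (1.0 - p) ** group_size
    # Fewest surviving checksum groups that still supply k packets (ceiling division).
    needed = -(-k // group_size)
    # Probability that fewer than `needed` of the z checksum groups survive.
    return sum(comb(z, i) * q**i * (1.0 - q) ** (z - i) for i in range(needed))


def checksums_per_data_packet(k, z):
    """Assumed form of c2: z checksums shared among k data packets."""
    return z / k


def cost(n, k, z, p, w1=0.9, w2=0.1):
    """Cost = w1 * c1 + w2 * c2."""
    c1 = retransmission_probability(n, k, z, p)
    c2 = checksums_per_data_packet(k, z)
    return w1 * c1 + w2 * c2
```

Calling `cost(20, 10, z, 0.01)` for several values of `z` shows the trade-off the weights express: more checksum groups lower the assumed retransmission probability but raise the checksum overhead.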
Numerical Results. Vary different parameters in the cost function. Parameters of interest: (n-k), the number of parity packets in an FEC group; k, the number of data packets in an FEC group (few large FEC groups vs. many small FEC groups); Z, the number of checksum groups in an FEC group; p, the probability of losing a packet.
Numerical Results: Varying (n-k), the number of parity packets per FEC group. Y = 5, k = 10, Z = 2,3,…, p = 0.01, w1 = 0.9, w2 = 0.1.
Numerical Results: Varying k, the number of data packets per FEC group. W = 100, n = 2k, Z = 2, p = 0.01, w1 = 0.9, w2 = 0.1.
Numerical Results: Varying Z, the number of checksum groups per FEC group. Y = 5, n = 20, k = 10, p = 0.01, w1 = 0.9, w2 = 0.1.
Numerical Results: Varying p, the probability of losing a packet. Y = 5, n = 20, k = 10, Z = 2, w1 = 0.9, w2 = 0.1.
Conclusions and Future Work. Conclusions: Fault tolerance is important in uploads. Our protocol is a step in the right direction. Future Work: How to set the parameters? Striping: reliability and performance. Data collection problem (not all packets are needed with an erasure code).
Ordering of FEC group & checksum group. Can we reverse the order of the FEC group and the checksum group? No. We drop all packets in a checksum group if the checksum check fails. If we reversed the order, losing one packet would result in dropping all packets in a checksum group, which would consist of a number of FEC groups. No packets would remain to recover the lost part.
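The ordering argument can be checked with a small sketch. It assumes packets are numbered consecutively, each FEC group owns n consecutive packets, checksum groups are consecutive runs of a fixed size, and the code can decode an FEC group from any k of its n packets. The function name and the example values are hypothetical.

```python
def decodable_fec_groups(n, k, num_fec_groups, checksum_group_size, lost):
    """Return, for each FEC group, whether at least k of its n packets survive.

    Packets are numbered 0 .. n * num_fec_groups - 1, and FEC group g owns
    packets g * n .. g * n + n - 1. `lost` is a set of lost packet numbers.
    """
    total = n * num_fec_groups
    kept = [0] * num_fec_groups
    for start in range(0, total, checksum_group_size):
        members = range(start, min(start + checksum_group_size, total))
        if any(i in lost for i in members):
            continue  # checksum check fails: drop every packet in the group
        for i in members:
            kept[i // n] += 1
    return [count >= k for count in kept]


n, k, z = 20, 10, 2
lost = {3}  # a single lost packet

# Checksum groups inside each FEC group (n // z packets per checksum group).
protocol_order = decodable_fec_groups(n, k, 2, n // z, lost)

# Reversed: one checksum group spanning both FEC groups.
reversed_order = decodable_fec_groups(n, k, 2, 2 * n, lost)
```

With checksum groups nested inside FEC groups, the single loss removes only half of one FEC group and both groups remain decodable. With the order reversed, the same loss discards every packet of both FEC groups, so nothing is left to decode from.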