Symmetric Allocations for Distributed Storage

Presentation transcript:

Symmetric Allocations for Distributed Storage. Derek Leong (1), Alexandros G. Dimakis (2), Tracey Ho (1). (1) California Institute of Technology, USA; (2) University of Southern California, USA. GLOBECOM 2010, 2010-12-09.

A Motivating Example Suppose you have a distributed storage system comprising 5 storage devices (“nodes”), numbered 1 to 5…

A Motivating Example Each node independently fails with probability 1/3, and survives with probability 2/3… For instance, the failure pattern shown in the figure, in which nodes 2 and 4 fail while nodes 1, 3, and 5 survive, occurs with probability $(1/3)^2 (2/3)^3 \approx 0.0329218$.

A Motivating Example Each node independently fails with probability 1/3, and survives with probability 2/3… The pattern in which all 5 nodes fail occurs with probability $(1/3)^5 \approx 0.00411523$.

A Motivating Example You are given a single data object of unit size, and a total storage budget of 7/3…

A Motivating Example You can use any coding scheme to store any amount of coded data in each node, as long as the total amount of storage used is at most the given budget 7/3…

A Motivating Example [Figure: an example allocation of coded data over the 5 nodes. The bit strings shown for nodes 1 to 5 have lengths in the ratio 3 : 2 : 1 : 1 : 0, so node 1 stores the most data and node 5 stores nothing.]

A Motivating Example [Figure: the same example allocation, shown with a failure pattern of probability $(1/3)^2 (2/3)^3 \approx 0.0329218$ and a question mark over whether the data object can still be recovered.]

A Motivating Example For maximum reliability, we need to find (1) an optimal allocation of the given budget over the nodes, and (2) an optimal coding scheme that jointly maximize the probability of successful recovery

A Motivating Example Using an appropriate code, successful recovery occurs whenever the data collector accesses at least a unit amount of data (= size of the original data object). [Figure: source S, storage nodes 1 to 5, and data collectors t1 and t2.]

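A Motivating Example

Allocation   Node 1   Node 2   Node 3   Node 4   Node 5   Recovery probability for p = 2/3
A            7/15     7/15     7/15     7/15     7/15     0.79012
B            7/6      7/6      0        0        0        0.88889
C            2/3      2/3      1/3      1/3      1/3      0.90535

Allocation C, the one highlighted on the slide, has the highest recovery probability of the three.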
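The three recovery probabilities in this example can be checked by brute force. The following is a minimal sketch, assuming the success criterion stated earlier in the deck (recovery succeeds when the surviving nodes hold at least a unit amount of data in total); the function name `recovery_probability` is illustrative and not from the slides. It enumerates all subsets of surviving nodes, which is only practical for small n.

```python
from fractions import Fraction
from itertools import combinations


def recovery_probability(allocation, p):
    """Exact recovery probability, by enumerating every subset of surviving nodes.

    allocation: amount of coded data stored in each node
    p: probability that a node survives (is accessible)
    """
    n = len(allocation)
    total = Fraction(0)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            # Recovery succeeds if the accessed nodes hold at least one unit.
            if sum(allocation[i] for i in subset) >= 1:
                total += p**r * (1 - p) ** (n - r)
    return total


p = Fraction(2, 3)
allocations = {
    "A": [Fraction(7, 15)] * 5,
    "B": [Fraction(7, 6)] * 2 + [Fraction(0)] * 3,
    "C": [Fraction(2, 3)] * 2 + [Fraction(1, 3)] * 3,
}
results = {name: recovery_probability(x, p) for name, x in allocations.items()}
for name, prob in results.items():
    print(name, prob, round(float(prob), 5))
```

Exact rational arithmetic is used so that the comparison against the threshold 1 is not affected by floating-point rounding. Each allocation uses the full budget of 7/3.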

Problem Formulation Given n nodes, access probability p, and total storage budget T, find an optimal allocation $(x_1, \dots, x_n)$ that maximizes the probability of successful recovery:
maximize $\sum_{A \subseteq \{1,\dots,n\}} p^{|A|} (1-p)^{n-|A|} \, \mathbf{1}\!\left[\sum_{i \in A} x_i \ge 1\right]$ (recovery probability)
subject to $\sum_{i=1}^{n} x_i \le T$ and $x_i \ge 0$ for all i (budget constraint)
The recovery probability is #P-hard to compute for a given allocation and choice of p. The optimal allocation also tells us whether coding is beneficial for reliable storage. Trivial cases of minimum and maximum budgets: when T = 1, the allocation (1, 0, …, 0) is optimal; when T = n, the allocation (1, 1, …, 1) is optimal.

Related Work Discussion between R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley, 2005; S. Jain, M. Demmer, R. Patra, K. Fall, “Using redundancy to cope with failures in a delay tolerant network,” SIGCOMM 2005.

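Symmetric Allocations We are particularly interested in symmetric allocations because they are easy to describe and implement. A symmetric allocation $\bar{x}(n, T, m)$ stores $T/m$ in each of m nonempty nodes and nothing in the remaining $n - m$ nodes. Successful recovery for the symmetric allocation $\bar{x}(n, T, m)$ occurs if and only if at least $\lceil m/T \rceil$ out of the m nonempty nodes are accessed, since r accessed nodes hold $rT/m$ in total. Therefore, the recovery probability of $\bar{x}(n, T, m)$ is $P[B(m, p) \ge \lceil m/T \rceil] = \sum_{r=\lceil m/T \rceil}^{m} \binom{m}{r} p^r (1-p)^{m-r}$, where $B(m, p)$ denotes a binomial random variable with m trials and success probability p.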
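For a symmetric allocation the recovery probability reduces to a binomial tail, which is cheap to evaluate. The sketch below assumes each of the m nonempty nodes stores T/m, so that at least ⌈m/T⌉ of them must be accessed; the function name is hypothetical.

```python
from fractions import Fraction
from math import ceil, comb


def symmetric_recovery_probability(m, T, p):
    """P[B(m, p) >= ceil(m / T)] for a symmetric allocation with m nonempty nodes."""
    # Minimum number of nonempty nodes that must be accessed; exact rational
    # division avoids rounding errors in the ceiling.
    k = ceil(Fraction(m) / Fraction(T))
    return sum(comb(m, r) * p**r * (1 - p) ** (m - r) for r in range(k, m + 1))


T = Fraction(7, 3)
p = Fraction(2, 3)
spread_over_5 = symmetric_recovery_probability(5, T, p)
spread_over_2 = symmetric_recovery_probability(2, T, p)
print(float(spread_over_5), float(spread_over_2))
```

With T = 7/3 and p = 2/3, spreading over 5 nodes and over 2 nodes corresponds to allocations A and B of the motivating example.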

Asymptotic Optimality of Max Spreading The symmetric allocation that spreads the budget maximally over all n nodes is asymptotically optimal when the budget T is sufficiently large. RESULT 1: The gap between the recovery probabilities for an optimal allocation and for the symmetric allocation with m = n nonempty nodes is at most $pT \, P[B(n-1, p) \le \lceil n/T \rceil - 2]$, where $B(n-1, p)$ is a binomial random variable. If p and T are fixed such that $T > 1/p$, then this gap approaches zero as $n \to \infty$.

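Asymptotic Optimality of Max Spreading Proof Idea: Bounding the optimal recovery probability… By conditioning on the number of accessed nodes r, we can express the probability of successful recovery as $\sum_{r=0}^{n} p^r (1-p)^{n-r} S_r$, where $S_r$ is the number of successful r-subsets, i.e. r-subsets of nodes that together store at least a unit amount of data. We can in turn bound $S_r$ by observing that we have $S_r$ inequalities of the form $\sum_{i \in A} x_i \ge 1$, one for each successful r-subset A, which can be summed up to produce $\sum_{i=1}^{n} a_i x_i \ge S_r$, where $a_i \le \binom{n-1}{r-1}$ is the number of successful r-subsets that contain node i.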

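Asymptotic Optimality of Max Spreading Proof Idea: Bounding the optimal recovery probability… We therefore have $S_r \le \binom{n-1}{r-1} \sum_{i=1}^{n} x_i \le \binom{n-1}{r-1} T$, and trivially $S_r \le \binom{n}{r}$, so that $S_r \le \min\!\left(\binom{n-1}{r-1} T, \binom{n}{r}\right) = \binom{n}{r} \min(rT/n, 1)$. Applying the bound to the expression for the recovery probability leads to the conclusion that the optimal recovery probability is at most $pT \, P[B(n-1, p) \le \lceil n/T \rceil - 2] + P[B(n, p) \ge \lceil n/T \rceil]$, where $B(\cdot, p)$ denotes a binomial random variable.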
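The counting argument can be turned into a number. The sketch below is one reading of the bound, under the assumption that the number of successful r-subsets satisfies S_r ≤ C(n, r) · min(rT/n, 1), which follows from summing the S_r inequalities and using the budget constraint. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p, r):
    """P[B(n, p) = r] for a binomial random variable."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def optimal_recovery_upper_bound(n, T, p):
    """Upper bound on the recovery probability of any allocation with budget T.

    Assumes S_r <= C(n, r) * min(r * T / n, 1), so that each term of the
    conditioned sum is weighted by at most min(r * T / n, 1).
    """
    return sum(
        min(Fraction(r) * T / n, 1) * binomial_pmf(n, p, r) for r in range(n + 1)
    )


upper = optimal_recovery_upper_bound(5, Fraction(7, 3), Fraction(2, 3))
print(upper, float(upper))
```

For the motivating example (n = 5, T = 7/3, p = 2/3) the bound sits above the 0.90535 achieved by allocation C, as it must.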

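Asymptotic Optimality of Max Spreading Proof Idea: Bounding the suboptimality gap for max spreading… The recovery probability of the symmetric allocation with m = n nonempty nodes is $P[B(n, p) \ge \lceil n/T \rceil]$, where $B(n, p)$ is a binomial random variable. The suboptimality gap for this allocation is therefore at most the difference between the upper bound for the optimal recovery probability and this recovery probability, which is $pT \, P[B(n-1, p) \le \lceil n/T \rceil - 2]$. For $T > 1/p$, the threshold $\lceil n/T \rceil - 2$ lies below the mean $(n-1)p$ for large n, so we can apply the Chernoff bound to this lower tail to obtain an upper bound on the gap that decays exponentially in n. As $n \to \infty$, this upper bound approaches zero.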
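The decay of the gap can be observed numerically. This is a rough sketch, assuming the gap bound has the form pT · P[B(n − 1, p) ≤ ⌈n/T⌉ − 2] (a reconstruction consistent with the proof idea, since the expression itself is an assumption here). It evaluates the tail exactly rather than through a Chernoff bound, and the function name is hypothetical.

```python
from fractions import Fraction
from math import ceil, comb


def max_spreading_gap_bound(n, T, p):
    """Assumed bound p * T * P[B(n - 1, p) <= ceil(n / T) - 2] on the suboptimality gap."""
    k = ceil(Fraction(n) / T)
    # Lower tail of B(n - 1, p): r runs over 0, 1, ..., k - 2.
    tail = sum(
        comb(n - 1, r) * p**r * (1 - p) ** (n - 1 - r) for r in range(0, k - 1)
    )
    return p * T * tail


T = Fraction(7, 3)
p = Fraction(2, 3)  # T > 1/p holds, since 7/3 > 3/2
bounds = [max_spreading_gap_bound(n, T, p) for n in (5, 20, 80, 320)]
for n, b in zip((5, 20, 80, 320), bounds):
    print(n, float(b))
```

The choice T = 7/3 and p = 2/3 reuses the motivating example; any fixed pair with T > 1/p should show the same shrinking trend as n grows.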

Optimal Symmetric Allocation The problem is nontrivial even when restricted to symmetric allocations… [Plot; horizontal axis: number of nonempty nodes in the symmetric allocation.]

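Optimal Symmetric Allocation Maximal spreading is optimal among symmetric allocations when the budget T is sufficiently large. RESULT 2: If p and T satisfy a sufficient condition that holds for large budgets, then either the symmetric allocation with $m = \lfloor \lfloor n/T \rfloor T \rfloor$ nonempty nodes or the one with $m = n$ nonempty nodes is an optimal symmetric allocation.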

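Optimal Symmetric Allocation Minimal spreading is optimal among symmetric allocations when the budget T is sufficiently small. RESULT 3: If p and T satisfy a sufficient condition that holds for small budgets, then the symmetric allocation with $m = \lfloor T \rfloor$ nonempty nodes is an optimal symmetric allocation. Coding is unnecessary for such an allocation, since each nonempty node stores $T / \lfloor T \rfloor \ge 1$ and can therefore hold a complete copy of the data object.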

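Optimal Symmetric Allocation Proof Idea: Finding the optimal symmetric allocation… Recall that the recovery probability of the symmetric allocation with m nonempty nodes is given by $P[B(m, p) \ge \lceil m/T \rceil]$, where $B(m, p)$ is a binomial random variable. For constant p and k, $P[B(m, p) \ge k]$ is a nondecreasing function of m, so among all m with the same threshold $\lceil m/T \rceil = k$ the largest m is best. Observe that we can therefore find an optimal m* from among $\lceil n/T \rceil$ candidates: $m = \lfloor T \rfloor, \lfloor 2T \rfloor, \dots, \lfloor \lfloor n/T \rfloor T \rfloor$, and $m = n$. For $m = \lfloor kT \rfloor$, where $k = 1, 2, \dots, \lfloor n/T \rfloor$, the recovery probability is $P[B(\lfloor kT \rfloor, p) \ge k]$. RESULT 2 (max spreading optimal) is a sufficient condition on p and T for $P[B(\lfloor kT \rfloor, p) \ge k]$ to be nondecreasing in k. To obtain RESULT 3 (min spreading optimal), we first establish a sufficient condition on p and T for $P[B(\lfloor kT \rfloor, p) \ge k]$ to be nonincreasing in k; we subsequently expand the condition to include other points for which $m = \lfloor T \rfloor$ remains optimal.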
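The candidate argument suggests a direct search. The following is a minimal sketch, assuming the candidates are m = ⌊kT⌋ for k = 1, …, ⌊n/T⌋ together with m = n, and that 1 ≤ T ≤ n; all function names are illustrative. It also compares the candidate search with an exhaustive scan over every m as a sanity check.

```python
from fractions import Fraction
from math import ceil, comb, floor


def symmetric_recovery_probability(m, T, p):
    """P[B(m, p) >= ceil(m / T)] for a symmetric allocation with m nonempty nodes."""
    k = ceil(Fraction(m) / Fraction(T))
    return sum(comb(m, r) * p**r * (1 - p) ** (m - r) for r in range(k, m + 1))


def candidate_m_values(n, T):
    """Assumed candidate set: floor(k * T) for k = 1..floor(n / T), plus n itself."""
    candidates = {floor(k * T) for k in range(1, floor(Fraction(n) / T) + 1)}
    candidates.add(n)
    return sorted(candidates)


def best_symmetric_allocation(n, T, p):
    """Return (recovery probability, m) of the best candidate; ties favor larger m."""
    return max(
        (symmetric_recovery_probability(m, T, p), m) for m in candidate_m_values(n, T)
    )


def exhaustive_best_probability(n, T, p):
    """Best recovery probability over every m from 1 to n, for comparison."""
    return max(symmetric_recovery_probability(m, T, p) for m in range(1, n + 1))


n, T, p = 5, Fraction(7, 3), Fraction(2, 3)
best_prob, best_m = best_symmetric_allocation(n, T, p)
print(candidate_m_values(n, T), best_m, float(best_prob))
```

For the motivating example the best symmetric allocation reaches 8/9, which is below the 0.90535 of the asymmetric allocation C, in line with the deck's remark that the optimal allocation is not necessarily symmetric.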

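Optimal Symmetric Allocation [Figure with three labeled regions of parameter values: a region where maximal spreading is optimal among symmetric allocations, a region where minimal spreading is optimal among symmetric allocations, and a gap between them where other symmetric allocations may be optimal.]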

Conclusion The optimal allocation is not necessarily symmetric. However, the symmetric allocation that spreads the budget maximally over all n nodes is asymptotically optimal when the budget is sufficiently large. Furthermore, we are able to specify the optimal symmetric allocation for a wide range of parameter values of p and T.

Thank you!