Cooperative backup on Social Network Nguyen Tran and Jinyang Li.


1 Cooperative backup on Social Network Nguyen Tran and Jinyang Li

2 Motivation
Backup is important. State-of-the-art solutions:
–Buy a second hard disk
–Manual backup to a removable disk or CDs
–Sign up for online backup (10 dollars for 1 GB/month)
Manual backup is not good: it needs an additional hard disk, and you need to remember to do it.
Important data needs long-distance separation between the original and the backup copy, e.g. a Wall Street center's data.
Idea: back up on a p2p network (utilizes idle space, runs as a backup daemon, provides remoteness).

3 Solution overview
How do we make sure nodes holding data stay in the system?
–A malicious node can take the data and leave.
Idea: back up on your real friends' node(s).
Consequence: lose global space utilization but gain incentives.
For a backup service:
–Data safety >> global space utilization.

4 Model
(Diagram; labels: Metadata, Data)

5 Q#1: efficient space allocation
If I join with 100 GB to back up and 100 GB to contribute, can I back up all my data?
Orkut: 2363 nodes, 78% space utilization
Venus: 39783 nodes, 81% space utilization
Over half of the nodes can back up all their data.
Which buddy should be picked to further optimize global space efficiency? The buddy with minimum degree?

6 Q#2: space optimization w/ coding
Q: If you only have 1 GB of idle space, can you store 5 GB worth of your friends' backups?
Ans: yes! Your friends hold a_1, a_2, …, a_{n-1}, a_n, and you store only A = a_1 ⊕ a_2 ⊕ … ⊕ a_n.
What if 2 friends crash at the same time?
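A minimal sketch of the single-parity idea on this slide, assuming all backups are padded to the same length. The names `xor_blocks` and `friends`, and the sample data, are hypothetical and only illustrate the XOR trick.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Hypothetical backups from three friends, padded to the same length.
friends = [b"alice---", b"bob-----", b"carol---"]

# The only thing you store: one block, the size of a single backup.
parity = xor_blocks(friends)

# Friend 1 crashes; the other two still hold their own data.
survivors = [friends[0], friends[2]]

# XOR-ing the parity with every surviving block leaves the lost block.
recovered = xor_blocks([parity] + survivors)
```

With one stored block, only one crash at a time can be repaired, which is why the slide goes on to ask about two concurrent crashes.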

7 How much disk space do you need to store a_1, a_2, …, a_n?
Disk space     # concurrent crashes tolerable     Bandwidth redundancy
n              n                                  0
F(n) = ?       2                                  ?
1              1                                  n-1

8 How much disk space do you need to store a_1, a_2, …, a_n?
Disk space         # concurrent crashes tolerable     Bandwidth redundancy
n                  n                                  0
F(n) = log(n)      2                                  O(n/2)
1                  1                                  n-1

9 Definition
Let S = {a_1, a_2, …, a_n}.
For T ⊆ S, let ∂(T) denote the XOR of all elements in T.
A solution X = {S_1, S_2, …, S_k}, where each S_i ⊆ S, means you store ∂(S_1), ∂(S_2), …, ∂(S_k) on your machine, i.e. F(n) = k.
Of course, ∪ S_i = S.

10 Lemma: X is a solution that tolerates 2 concurrent crashes iff ∀ p ≠ q ∈ [1..n], ∃ i ∈ [1..k]: S_i contains either a_p or a_q but not both.
E.g. X = {{a_1, a_2, a_3}} stores only a_1 ⊕ a_2 ⊕ a_3: bad.
E.g. X = {{a_1, a_2}, {a_1, a_3}} stores a_1 ⊕ a_2 and a_1 ⊕ a_3: good.

11 Lemma: X is a solution that tolerates 2 concurrent crashes iff ∀ p ≠ q ∈ [1..n], ∃ i ∈ [1..k]: S_i contains either a_p or a_q but not both.
Proof:
=>: Suppose every S_i contains both a_p and a_q or neither of them. Then a_p and a_q only ever appear together as a_p ⊕ a_q, so XOR-ing the stored values cannot isolate a_p or a_q individually.
<=: Suppose S_i contains a_p but not a_q. For each element a_j in S_i \ {a_p}, get a_j from its owner (who has not crashed) and XOR it with ∂(S_i). This yields a_p. Then getting a_q is easy: take any set containing a_q (one exists because ∪ S_i = S) and XOR out everything else, i.e. X is a solution.
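The lemma can be checked mechanically on the two small examples. The sketch below is illustrative only: items are indexed from 0 instead of 1, and `tolerates_two_crashes`, `store`, and `recover_pair` are hypothetical helper names, not part of the presented system.

```python
from itertools import combinations

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def tolerates_two_crashes(n, solution):
    """Lemma check: the sets cover every item and separate every pair."""
    covered = set().union(*solution) == set(range(n))
    separated = all(
        any((p in s) != (q in s) for s in solution)
        for p, q in combinations(range(n), 2)
    )
    return covered and separated

def store(solution, data):
    """Compute the XOR of each set in the solution (what you keep on disk)."""
    size = len(next(iter(data.values())))
    stored = []
    for s in solution:
        block = bytes(size)  # all-zero block, the identity for XOR
        for i in s:
            block = xor_bytes(block, data[i])
        stored.append(block)
    return stored

def recover_pair(stored, solution, data, p, q):
    """Rebuild items p and q from the stored XORs plus the surviving items."""
    known = {i: v for i, v in data.items() if i not in (p, q)}
    progress = True
    while progress and not (p in known and q in known):
        progress = False
        for s, block in zip(solution, stored):
            missing = [i for i in s if i not in known]
            if len(missing) == 1:  # exactly one unknown: XOR out the rest
                for i in s:
                    if i in known:
                        block = xor_bytes(block, known[i])
                known[missing[0]] = block
                progress = True
    return known.get(p), known.get(q)

# Hypothetical data for three friends, and the two solutions from the slide.
data = {0: b"aaaa", 1: b"bbbb", 2: b"cccc"}
bad = [{0, 1, 2}]
good = [{0, 1}, {0, 2}]
stored = store(good, data)
```

Here `recover_pair(stored, good, data, 0, 1)` first isolates item 0 from the set `{0, 2}` and then item 1 from `{0, 1}`, mirroring the two steps of the proof.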

12 How small is k? Our ans: log(n)
Solution construction: F(2n) = F(n) + 1
Suppose there are 2n data items a_1, a_2, …, a_n, a_{n+1}, …, a_{2n} to back up.
–Put {a_1, a_2, …, a_n} in X.
–For every set in the solution for the n items {a_1, a_2, …, a_n}, take its union with its isomorphic counterpart in {a_{n+1}, a_{n+2}, …, a_{2n}} and put the result in X.
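The doubling construction can be sketched recursively. This is an illustration under assumptions: items are indexed from 0, n is a power of two, and the base case of a single item stored as-is is an assumption, since the slides do not state one. `build_solution` and `separates_all_pairs` are hypothetical names.

```python
from itertools import combinations

def build_solution(n):
    """Index sets for n items (n a power of two), using F(2n) = F(n) + 1."""
    if n == 1:
        return [{0}]  # assumed base case: store the single item itself
    half = n // 2
    smaller = build_solution(half)
    # Union each set with its mirror image in the second half.
    doubled = [s | {i + half for i in s} for s in smaller]
    # The extra set separates the first half from the second half.
    return [set(range(half))] + doubled

def separates_all_pairs(n, solution):
    """True if every pair of items is split by at least one set."""
    return all(
        any((p in s) != (q in s) for s in solution)
        for p, q in combinations(range(n), 2)
    )

for n in (2, 4, 8):
    print(n, len(build_solution(n)))  # prints "2 2", "4 3", "8 4"
```

Under this base case the number of stored blocks is log2(n) + 1 for powers of two, which grows as log(n).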

13 Example
n = 2, F(n) = k = 2
n = 4, F(n) = k = 3
n = 8, F(n) = k = 4

14 How much disk space do you need to store a_1, a_2, …, a_n?
Disk space         # concurrent crashes tolerable     Bandwidth redundancy
n                  n                                  0
?                  …                                  ?
2n/3               2                                  1
F(n) = log(n)      2                                  n/2-1
1                  1                                  n-1

15 My questions
Was this result known before?
Is log(n) a lower bound for tolerating 2 concurrent crashes?
F(n) = ? for tolerating 3, 4, 5, … concurrent crashes.

16 Implementation Options
#1: back up at which granularity?
–Consolidate backup data into 1 log file:
  Pros: hides file sizes, can recover older versions, incremental backup
  Cons: bad space & bandwidth efficiency
–Back up data at file granularity:
  Pros: space & bandwidth efficiency
  Cons: reveals file sizes, subtle details about cutting big files, wise update, …

17 #2: Wise transfer for updating a file
Problem: if two versions of a file differ only slightly, transferring the whole file again is expensive.
Idea (rsync): only transfer the necessary bytes.
Let A’ be the updated file on node N, and A the old version of the file kept by M.
M:
–Cut A into fixed-size chunks and compute the hash of each chunk.
–Send all hashes h_1, h_2, …, h_n to N.
N:
–Compute hashes of chunks in A’ in sliding-window fashion.
–Compare with h_1, h_2, …, h_n to find the overlapping parts.
–Send only the necessary bytes to M.
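The exchange between M and N can be sketched as follows. This is a simplified illustration, not rsync itself: it hashes every window with SHA-256 instead of pairing a cheap rolling checksum with a strong hash, and the tiny `CHUNK` size, the function names, and the delta format are assumptions chosen for readability.

```python
import hashlib

CHUNK = 4  # hypothetical chunk size; real tools use far larger chunks

def chunk_hashes(old: bytes) -> dict:
    """M: hash each full fixed-size chunk of the old file A."""
    return {
        hashlib.sha256(old[i:i + CHUNK]).digest(): i // CHUNK
        for i in range(0, len(old) - CHUNK + 1, CHUNK)
    }

def make_delta(new: bytes, hashes: dict) -> list:
    """N: slide a window over A' and emit ('copy', index) or ('data', bytes)."""
    delta, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + CHUNK]
        idx = None
        if len(window) == CHUNK:
            idx = hashes.get(hashlib.sha256(window).digest())
        if idx is not None:
            if literal:  # flush bytes that matched nothing
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", idx))
            pos += CHUNK
        else:
            literal.append(new[pos])
            pos += 1
    if literal:
        delta.append(("data", bytes(literal)))
    return delta

def apply_delta(old: bytes, delta: list) -> bytes:
    """M: rebuild A' from its old chunks plus the literal bytes received."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            out += old[value * CHUNK:(value + 1) * CHUNK]
        else:
            out += value
    return bytes(out)

old = b"abcdefghijkl"
new = b"abcdXefghijkl"          # one byte inserted in the middle
delta = make_delta(new, chunk_hashes(old))
# delta == [("copy", 0), ("data", b"X"), ("copy", 1), ("copy", 2)]
```

Only the single inserted byte travels as literal data; the rest of the new file is described by references to chunks M already holds.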

18 #3: Cutting a big file into small parts
Problem: one friend doesn’t have enough space for your big file, so you need to cut it into smaller parts. But how should it be cut so that later updates are easy?
Fixed part size? No: if one byte is inserted into or deleted from the file, all the following parts shift, so you need to update all of those old parts.
Idea (LBFS): use the bit pattern of the file content inside a sliding window to set the boundaries rather than a fixed size. As a result, if one byte is inserted or deleted, only the part containing that byte changes.
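A minimal sketch of content-defined boundaries, assuming a simple polynomial rolling hash over a sliding window; LBFS uses Rabin fingerprints, which are not reproduced here. `WINDOW`, `BASE`, `MOD`, `MASK`, and `split` are hypothetical names and values chosen for illustration.

```python
import random

WINDOW = 16            # bytes covered by the sliding window
BASE = 257
MOD = (1 << 31) - 1
MASK = 0x3F            # cut when the low 6 bits of the hash are zero

def split(data: bytes) -> list:
    """Cut data wherever the hash of the last WINDOW bytes matches the pattern."""
    chunks, start, h = [], 0, 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # add the newest byte
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                 # trailing remainder
    return chunks

# Hypothetical file contents: pseudo-random bytes with one byte inserted.
rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(4000))
new = old[:1500] + b"!" + old[1500:]

# Parts of the new file that the friend does not already hold.
changed = set(split(new)) - set(split(old))
```

Because each boundary depends only on the bytes inside the window, cuts away from the insertion point are unaffected, so `changed` contains only the few parts near the inserted byte.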

19 Other issues
#4: Trust but verify your friends.
–Check that the backup is still there.
–How to check whether friends contribute the right share?
#5: How to check that the backup copy still exists if you and your friend are not online at the same time?
–Idea: ask other friends to help.
#6: Sharing files among friends.
–Viewers automatically cache/back up the file.
–Backed-up data increases the availability of shared files.

20 The End

