Reclaiming Space from Duplicate Files in a Serverless Distributed File System From Microsoft Research.


1 Reclaiming Space from Duplicate Files in a Serverless Distributed File System From Microsoft Research

2 Motivation Desktop computers have a lot of unused disk space, and many of the files stored on them are identical. The unused space can be pooled to build a “central” file server that provides high availability and reliability. Farsite: –Convergent encryption –SALAD

3 Convergent encryption Identical files remain identical after encryption, even when their owners use different keys. K1 = Hash(P); C1 = E1(P, K1); M = E2(K1, Ku), where P is the file's plaintext and Ku is the user's key. The ciphertext C = C1 is the same for identical files, but M differs from user to user. Without Ku, nobody can read P.
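
The scheme on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SHA-256 stands in for Hash, and a toy XOR keystream cipher stands in for both E1 and E2 (the slide does not say which ciphers are used, and the toy cipher is not secure). The function names are hypothetical.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + counter. A stand-in, NOT a secure cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Stand-in for E1 and E2; applying it twice with the same key decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def convergent_encrypt(plaintext: bytes, user_key: bytes):
    k1 = hashlib.sha256(plaintext).digest()  # K1 = Hash(P)
    c1 = xor_cipher(plaintext, k1)           # C1 = E1(P, K1)
    m = xor_cipher(k1, user_key)             # M  = E2(K1, Ku)
    return c1, m


def convergent_decrypt(c1: bytes, m: bytes, user_key: bytes) -> bytes:
    k1 = xor_cipher(m, user_key)             # recover K1 from M using Ku
    return xor_cipher(c1, k1)                # recover P from C1 using K1
```

Because K1 depends only on the file contents, two users who store the same file produce the same C1, which is what lets the system detect and coalesce duplicates without reading the plaintext.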

4 THEX (Tree Hash EXchange format)

                 ROOT=H(E+F)
                /           \
        E=H(A+B)             F=H(C+D)
        /      \             /      \
  A=H(S1)    B=H(S2)    C=H(S3)    D=H(S4)
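
The tree on this slide can be computed bottom-up. The following is a minimal sketch of the simplified construction shown here, not of the full THEX format: SHA-256 stands in for H, "+" is taken to mean byte concatenation, and promoting an unpaired node unchanged to the next level is an assumption of this sketch. The function names are hypothetical.

```python
import hashlib


def H(data: bytes) -> bytes:
    """Stand-in hash function (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def tree_hash_root(segments):
    """Return the root hash over a list of file segments S1..Sn."""
    if not segments:
        raise ValueError("need at least one segment")
    level = [H(s) for s in segments]  # leaves: A=H(S1), B=H(S2), ...
    while len(level) > 1:
        nxt = []
        # Hash adjacent pairs: E=H(A+B), F=H(C+D), ...
        for i in range(0, len(level) - 1, 2):
            nxt.append(H(level[i] + level[i + 1]))
        # Assumption: an unpaired last node moves up unchanged.
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

With four segments this reproduces ROOT = H(H(A+B) + H(C+D)); changing any one segment changes the root, while the intermediate hashes let a receiver verify segments individually.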

5 SALAD Self-Arranging, Lossy, Associative Database. Leaf: every node is a leaf. Cell: a set of nodes; files are fully duplicated within it. Every file has a fingerprint. Cell-ID width W = lg(L/Λ) –L: system size, Λ: target redundancy factor. Dimensionality parameter D.
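
As a worked example of the cell-ID width formula: with L = 1024 nodes and Λ = 4, W = lg(1024/4) = 8, so there are 2^8 = 256 cells of about 4 nodes each. The sketch below computes W and picks a cell for a fingerprint. Two points are assumptions of this sketch and are not stated on the slide: W is rounded down to an integer, and a fingerprint is mapped to a cell by its W low-order bits. The function names are hypothetical.

```python
import math


def cell_id_width(system_size: int, redundancy: float) -> int:
    """W = lg(L / Lambda), rounded down (rounding is an assumption)."""
    if system_size <= 0 or redundancy <= 0:
        raise ValueError("system size and redundancy must be positive")
    return max(0, math.floor(math.log2(system_size / redundancy)))


def cell_id(fingerprint: int, width: int) -> int:
    """Assumed mapping: the cell is named by the fingerprint's low `width` bits."""
    return fingerprint & ((1 << width) - 1)
```

Under this mapping, identical files have identical fingerprints and therefore land in the same cell, where the duplicate can be noticed.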

6 SALAD

7 Files are fully duplicated inside cells. Each node maintains a routing table for all vector-aligned nodes.

8 SALAD: properties Each node estimates the system size separately. Inconsistent estimates do not cause malfunction, only reduced efficiency. The routing table is relatively small. Robust to attack.

9 A Demand based Algorithm for Rapid Updating of Replicas From Polytechnic University of Catalonia, Spain In weak-consistency algorithms, if the replicas with the most demand are updated first, a greater number of clients gain access to updated content in a shorter period of time. Anti-entropy session: two servers mutually exchange summary vectors and then exchange data to build consistent content.
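
An anti-entropy session as described above can be sketched as follows. The data model is an assumption of this sketch: each replica keeps a log of writes keyed by (origin, sequence number), and its summary vector records the highest sequence number seen per origin. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    log: dict = field(default_factory=dict)  # (origin, seq) -> value

    def write(self, value):
        """Record a local write under the next sequence number for this replica."""
        seq = self.summary().get(self.name, 0) + 1
        self.log[(self.name, seq)] = value

    def summary(self):
        """Summary vector: highest sequence number seen from each origin."""
        vec = {}
        for origin, seq in self.log:
            vec[origin] = max(vec.get(origin, 0), seq)
        return vec

    def missing_for(self, other_summary):
        """Writes held here that the other side's summary shows it lacks."""
        return {k: v for k, v in self.log.items()
                if k[1] > other_summary.get(k[0], 0)}


def anti_entropy_session(a: Replica, b: Replica) -> None:
    # Step 1: exchange summary vectors.
    sa, sb = a.summary(), b.summary()
    # Step 2: each side sends only what the other is missing.
    to_b, to_a = a.missing_for(sb), b.missing_for(sa)
    a.log.update(to_a)
    b.log.update(to_b)
```

After a session both replicas hold the same set of writes; only the differences cross the network, since the summary vectors identify them up front.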

10 Algorithm Each node has a number denoting its demand for some replica. A node chooses the neighbor with the highest demand to start a session. After a session, the node that has just received the new update continues the process if it has a neighbor with higher demand than itself.
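
The propagation rule on this slide can be sketched as a walk over the neighbor graph. This is one possible reading, not the authors' implementation: demand values are treated as fixed, each session is reduced to marking the neighbor as updated, and skipping neighbors that already hold the update is an assumption of this sketch. The function and parameter names are hypothetical.

```python
def propagate(start, neighbors, demand):
    """Return the nodes that receive the update, in order.

    neighbors: dict mapping node -> list of adjacent nodes
    demand:    dict mapping node -> demand (requests per unit time)
    """
    updated = [start]
    current = start
    first_hop = True
    while True:
        # Assumption: nodes that already have the update are not chosen again.
        candidates = [n for n in neighbors[current] if n not in updated]
        if not candidates:
            break
        best = max(candidates, key=lambda n: demand[n])
        # After the first session, continue only toward higher demand.
        if not first_hop and demand[best] <= demand[current]:
            break
        updated.append(best)  # stands in for an anti-entropy session
        current = best
        first_hop = False
    return updated
```

For example, with neighbors A-B, A-C, B-D, C-D and demands A=5, B=2, C=7, D=9, an update starting at A goes to C and then to D, and stops there because D's remaining neighbor has lower demand.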

11 Algorithm Demand: number of requests per unit time –What exactly does it mean? How is it obtained? Dynamic algorithm: –The demand of neighbors may change over time, so neighbors exchange their demand values periodically. –How does the static algorithm work? How and when does a node get the demand of its neighbors?

