Reclaiming Space from Duplicate Files in a Serverless Distributed File System From Microsoft Research.


Motivation
Unused disk space on desktop computers
A lot of files are identical
Can be used to build a “central” file server
Provide high availability & reliability
Farsite
–Convergent encryption
–SALAD

Convergent encryption
Identical files remain identical after encryption, even when users hold different keys.
K1 = Hash(P)
C1 = E1(P, K1)
M = E2(K1, Ku)
C1 is the same for identical files, but M differs between users.
Without Ku, nobody can read P.
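The three equations on this slide can be sketched in a few lines of Python. This is only an illustration: the slide does not say which ciphers E1 and E2 are, so a toy SHA-256 keystream XOR stands in for both, and the names `keystream`, `xor_cipher`, `convergent_encrypt` and `convergent_decrypt` are hypothetical. The toy cipher is not secure and is used only because it is deterministic and needs nothing beyond the standard library.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key plus a counter (assumption, NOT secure)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Stand-in for E1 and E2: XOR with the keystream (its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def convergent_encrypt(plaintext: bytes, user_key: bytes):
    k1 = hashlib.sha256(plaintext).digest()  # K1 = Hash(P)
    c1 = xor_cipher(plaintext, k1)           # C1 = E1(P, K1)
    m = xor_cipher(k1, user_key)             # M  = E2(K1, Ku)
    return c1, m


def convergent_decrypt(c1: bytes, m: bytes, user_key: bytes) -> bytes:
    k1 = xor_cipher(m, user_key)             # recover K1 from M using Ku
    return xor_cipher(c1, k1)                # recover P from C1 using K1


# Two users store the same file under different user keys.
c1_alice, m_alice = convergent_encrypt(b"same file contents", b"alice-key")
c1_bob, m_bob = convergent_encrypt(b"same file contents", b"bob-key")
print(c1_alice == c1_bob)  # ciphertexts match, so duplicates can be detected
print(m_alice == m_bob)    # per-user key blobs do not match
```

Because K1 depends only on the file contents, the storage system can spot duplicates by comparing C1 without ever seeing P.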

THEX (Tree Hash EXchange format)

              ROOT=H(E+F)
             /           \
      E=H(A+B)           F=H(C+D)
      /      \           /      \
A=H(S1)   B=H(S2)   C=H(S3)   D=H(S4)
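A minimal sketch of the tree hash as drawn on this slide, where each leaf is the hash of one segment and each parent is the hash of its two children concatenated. SHA-256 is an assumed stand-in for H, the names `h` and `tree_hash` are hypothetical, and promoting an unpaired node to the next level is my own choice for odd counts; the slide only shows four segments, and details of the real exchange format are not reproduced here.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Stand-in for H on the slide (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def tree_hash(segments):
    """Return the root hash over a non-empty list of byte segments."""
    if not segments:
        raise ValueError("need at least one segment")
    level = [h(s) for s in segments]  # A=H(S1), B=H(S2), ...
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                # Parent = H(left + right), e.g. E=H(A+B)
                parents.append(h(level[i] + level[i + 1]))
            else:
                # Unpaired node is promoted unchanged (assumption).
                parents.append(level[i])
        level = parents
    return level[0]


root = tree_hash([b"S1", b"S2", b"S3", b"S4"])
print(root.hex())
```

With the intermediate hashes available, a single corrupted segment can be located by comparing subtree hashes instead of re-checking the whole file.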

SALAD
Self-Arranging, Lossy, Associative Database
Leaf: all nodes
Cell: a set of nodes, full duplicate of all files
Every file has a fingerprint
Cell-ID width W = lg(L/Λ)
–L: system size, Λ: target redundancy factor
Dimensionality parameter D
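As a rough illustration of the cell-ID width formula, the sketch below computes W from a system size L and a target redundancy factor Λ, then maps a file fingerprint to a cell. Several things here are assumptions and not taken from the slide: W is rounded down to an integer and clamped at zero, the fingerprint is a SHA-256 digest of the content, and the cell is chosen from the W low-order bits of the fingerprint. The function names are hypothetical.

```python
import hashlib
import math


def cell_id_width(system_size: int, redundancy: float) -> int:
    """W = lg(L / Λ), rounded down and clamped at 0 (rounding is an assumption)."""
    if system_size <= 0 or redundancy <= 0:
        raise ValueError("system size and redundancy must be positive")
    return max(0, math.floor(math.log2(system_size / redundancy)))


def fingerprint(content: bytes) -> int:
    """Assumed fingerprint: SHA-256 of the file content, as an integer."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big")


def cell_id(fp: int, width: int) -> int:
    """Assumed mapping: the W low-order bits of the fingerprint pick the cell."""
    return fp & ((1 << width) - 1)


w = cell_id_width(1024, 4)  # 1024 nodes, about 4 nodes per cell
print(w, cell_id(fingerprint(b"some file"), w))
```

With L = 1024 and Λ = 4 there are 2^8 = 256 cells, so on average each cell contains about Λ nodes, and identical files always land in the same cell because they share a fingerprint.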

SALAD

Files are fully duplicated inside cells. Each node maintains a routing table for all vector-aligned nodes.
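The slide does not define "vector-aligned", so the following is only a sketch under stated assumptions: the W-bit cell-ID is split into D coordinates by dealing its bits out round-robin, and two cells count as vector-aligned when their coordinates differ in at most one dimension. Both the bit assignment and the function names `coordinates` and `vector_aligned` are hypothetical.

```python
def coordinates(cell_id: int, width: int, dims: int):
    """Split a width-bit cell-ID into dims coordinates (round-robin, an assumption)."""
    coords = [0] * dims
    for k in range(width):
        bit = (cell_id >> k) & 1
        # Bit k goes to coordinate k mod dims, at position k // dims.
        coords[k % dims] |= bit << (k // dims)
    return tuple(coords)


def vector_aligned(cell_a: int, cell_b: int, width: int, dims: int) -> bool:
    """Assumed definition: coordinates differ in at most one dimension."""
    a = coordinates(cell_a, width, dims)
    b = coordinates(cell_b, width, dims)
    return sum(x != y for x, y in zip(a, b)) <= 1


# W = 4 bits, D = 2 dimensions: list the cells aligned with cell 0.
aligned = [c for c in range(16) if vector_aligned(0, c, 4, 2)]
print(aligned)
```

Under these assumptions a node tracks only the cells along its own D axes instead of all 2^W cells, which is one way to read the claim that the routing table stays relatively small.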

SALAD: properties
Each node estimates the system size separately
Inconsistent estimates do not cause malfunction, only reduced efficiency
Routing table is relatively small
Robust to attack

A Demand based Algorithm for Rapid Updating of Replicas
From Polytechnic University of Catalonia, Spain
In weak consistency algorithms, if the replicas with the most demand are updated first, a greater number of clients gain access to updated content in a shorter period of time.
Anti-entropy session: two servers mutually exchange summary vectors and then exchange data to build consistent content

Algorithm
Each node has a number denoting its demand for some replica
Choose the neighbor with the highest demand to start the session
After a session, the node that just received the update continues the process if it has a neighbor with higher demand than itself.
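The three steps above can be sketched as a small simulation. It is a reading of the slide, not the authors' implementation: the graph and demand values are made up, the function name `propagate` is hypothetical, and I assume that a node only considers neighbors that have not yet received the update and that ties go to the first neighbor listed.

```python
def propagate(neighbors, demand, source):
    """Return the order in which nodes receive an update that starts at source."""
    updated = [source]
    current = source
    first_session = True
    while True:
        # Assumption: only neighbors that do not have the update yet are candidates.
        candidates = [n for n in neighbors[current] if n not in updated]
        if not candidates:
            break
        best = max(candidates, key=lambda n: demand[n])
        # The source always starts one session; a freshly updated node continues
        # only if some neighbor has higher demand than itself.
        if not first_session and demand[best] <= demand[current]:
            break
        updated.append(best)  # stands for one anti-entropy session
        current = best
        first_session = False
    return updated


neighbors = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
demand = {"a": 1, "b": 5, "c": 3, "d": 9}
print(propagate(neighbors, demand, "a"))  # ['a', 'b', 'd']
```

In this example the update climbs from a to b to d, following rising demand, and stops at d because its only remaining neighbor has lower demand.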

Algorithm
Demand: number of requests per unit time
–What exactly does it mean? How is it obtained?
Dynamic algorithm:
–The demand of neighbors may change over time, so neighbors exchange their demand periodically.
–How does the static algorithm work? How and when does a node get the demand of its neighbors?