

Background
Goal: To build a ubiquitous and robust storage infrastructure
Requirements: Scalability, availability, performance, robustness
Solution: Dynamic object replication and migration in a hybrid architecture

Three-layer replica creation
Object Layer + Intelligent Disk Layer + Regional Manager Layer

Object Layer - Metadata entries
- object GUID
- replication_threshold: requests/time
- delete_threshold: requests/time
- replication_where: region_list
- itinerary: (region1, DiskIP1, period1), (region2, DiskIP2, period2), …
- subobject_list: (GUID1, pointer1), (GUID2, pointer2), …
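The metadata entries above map naturally onto a small record type. The following is a minimal sketch, not the authors' implementation: the class name `ObjectMetadata`, the field types, and the reading of the two thresholds (replicate above, delete below) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ObjectMetadata:
    """Per-object metadata entries, named as on the slide (types are assumed)."""
    guid: str
    replication_threshold: float  # requests per unit time; assumed: replicate above this
    delete_threshold: float       # requests per unit time; assumed: delete a replica below this
    replication_where: List[str] = field(default_factory=list)             # region_list
    itinerary: List[Tuple[str, str, float]] = field(default_factory=list)  # (region, DiskIP, period)
    subobject_list: List[Tuple[str, str]] = field(default_factory=list)    # (GUID, pointer)


# Hypothetical example values.
meta = ObjectMetadata(
    guid="obj-1",
    replication_threshold=100.0,
    delete_threshold=5.0,
    replication_where=["region1"],
    itinerary=[("region1", "10.0.0.1", 3600.0)],
)
```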

Intelligent Disk Layer - Metadata
- device description: bandwidth, CPU utilization, available space, region ID, IP address
- object layout and other metadata (defined above)
- request information for each object:
  - the number of requests over a period of time for each region
  - the popularity: request amount * region weight
- soft-state information about its neighbor devices: IP address, bandwidth, CPU utilization, available space
  - neighbor devices are those within two hops that have the same region ID
  - the disk periodically broadcasts its load and free-space information to all its neighbors
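The popularity figure can be illustrated with a short calculation. The slide gives only the product "request amount * region weight"; summing that product over all regions, and defaulting a missing weight to 1.0, are assumptions of this sketch, as are the function and parameter names.

```python
from typing import Dict


def popularity(requests_by_region: Dict[str, int],
               region_weight: Dict[str, float]) -> float:
    """Weighted request count for one object.

    Each region contributes request amount * region weight; adding the
    per-region products together is an assumption, not stated in the source.
    """
    return sum(count * region_weight.get(region, 1.0)
               for region, count in requests_by_region.items())


# Hypothetical numbers: 10 requests from r1 (weight 1.0), 4 from r2 (weight 0.5).
score = popularity({"r1": 10, "r2": 4}, {"r1": 1.0, "r2": 0.5})
```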

Intelligent Disk Layer - Policy
Parameters: disk_replication_threshold, disk_load_threshold
1. If the requests to an object exceed the replication_threshold associated with the object, the disk creates a new replica on one of its neighbors, chosen based on the neighbors' status.
2. If the requests to an object from one region exceed the disk_replication_threshold, the disk replicates the object to that region. If that region is not the disk's own region, the disk asks its own regional manager to replicate the object. If it is the disk's own region, the disk can replicate the object on its neighbors based on their status. The disk is responsible for redirecting requests to the new replica.
3. If the disk load exceeds the disk_load_threshold, the disk replicates its 5 most popular objects, either to its neighbors or, through its regional manager, to other regions.
4. If the disk receives a replication request from one of its neighbors, it checks whether it already has a replica and checks its own load, then decides whether to accept.
5. When the disk needs more space, it replaces objects using LRU.
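Rules 1 to 3 can be sketched as a single decision function that turns the request counters into a list of actions. This is an illustrative sketch under stated assumptions: the function name, the action labels, and the data layout are hypothetical, and neighbor selection, rule 4, and LRU replacement are left out.

```python
from typing import Dict, List, Tuple


def disk_policy(disk_region: str,
                requests: Dict[str, Dict[str, int]],    # GUID -> region -> requests in the window
                replication_threshold: Dict[str, int],  # GUID -> per-object threshold
                disk_replication_threshold: int,
                disk_load: float,
                disk_load_threshold: float,
                popularity: Dict[str, float],
                top_n: int = 5) -> List[Tuple[str, str, str]]:
    """Return (action, guid, region) tuples; the action labels are hypothetical."""
    actions = []
    for guid, by_region in requests.items():
        # Rule 1: total requests exceed the object's own threshold.
        if sum(by_region.values()) > replication_threshold[guid]:
            actions.append(("replicate_to_neighbor", guid, disk_region))
        # Rule 2: requests from a single region exceed the disk-wide threshold.
        for region, count in by_region.items():
            if count > disk_replication_threshold:
                if region == disk_region:
                    actions.append(("replicate_to_neighbor", guid, region))
                else:
                    actions.append(("ask_regional_manager", guid, region))
    # Rule 3: an overloaded disk offloads its most popular objects.
    if disk_load > disk_load_threshold:
        hottest = sorted(popularity, key=popularity.get, reverse=True)[:top_n]
        actions.extend(("offload", guid, disk_region) for guid in hottest)
    return actions


# Hypothetical workload: object "a" is hot overall and especially hot in r2.
result = disk_policy(
    disk_region="r1",
    requests={"a": {"r1": 30, "r2": 80}, "b": {"r1": 5}},
    replication_threshold={"a": 100, "b": 50},
    disk_replication_threshold=50,
    disk_load=0.4,
    disk_load_threshold=0.9,
    popularity={"a": 110.0, "b": 5.0},
)
```

The function only decides what to do; how a neighbor is picked "based on its status" is not specified on the slide and is deliberately not modeled here.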

Regional Manager Layer - Metadata
- Object layout and location information. Because the creation of a new object must go through the regional manager first, the regional manager can record all initial object locations. In addition, both replica creation and deletion must be registered with the regional manager.
- Device status information: IP address, bandwidth, CPU utilization, available space. Devices periodically send their status to the regional manager.
- Request information for each object: the number of requests over a period of time for each region.
- Information about other regions: location of other regional managers, distance to other regional managers.

Regional Manager Layer - Policy
Parameters: region_replication_threshold
1. The regional manager replicates an object according to the replication_where and itinerary entries associated with the object.
2. If a regional manager observes that requests (open) to an object from one region exceed the region_replication_threshold: if that region is not the manager's own, it asks that region's regional manager to create a replica there; otherwise it finds a disk in its own region and creates a new replica on it.
3. If a regional manager receives a request from a device in its region to create a replica in a specific region, it contacts the regional manager of that region and returns to the requesting disk the IP address of the disk to replicate to.
4. If a regional manager receives a request from another regional manager to create a new replica in its region, it first chooses a disk to host the replica and then tells the requesting regional manager that disk's IP address.
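Rule 2 of the regional manager policy can be sketched as follows. The slides do not say how a manager picks a host disk, so choosing the disk with the most available space among those without a replica is purely an assumption here; all names and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def choose_disk(disks: List[Dict]) -> Optional[str]:
    """Pick a host disk. Assumed heuristic: most free space, no existing replica."""
    candidates = [d for d in disks if not d["has_replica"]]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d["available_space"])["ip"]


def on_region_requests(own_region: str, source_region: str, count: int,
                       region_replication_threshold: int,
                       disks: List[Dict]) -> Tuple[str, Optional[str]]:
    """Rule 2: react when open requests from one region cross the threshold."""
    if count <= region_replication_threshold:
        return ("none", None)
    if source_region != own_region:
        # Delegate to the regional manager of the requesting region.
        return ("ask_remote_manager", source_region)
    return ("replicate_locally", choose_disk(disks))


# Hypothetical device status table held by the regional manager.
disks = [
    {"ip": "10.0.0.1", "available_space": 50, "has_replica": True},
    {"ip": "10.0.0.2", "available_space": 20, "has_replica": False},
    {"ip": "10.0.0.3", "available_space": 70, "has_replica": False},
]
local = on_region_requests("r1", "r1", 120, 100, disks)
remote = on_region_requests("r1", "r2", 120, 100, disks)
```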

Replica selection
The client queries the regional manager with the GUID of an object.
- If the object is in the region, the regional manager randomly chooses one replica for the client.
- If the object is not in the current region, the regional manager applies the same mechanism as OceanStore to locate a nearby regional manager that has the object. That regional manager then randomly chooses one replica for the client.
For a client that pre-schedules object migration, the client knows the IP address of the disk or the region where the object resides.
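The selection logic can be sketched in a few lines. The OceanStore-style location step is not implemented here; it is represented by a caller-supplied function `locate_remote`, a hypothetical stand-in that returns the replica list held by a nearby regional manager. The other names are assumptions as well.

```python
import random
from typing import Callable, Dict, List, Optional


def select_replica(guid: str,
                   local_replicas: Dict[str, List[str]],
                   locate_remote: Callable[[str], List[str]],
                   rng=random) -> Optional[str]:
    """Return the IP of a randomly chosen replica, or None if none is found.

    local_replicas maps GUID -> disk IPs in this region; locate_remote stands
    in for the wide-area location mechanism, which this sketch does not model.
    """
    candidates = local_replicas.get(guid) or locate_remote(guid)
    return rng.choice(candidates) if candidates else None


# Hypothetical replica tables.
local_table = {"g1": ["10.0.0.1", "10.0.0.2"]}
remote_table = {"g2": ["10.1.0.5"]}
lookup = lambda guid: remote_table.get(guid, [])

hit_local = select_replica("g1", local_table, lookup)
hit_remote = select_replica("g2", local_table, lookup)
miss = select_replica("g3", local_table, lookup)
```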

Compound Object
A compound object has a metadata entry listing the (GUID, pointer) pairs of all its sub-objects. The pointer is the IP address of the disk that hosts the sub-object for the current compound object, so it differs for replicas of the compound object on different disks. Every time a new replica of a compound object is created, the entity that performs the replication checks whether there are nearby sub-objects. If there are, the pointer is set to the IP address of the nearby sub-object. If not, the sub-objects are replicated along with the compound object.
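The pointer update performed when a compound object is replicated can be sketched as below. The slides do not define "nearby", so the sketch takes a precomputed map of nearby hosts as input; pointing a copied sub-object at the new replica's own disk is likewise an assumption, and all names are hypothetical.

```python
from typing import Dict, List, Tuple


def replicate_compound(subobject_list: List[Tuple[str, str]],
                       nearby_hosts: Dict[str, str],
                       new_disk_ip: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Build the subobject_list for a new replica of a compound object.

    nearby_hosts maps sub-object GUID -> IP of a nearby disk already hosting it.
    Returns the new (GUID, pointer) list and the GUIDs that must be copied too.
    """
    new_list, to_copy = [], []
    for guid, _old_pointer in subobject_list:
        if guid in nearby_hosts:
            # A nearby copy exists: only the pointer changes.
            new_list.append((guid, nearby_hosts[guid]))
        else:
            # No nearby copy: replicate the sub-object with the compound object
            # (assumed to land on the same disk as the new replica).
            new_list.append((guid, new_disk_ip))
            to_copy.append(guid)
    return new_list, to_copy


# Hypothetical compound object with two sub-objects; only s1 has a nearby copy.
pointers, copies = replicate_compound(
    [("s1", "10.0.0.1"), ("s2", "10.0.0.2")],
    nearby_hosts={"s1": "10.1.0.7"},
    new_disk_ip="10.1.0.9",
)
```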

Experimental environment
Parameters:
- Object number, disk number, region number, client number
- Object size, disk bandwidth, regional manager bandwidth, client bandwidth, network delay (per hop)
- Request (open & read) generator
- Thresholds, time period used to calculate request amount and frequency

How to organize the regions, disks, and objects?
Assume that in the initial state, disks are geographically grouped into regions. Objects are randomly scattered across the disks, and each object belongs to the region of the disk that hosts it. Start with region number = 1.
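The initial state described above could be generated for a simulation roughly as follows. This is a sketch only: the function name, the object naming scheme, and the use of a seeded generator for repeatable runs are assumptions, not part of the original experiment design.

```python
import random
from typing import Dict, Tuple


def initial_placement(num_objects: int,
                      disk_region: Dict[str, str],
                      seed: int = 0) -> Dict[str, Tuple[str, str]]:
    """Scatter objects uniformly at random over the disks.

    disk_region maps disk IP -> region ID. Each object inherits the region of
    the disk that hosts it. Returns GUID -> (disk IP, region ID).
    """
    rng = random.Random(seed)   # seeded so a run can be repeated
    disks = sorted(disk_region)
    placement = {}
    for i in range(num_objects):
        disk = rng.choice(disks)
        placement[f"obj-{i}"] = (disk, disk_region[disk])
    return placement


# Start with a single region, as on the slide; the IPs are hypothetical.
topology = {"10.0.0.1": "region1", "10.0.0.2": "region1", "10.0.0.3": "region1"}
placement = initial_placement(100, topology)
```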

What to expect
- The average access time is greatly reduced.
- The network overhead (control messages and object replication traffic) does not exceed the benefit (the traffic saved by accessing a nearby copy).
- The ratio of storage overhead (metadata and replicated objects) to total storage space is insignificant.
- The total number of replicas should be proportional to the number of requests.
- Under failure (simulated by randomly removing disks from the network), the availability of the system (the fraction of successful accesses over all accesses, measured at different percentages of live disks) is acceptable.
- Scalability: the above factors (average access time, the ratio of network overhead to benefit, the storage overhead) increase linearly with the size of the system. The availability of the system should not change much across system sizes.
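The availability metric under failure could be measured along these lines. The sketch assumes that an access succeeds when at least one replica of the object sits on a live disk, which the slides do not state explicitly; the function name and data layout are hypothetical.

```python
import random
from typing import Dict, List


def availability(replicas: Dict[str, List[str]],
                 accesses: List[str],
                 live_fraction: float,
                 seed: int = 0) -> float:
    """Fraction of accesses that find at least one replica on a live disk.

    replicas maps GUID -> disk IPs holding a copy; accesses is a list of
    requested GUIDs. Failure is simulated by keeping a random live_fraction
    of all disks and removing the rest.
    """
    rng = random.Random(seed)
    all_disks = sorted({d for disks in replicas.values() for d in disks})
    live = set(rng.sample(all_disks, round(live_fraction * len(all_disks))))
    if not accesses:
        return 1.0  # assumed convention: no accesses means nothing failed
    ok = sum(1 for guid in accesses
             if any(d in live for d in replicas.get(guid, ())))
    return ok / len(accesses)


# Hypothetical replica map and access trace.
replica_map = {"a": ["d1", "d2"], "b": ["d3"]}
trace = ["a", "a", "b", "b"]
all_live = availability(replica_map, trace, 1.0)
none_live = availability(replica_map, trace, 0.0)
```

Sweeping `live_fraction` from 1.0 down to 0.0 gives the availability curve at different percentages of live disks.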

Future Work
- Refine the parameters
- Refine the regional manager layer replica selection and add disk layer replica selection
- Experiment with a real workload for an application
- Combine with the concurrency control work
- Modify policies based on security limitations