OSD: Storage Substrate for the Enterprise and … the Grid Feng Wang Department of Computer Science University of Minnesota.



Requirements for the Enterprise and the Grid

To provide a scalable, ubiquitous, robust storage infrastructure:
- data objects must be replicated and migrated
- replication: increased availability, greater performance
- migration: lower latency, always available

Research Issues
- When/where to create/delete replicas?
- When/where to migrate?
- How to do replica selection?
- How to replicate an object that contains other objects?
- How to keep objects consistent?

Solutions: State-of-the-Art
- Coda
- Globus
- GDMP
- OceanStore

Coda File System

Server replication
- static replication
- unit: volume
- read-write replicas
Lacks support for migration

OSD:
- dynamic replication
- object migration
- object replication

Coda File System: Scalability
- client caching
- client selects the replica (OK for a few servers)
- client propagates updates to all AVSG servers

OSD:
- provides multi-level services for clients with different storage resources and processing power
- uses distributed intelligent storage devices to alleviate the burden on both client and server

Globus

Replica catalog – provides a location service
- based on LDAP
Replica management
- replica creation, deletion, selection
Reliable replication – GridFTP
- replication of large scientific data sets
Centralized, hierarchical -> decentralized, P2P

OSD:
- a device is like a site in Globus
- no active objects in Globus

A Model Architecture for Data Grids
[Diagram: an application sends an attribute specification to a Metadata Catalog, which returns a logical collection and logical file name; a Replica Catalog resolves these to multiple locations (tape library, disk caches, disk array); Replica Selection, using performance information and predictions from MDS and NWS, picks the selected replica among replica locations 1-3; reliable transport and reliable replication move the data.]

GDMP

GDMP – Grid Data Management Pilot
- asynchronous replication with a subscription model:
  producer -> export catalog -> import catalog -> consumer
- partial-replication model with filter criteria,
  e.g., only replicate files with "Muon" in the name or owned by user "Roy"
- centralized replica catalog

GDMP
- static producer/consumer relationship
- file replication (one file may include multiple objects)
- based on the Globus replica catalog, GridFTP, GSI
- a central replica catalog for the Grid

OSD: object replication, dynamic replication, P2P environment for the replica catalog

OceanStore

Cluster recognition
- identify and group closely related files
- helps prefetching
Replica management
- dynamic replication: the system decides the number and location of floating replicas by monitoring client requests and system load; the system forwards a request to the object's parent node, and the parent can create additional floating replicas on nearby nodes
- analyze global usage trends, e.g., detect periodic migration of clusters from site to site and prefetch data based on these cycles

OSD:
- devices can participate in and simplify this introspection process
- let the owner/application explicitly specify policies
- two-tier vs. P2P

Limitations

Server-based solutions:
- ad hoc (static): lack of flexibility, need more management
- rely upon knowledge maintained solely at the server: do not scale well in performance and functionality, e.g., server-only schemes may become bottlenecks
- server modification required for new data objects, e.g., multimedia objects require QoS – what will be next?
- point solutions, e.g., one kind of consistency
- offload too much work to the client (like xFS)
OceanStore is not suitable for the enterprise environment
Globus does not consider active objects and assumes read-only data

Our Solution: OSD

Data encapsulated as active objects, coupled with object-based metadata
- more scalable along both axes
- flexibility in functionality captured by object-based metadata, e.g., an object can define its own consistency or replication policy
- greater dynamic performance is possible, e.g., an object can request replication when demand grows, without server intervention
Migration and dynamic replication
- support mobility of clients, objects, devices
- ease of management
Differentiated service for different clients
- e.g., PDA client vs. laptop client
Helps survive failures
Read-write consistency
- the access semantics of an object affect the number of its replicas

Three-layer Approach

Object initiated: based on the metadata associated with the object
- e.g., the object specifies that a replica is created when accesses reach the "replicate_when" threshold, or that it requires 5 replicas in the system
Device initiated: based on local policy, only inside the region
- the device has better knowledge of its own load than the regional manager; if accesses to a device reach its local threshold, the device may replicate popular objects to balance load
Regional manager initiated: based on global policy
- e.g., backup, mirroring
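The three initiation layers above can be sketched as independent trigger checks, any one of which may start replication. This is a hypothetical illustration: apart from `replicate_when`, which comes from the metadata strawman, the function names, thresholds, and policy values are invented, and real policies would be richer than simple comparisons.

```python
# Sketch of the three-layer replication decision (hypothetical API).
# Each layer checks its own trigger; any layer may initiate replication.

def object_initiated(obj_meta, access_count):
    # Object layer: the object's own metadata carries the trigger.
    return access_count >= obj_meta["replicate_when"]

def device_initiated(device_load, local_threshold):
    # Device layer: local policy, based on the device's own load.
    return device_load >= local_threshold

def manager_initiated(policy):
    # Regional-manager layer: global policy (e.g., backup, mirroring).
    return policy in ("backup", "mirroring")

def should_replicate(obj_meta, access_count, device_load,
                     local_threshold, global_policy):
    return (object_initiated(obj_meta, access_count)
            or device_initiated(device_load, local_threshold)
            or manager_initiated(global_policy))

meta = {"replicate_when": 100}
print(should_replicate(meta, 120, 0.3, 0.8, None))   # object layer fires: True
```

Keeping the three checks separate mirrors the slide's point: the object, the device, and the regional manager each decide from the knowledge they hold locally.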

The Problem

We will address, in the context of OSD:
- How to exploit this three-layer architecture?
- What metadata is needed?
- Which metadata is local to the object, which is on the device, and which must be distributed to servers?
- How to keep policies coherent and avoid bad behaviors?
- What mechanisms and policies take advantage of these three layers?

Meta-data Strawman

Meta-data (mandatory part)
- replica: (x152338)
- mobile: yes/no
- consistency: Unix, Session, …
- demand: requests/time
- replication_when: demand > threshold
- replication_where: region_list
- time_to_live: 24 hours
- access_semantic: read-only / read-most / single-writer / multiple-writer
- contained_objects: x343687, x…
- preferred_transfer_speed: 10 Mbps

Meta-data Strawman –version: 1.0 –location: device_ip_list –itinerary: (region, time), (region time), … –access_pattern:, –security-related attributes: –other_constraints: e.g., processing power of the device –storage_layout: –other data object specific attributes (non-mandatory) allow content-based searches and customized delivery User Profile can help set the value of these metadata entries. e.g., the mobility pattern of a client can decides the itinerary entry.

Metadata on a Device
- device description: bandwidth, available space, …
- the objects stored on the device and their metadata
- soft-state locations for all objects on the device
  - obtained from the regional manager
  - recorded when the device creates a replica
- soft-state information about neighbor devices
  - discovered by searching nearby neighbors
  - learned from the regional manager
Thus the device can help during a regional manager failure.

Metadata on the Regional Manager
- metadata of active objects, including replica location information
- support for attribute filtering
- device status

Metadata on the Client
- replica locations: for read-only data, the client can cache this information and access the object without contacting the regional manager
- if the client has enough storage space, it can cache the object itself

Scenarios: Migration

Grid computing
- I have a reservation for a supercomputer in remote region R1 from time T1 to T2, and the computation requires data object x123. The output will be a data object x934 that I want delivered back to my home region R_home:
  - the itinerary for x123 has an entry (R1, T1)
  - the itinerary for x934 has an entry (R_home, T2), added when x934 is created at time T2
Personal computing
- I have a local object x123 (R_home), and I am going to a conference Tuesday through Thursday in region R1 and want to operate on the object there:
  - the itinerary for x123 has entries (R1, Tues), (R_home, Thurs)
  - if the object is going on my laptop, add more entries
Support user-directed migration such as these.
Research questions
- develop object-directed migration policies that exploit metadata, e.g., an object observes that many requests come from a particular region and self-migrates there
- space reservation
- security
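The itinerary entries above can drive a simple migration check: given (region, time) pairs, determine where the object should be now. The helper below is a hypothetical sketch under the assumption that the latest due entry wins; it is not part of any OSD implementation.

```python
# Sketch: user-directed migration driven by itinerary entries.
# An itinerary is a list of (region, time) pairs; when the current time
# passes an entry and the object is elsewhere, it should migrate.

def next_destination(itinerary, now, current_region):
    """Return the region the object should migrate to, or None."""
    due = [(region, t) for region, t in itinerary if t <= now]
    if not due:
        return None                       # no entry is due yet
    region, _ = max(due, key=lambda e: e[1])   # latest due entry wins
    return region if region != current_region else None

# x123 must be in R1 at T=100 and back home at T=200
itinerary = [("R1", 100), ("R_home", 200)]
print(next_destination(itinerary, now=150, current_region="R_home"))  # R1
print(next_destination(itinerary, now=250, current_region="R1"))      # R_home
```

A real system would also handle the space reservation and security questions the slide raises before actually moving the bytes.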

Scenarios: Replica Creation

Collaboration of object, device, and regional manager.
A company publishes a new video that needs to be read by every employee. The secretary stores this object and asks the system to disperse it to all regions; the devices holding the object should be able to support at least 10 employees reading it at 1 Mbps simultaneously.
object:
- QoS: 1 Mbps
- replication_where: all regions
- estimated_traffic: 10 Mbps
regional manager:
- according to the estimated traffic, finds a proper device in its region to store the object
- informs other regional managers to store the object inside their regions
device:
- if the device cannot support 1 Mbps delivery of this object, it can create replicas of the object on nearby devices, or, based on local policy, create replicas of other heavily loaded objects on nearby devices
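The division of labor in this scenario might look like the following sketch: the regional manager picks a device with enough spare bandwidth for the estimated traffic, and a device that cannot sustain the per-reader QoS computes how many overflow replicas its neighbors must host. Device names, capacities, and both function names are illustrative assumptions.

```python
# Hypothetical sketch of the publish scenario. Capacities are in Mbps.
import math

def pick_device(devices, estimated_traffic):
    # Regional manager: first device with enough spare bandwidth.
    for name, free_bw in devices:
        if free_bw >= estimated_traffic:
            return name
    return None

def overflow_replicas(free_bw, qos, readers):
    # Device: how many extra replicas neighbors must host so that
    # `readers` clients each get `qos` Mbps, assuming each replica
    # can serve `free_bw` Mbps.
    needed = qos * readers
    if free_bw >= needed:
        return 0
    return math.ceil((needed - free_bw) / free_bw)

devices = [("dev-a", 4), ("dev-b", 12)]
print(pick_device(devices, 10))        # dev-b
print(overflow_replicas(4, 1, 10))     # needs 10 Mbps, has 4 -> 2 extras
```

The point of the split is the same as in the slide: the manager reasons about regions and estimated traffic, while the device reacts to its own delivery limits.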

Scenarios: Replica Creation – Research Questions
- define the "nearby neighbor"
- metrics that will affect replication:
  - security restrictions
  - device configuration
  - the object's access_semantic
- define a way to register and locate replicas, i.e., a location service
- define a way to discover available storage resources and their attributes, i.e., resource discovery
- monitors for local traffic

Scenarios: Replica Selection

The application wants an object that can be transferred at 10 Mbps or more with the lowest latency.
A combination of several approaches:
- the client gets all replica information and makes the decision itself
- the regional manager makes the decision for the client
- the device serving the request can redirect it to other devices if it cannot satisfy the requirement, or ask the regional manager to choose another replica
- when one entity does replica selection, it can choose the "best" two replicas so the client can turn to the second in case of failure
Research issues:
- define a way to describe replicas and application requirements in order to find a suitable replica
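The filter-then-rank selection described above, returning the best two replicas so the client has a fallback, can be sketched as below; the replica descriptions and field names are hypothetical stand-ins for whatever description format the research issue calls for.

```python
# Sketch of replica selection: filter replicas by required bandwidth,
# rank by latency, and return the best k so the client has a fallback.

def select_replicas(replicas, min_bw_mbps, k=2):
    candidates = [r for r in replicas if r["bw"] >= min_bw_mbps]
    candidates.sort(key=lambda r: r["latency_ms"])   # lowest latency first
    return candidates[:k]

replicas = [
    {"device": "dev-a", "bw": 100, "latency_ms": 40},
    {"device": "dev-b", "bw": 5,   "latency_ms": 10},   # link too slow
    {"device": "dev-c", "bw": 20,  "latency_ms": 25},
]
best = select_replicas(replicas, min_bw_mbps=10)
print([r["device"] for r in best])   # ['dev-c', 'dev-a']
```

The same routine could run at the client, the regional manager, or the serving device; the slide leaves open which entity performs it.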

Scenarios: Failure

Failure of a regional manager
- the client has cached the location of the object it needs
- the client may have cached the object itself
- the client may ask a device for the object; the device can search its own store or ask its neighbors, like a peer-to-peer network among devices
Failure of a device
- the client can choose another device based on cached object locations
- the client may have cached the object
- the client can ask the regional manager for another replica
These fallbacks induce more issues for security and consistency.
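The fallback chains above amount to a client read path that tries its own cache, then cached replica locations, then the regional manager, and finally any reachable device. The sketch below models each component with plain dictionaries and is purely illustrative; every name in it is an assumption.

```python
# Hypothetical client read path under failures. `cache` holds objects
# the client cached; `loc_cache` maps object id -> cached device list;
# `manager` maps object id -> device (None when the manager is down);
# `devices` maps device name -> its object store.

def read_object(oid, cache, loc_cache, manager, devices):
    if oid in cache:                          # client cached the object
        return cache[oid]
    for dev in loc_cache.get(oid, []):        # cached replica locations
        if dev in devices and oid in devices[dev]:
            return devices[dev][oid]
    if manager is not None:                   # regional manager alive
        dev = manager.get(oid)
        if dev and dev in devices and oid in devices[dev]:
            return devices[dev][oid]
    for dev in devices:                       # last resort: ask devices (p2p)
        if oid in devices[dev]:
            return devices[dev][oid]
    raise KeyError(oid)

devices = {"dev-a": {"x123": b"data"}}
# Regional manager down (None): the p2p fallback still finds x123.
print(read_object("x123", {}, {}, None, devices))   # b'data'
```

As the slide notes, each extra fallback path widens the surface where security checks and consistency guarantees must still hold.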