CSE 598D Storage Systems, Spring 2007 Object Based Storage Presented By: Kanishk Jain.



Introduction Object Based Storage The ANSI T10 Object-based Storage Devices standard defines a storage object as a logical collection of bytes on a storage device, with well-known methods for access, attributes describing characteristics of the data, and security policies that prevent unauthorized access. In short: "intelligent data layout"

Object Storage Interface The OSD model is simply a rearrangement of existing data management functions: OSD sits one level higher than block access but one level below file access
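A minimal sketch of the difference between the two access levels below file access. The class and method names are illustrative, not from the T10 standard: a block device exposes fixed-size sectors addressed by LBA, while an OSD exposes named objects addressed by (object_id, offset, length) and keeps allocation internal.

```python
class BlockDevice:
    """Block interface: the host addresses fixed-size sectors by LBA
    and must do its own allocation and free-space management."""
    def __init__(self, sector_size=512, sectors=1024):
        self.sector_size = sector_size
        self.data = bytearray(sector_size * sectors)

    def read_block(self, lba, count):
        start = lba * self.sector_size
        return bytes(self.data[start:start + count * self.sector_size])


class ObjectStorageDevice:
    """OSD interface: the host addresses named objects by
    (object_id, offset, length); the device owns block allocation."""
    def __init__(self):
        self.objects = {}  # object_id -> bytearray (allocation hidden here)

    def create(self, object_id):
        self.objects[object_id] = bytearray()

    def write(self, object_id, offset, data):
        obj = self.objects[object_id]
        if len(obj) < offset + len(data):
            # The device, not the host, grows the object's backing store.
            obj.extend(b"\x00" * (offset + len(data) - len(obj)))
        obj[offset:offset + len(data)] = data

    def read(self, object_id, offset, length):
        return bytes(self.objects[object_id][offset:offset + length])
```

The point of the sketch is that the space-management logic lives inside `ObjectStorageDevice`, which is exactly the rearrangement of function the slide describes.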

Background – NAS sharing NAS being used to share files among a number of clients The files themselves may be stored on a fast SAN The file server is used to intermediate all requests and thus becomes the bottleneck !

Background – SAN sharing The files themselves are stored on a fast SAN (e.g., iSCSI) to which the clients are also attached While the file server is removed as a bottleneck, security is a concern !

Object-based storage security architecture Metadata managers grant capabilities to clients; clients present these capabilities to the devices on every I/O to ensure security Secure separation of control and data path !
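The capability scheme above can be sketched with a keyed MAC: the metadata manager signs a (object, rights) pair with a key it shares with the device, and the device re-verifies the signature on every I/O without contacting the manager. This is a simplified illustration, not the actual OSD capability format; all names and the key-distribution step are assumptions.

```python
import hashlib
import hmac

# Illustrative shared secret between metadata manager and OSD; in a real
# system this key would be provisioned securely per device.
SECRET = b"shared-device-key"


def grant_capability(object_id, rights):
    """Metadata manager: sign (object_id, rights) so the OSD can verify offline."""
    msg = f"{object_id}:{rights}".encode()
    tag = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return {"object_id": object_id, "rights": rights, "tag": tag}


def osd_check(cap, object_id, op):
    """OSD: recompute the MAC on every I/O; a tampered capability fails."""
    msg = f"{cap['object_id']}:{cap['rights']}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, cap["tag"])
            and cap["object_id"] == object_id
            and op in cap["rights"])
```

Because verification needs only the shared key, the data path (client to OSD) stays separate from the control path (client to metadata manager), as the slide notes.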

Development of OSD Most initial work on object storage devices (OSD) was done at the Parallel Data Lab at CMU, focused on developing the underlying concepts in two closely related areas: NASD (Network-Attached Secure Disks) and Active Disks. OSD was proposed as part of the same project as NASD and standardized by the Storage Networking Industry Association (SNIA) in 2004.

OSD v/s Active Disks The OSD standard only specifies the interface; it assumes nothing about the processing power at the disk. OSD intelligence is software/firmware running at the disk (with no specification for how it is implemented). The processing power of an OSD can be scaled to meet the requirements of the functions of an active disk

File System – Application side (User Component only) Because the OSD has the intelligence to perform basic data management functions such as space allocation and free-space management, those functions are no longer part of the application-side file system. The application-side file system is thus reduced to a manager: an abstraction layer between the user application and the OSD that provides only security and backward compatibility

File System - On the Device (Storage Component) The workload offered to OSDs may be quite different from that of general-purpose file systems. At the OSD level, objects typically have no logical relationship to one another, presenting a flat name space. General-purpose file systems, which are usually optimized for workloads exhibiting relatively small variable-sized files, relatively small hierarchical directories, and some degree of locality, are not effective in this case

Object based File System Separation of metadata and data paths: separate metadata servers (MDS) manage the directory hierarchy, permissions, and the file-to-object mapping. Distribution and replication of a file across a sequence of objects on many OSDs. Example file systems: Lustre, Panasas, Ceph
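The file-to-object mapping can be sketched as simple striping: a byte offset in the file determines which object in the sequence holds it, and a round-robin rule picks the OSD. The stripe unit and the round-robin placement are assumptions for illustration; real systems use their own layouts.

```python
# Assumed stripe unit: bytes stored per object before moving to the next one.
STRIPE_UNIT = 64 * 1024


def locate(file_offset, num_osds):
    """Map a byte offset in a file to
    (object index in the sequence, offset within that object, osd index)."""
    obj_index = file_offset // STRIPE_UNIT
    obj_offset = file_offset % STRIPE_UNIT
    osd = obj_index % num_osds        # naive round-robin placement
    return obj_index, obj_offset, osd
```

With this mapping, a client that holds the layout parameters can read any byte range by contacting the right OSDs directly, without going through the MDS on the data path.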

Some Optimizations in Ceph
Partitioning the directory tree: to balance load efficiently, the MDS partition the directory tree across the cluster. A client guesses which metadata server is responsible for a file and contacts that server to open it; that MDS forwards the request to the correct MDS if necessary.
Limit on object size and use of regions: Ceph limits objects to a maximum size (e.g., 1 MB), so a file is a sequence of bytes broken into chunks at maximum-object-size boundaries. Since only the MDS hold the directory tree, OSDs have no directory information from which to derive layout hints for file data. Instead, the OSDs organize objects into small and large object regions, using small block sizes (e.g., 4 KB or 8 KB) for small objects and large block sizes (e.g., 50–100% of the maximum object size) for large objects.
Use of a specialized mapping algorithm: a file handle returned by the metadata server describes which objects on which OSDs contain the file data. A special algorithm, RUSH, maps a sequence index to the OSD holding the object at that position in the sequence, distributing the objects uniformly.
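The chunking and placement steps above can be sketched as follows. The hash-based `place` function is a much-simplified stand-in for RUSH (which additionally handles weighted OSDs and cluster growth without mass reshuffling); only the deterministic, roughly uniform mapping from sequence index to OSD is illustrated here.

```python
import hashlib

MAX_OBJECT_SIZE = 1 << 20  # 1 MB maximum object size, as in the example


def chunk_of(file_offset):
    """Which object in the file's sequence holds this byte offset."""
    return file_offset // MAX_OBJECT_SIZE


def place(file_id, seq_index, num_osds):
    """Stand-in for RUSH: deterministic, pseudo-random, roughly uniform
    mapping from (file, sequence index) to an OSD."""
    h = hashlib.sha256(f"{file_id}:{seq_index}".encode()).digest()
    return int.from_bytes(h[:8], "big") % num_osds
```

Because the mapping is computed rather than stored, any client can locate any chunk from the file handle alone, keeping the MDS off the data path.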

Possible Performance Results OBFS outperforms Linux Ext2 and Ext3 by a factor of two to three. While OBFS is 1/25 the size of XFS, it provides only slightly lower read performance and 10%–40% higher write performance

Possible Performance Results (contd..)

Database Storage Management Object attributes are also the key to giving storage devices an awareness of how objects are being accessed, so that the device can use this information to optimize disk layout for the specific application. Database software often has very little detailed information about the storage subsystem. Previous research took the view that a storage device can provide relevant characteristics to applications. Device-specific information is known to the storage subsystem, which is thus better equipped to manage low-level storage tasks

Database Storage Management (contd..) Object attributes can contain information about the expected behavior of an object such as expected read/write ratio, access pattern (sequential vs. random), or expected size, dimension, and content of the object. Using OSD, a DBMS can inform the storage subsystem of the geometry of a relation, thereby passing responsibility for low-level data layout to the storage device. The dependency between the metadata and storage system/application is removed. This assists with data sharing between different storage applications
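A tiny sketch of how such attribute hints might drive a layout decision inside the device. The attribute names and the threshold policy are invented for illustration; the T10 standard defines its own attribute pages.

```python
def choose_block_size(attrs):
    """Pick an on-disk block size from object attribute hints.
    `access_pattern` and `expected_size` are hypothetical attribute names."""
    if (attrs.get("access_pattern") == "sequential"
            and attrs.get("expected_size", 0) > 1 << 20):
        return 1 << 19   # large blocks for big, sequentially-read objects
    return 4096          # small blocks for small or randomly-accessed objects
```

A DBMS would set such attributes when creating the object (e.g., describing a relation's geometry), and the OSD would apply the policy locally, which is exactly the responsibility transfer the slide describes.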

OSD Objects and Attributes

Scalability What does "scalability" really mean?
Capacity: number of bytes, number of objects, number of files, etc. OSD aggregation techniques will allow hierarchical representations of more complex objects that consist of larger numbers of smaller objects.
Performance: bandwidth, transaction rate, latency. OSD performance management can be used in conjunction with OSD aggregation techniques to scale each of these three performance metrics more effectively and maintain required QoS levels on a per-object basis.
Connectivity: number of disks, hosts, arrays, etc. Since the OSD model requires self-managed devices and is transport agnostic, the number of OSDs and hosts can grow to the size limits of the transport network.
Geographic: LAN, SAN, WAN, etc. Again, since the OSD model is transport agnostic and a security model is built into the OSD architecture, geographic scalability is not bounded.
Processing power: OSD processing power can be scaled.

Other Advantages
Manageability: the OSD management model relies on self-managed, policy-driven storage devices that can be centrally managed and locally administered (i.e., central policies, local execution).
Density: an OSD on an individual storage device can optimize density by abstracting the physical characteristics of the underlying storage medium.
Cost: addresses metrics such as $/MB, $/sqft, $/IOP, $/MB/sec, TCO, etc.
Adaptability: to changing applications. Can the OSD be repurposed to different uses, such as from a film-editing station to mail serving?
Capability: can functionality be added for different applications? Can additional functionality be added to an OSD to increase its usefulness?

Other Advantages (contd..)
Availability: fail-over capabilities between cooperating OSD devices. 2-way failover versus N-way failover?
Reliability: connection-integrity capabilities.
Serviceability: remote monitoring, remote servicing, hot-plug capability, genocidal sparing. When an OSD dies and a new one is put in its place, how does it get "rebuilt"? How automated is the service process?
Interoperability: supported by many OS vendors, file system vendors, storage vendors, and middleware vendors.
Power: decrease the power per unit volume by relying on the policy-driven self-management schemes to "power down" objects (i.e., move them to disks and spin those disks down).

Cluster Computing Traditionally a 'divide-and-conquer' approach: the problem is decomposed into thousands of independently executed tasks by exploiting its inherent data parallelism, identifying the data partitions that comprise each task and then distributing each task and its corresponding partition to the compute nodes for processing. Data from a shared storage system is staged (copied) to the compute nodes, processing is performed, and results are de-staged from the nodes back to shared storage when done. In many applications, the staging setup time can be appreciable: up to several hours for large clusters.

OSD for Cluster Computing Object-based storage clustering helps unlock the full potential of these Linux compute clusters: an intrinsic ability to scale linearly in capacity and performance to meet the demands of supercomputing applications, and high-bandwidth parallel data access between thousands of Linux cluster nodes and a unified storage cluster over standard TCP/IP networks.

Commercial Products

OSD Commands
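The original slide presumably showed the T10 command table, which is not reproduced in this transcript. As a non-exhaustive sketch, the T10 OSD command set includes roughly the operations below; consult the ANSI T10 OSD standard for the exact command names, CDB formats, and the full list.

```python
from enum import Enum, auto


class OSDCommand(Enum):
    """Illustrative subset of the T10 OSD command set (names approximate)."""
    FORMAT_OSD = auto()        # initialize the device
    CREATE_PARTITION = auto()  # create a partition to hold objects
    CREATE = auto()            # create an object
    REMOVE = auto()            # delete an object
    READ = auto()              # read (object_id, offset, length)
    WRITE = auto()             # write (object_id, offset, data)
    GET_ATTRIBUTES = auto()    # read an object's attribute pages
    SET_ATTRIBUTES = auto()    # write an object's attribute pages
    LIST = auto()              # enumerate objects in a partition
```

Note how the set operates on objects and attributes rather than on logical blocks, which is the interface shift the earlier slides describe.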