Data Evolution: 101 - Parallel Filesystems vs Object Stores (Amazon S3, CIFS, NFS)

Summary: Parallel Filesystem vs Object Stores

Client access method
- Parallel filesystem: native RDMA POSIX client; POSIX reads and writes, MPI-IO
- Object store: API (plus gateways): PUT, GET, DELETE

Object types allowed
- Parallel filesystem: files and directories
- Object store: objects with a defined structure (data, metadata, data protection policy, checksum)

IO types supported
- Parallel filesystem: parallel IO from single clients, parallel IO from multiple clients
- Object store: single-stream IO from very large numbers of clients

Geographical distribution
- Parallel filesystem: usually local, with some limited WAN capability
- Object store: global

Filesystem metadata management
- Parallel filesystem: usually inodes, with pointers to data blocks and pointers to pointers
- Object store: no inodes; an object ID is returned to the client, which encodes the object's location in a hash
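The access-method contrast in the table can be sketched in a few lines. This is a toy illustration, not any vendor's API: POSIX files are byte-addressable (you can seek and read a range), while objects are written and read whole through PUT/GET/DELETE, with the store handing back an object ID.

```python
import io

# POSIX-style access: seek to an offset and read part of the data
f = io.BytesIO(b"0123456789abcdef")
f.seek(10)
assert f.read(3) == b"abc"          # partial, offset-based read

# Object-style access: an in-memory stand-in for an object store,
# with whole-object PUT / GET / DELETE and store-assigned object IDs
class ToyObjectStore:
    def __init__(self):
        self._objects = {}
        self._next = 0

    def put(self, data, metadata=None):
        oid = f"oid-{self._next}"    # the store, not the client, names the object
        self._next += 1
        self._objects[oid] = (bytes(data), metadata or {})
        return oid

    def get(self, oid):
        return self._objects[oid][0]  # always the full object; no seek

    def delete(self, oid):
        del self._objects[oid]

store = ToyObjectStore()
oid = store.put(b"simulation output", {"owner": "hpc-user"})
assert store.get(oid) == b"simulation output"
store.delete(oid)
```

Note that the object interface never exposes offsets: clients that need to update part of an object must rewrite it, which is exactly why the table lists parallel IO only on the filesystem side.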

WOS Testing Limits (WOS 2.5)

- Maximum # of unique objects: 256 billion
- Maximum total cluster capacity (using 4 TB HDDs): 30.72 PB
- Maximum aggregate R/W performance (SAS HDDs): 8M object reads; 2M object writes
- Latency (4 KB read; 2-way replication, SAS HDD): 9 ms (flash is even better)
- Latency (4 KB write; 2-way replication, SAS HDD): 25 ms (flash is even better)
- Maximum object size; minimum object size: 5 TB; 512 B
- Maximum number of nodes per cluster: 256 (2 nodes per WOS6000)
- Maximum metadata space per ObjectID: 64 MB
- Maximum # of storage zones: 64
- Maximum # of replication/protection policies: 64
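The capacity and node limits above are mutually consistent, which is worth checking: the per-node drive count below is inferred from the other figures (a 60-drive WOS6000 enclosure shared by 2 nodes), not stated on the slide.

```python
# Sanity check: does 256 nodes of 4 TB HDDs really give 30.72 PB?
TB = 10**12
PB = 10**15

nodes = 256
drives_per_node = 30        # inferred: 60-drive WOS6000, 2 nodes per enclosure
drive_size = 4 * TB

capacity = nodes * drives_per_node * drive_size
assert capacity == 30720 * TB        # 30,720 TB = 30.72 PB, matching the slide
```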

Extents-based Traditional File System Approach

Ease of use is limited at scale:
- FSCK challenges
- Inode data structures
- Fragmentation issues
- Filesystem expansion is tricky
- No native understanding of the distribution of data
- RAID management and hot-spare management
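The "inode data structures" bullet refers to the classic per-file metadata layout: a fixed set of direct block pointers in the inode, plus indirect blocks of further pointers. A minimal sketch, with illustrative sizes not taken from any real filesystem, shows why this design strains at scale: every block of every file needs a pointer, and large files need multi-level pointer trees that fsck must walk.

```python
BLOCK_SIZE = 4096
N_DIRECT = 12
PTRS_PER_BLOCK = BLOCK_SIZE // 8      # 8-byte disk block addresses

class Inode:
    """Toy inode: direct pointers plus one single-indirect block."""
    def __init__(self):
        self.direct = [None] * N_DIRECT          # pointers held in the inode itself
        self.indirect = [None] * PTRS_PER_BLOCK  # one extra block full of pointers

    def block_for(self, logical_block):
        """Map a logical file block number to its disk block pointer."""
        if logical_block < N_DIRECT:
            return self.direct[logical_block]
        return self.indirect[logical_block - N_DIRECT]

# With only single indirection, file size is capped at:
max_blocks = N_DIRECT + PTRS_PER_BLOCK
assert max_blocks * BLOCK_SIZE == 2146304   # ~2 MB; real filesystems need
                                            # double/triple indirection beyond this
```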

WOS (Web Object Scaler)

- Not POSIX-based
- Not RAID-based
- No spare drives
- No inode references, no FAT, no extent lists
- No more running fsck
- No more volume management
- Not based on a single-site/box architecture
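"No inode references" is possible because, as the summary table noted, the object ID itself determines where the data lives. A generic hash-placement sketch (not the actual WOS placement algorithm) shows the idea: any client can compute an object's home node from its ID alone, with no metadata server lookup.

```python
import hashlib

# Hypothetical cluster membership for illustration
NODES = [f"node-{i}" for i in range(8)]

def locate(object_id: str) -> str:
    """Map an object ID to its home node by hashing the ID."""
    digest = hashlib.sha256(object_id.encode()).digest()
    return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]

# Deterministic: every client resolves the same node for the same ID,
# so there is no central inode table to consult (or to fsck).
assert locate("oid-12345") == locate("oid-12345")
assert locate("oid-12345") in NODES
```

Production systems layer replication and rebalancing on top of this, but the core property is the one shown: placement is a pure function of the object ID.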

DDN | WOS Software Stack

Connectors:
- WOS API (C++, Java, Python, PHP, HTTP); HTTP CDN caching
- NFS & CIFS
- GRIDScaler HSM; EXAScaler HSM
- Android, iOS & S3
- Multi-tenancy layer

WOS Core: peer-to-peer object storage
- ObjectAssure™ erasure coding; replication engine
- WOS policy engine
- De-clustered data management and fast rebuild
- Self-healing object storage clustering
- Latency-aware access manager
- User-defined metadata
- Distributed objects; object ID management
- WOS cluster management utility

Key properties:
- API-based: integrate applications and devices more robustly
- Policy-driven: manage truly via policy, rather than micromanaging multiple layers of traditional filesystems
- Global, peer-to-peer: distribute data across 100s of sites in one namespace
- Self-healing: the intelligent data management system recovers from failures rapidly and autonomously
- Data protection: replicate and/or erasure code
- Small files, large files, streaming files: low seek times to get data; WOS caching servers for massive streaming data
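The stack pairs replication and erasure coding as its two data-protection options. Their trade-off is easy to quantify with generic math (not WOS-specific parameters): replication multiplies raw storage by the copy count, while erasure coding spreads an object across data plus parity fragments at a much lower overhead for comparable failure tolerance.

```python
def replication_overhead(copies: int) -> float:
    """Raw storage consumed per byte of user data with N full copies."""
    return float(copies)

def erasure_overhead(data_frags: int, parity_frags: int) -> float:
    """Raw storage per byte with k data + m parity fragments (k+m scheme)."""
    return (data_frags + parity_frags) / data_frags

# 3-way replication survives 2 node losses at 3x raw storage;
# an 8+2 erasure code also survives 2 fragment losses, at only 1.25x.
assert replication_overhead(3) == 3.0
assert erasure_overhead(8, 2) == 1.25
```

This is why the policy engine matters: small, hot objects often favor replication (cheap whole-object reads), while large, cold objects favor erasure coding (low capacity overhead), and a per-object protection policy lets one cluster do both.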