Presentation transcript:

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 1
New directions in storage: hard disks with built-in networking
Presenter: Patrick Fuhrmann
Authors: Patrick Fuhrmann, Paul Millar, Tigran Mkrtchyan (dCache.org); Yves Kemp (DESY); Christopher Squires (HGST)

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 2
Objectives
Current storage systems are composed of sophisticated building blocks: large file servers, often equipped with special RAID controllers.
– Are there options to build storage systems based on small, independent, low-care (and low-cost) components?
– Are there options for a massive scale-out using different technologies?
– Are there options for moving away from specialised hardware components to software components running on standard hardware?
– What changes would be needed to software and operational aspects, and how would the composition of TCO change?

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 3
Building storage systems with small, independent data nodes
– The actual storage system must be able to handle independent data nodes, like CEPH and dCache.
– It must support protocols allowing massive scale-out (no bottlenecks): CEPH is OK (client-side proprietary driver); dCache through NFS 4.1/pNFS, GridFTP, WebDAV.
– It must support protection against failures and data integrity features: CEPH and dCache continue operation if data nodes fail, and both support data integrity checks (see the checksum sketch after this slide).
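As an illustration of the data-integrity checks mentioned above: dCache records a checksum per file (commonly ADLER32) and can verify it against the data on disk. Below is a minimal Python sketch of such a check, not dCache code; the file path and the stored checksum value are hypothetical.

```python
import zlib

def adler32_of_file(path, chunk_size=1 << 20):
    """Compute the ADLER32 checksum of a file, reading it in 1 MB chunks."""
    value = 1  # ADLER32 starting value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return value & 0xFFFFFFFF

# Hypothetical example: compare against the checksum recorded in the name space.
stored_checksum = 0x1A2B3C4D                               # value recorded at write time
actual_checksum = adler32_of_file("/pool/data/0000ABCD")   # illustrative path only
if actual_checksum != stored_checksum:
    print("Checksum mismatch: replica is corrupt and should be re-created from another copy.")
```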

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 4
Selection
As we are running dCache at DESY, we focused our investigation on dCache for now; however, we will build a CEPH cluster soon. CEPH has already been successfully ported to an HGST setup. We believe that merging CEPH as storage backend with dCache as protocol engine would be exactly the right solution for DESY.

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 5
dCache design (diagram): a dCache head node provides the MDS (name-space operations); data I/O operations run directly between the number-crunching nodes and the dCache pool (data) nodes.

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 6
Available 'small' systems
DELL C5000 blades, HP Moonshot, Panasas, …
– Those still share quite some infrastructure.
– Often more than one disk per CPU: too large for a simple system, but too small for a serious RAID system.
The extreme setup would be:
– One disk is one system.
– Small CPU with limited RAM.
– Network connectivity included.
HGST and Seagate announced such a device, equipped with a standard disk drive, an ARM CPU and a network interface. We have chosen the HGST Open Ethernet drive for our investigations.

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 7
HGST Open Ethernet
A small ARM CPU with Ethernet, piggybacked on a regular disk. Spec:
– Any Linux (Debian on the demo unit)
– 32-bit ARM CPU, 512 KB Level-2 cache
– 2 GB DDR-3 DRAM (1792 MB available)
– Block storage driver exposed as SCSI sda
– Ethernet network driver exposed as eth0
– dCache pool code working!

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 8
Photo evidence: single-disk enclosure.

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 9
Professional DESY setup (photo): power supply, HGST disk, retaining clip, Yves.

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 10
dCache instance for the CMS collaboration at DESY
199 file servers in total:
– 172 with locally attached disks (12 disks each, RAID-6); varying file sizes
– 27 with remotely attached disks (Fibre Channel or SAS)
– All file servers are equipped with 24 GB RAM and 1 or 10 Gbit Ethernet
dCache with 5 PBytes net capacity:
– Some dCache partitions are running in resilient mode, holding 2 copies of each file and reducing the total net capacity.
Four head nodes for controlling and accounting: 2 CPUs + 32 GByte RAM each.
10 protocol initiators (doors): simple CPU, 16 GByte RAM.

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 11
Potential setup with HGST Open Ethernet drive / iSCSI
The standard 12 disks in RAID-6, attached via SAS to a RAID controller card, are replaced by:
– One pizza box with an Intel CPU and a performant network connects to 12 HGST drives.
– Each HGST drive acts as one iSCSI target.
– RAID-6 is built on the pizza box in software (a command-level sketch follows below).
Diagram: a CPU with a SAS RAID-6 controller and SAS disks is replaced by a CPU connected through an Ethernet switch (iSCSI protocol) to HGST Open Ethernet disks.
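To make the iSCSI variant concrete, here is a rough sketch of what the pizza box would do: log in to each drive's iSCSI target and assemble a software RAID-6 with mdadm. This is only an illustration under assumptions, not the presented setup; the IP addresses, the IQN naming and the resulting device names are all hypothetical, and open-iscsi (iscsiadm) and mdadm are assumed to be installed.

```python
import subprocess

# Hypothetical addressing of the 12 HGST Open Ethernet drives.
DRIVE_IPS = [f"192.168.10.{i}" for i in range(101, 113)]
IQN_PREFIX = "iqn.2015-03.org.example:hgst-drive-"

def run(*cmd):
    """Run a command and fail loudly so that a broken drive is noticed."""
    subprocess.run(cmd, check=True)

# 1. Discover and log in to each drive's iSCSI target.
for index, ip in enumerate(DRIVE_IPS):
    run("iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", ip)
    run("iscsiadm", "-m", "node", "-T", f"{IQN_PREFIX}{index}", "-p", ip, "--login")

# 2. Build the software RAID-6 over the resulting block devices
#    (assumed here to show up as /dev/sdb ... /dev/sdm).
devices = [f"/dev/sd{chr(ord('b') + i)}" for i in range(len(DRIVE_IPS))]
run("mdadm", "--create", "/dev/md0", "--level=6",
    f"--raid-devices={len(devices)}", *devices)
```

From dCache's point of view, /dev/md0 would then carry an ordinary pool file system, just as with a hardware RAID controller.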

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 12
Potential setup with HGST Open Ethernet drive / CEPH
The standard 12 disks in RAID-6, attached via SAS to a RAID controller card, are replaced with:
– All HGST drives acting as one large object store.
– Pizza boxes act as CEPH clients and can run higher-level services, e.g. the dCache storage system (see the client sketch below).
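To illustrate what "pizza boxes act as CEPH clients" means, here is a minimal sketch using the python-rados bindings: the client talks the CEPH protocol to the object store formed by the drives. The configuration file path and the pool name are assumptions; a service such as dCache would use its own backend integration rather than this snippet.

```python
import rados  # python-rados bindings shipped with CEPH

# Connect to the CEPH cluster whose OSDs run on the Open Ethernet drives.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")  # path is an assumption
cluster.connect()

ioctx = cluster.open_ioctx("dcache-data")  # hypothetical pool name
try:
    ioctx.write_full("example-object", b"payload stored on the drive-level OSDs")
    print(ioctx.read("example-object"))
finally:
    ioctx.close()
    cluster.shutdown()
```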

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 13
Potential CEPH deployment (diagram): N CPUs with SAS RAID-6 controllers and SAS disks, running CEPH plus high-level services (e.g. dCache pool), are replaced by CPUs running only the high-level services (e.g. dCache), connected through an Ethernet switch (CEPH protocol) to HGST Open Ethernet disks running CEPH.

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 14
Potential dCache deployment (diagram): N CPUs with SAS RAID-6 controllers and SAS disks, running the dCache pool, are replaced by CPUs running the dCache head nodes, connected through an Ethernet switch to HGST Open Ethernet disks each running a dCache pool node.

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 15
dCache composed of HGST disks (a quantitative investigation)
Qualitative comparison between a setup based on HGST drives and one based on RAID dCache pool nodes. Assumptions:
– All files identical in size (4 GBytes)
– Streaming read only
– Files are equally distributed (flat, random)
Initial measurement: total bandwidth versus number of concurrent streams.
– Using ioZone (stream, read)
– Applying a fit to the measured data (a sketch of such a fit follows below).
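The fit mentioned above can be thought of as a simple model: for a small number of streams the total bandwidth of one storage unit stays near a constant plateau (the "mean value fit"), while for many concurrent sequential streams it falls off roughly as 1/n (the "1/n fit"). Below is a sketch of such a fit with SciPy, using invented data points instead of the actual ioZone measurements, which are shown on the next slide.

```python
import numpy as np
from scipy.optimize import curve_fit

# Invented stand-ins for the ioZone results (number of streams, total MB/s).
streams = np.array([1, 2, 4, 8, 16, 32, 64, 128])
bandwidth = np.array([110, 112, 108, 95, 70, 42, 24, 13])

def model(n, plateau, falloff):
    """Constant plateau for few streams, ~1/n behaviour for many streams."""
    return np.minimum(plateau, falloff / n)

(plateau, falloff), _ = curve_fit(model, streams, bandwidth, p0=(100.0, 1000.0))
print(f"plateau ≈ {plateau:.0f} MB/s, 1/n coefficient ≈ {falloff:.0f}")
```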

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 16
Comparison: Ethernet disk vs. RAID (plot): total bandwidth (MB/s) versus number of concurrent sequential read operations; measured points with a mean-value fit and a 1/n fit, for a single HGST drive and for a 12-drive RAID.

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 17
HGST disk dCache: qualitative simulation
Simulating and comparing two 4 PByte dCache instances:
– A) 100 RAID servers (12 disks of 4 TB each), single file copy.
– B) 2000 HGST disks of 4 TB each, two file copies (dCache only supports replication of entire files, no erasure coding).
Computing total bandwidth for a varying number of streams (a simulation sketch follows below):
– The ioZone measurements from the previous slides are used.
– Counting streams per server and looking up the bandwidth to be added up.
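A sketch of the simulation as described: for a given number of concurrent streams, each stream picks a random replica holder (the files are distributed flat and randomly), the streams per unit are counted, the per-unit bandwidth is looked up from the fitted curve, and everything is summed. The plateau and 1/n parameters below are placeholders rather than the fitted DESY numbers; the unit and replica counts follow the slide.

```python
import random

def unit_bandwidth(n_streams, plateau, falloff):
    """Placeholder for the fitted ioZone curve of one storage unit (MB/s)."""
    return 0.0 if n_streams == 0 else min(plateau, falloff / n_streams)

def simulate(total_streams, n_units, copies, plateau, falloff, trials=20):
    """Average total bandwidth when every stream reads a random replica."""
    totals = []
    for _ in range(trials):
        load = [0] * n_units
        for _ in range(total_streams):
            replicas = random.sample(range(n_units), copies)  # units holding the file
            load[random.choice(replicas)] += 1                # read from one of them
        totals.append(sum(unit_bandwidth(n, plateau, falloff) for n in load))
    return sum(totals) / len(totals)

# Placeholder curves: a RAID server plateaus higher than a single HGST drive.
for streams in (100, 1000, 8000):
    raid = simulate(streams, n_units=100, copies=1, plateau=800, falloff=20000)
    hgst = simulate(streams, n_units=2000, copies=2, plateau=110, falloff=1500)
    print(f"{streams:5d} streams: RAID ≈ {raid / 1000:.1f} GB/s, HGST ≈ {hgst / 1000:.1f} GB/s")
```

The capacities also check out with the slide's numbers: 100 x (12 - 2) x 4 TB = 4 PB net for the RAID-6 setup and 2000 x 4 TB / 2 copies = 4 PB net for the HGST setup.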

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 18
Simulation result (plot): total bandwidth (GB/s) versus number of concurrent read operations, for the HGST and the RAID setup.

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 19
Real monitoring
– Photon dCache: typically << 100 simultaneous reads (around 75 streams); here the RAID setup would have the best performance.
– CMS dCache: typically >> 1000 simultaneous reads (around 8000 streams); here the HGST setup would have the best performance.

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 20
Total Cost of Ownership (some considerations)
For the time being, neither the price nor the delivery units are known. Assumptions for an HGST product:
– 4U box with 60 disks and an internal switch
– 4 x 10 GE network
Reminder: dCache only supports full replicas (erasure codes would be more efficient).
Comparison of 100 RAID servers versus 2000 HGST drives (the arithmetic is reproduced in the sketch below):
– Network ports: 100 network ports vs. 2000/60 = 133 ports. Comment: small overhead.
– Network IP: 100 IP addresses vs. 2000 IP addresses. Comment: public IPv4 no; private IPv4 and IPv6 OK.
– Power: 1200 disks, Intel CPUs and RAID controllers vs. 2000 disks, ARM CPUs and switches. Comment: roughly similar power consumption to be expected.
– Space: 100 x 2U = 200 U (usual DESY setup) vs. 2000/60 x 4U = 130 U. Comment: potentially two computing rooms; caveat: other, denser form factors exist for RAID.
– Management: 100 systems with RAID controllers vs. 2000 systems without RAID controllers. Comment: either way, a scaling life-cycle management system is needed.
– Operations: 100 x 12 disks in RAID-6, no resilience, vs. 2000 disks with resilience. Comment: the HGST setup has no RAID rebuild times and no need for timely replacement of defective hardware, just re-establishing resilience.
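The port and space rows follow from the stated assumptions (4U box, 60 disks, internal switch, 4 x 10 GE uplinks); the table's 133 ports is presumably 2000/60 boxes times the four assumed uplinks. A small sketch reproducing that arithmetic, with my own rounding to whole boxes:

```python
import math

HGST_DISKS, DISKS_PER_BOX, UPLINKS_PER_BOX, BOX_HEIGHT_U = 2000, 60, 4, 4
RAID_SERVERS, RAID_SERVER_HEIGHT_U = 100, 2  # usual DESY setup: 2U servers

boxes = math.ceil(HGST_DISKS / DISKS_PER_BOX)      # 34 whole boxes
hgst_ports = boxes * UPLINKS_PER_BOX               # 136 uplinks (slide: ~133, without rounding up)
hgst_space = boxes * BOX_HEIGHT_U                  # 136 U (slide quotes ~130 U)
raid_space = RAID_SERVERS * RAID_SERVER_HEIGHT_U   # 200 U

print(f"HGST: {boxes} boxes, about {hgst_ports} switch uplinks, {hgst_space} U of rack space")
print(f"RAID: {RAID_SERVERS} servers, {RAID_SERVERS} network ports, {raid_space} U of rack space")
```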

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 21
Summary
Performance:
– To preserve a high overall throughput with a high number of streams, many single, small storage units are preferred.
– If the focus is on single-stream performance with a medium or small number of streams, a RAID system would be preferable.
As an example of small storage units we selected the HGST Open Ethernet drive.
Operational considerations:
– A dCache setup with two copies as the data-integrity method is roughly similar in TCO to a traditional RAID-based setup.
– A setup based e.g. on CEPH, with more advanced and efficient data-integrity methods, would clearly shift the TCO in favour of the HGST setup compared to a traditional RAID-based setup.

New directions in storage | ISGC 2015, Taipei | Patrick Fuhrmann | 19 March 2015 | 22
Thanks
Further reading:
– HGST:
– dCache:
– DESY: