CASPUR / GARR / CERN / CNAF / CSP
New results from CASPUR Storage Lab
Andrei Maslennikov, CASPUR Consortium
May 2003

Participated:
- CASPUR: M.Goretti, A.Maslennikov (*), M.Mililotti, G.Palumbo
- ACAL FCS (UK): N.Houghton
- GARR: M.Carboni
- CERN: M.Gug, G.Lee, R.Többicke, A.Van Praag
- CNAF: P.P.Ricci, S.Zani
- CSP Turin: R.Boraso
- Nishan (UK): S.Macfall
(*) Project Coordinator

Sponsors:
- E4 Computer (Italy): loaned 6 SuperMicro servers (motherboards and assembly) - excellent hardware quality and support
- Intel: donated 12 x 2.8 GHz Xeon CPUs
- San Valley Systems: loaned two SL1000 units - good remote CE support during the tests
- ACAL FCS / Nishan: loaned two IPS 4300 units - active participation in the tests, excellent support

Contents
- Goals
- Components and test setup
- Measurements:
  - SAN over WAN
  - NAS protocols
  - IBM GPFS
  - Sistina GFS
- Final remarks
- Vendors' contact info

Goals for these test series
1. Feasibility study for a SAN-based Distributed Staging System
2. Comparison of the well-known NAS protocols on the latest commodity hardware
3. Evaluation of the new versions of IBM GPFS and Sistina GFS as a possible underlying technology for a scalable NFS server

Remote Staging

1. Feasibility study for a SAN-based Distributed Staging System
- Most large centres keep the bulk of their data on tape and use some form of disk caching (staging, HSM, etc.) to access them.
- Sharing datastores between several centres is frequently requested, which means that some kind of remote tape access mechanism has to be implemented.
- Suppose now that your centre has implemented a tape-to-disk migration system, and you have to extend it to access data located on remote tape drives. Let us see how this can be achieved.

Remote Staging - Solution 1

To access a remote tape file, stage it onto a remote disk, then copy it over the network to the local disk.
[Diagram: Local Site (disk) <- network <- Remote Site (tape -> staging disk)]
Disadvantages:
- Two-step operation: more time is needed, harder to orchestrate
- Wasted remote disk space

Remote Staging - Solution 2

Use a "tape server": a process residing on a remote host that has access to the tape drive. The data are read remotely and then piped over the network directly to the local disk.
[Diagram: Local Site (disk) <- network <- Remote Site (tape server reading the tape drive)]
Disadvantages:
- A remote machine is needed
- The architecture is quite complex
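As a rough illustration of the tape-server idea (not the software actually used in these tests), the remote read-and-pipe step could look like the sketch below; the host name, device path and block size are hypothetical:

    # On the local data mover: ask the remote tape server host to read the tape
    # and pipe the data straight onto the local staging disk.
    ssh tapeserver.remote.site "dd if=/dev/nst0 bs=256k" | dd of=/stage/pool1/datafile bs=256k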

Remote Staging - Solution 3

Access the remote tape drive as a native device on the SAN, then use it as if it were a local unit attached to one of the local data movers.
[Diagram: Local Site (disk, data mover) - SAN extended over the WAN - Remote Site (tape drive)]
Benefits:
- Makes the staging software a lot simpler: the local, field-tested solution applies unchanged
- Best performance guaranteed (provided the remote drive can be used locally at its native speed)
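A minimal sketch of what this buys you, assuming the gateway presents the remote FC tape to the local SAN as an ordinary SCSI tape device (the device name and staging path below are made up for illustration):

    # The remote drive shows up locally, e.g. as /dev/nst1, so the usual
    # local staging commands work unchanged:
    mt -f /dev/nst1 rewind
    dd if=/dev/nst1 of=/stage/pool1/datafile bs=256k   # read one tape file onto the staging disk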

Remote Staging

"Verify whether FC tape drives may be used at native speed over the WAN, using SAN-over-WAN interconnection middleware."
- In 2002 we had already tried to reach this goal: we used the CISCO 5420 iSCSI appliance to access an FC tape over a 400 km distance. We were able to write at the native speed of the drive, but the read performance was very poor.
- This year we were able to assemble a setup that implements a symmetric SAN interconnection, and we used it to repeat these tests.

NAS protocols

2. Benchmark the well-known NAS protocols on modern commodity hardware.
- We run these tests on a regular basis, since we want to know what performance we can currently count on and how the different protocols compare on the same hardware base.
- Our test setup was visibly more powerful than last year's, so we were expecting better numbers.
- We compared two remote-copy protocols, RFIO and Atrans (cacheless AFS), and two protocols that provide transparent file access, NFS and AFS.
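For reference, the practical difference between the two families can be sketched as follows (host names and paths are illustrative; the rfcp syntax is that of the usual RFIO client tools):

    # Remote-copy protocols: an explicit command moves the file, e.g. RFIO
    rfcp bigfile server01:/data/bigfile

    # Transparent-access protocols: the remote filesystem is mounted and
    # ordinary POSIX tools are used, e.g. over NFS
    cp bigfile /mnt/server01/data/bigfile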

Goal 3

3. Evaluate the new versions of IBM GPFS and Sistina GFS as a possible underlying technology for a scalable NFS server.
- In 2002 we had already tried both GPFS and GFS.
- GFS 5.0 showed interesting performance figures, but we observed several issues: unbalanced performance in the case of multiple clients, and an exponential increase of the load on the lock server as the number of clients grew.
- GPFS 1.2 showed poor performance in the case of concurrent writing on several storage nodes.
- We used both GFS and GPFS during this test session.

Components

- High-end Linux units for both servers and clients:
  6x SuperMicro Superserver 7042M-6 and 2x HP Proliant DL380, each with:
  - 2 CPUs, Pentium IV Xeon 2.8 GHz
  - SysKonnect 9843 Gigabit Ethernet NIC (fibre)
  - Qlogic QLA2300 2 Gbit Fibre Channel HBA
  - Myrinet HBA
- Disk systems:
  4x Infortrend IFT-6300 IDE-to-FC arrays, each with:
  - 12 x Maxtor DiamondMax Plus GB IDE disks (7200 rpm)
  - Dual Fibre Channel outlet at 2 Gbit
  - Cache: 256 MB

Components - 2

- Tape drives:
  4x LTO/FC (IBM Ultrium 3580)
- Network:
  - 12-port NPI Keystone GE switch (fibre)
  - 28-port Dell 5224 GE switches (fibre / copper)
  - Myricom Myrinet 8-port switch
  - Fast geographical link (Rome-Bologna, 400 km) with a guaranteed throughput of 1 Gbit
- SAN:
  - Brocade 2400, 2800 (1 Gbit) and 3800 (2 Gbit) switches
  - San Valley Systems SL1000 IP-SAN Gateway
  - Nishan IPS 4300 multiprotocol IP Storage Switch

Components - 3

New devices: we were loaned two new units, one from San Valley Systems and one from Nishan Systems. Both provide a SAN-over-IP interconnect function and are suitable for wide-area SAN connectivity. Some more detail on both units follows.

San Valley Systems IP-SAN Gateway SL-700 / SL-1000

- 1 or 4 wirespeed Fibre Channel-to-Gigabit Ethernet channels
- Uses UDP, and hence delegates the handling of a network outage to the application
- Easy to configure
- Allows for fine-grained traffic shaping (step size 200 Kbit, from 1 Gbit/s down to 1 Mbit/s) and QoS
- Connecting two SANs over IP with a pair of SL1000 units is in all aspects equivalent to connecting these two SANs with a simple fibre cable
- Approximate cost: 20 KUSD/unit (SL-700, 1 channel), 30 KUSD/unit (SL-1000, 4 channels)
- Recommended number of units per site: 1

Nishan IPS 3300/4300 multiprotocol IP Storage Switch

- 2 or 4 wirespeed iFCP ports for SAN interconnection over IP
- Uses TCP and is capable of seamlessly handling network outages
- Allows for traffic shaping at a predefined bandwidth (8 steps, 1 Gbit to 10 Mbit) and QoS
- Implements an intelligent router function: allows multiple fabrics from different vendors to be interconnected and makes them look like a single SAN
- When interconnecting two or more separately managed SANs, maintains their independent administration
- Approximate cost: 33 KUSD/unit (6 universal FC/GE ports + 2 iFCP ports - IPS 3300), 48 KUSD/unit (12 universal FC/GE ports + 4 iFCP ports - IPS 4300)
- Recommended number of units per site: 2 (to provide redundant routing)

CASPUR Storage Lab

[Diagram: test setup. Rome side: SM 7042M-6 servers and an HP DL380 on Gigabit IP and Myrinet, an FC SAN with disks and tapes, and IPS 4300 / SL1000 gateways. Bologna side: an HP DL380 on Gigabit IP, an FC SAN, and IPS 4300 / SL1000 gateways. The two sites are linked by a 1 Gbit WAN over 400 km.]

Series 1: accessing remote SAN devices

[Diagram: the HP DL380 in Bologna accesses the disks and tapes on the FC SAN in Rome through the IPS 4300 / SL1000 gateways and the 1 Gbit, 400 km WAN link.]

Series 1 - results

We were able to operate at wire speed (100 MB/sec over the 400 km distance) with both the SL-1000 and the IPS 4300 units!
- Both middleware devices worked fairly well.
- We were able to operate the tape drives at their native speed (read and write): 15 MB/sec in the case of LTO and 25 MB/sec in the case of another, faster drive.
- In the case of disk devices we observed a small (5%) loss of performance on writes and a more visible (up to 12%) loss on reads, on both units.
- Several powerful devices together grab the whole available bandwidth of the GigE link.
- In the case of Nishan (TCP-based SAN interconnection) we witnessed a successful job completion after an emulated 1-minute network outage.

Conclusion: Distributed Staging based on direct tape drive access is POSSIBLE.

Series 2 - Comparison of NAS protocols

[Diagram: one server and one client connected via Gigabit Ethernet; the server is attached to an Infortrend IFT-6300 array over 2 Gbit FC. Raw disk performance: W: 78 MB/sec, R: 123 MB/sec.]

Series 2 - details

Some settings:
- Kernels on server: (RedHat 7.3, 8.0)
- Kernel on client: (RedHat 9)
- AFS: cache was set up on a ramdisk (400 MB)
- ext2 filesystem used on the server

Problems encountered:
- Poor array performance on reads with kernel
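One way to place the AFS client cache on a 400 MB ramdisk is sketched below; this is an assumption about the setup, not the exact commands used in the lab, and the paths follow the usual OpenAFS layout:

    # Mount a 400 MB RAM-backed filesystem over the usual AFS cache directory
    # (tmpfs used here for illustration; a different ramdisk flavour may have been used).
    mount -t tmpfs -o size=400m tmpfs /usr/vice/cache

    # Point the AFS client at it: mountpoint:cachedir:cache size in 1K blocks
    echo "/afs:/usr/vice/cache:400000" > /usr/vice/etc/cacheinfo

    # Restart the AFS client so it picks up the new cache
    /etc/init.d/afs restart    # init script name may differ per distribution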

Series 2 - more detail

Write tests:
- Measured the average time needed to transfer 20 x 1.9 GB files from memory on the client to the disk of the file server, including the time needed to run the "sync" command on both the client and the server at the end of the operation:
  20 x {dd if=/dev/zero of=<file> bs=1000k count=1900}
  T = Tdd + max(Tsync client, Tsync server)

Read tests:
- Measured the average time needed to transfer 20 x 1.9 GB files from a disk on the server to memory on the client (output directly to /dev/null). Because of the large number of files in use and a file size comparable with the available RAM on both the client and server machines, caching effects were negligible.
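A minimal sketch of the write-test loop described above; the mount point and file names are illustrative, not the actual test scripts:

    #!/bin/sh
    # Write 20 x 1.9 GB files from client memory to the mounted server filesystem,
    # then account for the final sync (the server runs its own sync in parallel).
    TARGET=/mnt/server_fs          # hypothetical mount point of the tested filesystem
    start=$(date +%s)
    i=1
    while [ $i -le 20 ]; do
        dd if=/dev/zero of=$TARGET/testfile.$i bs=1000k count=1900
        i=$((i + 1))
    done
    sync                           # flush the client side
    end=$(date +%s)
    echo "Total time: $((end - start)) s for 20 x 1.9 GB"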

Series 2 - current results (MB/sec)
[SM GB RAM on server and client]

                          Write   Read
Pure disk                  78     123
RFIO                       78     111
NFS                        77      80
AFS cacheless (Atrans)     70      59
AFS                        48      30

Series 3a - IBM GPFS

[Diagram: SM 7042M-6 nodes interconnected via Myrinet, all attached to the 4 x IFT-6300 disk arrays over the FC SAN; the GPFS filesystem is also exported to clients via NFS.]

Series 3a - details

GPFS installation:
- GPFS version
- Kernel smp
- Myrinet as the server interconnection network
- All nodes see all disks (NSDs)

What was measured:
1) Read and write transfer rates (memory <-> GPFS file system) for large files
2) Read and write rates (memory on NFS client <-> GPFS exported via NFS)
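For the NFS part, a generic export of the GPFS mount point via the kernel NFS server would look like the sketch below; the paths, host names and export options are assumptions, not the lab's actual configuration:

    # On a GPFS storage node: export the GPFS mount point to the client subnet.
    echo "/gpfs  192.168.1.0/24(rw,sync,no_root_squash)" >> /etc/exports
    exportfs -ra
    service nfs start              # or /etc/init.d/nfs start on RedHat of that era

    # On an NFS client: mount the export from one of the GPFS nodes.
    mount -t nfs gpfs-node1:/gpfs /mnt/gpfs

In the multi-node tests that follow, the clients are simply spread across the two or three exporting nodes, each serving the same underlying GPFS filesystem.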

Series 3a - GPFS native (MB/sec)

R / W speed for a single disk array: 123 / 78

            Read    Write
1 node
2 nodes
3 nodes     157     122

Series 3a - GPFS exported via NFS (MB/sec)

[Table: read and write rates with 1 node exporting (1, 2, 3 and 9 clients), 2 nodes exporting (2, 4 and 6 clients) and 3 nodes exporting (3, 6 and 9 clients). Most individual values did not survive transcription; the legible ones include reads of 35 and 44 MB/sec with one node exporting and writes of 90 and 106 MB/sec with two nodes exporting.]

Series 3b - Sistina GFS

[Diagram: SM 7042M-6 nodes on the FC SAN attached to the 4 x IFT-6300 disk arrays; one SM 7042M-6 node runs the lock server, and the GFS filesystem is exported to clients via NFS.]

Series 3b - details

GFS installation:
- GFS version
- Kernel: SMP gfs (may be downloaded from Sistina together with the trial distribution); includes all the required drivers

Problems encountered:
- The kernel-based NFS daemon does not work well on GFS nodes (I/O ends in error). Sistina is aware of the bug and is working on it using our setup. We therefore used the user-space NFSD in these tests; it was quite stable.

What was measured:
1) Read and write transfer rates (memory <-> GFS file system) for large files
2) Same for the case (memory on NFS client <-> GFS exported via NFS)

Series 3b - GFS native (MB/sec)

NB: out of 5 nodes, 1 node was running the lock server process and 4 nodes were doing only I/O.

R / W speed for a single disk array: 123 / 78

             Read    Write
1 client
2 clients
3 clients
4 clients

Series 3b - GFS exported via NFS (MB/sec)

NB: the user-space NFSD was used.

[Table: read and write rates with 1 node exporting (1, 2, 4 and 8 clients), 3 nodes exporting (3, 6 and 9 clients) and 4 nodes exporting. Most individual values did not survive transcription; the legible ones are a read rate of 250 MB/sec and a write rate of 236 MB/sec with 8 clients.]

Final remarks

- We are proceeding with the test program. Currently under test: new middleware from CISCO and a new tape drive from Sony. We are also expecting a new iSCSI appliance from HP and an LTO2 drive.
- We are open for any collaboration.

Vendors' contact info

Supermicro servers for Italy:
- E4 Computer: Vincenzo Nuti
FC over IP:
- San Valley Systems: John McCormack
- Nishan Systems: Stephen Macfall
- ACAL FCS: Nigel Houghton