Mass Storage at CERN
Bernd Panzer-Steindel, CERN/IT
GDB Meeting, 12 January 2005

Slide 2: Tape Storage Installation
- StorageTek (STK)
- 10 Powderhorn silos, distributed over two physical sites (5 silos each)
- Each silo has 6000 tape slots
- Today ~10 PB maximum capacity
- 9940B tape drives: 30 MB/s read/write speed, 200 GB cartridges
- 54 drives installed for physics production and 8 for backup procedures
- Still 10 * 9840 drives in production
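A quick back-of-the-envelope check of the quoted maximum capacity (a sketch, not from the slide; it assumes every slot holds a 200 GB 9940B cartridge):

```python
# Back-of-the-envelope capacity check using the numbers quoted above.
silos = 10
slots_per_silo = 6000
cartridge_gb = 200                     # 9940B cartridge

raw_capacity_pb = silos * slots_per_silo * cartridge_gb / 1e6
print(f"raw capacity with 200 GB media in every slot: {raw_capacity_pb:.1f} PB")
# -> 12.0 PB; the quoted ~10 PB presumably reflects slots still holding
#    older, smaller 9840 media and slots reserved for backup/cleaning tapes.
```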

Slide 3: Disk Storage Installation
- NAS disk servers connected via Gigabit Ethernet (single connection per server)
- 1 – 3 TB per node (usable space)
- One large file system (striped + RAID5) or multiple file systems with mirrored disks
- ~400 disk server nodes
- ~450 TB of space (usable, RAID configuration)
- ~6000 single disks (75 – 200 GB)
- Spread over ~80 separate disk pools (aggregations of file systems and servers)
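As a rough consistency check of raw versus usable space (a sketch; the average disk size and the RAID5 set width are assumptions, not from the slide):

```python
# Raw vs. usable disk space under the two configurations mentioned above.
disks = 6000
avg_disk_gb = 120                 # assumed, between the quoted 75 and 200 GB

raw_tb = disks * avg_disk_gb / 1000
raid5_usable_tb = raw_tb * 7 / 8  # e.g. 8-disk RAID5 sets lose one disk in eight
mirror_usable_tb = raw_tb * 0.5   # mirrored file systems lose half the space

print(f"raw {raw_tb:.0f} TB, all-RAID5 {raid5_usable_tb:.0f} TB, "
      f"all-mirrored {mirror_usable_tb:.0f} TB")
# The quoted ~450 TB usable lies between the two extremes, consistent with a
# mix of striped+RAID5 and mirrored pools.
```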

Slide 4: Mass Storage Software
- CASTOR: a CERN development; new version in the final testing phase
- Today ~30 million files and ~4 PB of data (15% from the LHC experiments)
- ~4.5 FTE development team
- ~7 FTE operation team (disk, tape, CASTOR), high-level operation, not sysadmin

Slide 5: Network Installation
There is a parallel installation of networks, which are interconnected.
[Diagram: CPU servers, disk servers and tape servers are attached to the production backbone via Fast Ethernet (100 Mbit/s), Gigabit Ethernet (1000 Mbit/s) and 2-3 * Gigabit Ethernet links; the production backbone uses multiple Gigabit Ethernet (20 * 1000 Mbit/s); the new backbone for the data challenges and the WAN are connected at 10 Gbit.]

Slide 6: Tape System Performance (I)
[Plot: number of tape mounts per week in the two silo installations]
- >50000 mounts per week for an average of 40 active tape drives
- Reaching the limits of the robotics
- Locality of tape drives and cartridges in connected silos
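The arithmetic behind the "limits of the robotics" remark (a sketch; it assumes mounts are spread evenly over the week):

```python
# Average robot activity implied by the mount statistics above.
mounts_per_week = 50_000
active_drives = 40
hours_per_week = 7 * 24

mounts_per_drive_per_hour = mounts_per_week / active_drives / hours_per_week
print(f"~{mounts_per_drive_per_hour:.1f} mounts per drive per hour")
# -> ~7.4 mounts/drive/hour, i.e. a robot action roughly every 8 minutes per
#    drive before any data transfer time is accounted for.
```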

Slide 7: Tape System Performance (II)
- Relating this to the #mounts plots → writing tapes is much more efficient than reading them
- A factor of 10 fewer mounts for writing a TB than for reading a TB

Slide 8: Storage Performance
[Plots: aggregate read and write speed over the last 10 months for ~300 disk servers and for ~50 tape servers]
- Short time-period peaks are averaged out here; there were half-day peaks of twice the shown speeds
- Some disk pools are used at 80% and some at 5%
- The existing nodes are not used to their full capacity
- Inefficiency problems → application features, access patterns

Slide 9: 2005 Main Activities
- Run the service data challenges: 300 MB/s and 500 MB/s WAN; mass storage disk-only and with tape storage included
- Run the local ALICE-IT data challenge, 750 MB/s disk+tape
- Handle the experiment production load continuously over the year → impact of the service data challenges; no extra tape resources in 2005
- Understand much better the limitations and boundary conditions of tape-disk storage systems
- Prepare the purchase of a new tape storage system in 2006

Slide 10: Resources for the 500 MB/s DC at CERN
[Diagram: data generators / CPU servers feeding a T0 disk buffer, with tape servers, disk servers and GridFTP servers connected to the 10 Gbit WAN]
To run at 500 MB/s one needs about:
- 25 tape servers (one tape drive each) at 60% efficiency (= large files)
- 25 disk servers with 2 TB space each → 24h buffer
- 15 GridFTP servers at 50% efficiency
- 20 data generator nodes
The system needs to be part of the standard activities, so that the 'extra' manpower needed stays acceptable.
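A sketch of the arithmetic behind these numbers (the 9940B drive speed comes from slide 2; the per-server GridFTP peak rate is an assumption):

```python
# Resource estimate for a sustained 500 MB/s data challenge.
target_mb_s = 500

drive_speed_mb_s = 30          # 9940B drive (slide 2)
tape_efficiency = 0.60         # large-file efficiency quoted above
tape_servers = target_mb_s / (drive_speed_mb_s * tape_efficiency)

gridftp_peak_mb_s = 70         # assumed peak per GbE-connected server
gridftp_efficiency = 0.50
gridftp_servers = target_mb_s / (gridftp_peak_mb_s * gridftp_efficiency)

buffer_tb = target_mb_s * 3600 * 24 / 1e6    # 24 hours of data

print(f"tape servers:    {tape_servers:.0f}")     # ~28, close to the quoted 25
print(f"gridftp servers: {gridftp_servers:.0f}")  # ~14, close to the quoted 15
print(f"24h buffer:      {buffer_tb:.0f} TB")     # ~43 TB, i.e. ~25 x 2 TB disk servers
```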

Slide 11: Setup
Parallel setup of tests and production:
1. High-throughput cluster for data challenges with 10 Gbit connections; our new backbone construction will be 'folded' in during Q…
2. The current production system (GridFTP service) will continue on our production backbone (up to ~2 Gbit)
3. When the new backbone is stable, point 2 will move over with enhanced network capacity
Production and test setups will essentially exist 'forever' in parallel.

Slide 12: High Throughput Prototype (openlab + LCG prototype)
[Diagram of the prototype; components shown:]
- 80 * IA32 CPU servers (dual 2.4 GHz P4, 1 GB memory)
- 80 * IA32 CPU servers (dual 2.8 GHz P4, 2 GB memory)
- 40 * IA32 CPU servers (dual 2.4 GHz P4, 1 GB memory)
- 2 * 50 Itanium 2 servers (dual 1.3/1.5 GHz, 2 GB memory), 10 GE or 1 GE per node
- 36 disk servers (dual P4, IDE disks, ~1 TB disk space each)
- 24 disk servers (P4, SATA disks, ~2 TB disk space each)
- 28 TB IBM StorageTank
- 2 * 12 tape servers with STK 9940B drives
- 4 * Enterasys N7 10 GE switches, 2 * Enterasys X-Series
- 4 * GE connections to the backbone, 10GE interconnects, 10GE WAN connection

Slide 13: 2005 Main Activities
- Run the service data challenges: 300 MB/s and 500 MB/s WAN; mass storage disk-only and with tape storage included
- Run the local ALICE-IT data challenge, 750 MB/s disk+tape
- Handle the experiment production load continuously over the year → impact of the service data challenges; no extra tape resources in 2005
- Understand much better the limitations and boundary conditions of tape-disk storage systems
- Prepare the purchase of a new tape storage system in 2006

Slide 14: Mass Storage Performance
First set of parameters defining the access performance of an application against the mass storage system:
- Speed of the robot
- Distribution of tapes in the silos (at the time of writing the data)
- Number of running batch jobs
- Internal organization of jobs (experiments), e.g. requesting a file just before usage
- Priority policies (between experiments and within an experiment)
- CASTOR scheduling implementation
- Tape drive speed and tape drive efficiency
- Disk server file system, OS + driver
- CASTOR database performance
- CASTOR load balancing mechanism
- Monitoring and fault tolerance
- Disk server optimization, data layout on disk
- Experiment policies, access patterns (experiment), overall performance
- File size
- Bugs and features

Slide 15: Tape Efficiency Effects
[Plot: analyzing the Lxbatch inefficiency trends]
- Wait time due to tape queues
- Stager hits the #files limit

Slide 16: Example: File Sizes
Average file size on disk:
- ATLAS: 43 MB
- ALICE: 27 MB
- CMS: 67 MB
- LHCb: 130 MB
- COMPASS: 496 MB
- NA48: 93 MB
Large amounts of files are < 10 MB.

Slide 17: Analytical Calculation of Tape Drive Efficiencies
[Plot: tape drive efficiency [%] versus file size [MB]]
- Tape mount time ~ 120 s
- File overhead ~ 4.4 s
- Average # of files per mount ~ 3
- Large number of batch jobs requesting files one by one
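The formula behind the plot is not spelled out on the slide; a plausible model (a sketch, using the quoted overheads and the 30 MB/s 9940B drive speed from slide 2) is:

```python
# Hypothetical efficiency model: fraction of wall-clock time the drive spends
# streaming data, given a per-mount and a per-file overhead.
MOUNT_TIME_S = 120.0
FILE_OVERHEAD_S = 4.4
FILES_PER_MOUNT = 3.0
DRIVE_SPEED_MB_S = 30.0        # 9940B (slide 2)

def drive_efficiency(file_size_mb: float) -> float:
    transfer_s = file_size_mb / DRIVE_SPEED_MB_S
    overhead_s = MOUNT_TIME_S / FILES_PER_MOUNT + FILE_OVERHEAD_S
    return transfer_s / (transfer_s + overhead_s)

for size_mb in (10, 43, 130, 496, 2000):
    print(f"{size_mb:5d} MB -> {100 * drive_efficiency(size_mb):5.1f} %")
# With the average file sizes of slide 16 (tens of MB) the efficiency stays in
# the low percent range; only files of ~2 GB and above reach the ~60% assumed
# for the 500 MB/s data challenge.
```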

Slide 18: Efficiency Improvements
Currently quite some effort is put into analyzing all the available monitoring information, in order to understand much better the influence of the different parameters on the overall performance.
The goal is to be able to calculate the cost of data transfers from tape to the application → CHF per MB/s for a volume of X TB.
Combination of problems, example: small files + randomness of access.
Possible solutions:
- Concatenation of files at the application or MSS level
- An extra layer of disk cache, vendor-supplied or 'home-made'
- A hierarchy of fast-access and slow-access tape drives
- Very large amounts of disk space
- ...
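A sketch of the kind of CHF-per-MB/s figure of merit aimed at here (the metric definition is an assumption, reusing the hypothetical efficiency model from the slide 17 sketch and the 9940B drive cost from slide 21):

```python
# Hypothetical cost-per-effective-throughput metric for tape reads.
DRIVE_COST_CHF = 35_000        # STK 9940B drive (slide 21)
DRIVE_SPEED_MB_S = 30.0

def drive_efficiency(file_size_mb: float) -> float:
    # same hypothetical model as in the slide 17 sketch
    transfer_s = file_size_mb / DRIVE_SPEED_MB_S
    return transfer_s / (transfer_s + 120.0 / 3.0 + 4.4)

def chf_per_effective_mb_s(file_size_mb: float) -> float:
    return DRIVE_COST_CHF / (DRIVE_SPEED_MB_S * drive_efficiency(file_size_mb))

for size_mb in (30, 300, 3000):
    print(f"{size_mb:4d} MB files -> ~{chf_per_effective_mb_s(size_mb):7.0f} CHF per MB/s")
# Small files inflate the effective cost per MB/s by more than an order of
# magnitude, which is why file concatenation and extra disk cache pay off.
```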

Slide 19: 2005 Main Activities
- Run the service data challenges: 300 MB/s and 500 MB/s WAN; mass storage disk-only and with tape storage included
- Run the local ALICE-IT data challenge, 750 MB/s disk+tape
- Handle the experiment production load continuously over the year → impact of the service data challenges; no extra tape resources in 2005
- Understand much better the limitations and boundary conditions of tape-disk storage systems
- Prepare the purchase of a new tape storage system in 2006

Slide 20: Dataflow, Local CERN Fabric 2007
[Diagram: the online filter farm (HLT) sends raw and calibration data to tape storage and 'permanent' disk storage; reconstruction, calibration and analysis farms exchange raw, EST, AOD and calibration data through disk storage; Tier-1 data export takes raw, EST and AOD data.]
Complex organization with high data rates (~10 GBytes/s) and ~100k streams in parallel.
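To put the two numbers together (a sketch; it assumes the streams share the aggregate rate roughly evenly):

```python
# Per-stream rate implied by the aggregate rate and stream count above.
aggregate_gb_s = 10
streams = 100_000

per_stream_kb_s = aggregate_gb_s * 1e6 / streams
print(f"~{per_stream_kb_s:.0f} kB/s per stream")
# -> ~100 kB/s per stream: many slow, concurrent streams rather than a few
#    fast ones, which is the demanding case for tape and disk scheduling.
```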

Slide 21: Tape Storage for … (I)
Concentrate on linear tape technology, not helical scan (Exabyte, AIT, ...).
Today's choices:
- IBM 3592 drives → 40 MB/s and 300 GB cartridges; 25 KCHF and 0.7 CHF/GB
- StorageTek 9940B drives → 30 MB/s and 200 GB cartridges; 35 KCHF and 0.6 CHF/GB
- LTO consortium (HP, IBM, Certance), LTO-2 drives → 20 MB/s and 200 GB cartridges; 15 KCHF and 0.4 CHF/GB (decreased by a factor of 2 during the last 18 months)
LTO-3 drives have been available for about 3 weeks: … MB/s and 400 GB cartridges → 0.4 CHF/GB. The standalone drive costs about 6 KCHF; modified robotic drives become available 3-6 months later, and the modifications (Fibre Channel, extra mechanics, etc.) add another ~10 KCHF to the cost.
The error on the drive costs is up to 50%; media costs vary by 10%.
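A hedged cost sketch comparing the listed options: media cost to hold a given archive plus enough drives to sustain a given aggregate rate. The 4 PB volume (roughly today's CASTOR data, slide 4) and the 1 GB/s target are illustrative assumptions, and the slide itself warns of up to 50% uncertainty on drive prices:

```python
import math

# (drive MB/s, drive cost kCHF, media CHF/GB) as quoted on this slide
options = {
    "IBM 3592":  (40, 25, 0.7),
    "STK 9940B": (30, 35, 0.6),
    "LTO-2":     (20, 15, 0.4),
}
volume_gb = 4e6          # ~4 PB archive, illustrative
target_mb_s = 1000       # illustrative aggregate throughput

for name, (speed, drive_kchf, chf_per_gb) in options.items():
    drives = math.ceil(target_mb_s / speed)
    total_mchf = (drives * drive_kchf * 1e3 + volume_gb * chf_per_gb) / 1e6
    print(f"{name:10s}: {drives:3d} drives, ~{total_mchf:.1f} MCHF")
# Media cost dominates at these volumes, which is why CHF/GB matters more
# than the drive price itself.
```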

Slide 22: Tape Storage for … (II)
Drive roadmap:
- STK: 9940B, 200 GB cassettes, 30 MB/s, today
- STK: STKA, 500 GB cassettes, 120 MB/s, mid-2005
- STK: STKB, 1000 GB cassettes, 240 MB/s, beginning of … ?
- IBM: 3592A, 300 GB cassettes, 40 MB/s, today
- IBM: 3592B, 600 GB cassettes, 80 MB/s, beginning of …
- LTO: LTO2, 200 GB cassettes, 20 MB/s, today
- LTO: LTO3, 400 GB cassettes, 60 MB/s, mid-2005
- LTO: LTO4, 800 GB cassettes, 120 MB/s, beginning of …
Boundaries:
- Would like to 'see' a drive on the market for about one year
- 5 years maximum lifetime for a drive technology
- One year overlap of old and new tape drive installations
- Have a new service ready for 2007 → not easy to achieve

Slide 23: Tape Storage for … (III)
There are very few choices in the large robotic tape storage area.
- A … cartridge silo including robotics costs ~1 MCHF ± …% (the best, most flexible product currently on the market is probably the STK 8500 tape library)
- An old STK Powderhorn silo (5500 slots) costs about 200 KCHF, but does not support LTO or the new IBM drives
To be considered: a single large installation or distributed, separate installations
- Locality of data and load balancing for reading are defined at writing time
- Regular physical movement of tapes is not really an option
- The pass-through mechanism between silos still has 'locality' restrictions → one move of a tape = one mount of a tape
Prices of robots and drives have large error margins (50%), because these are non-commodity products and depend heavily on the negotiations with the vendors (level of discount).
More details here: …

Slide 24: Summary
Lots of challenges...
- Need to do plenty of things in parallel which are correlated (new backbone, new tape storage, understanding and testing the MSS tape ↔ disk relation, data challenges, production, ...).
- Service data challenges must become part of the 'standard' IT production schemes as fast as possible, to reduce the need for larger amounts of extra manpower (they of course need extra resources, as these are new services...).
- In 2005 the data challenges will interfere heavily with production, as large amounts of tape resources are needed over longer periods.
- Understanding the tape-to/from-application dataflow is key for the costing and sizing of the systems in 2007; it needs to be 'fixed' in 2005 → TDR, experiment computing models.