CERN Mass Storage Issues
Bernd Panzer-Steindel, CERN/IT, 15 June 2004

Experience from the last 4 months of Data Challenges and currently running experiments (I)

A few issues:

1. Disk server instabilities caused problems (inefficiencies, lack of CASTOR error recovery) and a shortage of available disk space
   → server instability understood and fixed, now more or less back to normal running conditions
   → redundancy/error-recovery improvements in the new stager

2. Overload of the nameserver due to experiment-specific queries
   -- large directory listings during the LHCb production
   -- use of the nameserver as an experiment production control system (regular, large status queries)
   → for the future, one nameserver per experiment
   → throttling/priority mechanisms for Oracle queries (not easy)
   → better collaboration between the experiments and IT

Experience from the last 4 months of Data Challenges and currently running experiments (II)

A few issues:

3. Connections from/to disk servers kept open for long times by a large number of batch jobs
   → using the lxbatch local disks as an extra cache layer improves the situation (ntof), but more investigation is needed to understand the full implications (user perception?!)

4. Large numbers of single-file requests from tape, which lead to tape drive inefficiencies (mount/unmount time is ~120 s)
   → the new CASTOR design improves the aggregation of requests
   → multi-file staging possibilities need to be discussed/organized with the experiments
   (a rough throughput estimate is sketched below)
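To put the mount/unmount cost into perspective, here is a minimal back-of-the-envelope sketch (plain Python, not CASTOR code) of the effective rate a single drive delivers when every request triggers its own mount cycle versus when requests are aggregated. It uses the ~120 s mount/unmount time quoted above and the 30 MB/s 9940B rate quoted on slide (IV); the 1 GB file size and the aggregation counts are illustrative assumptions.

```python
# Illustrative sketch (not CASTOR code): effective tape throughput when each
# request pays a mount/unmount cycle, versus aggregating N files per mount.
# Figures from the slides: ~120 s mount+unmount, 30 MB/s 9940B streaming rate;
# the 1 GB file size and the aggregation counts are assumptions.

MOUNT_OVERHEAD_S = 120.0   # mount + unmount time per tape access
DRIVE_SPEED_MBS = 30.0     # sustained 9940B transfer rate

def effective_rate_mbs(file_size_mb: float, files_per_mount: int) -> float:
    """Average MB/s delivered by one mount that serves N files back to back."""
    data_mb = file_size_mb * files_per_mount
    total_s = MOUNT_OVERHEAD_S + data_mb / DRIVE_SPEED_MBS
    return data_mb / total_s

for n in (1, 10, 100):
    rate = effective_rate_mbs(1000.0, n)   # assume 1 GB files
    print(f"{n:3d} file(s) per mount: {rate:5.1f} MB/s "
          f"({100.0 * rate / DRIVE_SPEED_MBS:.0f}% of the drive's rate)")
```

Aggregating even a few tens of files per mount recovers most of the drive's nominal rate, which is the motivation for the multi-file staging discussion above.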

Experience from the last 4 months of Data Challenges and currently running experiments (III)

5. Large numbers of small files

a) Limits in the current stager: only one million files can be kept on disk (there is no tape limit), but performance already becomes very slow at 500 K files due to the catalogue structure
   - this was seen with CMS and partly with ATLAS
   - it was a major problem for ALICE: production phase 1 produced more than 2 million files (sizes between 50 KB and 1 GB) and needed 2 additional stager nodes
   → the limit will disappear in the new stager, which is currently being tested (to be used also in the ALICE-IT computing DC)

Experience from the last 4 months of Data Challenges and currently running experiments (IV)

Major problem

b) Large numbers of small files create large overheads and thus inefficiencies.
   Example: 1 MB files versus 1 GB files (ALICE wants to import 7 million 1 MB files in phase II of their DC).
   The major point is the tape system, as tape drives are designed to run most efficiently with large files: more than 1 s of real time is spent per file on tape (tape mark), so a 1 MB file means < 1 MB/s transfer speed, while a 9940B tape drive runs at 30 MB/s
   → the inefficiency is > 97 %
   (for 2008 we have foreseen ~200 drives for 4 GB/s tape performance, assuming sequential read/write with GB-sized files; in the extreme case of 1 MB files we would need several thousand drives)
   The arithmetic is sketched below.
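A short sketch of the arithmetic behind these numbers, assuming exactly 1 s of tape mark per file (the slide says "more than 1 s", so this is a best case) and the 30 MB/s 9940B streaming rate. Note that the ~200-drive figure foreseen for 2008 includes overheads beyond this simple per-file model, so the drive counts are only indicative.

```python
# Back-of-the-envelope check of the tape efficiency figures on this slide.
# Assumptions: exactly 1 s tape mark per file, 30 MB/s 9940B streaming rate,
# 4 GB/s aggregate tape target for 2008 (all quoted on the slide).

TAPE_MARK_S = 1.0        # real time spent per file on tape (tape mark)
DRIVE_SPEED_MBS = 30.0   # 9940B streaming rate
TARGET_MBS = 4000.0      # foreseen 2008 aggregate tape performance (4 GB/s)

def drive_efficiency(file_size_mb: float) -> float:
    """Fraction of the streaming rate actually achieved for a given file size."""
    transfer_s = file_size_mb / DRIVE_SPEED_MBS
    return transfer_s / (transfer_s + TAPE_MARK_S)

for size_mb in (1.0, 1000.0):
    eff = drive_efficiency(size_mb)
    per_drive = eff * DRIVE_SPEED_MBS
    drives = TARGET_MBS / per_drive
    print(f"{size_mb:6.0f} MB files: {per_drive:5.2f} MB/s per drive, "
          f"efficiency {eff:6.1%}, ~{drives:.0f} drives for 4 GB/s")
```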

Experience from the last 4 months of Data Challenges and currently running experiments (V)

Major problem

Other effects:
→ open+close of a remote connection (e.g. RFIO) takes 0.1 s (i.e. at most 10 MB/s for 1 MB files)
→ more streams per server and disk are needed to reach the aggregate performance; e.g. in the current calculations for the large CDR disk buffer for 2008, from the pure performance point of view (GB/s divided by performance per 1 TB of disk) we need about 50 TB of disk space, but once the number of streams is included we need 500 TB (already with large files…)
→ more streams per disk move the performance from sequential to random I/O = a factor 10 performance reduction
→ RLS insertion takes 3 s per file (from the CMS DC)
→ response time for large numbers of files in a directory
→ OS caching issues
→ security checks per file (GridFTP >= 1 s)
→ large buffers needed for effective WAN transfers (> 100 MB ??)
→ ……
A rough per-file overhead budget is sketched below.
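Purely for illustration, the sketch below adds up the per-file costs quoted on this slide and shows the throughput ceiling they alone would impose per stream. Which of these overheads actually applies depends on the data path (RFIO, RLS, GridFTP, tape), so treat the total, and the file sizes, as assumptions.

```python
# Illustrative per-file overhead budget from the figures on this slide
# (0.1 s RFIO open+close, 3 s RLS insertion, >= 1 s GridFTP security check,
# ~1 s tape mark).  Which overheads apply depends on the actual data path;
# the file sizes below are assumptions.

PER_FILE_OVERHEADS_S = {
    "RFIO open+close": 0.1,
    "RLS insertion": 3.0,
    "GridFTP security check": 1.0,
    "tape mark": 1.0,
}

def ceiling_mbs(file_size_mb: float) -> float:
    """Per-stream throughput ceiling if a file pays every listed overhead once
    (ignores the actual transfer time, so this is an upper bound)."""
    return file_size_mb / sum(PER_FILE_OVERHEADS_S.values())

for size_mb in (1, 100, 1000):
    print(f"{size_mb:5d} MB file: at most {ceiling_mbs(size_mb):7.1f} MB/s "
          f"per stream from per-file overheads alone")
```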

Three possible solutions

1. Bigger files in the experiments (RAW, ESD/DST, AOD, NTUPLE !?): > 1 GB now AND moving to > 10 GB in the future

2. Start an intensive investigation of the overhead of small files everywhere (HSM, OS, transfers, protocols, filesystems, disks, tapes, etc.): what is the effect? how big is it? how does it scale? how can the limits be avoided/overcome? manpower and money?

3. See whether we can find an optimal compromise between the two (for the AOD/NTUPLE part?): mass storage versus a shared filesystem

(A sketch of the file sizes option 1 implies for the tape drives follows below.)
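As a minimal sketch of what option 1 buys on the tape side, the snippet below inverts the tape-mark model from slide (IV) (assuming exactly 1 s per file and the 30 MB/s 9940B rate) to get the smallest file size that keeps a drive at a given efficiency; the efficiency targets themselves are assumptions.

```python
# Minimal sketch for option 1: with ~1 s of tape mark per file and the 30 MB/s
# 9940B rate from slide (IV), how large must a file be to keep the drive at a
# given efficiency?  The efficiency targets are assumptions for illustration.

TAPE_MARK_S = 1.0
DRIVE_SPEED_MBS = 30.0

def min_file_size_mb(target_efficiency: float) -> float:
    """Smallest file size (MB) keeping the drive at the target efficiency.
    Derived from efficiency = (S / speed) / (S / speed + mark), solved for S."""
    return DRIVE_SPEED_MBS * TAPE_MARK_S * target_efficiency / (1.0 - target_efficiency)

for eff in (0.90, 0.97, 0.99):
    print(f"{eff:.0%} drive efficiency needs files of >= "
          f"{min_file_size_mb(eff) / 1000.0:.2f} GB")
```

Under this simple model, files of roughly 1 GB already bring a drive close to full streaming efficiency, and files of 10 GB make the tape-mark cost negligible, which matches the file sizes proposed in option 1.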