
ALICE data flow, T0 buffer system, performance issues
Bernd Panzer-Steindel, CERN/IT
27 March 2006

Considerations for two scenarios:
- stable running
- first 2 years of 'consolidation'

Activities:
- Data recording
  -- copy from the DAQ
  -- copy to tape
- Data export
  -- copy to and from the different Tiers
- Data processing
  -- calibration using calibration streams and selected RAW data
  -- processing of RAW data and ESD production
  -- ESD data analysis
  -- end-user analysis

Facilities: the T0 and the CAF (CERN Analysis Facility). T0 and CAF are logical concepts; physical differences constrain the flexible re-allocation of resources.

ALICE generic data flow scenario

[Diagram: data flow between the DAQ disk buffer, a T0 disk buffer, tape storage, the reconstruction farm, the calibration farm, the analysis farm and Tier-1 data export, with RAW, ESD, CAL and user data streams. Open question: where are the file aggregation layers?]

ALICE data flow scenario

[Diagram: the same flow, but with the T0 storage split into Disk Buffer 1, Disk Buffer 2, Disk Buffer 3 and a User Disk Buffer handling RAW, ESD, CAL and user data between the DAQ disk buffer, tape storage, the reconstruction farm, the calibration farm, the analysis farm and Tier-1 data export. Open question: where are the file aggregation layers?]

Boundary conditions for data flow design:
- Network topology and connectivity
- Node I/O performance
  -- network
  -- disk controller
  -- disks
  -- file system
  -- OS
- Storage management system
- Application scheduling system
- Experiment data management software
- Cost

Network boundary conditions

Current network: a Force10 router with 10 Gbit uplinks to HP 3400 switches; disk servers, tape servers and CPU nodes attach to the switches at 1 Gbit.

Blocking factors:
- 2.4 for the disk servers: 24 servers share one 10 Gbit uplink, i.e. ~50 MB/s per server
- 2.0 for the tape servers: 20 servers share one 10 Gbit uplink, i.e. ~60 MB/s per server
- 19.2 for the CPU servers: 192 servers share one 10 Gbit uplink, i.e. ~6 MB/s per server
- plus ~5% of the CPU nodes with a blocking factor of 2.4

No magic here! The throughput figures can be changed: more uplinks per switch, different daisy-chaining numbers, more switches. It is a money issue, not an architecture issue.
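A minimal sketch of the blocking-factor arithmetic, assuming 1 Gbit/s per server NIC, one shared 10 Gbit/s uplink per server group, and MB = 2^20 bytes (these assumptions are mine, inferred from the figures above, not stated on the slide):

```python
# Reproduces the per-server throughput figures quoted above.
# Assumptions: 1 Gbit/s server NICs, one shared 10 Gbit/s uplink per group,
# and 1 MB = 2**20 bytes for the final rounding.

UPLINK_BITS = 10e9   # one 10 Gbit uplink
NIC_BITS = 1e9       # 1 Gbit per server

def per_server_mb_s(servers_per_uplink: int) -> float:
    """MB/s available per server when the shared uplink is the bottleneck."""
    share_bits = UPLINK_BITS / servers_per_uplink
    return min(share_bits, NIC_BITS) / 8 / 2**20

for name, n in [("disk server", 24), ("tape server", 20), ("CPU server", 192)]:
    blocking = n * NIC_BITS / UPLINK_BITS
    print(f"{name}: blocking factor {blocking:.1f}, ~{per_server_mb_s(n):.0f} MB/s per server")
```

Running this prints blocking factors of 2.4, 2.0 and 19.2 with roughly 50, 60 and 6 MB/s per server, matching the slide.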

Some ALICE numbers and calculations (I)

CPU resource growth rate, installed capacity (T0 + CAF):
- 2006: 0.65 + 0.39 MSI2000
- 2007: 1.3 + 1.7 MSI2000
- 2008: 3.3 + 5.5 MSI2000

Today one CPU = 1 kSI2000; with the move to multi-core, assume 1.5 kSI2000/core and 4-way nodes in 2007, 8-way nodes in 2008 (cores per node). There is a jump in technology during 2006, later a 'normal' growth rate (SI2000/core and year). 8.8 MSI2000 in 2008 corresponds to ~1300 nodes, i.e. ~6000 cores (a mixture of 2-way, 4-way and 8-way).

Per-core I/O rates:
- Reconstruction pp: event size = 1.0 MB, processing effort = 5400 SI2000/ev → ~300 KB/s per core
- Chaotic analysis pp: event size = 0.04 MB, processing effort = 500 SI2000/ev → ~120 KB/s per core
- Reconstruction HI: event size = 12.5 MB, processing effort = 68000 SI2000/ev → ~280 KB/s per core
- Chaotic analysis HI: event size = 2.5 MB, processing effort = 7500 SI2000/ev → ~500 KB/s per core

Parameters for calibration are not clear (5% of the CPU resources, but no I/O values available).
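The per-core rates follow from event size, per-event processing effort and the assumed 1.5 kSI2000 per core. A minimal sketch of that arithmetic, using only the numbers quoted above:

```python
# Per-core I/O rate = event size * (core power / per-event effort).
# Assumes 1.5 kSI2000 per core, as stated above; sizes in MB, efforts in SI2000/event.

CORE_SI2000 = 1500

workloads = {
    "reconstruction pp":   (1.0,   5400),
    "chaotic analysis pp": (0.04,   500),
    "reconstruction HI":   (12.5, 68000),
    "chaotic analysis HI": (2.5,   7500),
}

for name, (size_mb, si2000_per_ev) in workloads.items():
    events_per_s = CORE_SI2000 / si2000_per_ev   # events one core processes per second
    io_kb_s = events_per_s * size_mb * 1000      # KB/s of event data per core
    print(f"{name:22s} ~{io_kb_s:4.0f} KB/s per core")
```

This gives ~280, 120, 276 and 500 KB/s; the slide quotes ~300 KB/s for pp reconstruction, slightly above this simple estimate.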

Some ALICE numbers and calculations (II)

Disk space resource growth rate, installed capacity (T0 + CAF):
- 2006: 48 + 183 TB
- 2007: 95 + 526 TB
- 2008: 238 + 1447 TB

Today one disk is ~400 GB and a disk server has 22 disks, i.e. ~5 TB of RAID5 distributed over three file systems. Extrapolating to 2008: 1685 TB corresponds to ~180 servers with ~540 file systems and ~4000 disks (3000 'active' + 1000 parity/spare).

6000 jobs with one input and 1-2 output streams access 180 servers with 540 file systems: 70-100 streams per server, 20-30 streams per file system. The 'overload' figures depend on the access model (page 9). 3000 cores running RAW reconstruction need an aggregate of 'only' 1 GB/s.

Open questions: what is the calibration doing? A much larger calibration effort in the beginning?! What are the performance deviations during the first 2 years?
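A minimal sketch of the stream-count and aggregate-bandwidth estimates above, using the job, server and file-system counts from this slide and the ~300 KB/s per reconstruction core from the previous one:

```python
# Rough load estimate for the 2008 disk layer, using the figures quoted above.
JOBS = 6000               # concurrent jobs
STREAMS_PER_JOB = (2, 3)  # one input plus 1-2 output streams
SERVERS = 180
FILESYSTEMS = 540

lo, hi = (JOBS * s for s in STREAMS_PER_JOB)
print(f"streams per server:      {lo // SERVERS}-{hi // SERVERS}")          # ~66-100
print(f"streams per file system: {lo // FILESYSTEMS}-{hi // FILESYSTEMS}")  # ~22-33

# Aggregate throughput of RAW reconstruction: ~300 KB/s per core on 3000 cores.
RECO_CORES = 3000
KB_PER_CORE = 300
print(f"aggregate reconstruction I/O: ~{RECO_CORES * KB_PER_CORE / 1e6:.1f} GB/s")  # ~0.9 GB/s
```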

Disk server, file system performances:
- overall network speed 120 MB/s
- equal performance sharing of read streams
- write has strong preference over read streams
- sharing of high-speed streams and many low-speed streams is complicated
- guaranteed >= 60 MB/s tape streams
- ...

Interfaces for the disk pool access

Four different data transfer activities:
1. DAQ buffer to the T0 buffer → RFIO (rfcp)
2. T0 buffer to the tape drives → RFIO (rtcopy)
3. T0 buffer to the T1 sites → gridFTP, SRM, FTS
4. T0 buffer to the CPU nodes → xrootd

Point 4 actually depends on the work flow model:
- data files are accessed (opened) by the application directly on a disk server, or
- data files are copied from the disk server to the local disk of the worker node

[Diagram: application on a CPU server accessing the disk server via rootd, xrootd or RFIO]
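The two work-flow models for point 4 differ only in where the file is opened. A hedged sketch, assuming a PyROOT environment; the host, pool and file names are hypothetical placeholders:

```python
# Sketch of the two work-flow models for CPU-node access. Names are invented.
import subprocess
import ROOT

remote = "root://diskserver.example.cern.ch//pool/alice/raw/run001.root"  # hypothetical URL

# Model A: open the file directly on the disk server via xrootd;
# every read goes over the network, no local disk space is needed.
f = ROOT.TFile.Open(remote)

# Model B: copy the file to the worker node's local disk first (here with xrdcp),
# then open it locally; the network sees one sequential transfer per file.
subprocess.run(["xrdcp", remote, "/tmp/run001.root"], check=True)
f = ROOT.TFile.Open("/tmp/run001.root")
```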

The disk pool is the base unit to provide policies and performance values; only one data management software can have control of a pool.

Characterization of application activities:
- number of concurrent physical streams
- mixture ratio of read and write operations (+ seeks)
- number of different users
- priority schemes between users
- aggregate tape writing performance
- guaranteed total throughput
- guaranteed performance per stream
- file size distribution
- access pattern, sequential versus random access
- data replication policies, performance implications
- complexity of priority and policy schemes versus performance
- overheads due to extra data copies
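As an illustration only, one way to capture these characterization parameters per pool as a plain data structure; all field names and example values are invented, not an existing CASTOR or xrootd configuration:

```python
# Purely illustrative description of a disk pool and its policy knobs.
from dataclasses import dataclass, field

@dataclass
class DiskPoolPolicy:
    name: str
    owner: str                         # the single data management software in control
    max_concurrent_streams: int        # number of concurrent physical streams
    read_write_ratio: float            # mixture of read vs. write operations
    users: list = field(default_factory=list)
    user_priorities: dict = field(default_factory=dict)
    tape_write_mb_s: int = 0           # aggregate tape writing performance to guarantee
    total_throughput_mb_s: int = 0     # guaranteed total throughput
    per_stream_mb_s: int = 0           # guaranteed performance per stream
    access_pattern: str = "sequential" # sequential versus random access
    replication_factor: int = 1        # data replication policy

# Example: a RAW buffer pool dedicated to DAQ recording and tape migration.
raw_pool = DiskPoolPolicy(
    name="t0-raw-buffer", owner="castor",
    max_concurrent_streams=100, read_write_ratio=0.5,
    users=["daq", "tape-migration"],
    user_priorities={"daq": 1, "tape-migration": 2},
    tape_write_mb_s=60, total_throughput_mb_s=120, per_stream_mb_s=60,
)
```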

[The data flow diagram with the separated disk buffers (Disk Buffers 1-3 and the User Disk Buffer) is shown again.]

- Disk pool design is complex
  -- access patterns with boundary conditions
  -- flexibility
  -- space is not 'really' an issue
- Clear separation of activities (and responsibilities)
- Performance management
- Disk pool → defined activity mapping
  -- disentanglement
  -- reduced complexity for easier understanding/debugging
  -- cost and support issues
  -- merging later is easier