1  ALICE data flow, T0 buffer system, performance issues
   Bernd Panzer-Steindel, CERN/IT
   27 March 2006

2  Considerations

Two scenarios are considered: stable running, and the first 2 years of 'consolidation'.

Activities:
   - Data recording
     -- copy from the DAQ
     -- copy to tape
   - Data export
     -- copy to and from the different Tiers
   - Data processing
     -- calibration, using calibration streams and selected RAW data
     -- processing of RAW data and ESD production
     -- ESD data analysis
     -- end-user analysis

Facilities: T0 and the CAF (CERN Analysis Facility). T0 and CAF are logical concepts; physical differences constrain flexible resource re-allocation.

3  ALICE generic data flow scenario

[Diagram: RAW data flows from the DAQ disk buffer into a T0 disk buffer, and from there to tape storage, the reconstruction farm, and Tier 1 data export. The reconstruction farm writes ESD back to the buffer; CAL and USER data are exchanged with the calibration farm and the analysis farm. Open question: file aggregation layers?]

4  ALICE data flow scenario (buffers split by activity)

[Diagram: the same flow as slide 3, but with the single T0 buffer split into activity-specific pools: Disk Buffer 1 (RAW, fed by the DAQ disk buffer), Disk Buffer 2 (CAL/ESD for the calibration and analysis farms), Disk Buffer 3 (ESD), and a User Disk Buffer, connecting tape storage, the reconstruction farm, and Tier 1 data export. Open question: file aggregation layers?]

5  Boundary conditions for data flow design

   - Network topology and connectivity
   - Node IO performance
     -- network
     -- disk controller
     -- disks
     -- file system
     -- OS
   - Storage management system
   - Application scheduling system
   - Experiment data management software
   - Cost

6  Network boundary conditions

Current network: a Force10 router with 10 Gbit uplinks to HP 3400 switches; CPU nodes, disk servers, and tape servers attach to the switches at 1 Gbit.

Blocking factors:
   - 2.4 for the disk servers = 24 servers share one 10 Gbit uplink = 50 MB/s per server
   - 2.0 for the tape servers = 20 servers share one 10 Gbit uplink = 60 MB/s per server
   - 19.2 for the CPU servers = 192 servers share one 10 Gbit uplink = 6 MB/s per server
   - plus ~5% of CPU nodes with a blocking factor of 2.4

No magic here! The throughput figures can be changed (more uplinks per switch, different daisy-chaining numbers, more switches); it is a money issue, not an architecture issue.
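
The per-server figures follow directly from dividing the shared uplink; a minimal sketch of that arithmetic, assuming 1 Gbit NICs on a shared 10 Gbit uplink (the slide rounds the results down):

```python
# Blocking-factor arithmetic from the slide, as a sketch.
# Assumes every server has a 1 Gbit NIC and all servers on a switch
# share one 10 Gbit uplink equally; the slide quotes rounded values.

UPLINK_GBIT = 10.0
NIC_GBIT = 1.0

def per_server_mb_s(servers_per_uplink: int) -> float:
    """Bandwidth each server gets when the uplink is shared equally."""
    return UPLINK_GBIT / servers_per_uplink * 1000 / 8  # Gbit/s -> MB/s

for name, n in [("disk server", 24), ("tape server", 20), ("CPU server", 192)]:
    blocking = n * NIC_GBIT / UPLINK_GBIT
    print(f"{name}: blocking factor {blocking:.1f}, "
          f"~{per_server_mb_s(n):.0f} MB/s per server")
```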

7  Some ALICE numbers and calculations (I)

CPU resource growth rate, installed capacity (T0 + CAF): ... MSI2000 per year

   - today one CPU = 1 KSI2000; the move to multi-core gives 1.5 KSI2000/core
   - assume 4-way nodes in 2007 and 8-way nodes in 2008 (cores per node)
   - jump in technology during 2006, later a 'normal' growth rate (SI2000 per core and year)
   - 8.8 MSI2000 in 2008 == ~1300 nodes == ~6000 cores (mixture of 2-way, 4-way and 8-way)

Per-core IO rates:
   - Reconstruction pp:   event size = 1.0 MB,  proc effort = ... SI2000/ev -> 300 KB/s IO per core
   - Chaotic analysis pp: event size = 0.04 MB, proc effort = 500 SI2000/ev -> 120 KB/s IO per core
   - Reconstruction HI:   event size = 12.5 MB, proc effort = ... SI2000/ev -> 280 KB/s IO per core
   - Chaotic analysis HI: event size = 2.5 MB,  proc effort = ... SI2000/ev -> 500 KB/s IO per core

Parameters for calibration are not clear (5% of the CPU resources, but no IO values available).
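
The per-core IO rates follow from IO = event size x (core power / effort per event). The chaotic pp analysis line, whose effort value survived transcription, checks out against this relation (with the 1.5 KSI2000/core assumed above):

```python
# Sketch of the per-core IO relation used on this slide:
#   IO_per_core = event_size * (core_power / effort_per_event)
CORE_SI2000 = 1500  # 1.5 KSI2000 per core, the slide's assumption

def io_per_core_kb_s(event_mb: float, effort_si2000_per_ev: float) -> float:
    events_per_sec = CORE_SI2000 / effort_si2000_per_ev
    return event_mb * 1000 * events_per_sec  # MB/event -> KB/s

# chaotic pp analysis: 0.04 MB/event at 500 SI2000/event
print(io_per_core_kb_s(0.04, 500))  # 120.0 KB/s, matching the slide
```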

8  Some ALICE numbers and calculations (II)

Disk space resource growth rate, installed capacity (T0 + CAF): ... TB per year

   - today one disk is ~400 GB and a disk server has 22 disks == ~5 TB RAID5, distributed over three file systems
   - extrapolating to 2008: ~900 TB == ~180 servers with ~540 file systems and ~4000 disks (~3000 'active', the rest parity/spare)
   - 6000 jobs with one input and 1-2 output streams accessing 180 servers with 540 file systems gives roughly 70-100 streams per server and 20-35 streams per file system (see the sketch below)
   - the 'overload' figures depend on the access model (slide 9)
   - 3000 cores running RAW reconstruction need an aggregate of 'only' ~1 GB/s

Open questions: what is the calibration doing? A much larger calibration effort in the beginning!? What are the performance deviations during the first 2 years?
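
A back-of-the-envelope check of the stream counts and the aggregate reconstruction bandwidth quoted above:

```python
# Stream counts: 6000 jobs, each with 1 input and 1-2 output streams,
# spread over 180 disk servers / 540 file systems.
jobs, servers, filesystems = 6000, 180, 540

for out_streams in (1, 2):
    total = jobs * (1 + out_streams)
    print(f"{total} streams: ~{total / servers:.0f} per server, "
          f"~{total / filesystems:.0f} per file system")

# Aggregate RAW reconstruction IO: 3000 cores at 300 KB/s each (slide 7).
print(f"reconstruction aggregate: {3000 * 300 / 1e6:.1f} GB/s")
```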

9  Disk server and file system performance

   - overall network speed: ... MB/s
   - read streams share the performance equally
   - write has a strong preference over read streams
   - sharing between high-speed streams and many low-speed streams is complicated
   - tape streams need a guaranteed >= 60 MB/s
   - ...
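
A toy model (an assumed policy, not a measurement) of why mixing stream types is complicated: if writes are served with strict preference, the readers split whatever bandwidth remains.

```python
# Toy bandwidth-sharing model: writers first, readers share the remainder.
# The 60 MB/s writer rate mirrors the guaranteed tape-stream figure above;
# the 120 MB/s server total is an assumed example value.
def share(total_mb_s: float, writers: int, readers: int,
          write_mb_s: float = 60.0):
    used = min(total_mb_s, writers * write_mb_s)
    per_reader = (total_mb_s - used) / readers if readers else 0.0
    return used, per_reader

used, per_reader = share(total_mb_s=120, writers=1, readers=30)
print(f"writer gets {used:.0f} MB/s, each of 30 readers ~{per_reader:.1f} MB/s")
```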

10  Interfaces for disk pool access

4 different data transfer activities:
   1. DAQ buffer to the T0 buffer   -> RFIO (rfcp)
   2. T0 buffer to the tape drives  -> RFIO (rtcopy)
   3. T0 buffer to the T1 sites     -> gridFTP, SRM, FTS
   4. T0 buffer to the CPU nodes    -> xrootd

Point 4 actually depends on the workflow model:
   -- data files are accessed (opened) by the application directly on a disk server, or
   -- data files are copied from the disk server to the local disk of the worker node

[Diagram: an application on a CPU server reaches files on a disk server via rootd, xrootd, or rfio.]
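
The choice in point 4 trades a long-lived slow stream against one burst copy per job; a rough sketch with illustrative numbers (file size and job length are assumptions; the rates come from the earlier slides):

```python
# Rough trade-off behind the two workflow models (illustrative numbers only).
file_mb = 2000       # assumed input file size per job
job_hours = 2.0      # assumed job duration
direct_kb_s = 300    # sustained read rate per core (slide 7)
burst_mb_s = 50      # disk server share of the uplink (slide 6)

print(f"stage-in: one {file_mb / burst_mb_s:.0f} s burst at {burst_mb_s} MB/s, "
      f"then local reads only")
print(f"direct:   a {direct_kb_s} KB/s stream held open on the disk server "
      f"for {job_hours * 3600:.0f} s")
```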

11  Disk pools

A disk pool is the base unit for providing policies and performance values; only one data management software can have control of a pool.

Characterization of application activities:
   - number of concurrent physical streams
   - mixture ratio of read and write operations (+ seeks)
   - number of different users
   - priority schemes between users
   - aggregate tape writing performance
   - guaranteed total throughput
   - guaranteed performance per stream
   - file size distribution
   - access pattern: sequential versus random access
   - data replication policies and their performance implications

Trade-offs: complexity of priority and policy schemes versus performance; overheads due to extra data copies. A sketch of such a characterization follows below.
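
As an illustration only, these parameters could be captured in a small structure used to map each activity onto a pool policy; the field names here are invented, not taken from CASTOR or any other CERN software.

```python
from dataclasses import dataclass

@dataclass
class ActivityProfile:
    """Hypothetical characterization record for one application activity."""
    name: str
    concurrent_streams: int
    read_write_ratio: float        # reads per write, seeks folded in
    users: int
    guaranteed_total_mb_s: float   # aggregate throughput to guarantee
    guaranteed_stream_mb_s: float  # per-stream guarantee
    sequential: bool               # sequential versus random access

# Example: tape writing from the T0 buffer, i.e. few fast sequential
# streams at the >= 60 MB/s per stream quoted on slide 9.
tape_feed = ActivityProfile("tape writing", concurrent_streams=20,
                            read_write_ratio=0.0, users=1,
                            guaranteed_total_mb_s=1200,
                            guaranteed_stream_mb_s=60, sequential=True)
print(tape_feed)
```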

12  ALICE data flow scenario (repeated)

[Diagram: repeat of slide 4, the data flow with activity-specific disk buffers, shown again before the conclusions.]

13  Conclusions

   - Disk pool design is complex
     -- access patterns with boundary conditions
     -- flexibility
     -- space is not 'really' an issue
   - Clear separation of activities (and responsibilities)
   - Performance management: disk pool <-> defined activity mapping
   - Disentanglement: reduced complexity for easier understanding/debugging
   - Cost and support issues
   - Merging pools later is easier

