ALICE Data Challenge V P. VANDE VYVRE – CERN/PH LCG PEB - CERN March 2004.

1 ALICE Data Challenge V P. VANDE VYVRE – CERN/PH LCG PEB - CERN March 2004

2 Logical Model and Requirements (LCG PEB, March 2004; P. VANDE VYVRE, CERN-EP)
[Data-flow diagram]
- Trigger labels: Trigger Level 0,1; Trigger Level 2; High-Level Trigger; Decision (three times)
- Data-path labels: Detector; Digitizers; Front-end Pipeline/Buffer; Detector Data Link (DDL); Readout Buffer; Sub-event Buffer; Local Data Concentrators (LDC); Event-Building Network; Event Buffer; Global Data Collectors (GDC); Storage network; Transient Data Storage (TDS); Permanent Data Storage (PDS)
- Bandwidth labels: 25 GB/s, 2.50 GB/s, 1.25 GB/s
- Annotation: Tested during ADC

3 Architecture & Performance Goals (1)
DAQ project:
- System size and scalability:
  - Scale similar to ALICE year 1 (2007 pp and PbPb runs)
  - 30 % of final performance: scalability up to 150 nodes
- System performances:
  - ALICE data traffic: verify optimal usage of computing resource
  - Verify load balancing
- From DAQ to MSS:
  - Tape: 300 MB/s sustained over a week
  - Disk: 450 MB/s peak needed
- Performance monitoring
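For a sense of scale, the tape goal above can be converted into a total data volume. The sketch below is only arithmetic on the quoted 300 MB/s figure. It assumes decimal units (1 TB = 1,000,000 MB) and a full seven-day week without downtime, neither of which is stated on the slide, and the function name is illustrative.

```python
# Data volume implied by a sustained transfer rate.
# Assumptions (not from the slides): decimal units, no downtime.
SECONDS_PER_DAY = 24 * 3600
MB_PER_TB = 1_000_000

def volume_tb(rate_mb_per_s: float, days: float) -> float:
    """Return the volume in TB written at rate_mb_per_s over the given number of days."""
    return rate_mb_per_s * days * SECONDS_PER_DAY / MB_PER_TB

# Tape goal from the slide: 300 MB/s sustained over a week.
week_goal_tb = volume_tb(300, 7)
print(f"{week_goal_tb:.1f} TB")
```

Under these assumptions the weekly goal corresponds to roughly 181 TB on tape.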

4 Architecture & Performance Goals (2)
Offline project:
- Simulated raw data from several detectors (large and small data fragments)
  - Used during ADC V: TPC, ITS
  - Other detectors: dummy data of realistic size
- Different trigger classes and detector sets with realistic multiplicity
- Read data back
- Improve Alimdc (ROOT formatting program) / CASTOR performance
- Algorithms from HLT project used for data monitoring purposes
- Automatic registration of files in the AliEn catalogue for world-wide availability

5 Technology Goals
CPU servers:
- Mostly dual CPUs (LXSHARE)
- SMP machines (HP Netservers) for DAQ services (ALICE)
- IA 64 technology: test DATE code on Itanium
Network:
- New generation of NIC cards (Intel Pro 1000)
- Trunking
- 10 Gbit Eth. backbone, including NICs
Storage:
- Disk servers: 23 new IDE-based disk servers (nominal performance: RFIO @ 90 MB/s)
- Tapes: STK 9940B: ~ 30 MB/s, ~ 200 GB/vol.
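The nominal storage figures above give a lower bound on the hardware needed for the DAQ-to-MSS goals (300 MB/s to tape over a week, 450 MB/s peak to disk). The following is a rough sizing sketch, not the planning method used for ADC V: it assumes every device runs at its nominal rate with perfect parallel efficiency, uses decimal units, and the helper name is made up for illustration.

```python
import math

# Nominal figures quoted on the slides (approximate).
TAPE_DRIVE_MB_S = 30    # STK 9940B throughput per drive
TAPE_VOLUME_GB = 200    # STK 9940B capacity per volume
DISK_SERVER_MB_S = 90   # nominal RFIO rate per IDE-based disk server

def units_needed(target: float, per_unit: float) -> int:
    """Smallest whole number of units whose combined nominal rating covers target."""
    return math.ceil(target / per_unit)

# Assumption: ideal scaling, no overhead, no downtime.
tape_drives = units_needed(300, TAPE_DRIVE_MB_S)        # drives for 300 MB/s to tape
disk_servers = units_needed(450, DISK_SERVER_MB_S)      # servers for 450 MB/s peak to disk
gb_per_week = 300 * 7 * 24 * 3600 // 1000               # 300 MB/s for one week, in GB
tape_volumes = units_needed(gb_per_week, TAPE_VOLUME_GB)

print(tape_drives, disk_servers, tape_volumes)
```

These are minimum counts under ideal conditions; a real setup would need headroom for mounts, retries and failures.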

6 HW Architecture
[Network diagram]
- ~ 80 CPU servers: 2 x 2.4 GHz Xeon, 1 GB RAM, Intel 8254EM Gigabit in PCI-X 133 (Intel PRO/1000), CERN Linux 7.3.3
- 4 x 7 disk servers: 2 x 2.0 GHz Xeon, 1 GB RAM, Intel 82544GC
- 32 IA64 HP-rx2600 servers: 2 x 1 GHz Itanium-2, 2 GB RAM, Broadcom NetXtreme BCM5701 (tg3), RedHat Advanced Workstation 2.1, 6.4 GB/s to memory, 4.0 GB/s to I/O
- Network equipment: 3COM 4900 (shown twice), 16 x Gbit; Enterasys E1 OAS, 12 Gbit, 1 x 10 Gbit; Enterasys ER16, 16 slots, 4/8 x Gbit or 1 x 10 Gbit/slot
- Other diagram labels: 32 x GE; LDCs; GDCs

7 System Setup: CPU servers
Columns: date; CPU servers requested (Cocotime); CPU servers allocated & used (LCG, Openlab); comments.
- March 2003: 30. Not used by ALICE due to an internal review
- April 2003: 150
- Jul. 2003: ~ 80, 5. DAQ + network tests, addition of IA64 nodes, setup perf. mon.
- Aug. 2003: ~ 80, 20. New CPU SEIL servers; network problems
- Sep. 2003: ~ 80, 20. Broadcom NIC replaced by Intel; Enterasys ER16 replaced by N7
- Oct. 2003: 30, ~ 80, 20
- Nov. 2003: 150, ~ 80, 20
- Dec. 2003: 64 (70), 20
- Jan. 2004: 64 (70), 20. Production
- Feb. 2004: 64 (70), 20 (+ 15). Production

8 System Setup: Storage
Columns: date; number of disk servers; requested bandwidth to disk (MB/s); measured bandwidth to disk (MB/s); requested bandwidth to tape (MB/s); measured bandwidth to tape (MB/s).
- March 2003: 100
- April 2003: 450, 300
- Oct. 2003: 100, 300
- Nov. 2003: 450, 300
- Dec. 2003: 450, 300
- Jan. 2004: 21, 450, 300
- Feb. 2004: 21, 450, 300

9 ALICE DC: Scalability

10 ALICE DC – DAQ Bw (plot; axis in MBytes/s)

11 Trunking ADC IV: trunk of 3 x Gb Eth (plot: throughput in MB/s, 100 to 500, versus # LDCs, 1 to 7; series: Distributed, Same switch)

12 Trunking ADC V: trunk of 4 x Gb Eth
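As a reference point for the two trunking plots, the raw line rate of a Gigabit Ethernet trunk can be computed directly. This is a sketch under stated assumptions: decimal units (1 Gbit/s = 1000 Mbit/s), no Ethernet, IP or application overhead, and traffic spread evenly over all links. Measured throughput will be lower, and the function name is illustrative.

```python
def trunk_capacity_mb_s(links: int, gbit_per_link: float = 1.0) -> float:
    """Raw aggregate line rate of a trunk in MB/s, ignoring all protocol overhead."""
    return links * gbit_per_link * 1000 / 8

adc4_trunk = trunk_capacity_mb_s(3)  # ADC IV: trunk of 3 x Gb Eth
adc5_trunk = trunk_capacity_mb_s(4)  # ADC V: trunk of 4 x Gb Eth
print(adc4_trunk, adc5_trunk)
```

The raw ceiling is therefore 375 MB/s for the three-link trunk and 500 MB/s for the four-link trunk.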

13 ALICE DC – MSS Bw (1) (plot; axis in MBytes/s). alimdc/rootd/castor bw between 2 nodes: 30 MB/s

14 ALICE DC – MSS Bw (2)

15 ALICE DC – MSS Bw (3)

16 Achievements (1)
- System size
- System scalability (Hw and DATE Sw)
- Performance test with ALICE data traffic
  - ALICE-like traffic: LDCs working in ALICE conditions, with realistic ratio of event rate and sub-event sizes from 1 LDC to another
  - ALICE-like events using simulated data: realistic (sub-)event size on tape (ALICE year 1)
- DATE load-balancing demonstrated and used
- Sustained bw to tape not achieved
  - Peak 350 MB/s
  - Reached production-quality level only last week of test
  - Sustained 280 MB/s over 1 day but with interventions
- IA-64 from Openlab successfully integrated in the ADC V

17 Achievements (2)
- Simulated raw data used for performance test
  - Several detectors
  - Several triggers
- Data read back from CASTOR
  - Data read back and verified
  - Data fully reconstructed
- Alimdc/CASTOR bw: from 3 to 10 MB/s per data stream
- Algorithms from HLT successfully integrated
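The per-stream Alimdc/CASTOR figure suggests how many parallel data streams the 300 MB/s tape goal would require. The estimate below is a minimal sketch that assumes aggregate bandwidth scales linearly with the number of streams, which the slides do not claim; the function name is hypothetical.

```python
import math

def streams_needed(target_mb_s: float, per_stream_mb_s: float) -> int:
    """Minimum number of parallel streams to reach target_mb_s, assuming linear scaling."""
    return math.ceil(target_mb_s / per_stream_mb_s)

# Per-stream range quoted on the slide: 3 to 10 MB/s.
best_case = streams_needed(300, 10)   # every stream at 10 MB/s
worst_case = streams_needed(300, 3)   # every stream at 3 MB/s
print(best_case, worst_case)
```

Under that assumption the goal needs between 30 and 100 concurrent streams, depending on where in the quoted range each stream runs.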

18 Hardware components
Network:
- LDCs and GDCs: stable and scalable, including trunking
- Between GDCs and disk servers: unreliable
  - Trunking not scaling as expected
  - Module broken and replaced twice in Enterasys router
  - Network either seriously degraded or completely unusable
- 10 Gbit Eth. backbone
- New generation of NIC cards (Intel Pro 1000)
  - NIC from Broadcom unreliable; replaced by Intel Pro 1000
- Several CPU servers unusable (~3 out of 70)
Storage:
- Hardware problems on the disk servers (unrecovered hard disk failures)
- Unfortunate reaction from CASTOR concentrating requests to the faulty machine
- Several last-minute workarounds needed (scripts for monitoring and reconfiguring)

19 Open issues and future goals
CASTOR:
- Unsupervised recovery from malfunctioning disk server
- New stager
- Special daemon should be put back in main development
  - Used instead of standard RFIO daemon to achieve adequate performance
  - New xrootd daemon from BaBar
DAQ:
- Increase performance
- Improve performance monitoring package (AFFAIR)
Offline:
- Realistic data for more detectors
- More remote sites accessing the raw data (monitoring and prompt reconstruction)
- Data streaming per trigger or detector
- Run HLT inline in alimdc and no longer semi-realtime
Network:
- First generation of 10 Gig cards from Enterasys unreliable
- No indication of hardware failure
- Enterasys support took a long time to resolve the problem

20 ALICE DC – DAQ Bw revised (plot; axis in MBytes/s)

21 Conclusions
- Computing Data Challenge is still the best tool for: exercising the fabric, demonstrating the software, verifying interfaces
- ADC V: lots of achievements but…
  - 1 major performance milestone missed
  - Trouble with the network due to the Enterasys equipment under beta test
- A lot of work and milestones in front of us
- Next Computing ADC:
  - 50% more on performance milestones
  - Simulated raw data from all major detectors
  - Preparatory work needed to test each component independently before their integration

22 Postscript
- A lot of people from IT and ALICE have spent quite some time and hard work on this DC
- DCs are and will remain manpower-intensive exercises, as LHC computing will be
- Excellent collaboration between all groups and projects involved
- Regular meetings:
  - Very constructive attitude
  - Informal but extremely efficient atmosphere
- Thanks to all for their enthusiastic and effective contributions!

