Challenging the challenge: Handling data in the Gigabit/s range
R. Divià, CERN/ALICE
24 March 2003, CHEP03, La Jolla
ALICE Data Acquisition architecture (diagram): detectors (Inner Tracking System, Time Projection Chamber, Muon, Photon Spectrometer, Particle Identification) with their Front-End Electronics; Trigger Detectors feeding Trigger Levels 0, 1 and 2, which deliver trigger data and trigger decisions; Detector Data Links and Readout Receiver Cards into the Local Data Concentrators; an Event Building switch (3 GB/s) to the Global Data Collectors and the Event Destination Manager; a Storage switch (1.25 GB/s) to Permanent Data Storage.
ALICE running parameters
- Two different running modes:
  - Heavy Ion (HI): 10^6 seconds/year
  - Proton: 10^7 seconds/year
- One Data Acquisition system (DAQ): the Data Acquisition and Test Environment (DATE)
- Many trigger classes, each providing events at different rates, sizes and sources
- HI data rates: 3 GB/s at event building, 1.25 GB/s and ~1 PB/year to mass storage
- Proton run: ~0.5 PB/year to mass storage
- Staged DAQ installation plan (20% → 30% → 100%):
  - 85 → 300 LDCs, 10 → 40 Global Data Collectors (GDC)
- Different recording options:
  - local/remote disks
  - Permanent Data Storage (PDS): CERN Advanced Storage Manager (CASTOR)
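As a back-of-the-envelope cross-check of the quoted yearly volume (a sketch assuming decimal units, not an official ALICE computation), the sustained rate to mass storage times the Heavy Ion running time gives the right order of magnitude:

    # Rough check of the HI yearly volume to mass storage (decimal units assumed).
    rate_to_storage_gb_s = 1.25   # GB/s through the storage switch
    hi_running_time_s    = 1e6    # Heavy Ion running time per year [s]

    volume_pb = rate_to_storage_gb_s * hi_running_time_s / 1e6
    print(f"Heavy Ion: ~{volume_pb:.2f} PB/year to mass storage")   # ~1.25 PB, i.e. ~1 PB/year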
History of ALICE Data Challenges
- Started in 1998 to put together a high-bandwidth DAQ/recording chain
- Continued as a periodic activity to:
  - validate the interoperability of all existing components
  - assess and validate developments, trends and options (commercial products, in-house developments)
  - provide guidelines for ALICE & IT development and installation
- Continuously expanding up to the ALICE requirements at LHC startup
Performance goals (chart, in MBytes/s)
Data volume goals (chart, in TBytes to Mass Storage)
The ALICE Data Challenge IV
Components & modes (diagram): an LDC emulator feeds the ALICE DAQ; an Objectifier turns raw events into raw data objects; a CASTOR front-end sends the data to CASTOR and on to the PDS; AFFAIR and a CASTOR monitor provide monitoring; traffic runs over a private network and the CERN backbone.
Targets
- DAQ system scalability tests
- Single peer-to-peer tests:
  - evaluate the behavior of the DAQ system components with the available hardware
  - preliminary tuning
- Multiple LDC/GDC tests:
  - add the full Data Acquisition (DAQ) functionality
  - verify the objectification process
  - validate & benchmark the CASTOR interface
- Evaluate the performance of new hardware components:
  - new generation of tapes
  - 10 Gb Ethernet
- Achieve a stable production period:
  - minimum 200 MB/s sustained
  - 7 days non-stop
  - 200 TB of data to PDS
Software components
- Configurable LDC Emulator (COLE)
- Data Acquisition and Test Environment (DATE) 4.2
- A Fine Fabric and Application Information Recorder (AFFAIR) v1
- ALICE Mock Data Challenge objectifier (ALIMDC)
- ROOT (Object-Oriented Data Analysis Framework) v3.03
- Permanent Data Storage (PDS): CASTOR v1.4.1.7
- Linux Red Hat 7.2, kernels 2.2 and 2.4
  - physical pinned-memory driver (PHYSMEM)
  - standard TCP/IP library
Hardware setup
- ALICE DAQ: infrastructure & benchmarking
  - NFS & DAQ servers
  - SMP HP Netserver (4 CPUs): setup & benchmarking
- LCG testbed (lxshare): setup & production
  - 78 CPU servers on Gigabit Ethernet (GE): dual ~1 GHz Pentium III, 512 MB RAM, Linux kernels 2.2 and 2.4, NFS (installation, distribution) and AFS (unused)
  - 8 to 30 IDE-based disk servers on GE
- Mixed FE/GE with GE trunks, private network & CERN backbone
  - 2 × Extreme Networks Summit 7i switches (32 GE ports)
  - 12 × 3COM 4900 switches (16 GE ports)
  - CERN backbone: Enterasys SSR8600 routers (28 GE ports)
- PDS: 16 × 9940B tape drives in two different buildings
  - STK linear tapes, 30 MB/s, 200 GB/cartridge
Networking (diagram): LDCs & GDCs and disk servers on the satellite switches, connected through GE trunks to the backbone (4 Gbps); 16 tape servers distributed on the backbone; 6 CPU servers on Fast Ethernet.
Scalability test
- Put together as many hosts as possible to verify the scalability of:
  - run control
  - state machines (see the sketch below)
  - control and data channels
  - DAQ services
  - system services
  - hardware infrastructure
- Connect/control/disconnect plus simple data transfers
- Data patterns, payloads and throughputs uninteresting
- Keywords: usable, reliable, scalable, responsive
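DATE's actual run control is not shown in the slides; the following is a purely illustrative Python sketch of the kind of per-host state machine such a scalability test exercises (the states, commands and host names are assumptions, not DATE's real design):

    # Minimal run-control state machine sketch (illustrative only; not DATE's implementation).
    # Each controlled host follows the same transitions, driven by one central run control.
    TRANSITIONS = {
        ("DISCONNECTED", "connect"):    "CONNECTED",
        ("CONNECTED",    "configure"):  "CONFIGURED",
        ("CONFIGURED",   "start_run"):  "RUNNING",
        ("RUNNING",      "stop_run"):   "CONFIGURED",
        ("CONFIGURED",   "disconnect"): "DISCONNECTED",
    }

    class HostStateMachine:
        def __init__(self, name):
            self.name = name
            self.state = "DISCONNECTED"

        def handle(self, command):
            key = (self.state, command)
            if key not in TRANSITIONS:
                raise RuntimeError(f"{self.name}: '{command}' not allowed in {self.state}")
            self.state = TRANSITIONS[key]
            return self.state

    # A scalability test then amounts to driving many such machines through
    # connect/configure/start/stop/disconnect cycles and checking they stay in step.
    hosts = [HostStateMachine(f"host{i:02d}") for i in range(80)]
    for cmd in ("connect", "configure", "start_run", "stop_run", "disconnect"):
        assert len({h.handle(cmd) for h in hosts}) == 1   # all hosts reach the same state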
Scalability test (results chart)
Single peer-to-peer
- Compare:
  - architectures
  - network configurations
  - system and DAQ parameters
- Exercise:
  - DAQ system network modules
  - DAQ system clients and daemons
  - Linux system calls, system libraries and network drivers
- Benchmark and tune (see the throughput sketch below):
  - Linux parameters
  - DAQ processes, libraries and network components
  - DAQ data flow
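The benchmark code itself is not part of the slides; a minimal sketch of a peer-to-peer TCP throughput measurement of the kind used for such tuning could look as follows (port, chunk and transfer sizes are arbitrary placeholders, not the DATE benchmark):

    # Minimal peer-to-peer TCP throughput sketch (illustrative; not the DATE benchmark).
    # Run with no argument on the receiving node, then with the receiver's host name on the sender.
    import socket, sys, time

    PORT, CHUNK, TOTAL = 5001, 1 << 20, 1 << 30   # 1 MB chunks, 1 GB per measurement

    def receiver():
        srv = socket.socket()
        srv.bind(("", PORT)); srv.listen(1)
        conn, _ = srv.accept()
        received = 0
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            received += len(data)
        print(f"received {received / 1e6:.0f} MB")

    def sender(host):
        sock = socket.socket()
        sock.connect((host, PORT))
        payload = b"\0" * CHUNK
        start, sent = time.time(), 0
        while sent < TOTAL:
            sock.sendall(payload)
            sent += CHUNK
        sock.close()
        print(f"throughput: {sent / (time.time() - start) / 1e6:.1f} MB/s")

    if __name__ == "__main__":
        receiver() if len(sys.argv) == 1 else sender(sys.argv[1])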
Single peer-to-peer (results chart)
Full test runtime options
- Different trigger classes for different traffic patterns
- Several recording options:
  - NULL
  - GDC disk
  - CASTOR disks
  - CASTOR tapes
- Raw data vs. ROOT objects
- We concentrated on two major traffic patterns (sketched below):
  - flat traffic: all LDCs send the same event
  - ALICE-like traffic: a periodic sequence of different events, distributed according to the forecasted ALICE raw data
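A minimal sketch of how the two traffic patterns differ; the event sizes and trigger classes below are made-up placeholders, not the real forecasted ALICE raw-data distribution:

    # Sketch of the two traffic patterns (sizes in MB are placeholders, not ALICE forecasts).
    import itertools, random

    N_LDCS = 40

    def flat_traffic():
        """Flat traffic: every LDC contributes the same fragment to every event."""
        while True:
            yield {ldc: 1.0 for ldc in range(N_LDCS)}          # 1 MB/event/LDC

    def alice_like_traffic():
        """ALICE-like traffic: a periodic sequence of different trigger classes,
        each with its own set of contributing LDCs and fragment sizes."""
        trigger_classes = [
            {"ldcs": range(0, 10),     "size_mb": 8.0},        # e.g. TPC-dominated events
            {"ldcs": range(10, 25),    "size_mb": 0.5},        # smaller detectors only
            {"ldcs": range(0, N_LDCS), "size_mb": 2.0},        # full readout
        ]
        for cls in itertools.cycle(trigger_classes):
            yield {ldc: random.gauss(cls["size_mb"], 0.1 * cls["size_mb"])
                   for ldc in cls["ldcs"]}

    # Example: total event sizes for one flat event and the first few ALICE-like events
    print(f"flat event: {sum(next(flat_traffic()).values()):.1f} MB")
    for event in itertools.islice(alice_like_traffic(), 3):
        print(f"ALICE-like event: {sum(event.values()):.1f} MB from {len(event)} LDCs")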
Performance goals (chart, in MBytes/s; 650 MB/s marked)
Flat data traffic
- 40 LDCs × 38 GDCs
- 1 MB/event/LDC, NULL recording
- Occupancies:
  - LDCs: 75%
  - GDCs: 50%
- Critical item: load balancing over the GE trunks (2/3 of nominal)
Load distribution on trunks (chart): throughput in MB/s (100 to 500) vs. number of LDCs (1 to 7), comparing LDCs distributed over several switches with LDCs on the same switch.
ALICE-like traffic
- LDCs:
  - rather realistic simulation
  - partitioned into detectors
  - no hardware trigger
  - simulated readout, no "real" input channels
- GDCs acting as (see the event-building sketch below):
  - event builder
  - CASTOR front-end
- Data traffic:
  - realistic event sizes and trigger classes
  - partial detector readout
- Networking & node distribution scaled down & adapted
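The GDC role is described only at block-diagram level; the following schematic event builder illustrates the principle of collecting one sub-event per contributing LDC before handing the complete event to recording (class and function names are illustrative, not the DATE implementation):

    # Schematic event builder (illustrative; not the DATE GDC code).
    # Sub-events arriving from the LDCs are grouped by event number; once every
    # expected LDC has contributed, the full event is passed on to recording.
    from collections import defaultdict

    class EventBuilder:
        def __init__(self, expected_ldcs, record):
            self.expected = set(expected_ldcs)
            self.record = record                 # e.g. a CASTOR front-end writer
            self.partial = defaultdict(dict)     # event number -> {ldc: payload}

        def add_subevent(self, event_nb, ldc, payload):
            fragments = self.partial[event_nb]
            fragments[ldc] = payload
            if set(fragments) == self.expected:  # event complete: build and ship it
                self.record(event_nb, b"".join(fragments[l] for l in sorted(fragments)))
                del self.partial[event_nb]

    # Usage sketch
    builder = EventBuilder(expected_ldcs=range(3), record=lambda nb, data: print(nb, len(data)))
    for ldc in range(3):
        builder.add_subevent(1, ldc, b"x" * 1024)   # prints "1 3072" once all fragments arrive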
Challenge setup & outcomes
- ~25 LDCs
  - TPC: 10 LDCs
  - other detectors: 1 to 3 LDCs each
- ~50 GDCs
- Each satellite switch: 12 LDCs/GDCs (distributed)
- 8 to 16 (+1) tape servers on the CERN backbone
- 8 to 16 (+1) tape drives, each attached to a tape server
- No objectification:
  - named pipes too slow and too heavy
  - upgraded to avoid named pipes: ALIMDC/CASTOR still not performing well
Impact of traffic pattern (chart): FLAT/CASTOR vs. ALICE/NULL vs. ALICE/CASTOR.
Performance goals (chart, in MBytes/s; 200 MB/s marked)
Production run
- 8 LDCs × 16 GDCs, 1 MB/event/LDC (FLAT traffic)
- 8 to 16 tape servers and tape units
- 7 days at ~300 MB/s sustained, >350 MB/s peak, ~180 TB to tape
- 9 Dec: too much input data
- 10 Dec: hardware failures on tape drives & reconfiguration
- Despite the failures, the performance goals were always exceeded
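A quick arithmetic check of the figures quoted above (a simple sketch, decimal units assumed):

    # Consistency check of the production-run figures.
    sustained_mb_s = 300                 # ~300 MB/s sustained
    duration_s     = 7 * 24 * 3600       # 7 days
    volume_tb      = sustained_mb_s * duration_s / 1e6
    print(f"~{volume_tb:.0f} TB in 7 days")          # ~181 TB, matching the ~180 TB to tape

    # The tape layer had headroom: 16 drives at 30 MB/s each give 480 MB/s aggregate,
    # above the ~300 MB/s sustained and >350 MB/s peak.
    print(16 * 30, "MB/s aggregate tape bandwidth")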
System reliability
- Hosts:
  - ~10% Dead On Installation
  - ~25% Failed On Installation
- Long period of short runs (tuning):
  - occasional problems (recovered) with the name server, the network & the O.S.
  - on average 1 to 2 O.S. failures per week (on 77 hosts)
  - unrecoverable occasional failures of the GE interfaces
- Production run:
  - one tape unit failed and had to be excluded
Outcomes
- DATE:
  - 80 hosts / 160+ roles with one run control
  - excellent reliability and performance
  - scalable and efficient architecture
- Linux:
  - a few hiccups here and there, but rather stable and fast
  - excellent network performance / CPU usage
  - some components are too slow (e.g. named pipes)
  - more reliability needed from the GE interfaces
Outcomes (continued)
- Farm installation and operation: not to be underestimated!
- CASTOR:
  - reliable and effective
  - improvements needed on overloading and on parallelizing tape resources
- Tapes:
  - one DOA and one DOO
- Network: a silent but very effective partner
  - layout made for a farm, not optimized for the ALICE DAQ
- 10 Gb Ethernet tests:
  - failure at first
  - problem "fixed" too late for the Data Challenge
  - reconfiguration transparent to DAQ and CASTOR
Future ALICE Data Challenges
- Continue the planned progression (performance goals chart, in MBytes/s)
- ALICE-like pattern
- Record ROOT objects
- New technologies:
  - CPUs
  - servers
  - network: NICs, infrastructure, beyond 1 GbE
- Insert online algorithms
- Provide some "real" input channels
- Challenge the challenge!
http://bulletin.cern.ch/eng/ebulletin.php