ALICE Data Challenges On the way to recording @ 1 GB/s

What is ALICE

ALICE Data Acquisition architecture
[Dataflow diagram] Detectors (Inner Tracking System, Time Projection Chamber, Particle Identification, Photon Spectrometer, Muon, Trigger Detectors) feed the Front-End Electronics, which receives the trigger decisions of Trigger Levels 0, 1 and 2 and ships data over the Detector Data Link to the Local Data Concentrators (LDC) via Readout Receiver Cards. Sub-events cross the Event Building switch (≤ 3 GB/s) to the Global Data Collectors (GDC), with the Event Destination Manager steering event assignment; built events go through the Storage switch at 1.25 GB/s to Permanent Data Storage. (A minimal event-building sketch follows below.)
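
To make the event-building role of this chain concrete, here is a minimal C++ sketch of the flow just described: LDCs emit sub-events, a destination function plays the part of the Event Destination Manager, and a GDC-side builder assembles full events. All type and function names are illustrative assumptions, not the actual DATE/ALICE interfaces.

```cpp
// Illustrative sketch of the dataflow stages described above; names are hypothetical.
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct SubEvent {                 // produced by one LDC from one Detector Data Link
    uint32_t eventId;             // trigger-assigned event number
    uint16_t ldcId;               // which Local Data Concentrator sent it
    std::vector<uint8_t> payload; // detector raw data
};

struct FullEvent {                // assembled on a Global Data Collector
    uint32_t eventId;
    std::vector<SubEvent> subEvents;
};

// Event Destination Manager role: every sub-event of a given event must reach
// the same GDC, so the destination is a pure function of the event number.
uint16_t destinationGdc(uint32_t eventId, uint16_t nGdcs) {
    return static_cast<uint16_t>(eventId % nGdcs);
}

// Event-building role of a GDC: collect sub-events until all expected LDC
// contributions for an event have arrived, then ship the event to storage.
class EventBuilder {
public:
    explicit EventBuilder(std::size_t expectedLdcs) : expected_(expectedLdcs) {}

    // Returns true and fills 'out' once the last sub-event of an event arrives.
    bool add(SubEvent se, FullEvent& out) {
        auto& ev = pending_[se.eventId];
        ev.eventId = se.eventId;
        ev.subEvents.push_back(std::move(se));
        if (ev.subEvents.size() == expected_) {
            out = std::move(ev);
            pending_.erase(out.eventId);
            return true;          // ready for the Storage switch / CASTOR
        }
        return false;
    }

private:
    std::size_t expected_;
    std::map<uint32_t, FullEvent> pending_;
};
```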

ALICE running parameters
Two different running modes:
  Heavy Ion (HI): 10^6 seconds/year
  Proton: 10^7 seconds/year
One Data Acquisition system (DAQ): Data Acquisition and Test Environment (DATE)
Many trigger classes, each providing events at different rates, sizes and sources
HI data rates: ≤ 3 GB/s into event building, ≤ 1.25 GB/s and ~1 PB/year to mass storage (a quick cross-check follows below)
Proton run: ~0.5 PB/year to mass storage
Staged DAQ installation plan (20% → 30% → 100%): 85 → 300 LDCs, 10 → 40 Global Data Collectors (GDC)
Different recording options:
  Local/remote disks
  Permanent Data Storage (PDS): CERN Advanced Storage Manager (CASTOR)
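
As a back-of-the-envelope cross-check of the yearly volumes quoted above, assuming the mass-storage rate is sustained over the full running time (duty cycle and deadtime ignored):

```cpp
// Quick cross-check of the yearly data volumes; an idealized calculation only.
#include <cstdio>

int main() {
    const double hiSeconds     = 1e6;      // Heavy Ion running: 10^6 s/year
    const double ppSeconds     = 1e7;      // proton running:    10^7 s/year
    const double hiStorageRate = 1.25e9;   // B/s to mass storage during HI

    double hiVolume = hiStorageRate * hiSeconds;               // bytes/year
    std::printf("HI: %.2f PB/year at full rate\n", hiVolume / 1e15);
    // ~1.25 PB at nominal rate, consistent with the quoted ~1 PB/year once the
    // real duty cycle is folded in.

    // Inverting the quoted proton volume gives the implied average rate:
    double ppVolume = 0.5e15;                                   // ~0.5 PB/year
    std::printf("proton: %.0f MB/s average rate implied\n",
                ppVolume / ppSeconds / 1e6);                    // ~50 MB/s
    return 0;
}
```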

History of ALICE Data Challenges
Started in 1998 to put together a high-bandwidth DAQ/recording chain
Continued as a periodic activity to:
  Validate the interoperability of all existing components
  Assess and validate developments, trends and options (commercial products and in-house developments)
  Provide guidelines for ALICE & IT development and installation
  Continuously expand up to the ALICE requirements at LHC startup

Performance goals
[chart: throughput goals, MB/s]

Data volume goals
[chart: TB to mass storage]

The ALICE Data Challenge IV

Components & modes
[Diagram] LDC emulator → ALICE DAQ → Objectifier → CASTOR front-end → CASTOR PDS; data circulates as raw events, raw data and objects over the private network and the CERN backbone, monitored by AFFAIR and the CASTOR monitor.

Targets
DAQ system scalability tests
Single peer-to-peer tests:
  Evaluate the behavior of the DAQ system components with the available HW
  Preliminary tuning
Multiple LDC/GDC tests:
  Add the full Data Acquisition (DAQ) functionality
  Verify the objectification process
  Validate & benchmark the CASTOR I/F
Evaluate the performance of new hardware components:
  New generation of tapes
  10 Gb Ethernet
Achieve a stable production period:
  Minimum 200 MB/s sustained 7 days non-stop
  200 TB of data to PDS

Software components
Configurable LDC Emulator (COLE)
Data Acquisition and Test Environment (DATE) 4.2
A Fine Fabric and Application Information Recorder (AFFAIR) V1
ALICE Mock Data Challenge objectifier (ALIMDC)
ROOT (Object-Oriented Data Analysis Framework) v3.03
Permanent Data Storage (PDS): CASTOR V1.4.1.7
Linux RedHat 7.2, kernel 2.2 and 2.4
Physical pinned memory driver (PHYSMEM)
Standard TCP/IP library

Hardware setup
ALICE DAQ: infrastructure & benchmarking
  NFS & DAQ servers
  SMP HP Netserver (4 CPUs): setup & benchmarking
LCG testbed (lxshare): setup & production
  78 CPU servers on GE: dual ~1 GHz Pentium III, 512 MB RAM, Linux kernel 2.2 and 2.4
  NFS (installation, distribution) and AFS (unused)
  [8..30] disk servers (IDE-based) on GE
Network: mixed FE/GE/trunked GE, private & CERN backbone
  2 × Extreme Networks Summit 7i switches (32 GE ports)
  12 × 3COM 4900 switches (16 GE ports)
  CERN backbone: Enterasys SSR8600 routers (28 GE ports)
PDS: 16 × 9940B tape drives in two different buildings
  STK linear tapes, 30 MB/s, 200 GB/cartridge

Networking
[Diagram] LDCs & GDCs and disk servers connected to the switches over GE trunks (2 or 3 links each); 6 CPU servers on FE; backbone at 4 Gbps; 16 tape servers (distributed).

Scalability test
Put together as many hosts as possible to verify the scalability of:
  the run control and its state machines (a minimal sketch follows below)
  control and data channels
  DAQ services
  system services
  hardware infrastructure
Connect/control/disconnect cycles plus simple data transfers
Data patterns, payloads and throughputs were not of interest here
Keywords: usable, reliable, scalable, responsive
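
The kind of state machine every host must follow through connect/configure/start/stop cycles can be pictured with a minimal sketch like the one below; states, transitions and class names are illustrative only, not the actual DATE run control.

```cpp
// Minimal run-control state machine of the kind exercised by the scalability test.
#include <stdexcept>
#include <string>

enum class RunState { Disconnected, Connected, Configured, Running };

class RunControl {
public:
    RunState state() const { return state_; }

    void connect()   { require(RunState::Disconnected, "connect");   state_ = RunState::Connected;  }
    void configure() { require(RunState::Connected,    "configure"); state_ = RunState::Configured; }
    void startRun()  { require(RunState::Configured,   "startRun");  state_ = RunState::Running;    }
    void stopRun()   { require(RunState::Running,      "stopRun");   state_ = RunState::Configured; }
    void disconnect() {
        if (state_ == RunState::Running)
            throw std::logic_error("stop the run before disconnecting");
        state_ = RunState::Disconnected;
    }

private:
    // Reject any transition attempted from the wrong state.
    void require(RunState expected, const std::string& op) const {
        if (state_ != expected)
            throw std::logic_error("illegal transition: " + op);
    }
    RunState state_ = RunState::Disconnected;
};
```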

Scalability test

Single peer-to-peer
Compare:
  Architectures
  Network configurations
  System and DAQ parameters
Exercise:
  DAQ system network modules
  DAQ system clients and daemons
  Linux system calls, system libraries and network drivers
Benchmark and tune:
  Linux parameters
  DAQ processes, libraries and network components
  DAQ data flow
(A stand-alone throughput probe in the same spirit is sketched below.)
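
For flavour, a stand-alone point-to-point throughput probe (one sender streaming fixed-size "events" over TCP, one receiver reporting MB/s) can be sketched as below. This is an illustration with error handling omitted, not the DATE data-flow processes themselves.

```cpp
// Minimal point-to-point TCP throughput probe.
// Build: g++ -O2 p2p.cc -o p2p ; run "./p2p recv 5555" then "./p2p send <host> 5555".
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

int main(int argc, char** argv) {
    const size_t kEventSize = 1 << 20;          // 1 MB "events", as in the flat-traffic runs
    std::vector<char> buf(kEventSize, 0);

    if (argc >= 3 && std::string(argv[1]) == "recv") {
        int ls = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in a{}; a.sin_family = AF_INET; a.sin_addr.s_addr = INADDR_ANY;
        a.sin_port = htons(static_cast<uint16_t>(std::stoi(argv[2])));
        bind(ls, reinterpret_cast<sockaddr*>(&a), sizeof a);
        listen(ls, 1);
        int s = accept(ls, nullptr, nullptr);
        auto t0 = std::chrono::steady_clock::now();
        double total = 0;
        for (;;) {                               // count bytes until the sender closes
            ssize_t n = read(s, buf.data(), buf.size());
            if (n <= 0) break;
            total += n;
        }
        double dt = std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
        std::printf("%.1f MB/s\n", total / 1e6 / dt);
        close(s); close(ls);
    } else if (argc >= 4 && std::string(argv[1]) == "send") {
        addrinfo hints{}, *res;
        hints.ai_family = AF_INET; hints.ai_socktype = SOCK_STREAM;
        getaddrinfo(argv[2], argv[3], &hints, &res);
        int s = socket(AF_INET, SOCK_STREAM, 0);
        connect(s, res->ai_addr, res->ai_addrlen);
        for (int i = 0; i < 10000; ++i)          // ~10 GB total
            if (write(s, buf.data(), buf.size()) <= 0) break;
        close(s);
        freeaddrinfo(res);
    } else {
        std::fprintf(stderr, "usage: p2p recv <port> | p2p send <host> <port>\n");
        return 1;
    }
    return 0;
}
```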

Single peer-to-peer
[Plot: transfer speed (MB/s) and GDC/LDC CPU cost (% CPU per MB) versus event size, 200 KB to 2000 KB]
Test node: HP Netserver LH6000, 4 × Xeon 700 MHz, 1.5 GB RAM, 3COM996 1000BaseT, tg3 driver, RedHat 7.3.1, kernel 2.4.18-19.7.x.cernsmp

Full test runtime options
Different trigger classes for different traffic patterns
Several recording options:
  NULL
  GDC disk
  CASTOR disks
  CASTOR tapes
Raw data vs. ROOT objects
We concentrated on two major traffic patterns (sketched below):
  Flat traffic: all LDCs send the same event
  ALICE-like traffic: a periodic sequence of different events, distributed according to the forecasted ALICE raw data
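
A sketch of how an LDC emulator could expose the two patterns: flat traffic repeats one fixed-size event, while ALICE-like traffic walks a periodic sequence of trigger classes with different sizes. The class names and sizes below are invented placeholders, not the forecasted ALICE figures.

```cpp
// Illustrative traffic-pattern tables for an LDC emulator; values are placeholders.
#include <cstddef>
#include <string>
#include <vector>

struct TriggerClass {
    std::string name;
    std::size_t eventSizeOnThisLdc;  // bytes contributed by one LDC
};

// Flat traffic: every LDC sends the same fixed-size event for every trigger.
std::vector<TriggerClass> flatPattern() {
    return { {"flat", 1 << 20} };                  // 1 MB/event/LDC, as in the flat-traffic runs
}

// ALICE-like traffic: a periodic mix of trigger classes; frequencies and sizes invented.
std::vector<TriggerClass> aliceLikePattern() {
    return {
        {"central",      8u << 20},   // large central events, rare
        {"minimum-bias", 2u << 20},
        {"minimum-bias", 2u << 20},
        {"dimuon",     512u << 10},   // partial detector readout, small
        {"dimuon",     512u << 10},
        {"dimuon",     512u << 10},
    };
}

// The emulator walks the chosen sequence cyclically, event after event.
std::size_t nextEventSize(const std::vector<TriggerClass>& seq, std::size_t n) {
    return seq[n % seq.size()].eventSizeOnThisLdc;
}
```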

Performance goals
[chart: throughput goals, MB/s; 650 MB/s marked]

Flat data traffic
40 LDCs × 38 GDCs, 1 MB/event/LDC → NULL
[chart: occupancies]
Critical item: load balancing over the GE trunks (about 2/3 of nominal)

Load distribution on trunks
[Plot: aggregate throughput (MB/s, 100-500) vs number of LDCs (1-7), LDCs distributed over several switches vs on the same switch]
3 GE trunks should deliver ~330 MB/s: why only ~220 MB/s? The critical item is the load balancing over the GE trunks noted on the previous slide (a simplified illustration follows below).
Challenge: 200 MB/s sustained while each switch serves both DATE and CASTOR traffic → contention; separate switches would avoid it, but the number of switches was insufficient.
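
One simplified way to see how static load balancing over a trunk can cap aggregate throughput is the Monte Carlo below: each flow is pinned by a hash to one of the 3 GE links, all flows are assumed to run at a common rate, and the busiest link saturates first. This is only an illustration under those assumptions, not an analysis of the actual switch configuration.

```cpp
// Toy model: flows statically hashed onto 3 trunk links; the most loaded link
// limits the common per-flow rate, so the usable aggregate sits below nominal.
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const int kLinks = 3;
    const double kLinkCapacityMBs = 110.0;     // ~GE payload capacity per link
    const int kTrials = 100000;
    std::mt19937 rng(42);

    for (int flows = 2; flows <= 12; flows += 2) {
        double sum = 0.0;
        for (int t = 0; t < kTrials; ++t) {
            std::vector<int> load(kLinks, 0);
            std::uniform_int_distribution<int> pick(0, kLinks - 1);
            for (int f = 0; f < flows; ++f) ++load[pick(rng)];   // static hash choice
            int maxLoad = *std::max_element(load.begin(), load.end());
            // All flows offer equal traffic; once the busiest link is full the
            // common rate is capped, so the aggregate scales as flows/maxLoad.
            sum += kLinkCapacityMBs * flows / maxLoad;
        }
        std::printf("%2d flows: ~%.0f MB/s usable of %.0f nominal\n",
                    flows, sum / kTrials, kLinkCapacityMBs * kLinks);
    }
    return 0;
}
```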

ALICE-like traffic
LDCs: rather realistic simulation
  partitioned in detectors
  no hardware trigger
  simulated readout, no "real" input channels
GDCs acting as:
  event builder
  CASTOR front-end
Data traffic:
  Realistic event sizes and trigger classes
  Partial detector readout
  Networking & node distribution scaled down & adapted

Challenge setup & outcomes
~25 LDCs (TPC: 10, other detectors: 1 to 3 each):
  ITS Pixel: 2, ITS Drift: 3, ITS Strips: 1, TPC: 10, TRD: 2, TOF: 1, PHOS: 1, HMPID: 1, MUON: 2, PMD: 2, TRIGGER: 1
~50 GDCs
Each satellite switch: 12 LDCs/GDCs (distributed)
[8..16] tape servers on the CERN backbone, [8..16] tape drives, each attached to a tape server
No objectification:
  named pipes too slow and too heavy
  upgraded to avoid named pipes: ALIMDC/CASTOR not performing well

Impact of traffic pattern
[chart comparing throughput for FLAT/CASTOR, ALICE/NULL and ALICE/CASTOR]

Performance goals
[chart: throughput goals, MB/s; 200 MB/s marked]

Production run
8 LDCs × 16 GDCs, 1 MB/event/LDC (FLAT traffic)
[8..16] tape servers and tape units
7 days at ~300 MB/s sustained, > 350 MB/s peak, ~180 TB to tape (cross-checked below)
9 Dec: too much input data
10 Dec: HW failures on tape drives & reconfiguration
Despite the failures, the performance goals were always exceeded
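
A quick consistency check of the quoted figures (~300 MB/s over 7 days versus ~180 TB to tape):

```cpp
// Cross-check of the production-run numbers quoted above.
#include <cstdio>

int main() {
    const double rate    = 300e6;          // ~300 MB/s sustained
    const double seconds = 7 * 86400.0;    // 7 days non-stop
    std::printf("%.0f TB\n", rate * seconds / 1e12);   // ~181 TB, i.e. the ~180 TB to tape
    return 0;
}
```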

System reliability
Hosts:
  ~10% Dead On Installation (DOI): power supply, hard disk, boot failure, bad BIOS, failed installation
  ~25% Failed On Installation (FOI): no AFS, not all products installed, NIC stuck at FE speed or at the wrong duplex, only a single CPU seen - usually solved by a complete re-installation, sometimes repeated twice
Long period of short runs (tuning):
  occasional problems (recovered) with the name server, the network & the O.S.
  on average [1..2] O.S. failures per week (on 77 hosts)
  occasional unrecoverable failures of GE interfaces
Production run: one tape unit failed and had to be excluded

Outcomes
DATE:
  80 hosts / 160+ roles handled by one run control
  Excellent reliability and performance
  Scalable and efficient architecture
Linux:
  A few hiccups here and there, but rather stable and fast
  Excellent network performance / CPU usage
  Some components are too slow (e.g. named pipes)
  More reliability needed from the GE interfaces
GDCs: 70% of each CPU free for extra activities
  CPU user: 1%; CPU system (interrupts, libraries): [10..30]%
  Input rate: [6..10] MB/s (average: 7 MB/s)
LDCs: [15..30]% CPU busy
  CPU user: [3..6]%; CPU system: [10..25]%
  In reality: higher CPU user, same CPU system (pRORC: no interrupts)
Memory (worked through below):
  LDC: the TPC LDC can buffer 300/3.79 ≈ 80 events: no problem
  GDC: every central event needs ~43 MB; the event builder has to allocate the maximum event size for each LDC (to ensure forward progress); in reality events are allocated on the fly according to their real size, but the minimum has to be guaranteed statically
Network: configured as a mesh with the same resources on each node, whereas the DAQ needs a tree with asymmetric assignment of branches (to absorb all the traffic)
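
The memory numbers above, worked through explicitly; the 300/3.79 figures are taken to be MB, and the GDC buffer depth used below is a hypothetical value for illustration only.

```cpp
// Worked version of the memory figures quoted above.
#include <cstdio>

int main() {
    // LDC side: TPC readout buffer vs average TPC sub-event size (both assumed MB).
    const double ldcBufferMB   = 300.0;
    const double tpcSubEventMB = 3.79;
    std::printf("TPC LDC buffers ~%.0f events\n", ldcBufferMB / tpcSubEventMB);   // ~80

    // GDC side: a fully built central event occupies ~43 MB, and the event builder
    // must reserve the maximum event size per LDC to guarantee forward progress.
    const double centralEventMB = 43.0;
    const int    eventsInFlight = 10;     // hypothetical depth, not from the slide
    std::printf("~%.0f MB for %d central events in flight\n",
                centralEventMB * eventsInFlight, eventsInFlight);
    return 0;
}
```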

Outcomes
FARM installation and operation: not to be underestimated!
CASTOR:
  Reliable and effective
  Improvements needed on overload handling and on parallelizing the tape resources
Tapes: one DOA and one DOO
Network: a silent but very effective partner
  Layout made for a farm, not optimized for the ALICE DAQ
  10 Gb Ethernet tests: failure at first; the problem was "fixed" too late for the Data Challenge
  Reconfiguration: transparent to DAQ and CASTOR

Future ALICE Data Challenges
Continue the planned progression [chart: throughput goals, MB/s]
ALICE-like pattern
Record ROOT objects
New technologies:
  CPUs
  Servers
  Network: NICs, infrastructure, beyond 1 GbE
Insert online algorithms
Provide some "real" input channels
Get ready to record at 1.25 GB/s