
An Overview over Online Systems at the LHC
Invited Talk at NSS-MIC 2012, Anaheim CA, 31 October 2012
Beat Jost, CERN

Acknowledgments and Disclaimer
I would like to thank David Francis, Frans Meijers and Pierre Vande Vyvre for lots of material on their experiments. I would also like to thank Clara Gaspar and Niko Neufeld for many discussions. There are surely errors and misunderstandings in this presentation, which are entirely due to my shortcomings.

Outline
❏ Data Acquisition Systems
  ➢ Front-end Readout
  ➢ Event Building
❏ Run Control
  ➢ Tools and Architecture
❏ Something New – Deferred Triggering
❏ Upgrade Plans

Role of the Online System
❏ In today's HEP experiments, millions of sensors are distributed over hundreds of m² and sampled tens of millions of times per second
❏ The data of all these sensors have to be collected and assembled at one point (computer, disk, tape), after rate reduction through event selection
  ➢ This is the Data Acquisition (DAQ) system
❏ This process has to be controlled and monitored (by the operator)
  ➢ This is the Run Control system
❏ Together they form the Online system. And, by the way, it is a pre-requisite for any physics analysis.

Setting the Scene – DAQ Parameters

A generic LHC DAQ system
[Block diagram of the generic data flow: Sensors → Front-End Electronics → Aggregation/Zero Suppression → Data Formatting/Buffering → Event Building Network → HLT Farm → Permanent Storage; the first stages sit on or near the detector, the later stages off-detector]
Today's data rates are too big to let all the data flow through a single component.

Implementations – Front-End Readout
❏ The DAQ system can be viewed as a gigantic funnel collecting the data from the sensors to a single point (CPU, storage) after selecting interesting events.
❏ In general the responses of the sensors on the detector are transferred (digitized or analogue) on point-to-point links to some form of first-level concentrators
  ➢ Often there is already a concentrator on the detector electronics, e.g. the readout chips of silicon detectors.
  ➢ The further upstream in the system, the more the technologies at this level differ, also within the experiments
    ➥ In LHCb the data of the vertex detector are transmitted in analogue form to the aggregation layer and digitized there
❏ The subsequent level of aggregation is usually also used to buffer the data and format them for the event builder and the high-level trigger
❏ Somewhere along the way, zero suppression is performed
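To make the zero-suppression step concrete, here is a minimal, hypothetical sketch (not any experiment's actual firmware or software): only samples significantly above a per-channel pedestal are kept, together with their channel addresses, which is what shrinks the data volume before event building.

```python
# Minimal sketch of zero suppression as done in the front-end/aggregation stages.
# Channel numbers, pedestals and the threshold below are illustrative only.

def zero_suppress(samples, pedestals, threshold):
    """Keep only (channel, amplitude) pairs significantly above pedestal."""
    suppressed = []
    for channel, adc in enumerate(samples):
        amplitude = adc - pedestals[channel]          # subtract per-channel pedestal
        if amplitude > threshold:                     # drop channels consistent with noise
            suppressed.append((channel, amplitude))   # store address + value
    return suppressed

if __name__ == "__main__":
    raw = [102, 99, 180, 101, 250, 98]                # raw ADC counts for 6 channels
    peds = [100, 100, 100, 100, 100, 100]             # measured pedestals
    print(zero_suppress(raw, peds, threshold=20))     # -> [(2, 80), (4, 150)]
```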

Readout Links of the LHC Experiments
❏ ALICE – DDL: optical, 200 MB/s, ~500 links; full duplex: also controls the front-end (commands, pedestals, calibration data); receiver card interfaces to a PC; flow control: yes
❏ ATLAS – S-LINK: optical, 160 MB/s, ~1600 links; receiver card interfaces to a PC; flow control: yes
❏ CMS – SLINK-64: LVDS, 400 MB/s (max. 15 m), ~500 links; peak throughput of 400 MB/s to absorb fluctuations, typical usage ≈200 MB/s; receiver card interfaces to a commercial NIC (Myrinet); flow control: yes
❏ LHCb – GLink (GOL): optical, 200 MB/s, ~4800 links plus ~5300 analog links (data transmitted before zero suppression); receiver card interfaces to a custom-built Ethernet NIC (4 x 1 Gb/s over copper); flow control: no (a trigger throttle is used instead)

Implementations – Event Building
❏ Event building is the process of collecting all the data fragments belonging to one trigger at one point, usually the memory of a farm node.
❏ The implementation typically uses a switched network
  ➢ ATLAS, ALICE and LHCb: Ethernet
  ➢ CMS: two steps, first with Myrinet, second with Ethernet
❏ Of course the implementations in the different experiments differ in detail from the 'generic' one, sometimes quite drastically:
  ➢ ATLAS implements an additional trigger level, thus reducing the overall requirements on the network capacity
  ➢ CMS does event building in two steps, with Myrinet (fibre) and 1 GbE (copper) links
  ➢ ALICE implements the HLT in parallel to the event builder, thus allowing it to be bypassed completely
  ➢ LHCb and ALICE use only one level of aggregation downstream of the front-end electronics.
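As an illustration of the core bookkeeping an event builder has to do, the following minimal sketch (hypothetical, not any experiment's production code) collects fragments by trigger number and declares an event complete once every readout source has contributed:

```python
# Minimal sketch of event-builder bookkeeping: fragments arriving from many
# readout sources are grouped by event number; an event is complete when all
# expected sources have delivered their fragment. Source names are illustrative.

from collections import defaultdict

class EventBuilder:
    def __init__(self, expected_sources):
        self.expected = set(expected_sources)
        self.pending = defaultdict(dict)        # event_number -> {source: payload}

    def add_fragment(self, event_number, source, payload):
        """Store one fragment; return the full event once all sources arrived."""
        self.pending[event_number][source] = payload
        if set(self.pending[event_number]) == self.expected:
            return self.pending.pop(event_number)   # complete event, hand to HLT
        return None                                 # still waiting for fragments

builder = EventBuilder(expected_sources=["VELO", "TRACKER", "CALO", "MUON"])
builder.add_fragment(42, "VELO", b"...")
builder.add_fragment(42, "TRACKER", b"...")
builder.add_fragment(42, "CALO", b"...")
event = builder.add_fragment(42, "MUON", b"...")    # now complete
print(sorted(event))                                # ['CALO', 'MUON', 'TRACKER', 'VELO']
```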

Event Building in the LHC Experiments
❏ ALICE – Ethernet: single-stage event building; TCP/IP based push protocol; orchestrated by an Event Destination Manager
❏ ATLAS – Ethernet: staged event building via a two-level trigger system; partial readout driven by RoI (Level-2 trigger), full readout at the reduced rate of accepted events; TCP/IP based pull protocol
❏ CMS – Myrinet/Ethernet: two-stage full readout of all triggered events; first stage Myrinet (flow control in hardware), second stage Ethernet with TCP/IP, driven by an Event Manager
❏ LHCb – Ethernet: single-stage event building directly from the front-end readout units to the HLT farm nodes; driven by the Timing & Fast Control system; pure push protocol (raw IP) with credit-based congestion control; relies on deep buffers in the switches
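The LHCb entry mentions a push protocol with credit-based congestion control; the sketch below (hypothetical and greatly simplified) shows the basic idea: a sender may only push as many fragments as it holds credits for, and credits are returned by the receiver as buffers are freed, so the receiver's buffers are never overrun.

```python
# Minimal sketch of credit-based flow control for a push-style event builder.
# Buffer sizes and credit counts are illustrative, not LHCb's actual parameters.

class Receiver:
    def __init__(self, buffers):
        self.free_buffers = buffers
        self.inbox = []

    def receive(self, fragment):
        self.free_buffers -= 1          # one buffer now occupied
        self.inbox.append(fragment)

    def process_one(self):
        self.inbox.pop(0)               # HLT consumes the fragment...
        self.free_buffers += 1
        return 1                        # ...and one credit goes back to the sender

class Sender:
    def __init__(self, credits):
        self.credits = credits          # initial credits = receiver buffer count

    def try_send(self, receiver, fragment):
        if self.credits == 0:
            return False                # no credit: hold the data, do not overrun
        self.credits -= 1
        receiver.receive(fragment)
        return True

rx = Receiver(buffers=2)
tx = Sender(credits=2)
print(tx.try_send(rx, "frag-1"), tx.try_send(rx, "frag-2"))   # True True
print(tx.try_send(rx, "frag-3"))                              # False (out of credits)
tx.credits += rx.process_one()                                # credit returned
print(tx.try_send(rx, "frag-3"))                              # True
```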

Controls Software – Run Control
❏ The main task of the run control is to guarantee that all components of the readout system are configured in a coherent manner, according to the desired DAQ activity
  ➢ 10000s of electronics components and software processes
  ➢ Millions of readout sensors
❏ Topologically implemented as a deep, hierarchical, tree-like architecture with the operator at the top
❏ In general the configuration process has to be sequenced so that the different components can collaborate properly → Finite State Machines (FSMs)
❏ Inter-process(or) communication (IPC) is an important ingredient to trigger transitions in the FSMs
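A minimal sketch of the kind of finite state machine a run-control node might implement is shown below; the state and command names (NOT_READY, CONFIGURE, ...) are illustrative and not the exact vocabulary of any of the four experiments.

```python
# Minimal sketch of a run-control finite state machine. Transitions are only
# allowed along the declared edges, which is how configuration is sequenced,
# and commands propagate down the hierarchical control tree.

TRANSITIONS = {
    ("NOT_READY", "CONFIGURE"): "READY",
    ("READY",     "START"):     "RUNNING",
    ("RUNNING",   "STOP"):      "READY",
    ("READY",     "RESET"):     "NOT_READY",
}

class RunControlNode:
    def __init__(self, name):
        self.name = name
        self.state = "NOT_READY"
        self.children = []                      # sub-detectors, farms, ...

    def command(self, cmd):
        """Apply a command to this node and propagate it down the tree."""
        new_state = TRANSITIONS.get((self.state, cmd))
        if new_state is None:
            raise RuntimeError(f"{self.name}: '{cmd}' not allowed in {self.state}")
        for child in self.children:             # sequence the children first
            child.command(cmd)
        self.state = new_state

top = RunControlNode("EXPERIMENT")
top.children = [RunControlNode("DAQ"), RunControlNode("TRIGGER")]
top.command("CONFIGURE")
top.command("START")
print(top.state, [c.state for c in top.children])   # RUNNING ['RUNNING', 'RUNNING']
```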

Control Tools and Architecture
❏ IPC tool: ALICE DIM, ATLAS CORBA, CMS XDAQ (HTTP/SOAP), LHCb DIM/PVSS
❏ FSM tool: ALICE SMI++, ATLAS CLIPS, CMS RMCS/XDAQ, LHCb SMI++
❏ Job/process control: ALICE DATE, ATLAS CORBA, CMS XDAQ, LHCb PVSS/FMC
❏ GUI tools: ALICE Tcl/Tk, ATLAS Java, CMS JavaScript/Swing/Web browser, LHCb PVSS
[Diagram: example of the LHCb controls architecture, with parallel Run Control and Detector Control trees]

GUI Example – LHCb Run Control
❏ Main operation panel for the shift crew
❏ Each sub-system can (in principle) also be driven independently

Error Recovery and Automation
❏ No system is perfect; there are always things that go wrong
  ➢ E.g. de-synchronisation of some components
❏ Two approaches to recovery
  ➢ Forward chaining
    ➥ We're in the mess. How do we get out of it?
      – ALICE and LHCb: SMI++ automatically acts to recover
      – ATLAS: DAQ Assistant (CLIPS) gives operator assistance
      – CMS: DAQ Doctor (Perl) gives operator assistance
  ➢ Backward chaining
    ➥ We're in the mess. How did we get there?
      – ATLAS: Diagnostic and Verification System (DVS)
❏ Whatever one does: one needs lots of diagnostics to know what's going on.

Snippet of forward chaining (Big Brother in LHCb):

  object: BigBrother
    state: READY
      when ( LHCb_LHC_Mode in_state PHYSICS )  do PREPARE_PHYSICS
      when ( LHCb_LHC_Mode in_state BEAMLOST ) do PREPARE_BEAMLOST
      ...
    action: PREPARE_PHYSICS
      do Goto_PHYSICS LHCb_HV
      wait ( LHCb_HV )
      move_to READY
    action: PREPARE_BEAMLOST
      do STOP_TRIGGER LHCb_Autopilot
      wait ( LHCb_Autopilot )
      if ( VELOMotion in_state {CLOSED,CLOSING} ) then
        do Open VELOMotion
      endif
      do Goto_DUMP LHCb_HV
      wait ( LHCb_HV, VELOMotion )
      move_to READY
      ...

Summary

Something New – Deferred Triggering
❏ The inter-fill gaps (beam dump to stable beams) of the LHC can be significant (many hours, sometimes days)
❏ During this time the HLT farm is basically idle
❏ The idea is to use this idle CPU time to execute the HLT algorithms on data that were written to local disk while the LHC was operating
[Diagram of the deferred-trigger data flow in a farm node, involving the event receiver (MEPrx), the HLT process (Moore), the overflow writer (OvrWr), the disk writer (DiskWr) and the Reader around the MEP, Result and Overflow buffers; events take the overflow path to disk when the MEP buffer is full]
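The following minimal sketch (hypothetical process, path and buffer names, loosely following the slide's diagram) illustrates the deferral decision: events are processed immediately while the in-memory buffer has room, and spilled to local disk otherwise, to be replayed during the inter-fill gap.

```python
# Minimal sketch of the deferred-trigger idea: process events while the
# in-memory buffer has room, otherwise spill them to local disk and replay
# them during the inter-fill gap. Paths and sizes are illustrative only.

import os, pickle

MEP_BUFFER_SIZE = 1000                      # illustrative buffer depth
DEFER_DIR = "/localdisk/deferred"           # hypothetical spill area

mep_buffer = []                             # events waiting for the HLT process

def on_event_received(event):
    """Called for every incoming event during data taking."""
    if len(mep_buffer) < MEP_BUFFER_SIZE:
        mep_buffer.append(event)            # normal path: HLT picks it up live
    else:
        path = os.path.join(DEFER_DIR, f"run_{event['run']}.mep")
        with open(path, "ab") as spill:     # overflow path: defer to local disk
            pickle.dump(event, spill)

def replay_deferred():
    """Called during the inter-fill gap: feed deferred events back to the HLT."""
    for name in sorted(os.listdir(DEFER_DIR)):
        path = os.path.join(DEFER_DIR, name)
        with open(path, "rb") as spill:
            while True:
                try:
                    mep_buffer.append(pickle.load(spill))
                except EOFError:
                    break
        os.remove(path)                     # each deferred file is consumed once
```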

Deferred Trigger – Experience
❏ Currently deferring ~25% of the L0 trigger rate
  ➢ ~250 kHz of triggers
❏ Data stored on 1024 nodes equipped with 1 TB local disks
❏ Great care has to be taken
  ➢ to keep an overview of which nodes hold files of which runs
  ➢ to ensure events are not duplicated
    ➥ During deferred HLT processing, files are deleted from disk as soon as they are opened by the reader
❏ Starting and stopping is automated according to the state of the LHC
  ➢ No stress for the shift crew
[Plot: number of files on local disks versus time over several fills, with markers for start of data taking, beam dump, start and end of deferred HLT, online troubles and a new fill]
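The deduplication rule quoted above (a deferred file is deleted as soon as the reader opens it) can be sketched as follows; this is an illustrative simplification, not the actual LHCb reader, and in contrast to the simpler sketch on the previous slide it removes the file immediately on open, relying on the POSIX behaviour that an open file descriptor stays readable after unlink.

```python
# Minimal sketch of "delete on open" to guarantee each deferred file is
# processed at most once, even if the reader is restarted. Relies on POSIX
# semantics: the data remain readable through the open descriptor after unlink.

import os

def read_deferred_file(path, process_event):
    f = open(path, "rb")
    os.unlink(path)                 # file disappears from the namespace now,
                                    # so no other reader can pick it up again
    try:
        for record in f:            # stand-in for real event deserialisation
            process_event(record)
    finally:
        f.close()
```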

Upgrade Plans
❏ All four LHC experiments have upgrade plans for the nearer or farther future
  ➢ Timescale 2015
    ➥ CMS
      – integration of a new point-to-point link (~10 Gbps) to the new back-end electronics (in µTCA) of new trigger/detector systems
      – replacement of Myrinet with 10 GbE (TCP/IP) for data aggregation into PCs, and InfiniBand (56 Gbps) or 40 GbE for event building
    ➥ ATLAS: merging of the L2 and HLT networks and CPUs
      – each CPU in the farm will run both triggers
  ➢ Timescale 2019
    ➥ ALICE: increase the acceptable trigger rate from 1 to 50 kHz for heavy-ion operation
      – new front-end readout link
      – TPC continuous readout
    ➥ LHCb: elimination of the hardware trigger (readout rate 40 MHz)
      – readout of the front-end electronics for every bunch crossing (new front-end electronics, zero suppression on/near the detector)
      – network/farm capacity increase by a factor 40 (3.2 TB/s, ~4000 CPUs; see the back-of-the-envelope check below)
      – network technology: InfiniBand or 10/40/100 Gb Ethernet
      – no architectural changes
  ➢ Timescale 2022 and beyond
    ➥ CMS & ATLAS: implementation of a hardware track trigger running at 40 MHz, and surely many other changes…
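As a back-of-the-envelope check of the quoted LHCb upgrade numbers (3.2 TB/s aggregate at a 40 MHz readout rate into ~4000 farm CPUs), the short calculation below shows the average event size and per-node load these figures imply; it uses only the numbers on the slide.

```python
# Back-of-the-envelope check of the LHCb upgrade numbers quoted on the slide:
# 40 MHz readout, 3.2 TB/s aggregate bandwidth, ~4000 farm CPUs.

readout_rate_hz  = 40e6          # every bunch crossing
aggregate_bw_Bps = 3.2e12        # bytes per second into the event builder
farm_nodes       = 4000

avg_event_size = aggregate_bw_Bps / readout_rate_hz   # -> 80 kB per event
bw_per_node    = aggregate_bw_Bps / farm_nodes         # -> 0.8 GB/s per node
rate_per_node  = readout_rate_hz / farm_nodes          # -> 10 kHz per node

print(f"average event size : {avg_event_size/1e3:.0f} kB")
print(f"bandwidth per node : {bw_per_node/1e9:.1f} GB/s")
print(f"event rate per node: {rate_per_node/1e3:.0f} kHz")
```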