LHCb Operations in Run I: Challenges and Lessons Learnt
Clara Gaspar, on behalf of the LHCb Collaboration
"Physics at the LHC and Beyond", Quy Nhon, Vietnam, August 2014


Operations
❚ Operations Strategy
❙ 2 people on shift (from day one):
❘ 1 Shift Leader (and main operator)
❘ 1 Data Quality Manager
❘ plus a small on-call team: ~10 people
➨ Integrated Control System (a sketch of the idea follows this slide)
❘ Homogeneous user interfaces to all areas:
〡 Data Acquisition (DAQ), Detector Control System (DCS), Trigger, High Level Trigger (HLT), etc.
➨ Full Automation
❘ For standard procedures and (known) error recovery
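The integrated control system behind this strategy is a hierarchy of finite-state machines (LHCb builds it on the PVSS SCADA system and SMI++). Purely as an illustration, here is a minimal Python sketch of the idea, assuming hypothetical class and state names rather than the real ECS API: commands flow down the tree, and states are summarized upwards.

```python
# Illustrative sketch only -- hypothetical names, not the real LHCb ECS API.
# Commands propagate down the FSM tree; states are summarized upwards.

class ControlUnit:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.state = "NOT_READY"

    def command(self, action):
        """Propagate a command (e.g. 'Configure', 'Start') down the tree."""
        for child in self.children:
            child.command(action)
        if not self.children:  # a leaf (device unit) acts on its hardware
            self.state = {"Configure": "READY", "Start": "RUNNING"}.get(action, self.state)

    def update_state(self):
        """Summarize the children's states; the most severe one wins."""
        if self.children:
            severity = ["ERROR", "NOT_READY", "READY", "RUNNING"]
            self.state = min((c.update_state() for c in self.children),
                             key=severity.index)
        return self.state

# One top-level node drives the whole experiment -- hence one operator.
lhcb = ControlUnit("LHCb", [ControlUnit("DAQ", [ControlUnit("ReadoutBoard")]),
                            ControlUnit("DCS", [ControlUnit("HVChannel")])])
lhcb.command("Configure")
print(lhcb.update_state())  # READY
```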

Operations Scope
[Diagram] The Experiment Control System supervises the full experiment: the DAQ chain (Detector Channels, Front-End Electronics, Readout Boards, High Level Trigger farm, Storage), the Trigger and Timing, Monitoring, Offline Processing, the Detector & General Infrastructure (Power, Gas, Cooling, etc.) and the External Systems (LHC, Technical Services, Safety, etc.).

Operations Tools
❚ Run Control automation: the "Autopilot" (a sketch of the idea follows)
❚ Configures, starts and keeps the run going, and recovers from:
❙ Sub-detector de-synchronization
❙ Misbehaving farm nodes
❙ Etc.
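Such an autopilot is easy to sketch: a supervising loop that drives the run through its states and applies known recoveries automatically. This is a hedged illustration with invented names (state(), resync_subdetector(), etc.), not the actual implementation, which runs inside the ECS:

```python
import time

# Hypothetical autopilot loop -- all method names are assumptions.
KNOWN_RECOVERIES = {
    "SUBDET_DESYNC": lambda rc: rc.resync_subdetector(),
    "FARM_NODE_BAD": lambda rc: rc.exclude_misbehaving_nodes(),
}

def autopilot(run_control, poll_seconds=5):
    while True:
        state = run_control.state()
        if state == "NOT_READY":
            run_control.configure()          # get the run ready...
        elif state == "READY":
            run_control.start_run()          # ...and keep it going
        elif state == "ERROR":
            recovery = KNOWN_RECOVERIES.get(run_control.last_error())
            if recovery:
                recovery(run_control)        # known problem: recover automatically
            else:
                run_control.call_expert()    # unknown problem: wake the on-call team
        time.sleep(poll_seconds)
```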

Operations Tools
❚ "Big Brother" (a sketch of the idea follows)
❚ Based on the LHC state, it controls:
❙ Voltages
❙ VELO closure
❙ Run Control
❚ Can sequence activities, e.g. end-of-fill calibration
❚ Issues confirmation requests and information
❙ Voice messages
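A minimal sketch of the idea, assuming a hypothetical table of actions keyed on the LHC state (the real tool also sequences longer activities and asks the operator for confirmation, e.g. via voice message, before acting):

```python
# Hypothetical mapping of LHC states to LHCb actions -- names are assumptions.
LHC_STATE_ACTIONS = {
    "INJECTION":    ["lower_voltages", "open_velo"],
    "STABLE_BEAMS": ["raise_voltages", "close_velo", "start_run"],
    "BEAM_DUMP":    ["run_end_of_fill_calibration", "lower_voltages"],
}

def on_lhc_state_change(new_state, ops, confirm):
    """Drive voltages, the VELO and the Run Control from the LHC state."""
    for action in LHC_STATE_ACTIONS.get(new_state, []):
        # confirm() stands in for the GUI/voice confirmation request
        if confirm(f"LHC is in {new_state}: execute '{action}'?"):
            getattr(ops, action)()
```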

Selected Challenges…

VELO Closing
❚ Safety: many criteria to respect
❚ Procedure: an iterative "feedback" loop, orchestrated by Big Brother (sketched after this list):
❙ Take data (DAQ)
❙ 3D vertex reconstruction (HLT farm)
❙ Calculate next VELO position (monitoring farm)
❙ Move VELO (DCS)
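The loop itself is simple to express. Here is a hedged sketch with placeholder names for the DAQ, HLT, monitoring and DCS services; the many safety criteria are hidden inside next_safe_step and the DCS movement commands:

```python
# Hypothetical VELO-closing feedback loop -- service names are placeholders.
def close_velo(daq, hlt, monitoring, dcs, tolerance_mm=0.01):
    position = dcs.velo_position()                      # starts fully open
    while abs(position - dcs.nominal_position()) > tolerance_mm:
        events   = daq.take_data()                      # short burst of data
        vertices = hlt.reconstruct_3d_vertices(events)  # where is the beam?
        step     = monitoring.next_safe_step(position, vertices)
        dcs.move_velo(step)                             # DCS enforces safety limits
        position = dcs.velo_position()
```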

VELO Closing
❚ Manually operated in the beginning
❚ Completely automated since 2011
❙ Takes ~4 minutes at the start of each fill
[Plot: LHCb efficiency breakdown, 2012]

Luminosity Levelling
❚ Allows running LHCb under stable, optimum conditions
❚ Constraints: safety and reliability
[Diagram] A luminosity controller, with its levelling GUI and its configuration & monitoring, sits between the LHCb readout system / luminosity detectors and the LHC Control System. LHCb and the LHC exchange, over a dedicated CERN data exchange protocol (TCP/IP): Instant Lumi, Target Lumi, Levelling Request, Step Size, Levelling Enabled, Levelling Active, … (a sketch of such a message follows)
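The list of exchanged values suggests a simple message structure. Since the actual protocol is not spelled out in the talk, the encoding, field names and trigger condition below are assumptions, shown only to make the data exchange concrete:

```python
import json

# Hypothetical levelling message from LHCb to the LHC Control System.
def levelling_message(instant_lumi, target_lumi, step_size, enabled=True):
    request = abs(instant_lumi - target_lumi) > 0.02 * target_lumi
    return json.dumps({
        "instant_lumi":      instant_lumi,   # measured by LHCb lumi detectors
        "target_lumi":       target_lumi,    # the constant value to hold
        "levelling_request": request,        # ask the LHC for a separation step
        "step_size":         step_size,      # size of the beam-separation step
        "levelling_enabled": enabled,
    }).encode() + b"\n"                      # e.g. one message per line over TCP/IP
```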

Luminosity Levelling
❚ In operation since 2011
❚ 95% of the integrated luminosity was recorded within 3% of the desired luminosity
❚ Thanks to the LHC experts and operators

HLT Farm Usage
❚ Resource optimization → the Deferred HLT (sketched below)
❙ Idea: buffer data to disk when the HLT is busy; process it in the inter-fill gap
❙ Change of "paradigm": a run can now "last" days…
[Diagram] Standard HLT: the event builder feeds each farm node at 1 MHz and the HLT writes out 5 kHz. Deferred HLT: during physics, ~70 kHz of the input is buffered to local disk instead; in the inter-fill gap, the farm node runs the HLT on the buffered data.
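A hedged sketch of the deferral logic on one farm node, with hypothetical names (in the real system, whole files rather than single events are moved between the event builder and the local disks):

```python
# Hypothetical deferred-HLT logic on a farm node -- names are assumptions.
def on_event(event, hlt, disk_buffer):
    if hlt.is_busy():
        disk_buffer.write(event)     # defer: keep for the inter-fill gap
    else:
        hlt.process(event)           # normal path: trigger decision now

def inter_fill(hlt, disk_buffer):
    # The "run" continues for days: drain the buffer while there is no beam.
    while not disk_buffer.empty():
        hlt.process(disk_buffer.read())
```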

Deferred HLT
❚ A major success:
❙ In place since 2012
❙ ~20% resource gain for the HLT ➨ better trigger
❙ A cushion for central DAQ/storage hiccups

HLT Farm Usage
❚ Resource optimization → offline processing
❙ Idea: run standard offline simulation outside data-taking periods
❙ The Online Control System turns the HLT farm into an offline site (sketched below)
❙ In operation since end
❚ Also a big success:
❙ ~25K jobs running simultaneously
❙ The largest LHCb producer
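A sketch of the mode switch, with invented names (in reality the ECS allocates the nodes, and the simulation jobs come from LHCb's distributed-computing system, DIRAC):

```python
# Hypothetical online/offline mode switch for the HLT farm.
def set_farm_mode(farm, mode):
    for node in farm.nodes:
        if mode == "OFFLINE":
            node.stop_hlt()
            node.start_pilot_job()   # node now pulls offline simulation work
        elif mode == "ONLINE":
            node.stop_pilot_job()
            node.start_hlt()         # node is back in the trigger farm
```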

Distributed Computing
❚ Running fine
❙ 126 sites in 20 countries
❙ 2 shifters (daytime only)
[Diagram] From Online, the Full Stream (5 kHz) goes to storage (CASTOR) and is reconstructed, stripped and analysed on the Grid; the Express Stream (5 Hz) is reconstructed promptly at CERN for calibration & alignment, which feeds the Conditions DB. A sketch of the two chains follows. [Plots: running jobs and CPU usage since 2012]
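A compact sketch of the two chains, with hypothetical helpers standing in for the Grid and CERN production systems:

```python
# Hypothetical sketch of the two offline processing chains.
def full_stream(events, grid):                 # ~5 kHz from Online
    raw  = grid.store("CASTOR", events)        # tape storage
    reco = grid.run("Reconstruction", raw)
    return grid.run("Stripping", reco)         # preselections for analyses

def express_stream(events, cern, conditions_db):   # ~5 Hz from Online
    reco      = cern.run("Reconstruction", events)
    constants = cern.run("Calibration&Alignment", reco)
    conditions_db.update(constants)            # used by the full-stream processing
```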

Lessons Learnt…

HLT Farm Usage in Run II
❚ Even better resource optimization → the split HLT (sketched below)
❙ Idea: buffer ALL data to disk after HLT1; perform calibration & alignment on it; run HLT2 permanently in the background
[Diagram] On each farm node, the event builder delivers 1 MHz to HLT1; 100% of the HLT1 output (~100 kHz) goes to a local disk buffer; calibration & alignment run on the buffered data and update the Conditions DB (a stream kept for DQ only); HLT2 continuously selects ~12.5 kHz of events from the buffer.
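A hedged sketch of the split scheme on one node, with hypothetical names; the key point is that HLT2 decouples from data taking and always runs with up-to-date calibration:

```python
# Hypothetical split-HLT pipeline on one farm node (Run II scheme).
def hlt1(event, accept_hlt1, disk_buffer):
    if accept_hlt1(event):                   # 1 MHz in, ~100 kHz accepted
        disk_buffer.write(event)             # ALL accepted events go to disk

def hlt2_forever(disk_buffer, conditions_db, accept_hlt2, storage):
    while True:                              # runs permanently, also between fills
        calib = conditions_db.latest()       # from real-time calibration & alignment
        event = disk_buffer.read()           # blocks until buffered data is available
        if accept_hlt2(event, calib):        # ~12.5 kHz selected in the end
            storage.write(event)
```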

Summary
❚ LHCb Operations have been very successful
❚ The strategy was good:
❙ One operator is enough
❙ Integration & homogeneity
❘ Allow complex feedback mechanisms
〡 E.g. re-synchronizing or resetting a sub-detector based on data-monitoring information
❙ Automated operations
❘ Save time and avoid mistakes
❘ But require experience and learning…
❙ Next: further improve the HLT farm usage
❘ Better trigger, and online processing closer to offline
❚ Related talks:
❙ LHCb Run I performance: Giacomo Graziani, Monday morning
❙ LHCb Run II challenges: Karol Hennessy, Friday afternoon
❙ LHCb upgrades: Olaf Steinkamp, Friday afternoon
[Plot: LHCb efficiency breakdown in pp collisions; % of channels working; 99% good data for analysis]