CWG10 Control, Configuration and Monitoring
Status and plans for Control, Configuration and Monitoring
16 December 2014, ALICE O2 Asian Workshop

Outline
▶ Motivation
▶ A brief overview of data taking operations
▶ Lessons learned from Run 1
▶ CCM overview
▶ Performance tests
▶ Next steps

Motivation
▶ Why do we need a Control system?
  ▶ Start and stop processes
  ▶ Sequence of operations, synchronization
  ▶ External systems
  ▶ Automation
▶ Why do we need a Configuration system?
  ▶ Configure processes
▶ Why do we need a Monitoring system?
  ▶ Detect abnormal conditions
  ▶ Automation

Team
▶ CERN
▶ KMUTT, Thailand
  ▶ See next presentation by Khanasin for an update

A brief overview of data taking operations
▶ A typical LHC year (January to December): shutdown for maintenance, proton-proton collisions, heavy-ion collisions
Disclaimer: current system, not O2

A brief overview of data taking operations
▶ A typical LHC fill (up to 30 hours): beam injection, stable beams, beam dump
▶ ALICE activities during a fill: ALICE safe, prepare trigger configuration, detector calibration, partial ALICE READY, full ALICE READY, data taking (ideally a single run), detector calibration
Disclaimer: current system, not O2

A brief overview of data taking operations
▶ A typical ALICE run
  ▶ Start-of-Run: configure detector electronics, start online systems, store data taking conditions
  ▶ Data taking: readout, event building, online data monitoring, online calibration data
  ▶ End-of-Run: export data taking conditions and calibration data to Offline, stop online systems
Disclaimer: current system, not O2

A brief overview of data taking operations
▶ Run 1 SOR sequence (high level)
Disclaimer: current system, not O2

Lessons learned from Run 1
▶ Must be fast when changing runs
  ▶ More runs than expected
  ▶ Not everything needs to be restarted
  ▶ Addressed in Run 2: Fast SOR/EOR
▶ Must be flexible
  ▶ Not every problem needs to stop a run
  ▶ Addressed in Run 2: Pause and Recover
▶ Must monitor everything
  ▶ Data flow monitoring
  ▶ Addressed in Run 2: MAD

Control in O2 - Overview
▶ Process management
  ▶ Start/stop processes
  ▶ Send commands to processes (CONFIGURE, PAUSE/RESUME, etc.)
  ▶ Estimated: O(100k) processes
▶ Task management
  ▶ Ensure that actions are executed in the correct order
▶ Automation
  ▶ Automatically recover from errors
  ▶ Automatically react to internal events (e.g. need more EPNs) and external events (e.g. start of LHC collisions)
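
To make the process management part concrete, here is a minimal sketch assuming a simple wrapper that starts a child process and accepts the commands listed above; the state names, the command set and the class itself are illustrative, not the O2 implementation.

    # Minimal sketch (not the O2 implementation): a wrapper that starts/stops a
    # child process and accepts control commands through a tiny state machine.
    import subprocess

    ALLOWED = {                    # command -> (required state, next state)
        "CONFIGURE": ("STANDBY", "READY"),
        "START":     ("READY",   "RUNNING"),
        "PAUSE":     ("RUNNING", "PAUSED"),
        "RESUME":    ("PAUSED",  "RUNNING"),
        "STOP":      ("RUNNING", "READY"),
    }

    class ControlledProcess:
        def __init__(self, cmdline):
            self.cmdline = cmdline          # e.g. ["readout", "--id", "flp-001"] (hypothetical)
            self.proc = None
            self.state = "STANDBY"

        def start_process(self):
            self.proc = subprocess.Popen(self.cmdline)

        def stop_process(self):
            if self.proc:
                self.proc.terminate()
                self.proc.wait()

        def handle(self, command):
            required, nxt = ALLOWED.get(command, (None, None))
            if self.state != required:
                raise RuntimeError(f"{command} not allowed in state {self.state}")
            self.state = nxt
            return self.state

A real control system would fan such commands out to O(100k) processes through a hierarchy (as in the SMI test later in the talk) rather than one by one.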

Control in O2 - Notes
▶ Includes processes from online and offline
▶ Must control both synchronous and asynchronous tasks
▶ Cannot be seen as a batch system:
  ▶ Bound to external events (e.g. start of collisions)
  ▶ Sequence of operations, synchronization points
  ▶ Low latency very important

Configuration in O2 - Overview
▶ Configuration distribution
  ▶ Provide processes with the needed configuration parameters
▶ Dynamic process (re)configuration
  ▶ Essential to achieve fast run transitions
▶ O(1 GB) of configuration data
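
As an illustration of configuration distribution with dynamic reconfiguration, here is a minimal sketch using ZooKeeper (one of the candidates listed under "Next steps") through the kazoo Python client; the server address, znode path and parameter names are assumptions.

    # Minimal sketch, not the O2 design: a process fetches its configuration from
    # ZooKeeper and is called back whenever the configuration node changes,
    # allowing reconfiguration without a restart. Requires the `kazoo` package.
    import json
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="config-zk.example.org:2181")   # hypothetical server
    zk.start()

    CONFIG_PATH = "/o2/flp/readout"                        # hypothetical znode

    @zk.DataWatch(CONFIG_PATH)
    def on_config_change(data, stat):
        # Called once at registration and again on every change of the znode.
        if data is None:
            return
        params = json.loads(data.decode())
        print(f"(re)configuring, version {stat.version}: {params}")
        # apply_configuration(params)  # hand over to the process being configured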

Monitoring in O2 - Overview
▶ Data collection and archival
  ▶ System monitoring (CPU, memory, I/O, etc.)
  ▶ Application monitoring (data rates, link backpressure, internal buffer status, etc.)
  ▶ O(600 kHz) of monitoring data
▶ Alarms and action triggering
  ▶ Support shift crew and experts
  ▶ Feedback to the Control system

Monitoring in O2 - Notes
▶ Includes metrics from online and offline
▶ Includes both low- and high-frequency metrics
  ▶ Low: every 30 seconds (e.g. system metrics)
  ▶ High: every second (e.g. link status)
▶ Permanent storage will be the limiting factor
  ▶ No need to store everything; "interesting" values can be filtered (see the sketch below)
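
One simple way to keep permanent storage manageable, sketched below under the assumption that a relative deadband plus a maximum age is an acceptable policy, is to archive a metric only when it changes significantly or when enough time has passed; the class, thresholds and callback are illustrative.

    # Minimal sketch: forward a metric to permanent storage only if it moved by
    # more than `rel_tol` since the last archived value, or if more than
    # `max_age` seconds have passed. Thresholds are assumptions, not O2 choices.
    import time

    class MetricFilter:
        def __init__(self, store, rel_tol=0.05, max_age=300.0):
            self.store = store        # callback writing to permanent storage
            self.rel_tol = rel_tol
            self.max_age = max_age
            self.last_value = None
            self.last_time = 0.0

        def push(self, value, now=None):
            now = time.time() if now is None else now
            changed = (self.last_value is None or
                       abs(value - self.last_value) >
                       self.rel_tol * max(abs(self.last_value), 1e-9))
            stale = (now - self.last_time) > self.max_age
            if changed or stale:
                self.store(value, now)
                self.last_value, self.last_time = value, now

    # usage: f = MetricFilter(lambda v, t: print(t, v)); f.push(42.0)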

Performance Tests: Control
▶ Tool: SMI (State Machine Interface)
▶ Setup:
  ▶ Level 0 SMI domain: partition CCM
  ▶ Level 1 SMI domains: detector CCMs, EPN cluster CCM
  ▶ Level 2 SMI domains: FLP CCMs, EPN CCMs
  ▶ Level 2 SMI proxies: local processes

Performance Tests: Control
▶ Setup:
  ▶ 46 hosts
  ▶ 1 Level 0 domain
  ▶ 20 Level 1 domains
  ▶ 1350 Level 2 domains
  ▶ proxies
▶ Increase due to initial lookup in the DIM DNS
▶ Conclusion: SMI cannot be used in its current version

Performance Tests: Monitoring
▶ MonALISA + ApMon
▶ Setup:
  ▶ 10 sender nodes, up to 1000 threads per host (ApMon)
  ▶ 1 MonALISA service, all historical records disabled
▶ Result: 52 kHz without data loss
▶ Conclusion: could use 12+ collectors to reach 600 kHz
(By Costin Grigoras)
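
For context, publishing a metric with ApMon typically looks like the minimal sketch below, assuming the Python ApMon client distributed with MonALISA is available as the `apmon` module; the collector address, cluster, node and parameter names are placeholders, not the test configuration.

    # Minimal sketch of sending application metrics to a MonALISA service with
    # the ApMon Python client. Destination and names are hypothetical.
    import apmon

    mon = apmon.ApMon(["monalisa-collector.example.org:8884"])
    mon.sendParameters("o2_flp_cluster", "flp-001",
                       {"readout_rate_mbps": 3200.5, "buffer_fill_pct": 41})
    mon.free()   # stop ApMon's background threads before exiting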

Performance Tests: Monitoring
▶ Zabbix
▶ Setup:
  ▶ 10 sender nodes, up to 10 processes per host
  ▶ 1 Zabbix server node, 200 threads, permanent storage disabled (in-memory history enabled)
▶ Result: 30 kHz without data loss
▶ Conclusion: could use 20+ collectors to reach 600 kHz
(By Andres Gomez Ramirez)
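
Custom processes usually feed values into Zabbix through the standard zabbix_sender utility against a "trapper" item; the sketch below wraps that call from Python, with server name, monitored host and item key as assumptions.

    # Minimal sketch: push one value into Zabbix via zabbix_sender.
    # The item key must exist on the server as a trapper item; names are
    # placeholders, not the test configuration.
    import subprocess

    def send_to_zabbix(key, value,
                       server="zabbix.example.org", host="flp-001"):
        subprocess.run(
            ["zabbix_sender", "-z", server, "-s", host,
             "-k", key, "-o", str(value)],
            check=True)

    send_to_zabbix("o2.readout.rate_mbps", 3200.5)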

Next steps
▶ Finalise the TDR
▶ Perform more tests:
  ▶ Control: Boost libraries + ZeroMQ
  ▶ Configuration: ZooKeeper
  ▶ Monitoring: MonALISA and Zabbix with permanent storage enabled
▶ Provide CCM systems for the ALFA prototype (CWG13)
▶ Refine the design
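
As a flavour of the ZeroMQ control test mentioned above, the sketch below shows a controller sending a command to a process and waiting for the reported state, using a plain REQ/REP pair in pyzmq; the endpoint and message format are assumptions, not the eventual O2 protocol.

    # Minimal sketch of a command/acknowledge exchange over ZeroMQ (REQ/REP).
    # Endpoint and message format are assumptions.
    import zmq

    def controlled_process(endpoint="tcp://*:5555"):
        # Process side: wait for one command and reply with the resulting state.
        ctx = zmq.Context.instance()
        rep = ctx.socket(zmq.REP)
        rep.bind(endpoint)
        command = rep.recv_string()          # e.g. "CONFIGURE"
        rep.send_string(f"OK:{command}")     # acknowledge

    def controller(endpoint="tcp://localhost:5555", command="CONFIGURE"):
        # Controller side: send one command and read the acknowledgement.
        ctx = zmq.Context.instance()
        req = ctx.socket(zmq.REQ)
        req.connect(endpoint)
        req.send_string(command)
        return req.recv_string()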