Control Systems
Clara Gaspar, March 2013
Thanks to: Franco Carena, Vasco Chibante Barroso, Giovanna Lehmann Miotto, Hannes Sakulin and Andrea Petrucci for their input.

Control System Scope
[Diagram: the Control System encompasses the whole experiment: Timing, Trigger, the DAQ chain (Detector Channels, Front End Electronics, Readout Boards, HLT Farm, Storage), DQ Monitoring, the DCS Devices (HV, LV, GAS, Cooling, etc.) and the External Systems (LHC, Technical Services, Safety, etc.).]

Control System Tasks
❚ Configuration
❙ Selecting which components take part in a certain “Activity”
❙ Loading of parameters (according to the “Activity”)
❚ Control core
❙ Sequencing and synchronization of operations across the various components
❚ Monitoring, Error Reporting & Recovery
❙ Detect and recover problems as fast as possible
❘ Monitor operations in general
❘ Monitor Data Quality
❚ User Interfacing
❙ Allow the operator to visualize and interact with the system

Some Requirements
❚ Large number of devices / I/O channels
➨ Need for Distributed Hierarchical Control
❘ Decomposition into systems, sub-systems, …, devices
❘ Maybe: local decision capabilities in sub-systems
❚ Large number of independent teams and very different operation modes
➨ Need for Partitioning capabilities (concurrent usage)
❚ High complexity & (few) non-expert operators
➨ Need for good diagnostics tools and, if possible, automation of:
❘ Standard procedures
❘ Error recovery procedures
➨ And for intuitive user interfaces

History
❚ None of this is new…
❙ Ex.: both the Aleph and Delphi control systems:
❘ Were distributed & hierarchical systems
❘ Implemented partitioning
❘ Were highly automated
❘ Were operated by few shifters:
〡 ALEPH: 2 (Shift Leader, Data Quality)
〡 DELPHI: 3 (Run Control, Slow Control, Data Quality)

History
❚ LEP experiments
❙ “The DELPHI experiment control system”, 1995
❙ “Applying object oriented, real time, and expert system techniques to an automatic read-out error recovery in the Aleph data acquisition system”, 1992

History
❚ Delphi Run Control
❚ Voice messages (recorded); most annoying: “DAS is not running even though LEP is in Physics”

Design Principles
❚ Design emphasis per experiment:

    Experiment   Keywords
    ATLAS        Hierarchical, Abstraction, Scalable
    CMS          Web Based, Scalable, State Machine Driven
    ALICE        Partitioning, Customization, Flexibility
    LHCb         Integration, Homogeneity, Automation

Scope & Architecture
❚ ATLAS
[Diagram: a Run Control tree (commands down, status & alarms up) spans DAQ, HLT and Trigger & Timing: Run Control → SubDet1..N DAQ → SubDet FEE / SubDet RO → FEE Dev1..N. The DCS is a separate, parallel tree interfaced to the LHC and infrastructure: DCS → SubDet1..N DCS → SubDet LV / TEMP / GAS → LV Dev1..N.]

Scope & Architecture
❚ CMS
[Diagram: same structure as ATLAS: a Run Control tree (commands down, status & alarms up) over DAQ, HLT and Trigger & Timing, with the DCS as a separate, parallel tree interfaced to the LHC and infrastructure.]

Scope & Architecture
❚ ALICE
[Diagram: an ECS layer sits above both the Run Control tree (DAQ, HLT, Trigger & Timing) and the DCS tree; commands flow down, status & alarms up; the DCS interfaces to the LHC and infrastructure.]

Scope & Architecture
❚ LHCb
[Diagram: the ECS is the top of a single tree covering everything: Run Control (DAQ, HLT, Trigger & Timing), the DCS (sub-detector LV, TEMP, GAS, …), the LHC interface and the infrastructure; commands flow down, status & alarms up.]

Tools & Components
❚ Main Control System components:
❙ Communications
❘ Message exchange between processes
❙ Finite State Machines
❘ System description, synchronization and sequencing
❙ Expert System functionality
❘ Error recovery, assistance and automation
❙ Databases
❘ Configuration, Archive, Conditions, etc.
❙ User Interfaces
❘ Visualization and operation
❙ Other services:
❘ Process management (start/stop processes across machines)
❘ Resource management (allocate/de-allocate common resources)
❘ Logging, etc.

Frameworks
❚ ALICE
❙ DAQ: DATE (Data Acquisition and Test Environment)
❘ Comms, FSM, UI, Logging, etc.
❚ ATLAS
❙ Sub-Detectors: RodCrateDAQ
❘ Comms, FSM, UI, Configuration, Monitoring, HW access libraries
❚ CMS
❙ Control: RCMS (Run Control and Monitoring System)
❘ Comms, FSM, UI, Configuration
❙ DAQ: XDAQ (DAQ Software Framework)
❘ Comms, FSM, UI, HW access, Archive
❚ LHCb
❙ Control: JCOP (Joint COntrols Project) / LHCb FW (Dataflow: Gaudi “Online”)
❘ Comms, FSM, UI, Configuration, Archive, HW access, UI builder

Sub-System Integration
❚ Sub-systems use the common framework and tools:
❙ ALICE
❘ No interface needed: all done centrally
❘ Configuration via DCS for most sub-detectors
❙ ATLAS
❘ Interface: FSM + tools & services
❘ Configuration via DCS for some sub-detectors
❙ CMS
❘ Interface: FSM in RCMS + XDAQ FW
❙ LHCb
❘ Interface: FSM + JCOP FW + guidelines (color codes, etc.)

Communications
❚ All experiments chose one main protocol:
❙ ALICE: DIM (mostly within the FSM toolkit)
❘ Mostly for Control, some Configuration and Monitoring
❙ ATLAS: CORBA (under the IPC and IS packages)
❘ IPC (Inter Process Communication) for Control and Configuration
❘ IS (Information Service) for Monitoring
❙ CMS: Web Services (used by RCMS, XDAQ)
❘ RCMS for Control
❘ XDAQ for Configuration
❘ XMAS (XDAQ Monitoring and Alarm System) for Monitoring
❙ LHCb: DIM (and PVSS II, within the JCOP FW)
❘ DIM & PVSS II for Control, Configuration and Monitoring

Communications
❚ All client/server, mostly publish/subscribe
❙ Difficult to compare (different “paradigms”):
❘ DIM is a thin layer on top of TCP/IP
❘ ATLAS IPC is a thin layer on top of CORBA
〡 Both provide a simple API, a naming service and error detection & recovery
❘ CMS RCMS & XDAQ use Web Services (XML/SOAP)
〡 Remote Procedure Call (RPC) like, also used as publish/subscribe
❘ ATLAS IS, CMS XMAS and LHCb PVSS II
〡 Work as data repositories (transient and/or permanent) to be used by clients (UIs, etc.)
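To make the publish/subscribe style concrete, here is a minimal sketch using the public DIM C++ API (dis.hxx for servers, dic.hxx for clients). It assumes a DIM name server is reachable (DIM_DNS_NODE set); the service name RUN_CONTROL/STATE and the two toy processes are hypothetical illustrations, not actual experiment code:

    // --- publisher (server) process ---
    #include <cstring>
    #include <unistd.h>
    #include <dis.hxx>                       // DIM server API

    int main() {
      char state[32] = "NOT_READY";
      DimService stateSvc("RUN_CONTROL/STATE", state);  // register named service
      DimServer::start("RUN_CONTROL");       // make it visible via the name server
      std::strcpy(state, "RUNNING");
      stateSvc.updateService();              // push the new value to all subscribers
      for (;;) pause();                      // keep serving
    }

    // --- subscriber (client) process ---
    #include <iostream>
    #include <unistd.h>
    #include <dic.hxx>                       // DIM client API

    class StateInfo : public DimInfo {
    public:
      StateInfo() : DimInfo("RUN_CONTROL/STATE", (char*)"UNKNOWN") {}
      void infoHandler() override {          // called back on every update
        std::cout << "new state: " << getString() << std::endl;
      }
    };

    int main() {
      StateInfo subscription;                // subscribes on construction
      for (;;) pause();                      // updates arrive asynchronously
    }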

Communications
❚ Advantages and drawbacks
❙ DIM
✔ Efficient, easy to use
✘ Home made, old…
❙ CORBA
✔ Efficient, easy to use (via the ATLAS API)
✘ Not so popular anymore…
❙ Web Services
✔ Standard, modern protocol
✘ Performance: XML overhead

Finite State Machines
❚ All experiments use FSMs
❙ In order to model the system behaviour:
❘ For synchronization and sequencing; in some cases also for error recovery and automation of procedures
❙ ALICE: SMI++
❘ FSM for all sub-systems provided centrally (can be different)
❙ ATLAS: CHSM -> CLIPS -> C++
❘ FSM for all sub-systems provided centrally (all the same)
❙ CMS: Java for RCMS, C++ for XDAQ
❘ Each sub-system provides its own code (Java/C++)
❙ LHCb: SMI++ (integrated in PVSS II)
❘ FSM provided centrally; sub-systems modify a template graphically

FSM Model Design
❚ Two approaches:
❙ Few, coarse-grained states:
❘ Generic actions are sent from the top
〡 Each sub-system synchronizes its own operations to reach the required state
❘ The top level needs very little knowledge of the sub-systems
❘ Assumes most things can be done in parallel
❙ Many, fine-grained states:
❘ Every detailed transition is sequenced from the top
❘ The top level needs to know the details of the sub-systems
[Diagram: coarse-grained example: NOT_READY -Configure-> READY -StartRun-> RUNNING -StopRun-> READY -Reset-> NOT_READY; ERROR -Recover-> NOT_READY]
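As an illustration, a minimal table-driven sketch of the coarse-grained model in the diagram above; the states and commands come from the slide, but the C++ structure is an assumption, not any experiment's actual implementation:

    #include <iostream>
    #include <map>
    #include <stdexcept>
    #include <string>

    // Coarse-grained states and commands from the diagram above.
    enum class State { NOT_READY, READY, RUNNING, ERROR };

    struct Transition { State from, to; };

    // Allowed transitions: a command is valid only in its "from" state.
    static const std::map<std::string, Transition> kTable = {
        {"Configure", {State::NOT_READY, State::READY}},
        {"StartRun",  {State::READY,     State::RUNNING}},
        {"StopRun",   {State::RUNNING,   State::READY}},
        {"Reset",     {State::READY,     State::NOT_READY}},
        {"Recover",   {State::ERROR,     State::NOT_READY}},
    };

    State apply(State current, const std::string& cmd) {
      auto it = kTable.find(cmd);
      if (it == kTable.end() || it->second.from != current)
        throw std::runtime_error(cmd + " not allowed in current state");
      // A real sub-system would synchronize its own operations here.
      return it->second.to;
    }

    int main() {
      State s = State::NOT_READY;
      s = apply(s, "Configure");   // NOT_READY -> READY
      s = apply(s, "StartRun");    // READY -> RUNNING
      std::cout << "running: " << (s == State::RUNNING) << "\n";
    }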

FSM Definitions
❚ Top-level FSM from “ground” to Running:
❙ ATLAS
❘ None -> Booted -> Initial -> Configured -> Connected -> Running
❙ CMS
❘ Initial -> Halted -> Configured -> Running (+ intermediate states)
❙ LHCb
❘ Not Allocated -> Not Ready -> Ready -> Active -> Running
❙ ALICE
❘ Many: 20 to 25 states, 15 to get to Running
[Diagram: LHCb top-level FSM: NOT_ALLOCATED -Allocate-> NOT_READY -Configure-> (CONFIGURING) -> READY -StartRun-> ACTIVE -StartTrigger-> RUNNING; StopTrigger, StopRun, Reset and Deallocate go back down; ERROR -Recover-> NOT_READY]

Expert System Functionality
❚ Several experiments saw the need…
❙ Approach:
❘ “We are in the mess, how do we get out of it?”
❘ No learning…
❚ Used for:
❙ Advising the shifter ➨ ATLAS, CMS
❙ Automated error recovery ➨ ATLAS, LHCb, ALICE (modestly)
❙ Completely automating standard operations ➨ LHCb

Expert System Functionality
❚ ATLAS
❙ Uses CLIPS for error recovery
❘ Common and distributed, domain-specific rules
❘ Used by experts only; sub-system rules on request
❙ Uses Esper for the “Shifter Assistant”
❘ Centralised, global “Complex Event Processing”
❘ Moving more towards this approach…
❚ CMS
❙ Uses Perl for the “DAQ Doctor”
❘ “Rules” are hardcoded by experts
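The flavour of such expert-system rules can be sketched generically; this is a toy condition/action engine in C++ (not ATLAS's CLIPS rules or CMS's Perl DAQ Doctor), with one hypothetical rule:

    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    struct Fact { std::string key, value; };

    // A recovery rule: if condition(facts) holds, run action. No learning:
    // the rule set is fixed, written by experts.
    struct Rule {
      std::function<bool(const std::vector<Fact>&)> condition;
      std::function<void()> action;
    };

    bool has(const std::vector<Fact>& facts,
             const std::string& k, const std::string& v) {
      for (const auto& f : facts)
        if (f.key == k && f.value == v) return true;
      return false;
    }

    int main() {
      // Hypothetical rule: a sub-detector error during a run triggers a
      // local recovery action rather than stopping the run.
      std::vector<Rule> rules = {
        { [](const std::vector<Fact>& f) {
            return has(f, "run", "RUNNING") && has(f, "subdet1", "ERROR"); },
          [] { std::cout << "action: recover subdet1\n"; } },
      };

      std::vector<Fact> facts = {{"run", "RUNNING"}, {"subdet1", "ERROR"}};
      for (const auto& r : rules)
        if (r.condition(facts)) r.action();
    }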

Expert System Functionality
❚ LHCb
❙ Uses SMI++ for everything
❘ Distributed FSM and rule-based system
❘ Used by sub-systems for local error recovery and automation
❘ Used by the central team for top-level rules integrating the various sub-systems
❚ ALICE
❙ Uses SMI++ too
❘ Some error recovery (only a few specific cases)

Expert System Functionality
❚ Decision making, reasoning, approaches:
❙ Decentralized
❘ Bottom-up: sub-systems react only to their “children”
〡 In an event-driven, asynchronous fashion
❘ Distributed: each sub-system can recover its errors
〡 Normally each team knows best how to handle local errors
❘ Hierarchical/parallel recovery
❘ Scalable
❙ Centralized
❘ All “rules” in the same repository
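A minimal sketch of the decentralized, bottom-up idea: each node reacts only to its own children, recovers local errors itself, and exposes a single summary state to its parent. The Node structure and names are illustrative assumptions, not SMI++ code:

    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    // Each node reacts only to its children: it recovers local errors and
    // publishes a summary state upwards (event-driven, bottom-up).
    struct Node {
      std::string name;
      std::string state = "READY";
      std::vector<Node*> children;

      void recover(Node& child) {
        std::cout << name << ": recovering " << child.name << "\n";
        child.state = "READY";   // stands in for sending a Recover command
      }

      void onChildUpdate() {     // called when any child changes state
        for (Node* c : children)
          if (c->state == "ERROR") recover(*c);   // local recovery first
        bool anyError = std::any_of(children.begin(), children.end(),
                          [](const Node* c) { return c->state == "ERROR"; });
        state = anyError ? "ERROR" : "READY";     // summary seen by the parent
      }
    };

    int main() {
      Node dev{"LV_Dev1"};
      Node subdet{"SubDet1_DCS"};
      subdet.children.push_back(&dev);

      dev.state = "ERROR";       // a fault appears at the bottom of the tree
      subdet.onChildUpdate();    // the parent handles it; nothing goes higher
      std::cout << subdet.name << " state: " << subdet.state << "\n";
    }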

Online Databases
❚ Three main logical database concepts in the Online System:
[Diagram: the Control System (PVSS) sits between the experiment's HW & SW and the databases:]
❙ Configuration DB: configuration settings for a running mode; read back if needed for the next run's settings (pedestal followers)
❙ Archive DB: monitoring data (at regular intervals), if history is needed
❙ Conditions DB: to Offline, if needed by Offline

User Interfacing
❚ Types of user interfaces:
❙ Alarm screens and/or message displays
❙ Monitoring displays
❙ Run Control

ALICE Run Control + ECS
❚ Implemented in Tcl/Tk

ATLAS Run Control
❚ IGUI
❙ Java
❙ Modular: sub-systems can add their own panels

CMS Run Control
❚ Web tools: JavaScript + HTML
❚ Also LabVIEW monitoring

LHCb Run Control
❚ JCOP FW
❙ PVSS + SMI++
❙ Like all UIs at all levels
❙ Using a very convenient graphic editor
❙ Very easy to modify

LHCb Big Brother
❚ Used by the operator to confirm automated actions
❚ Voice messages (synthesized)

Access Control
❚ What are we trying to achieve?
❙ Protect against “evil” attacks?
❘ We are “authenticated” and inside a protected network…
❙ Avoid mistakes?
❘ Mistakes are often made by experts…
❙ Traceability and accountability…
❚ Types of protection:
❙ At UI level
❘ LHCb, ALICE, CMS: very basic: ownership (CMS: also role-based views)
❙ Everywhere (at message reception)
❘ ATLAS: role based
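The role-based variant can be pictured as a check at message reception; a toy sketch with hypothetical roles and commands (not the ATLAS implementation):

    #include <iostream>
    #include <map>
    #include <set>
    #include <string>

    // Role-based check at message reception: a command is executed only if
    // the sender's role may issue it. Roles and commands are hypothetical.
    static const std::map<std::string, std::set<std::string>> kAllowed = {
        {"shifter", {"StartRun", "StopRun"}},
        {"expert",  {"StartRun", "StopRun", "Configure", "Recover"}},
    };

    bool authorized(const std::string& role, const std::string& cmd) {
      auto it = kAllowed.find(role);
      return it != kAllowed.end() && it->second.count(cmd) > 0;
    }

    int main() {
      std::cout << authorized("shifter", "Recover") << "\n";  // 0: refused
      std::cout << authorized("expert",  "Recover") << "\n";  // 1: accepted
    }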

Size and Performance
❚ Size of the control systems (in PCs):
❙ ALICE: 1
❙ ATLAS: 32 + some for HLT control
❙ CMS: 12
❙ LHCb: ~50 DAQ + ~50 HLT + ~50 DCS
❚ Some performance numbers:

                                  ALICE   ATLAS   CMS   LHCb
    Cold Start to Running (min.)    5       5      3     4
    Stop/Start Run (min.)           6       2      1     1
    Fast Stop/Start (sec.)          -      <10
    DAQ Inefficiency (%)            1      <1

Operations
❚ Experiment operations
❙ Shifters:
❘ ALICE: 4 (SL, DCS, RC, DQ+HLT)
❘ ATLAS: 8 (SL, DCS, RC, TRG, DQ, ID, Muon, Calo)
❘ CMS: 5 (SL, DCS, RC, TRG, DQ)
❘ LHCb: 2 (SL, DQ)
❙ Ex.: Start-of-Fill sequence
❘ In general HV is automatically handled
❘ The Run Control shifter manually configures/starts the run (apart from LHCb)

Detector Control Systems

LHCb Control System
❚ Courtesy of CMS DCS Team