Hannes Sakulin, CERN/EP on behalf of the CMS DAQ group


1 New operator assistance features in the CMS Run Control System
22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP), San Francisco, USA, 10 October 2016
Hannes Sakulin, CERN/EP, on behalf of the CMS DAQ group

2 System Overview

3 Control and Monitoring Systems in CMS
[Diagram: system overview]
- DAQ Operator: works through the Run Control System, whose Level-0 Function Manager (blocks SM, Conf, Ind.; backed by the config DBs) controls subsystems 1 to n
- DCS Operator: works through the Detector Control System (low voltage, high voltage, gas, magnet)
- Monitoring Services: errors, alarms, monitor clients
- XDAQ Online Software: HLT, merge & transfer
- Hardware: Trigger Control and Distribution System, front-end electronics, L1 trigger electronics, sub-detector DAQ electronics, central DAQ electronics, central DAQ event builder, file-based filter farm & storage
- Arrows distinguish control paths from data paths

4 Level-0 Function Manager – the top-level control node
[Diagram of the Level-0 Function Manager. Blocks: state machine, configuration handling, individual subsystem control. Connected services: GCM, configuration DBs, Resource Service, Equipment, HLT.]

5 Level-0 Function Manager GUI
Top-level Run Control GUI
Initially, flexibility was needed: many manual settings

6 At the beginning of Run-1 …
- Full control was possible
- But everything had to be done manually
- Only experts were able to operate run control
- Manual operation was very error-prone

7 Configuration Handling

8 Configuration handling
Operator initially had to select:
- Compatible first-level and high-level trigger configurations
- A compatible set of RUN_KEYs for each sub-system
We grouped these:
- First into a combined trigger key
- Then into a combined CMS run mode
  - Combines all subsystem and trigger configuration into a single configuration item
  - Run modes for collisions, cosmics, and various special runs
  - Run mode may be automatically selected based on LHC beam mode
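The grouping described on this slide can be illustrated with a small sketch. It is written in Python for brevity and is not the Run Control code (which is Java); the key names, the RUN_KEY values, and the beam-mode mapping are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RunMode:
    """One configuration item bundling trigger and sub-system settings."""
    name: str
    trigger_key: str   # combined first-level + high-level trigger configuration
    run_keys: dict     # RUN_KEY per sub-system

# Hypothetical run modes; real key names are not given in the slides.
RUN_MODES = {
    "collisions": RunMode("collisions", "trg_collisions",
                          {"TRACKER": "physics", "ECAL": "physics"}),
    "cosmics": RunMode("cosmics", "trg_cosmics",
                       {"TRACKER": "cosmics", "ECAL": "cosmics"}),
}

# Assumed mapping from LHC beam mode to run mode name.
BEAM_MODE_TO_RUN_MODE = {"STABLE BEAMS": "collisions"}

def select_run_mode(beam_mode, default="cosmics"):
    """Pick the run mode for a beam mode, falling back to a default."""
    return RUN_MODES[BEAM_MODE_TO_RUN_MODE.get(beam_mode, default)]
```

With this layout the operator (or an automatic selector) chooses one item, and every trigger and sub-system key follows from it.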

9 The top-level Run Control web GUI
- Run mode can be auto-selected based on the LHC mode
- The run mode in turn determines most other settings

10 Configuration handling (II)
Initially the shifter needed to know:
- Which subsystems need to be reconfigured / recycled after a certain configuration change
- When to change / recover the clock
Now a guidance system:
- Constantly compares the applied configuration with the selected configuration for each sub-system
- Displays indicators prompting operators to do the correct action
- Checks for updates to the selected configuration (configuration databases)
The selected configuration can be tied to LHC mode through the run mode.
[Diagram: Level-0 Function Manager with state machine, guidance system, configuration handling, individual subsystem control, and run mode; connected to configuration DBs, Resource Service, L1 Trg, Equipment, HLT, GCM]
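The core comparison performed by the guidance system might look roughly like the following. This is a minimal Python sketch under assumptions: the function name, the dictionary layout, and the single "reconfigure" indicator are hypothetical, and the real system distinguishes more actions (recycle, clock change, and so on).

```python
def guidance(applied, selected):
    """Return an indicator for every sub-system whose applied
    configuration differs from the currently selected one."""
    indicators = {}
    for subsystem, wanted in selected.items():
        if applied.get(subsystem) != wanted:
            # Hypothetical indicator text shown to the operator.
            indicators[subsystem] = "reconfigure"
    return indicators

# Example: ECAL was configured with an older key than the selected one.
applied = {"TRACKER": "key_1", "ECAL": "key_2"}
selected = {"TRACKER": "key_1", "ECAL": "key_3"}
pending = guidance(applied, selected)
```

Running the comparison continuously, including after updates in the configuration databases, is what lets the indicators stay current without the shifter tracking dependencies by hand.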

11 Guidance system
Ensures that all settings are applied … in the correct order.

12 Following the cycle of the LHC

13 Manual actions throughout an LHC cycle …
[Plot: LHC dipole current over an LHC cycle, with ADJUST and STABLE BEAMS marked; DCS ramps up high voltages, later DCS ramps down high voltages; run section 1]
Initially, a new run was needed:
- When the LHC starts/stops ramping
- When high voltages are ramped
Subsystem operators needed to change settings:
- Ramp start: mask sensitive trigger channels
- Ramp done: unmask sensitive trigger channels
- Tracker HV on: enable payload (Tk), raise gains (Pixel)
- Tracker HV off: disable payload (Tk), reduce gains (Pixel)

14 Control and Monitoring Systems in CMS
[Diagram: the system overview of slide 3. The Level-0 Function Manager now also receives inputs labelled "LHC" and "HV status, LHC state", and shows the blocks DCS and Gd next to SM, Conf, and Ind.]

15 Run control automatically handles run section changes
[Plot: LHC dipole current over an LHC cycle, with ADJUST and STABLE BEAMS marked; DCS ramps up tracker HV, later DCS ramps down tracker HV; the run is divided into sections 1 and 2; a special run with circulating beam is also shown]
Start run at FLAT TOP. The rest is automatic.
Automatic actions in DAQ:
- Ramp start: mask sensitive trigger channels
- Ramp done: unmask sensitive trigger channels
- Tracker HV on: enable payload (Tk), raise gains (Pixel)
- Tracker HV off: disable payload (Tk), reduce gains (Pixel)
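The event-to-action table on this slide lends itself to a simple dispatch structure. The sketch below is illustrative Python, not the Run Control implementation; the class name and the rule that every handled event opens a new section are assumptions.

```python
# Event -> automatic DAQ actions, as listed on the slide.
AUTOMATIC_ACTIONS = {
    "ramp start": ["mask sensitive trigger channels"],
    "ramp done": ["unmask sensitive trigger channels"],
    "tracker HV on": ["enable payload (Tk)", "raise gains (Pixel)"],
    "tracker HV off": ["disable payload (Tk)", "reduce gains (Pixel)"],
}

class RunSections:
    """Tracks run sections within a single run (hypothetical helper)."""

    def __init__(self):
        self.section = 1
        self.history = []  # (section, action) pairs

    def on_state_change(self, event):
        actions = AUTOMATIC_ACTIONS.get(event, [])
        if actions:
            # Assumption: every handled state change opens a new section.
            self.section += 1
            self.history.extend((self.section, a) for a in actions)
        return actions
```

The point of the design is that the run itself keeps going: a DCS or LHC state change only starts a new section and applies the listed settings, instead of requiring a stop and restart.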

16 Soft Error Recovery

17 Automatic soft error recovery
With higher instantaneous luminosity in 2011, “soft errors” became more and more frequent, causing the run to get stuck
- Proportional to integrated luminosity
- Believed to be due to single event upsets
Recovery procedure (3-10 min down-time):
- Stop run (30 sec)
- Re-configure a sub-detector (2-3 min)
- Start new run (20 sec)
[Plot: single-event upsets in the electronics of the Si-Pixel detector, proportional to integrated luminosity; one single event upset (needing recovery) every 73 pb⁻¹]

18 Automatic soft error recovery
From 2012, new automatic recovery procedure in the top-level control node:
- Sub-system detects soft error and signals it by changing its state
- Top-level control node invokes recovery procedure:
  - Pause triggers
  - Invoke newly defined selective recovery transition on the requesting detector
  - In parallel, perform preventive recovery of other detectors
  - Resynchronize
  - Resume
Result: 12 seconds down-time; at least 46 hours of down-time avoided in 2012
[Diagram: Level-0 Function Manager blocks DCS, SE, Gd, Ind., SM, Conf]
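The recovery sequence above can be sketched as follows. This is an illustrative Python outline only: the function names are hypothetical, the recovery itself is reduced to a returned string, and a thread pool stands in for whatever mechanism the top-level control node actually uses to run recoveries in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def recover(detector, selective):
    """Placeholder for the recovery transition of one detector."""
    kind = "selective" if selective else "preventive"
    return f"{kind} recovery of {detector}"

def soft_error_recovery(requesting, detectors):
    """Pause, recover all detectors in parallel, resynchronize, resume."""
    steps = ["pause triggers"]
    with ThreadPoolExecutor() as pool:
        # The requesting detector gets the selective recovery,
        # all others a preventive one; both run concurrently.
        futures = [pool.submit(recover, d, d == requesting) for d in detectors]
        steps.extend(f.result() for f in futures)
    steps += ["resynchronize", "resume"]
    return steps
```

Because the run is only paused rather than stopped and restarted, the sequence avoids the stop, re-configure, and start steps of the manual procedure.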

19 DAQ Doctor

20 DAQ Doctor
[Diagram: the system overview of slide 3 with the Run-1 DAQ Doctor added. The Level-0 Function Manager shows inputs labelled "LHC" and "HV status, LHC state" and the blocks DCS, SE, Gd, Ind., SM, Conf.]

21 Towards the end of Run-1
- Improved configuration handling
- Guidance
- Automatic actions following DCS / LHC state changes
- Soft Error recovery
- DAQ Doctor

22 New operator assistance for Run-2

23 How to improve further
Guidance indicates all necessary steps to the operator …
- … but operator still needs to follow them manually
- Click, wait, click, wait a few minutes, click … not always efficient
Some errors still need to be recovered manually
- Rare / new / not well understood errors
- Want to speed up the typical recovery:
  - Stop the run
  - Reconfigure / recycle a sub-system
  - Start a new run
  - Potentially recover secondary errors
Prepare to trigger typical recovery by expert system

24 Automator
[Diagram: the system overview of slide 3 with the Run-1 DAQ Doctor and a new Automator Function Manager added next to the Level-0 Function Manager. The Level-0 Function Manager shows the blocks DCS, SE, Gd, Ind., SM, Conf.]

25 Level-0 Automator

26 The top-level Run Control web GUI with Automator
- Recover run with 2 clicks (marked (1) and (2) in the GUI)
- Full Level-0 functionality still accessible

27 Timeline: history of all manual or automatic actions
- Recovery triggered by operator
- Schedule: start/stop only
- Configurable number of attempts to recover transitions going to Error
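The "configurable number of attempts" can be pictured as a bounded retry loop around a transition. The following Python sketch is an assumption about the general shape, not the Automator's code; `run_with_recovery`, its parameters, and the state names other than Error are hypothetical.

```python
def run_with_recovery(transition, recover, max_attempts=3):
    """Fire a transition; if it ends in Error, recover and retry,
    up to max_attempts times. Returns (final_state, attempts_used)."""
    for attempt in range(1, max_attempts + 1):
        state = transition()
        if state != "Error":
            return state, attempt
        if attempt < max_attempts:
            recover()  # only recover when another attempt will follow
    return "Error", max_attempts

# Example: a transition that fails twice before succeeding.
outcomes = iter(["Error", "Error", "Running"])
recoveries = []
result = run_with_recovery(lambda: next(outcomes),
                           lambda: recoveries.append("recover"))
```

Bounding the attempts keeps the automation from looping forever on a persistent fault; once the limit is reached, the error is left for the operator.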

28 One-click start of run
- Can do this from any state of the system
- All indications of the guidance system are followed

29 One-click start of run - timeline
- Automator follows Guidance System in Level-0
- Timeline shows history of all manual or automatic actions

30 Offline timeline – for post mortem analysis

31 New DAQ Expert

32 New DAQ Expert
[Diagram: the system overview of slide 3 with the Run-2 DAQ Expert in place of the Run-1 DAQ Doctor, together with the Automator Function Manager and the Level-0 Function Manager (blocks DCS, SE, Gd, Ind., SM, Conf).]

33 New DAQ Expert
- New tool based on Java / Web Technologies
- New rules for DAQ-2 system
- Gives detailed recovery instructions for known error situations
- Simplified model of monitoring data
- Reasoning encapsulated in logic modules
- Easy to extend
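One way to picture "reasoning encapsulated in logic modules" over a "simplified model of monitoring data" is a list of small rule classes evaluated against a snapshot. The sketch below is Python for brevity (the tool itself is Java-based), and every class name, snapshot field, and advice string is hypothetical.

```python
class LogicModule:
    """Base class: one self-contained piece of reasoning."""
    advice = None  # recovery instruction shown when the condition holds

    def satisfied(self, snapshot, results):
        raise NotImplementedError

class RunOngoing(LogicModule):
    def satisfied(self, snapshot, results):
        return snapshot["state"] == "Running"

class NoRateWhenExpected(LogicModule):
    advice = "Run is ongoing but the trigger rate is zero: follow the recovery steps"

    def satisfied(self, snapshot, results):
        # Builds on the result of an earlier module.
        return results["RunOngoing"] and snapshot["l1_rate_hz"] == 0

# Evaluation order matters: later modules may use earlier results.
MODULES = [RunOngoing(), NoRateWhenExpected()]

def evaluate(snapshot):
    """Run all modules on one monitoring snapshot and collect advice."""
    results, advice = {}, []
    for module in MODULES:
        name = type(module).__name__
        results[name] = module.satisfied(snapshot, results)
        if results[name] and module.advice:
            advice.append(module.advice)
    return advice
```

Extending such a design means adding one class to the list, which is one plausible reading of "easy to extend".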

34 Today
- Automatic actions following DCS / LHC state changes
- Improved configuration handling
- Guidance
- Soft Error recovery
- Automator: two-click recovery / 1-click start of run
- Recovery instructions by the DAQ Expert

35 Summary
- Start of Run 1: manual operations (Level-0 blocks: SM, Conf, Ind.)
- End of Run 1: automatic actions, guidance, soft error recovery; DAQ Doctor (Run-1) (Level-0 blocks: DCS, SM, Gd, SE, Conf, Ind.)
- Run 2: Automator with 2-click recovery; DAQ Expert advice (Level-0 blocks: DCS, SM, Gd, SE, Conf, Ind.; Expt)
- Future?: system controlled by DAQ Expert (Expt)

36 Thank You

37 Run-2 DAQ Expert

38 DAQ Expert

39 New DAQ expert

40 Run-1 DAQ Doctor

41 Expert System
[Diagram: the DAQ Doctor receives monitoring data (Live Access Servers, monitor collectors, XDAQ monitoring & alarming, computing farm monitoring data), gives sound alerts, spoken alerts, and advice to the DAQ Operator, and performs automatic actions on the Run Control System (Level-0; TRG, DAQ). Also shown: DCS, Trigger, Tracker, ECAL, DAQ, DQM, Trigger Supervisor, slices, XDAQ Online Software, CMSSW, L1 trigger electronics, sub-detector DAQ electronics, central DAQ electronics, central DAQ farm, High Level Trigger & storage.]

42 The DAQ Doctor
Expert tool based on the same technology as
- High level scripting language (Perl)
- Generic framework & pluggable modules
- Detection of high level anomaly triggers further investigation
- Archive (web based): all notes, sub-system errors, CRC errors
- Dumps (of all monitoring data) for expert analysis in case of anomalies

43 The DAQ Doctor
Diagnoses anomalies in:
- L1 rate
- HLT physics stream rate
- Dead time
- Backpressure
- Resynchronization rate
- Farm health
- Event builder and HLT farm data flow
- HLT farm CPU utilization
Automatic actions:
- Triggers computation of a new central DAQ configuration in case of PC hardware failure (great help for on-call experts since 2012)

44 Technology

45 Control and Monitoring Systems in CMS
Run Control System (Java, Web Technologies): defines the control structure
- GUI in a web browser: HTML, CSS, JavaScript, AJAX
- Run Control Web Application: Apache Tomcat servlet container, JSP, tag libraries
- Communication: Axis web service (SOAP), servlet
- Production system: 40 tomcats on 20 virtual machines
Function Manager
- Node in the Run Control tree
- Defines a state machine & parameters
- Specific actions, automation etc. implemented in Java
- User function managers are dynamically loaded into the web application
[Diagram: DAQ Operator and DCS Operator; Detector Control System and Run Control System, with Level-0 above Tracker, ECAL, L1 Trigger, TCDS, DAQ, DQM; low voltage, high voltage, gas, magnet; Trigger Control and Distribution System, front-end electronics, L1 trigger electronics, sub-detector DAQ electronics, central DAQ electronics, central DAQ event builder, file-based filter farm & storage]
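The idea of a function manager as a node in a control tree that defines a state machine can be sketched as below. This is a Python illustration only (the actual function managers are Java classes loaded into the web application); the state names, commands, and error-propagation rule are assumptions.

```python
# Hypothetical transition table: (current state, command) -> next state.
TRANSITIONS = {
    ("Halted", "configure"): "Configured",
    ("Configured", "start"): "Running",
    ("Running", "stop"): "Configured",
    ("Configured", "halt"): "Halted",
}

class FunctionManager:
    """A node in the control tree with its own state machine."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.state = "Halted"

    def fire(self, command):
        # Forward the command to all child nodes first.
        for child in self.children:
            child.fire(command)
        if any(c.state == "Error" for c in self.children):
            self.state = "Error"  # a failed child puts the parent in Error
        else:
            # Commands not allowed in the current state lead to Error.
            self.state = TRANSITIONS.get((self.state, command), "Error")
        return self.state

level0 = FunctionManager("Level-0",
                         [FunctionManager("Tracker"), FunctionManager("ECAL")])
```

Calling `level0.fire("configure")` followed by `level0.fire("start")` walks the whole tree through the same transitions, which mirrors how a top-level node drives its subsystems.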

46 Control and Monitoring Systems in CMS
XDAQ Framework (C++)
- XDAQ applications control hardware and handle data flow
- Hardware access, transport protocols, XML configuration, SOAP communication, HyperDAQ web server
- Several 1000 applications to control
[Diagram: DAQ Operator and DCS Operator; Detector Control System and Run Control System, with Level-0 above Tracker, ECAL, L1 Trigger, TCDS, DAQ, DQM, communicating via SOAP with XDAQ applications in the XDAQ Online Software; low voltage, high voltage, gas, magnet; Trigger Control and Distribution System, front-end electronics, L1 trigger electronics, sub-detector DAQ electronics, central DAQ electronics, central DAQ event builder, file-based filter farm & storage]

