First attempt at ECS training
Work in progress… A lot of material was borrowed! Thanks!
Objectives
Get familiar with routine operation.
Get familiar with routine problem recovery.
Get familiar with how to work inside a complex, nearly chaotic, highly distributed environment: rules must be followed…
Get familiar with the language.
Avoid details.
After the training you need to study the TWiki documentation… (and possibly contribute to it…).
Warnings
We are probably leaving aside many important things…
Many things are changing… and some will change a lot.
This tutorial is only meant as a broad overview.
The aim is to learn the basics of SD operation, not to learn to develop parts of the ECS…
The other aim is to learn common usage and rules.

What is ECS?
P.C. Burkimsher, PVSS & JCOP Framework Course, May 2006
LHC era Control Technologies
[Diagram, based on an original idea from LHCb. Layer structure: Supervision, Process Management, Field Management. Technologies shown: sensors/devices, field buses & nodes, PLC/UNICOS, OPC, communication protocols, SCADA, VME, DIM, FSM, each marked as commercial or custom. Also shown: experimental equipment, controller/PLC, VME, field bus, LAN, WAN, storage, configuration DB, archives, log files, etc., and other systems (LHC, Safety, ...).]
Clara Gaspar, March 2006
ECS Scope
[Diagram: the Experiment Control System covers the DCS devices (HV, LV, GAS, temperatures, etc.), the detector channels, L0, TFC, the front-end electronics, the readout network, the High Level Trigger, storage and the DAQ, plus the external systems (LHC, Technical Services, Safety, etc.).]
ECS Generic Architecture
[Diagram: a tree of abstract levels reaching down to the devices (HW or SW). Nodes shown: ECS, DCS, DAQ, LHC, T.S., GAS, DSS, DetDcs1 … DetDcsN, DetDaq1, SubSys1 … SubSysN, Dev1 … DevN. Commands flow down; status & alarms flow up.]
Control Units
❚ Each node is able to:
  ❙ Summarize information (for the levels above)
  ❙ “Expand” actions (to the lower levels)
  ❙ Implement specific behaviour & take local decisions
    ❘ Sequence & automate operations
    ❘ Recover errors
  ❙ Include/exclude children (i.e. partitioning)
    ❘ Excluded nodes can run in stand-alone mode
  ❙ User interfacing
    ❘ Present information and receive commands
[Diagram: example tree with nodes DCS, Tracker, Muon, Temp, HV, GAS.]
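The control-unit behaviour listed above (summarize upwards, expand commands downwards, include/exclude children) can be sketched in a few lines of Python. This is only an illustration: the real control units are SMI++ objects, and the class name, the state names and the "worst state wins" rule used here are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical state names, ordered from best to worst.
SEVERITY = ["READY", "NOT_READY", "ERROR"]


@dataclass
class Node:
    """One node of the control tree (illustrative, not the FwFSM API)."""
    name: str
    state: str = "NOT_READY"
    included: bool = True
    children: List["Node"] = field(default_factory=list)

    def active_children(self) -> List["Node"]:
        """Children that are currently included in the partition."""
        return [child for child in self.children if child.included]

    def summarize(self) -> str:
        """Summarize upwards: report the worst state among included children."""
        active = self.active_children()
        if not active:
            return self.state  # leaf, or every child excluded
        return max((child.summarize() for child in active), key=SEVERITY.index)

    def command(self, action: str) -> None:
        """Expand a command downwards; only leaves change their own state."""
        if not self.children:
            if action == "CONFIGURE":
                self.state = "READY"
            elif action == "RESET":
                self.state = "NOT_READY"
            return
        for child in self.active_children():
            child.command(action)
```

With a DCS node holding HV and GAS children, excluding GAS makes DCS report only the state of HV, while GAS keeps its own state and no longer receives commands from DCS.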
Device Units
❚ Device Units
  ❙ Provide the interface to real devices (electronics boards, HV channels, trigger algorithms, etc.)
    ❘ Can be enabled/disabled
    ❘ In order to integrate a device within the FSM:
      〡 Deduce a STATE from device readings (in DPs)
      〡 Implement COMMANDS as device settings
    ❘ Commands can apply previously defined recipes
[Diagram: device node Dev N.]
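As a rough illustration of the two duties of a device unit (deduce a STATE from readings, implement COMMANDS as settings), here is a small Python sketch for a single HV channel. Everything in it is hypothetical: the class, the state and command names, the `v_set`/`v_mon` attributes standing in for datapoints, and the 5 V tolerance.

```python
from typing import Dict, Optional


class HVChannelDevice:
    """Hypothetical device unit for one HV channel (not a real FwFSM type)."""

    def __init__(self, tolerance: float = 5.0) -> None:
        self.enabled = True
        self.v_set = 0.0  # setting: would be written to a datapoint
        self.v_mon = 0.0  # reading: would be read back from a datapoint
        self.tolerance = tolerance  # volts, assumed value

    def deduce_state(self) -> str:
        """Deduce a STATE from the current setting and reading."""
        if not self.enabled:
            return "DISABLED"
        if abs(self.v_mon - self.v_set) > self.tolerance:
            return "RAMPING"
        return "ON" if self.v_set > 0 else "OFF"

    def handle_command(self, command: str,
                       recipe: Optional[Dict[str, float]] = None) -> None:
        """Implement a COMMAND as a change of device settings."""
        if command == "SWITCH_ON":
            if recipe is None:
                raise ValueError("SWITCH_ON needs a recipe with 'v_set'")
            self.v_set = recipe["v_set"]  # apply the previously defined recipe
        elif command == "SWITCH_OFF":
            self.v_set = 0.0
        else:
            raise ValueError(f"unknown command: {command}")
```

The state is never stored: it is recomputed from the readings each time, so it follows the hardware rather than the last command sent.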
The Control Framework
❚ The FwFSM Component is based on:
  ❙ PVSS for:
    ❘ Device Description (Run-time Database)
    ❘ Device Access (OPC, Profibus, drivers)
    ❘ Alarm Handling (Generation, Filtering, Masking, etc.)
    ❘ Archiving, Logging, Scripting, Trending
    ❘ User Interface Builder
    ❘ Alarm Display, Access Control, etc.
  ❙ SMI++ providing:
    ❘ Abstract behavior modeling (Finite State Machines)
    ❘ Automation & Error Recovery (Rule based system)
[Diagram labels: Device Units, Control Units.]
SMI++ Run-time Environment
  ❙ Device Level: Proxies
    ❘ drive the hardware:
      〡 deduceState
      〡 handleCommands
    ❘ C, C++, PVSS ctrl scripts
  ❙ Abstract Levels: Domains
    ❘ Implement the logical model
    ❘ Dedicated language - SML
    ❘ A C++ engine: smiSM
  ❙ User Interfaces
    ❘ For User Interaction
  ❙ All Tools available on:
    ❘ Windows, Unix (Linux)
    ❘ All communications are transparent and dynamically (re)established
[Diagram: hardware devices driven by proxies; objects (Obj) grouped into SMI domains.]
Features of PVSS/SMI++
❚ Error Recovery Mechanism
  ❙ Bottom Up
    ❘ SMI Objects react to changes of their children
      〡 In an event-driven, asynchronous fashion
  ❙ Distributed
    ❘ Each Sub-System recovers its errors
      〡 Each team knows how to recover local errors
  ❙ Hierarchical/Parallel recovery
  ❙ Can provide complete automation even for very large systems
Sub-detector FSM Guidelines
❚ Started defining naming conventions.
❚ Defined standard “domains” per sub-detector:
  ❙ DCS
    ❘ DCS Infrastructure (cooling, gas, temperatures, pressures, etc.) that is normally stable throughout a running period
  ❙ HV
    ❘ High voltages or, in general, components that depend on the status of the LHC machine (fill related)
  ❙ DAQ
    ❘ All electronics and components necessary to take data (run related)
  ❙ DAQI
    ❘ Infrastructure necessary for the DAQ to work (computers, networks, electrical power, etc.), in general also stable throughout a running period
❚ And standard states & transitions per domain.
❚ Doc available in EDMS:
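Hierarchy & Conf. DB
[Diagram: ECS at the top, with the nodes Infrast., DCS, HV, DAQI, DAQ, L0, TFC, HLT and LHC below it. Each sub-detector contributes one node per domain (VELO DCS, VELO HV, VELO DAQI, VELO DAQ; MUON DCS, MUON HV, MUON DAQI, MUON DAQ). These split further (VELO DCS_1, VELO DCS_2, VELO DAQ_1, VELO DAQ_2) down to the devices VELO Dev1 … VELO DevN. On Configure/mode=“PHYSICS” a device gets the “PHYSICS” settings from the Conf. DB and applies them.]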
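The Configure/mode=“PHYSICS” step (get the “PHYSICS” settings, then apply them) amounts to a recipe lookup keyed by mode. A minimal Python sketch follows, with a plain dictionary standing in for the Conf. DB; the device names echo the diagram, while the `threshold` setting and all values are invented for the example.

```python
from typing import Dict

Settings = Dict[str, float]

# Stand-in for the Conf. DB: recipes keyed by mode, then by device name.
# Setting names and values are invented for illustration.
CONF_DB: Dict[str, Dict[str, Settings]] = {
    "PHYSICS": {
        "VELO_Dev1": {"threshold": 12.0},
        "VELO_DevN": {"threshold": 15.0},
    },
    "TEST": {
        "VELO_Dev1": {"threshold": 1.0},
        "VELO_DevN": {"threshold": 1.0},
    },
}


def configure(devices: Dict[str, Settings], mode: str) -> None:
    """Get the settings for `mode` and apply them to each known device."""
    recipe = CONF_DB[mode]  # raises KeyError for an unknown mode
    for name, current in devices.items():
        # Devices without a recipe entry are left untouched.
        current.update(recipe.get(name, {}))
```

Because the command carries only the mode name, the same Configure command can be propagated down the whole tree and each device picks up its own settings.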
What is JCOP?
JCOP stands for “Joint Controls Project”.
Grouping of representatives from the 4 big LHC experiments.
Aims to reduce the overall manpower cost required to produce and run the experiment control systems.
What is JCOP Framework?
A layer of software components
–Produced in collaboration, components shared
–Produced using common tools, components that work together
What is PVSS?
The Supervisory Control And Data Acquisition (SCADA) system chosen by JCOP.
–In-depth evaluation of products available (commercial or open-source)
–JCOP (i.e. the experiments, i.e. you) chose PVSS
–Commercial product from ETM, Austria
–Since then, PVSS has been widely adopted across CERN, not just used by the experiments
PVSS is a TOOL, not a control system!
–You have to build your own system
What is PVSS (cont.)?
PVSS II has capabilities for:
–Device Description: Data Points and Data Point items
–Device Access: OPC, ProfiBus, Drivers
–Alarm Handling: Generation, Masking, etc.
–Alarm Display, Filtering, Summarising
–Archiving, Trending, Logging
–User Interface Builder
–Access Control
What is PVSS not?
PVSS II does not have tools specifically for:
–Abstract behaviour modelling: Finite State Machines
–Automation & Error Recovery: Expert System
But…
–FSM (SMI++) does
PVSS
PVSS Features
❚ Open Architecture
  ❙ We can write our own managers
    ➨ It can be interfaced to anything (FSM, DIM)
❚ Highly Distributed
  ❙ 130 Systems (PCs) tested
    ➨ No major problem found
❚ Standard Interface
  ❙ All data of all sub-systems defined as DataPoints!
What is FSM?
❚ Finite State Machine (FSM)
  ❙ Abstract representation of your experiment. What state is it in? Is it taking data? Is it in standby? Is it broken? Is it switched off? What triggers it to move from one of these states to another?
  ❙ JCOP chose the State Management Interface (SMI++) developed for the DELPHI experiment.
  ❙ SMI = tool to build an FSM + Expert system. Vital for controlling & recovering large experiments
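The questions on this slide (which state is it in, what makes it move to another) map directly onto a transition table. The sketch below is plain Python, not SML, and the state and event names are hypothetical; it only shows the idea of states, events and allowed transitions.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical states and events; a real SMI++ domain is written in SML.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("OFF", "SWITCH_ON"): "STANDBY",
    ("STANDBY", "START_RUN"): "TAKING_DATA",
    ("TAKING_DATA", "STOP_RUN"): "STANDBY",
    ("STANDBY", "SWITCH_OFF"): "OFF",
    ("ERROR", "RECOVER"): "STANDBY",
}


def next_state(state: str, event: str) -> str:
    """Return the state reached from `state` when `event` occurs."""
    if event == "FAULT":
        return "ERROR"  # a fault is accepted from any state
    # Events that are not allowed in the current state are ignored.
    return TRANSITIONS.get((state, event), state)


def run(events: Iterable[str], state: str = "OFF") -> str:
    """Feed a sequence of events to the machine and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```

Ignoring events that are not valid in the current state is one way to make the machine mistake-protected: START_RUN sent to a switched-off system simply does nothing.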
Implementation of the ECS
A mixed Win/Linux cluster, with shared resources (network disks, via SAMBA).
PCs:
–Controls PC: used to directly control some device.
–Control Room consoles: used to connect to the controls PCs.
–General servers: gateways to the external world, etc…
The mixed cluster means: you need to master the basics of both Win and Linux.
Interfacing the HW:
–CCPC (Credit Card PC), Linux, integrated in the cluster; local intelligence on electronics boards: UKL1 and HV.
–SPECS system (in radiation areas): Antonis.
Computing Environment at IP8 Access via the gateways (lbgw for Linux, lbts for Windows). The LHCb gateways are only visible from inside the CERN network/firewall. Users have personal logins on the LHCb network. Online administrators: The login and all computing infrastructure is common across both Linux (including CCPC) and Windows. Note that from inside the LHCb network the external world is not, in general, accessible.
Computing Environment at IP8 There is an area set aside for common RICH software: /group/rich/ and G:\rich respectively. Group-wide login profile for the Linux systems at /group/rich/scripts/rich_login.sh See TWiki for file protection issues….(important). The group area must only be used for files used for running the detectors!
Remote Access to ECS PC After logging into the LHCb network, any ECS PC can be accessed as follows. Windows to Windows: use remote desktop. Linux to Linux: use ssh, X sessions are not yet enabled (???) on the ECS PC. Windows to Linux (including CCPCs): –start the Exceed X server on the local PC; default options are normally ok: mode: passive, security: any host access, display: multiple plus display in localhost; –logon via ssh with PuTTY; enable: X11 forwarding and X display location = localhost.
Other
The oper folder in the group area contains a lot of useful shortcuts for common operations.
Generic rich_shift account: must only be used when logging on to the consoles in the control room. It will be treated as scratch: for example, files stored by this user can be deleted at any time.
I strongly suggest that everybody use their own account…
Which tools? Web Console (healthiness of software components). FSM panel (routine operation). ECS manager panel (routine debugging). Expert on-call (routine problem fixing…). Logbook (identify yourself only using your account!). When everything else fails …
Which tools? Carmelo!
Routine Checks/Operations
Such a complex system needs daily babysitting…
–many routine checks must be carried out, to identify and/or try to prevent problems.
A routine check-list is to be defined…
Everything relevant must be precisely written in the logbook: this might save you time next time and will certainly save somebody else time…
Write the issue, write the fix!
Every problem must be reported to the appropriate list of people.
Warnings
Always be very careful: in a distributed system non-local effects may happen!
PVSS implementation
Distributed system across Win/Linux: some PVSS projects run on Windows, some on Linux (all CCPC-related).
Projects are installed on local disks: L:\pvvs | /localdisk/pvss.
FW and RICH components are installed in the group area.
PVSS projects run as system services (Win only, so far).
The basic process is PVSS00pmon: check via TaskManager | ps.
PVSS basically runs in the background: connect to it!
Beware: PVSS is everywhere, so every problem will show up in PVSS; this does not mean that there is a problem with PVSS!
PVSS console: shows the managers and allows controlling them.
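On a Linux controls PC, the "check via ps" step can be scripted. The sketch below is only a convenience wrapper around the standard `ps` command; the function names are made up for the example, and the simple substring match is an assumption that may also catch unrelated command lines mentioning PVSS00pmon.

```python
import subprocess


def pmon_running(ps_output: str, process_name: str = "PVSS00pmon") -> bool:
    """Return True if any line of a process listing mentions process_name."""
    return any(process_name in line for line in ps_output.splitlines())


def list_processes() -> str:
    """Return the command lines of all running processes (Linux only)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `pmon_running(list_processes())` then tells whether a PVSS process monitor is alive on the local machine; it says nothing about which project it belongs to or whether its managers are healthy.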
The components of ECS Sub-Systems –DCS MONITORING –DCS LV and SiBias –HV –DAQ L0 –DAQ L1 –FSM –Configuration DB –Conditions DB Interface to Gas, Cooling&Ventilation, DSS, Magnet.
ECS operation Distributed system: all systems can talk together and exchange data. Can do many (but not - yet - all) operations from a single machine: no need to log on the Controls PC (there are still currently many limitations!).
Some PVSS-related operations
RICH-ECS web panel (Mozilla) - slide -
PVSS Web Console
Normal Operations are handled via the FSM view: Antonis
Normal Debugging (also routine debug operations) is done via the ECS-Manager panels: local/remote functions useful for debugging… It complements and integrates the FSM panels; it is intended more for easy and quick access to a number of functions and tools required outside routine operation and for debugging. - slide -
A miscellany of panels
Normal Operation: the FSM tree
See Antonis.
Used for routine operation:
–Everything must be accessible by navigating the tree.
–Everything shall go via simple FSM commands.
–To be used by LHCb shifters also: simple, clear, robust and mistake-protected.
–Normal operations, including error recovery, must not require the operator to navigate the tree or perform any complex actions.
DSS info
? Not everything is done, nor final, nor bug-free/perfect. We need to exercise and stress the system to spot problems which cannot be seen at the current stage… Many things need to be finalized and the system must be stress-tested. Reaction to alarm situations not yet complete. Documentation not yet complete.
To do after! All in twiki: study
The HV control
CCPC program:
–log onto the CCPC;
–type HVSetup;
–follow the message (after having studied the instructions in TWiki).
The PVSS interface…
HV PVSS Controls
The interface to the HW is done by the CCPC program; the PVSS project is only a flexible interface to the CCPC program.
A first production version of the PVSS controls is available at the pit:
–Monitoring of the CCPC data and the ELMB voltage measurements;
–Full control of the CCPC:
  single channel control;
  all channels control via the FSM and recipes: TEST / COMMISSIONING / PHYSICS…
–Many trace plots…
Warnings
If you make changes via the CCPC program, PVSS gets confused: it does not (yet) receive read-back settings.
The FSM states are not always (yet) properly evaluated: treat them with care and report issues:
–I am trying to take care of a lot of information…
–No real test outside the pit is good enough…
WARNING means: I have contradictory information, keep watching; it is often a temporary state.
Always read TWiki for updates…
Make sure not to confuse:
–The ISEG channel (0-19);
–The physical column (which the ELMB monitoring refers to).
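To make the meaning of WARNING concrete, here is a hedged Python sketch of how two independent sources could be combined into one state. It is not the actual PVSS evaluation logic: the function name, the inputs and the 50 V threshold are assumptions chosen only to show "contradictory information gives WARNING".

```python
def evaluate_hv_state(ccpc_says_on: bool, elmb_voltage: float,
                      on_threshold: float = 50.0) -> str:
    """Combine the CCPC status with the ELMB voltage measurement.

    on_threshold (volts) is an assumed value above which the ELMB
    reading is taken to mean that the column is powered.
    """
    elmb_says_on = elmb_voltage > on_threshold
    if ccpc_says_on == elmb_says_on:
        return "ON" if ccpc_says_on else "OFF"
    # The two sources disagree, e.g. during ramping or after a manual change.
    return "WARNING"
```

A disagreement is expected for a short time while a channel ramps, which is why WARNING is often a temporary state; if it persists, it is worth reporting.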
HV Controls: automatic actions
The CCPC server will switch off in case of OvCurr.
The CCPC server will switch off in case of (UnCurr, OvVolt, UnVolt).
Other actions must be coordinated by PVSS, if they need information not available to the CCPC.
Currently: PVSS gets information from the ELMB monitoring.
[Diagram: FSM objects per column. Col_0 with HV_0, EM_0, AL_0 and HW; Col_1 with HV_1, EM_1, AL_1 and HW; plus HV and EM.]
Very simple objects with simple functions.
Avoid making more complex Device Units and objects to introduce alarm handling.
TWiKi Link