Gaetano Maron, CPT week, CERN, 16 April RCS Discussion

RCS Layout – I
[Diagram labels: RCMS; UI; Session Manager; Services Connection; Security Service; Resource Service; Info&Mon Service; Job Ctrl; Problem Solver; UserDB; ConfDB; LogDB; Run Bkkpng; Sub-System Controller; Sub-System (DAQ) Resources]

RCS Layout – II
[Diagram labels: RCMS; UI; Session Manager; Services Connection; Services; EVB Ctrl; CS Ctrl; TRG Ctrl; DCS Ctrl; EVF Ctrl; FED Builder; RU Builder; CS Sub-System; DCS Sub-System; EVB Sub-System; TRG Sub-System (Glbl, Mu, Cal); EVF Sub-System]

Open Issues
–DCS (PVSS)
  components under control
  DBs to store information (e.g. HV default values, etc.)
  interface to RCS
–Run mode and Conditions DB
–Problem Solver and aTTS
–Monitor Procedures

Example: Configuring a DTR column (present understanding)
[Diagram: readout chain DT Chmbr, Mini Crate, ROS, FED, FRLC, FED Builder, RU, EVM, RU Bld, BU, EVF. Settings shown along the chain: HV, P, FE Thr, TDC Time Wdw, oper mode, TRG BRD, Merge MCs, buffer setup, en/dis chnls, Merge ROSs, en/dis chnls, link setup, D2S links setup, en/dis links, switch setup, RU setup, EVM setup, switch setup, EVF setup, geometry, trigger tbls, calibrations. Controllers: Session Manager, DCS Ctrl, EVF Ctrl, EVB Ctrl, RUB Ctrl, FEDB Ctrl. Databases and services: PVSS/EXT DB, Resource Service, Conditions DB]

PVSS Access
[Diagram: RCS, PVSS GTW, PVSS NET. First exchange: "get par x" / "par x", a) to monitor conditions, b) to dump conditions. Second exchange: "subscribe alarms" / "Time, alarm"]
Partition
–DCS partition definition is a PVSS task
–RCS can ask for the partition list and select one
Monitor
–RCS asks for the parameters to be monitored
Dump conditions
–RCS subscribes (?) to receive PVSS alarms
–PVSS-GTW propagates the alarms to RCS when they occur
–Any global dump on request (e.g. before and after a run)? Maybe to the Conditions DB?
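
The two access patterns on this slide (request a parameter, subscribe to alarms) can be sketched roughly as follows. This is only an in-memory stand-in: `PvssGatewayStub`, its method names, the partition name and the HV number are all hypothetical illustrations, not a real PVSS or RCS API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Callback signature for an alarm: (time, alarm text), as on the slide.
AlarmHandler = Callable[[float, str], None]


@dataclass
class PvssGatewayStub:
    """Hypothetical stand-in for the PVSS-GTW; not a real PVSS API."""
    # partition name -> {parameter name -> current value}
    partitions: Dict[str, Dict[str, float]] = field(default_factory=dict)
    handlers: List[AlarmHandler] = field(default_factory=list, repr=False)

    def list_partitions(self) -> List[str]:
        # RCS can ask for the partition list and select one.
        return sorted(self.partitions)

    def get_parameter(self, partition: str, name: str) -> float:
        # "get par x" -> "par x"
        return self.partitions[partition][name]

    def dump_conditions(self, partition: str) -> Dict[str, float]:
        # Global dump on request, e.g. before and after a run.
        return dict(self.partitions[partition])

    def subscribe_alarms(self, handler: AlarmHandler) -> None:
        self.handlers.append(handler)

    def raise_alarm(self, timestamp: float, alarm: str) -> None:
        # The gateway propagates each alarm to every subscriber.
        for handler in self.handlers:
            handler(timestamp, alarm)


# Usage with made-up values.
gateway = PvssGatewayStub(partitions={"DT": {"HV": 3600.0}})
received = []
gateway.subscribe_alarms(lambda t, a: received.append((t, a)))
gateway.raise_alarm(12.0, "HV trip")
```

Whether alarms are pushed (as here) or polled is exactly the open question marked "(?)" on the slide; the sketch only shows the push variant.
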

Running DT C
[Diagram: readout chain DT Chmbr, Mini Crate, ROS, FED, FRLC, FED Builder, RU, EVM, RU Bld, BU, EVF. Monitored quantities shown along the chain: alarms on detector conditions, trends, statistics, TDC errs, chip T, buff ovfl, alignment, links status, rates, error rates, buffer occupancy, appl status, perf x port, rej rate. Controllers: Session Manager, DCS Ctrl, EVF Ctrl, EVB Ctrl, RUB Ctrl, FEDB Ctrl]
Run Condition: errors, status, parameters over/under thresholds, etc.
Conditions DB
–Time stamp
–Condition resolution time?
–Condition versions (Babar)?
Off-line DB: geometry, calibration constants
PVSS/EXT DB: detector conditions
Information Service: daq & trigger conditions + monitor?

Conditions DB
Usually (e.g. Babar) the Conditions DB is meant to store:
–detector alignments
–calibration constants
–time-dependent parameters under which the experimental events are taken and that are necessary for the reconstruction and analysis of the raw data
The stored conditions are time based (time stamp)
–Babar has a time resolution of 1 second
If we agree on the general principle that
–all changes in any detector, daq or trigger component are logged at all times
–only severe errors lead to stopping the system (e.g. system efficiency less than 50%)
then the Conditions DB should also contain (at least logically) parameters describing the time behavior of the daq and trigger systems
–off-liners should be able to easily extract the needed view (detectors, daq, triggers) to properly reconstruct a given set of events, possibly using the same API and the same query languages
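
A minimal sketch of what "time based" storage with one API for detector, daq and trigger views could look like. `ConditionsStore`, the parameter name and the values are hypothetical; a real Conditions DB would sit on a database, not on in-memory lists.

```python
import bisect
from collections import defaultdict


class ConditionsStore:
    """Hypothetical time-stamped store; one API for all systems."""

    def __init__(self, resolution_s=1):
        self.resolution_s = resolution_s      # e.g. 1 second, as quoted for Babar
        self._times = defaultdict(list)       # (system, name) -> sorted time bins
        self._values = defaultdict(list)      # (system, name) -> values per bin

    def _quantize(self, t):
        # Round a time stamp down to the start of its resolution bin.
        return int(t // self.resolution_s) * self.resolution_s

    def record(self, system, name, t, value):
        key = (system, name)
        q = self._quantize(t)
        times = self._times[key]
        i = bisect.bisect_left(times, q)
        if i < len(times) and times[i] == q:
            self._values[key][i] = value      # same bin: last write wins
        else:
            times.insert(i, q)
            self._values[key].insert(i, value)

    def value_at(self, system, name, t):
        # Return the most recent value recorded at or before time t.
        key = (system, name)
        times = self._times.get(key, [])
        i = bisect.bisect_right(times, self._quantize(t)) - 1
        return None if i < 0 else self._values[key][i]

    def view(self, system, t):
        # All parameters of one system ("detector", "daq", "trigger") at time t.
        result = {}
        for (s, name) in self._times:
            if s == system:
                value = self.value_at(s, name, t)
                if value is not None:
                    result[name] = value
        return result


# Usage with made-up values.
store = ConditionsStore(resolution_s=1)
store.record("daq", "ru_buffer_occupancy", 10.2, 0.3)
store.record("daq", "ru_buffer_occupancy", 15.9, 0.8)
```

With this layout, `store.value_at("daq", "ru_buffer_occupancy", 12)` returns the value logged at second 10, which is the "condition valid when the event was taken" lookup an off-line user needs.
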

DAQ Run Conditions (various)
General DAQ run conditions (referred to a given daq application) are:
–Performance (input, output). Upper/lower thresholds should be set to identify under- and over-performance situations, to be reported to RCS
–Buffer occupancy. Over-threshold situations to be reported to RCS
–Status. Status changes to be reported to RCS
–Heartbeat. An independent application heartbeat is necessary to keep the map of the "still alive" applications. This could avoid deadlocks. Application crashes to be reported to RCS
Hardware (e.g. PCs) hosting daq applications should also be pinged by a heartbeat to keep the map of the "still alive" PCs (farm components, etc.). PC crashes to be reported to RCS
PCs hosting daq applications should also be controlled/monitored from both the hardware point of view (T, fans, etc.) and the OS point of view (memory used, CPU used, disks, etc.). Critical situations to be reported to RCS
[Diagram labels: DAQ Appl.; DAQ Appl.; PC; Heartbeat App.; Appl. Conditions; Heartbeat; Hardware PC/OS Conditions]
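
The threshold check and the "still alive" map described above could be sketched like this. `Thresholds`, `HeartbeatMonitor`, the application names and the timeout are all hypothetical; time stamps are passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Hypothetical upper/lower limits for one monitored quantity."""
    low: float
    high: float

    def classify(self, value):
        # "under" / "over" are the situations to be reported to RCS.
        if value < self.low:
            return "under"
        if value > self.high:
            return "over"
        return "ok"


class HeartbeatMonitor:
    """Hypothetical tracker of the last heartbeat seen per application or PC."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._last_seen = {}

    def beat(self, name, now):
        # Called whenever a heartbeat from `name` arrives at time `now`.
        self._last_seen[name] = now

    def alive_map(self, now):
        # Map of "still alive" components: True if heard within the timeout.
        return {name: (now - seen) <= self.timeout_s
                for name, seen in self._last_seen.items()}

    def crashed(self, now):
        # Components whose heartbeat is overdue; candidates to report to RCS.
        return sorted(n for n, ok in self.alive_map(now).items() if not ok)


# Usage with made-up names and numbers.
monitor = HeartbeatMonitor(timeout_s=5.0)
monitor.beat("RU-01", now=100.0)
monitor.beat("BU-07", now=103.0)
rate_limits = Thresholds(low=10.0, high=100.0)
```

At `now=107.0`, `monitor.crashed(107.0)` lists only `RU-01`, since its last heartbeat is 7 seconds old while `BU-07` was heard 4 seconds ago.
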

Problem Solver
[Diagram labels: Information Service; Problem Solver; aTTS; alarms; malfunctioning; recovering actions]
Some examples are needed to understand how this could work
How to implement it?
–Tools? It is a sort of expert system
–Languages?
Atlas experience with their problem solver?
–They use CLIPS, a tool to build expert systems
–Jess (Java expert system). Does it fit our needs?
Recovering actions
–recovering could involve resetting a daq component
–a reset could require blocking the system for a while (aTTS)
[Diagram labels: (aTTS) fast reset of the electronics; reset the failed component; align its status to the system status; restart the component; restart the system; (aTTS) fast reset]
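
To make the "sort of expert system" idea concrete, here is a very small rule table mapping an alarm to an ordered list of recovery actions. It is plain Python, not CLIPS or Jess; the rule names, the alarm fields, the grouping of actions per rule and the fallback action are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Alarm = Dict[str, str]   # hypothetical alarm record, e.g. {"kind": "crash"}


@dataclass
class Rule:
    name: str
    matches: Callable[[Alarm], bool]   # condition part of the rule
    actions: List[str]                 # ordered recovery actions


# Hypothetical rules; action texts follow the slide, their grouping is assumed.
RULES = [
    Rule(
        name="component crash",
        matches=lambda a: a.get("kind") == "crash",
        actions=[
            "block the system (aTTS)",
            "reset the failed component",
            "align its status to the system status",
            "restart the component",
            "restart the system (aTTS)",
        ],
    ),
    Rule(
        name="electronics fault",
        matches=lambda a: a.get("kind") == "electronics",
        actions=["fast reset of the electronics (aTTS)"],
    ),
]


def solve(alarm: Alarm, rules: List[Rule] = RULES) -> List[str]:
    # First matching rule wins; unmatched alarms fall back to a
    # (hypothetical) hand-over to the operator.
    for rule in rules:
        if rule.matches(alarm):
            return rule.actions
    return ["report to operator"]
```

A real expert-system shell adds pattern matching over many facts and rule chaining; this sketch only shows the alarm-to-action mapping that any such tool would have to encode.
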

RCS plans
Complete the prototype based on Tomcat and deliver it to:
–Small daq systems
–Small test environments
Set up a testbed with a dedicated 16x16 cluster to figure out:
–Performance in command propagation
–Performance in collecting information into IMS
Investigate new native XML DBs (in particular Oracle 9i vers. 2)
Start to play with Web Services