Published by Brook Powell. Modified over 8 years ago.
Trigger Supervisor Monitoring & Alarms Workshop, 2008
Christos Lazaridis, Marc Magrans de Abril, Ildefons Magrans de Abril
TS Monitoring & Alarms Workshop XXXXX, 2008
Outline
– Logs
– Alarms
– Monitoring
– Severity levels
– If an error occurs
– LogCollector & Chainsaw
– Summary
– Workshop agenda
Logs architecture
[Architecture diagram. Labels: pulser, WSEventing, Subsystem group, Cell, Worker, SimpleXdaq, Supervisor, Collector, TStore, CMS_OMDS_LB, xmas::store, LAS general, exception, Dashboard, Log Collector, log, Chainsaw, Cmsrc-trigger, /tmp]
Logs
Logging should be treated as 'cout' statements
– Not shifter-oriented
– Include information for developing/debugging
5 logging levels/macros to choose from:
– DEBUG
– INFO
– WARN
– ERROR
– FATAL
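The five levels above can be illustrated with a minimal sketch of threshold-based filtering. The names `LogLevel`, `levelName`, and `SketchLogger` are hypothetical stand-ins invented for this example; they are not the log4cplus or Trigger Supervisor API, and real cell code would use the logging macros instead.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for a logging library (not the log4cplus API).
// The five levels follow the slide, ordered from least to most severe.
enum class LogLevel { Debug, Info, Warn, Error, Fatal };

inline const char* levelName(LogLevel level) {
    switch (level) {
        case LogLevel::Debug: return "DEBUG";
        case LogLevel::Info:  return "INFO";
        case LogLevel::Warn:  return "WARN";
        case LogLevel::Error: return "ERROR";
        case LogLevel::Fatal: return "FATAL";
    }
    return "UNKNOWN";
}

// Records only messages at or above the configured threshold.
class SketchLogger {
public:
    explicit SketchLogger(LogLevel threshold) : threshold_(threshold) {}

    // Returns true if the message was recorded, false if it was filtered out.
    bool log(LogLevel level, const std::string& message) {
        if (level < threshold_) return false;
        out_ << levelName(level) << " " << message << "\n";
        return true;
    }

    // Everything recorded so far, one message per line.
    std::string contents() const { return out_.str(); }

private:
    LogLevel threshold_;
    std::ostringstream out_;
};
```

With a threshold of `LogLevel::Warn`, a `Debug` message carrying developer detail is dropped while an `Error` message is kept, which is one way to keep verbose debugging output out of production logs.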
Alarms architecture
[Architecture diagram. Labels: pulser, WSEventing, Subsystem group, Cell, Worker, SimpleXdaq, Supervisor, Collector, TStore, CMS_OMDS_LB, xmas::store, LAS general, exception, Dashboard, Log Collector, log, Chainsaw, Cmsrc-trigger, /tmp]
Alarms
Alarm messages are propagated to the subsystem sentinel dashboard
– Common L1Trigger setup needed for central display
Alarms exist to inform the trigger shifter
– Clear information
– Alarm cause/possible actions
Can be raised:
– During configuration
– From monitorable items
DataSource / Periodic DataSource alarms: planned
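As a rough illustration of an alarm that carries a clear cause and a possible action, the sketch below raises one from a monitorable value. Everything here is an assumption made for the example: the `Alarm` struct, the `checkLinkErrors` function, the link-error counter, and the message texts are hypothetical and do not come from the Trigger Supervisor framework.

```cpp
#include <cassert>
#include <string>

// Hypothetical alarm record; the fields mirror the slide's guidance
// (clear information, alarm cause, possible actions).
struct Alarm {
    bool raised;           // false means no alarm condition was found
    std::string severity;  // free-form severity string
    std::string cause;     // what went wrong, in shifter-readable terms
    std::string action;    // what the shifter can do about it
};

// Hypothetical check on a monitorable item: raise an alarm when an
// error counter exceeds its limit, otherwise report nothing.
Alarm checkLinkErrors(unsigned errorCount, unsigned limit) {
    if (errorCount <= limit) {
        return Alarm{false, "", "", ""};
    }
    return Alarm{true,
                 "warning",
                 "Link error count " + std::to_string(errorCount) +
                     " exceeds limit " + std::to_string(limit),
                 "Contact the subsystem expert"};
}
```

Keeping cause and action as separate fields is a design choice of this sketch; it makes it harder to raise an alarm that tells the shifter what happened but not what to do.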
Monitoring architecture
[Architecture diagram. Labels: pulser, WSEventing, Subsystem group, Cell, Worker, SimpleXdaq, Supervisor, Collector, TStore, CMS_OMDS_LB, xmas::store, LAS general, exception, Dashboard, Log Collector, log, Chainsaw, Cmsrc-trigger, /tmp]
Monitoring
Retrieve and publish the status of trigger subsystems
– Typed metrics
– System retrofitting
DataSource (uses xdaq pulser)
– Pull mode: automatic refresh method; data always sent to flashlist & DB
– Push mode: no autorefresh!
Periodic DataSource (push mode; uses own timer)
– Reduce rate to DB
– Periodic hardware checks
– Children Cells status
Push-mode example:
    // Get the cell context and its monitoring data source.
    CellContext* c = dynamic_cast<CellContext*>(getContext());
    MonitorSource* monSource = c->getDataSource();
    // Update an item that is not auto-refreshed, then push it explicitly.
    const std::string item = "itemNonAutoRefresh";
    monSource->put(item, xdata::String("a value"));
    monSource->push(item);
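The difference between storing a value and pushing it, and the rate reduction a periodic source provides, can be sketched with self-contained mock classes. `SketchMonitorSource` and `SketchPeriodicPublisher` are hypothetical names for this illustration only; they imitate the `put`/`push` calls on the slide but are not the framework's `MonitorSource`, and the `tick()` call stands in for whatever timer the real Periodic DataSource uses.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a monitoring source (not the TS framework class).
class SketchMonitorSource {
public:
    // Store the latest value of an item without publishing it.
    void put(const std::string& item, const std::string& value) {
        values_[item] = value;
    }

    // Push mode: publish the stored value of one item on explicit request.
    // Unknown items are ignored.
    void push(const std::string& item) {
        auto it = values_.find(item);
        if (it != values_.end()) {
            published_.push_back(it->first + "=" + it->second);
        }
    }

    // Everything published so far, in order.
    const std::vector<std::string>& published() const { return published_; }

private:
    std::map<std::string, std::string> values_;
    std::vector<std::string> published_;
};

// Hypothetical periodic publisher: pushes only on every `period`-th tick,
// which reduces the rate of updates reaching the database.
class SketchPeriodicPublisher {
public:
    SketchPeriodicPublisher(SketchMonitorSource& source, std::string item,
                            unsigned period)
        : source_(source), item_(std::move(item)), period_(period), ticks_(0) {
        assert(period_ > 0);  // a zero period would divide by zero in tick()
    }

    // Called once per timer tick; publishes when the period has elapsed.
    void tick() {
        if (++ticks_ % period_ == 0) source_.push(item_);
    }

private:
    SketchMonitorSource& source_;
    std::string item_;
    unsigned period_;
    unsigned ticks_;
};
```

In this sketch a `put` alone publishes nothing, which mirrors the "no autorefresh" warning for push mode: the value only becomes visible once `push` is called, either directly or from the periodic publisher.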
Severity Levels
Various logging/alarm levels
The severity of a message sent to the alarms dashboard can be defined through an arbitrary string
If an error occurs...
Log the error with appropriate severity
Report to the sentinel dashboard
If it happens in a CellCommand/CellOperation descendant:
– Can and should be handled there
– Return a reply with a warning message of the same level
Anywhere else:
– Throw an exception
Code shown on the slide (callout: "Exception string"):
    // Report to the sentinel dashboard.
    XCEPT_DECLARE(tsexception::MonitoringError, e, "Monitorable in WARN");
    getContext()->getCell()->notifyQualified("warning", e);
    // Log the error.
    LOG4CPLUS_WARN(getLogger(), "Monitorable in WARN");
    // Set the warning message and level of the reply.
    getWarning().setMessage("Monitorable in WARN");
    getWarning().setLevel(tsframework::CellWarning::WARNING);
    // Throw an exception.
    XCEPT_RAISE(tsexception::CellException, "Monitorable in WARN");
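The two branches of the rule above (handle inside a command or operation, throw anywhere else) can be sketched with plain C++ stand-ins. `SketchReply`, `SketchReporter`, `runCommand`, and `checkElsewhere` are hypothetical names for this illustration; the real code would use the framework's logging, notification, warning, and exception facilities rather than `std::runtime_error`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reply of a command, carrying an optional warning.
struct SketchReply {
    std::string payload;
    std::string warningMessage;
    std::string warningLevel;  // empty when there is no warning
};

// Hypothetical sink that records log lines and dashboard notifications.
struct SketchReporter {
    std::vector<std::string> logLines;
    std::vector<std::string> alarms;

    void log(const std::string& level, const std::string& msg) {
        logLines.push_back(level + ": " + msg);
    }
    void notify(const std::string& severity, const std::string& msg) {
        alarms.push_back(severity + ": " + msg);
    }
};

// Inside a command-like handler: log, report, handle the problem locally,
// and return a reply whose warning has the same level as the report.
SketchReply runCommand(bool monitorableOk, SketchReporter& reporter) {
    SketchReply reply{"done", "", ""};
    if (!monitorableOk) {
        const std::string msg = "Monitorable in WARN";
        reporter.log("WARN", msg);
        reporter.notify("warning", msg);
        reply.warningMessage = msg;
        reply.warningLevel = "warning";
    }
    return reply;
}

// Anywhere else: log, report, then throw so that the caller must react.
void checkElsewhere(bool monitorableOk, SketchReporter& reporter) {
    if (monitorableOk) return;
    const std::string msg = "Monitorable in WARN";
    reporter.log("WARN", msg);
    reporter.notify("warning", msg);
    throw std::runtime_error(msg);
}
```

Both paths log and report before doing anything else, so the log and the dashboard stay consistent whichever way the error is finally surfaced.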
LogCollector & Chainsaw
Log Collector should be used for logging: few subsystems do...
– Central Cell, GCT, RCT, CSCTF, ECAL
– Recipe to send logs to persistent storage: https://savannah.cern.ch/cookbook/?func=detailitem&item_id=168
– Postmortem reports: plain logfiles are overwritten
– Chainsaw during operations: message filtering
– Python script to convert XML logfiles to human-readable format: http://triggersupervisor.cern.ch/uploads/api_docs/v1.6/logreader.py
[Image: Chainsaw]
Summary
Logs and Alarms have different orientation
– Logs to debug the subsystem
– Alarms to inform the shifter
Monitoring information
– Report current status
– Raise alarms
– Retrofit the system
Abide by the severity levels guidelines
– Uniform definitions will help avoid confusion
Workshop agenda
Error handling in a transition
Push mode in CellOperation
Monitoring persistency (DataSource & pulser)
– Reducing rate using a Periodic DataSource
Checking children cell status