XDAQ Error Handling
J. Gutleber, L. Orsini
OLSWG Meeting, 2005 March 15
Topics
  Exceptions in XDAQ
  Local Exceptions
  System Wide Error Handling
  Error Handling Contexts
  Error Handling Use Cases
  Uniform Error Message Format
  Programming Approaches
  Dealing with Errors in XDAQ
  Distributed Error Handling
  An Example
XDAQ III Topical Meeting
Exceptions in XDAQ
  An error/exception is characterized by
    an identifier (what has happened)
    an originator (who and where)
    a time (when)
    the context (who is in charge of handling)
  All exceptions are treated as software objects
  All exception types are defined and hierarchically organized
    e.g. C++ class hierarchy
  Exceptions can be detected in a particular portion of code
    either the exception is handled and recovered locally
    or the exception must be notified to an external entity for handling
    finally, an exception that cannot be handled is reported to the user
  Tools must be available for
    error detection
    error notification
    error reporting
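The choice between local recovery and notification to an external entity can be sketched as follows. This is a minimal illustration, not XDAQ code: `BufferFull`, `Outcome` and `runGuarded` are hypothetical names, and the "external entity" is represented by a plain callback.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical recoverable error type, used only for this illustration.
struct BufferFull : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// What happened to one guarded operation.
enum class Outcome { Ok, RecoveredLocally, Notified };

// Runs `op`. A BufferFull error is recovered locally: `recover` is called and
// `op` is retried once. Any other std::exception, or a failure of the retry,
// is passed to `notify`, which stands in for the external handling entity.
inline Outcome runGuarded(const std::function<void()>& op,
                          const std::function<void()>& recover,
                          const std::function<void(const std::string&)>& notify) {
    try {
        op();
        return Outcome::Ok;
    } catch (const BufferFull&) {
        recover();
        try {
            op();
            return Outcome::RecoveredLocally;
        } catch (const std::exception& e) {
            notify(e.what());
            return Outcome::Notified;
        }
    } catch (const std::exception& e) {
        notify(e.what());
        return Outcome::Notified;
    }
}
```

Which errors count as locally recoverable is a policy decision; here it is hard-wired to a single type only to keep the sketch short.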
Local Exceptions
  [Figure: software layers 0 to 3, with arrows labelled "method exceptions", "handled exceptions" and "reported exceptions"]
System Wide Error Handling
  [Figure: system and sub-system layers; applications and error handlers (nodes labelled W, E, D, F, A, C, B) connected by report lines]
Error Handling Contexts
  Context A, e.g. incomplete event
    Error handling intelligence, e.g. re-start event building
  Context B, e.g. could not send event fragment
  Context C, e.g. network send failure
  One or more error handlers for each context
  An error is either treated by a handler or forwarded to a super-context
  A program always reports into a single context
  A program can switch its reporting context at run-time
  Contexts are assigned by static or run-time configuration
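The forwarding rule (a handler treats the error, otherwise it goes to the super-context) can be sketched as a chain of contexts. All names below (`Error`, `Context`, `Reporter`) are hypothetical and are not taken from XDAQ; the nesting of contexts used in any example is likewise an assumption.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Minimal error record for this sketch.
struct Error {
    std::string id;
    std::string message;
};

// A context owns zero or more handlers and may have a super-context.
class Context {
public:
    // A handler returns true when it has treated the error.
    using Handler = std::function<bool(const Error&)>;

    explicit Context(std::string name, const Context* parent = nullptr)
        : name_(std::move(name)), parent_(parent) {}

    void addHandler(Handler h) { handlers_.push_back(std::move(h)); }

    // Tries the local handlers first, then forwards to the super-context.
    // Returns the name of the context that treated the error, or "" if none did.
    std::string report(const Error& e) const {
        for (const auto& h : handlers_) {
            if (h(e)) return name_;
        }
        return parent_ ? parent_->report(e) : std::string();
    }

private:
    std::string name_;
    const Context* parent_;
    std::vector<Handler> handlers_;
};

// A program reports into exactly one context at a time and may switch it at run-time.
class Reporter {
public:
    explicit Reporter(const Context& c) : ctx_(&c) {}
    void switchContext(const Context& c) { ctx_ = &c; }
    std::string report(const Error& e) const { return ctx_->report(e); }

private:
    const Context* ctx_;
};
```

An empty return value corresponds to the case where no context could treat the error, which would then be reported to the user.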
Error Handling Use Cases
  Single thread (synchronous)
  Different threads (asynchronous)
  Different processes (synchronous or asynchronous)
Uniform Error Message Format
  An error schema was defined together with Michele, Alex
  Schema can be mapped to SOAP and other data formats
    initial implementation for SOAP
    binary I2O implementation will follow (performance)
    combined use of different transports
  This is the proposed format for exchanging error messages among CMS subsystems
    supports integration of Magnet Test
Content Definition
  General error information
    Compulsory information
      Identifier (URI, detailed format TBD)
      Notifier (originator of the error, detailed format TBD)
      Date/Time
      Context
    Optional information
      Severity
      Message (open format)
  Recursive definition for nested errors
  Multiple error collections
  Support for qualified errors (user defined schema)
    Possibility to plug in user defined errors, e.g. Tracker, ECAL, RC, XDAQ etc.
    Capability to use other industry standard formats, e.g. CBE by IBM
  XML format definition
    http://xdaq.web.cern.ch/xdaq/xsd/2005/ErrorNotification-11.xsd
    other formats possible, e.g. binary I2O format
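The compulsory and optional fields and the recursive nesting listed above can be mirrored in a small in-memory record. This is only a sketch: the struct and function names are hypothetical, field formats are left as free strings because the slide marks them TBD, and nothing here is derived from the XSD itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// In-memory mirror of the content definition; not generated from the schema.
struct ErrorNotification {
    // Compulsory information
    std::string identifier;  // URI, detailed format TBD in the source
    std::string notifier;    // originator of the error
    std::string dateTime;
    std::string context;
    // Optional information (an empty string means "not provided")
    std::string severity;
    std::string message;
    // Recursive definition: errors that led to this one
    std::vector<ErrorNotification> nested;
};

// True when this record and every nested record carry all compulsory fields.
inline bool hasCompulsoryFields(const ErrorNotification& e) {
    if (e.identifier.empty() || e.notifier.empty() ||
        e.dateTime.empty() || e.context.empty()) {
        return false;
    }
    for (const auto& n : e.nested) {
        if (!hasCompulsoryFields(n)) return false;
    }
    return true;
}

// Length of the longest chain of nested errors, counting this record as 1.
inline std::size_t nestingDepth(const ErrorNotification& e) {
    std::size_t deepest = 0;
    for (const auto& n : e.nested) {
        deepest = std::max(deepest, nestingDepth(n));
    }
    return 1 + deepest;
}
```

A serializer to SOAP/XML or to binary I2O would sit on top of such a record; it is left out because the wire formats are not specified on the slide.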
Notification Schema
Programming Approaches
  Single thread, synchronous
    try { … } catch() { … } clauses
  Multi-thread, asynchronous
    Error processor and callback pattern
  Multi-process, synchronous
    SOAP call with Fault reply
  Multi-process (distributed), asynchronous
    Error notification message
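For the multi-thread, asynchronous case, the error processor and callback pattern can be sketched as a mutex-protected queue: worker threads only post errors, and a separate handling thread later delivers them to registered callbacks. The class and method names are hypothetical, and the handling thread itself is left out; in this sketch whoever calls `dispatchPending()` plays that role.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

class ErrorProcessor {
public:
    using Callback = std::function<void(const std::string&)>;

    // Registers a callback that will receive every dispatched error.
    void subscribe(Callback cb) {
        std::lock_guard<std::mutex> lock(mutex_);
        callbacks_.push_back(std::move(cb));
    }

    // Called from any thread that detects an error; it only enqueues and returns,
    // so the detecting thread is never blocked by error handling.
    void post(std::string error) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(std::move(error));
    }

    // Called from the handling thread. Delivers all queued errors to all
    // callbacks in posting order and returns the number of errors delivered.
    std::size_t dispatchPending() {
        std::queue<std::string> batch;
        std::vector<Callback> callbacks;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
            callbacks = callbacks_;
        }
        // Callbacks run outside the lock so that they may post new errors.
        std::size_t delivered = 0;
        while (!batch.empty()) {
            for (const auto& cb : callbacks) cb(batch.front());
            batch.pop();
            ++delivered;
        }
        return delivered;
    }

private:
    std::mutex mutex_;
    std::queue<std::string> pending_;
    std::vector<Callback> callbacks_;
};
```

The synchronous single-thread case needs none of this: the detecting code simply throws and the caller catches.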
Dealing with Errors in XDAQ
  Errors are defined as C++ classes in a hierarchy
    All errors inherit from xcept::Exception
    E.g. class MyException: public xcept::Exception {}
    Well defined interface
    Stack history
  Each package provides an error repository
    packagename/exceptions
    E.g. toolbox/exception/OutOfMemory.h
  Errors have properties
    standard properties (line, module, message, id etc.)
    user definable properties (system, time etc.)
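The ideas on this slide (a common base class, named properties, a stack history) can be illustrated with a self-contained stand-in. The `sketch::Exception` class below is NOT `xcept::Exception` and does not reproduce its interface; constructor arguments, method names and property keys are assumptions made for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// Stand-in base class: carries standard properties, user properties and a
// link to the exception that caused it (the "stack history").
class Exception : public std::exception {
public:
    Exception(std::string identifier, std::string message, std::string module, int line)
        : message_(std::move(message)) {
        properties_["identifier"] = std::move(identifier);
        properties_["message"] = message_;
        properties_["module"] = std::move(module);
        properties_["line"] = std::to_string(line);
    }

    // Same as above, but records `previous` as the cause. The cause is copied
    // as the base type, which is enough to keep its properties.
    Exception(std::string identifier, std::string message, std::string module, int line,
              const Exception& previous)
        : Exception(std::move(identifier), std::move(message), std::move(module), line) {
        previous_ = std::make_shared<Exception>(previous);
    }

    const char* what() const noexcept override { return message_.c_str(); }

    // User definable properties share the same map as the standard ones.
    void setProperty(const std::string& key, const std::string& value) {
        properties_[key] = value;
    }

    // Returns the property value, or "" when the key is not set.
    std::string getProperty(const std::string& key) const {
        auto it = properties_.find(key);
        return it == properties_.end() ? std::string() : it->second;
    }

    // Number of exceptions in the history, counting this one.
    std::size_t historyDepth() const {
        return 1 + (previous_ ? previous_->historyDepth() : 0);
    }

private:
    std::string message_;
    std::map<std::string, std::string> properties_;
    std::shared_ptr<const Exception> previous_;
};

// Package-level error types only need to derive from the base.
class OutOfMemory : public Exception {
public:
    using Exception::Exception;
};

class MyException : public Exception {
public:
    using Exception::Exception;
};

}  // namespace sketch
```

Because every type derives from the one base, a single `catch (const sketch::Exception&)` clause can receive any of them and still inspect the properties and history.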
Distributed Error Handling
  Poor man’s solution
    SOAP according to the defined schema
    Protocol dependent
    XDAQ exceptions with SOAP serializer/deserializer for the defined schema
    Extensible to other technologies
  Sentinel
    Pluggable XDAQ extension (application)
    Homogeneous use of XDAQ C++ exceptions
    Protocol independent
      SOAP/XML
      Binary I2O (with network configurability)
    Static or dynamic configuration
      use of XML extensibility
      use with auto discovery
A Fault Tolerant EVB Example (I)
  [Figure: Readout Units and Builder Units]
A Fault Tolerant EVB Example (II)
  Sentinel
  Errors are reported to the specified context ("Event Builder Context")

    …
    /* notify error to interested handler */
    sentinel->notify(ru::exception::OutOfSynch)

    …
    /* call back with message containing error */
    BU::onException(/* exception … */)
    {
      if (exception == "ru::exception::OutOfSynch")
        // disable RUi
        // log error
    }
A Fault Tolerant EVB Example (III)
  The faulty Readout Unit is masked
  The event building process can continue with incomplete events
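The masking step of this example can be sketched as follows. Only the identifier string "ru::exception::OutOfSynch" comes from the slides; `ErrorMessage`, `BuilderUnit` and their members are hypothetical, and the Sentinel delivery path is replaced by a direct call to `onException`.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Error message as delivered to the handler; the layout is an assumption.
struct ErrorMessage {
    std::string identifier;
    int readoutUnit;  // index i of the RU that raised the error
};

class BuilderUnit {
public:
    // Starts with readout units 0 .. count-1 all enabled.
    explicit BuilderUnit(int count) {
        for (int i = 0; i < count; ++i) enabled_.insert(i);
    }

    // Callback invoked with the error: an out-of-synch RU is disabled and the
    // action is logged; any other error is ignored by this handler.
    void onException(const ErrorMessage& e) {
        if (e.identifier == "ru::exception::OutOfSynch") {
            enabled_.erase(e.readoutUnit);
            log_.push_back("masked RU " + std::to_string(e.readoutUnit));
        }
    }

    bool isMasked(int readoutUnit) const { return enabled_.count(readoutUnit) == 0; }

    // Number of fragments the builder now waits for per event.
    std::size_t fragmentsExpected() const { return enabled_.size(); }

    const std::vector<std::string>& log() const { return log_; }

private:
    std::set<int> enabled_;
    std::vector<std::string> log_;
};
```

After masking, events are built from the remaining fragments only, which is what allows event building to continue with incomplete events.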
Supplemental information
Latest Releases
  FEDStreamer V1.0 (FED I2O Data Streamer)
    XDAQ interface to GIII based on E. Cano drivers (requires bigphys)
    FRL headers support
    New data format interface (EVB compatible)
    SOAP and WEB interfaces
  Pheaps V1.0 (Physical Memory Heaps)
    new device drivers
    multiple memory chunks
    multiple processes memory allocation protection
    ready for general purpose physical allocation on 2.6 (no bigphys on 2.6)
  XDAQ full package V3.0.1
    support release for FEDStreamer V1.0 and Pheaps V1.0