Steve Fisher for JRA1-UK


1 gLite Error Handling
Steve Fisher for JRA1-UK

2 Introduction
Good error handling is appreciated by users
Bad error handling incurs the wrath of Stephen Burke
Errors - Brno

3 Examples from Stephen B
From long experience I think there's a hierarchy of bad error messages:
1) Crash, core dump etc.
2) No error message, so you think it worked when it didn't (maybe worse than 1).
3) Something which might indicate an error or might not, e.g. “No results returned” from R-GMA.
4) A catch-all error which translates to “something went wrong”, e.g. “ERROR: Failed to instantiate Consumer” from R-GMA.
5) An error which assumes a particular cause when in fact there are many causes, e.g. “invalid argument” from the lcg-* tools.
6) A message which can only be translated to the real cause by the initiated.

4 Examples - continued
7) A message which almost tells you what happened, but leaves out some vital information: “couldn't open file” - but which file?!
8) A 50-line dump of everything the code can find, which has the real error buried somewhere in it, e.g. “expired host certificates” with GSI.
9) A message which tells you what went wrong in a way which makes it clear that the code could have recovered itself but didn't bother, e.g. edg-rm giving up when the first replica fails even when there might be 30 others to try.
I would include 10), a helpful error message which tells you exactly what went wrong and what to do about it, but I don't think I've ever seen one of those ...

5 So…
Good error handling is most important when one gLite component calls another
The error finally passed back to the user must be
  Comprehensible
  Comprehensive
It must be easy for the API user to take appropriate action
  i.e. don't expect the user to do pattern matching on an error message
4 Areas
  Internal to a service
  The service interface (WSDL)
  gLite API
  Displayed by a gLite provided tool

6 Internal to a service
There is no reason to suggest any rules
Services can preserve their autonomy
For R-GMA we use a moderately deep exception hierarchy

7 In the WSDL
Use a small number of WSDL faults:

    <element name="UnknownResourceException" type="rgma:UnknownResourceException"/>
    <complexType name="UnknownResourceException">
      <sequence>
        <element name="errMsg" type="xsd:string" minOccurs="0"/>
        <element name="errNo" type="xsd:int"/>
      </sequence>
    </complexType>

    <wsdl:message name="UnknownResourceExceptionMessage">
      <wsdl:part name="fault" element="rgma:UnknownResourceException"/>
    </wsdl:message>

    <wsdl:operation name="setTerminationInterval" parameterOrder="resourceId terminationInterval">
      <wsdl:fault name="UnknownResourceException" message="impl:UnknownResourceExceptionMessage">
      </wsdl:fault>
    </wsdl:operation>

8 R-GMA set of faults
RGMAException
  xsd:string errMsg (0..1)
  xsd:int errNo
  xsd:string trace (0..1)
UnknownResourceException
RGMASecurityException

9 Could generalise
ServiceException
  xsd:string errorMessage (0..1)
  xsd:int errorNumber
  xsd:string trace (0..1)
UnknownResourceException
AuthException
errorMessage is a free-format string
errorNumber is a “small” integer
trace is a free-format string
Auth rather than Security because of clashes with java.lang.SecurityException
If one service calls another which returns an exception, it is the responsibility of the caller to generate a decent message and error number. Information from the underlying problem can be added to the trace.
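A minimal Python sketch of the generalised fault set above, and of a caller translating an underlying failure. The common base class `Fault`, the function `store_tuple`, and the constant `STORAGE_UNAVAILABLE` are hypothetical names invented for illustration; the slide does not say how the three exceptions relate to one another, so the shared base is an assumption.

```python
class Fault(Exception):
    """Hypothetical common base carrying the three generalised fields."""

    def __init__(self, error_message, error_number, trace=None):
        super().__init__(error_message)
        self.error_message = error_message  # free-format text for humans
        self.error_number = error_number    # "small" integer
        self.trace = trace                  # free-format, optional


class ServiceException(Fault):
    pass


class UnknownResourceException(Fault):
    pass


class AuthException(Fault):
    pass


# Hypothetical error number chosen for this sketch only.
STORAGE_UNAVAILABLE = 7


def store_tuple(backend, row):
    """Call another service and translate its failure for our own caller."""
    try:
        backend(row)
    except Fault as exc:
        # The caller writes its own message and number; details of the
        # underlying problem are kept, but only in the trace.
        underlying = "caused by [%d] %s\n%s" % (
            exc.error_number, exc.error_message, exc.trace or "")
        raise ServiceException(
            "Could not store tuple: storage service failed",
            STORAGE_UNAVAILABLE,
            trace=underlying,
        ) from exc
```

The user sees a message written at the level they were working at, while the detail needed for debugging survives in `trace`.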

10 API view
Errors get passed from the service back to the user in a style appropriate to the language.
For Java, C++ and Python use exceptions matching the WSDL
For C we use an object-like thing:

    if (RGMAPrimaryProducer_insert(pp, insert) != 0) {
        fprintf(stderr, "Failed to insert.\n");
        fprintf(stderr, "<%s>\n", RGMA_getException(pp)->errorMessage);
        exit(1);
    }

11 API Errors
Additionally some errors can be generated by the API:
  RemoteException
    unable to contact the service
  AuthException
    same as the service returns, but this time due to an authentication problem
  ServiceException
    the user does not know what is in the API and what is in the service
    from a user perspective the API is the service
Each API should provide a set of symbolic constants for the error numbers.
  Changing the error numbers introduces an incompatibility
  No attempt should be made to interpret the value of the number
The error messages are for humans and are subject to change
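The advice on symbolic constants could look like this in Python. This is a sketch only: the `ErrorNumber` names and values and the `choose_action` helper are invented for illustration and do not come from any gLite API.

```python
from enum import IntEnum


class ErrorNumber(IntEnum):
    """Hypothetical symbolic names; the values are arbitrary for this sketch."""
    TABLE_NOT_FOUND = 1
    INVALID_QUERY = 2
    RESOURCE_EXPIRED = 3


class ServiceException(Exception):
    def __init__(self, error_message, error_number):
        super().__init__(error_message)
        self.error_message = error_message
        self.error_number = error_number


def choose_action(exc):
    """Pick an action from the error number, never from the message text."""
    if exc.error_number == ErrorNumber.TABLE_NOT_FOUND:
        return "create-table"
    if exc.error_number == ErrorNumber.INVALID_QUERY:
        return "fix-query"
    return "report"
```

Because the code compares only against the symbolic names, the wording of a message can change without breaking callers, which is why the messages can stay "for humans".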

12 The 4 types of exception
AuthException
  The user should ensure that he is authenticated and has the right authorization. He should not get back much information.
RemoteException
  Unable to contact the service. You might want to try again.
UnknownResourceException
  Try remaking the resource, though you want to wait a little while first or limit the number of attempts
ServiceException
  This may be an invalid interaction with the service or it could be a faulty service. Consult the error message.
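One way a Python client might apply the per-exception advice above. The four classes are local stand-ins for the real API exceptions, and `insert_with_recovery`, `make_resource`, `insert`, `max_attempts` and `wait_seconds` are hypothetical names; the retry limit and wait time are arbitrary.

```python
import time


class AuthException(Exception):
    pass


class RemoteException(Exception):
    pass


class UnknownResourceException(Exception):
    pass


class ServiceException(Exception):
    pass


def insert_with_recovery(make_resource, insert, max_attempts=3,
                         wait_seconds=1.0, sleep=time.sleep):
    """Apply the advice for each exception type to a single insert call."""
    resource = make_resource()
    for attempt in range(1, max_attempts + 1):
        try:
            return insert(resource)
        except AuthException:
            raise  # retrying cannot help; the user must fix credentials
        except ServiceException:
            raise  # invalid interaction or faulty service; read the message
        except RemoteException:
            if attempt == max_attempts:
                raise
            sleep(wait_seconds)  # service unreachable: try again
        except UnknownResourceException:
            if attempt == max_attempts:
                raise
            sleep(wait_seconds)  # wait a little while first ...
            resource = make_resource()  # ... then remake the resource
```

The `sleep` parameter exists only so the waiting can be replaced in tests; bounding the loop with `max_attempts` follows the advice to limit the number of attempts.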

13 CLI
The CLI will normally trap and handle errors
Unexpected errors should result in printing the error message, but not the trace unless the CLI is being run in debug mode.
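A small Python sketch of that CLI rule. The `run_cli` wrapper, its `debug` flag, and the local `ServiceException` stand-in are assumptions for illustration, not part of any gLite tool.

```python
import sys


class ServiceException(Exception):
    def __init__(self, error_message, error_number, trace=None):
        super().__init__(error_message)
        self.error_message = error_message
        self.error_number = error_number
        self.trace = trace


def run_cli(command, debug=False, err=sys.stderr):
    """Run a command and return an exit status.

    On an unexpected error the message is always printed; the trace is
    printed only in debug mode.
    """
    try:
        command()
        return 0
    except ServiceException as exc:
        print("ERROR: %s" % exc.error_message, file=err)
        if debug and exc.trace:
            print(exc.trace, file=err)
        return 1
```

This keeps the default output short enough to avoid the "50-line dump" problem while leaving the full detail one flag away.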

14 Conclusion
Most of the issues about errors are non-technical
Error handling needs to be taken seriously with full attention to the messages:
  Comprehensibility
  Comprehensiveness
We should try to agree upon the principles

