The Grid Job Monitoring Service
Luděk Matyska et al.
CESNET, z.s.p.o., Prague, Czech Republic
Motivation
• Job tracking
  – Overly complex environment
  – Responsibility delegation
  – Independent decisions by components
  – Security issues (only delegated contact)
• Parallel and multipart jobs
  – Too many sub-tasks
  – View aggregation
Job Movement
The Logging and Bookkeeping Service
• Collects events associated with the job life, e.g.
  – Job submitted
  – Resource found
  – Job started on a CE (Computing Element)
  – Job finished its computation
• Stores them in bookkeeping and logging databases
• Provides the job state to end users
Job Life Cycle
LB Service Architecture
• Two APIs
  – Logging API
  – Server API
• Local logger service
• The database servers
Architecture Schema
Architecture Comments
• Message format:
  – ULM based (NetLogger)
  – Semantic rules prescribed
• Local logger service
  – locallogger daemon
  – interlogger daemon
  – Local persistency (local disk file)
• Data transfer to database servers
  – Bookkeeping server: persistent during the job lifetime
  – Logging server: persistent indefinitely
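To make the message format concrete, here is a minimal sketch of a ULM-style line as space-separated KEY=value pairs. It is an illustration only: the helper names `format_ulm` and `parse_ulm`, the quoting rules, and the field names in the usage note are assumptions, not the semantic rules the LB service prescribes.

```python
import shlex

# Characters that would confuse the parser below if left unquoted.
_NEEDS_QUOTING = set(" \t\r\n\"'\\")


def format_ulm(fields):
    """Render a dict as one line of space-separated KEY=value pairs."""
    parts = []
    for key, value in fields.items():
        text = str(value)
        if any(ch in _NEEDS_QUOTING for ch in text):
            # Double-quote the value and escape backslashes and quotes.
            text = '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


def parse_ulm(line):
    """Parse a line produced by format_ulm back into a dict of strings."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=value pair: {token!r}")
        fields[key] = value
    return fields
```

For example, `format_ulm({"HOST": "ce.example.org", "NOTE": "queue short"})` yields a single line that `parse_ulm` turns back into the same dictionary; both field names are made up for the example.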
Logging API
• Simple
• Just one function: dg_log_event()
• Always stores date/time, event producer, jobID
• Authenticated
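The idea of a single logging call that always stamps the mandatory fields can be sketched as follows. This is not the signature of dg_log_event(); the function `log_event`, its parameters, and the JSON-lines output are hypothetical choices made for a short example, and authentication is left out.

```python
import io
import json
from datetime import datetime, timezone


def log_event(sink, producer, job_id, event_type, **extra):
    """Append one event to a file-like sink and return the stored record.

    The timestamp, producer and job ID are always filled in by the
    function itself, so a caller cannot forget them.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "producer": producer,
        "job_id": job_id,
        "event": event_type,
    }
    for key, value in extra.items():
        if key in record:
            # Event-specific fields must not overwrite the mandatory ones.
            raise ValueError(f"reserved field name: {key}")
        record[key] = value
    sink.write(json.dumps(record) + "\n")
    return record
```

A caller would write something like `log_event(open_file, "workload-manager", job_id, "JobSubmitted", queue="short")`, where the producer name, event name and `queue` field are placeholders.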
Server API
• State computed on demand
• Three core functions:
  – List of a user's jobs
  – Job status for a given job
  – List of events related to a given job
• Authenticated
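A minimal in-memory sketch of the three queries, with the job state derived from stored events at query time rather than kept as a separate field. The class `BookkeepingStore`, the event names, and the event-to-state mapping are assumptions for illustration; the real state machine is not given here, and authentication is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    job_id: str
    owner: str
    name: str
    timestamp: float


# Hypothetical mapping from event names to job states.
EVENT_TO_STATE = {
    "JobSubmitted": "SUBMITTED",
    "ResourceFound": "SCHEDULED",
    "JobStarted": "RUNNING",
    "JobFinished": "DONE",
}


class BookkeepingStore:
    def __init__(self):
        self._events = []

    def store(self, event):
        self._events.append(event)

    def list_user_jobs(self, owner):
        """Return the IDs of all jobs with at least one event for this owner."""
        return sorted({e.job_id for e in self._events if e.owner == owner})

    def job_events(self, job_id):
        """Return the events of one job in timestamp order."""
        matching = (e for e in self._events if e.job_id == job_id)
        return sorted(matching, key=lambda e: e.timestamp)

    def job_status(self, job_id):
        """Replay the job's events; the last recognised event sets the state."""
        state = "UNKNOWN"
        for event in self.job_events(job_id):
            state = EVENT_TO_STATE.get(event.name, state)
        return state
```

Because the state is recomputed from the event list on every call, events that arrive late or out of order are picked up by the next query without any extra bookkeeping.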
Job Identification
• Grid-wide (global) identifier
• Used to identify the appropriate bookkeeping server
  – Currently hard-wired
  – In the future probably via the Information service
• URL-like syntax:
• unique_string to distinguish individual jobs
• The bookkeeping server speaks the HTTPS protocol
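Since the identifier is URL-like and names the bookkeeping server, a client can locate the server by parsing the job ID. The sketch below assumes a layout of `https://host[:port]/unique_string`; that exact layout, the function name `parse_job_id`, and the default port are assumptions, not the documented syntax.

```python
from urllib.parse import urlsplit

# Assumed fallback when the identifier carries no explicit port.
DEFAULT_HTTPS_PORT = 443


def parse_job_id(job_id):
    """Split a URL-like job ID into (server host, server port, unique string)."""
    parts = urlsplit(job_id)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"not an https job identifier: {job_id!r}")
    unique = parts.path.lstrip("/")
    if not unique:
        raise ValueError(f"job identifier has no unique part: {job_id!r}")
    return parts.hostname, parts.port or DEFAULT_HTTPS_PORT, unique
```

With this layout, no separate lookup is needed to find the server that holds a job's events: the host and port come straight from the identifier.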
Security Considerations
• Authentication
  – Both for logging and database queries
  – Certificate based (user and/or host/service)
  – User associated with jobID on first authenticated event
• Secure channels
• Storage (database) access
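The rule that a user becomes associated with a jobID on the first authenticated event can be sketched as a first-writer-wins mapping. The class `JobOwners` and the use of a certificate subject string as the user identity are assumptions; certificate verification itself is out of scope for the example.

```python
class JobOwners:
    """Bind each job ID to the user seen on its first authenticated event."""

    def __init__(self):
        self._owner = {}

    def register_event(self, job_id, user_subject):
        """Record an event and return the job's owner (None if still unbound).

        user_subject is the already-verified user identity carried by the
        event, or None when the event carries no user identity.
        """
        if user_subject is None:
            return self._owner.get(job_id)
        # setdefault keeps the first binding and ignores later ones.
        return self._owner.setdefault(job_id, user_subject)

    def owner_of(self, job_id):
        return self._owner.get(job_id)
```

Later events for the same job do not change the binding, so a query such as "list this user's jobs" can rely on a stable owner per job.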
R-GMA Integration
• Work in progress
• The goals:
  – To lower database load
  – To provide notification service
  – To allow better integration with other information services
R-GMA First Extension
LB Service Extensions
• User defined attributes
  – To store additional information associated with a job
  – To retrieve job collections
• Synchronous API
• Job checkpointing (at the application level)
  – Information stored in the Bookkeeping server
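User defined attributes serve two purposes here: attaching extra information to a job and selecting a collection of jobs by that information. A minimal sketch, assuming attributes are simple name/value strings; the class `JobAttributes` and its methods are hypothetical.

```python
from collections import defaultdict


class JobAttributes:
    """Store user defined name/value attributes per job."""

    def __init__(self):
        self._attrs = defaultdict(dict)

    def set_attribute(self, job_id, name, value):
        self._attrs[job_id][name] = value

    def attributes(self, job_id):
        """Return a copy of the attributes attached to one job."""
        return dict(self._attrs.get(job_id, {}))

    def jobs_with(self, name, value):
        """Return the IDs of all jobs whose attribute `name` equals `value`."""
        return sorted(
            job_id
            for job_id, attrs in self._attrs.items()
            if attrs.get(name) == value
        )
```

For instance, tagging every job of one simulation campaign with the same attribute value lets the user retrieve the whole collection with a single `jobs_with` call.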
Job Partitioning
• Group ID
  – Job collections
  – Hierarchical
• Aggregate queries
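Hierarchical group IDs with aggregate queries can be pictured as a tree of groups whose job states are summed recursively. The sketch assumes each group has at most one parent and that an aggregate query means "count jobs per state"; the class `JobGroups` and the state names are illustrative only.

```python
from collections import Counter


class JobGroups:
    """Tree of job groups supporting a per-state count over a whole subtree."""

    def __init__(self):
        self._subgroups = {}   # group ID -> list of child group IDs
        self._job_states = {}  # group ID -> {job ID: state}

    def add_group(self, group_id, parent=None):
        self._subgroups.setdefault(group_id, [])
        self._job_states.setdefault(group_id, {})
        if parent is not None:
            self.add_group(parent)
            if group_id not in self._subgroups[parent]:
                self._subgroups[parent].append(group_id)

    def set_job_state(self, group_id, job_id, state):
        self.add_group(group_id)
        self._job_states[group_id][job_id] = state

    def state_counts(self, group_id):
        """Count job states in this group and in all of its subgroups."""
        counts = Counter(self._job_states.get(group_id, {}).values())
        for child in self._subgroups.get(group_id, []):
            counts += self.state_counts(child)
        return counts
```

A user with many sub-tasks then asks one question about the top-level group instead of polling each job separately.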
Conclusion
• The LB service provides
  – Job tracking
  – Persistent event storage
  – Job state provision
• Future work
  – (R-)GMA integration
  – Authorization
  – Collective operations
Thank you for your interest