AJDL: Abstract Job Description Language
David Adams, BNL (ATLAS)
June 29, 2004
PPDG Collaboration Meeting, Williams Bay

Contents
  Model
  Components
  Implementation

Model
  Job-based model
    User selects an input dataset
    User selects or constructs a transformation (xform) to apply to this dataset
    Distributed analysis system constructs a job to apply the xform to the dataset
      – Result is a new dataset
      – Partial results may be available during processing
    User examines the result
  From this, identify the components of AJDL
    Dataset
    Transformation (e.g. application and task)
    Job (xform, dataset, job preferences)

Model (cont)
  Abstract means
    User job definition should be suitable for invocation at any site using any WMS
    Specify what to do, not how to do it
  Analysis service
    Receives abstract job request
    Splits it into sub-jobs
      – Typically by splitting input dataset
    Maps transformation to local executable and runtime environment
    Runs executable on each sub-dataset
    Gathers and merges results from each sub-job

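The analysis-service steps above (split, run on each sub-dataset, merge) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Dataset` class, `split`, `run_job` and `to_histograms` are hypothetical names, a dataset is reduced to a tuple of logical file names, and the mapping of a transformation to a local executable is stood in for by a plain Python callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a dataset: a tuple of logical file names."""
    files: Tuple[str, ...]


def split(dataset: Dataset, max_files: int) -> List[Dataset]:
    """Split the input dataset into sub-datasets of at most max_files files."""
    return [
        Dataset(dataset.files[i:i + max_files])
        for i in range(0, len(dataset.files), max_files)
    ]


def run_job(dataset: Dataset,
            transformation: Callable[[Dataset], Dataset],
            max_files: int = 2) -> Dataset:
    """Run the transformation on each sub-dataset and merge the results."""
    sub_results = [transformation(sub) for sub in split(dataset, max_files)]
    # Merging here is plain concatenation of the output file lists.
    merged = tuple(f for result in sub_results for f in result.files)
    return Dataset(merged)


def to_histograms(sub: Dataset) -> Dataset:
    """Toy transformation: one output file per input file."""
    return Dataset(tuple(f + ".hist" for f in sub.files))


result = run_job(Dataset(("lfn:a", "lfn:b", "lfn:c")), to_histograms)
```

A real service would dispatch each sub-job to a workload management system rather than run it in-process; the sketch only shows that the caller supplies the dataset and transformation and never says how the work is divided.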
Components
  Dataset
    Identity
      – Dataset is immutable
    Location
      – Typically a list of LFNs
      – May be absent (virtual dataset)
        > DRC then provides
    Content
      – Which events
      – Type of data in each event (raw, tracks, jets, aod, …)
    Compound structure
      – List of sub-datasets
      – Can be a tree structure

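A minimal Python sketch of the dataset properties listed above: identity, immutability, optional location, content, and a tree of sub-datasets. The class and field names are hypothetical and do not reflect the actual AJDL or DIAL schema. Treating a dataset as virtual when no file is reachable anywhere in its tree is also an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class Dataset:
    dataset_id: str                        # identity
    content: FrozenSet[str]                # data types per event, e.g. {"raw", "aod"}
    files: Tuple[str, ...] = ()            # location as logical file names; may be empty
    children: Tuple["Dataset", ...] = ()   # compound structure: sub-datasets

    def all_files(self) -> List[str]:
        """Collect logical file names from this dataset and its whole subtree."""
        found = list(self.files)
        for child in self.children:
            found.extend(child.all_files())
        return found

    def is_virtual(self) -> bool:
        """Assumption: virtual means no file location anywhere in the tree."""
        return not self.all_files()


part_a = Dataset("ds-a", frozenset({"aod"}), files=("lfn:a1", "lfn:a2"))
part_b = Dataset("ds-b", frozenset({"aod"}), files=("lfn:b1",))
parent = Dataset("ds-ab", frozenset({"aod"}), children=(part_a, part_b))
```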
Components (cont)
  Application
    Script to process a dataset
      – Output is another dataset
    List of software packages
      – Assume package management service to provide location of a specified package
      – May have automatic installation
    Application advertises the required content
      – Compare with content of input dataset to verify compatibility
    Second script to build task before processing
      – E.g. compile provided sources

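The compatibility check mentioned above, comparing the content an application requires with the content of the input dataset, amounts to a subset test. The sketch below assumes content is a collection of data-type labels; the function names are hypothetical.

```python
from typing import Iterable, List


def missing_content(required: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the required data types that the dataset does not provide."""
    return sorted(set(required) - set(available))


def is_compatible(required: Iterable[str], available: Iterable[str]) -> bool:
    """True if the dataset provides every data type the application requires."""
    return not missing_content(required, available)


dataset_content = {"raw", "tracks", "jets"}
jet_app_ok = is_compatible({"jets"}, dataset_content)
aod_app_missing = missing_content({"jets", "aod"}, dataset_content)
```

Reporting the missing types rather than a bare yes or no lets a service explain why a job request was rejected.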
Components (cont)
  Task
    Carries the data used to configure the application
    At present the task carries embedded text files
      – E.g. myalg.cxx
    May add named parameters

Components (cont)
  Job preferences
    Allow user to provide hints for processing
      – Location for output data
      – User role
      – Desired response time
    System may ignore or freely interpret these

Components (cont)
  Job
    ID
    Current state (initializing, running, done, failed, …)
    Start and stop times
    List of sub-job IDs
    Input application, task and dataset
    Output dataset
      – Partial result if job is not complete
    Access to control job
      – Suspend/resume
      – Kill

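The job attributes and control operations above suggest a small state machine. The sketch below is illustrative only: the slide names the states initializing, running, done and failed, while the `SUSPENDED` and `KILLED` states and the transition table are assumptions added to make suspend/resume and kill concrete. The `Job` class and its fields are hypothetical.

```python
from enum import Enum


class JobState(Enum):
    INITIALIZING = "initializing"
    RUNNING = "running"
    SUSPENDED = "suspended"  # assumed state, not listed in the slides
    DONE = "done"
    FAILED = "failed"
    KILLED = "killed"        # assumed state, not listed in the slides


# Assumed transition table: which states may follow each state.
ALLOWED = {
    JobState.INITIALIZING: {JobState.RUNNING, JobState.FAILED, JobState.KILLED},
    JobState.RUNNING: {JobState.SUSPENDED, JobState.DONE,
                       JobState.FAILED, JobState.KILLED},
    JobState.SUSPENDED: {JobState.RUNNING, JobState.KILLED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
    JobState.KILLED: set(),
}


class Job:
    def __init__(self, job_id, application, task, dataset):
        self.job_id = job_id
        self.application = application
        self.task = task
        self.dataset = dataset
        self.state = JobState.INITIALIZING
        self.sub_job_ids = []
        self.result = None  # output dataset; may hold a partial result

    def _move_to(self, new_state):
        """Change state, rejecting transitions not in the ALLOWED table."""
        if new_state not in ALLOWED[self.state]:
            raise RuntimeError(
                f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state

    def start(self):
        self._move_to(JobState.RUNNING)

    def suspend(self):
        self._move_to(JobState.SUSPENDED)

    def resume(self):
        self._move_to(JobState.RUNNING)

    def kill(self):
        self._move_to(JobState.KILLED)
```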
Implementation
  Extensibility
    Must be extensible to support different types of datasets and jobs
      – AtlasPoolEventDataset, RootHistogramDataset, …
      – ProcessJob, LsfJob, CondorJob, EgeeJob, …
    Can we use the same schema for all types?
      – So far yes for jobs
      – Probably for applications and tasks
      – Not clear for datasets
  Data representation
    XML description for each type

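To illustrate how one XML schema could describe several job types, the sketch below writes and reads a job description with Python's standard `xml.etree.ElementTree`. The element and attribute names (`job`, `subjob`, `type`, `id`, `state`) are invented for this example and are not the actual AJDL schema; only the job type names come from the slide.

```python
import xml.etree.ElementTree as ET


def job_to_xml(job_type, job_id, state, sub_job_ids):
    """Serialize a job description; the concrete type is just an attribute."""
    root = ET.Element("job", {"type": job_type, "id": job_id, "state": state})
    for sub_id in sub_job_ids:
        ET.SubElement(root, "subjob", {"id": sub_id})
    return ET.tostring(root, encoding="unicode")


def job_from_xml(text):
    """Parse a job description back into a plain dictionary."""
    root = ET.fromstring(text)
    return {
        "type": root.get("type"),
        "id": root.get("id"),
        "state": root.get("state"),
        "sub_job_ids": [elem.get("id") for elem in root.findall("subjob")],
    }


# The same functions serve different job types unchanged.
lsf_xml = job_to_xml("LsfJob", "job-1", "running", ["job-1.1", "job-1.2"])
condor_xml = job_to_xml("CondorJob", "job-2", "done", [])
lsf_job = job_from_xml(lsf_xml)
```

Carrying the concrete type as data rather than as a distinct element layout is one way the "same schema for all types" question could be answered for jobs; whether it carries over to datasets is left open in the slides.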
Implementation (cont)
  Classes
    Provide class interfaces for each type
    C++, Python and maybe Java
      – C++ from DIAL
      – Python binding to C++ using lcgdict (GANGA)
    Convenience for implementing clients and services
    Add operations to take action
      – E.g. fetch local replicas of files in a dataset
      – Update status or kill a job
    May add functionality for subtypes
      – Extract histograms for a RootHistogramDataset