

Job Submission and Resource Brokering WP 1

Contents:
- The components
- What (should) work now, and configuration
- How to submit jobs: the UI and JDL
- Planned future functionality

Documentation available from:
A particularly gripping read is the “Administrator and User Guide” released last Friday.

The User Interface (UI):
- All user interactions are through the UI.
- It is installed on the submitting machine.
- It communicates with both the Resource Broker (RB) and the Logging Broker (LB).
- On job submission the UI assigns a unique job identifier (dg_jobId) to the job and sends the executable, the Job Description File and the Input Sandbox to the RB. It also sends notification of the submission to the LB.

The User Interface (UI):
- The UI can also be used to query the status of a job, which it does by interrogating the LB.

Configuration: the UI configuration is contained in UI_ConfigEnv.cfg, which holds the following information:
- address and port of accessible RBs
- address and port of accessible LBs
- default location of the local storage areas for the Input/Output sandbox files
- default values for the JDL mandatory attributes
- default number of retries on fatal errors when connecting to the LB

The User Interface (UI): Users concurrently using the same submitting machine use the same configuration files. For users (or groups of users) having particular needs it is possible to “customise” the UI configuration through the -config option supported by each UI command.
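
The relationship between the shared configuration and a group-specific one selected with -config can be pictured with a small sketch. The slides do not show the file format of UI_ConfigEnv.cfg, so the class, field names, hosts, ports and the group name "cms" below are all hypothetical; the sketch only models the five kinds of setting listed above and the idea that a group's settings override the shared ones.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class UIConfig:
    """Hypothetical in-memory view of the settings listed for UI_ConfigEnv.cfg."""
    resource_brokers: List[Tuple[str, int]]  # address and port of accessible RBs
    logging_brokers: List[Tuple[str, int]]   # address and port of accessible LBs
    sandbox_dir: str                         # default local storage for sandbox files
    jdl_defaults: Dict[str, str]             # defaults for mandatory JDL attributes
    lb_retries: int                          # retries on fatal LB connection errors


def config_for(shared: UIConfig, groups: Dict[str, dict],
               group: Optional[str] = None) -> UIConfig:
    """Return the shared settings, overridden by the named group's settings."""
    if group is None:
        return shared
    return replace(shared, **groups[group])


# Placeholder hosts, ports and values; none of them come from a real testbed.
shared = UIConfig(
    resource_brokers=[("rb.example.org", 7771)],
    logging_brokers=[("lb.example.org", 7846)],
    sandbox_dir="/tmp/sandbox",
    jdl_defaults={"RetryCount": "2"},
    lb_retries=3,
)
groups = {"cms": {"sandbox_dir": "/data/cms/sandbox", "lb_retries": 5}}

cms = config_for(shared, groups, "cms")
print(cms.sandbox_dir, cms.lb_retries)
print(config_for(shared, groups).sandbox_dir)
```

Settings that the group does not override (here the RB and LB addresses) fall through from the shared configuration.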

The Resource Broker (RB):
- Situated at a central location (not local to your machine). One RB per VO is expected; currently there is only one, at CERN.
- Jobs are queued locally (stored in a PostgreSQL database).
- Interrogates the replica catalogue and the information services and attempts to match the job to an available resource.
- Matching is based on the Condor ClassAd library.
- If a suitable match is made, the RB can submit the job to the Job Submission Service (JSS).
- All events and status information are sent to the LB.
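
The matching step can be sketched as "filter by requirements, then pick the highest rank". This is a minimal illustration in plain Python, not the Condor ClassAd library and not the RB's actual code; the resource names and attribute values are made up, and the replica catalogue lookup is left out.

```python
from typing import Callable, Dict, List, Optional

Resource = Dict[str, object]


def match(resources: List[Resource],
          requirements: Callable[[Resource], bool],
          rank: Callable[[Resource], float]) -> Optional[Resource]:
    """Keep the resources that satisfy the requirements, return the best ranked."""
    candidates = [r for r in resources if requirements(r)]
    if not candidates:
        return None  # no suitable match: nothing to hand over for submission
    return max(candidates, key=rank)


# Made-up resource descriptions, standing in for the information services.
resources = [
    {"Name": "ce-a", "LRMSType": "PBS", "FreeCPUs": 4, "TotalCPUs": 16},
    {"Name": "ce-b", "LRMSType": "Condor", "FreeCPUs": 8, "TotalCPUs": 32},
    {"Name": "ce-c", "LRMSType": "PBS", "FreeCPUs": 0, "TotalCPUs": 64},
    {"Name": "ce-d", "LRMSType": "PBS", "FreeCPUs": 2, "TotalCPUs": 40},
]

best = match(
    resources,
    requirements=lambda r: r["LRMSType"] == "PBS" and r["FreeCPUs"] > 1,
    rank=lambda r: r["TotalCPUs"],
)
nothing = match(resources, lambda r: r["LRMSType"] == "LSF", lambda r: 0)
print(best["Name"] if best else None, nothing)
```

Here ce-b fails the requirements on its batch system and ce-c on free CPUs, so the choice is between ce-a and ce-d, and the rank decides.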

The Resource Broker (RB): Configuration:
- Most people will never need to configure their own RB. However, for completeness, the configuration file is /etc/rb.conf.
- It contains entries for the replica catalogue, the MDS, etc. For more detailed information see the “Administrator and User Guide”.
- Input/Output Sandboxes etc. are stored on the machine hosting the RB, so a reasonable amount of disk space is required.

The Job Submission Service (JSS):
- If the RB has successfully matched a job to a resource, the job is passed to the JSS (which is usually on the same machine).
- The JSS queues the job internally in a PostgreSQL database.
- Job submission is performed using Condor-G.
- The JSS also monitors jobs until their completion, notifying the LB of any significant events.

The Job Submission Service (JSS): Configuration: Again, most people will never need to configure a JSS server. The configuration file is /etc/jss.conf.

The Logging Broker (LB): All events throughout the job submission, execution and output retrieval processes are logged by the LB in a MySQL database. All information is time stamped. It is through the logged information that users are able to discover the state of their jobs.
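
How a job's state follows from time-stamped events can be sketched with an in-memory log. This is an assumption-laden toy: the real LB stores events in MySQL, and the class names, the event fields and the status names other than RUNNING are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    job_id: str
    source: str       # component that logged the event, e.g. "UI", "RB", "JSS"
    status: str
    timestamp: float


class EventLog:
    """Toy stand-in for the LB: append events, derive state from the newest one."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def log(self, job_id: str, source: str, status: str,
            timestamp: Optional[float] = None) -> None:
        stamp = time.time() if timestamp is None else timestamp
        self._events.append(Event(job_id, source, status, stamp))

    def status(self, job_id: str) -> Optional[str]:
        events = [e for e in self._events if e.job_id == job_id]
        if not events:
            return None
        return max(events, key=lambda e: e.timestamp).status


lb = EventLog()
lb.log("job-1", "UI", "SUBMITTED", timestamp=100.0)   # hypothetical status name
lb.log("job-1", "JSS", "RUNNING", timestamp=160.0)
lb.log("job-1", "RB", "SCHEDULED", timestamp=130.0)   # arrives late, is older
print(lb.status("job-1"), lb.status("job-2"))
```

Because the state is taken from the newest timestamp rather than from arrival order, an event that reaches the log late does not overwrite a more recent one.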

The Logging Broker (LB): Configuration: An LB local logger must be installed on all machines which are pushing information into the LB system (RB and JSS machines and the gatekeeper machines of each CE). An exception to this is the job submission machine which can have a local logger but it is not mandatory. The LB server needs only be installed on a server machine.

The Logging Broker (LB): Configuration: The local logger requires no configuration and the server is configured when the database is created using /etc/server.sql. No further configuration is required.

Submitting a job: ClassAds are:
- Declarative rather than procedural: they describe notions of compatibility rather than specifying a procedure to determine compatibility.
- Simple, both syntactically and semantically, and so easy to use.
- Portable: nothing is used that requires features specific to a given architecture.

Submitting a job:
- ClassAds have dynamic typing, so only values have types (not expressions).
- As well as the usual types (numeric, string, Boolean), values can also have types such as time intervals and timestamps, and esoteric values such as undefined and error.
- ClassAds can be nested.
- ClassAds have the usual set of operators (see the JDL how-to).

Submitting a job: An example:

    Executable = "WP1testF";
    StdOutput = "sim.out";
    StdError = "sim.err";
    InputSandbox = {"/home/datamat/sim.exe", "/home/datamat/DATA/*"};
    OutputSandbox = {"sim.out","sim.err","testD.out"};
    Rank = other.TotalCPUs * other.AverageSI00;
    Requirements = other.LRMSType == "PBS" \
        && (other.OpSys == "Linux RH 6.1" || other.OpSys == "Linux RH 6.2") && \
        self.Rank > 10 && other.FreeCPUs > 1;
    RetryCount = 2;
    Arguments = "file1";
    InputData = "LF:test ";
    ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
    DataAccessProtocol = "gridftp";
    OutputSE = "grid001.cnaf.infn.it";
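
The syntax used in the example (one "Attribute = value;" statement per attribute, strings in double quotes, lists in braces, expressions left unquoted) can be illustrated by generating a job description from a Python dictionary. This is only a sketch: Expr, render_value and render_jdl are hypothetical helpers, not part of any WP1 tool, and the sketch does not escape quotes inside strings.

```python
from typing import Dict, List, Union


class Expr(str):
    """Marks a value to be written unquoted, e.g. a Rank or Requirements expression."""


Value = Union[str, int, Expr, List[str]]


def render_value(value: Value) -> str:
    if isinstance(value, Expr):          # test first: Expr is a subclass of str
        return str(value)
    if isinstance(value, str):
        return '"' + value + '"'
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(render_value(v) for v in value) + "}"
    return str(value)                    # numbers are written as they are


def render_jdl(job: Dict[str, Value]) -> str:
    """Render one 'Name = value;' statement per attribute, in insertion order."""
    return "\n".join(f"{name} = {render_value(value)};" for name, value in job.items())


job = {
    "Executable": "WP1testF",
    "OutputSandbox": ["sim.out", "sim.err"],
    "RetryCount": 2,
    "Rank": Expr("other.TotalCPUs * other.AverageSI00"),
}
jdl_text = render_jdl(job)
print(jdl_text)
```

Keeping expressions in a separate type avoids the most common mistake when writing such files by hand, which is quoting an expression so that it is treated as a plain string.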

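Submitting a job: truth tables for the logical operators over the values F (false), T (true), U (undefined) and E (error):

    AND | F T U E      OR | F T U E      NOT |
    ----+--------      ---+--------      ----+--
     F  | F F F E       F | F T U E       F  | T
     T  | F T U E       T | T T T E       T  | F
     U  | F U U E       U | U T U E       U  | U
     E  | E E E E       E | E E E E       E  | E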
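
The truth tables on this slide can be written down directly as lookup tables. The sketch below transcribes the slide's AND, OR and NOT tables for F, T, U and E; it is an illustration of the slide, not the ClassAd library's implementation, whose handling of error operands may differ in detail. The function names are made up.

```python
F, T, U, E = "F", "T", "U", "E"
ORDER = (F, T, U, E)

# One row per first operand; the columns follow ORDER for the second operand.
AND_ROWS = {F: "FFFE", T: "FTUE", U: "FUUE", E: "EEEE"}
OR_ROWS = {F: "FTUE", T: "TTTE", U: "UTUE", E: "EEEE"}
NOT_TABLE = {F: T, T: F, U: U, E: E}


def _lookup(rows, a, b):
    return rows[a][ORDER.index(b)]


def and_(a, b):
    return _lookup(AND_ROWS, a, b)


def or_(a, b):
    return _lookup(OR_ROWS, a, b)


def not_(a):
    return NOT_TABLE[a]


# An undefined attribute does not always make the whole expression undefined:
print(and_(F, U))   # a false operand decides an AND
print(or_(T, U))    # a true operand decides an OR
print(and_(T, U))   # otherwise the result stays undefined
print(not_(U))
```

Both binary tables are symmetric as given on the slide, so the order of the operands does not matter in this sketch.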


Submitting a job:

– dg-job-submit
Allows the user to submit a job for execution on remote resources in a grid.

SYNOPSIS
    dg-job-submit [-help]
    dg-job-submit [-version]
    dg-job-submit [-template]
    dg-job-submit [-input input_file | -resource res_id] [-notify e_mail_address] [-config group_name] [-output out_file] [-noint] [-debug]

    ##############################################
    #
    # Job description file
    #
    ##############################################
    Executable = "$(CMS)/fpacini/exe/sum.exe";
    InputData = "LF:testbed ";
    ReplicaCatalog = "ldap://firefox.esrin.esa.it:2155/ReplicaCatalog1";
    DataAccessProtocol = "gridftp";
    RetryCount = 10;
    Rank = other.MaxCpuTime;
    Requirements = other.LRMSType == "Condor" && \
        other.Architecture == "INTEL" && other.OpSys == "LINUX" && \
        other.FreeCpus >= 4;

– dg-get-job-output
This command asks the RB for the job output files (specified by the OutputSandbox attribute of the job-ad) and stores them on the local disk of the submitting machine.

SYNOPSIS
    dg-get-job-output [-help]
    dg-get-job-output [-version]
    dg-get-job-output [-dir directory_path] [-config group_name] [-noint] [-debug]

Examples
Let us consider the following command:
    $> dg-get-job-output -dir /home/data
It retrieves the files listed in the OutputSandbox attribute from the RB and stores them locally in /home/data/.

– dg-list-job-match
Returns the list of resources fulfilling job requirements.

SYNOPSIS
    dg-list-job-match [-help]
    dg-list-job-match [-version]
    dg-list-job-match [-verbose] [-config group_name] [-output output_file] [-noint] [-debug]

– dg-job-cancel
Cancels one or more submitted jobs.

SYNOPSIS
    dg-job-cancel [-help]
    dg-job-cancel [-version]
    dg-job-cancel [-notify e_mail_address] [-config group_name] [-output output_file] [-noint] [-debug]

– dg-job-status
Displays bookkeeping information about submitted jobs.

SYNOPSIS
    dg-job-status [-help]
    dg-job-status [-version]
    dg-job-status [-full] [-config group_name] [-output output_file] [-noint] [-debug]

Examples
    $> dg-job-status dg_jobId2
displays the following lines:

    ********************************************************************
    BOOKKEEPING INFORMATION
    Printing status for the job: dg_jobId2
    ---
    dg_JobId= firefox.esrin.esa.it__ _163007_21833_RB1_LB3
    Job Owner = /C=IT/O=ESA/OU=ESRIN/CN=Fabrizio
    Status= RUNNING
    Location= firefox.esa.it:2119/jobmanager-condor
    Job Destination =
    Status Enter Time = 10:24: GMT
    Last Update Time = 10:25: GMT
    CpuTime= 1
    ********************************************************************
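
For scripting, the bookkeeping output can be turned into a dictionary. The sketch below assumes that the command prints one "name = value" pair per line, which the flattened slide text suggests but does not confirm; parse_status is a hypothetical helper, and the sample text reuses only fields that appear complete on the slide.

```python
from typing import Dict


def parse_status(text: str) -> Dict[str, str]:
    """Collect 'name = value' lines; banner and separator lines are skipped."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        # Split on the first '=' only: values such as certificate subjects
        # contain further '=' characters.
        name, _, value = line.partition("=")
        fields[name.strip()] = value.strip()
    return fields


sample = """\
********************************************************************
BOOKKEEPING INFORMATION
Printing status for the job: dg_jobId2
Job Owner = /C=IT/O=ESA/OU=ESRIN/CN=Fabrizio
Status= RUNNING
Location= firefox.esa.it:2119/jobmanager-condor
CpuTime= 1
********************************************************************
"""

info = parse_status(sample)
print(info["Status"], info["Location"])
```

All values are kept as strings; converting CpuTime or the timestamps is left to the caller, since the slide does not define their formats.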

– dg-get-logging-info
Displays logging information about submitted jobs.

SYNOPSIS
    dg-get-logging-info [-help]
    dg-get-logging-info [-version]
    dg-get-logging-info [-from T1] [-to T2] [-level logLevel] [-config group_name] [-output output_file] [-noint] [-debug]

Job Submission: There is a GUI

Things to come over the next year

Release | Dependencies | Job | Partner
1.4 | WP4 | Support for interactive jobs | UI/RB/JSS groups
2 | WP4 | Support for job partitioning | INFN PD/PPARC
1.3 | WP4 | Ability to submit MPI jobs | UI/RB/JSS groups
1.4 | WP4 | Specification of job dependencies | INFN CNAF/PPARC
1.4 | WP7, WP2 | Triggering of file transfers | INFN TO + Catania
1.4 | WP7 | Integration of network into scheduling policy | INFN TO + Catania + CNAF?
1.3 | | Develop APIs for application | DATAMAT
1.4 | | Development of GUI | DATAMAT
1.4 | Globus CAS + WP4 | Deployment of accounting infrastructure over testbeds (HLR with command line interface) | INFN TO
2 | | Full integration of cost estimation/accounting into scheduling policies | INFN TO + CT
1.? | | Review command requirement from D8.1A: "hold", "move queue". Document reviewed by February. Implications to RB architecture to be understood. | DATAMAT
1.? | WP8, WP9, WP10 | Review of job info from D8.1A. Document to be reviewed by January. Implications may need coordination/blessing of WP4, and need to be finalised and matched alongside their schedule. | CESNET
1.2 | | Support for proxy renewal | CESNET; JSS part INFN PD; possible UI change
??? | WP3 | Availability of L&B info through "standard" WP3 mechanism. Interfacing with WP3 R-GMA will be tested by May. Feedback will be provided. | CESNET
1.4 | WP2, WP4, WP5, WP7 | Advanced reservation API. Usefulness dependent on Testbed QoS configuration. | INFN CNAF
2 | | Integration of advanced reservation (co-allocation) into RB | INFN CNAF
