Workload Management WP Status and next steps Massimo Sgaravatto INFN Padova.


Where we are
- CMS-HLT use case (Monte Carlo production and reconstruction) analyzed in terms of GRID requirements and GRID tools availability
- Discussions with the Globus team and the Condor team
- Definition of a prototype architecture for the workload management system
  - Use of Globus and Condor mechanisms
  - But major developments are needed

Prototype workload management system architecture
[Diagram: jobs enter through condor_submit (Globus Universe) into Condor-G, which submits them to three sites (Site1, Site2, Site3). Each site exposes a Globus GRAM in front of a local resource management system (Condor, LSF, PBS). The Master performs resource discovery through the Grid Information Service (GIS), to which the farms publish their info.]
- Globus GRAM as uniform interface to different local resource management systems
- Condor-G able to provide a reliable/crashproof job submission service
- Master chooses to which Globus resources the jobs must be submitted

Where we are
- Evaluating the existing components (D1.1) and “putting together” the various building blocks
- Evaluation of Globus
  - Collaboration with WP 1 of the INFN-GRID project (Evaluation of the Globus toolkit)
  - Evaluation of Globus GRAM
    - GRAM as uniform interface to different underlying resource management systems
    - Evaluation of RSL
    - “Cooperation” between GRAM and GIS
- Evaluation of Condor-G
  - The current implementation is a prototype
  - It works, but some problems must be solved
- Globus + Condor-G tested with a real CMS MC production
  - Many, many, many memory leaks found in the Globus jobmanager!!!
  - Fixes (provided by Francesco Prelz) submitted to the Globus team
  - Feedback received only on the bugs in the GAA and GSS modules (new fixes “merged” with the original ones)

Layout for CMS production
[Diagram: the production manager (Ivano Lippi, Padova) submits jobs with condor_submit (Globus Universe) through Condor-G, and with globusrun, to the farms in Bologna, Pisa and Padova. The farms sit behind Globus GRAM, which fronts the local resource management systems (Condor, LSF).]

First deliverables
- Month 3, D1.1: Report on current technology (report)
- Month 6, D1.2: Definition of architecture for scheduling, resource management, security and job description (report)
- Month 9, D1.3: Components and documentation for the 1st release: initial workload management system (prototype)

Proposed work plan
- Let’s continue the implementation of the proposed prototype
  - Evaluation of current technologies (Globus, Condor) (D1.1)
  - Functionalities for the 1st release
- First release
  - We can propose the functionalities that could be implemented
  - “Negotiation” in the ATF, to understand:
    - whether these functionalities “address” the proposed use cases
    - whether our module can be “plugged” together with the other “pieces”
    - whether the other WPs can provide the functionalities required by WP 1

Proposed functionalities for the 1st release
- First version of the job description language (JDL)
- First version of the broker (master), which decides where to submit the jobs
- Job submission service
- First version of logging and bookkeeping services
- First user interface

Job Description Language (JDL)
- Used when the job is submitted, to specify:
  - The application
  - The input data set
    - File? Collection of files?
    - “Logical” or “physical” names?
    - Needs to be discussed with WP 2, WP 8, ATF
  - Where the output data must be saved
  - (Required and preferable) resources
  - Info for bookkeeping
  - … ???
- Prototype: Condor ClassAds
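As a rough illustration of what such a job description might carry, the sketch below models one as a plain Python dictionary and renders it in a ClassAd-like textual form (attribute = value pairs). Every attribute name and value here is a hypothetical placeholder, and the rendering is only an approximation of ClassAd syntax; the slides leave the actual JDL schema open.

```python
# Hypothetical job description. Attribute names are illustrative
# assumptions, not a defined JDL schema.
job_description = {
    "Executable": "example_app",                    # the application
    "InputData": ["example_logical_file_name"],     # input data set
    "OutputLocation": "example_output_location",    # where output is saved
    "Requirements": {"Arch": "INTEL", "MinScratchMB": 500},  # required
    "Preferences": {"Site": "example_site"},        # preferable resources
    "Bookkeeping": {"ProductionId": "example-001"},  # info for bookkeeping
}


def format_value(value):
    """Render one value in a ClassAd-like style."""
    if isinstance(value, dict):
        inner = "; ".join(f"{k} = {format_value(v)}" for k, v in value.items())
        return f"[ {inner} ]"
    if isinstance(value, list):
        return "{ " + ", ".join(format_value(v) for v in value) + " }"
    if isinstance(value, str):
        return f'"{value}"'
    return str(value)


def to_classad_text(description):
    """Render the whole description as one bracketed block."""
    body = ";\n".join(
        f"  {name} = {format_value(value)}" for name, value in description.items()
    )
    return "[\n" + body + "\n]"


print(to_classad_text(job_description))
```

Keeping the description as attribute/value pairs means new attributes (for example, further bookkeeping info) can be added without changing the format.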

Broker/Master
- Choice of the resource (farm) where the job is submitted
  - Input: JDL expression
  - Output: computing resource choice
- Published resource access lists (gridmap-files in the Globus-based prototype) are checked as a first step in the resource match-making
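The access-list check can be sketched as follows, assuming each resource publishes the text of its gridmap-file, where every entry is a quoted certificate subject name followed by a local account. The function names, resource names and sample entries are hypothetical.

```python
import shlex


def parse_gridmap(text):
    """Return {subject name: local account} from gridmap-file text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # handles the quoted subject name
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def accessible_resources(user_dn, gridmaps):
    """gridmaps maps a resource name to its gridmap-file text.
    Only resources listing the user survive this first step."""
    return [
        name for name, text in gridmaps.items()
        if user_dn in parse_gridmap(text)
    ]


# Hypothetical sample data.
gridmaps = {
    "farm_a": '# example entry\n"/O=Grid/O=Example/CN=Example User" exuser\n',
    "farm_b": '"/O=Grid/O=Example/CN=Another User" another\n',
}
print(accessible_resources("/O=Grid/O=Example/CN=Example User", gridmaps))
```

Only the resources returned here would then be passed on to the actual matching step.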

Broker/Master
The “accessible” computing resources are matched with the job request according to:
- Availability of the requested input data set
  - In the 1st release the broker will have to choose a resource where this input data set is already available (we are not going to “trigger” the replication of the input data set)
- Availability of the appropriate application “sandbox”
  - If it is not already available in the executing farm, it may be necessary to “copy” and install this sandbox (“code migration”) (in the 1st release ???)
- Queue characteristics and status (architecture, etc.) vs. job requests
  - Let’s start with a few, simple parameters
- Availability of the requested amount of scratch space
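The criteria above can be sketched as a simple filter. This is a minimal illustration under assumed data structures: the dictionary keys, farm names and the `allow_code_migration` switch are all hypothetical, and the only queue parameter checked is the architecture.

```python
def matches(job, resource, allow_code_migration=False):
    """Return True if the resource satisfies the job request."""
    # 1st release: the input data set must already be at the resource.
    if job["input_dataset"] not in resource["datasets"]:
        return False
    # The sandbox must be installed, unless code migration is allowed.
    if job["sandbox"] not in resource["sandboxes"] and not allow_code_migration:
        return False
    # A few, simple queue parameters: here only the architecture.
    if job["arch"] != resource["arch"]:
        return False
    # The requested amount of scratch space must be free.
    return resource["free_scratch_mb"] >= job["scratch_mb"]


def candidate_resources(job, resources, allow_code_migration=False):
    """Names of all resources that match the job request."""
    return [
        name for name, res in resources.items()
        if matches(job, res, allow_code_migration)
    ]


# Hypothetical farms and job request.
resources = {
    "farm_a": {"datasets": {"ds1"}, "sandboxes": {"app_v1"},
               "arch": "INTEL", "free_scratch_mb": 2000},
    "farm_b": {"datasets": {"ds1"}, "sandboxes": set(),
               "arch": "INTEL", "free_scratch_mb": 5000},
    "farm_c": {"datasets": {"ds2"}, "sandboxes": {"app_v1"},
               "arch": "INTEL", "free_scratch_mb": 5000},
}
job = {"input_dataset": "ds1", "sandbox": "app_v1",
       "arch": "INTEL", "scratch_mb": 1000}

print(candidate_resources(job, resources))
print(candidate_resources(job, resources, allow_code_migration=True))
```

With code migration disabled, only the farm that already holds both the data set and the sandbox qualifies; enabling it also admits the farm that holds the data set but lacks the sandbox.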

Broker/Master
- We assume that all the information needed by the broker is “published” in one “Grid Information Space” (GIS in the Globus-based prototype) by the other WPs
- Prototype: Condor matchmaking library
  - Match between the info published in the GIS and the ClassAds defined in the JDL
  - A “translator” GIS attributes → ClassAds is necessary
  - Some work already done by the Globus team ???
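A translator of this kind might look like the sketch below, assuming the GIS returns each entry as attribute names mapped to lists of strings (as an LDAP-style directory typically does). The attribute names on both sides are invented for illustration, and the `satisfies` helper is only a stand-in for the Condor matchmaking library, not its API.

```python
# Hypothetical mapping: GIS attribute name -> (ClassAd name, value type).
# Neither set of names is taken from a real schema.
ATTRIBUTE_MAP = {
    "freecpus": ("FreeCPUs", int),
    "architecture": ("Arch", str),
    "scratchspacemb": ("FreeScratchMB", int),
}


def gis_to_classad(gis_entry):
    """Translate one GIS entry into a flat ClassAd-like dictionary."""
    ad = {}
    for gis_name, values in gis_entry.items():
        key = gis_name.lower()
        if key not in ATTRIBUTE_MAP or not values:
            continue  # unknown or empty attributes are skipped
        ad_name, convert = ATTRIBUTE_MAP[key]
        ad[ad_name] = convert(values[0])  # first value only, typed
    return ad


def satisfies(ad, requirements):
    """requirements maps a ClassAd attribute to a predicate on its value."""
    return all(
        name in ad and test(ad[name]) for name, test in requirements.items()
    )


# Hypothetical GIS entry and job requirements.
entry = {"FreeCPUs": ["12"], "Architecture": ["INTEL"], "hostname": ["example-host"]}
ad = gis_to_classad(entry)
requirements = {"Arch": lambda v: v == "INTEL", "FreeCPUs": lambda v: v >= 4}
print(ad, satisfies(ad, requirements))
```

Converting values to typed attributes at translation time is what lets numeric comparisons in the job requirements work on data that the directory stores as strings.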

Job submission service
- Input: job to submit + computing resource choice (provided by the broker)
- Reliable, fault-tolerant, crash-proof service
  - Reliability in the executing machines is up to WP 4
- Prototype: Condor-G
  - Submission of jobs to Globus resources (farms)
  - New implementation of Condor-G (+ new Globus job manager) available soon
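One generic way to make a submission service survive crashes is to record every request in an append-only journal before acting on it, and to replay unconfirmed requests after a restart. The sketch below illustrates that idea only; it is not a description of how Condor-G works internally, and the class, file and job names are hypothetical.

```python
import json
import os


class SubmissionJournal:
    """Append-only journal of submission requests and confirmations."""

    def __init__(self, path):
        self.path = path

    def _append(self, record):
        with open(self.path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")
            handle.flush()
            os.fsync(handle.fileno())  # force the record to disk

    def record_request(self, job_id, resource):
        """Log the broker's choice before the job is submitted."""
        self._append({"job": job_id, "resource": resource, "state": "requested"})

    def record_submitted(self, job_id):
        """Log that the resource has accepted the job."""
        self._append({"job": job_id, "state": "submitted"})

    def pending(self):
        """Jobs requested but not confirmed: resubmit these after a crash."""
        unconfirmed = {}
        if not os.path.exists(self.path):
            return unconfirmed
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                record = json.loads(line)
                if record["state"] == "requested":
                    unconfirmed[record["job"]] = record["resource"]
                else:
                    unconfirmed.pop(record["job"], None)
        return unconfirmed
```

After a restart, a new `SubmissionJournal` opened on the same file reports through `pending()` exactly the jobs whose submission was never confirmed.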

“Code” migration
- Not easy at all!!!
  - It is necessary to “install” a complex run-time environment in the target farm
- A STRONG collaboration with WP 8 (and WP 4) is necessary to define an “application sandbox” that can easily be installed in a farm and doesn’t “conflict” with other sandboxes
- Use of “application repositories” ???
  - When an application must be installed on a farm, the sandbox is downloaded from such a repository

Bookkeeping
Necessary to “record” for each job:
- Submitting user identity
- Input data
- Output data
- Status of processing
- Where and when the processing has been done
- Other bookkeeping info specified in the JDL
- … ???
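The fields listed above map naturally onto a per-job record. The following is a minimal sketch; the class name, field names and status values are assumptions made for illustration, not part of any defined bookkeeping schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class BookkeepingRecord:
    """One record per job (hypothetical field names)."""
    job_id: str
    user_identity: str                       # submitting user identity
    input_data: list                         # input data
    output_data: list = field(default_factory=list)   # output data
    status: str = "SUBMITTED"                # status of processing
    processed_at_site: Optional[str] = None  # where the job ran
    processed_at_time: Optional[datetime] = None  # when it ran
    extra: dict = field(default_factory=dict)  # other info from the JDL

    def mark_done(self, site, outputs, when=None):
        """Record where and when processing finished, and its outputs."""
        self.status = "DONE"
        self.processed_at_site = site
        self.processed_at_time = when or datetime.now(timezone.utc)
        self.output_data = list(outputs)


record = BookkeepingRecord(
    job_id="example-job-1",
    user_identity="/O=Grid/O=Example/CN=Example User",
    input_data=["example_input"],
    extra={"ProductionId": "example-001"},
)
record.mark_done("example_farm", ["example_output"])
print(record.status, record.processed_at_site)
```

The `extra` dictionary is where bookkeeping info specified in the JDL could be carried through unchanged.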

Logging
Necessary to keep track of the significant events that occurred in the system:
- Requests by users
- Computing resource choice (by broker)
- Submission to resource
- … ???

User Interface
- Job management
  - Job submission
  - Job removal
  - Job status monitoring
- Access to bookkeeping info
- Access to logging info
- … ???