Workload Management Workpackage


Workload Management Workpackage
Massimo Sgaravatto, INFN Padova

Overview
- Goal: define and implement a suitable architecture for distributed scheduling and resource management in a GRID environment
- Large heterogeneous environment
  - PC farms, not supercomputers, are used in HEP
- Large numbers (thousands) of independent users at many different sites
- Different applications with different requirements
  - HEP Monte Carlo productions, reconstructions and production analyses
    - “Scheduled” activities
    - Goal: throughput maximization
  - HEP individual physics analyses
    - “Chaotic”, non-predictable activities
    - Goal: latency minimization
- …

Overview
- Many challenging issues:
  - Optimizing the choice of execution location based on the availability of data, computation and network resources
  - Optimal co-allocation and advance reservation of CPU, data and network
  - Uniform interface to different local resource management systems
  - Priorities and policies on resource usage
  - Reliability
  - Fault tolerance
  - Scalability
  - …
- INFN responsibility in DataGrid

Tasks
- Job resource specification and job description
  - Method to define and publish the resources required by a job
  - Job control language (command line tool, API, GUI)
- Partitioning programs for parallel execution
  - “Decomposition” of single jobs into multiple, “smaller” jobs that can be executed in parallel
  - Exploitation of task and data parallelism
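
The decomposition task can be illustrated with a minimal sketch. It assumes a Monte Carlo production whose events are independent, so a single job can be split into contiguous event ranges that run in parallel (data parallelism). The names SubJob and split_job are hypothetical and are not part of any DataGrid or Condor interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubJob:
    """One independent piece of a larger production job (hypothetical type)."""
    index: int
    first_event: int
    n_events: int


def split_job(total_events: int, events_per_job: int) -> list[SubJob]:
    """Decompose one production of `total_events` into independent sub-jobs.

    Each sub-job covers a contiguous, non-overlapping range of events, so the
    pieces can run in parallel on different resources (data parallelism).
    The last sub-job is shorter when the division is not exact.
    """
    if total_events < 0 or events_per_job <= 0:
        raise ValueError("need total_events >= 0 and events_per_job > 0")
    return [
        SubJob(index, first, min(events_per_job, total_events - first))
        for index, first in enumerate(range(0, total_events, events_per_job))
    ]


if __name__ == "__main__":
    for job in split_job(total_events=2500, events_per_job=1000):
        print(job)
```

With 2500 events and 1000 events per job this yields three sub-jobs covering events 0-999, 1000-1999 and 2000-2499. Splitting by event ranges only works when events carry no state between each other; task parallelism (different steps of a chain) would need a dependency-aware split instead.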

Tasks
- Scheduling
  - Definition and implementation of scheduling policies to find the best match between job requirements and available resources
  - Co-allocation and advance reservation
- Resource management
- Services
  - Authentication, authorization, bookkeeping, accounting, logging, …
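
Finding “the best match between job requirements and available resources” is commonly done in two stages: discard resources that violate the job's hard requirements, then rank the remaining ones. The sketch below is loosely inspired by Condor ClassAd matchmaking (Requirements plus Rank) but is not the ClassAd language; the Resource fields and the names JobRequest and best_match are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Resource:
    """What an information service might publish about a resource (hypothetical fields)."""
    name: str
    free_cpus: int
    memory_mb: int
    local_files: frozenset[str] = field(default_factory=frozenset)


@dataclass
class JobRequest:
    """A job described by hard constraints plus a preference function."""
    requirements: Callable[[Resource], bool]
    rank: Callable[[Resource], float]


def best_match(job: JobRequest, resources: list[Resource]) -> Optional[Resource]:
    """Keep the resources that satisfy the requirements, then pick the highest rank."""
    candidates = [r for r in resources if job.requirements(r)]
    if not candidates:
        return None
    return max(candidates, key=job.rank)


if __name__ == "__main__":
    resources = [
        Resource("site-a-lsf", free_cpus=10, memory_mb=512, local_files=frozenset({"run42.db"})),
        Resource("site-b-condor", free_cpus=40, memory_mb=512),
        Resource("site-c-pbs", free_cpus=0, memory_mb=1024, local_files=frozenset({"run42.db"})),
    ]
    job = JobRequest(
        # Hard constraints: at least one free CPU and 256 MB of memory.
        requirements=lambda r: r.free_cpus >= 1 and r.memory_mb >= 256,
        # Prefer resources that already hold the input data, then more free CPUs.
        rank=lambda r: 100 * ("run42.db" in r.local_files) + r.free_cpus,
    )
    print(best_match(job, resources).name)
```

Here site-c-pbs is excluded (no free CPUs) and site-a-lsf wins over site-b-condor because it already holds the input file, which is the kind of data-aware choice of execution location listed among the challenging issues. Co-allocation and advance reservation are not modelled.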

Effort breakdown (mm = man-months)

Partner    Funded  Unfunded  Total
INFN          216       184    400
DATAMAT       108         -    108
CESnet         72        72    144
PPARC           -        18     18
Total         396       274    670
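
The flattened slide lists “DATAMAT 108”, “CESnet 72 144” and “PPARC 18” without column positions. The short check below assumes the split that reproduces the stated column totals (DATAMAT 108 funded, CESnet 72 funded + 72 unfunded, PPARC 18 unfunded); that split is an inference from the totals, not something the slide states explicitly.

```python
# Effort per partner in man-months: (funded, unfunded).
# Assumption: the DATAMAT, CESnet and PPARC splits are inferred so that the
# columns add up to the totals printed on the slide (396, 274, 670).
EFFORT_MM = {
    "INFN": (216, 184),
    "DATAMAT": (108, 0),
    "CESnet": (72, 72),
    "PPARC": (0, 18),
}


def column_totals(effort: dict[str, tuple[int, int]]) -> tuple[int, int, int]:
    """Return (funded, unfunded, grand total) summed over all partners."""
    funded = sum(f for f, _ in effort.values())
    unfunded = sum(u for _, u in effort.values())
    return funded, unfunded, funded + unfunded


if __name__ == "__main__":
    for partner, (f, u) in EFFORT_MM.items():
        print(f"{partner:8} {f:4} {u:4} {f + u:4}")
    print("Total   ", *column_totals(EFFORT_MM))
```

Any other placement of the single-valued entries (for example DATAMAT's 108 as unfunded) breaks at least one of the printed column totals.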

Italian participation
- Bologna (INFN)
- Catania (INFN and University)
- CNAF (INFN and University of Bologna)
- Lecce (INFN and University)
- Milano (INFN)
- Padova (INFN and University of Venezia)
- Roma (INFN)
- Torino (INFN and University)

Workload Management in the INFN-GRID project
- Integration, adaptation and deployment of middleware developed within the DataGrid project
- GRID software must enable physicists to run their jobs using all the available GRID resources in a “transparent” way
- HEP applications are classified into 3 different “classes”, with an increasing level of complexity:
  - Workload management system for Monte Carlo productions
    - Goal: throughput maximization
    - Implementation strategy: code migration (moving the application to where the processing will be performed)
  - Workload management system for data reconstruction and production analysis
    - Implementation strategy: code migration + data migration (moving the data to where the processing will be performed, and collecting the outputs in a central repository)
  - Workload management system for individual physics analysis
    - “Chaotic” processing
    - Goal: latency minimization
    - Implementation strategy: code migration + data migration + remote data access (accessing data remotely) for client/server applications
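
The three classes are cumulative: each one keeps the implementation strategies of the previous class and adds one more. A minimal sketch of that relationship follows; the enum and helper names are hypothetical, and the goals come from the Overview slide (throughput maximization for the scheduled activities, latency minimization for individual analyses).

```python
from enum import IntEnum


class WorkloadClass(IntEnum):
    """The three HEP application classes, ordered by increasing complexity."""
    MC_PRODUCTION = 1
    RECONSTRUCTION_AND_PRODUCTION_ANALYSIS = 2
    INDIVIDUAL_ANALYSIS = 3


# Strategy introduced at each level; higher classes inherit all lower ones.
NEW_STRATEGY = {
    WorkloadClass.MC_PRODUCTION: "code migration",
    WorkloadClass.RECONSTRUCTION_AND_PRODUCTION_ANALYSIS: "data migration",
    WorkloadClass.INDIVIDUAL_ANALYSIS: "remote data access",
}

GOAL = {
    WorkloadClass.MC_PRODUCTION: "throughput maximization",
    WorkloadClass.RECONSTRUCTION_AND_PRODUCTION_ANALYSIS: "throughput maximization",
    WorkloadClass.INDIVIDUAL_ANALYSIS: "latency minimization",
}


def strategies_for(wc: WorkloadClass) -> list[str]:
    """All strategies a class needs: its own plus those of every simpler class."""
    return [NEW_STRATEGY[c] for c in WorkloadClass if c <= wc]


if __name__ == "__main__":
    for wc in WorkloadClass:
        print(f"{wc.name}: {GOAL[wc]}; " + " + ".join(strategies_for(wc)))
```

Encoding the classes as ordered levels makes the “incremental level of complexity” explicit: supporting a class means supporting every mechanism below it.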

First Activities and Results
- CMS-HLT use case (Monte Carlo production and reconstruction) analyzed in terms of GRID requirements and availability of GRID tools
- Discussions with the Globus team and the Condor team
  - Good and productive collaborations already in place
- Definition of a possible high throughput workload management system architecture
  - Use of Globus and Condor mechanisms
  - But major developments needed

High throughput workload management system architecture (simplified design)
[Diagram, reconstructed as a top-to-bottom flow]
- Jobs are submitted (using Class-Ads) to the Master
- The Master performs resource discovery through the Grid Information Service (GIS), which holds information on the characteristics and status of local resources, plus other info
- The Master chooses the Globus resources to which the jobs must be submitted, and submits them to Condor-G with condor_submit (Globus Universe)
  - Condor-G able to provide reliability
  - Use of Condor tools for job monitoring, logging, …
- Condor-G submits the jobs (globusrun) to the Globus GRAM service of each site
  - Globus GRAM as uniform interface to different local resource management systems
- Local resource management systems (CONDOR, LSF, PBS) manage the farms at Site1, Site2 and Site3
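
Once the Master has chosen a Globus resource, the hand-off to Condor-G is a submit description file in the Globus universe. The sketch below generates one, assuming the Condor-G syntax of that period (universe = globus, with the GRAM contact in globusscheduler); later HTCondor releases replaced this with the grid universe. The gatekeeper host, executable and file names are hypothetical placeholders.

```python
def gram_contact(gatekeeper_host: str, lrms: str) -> str:
    """GRAM resource contact: gatekeeper host followed by the jobmanager
    that fronts the site's local resource management system."""
    return f"{gatekeeper_host}/jobmanager-{lrms.lower()}"


def condor_g_submit_description(executable: str, contact: str, tag: str) -> str:
    """Text of a Condor-G submit description file for one Globus-universe job."""
    commands = {
        "universe": "globus",
        "globusscheduler": contact,  # GRAM contact chosen by the Master
        "executable": executable,
        "output": f"{tag}.out",
        "error": f"{tag}.err",
        "log": f"{tag}.log",  # user log, used by Condor tools for job monitoring
    }
    lines = [f"{key} = {value}" for key, value in commands.items()]
    lines.append("queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical gatekeeper and executable names, for illustration only.
    contact = gram_contact("gatekeeper.site1.example.org", "LSF")
    print(condor_g_submit_description("cms_mc_job.sh", contact, "job001"), end="")
```

The printed text would be saved to a file and passed to condor_submit. Because the local system (LSF, PBS or Condor) only appears in the jobmanager name, the same description works for every site, which is what makes GRAM a uniform interface in this design.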

First Activities and Results
- Ongoing activities in putting together the various building blocks
- Globus deployment
  - INFNGRID distribution toolkit to make Globus deployment easier and more automatic
  - INFN customizations
- Evaluation of Globus GRAM
  - Tests with job submissions on remote resources
  - Globus GRAM as uniform interface to different underlying resource management systems (LSF, Condor, PBS)
  - Evaluation of Globus RSL as uniform language to describe resources
  - “Cooperation” between GRAM and GIS
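
Globus RSL expresses a job request as a conjunction (&) of (attribute=value) relations, independently of the local resource management system behind GRAM. A minimal builder is sketched below. It assumes our understanding of the GT2 RSL quoting rules (values in double quotes, embedded double quotes doubled); build_rsl and rsl_value are hypothetical helpers, and the attribute names in the example (executable, count, queue) are ordinary GRAM attributes.

```python
def rsl_value(value: object) -> str:
    """Quote a value for RSL: wrap in double quotes, doubling embedded quotes."""
    return '"' + str(value).replace('"', '""') + '"'


def build_rsl(**attributes: object) -> str:
    """Build a conjunctive (&) RSL request, one (name=value) relation per attribute.

    List or tuple values (e.g. arguments) become several quoted values
    separated by spaces.
    """
    relations = []
    for name, value in attributes.items():
        if isinstance(value, (list, tuple)):
            rendered = " ".join(rsl_value(v) for v in value)
        else:
            rendered = rsl_value(value)
        relations.append(f"({name}={rendered})")
    return "&" + "".join(relations)


if __name__ == "__main__":
    print(build_rsl(executable="/bin/hostname", count=2, queue="short"))
```

The resulting string is the kind of request that can be passed to globusrun together with a resource contact (globusrun -r contact 'rsl'). Checking whether the same request is honoured identically by the LSF, Condor and PBS jobmanagers is the point of the RSL evaluation listed above.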

First Activities and Results
- Evaluation of Condor-G
  - It works, but some problems must be fixed:
    - Errors are very difficult to understand
    - Problems with log files
    - Problems with scalability on the submitting machine
  - Condor-G is not able to provide fault tolerance and robustness (because Globus doesn’t provide these features)
    - Fault tolerance only on the submitting side
  - The Condor team is already working to fix some of these problems
    - They are also implementing a new Globus jobmanager

First activities and results
- Tests with a real CMS MC production
  - Real applications (Pythia)
  - Real production environments
  - Jobs submitted from Padova using Condor-G and executed in Bologna and Pisa
- Many memory leaks found in the Globus jobmanager!
  - Fixes provided by Francesco Prelz (INFN Milano)

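Test layout for CMS production
[Diagram, reconstructed as a top-to-bottom flow]
- The production manager (Ivano Lippi, Padova) submits jobs with condor_submit (Globus Universe) to Condor-G in Padova
- Condor-G submits the jobs (globusrun) to the Globus GRAM services in Bologna and Pisa
- Local resource management systems (CONDOR, LSF) run the jobs on the farms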

Some next steps
- Evaluation of the new Globus jobmanager and the new Condor-G implementations (when ready)
- Master development!

Other info
- http://www.infn.it/grid