Published by Ezra Hood. Modified over 9 years ago.
1
Grid Workload Management & Condor
Massimo Sgaravatto
INFN Padova
2
Grid
What is a Grid?
"Dependable, consistent, pervasive access to resources"
In other words, Grid technology makes it easy to use diverse, geographically distributed, locally managed and controlled computing facilities as if they formed a coherent local cluster.
What does the Grid do for you?
You submit your work, and the Grid:
- finds convenient places for it to be run
- organises efficient access to your data (caching, migration, replication)
- deals with authentication to the different sites that you will be using
- interfaces to local site resource allocation mechanisms and policies
- runs your jobs
- monitors progress
- recovers from problems
- tells you when your work is complete
3
The DataGrid Project
Principal goals:
- Middleware for grid and fabric management
- Large-scale testbed
- Production-quality demonstrations ("mock data", simulation analysis, current experiments)
- Three-year phased developments and demos
- Collaborate with and complement other European and US projects
4
The DataGrid Project
WP 1 Grid Workload Management (INFN)
WP 2 Grid Data Management (CERN)
WP 3 Grid Monitoring Services (PPARC)
WP 4 Fabric Management (CERN)
WP 5 Mass Storage Management (PPARC)
WP 6 Integration Testbed (CNRS)
WP 7 Network Services (CNRS)
WP 8 HEP Applications (CERN)
WP 9 EO Science Applications (ESA)
WP 10 Biology Applications (CNRS)
WP 11 Dissemination (CNR)
WP 12 Project Management (CERN)
http://www.cern.ch/grid
5
INFN-GRID Project
- Deployment inside INFN of a prototype GRID infrastructure that will allow easy access to very large data volumes and to the large computing resources distributed across the 26 INFN sites
- The deployment of the GRID software inside INFN will be steered by the needs of future experiments
- Together with similar resources at other national centres and at CERN, the INFN GRID infrastructure will provide the testbeds for the European DataGrid project
http://www.infn.it/grid
6
Workload Management
Goal: define and implement a suitable architecture for distributed scheduling and resource management in a Grid environment
- Large, heterogeneous environment
- Large numbers (thousands) of independent users at many different sites
- Different applications with different requirements
Many challenging issues:
- Optimizing the choice of execution location based on the availability of data, computation and network resources
- Optimal co-allocation and advance reservation of CPU, data and network
- Uniform interface to possibly different local resource management systems
- Priorities and policies on resource usage
- Reliability
- Fault tolerance
- Scalability
- …
INFN responsibility in DataGrid, with contributions also from Cesnet and Datamat
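The first issue listed above, choosing an execution location from the availability of data, computation and network resources, can be illustrated as a cost function. This is a minimal sketch under stated assumptions, not the DataGrid broker's algorithm. The `Site` fields, the staging-time formula and the queue-wait heuristic are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    free_cpus: int         # idle CPUs the site currently advertises
    queued_jobs: int       # jobs already waiting in the local queue
    has_input_data: bool   # True if a replica of the input dataset is local
    bandwidth_mbps: float  # estimated bandwidth for staging the data in


def staging_seconds(site: Site, dataset_mb: float) -> float:
    """Time to copy the input dataset to the site (zero if already there)."""
    if site.has_input_data:
        return 0.0
    return dataset_mb * 8 / site.bandwidth_mbps  # MB -> megabits


def queue_wait_seconds(site: Site, job_seconds: float) -> float:
    """Crude wait estimate: queued jobs share the site's free CPUs."""
    if site.free_cpus <= 0:
        return float("inf")
    return (site.queued_jobs / site.free_cpus) * job_seconds


def choose_site(sites: List[Site], dataset_mb: float,
                job_seconds: float) -> Optional[Site]:
    """Return the site with the lowest estimated start delay, or None."""
    usable = [s for s in sites if s.free_cpus > 0]
    if not usable:
        return None
    return min(usable, key=lambda s: staging_seconds(s, dataset_mb)
               + queue_wait_seconds(s, job_seconds))
```

With a 1000 MB dataset, a site that already holds the data wins while its queue is empty. Once its queue is long enough, paying 80 s of transfer time to an idle site becomes the better choice. A real broker would also need the co-allocation, reservation and policy aspects listed above, which this sketch ignores.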
7
Work plan (near future)
Implement the 1st workload management system (deliverable for project month 9)
- Where possible, we plan to profit from functionalities, services, … provided by existing tools and technologies
- Condor could be one of these technologies
8
Functionalities foreseen for the 1st release
- First version of the job description language (JDL)
  Used when the job is submitted, to specify the job characteristics (input data, output data location, required resources, …)
  Prototype: Condor ClassAds
- First version of the broker (master), which chooses the resources (farms) to which jobs are submitted
  Computing resources are matched with the job request according to authorization policies, availability of the requested input data set, availability of the appropriate application "sandbox", queue characteristics and status, and availability of the requested amount of scratch space
  All information needed by the broker must be "published" in a single "Grid Information Space" (Globus GIS in the 1st prototype)
  Prototype: Condor matchmaking library, matching the information published in the GIS against the ClassAds specified in the JDL
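In Condor, a ClassAd is a set of attributes that includes `Requirements` and `Rank` expressions. Matchmaking is two-way: a job and a resource match only if each ad's `Requirements` holds for the other. Among the matches, `Rank` expresses preference. The following minimal sketch emulates that idea in Python, using lambdas instead of real ClassAd expression syntax. `OpSys`, `Owner`, `Requirements` and `Rank` echo standard ClassAd attribute names. `Datasets`, `ScratchMB`, `AuthorizedUsers`, `InputDataset` and the sample values are hypothetical stand-ins for what the broker might read from the GIS.

```python
from typing import Any, Dict, List, Optional

# A toy "ad": a dict of attributes plus Requirements and Rank entries, loosely
# mirroring ClassAds. Expressions are Python callables taking (my, target)
# instead of real ClassAd expression syntax.
Ad = Dict[str, Any]


def matches(a: Ad, b: Ad) -> bool:
    """Two-way match: each ad's Requirements must accept the other ad."""
    return bool(a["Requirements"](a, b)) and bool(b["Requirements"](b, a))


def best_match(job: Ad, resources: List[Ad]) -> Optional[Ad]:
    """Return the matching resource with the highest job Rank, or None."""
    candidates = [r for r in resources if matches(job, r)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: job["Rank"](job, r))


def farm_ad(name: str, datasets: set, scratch_mb: int, free_cpus: int,
            authorized: set) -> Ad:
    """Hypothetical resource ad as it might be published in the GIS."""
    return {
        "Name": name,
        "OpSys": "LINUX",
        "Datasets": datasets,
        "ScratchMB": scratch_mb,
        "FreeCPUs": free_cpus,
        "AuthorizedUsers": authorized,
        # Resource side: only accept jobs from authorized owners.
        "Requirements": lambda my, target: target["Owner"] in my["AuthorizedUsers"],
    }


job = {
    "Owner": "cms",
    "InputDataset": "mc_sample_01",
    "ScratchNeededMB": 500,
    # Job side: right OS, input data available locally, enough scratch space.
    "Requirements": lambda my, target: (
        target["OpSys"] == "LINUX"
        and my["InputDataset"] in target["Datasets"]
        and target["ScratchMB"] >= my["ScratchNeededMB"]
    ),
    # Prefer the farm with the most idle CPUs.
    "Rank": lambda my, target: target["FreeCPUs"],
}

farms = [
    farm_ad("Site1", {"mc_sample_01"}, 2000, 4, {"cms"}),
    farm_ad("Site2", {"mc_sample_01"}, 4000, 12, {"atlas"}),
    farm_ad("Site3", set(), 8000, 30, {"cms"}),
]
```

Here `best_match(job, farms)` selects Site1. Site2 has more idle CPUs, but its own `Requirements` reject the `cms` owner, so its authorization policy fails. Site3 does not hold the input data set. The two-way check is what lets resource owners publish their policies alongside their capabilities.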
9
Functionalities foreseen for the 1st release
- Job submission service
  Reliable, fault-tolerant, crash-proof service
  Prototype: Condor-G, for submission of jobs to Globus resources (farms managed by local resource management systems)
- First version of the logging and bookkeeping services
- First user interface
  Mainly for job management (job submission, job removal, job status monitoring, …)
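A crash-proof submission service must not lose track of jobs when the submitting machine goes down. The general idea is to make every job state change durable before acting on it. The following sketch does this with an append-only log, which also doubles as a minimal bookkeeping record. It is an illustration under that assumption only. It is not Condor-G's code or on-disk format, and the class name, file layout and state names are hypothetical.

```python
import json
import os
from typing import Dict, List

TERMINAL_STATES = {"DONE", "REMOVED"}


class PersistentJobQueue:
    """Toy crash-proof job queue backed by an append-only log.

    Each state change is written and fsync'ed before the in-memory state is
    updated, so after a crash the queue is rebuilt by replaying the log.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.state: Dict[str, str] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as log:
                for line in log:
                    try:
                        record = json.loads(line)
                    except json.JSONDecodeError:
                        break  # torn final record from a crash mid-write
                    self.state[record["job"]] = record["state"]

    def _append(self, job_id: str, new_state: str) -> None:
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"job": job_id, "state": new_state}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.state[job_id] = new_state

    def submit(self, job_id: str) -> None:
        self._append(job_id, "IDLE")

    def update(self, job_id: str, new_state: str) -> None:
        if job_id not in self.state:
            raise KeyError(f"unknown job {job_id}")
        self._append(job_id, new_state)

    def unfinished(self) -> List[str]:
        """Jobs that a restarted service must resume monitoring or resubmit."""
        return sorted(j for j, s in self.state.items() if s not in TERMINAL_STATES)
```

After a restart, creating a new `PersistentJobQueue` on the same file restores every job's last recorded state. `unfinished()` then lists the jobs to reconnect to. A production service would also truncate a torn tail before appending again, which this toy omits.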
10
Prototype workload management system architecture
[Diagram] Users submit jobs, described using ClassAds, to the Master. The Master performs resource discovery through the Grid Information Service (GIS), which holds information on the characteristics and status of local resources plus other information. It then hands each job to Condor-G with condor_submit (Globus Universe). Condor-G submits the jobs through Globus GRAM to the farms at three sites (Site1, Site2, Site3), which are managed by local resource management systems (Condor, LSF, PBS).
- Globus GRAM as a uniform interface to different local resource management systems
- Condor-G as a reliable, crash-proof job submission service
- Master chooses the Globus resources to which the jobs must be submitted
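The hand-off from the Master to Condor-G in this architecture can be sketched as follows. The Master picks a resource from the information published in the GIS and then writes a Globus-universe submit description for condor_submit. The `universe = globus` and `globusscheduler` keywords follow the Condor-G submit syntax of that period. Later Condor releases use `universe = grid` with `grid_resource` instead. The `GisEntry` schema, the "most idle CPUs" selection rule, the host names and `cmsim.sh` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GisEntry:
    """A resource as published in the information service (hypothetical schema)."""
    gatekeeper: str  # host running the Globus gatekeeper
    lrms: str        # local resource management system: "condor", "lsf", "pbs"
    free_cpus: int


def pick_resource(entries: List[GisEntry]) -> GisEntry:
    """Stand-in for the master's choice: simply take the most idle CPUs."""
    return max(entries, key=lambda e: e.free_cpus)


def condor_g_submit_description(entry: GisEntry, executable: str,
                                job_name: str) -> str:
    """Build a Globus-universe submit description for condor_submit."""
    # GRAM contact string: gatekeeper host plus the jobmanager for the LRMS.
    contact = f"{entry.gatekeeper}/jobmanager-{entry.lrms}"
    lines = [
        "universe        = globus",
        f"globusscheduler = {contact}",
        f"executable      = {executable}",
        f"output          = {job_name}.out",
        f"error           = {job_name}.err",
        f"log             = {job_name}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    published = [
        GisEntry("gk.site1.example.org", "condor", 5),
        GisEntry("gk.site2.example.org", "lsf", 20),
        GisEntry("gk.site3.example.org", "pbs", 8),
    ]
    chosen = pick_resource(published)
    print(condor_g_submit_description(chosen, "cmsim.sh", "run001"))
```

Only the jobmanager suffix changes from one farm to another. This reflects the point on the slide: GRAM hides the differences between Condor, LSF and PBS, so the Master and Condor-G treat all sites uniformly.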
11
Where we are
Ongoing activities: evaluating the existing technologies and "putting together" the various building blocks
- Evaluation of Globus
  Collaboration with WP 1 of the INFN-GRID project (evaluation of the Globus toolkit): http://www.infn.it/globus
- Evaluation of Condor-G
  The current implementation is a prototype: it works, but some problems must be solved
  A new implementation will be ready in a few weeks
- Globus + Condor-G tested with a real CMS Monte Carlo production
  Many, many memory leaks found in the Globus jobmanager! Fixes provided by Francesco Prelz
12
Layout for CMS production
[Diagram] The production manager (Ivano Lippi, Padova) submits jobs with condor_submit (Globus Universe) to Condor-G. Condor-G forwards them through Globus GRAM to farms managed by local resource management systems (Condor and LSF). Sites shown: Padova, Bologna, Pisa.
13
Conclusions
We can exploit several Condor functionalities and features:
- Condor as the underlying resource management system for Globus (resource management system for farms)
  Would it be viable to configure the INFN Condor pool as a Globus resource?
- Condor ClassAds as the job description language: very flexible!
- Condor matchmaking library: match between requirements and offers
- Condor-G as the job submission service: reliable, fault-tolerant, crash-proof