EGEE is a project funded by the European Union under contract INFSO-RI-508833 Practical approaches to Grid workload management in the EGEE project Massimo.

1 Practical approaches to Grid workload management in the EGEE project
Massimo Sgaravatto, INFN Padova
On behalf of the EGEE JRA1 IT-CZ cluster
CHEP 2004
www.eu-egee.org
EGEE is a project funded by the European Union under contract INFSO-RI-508833

2 EGEE project
- Aim: build a consistent, robust and secure Grid infrastructure
  - Focus first on two pilot application areas (HENP, biomedical applications)
  - But the goal is to reach other researchers in academia and industry
- Middleware activity (JRA1)
  - Re-engineer Grid software to provide production-quality middleware
  - Evolution towards emerging standards, based on Service Oriented Architectures
  - Taking into account application requirements and production/deployment/management needs
- See talk #247 (E. Laure)

3 Workload management
- Grid workload and resource management is one of the key functions of Grid middleware
  - How to efficiently schedule a large number of different data-intensive jobs, submitted by a distributed community of users, to a Grid encompassing many heterogeneous resources
- Progress was made in various projects with different integrated software solutions:
  - DataGrid Workload Management System
  - Condor
  - EuroGrid-Unicore resource broker
  - …
- Still a lot to do
  - Scalability, reliability
  - Identification and handling of failures originating from different software layers, and possibly from 'foreign' Grid systems and resources
  - Distributed (hierarchical?) super-scheduling
  - Proper semantics of resource information collection and distribution (push, pull, index, cache, refresh)
  - …

4 Workload Management System
- Provision of Grid Workload Management System services is assigned to the “EGEE JRA1 Italian-Czech cluster”
  - CESNET
  - Datamat S.p.A.
  - INFN
- Architecture of the EGEE WMS designed and being implemented
  - Taking into account feedback and requirements from reference applications and deployment/production/management activities
  - Taking into account previous experience from other Grid projects (in particular the DataGrid WMS)
  - Set of Grid services
    - Workload Manager (WM)
    - Computing Element (CE): resource access
    - Logging & Bookkeeping (L&B)
    - Job Provenance (JP)
    - Grid Accounting service
  - Interoperating among themselves and with other EGEE Grid services

5 Workload Manager

6 Workload Manager
- Job management requests (submission, cancellation) expressed via a Job Description Language (JDL)

7 Workload Manager
- Keeps submission requests
  - Requests are kept for a while if no matching resources are available
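A minimal Python sketch of the holding behaviour described on this slide. The class name `TaskQueue` echoes the component named on the status slide, but everything else here is an assumption made for illustration: the `max_attempts` limit stands in for "kept for a while", and `find_match` is a hypothetical callback that returns a CE identifier or `None`.

```python
from collections import deque


class TaskQueue:
    """Hypothetical holding area for requests that have no matching resource yet."""

    def __init__(self, max_attempts=3):
        self.pending = deque()          # entries are (request, attempts so far)
        self.max_attempts = max_attempts

    def submit(self, request):
        self.pending.append((request, 0))

    def process(self, find_match):
        """Try every pending request once; return (dispatched, expired)."""
        dispatched, expired = [], []
        for _ in range(len(self.pending)):
            request, attempts = self.pending.popleft()
            ce = find_match(request)
            if ce is not None:
                dispatched.append((request, ce))
            elif attempts + 1 >= self.max_attempts:
                expired.append(request)             # held long enough, give up
            else:
                self.pending.append((request, attempts + 1))  # keep for later
        return dispatched, expired
```

A real service would more plausibly expire requests on wall-clock time rather than on an attempt count; the counter simply keeps the sketch deterministic.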

8 Workload Manager
- Repository of resource information available to the matchmaker
  - Updated via notifications and/or active polling on sources
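The two update paths named on this slide (notifications and active polling) can be sketched as a small cache. This is only an illustration: the class name borrows "Information Supermarket" from the status slide, while the method names, the attribute dictionaries and the `max_age` freshness filter are assumptions, not the actual EGEE interfaces.

```python
class InformationSupermarket:
    """Hypothetical cache of resource information used by a matchmaker."""

    def __init__(self):
        self.resources = {}  # ce_id -> (attributes, time of last update)

    def on_notification(self, ce_id, attributes, now):
        """Update path 1: a source notifies us about its state."""
        self.resources[ce_id] = (dict(attributes), now)

    def poll(self, sources, now):
        """Update path 2: actively query each source (ce_id -> callable)."""
        for ce_id, query in sources.items():
            self.resources[ce_id] = (dict(query()), now)

    def fresh(self, now, max_age):
        """Return only the entries updated within the last max_age time units."""
        return {
            ce_id: attrs
            for ce_id, (attrs, updated) in self.resources.items()
            if now - updated <= max_age
        }
```

Timestamps are passed in explicitly so the behaviour is easy to reason about; a deployed service would read a clock instead.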

9 Workload Manager
- Finds an appropriate CE for each submission request, taking into account job requests and preferences, Grid status, and utilization policies on resources
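A rough Python sketch of matchmaking in the sense of this slide: filter CEs by the job's requirements, then order the survivors by the job's preferences. The dictionary layout, the `requirements` and `rank` callables and the attribute names are illustrative assumptions; utilization policies are not modelled here.

```python
def match(job, resources):
    """Return the id of the best-ranked CE that satisfies the job, or None.

    job:       {"requirements": callable(attrs) -> bool,
                "rank":         callable(attrs) -> number}   (hypothetical layout)
    resources: {ce_id: attrs}
    """
    candidates = [
        (job["rank"](attrs), ce_id)
        for ce_id, attrs in resources.items()
        if job["requirements"](attrs)
    ]
    if not candidates:
        return None
    # Highest rank wins; ties are broken by CE id for determinism.
    return max(candidates)[1]


resources = {
    "ce1": {"os": "linux", "free_cpus": 2},
    "ce2": {"os": "linux", "free_cpus": 8},
    "ce3": {"os": "aix", "free_cpus": 64},
}
job = {
    "requirements": lambda r: r["os"] == "linux",
    "rank": lambda r: r["free_cpus"],
}
best = match(job, resources)
```

Here `ce3` has the most free CPUs but fails the requirement, so the preference only orders the CEs that are eligible in the first place.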

10 Scheduling policies
- Different possible policies
  - Eager scheduling: a job is bound to a resource as soon as possible
    - Job is then forwarded to that CE, where it will very likely end up in a queue
  - Lazy scheduling: job held by the WM until a resource becomes available
    - Job is then forwarded to that CE for immediate execution
- WM architecture able to accommodate both models (and intermediate solutions)
  - Eager scheduling: matching a job against multiple resources
  - Lazy scheduling: matching a resource against multiple jobs
- Further investigation needed into the strengths and weaknesses of different policies in different scenarios
  - Evaluation of relevant metrics, covering both resource utilization and user satisfaction
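The symmetry between the two policies (one job against many resources versus one resource against many jobs) can be illustrated with two small Python functions. The field names (`memory_mb`, `queued_jobs`, `submitted_at`) and the selection criteria (shortest queue, oldest job) are assumptions chosen for the sketch, not policies stated in the talk.

```python
def eager_schedule(job, resources):
    """Eager: match ONE job against MANY resources and bind it immediately.

    Picks the eligible CE with the shortest queue, where the job will wait.
    """
    eligible = [r for r in resources if job["min_memory_mb"] <= r["memory_mb"]]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r["queued_jobs"])["name"]


def lazy_schedule(resource, held_jobs):
    """Lazy: match ONE free resource against MANY held jobs.

    Picks the oldest held job that fits, to run immediately on that resource.
    """
    eligible = [j for j in held_jobs if j["min_memory_mb"] <= resource["memory_mb"]]
    if not eligible:
        return None
    return min(eligible, key=lambda j: j["submitted_at"])["id"]
```

In the eager case the decision is taken at submission time with the information then available; in the lazy case it is deferred until a resource reports itself free, which is why the job can start immediately.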

11 Computing Element
- Service representing a computing resource
- Main functionality: job management
  - Run jobs
  - Cancel jobs
  - Suspend and resume jobs
  - Provide info on “quality of service”
    - How many resources match the job requirements?
    - What is the estimated time until the job starts executing?
    - …
  - …
- Used by the WM or by any other client (e.g. end-user)
- CE architecture designed to support both push and pull models
  - Push model: the job is pushed to the CE by the WM
  - Pull model: the CE asks the WM for jobs
- These two models are somewhat mirrored in the resource information flow
  - In order to 'pull' a job, a resource must choose where to 'push' information about itself
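A small Python sketch of the push and pull models, including the mirrored information flow: when pulling, the CE sends a description of itself to the WM it has chosen. All class, method and field names (`accept`, `request_job`, `free_slots`, `slots`) are hypothetical and only serve the illustration.

```python
class WorkloadManager:
    def __init__(self):
        self.held = []  # jobs waiting for a CE to ask for work

    def push(self, job, ce):
        """Push model: the WM sends the job to a CE it has selected."""
        ce.accept(job)

    def hold(self, job):
        self.held.append(job)

    def request_job(self, ce_info):
        """Pull model: a CE pushes information about itself and asks for a job."""
        for i, job in enumerate(self.held):
            if job["slots"] <= ce_info["free_slots"]:
                return self.held.pop(i)
        return None


class ComputingElement:
    def __init__(self, name, free_slots):
        self.name = name
        self.free_slots = free_slots
        self.running = []

    def accept(self, job):
        self.running.append(job["id"])
        self.free_slots -= job["slots"]

    def pull(self, wm):
        """Ask the chosen WM for work, describing our current state."""
        job = wm.request_job({"name": self.name, "free_slots": self.free_slots})
        if job is not None:
            self.accept(job)
        return job
```

The sketch does not queue pushed jobs that exceed the free slots; a real CE would hand them to its local batch system.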

12 CE Architecture
[Diagram labels: Client, WEB, CE, Mon, LSF, PBS?, Worker Nodes]
- Operations: JobSubmit, JobAssess, JobKill, JobSuspend, JobResume, JobGetStatus
- Web service accepting job management requests
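To make the six operation names on this slide concrete, here is an in-memory Python stand-in. Only the operation names come from the slide; the signatures, the `JobStatus` values, the job identifiers and the memory-based assessment are assumptions, and no web service layer or batch system is involved.

```python
from enum import Enum


class JobStatus(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"
    KILLED = "killed"


class InMemoryCE:
    """Hypothetical stand-in for a CE; not the actual EGEE interface."""

    def __init__(self, slot_memory_mb):
        self.slot_memory_mb = list(slot_memory_mb)
        self._jobs = {}

    def job_assess(self, description):
        """Quality-of-service hint: how many slots satisfy the job's needs."""
        need = description.get("min_memory_mb", 0)
        return sum(1 for mem in self.slot_memory_mb if mem >= need)

    def job_submit(self, description):
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = JobStatus.RUNNING  # no queueing in this sketch
        return job_id

    def job_suspend(self, job_id):
        if self._jobs[job_id] is JobStatus.RUNNING:
            self._jobs[job_id] = JobStatus.SUSPENDED

    def job_resume(self, job_id):
        if self._jobs[job_id] is JobStatus.SUSPENDED:
            self._jobs[job_id] = JobStatus.RUNNING

    def job_kill(self, job_id):
        self._jobs[job_id] = JobStatus.KILLED

    def job_get_status(self, job_id):
        return self._jobs[job_id]
```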

13 CE Architecture
[Diagram labels: Client, WEB, CE, Mon, LSF, PBS?, Worker Nodes, Notifications, Job requests]
- Async. notifications about job/CE events
- Job requests (for CE working in pull mode)

14 Logging & Bookkeeping
- Collects and manages job-related events (e.g. submission, suitable CE found, start of execution, …) from the WMS components
- Processes these events to give a higher-level view of job states
- Both job states and raw data available to users
  - Also via Web Service interface
- Possible to subscribe to receive notifications on particular job state changes
- L&B event trail can be analyzed to identify problems with resources ("black holes", unusual failure rates, etc.)
- See poster #419 for more details
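Two ideas from this slide lend themselves to a short Python sketch: deriving a job state from raw events, and scanning the event trail for CEs with unusual failure rates. The event names, state names, thresholds and data layout below are invented for illustration and are not the actual L&B event model.

```python
from collections import Counter

# Hypothetical mapping from raw event types to higher-level job states.
EVENT_TO_STATE = {
    "submitted": "SUBMITTED",
    "matched": "READY",
    "transferred": "SCHEDULED",
    "running": "RUNNING",
    "done": "DONE",
    "aborted": "ABORTED",
}


def job_state(events):
    """Derive the current state from (timestamp, event_type) pairs.

    Events may arrive out of order, so they are sorted by timestamp first;
    unknown event types leave the state unchanged.
    """
    state = "UNKNOWN"
    for _timestamp, event in sorted(events):
        state = EVENT_TO_STATE.get(event, state)
    return state


def suspicious_ces(trail, threshold=0.5, min_jobs=3):
    """Return CEs whose share of aborted jobs exceeds the threshold.

    trail: iterable of (ce_id, final_state) pairs.
    """
    total, failed = Counter(), Counter()
    for ce_id, final_state in trail:
        total[ce_id] += 1
        if final_state == "ABORTED":
            failed[ce_id] += 1
    return sorted(
        ce for ce in total
        if total[ce] >= min_jobs and failed[ce] / total[ce] > threshold
    )
```

A plain failure rate is only one possible heuristic; detecting a "black hole" would probably also need to look at how quickly jobs fail on a CE.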

15 Job Provenance
- Keeps track of the definition of submitted jobs, execution conditions and job life cycle for a long time
  - Job life logs (JDL, timestamps, job IDs, …)
  - Executable and input/output files
  - Execution environment (OS, installed software version, …)
  - Custom data provided by user
- Used for
  - Debugging
  - Post-mortem analysis
  - Comparison of job executions in an evolving environment
- Service components
  - Primary Storage Server
    - Keeps data in the most compact and economical form
  - Index Servers
    - Configured to support a set of queryable attributes
- See poster #419 for more details
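The split between a primary store and index servers configured for a set of queryable attributes can be sketched in Python as follows. The class and method names, the record fields and the choice to raise `KeyError` for non-indexed attributes are assumptions made for the example.

```python
from collections import defaultdict


class IndexServer:
    """Hypothetical index over a configured set of queryable attributes."""

    def __init__(self, queryable):
        self.queryable = set(queryable)
        self.index = defaultdict(set)  # (attribute, value) -> job ids

    def add(self, job_id, record):
        # Only configured attributes are indexed; the rest stays in storage.
        for attr in self.queryable:
            if attr in record:
                self.index[(attr, record[attr])].add(job_id)

    def query(self, attr, value):
        if attr not in self.queryable:
            raise KeyError(f"attribute {attr!r} is not indexed by this server")
        return sorted(self.index.get((attr, value), set()))


# The primary storage keeps the full records; a plain dict stands in for it.
primary_storage = {
    "j1": {"owner": "alice", "os": "linux", "jdl": "..."},
    "j2": {"owner": "bob", "os": "linux", "jdl": "..."},
}
index = IndexServer(["owner", "os"])
for job_id, record in primary_storage.items():
    index.add(job_id, record)
```

Several index servers with different attribute sets could be fed from the same primary storage, each answering only the queries it was configured for.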

16 Grid Accounting
- Accumulates information about the usage of Grid resources by users / groups (e.g. VOs)
- To be used
  - To track resource usage
  - To discover abuses (and help avoid them)
- Also possible to charge users for the resources they have used
- Allows implementation of submission policies based on resource usage
  - Exchange market among Grid users and Grid resource owners, which should result in market equilibrium
  - Load balancing on the Grid
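Accumulating usage per user or per VO amounts to a grouped sum over usage records. A minimal Python sketch, assuming a hypothetical record layout with `user`, `vo` and `cpu_hours` fields:

```python
from collections import defaultdict


def usage_report(records, key):
    """Sum CPU hours per value of `key` (for example "user" or "vo")."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cpu_hours"]
    return dict(totals)


records = [
    {"user": "alice", "vo": "atlas", "cpu_hours": 1.5},
    {"user": "bob", "vo": "atlas", "cpu_hours": 2.0},
    {"user": "alice", "vo": "biomed", "cpu_hours": 4.0},
]
per_user = usage_report(records, "user")
per_vo = usage_report(records, "vo")
```

The same totals could feed a submission policy, for instance by lowering the priority of a VO once its accumulated usage passes a quota; such a policy is not specified in the talk.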

17 Accounting architecture
[Diagram labels: Accounting, Computing Element, Storage Element]
- Resource metering: getting info about resource usage

18 Accounting architecture
[Diagram labels: Accounting, Computing Element, Storage Element]
- Reports about resource usage per user / VO / resource

19 Accounting architecture
[Diagram labels: Accounting, Computing Element, Storage Element, Resource owner]
- Resource pricing

20 Accounting architecture
[Diagram labels: Accounting, Computing Element, Storage Element, Resource owner]
- Resource pricing
- Cost computation
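The slides do not say how prices and metered usage are combined. One simple possibility, shown here purely as an assumption, is a linear cost: each metered quantity multiplied by a per-unit price set by the resource owner. The resource names and the abstract "credits" unit are hypothetical.

```python
def compute_cost(usage, prices):
    """Linear cost model: sum of (amount used * price per unit).

    usage:  {resource_name: amount used}
    prices: {resource_name: price per unit, set by the resource owner}
    """
    return sum(amount * prices[resource] for resource, amount in usage.items())


prices = {"cpu_hours": 2, "storage_gb_days": 1}   # credits per unit (assumed)
usage = {"cpu_hours": 10, "storage_gb_days": 5}
cost = compute_cost(usage, prices)
```

In a market setting the prices would change over time with demand, so a real computation would need the price in force when the resource was used.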

21 Status
- Workload Manager, Logging & Bookkeeping and Grid Accounting software inherited from the DataGrid WMS software
  - Being revised and complemented according to the new architecture
    - E.g. Information Supermarket, TaskQueue: new developments
    - Web services interfaces
  - First implementation already deployed in the EGEE gLite prototype testbed
- Computing Element
  - New developments
  - CEMon prototype already implemented
- Job Provenance
  - New component being implemented

22 Links
- EGEE JRA1 IT-CZ cluster homepage
  - http://egee-jra1-wm.mi.infn.it/egee-jra1-wm
- EGEE JRA1 (middleware activity) homepage
  - http://egee-jra1.web.cern.ch/egee-jra1
- EGEE project homepage
  - http://www.eu-egee.org
