1
Enabling Grids for E-sciencE – www.eu-egee.org
The gLite Workload Management System
Alessandro Maraschini (alessandro.maraschini@datamat.it)
OGF20, Manchester, UK – 10 May 2007
2
Contents
WMS
 – System overview, partners, task, components
JDL
 – Language overview
 – JobTypes: single / compound
News
 – New Functionalities
 – Latest Activities
Future Plans
 – Future Implementations & Activities
 – WMS and WfMS
Tests
 – Middleware Testing Activities & Results
3
Introduction: gLite WMS
Workload Management System (WMS)
 – Developed by the Italian and Czech clusters, as part of Joint Research Activity 1 (JRA1)
 – Partners involved: INFN, Datamat, CESNET
Provides distribution and management of tasks across the resources available on a Grid
 – Accepts a request for the execution of a Job from a client
 – Finds appropriate resources to satisfy the Job
 – Follows the Job until completion
4
WMS Architecture: core components
WMProxy
 – Accepts requests from the user
 – Performs authentication/authorization checks
 – Sets up the local file system
 – Forwards the request to the WM
Workload Manager (WM)
 – Looks for the appropriate Computing Element (CE): the matchmaking operation
 – Forwards the request to CondorC
Logging & Bookkeeping (LB)
 – Tracks jobs in terms of events gathered from the various gLite components
 – Processes the incoming events to give a higher-level view of the job states
 – Recent introduction of LBProxy: a lightweight local LB service “dedicated” to the WMS components, which asynchronously logs info to the actual (usually remote) LB
5
WMS Architecture overview
[Diagram: User Interface → WMProxy → Workload Manager (with LB Proxy) → Job Controller / CondorC (CondorG) / Log Monitor → gLite CE and LCG CE; job events are logged to the LB Server]
6
WMProxy Architecture & Capabilities
[Diagram: WMProxy integration with the WMS – the client talks SOAP/HTTPS to an Apache front end (MOD SSL, MOD GridSite, MOD FCGI) hosting the WMProxy server; requests are passed through a request queue or JobDir to the Workload Manager; the server host also runs the LB Proxy, the local file system and the LB database, connected to the Logging & Bookkeeping server]
7
Job Submission Chain – WMS extract
[Diagram of the job submission chain]
8
JDL: overview
Job Description Language (JDL)
 – The gLite approach to request description
 – Allows the user to provide the information needed to execute the job:
   Characteristics of the application
   Requirements/preferences about resources
   Customized hints for the gLite WMS on how to handle the application
Supported Job Types
 – Single Jobs
 – Compound Jobs: Workflows (DAGs), Collections, Parametric Jobs
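As an illustration of the language, a minimal Normal job description might look like the sketch below; the executable, file names and the GLUE-based Requirements/Rank expressions are example values, not taken from the slides.

    [
      Type = "Job";
      JobType = "Normal";
      Executable = "/bin/hostname";        // command run on the Worker Node
      StdOutput = "std.out";               // standard streams returned in the output sandbox
      StdError = "std.err";
      OutputSandbox = { "std.out", "std.err" };
      Requirements = other.GlueCEStateStatus == "Production";   // resource constraints
      Rank = -other.GlueCEStateEstimatedResponseTime;           // resource preference
    ]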
9
JDL: Single Types
Single Jobs
 – Normal: a single, simple batch job with no peculiar requirements
 – MPICH: a parallel application to be run on the nodes of a cluster using the MPICH implementation of the Message Passing Interface (support for new MPI flavours is planned)
 – Interactive: a job whose standard streams are forwarded to the submitting client, which can interact with and steer the job execution by providing real-time input
Previously supported Job Types
 – No longer supported: Checkpointable Jobs, Partitionable Jobs
 – Deprecated due to lack of feedback from users: it seems they are not used at all
 – Focus is on improving support for the job types that are really used
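For instance, an MPICH job adds the JobType and the number of requested nodes to the basic description; this sketch is illustrative (the executable name and node count are invented).

    [
      JobType = "MPICH";
      NodeNumber = 8;                      // CPUs requested for the parallel run
      Executable = "my_mpi_app";           // hypothetical MPI binary shipped in the sandbox
      InputSandbox = { "my_mpi_app" };
      StdOutput = "mpi.out";
      StdError = "mpi.err";
      OutputSandbox = { "mpi.out", "mpi.err" };
    ]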
10
JDL: Compound Jobs
Definition
 – An aggregation of Single/Normal Jobs
Benefits
 – One-shot submission for (up to thousands of) jobs: a single call to the WMProxy server, a single AuthN and AuthZ process, and a reduced submission time
 – A single identification to manage all the jobs (the father Job): not an actual job, used to monitor the whole bunch
 – Sharing of files between jobs
11
JDL: Compound Types
Compound Jobs: Workflows
 – Implemented as Directed Acyclic Graphs (DAGs)
 – A set of jobs where the input, output or execution of one or more jobs may depend on one or more other jobs
 – Dependencies represent time constraints: a child cannot start before all of its parents have successfully completed
[Diagram: a father node with child nodes A–I linked by dependency arrows]
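A DAG is described by nesting the node descriptions and listing the dependencies; the sketch below is illustrative (node names and executables are invented, and the attribute layout follows the JDL documentation linked on the final slide).

    [
      Type = "dag";
      nodes = [
        nodeA = [ description = [ Executable = "stepA.sh"; InputSandbox = { "stepA.sh" }; ]; ];
        nodeB = [ description = [ Executable = "stepB.sh"; InputSandbox = { "stepB.sh" }; ]; ];
        dependencies = { { nodeA, nodeB } };   // nodeB starts only after nodeA completes
      ];
    ]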
12
JDL: Compound Types
Compound Jobs: Parametric Jobs
 – A parameterized description of a job (parameter-sweep usage)
 – Automatically converted on the WMS side
 – Generates a (possibly) huge number of (similar) jobs
 – No dependencies between nodes
[Diagram: a father node with independent nodes A–E]
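As a sketch, a parametric description uses the _PARAM_ placeholder together with the Parameters, ParameterStart and ParameterStep attributes; the script and file names below are invented.

    [
      JobType = "Parametric";
      Executable = "analyse.sh";                     // hypothetical per-parameter script
      Arguments = "input_PARAM_.dat";                // _PARAM_ is replaced in each generated job
      InputSandbox = { "analyse.sh", "input_PARAM_.dat" };
      Parameters = 100;                              // upper bound of the parameter sweep
      ParameterStart = 1;
      ParameterStep = 1;
      StdOutput = "out_PARAM_.txt";
      OutputSandbox = { "out_PARAM_.txt" };
    ]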
13
JDL: Compound Types
Compound Jobs: Collections
 – A set of possibly heterogeneous jobs that can be specified within a single JDL description
 – No dependencies among the specified jobs
[Diagram: a father node with independent nodes A–E]
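A collection simply lists the node descriptions inside a single JDL; the two nodes below are invented examples.

    [
      Type = "collection";
      nodes = {
        [ Executable = "/bin/hostname"; StdOutput = "a.out"; OutputSandbox = { "a.out" }; ],
        [ Executable = "sim.sh"; InputSandbox = { "sim.sh" }; StdOutput = "b.out"; OutputSandbox = { "b.out" }; ]
      };
    ]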
14
New Functionalities: WMProxy
WMProxy server
 – Replaces the old C++-based socket connection service
 – Implements an interoperable, Web Service based, WS-I compliant interface
WMProxy client
 – Provides the C++-based WMS command-line User Interface (UI), which executes all the needed operations automatically
 – Provides multi-language (C++, Java and Python) APIs
15
New Functionalities: ICE-CREAM
CREAM service: Computing Resource Execution And Management service
 – A CE with a Web service interface
 – WMS requests are forwarded directly to CREAM-based CEs through ICE
ICE: Interface to CREAM Environment
 – Basically reproduces the Job Controller / CondorC / Log Monitor layer needed by gLite/LCG CEs
16
New Functionalities: ICE-CREAM
[Diagram: the WMS architecture extended with ICE – User Interface → WMProxy → Workload Manager (with LB Proxy); the Job Controller / CondorC (CondorG) / Log Monitor path serves the gLite CE and LCG CE, while ICE submits directly to CREAM; job events are logged to the LB Server]
17
New Functionalities: Sandbox Files
Sandbox archiving and sharing
 – Job sandbox files can be automatically compressed
 – Different jobs can share the same sandbox, which dramatically reduces network traffic and allows the user to save time and bandwidth
Sandbox remote specification
 – The user can store files directly on a remote machine
 – No intermediate copies: the JobWrapper will download them directly from the Worker Node
 – Reduces the server load
Supported file transfer
 – Full support (input & output files) for the gridftp and https protocols
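In JDL terms, remote and compressed sandboxes are controlled by attributes along the following lines; the host name and paths are invented, and the snippet is a sketch based on the JDL attribute specification rather than on the slides.

    [
      Executable = "process.sh";
      // input taken directly from a remote gridftp server, no intermediate copy on the WMS
      InputSandbox = { "gsiftp://se.example.org/data/process.sh" };
      AllowZippedISB = true;                                   // compress locally staged sandbox files
      StdOutput = "run.log";
      OutputSandbox = { "run.log" };
      // output uploaded straight from the Worker Node to a remote destination
      OutputSandboxBaseDestURI = "gsiftp://se.example.org/results/";
    ]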
18
New Functionalities: Bulk-MM
Bulk MatchMaking
 – The natural completion of bulk submission
 – Allows a single matchmaking of similar jobs in one shot: job equivalence is based on a list of “significant attributes”, and jobs whose significant attributes are literally equal are considered equivalent
 – Target jobs: bunches of independent jobs, mainly Collections and Parametric Jobs, originally managed with Condor DAGMan
 – Allows submission to CREAM-based CEs
Provides an additional boost in WMS performance
 – Saves time & resources
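For example, a collection can declare which attributes drive the bulk matchmaking through the SignificantAttributes list; the node contents below are invented.

    [
      Type = "collection";
      // nodes that share literally equal Requirements and Rank are matched only once
      SignificantAttributes = { "Requirements", "Rank" };
      Requirements = other.GlueCEStateStatus == "Production";
      Rank = -other.GlueCEStateEstimatedResponseTime;
      nodes = {
        [ Executable = "task.sh"; Arguments = "1"; InputSandbox = { "task.sh" }; ],
        [ Executable = "task.sh"; Arguments = "2"; InputSandbox = { "task.sh" }; ]
      };
    ]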
19
Other New Functionalities
Service Discovery
 – Provides service information by querying external databases of different kinds (R-GMA, BDII)
 – Client side: queries for the WMProxy endpoints available on the net, so no manual reconfiguration of the user commands is needed
 – Server side: queries for the available LB servers where job information can be logged
Job Files Perusal
 – Monitors the actual output files produced by a job during its lifecycle
 – Adds important pieces of information that are not available through simple status monitoring and that were previously available only at job completion
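File perusal is enabled per job in the JDL; the sketch below assumes the PerusalFileEnable and PerusalTimeInterval attributes from the JDL specification, with invented file names.

    [
      Executable = "longrun.sh";
      StdOutput = "progress.log";
      OutputSandbox = { "progress.log" };
      PerusalFileEnable = true;        // allow inspection of output files while the job is running
      PerusalTimeInterval = 1200;      // minimum time (seconds) between successive file snapshots
    ]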
20
gLite New Activities
New platforms widely deployed on the infrastructure
 – In particular Scientific Linux 4 and 64-bit architectures
Migration to the ETICS build system
 – High flexibility
 – Addresses multiple-platform support, which was almost impossible with the old gLite build system
 – Build of all WMS components achieved
 – Ongoing WMS activity: integration/deployment – the software is not yet fully deployed; the client-side manual installation is fully working, while the server-side installation is not yet available (almost achieved)
21
gLite WMS Ongoing Restructuring
gLite restructuring
 – All new feature development stopped for 6 months
 – Improving usability & portability: multi-platform support (structural changes needed)
 – Cleaning up sections that cause build and porting difficulties
 – Removing/reducing dependencies on external software to ease installation and deployment
Goals
 – Easier service maintenance and usage, which will increase stability and throughput
 – Toward a “gLighter” User Interface: identify and remove all unnecessary dependencies
22
gLite WMS Future
Improving logging and error reporting
 – Common syslog-like logging format
Windows working prototype
 – gLite porting to Microsoft Windows platforms
Improving interoperability
 – Supercomputing 06 working prototype for the demo: Basic Execution Service (BES), Job Submission Description Language (JSDL)
23
WfMS and gLite WMS
Possible “external integration” with existing external Workflow frameworks
 – Still to be discussed and planned
A proposal for a Workflow Management System integrated within the WMS is under discussion
 – Running on top of the gLite middleware
 – Abstract and generic representation of workflows: internally a Petri net model is used, with translation mechanisms from different language front ends externally
24
WfMS and gLite WMS
[Diagram]
25
Test & Result
Intense testing and constant bug-fixing activities have been performed over the last months
 – Improved job submission rate
 – Improved service stability
 – New functionalities tested and adopted by the experiments
Production-quality test results
 – 16K jobs/day over one week of submissions
 – No manual intervention on the server; stable memory usage
 – 0.3% of jobs in a non-final state; aborted jobs mostly due to expired user credentials (was about 5% before Bulk-MM support)
26
Test & Result
[Figure]
27
Some Links
WMS
 – http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/
WMProxy
 – http://trinity.datamat.it/projects/EGEE/wiki/wiki.php
LB
 – http://egee.cesnet.cz/en/JRA1/index.html
CREAM
 – http://grid.pd.infn.it/cream
JDL
 – http://edms.cern.ch/document/590869/1