CENTER FOR PARALLEL COMPUTERS
DEPARTMENT OF COMPUTING SCIENCE

Enforcing resource allocations with the SweGrid Accounting System (SGAS)
GDB meeting, Bologna, October 11, 2005

Joint effort by:
  - Erik Elmroth (Umeå University)
  - Peter Gardfjäll (Umeå University)
  - Lennart Johnsson (KTH)
  - Olle Mulmo (KTH)
  - Thomas Sandholm (KTH)

Presented by Tord Ekelof, Uppsala University, on behalf of this group

2. The involved parties

User
  - E.g. a member of a scientific project
  - Wants QoS guarantees
  - Wants his/her "fair share of the Grid"

Allocation authority
  - E.g. the Swedish National Allocation Committee
  - Wants to coordinate aggregate Grid capacity to ensure efficient utilization of resources
  - Reserves resources for projects (e.g. subject to payment/importance)

Resource owner
  - Resource administrator
  - Wants to retain control over the local resource and its utilization

SGAS: "the mechanism"
  - Provides functionality to manage and enforce the necessary policies

3. Grid accounting: coordinating Grid resource usage
  - Maintaining a (consistent) Grid-wide view of the resources utilized by VO members
  - Measuring and controlling users' total resource usage on the Grid
  - Assuming the absence of a central point of control
  - Resource owners should retain local control

4. Why accounting?

Accounting information can be used for several purposes:
  - Economic compensation
  - Tracking of resource usage
  - Evaluation/forecasting of resource usage
  - Resource brokering decisions
  - Assigning scheduling priorities to jobs based on previous resource utilization
  - Pricing and creating economic markets for resource sharing
  - Enforcement of resource allocations
  - Etc.

5. SGAS in SweGrid

SweGrid is a Swedish computational Grid
  - Connects six computer clusters (Umeå, Göteborg, Uppsala, Stockholm, Lund, Linköping) with a total of 600 processors

Swedish National Allocation Committee
  - Allocates CPU time (measured in node hours) on SweGrid to research projects
  - Grid-wide allocations can be spent arbitrarily among Grid sites

SGAS has been developed to:
  - Enforce project allocations across all SweGrid sites
  - Prevent project members from overspending
  - Store detailed information on each Grid job's resource usage

6. Milestones and future directions
  - Sep 2003: SGAS project initiated
  - Sep 2003: SweGrid site survey (needs analysis)
  - Oct 2003: SGAS white paper
      - Investigated existing work on Grid accounting
      - Accounting system architecture proposal
  - Jan 2004: Finished proof-of-concept prototype
  - Feb 2004: Started work on production code base
  - Apr 2004: Version 0.1 released
      - OGSI/GT3-based
  - Apr 2004: Contributed authorization framework to Globus Toolkit
  - Nov 2004: Version 0.2 released
      - Additional core functionality (e.g. timestamped allocations)
  - Oct 2004: Version 1.0 released
      - Stability/scalability improvements
  - Jun 2005: Version 2.0 (alpha) released
      - Fully WSRF-compliant implementation (GT4-based)
  - Aug 2005: SGAS included as "tech preview" in Globus Toolkit
  - Autumn 2005: Final 2.0 version
      - Further real-world testing
      - Distributed bank solution and simplified account naming

7. SweGrid Accounting System (SGAS)
  - Decentralized resource allocation enforcement system
  - SGAS performs soft real-time enforcement of allocations
      - Real-time enforcement: resources can, at the time of job submission, deny access if the project quota has been used up
      - Soft: enforcement is subject to local resource policies (strict enforcement is not always appropriate)
  - Initially addressed allocation enforcement in SweGrid
      - Not restricted to SweGrid use
  - Developed with an emphasis on easy integration into different Grid middleware
      - Single-point-of-integration
      - In SweGrid: deployed on top of NorduGrid middleware
  - WSRF-compliant Java implementation using Globus Toolkit 4 (GT4) primitives

8. Design goals
  - Service-oriented architecture (Web Services-based)
  - Based on open standards
      - GGF: OGSA, UR
      - OASIS: WSRF, XACML, WS-Security, WS-SecureConversation
  - Light-weight/non-intrusive deployment
      - Single-point-of-integration with underlying middleware
  - End-to-end security
      - Message-level and transport-level security
      - Fine-grained authorization model based on XACML policies
  - End-user transparency
  - Flexibility and customizability
      - Can account for any type of resource usage
      - Abstract "currency" (Grid credits)
      - Usage is transformed into Grid credits before charging an account
  - Policy customization on three different levels
      - User: "only run jobs if sufficient quota is available"
      - Resource owner: "run quota-exceeding jobs with low priority"
      - Allocation authority: "allow 10 % account overdraft"
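
The credit transformation and the three policy levels can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the exchange rates, the `Policies` fields, and in particular the order in which the three policies are consulted are not taken from SGAS.

```python
from dataclasses import dataclass

# Hypothetical exchange rates: Grid credits charged per unit of usage.
RATES = {"node_hours": 1.0, "storage_gb_days": 0.5}


def to_credits(usage):
    """Transform raw usage (resource type -> amount) into Grid credits."""
    return sum(RATES[kind] * amount for kind, amount in usage.items())


@dataclass
class Policies:
    overdraft_percent: int = 10           # allocation authority
    user_requires_quota: bool = True      # user
    owner_runs_low_priority: bool = True  # resource owner


def decide(allocation, spent, cost, policies):
    """Return 'run', 'run-low-priority' or 'reject' (assumed combination order)."""
    # Allocation authority: the account may be overdrawn by a fixed percentage.
    limit = allocation * (100 + policies.overdraft_percent) / 100
    if spent + cost <= limit:
        return "run"
    # User: refuse to run at all once the quota (including overdraft) is exceeded.
    if policies.user_requires_quota:
        return "reject"
    # Resource owner: quota-exceeding jobs may still run, at low priority.
    if policies.owner_runs_low_priority:
        return "run-low-priority"
    return "reject"
```

With a 100-credit allocation of which 95 are spent, a 10-credit job fits within the 10 % overdraft and runs, while a 20-credit job is rejected or demoted depending on the user and resource owner policies.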

9. SGAS component overview

Four main components:
  - Bank
      - Online service
      - Manages project accounts (resource allocations)
      - Provides Grid users/resources with consistent information about resources consumed by Grid projects
  - JARM (Job Account Reservation Manager)
      - Intercepts job requests on resources
      - Makes account reservation prior to job execution
      - Charges project account after job completion
      - Single-point-of-integration
  - LUTS (Logging and Usage Tracking Service)
      - Collects and publishes usage records, which can be queried by users
  - PAT (Policy Administration Tool)
      - Client tool to manage Bank and LUTS policies

10. Component interactions
  1. Contact resource
  2. Authenticate/authorize (delegate credentials)
  3. Submit job request
  4. JARM intercepts request
  5. Make account reservation
  6. Run job
  7. Collect usage info
  8. Charge project account and log usage info
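
Steps 4 to 8 above can be sketched as a reserve, run, charge sequence. This is a simplified in-memory stand-in: the `Bank` class, its `reserve` and `charge` methods, and `handle_job` are hypothetical names, not the SGAS service interfaces, and authentication (steps 1 to 3) is left out.

```python
class Bank:
    """In-memory stand-in for the Bank (hypothetical interface)."""

    def __init__(self, balances):
        self.balances = dict(balances)  # account -> remaining credits
        self.holds = {}                 # hold id -> (account, reserved amount)
        self._next_id = 0

    def reserve(self, account, amount):
        """Step 5: place a hold on the account before the job runs."""
        if self.balances[account] < amount:
            raise PermissionError("insufficient allocation")
        self.balances[account] -= amount
        self._next_id += 1
        self.holds[self._next_id] = (account, amount)
        return self._next_id

    def charge(self, hold_id, actual):
        """Step 8: settle the hold, returning any unused part of the reservation."""
        account, reserved = self.holds.pop(hold_id)
        self.balances[account] += reserved - actual


def handle_job(bank, usage_log, account, estimate, run_job):
    """Steps 4 to 8 as seen from the intercepting component."""
    hold = bank.reserve(account, estimate)  # 5. make account reservation
    used = run_job()                        # 6-7. run job, collect usage info
    bank.charge(hold, used)                 # 8. charge project account ...
    usage_log.append({"account": account, "used": used})  # ... and log usage
    return used
```

For example, `handle_job(Bank({"proj": 100}), [], "proj", 10, lambda: 7)` reserves 10 credits, charges the 7 actually used, and leaves 93 on the account.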

11. Policy enforcement overview
  - PAP = Policy Administration Point: set up policies
  - PIP = Policy Information Point: retrieve policies
  - PDP = Policy Decision Point: make policy decisions/manage policy
  - PEP = Policy Enforcement Point: intercept request and query PDP(s)
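
How the four roles fit together can be shown with a small sketch. The class and function names are hypothetical, policies are reduced to permitted (subject, operation) pairs instead of XACML documents, and requiring every PDP to permit the request is an assumed combination rule.

```python
class PolicyStore:
    """Holds permitted (subject, operation) pairs."""

    def __init__(self):
        self._permits = set()

    def permit(self, subject, operation):
        """PAP role: set up a policy."""
        self._permits.add((subject, operation))

    def lookup(self, subject):
        """PIP role: retrieve the operations permitted for a subject."""
        return {op for s, op in self._permits if s == subject}


class PDP:
    """PDP role: decide whether a request is permitted."""

    def __init__(self, store):
        self.store = store

    def decide(self, subject, operation):
        return operation in self.store.lookup(subject)


def pep(pdps, subject, operation, handler):
    """PEP role: intercept the request, query the PDP(s), then invoke the service."""
    if not all(p.decide(subject, operation) for p in pdps):
        raise PermissionError(f"{subject} may not call {operation}")
    return handler()
```

A request only reaches `handler` after every configured PDP has permitted it, so the service implementation itself contains no authorization logic.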

12. Bank component

Composed of WSRF-compliant Web services:
  - Bank
      - Creates and locates accounts
  - Account
      - Represents a project's resource allocation
      - Users make reservations on the account allocation; a successful reservation results in a Hold
  - Hold
      - Time-limited reservation on the account
      - Used to charge the account

Further properties:
  - An overdraft policy can be associated with each account
  - Batch operations for scalability/performance
  - Each account manages a set of time-stamped allocations
      - Each allocation is valid for a limited time period
      - Allows the total allocation to be spread out in time
      - Implements a "use-it-or-lose-it" policy
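
The relationship between time-stamped allocations, holds and the "use-it-or-lose-it" policy can be sketched as follows. This is a simplified model under stated assumptions: the `Allocation`, `Hold` and `Account` classes and their methods are hypothetical, overdraft is left out, and charges are simply booked against the allocations valid at charge time.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Allocation:
    amount: float
    start: datetime
    end: datetime
    used: float = 0.0

    def is_valid(self, now):
        return self.start <= now < self.end


@dataclass
class Hold:
    amount: float
    expires: datetime


@dataclass
class Account:
    allocations: list = field(default_factory=list)
    holds: list = field(default_factory=list)

    def available(self, now):
        """Unused credits in currently valid allocations minus unexpired holds."""
        unused = sum(a.amount - a.used for a in self.allocations if a.is_valid(now))
        held = sum(h.amount for h in self.holds if h.expires > now)
        return unused - held

    def reserve(self, amount, now, lifetime):
        """A successful reservation results in a time-limited Hold."""
        if amount > self.available(now):
            raise PermissionError("allocation exhausted")
        hold = Hold(amount, now + lifetime)
        self.holds.append(hold)
        return hold

    def charge(self, hold, actual, now):
        """Release the hold and book the actual usage on valid allocations."""
        self.holds.remove(hold)
        valid = [a for a in self.allocations if a.is_valid(now)]
        if not valid:
            raise ValueError("no valid allocation to charge")
        for a in valid:
            take = min(actual, max(a.amount - a.used, 0.0))
            a.used += take
            actual -= take
        valid[-1].used += actual  # any excess is recorded on the last valid allocation
```

Because `available` only counts allocations valid at the given time, credits left unused when an allocation period ends are no longer spendable, which is the "use-it-or-lose-it" behaviour.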

13. Allocation strategies
  [Figure not reproduced. Picture from: ]

14. Allocation strategy example
  [Figure not reproduced. Picture from: ]

15. Allocation strategies
  [Figure not reproduced. Picture from: ]

16. Logging and Usage Tracking Service (LUTS)
  - Collects and publishes usage records compliant with the GGF-UR specification
      - XML-based format for storing detailed information about the resources consumed by Grid jobs
      - CPU time, memory, storage, network, ...
  - Authorized users are allowed to run XPath queries directly against LUTS
  - URs can be extended to hold additional information only understood by a subset of users/resources, without modifying LUTS
  - URs can be logged in batches
      - Improved performance and scalability
  - XSLT-based transformation infrastructure allows sites to easily convert their non-XML usage data to a UR-compliant format
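
An XPath-style query over usage records can be illustrated with Python's standard library. The records below are a simplified stand-in, not the GGF-UR schema: namespaces are omitted, the element names are assumptions, and CPU time is given as plain seconds. The query runs locally on a string; the LUTS query interface itself is not shown.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical usage records (not schema-valid GGF-UR documents).
RECORDS = """
<UsageRecords>
  <JobUsageRecord>
    <ProjectName>physics</ProjectName>
    <MachineName>cluster-a</MachineName>
    <CpuSeconds>3600</CpuSeconds>
  </JobUsageRecord>
  <JobUsageRecord>
    <ProjectName>biology</ProjectName>
    <MachineName>cluster-b</MachineName>
    <CpuSeconds>600</CpuSeconds>
  </JobUsageRecord>
  <JobUsageRecord>
    <ProjectName>physics</ProjectName>
    <MachineName>cluster-b</MachineName>
    <CpuSeconds>1800</CpuSeconds>
  </JobUsageRecord>
</UsageRecords>
"""


def cpu_seconds_for_project(xml_text, project):
    """Sum the CPU seconds of all records belonging to one project."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited XPath subset, including [child='text'] predicates.
    matches = root.findall(f"./JobUsageRecord[ProjectName='{project}']")
    return sum(int(record.findtext("CpuSeconds")) for record in matches)
```

Because the query selects records by a child element's value, extra elements added to a record by one site do not affect queries that ignore them.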

17. Collecting usage data

18. Job Account Reservation Manager (JARM)
  - Integration point between SGAS and the underlying Grid environment
      - Workload manager independent
      - NorduGrid integration: configuration of plug-in scripts triggered on state transitions during the NG job submission process
  - Plugged into the workload manager at each cluster
  - Intercepts job submissions
  - Makes account reservations prior to job execution
      - Can be carried out in parallel with job preparation (less overhead)
  - Collects usage data from the batch system when the job has finished
  - Charges the account and logs a usage record in LUTS
      - Charging and logging of jobs are usually deferred and performed in batches
  - Local site policies can be enforced by overloading the default Site Policy Manager
  - Default Site Policy Manager
      - Lets the job through even if the bank cannot be reached; logs and charges later
      - When an overdraft violation is detected: runs the job with lower priority
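
The default site policy and its replacement by a stricter local policy can be sketched as below. All names (`SitePolicyManager`, `StrictSitePolicy`, `Jarm`, the two exception types, and the string decisions) are hypothetical and only mirror the behaviour described above; they are not the SGAS classes.

```python
class BankUnreachable(Exception):
    """Raised by the reservation call when the bank cannot be contacted."""


class OverdraftViolation(Exception):
    """Raised by the reservation call when the account limit would be exceeded."""


class SitePolicyManager:
    """Default behaviour: soft enforcement."""

    def on_bank_unreachable(self, job):
        return "run"  # let the job through; log and charge later

    def on_overdraft(self, job):
        return "run-low-priority"


class StrictSitePolicy(SitePolicyManager):
    """Example of a site replacing the default with strict enforcement."""

    def on_overdraft(self, job):
        return "reject"


class Jarm:
    def __init__(self, reserve, policy):
        self.reserve = reserve  # callable(job) that contacts the bank
        self.policy = policy
        self.deferred = []      # jobs to be charged once the bank is reachable

    def intercept(self, job):
        """Decide what to do with an incoming job submission."""
        try:
            self.reserve(job)
            return "run"
        except BankUnreachable:
            decision = self.policy.on_bank_unreachable(job)
            if decision != "reject":
                self.deferred.append(job)
            return decision
        except OverdraftViolation:
            return self.policy.on_overdraft(job)
```

Keeping the decision in a separate policy object is what makes the enforcement "soft": the resource owner, not the bank, has the final say over whether a job runs.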

19. sgas-admin
  - A tool for:
      - Administering SGAS
      - Collecting information from SGAS
  - Provides a command line interface including commands for:
      - Bank management
      - Creating and removing accounts
      - Managing account allocations
      - Managing account policies
      - Retrieving usage information
      - Off-line corrections
  - Can be run in interactive or script mode
  - TODO: web/graphical interface

20. Authorization framework
  - Fine-grained authorization framework
      - Authorization specified on a per-operation basis
  - Separate Globus contribution
  - Associates an authorization policy and engine with a service
      - Service orthogonal: transparent to the service implementation
      - Customizable: allows different backend engines/policy languages
  - SGAS authorization engine based on XACML

21. Project information
  - Please visit us at
  - SGAS download
  - Documentation
  - Publications
  - Mailing list:
  - Globus Toolkit contribution
  - Grid research at Umeå University