A Distributed Policy Scenario


Distributed Policy Management and Comprehension with Classified Advertisements

A Distributed Policy Scenario
  A user submits a job to Condor
  The user has designed a policy defining requested services
  Machines in the Condor pool have policies restricting the use of services
  The user’s job won’t run. Why?
    Is the user’s policy too restrictive?
    Was the job rejected by machine policies?

Policy Management
  Resource allocation challenges
    Resource heterogeneity
    Policy heterogeneity
  How to allocate resources?
    Conventional centralized allocation not sufficient
    Solution: Matchmaking with Classified Advertisements (ClassAds)

Matchmaking
  [Figure: a job ClassAd ([ MyType = "Job"; ...; Rank = ...; Requirements = ... ]) is matched against machine ClassAds (MyType = "Machine").]

Classified Advertisements
  Represent entities (e.g. jobs, machines) and their policies
  A ClassAd is a set of named expressions called attributes
  Types of attributes:
    Characteristics of an entity (Arch, OpSys, Memory)
    Constraints for requested resource (Requirements)
    Preferences for requested resource (Rank)

Typical ClassAds

  Job:
    [
      Type = "Job";
      Owner = "ncoleman";
      Cmd = "run_sim";
      Memory = 31m;
      Rank = KFlops/1E3 + other.Memory/32;
      Requirements = (other.Type == "Machine")
                  && (other.Arch == "INTEL")
                  && (other.Opsys == "LINUX")
                  && (other.Memory >= 128);
    ]

  Machine:
    [
      Type = "Machine";
      KeybrdIdle = '00:23:12';
      Memory = 256M;
      LoadAvg = 0.042969;
      Kflops = 21893;
      Arch = "INTEL";
      OpSys = "LINUX";
      Name = "foo.cs.wisc.edu";
      Rank = (DayTime() >= '9:00') && ((DayTime() <= '17:00') ? 1/other.ImageSize : 0);
      Requirements = (other.Type == "Job")
                  && (other.Owner != "riffraff")
                  && (LoadAvg < 0.3)
                  && (KeybrdIdle > '00:15');
    ]
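The two-sided nature of a match can be sketched in a few lines of Python. This is an illustration only, not Condor's matchmaker: ads are modelled as plain dicts, Requirements and Rank as callables taking (my, other), time literals are converted to seconds by hand ('00:15' is assumed to mean 15 minutes), the job's KFlops term is read as other.KFlops, and a missing attribute is simply treated as a failed match. The helper names requirements_met and is_match are hypothetical.

```python
# Minimal sketch of symmetric matchmaking, assuming dict-based ads.

def requirements_met(ad, other):
    """True when ad's Requirements accept `other`.

    A reference to an attribute that is absent raises KeyError, which this
    sketch treats as "no match" (a simplification of undefined values).
    """
    try:
        return bool(ad["Requirements"](ad, other))
    except KeyError:
        return False


def is_match(job, machine):
    """A match needs both sides' Requirements to be satisfied."""
    return requirements_met(job, machine) and requirements_met(machine, job)


job = {
    "Type": "Job",
    "Owner": "ncoleman",
    "Cmd": "run_sim",
    "Rank": lambda my, other: other["KFlops"] / 1e3 + other["Memory"] / 32,
    "Requirements": lambda my, other: (
        other["Type"] == "Machine"
        and other["Arch"] == "INTEL"
        and other["OpSys"] == "LINUX"
        and other["Memory"] >= 128
    ),
}

machine = {
    "Type": "Machine",
    "KeybrdIdle": 23 * 60 + 12,  # '00:23:12' expressed in seconds
    "Memory": 256,
    "LoadAvg": 0.042969,
    "KFlops": 21893,
    "Arch": "INTEL",
    "OpSys": "LINUX",
    "Name": "foo.cs.wisc.edu",
    "Requirements": lambda my, other: (
        other["Type"] == "Job"
        and other["Owner"] != "riffraff"
        and my["LoadAvg"] < 0.3
        and my["KeybrdIdle"] > 15 * 60  # assumed reading of '00:15'
    ),
}

if is_match(job, machine):
    print("match, job-side rank:", job["Rank"](job, machine))
else:
    print("no match")
```

Rank is only consulted after Requirements succeed on both sides, which mirrors the split between constraints and preferences described for ClassAd attributes.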

Policy Comprehension
  Why won’t my job run?
    My policy is too restrictive
    My job is rejected by machines in the pool
  Looking for answers
    Use Condor tools (condor_q, condor_status)
    Stare at the job ClassAd to find out what’s wrong

Condor Tools
  condor_q -analyze reports:
    Of 105 resource offers,
      105 do not satisfy the request's constraints
      64 resource offer constraints are not satisfied by this request
  User wants more details:
    Which parts of the job requirements expression are problematic?
    Is the job ClassAd missing any attributes?

Two Cases to Examine
  1. No machines meet the job’s requirements
  2. The job does not meet any machine’s requirements
  One or both of these issues may be preventing the job from running, but they are not interdependent. We can analyze each one separately.

Example 1
  JOB
    [ Requirements = (Arch == "SPARK") && (OpSys == "SOLARIS2.7") ]
  Result:
    (Arch == "SPARK"): did not match - suggestion: REMOVE
    (Opsys == "SOLARIS2.7"): matched: 2 - suggestion: KEEP

Example 2
  JOB
    [ Requirements = (Arch == "ALPHA") && (OpSys == "WINNT") && (Memory >= 64) ]
  Result:
    (Arch == "ALPHA"): matched: 1 - suggestion: REMOVE
    (OpSys == "WINNT"): matched: 2 - suggestion: KEEP
    (Memory >= 64): matched: 4 - suggestion: KEEP
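Example 2 shows that a clause can match some machines on its own and still be the one to remove, because no single machine satisfies all clauses together. The slides do not say how the prototype picks its suggestion; the sketch below assumes one plausible approach: count matches per clause, then search for the smallest set of clauses whose removal lets the remaining conjunction match at least one machine. The pool is hypothetical, chosen only so that the per-clause counts agree with Example 2.

```python
from itertools import combinations

# Hypothetical pool: 1 ALPHA machine, 2 WINNT machines, 4 with Memory >= 64.
POOL = [
    {"Arch": "ALPHA", "OpSys": "OSF1", "Memory": 32},
    {"Arch": "INTEL", "OpSys": "WINNT", "Memory": 128},
    {"Arch": "INTEL", "OpSys": "WINNT", "Memory": 64},
    {"Arch": "INTEL", "OpSys": "LINUX", "Memory": 256},
    {"Arch": "INTEL", "OpSys": "LINUX", "Memory": 128},
]

# The job's Requirements, split into its top-level conjuncts.
CLAUSES = {
    '(Arch == "ALPHA")': lambda m: m["Arch"] == "ALPHA",
    '(OpSys == "WINNT")': lambda m: m["OpSys"] == "WINNT",
    "(Memory >= 64)": lambda m: m["Memory"] >= 64,
}


def analyze(clauses, pool):
    """Return (per-clause match counts, set of clauses suggested for removal)."""
    counts = {
        name: sum(1 for m in pool if test(m)) for name, test in clauses.items()
    }
    names = list(clauses)
    # Try removing 0 clauses, then 1, then 2, ... and stop at the first
    # removal set that leaves a conjunction some machine satisfies.
    for k in range(len(names) + 1):
        for removed in combinations(names, k):
            kept = [n for n in names if n not in removed]
            if any(all(clauses[n](m) for n in kept) for m in pool):
                return counts, set(removed)
    return counts, set(names)  # only reached when the pool is empty


counts, removed = analyze(CLAUSES, POOL)
for name in CLAUSES:
    verdict = "REMOVE" if name in removed else "KEEP"
    print(f"{name}: matched: {counts[name]} - suggestion: {verdict}")
```

The exhaustive search is exponential in the number of clauses, which is acceptable for the handful of conjuncts in a typical Requirements expression but would need a smarter strategy for large ones.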

Example 3
  JOB
    [ Owner = "jsmith"; ImageSize = 120000; Requirements = ... ]
  MACHINES
    Machines 1, 2, 4: [ Requirements = (ImageSize <= 50176) && (MemoryReq < 49) ]
    Machines 3, 5:    [ Requirements = (ImageSize <= 115712) && (MemoryReq < 98) ]

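  [Figure: machine acceptance regions by job attribute. ImageSize axis: up to 50176 accepted by machines 1,2,3,4,5; up to 115712 accepted by machines 3,5; none beyond, up to +∞. MemoryReq axis: below 49 accepted by machines 1,2,3,4,5; below 98 accepted by machines 3,5; none beyond, up to +∞.]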

Example 3
  JOB
    [ Owner = "jsmith"; ImageSize = 120000; Requirements = ... ]
  Results of Test
  ---------------
  The following attributes are missing from the job classad:
    MemoryReq
  The following attributes should be added or modified:
    ImageSize: - suggestion: use a value less than or equal to 50176
    MemoryReq: - suggestion: use a value less than 49
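The machine-side analysis in Example 3 can be approximated as follows. This is a sketch under stated assumptions, not the prototype's algorithm: each machine's Requirements is reduced by hand to a list of (attribute, operator, bound) triples, and the suggested value is the tightest bound across all machines, which happens to reproduce the output shown. The names MACHINES, JOB, and analyze_job are hypothetical.

```python
import operator

OPS = {"<=": operator.le, "<": operator.lt}
OP_TEXT = {"<=": "less than or equal to", "<": "less than"}

SMALL = [("ImageSize", "<=", 50176), ("MemoryReq", "<", 49)]
LARGE = [("ImageSize", "<=", 115712), ("MemoryReq", "<", 98)]
MACHINES = {1: SMALL, 2: SMALL, 3: LARGE, 4: SMALL, 5: LARGE}

JOB = {"Owner": "jsmith", "ImageSize": 120000}


def analyze_job(job, machines):
    """Return (missing attributes, {attribute: (op, bound)} suggestions)."""
    # Tightest upper bound per attribute; on equal bounds, "<" beats "<=".
    tightest = {}
    for constraints in machines.values():
        for attr, op, bound in constraints:
            key = (bound, 0 if op == "<" else 1)
            if attr not in tightest or key < tightest[attr][0]:
                tightest[attr] = (key, op)

    missing, suggestions = [], {}
    for attr, ((bound, _), op) in tightest.items():
        if attr not in job:
            missing.append(attr)
            suggestions[attr] = (op, bound)
        elif not OPS[op](job[attr], bound):
            suggestions[attr] = (op, bound)
    return missing, suggestions


missing, suggestions = analyze_job(JOB, MACHINES)
print("The following attributes are missing from the job classad:")
for attr in missing:
    print("  " + attr)
print("The following attributes should be added or modified:")
for attr, (op, bound) in suggestions.items():
    print(f"  {attr}: - suggestion: use a value {OP_TEXT[op]} {bound}")
```

Choosing the tightest bound makes the job acceptable to every machine; a looser suggestion (for instance ImageSize <= 115712) would admit only machines 3 and 5, so which bound to report is a design decision rather than a given.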

Current Work
  ClassAd analysis prototype implemented in Java
    Job requirements analysis
    Machine requirements analysis
  Current version supports a simple menu-driven interface
  Working on integrating with Condor tools
    condor_q -analyze
    condor_status

Future Work
  Applications to other uses of ClassAds in Condor
  Analysis of a successful match
  Graphical interface
  Analysis of gang matching
  ClassAds as an authorization language

Conclusions
  Automated fine-grained policy expression analysis is useful and feasible
  Different issues arise with job requirements analysis and machine requirements analysis
  The ClassAd language is ideal for these purposes