Distributed Policy Management and Comprehension with Classified Advertisements
A Distributed Policy Scenario
- A user submits a job to Condor
- The user has designed a policy defining the requested services
- Machines in the Condor pool have policies restricting the use of services
- The user's job won't run. Why?
  - Is the user's policy too restrictive?
  - Was the job rejected by machine policies?
Policy Management
- Resource allocation challenges:
  - Resource heterogeneity
  - Policy heterogeneity
- How should resources be allocated?
  - Conventional centralized allocation is not sufficient
  - Solution: matchmaking with Classified Advertisements (ClassAds)
Matchmaking
[Figure: a job ClassAd, [ MyType = "Job"; ... Rank = ...; Requirements = ... ], is matched against the ClassAds of the machines in the pool, each with MyType = "Machine".]
Classified Advertisements
- Represent entities (e.g., jobs, machines) and their policies
- A ClassAd is a set of named expressions called attributes
- Types of attributes:
  - Characteristics of an entity (Arch, OpSys, Memory)
  - Constraints on the requested resource (Requirements)
  - Preferences for the requested resource (Rank)
Typical ClassAds
[
  Type = "Job";
  Owner = "ncoleman";
  Cmd = "run_sim";
  Memory = 31m;
  Rank = KFlops/1E3 + other.Memory/32;
  Requirements = (other.Type == "Machine") && (other.Arch == "INTEL") &&
                 (other.OpSys == "LINUX") && (other.Memory >= 128);
]
[
  Type = "Machine";
  KeybrdIdle = '00:23:12';
  Memory = 256M;
  LoadAvg = 0.042969;
  KFlops = 21893;
  Arch = "INTEL";
  OpSys = "LINUX";
  Name = "foo.cs.wisc.edu";
  Rank = ((DayTime() >= '9:00') && (DayTime() <= '17:00')) ? 1/other.ImageSize : 0;
  Requirements = (other.Type == "Job") && (other.Owner != "riffraff") &&
                 (LoadAvg < 0.3) && (KeybrdIdle > '00:15');
]
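To make the two-sided nature of matchmaking concrete, here is a minimal Python sketch that models the job and machine ClassAds from this slide as dictionaries, with Requirements and Rank written as functions of (self, other). It is not the ClassAd language or Condor's matchmaker; the names `requirements_met` and `symmetric_match` are hypothetical. It assumes Memory is in megabytes and KeybrdIdle is in seconds, reads the job's unqualified KFlops from the machine ad, and leaves out the machine's DayTime()-based Rank.

```python
# Hypothetical model of the two ClassAds: attributes are dict entries, and
# Requirements/Rank are functions of (self_ad, other_ad).
job = {
    "Type": "Job",
    "Owner": "ncoleman",
    "Cmd": "run_sim",
    "Memory": 31,  # assumed to be megabytes
    # Assumption: KFlops is read from the machine (the "other" ad).
    "Rank": lambda me, other: other["KFlops"] / 1e3 + other["Memory"] / 32,
    "Requirements": lambda me, other: (
        other["Type"] == "Machine"
        and other["Arch"] == "INTEL"
        and other["OpSys"] == "LINUX"
        and other["Memory"] >= 128
    ),
}

machine = {
    "Type": "Machine",
    "KeybrdIdle": 23 * 60 + 12,  # '00:23:12' converted to seconds (assumption)
    "Memory": 256,  # 256M, assumed to be megabytes
    "LoadAvg": 0.042969,
    "KFlops": 21893,
    "Arch": "INTEL",
    "OpSys": "LINUX",
    "Name": "foo.cs.wisc.edu",
    "Requirements": lambda me, other: (
        other["Type"] == "Job"
        and other["Owner"] != "riffraff"
        and me["LoadAvg"] < 0.3
        and me["KeybrdIdle"] > 15 * 60  # '00:15' converted to seconds
    ),
}


def requirements_met(ad, other):
    """Evaluate ad's Requirements with `other` as the candidate partner."""
    return ad["Requirements"](ad, other)


def symmetric_match(a, b):
    """A match needs each ad's Requirements to accept the other ad."""
    return requirements_met(a, b) and requirements_met(b, a)


print(symmetric_match(job, machine))  # True
print(job["Rank"](job, machine))      # the job's preference for this machine
```

Changing the job's Owner to "riffraff" makes the machine side reject the match even though the job still accepts the machine, which is exactly the kind of one-sided failure the rest of this talk analyzes.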
Policy Comprehension
- Why won't my job run?
  - My policy is too restrictive
  - My job is rejected by machines in the pool
- Looking for answers:
  - Use Condor tools (condor_q, condor_status)
  - Stare at the job ClassAd to find out what's wrong
Condor Tools
- condor_q -analyze reports:
    Of 105 resource offers, 105 do not satisfy the request's constraints
    64 resource offer constraints are not satisfied by this request
- The user wants more details:
  - Which parts of the job's Requirements expression are problematic?
  - Is the job ClassAd missing any attributes?
Two Cases to Examine
1. No machines meet the job's requirements
2. The job does not meet any machine's requirements
One or both of these issues may prevent the job from running, but they are independent of each other, so each can be analyzed separately.
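Because the two cases are independent, each can be measured with its own pass over the pool: one pass evaluates the job's Requirements against every machine, the other evaluates every machine's Requirements against the job. The Python sketch below does this for a hypothetical job and three-machine pool, with Requirements modeled as Python functions rather than ClassAd expressions. The printed messages borrow the wording of the condor_q -analyze lines quoted earlier; `analyze` is a hypothetical helper, not Condor's implementation.

```python
def analyze(job, machines):
    """Return (machines rejected by the job, machines rejecting the job)."""
    # Case 1: the machine's attributes fail the job's Requirements.
    rejected_by_job = sum(1 for m in machines if not job["Requirements"](job, m))
    # Case 2: the job's attributes fail the machine's Requirements.
    rejecting_job = sum(1 for m in machines if not m["Requirements"](m, job))
    return rejected_by_job, rejecting_job


# Hypothetical job and three-machine pool, for illustration only.
job = {
    "Owner": "jsmith",
    "ImageSize": 120000,
    "Requirements": lambda me, other: other["Arch"] == "INTEL" and other["Memory"] >= 128,
}
machines = [
    {"Arch": "INTEL", "Memory": 256,
     "Requirements": lambda me, other: other["ImageSize"] <= 50176},
    {"Arch": "INTEL", "Memory": 64,
     "Requirements": lambda me, other: other["ImageSize"] <= 115712},
    {"Arch": "SUN4u", "Memory": 512,
     "Requirements": lambda me, other: other["Owner"] != "riffraff"},
]

case1, case2 = analyze(job, machines)
print(f"Of {len(machines)} resource offers, {case1} do not satisfy the request's constraints")
print(f"{case2} resource offer constraints are not satisfied by this request")
```

Both counts come out as 2 of 3, yet no machine passes both checks: the first machine fails case 2, the second fails both, and the third fails case 1. The aggregate counts alone do not reveal this, which is why the per-expression analysis in the following examples is useful.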
Example 1
JOB
[ Requirements = (Arch == "SPARK") && (OpSys == "SOLARIS2.7") ]
Result:
  (Arch == "SPARK"): did not match - suggestion: REMOVE
  (OpSys == "SOLARIS2.7"): matched: 2 - suggestion: KEEP
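The Example 1 result can be reproduced by splitting the job's Requirements at the top-level && into conjuncts and counting, for each conjunct alone, how many machines satisfy it; a conjunct that no machine satisfies can never be part of a match. The Python sketch below assumes conjuncts are given as (label, predicate) pairs and uses a hypothetical three-machine pool. It only illustrates this single rule; Example 2 shows that the prototype's analysis goes further.

```python
def analyze_conjuncts(conjuncts, machines):
    """Count the machines satisfying each conjunct of the job's Requirements.

    conjuncts: list of (label, predicate) pairs, predicate(machine) -> bool.
    A conjunct that no machine satisfies is flagged REMOVE; others KEEP.
    """
    report = []
    for label, pred in conjuncts:
        matched = sum(1 for m in machines if pred(m))
        if matched == 0:
            report.append(f"{label}: did not match - suggestion: REMOVE")
        else:
            report.append(f"{label}: matched: {matched} - suggestion: KEEP")
    return report


# Example 1's Requirements, split at the top-level && into conjuncts.
conjuncts = [
    ('(Arch == "SPARK")', lambda m: m["Arch"] == "SPARK"),
    ('(OpSys == "SOLARIS2.7")', lambda m: m["OpSys"] == "SOLARIS2.7"),
]

# Hypothetical pool: two Sparc/Solaris 2.7 machines and one Intel/Linux machine.
pool = [
    {"Arch": "SUN4u", "OpSys": "SOLARIS2.7"},
    {"Arch": "SUN4u", "OpSys": "SOLARIS2.7"},
    {"Arch": "INTEL", "OpSys": "LINUX"},
]

report = analyze_conjuncts(conjuncts, pool)
for line in report:
    print(line)
```

No machine advertises the misspelled architecture "SPARK", so that conjunct alone blocks every match while the OpSys conjunct is fine.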
Example 2
JOB
[ Requirements = (Arch == "ALPHA") && (OpSys == "WINNT") && (Memory >= 64) ]
Result:
  (Arch == "ALPHA"): matched: 1 - suggestion: REMOVE
  (OpSys == "WINNT"): matched: 2 - suggestion: KEEP
  (Memory >= 64): matched: 4 - suggestion: KEEP
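In Example 2 the Arch conjunct matches one machine yet is still marked REMOVE, so a per-conjunct count is not enough: the kept conjuncts must be satisfiable together by some machine. One way to obtain that result, assuming Requirements has only a few conjuncts, is a brute-force search for the largest subset of conjuncts that at least one machine satisfies jointly, breaking ties by the number of such machines. The Python sketch below does this over a hypothetical five-machine pool chosen so that the per-conjunct counts equal those in Example 2; `best_subset` is a hypothetical helper and the prototype's actual algorithm may differ.

```python
from itertools import combinations


def best_subset(conjuncts, machines):
    """Return the labels of the largest jointly satisfiable set of conjuncts.

    Subsets are tried from largest to smallest; among subsets of one size,
    the one satisfied by the most machines wins. The search is exponential
    in the number of conjuncts, which is acceptable for short Requirements.
    """
    for size in range(len(conjuncts), 0, -1):
        best, best_count = None, 0
        for subset in combinations(conjuncts, size):
            count = sum(1 for m in machines if all(p(m) for _, p in subset))
            if count > best_count:
                best, best_count = subset, count
        if best is not None:
            return [label for label, _ in best]
    return []


conjuncts = [
    ('(Arch == "ALPHA")', lambda m: m["Arch"] == "ALPHA"),
    ('(OpSys == "WINNT")', lambda m: m["OpSys"] == "WINNT"),
    ("(Memory >= 64)", lambda m: m["Memory"] >= 64),
]

# Hypothetical pool: 1 ALPHA machine, 2 WINNT machines, 4 with Memory >= 64,
# but no machine satisfies all three conjuncts at once.
pool = [
    {"Arch": "ALPHA", "OpSys": "OSF1", "Memory": 32},
    {"Arch": "INTEL", "OpSys": "WINNT", "Memory": 128},
    {"Arch": "INTEL", "OpSys": "WINNT", "Memory": 64},
    {"Arch": "INTEL", "OpSys": "LINUX", "Memory": 256},
    {"Arch": "INTEL", "OpSys": "LINUX", "Memory": 128},
]

keep = best_subset(conjuncts, pool)
for label, _ in conjuncts:
    print(label, "- suggestion:", "KEEP" if label in keep else "REMOVE")
```

Here the only pair of conjuncts that any machine satisfies together is OpSys and Memory (two WINNT machines), so Arch is the conjunct to drop.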
Example 3
JOB
[ Owner = "jsmith"; ImageSize = 120000; Requirements = ... ]
MACHINES
1: [ Requirements = (ImageSize <= 50176) && (MemoryReq < 49) ]
2: [ Requirements = (ImageSize <= 50176) && (MemoryReq < 49) ]
3: [ Requirements = (ImageSize <= 115712) && (MemoryReq < 98) ]
4: [ Requirements = (ImageSize <= 50176) && (MemoryReq < 49) ]
5: [ Requirements = (ImageSize <= 115712) && (MemoryReq < 98) ]
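[Figure: ranges of ImageSize and MemoryReq accepted by the machines. ImageSize <= 50176 is accepted by machines 1, 2, 3, 4, 5; values up to 115712 only by machines 3 and 5; larger values, up to +∞, by none. MemoryReq < 49 is accepted by all five machines, MemoryReq < 98 only by machines 3 and 5.]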
Example 3
JOB
[ Owner = "jsmith"; ImageSize = 120000; Requirements = ... ]
Results of Test
---------------
The following attributes are missing from the job classad:
  MemoryReq
The following attributes should be added or modified:
  ImageSize: - suggestion: use a value less than or equal to 50176
  MemoryReq: - suggestion: use a value less than 49
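For machine requirements analysis, each machine in Example 3 constrains the job's attributes with upper bounds. Assuming every machine's Requirements is a conjunction of such bounds, a suggestion can be derived per attribute by choosing the bound that the most machines accept, preferring the loosest bound on ties, and reporting it whenever the job's attribute is missing or violates that bound. The Python sketch below encodes the five machines of Example 3 this way and produces the same suggestions as the test output. The encoding, the helper names, and the per-attribute (rather than joint) treatment are assumptions for illustration, not the prototype's implementation.

```python
from collections import defaultdict

# Each machine's Requirements as (attribute, operator, bound) upper bounds.
SMALL = [("ImageSize", "<=", 50176), ("MemoryReq", "<", 49)]
LARGE = [("ImageSize", "<=", 115712), ("MemoryReq", "<", 98)]
machines = {1: SMALL, 2: SMALL, 3: LARGE, 4: SMALL, 5: LARGE}


def within(candidate, constraint):
    """True if every value allowed by `candidate` is allowed by `constraint`."""
    (c_op, c_bound), (op, bound) = candidate, constraint
    if c_bound != bound:
        return c_bound < bound
    return op == "<=" or c_op == "<"


def accepts(value, op, bound):
    """True if `value` satisfies the upper bound (op, bound)."""
    return value <= bound if op == "<=" else value < bound


def suggest(job, machines):
    # Group every machine's bounds by the job attribute they constrain.
    bounds = defaultdict(list)
    for constraints in machines.values():
        for attr, op, bound in constraints:
            bounds[attr].append((op, bound))

    missing = sorted(a for a in bounds if a not in job)
    suggestions = {}
    for attr, constraints in bounds.items():
        # Candidate accepted by the most machines; ties go to the loosest bound.
        best = max(
            set(constraints),
            key=lambda c: (sum(within(c, k) for k in constraints), c[1], c[0] == "<="),
        )
        if attr not in job or not accepts(job[attr], *best):
            suggestions[attr] = best
    return missing, suggestions


job = {"Owner": "jsmith", "ImageSize": 120000}  # Requirements omitted
missing, suggestions = suggest(job, machines)
print("The following attributes are missing from the job classad:", *missing)
for attr, (op, bound) in sorted(suggestions.items()):
    wording = "less than or equal to" if op == "<=" else "less than"
    print(f"{attr}: - suggestion: use a value {wording} {bound}")
```

Choosing the tightest bound (50176 rather than 115712) is what lets all five machines accept the job; the looser bound would satisfy only machines 3 and 5.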
Current Work
- ClassAd analysis prototype implemented in Java:
  - Job requirements analysis
  - Machine requirements analysis
- The current version supports a simple menu-driven interface
- Working on integration with the Condor tools condor_q -analyze and condor_status
Future Work
- Applications to other uses of ClassAds in Condor
- Analysis of a successful match
- Graphical interface
- Analysis of gang matching
- ClassAds as an authorization language
Conclusions
- Automated, fine-grained analysis of policy expressions is useful and feasible
- Job requirements analysis and machine requirements analysis raise different issues
- The ClassAd language is ideal for these purposes