Distributed Policy Management and Comprehension with Classified Advertisements › Nicholas Coleman, Computer Sciences Department, University of Wisconsin-Madison
A Distributed Policy Scenario › A user submits a job to Condor › The user has designed a policy defining requested services › Machines in the Condor pool have policies restricting the use of services › The user’s job won’t run. Why? Is the user’s policy too restrictive? Was the job rejected by machine policies?
Policy Management › Resource allocation challenges Resource heterogeneity Policy heterogeneity › How to allocate resources? Conventional centralized allocation not sufficient Solution: Matchmaking with Classified Advertisements (ClassAds)
Matchmaking [Figure: job ads of the form [ MyType = “Job”; Rank =... Requirements =... ] are matched against a pool of machine ads of the form [ MyType = “Machine”; Rank =... Requirements =... ]]
Classified Advertisements › Represent entities (e.g. jobs, machines) and their policies › A ClassAd is a set of named expressions called attributes › Types of attributes: Characteristics of an entity ( Arch, OpSys, Memory ) Constraints for requested resource ( Requirements ) Preferences for requested resource ( Rank )
Typical ClassAds [ Type = “Machine”; KeybrdIdle = ‘00:23:12’; Memory = 256M; LoadAvg = ; KFlops = 21893; Arch = “INTEL”; OpSys = “LINUX”; Name = “foo.cs.wisc.edu”; Rank = ((DayTime() >= ‘9:00’) && (DayTime() <= ‘17:00’)) ? 1/other.ImageSize : 0; Requirements = (other.Type == “Job”) && (other.Owner != “riffraff”) && (LoadAvg < 0.3) && (KeybrdIdle > ‘00:15’); ] [ Type = “Job”; Owner = “ncoleman”; Cmd = “run_sim”; Memory = 31m; Rank = KFlops/1E3 + other.Memory/32; Requirements = (other.Type == “Machine”) && (other.Arch == “INTEL”) && (other.OpSys == “LINUX”) && (other.Memory >= 128); ]
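To make the matchmaking idea concrete, here is a minimal sketch in Python. It is not the ClassAd evaluator: ads are plain dicts, `Requirements` and `Rank` are ordinary callables taking `(my_ad, other_ad)`, and the helper names, host names, and attribute values are all hypothetical. A missing attribute is simply treated as "no match", which only approximates how ClassAds handle undefined values.

```python
# Matchmaking sketch. Ads are dicts; Requirements and Rank are callables
# taking (my_ad, other_ad). All names and values below are hypothetical.

def requirements_ok(ad, other):
    """Evaluate ad's Requirements against other; a missing attribute means no match."""
    try:
        return bool(ad["Requirements"](ad, other))
    except KeyError:
        return False


def matches(job, machine):
    """A match is symmetric: each side's Requirements must accept the other."""
    return requirements_ok(job, machine) and requirements_ok(machine, job)


def best_match(job, machines):
    """Return the matching machine that the job's Rank scores highest, or None."""
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: job["Rank"](job, m))


def make_machine(name, arch, opsys, memory, kflops, load_avg):
    """Build a machine ad that refuses one owner and only runs jobs when idle."""
    return {
        "Type": "Machine", "Name": name, "Arch": arch, "OpSys": opsys,
        "Memory": memory, "KFlops": kflops, "LoadAvg": load_avg,
        "Requirements": lambda me, other: (other["Type"] == "Job"
                                           and other["Owner"] != "riffraff"
                                           and me["LoadAvg"] < 0.3),
    }


job = {
    "Type": "Job", "Owner": "ncoleman",
    "Requirements": lambda me, other: (other["Type"] == "Machine"
                                       and other["Arch"] == "INTEL"
                                       and other["OpSys"] == "LINUX"
                                       and other["Memory"] >= 128),
    "Rank": lambda me, other: other["KFlops"] / 1e3 + other["Memory"] / 32,
}

pool = [
    make_machine("a.example.org", "INTEL", "LINUX", 256, 21893, 0.1),
    make_machine("b.example.org", "INTEL", "LINUX", 512, 15000, 0.1),
    make_machine("c.example.org", "INTEL", "LINUX", 1024, 30000, 0.9),  # too busy
    make_machine("d.example.org", "SUN4u", "SOLARIS", 512, 20000, 0.0),  # wrong Arch
]

chosen = best_match(job, pool)
print(chosen["Name"] if chosen else "no match")
```

Requirements act as a filter in both directions, and Rank only orders the machines that survive the filter.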
Policy Comprehension › Why won’t my job run? My policy is too restrictive My job is rejected by machines in the pool › Looking for answers Use Condor tools (condor_q, condor_status) Stare at job ClassAd to find out what’s wrong
Condor Tools › condor_q -analyze: Of 105 resource offers, 105 do not satisfy the request's constraints 64 resource offer constraints are not satisfied by this request › User wants more details: Which parts of job requirements expression are problematic? Is job ClassAd missing any attributes?
Two Cases to Examine 1. No machines meet the job’s requirements 2. The job does not meet any machine’s requirements One or both of these issues may be preventing the job from running, but they are not interdependent. We can analyze each one separately.
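Because the two cases are independent, each can be counted with its own pass over the pool. The sketch below assumes a simplified model in which each ad carries a `Requirements` callable taking the other ad. The pool, the attribute values, and the `count_failures` helper are hypothetical, and the printed wording only imitates the condor_q summary quoted earlier.

```python
# Count the two failure cases separately. Requirements are plain callables
# taking the other ad. All values are hypothetical.

def count_failures(job, machines):
    """Return (machines that fail the job's requirements,
               machines whose requirements the job fails)."""
    job_rejects = sum(1 for m in machines if not job["Requirements"](m))
    machine_rejects = sum(1 for m in machines if not m["Requirements"](job))
    return job_rejects, machine_rejects


def make_machine(arch, memory, max_image):
    """Build a machine ad that only accepts jobs up to max_image in size."""
    return {"Arch": arch, "Memory": memory,
            "Requirements": lambda j: j["ImageSize"] <= max_image}


job = {"Owner": "jsmith", "ImageSize": 60000,
       "Requirements": lambda m: m["Arch"] == "INTEL" and m["Memory"] >= 128}

pool = [
    make_machine("INTEL", 64, 50176),    # too little memory for the job
    make_machine("SUN4u", 256, 50176),   # wrong architecture for the job
    make_machine("INTEL", 256, 50176),   # acceptable to the job
]

job_rejects, machine_rejects = count_failures(job, pool)
print(f"Of {len(pool)} resource offers, {job_rejects} do not satisfy "
      "the request's constraints")
print(f"{machine_rejects} resource offer constraints are not satisfied "
      "by this request")
```

Neither count depends on the other, so a job can be blocked by its own policy, by the machines' policies, or by both.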
Example 1 JOB [ Requirements = (Arch == “SPARK”) && (OpSys == “SOLARIS2.7”) ] Result: (Arch == “SPARK”): did not match - suggestion: REMOVE (OpSys == “SOLARIS2.7”): matched: 2 - suggestion: KEEP
Example 2 JOB [ Requirements = (Arch == “ALPHA”) && (OpSys == “WINNT”) && (Memory >= 64) ] Result: (Arch == “ALPHA”): matched: 1 - suggestion: REMOVE (OpSys == “WINNT”): matched: 2 - suggestion: KEEP (Memory >= 64): matched: 4 - suggestion: KEEP
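One way to produce per-clause results like those in Examples 1 and 2 is sketched below. The slides do not say how the prototype chooses between KEEP and REMOVE, so the heuristic here is an assumption: drop the smallest set of clauses for which the remaining conjunction matches at least one machine, breaking ties in favor of the set that leaves the most matches. The pool is hypothetical and was chosen only so that the per-clause counts agree with Example 2.

```python
from itertools import combinations

# Each clause is (text, predicate over a machine ad). The pool is hypothetical.

def clause_counts(clauses, machines):
    """Count, for each clause on its own, how many machines satisfy it."""
    return {text: sum(1 for m in machines if pred(m)) for text, pred in clauses}


def suggest(clauses, machines):
    """Label each clause KEEP or REMOVE.

    Assumed heuristic (not taken from the slides): remove the smallest set of
    clauses such that the remaining ones jointly match at least one machine.
    """
    n = len(clauses)
    for k in range(n + 1):
        best_drop, best_matched = None, 0
        for drop in combinations(range(n), k):
            kept = [clauses[i][1] for i in range(n) if i not in drop]
            matched = sum(1 for m in machines if all(p(m) for p in kept))
            if matched > best_matched:
                best_drop, best_matched = drop, matched
        if best_drop is not None:
            return {clauses[i][0]: ("REMOVE" if i in best_drop else "KEEP")
                    for i in range(n)}
    return {text: "REMOVE" for text, _ in clauses}  # only when the pool is empty


def report(clauses, machines):
    """Format one line per clause, in the style of the example output."""
    counts = clause_counts(clauses, machines)
    verdicts = suggest(clauses, machines)
    lines = []
    for text, _ in clauses:
        matched = f"matched: {counts[text]}" if counts[text] else "did not match"
        lines.append(f"({text}): {matched} - suggestion: {verdicts[text]}")
    return lines


pool = [
    {"Arch": "ALPHA", "OpSys": "OSF1", "Memory": 32},
    {"Arch": "INTEL", "OpSys": "WINNT", "Memory": 64},
    {"Arch": "INTEL", "OpSys": "WINNT", "Memory": 128},
    {"Arch": "INTEL", "OpSys": "LINUX", "Memory": 256},
    {"Arch": "INTEL", "OpSys": "LINUX", "Memory": 64},
]

clauses = [
    ('Arch == "ALPHA"', lambda m: m["Arch"] == "ALPHA"),
    ('OpSys == "WINNT"', lambda m: m["OpSys"] == "WINNT"),
    ("Memory >= 64", lambda m: m["Memory"] >= 64),
]

for line in report(clauses, pool):
    print(line)
```

The search is exponential in the number of clauses, which is tolerable for short Requirements expressions but would need pruning for long ones.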
Example 3 JOB [ Owner = “jsmith”; ImageSize = ; Requirements =... ] MACHINES [ Requirements = (ImageSize <= 50176) && (MemoryReq < 49) ] [ Requirements = (ImageSize <= ) && (MemoryReq < 98) ] [ Requirements = (ImageSize <= 50176) && (MemoryReq < 49) ] [ Requirements = (ImageSize <= 50176) && (MemoryReq < 49) ] [ Requirements = (ImageSize <= ) && (MemoryReq < 98) ]
[Figure: ImageSize and MemoryReq constraints for machines 1 to 5]
Example 3 JOB [ Owner = “jsmith”; ImageSize = ; Requirements =... ] Results of Test › The following attributes are missing from the job ClassAd: MemoryReq › The following attributes should be added or modified: ImageSize: - suggestion: use a value less than or equal to ; MemoryReq: - suggestion: use a value less than 49
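The machine-side analysis in Example 3 can be sketched as follows. This is an assumption about the approach, not the prototype's actual algorithm: machine requirements are reduced to simple upper bounds of the form `(attribute, operator, bound)`, and the suggestion for each attribute is the tightest bound found in the pool, so a job that follows it satisfies that constraint on every machine. The job's `ImageSize` value and the second machine are hypothetical, since the slide does not show those values.

```python
# Machine requirements reduced to upper bounds: (attribute, op, bound), where
# op is "<" or "<=". The job's ImageSize value here is hypothetical.

def strictness(op, bound):
    """Sort key: a smaller bound is tighter; at equal bounds "<" beats "<="."""
    return (bound, 0 if op == "<" else 1)


def satisfies(value, op, bound):
    """Check a single upper-bound constraint."""
    return value < bound if op == "<" else value <= bound


def analyze_job(job, machine_constraints):
    """Return (missing attributes, {attribute: suggestion}) for a job ad."""
    tightest = {}
    for constraints in machine_constraints:
        for attr, op, bound in constraints:
            if (attr not in tightest
                    or strictness(op, bound) < strictness(*tightest[attr])):
                tightest[attr] = (op, bound)

    missing = sorted(attr for attr in tightest if attr not in job)
    suggestions = {}
    for attr, (op, bound) in sorted(tightest.items()):
        if attr not in job or not satisfies(job[attr], op, bound):
            phrase = "less than" if op == "<" else "less than or equal to"
            suggestions[attr] = f"use a value {phrase} {bound}"
    return missing, suggestions


job = {"Owner": "jsmith", "ImageSize": 60000}

machines = [
    [("ImageSize", "<=", 50176), ("MemoryReq", "<", 49)],
    [("MemoryReq", "<", 98)],
]

missing, suggestions = analyze_job(job, machines)
print("The following attributes are missing from the job classad:",
      ", ".join(missing))
for attr, text in suggestions.items():
    print(f"{attr}: - suggestion: {text}")
```

Taking the tightest bound is conservative: a job only needs one machine to accept it, so a looser value may also work on part of the pool.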
Current Work › ClassAd analysis prototype implemented in Java Job requirements analysis Machine requirements analysis Current version supports a simple menu-driven interface › Working on integrating with Condor tools condor_q -analyze condor_status
Future Work › Applications to other uses of ClassAds in Condor › Analysis of a successful match › Graphical interface › Analysis of gang matching › ClassAds as an authorization language
Conclusions › Automated fine-grained policy expression analysis is useful and feasible › Different issues arise with job requirements analysis and machine requirements analysis › The ClassAd language is ideal for these purposes