Slot Acquisition
Presenter: Daniel Nurmi
Scope
One aspect of a VGDL request is the time ‘slot’ when resources are needed
–Earliest time when the resource set is needed
–Maximum duration the resource set will be used
Three classes of resources
–dedicated: always available
–batch controlled: lag before available
–advanced reservation: guaranteed availability in the future
Acquisition Routines
Each class of resource needs the following (logical) routines
–Prob = Query(cluster, nodes, walltime, starttime)
–Id = BindInit(cluster, nodes, walltime, starttime, success_prob)
–Status = Check(id)
–Status = Install(id)
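The four routines can be read as a common interface that each resource class implements. The following is a minimal sketch of that idea, not the actual implementation: the names `SlotAcquirer` and `Status`, the Python types, and the choice of seconds as the time unit are all assumptions made for illustration.

```python
from enum import Enum
from typing import Protocol, runtime_checkable


class Status(Enum):
    """Hypothetical status values; the slides only mention true/false/abort."""
    TRUE = "true"
    FALSE = "false"
    ABORT = "abort"


@runtime_checkable
class SlotAcquirer(Protocol):
    """The four logical routines each resource class provides (assumed shape)."""

    def query(self, cluster: str, nodes: int, walltime: int, starttime: int) -> float:
        """Return the probability that the slot requirement can be met."""
        ...

    def bind_init(self, cluster: str, nodes: int, walltime: int,
                  starttime: int, success_prob: float) -> str:
        """Start binding the slot and return an id for later calls."""
        ...

    def check(self, bind_id: str) -> Status:
        """Report whether the slot is bound yet."""
        ...

    def install(self, bind_id: str) -> Status:
        """Install the PBS glide-in on the bound slot."""
        ...
```

Any class that defines these four methods, whether dedicated, batch controlled, or advanced reservation, would satisfy the protocol without inheriting from it.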
Acquisition Procedure
[Diagram: the Slot Manager drives a Query phase and a Bind phase]
–Query(): is the slot available? Returns a probability
–BindInit(): initiate the bind
–Check(): bound yet? Returns true/false/abort
–Install(): install the PBS glide-in when it is time
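The procedure above can be sketched as a small driver loop. This is an illustration under stated assumptions, not the Slot Manager's code: `acquire`, `DedicatedStub`, the string status values, and the `required_prob`, `poll_seconds`, and `max_polls` parameters are all hypothetical names introduced here.

```python
import time


class DedicatedStub:
    """Stand-in resource class in which every routine trivially succeeds."""

    def query(self, cluster, nodes, walltime, starttime):
        return 1.0

    def bind_init(self, cluster, nodes, walltime, starttime, success_prob):
        return f"{cluster}-bind"

    def check(self, bind_id):
        return "true"

    def install(self, bind_id):
        return "true"


def acquire(resource, cluster, nodes, walltime, starttime,
            required_prob=0.75, poll_seconds=0.0, max_polls=1000):
    """Run Query -> BindInit -> Check (polled) -> Install.

    Returns the bind id on success, or None if the query probability is
    too low, the bind aborts, or polling gives up.
    """
    prob = resource.query(cluster, nodes, walltime, starttime)
    if prob < required_prob:
        return None  # corresponds to a 'not possible' response
    bind_id = resource.bind_init(cluster, nodes, walltime, starttime, prob)
    for _ in range(max_polls):
        status = resource.check(bind_id)
        if status == "abort":
            return None
        if status == "true":
            # Slot is bound: install the glide-in and report the outcome.
            return bind_id if resource.install(bind_id) == "true" else None
        time.sleep(poll_seconds)
    return None
```

For example, `acquire(DedicatedStub(), "dante", 4, 3600, 0)` walks straight through all four routines and returns the bind id.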
Dedicated
Query
–NOP (prob = 1)
BindInit
–NOP (always true)
Check
–NOP (always true)
Install
–Installs PBS glide-in
Advanced Reservation
Query
–Makes request to advanced reservation system
–Prob = 1 if we can make the reservation
–Prob = 0 if we cannot
BindInit
–Make adv. res. request
Check
–NOP (always return true)
Install
–Submit PBS glide-in installation job to specialized adv. res. queue
Batch Controlled
Query
–Runs an algorithm to estimate the probability of meeting the slot requirement through the regular batch queue
BindInit
–Use the values calculated by Query for the job dimensions and the time to wait before submission
Check
–When the ‘time to wait’ has elapsed, return true
Install
–Submit PBS glide-in installation job
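The batch-controlled Check routine amounts to a timer on the wait computed by Query. A minimal sketch of that bookkeeping follows; the class name, the simplified `bind_init(wait_seconds)` signature, and the injectable `clock` are assumptions made so the example is self-contained, and job submission itself is left out.

```python
import time


class BatchControlledSketch:
    """Tracks when each bound request becomes ready to submit."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so the sketch can be tested
        self._ready_at = {}      # bind id -> earliest submission time
        self._next_id = 0

    def bind_init(self, wait_seconds):
        """Record the wait computed by Query and return a bind id."""
        bind_id = self._next_id
        self._next_id += 1
        self._ready_at[bind_id] = self._clock() + wait_seconds
        return bind_id

    def check(self, bind_id):
        """Return True once the 'time to wait' has elapsed."""
        return self._clock() >= self._ready_at[bind_id]
```

Passing a fake clock makes it easy to confirm that `check` flips from false to true exactly when the wait expires.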
The Algorithm
Routines
–‘deadline’ is ‘seconds from now’
–P = bqp_pred(machine, nodes, walltime, deadline)
Algorithm
    Preq = 0.75
    past = 0
    P = bqp_pred(M, N, W+D, D)
    while ((D-past) > 0) {
        if (P ~ Preq) {
            wait = past
            real_walltime = W+(D-past)
        }
        past += 30
        P = bqp_pred(M, N, W+(D-past), (D-past))
    }
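The scan can be written as a runnable function. This is a sketch under assumptions: `predict` is a caller-supplied stand-in for `bqp_pred` (whose real interface is only known from the slide), `choose_submit_time` is a hypothetical name, and the test `P ~ Preq` is read here as "P is at least Preq", which is one plausible interpretation.

```python
def choose_submit_time(predict, machine, nodes, walltime, deadline,
                       p_req=0.75, step=30):
    """Scan candidate waits and keep the last one that meets p_req.

    predict(machine, nodes, walltime, deadline) stands in for bqp_pred.
    Returns (wait, real_walltime), or None if no candidate qualifies.
    """
    best = None
    past = 0
    while deadline - past > 0:
        remaining = deadline - past
        # Pad the walltime by the time left until the deadline, so a job
        # that starts early still covers the requested slot.
        p = predict(machine, nodes, walltime + remaining, remaining)
        if p >= p_req:  # assumed reading of 'P ~ Preq'
            best = (past, walltime + remaining)
        past += step
    return best
```

Because later candidates overwrite earlier ones, the function returns the latest acceptable submission time, which keeps the padded walltime as small as possible.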
Batch Experiment
75% is the target probability
356 total requests
257 total batch submissions
–99 requests resulted in an initial ‘not possible’ response
192 slots successfully acquired
–Expected at the 75% target: 257 × 0.75 ≈ 193
Choose the last acceptable submit time to minimize waste
[Figure: timeline with labels ‘now’, ‘0.75’, and ‘submit time’]
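The counts on this slide can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones reported above.

```python
total_requests = 356
submissions = 257
acquired = 192
target = 0.75

rejected = total_requests - submissions   # requests answered 'not possible'
expected = submissions * target           # acquisitions expected at the target
observed_rate = acquired / submissions    # fraction of submissions that succeeded

print(rejected, expected, round(observed_rate, 3))
```

The observed success rate of 192 out of 257 is about 74.7%, within one slot of the 192.75 acquisitions expected at the 75% target.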
Near Term Experiments
Try other probability levels
Try other deadlines
PBS Glide-in
A basic batch queue system assumes a one-to-one mapping of job to resource set (slot)
Idea: once a single ‘slot’ has been acquired, install a ‘personal’ resource manager and scheduler within it to support multiple jobs within that single slot
Have instrumented Torque (PBS) to fulfill this task
–Plays the role that Condor would play as infrastructure scheduler
–PBS “glide-in”
–Simpler, supports MPI, etc.
PBS Overview
[Diagram: PBS Server and PBS Sched, with a PBS Mom on each of node1, node2, node3, node4]
–qsub ‘scriptA’ transfers scriptA; scriptA gets node1, node2, and node3
–scriptA runs cmd on its nodes via ssh
PBS glide-in
[Diagram: PBS Server and PBS Sched, with a PBS Mom on each of node1, node2, node3, node4]
–qsub pglide.pbs submits the glide-in job
–pglide.pbs starts pbs_mom, pbs_server, and pbs_sched
–Jobs then arrive through qsub (scriptA, scriptB) or through GRAM with globusrun-ws (jobA, jobB)
PBS glide-in TODO
To implement this, we had to disable some of PBS's internal security features (dropping privileges, root check, privileged ports, user auth checks, host auth checks)
Streamline the installation process (good but not great)
Architecture discussion: one server per slot? One server for all slots on a single machine?
–Requires reworking the Torque software a bit
Slot Acquisition Status
BQP ‘virtual advanced reservation’ system in place
PBS glide-in working on all machines Dan has access to
Need to investigate advanced reservation interface(s)
Need to figure out how to properly submit PBS jobs using GRAM
Thanks! Questions?
Statistics TODO
More reactive change point detection
–Machine down time constitutes a change point we can detect better
–Better understanding of autocorrelation and quantiles
Non-statistical case
–One user submits 20,000 single processor jobs
Current Cluster Status
Columns: Dedicated, Batch Controlled, Advanced Res.
Dante: X ?
NCSA Mercury: X ?
SDSC Teragrid: X ?
ADA: X ?
IU TG: X ?
IU BigRed: X ?
IU Tyr: X ?