Job Submission Via File Transfer
Run Everywhere
  - Use all available resources
  - Submit locally
  - Run globally
Foreign Languages
  - Using remote resources
    - Easy with HTCondor and friendly admins
      - Flocking
    - Harder otherwise
      - Lost jobs
      - Unknown queue times
      - Different allocation expectations
      - Different level of service
Paper Over the Differences
  - Don’t send user jobs directly
  - Make everything act like HTCondor
  - Glideins
    - Run HTCondor startd as a job/image/container
    - Send jobs to it
    - Turn remote resources into temporary members of your own HTCondor pool
  - See also Annex
Glidein Example
  [Diagram: Home and Away sites; components shown: slurm, schedd, startd, shadow*, starter*, job]
  * One shadow and one starter per running job
Private Networks
  - Machines with no public network access
    - HPC
    - Cloud
  - “Split the schedd”
2-Schedd Glidein Example
  [Diagram: Home and Away sites, with a region labeled (Private network); components shown: slurm, schedd, schedd, startd, shadow, starter, job]
Really Private Networks
  - Centers with limited remote access
    - Job submission
    - Data transfer
    - No general network communication
  - Submit via a shared file service
Talking Via a File System
  - Goal: Support any file sharing mechanism
    - NFS, Gluster, GPFS
    - Box.com, Google Drive, Dropbox
    - Gridftp, xrootd, rsync
    - Blog site comments
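Supporting any file sharing mechanism suggests a small common interface that each backend implements. The sketch below is one possible shape for it, not the talk's actual design: the names `FileStore` and `LocalDirStore` and the four operations are assumptions made for illustration. The directory-backed version would cover mounted file systems such as NFS or GPFS; a cloud service would need its own implementation of the same four calls.

```python
import os
import tempfile
from typing import List, Protocol


class FileStore(Protocol):
    """Hypothetical minimal interface for a file sharing mechanism."""

    def put(self, name: str, data: bytes) -> None: ...
    def get(self, name: str) -> bytes: ...
    def list(self, prefix: str = "") -> List[str]: ...
    def delete(self, name: str) -> None: ...


class LocalDirStore:
    """FileStore backed by a directory, e.g. a mounted shared file system."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, name: str, data: bytes) -> None:
        # Write under a temporary name, then rename, so a reader on the
        # other side does not pick up a half-written file.
        fd, tmp = tempfile.mkstemp(dir=self.root, prefix=".tmp-")
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, os.path.join(self.root, name))

    def get(self, name: str) -> bytes:
        with open(os.path.join(self.root, name), "rb") as f:
            return f.read()

    def list(self, prefix: str = "") -> List[str]:
        # Hide in-progress temporary files from callers.
        return sorted(
            n for n in os.listdir(self.root)
            if n.startswith(prefix) and not n.startswith(".tmp-")
        )

    def delete(self, name: str) -> None:
        os.remove(os.path.join(self.root, name))
```

Keeping the interface to whole-file put, get, list, and delete is a deliberate choice in this sketch: those are operations that even the least capable mechanisms on the list can plausibly offer.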
Job Management Primitives
  - Submit
    - Write job description and input sandbox
  - Status (optional)
    - Write status description
  - Completion
    - Write final status description and output sandbox
  - Cleanup or Removal
    - Delete job description and input sandbox
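The four primitives map naturally onto plain file operations. The following is a minimal sketch against a shared directory, assuming a per-job folder and JSON descriptions; the file names (`request.json`, `status.json`, `final.json`) and function names are hypothetical and are not taken from HTCondor.

```python
import json
import shutil
from pathlib import Path


def submit(share: Path, job_id: str, description: dict, input_dir: Path) -> Path:
    """Submit: write the input sandbox, then the job description."""
    job_dir = share / job_id
    shutil.copytree(input_dir, job_dir / "input")
    # The description is written last, so its presence signals that the
    # input sandbox is already complete.
    (job_dir / "request.json").write_text(json.dumps(description))
    return job_dir


def write_status(job_dir: Path, state: str) -> None:
    """Status (optional): write a status description."""
    (job_dir / "status.json").write_text(json.dumps({"state": state}))


def complete(job_dir: Path, exit_code: int, output_dir: Path) -> None:
    """Completion: write the output sandbox, then the final status."""
    shutil.copytree(output_dir, job_dir / "output")
    (job_dir / "final.json").write_text(json.dumps({"exit_code": exit_code}))


def cleanup(job_dir: Path) -> None:
    """Cleanup or removal: delete the job description and sandboxes."""
    shutil.rmtree(job_dir)
```

In this sketch the submitting side calls `submit` and `cleanup`, while the executing side calls `write_status` and `complete`. Each side only ever creates files the other side reads, which avoids needing any locking on the shared store.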
File-Based Submission Example
  [Diagram: Home and Away sites, with a region labeled (Private network); components shown: slurm, schedd, schedd, startd, shadow, starter, job]
File-Based Job Submission
  [Diagram: Box Cloud Storage, JobXXX, Schedd A, Schedd B; files shown: request, status.1, status.2, status.3, and copies of input and output]
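The numbered names status.1, status.2, status.3 suggest that each status update is written as a new file instead of overwriting the previous one. That reading is an assumption, as are the helper names below. Under it, a reader finds the current state by taking the highest-numbered file, roughly like this:

```python
import re
from pathlib import Path
from typing import List, Optional

_STATUS_RE = re.compile(r"^status\.(\d+)$")


def status_numbers(job_dir: Path) -> List[int]:
    """Return the sequence numbers of all status.N files, ascending."""
    nums = []
    for path in job_dir.iterdir():
        match = _STATUS_RE.match(path.name)
        if match:
            nums.append(int(match.group(1)))
    return sorted(nums)


def append_status(job_dir: Path, text: str) -> Path:
    """Write the next status.N file without touching earlier ones."""
    nums = status_numbers(job_dir)
    path = job_dir / f"status.{(nums[-1] if nums else 0) + 1}"
    path.write_text(text)
    return path


def latest_status(job_dir: Path) -> Optional[str]:
    """Return the newest status text, or None if nothing was written yet."""
    nums = status_numbers(job_dir)
    if not nums:
        return None
    return (job_dir / f"status.{nums[-1]}").read_text()
```

The numbers are compared as integers, so status.10 correctly sorts after status.9. Never rewriting an existing file also sidesteps stores where overwrites are slow to propagate or create conflicting copies.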
And Along Came CMS
  - Stretching the boundaries of Run Everywhere
    - 250,000 cores
    - 100 sites
  - CMS researchers at PIC wanted to run on Mare Nostrum at BSC
BSC Site Setup
  - Execute nodes
    - No public network access (in or out)
  - Login nodes
    - No outbound network connections
    - Inbound network for ssh and file transfer only
    - No long-lived or CPU-intensive programs
  - Shared filesystem
    - GPFS (IBM General Parallel File System)
Find a New Model
  - Can’t run a schedd at BSC
  - Maybe run as part of the glidein
    - CMS likes late binding
      - Jobs stay at home schedd until a machine is ready to run them
  - Let’s split the starter in two
Setting It Up
  - Run a startd at PIC (close to BSC)
    - Advertises the resources of a set of BSC nodes
    - Won’t match until BSC job starts
  - Sshfs mount from PIC to BSC’s GPFS
  - Submit starter job to BSC’s SLURM
  - When a job arrives at PIC startd
    - PIC starter writes job to GPFS
    - BSC starter reads job from GPFS and runs it
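The last step, where one starter half writes the job and the other half reads and runs it, can be illustrated with two small functions that share only a directory. This is a sketch under stated assumptions and not the HTCondor starter code: the file names (`job.json`, `exit.json`, `stdout.txt`), the JSON layout, and both function names are hypothetical, and it assumes a rename on the shared mount becomes visible to the other side as a single step.

```python
import json
import os
import subprocess
import sys
import time
from pathlib import Path
from typing import List, Optional


def home_write_job(shared: Path, argv: List[str]) -> None:
    """Home half (PIC in the talk): hand a job to the remote half."""
    tmp = shared / "job.json.tmp"
    tmp.write_text(json.dumps({"argv": argv}))
    # Publish under the final name only once the content is complete.
    os.replace(tmp, shared / "job.json")


def remote_run_job(shared: Path, timeout: float = 60.0,
                   poll: float = 0.5) -> Optional[int]:
    """Remote half (BSC in the talk): wait for a job, run it, report back.

    Returns the job's exit code, or None if no job arrived in time.
    """
    job_file = shared / "job.json"
    deadline = time.monotonic() + timeout
    while not job_file.exists():
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)

    job = json.loads(job_file.read_text())
    result = subprocess.run(job["argv"], capture_output=True, text=True)

    # Write the output first and the exit status last, so the home half
    # can treat exit.json as the "job finished" marker.
    (shared / "stdout.txt").write_text(result.stdout)
    tmp = shared / "exit.json.tmp"
    tmp.write_text(json.dumps({"exit_code": result.returncode}))
    os.replace(tmp, shared / "exit.json")
    return result.returncode
```

For example, calling `home_write_job(shared, [sys.executable, "-c", "print('hi')"])` on one side and `remote_run_job(shared)` on the other runs the command remotely and leaves its output and exit code in the shared directory. Polling matches the constraint that the remote nodes have no network path other than the file system.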
File System Example
  [Diagram: CERN, PIC, and BSC sites, with a region labeled (Private network); components shown: slurm, schedd, startd, launcher, shadow, starter, starter, job, sshfs]
How Is It Different?
  - No changes outside of starter
    - Other daemons unaware
  - Some features don’t work
    - Ssh-to-job
    - Chirp
    - Streaming output
    - Periodic checkpointing
Progress
  - Done
    - Run sleep jobs on 2 BSC nodes
    - Jobs started at CERN schedd
  - TODO
    - Use more BSC nodes
    - Larger data transfers
    - Fault tolerance
    - Run CMS application
Questions?