Alain Roy Computer Sciences Department University of Wisconsin-Madison I/O Access in Condor and Grid
2 What is Condor? › Condor is a batch job system › Goal: High throughput computing Different than high-performance › Goal: High reliability › Goal: Support distributed ownership
3 High Throughput Computing › Worry about FLOPS/year, not FLOPS/second › Use all resources effectively Dedicated clusters Non-dedicated computers (desktop)
4 Effective resource use › Requires high reliability Computers come and go, your jobs shouldn’t. Checkpointing Be prepared for everything breaking › Requires distributed ownership › Requires distributed access Must deal with lack of shared filesystem
5 Jobs in Condor › Standard Universe Checkpointing & Migration Remote I/O Available to many (not all) jobs › Vanilla Universe Any job you want No checkpointing, No remote I/O › Other Universes MPI PVM Java
6 Machines in Condor › Distributed ownership Your desktop can be in a Condor pool You choose how it is used You choose when it is used › Dedicated computers or non-dedicated
7 Notable Users › UW-Madison Condor pool 900+ CPUs, millions of CPU hours/year › INFN 150 CPUs? › Oracle CPUs, worldwide › Hundreds of pools worldwide
8 Working with files in Condor › Today Shared file systems Transferring files Remote I/O › Tomorrow Pluggable File System NeST Stork
9 Review: Submitting a job
› Write a submit file:
    Executable = dowork
    Input = dowork.in
    Output = dowork.out
    Arguments = 1 alpha beta
    Universe = vanilla
    Log = dowork.log
    Queue
› Files: on shared filesystem
› Give it to Condor: condor_submit
› Watch it run: condor_q
10 What happens when it runs? › Condor requires a shared filesystem for this job. Why? You have a vanilla job You did not tell it to transfer any files › Therefore, Condor adds a requirement: (TARGET.FileSystemDomain == MY.FileSystemDomain) › What does this mean?
11 What happens when it runs?
Diagram: Matchmaker, the submit computer with the job, and two machines.
Job (on the submit computer):
    Executable = …
    Input = …
    Req = (TARGET.FileSystemDomain == MY.FileSystemDomain)
    FileSystemDomain = "bo.infn.it"
    ….
Machine (Computer 1, condor.pd.infn.it):
    Name = "condor.pd.infn.it"
    Req = (TARGET.ImageSize < 1000)
    FileSystemDomain = "pd.infn.it"
    ….
Machine (Computer 2, lnxi.bo.infn.it):
    Name = "lnxi.bo.infn.it"
    Req = (TARGET.ImageSize < 1000)
    FileSystemDomain = "bo.infn.it"
    ….
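The match on this slide can be sketched in a few lines of Python. This is a simplified stand-in for ClassAd matchmaking, not Condor's implementation: the function names are hypothetical and the job's ImageSize of 500 is an assumed value, since the slide does not give one.

```python
# Toy two-sided match: both the job's and the machine's requirements must hold.

def job_accepts(job, machine):
    """Job side: TARGET.FileSystemDomain == MY.FileSystemDomain."""
    return machine["FileSystemDomain"] == job["FileSystemDomain"]


def machine_accepts(machine, job):
    """Machine side: TARGET.ImageSize < 1000."""
    return job["ImageSize"] < 1000


def match(job, machines):
    """Return the names of machines where both requirements are satisfied."""
    return [m["Name"] for m in machines
            if job_accepts(job, m) and machine_accepts(m, job)]


# ImageSize is an assumed value; the domains and names come from the slide.
job = {"FileSystemDomain": "bo.infn.it", "ImageSize": 500}
machines = [
    {"Name": "condor.pd.infn.it", "FileSystemDomain": "pd.infn.it"},
    {"Name": "lnxi.bo.infn.it", "FileSystemDomain": "bo.infn.it"},
]

print(match(job, machines))
```

Only the Bologna machine shares the job's FileSystemDomain, so only it can match.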
12 No shared filesystem
› Tell Condor to transfer files:
    Executable = dowork
    Input = dowork.in
    Output = dowork.out
    Arguments = 1 alpha beta
    Universe = vanilla
    Log = dowork.log
    Transfer_Files = ONEXIT
    Transfer_Input_Files = dataset
    Queue
› The files are automatically transferred
› Job can run in Padova or Bologna: files are always transferred
13 Shared Filesystem?
› Even better: (Condor and later)
    …
    Input = dowork.in
    Output = dowork.out
    Universe = vanilla
    Should_Transfer_Files = IF_NEEDED
    Transfer_Input_Files = dataset
    Rank = (MY.FileSystemDomain == TARGET.FileSystemDomain)
› Job can run in Padova or Bologna
› Files are transferred for Padova, not for Bologna
› We prefer avoiding the transfer
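A rough sketch of the idea, assuming the job is submitted from bo.infn.it: the Rank expression prefers machines in the same FileSystemDomain, and files only need to be moved when the chosen machine is in a different domain. The helper names are hypothetical and this is not how Condor evaluates ClassAds.

```python
def rank(job_domain, machine_domain):
    """Mirror Rank = (MY.FileSystemDomain == TARGET.FileSystemDomain): 1 if equal, else 0."""
    return 1 if job_domain == machine_domain else 0


def plan(job_domain, machine_domains):
    """Pick the highest-ranked machine domain and say whether files must be transferred."""
    best = max(machine_domains, key=lambda d: rank(job_domain, d))
    needs_transfer = best != job_domain
    return best, needs_transfer


# Both sites available: the shared filesystem wins and no transfer is needed.
print(plan("bo.infn.it", ["pd.infn.it", "bo.infn.it"]))
# Only Padova available: the job still runs, but files are transferred.
print(plan("bo.infn.it", ["pd.infn.it"]))
```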
14 Standard Universe › Standard universe provides: Checkpointing Remote I/O › Requires re-link of your program No recompilation › Doesn’t work for all programs No threads No dynamic libraries on Linux Limited networking
15 Remote I/O › All system calls are redirected to submission computer No files are transferred It looks “just like home” › Job runs as “nobody” on remote computer › Files should be read or write only, not both
16 Is remote I/O efficient? › If you read a file only once, yes › If you read less than a whole file, yes › If you read a file many times, it may be less efficient › We find it is not a problem for most jobs
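The three cases above can be checked with simple arithmetic. This is a back-of-the-envelope model, assuming remote I/O does no caching on the execute computer and that a transferred file is copied exactly once; the function names and the 100 MB file size are illustrative.

```python
MB = 10**6


def bytes_moved_by_transfer(file_size):
    """File transfer: the whole file crosses the network once, then is read locally."""
    return file_size


def bytes_moved_by_remote_io(bytes_read_per_pass, passes):
    """Remote I/O: every read crosses the network (no caching assumed)."""
    return bytes_read_per_pass * passes


size = 100 * MB
print(bytes_moved_by_remote_io(size, 1), bytes_moved_by_transfer(size))       # read once
print(bytes_moved_by_remote_io(10 * MB, 1), bytes_moved_by_transfer(size))    # read part of the file
print(bytes_moved_by_remote_io(size, 5), bytes_moved_by_transfer(size))       # read many times
```

Under these assumptions remote I/O ties or wins in the first two cases and loses only when the same data is read repeatedly.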
17 How do you use it?
› Compile your program:
    gcc -c somejob.c -o somejob.o
› Link your program:
    condor_compile gcc -o somejob somejob.o
› Use the standard universe:
    Executable = somejob
    Universe = standard
    …
18 What happens when it runs? › Condor does not require a shared filesystem You used the standard universe Files will not be transferred › Condor does not modify the requirements
19 What happens when it runs? Diagram: Execute Computer (Job, Remote I/O library); Submit Computer (Shadow, Remote I/O handler, Disk)
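A toy, in-process sketch of the flow in this diagram: the job's reads go through a library on the execute side, which forwards them to the shadow on the submit side, where the file actually lives. The class and method names are hypothetical, a dictionary stands in for the disk, and a real system would send these requests over the network.

```python
class Shadow:
    """Stands in for the process on the submit computer that serves I/O requests."""

    def __init__(self, disk):
        self.disk = disk  # maps path -> bytes; a toy replacement for the real disk

    def handle_read(self, path, offset, length):
        return self.disk[path][offset:offset + length]


class RemoteIOLibrary:
    """Stands in for the library linked into the job on the execute computer."""

    def __init__(self, shadow):
        self.shadow = shadow
        self.offsets = {}  # current read position per file

    def read(self, path, length):
        offset = self.offsets.get(path, 0)
        data = self.shadow.handle_read(path, offset, length)  # forwarded, not local
        self.offsets[path] = offset + len(data)
        return data


shadow = Shadow({"dowork.in": b"alpha beta"})
lib = RemoteIOLibrary(shadow)
print(lib.read("dowork.in", 5))
print(lib.read("dowork.in", 5))
```

No file is ever copied to the execute side; only the bytes that are actually read are moved.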
20 Summary: Condor Today › Vanilla jobs work with: Shared file system Transferring files › Standard universe: Remote I/O Checkpointing
21 Pluggable File System (PFS) › Bypass: an interposition agent Shared library squeezes into program Intercepts calls to specific functions › PFS Intercepts file access Use FTP/GSI-FTP/HTTP/NeST › Like Condor remote I/O except: No relinking Usable outside of Condor
22 Bypass (Work by Doug Thain)
› A tool for interposition agents
› Uses dynamic library preload:
    setenv LD_PRELOAD /usr/lib/pfs.so
› Just write replacement code:
    ssize_t read(int fd…)
    agent_action
    {{
        // code to do something
        return read(…); // call real read
    }}
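Bypass works on C library calls through LD_PRELOAD. The following is only a Python analogy of the same pattern (do something, then call the real function), replacing the built-in open for the duration of a block. The name interpose_open is hypothetical.

```python
import builtins
import contextlib


@contextlib.contextmanager
def interpose_open(log):
    """Temporarily replace builtins.open with an agent that logs, then calls the real open."""
    real_open = builtins.open

    def agent_open(file, *args, **kwargs):
        log.append(str(file))                    # agent action: record the access
        return real_open(file, *args, **kwargs)  # call the real open

    builtins.open = agent_open
    try:
        yield
    finally:
        builtins.open = real_open                # always restore the original
```

Code that calls open inside the with block needs no changes, which is the point of an interposition agent.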
23 PFS (Doug Thain) › PFS uses Bypass: intercepts all file accesses › When you access a “URL”, it implements it. /http/ › Just use pfsrun: pfsrun vi /http/ › Warning: it mostly works
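One way to picture the path-to-URL idea is a small mapping function. This is an illustration only: the mapping table, the function name, and the example.org host are assumptions, and PFS's real naming rules may differ.

```python
# Assumed mapping from the first path component to a URL scheme.
SCHEMES = {"http": "http", "anonftp": "ftp"}


def path_to_url(path):
    """Turn '/<scheme>/<host>/<rest>' into a URL, or return None for ordinary paths."""
    parts = path.split("/", 3)  # ['', scheme, host, rest]
    if len(parts) < 3 or parts[0] != "" or parts[1] not in SCHEMES or not parts[2]:
        return None
    rest = parts[3] if len(parts) > 3 else ""
    return f"{SCHEMES[parts[1]]}://{parts[2]}/{rest}"


print(path_to_url("/anonftp/ftp.cs.wisc.edu/condor"))
print(path_to_url("/http/example.org/a/b"))
print(path_to_url("/home/user/file"))
```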
24 Live Demo › pfsrun vi /http/ › pfsrun tcsh -f › cd /anonftp/ftp.cs.wisc.edu/condor; › pwd › more R[Tab] › cd; › grep -i infom /http/
25 PFS Status › You can download it today › It works well, but not all the time › Can give remote I/O in the vanilla universe › An alternative is being explored: ptrace
26 NeST › Network storage for the grid › A file server on steroids › A “work in progress” › Work by John Bent and Joseph Stanley
27 NeST as a file server › Any user, not just root—you do not need to be a system administrator › Multiple protocols: access however you like: GridFTP, FTP, HTTP, NFS, and more
28 Lots › NeST supports storage allocations, or lots › You request “500MB for 10 hours” › You can rely on your storage being there (Well, as much as you can rely on anything in a grid)
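A minimal sketch of what a lot could look like: a reservation with a size and an expiry time, granted only if it fits in the remaining capacity. The LotManager class and its fields are hypothetical and do not reflect NeST's actual interface.

```python
class LotManager:
    """Toy storage-allocation bookkeeping: each lot is (size in MB, expiry time in seconds)."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.lots = []

    def request(self, size_mb, hours, now):
        """Grant a lot if it fits alongside the lots that have not yet expired."""
        self.lots = [(s, e) for s, e in self.lots if e > now]  # drop expired lots
        used = sum(s for s, _ in self.lots)
        if used + size_mb > self.capacity_mb:
            return False
        self.lots.append((size_mb, now + hours * 3600))
        return True


nest = LotManager(capacity_mb=1000)
print(nest.request(500, 10, now=0))   # "500MB for 10 hours"
print(nest.request(600, 1, now=0))    # would exceed capacity
```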
29 NeST as a Cache › Run one NeST as a “master” Your home file server Make it reliable: well-maintained machine, UPS… › Run other NeSTs as caches Point to master NeST Cache data locally May be unreliable
30 Using a NeST cache › Use the NeST protocol to talk to the cache › If the cache disappears, you will talk to the master NeST › Is it inconvenient to use the NeST protocol?
31 PFS + NeST › PFS can speak to NeST › Your applications can speak to a NeST cache with no modification › You can work in a wide-area or grid environment with no modification › You get local data access for free Recall question: is remote I/O a good idea?
32 Scenario 1 › Submit a script as the job. The script: runs NeST, then runs your job with PFS and file arguments pointing at NeST. Diagram: Submit Machine, Execute Machine, Job, NeST Cache, PFS, NeST Master
33 Scenario 1 › When your job reads data, PFS redirects requests to NeST cache › If the data is not present it is requested from the NeST master › If the NeST cache fails, the NeST master is used
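The read path described here (cache first, master on a miss, master again if the cache fails) can be sketched as follows. The classes are hypothetical in-memory stand-ins, not the NeST protocol.

```python
class Master:
    """Stands in for the reliable master NeST."""

    def __init__(self, files):
        self.files = files

    def get(self, name):
        return self.files[name]


class Cache:
    """Stands in for a NeST cache: fetches from the master on a miss and keeps a local copy."""

    def __init__(self, master):
        self.master = master
        self.local = {}
        self.alive = True

    def get(self, name):
        if not self.alive:
            raise ConnectionError("cache unavailable")
        if name not in self.local:
            self.local[name] = self.master.get(name)  # miss: ask the master
        return self.local[name]


def read(name, cache, master):
    """Prefer the cache; fall back to the master if the cache has failed."""
    try:
        return cache.get(name)
    except ConnectionError:
        return master.get(name)


master = Master({"dataset": b"payload"})
cache = Cache(master)
print(read("dataset", cache, master))  # served through the cache
cache.alive = False
print(read("dataset", cache, master))  # cache is gone, master answers
```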
34 Scenario 2 › Submit one job that is the NeST cache › Submit many jobs that access this NeST cache Diagram: Submit Machine with NeST Master; Execute Machine 1 with NeST Cache; three Jobs, each with PFS
35 Condor or a Grid? › These scenarios work in Condor & on a grid › Mostly useful across a wide-area, not a local Condor pool › Clever way to use it on a grid: Condor-G Glide-in
36 What is Condor-G? › Another Condor universe: the Globus universe › When you submit jobs, they are not run by Condor, but are given to Globus › Condor-G provides job persistence and reliability for Globus jobs
37 Condor-G Diagram: Submit Machine with Job (Exec=…, Args=…, Universe = Globus, …); Globus Gatekeeper; Cluster
38 Glide-in › Problem: You want the whole world to be your Condor pool › Solution: Create an overlay Condor pool Run Condor daemons as a job on another pool You have a larger Condor pool
39 Glide-in Diagram: Submit Machine; Globus Gatekeeper and Remote Cluster; Local Cluster. Job: Exec = condor, Universe = Globus. Job: Exec = my job, Universe = vanilla
40 NeST & Glide-in › Submit a glide-in job that runs a NeST cache and the Condor daemons › Your remote jobs access the NeST cache › Your local jobs access the NeST master › Everything looks like Condor › Good performance everywhere
41 NeST Status › In active development › You can download it today: › Cache feature is experimental › Paper: Pipeline and Batch Sharing in Grid Workloads
42 Stork › Background: the job problem › globus-job-run is unreliable: no persistent job queue, no retries › Condor-G is reliable: persistent job queue, retry after failures › Submit it and forget it!
43 Stork: the file problem › Background: the file transfer problem › globus-url-copy (or wget, or…) is unreliable: no persistent queue, no retries › Stork is reliable: persistent queue of file transfers to be done, retries on failure
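The two properties named here, a persistent queue and retries, can be illustrated with a few functions. This sketch assumes a JSON file as the queue and a caller-supplied transfer function; none of these names come from Stork itself.

```python
import json
import os


def load_queue(path):
    """Read the queue from disk; an absent file means an empty queue."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)


def save_queue(path, queue):
    with open(path, "w") as f:
        json.dump(queue, f)


def submit(path, src_url, dest_url):
    """Append a transfer request; it survives a restart because it is written to disk."""
    queue = load_queue(path)
    queue.append({"src_url": src_url, "dest_url": dest_url, "attempts": 0})
    save_queue(path, queue)


def run_once(path, transfer, max_attempts=3):
    """Try each queued transfer once; keep failures that still have attempts left."""
    remaining = []
    for item in load_queue(path):
        item["attempts"] += 1
        try:
            transfer(item["src_url"], item["dest_url"])
        except OSError:
            if item["attempts"] < max_attempts:
                remaining.append(item)  # retry on a later pass
    save_queue(path, remaining)
    return len(remaining)
```

Calling run_once repeatedly until it returns 0 gives "submit and forget" behaviour for transient failures, up to the assumed limit of three attempts.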
44 Why bother? › You could do Stork with Condor, but… › Stork understands file transfers: local files, FTP, NeST, GSI-FTP, SRB, SRM › Stork understands space reservations › Stork recovers from failures › Just “submit & forget”
45 A Stork “job”
    [
      dap_type = "transfer";
      src_url = "srb://ghidorac.sdsc.edu/test1.txt";
      dest_url = "nest://db18.cs.wisc.edu/test8.txt";
    ]
› stork_submit: queue transfer
› stork_status: show progress
46 One job isn’t enough › Reserve space › Transfer files › Run job › Release space › How do we combine these?
47 Condor DAGMan › DAG = Directed Acyclic Graph › A DAG is the data structure used by DAGMan to represent these dependencies. › Each job is a “node” in the DAG. › Each node can have any number of “parent” or “children” nodes, as long as there are no loops! Diagram: Job A, Job B, Job C, Job D
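A short sketch of the parent/child idea using Python's standard graphlib module (Python 3.9 or later). The diamond shape below (A before B and C, both before D) is an assumption about the slide's figure, and this is not DAGMan's input format.

```python
from graphlib import CycleError, TopologicalSorter

# Each job maps to the list of its parents (jobs that must finish first).
# The diamond shape is assumed for illustration.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}


def run_order(parents):
    """Return one valid execution order; raises CycleError if the graph has a loop."""
    return list(TopologicalSorter(parents).static_order())


def is_valid_dag(parents):
    """True when there are no loops, so every job can eventually run."""
    try:
        run_order(parents)
        return True
    except CycleError:
        return False


print(run_order(parents))
print(is_valid_dag({"A": ["B"], "B": ["A"]}))
```

B and C have no dependency on each other, so either may come first; a loop between two jobs is rejected.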
48 DAGMan + Stork › A DAG can combine Condor jobs with Stork jobs › Useful in a grid › Can be used with a NeST or without Diagram: Reserve, Transfer, Run, Transfer, Release
49 Summary › Condor Today: Files on shared filesystems Transfer files Remote I/O › Condor & grids tomorrow: Pluggable file system NeST Stork Some combination of the above