Open Science Grid: More compute power
Alan De Smet
chtc.cs.wisc.edu
CHTC Cores In Use: 1,500 (CPU days each day, averaged over one month)
OSG Cores In Use: 60,000 (CPU days each day, averaged over one month)
Open Science Grid
CHTC and OSG usage (CPU days each day)
Challenges Solved
We worry about all of this. You don’t have to.
› Authentication: X.509 certificates, certificate authorities, VOMS
› Interface: Globus, GridFTP, Grid universe
› Validation: Linux distribution, glibc version, basic libraries
Using OSG
› Before
    universe = vanilla
    executable = myjob
    log = myjob.log
    queue
Using OSG
› After
    universe = vanilla
    executable = myjob
    log = myjob.log
    +WantGlidein = true
    queue
Challenge: Opportunistic
› OSG computers go away without notice
› Solutions
    Condor restarts automatically
    Sub-hour jobs
    Self-checkpointing
    Automated checkpointing
        Condor’s standard universe
        DMTCP
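To make "self-checkpointing" concrete, here is a minimal sketch of a job that saves its own progress and resumes from it after an eviction. It is an illustration only, not CHTC's recommended code: the file name `checkpoint.json`, the functions `load_checkpoint`, `save_checkpoint`, and `run`, and the toy workload (summing step numbers) are all hypothetical.

```python
import json
import os
import tempfile

CHECKPOINT = "checkpoint.json"  # hypothetical checkpoint file name


def load_checkpoint(path=CHECKPOINT):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_checkpoint(state, path=CHECKPOINT):
    """Write to a temp file, then rename, so a kill mid-write leaves the old checkpoint intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(total_steps, path=CHECKPOINT, every=10):
    """Do the work step by step, resuming from the last checkpoint if one exists."""
    state = load_checkpoint(path)
    for step in range(state["next_step"], total_steps):
        state["total"] += step  # stand-in for one unit of real work
        state["next_step"] = step + 1
        if state["next_step"] % every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state["total"]
```

If the job is killed and restarted, `run` picks up at the last saved step instead of step 0. For this to help on a different machine, the checkpoint file has to travel with the job; the submit command `when_to_transfer_output = ON_EXIT_OR_EVICT` is one option worth checking in the Condor manual.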
Challenge: Local Software
› Bare-bones Linux systems
› Solution
    Bring everything with you
        CHTC-provided MATLAB and R packages
        RunDagEnv/mkdag
Challenge: Erratic Failures
› Complex systems fail sometimes
› Solution
    Expect failures and automatically retry
        DAGMan for retries
        DAGMan POST scripts to detect problems
        RunDagEnv/mkdag
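As a rough illustration of a POST script that detects problems, the sketch below fails the node when the job exited non-zero or left a missing or empty output file, so that a DAGMan RETRY line can rerun it. The script name `check_output.py`, the node name `A`, and the file names `myjob.sub` and `myjob.out` are assumptions made for the example. What counts as "bad output" depends on the job; RunDagEnv/mkdag may well do this differently.

```python
#!/usr/bin/env python3
"""Hypothetical DAGMan POST script: fail the node if the output looks wrong.

Assumed DAG file lines (names are illustrative):
    JOB A myjob.sub
    SCRIPT POST A check_output.py $RETURN myjob.out
    RETRY A 3
"""
import os
import sys


def output_ok(path, min_bytes=1):
    """True if the output file exists and holds at least min_bytes bytes."""
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes


def main(argv):
    """Return 0 if the node should count as successful, non-zero otherwise."""
    if len(argv) != 3:
        return 2  # usage error: expected <job exit code> <output file>
    job_exit, out_path = argv[1], argv[2]
    if job_exit != "0":
        return 1  # the job itself reported failure
    return 0 if output_ok(out_path) else 1


# Only act when arguments were given, so importing the file has no side effects.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

A non-zero exit from the POST script marks the node as failed, and `RETRY A 3` then lets DAGMan rerun it up to three more times.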
Challenge: Bandwidth
› Solutions
    Only send what you need
    Store large, shared files in our web cache
    Read small amounts of data on the fly
        Condor’s standard universe
        Parrot