GLOW: A Campus Grid within OSG
University of Wisconsin, Madison
Dan Bradley
Representing the UW Madison CMS, GLOW, and Condor Teams
Overview: GLOW & OSG
What is the value of a local, campus-level grid?
Why are we a part of OSG?
How do we make them work together?
Why have a campus or enterprise grid?
Very high utilization: more diverse users mean fewer wasted cycles.
Simplicity: all we need is Condor at the campus level. Plus, we get the full feature set rather than the lowest common denominator.
Collective buying power: we speak to vendors with one voice.
Consolidated administration: fewer chores for scientists, fewer holes for hackers.
Synergy: face-to-face technical meetings between members; a mailing list scales well at the campus level.
Why is GLOW part of OSG?
We can always use more resources, but we also want to share when we have a surplus.
Our users want to collaborate outside the bounds of the campus (e.g. Atlas and CMS). Others may join that trend.
OSG does not infringe on our local control: the OSG grid interface strives to remain independent of the technology used within the campus grid, so it does not limit our choices there.
What is the UW Campus Grid?
Condor pools at various departments, made accessible via Condor ‘flocking’.
Users submit jobs to their own private or department Condor scheduler.
Jobs are dynamically matched to available machines.
There is no cross-campus NFS for file access; people use Condor remote I/O, sandboxes, AFS, dCache, etc.
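Flocking, in rough outline: a schedd first looks for a match in its own pool and, only when that pool cannot run its idle jobs, tries the other pools it is configured to flock to (in Condor, the ordered list of central managers in the schedd's FLOCK_TO setting). The Python below is a minimal sketch of that ordering only, assuming a hypothetical Pool record that reduces each pool to a count of idle slots; it ignores priorities, ClassAd matching, and security.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pool:
    """Hypothetical stand-in for a Condor pool: just a name and its idle slots."""
    name: str
    idle_slots: int


def place_job(local: Pool, flock_to: List[Pool]) -> Optional[str]:
    """Return the name of the pool that takes the job, or None if none can.

    The local pool is always tried first; flocked pools are tried in list
    order, loosely mirroring the ordered FLOCK_TO list a real schedd consults.
    """
    for pool in [local, *flock_to]:
        if pool.idle_slots > 0:
            pool.idle_slots -= 1  # claim one slot in the chosen pool
            return pool.name
    return None  # no free slot anywhere: the job stays idle in the queue
```

With a department pool holding one free slot, GLOW holding two, and CS holding none, four successive jobs land in the department pool, then GLOW, then GLOW, and the fourth stays idle.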
How big is the UW campus grid?
The GLOW Condor pool is distributed across the campus at the sites of the machine owners:
1800 cores
100 TB disk
Over 25 million CPU-hours served
The machine owner always has highest priority (via the machine's rank expression).
Computer Science Condor pool: 1000 cores
Other private pools serve as submission and execution points for some users; their excess jobs flock to the GLOW and CS pools.
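The owner-priority rule can be pictured as follows. In this minimal sketch, a hypothetical owner_group attribute stands in for whatever job attribute a real rank expression would test; the machine scores its owner's jobs above everyone else's and lets a newly arrived job displace a running one only when it scores strictly higher. Real Condor evaluates the rank expression as a ClassAd and layers further preemption policy on top; none of that is modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    owner_group: str  # hypothetical attribute naming the submitting group


def machine_rank(job: Job, machine_owner: str) -> int:
    """Plays the role of the machine's rank expression: owner's jobs score 1, others 0."""
    return 1 if job.owner_group == machine_owner else 0


def accepts(running: Optional[Job], candidate: Job, machine_owner: str) -> bool:
    """Decide whether the candidate job may start on this machine now."""
    if running is None:
        return True  # idle machine: any job may start (assuming it matches)
    # Busy machine: the candidate displaces the running job only if ranked strictly higher
    return machine_rank(candidate, machine_owner) > machine_rank(running, machine_owner)
```

Requiring a strictly higher rank means two guest jobs never displace each other, while the owner's job always reclaims its own machine from a guest.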
Who uses the UW Campus Grid?
Computational Genomics, Chemistry
High Energy Physics (CMS, Atlas)
Materials by Design, Chemical Engineering
Radiation Therapy, Medical Physics
Computer Science
AMANDA, IceCube, Physics/Space Science
Plasma Physics
OSG VOs: nanohub, DZero, CDF, Zeus, …
Diverse users with different conference deadlines and usage patterns.
Submitting Jobs within the UW Campus Grid
[Diagram: condor_submit hands the job to the user's schedd (job caretaker), which sends its job ClassAd to the HEP, CS, and GLOW matchmakers via flocking; a startd (job executor) advertises a machine ClassAd; once matched, the job runs.]
Supports the full feature set of Condor: matchmaking, remote system calls, checkpointing, MPI universe, suspension, VMs, preemption policies.
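Matchmaking is two-sided: a job ClassAd and a machine ClassAd match only when each side's Requirements accept the other, and among the matches the job's Rank picks the preferred machine. The sketch below models ads as Python dicts whose Requirements and Rank entries are hypothetical callables standing in for ClassAd expressions; the attribute values are made up for illustration, and real ClassAd evaluation (undefined values, three-valued logic, machine-side Rank) is not modeled.

```python
from typing import Any, Dict, List, Optional

Ad = Dict[str, Any]  # a ClassAd modeled as a plain dict (hypothetical simplification)


def matches(job: Ad, machine: Ad) -> bool:
    """Two-sided match: each ad's Requirements must accept the other ad."""
    return bool(job["Requirements"](job, machine)) and bool(machine["Requirements"](machine, job))


def best_match(job: Ad, machines: List[Ad]) -> Optional[Ad]:
    """Among matching machines, pick the one the job's Rank scores highest."""
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: job["Rank"](job, m))


job = {
    "Owner": "alice",
    "ImageSize": 500,  # made-up value
    "Requirements": lambda me, other: other["Memory"] >= me["ImageSize"],
    "Rank": lambda me, other: other["Memory"],  # prefer machines with more memory
}
machines = [
    {"Name": "slot1@a", "Memory": 256, "Requirements": lambda me, other: True},
    {"Name": "slot1@b", "Memory": 1024, "Requirements": lambda me, other: True},
    {"Name": "slot1@c", "Memory": 2048,
     "Requirements": lambda me, other: other["Owner"] == "bob"},  # only runs bob's jobs
]
print(best_match(job, machines)["Name"])  # slot1@b
```

Here slot1@a fails the job's Requirements and slot1@c refuses the job, so slot1@b wins even though the job would have ranked slot1@c higher.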
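Submitting Jobs through OSG to the UW Campus Grid
[Diagram: a remote user's condor_submit hands the job to their schedd (job caretaker), whose condor gridmanager forwards it to the campus Globus gatekeeper; the gatekeeper passes it to a campus schedd (job caretaker), which sends the job ClassAd to the HEP, CS, and GLOW matchmakers via flocking; a startd (job executor) advertises a machine ClassAd, and the job runs.]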
Routing Jobs from the UW Campus Grid to OSG
[Diagram: condor_submit hands the job to a campus schedd (job caretaker), which works with the HEP, CS, and GLOW matchmakers; the Grid JobRouter transforms the job into a grid job, which the condor gridmanager sends to a Globus gatekeeper.]
Best of both worlds: a simple, feature-rich local mode that can be transformed into a standard OSG job for traveling globally.
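The JobRouter step, "transform to a grid job", can be sketched as copying a vanilla-universe job ad and rewriting the few attributes that turn it into a grid job. This is a minimal illustration, not the JobRouter's actual implementation: the route dict, its Requirements callable, the WantOSG flag, and the gatekeeper name are all hypothetical. The JobUniverse codes 5 (vanilla) and 9 (grid) and the GridResource attribute follow Condor's job-ad conventions.

```python
import copy
from typing import Any, Dict, Optional

VANILLA, GRID = 5, 9  # Condor JobUniverse codes for vanilla and grid jobs


def route_job(job: Dict[str, Any], route: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return a grid-universe copy of a vanilla job, or None if the route does not apply."""
    if job.get("JobUniverse") != VANILLA or not route["Requirements"](job):
        return None
    routed = copy.deepcopy(job)  # the original local job ad is left untouched
    routed["JobUniverse"] = GRID
    routed["GridResource"] = route["GridResource"]  # where the gridmanager should send it
    return routed


# Hypothetical route: jobs that opt in via a made-up WantOSG flag go to a made-up gatekeeper
osg_route = {
    "Requirements": lambda job: job.get("WantOSG", False),
    "GridResource": "gt2 gatekeeper.example.edu/jobmanager-condor",
}
local_job = {"JobUniverse": VANILLA, "Cmd": "simulate", "WantOSG": True}
grid_job = route_job(local_job, osg_route)
```

Keeping the original ad intact reflects the slide's point: the user writes one simple local job, and the grid form is derived from it only when the job travels.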
Conclusions: GLOW & OSG
Our UW Condor grid is not a collection of OSG mini-sites, nor do we intend it to be.
However, interoperability with OSG has become an increasingly important activity for us, because it brings real benefits to our users.
This model is emerging on many campuses today.
We believe tomorrow’s researchers will expect, and demand, to be well connected to both local and global computing resources.