Grid Laboratory Of Wisconsin (GLOW)



Presentation on theme: "Grid Laboratory Of Wisconsin (GLOW)"— Presentation transcript:

1 Grid Laboratory Of Wisconsin (GLOW)
UW Madison’s Campus Grid
Dan Bradley, Department of Physics & CS
Representing the GLOW and Condor Teams
2006 ESCC/Internet2 Joint Techs Workshop

2 The Premise
Many researchers have computationally intensive problems. Individual workflows rise and fall over the course of weeks and months. Computers and computing people are less volatile than a researcher’s demand for them.

3 Grid Laboratory of Wisconsin
A 2003 initiative funded by NSF and UW.
Six initial GLOW sites:
- Computational Genomics (Chemistry)
- AMANDA / IceCube (Physics / Space Science)
- High Energy Physics / CMS (Physics)
- Materials by Design (Chemical Engineering)
- Radiation Therapy (Medical Physics)
- Computer Science
Diverse users with different deadlines and usage patterns.

4 UW Madison Campus Grid
Condor pools in various departments, made accessible via Condor ‘flocking’.
Users submit jobs to their own private or department Condor scheduler.
Jobs are dynamically matched to available machines.
Crosses multiple administrative domains:
- no common uid-space across campus
- no cross-campus NFS for file access
- users rely on Condor remote I/O, file staging, AFS, SRM, GridFTP, etc.
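As a rough illustration of the flocking setup described above (a sketch, not taken from the slides), the submitting schedd lists the foreign central managers it may flock to, and the pools that accept flocked jobs list who may flock in; all hostnames below are hypothetical.

    # condor_config on a department submit node (sketch; hostnames are hypothetical)
    # Jobs that cannot be matched locally flock to the GLOW and CS pools.
    FLOCK_TO = glow-cm.example.wisc.edu, condor-cm.cs.example.wisc.edu

    # condor_config for a pool that accepts flocked jobs (sketch)
    FLOCK_FROM  = hep-submit.example.wisc.edu
    ALLOW_WRITE = $(ALLOW_WRITE), hep-submit.example.wisc.edu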

5 UW Campus Grid Machines
GLOW Condor pool, distributed across the campus to provide locality with big users:
- GHz Xeon CPUs
- GHz Opteron cores
- 100 TB disk
Computer Science Condor pool:
- 1000 ~1 GHz CPUs
- testbed for new Condor releases
Other private pools:
- job submission and execution
- private storage space
- excess jobs flock to the GLOW and CS pools

6 New GLOW Members
Proposed minimum involvement:
- one rack with about 50 CPUs
- an identified system-support person who joins GLOW-tech (can be an existing member of GLOW-tech)
- the PI joins the GLOW executive committee
- adherence to current GLOW policies
- sponsorship by existing GLOW members
UW ATLAS and other physics groups were proposed by CMS and CS and accepted as new members.
Expressions of interest from other groups.

7 Housing the Machines
Condominium style:
- centralized computing center
- space, power, cooling, management
- standardized packages
Neighborhood-association style:
- each group hosts its own machines
- each contributes to the administrative effort
- base standards (e.g. Linux & Condor) make it easy to share resources
GLOW has elements of both, but leans toward the neighborhood style.

8 What About “The Grid”?
Who needs a campus grid? Why not have each cluster join “The Grid” independently?

9 The Value of Campus Scale
Simplicity: the software stack is just Linux + Condor.
Fluidity: a high common denominator makes sharing easier and provides a richer feature set.
Collective buying power: we speak to vendors with one voice.
Standardized administration: e.g. GLOW uses one centralized cfengine.
Synergy: face-to-face technical meetings and a mailing list scale well at campus level.

10 The Value of the Big G
Our users want to collaborate outside the bounds of the campus (e.g. ATLAS and CMS are international). We also don’t want to be limited to sharing resources with people who have made identical technological choices. The Open Science Grid gives us the opportunity to operate at both scales, which is ideal.

11 On the OSG Map
Any GLOW member is free to link their resources to other grids.
Facility: WISC
Site: UWMadisonCMS

12 Submitting Jobs within UW Campus Grid
Diagram: a UW HEP user runs condor_submit to a schedd (job caretaker); the job flocks among the HEP, CS, and GLOW matchmakers and lands on a startd (job executor).
Supports the full feature set of Condor:
- matchmaking
- remote system calls
- checkpointing
- MPI
- suspension
- VMs
- preemption policies
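As an illustration of the submit-side view (a sketch, not from the slides), a standard-universe submit description enables the checkpointing and remote system calls listed above; the executable and file names are hypothetical.

    # sketch of a standard-universe submit file (hypothetical names)
    # the standard universe provides checkpointing and remote system calls;
    # the executable must be relinked with condor_compile
    universe   = standard
    executable = simulate
    arguments  = run42.cfg
    output     = run42.out
    error      = run42.err
    log        = run42.log
    queue

Running condor_submit on such a file hands the job to the user’s own schedd; matchmaking and flocking then decide where it actually executes.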

13 Submitting jobs through OSG to UW Campus Grid
Diagram: an Open Science Grid user runs condor_submit to a schedd (job caretaker); the Condor gridmanager submits through the Globus gatekeeper to a schedd on campus, which flocks among the HEP, CS, and GLOW matchmakers to reach a startd (job executor).
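For comparison, a job arriving from the OSG side would typically be a Condor-G grid-universe job aimed at the campus gatekeeper. This is a sketch only; the gatekeeper hostname and file names are made up.

    # sketch of a Condor-G submission to the campus Globus gatekeeper
    # (the gatekeeper hostname is hypothetical)
    universe      = grid
    grid_resource = gt2 gatekeeper.example.wisc.edu/jobmanager-condor
    executable    = analyze
    output        = analyze.out
    log           = analyze.log
    queue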

14 Routing Jobs from UW Campus Grid to OSG
Diagram: condor_submit hands the job to a schedd (job caretaker); the HEP, CS, and GLOW matchmakers run it locally, while the grid JobRouter can hand it to the Condor gridmanager and a Globus gatekeeper for execution on the OSG.
Combining both worlds: use the simple, feature-rich local mode when possible; transform the job into a grid job when it travels globally.
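A hedged sketch of what a JobRouter route could look like; the route name, gatekeeper host, and limit are made up, and the exact configuration syntax has varied across Condor releases.

    # condor_config on the submit host running the JobRouter (sketch)
    # Idle vanilla jobs matching the route are transformed into Condor-G grid jobs.
    JOB_ROUTER_ENTRIES = \
      [ name = "UW-to-OSG"; \
        GridResource = "gt2 gatekeeper.example.edu/jobmanager-condor"; \
        MaxIdleJobs = 10; \
      ]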

15 GLOW Architecture in a Nutshell
One big Condor pool, but a backup central manager runs at each site (Condor HAD service).
Users submit jobs as members of a group (e.g. “CMS” or “MedPhysics”).
Computers at each site give highest priority to jobs from the same group (via machine RANK).
Jobs run preferentially at the “home” site, but may run anywhere when machines are available.
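A configuration sketch of the two mechanisms above, assuming hypothetical hostnames and a hypothetical job attribute named Group (a production setup would more likely key off the accounting group): high-availability central managers via the Condor HAD service, and a machine RANK that prefers the owning group’s jobs.

    # On each candidate central manager: run the HAD and replication daemons (sketch)
    DAEMON_LIST         = MASTER, COLLECTOR, NEGOTIATOR, HAD, REPLICATION
    HAD_LIST            = cm-hep.example.wisc.edu:51450, cm-cs.example.wisc.edu:51450
    REPLICATION_LIST    = cm-hep.example.wisc.edu:51451, cm-cs.example.wisc.edu:51451
    HAD_USE_REPLICATION = TRUE

    # In each job's submit file: advertise the owning group (hypothetical attribute)
    #   +Group = "CMS"

    # On machines owned by the CMS site: prefer jobs from the home group
    RANK = (TARGET.Group =?= "CMS")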

16 Accommodating Special Cases
Members have the flexibility to make arrangements with each other when needed. Example: granting 2nd-priority opportunistic access.
Long-running jobs which can’t easily be checkpointed can run as “bottom feeders” that are suspended, rather than killed, by higher-priority jobs.
Computing on Demand: tasks requiring low latency (e.g. interactive analysis) may quickly suspend any other jobs while they run.

17 Example Uses
Chemical Engineering: students do not know where the computing cycles are coming from; they just do it. Largest user group.
ATLAS: over 15 million proton-collision events simulated, at 10 minutes each.
CMS: over 70 million events simulated, reconstructed, and analyzed (roughly 10 minutes per event in total) in the past year.
IceCube / AMANDA: data filtering used 12 CPU-years in one month.
Computational Genomics: Prof. Schwartz asserts that GLOW has opened up a new paradigm of work patterns in his group; they no longer think about how long a particular computational job will take, they just do it.

18 Summary
Researchers are demanding to be well connected to both local and global computing resources. The Grid Laboratory of Wisconsin is our attempt to meet that demand. We hope you too will find a solution!

