Grid Laboratory Of Wisconsin (GLOW)
Sridhara Dasu, Dan Bradley, Steve Rader (Department of Physics)
Miron Livny, Sean Murphy, Erik Paulson (Department of Computer Science)

Grid Laboratory of Wisconsin
Six GLOW sites:
- Computational Genomics, Chemistry
- AMANDA, IceCube, Physics/Space Science
- High Energy Physics/CMS, Physics
- Materials by Design, Chemical Engineering
- Radiation Therapy, Medical Physics
- Computer Science
2003 initiative funded by NSF/UW.
GLOW phases 1 and 2, plus non-GLOW-funded nodes, already provide ~1000 Xeons and ~100 TB of disk.

Condor/GLOW Ideas
- Exploit commodity hardware for high-throughput computing
  - The base hardware is the same at all sites
  - Local configuration optimization as needed, e.g., number of CPU elements vs. storage elements
  - Must meet global requirements
  - It turns out that our initial assessment calls for an almost identical configuration at all sites
- Managed locally at 6 sites on campus
  - One Condor pool shared globally across all sites; high-availability (HA) capabilities deal with network outages and central manager (CM) failures (a configuration sketch follows below)
  - Higher priority for local jobs
- Neighborhood-association style
  - Cooperative planning, operations, ...
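A highly available central manager can be set up with Condor's HAD daemon. The following is only a minimal sketch with placeholder host names and port, not the actual GLOW configuration:

    # Minimal HA central-manager sketch (hypothetical host names/port).
    # Two machines can serve as collector/negotiator; HAD keeps exactly one
    # negotiator active and fails over if it becomes unreachable.
    CENTRAL_MANAGER1 = cm1.glow.example.edu
    CENTRAL_MANAGER2 = cm2.glow.example.edu
    COLLECTOR_HOST   = $(CENTRAL_MANAGER1), $(CENTRAL_MANAGER2)

    HAD_PORT        = 51450
    HAD_ARGS        = -f -p $(HAD_PORT)
    HAD_LIST        = $(CENTRAL_MANAGER1):$(HAD_PORT), $(CENTRAL_MANAGER2):$(HAD_PORT)
    HAD_USE_PRIMARY = True

    # On the two central-manager machines, run HAD alongside the usual daemons:
    DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, HAD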

[Diagram slide: Local / GLOW / CS pools]

GLOW Deployment
GLOW Phase-I and Phase-II are commissioned.
- CPU
  - 66 nodes: ChemE, CS, LMCG, MedPhys, Physics
  - 30 IceCube
  - ~100 extra CS (50 ATLAS + 50 CS)
  - 26 extra Physics
  - Total CPU: ~1000
- Storage
  - Head node at all sites
  - 45 TB at CS and Physics
  - Total storage: ~100 TB
GLOW resources are used at the 100% level.
- The key is to have multiple user groups.
GLOW continues to grow.

GLOW Usage
GLOW nodes are always running hot!
- CS + guests: serving guests - many cycles delivered to guests!
- ChemE: largest community
- HEP/CMS: production for the collaboration; production and analysis by local physicists
- LMCG: Standard Universe jobs (see the sketch after this list)
- Medical Physics: MPI jobs
- IceCube: simulations
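As an illustration of how a Standard Universe workload is prepared, here is a minimal, hypothetical submit description (the application and file names are placeholders). The executable is first relinked with condor_compile so it can checkpoint and migrate between GLOW nodes:

    # Relink the application against the Condor checkpointing library first:
    #   condor_compile gcc -o align align.c
    # Hypothetical Standard Universe submit description:
    universe   = standard
    executable = align
    arguments  = genome.fa reads.fq
    output     = align.$(Process).out
    error      = align.$(Process).err
    log        = align.log
    queue 100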

GLOW Usage 04/04-09/05
- Over 7.6 million CPU-hours (865 CPU-years) served!
- Takes advantage of "shadow" jobs
- Takes advantage of checkpointing jobs
- Leftover cycles are available for "others"

Top active users by hours used on 01/22:
- deepayan (21.00%) - Project: UW:LMCG
- steveg (15.35%) - Project: UW:LMCG
- nengxu (10.11%) - Project: UW:UWCS-ATLAS
- quayle (6.81%) - Project: UW:UWCS-ATLAS
- ice3sim (6.67%) - Project:
- camiller (3.76%) - Project: UW:ChemE
- yoshimot (3.58%) - Project: UW:ChemE
- hep-muel (3.41%) - Project: UW:HEP
- cstoltz (3.29%) - Project: UW:ChemE
- cmsprod (2.97%) - Project: UW:HEP
- jhernand (2.82%) - Project: UW:ChemE
- xi (2.71%) - Project: UW:ChemE
- rigglema (2.19%) - Project: UW:ChemE
- aleung (2.12%) - Project: UW:UWCS-ATLAS
- skolya (1.91%) - Project:
- knotts (1.75%) - Project: UW:ChemE
- mbiddy (1.50%) - Project: UW:ChemE
- gjpapako (1.49%) - Project: UW:ChemE
- asreddy (1.33%) - Project: UW:ChemE
- eamastny (1.24%) - Project: UW:ChemE
- oliphant (1.04%) - Project:
- ylchen (0.61%) - Project: UW:ChemE
- manolis (0.58%) - Project: UW:ChemE
- deublein: 92.6 (0.39%) - Project: UW:ChemE
- wu: 83.8 (0.35%) - Project: UW:UWCS-ATLAS
- wli: 70.9 (0.30%) - Project: UW:ChemE
- bawa: 57.7 (0.24%) - Project:
- izmitli: 40.9 (0.17%) - Project:
- hma: 33.8 (0.14%) - Project:
- mchopra: 13.0 (0.05%) - Project: UW:ChemE
- krhaas: 12.3 (0.05%) - Project:
- manjit: 11.4 (0.05%) - Project: UW:HEP
- shavlik: 3.0 (0.01%) - Project:
- ppark: 2.5 (0.01%) - Project:
- schwartz: 0.6 (0.00%) - Project:
- rich: 0.4 (0.00%) - Project:
- daoulas: 0.3 (0.00%) - Project:
- qchen: 0.1 (0.00%) - Project:
- jamos: 0.1 (0.00%) - Project: UW:LMCG
- inline: 0.1 (0.00%) - Project:
- akini: 0.0 (0.00%) - Project:
- physics-: 0.0 (0.00%) - Project:
- nobody: 0.0 (0.00%) - Project:
- kupsch: 0.0 (0.00%) - Project:
- jjiang: 0.0 (0.00%) - Project:
Total hours:

Example Uses
- ATLAS
  - Over 15 million proton-collision events simulated, at 10 minutes each
- CMS
  - Over 10 million events simulated in a month; many more events reconstructed and analyzed
- Computational Genomics
  - Prof. Schwartz asserts that GLOW has opened up a new paradigm of work patterns in his group: they no longer think about how long a particular computational job will take - they just do it
- Chemical Engineering
  - Students do not know where the computing cycles are coming from - they just do it

New GLOW Members
Proposed minimum involvement:
- One rack with about 50 CPUs
- An identified system-support person who joins GLOW-tech
  - Can be an existing member of GLOW-tech
- PI joins the GLOW-exec
- Adhere to current GLOW policies
- Sponsorship by existing GLOW members
  - The UW ATLAS group and other physics groups were proposed by CMS and CS, and were accepted as new members
  - UW ATLAS is now using the bulk of GLOW cycles
- Expressions of interest from other groups

ATLAS Use of GLOW
The UW ATLAS group is sold on GLOW:
- First new member of GLOW
- Efficiently uses idle resources
- Uses the suspension mechanism to keep jobs in the background when higher-priority "owner" jobs kick in (a policy sketch follows below)
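One way to express such a suspension policy is through the startd's slot and policy expressions. The following is only a rough, hypothetical sketch (slot layout, attribute choices, and names are assumptions), not the actual GLOW policy:

    # Hypothetical sketch: split each machine into an "owner" slot (slot1)
    # and a "guest" slot (slot2); the guest slot suspends its job while the
    # owner slot is claimed, and resumes it afterwards, instead of killing it.
    NUM_SLOTS         = 2
    STARTD_SLOT_ATTRS = State          # advertise each slot's State to the other slot

    WANT_SUSPEND = True
    SUSPEND      = (SlotID == 2) && (slot1_State =?= "Claimed")
    CONTINUE     = (slot1_State =!= "Claimed")
    PREEMPT      = False
    KILL         = False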

GLOW & Condor Development
GLOW presents distributed-computing researchers with an ideal laboratory of real users with diverse requirements (NMI/NSF funded).
- Early commissioning and stress-testing of new Condor releases in an environment controlled by the Condor team
  - Results in robust releases for worldwide deployment
- New features in Condor middleware, for example:
  - Group-wise or hierarchical priority setting (a configuration sketch follows below)
  - Rapid response with large resources for short periods of time for high-priority interrupts
  - Hibernating shadow jobs instead of total preemption (HEP cannot use Standard Universe jobs)
  - MPI use (Medical Physics)
  - Condor-C (High Energy Physics and Open Science Grid)
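Group-based priorities are expressed through Condor accounting groups on the negotiator. A minimal sketch with made-up group names and quotas (not the actual GLOW settings):

    # Hypothetical negotiator-side accounting-group sketch.
    GROUP_NAMES = group_cms, group_atlas, group_cheme
    GROUP_QUOTA_group_cms   = 200     # slots reserved for CMS
    GROUP_QUOTA_group_atlas = 150
    GROUP_QUOTA_group_cheme = 300
    GROUP_AUTOREGROUP = True          # let groups use idle slots beyond their quota

    # A submit file then attaches each job to its group (user name appended):
    # +AccountingGroup = "group_cms.cmsprod"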

Open Science Grid & GLOW
OSG jobs can run on GLOW:
- The gatekeeper routes jobs to the local Condor cluster
- Jobs flock campus-wide, including to the GLOW resources (a flocking sketch follows below)
- The dCache storage pool is also a registered OSG storage resource
- Beginning to see some use
Now actively working on rerouting GLOW jobs to the rest of OSG:
- Users do NOT have to adapt to the OSG interface or separately manage their OSG jobs
- Requires new Condor code development
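Flocking between campus pools is a standard Condor mechanism. A minimal sketch with placeholder host names (not the actual GLOW configuration):

    # On a submitting schedd: also try the GLOW pool's central manager
    # when local resources are full (placeholder host names).
    FLOCK_TO = glow-cm.example.wisc.edu

    # On the GLOW central manager: accept flocked jobs from trusted schedds.
    FLOCK_FROM  = submit01.example.wisc.edu, submit02.example.wisc.edu
    ALLOW_WRITE = $(ALLOW_WRITE), $(FLOCK_FROM)   # HOSTALLOW_WRITE in older Condor releases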

Summary
- The Wisconsin campus grid, GLOW, has become an indispensable computational resource for several domain sciences.
- Cooperative planning of acquisitions, installation, and operations results in large savings.
- Domain-science groups no longer worry about setting up computing - they do their science!
  - This empowers individual scientists.
  - As a result, GLOW keeps growing on our campus.
- By pooling our resources we can harness more than our individual share at times of critical need, producing science results in a timely way.
- GLOW provides a working laboratory for computer-science studies.