

An Introduction to Campus Grids 19-Apr-2010 Keith Chadwick & Steve Timm

Outline
Definition of a Campus Grid
Why Do Campus Grids?
Drawbacks of Campus Grids
Examples of Campus Grids
–GLOW
–Purdue
–University of California
–Nebraska
FermiGrid
–Pre-Grid Situation
–Architecture
–Metrics
Evolution & Other Considerations
Cloud Computing
Additional Resources
Conclusions

Definition
A Campus Grid is a distributed collection of [compute and storage] resources, provisioned by one or more stakeholders, that can be seamlessly accessed through one or more [Grid] portals.

Why Do Campus Grids?
Improve utilization of (existing) resources – don’t purchase resources when they are not needed.
–Cost savings.
Provide a common administrative framework and user experience.
–Cost savings.
Buy resources (clusters) at lower cost.
–Cost savings.
Lower maintenance costs.
–Cost savings.
A unified user interface reduces the amount of user training required to make effective use of the resources.
–Cost savings.

What are the drawbacks?
Additional centralized infrastructure to provision and support.
–Additional costs.
–Can be provisioned incrementally to manage buy-in costs.
–Virtual machines can be used to lower buy-in costs.
Can make problem diagnosis somewhat more complicated.
–Correlation of multiple logs across administrative boundaries.
–A central log repository is one mechanism to manage this.
Not appropriate for all workloads.
–Don’t want campus financials running on the same resources as research.
Have to learn (and teach the user community) how to route jobs to the appropriate resources.
–Trivially parallel jobs require different resources than MPI jobs.
–I/O intensive jobs require different resources than compute intensive jobs.
Limited stakeholder buy-in may lead to a campus grid that's less interoperable than you might like.

GLOW
Single Globus Gatekeeper (GLOW).
Large central cluster funded by grant.
Multiple department-based clusters, all running Condor.
Departments have priority [preemptive] access to their clusters.
Clusters interchange workloads using Condor “flocking” (see the sketch below).
Approximately 1/3 of jobs are opportunistic.
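To make the flocking idea concrete, here is a minimal condor_config sketch with hypothetical hostnames (not GLOW's actual machines or policy):

   # On the submit hosts of one department pool: central managers idle jobs may flock to
   FLOCK_TO = cm.physics.example.edu
   # On the central manager of the pool accepting flocked jobs: who may flock in
   FLOCK_FROM = cm.chemistry.example.edu
   HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), $(FLOCK_FROM)

Each department keeps its own scheduler and preemption policy; the schedd only flocks a job out when it cannot match the job to a slot in its home pool.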

Purdue
Single Gatekeeper (Purdue-Steele).
Centrally managed “Steele” cluster: ??? nodes, ??? slots.
Departments purchase “slots” on the cluster.
Primary batch scheduler is PBS for purchased slots.
Secondary batch scheduler is Condor for opportunistic computing.
Condor is configured to run jobs only when PBS is not running a job on the node (see the sketch below).
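One way to express that policy (an illustrative condor_config sketch; the probe script and the PBS_RUNNING attribute name are assumptions, not necessarily Purdue's actual mechanism):

   # A startd cron probe publishes PBS_RUNNING = True/False into the machine ClassAd
   STARTD_CRON_JOBLIST = PBSCHECK
   STARTD_CRON_PBSCHECK_EXECUTABLE = /usr/local/bin/pbs_check.sh
   STARTD_CRON_PBSCHECK_PERIOD = 60s
   # Start Condor jobs only while PBS is idle on this node, and evict them if PBS starts one
   START   = ($(START)) && (PBS_RUNNING =!= True)
   PREEMPT = ($(PREEMPT)) || (PBS_RUNNING =?= True)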

University of California
Multiple campuses.
Each campus has a local campus Grid portal.
In addition, there is an overall Grid portal.
Access is Web-portal based.

Nebraska
3 campuses across Nebraska.
Being commissioned now.

Fermilab – Pre-Grid
Multiple “siloed” clusters, each dedicated to a particular stakeholder:
–CDF – 2 clusters, ~2,000 slots
–D0 – 2 clusters, ~2,000 slots
–CMS – 1 cluster, ~4,000 slots
–GP – 1 cluster, ~500 slots
Difficult to share:
–When a stakeholder needed more resources, or did not need all of their currently allocated resources, it was extremely difficult to move jobs or resources to match the demand.
Multiple interfaces and worker node configurations:
–CDF – Kerberos + Condor
–D0 – Kerberos + PBS
–CMS – Grid + Condor
–GP – Kerberos + FBSNG

FermiGrid - Today
Site-wide Globus Gatekeeper (FNAL_FERMIGRID).
Centrally managed services (VOMS, GUMS, SAZ, MySQL, MyProxy, Squid, Accounting, etc.).
Compute resources are “owned” by various stakeholders:

Compute Resources   # Clusters   # Gatekeepers   Batch System   # Batch Slots
CDF                 3            5               Condor         5,685
D0                  2            2               PBS            5,305
CMS                 1            4               Condor         6,904
GP                  1            3               Condor         1,901
Total               7            15              n/a            ~19,000
Sleeper Pool        1            2               Condor         ~14,…

FermiGrid - Architecture
Components shown in the diagram: VOMRS server, VOMS servers, GUMS servers, SAZ servers, Squid, Gratia accounting, the site-wide gateway, FERMIGRID SE (dCache SRM), BlueArc, and the stakeholder clusters (CDF OSG0–OSG4, D0 CAB1/CAB2, CMS WC1–WC4, GP Grid). The clusters send ClassAds via CEMon to the site-wide gateway, and the diagram marks an Exterior/Interior network boundary.
Step 1 – user registers with the VO (VOMRS server, periodically synchronized with the VOMS server).
Step 2 – user issues voms-proxy-init and receives VOMS-signed credentials.
Step 3 – user submits their grid job via globus-job-run, globus-job-submit, or condor-g.
Step 4 – gateway checks the request against the Site AuthoriZation (SAZ) service.
Step 5 – gateway requests a GUMS mapping based on VO & Role.
Step 6 – grid job is forwarded to the target cluster.
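The user-side portion of this flow (Steps 2 and 3) looks roughly like the commands below; the gateway hostname is a placeholder, not the production FermiGrid gatekeeper:

   # Step 2 - obtain VOMS-signed proxy credentials for the fermilab VO
   voms-proxy-init -voms fermilab

   # Step 3 (option a) - run a quick test job through the site-wide gateway
   globus-job-run gateway.example.gov/jobmanager-condor /bin/hostname

   # Step 3 (option b) - batch submission via Condor-G: submit description for condor_submit
   universe      = grid
   grid_resource = gt2 gateway.example.gov/jobmanager-condor
   executable    = /bin/hostname
   output        = test.out
   queue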

FermiGrid HA Services
Diagram: clients connect through an active/standby pair of LVS directors (kept in step via Heartbeat); the active LVS director distributes requests across active-active VOMS, GUMS, and SAZ instances, which are backed by active-active MySQL servers with replication between them.
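As a rough illustration of the LVS layer (the addresses and port below are placeholders, not FermiGrid's real ones), the active director could publish one virtual service per Grid service and balance it across the two real servers:

   # Virtual GUMS service on the LVS virtual IP, round-robin scheduling
   ipvsadm -A -t 192.0.2.10:8443 -s rr
   # The two real GUMS servers behind it (direct-routing mode)
   ipvsadm -a -t 192.0.2.10:8443 -r 192.0.2.21:8443 -g
   ipvsadm -a -t 192.0.2.10:8443 -r 192.0.2.22:8443 -g

In an LVS + Heartbeat setup like this, a failover typically just moves the virtual IP (and the ipvsadm table) to the standby director.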

FermiGrid HA Services – Xen VM layout
fermigrid5 (Xen Domain 0, active): fg5x0 LVS (active, Xen VM 0), fg5x1 VOMS (active, VM 1), fg5x2 GUMS (active, VM 2), fg5x3 SAZ (active, VM 3), fg5x4 MySQL (active, VM 4).
fermigrid6 (Xen Domain 0, active): fg6x0 LVS (standby, Xen VM 0), fg6x1 VOMS (active, VM 1), fg6x2 GUMS (active, VM 2), fg6x3 SAZ (active, VM 3), fg6x4 MySQL (active, VM 4).
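For illustration, a Xen guest definition for one of these service VMs might look roughly like this (the paths, sizes, and bridge name are assumptions, not FermiGrid's actual configuration):

   # /etc/xen/fg5x2 - illustrative domU config for the GUMS VM
   name       = "fg5x2"
   memory     = 2048
   vcpus      = 2
   disk       = ['phy:/dev/VolGroup00/fg5x2,xvda,w']   # assumed LVM-backed virtual disk
   vif        = ['bridge=xenbr0']                      # assumed bridge name
   bootloader = "/usr/bin/pygrub"
   on_reboot  = 'restart'
   on_crash   = 'restart'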

(Simplified) FermiGrid Network
Diagram: the SAN and BlueArc heads, the site switches in FCC and Wilson Hall, the FermiGrid gateway and test hosts (fermigrid0, fgtest, fgitb-gk), the CDF, D0, and GP cluster head nodes, and the switches for the CDF, D0, and GP worker nodes.

FermiGrid Utilization (utilization plot)

GUMS Calls (metrics plot)

VOMS-PROXY-INIT Calls (metrics plot)

Evolution
You don’t have to launch a massive project to transition to a Grid infrastructure overnight. FermiGrid was commissioned over roughly an 18-month interval:
–Ongoing discussions with stakeholders,
–Establish an initial set of central services based on these discussions [VOMS, GUMS],
–Work with each stakeholder to transition their cluster(s) to use the Grid infrastructure,
–Periodically review the set of central services and add additional services as necessary/appropriate [SAZ, MyProxy, Squid, etc.].

Other Considerations
You will likely want to tie your (centrally managed?) administration/staff/faculty/student computer account data into your Campus Grid resources.
–FermiGrid has implemented automated population of the “fermilab” virtual organization (VO) from our Central Name and Address Service (CNAS).
–We can help with the architecture of your equivalent service if you decide to implement such a VO.
If you provide central services to multiple independent clusters [e.g. GUMS, SAZ], you will eventually need to implement some sort of high-availability service configuration.
–You don’t have to do this right off the bat, but it is useful to keep in mind when designing and implementing services.
–FermiGrid has implemented highly available Grid services & we are willing to share our designs and configurations.

What About Cloud Computing?
Cloud Computing can be integrated into a Campus Grid infrastructure.

Additional Resources
FermiGrid –
OSG Campus Grids Activity –
OSG Campus Grids Workshop – https://twiki.grid.iu.edu/bin/view/CampusGrids/WorkingMeetingFermilab
ISGTW Article on Campus Grids –

Conclusions
Campus Grids offer significant cost savings.
Campus Grids do require a bit more infrastructure to establish and support.
–This can be added incrementally.
Many large higher education and research organizations have already deployed and are making effective use of Campus Grids.
Campus Grids can be easily integrated into larger Grid organizations (such as the Open Science Grid or TeraGrid) to give your community access to larger or specialized resources.
–Of course, it’s nice if you are also willing to make your unused resources available for opportunistic access.