An Introduction to Campus Grids
Keith Chadwick & Steve Timm
19-Apr-2010
Outline
Definition of a Campus Grid
Why do Campus Grids
Drawbacks of Campus Grids
Examples of Campus Grids
–GLOW
–Purdue
–University of California
–Nebraska
FermiGrid
–Pre-Grid Situation
–Architecture
–Metrics
Evolution & Other Considerations
Cloud Computing
Additional Resources
Conclusions
Definition
A Campus Grid is a distributed collection of [compute and storage] resources, provisioned by one or more stakeholders, that can be seamlessly accessed through one or more [Grid] portals.
Why Do Campus Grids?
Improve utilization of (existing) resources – don't purchase resources when they are not needed.
–Cost savings.
Provide a common administrative framework and user experience.
–Cost savings.
Buy resources (clusters) in "bulk" at lower cost.
–Cost savings.
Lower maintenance costs.
–Cost savings.
A unified user interface reduces the amount of user training required to make effective use of the resources.
–Cost savings.
What are the drawbacks?
Additional centralized infrastructure to provision and support.
–Additional costs.
–Can be provisioned incrementally to manage buy-in costs.
–Virtual machines can be used to lower buy-in costs.
Can make problem diagnosis somewhat more complicated.
–Correlation of multiple logs across administrative boundaries.
–A central log repository is one mechanism to manage this.
Not appropriate for all workloads.
–Don't want campus financials running on the same resources as research.
Have to learn (and teach the user community) how to route jobs to the appropriate resources; see the sketch after this list.
–Trivially parallel jobs require different resources than MPI jobs.
–I/O-intensive jobs require different resources than compute-intensive jobs.
Limited stakeholder buy-in may lead to a campus grid that is less interoperable than you might like.
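The routing problem above can be made concrete with a small sketch. It assumes a Condor-style pool where machines advertise their capabilities and jobs carry a requirements expression; the attribute names HAS_MPI and LocalDiskGB are hypothetical illustrations, not attributes any particular campus grid actually publishes.

    # Hypothetical sketch: pick a requirements expression for a job based on
    # its characteristics, so it lands on an appropriate class of resources.
    # Attribute names (HAS_MPI, LocalDiskGB) are illustrative placeholders.

    def requirements_for(job):
        """Return a Condor-style requirements string for a job description."""
        clauses = []
        if job.get("mpi"):
            # MPI jobs need nodes that advertise MPI support / whole-node slots.
            clauses.append("HAS_MPI == True")
        if job.get("io_gb", 0) > 10:
            # I/O-heavy jobs should land on nodes with enough local scratch disk.
            clauses.append("LocalDiskGB >= %d" % job["io_gb"])
        if not clauses:
            # Trivially parallel, compute-bound work can run almost anywhere.
            clauses.append("True")
        return " && ".join(clauses)

    if __name__ == "__main__":
        print(requirements_for({"mpi": True}))
        print(requirements_for({"io_gb": 50}))
        print(requirements_for({}))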
GLOW
Single Globus Gatekeeper (GLOW).
Large central cluster funded by grant.
Multiple department-based clusters, all running Condor.
Departments have priority [preemptive] access to their clusters.
Clusters interchange workloads using Condor "flocking" (a configuration sketch follows below).
Approximately 1/3 of jobs are opportunistic.
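A minimal sketch of what "flocking" looks like at the configuration level, assuming a stock Condor pool. FLOCK_TO and FLOCK_FROM are standard Condor configuration macros; the host names are placeholders, not GLOW's actual machines, and GLOW's real policy (including preemptive owner priority) is more involved.

    # Hedged sketch: generate the Condor configuration fragment that lets a
    # department pool "flock" jobs to other pools when local slots are busy.
    # FLOCK_TO / FLOCK_FROM are standard Condor configuration macros; the
    # host names below are made-up placeholders, not GLOW's real machines.

    def flocking_fragment(flock_to, flock_from):
        lines = [
            # Pools this schedd may send overflow jobs to.
            "FLOCK_TO = " + ", ".join(flock_to),
            # Pools whose schedds are allowed to send jobs here.
            "FLOCK_FROM = " + ", ".join(flock_from),
        ]
        return "\n".join(lines) + "\n"

    if __name__ == "__main__":
        print(flocking_fragment(
            flock_to=["cm.central.example.edu", "cm.physics.example.edu"],
            flock_from=["cm.chemistry.example.edu"],
        ))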
Purdue
Single Gatekeeper (Purdue-Steele).
Centrally managed "Steele" cluster: ??? nodes, ??? slots.
Departments purchase "slots" on the cluster.
Primary batch scheduler is PBS for purchased slots.
Secondary batch scheduler is Condor for opportunistic computing.
Condor is configured to only run jobs when PBS is not running a job on the node (see the probe sketch below).
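One common way to implement this kind of backfill, sketched here as an assumption rather than Purdue's actual configuration, is a periodic node probe whose result gates Condor's START expression. The PBS spool path and the PBS_BUSY attribute name below are placeholders.

    # Hypothetical backfill probe (not Purdue's actual setup): report whether
    # the node currently hosts a PBS job, so Condor's START expression can
    # refuse to start opportunistic work while PBS is busy.

    import os

    PBS_MOM_JOB_DIR = "/var/spool/pbs/mom_priv/jobs"  # placeholder; varies by PBS/TORQUE install

    def pbs_job_running():
        """True if the PBS MOM's job directory contains any job files."""
        try:
            return len(os.listdir(PBS_MOM_JOB_DIR)) > 0
        except OSError:
            return False  # directory missing or unreadable; treat the node as idle

    if __name__ == "__main__":
        # Emit "Attribute = Value" so a periodic startd hook could publish it,
        # letting a START expression such as (PBS_BUSY =!= True) gate Condor jobs.
        print("PBS_BUSY = %s" % ("True" if pbs_job_running() else "False"))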
University of California
Multiple campuses.
Each campus has a local campus Grid portal.
There is an overall Grid portal in addition.
Access is Web-portal based.
Nebraska
3 campuses across Nebraska.
Being commissioned now.
Fermilab – Pre-Grid
Multiple "siloed" clusters, each dedicated to a particular stakeholder:
–CDF – 2 clusters, ~2,000 slots
–D0 – 2 clusters, ~2,000 slots
–CMS – 1 cluster, ~4,000 slots
–GP – 1 cluster, ~500 slots
Difficult to share:
–When a stakeholder needed more resources, or did not need all of their currently allocated resources, it was extremely difficult to move jobs or resources to match the demand.
Multiple interfaces and worker node configurations:
–CDF – Kerberos + Condor
–D0 – Kerberos + PBS
–CMS – Grid + Condor
–GP – Kerberos + FBSNG
FermiGrid - Today
Site-wide Globus Gatekeeper (FNAL_FERMIGRID).
Centrally managed services (VOMS, GUMS, SAZ, MySQL, MyProxy, Squid, Accounting, etc.).
Compute resources are "owned" by various stakeholders:

Compute Resources   # Clusters   # Gatekeepers   Batch System   # Batch Slots
CDF                 3            5               Condor         5685
D0                  2            2               PBS            5305
CMS                 1            4               Condor         6904
GP                  1            3               Condor         1901
Total               7            15              n/a            ~19,000
Sleeper Pool        1            2               Condor         ~14,200
FermiGrid - Architecture
[Architecture diagram: a site-wide gateway sits between the exterior and interior of the site. Around it are the centrally managed services (VOMRS, VOMS, GUMS, SAZ, Squid, Gratia accounting), the FERMIGRID SE (dCache SRM), BlueArc storage, and the stakeholder clusters (CDF OSG0-OSG4, D0 CAB1/CAB2, CMS WC1-WC4, GP Grid). The clusters send ClassAds via CEMon to the site-wide gateway. The job flow shown in the diagram is listed below; a command-line sketch of the user-side steps follows the list.]
Step 1 – user registers with the VO (VOMRS server, periodically synchronized with the VOMS server).
Step 2 – user issues voms-proxy-init and receives VOMS-signed credentials.
Step 3 – user submits their grid job via globus-job-run, globus-job-submit, or condor-g.
Step 4 – gateway checks the job against the Site Authorization (SAZ) service.
Step 5 – gateway requests a GUMS mapping based on VO & role.
Step 6 – grid job is forwarded to the target cluster.
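A hedged sketch of the user-side commands in steps 2 and 3, driven from Python purely for illustration. It assumes a client host with the VOMS and Globus tools installed; the gatekeeper contact string is a placeholder, not the actual FNAL_FERMIGRID endpoint.

    # Hedged sketch of the user-side of steps 2 and 3.  Assumes the VOMS and
    # Globus clients are installed; the gatekeeper contact string below is a
    # placeholder, not the real FNAL_FERMIGRID endpoint.

    import subprocess

    GATEKEEPER = "gatekeeper.example.edu/jobmanager-condor"  # placeholder contact string

    def run(cmd):
        print("+ " + " ".join(cmd))
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        # Step 2: obtain VOMS-signed proxy credentials for the fermilab VO.
        run(["voms-proxy-init", "-voms", "fermilab"])
        # Step 3: submit a trivial grid job through the site gateway.
        run(["globus-job-run", GATEKEEPER, "/bin/hostname"])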
FermiGrid HA Services - 1
[HA services diagram: clients reach the authorization services through an active/standby LVS director pair linked by heartbeat. VOMS, GUMS, and SAZ each run active/active behind LVS, and the two active MySQL servers are kept consistent by replication. A sketch of the kind of liveness probe a director relies on follows below.]
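A minimal sketch of the sort of liveness probe an LVS director (or a helper such as keepalived) uses to decide which active replicas receive traffic. The host names and ports are placeholders, not FermiGrid's real service endpoints.

    # Minimal liveness probe sketch: check which service replicas answer on
    # their TCP port.  Host names and ports are placeholders.

    import socket

    SERVICES = {
        "VOMS": [("voms1.example.edu", 15001), ("voms2.example.edu", 15001)],
        "GUMS": [("gums1.example.edu", 8443), ("gums2.example.edu", 8443)],
        "SAZ":  [("saz1.example.edu", 8443), ("saz2.example.edu", 8443)],
    }

    def alive(host, port, timeout=2.0):
        """True if a TCP connection to host:port succeeds within the timeout."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    if __name__ == "__main__":
        for name, replicas in SERVICES.items():
            up = [h for h, p in replicas if alive(h, p)]
            print("%s: %d of %d replicas answering" % (name, len(up), len(replicas)))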
FermiGrid HA Services - 2
[Deployment diagram: two physical hosts, fermigrid5 and fermigrid6, each running Xen Domain 0 with five guest VMs. fermigrid5 hosts fg5x0 (LVS), fg5x1 (VOMS), fg5x2 (GUMS), fg5x3 (SAZ), and fg5x4 (MySQL), all active; fermigrid6 hosts the matching fg6x1 (VOMS), fg6x2 (GUMS), fg6x3 (SAZ), and fg6x4 (MySQL) VMs, also active, plus fg6x0 (LVS) in standby.]
(Simplified) FermiGrid Network
[Simplified network diagram: the FCC and WH computer-room switches interconnect the SAN and BlueArc heads (ba head 1, ba head 2), fermigrid0, test hosts (fgtest, fgitb-gk), the CDF hosts (fcdf1x1, fcdf2x1, fcdfosg3, fcdfosg4), the D0 hosts (d0osg1x1, d0osg2x1), the GP hosts (fnpc3x1, fnpc4x1, fnpc5x2), and the switches feeding the CDF, D0, and GP worker nodes.]
FermiGrid Utilization [chart]
GUMS calls [chart]
VOMS-PROXY-INIT calls [chart]
Evolution
You don't have to start with a massive project to transition to a Grid infrastructure overnight. FermiGrid was commissioned over roughly an 18-month interval:
–Ongoing discussions with stakeholders,
–Establish an initial set of central services based on these discussions [VOMS, GUMS],
–Work with each stakeholder to transition their cluster(s) to use the Grid infrastructure,
–Periodically review the set of central services and add additional services as necessary/appropriate [SAZ, MyProxy, Squid, etc.].
Other Considerations
You will likely want to tie your (centrally managed) administration/staff/faculty/student computer account data into your Campus Grid resources.
–FermiGrid has implemented automated population of the "fermilab" virtual organization (VO) from our Central Name and Address Service (CNAS); a hedged sketch of that kind of synchronization follows below.
–We can help with the architecture of your equivalent service if you decide to implement such a VO.
If you provide central services for multiple independent clusters [e.g. GUMS, SAZ], you will eventually need to implement some sort of high-availability service configuration.
–You don't have to do this right off the bat, but it is useful to keep in mind when designing and implementing services.
–FermiGrid has implemented highly available Grid services & we are willing to share our designs and configurations.
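A hedged sketch of the kind of VO-membership synchronization described above. The file formats and names are assumptions for illustration, not FermiGrid's actual CNAS feed; the sketch only computes the membership diff, leaving the actual additions and removals to the VOMS administrative tools.

    # Hedged sketch of VO-membership synchronization (not FermiGrid's actual
    # CNAS implementation).  It diffs a personnel feed against the current VO
    # member list and reports who should be added or removed; the real
    # registration would go through the VOMS administrative interface.

    import csv

    def load_personnel(path):
        """Personnel feed: CSV with a 'dn' column, one certificate DN per row (hypothetical format)."""
        with open(path, newline="") as f:
            return {row["dn"] for row in csv.DictReader(f)}

    def load_vo_members(path):
        """Current VO membership dump: one DN per line (hypothetical format)."""
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}

    def plan_sync(personnel, members):
        return sorted(personnel - members), sorted(members - personnel)

    if __name__ == "__main__":
        to_add, to_remove = plan_sync(load_personnel("cnas_feed.csv"),
                                      load_vo_members("vo_members.txt"))
        for dn in to_add:
            print("ADD    " + dn)
        for dn in to_remove:
            print("REMOVE " + dn)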
What About Cloud Computing?
Cloud Computing can be integrated into a Campus Grid infrastructure.
Additional Resources
FermiGrid:
–http://fermigrid.fnal.gov
–http://cd-docdb.fnal.gov
OSG Campus Grids Activity:
–https://twiki.grid.iu.edu/bin/view/CampusGrids/WebHome
OSG Campus Grids Workshop:
–https://twiki.grid.iu.edu/bin/view/CampusGrids/WorkingMeetingFermilab
ISGTW Article on Campus Grids:
–http://www.isgtw.org/?pid=1002447
Conclusions
Campus Grids offer significant cost savings.
Campus Grids do require a bit more infrastructure to establish and support.
–This can be added incrementally.
Many large higher education and research organizations have already deployed and are making effective use of Campus Grids.
Campus Grids can be easily integrated into larger Grid organizations (such as the Open Science Grid or TeraGrid) to give your community access to larger or specialized resources.
–Of course it's nice if you are also willing to make your unused resources available for opportunistic access.