
Slide 1: FermiGrid – Fermilab Grid Gateway
Keith Chadwick, Bonnie Alcorn, Steve Timm
FermiGrid Project – November 16, 2004

Slide 2: FermiGrid – Strategy and Goals
In order to better serve the entire program of the laboratory, the Computing Division will place all of its production resources in a Grid infrastructure called FermiGrid. This strategy will continue to allow the large experiments that currently have dedicated resources to have first-priority use of the resources purchased on their behalf. It will also allow access to these dedicated resources, as well as to other shared Farm and Analysis resources, for opportunistic use by the various Virtual Organizations (VOs) that participate in FermiGrid (i.e. all of our lab programs) and by certain VOs that use the Open Science Grid. (Add something about prioritization and scheduling – lab/CD – new forums.)
The strategy will allow us:
 - to optimize use of resources at Fermilab
 - to provide a coherent way of putting Fermilab on the Open Science Grid
 - to save effort and resources by implementing certain shared services and approaches
 - to work together more coherently to move all of our applications and services to run on the Grid
 - to better handle the transition from Run II to the LHC (and eventually to BTeV) in a time of shrinking budgets and possibly shrinking worldwide resources for Run II
 - to fully support the Open Science Grid and the LHC Computing Grid and gain positive benefit from this emerging infrastructure in the US and Europe.

Slide 3: FermiGrid – What It Is
 - FermiGrid is a meta-facility composed of a number of existing “resources”, many of which are currently dedicated to the exclusive use of a particular stakeholder.
 - FermiGrid (the facility) provides a way for jobs of one VO to run either on shared facilities (such as the current General Purpose Farm or a new GridFarm?) or on the Farms primarily provided for other VOs. (>>> needs wordsmithing to say what, not how)
 - FermiGrid will require some development and test facilities to be put in place in order to make it happen.
 - FermiGrid will provide access to storage elements, storage services, and data movement services for jobs running on any of the compute elements of FermiGrid.
 - The resources that comprise FermiGrid will continue to be accessible in “local” mode as well as “Grid” mode.

Slide 4: The FermiGrid Project
This is a cooperative project across the Computing Division and its stakeholders to define and execute the steps necessary to achieve the goals of FermiGrid.
Effort is expected to come from:
 - Providers of shared resources and services – CSS and CCF
 - Stakeholders and providers of currently dedicated resources – Run II, CMS, MINOS, SDSS
The total program of work is not fully known at this time, but the WBS is being fleshed out. It will involve at least the following:
 - Adding services required by some stakeholders to other stakeholders' dedicated resources
 - Work on authorization and accounting
 - Providing some common FermiGrid services (e.g. …)
 - Providing some head nodes and gateway machines
 - Modifying some stakeholders' scripts, codes, etc. to run in the FermiGrid environment
 - Working with OSG technical activities to make sure FermiGrid and OSG (and thereby LCG) are well aligned and interoperable
 - Working on monitoring, web pages, and whatever else it takes to make this all work and happen
 - Evolving and defining forums for prioritizing access to resources and scheduling

Slide 5: FermiGrid – Some Notation
Condor = Condor / Condor-G, as necessary.

Slide 6: FermiGrid – The Situation Today
 - Many separate clusters: CDF (x3), CMS, D0 (x3), GP Farms, FNALU Batch, etc.
 - When a cluster “landlord” does not fully utilize the cluster cycles, it is very difficult for others to opportunistically utilize the excess computing capacity.
 - In the face of flat or declining budgets, we need to make the most effective use of the computing capacity.
 - We need some sort of system to capture the unused available computing and put it to use.

Slide 7: FermiGrid – The State of Chaos Today

Slide 8: FermiGrid – The Vision
 - The future is Grid-enabled computing. Dedicated system resources will be assimilated – slowly... Existing access to resources will be maintained.
 - “I am chadwick of grid – prepare to be assimilated…” Not! Enable Grid-based computing, but do not require all computing to be Grid. Preserve existing access to resources for current installations.
 - Let a thousand flowers bloom – well, not quite. Implement Grid interfaces to existing resources without perturbing existing access mechanisms.
 - Once FermiGrid is in production, deploy new systems as Grid-enabled from the get-go. People will naturally migrate when they need expanded resources. Help people with their migrations?

Slide 9: FermiGrid – The Mission
FermiGrid is the Fermilab Grid Gateway infrastructure that accepts jobs from the Open Science Grid and, following appropriate credential authorization, schedules these jobs for execution on Fermilab Grid resources.

Slide 10: FermiGrid – The Rules
First, do no harm:
 - Wherever possible, implement so that existing systems and infrastructure are not compromised.
 - Only when absolutely necessary, require changes in existing systems or infrastructure, and work with those affected to minimize and mitigate the impact of the required changes.
 - Provide resources and infrastructure to help experiments transition to a Grid-enabled model of operation.

Slide 11: FermiGrid – Players and Roles
 - CSS: hardware and operating system management and support.
 - CCF: Grid infrastructure application management and support.
 - OSG and “a cast of thousands”: submit jobs and utilize resources.
   – CDF
   – D0
   – CMS
   – Lattice QCD
   – Sloan
   – MINOS
   – MiniBooNE
   – FNAL
   – Others?

Slide 12: FermiGrid – System Evolution
 - Start “small”, but plan for success.
 - Build the FermiGrid gateway system as a cluster of redundant server systems to provide 24x7 service. The initial implementation will not be redundant; that will follow as soon as we learn how to implement the necessary failovers. We are going to have to experiment a bit and learn how to operate these services. We will need the capability of testing upgrades without impacting production services.
 - Schedule OSG jobs on “excess/unused” cycles from existing systems and infrastructure. How? Initial thoughts were to utilize the checkpoint capability within Condor. Feedback from D0 and CMS is that this is not an acceptable solution. Alternatives – a 24-hour CPU limit? nice? other? Will think about this more – policy?
 - Just think of FermiGrid like PACMAN… (munch, munch, munch…)

Slide 13: FermiGrid – Software Components
Operating system and tools:
 - Scientific Linux
 - VDT + Globus Toolkit
 - Cluster tools:
   – Keep the cluster “sane”.
   – Migrate services as necessary.
 - Cluster-aware file system:
   – Google File System? Lustre? Other?
Applications and tools:
 - VOMS + VOMRS
 - GUMS
 - Condor-G + GRIS + GIIS + …
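As one hedged illustration of how the Condor-G and Globus pieces of this stack fit together, a user job could be directed at a gateway with a submit description file along these lines. The gateway hostname and jobmanager name below are placeholders for the sketch, not actual FermiGrid endpoints:

    # Sketch of a Condor-G submit description file (Globus GT2 era).
    # "fermigate1.fnal.gov" and "jobmanager-condor" are hypothetical placeholders.
    universe        = globus
    globusscheduler = fermigate1.fnal.gov/jobmanager-condor
    executable      = analysis.sh
    output          = analysis.out
    error           = analysis.err
    log             = analysis.log
    queue

Before running condor_submit, the user would first obtain a Grid proxy (e.g. with grid-proxy-init, or voms-proxy-init once VOMS is in place), since the gatekeeper authorizes the job based on the presented credentials.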

Slide 14: FermiGrid – Overall Architecture
(Architecture diagram: VO users submit through the FermiGrid common gateway services, including SAZ, to head nodes (HN) for CDF, D0, CMS, SDSS, Lattice QCD, and the GP Farm, with storage access via SRM / dCache.)

Slide 15: FermiGrid – General Purpose Farm Example
“The D0 wolf stealing food out of the mouths of babies.”
(Diagram: VO users enter via FermiGrid, through Globus / Condor, to the Farm head node and FBS, alongside the existing GP Farm users.)

Slide 16: FermiGrid – D0 Example
“Babies stealing food out of the mouth of the D0 wolf.”
(Diagram: D0 jobs arrive via SamGrid / SamGfarm / FNSF0 and Globus / Condor, while VO users enter via FermiGrid, through Globus / Condor, to FBS.)

Slide 17: FermiGrid – Future Grid Farms?
(Diagram: VO users submit to future Grid farms via FermiGrid, through Globus / Condor.)

Slide 18: FermiGrid – Gateway Software
See:

Slide 19: FermiGrid – Gateway Hardware Architecture
(Diagram: FermiGate1, FermiGate2, and FermiGate3, connected through a switch and a Cyclades console server, sitting between the FNAL network and the FermiGrid resources.)

Slide 20: FermiGrid – Gateway Hardware Roles
FermiGate1:
 - Primary for Condor + GRIS + GIIS
 - Backup for FermiGate2
 - Secondary backup for FermiGate3
FermiGate2:
 - Primary for VOMS + VOMRS
 - Backup for FermiGate3
 - Secondary backup for FermiGate1
FermiGate3:
 - Primary for GUMS + [PRIMA] (eventually)
 - Backup for FermiGate1
 - Secondary backup for FermiGate2
All FermiGate systems will have VDT + the Globus job manager.

Slide 21: FermiGrid – Gateway Hardware Specification
3 x PowerEdge 6650:
 - Dual-processor 3.0 GHz Xeon MP, 4 MB cache
 - Rapid rails for Dell rack
 - 4 GB DDR SDRAM (8 x 512 MB)
 - PERC3-DC, 128 MB, 1 internal + 1 external channel
 - 2 x 36 GB 15k RPM drives
 - 2 x 73 GB 10k RPM drives
 - Dual on-board 10/100/1000 NICs
 - Redundant power supply
 - Dell Remote Access Card, Version III, without modem
 - 24x IDE CD-ROM
 - PowerEdge basic setup; 3-year same-day 4-hour-response 24x7 parts + onsite labor
 - $14,… each
Cyclades console + dual PM20 + local switch + rack.
Total system cost ~ $50K.
Expandable in place by adding processors or disks within the systems.

Slide 22: FermiGrid – Alternate Hardware Specification
3 x PowerEdge 2850 (2U server):
 - Dual-processor 3.6 GHz Xeon, 1 MB cache, 800 MHz FSB
 - Rapid rails for Dell rack
 - 4 GB DDR2 400 MHz (4 x 1 GB)
 - Embedded PERC4e/i controller
 - 2 x 36 GB 15k RPM drives
 - 2 x 73 GB 10k RPM drives
 - Dual on-board 10/100/1000 NICs
 - Redundant power supply
 - Dell Remote Access Card, 4th generation
 - 24x IDE CD-ROM
 - PowerEdge basic setup; 3-year same-day 4-hour-response 24x7 parts + onsite labor
 - $6,… each
Cyclades console + dual PM20 + local switch + rack.
Total system cost ~ $25K.
Limited CPU expandability – can only add whole systems or perform a forklift upgrade.

Slide 23: FermiGrid – Condor and Condor-G
 - Condor (Condor-G) will be used for batch queue management.
   – Within the FermiGrid gateway systems – definitely.
   – It may feed into other head-node batch systems (e.g. FBS) as necessary.
 - VOs that “own” a resource will have priority access to that resource.
 - Policy? – “Guest” VOs will only be allowed to utilize idle/unused resources.
 - Policy? – How quickly must a “guest” VO free a resource when it is desired by the owner VO? Condor checkpointing would provide this, but D0 and CMS jobs will not function in that environment. Alternatives – a 24-hour CPU limit? nice? other? More thought required (perhaps helped by the policy decisions above; one possible Condor policy sketch follows this slide).
 - For Condor information see:
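As a rough, non-authoritative sketch of the “owner VO has priority, guest VOs yield” idea, a Condor execute-node (startd) policy might look roughly like the following. The job attribute name, the owner-VO value, and the exact expressions are assumptions made up for this example, not an agreed FermiGrid policy:

    # Minimal sketch of a Condor startd policy, assuming jobs advertise a
    # hypothetical ClassAd attribute "JobVO" and that "d0" is the owner VO
    # of this particular resource.
    OWNER_VO_JOB = (TARGET.JobVO =?= "d0")
    GUEST_VO_JOB = (TARGET.JobVO =!= "d0")
    # Prefer owner-VO jobs over guest-VO jobs when matching.
    RANK = $(OWNER_VO_JOB) * 1000
    # One possible alternative to checkpoint-based preemption: evict guest
    # jobs after 24 hours of wall-clock run time (using the standard
    # CurrentTime / JobStart attributes).
    PREEMPT = $(GUEST_VO_JOB) && ((CurrentTime - JobStart) > (24 * 3600))

In practice a site would fold expressions like these into its existing START/PREEMPT policy and use whatever attribute the authorization layer actually attaches to jobs; the point is only that Condor's RANK/PREEMPT machinery offers a non-checkpoint path to freeing resources for the owner VO.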

Slide 24: FermiGrid – VO Management
 - Currently, VO management is performed via CMS in a “back pocket” fashion. This is not a viable solution for the long term, and CMS would probably like to direct that effort towards their own work.
 - We recommend that the FermiGrid infrastructure take over the VO Management Server/services and migrate them onto the appropriate gateway system (FermiGate2).
 - Existing VOs should be migrated to the new VO Management Server (in the FermiGrid gateway) once the FermiGrid gateway is commissioned, with existing VO management roles delegated to appropriate members of the current VOs.
 - New VOs for existing infrastructure clients (e.g. FNAL, CDF, D0, CMS, Lattice QCD, SDSS, and others) should be created as necessary/authorized.

Slide 25: FermiGrid – VO Creation and Support
 - All new VOs are created on the new VO Management Server by FermiGrid project personnel or the Helpdesk.
 - Policy? – VO creation authorization mechanism?
 - VO management authority is delegated to the appropriate members of the VO.
 - Policy? – “FNAL” VO membership administered by the Helpdesk, like accounts in the FNAL Kerberos domain and the Fermi Windows 2000 domain?
 - Policy? – Small experiments may apply to CD to have their VO managed by the Helpdesk as well?
 - Need to provide the Helpdesk with the necessary tools for VO membership management.
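To make the VO membership machinery concrete: once a user is registered in a VOMS-managed VO, they would typically obtain a VOMS-extended proxy before submitting work. The VO name and role below are illustrative assumptions, not established FermiGrid VOs or roles:

    # Sketch: obtain a Grid proxy carrying VO membership and, optionally, a role.
    # "fermilab" and "Role=production" are placeholders for this example.
    voms-proxy-init -voms fermilab
    voms-proxy-init -voms fermilab:/fermilab/Role=production
    # Inspect the attributes carried by the resulting proxy.
    voms-proxy-info -all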

Slide 26: FermiGrid – GUMS
 - GUMS = Grid User Management System, developed at BNL.
 - Translates a Grid identity to a local identity (certificate -> local user).
 - Think of it as an automated mechanism to maintain the grid-mapfile.
 - See:
(Diagram: VOs such as ATLAS, STAR, and PHENIX feed a GUMS server backed by a GUMS DB; Grid resources obtain their mappings from the server's mapfile cache.)
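For readers unfamiliar with the grid-mapfile that GUMS automates, it is simply a list of certificate subject DNs mapped to local accounts. The DNs and account names below are made up for the example:

    # Illustrative grid-mapfile entries (hypothetical DNs and local accounts).
    "/DC=org/DC=doegrids/OU=People/CN=Jane Physicist 123456" cdfgrid
    "/DC=org/DC=doegrids/OU=People/CN=John Analyst 654321"   d0grid

GUMS generates and refreshes these mappings centrally (or answers mapping queries directly), so individual gatekeepers do not have to maintain the file by hand.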

Slide 27: FermiGrid – Project Management
 - Weekly FermiGrid project management meeting: Fridays from 2:00 PM to 3:00 PM in FCC1.
 - We would like to empanel a set of Godparents, with representatives from:
   – CMS
   – Run II
   – Grid developers?
   – Security team?
   – Other?
 - The Godparent panel would provide (short-term?) guidance and feedback to the FermiGrid project management team.
 - Longer-term guidance and policy will come from CD line management.

Slide 28: FermiGrid – Time Scale for Implementation
 - “Today”: Decide on and order hardware for the gateway systems. Explore / kick the tires on existing software.
 - Jan 2005: Hardware installation. Begin software installation and initial configuration.
 - Feb-Mar 2005: Common Grid services available in non-redundant mode (Condor[-G], VOMS, GUMS, etc.).
 - Future: Transition to redundant mode as the hardware/software matures.

Slide 29: FermiGrid – Open Questions
 - Policy issues? Lots of policy issues – need direction from CD management.
 - Role of FermiGrid?
   – Direct Grid access to Fermilab Grid resources without going through FermiGrid?
   – Grid access to Fermilab Grid resources only via FermiGrid?
   – “Guest” VO access to Fermilab Grid resources only via FermiGrid?
 - Resource allocation? “Owner” VO vs. “guest” VO? How fast must resources be freed, and under what circumstances? A Grid Users Meeting a la the Farm Users Meeting?
 - Accounting? Who, where, what, when, how? Recording vs. access.

Slide 30: FermiGrid – Guest vs. Owner VO Access
(Diagram: “owner” VO users reach the resource head node directly (allowed) and possibly via the FermiGrid gateway (allowed?); “guest” VO users would be required(?) to go through the FermiGrid gateway, with direct access to the head node not allowed(?).)

Slide 31: FermiGrid – Fin
Any questions?