4/25/2006 Condor Week 1: FermiGrid
Steven Timm
Fermilab Computing Division
Fermilab Grid Support Center

4/25/2006 Condor Week 2: People
FermiGrid Operations Team:
  Keith Chadwick (CD/CCF/FTP) – Project Leader
  Steve Timm (CD/CSS/FCS) – Linux OS Support
  Dan Yocum (CD/CCF/FTP) – Application Support
Thanks to:
  Condor Team: M. Livny, J. Frey, A. Roy, and many others.
  Globus developers: C. Bacon, S. Martin.
  GridX1: R. Walker, D. Vanderster, et al.
  Fermilab grid developers: G. Garzoglio, T. Levshina.
  Representatives of the following OSG Virtual Organizations: CDF, DZERO, USCMS, DES, SDSS, FERMILAB, I2U2, NANOHUB, GADU.
FermiGrid Web Site & Additional Documentation:

4/25/2006 Condor Week 3: FCC—Feynman Computing Center

4/25/2006 Condor Week 4: Fermilab Grid Computing Center

4/25/2006 Condor Week 5: Computing at Fermilab
Reconstruction and analysis of data for High Energy Physics experiments:
  > 4 petabytes on tape.
  Fast I/O to read the file, many hours of computing, fast I/O to write.
  Each job independent of other jobs.
Simulation for future experiments (CMS at CERN):
  In two years we need to scale to > 50K jobs/day.
Each big experiment has an independent cluster or clusters:
  Diverse file systems, batch systems, and management methods.
More than 3000 dual-processor Linux systems in all.

4/25/2006 Condor Week 6: FermiGrid Project
FermiGrid is a meta-facility established by the Fermilab Computing Division.
Four elements:
  Common Site Grid Services: Virtual Organization hosting (VOMS, VOMRS), site-wide Globus GRAM gateway, Site AuthoriZation (SAZ), MyProxy, GUMS.
  Bi-lateral interoperability between the various experimental stakeholders.
  Interfaces to the Open Science Grid.
  Grid interfaces to mass storage systems.

4/25/2006 Condor Week 7: FermiGrid – Common Grid Services
[Diagram: a user's job enters through the Gatekeeper (job manager, job scheduler) and is handled by the common services: the VOMS server, the GUMS identity mapping service (UID mapping), the SAZ site authorization service (site control), and the MyProxy server.]

4/25/2006 Condor Week 8: Hardware
Dell 2850 servers with dual 3.6 GHz Xeons, 4 GB of memory, 1000TX networking, hardware RAID, Scientific Linux 3.0.4, VDT.
  FermiGrid1: Site-wide Globus gateway
  FermiGrid2: Site-wide VOMS & VOMRS server
  FermiGrid3: Site-wide GUMS server
  FermiGrid4: MyProxy server and Site AuthoriZation (SAZ) server

4/25/2006 Condor Week 9: Site Wide Gateway – Why
[Diagram: the many independent Fermilab clusters (CMS WC1, CMS WC2, CDF CAF, CDF, D0 CAB, D0, SDSS TAM, GP Farm, LQCD, desktops) sit behind the Site Wide Gateway, which is backed by the MyProxy, VOMS, SAZ, and GUMS servers.]

4/25/2006 Condor Week 10: Site Wide Gateway Technique
This technique is closely adapted from a technique first used at GridX1 in Canada to forward jobs from the LCG into their clusters.
We begin by creating a new job manager script in $VDT_LOCATION/globus/lib/perl/Globus/GRAM/JobManager/condorg.pm. This script takes incoming jobs and resubmits them to Condor-G on fermigrid1.
Condor matchmaking is used so that the jobs are forwarded to the member cluster with the most open slots. Each member cluster runs a cron job every five minutes to generate a ClassAd for its cluster, which is sent to fermigrid1 using condor_advertise.
Credentials to successfully forward the job are obtained in the following manner (the user-side commands are sketched after this slide):
  1. The user obtains a VOMS-qualified proxy in the normal fashion with voms-proxy-init.
  2. The user sets X509_USER_CERT and X509_USER_KEY to point to the proxy instead of the usercert.pem and userkey.pem files.
  3. The user runs myproxy-init to store the credentials on the Fermilab MyProxy server, myproxy.fnal.gov.
  4. jobmanager-condorg, which runs as the UID that the job will run under on FermiGrid, executes myproxy-get-delegation to get a proxy with full rights to resubmit the job.
  5. Documentation of the steps to do this as a user is found in the FermiGrid User Guide:
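The user-side part of this credential flow (steps 1-3) can be sketched in a few shell commands. This is a minimal illustration, not the official FermiGrid recipe: the VO name, the proxy path, and the exact myproxy-init options are assumptions, and the production setup may require additional options (for example, authorized retrievers); the FermiGrid User Guide remains the authoritative reference.

#!/bin/sh
# Hypothetical user-side credential setup for the FermiGrid site gateway.
# The VO name ("fermilab") and the default proxy location are assumptions
# made for this example.

# Step 1: obtain a VOMS-qualified proxy in the normal fashion.
voms-proxy-init -voms fermilab

# Step 2: point X509_USER_CERT/KEY at the proxy instead of the
# usercert.pem and userkey.pem files.
export X509_USER_CERT=/tmp/x509up_u$(id -u)
export X509_USER_KEY=/tmp/x509up_u$(id -u)

# Step 3: store the credentials on the Fermilab MyProxy server so that
# jobmanager-condorg can later retrieve them with myproxy-get-delegation.
myproxy-init -s myproxy.fnal.gov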

4/25/2006 Condor Week 11: Site Wide Gateway Animation
[Diagram: the member clusters (CMS WC1, CMS WC2, CDF CAF, CDF, D0 CAB, D0, SDSS TAM, GP Farm, LQCD, desktops) behind the Site Wide Gateway and the MyProxy, VOMS, SAZ, and GUMS servers; the clusters send ClassAds via condor_advertise to the site-wide gateway.]
Step 1 – user issues voms-proxy-init and receives VOMS-signed credentials.
Step 2 – user stores their VOMS-signed credentials on the MyProxy server.
Step 3 – user submits their grid job via globus-job-run, globus-job-submit, or Condor-G (see the submission sketch after this slide).
Step 4 – Gateway retrieves the previously stored proxy.
Step 5 – Gateway requests the GUMS mapping based on VO & role.
Step 6 – Gateway checks against the Site AuthoriZation Service.
Step 7 – Grid job is forwarded to the target cluster.
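As an illustration of Step 3, here is a hedged sketch of submitting through the gateway. The job manager name (jobmanager-condorg) and the gateway host (fermigrid1) come from this talk; the fully qualified hostname, the gt2 grid type, and the file names are assumptions made for the example.

#!/bin/sh
# (a) Quick test job through the site gateway with pre-WS GRAM.
globus-job-run fermigrid1.fnal.gov/jobmanager-condorg /bin/hostname

# (b) Equivalent Condor-G submission of a real job.
cat > gateway_job.sub <<'EOF'
universe      = grid
grid_resource = gt2 fermigrid1.fnal.gov/jobmanager-condorg
executable    = myjob.sh
output        = myjob.out
error         = myjob.err
log           = myjob.log
queue
EOF
condor_submit gateway_job.sub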

4/25/2006 Condor Week 12: Guest vs. Owner VO Access
[Diagram: access matrix for OSG "guest" VO users, Fermilab "guest" VO users, and "owner" VO users. Use of the FermiGrid Gateway & Central Services is marked Required for OSG "guest" VO users and Allowed for the others; direct submission to the resource head nodes is marked Allowed or Not Allowed depending on the user class.]

4/25/2006 Condor Week 13: OSG Interfaces for Fermilab
Four Fermilab clusters are directly accessible to OSG right now:
  General Purpose Grid Cluster (FNAL_GPFARM)
  US CMS Tier 1 Cluster (USCMS_FNAL_WC1_CE)
  LQCD cluster (FNAL_LQCD)
  SDSS cluster (SDSS_TAM)
Two more clusters (CDF) are accessible only through the FermiGrid site gateway; future Fermilab clusters will also be accessible only through the site gateway.
A shell script is used to make a Condor ClassAd for each cluster and send it with condor_advertise (a sketch follows this slide). The match is done based on the number of free CPUs and the number of jobs waiting.
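The talk does not show the advertisement script itself, so the following is only a minimal sketch of what such a per-cluster cron script might look like. The attribute names (FreeCpus, WaitingJobs), the way free CPUs and waiting jobs are counted, and the gateway hostname are assumptions.

#!/bin/sh
# Hypothetical per-cluster advertisement script, run from cron every
# five minutes on each member cluster's head node.

GATEWAY=fermigrid1.fnal.gov        # assumed FQDN of the site gateway collector
CLUSTER=FNAL_GPFARM                # this cluster's name

# Count open slots and idle jobs in the local Condor pool.
FREE_CPUS=$(condor_status -constraint 'State == "Unclaimed"' -format "%s\n" Name | wc -l)
WAITING=$(condor_q -constraint 'JobStatus == 1' -format "%d\n" ClusterId | wc -l)

# Build a minimal ClassAd describing this cluster ...
AD=/tmp/cluster_ad.$$
cat > $AD <<EOF
MyType = "Machine"
TargetType = "Job"
Name = "$CLUSTER"
FreeCpus = $FREE_CPUS
WaitingJobs = $WAITING
EOF

# ... and send it to the collector on the site gateway.
condor_advertise -pool $GATEWAY UPDATE_STARTD_AD $AD
rm -f $AD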

4/25/2006 Condor Week 14: OSG Requirements
OSG job flow (sketched after this slide):
  The user pre-stages applications and data via GridFTP/srmcp to shared areas on the cluster (which can be NFS or an SRM-based storage element).
  The user submits a set of jobs to the cluster.
  Jobs take their applications and data from the cluster-wide shared directories.
  Results are written to local storage on the cluster, then transferred across the WAN.
Most OSG jobs expect common shared disk areas for applications, data, and user home directories; our clusters are currently not shared.
Most OSG jobs don't use MyProxy in the submission sequence.
OSG makes use of monitoring to detect free resources; ours are not currently reported correctly.
We need to make the gateway transparent to the OSG so that it looks like any other OSG resource; right now it only reports 4 CPUs.
We want to add the possibility of VO affinity to the ClassAd advertising of the gateway.
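A hedged sketch of that job flow from the user's point of view. The CE hostname (borrowed from the CEMon slide below), the jobmanager name, the shared-area path, and the file names are all assumptions for illustration; real OSG sites publish their shared application and data areas to users.

#!/bin/sh
# Illustrative OSG job flow: pre-stage, submit, retrieve results.

CE=fngp-osg.fnal.gov               # assumed CE hostname for the example
REMOTE_DATA=/grid/data/myvo/run42  # assumed cluster-wide shared data area

# 1. Pre-stage the application and input data with GridFTP
#    (srmcp would be used instead for an SRM-based storage element).
globus-url-copy file://$PWD/myapp.tar.gz   gsiftp://$CE$REMOTE_DATA/myapp.tar.gz
globus-url-copy file://$PWD/input.dat      gsiftp://$CE$REMOTE_DATA/input.dat
globus-url-copy file://$PWD/run_myapp.sh   gsiftp://$CE$REMOTE_DATA/run_myapp.sh

# 2. Submit the job; it reads the application and data from the shared
#    area and writes its results to local storage on the cluster.
globus-job-submit $CE/jobmanager-condor /bin/sh $REMOTE_DATA/run_myapp.sh

# 3. After the job finishes, pull the results back across the WAN.
globus-url-copy gsiftp://$CE$REMOTE_DATA/results.tar.gz file://$PWD/results.tar.gz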

4/25/2006 Condor Week 15: CEMon and matchmaking
FermiGrid will be the first large-scale deployment of the OSG Resource Selection Service (ReSS). We use CEMon (a gLite package) to send ClassAds to a central information gatherer.
[Diagram: CEMon and the GIP run on each gatekeeper (fngp-osg, cmsosgce, fcdfosg1) and feed the Info Gatherer and the Matchmaker (collector/negotiator); an interactive Condor client and the fermigrid1 jobmanager-condorg use the resulting matches.]
See P. Mhashilkar's talk later in this conference.
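How a client might look at what CEMon has published is sketched below; the collector hostname is an assumption (the talk does not name the info gatherer host), and the real attribute set is described in the ReSS talk referenced above.

#!/bin/sh
# Hedged sketch: inspect the ClassAds published to the central info gatherer.
INFO_GATHERER=fermigrid1.fnal.gov   # placeholder for the info gatherer's collector
condor_status -pool $INFO_GATHERER -any -long | less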

4/25/2006 Condor Week 16: Shared data areas and storage elements
At the moment OSG requires shared Application and Data areas. Also needed is a shared home directory for all users (FermiGrid has 226).
It is planned to use a BlueArc NAS appliance to serve these to all the member clusters of FermiGrid; 24 TB of disk is in the process of being ordered, and the NAS head is already in hand.
Also being commissioned is a shared volatile Storage Element for FermiGrid, which supports SRM/dCache access for all grid users.

4/25/2006 Condor Week 17: Getting rid of MyProxy
Configure each individual cluster gatekeeper to accept a restricted Globus proxy from just one host, the site gateway.
On the CDF clusters, for example, the gatekeeper is already restricted via TCP wrappers so that it does not take any connections from off-site; it could be restricted further to take connections only from the GlideCAF head node and fermigrid1 (a sketch follows this slide).
Then change the gatekeeper configuration to call it with the "accept_limited" option. We would then be able to forward jobs without MyProxy, and could call this jobmanager-condor rather than jobmanager-condorg.
This has been tested on our test cluster and will move to production soon.
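The TCP-wrappers part of this change might look like the sketch below (the "accept_limited" gatekeeper option is a separate configuration change and is not shown). The daemon name and the hostnames are assumptions; they depend on how the gatekeeper is started (for example, its xinetd service name) and on the actual node names.

#!/bin/sh
# Hypothetical TCP-wrappers restriction: allow gatekeeper connections only
# from the site gateway and the GlideCAF head node, deny everything else.

cat >> /etc/hosts.allow <<'EOF'
globus-gatekeeper: fermigrid1.fnal.gov, glidecaf-head.fnal.gov
EOF

cat >> /etc/hosts.deny <<'EOF'
globus-gatekeeper: ALL
EOF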

4/25/2006 Condor Week 18: Reporting all resources
MonALISA: we just need a unified Ganglia view of all of FermiGrid and MonALISA will show the right number of CPUs, etc. We also need to make MonALISA query all of the Condor pools in FermiGrid.
GridCat/ACDC: we have to change the Condor subroutines in MIS-CI to get the right total number of CPUs from the cluster ClassAds. Fairly straightforward.
GIP: we need to change the lcg-info-dynamic-condor script to report the right number of job slots per VO. We already had to do this once; it is not difficult.

4/25/2006 Condor Week 19: Globus Gatekeeper Calls

4/25/2006 Condor Week 20: VOMS access

4/25/2006 Condor Week 21: GUMS user mappings