Farms Users Meeting, April 27, 2005 -- Steven Timm


Slide 2: Agenda
- Events on the farms in the past two weeks
- Scheduled downtimes
- New users:
  - M. Kostin, Accelerator Division
  - A. Lebedev, E907/MIPP
- Existing user reports
- Special presentation: upcoming transition of General Purpose Farms to Condor and Grid

Slide 3: Issues in the last two weeks
- Thermal problems in LCC over the weekend; no nodes went down.
- Down nodes on the CDF farm: 1 of 98 FBSNG nodes, 1 of 72 Condor/CAF nodes.
- Down nodes on the D0 farm: 12 of 444 nodes.
- Down nodes on the GP farm: 0 of 102 nodes.
- GP Farms networking was upgraded to gigabit on all nodes that are capable of it.

Slide 4: Downtimes
- GP Farms: none scheduled.
- D0 farms: moving 3 racks of worker nodes to GCC, to be scheduled.
- CDF Farms: upgrade of Condor/CAF nodes to SLF 3.0.4, in progress.


Slide 6: General Purpose Farms Allocations

  Queue        Process type   Share   QPrio   Time (GHz-hr) Quota (1 CPU = 100)
  Accel        Accel_Worker
  Auger        Auger_Worker
  Dark Energy  DES
  E898         E898_Worker
  E898 Short   E898_Short
  E907
  KTeV         Fast           (inf)   9000    n/a
  KTeVLong     KTeV_Long
  KTeV         KTeV_Medium
  Minos
  MinosShort   Minos_Short
  Run2MC
  SDSS         Image
  SDSS         Spectro
  Theory


Slide 10: GRID on General Purpose Farms, Executive Summary
- A 14-node test cluster is available now for testing Condor and grid jobs.
- We tentatively plan to add new nodes to the Condor/grid cluster this summer.
- We hope to complete the transition to the Condor batch system by the end of calendar year 2005.
- Both local and grid submissions will still be allowed on the General Purpose Farms.
- Existing GP Farms users will have the same priority whether submitting via the grid or locally.
- We will make sure appropriate training, documentation, and support are available to help users with the transition.
- Testing is currently ongoing with our first grid-enabled user, SDSS/DES.

Slide 11: Outline
- Why use the Grid?
- Why use Condor?
- Virtual Organizations
- The Open Science Grid
- GP Farms on the Open Science Grid
- Fermigrid
- Access to mass storage

Slide 12: Why the Grid?
- The General Purpose Farms have limited resources and a limited equipment budget.
- All Fermilab Computing Division resources have a mandate from the division to interoperate.
- Adding a grid interface to the farms lets us interoperate with the larger clusters at Fermilab (specifically CMS and CDF) and make use of their extra resources.
- Negotiation to use off-site Open Science Grid resources is in progress as well.

Slide 13: Why Condor?
- Free software (but you can buy support).
- Supported by a large team at the University of Wisconsin (not by Fermilab programmers).
- Widely deployed in multi-hundred-node clusters at Fermilab (CDF, CMS).
- New versions of Condor allow Kerberos 5 and X.509 authentication.
- Comes with Condor-G, which simplifies submission of grid jobs.
- Condor-C components allow independent Condor pools to interoperate.
- Some of our grid-enabled users already take advantage of the extended Condor features, so Condor is the fastest way to get our users on the grid (a basic submit file is sketched below).
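To give a flavor of what local Condor submission looks like, here is a minimal sketch in the classic submit-description format of that era; the script and file names are hypothetical, not anything named on these slides:

  # myjob.sub -- hypothetical Condor submit description file
  universe   = vanilla
  executable = myjob.sh              # your batch script (hypothetical name)
  output     = myjob.$(Cluster).out  # stdout, tagged with the cluster id
  error      = myjob.$(Cluster).err  # stderr
  log        = myjob.log             # Condor's job event log
  queue 1

  # Submit and monitor from the command line:
  #   condor_submit myjob.sub
  #   condor_q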

Slide 14: Virtual Organizations
- Each experiment is a Virtual Organization (VO).
- Membership is managed by the VOMS software (Virtual Organization Management Service) and the VOMRS software (Virtual Organization Management Registration Service); a proxy-creation example follows below.
- Virtual Organizations have already been created for all major user groups on the General Purpose Farms as part of the Fermigrid project.
- We need at least one responsible person from each user group using the farms to say who should be members of their virtual organization.
- Groups we have identified: sdss, ktev, miniboone, hypercp, minos, numi, accelerator, ppd_astro, ppd_theory, patriot (run2mc), auger.
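Once a user is registered in a VO, obtaining VOMS-extended grid credentials typically looks like the following sketch; the VO name (sdss) is taken from the list above, and the exact options depend on the local VOMS client configuration:

  # Create a VOMS proxy certificate carrying sdss VO attributes
  voms-proxy-init -voms sdss

  # Inspect the proxy and its VO attributes
  voms-proxy-info -all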

Slide 15: Open Science Grid
- A continuation of the efforts that were begun in Grid3.
- Integration testing has been ongoing since February; provisioning and deployment are occurring as we speak.
- The General Purpose Farms and CMS will both be Fermilab presences on the Open Science Grid.
- 10 Virtual Organizations so far, mostly US-based:
  - USATLAS
  - USCMS
  - SDSS
  - fMRI (functional Magnetic Resonance Imaging, based at Dartmouth)
  - GADU (applied genomics, based at Argonne)
  - GRASE (engineering applications, based at SUNY Buffalo)
  - LIGO
  - CDF
  - STAR
  - iVDGL

Slide 16: Current Fermi GP Farms OSG presence
- Node fngp-osg serves as gatekeeper and Condor master (a Dell dual Xeon, 3.6 GHz).
- Software comes from the Virtual Data Toolkit (VDT).
- 14 worker nodes in the fnpc series form the Condor pool.
- Can successfully run batch jobs submitted locally via Condor and across the grid via Condor-G (see the sketch below).
- Has passed all validation tests of the Open Science Grid.
- Uses the extended privilege authorization from the VO Privilege Project:
  - Each group can define different roles for its users.
  - We can map a whole group to one userid, several userids, or a pool of userids.
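For grid submission via Condor-G, a submit file of that era pointed at the site's GRAM gatekeeper. A minimal sketch follows; the fully qualified hostname fngp-osg.fnal.gov and the jobmanager-condor service name are plausible assumptions, not confirmed by these slides:

  # gridjob.sub -- hypothetical Condor-G submit file (classic "globus" universe)
  universe        = globus
  # Assumed gatekeeper contact string; the slides name only the node fngp-osg
  globusscheduler = fngp-osg.fnal.gov/jobmanager-condor
  executable      = myjob.sh
  output          = gridjob.out
  error           = gridjob.err
  log             = gridjob.log
  queue

  # Requires a valid grid proxy first (see the VOMS example above), then:
  #   condor_submit gridjob.sub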

Slide 17: Current Architecture
- All home directories and staging areas are served off of FNSFO and will be accessible as before.
- All OSG sites have $app and $data directories for applications and data transfer; ours are served off of fngp-osg by NFS (a job-script sketch follows below).
- All VDT-related software (Globus, Condor, etc.) is served off of fngp-osg.
- Grid jobs come in directly to fngp-osg and are farmed out to the 14 Condor nodes.
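In a job script, the $app and $data areas are typically reached through environment variables advertised by the site. The variable names below ($OSG_APP, $OSG_DATA) follow the later OSG convention and are an assumption here, as are all paths:

  #!/bin/sh
  # Hypothetical worker-node job script using the shared areas above.
  # OSG_APP / OSG_DATA are assumed names for the $app / $data directories;
  # the fallback paths and experiment name are illustrative only.
  APPDIR=${OSG_APP:-/grid/app}      # pre-installed experiment software
  DATADIR=${OSG_DATA:-/grid/data}   # staging area for input/output

  # Run a pre-installed application against staged input, leaving the
  # output in the shared data area for later transfer off the site.
  "$APPDIR/myexpt/bin/analyze" "$DATADIR/myexpt/input.dat" \
      > "$DATADIR/myexpt/output.dat"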

Slide 18: Goals for GP Farms Grid Deployment
- The GP Farms are very busy (> 90% utilized), and two big productions are about to start.
- We need to preserve the lion's share of CPU cycles for existing users.
- Jobs from groups that are not GP Farms users will have only opportunistic use of the farms:
  - They run at the lowest priority (10^-6 of regular priority; see the sketch below).
  - They are limited in how many jobs they can start at once.
- At the moment, OSG jobs are confined to the Condor pool of 14 slow nodes that weren't otherwise getting used at all.
- GP Farms users will be able to access their allocated share of resources whether they come in via the grid or not.
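One way to achieve the 10^-6 opportunistic priority in Condor is through user priority factors, where a larger factor means a worse effective priority. This is a sketch of the mechanism, not necessarily the exact configuration on the farms, and the mapped username is hypothetical:

  # Give the userid that opportunistic OSG jobs map to a priority factor
  # of 10^6, so its effective priority is 10^-6 of a regular user's.
  # (Username is hypothetical; in Condor, bigger numbers mean lower priority.)
  condor_userprio -setfactor osgopp@fnal.gov 1000000

  # Verify the current priorities and factors for all users
  condor_userprio -allusers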

Slide 19: Current Farms Configuration (diagram)
- FBS jobs are submitted to the FBSNG head node, which dispatches work to the FBSNG worker nodes (102 currently).
- Home and staging areas are served from the FNSFO NFS RAID server.
- Mass storage access goes via ENCP to ENSTORE.

Slide 20: Configuration with Grid (diagram)
- Jobs from Fermilab are still submitted via FBS to the FBSNG head node (FNPCSRV1), which dispatches to the FBSNG worker nodes (102 currently).
- Jobs from the OSG enter through the Fermigrid1 site gatekeeper and the FNGP-OSG gatekeeper, and are submitted via Condor to the Condor worker nodes (14 currently, with 40 new nodes coming this summer).
- Both sides share the NFS RAID and ENSTORE mass storage.

Slide 21: Fermigrid Interface
- Fermigrid is providing common site services for virtual organization management (VOMS) and user mapping (GUMS; an illustration of the mapping follows below).
- These services are expected to be online in the next month or two.
- All non-Fermi jobs will eventually go through the site Fermigrid gatekeeper and be farmed out to the other clusters.
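The user mapping that GUMS automates is the same kind expressed by a classic Globus grid-mapfile: a certificate subject (DN) mapped to a local userid. A hedged illustration, with a made-up DN and userid:

  # grid-mapfile style entry (DN and userid are invented for illustration):
  "/DC=org/DC=doegrids/OU=People/CN=Jane Doe 123456" sdssgrid

  # With GUMS, mappings like this are generated dynamically per VO and role
  # instead of being maintained by hand on each gatekeeper.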

Slide 22: Access to mass storage
- A study is currently under way.
- Encp access to Enstore will remain available from the head node.
- We want to open the dccp, gridftp, and srmcp interfaces to dCache (example commands are sketched below).
- Before this is done, more study is needed on:
  - Authentication mechanisms: can we access mass storage from the worker nodes?
  - Resource load: public dCache would need to expand its disk pool if demand increases significantly.
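For reference, the three dCache interfaces named above are typically driven as follows; all hostnames, ports, and paths here are placeholders, not the actual Fermilab endpoints:

  # dccp: direct dCache copy over the dcap protocol
  dccp dcap://dcache-door.example.gov:22125/pnfs/example.gov/usr/myexpt/file.dat file.dat

  # gridftp: GSI-authenticated transfer with globus-url-copy
  globus-url-copy gsiftp://dcache-door.example.gov/pnfs/example.gov/usr/myexpt/file.dat \
      file:///local/scratch/file.dat

  # srmcp: transfer brokered through the Storage Resource Manager
  srmcp srm://dcache-srm.example.gov:8443/pnfs/example.gov/usr/myexpt/file.dat \
      file:///local/scratch/file.dat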

Slide 23: Support and Documentation
- Web page: oss.fnal.gov/scs/public/farms/grid/

Slide 24: Things to watch and try
- The web page above is being continuously updated as we learn more about what works; we hope to add sample Condor jobs shortly.
- Those familiar with Condor can log into fngp-osg and try to submit local test jobs now:
  - Source /export/osg/grid/setup.csh to get all the software set up (see the walkthrough below).
- Grid job submission won't work until we get the virtual organizations populated (except for SDSS).
- More presentations are coming at these meetings in the weeks to come.
- We hope to organize a workshop this summer.
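Putting the pieces together, a first local test session on fngp-osg might look like this; the setup script path comes from the slide above, while the submit file is the hypothetical sketch from the "Why Condor?" example:

  # On fngp-osg, set up the Condor/VDT environment (csh, per the slide above)
  source /export/osg/grid/setup.csh

  # Submit the sample job and watch it run
  condor_submit myjob.sub     # submit file sketched in the earlier example
  condor_q                    # list queued and running jobs
  condor_status               # show the state of the 14-node pool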