1 Open Science Grid Update for DOSAR Keith Chadwick, Fermilab

2 Outline
- Cover status/happenings on OSG since the last talk.
- A representative Campus Grid: FermiGrid.
- DOSAR collaboration with OSG?

3 Current OSG deployment
- 96 resources across the production & integration infrastructures - an increase of about 10 in the last 6 months. Note the first UK site: STAR in Birmingham.
- 27 Virtual Organizations, including operations and monitoring groups - an increase of about 5 in the last 6 months.
- >15,000 CPUs, ~6 PB MSS, ~4 PB disk.

4 OSG Funding
- OSG will be co-funded by DOE and NSF for 5 years at $6M/year, starting in Sept '06.
- Includes major deliverables for US LHC, LIGO and STAR; well used by CDF and D0; and potentially other experiments' major deliverables in the future.
- Commitment to engage communities beyond physics.
- Commitment to collaborate with EGEE and TeraGrid.
- Reminder:
  - Consortium policies are to be open to participation by all researchers.
  - Project responsibilities are to operate, protect, extend and support the Distributed Facility for the Consortium.

5 Project Management
- CS/technology providers are part of the management teams.
- The LHC is well represented in the management structure.
- External project technical leads (Globus, CDIGS, SOS, LIGO-PIF) are part of the Executive Board.
- Partners (WLCG, TeraGrid, EGEE, etc.) are represented through liaisons on the Council.

6 Effort Distribution in Year 1 (FTEs)
  Facility operations                   5.8
  Security and troubleshooting          4.5
  Software release and support          6.5
  Engagement (Year 1 only)              2.5
  Education, outreach & training        2.5
  Extensions in capability and scale    8.5
  Management & Staff                    4.0
  Total FTEs                           34.3
Will review the distribution each year.

7 OSG support for non-physics communities.. update
- Seeing DOSAR VO jobs at OU submitted through OSG services and accounted on MonALISA.
- Alan Blatecky's group at RENCI is porting the "award winning Bioportal" to OSG.
- >100 nanotechnology jobs -- which run for days -- are being executed on LIGO, ATLAS and CMS sites.
- OSG is discussing partnership with the Northwest Indiana Computing Grid (NWICG) -- which raises the issue of Gaussian. When we start talking to computational chemistry we quickly run into licensing issues. Yes, we also say it is the responsibility of the project/VO.. but there are 50 sites on OSG.
- The P-GRADE portal has been interfaced to a version of the CHARMM molecular dynamics simulation package. Some versions of this also have licensing issues.
- Work on Campus Grids enabling Crimson Grid, NWICG, New York State Grid (NYSG) and GPN (Nebraska Education/Training grid) partnerships. (Note: partners do not have to contribute resources; collaboration can equally be in software, procedures, education, training, security, etc.)

8 OSG Middleware
Layers (bottom to top): infrastructure, VO middleware, applications.
- Core grid technology distributions: Condor, Globus, MyProxy.
- Virtual Data Toolkit (VDT): the core technologies + software needed by stakeholders, e.g. VOMS, CEMon, VDS, MonALISA, VO-Privilege.
- OSG Release Cache: VDT + OSG-specific configuration + utilities.
- VO middleware: ATLAS Panda, DQ, etc.; CMS cmssw, LFC, etc.; Bio blast, charmm, etc.; LIGO LDR, catalogs, etc.
- Applications: user science codes and interfaces.
OSG Middleware is deployed on existing farms and storage systems, and interfaces to the existing installations of OS, utilities and batch systems. VOs have VO-scoped environments in which they deploy applications (and other files), execute code and store data. VOs are responsible for and have control over their end-to-end distributed system using the OSG infrastructure.

9 OSG Software Releases
- OSG released in May. Security patch for Globus released in September. Still supporting some sites at older releases.
- Future releases:
  - OSG release schedule and function priorities are driven by the needs of US LHC,
  - balanced by the need to prepare for medium/longer-term sustainability and evolution.
  - Goal of 2 major releases a year, with easier minor updates in between.

10 Upcoming VDT Work
- Provide smooth updates to a running production service:
  - In-place vs. on-the-side.
  - Preserve the old configuration while making big changes.
  - Make updates an hour's work, not a day's.
- Support for more platforms?
  - More Linux variants and the needs of EGEE.
  - AIX, needed by NERSC.
  - Mac OS X, being tested by STAR, TIGRE, …
  - Solaris?
- Add dCache/SRM packaged in RPMs and tested on the existing storage/data management testbed.

11 OSG Task Priorities
- Priority 1: Gratia; improve ease of upgrade of one VDT version to another.
- Priority 2: BDII; about 50 small bug fixes and enhancements; new GIPs; gLite client tools for interoperability purposes; new VDS.
- Priority 3: dCache/SRM as "RPM"; tclGlobus; update platforms we build on; new Globus - WS-GRAM and GT4.
- Priority 4: CEMon; SQUID.
- Priority 5: Globus Virtual Workspaces, for the Edge Services Framework; job managers/GIPs for Sun Grid Engine; switch from MySQL to MySQL-Max; upgrade MonALISA to the latest version; MDS 4; VOMRS.
- Priority 6: glexec; Grid Exerciser; production release of the VDT (1.4.0).
Expect to rely on VDT or Release early. Some components are being tested for earlier installation:
- Gratia accounting
- CEMon for the Resource Selection Service.
OU is an early tester of Gratia -- we appreciate this a lot.

12 Risk-based view of the world
- Organizations implement controls over their activities so as to obtain acceptable residual risk.
- Organizations: Sites, VOs and Grids.
  - Each has a security process lifecycle.
  - Satisfaction jointly and severally.
- Each organization is captain of its own ship,
  - however, constrained to interoperate.
- Standards (e.g. OSG User AUP, Service AUP, VO AUP) aid interoperation.

13 Site-VO Interoperation
[Diagram: the Site and the VO each run the same control loop -- implement controls, operate, then compare perceived risk against actual risk; if they do not match (n), implement further controls; if they do (y), keep operating.]

14 Scaling :-(
- Every organization has to reduce its risk to acceptable residual risk, and the security has to interoperate.
  - Identified trust: something specific is relied on and is not checked each time.
  - Apropos Managerial, Operational and Technical Controls, for trust items.
- Scaling is a problem; work needs to be done:
  - Common standards -- i.e. the OSG AUPs.
  - Fewer entities:
    - Aggregate small organizations into larger organizations.
    - Larger organizations have greater resources.

15 Liaison work
- TAGPMA, DOEGridPMA, EUGridPMA (a PMA is a Policy Management Authority).
- MWSWG -- meeting at SLAC with emphasis on interoperability.
- GGF -- Global Grid Forum.
- JSPG -- Joint Security Policy Management Group.
- DOE Security CET.
- DOE Grid Briefing.
- User AUP, Service AUP, VO AUP.
- Grid Joint Security Policy working group.

16 Science Grid Communications
Broad set of activities (Katie Yurkewicz):
- News releases, PR, etc.
- Science Grid This Week, running for 1.5 years (>1000 subscribers).

17 OSG Newsletter
Monthly newsletter (Katie Yurkewicz) -- 7 issues so far: osgnews.

18 OSG Summary The Open Science Grid infrastructure is a part of the WLCG Collaboration. OSG supports and evolves the VDT core middleware in support of the EGEE and the WLCG as well as other users. The OSG software stack is deployed to interoperate with the EGEE and provide effective systems for the US LHC experiments among others.

19 What is FermiGrid? FermiGrid is:
- An example of a campus Grid.
- A set of common services, including: the Fermilab site globus gateway, VOMS, VOMRS, GUMS, SAZ, MyProxy, Gratia Accounting, etc.
- A forum for promoting stakeholder interoperability and resource sharing within Fermilab.
- The portal from the Open Science Grid to Fermilab Compute and Storage Services:
  - Production: fermigrid1, fngp-osg, fcdfosg1, fcdfosg2, docabosg2, sdss-tam, FNAL_FERMIGRID_SE (public dCache), stken, etc…
  - Integration: fgtest1, fnpcg, etc…

20 People
FermiGrid Operations Team:
- Keith Chadwick (CD/CCF/FTP) – Project Leader, Monitoring and Metrics
- Steve Timm (CD/CSS/FCS) – Linux OS Support, Condor, Globus Gatekeeper
- Dan Yocum (CD/CCF/FTP) – Middleware Support: VOMS, VOMRS, GUMS
- Neha Sharma (CD/CCF/FTP) – Middleware Support: Storage, Accounting
FermiGrid Stakeholder Representatives:
- Keith Chadwick – FermiGrid Common Services Representative
- Doug Benjamin – CDF Stakeholder Representative
- Ian Fisk – U.S. CMS Representative
- Amber Boehnlein & Alan Jonckheere – D0 Stakeholder Representatives
- Nickolai Kouropatkine – DES & SDSS Stakeholder Representative
- Steve Timm – Fermilab General Purpose Farms Representative
- Ruth Pordes – OSG Stakeholder Representative
- Eileen Berman & Rob Kennedy – Storage (enstore & dCache) Representatives
FermiGrid Web Site & Additional Documentation:

21 Before FermiGrid
[Diagram: separate CDF, D0, CMS and common resources, each with its own head node and workers, accessed directly by existing users; the new common gateway & central services, guest users, and new resources are to be added to the "Common" FermiGrid resources.]

22 Virtual Organizations
FermiGrid currently hosts the following Virtual Organizations:
- auger
- des (http://decam.fnal.gov/)
- dzero (http://www-d0.fnal.gov/)
- fermilab
- gadu (http://www-wit.mcs.anl.gov/Alex/GADU/Index.cgi)
- nanohub
- sdss
- ilc (http://ilc.fnal.gov/)
- lqcd (http://lqcd.fnal.gov/)
- i2u2 (http://www-ed.fnal.gov/uueo/i2u2.html)

23 SAZ - Site AuthoriZation Module
We are in the process of implementing our SAZ (Site AuthoriZation) module for the Fermilab "Open Science Enclave". SAZ will enable Fermilab to make Grid job authorization decisions for the Fermilab site based on the user's DN, VO, Role and CA. Our current plan is to operate in a default-accept mode for user credentials that are associated with known and "trusted" VOs and CAs.
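To make the default-accept policy concrete, here is a minimal, hypothetical sketch of such a decision function. It is not the actual SAZ implementation; the trusted-VO/CA lists, the ban list and the attribute values are illustrative assumptions.

```python
# Hypothetical sketch of a SAZ-style authorization decision (not the real SAZ code).
# Default-accept: allow credentials whose VO and CA are on the "trusted" lists,
# unless the DN has been explicitly banned by the site.

TRUSTED_VOS = {"fermilab", "dzero", "cdf", "des", "sdss"}   # illustrative list
TRUSTED_CAS = {"/DC=org/DC=DOEGrids/OU=Certificate Authorities/CN=DOEGrids CA 1"}  # illustrative
BANNED_DNS = set()                                          # site-specific ban list

def saz_authorize(dn: str, vo: str, role: str, ca: str) -> bool:
    """Return True if the site accepts this credential, False otherwise."""
    if dn in BANNED_DNS:
        return False          # an explicit site-level ban always wins
    if ca not in TRUSTED_CAS:
        return False          # unknown CA: reject
    if vo not in TRUSTED_VOS:
        return False          # unknown VO: reject
    # Default-accept for known, trusted VOs and CAs; the role could be used
    # later for finer-grained decisions (e.g. restricting production roles).
    return True

if __name__ == "__main__":
    ok = saz_authorize(
        dn="/DC=org/DC=doegrids/OU=People/CN=Example User 123456",
        vo="fermilab",
        role="Analysis",
        ca="/DC=org/DC=DOEGrids/OU=Certificate Authorities/CN=DOEGrids CA 1",
    )
    print("accept" if ok else "reject")
```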

24 Moving to …
[Diagram: FNAL users and resources -- user access as of 9/06.]

25 OSG Interfaces for Fermilab
FermiGrid offers opportunistic use of FNAL computing resources through Open Science Grid (OSG) interfaces. Most of the work to accomplish this happened in the context of the installation and configuration of the Fermilab Common Grid Services and the deployment and integration on the GP Farm, CDF OSG1&2, and D0 OSG CAB2 clusters.
The Fermilab "job-forwarding" gateway has caused some problems for users who assume that a job submitted to the fork jobmanager will have access to the same filesystem as a job submitted to the condor-g jobmanager.
- We are in the process of deploying a multi-cluster filesystem based on the BlueArc NFS server appliance to provide a unified filesystem view from all clusters participating in FermiGrid job forwarding.

26 Metrics and Service Monitors
In addition to the normal operational effort of running the various FermiGrid services, significant effort has been spent over the last year to collect and publish operational metrics and to instrument service monitors:
- Globus gatekeeper calls by jobmanager per day
- Globus gatekeeper IP connections per day
- VOMS calls per day
- VOMS IP connections per day
- VOMS service monitor
- GUMS calls per day
- GUMS service monitor
- Resource selection and acceptance of the fermilab VO at OSG sites
Metrics typically run once a day. Service monitors typically run multiple times per day and are equipped to detect problems with the service that they are monitoring, notify administrators, and automatically restart services as necessary to ensure continuous operations.
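As an illustration of the check / notify / restart pattern described above, here is a minimal, hypothetical monitor sketch. The host, port, e-mail address and restart command are placeholder assumptions, not the actual FermiGrid monitor configuration.

```python
# Hypothetical service-monitor sketch (not the actual FermiGrid monitors).
# Pattern: probe the service, notify the administrators on failure,
# and attempt an automatic restart to keep the service running.
import socket
import smtplib
import subprocess
from email.message import EmailMessage

SERVICE_HOST = "voms.example.fnal.gov"               # placeholder hostname
SERVICE_PORT = 15001                                 # placeholder port
ADMIN_EMAIL = "grid-admins@example.gov"              # placeholder notification address
RESTART_CMD = ["/sbin/service", "voms", "restart"]   # placeholder restart command

def service_is_up(host: str, port: int, timeout: float = 10.0) -> bool:
    """Consider the service healthy if its TCP port accepts a connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def notify(subject: str, body: str) -> None:
    """Send a simple e-mail notification via the local mail relay."""
    msg = EmailMessage()
    msg["From"] = ADMIN_EMAIL
    msg["To"] = ADMIN_EMAIL
    msg["Subject"] = subject
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    # Intended to be run several times a day from cron.
    if not service_is_up(SERVICE_HOST, SERVICE_PORT):
        notify("VOMS monitor: service down",
               f"{SERVICE_HOST}:{SERVICE_PORT} failed its probe; attempting restart.")
        subprocess.run(RESTART_CMD, check=False)     # best-effort automatic restart
```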

27 Metrics - Globus Gatekeeper - fermigrid1

28 Metrics - Gatekeeper IP Connections

29 Metrics – VOMS - fermigrid2

30 Metrics - VOMS - fermilab VO groups

31 Metrics - VOMS IP Connections

32 Monitoring - VOMS - fermigrid2

33 Metrics – GUMS - fermigrid3

34 Monitoring - GUMS - fermigrid3

35 fermilab VO acceptance within the OSG
We have recently started development of a fermilab VO acceptance probe. Goals:
- See where members of the fermilab VO can execute grid jobs in the OSG.
- Cross-check the VO Resource Selector report.
- Record detailed results on success / failure.
- Provide a historical record.
- Serve as a tool for Fermilab management.

36 Actions performed by the fermilab VO probe
The probe fetches the current Gridcat list of sites (production or integration). For each Compute Service entry in the Gridcat list, it fetches the Gridcat information corresponding to the "Site Name" and parses out:
- the Site Name,
- the Gatekeeper Host Name,
- the Site Application Directory ($APP),
- the Site Data Directory ($DATA),
- the Site Temporary Directory ($TMP),
- the Site Worker Node Temporary Directory ($WNTMP).
The VO probe then runs the following series of tests against the site:
- ping -c 1 gatekeeper_host_name, or globusrun -a -r (if the ping fails)
- globus-job-run gatekeeper_host_name /usr/bin/printenv
- globus-url-copy -v -cd local_file gsiftp://gatekeeper_host_name:2811/temporary_file_name
- globus-url-copy -v -cd gsiftp://gatekeeper_host_name:2811/temporary_file_name /dev/null
- globus-job-run gatekeeper_host_name /bin/rm temporary_file_name
The start and end times of the tests are recorded, as are the results of the individual probes. If an individual test in the sequence fails, then all remaining tests are skipped. Finally, a detail report, a summary report and trend plots are generated from the probe results.
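A minimal sketch of that test sequence follows, assuming the same command-line tools are on the PATH and that a valid grid proxy already exists. The temporary-file path, the simplified ping step (without the globusrun fallback) and the reporting are illustrative assumptions, not the production probe.

```python
# Hypothetical sketch of the fermilab VO probe test sequence (not the production probe).
# Runs each test in order against one gatekeeper and skips the rest after the first failure.
import subprocess
import time

def run(cmd: list[str]) -> bool:
    """Run one probe command; success means exit status 0."""
    return subprocess.run(cmd, capture_output=True).returncode == 0

def probe_site(gatekeeper: str, local_file: str = "/etc/hosts") -> list[tuple[str, bool]]:
    tmp = "fermilab_vo_probe.tmp"            # illustrative temporary file name
    tests = [
        ("ping",            ["ping", "-c", "1", gatekeeper]),
        ("printenv",        ["globus-job-run", gatekeeper, "/usr/bin/printenv"]),
        ("gsiftp upload",   ["globus-url-copy", "-v", "-cd", f"file://{local_file}",
                             f"gsiftp://{gatekeeper}:2811/tmp/{tmp}"]),
        ("gsiftp download", ["globus-url-copy", "-v", "-cd",
                             f"gsiftp://{gatekeeper}:2811/tmp/{tmp}", "file:///dev/null"]),
        ("cleanup",         ["globus-job-run", gatekeeper, "/bin/rm", f"/tmp/{tmp}"]),
    ]
    started = time.time()
    results = []
    for name, cmd in tests:
        ok = run(cmd)
        results.append((name, ok))
        if not ok:
            break                             # a failure skips all remaining tests
    print(f"{gatekeeper}: {time.time() - started:.1f}s, results={results}")
    return results
```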

37 fermilab VO probe - Production

38 fermilab VO probe - ITB

39 FermiGrid - Current Work
Policy, Security and Authorization Infrastructure:
- Complete development and deployment of the Site AuthoriZation (SAZ) Service (target is Oct 1, 2006).
- gLexec development and deployment (target is Oct 1, 2006).
- Fermilab Open Science Enclave enumeration, trust relationships and policy.
Public storage and storage element:
- FNAL_FERMIGRID_SE - public dCache storage element - 7 TBytes.
- Migration to BlueArc NFS appliance storage.
Further Metrics and Service Monitor development:
- Generate appropriate tickets to the OSG GOC for sites which are listed in VORS but fail the fermilab VO probe.
- Enhance and extend the Metrics and Service Monitors.
Service failover:
- At the present time the services are running in non-redundant mode.
- We are thinking about the best ways to implement service failover.
- We have a test fault-tolerant GUMS server in operation.
- Linux-HA clustering, BlueArc, Xen and other technologies are being looked at.
Research & Development & Deployment of future ITBs and OSG releases:
- Ongoing work…
- Testbed systems are essential so that we can perform the research, development & integration without impacting our production services.
- Looking at Xen to facilitate this work also.