State of the OSG Software Stack Alain Roy OSG Software Coordinator.

Presentation transcript:

State of the OSG Software Stack Alain Roy OSG Software Coordinator

March , OSG All-Hands
The Current State
- OSG is the latest release
  - Being released today
  - Security update
- OSG 1.2.x will be the current stable release for the foreseeable future
  - No current plans for OSG 1.4
  - Incremental updates coming in OSG 1.2
  - Take home reading: jorMinorUpdates

Coming soon in OSG 1.2
- New software additions planned soon-ish
  - FTS client tools
  - edg-gridftp-client
- Software updates (minor upgrades)
  - Bestman/Xrootd
  - CEMon
  - Glexec/PRIMA
  - Gratia probes

Beginning to think about
- Globus/GRAM 5
  - Much better GRAM scalability
  - A few issues blocking deployment but not testing
  - Can be installed alongside GRAM 2
- CREAM
  - Much better scalability than GRAM 2
  - Much more complicated to deploy than GRAM 5, but may be the better long-term option
- With Igor Sfiligoi, currently investigating and understanding effort and obstacles

Improved communication
- State of the world:
  - OSG 1.2.x will be updated indefinitely
  - Running in production means we need to be cautious about software updates
- Problem:
  - We don't communicate software stack changes to you well enough: we need to clearly inform and listen
- Proposed Solution: Software Evolution Proposals (SEP)
  - Clearly define the set of upcoming changes
  - Process for moving from draft proposal to accepted set of changes
  - Somewhat formal, not too rigid

Three SEPs exist
- SEP 1: SEP Purpose and Guidelines
- SEP 2: How to Retire Old Platforms
- SEP 3: Retiring RHEL 3, Debian 4, and equivalents
Say, do you mind if we drop RHEL 3 and Debian 4 support?

Native Packaging: Why are we behind?
- By now, I hoped to have glexec & worker node RPMs available
- In the fall, we took a detour: LIGO's urgent need for native packages
- Took longer than we thought:
  - Both RPM & Debian: a lot to learn, and subtle differences between them
  - Different mindset than Pacman: took a while to adjust
  - We were encouraged to make excellent packages; they came out really well, and we learned a heck of a lot

Native packages: Where are we today?
- Separate LIGO packages
  - Globus subset + MyProxy + GSIOpenSSH + UberFTP
  - Source & binary packages
  - Debian & RPM
  - Really native (FHS-compliant, etc…)
- OSG/CMS Hadoop RPMs:
  - Donated by Michael Thomas, help from Abhishek Rana
  - Based on Pacman-version of VDT
- EGEE/WLCG RPMs
  - Binary only
  - Used in gLite
That's right: four separate native package distributions!

Native packages: What comes next?
- Glexec RPM: Real soon now (March)
  - Initial version will be simple
    - Install glexec
    - Touch up install a bit
    - Convert full install to a single RPM
  - Will gradually improve to be done well
- Worker node software
- Client software
- Services (GUMS, VOMS, CE)…
- Merge the multiple distributions
Note the lack of a precise timeline here…

There's a lot to do!
Path from basic glexec RPM to well-done glexec RPM:
1. Make binary RPMs for each software component
2. Cope properly with configuration
  - Evolution of configure-osg (or the like) and config.ini as a post-install step to help configure the installed software
3. Befriend the natives
  - FHS-compliance (or appearance of compliance)
  - Proper setup of services (fetch-crl…)
This may not look like much, but there is a lot to do here to make RPMs that people will like.
Some steps may be easier for glexec, but if we do them well they will aid us as we move to the other software.
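To make step 2 concrete, here is a minimal sketch of what a post-install configuration check could look like. It is not how configure-osg works: the section name `glexec` and the option names `enabled` and `user_mapping_service` are hypothetical, chosen only to illustrate the idea of validating a config.ini after the RPM has put files in place.

```python
import configparser

# Hypothetical required settings. These names are illustrative only and are
# NOT taken from configure-osg or from any real OSG config.ini.
REQUIRED = {"glexec": ["enabled", "user_mapping_service"]}


def missing_settings(ini_text, required=REQUIRED):
    """Return a sorted list of 'section.option' names absent from ini_text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    missing = []
    for section, options in required.items():
        for option in options:
            # has_option() is False when either the section or the option is absent.
            if not parser.has_option(section, option):
                missing.append(f"{section}.{option}")
    return sorted(missing)


if __name__ == "__main__":
    sample = "[glexec]\nenabled = true\n"
    for name in missing_settings(sample):
        print(f"config.ini is missing: {name}")
```

A package's post-install step could run a check of this kind and refuse to start services until the reported settings are filled in, which keeps the RPM itself simple while still catching incomplete configuration early.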

Questions?
None of our future plans are set in stone: this is a great time to give feedback.

State of Operations Rob Quick OSG Operations Coordinator

OSG Ops and the WLCG Start Up
- Support
  - Ticketing
  - WLCG Communications
- Infrastructure Service
  - BDII
  - WLCG Metrics
  - Distributed Services

Ticketing
- Adding Effort for WLCG Tickets
  - Earlier Hours in the US
  - Friday Meetings to Review WLCG Tickets
- Web Services Based Ticket Exchange
  - Removes Dependencies
  - Improved Alerts on Failure
  - This is in Place Between GGUS and OSG Footprints

WLCG Communications
- Daily Attendance at the WLCG Ops Meetings
- Discussion of WLCG Items at the OSG Operations and Production Meetings
- Heavy Interactions with EGI SAM and GGUS Groups

Infrastructure
The real story is not what we are doing, but what we are not doing.

Infrastructure Services
- BDII
  - SLA Adopted in August 2009
  - 99.86% Availability
  - 99.99% Reliability
  - DNS RR Working Extremely Well
  - Major Machine Room Move in October
  - Added Munin Monitoring with Alarms
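To give a feel for what these percentages mean, the sketch below converts an availability figure into implied downtime. The slide does not state the measurement period, so the 30-day window is an assumption. The two helper formulas follow the commonly used convention that availability counts all time while reliability excludes scheduled downtime; treat that as an assumption about how the slide's numbers were computed, not as a statement of the SLA.

```python
def allowed_downtime_minutes(pct, period_days):
    """Minutes of downtime implied by an availability percentage over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - pct / 100)


def availability_pct(up_minutes, total_minutes):
    """Share of all time the service was up (assumed definition)."""
    return 100 * up_minutes / total_minutes


def reliability_pct(up_minutes, total_minutes, scheduled_down_minutes):
    """Share of time the service was up, ignoring scheduled downtime (assumed definition)."""
    return 100 * up_minutes / (total_minutes - scheduled_down_minutes)


if __name__ == "__main__":
    # 30 days is an assumed window; the slide gives only the percentage.
    minutes = allowed_downtime_minutes(99.86, period_days=30)
    print(f"99.86% over 30 days allows about {minutes:.0f} minutes of downtime")
```

Under that assumed 30-day window, 99.86% availability corresponds to roughly one hour of downtime. Reliability comes out higher than availability whenever part of the downtime was scheduled.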

WLCG Metric Reporting
- RSV Collector (Beginning October 1)
  - 99.67% Availability
  - 99.79% Reliability
- Several Issues with the Messaging Service
  - Records Always Resent Successfully
  - Recalculations as Requested

Distributed Services
- Effort to Bring OSG Services Not Hosted by the GOC Into the Same Forums as the Indiana University Hosted Services
  - Gratia and ReSS
  - Operations and Production Meetings
  - GOC Notifications

Change Management
- Scheduled Release Periods
- Change Management Procedures
- Community Notification Revisited
- Determining What Needs to Be Done and What Needs to Not Be Done

Nature of Operations
- Over 8000 Resolved Tickets
  - Average Time to Touch Any WLCG Ticket: 177 Minutes (About 3 Hours)
  - Average Time to Touch Any WLCG Ticket Submitted Between 9-5: 11 Minutes
- 14 Services (15 if you count MonALISA) 24x7 for 3.5 Years
  - Plus a few deprecated services
  - More if you count Grid3/iVDGL
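The gap between the two "time to touch" averages is what you would expect when a few off-hours tickets wait until morning. The sketch below shows how such averages could be computed. The ticket timestamps are made up for illustration and are not OSG data, and the 9-5 filter looks only at time of day, ignoring weekends and time zones.

```python
from datetime import datetime, time
from statistics import mean

# Made-up (submitted, first_touched) pairs, for illustration only.
TICKETS = [
    (datetime(2001, 1, 1, 10, 0), datetime(2001, 1, 1, 10, 10)),
    (datetime(2001, 1, 1, 14, 0), datetime(2001, 1, 1, 14, 12)),
    (datetime(2001, 1, 1, 23, 0), datetime(2001, 1, 2, 8, 0)),
]


def minutes_to_touch(submitted, first_touched):
    """Elapsed minutes between submission and the first response."""
    return (first_touched - submitted).total_seconds() / 60


def submitted_in_business_hours(tickets, start=time(9, 0), end=time(17, 0)):
    """Keep tickets whose submission time of day falls in [start, end)."""
    return [t for t in tickets if start <= t[0].time() < end]


def average_minutes(tickets):
    """Mean time to touch, in minutes, over the given tickets."""
    return mean(minutes_to_touch(s, f) for s, f in tickets)


if __name__ == "__main__":
    print(f"all tickets: {average_minutes(TICKETS):.1f} minutes")
    daytime = submitted_in_business_hours(TICKETS)
    print(f"submitted 9-5: {average_minutes(daytime):.1f} minutes")
```

In this toy data a single overnight ticket pulls the overall average well above the daytime figure, which is the same pattern the slide reports.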

Questions?