Software Management Workshop Steve Traylen

Software Management (WG5)
The aim of the working group is to look at deficiencies in deployed and upcoming software that make the operation of a stable production service challenging.

Wednesday, Software Management, 104-R-B0:
–Deployment and removal of VOs.
–Service discovery for VO members.
–Fault tolerance of the system. What can break the grid? Can services be restarted remotely?
–Shutting down and upgrading running services. Which services cannot be shut down?
–Feeding back issues to developers. How to get things changed as a result of the service challenge.

Thursday, Software Management:
–Level of intrusion of software on WNs. Can it be reduced, removed or sandboxed? What is reasonable?
–Application software installation: Spark and Tank. Is there someone to tell us about it? (Perhaps this belongs in the fabric area.)
–Software deployment process. Goals of the pre-production test.
–Improving Grid-wide scheduling. (Perhaps this belongs in the fabric section.) Can the service challenges find what the problem is? Glue deficiencies: can/should we just extend Glue ourselves?

Those Present
–Steve Traylen, RAL
–Chris Brew, RAL
–Gonzalo Merino, PIC
–Jeff Templon, NIKHEF
–Robin Middleton, RAL
–Laurence Field, CERN
–?, Torino
–Stephen Pickles, Manchester (NGS)
–Alex, Romania (NGI)
–Fabio, Lyon
–Pierre, CC-IN2P3
–?, GSI
–Rhys Newman, Oxford
A mixture of site admins, end users and grid deployers.

Adding VOs at a Site
We decided that VO services were being discussed in other groups, so we concentrated on the site level. A review of what has to be done (a scripted sketch of the first few steps follows below):
–Create the group.
–Create the pool accounts.
–Create the /home directories.
–Touch the gridmapdir entries.
–Adapt mkgridmap somehow.
–Create the storage areas with the correct permissions.
–Create the software area.
–Create queues for the VO.
–Configure the end points for the default BDII.
–Modify the access control lists on services.
–Set the VO-specific environment variables, VO_ATLAS_ (currently a pain when only one SE has to be defined for every VO).
–A software manager account for the VO (ATLAS in the example) is also needed.
–Update the Maui configuration.
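
As an illustration of how these steps could be bundled per VO (one of the modular-install ideas raised on the next slide), here is a minimal sketch in Python. The VO name, pool-account count and paths (/etc/grid-security/gridmapdir, /storage, /opt/exp_soft) are conventional assumptions for illustration, not taken from the workshop:

    import os
    import subprocess

    GRIDMAPDIR = "/etc/grid-security/gridmapdir"   # assumed conventional path

    def add_vo(vo, pool_size=50):
        """Sketch of the first per-VO steps listed above (run as root)."""
        # Create the VO group.
        subprocess.run(["groupadd", vo], check=True)
        # Create pool accounts vo001..voNNN with /home directories.
        for i in range(1, pool_size + 1):
            account = "%s%03d" % (vo, i)
            subprocess.run(["useradd", "-m", "-g", vo, account], check=True)
            # Touch the gridmapdir entry used when leasing the account.
            open(os.path.join(GRIDMAPDIR, account), "a").close()
        # Storage and software areas, group writable for the VO.
        for area in ("/storage/%s" % vo, "/opt/exp_soft/%s" % vo):
            os.makedirs(area)
            subprocess.run(["chown", "root:%s" % vo, area], check=True)
            os.chmod(area, 0o2775)  # setgid: new files inherit the VO group

The remaining steps (mkgridmap, queues, BDII end points, ACLs, Maui) are exactly the site-specific edits that the next slide identifies as the error-prone part.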

Could this be simplified?
Nothing much could be removed or simplified. The main difficulty is editing LCFG profiles, which is manual and error prone.
–It was thought that new modular install scripts would help by bundling the actions for a VO together.
–Modularity also addresses site differences, e.g. NIS vs. LDAP vs. /etc/passwd.
–The Quattor schema was also designed with this in mind.
Using VOMS and LCMAPS might make life easier on the CE (sketched below):
–Only one set of pool accounts, with secondary groups, would then be possible.
–But with no VOMS-aware SE the problem is not reduced.
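
The VOMS/LCMAPS idea can be sketched as follows. This illustrates the secondary-group scheme only; it is not the real LCMAPS plugin logic, and the FQAN-to-group table is invented:

    import grp
    import os
    import pwd

    # Invented mapping from VOMS FQAN prefix to a Unix secondary group.
    VO_GROUPS = {"/atlas": "atlas", "/dteam": "dteam"}

    def become_pool_account(account, fqan):
        """Run a job under a shared pool account, but with the VO's
        group attached as a secondary group."""
        pw = pwd.getpwnam(account)
        for prefix, group in VO_GROUPS.items():
            if fqan.startswith(prefix):
                vo_gid = grp.getgrnam(group).gr_gid
                break
        else:
            raise ValueError("no group mapped for FQAN %s" % fqan)
        os.setgroups([pw.pw_gid, vo_gid])  # primary plus VO group
        os.setgid(pw.pw_gid)
        os.setuid(pw.pw_uid)               # drop root privileges last

With one shared pool, file ownership on the CE side becomes per-VO through the group; as the slide notes, this does not help while the SE is not VOMS aware.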

Removing a VO
Disabling a VO for a time was considered easy today. However, destroying a VO and getting all of its files shifted off site is difficult.
–It comes down to providing methods for removing services.
Summary for adding/removing VOs:
–Not a lot can be done at this time other than wrapping the process.

Service Discovery within a VO
Current issues were mainly confined to the user interface:
–The sysadmin is expected to provide the default configuration files for each VO.
–The sysadmin does not even know what VOs their “local” users are in.
–VO members cannot find the services, and new ones are being added.
We should request that the UI be able to look in the information system to locate the RB and L&B services (a query sketch follows below).
–Although this would be the default, the user could also specify a static configuration file.
–They, or the site, may trust or want to use a particular service themselves and so override the selection.
–The sysadmin then only has to provide an end point for the information system; of course, this led on to the issue of multiple information systems.
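
A minimal sketch of the proposed UI-side lookup, assuming a BDII at a hypothetical host and the Glue 1.x service attributes (GlueServiceType, GlueServiceEndpoint, GlueServiceAccessControlRule); the exact attribute names should be checked against the deployed schema:

    import subprocess

    BDII = "ldap://lcg-bdii.example.org:2170"  # hypothetical end point

    def find_services(service_type, vo):
        """Ask the information system for a VO's usable services,
        e.g. find_services("ResourceBroker", "atlas")."""
        filt = ("(&(objectClass=GlueService)"
                "(GlueServiceType=%s)"
                "(GlueServiceAccessControlRule=%s))" % (service_type, vo))
        out = subprocess.run(
            ["ldapsearch", "-x", "-LLL", "-H", BDII, "-b", "o=grid",
             filt, "GlueServiceEndpoint"],
            check=True, capture_output=True, text=True).stdout
        # Ignores LDIF line wrapping; good enough for a sketch.
        return [line.split(":", 1)[1].strip()
                for line in out.splitlines()
                if line.startswith("GlueServiceEndpoint:")]

A static file supplied by the user or the site would simply take precedence over this lookup.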

Multiple Information Views
This was not being discussed elsewhere, so it was considered here. The general consensus was that multiple views are undesirable.
–They were introduced to protect established sites from sites still in testing. Improved tolerance in the middleware has made this less important.
–They are now used by VOs to blacklist sites quickly. Using site tags was not quick enough, and could even be impossible (e.g. with a down GridFTP server on the CE).

VO area of the schema
Could there be a space in Glue for VOs to publish a site's status for their own benefit?
–This was considered a possible schema extension, but having the middleware respect it is considerable work.
–It is perhaps easier for the RB, as the tags could simply be matched against (see the sketch below).
–It would also be useful for data handling, since running jobs could then have their output directed elsewhere.
Summary:
–We want to get rid of multiple information views, but this solution may be too hard.
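
To make the RB point concrete: given a purely hypothetical per-VO status tag published in the schema (no such Glue attribute exists today), the matchmaking side is just an extra filter:

    def filter_sites(sites, vo):
        """Drop sites that the VO has marked bad through the
        hypothetical per-VO status tags."""
        usable = []
        for site in sites:
            # site["vo_status"]: VO name -> "ok" or "blacklisted".
            if site.get("vo_status", {}).get(vo, "ok") == "ok":
                usable.append(site)
        return usable

The data-handling half is the hard part: redirecting the output of already running jobs needs cooperation from the middleware, which is the considerable work referred to above.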

Shutting Down Services
Various scenarios:
–Planned, e.g. disabling a VO.
–As soon as possible, e.g. a kernel upgrade.
–Unplanned, e.g. pulling the cable out.
The current status of services is published.
–The exact times are not published, though they are recorded at the GOC.
Again, adding this to the schema is possible, but the added intelligence in the middleware is serious work (see the sketch below). We cannot submit it as a bug, only as a recommendation for future work.
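
A sketch of what publishing the missing times would enable, with invented fields for the scheduled window alongside the status that is already published; the middleware would have to consult this before scheduling work:

    from datetime import datetime, timezone

    def is_usable(service, now=None):
        """Avoid a service during its published (hypothetical)
        scheduled-downtime window.  downtime_start/downtime_end are
        assumed to be timezone-aware datetimes, if present."""
        now = now or datetime.now(timezone.utc)
        if service["status"] != "Production":
            return False
        start = service.get("downtime_start")
        end = service.get("downtime_end")
        if start and end and start <= now <= end:
            return False
        return True

Scheduling around such windows is the added intelligence that makes this future work rather than a bug fix.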

Feedback to Developers
There are many sources of bugs and requests:
–The C & T testbed.
–Pre-production users/admins.
–Production users/admins.
Most of these will come via the user/operations support system, GGUS. A free-for-all direct to the developers would drown them. Members of the operations helpdesk need to be the interface between users/admins and the developers, and to influence the developers' priorities.