13 June 2007, Operations Workshop, Stockholm: Hepix/WLCG System Management WG. Alessandra Forti, Operations Workshop, 14 June 2007.

Presentation transcript:


Layout
Mandate
WEB site
Wiki
Repositories
Group
Dissemination
Conclusions

Mandate: Intro
One of the problems observed (by EGEE and LCG) in providing a reliable grid service is the reliability of the local fabric services of participating sites.
The SMWG should bring together the existing expertise in different areas of fabric management to build a common repository of tools and knowledge for the benefit of the HEP system managers' community.
The idea is not to present all possible tools nor to create new ones, but to recommend specific tools for specific problems according to the best practices already in use at sites.
Although this group is proposed in order to help improve grid site reliability, the results should be useful to any site running similar local services.
Two areas should be improved by the group: tools and documentation.

Mandate: Goals
Improve the overall level of grid site reliability, focusing on improving system management practices and sharing expertise, experience and tools
Provide a repository
  – Management tools
  – Fabric monitoring sensors
  – HOWTOs
Provide site manager input to requirements on grid monitoring and management tools
Propose existing tools to the grid monitoring working group as solutions to general problems
Produce a Grid Site Fabric Management cook-book
  – Recommend basic tools to cover essential practices, including security management
  – Discover what the common problems for sites are and document how experienced sites solve them
  – Document collation of best practices for grid sites
Point out holes in existing documentation sets
Identify training needs
  – To be addressed in a workshop or by EGEE, for example? We have been contacted and haven't replied yet.

Preliminary list of areas and tools
System Management Areas
  – Filesystems: ext(2,3), XFS, NFS, AFS, dCache, DPM
  – Networking: Interfaces, IPs, Routers, Gateways, NAT
  – Databases: MySQL, Oracle, LDAP, gdbm
  – Processes: system, users monitoring
  – Servers: http, dhcp, dns, ldap, sendmail or other, sshd, (grid)ftp, rfio
  – Batch systems: LSF, Torque, Maui, BQS, Sun Grid Engine, Condor
  – Security: login access, pool accounts, certificate management and monitoring, non-required services, ports list, backups, monitoring (file systems, processes, networking), log files (grid services included)
  – ………
Common Fabric Monitoring and Management Tools
  – Monitoring: Ganglia, Nagios, Ntop, Home grown, SAM, GridICE, Lemon
  – Management: Cfengine, NPACI Rocks, Kickstart, Quattor
  – Security: iptables, rootkit, tripwire, nmap, ndiff, tcpdump, syslog, yummit
  – Grid Configuration: YAIM, Quattor

WEB site
A WEB site has been set up in Manchester –
It's based on GridSite – allows ACL control based on X.509 certificates
The WEB site hosts
  – Wiki (Cookbook requested in the mandate)
  – Subversion repositories (sharing scripts)

Subversion Repositories
Integrated with GridSite
  – Read access is allowed to anyone
  – Write access is based on certificates: there is no need to create accounts, but users need to be added to the ACLs
  – Different repositories have different ACLs
Fabric-management (SMWG)
Fabric-monitoring (SMWG)
Grid-monitoring (GSWG)
  – Created, will be used soon.
Other.
Creating a repo is very easy!
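The access rules on this slide (anyone can read, writing requires a certificate DN listed in the ACL of that particular repository) can be sketched in a few lines of Python. This is only an illustration of the policy, not GridSite's actual implementation or configuration format: the `Repository` class, the helper functions, the DNs and the ACL contents are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Repository:
    """A repository with its own write ACL, modelled as a set of certificate DNs."""
    name: str
    write_acl: Set[str] = field(default_factory=set)


def can_read(repo: Repository, dn: Optional[str]) -> bool:
    # Read access is allowed to anyone, with or without a certificate.
    return True


def can_write(repo: Repository, dn: Optional[str]) -> bool:
    # Write access needs a certificate DN that appears in this repository's ACL.
    return dn is not None and dn in repo.write_acl


# Hypothetical DNs and ACL contents, for illustration only.
ALICE = "/C=XX/O=ExampleGrid/CN=alice example"
BOB = "/C=XX/O=ExampleGrid/CN=bob example"

# Different repositories have different ACLs.
fabric_management = Repository("fabric-management", {ALICE})
fabric_monitoring = Repository("fabric-monitoring", {ALICE, BOB})

print(can_write(fabric_management, BOB))   # BOB is not in this ACL
print(can_write(fabric_monitoring, BOB))   # BOB is in this ACL
```

Keeping one ACL per repository means a contributor can be given write access to one repository without touching the others, and nobody needs a separate account.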

Subversion repositories (2)
The tools should be management scripts or monitoring sensors written by sys admins to solve a local problem
  – However, they should be generic enough to work at other sites
Each script should have a banner containing the following information
  – Description
  – Author
  – Institute
  – Creation date
  – License
  – Repository version number
Scripts are not necessarily committed by the author
  – Always with their permission and the license they want to use.
There are currently 13 scripts in the repositories
  – We need more!
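A banner check like the following could be run before a script is committed. It is a minimal sketch that assumes the banner is written as `# Field: value` comment lines near the top of the script; the slides list the required fields but do not prescribe a layout, so the layout, the function name and the example script are assumptions.

```python
REQUIRED_FIELDS = (
    "Description",
    "Author",
    "Institute",
    "Creation date",
    "License",
    "Repository version number",
)


def missing_banner_fields(script_text, max_lines=30):
    """Return the required banner fields not found in the leading comment lines.

    Assumes a '# Field: value' layout, which is an illustrative convention only.
    """
    found = set()
    for line in script_text.splitlines()[:max_lines]:
        stripped = line.strip()
        if not stripped.startswith("#"):
            continue
        body = stripped.lstrip("#").strip()
        name, sep, value = body.partition(":")
        # Count a field only if it has a non-empty value after the colon.
        if sep and value.strip():
            found.add(name.strip().lower())
    return [f for f in REQUIRED_FIELDS if f.lower() not in found]


# Hypothetical script with a complete banner.
EXAMPLE = """#!/bin/sh
# Description: report file systems above a usage threshold
# Author: A. N. Admin
# Institute: Example University
# Creation date: 2007-06-13
# License: GPL
# Repository version number: $Revision$
df -P
"""

print(missing_banner_fields(EXAMPLE))
```

The `$Revision$` value relies on Subversion keyword substitution, which has to be enabled per file through the `svn:keywords` property.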

Wiki
It is also integrated with GridSite
  – Accounts are based on the DN rather than a user name and password.
Simple rules to edit the wiki:
  – Each article should belong to at least one category to facilitate navigation and identification of the problem.
  – If the article contains a link to a script in the repositories, it should belong to the “category scripts”
  – Each article or portion of an article should bear the name and institute of the source if it is not the same as the page author, for example if the text is extracted from a received email.
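The three editing rules can also be expressed as a small checker, for instance to review existing pages. This is a sketch under stated assumptions: the `Article` record and its fields are hypothetical and are not taken from the wiki software, and the category name `Scripts` stands in for the scripts category.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SCRIPTS_CATEGORY = "Scripts"  # assumed name of the scripts category


@dataclass
class Article:
    title: str
    page_author: str
    categories: List[str] = field(default_factory=list)
    links_to_script: bool = False
    source: Optional[str] = None   # who the text comes from, if not the page author
    credit: Optional[str] = None   # "name, institute" shown on the page


def rule_violations(article: Article) -> List[str]:
    """Return a list of editing rules that the article does not follow."""
    problems = []
    if not article.categories:
        problems.append("article belongs to no category")
    if article.links_to_script and SCRIPTS_CATEGORY not in article.categories:
        problems.append("article links to a script but is not in the scripts category")
    if article.source and article.source != article.page_author and not article.credit:
        problems.append("source differs from page author but is not credited")
    return problems


# Hypothetical articles, for illustration only.
good = Article("Getting_started", "A. Admin", ["Cfengine", "Scripts"], links_to_script=True)
bad = Article("Notes", "A. Admin", [], links_to_script=True, source="B. Other")

print(rule_violations(good))
print(rule_violations(bad))
```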

Wiki (2)
The structure of categories is hierarchical, with four top categories
  – Fabric management
  – Fabric monitoring
  – Best Practices (mostly basic and grid security)
  – Scripts, to help navigate the repositories
Subcategories are normally associated with a tool or one of the areas listed in a previous slide, and then there are the articles.
  – Fabric Management (category) -> Cfengine (subcategory) -> Getting_started (article)
Content at the moment:
  – There are 51 articles and 20 categories
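The category -> subcategory -> article hierarchy maps naturally onto a nested dictionary. The sketch below holds only the one path given on the slide; the other top categories are left empty because the slides do not list their contents, and the `article_paths` helper is hypothetical.

```python
# Top categories from the slide; only the Cfengine example path is filled in.
WIKI = {
    "Fabric management": {"Cfengine": ["Getting_started"]},
    "Fabric monitoring": {},
    "Best Practices": {},
    "Scripts": {},
}


def article_paths(tree):
    """Yield 'category -> subcategory -> article' for every article in the tree."""
    for category, subcategories in tree.items():
        for subcategory, articles in subcategories.items():
            for article in articles:
                yield f"{category} -> {subcategory} -> {article}"


for path in article_paths(WIKI):
    print(path)
```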

Wiki (3)
If good documentation is available somewhere else, just put a pointer to the existing documentation.
  – Apply the minimum effort philosophy. For example, the Quattor page just points to the Quattor working group site after a small introduction.
  – But if someone wants to add an article with their own experience, they can do so.
Editing is currently done by me in a non-systematic way.
  – Mostly assigning articles to categories.
However, we used a wiki rather than writing a static document to avoid editing issues
  – Everyone should feel free to help write an article or edit a stub.

SMWG Group
Chairs:
  – Alessandra Forti (University of Manchester)
  – Michel Jouvin (LAL)
Sent a call for participation to
  – HEPiX and all the T1s
Mailing list:
  – 26 subscribers
Meetings are normally held every fortnight; the details are here: –
  – Mainly to give updates about what people have done in the two weeks. We haven't had one in a while and will resume them after Stockholm.

SMWG Group (2)
All the work is based on people volunteering to share
  – There are no dedicated people
So there is no definition of the group
  – Some people have only subscribed to the mailing list
  – Some have subscribed to the mailing list, participated in the meetings and done some work
  – Some people have acted as consultants and allowed their scripts to be distributed, but are not on the mailing list and do not come to the meetings
  – Some people have actually started editing the wiki with some stubs without even being in contact with any member of the group (i.e. mailing list subscribers)
It's a start, but it is not easy to make people volunteer

Dissemination
Sent an email to the dCache user forum
  – They are the main users outside the group
RSS feeds for OSCT point to some articles in the wiki
Talks
  – HEPiX, UK HEP Sysman, GDB, Ops Workshop
Should put a link from the HEPiX WEB site
  – Michel is looking into it
  – Send an email to the HEPiX mailing list when this is done
Link from the LCG WEB site?
  – Haven't discussed this with anyone yet
Send an email to LCG-ROLLOUT
  – Was waiting to have a bit more content to convince people of the usefulness.

Conclusions
There is a mandate
There is a wiki
There are repositories
There is a group
We only need people to contribute
Questions?