OSG SITE INSTALLATION AND MAINTENANCE
Suchandra Thapa, Computation Institute, University of Chicago



INTRODUCTION TO OSG
- Introduction to OSG
- Site Policies and Requirements
- Installing an OSG site
- Maintaining a site
- Q&A time
April 23, 2009, NCGS 2009, Chapel Hill

INTRODUCTION TO OSG
- OSG stands for Open Science Grid
- Provides high-throughput computing across the US
- Currently more than 75 sites
- Recent stats:
  - 282,912 jobs for 433,051 hours
  - Used 75 sites
  - Jobs by ~20 different virtual organizations
  - 92% of jobs succeeded (an underestimate: 4 sites didn't report anything)
- Provides opportunistic computing for VOs
- Focus on high-throughput computing rather than high-performance computing

BASIC TERMS
- CE: Compute Element
- SE: Storage Element
- VO: Virtual Organization
- WN: Worker Node
- GOC: Grid Operations Center
- GUMS: Grid User Management Server

SITE POLICIES AND REQUIREMENTS
- Introduction to OSG
- Site Policies and Requirements
- Installing an OSG site
- Maintaining a site
- Q&A time

YOUR ROLE AS AN ADMIN
As a site admin, you should:
- Keep in touch with OSG (downtime, security, etc.)
- Provide the GOC with a site policy and register your site with the GOC
- Respond to trouble tickets and inquiries from the GOC
- Plan your site's layout
- Update software as needed (within limits)
- Participate and be a good community member

SUPPORT PROVIDED FOR ADMINS
OSG provides:
- Software and ancillary information (configuration tools, documentation, recommendations)
- Assistance in keeping your site running smoothly
- Help with troubleshooting and installing software
- Users for your site
- An exciting, cutting-edge, 21st-century collaborative distributed computing grid cloud buzzword-compliant environment

INCIDENT RESPONSE POLICY
- Incident: any real or suspected event that poses a real or potential threat
- You MUST report and respond
  - Report: entDiscoveryReporting
  - Respond: follow this policy in collaboration with OSG

INCIDENT RESPONSE POLICY (PART 2)
When contacting OSG, let us know:
- Whether any certs are compromised or suspicious
- Whether any VO accounts are affected
- Whether you have informed any CA for revocation
- Whether you have shut down the node, or plan to
- Any suspicious connections out of your node to another grid resource
- Any corrupted data
Please KEEP US INFORMED; keep e-mailing us during your forensics, even if you think it is embarrassing. We are ALL in this together.

INSTALLING AN OSG SITE
- Introduction to OSG terms and operations
- Site Policies and Requirements
- Installing an OSG site
- Maintaining a site
- Q&A time

SITE PLANNING
- Bureaucratic details
- Cluster layout
- Disk layout / sharing
- Authorization

BUREAUCRACY
- Certificates (personal/host)
- VO registrations
- Registration with OSG
  - Need a site name (e.g. UC_ITB)
  - Need contacts (security, admin, etc.)
- Site policy on the web

STARTING OUT
- Everyone using OSG gets a personal certificate, because one is required for any activity on an OSG resource
- You will need to know or contact someone with a DOEGrids certificate in order to obtain a personal certificate

SITE REGISTRATION USING OIM
- Registration is done using OIM; you will need to register yourself first
- After the GOC approves your registration: Registrations > Resources > Add New Resource

CLUSTER LAYOUT
- How is software / data being shared?
  - NFS can work, but gets bogged down under larger workloads
- Where do services run? Single server vs. dedicated servers
- Worker node software: locally present on worker nodes vs. served over NFS
- Are certificates shared?

REQUIRED DIRECTORIES FOR CE / CLUSTER
- OSG_APP: stores VO applications
  - Must be shared (usually NFS)
  - Must be writable from the CE, readable from the WNs
  - Must be usable by the whole cluster
- OSG_GRID: stores the WN client software
  - May be shared or installed on each WN
  - May be read-only (no need for users to write)
  - Holds a copy of the CA certs and CRLs, which must be kept up to date
- OSG_WN_TMP: temporary directory on each worker node
  - May be static or dynamic
  - Must exist at the start of a job
  - Not guaranteed to be cleaned by the batch system

OPTIONAL DIRECTORIES FOR CE
- OSG_DATA: data shared between jobs
  - Must be writable from the worker nodes
  - Potentially massive performance requirements; a cluster file system can mitigate its limitations
  - Performance and support vary widely among sites
  - 1777 permissions on OSG_DATA (world-writable with the sticky bit, like /tmp)
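The directory and permission scheme above can be sketched with ordinary shell commands. The paths here are illustrative assumptions (a throwaway base directory, not OSG-mandated locations); real sites export these over NFS and point their configuration at the chosen paths:

```shell
# Illustrative sketch only: a throwaway base keeps this runnable anywhere.
BASE=$(mktemp -d)

mkdir -p "$BASE/app"      # OSG_APP: VO software, writable from the CE
mkdir -p "$BASE/data"     # OSG_DATA: data shared between jobs
mkdir -p "$BASE/wn_tmp"   # OSG_WN_TMP: local scratch on each worker node

# OSG_DATA is world-writable with the sticky bit set, like /tmp
chmod 1777 "$BASE/data"

stat -c '%a' "$BASE/data"   # prints 1777 on Linux
```

The sticky bit lets every mapped account create files in OSG_DATA while preventing users from deleting each other's files, which is exactly the /tmp semantics the slide refers to.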

SPACE REQUIREMENTS
- Varies between VOs
  - Some VOs download all data and code per job (possibly Squid-assisted) and return data to the VO per job
  - Other VOs use hybrids of OSG_APP and/or OSG_DATA
- OSG_APP is used by several VOs, but not all
  - 1 TB of storage is reasonable
  - Serve it from a separate computer so heavy use won't affect other site services
- OSG_DATA sees moderate usage
  - 1 TB of storage is reasonable
  - Serve it from a separate computer so heavy use of OSG_DATA doesn't affect other site services
- OSG_WN_TMP is not well managed by VOs, so be aware of it
  - ~100 GB of total local WN space
  - ~10 GB per job slot

WORKER NODE STORAGE
- Provide about 12 GB per job slot
  - For example, about 100 GB for a two-socket, quad-core machine (8 slots)
- Not data-critical, so RAID 0 or similar can be used for good performance

AUTHORIZATION
Two major setups:
- Gridmap setup
  - A file with a list of mappings between DNs and local accounts
  - Can be generated by the edg-mkgridmap script
  - Doesn't handle users in multiple VOs or with VOMS roles
- GUMS
  - A service with a list of mappings
  - A little more complicated to set up
  - Centralizes mappings for the entire site in a single location
  - Handles complex cases better (e.g. blacklisting, roles, multiple VO membership)
  - Preferred for sites with more complex requirements
  - Ideally runs on a dedicated system (can be a VM)
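For the gridmap setup, the mapping file is simply one line per user: the quoted certificate DN followed by the local account it maps to. A hypothetical fragment (the DNs and account names below are made up for illustration):

```
"/DC=org/DC=doegrids/OU=People/CN=Jane Doe 12345" uscms01
"/DC=org/DC=doegrids/OU=People/CN=John Smith 67890" osgedu
```

Because each DN maps to exactly one account, this flat format is what makes multiple-VO membership and VOMS roles awkward, which is the limitation GUMS addresses.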

CE INSTALLATION OVERVIEW
- Prerequisites
  - Certificates
  - Users
- Installation
  - Pacman
- Configuration
- Getting things started

LOCAL ACCOUNTS
You need the following local accounts:
- A user for RSV
- A daemon account used by most of the VDT
- A globus user is optional, but will be used if found

BASIC INSTALLATION AND CONFIGURATION
- Install Pacman
  - Download the tar.gz
  - Untar it (keep it in its own directory)
  - Source the setup script
- Make an OSG directory
  - Example: /opt/osg symlinked to /opt/osg-1.0
- Run pacman commands
  - Get the CE (pacman -get OSG:ce)
  - Get the job manager interface (pacman -get OSG:Globus-Condor-Setup)
- Configure
  - Run edg-mkgridmap or gums-host-cron
  - Configure the CA certificates updater
  - Edit config.ini
  - Run configure-osg.py (configure-osg.py -c)
- Start services (vdt-control --on)
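The config.ini edited in the steps above is a plain INI file read by configure-osg.py. The section and option names below are recalled from the OSG 1.0-era file and the values are placeholders; treat this as a sketch and start from the fully commented reference copy shipped with the install:

```ini
[Site Information]
group = OSG-ITB          ; OSG for production, OSG-ITB for the test grid
resource = UC_ITB        ; the site name registered with the GOC
sponsor = osg            ; VO(s) sponsoring the site
contact = Site Admin
email = admin@example.edu
city = Chicago
country = US
```

Running configure-osg.py -c validates the file and writes out the service configuration, so typos here surface at configure time rather than when jobs arrive.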

SITE MAINTENANCE
- Introduction to OSG terms and operations
- Site Policies and Requirements
- Installing an OSG site
- Maintaining a site
- Q&A time

UPDATING CAS
- CAs are regularly updated:
  - New CAs added
  - Old CAs removed
  - Tweaks to existing CAs
- If you don't keep up to date, you:
  - May be unable to authenticate some users
  - May incorrectly accept some users
- It is easy to keep up to date:
  - vdt-update-certs runs once a day and gets the latest CA certs
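For reference, the trusted-CA directory that vdt-update-certs maintains follows the usual hash-named IGTF layout; the path and hash below are illustrative assumptions, not values from this talk:

```
$VDT_LOCATION/globus/TRUSTED_CA/
    1c3f2ca8.0               CA certificate, named by its subject hash
    1c3f2ca8.signing_policy  namespace policy for that CA
    1c3f2ca8.r0              CRL for that CA, refreshed by the updater
```

A stale .r0 file is the most common symptom of a broken updater: authentication starts failing for users of that CA once the old CRL expires.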

MONITORING SITE STATUS
Several tools are available:
- RSV
  - Part of the install
  - Presents a web page with a quick status update on site functionality
  - Can integrate with Nagios
- Nagios/Ganglia/Cacti
  - Present information on non-grid-specific details of the cluster
  - Can set up alerts, pages, etc.
- Gratia
  - Provides accounting information on jobs running at your site
  - Useful for seeing who is using your site and how much utilization comes from various users
- Daily/weekly reports
  - Provide at-a-glance information on your site and OSG at large

RSV OUTPUT FOR UC_ITB RESOURCE
A web page generated by RSV showing the output of various probes. Clicking on a probe's output gives its history over the last few invocations, along with error output.

DAILY EMAIL REPORT ON PRODUCTION SITES
An example of the daily email sent out to administrators with information on jobs and sites over the last day.

GRATIA REPORT FOR THE LAST MONTH AT FERMIGRID
This shows the daily usage by VOs of the FermiGrid resource over the last month as VO validations were run.

GANGLIA OUTPUT FOR CLUSTER AT UC
Top-level information on the servers and compute nodes of a small cluster at the University of Chicago. Clicking on a host shows more detailed information for that host.

INCREMENTAL UPDATES
- Frequent (every 1-4 weeks)
- Can be done within a single installation
- Either manually:
  - Turn off services
  - Back up the installation directory
  - Perform the update (move configuration files, pacman updates, etc.)
  - Re-enable services
- Or use the vdt-updater script (automates the above steps)

MAJOR UPDATES
- Irregular (every 6-12 months)
- Must be a new installation
  - Can copy configuration from the old installation
- Process:
  - Point to the old install
  - Perform the new install
  - Turn off old services
  - Turn on new services

QUESTIONS? THOUGHTS? COMMENTS?
- Introduction to OSG terms and operations
- Site Policies and Requirements
- Installing an OSG site
- Maintaining a site
- Q&A time

ACKNOWLEDGEMENTS
- Alain Roy
- Terrence Martin