Introduction to OSG Fundamentals Suchandra Thapa Computation Institute University of Chicago March 19, 20091GSAW 2009 Clemson.

Slides:



Advertisements
Similar presentations
 Contributing >30% of throughput to ATLAS and CMS in Worldwide LHC Computing Grid  Reliant on production and advanced networking from ESNET, LHCNET and.
Advertisements

Polish Infrastructure for Supporting Computational Science in the European Research Space EUROPEAN UNION Services and Operations in Polish NGI M. Radecki,
Andrew McNab - EDG Access Control - 14 Jan 2003 EU DataGrid security with GSI and Globus Andrew McNab University of Manchester
DESIGNING A PUBLIC KEY INFRASTRUCTURE
Office of Science U.S. Department of Energy Grids and Portals at NERSC Presented by Steve Chan.
Makrand Siddhabhatti Tata Institute of Fundamental Research Mumbai 17 Aug
MyOSG: A user-centric information resource for OSG infrastructure data sources Arvind Gopu, Soichi Hayashi, Rob Quick Open Science Grid Operations Center.
OSG End User Tools Overview OSG Grid school – March 19, 2009 Marco Mambelli - University of Chicago A brief summary about the system.
Open Science Grid Software Stack, Virtual Data Toolkit and Interoperability Activities D. Olson, LBNL for the OSG International.
Key Project Drivers - FY11 Ruth Pordes, June 15th 2010.
Rsv-control Marco Mambelli – Site Coordination meeting October 1, 2009.
OSG Public Storage and iRODS
OSG S ITE INSTALLATION AND M AINTENANCE Suchandra Thapa Computation Institute University of Chicago.
SRM at Clemson Michael Fenn. What is a Storage Element? Provides grid-accessible storage space. Is accessible to applications running on OSG through either.
OSG Operations and Interoperations Rob Quick Open Science Grid Operations Center - Indiana University EGEE Operations Meeting Stockholm, Sweden - 14 June.
OSG Services at Tier2 Centers Rob Gardner University of Chicago WLCG Tier2 Workshop CERN June 12-14, 2006.
OSG Site Provide one or more of the following capabilities: – access to local computational resources using a batch queue – interactive access to local.
Integration and Sites Rob Gardner Area Coordinators Meeting 12/4/08.
OSG Middleware Roadmap Rob Gardner University of Chicago OSG / EGEE Operations Workshop CERN June 19-20, 2006.
03/27/2003CHEP20031 Remote Operation of a Monte Carlo Production Farm Using Globus Dirk Hufnagel, Teela Pulliam, Thomas Allmendinger, Klaus Honscheid (Ohio.
Publication and Protection of Site Sensitive Information in Grids Shreyas Cholia NERSC Division, Lawrence Berkeley Lab Open Source Grid.
OSG Fundamentals Scot Kronenfeld - Marco Mambelli –
Blueprint Meeting Notes Feb 20, Feb 17, 2009 Authentication Infrastrusture Federation = {Institutes} U {CA} where both entities can be empty TODO1:
G RID M IDDLEWARE AND S ECURITY Suchandra Thapa Computation Institute University of Chicago.
OSG S ITE INSTALLATION AND M AINTENANCE Suchandra Thapa Computation Institute University of Chicago.
Mine Altunay OSG Security Officer Open Science Grid: Security Gateway Security Summit January 28-30, 2008 San Diego Supercomputer Center.
Use of Condor on the Open Science Grid Chris Green, OSG User Group / FNAL Condor Week, April
J OINING OSG Suchandra Thapa Computation Institute University of Chicago.
Overview of Monitoring and Information Systems in OSG MWGS08 - September 18, Chicago Marco Mambelli - University of Chicago
Evolution of the Open Science Grid Authentication Model Kevin Hill Fermilab OSG Security Team.
Author - Title- Date - n° 1 Partner Logo WP5 Summary Paris John Gordon WP5 6th March 2002.
Open Science Grid OSG CE Quick Install Guide Siddhartha E.S University of Florida.
June 24-25, 2008 Regional Grid Training, University of Belgrade, Serbia Introduction to gLite gLite Basic Services Antun Balaž SCL, Institute of Physics.
CERN Using the SAM framework for the CMS specific tests Andrea Sciabà System Analysis WG Meeting 15 November, 2007.
Introduction to OSG Security Suchandra Thapa Computation Institute University of Chicago March 19, 20091GSAW 2009 Clemson.
Mine Altunay July 30, 2007 Security and Privacy in OSG.
Overview of Privilege Project at Fermilab (compilation of multiple talks and documents written by various authors) Tanya Levshina.
US LHC OSG Technology Roadmap May 4-5th, 2005 Welcome. Thank you to Deirdre for the arrangements.
Open Science Grid (OSG) Introduction for the Ohio Supercomputer Center Open Science Grid (OSG) Introduction for the Ohio Supercomputer Center February.
Trusted Virtual Machine Images a step towards Cloud Computing for HEP? Tony Cass on behalf of the HEPiX Virtualisation Working Group October 19 th 2010.
Portal Update Plan Ashok Adiga (512)
VO Privilege Activity. The VO Privilege Project develops and implements fine-grained authorization to grid- enabled resources and services Started Spring.
Configuring and Troubleshooting Identity and Access Solutions with Windows Server® 2008 Active Directory®
The OSG and Grid Operations Center Rob Quick Open Science Grid Operations Center - Indiana University ATLAS Tier 2-Tier 3 Meeting Bloomington, Indiana.
DTI Mission – 29 June LCG Security Ian Neilson LCG Security Officer Grid Deployment Group CERN.
Doug Benjamin Duke University. 2 ESD/AOD, D 1 PD, D 2 PD - POOL based D 3 PD - flat ntuple Contents defined by physics group(s) - made in official production.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI Ops Portal New Requirements.
WLCG Authentication & Authorisation LHCOPN/LHCONE Rome, 29 April 2014 David Kelsey STFC/RAL.
Open Science Grid Build a Grid Session Siddhartha E.S University of Florida.
April 25, 2006Parag Mhashilkar, Fermilab1 Resource Selection in OSG & SAM-On-The-Fly Parag Mhashilkar Fermi National Accelerator Laboratory Condor Week.
T3g software services Outline of the T3g Components R. Yoshida (ANL)
CernVM-FS Infrastructure for EGI VOs Catalin Condurache - STFC RAL Tier1 EGI Webinar, 5 September 2013.
VOX Project Tanya Levshina. 05/17/2004 VOX Project2 Presentation overview Introduction VOX Project VOMRS Concepts Roles Registration flow EDG VOMS Open.
VOX Project Status T. Levshina. 5/7/2003LCG SEC meetings2 Goals, team and collaborators Purpose: To facilitate the remote participation of US based physicists.
Open Science Grid OSG Resource and Service Validation and WLCG SAM Interoperability Rob Quick With Content from Arvind Gopu, James Casey, Ian Neilson,
Tier 3 Support and the OSG US ATLAS Tier2/Tier3 Workshop at UChicago August 20, 2009 Marco Mambelli –
The Great Migration: From Pacman to RPMs Alain Roy OSG Software Coordinator.
OSG Status and Rob Gardner University of Chicago US ATLAS Tier2 Meeting Harvard University, August 17-18, 2006.
Trusted Virtual Machine Images the HEPiX Point of View Tony Cass October 21 st 2011.
OSG Area Coordinators Meeting Security Team Report Mine Altunay 8/15/2012.
Open Science Grid Consortium Storage on Open Science Grid Placing, Using and Retrieving Data on OSG Resources Abhishek Singh Rana OSG Users Meeting July.
OSG Facility Miron Livny OSG Facility Coordinator and PI University of Wisconsin-Madison Open Science Grid Scientific Advisory Group Meeting June 12th.
OSG Fundamentals Adapted from: Alain Roy Terrence Martin Suchandra Thapa.
Open Science Grid Configuring RSV OSG Resource & Service Validation Thomas Wang Grid Operations Center (OSG-GOC) Indiana University.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Introduction Salma Saber Electronic.
Grid Colombia Workshop with OSG Week 2 Startup Rob Gardner University of Chicago October 26, 2009.
OSG Fundamentals Alain Roy Terrence Martin Suchandra Thapa.
BeStMan/DFS support in VDT OSG Site Administrators workshop Indianapolis August Tanya Levshina Fermilab.
Nov 05, 2008, PragueSA3 Workshop1 A short presentation from Owen Synge SA3 and dCache.
Leigh Grundhoefer Indiana University
Presentation transcript:

Introduction to OSG Fundamentals Suchandra Thapa Computation Institute University of Chicago March 19, 20091GSAW 2009 Clemson

Overview Introduction to OSG terms and operations Will discuss OSG fundamentals Have about 80 minutes for a 100 minutes timeslot Questions encouraged Q&A time afterwards March 19, 20092GSAW 2009 Clemson

Introduction to OSG OSG stands for Open Science Grid Provides high-throughput computing across US – Currently more than 70 sites – Recent stats: 282,912 jobs for 433,051 hours Used 75 sites Jobs by ~20 different virtual organizations 92% of jobs succeeded Underestimate: 4 sites didn’t report anything – Provides opportunistic computing for VOs Focus on high-throughput computing rather than high performance computing March 19, 20093GSAW 2009 Clemson

Basic Terms CE – Compute Element SE – Storage Element VO – Virtual Organization WN – Worker Node GOC – Grid Operations Center VDT – Virtual Data Toolkit DN – Distinguished name March 19, 20094GSAW 2009 Clemson

Who uses OSG? About 20 virtual organizations – High-energy physics uses a large chunk of OSG – But several other sciences are actively using OSG. nanoHUB: nanotechnology simulations LIGO: detecting gravitational waves CHARMM: molecular dynamics More at _We’re_Doing/Research_Highlights March 19, 2009GSAW 2009 Clemson5

Starting out Everyone using OSG gets a personal certificate because it is required to do any activity on an OSG resource This even applies to troubleshooting your own site! Let’s get one now! March 19, 2009GSAW 2009 Clemson6

Typical usage breakdown March 19, 2009GSAW 2009 Clemson7

VOs OSG is in part organized by Virtual Organizations A VO allows members of a collaboration or group to retain that same grouping on the OSG Try joining a VO now gedu/ March 19, 2009GSAW 2009 Clemson8

Overriding principle: Autonomy Sites and VOs are autonomous – Admins are free to make decisions about site – OSG provides software and recommendations about configuration – Admins are allowed to decided when and if to upgrade – Admins are responsible for site but OSG provides operational support March 19, 2009GSAW 2009 Clemson9

Your role as an admin As a site admin, you should: – Keep in touch with OSG (downtime, security, etc.) – Respond to trouble tickets or inquiries from GOC – Plan your site’s layout – Update software as needed (within limits) – Participate and be a good community member March 19, 2009GSAW 2009 Clemson10

Support provided for admins OSG provides: – Software and ancillary information (configuration tools, documentation, recommendations) – Assistance in keeping site running smoothly – Help in troubleshooting and installing software – Users for your site – An exciting, cutting-edge, 21st-century collaborative distributed computing grid cloud buzzword-compliant environment March 19, 2009GSAW 2009 Clemson11

VDT Stands for Virtual Data Toolkit Team based in Madison at UW-Madison Integrates and provides a large collection of software Provides the software distribution that many US grids use (including OSG) March 19, 2009GSAW 2009 Clemson12

VDT Example GUMS – Authorizes users at a site – Maps global user name to local UID VDT includes dependencies. For example, GUMS needs: March 19, 2009GSAW 2009 Clemson13 ApacheCA Certificates TomcatConfiguration scripts MysqlInfrastructure

OSG Software Stack Consists of: – VDT Software PLUS – Additional OSG Specific bits E.g. CE – VDT Subset Globus RSV PRIMA … and another dozen – OSG bits: Information about OSG VOs OSG configuration script (configure_osg.py) March 19, 2009GSAW 2009 Clemson14

Overview of OSG components CE – Compute Element – Provides point of interface for tools attempting to run jobs or work on a cluster – Users submit jobs to this system – OSG provides a package that installs all software needed for this component SE – Storage Element – Several implementations dCache Bestman – Manages data and storage services on cluster WN – Worker Node – Software found on each compute node on grid – Provides software that incoming jobs may depend on (e.g. curl, srmcp, gsiftp, etc.) Client – Client Software – Provides software that users can use to submit and manage jobs and data on OSG – May be superseded by VO specific software Other tools (more specific and not necessarily used by many people) March 19, 2009GSAW 2009 Clemson15

5000 meter overview of CE GRAM : Allows job submissions and passes them on to local batch manager Gridftp : Provides data transfer services into and out of cluster CEMon / GIP : Provides information to central services Gratia : Sends accounting information on jobs run to central server RSV : Provides probes to monitor health of the CE User authorization : Needed to connect certificates to user accounts March 19, 2009GSAW 2009 Clemson16

Basic CE March 19, 2009GSAW 2009 Clemson17 GRAM GridFTP Authorization RSV CEMon/GIP Submit jobs ? ? Test Query Gratia

GRAM Two different flavors – OSG provides and supports both – Very different implementations GT2 – What most users and VOs use – Very stable and well understood – On the other hand, fairly old GT4 (aka ws-gram) – Web services enabled job submission – Currently in transition – Used primarily by LIGO March 19, 2009GSAW 2009 Clemson18

Gratia Collects information about what jobs have run on your site and by whom Hooks into GRAM and/or job manager Cron job also present Sends information to a central server Can connect and query central service to get reports and graphs Option exists for a local server March 19, 2009GSAW 2009 Clemson19

CEMon / GIP These work together  Essential for accurate information about your site  End-users see this information Generic Information Provider (GIP)  Scripts to scrape information about your site  Some information is dynamic (queue length)  Some is static (site name) CEMon  Reports information to OSG GOC’s BDII  Reports to OSG Resource Selector (ReSS) March 19, 2009GSAW 2009 Clemson20

RSV System to run tests on various components of your site Presents a web page with red/green overview and links to more specific information on test results Optional interface to nagios Can be run on a server other than CE March 19, 2009GSAW 2009 Clemson21

Site planning Bureaucratic details Cluster layout Disk layout / sharing Authorization March 19, 2009GSAW 2009 Clemson22

Bureaucracy Certificates (personal/host) VO registrations Registration with OSG – Need a site name (e.g. UC_ITB) – Need contacts (security, admin, etc.) Site policy on web March 19, 2009GSAW 2009 Clemson23

Site Registration using OIM Let’s register your site Go to Your Account > Your Profile > Update Info Registrations > Resources > Add New Resource For now, use OSG-ITB, use OSG for actual production resource March 19, 2009GSAW 2009 Clemson24

Site planning How is software / data being shared – NFS can work but gets bogged down with larger workloads – Where do services run? Single server vs. dedicated servers – Worker node software? Locally present on worker nodes vs. served over nfs – Certificates shared? March 19, 2009GSAW 2009 Clemson25

More complicated setup March 19, 2009GSAW 2009 Clemson26 GRAM GridFTP CEMon/GIP Submit jobs Gratia GUMS (Authorization service) RSV (For Testing) NFS Server

Required Directories for CE / Cluster OSG_APP: Store VO applications  Must be shared (usually NFS)  Must be writeable from CE, readable from WN  Must be usable by whole cluster OSG_GRID: Stores WN client software  May be shared or installed on each WN  May be read-only (no need for users to write)  Has a copy of CA Certs & CRLs, which must be up to date OSG_WN_TMP: temporary directory on worker node  May be static or dynamic  Must exist at start of job  Not guaranteed to be cleaned by batch system March 19, 2009GSAW 2009 Clemson27

Optional directories for CE OSG_DATA: Data shared between jobs – Must be writable from the worker nodes – Potentially massive performance requirements – Cluster file system can mitigate limitations with this file system – Performance & support varies widely among sites – 0177 permission on OSG_DATA (like /tmp) Squid server: HTTP proxy can assist many VOs and sites in reducing load – Reduces VO web server load – Efficient and reliable for site – Fairly low maintenance – Can help with CRL maintenance on worker nodes March 19, 2009GSAW 2009 Clemson28

Space Requirements Varies between VOs  Some VOs download all data & code per job (may be Squid assisted), and return data to VO per job.  Other VOs use hybrids of OSG_APP and/or OSG_DATA OSG_APP used by several VOs, not all.  1 TB storage is reasonable  Serve from separate computer so heavy use won’t affect other site services. OSG_DATA sees moderate usage.  1 TB storage is reasonable  Serve it from separate computer so heavy use of OSG_DATA doesn’t affect other site services. OSG_WN_TMP is not well managed by VOs and you should be aware of it.  ~100GB total local WN space  ~10GB per job slot. March 19, 2009GSAW 2009 Clemson29

Worker Node Storage Provide about 12GB per job slot Therefore 100GB for quad core 2 socket machine Not data critical, so can use RAID 0 or similar for good performance March 19, 2009GSAW 2009 Clemson30

Authorization Two major setups: – Gridmap setup File with list of mappings between DN and local account Can be generated by edg-mkgridmap script Doesn’t handle users in mulitple VOs or with VOMS roles – Service with list of mappings (GUMS) A little more complicated to setup Centralizes mappings for entire site in single location Handles complex cases better (e.g. blacklisting, roles, multiple VO membership) Preferred for sites with more complex requirements Ideally on dedicated system (can be VM) Can add SAZ service for authorization March 19, 2009GSAW 2009 Clemson31

CE Installation Overview Prerequistes – Certificates – Users Installation – Pacman Configuration Getting things started March 19, 2009GSAW 2009 Clemson32

PKI Certificates Used for all authentication Your site needs PKI certificates  I assume you understand basics  You need a public cert  You need a private key  Often referred to informally, incorrectly as “certificate” Your site needs two certificates  Host certificate  HTTP certificate  Best to get these in advance  Optionally you may need RSV certificate Online documentation on getting them: ertificates March 19, 2009GSAW 2009 Clemson33

Local accounts You need following local accounts User for RSV Daemon account used by most of vdt Globus user is optional but will be used if found March 19, 2009GSAW 2009 Clemson34

Pacman The OSG Software stack is installed with Pacman  Yes, custom installation software Why?  Mostly historical reasons  Makes multiple installations and non-root installations easy Why not?  It’s different from what you’re used to  It sometimes breaks in strange ways  Updates can be difficult Will we always use Pacman?  Maybe  Investigating alternatives but changing existing infrastructure is hard  Work ongoing to support RPM/deb in the future March 19, 2009GSAW 2009 Clemson35

Pacman (part deux) Easy installation – Download pacman – Untar and source shell script – Start using – Look ma! No root! Gotcha: – Installs into current directory March 19, 2009GSAW 2009 Clemson36

Using pacman Let’s try to do a simple pacman installation Try an OSG client installation tarballs/pacman-latest.tar.gz tarballs/pacman-latest.tar.gz pacman -get OSG:client March 19, 2009GSAW 2009 Clemson37

Documentation Twiki – OSG collaborative documentation – Used throughout OSG Installation documentation tation/ March 19, 2009GSAW 2009 Clemson38

Basic installation and configuration Install Pacman  Download  Untar (keep in own directory)  Source setup Make OSG directory  Example: /opt/osg symlink to /opt/osg-1.0 Run pacman commands  Get CE  Get job manager interface Configure  Edit config.ini  Run configure_osg.py  Start services March 19, 2009GSAW 2009 Clemson39

CA Certificates What are they? – Public certificate for certificate authorities – Used to verify authenticity of user certificates Why do you care? – If you don’t have them, users can’t access your site March 19, GSAW 2009 Clemson

Installing CA Certificates The OSG installation will not install CA certificates by default – Users will not be able to access your site! To install CA certificates – Edit a configuration file to select what CA distribution you want vdt-update-certs.conf – Run a script vdt-setup-ca-certificates 41 March 19, 2009GSAW 2009 Clemson

Choices for CA certificates You have two choices: – Recommended: OSG CA distribution IGTF + TeraGrid-only – Optional: VDT CA distribution IGTF only (Eventually) Same as OSG CA (Today) IGTF: Policy organization that makes sure that CAs are trustworthy You can make your own CA distribution You can add or remove CAs March 19, GSAW 2009 Clemson

Why all this effort for CAs? Certificate authentication is the first hurdle for a user to jump through Do you trust all CAs to certify users? – Does your site have a policy about user access? – Do you only trust US CAs? European CAs? – Do you trust the IGTF-accredited Iranian CA? Does the head of your institution? March 19, GSAW 2009 Clemson

Updating CAs CAs are regularly updated – New CAs added – Old CAs removed – Tweaks to existing CAs If you don’t keep up to date: – May be unable to authenticate some user – May incorrectly accept some users Easy to keep up to date – vdt-update-certs Runs once a day, gets latest CA certs March 19, GSAW 2009 Clemson

CA Certificate RPM There is an alternative for CA Certificate installation: RPM – We have an RPM for each CA cert distribution – No deb package yet – Install and keep up to date with yum – Some details not discussed here: read the docs March 19, GSAW 2009 Clemson

Certificate Revocation Lists (CRLs) It’s not enough to have the CAs CAs publish CRLs: lists of certificates that have been revoked – Sometimes revoked for administrative reasons – Sometimes revoked for security reasons You really want up to date CRLs CE provides periodic update of CRLs – Program called fetch-cr – Runs once a day (today) – Will run four times a day (soon) March 19, GSAW 2009 Clemson

Updates We periodically release updates to OSG software stack Announced by VDT team on vdt-discuss mailing list – Not OSG-specific announcement or update procedure Announced by GOC – OSG-specific instructions GSAW 2009 ClemsonMarch 19,

Two kinds of updates Incremental updates Major updates GSAW 2009 ClemsonMarch 19,

Incremental Updates Frequent (Every 1-4 weeks) Can be done within a single installation Process: – Turn off services – Backup installation directory – Perform update – Re-enable services March 19, 2009GSAW 2009 Clemson49

Major Updates Irregular (Every 6-12 months) Must be a new installation Can copy configuration from old installation Process: – Point to old install – Perform new install – Turn off old services – Turn on new services March 19, 2009GSAW 2009 Clemson50

Incremental updates are a mess If you apply each update as it’s available, it’s not too bad. VDT supplies instructions for each update – Sometimes there are picky details – It’s unclear how to do multiple updates at once What order should steps be done in? What are all the appropriate picky details? New updater script being released March 19, GSAW 2009 Clemson

The grand future of updates VDT team is working on improved mechanism for doing updates – Fully scripted to get picky details right – Hopefully easy to keep right, even with multiple updates – Still in progress – Release date – end of this month March 19, GSAW 2009 Clemson

Discussion, Questions Questions? Thoughts? Comments? March 19, GSAW 2009 Clemson

Acknowledgements Alain Roy Terrence Martin March 19, 2009GSAW 2009 Clemson54