Building, Monitoring and Maintaining a Grid
Jorge Luis Rodriguez, Grid Summer Workshop 2006, June 26-30

Introduction
What we've already learned:
- What are grids, why we want them, and who is using them (Intro)
- Grid authentication and authorization
- Harnessing CPU cycles with Condor
- Data management and the Grid
In this lecture:
- Fabric-level infrastructure: grid building blocks
- National grid efforts in the US: the Open Science Grid and the TeraGrid

Grid Building Blocks
- Computational clusters
- Storage devices
- Networks
Grid resources and layout:
- User Interfaces
- Computing Elements
- Storage Elements
- Monitoring infrastructure...

Computer Clusters
Example: the Dell cluster at the University of Florida High Performance Computing Center (Phase I)
- A cluster management "frontend" node
- Tape backup robots
- I/O servers, typically RAID fileservers, and disk arrays
- The bulk of the machines are worker nodes
- A few head nodes, gatekeepers and other service nodes

A Typical Cluster Installation
(Diagram: head node/frontend server, worker nodes, and I/O node + storage behind a network switch connected to the WAN.)
- Cluster management: OS deployment and configuration; many options, e.g. ROCKS (kickstart), OSCAR (SystemImager), Sysconfig
- Computing cycles
- Data storage
- Connectivity

Networking
Internal networks (LAN):
- Private, accessible only to servers inside a facility
- Some sites allow outbound connectivity via Network Address Translation
- Typical technologies: Ethernet (0.1, 1 & 10 Gbps); high-performance, low-latency interconnects such as Myrinet (2, 10 Gbps) and InfiniBand (up to 120 Gbps)
External connectivity:
- Connection to the Wide Area Network
- Typically achieved via the same switching fabric as the internal interconnects

The Wide Area Network
Ever-increasing network capacities are what make grid computing possible, if not inevitable.
Example: the Global Lambda Integrated Facility for Research and Education (GLIF)


Computation on a Cluster
Batch scheduling systems:
- Submit many jobs through a head node, e.g. with a small shell script:

  #!/bin/sh
  # submit every job script in the list to the local batch system
  for i in $list_o_jobscripts
  do
    /usr/local/bin/condor_submit $i
  done

- Execution is done on the worker nodes
Many different batch systems are deployed on the grid:
- Condor (highlighted in lecture 5)
- PBS, LSF, SGE...
The batch system is the primary means of controlling CPU usage, enforcing allocation policies and scheduling jobs on the local computing infrastructure.
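As a rough illustration of what each of those job scripts might contain, a minimal Condor submit description file could look like the following (the executable path and file names are hypothetical, not taken from the slides):

  # example Condor submit description file (illustrative names only)
  universe   = vanilla
  executable = /usr/local/bin/my_analysis
  arguments  = input_001.dat
  output     = job_001.out
  error      = job_001.err
  log        = job_001.log
  queue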

Computation on Supercomputers
What is a supercomputer?
- Machines with lots of shared memory and a large number of symmetric multiprocessors
- Also large farms with low-latency interconnects...
Applications are tailored to a specific supercomputer, class or hardware:
- Hardware-optimized applications
- Massively parallel jobs on large SMP machines
- Also MPI (Message Passing Interface) applications, which treat a cluster with fast interconnects as an SMP machine

Storage Devices
Many hardware technologies are deployed, ranging from:
- Single fileservers: a Linux box with lots of disk (RAID 5...); typically used for work space and temporary space, a.k.a. local or "tactical" storage
to:
- Large-scale Mass Storage Systems: large peta-scale disk plus tape-robot systems, e.g. FNAL's Enstore MSS with a dCache disk frontend and a StorageTek Powderhorn tape silo backend; typically used as permanent, "strategic" storage

Tactical Storage
Typical hardware components:
- Servers: Linux, RAID controllers...
- Disk arrays: IDE, SCSI or Fibre Channel attached; RAID levels 5, 0, 50, 1...
Local access:
- Volumes mounted across the compute cluster: NFS, GPFS, AFS...
- Volume virtualization: dCache, pnfs
Remote access:
- GridFTP: globus-url-copy
- The SRM interface: space reservation, request scheduling
(Diagram: the worker nodes mount the I/O node's volumes, e.g. /share/DATA = nfs:/tmp1 and /share/TMP = nfs:/tmp2.)
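For the remote-access case, a GridFTP transfer with globus-url-copy looks roughly like this (the hostname and paths are made up for illustration):

  # copy a file from a remote GridFTP server into the site's tactical storage
  globus-url-copy gsiftp://se.example.edu/data/results.root file:///share/DATA/results.root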

Layout of a Typical Grid Site
Computing fabric + grid middleware (globus) + grid-level services => a grid site
A site exposes a Compute Element, a Storage Element, a User Interface, an authz server and a monitoring element.
The grid-level services include monitoring clients and services, data management services and grid operations.

World Grid Resources
TeraGrid + OSG + EGEE sites

National Grid Infrastructure
The Open Science Grid & The TeraGrid

Grid Resources in the US

The OSG
Origins:
- National grid projects (iVDGL, GriPhyN, PPDG) and the LHC software & computing projects
Current compute resources:
- 61 Open Science Grid sites
- Connected via Internet2, NLR... from 10 Gbps down to 622 Mbps
- Compute & Storage Elements
- All are Linux clusters
- Most are shared: campus grids, local non-grid users
- More than 10,000 CPUs, with a lot of opportunistic usage; the total computing capacity is difficult to estimate, and the same goes for storage

The TeraGrid
Origins:
- National supercomputing centers, funded by the National Science Foundation
Current compute resources:
- 9 TeraGrid sites
- Connected via dedicated multi-Gbps links
- Mix of architectures: ia64 and ia32 (Linux), Cray XT3, Alpha (Tru64), SGI SMPs
- Resources are dedicated, but grid users share them with local users
- 1000s of CPUs, > 40 TeraFlops
- 100s of TeraBytes

The Open Science Grid
A consortium of universities and national laboratories building a sustainable grid infrastructure for science in the U.S. ...

The Open Science Grid
(Diagram: user communities (the HEP/CMS VO, the astronomy/SDSS VO, the astrophysics/LIGO VO and the nanotech/nanoHub community) connect through VO support centers, RP support centers and OSG Operations to OSG resource providers such as the FNAL and BNL clusters, Tier2 sites and the UW campus grid of departmental clusters.)
Virtual Organization (VO): an organization composed of institutions, collaborations and individuals that share a common interest, applications or resources. VOs can be both consumers and providers of grid resources.

The OSG: A High-Level View
- Grid software and environment deployment: OSG Provisioning
- Authorization, accounting and authentication: OSG Privilege
- Grid monitoring and information systems: OSG Monitoring and Information
- Grid operations and user & facilities support: OSG Operations

OSG Authentication, Authorization & Accounting ("Authz")

Authentication & Authorization
Authentication: verify that you are who you say you are
- OSG users typically use the DOEGrids CA
- OSG sites also accept CAs from LCG and other organizations, including TeraGrid
Authorization: allow a particular user to use a particular resource
- A method based on flat files (the gridmap-file)
- The Privilege method, used primarily at US-LHC sites
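For reference, the basic authentication step on an OSG user interface is just the standard Globus proxy commands (a sketch; both commands ship with the VDT client tools):

  # create a short-lived proxy certificate from your user certificate and key
  grid-proxy-init
  # inspect the proxy: subject DN, strength, remaining lifetime
  grid-proxy-info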

OSG Authentication (1)
The gridmap-file:
- A physical mapping of a user's certificate Distinguished Name (DN) to a local Unix account, for example:

  "/C=CH/O=CERN/OU=GRID/CN=Laurence Field 3171" ivdgl
  "/C=CH/O=CERN/OU=GRID/CN=Michela Biglietti 4798" usatlas1
  "/C=CH/O=CERN/OU=GRID/CN=Shulamit Moed 9840" usatlas1
  "/C=ES/O=DATAGRID-ES/O=PIC/CN=Andreu Pacheco Pages" cdf
  "/C=IT/O=INFN/OU=Personal Certificate/L=CNAF/CN=Subir Sarkar" cdf
  "/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Ignazio Lazzizzera" cdf
  "/C=IT/O=INFN/OU=Personal Certificate/L=Pisa/CN=Armando Fella" cdf
  "/C=IT/O=INFN/OU=Personal Certificate/L=Roma 1/CN=Daniel Jeans" cdf
  "/C=UK/O=eScience/OU=Birmingham/L=ParticlePhysics/CN=carlo nicola colacino" ligo
  "/C=UK/O=eScience/OU=Birmingham/L=ParticlePhysics/CN=chris messenger" ligo
  "/C=UK/O=eScience/OU=Birmingham/L=ParticlePhysics/CN=virginia re" ligo
  ...

OSG Authentication (2)
(Diagram: each grid site runs edg-mkgridmap.sh, which pulls member DNs from the VOs' VOMS servers, e.g. vomss://lcg-voms.cern.ch:8443/voms/cms for the CMS VOMS, vomss://voms.fnal.gov:8443/voms/nanohub for the nanoHub VOMS and vomss://grid03.uits.indiana.edu.…/ivdglpl for the OSG VOMS, and writes the collected user DNs into the site's gridmap-file.)
VOMS = Virtual Organization Management System
DN = Distinguished Name
edg = European DataGrid (an EU grid project)
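For a sense of how that synchronization is configured, a site's edg-mkgridmap configuration can look roughly like the sketch below; the local account names are illustrative and the exact URI syntax varies with the edg-mkgridmap version:

  # edg-mkgridmap.conf (sketch): one "group" line per VOMS source, mapped to a local account
  group vomss://lcg-voms.cern.ch:8443/voms/cms uscms01
  group vomss://voms.fnal.gov:8443/voms/nanohub nanohub
  # merge in locally maintained, hand-edited mappings
  gmf_local /etc/grid-security/local-grid-mapfile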

The Privilege Project
An advanced authorization mechanism: an application of a Role Based Access Control model for the OSG

The Privilege Project Provides
A more flexible way to assign DNs to local UNIX qualifiers (uid, gid...):
- VOMS servers are still used to store grid identities
- But gone are the static gridmap-files
- voms-proxy-init replaces grid-proxy-init
It allows a user to specify a role along with their unique ID:
- Access rights are granted based on the user's VO membership and the user-selected role(s)
Grid identity to Unix ID: the certificate DN plus role(s) is mapped to a local UID.
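In practice the user's side of this is a single command; a sketch (the VO and role names below are illustrative, echoing the USCMS/Role=prod example on the next slide):

  # old way: plain proxy, no VO attributes
  grid-proxy-init
  # Privilege way: VOMS proxy carrying VO membership and a selected role
  voms-proxy-init -voms uscms:/uscms/Role=prod
  # inspect the VO attributes embedded in the proxy
  voms-proxy-info -all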

Privilege Project Components
(Diagram: a client/UI machine, a VO service and an OSG site. The VO service hosts the VOMS server and VOMS attribute repository, kept in sync with the user management service (VOMRS). The OSG site runs its gridFTP and gatekeeper servers (VDT > 1.3, based on GT3.2) with a PRIMA module and a gridmap callout, plus a web-service container hosting GUMS, the identity mapping service that manages user accounts on resources, including dynamic allocation.)
1. voms-proxy-init request with a specified role (the client tool for role selection)
2. The VOMS server retrieves the VO membership and role attributes
3. A standard globus-job-run request is made with the VOMS-extended proxy
4. HTTPS/SOAP request with a SAML query: may user "Markus Lorch" of "VO=USCMS / Role=prod" access this resource?
5. HTTPS/SOAP response with a SAML statement: Decision=Permit, with the obligation local UID=xyz, GID=xyz
6. The job-manager is instantiated

OSG Grid Monitoring

OSG Grid Monitoring
(Diagram: at the site level, monitoring information providers such as stor_stat, job_state, Ganglia, the GIP and others feed a monitoring-information database collector built on the MIS-Core infrastructure, exposed through MDS, GRAM (jobman-mis), https and web services (GINI, SOAP, WSDL...). At the grid level, clients such as GridCat, the MonALISA server, the MonALISA discovery service and ACDC consume this information through a consumer API and a historical-information database.)

OSG MDS: GIP and BDII
The Generic Information Provider (GIP):
- Collects & formats information for a site's GRIS
- Integrated with the other OSG MIS systems
The Berkeley Database Information Index (BDII):
- An LDAP repository of GLUE-schema information collected from each site's GRIS
- GRIS is part of Globus' MDS information system
- Enables interoperability between the OSG & EGEE grids
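Because the BDII is just an LDAP server publishing GLUE-schema data, it can be queried with standard LDAP tools. A sketch (the hostname is made up; the port and base DN follow the usual BDII conventions):

  # list the Compute Elements known to a BDII (hypothetical host)
  ldapsearch -x -LLL -H ldap://bdii.example.org:2170 \
      -b mds-vo-name=local,o=grid '(objectClass=GlueCE)' GlueCEUniqueID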

OSG Grid-Level Clients
These tools provide basic information about OSG resources:
- Resource catalog: the official tally of OSG sites
- Resource discovery: what services are available, where they are and how to access them
- Metrics information: usage of resources over time
They are used to assess scheduling priorities:
- Where and when should I send my jobs?
- Where can I put my output?
They are also used to monitor the health and status of the Grid.

GridCat
Functions as:
- The OSG site catalog
- Basic site functionality tests

MonALISA

OSG Provisioning: Grid Middleware & Environment Deployment
The OSG software cache and the OSG meta-packager

The OSG Environment
Provides access to the grid middleware ($GRID):
- On the gatekeeper node via shared space
- On the worker node's local disk via wn-client.pacman
OSG "tactical" or local storage directories:
- $APP: global, where you install applications
- $DATA: global, a staging area for job output
- SITE_READ/SITE_WRITE: global, but on a Storage Element at the site
- $WN_TMP: local to the worker node, available to the job
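A minimal sketch of how a job might use these directories (the VO subdirectory, application name and file names are hypothetical; the variables are those advertised in the site's OSG environment):

  #!/bin/sh
  # run an application previously installed under $APP, using $WN_TMP as scratch,
  # then stage the result to $DATA for later pickup
  cd $WN_TMP
  $APP/myvo/bin/my_analysis input.dat > result.out
  cp result.out $DATA/myvo/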

The OSG Software Cache
Most of the software comes from the Virtual Data Toolkit (VDT). The OSG components include:
- VDT configuration scripts
- Some OSG-specific packages too
Pacman is the OSG meta-packager:
- This is how the entire cache is delivered to resource providers

What is the VDT?
A collection of software:
- Grid software
- Virtual data software
- Utilities
An easy installation mechanism:
- Goal: push a button and everything just works
- Two methods: Pacman (installs and configures it all) and RPM (installs some of the software, but with no configuration)
A support infrastructure:
- Coordinated bug fixing
- A help desk

What is in the VDT? (A lot!)
- Condor Group: Condor/Condor-G, DAGMan, Fault Tolerant Shell, ClassAds, NeST
- Globus (pre-WS & GT4 WS): job submission (GRAM), information service (MDS), data transfer (GridFTP), Replica Location (RLS)
- EDG & LCG: Make Gridmap, certificate revocation list updater, GLUE & generic information provider, VOMS
- ISI & UC: Chimera & Pegasus
- NCSA: MyProxy, GSI-OpenSSH, UberFTP
- LBL: PyGlobus, NetLogger, DRM
- Caltech: MonALISA, jClarens (WSR)
- VDT: VDT System Profiler, configuration software
- US LHC: GUMS, PRIMA
- Others: KX509 (U. Mich.), Java SDK (Sun), Apache HTTP/Tomcat, MySQL
- Optional packages: Globus-Core {build}, Globus job-manager(s)
Together these provide the core software for the User Interface, Computing Element, Storage Element, authz system and monitoring system.

Pacman
Pacman is:
- a software environment installer (or meta-packager)
- a language for defining software environments
- an interpreter that allows creation, installation, configuration, update, verification and repair of installation environments
- and it takes care of dependencies
Pacman makes installation of all types of software easy, whatever the native packaging: LCG/SCRAM, ATLAS/CMT, Globus/GPT, NorduGrid/RPM, LIGO/tar/make, D0/UPS-UPD, CMS DPE/tar/make, NPACI/TeraGrid/tar/make, open-source and commercial tar/make...

  % pacman -get OSG:CE

It enables us to easily and coherently combine and manage software from arbitrary sources (ATLAS, NPACI, D-Zero, iVDGL, UCHEP, VDT, CMS/DPE, LIGO), and it enables remote experts to define installation and configuration updates for everyone at once.

Pacman Installation
1. Download Pacman
2. Install the "package":

  cd <installation directory>
  pacman -get OSG:OSG_CE_0.2.1
  ls
  condor/  edg/  ftsh/  globus/  gpt/  monalisa/  perl/  post-install/  replica/  vdt/
  setup.sh  setup.csh  vdt-install.log  ...

OSG Operations

Grid Operations
Monitoring and maintaining the health of the Grid:
- User support, application support and VO issues
- Monitoring the Grid's status: use of grid monitors and verification routines
- Reporting, routing and tracking problems and their resolution: a trouble-ticket system
- A repository of resource contact information
All of this is done as part of a nationally distributed system.

Operations Model in OSG

Ticket Routing in OSG
(The flow spans the OSG infrastructure and the support centers' (SC) private infrastructure.)
1. A user in VO1 notices a problem at RP3 and notifies their SC.
2. SC-C opens a ticket and assigns it to SC-F.
3. SC-F gets an automatic notice.
4. SC-F contacts RP3.
5. The admin at RP3 fixes the problem and replies to SC-F.
6. SC-F notes the resolution in the ticket.
7. SC-C gets an automatic notice of the update to the ticket.
8. SC-C notifies the user of the resolution.
9. The user confirms the resolution.
10. SC-C closes the ticket.
11. SC-F gets an automatic notice of the closure.
12. SC-F notifies RP3 of the closure.

OSG Integration Test Bed
A grid for development of the OSG. You will use ITB sites in the exercises today.
(Screenshot: the FIUPG site on the OSG)

The TeraGrid
"The world's largest collection of supercomputers"
Slides courtesy of Jeffrey Gardner & Charlie Catlett

TeraGrid: A High-Level View
- Grid software and environment deployment: CTSS
- Authorization, accounting and authentication: TG allocation and accounting
- Grid monitoring and information systems: MDS4 & Inca
- User & facilities support: help desk/portal and ASTA

TeraGrid Allocation & Accounting

TeraGrid Allocation
Researchers request an "allocation of resources" through a formal process:
- The process works much like submitting an NSF grant proposal
- There are eligibility requirements: US faculty member or researcher at a non-profit organization; the Principal Investigator submits a CV; more...
- The proposal includes a description of the research, its requirements, etc.
- The proposal is peer reviewed by allocation committees: DAC (Development Allocation Committee), MRAC (Medium Resource Allocation Committee) and LRAC (Large Resource Allocation Committee)

Authentication, Authorization & Accounting
TG authentication & authorization is automatic:
- User accounts are created when an allocation is granted
- Resources can be accessed through ssh (via password or ssh keys) or grid access (via the GSI mechanism: grid-mapfile, proxies...)
- Accounts are created across TG sites for the users in the allocation
The accounting system is oriented towards TG Allocation Service Units (ASU):
- The accounting system is well defined and monitored closely
- Each TG site is responsible for its own accounting

TeraGrid Monitoring and Validation

TeraGrid and MDS4
Information providers:
- Collect information from various sources: the local batch system (Torque, PBS), cluster monitoring (Ganglia, Clumon...)
- Emit XML in a standard schema (attribute-value pairs)
The information is collected into a local Index service, and a global TG-wide Index collector is exposed through WebMDS.
(Diagram: each site's GT4 container runs WS-GRAM and an MDS4 Index fed by PBS/Clumon or Torque/Ganglia; the TG-wide Index feeds WebMDS, browsers and applications.)

Inca: TeraGrid Monitoring
Inca is a framework for the automated testing, benchmarking and monitoring of Grid resources:
- Periodic scheduling of information gathering
- Collects and archives site status information
- Site validation & verification: checks site services & deployment, checks the software stack & environment
- Inca can also take site performance measurements

TeraGrid Grid Middleware & Software Environment

The TeraGrid Environment
- SoftEnv: all software on the TG can be accessed via keys defined in $HOME/.soft
- The SoftEnv system is user configurable
- The environment can also be accessed at run time for WS-GRAM jobs
- You will be interacting with SoftEnv during the exercises later today
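A rough sketch of what a $HOME/.soft file can look like; the specific key names are illustrative, and the keys actually available on a given TG site can be listed with the softenv command:

  # $HOME/.soft: each line adds a software key or macro to the user's environment
  @default          # the site's default macro
  +globus-4.0       # hypothetical key adding the Globus Toolkit 4.0 client tools
  # after editing, run `resoft` (or log in again) to pick up the changes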

TeraGrid Software: CTSS
CTSS: the Coordinated TeraGrid Software Service
- A suite of software packages that includes the Globus Toolkit, Condor-G, MyProxy, OpenSSH...
- Installed at every TG site

TeraGrid User & Facility Support
The TeraGrid help desk:
- A central location for user support
- Routing of trouble tickets
The TeraGrid portal:
- The user's view of TG resources, allocations...
- Access to the documentation!

TeraGrid's ASTA Program
ASTA: Advanced Support for TeraGrid Applications
- Helps application scientists make use of TG resources
- Associates one or more TG staff with the application scientists: a sustained effort, a minimum of 25% FTE
- Goal: maximize the effectiveness of the application software & TeraGrid resources

Topics Not Covered
Managed storage, grid scheduling and more

Managing Storage
Problems:
- There is no really good way to control the movement of files into and out of a site: data is staged by fork processes, and anyone with access to the site can submit such a request and swamp the server
- There is also no space allocation control: a grid user can dump files of any size on a resource, and if users do not clean up, the system administrators have to intervene
- These problems can easily overwhelm a resource

Managing Storage
A solution: SRM (Storage Resource Manager), a grid-enabled interface for putting data on a site:
- Provides scheduling of data transfer requests
- Provides reservation of storage space
Technologies in the OSG pipeline:
- dCache/SRM (disk cache with SRM): provided by DESY & FNAL; SE(s) available to OSG as a service from the USCMS VO
- DRM (Disk Resource Manager): provided by LBL; can be added on top of a normal UNIX file system

  $> globus-url-copy srm://ufdcache.phys.ufl.edu/cms/foo.rfz \
       gsiftp://cit.caltech.edu/data/bar.rfz

Grid Scheduling
The problem: with job submission, this still happens!
(Diagram: a user at a User Interface with the VDT client must pick among Grid Site A, Grid Site B, ... Grid Site X by hand: "Why do I have to do this by hand?")

Grid Scheduling
Possible solutions:
- Sphinx (GriPhyN, UF): workflow-based dynamic planning (late binding), policy-based scheduling; for more details ask Laukik
- Pegasus (GriPhyN, ISI/UC): a DAGMan-based planner and grid scheduler (early binding); more details in the workflow lecture
- Resource Broker (LCG): matchmaker-based grid scheduling, employed by applications running on LCG grid resources

Much, Much More is Needed
- Continue the hardening of middleware and other software components
- Continue the process of federating with other grids: OSG with TeraGrid; OSG with LHC/EGEE, NorduGrid...
- Continue to synchronize the monitoring and information service infrastructure
- Improve documentation

Conclude with a Simple Example
1. Log on to a User Interface.
2. Get your grid proxy ("log on to the grid"): grid-proxy-init
3. Check the OSG MIS clients:
   - to get the list of available sites (depends on your VO affiliation)
   - to discover the site-specific information needed by your job, i.e. available services (hostnames, port numbers) and tactical storage locations ($app, $data, $tmp, $wntmp)
4. Install your application binaries at the selected sites.
5. Submit your jobs to the selected sites via Condor-G (a sketch of a submit file follows below).
6. Check the OSG MIS clients to see if the jobs have completed.
7. Do something like this:

   # if the check in step 6 reported success
   if [ $? -eq 0 ]
   then
     echo "Have a coffee (beer, margarita…)"
   else
     echo "it's going to be a long night"
   fi
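To make step 5 concrete, a minimal Condor-G submit description file might look roughly like this; the gatekeeper hostname, jobmanager and file names are made up for illustration:

  # Condor-G submit file: route the job to a remote OSG gatekeeper (hypothetical host)
  universe      = grid
  grid_resource = gt2 gatekeeper.example.edu/jobmanager-condor
  executable    = my_job.sh
  output        = my_job.out
  error         = my_job.err
  log           = my_job.log
  should_transfer_files   = YES
  when_to_transfer_output = ON_EXIT
  queue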

To Learn More
- The Open Science Grid top-level page
- The TeraGrid top-level page
- The TeraGrid portal
- The Globus website
- The iVDGL website
- The GriPhyN website

The End

Data on the TG
GridFTP is available at all sites:
- Provides GSI on the control and data channels, parallel streams, third-party transfers and striped transfers
- Each TG site has one to several dedicated GridFTP-enabled servers
TeraGrid sites are equipped with various GridFTP clients:
- globus-url-copy: the standard Globus GridFTP client (see lectures)
- uberftp: an interactive GridFTP client; supports GSI authentication and parallel file transfers
- tgcp: a wrapper for globus-url-copy (optimized TCP buffer sizes... parallel streams...); interfaced with RFT (the Reliable File Transfer service), which performs third-party transfers and makes sure files get to their destination (see lectures)
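For illustration, a third-party, multi-stream transfer with the standard client looks roughly like this (the hostnames and paths are made up):

  # transfer a file between two GridFTP servers using 4 parallel streams
  globus-url-copy -p 4 \
      gsiftp://gridftp.site-a.teragrid.org/scratch/input.dat \
      gsiftp://gridftp.site-b.teragrid.org/scratch/input.dat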

Jorge Luis Rodriguez 68 Grid Summer Workshop 2006 June Based on: Building, Monitoring and Maintaining a Grid Jorge Luis Rodriguez University of Florida June 26-30, 2006