NorduGrid Architecture
EDG ATF Meeting, CERN – June 12th 2002
Anders Wäänänen

Overview
- Project overview
- Architecture
- Features
- Future plans and directions

Project Overview
- Launched in spring 2001 with the aim of creating a Grid infrastructure in the Nordic countries
- Partners from Denmark, Norway, Sweden, and Finland
- Meant to be the Nordic branch of the EU DataGrid (EDG) project testbed
- Relies on very limited human resources (3 full-time researchers and a few part-time ones), with funding from NorduNet2

Resources & organization
- 5 dedicated Linux test-clusters (3 to 5 CPUs each), a couple of stand-alone machines, and a couple of "real" production clusters
- Good connectivity provided by the NORDUNet network
- A steering group and a technical working group, 5 persons each
- Most communication is done via the Internet or phone conferences; the technical working group convenes bi-monthly at different sites
- Plenary workshops twice a year (most recently in Helsinki in May)

People in NorduGrid
- Technical group:
  - Aleksandr Konstantinov
  - Balázs Kónya
  - Mattias Ellert
  - Oxana Smirnova
  - Anders Wäänänen
- Application (ATLAS) support:
  - Jakob Langgard

NorduGrid involvement in EDG
- WP2 – GDMP/Replica
- WP3 – MDS, schema
- WP5 – Castor
- WP6 – Integration Team, testbeds, Bugzilla, Globus configuration
- WP8 – User requirements, testing
- WP12 – EDG license
- Certificate Authority, security, authorization
- Testing and bug fixes

The development
- Initial hardware installed by June 2001
- NorduGrid Authentication System put into operation in May 2001
- The first middleware deployed and the sites Grid-enabled by July 2001
- Further Grid services put into operation (November–December 2001):
  - NorduGrid User Management System (Virtual Organization)
  - NorduGrid Information System
  - Grid Data Mirroring Package (GDMP)
  - Data replication catalog
- Deployment & evaluation of the first (Testbed 1) release of the EDG middleware (December 2001 – January 2002)

Philosophy
- Resource owners have full control over their resources
- Installation details should not be dictated
  - Method, OS version, configuration, etc.
- As few restrictions on configuration as possible
  - Compute nodes should not be required to be on the public network
- NorduGrid software should be able to use an existing system and Globus installation
  - Globus RPMs provided

Project Timeline
- June 2001 – Initial hardware installed
- May 2001 – NorduGrid Authentication System put into operation
- July 2001 – The first middleware deployed and the sites Grid-enabled
- November–December 2001 – Further Grid services put into operation:
  - NorduGrid User Management System (Virtual Organization)
  - NorduGrid Information System
  - Grid Data Mirroring Package (GDMP)
  - Data replication catalog
- December 2001 – January 2002 – Deployment & evaluation of the first release of the EDG middleware (Testbed 1)

Facing Reality
- NorduGrid was only an 18-month project, compared to 3 years for the EU DataGrid
- Expected to run the ATLAS Data Challenge on a working Grid testbed in the Nordic countries in May 2002
- Continuing problems with EDG testbed stability
- Architecture problems, with bottlenecks and fragile system components
- The urgent need for something stable and working resulted in the decision to create a new architecture, not necessarily compatible with EDG

A Job Submission Example
[Diagram: the EDG job submission flow – a job described in JDL travels from the UI with its input sandbox through the Resource Broker and Job Submission Service to a Compute Element; the broker consults the Information Service, Replica Catalogue, and Brokerinfo, authorization & authentication are checked, job submit/query/status are tracked via Logging & Bookkeeping, data resides on a Storage Element, and the output sandbox is returned to the UI]

Strategy
- Define a new architecture with stability as the main feature
  - Remove bottlenecks
  - Tune the system to reflect reality
- Implement robust core subsystems using Globus components
- Use existing working subsystems from Globus and the EDG for the missing features, and enhance where needed
- Keep it simple – while functional

NorduGrid task flow
[Diagram of the NorduGrid task flow]

NorduGrid Architecture
- Use Globus components to build a working Grid
- Globus RPM distribution based on Globus 2.0 final for RedHat 7.2
  - Also verified on Slackware and Mandrake
- Use the existing MDS with an improved schema
- Use the GridFTP protocol, with servers and clients built into applications
- Use the existing Replica Catalog to manage data
- Replace most of the Globus resource management
- Rewrite the User Interface, with a broker added

Information Schemas
- New object classes:
  - Clusters
  - Queues
  - Jobs
  - Storage elements
  - Replica catalogs
- Namespace: nordugrid
- Example:
  - nordugrid-cluster (see the query sketch below)
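
Since the information system is the LDAP-based Globus MDS, the new object classes can be browsed with standard LDAP tools. A minimal sketch, assuming a GRIS listening on the usual MDS port 2135 and the conventional mds-vo-name=local base DN (the base DN is an assumption; only the nordugrid-cluster class name and the grid.nbi.dk host appear elsewhere in these slides):

% ldapsearch -x -h grid.nbi.dk -p 2135 \
      -b 'mds-vo-name=local,o=grid' \
      '(objectClass=nordugrid-cluster)'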

NorduGrid MDS
[Diagram of the NorduGrid MDS topology]

MDS Stability
- Problems: instability, hanging/freezing
- Stress testing showed that crashes were more frequent with:
  - Invalid attributes
  - Many and frequent registrants
  - Many and frequent searches
- The GIIS backend was singled out as the culprit
  - Found a bug in a part of the code that is invoked when the MDS holds many entries
  - Never seen by Globus – now reported and fixed
  - Patched versions available from the NorduGrid Web site
- Since patching, no problems found with MDS

Job submission
- The Globus Gatekeeper/Jobmanager interface is still supported
- The Jobmanager is basically bypassed, and resource management is handled by the Grid Manager
- The Grid Manager handles:
  - The interface between the outside world and the local resource management system
  - Downloads and uploads of input/output data
- A GridFTP server with virtual directories is provided as a replacement for the Gatekeeper and Jobmanager (see the sketch below)
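
Because the job interface is a plain GridFTP service, a job's files can also be reached with any GridFTP client. A minimal sketch using the standard Globus client (the virtual-directory layout and the <job-id> placeholder are illustrative assumptions, not taken from these slides):

% globus-url-copy \
      gsiftp://grid.nbi.dk/jobs/<job-id>/out.txt \
      file:///tmp/out.txt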

User interface
- New command-line tools:
  - ngsub, ngstat, ngget, ngkill, … (usage sketch below)
- Queries the MDS and chooses a matching resource at submission time
- Handles uploads and downloads using Globus-supported transfer protocols
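
A minimal usage sketch of these tools, with the job described in the same xRSL syntax shown later in this talk (the exact options and the <job-id> placeholder are assumptions; only the tool names are given in these slides):

% ngsub '&(executable="/bin/echo")(arguments="Hello Grid")(stdout="out.txt")'
% ngstat <job-id>     # query the job status published in the MDS
% ngget <job-id>      # download the output files (here: out.txt)
% ngkill <job-id>     # cancel the job if needed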

Configuration
- Centralized common configuration
  - No need to know all the configuration files for the individual components (e.g. the Globus MDS)
  - Less error-prone and easier to debug
  - Easier for site admins, with only one or two configuration files to worry about
  - Adopts an advanced version of the configuration file used in the EDG
  - Simple attribute=value is not flexible enough
- Use 2 configuration files (Globus and NorduGrid):
  - globus.conf (no NorduGrid information)
  - nordugrid.conf (a hypothetical sketch follows below)
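
The slides do not show nordugrid.conf itself; the following is a purely hypothetical sketch of what a Grid Manager configuration in the sectioned format introduced below could look like (every section and attribute name here is an illustrative assumption):

# hypothetical nordugrid.conf – names are illustrative only
[grid-manager]
controldir=/var/spool/nordugrid    # where the Grid Manager keeps job state
sessiondir=/scratch/grid           # per-job session directories

[queue/default]
lrms=pbs                           # hand jobs to the local PBS batch system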

Old globus.conf

GLOBUS_LOCATION=/opt/globus-beta21
GLOBUS_JOBMANAGERS="fork pbs"
X509_GATEKEEPER_CERT=/etc/grid-security/globus-gatekeeper.cert
X509_GATEKEEPER_KEY=/etc/grid-security/globus-gatekeeper.key
GLOBUS_GRAM_JOB_MANAGER_QDEL=/usr/local/pbs/bin/qdel
GLOBUS_GRAM_JOB_MANAGER_QSTAT=/usr/local/pbs/bin/qstat
GLOBUS_GRAM_JOB_MANAGER_QSUB=/usr/local/pbs/bin/qsub
GRID_INFO_EDG=yes
GRID_INFO_GRIS=no
GRID_INFO_USER=root
GRID_INFO_GRIS_REG_GIIS=Denmark
GRID_INFO_GRIS_REG_HOST=grid.nbi.dk
GRID_INFO_GRIS_REG_PORT=2135
#GRID_INFO_OVERWRITE=no

Common Grid Configuration
The format of the configuration file is:

[section]
Attr1=val1
Attr2=val2

[section/subsection]
Attr3=val3
Attr3=val4
Attr1=val5

[section/subsection/subsubsection/…]
myattr=myval

Globus Configuration
Locations:
- /etc/globus.conf
- $GLOBUS_CONFIG/etc/globus.conf

Examples:

[gatekeeper]
port=12345

[mds/gris]
providers="globus-gris ng"

# Doing multiple GRIS registrations
[mds/gris/registration/NBI]
regname=NBI

[mds/gris/registration/Denmark]
regname=Denmark
reghost=grid.nbi.dk
regport=2136

Globus configuration (2)

# Almost all internal MDS parameters can be set
[mds/giis/Denmark]
Cachetime=120

# Simple GIIS registration
[mds/giis/Copenhagen/registration/Denmark]

# Multiple registrations to sites with the same VO name
[mds/giis/Denmark/registration/site1]
regname=NorduGrid
reghost=grid.nbi.dk

[mds/giis/Denmark/registration/site2]
regname=NorduGrid
reghost=grid.quark.lu.se

[mds/giis/Denmark/registration/site3]
regname=NorduGrid
reghost=grid.uio.no
regperiod=60

NorduGrid Load Monitor
[Screenshot of the NorduGrid load monitor]

NorduGrid Jobs
[Screenshot of the NorduGrid jobs list]

Prepare production scripts

% mkprod
usage: mkprod [events/sample] [start]
% mkprod
Creating partition script : dc simu.0001.nordugrid.xrsl
Creating partition script : dc simu.0002.nordugrid.xrsl
Creating partition script : dc simu.0003.nordugrid.xrsl
…
Creating partition script : dc simu.0019.nordugrid.xrsl
Creating partition script : dc simu.0020.nordugrid.xrsl
%

dc simu.0001.nordugrid.xrsl

&(executable="/$ATLAS_ROOT/bin/atlsim")
 (arguments="-w 0 -b dc1.kumac project=dc1 pgroup=nordugrid step=simu partition=0001 nskip=0 ntrig=100 dset= nset=0017")
 (stdout=out.txt)
 (stderr=err.txt)
 (outputfiles=
   ("out.txt" "")
   ("err.txt" "")
   ("dc simu.0001.nordugrid.zebra" "gsiftp://lscf.nbi.dk/ATLAS/dc1-17/dc simu.0001.nordugrid.zebra")
   ("dc simu.0001.nordugrid.his" "gsiftp://lscf.nbi.dk/ATLAS/dc1-17/dc simu.0001.nordugrid.his"))
 (inputFiles=
   ("atlas.kumac" "
   ("atlsim.makefile" "
   ("atlsim.logon.kumac" "
   ("dc1.kumac" "
   ("dc1.root" ))
 (jobname="dc simu.0001.nordugrid")
 (* 20 hours seem to be enough for 100 events *)
 (MaxCPUTime=1200)
 (* Try to make download faster *)
 (ftpthreads=4)
 (runtimeenvironment=ATLAS-3.0.1)

dc1.kumac

MACRO atlsimrun project=dc1 pgroup=nordugrid step=simu partition=0001 nskip=0 ntrig=2 dset= nset=0016
shell uname -a
lfn = [project].[dset].[step].[partition].[pgroup]
exec atlas#root
Ag/Version batch
gtime
Rung [partition] 1
Ranlux [partition]
ghist [lfn].his
gmake -C .
* - set atlas-standard configuration (inner, center, all)
exec atlas#config ALL
exec atlas#calo_shift 4
mode OUTP simu 2
mode FWDC geom 2
mode HEPE hist 100
* - load dice and rootIO codes, compile field here
make adice.start MagneticField/MagneticFieldAge _
MagneticField/MagneticFieldCore -ladice
make atlprod Database/AthenaRoot/RootTableObjects -lRootKernel
* - select filters etc.
Gvertex
Gspread
TFLT ETAP
* - select I/O
call AguDSET($quote([dset].[nset]))
call AguFILE(1)
* - next line may produce an incorrect error message which should be ignored
gfile u dc1.root E
gfile O [lfn].zebra
skip [nskip]
trig [ntrig]
shell ls -l
RETURN

Future work
- Data management
  - Transparent shared access to data from computing nodes – even during execution
- Authorization and accounting
  - More fine-grained access control to resources
  - Better user separation and access control on files
- Authentication
  - Distributed registration authority – OpenCA
- Continued information system testing
  - Sites or nodes should not be able to affect MDS stability

Conclusion and status
- A working production Grid testbed exists
- Stable information system (MDS)
- Approximately 130 CPUs scattered across Denmark, Norway and Sweden
- First job submitted on March 28
- Runs the ATLAS Data Challenge 1, with the ATLAS software distributed as RPMs
- A live status monitor is available from the Web site

Resources
- Documentation and source code are available for download
- Main Web site:
  - http://www.nordugrid.org/
- Repository:
  - ftp://ftp.nbi.dk/pub/nordugrid/