Slide 1: NorduGrid Architecture
EDG ATF Meeting, CERN – June 12th 2002
Anders Wäänänen

Slide 2: Overview
- Project overview
- Architecture
- Features
- Future plans and directions

Slide 3: Project Overview
- Launched in spring 2001 with the aim of creating a Grid infrastructure in the Nordic countries
- Partners from Denmark, Norway, Sweden, and Finland
- Meant to be the Nordic branch of the EU DataGrid (EDG) project testbed
- Relies on very limited human resources (3 full-time researchers and a few part-time ones), with funding from NorduNet2

Slide 4: Resources & Organization
- 5 dedicated Linux test clusters (3 to 5 CPUs each), a couple of stand-alone machines, and a couple of "real" production clusters
- Good connectivity provided by the NORDUNet network
- A steering group and a technical working group, 5 persons each
- Most communication is done via the Internet or phone conferences; the technical working group convenes bi-monthly at different sites
- Plenary workshops twice a year (most recently in Helsinki in May)

Slide 5: People in NorduGrid
- Technical group
  - Aleksandr Konstantinov
  - Balázs Kónya
  - Mattias Ellert
  - Oxana Smirnova
  - Anders Wäänänen
- Application (ATLAS) support
  - Jakob Langgard

Slide 6: NorduGrid Involvement in EDG
- WP2 – GDMP/Replica
- WP3 – MDS, schema
- WP5 – Castor
- WP6 – Integration Team, testbeds, Bugzilla, Globus configuration
- WP8 – User requirements, testing
- WP12 – EDG license
- Certificate Authority, security, authorization
- Testing and bug fixes

Slide 7: The Development
- Initial hardware installed by June 2001
- NorduGrid authentication system put into operation in May 2001
- First middleware deployed and the sites Grid-enabled by July 2001
- Further Grid services put into operation (November–December 2001):
  - NorduGrid user management system (Virtual Organization)
  - NorduGrid information system
  - Grid Data Mirroring Package (GDMP)
  - Data replication catalog
- Deployment & evaluation of the first (Testbed 1) release of the EDG middleware (December–January)

Slide 8: Philosophy
- Resource owners have full control over their resources
- Installation details should not be dictated
  - Method, OS version, configuration, etc.
- As little restriction on configuration as possible
  - Compute nodes should not be required to be on the public network
- NorduGrid software should be able to use an existing system and Globus installation
  - Globus RPMs provided

Slide 9: Project Timeline
- May 2001 – NorduGrid authentication system put into operation
- June 2001 – Initial hardware installed
- July 2001 – First middleware deployed and the sites Grid-enabled
- November–December 2001 – Further Grid services put into operation:
  - NorduGrid user management system (Virtual Organization)
  - NorduGrid information system
  - Grid Data Mirroring Package (GDMP)
  - Data replication catalog
- December 2001 – January 2002 – Deployment & evaluation of the first release of the EDG middleware (Testbed 1)

Slide 10: Facing Reality
- NorduGrid was only an 18-month project, compared to 3 years for EU DataGrid
- Expected to run the ATLAS Data Challenge on a working Grid testbed in the Nordic countries in May 2002
- Continuing problems with EDG testbed stability
- Architectural problems with bottlenecks and fragile system components
- The urgent need for something stable and working led to the decision to create a new architecture, not necessarily compatible with EDG

Slide 11: A Job Submission Example
[Diagram of the EDG job submission chain: a job description (JDL) goes from the User Interface via the Resource Broker and Job Submission Service to a Compute Element, with input and output "sandboxes"; the Information Service, Replica Catalogue, Storage Element, Logging & Book-keeping, and authorization/authentication services support job submission, status queries, and brokering.]

Slide 12: Strategy
- Define a new architecture with stability as the main feature
  - Remove bottlenecks
  - Tune the system to reflect reality
- Implement robust core subsystems using Globus components
- Use existing working subsystems from Globus and the EDG for the missing features, and enhance where needed
- Keep it simple – while functional

Slide 13: NorduGrid Task Flow
[Diagram slide]

Slide 14: NorduGrid Architecture
- Use Globus components to build a working Grid
- Globus RPM distribution based on Globus 2.0 final for RedHat 7.2
  - Also verified on Slackware and Mandrake
- Use the existing MDS with an improved schema
- Use the GridFTP protocol, with servers and clients built into applications
- Use the existing Replica Catalog to manage data
- Replace most of the Globus resource management
- Rewrite the User Interface with a broker added

Slide 15: Information Schemas
- New object classes:
  - Clusters
  - Queues
  - Jobs
  - Storage elements
  - Replica catalogs
- Namespace: nordugrid
- Example:
  - nordugrid-cluster
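To make the namespace idea concrete, here is a sketch of what a nordugrid-cluster entry might look like when rendered as LDIF-style attribute lines. The objectclass name and the "nordugrid" prefix come from the slide; the individual attribute names and the base DN used below are illustrative assumptions, not the published NorduGrid schema.

```python
# Render a hypothetical nordugrid-cluster MDS entry as LDIF-like lines.
# Attribute names and the DN layout are assumptions for illustration only.

def cluster_entry(name, attrs):
    """Build LDIF-style lines for one assumed nordugrid-cluster entry."""
    lines = [
        "dn: nordugrid-cluster-name=%s, Mds-Vo-name=local, o=grid" % name,
        "objectclass: nordugrid-cluster",
    ]
    for attr, value in attrs.items():
        # Every attribute lives under the nordugrid-cluster- prefix.
        lines.append("nordugrid-cluster-%s: %s" % (attr, value))
    return "\n".join(lines)

entry = cluster_entry("grid.nbi.dk", {"totalcpus": 5, "lrms-type": "PBS"})
# A real client would obtain such entries from the MDS via an LDAP search.
```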

Slide 16: NorduGrid MDS
[Diagram slide]

Slide 17: MDS Stability
- Problems: stability, hanging/freezing
- Stress testing showed that crashes were more frequent with:
  - Invalid attributes
  - Many and frequent registrants
  - Many and frequent searches
- GIIS backend singled out as the culprit
  - Found a bug in part of the code which is invoked with many entries in the MDS
  - Never seen by Globus – now reported and fixed
  - Patched versions available from the NorduGrid Web site
- Since patching, no problems found with MDS

Slide 18: Job Submission
- The Globus Gatekeeper/jobmanager interface is still supported
- The jobmanager is essentially bypassed, with resource management handled by the Grid Manager
- The Grid Manager handles:
  - The interface between the outside world and the local resource management system
  - Downloads and uploads of input/output data
- A GridFTP server with virtual directories is provided as a replacement for the Gatekeeper and jobmanager

Slide 19: User Interface
- New command line tools:
  - ngsub, ngstat, ngget, ngkill, …
- Queries the MDS and chooses a matching resource when submitting
- Handles upload and download using the transfer protocols supported by Globus

Slide 20: Configuration
- Centralized common configuration
  - No need to know all the configuration files for the individual components (e.g. the Globus MDS)
  - Less error-prone and easier to debug
  - Easier for site admins, with only one or two configuration files to worry about
  - Adopts an advanced version of the configuration file used in the EDG
  - Simple attribute=value is not flexible enough
- Two configuration files (Globus and NorduGrid):
  - globus.conf (no NorduGrid information)
  - nordugrid.conf

Slide 21: Old globus.conf

GLOBUS_LOCATION=/opt/globus-beta21
GLOBUS_JOBMANAGERS="fork pbs"
X509_GATEKEEPER_CERT=/etc/grid-security/globus-gatekeeper.cert
X509_GATEKEEPER_KEY=/etc/grid-security/globus-gatekeeper.key
GLOBUS_GRAM_JOB_MANAGER_QDEL=/usr/local/pbs/bin/qdel
GLOBUS_GRAM_JOB_MANAGER_QSTAT=/usr/local/pbs/bin/qstat
GLOBUS_GRAM_JOB_MANAGER_QSUB=/usr/local/pbs/bin/qsub
GRID_INFO_EDG=yes
GRID_INFO_GRIS=no
GRID_INFO_USER=root
GRID_INFO_GRIS_REG_GIIS=Denmark
GRID_INFO_GRIS_REG_HOST=grid.nbi.dk
GRID_INFO_GRIS_REG_PORT=2135
#GRID_INFO_OVERWRITE=no

Slide 22: Common Grid Configuration

The format of the configuration file is:

[section]
Attr1=val1
Attr2=val2

[section/subsection]
Attr3=val3
Attr3=val4
Attr1=val5

[section/subsection/subsubsection/…]
myattr=myval
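The format above can be parsed with a few lines of code; note that, unlike a plain INI file, the same attribute may repeat within a section, so each attribute naturally maps to a list of values. The following is a minimal parsing sketch to illustrate the format, not the parser the project actually shipped.

```python
# Minimal parser sketch for the [section/subsection] attr=value format shown
# above. Repeated attributes accumulate into a list; comments start with '#'.
# Illustrative only - not the NorduGrid/Globus configuration parser itself.

def parse_grid_conf(text):
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]           # section path, e.g. "mds/gris"
            sections.setdefault(current, {})
        elif "=" in line and current is not None:
            attr, _, val = line.partition("=")
            sections[current].setdefault(attr.strip(), []).append(val.strip())
    return sections

conf = parse_grid_conf("""
[section]
Attr1=val1
Attr2=val2
[section/subsection]
Attr3=val3
Attr3=val4
Attr1=val5
""")
# conf["section/subsection"]["Attr3"] == ["val3", "val4"]
```

Keeping the section name as a single slash-separated path keeps the parser trivial while still letting a consumer walk the hierarchy by splitting on "/".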

Slide 23: Globus Configuration

Locations:
- /etc/globus.conf
- $GLOBUS_CONFIG/etc/globus.conf

Examples:

[gatekeeper]
port=12345

[mds/gris]
providers="globus-gris ng"

# Doing multiple GRIS registrations
[mds/gris/registration/NBI]
regname=NBI

[mds/gris/registration/Denmark]
regname=Denmark
reghost=grid.nbi.dk
regport=2136

Slide 24: Globus Configuration (2)

# Almost all internal MDS parameters can be set
[mds/giis/Denmark]
Cachetime=120

# Simple GIIS registration
[mds/giis/Copenhagen/registration/Denmark]

# Multiple registrations to sites with the same VO name
[mds/giis/Denmark/registration/site1]
regname=NorduGrid
reghost=grid.nbi.dk

[mds/giis/Denmark/registration/site2]
regname=NorduGrid
reghost=grid.quark.lu.se

[mds/giis/Denmark/registration/site3]
regname=NorduGrid
reghost=grid.uio.no
regperiod=60

Slide 25: NorduGrid Load Monitor
[Screenshot slide]

Slide 26: NorduGrid Jobs
[Screenshot slide]

Slide 27: Prepare Production Scripts

% mkprod
usage: mkprod [events/sample] [start]
% mkprod 2000 100
Creating partition script 0001 1 - 100 : dc1.000017.simu.0001.nordugrid.xrsl
Creating partition script 0002 101 - 200 : dc1.000017.simu.0002.nordugrid.xrsl
Creating partition script 0003 201 - 300 : dc1.000017.simu.0003.nordugrid.xrsl
…
Creating partition script 0019 1801 - 1900 : dc1.000017.simu.0019.nordugrid.xrsl
Creating partition script 0020 1901 - 2000 : dc1.000017.simu.0020.nordugrid.xrsl
%
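The bookkeeping mkprod performs above can be sketched in a few lines: split a total event count into fixed-size partitions and derive each partition's event range and xRSL script name. The naming pattern is read off the slide's output; mkprod's real argument handling is not shown there, so the interpretation of the two arguments (2000 total events, 100 events per partition) is an assumption.

```python
# Sketch of mkprod-style partition bookkeeping (illustrative, not the real
# mkprod script). Splits total_events into equal partitions and derives the
# per-partition xRSL file name seen in the slide's output.

def make_partitions(total_events, events_per_partition,
                    dset="000017", step="simu", pgroup="nordugrid"):
    """Return (partition_number, first_event, last_event, xrsl_name) tuples."""
    partitions = []
    n = total_events // events_per_partition
    for i in range(1, n + 1):
        first = (i - 1) * events_per_partition + 1
        last = i * events_per_partition
        # Name pattern copied from the slide: dc1.<dset>.<step>.<NNNN>.<pgroup>.xrsl
        name = "dc1.%s.%s.%04d.%s.xrsl" % (dset, step, i, pgroup)
        partitions.append((i, first, last, name))
    return partitions

parts = make_partitions(2000, 100)
# parts[0]  == (1, 1, 100, "dc1.000017.simu.0001.nordugrid.xrsl")
# parts[-1] == (20, 1901, 2000, "dc1.000017.simu.0020.nordugrid.xrsl")
```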

Slide 28: dc1.000017.simu.0001.nordugrid.xrsl

&(executable="/$ATLAS_ROOT/bin/atlsim")
(arguments="-w 0 -b dc1.kumac project=dc1 pgroup=nordugrid step=simu partition=0001 nskip=0 ntrig=100 dset=000017 nset=0017")
(stdout=out.txt)(stderr=err.txt)
(outputfiles=
  ("out.txt" "")
  ("err.txt" "")
  ("dc1.000017.simu.0001.nordugrid.zebra" "gsiftp://lscf.nbi.dk/ATLAS/dc1-17/dc1.000017.simu.0001.nordugrid.zebra")
  ("dc1.000017.simu.0001.nordugrid.his" "gsiftp://lscf.nbi.dk/ATLAS/dc1-17/dc1.000017.simu.0001.nordugrid.his"))
(inputFiles=
  ("atlas.kumac" "http://www.nbi.dk/~waananen/atlas.kumac")
  ("atlsim.makefile" "http://www.nbi.dk/~waananen/atlsim.makefile")
  ("atlsim.logon.kumac" "http://www.nbi.dk/~waananen/atlsim.logon.kumac")
  ("dc1.kumac" "http://www.nbi.dk/~waananen/dc1.kumac")
  ("dc1.root" "rc://@grid.uio.no:389/lc=ATLAS,rc=NorduGrid,dc=nordugrid,dc=org/gen0017_1.root"))
(jobname="dc1.000017.simu.0001.nordugrid")
(notify="waananen@nbi.dk")
(* 20 hours seem to be enough for 100 events *)
(MaxCPUTime=1200)
(* Try to make download faster *)
(ftpthreads=4)
(runtimeenvironment=ATLAS-3.0.1)

Slide 29: dc1.kumac

MACRO atlsimrun project=dc1 pgroup=nordugrid step=simu partition=0001 nskip=0 ntrig=2 dset=000016 nset=0016
shell uname -a
lfn = [project].[dset].[step].[partition].[pgroup]
exec atlas#root
Ag/Version batch
gtime 0 1000 1
Rung [partition] 1
Ranlux [partition]
ghist [lfn].his
gmake -C.
* - set atlas-standard configuration (inner, center, all)
exec atlas#config ALL
exec atlas#calo_shift 4
mode OUTP simu 2
mode FWDC geom 2
mode HEPE hist 100
* - load dice and rootIO codes, compile field here
make adice.start MagneticField/MagneticFieldAge _ MagneticField/MagneticFieldCore -ladice
make atlprod Database/AthenaRoot/RootTableObjects -lRootKernel
* - select filters etc.
Gvertex -0.015 -0.15 0.56
Gspread 0.0015 0.0015 5.6
TFLT ETAP -6.0 6.0 0.0 6.3
* - select I/O
call AguDSET($quote([dset].[nset]))
call AguFILE(1)
* - next line may produce an incorrect error message which should be ignored
gfile u dc1.root E
gfile O [lfn].zebra
skip [nskip]
trig [ntrig]
shell ls -l
RETURN

Slide 30: Future Work
- Data management
  - Transparent shared access to data from computing nodes, even during execution
- Authorization and accounting
  - More fine-grained access control to resources
  - Better user separation and access control on files
- Authentication
  - Distributed registration authority (OpenCA)
- Continued information system testing
  - Sites or nodes should not be able to affect MDS stability

Slide 31: Conclusion and Status
- A working production Grid testbed exists
- Stable information system (MDS)
- Approximately 130 CPUs scattered across Denmark, Norway and Sweden
- First job submitted on March 28
- Runs the ATLAS Data Challenge 1, with ATLAS software distributed as RPMs
- Live status monitor available from the Web site: http://www.nordugrid.org/

Slide 32: Resources
- Documentation and source code are available for download
- Main Web site:
  - http://www.nordugrid.org/
- Repository:
  - ftp://ftp.nbi.dk/pub/nordugrid/
