NorduGrid: the light-weight Grid solution LCSC 2003 Linköping, October 23, 2003 Oxana Smirnova

2 Some facts
• NorduGrid is:
  – A Globus-based Grid middleware solution for Linux clusters
  – A large international 24/7 production-quality Grid facility
  – A resource routinely used by researchers since summer 2002
  – Freely available software
  – A project under active development
• NorduGrid is NOT:
  – Derived from other Grid solutions (e.g. EU DataGrid)
  – An application-specific tool
  – A testbed anymore
  – A finalized solution

3 Some history
• Initiated by several Nordic universities
  – Copenhagen, Lund, Stockholm, Oslo, Bergen, Helsinki
• Started in January 2001
  – Initial budget: 2 years, 3 new positions
  – Initial goal: to deploy EU DataGrid middleware to run the "ATLAS Data Challenge"
• Cooperation with EU DataGrid
  – Common Certification Authority and Virtual Organization tools, Globus2 configuration
  – Common applications (high-energy physics research)
• Switched from deployment to R&D in February 2002
  – Forced by the necessity to execute the "ATLAS Data Challenges"
  – Deployed a light-weight yet reliable and robust Grid solution in time for the ATLAS DC tests in May 2002
• Will continue for 4-5 more years (and more?)
  – Forms the "North European Grid Federation" together with the Dutch Grid, Belgium and Estonia
  – Will provide middleware for the "Nordic Data Grid Facility"
  – ...as well as for the Swedish Grid facility SWEGRID, the Danish Center for Grid Computing, Finnish Grid projects, etc.

4 The resources
• Almost everything the Nordic academics can provide (ca. 1000 CPUs in total):
  – 4 dedicated test clusters (3-4 CPUs)
  – Some junkyard-class second-hand clusters (4 to 80 CPUs)
  – A few university production-class facilities (20 to 60 CPUs)
  – Two world-class clusters in Sweden, listed in the Top500 (238 and 398 CPUs)
• Other resources come and go
  – Canada, Japan – test set-ups
  – CERN, Dubna – clients
  – It's open so far: anybody can join or leave
  – Number of other installations unknown
• People:
  – The "core" team keeps growing
  – Local sysadmins are only called upon when users need an upgrade

5 A snapshot

6 NorduGrid specifics
1. It is stable by design:
   a) The nervous system: a distributed yet stable Information System (Globus MDS 2.2 + patches)
   b) The heart(s): the Grid Manager, the service installed on master nodes (based on Globus, replaces GRAM)
   c) The brain(s): the User Interface, the client/broker that can be installed anywhere as a standalone module (makes use of Globus)
2. It is light-weight, portable and non-invasive:
   a) Resource owners retain full control; the Grid Manager is effectively just another user (with many faces, though)
   b) Nothing has to be installed on worker nodes
   c) No requirements w.r.t. OS, resource configuration, etc.
   d) Clusters need not be dedicated
   e) Runs on top of an existing Globus installation (e.g. VDT)
   f) Works with any Linux flavor, Solaris, Tru64
3. Strategy: start with something simple that works for users and add functionality gradually

7 How does it work?
• The Information System knows everything
  – Substantially re-worked and patched Globus MDS
  – Distributed and multi-rooted
  – Allows for a pseudo-mesh topology
  – No need for a centralized broker
• The server (the "Grid Manager") on each gatekeeper does most of the job
  – Pre- and post-stages files
  – Interacts with the LRMS
  – Keeps track of job status
  – Cleans up the mess
  – Sends mails to users
• The client (the "User Interface") does the brokering, Grid job submission, monitoring, termination, retrieval, cleaning, etc.
  – Interprets the user's job task
  – Gets the testbed status from the Information System
  – Forwards the task to the best Grid Manager
  – Does some file uploading, if requested

8 Information System
• Uses Globus MDS 2.2
  – Soft-state registration allows creation of any dynamic structure
  – Multi-rooted tree
  – GIIS caching is not used by the clients
  – Several patches and bug fixes are applied
• A new schema was developed to describe clusters
  – Clusters are expected to be fairly homogeneous
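Because the Information System is essentially patched Globus MDS, i.e. an LDAP service, it can be inspected with standard LDAP tools. A minimal query sketch is shown below; the host name is hypothetical, and the port, base DN and attribute names are assumptions based on common MDS/NorduGrid conventions rather than taken from this talk:

  # Query a NorduGrid information server for basic cluster data (illustrative only)
  ldapsearch -x -H ldap://giis.example.org:2135 \
      -b 'Mds-Vo-name=NorduGrid,o=grid' \
      '(objectClass=nordugrid-cluster)' \
      nordugrid-cluster-name nordugrid-cluster-totalcpus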

9 Front-end and the Grid Manager
• The Grid Manager replaces Globus GRAM, while still using Globus Toolkit 2 libraries
• All transfers are made via GridFTP
• Added the possibility to pre- and post-stage files, optionally using Replica Catalog information
• Caching of pre-staged files is enabled
• Runtime environment support

Summary of Grid services on the front-end machine
• GridFTP server
  – Plugin for job submission via a virtual directory
  – Conventional file access with Grid access control
• LDAP server for information services
• Grid Manager

The User Interface
• Provides a set of utilities to be invoked from the command line:
    ngsub    – submit a task
    ngstat   – obtain the status of jobs and clusters
    ngcat    – display the stdout or stderr of a running job
    ngget    – retrieve the result from a finished job
    ngkill   – cancel a job request
    ngclean  – delete a job from a remote cluster
    ngrenew  – renew the user's proxy
    ngsync   – synchronize the local job info with the MDS
    ngcopy   – transfer files to, from and between clusters
    ngremove – remove files
• Contains a broker that polls the MDS and decides to which queue at which cluster a job should be submitted
  – The user must be authorized to use the cluster and the queue
  – The cluster's and queue's characteristics must match the requirements specified in the xRSL string (max CPU time, required free disk space, installed software, etc.)
  – If the job requires a file that is registered in a Replica Catalog, the brokering gives priority to clusters where a copy of the file is already present
  – From all queues that fulfill the criteria, one is chosen randomly, with a weight proportional to the number of free CPUs available for the user in each queue
  – If there are no available CPUs in any of the queues, the job is submitted to the queue with the lowest number of queued jobs per processor
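A typical command-line session built from these utilities might look like the sketch below; the xRSL string is a toy example, and the job identifier format (a gsiftp URL on the chosen cluster) as well as the exact invocation details are assumptions, not taken from this talk:

  # Submit a trivial job described inline in xRSL; ngsub prints a job identifier
  ngsub '&(executable="/bin/echo")(arguments="hello grid")(stdout="hello.out")(CpuTime=10)'
  # Use the returned identifier to follow the job and fetch its output
  ngstat gsiftp://some-cluster.example.org:2811/jobs/12345
  ngcat  gsiftp://some-cluster.example.org:2811/jobs/12345    # stdout of the running job
  ngget  gsiftp://some-cluster.example.org:2811/jobs/12345    # retrieve results when finished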

Job Description: extended Globus RSL (xRSL)

(&(executable="recon.gen.v5.NG")
  (arguments="dc lumi hlt.pythia_jet_17.zebra"
             "dc lumi02.recon hlt.pythia_jet_17.eg7.602.ntuple"
             "eg7.602.job" "999")
  (stdout="dc lumi02.recon hlt.pythia_jet_17.eg7.602.log")
  (stdlog="gridlog.txt")(join="yes")
  (|(&(|(cluster="farm.hep.lu.se")(cluster="lscf.nbi.dk")
        (*cluster="seth.hpc2n.umu.se"*)(cluster="login-3.monolith.nsc.liu.se"))
      (inputfiles=
        ("dc lumi hlt.pythia_jet_17.zebra"
         "rc://grid.uio.no/lc=dc1.lumi ,rc=NorduGrid,dc=nordugrid,dc=org/zebra/dc lumi hlt.pythia_jet_17.zebra")
        ("recon.gen.v5.NG" "...")
        ("eg7.602.job" "...")
        ("noisedb.tgz" "...")))
    (inputfiles=
      ("dc lumi hlt.pythia_jet_17.zebra"
       "rc://grid.uio.no/lc=dc1.lumi ,rc=NorduGrid,dc=nordugrid,dc=org/zebra/dc lumi hlt.pythia_jet_17.zebra")
      ("recon.gen.v5.NG" "...")
      ("eg7.602.job" "...")))
  (outputFiles=
    ("dc lumi02.recon hlt.pythia_jet_17.eg7.602.log"
     "rc://grid.uio.no/lc=dc1.lumi02.recon ,rc=NorduGrid,dc=nordugrid,dc=org/log/dc lumi02.recon hlt.pythia_jet_17.eg7.602.log")
    ("histo.hbook"
     "rc://grid.uio.no/lc=dc1.lumi02.recon ,rc=NorduGrid,dc=nordugrid,dc=org/histo/dc lumi02.recon hlt.pythia_jet_17.eg7.602.histo")
    ("dc lumi02.recon hlt.pythia_jet_17.eg7.602.ntuple"
     "rc://grid.uio.no/lc=dc1.lumi02.recon ,rc=NorduGrid,dc=nordugrid,dc=org/ntuple/dc lumi02.recon hlt.pythia_jet_17.eg7.602.ntuple"))
  (jobname="dc lumi02.recon hlt.pythia_jet_17.eg7.602")
  (runTimeEnvironment="ATLAS-6.0.2")
  (CpuTime=1440)(Disk=3000)(ftpThreads=10))
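Because several strings in the production example above are truncated in this transcript (the "..." placeholders), a minimal self-contained xRSL description may be easier to read; the file names and the Replica Catalog collection below are hypothetical, while the attributes themselves are the ones used above:

  (&(executable="myprog.sh")
    (arguments="input.dat")
    (inputfiles=("input.dat"
       "rc://grid.uio.no/lc=mydata,rc=NorduGrid,dc=nordugrid,dc=org/input.dat"))
    (outputFiles=("result.dat"
       "rc://grid.uio.no/lc=mydata,rc=NorduGrid,dc=nordugrid,dc=org/result.dat"))
    (stdout="myprog.out")(join="yes")
    (jobname="xrsl-example")
    (runTimeEnvironment="ATLAS-6.0.2")
    (CpuTime=60)(Disk=100))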

Task flow
[Diagram: the User Interface sends the job description (RSL) via GridFTP to the gatekeeper on the cluster front-end, where the Grid Manager takes over; of the candidate clusters A and B, cluster B is chosen in the picture]

Performance
• The main load: "ATLAS Data Challenge 1" (DC1)
  – April 5, 2002: first job submitted
  – May 10, 2002: first pre-DC1 validation job
  – End of May 2002: it became clear that the system was mature enough to run and manage real production
  – DC1, phase 1 (detector simulation):
      Total number of jobs: 1300, ca. 24 hours of processing and 2 GB of input each
      Total output size: 762 GB
      All files uploaded to Storage Elements and registered in the Replica Catalog
  – DC1, phase 2 (pile-up of data):
      Piling up the events above with a background signal
      1300 jobs, ca. 4 hours each
  – DC1, phase 3 (reconstruction of signal):
      2150 jobs, 5-6 hours of processing and 1 GB of input each
• Other applications:
  – Calculations for string fragmentation models (Quantum Chromodynamics)
  – Quantum lattice model calculations (sustained load of 150+ long jobs at any given moment for several days)
  – Particle physics analysis and modeling
• At peak production, up to 500 jobs were managed by NorduGrid at the same time
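Taken together, DC1 phase 1 alone thus corresponds to roughly 1300 × 24 ≈ 31,000 CPU-hours, with about 1300 × 2 GB ≈ 2.6 TB of input data staged in and 762 GB of output registered in the Replica Catalog.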

What is needed for installation
• A cluster or even a single machine
• For a server:
  – Any Linux flavor (binary RPMs exist for RedHat and Mandrake, possibly also for Debian)
  – A local resource management system, e.g., PBS
  – A Globus installation (NorduGrid has its own distribution in a single RPM)
  – A host certificate (and user certificates)
  – Some open ports (the number depends on the cluster size)
  – One day to go through all the configuration details
• The owner always retains full control
  – Installing NorduGrid does not give automatic access to the resources
  – And the other way around
  – But with a bit of negotiation, one can get access to very considerable resources on a very good network
• Current stable release is …; daily CVS snapshots are available
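A rough sketch of the server-side steps implied above is given here; the package names and configuration file path are hypothetical placeholders, and the authoritative procedure is the one in the NorduGrid documentation:

  # Illustrative only: install the Globus and NorduGrid RPMs on the front-end
  rpm -Uvh globus-*.rpm nordugrid-*.rpm          # hypothetical package names
  # Install the host certificate and key obtained from the Certification Authority
  cp hostcert.pem hostkey.pem /etc/grid-security/
  # Describe the cluster, its PBS queues and the open port range in the config file
  vi /etc/nordugrid.conf                         # hypothetical file name
  # Finally start the GridFTP server, the local LDAP information service
  # and the Grid Manager (service names vary by release)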

Summary
• The NorduGrid pre-release (currently …) works reliably
• Release 1.0 is slowly but surely on its way; many fixes are still needed
• We welcome developers: much functionality is still missing, such as:
  – Bookkeeping, accounting
  – Group- and role-based authorization
  – A scalable resource discovery and monitoring service
  – Interactive tasks
  – Integrated, scalable and reliable data management
  – Interfaces to other resource management systems
• We welcome new users and resources
  – The Nordic Data Grid Facility will provide support