LCG Overview & Scaling Challenges
David Smith, for the LCG Deployment Group, CERN
HEPiX 2003, Vancouver

Introduction
LCG overview:
– Contains software for Workload Management, Data Management, Information System/Monitoring Services and Storage Management
Scaling concerns:
– Workload Management is built on top of the Globus Toolkit
– Only Workload Management issues are discussed here, specifically those related to Globus

The problem
LCG is not a development project, but the underlying Globus Toolkit has some scaling characteristics that affect us:
– Jobs are managed by a 'JobManager'.
– The interface between the JobManager and the local batch system is handled by an intermediate layer (a Perl script).
Two broad problem areas:
– The JobManager scripts assume a file system area shared between all batch workers.
– There are inherent scaling problems because one JobManager instance is associated with each user job.

GRAM
(Diagram) Job control flow: Client → Globus Gatekeeper → Globus JobManager → JobManager script → Local batch system

Shared file system
A shared file system is not acceptable to everyone; we wanted to remove the requirement to share one between the batch workers.
The submission model has a central entity that receives and handles job queries and submissions to the local batch system: the Globus gatekeeper.
The shared file system requirement comes from the need to:
– make the X509 certificate available to the job
– make stdout and stderr available to the gatekeeper during and after job execution

Steps in submitting a job
The gatekeeper proceeds through a number of states:
– Stage in the files required by the job
– Copy the X509 user proxy
– Submit the job to the batch system
– Monitor the job status in the batch system
– Receive a refreshed X509 proxy during the lifetime of the job
– Allow access to stdout/stderr of the job; optionally return output files
– Clean up and free the resources held for the job
(Diagram: gatekeeper serving Worker1 and Worker2.)
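A minimal sketch of the per-job lifecycle listed above, with hypothetical handler functions; it only illustrates the ordering and is not the actual Globus gatekeeper/JobManager state machine.

```python
from enum import Enum, auto

class JobPhase(Enum):
    STAGE_IN = auto()      # stage in input files and the X509 user proxy
    SUBMIT = auto()        # submit the job to the local batch system
    MONITOR = auto()       # poll batch status, accept refreshed proxies
    STAGE_OUT = auto()     # return stdout/stderr and optional output files
    CLEANUP = auto()       # free resources held for the job

def run_job(handlers: dict) -> None:
    """Drive one job through the phases in order; 'handlers' maps phase -> callable."""
    for phase in JobPhase:
        handlers[phase]()
```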

The GASS cache
Globus uses the GASS cache: a file-system-based database that associates an instance of a file with a URL and a tag.
Globus provides I/O routines to access both local and remote GASS cache entries.
How to avoid sharing the file system containing the GASS cache?
– Export the job's entries at the start of the job and create a local cache on the target batch worker for the life of the job.
– At the end of the job, return the contents of the local cache and add them back to the cache the gatekeeper is working with.
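A sketch of the export/import idea using globus-url-copy, as described on the next slides. The host name, cache paths and use of gsiftp URLs are assumptions for illustration; the real LCG implementation lives in the Perl JobManager scripts.

```python
import subprocess

GATEKEEPER = "gatekeeper.example.cern.ch"          # hypothetical host
REMOTE_CACHE = f"gsiftp://{GATEKEEPER}/var/gass_cache/job42/"   # hypothetical path
LOCAL_CACHE = "file:///tmp/gass_cache/job42/"

def copy(src: str, dst: str) -> None:
    """Move one cache entry with globus-url-copy (an FTP-based transfer)."""
    subprocess.run(["globus-url-copy", src, dst], check=True)

def import_cache(entries) -> None:
    """At job start on the worker: pull the job's entries into a local cache."""
    for name in entries:
        copy(REMOTE_CACHE + name, LOCAL_CACHE + name)

def export_cache(entries) -> None:
    """At job end: push the local cache contents back to the gatekeeper's cache."""
    for name in entries:
        copy(LOCAL_CACHE + name, REMOTE_CACHE + name)
```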

Exporting the GASS cache
(Diagram) The cache on the gatekeeper holds entries for Job1, Job2, Job3, …; each batch worker holds a local cache for its own job (Worker1 for Job1, Worker2 for Job2, Worker3 for Job3).

More on cache handling
Exporting and importing is done with globus-url-copy (FTP).
Special considerations:
– An initial X509 certificate is required to start the cache import. Use the stage-in facility of the batch system (for PBS this implies scp access from the batch worker to the gatekeeper machine).
– The X509 proxy certificate needs to be updated during the life of a job: pull the proxy from the gatekeeper cache when the proxy on the batch worker is near expiry.
– Stdout and stderr from the job are returned as entries in the cache. The local batch system will also have a mechanism to return these; if it is used, the two are concatenated.
– The Globus mechanism for staging files in and out also needs explicit copying of the staged-in file set from the gatekeeper, and return of the files to be staged out.
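A hedged sketch of the proxy-refresh idea on the batch worker: pull a fresh proxy from the gatekeeper's cache when the local copy is close to expiry. The threshold, paths and polling interval are assumptions, not the LCG implementation.

```python
import subprocess
import time

PROXY_PATH = "/tmp/x509up_job42"                    # hypothetical local proxy file
PROXY_URL = "gsiftp://gatekeeper.example.cern.ch/var/gass_cache/job42/x509_proxy"
REFRESH_MARGIN = 20 * 60                            # refresh when < 20 minutes remain

def seconds_left(proxy: str) -> int:
    """Ask grid-proxy-info for the remaining lifetime of the proxy, in seconds."""
    out = subprocess.run(["grid-proxy-info", "-file", proxy, "-timeleft"],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.strip())

def refresh_if_needed() -> None:
    """Pull a refreshed proxy from the gatekeeper cache when near expiry."""
    if seconds_left(PROXY_PATH) < REFRESH_MARGIN:
        subprocess.run(["globus-url-copy", PROXY_URL, "file://" + PROXY_PATH],
                       check=True)

while True:                                         # crude polling loop on the worker
    refresh_if_needed()
    time.sleep(5 * 60)
```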

JobManager handling
The other problem: a JobManager is associated with each job.
– By default there is a JobManager in the process table on the gatekeeper machine for every Globus job submitted or running. This is limited by the number of processes, by the available memory, and by scheduling and other system resources.
– Each JobManager also needs to query the state of its job periodically, which means firing up a JobManager script and running batch system commands.
There is already a solution for this:
– Condor-G is already used by LCG as the job submission service.
– The Condor team have an interesting way to address the number of JobManagers.

Condor-G solution
The Condor-G solution makes use of an existing GRAM facility:
– Once a job has been submitted to the local batch system, the associated JobManager can be signalled to exit.
– This is of little use by itself, since the JobManager must be restarted in order to query the job's status in the batch system.
Condor-G can run a special 'grid monitor' task on the gatekeeper machine, on behalf of each user (see the sketch below):
– The grid monitor calls the JobManager script interface to query the status of each job in turn from the batch system. The status list for all of the jobs is returned periodically to the Condor-G machine.
– For jobs that have left the batch system, a JobManager is restarted and the final stages of the job are concluded as normal.
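A minimal sketch of what a per-user grid monitor loop does, assuming poll_status() wraps the JobManager script's poll interface (in reality a Perl script invoking batch system commands). This is an illustration, not Condor-G code.

```python
import time

POLL_INTERVAL = 60                                   # assumed check interval, in seconds

def poll_status(job_id: str) -> str:
    """Placeholder for the JobManager script call (e.g. wrapping qstat/bjobs/condor_q)."""
    raise NotImplementedError

def grid_monitor(jobs, send_to_condor_g, restart_jobmanager) -> None:
    while True:
        statuses = {job: poll_status(job) for job in jobs}   # one query per job, in turn
        send_to_condor_g(statuses)                           # periodic report to Condor-G
        for job, state in statuses.items():
            if state == "DONE":                              # job has left the batch system:
                restart_jobmanager(job)                      # conclude the final stages normally
        time.sleep(POLL_INTERVAL)
```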

The Condor-G grid monitor (1)
(Diagram) For a given user, each job (Job1, Job2, Job3) goes through stage in, submit to batch system, poll status, stage out and cleanup. The per-job JobManager is killed after submission; a single grid monitor for the user carries out the status polling and reports back to the Condor-G machine.

Grid monitor (2)
(Diagram) Grid monitor cycle: parse the list of jobs, query Job 1 (JobManager script polls Job 1), query Job 2 (JobManager script polls Job 2), …, return the results, then wait for the next poll.

Remaining issues
Still some problems:
– Potentially large load on the batch system: a series of queries every check interval.
– For a large number of jobs the check time can far exceed the check interval.
– The system is tightly coupled to the batch system: slow responses to queries or submission requests can rapidly cause the gatekeeper to become process bound, or prevent the grid monitor from returning any results to Condor-G.

Total query time
Partially address the total query time by optimising the grid monitor:
– Query only as many jobs as fit in one scan period.
– Assume that the other jobs have not changed state since the last query.
– In the next cycle, start with the jobs whose status is most aged.
(Diagram) Cycle: parse the list of jobs and the list of changed jobs, then query Job 3, Job 4, Job 1, … via the JobManager script, oldest status first.
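A sketch of the "most aged first" partial polling idea. The scan period, the per-query cost and the budget-based selection are my reading of the slide, with illustrative numbers, not the actual grid monitor code.

```python
import time

SCAN_PERIOD = 60          # assumed scan period, in seconds
COST_PER_QUERY = 2.0      # assumed average time for one batch system query

last_polled = {}          # job id -> timestamp of its last status query

def jobs_for_this_cycle(jobs):
    """Pick as many jobs as fit in one scan period, oldest status first."""
    budget = int(SCAN_PERIOD / COST_PER_QUERY)
    ordered = sorted(jobs, key=lambda j: last_polled.get(j, 0.0))
    return ordered[:budget]

def poll_cycle(jobs, poll_status):
    """Query only the selected jobs; the rest are assumed unchanged this cycle."""
    changed = {}
    for job in jobs_for_this_cycle(jobs):
        changed[job] = poll_status(job)
        last_polled[job] = time.time()
    return changed
```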

Changes to JobManager scripts
Address the coupling to the batch system, and the batch system load, through the JobManager scripts themselves:
– Globus supplies JobManager interfaces to Condor, LSF and PBS; we wrote lcg versions of these.
– The new JobManagers handle import/export of the GASS cache to the batch workers.
Architecture changes address the remaining batch system issues:
– Batch load is reduced by caching the batch system query results (a 45 second cache); see the sketch below.
– Coupling to the batch system is loosened by introducing queues at various stages of the job cycle.
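A hedged sketch of a 45-second cache in front of the batch system query. The qstat call and its parsing are rough placeholders for PBS, and the real LCG change is in the Perl JobManager scripts rather than Python.

```python
import subprocess
import time

CACHE_TTL = 45.0                        # seconds, as quoted on the slide
_cache = {"stamp": 0.0, "status": {}}

def parse_qstat(text: str) -> dict:
    """Very rough parse of default 'qstat' output: job id -> state letter."""
    status = {}
    for line in text.splitlines()[2:]:          # skip the two header lines
        fields = line.split()
        if len(fields) >= 5:
            status[fields[0]] = fields[4]       # PBS state, e.g. 'R' or 'Q'
    return status

def query_batch_system() -> dict:
    """One expensive call that lists every job in the batch system."""
    out = subprocess.run(["qstat"], capture_output=True, text=True, check=True)
    return parse_qstat(out.stdout)

def job_status(job_id: str) -> str:
    """Serve per-job status requests from the shared, time-limited cache."""
    now = time.time()
    if now - _cache["stamp"] > CACHE_TTL:
        _cache["status"] = query_batch_system()
        _cache["stamp"] = now
    return _cache["status"].get(job_id, "UNKNOWN")
```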

Job progression through the LCG JobManager
Add queues, serviced by asynchronous processes.
(Diagram) Each job still goes through stage in, submit to batch system, poll status, stage out and cleanup, but the work is routed through an export and submission queue, an import queue and a cleanup queue; the grid monitor for the user reads a batch status cache, and separate service processes drive the queues and the cache.
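A conceptual sketch of decoupling job steps from the batch system with queues and asynchronous service processes. The queue names follow the diagram; the threading model and handler functions are assumptions for illustration only.

```python
import queue
import threading

submission_queue = queue.Queue()   # export GASS cache and submit to the batch system
import_queue = queue.Queue()       # import results back to the gatekeeper's cache
cleanup_queue = queue.Queue()      # free resources held for finished jobs

def service(q: queue.Queue, handle) -> None:
    """One asynchronous service process: drains its queue independently of callers."""
    while True:
        job = q.get()
        handle(job)
        q.task_done()

def start_services(submit, import_results, cleanup) -> None:
    for q, fn in [(submission_queue, submit),
                  (import_queue, import_results),
                  (cleanup_queue, cleanup)]:
        threading.Thread(target=service, args=(q, fn), daemon=True).start()

# A JobManager invocation now just enqueues work and returns quickly, e.g.:
#   submission_queue.put("job42")
```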

Summary
The Globus Toolkit has scaling limitations in its job submission model.
Condor-G already has an interesting solution, and further optimisation is possible.
The LCG flavour of the JobManager scripts:
– avoids the need to share the gatekeeper's GASS cache with the batch worker machines
– loosens the binding to the batch system
– reduces the batch system query frequency

Future
The work so far should allow thousands of jobs to be handled by one gatekeeper.
– However, little work has yet been done on scaling with the number of users.
In the future it may be worth considering changes to the Globus JobManager itself:
– Both the grid monitor and the LCG JobManagers are working around issues in the underlying JobManager and GRAM design.