Expanding scalability of LCG CE
A. Kiryanov, PNPI

Why LCG CE?
- Probably the most ancient piece of LCG/gLite software that is still in use
- Based on Globus/VDT with some extra bits such as LCAS/LCMAPS, DGAS accounting and the LCG jobmanagers
- Provides a GRAM interface, so it can be used both directly via the Globus API/CLI and via RB/WMS or Condor-G
- Has shown visible scalability problems for quite a long time

Scalability problems
- Was not initially designed to deal with so many users and jobs at once
  - It's not LCG's fault, it's the original Globus architecture
  - The situation became even worse with pool accounts and VOMS roles
- "One process per job" or "one process per user" approach
  - All processes run in parallel; there are no queues or other machinery to control this
- As a consequence, more than 10 users working in parallel can render the LCG CE completely unresponsive (load average over 100)
  - The Linux kernel is still far from perfect at handling such a heavy load
  - It takes a quick run to the computing centre and back just to press the reset button

Solutions (so far)
- Use monitoring tools to keep track of the system load on LCG CEs and reboot them as necessary
- Deploy a cluster of CEs to serve a relatively large fabric (LCG CEs cannot be used in a fault-tolerant or load-balancing manner)
- Wait for the gLite CE (RIP) or CREAM CE (pre-beta quality)
What about now? LHC data will come this year, so there is no time left to start from scratch.

Task description
1. Try to find a source of the problem
2. Try to get rid of it (partially or, if possible, completely)
3. Try to make the changes as transparent as possible, so as not to break existing applications and to avoid long-term testing and certification
4. Evaluate the impact of the changes
5. goto 1

Quest for the seat of the trouble: internal saboteurs
- globus-gatekeeper: runs as root, accepts external connections and performs user mapping. A single process, always alive.
- globus-job-manager: runs as the local user, speaks the GRAM protocol over the network and keeps track of the job state. One process per job, alive only when necessary. Calls the jobmanager script several times per job, on every action.
- globus-job-manager-script.pl: Bad Guy #1. Written in Perl, talks to the batch system via Perl modules (pbs, lcgpbs, lsf, condor, etc.). Only does one thing at a time and then exits. Compiles itself from source on every invocation (sometimes hundreds of times per second). The sketch below illustrates this pattern.
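To make the cost of this design concrete, here is a minimal sketch, not the actual Globus code, of the "one Perl process per action" pattern: every submit, poll or cancel forks a fresh perl interpreter that has to recompile the jobmanager modules from scratch. The script arguments, module name and job id are illustrative assumptions.

    /* Minimal sketch of the "one perl process per action" pattern
     * (illustrative, not the actual Globus sources); arguments are assumptions. */
    #include <sys/wait.h>
    #include <unistd.h>

    static int run_jobmanager_script(const char *action, const char *job_id)
    {
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: start a brand new perl interpreter for this single action;
             * it recompiles the jobmanager Perl modules before doing any work. */
            execlp("perl", "perl", "globus-job-manager-script.pl",
                   "lcgpbs", action, job_id, (char *)NULL);
            _exit(127);
        }
        int status = 0;
        waitpid(pid, &status, 0);   /* the binary jobmanager blocks until it exits */
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }

    int main(void)
    {
        /* Each job triggers many such invocations (submit, repeated polls,
         * cleanup); with hundreds of jobs this adds up to hundreds of perl
         * start-ups per second, which is the load problem described above. */
        return run_jobmanager_script("poll", "job-0001");
    }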

Quest for the seat of the trouble: external saboteurs
- grid_manager_monitor_agent.*: Bad Guy #2. Written in Perl, one process per local user, submitted once from RB/WMS/Condor-G as a fork job.
- Initially introduced by Condor-G to work around the jobmanager problems from the previous slide, but it hit the same pitfall with the introduction of pool accounts.
- A perfect way to drain CE resources, but you cannot disable them: the RB and WMS use fork jobs to submit their monitoring agents.

Problem summary
- All these processes running in parallel induce a very high load on the system
- Doing everything in parallel is actually much slower than doing it in series
- Frequent recompilation of Perl code that never changes is a waste of resources

Modifications
Perl jobmanagers
- A memory-persistent daemon was developed to avoid Perl code recompilation and to provide queue-based control over parallel requests from the binary jobmanagers
- The Perl script was replaced by a tiny client written in C that communicates with the daemon via a domain socket (see the sketch below)
Grid Manager Monitor Agents
- The existing machinery inside the "fork" jobmanager was used to detect incoming agents and replace them with modified "light" ones
- A monitoring daemon was developed to supply job state information to the "light" monitoring agents
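For illustration, a minimal sketch of what such a tiny C client could look like, assuming a simple line-based request/reply protocol and a made-up socket path (neither is taken from the actual implementation): it forwards one request from the binary jobmanager to the memory-persistent daemon over a Unix domain socket and relays the reply to stdout.

    /* Sketch of a tiny jobmanager client: one request over a domain socket.
     * Socket path and protocol are assumptions for illustration only. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    #define DAEMON_SOCKET "/var/run/globus-jobmanager-marshal.sock"  /* assumed */

    int main(int argc, char **argv)
    {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        struct sockaddr_un addr;
        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, DAEMON_SOCKET, sizeof(addr.sun_path) - 1);

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            return 1;
        }

        /* Send the requested action (e.g. "poll <jobid>") as a single line. */
        char request[1024];
        snprintf(request, sizeof(request), "%s\n", argc > 1 ? argv[1] : "poll");
        if (write(fd, request, strlen(request)) < 0) { perror("write"); return 1; }

        /* Relay the daemon's answer back to the binary jobmanager via stdout. */
        char reply[4096];
        ssize_t n;
        while ((n = read(fd, reply, sizeof(reply))) > 0)
            fwrite(reply, 1, (size_t)n, stdout);

        close(fd);
        return 0;
    }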

Perl jobmanagers reinvented
- globus-job-manager still invokes globus-job-manager-script.pl on every action; the name stays the same, but the stuffing is different: it is now a tiny compiled executable
- The client talks over a domain socket to globus-job-manager-marshal, which runs as root, accepts connections, keeps a queue of requests and allows only a limited number of them to be served in parallel
- On each new connection the daemon fork()s a worker that runs as the local user, serves one jobmanager request and then exits (see the daemon-side sketch below)
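A rough daemon-side sketch under the same assumptions (the socket path, the reply format and the concurrency limit are made up for illustration, not taken from the real globus-job-manager-marshal): accept connections on the domain socket, let pending requests queue up in the listen backlog, and fork at most a fixed number of workers at a time.

    /* Sketch of a queueing daemon with a cap on parallel workers (assumptions,
     * not the actual globus-job-manager-marshal source). */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define DAEMON_SOCKET "/var/run/globus-jobmanager-marshal.sock"  /* assumed */
    #define MAX_PARALLEL  10                                         /* assumed */

    int main(void)
    {
        int active = 0;

        int srv = socket(AF_UNIX, SOCK_STREAM, 0);
        if (srv < 0) { perror("socket"); return 1; }

        struct sockaddr_un addr;
        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, DAEMON_SOCKET, sizeof(addr.sun_path) - 1);
        unlink(DAEMON_SOCKET);
        if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0 || listen(srv, 128) < 0) {
            perror("bind/listen");
            return 1;
        }

        for (;;) {
            /* Reap any finished workers without blocking. */
            while (waitpid(-1, NULL, WNOHANG) > 0)
                active--;

            /* Throttle: at capacity, block until one worker exits. Further
             * clients simply wait in the socket's listen backlog (the queue). */
            while (active >= MAX_PARALLEL)
                if (waitpid(-1, NULL, 0) > 0)
                    active--;

            int cli = accept(srv, NULL, NULL);
            if (cli < 0)
                continue;

            pid_t pid = fork();
            if (pid == 0) {
                /* Worker: in the real daemon this would switch to the mapped
                 * local user's uid and run the persistent Perl logic once. */
                (void)write(cli, "OK\n", 3);
                close(cli);
                _exit(0);
            }
            if (pid > 0)
                active++;
            close(cli);
        }
    }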

Grid Monitor Agents reinvented as well
- globus-gma: a daemon that runs as root and updates the job state files for all users; the "light" agents submitted as fork jobs only have to read these state files
- On each state poll it fork()s a helper that runs as the local user, polls the job state and exits (a sketch follows below)
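A minimal sketch of that polling pattern; the state-file location, the qstat invocation and the hard-coded user list are illustrative assumptions rather than the real globus-gma behaviour. The root daemon forks a per-user helper that drops privileges, polls the batch system and hands its output back over a pipe, and the daemon writes the per-user state file.

    /* Sketch of a root daemon that polls job state as each local user and
     * stores the result in per-user state files (assumptions throughout). */
    #include <pwd.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void poll_user_jobs(const char *username)
    {
        struct passwd *pw = getpwnam(username);
        int pipefd[2];
        if (!pw || pipe(pipefd) != 0)
            return;

        pid_t pid = fork();
        if (pid == 0) {
            /* Helper: become the local user, poll the batch system, print the
             * result on the pipe and exit. */
            close(pipefd[0]);
            dup2(pipefd[1], STDOUT_FILENO);
            if (setgid(pw->pw_gid) != 0 || setuid(pw->pw_uid) != 0)
                _exit(1);
            execlp("qstat", "qstat", "-u", username, (char *)NULL);
            _exit(127);
        }

        /* Parent (root): read the helper's output and update the user's state
         * file, which the "light" monitoring agents only have to read. */
        close(pipefd[1]);
        char path[256], buf[4096];
        snprintf(path, sizeof(path), "/var/lib/globus-gma/state/%s", username);
        FILE *state = fopen(path, "w");
        ssize_t n;
        while ((n = read(pipefd[0], buf, sizeof(buf))) > 0)
            if (state)
                fwrite(buf, 1, (size_t)n, state);
        if (state)
            fclose(state);
        close(pipefd[0]);
        waitpid(pid, NULL, 0);      /* polls run one after another, in series */
    }

    int main(void)
    {
        /* In the real daemon the user list would come from the active jobs;
         * here it is hard-coded for illustration. */
        const char *users[] = { "atlas001", "cms002" };
        for (;;) {
            for (size_t i = 0; i < sizeof(users) / sizeof(users[0]); i++)
                poll_user_jobs(users[i]);
            sleep(60);              /* assumed poll interval */
        }
    }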

Test results (1)
- Original version: 10 users, 1000 jobs
- Modified version: 30 users, 4000 jobs

Test results (2)
- Latest version: 50 users, 4000 jobs
- Stress test of the latest version with both the globus-gma and globus-job-manager-marshal packages installed on the Certification TestBed

Installation on CERN CE cluster
- Plots show the load average, CPU utilization and network utilization around the installation time

Achievements
- System load on the CE is decreased by a factor of 3 to 5
- Jobs start and finish faster (especially visible with lots of short jobs)
- The CE can handle a significantly larger number of jobs and different users
- No interface changes, no need to modify existing software
- The software can be tuned for CEs with different CPU/disk performance
- Modified LCG CEs have been in LCG/gLite production since April 2008 and are already installed on many sites
- The modifications made are actually not LCG-dependent; the software can be used by any Grid infrastructure that relies on a Globus 2.x style CE

Thank you!