University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 1/30 Monitoring of a distributed computing system: the Grid. Master Degree, 19/12/2005, Marco MEONI
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 2/30 Content
I. Grid Concepts and Grid Monitoring
II. MonALISA Adaptations and Extensions
III. PDC04 Monitoring and Results
IV. Conclusions and Outlook
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 3/30 Section I Grid Concepts and Grid Monitoring
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 4/30 ALICE experiment at CERN LHC
1) Heavy-nuclei and proton-proton collisions
2) Secondary particles are produced in the collision
3) These particles are recorded by the ALICE detector
4) Particle properties (trajectories, momentum, type) are reconstructed by the AliRoot software
5) ALICE physicists analyse the data and search for physics signals of interest
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 5/30 Grid Computing
Grid Computing definition: coordinated use of large sets of heterogeneous, geographically distributed resources to allow high-performance computation.
The AliEn system:
- pull rather than push architecture: the scheduling service does not need to know the status of all resources in the Grid; the resources advertise themselves;
- robust and fault tolerant: resources can come and go at any point in time;
- interfaces to other Grid flavours allow for rapid expansion of the size of the computing resources, transparently for the end user.
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 6/30 Grid Monitoring
GMA architecture: a Producer stores its location in the Registry, a Consumer looks up that location, and monitoring data are then transferred directly from Producer to Consumer.
R-GMA: an example of implementation.
Jini (Sun): provides the technical basis.
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 7/30 MonALISA framework
A distributed monitoring service system using Jini/Java and WSDL/SOAP technologies.
Each MonALISA server acts as a dynamic service system and provides the functionality to be discovered and used by any other service or client that requires such information.
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 8/30 Section II MonALISA Adaptations and Extensions
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 9/30 Farms monitoring
MonALISA Adaptations:
- A user Java class interfaces MonALISA with a bash script that monitors the site (a sketch follows below)
- A Web Repository as a front-end for production monitoring:
  - stores the history view of the monitored data
  - displays the data in a variety of predefined histograms and other visualisation formats
  - simple interfaces to user code: custom consumers, configuration modules, user-defined charts, distributions
Data flow: Grid resources (CE, WNs) → monitoring script → Java interface class → MonALISA agent → monitored data → user code.
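The Java interface class itself is not shown on the slide; below is a minimal sketch, under the assumption that the bash monitoring script prints one "parameter value" pair per line. The class name, script path and output format are hypothetical, and in the real setup the values would be handed to the MonALISA module API rather than printed.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.LinkedHashMap;
import java.util.Map;

public class SiteMonitorInterface {

    // Runs the external bash script and collects one value per monitored parameter.
    public Map<String, Double> collect(String scriptPath) throws Exception {
        Process p = new ProcessBuilder("/bin/bash", scriptPath).start();
        Map<String, Double> values = new LinkedHashMap<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                String[] tok = line.trim().split("\\s+");   // expected line format: "<name> <value>"
                if (tok.length == 2) {
                    values.put(tok[0], Double.parseDouble(tok[1]));
                }
            }
        }
        p.waitFor();
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Example: a script printing lines such as "cpu_load 0.42" or "jobs_running 17"
        Map<String, Double> data = new SiteMonitorInterface().collect("/opt/monitor/ce_status.sh");
        data.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}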
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 10/30 Repository Setup
Installation and maintenance:
1. Packages installation (Tomcat, MySQL)
2. Configuration of the main servlets for the ALICE VO
3. Setup of scripts for startup/shutdown/backup
A Web Repository as a front-end for monitoring:
- keeps the full history of monitored data
- shows data in a multitude of histograms
- new presentation formats added to provide a full set (gauges, distributions)
- simple interfaces to user code: custom consumers, custom tasks
All produced plots are built and customised from dedicated configuration files: SQL, parameters, colours, type, cumulative or averaged behaviour, smooth or fluctuating display, user time intervals, and many others.
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 11/30 Repository
Added a Java thread (DirectInsert) that feeds the Repository directly, bypassing the MonALISA agents: an ad hoc Java thread passes jobs information to the Tomcat JSP/servlets.
AliEn jobs monitoring (centralised or distributed?): AliEn native APIs are used to retrieve job status snapshots.
[Job state diagram: from submission through the intermediate states, with timeouts (>1h, >3h) and the possible error states Error_I, Error_A, Error_S, Error_E, Error_R, Error_V/VT/VN, Error_SV.]
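A hedged sketch of the DirectInsert idea: a daemon thread periodically takes a job-status snapshot and writes it straight into the repository database. The snapshot source, table and column names below are illustrative, not the actual AliEn API or repository schema.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Map;
import java.util.function.Supplier;

public class DirectInsert extends Thread {

    private final String jdbcUrl;
    private final Supplier<Map<String, Integer>> snapshotSource;  // e.g. parsed "ps -a" output

    public DirectInsert(String jdbcUrl, Supplier<Map<String, Integer>> snapshotSource) {
        this.jdbcUrl = jdbcUrl;
        this.snapshotSource = snapshotSource;
        setDaemon(true);
    }

    @Override
    public void run() {
        while (!isInterrupted()) {
            try {
                insertSnapshot(snapshotSource.get());
                Thread.sleep(120_000);               // take a snapshot every ~2 minutes
            } catch (InterruptedException ie) {
                return;
            } catch (Exception e) {
                e.printStackTrace();                 // keep the thread alive on transient DB errors
            }
        }
    }

    // Writes one row per job state (hypothetical table and columns) into the repository DB.
    private void insertSnapshot(Map<String, Integer> snapshot) throws Exception {
        try (Connection c = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = c.prepareStatement(
                 "INSERT INTO job_status_history (ts, status, njobs) VALUES (NOW(), ?, ?)")) {
            for (Map.Entry<String, Integer> e : snapshot.entrySet()) {
                ps.setString(1, e.getKey());         // e.g. RUNNING, SAVING, DONE, ERROR_V
                ps.setInt(2, e.getValue());          // number of jobs currently in that state
                ps.executeUpdate();
            }
        }
    }
}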
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 12/30 Data collecting and Grid Monitoring
7+ GB of performance information, 24.5M records.
During the DC, data from ~2K monitored parameters arrive every 2-3 minutes.
Data collecting: alimonitor.cern.ch; data replication: online replication to aliweb01.cern.ch (MASTER DB → REPLICA DB).
Repository data sources: MonALISA agents, Repository Web Services, AliEn API, LCG interface, WNs monitoring (UDP); consumers: Web Repository, ROOT/Carrot Grid analysis.
Averaging process: 1-minute values are rolled up into 10-minute and 100-minute series, keeping 60 bins (FIFO) for each basic parameter.
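A minimal, illustrative sketch of the cascading averaging described above, assuming the behaviour is simply "average the most recent fine bins into the next coarser series"; the class and field names are hypothetical and this is not the Repository's actual implementation.

import java.util.ArrayDeque;
import java.util.Deque;

public class AveragingSeries {

    private static final int MAX_BINS = 60;           // each series keeps 60 bins (FIFO)
    private final int samplesPerCoarseBin;            // e.g. 10 fine bins per coarser bin
    private final Deque<Double> bins = new ArrayDeque<>();
    private final AveragingSeries coarser;            // next level, or null for the coarsest
    private long samplesAdded = 0;

    public AveragingSeries(int samplesPerCoarseBin, AveragingSeries coarser) {
        this.samplesPerCoarseBin = samplesPerCoarseBin;
        this.coarser = coarser;
    }

    public void add(double value) {
        if (bins.size() == MAX_BINS) {
            bins.removeFirst();                        // FIFO: drop the oldest bin
        }
        bins.addLast(value);
        samplesAdded++;
        if (coarser != null && samplesAdded % samplesPerCoarseBin == 0) {
            // average the most recent fine bins and push the result one level up
            double avg = bins.stream()
                             .skip(Math.max(0, bins.size() - samplesPerCoarseBin))
                             .mapToDouble(Double::doubleValue)
                             .average().orElse(0.0);
            coarser.add(avg);
        }
    }

    public static void main(String[] args) {
        AveragingSeries hundredMin = new AveragingSeries(10, null);
        AveragingSeries tenMin = new AveragingSeries(10, hundredMin);
        AveragingSeries oneMin = new AveragingSeries(10, tenMin);
        for (int i = 0; i < 120; i++) {
            oneMin.add(Math.random() * 100);           // simulated 1-minute samples
        }
    }
}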
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 13/30 Web Repository
Storage and monitoring tool for the Data Challenge running parameters, task completion and resource status.
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 14/30 Visualisation Formats
Stacked bars; statistics and real-time tabulated data; CE load factors and task completion; menu; snapshots and pie charts; running history.
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 15/30 Monitored parameters
~2k parameters and 24.5M records with 1-minute granularity. Analysis of the collected data allows for improvement of the Grid performance.
Source / Category / Number / Examples:
- AliEn API / CE load factors / 63 / run load, queue load
- AliEn API / SE occupancy / 62 / used space, free space, number of files
- AliEn API / job information / 557 / job status: running, saving, done, failures
- SOAP / CERN network traffic / 29 / size of traffic, number of files
- LCG / CPU and jobs / 48 / free CPUs, jobs running and waiting
- ML services on MQ / job summary / 34 / job status: running, saving, done, failures
- ML services on MQ / AliEn parameters / 15 / DB load, Perl processes
- ML services / sites info / 1060 / paging, threads, I/O, processes
Derived classes:
- Job execution efficiency = successfully done jobs / all submitted jobs
- System efficiency = error (CE) free jobs / all submitted jobs
- AliRoot efficiency = error (AliROOT) free jobs / all submitted jobs
- Resource efficiency = running (queued) jobs / max running (queued)
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 16/30 MonALISA Extensions
Job monitoring of Grid users: uses AliEn commands (ps -a, jobinfo #jobid, ps -X -st) plus output parsing and scanning of the jobs' JDL; results are presented in the same web front-end.
Application Monitoring (ApMon) at the WNs: ApMon is a set of flexible APIs that can be used by any application to send monitoring information to MonALISA services via UDP datagrams; it allows for data aggregation and scaling of the monitoring system. A light monitoring C++ class was developed to be included in the Process Monitor payload (a conceptual sketch of the UDP transport follows below).
Repository Web Services: an alternative to ApMon for Web Repository purposes; they don't need MonALISA agents and store data directly into the DB repository. Used to monitor the network traffic through the FTP servers of ALICE at CERN.
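A conceptual sketch of the ApMon transport idea, i.e. monitoring values pushed as UDP datagrams to a MonALISA service. The message format, host and port below are made up for illustration; the real ApMon APIs and wire format differ.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class UdpMonitorSender {

    // Packs a single "cluster/node/parameter=value" reading into one UDP datagram.
    public static void send(String host, int port, String cluster, String node,
                            String param, double value) throws Exception {
        String msg = cluster + "/" + node + "/" + param + "=" + value;
        byte[] payload = msg.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                                           InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws Exception {
        // e.g. a worker-node process reporting its resident memory (hypothetical destination)
        send("monalisa.example.org", 8884, "ALICE_WNs", "wn042.cern.ch", "rss_mb", 512.0);
    }
}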
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 17/30 MonALISA Extensions
Distributions as an analysis principle: a first attempt at Grid performance tuning based on real monitored data, using ROOT and Carrot features, with a cache system to optimise the requests.
Architecture: ROOT/Carrot histogram clients (1) ask the ROOT histogram server process (central cache) for a histogram via HTTP/Apache; the server (2) queries the MonALISA Repository for NEW data only; the Repository (3) sends the NEW data; the server (4) sends the resulting object/file back to the client.
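A hedged sketch of the central-cache behaviour in steps 1-3: the cache remembers, per histogram, the last timestamp already fetched and asks the Repository only for newer rows. The Source interface stands in for the actual repository access layer, which is not described on the slide.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HistogramCache {

    public interface Source {                            // assumed access to the repository DB
        List<double[]> fetchSince(String histogram, long sinceMillis);  // {timestamp, value} rows
    }

    private final Source source;
    private final Map<String, List<double[]>> data = new HashMap<>();
    private final Map<String, Long> lastFetch = new HashMap<>();

    public HistogramCache(Source source) {
        this.source = source;
    }

    // Returns the full data set for a histogram, querying only what is NEW since the last call.
    public synchronized List<double[]> get(String histogram) {
        long since = lastFetch.getOrDefault(histogram, 0L);
        List<double[]> fresh = source.fetchSince(histogram, since);
        List<double[]> all = data.computeIfAbsent(histogram, h -> new ArrayList<>());
        all.addAll(fresh);
        if (!fresh.isEmpty()) {
            lastFetch.put(histogram, (long) fresh.get(fresh.size() - 1)[0]);
        }
        return all;
    }
}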
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 18/30 Section III PDC04 Monitoring and Results
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 19/30 PDC04
Purpose: test and validate the ALICE Offline computing model:
- Produce and analyse ~10% of the data sample collected in a standard data-taking year
- Use the complete set of off-line software: AliEn, AliROOT, LCG, PROOF and, in Phase 3, the ARDA user analysis prototype
Structure: logically divided into three phases:
1. Phase 1 - Production of underlying Pb+Pb events with different centralities (impact parameters) + production of p+p events
2. Phase 2 - Mixing of signal events with different physics content into the underlying Pb+Pb events
3. Phase 3 - Distributed analysis
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 20/30 PDC04 Phase 1
Task: simulate the data flow in reverse; events are produced at remote centres and stored in the CERN MSS.
Central servers: master job submission, Job Optimizer, RB, File catalogue, process control, SE…
Sub-jobs are processed at the CEs, both directly and through the AliEn-LCG interface (LCG is one AliEn CE, with its own RB and CEs); output files are stored in CERN CASTOR (disk servers, tape).
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 21/30 Start 10/03, end 29/05 (58 days active)
Maximum jobs running in parallel: 1450; average during the active period: 430.
18 computing centres participating.
[Charts: total number of jobs running in parallel; total CPU profile.]
Aiming for continuous running, not always possible due to resource constraints.
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 22/30 Efficiency
- Job execution efficiency = successfully done jobs / all submitted jobs
- System efficiency = error (CE) free jobs / all submitted jobs
- AliRoot efficiency = error (AliROOT) free jobs / all submitted jobs
Calculation principle: jobs are submitted only once.
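The same three ratios, written as formulas (each job is counted once at submission, per the calculation principle above):

\varepsilon_{\mathrm{exec}} = \frac{N_{\mathrm{done}}}{N_{\mathrm{submitted}}}, \qquad
\varepsilon_{\mathrm{system}} = \frac{N_{\mathrm{submitted}} - N_{\mathrm{CE\ errors}}}{N_{\mathrm{submitted}}}, \qquad
\varepsilon_{\mathrm{AliRoot}} = \frac{N_{\mathrm{submitted}} - N_{\mathrm{AliRoot\ errors}}}{N_{\mathrm{submitted}}}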
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 23/30 Phase 1 of PDC04 Statistics
- Number of jobs: …
- Job duration: … h (cent1), 5h (peripheral 1), 2.5h (peripheral 2-5)
- Files per job: …
- Number of entries in AliEn FC: …M
- Number of files in CERN MSS: 1.3M
- File size: 26 TB
- Total CPU work: 285 MSI-2k hours
- LCG CPU work: 67 MSI-2k hours
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 24/30 PDC04 Phase 2
Task: simulate the event reconstruction and remote event storage.
Central servers: master job submission, Job Optimizer (N sub-jobs), RB, File catalogue, process monitoring and control, SE…
Sub-jobs are processed at the CEs, both directly and through the AliEn-LCG interface (with its own RB and CEs).
Storage: underlying event input files are read from CERN CASTOR; the zip archives of output files are stored with a primary copy on the local SEs and a backup copy in CERN CASTOR.
Output files are registered in the AliEn File Catalogue; for LCG SEs: LCG LFN = AliEn PFN, using edg(lcg) copy&register.
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 25/30 Individual sites: CPU contribution
Start 01/07, end 26/09 (88 days active).
As in the 1st phase, general equilibrium in the CPU contribution.
Under AliEn direct control: 17 CEs, each with a SE.
CERN-LCG encompasses the LCG resources worldwide (also with local/close SEs).
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 26/30 Sites occupancy
Outside CERN, sites such as Bari, Catania and JINR have generally run at maximum capacity.
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 27/30 Phase 2: Statistics and Failures
Statistics:
- Number of jobs: …
- Job duration: … h/job
- Conditions: 62
- Number of events: 15.2M
- Number of files in AliEn FC: 9M
- Number of files in storage: 4.5M, distributed at 20 CEs world-wide
- Storage at CERN MSS: 30 TB; storage at remote CEs: 10 TB
- Network transfer: 200 TB from CERN to remote CEs
- Total CPU work: 750 MSI-2k hours
Failures:
- Submission: CE local scheduler not responding (1%)
- Loading input data: remote SE not responding (3%)
- During execution (10%): job aborted (insufficient WN memory or AliRoot problems); job cannot start (missing application software directory); job killed by the CE local scheduler (too long); WN or global CE malfunction (all jobs on a given site are lost)
- Saving output data: local SE not responding (2%)
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 28/30 PDC04 Phase 3
Task: user data analysis.
A user job (many events) queries the File Catalogue for its data set (ESDs, other); the Job Optimizer splits it into sub-jobs 1..n, grouped by the SE location of the files; the Job Broker submits each sub-job to the CE with the closest SE for CE and SE processing; the output files 1..n are combined by a file merging job into the final job output.
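A rough sketch of the splitting and grouping step ("grouped by SE file location"), with hypothetical types; the real Job Optimizer works on the AliEn File Catalogue and JDLs, which are not modelled here.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JobSplitter {

    public record InputFile(String lfn, String storageElement) {}
    public record SubJob(String storageElement, List<String> lfns) {}

    // One sub-job per SE, so the broker can submit each one to the CE with the closest SE.
    public static List<SubJob> split(List<InputFile> dataSet) {
        Map<String, List<String>> bySE = new LinkedHashMap<>();
        for (InputFile f : dataSet) {
            bySE.computeIfAbsent(f.storageElement(), se -> new ArrayList<>()).add(f.lfn());
        }
        List<SubJob> subJobs = new ArrayList<>();
        bySE.forEach((se, lfns) -> subJobs.add(new SubJob(se, lfns)));
        return subJobs;
    }

    public static void main(String[] args) {
        List<SubJob> jobs = split(List.of(
                new InputFile("/alice/sim/2004/event1.root", "CERN::Castor"),
                new InputFile("/alice/sim/2004/event2.root", "Bari::SE"),
                new InputFile("/alice/sim/2004/event3.root", "CERN::Castor")));
        jobs.forEach(j -> System.out.println(j.storageElement() + " -> " + j.lfns()));
    }
}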
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 29/30 Analysis
Start September 2004, end January 2005. Distribution charts are built on top of the ROOT environment using the Carrot web interface.
Distribution of the number of running jobs: mainly depends on the number of waiting jobs in the TQ and on the availability of free CPUs at the remote CEs.
Occupancy versus the number of queued jobs: occupancy increases as more jobs wait in the local batch queue, and saturation is reached at around 60 queued jobs.
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 30/30 Section IV Conclusions and Outlook
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 31/30 Lessons from PDC04
User jobs have been running for 9 months using AliEn.
MonALISA has provided a flexible and complete monitoring framework, successfully adapted to the needs of the Data Challenge, and has given the expected results for performance tuning and workload balancing: a step-by-step approach, from resource tuning to resource optimisation.
MonALISA has been able to gather, store, plot, sort and group a large variety of monitored parameters, both basic and derived, in a rich set of presentation formats.
The Repository has been the only source of historical information, and its modular architecture has made possible the development of a variety of custom modules (~800 lines of fundamental source code and ~3k lines for service tasks).
PDC04 has been a real example of successful Grid interoperability, interfacing AliEn and LCG and proving the scalability of the AliEn design.
The usage of MonALISA in ALICE has been documented in an article for the Computing in High Energy and Nuclear Physics (CHEP'04) conference, Interlaken, Switzerland.
An unprecedented experience: developing and improving a monitoring framework on top of a real, functioning Grid, massively testing the software technologies involved.
The framework is easy to extend, and components can be replaced with equivalent ones according to technical needs or strategic choices.
University of Florence – Mon, 19 Dec 2005 – Marco MEONI - 32/30 Credits
Dott. F. Carminati, L. Betev, P. Buncic and all colleagues in ALICE, for the enthusiasm they transmitted during this work.
The MonALISA team, collaborative anytime I needed them.