Monitoring in ALICE. Costin Grigoras, 03/09/2007, WLCG Meeting, CHEP. http://pcalimonitor.cern.ch/

Presentation transcript:

Slide 1: Monitoring in ALICE
Costin Grigoras
03/09/2007, WLCG Meeting, CHEP, Victoria, BC, Canada
http://pcalimonitor.cern.ch/

Slide 2: Contents
- Data collection and storage
- Visualization methods
- Process automation
- Tools
- Monitoring data analysis
- Future plans

Slide 3: Data collection and storage
[Diagram] Components shown: ApMon attached to AliEn Job Agents, AliEn CE, AliEn SE, Cluster Monitor, AliEn TQ, AliEn IS, AliEn Optimizers, AliEn Brokers, MySQL Servers, CastorGrid Scripts and API Services; LCG Tools and MonALISA at each LCG Site; the MonALISA Repository with its Long History DB.
Outputs shown: aggregated data, alerts, actions.
Parameters shown: rss, vsz, cpu time, run time, job slots, free space, nr. of files, open files, queued JobAgents, cpu ksi2k, job status, disk used, processes, load, net in/out, jobs status, sockets, migrated mbytes, active sessions, MyProxy status.

Slide 4: Data collection and storage
- MonALISA services gather ~300K unique parameters at a rate of 250 Hz.
- Of these, ~40K (raw and derived) time series are stored in the repository DB at a rate of 30 Hz.
- New series can be defined on the fly; changes to the collection filters are applied right away, without any service restart.
- The DB now holds 150 GB (1.5G data points).
- Old data is archived with the following schema:
  - 2-minute bins for the last 2 months
  - 30-minute bins for the last 6 months
  - 2.5-hour bins for older data (almost 2 years already)
- On average, users request dynamic charts every 2-5 seconds.
- Under these conditions the load on the repository machine is negligible.
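The archival schema above can be sketched in a few lines of Python. This is only an illustration, not the repository's actual code: the bin widths come from the slide, while the 30-day month used for the cut-offs, the function names `bin_width` and `downsample`, and the choice of averaging values inside a bin are all assumptions.

```python
from datetime import timedelta

# Bin widths are from the slide; cut-offs assume 30-day months (an approximation).
ARCHIVE_TIERS = [
    (timedelta(days=60), timedelta(minutes=2)),    # last 2 months
    (timedelta(days=180), timedelta(minutes=30)),  # last 6 months
]
OLDEST_BIN = timedelta(minutes=150)                # 2.5 hours for anything older


def bin_width(age: timedelta) -> timedelta:
    """Return the bin width that applies to a data point of the given age."""
    for max_age, width in ARCHIVE_TIERS:
        if age <= max_age:
            return width
    return OLDEST_BIN


def downsample(points, width_s):
    """Average (timestamp_s, value) pairs into bins of width_s seconds.

    Averaging is an assumption; the slide does not say how a bin is aggregated.
    """
    sums = {}
    for ts, value in points:
        start = ts - ts % width_s  # start of the bin this point falls into
        total, count = sums.get(start, (0.0, 0))
        sums[start] = (total + value, count + 1)
    return [(start, total / count) for start, (total, count) in sorted(sums.items())]
```

For example, `bin_width(timedelta(days=100))` selects the 30-minute tier, and `downsample(points, 120)` would produce the 2-minute bins of the most recent tier.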

Slide 5: Visualization methods
- Various types of charts, with different levels of detail
- System overview as a global map
- General-interest widgets on all pages
- General-purpose charts driven by a simple configuration file: history as points, areas or bars, pie charts, bar charts, spider charts, etc.
- Specialized pages
- Daily/weekly/monthly reports

Slides 6-9: Visualization methods (no text on these slides)

Slide 10: Process automation
The monitoring information feeds an automatic decision-making framework that can:
- Submit new jobs (by watching the queue parameters)
- Restart site services (whenever the VoBox-level monitoring finds that a service is not accessible while the central services are OK)
- Send notifications when a problem persists after an automatic restart
- Dynamically modify the DNS aliases of the central services for efficient load balancing
Most of the actions are defined in plain-text configuration files, which makes the system easy to tune dynamically to fit ever-changing needs.
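The restart and notification rules on this slide can be written as a small decision function. The sketch below is illustrative only: `SiteStatus`, its fields and `decide` are hypothetical names, not part of MonALISA or AliEn, and the real framework reads its rules from configuration files rather than hard-coding them.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    """Hypothetical snapshot of one site service, as seen by the monitoring."""
    service_reachable: bool    # result of the VoBox-level check
    central_services_ok: bool  # whether the central services are healthy
    restart_attempted: bool    # whether an automatic restart was already tried


def decide(status: SiteStatus) -> str:
    """Return 'none', 'restart' or 'notify' for one site service."""
    if status.service_reachable:
        return "none"
    if not status.central_services_ok:
        # The slide only restarts site services when the central services are OK.
        return "none"
    if status.restart_attempted:
        # The problem did not go away after an automatic restart.
        return "notify"
    return "restart"
```

A caller would evaluate `decide` each time new monitoring data arrives, so a service that stays down after one restart moves from `restart` to `notify` on the next pass.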

Slide 11: Tools
- Anybody can subscribe to notifications, including through RSS feeds, about problems with the various components of the system: central/site services, storages, proxies, general announcements and so on.
- A Firefox toolbar helps to quickly spot current issues.
- A certificate-based administrative interface helps the Grid managers with day-to-day operations (site services management, production jobs, software packages, pledged resources tracking, etc.).

Slide 12: Monitoring data analysis
Until recently, users were restricted to predefined charts. We have now implemented a fully customizable interface through which users can define their own charts:
- Evolution in time of selected parameters
- Histograms of values
- Scatter plots (for correlating 2 time series)
- Derived series defined on the fly (sum / difference / average of primary series)
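Derived series and scatter plots both come down to pairing two time series on their timestamps. The following is a minimal sketch under stated assumptions: series are plain `{timestamp: value}` dictionaries, only timestamps present in both series are used, and the names `align`, `derive` and `scatter` are hypothetical rather than taken from the repository.

```python
def align(a, b):
    """Pair the values of two {timestamp: value} series on shared timestamps."""
    common = sorted(a.keys() & b.keys())
    return [(t, a[t], b[t]) for t in common]


def derive(a, b, op):
    """Build a derived series from two primary series.

    op is one of 'sum', 'difference' or 'average', as listed on the slide.
    """
    ops = {
        "sum": lambda x, y: x + y,
        "difference": lambda x, y: x - y,
        "average": lambda x, y: (x + y) / 2,
    }
    combine = ops[op]
    return {t: combine(x, y) for t, x, y in align(a, b)}


def scatter(a, b):
    """Return (x, y) pairs for a scatter plot correlating two series."""
    return [(x, y) for _, x, y in align(a, b)]
```

Dropping timestamps that appear in only one series is the simplest policy; interpolating the missing values would be an alternative when the two series are sampled at different rates.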

Slide 13: Monitoring data analysis (no text on this slide)

Slide 14: Future plans
- Increase the level of detail for user jobs
- More flexibility in defining custom charts
- Add other sources of events to which users can subscribe (e.g. SAM tests)
- Storage management (pinning / staging of collections)
We are open to suggestions, so please let us know what you would like to see!