Publishing ALICE data & CVMFS infrastructure monitoring

Slides:



Advertisements
Similar presentations
TCP Monitor and Auto Tuner. Need Analysis Enable monitoring of TCP Connections Enable maximum bandwidth utilization No such utility available in MONALISA.
Advertisements

Server 2012 R2 Essentials - What’s new ? Bart #techninebe Technine Group.
ALICE G RID SERVICES IP V 6 READINESS
MCTS Guide to Microsoft Windows Server 2008 Network Infrastructure Configuration Chapter 11 Managing and Monitoring a Windows Server 2008 Network.
ALICE DATA ACCESS MODEL Outline ALICE data access model - PtP Network Workshop 2  ALICE data model  Some figures.
G RID SERVICES IP V 6 READINESS
ALICE data access WLCG data WG revival 4 October 2013.
Module 10: Monitoring ISA Server Overview Monitoring Overview Configuring Alerts Configuring Session Monitoring Configuring Logging Configuring.
What’s New in Fireware v11.9.5
Costin Grigoras ALICE Offline. In the period of steady LHC operation, The Grid usage is constant and high and, as foreseen, is used for massive RAW and.
Xrootd Monitoring for the CMS Experiment Abstract: During spring and summer 2011 CMS deployed Xrootd front- end servers on all US T1 and T2 sites. This.
N EWS OF M ON ALISA SITE MONITORING
A Brief Documentation.  Provides basic information about connection, server, and client.
Maintaining and Updating Windows Server Monitoring Windows Server It is important to monitor your Server system to make sure it is running smoothly.
Site operations Outline Central services VoBox services Monitoring Storage and networking 4/8/20142ALICE-USA Review - Site Operations.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid Site Monitoring with Nagios E. Imamagic,
Thank you. Harel Ben Attia Senior Software Engineer River A data workflow management system.
Overview of ALICE monitoring Catalin Cirstoiu, Pablo Saiz, Latchezar Betev 23/03/2007 System Analysis Working Group.
CASTOR evolution Presentation to HEPiX 2003, Vancouver 20/10/2003 Jean-Damien Durand, CERN-IT.
1 Andrea Sciabà CERN Critical Services and Monitoring - CMS Andrea Sciabà WLCG Service Reliability Workshop 26 – 30 November, 2007.
Monitoring with MonALISA Costin Grigoras. What is MonALISA ?  Caltech project started in 2002
Xrootd Monitoring and Control Harsh Arora CERN. Setting Up Service  Monalisa Service  Monalisa Repository  Test Xrootd Server  ApMon Module.
20409A 7: Installing and Configuring System Center 2012 R2 Virtual Machine Manager Module 7 Installing and Configuring System Center 2012 R2 Virtual.
EGEE is a project funded by the European Union under contract IST VO box: Experiment requirements and LCG prototype Operations.
Testing and integrating the WLCG/EGEE middleware in the LHC computing Simone Campana, Alessandro Di Girolamo, Elisa Lanciotti, Nicolò Magini, Patricia.
JAliEn Java AliEn middleware A. Grigoras, C. Grigoras, M. Pedreira P Saiz, S. Schreiner ALICE Offline Week – June 2013.
Service Availability Monitor tests for ATLAS Current Status Tests in development To Do Alessandro Di Girolamo CERN IT/PSS-ED.
AliEn central services Costin Grigoras. Hardware overview  27 machines  Mix of SLC4, SLC5, Ubuntu 8.04, 8.10, 9.04  100 cores  20 KVA UPSs  2 * 1Gbps.
ALICE DATA ACCESS MODEL Outline 05/13/2014 ALICE Data Access Model 2  ALICE data access model  Infrastructure and SE monitoring.
+ AliEn site services and monitoring Miguel Martinez Pedreira.
Update of SAM Implementation ALICE TF Meeting 18/10/07.
03/09/2007http://pcalimonitor.cern.ch/1 Monitoring in ALICE Costin Grigoras 03/09/2007 WLCG Meeting, CHEP.
SAM Status Update Piotr Nyczyk LCG Management Board CERN, 5 June 2007.
SAM architecture EGEE 07 Service Availability Monitor for the LHC experiments Simone Campana, Alessandro Di Girolamo, Nicolò Magini, Patricia Mendez Lorenzo,
RSV and Nagios in OSG Rob Quick. March 11, 2008 USCMS Tier-2 Workshop 2 Current State of OSG ~ 100 Sites ~ 30 VOs April 8th:  216,000 jobs (85% successful)
MCSA Windows Server 2012 Pass Upgrading Your Skills to MCSA Windows Server 2012 Exam By The Help Of Exams4Sure Get Complete File From
ALICE WLCG operations report Maarten Litmaath CERN IT-SDC ALICE T1-T2 Workshop Torino Feb 23, 2015 v1.2.
Storage discovery in AliEn
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Nagios Grid Monitor E. Imamagic, SRCE OAT.
HPDC Grid Monitoring Workshop June 25, 2007 Grid monitoring from the VO/user perspectives Shava Smallen.
Lessons learned administering a larger setup for LHCb
Federating Data in the ALICE Experiment
Jean-Philippe Baud, IT-GD, CERN November 2007
WLCG IPv6 deployment strategy
Troubleshooting Tools
Virtualization and Clouds ATLAS position
California Institute of Technology
Boots Cassel Villanova University
ALICE Monitoring
Open Source distributed document DB for an enterprise
Patricia Méndez Lorenzo ALICE Offline Week CERN, 13th July 2007
Grid status ALICE Offline week Nov 3, Maarten Litmaath CERN-IT v1.0
Security Monitoring in a Nagios world
A Messaging Infrastructure for WLCG
Torrent-based software distribution
Analysis Operations Monitoring Requirements Stefano Belforte
Storage elements discovery
Grid status ALICE Offline week March 30, Maarten Litmaath CERN-IT v1.1
AliEn central services (structure and operation)
Monitoring Session Intro
John Gordon (STFC) APEL PT Leader
Offline monitoring, shifter dashboard
VCE Dumps
Monitoring of the infrastructure from the VO perspective
20409A 7: Installing and Configuring System Center 2012 R2 Virtual Machine Manager Module 7 Installing and Configuring System Center 2012 R2 Virtual.
Simplified Development Toolkit
Process Monitoring and Control Systems
Backup Monitoring – EMC NetWorker
Network Monitoring System
Presentation transcript:

Publishing ALICE data & CVMFS infrastructure monitoring Costin.Grigoras@cern.ch

Publishing ALICE data VoBox services health AliEn services running status CE, ClusterMonitor, CMreport Proxy status and time left The certificate used to start AliEn services Delegated proxy, proxy server, proxy of the machine Storage Element test results ADD and GET results

Publishing details dashb-test-mb.cern.ch:6162 Persistent SSL connection client certificate authentication Using ActiveMQ Java library ver. 5.9.1 activemq-client and activemq-stomp JARs Running as a thread in the central MonALISA repository for ALICE Currently pushing 640 values every 30 minutes

Message structure Headers: Body: nagios_host=alimonitor.cern.ch persistent=true destination=/topic/sam.alice.metric {"mlServiceName": "CNAF", "hostName": "ui01-alice.cr.cnaf.infn.it", "serviceFlavour": "AliEn-VoBox-Test", "siteName": "CNAF", "metricStatus": "OK", "metricName": "Proxy of the machine", "summaryData": "Proxy is ok", "gatheredAt": "ui01-alice.cr.cnaf.infn.it", "timestamp": "2014-06-04T15:38:53Z", "voName": "alice", "detailsData": "Time left: 20:51"}

CVMFS infrastructure monitoring proposal Now a critical service, for not only ALICE Currently missing information about the performance of the Stratum 0/1 and the local site proxies Some bits of information in various places, like availability of Stratum 0, awstats ... Not enough to assess whether the services performance is OK Some sites are alerted for failures by the users (tasks failing)

To address that Deploy a monitoring service on each server of the infrastructure Full host monitoring (CPU, memory, network IO, disk IO performance, sockets and processes in each state) CVMFS and Squid-specific probes (catalogue version, request counters, size) Real time access to the parameters plus Alarms, history of all parameters, simple display options Trivial now to integrate in dashboard

Additionally MonALISA services also perform the network topology discovery out of the box This would help with the automatic configuration of local site proxies Similar algorithm as for the automatic SE selection for ALICE jobs