CCR GRID 2010 (Catania)
Daniele Gregori, Stefano Antonelli, Donato De Girolamo, Luca dell’Agnello, Andrea Ferraro, Guido Guizzunti, Pierpaolo Ricci, Felice Rosso, Vladimir Sapunenko, Riccardo Veraldi, Paolo Veronesi, Cristina Vistoli, Giulia Vita Finzi, Stefano Zani

INFN CNAF Monitor and Control System

The Italian National Institute of Nuclear Physics (INFN) hosts the Tier-1 for the LHC experiments at CNAF, located in Bologna. CNAF's Tier-1 is the main INFN computing facility in Italy and provides computing and storage resources to the LHC experiments. Given the complexity of a Tier-1 centre, control systems are fundamental for its management and operation. At INFN-CNAF numerous solutions have been adopted, ranging from commercial and open source products to systems fully developed in house. The open source tools have been heavily adapted to specific needs, in particular the monitoring systems LeMon, MRTG and the sFlow analyzer, and the alarm system Nagios. A monitoring system, Red Eye, has been developed for the Farming division to control the worker nodes without overloading the CPU of the server machine. Finally, a dashboard has been developed to which all the control systems described here send critical alarms (also delivered via SMS to an operator); it gives a historical view of the state of the Tier-1 and of the national services and allows quick control through the web.

DASHBOARD
The Dashboard is a tool that summarizes and keeps up to date the information about the state of all the INFN CNAF divisions (Infrastructure, Network, Farming, Storage, Grid Operation and National Services), and is primarily intended for the people on shift. Each division has defined its critical conditions for machines and services using the different systems described below. This information is sent to a MySQL database, which is interfaced to a web page through Python and PHP scripts. The web page shows one box per division and three colours tied to three states: green = OK, orange = WARNING, red = CRITICAL. The time evolution of each service can be retrieved through a search form on the web page. For the Dashboard interface, a Python-based gateway middleware named “dashgw” has been developed to interact with commercial monitoring tools that can only send e-mails with a user-definable format.
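The poster does not describe the Dashboard's database schema, so the following is only a minimal sketch, in Python, of how a division-level check could push its state into the MySQL database read by the web page; the table and column names (dashboard_state, division, status, message, ts) and the connection parameters are hypothetical.

# Minimal sketch: push one division state record to the Dashboard MySQL database.
# The schema (dashboard_state table and its columns) and the credentials are
# hypothetical; the actual CNAF Dashboard schema is not given in the poster.
import datetime
import mysql.connector

STATES = {"OK": "green", "WARNING": "orange", "CRITICAL": "red"}

def report_state(division, status, message):
    """Insert one state record; the web page reads the latest row per division."""
    assert status in STATES
    db = mysql.connector.connect(host="dashboard-db.example", user="dash",
                                 password="secret", database="dashboard")
    cur = db.cursor()
    cur.execute(
        "INSERT INTO dashboard_state (division, status, message, ts) "
        "VALUES (%s, %s, %s, %s)",
        (division, status, message, datetime.datetime.utcnow()),
    )
    db.commit()
    db.close()

if __name__ == "__main__":
    report_state("Storage", "WARNING", "GPFS cluster degraded: 1 NSD server down")

The web page would then simply colour each division box according to its most recent status row.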
RED EYE
Red Eye is a software package developed at INFN CNAF and used by the Farming division for monitoring, accounting, analysis, reporting, alarming and self-acting (automatic corrective actions). It uses a pull technology: a single server is sufficient to monitor up to about 1000 computers. Every five minutes Red Eye collects the monitoring data and takes the appropriate actions: it reports the values on an auto-refreshing web page and, if an alarm condition is found, sends an e-mail, sends an SMS and reports to the INFN Tier-1 Dashboard. Red Eye also analyzes syslog messages (all worker nodes (WN) and farm servers log all their messages to a single server), taking action and sending a notification in case of errors. Plots are drawn with GNUPlot.

On the poster screenshots, clicking on the violet icon opens a pop-up showing the network status, while clicking on the blue icon opens a pop-up showing the job submission status via Red Eye.

NAGIOS
Clicking on a warning or critical message in the Storage, National, Network or Grid boxes opens the related Nagios page. Nagios is the de facto open source industry standard for monitoring and control systems and is used by several of the INFN Tier-1 divisions: Storage, Network and Grid Operation. It comes with default plugins, and additional plugins have been developed for local needs (General Parallel File System (GPFS), Storage Resource Manager (SRM), Tivoli Storage Manager (TSM)). At the INFN Tier-1 an advanced Nagios feature, the cluster service, is used to track the state of redundant services: a service is marked as degraded if some of its servers are unavailable but the service is still guaranteed, and as critical if the service is not available at all (a minimal sketch of such a cluster-style check is given below). Nagios is configured to send alarm messages (warning or critical) via e-mail, SMS and the web interface to the Dashboard.

The monitoring system at INFN CNAF is organized as a finite state machine (FSM) configured in two layers: a low-level layer based on customized scripts under Nagios, Lemon, MRTG, sFlow, Red Eye and a commercial software (named TAC), and a high-level layer based on the customized Dashboard.
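The cluster-service behaviour described above can be illustrated with a small custom plugin. The sketch below follows the Nagios plugin convention (one status line, exit code 0/1/2 for OK/WARNING/CRITICAL) and probes a TCP port on each member of a redundant service; the host names, port and thresholds are hypothetical, and this is not the plugin actually deployed at CNAF.

# Minimal sketch of a Nagios-style "cluster" check written as a custom Python
# plugin: it probes a TCP port on every member of a redundant service and
# degrades the overall state instead of going critical when only part of the
# cluster is down. Hosts, port and thresholds are examples only.
import socket
import sys

MEMBERS = ["gpfs-nsd01.example", "gpfs-nsd02.example", "gpfs-nsd03.example"]
PORT = 1191            # example probe port
WARN_IF_DOWN = 1       # at least one member down -> service degraded (WARNING)
CRIT_IF_DOWN = 3       # all members down -> service unavailable (CRITICAL)

def is_up(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

down = [h for h in MEMBERS if not is_up(h, PORT)]

# Nagios plugin convention: print one status line, exit 0/1/2 for OK/WARNING/CRITICAL.
if len(down) >= CRIT_IF_DOWN:
    print("CRITICAL - service unavailable, members down: %s" % ", ".join(down))
    sys.exit(2)
elif len(down) >= WARN_IF_DOWN:
    print("WARNING - service degraded, members down: %s" % ", ".join(down))
    sys.exit(1)
else:
    print("OK - all %d members reachable" % len(MEMBERS))
    sys.exit(0)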
LEMON, MRTG, sFlow
Lemon (LHC Era Monitoring) is an open source software developed at CERN and used at the INFN Tier-1 to monitor hosts or groups of hosts (clusters). Its main components are a client, a server and a web interface. To monitor a particular metric one writes a so-called sensor; an agent on each monitored node reads the metric values from the sensors at regular time steps and sends them to the server, which in our case is Oracle based. The information is stored as RRD (Round Robin Database) files, which the web interface uses to draw the plots. Besides the default sensors, at the INFN Tier-1 we have developed sensors for specific needs. MRTG is a tool that uses SNMP to read the traffic counters of any SNMP-managed network device and draws plots showing how much traffic has passed through each interface (a minimal polling sketch is given below). sFlow® is an industry-standard technology for monitoring high-speed switched networks. It gives complete visibility into network usage in terms of host conversations and network flows; in other words, it answers the question "who is using the available bandwidth of my network links, and how?".

TAC
Clicking on a warning or critical message in the Infrastructure box opens the TAC page, showing the state of all its sensors. A set of dedicated software tools (Business Development Manager) addresses the cooling and electrical power issues of a high-density datacenter such as the Tier-1. These tools were implemented and customized for the Tier-1 datacenter requirements, which impose redundant cooling and power for the hosted IT equipment. They monitor all the components involved and report faults to the Dashboard. The consumption data are also used to optimize efficiency and reduce costs.
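As an illustration of the SNMP polling on which MRTG is based, the sketch below reads an interface byte counter twice through the standard net-snmp snmpget command and converts the difference into an average rate; the switch name, community string and interface index are hypothetical, and real MRTG keeps long-term history in RRD-style logs rather than a single sample pair.

# Minimal sketch of MRTG-style SNMP polling: sample a 64-bit interface byte
# counter twice and derive the average inbound rate over the interval.
# Host, community and interface index are examples only.
import subprocess
import time

HOST = "core-switch.example"
COMMUNITY = "public"
OID = "IF-MIB::ifHCInOctets.1"   # 64-bit input byte counter of interface index 1
INTERVAL = 300                   # seconds, the classic MRTG polling period

def read_counter():
    """Return the current value of the SNMP counter as an integer."""
    out = subprocess.check_output(
        ["snmpget", "-v2c", "-c", COMMUNITY, "-Oqv", HOST, OID], text=True)
    return int(out.strip())

first = read_counter()
time.sleep(INTERVAL)
second = read_counter()

# Average inbound traffic over the interval, converted from bytes to bits per second.
rate_bps = (second - first) * 8 / INTERVAL
print("average inbound traffic on %s: %.1f Mbit/s" % (OID, rate_bps / 1e6))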