GridPP7 – June 30 – July 2, 2003 – Fabric monitoring– n° 1 Fabric monitoring for LCG-1 in the CERN Computer Center Jan van Eldik CERN-IT/FIO/SM 7 th GridPP.


GridPP7 – June 30 – July 2, 2003 – Fabric monitoring – n° 1
Fabric monitoring for LCG-1 in the CERN Computer Center
Jan van Eldik, CERN-IT/FIO/SM
7th GridPP Collaboration meeting, July 1, 2003

n° 2 – Outline
– Fabric monitoring developments at CERN
– Architectural overview
– Deployment: status & plans for LCG-1
– Outlook

n° 3 – Fabric Monitoring at CERN
– Improved fabric management is a key part of the LCG programme
– EDG WP4 develops tools for automated installation, configuration, fabric monitoring and fault tolerance
– IT/FIO Supervision & Monitoring section: develop and deploy a monitoring solution for the LHC era
– A lot of expertise: EDG WP4 monitoring developments, PVSS SCADA studies, SNMP studies, operator alarm displays, …
– Architecture based on functional requirements gathered by the PEM project
– Important objective: fabric monitoring for LCG-1 at CERN

n° 4 – Requirements and architecture
[Architecture diagram; labels: Monitored nodes, Sensor, Monitoring Sensor Agent, Cache, Local Consumer, Measurement Repository, Database, Global Consumer]
– Both for performance and exception monitoring
– Local and global consumers
– Scalable, extensible, robust

n° 5 – EDG WP4 implementation
[Architecture diagram; labels: Monitored nodes, Sensor, Monitoring Sensor Agent (MSA), Cache, Local Consumer, Measurement Repository (MR), Database, Global Consumer]
Monitoring Sensor Agent
  – Calls plug-in sensors to sample configured metrics
  – Stores all collected data in a local disk buffer
  – Sends the collected data to the global repository
Plug-in sensors
  – Programs/scripts that implement a simple sensor-agent ASCII text protocol
  – A C++ interface class is provided on top of the text protocol to facilitate implementation of new sensors
The local cache
  – Ensures data is collected even when the node cannot connect to the network
  – Allows for node autonomy for local repairs
Transport
  – Transport is pluggable
  – Two protocols, over UDP and TCP, are currently supported; only TCP can guarantee delivery
Measurement Repository
  – The data is stored in a database
  – A memory cache guarantees fast access to the most recent data, which is normally what is used for fault tolerance correlations
Repository API
  – SOAP RPC
  – Query history data
  – Subscription to new data
Database
  – Proprietary flat-file database
  – Oracle
  – Open source interface to be developed

n° 6 – Deployment status in CERN CC
– MSA with sensors for performance and exception monitoring, measuring quantities per box
– Deployed on ~1500 RedHat Linux nodes
– 30 clusters, with specific configuration files

Cluster type      Nodes
Batch             1000
Interactive       70
Disk server       200
Tape server       80
WWW, DB, MISC     200

n° 7 – Status of exception monitoring
– ~50 possible alarms per monitored node
  – HighLoad, DaemonDead, FileSysFull, install / config problems
– Operator alarm displays
  – PVSS-based, developed as part of PVSS tests
  – WP4 alarm display under active development
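As an illustration of how alarms such as HighLoad, FileSysFull and DaemonDead could be derived from sampled metrics, here is a minimal Python sketch. Only the three alarm names come from the slide; the metric names, the threshold values and the `AlarmRule` and `evaluate` helpers are hypothetical and do not reflect the actual sensor configuration at CERN.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class AlarmRule:
    """One exception-monitoring rule: raise `name` when `is_raised(value)` is true."""
    name: str
    metric: str
    is_raised: Callable[[float], bool]


# Illustrative rules only: metric names and thresholds are assumptions.
RULES: Sequence[AlarmRule] = (
    AlarmRule("HighLoad", "load_avg_1min", lambda v: v > 10.0),
    AlarmRule("FileSysFull", "root_fs_used_percent", lambda v: v >= 95.0),
    AlarmRule("DaemonDead", "daemon_process_count", lambda v: v < 1),
)


def evaluate(samples: Dict[str, float],
             rules: Sequence[AlarmRule] = RULES) -> List[str]:
    """Return the names of all alarms raised by one node's latest samples.

    Rules whose metric is missing from `samples` are skipped rather than raised.
    """
    return [rule.name for rule in rules
            if rule.metric in samples and rule.is_raised(samples[rule.metric])]


node_samples = {
    "load_avg_1min": 12.0,
    "root_fs_used_percent": 40.0,
    "daemon_process_count": 1,
}
print(evaluate(node_samples))
```

Keeping the rules as data rather than code makes it straightforward to give each cluster its own rule set, in the spirit of the per-cluster configuration files mentioned for the deployment.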

n° 8 – PVSS operator alarm display

n° 9 – WP4 operator alarm display

n° 10 – Performance monitoring
– WP4 Measurement Repository with Oracle backend is currently being deployed in the CERN CC for LCG-1
– Data access
  – C-API to the repository is available; Perl and Java implementations to be done
  – Simple CLI is being delivered
  – GUI is being delivered
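The kind of data access a consumer needs from the repository (most recent value, history query, subscription to new data) can be sketched in Python as follows. This is a hypothetical in-memory stand-in, not the WP4 C-API or its SOAP interface: `MeasurementStore` and its method names are assumptions, and a plain dictionary takes the place of the Oracle or flat-file backend.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Sample = Tuple[float, float]   # (timestamp, value)
Key = Tuple[str, str]          # (node, metric)
Callback = Callable[[str, str, float, float], None]


class MeasurementStore:
    """Toy repository: full history plus a cache of the latest sample per key."""

    def __init__(self) -> None:
        self._history: Dict[Key, List[Sample]] = defaultdict(list)  # database stand-in
        self._latest: Dict[Key, Sample] = {}                        # memory cache
        self._subscribers: Dict[Key, List[Callback]] = defaultdict(list)

    def insert(self, node: str, metric: str, timestamp: float, value: float) -> None:
        """Store one sample, refresh the cache, and notify subscribers."""
        key = (node, metric)
        self._history[key].append((timestamp, value))
        current = self._latest.get(key)
        if current is None or timestamp >= current[0]:
            self._latest[key] = (timestamp, value)
        for callback in self._subscribers[key]:
            callback(node, metric, timestamp, value)

    def latest(self, node: str, metric: str) -> Optional[Sample]:
        """Most recent sample, served from the cache without scanning history."""
        return self._latest.get((node, metric))

    def history(self, node: str, metric: str, start: float, end: float) -> List[Sample]:
        """All samples with start <= timestamp <= end, in time order."""
        samples = self._history.get((node, metric), [])
        return sorted(s for s in samples if start <= s[0] <= end)

    def subscribe(self, node: str, metric: str, callback: Callback) -> None:
        """Register a callback invoked for every new sample of (node, metric)."""
        self._subscribers[(node, metric)].append(callback)


store = MeasurementStore()
store.subscribe("node001", "load_avg_1min",
                lambda node, metric, ts, value: print("new sample:", node, metric, ts, value))
store.insert("node001", "load_avg_1min", 10.0, 0.5)
store.insert("node001", "load_avg_1min", 20.0, 0.7)
print(store.latest("node001", "load_avg_1min"))
print(store.history("node001", "load_avg_1min", 0.0, 15.0))
```

Serving `latest` from a separate cache reflects the design point that the most recent data, which fault tolerance correlations normally use, should not require a database query.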

n° 11 – Anamon

n° 12 – Open issues
– Current solution is still very node-centric
– Not much experience with consumers
– No correlation engines, no corrective actions yet…
– Integration with configuration system to be done

n° 13 – Summary and Outlook
– Fabric monitoring infrastructure for LCG-1 at CERN is being deployed
– Monitoring Sensor Agent has been operating very well
– Measurement Repository will now be challenged
– Consumers can start consuming…
– An interesting six-month period awaits us!