Local Monitoring at SARA
Ron Trompert, SARA

Ganglia
- Monitors nodes for:
  - Load
  - Memory usage
  - Network activity
  - Disk usage
- Monitors running jobs

Ganglia

Monitoring the storage infrastructure

Monitoring the storage infrastructure: Argus
- Network monitoring
  - CERN router (ping)
  - SARA-r1 router (ping)
  - CERN FTS machines (ping)
- Monitoring services
  - gridftp doors: checks service listening on port 2811
  - srmdoor: checks service listening on port 8443
  - gsidcapdoor: checks service listening on its port (number not given)
  - rfiod (CXFS/DMF clients): checks service listening on port 5001
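The "service listening on port" checks above can be illustrated with a plain TCP connect test. This is a minimal sketch, not Argus's own implementation: the function names and the `SERVICE_PORTS` table are hypothetical, and only the port numbers stated on the slide are included.

```python
import socket

# Ports as listed on the slide; the gsidcap port is not given there, so it is left out.
SERVICE_PORTS = {"gridftp": 2811, "srm": 8443, "rfiod": 5001}


def port_is_listening(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_services(host, ports=SERVICE_PORTS, timeout=5.0):
    """Map each service name to True/False depending on whether its port accepts connections."""
    return {name: port_is_listening(host, port, timeout) for name, port in ports.items()}
```

A call such as `check_services("door.example.org")` (hypothetical host name) returns a dictionary that a cron job could compare against the expected state. A connect test only shows that something accepts connections on the port; it does not show that the service behind it works.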

Monitoring the storage infrastructure: Argus
- Checks the services/network at user-specified intervals
- Sends notifications when user-specified error conditions occur
- Standard built-in tests: TCP/UDP, HTTP, SNMP, IMAP, SQL databases
- Possibility to sound an alarm
- Possibility to invoke services rather than just doing a TCP check
- Supports graphing

Monitoring the storage infrastructure: Argus
- It's free
- Easy to configure
- No SL rpms
  - Needs some Perl modules and other programs not in SL

Monitoring the storage infrastructure: Argus

Monitoring throughput
- Disk-to-disk throughput is monitored with home-grown Python scripts and the dCache billing files.
- Information is extracted from the billing files every hour and stored in a PostgreSQL database.
  - This was written before dCache logged the billing data in PostgreSQL.
- Scripts run from cron every hour, extract the information from the database, and use gnuplot to generate the graphs.
- Requires the gnuplot development release and a patch for histogram support.

Monitoring throughput
- Host-based and VO-based graphs
- Overall, per VO, and per host
- Graphs of throughput, number of transfers, and amount of data moved
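The three quantities plotted (throughput, number of transfers, data moved) can be derived from per-transfer records by grouping them per hour and per VO or host. The sketch below only illustrates that aggregation step. The record layout (`time`, `vo`, `host`, `bytes`) is an assumption and does not reflect the dCache billing file format or the actual SARA scripts. Throughput is taken here as bytes moved in the hour divided by 3600 seconds.

```python
from collections import defaultdict
from datetime import datetime


def hourly_stats(records, key_field):
    """Aggregate transfer records per (hour, key), where key_field is 'vo' or 'host'.

    Each record is a dict with the hypothetical fields:
    'time' (datetime), 'vo' (str), 'host' (str), 'bytes' (int).
    """
    stats = defaultdict(lambda: {"transfers": 0, "bytes": 0})
    for rec in records:
        # Truncate the timestamp to the start of its hour.
        hour = rec["time"].replace(minute=0, second=0, microsecond=0)
        bucket = stats[(hour, rec[key_field])]
        bucket["transfers"] += 1
        bucket["bytes"] += rec["bytes"]
    for bucket in stats.values():
        # Average throughput over the hour, in MB/s (1 MB = 10**6 bytes).
        bucket["mb_per_s"] = bucket["bytes"] / 1e6 / 3600
    return dict(stats)
```

Calling `hourly_stats(records, "vo")` gives the per-VO series and `hourly_stats(records, "host")` the per-host series. Each resulting row can then be written out in whatever format the plotting tool expects.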

Monitoring throughput
- Disk-to-tape throughput is monitored with home-grown Python scripts and DMF's dmdlogs.
- Same operation as for disk-to-disk.

FTS Monitoring: settings and agent status
- Scripts use the services.xml file to get information about MyProxy and the ChannelManagement service.
- Run from cron every 5 minutes.
- Scripts renew the proxy when needed.
- Scripts get information about channel settings with the glite-transfer-channel-list command.
- Does not require access to the Oracle database.
- Able to monitor other people's FTS.

FTS Monitoring

Transfers and jobs
- Scripts retrieve information about the transfers and jobs of the past 24 hours from the Oracle database.
  - At the time of writing, the glite-transfer-* interface did not support getting the necessary information about jobs and transfers.
- Displays statistics.
- Displays information about individual jobs and transfers:
  - Status
  - Source
  - Destination
  - Reason for the failure, if applicable
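The statistics step can be pictured as a filter on the last 24 hours followed by counts per status and per failure reason. This is a rough sketch under stated assumptions: the record fields (`time`, `status`, `source`, `destination`, `reason`) and the status strings used in any example data are hypothetical, and the database query that would produce the records is not shown.

```python
from collections import Counter
from datetime import datetime, timedelta


def summarize(transfers, now, window=timedelta(hours=24)):
    """Return (recent, by_status, failure_reasons) for transfers inside the time window.

    Each transfer is a dict with the hypothetical fields:
    'time' (datetime), 'status', 'source', 'destination', 'reason' (None if no failure).
    """
    # Keep only the records from the last `window` up to `now`.
    recent = [t for t in transfers if now - window <= t["time"] <= now]
    # Count how many transfers ended in each status.
    by_status = Counter(t["status"] for t in recent)
    # Count failure reasons, skipping transfers without one.
    failure_reasons = Counter(t["reason"] for t in recent if t.get("reason"))
    return recent, by_status, failure_reasons
```

The `recent` list carries the per-transfer details (status, source, destination, reason) for the detailed view, while the two counters feed the statistics page.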

FTS Monitoring Statistics

FTS Monitoring Transfers

FTS Monitoring Transfers

FTS Monitoring: future plan
- Combine job and transfer output so that information about individual transfers, VOs and channels is available in one view.
