AFFAIR – fabric monitoring, ROOT 2005
Tome Antičić, Ruđer Bošković Institute, Zagreb, Croatia; ALICE, CERN

AFFAIR: a flexible fabric and application information recorder

Why/What is AFFAIR?
[Figure: the ALICE DAQ data flow under DATE. Data from the inner tracking system, TPC, TRD, particle identification, muon and trigger detectors travel over DDLs into RORCs hosted in the LDCs, then through a Gigabit switch to the GDCs and on to permanent data storage (1.25 GB/s). The trigger system provides the L0, L1, L2 and L3 triggers; the EDM sits alongside the GDCs.]
Legend: DDL = Detector Data Link; RORC = Read-Out Receiver Card; LDC = Local Data Concentrator; GDC = Global Data Collector; EDM = Event Destination Manager; PDS = Permanent Data Storage.
AFFAIR is designed for this DATE-based DAQ, but it can also run in stand-alone mode (without DATE).
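Bandwidths such as the LDC and GDC rates in this data flow are usually obtained by sampling a cumulative byte counter and dividing the difference by the sampling interval. A minimal sketch, assuming such a counter is available (for example the per-interface byte counters in Linux /proc/net/dev, or a counter in DATE shared memory whose layout is not described here) and assuming 1 kB = 1024 bytes; the function name is hypothetical:

```python
def counter_rate_kb_per_s(prev_bytes, curr_bytes, dt_s, counter_bits=64):
    """Rate in kB/s (assumed 1 kB = 1024 bytes) from two cumulative byte-counter samples.

    A negative difference is treated as a single wrap-around of an unsigned
    counter that is `counter_bits` wide.
    """
    if dt_s <= 0:
        raise ValueError("sampling interval must be positive")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        # Counter wrapped between the two samples (at most once is assumed).
        delta += 1 << counter_bits
    return delta / 1024.0 / dt_s


# Two samples taken 10 s apart: 10240 bytes transferred -> 1.0 kB/s.
print(counter_rate_kb_per_s(0, 10240, 10))
```

Only one wrap per interval can be corrected, so a 32-bit counter is not safe at these rates with 10 s sampling: 2^32 bytes (about 4.3 GB) pass in roughly 3.4 s at 1.25 GB/s.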

Requirements
- Monitor system performance (bandwidth, CPU, disk usage, …)
- Monitor DATE performance (LDC/GDC/DDL bandwidth, events recorded, …)
- Update intervals down to 10 s (or even less)
- Be as "invisible" as possible:
  - no growing logfiles (or, better yet, no logfiles at all) on the monitored nodes
  - not CPU intensive
  - not network intensive
- Web access to processed, real-time data in the form of graphs, histograms, …
- Scalable: should work equally well for 10 as for 1000 computers
- All monitored data permanently stored for offline analysis
- Must simply work: no lost data, no crashes, no maintenance
Hence some design choices were made which may not be optimal, but which get the job done.
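CPU usage, for instance, can be derived cheaply from two readings of the aggregate cpu line in Linux /proc/stat, keeping only the previous line in memory and writing nothing to disk, in the spirit of the "invisible" requirement. A minimal sketch; counting idle + iowait as idle time and summing only the first eight fields (guest time is already included in user/nice) are choices made here, not taken from AFFAIR, and the `publish` callback is a hypothetical stand-in for the transport to the monitor:

```python
import time


def parse_cpu_line(line):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # user, nice, system, idle, iowait, irq, softirq, steal (older kernels list fewer).
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    total = sum(values)
    return total - idle, total


def cpu_utilisation(prev_line, curr_line):
    """Fraction of CPU time spent busy between two /proc/stat samples."""
    busy0, total0 = parse_cpu_line(prev_line)
    busy1, total1 = parse_cpu_line(curr_line)
    elapsed = total1 - total0
    return 0.0 if elapsed <= 0 else (busy1 - busy0) / elapsed


def read_cpu_line(path="/proc/stat"):
    with open(path) as f:
        return f.readline()  # the first line is the aggregate 'cpu' line


def sample_forever(interval_s=10.0, publish=print):
    """Publish CPU utilisation every interval_s seconds; nothing is written to disk."""
    prev = read_cpu_line()
    while True:
        time.sleep(interval_s)
        curr = read_cpu_line()
        publish(cpu_utilisation(prev, curr))
        prev = curr
```

On a Linux node `sample_forever()` prints one value every 10 s; in a real collector `publish` would hand the value to the monitor instead.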

AFFAIR structure
[Figure: on each monitored node a system collector reads /proc and a DATE collector reads the DATE shared memory; the collectors send their data via DIM to the AFFAIR monitor, which stores them in round-robin databases (rrd 1, rrd 2, rrd 3); a ROOT program reads these files and creates plots, which are served on the web through Apache/PHP.]
- Round-robin databases are an excellent way to write and read files quickly and easily, with no performance loss.
- They hold a fixed amount of data (a fixed time depth), so the file size never changes.
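What makes round-robin storage attractive here is that the number of slots is fixed when the archive is created, so new samples overwrite the oldest ones and the size never grows. A minimal in-memory sketch of that idea (the class name is hypothetical; the rrd files on the slide suggest RRDtool-style databases, which add on-disk storage and consolidation on top of this). At one sample every 10 s, a 24-hour depth needs 24 × 3600 / 10 = 8640 slots:

```python
class RoundRobinArchive:
    """Fixed-size ring buffer of (timestamp, value) samples: storage never grows."""

    def __init__(self, slots):
        if slots <= 0:
            raise ValueError("slots must be positive")
        self._times = [None] * slots
        self._values = [None] * slots
        self._next = 0   # slot that the next sample will overwrite
        self._count = 0  # number of valid samples stored so far

    def __len__(self):
        return self._count

    def add(self, timestamp, value):
        self._times[self._next] = timestamp
        self._values[self._next] = value
        self._next = (self._next + 1) % len(self._values)
        self._count = min(self._count + 1, len(self._values))

    def samples(self):
        """Return the stored samples, oldest first."""
        n = len(self._values)
        start = (self._next - self._count) % n
        return [(self._times[(start + i) % n], self._values[(start + i) % n])
                for i in range(self._count)]


archive = RoundRobinArchive(slots=3)
for t, v in [(0, 1.0), (10, 2.0), (20, 3.0), (30, 4.0)]:
    archive.add(t, v)
print(archive.samples())  # [(10, 2.0), (20, 3.0), (30, 4.0)]: the oldest sample was overwritten
```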

Snapshot plots

Time dependent plots
Full lines show average values, dashed lines maximum values.
[Figure: rates (kB/s) for some GDC nodes over the last 24 hours.]
[Figure: rates (kB/s) for some GDC nodes over the last 7 days.]
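A 7-day plot cannot show every 10 s sample, so samples are consolidated into coarser time buckets; keeping both the average (full lines) and the maximum (dashed lines) per bucket preserves short spikes that averaging alone would hide. A minimal sketch; the bucket widths (5 min for the 24-hour view, 30 min for the 7-day view, giving 288 and 336 points) are illustrative assumptions, not AFFAIR settings:

```python
def consolidate(samples, step_s):
    """Group (timestamp, value) samples into buckets of width step_s seconds.

    Returns (bucket_start, average, maximum) for every non-empty bucket,
    ordered by time.
    """
    buckets = {}
    for t, v in samples:
        start = t - (t % step_s)
        buckets.setdefault(start, []).append(v)
    return [(start, sum(vs) / len(vs), max(vs))
            for start, vs in sorted(buckets.items())]


DAY_VIEW_STEP_S = 5 * 60     # assumed: 288 points for the last 24 hours
WEEK_VIEW_STEP_S = 30 * 60   # assumed: 336 points for the last 7 days

rates = [(0, 10.0), (10, 30.0), (20, 20.0), (300, 5.0), (310, 15.0)]  # (s, kB/s)
print(consolidate(rates, DAY_VIEW_STEP_S))
# [(0, 20.0, 30.0), (300, 10.0, 15.0)]
```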

Time dependent plots II

Web interface
- Written in PHP/JavaScript
- Generated completely automatically
- New variables and monitored sets are reflected automatically in the plots
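Generating the page from the list of monitored variables, rather than writing it by hand, is what lets new variables appear without a code change. The original is PHP/JavaScript; this is a minimal Python sketch of the same idea, in which the mapping layout and the plots/<set>/<variable>.png naming scheme are hypothetical:

```python
from html import escape


def build_index(monitored):
    """Build an HTML page with one plot per monitored variable.

    `monitored` maps a monitored-set name (e.g. a node) to its variable names.
    """
    parts = ["<html><body>"]
    for set_name in sorted(monitored):
        parts.append(f"<h2>{escape(set_name)}</h2>")
        for var in sorted(monitored[set_name]):
            src = f"plots/{set_name}/{var}.png"  # hypothetical naming scheme
            parts.append(f'<img src="{escape(src)}" alt="{escape(var)}">')
    parts.append("</body></html>")
    return "\n".join(parts)


page = build_index({"gdc01": ["cpu", "rate_kB_s"]})
print(page.count("<img"))  # 2
```

Adding a variable to the mapping is enough for it to show up on the next page build.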

Web interface II

AFFAIR structure (ROOT storage)
[Figure: system collectors (/proc) and DATE collectors (LDC/GDC shared memory) send data via DIM to the AFFAIR monitor, which writes ROOT files (root 1, root 2, root 3); a ROOT program reads the ROOT files and creates plots, which are served on the web through Apache/PHP.]
- Since there are hundreds of asynchronous storage calls every few seconds, there is one ROOT file per node.
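Giving each node its own output file means that updates arriving from different nodes never touch the same file. A minimal Python sketch of that routing; it writes tab-separated text instead of ROOT trees, is single-threaded, and assumes node names are safe file names (the class and file naming are hypothetical):

```python
import os
import tempfile


class PerNodeStore:
    """Route monitoring updates to one append-only file per node."""

    def __init__(self, directory):
        self.directory = directory
        self._files = {}  # node name -> open file handle, opened on first use

    def _file_for(self, node):
        f = self._files.get(node)
        if f is None:
            path = os.path.join(self.directory, f"{node}.dat")
            f = open(path, "a", encoding="utf-8")
            self._files[node] = f
        return f

    def store(self, node, timestamp, variable, value):
        self._file_for(node).write(f"{timestamp}\t{variable}\t{value}\n")

    def close(self):
        for f in self._files.values():
            f.close()
        self._files.clear()


with tempfile.TemporaryDirectory() as directory:
    store = PerNodeStore(directory)
    store.store("ldc01", 0, "cpu", 0.5)
    store.store("gdc02", 0, "cpu", 0.7)
    store.store("ldc01", 10, "cpu", 0.6)
    store.close()
    print(sorted(os.listdir(directory)))  # ['gdc02.dat', 'ldc01.dat']
```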

Offline analysis
- Detailed histograms (aggregate and individual) can now also be created.
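Individual histograms are filled per node, and the aggregate is their bin-by-bin sum. In ROOT this would typically be one TH1F per node combined with TH1::Add; the pure-Python sketch below shows the same bookkeeping with fixed-width bins (function names are hypothetical):

```python
def histogram(values, lo, hi, nbins):
    """Fixed-width histogram over [lo, hi); out-of-range values are ignored."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding just below hi.
            counts[min(int((v - lo) / width), nbins - 1)] += 1
    return counts


def per_node_and_aggregate(data, lo, hi, nbins):
    """data maps node -> list of values; returns (individual, aggregate)."""
    individual = {node: histogram(vals, lo, hi, nbins) for node, vals in data.items()}
    aggregate = [0] * nbins
    for counts in individual.values():
        aggregate = [a + c for a, c in zip(aggregate, counts)]
    return individual, aggregate


rates = {"gdc01": [10, 20, 55], "gdc02": [5, 95, 60]}
individual, aggregate = per_node_and_aggregate(rates, 0, 100, 4)
print(individual["gdc01"], aggregate)  # [2, 0, 1, 0] [3, 0, 2, 1]
```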

ROOT GUI for configuration/operation

ROOT GUI for monitoring

Graph configuration
- All graphs are created from a single configuration file.
- It completely defines units, labels, and whether graphs are aggregated or superimposed.
- Thus no code changes are needed to create the plots.
- New monitored variables can easily be added and configured.
- A configuration GUI is in progress, but it is not easy: as far as I am aware, rows of data cannot easily be added.
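Driving every graph from one configuration file turns adding a variable into a configuration change rather than a code change. The slides do not show the file format; the sketch below assumes a hypothetical INI-style layout with one section per graph, read with Python's configparser (interpolation is disabled so that values such as a % unit are taken literally):

```python
import configparser
from dataclasses import dataclass

# Hypothetical configuration: one section per graph.
EXAMPLE_CONFIG = """
[gdc_rates]
variable = rate
unit = kB/s
label = GDC recording rate
aggregate = yes
superimpose = yes

[ldc_cpu]
variable = cpu
unit = %
label = LDC CPU usage
"""


@dataclass
class GraphSpec:
    name: str
    variable: str
    unit: str
    label: str
    aggregate: bool    # assumed meaning: also draw the sum over all nodes
    superimpose: bool  # assumed meaning: draw all nodes on one canvas


def load_graph_specs(text):
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    specs = []
    for section in parser.sections():
        s = parser[section]
        specs.append(GraphSpec(
            name=section,
            variable=s["variable"],
            unit=s.get("unit", ""),
            label=s.get("label", section),
            aggregate=s.getboolean("aggregate", fallback=False),
            superimpose=s.getboolean("superimpose", fallback=False),
        ))
    return specs


for spec in load_graph_specs(EXAMPLE_CONFIG):
    print(spec.name, spec.unit, spec.aggregate, spec.superimpose)
```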

Conclusion
- AFFAIR successfully monitors hundreds of nodes.
- Field tested in the ALICE Data Challenges.
- ROOT is a huge part of it.
- It is a work in progress:
  - much more detailed offline analysis
  - viewing performance data/plots on mobile phones and Palm Pilots
  - a lot more work on the GUI
  - high/low warnings
  - …
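High/low warnings are listed as future work. One simple way to add them is to compare each latest value against optional per-variable limits; a minimal sketch, with hypothetical variable names and limits:

```python
def check_limits(values, limits):
    """Compare latest values against optional (low, high) limits.

    values maps variable -> latest value; limits maps variable -> (low, high),
    where either bound may be None. Returns a list of warning strings.
    """
    warnings = []
    for var, value in sorted(values.items()):
        low, high = limits.get(var, (None, None))
        if low is not None and value < low:
            warnings.append(f"{var}={value} below low limit {low}")
        elif high is not None and value > high:
            warnings.append(f"{var}={value} above high limit {high}")
    return warnings


latest = {"cpu": 0.97, "rate_kB_s": 5.0, "disk": 0.5}
limits = {"cpu": (None, 0.9), "rate_kB_s": (100.0, None)}
for w in check_limits(latest, limits):
    print(w)
```

Variables without configured limits (here `disk`) are simply never flagged.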