Xrootd Monitoring and Control. Harsh Arora, CERN.


Setting Up Service
- MonALISA Service
- MonALISA Repository
- Test Xrootd Server
- ApMon Module

Collecting Xrootd Parameters
[Diagram. Boxes: MonALISA Service, External Storage, Xrootd, System / Network Info. Arrow labels: Sends monitoring data, Control Xrootd Services.]

Proposed Architecture to Monitor and Control Xrootd in ALICE (Iosif Legrand, June 2012)
[Architecture diagram. Three identical server blocks, each labeled: XROOTD SERVER, MonALISA, ApMon, Network Monitoring, System Monitoring, Control Module. Repository block labeled: MonALISA Xrootd Repository, Long History DB, Aggregated Data, Alerts, Actions. Link label: Control / Update.]

- Run a MonALISA service on each host running an XROOTD server (a new dedicated group)
- Control the xrootd server
  - Start / Stop / Update
  - Dynamically change configuration parameters
- Collect monitoring information from the xrootd servers using the new and improved monitoring functionality in xrootd
- Perform full network measurements and tests (RTT, available bandwidth, topology)
- Monitor the storage system used by the xrootd servers
- Create a dedicated MonALISA repository

Active Available Bandwidth Measurements Between All the ALICE Grid Sites (2) (Iosif Legrand, October)
- ALICE measurements between all VO Boxes
- Important to have similar data for Xrootd servers
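Measuring between all servers means one test per pair of hosts. The sketch below enumerates those pairs; it assumes each direction is measured separately (ordered pairs), and the host names are hypothetical placeholders.

```python
from itertools import permutations

def measurement_pairs(hosts):
    """Return every ordered (source, destination) pair of distinct hosts."""
    return list(permutations(hosts, 2))

# Hypothetical host names, for illustration only.
hosts = [
    "xrootd-a.example.org",
    "xrootd-b.example.org",
    "xrootd-c.example.org",
]
pairs = measurement_pairs(hosts)
for source, destination in pairs:
    print(f"measure available bandwidth: {source} -> {destination}")
```

The number of tests grows as n * (n - 1) for n hosts, so a real scheduler would likely need to spread the measurements over time.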

Advantages of Using a Dedicated MonALISA Group to Monitor and Control All the XROOTD Servers
- Easy to maintain and update a critical service for Offline computing
- Significantly improves the monitoring information and helps to better understand how storage systems are used in the ALICE Computing model (Distributed Storage for Data)
- Controls the xrootd servers and can dynamically configure systems based on how they are used
- Monitors the true network connectivity
- Monitors the real storage used by the xrootd servers
- Monitors the connections and transfers per client / job
- Can help in developing optimization procedures for complex data flows

% of CPU Used by MonALISA

Parameters Collection and Verification

% of Time Spent by CPU in User Mode
[Plots: MonALISA Host Monitoring; Reported by Xrootd]
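One way to cross-check such a value independently is to compute the user-mode share from two samples of the aggregate `cpu` line in Linux `/proc/stat`. This is a minimal sketch of that calculation, not the method used by MonALISA or xrootd; the sample lines and the 5-point tolerance are made-up values.

```python
def cpu_ticks(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into tick counters."""
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep user, nice, system, idle, iowait, irq, softirq, steal.
    # Guest time is already included in user time, so it is not added again.
    return [int(value) for value in fields[1:9]]

def user_percent(before, after):
    """Percentage of elapsed CPU ticks spent in user mode between two samples."""
    t0, t1 = cpu_ticks(before), cpu_ticks(after)
    elapsed = sum(t1) - sum(t0)
    if elapsed <= 0:
        return 0.0
    return 100.0 * (t1[0] - t0[0]) / elapsed

def values_agree(a, b, tolerance=5.0):
    """True when two reported percentages differ by at most `tolerance` points."""
    return abs(a - b) <= tolerance

# Made-up samples taken some seconds apart.
sample_before = "cpu 100 0 50 850 0 0 0 0"
sample_after = "cpu 160 0 70 970 0 0 0 0"
local_value = user_percent(sample_before, sample_after)
print(local_value, values_agree(local_value, 28.5))
```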

Parameters
- Open_files
- Srv_rd_mbytes
- Srv_wr_files

Parameters (Cont.)
- Srv_rd_files
- Srv_wr_mbytes
- Space_free

Average Load over the Last 15 Minutes
- Load15
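On Linux the 15-minute load average is the third field of `/proc/loadavg`, and Python exposes the same value through `os.getloadavg()`. The sketch below shows both routes; the sample text is made up, and how MonALISA itself obtains Load15 is not stated in the slides.

```python
import os

def load15_from_text(loadavg_text):
    """Return the 15-minute load average from /proc/loadavg content."""
    return float(loadavg_text.split()[2])

def load15_local():
    """Return the local 15-minute load average (Unix only)."""
    return os.getloadavg()[2]

# Made-up /proc/loadavg content: 1, 5 and 15 minute averages come first.
sample = "0.42 0.35 0.30 1/234 5678"
print(load15_from_text(sample))
```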

Ambiguous Parameters
- Connected Clients
- Number of Jobs Requiring a Thread

Dedicated MLRepository

Detailed Monitoring per Client
- Monitoring per client
- Patch to xrootd-3.1.0

Client Monitoring Example
[Plot: Number of Bytes]

Additional Information Provided by Monitoring Each Client from the Xrootd Server
- Global map of data transfers
- Debug individual clients
- Better understand the effect of long RTT
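A global map of data transfers can be built by summing the per-client byte counts for each client and server pair. The sketch below assumes a hypothetical record layout of (client, server, bytes); the actual format produced by the patched xrootd is not given in the slides.

```python
from collections import defaultdict

def transfer_map(records):
    """Sum transferred bytes for each (client, server) pair."""
    totals = defaultdict(int)
    for client, server, nbytes in records:
        totals[(client, server)] += nbytes
    return dict(totals)

# Hypothetical per-client records, for illustration only.
records = [
    ("client-1.example.org", "xrootd-a.example.org", 1_000),
    ("client-2.example.org", "xrootd-a.example.org", 2_500),
    ("client-1.example.org", "xrootd-a.example.org", 4_000),
]
totals = transfer_map(records)
for (client, server), nbytes in sorted(totals.items()):
    print(f"{client} -> {server}: {nbytes} bytes")
```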

Preparing Correlation Plots
- Bad xrootd server detection
- High CPU usr with low disk throughput
- Negative correlation between traffic and disk IOUtil (%)
- High disk IOUtil (%) with a low read/write rate: probably a damaged disk
- High load value with a small number of clients
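The checks listed above could be automated roughly as follows. This is a sketch under stated assumptions: the metric names, the sample values, and every threshold are hypothetical, and the Pearson coefficient is used here as one common way to quantify a correlation, since the slides do not name a method.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)

def suspect_flags(sample, limits):
    """Return the rule names that one server sample triggers."""
    flags = []
    low_disk = sample["disk_mb_s"] < limits["disk_mb_s_low"]
    if sample["cpu_usr"] > limits["cpu_usr_high"] and low_disk:
        flags.append("high CPU usr, low disk throughput")
    if sample["io_util"] > limits["io_util_high"] and low_disk:
        flags.append("high IOUtil, low read/write rate")
    if sample["load"] > limits["load_high"] and sample["clients"] < limits["clients_low"]:
        flags.append("high load, few clients")
    return flags

# Hypothetical thresholds and sample values.
limits = {"cpu_usr_high": 80, "disk_mb_s_low": 10, "io_util_high": 90,
          "load_high": 10, "clients_low": 5}
sample = {"cpu_usr": 90, "disk_mb_s": 5, "io_util": 95, "load": 20, "clients": 2}
print(suspect_flags(sample, limits))
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # traffic vs IOUtil, made-up series
```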

MonALISA on Each Server
- Local CPU monitoring info
- Understanding network topology
- Traffic information
- Monitoring the storage system directly
- Control

Future Plan

Control via MonALISA
- Start / Stop xrootd server
- Upgrade
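A control module of this kind can be pictured as a small dispatcher that maps a requested action to a fixed, whitelisted command. The sketch below is only an illustration: `xrootd-control` is a hypothetical placeholder, not a real tool, and the slides do not describe how MonALISA actually delivers or executes control requests.

```python
import subprocess

# Hypothetical command table: "xrootd-control" is a placeholder name.
# A site would substitute its own start, stop and upgrade commands.
COMMANDS = {
    "start": ["xrootd-control", "start"],
    "stop": ["xrootd-control", "stop"],
    "upgrade": ["xrootd-control", "upgrade"],
}

def handle_request(action, dry_run=True):
    """Map a control request to a whitelisted command and optionally run it."""
    if action not in COMMANDS:
        raise ValueError(f"unsupported action: {action!r}")
    command = list(COMMANDS[action])
    if dry_run:
        return command  # report what would run, without side effects
    subprocess.run(command, check=True)
    return command

print(handle_request("stop"))
```

Restricting requests to a fixed table means a remote message can only select an action, never supply an arbitrary command line.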

Problems
- Latest version of xrootd
- Incompatible with xrootd-ftsofs-1.1.0

Thank You