WLCG infrastructure monitoring proposal. Pablo Saiz, IT/SDC/MI, 16th August 2013.


16 August 2013, Infrastructure monitoring, P. Saiz

Table of contents
I. Summary of the progress
II. Desired structure of applications
III. Proposal for infrastructure monitoring

I. Summary

Motivation
- Reduction in the number of people
- Redefining the scope of applications
- Combining expertise
- Step back and evaluate other alternatives
- Goal: offer (at least) the same QoS with fewer resources

Status so far
- WLCG monitoring consolidation group created
- Applications supported by the section
- Applications used
- … so now we know what to provide

How to provide it
- Input from our experience
- Input from other groups
- What is available out there
- Split in different areas of work
  - Source of information
  - Transport
  - Storage
  - Aggregation
  - Visualization
  - Documentation
  - Deployment
  - Recurrent tasks
- Review of the areas

II. Structure of applications

Different layers of applications
- Collect information
- Transport
- Storage
- Visualize
- Aggregate
- Recurrent tasks
- Documentation
- Deployment

Deployment
- Using OpenStack, Puppet, Hiera, Foreman
- Quota of 100 nodes, 240 cores
- Multiple templates already created
  - Development machine (7 nodes)
  - Web servers (SSB, SUM, WLCG transfers, Job: 16 nodes)
  - Elasticsearch (6 nodes), Hadoop (4 nodes)
- Currently working on Nagios installation
- Migrating machines from Quattor to AI
- Koji and Bamboo for build system and continuous integration

Source of information
- Gather info from external and internal sources
- Publish it in the transport layer
- Sources shown: Nagios, GOCDB, REBUS, OIM, Savannah, other applications

Transport
- Message broker
- Local files
- HTTP PUT/GET
- UDP
- (Table in DB)?
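To illustrate how one message could travel over more than one of the transports listed, here is a minimal sketch covering only the "local files" and "UDP" options. The JSON-lines wire format, the field names and all function names are assumptions made for this example; the slides do not specify a message format.

```python
import json
import socket
from pathlib import Path


def encode(instance: str, metric: str, start: str, end: str, value: str) -> bytes:
    """Serialize one metric message as a single JSON line (assumed format)."""
    message = {
        "instance": instance,
        "metric": metric,
        "start": start,
        "end": end,
        "value": value,
    }
    return (json.dumps(message, sort_keys=True) + "\n").encode("utf-8")


def publish_to_file(payload: bytes, path: Path) -> None:
    """'Local files' transport: append the message to a spool file."""
    with path.open("ab") as spool:
        spool.write(payload)


def publish_udp(payload: bytes, host: str, port: int) -> None:
    """'UDP' transport: fire-and-forget datagram, no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


def read_spool(path: Path) -> list:
    """Consumer side of the file transport: decode every spooled message."""
    with path.open("rb") as spool:
        return [json.loads(line) for line in spool if line.strip()]
```

Because the payload is built once by `encode`, the choice of transport stays independent of the message content, which is the point of having a separate transport layer.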

Storage
- Metrics
  - Accepts any data: #jobs, status of a service, downtime, pledges, channel status
  - Format: Metric, Instance, Time Range, Value
- Archival
  - Long-term data (same format as metric storage)?
- Current metrics
  - Most common views
- Metadata
  - Profiles
  - Topology
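A minimal sketch of the (Metric, Instance, Time Range, Value) record and of how a "current metrics" view might be derived from it. The class name, the field types and the rule "latest end time wins" are assumptions for illustration, not the schema actually deployed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class MetricRecord:
    """One stored row: Metric, Instance, Time Range (start, end), Value."""
    metric: str
    instance: str
    start: datetime
    end: datetime
    value: str  # kept as text so that "any data" fits (assumption)


def current_metrics(
    records: Iterable[MetricRecord],
) -> Dict[Tuple[str, str], MetricRecord]:
    """Keep, per (instance, metric), the record whose time range ends last."""
    latest: Dict[Tuple[str, str], MetricRecord] = {}
    for record in records:
        key = (record.instance, record.metric)
        if key not in latest or record.end > latest[key].end:
            latest[key] = record
    return latest
```

With one row shape for every source, the archival store and the current view differ only in which rows they keep, not in their format.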

Aggregation
- Treated as another metric
- Might collect input from previous metrics
- Current schema of 'CMS Site readiness' (diagram labels: Summary, Site readiness, Availability)

Visualization
- Server
  - HTML skeleton
  - REST API with JSON data
  - Cache: memcache, Varnish
- Client
  - Common library + plugin: jQuery
  - Common MVC: no obvious choice…
  - Plots (interactive, exportable, embeddable): Highcharts

III. Infrastructure monitoring

Current situation
- Big system, difficult to maintain/evolve
  - Many internal dependencies
  - Multiple schemas, aggregations: SSB, MRS, ACE
- Scope much bigger than what we need
  - Limit to WLCG
- Usage of probes
  - Does not test what the experiments are doing!
  - Non-trivial deployment of new tests
- Based on technologies available at the time of the design
- New requests from experiments:
  - Test whatever they want
  - Availability vs usability
  - Combine Dashboard/SAM apps

Infrastructure monitoring
[Diagram: components placed on the application layers. Labels: Nagios, Pledge, Down, Pilot, HC, VO feed, MyWLCG, SSB, SUM, Trend Report, ACE, POEM, Archival, Metrics]

And for the prototype…
[Diagram: components placed on the application layers. Labels: Nagios, Pledge, Down, Direct, HC, VO feed, MyWLCG, SSB, SUM, Trend Report, ACE, POEM, Archival, Metrics. Data-flow labels: New Data, Processed Data, consume2db, SSB format]
- SSB storage
  - Records status changes
  - Same procedure as any other metric
- Simplified MRS
  - Accepts any data
  - No foreign keys!
  - No status calculation
  - 300K messages per day
- All the data in storage have the same format:
  - Instance, Metric, Time range, Value
  - Source could be Nagios, pilot framework, VO-defined metrics, availabilities
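One possible reading of "records status changes" is that repeated reports of the same status are collapsed into a single (Instance, Metric, Time range, Value) row, and a new row starts only when the value changes. The sketch below implements that reading; the function name, the integer timestamps and the choice to close a range at the time of the next change are assumptions, not the behaviour of the actual SSB storage.

```python
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[str, str, int, str]       # instance, metric, timestamp, value
Range = Tuple[str, str, int, int, str]   # instance, metric, start, end, value


def record_status_changes(samples: Iterable[Sample]) -> List[Range]:
    """Collapse consecutive identical values into time ranges."""
    open_ranges: Dict[Tuple[str, str], list] = {}  # key -> [start, end, value]
    closed: List[Range] = []
    for instance, metric, ts, value in sorted(samples, key=lambda s: s[2]):
        key = (instance, metric)
        current = open_ranges.get(key)
        if current is not None and current[2] == value:
            current[1] = ts  # same status reported again: extend the range
            continue
        if current is not None:
            # Status changed: close the previous range at the change time.
            closed.append((instance, metric, current[0], ts, current[2]))
        open_ranges[key] = [ts, ts, value]
    # Ranges still open at the end are emitted with their last seen time.
    for (instance, metric), (start, end, value) in open_ranges.items():
        closed.append((instance, metric, start, end, value))
    return sorted(closed, key=lambda r: (r[0], r[1], r[2]))
```

Storing ranges rather than every message keeps the table small when a status is stable, which matters at a rate such as the 300K messages per day quoted above.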

And now we can see metrics…

Aggregation
- Combination of ACE + SSB virtual columns
- Two types:
  - Horizontal: Ins1(M1 … Mn) → Ins1(Mp)
  - Vertical: M1(Ins1 … Insn) → Insp(M2)
- Initial options for "and", "or" of current status
- Later on, might be extended to 'sliding window'
- Full description
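The two aggregation types can be sketched as follows, using only the initial "and"/"or" options on current status. The boolean status (True meaning OK), the in-memory dictionary and every instance and metric name in the usage are hypothetical; the real definitions live in the full description referenced on the slide.

```python
from typing import Dict, Iterable, Tuple

# (instance, metric) -> current status, True meaning OK (assumed encoding)
Store = Dict[Tuple[str, str], bool]

COMBINERS = {"and": all, "or": any}


def horizontal(store: Store, instance: str, metrics: Iterable[str],
               target_metric: str, op: str) -> bool:
    """Several metrics of ONE instance -> a new metric on that instance."""
    result = COMBINERS[op](store[(instance, m)] for m in metrics)
    store[(instance, target_metric)] = result  # stored like any other metric
    return result


def vertical(store: Store, metric: str, instances: Iterable[str],
             target_instance: str, target_metric: str, op: str) -> bool:
    """ONE metric over several instances -> a metric on another instance."""
    result = COMBINERS[op](store[(i, metric)] for i in instances)
    store[(target_instance, target_metric)] = result
    return result
```

Writing the result back into the same store reflects the idea that an aggregation is treated as another metric, so one aggregation can feed the next.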

Examples of aggregation
[Figure labels: ATLAS_CRITICAL, WN, Site (expand this column)]

Summary
- Lots of progress towards unified schema
  - Data can be published from different sources: Nagios, VO-defined metrics, ACE, (HC, Job Pilots)
  - Single schema for storage
  - Components talk to each other through API
- Getting close to a "proof of concept"
  - Aggregation needs some work
  - Visualization might need adjusting
- Other tasks can go in parallel
  - NoSQL evaluation
  - Nagios configuration
  - Only active metrics