Monitoring Evolution and IPv6

Presentation transcript:

Monitoring Evolution and IPv6
Alberto AIMAR, IT-CM-MM

Outline
- Context
- Data Centres Monitoring
- Experiments Dashboards
- Architecture
- Plans
- Status
- Demo

Monitoring
Data Centre Monitoring
- Monitoring of DC at CERN and Wigner
- Hardware, operating system, and services
- Data Centres equipment (PDUs, temperature sensors, etc.)
- Used by service providers in IT, experiments
Experiment Dashboards
- Sites availability, data transfers, job information, reports
- Used by WLCG, experiments, sites and users
Both hosted by CERN IT, in different teams

Context
Focus for 2016
- Regroup monitoring activities hosted by CERN/IT (Data Centres, Experiment Dashboards, ETF, HammerCloud, etc.)
- Continue existing services
- Uniform with CERN IT practices: management of services, communication, tools (e.g. GGUS and SNOW tickets)
Starting with
- Merge Data Centres and Experiment Dashboards monitoring technologies
- Review existing monitoring usage and needs (IT, WLCG, etc.)
- Investigate new technologies
- Unchanged support while collecting user feedback

Unified Monitoring Architecture
[Diagram. Labels: Data Sources (Data Centres, WLCG), Transport (Kafka), Processing, Storage/Search, Data Views]

Experiment Dashboards
- Experiment Dashboard covers the full range of experiments' computing activities
- Provides information to different categories of users
- 300-500 users per day
[Diagram. User categories: Operation Teams, Sites, Users, General Public. Applications: Job Monitoring (Analysis + Production, Real time and Accounting views), Data Management Monitoring (Data transfer, Data access), Infrastructure Monitoring (Site Status Board, SAM3), Outreach (Google Earth Dashboard)]

Experiment Dashboards
- Job monitoring, sites availability, data management and transfers
- Used by experiments operation teams, sites, users, WLCG

Current Monitoring
[Diagram: the current monitoring chains for Data Centres Monitoring, Data management and transfers, Job Monitoring, and Infrastructure Monitoring, drawn across the stages Data Sources, Transport, Processing & Aggregation, Storage & Search, and Display & Reports.]
- Data Sources shown: Metrics Manager, Lemon Agent, XSLS, ATLAS Rucio, FTS Servers, DPM Servers, XROOTD Servers, CRAB2, CRAB3, WM Agent, Farmout, Grid Control, CMS Connect, PANDA WMS, ProdSys, Nagios, VOFeed, OIM, GOCDB, REBUS
- Collectors and transport shown: Flume, AMQ, Kafka, GLED, HTTP Collector, SQL Collector, MonALISA Collector, HTTP GET, HTTP PUT
- Processing & Aggregation shown: Spark, Hadoop Jobs, GNI, Oracle PL/SQL, ESPER, ES Queries
- Storage & Search shown: HDFS, ElasticSearch, Oracle
- Display & Reports shown: Kibana, Jupyter, Zeppelin, Dashboards (ED), Real Time (ED), Accounting (ED), API (ED), SSB (ED), SAM3 (ED)

Unified Monitoring
[Diagram: a single chain shared by all data sources.]
- Data Sources: Metrics Manager, Lemon Agent, XSLS, ATLAS Rucio, FTS Servers, DPM Servers, XROOTD Servers, CRAB2, CRAB3, WM Agent, Farmout, Grid Control, CMS Connect, PANDA WMS, ProdSys, Nagios, VOFeed, OIM, GOCDB, REBUS
- Transport: Flume, AMQ, Kafka
- Processing & Aggregation: Spark, Hadoop Jobs, GNI, Other
- Storage & Search: Hadoop HDFS, ElasticSearch, Other
- Display & Reports: Kibana, Jupyter, Zeppelin, Other

Status
Producers and Transport
- Moving all data via new transport (Flume, AMQ, Kafka)
Storage and Search
- Data in ES and Hadoop
Processing
- Doing aggregation and processing via Spark
Display and reports
- Experimenting using only the standard features of ES, Kibana, Spark, Hadoop
- Introduce notebooks and data discovery
General
- Selecting technologies, learning on the job, looking for expertise
- Evolve interfaces (e.g. dashboards for users, shifters, sites, managers)
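As an illustration of the aggregation step mentioned under Processing, the following minimal sketch groups transfer records into fixed time windows per site. It uses plain Python rather than Spark, and the record fields (`site`, `timestamp`, `bytes`), the site names, and the 600-second window are assumptions made for the example, not the actual schema of the monitoring data.

```python
from collections import defaultdict


def aggregate_by_window(records, window_seconds=600):
    """Sum transferred bytes and count records per (site, window start).

    `records` is an iterable of dicts with the hypothetical keys
    'site', 'timestamp' (epoch seconds) and 'bytes'.
    """
    totals = defaultdict(lambda: {"bytes": 0, "count": 0})
    for rec in records:
        # Align the timestamp to the start of its window.
        window_start = rec["timestamp"] - rec["timestamp"] % window_seconds
        bucket = totals[(rec["site"], window_start)]
        bucket["bytes"] += rec["bytes"]
        bucket["count"] += 1
    return dict(totals)


# Hypothetical sample records, for illustration only.
sample = [
    {"site": "SITE_A", "timestamp": 1000, "bytes": 50},
    {"site": "SITE_A", "timestamp": 1100, "bytes": 70},
    {"site": "SITE_B", "timestamp": 1250, "bytes": 10},
]
result = aggregate_by_window(sample)
```

In a streaming setup the same grouping key (site plus window start) would typically be expressed with the windowing primitives of the processing engine instead of a dictionary held in memory.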

IPv6 and Monitoring for WLCG
We are confident that there are no major issues:
- No major changes vs. the check in 2013
- Evolution to the new architecture will take IPv6 into account
- Using mainstream technologies, very little code of our own
Data sources
- Relies on external systems providing monitoring data
- Depends on data provided by FTS, Rucio, Panda, CRAB3, Xrootd, etc.
- MonALISA is external, used by ALICE and other projects (tested by ML devs)
Transport
- Receives data via AMQ/Stomp, Flume, UDP, databases and HTTP sources
- It is a matter of staying up to date with IPv6-ready versions
[Diagram: Data Sources column of the unified architecture (Metrics Manager, Lemon Agent, XSLS, ATLAS Rucio, FTS Servers, DPM Servers, XROOTD Servers, CRAB2, CRAB3, WM Agent, Farmout, Grid Control, CMS Connect, PANDA WMS, ProdSys, Nagios, VOFeed, OIM, GOCDB, REBUS)]
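One way to review whether the endpoints that feed the transport layer are reachable over IPv6 is to look at the address families their names resolve to. The sketch below is an assumption about how such a check could be written with Python's standard `socket` module; the function name and the injectable `resolver` parameter are hypothetical and not part of the monitoring code.

```python
import socket


def address_families(host, port, resolver=socket.getaddrinfo):
    """Return the set of address families ('IPv4', 'IPv6') a host resolves to.

    `resolver` defaults to socket.getaddrinfo and can be replaced
    (for example in tests) by any callable with the same interface.
    """
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    try:
        infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name could not be resolved at all.
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr).
    return {names[info[0]] for info in infos if info[0] in names}
```

A host that returns only `{"IPv4"}` has no AAAA record visible from the machine running the check; an empty set means the name did not resolve.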

IPv6 and Monitoring for WLCG
Storage
- Storing mostly the host names only, as strings
- In a few cases the current Experiment Dashboards may store the WN IP; this will be fixed in the migration
- ElasticSearch has an IPv4 data type, but not IPv6 at the moment. Will come.
Display and reports
- Only showing IPv4 and IPv6 hosts, names as strings
- Web applications can easily be made reachable by IPv6 nodes, but actions will be needed (just like for any other web server)
[Diagram: Transport (Flume, AMQ, Kafka), Processing & Aggregation (Spark, Hadoop Jobs, GNI, Other), Storage & Search (Hadoop HDFS, ElasticSearch, Other), and Display & Reports (Kibana, Jupyter, Zeppelin, Other) columns of the unified architecture]
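Where a worker node address does end up stored as a string, a canonical textual form avoids the same IPv6 address appearing under several spellings. The following is a minimal sketch using Python's standard `ipaddress` module; the function name and the returned fields are hypothetical, and the addresses come from the documentation ranges.

```python
import ipaddress


def describe_endpoint(value):
    """Classify a stored string as an IPv4 address, IPv6 address, or host name.

    IP addresses are returned in their compressed canonical form;
    anything that does not parse as an address is treated as a host name.
    """
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        return {"kind": "hostname", "value": value.lower()}
    return {"kind": "ipv%d" % addr.version, "value": addr.compressed}


v6 = describe_endpoint("2001:0DB8:0000:0000:0000:0000:0000:0001")
v4 = describe_endpoint("192.0.2.10")
name = describe_endpoint("WN01.Example.org")
```

Normalising before storage means that string comparison and grouping behave the same for IPv4 addresses, IPv6 addresses, and host names.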

Plans
Unified architecture and technologies
- Focus on migrating to common architecture
- Review the existing architecture, areas and data
Update to new technologies in several areas
- Better performance and new versions with new features and major improvements
- Look into technologies as needed (collectd, Kafka, Grafana, etc.)
- Benefit from experience and feedback received from Experiments, WLCG and IT groups
Move to central services
- Central service for ES is being created, InfluxDB for time series DBoD
- Continue to use central Hadoop services
Continue with standard operations and upgrades
- At least for all of 2016
- Make available the new monitoring platform, in parallel with the existing ones

Conclusions
- No major changes vs. what was reported in 2013
- Mainstream technologies benefit from community effort and/or official support for IPv6 readiness
- Evolution to the new architecture is used to review the whole monitoring data; IPv6 is one of the reviews we will do
- No specific issues for monitoring detected

Demo
Data in ElasticSearch
- FTS and Xrootd transfers data
Examples of Dashboards
- Data discovery and error investigation
- Specific views for specific tasks: VO overview, Site manager

Data Centres Monitoring

Monitoring Technologies

Area            | Services and Components | Technologies            | Functions
Data Collectors | Metric Manager          | CERN                    | Metrics registration
                | Lemon Agent             | CERN+Flume              | Metrics producers (about 15000)
Transport       | Gateway                 | Flume                   | Transport host metrics
                | XSLS                    | -                       | Service metrics
                | Messaging               | Active MQ               | Messaging of metrics
                | River                   | Kafka                   | Streaming of metrics
Aggregation     | Foz                     | Spark                   | Processing streamed metrics
Archive         | HDFS                    | Hadoop                  | Long term storage
Displays        | Meter                   | ElasticSearch + Kibana  | Dashboard for metrics
                | Timber                  | ElasticSearch + Kibana  | Dashboard for logs
                | Meter Proxy             | CLI and HTTP            | Interface to ES
Alerts          | GNI                     | -                       | Alarms handling