Published by Lynne Johnston. Modified over 8 years ago.
1
TSA1.4 Infrastructure for Grid Management
Tiziana Ferrari, EGI.eu
EGI-InSPIRE – SA1 Kickoff Meeting
2
Goal
The purpose of this task is to deploy the infrastructure for Grid management: a set of services and tools that the NGI/EIRO Operations Centres need, regionally and/or centrally, to run the Grid software services, to monitor the Grid (including SLA and security monitoring), and to carry out ongoing Grid management.
3
Internal O-N and O-E tasks
O-E-1 GOCDB: 0.5 FTE UK
O-E-3 Monitoring infrastructure: 0.25 FTE CERN, 0.25 FTE GRNET
O-E-4 Operations portal and dashboard: 0.25 FTE FR
O-E-12 Tools for network troubleshooting and monitoring: 0.25 FTE IT
O-N-1 Grid topology database
O-N-3 Grid repositories (for operational tools)
O-N-4 Operations portal and dashboard
4
O-E-1 GOCDB deployment: Current situation
[Diagram labels: CENTRAL: GOCDB4, WSGUI, GOCDB module; REGION / NGI: local users; GOCDB3, WSGUI; central users, EGI tools, central tools; access modes: Read/Write, Read only; interfaces: GOCDBPI_v4, GOCDBPI]
Courtesy of G.Mathieu
5
GOCDB deployment: Target situation
[Diagram labels: CENTRAL: GOCDB4, WSGUI, GOCDB module; REGION / NGI: local users, INPUT, GOCDB4, WSGUI, GOCDB module; central users, EGI tools, central tools; access modes: Read/Write, Read only; interface: GOCDBPI_v4]
Release timeline: first half of July, provided the release is well planned and well announced. The accounting portal still relies on GOCDB3.
6
O-E-3 Monitoring
Validation of Nagios instances
–Nagios migrated on May 26th: ROC Italy, ROC UKI, NGI_Greece
–Nagios migrated on June 1st: ROC Central Europe, ROC IGALC, ROC Latin America, ROC South Western Europe
Remaining instances will be migrated during June:
–ROC: AP, Canada, France, Germany/Switzerland, NE, Russia, SEE
–NGI: NGI_PL, NGI_France, NGI_BY, NGI_SK, NGI_SI, NGI_HR, NGI_CZ (for now running on CERN Nagios instances)
Courtesy of J.Casey, D.Collados
7
O-E-3 Monitoring (cont.)
Nagios-based availability/reliability reports compared to SAM reports
–Statistics are comparable (Nagios shows a small improvement because of its design)
SAM
–Proposed switch-off date: June 15
MyEGI portal deployment model:
–Central project instance (CERN) + NGI instances
Monitoring of monitoring
–https://ops-monitor.cern.ch/nagios/
–Feedback and ideas requested for more services/probes to deploy (some input received from the ENOC)
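The slides compare availability/reliability figures without stating how they are computed. The sketch below assumes the definitions commonly used for Grid site reports: availability is uptime divided by the time with known status, and reliability additionally excludes scheduled downtime. The `StatusHours` type and the sample hours are hypothetical and serve only to illustrate the arithmetic; this is not the actual Nagios or SAM report code.

```python
from dataclasses import dataclass


@dataclass
class StatusHours:
    """Hours a service spent in each state during a reporting period (hypothetical type)."""
    up: float
    down_unscheduled: float
    down_scheduled: float
    unknown: float

    @property
    def total(self) -> float:
        return self.up + self.down_unscheduled + self.down_scheduled + self.unknown


def availability(h: StatusHours) -> float:
    """Assumed definition: uptime / (total time - time with unknown status)."""
    known = h.total - h.unknown
    return h.up / known if known > 0 else 0.0


def reliability(h: StatusHours) -> float:
    """Assumed definition: uptime / (total - unknown - scheduled downtime)."""
    expected = h.total - h.unknown - h.down_scheduled
    return h.up / expected if expected > 0 else 0.0


# Invented sample: a 720-hour month with 10 h scheduled and 10 h unscheduled downtime.
month = StatusHours(up=700.0, down_unscheduled=10.0, down_scheduled=10.0, unknown=0.0)
print(f"availability={availability(month):.4f} reliability={reliability(month):.4f}")
```

Running both monitoring systems over the same period and comparing these two ratios per site is one way to judge whether their statistics are comparable.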
8
O-E-3 Monitoring: Central DBs status
Central Oracle DBs currently deployed at CERN:
–Aggregated Topology Provider (ATP)
–Metric Description Database (MDDB)
–Metric Results Store (MRS)
Evolution during Y1:
–Improve profile management in MDDB
–Implement history functionality in ATP
–Integrate and deploy the three DBs in a single account
–Maintenance and bug fixing
9
O-E-3 Monitoring: Messaging
Currently: 3 sites with brokers + 1 broker for APEL accounting
Y1 evolution:
–The general broker network aims to support authorization as required by APEL
–APEL will migrate to the general broker network once that has been achieved
–Until then, APEL will run one or more brokers of its own; the number depends on STFC's view of the risk of a single point of failure
10
O-E-4 Operations portal and dashboard
Two central web applications:
–Historical portal: http://cic.gridops.org
–Recent portal: http://operations-portal.in2p3.fr, hosting the Operations Dashboard module
This module will be offered as a regional package on June 8th
Other features will be migrated progressively to the new portal and integrated step by step into the regional package
Courtesy of C. L’Orphelin
11
O-E-4 Central Instance of the dashboard: Architecture
12
O-E-4 Availability and failover
High availability context:
–Each Lavoisier configuration is copied to SVN
–The MySQL database is backed up; restoring the backup takes 30 min
–The web machine is hosted in a cluster
No automatic failover yet. The DNS switch and data replication will be studied during the first year.
The central instance could be used if the regional instances run into problems.
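The fallback idea on this slide (use the central instance when a regional one is in trouble) can be sketched as an ordered preference list with a health probe. This is only an illustration under stated assumptions: `pick_instance` and the regional URL are hypothetical, the probe is injected by the caller, and the slide itself notes that no automatic failover exists yet.

```python
from typing import Callable, Optional, Sequence


def pick_instance(candidates: Sequence[str],
                  is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that the probe reports as healthy, else None.

    Candidates are listed in order of preference, e.g. the regional
    instance first and the central instance as the fallback.
    """
    for url in candidates:
        try:
            if is_healthy(url):
                return url
        except OSError:
            # A probe that fails with a network error counts as unhealthy.
            continue
    return None


REGIONAL = "https://dashboard.ngi.example"     # hypothetical regional instance
CENTRAL = "http://operations-portal.in2p3.fr"  # central portal named in the slides

# Simulate a regional outage with a stub probe that only accepts the central instance.
chosen = pick_instance([REGIONAL, CENTRAL], lambda url: url == CENTRAL)
print(chosen)
```

Keeping the probe as a parameter leaves open how health is actually determined, which the slides defer to the first-year study.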
13
O-E-4: Migration plans
Migration of the remaining key features to Symfony and the new portal:
–VO ID Card
–Broadcast tool
–User tracking
–VO / Sites resources browser
Where possible, offer these features as regional modules
14
O-E-12 Network tools
DownCollector
Polling tool reporting on the reachability of GOCDB services (tests on TCP ports)
Central server running the probes, star-based architecture
EGEE-III instance https://ccenoc.in2p3.fr/DownCollector/ migrated to GARR (Italy): https://perfsonarlitetss.dir.garr.it/DownCollector/
–Will be accessible through a new portal dedicated to the O-E-12 task, to be set up at http://eginet.garr.it
High availability currently not available (to be defined in Y1)
Originally developed by IN2P3 CC-Lyon (EGEE SA2)
Courtesy of M.Reale
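A TCP-port reachability test of the kind described for DownCollector can be sketched with the Python standard library. This is a minimal illustration, not DownCollector's code: the function names, the timeout, and the target list are assumptions, and in the star-based layout a single central host would run `poll` against all services.

```python
import socket
from typing import Dict, Iterable, Tuple


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def poll(targets: Iterable[Tuple[str, int]],
         timeout: float = 3.0) -> Dict[Tuple[str, int], bool]:
    """Probe every (host, port) pair once from this central host."""
    return {(host, port): tcp_port_open(host, port, timeout) for host, port in targets}


if __name__ == "__main__":
    # Hypothetical target list; a real deployment would take it from the topology database.
    for target, reachable in poll([("localhost", 443)], timeout=1.0).items():
        print(target, "reachable" if reachable else "unreachable")
```

A successful connection only shows that the port accepts TCP connections; it says nothing about whether the service behind it works correctly.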
15
O-E-12 Network troubleshooting
perfSONAR-lite TroubleShooting Services
Started in EGEE-III, entirely designed by SA2
Development led by DFN/Erlangen
Central server orchestrating on-demand end-to-end measurements between light probes hosted by Grid sites:
–Bandwidth measurements
–DNS lookup
–Traceroute
–Port testing
–Ping
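As an illustration of one measurement type in the list above, the sketch below performs a timed DNS lookup with the Python standard library. It is not taken from perfSONAR-lite TSS; the function name and return shape are assumptions, and a real probe would run on a Grid site and report the result back to the central server.

```python
import socket
import time
from typing import List, Tuple


def dns_lookup(hostname: str) -> Tuple[List[str], float]:
    """Resolve hostname and return (sorted unique addresses, elapsed seconds).

    An empty address list means the name could not be resolved.
    """
    start = time.perf_counter()
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return [], time.perf_counter() - start
    elapsed = time.perf_counter() - start
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    addresses = sorted({info[4][0] for info in infos})
    return addresses, elapsed


if __name__ == "__main__":
    addrs, seconds = dns_lookup("localhost")
    print(addrs, f"{seconds * 1000:.1f} ms")
```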
16
O-E-12 perfSONAR-lite TSS
http://www.dfn.de/en/enhome/x-win/download-of-perfsonar-lite-tss/
17
O-E-12 perfSONAR-lite TSS: future
–An initial deployment strategy within EGI is required, with O-E-12 testing and deployment campaigns in the coming weeks
–Core development is needed to further improve the security of the available-bandwidth tests and to simplify AA
–DFN and CNRS are interested in being engaged in future development
18
Y1 Milestones and deliverables
MS401 Operational Tools regionalisation status (INFN), PM1, in collaboration with TSA1.5
Contribution to MSA406 “Deployment plan for the distribution of operational tools to the NGIs/EIROs” (see TSA1.3)
Contribution to MSA404 “Operational Level Agreements (OLAs)” (see TSA1.8)
19
Short/medium term issues
Migration to the final Nagios server layout, upgrade of the dashboard and gstat, phasing out of GOCDB3
Is the current failover/HA of the central operational tools sufficient?
Measurement of the availability/reliability of tools (central/regional MyEGI portals, dashboard, GGUS, regional helpdesk, central/regional monitoring infrastructure, ...)
Contribution to the definition of OLAs concerning tools