SEE-GRID-SCI
Antun Balaz, SA1 Leader, Institute of Physics Belgrade
National, Regional and World-wide Grid eInfrastructures
Regional Grid Training, University of Belgrade, June 2008
The SEE-GRID-SCI initiative is co-funded by the European Commission under the FP7 Research Infrastructures contract no

Overview
- AEGIS infrastructure
- SEE-GRID infrastructure
- SEE-GRID operations
- SEE-GRID Operational and monitoring tools
- SEE-GRID Service Level Agreement
- VO management
- EGEE infrastructure
- Other World-wide Grid infrastructures

AEGIS infrastructure (1)
- AEGIS01-PHY-SCL, Institute of Physics Belgrade: 704 CPUs, 25 TB
- AEGIS02-RCUB: 12 CPUs
- AEGIS03-ELEF-LEDA, Faculty of Electronic Engineering, Nis: 4 CPUs
- AEGIS04-KG, CSANU & University of Kragujevac: 42 CPUs, 0.85 TB
- AEGIS05-ETFBG: 28 CPUs
- AEGIS06-AOB, Astronomical Observatory Belgrade (retired)
- AEGIS07-PHY-ATLAS, Institute of Physics Belgrade: 128 CPUs
- All core services sufficient for the operation of our national VO AEGIS

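
As a quick cross-check of the figures on this slide, the per-site CPU counts can be summed. This is a minimal sketch: the dictionary name `aegis_cpus` is chosen here for illustration, and the retired AEGIS06-AOB site is left out because the slide lists no CPU count for it.

```python
# CPU counts as listed on the slide; AEGIS06-AOB is retired and has no count.
aegis_cpus = {
    "AEGIS01-PHY-SCL": 704,
    "AEGIS02-RCUB": 12,
    "AEGIS03-ELEF-LEDA": 4,
    "AEGIS04-KG": 42,
    "AEGIS05-ETFBG": 28,
    "AEGIS07-PHY-ATLAS": 128,
}

total_cpus = sum(aegis_cpus.values())
largest_site = max(aegis_cpus, key=aegis_cpus.get)
largest_share = aegis_cpus[largest_site] / total_cpus

print(f"Total AEGIS CPUs: {total_cpus}")
print(f"{largest_site} share of total: {largest_share:.1%}")
```

The listed sites add up to 918 CPUs, with AEGIS01-PHY-SCL contributing roughly three quarters of them.
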
AEGIS infrastructure (2)
- Core services:
  - AEGIS CA (UOB-RCUB)
  - AEGIS VOMS server (IPB)
  - BDII (IPB)
  - WMS (IPB)
  - LFC (IPB, UOB-RCUB)
- Redundant core services are the next step
- User support through SEE-GRID Helpdesk or through national mailing lists
- Site support through SEE-GRID Helpdesk, through national mailing lists, and in person
- Extensive guides on operations

AEGIS01-PHY-SCL at IPB (1)
AEGIS01-PHY-SCL at IPB (2)
SEE-GRID Infrastructure (1)
SEE-GRID Infrastructure (2)
- SEE-GRID infrastructure currently contains the following resources:
  - 35 sites in SEE-GRID production (31 at the end of Y1)
  - CPUs: 2200 total (1150 at the end of Y1)
  - Storage: TB (23.94 TB at the end of Y1)
- Typical machine configuration: dual or quad-core CPUs, with 1 GB of RAM per CPU core
- All sites on gLite-3, with 20 sites already on gLite-3.1 and the rest on gLite-3.0
- Scientific Linux used as the base OS (SL4 for gLite-3.1, SL3 for gLite-3.0 services), but others are also present (CentOS, Debian)
- New gLite services deployed:
  - glite-WMS/LB actively used, together with lcg-RB
  - gLite-3.1 lcg-CE tested and deployed on several sites
  - gLite-3.1 SE_dpm largely replaces the old SE_classic
- Experience and detailed guides on deploying natively compiled 64-bit gLite services: worker nodes and disk storage servers

SEE-GRID Infrastructure (3)
Figure: SEE-GRID total CPUs, May 2006 – April 2008 (from GStat)

SEE-GRID Infrastructure (4)
SEE-GRID Core services
- Catch-all Certification Authority
  - enables regional sites to obtain user and host certificates
- Virtual Organisation Management Service (VOMS), authorization system for the SEE-GRID Virtual Organisation (VO), supporting groups and roles
  - deployed two instances (master and slave) for failover
- Workload management service (lcg-RB and glite-WMSLB)
  - deployed several instances for failover
- Information Services (BDII)
  - deployed several instances for failover
- MyProxy is operational
  - supports certificate renewal
- FTS deployed
  - used in production

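
The point of deploying several instances of a core service is that a client can fall back to another instance when one is unreachable. The following is a minimal sketch of that idea only, not the mechanism gLite clients actually use: the function `query_with_failover`, the host names, and the stand-in `fake_query` are all hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def query_with_failover(endpoints: Sequence[str], query: Callable[[str], T]) -> T:
    """Try each endpoint in order and return the first successful result."""
    errors = []
    for endpoint in endpoints:
        try:
            return query(endpoint)
        except OSError as exc:  # connection refused, timeout, DNS failure, ...
            errors.append(f"{endpoint}: {exc}")
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))


# Hypothetical host names, used only for illustration.
BDII_ENDPOINTS = ["bdii-primary.example.org", "bdii-backup.example.org"]


def fake_query(endpoint: str) -> str:
    """Stand-in for a real lookup; pretends the primary instance is down."""
    if endpoint == "bdii-primary.example.org":
        raise ConnectionError("connection refused")
    return f"answer from {endpoint}"


result = query_with_failover(BDII_ENDPOINTS, fake_query)
print(result)
```

In this sketch the order of the endpoint list defines the preference, so a master instance would be listed first and a slave second.
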
SEE-GRID Infrastructure (5)
- As sites mature, they migrate to EGEE
  - Croatia, Turkey, Serbia, Romania
- However, this depends on agencies providing funding for the hardware
- Each participating institute has its own strategy

SEE-GRID Operations (1)
SEE-GRID Operations (2)
- Distributed Operations – currently one ROC
  - EGI: SEE ROC probably integrated with the SEE-GRID ROC
- Pilot SLA established
  - Monitoring and Accounting Tools
- Helpdesk ticket procedures
  - Generic support group for users, TPM-like (monitoring open tickets created by users, trying to solve the simple ones, routing the tickets, etc.)
  - Country-level user support groups
- Step towards stand-alone operations
  - Grid-Operator-On-Duty shifts introduced to improve site availabilities and resolve all operational issues
- SEEGRID Wiki with detailed information for site admins:
- VOMS Role=ops used for SAM job submission

SEE-GRID Operational & monitoring tools (1)
- HGSM
- HELP-DESK
- BDII
- R-GMA
- SAM
- GSTAT (Taiwan)
- VOMS
- RTM (UK)
- Google maps
- BBmSAM
- GridICE
- MonALISA
- NAGIOS
- WiatG
- Accounting

SEE-GRID Operational & monitoring tools (2)
Operational & monitoring tools deployment status
- Hierarchical Grid Site Management (HGSM) – Turkey
- Service Availability Monitoring (SAM) (+ porting to MySQL) – Bosnia and Herzegovina, with CERN support
- Helpdesk – Romania
- BBmSAM – Bosnia and Herzegovina
- GridICE – FYR of Macedonia
- SEE-GRID GoogleEarth – Turkey
- SEE-GRID GoogleMaps – Turkey
- Global Grid Information Monitoring System (GStat) – ASGC, Taiwan
- R-GMA and Accounting Portal – Bulgaria
- Nagios – Bulgaria
- Real Time Monitor (RTM) – UK and Turkey (HGSM)
- MONitoring Agents using a Large Integrated Services Architecture (MonALISA) – Romania
- What is at the Grid (WiatG) – CERN, with support from Serbia

SEE-GRID Operational & monitoring tools (3)
Integration status
- HGSM+SAM, HGSM+BBmSAM
  - Automatic creation of the list of sites to be tested
- HGSM+BDII
  - Automatic creation of the list of sites in the infrastructure
- HGSM+GStat
  - Automatic creation of the list of sites to be monitored
- HGSM+RTM, HGSM+R-GMA
  - Automatic creation of the lists of sites for monitoring and for accounting
- VOMS+Helpdesk
  - New user accounts created automatically on first access to the helpdesk
  - Certificate-based access to the helpdesk

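
The integrations listed on this slide all follow one pattern: a central site database is exported, and each tool derives its own site list from that export. A minimal sketch of the pattern follows. The record layout, field names and sample sites are hypothetical and do not reflect the actual HGSM export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteRecord:
    """One row of a hypothetical HGSM export (field names are assumptions)."""
    name: str
    country: str
    in_production: bool
    ce_host: str


# Illustrative records only; these are not real SEE-GRID sites.
HGSM_EXPORT = [
    SiteRecord("SITE-B", "Country-2", True, "ce.site-b.example.org"),
    SiteRecord("SITE-A", "Country-1", True, "ce.site-a.example.org"),
    SiteRecord("SITE-C", "Country-1", False, "ce.site-c.example.org"),
]


def sites_to_test(records):
    """Names of production sites, e.g. as input for SAM-style test submission."""
    return sorted(r.name for r in records if r.in_production)


def hosts_to_monitor(records):
    """Map each production site to the CE host a monitoring tool should watch."""
    return {r.name: r.ce_host for r in records if r.in_production}


test_list = sites_to_test(HGSM_EXPORT)
monitor_map = hosts_to_monitor(HGSM_EXPORT)
print(test_list)
print(monitor_map)
```

Because every list is derived from the same export, adding or retiring a site in the database is enough to update all dependent tools.
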
HGSM database
- SEE-GRID GOCDB
  - Introduced as a lightweight version of GOCDB
  - Allows us to easily change its format when necessary and to adapt it to regional needs
  - Allows us to provide custom exports on demand, depending on the needs of operational tools and application developers
- Contains static information about all sites
- Developed and maintained by TUBITAK-ULAKBIM, Turkey
- Used by EUMedGRID; other regional projects have expressed interest

BBmSAM & BBmobileSAM
- BBmSAM portal
  - Created for SLA monitoring
  - Generating site availability statistics according to several criteria
  - Overview (HTML) and full dump (CSV) of data possible
  - Extended into full SAM portal
    - Availability for last 24h period for all sites/services
    - Latest results per service
    - History for nodes/services
- BBmobileSAM
  - Optimized for small-screen devices and low bandwidth
  - Possible filtering of sites
  - Three levels of detail available

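
To illustrate what "availability for last 24h" and a CSV dump can look like in practice, here is a minimal sketch. It assumes availability is simply the fraction of passed tests within the window. The slide does not specify the criteria BBmSAM actually applies, and the record layout and site names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timedelta


def availability_last_24h(results, now):
    """Fraction of passed tests per site within the 24 hours before `now`.

    `results` is an iterable of (site, timestamp, passed) tuples.
    """
    cutoff = now - timedelta(hours=24)
    passed = defaultdict(int)
    total = defaultdict(int)
    for site, timestamp, ok in results:
        if cutoff <= timestamp <= now:
            total[site] += 1
            passed[site] += int(ok)
    return {site: passed[site] / total[site] for site in total}


def to_csv(availability):
    """Render the per-site availability as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["site", "availability"])
    for site in sorted(availability):
        writer.writerow([site, f"{availability[site]:.3f}"])
    return buffer.getvalue()


# Hypothetical test results, for illustration only.
now = datetime(2008, 6, 1, 12, 0)
results = [
    ("SITE-A", now - timedelta(hours=1), True),
    ("SITE-A", now - timedelta(hours=2), False),
    ("SITE-A", now - timedelta(hours=30), False),  # outside the window
    ("SITE-B", now - timedelta(hours=3), True),
]

availability = availability_last_24h(results, now)
csv_text = to_csv(availability)
print(csv_text)
```
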
SEE-GRID SLA
- Hardware and connectivity criteria
  - Min. amount of resources for sites to participate in the infrastructure
  - Network to fulfill operations test requirements
- Level of support
  - Site and security administrators availability and response time
- Level of expertise
  - Site and security administrators declaration of expertise
- VO support
  - Site to provide support to SEEGRID VO and its OPS role
- Conformance to Operational Metrics
  - Site availability
  - Downtimes
- SEE-GRID-2 SLA communicated to EGEE

Conformance to SEE-GRID SLA (1)
Figure: Availabilities of SEE-GRID CEs

Conformance to SEE-GRID SLA (2)
Figure: Weighted availabilities of SEE-GRID CEs

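
The slides do not state how the weighted availabilities are computed. One plausible reading, shown here purely as an assumption, is that each CE's availability is weighted by the number of CPUs behind it, so that large sites count for more than small ones. The function name and the sample numbers are hypothetical.

```python
def weighted_availability(sites):
    """Weighted mean of availabilities.

    `sites` is a list of (availability, weight) pairs; the weight is assumed
    here to be a CPU count, which the original slides do not confirm.
    """
    total_weight = sum(weight for _, weight in sites)
    if total_weight == 0:
        raise ValueError("total weight must be positive")
    return sum(avail * weight for avail, weight in sites) / total_weight


# Hypothetical example: a large, reliable site and a small, unreliable one.
sites = [(0.9, 700), (0.5, 100)]

plain_mean = sum(avail for avail, _ in sites) / len(sites)
weighted_mean = weighted_availability(sites)

print(f"plain mean:    {plain_mean:.3f}")
print(f"weighted mean: {weighted_mean:.3f}")
```

With these sample numbers the weighted figure is higher than the plain average, because the larger site dominates the result.
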
Conformance to SEE-GRID SLA (3)
Figure: Availabilities of SEE-GRID SEs

Conformance to SEE-GRID SLA (4)
Figure: Weighted availabilities of SEE-GRID SEs

VO Management
- Regional catch-all SEEGRID VO
  - Members from all participating institutes
  - Distributed VO management: all countries have VOMS admin representatives
- National VOs
  - Serbia (AEGIS VO)
  - Romania
  - Turkey
- Regional VO is supported on all sites
- Other regional discipline-oriented VOs will be created soon (SEE-GRID-SCI)
  - Seismology
  - Meteorology
  - Environmental sciences
  - etc.

EGEE infrastructure
- 250 sites in 50 countries
- 11 federations
- More than 55k CPUs
- Amount of available storage hard to specify, several thousands of PB
- All core services redundantly available
- Serbia is part of EGEE-SEE ROC
  - Provides 800+ CPUs, out of region’s 2900 CPUs
  - More details:
- Accounting for the last three months:
  - Serbia provides around 4% of all accounting in EGEE
  - EGEE-SEE provides around 9.5% of all accounting in EGEE
  - Serbia provides around 52% of all EGEE-SEE accounting (mostly to AEGIS, SEEGRID, SEE and ATLAS VO)

Other World-wide Grid infrastructures
- WLCG – World-wide LHC Computing Grid (EGEE subset, with firm commitments to LHC VOs)
- D-Grid
- TeraGrid
- OSG
- DEISA