1
SA1 – Infrastructure Operations Report
PSC06 Meeting, Istanbul, 7-8 December 2009
Antun Balaz, SA1 Leader, Institute of Physics Belgrade
The SEE-GRID-SCI initiative is co-funded by the European Commission under the FP7 Research Infrastructures contract no
2
Overview
SA1 objectives, metrics, activities
SA1 deliverables status
SA1 milestones status
Infrastructure development
Infrastructure management
Service Level Agreement
Infrastructure usage
Network link to Moldova status
Collaboration/Interoperation
Action points
3
SA1 objectives and metrics
Objective 2: Providing infrastructure for new communities
O2.1: Expand the current infrastructure
  MTSA1.1: Increase in the number of computing and storage resources (tables given in DoW)
O2.2: Inclusion of Armenia and Georgia
  MTSA1.2: Number of Grid sites and processing and storage resources (tables given in DoW)
O2.3: Achieve high reliability, availability and automation
  MTSA1.3: Increase of the average overall Grid site availability (M01 >= 70%, M12 >= 75%, M24 >= 81%)
  MTSA1.4: Number of successful jobs run as % of total jobs (M01 >= 50%, M12 >= 55%, M24 >= 60%)
  MTSA1.5: Number of management tools expanded or developed (+achieving tools integration and automation)
O2.4: Provision of the network link to Moldova
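The MTSA1.3 and MTSA1.4 targets can be checked mechanically. The following is a minimal sketch, not a project tool: it assumes that a target stays in force until the next milestone month, and the function names and the sample job counts are hypothetical.

```python
# Targets quoted on the slide, keyed by project month (M01, M12, M24).
AVAILABILITY_TARGETS = {1: 70.0, 12: 75.0, 24: 81.0}  # MTSA1.3, percent
JOB_SUCCESS_TARGETS = {1: 50.0, 12: 55.0, 24: 60.0}   # MTSA1.4, percent


def target_for_month(targets, month):
    """Return the target of the latest milestone at or before `month`.

    Assumption (not stated on the slide): a target remains in force
    until the next milestone month is reached.
    """
    reached = [m for m in targets if m <= month]
    if not reached:
        raise ValueError("no target defined at or before month %d" % month)
    return targets[max(reached)]


def job_success_rate(successful_jobs, total_jobs):
    """Successful jobs as a percentage of all jobs (MTSA1.4)."""
    if total_jobs <= 0:
        raise ValueError("total_jobs must be positive")
    return 100.0 * successful_jobs / total_jobs


def meets_target(value, targets, month):
    """True if a measured percentage reaches the applicable target."""
    return value >= target_for_month(targets, month)


if __name__ == "__main__":
    # Hypothetical job counts, for illustration only.
    rate = job_success_rate(successful_jobs=580, total_jobs=1000)
    print(rate, meets_target(rate, JOB_SUCCESS_TARGETS, month=14))
```

The same `meets_target` helper works for availability figures by passing `AVAILABILITY_TARGETS` instead.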
4
SA1 activities
SA1.1: Implementation of the advanced SEE-GRID infrastructure
  SA1.1.1: Expand the existing SEE-GRID infrastructure and deploy Grid middleware components and OS in SEE Resource Centers
  SA1.1.2: Operate the SEE-GRID infrastructure
  SA1.1.3: Deploy and Operate the core services for new VOs
  SA1.1.4: Catch-all CA and deployment and operational support for new and emerging Grid CAs
  SA1.1.5: Certify and migrate SEE-GRID sites from regional to global production-level eInfrastructure
SA1.2: Resource Centre SLA monitoring and enforcement
  SA1.2.1: SLA detailed specification, identification and deployment of operational tools relevant for SLA monitoring
  SA1.2.2: Monitoring, assessment and enforcement of RC conformance to SLA
SA1.3: Network Resource Provision
  SA1.3.1: Network resource provision and liaison with regional eInfrastructure networking projects
  SA1.3.2: Procurement of a link between Moldova and GEANT
5
SA1 deliverables status
DSA1.1a: Infrastructure Deployment Plan (M04) CERN, Editor: D. Stojiljkovic
DSA1.2: SLA detailed specification and related monitoring tools (M05) UOBL, Editor: M. Savic
DSA1.3a: Infrastructure overview and assessment (M12) UKIM, Editor: B. Jakimovski
DSA1.1b: Infrastructure Deployment Plan (M14) UOB-IPB, Editor: A. Balaz
DSA1.3b: Infrastructure overview and assessment (M23)
6
SA1 milestones status
MSA1.1: Infrastructure deployment plan defined (M04) CERN (verified by DSA1.1a)
MSA1.2: SLA structure and enforcement plan defined (M05) UoBL (verified by DSA1.2)
MSA1.3: Network link for Moldova established (M23) RENAM (verified by the operational link to MD and DSA1.3b)
MSA1.4: Infrastructure performance and usage assessed (M23) UKIM (verified by DSA1.3b)
7
Infrastructure development
History available at
8
Infrastructure expansion
The SEE-GRID-SCI infrastructure currently contains the following resources:
  Dedicated CPUs: around what we have promised
  Storage: drop due to TR-01-ULAKBIM withdrawal
  Commitments excel file
38 sites in SEE-GRID-SCI production (2 fewer than during PSC05)
Typical machine configuration: dual or quad-core CPUs, with 1GB of RAM per CPU core; many sites with 64-bit architecture
All sites on gLite-3.1 -> 3.2; Scientific Linux 4.x or 5.x used as a base OS, but others also present (CentOS, Debian)
Metric MTSA1.1 generally fulfilled
Armenia and Georgia have deployed new Grid sites and joined the SEE-GRID infrastructure – MTSA1.2
9
Core services (1)
Catch-all Certification Authority
  Enables regional sites to obtain user and host certificates
Virtual Organisation Management Service (VOMS)
  For each scientific community deployed in two instances for failover
  Supporting groups and roles
Workload management service (glite-WMS/LB) and Information Services (BDII)
  For each scientific community deployed in several instances for failover
Logical File Catalogue (LFC)
MyProxy
  Supports certificate renewal for all deployed WMS/RB services
File Transfer Service (FTS)
  Used in production
Relational Grid Monitoring Architecture (R-GMA), Registry and Schema
SEE-GRID accounting publisher, with support for MPI jobs accounting
AMGA Metadata Catalogue
10
Core services (2)
No core services in:
  Albania
  Montenegro
  Moldova
  Georgia
11
Infrastructure management
12
Grid operations
Convergence in procedures with EGEE-SEE in (extended) region
Monitoring switched to new core services, to the new VO
Migration to SL5/gLite-3.2
MPI issues
Deployment of sites
  RO missing 1 site
  AL missing 5 sites
  MD missing 1 site
  BA missing 1 site
  ME missing 1 site
Excellent progress in AM
13
Migration to gLite-3.2
gLite-UI 3.2 x86_64 under SL5 – tested by IPB
  Many issues relevant for its usage are solved
  Tested with all types of Grid jobs
  Recommended for installation
gLite-WN 3.2 x86_64 under SL5 – tested by IPB
  Missing packages are provided at SCL Repository service (MPI packages, the latest version of Torque client)
gLite-BDII 3.2 x86_64 under SL5 – tested by IPB
  Significant improvement in performance of this service
gLite-CREAM 3.2 x86_64 under SL5 – tested by IPB
  Improvement in performance
  Recommended for installation (lcg-CE still mandatory)
14
SEE-GRID-SCI MPI Support
gLite MPI Admin Guide
  Proper installation and configuration of parallel (MPI) jobs on gLite-based Grid infrastructure
MPI related RPMs (i386 and x86_64)
  MPICH (mpich-1.2.7p1)
  MPICH2 (mpich p1)
  OPENMPI (openmpi-1.2.5)
  MPIEXEC (mpiexec-0.82)
  MPI-START (i2g-mpi-start )
gLite MPI User Guide
  Information for users on how to run parallel (MPI) jobs on gLite-based Grid infrastructures
BBmSAM MPI tests submission (currently non-critical)
15
Operational/monitoring tools (1)
Hierarchical Grid Site Management (HGSM) (+interface to GOCDB) – Turkey
BBmSAM Service Availability Monitoring + extensions – Bosnia and Herzegovina with Serbia support
Helpdesk + NMTT (+ interoperation with EGEE-SEE and GGUS + integration with Nagios) – Romania with CERN support
SEE-GRID GoogleEarth – Turkey + ic.ac.uk
Global Grid Information Monitoring System (GStat) – ASGC, Taiwan
R-GMA and Accounting Portal – Bulgaria
Nagios – Bulgaria
Real Time Monitor (RTM) – ic.ac.uk and Turkey (HGSM)
MONitoring Agents using a Large Integrated Services Architecture (MonALISA) – Romania
WatG Browser – Serbia
WMSMon tool – Serbia
Pakiti – Greece
GSSVA (security-enabled Pakiti extension) – SZTAKI
SEE-GRID Wiki with detailed information for site administrators
16
Operational/monitoring tools (2)
Static Database: HGSM
  Static database containing all relevant data about all SEE-GRID-SCI sites
  Synchronized with the real situation
Monitoring: BBmSAM
  Portal that provides access to the database of SAM test results
  Central tool for identification of operational problems
  Provides SLA metrics
17
Operational/monitoring tools (3)
GStat
  Central tool for monitoring the information system of the SEE-GRID-SCI infrastructure
Nagios
  Collection of alarms raised by various tools
  In the future, automatic creation of Helpdesk tickets will be implemented
Pakiti
  Helps system administrators keep multiple machines up to date and prevents unpatched machines from remaining unnoticed on the network
GSSVA (JRA1)
18
Operational/monitoring tools (4)
WatG Browser
  Web-based Grid Information System visualization application
  Detailed overview of the status and availability of various Grid resources
  Queries and presents data obtained from gLite-based e-Infrastructure at different layers
WMSMon
  Aggregated and detailed status view of all monitored WMS services
  Links to the appropriate troubleshooting guides
Real Time Monitor
  Using satellite imagery from NASA, these clients display the SEE-GRID-SCI infrastructure as it is geographically spread over the region
GridICE
Googlemap
MonALISA
19
Operational/monitoring tools (5)
Helpdesk: OneOrZero
  Central reference point for tracking of all operational and user problems
  Identified problems are reported through the Helpdesk and assigned to the appropriate supporter
NMTT (JRA1)
Accounting portal
  Collects the accounting data from all SEE-GRID-SCI sites through APEL
  MPI-enabled accounting publisher developed by the project
  Provides aggregated accounting data by site, country, institution, application
Operations wiki
20
Operational Software Repository
The SEE-GRID-SCI Operational Software Repository provides RPMs produced within SEE-GRID-SCI and a collection of mirrors of interest to SEE-GRID-SCI operations
Repository:
Repository documentation:
Current list of mirrors relevant for SEE-GRID-SCI operations
  Scientific Linux 4.x
  Scientific Linux 5.x
  LCG-CAs
  gLite 3.1
  gLite 3.2
  JPackages 1.7
  JPackages 5.0
  DAG
21
Service Level Agreement
Sites need to conform to SEE-GRID-SCI SLA availability and reliability criteria
Monitoring done automatically by the BBmSAM portal
New SLA defined (80% availability goal)
SLA Enforcement Team (SET) was established to monitor the conformance of sites to SLA
Sites that fully conform to the SLA availability (> 80%) are upgraded to the new status in HGSM: seegrid_certified
Sites that do not conform are un-certified
SLA Q6 excel file
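The certification rule can be illustrated with a short sketch. It is not the BBmSAM or HGSM implementation: computing availability as the share of passed SAM tests is an assumption, and the `uncertified` label and the site names are hypothetical. Only the 80% threshold and the `seegrid_certified` status come from the slide.

```python
SLA_AVAILABILITY_THRESHOLD = 80.0  # percent, as stated on the slide

CERTIFIED = "seegrid_certified"    # HGSM status named on the slide
UNCERTIFIED = "uncertified"        # hypothetical label for non-conforming sites


def availability(passed_tests, total_tests):
    """Availability in percent, assumed here to be passed / total SAM tests."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return 100.0 * passed_tests / total_tests


def sla_status(availability_percent):
    """Apply the slide's rule: strictly above 80% means certified."""
    if availability_percent > SLA_AVAILABILITY_THRESHOLD:
        return CERTIFIED
    return UNCERTIFIED


def classify_sites(test_counts):
    """Map each site name to its SLA status.

    `test_counts` maps a site name to a (passed, total) tuple.
    """
    return {
        site: sla_status(availability(passed, total))
        for site, (passed, total) in test_counts.items()
    }


if __name__ == "__main__":
    # Hypothetical sites and test counts, for illustration only.
    print(classify_sites({"SITE-A": (95, 100), "SITE-B": (80, 100)}))
```

A site at exactly 80% is treated as non-conforming here because the slide states the criterion as "> 80%".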
22
Infrastructure usage (1)
23
Infrastructure usage (2)
24
Infrastructure usage (3)
25
Infrastructure usage (4)
26
Infrastructure usage (5)
27
Infrastructure usage (6)
28
Infrastructure usage (7)
Overall accounting: 20.6M Base CPU hours
Accounting excel file
SEE-GRID-SCI supported applications: 44k Base CPU hours
29
Network link to Moldova status
Two stages
  Upgrade of the existing radio-relay link Chisinau-Iasi: this is the approach actually implemented, which restricts prospective growth; the operational connection to RoEduNet currently has 2x155 Mbps capacity
  Provision of the direct Dark Fiber link Chisinau-Iasi; contract signed in February 2009
In November 2008 an updated proposal was submitted to NATO for co-funding of the Dark Fiber link
NATO Science for Peace Committee authorities positively evaluated the updated proposal, and in February 2009 the NATO project co-director received confirmation that the revised proposal was accepted
Update?
30
Collaboration/Interoperation (1)
NA4: support of discipline VOs (core services and resources) and apps
JRA1: developing and deploying OTs
NA3: providing training infrastructure (core services and resources)
NA2: inputs to/implementation of policy documents
Infrastructure fully interoperable with EGEE and a number of other regional Grid infrastructures
Active participation in EGEE Operations Automation Team (OAT)
  Joint work and development of Nagios solutions for Grid resources
  Interoperation of HGSM and GOCDB; regionalization and testing of GOCDB
  Testing of GStat 2.0
Grid-Operator-on-Duty experiences communicated to EGEE
  Basis for regionalization of COD
Sharing of tools with other projects/infrastructures: WMSMON, WatG
Collaboration with the EDGeS project on establishing interoperability with infrastructures based on desktop Grids
31
Collaboration/Interoperation (2)
32
Collaboration/Interoperation (3)
33
Collaboration/Interoperation (4)
34
IPB publications on Grid operational/monitoring tools
D. Vudragovic, A. Balaz, V. Slavnic, A. Belic, "DWARF - The framework for authorized YUM/APT repositories management", Proceedings of the INFOTEH 2009 Conference, Jahorina, Bosnia and Herzegovina, E-V-8, (2009)
D. Vudragovic, V. Slavnic, A. Balaz, A. Belic, "WMSMON – gLite-WMS Monitoring Tool", Proceedings of the MIPRO 2009 Conference, Opatija, Croatia, GSV-02, (2009)
D. Vudragovic, A. Balaz, V. Slavnic, A. Belic, "Serbian Participation in Grid Computing Projects", Proceedings of the NEC2009 Conference, Varna, Bulgaria (2009)
D. Vudragovic, J. Simonovic, A. Balaz, A. Belic, "WMSMon – gLite-WMS/LB Monitoring Tool", EGEE09 Conference, Barcelona, Spain (2009)
D. Vudragovic, J. Simonovic, A. Balaz, A. Belic, "WatG Browser – Grid Information System Browser", EGEE09 Conference, Barcelona, Spain (2009)
V. Slavnic, B. Ackovic, D. Vudragovic, A. Balaz, A. Belic, "Operational Grid Tools Developed at SCL", Proceedings of the SEE-GRID-SCI User Forum 2009, Istanbul, Turkey (2009)
V. Slavnic, B. Ackovic, D. Vudragovic, A. Balaz, A. Belic, M. Savic, "Grid Site Monitoring Tools Developed and Used at SCL", Proceedings of the SEE-GRID-SCI User Forum 2009, Istanbul, Turkey (2009)
IPB SEE-GRID-SCI Related Publications /28
35
Other IPB publications in which SEE-GRID-SCI is acknowledged (1/2)
I. Vidanovic, A. Balaz, A. Bogojevic, A. Pelster, "Calculation of Tc of 87Rb BEC using High-order Effective Actions", Book of Abstracts of the "Quo Vadis BEC?" Conference, Bad Honnef, Germany, P2 (2008)
A. Balaz, I. Vidanovic, A. Bogojevic, A. Pelster, "Path Integrals Without Integrals", Proceedings of the DPG-2008 Conference, Berlin, Germany, Abstract DY (2008)
A. Balaz, A. Bogojevic, I. Vidanovic, A. Pelster, "Recursive Schrodinger equation approach to faster converging path integrals", Phys. Rev. E 79, (2009)
I. Vidanovic, A. Bogojevic, A. Belic, "Properties of Quantum Systems via Diagonalization of Transition Amplitudes I: Discretization Effects", Phys. Rev. E 80 (2009)
I. Vidanovic, A. Bogojevic, A. Balaz, A. Belic, "Properties of Quantum Systems via Diagonalization of Transition Amplitudes II: Systematic Improvements of Short-time Propagation", Phys. Rev. E 80 (2009)
J. Živkovic, M. Mitrovic, B. Tadic, "Correlation Patterns in Gene Expressions Along the Cell Cycle of Yeast", Studies in Computational Intelligence (2009)
36
Other IPB publications in which SEE-GRID-SCI is acknowledged (2/2)
V. Slavnic, A. Balaz, D. Stojiljkovic, A. Belic, A. Bogojevic, "SPEEDUP - Optimization and Porting of Path Integral MC Code to New Computing Architectures", Proceedings of the NEC2009 Conference, Varna, Bulgaria (2009)
A. Balaz, I. Vidanovic, A. Bogojevic, A. Pelster, "Short-time Effective Action Approach for Numerical Studies of Rotating Ideal BECs", Book of Abstracts of the Conference on Research Frontiers in Ultra-Cold Atoms, International Centre for Theoretical Physics, Trieste, Italy (2009)
A. Balaz, I. Vidanovic, A. Bogojevic, A. Pelster, "Ultrafast Converging Path Integral Approach for Rotating Ideal Bose Gases", Proceedings of the DPG-2009 Conference, Dresden, Germany, Abstract DY-1.4 (2009)
B. Novakovic, A. Balaz, Z. Kneževic, M. Potocnik, "Computation of asteroid proper elements on the Grid", Serb. Astron. J. 179 (2009)
V. Slavnic, A. Balaz, D. Stojiljkovic, A. Belic, A. Bogojevic, "Optimization and Porting of the Path Integral Monte Carlo SPEEDUP Code to New Computing Architectures", Proceedings of the SEE-GRID-SCI User Forum 2009, Istanbul, Turkey (2009)
37
Action points (1)
AP42: HGSM interface with GOCDB (Hakan, ongoing)
AP43: Helpdesk statistics (AlexS, 15 Sep 08 -> 15 Sep 09)
AP49: Nagios – integration of all alarms; “CIC dashboard” (Emanouil, 30 Jun 08 -> 15 Sep 09)
AP54: Wiki reorganization (Boro, 30 Jun 08 -> 15 Sep 09)
AP98: WMS monitoring supported on all WMS+LBs (Dusan, site admins, 15 Sep 09 -> 30 Dec 09)
AP163: GOOD Templates functionality in Nagios to be implemented (Emanouil, 15 Sep 09)
AP214: Define logical DNS names for SA1 services (Antun, 15 Sep 09 -> 30 Dec 09)
AP221: Deployment of additional core services for discipline VOs per country (Antun, site admins, 15 Sep 09 -> 30 Dec 09) [Closed, migrated to AP254]
AP225: SA1 tools and procedures should be presented to other projects (Antun, ongoing)
AP226: All countries are encouraged to install their own set of operational tools (GIMs, Antun, Ioannis, ongoing)
AP246: All sites to install VOMS certificates (site admins, 15 Oct 09)
AP247: Accounting portal to produce reliable data (Emanouil, 15 Oct 09)
AP248: BBmSAM to produce full SLA reports (Mihajlo, 15 Oct 09)
38
Action points (2)
AP249: Migration of 64-bit WNs to SL5/gLite-3.2 (Dusan, site admins, 15 Dec 09)
AP250: Re-certification of bad sites (SET, GIMs, 15 Oct 09)
AP251: Send list of wrongly configured WMS (Emanouil, 15 Sep 09)
AP252: JRA1 tools to be deployed in SA1 (Antun, Branko, 30 Oct 09)
AP253: Wiki pages reorganization and information tools to be checked for consistency (Antun, Boro, 30 Oct 09)
AP254: All countries install core services (GIMs, 30 Oct 09)