Polish Infrastructure for Supporting Computational Science in the European Research Space EUROPEAN UNION Operations in PL-Grid M. Radecki, T. Szepieniec,

Polish Infrastructure for Supporting Computational Science in the European Research Space
EUROPEAN UNION
Operations in PL-Grid
M. Radecki, T. Szepieniec, M. Krakowian, T. Szymocha, M. Zdybek, D. Harezlak, and J. Andrzejewski
ACC CYFRONET AGH
Cracow Grid Workshop, Cracow,

2 Outline
- Goal of Grid Operations
- PL-Grid services for users
  - User registration and account management – PL-Grid Portal
  - Incident reporting
  - Usage monitoring
- PL-Grid services for Polish NGI
  - service availability monitoring
  - grid usage accounting
  - issue tracking
- High level view on EGI, NGI and PL-Grid Operations
- Incident Management in PL-Grid
- Grid Infrastructure Monitoring
- Operations Communication and Documentation

3 Goal of PL-Grid Operations
- to coordinate and carry out the activities and processes required to provide and manage services for PL-Grid users
- to manage the technology required to provide and support these services

4 PL-Grid infrastructure services
- Services for users
  - access to computing power and storage space in the 5 largest Polish computing centers
  - scientific software (e.g. Gaussian, Fluent, Povray)
  - user account management system
  - facilities to report problems and service requests
  - resource usage monitoring system
  - application portals and other tools for users (coming soon)
- As the Polish NGI, PL-Grid is obliged to provide some services interfaced to EGI
  - service availability monitoring system
  - issue tracking and user support system
  - accounting (resource usage) system

5 User account management
- Motivation: it is necessary to determine whether a user is entitled to use PL-Grid resources
- The registration process confirms that the user is a researcher affiliated with a Polish research unit, or a ward (undergraduate or PhD student) authorized by a supervisor
- Registration must be available to the user on-line
- Implementation: PL-Grid Portal based on the Liferay engine
  - Successful user registration results in a Portal account, the PL-Grid "entry point" for the user
  - Easily extended with new functionality using JSR 286 portlets
  - Ability to reuse Liferay's rich component library, e.g. forum, wiki
- PL-Grid-specific features
  - Easy personal certificate access: the user can obtain an X.509 certificate on-line, with its scope limited to PL-Grid services only
  - User account data integrated with PL-Grid tools and services; the user login is used by services that allow login/password authentication/authorization
  - Broadcast tool to contact all users
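The eligibility rule described above can be sketched as a small check. This is a minimal illustration, assuming a hypothetical `Applicant` record and assuming that a ward's supervisor must already be a registered researcher; the slides do not describe the PL-Grid Portal's actual data model or validation logic.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Applicant:
    # Hypothetical fields, for illustration only
    login: str
    role: str                            # "researcher", "undergraduate" or "phd_student"
    polish_research_unit: Optional[str]  # affiliation, if any
    supervisor_login: Optional[str] = None  # expected for wards

# Roles that count as wards and therefore need a supervisor's authorization
WARD_ROLES = {"undergraduate", "phd_student"}

def is_entitled(applicant: Applicant, registered_researchers: set[str]) -> bool:
    """Return True if the applicant may be granted a PL-Grid account (sketch)."""
    if applicant.role == "researcher":
        # Researchers must be affiliated with a Polish research unit
        return bool(applicant.polish_research_unit)
    if applicant.role in WARD_ROLES:
        # Assumption: the supervisor must already hold a researcher account
        return applicant.supervisor_login in registered_researchers
    return False
```

Keeping the rule as a pure function makes it easy to re-run when a supervisor's own account status changes.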

6 User account management – 1st year experiences
- PL-Grid user registration opened at last year's CGW
- PL-Grid Portal technology changed from Java Spring, through Google Web Toolkit, to Liferay
- Agreed formal process description documents are indispensable
  - user registration is important for all PL-Grid computing centers
  - procedure security
- User statistics (as of )
  - registered users: 204
    - PL-Grid staff: 64
    - independent researchers: 56
    - wards: 84
[Chart: number of registered users, Jan – Oct 2010]

7 PL-Grid Scientific Software & Helpdesk
- PL-Grid offers access to both commercial and free scientific applications
  - NAMD, ADF, Blender, CFour, CPMD, Dalton, Fluent, Gamess, Gaussian, Gromacs, NWChem, Povray, Turbomole
- Software availability and current status are monitored, and the results are fed to the incident management system
  - higher availability for users
  - users can check whether a program failed through their own fault or because of a computing center problem
- Issues with monitoring
  - the monitoring system is designed for site admins and its web interface is unacceptable for users; consider using the myEGI portal when available
- PL-Grid Helpdesk allows reporting issues, problems and service requests
  - Reporting can be done via phone call or the PL-Grid Helpdesk web interface; phone call reports are registered by an operator
  - Registering a report returns an incident identifier to the user, which allows the incident to be referred to and modified later on
  - An incident is transferred to the EGI level if its solution lies beyond the scope of the Polish NGI; it can still be managed via PL-Grid Helpdesk
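The helpdesk flow above (register a report, hand back an identifier, allow later updates, transfer to EGI while the ticket stays reachable locally) can be modelled with a toy in-memory store. `Helpdesk`, `Ticket` and their methods are hypothetical names; the slides do not describe the real helpdesk's interface or its link to GGUS.

```python
import itertools
from dataclasses import dataclass, field

@dataclass
class Ticket:
    ticket_id: int
    reporter: str
    channel: str        # "phone" (registered by an operator) or "web"
    description: str
    level: str = "NGI"  # "NGI" while handled in PL-Grid, "EGI" after transfer
    history: list = field(default_factory=list)

class Helpdesk:
    """Hypothetical in-memory helpdesk, not the PL-Grid Helpdesk API."""

    def __init__(self):
        self._ids = itertools.count(1)  # sequential incident identifiers
        self.tickets = {}

    def register(self, reporter, channel, description):
        ticket = Ticket(next(self._ids), reporter, channel, description)
        self.tickets[ticket.ticket_id] = ticket
        return ticket.ticket_id  # identifier handed back to the user

    def update(self, ticket_id, note):
        # The user refers to the incident later via its identifier
        self.tickets[ticket_id].history.append(note)

    def escalate_to_egi(self, ticket_id):
        # The ticket stays in the local store, so it remains manageable here
        ticket = self.tickets[ticket_id]
        ticket.level = "EGI"
        ticket.history.append("transferred to EGI")
```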

8 Resource Usage Monitoring System
- Motivation: PL-Grid grant accounting, daily data reports for users
- In the first available prototype, users can track their resource usage
  - job status, daily
  - daily workload (CPU time, walltime) per computing center
- Currently used in parallel with EGI accounting (APEL)
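A per-user daily workload report of the kind listed above amounts to grouping job records by day and computing center and summing CPU time and walltime. The `JobRecord` layout and the center name "center-B" are assumptions; the slides do not give the accounting schema.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple

class JobRecord(NamedTuple):
    # Hypothetical record layout, for illustration only
    user: str
    center: str
    day: date
    cpu_hours: float
    walltime_hours: float

def daily_workload(records, user):
    """Sum CPU time and walltime per (day, computing center) for one user."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for r in records:
        if r.user != user:
            continue
        acc = totals[(r.day, r.center)]
        acc[0] += r.cpu_hours
        acc[1] += r.walltime_hours
    # Map (day, center) -> (cpu_hours, walltime_hours)
    return {key: tuple(v) for key, v in totals.items()}
```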

9 EGI, NGI & PL-Grid Operations – high level view
[Diagram] Operations Support Teams: EGI: Central Operator on Duty; NGI: Regional Operator on Duty; Regional Technical Support; Site Administrators. The teams use Operations Support Tools: EGI Operations Dashboard, GGUS, PL-Grid Helpdesk, Monitoring. Connection labels: WebSvc, JMS.

10 PL-Grid Operations: Incident Management
"The main objective of incident management process is to resume regular state of affairs as quickly as possible and minimize the impact of business processes." Service Operation based on ITIL® V3
- Identification
  - incidents are triggered by the monitoring system, users or technical staff
- Registration
  - issue tracking system (Request Tracker adapted for PL-Grid)
  - an incident reported by a user or staff member is always registered
  - only long-standing (>24 h) problems reported by the monitoring system are registered
- Classification
  - regular middleware services / PL-Grid applications
- Escalation
  - experts are responsible for making sure the problem is solved, or for reassigning it
  - incidents concerning software problems can be escalated to EGI
- Solution applied and tested => issue closed
  - the administrator of the failed resource applies the solution
  - the monitoring system probes are re-run
  - the user is asked whether they are satisfied; if all is OK, the incident is closed
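The registration rule and the sequence of steps above can be sketched as a filter plus a simplified state machine. The 24 h threshold comes from the slide; the state names and allowed transitions are assumptions made for illustration, not PL-Grid's Request Tracker configuration.

```python
from datetime import timedelta
from enum import Enum

class Source(Enum):
    USER = "user"
    STAFF = "staff"
    MONITORING = "monitoring"

# From the slide: monitoring-reported problems are registered only if older than 24 h
MONITORING_THRESHOLD = timedelta(hours=24)

def should_register(source: Source, problem_duration: timedelta) -> bool:
    if source in (Source.USER, Source.STAFF):
        return True  # user and staff reports are always registered
    return problem_duration > MONITORING_THRESHOLD

# Hypothetical lifecycle; "escalated" -> "escalated" covers reassignment or escalation to EGI
TRANSITIONS = {
    "identified": {"registered"},
    "registered": {"classified"},
    "classified": {"escalated", "solution_applied"},
    "escalated": {"escalated", "solution_applied"},
    "solution_applied": {"tested"},
    "tested": {"closed", "solution_applied"},  # probes fail or user not satisfied
}

def advance(state: str, new_state: str) -> str:
    """Move an incident to new_state, rejecting transitions not in TRANSITIONS."""
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```

A closed incident has no outgoing transitions, so any further change must go through a new incident.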

11 Incident Management – PL-Grid experience
- Pro-active troubleshooting procedures, involving Regional Technical Support, for incidents reported by the monitoring system during their first 24 h
- The incident solution process can be a useful source of knowledge
  - PL-Grid introduced the Operational Problems Knowledge Base
  - the Regional Technical Support team creates entries
  - data are reused when a similar problem occurs again
  - publicly available; web pages indexed by search engines
  - each entry contains the full error message and a detailed solution procedure; in case of problems, paste your error message into Google Search
  - KB population started in Aug 2009, ~50 entries
  - knowledge base link:
- Incident Management Metrics: evaluate performance
  - quantitative, e.g. number of incidents, individual submitters, GGUS share etc.
  - focused on teams' response time
- Issues
  - team reaction time metrics indicate room for improvement; incident handling procedures need to be promoted among supporters/experts
  - the Knowledge Base requires an initial investment, but the more entries it has, the more it pays off
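The quantitative metrics listed above can be computed from ticket records roughly as follows. The `IncidentRecord` fields are hypothetical, and "GGUS share" is interpreted here, as an assumption, as the fraction of incidents that involved GGUS; the slides do not define it.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

@dataclass
class IncidentRecord:
    # Hypothetical fields, for illustration only
    submitter: str
    via_ggus: bool          # assumption: incident involved GGUS
    team: str
    created: datetime
    first_response: datetime

def incident_metrics(incidents):
    """Count incidents and submitters, GGUS share, mean response time per team (hours)."""
    response_hours = {}
    for inc in incidents:
        hours = (inc.first_response - inc.created).total_seconds() / 3600
        response_hours.setdefault(inc.team, []).append(hours)
    return {
        "incidents": len(incidents),
        "individual_submitters": len({i.submitter for i in incidents}),
        "ggus_share": sum(i.via_ggus for i in incidents) / len(incidents) if incidents else 0.0,
        "mean_response_h": {team: mean(h) for team, h in response_hours.items()},
    }
```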

12 Grid Infrastructure Monitoring System
- Motivation: it is not acceptable to wait for a user to report a service problem
- The PL-Grid monitoring system is an extended version of the EGI Nagios-based system for grid service availability monitoring
- PL-Grid extensions
  - monitoring of PL-Grid scientific software
  - probes for the availability of the PL-Grid VO (vo.plgrid.pl)
  - other middleware services (being integrated)
- Alarms are sent to the EGI message bus (based on the ActiveMQ JMS implementation) and then displayed in the EGI Operations Dashboard (incl. PL-Grid extensions)
- Issues
  - core services are poorly monitored or not monitored at all
  - the monitoring system triggers incidents; it would be useful to also monitor trends and predict failures
  - no control system: services do not have a management interface (a software maturity issue)
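Nagios probes follow a simple contract: print a one-line status and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A minimal sketch of a scientific-software availability probe under that convention might only check that an application's executable is on the node's PATH. The probe name `check_app` is hypothetical, and the actual PL-Grid probes are not shown in the slides and likely test far more (e.g. running a short job).

```python
import shutil
import sys

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_application(executable: str) -> tuple[int, str]:
    """Report whether `executable` can be found on this node's PATH."""
    path = shutil.which(executable)
    if path is None:
        return CRITICAL, f"CRITICAL - {executable} not found on PATH"
    return OK, f"OK - {executable} found at {path}"

def main(argv=None) -> int:
    """Print a one-line status and return the Nagios exit code."""
    argv = sys.argv if argv is None else argv
    if len(argv) != 2:
        print("UNKNOWN - usage: check_app EXECUTABLE")
        return UNKNOWN
    code, message = check_application(argv[1])
    print(message)
    return code
```

A wrapper script would end with `sys.exit(main())`; registering the probe with a Nagios instance and forwarding its results to the message bus are outside this sketch.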

13 Operations Communication & Documentation
- The PL-Grid Operations Center is distributed and its resources are located in geographically distant centers, which requires means of communication other than face-to-face meetings
- Solving an operational problem requires interactive communication (better than )
- Coordination of distributed teams requires procedures, work descriptions and handovers
- PL-Grid uses bi-weekly teleconferences where operations issues can be discussed
- Jabber service with an automatically generated contact list of all registered PL-Grid staff
- RTS fills in daily handover reports and a quarterly summary
- Operational Documentation
  - Incident Handling in PL-Grid Helpdesk
  - Operational Procedures for ROD, RTS and site admins

14 Questions?