Introduction OMB, T. Ferrari/EGI.eu 12/4/2018

Presentation transcript:

Introduction OMB, 18-12-2012 T. Ferrari/EGI.eu 12/4/2018 Introduction to OMB, 18-12-2012

Wiki Documentation
Navigation of the documentation is being made easier (also through better navigation menus).
Administrators: https://wiki.egi.eu/wiki/Administrator_Documentation
Users: https://wiki.egi.eu/wiki/User_Documentation (in progress)
Comments: operations at egi.eu or GGUS ticket

PROC16: update of decommissioning of unsupported software
http://wiki.egi.eu/wiki/PROC16
Approved at the Nov OMB
Various comments for improvements:
  If a site fails to provide feedback and does not put its Service Endpoints (SE) in downtime, ROD is requested to add an SE scheduled downtime (earlier: suspension) → suspension only if tickets reach the COD escalation step
  Steps split into Preparation phase and Escalation phase → timelines for Escalation steps added
  Some steps split to make them clearer

PROC04
Since Nov, verification of monthly Availability/Reliability statistics is no longer handled through monthly tickets opened by COD.
Underperformance in the last 30 calendar days is detected by an operations portal probe.
An alarm is displayed in case of underperformance.
ROD is responsible for opening the ticket.
The new procedure is reflected in PROC04.
Please provide feedback to COD/OMB in case of problems with the current A/R probe implementation → tuning may be needed (see presentation at the Nov OMB)

A/R reporting
Top-BDII reports: https://operations-portal.egi.eu/availability/topbdiiList
Site monthly reports (pdf monthly report). Pdf reports will still be uploaded to the document DB after validation/recomputation
  http://grid-monitoring.egi.eu/myegi/reports/
  Report → EGI Profile → ROC_CRITICAL
EGI.eu operations tools and NGI SAM
  http://grid-monitoring.egi.eu/myegi/sa/
  VO → OPS Profile → OPS_MONITOR_CRITICAL
  Sites → e.g. GRIDOPS-GOCDB (sites are hosted by the EGI.eu operations centre in GOCDB)
  https://grid-monitoring.egi.eu/myegi/sa/?view=2&graph=1&vo=37&profile=61&filters-value-Regions_or_Tiers=&filters-value-Sites=85&period=pD&dateorperiod=dt&startdate=01-12-2012&enddate=18-12-2012&resolution=D

Changes in A/R computation
A/R of an RC that is certified in the course of the month or decommissioned (hence in scheduled downtime): https://rt.egi.eu/rt/Ticket/Display.html?id=4790
Monthly A/R statistics include sites suspended during the previous month.
A site suspended after 5 days, with 0% availability during those days, contributes to the average values as if it had been in production for the whole month: its contribution is weighted on the cores, but not on the number of days in production.
Proposal: exclude sites that during the month experienced one of the following transitions:
  uncertified → certified
  certified → suspended
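
The effect of the proposal can be illustrated with a small sketch. It assumes the monthly average is a plain core-weighted mean of per-site availability; the slide does not give the exact formula, and the `Site` class, its field names and the sample figures are hypothetical.

```python
from dataclasses import dataclass

# Transitions named in the proposal; a site that went through one of them
# during the month would be left out of the monthly average.
EXCLUDED_TRANSITIONS = {("uncertified", "certified"), ("certified", "suspended")}


@dataclass
class Site:
    name: str
    cores: int
    availability: float      # monthly availability, in percent
    transitions: tuple = ()  # (from_status, to_status) pairs seen this month


def core_weighted_availability(sites, apply_proposal=False):
    """Core-weighted mean availability (assumed formula, not the official one)."""
    if apply_proposal:
        sites = [s for s in sites
                 if not any(t in EXCLUDED_TRANSITIONS for t in s.transitions)]
    total_cores = sum(s.cores for s in sites)
    if total_cores == 0:
        return None  # nothing left to average
    return sum(s.availability * s.cores for s in sites) / total_cores


# Hypothetical data: SITE-B was suspended with 0% availability.
sites = [
    Site("SITE-A", cores=100, availability=90.0),
    Site("SITE-B", cores=100, availability=0.0,
         transitions=(("certified", "suspended"),)),
]
current = core_weighted_availability(sites)
proposed = core_weighted_availability(sites, apply_proposal=True)
```

With these made-up numbers the suspended site halves the average under the current rule, while the proposed rule leaves only SITE-A in the computation.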

EGI.eu OLA
https://documents.egi.eu/document/1093
Few changes from November
A few services removed:
  Ticket triage and assignment (no longer a service separate from 1st and 2nd level support)
  Security Incident Response (communication based on e-mail exchanges → difficult quantitative measurement)
Targets relaxed for software support to keep them in line with the old DMSU practice: 1st, 2nd and 3rd Level Support according to https://wiki.egi.eu/wiki/EGI_DMSU_Ticket_Priorities
Maximum response time:
  Top priority: immediate within support hours
  Very urgent: within 8 support hours
  Urgent: within 16 support hours
  Less Urgent: within 40 support hours
Please provide feedback by the Jan OMB; the EGI.eu OLA will be approved if no comments are received.
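
As a rough illustration, the response-time targets above could be held in a lookup table and checked per ticket. This is only a sketch: the dictionary and function names are hypothetical, and treating "immediate" as 0 support hours is an assumption.

```python
# Maximum response times from the slide, in support hours.
# Mapping "immediate" to 0 hours is an assumption made for this sketch.
MAX_RESPONSE_SUPPORT_HOURS = {
    "top priority": 0,
    "very urgent": 8,
    "urgent": 16,
    "less urgent": 40,
}


def within_target(priority: str, elapsed_support_hours: float) -> bool:
    """Return True if a first response after the given number of
    support hours still meets the target for this priority."""
    limit = MAX_RESPONSE_SUPPORT_HOURS[priority.lower()]
    return elapsed_support_hours <= limit
```

For example, `within_target("Urgent", 12)` is true, while `within_target("Very urgent", 12)` is not.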

GGUS ticket handling
New policy under discussion: close as unsolved those tickets for which, after several reminders, no reply is provided by the submitter
See discussion at https://savannah.cern.ch/support/?133041
  When ticket set in "waiting for reply" + 5 WDs → email reminder to the user
  If no user answer + 5 WDs → 2nd email reminder to the user
  If no user answer + 5 WDs → ticket automatically set to status "unsolved" (total of 15 WDs)
  unsolved + 10 WDs → ticket automatically set to status "closed" (total of 25 WDs!)
Drawback: requires careful usage of "waiting for reply" → GGUS team monitoring
Please circulate your comments by the Jan OMB to the mailing list
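
The proposed timeline can be written down as a small sketch. It is not GGUS code: the function and constant names are hypothetical, and working days (WDs) are assumed to be counted from the moment the ticket enters "waiting for reply" with no answer from the user.

```python
# (cumulative working days without a reply, automatic action) from the slide
TIMELINE = [
    (5, "first email reminder to the user"),
    (10, "second email reminder to the user"),
    (15, 'ticket set to "unsolved"'),
    (25, 'ticket set to "closed"'),
]


def actions_due(working_days_without_reply: int) -> list:
    """List every automatic action that should have happened by now."""
    if working_days_without_reply < 0:
        raise ValueError("working days cannot be negative")
    return [action for threshold, action in TIMELINE
            if working_days_without_reply >= threshold]


def ticket_status(working_days_without_reply: int) -> str:
    """Status the ticket would have under the proposed policy."""
    if working_days_without_reply < 0:
        raise ValueError("working days cannot be negative")
    if working_days_without_reply >= 25:
        return "closed"
    if working_days_without_reply >= 15:
        return "unsolved"
    return "waiting for reply"
```

A ticket left unanswered for 12 WDs would have received both reminders and still be in "waiting for reply"; at 15 WDs it becomes "unsolved".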

Support to new RCs and NGIs
Problem: support to emerging RC administrators and NGIs with little experience
Two new subforums created, moderated by COD (SSO account required):
  Sites: http://go.egi.eu/NewSiteForum
  NGIs: http://go.egi.eu/NewNGIForum
Please advertise these to new RC administrators with little familiarity with tools, procedures and software installation

Nagios probe AB
Kick off the Nagios Probe Advisory Board under COD coordination
Proposal:
  Invite ARC-based NGIs to participate and provide feedback about the current ARC probes
  Indicate 1 expert per NGI
  Denmark, Estonia, Finland, Latvia, Lithuania, Slovenia, Sweden, Switzerland, Ukraine
Doodle: http://doodle.com/mzcruxd3nvnd32gh

Mini projects
Use of EGI-InSPIRE underspent funds to support supplemental activities that accelerate EGI's strategic goals around Community & Coordination, Operational Infrastructure and establishing Virtual Research Environments
12 month duration
Deadline for submission: 21-01-13 for SHORT (4 page) project proposals
Questions relating to these mini-projects (administrative and technical): mini-projects@egi.eu
Organisations within EGI-InSPIRE (or their JRUs) will be eligible for funding
Ideas:
  SAM extensions for regionalized A/R computations
  Operations Portal extensions
  Activities around resource allocation
  Accounting

Issues for Jan OMB
Operations plan 2013
Resource allocation procedures
Installation of software at sites
  Yaim core will not be supported after EMI (April 2013)
  What is the experience with Puppet and other configuration systems?
  Preferences/recommendations from NGIs?

Audio-conferencing
EVO evolving into a pay-per-service model from Jan 2013
Replacement still being evaluated
Proposal:
  Usage of AdobeConnect as interim solution
  OMB testing session on Jan 15 2013 (details will be available on Indico)