EGEE-II SLA Progress Report & Initial Proposal

Presentation transcript:

EGEE-II SLA Progress Report & Initial Proposal
Ioannis Liabotis <ilaboti at grnet.gr>
Ognjen Prnjat <oprnjat at grnet.gr>
Kostas Koumantaros <kkoum at grnet.gr>
SLA-WG (project-eu-egee-sa1-sla-group@cern.ch)
https://twiki.cern.ch/twiki/bin/view/EGEE/SA1_SLA_WG

SLA WG Mandate
1) Collecting relevant examples of SLAs and other documentation and making these available within the working group.
2) Reviewing the example documents and extracting a list of useful items from each one.
3) Identifying the broad areas which a minimal SLA should cover. These are areas for which all ROCs should have some sort of agreement with their resource centres.
4) Deciding whether there should be a single SLA or whether we should follow a WLCG model in which there are several SLAs with varying levels of commitment from the resource centres and corresponding various levels of support from the ROC.
5) Creating one or more draft SLAs which incorporate points 3) and 4). In each area covered by the SLA there should be suggestions on the type of metrics which could be applied. These draft SLAs should not contain details of numbers for limits, thresholds, etc. for specific metrics.
6) After the draft SLA(s) has/have been approved by the ROC Managers, the SLA working group will make a proposal for the metrics to appear in each of the sections of the SLA. Wherever possible, metrics should be used which are already measured. The number of metrics should be kept to a minimum set which will apply to all ROCs.
OPS Workshop, June 2007

SLA WG will NOT
- Identify what the consequences will be for resource centres failing SLA(s). This will be discussed by the ROC managers at a later stage.
- Propose specific limits, thresholds, targets, etc. for metrics.

Identified SLAs or MoUs
- SEE-GRID SLA
- WLCG MoU
- INFN MoU
- UK Tier-2 MoU
- Oxford NGS Service Level Description
- Service Level Description for NGS Helpdesk
- BalticGrid SLA (Networking)
- EGEE-II SA2 SLA (Networking)

SLAs/MoUs Summaries: SEE-GRID SLA
- Hardware and connectivity criteria
  - Minimum number of CPUs
  - Network connectivity sufficient to pass SAM tests and support the SEEGRID VO
  - Service nodes must support execution of SAM tests
- Level of support
  - Site admin, security admin, 9-5 weekday support, response within the following working day
- Level of expertise
  - 1 experienced site admin, relations with network support staff, 1 security admin; names of responsible people should be stated in HGSM
- VO support
  - Support and deliver to the SEEGRID VO and support the OPS role
- Conformance to Operational Metrics

SLAs/MoUs Summaries: SEE-GRID SLA (2)
- Site Availability (quality metric)
  - Sites must have 90% availability during uptime in a given quarter (3 months).
  - This metric is calculated as follows: if a site has degraded performance during a given day (>50% of the SAM tests fail), the site is considered down for that day.
- Site-declared downtime
  - Sites must not be in downtime for more than 10% of the time in a given quarter (3 months), except for reasons outside the site's responsibility, negotiated with the country GIMs.
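The availability rule on this slide can be sketched in Python. This is one illustrative reading, not the SEE-GRID tooling: the function names, the per-day input of (failed, total) SAM test counts, and the use of None to mark a day of declared downtime are all assumptions. The exception for reasons outside the site's responsibility is not modelled.

```python
from typing import Optional, Sequence, Tuple

# One entry per day of the quarter: (failed_tests, total_tests),
# or None for a day of declared downtime (hypothetical encoding).
DayResult = Optional[Tuple[int, int]]


def day_is_down(failed: int, total: int) -> bool:
    """A day counts as down when more than 50% of the SAM tests fail."""
    return total > 0 and failed / total > 0.5


def quarter_summary(days: Sequence[DayResult]) -> Tuple[float, float]:
    """Return (availability during uptime, fraction of days in declared downtime)."""
    uptime_days = [d for d in days if d is not None]
    downtime_fraction = (len(days) - len(uptime_days)) / len(days) if days else 0.0
    if not uptime_days:
        return 0.0, downtime_fraction
    up = sum(1 for failed, total in uptime_days if not day_is_down(failed, total))
    return up / len(uptime_days), downtime_fraction


def meets_sla(days: Sequence[DayResult],
              min_availability: float = 0.90,
              max_downtime: float = 0.10) -> bool:
    """Check the 90% availability and 10% declared-downtime limits from the slide."""
    availability, downtime = quarter_summary(days)
    return availability >= min_availability and downtime <= max_downtime
```

Note that a day with exactly 50% of tests failing still counts as up, since the slide states the condition as ">50%".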

SLAs/MoUs Summaries: WLCG MoU
- Different levels of service are defined for different service providers
  - Host Laboratory Services
  - Tier-1 Services
  - Tier-2 Services
- Definition of Grid Operation Services
- List of supported VOs
- Minimal computing resources for participation
- Network connectivity criteria
- Storage availability
- Maximum delay in responding to operational problems
- Average availability measured over a period of time
- Provision of Grid Operations Centres
- Provision of user support facilities
- Table of the computing power currently available, and foreseen to be made available, to the grid

SLAs/MoUs Summaries: GridPP SLA
- Hardware Support Staff
  - GridPP supports hardware support staff
  - FTE allocation defined for support staff
  - Support staff should produce quarterly reports
- Hardware Resources
  - Hardware resources should be made available to the Grid
  - Table with offered hardware resources provided in the MoU
- Availability of resources
  - Level of service agreed between the Deployment Board and the Tier-2 board
  - Provide support for VOs but not installation and maintenance of experiment software
- Monitoring of hardware resources
  - Monitoring software provided by GridPP is installed at sites
  - Results should be public and available on a web site
- Target Shares
  - Overall target shares are defined by boards
  - Individual target shares are defined by Tier-2s
- Software
  - GridPP provides middleware releases
  - Timescale for deployment of software is decided by the Tier-2 board
- Network Connectivity
  - GridPP provides network monitoring software
  - Sites agree to run this software
- Security and availability
  - Defined by various boards
- Management
  - Reporting and information exchange procedures defined

SLAs/MoUs Summaries: INFN GRID MoU
- Provide adequate computing and storage resources (and optional services where available). The farm size (at least 10 CPUs) and the storage capacity will be settled by the contractors involved.
- Guarantee sufficient manpower to manage the site: at least 2 people and a minimum of 1 FTE are required.
- Manage site resources efficiently: carry out m/w installation, perform updates, apply patches, and modify configurations as requested by CMT, within the maximum time expected and agreed for each operation.
- Take responsibility for, and update, the tickets assigned to the site within 24 hours (tier 2) or 48 hours (other sites), Monday to Wednesday.
- Actively monitor the site, checking the status of both resources and services on a regular basis (using existing tools: GridICE, GSTAT, SAM, etc.).
- Guarantee continuity of the support and management of the site, also during holidays, in one of the following forms:
  a. Local shift;
  b. Delegate site management (with full access) to CMT;
  c. Signal site downtime and close queues (only for sites with no special INFN commitments).
- Guarantee proper site-manager participation in the fortnightly EGEE SA1 phone conferences and SA1/production grid meetings.
- Keep site information on GOC-DB up to date.
- Enable test VOs (infngrid, dteam and ops), giving them a higher priority than other VOs.
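As a rough illustration of the ticket-update commitment above, the following Python sketch computes the deadline for a ticket and checks whether an update arrived in time. The names (UPDATE_WINDOW_HOURS, update_deadline, updated_in_time) and the site class labels are hypothetical, and the weekday restriction mentioned on the slide is deliberately not modelled.

```python
from datetime import datetime, timedelta

# Update windows taken from the slide: 24 hours for tier 2, 48 hours for other sites.
# The dictionary keys are illustrative labels, not identifiers from the MoU.
UPDATE_WINDOW_HOURS = {"tier2": 24, "other": 48}


def update_deadline(assigned_at: datetime, site_class: str) -> datetime:
    """Latest time by which the site should have updated the ticket."""
    return assigned_at + timedelta(hours=UPDATE_WINDOW_HOURS[site_class])


def updated_in_time(assigned_at: datetime, updated_at: datetime, site_class: str) -> bool:
    """True when the first update falls within the agreed window."""
    return updated_at <= update_deadline(assigned_at, site_class)
```

For example, a ticket assigned at 09:00 on one day and updated at 10:00 the next day would miss the tier-2 window but still satisfy the 48-hour window for other sites.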

SLAs/MoUs Summaries: Oxford NGS SLD
- Applies to the Oxford NGS node at Oxford University
- Service Inclusions
  - Available middleware and middleware services
  - User-level software available and the support level
  - Accepted certificates
  - Various other service details…
- Service Exclusions
  - Turnaround time cannot be guaranteed
- Service Level Quality
  - Availability
  - Reliability
  - Filestore
  - Compliance
- Operational Framework
  - Definition of support categories
  - Problem severity definitions
  - Escalation mechanisms

SLAs/MoUs Summaries: Oxford NGS SLD Support Center
- Services provided by the NGS Support Centre
  - HelpDesk
  - Certification and Registration
  - Site Resources
  - User Support
  - Web site
  - Training
  - Application Repository
  - Documentation
  - User Account Management
  - Promotion and education
  - Global Activities and Collaboration
  - Monitoring and Auditing of Services
- Development Board
- Technical Board
- Operations Board
- Creation of New Services
- Termination of Services
- Performance Reporting Procedures
- Definition of Monitoring Tools and other services

SLAs/MoUs Summaries: BalticGrid SLA
- Packet loss: < 0.1%
- One-way delay between the BalticGrid resource centres is in the range of 20-50 ms and does not exceed 150 ms under any conditions.
- MTU of at least 1500 bytes all along the traffic path.
- Minimal jitter by avoiding extra routing/buffering hops on the path.
- Traffic load does not exceed 75% of available bandwidth for more than 10% of a month. Available bandwidth should be increased so that traffic load does not exceed 50%.
- QoS levels: Amber, Rock, Timber
- Time scales for implementation of these levels of service are defined.
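The numeric limits on this slide lend themselves to a simple conformance check. The Python sketch below is only an illustration under assumptions: the LinkSample record, the function names, and the idea of evaluating the traffic-load rule over equally spaced samples for a month are not taken from the BalticGrid SLA itself.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LinkSample:
    """One measurement of a path between two resource centres (hypothetical record)."""
    packet_loss_pct: float
    one_way_delay_ms: float
    mtu_bytes: int


def sample_violations(s: LinkSample) -> List[str]:
    """Return the names of the per-sample limits that this measurement breaks."""
    violations = []
    if s.packet_loss_pct >= 0.1:      # packet loss must stay below 0.1%
        violations.append("packet loss")
    if s.one_way_delay_ms > 150:      # delay must never exceed 150 ms
        violations.append("one-way delay")
    if s.mtu_bytes < 1500:            # MTU of at least 1500 bytes
        violations.append("MTU")
    return violations


def load_violation(load_fractions: Sequence[float]) -> bool:
    """True when load exceeds 75% of bandwidth in more than 10% of the month's samples."""
    if not load_fractions:
        return False
    busy = sum(1 for x in load_fractions if x > 0.75)
    return busy / len(load_fractions) > 0.10
```

The jitter requirement and the QoS levels are qualitative on the slide, so they are left out of the sketch.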

SLAs/MoUs Summaries: EGEE-II SA2 SLA
- Based on Premium IP offered by GEANT

EGEE-II Proposed SLA: EGEE SLA Structure
- Purpose
- Summary
  - SLA between Sites and ROCs
  - With a view towards the NGI-Sites relationship
- Parties to the Agreement
  - Grid Management Service Providers (ROCs)
  - Service Providers (Sites)
- Duration and Extensions
- Amendment
- Description of Services Covered
  - Grid Management Service
  - Core Services
  - Site Services
- Responsibilities
- GRIDOPS
- Requirements
  - It is proposed to have 2-3 levels of SLA requirements for sites by changing the limits

Requirements - Hardware and connectivity criteria

Site Hardware         | Metric                                  | Operator | Value | Measurement Method
Service Nodes         | Must support the execution of SAM tests |          |       |
Worker Nodes Cluster  | Total Number of CPUs                    | >        | xxx   | Information System
Worker Nodes Cluster  | Total Si2k                              |          |       |
Storage Capacity      | Total Storage                           |          |       |
Nodes Interconnection | Interconnection BW                      |          |       |

Requirements – Network Connectivity

Metric                             | Operator | Value | Measurement Method
Connectivity with GEANT: Bandwidth | >        | xxx   |

Requirements – Level of Expertise
- 1 experienced site admin
- 1 experienced network support person, or a direct link to network support / network operations center
- 1 security administrator to be available for advice any time
- Names and contact details (e-mail) of the above people should be available via GOCDB

Requirements – Level of Support

Metric                                        | Operator | Value | Measurement Method
Ticket Response Time**: Mean Response Time    | <        | xxx   | GGUS
Ticket Solution Time***: Mean Solution Time   |          |       |

***If a solution can be provided by site personnel

Support Calendar: Mon-Friday 09:00-17:00 local time, except public holidays and scheduled institution closures
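A minimal Python sketch of how the support calendar and the mean response time metric could be evaluated. It is an assumption-laden illustration: the function names are invented here, closures (public holidays and scheduled institution closures) are passed in as a set of dates, timestamps are taken to be in local time, and the response-time limit is a parameter because the slide leaves the value as "xxx".

```python
from datetime import date, datetime, time
from typing import Collection, Sequence


def in_support_hours(ts: datetime, closures: Collection[date] = ()) -> bool:
    """True when a local-time timestamp falls within Mon-Fri 09:00-17:00, outside closures."""
    if ts.weekday() >= 5:            # Saturday (5) or Sunday (6)
        return False
    if ts.date() in closures:        # public holiday or scheduled institution closure
        return False
    return time(9, 0) <= ts.time() < time(17, 0)


def mean_response_ok(response_hours: Sequence[float], limit_hours: float) -> bool:
    """Check that the mean ticket response time is below the agreed limit."""
    if not response_hours:
        return True                  # no tickets, nothing to violate
    return sum(response_hours) / len(response_hours) < limit_hours
```

Treating 17:00 itself as outside support hours is a design choice of this sketch; the slide does not say whether the end of the interval is inclusive.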

SLA - Conformance to Operational Metrics

Availability

Metric                                          | Operator | Value        | Measurement Method
Site Availability (time up / scheduled up-time) | >        | xxx%/quarter | SAM
Site Downtime*: Declared Uptime                 |          | xxx%/quarter | GOCDB

*As declared in the OPS manual's declaration of scheduled interventions

Requirements - VO support
- Each site needs to define a minimum amount of resource priorities for supporting specific VOs