Published by Gwen Cox. Modified over 8 years ago.
1
SA1 and JRA1 Operations and Operational Tools
EGI-InSPIRE PY2 Review, 25-26 June 2013
T. Ferrari, Chief Operations Officer/EGI.eu
www.egi.eu, EGI-InSPIRE RI-261323
SA1 and JRA1 - EGI-InSPIRE Review 2013
2
Contents
I. Introduction to SA1 and JRA1
II. Infrastructure
III. Results
IV. Analysis
3
PART I
I. Introduction to SA1 and JRA1
–resources
–partners
–objectives
II. Resource infrastructure
III. Service infrastructure
IV. Analysis
4
SA1 tasks and resource distribution
Task; Leader/Partner; Task effort distribution
TSA1.1 Activity Management; T. Ferrari/EGI.eu; 1%
TSA1.2 Secure Infrastructure; D. Kelsey/STFC; 9%
TSA1.3 Service Deployment Validation; J. Pina/LIP; 11%
TSA1.4 Infrastructure for Grid Management; E. Imamagic/SRCE; 21%
TSA1.5 Accounting; A. Packer/STFC; 6%
TSA1.6 Helpdesk Infrastructure; G. Grein/KIT; 9%
TSA1.7 Support Teams; R. Trompert/SARA; 28%
TSA1.8 Providing a Reliable Grid Infrastructure and core services; P. Korosoglou/AUTH; 15%
5
JRA1 tasks and resource distribution
Task and Effort Distribution; Leader
TJRA1.1 Activity Management (7%); D. Scardaci/INFN
TJRA1.2 Maintenance and development of the deployed operational tools (42%); T. Antoni/KIT
TJRA1.3 Supporting National Deployment Models (6%); =
TJRA1.4 Accounting for usage of different resource types (28%); A. Packer/STFC
–Cloud, HPC, Desktop Grid, Storage/Data Usage
–Application Usage
–Billing system
TJRA1.5 Integrated Operations Portal (17%); C. L’Orphelin/CNRS
–Service Oriented model
–Porting to Symfony
–New DCI integration
–Support of mobile devices
6
Objectives
Operate a secure, reliable European-wide federated production grid infrastructure that is integrated and interoperates with other grids worldwide
Objective; Tasks; Task Objectives
O1; TSA1.2; Maintain a secure infrastructure
O2; TSA1.3; Validate new technology releases (tools and middleware)
O3; TSA1.7; Support end-users and Resource Centre administrators
O4; TSA1.8; Service Level Management, grid oversight, documentation and procedures
O5; TSA1.4, TSA1.5, TSA1.6; Operate tools, the accounting infrastructure and the EGI Helpdesk
O6; JRA1.2, JRA1.3, JRA1.4, JRA1.5; Evolve the operational tools used by the production infrastructure
-Maintenance, development and support of national deployment
-Accounting for the use of new resources (desktop, virtualisation, storage, data, application and billing)
7
Contents
I. Introduction to SA1 and JRA1
II. Infrastructure
–Resource Centres
–Operations Centres
–Usage
III. Results
IV. Analysis
8
Metrics (April 2013); Value (yearly increase)
Countries
–EGI-InSPIRE and Council members: 44
–Including integrated RPs: 55
–New: Iran, Vietnam
–Leaving: Argentina, Ireland
Operations Centres
–Total (National, Federated, EIRO): 34 (26, 7, 1)
–New: NGI_UA
–Decommissioned: NGI_IE, Iniciativa de Grid de America Latina (IGALC)
Resource Centres (RCs)
–EGI-InSPIRE and Council Participants: 306 (-6%)
–Including integrated infrastructures: 335 (-5%)
Resource infrastructure Providers (RPs)
9
Metrics (April 2012); Value (yearly increase)
Resource Centres (RCs)
–EGI-InSPIRE and Council Participants: 326 (+3%)
–Including integrated infrastructures: 352
–Supporting MPI: 90 (+20%)
Countries
–EGI-InSPIRE and Council members: 42
–Including integrated RPs: 54
Operations Centres
–Total (National, Federated, EIRO): 37 (27, 9, 1)
–New: NGI_FI, NGI_IE, NGI_UK
[Figure: Resource infrastructure Providers. Legend: Integrated EGI-InSPIRE Partners and EGI Council Members; Internal/External RPs being integrated; External RP; Peer RP]
UPDATE
10
Installed Capacity
Storage; Value (yearly increase)
–Disk: 235 PB (+69%)
–Tape: 176 PB (+32%)
Logical CPUs (April 2013); Value (yearly increase)
–EGI-InSPIRE and Council Participants: 333,400 (+23%); stretch target: 330,000
–Including integrated RPs: 361,300
11
Capacity delivered
[Figure: capacity delivered. Labels: 4.5 Billion hours; 116.7 Million hours; 110.3 Million hours]
12
Contents
I. Introduction to SA1 and JRA1
II. Resource infrastructure
III. Results
–Innovation of the EGI core infrastructure platform
–Infrastructure integration
–Continued secure and reliable access to federated resources
–Evolving operations and user support
–Enhancement of service level management and reporting
IV. Analysis
13
Innovation of the EGI core infrastructure platform
PO1 The continued operation and expansion of today’s production infrastructure
14
Information discovery service
Support of OGF standard GLUE 1.3 and 2.0 by all Resource Centres, more than 4300 service end-points
–Retirement of gLite software
EGI GLUE2 profile
–ARC and UNICORE integration with EMI 3.0
–Globus (in progress)
EGI GLUE 2 profile
–Defines detailed semantics, extends the schema definitions
–Classifies attributes by use case and importance
  Use cases: Service Discovery, Service Selection, Monitoring, Oversight, Diagnostic
  Importance: Mandatory, Recommended, Desirable, Optional, Undesirable
  Various implications: need for accuracy, update rate, latency, caching etc.
–Validation criteria: FATAL, ERROR, WARNING, INFO
GLUE 2 support by EGI service registry – GOCDB (CHECK)
15
New features
Security support tools
Operations Portal
GOCDB
SAM
Messaging
Accounting repository and portal
Metrics portal
16
Security support tools
SSC5 support framework
Pakiti
–Monitoring of deployed software versions
Security Nagios
–Tracking of services as a result of vulnerabilities
–Unsupported software versions
–Plan for full worker node cluster monitoring in progress
PY3 // S. Gabriel
–…..
17
Operations Portal and GOCDB
Operations Portal
–Dashboards refactoring: new portal look and feel, and improved efficiency, reactivity and visibility
GOCDB
–V4.4: harmonized the separate read-only and read/write instances into a single portal
–Design and development of V5: new data layer able to use different RDBMS platforms and GLUE2 compliant
–A GOCDB failover instance at the Fraunhofer Institute
–Scoping of services, revision of testing and monitoring attributes
18
Monitoring and Messaging
SAM
–MyEGI: reviewed and improved as part of SAM Update-19
Messaging
–A test message broker network was deployed
–A test suite was developed to test the message broker network before software updates are applied to the production message broker network
–Credential synchronization system
–The scalability of the network was improved through the deployment of new ActiveMQ software versions
19
Accounting repository
–New record publication protocol (Secure Stomp Messenger – SSM)
–New SSM 2.0
–CAR, STAR record formats
  MPI, storage, cloud
[Diagram labels: EMI 3 APEL client; External clients; Sending SSM; EGI Message Brokers; Receiving SSM; Record loader; MySQL (CPU JobRecords, CPU Summaries); DBunloader; EGI Accounting Portal]
20
Accounting portal and metrics portal
Accounting Portal
–Extension and maintenance of the VO Manager views
–InterNGI usage reports
–Support of RFC2254 DNs
–PDA & Mobile support
Metrics Portal
–Per country metrics for NGIs
–Added XLS output support
–Aggregated metrics (sum of all NGI predicted metrics plus entered metrics)
21
Tools for pay-for-use
22
Contents
I. Introduction to SA1 and JRA1
II. Resource infrastructure
III. Results
–Innovation of the EGI core infrastructure platform
–Infrastructure integration
  Software, Technology, Operations Tools
–Continued secure and reliable access to federated resources
–Evolving operations and user support
–Enhancement of service level management and reporting
IV. Analysis
23
Infrastructure integration
PO4 Interfaces that expand access to new user communities
PO5 Mechanisms to integrate existing infrastructure providers in Europe and around the world
PO6 Establish processes and procedures to allow the integration of new DCI technologies
24
Software
Core infrastructure platform interfaces and procedures allow the deployment and operation of heterogeneous software stacks
–Integration of ARC (9.11%), UNICORE (1.49%), Globus (1.49%), QosCosGrid (1.12%) and Desktop Grids (6540 cores) completed
–Accounting in progress
–77 MPI Resource Centres
–44 HPC clusters
25
Technology
Seamless authentication, data access, transfer, replication and processing across EGI, EUDAT and PRACE
Technical and operational collaboration between user communities, Resource Infrastructures and Technology Providers
−Seismology (VERCE)
−Biomedical modelling and simulation of the human body (VPH)
−European plate observation (EPOS)
−Multi-scale simulation for nano-material science (MAPPER)
−Hydro-meteorology (DRIHM)
OSG and XSEDE
26
Operations
Provisioning of operations technical services
–Locally deployable tools and/or centrally provided regional views (GOCDB, SAM, Availability/Reliability computation)
Provisioning of software
–Interoperation thanks to reuse of code (GOCDB for EUDAT)
Seamless exchange of support services, accounting and monitoring
Security operations and policies
–Security for Collaboration among Infrastructures (SCI) – D. Kelsey/EGI (EGI, OSG, PRACE, WLCG, XSEDE)
  Managing cross-Grid operational security risks, build trust, develop policy standards for collaboration
–One team for security incident response for EGI, EUDAT, PRACE (in progress)
MoU with Asia Pacific and with OSG (in progress)
27
GOCDB
Scoping of service end-points: allow them to be part of different arbitrary infrastructures (mini-project)
−Non-exclusive scope tags to enable hosting multiple projects/infrastructures within a single GOCDB instance
−Infrastructure-specific views for regionalization
Design and development of GOCDB v5: new data layer able to use different RDBMS platforms to ease deployment
28 new service types registered (94 in total at PQ12): ex-gLite, UNICORE, Globus, iRODS, ARC, QosCosGrid, BES, Cloud, Torque, Squid, XRootD
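The idea of non-exclusive scope tags can be pictured with a small sketch. The `ServiceEndpoint` class, the host names, the tag values and the `endpoints_in_scope` helper are illustrative assumptions; they are not the GOCDB data model or its API.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEndpoint:
    """Hypothetical stand-in for a service end-point in a GOCDB-like registry."""
    hostname: str
    service_type: str
    scopes: set = field(default_factory=set)  # non-exclusive scope tags


def endpoints_in_scope(endpoints, scope):
    """Return the end-points carrying `scope`, i.e. one infrastructure-specific view."""
    return [ep for ep in endpoints if scope in ep.scopes]


# Illustrative registry content; one end-point belongs to two infrastructures.
registry = [
    ServiceEndpoint("ce01.example.org", "ARC", {"EGI"}),
    ServiceEndpoint("irods.example.org", "iRODS", {"EUDAT"}),
    ServiceEndpoint("se01.example.org", "XRootD", {"EGI", "EUDAT"}),
]

egi_view = [ep.hostname for ep in endpoints_in_scope(registry, "EGI")]
eudat_view = [ep.hostname for ep in endpoints_in_scope(registry, "EUDAT")]
```

Because tags are a set rather than a single field, the same end-point can appear in several views without being registered twice.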
28
Accounting
New Secure Stomp Messenger protocol (SSM v. 2.0) for publishing of accounting records (IGE/GridSafe, QoSCoSGrid/MAPPER, EDGI and UNICORE)
Accounting of usage of multiple resource types
−PY1/PY2: Compute
−PY3/PY4: Storage, Cloud, Parallel Jobs
−In progress: Application, Virtual machines
Accounting Portal
−Generalization of XML endpoints so that accounting data can be exported to other infrastructures
−Cloud accounting views being prototyped
Regional APEL Repository and Accounting Portal (prototype)
−Can be deployed by external infrastructures
29
SAM and Operations Portal
Operations Portal
–Operations Dashboard regional views
SAM
–Any probe can be integrated to extend the framework
  Ex-gLite, Globus, ARC and Desktop Grids
  PY3: QosCosGrid, UNICORE, Federated cloud
–Fully regionalized
  Set of probes can be customized
  Operations: all operations centres are running their local SAM regional instance (44 service end-points)
  Users: VO SAM (11 service end-points)
30
Contents
I. Introduction to SA1 and JRA1
II. Resource infrastructure
III. Results
–Innovation of the EGI core infrastructure platform
–Infrastructure integration
–Continued secure and reliable access to federated resources
  Coordination, Security, Core Infrastructure Platform operations
–Evolving operations and user support
–Enhancement of service level management and reporting
IV. Analysis
31
Secure and reliable access to federated resources
PO1 The continued operation and expansion of today’s production infrastructure
32
Operations coordination
Operations Management Board
–EGI-level operations coordination, integration, documentation, sustainability and technical roadmap, requirements gathering and prioritization for software and operational tools
Local operations coordination provided by NGIs/EIROs
User Community Board
–EGI-level operations and support coordination of existing user communities, VRCs for internal coordination
Software deployment coordination
–EGI bi-weekly meetings
Working groups and task forces (7 active, 3 new in PY3)
GGUS Advisory Board
–Operators (1st and 2nd level support), user communities, technology providers (3rd level support)
33
Security Operations
Security Coordination Group
–Coordinates overall EGI security activities
EGI CSIRT
–Incident Response Task Force (incident handling and coordination)
–Security monitoring (Pakiti, Security Nagios, Security Dashboard)
–Security drills
–Training and dissemination
Software Vulnerability Group
–Handling reported vulnerabilities, vulnerability assessment, secure coding education
Security Policy Group
–Develop and maintain security policies
EUGridPMA
External software providers (EMI/IGE/…)
PRACE/XSEDE/OSG/…
34
EGI CSIRT (placeh)
Incident Prevention (security monitoring, security intelligence group, assessing known vulnerabilities with the support of SVG, preparation of advisories)
Incident Response (incident handling including investigation, heads up, coordination with site CSIRTs, forensics, technical support, advisories, reports)
Listed team in the European database of CSIRTs
–Trusted Introducer accreditation in Oct 2013
Collaboration with other CSIRTs
Grid-SEC
−Coordinated response to cross-grid security incidents (vetted security representatives from WLCG, OSG, XSEDE, EGI)
PY3 activities
–Policy for decommissioning of unsupported software
–…
35
Security Service Challenges
To be provided
36
Central emergency user suspension
The central emergency suspension framework allows infrastructure-wide blacklisting of a DN that is suspected of malicious use, or blacklisting to prevent malicious use
–Sites are protected quickly, e.g. during an incident which occurs out of hours
–Automated suspension, or download of a list of suspended DNs and deployment of an alternative mechanism
Resource Centres know and control
Approved: "You must implement automated procedures to download the security emergency suspension lists defined centrally by Security Operations and should take appropriate actions based on these lists, to be effective within the specified time period." [Service Operations security policy]
Procedure for compromised certificates and criteria for central emergency suspension under preparation (in progress)
Enforcement of policy
–PY3: Implementation plan
–PY4: Start of implementation in PQ14
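As a rough illustration of the "download a list of suspended DNs" option, the following sketch shows how a Resource Centre might consult such a list before authorising a user. The slides do not describe the list format or the distribution mechanism, so the one-DN-per-line format, the sample DNs and the helper names `parse_suspension_list` and `is_suspended` are all assumptions.

```python
def parse_suspension_list(text):
    """Parse a suspension list, assumed here to hold one DN per line.

    Blank lines and lines starting with '#' are ignored.
    """
    suspended = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            suspended.add(line)
    return suspended


def is_suspended(dn, suspended):
    """Return True if the certificate DN appears on the suspension list."""
    return dn.strip() in suspended


# Hypothetical list content; a real site would refresh it periodically
# from the central source within the time period set by the policy.
SAMPLE_LIST = """\
# centrally suspended DNs (illustrative)
/DC=org/DC=example/CN=Mallory Suspect

/DC=org/DC=example/CN=Compromised Robot
"""

suspended_dns = parse_suspension_list(SAMPLE_LIST)
decision_blocked = is_suspended("/DC=org/DC=example/CN=Mallory Suspect", suspended_dns)
decision_allowed = is_suspended("/DC=org/DC=example/CN=Alice User", suspended_dns)
```

Keeping the parsed list in a set makes each authorisation check a constant-time lookup, which matters little for short lists but keeps the check cheap on busy services.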
37
Security Training
To be provided
38
Security monitoring
To be provided // technical explanation
39
Security Threat Risk Mitigations
To be provided // activities and mitigations
40
Incidents and advisories
EGI CSIRT PY3
2 security incidents handled (10 in PY2)
–nodes deployed for denial of service attack, brute force attack via ssh
5 EGI CSIRT alerts and advisories
–2 critical, 1 high priority
12 Software Vulnerability Group advisories
–1 critical, 3 high priority
X security training sessions (forensics, …)
0 sites suspended because of a critical vulnerability
ACTIVITIES
-Unsupported software: policy for unsupported software decommissioning, definition of oversight responsibilities, development of probes, extension of the Operations Security Dashboard and definition of operational procedures for handling retirement campaigns
-Service Operations Security Policy
-…
41
SHA-2 readiness
From 01-10-2013 Certification Authorities should begin to phase out issuance of SHA-1 end-entity certificates. SHA-2 certificates should be issued by default
Action plan defined to prepare the production infrastructure for ubiquitous support of SHA-2 as the signature hash algorithm of end-entity certificates
–SHA-2 compliance as UMD quality criteria
–SHA-2 compliance assessment of operational tools
–SHA-2 compliance monitoring of the infrastructure due to start in July 2013
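A compliance check of this kind ultimately comes down to looking at the signature algorithm recorded in each certificate. The sketch below classifies algorithm names as they are commonly printed by certificate tools (for example `sha1WithRSAEncryption` or `sha256WithRSAEncryption`); the function name `hash_family` and the sample inventory are assumptions, and extracting the name from a real certificate is left out.

```python
SHA2_MARKERS = ("sha224", "sha256", "sha384", "sha512")
SHA1_MARKERS = ("sha1",)


def hash_family(signature_algorithm):
    """Classify a signature algorithm name such as 'sha256WithRSAEncryption'."""
    name = signature_algorithm.lower().replace("-", "")
    if any(marker in name for marker in SHA2_MARKERS):
        return "SHA-2"
    if any(marker in name for marker in SHA1_MARKERS):
        return "SHA-1"
    return "other"


# Hypothetical inventory: certificate subject -> signature algorithm name
inventory = {
    "CN=host-a.example.org": "sha1WithRSAEncryption",
    "CN=host-b.example.org": "sha256WithRSAEncryption",
    "CN=host-c.example.org": "ecdsa-with-SHA384",
}

# Subjects whose certificates would still need re-issuing with SHA-2
still_sha1 = sorted(s for s, alg in inventory.items() if hash_family(alg) == "SHA-1")
```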
42
SAM
SAM: 3 releases to production (Update 17, 19, 20)
−New SAM probes (from EMI as of Update 22)
−New SAM component for monitoring profile management (POEM, as of SAM Update-17)
  −describe existing monitoring metrics and group them
  −configure the way the availability and reliability is computed
  −allow notifications to messaging system
−New Availability and Reliability reporting module (MyEGI)
−New SAM instance for monitoring of EGI.eu and NGI operational tools and Nagios server for central infrastructure monitoring needs (GLUE2 validation, software versions, publishing of User DN in accounting, etc.)
43
Operations Portal
Operations Portal: 3 releases to production
−Security Operations Dashboard for monitoring of unsupported middleware version
−Improvements of VO Dashboard, VO ID card and VO management module
−Bug fixing
New prototype under testing
44
Accounting
APEL accounting repository
−Local job accounting
−SSM support
  −June 2012: SSM 1.2, data from previous repository is merged to provide summaries
  −April 2013: SSM 2.0 for new APEL publishers with Compute Accounting Record and StAR (Storage Accounting Record) format support
−Migration of CERN, NIKHEF, French sites, Italy, NGI_NDGF, OSG to SSM
−New types of accounting records: storage, cloud, parallel jobs
  −Revision of Cloud Accounting Usage Record
  −Testing of SSM 2.0 for Cloud and storage
Accounting portal (1 new release in production)
–Inter-NGI reports
–Local/Grid jobs views, bug fixing
45
Performance
[Figure: availability and reliability charts]
PY3 Availability: 97.5%; PY3 Reliability: 98.3%
PY3 availability: 99.5%; PY3 reliability: 99.5%
46
Software decommissioning
Grid oversight supervision of unsupported software decommissioning
–gLite 3.1 and 3.2 (October 2012 - April 2013)
–EMI 1 (March 2013 – PQ13)
[Figure panels: gLite 3.1/3.2; EMI 1]
47
Evolving operations and user support
O2 Continued support of researchers
48
VO and user Statistics
49
CPU Usage
50
GGUS
New Report Generator to create ticket-related reports on demand
GGUS High Availability
–HA solution for the Web front-ends and the AR Server (BMC Remedy Action Request System)
–HA is now available for the on call duty service and the Intrusion Prevention system
–Switching to the backup machines through the manual run of a management script; KIT on-call service at any time
51
GGUS interfaces
Feasibility study of helpdesk interface integration between EGI and PRACE
–DANTE and partners for network support
Interfaces to ServiceNow (CERN) and NGI France
New authentication methods in addition to X.509 to access GGUS services are under examination (e.g. Identity Federation)
Regionalisation: xGUS helpdesk (7 instances)
52
Software support
53
New ticket workflows
Automated handling of tickets in case of unresponsive submitters and supporters
–Tickets closed as unsolved (after human supervision)
  1. In case of no reply from submitter (after 25 working days)
  2. In case of no reply from supporter (after 30 working days)
–(2) Necessary for managing 3rd level software support in a scenario with no central support coordination provided by EMI/IGE
–Periodic notification to submitters/supporters when answer is awaited
Revision of all software support units
Evolution of technology support helpdesk
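The two closure rules can be sketched as a working-day counter plus a threshold check. This is an illustration only, not the GGUS implementation: public holidays are ignored, "after N working days" is read as "N or more working days elapsed", and the function names are assumptions. The function only flags candidates, since the slides state that closure happens after human supervision.

```python
from datetime import date, timedelta

SUBMITTER_LIMIT = 25  # working days without a reply from the submitter
SUPPORTER_LIMIT = 30  # working days without a reply from the supporter


def working_days_between(start, end):
    """Count Monday-Friday days after `start`, up to and including `end`."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            days += 1
    return days


def closure_candidate(last_reply, today, waiting_for):
    """Return True if the ticket should be proposed for closure as unsolved.

    `waiting_for` is "submitter" or "supporter" (assumed labels).
    """
    limit = SUBMITTER_LIMIT if waiting_for == "submitter" else SUPPORTER_LIMIT
    return working_days_between(last_reply, today) >= limit


# 3 June 2013 was a Monday; 8 July 2013 is exactly 25 working days later.
elapsed = working_days_between(date(2013, 6, 3), date(2013, 7, 8))
submitter_case = closure_candidate(date(2013, 6, 3), date(2013, 7, 8), "submitter")
supporter_case = closure_candidate(date(2013, 6, 3), date(2013, 7, 8), "supporter")
```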
54
Base/Medium/Advanced software support
(A) Base
–Severity level: 1-4 - Maximum response time: 5 working days
(B) Medium
–Severity level: top priority (1) and very urgent (2) - Maximum response time: 1 working day
–Severity level: urgent (3) and less urgent (4) - Maximum response time: 5 working days
(C) Differentiated
–Severity level: top priority (1) - Maximum response time: 4 support hours
–Severity level: very urgent (2) and urgent (3) - Maximum response time: 1 working day
–Severity level: less urgent (4) - Maximum response time: 5 working days
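The three support levels amount to a lookup from (level, severity) to a maximum response time. The table below restates the values from the slide; the dictionary layout, the lower-case level keys and the `max_response` helper are assumptions made for illustration.

```python
# Severity codes as on the slide:
# 1 = top priority, 2 = very urgent, 3 = urgent, 4 = less urgent
MAX_RESPONSE = {
    "base": {
        1: (5, "working days"), 2: (5, "working days"),
        3: (5, "working days"), 4: (5, "working days"),
    },
    "medium": {
        1: (1, "working days"), 2: (1, "working days"),
        3: (5, "working days"), 4: (5, "working days"),
    },
    "differentiated": {
        1: (4, "support hours"),
        2: (1, "working days"), 3: (1, "working days"),
        4: (5, "working days"),
    },
}


def max_response(level, severity):
    """Return (amount, unit) for a support level and a severity from 1 to 4."""
    try:
        return MAX_RESPONSE[level][severity]
    except KeyError:
        raise ValueError(f"unknown level/severity: {level!r}, {severity!r}") from None


top_priority_differentiated = max_response("differentiated", 1)
urgent_medium = max_response("medium", 3)
```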
55
2nd level support statistics
56
Enhancement of service level management and reporting
PO1 The continued operation and expansion of today’s production infrastructure
57
Operational Level Agreements (OLAs)
Resource Centre (RC) OLA
–Local resource access services
Resource infrastructure Provider (RP) OLA
–NGI/EIRO technical, support and coordination services
(NEW) EGI.eu OLA
–Centrally provisioned technical, support and coordination services
58
Service level management
Process assessment and improvement plan for the EGI.eu core services
Objectives
–Services to be delivered shall be agreed with customers. SLAs shall include agreed service targets.
–A service catalogue shall be maintained.
–Services and SLAs shall be reviewed at planned intervals.
–Service performance shall be monitored against service targets.
–For supporting services or service components provided by Federation members, OLAs shall be agreed
PY4 improvement plan with the support of FedSM
59
Service Level computation and reporting
Service level computation
–RC Availability/Reliability: SAM/ACE
–(new) VO Availability/Reliability: Operations Portal VO-oriented views
–(new) RP Availability/Reliability: Operations Portal with topology information from GOCDB (groupings)
Visualization and reporting
–(new) Resource Centre monthly reports: MyEGI (prototype)
–RP, EGI.eu and VO monthly reports: Operations Portal
PY4: Mini Project for new modular and integrated approach to service level computations
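The slides do not spell out how availability and reliability are computed. A minimal sketch, assuming the commonly used definitions (availability = time up divided by the time with known status; reliability = time up divided by the time with known status outside scheduled downtime), could look as follows. The function names and the sample month are illustrative.

```python
def availability(up_hours, total_hours, unknown_hours=0.0):
    """Fraction of time up, excluding periods whose status is unknown."""
    known = total_hours - unknown_hours
    return up_hours / known if known > 0 else None


def reliability(up_hours, total_hours, scheduled_down_hours=0.0, unknown_hours=0.0):
    """Fraction of time up, also excluding scheduled downtime."""
    expected = total_hours - scheduled_down_hours - unknown_hours
    return up_hours / expected if expected > 0 else None


# Illustrative 30-day month: 720 h in total, 684 h up, 24 h scheduled downtime
month_availability = availability(684, 720)
month_reliability = reliability(684, 720, scheduled_down_hours=24)
```

Under these assumed definitions reliability is never lower than availability, because scheduled downtime is removed from the denominator only.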
60
VO Availability
Median of VO Monthly Availability (top active VOs): 99.48%
61
Service Level computation and reporting
62
Underperformance
(new) RC service level monitoring completely automated through Nagios probes
–Alarms in case performance does not comply with the service level targets of the RC OLA
–Follow-up through GGUS tickets according to general operations procedures
PY4: automation of RP and EGI.eu
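The alarm logic described above can be sketched as a comparison of monthly figures against OLA targets. This is not the Nagios probe itself: the `underperforming` helper, the site names and especially the target values are placeholders chosen for the example, since the slides do not state the RC OLA targets.

```python
def underperforming(monthly, availability_target, reliability_target):
    """Return the sorted names of Resource Centres below either target.

    `monthly` maps an RC name to an (availability, reliability) pair,
    both expressed as fractions between 0 and 1.
    """
    flagged = [
        name
        for name, (avail, rel) in monthly.items()
        if avail < availability_target or rel < reliability_target
    ]
    return sorted(flagged)


# Hypothetical monthly figures and placeholder targets (not from the slides)
monthly_figures = {
    "SITE-A": (0.99, 0.99),
    "SITE-B": (0.65, 0.80),
    "SITE-C": (0.72, 0.74),
}
to_follow_up = underperforming(
    monthly_figures, availability_target=0.70, reliability_target=0.75
)
```

Each flagged name would then be followed up through a GGUS ticket, as the slide describes.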
63
Operations portal
–v. 2.9.4: availability/reliability reporting
–v. 2.9.6: underperforming site probe
65
PART IV
I. Introduction to SA1 and JRA1
II. Resource infrastructure
III. Results
IV. Analysis
–Issues, use of resources, impact and plans
66
Issues/SA1
Third-party software repositories, software maintenance and specialized support challenged by the end of EMI and IGE
–EGI software provisioning processes, service level targets, responsiveness to reported incidents
–PY3 mitigation: revision of procedures, strengthening of EGI specialized support, sustainability
Expanding set of products and platforms to go through staged rollout
–PY3 mitigation: revision of SA1.3 effort, reallocation of resources, policies and priorities
Several infrastructures in the Eastern Europe region underperforming
–PY3 mitigation: support action in collaboration with GRNET, training, better support of testing in the operational tools (PY3)
67
Issues/JRA1
2nd level support of regionalized operational tools
–Accounting, monitoring, operations portal, currently relying on voluntary contributions
–PY3 mitigation: proposed revised structure of EGI support tasks and effort allocation
Insufficient effort for innovation to address new “high impact requirements”
–JRA1.3 regionalization, JRA1.2 SAM, JRA1.5 operations portal (assessment in D7.2)
–PY3 mitigation: re-scoping of development activities
68
Use of Resources/SA1
104% PMs achieved (aggregated)
EGI.eu Global Services
–96% PMs achieved (aggregated), compensating PY1 over reporting due to transition from EGEE
–Some tasks affected by personnel turnover (coordination of integration TSA1.3, documentation TSA1.8); handover of coordination to EGI.eu (PY3-PY4)
–Catch all services/availability (TSA1.8): partner affected by hiring freeze in the public sector, but services successfully delivered
NGI Local Services
–Few cases of under/over reporting that will be compensated over the duration of the project
69
Use of Resources/JRA1
87% PMs achieved (aggregated)
–112% WP7-E tasks (TJRA1.1, TJRA1.2)
–69% general tasks (TJRA1.2, TJRA1.4, TJRA1.5)
PY2 compensating PY1 deviations
–95% for TJRA1.3 (PY1+PY2)
–100% for TJRA1.2 (PY1+PY2)
Over reporting
–TJRA1.2 146% (CSIC): restructuring of both accounting portal and metrics portal
Under reporting
–TJRA1.4: 52% achieved, requirements gathering phase, compensation in PY3
70
JRA1 Plans for PY2
Operations Portal
–Mobile devices support
–Service level reporting module
–Monitoring of Virtual Sites
GOCDB
–GLUE2.0 compatibility and rendering
Accounting
–Add new resource types in production (storage, clouds, parallel jobs)
–Regional repository
Messaging
–Deployment of the supported authorization and authentication framework
GGUS
–Production version of the Report Generator
–Improvement of high availability configuration (including DBMS)
SAM
–Integration of middleware probes from EMI
–Production version of profile management service (POEM)
71
JRA1 Plans for PY3
Operations Portal
–..
GOCDB
–..
Accounting
–..
Messaging
–..
GGUS
–..
SAM
–..
72
SA1 Plans for PY2
Security
–Complete security threat risk assessment, consolidation of security tools, NGI SSC5, revise policies and new one on data protection
Middleware upgrade campaign
–Extended staged rollout
–Phasing out of gLite 3.1/3.2
–GLUE 2.0: upgrade plan, EGI profiling and information validation
Service level management
–EGI.eu OLA
–Extended monitoring and reporting of EGI.eu and NGI services
–Consolidation of NGI services (including NGI SAM)
DCI integration
–EUDAT and PRACE roadmap
–Accounting of Globus, Unicore, Desktop Grids, QosCosGrid
Migration to SSM of infrastructures publishing summary records
IPv6 compliance testing
73
SA1 Plans for PY3
…
74
Impact and value
O1 The continued operation and expansion of today’s production infrastructure
−352 production RCs (+30.7% compute capacity, +50% storage capacity)
−+1.9% yearly increase of availability
O2 Continued support of researchers
−+3.20% new registered VOs
−+46.42% yearly increase of resource usage
−Astronomy Astrophysics and Astro-particle Physics ramping up
O4 Interfaces that expand access to new user communities
−New GOCDB service types and SAM probes
−34 operational tool releases
−Integration of accounting in progress
−55 grid middleware requirements
O5 Mechanisms to integrate existing infrastructure providers in Europe and around the world
−RP Operational Level Agreement
−2 new RP MoUs
−Moldova, South Africa, Ukraine being integrated
−Collaboration with PRACE
O6 Establish processes and procedures to allow the integration of new DCI technologies
−Collaboration with EUDAT
−ARC, gLite, GLOBUS, UNICORE, Desktop Grid, QosCosGrid
75
Summary
SA1 and JRA1 contribute to meeting the project objectives and support the EGI Strategy 2020
–Leadership with the expansion of the resource infrastructure and increasing usage
–Openness with a growing level of integration
–Reliability with continued operation and increasing performance
–Innovation with evolving tools, procedures, policies and requirements
76
References (review)
Security Risk Assessment of the EGI Infrastructure, deliverable D4.4, http://go.egi.eu/863
Security procedures and policies:
–https://wiki.egi.eu/wiki/EGI_CSIRT:Policies
–https://wiki.egi.eu/wiki/SPG:Documents
Operations procedures: https://wiki.egi.eu/wiki/Operations_Procedures
Operations documentation: https://wiki.egi.eu/wiki/Documentation
Task forces and working groups: https://wiki.egi.eu/wiki/Task_forces
https://wiki.egi.eu/wiki/Software_Retirement_Calendar
https://wiki.egi.eu/wiki/Documentation#OLA
77
Backup