1
SA1 Operation of EGI technical platforms
Peter Solagna
Senior Operations Manager, EGI.eu
WP5 Activity Leader, EGI-Engage
2
Outline
- Overview
  - Objectives, tasks, partners and effort
- Status of the EGI federation production services
- Achievements
  - Coordination
  - Security
  - Integration
- Use of resources, issues
- Plan for PY2
- Summary
SA1 Operations
3
Overview
4
Work Package objectives
Tasks and objectives:
- TSA1.1 Operations coordination: Coordinate the operational activities of the EGI production infrastructure, ensuring a secure and reliable provisioning of HTC, cloud and storage resources.
- TSA1.2 Development of security operations: Evolve the security activities in EGI to support the new technologies and resource provisioning paradigms, maintaining a secure and trustworthy infrastructure.
- TSA1.3 Integration, deployment of grid and cloud platforms: Integrate and deploy platforms on cloud and grid resources to support new use cases for the existing and new EGI users.
TSA1.1: clarify that SA1.1 is coordinating activities outside of the project.
5
SA1 Partners and effort
Task: leader / partner
- SA1.1 Operations Coordination: Peter Solagna / EGI.eu
- SA1.2 Development of Security Operations: David Kelsey / STFC
- SA1.3 Integration, Deployment of Grid and Cloud Platforms
Participants: 9
PY1 effort: 42 PMs
Project total effort: 106 PMs
3.5 FTEs
Provided by PO
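The effort figures above can be related through a simple conversion between person-months (PMs) and full-time equivalents (FTEs). This is a minimal sketch, assuming that PY1 spans 12 months and that one FTE delivers one PM per month. Neither assumption is stated on the slide, and the function name is illustrative.

```python
def person_months_to_fte(person_months: float, period_months: float) -> float:
    """Average number of full-time staff needed to deliver the effort in the period."""
    if period_months <= 0:
        raise ValueError("period_months must be positive")
    return person_months / period_months


# PY1 effort quoted on the slide; the 12-month year is an assumption.
py1_fte = person_months_to_fte(42, 12)
print(f"PY1: {py1_fte:.1f} FTEs")
```

Under these assumptions the 42 PMs of PY1 correspond to 3.5 FTEs, which matches the figure on the slide. The slide does not say which effort value the 3.5 FTEs refer to.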
6
Status of the EGI production infrastructure
Main achievements of the work package
7
The EGI distributed infrastructure
Total countries: 56
EGI members: 24 (22 NGIs, 2 EIROs)
Integrated resource infrastructures: 6
Total resource centres (HTC and cloud): 324
Peer infrastructure: 1
8
MoUs with infrastructure providers
MoUs with integrated infrastructures (non-council members): integrated in the EGI production infrastructure, subject to the same operational requirements, policies and procedures
- Asia Pacific (26 resource centres)
- ROC Latin America (9 resource centres)
- ROC Africa-Arabia (10 resource centres)
- China (1 resource centre)
- Ukraine (19 resource centres)
- Canada (9 resource centres)
MoUs with peer infrastructures: to enable interoperability in support of common communities
- Open Science Grid (US)
- Compute Canada
9
Platform architecture
[Figure: EGI platform architecture, with the following layers]
- Community Platforms: brokering, community-specific data, tools and applications
- Collaboration Platform: EGI endorsed VM images, Helpdesk, VM Image Catalogue
- Data-intensive computing: HTC Platform, Cloud compute and storage, GPGPU Platform, Open Data Platform
- EGI Core Infrastructure Platform: AAI, Service Registry, Accounting, Monitoring
- Federated Service Management, LTOS
- Physical Infrastructure
11
EGI Federated operations
[Figure: architecture of EGI Operations: EGI.eu, NGIs/EIROs, Resource Centres, Core activities]
- Operations coordination: EGI.eu, Core infrastructure platform, Operations Management Board
- Ops Centres for the NGI/EIRO Resource Infrastructures and the Integrated Resource Infrastructure
- Resource Centres under each Resource Infrastructure
WP5 Operations
12
High Throughput data analysis platform (HTC)
Capacity in February 2016 (increase from January 2015):
- Logical cores: 652,000 (+23%)
- Computing power (HEP_SPEC06): 5,842,000 (+38%)
- Online storage: 264 PB (+11.8%)
- Nearline storage: 239 PB (+42%)
Explain what HEP_SPEC06 is
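The January 2015 baselines are not shown on the slide, but they can be estimated from the February 2016 values and the quoted increases. This is a minimal sketch, assuming each percentage is measured relative to the January 2015 value. The dictionary keys and function name are illustrative.

```python
# February 2016 capacity and growth since January 2015, as quoted on the slide.
capacity = {
    "logical_cores": (652_000, 23.0),
    "hep_spec06": (5_842_000, 38.0),
    "online_storage_pb": (264, 11.8),
    "nearline_storage_pb": (239, 42.0),
}


def implied_baseline(current: float, increase_pct: float) -> float:
    """Value before the increase, assuming current = baseline * (1 + pct / 100)."""
    return current / (1 + increase_pct / 100)


for name, (current, pct) in capacity.items():
    print(f"{name}: about {implied_baseline(current, pct):,.0f} in January 2015")
```

For example, 652,000 logical cores after a 23% increase imply roughly 530,000 cores in January 2015.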
13
HTC capacity consumption
Consumption in the last year (increase from 2014):
- Normalized CPU time: 20,560 million hours * HEP_SPEC06 (+26.4%)
- Number of jobs: 585 million (+9.3%)
- Online storage: 160 PB (increase N.A.)
14
The EGI Federated Cloud
[Figure: EGI Federated Cloud architecture]
- Uniform interfaces and behaviour: OpenStack Nova (manage instances)
- Cloud providers: cloud sites running OpenStack, OpenNebula or Synnefo, with image replication (VMCatcher)
- Share and endorse VM images (OVF): endorsed VM images
- EGI e-infrastructure operation tools and operation services: AAI (VO management), Service registry, Information service, Accounting, Monitoring
15
EGI Federated Cloud participants
Number of resource centres as of 2016: 22 (+3 integrations in progress)
Resource centres by cloud management system:
- OpenStack: 15
- OpenNebula: 6
- Synnefo: 1
Total of 6k cores
16
Federated cloud usage
- Last year: 2.31 million CPU hours, 343,000 VMs instantiated, 62 virtual appliances registered
- Daily average: 6,320 CPU hours, 920 VMs instantiated
- Daily increase: 9% (CPU hours), 86.3% (VMs instantiated)
VMs, CPU hours; VAs registered in AppDB; VOs using AppDB
18
EGI AAI and trust model
[Figure: trust relationships between User A, the Virtual Organization and EGI Services]
Information sent to service providers: "User A", community attributes
19
Changes in EGI AAI and trust model
During the last year users used robot certificates to:
- Submit 40% of the jobs
- Consume 31% of the CPU time
[Figure: trust model with User A, a science gateway using a robot certificate, the Virtual Organization and EGI Services]
Information sent to service providers: community attributes
20
Achievements Main achievements of the work package
21
Core activities
The EGI Core activities are centrally provided services (both technical and human) that enable the daily activities of the EGI federation. Service providers are chosen through a bid process open to all EGI members.
- The monitoring activities have been merged into one bid.
  - Monitoring infrastructures have been centralised; all distributed components, operated by NGIs, will be decommissioned after June 1st.
  - The new monitoring infrastructure is more flexible and efficient.
- AppDB and GOCDB have been integrated with the new AAI platform developed in JRA1.1.
- AppDB has been added to the list of funded core activities starting in May.
- Operations support has been renamed 'Services for the long tail' in the new bid, and will support resource allocation and long tail of science user authorization.
Core services:
- Operations Portal
- Security coordination
- Accounting Repository
- Acceptance criteria
- Accounting Portal
- Collaboration tools/IT support
- SAM central services
- Staged rollout
- Monitoring central services
- Software provisioning infrastructure
- Security monitoring and related support tools
- Incident management helpdesk
- Service registry (GOCDB)
- 1st and 2nd level support (core platform, community platform)
- Catchall services
- AppDB
- Operations support
- e-Grant
Explain the funding scheme
22
UMD
EGI software provisioning provides the technical tools and the processes to support the UMD. The main goals are:
- Verify that the software fulfils a given set of Quality Criteria
- Distribute the software provided by the Technology Providers (i.e. development teams) through a central repository
- Deploy the software into the infrastructure in a controlled way
- Remove unsupported software from production
Decommissioning campaigns:
- dCache storage element v2.6: completed
- SL5 and Debian6 based software: ongoing, target end of April
UMD major releases:
- UMD-3: OS supported SL5, SL6, Deb6; 7 updates; 42 components released
- UMD-4: OS supported SL6, Ubuntu 14, CentOS7; 1 update; 5 components released
Explain what's funded.
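The release table and the decommissioning campaign above can be combined into a simple lookup, for instance to check which UMD releases cover a given operating system and whether that system is being phased out. This is a minimal sketch built only from the values on the slide. The names are illustrative and it is not an EGI tool.

```python
# OS support per UMD major release, copied from the table on the slide.
UMD_SUPPORTED_OS = {
    "UMD-3": {"SL5", "SL6", "Deb6"},
    "UMD-4": {"SL6", "Ubuntu 14", "CentOS7"},
}

# Platforms targeted by the ongoing decommissioning campaign
# ("Debian6" on the slide is written "Deb6" in the release table).
BEING_DECOMMISSIONED = {"SL5", "Deb6"}


def releases_supporting(os_name: str) -> list:
    """UMD releases that list the given OS as supported, in sorted order."""
    return sorted(r for r, oses in UMD_SUPPORTED_OS.items() if os_name in oses)


def needs_migration(os_name: str) -> bool:
    """True if software based on this OS is part of the decommissioning campaign."""
    return os_name in BEING_DECOMMISSIONED


for os_name in ("SL5", "SL6", "CentOS7"):
    print(os_name, releases_supporting(os_name), "migrate" if needs_migration(os_name) else "ok")
```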
23
Federated cloud integration
- Full integration of cloud-based tests in the availability and reliability calculation (previously calculated separately)
- Improvement of the monitoring probes for the standard OCCI interface
- Integration of the OpenStack native interface for IaaS in the EGI production infrastructure:
  - Service Registry
  - Accounting
  - Monitoring
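The slide does not spell out how availability and reliability are computed. This is a minimal sketch, assuming the definitions commonly used in grid operations: availability is uptime over the time with known status, and reliability additionally excludes scheduled downtime. The class, field names and sample hours are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatusHours:
    """Hours a service spent in each monitoring status during a reporting period."""
    up: float
    unscheduled_down: float
    scheduled_down: float
    unknown: float = 0.0  # periods without monitoring results are excluded


def availability(h: StatusHours) -> Optional[float]:
    """Uptime divided by the time with known status."""
    known = h.up + h.unscheduled_down + h.scheduled_down
    return h.up / known if known > 0 else None


def reliability(h: StatusHours) -> Optional[float]:
    """Uptime divided by known time, not counting scheduled downtime."""
    accountable = h.up + h.unscheduled_down
    return h.up / accountable if accountable > 0 else None


# Hypothetical 30-day month: 684 h up, 12 h unscheduled and 24 h scheduled downtime.
month = StatusHours(up=684, unscheduled_down=12, scheduled_down=24)
print(f"availability = {availability(month):.3f}")
print(f"reliability  = {reliability(month):.3f}")
```

With these definitions, scheduled maintenance lowers availability but not reliability, which is why the two figures are usually reported together.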
24
VMI Endorsement & Operation policy
Cloud users deploying a virtual machine become the administrators of a service running on EGI resources.
The goal is to define the roles and responsibilities in the registration of virtual machine images (VMIs) in the catalogue and in the operation of the VMs.
New roles:
- Virtual Machine Image endorser: responsible for the content of the VMI registered in AppDB
- Virtual Machine Operator: responsible for all security aspects of a running VM
- Virtual Machine Consumer: end user with no privileges
The policy:
- Was defined in collaboration with EGI FedCloud stakeholders
- Aims to strike a balance between security requirements and technical feasibility
- Is supported by a list of security requirements related to virtual machines
- Was approved in March by the Operations Management Board
[Figure: the Endorser endorses VMIs in the AppDB EGI VMI Catalogue; the Operator instantiates them]
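The three roles above can be pictured as a small mapping from role to responsibility. This is a minimal sketch: the role names and responsibilities come from the slide, while the concern labels and the lookup function are hypothetical and not part of the EGI policy text.

```python
from enum import Enum
from typing import Optional


class VMRole(Enum):
    ENDORSER = "Virtual Machine Image endorser"
    OPERATOR = "Virtual Machine Operator"
    CONSUMER = "Virtual Machine Consumer"


# Responsibilities as worded on the slide; the consumer holds none.
RESPONSIBILITIES = {
    VMRole.ENDORSER: "content of the VMI registered in AppDB",
    VMRole.OPERATOR: "all security aspects of a running VM",
    VMRole.CONSUMER: None,  # end user with no privileges
}

# Hypothetical concern labels, used only for this illustration.
CONCERN_OWNER = {
    "image_content": VMRole.ENDORSER,
    "running_vm_security": VMRole.OPERATOR,
}


def responsible_role(concern: str) -> Optional[VMRole]:
    """Return the role that answers for a concern, or None if it is not covered."""
    return CONCERN_OWNER.get(concern)


print(responsible_role("running_vm_security").value)
```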
25
Security activities focus for EGI-Engage
The challenges that drive the evolution of EGI security:
- Changes in the technology: the federated cloud platform is introducing new software and services to the EGI portfolio
- Changes in the usage patterns: custom configuration of resources, PaaS and SaaS, fewer limitations on user activities
The following actions have been taken to keep a distributed infrastructure secure when every virtual machine is a different running service:
- Performed a new security threat risk assessment
- Updated security procedures
- Updated existing security policies and developed new ones
26
EGI Security Threat Risk Assessment
A complete revision and update of the previous security threat risk assessment, with a focus on cloud and changing technology.
Documents produced:
- Security threat risk assessment (not public)
- High-level summary with the description of the main changes and the methodology
The main areas with high-risk threats are:
- Security incidents in the EGI Federated Cloud, including detection and effective handling
- Software and technology choice: rapid change and proliferation of software make it difficult to ensure that secure technology is deployed
- Staffing levels and training: ensuring sufficient staff to carry out security activities
27
Software vulnerability handling and security incident handling
Software vulnerability handling procedure:
- Previously geared to grid middleware provided by EGI's collaborators
- Now covers all types of software on the EGI infrastructure: cloud enabling, virtual appliances, OS, VO, as well as grid middleware
- The advisory template was updated
Security incident handling procedure:
- Cloud-specific requirements, e.g.:
  - RCs should not delete compromised VMs
  - Report the VMI used by compromised VMs
- EGI-CSIRT is now clearly responsible for:
  - Reporting compromised users to VOs and CAs
  - Coordinating the response for vulnerable VMIs
  - Sending the closure report
28
On-going security assessment on new software
Software checklist:
- Aimed at users developing or integrating software to be used on EGI
- Helps to avoid common problems, in terms of both support and implementation
- Aimed at partially mitigating the problem of increasing software diversity
Cloud technology questionnaire:
- Aimed at assessing whether an enabling technology is sufficiently secure to fulfil the EGI security policies, so that the EGI security team can recommend its usage
- Revised and completed, to examine the (cloud) technologies on which EGI has widespread dependence
29
Data Protection Policy
- Replaces the "Policy on the Handling of User-Level Job Accounting Data"
- Generalises to all forms of accounting data and all cases of handling of personal data: VO portals, pilot job factories, accounting, logging, VO user registration databases (e.g. VOMS)
- Next step: create specific policies (using the provided template) for Accounting and GOCDB
- The current draft has been presented to the OMB
30
Acceptable Use Policy and Conditions of Use
General EGI Acceptable Use Policy (AUP):
- New version released; the previous one was from 2013
- Approved by the EGI OMB in October 2015
The revision:
- Generalises the policy to include all EGI services, not only HTC: HTC, clouds, long tail of science, etc.
- Requires users to acknowledge support in publications
- Rewords the sections on liability issues and data protection
31
Services for the long tail of science (LTOS)
An access mode for individual researchers and small research groups who need computational resources and online services to manage and analyse large amounts of data.
Goal: reduce the barriers for new EGI users
Technical implementations:
- Development of a user registration portal (access.egi.eu), integrated with the EGI SSO IdP and social credentials
- Integration with the Catania science gateway framework
- Creation of a dedicated VO and a PUSP mechanism to support authorization on resources
- Collection of an initial pool of resources to support the VO
Policies and procedures:
- LTOS AUP
- LTOS Security policy
- Procedures for user verification and authorization
[Figure: the user registers on access.egi.eu (user registration portal), which holds the user information; through the science gateway the user accesses EGI HTC and cloud resources]
32
SLA Framework
SLAs define the resources allocated, the quality of service, the support level, and the technical requirements.
An SLA is an agreement on intentions to collaborate and support research.
Workflow:
1. Request submission
2. Matchmaking
3. Propose & Sign OLAs
4. Creating SLA
5. Propose & Sign SLA
6. Post agreement technical support
[Figure: the Negotiator agrees the SLA with the Customer and the supporting OLAs with Provider A and Provider B]
Status:
- Number of active SLAs: 3
- Number of supporting OLAs: 13
- Number of SLAs under negotiation: 6 (involving 21 RCs)
- HTC allocated resources: 55M HEP_SPEC06*hours, 60 TB storage
- Cloud allocated resources: 232 VMs, 11 TB storage
WP5 Operations
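The SLA workflow steps named above can be modelled as an ordered sequence of stages. This is a minimal sketch, assuming the steps run strictly in the order they appear on the slide. The enum and function names are illustrative and do not describe an EGI system.

```python
from enum import IntEnum
from typing import Optional


class SlaStage(IntEnum):
    REQUEST_SUBMISSION = 1
    MATCHMAKING = 2
    PROPOSE_AND_SIGN_OLAS = 3
    CREATING_SLA = 4
    PROPOSE_AND_SIGN_SLA = 5
    POST_AGREEMENT_SUPPORT = 6


def next_stage(stage: SlaStage) -> Optional[SlaStage]:
    """Return the stage that follows, or None once the SLA is in post-agreement support."""
    if stage is SlaStage.POST_AGREEMENT_SUPPORT:
        return None
    return SlaStage(stage + 1)


# Walk a request through the whole workflow.
stage: Optional[SlaStage] = SlaStage.REQUEST_SUBMISSION
while stage is not None:
    print(stage.name)
    stage = next_stage(stage)
```

In this ordering the OLAs with the providers are signed before the SLA with the customer, so the negotiator only commits to what providers have already agreed to deliver.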
33
ESA exploitation platform integration
EGI is collaborating with ESA through Terradue:
- Terradue is operating the Exploitation platform for the hydrology and geohazard use cases
- Federated cloud sites are providing computing capacity to ESA's users
Activities performed:
- Integration of the Terradue platform with the OCCI interfaces
- Integration with the EGI X.509 based authorization
- Registration of dedicated VOs and VMIs
- Successful performance and scalability tests
- Implementation of a distributed computing model to distribute load across multiple sites
[Figure: the ESA Exploitation platform sends computational tasks and input data to the EGI Federated cloud through OCCI]
Number of resource centres supporting the use case: 5
Allocated cloud resources: 190 cores, 416 GB RAM
34
Use of Resources and Issues
Numbers will be provided by PO, work package leaders must provide explanation
35
SA1 – Effort consumed in PY1
- Under-reporting for EGI.eu in SA1.1: new Operations team members were hired towards the end of PY1. The team is fully staffed now.
- Small deviations in partners with relatively small effort allocated. These are easy to compensate in PY2.
36
Plan for PY2 Freedom on way to present
37
Plans PY2
TSA1.1:
- Fully integrate cloud middleware in the software provisioning process
- Deploy storage accounting in production
- Extend the SLA coverage to more VOs
- Produce policies to support the data service under definition
- Improve monitoring of cloud services
- Include INDIGO DataCloud outputs in the software provisioning
TSA1.2:
- Further improve software vulnerability issue handling
  - Revise risk levels, including vulnerabilities associated with VMs and VAs
  - Check and, if necessary, revise the vulnerability handling process after one year's use
- Threats/risk:
  - Work with various groups on how to mitigate some of the highest-risk threats
  - Work with the GEANT SIG Risk Working Group to improve security risk handling in the EGI project
- Improve the EGI cloud security requirements and complete the definition of the cloud security model
- Work with the GEANT SIG Risk Working Group to evolve best practices in security risk handling
TSA1.3:
- LTOS integration with production-level services
- Support the ESA exploitation platform in production
- Deploy interoperability capabilities with Compute Canada
38
Summary
Objective 1 (O1): The continued coordination of the EGI Community.
- Continued coordination of the EGI federation operations, of the core services and of the software provisioning
- Continued improvement of the integration of new services in the EGI federation
- Completed a new security threat risk assessment
- Improved security processes and policies, with particular focus on cloud services
Objective 2 (O2): EGI Solutions, related business models and access policies.
- ESA Thematic Exploitation platform successfully tested on EGI resources, ready for production
- Agreed and activated 3 VO SLAs, with 6 more under preparation
Objective 5 (O5): Promote the adoption of the current EGI services and extend them with new capabilities through user co-development.
- Developed new tools for the long tail of science
Achievements grouped by objectives