1
User-guided provisioning in federated clouds for distributed calculations
A. J. Rubio-Montero (1), E. Huedo (2), R. Mayo-García (1)
(1) CIEMAT – (2) UCM
ARMS-CC Workshop (as part of PODC 2015) – San Sebastián – July 20th, 2015
2
Index
– The problem
– The followed approach
– Results obtained
– Conclusions
3
Typical approach with commercial providers (AWS)
Focused on managing customised environments equivalent to the existing ones within the institution
Elasticity for HTC
– Growing data-centre capacity on demand: clusters
– Consolidating services: scientific portals
But, how many providers will we count on?
– Current EGI FedCloud: 40 sites
– EGI grid: 573 CEs (126 CEs allowed for the biomed VO)
A single cloud provider is not enough
4
The Provisioning Problem: the inter-cloud approach
Interoperation issues
– Multiple interfaces / protocols
– Different authentication, authorisation and payment mechanisms and accounts
– Monitoring and control
Limited characterisation
Limited scheduling
Not scalable
5
The Provisioning Problem: Federations
Interoperability requirement | Purpose | Enabled in EGI FedCloud by
Standardised interfaces | Interoperation | OCCI v1.1, CDMI
Information Systems (IS) | Discovering and scheduling providers | LDAP GLUE schema, site and top-BDIIs
Authentication and Authorisation Infrastructure (AAI) | Global security and Virtual Organisations | X.509, CAs, VOMS
Common monitoring, accounting and tracking systems | QoS control, SLA achievement | Nagios/SAM, APEL, GGUS, GOCDB…
Other tools designed for cloud
– AppDB Marketplace
– Science Gateway
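As an illustration of how a client can discover providers through these Information Systems, the sketch below queries a GLUE 2.0 top-BDII over LDAP for endpoints advertising the OCCI interface. The host name is a placeholder and the published object classes and attributes vary between deployments, so this is an assumption-laden example rather than the framework's actual discovery code.

```python
# Minimal sketch: discovering OCCI endpoints published by a top-BDII over
# LDAP with the GLUE 2.0 schema (anonymous bind, as in the EGI ISs).
from ldap3 import Server, Connection, ALL

server = Server("ldap://topbdii.example.org:2170", get_info=ALL)  # hypothetical host
conn = Connection(server, auto_bind=True)

# Endpoints that advertise the OCCI interface
conn.search(
    search_base="o=glue",
    search_filter="(&(objectClass=GLUE2Endpoint)(GLUE2EndpointInterfaceName=OCCI))",
    attributes=["GLUE2EndpointURL", "GLUE2EndpointImplementationName"],
)
for entry in conn.entries:
    print(entry.GLUE2EndpointURL, entry.GLUE2EndpointImplementationName)
```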
6
Scheduling highly distributed HTC calculations
Compatible HTC applications use standardised interfaces/APIs for distributed computing
– DRMAA, SAGA, OGSA-BES, OGSA-DAI
Other legacy applications use interfaces derived from Grid computing
– GRAM, GridFTP, SRM
Efficient scheduling should take into account the performance of the instantiated VMs and the location of the data when distributing jobs
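For reference, submitting a task through the standard DRMAA API looks roughly like the sketch below (Python drmaa bindings). It assumes a DRMAA implementation such as GridWay's libdrmaa is reachable through DRMAA_LIBRARY_PATH; the executable and arguments are placeholders, not part of the original material.

```python
# Minimal sketch: one task submitted through DRMAA and waited for.
import drmaa

with drmaa.Session() as session:
    jt = session.createJobTemplate()
    jt.remoteCommand = "./beamnrc_task.sh"      # hypothetical task wrapper
    jt.args = ["--seed", "42"]
    job_id = session.runJob(jt)
    info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
    print("job", job_id, "finished with exit status", info.exitStatus)
    session.deleteJobTemplate(jt)
```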
7
Scheduling highly distributed HTC calculations
Most of the few brokering frameworks working on IaaS cloud federations are oriented to consolidating complex services on demand
– SlipStream, QBROKAGE, CompatibleOne
Exception: the PMES broker offers the OGSA-BES interface to allow instantiating a single VM for each job
8
Characterisation, overheads and reliability
As in Grid:
– GLUE (and OCCI) are incomplete, e.g. they lack per-user quotas, QoS…
– ISs usually do not reflect reality
– Cloud providers can be overbooked
– Repeatedly instantiating VMs can be an unacceptable overhead
– Cloud providers unexpectedly fail
All of this hinders advanced scheduling
9
Use of pilot frameworks in clouds
– Concurrent use of clouds and grids (and local clusters or desktop computing, if supported by the framework)
– Compatibility with applications previously adapted
– Preserving the achieved performance or speedup and the robustness of the system
– Keeping the expertise acquired from Grid, reducing the user training gap and the operational costs
– Complementary tools such as web UIs, monitoring tools, portals, etc.
10
Design adaptation: comparative of pilot systems
Systems compared, grouped by design adaptation:
– LRMS based: Elastic Clusters (Sholuoder, PBS, SGE)
– glideinWMS
– Application oriented: Big-Job, Grid Scheduler, ServiceSs, GWpilot
– LHC oriented: DIRAC
– Volunteer Computing: 3G-Bridge
Features compared: communication (push / pull), easy deployment, customisable characterisation, legacy applications, automated scheduling, guided provisioning, Grid compatibility
11
GWcloud + GWpilot advantages
Compatibility and interoperability
– Friendly interface and compatibility with legacy applications (DRMAA and OGSA-BES)
– Grid security standards and protocols are preserved to enable compatibility with external Grid services
– Independent and easy configuration, with low overheads that allow decentralised and local installations, even on the user's PC
– Minimal contextualisation, fully customisable by the user
12
GWcloud + GWpilot advantages
Scheduling features
– Automatic discovery and monitoring of providers that belong to several cloud federations
– Scheduling capabilities based on constraints, ranking, fair-sharing, deadlines, etc., to instantiate VMs at providers with certain characteristics (see the sketch below):
  - a specific VM image (e.g. by its appdb.egi.eu identifier)
  - available hardware, with advance reservations
  - associated costs and budgets, QoS, etc.
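A toy sketch of the constraint-and-ranking idea, using a hypothetical provider record structure rather than the actual GridWay/GWpilot data model: providers are first filtered by hard requirements, and the survivors are ordered by a rank expression.

```python
# Illustrative only: constraint filtering plus ranking over provider records.
providers = [
    {"id": "A", "image": "de355bfb", "free_cores": 960, "cost_per_hour": 0.0, "qos": 0.9},
    {"id": "D", "image": "de355bfb", "free_cores": 16,  "cost_per_hour": 0.0, "qos": 0.7},
]

def suitable(p, image, min_cores):
    # constraints: required VM image and enough free cores
    return p["image"] == image and p["free_cores"] >= min_cores

def rank(p):
    # ranking: prefer reliable providers, then capacity, then lower cost
    return (p["qos"], p["free_cores"], -p["cost_per_hour"])

candidates = sorted((p for p in providers if suitable(p, "de355bfb", 2)),
                    key=rank, reverse=True)
print([p["id"] for p in candidates])    # ['A', 'D']
```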
13
GWcloud + GWpilot advantages
Scheduling features
– Personalised user-level scheduling of tasks and VMs that allows:
  - post-configuration of VMs on demand
  - customised monitoring of new configurations
  - personalised provisioning
  - efficient execution of very short jobs
– Stackable with other scheduling tools
– Parallel accounting of associated costs
14
[Architecture figure: a legacy application submits tasks through DRMAA, BES or the CLI to the GridWay core (Scheduler); the GWpilot Factory and GWpilot Server manage pilots that pull tasks over HTTPS, while the GWcloud Execution Driver (OCCI) and the GWcloud Information Driver (LDAP, top-BDII) provide the IaaS federation abstraction towards providers A and B of the federated IaaS cloud]
Cloud-Init contextualisation:
a) creation of a new user with sudo privileges;
b) creation of a file with the temporal user proxy;
c) inclusion of the EUGridPMA repositories;
d) pre-installation of CA certificates and minimal grid tools (globus-url-copy);
e) execution of the pilot.
GWcloud Information Driver publishes:
– the URI contact endpoint, the protocol, hypervisor and VIM release, the maximum number of cores, etc.
– every os_tpl# and its appdb.egi.eu identifier, compiled in a list of pairs and included as new tags
– every resource_tpl# shown as a different queue, with its own characterisation: number of cores, memory, etc.
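An illustrative cloud-config document mirroring steps a)–e) could be assembled as in the sketch below; the proxy path, repository line, package names and pilot URL are assumptions made for the example, not the exact payload generated by GWpilot.

```python
# Illustrative sketch: building Cloud-Init user data for the pilot VM.
import base64

# b) temporal user proxy delegated by the user (hypothetical path)
user_proxy = open("/tmp/x509up_u1000").read()
indented_proxy = "\n".join("      " + line for line in user_proxy.splitlines())

user_data = f"""#cloud-config
users:
  - name: pilot                      # a) new user with sudo privileges
    sudo: ALL=(ALL) NOPASSWD:ALL
write_files:
  - path: /home/pilot/.proxy.pem     # b) file with the temporal user proxy
    permissions: '0600'
    content: |
{indented_proxy}
runcmd:
  - echo 'deb http://repository.egi.eu/sw/production/cas/1/current egi-igtf core' > /etc/apt/sources.list.d/egi-cas.list  # c) EUGridPMA repositories (assumed line)
  - apt-get update && apt-get install -y ca-policy-egi-core globus-gass-copy-progs   # d) CA certificates + globus-url-copy
  - su pilot -c 'curl -s https://example.org/pilot.sh | sh'                           # e) execute the pilot (hypothetical URL)
"""

# OCCI create operations can carry this document, base64-encoded, as VM user data.
print(base64.b64encode(user_data.encode()).decode())
```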
15
16
Experiments: suitable EGI FedCloud providers
A basic Ubuntu 14.04 LTS image was selected from the appdb.egi.eu repository
– https://appdb.egi.eu/store/vo/image/de355bfb-5781-5b0c-9ccd-9bd3d0d2be06
OCCI 1.1, encrypted endpoint
Misconfigured and unreliable sites were omitted (7)
Provider | OCCI endpoint | resource_tpl# | GB | Hyp. | Max. cores | VIM
A | https://carach5.ics.muni.cz:11443 | small | 2 | Xen | 960 | OpenNebula
B | https://controller.ceta-ciemat.es:8787 | m1-small | 2 | KVM | 224 | OpenStack
C | https://egi-cloud.pd.infn.it:8787 | m1-small | 2 | KVM | 96 | OpenStack
D | https://fc-one.i3m.upv.es:11443 | small | 1 | KVM | 16 | OpenNebula
E | https://nova2.ui.savba.sk:8787 | m1-small | 2 | KVM | 408 | OpenStack
F | https://prisma-cloud.ba.infn.it:8787 | small | 1 | KVM | 600 | OpenStack
G | https://stack-server-01.ct.infn.it:8787 | m1-small | 2 | KVM | 600 | OpenStack
17
Experiments: scheduling configuration
Factory:
– Maximum of 200 pilots (running on VMs)
– Waits a maximum of 600 s for a pilot to start (creation, booting, contextualisation…)
Scheduling:
– The number of tasks and VMs managed during a scheduling cycle of 10 s was set to 20
– One VM per suitable provider in every cycle
– Resource banning is enabled: the only way to discover the quotas established at providers (see the sketch below)
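The per-cycle provisioning policy can be pictured with the following sketch, which uses hypothetical data structures rather than GridWay's scheduler code: each 10 s cycle dispatches at most 20 pending VM requests, starts at most one VM per suitable provider, and skips providers currently banned after exceeding their quota.

```python
# Illustrative only: one scheduling cycle under the policy described above.
def schedule_cycle(pending_vm_requests, providers, banned, chunk=20):
    dispatched = []
    used_providers = set()
    for request in pending_vm_requests[:chunk]:     # at most `chunk` requests per cycle
        for p in providers:
            if p in banned or p in used_providers:  # banned or already used this cycle
                continue
            dispatched.append((request, p))         # one VM per provider per cycle
            used_providers.add(p)
            break
    return dispatched

banned = {"C"}                                      # e.g. quota exceeded at provider C
print(schedule_cycle(["vm-req-%d" % i for i in range(30)],
                     list("ABCDEFG"), banned))
```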
18
Experiments: VMs provisioned ProviderTest 1Test 2Test 3 set upfailedset upfailedset upfailed A186115104 B57154273 C3013245 4 D44--138 E303193 6 F23221008 G123116104 Unreliable (7) 10492109 18 BEAMnrc – – 4·10 8 particles on a rectangular geometry – – The workload was divided into 2,000 tasks (150-280 s)
19
Experiments: Test 1
20
Experiments: Test 2
21
Experiments: Test 3
22
Conclusions
A generic framework for performing massive distributed calculations in federated clouds has been presented
The system is able to perform dynamic provisioning based on the current status of the cloud federations while supporting legacy applications
The incorporation of diverse complex algorithms devoted to specific workloads will be evaluated in the future:
– stacking self-schedulers
– provisioning based on economic criteria
– QoS and budgets
– inclusion of deadlines or the management of checkpoints
23
Thank you for your attention
CIEMAT – Avda. Complutense, 40 – 28040 Madrid – 91 346 6000
antonio.rubio@ciemat.es – rafael.mayo@ciemat.es
http://rdgroups.ciemat.es/web/sci-track/
24
[Architecture figure repeated: legacy application (DRMAA, CLI) submits tasks to the GridWay core with Scheduler, GWpilot Factory, GWpilot Server and the GWcloud Execution/Information Drivers (IaaS federation abstraction); pilots pull tasks over HTTPS from VMs instantiated via OCCI at providers A and B of the federated IaaS cloud, discovered through LDAP / top-BDII; a BES interface is also offered]
25
[Architecture figure, direct-execution variant: the GWcloud Execution Driver runs each remote BoT directly on a VM, without pilots; direct execution is only suitable for long jobs or BoTs]
26
GWcloud Information Driver
It publishes, for every provider:
– the URI contact endpoint, the protocol, hypervisor and VIM release, the maximum number of available cores, etc.
– every OS template name (os_tpl) and its appdb.egi.eu image identifier, compiled in a list of pairs and included as new tags
– every resource template (resource_tpl), shown as a different queue with its own characterisation: number of cores, memory, etc.
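As an illustration of the resource_tpl-to-queue mapping, the sketch below parses a text/plain OCCI query-interface response and emits one queue record per resource template mixin. The layout of the sample line and the record fields are assumptions; this is not GWcloud's actual parser.

```python
# Illustrative only: turning OCCI resource_tpl mixins into queue-like records.
import re

def resource_tpl_queues(query_interface_text):
    """Return one queue record per resource template mixin found in a
    text/plain OCCI query-interface response."""
    queues = []
    for line in query_interface_text.splitlines():
        if "resource_tpl" not in line:
            continue
        term = re.search(r"Category:\s*([\w.\-]+)", line)
        title = re.search(r'title="([^"]*)"', line)
        if term:
            queues.append({"queue": term.group(1),
                           "description": title.group(1) if title else ""})
    return queues

sample = ('Category: m1-small; '
          'scheme="http://example.org/occi/infrastructure/resource_tpl#"; '
          'class="mixin"; title="1 core, 2 GB"')
print(resource_tpl_queues(sample))   # [{'queue': 'm1-small', 'description': '1 core, 2 GB'}]
```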
27
GWcloud Execution Driver
1. It gets and stores the match, i.e. the description of the job and the URI of the OCCI service.
2. It interprets the job description to obtain the input, output and executable URIs, the os_tpl and the resource_tpl.
3. It creates the contextualisation file.
4. It builds and performs an OCCI create operation that includes the contextualisation file, the resource_tpl, the os_tpl and the URI of the provider. Subsequently, the job is considered to be in a PENDING state.
5. It waits for the VM to start in order to change the job state to ACTIVE. To check this, it periodically performs OCCI describe operations. If the VM does not start within the timeout set in the job description, the job is considered FAILED.
6. When the VM is running, the driver waits for the VM to become inactive; the job is then considered DONE. However, if any other VM condition is reached, it returns FAILED.
7. Finally, it deletes the VM.
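The lifecycle in steps 4–7 can be summarised in a short sketch; the occi client object and its create/describe/delete methods are hypothetical stand-ins for real OCCI calls, and only the PENDING → ACTIVE → DONE/FAILED logic described above is shown.

```python
# Illustrative sketch of the execution-driver lifecycle (steps 4-7).
import time

def run_job(occi, endpoint, os_tpl, resource_tpl, context_file, timeout=600):
    vm_url = occi.create(endpoint, os_tpl, resource_tpl, context_file)  # step 4: job is PENDING
    deadline = time.time() + timeout
    while occi.describe(vm_url)["occi.compute.state"] != "active":      # step 5: wait for boot
        if time.time() > deadline:
            occi.delete(vm_url)
            return "FAILED"                                             # VM never started in time
        time.sleep(10)
    # step 6: the job is now ACTIVE; the VM becomes inactive when the work finishes
    while True:
        state = occi.describe(vm_url)["occi.compute.state"]
        if state == "inactive":
            result = "DONE"
            break
        if state != "active":                                           # suspended, error, ...
            result = "FAILED"
            break
        time.sleep(30)
    occi.delete(vm_url)                                                 # step 7: remove the VM
    return result
```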