Net-Centric Software and Systems I/UCRC
Copyright © 2011 NSF Net-Centric I/UCRC. All Rights Reserved.

High-Confidence SLA Assurance for Cloud Computing Systems and Services
Project Lead: Farokh B. Bastani, I-Ling Yen, Krishna Kavi, and Jeff Tian
Date: April 7, 2011

Problem Description

The emerging cloud computing paradigm enables
– On-demand access to storage, computing, software, and physical resources
– Integrated capabilities of a large spectrum of networked services and resources for realizing tasks that are far beyond current practices

SLAs are needed to enhance cloud system usability and dependability

Existing SLA (service level agreement) research: Siloed
– SLA model: Considers the agreement for each QoS aspect independently
– Client perspective
    - Clients need to establish SLAs one service at a time; there is no end-to-end approach for client tasks that require composing multiple services/resources
    - Individual QoS aspects are considered independently, ignoring potential tradeoffs
– Provider perspective
    - Each provider operates independently and lacks a collaborative concept to globally achieve high SLA assurance while maximizing resource utilization
– No satisfactory solutions to security issues across all layers

Challenge: Develop a comprehensive SLA model and supporting environment

Proposed Solution

[Figure: architecture diagram. Recoverable labels: client; Service Composer; "SLA for first service", "SLA for second service...", "Fail to get agreement"; Provider 1, Provider 2, Provider 3, ..., Provider N, each containing S and R nodes and the components Local QoS Monitoring, Resource Management, and Admission Control connected by a feedback loop.]

Integrated SLA Monitoring
- Agent-based distributed monitoring and behavior integration
- Rule-based approach: formalize SLAs as rules and events as facts, and use reasoning to derive violation situations
- Consider fuzzy violation decision models
- Across providers and resource types
- Proactive SLA assurance (recovery)

Improve SLA model to support holistic SLA
- Perform end-to-end QoS analysis before establishing SLAs
    - May need reservations to avoid new failures
- Consider QoS aspects holistically and directly determine the configuration parameters to fully control tradeoffs

Improved SLA protocol:
- First, determine which providers and which levels of QoS to use
- Then, preliminarily check the possibility of getting the SLAs
- Finally, establish the SLAs

At each provider:
- Consider strict & flexible SLAs
- Develop optimal resource management and admission control schemes
    - Formulation: optimization problem with the objective of maximizing the gain, given task completion rewards, violation penalties, and the available resources
    - Admit only if positive gain
- Local monitoring and online reconfiguration
    - Ensure SLAs are satisfied if resources are sufficient; if not, adjust resource decisions

Probabilistic SLAs to collaboratively get backup resources under failure or extreme load
Form cloud community
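
The provider-side rule "admit only if positive gain" can be illustrated with a small sketch. It is not the project's formulation: the slide calls for an optimization problem, while the code below swaps in a simple greedy heuristic to keep the example short. The names `Request`, `expected_gain`, and `admit`, and the idea that each request carries an estimated violation probability, are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical SLA request seen by one provider."""
    reward: float       # paid to the provider if the task completes within its SLA
    penalty: float      # owed by the provider if the SLA is violated
    demand: float       # resource units the task needs (must be > 0)
    p_violation: float  # assumed estimate of the probability of violating the SLA


def expected_gain(req: Request) -> float:
    """Expected reward minus expected violation penalty."""
    return (1.0 - req.p_violation) * req.reward - req.p_violation * req.penalty


def admit(requests: list[Request], capacity: float) -> list[Request]:
    """Greedy admission control (a heuristic, not an optimal solver).

    Requests are considered in order of expected gain per resource unit.
    A request is admitted only if its expected gain is positive and it
    still fits in the remaining capacity.
    """
    admitted = []
    remaining = capacity
    by_density = sorted(
        requests, key=lambda r: expected_gain(r) / r.demand, reverse=True
    )
    for req in by_density:
        if expected_gain(req) > 0 and req.demand <= remaining:
            admitted.append(req)
            remaining -= req.demand
    return admitted
```

A request with a high penalty and a high violation probability has negative expected gain and is rejected even when capacity is free. An exact treatment would solve the underlying knapsack-style problem rather than rank by gain density.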

2011 New Project Summary
High-Confidence SLA Assurance for Cloud Computing Systems and Services

Tasks:
1. Comprehensive model of cloud SLAs considering correlations of QoS aspects and end-to-end QoS requirements
2. Integrated SLA monitoring approach across providers and resource types
3. Optimal adaptive strategies for assuring SLAs under normal and failure situations
4. Method of assessing system-level SLAs based on component-level SLAs
5. Layered collaborative approach for optimally achieving global SLA assurance by leveraging resources from multiple cloud domains

Research Goals:
1. Improved SLA models and protocols to facilitate highly dependable and practically usable cloud computing
2. Optimal supporting environment for SLA assurance considering end-to-end QoS and QoS tradeoffs and achieving local as well as global monitoring, resource management, and admission control

Benefits to Industry Partners:
1. Advanced cloud technologies to meet specified SLAs to a high degree of confidence in spite of multiple failures
2. Enable cloud computing to be used for critical applications, including health-care systems, emergency response systems, defense systems, transportation systems, etc.

Project Schedule:
[Gantt chart. Timeline: A M J J A S O N D J F M A, years 11 to 12. Bar extents are not recoverable from the extracted text.]
- Task 1: SLA model
- Task 2: Integrated SLA monitoring
- Task 3: Optimal adaptive SLA assurance
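
Task 4 (assessing system-level SLAs from component-level SLAs) can be illustrated with a minimal sketch. It rests on strong assumptions that the slides do not state: services are invoked strictly in sequence, component failures are independent, and only two QoS aspects (availability and a worst-case latency bound) are tracked. The names `ComponentSLA`, `compose_sequential`, and `satisfies` are hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class ComponentSLA:
    """Hypothetical per-service SLA with two QoS aspects."""
    name: str
    availability: float    # fraction of time the service is up, in [0, 1]
    max_latency_ms: float  # worst-case response-time bound


def compose_sequential(slas: list[ComponentSLA]) -> ComponentSLA:
    """End-to-end SLA of services invoked one after another.

    Assumes independent failures, so availabilities multiply, and
    worst-case latency bounds add along the chain.
    """
    return ComponentSLA(
        name="+".join(s.name for s in slas),
        availability=prod(s.availability for s in slas),
        max_latency_ms=sum(s.max_latency_ms for s in slas),
    )


def satisfies(offered: ComponentSLA, required: ComponentSLA) -> bool:
    """True if the offered SLA is at least as strong as the requirement."""
    return (
        offered.availability >= required.availability
        and offered.max_latency_ms <= required.max_latency_ms
    )
```

Under these assumptions, two components that each promise 99% availability yield roughly 98% end to end. A client task can therefore miss its requirement even when every individual SLA looks acceptable, which is why checking the composed SLA before establishing the individual ones is useful.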