Reinforcement Learning based Autonomic Virtual Machine Management in Clouds Md. Arafat Habib (12101056) Supervisor: Dr. Muhidul Islam Khan Assistant Professor, Department of Computer Science and Engineering

The Virtual Machine Management Problem

Different Components of the Eucalyptus Cloud

Cloud Controller (CLC):
1. Monitors the availability of resources on the various components of the cloud infrastructure (hypervisor nodes, cluster controllers).
2. Resource arbitration: decides which clusters will be used for provisioning the instances.
3. Monitors the running instances.
4. Queries the node managers for information about resources, makes high-level scheduling decisions, and implements them by making requests to the cluster controllers.

Cluster Controller (CC):
1. Gathers information about a set of VMs and schedules VM execution on specific NCs.
2. Participates in the enforcement of SLAs as directed by the CLC.
3. Collects information about the NCs registered with it and reports it to the CLC.

Node Controller (NC):
1. Learns about the state of the VM instances running on the node and propagates this data up to the CC.
2. Collects data on resource availability and utilization on the node and reports it to the CC.
3. Manages the instance life cycle.

Other components: Walrus, Storage Controller.
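To make the CLC/CC/NC reporting chain concrete, here is a minimal Python sketch of the hierarchy. The class and method names (NodeController.report, CloudController.pick_cluster_for_new_vm, etc.) are illustrative assumptions only; they are not the actual Eucalyptus API.

```python
# Illustrative sketch (not the real Eucalyptus API): the three-level
# controller hierarchy, each level aggregating reports from the one below.
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeController:
    node_id: str
    running_vms: int = 0
    free_cores: int = 0

    def report(self) -> dict:
        # Resource availability and utilization data propagated up to the CC.
        return {"node": self.node_id, "vms": self.running_vms,
                "free_cores": self.free_cores}


@dataclass
class ClusterController:
    cluster_id: str
    nodes: List[NodeController] = field(default_factory=list)

    def report(self) -> dict:
        # Aggregated NC information reported to the CLC.
        return {"cluster": self.cluster_id,
                "nodes": [n.report() for n in self.nodes]}


@dataclass
class CloudController:
    clusters: List[ClusterController] = field(default_factory=list)

    def pick_cluster_for_new_vm(self) -> str:
        # Resource arbitration: provision on the cluster with the most free cores.
        best = max(self.clusters,
                   key=lambda c: sum(n.free_cores for n in c.nodes))
        return best.cluster_id
```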

Markov Decision Process
A set of possible states (S).
A set of possible actions (A).
T is the probability distribution of going to a state s′ from s by taking an action a.
R is the cost function that expresses the reward if action a is taken in state s.
β is the discount factor, 0 < β < 1.

The MDP that models our VM management problem is M = {S, A, T, R, β}, where:
S = {NC, CC, CLC, W, V, P, Psla}
NC is the Node Controller, CC is the Cluster Controller, CLC is the Cloud Controller, W is the workload, V is the number of virtual machines, P is the performance, and Psla is the performance the cloud service provider committed to provide.
A is the action set, which includes adding (a), reducing (r), maintaining (m) and migrating (mg) a virtual machine.
T is the probability distribution of going to a state s′ from s by taking an action a.
R is the cost function that expresses the reward if action a is taken in state s.
β is the discount factor, 0 < β < 1.
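A small sketch of how the state vector S and the action set A could be represented in code follows; every numeric value below is a placeholder, not a figure from the experiments.

```python
# Illustrative encoding of the MDP M = {S, A, T, R, beta}.
# All numeric values are placeholders, not experimental data.
ACTIONS = ["add", "reduce", "maintain", "migrate"]   # a, r, m, mg

state = {
    "NC": 4,        # node controllers
    "CC": 2,        # cluster controllers
    "CLC": 1,       # cloud controller
    "W": 120.0,     # current workload
    "V": 10,        # number of running virtual machines
    "P": 0.92,      # observed performance
    "Psla": 0.95,   # performance committed in the SLA
}

beta = 0.5          # discount factor, 0 < beta < 1
```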

State Transition Diagram for Our MDP (states: NC, CC, CLC, W, V, P, Psla)

Q-learning
1. (∀s ∈ S)(∀a ∈ A(s)):
2.   initialize Q(s, a)
3. s := the initial observed state
4. loop
5.   Choose a ∈ A(s) according to a policy derived from Q
6.   Take action a and observe next state s′ and reward r
7.   Q[s, a] := Q[s, a] + α(r + γ · maxa′ Q[s′, a′] − Q[s, a])
8.   s := s′
9. end loop
10. return π(s) = argmaxa Q(s, a)

SARSA(λ)
1. Initialize Q(s, a) arbitrarily
2. Repeat (for each episode):
3.   Initialize s
4.   Choose a from s using the policy derived from Q
5.   Repeat (for each step of the episode):
6.     Take action a, observe r, s′
7.     Choose a′ from s′ using the policy derived from Q
8.     δ := r + γ Q[s′, a′] − Q[s, a]
9.     e(s, a) := e(s, a) + 1
10.    For all (s, a):
11.      Q[s, a] := Q[s, a] + α δ e(s, a)
12.      e(s, a) := γ λ e(s, a)
13.    s := s′; a := a′
14. until s is terminal
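Below is a runnable Python transcription of the two update rules above. The epsilon-greedy policy, the toy environment step function, and the hyperparameter values are assumptions made for a self-contained example; the actual experiments use the cloud model described earlier.

```python
import random
from collections import defaultdict

ACTIONS = ["add", "reduce", "maintain", "migrate"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # assumed hyperparameters


def epsilon_greedy(Q, s):
    # Policy derived from Q: mostly greedy, occasionally exploratory.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])


def q_learning(env_step, s0, episodes=200, steps=50):
    Q = defaultdict(float)
    for _ in range(episodes):
        s = s0
        for _ in range(steps):
            a = epsilon_greedy(Q, s)
            s_next, r = env_step(s, a)
            # Q[s,a] := Q[s,a] + alpha * (r + gamma * max_a' Q[s',a'] - Q[s,a])
            best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
            s = s_next
    # Greedy policy pi(s) = argmax_a Q(s, a)
    states = {s for s, _ in list(Q)}
    return {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in states}


def sarsa_lambda(env_step, s0, lam=0.9, episodes=200, steps=50):
    Q = defaultdict(float)
    for _ in range(episodes):
        e = defaultdict(float)               # eligibility traces
        s = s0
        a = epsilon_greedy(Q, s)
        for _ in range(steps):
            s_next, r = env_step(s, a)
            a_next = epsilon_greedy(Q, s_next)
            delta = r + GAMMA * Q[(s_next, a_next)] - Q[(s, a)]
            e[(s, a)] += 1.0                 # accumulating trace
            for key in list(e):
                Q[key] += ALPHA * delta * e[key]
                e[key] *= GAMMA * lam        # decay every trace
            s, a = s_next, a_next
    return Q


# Tiny smoke test with an abstract two-state environment and random rewards.
def toy_env(s, a):
    return random.choice(["low_load", "high_load"]), random.uniform(-1.0, 1.0)

policy = q_learning(toy_env, "low_load")
```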

R = β (Cost) + (1 − β) (Penalty), where

Cost = Cr × Va × (Cr × V)                  (1)
Penalty = Pc × (1 + (Pd − Psla) / Psla)    (2)

In equation (1):
Cr = the cost of the resources, a variable that depends on the specific configuration and region.
Va = the specific virtual machine to be added, reduced, maintained or migrated.
V = total number of VMs in the system.

In equation (2):
Pc = the penalty for violating the SLA.
Pd = the (randomly varying) performance displayed by the system.
Psla = the target performance.

Lastly, β is the balancing factor.
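A direct Python transcription of this reward, with equations (1) and (2) exactly as stated above; the argument values in the example call are made up purely for illustration.

```python
def reward(beta, Cr, Va, V, Pc, Pd, Psla):
    """R = beta * Cost + (1 - beta) * Penalty."""
    cost = Cr * Va * (Cr * V)                 # equation (1)
    penalty = Pc * (1 + (Pd - Psla) / Psla)   # equation (2)
    return beta * cost + (1 - beta) * penalty


# Example: beta trades resource cost against the SLA-violation penalty.
print(reward(beta=0.5, Cr=0.05, Va=1, V=10, Pc=2.0, Pd=0.90, Psla=0.95))
```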

Cost vs. Penalty for different values of beta in Q-learning:

Beta   Cost   Penalty
0.10   2.17   6.04
0.25   3.34   7.10
0.50   4.84   6.90
0.75   2.50   7.74
0.90   5.31   5.25

Cost vs. Penalty for different values of beta in SARSA(λ):

Beta   Cost    Penalty
0.10   33.21   7.21
0.25   71.08   6.00
0.50   75.50   8.00
0.75   81.61   6.39
0.90   90.93   6.18

Cost vs. Penalty graph for beta in Q-learning and SARSA(λ)

Reward produced for different values of alpha in SARSA(λ)

Reward produced for different values of alpha in Q-learning

Convergence comparison of Q-learning and SARSA(λ)

Any Questions??