A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris.

A Call Admission Control for Service Differentiation and Fairness Management in WDM Grooming Networks Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris BroadNet 2004 proceeding Presented by Zhanxiang February 7, 2005

Goal & Contribution
Goal:
–Fairness control and service differentiation in a WDM grooming network, while also maximizing the overall utilization.
Contributions:
–An optimal CAC policy providing fairness control by using a Markov Decision Process approach;
–A heuristic decomposition algorithm for multi-link and multi-wavelength networks.

Quick Review of MDP
DTMC, DTMDP
–We focus on the DTMDP because a CTMDP is usually solved by discretization.

DTMC (adapted from Professor Malathi Veeraraghavan’s slides)

DTMC (adapted from Professor Malathi Veeraraghavan’s slides)

DTMC
Two states i and j communicate if p_ij(n) > 0 and p_ji(n′) > 0 for some n and n′.
An MC is irreducible if all of its states communicate.
A state of an MC is periodic if there exists some integer m > 0 such that p_ii(m) > 0 and some integer d > 1 such that p_ii(n) > 0 only if d divides n.
Adapted from Professor Malathi Veeraraghavan’s slides

DTMC (adapted from Professor Malathi Veeraraghavan’s slides)

Decision Theory
Probability Theory (describes what an agent should believe based on evidence) + Utility Theory (describes what an agent wants) = Decision Theory (describes what an agent should do).
Adapted from David W. Kirsch’s slides

Markov Decision Process
An MDP is defined by:
State space: S
Action space: A
Reward function: R : S → {real numbers}
Transition function: T : S × A → S (deterministic), or T : S × A → PowerSet(S) (stochastic)
The transition function describes the effect of an action in state s. In the stochastic case, the transition function induces a probability distribution P(s’|s,a) over next states.
Adapted from David W. Kirsch’s slides and modified by Zhanxiang
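As a concrete toy illustration of this definition (not from the paper), a small stochastic MDP can be stored in plain Python containers; the state names, action names, and all probabilities below are made up.

```python
# A minimal, hypothetical MDP with two states and two actions.
# T[s][a] maps next states s' to probabilities P(s' | s, a);
# R[s] is the reward attached to state s, as on this slide.
states = ["idle", "busy"]
actions = ["accept", "reject"]

T = {
    "idle": {"accept": {"busy": 0.9, "idle": 0.1}, "reject": {"idle": 1.0}},
    "busy": {"accept": {"busy": 1.0},              "reject": {"idle": 0.7, "busy": 0.3}},
}
R = {"idle": 0.0, "busy": 1.0}

# Sanity check: every distribution P(. | s, a) must sum to 1.
for s in states:
    for a in actions:
        assert abs(sum(T[s][a].values()) - 1.0) < 1e-9
```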

How an MDP differs from a DTMC
An MDP is like a DTMC, except that the transition matrix depends on the action taken by the decision maker (a.k.a. agent) at each time step: P_{s,a,s'} = P[S(t+1) = s' | S(t) = s, A(t) = a].
(Figure: transition matrices of a DTMC and an MDP, indexed by current state s, next state s′, and action a.)

MDP Actions
Stochastic actions:
–T : S × A → PowerSet(S). For each state and action we specify a probability distribution over next states, P(s’ | s, a).
Deterministic actions:
–T : S × A → S. For each state and action we specify a new state. Hence the transition probabilities will be 1 or 0.

Action Selection & Maximum Expected Utility
Assume we assign a utility U(s) to each state s.
The expected utility of an action a in state s is EU(a|s) = Σ_{s’} P(s’ | s, a) U(s’).
MEU principle: an agent should choose an action that maximizes the agent’s expected utility.
Adapted from David W. Kirsch’s slides and modified by Zhanxiang
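A minimal sketch of the MEU rule in Python, assuming the transition table T from the toy MDP above and any utility table U over states:

```python
def expected_utility(T, U, s, a):
    """EU(a|s) = sum over s' of P(s'|s,a) * U(s')."""
    return sum(p * U[s2] for s2, p in T[s][a].items())

def meu_action(T, U, s, actions):
    """MEU principle: pick the action with the largest expected utility in state s."""
    return max(actions, key=lambda a: expected_utility(T, U, s, a))

# Example with the toy MDP above and U = R:
# EU(accept|idle) = 0.9, EU(reject|idle) = 0.0, so meu_action picks "accept".
```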

Policy & Following a Policy
Policy: a mapping from S to A, π : S → A.
Following a policy:
1. Determine the current state s
2. Execute action π(s)
3. Repeat 1-2
Adapted from David W. Kirsch’s slides and modified by Zhanxiang

Solution to an MDP
In deterministic processes, the solution is a plan.
In observable stochastic processes, the solution is a policy.
A policy’s quality is measured by its EU.
Notation:
π ≡ a policy
π(s) ≡ the recommended action in state s
π* ≡ the optimal policy (maximum expected utility)
Adapted from David W. Kirsch’s slides and modified by Zhanxiang

Should we let U(s) = R(s)?
In the definition of an MDP we introduce R(s), which depends on specific properties of a state. Shall we let U(s) = R(s)?
–Often works well for single-action decisions.
–Not feasible for choosing action sequences, which implies that R(s) alone is not enough to solve an MDP.

Assigning Utility to Sequences
How to add rewards?
- simple sum
- mean reward rate
Problem: an infinite horizon → infinite reward
- discounted rewards: R(s_0, s_1, s_2, …) = R(s_0) + c·R(s_1) + c²·R(s_2) + …, where 0 < c < 1 (strictly less than 1, so the discounted sum converges)
Adapted from David W. Kirsch’s slides and modified by Zhanxiang
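A tiny sketch of the discounted sum, with an illustrative reward sequence and discount factor:

```python
def discounted_return(rewards, c):
    """R(s0,s1,s2,...) = R(s0) + c*R(s1) + c^2*R(s2) + ...  (0 < c < 1)."""
    return sum((c ** t) * r for t, r in enumerate(rewards))

print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1 + 0.5 + 0.25 = 1.75
```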

How to define U(s)?
Define U^π(s), which is specific to each policy π: U^π(s) = E[ Σ_t c^t R(s_t) | π, s_0 = s ].
Define U(s) = max_π U^π(s) = U^{π*}(s).
We can calculate U(s) from R(s) via the Bellman equation: U(s) = R(s) + c · max_π Σ_{s’} P(s’|s, π(s)) U(s’).
If we solve the Bellman equation for each state, we will have obtained the optimal policy π* for the given MDP from U(s).
Adapted from David W. Kirsch’s slides and modified by Zhanxiang

Value Iteration Algorithm
We have to solve |S| simultaneous Bellman equations. We can’t solve them directly, so use an iterative approach:
1. Begin with an arbitrary utility function U_0
2. For each s, calculate U(s) from R(s) and U_0
3. Use these new utility values to update U_0
4. Repeat steps 2-3 until U_0 converges
This equilibrium is a unique solution! (see R&N for proof)
Adapted from David W. Kirsch’s slides
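The same iteration, sketched for the toy table-based MDP used in the earlier snippets; the discount factor c and tolerance eps are illustrative choices, not values from the slides.

```python
def value_iteration(states, actions, T, R, c=0.9, eps=1e-6):
    """Iterate U(s) <- R(s) + c * max_a sum_s' P(s'|s,a) U(s') until convergence."""
    U = {s: 0.0 for s in states}          # step 1: arbitrary initial utilities
    while True:
        U_new = {}
        for s in states:                  # step 2: Bellman backup for every state
            U_new[s] = R[s] + c * max(
                sum(p * U[s2] for s2, p in T[s][a].items()) for a in actions
            )
        delta = max(abs(U_new[s] - U[s]) for s in states)
        U = U_new                         # step 3: replace the old utilities
        if delta < eps:                   # step 4: stop once the values converge
            return U

def greedy_policy(states, actions, T, U):
    """Extract pi*(s) = argmax_a sum_s' P(s'|s,a) U(s') from the converged utilities."""
    return {s: max(actions, key=lambda a: sum(p * U[s2] for s2, p in T[s][a].items()))
            for s in states}
```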

State Space and Policy Definition in this paper
The author’s idea of using an MDP is great, but I’m not comfortable with the state space definition and the policy definition. If I were the author, I would define the system state space and policy as follows:
–S′ = S × E, where S = {(n_1, n_2, …, n_K) | Σ_k t_k n_k ≤ T} and E = {class c_k call arrivals} ∪ {class c_k call departures} ∪ {dummy events}
–Policy π : S′ → A

Network Model :: Definitions
OADM: optical add/drop multiplexer
WC: wavelength converter
TSI: time-slot interchanger
L: # of links the WDM grooming network contains
M: # of origin-destination (o-d) pairs the network includes
W: # of wavelengths in a fiber on each link
T: # of time slots each wavelength includes
K: # of classes of traffic streams
c_k: the traffic stream classes, which differ by their bandwidth requirements
t_k: # of time slots required to establish a class c_k call
n_k: # of class c_k calls currently in the system

Network Model :: Assumptions
For each o-d pair, class c_k calls arrive according to a Poisson process with rate λ_k.
The call holding time of class c_k is exponentially distributed with mean 1/μ_k. Unless otherwise stated, we assume 1/μ_k = 1.
An arriving call of any class is blocked when no wavelength has t_k available time slots.
Blocked calls do not interfere with the system.
The switching nodes are non-blocking.
No preemption.
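A hypothetical container for the single-link model parameters defined above, together with the blocking condition from the assumptions; all numerical values are made up, since the paper's actual traffic parameters are not reproduced in this transcript.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GroomingModel:
    T: int                      # time slots per wavelength
    t: List[int]                # t[k] = time slots needed by a class c_k call
    lam: List[float]            # lam[k] = Poisson arrival rate of class c_k
    mu: List[float]             # mu[k] = service rate (1 / mean holding time), assumed 1

    def fits(self, state: Tuple[int, ...], k: int) -> bool:
        """An arriving class c_k call is admissible only if t_k free time slots remain."""
        used = sum(n * tk for n, tk in zip(state, self.t))
        return used + self.t[k] <= self.T

# Illustrative two-class example: T = 4 slots, t = (1, 2), unit holding times.
model = GroomingModel(T=4, t=[1, 2], lam=[1.0, 0.5], mu=[1.0, 1.0])
print(model.fits((1, 1), 0))  # True: 1*1 + 1*2 + 1 = 4 <= T
print(model.fits((2, 1), 1))  # False: 2*1 + 1*2 = 4, no room for 2 more slots
```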

Fairness definition: there is no significant difference between the blocking probabilities experienced by different classes of users.

CS & CP
Complete Sharing (CS): not fair.
–No resources are reserved for any class of calls;
–Calls with lower bandwidth requirements and higher arrival rates may starve calls with higher bandwidth requirements and lower arrival rates.
Complete Partitioning (CP): fair, but...
–A portion of the resources is dedicated to each class of calls;
–May not maximize the overall utilization of the available resources.

Single-link single-wavelength (0)
System state space S: S = {(n_1, n_2, …, n_K) | Σ_k t_k n_k ≤ T}
Operators:
–A_k s = (n_1, n_2, …, n_k + 1, …, n_K)
–D_k s = (n_1, n_2, …, n_k - 1, …, n_K)
–A_k P_a s = (n_1, n_2, …, n_k + a, …, n_K)
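A direct sketch of the three operators, with states held as tuples (n_1, …, n_K) and 0-based class indices:

```python
def A(k, s):
    """A_k s: add one class c_k call (arrival accepted); k is 0-based here."""
    return s[:k] + (s[k] + 1,) + s[k + 1:]

def D(k, s):
    """D_k s: remove one class c_k call (departure)."""
    return s[:k] + (s[k] - 1,) + s[k + 1:]

def AP(k, a, s):
    """A_k P_a s: on a class c_k arrival, accept (a = 1) or reject (a = 0)."""
    return s[:k] + (s[k] + a,) + s[k + 1:]

print(A(1, (2, 0, 1)))       # (2, 1, 1)
print(AP(1, 0, (2, 0, 1)))   # (2, 0, 1): rejected, state unchanged
```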

Single-link single-wavelength (1)
Sampling rate ν = Σ_k ( ⌊T/t_k⌋ μ_k + λ_k )
Only a single transition can occur during each time slot. A transition can correspond to one of the following events:
–1) A class c_k call arrival
–2) A class c_k call departure
–3) A fictitious or dummy event (caused by the high sampling rate)
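A sketch of this uniformization step, reusing the hypothetical GroomingModel from the earlier snippet: the sampling rate ν, and the per-slot probabilities of each event type in a given state (the dummy event absorbs the leftover probability mass).

```python
from math import floor

def sampling_rate(model):
    """nu = sum_k ( floor(T / t_k) * mu_k + lambda_k )."""
    return sum(floor(model.T / tk) * mk + lk
               for tk, mk, lk in zip(model.t, model.mu, model.lam))

def event_probabilities(model, state):
    """In one sampled time slot: class-k arrival w.p. lambda_k / nu,
    class-k departure w.p. n_k * mu_k / nu, and a dummy event otherwise."""
    nu = sampling_rate(model)
    p_arrival = [lk / nu for lk in model.lam]
    p_departure = [n * mk / nu for n, mk in zip(state, model.mu)]
    p_dummy = 1.0 - sum(p_arrival) - sum(p_departure)
    return p_arrival, p_departure, p_dummy
```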

Single-link single-wavelength (2)
Reward function R and value function (the formulas are shown on the slide but not reproduced in this transcript).

Single-link single-wavelength (3)
Optimal value function and optimal policy (formulas shown on the slide but not reproduced in this transcript).

Single-link single-wavelength (4)
Value iteration to compute V_n(s).

Single-link single-wavelength (5)
Action decision: if V_n(A_k P_1 s) ≥ V_n(A_k P_0 s), then a = 1 (accept); else a = 0 (reject), based on the equation shown on the slide.
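Given a value function V_n over the feasible states, the admission rule on this slide reduces to comparing the post-accept and post-reject states; a hedged sketch, where V is assumed to be a dict from state tuples to values and AP is the operator sketched earlier:

```python
def admit(V, s, k):
    """Accept a class c_k arrival in state s iff V_n(A_k P_1 s) >= V_n(A_k P_0 s)."""
    accept_state = AP(k, 1, s)   # call admitted: n_k increased by one
    reject_state = AP(k, 0, s)   # call blocked: state unchanged
    if accept_state not in V:    # infeasible (not enough free time slots): must reject
        return 0
    return 1 if V[accept_state] >= V[reject_state] else 0
```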

My understanding: the author’s idea of using an MDP is great.

Example

Matlab toolbox calculation

Heuristic decomposition algorithm
Step 1: For each hop i, partition the set of available wavelengths into subsets, one dedicated to each of the o-d pairs using hop i.
Step 2: Assume traffic is uniformly distributed among the W_m wavelengths of o-d pair m; thus, the arrival rate of class c_k seen by each of the W_m wavelengths is λ_k/W_m.

Heuristic decomposition algorithm (2)
Step 3: Compute the CAC policy with respect to λ_k/W_m.
Step 4: Using the CAC policy computed in Step 3, determine the optimal action for each of the W_m wavelengths individually.
(A small sketch of Steps 1-4 follows below.)
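A rough sketch of Steps 1-4 under the stated uniform-splitting assumption; compute_cac_policy stands in for the single-wavelength value-iteration procedure above and is hypothetical, as are the o-d pair identifiers and W_m values.

```python
def decompose_and_solve(od_pairs, wavelengths_per_pair, lam, compute_cac_policy):
    """Steps 1-2: give each o-d pair m its W_m dedicated wavelengths and split
    each lambda_k evenly across them. Steps 3-4: solve one single-wavelength CAC
    MDP per pair and reuse its policy on each of that pair's wavelengths."""
    policies = {}
    for m in od_pairs:
        W_m = wavelengths_per_pair[m]              # Step 1: size of the dedicated subset
        lam_m = [lk / W_m for lk in lam]           # Step 2: per-wavelength arrival rates
        policies[m] = compute_cac_policy(lam_m)    # Step 3: CAC policy for lambda_k / W_m
    return policies                                # Step 4: apply per wavelength, individually
```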

Performance comparison

Relation to our work
We can use an MDP to model the bandwidth allocation problem in our call admission control to achieve fairness; but in a heterogeneous network the bandwidth granularity problem remains.

Possible Constraints
The optimal policy of an MDP is only guaranteed to exist under certain conditions.

Backup Other MDP representations

Markov Assumption
Markov assumption (R&N): the next state’s conditional probability depends only on a finite history of previous states (a k-th order Markov process).
Markov assumption (J&B): the next state’s conditional probability depends only on the immediately previous state (a 1st order Markov process; Andrei Markov, 1913).
The two definitions are equivalent: any algorithm that makes the 1st order Markov assumption can be applied to any Markov process.
Adapted from David W. Kirsch’s slides

MDP
A Markov Decision Process (MDP) model contains:
–A set of possible world states S
–A set of possible actions A
–A real-valued reward function R(s,a)
–A description T(s,a) of each action’s effects in each state

How an MDP differs from a DTMC
A Markov Decision Process (MDP) is just like a Markov chain, except that the transition matrix depends on the action taken by the decision maker (agent) at each time step: P_{s,a,s'} = P[S(t+1) = s' | S(t) = s, A(t) = a]. The agent receives a reward R(s,a), which depends on the action and the state. The goal is to find a function, called a policy, which specifies which action to take in each state, so as to maximize some function of the sequence of rewards (e.g., the mean or expected discounted sum).

MDP Actions
Stochastic actions:
–T : S × A → PowerSet(S). For each state and action we specify a probability distribution over next states, P(s’ | s, a).
Deterministic actions:
–T : S × A → S. For each state and action we specify a new state. Hence the transition probabilities will be 1 or 0.

Transition Matrix
(Figure: transition matrices of a DTMC and an MDP, indexed by current state s, next state s′, and action a.)

MDP Policy
A policy π is a mapping from S to A: π : S → A.
Assumes full observability: the new state resulting from executing an action will be known to the system.

Evaluating a Policy
How good is a policy π in terms of the sequence of actions it induces?
–For deterministic actions, just total the rewards obtained... but the result may be infinite.
–For stochastic actions, use the expected total reward obtained instead… which again typically yields an infinite value.
How do we compare policies of infinite value?

Discounting to prefer earlier rewards
A value function V^π : S → ℝ represents the expected objective value obtained by following policy π from each state in S. Bellman equations relate the value function to itself via the problem dynamics.

Bellman Equations

Value Iteration Algorithm
Can’t solve directly, so use an iterative approach:
1. Begin with an arbitrary utility vector V
2. For each s, calculate V*(s) from R(s,π) and V
3. Use these new utility values V*(s) to update V
4. Repeat steps 2-3 until V converges
This equilibrium is a unique solution!

MDP Solution