Some Final Thoughts
Abhijit Gosavi
From MDPs to SMDPs
The semi-Markov decision process (SMDP) is a more general model in which the transition time is itself a random variable. The MDP Bellman equations can be extended to SMDPs to account for this time.
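Concretely, a standard textbook form of the average-reward Bellman optimality equation for SMDPs (this particular notation is a paraphrase for illustration, not a verbatim quote from the slides) is, in LaTeX:

% h(i): relative value (bias) of state i
% \bar{r}(i,a): expected immediate reward; \bar{t}(i,a): expected transition time
% \rho^*: optimal average reward per unit time
h(i) = \max_{a \in A(i)} \Big[ \bar{r}(i,a) - \rho^* \, \bar{t}(i,a) + \sum_{j} p(i,j,a) \, h(j) \Big] \quad \text{for all states } i

Setting \bar{t}(i,a) = 1 for every state-action pair recovers the ordinary average-reward MDP equation, which is what makes the SMDP the more general model.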
SMDPs (contd.)
In the average-reward case, we are interested in maximizing the average reward per unit time. In the discounted-reward case, we need to discount in proportion to the time spent in each transition. The Q-Learning algorithm for discounted reward has a direct extension to SMDPs (sketched below). For average reward, we have a family of algorithms called R-SMART (see the book for references).
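A minimal Python sketch of that discounted-reward extension. The simulator function step(state, action), its return convention (reward, next_state, tau), and the treatment of the reward as a lump sum at the transition are all assumptions made for illustration; see the book for the precise algorithm:

import math
import random

def smdp_q_learning(step, states, actions, gamma=0.1, alpha=0.1,
                    epsilon=0.1, iterations=100000):
    # gamma is a continuous-time discount *rate*: a value received after a
    # transition of random length tau is discounted by exp(-gamma * tau).
    Q = {(s, a): 0.0 for s in states for a in actions}
    s = random.choice(states)
    for _ in range(iterations):
        if random.random() < epsilon:                  # explore
            a = random.choice(actions)
        else:                                          # exploit
            a = max(actions, key=lambda b: Q[(s, b)])
        r, s_next, tau = step(s, a)                    # simulate one transition
        target = r + math.exp(-gamma * tau) * max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s_next
    return Q

The only change from ordinary Q-Learning is that the fixed discount factor is replaced by exp(-gamma * tau), so longer transitions are discounted more heavily.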
Policy Iteration
Another method to solve the MDP: an alternative to value iteration
Slightly more involved mathematically
Sometimes more efficient than value iteration
Its Reinforcement Learning counterpart is called Approximate Policy Iteration (a sketch follows this list)
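For reference, a minimal policy-iteration sketch for a finite discounted MDP in Python. The data layout (P[a][i][j] for transition probabilities, R[a][i] for expected one-step rewards) and the use of NumPy are illustrative choices, not the course's code:

import numpy as np

def policy_iteration(P, R, gamma=0.9):
    # P[a][i][j]: probability of moving from state i to j under action a.
    # R[a][i]: expected one-step reward in state i under action a.
    n_actions, n_states = len(P), len(P[0])
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = R_pi exactly.
        P_pi = np.array([P[policy[i]][i] for i in range(n_states)])
        R_pi = np.array([R[policy[i]][i] for i in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to v.
        Q = np.array([[R[a][i] + gamma * np.dot(P[a][i], v)
                       for a in range(n_actions)] for i in range(n_states)])
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):  # stable policy is optimal
            return policy, v
        policy = new_policy

The exact linear solve in the evaluation step is what makes the method slightly more involved mathematically than value iteration's repeated sweeps, and also why it often converges in far fewer iterations.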
Other Applications
Supply Chain Problems
Disaster Response Management
Production Planning in Remanufacturing Systems
Continuous event systems (LQG control)
What you’ve learned (hopefully)
Markov chains and how they can be employed to model systems
Markov decision processes: the idea of optimizing systems (controls) driven by Markov chains
Some concepts from Artificial Intelligence
Some (hopefully) cool applications of Reinforcement Learning
Some coding (for those who were not averse to doing it)
Systems thinking
Coding iterative algorithms
Some discrete-event simulation
HOPE YOU’VE ENJOYED THE CLASS!
HAPPY HOLIDAYS!!!