Published by Ádám Gáspár. Modified over 5 years ago.
1
Decision Process with Non-Markovian Reward
Benjamin Berend & Amihay Elboher
Supervisor: Prof. Ronen Brafman
Department of Computer Science, Ben-Gurion University
2
Graphical infrastructure for running, planning and learning algorithms
Goal: clean the stains and collect the fruit into the basket
3
RL – reinforcement learning
Reinforcement learning: optimizing behavior by learning from rewards.
Known environment: Policy Iteration
Unknown environment: Q-Learning, R-Max, Automata Learning
4
Known Environment
MDP – Markov Decision Process
In a known environment, the agent has full knowledge of the MDP: its states, actions, transition probabilities and rewards.
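When the model is known, a planning method such as policy iteration can compute a policy directly from it. The following is a minimal sketch, not the presenters' implementation: the two-state MDP, its rewards, the discount factor and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical 2-state, 2-action MDP; every number here is illustrative.
STATES = [0, 1]
ACTIONS = [0, 1]
GAMMA = 0.9  # assumed discount factor

# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}


def q_value(s, a, V):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def policy_iteration(tol=1e-9):
    policy = {s: 0 for s in STATES}  # start from an arbitrary policy
    while True:
        # Policy evaluation: repeat Bellman backups until V stops changing.
        V = {s: 0.0 for s in STATES}
        while True:
            delta = 0.0
            for s in STATES:
                v = q_value(s, policy[s], V)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: switch to a strictly better action if one exists.
        stable = True
        for s in STATES:
            best = max(ACTIONS, key=lambda a: q_value(s, a, V))
            if q_value(s, best, V) > q_value(s, policy[s], V) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


policy, V = policy_iteration()
```

In this toy model the loop settles on action 1 in both states, since that action leads to the state with the larger recurring reward.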
5
Q-Learning
The learning is experience-based: the agent starts from an arbitrary policy and adjusts its behavior according to the rewards it receives.
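The experience-based update can be sketched as tabular Q-learning with epsilon-greedy exploration. This is an illustrative sketch under stated assumptions: the tiny environment in `step`, the constants `GAMMA`, `ALPHA` and `EPSILON`, and the function names are hypothetical and do not come from the presentation.

```python
import random

GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.2  # assumed discount, learning rate, exploration rate
N_STATES, N_ACTIONS = 2, 2


def step(state, action):
    """Hypothetical environment, unknown to the agent.

    Action 1 moves to (or stays in) state 1 and pays state + 1;
    action 0 returns to state 0 and pays nothing.
    """
    if action == 1:
        return 1, float(state + 1)
    return 0, 0.0


def q_learning(n_steps=20000, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    state = 0
    for _ in range(n_steps):
        # Epsilon-greedy: mostly exploit the current estimates, sometimes explore.
        if rng.random() < EPSILON:
            action = rng.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
        next_state, reward = step(state, action)
        # Move Q(s, a) toward the observed reward plus the discounted best next value.
        target = reward + GAMMA * max(Q[next_state])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state
    return Q


def greedy_policy(Q):
    return [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]


Q = q_learning()
learned = greedy_policy(Q)
```

The agent never reads the transition rule inside `step`; it only sees the states and rewards that come back, which is what distinguishes this setting from planning with a known model.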
6
Automata Learning
In order to learn a non-Markovian reward, we construct an automaton that accepts all “words” that lead to the reward. The algorithm finds the “most logical” automaton and combines it with the state.
Sample: 1, 10, 100
[Figure: candidate automata for the sample, labeled Σ*, 10* and {1, 10, 100}]
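As a rough illustration of how an automaton can be combined with the state, the sketch below hand-codes one automaton that accepts the sample words 1, 10 and 100 (a 1 followed by any number of 0s) and tracks its state alongside an environment state. It does not learn the automaton from data; the choice of automaton, the state names and the `augmented_step` helper are assumptions made for this example.

```python
# Hypothetical DFA over the alphabet {"0", "1"} accepting a "1" followed by zeros.
START, ACCEPT, DEAD = "q0", "q1", "dead"
DELTA = {
    (START, "1"): ACCEPT,
    (START, "0"): DEAD,
    (ACCEPT, "0"): ACCEPT,
    (ACCEPT, "1"): DEAD,
    (DEAD, "0"): DEAD,
    (DEAD, "1"): DEAD,
}


def accepts(word):
    """Run the DFA over the whole word and report whether it ends in ACCEPT."""
    q = START
    for symbol in word:
        q = DELTA[(q, symbol)]
    return q == ACCEPT


def augmented_step(state, symbol):
    """Advance an (environment state, automaton state) pair by one symbol.

    The environment part is just a step counter here (a placeholder).
    The reward depends only on the new automaton state, so it is
    Markovian with respect to the augmented state.
    """
    env_state, q = state
    q_next = DELTA[(q, symbol)]
    reward = 1.0 if q_next == ACCEPT else 0.0
    return (env_state + 1, q_next), reward


sample_ok = all(accepts(w) for w in ["1", "10", "100"])
first = augmented_step((0, START), "1")
```

The point of the pairing is that the automaton state summarizes the relevant history, so an algorithm that expects Markovian rewards can be run on the augmented state without remembering the full sequence of past symbols.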