Department of Computer Science Ben-Gurion University

Decision Process with Non-Markovian Reward
Benjamin Berend & Amihay Elboher
Supervisor: Prof. Ronen Brafman

Graphical infrastructure for running, planning and learning algorithms
Goal: clean the stains and collect the fruits into the basket

RL – reinforcement learning
Reinforcement learning: optimizing a behavior by learning from rewards and penalties.
Known Environment: Policy Iteration
Unknown Environment: Q-Learning, R-Max, Automata Learning

Known Environment MDP – Markov Decision Process
In a known environment, the MDP is fully observable.
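
When the model is known, a planning method such as policy iteration can compute a policy directly from it. The following is a minimal sketch, not the project's implementation: the names `policy_iteration`, `transitions`, and `rewards`, and the three-state chain used as an example, are all assumptions made for illustration.

```python
def policy_iteration(transitions, rewards, gamma=0.9, tol=1e-10):
    """Sketch of policy iteration for a small, fully known MDP.

    transitions[s][a] is a list of (probability, next_state) pairs and
    rewards[s][a] is the expected immediate reward (hypothetical layout).
    """
    n_states = len(transitions)
    policy = [0] * n_states
    values = [0.0] * n_states

    def q_value(s, a):
        return rewards[s][a] + gamma * sum(p * values[s2] for p, s2 in transitions[s][a])

    while True:
        # Policy evaluation: sweep the Bellman equation for the fixed policy.
        while True:
            delta = 0.0
            for s in range(n_states):
                new_v = q_value(s, policy[s])
                delta = max(delta, abs(new_v - values[s]))
                values[s] = new_v
            if delta < tol:
                break
        # Policy improvement: switch only to strictly better actions.
        stable = True
        for s in range(n_states):
            best = max(range(len(transitions[s])), key=lambda a: q_value(s, a))
            if q_value(s, best) > q_value(s, policy[s]) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, values


# Hypothetical chain 0 -> 1 -> 2; action 0 moves left, action 1 moves right.
# Entering state 2 from state 1 pays 1; state 2 is absorbing.
chain_transitions = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 0)], [(1.0, 2)]],
    [[(1.0, 2)], [(1.0, 2)]],
]
chain_rewards = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
chain_policy, chain_values = policy_iteration(chain_transitions, chain_rewards)
```

On this toy chain the resulting policy moves right in states 0 and 1, which is the only way to reach the reward.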

Q-Learning
The learning is experience based: the agent starts from any policy and adjusts its behavior according to the rewards it gets.
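
The experience-based update can be sketched with tabular Q-learning. This is an illustrative sketch under assumptions, not the project's code: the function `q_learning`, the `step` callback signature, the hyperparameter values, and the toy `chain_step` environment are all hypothetical.

```python
import random


def q_learning(step, n_states, n_actions, start_state, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    step(state, action) must return (next_state, reward, done).
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = start_state
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice([a for a in range(n_actions) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Move the estimate toward the observed reward plus the
            # discounted value of the best action in the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def chain_step(state, action):
    """Hypothetical chain 0 -> 1 -> 2: action 1 moves right, action 0 moves left."""
    next_state = min(state + 1, 2) if action == 1 else max(state - 1, 0)
    done = next_state == 2
    return next_state, (1.0 if done else 0.0), done


q_table = q_learning(chain_step, n_states=3, n_actions=2, start_state=0)
```

The agent never sees the transition model here; it only calls `step` and learns from the rewards that come back, which is what distinguishes this setting from the known-environment case.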

Automata Learning
In order to learn a non-Markovian reward, we construct an automaton that accepts all "words" that lead to the reward. The algorithm finds the "most logical" automaton and incorporates it into the state.
Sample: 1, 10, 100
[Figure: candidate automata over the alphabet {0, 1} for this sample; surviving labels: Σ*, 10*, and 1, 10, 100]
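
One way to picture combining an automaton with the state is to track the automaton's state alongside the environment's state, so that the reward depends only on the pair. The sketch below is an assumption-laden illustration: it hard-codes a DFA for the language 10* (a 1 followed by any number of 0s), which is consistent with the sample 1, 10, 100 but is not claimed to be the automaton the algorithm selects, and it does not implement the learning step itself. The names `DFA`, `ONE_ZERO_STAR`, and `product_step` are hypothetical.

```python
class DFA:
    """Deterministic finite automaton over string symbols."""

    def __init__(self, transitions, start, accepting):
        self.transitions = transitions  # dict: (state, symbol) -> state
        self.start = start
        self.accepting = accepting

    def step(self, state, symbol):
        return self.transitions[(state, symbol)]

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.step(state, symbol)
        return state in self.accepting


# Hypothetical automaton for 10*: one "1", then only "0"s.
ONE_ZERO_STAR = DFA(
    transitions={
        ("start", "1"): "ok", ("start", "0"): "dead",
        ("ok", "0"): "ok", ("ok", "1"): "dead",
        ("dead", "0"): "dead", ("dead", "1"): "dead",
    },
    start="start",
    accepting={"ok"},
)


def product_step(env_state, dfa_state, symbol, env_step, dfa=ONE_ZERO_STAR):
    """Advance environment and automaton together.

    The reward is read off the automaton state, so it is Markovian
    with respect to the combined (env_state, dfa_state) pair.
    """
    next_env = env_step(env_state, symbol)
    next_dfa = dfa.step(dfa_state, symbol)
    reward = 1.0 if next_dfa in dfa.accepting else 0.0
    return (next_env, next_dfa), reward
```

With the automaton state included, standard methods for Markovian rewards can be applied to the combined state, since the history needed to decide the reward is summarized by the automaton.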
