1
A Comparison of Learning Algorithms on the ALE
Deep Learning
Behzad Ghazanfari
2
Useful references:
The Arcade Learning Environment: An Evaluation Platform for General Agents
Playing Atari with Deep Reinforcement Learning
Reinforcement Learning: An Introduction
3
RL: trial-and-error learning. Agents learn, interact, and adapt in complex environments.
Generalization: learning a reusable, high-level understanding of the world
General competency across tasks and domains
The curse of dimensionality (high-dimensional state spaces)
Hand-crafted features
On-policy (online) and off-policy (offline) learning
4
RL methods
Dynamic programming (model-based): value iteration, policy iteration
Monte Carlo (learning from experience)
TD: Q-learning, SARSA, actor-critic, R-learning
TD(λ): eligibility traces, accumulating or replacing
5
Q-learning and SARSA
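The two tabular update rules differ only in the bootstrap target: Q-learning bootstraps on the greedy action in the next state (off-policy), while SARSA bootstraps on the action actually taken (on-policy). A minimal sketch, assuming a NumPy Q-table indexed by (state, action); the step size and discount values are illustrative:

    import numpy as np

    def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
        # Off-policy: bootstrap on the greedy action in s_next.
        target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])

    def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
        # On-policy: bootstrap on the action a_next actually selected.
        target = r + gamma * Q[s_next, a_next]
        Q[s, a] += alpha * (target - Q[s, a])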
6
R-learning and Actor-Critic
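As a reminder of the updates (standard forms from Sutton and Barto; variable names and step sizes are illustrative, not from the slides): R-learning replaces discounting with a running estimate rho of the average reward per step, and the actor-critic splits learning between a critic (state values) and an actor (action preferences), both driven by the TD error:

    import numpy as np

    def r_learning_update(Q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
        greedy = Q[s, a] == np.max(Q[s])    # was the chosen action greedy?
        delta = r - rho + np.max(Q[s_next]) - Q[s, a]  # average-reward TD error
        Q[s, a] += alpha * delta
        if greedy:
            rho += beta * delta             # rho is adjusted only on greedy steps
        return rho                          # caller keeps the updated estimate

    def actor_critic_update(V, prefs, s, a, r, s_next,
                            alpha=0.1, beta=0.1, gamma=0.99):
        delta = r + gamma * V[s_next] - V[s]  # critic's TD error
        V[s] += alpha * delta                 # critic: improve the value estimate
        prefs[s, a] += beta * delta           # actor: reinforce or punish the action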
7
TD(λ)
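TD(λ) propagates each TD error backward to recently visited states through an eligibility trace: accumulating traces add 1 on every visit, replacing traces reset to 1. A minimal tabular sketch (array layout and parameter values are illustrative):

    import numpy as np

    def td_lambda_step(V, e, s, r, s_next,
                       alpha=0.1, gamma=0.99, lam=0.9, replacing=True):
        e *= gamma * lam                          # decay all traces
        e[s] = 1.0 if replacing else e[s] + 1.0   # replacing vs. accumulating
        delta = r + gamma * V[s_next] - V[s]      # one-step TD error
        V += alpha * delta * e                    # credit states by their traces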
16
ALE
ALE is a wrapper around the Stella Atari 2600 emulator (61 games).
Learning from raw video data
ALE games are varied; ALE provides different challenges than classical testbeds
ALE has 18 actions, 5 of them basic (4 movements and no-op)
The screen is 210 x 160 pixels, each pixel a 7-bit color value
ALE supports a reduced (SECAM) color space; the paper maps the 128 colors to 8
The screen is encoded with a coarser grid at 14 x 16 resolution (sketched below)
Background subtraction
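A rough sketch of this kind of screen encoding; the 128-to-8 color mapping, the exact tiling, and the function names are illustrative assumptions, not the paper's precise feature set:

    import numpy as np

    def encode_screen(screen, background, n_colors=8, grid=(14, 16)):
        # screen, background: (210, 160) arrays of 7-bit color indices.
        fg = screen != background              # background subtraction
        colors = screen // (128 // n_colors)   # illustrative 128 -> 8 color map
        rows, cols = grid
        h, w = screen.shape
        feats = np.zeros((rows, cols, n_colors), dtype=bool)
        for i in range(rows):
            for j in range(cols):
                sl = (slice(i * h // rows, (i + 1) * h // rows),
                      slice(j * w // cols, (j + 1) * w // cols))
                for c in colors[sl][fg[sl]]:
                    feats[i, j, c] = True      # color c present in this tile
        return feats.ravel()                   # binary feature vector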
18
Exploration
Epsilon-greedy: epsilon depends on the game and the situation
Online methods are more sensitive (a reduced epsilon is used)
The value is problem-dependent and generally found by testing
Softmax policy (simulated annealing): a scalar temperature must be set
It is too sensitive (needs fine tuning), so results are not comparable
Optimistic initialization encourages exploration, but with non-linear function approximation the values can decrease even for states that have never been seen
The two selection rules are sketched below.
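Minimal sketches of the two action-selection rules (RNG seeding and parameter values are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    def epsilon_greedy(Q_s, epsilon=0.05):
        # With probability epsilon explore uniformly, otherwise act greedily.
        if rng.random() < epsilon:
            return int(rng.integers(len(Q_s)))
        return int(np.argmax(Q_s))

    def softmax_action(Q_s, temperature=1.0):
        # Boltzmann exploration: higher temperature -> closer to uniform.
        z = Q_s / temperature
        p = np.exp(z - z.max())       # subtract max for numerical stability
        p /= p.sum()
        return int(rng.choice(len(Q_s), p=p))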
20
Learning algorithms
SARSA(λ): eligibility traces
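Combining the on-policy SARSA target with state-action eligibility traces gives the full SARSA(λ) step. A minimal sketch with replacing traces (names and step sizes illustrative):

    import numpy as np

    def sarsa_lambda_step(Q, e, s, a, r, s_next, a_next,
                          alpha=0.1, gamma=0.99, lam=0.9):
        delta = r + gamma * Q[s_next, a_next] - Q[s, a]  # on-policy TD error
        e *= gamma * lam          # decay all traces
        e[s, a] = 1.0             # replacing trace for the visited pair
        Q += alpha * delta * e    # move every pair toward the TD target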
21
Learning algorithms
Q(λ): deaths caused by random exploratory actions are not charged to the greedy policy being learned, which can yield a better policy, but Q(λ) can diverge with function approximation.
ETTR(λ) (expected time to reward): it has the advantage of potentially being easier to learn, as it gets a noise-free signal whenever it actually reaches a positive reward. The disadvantages are a lack of long-term planning and poorer risk aversion.
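One practical wrinkle behind Q(λ)'s off-policy behavior is Watkins's trace cut: whenever an exploratory (non-greedy) action is taken, the eligibility traces are zeroed, since the trajectory no longer follows the greedy policy being evaluated. A hedged sketch of one such step (names illustrative):

    import numpy as np

    def watkins_q_lambda_step(Q, e, s, a, r, s_next, a_next,
                              alpha=0.1, gamma=0.99, lam=0.9):
        greedy_next = Q[s_next, a_next] == np.max(Q[s_next])
        delta = r + gamma * np.max(Q[s_next]) - Q[s, a]  # off-policy target
        e *= gamma * lam
        e[s, a] = 1.0
        Q += alpha * delta * e
        if not greedy_next:
            e[:] = 0.0   # cut all traces after an exploratory action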
22
R(λ)
Another class of reinforcement learning agents seeks to optimize the expected reward per time step instead. R-learning is the primary example of such a method in the off-policy case.
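Formally, the average-reward criterion is the long-run reward per step (the standard definition from the RL literature, not from the slides):

    \rho^{\pi} = \lim_{n \to \infty} \frac{1}{n}\,\mathbb{E}\!\left[\sum_{t=1}^{n} r_t\right]

with values defined relative to \rho^{\pi}, i.e. as expected sums of r_t - \rho^{\pi} rather than discounted returns.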
23
Learning algorithms
Actor-Critic
GQ(λ): gradient temporal-difference learning
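Gradient-TD methods such as GQ(λ) maintain a second weight vector so the update follows the gradient of a projected Bellman error, which keeps off-policy learning with linear function approximation stable. A hedged sketch of the TDC-style update that GQ(λ) builds on (the λ = 0, state-value case; names illustrative):

    import numpy as np

    def tdc_update(theta, w, phi, phi_next, r,
                   alpha=0.01, beta=0.1, gamma=0.99):
        # theta: value weights; w: auxiliary weights that estimate the
        # expected TD error as a linear function of the features phi.
        delta = r + gamma * theta @ phi_next - theta @ phi
        theta += alpha * (delta * phi - gamma * (w @ phi) * phi_next)
        w += beta * (delta - w @ phi) * phi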
30
Convergence
The percentage of converged trials, out of those that finished, for each method was: SARSA 85%, AC 80%, ETTR 84%, GQ 80%, Q 82%, R 85%.
31
DQN in ALE
General competency across a variety of tasks and domains, without domain-specific tailoring; learning a reusable, high-level understanding of the world from raw sensory data.
DQN achieved performance comparable to a human, but problematic aspects of its evaluation make the results difficult to fully interpret: the experiments exploit non-standard, game-specific prior information and report only one independent trial per game.
Which properties were most important to its success?
32
Question