From Exploration to Planning
Cornelius Weber and Jochen Triesch
Frankfurt Institute for Advanced Studies, Goethe University Frankfurt, Germany
18th International Conference on Artificial Neural Networks, 3rd to 6th September 2008, Prague
Reinforcement Learning
A fixed reactive system that always strives for the same goal.
[Figure: actor units, value unit, trained weights]
Reinforcement learning does not use the exploration phase to learn a general model of the environment that would allow the agent to plan a route to any goal. So let's do this.
Learning
[Figure: actor, state space]
Randomly move around the state space and learn world models (a sketch of the exploration phase follows below):
● associative model
● inverse model
● forward model
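A minimal sketch of what such an exploration phase could look like, assuming a small discrete grid world with one-hot state coding and four movement actions; the grid size, function names, and the random-walk policy are illustrative assumptions, not the exact setup of the talk:

# Exploration sketch (assumed setup: 5x5 grid world, one-hot states,
# four actions; illustrative, not the talk's exact formulation).
import numpy as np

GRID = 5                                   # hypothetical grid size
N_STATES = GRID * GRID
N_ACTIONS = 4                              # up, down, left, right
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def one_hot(idx, size):
    v = np.zeros(size)
    v[idx] = 1.0
    return v

def step(state_idx, action):
    """Move in the grid; bumping into the border leaves the agent in place."""
    r, c = divmod(state_idx, GRID)
    dr, dc = MOVES[action]
    r = min(max(r + dr, 0), GRID - 1)
    c = min(max(c + dc, 0), GRID - 1)
    return r * GRID + c

def explore(n_steps=10000, seed=0):
    """Random walk through the state space, yielding (s, a, s') transitions."""
    rng = np.random.default_rng(seed)
    s = int(rng.integers(N_STATES))
    for _ in range(n_steps):
        a = int(rng.integers(N_ACTIONS))
        s_next = step(s, a)
        yield s, a, s_next
        s = s_next

All three world models below can be trained from the same stream of (s, a, s') transitions produced by this random walk.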
Learning: Associative Model
Weights associate neighbouring states.
Use these to find any possible route between agent and goal.
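A sketch of how such associative weights could be learned and used, continuing the exploration sketch above; the binary co-occurrence weights and the decaying spreading-activation rule are simplifying assumptions:

# Associative model sketch: a weight connects state s to its successor s'.
# Binary co-occurrence weights stand in for a Hebbian learning rule.
W_assoc = np.zeros((N_STATES, N_STATES))
for s, _, s_next in explore():
    W_assoc[s_next, s] = 1.0

def spread(goal_idx, decay=0.9, n_iter=2 * GRID):
    """Spread activation backwards from the goal: states on possible routes
    to the goal get activation that decays with distance (a 'goal hill')."""
    act = one_hot(goal_idx, N_STATES)
    for _ in range(n_iter):
        backward = np.max(W_assoc * act[:, None], axis=0)  # from successors
        act = np.maximum(act, decay * backward)
    return act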
Learning: Inverse Model
Weights "postdict" the action given a state pair (Sigma-Pi neuron model).
Use these to identify the action that leads to a desired state.
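A sketch of the inverse model, continuing the code above. With one-hot state coding, a Sigma-Pi unit that multiplies the current and the desired next state reduces to indexing a weight tensor; the counting-style learning rule is an assumption:

# Inverse model sketch: Sigma-Pi action units receive multiplicative input
# from pairs (current state, desired next state). With one-hot states the
# weight tensor V[a, s', s] simply records which action led from s to s'.
V = np.zeros((N_ACTIONS, N_STATES, N_STATES))
for s, a, s_next in explore():
    V[a, s_next, s] += 1.0

def postdict_action(s_idx, s_next_idx):
    """'Postdict' the action that takes the agent from s to the desired s'."""
    return int(np.argmax(V[:, s_next_idx, s_idx]))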
Learning: Forward Model
Weights predict the state given a state-action pair.
Use these to predict the next state given the chosen action.
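A matching sketch of the forward model, again under the same one-hot grid-world assumptions and continuing the code above:

# Forward model sketch: predict the next state from the current state and
# the chosen action. F[s', a, s] counts observed outcomes during exploration.
F = np.zeros((N_STATES, N_ACTIONS, N_STATES))
for s, a, s_next in explore():
    F[s_next, a, s] += 1.0

def predict_next_state(s_idx, a):
    """Most frequently observed successor of taking action a in state s."""
    return int(np.argmax(F[:, a, s_idx]))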
Planning
[Figure: goal, actor units, agent]
Planning
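The planning phase can then be sketched by combining the three learned models, continuing the code above: the associative model spreads activation from the goal to form a goal hill, the agent repeatedly moves to the most active neighbouring state, the inverse model supplies the corresponding action, and the forward model checks the predicted outcome. The greedy hill-climbing loop below is an illustrative assumption about how these pieces fit together:

# Planning sketch: climb the 'goal hill' produced by the associative model,
# using the inverse model to select actions and the forward model to check them.
def plan(agent_idx, goal_idx, max_steps=4 * GRID):
    goal_hill = spread(goal_idx)
    s, route = agent_idx, [agent_idx]
    for _ in range(max_steps):
        if s == goal_idx:
            break
        neighbours = np.nonzero(W_assoc[:, s])[0]       # states reachable from s
        s_next = int(neighbours[np.argmax(goal_hill[neighbours])])
        a = postdict_action(s, s_next)                  # inverse model
        assert predict_next_state(s, a) == s_next       # forward model check
        route.append(s_next)
        s = s_next
    return route

# Example: plan a route from one grid corner to the opposite one.
print(plan(agent_idx=0, goal_idx=N_STATES - 1))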
Discussion
- reinforcement learning... if no access to full state space
- previous work... AI-like planners assume links between states
- noise... wide “goal hills” will have flat slopes
- shortest path... not taken; how to define?
- biological plausibility... Sigma-Pi neurons; winner-take-all
- to do: embedding... learn state space from sensor input
- to do: embedding... let the goal be assigned naturally
- to do: embedding... hand-designed planning phases
Acknowledgments
Collaborators:
Jochen Triesch, FIAS, J-W-Goethe University Frankfurt
Stefan Wermter, University of Sunderland
Mark Elshaw, University of Sheffield