World models and basis functions
Nisheeth
25th January 2019
Levels of analysis

Level           Description
Computational   What is the problem?
Algorithmic     How is the problem solved?
Implementation  How is this done by networks of neurons?
RL in the brain

What is the problem?
- Reinforcement learning: learning preferences for actions that lead to desirable outcomes

How is it solved?
- MDPs provide a general mathematical structure for solving decision problems under uncertainty
- RL was developed as a set of online learning algorithms for solving MDPs
- A critical component of model-free RL algorithms is the temporal difference (TD) signal

Implementation
- Hypothesis: is the brain implementing model-free RL?
- Spiking rates of dopaminergic neurons in the basal ganglia and ventral striatum behave as if they encode this TD signal
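To make the TD signal concrete, here is a minimal TD(0) sketch; the state names and the values of gamma and alpha are illustrative assumptions, not part of the slide.

```python
# Minimal TD(0) sketch. The TD error, delta = r + gamma * V(s') - V(s),
# is the quantity dopaminergic firing rates appear to track.
# gamma, alpha, and the state names are illustrative assumptions.
gamma, alpha = 0.95, 0.1
V = {"s0": 0.0, "s1": 0.0}  # value estimates for a toy two-state world

def td_update(s, r, s_next):
    delta = r + gamma * V[s_next] - V[s]  # TD error
    V[s] += alpha * delta                 # nudge V(s) toward the target
    return delta

# Example: a reward of 1 on the transition from s0 to s1
print(td_update("s0", 1.0, "s1"))  # prints 1.0 on the first update
```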
Implication?

Model-free learning:
- Learn the mapping from action sequences to rewarding outcomes
- Don’t care about the physics of the world that leads to different outcomes

Is this a realistic model of how human and non-human animals learn?
Learning maps of the world
Cognitive maps in rats and men (Tolman, 1948)
Rats learned a spatial model

- Rats behave as if they have some sense of p(s’|s,a)
- This was not explicitly trained; it generalized from previous experience
- The corresponding paper is recommended reading, as is Tolman’s biography
- http://psychclassics.yorku.ca/Tolman/Maps/maps.htm
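As a minimal sketch of what "having some sense of p(s'|s,a)" could amount to computationally, an agent can estimate transition probabilities by counting observed transitions and normalizing; the state and action names below are illustrative assumptions.

```python
from collections import defaultdict

# Count-based estimate of p(s'|s,a): tally observed transitions, normalize.
counts = defaultdict(lambda: defaultdict(int))

def observe(s, a, s_next):
    counts[(s, a)][s_next] += 1

def p_hat(s_next, s, a):
    total = sum(counts[(s, a)].values())
    return counts[(s, a)][s_next] / total if total else 0.0

# Example: a maze junction where turning left usually leads to the food arm
for _ in range(8):
    observe("junction", "left", "food_arm")
observe("junction", "left", "dead_end")
print(p_hat("food_arm", "junction", "left"))  # 8/9 ~ 0.89
```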
The model-free vs model-based debate

- Model-free learning: learn stimulus-response mappings = habits
- What about goal-based decision-making? Do animals not learn the physics of the world in making decisions?
- Model-based learning: learn what to do based on the way the world is currently set up = thoughtful responding?
- People have argued for two systems: thinking fast and slow (Balleine & O’Doherty, 2010)
A contemporary experiment

- The Daw task (Daw et al., 2011) is a two-stage Markov decision task
- It differentiates model-based and model-free RL accounts empirically
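A minimal simulation of the task's transition structure: each of two first-stage actions leads to one second-stage state commonly (70%) and to the other rarely (30%), as in Daw et al. (2011). The second-stage reward probabilities below are placeholder values standing in for the slowly drifting probabilities of the real task.

```python
import random

def first_stage(action):
    """action in {0, 1}; returns the second-stage state.
    Each action reaches its 'common' state with probability 0.7
    and the other ('rare') state with probability 0.3."""
    common = action
    return common if random.random() < 0.7 else 1 - common

def second_stage(state):
    """Placeholder reward probabilities; the real task drifts these slowly."""
    p_reward = (0.6, 0.4)[state]
    return 1 if random.random() < p_reward else 0

# Model-free agents tend to repeat rewarded first-stage actions regardless
# of transition type; model-based agents credit the transition structure.
state = first_stage(0)
reward = second_stage(state)
print(state, reward)
```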
Predictions meet data

- Behavior appears to be a mix of both strategies
- What does this mean? This is an active area of research
Some hunches

[Figure: results after moderate vs. extensive training (Holland, 2004; Killcross & Coutureau, 2003)]
Current consensus

- In moderately trained tasks, people behave as if they are using model-based RL
- In highly trained tasks, people behave as if they are using model-free RL

Nuance:
- Repetitive training on a small set of examples favors model-free strategies
- Limited training on a larger set of examples favors model-based strategies (Fulvio, Green & Schrater, 2014)
Big ticket application

- How to practically shift behavior from habitual to goal-directed in the digital space
- The reverse direction is understood pretty well by social media designers
The social media habituation cycle

[Figure: cycle diagram of the habituation loop, with "State" and "Reward" labels]
Designed based on cognitive psychology principles
Competing claims

- "First World kids are miserable!" (Twenge, Joiner, Rogers & Martin, 2017)
  https://journals.sagepub.com/doi/full/10.1177/2167702617723376
- "Not true!" (Orben & Przybylski, 2019)
  https://www.nature.com/articles/s41562-018-0506-1
Big ticket application

- How to change computer interfaces from promoting habitual to promoting thoughtful engagement
- This depends on being able to measure habitual vs thoughtful behavior online
The vocabulary of world maps
Basis functions
The state space problem in model-free RL

- The number of states quickly becomes too large, even for trivial applications
- Learning becomes too dependent on the right choice of exploration parameters
- Explore-exploit tradeoffs become harder to solve
- Example: tic-tac-toe already has a state space of 765 unique states (counting positions as equivalent up to symmetry)
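The 765 figure can be checked with a quick enumeration; a sketch, assuming the standard convention of counting all reachable positions (including the empty board) as equivalent under the 8 board symmetries:

```python
# Enumerate reachable tic-tac-toe positions up to board symmetry.
# The 8 symmetries of the 3x3 board, written as index permutations
# (new cell i takes the old cell p[i]):
SYMS = [
    (0,1,2,3,4,5,6,7,8), (2,5,8,1,4,7,0,3,6),  # identity, rotate CCW
    (8,7,6,5,4,3,2,1,0), (6,3,0,7,4,1,8,5,2),  # rotate 180, rotate CW
    (2,1,0,5,4,3,8,7,6), (6,7,8,3,4,5,0,1,2),  # left-right, top-bottom flips
    (0,3,6,1,4,7,2,5,8), (8,5,2,7,4,1,6,3,0),  # diagonal, anti-diagonal flips
]
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def canonical(b):
    return min(tuple(b[i] for i in p) for p in SYMS)

def has_winner(b):
    return any(b[i] == b[j] == b[k] != ' ' for i, j, k in LINES)

start = (' ',) * 9
seen = {canonical(start)}
frontier = [start]
while frontier:
    nxt = []
    for b in frontier:
        if has_winner(b) or ' ' not in b:
            continue  # terminal position: no further moves
        player = 'X' if b.count('X') == b.count('O') else 'O'
        for i in range(9):
            if b[i] == ' ':
                child = b[:i] + (player,) + b[i+1:]
                c = canonical(child)
                if c not in seen:
                    seen.add(c)
                    nxt.append(child)
    frontier = nxt
print(len(seen))  # 765
```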
Solution approach

- Cluster states
- Design features to stand in for important situation elements:
  - Close to win
  - Close to loss
  - Fork opportunity
  - Block fork
  - Center
  - Corner
  - Empty side
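A hypothetical feature map along these lines for tic-tac-toe; the exact encodings (and the omission of the fork features, which require deeper look-ahead) are illustrative assumptions:

```python
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def phi(board, me, opp):
    """board: 9-tuple over {'X', 'O', ' '}; returns a short feature vector.
    Fork-related features are omitted here for brevity."""
    def two_in_line(mark):
        # lines where `mark` holds two cells and the third is empty
        return sum(1 for L in LINES
                   if [board[i] for i in L].count(mark) == 2
                   and [board[i] for i in L].count(' ') == 1)
    return [
        two_in_line(me),                            # close to win
        two_in_line(opp),                           # close to loss
        int(board[4] == me),                        # center occupied
        sum(board[i] == me for i in (0, 2, 6, 8)),  # corners occupied
        sum(board[i] == me for i in (1, 3, 5, 7)),  # sides occupied
    ]

# Example: X threatens to win along the top row
print(phi(('X','X',' ', ' ','O',' ', ' ',' ','O'), 'X', 'O'))  # [1, 0, 0, 1, 1]
```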
Value function approximation

- RL methods have traditionally approximated the state value function using linear basis functions:

      V(s) ≈ θᵀφ(s) = Σ_k θ_k φ_k(s)

- θ is a K-valued parameter vector, where K is the number of features in the feature map φ
- Implicit assumption: all features contribute independently to the evaluation
Function approximation in Q-learning

- Approximate the Q table with linear basis functions:

      Q(s,a) ≈ θᵀφ(s,a)

- Update the weights:

      θ ← θ + α δ φ(s,a)

  where δ is the TD term: δ = r + γ max_a' Q(s',a') − Q(s,a)

- More details in next class
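A minimal sketch of this update, assuming a feature map phi(s, a) is available (e.g., the tic-tac-toe sketch above); K, alpha, and gamma are illustrative values:

```python
import numpy as np

K = 5                    # number of features (illustrative)
theta = np.zeros(K)      # weight vector replacing the Q table
alpha, gamma = 0.1, 0.95

def q_value(phi_sa):
    """Q(s,a) = theta . phi(s,a) under the linear approximation."""
    return theta @ phi_sa

def q_update(phi_sa, reward, phi_next_best):
    """One Q-learning step. phi_next_best holds the features of
    (s', a*) where a* = argmax_a' Q(s', a')."""
    global theta
    delta = reward + gamma * q_value(phi_next_best) - q_value(phi_sa)  # TD term
    theta += alpha * delta * phi_sa  # move weights along the TD gradient
    return delta

# Example with arbitrary feature vectors
print(q_update(np.array([1., 0., 0., 1., 1.]), 1.0, np.zeros(K)))  # 1.0
```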