NIPS Workshops 12/10/05 Does RL Occur Naturally? C. R. Gallistel Rutgers Center for Cognitive Science

Turing’s Vision (‘47-’48)
“It would be quite possible to have the machine try out behaviors and accept or reject them…”
“What we want is a machine that can learn from experience. The possibility of letting the machine alter its own instructions provides the mechanism for this…”
“It might be possible to carry through the organizing [of a learning machine] with only two interfering inputs, one for reward (R) or pleasure and the other for pain or punishment (P). It is intended that pain stimuli occur when the machine’s behavior is wrong, pleasure stimuli when it is particularly right.”

A Different Vision
- Policy (what to do given a state of the world) is pre-specified and immutable
- Learning consists in determining the state of the world; it’s all model estimation
- Appropriate sampling behavior is itself prespecified

The Deep Reasons
- Wolpert & Macready’s “No Free Lunch” theorems
- Chomsky’s “Poverty of the Stimulus” argument
- Bottom line: reinforcement learning takes too long
  - because there is not enough information in the R & P signals
  - because learning in the absence of a highly structured hypothesis space is a practical impossibility (we don’t live long enough)

Learning by Integrating
- The ant knows where it is
- This knowledge is acquired (learned)
- It is acquired by path integration
--Harkness & Maroudas, 1985
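
Path integration (dead reckoning) just sums successive displacement vectors to maintain a current-position estimate. Below is a minimal sketch in Python; the legs, the compass convention, and the function names are illustrative, not a model of the ant's actual neural computation.

```python
import math

def path_integrate(legs):
    """Sum (compass_bearing_deg, distance) legs into a nest-to-ant vector."""
    east = north = 0.0
    for bearing_deg, dist in legs:
        theta = math.radians(bearing_deg)  # compass convention: 0 = N, 90 = E
        east += dist * math.sin(theta)
        north += dist * math.cos(theta)
    return east, north

# Outbound path: 5 m north, 3 m east, 2 m northeast.
east, north = path_integrate([(0, 5.0), (90, 3.0), (45, 2.0)])
home_bearing = math.degrees(math.atan2(-east, -north)) % 360  # reversed vector
home_range = math.hypot(east, north)
print(f"homeward course: {home_bearing:.0f} deg, range {home_range:.2f} m")
```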

Building a Map
- The ant remembers where the food was (records its coordinates)
- Bees & ants make a map by the GPS principle (record location coordinates--& views)
- They do not discover by trial and error that this is a good thing to do
- As in the GPS, the computational machinery to determine a course from an arbitrary location to an arbitrary location is built in
- No RL here
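
With stored coordinates, the built-in course computation is nothing more than vector subtraction. A sketch under that assumption (the coordinates and names are hypothetical):

```python
import math

def course_to(current, target):
    """Range and compass bearing from current (east, north) to target (east, north)."""
    de, dn = target[0] - current[0], target[1] - current[1]
    rng = math.hypot(de, dn)
    bearing = math.degrees(math.atan2(de, dn)) % 360  # 0 = N, clockwise
    return rng, bearing

# Food recorded at (12, 30) on an earlier trip; ant currently at (-4, 7).
rng, bearing = course_to((-4.0, 7.0), (12.0, 30.0))
print(f"set course {bearing:.0f} deg for {rng:.1f} m")
```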

Ranging Behavior
- When leaving a new food source or a new nest (hive), bees & wasps fly backwards in an ever-increasing zigzag
- They determine the distances of visual features by parallax
- This is innately specified sampling (model-building) behavior
Wehner, 1981
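
For an observer translating at speed v, a stationary feature at distance D slips across the retina at angular velocity omega = v/D, so distance follows as D = v/omega. A minimal illustration (the speed and angular-velocity figures are made up):

```python
import math

def distance_from_parallax(speed_m_s, angular_velocity_deg_s):
    """Motion parallax: a feature at distance D sweeps omega = v / D,
    so D = v / omega."""
    omega = math.radians(angular_velocity_deg_s)
    return speed_m_s / omega

# A wasp arcing sideways at 0.5 m/s sees a landmark slip by at 10 deg/s.
print(f"landmark distance ~ {distance_from_parallax(0.5, 10.0):.2f} m")
```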

Also in the Locust
- Locust scanning (peering) -- Sobel, 1990
- Sobel moved the target during scanning so as to make the retinal image motion (the parallax signal) independent of the distance D
- This reproduced the function relating take-off velocity to D

Learning by Parameter Estimation
- Animals (including insects) use the sun as a compass reference
- To do this, they must learn the solar ephemeris: the sun’s compass bearing as a function of the time of day--where it is when
- The solar ephemeris varies with latitude and season

Learning from the Dance
- A returning forager does a dance to tell the other foragers the location (range & bearing) of a source
- The compass bearing of the source, β, is specified by giving its bearing relative to the sun, δ, together with the current solar bearing, σ: β = σ + δ
- Range is specified by the number of waggles
- Hopeless as an RL problem?
σ = compass bearing of sun
β = compass bearing of source
δ = solar bearing of source
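
Decoding the dance is then one line of modular arithmetic. A toy sketch (the numbers are invented):

```python
def source_bearing(solar_bearing_deg, dance_angle_deg):
    """Compass bearing of the source: beta = (sigma + delta) mod 360, where
    sigma is the sun's current compass bearing and delta is the source's
    bearing relative to the sun, read off the waggle-run angle."""
    return (solar_bearing_deg + dance_angle_deg) % 360

# Sun currently at 210 deg; waggle run 40 deg clockwise of the solar direction.
print(source_bearing(210, 40))  # -> 250
```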

Ephemeris Framework

Deceived Dancing Dyer, 1987

Poverty of the Stimulus
Dyer & Dickinson, 1994
- Incubator-raised bees were allowed to forage at a station due west of the hive, but only in the late afternoon, when the sun was declining in the west
- On a heavily overcast day, the hive was moved to a new field line with a different compass orientation, and the bees were allowed to forage in the morning (with the feeder “west” of the hive location)
- The experimenter observes the dances of returning foragers to estimate where they believe the sun to be

Bees Believe Earth is Round

Implications
- The form of the solar ephemeris equation is built into the nervous system
- Only its parameters are estimated from observation
- This solves the poverty-of-the-stimulus problem: the information about the universal properties of the ephemeris is in the priors
- A neural net without this prior information could not generalize as the bees do
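
A sketch of what "form built in, parameters estimated" could look like computationally. The logistic azimuth template below is only an illustrative stand-in for the innate ephemeris form (the real function is not logistic), and the sightings and parameter grids are invented; the point is that a few afternoon observations suffice to pin down predictions for the whole day.

```python
import math

def ephemeris(t, t_noon, k):
    """Fixed innate form: azimuth sweeps from due east (90 deg) to due west
    (270 deg), passing south at solar noon; t_noon and steepness k are the
    only free parameters."""
    return 90 + 180 / (1 + math.exp(-k * (t - t_noon)))

def fit(observations, t_noons, ks):
    """Grid-search the two parameters against a handful of sun sightings."""
    sse = lambda tn, k: sum((ephemeris(t, tn, k) - az) ** 2
                            for t, az in observations)
    return min(((sse(tn, k), tn, k) for tn in t_noons for k in ks))[1:]

# Three late-afternoon sightings (hour of day, azimuth in degrees).
obs = [(15.0, 238.0), (16.0, 250.0), (17.0, 258.0)]
t_noon, k = fit(obs, t_noons=[11.5, 12.0, 12.5], ks=[0.4, 0.6, 0.8, 1.0])
print(f"predicted 9 am azimuth: {ephemeris(9.0, t_noon, k):.0f} deg")  # ~east
```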

Language Learning
- Same story?
- An innate universal grammar specifies the structure common to all languages
- Distinctions between languages are due to differences in parameters (e.g., head-final versus head-first)
- Learning a language reduces to learning the (binary?) parameter values
Mark Baker (2001), The Atoms of Language

Natural Learning Curves
Gallistel et al. (PNAS, 2004) analyzed individual(!) learning curves from standard paradigms in pigeons, rats, rabbits and mice:
- Pavlovian (autoshaping in pigeon, rat & mouse)
- Eyeblink conditioning in rabbit
- Plus maze in rat
- Water maze in mouse
Regardless of paradigm, the typical curve cannot be distinguished from a step function. The latency and size of the step vary between subjects. Averaging across these steps produces a gradual learning curve: its gradualness is an averaging artifact.
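
The averaging artifact is easy to demonstrate in simulation: give every simulated subject an abrupt, step-like acquisition with a random onset trial, and the group mean nonetheless rises smoothly. A minimal sketch (all numbers arbitrary):

```python
import random

random.seed(1)
N_SUBJECTS, N_TRIALS = 50, 100

# Each subject acquires abruptly: performance steps from 0 to 1 at a
# random onset trial; only the latency varies across subjects.
curves = [[0.0 if t < onset else 1.0 for t in range(N_TRIALS)]
          for onset in (random.randint(10, 80) for _ in range(N_SUBJECTS))]

# The group mean rises gradually even though no individual curve does.
mean_curve = [sum(c[t] for c in curves) / N_SUBJECTS for t in range(N_TRIALS)]
print([round(m, 2) for m in mean_curve[::10]])
```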

Matching
- Subjects forage back and forth between locations where food becomes available unpredictably (on random-rate schedules with unlimited holds)
- Subjects match the ratio of the times they invest in the locations (expected stay durations, T_1/T_2) to the ratio of the incomes they have derived from them (I_1/I_2)
- Matching equates returns: R_i = I_i/T_i, and I_1/T_1 = I_2/T_2 iff T_1/T_2 = I_1/I_2

RL Models
- Most assume hill-climbing discovery of the policy that equates returns
- The policy is one-dimensional (the ratio of expected stay durations)
- Try out a given policy (stay ratio)
- Determine the direction of the inequality in returns
- Adjust the investment ratio accordingly
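
A sketch of the kind of hill-climbing model the slide describes, to make clear why such models predict gradual adjustment. The environment model is deliberately crude (with unlimited holds, incomes are taken to equal the programmed rates), and the rates and learning rate are invented:

```python
R1, R2 = 3.0, 1.0  # programmed reward rates (rewards per minute)

def step(stay_ratio, lr=0.05):
    """One hill-climbing update on the one-dimensional policy (T1/T2):
    compare returns (income per unit time invested) and nudge the ratio
    in the direction of the higher return."""
    p1 = stay_ratio / (1 + stay_ratio)        # fraction of time at location 1
    return1, return2 = R1 / p1, R2 / (1 - p1)
    return stay_ratio * (1 + lr) if return1 > return2 else stay_ratio / (1 + lr)

ratio = 1.0
for _ in range(40):
    ratio = step(ratio)
print(f"stay ratio after 40 adjustments: {ratio:.2f} (matching predicts 3.00)")
```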

But (Gallistel et al., 2001)
The adjustment of the investment ratio after a step change in the relative rates of reward is quick and step-like.

Bayesian Ideal Detector Analysis

Second Example

Δ Incomes, Not Δ Returns
- Evidence of a change in behavior appears as soon as there is evidence of a change in incomes
- And (often) before there is evidence of a change in returns

Evidence of Absence of Evidence
- Upper panel: the odds that the subject’s stay durations had changed, as a function of session time
- Lower panel: the odds that the subject’s returns had changed. There was no evidence of a change--in the returns!
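
The flavor of a Bayesian ideal-detector computation can be conveyed with a small change-point sketch. This is not the paper's actual analysis; it assumes exponentially distributed stay durations with a Gamma prior on the rate, and the simulated data are invented:

```python
import math, random

def log_marginal(xs, a=1.0, b=1.0):
    """Log marginal likelihood of exponential data, rate integrated out
    against a Gamma(a, b) prior."""
    n, s = len(xs), sum(xs)
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + n) - (a + n) * math.log(b + s))

def log_odds_change(xs):
    """Log odds for 'one change point somewhere' vs 'no change', with a
    uniform prior over the possible change points."""
    null = log_marginal(xs)
    terms = [log_marginal(xs[:c]) + log_marginal(xs[c:]) - math.log(len(xs) - 1)
             for c in range(1, len(xs))]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms)) - null

random.seed(0)
# Stay durations: mean 4 s for 30 visits, then mean 1 s after the rates change.
stays = ([random.expovariate(0.25) for _ in range(30)]
         + [random.expovariate(1.0) for _ in range(30)])
print(f"log odds of a change in stay durations: {log_odds_change(stays):.1f}")
```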

Implications
- Matching is an innate policy
- It depends only on estimates of incomes
- Anti-aliasing sampling behavior, to detect periodic structure in reward provision, is built into the policy
- Estimates of the incomes to be expected are based on small samples, taken only when a change in income is detected
- Here, too, learning is model updating, not policy-value updating
- Subjects perversely ignore returns (policy values)

Conclusions
- Most (all?) natural learning looks like model estimation
- Efficient model estimation is made possible by:
  - informative priors (a highly structured, problem-specific hypothesis space)
  - innately specified efficient sampling routines