TRIALS AND TRIBULATIONS: Architectural Constraints on Modeling a Visuomotor Task within the Reinforcement Learning Paradigm
SUBJECT OF INVESTIGATION
How humans integrate visual object properties into their action policy when learning a novel visuomotor task (BubblePop!).
Problem: too many possible questions…
Solution: motivate behavioral research by looking at modeling difficulties, the nonobvious crossroads.
APPROACH
Since the task provides only a scalar performance signal, the model must use reinforcement learning: temporal-difference learning with backpropagation (sketched below).
Start with an extremely simplified version of the task and add the complexity back once you have a successful model.
Analyze the representational and architectural constraints necessary for each model.
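As a rough illustration of this approach (a minimal sketch, not the authors' code; the layer sizes, learning rate, and discount factor are assumptions), here is a TD(0) update for an expected-reward estimate computed by a one-hidden-layer network and trained by backpropagating the TD error:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 25, 8            # e.g. a flattened 5x5 grid input, 8 hidden units
W1 = rng.normal(0, 0.1, (n_hid, n_in))
W2 = rng.normal(0, 0.1, (1, n_hid))
alpha, gamma = 0.05, 0.9       # learning rate and discount factor (assumed)

def value(x):
    """Expected-reward estimate V(x) for a flattened grid input x."""
    h = np.tanh(W1 @ x)
    return (W2 @ h)[0], h

def td_update(x, r, x_next, terminal):
    """One TD(0) step: nudge V(x) toward the target r + gamma * V(x_next)."""
    global W1, W2
    v, h = value(x)
    v_next = 0.0 if terminal else value(x_next)[0]
    delta = r + gamma * v_next - v                        # TD error
    # Backpropagate the TD error through both layers (semi-gradient update).
    dW2 = delta * h[None, :]
    dW1 = delta * np.outer(W2[0] * (1.0 - h ** 2), x)     # derivative through tanh
    W2 = W2 + alpha * dW2
    W1 = W1 + alpha * dW1
    return delta
```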
FIRST STEPS: DUMMY WORLD
5x5 grid world
4 possible actions: up, down, left, right
1 unmoving target
Starting locations of target and agent randomly assigned
Fixed reward upon reaching the target, then a new target is generated
Epoch ends after a fixed number of steps
(A minimal sketch of this world follows.)
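A minimal sketch of the dummy world as described above; the reward value, step limit, and grid orientation are illustrative assumptions, not the original implementation:

```python
import random

class DummyWorld:
    # (dx, dy) per action; which direction counts as "up" is an assumed convention.
    ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self, size=5, max_steps=50, reward=1.0):
        self.size, self.max_steps, self.reward = size, max_steps, reward
        self.reset()

    def reset(self):
        self.steps = 0
        self.agent = self._random_cell()
        self.target = self._random_cell()
        return self.agent, self.target

    def _random_cell(self):
        return (random.randrange(self.size), random.randrange(self.size))

    def step(self, action):
        dx, dy = self.ACTIONS[action]
        x, y = self.agent
        # Moves that would leave the grid keep the agent in place.
        self.agent = (min(max(x + dx, 0), self.size - 1),
                      min(max(y + dy, 0), self.size - 1))
        self.steps += 1
        r = 0.0
        if self.agent == self.target:
            r = self.reward                      # fixed reward on reaching the target
            self.target = self._random_cell()    # and a new target is generated
        done = self.steps >= self.max_steps      # epoch ends after a fixed number of steps
        return (self.agent, self.target), r, done
```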
DUMMY WORLD ARCHITECTURES
25 input units for the grid
8 hidden units
4 action outputs
1 expected-reward output (egocentric only)
Input codes either the whole grid (allocentric) or an agent-centered view (egocentric); both encodings are sketched below.
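One reading of the two input encodings contrasted here (an illustration, not the authors' code): both yield 25 input units, but the allocentric code marks absolute grid cells while the egocentric code marks the target's position relative to the agent, so the same relative offset always produces the same input. The wrap-around offset is an assumed choice.

```python
import numpy as np

SIZE = 5

def allocentric(agent, target):
    """One 5x5 map of the whole world: mark the agent and target cells."""
    x = np.zeros((SIZE, SIZE))
    x[agent[1], agent[0]] = 1.0
    x[target[1], target[0]] = -1.0    # sign distinguishes the two objects
    return x.ravel()                  # 25 input units

def egocentric(agent, target):
    """A 5x5 window centered on the agent: only the target's offset is coded."""
    x = np.zeros((SIZE, SIZE))
    dx = (target[0] - agent[0]) % SIZE   # wrap offsets into the window (assumption)
    dy = (target[1] - agent[1]) % SIZE
    x[dy, dx] = 1.0
    return x.ravel()                     # 25 input units, agent implicitly at the origin
```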
BUILDING IN SYMMETRY
Current architectures learn each action independently, yet ‘up’ is like ‘down’, but different: it just shifts the world the other way.
So instead of 4 independent actions, use 1 action and 4 rotated inputs: "In which rotation of the world would you rather go ‘up’?"
(A sketch of this trick follows.)
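A sketch of that symmetry trick under my own assumptions: learn a single ‘up’ evaluator and feed it the egocentric input rotated into each of the four orientations; the chosen action is the rotation whose ‘up’ scores best. The exact rotation-to-action mapping depends on the grid's coordinate convention.

```python
import numpy as np

def choose_action(ego_input_5x5, up_value):
    """ego_input_5x5: 2D egocentric input; up_value: maps a flattened input to a scalar."""
    actions = ["up", "left", "down", "right"]   # assumed mapping for counterclockwise rotations
    scores = []
    for k, action in enumerate(actions):
        rotated = np.rot90(ego_input_5x5, k)    # rotate the world, not the policy
        scores.append(up_value(rotated.ravel()))
    # "In which rotation of the world would you rather go up?"
    return actions[int(np.argmax(scores))]
```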
WORLD SCALING
Scaled grid size up to 10x10; not as unrealistic as one might think… (tile coding, sketched below)
Scaled the number of targets: there is a difference from 1 to 2 targets, but not from 2 to many.
Confirmed the ‘winning-est’ representation
Added memory
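A hedged sketch of tile coding, the idea alluded to above: a continuous position is covered by several coarse, offset grids ("tilings"), so a 10x10-scale discretization can stand in for continuous screen coordinates. The parameters are illustrative.

```python
import numpy as np

def tile_code(pos, n_tilings=4, tiles_per_dim=10, lo=0.0, hi=1.0):
    """Return a flat binary feature vector with one active tile per tiling."""
    features = np.zeros(n_tilings * tiles_per_dim * tiles_per_dim)
    width = (hi - lo) / tiles_per_dim
    for t in range(n_tilings):
        offset = t * width / n_tilings           # each tiling is shifted slightly
        ix = int(np.clip((pos[0] - lo + offset) / width, 0, tiles_per_dim - 1))
        iy = int(np.clip((pos[1] - lo + offset) / width, 0, tiles_per_dim - 1))
        features[t * tiles_per_dim ** 2 + iy * tiles_per_dim + ix] = 1.0
    return features
```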
NO LOW-HANGING FRUIT: THE RIPENESS PROBLEM
Added a ‘ripeness’ dimension to each target and changed the reward function: if target.ripeness > 0.60, reward = 1; else reward = -0.66667.
How the problem occurs:
1. At a high temperature you move randomly.
2. The random pops net zero reward.
3. The temperature lowers and you ignore the target entirely.
(The reward rule and temperature-driven exploration are sketched below.)
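An illustrative sketch of the reward rule above together with Boltzmann (softmax) exploration under a temperature parameter; the exploration scheme is my assumption about what "temperature" refers to here. If ripeness is uniform, the expected value of a random pop is about 0.4 * 1.0 + 0.6 * (-0.66667) ≈ 0, so nothing pulls the agent toward targets before the temperature drops and exploration stops.

```python
import numpy as np

def pop_reward(ripeness):
    """Reward for popping a target, per the slide's rule."""
    return 1.0 if ripeness > 0.60 else -0.66667

def softmax_action(action_values, temperature):
    """Pick an action stochastically; high temperature ~ uniform, low ~ greedy."""
    prefs = np.asarray(action_values, dtype=float) / max(temperature, 1e-6)
    prefs -= prefs.max()                     # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(np.random.choice(len(probs), p=probs))
```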
ANNEALING AWAY THE CURSE OF PICKINESS
A PSYCHOLOGICALLY PLAUSIBLE SOLUTION
There is no feedback for ‘almost ripe’, so how could we anneal the ripeness criterion itself?
Instead, anneal how much you care about unripe pops.
Differentiate the internal and external reward functions (sketched below).
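A minimal sketch of that proposal, with the annealing schedule as my own assumption: the external (task) reward stays fixed, while the internal reward the learner trains on starts lenient about unripe pops and anneals toward the true penalty, so early pops still provide learnable feedback.

```python
def external_reward(ripeness):
    """The fixed task reward, as defined on the previous slide."""
    return 1.0 if ripeness > 0.60 else -0.66667

def internal_reward(ripeness, progress):
    """progress runs from 0.0 (start of training) to 1.0 (end of training)."""
    if ripeness > 0.60:
        return 1.0
    # Early on, unripe pops cost almost nothing; the penalty grows toward its
    # external value as training proceeds (a simple linear anneal, assumed).
    return progress * (-0.66667)
```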
FUTURE DIRECTIONS
Investigate how the type of ripeness difficulty impacts computational demands: difficulty due to reward schedule vs. perceptual acuity vs. redundancy vs. conjunctiveness vs. ease of prediction.
How to handle the ‘feature binding problem’ in this context: emergent binding through deep learning?
Just keep increasing complexity and see what problems crop up; if the model gets to human-level performance without a hitch, that'd be pretty good too.
SUMMARY & DISCUSSION
Egocentric representations pay off in this domain, even with the added memory cost. In any domain with a single agent?
Symmetries in the action space can be exploited to greatly expedite learning. Could there be a general mechanism for detecting such symmetries?
Difficult reward functions might be learnt via annealing internal reward signals. How could we have this annealing emerge from the model?
QUESTIONS?