Advancing Motivated Learning with Goal Creation
James Graham¹, Janusz A. Starzyk¹·², Zhen Ni³ and Haibo He³
¹School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, USA
²University of Information Technology and Management, Rzeszow, Poland
³Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA
Overview
- Introduction
- Enhancements to Motivated Learning
  - Bias calculation
  - Use of desirability and availability
  - Probabilistic goal selection
- Desired Resource Levels
  - Resource level as an optimization problem
  - Resource dependencies
  - Desirability calculations
- Comparison to RL algorithms
- Conclusions
Motivated Learning
- Controlled by underlying "primitive" motivations
- Builds on these motivations to create additional "abstract" motivations, organized in a motivation hierarchy (figure: hierarchy of intrinsic and extrinsic motivations)
- Unlike in RL, the focus is not on maximizing externally set rewards, but on intrinsic rewards and on creating mission-related new goals and motivations
- Motivated Learning is part of machine learning and is intended for general-purpose autonomous learning (e.g., rovers)
Improvements to ML
- Bias/pain calculations
- Resource availability
- Learning to select actions
- Probabilistic goal selection
- Determining desired resource levels
Significance of bias signals
- Initially the agent has only primitive needs (no biases)
- Bias is the foundation for the creation of new needs
- Bias is a preference for, or aversion to, something (a resource or an action)
- Bias results from an existing need being helped or hurt by a resource or action
- The level of bias is measured relative to the availability of a resource or the likelihood of an action
- Measures are presented on the next slide
Bias based on availability and desirability
- Availability-based bias: a bias signal triggers an abstract pain and is defined depending on the type of perceived situation
- Bias reflects the level of need associated with the resource/action
- Bias based on availability is more general than a bias calculated strictly for an action or a resource; it can also be adjusted for desirability
- Rd is a desired resource value (at sensory input si)
- Rc is a current resource value
- A is the availability calculation
- dc is the current distance to another agent
- dd is a desired (comfortable) distance to another agent
Bias based on availability and desirability (continued)
- Separate bias formulas are used for: a desired resource, a desired action, an undesired action, and an undesired resource
- The chosen equation is linear (but does not diverge to infinity)
- For a desired resource we want a non-zero bias both when there is too little and when there is too much of the resource
- We also prefer "exponential" (nonlinear) growth of bias for desired situations, while a linear calculation suffices for undesired situations (see the sketch below)
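The exact bias equations are not reproduced on this slide, so the sketch below only illustrates the qualitative shape described above: nonzero bias for both a deficit and an excess of a desired resource, nonlinear growth for desired situations, and linear growth for undesired ones. The function names and the availability ratio Rc/(Rd + ε) are illustrative assumptions, not the paper's definitions.

```python
def availability(r_current, r_desired, eps=1e-6):
    # Illustrative availability measure: how well the current level covers the desired level.
    return r_current / (r_desired + eps)

def bias_desired_resource(r_current, r_desired, gamma=1.0):
    # Nonzero bias for both too little and too much of a desired resource,
    # growing nonlinearly with the mismatch (assumed quadratic here for illustration).
    a = availability(r_current, r_desired)
    return gamma * (a - 1.0) ** 2

def bias_undesired_resource(r_current, r_desired, gamma=1.0):
    # Linear growth with availability for undesired situations (bounded, no division blow-up).
    return gamma * availability(r_current, r_desired)

if __name__ == "__main__":
    for rc in (0.0, 5.0, 10.0, 20.0):
        print(rc, bias_desired_resource(rc, 10.0), bias_undesired_resource(rc, 10.0))
```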
Probabilistic goal selection
- How do we go from bias to action selection? Bias is used to calculate pain, and changes in pain reflect how an action affects the agent
- We therefore update the weights between pains and goals, and use these weights to select actions/goals
- Normalized wPG weights are used to select actions probabilistically; the previous, deterministic selection settled on the first valid action found, while the probabilistic version explores a wider range of possible actions
- The previous wPG calculation could lead to weight saturation at αg, so we use an update whose weights saturate at (3/π)·atan(ds/dŝ)
- The ratio ds/dŝ measures how useful an action is at restoring a resource; λp denotes the pain reduction
- Goals themselves are learned through the pain-goal weight updates (see "Learning and selecting actions")
- A minimal sketch of selecting a goal from normalized weights follows below
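A minimal sketch of the probabilistic selection step, assuming w_pg holds the non-negative pain-goal weights for the currently dominant pain. Only the normalize-and-sample step is shown; the weight update rule that saturates at (3/π)·atan(ds/dŝ) is not reproduced here. The uniform fallback for all-zero weights is an added assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_goal(w_pg):
    """Sample a goal index with probability proportional to its wPG weight."""
    w = np.clip(np.asarray(w_pg, dtype=float), 0.0, None)
    if w.sum() == 0.0:
        return rng.integers(len(w))            # no learned preference yet: explore uniformly
    return rng.choice(len(w), p=w / w.sum())   # otherwise sample proportionally to the weights

# Example: three candidate goals for one pain signal.
print(select_goal([0.1, 0.6, 0.3]))
```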
Probabilistic goal selection – wPG weights
- Weights either saturate at a level determined by ds/dŝ or tend toward zero
- Figure 1 shows wPG weights without probabilistic selection
- Figure 2 shows wPG weights when there are 3 valid actions for a specific pain
- The figures show that probabilistic ML tests all actions and learns three different actions that can reduce the pain, compared to only one action learned with the non-probabilistic approach
Probabilistic goal selection – wBP weights
- Here we show how wBP weights are affected by the different goal selection approaches (left panel: without probabilistic selection; right panel: with probabilistic selection)
- While significantly noisier, due to probabilistically choosing more "incorrect" actions, the wBP weights of Fig. 7 indicate that the agent discovers the usefulness of resources significantly earlier in the simulation when using probabilistic goal selection
- wBP weight levels remain similar (the most significant resources remain high)
- Object 8 is not discovered at all on the left, while on the right it is discovered around step 2500
- When actions are not taken according to the policy, there is higher randomness in the wBP weight adjustment (more "invalid" actions for the discovered needs)
Determining desired resource levels
- Having covered action selection, the agent must also decide how much of each resource it needs: desired and current resource levels affect the bias calculation
- Desired values should be set according to the agent's needs
- To begin, the agent is given the initial "primitive" resource level, Rdp
- The agent must learn the rate at which "desired" resources are used (∆p)
- The agent can use its knowledge of the environment to set the desired resource levels
- Resource levels are established only for resources that the agent cares about
- The frequency of performing tasks cannot be too great, since the agent's time is limited and the agent also needs time to learn
- A sketch of estimating the usage rate ∆p from observations follows below
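A minimal sketch of estimating a resource's usage rate ∆p from successive observations, assuming the agent can read the current resource level at each step. The exponential-moving-average estimator is an illustrative choice, not the paper's method.

```python
class UsageRateEstimator:
    def __init__(self, smoothing=0.1):
        self.smoothing = smoothing
        self.rate = 0.0          # estimated units consumed per step (Delta_p)
        self.prev_level = None

    def update(self, current_level):
        if self.prev_level is not None:
            consumed = max(self.prev_level - current_level, 0.0)   # ignore restorations
            self.rate += self.smoothing * (consumed - self.rate)   # smoothed usage rate
        self.prev_level = current_level
        return self.rate

est = UsageRateEstimator()
for level in (10.0, 9.5, 9.0, 8.6, 8.1):
    print(est.update(level))
```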
Determining desired resource levels – optimization
- To establish the optimum levels of desired resources, we solve an optimization problem subject to constraints, with the additional requirement that the sum of all restoration frequencies is less than 1
- αi is the coefficient for resource i
- fŝ, the node frequency, is the resource coefficient αŝ times the sum of all lower-level frequencies that depend on it
- A sketch of propagating these frequencies through resource dependencies and checking the time-budget constraint follows below
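A minimal sketch of propagating restoration frequencies through resource dependencies and checking the time-budget constraint (sum of frequencies less than 1). The dependency structure, the alpha coefficients, and the leaf frequencies below are illustrative assumptions; the slide only states that a node's frequency is its alpha times the sum of the lower-level frequencies that depend on it.

```python
def node_frequency(resource, alpha, depends_on, leaf_freq):
    """Frequency of restoring `resource`: alpha_s times the sum of dependent lower-level frequencies."""
    children = depends_on.get(resource, [])
    if not children:
        return leaf_freq[resource]
    return alpha[resource] * sum(
        node_frequency(c, alpha, depends_on, leaf_freq) for c in children
    )

# Hypothetical three-level hierarchy: 'food' is primitive, 'money' buys food, 'work' earns money.
alpha = {"money": 0.5, "work": 0.25}
depends_on = {"money": ["food"], "work": ["money"]}
leaf_freq = {"food": 0.2}            # primitive restoration frequency per step

freqs = {r: node_frequency(r, alpha, depends_on, leaf_freq) for r in ("food", "money", "work")}
print(freqs, "feasible:", sum(freqs.values()) < 1.0)
```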
Determining desired resource levels – example
- The agent starts with the levels for multiple resources set to the initially observed environment state
- As it learns to use specific resources, it adjusts the levels at which it wants to maintain them
- Each resource equilibrates to a different level
- The initial setting was not optimal, so we can observe some shuffling in the level order
Reinforcement Learning
- Maximizes an external reward
- Learns by approximating value functions (usually a single function)
- May include "subgoal" generation and "curiosity"
- Primarily reactive
- Objectives are set by the designer
Motivated Learning
- Controlled by underlying motivations
- Uses existing motivations to create additional "abstract" motivations
- The ML focus is not on maximizing externally set objectives (as in RL), but on learning new motivations and building and supporting its internal reward system
- Minimax: minimize the dominant pain
- Primarily deliberative
Comparison to other RL algorithms
Algorithms tested:
- Q-learning
- SARSA
- Hierarchical RL (MAXQ)
- Neural Fitted Q Iteration (NFQ)
- TD-FALCON
A minimal sketch of the tabular Q-learning baseline is shown below.
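A minimal tabular Q-learning baseline of the kind used in the comparison. The environment interface (reset(), step(action), an .actions list) and the hyperparameters are illustrative assumptions, not the exact experimental setup; a matching environment sketch appears with the test-environment slide below.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, max_steps=200, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)                          # Q[(state, action)] -> value
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            if random.random() < epsilon:           # epsilon-greedy exploration
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = max(Q[(next_state, a)] for a in env.actions)
            # Standard Q-learning temporal-difference update.
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
            if done:
                break
    return Q
```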
Comparison to other RL algorithms – test environment
- The testing environment is a simplified version of the one we use in NeoAxis
- In NeoAxis we have pains, tasks, resources, triggering pains, and (optionally) NACs
- The comparison test is a "black box" scenario with no NACs, run as a simplified environment that can fit all of the algorithms, making the RL algorithms more compatible and easier to interface
- A sketch of a minimal black-box interface of this kind follows below
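A minimal sketch of the kind of "black box" interface that lets both ML and the RL baselines interact with the simplified environment. The class name, method signatures, and the resource dynamics are illustrative assumptions, not the actual NeoAxis-derived environment.

```python
class BlackBoxEnvironment:
    def __init__(self, initial_resources):
        self.initial_resources = dict(initial_resources)
        self.actions = list(range(len(initial_resources)))   # one restoring action per resource

    def reset(self):
        self.resources = dict(self.initial_resources)
        return self._observe()

    def step(self, action):
        key = list(self.resources)[action]
        self.resources[key] += 1.0                  # the chosen action restores one resource
        for k in self.resources:
            self.resources[k] -= 0.1                # every resource is slowly consumed
        reward = -sum(1 for v in self.resources.values() if v <= 0)   # penalty per depleted resource
        done = all(v <= 0 for v in self.resources.values())
        return self._observe(), reward, done

    def _observe(self):
        # Discretized resource levels serve as the state for tabular methods.
        return tuple(round(v) for v in self.resources.values())

# Usage with the Q-learning sketch above (hypothetical resource names):
# env = BlackBoxEnvironment({"food": 5.0, "water": 5.0})
# Q = q_learning(env)
```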
Comparison to other RL algorithms – results
- Algorithms tested: Q-learning, SARSA, HRL, NFQ, TD-FALCON, and ML
- Plot of normalized average reward (curves: ML; HRL, Q-learning, and SARSA; NFQ; TD-FALCON)
- ML can work in more general environments
NFQ Results
- Note the highlighted lines; observe both when they occur and their general profile
- Some runs perform very well, others poorly
- Due to oscillation, the average is worse than SARSA and the other baselines
Conclusion
- Designed and implemented several enhancements to the Motivated Learning architecture:
  - Bias calculations
  - Goal selection
  - Setting desired resource levels
- Compared ML to several RL algorithms using a basic test environment and a simple reward scenario
- ML achieved a higher average reward faster than the other algorithms tested
Questions?
Bias signal calculation for resources
- A bias signal triggers an abstract pain and is defined depending on the type of perceived situation; the bias reflects the level of need associated with the resource/action
- For resource-related pain:
  - Rd is a desired resource value (at a sensory input si)
  - Rc is a current resource value
  - ε is a small positive number
  - γ regulates how quickly the pain increases
  - δr = 1 when the resource is desired, δr = -1 when it is undesired, and δr = 0 otherwise
- The shown equation was chosen because it is linear (but does not diverge to infinity)
- An illustrative sketch consistent with these properties follows below
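An illustrative sketch of a resource-related bias consistent with the properties listed above: linear in the resource gap, sign set by δr, rate controlled by γ, and ε keeping the value finite. The specific form below is an assumption for illustration, not the exact equation from the slides.

```python
def resource_bias(r_desired, r_current, delta_r, gamma=1.0, eps=1e-3):
    """Bias grows linearly with the gap between desired and current resource levels (assumed form)."""
    return delta_r * gamma * (r_desired - r_current) / (r_desired + eps)

# Example: a desired resource (delta_r = +1) at half of its desired level.
print(resource_bias(r_desired=10.0, r_current=5.0, delta_r=1))   # positive bias
print(resource_bias(r_desired=10.0, r_current=5.0, delta_r=0))   # neutral resource -> no bias
```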
Learning and selecting actions
- How do we go from bias to action selection? Bias is used to calculate pain, and changes in pain reflect how an action affects the agent, so we update the weights between pains and goals and use these weights to select actions/goals
- Goals are selected based on the pain-goal weights:
  - δp indicates how the associated pain changed
  - ∆a, outside of μg, ensures the weights stay below the ceiling of αg = 1
  - μg determines the rate of change
- A sketch of a weight update with these properties follows below
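A minimal sketch of a pain-goal weight update with the properties described above: the change is scaled by the learning rate μg and by the observed pain change δp, while the (αg - w) factor keeps the weight below the ceiling αg = 1. The exact form of ∆a from the slides is not reproduced; (αg - w) stands in for it here as an assumption, as does the sign convention that δp > 0 means the pain decreased.

```python
def update_pain_goal_weight(w, delta_p, mu_g=0.1, alpha_g=1.0):
    """Strengthen the pain-goal link when the goal reduced the pain, saturating at alpha_g."""
    return w + mu_g * delta_p * (alpha_g - w)

w = 0.0
for _ in range(5):
    w = update_pain_goal_weight(w, delta_p=1.0)   # repeated pain reduction strengthens the link
    print(round(w, 3))                            # approaches but never exceeds alpha_g = 1
```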
Comparing Reinforcement Learning to Motivated Learning

Reinforcement Learning          | Motivated Learning
--------------------------------|--------------------------------------
Single value function           | Multiple value functions
Measurable rewards              | Internal, immeasurable rewards
Predictable                     | Unpredictable
Objectives set by the designer  | Sets its own ("abstract") objectives
Maximizes the reward            | Solves a minimax problem
Potentially unstable            | Always stable
Always active                   | Acts only when needed