Motivated Learning in Autonomous Systems
Pawel Raif, Silesian University of Technology, Poland
Janusz A. Starzyk, Ohio University, USA
IJCNN, International Joint Conference on Neural Networks, San Jose, 2011
Outline: Reinforcement Learning (RL); Goal Creation System (GCS), which yields a self-organizing pain-based network; Motivated Learning (ML) as a combination of RL + GCS; Simulations and results; Possible applications of ML.
Machine Learning Methods: supervised learning, unsupervised learning, corrective learning, reinforcement learning. Problems in "real world" applications such as autonomous systems: the "curse of dimensionality" and the lack of motivation for development. [Diagram labels: hierarchical RL; intrinsic motivation; "top-down approach"; "bottom-up approach".]
Reinforcement Learning: learning through interaction with the environment. [Diagram: RL agent and ENVIRONMENT, with arrows labeled a, s, and r.]
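The interaction loop on this slide can be illustrated with a minimal tabular Q-learning sketch. The slides do not say which RL algorithm was used, so Q-learning, the 1-D corridor environment, and every parameter value below are assumptions chosen only for illustration.

```python
import random

def q_learning_corridor(n_states=5, episodes=200, alpha=0.5, gamma=0.9,
                        epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (not from the slides).

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the rightmost state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            # The environment answers with the next state and a reward.
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update toward the bootstrapped target.
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q

q = q_learning_corridor()
# Greedy action per non-terminal state (1 means "move right").
greedy = [0 if left > right else 1 for left, right in q[:-1]]
print(greedy)
```

After training, the greedy policy should prefer moving right in every non-terminal state, since that is the only way to reach the reward.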
Motivated Learning. ML can combine an internal goal creation system (GCS) with reinforcement learning (RL). Motivated learning is need-based motivation, goal creation, and learning in an embodied agent. The agent creates a hierarchy of goals based on the primitive need signals and receives internal rewards for satisfying its goals, both primitive and abstract. ML applies to embodied intelligence (EI) working in a hostile environment.
Motivated Learning, the main idea: intrinsic motivations created by learning machines. [Diagram labels: action, state, reward; GC; GOALS (motivations); RL; ML.]
An intelligent agent learns how to survive in a hostile environment. How to motivate a machine? We suggest that the hostility of the environment is the most effective motivational factor.
Assumptions
1. The ML agent is independent: it can act autonomously in its environment and is able to choose its own way of development.
2. The ML agent's interface to the environment is the same as the RL agent's.
3. The environment is hostile to the agent.
4. Hostility may be active or passive (depleted resources).
5. The environment is fully observable.
Goal Creation System
Neural self-organizing pain-based structures.
Goal creation scheme: a primitive pain is directly sensed; an abstract pain is introduced by solving a lower-level pain; a thresholded curiosity-based pain is also used.
Motivations and selection of a goal: motivations act like desires in a BDI (belief-desire-intention) agent; a WTA (winner-take-all) competition selects the motivation; another WTA selects the goal.
[Diagram: self-organizing pain-based network with nodes labeled S, B, P, M, G, and UA, weights w_PG and w_BP, and WTA competition.]
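The goal creation scheme can be sketched in a few lines of Python. This is a loose illustration, not the neural structure from the slide: the class name, the `lack_of_` naming, the linear pain formula, and the threshold value are all assumptions, and the curiosity-based pain is left out.

```python
def winner_take_all(signals, threshold=0.0):
    """Return the name of the strongest signal above threshold, else None."""
    if not signals:
        return None
    name, value = max(signals.items(), key=lambda kv: kv[1])
    return name if value > threshold else None

class GoalCreationSketch:
    """Hypothetical, simplified stand-in for the pain-based network."""

    def __init__(self, primitive_pains):
        self.pains = dict(primitive_pains)  # pain name -> current level
        self.relieved_by = {}               # pain name -> resource that relieves it

    def learn_relief(self, pain, resource):
        # Solving a lower-level pain with a resource introduces an
        # abstract pain: the lack of that resource.
        self.relieved_by[pain] = resource
        self.pains.setdefault("lack_of_" + resource, 0.0)

    def update(self, resource_levels):
        # Assumed rule: abstract pain rises linearly as the resource
        # level (0..1) falls.
        for resource in self.relieved_by.values():
            level = resource_levels.get(resource, 0.0)
            self.pains["lack_of_" + resource] = 1.0 - level

    def select_motivation(self, threshold=0.2):
        # WTA competition among all pains above the threshold.
        return winner_take_all(self.pains, threshold)

gcs = GoalCreationSketch({"hunger": 0.6})
gcs.learn_relief("hunger", "food")
gcs.update({"food": 0.1})          # food is scarce
scarce = gcs.select_motivation()   # abstract pain dominates
gcs.update({"food": 0.9})          # food is plentiful
plentiful = gcs.select_motivation()
print(scarce, plentiful)
```

In this sketch the abstract pain wins the competition while food is scarce, and the primitive pain wins again once food is plentiful.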
Hierarchy of resources (and possible agent's goals), from the least abstract (Food) to the most abstract (Office): Food, Grocery, Bank, Office.

SENSOR    MOTOR      INCREASE        DECREASE
Food      Eat        Sugar level     Food supplies
Grocery   Buy        Food supplies   Money amount
Bank      Withdraw   Money amount    Bank account
Office    Work       Bank account    Working possibilities

Internal goals: a simple linear hierarchy between different goals. Resources are distributed all over the "grid world".
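The sensor, motor, increase, and decrease relations on this slide map naturally onto a small data structure. The sketch below encodes them and walks up the hierarchy to find which actions relieve a need; the function names, the integer resource levels, and the unit `amount` are assumptions made for illustration.

```python
# Each row: (sensor, motor action, resource increased, resource decreased).
HIERARCHY = [
    ("Food", "Eat", "Sugar level", "Food supplies"),
    ("Grocery", "Buy", "Food supplies", "Money amount"),
    ("Bank", "Withdraw", "Money amount", "Bank account"),
    ("Office", "Work", "Bank account", "Working possibilities"),
]

def apply_action(levels, motor, amount=1):
    """Return new levels after a motor action; unchanged if the source is short."""
    for _sensor, m, inc, dec in HIERARCHY:
        if m == motor:
            new = dict(levels)
            if new.get(dec, 0) >= amount:
                new[dec] -= amount
                new[inc] = new.get(inc, 0) + amount
            return new
    raise ValueError(f"unknown motor action: {motor}")

def plan_chain(levels, need="Sugar level"):
    """Walk up the hierarchy until an available resource is found.

    Returns the motor actions in execution order, or None if every
    resource on the chain is exhausted.
    """
    chain, target = [], need
    while True:
        row = next((r for r in HIERARCHY if r[2] == target), None)
        if row is None:
            return None
        chain.append(row[1])
        if levels.get(row[3], 0) > 0:
            return list(reversed(chain))
        target = row[3]

levels = {"Sugar level": 0, "Food supplies": 0, "Money amount": 0,
          "Bank account": 2, "Working possibilities": 5}
plan = plan_chain(levels)
for action in plan:
    levels = apply_action(levels, action)
print(plan, levels["Sugar level"])
```

With food and money exhausted but a non-empty bank account, the chain found here is Withdraw, then Buy, then Eat.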
Modified "grid world". This environment is complex, dynamically changing, and fully observable. The agent must locate resources and learn how to utilize them.
Environment. Resources present in the environment can be used to satisfy the agent's needs. By discovering useful resources and their dependencies, the agent learns a hierarchy of internal goals that expresses the complexity of the environment. [Diagram labels: environment; perception of resources; internal need signals; subjective sense of "lack of resources".]
Relationships between internal goals. The relationships do not have to form a linear hierarchy; they may constitute a tree structure or a complex network of resource dependencies. By discovering subsequent resources and their dependencies, the agent grows the complexity of its internal goal network. However, each system may have unique experiences, reflecting its personal history of development. [Diagram labels: designer-specified needs (need1, need2, need3); top-level resources.]
Experiment that combines ML and RL. Every resource discovered by the agent becomes a potential goal and is assigned its own value function "level". The Goal Creation System establishes new goals and switches the agent's activity between them. The RL algorithm learns the value functions on the different levels.
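One way to picture "a value function per goal" is a separate Q-table for each discovered resource, with the currently dominant pain choosing which table is active. The sketch below assumes Q-learning updates and a simple "largest pain wins" rule; the class and method names are hypothetical, and the slides do not specify the actual update or switching rule.

```python
from collections import defaultdict

class MultiGoalLearner:
    """Hypothetical sketch: one tabular value function per goal."""

    def __init__(self, actions, alpha=0.5, gamma=0.9):
        self.actions = list(actions)
        self.alpha, self.gamma = alpha, gamma
        self.q = {}  # goal -> {(state, action): value}

    def add_goal(self, goal):
        # A newly discovered resource gets its own value-function "level".
        self.q.setdefault(goal, defaultdict(float))

    def active_goal(self, pains):
        # Assumed switching rule: pursue the known goal with the largest pain.
        known = {g: p for g, p in pains.items() if g in self.q}
        return max(known, key=known.get) if known else None

    def update(self, goal, s, a, r, s_next):
        # Q-learning update applied only to the active goal's table.
        table = self.q[goal]
        best_next = max(table[(s_next, b)] for b in self.actions)
        table[(s, a)] += self.alpha * (r + self.gamma * best_next - table[(s, a)])

learner = MultiGoalLearner(actions=["left", "right"])
learner.add_goal("food")
learner.add_goal("money")
goal = learner.active_goal({"food": 0.3, "money": 0.8})
learner.update(goal, s=0, a="right", r=1.0, s_next=1)
print(goal, learner.q[goal][(0, "right")])
```

Because each goal owns its table, experience gathered while pursuing one goal does not overwrite what was learned for another.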
Experiment Results: switching between goals at the beginning and at the end. Initially the agent needs many iterations to reach a goal (red dots), and sometimes it abandons the goal when another pain dominates. Final runs are shorter and more successful.
Experiment Results: comparing primitive pain levels of RL and ML. The plot shows the moving average of the primitive pain signal. Initially the RL agent learns better, but its performance deteriorates as the resources are depleted.
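For readers who want to reproduce this kind of curve, a trailing moving average can be computed as sketched below. The window length used in the experiment is not given in the slides, so `window` is a free parameter here, and the sample values are made up.

```python
def moving_average(signal, window):
    """Trailing moving average; early samples use the shorter prefix."""
    if window <= 0:
        raise ValueError("window must be positive")
    out, running = [], 0.0
    for i, x in enumerate(signal):
        running += x
        if i >= window:
            # Drop the sample that just left the window.
            running -= signal[i - window]
        out.append(running / min(i + 1, window))
    return out

# Hypothetical pain samples, not experimental data.
smoothed = moving_average([1, 2, 3, 4, 5], window=2)
print(smoothed)
```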
Experiment Results: effectiveness in terms of cumulative reward. The reward is determined by the designer of the experiment. [Plot: cumulative reward.]
Reinforcement Learning                         Motivated Learning
Single value function (various objectives)     Multiple value functions (one for each goal)
Measurable rewards                             Internal rewards
Predictable                                    Unpredictable
Objectives set by designer                     Sets its own objectives
Maximizes the reward (potentially unstable)    Solves minimax problem (always stable)
Learning effort increases with complexity      Learns better in complex environment than RL
Always active                                  Acts when needed
Conclusions. The motivated learning method, based on a goal creation system, can improve the learning of autonomous agents in a special class of problems. ML is especially useful in complex, dynamic environments, where it works according to a learned hierarchy of goals. Individual goals use well-known reinforcement learning algorithms to learn their corresponding value functions. ML concerns building internal representations of useful environment percepts through interaction with the environment. ML switches the machine's attention and sets intended goals, which makes it an important mechanism for a cognitive system.
"The real danger is not that computers will begin to think like man, but that man will begin to think like computers." Sydney J. Harris