Reinforcement Learning and the Reward Engineering Principle Daniel Dewey AAAI Spring.

Presentation transcript:

Reinforcement Learning and the Reward Engineering Principle Daniel Dewey AAAI Spring Symposium Series 2014

A modest aim: What role do goals play in AI research? …through the lens of reinforcement learning.

Reinforcement learning and AI
Definitions: “control”, “dominance”
The reward engineering principle
Conclusions

RL and AI
“…one can define AI as the problem of designing systems that do the right thing. Now we just need a definition for ‘right.’” (Stuart Russell, “Rationality and Intelligence”)
Reinforcement learning provides a definition: maximize total rewards.

RL and AI
[Diagram: Agent and Environment, connected by action, state, and reward; labelled “AI”]

RL and AI
Understand and Exploit: Inference, Planning, Learning, Metareasoning, Concept formation, etc…

RL and AI
Advantages: simple and cheap; flexible and abstract; measurable
“worse is better”
…and used in natural neural nets (brains!)

RL and AI
Outside the frame:
Some behaviours cannot be elicited (by any rewards!)
As RL AI becomes more general and autonomous, it becomes harder to get good results with RL.
Key concepts: control and dominance

Reinforcement learning and AI
Definitions: “control”, “dominance”
The reward engineering principle
Conclusions

Definitions: “control”
A user has control when the agent’s received rewards equal the user’s chosen rewards.

Definitions: “control”
[Diagram: Agent and Environment, connected by action, state, and reward]

Definitions: “control”
[Diagram: Agent, Environment 1, User, and Environment 2, each pair connected by action, state, and reward]

Definitions: “control”
[Diagram: Agent, Environment 1, User, Environment 2; the user chooses the reward]

Definitions: “control”
[Diagram: Agent, Environment 1, User, Environment 2; the environment “chooses” the reward]
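The definition of control on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Step`, `run_episode`, `has_control`, and the `override` callback are hypothetical and do not come from the talk. The callback stands in for an environment that may replace the reward the user chose.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    chosen: int    # reward the user chose for this step
    received: int  # reward the agent actually received


def run_episode(user_rewards: List[int],
                override: Callable[[int], Optional[int]]) -> List[Step]:
    """Deliver user-chosen rewards through a channel the environment may override.

    override(t) returns None to pass the user's reward through unchanged,
    or an int that replaces it (hypothetical model of loss of control).
    """
    steps = []
    for t, chosen in enumerate(user_rewards):
        forced = override(t)
        received = chosen if forced is None else forced
        steps.append(Step(chosen, received))
    return steps


def has_control(steps: List[Step]) -> bool:
    """The user has control when every received reward equals the chosen one."""
    return all(s.chosen == s.received for s in steps)


user_rewards = [1, 0, 1, 1]
intact = run_episode(user_rewards, lambda t: None)
hijacked = run_episode(user_rewards, lambda t: 1 if t == 1 else None)

print(has_control(intact))    # True
print(has_control(hijacked))  # False
```

In the second run the environment forces a reward of 1 at step 1 where the user chose 0, so the received rewards no longer match the chosen ones and control is lost.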

Definitions: “dominance”
Why does control matter? Loss of control can create situations where no possible sequence of rewards can elicit the desired behaviour. These behaviours are dominated by other behaviours.

Definitions: “dominance”
A “behaviour” (sequence of actions) is a policy.
[Diagram: policy P1 as a sequence of actions a1 … a8 with rewards 1?0???0?]

Definitions: “dominance”
P1: 1?0???0?
User-chosen rewards

Definitions: “dominance”
P1: 1?0???0?
Env.-chosen rewards (loss of control)

Definitions: “dominance”
P1: 1?0???0?
P2: 10?1??11
Can rewards make either better?

Definitions: “dominance”
P1: choose all rewards 1: max. reward = 6
P2: choose all rewards 0: min. reward = 4

Definitions: “dominance”
P1: choose all rewards 0: min. reward = 1
P2: choose all rewards 1: max. reward = 7
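The bounds on these two slides follow from counting. The worked arithmetic below assumes that each digit in a reward pattern is fixed by the environment, that each “?” can be set to 0 or 1 by the user, and that P1 = 1?0???0? and P2 = 10?1??11 as shown on the earlier slides. The symbols R_max and R_min are notation introduced here for the largest and smallest achievable total reward.

```latex
\begin{align*}
R_{\max}(P_1) &= 1 + 5 \cdot 1 = 6, & R_{\min}(P_1) &= 1 + 5 \cdot 0 = 1,\\
R_{\max}(P_2) &= 4 + 3 \cdot 1 = 7, & R_{\min}(P_2) &= 4 + 3 \cdot 0 = 4.
\end{align*}
```

Since R_max(P1) = 6 exceeds R_min(P2) = 4, and R_max(P2) = 7 exceeds R_min(P1) = 1, the user can still make either policy the better one, so neither dominates the other.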

Definitions: “dominance”
P1: 1?0???0?
P3: …?11

Definitions: “dominance”
P1: max. reward = 6
P3: min. reward = 7

Definitions: “dominance”
P1: 1?0???0? (dominated by P3)
P3: …?11 (dominates P1)

Definitions: “dominance”
A dominates B if no possible assignment of rewards causes R(B) > R(A).
No series of rewards can prompt a dominated policy; dominated policies are unelicitable.
(A less obvious result: every unelicitable policy is dominated.)
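A minimal sketch of a dominance check, reading dominance as in the P3 / P1 example: A dominates B when even B's best achievable total does not exceed A's worst. It assumes that digits in a pattern are rewards fixed by the environment, that each “?” is a reward the user may set to 0 or 1, and that the two policies' rewards are chosen independently. The function names are hypothetical, and the pattern used for P3 is an assumption chosen to match the slide's “min. reward = 7”, not a value taken from the talk.

```python
from typing import Tuple


def reward_bounds(pattern: str) -> Tuple[int, int]:
    """Return (min_total, max_total) for a pattern of '0', '1', and '?'."""
    fixed = pattern.count("1")   # rewards the environment has already set to 1
    free = pattern.count("?")    # rewards the user can still choose
    return fixed, fixed + free


def dominates(a: str, b: str) -> bool:
    """True if no reward assignment can make b's total exceed a's total."""
    min_a, _ = reward_bounds(a)
    _, max_b = reward_bounds(b)
    return min_a >= max_b


P1 = "1?0???0?"
P2 = "10?1??11"
P3_ASSUMED = "11111?11"  # hypothetical pattern with minimum total 7

print(reward_bounds(P1))          # (1, 6)
print(reward_bounds(P2))          # (4, 7)
print(dominates(P1, P2))          # False
print(dominates(P2, P1))          # False
print(dominates(P3_ASSUMED, P1))  # True
```

Comparing one policy's worst case with the other's best case is enough here only because of the independence assumption; if the same reward choice affected both policies, the check would have to range over joint assignments.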

Recap
Control is sometimes lost;
Loss of control enables dominance;
Dominance makes some policies unelicitable.
All of this is outside the “RL AI frame” …but is clearly part of the AI problem (do the right thing!)

Additional factors
Generality: the range of policies an agent has reasonably efficient access to. = better chance of finding dominant policies
Autonomy: ability to function in environments with little interaction from users. = more frequent loss of control

Reinforcement learning and AI
Definitions: “control”, “dominance”
The reward engineering principle
Conclusions

Reward Engineering Principle
As RL AI becomes more general and autonomous, it becomes both more difficult and more important to constrain the environment to avoid loss of control.
…because general / autonomous RL AI has: a better chance of finding dominant policies; more unelicitable policies; more significant effects.

Reinforcement learning and AI
Definitions: “control”, “dominance”
The reward engineering principle
Conclusions

RL AI users:
Heed the Reward Engineering Principle.
Consider the existence of dominant policies.
Be as rigorous as possible in excluding them.
Remember what’s outside the frame!

AI Researchers:
Expand the frame! Make goal design a first-class citizen.
Consider alternatives: manually coded utility functions, preference learning, …?
Watch out for dominance relations (e.g. in “dual” motivation systems, between intrinsic and extrinsic).

Thank you!
Work supported by the Alexander Tamas Research Fellowship.
Thanks to Toby Ord, Seán Ó hÉigeartaigh, and two anonymous judges, for comments.

RL and AI
[Diagram: Agent and Environment, connected by action, state, and reward]