Integrated Learning for Interactive Characters
Bruce Blumberg, Marc Downie, Yuri Ivanov, Matt Berlin, Michael P. Johnson, Bill Tomlinson

Practical & Compelling Real-Time Learning
- Easy for interactive characters to learn what they ought to be able to learn
- Easy for a human trainer to guide the learning process
- A compelling user experience
- Provide heuristics and practical design principles

Related Work
- Reinforcement learning: Barto & Sutton 98, Mitchell 97, Kaelbling 90, Drescher 91
- Animal training: Lindsay 00, Lorenz & Leyhausen 73, Ramirez 99, Pryor 99, Coppinger 01
- Motor learning: van de Panne et al 93, 94; Grzeszczuk & Terzopoulos 95; Hodgins & Pollard 97; Gleicher 98; Faloutsos et al 01
- Behavior architectures: Reynolds 87, Tu & Terzopoulos 94, Perlin & Goldberg 96, Funge et al 99, Burke et al 01
- Computer games & digital pets: Dogz, AIBO, Black & White

Dobie T. Coyote Goes to School

Reinforcement Learning (R.L.) as Starting Point

         A1        A2        A3
 S1   Q(1,1)    Q(1,2)    Q(1,3)
 S2   Q(2,1)    Q(2,2)    Q(2,3)
 S3   Q(3,1)    Q(3,2)    Q(3,3)

- Rows are the set of all possible states of the world; columns are the set of all possible actions; Q(2,3) is the utility of taking action A3 in state S2
- Dogs solve a simpler problem in a much larger space, and one that is more relevant to interactive characters
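
A minimal tabular Q-learning sketch in Python shows what the S × A value table above looks like in code; the hyperparameters and the epsilon-greedy rule are illustrative assumptions, not taken from the system described in these slides:

```python
import random
from collections import defaultdict

class QTable:
    """The S x A table above as code: Q[(state, action)] is the learned
    utility of taking `action` in `state`."""
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # unseen (state, action) pairs default to 0
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        # Epsilon-greedy: mostly exploit the best-looking action, sometimes explore
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # One-step Q-learning backup toward reward + discounted best next value
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

The point of the slide is that a dog-like learner never enumerates such a table: its state and action spaces are far too large, and are discovered on demand instead.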

D.L.: Take Advantage of Predictable Regularities
- Constrain the search for causal agents by taking advantage of temporal proximity & the natural hierarchy of state spaces
- Use consequences to bias the choice of action, but vary performance and attend to differences
- Explore state and action spaces on an "as-needed" basis
- Build models on demand

D.L.: Make Use of All Feedback, Explicit & Implicit
- Use the rewarded action as context for identifying:
  - Promising state space and action space to explore
  - Good examples from which to construct perceptual models, e.g., a good example of a "sit-utterance" is one that occurs within the context of a rewarded Sit

D.L.: Make Them Easy to Train
- Respond quickly to "obvious" contingencies
- Support luring and shaping: techniques to prompt infrequently expressed or novel motor actions
- "Trainer friendly" credit assignment: assign credit to the candidate that matches the trainer's expectation

The System

Representation of State: Percept
- Percepts are atomic perception units
- They recognize and extract features from sensory data
- Model-based
- Organized in a dynamic hierarchy
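
A hypothetical Percept sketch; the nearest-mean feature model, the threshold, and the class shape are assumptions for illustration, since the slides do not specify how the perceptual models are built:

```python
import math

class Percept:
    """Atomic perception unit: holds a simple model (a running mean of feature
    vectors) and decides whether new sensory data matches it."""
    def __init__(self, label, example, threshold=1.0):
        self.label = label
        self.model = list(example)   # initial model built from one good example
        self.count = 1
        self.threshold = threshold

    def matches(self, features):
        return math.dist(self.model, features) < self.threshold

    def update(self, features):
        # Fold another good example into the running-mean model
        self.count += 1
        self.model = [m + (f - m) / self.count for m, f in zip(self.model, features)]
```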

Representation of State-Action Pairs: Action Tuples
- An Action Tuple pairs a trigger Percept with an Action and carries a learned Value along with Novelty and Reliability statistics
- Action Tuples are organized in a dynamic hierarchy and compete probabilistically based on their learned value and reliability
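
A minimal sketch of an Action Tuple and the probabilistic competition, assuming a value-times-reliability weighting; the system's actual competition rule may differ:

```python
import random
from dataclasses import dataclass

@dataclass
class ActionTuple:
    percept: str               # trigger context, e.g. "sit-utterance"
    action: str                # e.g. "Sit"
    value: float = 0.0         # learned worth of the action in this context
    reliability: float = 0.5   # how often it actually paid off
    novelty: float = 1.0       # recency-weighted novelty statistic

def choose_action_tuple(active_tuples):
    """Probabilistic competition among Action Tuples whose trigger Percepts
    are currently active, weighted by learned value and reliability."""
    weights = [max(t.value * t.reliability, 1e-6) for t in active_tuples]
    return random.choices(active_tuples, weights=weights, k=1)[0]
```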

Representation of Action: Labeled Path Through the Space of Body Configurations
- A motor program generates a path through a graph of annotated poses, e.g., a Sit animation or a follow-your-nose procedure
- Paths can be compared and classified just like perceptual events, using Motor Model Percepts
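
A sketch of actions as labeled paths; the pose names and graph edges are illustrative assumptions:

```python
# A small graph of annotated poses; edges list the poses reachable from each node.
pose_graph = {
    "stand": ["crouch"],
    "crouch": ["stand", "sit"],
    "sit": ["crouch", "lie-down"],
    "lie-down": ["sit"],
}

# A motor program is just a labeled path through that graph. Because it is a
# plain sequence, it can be compared and classified with the same machinery
# used for perceptual events (see the path-matching sketch further below).
motor_programs = {
    "Sit": ["stand", "crouch", "sit"],
    "Down": ["stand", "crouch", "sit", "lie-down"],
}
```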

Use Time to Constrain Search for Causal Agents
[Timeline: Sit, then Good Thing, then Scratch]
- Attention Window (before the action): look here for cues that appear correlated with an increased likelihood of the action being followed by a good thing
- Consequences Window (after the action): assume any good or bad things that happen here are associated with the preceding action and the context in which it was performed
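
A sketch of the two windows applied to a time-stamped event log; the window lengths and event labels are assumptions for illustration:

```python
ATTENTION_WINDOW = 2.0     # seconds before the action begins
CONSEQUENCE_WINDOW = 3.0   # seconds after the action ends

def cues_and_consequences(events, action_start, action_end):
    """events: list of (timestamp, label) pairs.
    Returns the cues that fell in the attention window before the action and
    the outcomes that fell in the consequences window after it."""
    cues = [label for t, label in events
            if action_start - ATTENTION_WINDOW <= t < action_start]
    consequences = [label for t, label in events
                    if action_end < t <= action_end + CONSEQUENCE_WINDOW]
    return cues, consequences

# Example: a "sit-utterance" just before Sit and a treat just after it
events = [(0.5, "sit-utterance"), (4.2, "good-thing"), (9.0, "scratch")]
print(cues_and_consequences(events, action_start=1.0, action_end=3.5))
# -> (['sit-utterance'], ['good-thing'])
```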

Four Important Tasks Are Performed During Credit Assignment
1. Choose the most worthy Action Tuple, heuristically, based on reliability and novelty statistics
2. Update its value
3. Create new Action Tuples as appropriate
4. Guide state and action space discovery
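
A high-level sketch of how these four tasks might fit together, using the ActionTuple sketch above; the worthiness heuristic and the value update are simplifications, not the system's actual rules:

```python
def assign_credit(candidates, reward, learning_rate=0.1):
    """candidates: ActionTuples that were active before the reward arrived.
    Picks the most worthy tuple, updates its value, and returns it so the
    caller can guide state/action space discovery around it."""
    if not candidates:
        return None
    # 1. Choose the most worthy candidate: favor tuples that have been
    #    reliable in this context, lightly penalizing sheer novelty.
    worthy = max(candidates, key=lambda t: t.reliability - 0.1 * t.novelty)
    # 2. Update its value toward the observed reward.
    worthy.value += learning_rate * (reward - worthy.value)
    # 3. New Action Tuples would be created here for cues seen in the
    #    attention window that have no tuple yet.
    # 4. The chosen tuple tells us where to explore next.
    return worthy
```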

Most Worthy Action Tuple Gets Credit
[Timeline: "sit-utterance" perceived; Sit; "click" perceived; another action begins; Good Thing]
- Although another action begins just before the Good Thing arrives, credit goes to the Sit Action Tuple triggered by the "sit-utterance"

Create New Action Tuples As Appropriate

Implicit Feedback Guides State Space Discovery
[Timeline: "beg" utterance; Beg; Good Thing; Scratch]
- An utterance occurs within the attention window but is not classified by any existing Percept
- When the Good Thing appears, create a new Percept with the "beg" example as its initial model
- This means that Percepts are only created to recognize "promising" utterances

Implicit Feedback Identifies Good Examples
[Timeline: "beg" utterance; Beg; Good Thing; Scratch]
- The utterance is classified as "beg"
- When the Good Thing appears, update the model of the "beg" utterance using the "beg" example that occurred in the attention window
- This means the model is built using good examples

Unrewarded Examples Don't Get Added to Models
[Timeline: "Leg" utterance; Beg; Sit; Scratch]
- The utterance "Leg" is classified as "Beg" by mistake, and Beg becomes active
- Beg ends without food appearing, so do not update the model, since the example may have been bad
- Actually, bad examples can be used to build a model of "not-Beg"
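
A sketch of the discovery logic described on the three preceding slides, using the Percept sketch above; gating all model updates on reward is a simplification of what the slides describe:

```python
def process_utterance(percepts, features, rewarded, new_label):
    """Handle an utterance heard in the attention window of an action.

    percepts: dict mapping label -> Percept
    rewarded: True if a Good Thing followed the action this utterance cued
    new_label: name to give a newly created Percept (e.g. "beg")
    """
    match = next((p for p in percepts.values() if p.matches(features)), None)

    if not rewarded:
        # Unrewarded example: it may have been a misclassification (the
        # "Leg"-heard-as-"Beg" case), so do not fold it into any model.
        return match

    if match is None:
        # Promising but unrecognized utterance: create a new Percept with
        # this good example as its initial model.
        percepts[new_label] = Percept(new_label, features)
        return percepts[new_label]

    # Recognized and rewarded: a good example, so update the model.
    match.update(features)
    return match
```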

Implicit Feedback Guides Action Space Discovery
[Timeline: Follow-your-nose; Good Thing]
- The "Follow-your-nose" action accumulates a path through pose-space
- When the Good Thing appears, compare the accumulated path to known paths
- Here the path matches Down, so Down gets the credit for the Good Thing appearing, rather than "Follow-your-nose"

If the Path Is Novel, Create a New Motor Program and Action
[Timeline: Follow-your-nose; Good Thing]
- The "Follow-your-nose" action accumulates a path through pose-space
- When the Good Thing appears, compare the accumulated path to known paths
- The path matches no known action, so Figure-8 is created, and subsequent examples of Figure-8 are used to improve the model of the path
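
A sketch of the compare-or-create step for action space discovery; representing pose paths as label sequences and using edit-distance similarity are assumptions for illustration:

```python
from difflib import SequenceMatcher

def classify_or_create_path(known_paths, accumulated, threshold=0.8):
    """known_paths: dict mapping motor-program name -> pose-label sequence.
    If the accumulated path resembles a known one, that motor program gets
    the credit; otherwise the path is registered as a new motor program."""
    def similarity(a, b):
        return SequenceMatcher(None, a, b).ratio()

    best = max(known_paths, default=None,
               key=lambda name: similarity(known_paths[name], accumulated))
    if best is not None and similarity(known_paths[best], accumulated) >= threshold:
        return best                              # e.g. "Down" gets the credit
    new_name = f"novel-path-{len(known_paths)}"  # e.g. the Figure-8 produced by luring
    known_paths[new_name] = list(accumulated)
    return new_name
```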

Dobie T. Coyote…

Limitations and Future Work
- Important extensions:
  - Other kinds of learning (e.g., social or spatial)
  - Generalization
  - Sequences
  - Expectation-based emotion system
- How will the system scale?

Useful Insights
- Use temporal proximity to limit search
- Use hierarchical representations of state, action, and state-action space, and use implicit feedback to guide exploration
- Use "trainer friendly" credit assignment
- Luring and shaping are essential

Acknowledgements
- Members of the Synthetic Characters Group, past, present & future
- Gary Wilkes
- Funded by the Digital Life Consortium

Reinforcement Learning (R.L.) as Starting Point
- Goal: learn an optimal set of actions that will take the creature from any arbitrary state to a goal state
- Approach: probabilistically explore states, actions, and their outcomes to learn how to act in any given state