Mind is About Predictions Rich Sutton AT&T Labs with special thanks to Michael Littman, Doina Precup, Satinder Singh, David McAllester.

Mind is About Predictions Hypothesis: Knowledge is predictive About what-leads-to-what, under what ways of behaving What will I see if I go around the corner? Objects: What will I see if I turn this over? Active vision: What will I see if I look at my hand? Value functions: What is the most reward I know how to get? Such knowledge is learnable, chainable Hypothesis: Mental activity is working with predictions Learning them Combining them to produce new predictions (reasoning) Converting them to action (planning, reinforcement learning) Figuring out which are most useful

Philosophical and Psychological Roots Like classical British empiricism (1650–1800) –Knowledge is about experience –Experience is central But not anti-nativist (evolutionary experience) Emphasizing sequential rather than simultaneous events –Replace association/contiguity with prediction/contingency Close to Tolman’s “Expectancy Theory” (1932–1950) –Cognitive maps, vicarious trial and error Psychology struggled to make it a science (1890–1950) –Introspection –Behaviorism, operational definitions –Objectivity

Modern Computational View of Mind OK to talk about insides of minds OK to talk about the function and purpose of a design We talk about Why –Why a system works –Why it should compute X and in manner Y –Why such a system should achieve purpose Z This is new, and resolves classical struggles –Servo-mechanisms, state-transition probabilities –Utility and decision theory –Information as signal – subjective (private) yet clear Purpose defines and constrains mental constructs

Informational View of Mind Mind does information processing Mind exchanges information with the world Only experience is known for sure –Anything more public or “objective” is suspect World is an I-O entity, a black box Although we often seem to talk about what is inside, all we can sensibly talk about is I-O behavior This “interactionist stance” seems to follow from the IVoM [figure: Mind ⇄ World, exchanging experience]

Is Mind about Predictions? OR Is Mind about Action (or Policies)? Of course it is ultimately about action But action generation methods are relatively clear –Value functions and decision theory Pick action that maximizes expected cumulative reward –OR Policy gradient RL methods Execution-time search Reflexes and behavior-based robotics Learning-extended reflexes and conditioning Flexible cognition requires more than action generation Most mental activity is working with predictions

An old, simple, appealing idea Mind as prediction engine! Predictions are learnable, combinable They represent cause and effect, and can be pieced together to yield plans Perhaps this old idea is essentially correct. Just needs –Development, revitalization in modern forms –Greater precision, formalization, mathematics –The computational perspective to make it respectable –Imagination, determination, patience Not rushing to performance Not building in ungrounded world knowledge

Topics Super-Predictions Combining Predictions (reasoning and planning) Predictions and State

The Simplest Predictions [figure: a stream of experience] 1-step prediction: from state X, take action a, predict outcome Y. k-step prediction: from X, follow policy π for k steps, predict Y. In general, predictions depend on actions and on policies, and there is a huge space of policies… which can be closed loop.
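
Hypothetically, both kinds of prediction can be estimated by Monte Carlo rollouts; this is a minimal sketch assuming an invented simulator interface env.reset(state) and env.step(action) -> next_state, not anything from the talk:

```python
def one_step_prediction(env, x, a, y, n_samples=1000):
    """Monte Carlo estimate of P(Y | state X, action a)."""
    hits = 0
    for _ in range(n_samples):
        env.reset(x)                      # assumed simulator interface
        hits += (env.step(a) == y)
    return hits / n_samples

def k_step_prediction(env, x, policy, y, k, n_samples=1000):
    """Monte Carlo estimate of P(state is Y after k steps | X, policy pi)."""
    hits = 0
    for _ in range(n_samples):
        env.reset(x)
        s = x
        for _ in range(k):
            s = env.step(policy(s))       # closed loop: action depends on state
        hits += (s == y)
    return hits / n_samples
```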

Simple Mixture Predictions Where will I be in 10–20 steps? Where will I be in roughly 10 steps? [figure: termination profiles over a timeline – now, 10 steps, 20 steps – for short-, medium-, and long-term predictions] Arbitrary termination profiles are possible Closed-loop termination: terminate depending on what happens Where will I be when X happens?
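
A minimal sketch of such a mixture prediction, assuming an invented rollout(k) helper that returns the state after k steps of some fixed way of behaving; the termination profile is just a distribution over termination times:

```python
import random

def mixture_prediction(rollout, profile, n_samples=1000):
    """profile: list of (k, weight) pairs -- an arbitrary termination profile."""
    ks, ws = zip(*profile)
    counts = {}
    for _ in range(n_samples):
        k = random.choices(ks, weights=ws)[0]   # sample a termination time
        s = rollout(k)                          # assumed: state after k steps
        counts[s] = counts.get(s, 0) + 1
    return {s: c / n_samples for s, c in counts.items()}

# "Where will I be in 10-20 steps?" -- a uniform profile over that window.
uniform_10_20 = [(k, 1.0) for k in range(10, 21)]
```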

Closed-loop termination loosens the time-specificity of predictions Instead of “what will I see at t +100?” Can say “what will I see when I open the box?” Will we elect a black or a woman president first? Where will the tennis ball be when it reaches me? What time will it be when the talk starts? or “when John arrives?” “when the bus comes?” “when I get to the store?” A substantial increase in expressiveness

Super-Predictions Closed-loop terminations And Closed-loop policies Correspond to arbitrary experiments and the results of those experiments What will I see if I go into the next room? What time will it be when the talk is over? Is there a dollar in the wallet in my pocket? Where is my car parked? Can I throw the ball into the basket? Is this a chair situation? What will I see if I turn this object around?

Anatomy of a Super-Prediction 1. Predictor: recognizes the conditions, makes the prediction 2. Experiment: a policy, a termination condition, and measurement function(s) 3. Goal: a function of the anticipated measurement, to be maximized by choice of policy and termination
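
A minimal sketch of this anatomy as a data structure; the field names and callable signatures are illustrative assumptions, not the talk's notation:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Experiment:
    policy: Callable        # state -> action (may be closed loop)
    terminate: Callable     # state -> bool: the termination condition
    measure: Callable       # trajectory -> measurement(s), e.g. outcome + cost

@dataclass
class SuperPrediction:
    predictor: Callable     # state -> anticipated measurement(s)
    experiment: Experiment
    goal: Callable          # anticipated measurement(s) -> scalar to maximize
```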

Example: Open-the-door Predictor Use visual input to estimate –Probabilities of succeeding in opening the door, and of other outcomes (door locked, no handle, no real door) –expected cumulative cost (sub-par reward) in trying Experiment –Policy for walking up to the door, shaping grasp of handle, turning, pulling, and opening the door –Terminate on successful opening or various failure conditions –Measure outcome and cumulative cost Goal –Sum of expected cost and expected value of outcome –Can be used to define experiment’s policy and termination

RoboCup-Soccer Example Safe to pass? Predict the outcome of choosing to pass The pass will take several steps to set up – choosing to pass involves a whole action policy You may choose not to pass halfway through Terminations and outcomes: – pass is aborted – opponents touch the ball before teammate – teammate touches first, appears to control ball – ball goes out of bounds

Example: Pass-to-Teammate Predictor uses perceived positions of ball, opponents, etc. to estimate probabilities of –Successful pass, openness of receiver –Interception –Reception failure –Aborted pass, in trouble –Aborted pass, something better to do –Loss of time Experiment –Policy for maneuvering ball, or around ball, to set up and pass –Termination strategy for aborting, recognizing completion –Measurement of outcome, time Goal –Some combination of outcome values, time, openness of receiver

Topics Super-Predictions Combining Predictions (reasoning and planning) Predictions and State

Combining Predictions I: Composition If the mind is about predictions, then thinking is combining predictions to produce new ones: a prediction X → Y and a prediction Y → Z compose into a prediction X → Z. [Diagram: from X, under (π₁, β₁), expected time T₁, outcomes Y (.8), Y' (.1), Y'' (.1); then, if Y, under (π₂, β₂), expected time T₂, outcome Z; composed, expected time T₁ + .8T₂.] Here each prediction is assumed to predict A transient measurement (e.g., elapsed time, cumulative reward) A final measurement (e.g., partial distribution of outcome states) The new prediction does not necessarily have a goal
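
A minimal sketch of this composition step, assuming each prediction is represented as (expected time, outcome-state distribution); the mass that prediction 1 places on Y is redistributed through prediction 2, reproducing the slide's T₁ + .8T₂:

```python
def compose(pred1, pred2, via):
    """Compose X ->...-> via with via ->...-> Z into a new prediction from X."""
    t1, dist1 = pred1
    t2, dist2 = pred2
    p_via = dist1.get(via, 0.0)
    out = {s: p for s, p in dist1.items() if s != via}
    for s, p in dist2.items():            # redistribute via's mass
        out[s] = out.get(s, 0.0) + p_via * p
    return (t1 + p_via * t2, out)

# From X: time T1 = 10, outcomes Y (.8), Y' (.1), Y'' (.1); from Y: T2 = 5 -> Z.
pred_x = (10.0, {"Y": 0.8, "Y'": 0.1, "Y''": 0.1})
pred_y = (5.0, {"Z": 1.0})
print(compose(pred_x, pred_y, via="Y"))   # time 10 + .8*5 = 14.0
```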

Combining Predictions II: Choice A predictor plus a goal compose to form a value function ⇒ we can do all the usual planning backups [Diagram: from X under π, reach Y, g = 5; from X under π', reach Y', g = 6.] In X, for g, π' is a better choice than π. Store it with g.
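
A toy sketch with the slide's numbers (the policy names are placeholders): the predictor plus goal score each candidate, and the better one is stored with g:

```python
def choose(candidates, goal):
    """candidates: dict policy_name -> anticipated measurement; goal scores it."""
    return max(candidates, key=lambda name: goal(candidates[name]))

# From X, pi predicts g = 5 and pi' predicts g = 6, so pi' is stored with g.
print(choose({"pi": 5.0, "pi'": 6.0}, goal=lambda g: g))   # -> pi'
```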

Room-to-Room Super-Predictions [figure: rooms gridworld] 4 stochastic primitive actions (up, down, left, right), which fail 33% of the time Multi-step super-predictions (“Options”) to each room's 2 hallways, each with a policy, termination in the hallways, and a target (goal) hallway Predict: probability of reaching each terminal hallway Goal: minimize # steps + values for the target and other outcome hallways (Precup 2000; Sutton, Precup, & Singh 1999)

Planning with Super-Predictions [figure: planning backups over super-predictions]
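
A minimal sketch of one such planning backup, loosely following value iteration with options (Sutton, Precup & Singh 1999); the prediction interface below is an assumption for illustration:

```python
def option_backup(V, x, option_predictions):
    """One backup at state x: take the best super-prediction.

    option_predictions(x) yields (expected_reward, outcome_distribution)
    pairs, one per applicable option; V maps states to current values.
    """
    return max(r + sum(p * V[y] for y, p in dist.items())
               for r, dist in option_predictions(x))
```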

Topics Super-Predictions Combining Predictions (reasoning and planning) Predictions and State

Predictive State Representations Hypothesis: What we normally think of as state is a set of predictions about outcomes of experiments –Wallet’s contents, John’s location, presence of objects… Problem: So far we have assumed states, but really the world just gives information, “observations” There are several ways to formalize this problem –Learning deterministic Finite State Automata Rivest & Schapire, 1987 –Adding stochasticity: An alternative to Hidden Markov Models Herbert Jaeger, 1999 –Adding action: An alternative to Partially Observable Markov Decision Processes Littman, Sutton, & Singh 2001

PSR Formalism 1 [Mind ⇄ World diagram: actions, observations] Experience: random variables a₁, o₁, a₂, o₂, … A test t = a₁o₁⋯aₖoₖ is a subsequence, a simple case of an experiment: if the actions are done, will the observations occur? The world is defined by the probabilities of each test from the beginning of time, p(t) = Pr(o₁⋯oₖ | a₁⋯aₖ), and after a finite history sequence h (formally another test): p(t | h) = p(ht) / p(h)

PSR Formalism 2 A Predictive State Representation (PSR) is a set of tests q₁, …, qₙ whose vector of predictions, p(h) = [p(q₁ | h), …, p(qₙ | h)], is sufficient information to predict all tests: p(t | h) = f_t(p(h)) for every test t; i.e., whose predictions are a sufficient statistic, a state A linear PSR is a PSR where each f_t is linear
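
A minimal sketch of the linear-PSR state update, following Littman, Sutton & Singh (2001): with a weight vector m[(a,o)] for each one-step test and a matrix M[(a,o)] whose columns are the weights for the extended core tests, the prediction vector updates by a normalized linear map (the variable names are mine):

```python
import numpy as np

def psr_update(p, a, o, m, M):
    """Prediction vector for the core tests after doing a and seeing o.

    Computes p(q_i | h,a,o) = (p . m[aoq_i]) / (p . m[ao]) for all i at once,
    where the columns of M[(a, o)] are the vectors m[aoq_i].
    """
    return (p @ M[(a, o)]) / (p @ m[(a, o)])
```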

Walk/Reset Example Actions: Walk: take a random step left or right, see 0 Reset: jump to rightmost state, see 1 if already there Need to remember the # of Walks since the last Reset Probabilities of being rightmost: 1 at the start on the right, .5 after one Walk, and so on, decaying with further Walks PSR tests: Reset1, Walk0Reset1
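
A minimal sketch of this world, assuming for illustration a 5-state chain with steps clipped at the ends (the slide does not fix these details); the prediction of the test Reset1 is the probability of being in the rightmost state:

```python
import random

N = 5  # illustrative chain length

def walk(s):
    """Random step left or right, clipped to the chain (an assumption)."""
    return min(max(s + random.choice((-1, 1)), 0), N - 1)

def p_rightmost(k, n_samples=10000):
    """Estimate the prediction of test Reset1 after k Walks from a Reset."""
    hits = 0
    for _ in range(n_samples):
        s = N - 1                         # Reset jumps to the rightmost state
        for _ in range(k):
            s = walk(s)
        hits += (s == N - 1)
    return hits / n_samples

for k in range(4):
    print(k, round(p_rightmost(k), 2))    # 1.0 after a Reset, .5 after one Walk, ...
```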

PSR Results Compact, linear PSRs exist –# tests ≤ # states in minimal POMDP –# tests ≤ Rivest & Schapire’s Diversity –# tests can be exponentially fewer than diversity and POMDP Compact simulation/update process Construction algorithm from POMDP Learning/discovery algorithms of Rivest and Schapire, and of Jaeger, do not immediately extend to PSRs There are natural EM-like algorithms (current work)

Constructing Linear PSRs from POMDPs Outcome vector u(t): the predictions for test t from all POMDP states. A test t is said to be independent of a set of tests T if its outcome vector is linearly independent of T’s outcome vectors. Accumulate tests whose outcome vectors are independent. Search: start with T = {}; while some extension aot of a test t ∈ T is independent, add it to T; else terminate and return T.
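
A minimal sketch of that search, assuming an oracle u(t) that returns the outcome vector of test t (its prediction from every POMDP state); tests are tuples of interleaved actions and observations, and every name here is an illustrative assumption:

```python
import numpy as np

def find_core_tests(u, actions, observations, n_states, tol=1e-9):
    """Accumulate tests whose outcome vectors are linearly independent."""
    core, vecs = [], []

    def independent(v):
        if not vecs:
            return np.linalg.norm(v) > tol
        return np.linalg.matrix_rank(np.stack(vecs + [v])) > len(vecs)

    frontier = [()]                        # start from the null test
    while frontier and len(core) < n_states:
        t = frontier.pop()
        for a in actions:
            for o in observations:
                ext = (a, o) + t           # one-step extension "aot"
                v = np.asarray(u(ext))
                if independent(v):
                    core.append(ext)
                    vecs.append(v)
                    frontier.append(ext)
    return core
```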

PSR Conclusions A path to exorcizing the assumption of state –Toward the goal of totally data- (experience-) oriented AI The predictive view of state is competitive –Even better (more compact) in some ways –States have data interpretations! –And are thus potentially more learnable, refinable Naturally leads to constructive discovery ideas –Searching for the right tests to understand the world “Tests” generalize naturally to super-predictions

Empiricism [figure: Mind ⇄ World, exchanging actions and observations] Experience is the data; it is all we really know Experience should be the focus of AI But by and large it is not… even in robotics, Alife, etc. Experience is central –Knowledge is about experience

Mind is About Predictions Hypothesis: Knowledge is predictive About what-leads-to-what, under what ways of behaving Such knowledge is learnable, chainable Hypothesis: Mental activity is working with predictions Learning them Combining them to produce new predictions (reasoning) Converting them to action (planning, reinforcement learning) Figuring out which are most useful Hypothesis: These ideas are newly viable Unfamiliar flexibility & expressiveness of “super”-predictions New engineering planning methods (DP/RL/values) New state-representation ideas Hypothesis: Predictions are the Coin of the Mental Realm

It’s Hard to Build Large AI Systems Brittleness Unforeseen interactions Scaling Requires too much manual complexity management –people must understand, intervene, patch and tune –like programming Need more autonomy –learning, verification –internal coherence of knowledge and experience

AI Implications of Predictive View An alternative theory of knowledge and thought –Alternative to conventional, symbolic “language of thought” –Alternative to “database” view of knowledge Requires experiments to be in the machine, not just the designer — true grounding Automated complexity management –Should help with brittleness and scaling Could permit AI systems of much greater complexity

Both Predictors and Experiments must be in the Machine “Classical” AI systems omit both! –e.g., “Tweety is a bird”, “John loves Mary” –sometimes called the “symbol grounding problem” Modern AI systems tend to skimp on the experiments –supervised learning, Bayes nets, robotics… It is not OK to leave the experimental definitions to external, human observers –the information is just not in the machine –we don’t understand it; we haven’t done our job! Yet this is such an appealing shortcut that we have almost always done it

More Predictive Knowledge John is in the coffee room My car is in the South parking lot What we know about geography, navigation What we know about how an object looks, rotates What we know about how objects can be used Recognition strategies for objects and letters The portrait of Washington on the dollar in the wallet in my other pants in the laundry has a mustache on it –Composing experiments creates a productive representation language

Relational, Propositional, and Deictic ∀ objects X: if I drop X, then X will be on the floor –Holding object X means predicting certain sensations if, for example, one directs one’s eyes toward one’s hand –Thus, on dropping, the predicted sensations are merely transferred from the looking-at-hand prediction to the looking-at-floor prediction –Such transfer of existing predictions should be a common part of visual knowledge, updated every time the eyes move ∃ X, Y such that Red(X), Blue(Y), and Above(X, Y) –There is some place I can foveate and see Red –There is some place I can foveate and see Blue –If I foveate first the Red place, “mark” it, then the Blue place, the mark will be Above the fovea (may need to search) –These are typical ideas of modern, active, deictic vision

Should All Knowledge be Experiential? Allowing only predictions in terms of data? Loses: Expressiveness –can’t talk about objects, space, people; no “is-a” or “part-of” External (human) coherence –verbal labels, interpretability, explainability, calibration –the “shortcut” of entering knowledge directly into the agent Gains: The knowledge will have meaning to the machine It can be mechanically learned/verified/extended It will be suited for a general reasoning process –composition and backup of predictions to yield new predictions

There is value in forcing world knowledge into prediction form We will finally have all the knowledge in the machine –all of it will be mechanically interpretable –we will finally really understand the knowledge’s meaning –anything else is just an empty shell The agent will be able to learn/verify/extend knowledge –provides an internal coherence for the knowledge –enables building it up from a firm foundation The knowledge will flow immediately into a general reasoning engine –the concatenation of predictions yields new predictions

Conclusions World knowledge must be expressed in terms of the data Such posterior grounding is challenging –we lose expressiveness in the short term –we lose external (human) coherence, explainability But it can be done step by step, and it brings palpable benefits –autonomous learning/verification/extension of knowledge –autonomous complexity management due to internal coherence –knowledge suited to a general reasoning process We must provide this grounding!