A Value-Driven Architecture for Intelligent Behavior


A Value-Driven Architecture for Intelligent Behavior

Dan Shapiro and Pat Langley
Computational Learning Laboratory
Center for the Study of Language and Information
Stanford University, Stanford, California
http://cll.stanford.edu/

This research was supported in part by Grant NCC-2-1220 from NASA Ames Research Center. Thanks to Meg Aycinena, Michael Siliski, Stephanie Sage, and David Nicholas.

Assumptions about Cognitive Architectures

We should move beyond isolated phenomena and capabilities to develop complete intelligent agents.
AI and cognitive psychology are close allies with distinct but related goals.
A cognitive architecture specifies the infrastructure that holds constant over domains, as opposed to knowledge, which varies.
We should model behavior at the level of functional structures and processes, not the knowledge or implementation levels.
A cognitive architecture should commit to representations and organizations of knowledge and processes that operate on them.
An architecture should come with a programming language for encoding knowledge and constructing intelligent systems.
An architecture should demonstrate generality and flexibility rather than success on a single application domain.

Examples of Cognitive Architectures

Some of the cognitive architectures produced over 30 years include:
ACTE through ACT-R (Anderson, 1976; Anderson, 1993)
Soar (Laird, Rosenbloom, & Newell, 1984; Newell, 1990)
Prodigy (Minton & Carbonell, 1986; Veloso et al., 1995)
PRS (Georgeff & Lansky, 1987)
3T (Gat, 1991; Bonasso et al., 1997)
EPIC (Kieras & Meyer, 1997)
APEX (Freed et al., 1998)
However, these systems cover only a small region of the space of possible architectures.

Goals of the ICARUS Project

We are developing the ICARUS architecture to support effective construction of intelligent autonomous agents that:
integrate perception and action with cognition;
combine symbolic structures with affective values;
unify reactive behavior with deliberative problem solving;
learn from experience but benefit from domain knowledge.
In this talk, we report on our recent progress toward these goals.

Design Principles for ICARUS

Our designs for ICARUS have been guided by five principles:
1. Affective values pervade intelligent behavior;
2. Categorization has primacy over execution;
3. Execution has primacy over problem solving;
4. Tasks and intentions have internal origins; and
5. The agent's reward is determined internally.
These ideas distinguish ICARUS from most agent architectures.

A Cognitive Task for Physical Agents (car self) (#back self 225) (#front self 235) (#speed self 60) (car brown-1) (#back brown-1 208) (#front brown-1 218) (#speed brown-1 65) (car green-2) (#back green-2 251) (#front green-2 261) (#speed green-2 72) (car orange-3) (#back orange-3 239) (#front orange-3 249) (#speed orange-3 56) (in-lane self B) (in-lane brown-1 A) (in-lane green-2 A) (in-lane orange-3 C)

Overview of the ICARUS Architecture (without learning)

[Diagram: the Environment is sensed into a Perceptual Buffer (Perceive Environment); Categorize / Compute Reward maps the buffer through Long-Term Conceptual Memory into Short-Term Conceptual Memory; Nominate, Repair, and Abandon Skill Instance connect Long-Term Skill Memory with Short-Term Skill Memory; Select / Execute Skill Instance acts on the Environment.]

Some Motivational Terminology

ICARUS relies on three quantitative measures related to motivation:
Reward: the affective value produced on the current cycle.
Past reward: the discounted sum of previous agent reward.
Expected reward: the predicted discounted future reward.
These let an ICARUS agent make decisions that take into account its past, present, and future affective responses.
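The three measures above can be sketched in a few lines of Python. This is a minimal illustration, not ICARUS code; the function names and the discount factor gamma are assumptions, since the slide defines the quantities only in words.

```python
# Sketch of ICARUS' three motivational quantities (hypothetical
# implementation; gamma is an assumed discount factor).

GAMMA = 0.9

def past_reward(history, gamma=GAMMA):
    """Discounted sum of previous rewards; recent cycles weigh most."""
    total = 0.0
    for r in history:            # oldest to newest
        total = r + gamma * total
    return total

def expected_reward(predicted, gamma=GAMMA):
    """Predicted discounted future reward over a lookahead horizon."""
    return sum((gamma ** t) * r for t, r in enumerate(predicted))
```

The per-cycle reward itself comes from matched concepts, as later slides describe; these two helpers only summarize its history and its forecast.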

Long-Term Conceptual Memory

ICARUS includes a long-term conceptual memory that contains:
Boolean concepts that are either True or False;
numeric concepts that have quantitative measures.
These concepts may be either:
primitive (corresponding to the results of sensory actions);
defined as a conjunction of other concepts and predicates.
Each Boolean concept includes an associated reward function.
* ICARUS' concept memory is distinct from, and more basic than, skill memory, and provides the ultimate source of motivation.

Examples of Long-Term Concepts

(ahead-of (?car1 ?car2)
 :defn (car ?car1) (car ?car2)
       (#back ?car1 ?back1)
       (#front ?car2 ?front2)
       (> ?back1 ?front2)
 :reward (#dist-ahead ?car1 ?car2 ?d)
         (#speed ?car2 ?s)
 :weights (5.6 3.1))

(coming-from-behind (?car1 ?car2)
 :defn (car ?car1) (car ?car2)
       (in-lane ?car1 ?lane1)
       (in-lane ?car2 ?lane2)
       (adjacent ?lane1 ?lane2)
       (faster-than ?car1 ?car2)
       (ahead-of ?car2 ?car1)
 :reward (#dist-behind ?car1 ?car2 ?d)
         (#speed ?car2 ?s)
 :weights (4.8 2.7))

(clear-for (?lane ?car)
 :defn (lane ?lane ?left-line ?right-lane)
       (not (overlaps-and-adjacent ?car ?other))
       (not (coming-from-behind ?car ?other))
       (not (coming-from-behind ?other ?car))
 :constant 10.0)

A Sample Conceptual Hierarchy

[Diagram: a hierarchy of concepts. Primitive concepts such as car, lane, in-lane, #speed, #yback, and #yfront sit at the bottom; defined concepts such as adjacent, faster-than, overlaps, overlaps-and-adjacent, ahead-of, coming-from-behind, and clear-for build on them.]

Long-Term Skill Memory

ICARUS includes a long-term skill memory in which skills contain:
an :objective field that encodes the skill's desired situation;
a :start field that must hold for the skill to be initiated;
a :requires field that must hold throughout the skill's execution;
an :ordered or :unordered field referring to subskills or actions;
a :values field with numeric concepts to predict expected value;
a :weights field indicating the weight on each numeric concept.
These fields refer to terms stored in conceptual long-term memory.
* ICARUS' skill memory encodes knowledge about how and why to act in the world, not about how to solve problems.
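One way to make the field structure concrete is a small data-structure sketch. This is a hypothetical Python rendering of a skill, not part of ICARUS; the field names simply mirror the slide's :objective, :start, :requires, and related fields.

```python
# Hypothetical encoding of an ICARUS long-term skill as a dataclass.
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    objective: list        # concepts describing the desired situation
    start: list            # must hold for the skill to be initiated
    requires: list         # must hold throughout execution
    subskills: list        # subskills or actions
    ordered: bool = True   # True for :ordered, False for :unordered
    values: list = field(default_factory=list)   # numeric concepts
    weights: list = field(default_factory=list)  # one weight per value

# The change-lanes skill from the next slide, in this encoding:
change_lanes = Skill(
    name="change-lanes",
    objective=["(in-lane ?car ?to)"],
    start=["(in-lane ?car ?from)"],
    requires=["(clear-for ?to ?car)"],
    subskills=["*shift-left"],
)
```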

Examples of Long-Term Skills

(pass (?car1 ?car2 ?lane)
 :start (ahead-of ?car2 ?car1)
        (in-same-lane ?car1 ?car2)
 :objective (ahead-of ?car1 ?car2)
            (in-same-lane ?car1 ?car2)
 :requires (in-lane ?car2 ?lane)
           (adjacent ?lane ?to)
 :ordered (speed&change ?car1 ?car2 ?lane ?to)
          (overtake ?car1 ?car2 ?lane)
          (change-lanes ?car1 ?to ?lane)
 :values (#distance-ahead ?car1 ?car2 ?d)
         (#speed ?car2 ?s)
 :weights (0.26 0.17))

(change-lanes (?car ?from ?to)
 :start (in-lane ?car ?from)
 :objective (in-lane ?car ?to)
 :requires (lane ?from ?shared ?right)
           (lane ?to ?left ?shared)
           (clear-for ?to ?car)
 :ordered (*shift-left)
 :constant 0.0)

ICARUS' Short-Term Memories

Besides long-term memories, ICARUS stores dynamic structures in:
a perceptual buffer with primitive Boolean and numeric concepts, e.g., (car car-06), (in-lane car-06 lane-a), (#speed car-06 37);
a short-term conceptual memory with matched concept instances, e.g., (ahead-of car-06 self), (faster-than car-06 self), (clear-for lane-a self);
a short-term skill memory with instances of skills that the agent intends to execute, e.g., (speed-up-faster-than self car-06), (change-lanes lane-a lane-b).
These encode temporary beliefs, intended actions, and their values.
* ICARUS' short-term memories store specific, value-laden instances of long-term concepts and skills.

Categorization and Reward in ICARUS

Categorization occurs in an automatic, bottom-up manner.
A reward is calculated for every matched Boolean concept.
This reward is a linear function of associated numeric concepts.
Total reward is the sum of rewards for all matched concepts.
* Categorization and reward calculation are inextricably linked.

Skill Nomination and Abandonment

ICARUS adds skill instances to short-term skill memory that:
refer to concept instances in short-term conceptual memory;
have expected reward greater than the agent's discounted past reward.
ICARUS removes a skill when its expected reward falls far below past reward.
* Nomination and abandonment create highly autonomous behavior that is motivated by the agent's internal reward.

Skill Selection and Execution

On each cycle, ICARUS executes the skill with the highest expected reward.
Selection invokes deep evaluation to find the action with the highest expected reward.
Execution causes action, including sensing, which alters memory.
* ICARUS makes value-based choices among skills, and among the alternative subskills and actions in each skill.

ICARUS' Interpreter for Skill Execution

(speed&change (?car1 ?car2 ?from ?to)
 :start (ahead-of ?car1 ?car2)
        (same-lane ?car1 ?car2)
 :objective (faster-than ?car1 ?car2)
            (different-lane ?car1 ?car2)
 :requires (in-lane ?car2 ?from)
           (adjacent ?from ?to)
 :unordered (*accelerate)
            (change-lanes ?car1 ?from ?to))

Given Start: if not (Objectives) and Requires, then
- choose among unordered Subskills;
- consider ordered Subskills.
* ICARUS skills have hierarchical structure, and the interpreter uses a reactive control loop to identify the most valuable action.
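The interpreter's per-skill check can be sketched as follows. This is a simplified, hypothetical rendering (concepts as plain strings, a skill as a dict); the real interpreter matches concepts with variable bindings.

```python
# Sketch of the reactive check the interpreter applies to one skill
# instance whose :start condition held when it was nominated.

def holds(concept, beliefs):
    """Hypothetical stand-in for concept matching against short-term
    conceptual memory; here a simple membership test."""
    return concept in beliefs

def executable_subskills(skill, beliefs):
    """If the objectives are not yet satisfied and :requires holds,
    return the candidate subskills; None signals a needed repair."""
    if all(holds(c, beliefs) for c in skill["objective"]):
        return []                      # objectives achieved; nothing to do
    if not all(holds(c, beliefs) for c in skill["requires"]):
        return None                    # requirements violated; repair
    if skill["ordered"]:
        return skill["subskills"][:1]  # consider ordered subskills in turn
    return list(skill["subskills"])    # choose among unordered subskills
```

With the speed&change skill above, both of its unordered subskills become candidates whenever its :requires concepts match and its objectives do not.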

Cognitive Repair of Skills

ICARUS seeks to repair skills whose requirements do not hold by:
finding concepts that, if true, would let execution continue;
selecting the concept that is most important to repair; and
nominating a skill with objectives that include the concept.
Repair takes one cycle and adds at most one skill instance to memory.
* This backward chaining is similar to means-ends analysis, but it supports execution rather than planning.
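The three repair steps can be sketched compactly. All names here are hypothetical: value_of stands in for the expected-reward estimate, and skills_by_objective for an index from concepts to the skills whose :objective field mentions them.

```python
# Sketch of one repair cycle: pick the most valuable unmatched
# concept and nominate one skill that would achieve it.

def repair(failed_concepts, value_of, skills_by_objective):
    """Return at most one skill instance to add to skill memory."""
    if not failed_concepts:
        return None
    target = max(failed_concepts, key=value_of)   # most important repair
    candidates = skills_by_objective.get(target, [])
    return candidates[0] if candidates else None  # at most one skill added
```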

Revising Expected Reward Functions

ICARUS uses a hierarchical variant of Q learning to revise estimated reward functions based on internally computed rewards.

[Diagram: the reward R(t) propagates through the skill hierarchy rooted at pass, with subskills change-lanes and speed&change and actions *shift-left and *accelerate; each skill's estimate Q(s) = w · f(s) is updated from R(t) and Q(s′).]

* This method learns 100 times faster than nonhierarchical ones.
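Since a skill's estimate is a linear function of its :values features weighted by :weights, the revision can be sketched as a temporal-difference update on those weights. The slide does not give the exact rule, so the learning rate and the TD form below are assumptions.

```python
# Sketch of a TD-style update on a skill's linear value estimate
# Q(s) = w . f(s) (assumed form; alpha and gamma are hypothetical).

def td_update(weights, features, reward, next_value, gamma=0.9, alpha=0.1):
    """Move w so that w . f(s) tracks reward + gamma * Q(s')."""
    q = sum(w * f for w, f in zip(weights, features))
    error = reward + gamma * next_value - q
    return [w + alpha * error * f for w, f in zip(weights, features)]
```

In the hierarchical variant, an update of this shape is applied at each level of the calling tree, which is what lets credit reach subskills quickly.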

Intellectual Precursors

Our work on ICARUS has been influenced by many previous efforts:
earlier research on integrated cognitive architectures, especially ACT, Soar, and Prodigy;
earlier work on architectures for reactive control, especially universal plans and teleoreactive programs;
research on learning value functions from delayed reward, especially hierarchical approaches to Q learning;
decision theory and decision analysis;
previous versions of ICARUS (going back to 1988).
However, ICARUS combines and extends ideas from its various predecessors in novel ways.

Directions for Future Research

Future work on ICARUS should introduce additional methods for:
forward chaining and mental simulation of skills;
allocation of scarce resources and selective attention;
probabilistic encoding and matching of Boolean concepts;
flexible recognition of skills executed by other agents;
caching of repairs to extend the skill hierarchy;
revision of internal reward functions for concepts; and
extension of short-term memory to store episodic traces.
Taken together, these features should make ICARUS a more general and powerful architecture for constructing intelligent agents.

Concluding Remarks

ICARUS is a novel integrated architecture for intelligent agents that:
includes separate memories for concepts and skills;
organizes concepts and skills in a hierarchical manner;
associates affective values with all cognitive structures;
calculates these affective values internally;
combines reactive execution with cognitive repair; and
uses expected values to nominate tasks and abandon them.
This constellation of concerns distinguishes ICARUS from other research on integrated architectures.

Estimation of Expected Reward

Estimates are linear functions of numeric features in the :values field of a skill. These features are inherited down the calling tree from skills to subskills.

[Diagram: pass (features #speed, #dist-ahead) calls speed&change (#speed, #dist-ahead, #room), which calls *accelerate (#room, #own-speed).]

* The expected reward for an ICARUS skill depends on the agent's intent and responds to its current beliefs.

A Cognitive Task for Physical Agents

[Figure-only slide.]

Examples of Long-Term Concepts

(ahead-of (?car1 ?car2)
 :defn (car ?car1) (car ?car2)
       (#yback ?car1 ?back1)
       (#yfront ?car2 ?front2)
       (> ?back1 ?front2)
 :reward (#distance-ahead ?car1 ?car2 ?d)
         (#speed ?car2 ?s)
 :weights (5.6 3.1))

(overlaps (?car1 ?car2)
 :defn (#yfront ?car1 ?front1)
       (#yback ?car1 ?back1)
       (#yfront ?car2 ?front2)
       (> ?front1 ?front2)
       (> ?front2 ?back1))

(faster-than (?car1 ?car2)
 :defn (#speed ?car1 ?s1)
       (#speed ?car2 ?s2)
       (> ?s1 (+ ?s2 5)))

(overlaps-and-adjacent (?car1 ?car2)
 :defn (car ?car1) (car ?car2)
       (in-lane ?car1 ?lane1)
       (in-lane ?car2 ?lane2)
       (adjacent ?lane1 ?lane2)
       (overlaps ?car1 ?car2))

(coming-from-behind (?car1 ?car2)
 :defn (in-lane ?car1 ?lane1)
       (in-lane ?car2 ?lane2)
       (adjacent ?lane1 ?lane2)
       (faster-than ?car1 ?car2)
       (ahead-of ?car2 ?car1))

(clear-for (?lane ?car)
 :defn (lane ?lane ?left-line ?right-lane)
       (not (overlaps-and-adjacent ?car ?other))
       (not (coming-from-behind ?car ?other))
       (not (coming-from-behind ?other ?car)))

Examples of Long-Term Skills

(pass (?car1 ?car2 ?lane)
 :start (ahead-of ?car2 ?car1)
        (in-same-lane ?car1 ?car2)
 :objective (ahead-of ?car1 ?car2)
            (in-same-lane ?car1 ?car2)
 :requires (in-lane ?car2 ?lane)
           (adjacent ?lane ?to)
 :ordered (speed&change ?car1 ?car2 ?lane ?to)
          (overtake ?car1 ?car2 ?lane)
          (change-lanes ?car1 ?to ?lane)
 :values (#distance-ahead ?car1 ?car2 ?d)
         (#speed ?car2 ?s)
 :weights (0.26 0.17))

(speed&change (?car1 ?car2 ?from ?to)
 :start (ahead-of ?car1 ?car2)
        (same-lane ?car1 ?car2)
 :objective (faster-than ?car1 ?car2)
            (different-lane ?car1 ?car2)
 :requires (in-lane ?car2 ?from)
           (adjacent ?from ?to)
 :unordered (speed-up-faster ?car1 ?car2)
            (change-lanes ?car1 ?from ?to))

(change-lanes (?car ?from ?to)
 :start (in-lane ?car ?from)
 :objective (in-lane ?car ?to)
 :requires (lane ?from ?shared-line ?right-line)
           (lane ?to ?left-line ?shared-line)
           (clear-for ?to ?car)
 :ordered (*shift-left))

(speed-up-faster (?car1 ?car2)
 :start (faster-than ?car2 ?car1)
 :objective (faster-than ?car1 ?car2)
 :requires ( )
 :ordered (*accelerate))

A Long-Term Skill Hierarchy

[Diagram: pass at the top, with subskills speed&change, overtake, and change-lanes; speed&change calls speed-up-faster and change-lanes; the primitive actions are *accelerate, *wait, *shift-right, and *shift-left.]

Categorization in ICARUS

On each cycle, ICARUS' conceptual recognition module:
applies preattentive sensors to refresh the contents of the perceptual buffer;
ships added, removed, and changed primitive concepts to the hierarchy;
propagates literals to add, remove, and update instances of defined concepts;
adds, removes, and updates elements in short-term conceptual memory.
This bottom-up process is akin to matching in Rete networks and to inference in truth maintenance systems.
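The bottom-up propagation can be sketched in a deliberately simplified, propositional form: here a defined concept matches when every concept in its definition matches, whereas the real module handles relational literals with variable bindings.

```python
# Sketch of bottom-up concept recognition (hypothetical, propositional
# simplification of ICARUS' relational matching).

def categorize(percepts, definitions):
    """definitions: {concept: [parts]}, listed bottom-up so that a
    concept appears after everything its definition mentions."""
    beliefs = set(percepts)
    for concept, parts in definitions.items():
        if all(p in beliefs for p in parts):
            beliefs.add(concept)       # defined concept now matches
    return beliefs
```

Because each pass only adds concepts whose parts already match, the process is monotone within a cycle, which is what makes it analogous to a Rete-style match.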

Calculation of Internal Reward

For each instance I of a matched concept C in long-term memory:
access each numeric concept N mentioned in C's :reward field;
find the corresponding concept instance J in short-term memory;
extract the quantity Q associated with the concept instance J;
multiply Q by the weight W associated with N in the :weights field;
add the products for each numeric concept to compute the reward for I.
ICARUS sums the contributions from each matched Boolean concept to obtain the overall reward for the current cycle.
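The arithmetic in these steps reduces to a weighted sum per concept instance, summed over all matched concepts. A minimal sketch, with hypothetical function names:

```python
# Sketch of internal reward: each matched Boolean concept contributes
# a weighted sum of its numeric-concept quantities.

def concept_reward(quantities, weights):
    """Weighted sum over the :reward field's numeric quantities."""
    return sum(q * w for q, w in zip(quantities, weights))

def total_reward(matched):
    """matched: one (quantities, weights) pair per matched concept."""
    return sum(concept_reward(q, w) for q, w in matched)
```

For example, with the slide's ahead-of weights (5.6 3.1) and quantities 10 and 2 for #dist-ahead and #speed, that concept instance contributes 10 * 5.6 + 2 * 3.1 = 62.2 to the cycle's reward.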

Nomination and Selection of Skills

On every cycle, ICARUS nominates a new task that it will pursue by:
accessing all skills that refer to concepts in short-term memory;
generating instantiated candidates from variables bound in concepts;
estimating the reward that each skill instance would produce if executed;
adding the best candidate to skill STM only if its value exceeds the past discounted reward.
ICARUS selects the skill instance with the highest value for execution.
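The admission test at the end of this cycle can be sketched as follows; candidates stand in for the instantiated skill instances with their estimated rewards, and the function name is hypothetical.

```python
# Sketch of skill nomination: admit the best candidate only if it
# beats the agent's discounted past reward.

def nominate(candidates, past_discounted):
    """candidates: list of (skill_instance, estimated_reward) pairs.
    Returns the instance to add to skill STM, or None."""
    if not candidates:
        return None
    best, value = max(candidates, key=lambda c: c[1])
    return best if value > past_discounted else None
```

This is the threshold that keeps an agent from taking on tasks that promise less than its recent experience already delivers.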

Calculation of Expected Reward

For an instance I of skill S, ICARUS computes the expected affective value of executing I by either:
using the :values and :weights fields associated with that skill, in the same way it computes reward for a concept instance; or
accessing the :values and :weights fields associated with each executable subskill and returning the highest result.
ICARUS uses the first process when selecting among skills and the second when determining how to execute a selected skill. Thus, skill selection relies on skills' summary statistics, whereas execution has access to their detailed structures.
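The two processes share the same linear form and differ only in where it is applied. A sketch with hypothetical structures, where each entry pairs current numeric-feature values with the matching :weights:

```python
# Sketch of the two expected-reward estimates for a skill instance.

def linear_value(values, weights):
    return sum(v * w for v, w in zip(values, weights))

def skill_level_estimate(values, weights):
    """Used when selecting among skills: the skill's own summary."""
    return linear_value(values, weights)

def subskill_level_estimate(subskills):
    """Used when executing a selected skill: best executable subskill.
    subskills: list of (values, weights) pairs."""
    return max(linear_value(v, w) for v, w in subskills)
```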

Repair of Selected Skills

When ICARUS cannot execute a selected skill, the repair process:
finds unmatched concepts that, if matched, would let execution continue;
computes the expected reward for each and selects the concept C with the highest value;
accesses every skill that includes concept C in its :objective field;
computes the expected reward for instances of these relevant skills;
selects the skill instance with the highest value and adds it to skill STM.
This backward-chaining process is similar to means-ends analysis, but it supports execution rather than planning.

Abandoning Nominated Skills

Besides nominating skills, ICARUS can decide to abandon them by:
updating the discounted average RA of the past overall rewards;
computing the expected reward VI for each skill instance I in short-term memory;
retaining the skill instance I only if VI is greater than 0.2 × RA.
Thus, the agent gives up on pursuing a task when its expected value drops too low compared to recent experience.
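The retention test can be sketched directly from the slide; the 0.2 factor is the slide's, but the exact update rule for the discounted average is an assumption.

```python
# Sketch of the abandonment cycle (update_average's exponential form
# and gamma are assumptions; the 0.2 threshold is from the slide).

def update_average(avg, reward, gamma=0.9):
    """Discounted running average RA of past overall rewards."""
    return gamma * avg + (1 - gamma) * reward

def retained(instances, avg, factor=0.2):
    """Keep skill instance I only if its expected value V_I > factor * RA.
    instances: list of (skill_instance, expected_value) pairs."""
    return [i for i, v in instances if v > factor * avg]
```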

Learning Hierarchical Control Policies

Induction maps reward streams, given hierarchical skills such as these, to learned value functions over those skills:

(pass (?x)
 :start (behind ?x) (same-lane ?x)
 :objective (ahead ?x) (same-lane ?x)
 :requires (lane ?x ?l)
 :components ((speed-up-faster-than ?x)
              (change-lanes ?l ?k)
              (overtake ?x)
              (change-lanes ?k ?l)))

(speed-up-faster-than (?x)
 :start (slower-than ?x)
 :objective (faster-than ?x)
 :requires ( )
 :components ((accelerate)))

(change-lanes (?l ?k)
 :start (lane self ?l)
 :objective (lane self ?k)
 :requires (left-of ?k ?l)
 :components ((shift-left)))

(overtake (?x)
 :start (behind ?x) (different-lane ?x)
 :objective (ahead ?x)
 :requires (different-lane ?x) (faster-than ?x))