The Metacognitive Loop and the Problem of Brittleness Michael L. Anderson University of Maryland Joint work with Don Perlis, Tim Oates, John Grant, Ken Hennacy, Darsana Josyula, Yuan Chong and Walid Gomaa
Perturbation Tolerance: A Goal for Intelligent Systems A perturbation is any change, whether in the world or in the system itself, that impacts performance. Perturbation tolerance is the ability of a system to quickly recover from perturbations. Perturbation intolerance has long been a major issue for intelligent systems. The roots of the problem: self-ignorance and brittleness.
Self-Ignorance A typical AI system has no notion of what it is, or what it is doing, let alone what it should be doing or strive to be. So why does it surprise us when systems fail to do what they ought, and instead blindly follow their programming over the metaphorical (or literal) cliff? Examples: DARPA Grand Challenge vehicle; satellite.
Brittleness Self-awareness is of limited usefulness without a capacity for self-alteration. A perturbation-tolerant system should not only notice when it isn't behaving how it ought or achieving what it should, but be able to use this knowledge to change the way it operates.
The Metacognitive Loop Our approach to this very general problem has been to equip artificial agents with the ability to notice when something is amiss, assess the anomaly, and guide a solution into place. Because this basic strategy involves monitoring, reasoning about, and perhaps even altering one’s own decision-making components, it is a metacognitive strategy, and we call the basic Note-Assess-Guide process the Metacognitive Loop (MCL).
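The Note-Assess-Guide cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class name `MetacognitiveLoop`, the numeric expectations, the fixed `tolerance`, and the string-valued responses are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Anomaly:
    """A mismatch between an expected and an observed value."""
    name: str
    expected: float
    observed: float


class MetacognitiveLoop:
    """Hypothetical Note-Assess-Guide wrapper (illustrative only)."""

    def __init__(self, tolerance: float = 0.1) -> None:
        self.tolerance = tolerance
        self.expectations: Dict[str, float] = {}
        self.responses: Dict[str, Callable[[Anomaly], str]] = {}

    def expect(self, name: str, value: float,
               response: Callable[[Anomaly], str]) -> None:
        """Register an expectation and the response to use if it is violated."""
        self.expectations[name] = value
        self.responses[name] = response

    def note(self, observations: Dict[str, float]) -> List[Anomaly]:
        """Note: compare observations against stored expectations."""
        anomalies = []
        for name, expected in self.expectations.items():
            observed = observations.get(name)
            if observed is not None and abs(observed - expected) > self.tolerance:
                anomalies.append(Anomaly(name, expected, observed))
        return anomalies

    def assess(self, anomalies: List[Anomaly]) -> Optional[Anomaly]:
        """Assess: pick the anomaly with the largest deviation, if any."""
        if not anomalies:
            return None
        return max(anomalies, key=lambda a: abs(a.observed - a.expected))

    def guide(self, anomaly: Optional[Anomaly]) -> str:
        """Guide: run the registered response, or carry on if all is well."""
        if anomaly is None:
            return "continue"
        return self.responses[anomaly.name](anomaly)

    def step(self, observations: Dict[str, float]) -> str:
        return self.guide(self.assess(self.note(observations)))


mcl = MetacognitiveLoop(tolerance=0.5)
mcl.expect("reward_per_turn", 1.0, lambda a: "relearn")
mcl.expect("seconds_per_task", 2.0, lambda a: "ask_for_help")
print(mcl.step({"reward_per_turn": 0.9, "seconds_per_task": 2.1}))
print(mcl.step({"reward_per_turn": -1.0, "seconds_per_task": 2.1}))
```

The point of the design is that the loop only watches the agent's own expectations; it does not try to model the whole world, and it does nothing until an expectation is violated.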
Self-monitoring Self-monitoring for anomalies, assessing, and responding to those anomalies is a better, more efficient, and ultimately more effective approach to perturbation tolerance than is doing nothing, on the one hand, or trying to continually monitor and model the world, on the other. Why?
Self-monitoring (2) The world is huge; the system is small. If the world changes, but this change does not affect performance, who cares? Anomalies can help focus attention on which parts of the world need (re-)modeling, making modeling more tractable.
Learning We believe that efforts should be aimed at implementing mechanisms that help systems help themselves. The goal should be to increase their agency and freedom of action in responding to problems, instead of limiting it and hoping that circumstances do not stray from the anticipations of the system designer. Why?
Learning (2) Primarily because we don’t think system designers are smart enough to anticipate every eventuality. But also because we think that self-aware, self-guided learning is the foundation of autonomy. Metacognitive learners would be advanced active learners, able to decide what, when, and how to learn (and when to stop).
Applications In our ongoing work, we have found that including an MCL component can enhance the performance of, and speed learning in, different types of systems, including reinforcement learners, natural language human-computer interfaces, commonsense reasoners, deadline-coupled planning systems, robot navigation, and, more generally, repairing arbitrary direct contradictions in a knowledge base.
MCL Application 1: Active Logic Active Logic (AL) is a time-sensitive, contradiction-tolerant logical formalism for use by autonomous cognitive agents. Central to AL are special rules controlling the inheritance of beliefs in general, and beliefs about the current time in particular, very tight controls on what can be derived from direct contradictions (P & ¬P), and mechanisms allowing an agent to represent and reason about its own beliefs and past reasoning.
MCL Application 1: Active Logic
Clock rule: from Now(t) at step t, infer Now(t+1) at step t+1.
Contradiction rule: from P and ¬P at step t, infer Contra(t, P, ¬P) at step t+1.
MCL Application 1: Active Logic Essentially, AL continually watches the KB for anomalies, in the form of contradictions. When a contradiction is noticed, the system can begin reasoning to deal with the contradiction, including disinheriting premises, looking for more information, etc. AL has been used in several applications.
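A minimal sketch of one reasoning step with contradiction noticing is given below. It assumes a drastically simplified setting: beliefs are bare literals written as strings, `~` marks negation, and both halves of a direct contradiction are disinherited at the next step. The function name `al_step` and this representation are hypothetical; the actual Active Logic machinery is much richer.

```python
from typing import Set, Tuple

Contra = Tuple[int, str, str]  # (step noticed, literal, its negation)


def negate(literal: str) -> str:
    """Return the negation of a literal; '~' marks negation (an assumption)."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def al_step(t: int, literals: Set[str],
            contras: Set[Contra]) -> Tuple[int, Set[str], Set[Contra]]:
    """Advance the clock by one step and record any direct contradictions."""
    # Note: find every literal whose negation is also believed at step t.
    conflicted = {p for p in literals if negate(p) in literals}
    new_contras = set(contras)
    for p in conflicted:
        if not p.startswith("~"):
            # Record Contra(t, P, ~P) once per contradictory pair.
            new_contras.add((t, p, negate(p)))
    # Assumed policy: contradictands are not inherited to step t+1.
    inherited = literals - conflicted
    return t + 1, inherited, new_contras


t, beliefs, contras = al_step(5, {"P", "~P", "Q"}, set())
print(t, sorted(beliefs), sorted(contras))
```

Because the contradiction is recorded as a belief rather than used as a premise, later steps can reason about it (for example, deciding which contradictand to reinstate) without everything following from P & ¬P.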
MCL Application 1: Active Logic We have been making progress on a semantics for AL that tries to do justice to the fact that real agents (1) exist in time, (2) have a constantly evolving KB, all the consequences of which they do not yet know, and (3) inevitably face contradictions. The trouble is that a contradictory KB cannot be modeled in the classical sense.
MCL Application 1: Active Logic To see what sort of model makes sense here, we ask “What must the world seem like to the agent?” instead of “What must the world be like if the KB were true?” If the KB contains only [P, P → Q, ¬Q], the agent has not yet noticed that this is contradictory. The agent knows that P, and knows P implies something, but does not know what it implies. Thus, the Q in “¬Q” and the Q in “P → Q” are not (yet) seen as the “same” formula. We say they are “superscripted”: P¹, P¹ → Q¹, ¬Q²
MCL Application 1: Active Logic We have worked out a definition of model based on these ideas that allows us to define a relevant notion of soundness, such that: When reasoning with consistent premises, all classically sound rules are sound for active logic. However, not everything that is classically sound remains sound in our sense, for by classical definitions, all rules with contradictory premises are vacuously sound, whereas in active logic not everything follows from a contradiction.
MCL Application 2: ALFRED ALFRED is a domain-independent natural-language based HCI system. It is built using active logic. ALFRED represents its beliefs, desires, intentions and expectations, and the status of each. It tracks the history of its own reasoning. If ALFRED is unable to achieve something, something is taking too long, or an expectation is not met, it assesses this problem, and takes one of several corrective actions, such as trying to learn or correcting an error in reasoning.
MCL Application 2: ALFRED Example 1:
User: Send the Boston train to Atlanta.
Alfred: OK. [ALFRED chooses a train (train1) in Boston and sends it to Atlanta]
User: No, send the Boston train to Atlanta.
Alfred: OK. [ALFRED recalls train1, but also notices an apparent contradiction: don’t send train1, do send train1. ALFRED considers possible causes of this contradiction, and decides the problem is his faulty interpretation of “the Boston train” as train1. He chooses train2, also at Boston, and sends it to Atlanta]
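The repair in Example 1 can be illustrated with a small sketch of reference resolution that remembers rejected interpretations. This is only a toy under stated assumptions: the function `resolve_train`, the `trains` dictionary, and the presence of a third train are hypothetical, and ALFRED itself reaches this behavior through active-logic reasoning rather than a lookup like this one.

```python
from typing import Dict, Optional, Set


def resolve_train(city: str, trains: Dict[str, str],
                  rejected: Set[str]) -> Optional[str]:
    """Return the first train located in `city` that has not been rejected."""
    for name, location in trains.items():
        if location == city and name not in rejected:
            return name
    return None


# Hypothetical world state: two trains in Boston, one elsewhere.
trains = {"train1": "Boston", "train2": "Boston", "train3": "Atlanta"}
rejected: Set[str] = set()

first = resolve_train("Boston", trains, rejected)
print(first)

# The user says "No, ...": blame the interpretation, not the command.
rejected.add(first)
second = resolve_train("Boston", trains, rejected)
print(second)
```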
MCL Application 2: ALFRED Example 2:
User: Shomax “Troy”
Alfred: I do not know the command “shomax”. What does “shomax” mean?
User: Shomax means play the movie and maximize the screen.
Alfred: OK. [shows “Troy” in fullscreen mode.]
** Alfred gets user feedback to learn the new command Shomax **
MCL Application 2: ALFRED Example 3:
User: (6:00 am) Send B to B
Alfred: I do not know the train B.
User: B is Bullet
Alfred: I do not know the city B.
User: B is Baltimore
Alfred: OK. [sends the instruction “send Bullet to Baltimore” to the TOS]
User: (6:01 am) Send B to Richmond
Alfred: OK. [sends the instruction “send Bullet to Richmond” to the TOS]
MCL Application 3: Navigation Robby is a simulated Khepera robot with a hybrid reasoner: a neural net with primary navigational control, and a logical reasoner for self-monitoring. When Robby has a navigational failure (e.g. a collision), the reasoner notices and assesses the failure, along with any pattern of failures, and can instruct the net to retrain on a specific set of inputs. Robby exhibits more sensible behavior during training, and learns to navigate more quickly.
MCL Application 4: Learning Chippy is a reinforcement learner (Q-learning, SARSA, and Prioritized Sweeping), who learns an action policy in a reward-yielding state space. He maintains expectations for rewards, and monitors his performance (average reward, average time between rewards). If his experience deviates from his expectations (a performance anomaly that we cause by changing the state space) he assesses the anomaly and chooses from a range of responses.
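The kind of expectation monitoring described for Chippy can be sketched as follows. The sketch assumes a single expected reward value, an expected number of turns between rewards, and a `slack` multiplier before a reward counts as overdue; the class name `RewardMonitor`, these parameters, and the anomaly labels are hypothetical and are not taken from the Chippy implementation.

```python
from typing import Optional


class RewardMonitor:
    """Hypothetical monitor for reward size and time between rewards."""

    def __init__(self, expected_reward: float, expected_gap: int,
                 slack: float = 3.0) -> None:
        self.expected_reward = expected_reward  # typical reward when one arrives
        self.expected_gap = expected_gap        # typical turns between rewards
        self.slack = slack                      # tolerance multiplier on the gap
        self.turns_since_reward = 0

    def observe(self, reward: float) -> Optional[str]:
        """Return an anomaly label, or None if experience matches expectation."""
        self.turns_since_reward += 1
        if reward == 0.0:
            # No reward this turn: anomalous only if it is long overdue.
            if self.turns_since_reward > self.slack * self.expected_gap:
                return "reward overdue"
            return None
        self.turns_since_reward = 0
        if reward < self.expected_reward:
            return "reward below expectation"
        return None


monitor = RewardMonitor(expected_reward=10.0, expected_gap=4)
for r in [0.0, 0.0, 0.0, 10.0, -10.0]:
    print(r, monitor.observe(r))
```

A learner wrapped this way could respond to an anomaly label by, for example, raising its exploration rate or discarding its learned policy; which response is appropriate is exactly what the assessment stage has to decide.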
Comparison of the per-turn performance of non-MCL and simple-MCL with a degree 8 perturbation from [10,-10] to [-10,10] in turn 10,001.
MCL Application 4: Learning
Overall Average Post-perturbation Performance
non-MCL            0.530
simple-MCL         0.545
sensitive-MCL      0.546
sophisticated-MCL  0.567
Future Work: Bolo Future work will focus on building systems with robust MCL in more sophisticated, dynamic environments. Possible applications include autonomous search-and-rescue or supply vehicles, decision-support reasoning systems, and multiple-domain human-computer interfaces.
Future Work: Bolo Bolo is a tank game. It’s really hard. For a first step, we will be implementing a search-and-rescue scenario within Bolo. The tank will have to find all the pillboxes and bring them to a safe location. However, it will encounter unexpected perturbations along the way: moved pillboxes, changed terrain, and shooting pillboxes.
Future Work: Bolo It will use a typical 3-tier architecture: reactive, deliberative, and reflective. However, our middle tier contains (only) flexible, learning components.
[Architecture diagram; labels as extracted: Trainer Modules; Trainable Modules; Inference Engine; KB; Oversight (MCL); Trainer Modules; Trainable Modules; ???; Traditional and Symbolic]
Some Relevant Publications
Logic, self-awareness and self-improvement: The metacognitive loop and the problem of brittleness. Michael L. Anderson and Donald R. Perlis. Journal of Logic and Computation, 15(1),
The roots of self-awareness. Michael L. Anderson and Don Perlis. Phenomenology and the Cognitive Sciences, 4(3), 2005 (in press).
On the reasoning of real-world agents: Toward a semantics for active logic. Michael L. Anderson, Walid Gomaa, John Grant and Don Perlis. Proceedings of the 7th Annual Symposium on the Logical Formalization of Commonsense Reasoning, Dresden University Technical Report (ISSN X), 2005.