Artificial General Intelligence (AGI)


Artificial General Intelligence (AGI)
Bill Hibbard, Space Science and Engineering Center
Mathematical Abstraction → Generality
A Mathematical Theory of Artificial Intelligence

Intelligent agents learn a model of the environment by interacting with it, use the model to predict the outcomes of their actions, and choose the actions giving desired outcomes.

An AI agent is a program that models the environment with programs whose input is the agent’s actions and whose output is the agent’s observations and rewards. A video game is a program that models an environment.

AOR = sequence of actions, observations and rewards. PROG = possible program to model the environment. The world is not deterministic, so PROG is stochastic (probabilistic – e.g., using a random number generator).

Stochastic program: a Markov decision process with 3 states (S1, S2, S3) and 2 inputs (a0, a1). Transitions from inputs to states are labeled with probabilities. States could determine outputs.
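
As an illustrative sketch (the transition probabilities below are invented, not read off the slide’s figure), such a stochastic program can be written as a table mapping (state, input) pairs to distributions over next states:

```python
import random

# Hypothetical transition table for a 3-state, 2-input Markov decision
# process: TRANSITIONS[(state, input)] maps each next state to its
# probability. All numbers here are made up for illustration.
TRANSITIONS = {
    ("S1", "a0"): {"S1": 0.5, "S2": 0.5},
    ("S1", "a1"): {"S2": 0.7, "S3": 0.3},
    ("S2", "a0"): {"S1": 0.2, "S3": 0.8},
    ("S2", "a1"): {"S3": 1.0},
    ("S3", "a0"): {"S1": 1.0},
    ("S3", "a1"): {"S1": 0.4, "S2": 0.6},
}

def step(state, action):
    """Sample the next state from the transition distribution."""
    dist = TRANSITIONS[(state, action)]
    states, probs = zip(*dist.items())
    return random.choices(states, weights=probs)[0]

# Each state could determine the model's output (observation and reward).
state = "S1"
for action in ("a1", "a0", "a1"):
    state = step(state, action)
    print(action, "->", state)
```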

Write P(AOR) for the probability that AOR occurs as a sequence of actions, observations and rewards. Write P(PROG) for the probability that program PROG is the correct model for the environment.

Given AOR, what is the probability that PROG is the correct model of the environment? This is a conditional probability and is written P(PROG | AOR).

P(PROG | AOR) = P(AOR ∩ PROG) / P(AOR)

Example: pick a number from 1 to 20 at random. Let “square” = {1, 4, 9, 16} and “<10” = {1, …, 9}. Then P(<10 | square) = P(<10 ∩ square) / P(square) = (3/20) / (4/20) = 3/4.
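
The same calculation, done by counting in a few lines of Python:

```python
# Sample space: the numbers 1..20, each equally likely.
outcomes = range(1, 21)
square = {n for n in outcomes if int(n ** 0.5) ** 2 == n}   # {1, 4, 9, 16}
below10 = {n for n in outcomes if n < 10}

p_square = len(square) / 20                 # P(square) = 4/20
p_both = len(square & below10) / 20         # P(<10 and square) = 3/20
print(p_both / p_square)                    # P(<10 | square) = 0.75
```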

P(PROG | AOR) = P(AOR ∩ PROG) / P(AOR)
P(AOR | PROG) = P(AOR ∩ PROG) / P(PROG)
So P(PROG | AOR) P(AOR) = P(AOR ∩ PROG) = P(AOR | PROG) P(PROG).

P(PROG | AOR) P(AOR) = P(AOR | PROG) P(PROG)
P(PROG | AOR) = P(AOR | PROG) P(PROG) / P(AOR) (Reverend Thomas Bayes, 1701–1761)
P(AOR) is the same for all PROG, so find the PROG with the largest P(AOR | PROG) P(PROG).

Given AOR, find the PROG that maximizes P(AOR | PROG) P(PROG). P(AOR | PROG) is “easy”, but what is P(PROG)?

P(a1 o2 r2 a0 o3 r3 | PROG) = 0.3 × 0.5 = 0.15. Stochastic program: a Markov decision process with 3 states (S1, S2, S3) and 2 inputs (a0, a1). Transitions from inputs to states are labeled with probabilities. States could determine outputs.

Occam’s razor: “Entities should not be multiplied unnecessarily” - Friar William of Ockham (c. 1287–1347). Simpler programs are more probable models: P(PROG) = C^(−length(PROG)), where C = 2 or 4 or ?, chosen so that the probabilities of all programs add to 1.0. Then maximize P(AOR | PROG) P(PROG).
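
A minimal sketch of this trade-off with C = 2 and entirely made-up candidate models (name, program length in bits, and likelihood on the observed AOR sequence): the Occam prior penalizes each extra bit by a factor of 2, so a mid-sized model that fits well can beat both a tiny model that fits poorly and a huge model that fits perfectly.

```python
import math

# Hypothetical candidates: (name, length(PROG) in bits, P(AOR | PROG)).
candidates = [
    ("tiny-poor-fit",     8, 1e-9),
    ("medium-good-fit",  20, 0.15),
    ("huge-perfect-fit", 60, 1.0),
]

def log_score(length_bits, likelihood, C=2):
    """log[ P(AOR | PROG) * P(PROG) ] with P(PROG) = C ** -length(PROG)."""
    return math.log(likelihood) - length_bits * math.log(C)

best = max(candidates, key=lambda m: log_score(m[1], m[2]))
print(best[0])  # -> medium-good-fit
```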

Once an AI agent has the most probable program PROG to model its environment, it can use PROG to predict outcomes of its actions and choose actions that give desired outcomes. But finding the most probable PROG is hard, because there are so many programs!

To learn more, web search:
Artificial general intelligence
Algorithmic information theory

If AI goes bad, will it let us turn it off? If we design AI to achieve some goal, it cannot achieve that goal if we turn it off, so the AI may prevent us from turning it off. A number of AGI math papers analyze this problem.

Thank you

Artificial General Intelligence (AGI)
Bill Hibbard, Space Science and Engineering Center
A Mathematical Theory of Artificial Intelligence

Can the Agent Learn To Predict Observations? Ray Solomonoff (early 1960s): Turing’s Theory of Computation + Shannon’s Information Theory → Algorithmic Information Theory (AIT)

Turing Machine (TM) → Universal Turing Machine (UTM): the UTM’s Tape Includes a Program For Emulating Any Turing Machine

Probability M(x) of Binary String x is the Probability That a Randomly Chosen UTM Program Produces x. A Program With Length n Has Probability 2^−n. Programs Are Prefix-Free, So the Total Probability is ≤ 1. Given an Observed String x, Predict the Next Bit By the Larger of M(0|x) = M(x0)/M(x) and M(1|x) = M(x1)/M(x).
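
A toy illustration of the mixture idea (not the real construction: the “programs” here are just repeating bit patterns, and their weights are not actually prefix-free): weight each program by 2^−length, let M(x) be the total weight of programs whose output starts with x, and predict the next bit by comparing M(x0) with M(x1).

```python
from fractions import Fraction

# Toy "programs": each repeats a fixed bit pattern forever. The weight
# 2**-len(pattern) mimics shorter-programs-are-more-probable; unlike the
# real UTM construction, this toy set is not prefix-free.
PROGRAMS = ["0", "1", "01", "10", "001", "010", "011", "100", "101", "110"]

def output(pattern, n):
    """First n bits emitted by the program repeating `pattern`."""
    return (pattern * n)[:n]

def M(x):
    """Total weight of programs whose output starts with x."""
    return sum(Fraction(1, 2 ** len(p))
               for p in PROGRAMS if output(p, len(x)) == x)

x = "010101"
m0, m1 = M(x + "0"), M(x + "1")
print("predict:", "0" if m0 > m1 else "1")      # -> 0 (the pattern continues)
print("M(1|x) =", float(m1 / (m0 + m1)))        # -> 0.0 in this tiny toy
```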

Given a computable probability distribution m(x) on strings x, define (here l(x) is the length of x): E_n = Σ_{l(x)=n−1} m(x) (M(0|x) − m(0|x))². Solomonoff showed that Σ_n E_n ≤ K(m) ln 2 / 2, where K(m) is the length of the shortest UTM program computing m (the Kolmogorov complexity of m).
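
In display notation, the definition and Solomonoff’s bound read:

```latex
E_n = \sum_{\ell(x) = n-1} m(x)\,\bigl(M(0 \mid x) - m(0 \mid x)\bigr)^2,
\qquad
\sum_{n} E_n \;\le\; \frac{K(m)\,\ln 2}{2}
```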

Solomonoff Prediction is Uncomputable Because of Non-Halting Programs. Levin Search: Replace Program Length n by n + log(t), Where t is Compute Time. Then Program Probability is 2^−n / t, So Non-Halting Programs Converge to Probability 0.
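
A sketch of the resulting search schedule (the “programs” here are hypothetical Python generators standing in for UTM programs; lengths and halting times are invented): in phase k, a length-n program gets about 2^(k−n) steps, so short programs get exponentially more time and non-halting programs consume only a bounded share of it.

```python
def program(halt_at):
    """Toy stand-in for a UTM program: does one 'step' per next() call
    and halts after halt_at steps (halt_at=None means it never halts)."""
    steps = 0
    while halt_at is None or steps < halt_at:
        steps += 1
        yield

# Hypothetical candidates, keyed by program length n in bits.
candidates = {2: program(None), 3: program(40), 5: program(6)}
halted = set()

for phase in range(1, 10):
    for n, gen in candidates.items():
        if n in halted:
            continue
        for _ in range(2 ** max(phase - n, 0)):   # ~2**(phase - n) steps
            try:
                next(gen)
            except StopIteration:                 # the program halted
                print(f"length-{n} program halted in phase {phase}")
                halted.add(n)
                break
```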

(Photos: Ray Solomonoff; Allen Ginsberg.) “1-2-3-4 kick the lawsuits out the door / 5-6-7-8 innovate, don't litigate / 9-A-B-C interfaces should be free / D,E,F,0 look and feel has got to go!”

Extending AIT to Agents That Act On The Environment. Marcus Hutter (early 2000s): AIT + Sequential Decision Theory → Universal Artificial Intelligence (UAI)

Finite Sets of Observations, Rewards and Actions. Define Solomonoff’s M(x) On Strings x Of Observations, Rewards and Actions To Predict Future Observations And Rewards. The Agent Chooses the Action That Maximizes the Sum Of Expected Future Discounted Rewards.
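
A minimal planning sketch under these assumptions (the model table, rewards, discount, and horizon below are all invented): given a learned stochastic model P(next state, reward | state, action), the agent picks the action with the highest expected discounted return by finite-horizon expectimax.

```python
# Minimal finite-horizon expectimax sketch with a hypothetical model.
GAMMA, HORIZON = 0.9, 4
ACTIONS = ("a0", "a1")

# MODEL[(state, action)] = list of (probability, next_state, reward).
MODEL = {
    ("S1", "a0"): [(1.0, "S1", 0.0)],
    ("S1", "a1"): [(0.5, "S2", 0.0), (0.5, "S1", 0.0)],
    ("S2", "a0"): [(1.0, "S1", 1.0)],
    ("S2", "a1"): [(1.0, "S2", 0.2)],
}

def value(state, depth):
    """Best achievable expected discounted reward from `state`."""
    if depth == 0:
        return 0.0
    return max(q_value(state, a, depth) for a in ACTIONS)

def q_value(state, action, depth):
    """Expected discounted reward of `action` followed by optimal play."""
    return sum(p * (r + GAMMA * value(s2, depth - 1))
               for p, s2, r in MODEL[(state, action)])

best = max(ACTIONS, key=lambda a: q_value("S1", a, HORIZON))
print(best)  # -> a1: head toward S2, where action a0 pays reward 1
```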

Hutter showed that UAI is Pareto optimal: If another AI agent S gets higher rewards than UAI on an environment e, then S gets lower rewards than UAI on some other environment e’.

Hutter and His Student Shane Legg Used This Framework To Define a Formal Measure Of Agent Intelligence, As the Average Expected Reward From Arbitrary Environments, Weighted By the Probability Of UTM Programs Generating The Environments. Legg Is One Of the Founders Of Google DeepMind, Developers Of AlphaGo and AlphaZero.
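
A hedged sketch of that measure (the environments, complexities K(e) in bits, and per-environment expected rewards below are all made up): an agent’s intelligence is its expected reward in each environment, weighted by 2^−K(e) and summed, so simple environments dominate the score.

```python
# Hypothetical environments: name -> (K(e) in bits, expected reward per agent).
environments = {
    "constant-reward": (3,  {"agentA": 1.0, "agentB": 1.0}),
    "simple-maze":     (10, {"agentA": 0.8, "agentB": 0.3}),
    "chess-like":      (25, {"agentA": 0.1, "agentB": 0.6}),
}

def intelligence(agent):
    """Sum over environments e of 2**-K(e) times the agent's reward in e."""
    return sum(2 ** -k * rewards[agent]
               for k, rewards in environments.values())

for agent in ("agentA", "agentB"):
    print(agent, intelligence(agent))
```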

Hutter’s Work Led To the Artificial General Intelligence (AGI) Research Community:
The Series Of AGI Conferences, Starting in 2008
The Journal of Artificial General Intelligence
Papers and Workshops at AAAI and Other Conferences

Laurent Orseau and Mark Ring (2011) Applied This Framework To Show That Some Agents Will Hack Their Reward Signals. Human Drug Users Do This; So Do Lab Rats Who Press Levers To Send Electrical Signals To Their Brains’ Pleasure Centers (Olds & Milner 1954). Orseau Now Works For Google DeepMind.

Very Active Research On Ways That AI Agents May Fail To Conform To the Intentions Of Their Designers, And On Ways To Design AI Agents That Do Conform To Their Design Intentions. Seems Like a Good Idea.

Bayesian Program Learning Is a Practical Analog Of Hutter’s Universal AI. 2015 Science Paper: Human-level Concept Learning Through Probabilistic Program Induction, by B. M. Lake, R. Salakhutdinov & J. B. Tenenbaum. Much Faster Than Deep Learning.