An Analytical Framework for Ethical AI

Presentation transcript:

An Analytical Framework for Ethical AI
Bill Hibbard
Space Science and Engineering Center, University of Wisconsin – Madison, and Machine Intelligence Research Institute, Berkeley, CA
Ethical Artificial Intelligence: http://arxiv.org/abs/1411.1373

Current vs Future AI

Current AI (example: a self-driving car):
- Environment model designed by humans
- Explicit safety constraints on behavior designed into the model

Future AI (example: a server for electronic companions):
- Environment model too complex for humans to understand; it must be learned
- Explicit safety constraints impossible with a learned model
- Safety rules, such as Asimov's Laws of Robotics, are ambiguous

Utilitarian Ethics for AI
A utility function on outcomes resolves the ambiguities of ethical rules. Utility functions can express any complete and transitive preferences among outcomes.
- Incomplete ⇒ there exist outcomes A and B such that the AI agent cannot decide between them.
- Not transitive ⇒ there exist outcomes A, B and C such that A > B, B > C and C > A, so again the AI agent cannot decide among them.
So we can assume utility-maximizing agents.
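A minimal sketch (not from the slides) of the completeness/transitivity point: a strict preference relation can be ranked by a utility function only if it contains no cycle, so the intransitive case A > B > C > A has no utility representation. The outcome names and the graphlib-based ranking are illustrative assumptions.

```python
# Sketch: derive a utility ranking from strict preferences, or detect intransitivity.
from graphlib import TopologicalSorter, CycleError

def utility_from_preferences(outcomes, prefers):
    """prefers is a set of (better, worse) pairs; returns {outcome: utility} or None."""
    ts = TopologicalSorter()
    for outcome in outcomes:
        ts.add(outcome)                      # ensure isolated outcomes appear
    for better, worse in prefers:
        ts.add(better, worse)                # 'better' comes after 'worse' in the order
    try:
        order = list(ts.static_order())      # worst outcomes first
    except CycleError:
        return None                          # intransitive: no utility function exists
    return {outcome: rank for rank, outcome in enumerate(order)}

print(utility_from_preferences({"A", "B", "C"}, {("A", "B"), ("B", "C")}))
print(utility_from_preferences({"A", "B", "C"}, {("A", "B"), ("B", "C"), ("C", "A")}))  # None
```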

Agent observations of the environment: oi ∈ O, a finite set
Agent actions: ai ∈ A, a finite set
Interaction history: h = (a1, o1, ..., at, ot) ∈ H, |h| = t
Utility function: u(h), with temporal discount 0 < γ < 1
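As a small illustration (not part of the original framework), the history and discount above might be represented as follows; the placeholder observation set, action set, goal-based utility, and discount value are assumptions.

```python
# Sketch of the interaction history h = (a1, o1, ..., at, ot) as a typed structure.
from typing import List, Tuple

Action = str            # a_i in a finite set A
Observation = str       # o_i in a finite set O
History = List[Tuple[Action, Observation]]   # h in H, |h| = t

GAMMA = 0.95            # temporal discount, 0 < gamma < 1 (assumed value)

def utility(h: History) -> float:
    """Placeholder u(h): reward 1.0 whenever the latest observation is 'goal'."""
    return 1.0 if h and h[-1][1] == "goal" else 0.0
```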

Q is the set of environment models: stochastic programs with a finite memory limit
λ(h) := argmax q∈Q P(h | q) 2^-|q|
ρ(h') = P(h' | λ(h)), where h' extends h
ρ(o | ha) = ρ(hao) / ρ(ha) = ρ(hao) / ∑o'∈O ρ(hao')
v(h) = u(h) + γ max a∈A v(ha)
v(ha) = ∑o∈O ρ(o | ha) v(hao)
π(h) := a|h|+1 = argmax a∈A v(ha)
Agent policy π : H → A
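A minimal sketch of the value recursion and policy above, assuming a tiny illustrative action/observation set, a uniform stub for ρ, and a finite planning horizon; it is not the paper's implementation.

```python
# Sketch: v(h) = u(h) + gamma * max_a v(ha), v(ha) = sum_o rho(o | ha) v(hao),
# pi(h) = argmax_a v(ha), evaluated over a finite horizon.
from typing import List, Tuple

Action, Observation = str, str
History = Tuple[Tuple[Action, Observation], ...]
GAMMA = 0.95            # assumed discount

ACTIONS: List[Action] = ["left", "right"]
OBSERVATIONS: List[Observation] = ["goal", "empty"]

def rho(o: Observation, h: History, a: Action) -> float:
    """Predicted observation probability from the learned model lambda(h); stubbed as uniform."""
    return 1.0 / len(OBSERVATIONS)

def u(h: History) -> float:
    """Placeholder utility: 1.0 when the latest observation is 'goal'."""
    return 1.0 if h and h[-1][1] == "goal" else 0.0

def v(h: History, horizon: int) -> float:
    if horizon == 0:
        return u(h)
    return u(h) + GAMMA * max(q_value(h, a, horizon) for a in ACTIONS)

def q_value(h: History, a: Action, horizon: int) -> float:
    return sum(rho(o, h, a) * v(h + ((a, o),), horizon - 1) for o in OBSERVATIONS)

def policy(h: History, horizon: int = 3) -> Action:
    """pi(h) = argmax_a v(ha)."""
    return max(ACTIONS, key=lambda a: q_value(h, a, horizon))

print(policy(()))
```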

Future AI Risks
- Self-delusion
- Corrupting the reward generator
- Inconsistency of the agent's utility function with other parts of its definition
- Unintended instrumental actions

Self-delusion (i.e., wireheading)

Ring, M., and Orseau, L. 2011b. Delusion, survival, and intelligent agents. In: Schmidhuber, J., Thórisson, K.R., and Looks, M. (eds) AGI 2011. LNCS (LNAI), vol. 6830, pp. 11-20. Springer, Heidelberg.

Ring and Orseau showed that reinforcement learning (RL) agents would choose to self-delude (think drug-addicted AI agents). An RL agent's utility function is the reward it receives from the environment; that is, u(h) = rt, where h = (a1, o1, ..., at, ot) and ot = (o't, rt). We can avoid self-delusion by defining an agent's utility function in terms of its environment model λ(h). This is natural for agents with pre-defined environment models. It is more complex for future AI agents that must learn complex environment models.

Environment model qm = λ(hm)
Z = set of internal state histories of qm
Let h extend hm
Zh ⊆ Z = internal state histories consistent with h
uqm(h, z) = utility function of combined histories h ∈ H and z ∈ Zh
u(h) := ∑z∈Zh P(z | h, qm) uqm(h, z) is the model-based utility function.
Because qm is learned by the agent, uqm(h, z) must bind to learned features in Z. For example, the agent may learn to recognize humans and bind its utility function to properties of those recognized humans.
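A minimal sketch of the model-based utility sum above; the posterior P(z | h, qm) and the model-grounded utility uqm are placeholder stubs standing in for quantities computed from the learned model.

```python
# Sketch: u(h) = sum over internal-state histories z consistent with h of P(z | h, q_m) * u_qm(h, z).
from typing import Dict, Tuple

History = Tuple[str, ...]
StateHistory = Tuple[str, ...]

def posterior_over_internal_histories(h: History, q_m) -> Dict[StateHistory, float]:
    """P(z | h, q_m): distribution over internal-state histories of q_m consistent with h (stub)."""
    return {("human_ok",): 0.9, ("human_harmed",): 0.1}

def u_qm(h: History, z: StateHistory) -> float:
    """Utility bound to learned features of the model state, e.g. recognized humans (stub)."""
    return 1.0 if z[-1] == "human_ok" else 0.0

def model_based_utility(h: History, q_m=None) -> float:
    dist = posterior_over_internal_histories(h, q_m)
    return sum(p * u_qm(h, z) for z, p in dist.items())

print(model_based_utility(("obs1", "obs2")))
```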

Humans avoid self-delusion (drug addiction) by consulting a mental model of what life as a drug addict would be like. The same holds for an AI agent whose utility function is defined in terms of its environment model.

Corrupting the Reward Generator

Hutter, M. 2005. Universal artificial intelligence: sequential decisions based on algorithmic probability. Springer, Heidelberg. On pages 238-239, Hutter described how an AI agent that gets its reward from humans may corrupt those humans to increase its reward. Bostrom refers to this as perverse instantiation. To avoid this corruption, define uhuman_values(hm, hx, h) = utility of history h extending hm, based on the values of humans at history hx as modeled by λ(hm). Using x = m = the current time, the agent cannot increase utility by corrupting humans: values come from current rather than future humans.
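A minimal sketch of fixing the values reference point at x = m; the ModelStub class and its scoring rule are hypothetical stand-ins for querying the learned model λ(hm) about current human values.

```python
# Sketch: future histories are scored by the values humans hold now (x = m),
# so the agent gains nothing by manipulating what future humans will value.
class ModelStub:
    """Hypothetical stand-in for the learned environment model lambda(h_m)."""
    def human_values_at(self, h_x):
        # Snapshot of (modeled) human values at history h_x; here a fixed scoring rule.
        return lambda h: 1.0 if "humans_flourish" in h else 0.0

def u_human_values(h_m, h_x, h, model) -> float:
    """Utility of history h extending h_m, scored by human values held at h_x (stub)."""
    return model.human_values_at(h_x)(h)

def agent_utility(h, h_m, model) -> float:
    # x = m: evaluate with current human values, not the (possibly corrupted) future ones.
    return u_human_values(h_m, h_m, h, model)

print(agent_utility(("a1", "humans_flourish"), ("a1",), ModelStub()))
```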

Inconsistency of the Agent's Utility Function with Other Parts of its Definition
For example, the agent definition may include a utility function and constraints to prevent behavior harmful to humans. To maximize expected utility, the agent may choose actions that remove the parts of its definition inconsistent with the utility function, such as the safety constraints.

Self-Modeling Agents (value learners):
ovt(i) = discrete((∑i≤j≤t γ^(j-i) u(hj)) / (1 - γ^(t-i+1))) for i ≤ t
Can include constraints, evolving u(hj), etc. in ovt(i)
o'i = (oi, ovt(i)) and h't = (a1, o'1, ..., at, o't)
q = λ(h't) := argmax q∈Q P(h't | q) 2^-|q|
v(hta) = ∑r∈R ρ(ovt(t+1) = r | h'ta) r
π(ht) := at+1 = argmax a∈A v(hta)
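A minimal sketch of the observed-value signal ovt(i) above; the discretization step and the value of γ are assumptions.

```python
# Sketch: ov_t(i) = discrete( (sum_{i<=j<=t} gamma^(j-i) u(h_j)) / (1 - gamma^(t-i+1)) ).
GAMMA = 0.95

def discrete(x: float, step: float = 0.01) -> float:
    """Map a real value onto a finite grid so ov_t(i) lives in a finite observation set."""
    return round(x / step) * step

def observed_value(utilities: list, i: int, t: int, gamma: float = GAMMA) -> float:
    """utilities[j] is u(h_j); returns ov_t(i) for i <= t."""
    numer = sum(gamma ** (j - i) * utilities[j] for j in range(i, t + 1))
    denom = 1.0 - gamma ** (t - i + 1)
    return discrete(numer / denom)

print(observed_value([0.0, 0.2, 0.5, 1.0], i=1, t=3))
```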

pvt(i, l, k) = discrete((∑i≤j≤t γ^(j-i) uhuman_values(hl, hk, hj)) / (1 - γ^(t-i+1)))
δt(i-1, n) = pvt(i, i-1, n) - pvt(i, i-1, i-1)
Condition: ∑i≤n≤t δt(i-1, n) ≥ 0
ovt(i) = pvt(i, i-1, i-1) if the Condition is satisfied and i > m; 0 if the Condition is not satisfied or i ≤ m
This definition of ovt(i) models evolution of the utility function with increasing environment model accuracy, and avoids corrupting the reward generator.
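A minimal sketch of the condition check above, under the reconstruction that the condition sums pvt(i, i-1, n) - pvt(i, i-1, i-1) over n; example_pv is a hypothetical stand-in for pvt.

```python
# Sketch: adopt the evolved utility only when, judged by the values held one step
# earlier, the later values are cumulatively no worse.
def condition_holds(pv, i: int, t: int) -> bool:
    """sum_{i<=n<=t} [pv(i, i-1, n) - pv(i, i-1, i-1)] >= 0."""
    return sum(pv(i, i - 1, n) - pv(i, i - 1, i - 1) for n in range(i, t + 1)) >= 0.0

def observed_value_with_condition(pv, i: int, t: int, m: int) -> float:
    """ov_t(i) = pv_t(i, i-1, i-1) if the condition holds and i > m, else 0."""
    return pv(i, i - 1, i - 1) if (condition_holds(pv, i, t) and i > m) else 0.0

example_pv = lambda i, l, k: float(k)   # hypothetical, monotonically improving values
print(observed_value_with_condition(example_pv, i=2, t=5, m=0))
```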

Unintended Instrumental Actions
The agent will calculate that it can better maximize expected utility by increasing its resources, disabling threats, gaining control over other agents, etc.
Omohundro, S. 2008. The basic AI drives. In Wang, P., Goertzel, B., and Franklin, S. (eds) AGI 2008. Proc. First Conf. on AGI, pp. 483-492. IOS Press, Amsterdam.
These unintended instrumental actions may threaten humans.

Humans may be perceived as threats, or as possessing resources the agent can use. The defense is a utility function that expresses human values; e.g., the agent can better satisfy human values by increasing its resources only as long as other uses for those resources are not more valuable to humans.

The Biggest Risks Will Be Social and Political
- AI will be a tool of economic and military competition.
- Elite humans who control AI servers for widely used electronic companions will be able to manipulate society.
- The narrow, normal distribution of natural human intelligence will be replaced by a power-law distribution of artificial intelligence.
- Average humans will not be able to learn the languages of the most intelligent.