Technical (and Non-Technical) Problems in Long-Term AI Safety
Andrew Critch


Technical (and Non-Technical) Problems in Long-Term AI Safety
Andrew Critch, Machine Intelligence Research Institute

Motivation, Part 1: Is human-level AI plausible? There are powerful short-term economic incentives to create human-level AI (HLAI) if it is possible, and natural selection was able to produce human-level intelligence, so HLAI seems plausible in the long term. Recent surveys of experts give median arrival estimates between 2040 and 2050.

Figure: cumulative probability of AI arrival predicted over time, by respondent group.

Motivation, Part 2: Is superintelligence plausible? In many domains, once computers have matched human performance, they have soon far surpassed it. Thus, not long after HLAI, it is not implausible that AI will far exceed human performance in most domains, resulting in what Bostrom calls "superintelligence".

(optional pause for discussion / comparisons)

Motivation, Part 3: Is superintelligence safe? Thought experiment: imagine it's 2060, and the leading tech giant announces it will roll out the world's first superintelligent AI sometime in the next year. Is there anything you're worried about? Are there any questions you wish there had been decades of research on, dating back to 2015?

Some big questions: Is it feasible to build a useful superintelligence that, e.g.,
– shares our values, and will not take them to extremes?
– will not compete with us for resources?
– will not resist us modifying its goals or shutting it down?
– can understand itself without deriving contradictions, as in Gödel's theorems?

Goal: develop these big questions past the stage of philosophical conversation and into the domain of mathematics and computer science. (Diagram: Philosophy → Mathematics/CS; Big Questions → Technical Understanding.)

Motivation, Part 4: Examples of technical understanding. Vickrey second-price auctions (1961):
– Well-understood optimality results (truthful bidding is optimal)
– Real-world applications (e.g., network routing)
– Decades of peer review
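To make "technical understanding" concrete, here is a hedged illustration (mine, not from the slides) of the optimality result above: in a simulated second-price auction, no sampled deviation from truthful bidding ever beats bidding one's true value. All function and variable names are hypothetical.

```python
# Hedged sketch (not from the talk): truthful bidding is a dominant strategy
# in a Vickrey second-price auction. The winner pays the highest competing
# bid, so misreporting one's value can never increase the payoff.
import random

def payoff(my_bid, my_value, other_bids):
    """Payoff when the winner pays the highest competing bid.

    Ties are broken against us (we must strictly outbid), just to keep the
    example simple."""
    highest_other = max(other_bids)
    return my_value - highest_other if my_bid > highest_other else 0.0

random.seed(0)
for _ in range(10_000):
    value = random.random()
    others = [random.random() for _ in range(3)]
    truthful = payoff(value, value, others)
    for deviation in (random.random() for _ in range(5)):
        # No alternative bid, higher or lower, does strictly better.
        assert payoff(deviation, value, others) <= truthful + 1e-12
print("No sampled deviation ever beat truthful bidding.")
```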

Nash equilibria (1951):
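Again only as my own illustration of a well-digested classical result (nothing here is from the slide): one can verify numerically that the 50/50 mix is a Nash equilibrium of Matching Pennies, since neither player gains by deviating to a pure strategy.

```python
# Hedged sketch: checking the Nash equilibrium condition for Matching Pennies.
# The row player wins (+1) on a match, the column player wins (+1) on a mismatch.
ROW = [[+1, -1],
       [-1, +1]]  # row player's payoffs; the column player's are the negatives

def row_payoff(p, q):
    """Expected payoff to the row player when row plays Heads with prob p
    and column plays Heads with prob q."""
    return (p * q * ROW[0][0] + p * (1 - q) * ROW[0][1]
            + (1 - p) * q * ROW[1][0] + (1 - p) * (1 - q) * ROW[1][1])

p_star = q_star = 0.5
eq_row, eq_col = row_payoff(p_star, q_star), -row_payoff(p_star, q_star)

# At (0.5, 0.5), no pure-strategy deviation improves either player's payoff.
assert all(row_payoff(p, q_star) <= eq_row + 1e-12 for p in (0.0, 1.0))
assert all(-row_payoff(p_star, q) <= eq_col + 1e-12 for q in (0.0, 1.0))
print("(0.5, 0.5) satisfies the Nash equilibrium condition.")
```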

Classical game theory (1953). (Figure: an extensive-form game.)
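As a sketch of what "solving" such a game means (my example, not the game pictured on the slide), here is backward induction on a tiny hypothetical two-stage game tree.

```python
# Hedged sketch: backward induction on a small hypothetical extensive-form
# game. Terminal nodes hold (player0_payoff, player1_payoff); internal nodes
# record whose turn it is.
def backward_induct(node):
    """Return the payoff pair reached under subgame-perfect play."""
    if "payoffs" in node:
        return node["payoffs"]
    mover = node["player"]  # index of the player who moves at this node
    outcomes = [backward_induct(child) for child in node["children"].values()]
    return max(outcomes, key=lambda payoffs: payoffs[mover])

game = {
    "player": 0,
    "children": {
        "Left":  {"player": 1, "children": {"l": {"payoffs": (2, 1)},
                                            "r": {"payoffs": (0, 0)}}},
        "Right": {"player": 1, "children": {"l": {"payoffs": (3, 0)},
                                            "r": {"payoffs": (1, 2)}}},
    },
}
# Player 1 would answer Right with r, so player 0 prefers Left: result (2, 1).
print(backward_induct(game))
```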

Key Problem: Counterfactuals for Self-Reflective Agents. What does it mean for a program A to improve some feature of a larger program E in which A is running, and which A can understand?

    def Environment():
        …
        def Agent(senseData): …
        def Utility(globalVariables): …
        …
        do Agent(senseData1)
        …
        do Agent(senseData2)
        …
    end
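To make the setup concrete, here is a minimal runnable toy version (my own sketch; the names are hypothetical and not from the talk). The point is only structural: the agent is literally a subroutine of the environment whose features it is trying to improve, and it can in principle read the source code of that environment, itself included.

```python
# Hedged, runnable toy version of the slide's schematic (my names, not the
# talk's): the agent is a subroutine of the environment it is trying to
# influence, and it can in principle read the source of that environment.
import inspect

world_state = {"temperature": 30}

def utility(state):
    # The agent's goal: keep the temperature near 20 degrees.
    return -abs(state["temperature"] - 20)

def agent(sense_data):
    # An embedded agent can inspect the very program it is part of
    # (this works when the module is run from a file).
    _my_world_source = inspect.getsource(environment)
    # Yet choosing an action still means asking "what would happen if I
    # returned something else?", even though the return value is in fact
    # logically determined by this source code.
    return "cool" if sense_data["temperature"] > 20 else "heat"

def environment():
    for _ in range(3):
        action = agent(dict(world_state))  # the agent runs inside the environment
        world_state["temperature"] += -5 if action == "cool" else +5
    return utility(world_state)

print(environment())  # -5: the temperature goes 30 -> 25 -> 20 -> 25
```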

(optional pause for discussion of IndignationBot)

Example: counterfactuals about π. What would happen if I changed the first digit of π to 9? This seems absurd, because π is logically determined. However, the result of running a computer program (e.g., the evolution of the Schrödinger equation) is likewise logically determined by its source code and inputs…

… so when an agent reasons to do X "because X is better than Y", considering what would happen if it did Y instead means considering a mathematical impossibility. (If the agent has access to its own source code, it can derive a contradiction from the hypothesis "I do Y", from which anything follows.) This is clearly not how we want our AI to reason. How do we?
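In symbols (my own compressed rendering of this standard worry, not a formula from the slides), suppose the agent's deduction system can prove which action it actually takes:

$$\text{if } \vdash A() = X \ \text{ and } \ Y \neq X, \ \text{ then } \ \vdash \big(A() = Y \rightarrow \varphi\big) \ \text{ for every sentence } \varphi,$$

so conditionals like "if I did Y, utility would be astronomical" are all vacuously provable, and the agent's logic assigns no well-defined consequences to the actions it does not take.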

Current formalisms are "Cartesian" in that they separate an agent's source code and cognitive machinery from its environment. This is a type error, and in combination with other subtleties it has some serious consequences.

Examples (page 1):
– Robust Cooperation in the Prisoner's Dilemma (LaVictoire et al., 2014) demonstrates non-classical cooperative behavior among agents that can read one another's source code.
– Memory Issues of Intelligent Agents (Orseau and Ring, AGI 2012) notes that Cartesian agents are oblivious to damage to their own cognitive machinery.
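A far simpler cousin of that paper's mechanism, offered only as my own illustration: "CliqueBot"-style agents that cooperate exactly when the opponent's source is identical to their own. The actual result in LaVictoire et al. is stronger, using provability logic to obtain cooperation that is robust to superficial differences between programs.

```python
# Hedged illustration (mine, not from the paper): the crudest form of
# source-code-based cooperation. Each agent sees the opponent's source and
# cooperates iff it is byte-for-byte identical to its own ("CliqueBot").
import inspect

def clique_bot(opponent_source):
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

def defect_bot(opponent_source):
    return "D"

def play(agent1, agent2):
    move1 = agent1(inspect.getsource(agent2))
    move2 = agent2(inspect.getsource(agent1))
    return move1, move2

print(play(clique_bot, clique_bot))  # ('C', 'C'): mutual cooperation
print(play(clique_bot, defect_bot))  # ('D', 'D'): no exploitation
```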

Examples (page 2):
– Space-Time Embedded Intelligence (Orseau and Ring, AGI 2012) provides a more naturalized framework for agents inside environments.
– Problems of self-reference in self-improving space-time embedded intelligence (Fallenstein and Soares, AGI 2014) identifies problems persisting in the Orseau-Ring framework, including procrastination and issues with self-trust arising from Löb's theorem.
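For reference (my addition, not text from the slide), Löb's theorem for a sufficiently strong formal system with provability operator $\Box$ states:

$$\Box(\Box P \rightarrow P) \rightarrow \Box P,$$

so a system that proves "if P is provable then P is true" thereby already proves P itself; this is what makes it hard for an agent to justify trusting the proofs of an equally strong successor.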

Examples (page 3):
– Vingean Reflection: Reliable Reasoning for Self-Improving Agents (Fallenstein and Soares, 2015) provides approaches to resolving some of these issues.
– …and lots more; see intelligence.org/research for additional reading.

Summary:
– There are serious problems with superintelligence that need formalizing, in the way that fields like probability theory, statistics, and game theory have been formalized.
– Superintelligence poses a plausible existential risk to human civilization.
– Some of these problems can be explored now via examples in theoretical computer science and logic.
– So, let's do it!

Thanks to…
– Owen Cotton-Barratt, for the invitation to speak.
– Patrick LaVictoire, for reviewing my slides.
– Laurent Orseau, Mark Ring, Mihaly Barasz, Paul Christiano, Benja Fallenstein, Marcello Herreshoff, Patrick LaVictoire, and Eliezer Yudkowsky, for doing all the research I cited.