Evolution of Cooperative Problem-Solving in an Artificial Economy, by E. Baum and I. Durdanovic. Presented by Quang Duong.


Outline
– Limitations of reinforcement learning and other learning approaches
– Artificial economy
– Representation language: S-expressions
– Blocks World
– Rubik's Cube
– Rush Hour
– Conclusion and further exploration

Reinforcement learning
Value iteration, policy iteration
TD learning with function approximation: neural networks, inductive logic programming
Limitations (based on empirical results):
– state/policy spaces are enormous
– the evaluation function (reward) is hard to learn/represent
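To make the first bullet concrete, here is a minimal value-iteration sketch on a toy two-state MDP. The MDP, its transition structure, and the reward values are invented for illustration; the point is only the Bellman backup V(s) ← max_a Σ_s' P(s'|s,a) · (R(s,a) + γ·V(s')) that the slide alludes to.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[s][a] = list of (prob, next_state); R[s][a] = immediate reward."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman backup: best expected one-step return from s.
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# Toy example: from state 0, action "go" reaches absorbing state 1
# and pays reward 1; state 1 only loops on itself with reward 0.
P = {0: {"go": [(1.0, 1)]}, 1: {"stay": [(1.0, 1)]}}
R = {0: {"go": 1.0}, 1: {"stay": 0.0}}
V = value_iteration(P, R)
```

Even this tiny example hints at the limitation the slide cites: the table `V` has one entry per state, which is hopeless when the state space is combinatorial, as in Blocks World or Rubik's Cube.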

Related work
Holland's Classifier System:
– a set of agents (classifiers) that communicate with the world and with each other
– credit assignment via the bucket brigade; rule discovery via genetic algorithms
Is it a fair comparison? Search vs. planning, e.g. Graphplan, Blackbox

Evolutionary approach
An artificial economy of modules
Learn a program capable of solving a class of problems, rather than a single problem instance
Harder problems earn higher reward
[Diagram] Main problem (external world) ↔ artificial economy (internal world): the most effective agent/module acts on the external world and receives the external reward; internal reward is distributed via simulation

Artificial economy
Hayek: a collection of modules/agents; Winit is set to the largest reward earned so far
Auctions among modules/agents; the winner executes its set of instructions
Each agent has:
– wealth
– a bid: the minimum of its wealth and the value returned by the world simulation
– the ability to create new agents when wealth > 10 * Winit
An example…
Assumptions:
– only one agent owns everything at a time, so agents compete for the right to interact with the real world
– conservation of money
– voluntary inter-agent transactions and property rights
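A simplified sketch of one auction round in this economy may help. The class names, numbers, and the way the bid estimate is obtained are all illustrative, not the paper's exact mechanism; what the sketch preserves are the slide's assumptions: bids are capped by wealth, a single owner at a time, and conservation of money (the price merely moves from the winner to the previous owner).

```python
class Agent:
    def __init__(self, wealth, estimate):
        self.wealth = wealth
        self.estimate = estimate  # agent's estimate of the state's value

    def bid(self):
        # An agent never bids more than it can actually pay.
        return min(self.wealth, self.estimate)

def auction_round(agents, owner, world_reward):
    """One round: highest bidder buys the world from the current owner,
    acts, and keeps whatever reward the world returns."""
    winner = max(agents, key=lambda a: a.bid())
    price = winner.bid()
    winner.wealth -= price    # money is conserved: the price moves
    owner.wealth += price     # from the winner to the previous owner
    winner.wealth += world_reward
    return winner
```

In the real system, an agent that pays more than it later recoups goes broke, so over time only agents whose bids track the true value of intermediate states survive.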

Representation language
S-expression: a symbol, or a list of S-expressions; expressions are typed
Hayek3: capable of solving large but finite problems
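As a reminder of what evaluating an S-expression looks like, here is a minimal, untyped evaluator sketch; Hayek3's actual language is typed and much richer, and the operator set below is invented for illustration. An expression is either an atom (here, a number) or a list whose head names an operation applied to the evaluated tail.

```python
import math

def evaluate(expr):
    """Evaluate a toy S-expression: a number, or ["op", e1, e2, ...]."""
    if isinstance(expr, (int, float)):
        return expr  # atom
    op, *args = expr
    vals = [evaluate(a) for a in args]  # evaluate the tail recursively
    if op == "+":
        return sum(vals)
    if op == "*":
        return math.prod(vals)
    if op == "min":
        return min(vals)
    raise ValueError(f"unknown operator: {op!r}")
```

For instance, `evaluate(["+", 1, ["*", 2, 3]])` recurses into the inner product before summing. The tree shape is what makes mutation and crossover natural: a subtree can be replaced wholesale and the result is still a well-formed program.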

Blocks World
Max moves: 10 n log3(k), where n is the number of blocks and k the number of colors
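Encoding the slide's move budget as a function makes the scaling easy to check; this is just the formula as stated, nothing beyond the slide.

```python
import math

def max_moves(n, k):
    """Move budget 10 * n * log base 3 of k (n blocks, k colors)."""
    return 10 * n * math.log(k, 3)

# With k = 3 colors the log term is 1, so the budget is simply 10 * n:
# a 100-block instance gets 1000 moves.
```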

Evolution process
Start: a hand-coded agent
New agents: mutations of the parent agent
Winit: highest reward earned so far (initially 0)
Demonstration…
Additional elements: Numcorrect and the R node

Blocks World
Difficulty: a combinatorial number of states
A class of problems: instance sizes drawn from a uniform distribution
Result (3 colors): 100-block problems solved 100% of the time, 200-block problems 90%

Resulting system
1000 agents
Bid formula: A * Numcorrect + B
– A, B: constants specific to the agent
– Numcorrect: measured from the environment
Three recognizable agent types: a few cleaners, many stackers, and closers (which say "Done")
Macro-actions. Example: approximating Numcorrect with the R(a,b) node
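The bid rule from the slide, Bid = A · Numcorrect + B, is linear in the environment's progress measure; the example coefficients below are invented for illustration.

```python
def bid(A, B, numcorrect):
    """Linear bid rule: A and B are evolved per-agent constants,
    numcorrect is read from the environment."""
    return A * numcorrect + B
```

An agent with a large A bids more on states where many blocks are already correct, so it tends to win auctions near a solution, while an agent with A close to zero bids roughly the same everywhere.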

Meta-learning
Two agent types: Creators (which do not bid, and instead modify or create other agents) and Solvers (everything else)
Example…
Result: 30%–50% better performance after a week of execution
Limitations:
– expensive matching computation
– ultimately no better performance than the simple Hayek system

Example

Rubik's Cube model variations
Two reward schemes:
– scrambled by one rotation; reward only for completely unscrambling
– completely scrambled; partial reward based on (1) the number of correct cubes, (2) the number of correct cubes in a fixed sequence, or (3) the number of correct faces
Two cube models, two languages
Limitation: after a few days the system ceases to make progress, because:
– it becomes harder to find new beneficial operators without destroying the already-correct part
– increasingly complex operators leave less structure to be exploited

Rush Hour
Challenges: solutions may require exponentially many moves; designing the representation language
Expressions are limited to a bounded number of nodes
Note the use of Creation Agents
Learning uses subsampled problems (40 different instances)

Rush Hour (cont.)
Agents learn to recognize their distance from the solution (bidding low on unsolvable problems)
Reward: broken down into subgoals
Genetic algorithms never learned to solve any of the original problems

Conclusion
Property rights and conservation of money are key
Learning in complex environments will involve automated programming
Problem: the space of programs is enormous
Hayek builds very complex programs from small chunks of code
Suggestions: let agents trade information; use a single, universal language

!? Comments? Applications? How does this relate to the course?