
1 Study on Genetic Network Programming (GNP) with Learning and Evolution. Hirasawa Laboratory, Artificial Intelligence Section, Information Architecture Field, Graduate School of Information, Production and Systems, Waseda University

2 I Research Background. Intelligent systems (evolutionary and learning algorithms) can solve problems automatically, and such systems are becoming large and complex: robot control, elevator group control systems, stock trading systems. It is very difficult to make efficient control rules by hand while considering the many kinds of real-world phenomena.

3 II Objective of the Research. We propose an algorithm that combines evolution and learning, the two adaptation mechanisms found in the natural world. Evolution: many individuals (living things) adapt to the world (environment) over long spans of generations; it gives living things their inherent functions and characteristics. Learning: the knowledge that living things acquire during their lifetime through trial and error.

4 III Evolution

5 Evolution. The characteristics of living things are determined by their genes. Evolution is realized by the following components: selection, crossover, and mutation. Evolution gives living things their inherent characteristics and functions.

6 Selection: individuals that fit the environment survive; the others die out. Crossover: genes are exchanged between two individuals. Mutation: some genes of the selected individuals are changed to other ones. Through these operations, new individuals are produced.

7 IV Learning

8 Important factors in reinforcement learning: state transition (the definition of states and actions), trial-and-error learning, and future prediction.

9 Framework of Reinforcement Learning. Action rules are learned through the interaction between an agent and an environment: the agent takes an action, and the environment returns a state signal (sensor input) and a reward (an evaluation of the action). The aim of RL is to maximize the total reward obtained from the environment.
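The interaction loop described on this slide can be written compactly. Below is a minimal sketch, assuming a hypothetical Environment with reset/step methods and a placeholder random policy (none of these names come from the slides):

```python
# Minimal agent-environment interaction loop (Environment and
# choose_action are hypothetical stand-ins, not from the slides).
import random

class Environment:
    def reset(self):
        return 0  # initial state signal (sensor input)

    def step(self, action):
        # Toy dynamics for illustration: returns (next_state, reward, done).
        next_state = random.randint(0, 9)
        reward = 1.0 if next_state == 9 else 0.0
        return next_state, reward, next_state == 9

def choose_action(state, actions=("left", "right", "up", "down")):
    return random.choice(actions)  # placeholder policy

env = Environment()
state, total_reward, done = env.reset(), 0.0, False
while not done:
    action = choose_action(state)            # agent acts on the environment
    state, reward, done = env.step(action)   # environment returns state + reward
    total_reward += reward                   # RL maximizes this cumulative reward
```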

10 State transition. At time t, the agent in state s_t takes action a_t, receives reward r_t, and moves to state s_{t+1}; the episode is the sequence s_t, s_{t+1}, s_{t+2}, ..., s_{t+n}. Example (maze problem): starting from the start cell, a_t: move right; a_{t+1}: move upward; a_{t+2}: move left; ...; a_{t+n}: do nothing (end). Goal reached: reward 100.

11 Trial-and-error learning, the central concept of reinforcement learning. The agent decides on an action and takes it. Success yields a reward ("take this action again!"); failure yields a negative reward ("don't take this action again"). The reward, a scalar value, indicates whether the action was good or not, and this feedback becomes the agent's acquired knowledge.

12 Future prediction. Reinforcement learning estimates future rewards and takes actions accordingly: from the current state s_t, actions a_t, a_{t+1}, a_{t+2}, ... yield rewards r_t, r_{t+1}, r_{t+2}, ... over the future states s_{t+1}, s_{t+2}, s_{t+3}, ...

13 Future prediction. Reinforcement learning considers not only the current reward but also future rewards. Case 1: r_t = 1, r_{t+1} = 1, r_{t+2} = 1. Case 2: r_t = 0, r_{t+1} = 0, r_{t+2} = 100. An agent that only looks at the immediate reward prefers Case 1, but an agent that predicts the future recognizes that Case 2 is better.
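The slides contrast the two cases without giving an explicit formula; the standard way to weigh immediate against future rewards in reinforcement learning is the discounted return (the discount factor gamma is an assumption here, not taken from the slides):

```latex
% Discounted return: the quantity RL maximizes, trading off
% immediate rewards against future ones via the discount factor gamma.
G_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots
    = \sum_{k=0}^{\infty} \gamma^k r_{t+k}, \qquad 0 \le \gamma \le 1
```

With gamma close to 1, the delayed reward of 100 makes Case 2 better; with gamma near 0 the agent is myopic and prefers Case 1.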

14 V GNP with evolution and learning

15 Genetic Network Programming (GNP). GNP is an evolutionary computation method. What is evolutionary computation? Solutions (programs) are represented by genes (solution = gene), and the programs are evolved (changed) by selection, crossover, and mutation.

16 Structure of GNP. GNP represents its programs using directed graph structures, and each graph structure can equivalently be represented as a gene structure (a table listing each node's type, function, and connections). The graph structure is composed of processing nodes and judgment nodes.
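As a concrete illustration, a GNP individual might be encoded as follows. This is a minimal sketch; the field layout (node type, function id, connection list) is an assumption for illustration, not the exact gene format from the slides:

```python
# Minimal sketch of a GNP gene structure (field names are illustrative).
from dataclasses import dataclass, field
from typing import List

JUDGMENT, PROCESSING = 0, 1  # node types

@dataclass
class Node:
    node_type: int        # JUDGMENT or PROCESSING
    function_id: int      # which judgment/processing function the node runs
    # A judgment node has one outgoing branch per judgment result;
    # a processing node has a single outgoing connection.
    connections: List[int] = field(default_factory=list)

# A GNP individual is a directed graph: a list of nodes (the "gene").
individual = [
    Node(JUDGMENT, function_id=1, connections=[1, 2]),   # judge sensor 1
    Node(PROCESSING, function_id=0, connections=[2]),    # set right wheel speed
    Node(PROCESSING, function_id=1, connections=[0]),    # set left wheel speed
]
```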

17 Khepera robot. The Khepera robot is used for the performance evaluation of GNP. Each obstacle sensor returns a value close to 0 when the robot is far from obstacles and close to 1023 when it is close to them. The speeds of the right wheel (V_R) and the left wheel (V_L) range from -10 (backward) to 10 (forward).

18 Node functions (example: Khepera robot behavior). Processing node: determines an agent action, e.g. "set the speed of the right wheel to 10". Judgment node: selects a branch based on its judgment result, e.g. "judge the value of sensor 1" with branches "500 or more" and "less than 500".

19 An example of node transition: judge sensor 1 (branches: the value is 700 or more / less than 700) → judge sensor 5 (branches: 80 or more / less than 80) → set the speed of the right wheel to 5.
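A node-transition walk like the one above can be simulated directly. The sensor readings, thresholds, and wiring below are illustrative assumptions, since the slide does not fully specify the branch targets:

```python
# Sketch of GNP node transitions (wiring and readings are illustrative).
sensors = {1: 750, 5: 60}  # hypothetical sensor readings

# Judgment nodes: ("judge", sensor, threshold, next_if_ge, next_if_lt);
# processing nodes: ("process", action, next).
graph = {
    "n0": ("judge", 1, 700, "n1", "n2"),   # judge sensor 1 against 700
    "n1": ("judge", 5, 80, "n2", "n2"),    # judge sensor 5 against 80
    "n2": ("process", "set right wheel speed to 5", "n0"),
}

node = "n0"
for _ in range(5):  # follow a few transitions
    entry = graph[node]
    if entry[0] == "judge":
        _, sensor, threshold, next_ge, next_lt = entry
        node = next_ge if sensors[sensor] >= threshold else next_lt
    else:
        _, action, nxt = entry
        print(action)  # the agent performs this action
        node = nxt
```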

20 Flowchart of GNP. Start → generate an initial population (initial programs) → one generation: task execution with reinforcement learning, then evolution (selection / crossover / mutation) → repeat until the last generation → stop.
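The flowchart translates into a simple top-level loop. In this sketch every step (evaluate, rl_update, crossover, mutate) is a placeholder for the operations described on the surrounding slides; the population size of 600 is taken from slide 28 and the 5000 generations from slide 40:

```python
# Top-level GNP loop matching the flowchart (all steps are placeholders).
import random

def evaluate(ind):           # task execution -> fitness
    return random.random()   # stand-in for running the task

def rl_update(ind):          # reinforcement learning during task execution
    return ind

def crossover(a, b):         # stand-in genetic operators
    return a, b

def mutate(ind):
    return ind

population = [object() for _ in range(600)]      # initial programs
for generation in range(5000):                   # until the last generation
    population = [rl_update(ind) for ind in population]      # learning
    ranked = sorted(population, key=evaluate, reverse=True)  # selection
    parents = ranked[: len(ranked) // 2]
    children = []
    while len(children) < len(population) - len(parents):
        a, b = crossover(*random.sample(parents, 2))
        children.extend([mutate(a), mutate(b)])
    population = parents + children[: len(population) - len(parents)]
```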

21 Evolution of GNP: selection. Good individuals (programs) are selected from the GNP population based on their fitness, where fitness indicates how well each individual achieves a given task. The selected individuals are used for crossover and mutation.

22 Evolution of GNP: crossover. Some nodes and their connections are exchanged between two individuals (individual 1 and individual 2).

23 Evolution of GNP: mutation. Either node connections are changed, or a node function is changed (e.g., "set the speed of the right wheel to 5" becomes "set the speed of the left wheel to 10").
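The crossover and mutation operators from slides 22-23 can be sketched over a simple node-list representation (dicts here; the representation and rates are illustrative):

```python
# Sketch of GNP genetic operators (representation and rates are illustrative).
import copy
import random

def make_node(function_id, connections):
    return {"function_id": function_id, "connections": list(connections)}

def mutate(individual, n_nodes, n_functions):
    """Change a random connection or a random node function (slide 23)."""
    child = copy.deepcopy(individual)
    node = random.choice(child)
    if random.random() < 0.5:                  # change a connection
        i = random.randrange(len(node["connections"]))
        node["connections"][i] = random.randrange(n_nodes)
    else:                                      # change the node function
        node["function_id"] = random.randrange(n_functions)
    return child

def crossover(parent1, parent2):
    """Exchange some nodes and their connections (slide 22)."""
    child1, child2 = copy.deepcopy(parent1), copy.deepcopy(parent2)
    for i in range(len(child1)):
        if random.random() < 0.5:              # uniform node exchange
            child1[i], child2[i] = child2[i], child1[i]
    return child1, child2

parent = [make_node(0, [1, 2]), make_node(1, [2]), make_node(2, [0])]
child = mutate(parent, n_nodes=3, n_functions=4)
```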

24 The role of learning. Node parameters are changed by reinforcement learning. Example: in a judgment node, after a collision the threshold "1000 or more / less than 1000" on sensor 0 is changed to 500 in order to judge obstacles more sensitively; in a processing node, the right wheel speed 10 is changed to 5 so as not to collide with the obstacle.
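The slides do not give the exact update rule, but GNP with learning is commonly described with Sarsa-style updates of per-node Q-values over candidate parameters. A minimal sketch under that assumption (the candidate set, alpha, gamma, and epsilon are all illustrative):

```python
# Sarsa-style update of a node parameter (all constants are assumptions).
import random

candidates = {"right_wheel_speed": [5, 10]}     # a node's candidate parameters
q = {k: [0.0] * len(v) for k, v in candidates.items()}
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def select(param):
    """Epsilon-greedy choice among a node's candidate parameter values."""
    if random.random() < epsilon:
        idx = random.randrange(len(candidates[param]))
    else:
        idx = max(range(len(candidates[param])), key=lambda i: q[param][i])
    return idx, candidates[param][idx]

# One step: the node picks a speed, the robot acts, a reward comes back.
idx, speed = select("right_wheel_speed")
reward = -1.0                                   # e.g., penalty for a collision
next_idx, _ = select("right_wheel_speed")       # next choice (Sarsa target)
q["right_wheel_speed"][idx] += alpha * (
    reward + gamma * q["right_wheel_speed"][next_idx]
    - q["right_wheel_speed"][idx])
```

Over repeated collisions, the Q-value of speed 10 falls below that of speed 5, reproducing the change described on the slide.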

25 The aim of combining evolution and learning: create efficient programs and search for solutions faster. Evolution uses many individuals, and better ones are selected after task execution; learning uses one individual, and better action rules can be determined during task execution.

26 VI Simulation: wall-following behavior. Conditions: 1. all the sensor values must not be more than 1000; 2. at least one sensor value must be more than 100; 3. move straight; 4. move fast. The reward is given when conditions 1 and 2 are satisfied, and withheld otherwise.
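A reward of this shape can be written down directly from conditions 1 and 2; the transcript omits the exact reward values, so the 1.0/0.0 pair below is a placeholder assumption:

```python
# Sketch of the wall-following reward (the values 1.0/0.0 are assumptions).
def reward(sensor_values):
    cond1 = all(v <= 1000 for v in sensor_values)  # not too close to the wall
    cond2 = any(v > 100 for v in sensor_values)    # close enough to the wall
    return 1.0 if (cond1 and cond2) else 0.0

print(reward([120, 40, 0, 0, 0, 0, 0, 300]))  # -> 1.0 (following the wall)
print(reward([0, 0, 0, 0, 0, 0, 0, 0]))       # -> 0.0 (lost the wall)
```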

27 Node functions. Processing nodes (2 kinds): determine the speed of the right wheel; determine the speed of the left wheel. Judgment nodes (8 kinds): judge the value of sensor 0, ..., judge the value of sensor 7.

28 Simulation result. Conditions: number of individuals: 600; number of nodes: 34 (24 judgment nodes, 10 processing nodes). [Figure: fitness curves of the best individuals over generations, averaged over 30 independent simulations, for GNP with learning and evolution versus standard GNP (GNP with evolution); track of the robot from the start position.]

29 Simulations in inexperienced environments (on generalization ability). The best program obtained in the previous environment is executed in an inexperienced environment, and the robot can still show the wall-following behavior.

30 VII Conclusion. An algorithm for GNP using evolution and reinforcement learning has been proposed. The simulation results show that the proposed method can learn the wall-following behavior well. Future work: apply GNP with evolution and reinforcement learning to real-world applications (elevator control systems, stock trading models), and compare it with other evolutionary algorithms.

31 VIII Other simulations: the Tileworld. The Tileworld consists of walls, floor, tiles, holes, and agents. An agent can push a tile and drop it into a hole; the aim of the agent is to drop as many tiles into holes as possible. Fitness = the number of dropped tiles. Reward r_t = 1 (when a tile is dropped into a hole).

32 Node functions. Processing nodes: go forward; turn right; turn left; stay. Judgment nodes: what is in the forward cell (floor, tile, hole, wall, or agent)?; likewise for the backward, left, and right cells; the direction of the nearest tile (forward, backward, left, right, or nothing); the direction of the nearest hole; the direction of the nearest hole from the nearest tile; the direction of the second nearest tile.

33 Example of node transition: "What is in the forward cell?" branches on floor, tile, hole, wall, or agent; "Direction of the nearest hole" branches on forward, backward, left, right, or nothing; the processing node "go forward" then executes the move.

34 Simulation 1 (Environment I). There are 30 tiles and 30 holes; the same environment is used every generation; time limit: 150 steps.

35 Fitness curve (simulation 1). [Figure: fitness versus generation for GNP with learning and evolution, GNP with evolution, EP (evolution of finite state machines), GP (max depth 5), and GP-ADFs (main tree: max depth 3, ADF: depth 2).]

36 Simulation 2 (Environment II). 20 tiles and 20 holes are put at random positions; one tile and one hole appear just after an agent pushes a tile into a hole; time limit: 300 steps. [Figure: example of an initial state.]

37 Fitness curve (simulation 2). [Figure: fitness versus generation for GNP with evolution, GNP with learning and evolution, EP, GP (max depth 5), and GP-ADFs (main tree: max depth 3, ADF: depth 2).]

38 Ratio of used nodes. [Figure: ratio of each node function used at the initial and the last generation — go forward, turn left, turn right, do nothing, judge forward, judge backward, judge left side, judge right side, direction of tile, direction of hole, direction of hole from tile, second nearest tile.]

39 Summary of the simulations: data on the best individuals obtained at the last generation (30 samples).

Simulation I
                    GNP-LE   GNP-E    GP       GP-ADFs  EP
Mean fitness        21.23    18.00    14.00    15.43    16.30
Standard deviation  2.73     1.88     4.00     1.94     1.99
T-test (p values): GNP-LE vs GNP-E: 1.04×10^-6; GNP-LE vs GP: 3.13×10^-17; GNP-LE vs GP-ADFs: 3.17×10^-11; GNP-LE vs EP: 3.03×10^-13; GNP-E vs GP: 1.32×10^-6; GNP-E vs GP-ADFs: 5.31×10^-11; GNP-E vs EP: 5.95×10^-4.

Simulation II
                    GNP-LE   GNP-E    GP       GP-ADFs  EP
Mean fitness        19.93    15.30    6.10     6.67     14.40
Standard deviation  2.43     3.88     1.75     3.19     2.54
T-test (p values): GNP-LE vs GNP-E: 5.90×10^-8; GNP-LE vs GP: 1.53×10^-31; GNP-LE vs GP-ADFs: 5.91×10^-15; GNP-LE vs EP: 7.46×10^-26; GNP-E vs GP: 1.36×10^-13; GNP-E vs GP-ADFs: 2.90×10^-12; GNP-E vs EP: 1.46×10^-1.

40 Summary of the simulations: calculation time comparison.

Simulation I
                                      GNP-LE  GNP-E  GP     GP-ADFs  EP
Calculation time for 5000 gen. [s]    1,717   1,019  3,281  3,252    2,802
Ratio to GNP with E (= 1)             1.68    1      3.22   3.19     2.75

Simulation II
                                      GNP-LE  GNP-E  GP      GP-ADFs  EP
Calculation time for 5000 gen. [s]    2,734   1,177  12,059  5,921    1,584
Ratio to GNP with E (= 1)             2.32    1      10.25   5.03     1.35

41 The program obtained by GNP. [Figure: the behavior of the robot under the obtained program, shown over steps 0 to 16.]

42 Maze problem. The maze consists of walls, floor, a door, an agent, a key (K), and a goal (G). The key is necessary to open the door in front of the goal. Objective: reach the goal as early as possible; time limit: 300 steps. Fitness = the remaining time when the agent reaches the goal, and 0 when it cannot reach the goal. Reward r_t = 1 (when reaching the goal).
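The fitness definition on this slide translates directly into code; TIME_LIMIT = 300 is taken from the slide:

```python
# Maze fitness from slide 42: remaining time if the goal is reached, else 0.
TIME_LIMIT = 300

def fitness(reached_goal, steps_used):
    return TIME_LIMIT - steps_used if reached_goal else 0

print(fitness(True, 47))    # -> 253 (goal reached after 47 steps)
print(fitness(False, 300))  # -> 0   (time limit hit without reaching the goal)
```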

43 Node functions. Processing nodes: go forward; turn right; turn left; random (take one of the three actions randomly). Judgment nodes: judge the forward cell; judge the backward cell; judge the left cell; judge the right cell.

44 Fitness curve (maze problem). [Figure: fitness versus generation for GP, GNP with evolution (GNP-E), and GNP with learning and evolution (GNP-LE).] Data on the best individuals obtained at the last generation (30 samples):

                                       GNP-LE  GNP-E  GP
Mean                                   253.0   246.2  227.0
Standard deviation                     0.00    2.30   37.4
Ratio of reaching the goal             100%    100%   100%
Ratio of obtaining the optimal policy  100%    3.3%   63%

