CAP6938 Neuroevolution and Artificial Embryogeny: Neural Network Weight Optimization. Dr. Kenneth Stanley, January 18, 2006
Review: Remember, the values of the weights and the topology together determine the network's functionality. Given a topology, how are the weights optimized? The weights are just parameters on a structure.
Two Cases: (1) the output targets are known; (2) the output targets are not known. [Figure: a network with inputs X1 and X2, hidden nodes H1 and H2, outputs out1 and out2, and weights labeled w11, w21, w12]
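Decision Boundaries
OR function:
Input (x1, x2) | Output
0, 0 | 0
0, 1 | 1
1, 0 | 1
1, 1 | 1
OR is linearly separable: a single line separates the input with output 0 from the three inputs with output 1. Linearly separable problems do not require hidden nodes (nonlinearities); a single output unit with a bias weight suffices.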
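As a concrete check of the separability claim, here is a minimal sketch of one threshold unit with a bias computing OR. The weights (1, 1) and bias -0.5 are hand-picked assumptions; any line that puts (0, 0) on one side and the other three inputs on the other side works equally well.

```python
def threshold_unit(inputs, weights, bias):
    """Single threshold unit: outputs 1 when the weighted sum plus bias is positive."""
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if net > 0 else 0

OR_TABLE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

# Hand-picked (hypothetical) weights: the boundary x1 + x2 = 0.5 separates (0, 0)
# from the other three inputs, so no hidden node is needed.
OR_WEIGHTS, OR_BIAS = (1.0, 1.0), -0.5

outputs = [threshold_unit(x, OR_WEIGHTS, OR_BIAS) for x, _ in OR_TABLE]  # [0, 1, 1, 1]
```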
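Decision Boundaries
XOR function:
Input (x1, x2) | Output
0, 0 | 0
0, 1 | 1
1, 0 | 1
1, 1 | 0
XOR is not linearly separable: no single line separates (0, 1) and (1, 0) from (0, 0) and (1, 1). It requires at least one hidden node.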
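The sketch below contrasts the two cases, assuming hand-picked weights. A hidden node computing AND, plus direct input-to-output connections, is enough for XOR. A brute-force search over a coarse weight grid finds no single threshold unit that reproduces XOR; the grid search is only an illustration, since the general result follows from XOR not being linearly separable.

```python
import itertools

def threshold_unit(inputs, weights, bias):
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if net > 0 else 0

XOR_TABLE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def xor_net(x1, x2):
    # Hypothetical hidden node computing AND(x1, x2)
    h = threshold_unit((x1, x2), (1.0, 1.0), -1.5)
    # Output unit sees both inputs directly, plus a strong inhibitory link from h
    return threshold_unit((x1, x2, h), (1.0, 1.0, -2.0), -0.5)

# Search single-unit weights and bias over -3.0 .. 3.0 in steps of 0.5
grid = [i / 2 for i in range(-6, 7)]
single_unit_solutions = [
    (w1, w2, b)
    for w1, w2, b in itertools.product(grid, repeat=3)
    if all(threshold_unit(x, (w1, w2), b) == t for x, t in XOR_TABLE)
]  # stays empty
```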
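Hebbian Learning
Change weights based on the correlation between the activations of connected neurons. Learning rules are local: each weight update uses only the activations of the two neurons that the weight connects.
Simple Hebb rule: $w_{ij} = \sum_p x_i^{(p)} y_j^{(p)}$, summed over training patterns $p$, where $x_i$ is the presynaptic and $y_j$ the postsynaptic activation
–Works best when the relevance of each input to the outputs is independent of the other inputs
–Grows weights without bound
–Can be made incremental: $w_{ij} \leftarrow w_{ij} + \eta\, x_i\, y_j$, applied after each pattern with learning rate $\eta$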
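Here is a minimal sketch of the incremental Hebb rule, assuming the common supervised variant in which the postsynaptic activity is clamped to the target, with bipolar (+1/-1) values and a constant bias input of 1. It learns bipolar AND and also shows the unbounded growth: repeating the same patterns keeps scaling the weights up without changing the decision boundary.

```python
def hebb_train(patterns, epochs=1, eta=1.0):
    """Incremental Hebb rule w_i += eta * x_i * y, with y clamped to the target.
    Inputs and targets are bipolar; the last input is a constant bias of 1."""
    w = [0.0] * len(patterns[0][0])
    for _ in range(epochs):
        for x, y in patterns:
            # Local update: each weight changes by the product of its endpoint activations
            w = [wi + eta * xi * y for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

# Bipolar AND with a bias input: (x1, x2, 1) -> target
AND_BIPOLAR = [((1, 1, 1), 1), ((1, -1, 1), -1), ((-1, 1, 1), -1), ((-1, -1, 1), -1)]

w1 = hebb_train(AND_BIPOLAR, epochs=1)    # [2.0, 2.0, -2.0]
w10 = hebb_train(AND_BIPOLAR, epochs=10)  # [20.0, 20.0, -20.0]: same boundary, 10x larger
```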
More Complex Local Learning Rules
Hebbian learning with a maximum weight magnitude:
–Excitatory connections have their own update rule
–Inhibitory connections have their own update rule
In each rule, the second term is a decay term that models forgetting
–Decay happens when the presynaptic node does not affect the postsynaptic node
Other rules are possible.
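The excitatory and inhibitory rules above are not reproduced here, so the sketch below swaps in a different, well-known Hebbian variant to illustrate the same idea: Oja's rule, $\Delta w_i = \eta\, y\,(x_i - y\, w_i)$. Its second term is also a decay term, and it keeps the weight vector bounded (it converges toward unit length). The sample data, learning rate, and epoch count are assumptions chosen for illustration.

```python
import math
import random

def oja_train(samples, eta=0.01, epochs=500, seed=0):
    """Oja's rule for one linear neuron (a stand-in, not the slide's rule)."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(len(samples[0]))]
    for _ in range(epochs):
        for x in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))  # postsynaptic activation
            # Hebbian growth (eta*y*x_i) minus a decay term (eta*y*y*w_i) that bounds |w|
            w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

# Hypothetical inputs that vary mostly along the direction (1, 1)
SAMPLES = [(2.0, 1.9), (-2.0, -2.1), (1.0, 1.1), (-1.0, -0.9)]
w = oja_train(SAMPLES)
norm = math.sqrt(sum(wi * wi for wi in w))  # close to 1 instead of growing without bound
```

Unlike the simple Hebb rule, repeated presentations do not inflate the weights; the decay term balances the Hebbian growth.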
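Perceptron Learning
Single-layer learning rule: $w_i \leftarrow w_i + \eta\,(t - y)\,x_i$, where $t$ is the target, $y$ the unit's output, and the bias is treated as a weight on a constant input of 1
The rule is applied repeatedly until the decision boundary is learned
If the problem is linearly separable, the rule is guaranteed to converge on correct weights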
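Here is a minimal sketch of the perceptron rule with a step activation, binary targets, and a learning rate of 1 (all assumptions). Training stops after the first epoch with no errors. It converges on OR, which is linearly separable, and never reaches an error-free epoch on XOR, so the epoch cap ends the loop.

```python
def step(net):
    return 1 if net > 0 else 0

def perceptron_train(data, eta=1.0, max_epochs=100):
    """Perceptron rule w_i += eta * (t - y) * x_i, repeated until an epoch has no errors.
    Returns (weights, bias, converged)."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, t in data:
            y = step(b + sum(wi * xi for wi, xi in zip(w, x)))
            if y != t:
                errors += 1
                w = [wi + eta * (t - y) * xi for wi, xi in zip(w, x)]
                b += eta * (t - y)  # bias = weight on a constant input of 1
        if errors == 0:
            return w, b, True
    return w, b, False

OR_TABLE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_TABLE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w, b, ok = perceptron_train(OR_TABLE)       # converges to w = [1.0, 1.0], b = 0.0
_, _, xor_ok = perceptron_train(XOR_TABLE)  # False: no separating line exists
```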
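Backpropagation
Designed for networks with at least one hidden layer. First, activation propagates forward to the outputs. Then, errors are computed at the outputs and assigned back to the hidden units. Finally, the weights are updated.
The sigmoid, $f(x) = 1/(1 + e^{-x})$, is a common activation function.
[Figure: a network with inputs x1, x2, hidden units z1, z2, outputs y1, y2, and targets t1, t2; layer-1 weights v11, v21, v12, v22 and layer-2 weights w11, w21, w12, w22]
Notation: x's are inputs, z's are hidden units, y's are outputs, t's are targets, v's are layer-1 weights, and w's are layer-2 weights.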
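Backpropagation needs the slope of the activation function. For the sigmoid, the slope can be written in terms of the unit's own output, $f'(x) = f(x)\,(1 - f(x))$, so no extra exponentials are needed once the forward pass is done. This short sketch checks that identity against a finite difference at a few arbitrarily chosen points.

```python
import math

def sigmoid(x):
    """Logistic sigmoid f(x) = 1 / (1 + e^(-x)), squashing any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_slope(fx):
    """Sigmoid slope expressed via the output fx = f(x): f'(x) = fx * (1 - fx)."""
    return fx * (1.0 - fx)

# Closed-form slope vs. central finite difference
h = 1e-6
checks = [
    (x, sigmoid_slope(sigmoid(x)), (sigmoid(x + h) - sigmoid(x - h)) / (2 * h))
    for x in (-2.0, 0.0, 0.5, 3.0)
]
```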
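Backpropagation Algorithm
1) Initialize the weights.
2) While the stopping condition is false, for each training pair:
   1) Compute the outputs by forward activation.
   2) Backpropagate the error:
      1) For each output unit, compute the error (target minus output, times the slope): $\delta_k = (t_k - y_k)\, f'(y_{\text{in},k})$
      2) Compute the weight correction (learning rate times error times hidden output): $\Delta w_{jk} = \alpha\, \delta_k\, z_j$
      3) Send the error back to the hidden units: $\delta_{\text{in},j} = \sum_k \delta_k\, w_{jk}$
      4) Calculate the error contribution of each hidden unit: $\delta_j = \delta_{\text{in},j}\, f'(z_{\text{in},j})$
      5) Compute the weight correction: $\Delta v_{ij} = \alpha\, \delta_j\, x_i$
   3) Adjust the weights by adding the weight corrections.
Here $y_{\text{in},k}$ and $z_{\text{in},j}$ are the weighted input sums to output unit $k$ and hidden unit $j$, and $\alpha$ is the learning rate. For the sigmoid, $f'(y_{\text{in},k}) = y_k\,(1 - y_k)$.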
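The algorithm can be sketched in plain Python as follows, assuming sigmoid units everywhere, one training pair at a time, and a fixed number of epochs standing in for the stopping condition. The bias handling (a constant 1.0 appended to the inputs and the hidden outputs) and the initial weight range are assumptions; the slide's figure shows no biases. All corrections for a pair are computed from the current weights before any weight is changed, as in the step list.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, v, w):
    """Forward activation. v[i][j]: input i -> hidden j; w[j][k]: hidden j -> output k.
    The last row of v and of w multiplies a constant 1.0 and acts as a bias (assumption)."""
    xb = list(x) + [1.0]
    n_hidden, n_out = len(v[0]), len(w[0])
    z = [sigmoid(sum(xb[i] * v[i][j] for i in range(len(xb)))) for j in range(n_hidden)]
    zb = z + [1.0]
    y = [sigmoid(sum(zb[j] * w[j][k] for j in range(len(zb)))) for k in range(n_out)]
    return xb, zb, y

def backprop_corrections(x, t, v, w, alpha):
    """Forward pass plus error backpropagation for one pair; returns (dv, dw)."""
    xb, zb, y = forward(x, v, w)
    n_hidden = len(zb) - 1
    # Output error: target minus output, times the sigmoid slope y * (1 - y)
    delta_k = [(t[k] - y[k]) * y[k] * (1.0 - y[k]) for k in range(len(y))]
    # Layer-2 correction: learning rate times error times hidden output
    dw = [[alpha * delta_k[k] * zb[j] for k in range(len(y))] for j in range(len(zb))]
    # Send the error back through the current (not yet updated) layer-2 weights
    delta_in = [sum(delta_k[k] * w[j][k] for k in range(len(y))) for j in range(n_hidden)]
    # Hidden error contribution: incoming error times the hidden unit's sigmoid slope
    delta_j = [delta_in[j] * zb[j] * (1.0 - zb[j]) for j in range(n_hidden)]
    dv = [[alpha * delta_j[j] * xb[i] for j in range(n_hidden)] for i in range(len(xb))]
    return dv, dw

def train(data, n_hidden=2, alpha=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(data[0][0]), len(data[0][1])
    # Small random initial weights (the range is an assumption)
    v = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in + 1)]
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, t in data:
            dv, dw = backprop_corrections(x, t, v, w, alpha)
            # Adjust the weights by adding the corrections
            for i, row in enumerate(dv):
                for j, d in enumerate(row):
                    v[i][j] += d
            for j, row in enumerate(dw):
                for k, d in enumerate(row):
                    w[j][k] += d
    return v, w

XOR_DATA = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]), ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
```

For example, `v, w = train(XOR_DATA)` trains a 2-2-1 network on XOR. Whether it reaches a low error depends on the seed and learning rate, because gradient descent can settle in a local optimum. The corrections equal $-\alpha$ times the gradient of $E = \tfrac{1}{2}\sum_k (t_k - y_k)^2$, which a finite-difference check can confirm.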
Example Applications
Anything with a set of examples and known targets:
–XOR
–Character recognition
–NETtalk: reading English aloud
–Failure prediction
Disadvantage: can become trapped in local optima
Output Targets Often Not Available (Stone, Sutton, and Kuhlmann 2005)
One Approach: Value Function Reinforcement Learning
–Divide the world into states and actions
–Assign values to states
–Gradually learn the most promising states and actions
[Figure: a world with a Start and a Goal location]
Learning to Navigate
[Figure: four snapshots of learning to navigate from Start to Goal, at T=1, T=56, T=350, and T=703]
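How to Update State/Action Values
Q-learning rule: $Q(s,a) \leftarrow Q(s,a) + \alpha\,\big[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\big]$, where $\alpha$ is the learning rate, $\gamma$ the discount factor, $r$ the reward, and $s'$ the next state
–Exploration increases the accuracy of the Q-values
–The best actions to take in different states become known
–Works only in Markovian domains, where the current state contains all the information needed to choose an action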
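Here is a minimal tabular sketch of the Q-learning rule on a hypothetical five-state corridor (not the navigation world in the figure): state 0 is the start, state 4 is the goal, reaching the goal gives reward 1, and every other step gives 0. Because Q-learning is off-policy, the sketch explores with purely random actions and still learns the greedy policy "move right". The environment, $\alpha = 0.5$, $\gamma = 0.9$, and the episode count are all assumptions.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, goal at state 4
LEFT, RIGHT = 0, 1

def step(s, a):
    """Deterministic corridor: moving left from state 0 stays put; the goal pays 1."""
    s2 = max(0, s - 1) if a == LEFT else s + 1
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice((LEFT, RIGHT))  # random exploration (off-policy learning)
            s2, r, done = step(s, a)
            # Q-learning target: reward plus discounted best value of the next state
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
policy = [max((LEFT, RIGHT), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
```

The learned values approach $\gamma^{3-s}$ for moving right from state $s$, so the value of the goal propagates back toward the start, mirroring the snapshots on the previous slide.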
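Backprop in RL
The state/action value table can be approximated by a neural network trained with backpropagation. The target the network learns is the Q-value.
[Figure: a neural network (NN) that takes a state description and an action as inputs and outputs a value]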
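The idea can be sketched as a hypothetical one-hidden-layer network that maps (state description, action) to a value, trained by backpropagation toward the Q-learning target, which is treated as a fixed target for each update. The class name, the one-hot action encoding, the linear output unit, and all hyperparameters are illustrative assumptions, not a specific published architecture.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class QNet:
    """Hypothetical network: inputs = state features + one-hot action + bias; output = Q."""
    def __init__(self, n_state, n_actions, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.n_actions = n_actions
        n_in = n_state + n_actions + 1  # +1 for a constant bias input
        self.v = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]  # last = bias

    def _inputs(self, state, action):
        one_hot = [1.0 if a == action else 0.0 for a in range(self.n_actions)]
        return list(state) + one_hot + [1.0]

    def _forward(self, x):
        z = [sigmoid(sum(x[i] * self.v[i][j] for i in range(len(x))))
             for j in range(len(self.w) - 1)]
        zb = z + [1.0]
        return zb, sum(zb[j] * self.w[j] for j in range(len(zb)))  # linear output unit

    def q(self, state, action):
        return self._forward(self._inputs(state, action))[1]

    def update(self, state, action, reward, next_state, done, alpha=0.05, gamma=0.9):
        # Q-learning target, held fixed while backpropagating
        target = reward
        if not done:
            target += gamma * max(self.q(next_state, a) for a in range(self.n_actions))
        x = self._inputs(state, action)
        zb, y = self._forward(x)
        delta = target - y  # linear output, so the slope is 1
        # Hidden errors use the output weights before they are changed
        delta_h = [delta * self.w[j] * zb[j] * (1.0 - zb[j]) for j in range(len(zb) - 1)]
        for j in range(len(zb)):
            self.w[j] += alpha * delta * zb[j]
        for i in range(len(x)):
            for j in range(len(delta_h)):
                self.v[i][j] += alpha * delta_h[j] * x[i]
        return target
```

A linear output unit is used here because values are not restricted to (0, 1); with a sigmoid output the rewards would have to be scaled into that range.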
Next Week: Evolutionary Computation
For 1/23: Mitchell ch. 1 (pp. 1-31) and ch. 2 (pp ). Note that Section 2.3 is "Evolving Neural Networks".
For 1/25: Mitchell pp , and the paper No Free Lunch Theorems for Optimization (1996) by David H. Wolpert and William G. Macready
–EC does not require targets
–EC can be a kind of RL
–EC is policy search
–EC is more than RL