Using Adalines to Approximate Q-functions in Reinforcement Learning
  Steven Wyckoff
  December 6, 2006
The Problem
  Timing traffic lights for optimal traffic flow is hard
  It would be really nice if there were a good way to have the traffic lights learn the best timing
Green Light District
  “Intelligent Traffic Light Control”, Wiering, van Veenen, Vreeken, Koopman (www.cs.uu.nl)
  Built a test-bed for traffic light controller algorithms
  Based on Reinforcement Learning
Green Light District
  TLController fills out a table with the ‘gains’ for each lane
  SimModel picks the best legal light configuration
  Cars are allowed to move (or not) and the TLController gets to listen in on their movement
  Repeat
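The loop above can be sketched in a few lines of Python. This is only an illustration, not the Green Light District code: the names `LEGAL_CONFIGS`, `car_count_gain`, `pick_configuration`, and `control_step` are hypothetical, the two-phase intersection is made up, and "best" is assumed to mean the legal configuration with the largest summed gain over its green lanes.

```python
# Hypothetical two-phase intersection: each configuration is the set of
# lanes that are green at the same time.
LEGAL_CONFIGS = [frozenset({"north", "south"}), frozenset({"east", "west"})]


def car_count_gain(lane, queues):
    """Stand-in gain function (an assumption): the number of waiting cars."""
    return float(queues[lane])


def pick_configuration(gains, legal_configs):
    """Pick the legal configuration with the largest total gain (assumed rule)."""
    return max(legal_configs, key=lambda cfg: sum(gains[lane] for lane in cfg))


def control_step(queues, gain_fn, legal_configs):
    """One pass of the loop: fill the gain table, pick lights, move cars."""
    # 1. The controller fills out a table with the gain for each lane.
    gains = {lane: gain_fn(lane, queues) for lane in queues}
    # 2. The simulator picks the best legal light configuration.
    config = pick_configuration(gains, legal_configs)
    # 3. Cars move (or not): here one car leaves each green, non-empty lane.
    moved = {}
    for lane in queues:
        moved[lane] = 1 if lane in config and queues[lane] > 0 else 0
        queues[lane] -= moved[lane]
    # The controller would observe `moved` before the next pass (step 4: repeat).
    return config, moved
```

Calling `control_step` repeatedly on the same `queues` dictionary plays the role of "Repeat"; a learning controller would use `moved` to update its gain estimates between calls.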
Existing Algorithms
  Random
    Totally random gains
  Most Cars
    Based on presence of at least one car
  TC-1
    Real-Time Dynamic Programming
    Based on probabilities of progress / reward
  GenNeural
    Genetically evolve a 3-layer network
    Uses only traffic densities
  (And more)
My Algorithm
  Use a neural network instead of dynamic programming
  Good:
    Network can deal with continuous input
    Might be able to recognize traffic patterns that are not available using a table lookup
  Bad:
    Hard to tell what the network will learn
    Hard to figure out useful input
    Hard to tell what the ‘right’ output is for training
Pitfalls / Solutions
  Don’t know if we will be red or green
    Two adalines predict the reward if the light is red or green; the gain is the difference
  Input
    For each lane: number of cars, traffic density, whether the lane is full
  Rewards
    Reward for cars moving and passing through intersections
    Shared reward for other lanes in the intersection
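A minimal sketch of the two-adaline idea, under stated assumptions. The `Adaline` class, `lane_features`, and `gain` are hypothetical names, not the presentation's code. The sketch assumes the standard LMS (Widrow-Hoff) update, a training target equal to the observed reward, density defined as cars divided by lane capacity, and a gain computed as the green prediction minus the red prediction; the slides do not specify any of these details.

```python
class Adaline:
    """Single linear unit trained with the LMS (Widrow-Hoff) rule."""

    def __init__(self, n_inputs, learning_rate=0.05):
        self.weights = [0.0] * n_inputs
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        """Linear output: bias plus the weighted sum of the inputs."""
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, target):
        """Move the weights toward the target; return the pre-update error."""
        error = target - self.predict(x)
        for i, xi in enumerate(x):
            self.weights[i] += self.learning_rate * error * xi
        self.bias += self.learning_rate * error
        return error


def lane_features(num_cars, capacity):
    """Per-lane inputs: car count, density, full flag (definitions assumed)."""
    density = num_cars / capacity
    is_full = 1.0 if num_cars >= capacity else 0.0
    return [float(num_cars), density, is_full]


def gain(green_net, red_net, x):
    """Predicted reward if green minus predicted reward if red (assumed sign)."""
    return green_net.predict(x) - red_net.predict(x)
```

In use, only the adaline matching the light's actual colour would be updated after each step, for example `green_net.update(x, observed_reward)` when the lane was green. Raw car counts are unscaled here, so a large count may need a smaller learning rate to keep the LMS update stable.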
Results: “Split”
  “Adaline” did slightly better than “Most Cars”
  “TC-1” did the best
Results: “Complex”
  “Adaline” did the worst
  “TC-1” did the best
What I Wish Was Different
  Infrastructure
    Inputs and rewards are all discrete
    Seems like the network would do better with access to the light configurations
  Rewards
    It would be nice to give rewards for no waiting
  Network
    Arguably a multi-layer network could perform better
Demo Time