Using Adalines to Approximate Q-functions in Reinforcement Learning

1 Using Adalines to Approximate Q-functions in Reinforcement Learning
Steven Wyckoff December 6, 2006

2 The Problem
Timing traffic lights for optimal traffic flow is hard
It would be very useful if the traffic lights could learn the best timing themselves

3 Green Light District
“Intelligent Traffic Light Control” (Wiering, van Veenen, Vreeken, Koopman)
Built a test-bed for traffic light controller algorithms
Based on Reinforcement Learning

4 Green Light District
TLController fills out a table with the ‘gains’ for each lane
SimModel picks the best legal light configuration
Cars are allowed to move (or not) and the TLController gets to listen in on their movement
Repeat
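A minimal Python sketch of that loop's decision step, using simplified stand-in data structures (the actual Green Light District test-bed is written in Java, and these names are illustrative, not its real API):

    # Illustrative sketch: gains are filled in per lane, then the legal light
    # configuration with the highest total gain is chosen.

    def pick_configuration(gains, legal_configs):
        """Return the legal configuration whose green lanes have the largest total gain."""
        return max(legal_configs, key=lambda cfg: sum(gains[lane] for lane in cfg))

    # Example: per-lane gains as the TLController might report them,
    # and two mutually exclusive legal configurations.
    gains = {"north": 2.5, "south": 1.0, "east": 3.0, "west": 1.5}
    legal_configs = [("north", "south"), ("east", "west")]
    print(pick_configuration(gains, legal_configs))  # ('east', 'west')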

5 Existing Algorithms
Random: totally random gains
Most Cars: based on presence of at least one car
TC-1: Real-Time Dynamic Programming, based on probabilities of progress / reward
GenNeural: genetically evolve a 3-layer network, uses only traffic densities
(And more)

6 My Algorithm
Use a neural network instead of dynamic programming
Good:
  Network can deal with continuous input
  Might be able to recognize traffic patterns that are not available using a table lookup
Bad:
  Hard to tell what the network will learn
  Hard to figure out useful input
  Hard to tell what the ‘right’ output is for training
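For reference, an adaline is a single linear unit trained with the Widrow-Hoff (LMS) delta rule, which is why it handles continuous input directly. A minimal Python sketch (not the project's actual code):

    import numpy as np

    class Adaline:
        """A single linear unit trained with the Widrow-Hoff (LMS) delta rule."""

        def __init__(self, n_inputs, learning_rate=0.01):
            self.w = np.zeros(n_inputs)
            self.b = 0.0
            self.lr = learning_rate

        def predict(self, x):
            # Linear output, no squashing: continuous inputs and targets work directly.
            return float(np.dot(self.w, x) + self.b)

        def train(self, x, target):
            # Delta rule: adjust weights in proportion to the prediction error.
            error = target - self.predict(x)
            self.w += self.lr * error * np.asarray(x, dtype=float)
            self.b += self.lr * error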

7 Pitfalls / Solutions
Don’t know if the light will be red or green:
  Two adalines predict the reward if the light is red or green; the gain is the difference
Input (for each lane):
  Number of cars, traffic density, whether the lane is full
Rewards:
  Reward for cars moving and passing through intersections
  Shared reward for the other lanes in the intersection
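Putting slide 7 together, a hedged sketch of how the per-lane gain and the learning update might look, reusing the Adaline class sketched above (the feature names and reward bookkeeping are illustrative assumptions, not the project's exact code):

    # Hypothetical per-lane feature vector: number of cars, traffic density, lane-full flag.
    def lane_features(num_cars, density, is_full):
        return [float(num_cars), float(density), 1.0 if is_full else 0.0]

    # One adaline predicts the reward if the lane's light is green, the other if it is red.
    green_net = Adaline(n_inputs=3)
    red_net = Adaline(n_inputs=3)

    def gain(features):
        # The gain reported to the gain table is the predicted advantage of being green.
        return green_net.predict(features) - red_net.predict(features)

    def learn(features, was_green, observed_reward):
        # Train only the adaline matching the light colour actually shown, using the
        # reward observed from car movement (plus any shared reward) as the target.
        (green_net if was_green else red_net).train(features, observed_reward)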

8 Results: “Split”
“Adaline” did slightly better than “Most Cars”
“TC-1” did the best

9 Results: “Complex”
“Adaline” did the worst
“TC-1” did the best

10 What I Wish Was Different
Infrastructure:
  Inputs and rewards are all discrete
  Seems like the network would do better with access to the light configurations
Rewards:
  It would be nice to give rewards for no waiting
Network:
  Arguably a multi-layer network could perform better

11 Demo Time

