Using Adalines to Approximate Q-functions in Reinforcement Learning


Using Adalines to Approximate Q-functions in Reinforcement Learning
Steven Wyckoff
December 6, 2006

The Problem
- Timing traffic lights for optimal traffic flow is hard
- It would be really nice if there was a good way to have the traffic lights learn the best timing

Green Light District
- “Intelligent Traffic Light Control” by Wiering, van Veenen, Vreeken, Koopman (www.cs.uu.nl)
- Built a test-bed for traffic light controller algorithms
- Based on Reinforcement Learning

Green Light District
1. TLController fills out a table with the ‘gains’ for each lane
2. SimModel picks the best legal light configuration
3. Cars are allowed to move (or not) and the TLController gets to listen in on their movement
4. Repeat
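The cycle on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of queue length as the gain, and the "one car per green lane per cycle" rule are all hypothetical and are not taken from the Green Light District code.

```python
# Hypothetical stand-ins for the TLController / SimModel roles on the slide.

def fill_gain_table(queue_lengths):
    """Controller step: one gain per lane (assumed here: the queue length)."""
    return dict(queue_lengths)


def pick_best_config(gains, legal_configs):
    """Simulator step: pick the legal set of green lanes with the largest total gain."""
    return max(legal_configs, key=lambda cfg: sum(gains[lane] for lane in cfg))


def step(queue_lengths, legal_configs):
    """One cycle: fill gains, choose a configuration, let cars move."""
    gains = fill_gain_table(queue_lengths)
    green = pick_best_config(gains, legal_configs)
    moved = {}
    for lane in queue_lengths:
        # Assumed rule: a green lane releases one car per cycle if it has any.
        moved[lane] = 1 if lane in green and queue_lengths[lane] > 0 else 0
        queue_lengths[lane] -= moved[lane]
    # The controller would "listen in" on `moved` to learn from it.
    return green, moved


queues = {"N": 3, "S": 1, "E": 2, "W": 0}
legal = [("N", "S"), ("E", "W")]
green, moved = step(queues, legal)
```

With these example queues, the north/south pair has the larger total gain, so it is chosen and one car leaves each of those two lanes.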

Existing Algorithms
- Random
  - Totally random gains
- Most Cars
  - Based on presence of at least one car
- TC-1
  - Real-Time Dynamic Programming
  - Based on probabilities of progress / reward
- GenNeural
  - Genetically evolve a 3-layer network
  - Uses only traffic densities
- (And more)
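For a sense of how simple the two baseline controllers are, here is a rough sketch of their gain functions. The function names are hypothetical, and the exact form of "Most Cars" is an assumption: the gain is taken to be the number of waiting cars, so a lane needs at least one car to get any gain.

```python
import random


def random_gains(lanes, rng):
    """'Random' baseline: every lane gets an arbitrary gain in [0, 1)."""
    return {lane: rng.random() for lane in lanes}


def most_cars_gains(queue_lengths):
    """'Most Cars' baseline (assumed form): gain is the number of waiting cars,
    so an empty lane gets zero gain."""
    return {lane: float(count) for lane, count in queue_lengths.items()}


print(random_gains(["N", "E"], random.Random(0)))
print(most_cars_gains({"N": 3, "E": 0}))
```

TC-1 and GenNeural are not sketched here because the slides do not give enough detail to reproduce them.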

My Algorithm
- Use a neural network instead of dynamic programming
- Good:
  - Network can deal with continuous input
  - Might be able to recognize traffic patterns that are not available using a table lookup
- Bad:
  - Hard to tell what the network will learn
  - Hard to figure out useful input
  - Hard to tell what the ‘right’ output is for training

Pitfalls / Solutions
- Don’t know if we will be red or green
  - Two adalines to predict reward if the light is red or green; gain is the difference
- Input
  - Input (for each lane): number of cars, traffic density, is a given lane full
- Rewards
  - Reward for cars moving, passing through intersections
  - Shared reward for other lanes in the intersection
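The two-adaline idea can be sketched as below. This is a minimal illustration, not the code behind the talk: the class names, the learning rate, and the choice to train each unit directly on the observed reward with the Widrow-Hoff (LMS) rule are assumptions, since the slides do not say how training targets were formed.

```python
class Adaline:
    """Single linear unit trained with the Widrow-Hoff (LMS) rule."""

    def __init__(self, n_inputs, learning_rate=0.01):
        self.weights = [0.0] * n_inputs
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def update(self, x, target):
        """Take one LMS step toward the target and return the error before the step."""
        error = target - self.predict(x)
        self.weights = [w + self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias += self.learning_rate * error
        return error


class LaneGainEstimator:
    """Hypothetical wrapper: one adaline per light state; gain is the difference."""

    def __init__(self, n_inputs=3, learning_rate=0.01):
        self.green = Adaline(n_inputs, learning_rate)
        self.red = Adaline(n_inputs, learning_rate)

    def gain(self, features):
        # Predicted reward if the light is green minus predicted reward if red.
        return self.green.predict(features) - self.red.predict(features)

    def observe(self, features, was_green, reward):
        # Only the adaline matching the light state that actually occurred learns.
        unit = self.green if was_green else self.red
        return unit.update(features, reward)


# Features per lane (from the slide): number of cars, traffic density, lane full (0 or 1).
estimator = LaneGainEstimator(n_inputs=3, learning_rate=0.1)
busy_lane = [2.0, 0.5, 0.0]
for _ in range(100):
    estimator.observe(busy_lane, was_green=True, reward=1.0)
    estimator.observe(busy_lane, was_green=False, reward=0.0)
print(estimator.gain(busy_lane))
```

Keeping a separate unit per light state avoids feeding the light color in as an input, which matches the pitfall on the slide: the controller has to report a gain before it knows whether the lane will be red or green.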

Results: “Split”
- “Adaline” did slightly better than “Most Cars”
- “TC-1” did the best

Results: “Complex”
- “Adaline” did the worst
- “TC-1” did the best

What I Wish Was Different
- Infrastructure
  - Inputs and rewards are all discrete
  - Seems like the network would do better with access to the light configurations
- Rewards
  - It would be nice to give rewards for no waiting
- Network
  - Arguably a multi-layer network could perform better

Demo Time