
Distributed Reinforcement Learning for a Traffic Engineering Application Mark D. Pendrith DaimlerChrysler Research & Technology Center Presented by: Christina.


1 Distributed Reinforcement Learning for a Traffic Engineering Application Mark D. Pendrith DaimlerChrysler Research & Technology Center Presented by: Christina Schweikert

2 Distributed Reinforcement Learning for Traffic Engineering Problem
• Intelligent cruise control system
• Lane change advisory system based on traffic patterns
• Optimize a group policy by maximizing freeway utilization as a shared resource
• Introduce 2 new algorithms (Monte Carlo-based Piecewise Policy Iteration, Multi-Agent Distributed Q-learning) and compare their performance in this domain

3 Distronic Adaptive Cruise Control

4
• Signals come from a radar sensor that scans the full width of a three-lane motorway over a distance of approximately 100 m and recognizes any moving vehicles ahead
• The reflection of the radar impulses and the change in their frequency enable the system to calculate the distance and the relative speed between the vehicles
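The two quantities on this slide follow from standard radar relations. The sketch below is illustrative only: it assumes the sensor reports a pulse round-trip time and a Doppler frequency shift, and the function names, the 77 GHz carrier, and the sample values are assumptions, not details from the slides.

```python
C = 299_792_458.0  # speed of light in m/s

def distance_from_round_trip(round_trip_s: float) -> float:
    """Range to the target: the pulse travels there and back."""
    return C * round_trip_s / 2.0

def relative_speed_from_doppler(doppler_shift_hz: float, carrier_hz: float) -> float:
    """Closing speed in m/s (positive when the gap is shrinking)."""
    return C * doppler_shift_hz / (2.0 * carrier_hz)

# Illustrative values only (assumed, not taken from the slides).
print(distance_from_round_trip(667e-9))            # roughly 100 m
print(relative_speed_from_doppler(5_000.0, 77e9))  # roughly 9.7 m/s
```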

5 Distronic Adaptive Cruise Control
• When the distance to the vehicle in front shrinks, the cruise control system immediately reduces acceleration or, if necessary, applies the brake
• When the distance increases, it acts as a conventional cruise control system and, at speeds between 30 and 180 km/h, maintains the programmed speed
• Driver is alerted to emergencies

6 Distronic Adaptive Cruise Control
• Automatically maintains a constant distance to the vehicle in front, helping to prevent rear-end collisions
• Reaction time of drivers using Distronic is up to 40 percent faster than that of drivers without this assistance system

7 Distributed Reinforcement Learning
• State – agents within sensing range
• Agents share a partially observable environment
• Goal – integrate agents’ experiences to learn an observation-based policy that maximizes group performance
• Agents share a common policy, giving a homogeneous population of agents

8 Traffic Engineering Problem
• Population of cars, each with a desired traveling speed, sharing a freeway network
• Subpopulation with radar capability to detect relative speeds and distances of cars immediately ahead, behind, and around them

9 Problem Formulation
• Optimize the average per-time-step reward by minimizing the per-car average loss at each time step
  v_d(i): desired speed of car i
  v_a(i): actual speed of car i
  n: number of cars in the simulation at the time step
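The loss formula itself is not reproduced in this transcript, so the following is only a sketch of how a per-car average loss could be computed from the quantities defined above. It assumes the loss is the mean absolute gap between desired and actual speed; the original may well use a squared or root-mean-square form instead. The function name and the sample speeds are hypothetical.

```python
def per_step_reward(desired, actual):
    """Negative per-car average loss for one time step.

    ASSUMPTION: the loss is the mean absolute gap |v_d(i) - v_a(i)|.
    The slide's own formula is not available, so the exact form may differ.
    """
    n = len(desired)
    if n == 0:
        return 0.0
    loss = sum(abs(vd - va) for vd, va in zip(desired, actual)) / n
    return -loss

# Three hypothetical cars (speeds in mph).
print(per_step_reward([60.0, 65.0, 55.0], [50.0, 60.0, 55.0]))  # -5.0
```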

10 State Representation
• View of the world for each car represented by an 8-d feature vector – relative distances and speeds of surrounding cars
  AL   AC   AR
  CL   Car  CR
  BL   BC   BR

11 Pattern of Cars in Front of Agent
AL AC AR
• 0 – lane is clear (no car in radar range, or nearest car is faster than agent’s desired speed)
• 1 – fastest car less than desired speed
• 2 – slower
• 3 – still slower

12 Pattern of Cars Behind Agent
BL BC BR
• 0 – lane is clear (no car in radar range, or nearest car is slower than agent’s current speed)
• 1 – slowest car faster than desired speed
• 2 – faster
• 3 – still faster

13 Lane Change
CL Car CR
• 0 – lane change not valid
• 1 – lane change valid
If there is not a safe gap in front and behind, the lane change is illegal.
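With the value ranges given on the last three slides (0 to 3 for each front and rear cell, 0 or 1 for each side cell), the 8-d feature vector can be packed into a single index for a lookup table. The sketch below is one way to do that; the feature ordering, the function names, and the mixed-radix encoding are assumptions made for illustration, not something the slides specify.

```python
# Sizes of each feature, in the assumed order AL, AC, AR, CL, CR, BL, BC, BR.
# Front and rear cells take values 0-3; the side cells CL and CR take 0-1.
FEATURE_SIZES = (4, 4, 4, 2, 2, 4, 4, 4)

def encode_state(features):
    """Map an 8-d feature vector to a single table index (mixed radix)."""
    if len(features) != len(FEATURE_SIZES):
        raise ValueError("expected 8 features")
    index = 0
    for value, size in zip(features, FEATURE_SIZES):
        if not 0 <= value < size:
            raise ValueError(f"feature value {value} out of range 0..{size - 1}")
        index = index * size + value
    return index

def num_states():
    """Total number of distinct observations under this encoding."""
    total = 1
    for size in FEATURE_SIZES:
        total *= size
    return total

print(num_states())                             # 16384
print(encode_state((0, 0, 0, 0, 0, 0, 0, 0)))   # 0
print(encode_state((3, 3, 3, 1, 1, 3, 3, 3)))   # 16383
```

Under these ranges there are 4^6 × 2^2 = 16384 possible observations, small enough for a tabular policy.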

14 Monte Carlo-based Piecewise Policy Iteration
• Performs approximate piecewise policy iteration, where possible policy changes for each state are evaluated by Monte Carlo estimation
• Piecewise – the policy is changed for one state at a time, rather than for all states in parallel
• Searches the space of deterministic policies directly without representing the value function

15 Policy Iteration
• Start with an arbitrary deterministic policy for the given MDP
• Generate a better policy by calculating the best single improvement in policy possible for each state (MC)
• Combine all changes to generate the successor policy
• Continue until no improvement is possible – optimal policy
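The piecewise variant described on the previous slide (one state's action changed at a time, each change scored by Monte Carlo rollouts, no value function stored) can be sketched on a toy problem. Everything below is a hypothetical illustration rather than the presented algorithm or the traffic simulator: a 4-state chain with a reward for reaching the last state, with deterministic dynamics so that a single rollout per start state is exact. A stochastic simulator would need more rollouts per estimate.

```python
import random

N_STATES = 4            # toy chain: states 0..3, entering state 3 is rewarded
ACTIONS = (-1, +1)      # move left or right
HORIZON = 20            # steps per rollout

def step(state, action, rng):
    """Toy dynamics (deterministic here; rng is unused but kept so a
    stochastic simulator could be dropped in)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def mc_value(policy, rng, n_rollouts=1):
    """Monte Carlo estimate of the average per-step reward, averaged over
    every start state."""
    total = 0.0
    for start in range(N_STATES):
        for _ in range(n_rollouts):
            state = start
            for _ in range(HORIZON):
                state, reward = step(state, policy[state], rng)
                total += reward
    return total / (N_STATES * n_rollouts * HORIZON)

def piecewise_policy_iteration(policy, rng):
    """Change one state's action at a time, keeping a change only if the
    Monte Carlo estimate strictly improves; stop after a sweep with no change."""
    policy = list(policy)
    best = mc_value(policy, rng)
    improved = True
    while improved:
        improved = False
        for state in range(N_STATES):
            for action in ACTIONS:
                if action == policy[state]:
                    continue
                candidate = policy.copy()
                candidate[state] = action
                value = mc_value(candidate, rng)
                if value > best:
                    policy, best, improved = candidate, value, True
    return policy, best

rng = random.Random(0)
policy, value = piecewise_policy_iteration([-1] * N_STATES, rng)
print(policy, value)  # [1, 1, 1, 1] 0.9625
```

Starting from "always move left", the search first fixes the states nearest the reward and works backward, ending with "always move right".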

16 Multi-Agent Distributed Q-Learning
Q-Learning
• Q-value estimates are updated after each time step, based on the state transition that follows the selected action
• For each time step, only one state transition and one action are used to update Q-value estimates
• In DQL, there can be as many state transitions per time step as there are agents

17 Multi-Agent Distributed Q-Learning
• Takes the average backup value for a state/action pair over all agents that selected action a from state s at the last time step
• The Q_max component of the backup value is calculated over the actions valid for a particular agent to select at the next time step
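The two rules on this slide can be sketched as a single table update per time step. This is a minimal illustration under stated assumptions, not the presented implementation: the function name, the tuple layout, the learning rate `alpha`, the discount `gamma`, and the incremental update form are all hypothetical choices; the slide specifies only the averaging over agents and the restriction of Q_max to each agent's valid next actions.

```python
from collections import defaultdict

def dql_update(Q, transitions, alpha=0.1, gamma=0.9):
    """One distributed Q-learning step for agents sharing one Q-table.

    transitions: one (state, action, reward, next_state, valid_next_actions)
    tuple per agent for the time step just completed.
    alpha and gamma are assumed hyperparameters, not values from the slides.
    """
    backups = defaultdict(list)
    for s, a, r, s_next, valid_next in transitions:
        # Q_max ranges only over actions this agent may select next.
        q_max = max(Q[(s_next, b)] for b in valid_next)
        backups[(s, a)].append(r + gamma * q_max)
    for key, values in backups.items():
        # Average the backup over all agents that chose a in s.
        target = sum(values) / len(values)
        Q[key] += alpha * (target - Q[key])
    return Q

# Two hypothetical agents take the same action from the same state,
# but only the first may change lanes at the next step.
Q = defaultdict(float)
Q[("s1", "keep")] = 2.0
Q[("s1", "left")] = 5.0
transitions = [
    ("s0", "keep", -1.0, "s1", ("keep", "left")),  # backup: -1 + 0.5 * 5 = 1.5
    ("s0", "keep", -3.0, "s1", ("keep",)),         # backup: -3 + 0.5 * 2 = -2.0
]
dql_update(Q, transitions, alpha=0.5, gamma=0.5)
print(Q[("s0", "keep")])  # -0.125
```

Averaging before the update means a state/action pair visited by many agents in one step receives one combined update rather than many sequential ones.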

18 Simulation for Offline Learning
Advantages:
  o Since the true state of the environment is known, the loss metric can be measured directly
  o Can be run faster, allowing many long learning trials
  o Safety
Learn policies offline, then integrate them into an intelligent cruise control system with lane advisory, route planning, etc.

19 Traffic Simulation Specifications
• Circular 3-lane freeway, 13.3 miles long, with 200 cars
• Half follow “selfish drone” policy
• Rest follow the current learnt policy and active exploration decisions
• Gaussian distribution of desired speeds, mean of 60 mph
• Cars have low-level collision avoidance and differ in lane change strategy

20 Experimental Results
• Selfish drone policy – consistent per-step reward of -11.9 (each agent traveling 11.9 below desired speed)
• APPIA and DQL found policies 3-5% better
• Best policies with “look ahead” only
• “look behind” model provided more stable learning
• “look behind” outperforms “look ahead” at times when good policy is lost
