Intelligent Agents that Learn
Leslie Pack Kaelbling
MIT Artificial Intelligence Laboratory — Research Directions
Making Reinforcement Learning Really Work
Typical RL methods require far too much data to be practical in an online setting. Address the problem by:
– applying strong generalization techniques
– using human input to bootstrap learning
Let humans do what they're good at.
Let learning algorithms do what they're good at.
Incorporating Human Input
Humans can help, even if they are bad at the task:
– The human provides initial trajectories
– No attempt is made to learn to reproduce the trajectories
– Reinforcement learning takes place in parallel
– Once the learned policy is good, use it
A code sketch of this two-phase scheme follows the phase diagrams below.
Learning Phase One
[Diagram: the supplied control policy drives the environment; the learning system observes the same actions (A), rewards (R), and observations (O) and learns in parallel.]
Learning Phase Two
[Diagram: the same loop, but now the learning system's learned policy is in control, with actions (A), rewards (R), and observations (O) flowing as before.]
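To make the scheme concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than the actual system from the talk: the learner is a simple discretized Q-learner standing in for a stronger generalizing method, and `two_phase_training`, `run_episode`, and the `good_enough` threshold are invented names. What it illustrates is exactly the point on the previous slide: the learner trains on whatever experience is generated, first under the supplied policy, then under its own.

```python
class QLearner:
    """Stand-in learner: tabular Q-learning over a coarse discretization
    of the continuous state. (The real system would use a stronger
    generalizing learner; this is only an illustration.)"""
    def __init__(self, actions, alpha=0.1, gamma=0.99, bin_size=0.1):
        self.q = {}                  # (state-bin, action) -> value
        self.actions = actions       # small discrete set of actions
        self.alpha, self.gamma, self.bin_size = alpha, gamma, bin_size

    def _bin(self, s):
        return tuple(round(x / self.bin_size) for x in s)

    def update(self, s, a, r, s2):
        sb, sb2 = self._bin(s), self._bin(s2)
        best_next = max(self.q.get((sb2, b), 0.0) for b in self.actions)
        old = self.q.get((sb, a), 0.0)
        self.q[(sb, a)] = old + self.alpha * (r + self.gamma * best_next - old)

    def policy(self, s):
        sb = self._bin(s)
        return max(self.actions, key=lambda a: self.q.get((sb, a), 0.0))


def run_episode(env, act, learner, max_steps=1000):
    """Run one episode, choosing actions with `act`; the learner trains
    on the experience no matter whose policy is in control."""
    s, done, steps = env.reset(), False, 0
    while not done and steps < max_steps:
        a = act(s)
        s2, r, done = env.step(a)
        learner.update(s, a, r, s2)      # learning happens in parallel
        s, steps = s2, steps + 1
    return steps


def two_phase_training(env, supplied_policy, learner,
                       good_enough=120, batch=10):
    # Phase one: the supplied (human) policy is in control; the learner
    # never tries to reproduce it -- it just learns from the experience.
    while True:
        for _ in range(batch):
            run_episode(env, supplied_policy, learner)
        # Switch once the learned policy does well enough on its own
        # (threshold chosen arbitrarily for this sketch).
        if run_episode(env, learner.policy, learner) <= good_enough:
            break
    # Phase two: the learned policy is in control and keeps improving.
    for _ in range(100):
        run_episode(env, learner.policy, learner)
```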
Early Results: Corridor Following
Corridor Following
3 continuous state dimensions:
– corridor angle
– offset from middle
– distance to end of corridor
1 continuous action dimension:
– rotation velocity
Supplied example policy:
– average of 110 steps to goal
Experimental Set-Up
– Initial training runs start from roughly the middle of the corridor
– Translation speed is governed by a fixed policy
– Evaluation on a fixed set of starting points
– Reward
  » 10 at end of corridor
  » 0 everywhere else
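For concreteness, here is a sketch of the corridor-following task as an environment matching this spec. The state dimensions, the single rotation-velocity action, the start near the middle of the corridor, the fixed translation speed, and the 10-at-goal / 0-elsewhere reward come from the slides; the geometry and dynamics constants (`LENGTH`, `SPEED`, the starting-noise ranges) are invented for illustration.

```python
import math
import random

class CorridorEnv:
    """Illustrative corridor-following environment. State: (corridor
    angle, offset from middle, distance to end of corridor). Action:
    rotation velocity. Constants are made up; only the spec above is
    from the slides."""
    SPEED = 0.1     # fixed translation speed (translation has a fixed policy)
    LENGTH = 20.0   # hypothetical corridor length

    def reset(self):
        # Initial training runs start roughly in the middle of the corridor.
        self.angle = random.uniform(-0.3, 0.3)
        self.offset = random.uniform(-0.2, 0.2)
        self.dist = self.LENGTH
        return (self.angle, self.offset, self.dist)

    def step(self, rot_vel):
        # Rotate by the commanded velocity, then translate at fixed speed.
        self.angle += rot_vel
        self.offset += self.SPEED * math.sin(self.angle)
        self.dist -= self.SPEED * math.cos(self.angle)
        done = self.dist <= 0.0
        reward = 10.0 if done else 0.0   # 10 at end of corridor, 0 elsewhere
        return (self.angle, self.offset, self.dist), reward, done
```

Plugging this into the two-phase sketch above with a small discrete set of rotation velocities (e.g., -0.1, 0.0, 0.1) would reproduce the overall experimental loop.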
Corridor Following
[Plot: average steps to goal over training, with the "best" possible performance marked and the Phase 1 / Phase 2 boundary indicated.]
Corridor Following: Initial Policy
Corridor Following: After Phase 1