Policy Compression for MDPs

Policy Compression for MDPs
AA 228, December 5, 2016
Kyle Julian

Motivation
- MDPs can solve a variety of problems
- Many MDP solutions result in large lookup tables
- Implementing an MDP solution on limited hardware might be intractable
- Need to compress the solution without loss of performance

Outline
- Policy Compression for Aircraft Collision Avoidance Systems
  - Problem formulation
  - Neural network compression
  - Results
- Neural Network Guidance for UAVs
  - Neural network compression
- Conclusions

Background – ACAS Xu
- ACAS X: aircraft collision avoidance system optimized through Markov decision processes
- ACAS Xu: UAV version of ACAS X with horizontal advisories
- Seven discretized state dimensions: ρ, θ, ψ, v_own, v_int, τ, a_prev (120 million possible states)
- Five possible actions (heading rates in deg/s): ±3, ±1.5, Clear-of-Conflict (COC)
- MDP solution: a table of score values (Q) for each state-action pair
- Scores represent the cost of taking that action in a given state; the system takes the action with the lowest cost

M. J. Kochenderfer and J. P. Chryssanthacopoulos, "Robust airborne collision avoidance through dynamic programming," Massachusetts Institute of Technology, Lincoln Laboratory, Project Report ATC-371, 2011.

Problem Formulation
- The seven-dimensional table has 600 million Q values, or 2.4 GB of floats
- Too large for many certified avionics systems
- Must compress the table: neural network compression

Neural Network Compression - Overview
- Replace the table representation with a neural network representation: Q = f(state)
- Input: state variables; output: score estimates
- Only the parameters of the network need to be stored, reducing the required storage (a small sketch of how the two representations are queried follows below)
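
A minimal sketch of the idea, using hypothetical names (`q_table`, `model`, `state_index`): with the table, the advisory comes from a lookup; with the network, the same 7 state variables are mapped directly to 5 cost estimates, so only the network weights need to live on the avionics.

```python
import numpy as np

def best_action_from_table(q_table, state_index):
    # One row of costs per state; the advisory is the lowest-cost action.
    return np.argmin(q_table[state_index])

def best_action_from_network(model, state):
    # The network maps the 7 state variables to 5 cost estimates.
    q_estimates = model.predict(np.asarray([state]))[0]
    return np.argmin(q_estimates)
```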

Neural Network Compression – Neural Networks
1) Initialize weights: random, Gaussian
2) Forward pass: feed inputs and compute outputs
3) Loss function: error between the network output and the truth
4) Back-propagate the error: gradient descent methods
5) Update weights and repeat the process (a minimal sketch of these steps follows below)
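
A minimal NumPy sketch of the five steps above for a single fully connected layer with a sum-of-squares loss; the layer sizes and learning rate are illustrative, not the values used in the actual network.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Initialize weights (random, Gaussian)
W = rng.normal(scale=0.1, size=(5, 7))   # 7 state inputs -> 5 score outputs
b = np.zeros(5)

def train_step(x, q_true, lr=1e-3):
    global W, b
    # 2) Forward pass: feed inputs and compute outputs
    q_pred = W @ x + b
    # 3) Loss function: error between network output and truth
    err = q_pred - q_true
    # 4) Back-propagate the error (gradient of the 0.5 * sum-of-squares loss)
    grad_W = np.outer(err, x)
    grad_b = err
    # 5) Update weights (gradient descent), then repeat with the next sample
    W -= lr * grad_W
    b -= lr * grad_b
    return 0.5 * np.sum(err ** 2)
```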

Neural Network Compression - Key Decisions
- Size of network: ~600,000 total parameters; more than 6 hidden layers gives no extra benefit
- Optimizer: tried five different optimizers; AdaMax performed best
- Architecture (input: state variables, output: Q values; a Keras sketch follows below):
  - Fully connected layer: 128x7, ReLU activation
  - Fully connected layer: 512x128, ReLU activation
  - Fully connected layer: 512x512, ReLU activation
  - Fully connected layer: 128x512, ReLU activation
  - Fully connected layer: 128x128, ReLU activation
  - Fully connected layer: 128x128, ReLU activation
  - Output layer: 5x128
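
A sketch of the listed architecture in Keras (the framework named on the implementation slide). Initializers and other training details are not given here, so this is an approximation rather than the author's exact model.

```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(7,)))  # 7 state variables
model.add(Dense(512, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(5))          # linear output: cost estimates for the 5 actions
model.summary()              # prints the total parameter count
```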

Neural Network Compression - Loss Function
- Need accurate Q estimates while maintaining optimal actions
- MSE: fails to maintain optimal actions
- Categorical cross-entropy: fails to maintain Q values
- Solution: asymmetric MSE, which encourages separation between the optimal action and suboptimal actions (one possible form is sketched below)
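
A NumPy sketch of one plausible asymmetric MSE; the slide does not give the exact form, so the penalty factor and which errors are up-weighted are assumptions. Here, errors that could flip the lowest-cost action (overestimating the optimal action's cost, or underestimating a suboptimal action's cost) are weighted more heavily.

```python
import numpy as np

def asymmetric_mse(q_true, q_pred, penalty=4.0):
    err = q_pred - q_true
    weights = np.ones_like(err)
    optimal = np.argmin(q_true, axis=1)      # lowest cost = optimal action
    rows = np.arange(q_true.shape[0])

    opt_mask = np.zeros_like(err, dtype=bool)
    opt_mask[rows, optimal] = True
    # Penalize overestimating the cost of the optimal action...
    weights[opt_mask & (err > 0)] = penalty
    # ...and underestimating the cost of suboptimal actions.
    weights[~opt_mask & (err < 0)] = penalty

    return np.mean(weights * err ** 2)
```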

Neural Network Compression – Implementation
- Implemented in Python using Keras* with Theano†
- Trained on TITAN X GPUs
- Training data is normalized and shuffled
- Batch size: 2^16
- Trained for 1200 epochs, requiring 4 days
(a sketch of this training configuration follows below)

* F. Chollet. (2016). Keras: Deep learning library for Theano and TensorFlow [Online]. Available: keras.io
† Theano Development Team. (2016). Theano: A Python framework for fast computation of mathematical expressions.
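
A sketch of the training configuration described on this slide, assuming the `model` from the architecture sketch above, a Keras-compatible version of the asymmetric loss (named `asymmetric_mse_keras` here for illustration), and training arrays `x_train`/`y_train` of normalized states and table Q values, none of which are defined here.

```python
from keras.optimizers import Adamax

model.compile(optimizer=Adamax(), loss=asymmetric_mse_keras)

# Argument names follow current Keras; older (2016-era) versions use nb_epoch.
model.fit(x_train, y_train,
          batch_size=2 ** 16,   # batch size from the slide
          epochs=1200,          # training epochs from the slide
          shuffle=True)         # training data is shuffled
```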

Results – Policy Plots
- Top-down view of a 90° encounter, τ = 0 s, a_prev = -1.5 deg/s
- Plots compare nearest-neighbor interpolation of the MDP table with the neural network output
- The neural network is a smooth representation

Results – Policy Plots
- Top-down view of a head-on encounter, τ = 60 s, a_prev = COC
- The neural network represents the original table well

Results - Simulation
- Simulated on a set of 1.5 million encounters
- p(NMAC): probability of a near midair collision
- p(Alert): probability the system will give an alert
- p(Reversal): probability of reversing the advisory direction
(a sketch of how such metrics are tallied follows below)
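
A minimal sketch of how these metrics could be tallied from simulation logs; `encounters` and its per-encounter boolean fields are hypothetical names, since the slide does not describe the logging format actually used.

```python
def summarize(encounters):
    n = len(encounters)
    p_nmac = sum(e["nmac"] for e in encounters) / n          # near midair collision occurred
    p_alert = sum(e["alerted"] for e in encounters) / n      # system issued at least one alert
    p_reversal = sum(e["reversed"] for e in encounters) / n  # advisory direction was reversed
    return {"p(NMAC)": p_nmac, "p(Alert)": p_alert, "p(Reversal)": p_reversal}
```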

Results - Example Encounter
- The network does not need to interpolate Q values
- The neural network alerts earlier than the table
- A small difference grows larger over time
- Able to avoid the intruder aircraft

Neural Network Guidance for UAVs - Background
- Goal: navigate to a waypoint and arrive with a desired heading and bank angle
- Five discretized state dimensions: ρ, θ, ψ, v, φ (redundancy in the states is reduced)
- Two possible actions: Δφ = ±5°
- Reward: -1 if not at the waypoint with the desired heading and bank angle, 0 otherwise
- Transitions: assume steady, level flight; an action is taken at 10 Hz; position is propagated assuming Gaussian noise in velocity and bank angle; if the UAV is at the waypoint with the desired heading and bank angle, it does not move
- Solution: 26 million state-action values
(a sketch of this formulation follows below)
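
A minimal sketch of the action set and reward described above. The state unpacking, goal tolerances, and constant names are illustrative assumptions; the slide does not give the exact discretization.

```python
import numpy as np

ACTIONS_DEG = (-5.0, +5.0)   # bank-angle increments, delta-phi = +/- 5 degrees
DT = 0.1                     # actions taken at 10 Hz

def at_goal(state, tol_range=1.0, tol_ang=np.radians(5.0)):
    # state = (rho, theta, psi, v, phi): range and bearing to the waypoint,
    # heading error, speed, and bank angle (hypothetical unpacking).
    rho, theta, psi, v, phi = state
    return rho < tol_range and abs(psi) < tol_ang and abs(phi) < tol_ang

def reward(state):
    # -1 every step until the waypoint is reached with the desired heading
    # and bank angle, 0 afterwards.
    return 0.0 if at_goal(state) else -1.0
```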

MDP Solution
- Can start from any position
- Smooth, flyable trajectories to the waypoint
- Complex trajectories are parameterized by waypoints

Problem Formulation
- The table requires 112 MB of memory
- The 3DR Pixhawk has 256 KB of memory, of which really only about 10 KB is available
- Need a compression factor of over 10,000 (112 MB / 10 KB ≈ 11,000)
- Train a neural network to predict the best action: classification, not regression

Neural Network Compression
- Size of network: ~1400 total parameters, ~5.6 KB in memory
- Tried different combinations of numbers of layers and layer sizes; settled on 4 hidden layers of 20 perceptrons each
- Cross-entropy loss; softmax converts the outputs to probabilities
- Training labels are one-hot vectors, [0,1] or [1,0]
- Optimizer: AdaMax
- Architecture (input: state variables, output: probability that each action is optimal; a Keras sketch follows below):
  - Fully connected layer: 20x5, ReLU activation
  - Fully connected layer: 20x20, ReLU activation
  - Fully connected layer: 20x20, ReLU activation
  - Fully connected layer: 20x20, ReLU activation
  - Fully connected layer: 20x2, softmax
  - Output: 2 (probability that each action is optimal)
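
A Keras sketch of the UAV guidance network described above (5 state inputs, 4 hidden layers of 20 ReLU units, softmax over the 2 bank-angle actions). Initializers and other training details are assumptions, not taken from the slide.

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adamax

policy_net = Sequential()
policy_net.add(Dense(20, activation='relu', input_shape=(5,)))
policy_net.add(Dense(20, activation='relu'))
policy_net.add(Dense(20, activation='relu'))
policy_net.add(Dense(20, activation='relu'))
policy_net.add(Dense(2, activation='softmax'))  # P(each action is optimal)

# Cross-entropy loss against one-hot labels ([1,0] or [0,1]), AdaMax optimizer
policy_net.compile(optimizer=Adamax(), loss='categorical_crossentropy')
```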

Results – Policy Plots
- The neural network policy matches the original MDP policy very well

Neural Network Trajectories
- Simulated trajectories of the neural network are almost identical to the MDP trajectories

Performance
- Implemented neural network guidance on a custom UAV
- Flies well in calm or windy conditions
- Experimental flight in calm conditions is 1.3% slower than simulated flight

Conclusions
- Implementing MDP solutions in real systems may require compressed policies
- Neural networks can be trained to represent state-action values or policies
- Compression by factors of 1000+ without performance loss
- Neural networks can be incorporated within limited-memory systems

Questions? Kyle Julian KJulian3@stanford.edu