1
Fusing Machine Learning & Control Theory With Applications to Smart Buildings & ActionWebs
UC Berkeley ActionWebs Meeting, November 03, 2010
By Jeremy Gillula
[Some rights reserved unless otherwise noted; see http://tinyurl.com/2qn665]
2
Talk Outline
- Current “State of the Art”
  - Reinforcement learning and apprenticeship learning
  - Reachability for guaranteed safe mode switching
- Motivation
- Goals: Combining Machine Learning and Control Theory
- Existing Approaches
- Current Research
- Extensions
- Conclusions
- Questions
Nov 03, 2010, ActionWebs Talk (J. Gillula)
3
“An Application of Reinforcement Learning to Aerobatic Helicopter Flight” (Abbeel et al., 2007)
- Use linear regression to learn the parameters of a given model
- Use differential dynamic programming to solve the MDP
  - Generate a trajectory using the current policy and nonlinear dynamics
  - Compute a new policy using LQR and the dynamics linearized around that trajectory
- Reward function generated using apprenticeship learning
[Video from Abbeel et al. 2007]
Analysis:
- Great performance
- No formal safety analysis
- Required some hand-tweaking for stability (e.g. hand-chosen reward weights)
- Easily generalizable
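The "compute new policy using LQR" step can be illustrated with a finite-horizon backward Riccati pass. This is only a minimal sketch, not the method of Abbeel et al.: it assumes a fixed linear model (a double integrator standing in for the dynamics linearized around a trajectory), and the names `lqr_backward_pass` and `rollout`, along with the cost weights, are hypothetical choices for illustration.

```python
import numpy as np

def lqr_backward_pass(A, B, Q, R, horizon):
    """Return time-varying gains K_t for the policy u_t = -K_t x_t."""
    P = Q.copy()  # terminal cost-to-go matrix
    gains = []
    for _ in range(horizon):
        # Gain that minimizes the one-step cost plus the cost-to-go.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update of the cost-to-go under that gain.
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # gains were computed from the final step backward
    return gains

def rollout(A, B, gains, x0):
    """Apply the gains to the linear model and return the final state."""
    x = x0.copy()
    for K in gains:
        x = A @ x - B @ (K @ x)
    return x

# Assumed example system: double integrator with a 0.1 s time step.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
gains = lqr_backward_pass(A, B, np.eye(2), 0.1 * np.eye(1), horizon=100)
print(rollout(A, B, gains, np.array([1.0, 0.0])))
```

In a DDP-style loop, this pass would be repeated with `A` and `B` re-linearized around each newly generated trajectory.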
4
“Design of Guaranteed Safe Maneuvers Using Reachable Sets…” (Gillula et al., 2010)
- Safe given accuracy of model and worst-case disturbances
- Used reachability analysis via level-set methods to design and perform a safe backflip
5
“A…Hamilton–Jacobi Formulation of Reachable Sets for Continuous Dynamic Games” (Mitchell et al., 2005)
- Create a level set function such that:
  - The boundary of the keep-out set K is defined implicitly by the function's zero level set
  - The function is negative inside the region and positive outside
- Reachability as a game: the disturbance attempts to force the system into the unsafe region, while the control attempts to stay safe
- Solution can be found via a Hamilton-Jacobi-Bellman PDE: [equation missing]
[Figure from Tomlin 2009]
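For reference, the time-dependent formulation of Mitchell et al. (2005) can be written roughly as below. The symbols are assumptions made for this sketch and may not match the original slide: φ is the level set function, g is the implicit description of the keep-out set K, f is the system dynamics, and u and d are the control and disturbance inputs.

```latex
\begin{aligned}
K &= \{\, x : g(x) \le 0 \,\}, \qquad \phi(x, 0) = g(x), \\
0 &= \frac{\partial \phi}{\partial t}(x, t)
     + \min\bigl[\, 0,\; H\bigl(x, \nabla_x \phi(x, t)\bigr) \bigr],
     \qquad t \le 0, \\
H(x, p) &= \max_{u \in U} \, \min_{d \in D} \; p^{\top} f(x, u, d).
\end{aligned}
```

Under this convention, the sub-zero level set of φ(·, t) describes the states from which the disturbance can drive the system into K within |t| time units despite the best control effort.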
6
“Design of Guaranteed Safe Maneuvers Using Reachable Sets…” (Gillula et al., 2010)
[Figure labels: Recovery, Drift, Impulse]
Analysis:
- Decent performance
- Formal safety analysis
- Required human input for choosing design parameters
- Difficult to generalize
7
Motivation: “Machine Learning” Techniques vs. “Control Theory” Techniques

| | Abbeel et al., 2007 | Gillula et al., 2010 |
| Modeling & System ID | Based on data (could be nonparametric) | Based on physics |
| Planning & Control | Based on data (could be sampling-based) | Based on heuristic reward functions (or physics) |
| Types of Guarantees | Proofs of convergence for learning algorithms | Safety and robustness guarantees for system performance |
| Summary | Data-driven, convergence-focused | Physics-driven, safety-focused |
8
Goals/Research Statement
How can we get high performance on complicated systems while still guaranteeing safety?
- Take advantage of “Machine Learning” techniques for performance
  - Data-driven models (potentially nonparametric)
  - Data-driven, sampling-based techniques for estimation and control
- While getting “Control Theory”-style safety guarantees
  - Formal, principled analyses of safety
Several Possible Approaches
- Adapt data-driven methods to existing safety-analysis techniques
- Closely couple data-driven methods with techniques for generating safety guarantees
- Use data-driven techniques in the context of existing safety-analysis techniques
- Other alternatives
10
“System Identification of Post Stall Aerodynamics for UAV Perching” (Hoburg and Tedrake, 2009)
Approach: Adapt data-driven methods to existing safety-analysis techniques
- Nonlinear and transient aerodynamics in perching
  - Need to learn model from data
- Use physically-inspired basis functions
  - Nonlinear functions of state x, z, µ, etc.
- Compute least-squares fit for every combination of n basis functions
[Figures from Hoburg and Tedrake 2009]
11
“System Identification of Post Stall Aerodynamics for UAV Perching” (Hoburg and Tedrake, 2009)
Approach: Adapt data-driven methods to existing safety-analysis techniques
- Nonlinear and transient aerodynamics in perching
  - Need to learn model from data
- Use physically-inspired basis functions
  - Nonlinear functions of state x, z, µ, etc.
- Compute least-squares fit for every combination of n basis functions
[Figures from Hoburg and Tedrake 2009]
Analysis/Extensions:
- Use standard control theory techniques to generate safety guarantees
- Use lasso or other regularization to choose basis functions
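The exhaustive "least-squares fit for every combination of n basis functions" step might look like the sketch below. The candidate basis functions, the synthetic target, and the function names `candidate_features` and `best_basis_subset` are assumptions made for illustration; they are not taken from Hoburg and Tedrake.

```python
import itertools
import numpy as np

def candidate_features(alpha):
    """Hypothetical candidate basis functions of an angle alpha (radians)."""
    return np.column_stack([
        np.sin(alpha),
        np.cos(alpha),
        np.sin(2 * alpha),
        alpha,
        np.ones_like(alpha),
    ])

def best_basis_subset(features, target, n):
    """Fit every n-column subset by least squares; return the best one."""
    best = None
    for subset in itertools.combinations(range(features.shape[1]), n):
        cols = features[:, list(subset)]
        coeffs, *_ = np.linalg.lstsq(cols, target, rcond=None)
        err = float(np.sum((cols @ coeffs - target) ** 2))
        if best is None or err < best[0]:
            best = (err, subset, coeffs)
    return best  # (squared error, column indices, coefficients)

# Synthetic data built from two of the candidates, for demonstration only.
alpha = np.linspace(0.0, np.pi / 2, 50)
target = 1.0 * np.sin(2 * alpha) + 0.3 * np.cos(alpha)
err, subset, coeffs = best_basis_subset(candidate_features(alpha), target, n=2)
print(subset, coeffs, err)
```

The number of subsets grows combinatorially with the number of candidates, which is one reason the slide suggests lasso or another regularizer as an alternative way to select basis functions.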
12
“Predictive Guidance Intercept Using The Neural EKF Tracker” (Stubberud and Kramer, 2007)
Approach: Closely couple data-driven methods with techniques for generating safety guarantees
- Use an adaptive EKF to learn the error: [equation missing]
- Augmented process model is: [equation missing]
- Let the augmented state be: [equation missing; one component is labeled “NN weights”]
- Then: [equation missing]
13
“Predictive Guidance Intercept Using The Neural EKF Tracker” (Stubberud and Kramer, 2007)
Approach: Closely couple data-driven methods with techniques for generating safety guarantees
- Then the associated Jacobian is: [equation missing]
  - So state estimation and NN training are coupled
- Normal EKF analysis follows
Analysis:
- Learns model error
- Learning done online
- But combining ML and control theory tools can be tricky
  - E.g. augmented system is not observable
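One common way to write such an augmented model is sketched below. The notation is assumed for illustration and is not recovered from the slide: f is the nominal process model, g is a neural-network correction with weights w, and the weights are treated as constant states.

```latex
\begin{aligned}
x_{k+1} &= f(x_k) + g(x_k, w_k), \qquad w_{k+1} = w_k, \\
\bar{x}_k &= \begin{bmatrix} x_k \\ w_k \end{bmatrix}, \qquad
\frac{\partial \bar{f}}{\partial \bar{x}} =
\begin{bmatrix}
\dfrac{\partial f}{\partial x} + \dfrac{\partial g}{\partial x} &
\dfrac{\partial g}{\partial w} \\[1.5ex]
0 & I
\end{bmatrix}.
\end{aligned}
```

In this form, the off-diagonal block ∂g/∂w is what ties the state estimate to the weight estimate, so a single EKF update adjusts both at once.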
15
Safely Learning A Bounded System
Approach: Use data-driven techniques in the context of existing safety-analysis techniques
- Learning unknown dynamics of a target vehicle via observation
- Limited field of view
  - Safety = always keeping target in view, i.e. [equation missing]
- Bounded system
  - Assume target dynamics are autonomous and bounded, i.e. [equation missing]
- Measurement model given by: [equation missing]
[Pioneer image courtesy University of Queensland, http://tinyurl.com/38dje6f]
16
Safely Learning A Bounded System
Approach: Use data-driven techniques in the context of existing safety-analysis techniques
Problem statement
1) Learn target dynamics
2) Minimize error: [equation missing]
3) Maintain target in view: [equation missing]
- For (1) use machine learning:
  - Fixed model w/linear regression
  - Physically inspired basis functions
  - Neural network
- (1) leads to (2) via EKF, UKF, or PF
- (3) requires controlling our vehicle’s position and height
[Pioneer image courtesy University of Queensland, http://tinyurl.com/38dje6f]
17
Safely Learning A Bounded System
Approach: Use data-driven techniques in the context of existing safety-analysis techniques
- For (3) use reachability:
  - Unsafe set: [equation missing]
  - Treat target motion as adversarial disturbance
  - Augmented system dynamics: [equation missing]
- Result:
  - Can use any learning/tracking algorithm
  - Reachability only kicks in on border of unsafe sets
[Pioneer image courtesy University of Queensland, http://tinyurl.com/38dje6f]
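The idea that reachability "only kicks in on the border of unsafe sets" can be sketched as a switching filter around an arbitrary learning policy. Everything below is an assumption made for illustration: a one-dimensional relative position `r` between observer and target, velocity bounds `U_MAX` and `D_MAX`, and a hand-written value function that is valid only for this toy system. In general the value function would come from a reachable-set computation.

```python
import random

R = 1.0        # half-width of the field of view (hypothetical)
U_MAX = 1.0    # bound on our own velocity command
D_MAX = 0.5    # bound on the target's velocity (the disturbance)
DT = 0.01      # time step
MARGIN = 2 * (U_MAX + D_MAX) * DT  # buffer covering worst-case one-step motion

def value(r):
    """Signed distance to the unsafe set |r| > R (positive means safe)."""
    return R - abs(r)

def safe_control(r):
    """Move toward the target at full speed to shrink |r|."""
    return U_MAX if r > 0 else -U_MAX

def filtered_control(r, proposed_u):
    """Pass the learner's command through unless the state is near the border."""
    if value(r) <= MARGIN:
        return safe_control(r)
    return max(-U_MAX, min(U_MAX, proposed_u))

def simulate(steps=5000, seed=0):
    rng = random.Random(seed)
    r, min_value, overrides = 0.0, R, 0
    for _ in range(steps):
        proposed = rng.uniform(-U_MAX, U_MAX)  # stand-in exploration policy
        d = D_MAX if r >= 0 else -D_MAX        # adversarial: always push outward
        if value(r) <= MARGIN:
            overrides += 1
        r += DT * (d - filtered_control(r, proposed))  # relative dynamics
        min_value = min(min_value, value(r))
    return min_value, overrides

print(simulate())
```

Because `U_MAX` exceeds `D_MAX`, the safe controller can always reduce `|r|`, so the exploration policy is left untouched everywhere except in a thin band next to the edge of the field of view.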
18
Caveat
- What follows is pure brainstorming
- Feedback and suggestions are welcome
19
Safely Learning A Bounded System
Approach: Use data-driven techniques in the context of existing safety-analysis techniques
- Possible extension: safe autonomous data collection/learning
  - Attempt to learn/modify building model (or control policy) online
  - Start w/basic physics model (or control policy)
  - Assume bounded errors as disturbance
  - Reachability enables following any exploration policies when safe
[Image courtesy Jorge Ortiz, http://tinyurl.com/2dnz5jl]
20
Safely Learning A Bounded System
Approach: Use data-driven techniques in the context of existing safety-analysis techniques
- Limited acceptable range
  - Safety = always keeping target states within acceptable tolerances, i.e. [equation missing]
- Bounded system
  - Assume target dynamics are bounded, i.e. [equation missing]
Problem statement
1) Learn system dynamics
2) Minimize error: [equation missing]
3) Maintain target states in safe region: [equation missing]
Proposed Approach
1) Use machine learning
2) Use the results of (1) with optimal control
3) Use reachability
[Image courtesy Jorge Ortiz, http://tinyurl.com/2dnz5jl]
21
Safely Learning A Bounded System
Approach: Use data-driven techniques in the context of existing safety-analysis techniques
ActionWeb Difficulties:
- Reachable set calculations for high dimensions
- And they need to be online
[Image courtesy David Culler, http://tinyurl.com/2bcaqnh]
22
Safely Learning A Bounded System
Approach: Use data-driven techniques in the context of existing safety-analysis techniques
ActionWeb Solution: Building decomposition
- Decompose building into separate rooms
- Model each room in parallel
- Treat interactions between rooms as bounded adversarial inputs
  - Still fits in machine learning framework (can still model interactions)
  - Still fits in reachability framework (can still calculate safe sets)
[Image courtesy Claire Tomlin, http://tinyurl.com/26bpcl8]
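A toy version of the decomposition idea: each room gets its own worst-case temperature envelope, computed independently by replacing the inter-room heat exchange with a bounded input. The thermal model, all constants, and the function names are hypothetical, chosen only so the sketch runs. The bound `W_MAX` is an assumption that the simulation checks at every step rather than proves.

```python
A = 0.1       # heat-loss rate to ambient (hypothetical)
C = 0.1       # coupling gain between adjacent rooms (hypothetical)
T_AMB = 15.0  # ambient temperature
DT = 0.1      # time step
W_MAX = 0.5   # assumed bound on the magnitude of the interaction term

def room_step(temp, u, w):
    """One Euler step of a single room with heating input u and interaction w."""
    return temp + DT * (-A * (temp - T_AMB) + u + w)

def room_bounds(t0, u, steps):
    """Per-room envelope, treating the interaction as a worst-case input."""
    lo = hi = t0
    bounds = [(lo, hi)]
    for _ in range(steps):
        lo = room_step(lo, u, -W_MAX)
        hi = room_step(hi, u, +W_MAX)
        bounds.append((lo, hi))
    return bounds

def simulate_building(temps0, u, steps):
    """Coupled simulation of rooms arranged in a line."""
    temps, n = list(temps0), len(temps0)
    history = [tuple(temps)]
    for _ in range(steps):
        w = [sum(C * (temps[j] - temps[i]) for j in (i - 1, i + 1) if 0 <= j < n)
             for i in range(n)]
        assert all(abs(wi) <= W_MAX for wi in w), "interaction bound violated"
        temps = [room_step(temps[i], u, w[i]) for i in range(n)]
        history.append(tuple(temps))
    return history

def contained(temps0, u, steps):
    """Check that the coupled trajectory stays inside every room's envelope."""
    bounds = [room_bounds(t, u, steps) for t in temps0]
    history = simulate_building(temps0, u, steps)
    return all(bounds[i][k][0] <= history[k][i] <= bounds[i][k][1]
               for k in range(steps + 1) for i in range(len(temps0)))

print(contained([20.0, 21.0, 22.0], u=0.5, steps=100))
```

Each call to `room_bounds` depends only on that room's own state, so the envelopes can be computed in parallel, at the cost of being more conservative than an analysis of the full coupled building.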
23
Conclusions
- Combining Machine Learning and Control Theory
  - Achieving high performance on complicated systems while still guaranteeing safety
- Possible Approaches:
  - Adapt data-driven methods to existing safety-analysis techniques
  - Closely couple data-driven methods with techniques for generating safety guarantees
  - Use data-driven techniques in the context of existing safety-analysis techniques
- Extension to smart buildings and ActionWebs
24
Questions?