Optimal Learning for Homeland Security CCICADA Workshop, Morgan State University, Baltimore, Md. March 7, 2010 Warren Powell, with research by Peter Frazier, Ilya Ryzhov, and Warren Scott Princeton University © 2010 Warren B. Powell, Princeton University

Applications Challenges: »What is the best policy for balancing cost and reliability when testing people and cargo for dangerous materials? »Where should we sample water to detect possible tampering? »How should we collect information about disease in the population to plan a response to possible bioterrorism? »What is the most effective way to test new materials for detecting explosives, or to design new batteries for portable devices?

Applications Optimizing the response to an emergency in Manhattan »We need to collect information quickly about the state of the network. »We may collect information about current delays by accessing GPS devices.

Applications Where is the center of a radiation source?

Optimal learning Challenge »How do we collect information to improve our ability to make choices in the future? »We need to balance the cost of a measurement against the value of the knowledge gained. [Figure: a measurement may yield no improvement, or reveal a new solution.]

Information collection on a graph The knowledge gradient »Maximize the marginal value of a measurement:

$$\nu^{KG,n}_x = \mathbb{E}\left[\,\max_{x'} \theta^{n+1}_{x'} \;\middle|\; x^n = x\right] - \max_{x'} \theta^n_{x'}$$

Here $\theta^n$ captures what we know about the value of each choice after $n$ measurements; $\max_{x'}\theta^n_{x'}$ is the current optimization problem based on what we know; $\theta^{n+1}$ holds the updated values after the measurement; $\max_{x'}\theta^{n+1}_{x'}$ is the decision problem after the update; and the expectation gives the expected value of the updated problem.

»"x" can be: A policy for evaluating people and cargo A decision to sample part of the population for a disease A test of our waterways for toxins A molecular compound to create a new material for sensing explosives

Information collection on a graph The knowledge gradient for discrete alternatives and independent, normally-distributed beliefs:

$$\nu^{KG,n}_x = \tilde{\sigma}^n_x \, f\!\left(\zeta^n_x\right)$$

»where

$$\zeta^n_x = -\left|\,\frac{\theta^n_x - \max_{x' \neq x} \theta^n_{x'}}{\tilde{\sigma}^n_x}\,\right|$$

»and

$$f(\zeta) = \zeta\,\Phi(\zeta) + \phi(\zeta),$$

with $\Phi$ and $\phi$ the standard normal cdf and pdf, and $\tilde{\sigma}^n_x$ the reduction in the standard deviation of our belief about $x$ from one more measurement. »…very easy to compute. We have recently derived KG formulas for non-Gaussian situations.
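The formula is short enough to code directly. Below is a minimal Python sketch under the independent normal-beliefs assumptions above; the variable names and the noise parameter `sigma_eps` are illustrative, not from the talk:

```python
import numpy as np
from scipy.stats import norm

def knowledge_gradient(theta, sigma, sigma_eps):
    """KG value of measuring each alternative under independent normal beliefs.

    theta     : current belief means, shape (K,)
    sigma     : current belief standard deviations, shape (K,)
    sigma_eps : standard deviation of the measurement noise
    """
    # Reduction in belief std. dev. from one more measurement of x
    sigma_tilde = sigma**2 / np.sqrt(sigma**2 + sigma_eps**2)

    # Best competing mean for each alternative
    K = len(theta)
    best_other = np.array([np.max(np.delete(theta, x)) for x in range(K)])

    zeta = -np.abs(theta - best_other) / sigma_tilde
    f = zeta * norm.cdf(zeta) + norm.pdf(zeta)   # f(z) = z*Phi(z) + phi(z)
    return sigma_tilde * f

# Measure the alternative with the highest knowledge gradient
theta = np.array([1.0, 1.2, 0.8])
sigma = np.array([0.5, 0.4, 0.9])
print(np.argmax(knowledge_gradient(theta, sigma, sigma_eps=0.3)))
```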

The knowledge gradient Can be applied to a variety of major problem classes »Problems with correlated beliefs Testing one technology/policy/material/location tells us about other alternatives which have not been tested. »On-line and off-line problems On-line – learn as you go (e.g. comparing policies for testing cargo) Off-line – Evaluating technologies or materials in a laboratory »Learning on graphs Which links should we learn more about to have the greatest impact on finding the best shortest path? »Finding the best setting of a vector of continuous parameters Finding the best design for a device.

Outline Correlated beliefs

The correlated knowledge gradient (CKG) The power of the knowledge gradient concept is that it is very general. In particular, we can handle problems where we learn about other choices from a single measurement: [Figure: measure here, and these beliefs change too.]

Optimal measuring with correlations Measuring one point tells us about neighboring points »Measuring radiation in the air or water at one location provides information about other locations. »Evaluating the performance of one nuclear detector provides information about others using the same technology. The correlated knowledge gradient procedure: »Chooses measurements based in part on what we learn about other potential measurements. »A few measurements allow us to update our knowledge about everything. »Requires dramatically fewer measurements. [Figure: beliefs with correlations vs. without correlations.]
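This behavior rests on the standard multivariate-normal (Kalman-style) update of a correlated belief. A minimal sketch, with illustrative names and a known measurement-noise level:

```python
import numpy as np

def update_correlated_belief(mu, Sigma, x, y, sigma_eps):
    """Update a multivariate-normal belief after observing y at alternative x.

    mu        : belief means, shape (K,)
    Sigma     : belief covariance matrix, shape (K, K)
    x         : index of the measured alternative
    y         : observed value
    sigma_eps : standard deviation of the measurement noise
    """
    e_x = np.zeros(len(mu)); e_x[x] = 1.0
    gain = Sigma @ e_x / (Sigma[x, x] + sigma_eps**2)   # Kalman-style gain
    mu_new = mu + gain * (y - mu[x])                    # every correlated mean moves
    Sigma_new = Sigma - np.outer(gain, Sigma[x, :])     # uncertainty shrinks everywhere
    return mu_new, Sigma_new
```

Because one observation at x moves the means and shrinks the variances of all correlated alternatives, far fewer measurements are needed.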

Optimal learning in the physical sciences Materials research »How do we find the best material for converting sunlight to electricity? »What is the best battery design for storing energy? »We need a method to sort through potentially thousands of experiments.

Drug discovery Designing molecules »X and Y are sites where we can hang substituents to change the behavior of the molecule

Drug discovery We express our belief using a linear, additive QSAR model:

$$Y = \theta_0 + \sum_{\text{sites } i}\;\sum_{\text{substituents } j} \theta_{ij}\, X_{ij}$$

»where $X_{ij} = 1$ if substituent $j$ is attached at site $i$, and 0 otherwise.
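As a sketch of why this model is so compact: beliefs about every molecule follow from beliefs about a small vector of coefficients, which induces a covariance across all compounds automatically. The sizes and the prior below are hypothetical, not from the talk:

```python
import numpy as np
from itertools import product

# Hypothetical example: 3 sites, 4 candidate substituents per site
sites, substituents = 3, 4

def features(molecule):
    """Indicator features X_ij for a molecule given as a tuple of substituent
    choices, one per site, plus an intercept term."""
    X = np.zeros(1 + sites * substituents)
    X[0] = 1.0                                # intercept theta_0
    for i, j in enumerate(molecule):
        X[1 + i * substituents + j] = 1.0
    return X

# Beliefs about molecules follow from beliefs about the coefficients:
# mu_molecule = X @ theta, and Cov(m, m') = X_m @ Sigma_theta @ X_m',
# so measuring one compound updates our beliefs about all of them.
molecules = list(product(range(substituents), repeat=sites))  # 4^3 = 64 compounds
X = np.vstack([features(m) for m in molecules])
Sigma_theta = np.eye(1 + sites * substituents)   # prior covariance on coefficients
Sigma = X @ Sigma_theta @ X.T                    # induced covariance across compounds
```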

Drug discovery Compact representation of a compound with 10,000 combinations »Results from 15 sample paths [Figure: performance relative to the best possible vs. number of molecules tested.]

Drug discovery Single sample path on a molecule with 87,120 combinations [Figure: performance relative to the best possible vs. number of molecules tested.]

Outline The knowledge gradient for on-line applications

KG for on-line learning problems Knowledge gradient policy »For off-line problems:

$$X^{KG,n} = \arg\max_x \nu^{KG,n}_x$$

»For finite-horizon on-line problems:

$$X^{KG,n} = \arg\max_x \left( \theta^n_x + (N - n)\,\nu^{KG,n}_x \right)$$

»For infinite-horizon discounted problems:

$$X^{KG,n} = \arg\max_x \left( \theta^n_x + \frac{\gamma}{1-\gamma}\,\nu^{KG,n}_x \right)$$

Compare to Gittins indices for on-line (bandit) problems: Gittins indices are optimal for infinite-horizon problems, but they are hard to compute and cannot handle correlated beliefs.
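Reusing the `knowledge_gradient` sketch from earlier, the finite-horizon on-line rule is essentially one line (`N` is the measurement budget and `n` the number of measurements taken so far; both names are illustrative):

```python
import numpy as np

def online_kg_choice(theta, sigma, sigma_eps, N, n):
    # Trade off exploitation (theta) against exploration ((N - n) * KG value),
    # using the knowledge_gradient function sketched earlier.
    return np.argmax(theta + (N - n) * knowledge_gradient(theta, sigma, sigma_eps))
```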

KG for on-line learning problems On-line KG vs. Gittins [Figure: performance vs. number of measurements; in some regimes on-line KG slightly outperforms Gittins, in others it slightly underperforms.]

KG for on-line learning problems KG versus Gittins indices for multiarmed bandit problems »Gittins indices are provably optimal… »…but computing them is hard. »Chick and Gans (2009) have developed a simple and accurate approximation. [Figure: improvement of KG over Gittins, under informative and uninformative priors.]

Outline Learning on a graph

Applications Figuring out how to get around Manhattan: »Walking »Subway/walking »Taxi »Street bus »Driving

Information collection on a graph Optimal routing over a graph:

Information collection on a graph Optimal routing over a graph »The shortest path

Information collection on a graph Optimal routing over a graph »The shortest path »Evaluating a link

Information collection on a graph Optimal routing over a graph »The shortest path »Evaluating a link »Now we have a new shortest path »How do we decide which links to measure?

Information collection on a graph The knowledge gradient on a graph »When we had finite alternatives, we had to compute

$$\nu^{KG,n}_x = \mathbb{E}\left[\,\max_{x'} \theta^{n+1}_{x'}\,\right] - \max_{x'} \theta^n_{x'}$$

»For problems on graphs, we have to compute the same quantity over paths, comparing the value of the best path that includes link $(i,j)$ against the value of the best path that does not include link $(i,j)$.
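A sketch of the two path values being compared, using `networkx` on an illustrative graph with a `cost` attribute on each edge. The KG for measuring link (i,j) then applies the usual normal KG formula to the gap between these two values:

```python
import networkx as nx

def path_values_for_link(G, source, target, i, j):
    """Cost of the best source-target path that uses link (i, j),
    and of the best path that avoids it."""
    # Best path through (i, j): shortest to i, the link itself, shortest from j
    # (for a directed layered graph, as in the experiments, this is exact)
    with_link = (nx.shortest_path_length(G, source, i, weight="cost")
                 + G[i][j]["cost"]
                 + nx.shortest_path_length(G, j, target, weight="cost"))

    # Best path avoiding (i, j)
    H = G.copy()
    H.remove_edge(i, j)
    without_link = nx.shortest_path_length(H, source, target, weight="cost")
    return with_link, without_link
```

Measuring the link whose uncertainty is most likely to flip which of the two values is smaller is what changes the shortest path.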

Experimental results »Ten layered graphs (22 nodes, 50 edges) »Ten larger layered graphs (38 nodes, 102 edges)

Outline Learning continuous surfaces

Finding the hot spot Imagine that we detect nuclear radiation in Manhattan, but we need to find the epicenter. How do we collect information to find this as quickly as possible?

Measuring two-dimensional surfaces Initially we think the concentration is the same everywhere: »We want to measure the value where the knowledge gradient is the highest. This is the measurement that teaches us the most. [Panels: estimated concentration | knowledge gradient.]
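A compact sketch of this measurement loop on a grid, combining the correlated update from earlier with the independent-beliefs KG formula as the score. The Gaussian correlation, noise level, and hidden surface are all illustrative:

```python
import numpy as np
from scipy.stats import norm

# Candidate measurement points on a grid
n = 15
g = np.linspace(0.0, 1.0, n)
pts = np.array([(x, y) for x in g for y in g])
K = len(pts)

# Prior belief: flat mean, spatially correlated covariance
mu = np.zeros(K)
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
Sigma = np.exp(-(d / 0.2) ** 2)     # squared-exponential correlation
sigma_eps = 0.1                      # measurement noise std. dev.

def truth(p):  # hidden concentration surface, peaked at (0.7, 0.3)
    return np.exp(-20.0 * ((p[0] - 0.7) ** 2 + (p[1] - 0.3) ** 2))

rng = np.random.default_rng(0)
for step in range(10):
    # KG score for each grid point from its marginal belief
    sig = np.sqrt(np.maximum(np.diag(Sigma), 1e-12))
    sig_t = sig ** 2 / np.sqrt(sig ** 2 + sigma_eps ** 2)
    best_other = np.array([np.max(np.delete(mu, k)) for k in range(K)])
    z = -np.abs(mu - best_other) / sig_t
    kg = sig_t * (z * norm.cdf(z) + norm.pdf(z))

    x = int(np.argmax(kg))                        # measure where KG is highest
    y = truth(pts[x]) + sigma_eps * rng.normal()  # noisy observation

    # Correlated belief update (same form as the earlier sketch)
    e = Sigma[:, x] / (Sigma[x, x] + sigma_eps ** 2)
    mu = mu + e * (y - mu[x])
    Sigma = Sigma - np.outer(e, Sigma[x, :])

print("Estimated peak near:", pts[int(np.argmax(mu))])
```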

Measuring two-dimensional surfaces After four measurements: »Whenever we measure at a point, the value of another measurement at the same point goes down. The knowledge gradient guides us toward measuring areas of high uncertainty. [Panels: estimated concentration | knowledge gradient, marking the measurement, the drop in the value of another measurement at the same location, and the new optimum.]

Measuring two-dimensional surfaces After five measurements: [panels: estimated concentration | knowledge gradient]

Measuring two-dimensional surfaces After six measurements: [panels: estimated concentration | knowledge gradient]

Measuring two-dimensional surfaces After seven measurements: [panels: estimated concentration | knowledge gradient]

Measuring two-dimensional surfaces After eight measurements: [panels: estimated concentration | knowledge gradient]

Measuring two-dimensional surfaces After nine measurements: [panels: estimated concentration | knowledge gradient]

Measuring two-dimensional surfaces After ten measurements: [panels: estimated concentration | knowledge gradient]

Measuring two-dimensional surfaces After 10 measurements, our estimate of the surface: [panels: estimated concentration | true concentration]

Measuring multidimensional surfaces Extending to multidimensional surfaces »Challenge: the knowledge gradient surface is nonconcave. »Animation of continuous KG using a KG approximation.

Outline Learning with a physical state

Managing a physical sensor What if we have to move a physical entity (a person or vehicle) around the city to make observations? »This produces a problem with both a physical state (the location of the sensor) and a belief state (what we know about the surface). »This is a partially observable Markov decision process (POMDP). Algorithms for this problem class are limited to very small problems.

Managing a physical sensor We recently adapted the knowledge gradient to this problem class. »Classical dynamic programming without learning:

$$V^n(S^n) = \max_a \left( C(S^n, a) + \gamma\, \mathbb{E}\left[ V^{n+1}(S^{n+1}) \mid S^n, a \right] \right)$$

»Dynamic programming with learning, where the first two terms capture the value of physical movement and the last the value of information:

$$X^n = \arg\max_a \left( C(S^n, a) + \gamma\, \mathbb{E}\left[ V^{n+1}(S^{n+1}) \mid S^n, a \right] + \nu^{KG,n}_a \right)$$

Central insight: learning something about the value of being in one state tells us something about the value of a random future state, as a result of correlated beliefs.
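As a purely illustrative fragment, a myopic version of such a rule for a sensor on a grid keeps only the movement-cost and value-of-information terms, dropping the downstream value function:

```python
import numpy as np

def next_move(pos, candidates, kg, travel_cost):
    """Pick the next cell by balancing movement cost against information value.

    pos         : current cell index
    candidates  : indices of cells reachable from pos
    kg          : knowledge-gradient value of measuring each cell
    travel_cost : travel_cost[i, j] = cost of moving from cell i to cell j
    """
    scores = [kg[c] - travel_cost[pos, c] for c in candidates]
    return candidates[int(np.argmax(scores))]
```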

The Knowledge Gradient Calculator A spreadsheet interface to a Java-based library
