© 2009 Warren B. Powell
Optimal Learning for Homeland Security
CCICADA Workshop, Morgan State, Baltimore, Md.
March 7, 2010
Warren Powell
With research by Peter Frazier, Ilya Ryzhov, Warren Scott
Princeton University
© 2010 Warren B. Powell, Princeton University
Applications
Challenges:
»What is the best policy for balancing cost and reliability when testing people and cargo for dangerous materials?
»Where should we sample water to detect possible tampering?
»How should we collect information about disease in the population to plan a response to possible bioterrorism?
»What is the most effective way of testing new materials to detect explosives, or to design new batteries for portable devices?
Applications
Optimizing the response to an emergency in Manhattan
»Need to collect information quickly about the state of the network.
»May collect information about current delays by accessing GPS devices.
Applications Where is the center of a radiation source?
Optimal learning
Challenge
»How do we collect information to improve our ability to make choices in the future?
»We need to balance the cost of the measurement against the value of the knowledge gained.
[Figure labels: No improvement; New solution]
Information collection on a graph
The knowledge gradient
»Maximize the marginal value of a measurement:
[Equation labels: what we know about the value of each choice; current optimization problem based on what we know; updated costs after measurement; decision problem after the update; expected value of updated problem]
»“x” can be:
A policy for evaluating people and cargo
A decision to sample part of the population for a disease
A test of our waterways for toxins
A molecular compound to create a new material for sensing explosives
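As a sketch of what the labels above annotate, the knowledge gradient is commonly written as the expected improvement in the value of the decision problem after one more measurement. The notation here is an assumption taken from the standard knowledge-gradient literature rather than from the slide: $K^n$ is the state of knowledge after $n$ measurements, $K^{n+1}(x)$ is the updated knowledge after measuring $x$, and $F(y, K)$ is the value of decision $y$ under knowledge $K$.

```latex
\nu^{KG,n}_x
  = \mathbb{E}\left[ \max_{y} F\bigl(y, K^{n+1}(x)\bigr) \,\middle|\, K^n \right]
  - \max_{y} F\bigl(y, K^n\bigr),
\qquad
x^n = \arg\max_{x} \nu^{KG,n}_x .
```

The second term is the current optimization problem based on what we know; the first is the expected value of the updated problem.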
Information collection on a graph
The knowledge gradient for discrete alternatives and independent, normally distributed beliefs:
»where
»and
»…very easy to compute. We have recently derived KG formulas for non-Gaussian situations.
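To illustrate why this case is easy to compute, here is a minimal Python sketch of the standard closed form for independent normal beliefs, $\nu^{KG}_x = \tilde{\sigma}_x \, f(\zeta_x)$ with $f(\zeta) = \zeta\Phi(\zeta) + \phi(\zeta)$ and $\zeta_x = -|\mu_x - \max_{x' \neq x} \mu_{x'}| / \tilde{\sigma}_x$. The function and argument names (`knowledge_gradient`, `mu`, `sigma2`, `noise_var`) are illustrative assumptions, and the sketch assumes a single known measurement-noise variance shared by all alternatives.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knowledge_gradient(mu, sigma2, noise_var):
    """KG value of one measurement of each alternative.

    mu        -- current belief means (at least two alternatives)
    sigma2    -- current belief variances (independent beliefs)
    noise_var -- measurement noise variance (assumed known and shared)
    """
    values = []
    for x in range(len(mu)):
        # Standard deviation of the change in the mean of x after one measurement.
        sigma_tilde = sigma2[x] / math.sqrt(sigma2[x] + noise_var)
        if sigma_tilde == 0.0:
            values.append(0.0)  # nothing left to learn about x
            continue
        best_other = max(m for i, m in enumerate(mu) if i != x)
        zeta = -abs(mu[x] - best_other) / sigma_tilde
        values.append(sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta)))
    return values


# Measure the alternative with the highest KG value.
kg = knowledge_gradient(mu=[1.0, 1.2, 0.8], sigma2=[4.0, 1.0, 1.0], noise_var=1.0)
next_measurement = max(range(len(kg)), key=kg.__getitem__)
```

The KG policy then measures the alternative with the largest value, which trades off how uncertain an alternative is against how close it is to the current best.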
The knowledge gradient
Can be applied to a variety of major problem classes:
»Problems with correlated beliefs
Testing one technology/policy/material/location tells us about other alternatives that have not been tested.
»On-line and off-line problems
On-line: learn as you go (e.g. comparing policies for testing cargo).
Off-line: evaluating technologies or materials in a laboratory.
»Learning on graphs
Which links should we learn more about to have the greatest impact on finding the shortest path?
»Finding the best setting of a vector of continuous parameters
Finding the best design for a device.
Outline
Correlated beliefs
The correlated knowledge gradient (CKG)
The power of the knowledge gradient concept is that it is very general. In particular, we can handle problems where we learn about other choices from a single measurement:
[Figure labels: "measure here"; "these beliefs change too"]
Optimal measuring with correlations
Measuring one point tells us about neighboring points
»Measuring radiation in the air or water at one location provides information about other locations.
»Evaluating the performance of one nuclear detector provides information about others using the same technology.
Correlated knowledge gradient procedure
Chooses measurements based in part on what we learn about other potential measurements.
A few measurements allow us to update knowledge about everything.
Requires dramatically fewer measurements.
[Figure panels: With correlations; Without correlations]
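The way one measurement updates beliefs about everything can be sketched with the standard Bayesian update for a multivariate normal belief. This is a minimal illustration, not the deck's own implementation: the function name `update_correlated_beliefs` and its arguments are hypothetical, the covariance matrix is assumed symmetric, and the measurement noise variance is assumed known.

```python
def update_correlated_beliefs(mu, cov, x, y, noise_var):
    """Update a multivariate normal belief after observing y at alternative x.

    mu        -- list of current means
    cov       -- current covariance matrix (list of lists, assumed symmetric)
    x         -- index of the alternative that was measured
    y         -- observed value
    noise_var -- measurement noise variance (assumed known)
    """
    n = len(mu)
    column = [cov[i][x] for i in range(n)]  # covariance of every point with x
    denom = noise_var + cov[x][x]
    gain = (y - mu[x]) / denom
    # Every mean moves in proportion to its covariance with the measured point.
    new_mu = [mu[i] + gain * column[i] for i in range(n)]
    # Every variance and covariance shrinks, not only the entry for x.
    new_cov = [
        [cov[i][j] - column[i] * column[j] / denom for j in range(n)]
        for i in range(n)
    ]
    return new_mu, new_cov


# Two neighboring locations with correlation 0.5; we measure location 0 only.
mu, cov = update_correlated_beliefs(
    mu=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 1.0]], x=0, y=2.0, noise_var=1.0
)
```

In this example the belief about location 1 changes even though only location 0 was measured, which is why far fewer measurements are needed when beliefs are correlated.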
Optimal learning in physical sciences Materials research »How do we find the best material for converting sunlight to electricity? »What is the best battery design for storing energy? »We need a method to sort through potentially thousands of experiments.
Drug discovery Designing molecules »X and Y are sites where we can hang substituents to change the behavior of the molecule
Drug discovery We express our belief using a linear, additive QSAR model »
Drug discovery
Compact representation on a compound with 10,000 combinations
»Results from 15 sample paths
[Figure axes: performance under best possible vs. number of molecules tested]
Drug discovery
Single sample path on a molecule with 87,120 combinations
[Figure axes: performance under best possible vs. number of molecules tested]
Outline
The knowledge gradient for on-line applications
KG for on-line learning problems
Knowledge gradient policy
»For off-line problems:
»For finite-horizon on-line problems:
»For infinite-horizon discounted problems:
Compare to Gittins indices for on-line (bandit) problems
Gittins indices are optimal for infinite-horizon problems, but they are hard to compute and cannot handle correlated beliefs.
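As a hedged sketch, assuming the form of the on-line rule published in the knowledge-gradient literature: each alternative is scored by its current estimate plus its off-line KG value multiplied by a weight, where the weight is the number of remaining measurements (finite horizon) or $\gamma / (1 - \gamma)$ (infinite horizon with discount factor $\gamma$). The function name `online_kg_scores` and its arguments are illustrative assumptions; the off-line KG values are taken as input.

```python
def online_kg_scores(mu, kg, remaining=None, discount=None):
    """Score alternatives for on-line learning.

    mu        -- current belief means (reward earned now)
    kg        -- off-line knowledge gradient values (value of learning)
    remaining -- number of measurements left (finite-horizon case)
    discount  -- discount factor in (0, 1) (infinite-horizon case)
    Exactly one of `remaining` and `discount` must be given.
    """
    if (remaining is None) == (discount is None):
        raise ValueError("give exactly one of remaining or discount")
    if remaining is not None:
        weight = remaining
    else:
        weight = discount / (1.0 - discount)
    # Immediate reward plus the weighted value of what we learn for later.
    return [m + weight * v for m, v in zip(mu, kg)]


# With many measurements left, learning dominates; with none left, pick the best mean.
early = online_kg_scores([1.0, 0.9], [0.0, 0.05], remaining=10)
late = online_kg_scores([1.0, 0.9], [0.0, 0.05], remaining=0)
```

The off-line policy corresponds to ignoring the immediate-reward term and maximizing the KG value alone.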
KG for on-line learning problems
On-line KG vs. Gittins
[Figure: performance difference plotted against number of measurements; regions labeled "On-line KG slightly outperforms Gittins" and "On-line KG slightly underperforms Gittins"]
KG for on-line learning problems
KG versus Gittins indices for multiarmed bandit problems
»Gittins indices are provably optimal…
»… but computing them is hard.
»Chick and Gans (2009) have developed a simple and accurate approximation.
[Figure: improvement of KG over Gittins; panels: Informative prior; Uninformative prior]
Outline
Learning on a graph
Applications Figure out Manhattan: »Walking »Subway/walking »Taxi »Street bus »Driving
Information collection on a graph Optimal routing over a graph:
Information collection on a graph Optimal routing over a graph »The shortest path
Information collection on a graph Optimal routing over a graph »The shortest path »Evaluating a link
Information collection on a graph Optimal routing over a graph »The shortest path »Evaluating a link »Now we have a new shortest path »How do we decide which links to measure?
Information collection on a graph
The knowledge gradient on a graph
»When we had finite alternatives, we had to compute
»For problems on graphs, we have to compute
[Equation labels: value of best path that includes link (i,j); value of best path that does not include link (i,j)]
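The two quantities named above can be sketched in Python as follows. This is an illustration under stated assumptions, not the deck's implementation: path value is treated as a cost to be minimized, link costs are the current non-negative estimates, the graph is directed and acyclic (as in a layered graph), node labels are mutually comparable (e.g. all strings), and the function names are hypothetical.

```python
import heapq
import math


def shortest_costs(edges, source):
    """Dijkstra: cheapest cost from source to each reachable node.

    edges -- dict mapping (i, j) to a non-negative cost estimate
    """
    adjacency = {}
    for (i, j), cost in edges.items():
        adjacency.setdefault(i, []).append((j, cost))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, cost in adjacency.get(u, []):
            nd = d + cost
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def path_costs_with_and_without(edges, source, sink, link):
    """Cost of the best path forced through `link`, and of the best path avoiding it."""
    i, j = link
    forward = shortest_costs(edges, source)
    # Reverse every link so one search from the sink gives node-to-sink costs.
    reversed_edges = {(b, a): c for (a, b), c in edges.items()}
    backward = shortest_costs(reversed_edges, sink)
    with_link = forward.get(i, math.inf) + edges[link] + backward.get(j, math.inf)
    remaining = {e: c for e, c in edges.items() if e != link}
    without_link = shortest_costs(remaining, source).get(sink, math.inf)
    return with_link, without_link


edges = {("s", "a"): 1, ("s", "b"): 2, ("a", "t"): 5, ("b", "t"): 1}
result = path_costs_with_and_without(edges, "s", "t", ("s", "a"))
```

The gap between the two values indicates how much the estimate on link (i,j) would have to change before the best path changes, which is the quantity that plays the role of the distance to the best alternative in the finite-alternative case.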
Experimental results Ten layered graphs (22 nodes, 50 edges) Ten larger layered graphs (38 nodes, 102 edges)
Outline
Learning continuous surfaces
Finding the hot spot Imagine that we detect nuclear radiation in Manhattan, but we need to find the epicenter. How do we collect information to find this as quickly as possible?
Measuring two-dimensional surfaces
Initially we think the concentration is the same everywhere:
»We want to measure the value where the knowledge gradient is the highest. This is the measurement that teaches us the most.
[Figure panels: Estimated concentration; Knowledge gradient]
Measuring two-dimensional surfaces
After four measurements:
»Whenever we measure at a point, the value of another measurement at the same point goes down. The knowledge gradient guides us to measuring areas of high uncertainty.
[Figure panels: Estimated concentration; Knowledge gradient. Labels: Measurement; Value of another measurement at same location; New optimum]
Measuring two-dimensional surfaces After five measurements: [Figure panels: Estimated concentration; Knowledge gradient]
Measuring two-dimensional surfaces After six samples [Figure panels: Estimated concentration; Knowledge gradient]
Measuring two-dimensional surfaces After seven samples [Figure panels: Estimated concentration; Knowledge gradient]
Measuring two-dimensional surfaces After eight samples [Figure panels: Estimated concentration; Knowledge gradient]
Measuring two-dimensional surfaces After nine samples [Figure panels: Estimated concentration; Knowledge gradient]
Measuring two-dimensional surfaces After ten samples [Figure panels: Estimated concentration; Knowledge gradient]
Measuring two-dimensional surfaces
After 10 measurements, our estimate of the surface:
[Figure panels: Estimated concentration; True concentration]
Measuring multidimensional surfaces
Extending to multidimensional surfaces
»Challenge: the knowledge gradient surface is nonconcave.
»Animation of continuous KG using KG approximation
Outline
Learning with a physical state
Managing a physical sensor
What if we have to move a physical entity (person, vehicle) around the city to make observations?
»This produces a problem with both a physical state (the location of the sensor) and a belief state (what we know about the surface).
»This produces a partially observable Markov decision process (POMDP). Algorithms for this problem class are limited to very small problems.
Managing a physical sensor
We recently adapted the knowledge gradient to this problem class.
»Classical dynamic programming without learning:
»Dynamic programming with learning:
[Equation labels: value of physical movement; value of information]
Central insight: learning something about the value of being in one state tells us something about the value of a random future state, as a result of correlated beliefs.
The Knowledge Gradient Calculator Spreadsheet interface to Java-based library