Near-Optimal Sensor Placements in Gaussian Processes

Presentation transcript:

Near-Optimal Sensor Placements in Gaussian Processes
Carlos Guestrin, Andreas Krause, Ajit Singh
Carnegie Mellon University

Sensor placement applications
Monitoring of spatial phenomena: temperature, precipitation, drilling oil wells, ...
Active learning, experiment design, ...
The results today are not limited to 2 dimensions.
[Figures: temperature data from a sensor network; precipitation data from the Pacific NW]

Deploying sensors
This deployment: evenly distributed sensors.
Considered in: computer science (c.f., [Hochbaum & Maass '85]) and spatial statistics (c.f., [Cressie '91]).
Chicken-and-egg problem: with no data or assumptions about the distribution, we don't know where to place sensors; but finding the optimal placements means solving a combinatorial (non-myopic) optimization.

Strong assumption – Sensing radius
Each node predicts the values of positions within some radius, so placement becomes a covering problem. The problem is NP-complete, but there are good algorithms with approximation guarantees (a PTAS) [Hochbaum & Maass '85].
Unfortunately, this approach is usually not useful: the assumption is wrong on real data! For example…

Spatial correlation
Precipitation data from the Pacific NW: non-local, non-circular correlations; complex positive and negative correlations.

Complex, noisy correlations
The sensing "region" is complex and uneven; in fact there are noisy correlations rather than a sensing region.

Combining multiple sources of information
What is the temperature here? Individually, sensors are bad predictors; their combined information is more reliable.
How do we combine information? This is the focus of spatial statistics.

Gaussian process (GP) - Intuition
A GP is non-parametric, represents uncertainty, and supports complex correlation functions (kernels).
[Figure: uncertainty over temperature (y) vs. position (x) after observations are made: less sure far from observations, more sure close to them]

Gaussian processes
Kernel (covariance) function K(x, x').
Prediction after observing a set of sensors A: posterior mean and posterior variance at every location.
[Figures: posterior mean temperature; posterior variance]
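The posterior formulas on this slide were figures in the original deck; the standard GP conditioning equations they presumably correspond to are, for a query location X, observed sensors A with readings x_A, and kernel K:

```latex
\mu_{X \mid A} = \mu_X + \Sigma_{XA}\,\Sigma_{AA}^{-1}\,(x_A - \mu_A),
\qquad
\sigma^{2}_{X \mid A} = K(X, X) - \Sigma_{XA}\,\Sigma_{AA}^{-1}\,\Sigma_{AX},
```

where Σ_XA = [K(X, u)] for u in A, Σ_AA is the covariance matrix of the observed locations (with the sensor-noise variance added to its diagonal in the noisy case), and μ denotes the prior mean.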

Gaussian processes for sensor placement
Goal: find the sensor placement with the least uncertainty after observations.
The problem is still NP-complete, so we need an approximation.
[Figures: posterior mean temperature; posterior variance]

Non-myopic placements
Consider myopically selecting: first the most uncertain location A1, then the most uncertain given A1, and so on, up to the most uncertain given A1, …, Ak-1.
This can be seen as an attempt to non-myopically maximize H(A1) + H(A2 | A1) + … + H(Ak | A1, …, Ak-1), which by the chain rule is exactly the joint entropy H(A) = H(A1, …, Ak).

Entropy criterion (c.f., [Cressie '91])
A ← ∅; for i = 1 to k, add to A the location Xi maximizing H(Xi | A), i.e., the location with the highest uncertainty given the current set A ("X is different").
Problem: entropy places sensors along the borders, wasting information, as observed by [O'Hagan '78]. The criterion is indirect (it does not consider prediction at the uninstrumented locations) and has no formal non-myopic guarantees.
[Figures: temperature-data placements under the entropy criterion; uncertainty (entropy) plot]
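A minimal sketch of this greedy entropy rule, assuming a precomputed covariance matrix K over the discretized candidate locations; names such as `greedy_entropy` are illustrative, not from the paper. For a Gaussian, H(X | A) is an increasing function of the conditional variance, so maximizing conditional entropy amounts to maximizing conditional variance:

```python
import numpy as np

def conditional_variance(K, x, A):
    # Var(X_x | X_A) for a GP with covariance matrix K (Schur complement).
    if not A:
        return K[x, x]
    idx = list(A)
    K_AA = K[np.ix_(idx, idx)]
    K_xA = K[x, idx]
    return K[x, x] - K_xA @ np.linalg.solve(K_AA, K_xA)

def greedy_entropy(K, k):
    # Entropy criterion: repeatedly add the location that is most uncertain
    # (largest conditional variance) given the sensors chosen so far.
    A = []
    for _ in range(k):
        candidates = [x for x in range(K.shape[0]) if x not in A]
        A.append(max(candidates, key=lambda x: conditional_variance(K, x, A)))
    return A
```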

Proposed objective function: Mutual information
Given locations of interest V, find locations A ⊆ V maximizing the mutual information MI(A) = I(A; V\A) = H(V\A) − H(V\A | A): the uncertainty of the uninstrumented locations before sensing minus their uncertainty after sensing.
Intuitive greedy rule: pick the X maximizing H(X | A) − H(X | V \ (A ∪ {X})): high uncertainty given A ("X is different") and low uncertainty given the rest ("X is informative").
This is an intuitive criterion, and we give formal non-myopic guarantees.
[Figures: temperature-data placements under the entropy criterion vs. mutual information]
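The corresponding greedy step for the mutual-information criterion, reusing the `conditional_variance` helper from the sketch above (again an illustrative sketch, not the authors' implementation). For Gaussians, H(X | A) − H(X | rest) is an increasing function of the ratio of the two conditional variances:

```python
def greedy_mutual_information(K, k):
    # Greedy MI rule: pick X maximizing H(X | A) - H(X | rest), where "rest"
    # is every location other than A and X.  For Gaussians this is equivalent
    # to maximizing var(X | A) / var(X | rest).
    N = K.shape[0]
    A = []
    for _ in range(k):
        def score(x):
            rest = [u for u in range(N) if u != x and u not in A]
            return conditional_variance(K, x, A) / conditional_variance(K, x, rest)
        candidates = [x for x in range(N) if x not in A]
        A.append(max(candidates, key=score))
    return A
```

In practice one would add the sensor-noise variance to the diagonal of K, which keeps the denominator bounded away from zero.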

An important observation
Selecting T1 tells us something about T2 and T5; selecting T3 tells us something about T2 and T4. Now adding T2 would not help much.
In many cases, new information is worth less if we already know more (diminishing returns)!

Submodular set functions
Submodular set functions are a natural formalism for this idea of diminishing returns:
F(A ∪ {X}) − F(A) ≥ F(B ∪ {X}) − F(B)   for all A ⊆ B.
Maximization of submodular functions is NP-hard. But…

How can we leverage submodularity?
Theorem [Nemhauser et al. '78]: for monotone submodular functions, the greedy algorithm that picks k elements guarantees a (1 − 1/e) approximation of OPT, i.e., about 63%.
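For reference, a sketch of the generic greedy algorithm the theorem refers to, for an arbitrary set function F given as a callable (not tied to any particular objective):

```python
def greedy_submodular(F, ground_set, k):
    # Repeatedly add the element with the largest marginal gain.
    # For monotone submodular F, the resulting set of size k achieves at
    # least (1 - 1/e) of the optimal value [Nemhauser et al. '78].
    A = set()
    for _ in range(k):
        remaining = [x for x in ground_set if x not in A]
        if not remaining:
            break
        A.add(max(remaining, key=lambda x: F(A | {x}) - F(A)))
    return A
```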

Mutual information and submodularity
Mutual information F(A) = I(A; V\A) is submodular, so we should be able to use Nemhauser et al.
But mutual information is not monotone! Initially, adding a sensor increases MI; later, adding sensors decreases it: F(∅) = I(∅; V) = 0, F(V) = I(V; ∅) = 0, and F(A) ≥ 0 in between.
So even though MI is submodular, we can't apply Nemhauser et al. Or can we…?
[Figure: mutual information vs. number of sensors, rising from A = ∅ and returning to 0 at A = V]

Approximate monotonicity of mutual information
If H(X | A) − H(X | V\A) ≥ 0, then MI is monotone. But typically H(X | A) << H(X | V\A), so MI is not monotone.
Solution: add a grid Z of unobservable locations. If H(X | A) − H(X | Z ∪ V\A) ≥ 0, then MI is monotone; for a sufficiently fine Z, H(X | A) > H(X | Z ∪ V\A) − ε, so MI is approximately monotone.
[Diagram: observed set A, remaining locations V\A, unobservable grid Z, candidate location X]

Theorem: Mutual information sensor placement
The greedy MI algorithm provides a constant-factor approximation: placing k sensors, for all ε > 0,
MI(Agreedy) ≥ (1 − 1/e) OPT − ε,
where Agreedy is the result of our algorithm, (1 − 1/e) is the constant factor, and OPT is the optimal non-myopic solution.
Approximate monotonicity holds for a sufficiently fine discretization: poly(1/ε, k, σ, L, M), where σ is the sensor noise, L the Lipschitz constant of the kernels, and M = maxX K(X, X).

Different costs for different placements
Theorem 1: constant-factor approximation of the optimal locations when selecting k sensors.
Theorem 2 (cost-sensitive placements): in practice, different locations may have different costs (corridor versus inside a wall). Given a budget B to spend on placing sensors, we get a constant-factor approximation with the same constant (1 − 1/e), using an algorithm slightly more complicated than greedy [Sviridenko / Krause, Guestrin].
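A rough sketch of the cost-benefit flavor of greedy used for budgeted selection. This shows only the benefit-per-cost rule; the full algorithm cited above additionally enumerates small seed sets to recover the (1 − 1/e) constant, which is omitted here. `cost` is assumed to be a dictionary mapping each location to its placement cost:

```python
def cost_benefit_greedy(F, ground_set, cost, budget):
    # Repeatedly add the affordable element with the best marginal gain
    # per unit cost, until nothing more fits in the budget.
    A, spent = set(), 0.0
    while True:
        affordable = [x for x in ground_set
                      if x not in A and spent + cost[x] <= budget]
        if not affordable:
            return A
        best = max(affordable, key=lambda x: (F(A | {x}) - F(A)) / cost[x])
        A.add(best)
        spent += cost[best]
```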

Deployment results
Model learned from 54 sensors; we used the initial deployment to select 22 new sensors, then learned a new GP on test data using just these sensors.
Mutual information has 3 times less variance than the entropy criterion.
[Figures: "true" temperature prediction and "true" temperature variance; posterior mean and variance under the entropy criterion and the mutual information criterion]

Comparing to other heuristics
Greedy (the algorithm we analyze); random placements; pairwise exchange (PE): start with some placement and swap locations while the solution improves.
Our bound also enables a posteriori analysis for any heuristic: suppose an algorithm TUAFSPGP gives results 10% better than the greedy algorithm; since greedy is within (1 − 1/e) ≈ 63% of optimum, we immediately know TUAFSPGP is within about 1.1 × 63% ≈ 70% of optimum!
[Figure: mutual information achieved by each heuristic (higher is better)]

Precipitation data
[Figures: entropy and mutual information achieved by the entropy criterion vs. the mutual information criterion on precipitation data (higher is better)]

Computing the greedy rule
At each iteration, for each candidate position i ∈ {1, …, N}, we must compute the greedy score H(Xi | A) − H(Xi | V \ (A ∪ {Xi})). This requires inverting an N×N covariance matrix, about O(N³), so the total running time for k sensors is O(kN⁴).
Polynomial, but very slow in practice. Solution: exploit sparsity in the kernel matrix.

Local kernels
The covariance matrix may have many zeros: each sensor location is correlated with only a small number of other locations.
Exploiting locality: if each location is correlated with at most d others, a sparse representation and a priority-queue trick reduce the complexity from O(kN⁴) to only about O(N log N).
But usually the matrix is only almost sparse…
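The slide does not spell out the priority-queue trick; one standard trick of this kind for submodular greedy, assumed here, is lazy evaluation of marginal gains: because gains can only shrink as the placement grows, cached gains are upper bounds, and most candidates never need to be re-evaluated. A sketch under that assumption:

```python
import heapq

def lazy_greedy(F, ground_set, k):
    # Keep (stale) marginal gains in a max-heap.  If the freshly re-evaluated
    # top element still beats every cached value, submodularity guarantees it
    # is the true greedy choice.
    A = set()
    base = F(set())
    heap = [(-(F({x}) - base), x) for x in ground_set]
    heapq.heapify(heap)
    for _ in range(k):
        while heap:
            _, x = heapq.heappop(heap)
            gain = F(A | {x}) - F(A)
            if not heap or gain >= -heap[0][0]:
                A.add(x)
                break
            heapq.heappush(heap, (-gain, x))
    return A
```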

Approximately local kernels
The covariance matrix may have many elements close to zero (e.g., with a Gaussian kernel), but the matrix is not sparse. What if we set those entries to zero? We get a sparse matrix and an approximate solution.
Theorem: truncating small entries has only a small effect on solution quality. If |K(x, y)| ≤ ε, set it to 0; then the quality of the placements is only O(ε) worse.
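A minimal sketch of the truncation step (the tolerance name `eps` and the use of a SciPy sparse matrix are illustrative choices, not from the paper):

```python
import numpy as np
from scipy import sparse

def truncate_kernel(K, eps):
    # Zero out entries with |K(x, y)| <= eps so the covariance matrix becomes
    # genuinely sparse; per the theorem above, placement quality degrades by
    # only O(eps).
    return sparse.csr_matrix(np.where(np.abs(K) <= eps, 0.0, K))
```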

Effect of truncated kernels on solution – Rain data
About 3 times faster, with minimal effect on solution quality.
[Figures: improvement in running time; effect on solution quality]

Summary
Mutual information criterion for sensor placement in general GPs.
Efficient algorithms with strong approximation guarantees: (1 − 1/e) OPT − ε.
Exploiting local structure improves efficiency.
Superior prediction accuracy for several real-world problems.
Related ideas in discrete settings were presented at UAI and IJCAI this year.
An effective algorithm for sensor placement and experimental design, and a basis for active learning.

A note on maximizing entropy
Entropy is submodular [Ko et al. '95], but… a function F is monotone iff adding X cannot hurt: F(A ∪ {X}) ≥ F(A).
Remark: entropy in GPs is not monotone (not even approximately). H(A ∪ {X}) − H(A) = H(X | A), and as the discretization becomes finer, H(X | A) → −∞.
So the Nemhauser et al. analysis for submodular functions is not directly applicable to entropy.
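Spelling out the last step: for a GP, conditional differential entropy is determined by the conditional variance, and a sufficiently fine discretization puts an already-selected location arbitrarily close to X, driving that variance toward zero:

```latex
H(A \cup \{X\}) - H(A) = H(X \mid A)
  = \tfrac{1}{2}\log\bigl(2\pi e\,\sigma^{2}_{X \mid A}\bigr)
  \longrightarrow -\infty
  \quad \text{as } \sigma^{2}_{X \mid A} \to 0 .
```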

How do we predict temperatures at unsensed locations?
Interpolation? It overfits. And what about far-away points?
[Figure: interpolated temperature vs. position]

How do we predict temperatures at unsensed locations?
Regression: y = a + bx + cx² + dx³. Few parameters, so less overfitting.
But how sure are we about the prediction (less sure here, more sure there)? The regression function has no notion of uncertainty!
[Figure: cubic regression fit of temperature (y) vs. position (x)]
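A tiny illustration of this point, with hypothetical data (nothing here comes from the paper): the cubic fit happily returns a point prediction far outside the data, but offers no measure of how sure we should be.

```python
import numpy as np

# Hypothetical 1-D temperature readings at a few sensed positions.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([17.2, 18.1, 19.0, 18.4, 17.8])

# Cubic regression y = a + b*x + c*x**2 + d*x**3.
a, b, c, d = np.polyfit(x, y, deg=3)[::-1]
prediction = a + b * 10.0 + c * 10.0**2 + d * 10.0**3
print(prediction)  # A confident-looking number far from the data,
                   # with no accompanying measure of uncertainty.
```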