Near-Optimal Sensor Placements in Gaussian Processes
Carlos Guestrin, Andreas Krause, Ajit Singh
Carnegie Mellon University

Sensor placement applications
Monitoring of spatial phenomena: temperature, precipitation, drilling oil wells, ... Also active learning, experimental design, ... The results today are not limited to two dimensions. (Figures: precipitation data from the Pacific NW; temperature data from a sensor network.)

Deploying sensors
This deployment uses evenly distributed sensors. But what are the optimal placements, i.e., how do we solve the combinatorial (non-myopic) optimization? It is a chicken-and-egg problem: with no data or assumptions about the distribution, we don't know where to place the sensors, and without placed sensors we get no data. The problem has been considered in computer science (c.f. [Hochbaum & Maass '85]) and in spatial statistics (c.f. [Cressie '91]).

Strong assumption – sensing radius
Assume each node predicts the values of positions within some radius. Placement then becomes a covering problem, which is NP-complete, but there are good algorithms with approximation guarantees (a PTAS) [Hochbaum & Maass '85]. Unfortunately, this approach is usually not useful: the assumption is wrong on real data! For example...

Spatial correlation
(Figure: precipitation data from the Pacific NW.) Correlations are non-local and non-circular, with complex positive and negative correlations.

Complex, noisy correlations
The effective sensing "region" is complex and uneven; in fact, we observe noisy correlations rather than a sensing region.

Combining multiple sources of information
Individually, sensors are bad predictors; combined, their information is more reliable (e.g., what is the temperature here?). How do we combine information? This is the focus of spatial statistics.

Gaussian process (GP) - intuition
(Plot axes: x = position, y = temperature.) A GP is non-parametric, represents uncertainty, and supports complex correlation functions (kernels). The plot shows the uncertainty after observations are made: we are more sure near the observations and less sure away from them.

Gaussian processes
(Figures: posterior mean temperature and posterior variance.) Given the kernel function, we can form the prediction after observing a set of sensors A (equations below).
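The mean and variance formulas on this slide appear only as images in the transcript; the standard zero-mean GP conditioning equations they correspond to (notation assumed here: covariance matrix Sigma built from the kernel K, observed values x_A at locations A) are:

$$\mu_{y\mid\mathcal{A}} = \Sigma_{y\mathcal{A}}\,\Sigma_{\mathcal{A}\mathcal{A}}^{-1}\,x_{\mathcal{A}}, \qquad \sigma^2_{y\mid\mathcal{A}} = \mathcal{K}(y,y) - \Sigma_{y\mathcal{A}}\,\Sigma_{\mathcal{A}\mathcal{A}}^{-1}\,\Sigma_{\mathcal{A}y}$$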

Gaussian processes for sensor placement
(Figures: posterior mean temperature and posterior variance.) Goal: find the sensor placement with the least uncertainty after observations. The problem is still NP-complete, so we need an approximation.

Non-myopic placements
Consider myopically selecting the most uncertain location first, then the most uncertain given A_1, and so on up to the most uncertain given A_1 ... A_{k-1}. This can be seen as an attempt to non-myopically maximize H(A_1) + H(A_2 | {A_1}) + ... + H(A_k | {A_1 ... A_{k-1}}), which is exactly the joint entropy H(A) = H({A_1 ... A_k}) (see the chain rule below).
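This identity is just the chain rule of entropy applied to the selected locations:

$$H(\mathcal{A}) = \sum_{i=1}^{k} H(A_i \mid A_1, \dots, A_{i-1})$$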

Entropy criterion (c.f. [Cressie '91])
Start with A ← ∅; for i = 1 to k, add to A the location X_i with the highest uncertainty (entropy) given the current set A, i.e., the X that is most different (the selection rule is shown below). The drawback, observed by [O'Hagan '78], is "wasted" information. (Figure: temperature data placements and uncertainty (entropy) plot.) Entropy places sensors along the borders, so the entropy criterion wastes information [O'Hagan '78]; it is indirect, does not consider the sensing region, and comes with no formal non-myopic guarantees.
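The selection rule itself appears only as an image on the slide; written out (a reconstruction of the standard maximum-entropy rule, so treat the notation as assumed):

$$X^* = \operatorname*{argmax}_{X \in \mathcal{V} \setminus \mathcal{A}} H(X \mid \mathcal{A})$$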

Proposed objective function: mutual information
Given the locations of interest V, find locations A ⊆ V maximizing the mutual information, i.e., the reduction from the uncertainty of the uninstrumented locations before sensing to their uncertainty after sensing. The intuitive greedy rule picks an X with high uncertainty given A (X is different) and low uncertainty given the rest (X is informative); see the formulas below. This is an intuitive criterion, selecting locations that are both different and informative, and we give formal non-myopic guarantees. (Figure: temperature data placements under entropy vs. mutual information.)
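The objective and the greedy rule appear as images on the slide; following the description in the surrounding text, they can be written as (reconstructed, notation assumed):

$$F(\mathcal{A}) = I(\mathcal{A};\, \mathcal{V}\setminus\mathcal{A}) = H(\mathcal{V}\setminus\mathcal{A}) - H(\mathcal{V}\setminus\mathcal{A} \mid \mathcal{A})$$

$$X^* = \operatorname*{argmax}_{X \in \mathcal{V}\setminus\mathcal{A}} \; H(X \mid \mathcal{A}) - H(X \mid \mathcal{V}\setminus(\mathcal{A}\cup\{X\}))$$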

An important observation
Selecting T_1 tells us something about T_2 and T_5; selecting T_3 tells us something about T_2 and T_4. Now adding T_2 would not help much. In many cases, new information is worth less if we already know more (diminishing returns)!

Submodular set functions
Submodular set functions are a natural formalism for this idea: f(A ∪ {X}) - f(A) ≥ f(B ∪ {X}) - f(B) for all A ⊆ B. Maximization of submodular functions is NP-hard, but...

How can we leverage submodularity?
Theorem [Nemhauser et al. '78]: for monotone submodular functions, the greedy algorithm guarantees a (1-1/e) OPT approximation, i.e., about 63% of optimal.
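As a concrete illustration (a generic sketch, not code from the paper), greedy maximization of a monotone submodular set function F under a cardinality constraint k looks like this:

```python
def greedy(F, ground_set, k):
    """Greedy maximization of a set function F over subsets of ground_set, |A| <= k.
    For monotone submodular F, Nemhauser et al. '78 guarantee F(A) >= (1 - 1/e) * OPT."""
    A = []
    for _ in range(k):
        # Add the element with the largest marginal gain F(A + {x}) - F(A).
        best = max((x for x in ground_set if x not in A),
                   key=lambda x: F(A + [x]) - F(A))
        A.append(best)
    return A
```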

Mutual information and submodularity
Mutual information F(A) = I(A; V\A) is submodular, so we should be able to use Nemhauser et al. But mutual information is not monotone! Initially, adding a sensor increases MI; later, adding sensors decreases MI: F(∅) = I(∅; V) = 0, F(V) = I(V; ∅) = 0, and F(A) ≥ 0 in between. (Plot: mutual information vs. number of sensors, from A = ∅ to A = V.) So even though MI is submodular, we can't apply Nemhauser et al. Or can we...

Approximate monotonicity of mutual information
If H(X|A) - H(X | V\A) ≥ 0, then MI is monotone; when H(X|A) << H(X | V\A), MI is not monotone. Solution: add a grid Z of unobservable locations. If H(X|A) - H(X | Z ∪ V\A) ≥ 0, then MI is monotone, and for a sufficiently fine Z we have H(X|A) > H(X | Z ∪ V\A) - ε, so MI is approximately monotone.

Theorem: mutual information sensor placement
The greedy MI algorithm provides a constant-factor approximation when placing k sensors: for all ε > 0, the result of our algorithm is within a constant factor of the optimal non-myopic solution (see below). Approximate monotonicity holds for a sufficiently fine discretization, polynomial in (1/ε, k, σ, L, M), where σ is the sensor noise, L the Lipschitz constant of the kernels, and M = max_X K(X,X).
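The bound itself appears only as an image in the transcript; in the form stated in the accompanying paper (reconstructed here, so treat the exact constants as an assumption), it reads:

$$MI(\mathcal{A}_{\text{greedy}}) \;\ge\; \left(1 - \tfrac{1}{e}\right)\Big(\max_{|\mathcal{A}'| = k} MI(\mathcal{A}') - k\varepsilon\Big)$$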

Different costs for different placements
Theorem 1: constant-factor approximation of the optimal locations when selecting k sensors. Theorem 2 (cost-sensitive placements): in practice, different locations may have different costs (e.g., corridor versus inside a wall). Given a budget B to spend on placing sensors, we obtain a constant-factor approximation with the same constant (1-1/e), using an algorithm slightly more complicated than greedy [Sviridenko / Krause, Guestrin].

Deployment results
Starting from a model learned from the initial deployment of 54 sensors ("true" temperature prediction and variance), we selected 22 new sensors and learned a new GP on test data using just these sensors. (Figure panels: posterior mean and posterior variance under the entropy criterion and the mutual information criterion.) Mutual information has 3 times less variance than the entropy criterion.

Comparing to other heuristics
(Plot: mutual information, higher is better.) We compare the greedy algorithm we analyze against random placements and pairwise exchange (PE), which starts with some placement and swaps locations while the solution improves. Our bound also enables a posteriori analysis for any heuristic: if an algorithm TUAFSPGP gives results 10% better than those obtained from the greedy algorithm, then we immediately know that TUAFSPGP is within 70% of optimum!

Precipitation data
(Plots comparing the entropy criterion and mutual information; higher is better.)

Computing the greedy rule
At each iteration, for each candidate position i ∈ {1, ..., N}, we must compute the greedy score, which requires inverting an N x N matrix, about O(N^3). The total running time for k sensors is O(kN^4): polynomial, but very slow in practice. Idea: exploit sparsity in the kernel matrix.
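For concreteness, here is a minimal sketch of the naive greedy rule, assuming noise-free observations and using the fact that for Gaussians the MI gain of a candidate reduces to the ratio of its conditional variances given the selected set versus given all remaining locations (names and structure are illustrative, not the paper's code):

```python
import numpy as np

def conditional_variance(K, y, cond):
    """Var(X_y | X_cond) for a Gaussian with covariance matrix K (index lists)."""
    if len(cond) == 0:
        return K[y, y]
    K_cc = K[np.ix_(cond, cond)]
    K_yc = K[y, cond]
    # One O(|cond|^3) solve per candidate per iteration -> O(kN^4) overall.
    return K[y, y] - K_yc @ np.linalg.solve(K_cc, K_yc)

def greedy_mi_placement(K, k):
    """Naive greedy MI placement over the N locations described by covariance K."""
    N = K.shape[0]
    A = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for y in range(N):
            if y in A:
                continue
            rest = [i for i in range(N) if i != y and i not in A]
            # Monotone transform of H(y|A) - H(y|rest): sigma^2(y|A) / sigma^2(y|rest)
            score = conditional_variance(K, y, A) / conditional_variance(K, y, rest)
            if score > best_score:
                best, best_score = y, score
        A.append(best)
    return A
```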

Local kernels
The covariance matrix may have many zeros: each sensor location is correlated with only a small number of other locations. Exploiting locality: if each location is correlated with at most d others, a sparse representation and a priority-queue trick reduce the complexity from O(kN^4) to only about O(N log N). Usually, though, the matrix is only almost sparse...
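The slide does not spell out the priority-queue trick; the sketch below shows lazy greedy evaluation, a standard way to use a priority queue with submodular objectives, where stale cached gains are valid upper bounds and only the top of the queue is re-evaluated (the function names are illustrative):

```python
import heapq

def lazy_greedy(gain, ground_set, k):
    """Lazy greedy selection for a submodular objective.
    gain(x, A) returns the marginal benefit of adding x to the current set A."""
    A = []
    # Max-heap via negated gains; each entry remembers |A| at caching time.
    heap = [(-gain(x, A), x, 0) for x in ground_set]
    heapq.heapify(heap)
    while len(A) < k and heap:
        neg_g, x, stamp = heapq.heappop(heap)
        if stamp == len(A):
            A.append(x)          # cached gain is up to date: x is the best element
        else:
            heapq.heappush(heap, (-gain(x, A), x, len(A)))  # recompute stale gain
    return A
```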

Approximately local kernels
The covariance matrix may have many elements close to zero (e.g., for a Gaussian kernel) without being sparse. What if we set them to zero? We get a sparse matrix and an approximate solution. Theorem: truncating small entries has only a small effect on solution quality. If |K(x,y)| ≤ ε, set it to 0; then the quality of the placements is only O(ε) worse.
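A one-line illustration of the truncation step (purely illustrative; eps is the threshold from the slide):

```python
import numpy as np
from scipy.sparse import csr_matrix

def truncate_kernel(K, eps):
    """Zero out near-zero covariances and store the result sparsely."""
    return csr_matrix(np.where(np.abs(K) <= eps, 0.0, K))
```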

Effect of truncated kernels on solution – rain data
(Plots: improvement in running time; effect on solution quality.) About 3 times faster, with minimal effect on solution quality.

Summary
- Mutual information criterion for sensor placement in general GPs
- Efficient algorithms with strong approximation guarantees: (1-1/e) OPT - ε
- Exploiting local structure improves efficiency
- Superior prediction accuracy for several real-world problems
- Related ideas in discrete settings presented at UAI and IJCAI this year
An effective algorithm for sensor placement and experimental design, and a basis for active learning.

A note on maximizing entropy
Entropy is submodular [Ko et al. '95], but... A function F is monotone iff adding X cannot hurt: F(A ∪ X) ≥ F(A). Remark: entropy in GPs is not monotone (not even approximately), since H(A ∪ X) - H(A) = H(X|A), and as the discretization becomes finer, H(X|A) → -∞. So the Nemhauser et al. analysis for submodular functions is not directly applicable to entropy.

How do we predict temperatures at unsensed locations?
(Plot: temperature vs. position.) Interpolation? It overfits. And what about far-away points?

How do we predict temperatures at unsensed locations?
(Plot axes: x = position, y = temperature.) Regression: y = a + bx + cx^2 + dx^3. Few parameters, so less overfitting. But the regression function has no notion of uncertainty! How sure are we about the prediction? We would like to be less sure far from the data and more sure near it.
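A tiny illustration of the cubic fit from the slide (the data points below are made up purely for this example):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])        # sensor positions
y = np.array([20.1, 21.3, 22.0, 21.7, 20.9, 19.8])  # measured temperatures

coeffs = np.polyfit(x, y, deg=3)   # fits y = a + b*x + c*x^2 + d*x^3 (coefficients high-to-low)
predict = np.poly1d(coeffs)

print(predict(2.5))                # a single point prediction, with no uncertainty attached
```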