Toward Community Sensing. Andreas Krause, Carnegie Mellon University. Joint work with Eric Horvitz, Aman Kansal, Feng Zhao (Microsoft Research).

Toward Community Sensing. Andreas Krause, Carnegie Mellon University. Joint work with Eric Horvitz, Aman Kansal, Feng Zhao, Microsoft Research. Information Processing in Sensor Networks | April 24, 2008

2 Motivation: Traffic monitoring. Deployed sensors (detector loops, traffic cameras) provide high-accuracy speed data. What about 148th Ave? How can we get accurate road speed estimates everywhere?

3 Cars as traffic sensors. Many cars have Personal Navigation Devices (PNDs), which know their exact location and speed by fusing GPS, map information, engine speed, etc. Modern PNDs have a network connection, so we can use cars as speed sensors! Example: Dash Express (GPS + GPRS/WiFi).

4 Community Sensing Vision. Realize the full potential of population-owned sensors, while respecting privacy and preferences about sharing! Privately held sensors contribute sensor data toward a common goal, such as estimating a spatial phenomenon (traffic, weather, …), constructing 3D cities, or news coverage. [Diagram: privately held sensors contribute sensor data to SenseWeb, which requests data.]

5 Privacy concern of GPS traces. Dense GPS traces allow one to identify people's locations, activities, intents, etc. Even anonymization or strong obfuscation doesn't help. Key idea: avoid dense sampling! We then need to predict from sparse samples. (Images courtesy of John Krumm.)

6 Phenomenon modeling. Model (normalized) speeds as random variables; a joint distribution allows modeling correlations. We can then predict unmonitored speeds from monitored speeds, e.g. using $P(S_5 \mid S_1, S_9)$. Which segments should we monitor? [Figure: road network with segments s1 through s12.]

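7 Minimizing uncertainty. We can estimate the prediction error at segment $S_i$ given observations $S_A = s_A$ as $\mathrm{Var}(S_i \mid S_A = s_A)$. [Figure: with $A = \{S_1, S_2, S_3, S_4\}$, observing $s_1 = .9, s_2 = 1, s_3 = 1, s_4 = 1$ gives $\mathrm{Var}(S_5 \mid s_A) = .01$, while observing $s_1 = .5, s_2 = .6, s_3 = .8, s_4 = .6$ gives $\mathrm{Var}(S_5 \mid s_A) = .1$; in expectation, $\mathrm{Var}(S_5 \mid S_A) = .08$, $\mathrm{Var}(S_6 \mid S_A) = .1$, $\mathrm{Var}(S_7 \mid S_A) = .3$.] The expected error at segment $S_i$ is $\mathrm{Var}(S_i \mid S_A)$, i.e. $\mathrm{Var}(S_i \mid S_A = s_A)$ averaged over the possible observations $s_A$. Expected mean squared error: $\mathrm{EMSE}(A) = \sum_i \mathrm{Var}(S_i \mid S_A)$, and we want $A^* = \arg\min_{|A| \le k} \mathrm{EMSE}(A)$. Problem: this does not take the "importance" of $S_i$ into account (frequently travelled vs. less travelled segments).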
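
In a Gaussian model such as the one used later in the talk (slide 17), the conditional variance has the closed form $\Sigma_{ii} - \Sigma_{iA}\Sigma_{AA}^{-1}\Sigma_{Ai}$ and, unlike the general illustration above, does not depend on the observed values $s_A$, so it already equals the expected error. The minimal pure-Python sketch below computes EMSE(A) that way. It assumes a known covariance matrix over segment speeds; the helper names (`solve`, `cond_var`, `emse`) and the 3-segment covariance are hypothetical illustrations, not the authors' code.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's matrix is untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def cond_var(cov: Matrix, i: int, observed: Sequence[int]) -> float:
    """Var(S_i | S_A) for a jointly Gaussian vector with covariance `cov`."""
    if not observed:
        return cov[i][i]
    sigma_aa = [[cov[a][b] for b in observed] for a in observed]
    sigma_ai = [cov[a][i] for a in observed]
    w = solve(sigma_aa, sigma_ai)  # Sigma_AA^{-1} Sigma_Ai
    return cov[i][i] - sum(wk * sk for wk, sk in zip(w, sigma_ai))


def emse(cov: Matrix, observed: Sequence[int]) -> float:
    """EMSE(A) = sum_i Var(S_i | S_A); observed segments contribute ~0."""
    return sum(cond_var(cov, i, observed) for i in range(len(cov)))


# Hypothetical covariance of three normalized segment speeds.
cov = [[1.0, 0.8, 0.3],
       [0.8, 1.0, 0.5],
       [0.3, 0.5, 1.0]]
print(emse(cov, []))   # no observations: sum of prior variances
print(emse(cov, [1]))  # observe the middle segment
print(emse(cov, [0]))  # observe the first segment
```

Observing the middle segment gives the lower EMSE (about 1.11 versus 1.27) because it is strongly correlated with both of its neighbours.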

8 Taking demand into account. Model demand $D_i$ as random variables (e.g., Poisson), e.g. $D_i$ = #cars on segment $S_i$. Demand-weighted MSE: $\mathrm{DMSE}(A) = \sum_i E[D_i]\,\mathrm{Var}(S_i \mid S_A)$. Error reduction: $R(A) = \mathrm{DMSE}(\emptyset) - \mathrm{DMSE}(A)$. Want: $A^* = \arg\max_{|A| \le k} R(A)$, an NP-hard optimization problem. [Figure: $\mathrm{Var}(S_5 \mid S_A) = .08$, $\mathrm{Var}(S_6 \mid S_A) = .1$, $\mathrm{Var}(S_7 \mid S_A) = .3$, weighted by demands $D_5 = 50$, $D_6 = 10$, $D_7 = 200$.]
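
With the slide's numbers, the weighted sum is $50 \cdot .08 + 10 \cdot .1 + 200 \cdot .3 = 4 + 1 + 60 = 65$, so the busy segment $S_7$ dominates even though its variance is only moderate. A small sketch of DMSE and R(A); the function names are hypothetical, and the prior variance of 1.0 per segment (normalized speeds) is an assumption made only so that R(A) has a value to show.

```python
from typing import Sequence


def dmse(expected_demand: Sequence[float], cond_vars: Sequence[float]) -> float:
    """Demand-weighted MSE: DMSE(A) = sum_i E[D_i] * Var(S_i | S_A)."""
    return sum(d * v for d, v in zip(expected_demand, cond_vars))


def error_reduction(expected_demand: Sequence[float],
                    prior_vars: Sequence[float],
                    cond_vars: Sequence[float]) -> float:
    """R(A) = DMSE(empty set) - DMSE(A)."""
    return dmse(expected_demand, prior_vars) - dmse(expected_demand, cond_vars)


# Values from the slide for segments S5, S6, S7.
demand = [50, 10, 200]        # E[D_5], E[D_6], E[D_7]
variances = [0.08, 0.1, 0.3]  # Var(S_i | S_A)
# Assumption: normalized speeds with prior variance 1.0 on every segment.
prior = [1.0, 1.0, 1.0]

print(dmse(demand, variances))
print(error_reduction(demand, prior, variances))
```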

9 Selecting informative locations. Greedy algorithm: $A \leftarrow \emptyset$; for $i = 1{:}k$ do $s^* = \arg\max_s R(A \cup \{s\})$; $A \leftarrow A \cup \{s^*\}$. How well does this heuristic do? [Figure: road network with segments s1 through s12; selected segments s2, s11, s7, s10.]
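
The greedy loop above translates almost line for line into Python. In this sketch the utility is any set function passed in; the coverage function over hypothetical locations `a` to `d` is only a stand-in for R(A), which in the talk comes from the traffic model.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List


def greedy(ground: Iterable[Hashable],
           utility: Callable[[FrozenSet[Hashable]], float],
           k: int) -> List[Hashable]:
    """Pick up to k elements, each time adding the one with the largest utility."""
    candidates = list(ground)
    selected: List[Hashable] = []
    for _ in range(k):
        remaining = [s for s in candidates if s not in selected]
        if not remaining:
            break
        # s* = argmax_s R(A ∪ {s}); ties go to the earliest candidate.
        best = max(remaining, key=lambda s: utility(frozenset(selected + [s])))
        selected.append(best)
    return selected


# Hypothetical stand-in for R(A): each monitored location "covers" some segments.
covers = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1, 2},
}


def coverage(chosen: FrozenSet[Hashable]) -> float:
    return float(len(set().union(*(covers[s] for s in chosen))))


print(greedy(covers, coverage, k=2))  # → ['a', 'c']
```

Each step evaluates the utility once per remaining candidate, so the whole run costs on the order of k·n utility evaluations for n candidate locations.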

10 Diminishing returns. Adding a new location $S'$ to a small selection A helps a lot (large improvement); adding it to a larger selection B that contains A doesn't help much (small improvement). Submodularity: for $A \subseteq B$, $F(A \cup \{S'\}) - F(A) \ge F(B \cup \{S'\}) - F(B)$. The utility R(A) is submodular*! (*See store for details.) [Figure: selection A and selection B on the road network, each with a new location s'.]
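
The diminishing-returns inequality can be checked exhaustively for tiny ground sets, which is a handy sanity test for a candidate utility function. This sketch (hypothetical helper names) enumerates every pair $A \subseteq B$ and every $s \notin B$, so it is exponential in the number of elements and only meant for toy examples such as the coverage function and the counterexample $|A|^2$ below.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, Iterable, Iterator, List


def all_subsets(items: List[Hashable]) -> Iterator[FrozenSet[Hashable]]:
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular(ground: Iterable[Hashable],
                  f: Callable[[FrozenSet[Hashable]], float],
                  tol: float = 1e-12) -> bool:
    """Check F(A ∪ {s}) - F(A) >= F(B ∪ {s}) - F(B) for all A ⊆ B and s not in B."""
    items = list(ground)
    subsets = list(all_subsets(items))
    for b in subsets:
        for a in subsets:
            if not a <= b:
                continue
            for s in items:
                if s in b:
                    continue
                gain_a = f(a | {s}) - f(a)
                gain_b = f(b | {s}) - f(b)
                if gain_a < gain_b - tol:
                    return False
    return True


# Coverage functions are submodular; |A|^2 has increasing returns and is not.
covers = {"a": {1, 2}, "b": {2, 3}, "c": {3}}


def coverage(chosen: FrozenSet[Hashable]) -> float:
    return float(len(set().union(*(covers[s] for s in chosen))))


print(is_submodular(covers, coverage))
print(is_submodular(["x", "y"], lambda a: float(len(a) ** 2)))
```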

11 Why is submodularity useful? Theorem [Nemhauser et al. '78]: the greedy algorithm gives a constant-factor approximation, $F(A_{\mathrm{greedy}}) \ge (1 - 1/e)\, F(A_{\mathrm{opt}})$ (~63%). So the greedy algorithm gives a near-optimal set of locations to observe. But we have no control over where the sensors (cars, cell phones) are going to be!
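
The theorem assumes F is monotone (adding observations never hurts), submodular, and normalized with $F(\emptyset) = 0$; R(A) as defined above satisfies $R(\emptyset) = 0$ by construction. Stated with the number of picks k, the bound is slightly stronger for small k and tends to the slide's ~63% as k grows:

$$
F(A_{\mathrm{greedy}}) \;\ge\; \Bigl(1 - \bigl(1 - \tfrac{1}{k}\bigr)^{k}\Bigr)\, F(A_{\mathrm{opt}}) \;\ge\; \Bigl(1 - \tfrac{1}{e}\Bigr)\, F(A_{\mathrm{opt}}) \;\approx\; 0.632\, F(A_{\mathrm{opt}})
$$

For example, k = 1 gives a factor of 1 (a single greedy pick is optimal), k = 2 gives 0.75, and k = 10 gives $1 - 0.9^{10} \approx 0.651$.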

12 Querying a roving sensor. How can we cope with uncertain sensor availability? Query! Response: "I'm at $S_2$, going 55 mph" ($s_2 = .9$). Query! No response (no data). [Figure: road segments s1 through s7.]

13 Modeling sensor availability. Road segments $V = \{S_1, \dots, S_n\}$. Set $W = \{C_1, \dots, C_m\}$ of observations (cars) we can select from. If we select car $C_j$, we observe $S_i$ with probability $P(i \mid C_j)$. Picking $B \subseteq W$ yields a random set $A \subseteq V$ drawn from $P(A \mid B)$, with utility R(A). Goal: maximize expected utility, $B^* = \arg\max_{|B| \le k} \sum_A P(A \mid B)\, R(A)$. [Figure: cars $C_1, C_2, C_3$ on road segments s1 through s7.]
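
For a handful of cars the expectation can be computed exactly by enumerating every joint outcome. This sketch assumes, as a simplification, that each selected car independently reports exactly one segment with probability $P(i \mid C_j)$ or does not respond with the remaining probability; the `availability` dictionary and the distinct-segment count used as R(A) are hypothetical stand-ins.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Tuple

Availability = Dict[str, Dict[str, float]]


def expected_utility(selected: Iterable[str],
                     availability: Availability,
                     utility: Callable[[FrozenSet[str]], float]) -> float:
    """F(B) = sum_A P(A | B) R(A), by enumerating every joint outcome."""
    per_car: List[List[Tuple[Optional[str], float]]] = []
    for car in selected:
        probs = availability[car]
        # Remaining probability mass: the car does not respond.
        no_response = 1.0 - sum(probs.values())
        per_car.append(list(probs.items()) + [(None, no_response)])
    total = 0.0
    for outcome in product(*per_car):
        p = 1.0
        observed = set()
        for segment, prob in outcome:
            p *= prob
            if segment is not None:
                observed.add(segment)
        total += p * utility(frozenset(observed))
    return total


# Hypothetical P(i | C_j): where each car is likely to be when queried.
availability = {
    "C1": {"s1": 0.6, "s3": 0.3},
    "C2": {"s7": 0.5},
}


def distinct_segments(a: FrozenSet[str]) -> float:
    return float(len(a))


print(expected_utility(["C1", "C2"], availability, distinct_segments))
```

The number of outcomes grows exponentially with |B|, which is exactly the difficulty the next slide addresses.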

14 Optimizing community sensing. Lemma: whenever R(A) is submodular, the function $F(B) = \sum_{|A| \le k} P(A \mid B)\, R(A)$ is submodular, so we can use the greedy algorithm to optimize the selection. However, F(B) is a sum over exponentially many terms. Theorem: for any $\varepsilon, \delta > 0$ we can find a set $B'$ such that $F(B') \ge (1 - 1/e) \max_{|B| \le k} F(B) - \varepsilon$ with probability $1 - \delta$, using independent samples of R(A).
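
The sampling idea behind the theorem can be sketched as a plain Monte Carlo average: draw random sets A from $P(A \mid B)$ and average R(A). This sketch uses the same simplifying assumption of independent cars that each report one segment or nothing; the names and the toy instance are hypothetical (for this instance the exact value is 1.4), and the slide does not state how many samples the guarantee requires, so `num_samples` is simply a parameter here.

```python
import random
from typing import Callable, Dict, FrozenSet, Iterable

Availability = Dict[str, Dict[str, float]]


def sample_observed(selected: Iterable[str], availability: Availability,
                    rng: random.Random) -> FrozenSet[str]:
    """Draw one random set A ~ P(A | B) under the independent-car assumption."""
    observed = set()
    for car in selected:
        r, cumulative = rng.random(), 0.0
        for segment, prob in availability[car].items():
            cumulative += prob
            if r < cumulative:
                observed.add(segment)
                break
        # Falling through the loop means the car did not respond.
    return frozenset(observed)


def estimate_expected_utility(selected: Iterable[str],
                              availability: Availability,
                              utility: Callable[[FrozenSet[str]], float],
                              num_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of F(B): average R(A) over sampled sets A."""
    rng = random.Random(seed)
    chosen = list(selected)
    total = sum(utility(sample_observed(chosen, availability, rng))
                for _ in range(num_samples))
    return total / num_samples


availability = {"C1": {"s1": 0.6, "s3": 0.3}, "C2": {"s7": 0.5}}
print(estimate_expected_utility(["C1", "C2"], availability,
                                lambda a: float(len(a)), num_samples=20000))
```

Plugging such an estimate into the greedy loop in place of the exact sum is what keeps each greedy step polynomial.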

15 Handling user preferences. We need to respect user preferences, e.g. "Sample my speed at most once per day", "Don't measure my speed for the next hour", "Never sample close to my home", "Wait at least 10 minutes between samples". We can accommodate preferences using constrained optimization: $B^* = \arg\max_B F(B)$ subject to $C(B) \le L$, where C is a complex cost function and L is the sensing budget. We can still get near-optimal solutions (details in paper).
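
The paper's method for the complex cost function is not shown on the slide. As a simpler illustration only, the sketch below assumes an additive per-car cost (which cannot express rules like minimum inter-probe intervals) and uses a common benefit/cost greedy heuristic, with a best-single-item check as a safeguard because ratio greedy alone can miss one expensive but valuable item. All names and the toy instance are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List


def budgeted_greedy(ground: Iterable[Hashable],
                    utility: Callable[[FrozenSet[Hashable]], float],
                    cost: Dict[Hashable, float],
                    budget: float) -> List[Hashable]:
    """Benefit/cost greedy under an additive budget, with a best-singleton check.

    Assumes every cost is positive.
    """
    items = list(ground)
    selected: List[Hashable] = []
    spent = 0.0
    while True:
        base = utility(frozenset(selected))
        best, best_ratio = None, 0.0
        for s in items:
            if s in selected or spent + cost[s] > budget:
                continue
            ratio = (utility(frozenset(selected + [s])) - base) / cost[s]
            if ratio > best_ratio:
                best, best_ratio = s, ratio
        if best is None:  # nothing affordable improves the utility
            break
        selected.append(best)
        spent += cost[best]
    # Ratio greedy alone can miss a single expensive but valuable item.
    affordable = [s for s in items if cost[s] <= budget]
    if affordable:
        single = max(affordable, key=lambda s: utility(frozenset([s])))
        if utility(frozenset([single])) > utility(frozenset(selected)):
            return [single]
    return selected


# Hypothetical example: a cheap car that sees one segment vs. a costly one that sees five.
covers = {"cheap": {1}, "pricey": {1, 2, 3, 4, 5}}
cost = {"cheap": 1.0, "pricey": 10.0}


def coverage(chosen: FrozenSet[Hashable]) -> float:
    return float(len(set().union(*(covers[s] for s in chosen))))


print(budgeted_greedy(covers, coverage, cost, budget=10.0))  # → ['pricey']
```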

16 Community Sensing Summary. Optimize the value of probing roving sensors, accounting for the phenomenon (utility: expected error reduction), demand (usage: "utilitarian" impact), and availability & preferences. Sensor availability: predict location based on history. Preferences: abide by them, e.g. frequency or number of probes, minimum inter-probe interval, and other constraints such as "Not near my home!"

17 Phenomenon modeling. 3 months of data from 534 segments across 7 highways and interstates near Seattle, WA, sampled at 15-minute intervals. We use a Gaussian process to model road speeds (with a covariance function based on road network topology), so we can compute the utility R(A) in closed form!

18 Demand modeling. Demand = #cars on a road segment. We estimate demand based on 3166 ClearFlow route requests. [Figure: expected demand at rush hour.]
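
The slide does not describe how route requests are turned into demand. One simple possibility, sketched under that assumption with hypothetical names and made-up routes, is to count how many requested routes use each segment and divide by the number of time periods; if $D_i$ is modeled as Poisson, this per-period average is the maximum-likelihood estimate of its rate, i.e. of $E[D_i]$.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def estimate_demand(routes: Iterable[Sequence[str]],
                    num_periods: int = 1) -> Dict[str, float]:
    """Average number of requested routes per period that use each segment."""
    counts: Counter = Counter()
    for route in routes:
        counts.update(set(route))  # count a segment at most once per route
    return {segment: c / num_periods for segment, c in counts.items()}


# Hypothetical route requests, each a list of road-segment IDs.
routes = [
    ["s1", "s2", "s5"],
    ["s2", "s5", "s7"],
    ["s5", "s7"],
]
print(estimate_demand(routes))
```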

19 Evaluating model accuracy. Accurate estimation of prediction error! [Plot: demand-weighted RMS error vs. number of locations; lower is better.]

20 Demand-driven querying. 65% error reduction using only 10 (of 534) observations! Optimized sensing requires 10x fewer samples! [Plot; lower is better.]

21 Availability modeling. Microsoft Multiperson Location Survey (MSMLS) [Krumm '06]: GPS traces from 85 drivers, 6+ days each. We associate GPS readings with road segments ("map matching"). Two models of sensor availability: spatial obfuscation and sparse querying. [Image: GPS used in MSMLS.]

22 Spatial obfuscation. Motivation: privacy through enforcing uncertainty about the sensor's location. The Community Sensing Service requests the road speed at some location in area X from the population of sensors, and receives an anonymized response from a random car in cell X (if available).
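
A minimal sketch of the request/response pattern, assuming planar coordinates and square grid cells of side `cell_size` (a hypothetical parameter standing in for the discretization). Only a speed is returned, never an identity or exact position, and a larger `cell_size` hides the responding car among more candidates at the cost of spatial precision.

```python
import math
import random
from typing import List, Optional, Tuple

Car = Tuple[float, float, float]  # (x, y, normalized speed); no identity is stored


def cell_of(x: float, y: float, cell_size: float) -> Tuple[int, int]:
    """Map a planar position to its grid cell; larger cells hide more."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def query_cell(cars: List[Car], cell: Tuple[int, int], cell_size: float,
               rng: random.Random) -> Optional[float]:
    """Return the speed reported by a random car in `cell`, or None if it is empty."""
    in_cell = [speed for (x, y, speed) in cars
               if cell_of(x, y, cell_size) == cell]
    if not in_cell:
        return None
    return rng.choice(in_cell)


cars = [(0.2, 0.3, 0.9), (0.7, 0.1, 0.8), (1.5, 0.5, 0.6)]
rng = random.Random(42)
print(query_cell(cars, (0, 0), 1.0, rng))  # one of the two cars in the unit cell
print(query_cell(cars, (0, 0), 2.0, rng))  # coarser cell: any of the three cars
```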

23 Spatial obfuscation. Discretization ≈ utility/privacy knob. High accuracy even with coarse discretization. [Plot; lower is better.]

24 Obfuscation by sparse querying. Associate roving sensors with anonymous IDs, and learn an availability model for each sensor from data. The Community Sensing Service requests the road speed and location from car $C_i$ and receives a response from car $C_i$ (if it is connected to the network).
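
One simple way to learn a per-sensor availability model, sketched here as an assumption rather than the paper's model, is an empirical frequency table: for each anonymous ID and hour of day, the fraction of past queries in which the car was on each segment, with `None` marking times it was not reachable. The record format and names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Optional, Tuple

Observation = Tuple[str, int, Optional[str]]  # (anonymous ID, hour of day, segment or None)


def learn_availability(history: List[Observation]
                       ) -> Dict[Tuple[str, int], Dict[Optional[str], float]]:
    """Empirical P(segment | sensor, hour); None means the car was not reachable."""
    counts: DefaultDict[Tuple[str, int], Counter] = defaultdict(Counter)
    for sensor_id, hour, segment in history:
        counts[(sensor_id, hour)][segment] += 1
    model = {}
    for key, ctr in counts.items():
        total = sum(ctr.values())
        model[key] = {segment: c / total for segment, c in ctr.items()}
    return model


history = [("car7", 8, "s2"), ("car7", 8, "s2"), ("car7", 8, None),
           ("car7", 8, "s5"), ("car7", 18, "s5")]
print(learn_availability(history))
```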

25 Obfuscation by sparse monitoring. The biggest difference is in the "important" part of the curve: 50% error reduction over the mean when querying 10 "cars". [Plot; lower is better.]

26 Mobile vs. fixed sensors. When does it "pay off" to use mobile vs. fixed sensors? Experiment: cost $C(B) = \#\mathrm{fixed}(B) + \#\mathrm{mobile}(B)$, with a fixed budget: $\max F(B)$ s.t. $C(B) \le L$. Mobile sensors pay off if fixed sensors are 4x as expensive.

27 Extensions / Future work. Spatio-temporal models (see paper). How to quickly learn good models (see paper). Other applications: population fitness? News coverage? Reconstruction of 3D cities? Formal privacy guarantees?

28 Related work. Travel time estimation using cell phones [Wunnava et al. '07]. Privacy-aware querying of cars with GPS & cell phones [Bayen et al. '08, forthcoming]. Spatial monitoring, experimental design, etc. (see paper).

29 Conclusions. Presented an integrated approach to community sensing that accounts for phenomenon, demand, and availability & preferences. Theoretical analysis yields near-optimal sensing policies. Extensive empirical evaluation on a traffic monitoring case study.