Carnegie Mellon AI, Sensing, and Optimized Information Gathering: Trends and Directions. Carlos Guestrin, joint work with Anupam Gupta, Jon Kleinberg, and others.


Carnegie Mellon AI, Sensing, and Optimized Information Gathering: Trends and Directions. Carlos Guestrin, joint work with Anupam Gupta, Jon Kleinberg, Brendan McMahan, Ajit Singh, and others…

Monitoring algal blooms. Algal blooms threaten freshwater: 4 million people without water, 1300 factories shut down, $14.5 billion to clean up (Tai Lake, China, 10/07; source: MSNBC). Other occurrences in Australia, Japan, Canada, Brazil, Mexico, Great Britain, Portugal, Germany, … Growth processes are still unclear [Carmichael]. Need to characterize growth in the lakes, not in the lab!

Monitoring rivers and lakes [Singh, Krause, G., Kaiser ‘07]. Need to monitor large spatial phenomena: temperature, nutrient distribution, fluorescence, … and predict at unobserved locations. Use robotic sensors (NIMS, Kaiser et al., UCLA) to cover large areas, varying depth and location across the lake. But we can only make a limited number of measurements! Where should we sense to get the most accurate predictions? (Figure: color indicates actual vs. predicted temperature.)

Water distribution networks (simulator from EPA). Water distribution in a city is a very complex system. Pathogens in the water can affect thousands (or millions) of people. Currently: add chlorine at the source and hope for the best. Worse, an attacker could deliberately introduce a pathogen.

Monitoring water networks [Krause, Leskovec, G., Faloutsos, VanBriesen ‘08]. Contamination of drinking water could affect millions of people. Place sensors to detect contaminations (“Battle of the Water Sensor Networks” competition; simulator from EPA; Hach sensors, ~$14K each). Where should we place sensors to detect contaminations quickly?

Sensing problems. Want to learn something about the state of the world: detect outbreaks, predict algal blooms, … We can choose (partial) observations (place sensors, make measurements, …), but they are expensive or limited: hardware cost, power consumption, measurement time, … Want to cost-effectively get the most useful information! Fundamental problem: what information should I use to learn?

Related work. Sensing problems have been considered in experimental design (Lindley ’56, Robbins ’52, …), spatial statistics (Cressie ’91, …), machine learning (MacKay ’92, …), robotics (Sim & Roy ’05, …), sensor networks (Zhao et al. ’04, …), and operations research (Nemhauser ’78, …). Existing algorithms are typically either heuristics (no guarantees; can do arbitrarily badly) or methods that find optimal solutions (mixed integer programming, POMDPs) but are very difficult to scale to bigger problems.

This talk Theoretical: Approximation algorithms that have theoretical guarantees and scale to large problems Applied: Empirical studies with real deployments and large datasets

Model-based sensing. A model predicts the impact of contaminations: for water networks, the water flow simulator from EPA; for lake monitoring, probabilistic models learned from data (later). For each subset A ⊆ V of the set V of all network junctions, compute a “sensing quality” F(A). A sensor reduces impact through early detection: since the model predicts high-, medium-, and low-impact locations, one placement can have high sensing quality (F(A) = 0.9) and another low sensing quality (F(A) = 0.01).

Optimizing sensing / Outline: sensor placement (sensing locations, sensing quality, sensing budget, sensing cost), robust sensing, complex constraints, sequential sensing.

Given: a finite set V of locations and a sensing quality F. Want: A* ⊆ V such that A* = argmax_{|A| ≤ k} F(A). Typically NP-hard! Greedy algorithm: start with A = Ø; for i = 1 to k: s* := argmax_s F(A ∪ {s}); A := A ∪ {s*}. How well can this simple heuristic do?
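A minimal sketch of this greedy loop in Python. The set function F and the toy coverage data below are illustrative stand-ins, not the water-network objective:

```python
def greedy_select(V, F, k):
    """Start with A = {}; k times, add the element with the
    largest marginal gain F(A + s) - F(A)."""
    A = set()
    for _ in range(k):
        s_best = max((s for s in V if s not in A),
                     key=lambda s: F(A | {s}) - F(A))
        A.add(s_best)
    return A

# Toy sensing quality: how many "events" a set of sensors covers
coverage = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"d"}}
def F(A):
    return len(set().union(*(coverage[s] for s in A))) if A else 0

print(greedy_select(coverage, F, 2))  # picks the sensors covering the most events
```

Note that each iteration only needs F as a black box, which is what makes the approach applicable to simulator-defined objectives.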

Performance of the greedy algorithm: on a small subset of the water networks data, the greedy score is empirically close to optimal. Why? (Figure: population protected, higher is better, vs. number of sensors placed, for greedy and optimal.)

Key property: diminishing returns. Consider placement A = {S1, S2}, placement B = {S1, S2, S3, S4}, and a new sensor S'. Adding S' to the small placement A helps a lot; adding S' to the large placement B doesn’t help much (large vs. small improvement). Submodularity: for A ⊆ B, F(A ∪ {S'}) − F(A) ≥ F(B ∪ {S'}) − F(B). Theorem [Krause, Leskovec, G., Faloutsos, VanBriesen ’08]: the sensing quality F(A) in water networks is submodular!
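The diminishing-returns inequality is easy to check numerically for a coverage-style sensing quality (a toy stand-in for the water-network F, not the real simulator):

```python
# Each sensor detects a set of contamination events
detects = {"S1": {1, 2}, "S2": {2, 3}, "S3": {3, 4}, "Sprime": {4, 5}}

def F(A):
    """Sensing quality: number of distinct events detected by A."""
    covered = set()
    for s in A:
        covered |= detects[s]
    return len(covered)

A = {"S1"}                  # small placement
B = {"S1", "S2", "S3"}      # superset of A
gain_A = F(A | {"Sprime"}) - F(A)
gain_B = F(B | {"Sprime"}) - F(B)
print(gain_A, gain_B)  # 2 1: the same sensor helps the smaller placement more
```

Coverage functions like this one are submodular for exactly the reason the slide illustrates: a superset has already detected some of the new sensor’s events.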

One reason submodularity is useful. Theorem [Nemhauser et al. ‘78]: the greedy algorithm gives a constant-factor approximation, F(A_greedy) ≥ (1 − 1/e) F(A_opt) (≈ 63%). Greedy gives a near-optimal solution! This guarantee is the best possible unless P = NP. Many more reasons; sit back and relax…

Building a Sensing Chair [Mutlu, Krause, Forlizzi, G., Hodgins ‘07]. People sit a lot: activity recognition in assistive technologies, seating pressure as a user interface. Equipped with 1 sensor per cm², the chair achieves 82% accuracy on 10 postures (lean forward, slouch, lean left, …) [Zhu et al]. But it costs $16,000! Can we get similar accuracy with fewer, cheaper sensors?

How to place sensors on a chair? Treat the sensor readings at locations V as random variables and predict the posture Y using a probabilistic model P(Y, V). Pick sensor locations A* ⊆ V to minimize entropy. Theorem [UAI ’05]: information gain is submodular!* (*See store for details.) Placed the sensors and did a user study: before, 82% accuracy at $16,000; after, 79% accuracy at $100. Similar accuracy at less than 1% of the cost!

Battle of the Water Sensor Networks Competition Real metropolitan area network (12,527 nodes) Water flow simulator provided by EPA 3.6 million contamination events Multiple objectives: Detection time, affected population, … Place sensors that detect well “on average”

BWSN Competition results: 13 participants, performance measured on 30 different criteria. By total score (higher is better), our approach beat Berry et al., Dorini et al., Wu & Walski, Ostfeld & Salomons, Propato & Piller, Eliades & Polycarpou, Huang et al., Guan et al., Ghimire & Barkdoll, Trachtman, Gueli, and Preis & Ostfeld. (Methods: G = genetic algorithm, H = other heuristic, D = domain knowledge, E = “exact” method, MIP.) 24% better performance than the runner-up!

What was the trick? Submodularity to the rescue: “lazy evaluations.” With 3.6M contaminations, evaluating F(A) is very slow: exhaustive search over all subsets is hopeless, and naive greedy takes 30 hours per 20 sensors (6 weeks for all 30 settings). Using lazy evaluations, fast greedy takes 1 hour per 20 sensors: done after 2 days! Simulated everything in 2 weeks on 40 processors: 152 GB of data on disk, compressed to 16 GB in main memory, giving very accurate sensing quality. Advantage through theory and engineering! (Figure: running time in minutes vs. number of sensors selected; lower is better.)

Robustness against adversaries [Krause, McMahan, G., Gupta ‘07]. If the sensor locations are known, an adversary can attack the vulnerable locations. A unified view covers robustness to change in parameters, robust experimental design, and robustness to adversaries. SATURATE: a simple but very effective algorithm for robust sensor placement.

What about the worst case? A placement may detect well on “average-case” (accidental) contaminations, but knowing the sensor locations, an adversary contaminates elsewhere! Two placements can have very different average-case scores yet the same worst-case score. Where should we place sensors to quickly detect in the worst case?

Optimizing for the worst case. Associate a separate utility function F_i with each contamination i: F_i(A) = impact reduction by sensors A for contamination i. For example, sensors A give high F_s(A) for contamination at node s but low F_r(A) for contamination at node r; sensors B give the reverse; sensors C are high for both. Want to solve max_{|A| ≤ k} min_i F_i(A). Each F_i is submodular, but unfortunately min_i F_i is not submodular! How can we solve this robust sensing problem?

How does the greedy algorithm do? Theorem [NIPS ’07]: the problem max_{|A| ≤ k} min_i F_i(A) does not admit any approximation unless P = NP. Example: V = {s1, s2, s3}, and we can only buy k = 2. Suppose F1 rewards only s1, F2 rewards only s2, and s3 gives a tiny value ε under both. Greedy picks s3 first (the only element with nonzero worst-case gain); then, whichever of s1 or s2 it adds, the greedy score is ε, while the optimal score (picking {s1, s2}) is 1. Greedy does arbitrarily badly. Is there something better? Hence we can’t find any approximation algorithm. Or can we?

Alternative formulation. If somebody told us the optimal value c, could we recover the optimal solution A*? We would need to find a set A with min_i F_i(A) ≥ c and |A| ≤ k. Is this any easier? Yes, if we relax the constraint |A| ≤ k.

Solving the alternative problem. Trick: for each F_i and c, define the truncation F’_{i,c}(A) = min(F_i(A), c). Problem 1 (last slide), stated with min_i F_i, is non-submodular: we don’t know how to solve it. Problem 2, stated with the average of the truncations (1/m) Σ_i F’_{i,c}(A), is submodular: we can use greedy! Both problems have the same optimal solutions, so solving one solves the other.
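The truncation trick is one line in code: taking a pointwise minimum with a constant preserves submodularity, and the greedy-friendly surrogate is the average of the truncated objectives. A sketch with made-up indicator objectives F1 and F2:

```python
def truncate(F, c):
    """F'_c(A) = min(F(A), c): truncation at c preserves submodularity."""
    return lambda A: min(F(A), c)

def avg_truncated(Fs, c):
    """Submodular surrogate: equals c exactly when every F_i(A) >= c."""
    Fts = [truncate(F, c) for F in Fs]
    return lambda A: sum(Ft(A) for Ft in Fts) / len(Fts)

# Two toy objectives: each "contamination" is detected by one sensor
F1 = lambda A: 1.0 if "s1" in A else 0.0
F2 = lambda A: 1.0 if "s2" in A else 0.0
G = avg_truncated([F1, F2], c=1.0)
print(G({"s1"}), G({"s1", "s2"}))  # 0.5 1.0
```

The surrogate saturates at c precisely when the worst objective reaches c, which is what lets a covering-style greedy stand in for the minimax problem.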

Back to our example: guess c = 1. Under the truncated average F’_avg,1, single elements score only about ½, while the pair that is optimal for the original problem scores the full 1, so greedy’s first and second picks now recover the optimal solution! How do we find c? Do binary search!

Saturate Algorithm [NIPS ‘07]. Given: set V, integer k, and submodular functions F_1, …, F_m. Initialize c_min = 0, c_max = min_i F_i(V). Do binary search: c = (c_min + c_max)/2; greedily find A_G such that F’_avg,c(A_G) = c; if |A_G| ≤ α k, increase c_min; if |A_G| > α k, decrease c_max; repeat until convergence.
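Putting the pieces together, a compact sketch of this binary search (the toy objectives below are illustrative; a real implementation would add lazy evaluations and use the α k slack from the guarantee):

```python
def saturate(V, Fs, k, alpha=1.0, tol=1e-3):
    """Binary search on the target value c; for each c, greedily cover
    the truncated average until it saturates at c. Sketch only."""
    def f_avg(A, c):
        return sum(min(F(A), c) for F in Fs) / len(Fs)

    def greedy_cover(c):
        A = set()
        while f_avg(A, c) < c and len(A) < len(V):
            s = max((x for x in V if x not in A),
                    key=lambda x: f_avg(A | {x}, c))
            if f_avg(A | {s}, c) <= f_avg(A, c):
                break  # no element makes progress
            A.add(s)
        return A

    c_lo, c_hi = 0.0, min(F(set(V)) for F in Fs)
    best = set()
    while c_hi - c_lo > tol:
        c = (c_lo + c_hi) / 2
        A = greedy_cover(c)
        if f_avg(A, c) >= c and len(A) <= alpha * k:
            best, c_lo = A, c   # feasible: try a larger target
        else:
            c_hi = c            # infeasible: shrink the target
    return best

# Toy worst-case placement: each objective needs its own sensor
F1 = lambda A: float("s1" in A)
F2 = lambda A: float("s2" in A)
print(saturate(["s1", "s2"], [F1, F2], k=2))  # both sensors selected
```

With budget k = 1, no placement can satisfy both objectives, so every target c is infeasible and the sketch returns the empty set, mirroring how the binary search narrows in on the largest achievable worst-case value.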

Theoretical guarantees. Theorem: the problem max_{|A| ≤ k} min_i F_i(A) does not admit any approximation unless P = NP. Theorem: Saturate finds a solution A_S such that min_i F_i(A_S) ≥ OPT_k and |A_S| ≤ α k, where OPT_k = max_{|A| ≤ k} min_i F_i(A) and α = 1 + log max_s Σ_i F_i({s}). Theorem: if there were a polytime algorithm with a better factor β < α, then NP ⊆ DTIME(n^(log log n)).

Example: lake monitoring. Monitor pH values along a transect using a robotic sensor. Given observations A, predict the true (hidden) pH values at unobserved positions s along the transect. Use a probabilistic model (Gaussian processes) to estimate the prediction error Var(s | A), which is (often) submodular [Das & Kempe ’08]. Where should we sense to minimize our maximum error? A robust sensing problem!

Comparison with the state of the art. The algorithm used in geostatistics is simulated annealing [Sacks & Schiller ’88, van Groeningen & Stein ’98, Wiens ’05, …], with 7 parameters that need to be fine-tuned. On environmental monitoring (precipitation data; maximum marginal variance vs. number of sensors, lower is better), Saturate is competitive with simulated annealing, 10x faster, and has no parameters to tune!

Saturate results on water networks (maximum detection time in minutes vs. number of sensors; lower is better). Greedy and simulated annealing show no decrease until all contaminations are detected; Saturate achieves 60% lower worst-case detection time!

Is optimizing for the worst case too conservative? On the water competition data, SATURATE gets significantly better worst-case performance while keeping good average performance (comparing the worst-attack and average-attack scores of maximizing the average vs. maximizing the minimum).

Trading off average-case performance and robustness. The worst case may be too conservative; instead, maximize the average score subject to worst-case ≥ c_adv. This is a hard constrained optimization problem (submodular optimization with a minimax constraint); an extension of the SATURATE approximation algorithm gives similar guarantees. (Figure: expected score vs. adversarial score tradeoff curves for k = 5, 10, 15, 20.)

Summary so far. Submodularity in sensing optimization: greedy is near-optimal. Robust sensing: greedy fails badly; Saturate is near-optimal. Path planning and communication constraints: constrained submodular optimization, where pSPIEL gives strong guarantees. Sequential sensing: exploration/exploitation analysis. All these applications involve physical sensing. Now for something completely different: let’s jump from water…

… to the Web! You have 10 minutes each day for reading blogs / news. Which of the million blogs should you read?

Information cascades [Leskovec, Krause, G., Faloutsos, VanBriesen ‘07]. Stories spread through blogs over time as information cascades; readers further downstream learn about the story after us! Which blogs should we read to learn about big cascades early?

Water vs. Web. In both problems we are given a graph with nodes (junctions / blogs) and edges (pipes / links), and cascades spreading dynamically over the graph (contamination / citations). We want to pick nodes to detect big cascades early: placing sensors in water networks vs. selecting informative blogs. In both applications, the utility functions are submodular.

Performance on blog selection (~45k blogs). Greedy outperforms state-of-the-art heuristics (in-links, all out-links, # posts, random) on cascades captured vs. number of blogs (higher is better), and fast greedy gives a 700x speedup over naive greedy using submodularity (running time in seconds vs. number of blogs selected, lower is better; exhaustive search over all subsets is hopeless).

Taking “attention” into account. The naïve approach just picks the 10 best blogs, which selects big, well-known blogs (Instapundit, etc.); these contain many posts and take long to read! Comparing cascades captured vs. number of posts (time) allowed, cost/benefit analysis beats ignoring cost: cost-benefit optimization picks summarizer blogs!

Predicting the “hot” blogs. We want blogs that will be informative in the future, so split the data set: train on historic data, test on future data. Greedy on historic data detects well on the training set but poorly on the future, far below the “cheating” baseline of greedy run on the future itself: poor generalization! Why’s that? Blog selection “overfits” to the training data; we want blogs that continue to do well. Let’s see what goes wrong here.

Robust optimization. Let F_i(A) = detections in time interval i; an “overfit” blog selection A can score, say, F_1(A) = .5, F_2(A) = .8, F_3(A) = .6, F_4(A) = .01, F_5(A) = .02 across intervals. Optimize the worst case using Saturate to get a “robust” blog selection A*. Robust optimization acts as regularization!

Predicting the “hot” blogs, revisited (sensing quality vs. number of posts (time) allowed, tested on the future): the robust solution closes much of the gap between greedy-on-historic and the “cheating” greedy-on-future baseline, with 50% better generalization!

Summary. Submodularity in sensing optimization: greedy is near-optimal. Robust sensing: greedy fails badly; Saturate is near-optimal. Path planning and communication constraints: constrained submodular optimization; pSPIEL gives strong guarantees. Sequential sensing: exploration/exploitation analysis. Constrained optimization gives better use of “attention”; robust optimization gives better generalization.

The AI-complete dream: a robot that saves the world, a robot that cleans your room. But… it’s definitely useful, yet really narrow; hardware is a real issue, and it will take a while. What’s an “AI-complete” problem that will be useful to a huge number of people in the next few years? What’s a problem accessible to a large part of the AI community?

What makes a good AI-complete problem? A complete AI system involves sensing (gathering information from the world), reasoning (making high-level conclusions from information), and acting (making decisions that affect the dynamics of the world and/or the interaction with the user). But a good problem is also hugely complex, offers access to real data, can scale up and layer up, allows steady progress, and is very cool and exciting. Data gathering can lead to good, accessible, and cool AI-complete problems.

Factcheck.org Take a statement Collect information from multiple sources Evaluate quality of sources Connect them Make a conclusion AND provide an analysis

Automated fact checking. Query: fact or fiction? Using the Web, models, and inference, produce a conclusion and justification, with active user feedback on sources and proof. This can lead to a very cool “AI-complete” problem that is useful and where we can make progress in the short term!

Conclusions. Sensing and information acquisition problems are important and ubiquitous. We can exploit structure (submodularity) to find provably good solutions: algorithms with strong guarantees that perform well on real-world problems. This could also help focus the community on a cool “AI-complete” problem.