Using Value of Information to Learn and Classify under Hard Budgets
Russell Greiner, Daniel Lizotte, Aloak Kapoor, Omid Madani
Dept of Computing Science, University of Alberta / Yahoo! Research

Task:
– Need a classifier for diagnosing cancer subtypes

Given:
– a pool of patients whose subtype is known, but whose feature values are NOT known
– the cost c(X_i) of purchasing feature X_i
– a fixed budget for purchasing feature values

Produce:
– a classifier that predicts the subtype of a novel instance from the values of its features
– … learned using only the feature values purchased

Process:
– Initially, the learner R knows NO feature values
– At each step, R can purchase the value of one feature of one instance, at its cost
– … basing each choice on the results of prior purchases, until the fixed budget is exhausted
– Then R produces a classifier

Challenge: at each step, what should R purchase?
– Which feature of which instance?
– Each purchase alters the accuracy of the eventual classifier and decreases the remaining budget

Quality:
– accuracy of the classifier obtained
– REGRET: the difference between this classifier and the optimal one

Simpler Task:
Determine which coin has the highest P(head), based on the results of only 20 flips.
[Figures: coins C_1 … C_7 with running head/tail tallies after each flip, ending with the selector's answer "C_7"; and the original task, shown as a table of instances (features X_1 … X_4, label Y, mostly unknown "?" entries), per-feature costs, a remaining budget, and a Selector deciding which feature of which instance the Learner should purchase.]

Original Task, Bayesian Framework:
– Coin C_i is drawn from Beta(a_i, b_i)
– MDP:
  – State = (a_1, b_1, …, a_k, b_k, r), the posterior parameters plus the remaining budget r
  – Action = "flip coin i"
  – Reward = 0 if r > 0; otherwise max_i { a_i / (a_i + b_i) }
– Solving for the optimal purchasing policy is NP-hard
– ⇒ develop tractable heuristic policies that perform well

Heuristic Policies (see the code sketch at the end of this part, after the NaïveBayes note):
– Round Robin: flip C_1, then C_2, then …
– Biased Robin: flip C_i; if heads, flip C_i again, else move on to C_{i+1}
– Greedy Loss Reduction: Loss1(C_i) = expected loss after flipping C_i once; flip C* = argmin_i { Loss1(C_i) }, once
– Single Feature Lookahead (k): SFL(C_i, k) = expected loss after spending k flips on C_i; flip C* = argmin_i { SFL(C_i, k) }, once

[Figure: the purchase table being filled in step by step (values "+" / "-" appearing) as the budget drops from $100 to $85.]

A is an APPROXIMATION algorithm iff A's regret is bounded by a constant worse than optimal (for any budget, number of coins, …).
[Figure: regret vs. budget for the optimal algorithm and an algorithm A, marking A's regret r_A.]
NOT approximation algorithms: Round Robin, Random, Greedy, Interval Estimation.

Results (UAI'03; UAI'04; COLT'04; ECML'05):
[Figure: performance curves for Beta(1,1), n=10, b=10; Beta(1,1), n=10, b=40; Beta(10,1), n=10, b=40.]
– The obvious approach, round robin, is NOT good!
– Contingent policies work best
– It is important to know and use the remaining budget

Related Work:
– Not a standard bandit problem: pure exploration for b steps, then a single exploitation
– Not on-line learning: no "feedback" until the end
– Not PAC-learning: a fixed number of instances, NOT "polynomial"
– Not standard experimental design
– The coin task is simple active learning; general budgeted learning is different

Use a NaïveBayes classifier: it handles missing data and assumes no feature interactions. Each +class instance is "the same", so there are only O(N) parameters to estimate.
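To make Biased Robin and SFL(k) concrete, here is a minimal sketch of the simpler coins task; it is not the authors' code. It assumes Beta priors for each coin and reads the loss in Loss1/SFL as one minus the expected best posterior mean after the candidate flips, which is one plausible interpretation of the definitions above; the function names and example coin biases are illustrative.

    # Hypothetical sketch of two heuristic policies for the
    # "which coin has the highest P(head)?" task, with Beta(a_i, b_i)
    # posteriors and a hard budget of flips.
    import math
    import random

    def beta_binomial_pmf(h, k, a, b):
        """P(h heads in k flips of a coin with a Beta(a, b) posterior)."""
        log_beta = lambda x, y: math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)
        return math.comb(k, h) * math.exp(log_beta(a + h, b + k - h) - log_beta(a, b))

    def sfl_loss(i, k, A, B):
        """SFL(C_i, k): expected loss of devoting the next k flips to coin i
        (assumed here to be 1 - E[best posterior mean afterwards])."""
        other_best = max(A[j] / (A[j] + B[j]) for j in range(len(A)) if j != i)
        exp_best = 0.0
        for h in range(k + 1):                       # enumerate possible head counts
            mean_i = (A[i] + h) / (A[i] + B[i] + k)  # coin i's updated posterior mean
            exp_best += beta_binomial_pmf(h, k, A[i], B[i]) * max(mean_i, other_best)
        return 1.0 - exp_best

    def run(policy, true_p, budget, k=5, a0=1.0, b0=1.0, rng=random):
        """Spend the flip budget with the given policy, then report the chosen coin."""
        n = len(true_p)
        A, B = [a0] * n, [b0] * n                    # Beta posterior parameters
        current = 0                                  # used by Biased Robin
        for _ in range(budget):
            if policy == "biased_robin":
                i = current
            elif policy == "sfl":
                i = min(range(n), key=lambda j: sfl_loss(j, k, A, B))
            else:
                raise ValueError(policy)
            heads = rng.random() < true_p[i]         # flip the chosen coin once
            A[i] += heads
            B[i] += not heads
            if policy == "biased_robin":             # stay on a winner, move on after a tail
                current = i if heads else (i + 1) % n
        return max(range(n), key=lambda j: A[j] / (A[j] + B[j]))

    if __name__ == "__main__":
        coins = [0.3, 0.5, 0.7, 0.4]                 # illustrative true biases
        print("Biased Robin picks coin", run("biased_robin", coins, budget=20))
        print("SFL(5) picks coin      ", run("sfl", coins, budget=20))

Round Robin and Greedy Loss Reduction drop out as special cases: Round Robin ignores the posteriors entirely, and Greedy Loss Reduction is SFL with k = 1.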
Extended Task:
So far, the LEARNER (the researcher) has to pay for features… but the CLASSIFIER (the "MD") gets ALL feature values for free!
Typically the MD also has constraints (e.g., capitation).

Extended model: a hard budget for BOTH the learner and the classifier.
– E.g., spend b_L = $10,000 to learn a classifier that may spend only b_C = $50 per patient.

Classifier = "Active Classifier" [GGR, 2002]:
– a policy (decision tree) that sequentially gathers information about the instance until rendering a decision
– must reach a decision by depth b_C

Learner:
– spends b_L gathering information, yielding a posterior distribution P(·) [using the naïve Bayes assumption]
– uses a dynamic program to find the best cost-b_C policy for P(·)

Double dynamic program! Too slow ⇒ use heuristic policies.

Issues:
– Use "analogous" heuristic policies
– Round-robin (the standard approach) is still bad
– Single Feature Lookahead: how far ahead, i.e., which k in SFL(k)?

[Figures: results on Glass with identical feature costs (b_C = 3) and on Heart Disease with different feature costs (b_C = 7).]

Randomized SFL:
– flip C_i with probability exp(SFL(C_i, k)), once (a softmax over the SFL scores; see the sketch below)

Issues:
– Round-robin is still bad… very bad…
– Randomized SFL is best
– (Deterministic) SFL is "too focused"
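As a companion to the earlier sketch, here is a minimal illustration of the Randomized SFL selection step; it is not the authors' code. The poster writes the selection probability as exp(SFL(C_i, k)); since SFL is a loss, this sketch assumes the intended rule puts more mass on candidates with lower expected loss, p_i proportional to exp(-SFL(C_i, k) / tau). The temperature tau and the normalisation are assumptions.

    # Hypothetical sketch of Randomized SFL selection.
    # Assumption: SFL scores are losses, so the softmax favours lower scores;
    # p_i is proportional to exp(-SFL(C_i, k) / tau), with tau illustrative.
    import math
    import random

    def randomized_sfl_choice(sfl_scores, tau=0.05, rng=random):
        """Pick one purchase index, with probability decreasing in its SFL loss."""
        lo = min(sfl_scores)                           # shift for numerical stability
        weights = [math.exp(-(s - lo) / tau) for s in sfl_scores]
        total = sum(weights)
        r = rng.random() * total
        for i, w in enumerate(weights):                # sample from the softmax
            r -= w
            if r <= 0:
                return i
        return len(weights) - 1

    if __name__ == "__main__":
        # Example: SFL(k) losses for four candidate (feature, instance) purchases.
        scores = [0.41, 0.39, 0.40, 0.45]
        picks = [randomized_sfl_choice(scores) for _ in range(10_000)]
        for i in range(len(scores)):
            print(f"purchase {i}: chosen {picks.count(i) / len(picks):.1%} of the time")

Randomizing the choice is what keeps the policy from being "too focused": deterministic SFL always takes the single argmin purchase, whereas the softmax spreads purchases across near-tied candidates.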