Human and Optimal Exploration and Exploitation in Bandit Problems Department of Cognitive Sciences, University of California. A Bayesian analysis of human.

Presentation transcript:

Human and Optimal Exploration and Exploitation in Bandit Problems Department of Cognitive Sciences, University of California. A Bayesian analysis of human decision-making on bandit problems: Journal of Mathematical Psychology 53 (2009) Presenter: Juan Wang Date: 18/5/2015

Bandit problem: A decision-maker must choose one of several alternatives on each trial of a short sequence of trials (for example, choosing among different treatments). Each alternative has a fixed reward rate, but the decision-maker is not told what the rates are (for example, the success rate of a treatment). The dilemma between exploration and exploitation is evident in many real-world decision-making situations, e.g. the one shown in the figure below.

Which alternative should be chosen on the 11th trial? The first alternative shows more failures and fewer successes, but its reward rate is moderate and better known than that of the second. Choosing the second alternative instead explores the possibility that it may be the more rewarding one. This is the dilemma between exploration and exploitation: acquiring knowledge about each alternative is exploration, and using that knowledge to make a choice is exploitation.
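To make "well known" versus "less well known" concrete, here is a minimal sketch of the Beta posterior over each alternative's reward rate. The success and failure counts are hypothetical (the figure's actual counts are not reproduced in this transcript), and the uniform Beta(1, 1) prior is an assumption.

```python
from math import sqrt


def beta_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Return (mean, std) of the Beta posterior over a reward rate.

    prior_a and prior_b act as prior successes and prior failures;
    the default Beta(1, 1) uniform prior is an assumption.
    """
    a = prior_a + successes
    b = prior_b + failures
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)


# Hypothetical counts after 10 trials (not taken from the slide's figure).
well_known = beta_posterior(successes=3, failures=5)
less_known = beta_posterior(successes=1, failures=1)

print("well-known alternative: mean=%.3f std=%.3f" % well_known)
print("less-known alternative: mean=%.3f std=%.3f" % less_known)
```

With these illustrative counts, the less-sampled alternative has the higher posterior mean but also the wider posterior, which is exactly the situation in which exploring and exploiting pull in different directions.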

Decision-makers therefore need good ways to learn about the alternatives, balancing the need for exploration against the need for exploitation while still attaining as many rewards as possible.

Background: Human performance on bandit problems has been a topic of interest in a variety of fields, such as economics and cognitive neuroscience. Most studies have focused on problems with a large number of trials (long-horizon bandit problems), so less is known about whether people switch flexibly between exploration and exploitation when there are only a small number of trials (short-horizon bandit problems).

Objective: To find out whether people switch flexibly between exploration and exploitation in short-horizon bandit problems, and to better understand how they switch in one situation of special interest: a well-understood but only moderately-rewarding alternative compared to a less well-understood but possibly better-rewarding alternative.

In this paper, the authors developed and evaluated a probabilistic model that assumes different latent states guide decision-making in short-horizon bandit problems: a search (exploration) state and a stand (exploitation) state.

Assumption of three different situations

The Probabilistic Model

Experiment. Conditions: six types of bandit problem, formed by combining two trial sizes (8 trials and 16 trials) with three environmental distributions (Beta distributions whose two parameters correspond to prior successes and prior failures). There were 50 problems in each condition (300 problems in total). Data: collected from 10 naïve participants (6 males, 4 females). The order of the problems within each condition was randomized for each participant.
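The design above can be sketched as a small problem generator. This is only an illustration: the three (prior successes, prior failures) pairs, the environment names, and the choice of two alternatives per problem are assumptions, since the slide does not state them.

```python
import random

TRIAL_SIZES = (8, 16)
# Hypothetical (prior successes, prior failures) pairs; the slide does not
# give the actual parameter values of the three environments.
ENVIRONMENTS = {"env_a": (4, 2), "env_b": (1, 1), "env_c": (2, 4)}
PROBLEMS_PER_CONDITION = 50
N_ALTERNATIVES = 2  # assumption: two alternatives per problem


def make_problems(seed=0):
    """Draw hidden reward rates for every problem in every condition."""
    rng = random.Random(seed)
    problems = []
    for n_trials in TRIAL_SIZES:
        for env, (a, b) in ENVIRONMENTS.items():
            for _ in range(PROBLEMS_PER_CONDITION):
                # Each alternative's fixed reward rate comes from the
                # environment's Beta distribution.
                rates = [rng.betavariate(a, b) for _ in range(N_ALTERNATIVES)]
                problems.append(
                    {"n_trials": n_trials, "env": env, "rates": rates}
                )
    return problems


problems = make_problems()
print(len(problems))  # 2 trial sizes x 3 environments x 50 problems = 300
```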

Optimal Performance and Model analysis 1) Calculate optimal decision-making behavior for all of the problems completed by the 10 participants, using a recursive approach from the reinforcement learning literature (e.g., Kaelbling et al., 1996). I did not fully understand this recursive approach, which is described in Kaelbling et al.'s paper; in any case, it finds the optimal decision-making process for a bandit problem given the environmental distribution and the trial size. 2) Apply the graphical model in Figure 2 to all of the optimal and human decision data (training data), for all six bandit problem conditions. For each data set, parameters were estimated from 1000 posterior samples.
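For readers who, like the presenter, find the recursive approach opaque, the following is a minimal sketch of one standard way to compute optimal behavior: backward induction (dynamic programming) over the counts of successes and failures. It assumes two alternatives, binary rewards, and a known Beta(prior_a, prior_b) environment; it is not claimed to match the paper's implementation in detail, and the function names are hypothetical.

```python
from functools import lru_cache


def optimal_values(n_trials, prior_a=1.0, prior_b=1.0):
    """Build value functions for a two-alternative bandit problem.

    Returns (value, q_values). value(s1, f1, s2, f2) is the expected number
    of future rewards under optimal play, given the successes and failures
    observed so far on each alternative. q_values gives the expected future
    rewards for choosing alternative 1 or 2 next and playing optimally after.
    """

    @lru_cache(maxsize=None)
    def value(s1, f1, s2, f2):
        remaining = n_trials - (s1 + f1 + s2 + f2)
        if remaining == 0:
            return 0.0
        return max(q_values(s1, f1, s2, f2))

    def q_values(s1, f1, s2, f2):
        # Posterior mean reward rate of each alternative.
        p1 = (prior_a + s1) / (prior_a + prior_b + s1 + f1)
        p2 = (prior_a + s2) / (prior_a + prior_b + s2 + f2)
        # Expected reward now plus expected optimal reward afterwards.
        q1 = p1 * (1 + value(s1 + 1, f1, s2, f2)) + (1 - p1) * value(s1, f1 + 1, s2, f2)
        q2 = p2 * (1 + value(s1, f1, s2 + 1, f2)) + (1 - p2) * value(s1, f1, s2, f2 + 1)
        return q1, q2

    return value, q_values


value, q_values = optimal_values(n_trials=8)
print(value(0, 0, 0, 0))     # expected rewards under optimal play from the start
print(q_values(1, 0, 0, 1))  # compare the two choices after one win and one loss
```

The optimal decision on any trial is the alternative with the larger entry in `q_values`. Because the recursion looks ahead to the end of the problem, the result depends on both the environment and the trial size, as the slide notes.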

Test how well the latent state model fits the observed data. The model's predicted decisions at its best-fitting parameterization (estimator) were compared with all of the human and optimal decision-making data, and the proportion of agreement between the two was calculated. The model generally fits well, just a little less well for participant AH.
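The agreement measure described above can be sketched as the fraction of trials on which the model's predicted choice matches the observed choice. The choice sequences below are hypothetical, made up purely for illustration.

```python
def proportion_agreement(predicted, observed):
    """Fraction of trials on which predicted and observed choices match."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not predicted:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Hypothetical choices (alternative 1 or 2) on an 8-trial problem.
model_choices = [1, 1, 2, 2, 1, 2, 2, 2]
human_choices = [1, 2, 2, 2, 1, 2, 1, 2]
print(proportion_agreement(model_choices, human_choices))  # 0.75
```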

Check the descriptive adequacy of the latent state model. The Zi parameter inferred in the model is an indicator variable that determines whether the i-th trial uses the search state or the stand state. Descriptive adequacy is shown in the figure on the next slide. The posterior probability that the i-th trial uses the stand state is approximated by the posterior of the Zi indicator variables.
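As a rough illustration of that approximation, the posterior probability of the stand state on each trial can be estimated by averaging the sampled indicator values across posterior samples. The coding (1 for stand, 0 for search) and the sample values are assumptions made for this sketch.

```python
def stand_probability(z_samples):
    """Estimate P(trial i uses the stand state) from posterior samples.

    z_samples is a list of posterior samples; each sample is a list with one
    indicator per trial (assumed coding: 1 = stand, 0 = search).
    """
    n_samples = len(z_samples)
    n_trials = len(z_samples[0])
    return [sum(sample[i] for sample in z_samples) / n_samples
            for i in range(n_trials)]


# Four hypothetical posterior samples for a 4-trial problem.
z_samples = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(stand_probability(z_samples))  # [0.0, 0.25, 0.75, 1.0]
```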