Revisiting James March’s Exploration-Exploitation Trade-off With a Neurobiological Basis Chiara Chelini University of Turin ESA World Meeting, Rome, 28°

Presentation transcript:

Revisiting James March’s Exploration-Exploitation Trade-off With a Neurobiological Basis
Chiara Chelini, University of Turin
ESA World Meeting, Rome, 28 June - 1 July 2007

2 Content
Reinterpretation of James March’s exploration-exploitation trade-off [March, Organization Science, 1991] from the point of view of the recent neuroscience literature [Daw et al., 2006, Nature]
Learning mechanisms in organizational and individual decision-making

3 Content
- Theoretical and Methodological Background
- Successes and Cognitive Traps in Learning
- Empirical Evidence from a Neuroeconomics Experiment
- Towards a Conclusion: Would March agree with this new interpretation?

4 Theoretical and Methodological Background
Neuroeconomics is a useful field of research for better understanding the science of decision making.
Biological basis of decision making.
New evidence on the results of economic games: e.g. recent replications of the Ultimatum Game have shown that a traditional, fully rational economic explanation may no longer hold [Camerer et al., 1999].

5 Successes and Cognitive Traps in Learning
Herbert Simon and James March, “Organizations” [1958]: routine as a procedure for decision making
- Macro level: “repetitive organizational procedures”
- Micro level: “individual activities automatically triggered on the basis of stable mental models”
Intuition and tacit component: much of the behaviour we observe in organizations is “intuitive” in the sense that it occurs immediately upon recognition of a situation. Hence many of the operations we observe in organizational action come not from explicit analysis but from rules: recognition is in fact the capacity to distinguish “familiar significant cues, and to retrieve stored knowledge about how to use them” [March and Simon, 1958].

6 Successes and Cognitive Traps in Learning
Consistent optimistic bias: aspirations adjust downward more slowly than they adjust upward. For instance, when a new idea or a new technology is introduced and fails, a learning process starts in order to find a better solution. If a second and a third failure follow, and so on, a potentially “endless cycle of failure and unrewarding change” [Levinthal and March, 1993] may be triggered.
Success or competence trap: a “potentially self-destructive product of learning” in which “exploitation drives out exploration”. When an organisation accumulates greater and greater competence in a particular field and finds it rewarding, this positive feedback makes the organisation engage more and more in the same activity, but also stop innovating [Levinthal and March, 1993].

7 Successes and Cognitive Traps in Learning
Then “in such a world, we must give an account not only of substantive rationality - the extent to which appropriate courses of action are chosen - but also procedural rationality - the effectiveness, in light of human cognitive powers and limitations, of the procedures used to choose actions” [Simon, 1978].

8 James March’s Exploration-Exploitation Trade-off
Model of mutual learning between an organization and its individuals:
- External reality has m dimensions, each of which takes a value of 1 or -1, both with the same (independent) probability 0.5.
- In each time period, the n individuals in an organization hold specific beliefs about external reality. A belief may or may not match the corresponding aspect of reality, and takes a different value accordingly.
- Each organization has a “code” that can be described as a set of “procedures, norms, rules and forms” in which knowledge is stored. Two kinds of learning involve it: learning from the code and learning by the code.

9 James March’s Exploration-Exploitation Trade-off
- Learning from the code: p1 is the probability with which an individual's belief changes to that of the code. It measures the effectiveness of socialization.
- Learning by the code: the code may adapt to the beliefs of those individuals whose beliefs correspond to reality on more dimensions than the code does. p2 measures the effectiveness of learning by the code.

10 James March's Simulation
March ran a simulation characterized by:
- 30 dimensions of reality (m)
- 50 individuals (n)
- 80 iterations
He explains that the equilibrium level of knowledge in organizations is given by the interaction of the two learning parameters p1 (learning from the code) and p2 (learning by the code).
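The mutual-learning model described on the last three slides can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not March's original program: the function name `simulate_mutual_learning`, the neutral belief value 0, the random initial beliefs, and the update probability 1 - (1 - p2)^k (with k the majority margin inside the group that outperforms the code) are one simplified reading of the 1991 paper and do not come from the slides.

```python
import random


def simulate_mutual_learning(m=30, n=50, p1=0.1, p2=0.9, periods=80, seed=0):
    """Return the share of reality's m dimensions that the code gets right."""
    rng = random.Random(seed)
    # External reality: each dimension is 1 or -1 with probability 0.5.
    reality = [rng.choice((-1, 1)) for _ in range(m)]
    # Assumption: individuals start with random beliefs in {-1, 0, 1}
    # (0 = no belief); the organizational code starts neutral.
    beliefs = [[rng.choice((-1, 0, 1)) for _ in range(m)] for _ in range(n)]
    code = [0] * m

    def knowledge(vector):
        # Number of dimensions on which a belief vector matches reality.
        return sum(1 for b, r in zip(vector, reality) if b == r)

    for _ in range(periods):
        code_score = knowledge(code)
        # Individuals who currently know more than the code does.
        superior = [b for b in beliefs if knowledge(b) > code_score]

        # Learning by the code (rate p2): move towards the majority view
        # of the superior group, more readily when the majority is large.
        new_code = list(code)
        for d in range(m):
            plus = sum(1 for b in superior if b[d] == 1)
            minus = sum(1 for b in superior if b[d] == -1)
            if plus == minus:
                continue
            majority = 1 if plus > minus else -1
            k = abs(plus - minus)
            if majority != code[d] and rng.random() < 1 - (1 - p2) ** k:
                new_code[d] = majority

        # Learning from the code (rate p1): socialization of individuals
        # towards the code as it stood at the start of the period.
        for b in beliefs:
            for d in range(m):
                if code[d] != 0 and b[d] != code[d] and rng.random() < p1:
                    b[d] = code[d]

        code = new_code

    return knowledge(code) / m


if __name__ == "__main__":
    # Compare slow and fast socialization for a fast- and a slow-learning code.
    for p1, p2 in [(0.1, 0.9), (0.9, 0.9), (0.1, 0.1), (0.9, 0.1)]:
        print(p1, p2, simulate_mutual_learning(p1=p1, p2=p2))
```

A single run with one seed is noisy; averaging over many seeds would be needed before comparing the output with March's reported pattern.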

11 James March's Result
In particular, “when socialization is slow, more rapid learning by the code leads to greater knowledge at equilibrium; but when socialization is rapid, greater equilibrium knowledge is achieved through slower learning by the code. By far the highest equilibrium knowledge occurs when the code learns rapidly from individuals whose socialization to the code is slow” [March, 1991, page 6].

12 James March's Result
A learning process cannot always be exploitative, because it would be myopic: “tendencies to increase exploitation and reduce exploration make adaptive processes potentially self-destructive” [March, 1991].
An adequate balance between exploitation and exploration is needed. The dilemma then concerns the possibility of calculating the correct measure of the two, and of establishing the moment at which a routine must be replaced because of its poor performance.

13 Four-Armed Bandit Experiment [Daw et al., 2006]
In their research, Daw et al. [2006] investigate which specific brain areas are activated during explorative and exploitative tasks in a “four-armed bandit problem”.
In their experiment, Daw et al. involved 14 healthy subjects who made repeated choices between four differently coloured slot machines that appeared on a screen.
The pay-offs that the subjects can get are between 1 and 100 points, drawn from a Gaussian distribution that the subject does not know. These features of the experimental design allow the study of explorative and exploitative decisions “under uniform conditions, in the context of a single task” [Daw et al., 2006].
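To make the task concrete, the following is a minimal sketch of a four-armed bandit with Gaussian pay-offs bounded between 1 and 100. The class name `FourArmedBandit`, the standard deviations, the starting range of the hidden means, and the clipped random-walk drift are illustrative assumptions; the slides only state that pay-offs lie between 1 and 100 and come from a Gaussian distribution unknown to the subject.

```python
import random


class FourArmedBandit:
    """Four slot machines with hidden, slowly drifting mean pay-offs."""

    def __init__(self, sd_payoff=4.0, sd_drift=3.0, seed=0):
        self.rng = random.Random(seed)
        self.sd_payoff = sd_payoff  # assumed noise around each hidden mean
        self.sd_drift = sd_drift    # assumed trial-to-trial drift of the means
        # Assumption: hidden means start somewhere in the middle of the range.
        self.means = [self.rng.uniform(20, 80) for _ in range(4)]

    def pull(self, arm):
        """Return an integer pay-off between 1 and 100 for the chosen arm."""
        payoff = round(self.rng.gauss(self.means[arm], self.sd_payoff))
        return min(100, max(1, payoff))

    def drift(self):
        """Move every hidden mean by a small Gaussian step, kept in [1, 100]."""
        self.means = [
            min(100.0, max(1.0, mu + self.rng.gauss(0.0, self.sd_drift)))
            for mu in self.means
        ]


if __name__ == "__main__":
    bandit = FourArmedBandit(seed=42)
    for trial in range(5):
        arm = trial % 4  # cycle through the arms just to show the interface
        print(trial, arm, bandit.pull(arm))
        bandit.drift()
```

Because the hidden means keep moving, an arm that paid best some trials ago may no longer do so, which is what makes occasional exploration worthwhile in this kind of task.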

14 Experimental Findings
In follow-up interviews the subjects explain their strategies: most of them (11 of 14) report that they occasionally tried the different slots to work out which currently had the highest pay-offs (exploring), while at other times they chose the slot they thought had the highest pay-off (exploiting).
The aim of the research is to give a quantitative basis to reinforcement learning strategies for exploration, which differ in “how exploratory actions are directed” [Daw et al., 2006].

15 Reinforcement Learning Rules
ε-greedy rule: “select the action with the maximum value function most of the time, but choose randomly among the remaining options with a small probability (ε)” [Lee]. The maximization option exploits, while the random option is the exploratory move. When the agent chooses randomly, she chooses equally among all options, that is, “uniformly and independently of the action-value estimates”.
softmax rule: the agent weighs the differences in the value function of the options. The option she thinks is the best is still chosen most often, but the others are selected with probabilities graded by their estimated values. While in ε-greedy the random selection is equally distributed among all options, the softmax rule uses a Gibbs, or Boltzmann, distribution. With the ε-greedy rule, therefore, “it is as likely to choose the worst-appearing action as it is to choose the next-to-best action. In tasks where the worst actions are very bad, this may be unsatisfactory.”
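The two rules can be compared side by side in a short Python sketch. It is an illustration under assumptions rather than the model fitted by Daw et al.: the function names, the `temperature` parameter of the Boltzmann distribution, and the example value estimates are all hypothetical. The ε-greedy exploratory move is implemented here as a uniform draw over all options.

```python
import math
import random


def epsilon_greedy(values, epsilon, rng):
    """With probability epsilon explore uniformly; otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))  # exploratory move, ignores values
    return max(range(len(values)), key=values.__getitem__)  # greedy move


def softmax_probabilities(values, temperature):
    """Gibbs/Boltzmann choice probabilities for the given value estimates."""
    top = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_choice(values, temperature, rng):
    """Sample an option with probability graded by its estimated value."""
    probs = softmax_probabilities(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    estimates = [60.0, 55.0, 10.0, 5.0]  # hypothetical value estimates
    print("softmax probabilities:", softmax_probabilities(estimates, 10.0))
    print("epsilon-greedy pick:", epsilon_greedy(estimates, 0.1, rng))
    print("softmax pick:", softmax_choice(estimates, 10.0, rng))
```

Under ε-greedy, an exploratory draw picks the worst option as often as the second-best one, whereas the softmax probabilities fall off with the estimated value. A lower temperature makes softmax behave more greedily; a higher one makes it closer to uniform.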

16 Empirical Evidence from the Experiment
Daw et al. [2006] classify each trial as explorative or exploitative and show the brain activity associated with each choice.
- Striatum and ventromedial prefrontal cortex: exploitative decision making.
- Frontopolar cortex, “a region considered important for the control of cognitive functions” [Lee, 2006], and intraparietal sulcus: explorative decisions.
Moreover, activity in regions of the medial orbitofrontal cortex (mOFC) correlates significantly with the number of points the subject receives, so these regions provide an immediate reinforcement signal. This finding is also consistent with research indicating the OFC as the main brain area that encodes economic value [Padoa-Schioppa, 2006].

17 Towards a Conclusion: Would James March agree with this interpretation?
In my opinion, he would: experimental economics as a more realistic approach.
In the exploration-exploitation dilemma, the move to explore can be seen as a random shock that we can call “ε”, and it is not possible to predict correctly the specific and contingent moment at which it will happen.
The model of bounded and procedural rationality finds here a field of application in a particular kind of rationality that we would like to call “residual rationality”: the subject exercises cognitive control over the routine elaborated so far, learns from experience and makes inferences, but is not able to forecast exactly when the random shock that makes the routine change will take place.