Modeling the Motivation-Learning Interface in Learning and Decision Making (FA9550-06-1-0204) PIs: W. Todd Maddox (University of Texas, Austin) & Arthur B. Markman (University of Texas, Austin)


Modeling the Motivation-Learning Interface in Learning and Decision Making (FA9550-06-1-0204)
PIs: W. Todd Maddox (University of Texas, Austin) & Arthur B. Markman (University of Texas, Austin)
AFOSR Joint Program Review - Cognition and Decision Program and Human-System Interface Program (Jan 28-30, 2009, Arlington, VA)

Motivation-Learning Interface (Maddox/Markman)
Objective:
– To understand the influence of motivational incentives on learning and performance through empirical and computational model-based analyses.
– To improve mathematical models of learning and performance based on data.
Technical Approach:
– Manipulate people’s motivational state through global and local incentive manipulations.
– Conduct experiments on choice and signal detection to understand how motivation affects the optimality of performance and the exploration/exploitation tradeoff.
DoD Benefit:
– Motivational states guide actions but differ across military and non-military settings.
– Goal is to identify motivational states that optimize performance in each setting.
– Behavioral and model-based analyses illuminate these effects and characterize them along an exploration-exploitation continuum.
Budget (Actual/Planned $K): FY06 FY07 FY08 $152
Annual Progress Report Submitted? FY06: Y; FY07: Y; FY08: N
Project End Date: 2/28/09

List of Project Goals
1. Develop and test a choice/gambling task
2. Examine and model motivational influences on this choice task
3. Examine and model variants of this task
4. Explore social influences on motivation, learning, and performance
5. Extend to and model related tasks (signal detection, decision criterion learning, dynamic decision making)

Progress Towards Goals (or New Goals)
1. Develop and test a choice/gambling task: Done
2. Examine and model motivational influences on this choice task: Initial studies completed and published
3. Examine and model variants of this task: Some completed and published; others in progress
4. Explore social influences on motivation, learning, and performance: Initial studies completed and published; others in progress
5. Extend to and model related tasks (signal detection, decision criterion learning, dynamic decision making): Work in progress

Research Questions What does it mean to “motivate” someone to do “well”? How do we achieve this aim?

Layman’s Answer
What does it mean to “motivate” someone to do “well”?
– Get them to “try harder” (maximize number correct, targets destroyed, etc.)
How do we achieve this aim?
– Give them an incentive for maximizing (raise, promotion, etc.)
Our research suggests that offering a global incentive (raise) for maximizing a local incentive (number correct) is too simple a story and is misleading, even if we define “trying harder” as “attempting to respond optimally”.

Three-Factor Framework
The influence of motivating incentives on performance involves a complex three-way interaction among three factors.
Global incentives (Factor 1)
– Approach some global reward (a raise), or
– Avoid losing some reward (avoid a pay cut)
Local incentives (Factor 2)
– Maximize gains (maximize points earned)
– Minimize losses (minimize points lost)
Task demand (Factor 3): what strategy is optimal?
– Exploration or exploitation optimal

Overview of this talk
Three-factor (regulatory fit) framework
Studies of choice
Extensions of choice task and model
Social influences on motivation
Extensions to signal detection, decision criterion learning, dynamic decision making

Global Incentives (Regulatory Focus)
Approach (Promotion Focus)
– Achieve global task performance criterion -> raffle ticket for $50
Avoidance (Prevention Focus)
– Achieve global task performance criterion -> keep $50 raffle ticket given initially

Task Reward Structure (Local Trial-by-Trial Task Goal)
Gains
– Earn points for all responses (earn more points for a correct choice than for an incorrect choice)
Losses
– Lose points for all responses (lose fewer points for a correct choice than for an incorrect choice)

Consider the bigger picture
Hypothesis: Fit increases exploration
Exploration can be defined within tasks
– Willingness to shift strategies
– Willingness to explore a set of options

Consider the bigger picture
Almost all cognitive research involves a promotion focus and a gains reward structure
– Promotion focus: small monetary reward or social contract with experimenter
– Gains: reward for correct response, no reward for error
Promotion focus + gains = Fit

Three-way interaction
Exploration optimal
              Gains             Losses
Promotion     Fit: Good         Mismatch: Poor
Prevention    Mismatch: Poor    Fit: Good
Exploitation optimal
              Gains             Losses
Promotion     Fit: Poor         Mismatch: Good
Prevention    Mismatch: Good    Fit: Poor
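The pattern in the two tables follows from one rule: regulatory fit (promotion with gains, or prevention with losses) is hypothesized to favor exploration, so performance is predicted to be good only when the favored strategy matches what the task demands. A minimal Python sketch of that rule is given below; the function names and string labels are hypothetical and serve only to restate the tables.

```python
# Hypothetical labels for the three factors in the framework.
FOCI = {"promotion", "prevention"}          # global incentive (Factor 1)
REWARDS = {"gains", "losses"}               # local incentive (Factor 2)
TASKS = {"exploration", "exploitation"}     # task demand (Factor 3)


def has_regulatory_fit(focus: str, reward_structure: str) -> bool:
    """Fit = promotion with gains, or prevention with losses."""
    if focus not in FOCI or reward_structure not in REWARDS:
        raise ValueError("unknown focus or reward structure")
    return (focus == "promotion") == (reward_structure == "gains")


def predicted_performance(focus: str, reward_structure: str,
                          optimal_strategy: str) -> str:
    """Return 'good' or 'poor' according to the three-way interaction."""
    if optimal_strategy not in TASKS:
        raise ValueError("unknown task demand")
    # Hypothesis from the slides: fit increases exploration,
    # mismatch leaves behavior more exploitative.
    favored = "exploration" if has_regulatory_fit(focus, reward_structure) else "exploitation"
    return "good" if favored == optimal_strategy else "poor"


if __name__ == "__main__":
    for task in sorted(TASKS):
        for focus in sorted(FOCI):
            for reward in sorted(REWARDS):
                print(task, focus, reward,
                      predicted_performance(focus, reward, task))
```

Calling `predicted_performance("promotion", "gains", "exploration")` reproduces the "Fit: Good" cell of the first table.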

Choice/Gambling task
Does regulatory fit affect choice?
Two-deck variant of the Iowa Gambling Task
– Task 1: Exploration optimal
– Task 2: Exploitation optimal
Regulatory focus (global incentive)
– Earn ticket or avoid losing ticket
Reward structure (local incentive)
– Gains vs. losses
Worthy, Maddox, & Markman (2007, PB&R)

Gains Condition Example

[Screenshots: task screen with a “PICK A CARD!” prompt and a Yes/No bonus indicator; feedback screens read “Correct” with the values 181, 450, 7 and then 184, 450, 3.]

Losses Condition Example

[Screenshots: task screen with a “PICK A CARD!” prompt and a Yes/No bonus indicator.]

Regulatory Fit and Choice
At any moment, you have an estimate of the relative goodness of the decks
– If you choose deterministically from the better deck, you are exploiting
– If you choose more probabilistically, you are exploring
– Does regulatory fit lead to more exploration than regulatory mismatch?

Modeling Choice Behavior
EVs of each option are updated via a recency-weighted algorithm:
$EV_{t+1} = EV_t + \alpha \, (r_t - EV_t)$
where $EV_{t+1}$ is the new EV, $EV_t$ is the current EV, $r_t$ is the reward, and $\alpha$ is the recency parameter.
If the reward is greater than the current EV, the EV increases.
If the reward is less than the current EV, the EV decreases.

$\alpha$ is a free parameter constrained to be between 0 and 1.
Higher $\alpha$ values give greater weight to recent rewards.
When $\alpha = 1$, the updating equation reduces to: $EV_{t+1} = r_t$
Alternatively, when $\alpha = 0$, the updating equation reduces to: $EV_{t+1} = EV_t$

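Action Selection
Action selection is probabilistically determined via choice rules (e.g., Luce, 1959).
Softmax Rule:
$P(A) = \frac{e^{\gamma \, EV_A}}{\sum_j e^{\gamma \, EV_j}}$
where $P(A)$ is the probability of choosing option “A”, $EV_A$ is the EV for option “A”, $\gamma$ is the exploitation parameter, and the denominator sums over the EVs for all options.
Higher $\gamma$ values indicate greater exploitation.
Lower $\gamma$ values indicate greater exploration.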
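The recency-weighted update and the softmax rule together define a small reinforcement learning model of deck choice. The Python sketch below illustrates how the two pieces could fit together; it is not the authors' fitting code. The names `update_ev`, `softmax_probs`, and `choose`, the symbols `alpha` (recency parameter) and `gamma` (exploitation parameter), and the deck payoffs in the demo are assumptions made for illustration.

```python
import math
import random


def update_ev(current_ev: float, reward: float, alpha: float) -> float:
    """Recency-weighted update: move the EV toward the reward by a fraction alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return current_ev + alpha * (reward - current_ev)


def softmax_probs(evs, gamma: float):
    """Softmax choice probabilities; larger gamma means more exploitation."""
    scaled = [gamma * ev for ev in evs]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def choose(evs, gamma: float, rng: random.Random) -> int:
    """Sample the index of a deck according to the softmax probabilities."""
    probs = softmax_probs(evs, gamma)
    return rng.choices(range(len(evs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    deck_means = [6.0, 4.0]   # hypothetical payoffs, not the task's actual decks
    evs = [0.0, 0.0]          # initial expected values
    for _ in range(100):
        deck = choose(evs, gamma=0.5, rng=rng)
        reward = rng.gauss(deck_means[deck], 1.0)
        evs[deck] = update_ev(evs[deck], reward, alpha=0.2)
    print("learned EVs:", evs)
```

With `gamma = 0` every deck is equally likely (pure exploration); as `gamma` grows, choices concentrate on the deck with the highest current EV.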

Exploration Optimal
Predictions
– Participants in regulatory fit should perform better
– Regulatory fit should yield a smaller exploitation parameter (see Action Selection)

Exploration optimal - Points Analysis

Exploitation Parameter (larger value = greater exploitation)

Exploitation Optimal Results (Gains only) Predictions supported

Summary
Regulatory fit -> exploratory behavior
Fit -> good performance when exploration is optimal
Fit -> poor performance when exploitation is optimal
Replicates the pattern seen in classification (Maddox et al., 2006; Grimm et al., 2008)

Affect and Choice
Four-deck variant of the Iowa Gambling Task
– Exploitation optimal
Alternative method for inducing regulatory focus
– Smile vs. frown faces on all cards
Reward structure
– Gains vs. losses
Worthy, Maddox, & Markman (in preparation)

[Screenshots: task screen with a “PICK A CARD!” prompt and a Yes/No bonus indicator.]

Predictions
Since exploitation is optimal, and assuming
– Smile = promotion
– Frown = prevention
Predictions
– Participants in regulatory fit should perform worse
– Regulatory fit should yield a smaller exploitation parameter
Worthy, Maddox, & Markman (in preparation)

Points Analysis

Exploitation Parameter

Summary
Predictions supported
Same behavioral and model pattern for the regulatory focus and affect manipulations
Follow-up studies (running)
– Exploration-optimal version of the affect task
– Model comparison project
– Feedback/ITI delays

Social Motivation and Cognition
Choice studies so far
– Explicit incentives to induce regulatory focus
– Affect to induce regulatory focus
Other social factors can affect regulatory focus
– Stereotype threat: negative self-relevant stereotype -> poor performance
– Negative stereotypes may induce a prevention focus
– If so, a losses environment should attenuate the effect
– DoD relevant due to hierarchical structure
Grimm, Markman, & Maddox (2009; JPSP)

Stereotype Threat – Math problems

Exploration Optimal Classification
o = category A = long, steep lines
+ = category B = all others
Task requires exploration of the space of possible rules.

Possible Rule-based Strategies 83% accuracy 100% accuracy

Experiment Screen Sample Gains Losses

Method
Three-dimensional classification task
– Exploration is optimal
Arbitrary stereotypes given to participants
– Women are better
– Men are better
Manipulated gains and losses of points
Predictions
– Traditional stereotype threat result for gains
– Reversed stereotype threat result for losses

Task Accuracy Experiment 1: Women are Better Experiment 2: Men are Better

Model-Analyses - CJ Use Experiment 1: Women are Better Experiment 2: Men are Better

Work In Progress
Exploitation-optimal task in progress
– Involves information-integration classification
– Prediction: pattern should completely reverse
End-of-semester effect
– Participants are prevention focused, so they should do better with losses
– Supported

State and Trait Factors Affect Global Incentive Focus
Manipulate global incentive focus (state variables)
– Explicit monetary
– Affect/social stereotype
Trait variables
– Procrastinators (end-of-semester)
– Personality characteristics: impulsivity, sensation seeking, anxiety, depression
– IMPASS -> bias toward simple rules (Tharp, Pickering, & Maddox, under review)

Task and Model Extensions

Signal Detection
Two-stimulus identification (line length)
Promotion/Prevention x Gains/Losses
Biased payoffs, so accuracy maximization must be abandoned (exploration optimal)

Preliminary Results Early learning effect on sensitivity. –Fit leads to increased sensitivity. No systematic effects on bias.

Extended training Effect emerges on bias with extended training –Fit leads to bias shift toward optimal. No systematic effects on sensitivity
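Sensitivity and bias in results like these are commonly summarized with the equal-variance Gaussian signal detection measures d' and c. The sketch below shows that standard computation in Python as an illustration only; it is not the analysis reported in the slides. The function name and the hit/false-alarm framing (treating one of the two line lengths as the "signal") are assumptions.

```python
from statistics import NormalDist


def dprime_and_criterion(hit_rate: float, false_alarm_rate: float):
    """Equal-variance Gaussian SDT: return (d', c).

    d' = z(H) - z(FA) measures sensitivity.
    c  = -(z(H) + z(FA)) / 2 measures bias (0 = unbiased,
         negative = liberal, positive = conservative).
    Rates must lie strictly between 0 and 1; inv_cdf raises
    statistics.StatisticsError otherwise.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)


if __name__ == "__main__":
    # Hypothetical response rates, for illustration only.
    for h, fa in [(0.8, 0.2), (0.9, 0.3)]:
        d, c = dprime_and_criterion(h, fa)
        print(f"H={h}, FA={fa}: d'={d:.3f}, c={c:.3f}")
```

Under biased payoffs the reward-maximizing criterion is no longer c = 0, which is why a shift in bias toward the optimal value can improve earnings even when sensitivity is unchanged.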

Confidence paradigm Classification and Confidence judgment obtained

Nested Modeling Approach (derived from Mueller & Weidemann and Maddox & Bohil)

Preliminary Model Results Fit -> increased classification and confidence noise –Likely due to increased exploration

Followup Incorporate into Maddox and Bohil’s Hybrid Model External decision criterion ….

Summary LOCUS Plot Zhang, et al (1997; Journal of Neuroscience) Exploration-optimal tasks Exploitation-optimal tasks Strong interactions No consistent main effects

Summary
What does it mean to “motivate” someone to do “well”? How do we achieve this aim?
It is complex, but systematic and understandable.
It involves a three-way interaction of
– Global incentives
– Local incentives
– Task demand (i.e., optimal classifier strategy)

Summary (cont.) Regulatory Fit (interaction between global and local incentives) leads to increased exploration. Exploration can be advantageous or disadvantageous, depending upon the task demands.

Summary (cont.)
We successfully applied a reinforcement learning model to choice and identified an “exploitation” parameter that tracks regulatory fit effects.
We applied classification learning models to stereotype threat data and found that regulatory fit affects the flexibility of hypothesis-testing.
We are extending the approach to more basic tasks such as signal detection and criterion learning and are generalizing relevant models to account for regulatory fit effects.
Finally, we are extending the approach to more dynamic decision-making tasks, and model development is ongoing.

Future Directions
Continue model development
Applications to resource acquisition (foraging)
Exploration of other social effects on motivation
– Social influences on choking under pressure

Interaction with Other Groups and Organizations
– Interactions with AFOSR recipient (Brad Love)
– Interactions with the Institute for Advanced Technology (IAT) at UT-Austin, an Army UARC
– Interactions with the Institute for Innovation Creativity and Capital (IC²) at UT-Austin
– Interactions with the Imaging Research Center (IRC) at UT-Austin
– Interactions with the Institute for Neuroscience (INS) at UT-Austin
– Interactions with the Center for Perceptual Systems (CPS) at UT-Austin
– Interactions with Veterans Affairs Medical Center (VAMC) at UC-San Diego

List of Publications Attributed to the Grant (2008-9)
Peer-Reviewed Manuscripts
– Grimm, L.R., Markman, A.B., Maddox, W.T., & Baldwin, G.C. (2008). Differential effects of regulatory fit on classification learning. Journal of Experimental Social Psychology, 44.
– Worthy, D.A., Maddox, W.T., & Markman, A.B. (2008). Ratio and Difference Comparisons of Expected Reward in Decision Making Tasks. Memory & Cognition, 36.
– Grimm, L.R., Markman, A.B., Maddox, W.T., & Baldwin, G.C. (in press). Stereotype threat reinterpreted as regulatory fit. Journal of Personality and Social Psychology.
– Worthy, D.A., Markman, A.B., & Maddox, W.T. (in press). What is pressure? Evidence for social pressure as a type of regulatory focus. Psychonomic Bulletin and Review.
– Maddox, W.T., Glass, B.D., & Markman, A.B. (under revision). Regulatory fit effects on stimulus identification.
– Grimm, L.R., Markman, A.B., & Maddox, W.T. (under revision). Regulatory fit created by time of semester and task reward structure influences test performance.
– Glass, B.D., Markman, A.B., & Maddox, W.T. (under review). The generalized exploration model (GEM): A model of human foraging for empirical analysis.
– Markman, A.B., Beer, J.S., Grimm, L.R., Rein, J.R., & Maddox, W.T. (under review). The optimal level of fuzz: Case studies in a methodology for psychological research.
Conference Presentations
– Worthy, D., Markman, A.B., & Maddox, W.T. Are reward expectancies in choice tasks processed as ratios or differences?: Implications for theories of reward processing in the orbitofrontal cortex. Poster presented at the Annual Meeting of the Cognitive Neuroscience Society, San Francisco, CA, April.
– Worthy, D.A., Maddox, W.T., & Markman, A.B. (2007). The length of feedback interval and inter-trial interval effects decision-making in choice tasks. Poster to be presented at the Annual Meeting of the Society for Neuroeconomics, September 27-30, 2008, Hull, Massachusetts.
– Worthy, D.A., Maddox, W.T., & Markman, A.B. What is pressure? Relating social pressure to regulatory focus. Poster presented at the 49th Annual Meeting of the Psychonomics Society, Chicago, IL, November.
– Grimm, L.R., Markman, A.B., & Maddox, W.T. Minimizing Losses Improves End of Semester GRE Performance. Presentation at the Society for Personality and Social Psychology, Tampa, Florida, February.
– Glass, B.D., Filoteo, J.V., Markman, A.B., & Maddox, W.T. Regulatory focus and executive functions. Poster presented at the Annual Meeting of the Cognitive Neuroscience Society, San Francisco, CA, March.

End-of-Semester
End-of-semester participants are “bad”, “unmotivated”
Maybe in a prevention focus?
– So mismatch with most task reward structures (gains)
GRE math problems
Grimm, Markman, & Maddox (under review)

Regulatory Fit = Exploration: Why?
Empirical support in several domains
Connection to neuroscience
– Positive affect-frontal exploration hypothesis (Isen, Ashby, etc.)
– Regulatory focus-frontal activation findings (Amodio, Cunningham, etc.)
– LC-NE-exploration/exploitation relation (Aston-Jones, Cohen, Daw)

Feedback Delay, ITI and Choice Worthy, Markman & Maddox (2008; SFN) Increased ITI shown to increase exploitation in an exploration optimal task (Bogacz et al, 2007)

Design and Results
Four-deck exploitation-optimal task (gains only)
Increased feedback duration -> less switching, less exploration

Risky Decisions/Feedback Interval
Each deck has a partner
– Same EV, but one low and one high variance
Short ITI only (gains only)
Replicate effect: increased feedback duration -> less exploration
Increased feedback duration -> fewer risky choices

Followups in progress Losses variants Exploration optimal variants

Extending Models
Choice models use one of two decision rules
– Matching rules: predict that choices are affected by scalar additions to reward values, but not by scalar multiplications
– Difference rules: predict that choices are affected by scalar multiplications of reward values, but not by scalar additions
Worthy, Maddox, & Markman (2008; M&C)

Testing influence of reward value
Exp 1 (exploitation optimal), Exp 2 (exploration optimal)
– Control: Deck values 1-10. Deck A: EV=6; Deck B: EV=4
– Distance-Preserving: Deck A: EV=86; Deck B: EV=84
– Ratio-Preserving: Deck A: EV=60; Deck B: EV=40
Worthy, Maddox, & Markman (2008; M&C)
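The logic of the three conditions can be checked numerically. The Python sketch below applies a simple ratio (matching) rule and a simple difference (softmax) rule to the deck EVs listed above; the exact functional forms, the function names, and `gamma = 1` are assumptions chosen for illustration and are not necessarily the models that were fitted.

```python
import math


def matching_prob(ev_a: float, ev_b: float) -> float:
    """Ratio (matching) rule: P(A) = EV_A / (EV_A + EV_B). Assumes positive EVs."""
    return ev_a / (ev_a + ev_b)


def difference_prob(ev_a: float, ev_b: float, gamma: float = 1.0) -> float:
    """Difference (softmax) rule: P(A) depends only on gamma * (EV_A - EV_B)."""
    return 1.0 / (1.0 + math.exp(-gamma * (ev_a - ev_b)))


# Deck EVs from the three conditions (Deck A, Deck B).
CONDITIONS = {
    "control": (6.0, 4.0),
    "distance_preserving": (86.0, 84.0),
    "ratio_preserving": (60.0, 40.0),
}

if __name__ == "__main__":
    for name, (ev_a, ev_b) in CONDITIONS.items():
        print(f"{name:20s} matching P(A)={matching_prob(ev_a, ev_b):.3f}  "
              f"difference P(A)={difference_prob(ev_a, ev_b):.3f}")
```

Under the matching rule the control and ratio-preserving conditions give the same P(A) while the distance-preserving condition moves it toward 0.5; under the difference rule the control and distance-preserving conditions agree and the ratio-preserving condition pushes P(A) toward 1. Comparing behavior across the conditions therefore discriminates between the two rule families.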

Results Altering ratios has largest effect on performance

Summary
Support for 3-way interaction in choice
– Fit -> exploration
Affect manipulation similar to focus
Feedback delay increases exploitation, reduces risky choices
– Implications for training….
Ratio-preserving models supported

Rising Optimum Task
Optimal response allocation requires escaping a local minimum of reward [taken from Bogacz et al. (2007) and Montague and Berns (2002)]
Exploration optimal
Prediction: Participants in regulatory fit should perform better
– Are people in regulatory fit less sensitive to local changes in payoffs?
– If they are, they will be able to overcome the local minimum
Otto, Gureckis, Markman, & Love (in preparation)

Preliminary Results Model development in progress