Science of JDM as an Efficient Game of Mastermind Michael H. Birnbaum California State University, Fullerton Bonn, July 26, 2013.

Slides:



Advertisements
Similar presentations
Chapter 7 Hypothesis Testing
Advertisements

New Paradoxes of Risky Decision Making that Refute Prospect Theories Michael H. Birnbaum Fullerton, California, USA.
Among those who cycle most have no regrets Michael H. Birnbaum Decision Research Center, Fullerton.
This Pump Sucks: Testing Transitivity with Individual Data Michael H. Birnbaum and Jeffrey P. Bahra California State University, Fullerton.
Chapter 10.  Real life problems are usually different than just estimation of population statistics.  We try on the basis of experimental evidence Whether.
1 Upper Cumulative Independence Michael H. Birnbaum California State University, Fullerton.
1 Lower Distribution Independence Michael H. Birnbaum California State University, Fullerton.
True and Error Models of Response Variation in Judgment and Decision Tasks Michael H. Birnbaum.
Evaluating Non-EU Models Michael H. Birnbaum Fullerton, California, USA.
Who are these People Who Violate Stochastic Dominance, Anyway? What, if anything, are they thinking? Michael H. Birnbaum California State University, Fullerton.
Testing Lexicographic Semi- Order Models: Generalizing the Priority Heuristic Michael H. Birnbaum California State University, Fullerton.
Testing Heuristic Models of Risky Decision Making Michael H. Birnbaum California State University, Fullerton.
1 A Brief History of Descriptive Theories of Decision Making Kiel, June 9, 2005 Michael H. Birnbaum California State University, Fullerton.
Some New Approaches to Old Problems: Behavioral Models of Preference Michael H. Birnbaum California State University, Fullerton.
1 Distribution Independence Michael H. Birnbaum California State University, Fullerton.
1 Upper Tail Independence Michael H. Birnbaum California State University, Fullerton.
Testing Models of Stochastic Dominance Violations Michael H. Birnbaum Decision Research Center California State University, Fullerton.
1 Upper Distribution Independence Michael H. Birnbaum California State University, Fullerton.
Ten “New Paradoxes” Refute Cumulative Prospect Theory of Risky Decision Making Michael H. Birnbaum Decision Research Center California State University,
Violations of Stochastic Dominance Michael H. Birnbaum California State University, Fullerton.
Testing Critical Properties of Models of Risky Decision Making Michael H. Birnbaum Fullerton, California, USA Sept. 13, 2007 Luxembourg.
Ten “New Paradoxes” Refute Cumulative Prospect Theory of Risky Decision Making Michael H. Birnbaum Decision Research Center California State University,
New Paradoxes of Risky Decision Making that Refute Prospect Theories Michael H. Birnbaum Fullerton, California, USA.
The Simple Regression Model
1 The Case Against Prospect Theories of Risky Decision Making Michael H. Birnbaum California State University, Fullerton.
Evaluating Hypotheses
Testing Transitivity (and other Properties) Using a True and Error Model Michael H. Birnbaum.
Web-Based Program of Research on Risky Decision Making Michael H. Birnbaum California State University, Fullerton.
Web-Based Program of Research on Risky Decision Making Michael H. Birnbaum California State University, Fullerton.
1 A Brief History of Descriptive Theories of Decision Making: Lecture 2: SWU and PT Kiel, June 10, 2005 Michael H. Birnbaum California State University,
1 Gain-Loss Separability and Reflection In memory of Ward Edwards Michael H. Birnbaum California State University, Fullerton.
I’m not overweight It just needs redistribution Michael H. Birnbaum California State University, Fullerton.
1 Ten “New Paradoxes” of Risky Decision Making Michael H. Birnbaum Decision Research Center California State University, Fullerton.
1 Gain-Loss Separability Michael H. Birnbaum California State University, Fullerton.
Is there Some Format in Which CPT Violations are Attenuated? Michael H. Birnbaum Decision Research Center California State University, Fullerton.
1 Lower Cumulative Independence Michael H. Birnbaum California State University, Fullerton.
Experimental Evaluation
Stochastic Dominance Michael H. Birnbaum Decision Research Center California State University, Fullerton.
Web-Based Program of Research on Risky Decision Making Michael H. Birnbaum California State University, Fullerton.
Testing Transitivity with Individual Data Michael H. Birnbaum and Jeffrey P. Bahra California State University, Fullerton.
1 BA 555 Practical Business Analysis Review of Statistics Confidence Interval Estimation Hypothesis Testing Linear Regression Analysis Introduction Case.
1 Restricted Branch Independence Michael H. Birnbaum California State University, Fullerton.
Definitions In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test is a standard procedure for testing.
Probability Population:
Presidential Address: A Program of Web-Based Research on Decision Making Michael H. Birnbaum SCiP, St. Louis, MO November 18, 2010.
Hypothesis Testing in Linear Regression Analysis
Behavior in the loss domain : an experiment using the probability trade-off consistency condition Olivier L’Haridon GRID, ESTP-ENSAM.
Fundamentals of Data Analysis Lecture 4 Testing of statistical hypotheses.
Decision making Making decisions Optimal decisions Violations of rationality.
Statistics & Biology Shelly’s Super Happy Fun Times February 7, 2012 Will Herrick.
Chapter 21: More About Tests “The wise man proportions his belief to the evidence.” -David Hume 1748.
Stochastic choice under risk Pavlo Blavatskyy June 24, 2006.
Properties of OLS How Reliable is OLS?. Learning Objectives 1.Review of the idea that the OLS estimator is a random variable 2.How do we judge the quality.
A Stochastic Expected Utility Theory Pavlo R. Blavatskyy June 2007.
Lecture 16 Section 8.1 Objectives: Testing Statistical Hypotheses − Stating hypotheses statements − Type I and II errors − Conducting a hypothesis test.
Ellsberg’s paradoxes: Problems for rank- dependent utility explanations Cherng-Horng Lan & Nigel Harvey Department of Psychology University College London.
Testing Transitivity with a True and Error Model Michael H. Birnbaum California State University, Fullerton.
Axiomatic Theory of Probabilistic Decision Making under Risk Pavlo R. Blavatskyy University of Zurich April 21st, 2007.
Data Analysis.
Inference: Probabilities and Distributions Feb , 2012.
Allais Paradox, Ellsberg Paradox, and the Common Consequence Principle Then: Introduction to Prospect Theory Psychology 466: Judgment & Decision Making.
Can a Dominatrix Make My Pump Work? Michael H. Birnbaum CSUF Decision Research Center.
Statistics (cont.) Psych 231: Research Methods in Psychology.
Fundamentals of Data Analysis Lecture 4 Testing of statistical hypotheses pt.1.
Logic of Hypothesis Testing
Chapter 5 STATISTICAL INFERENCE: ESTIMATION AND HYPOTHESES TESTING
Psych 231: Research Methods in Psychology
Psych 231: Research Methods in Psychology
Inferential Statistics
New Paradoxes of Risky Decision Making that Refute Prospect Theories
Presentation transcript:

Science of JDM as an Efficient Game of Mastermind Michael H. Birnbaum California State University, Fullerton Bonn, July 26, 2013

Mastermind Game- Basic Game

Auf Deutsch => “SuperHirn”

Mastermind Game Goal to find secret code of colors in positions. In “basic” game, there are 4 positions and 6 colors, making 6 4 = 1296 hypotheses. Each “play” of the game is an experiment that yields feedback as to the accuracy of an hypothesis. For each “play”, feedback = 1 black peg for each color in correct position and 1 white peg for each correct color in wrong position.

Play Mastermind Online online.com/mastermind/index.php online.com/mastermind/index.php (Mastermind is a variant of “Bulls and Cows”, an earlier code-finding game.)

A Game of Mastermind- 8,096 = 8 4

Analogies EXPERIMENTS yield results, from which we revise our theories. RECORD of experiments and results is preserved. Experiments REDUCE THE SPACE of compatible with evidence. Hypotheses can be PARTITIONED with respect to components.

Science vs. Mastermind In Mastermind, feedback is 100% accurate; in science, feedback contains “error” and “bias.” Repeat/revise the “same” experiment, different results. In Mastermind, we can specify the space of hypotheses exactly, but in science, the set of theories under contention expands as people construct new theories. In Mastermind, we know when we are done; science is never done.

Analogies EFFICIENT Mastermind is the goal: Find the secret code with fewest experiments. If FEEDBACK IS NOT PERFECT, results are fallible, and it would be a mistake to build theory on such fallible results. REPLICATION is needed, despite the seeming loss of efficiency.

Hypothesis Testing vs. Mastermind Suppose we simply tested hypotheses, one at a time and a significance tests says “reject” or “retain”? With 1296 hypotheses, we get closer to truth with each rejection--BARELY. Now suppose that 50% of the time we fail to reject false theories and 5% of the time we reject a true theory. Clearly, significance testing this way is not efficient. More INFORMATIVE FEEDBACK needed.

Experiments that Divide the Space of Hypotheses in Half Basic game = 1296 Hypotheses Suppose each experiment cuts space in half: 1296, 648, 324, 162, 81, 40.5, 20.25, 10.1, 5.1, 2.5, 1.3, done. 11 moves. But typical game with 1296 ends after 4 or 5 moves, infrequently 6. So, Mastermind is more efficient than “halving” of the space.

Index of Fit Informative? Suppose we assign numbers to each color, R = 1, G = 2, B = 3, etc. and calculate a correlation coefficient between the code and the experimental results? This index could be highly misleading, it depends on the coding and experiment. Fit could be higher for “worse” theories. (Devil rides again 1970s).

Psychology vs. Mastermind Mastermind: only ONE secret code. In Psychology, we allow that different people might have different individual difference parameters. Even more complicated: Perhaps different people have different models. As if, different experiments in the game have DIFFERENT secret codes.

Partitions of Hypotheses Red GreenBlue

Testing Critical Properties Test properties that do not depend on parameters. Such properties partition the space of hypotheses, like the test of all REDs. For example: CPT (including EU) implies STOCHASTIC DOMINANCE. This follows for any set of personal parameters (any utility/value function and any prob. weighting function).

Critical Tests are Theorems of One Model that are Violated by Another Model This approach has advantages over tests or comparisons of fit. It is not the same as “axiom testing.” Use model-fitting to rival model to predict where to find violations of theorems deduced from model tested.

Outline I will discuss critical properties that test between nonnested theories: CPT and TAX. Lexicographic Semiorders vs. family of transitive, integrative models (including CPT and TAX). Integrative Contrast Models (e.g., Regret, Majority Rule) vs. transitive, integrative models.

Cumulative Prospect Theory/ Rank-Dependent Utility (RDU)

TAX Model

“Prior” TAX Model Assumptions:

TAX Parameters For 0 < x < $150 u(x) = x Gives a decent approximation. Risk aversion produced by  

TAX and CPT nearly identical for binary (two-branch) gambles CE (x, p; y) is an inverse-S function of p according to both TAX and CPT, given their typical parameters. Therefore, there is little point trying to distinguish these models with binary gambles.

Non-nested Models

CPT and TAX nearly identical inside the M&M prob. simplex

Testing CPT Coalescing Stochastic Dominance Lower Cum. Independence Upper Cumulative Independence Upper Tail Independence Gain-Loss Separability TAX:Violations of:

Testing TAX Model 4-Distribution Independence (RS’) 3-Lower Distribution Independence 3-2 Lower Distribution Independence 3-Upper Distribution Independence (RS’) Res. Branch Indep (RS’) CPT: Violations of:

Stochastic Dominance A test between CPT and TAX: G = (x, p; y, q; z) vs. F = (x, p – s; y’, s; z) Note that this recipe uses 4 distinct consequences: x > y’ > y > z > 0; outside the probability simplex defined on three consequences. CPT  choose G, TAX  choose F Test if violations due to “error.”

Error Model Assumptions Each choice pattern in an experiment has a true probability, p, and each choice has an error rate, e. The error rate is estimated from inconsistency of response to the same choice by same person in a block of trials. The “true” p is then estimated from consistent (repeated) responses to same question.

Violations of Stochastic Dominance 122 Undergrads: 59% TWO violations (BB) 28% Pref Reversals (AB or BA) Estimates: e = 0.19; p = Experts: 35% repeated violations 31% Reversals Estimates: e = 0.20; p = 0.50 A: 5 tickets to win $12 5 tickets to win $14 90 tickets to win $96 B: 10 tickets to win $12 5 tickets to win $90 85 tickets to win $96

42 Studies of Stochastic Dominance, n = 12,152* Large effects of splitting vs. coalescing of branches Small effects of education, gender, study of decision science Very small effects of 15 probability formats and request to justify choices. Miniscule effects of event framing (framed vs unframed) * (as of 2010)

Summary: Prospect Theories not Descriptive Violations of Coalescing Violations of Stochastic Dominance Violations of Gain-Loss Separability Dissection of Allais Paradoxes: viols of coalescing and restricted branch independence; RBI violations opposite of Allais paradox; opposite CPT.

Results: CPT makes wrong predictions for all 12 tests Can CPT be saved by using different formats for presentation? Violations of coalescing, stochastic dominance, lower and upper cumulative independence replicated with 14 different formats and ten-thousands of participants. Psych Review 2008 & JDM 2008 “new tests” of CPT and PH.

Lexicographic Semiorders Intransitive Preference. Priority heuristic of Brandstaetter, Gigerenzer & Hertwig is a variant of LS, plus some additional features. In this class of models, people do not integrate information or have interactions such as the probability X prize interaction in family of integrative, transitive models (CPT, TAX, GDU, EU and others)

LPH LS: G = (x, p; y) F = (x’, q; y’) If (y –y’ >  ) choose G Else if (y ’- y >  ) choose F Else if (p – q >  ) choose G Else if (q – p >  ) choose F Else if (x – x’ > 0) choose G Else if (x’ – x > 0) choose F Else choose randomly

Family of LS In two-branch gambles, G = (x, p; y), there are three dimensions: L = lowest outcome (y), P = probability (p), and H = highest outcome (x). There are 6 orders in which one might consider the dimensions: LPH, LHP, PLH, PHL, HPL, HLP. In addition, there are two threshold parameters (for the first two dimensions).

Testing Lexicographic Semiorder Models Allais Paradoxes Violations of Transitivity Violations of Priority Dominance Integrative Independence Interactive Independence EU CPT TAX LS

New Tests of Independence Dimension Interaction: Decision should be independent of any dimension that has the same value in both alternatives. Dimension Integration: indecisive differences cannot add up to be decisive. Priority Dominance: if a difference is decisive, no effect of other dimensions.

Taxonomy of choice models TransitiveIntransitive Interactive & Integrative EU, CPT, TAX Regret, Majority Rule Non-interactive & Integrative Additive, CWA Additive Diffs, SDM Not interactive or integrative 1-dim.LS, PH*

Dimension Interaction RiskySafeTAXLPHHPL ($95,.1;$5)($55,.1;$20)SSR ($95,.99;$5)($55,.99;$20)RSR

Family of LS 6 Orders: LPH, LHP, PLH, PHL, HPL, HLP. There are 3 ranges for each of two parameters, making 9 combinations of parameter ranges. There are 6 X 9 = 54 LS models. But all models predict SS, RR, or ??.

Results: Interaction n = 153 RiskySafe% Safe Est. p ($95,.1;$5)($55,.1;$20)71%.76 ($95,.99;$5)($55,.99;$20)17%.04

Analysis of Interaction Estimated probabilities: P(SS) = 0 (prior PH) P(SR) = 0.75 (prior TAX) P(RS) = 0 P(RR) = 0.25 Priority Heuristic: Predicts SS

Probability Mixture Model Suppose each person uses a LS on any trial, but randomly switches from one order to another and one set of parameters to another. But any mixture of LS is a mix of SS, RR, and ??. So no LS mixture model explains SR or RS.

Results: Dimension Integration Data strongly violate independence property of LS family Data are consistent instead with dimension integration. Two small, indecisive effects can combine to reverse preferences. Observed with all pairs of 2 dims. Birnbaum, in J. math Psych, 2010.

New Studies of Transitivity LS models violate transitivity: A > B and B > C implies A > C. Birnbaum & Gutierrez (2007) tested transitivity using Tversky’s gambles, using typical methods for display of choices. Text displays and pie charts with and without numerical probabilities. Similar results with all 3 procedures.

Replication of Tversky (‘69) with Roman Gutierrez 3 Studies used Tversky’s 5 gambles, formatted with tickets and pie charts. Exp 1, n = 251, tested via computers.

Three of Tversky’s (1969) Gambles A = ($5.00, 0.29; $0) C = ($4.50, 0.38; $0) E = ($4.00, 0.46; $0) Priority Heurisitc Predicts: A preferred to C; C preferred to E, But E preferred to A. Intransitive. TAX (prior): E > C > A

Response Combinations Notation(A, C)(C, E)(E, A) 000ACE* PH 001ACA 010AEE 011AEA 100CCE 101CCA 110CEETAX 111CEA*

Results-ACE patternRep 1Rep 2Both 000 (PH) (TAX) sum

Comments Results were surprisingly transitive. Differences: no pre-test, selection; Probability represented by # of tickets (100 per urn); similar results with pies. Birnbaum & Gutierrez, 2007, OBHDP Regenwetter, Dana, & Davis-Stober also conclude that evidence against transitivity is weak., Psych Review, Birnbaum & Bahra: most Ss transitive. JDM, 2012.

Summary Priority Heuristic model’s predicted violations of transitivity are rare. Dimension Interaction violates any member of LS models including PH. Dimension Integration violates any LS model including PH. Evidence of Interaction and Integration compatible with models like EU, CPT, TAX. Birnbaum, J. Mathematical Psych

Integrative Contrast Models Family of Integrative Contrast Models Special Cases: Regret Theory, Majority Rule (aka Most Probable Winner) Predicted Intransitivity: Forward and Reverse Cycles Research with Enrico Diecidue

Integrative, Interactive Contrast Models

Assumptions

Special Cases Majority Rule (aka Most Probable Winner) Regret Theory Other models arise with different functions, f.

Regret Aversion

Regret Model

Majority Rule Model

Predicted Intransitivity These models violate transitivity of preference Regret and MR cycle in opposite directions However, both REVERSE cycle under permutation over events; i.e., “juxtaposition.” aka, “Recycling”

Example Urn: 33 Red, 33White, 33 Blue One marble drawn randomly Prize depends on color drawn. A = ($4, $5, $6) means win $400 if Red, win $500 if White, $600 if Blue. (Study 1 used values x 100).

Majority Rule Prediction A = ($4, $5, $6) B = ($5, $7, $3) C = ($9, $1, $5) AB: choose B BC: choose C CA: choose A Notation: 222 A’ = ($6, $4, $5) B’ = ($5, $7, $3) C’ = ($1, $5, $9) A’B’: choose A’ B’C’: choose B’ C’A’: choose C’ Notation: 111

Regret Prediction A = ($4, $5, $6) B = ($5, $7, $3) C = ($9, $1, $5) AB: choose A BC: choose B CA: choose C Notation: 111 A’ = ($6, $4, $5) B’ = ($5, $7, $3) C’ = ($1, $5, $9) A’B’: choose B’ B’C’: choose C’ C’A’: choose A’ Notation: 222

Non-Nested Models TAX, CPT, GDU, etc. Integrative Contrast Models Intransitivity Allais Paradoxes Violations Of RBI Transitive Recycling Restricted Branch Independence

Study with E. Diecidue 240 Undergraduates Tested via computers (browser) Clicked button to choose 30 choices (includes counterbalanced choices) 10 min. task, 30 choices repeated again.

Recycling Predictions of Regret and Majority Rule

ABC Design Results

A’B’C’ Results

ABC X A’B’C’ Analysis

ABC-A’B’C’ Analysis

Results Most people are transitive. Most common pattern is 112, pattern predicted by TAX with prior parameters. However, 2 people were perfectly consistent with MR on 24 choices (incl. Recycling pattern). No one fit Regret theory perfectly.

Results: Continued Among those few (est. ~ 9%) who cycle and recycle (intransitive), most have no regrets (i.e., they appear to satisfy MR). Systematic Violations of RBI. Suppose 9% of participants are intransitive. Can we increase the rate of intransitivity? A second study attempted to increase the rate: changed display, but estimated rate MR was lower (~6%).

Conclusions Violations of transitivity predicted by regret, MR, LS appear to be infrequent. Violations of Integrative independence, priority dominance, interactive independence are frequent, contrary to family of LS, including the PH. “New paradoxes” rule out CPT and EU but are consistent with TAX. Violations of critical properties mean that a model must be revised or rejected.

30 Years Later- Old Bull Story