Published by Ross Todd. Modified over 9 years ago.
1
Learning and the Economics of Small Decisions
Ido Erev and Ernan Haruvy

Mainstream analyses of economic behavior assume that incentives shape behavior even when individual agents have limited understanding of the environment. The shaping process in these cases is indirect: the economic incentives determine the agents’ experience, and this experience in turn drives future behavior. Consider, for example, an agent who has to decide whether to cross the road at a particular location and time. The agent is not likely to understand the exact incentive structure and compute the implied equilibria. Rather, she (he or it) is likely to respond to past experiences. The current chapter reviews experimental studies that explore this shaping process.
2
The clicking paradigm

Instructions: The current experiment includes many trials. Your task, in each trial, is to click on one of the two keys presented on the screen. Each click will be followed by the presentation of the keys’ payoffs. Your payoff for the trial is the payoff of the selected key.

Example feedback: You selected Right. Your payoff in this trial is 1. Had you selected Left, your payoff would be 0.

Not a test of rational economic theory.
The rationality assumption is not even wrong.
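The feedback rule described on this slide (the selected key's payoff is earned, the other key's payoff is shown as forgone) can be sketched in a few lines. This is only an illustration: the function name `run_trial` and the two example keys are assumptions, not part of the original experiment software.

```python
import random


def run_trial(choice, payoff_left, payoff_right):
    """Draw both keys' payoffs and return (obtained, forgone) for the clicked key."""
    left, right = payoff_left(), payoff_right()
    if choice == "Right":
        return right, left
    return left, right


# Hypothetical keys: Left always pays 0; Right pays +1 with probability .9, else -10.
rng = random.Random(0)
left_key = lambda: 0
right_key = lambda: 1 if rng.random() < 0.9 else -10

obtained, forgone = run_trial("Right", left_key, right_key)
print(f"You selected Right. Your payoff in this trial is {obtained}. "
      f"Had you selected Left, your payoff would be {forgone}.")
```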
3
1. Underweighting of rare events (Barron & Erev, 2003)

Notation: (x, p; y) pays x with probability p and y otherwise.

Problem   S   R                P(R) (%)
5         0   (10, .1; -1)     27
6         0   (-10, .1; +1)    60

400 trials, ¼ cent per point.
Labels on the slide: Risk seeking; Risk aversion.

Experience-description gap (Hertwig et al., 2004)
Occurs in one-shot decisions from sampling
Implies a reversed Allais paradox: (4, .8) > 3, but (4, .2) ~ (3, .25)
Robust to prior information (Lejarraga & Gonzalez, 2011)
Similar pattern in honey bees (Shafir et al., 2008)
Taleb’s Black Swan effect
Sensitivity to magnitude: -20 vs. -10 (Ert & Erev, 2010)
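A quick expected-value check makes the pattern concrete. The sketch below reads the slide's two problems as S = 0 versus R = (10, .1; -1), chosen 27% of the time, and S = 0 versus R = (-10, .1; +1), chosen 60% of the time; that reading of the flattened table, and the helper name `expected_value`, are assumptions.

```python
def expected_value(outcomes):
    """Expected value of a gamble given as a list of (payoff, probability) pairs."""
    return sum(payoff * prob for payoff, prob in outcomes)


# Assumed reading of the slide's table; P(R) is the observed rate of risky choices.
problems = {
    5: {"R": [(10, 0.1), (-1, 0.9)], "P(R)": 0.27},
    6: {"R": [(-10, 0.1), (1, 0.9)], "P(R)": 0.60},
}

for number, problem in problems.items():
    ev_r = expected_value(problem["R"])
    print(f"Problem {number}: EV(R) = {ev_r:+.2f}, EV(S) = +0.00, "
          f"observed P(R) = {problem['P(R)']:.2f}")
```

Under this reading, R is chosen less often when its expected value is slightly positive (the rare outcome is a gain) and more often when it is slightly negative (the rare outcome is a loss), which is what behaving as if the 10% event were underweighted would produce.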
4
2. The payoff variability effect (Myers & Sadler, 1960; Busemeyer & Townsend, 1993)

Problem   H               L               P(H) (%)
1         1               0               96
2         (11, .5; -9)    0               58
3         0               (9, .5; -11)    53

Risk aversion or loss aversion? Neither!!
5
3. The Big Eye effect (Ben Zion et al., 2010; Grosskopf et al., 2006)

x ~ N(0, 300), y ~ N(0, 300)
R1: x
R2: y
M: Mean(R1, R2) + 5

Deviation from: maximization, risk aversion, loss aversion.
Implies under-diversification.
Robust to prior information.

4. The hot stove effect (Hogarth & Einhorn, 1992; March and Denrell, 2002).
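The Big Eye setup on this slide (R1, R2 and M) can be explored with a small simulation. M has the highest expected payoff and the lowest variance, yet it is the best option after the fact only when x and y fall within 10 points of each other. The sketch assumes that the 300 in N(0, 300) is a standard deviation (the slide does not say whether it is a standard deviation or a variance); the function name and parameters are hypothetical.

```python
import random


def big_eye_simulation(n_trials=50_000, sd=300.0, bonus=5.0, seed=1):
    """Return (share of trials in which each option is best ex post, mean payoffs)."""
    rng = random.Random(seed)
    best_counts = {"R1": 0, "R2": 0, "M": 0}
    totals = {"R1": 0.0, "R2": 0.0, "M": 0.0}
    for _ in range(n_trials):
        x = rng.gauss(0.0, sd)  # assumption: 300 is the standard deviation
        y = rng.gauss(0.0, sd)
        payoffs = {"R1": x, "R2": y, "M": (x + y) / 2 + bonus}
        best_counts[max(payoffs, key=payoffs.get)] += 1
        for option, payoff in payoffs.items():
            totals[option] += payoff
    share_best = {k: c / n_trials for k, c in best_counts.items()}
    mean_payoff = {k: t / n_trials for k, t in totals.items()}
    return share_best, mean_payoff


share_best, mean_payoff = big_eye_simulation()
print("Share of trials in which each option was best:", share_best)
print("Mean payoff of each option:", mean_payoff)
```

An agent who looks at which option "won" each trial will rarely see M on top, which is one way to read the deviation from maximization and the under-diversification noted on the slide.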
6
5. Surprise-triggers-change

Evaluation of the sequential dependency in 2-alternative studies reveals a 4-fold recency pattern:

Problem              Last outcome   Proportion of repeated R choices (%)   Proportion of switches to R (%)
0 or (+1, .9; -10)   After +1       84                                      21
0 or (+1, .9; -10)   After -10      69                                      31
0 or (+10, .1; -1)   After +10      60                                      23
0 or (+10, .1; -1)   After -1       79                                      6

6. The very recent effect (Nevo & Erev, 2010)
7
7. Consistent individual differences (see Bechara, Damasio, Damasio and Anderson, 1994; Yechiam et al., 2007)

Correlation between behavior in Problem 2 “0 or (11, .5; -9)” and in Problem 3 “0 or (9, .5; -11)”:

Statistic                 Correlation
Loss/risk attitude        0.18
Recency (Best reply 1)    0.69
Distance from 0.5         0.75
8
I-SAW (Inertia, Sampling and Weighting; Nevo & Erev, 2012)

Three response modes: exploration, exploitation and inertia.

At each exploitation trial, player i computes the estimated value of alternative j as:

ESV(j) = (1 - w_i) (mean of a sample of m_i past outcomes from j) + w_i (grand mean of j)

The very last outcome is more likely to be in the sample. The alternative with the highest ESV is selected.

Exploration implies random choice.

Inertia implies repetition of the last choice. The probability of inertia decreases when the outcomes are surprising. Surprise is computed from the gap between the payoff at trial t and the payoffs in the previous trials.

I-SAW is an example of a case-based decision model (Gilboa & Schmeidler, 1995; see related ideas in Kareev, 2000; Osborne and Rubinstein, 1998; Gonzalez et al., 2003).
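The exploitation rule on this slide can be sketched as follows. This is a simplified illustration, not the published I-SAW specification: exploration and inertia are omitted, each alternative is sampled independently, and the parameter `p_last` (the chance that a sample slot is filled with the most recent outcome) is a hypothetical stand-in for the statement that the very last outcome is more likely to be in the sample.

```python
import random


def estimated_value(history, m, w, rng, p_last=0.5):
    """ESV(j) = (1 - w) * (mean of a sample of m past outcomes) + w * (grand mean).

    history: past payoffs of alternative j, oldest first.
    p_last:  assumed probability that a sample slot holds the most recent outcome.
    """
    grand_mean = sum(history) / len(history)
    sample = [history[-1] if rng.random() < p_last else rng.choice(history)
              for _ in range(m)]
    sample_mean = sum(sample) / m
    return (1 - w) * sample_mean + w * grand_mean


def exploitation_choice(histories, m, w, rng):
    """Select the alternative with the highest estimated value."""
    esv = {j: estimated_value(h, m, w, rng) for j, h in histories.items()}
    return max(esv, key=esv.get), esv


# Hypothetical payoff histories for "0 or (+1, .9; -10)".
rng = random.Random(42)
histories = {"S": [0, 0, 0, 0, 0, 0], "R": [1, 1, -10, 1, 1, 1]}
choice, esv = exploitation_choice(histories, m=2, w=0.2, rng=rng)
print(choice, esv)
```

With a small m, the sample of R often contains no -10 at all, so R looks attractive even though its grand mean is negative; raising w pulls the estimate back toward the grand mean.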
9
Two choice prediction competitions (Erev et al., 2010a, 2010b)

1. Individual choice tasks
http://tx.technion.ac.il/~eyalert/Comp.html

The task: predicting the proportion of risky choices in binary choice tasks in the clicking paradigm, without information concerning forgone payoffs.

Two studies (estimation and competition), each with 60 conditions. We published the estimation study and challenged other researchers to predict the results of the second. The models were ranked based on their squared error.

The best baseline is a predecessor of I-SAW. The winning submission, by Stewart, West & Lebiere, is based on a similar instance-based (“episodic”) logic (with a quantification in ACT-R). Reinforcement learning and similar “semantic” models did not do well.
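Ranking models by squared error can be sketched as below. The slide only says "squared error"; averaging the squared deviations across conditions, the function names, and the example numbers are all assumptions made for illustration.

```python
def mean_squared_deviation(predicted, observed):
    """Average squared gap between predicted and observed choice rates."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def rank_models(predictions, observed):
    """Return (model name, score) pairs sorted from best (lowest error) to worst."""
    scores = {name: mean_squared_deviation(pred, observed)
              for name, pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical risky-choice rates in three conditions, and two made-up models.
observed = [0.27, 0.60, 0.58]
predictions = {
    "model_a": [0.30, 0.55, 0.60],
    "model_b": [0.50, 0.50, 0.50],
}
for name, score in rank_models(predictions, observed):
    print(f"{name}: {score:.4f}")
```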
10
2. Market entry games
http://sites.google.com/site/gpredcomp

The task: predicting behavior in repeated 4-person market entry games with complete feedback. At each trial each player has to choose between:
R: Entering a risky market (expected payoff decreasing with the number of entrants)
S: Staying out (a safer option)

Two studies (estimation and competition), each with 40 conditions. We published the estimation study and challenged other researchers to predict the results of the second. The models were ranked based on their squared error.

The best baseline is I-SAW. The winner, Chen et al., is a variant of I-SAW. The runner-up, Gonzalez, Dutt & Lejarraga, is based on a similar instance-based (“episodic”) logic.
12
Relationship to reinforcement learning

There are four main reasons for the popularity of reinforcement learning models:
1. Effectiveness (Sutton and Barto, 1998)
2. Neural correlates (Schultz, 1998)
3. Useful ex-ante predictions (Erev & Roth, 1998)
4. Easy to estimate using elegant statistical methods

But:
1. The effective learning occurs only when the state of nature is known; I-SAW can do better in dynamic settings.
2. I-SAW (and many other models) have similar neural correlates.
3. I-SAW provides better ex-ante predictions.
4. The estimations are elegant under the assumption that the model is “well specified”.

Yet, the predictions of I-SAW can be the product of a case-contingent reinforcement learning process.

Example task: S: “0 for sure” or R: “+1 if G, -1 if B”.
[Table: states G1, B1, G2, B2 with entries .01 and .97]
13
Learning in games, and the effect of prior information

The entry game competition demonstrates that the existence of social interaction does not have to change the learning model that best captures behavior.

Another indication of the generality of basic learning processes comes from the study of games with a unique mixed-strategy equilibrium (Erev & Roth, 1998).

Game   Row   A2    B2    Statistic   Eq.   Minimal   Full   I-SAW
1      A1    .77   .35   P(A1)       49    68        59     64
1      B1    .08   .48   P(A2)       16    42        32     28
2      A1    .73   .74   P(A1)       99    76, 84 (only two of the three remaining values survive; columns unclear)
2      B1    .87   .20   P(A2)       79    40        36     21
9      A1    .40   .76   P(A1)       65    58        56     61
9      B1    .91   .23   P(A2)       51    45, 46 (only two of the three remaining values survive; columns unclear)
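The Eq. values for these games can be recovered from the payoff cells. The sketch below assumes that each cell is Player 1's probability of winning and that Player 2 wins otherwise (a constant-sum reading of the table); the function name is hypothetical. Each player mixes so that the opponent is indifferent between its two actions.

```python
def mixed_equilibrium(cells):
    """Mixed-strategy equilibrium of a 2x2 constant-sum game.

    cells = ((a, b), (c, d)): Player 1's payoff for (A1,A2), (A1,B2), (B1,A2), (B1,B2).
    Returns (P(A1), P(A2)).
    """
    (a, b), (c, d) = cells
    denom = a - b - c + d
    p_a1 = (d - c) / denom  # makes Player 2 indifferent between A2 and B2
    p_a2 = (d - b) / denom  # makes Player 1 indifferent between A1 and B1
    return p_a1, p_a2


games = {
    1: ((0.77, 0.35), (0.08, 0.48)),
    2: ((0.73, 0.74), (0.87, 0.20)),
    9: ((0.40, 0.76), (0.91, 0.23)),
}
for number, cells in games.items():
    p_a1, p_a2 = mixed_equilibrium(cells)
    print(f"Game {number}: P(A1) = {100 * p_a1:.0f}%, P(A2) = {100 * p_a2:.0f}%")
# Game 1: P(A1) = 49%, P(A2) = 16%
# Game 2: P(A1) = 99%, P(A2) = 79%
# Game 9: P(A1) = 65%, P(A2) = 51%
```

The computed percentages agree with the first value reported for each statistic on the slide, which supports reading that value as the equilibrium prediction.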
14
I-SAW and similar models that assume learning among actions fail when the instructions lead the subjects to learn among more sophisticated strategies. One example is the prisoner’s dilemma game (data from Rapoport & Chammah, 1965).

PD1    C          D
C      1, 1       -10, 10
D      10, -10    -1, -1
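A short check of the PD1 matrix shows why learning among actions points toward defection. The helper below is an illustrative sketch (its name and signature are assumptions): it tests whether one action gives the row player strictly more than the other action against every column action.

```python
# PD1 payoffs as (row player, column player), copied from the slide.
PD1 = {
    ("C", "C"): (1, 1),
    ("C", "D"): (-10, 10),
    ("D", "C"): (10, -10),
    ("D", "D"): (-1, -1),
}


def is_dominant(action, payoffs, actions=("C", "D")):
    """True if `action` strictly beats every other row action against each column action."""
    return all(
        payoffs[(action, col)][0] > payoffs[(other, col)][0]
        for col in actions
        for other in actions
        if other != action
    )


print("D dominant:", is_dominant("D", PD1))
print("C dominant:", is_dominant("C", PD1))
print("Mutual cooperation vs. mutual defection:", PD1[("C", "C")], PD1[("D", "D")])
```

D pays more than C whatever the other player does, so an agent that compares the two actions trial by trial is drawn to D, even though mutual cooperation (1, 1) beats mutual defection (-1, -1). Sustained cooperation requires learning among strategies that condition on the other player's behavior.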
15
The description-experience gap in games (Erev & Greiner, 2012)

Columns: B1 B2 B3 B4 B5 B6 B7 B8 B9
[Payoff cells as they survive in this transcript; repeated cells appear collapsed]
A1   -9, -9    -9, 2     -9, -9
A2   -9, -9    -9, 2     -9, -9
A3   -9, -9    -9, 2     -9, -9
A4   2, -9     -1, -2    2, -9
A5   -9, -9    -9, 2     -9, -9
A6   -9, -9    -9, 2     -9, -9
A7   -9, -9    -9, 2     -9, -9
A8   -9, -9    -9, 2     -9, -9
A9   -9, -9    -9, 2     -9, -9    7, 7
16
Applications and the economics of small decisions

The discoveries-innovations gap (a result of too much coffee with Al)

Discoveries: St. Petersburg paradox, Allais paradox, rejections in the ultimatum game, endowment effect, a fine is a price, ...
Innovations: trade, markets, money, rule enforcement, auctions, incentive schemes.

One explanation for this gap is based on the following assertions:
(1) Many of the popular discoveries are reflections of reliance on experiences in similar but not identical situations.
(2) Many of the innovations involve a change of the incentive structure to ensure that the desired behavior will be better on average and better in the most similar situations. Thus, even I-SAW-like agents will respond to the change.
17
Gentle COntinuous Punishment (gentle COP): Enforcement of safety rules (Erev & Rodansky, 2004; and see Zohar, 1980; Zohar and Luria, 1994)

Enforcement is necessary.
Workers like enforcement programs.
Probability is more important than magnitude.
Large punishments are too costly; therefore, gentle enforcement can be optimal.
Small brother.
18
Gentle COP2: Washing hands and using gloves in hospitals

In 1847, Dr. Ignaz Semmelweis first demonstrated that routine hand-washing could prevent the spread of disease. In an experiment, Dr. Semmelweis insisted that his students staffing a Vienna hospital’s maternity ward wash their hands before treating the maternity patients, and deaths on the maternity ward fell dramatically. In one study, the death rate fell from 15% to near 0%. Though his findings were published, there was no apparent increase in hand-washing by doctors until the discoveries of Louis Pasteur, years after Dr. Semmelweis died in a mental asylum (Nuland, 2003).
19
Gentle COP3: Cheating in exams

Many rule enforcement problems have at least two equilibria. The gentle COP idea is particularly effective in these settings: it can be used to move the game to the desired equilibrium.

[Figure axes: relative value of violation; proportion of violators]
20
Seven undergraduate courses were selected to participate in the study. In all courses the final exam was conducted in two rooms. One room was randomly assigned to the experimental (gentle COP) condition, and the second was assigned to the control condition.

The only difference between the two conditions involved the timing of the preparation of the map in the instructions to the proctors. In the control group the instruction was: (2c) “A map of the students seating should be prepared immediately after the beginning of the exam.”

After finishing the exam, the students were asked to complete a brief questionnaire in which they were asked to “rate the extent to which students cheated in this exam relative to other exams.”

The results reveal lower cheating ratings in the gentle COP room in all 7 courses.
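One way to gauge how unlikely a 7-out-of-7 pattern would be if the room assignment made no difference is a one-sided sign test, sketched below. This is an illustration added here, not the analysis reported by the authors; it assumes that each course is independent and that, with no effect, either room is equally likely to show the lower rating.

```python
from math import comb


def sign_test_p_value(successes, n):
    """One-sided P(X >= successes) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


# 7 of 7 courses with lower cheating ratings in the gentle COP room.
print(sign_test_p_value(7, 7))  # 0.0078125, i.e. 1/128
```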
21
Broken Windows theory

Kelling and Wilson (1982) suggest that physical decay and disorder in a neighborhood increase the crime rate. This suggestion, known as Broken Windows theory, was motivated by a field experiment conducted by Zimbardo (1969).

Broken Windows theory was a motivation for the “quality of life” policing strategy implemented in New York City in the mid-1990s (Kelling & Sousa, 2001). This policing strategy advocated an increased number of police on the streets and arresting persons for less serious but visible offenses. Some credit this strategy for the decline in crime and disorder.

However, field studies that test the broken windows hypothesis provide mixed results. Skogan (1990) found that robbery victimization was higher in neighborhoods characterized by disorder, but Harcourt (2001) found that the crime-disorder relationship did not hold for other crime types, including burglary, assault, rape and pick-pocketing.
22
The effect of the timing of warnings (Barron, Leider & Stack, 2008)

Evaluation of the impact of warnings reveals a large effect of prior experience. Individuals who have had good experiences in the past are less affected by the warning. For example, when the FDA added a black-box warning to the drug Cisapride, the data show an increase in usage of 2% among repeat users, but a decrease of 17% among first-time users (Smalley et al., 2000).

Another example is provided by a study of parent-adolescent sexual communication. Regular condom use was found to be lower when parent-adolescent sexual communication occurred at a later age (Hutchinson, 2002).

Barron, Leider and Stack (2008) show that part of the effect of experience remains even after controlling for the available information. This part appears to be a reflection of the experience-description gap.
23
The evolution of social groups

Proximity is an important determinant of liking. Even if students are randomly assigned to rooms, individuals are more likely to become friends with, and have a favorable impression of, individuals who are nearby (Segal, 1974).

Denrell (2005) shows that this pattern can be a product of the hot stove effect: our opinions about our friends are likely to change after each meeting. When the opinion is negative, and we can avoid this friend, the opinion lasts longer.
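The mechanism described here can be illustrated with a toy simulation. It is a sketch under stated assumptions, not Denrell's model: every acquaintance has the same true quality of 0, each meeting yields a noisy experience, the impression is updated as a weighted average, and an agent who can avoid the other person skips the meeting whenever the current impression is negative. The function name and all parameter values are hypothetical.

```python
import random


def mean_final_impression(can_avoid, n_agents=5000, n_periods=20, rate=0.5, seed=0):
    """Average impression after n_periods, with or without the option to avoid meetings."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_agents):
        impression = 0.0
        for _ in range(n_periods):
            if can_avoid and impression < 0:
                continue  # negative opinion: no meeting, so the opinion is never revised
            experience = rng.gauss(0.0, 1.0)  # true quality is 0; each meeting is noisy
            impression = (1 - rate) * impression + rate * experience
        total += impression
    return total / n_agents


print("Forced proximity (cannot avoid):", round(mean_final_impression(False), 3))
print("Free to avoid:", round(mean_final_impression(True), 3))
```

When meetings cannot be avoided, as with a roommate, bad impressions keep getting corrected and the average stays near the true value. When avoidance is possible, negative impressions freeze in place and the average drifts below zero, which is the hot stove pattern.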
24
Investors’ decisions: The black swan effect

Simulated vs. real index funds.
Under-diversification.
Positive correlations between price change and volume of trade.
25
Summary

Many of the classical properties of human and animal learning can be reliably reproduced in the easy-to-run (and easy-to-model) clicking paradigm.

The main results can be predicted with instance-based models that assume best reply to small samples of experiences in similar cases.

The implied behavioral processes are evolutionarily reasonable, but can lead to robust deviations from maximization.

The current understanding of decisions from experience is sufficient to shed light on many natural problems.
26
Related topics

Objective tests
One-period-ahead econometrics
ENO
Level-1 reasoning
Imitation
Learned helplessness