
A Thought Experiment: Two doors, with probabilities .1 and .2 of paying off a dollar, respectively. A dollar can be waiting behind both doors on the same trial. Dollars stay there until collected, but never more than one dollar per door. In what order do you choose doors?

Patterns in the Data: If choices are made moment by moment, there should be orderly patterns in the choices: 2, 2, 1, 2, 2, 1… Results are mixed, but promising when time is used as the measure.
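A minimal simulation sketch of the thought experiment above (the .1/.2 door probabilities come from the slide; everything else is an illustrative assumption). A momentary-maximizing chooser picks whichever door is more likely to have a dollar waiting right now, and the orderly 2, 2, 1 pattern falls out:

```python
# Momentary maximizing in the two-door thought experiment.
P = [0.1, 0.2]     # per-trial probability a dollar is placed behind each door
since = [1, 1]     # trials elapsed since each door was last opened

choices = []
for _ in range(12):
    # A dollar is waiting unless the door failed to arm on every
    # trial since it was last opened: p = 1 - (1 - p_i)^n
    p_wait = [1 - (1 - p) ** n for p, n in zip(P, since)]
    pick = 0 if p_wait[0] > p_wait[1] else 1
    choices.append(pick + 1)      # report doors as 1 and 2
    since[pick] = 1               # collected: this door's clock restarts
    since[1 - pick] += 1          # the other door keeps accumulating

print(choices)    # [2, 2, 1, 2, 2, 1, ...] -- the orderly pattern above
```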

What Works Best Right Now: Maximizing local rates with moment-to-moment choices can lower the overall reinforcement rate. Short-term vs. long-term.

Delay and Self-Control

Delayed Reinforcers: Many of life’s reinforcers are delayed… –Eating right, studying, etc. Delay obviously devalues a reinforcer. –How are the effects of reinforcers affected by delay? –Why choose the immediate, smaller reward? –Why ever show self-control?

Remember Superstition? The relation there was temporal, not causal –causal learning with delay is very hard. The same holds for delay of reinforcement –effects decrease with delay. But how does the effect occur? Are there reliable and predictable effects? Can we quantify them?

How Do We Measure Delay Effects? By studying preference for delayed reinforcers. Humans: –verbal reports at different points in time –“what if” questions. Humans AND nonhumans: A. Concurrent chains; B. Titration. All are choice techniques.

A. Concurrent chains. Concurrent chains are simply concurrent schedules -- usually concurrent equal VI VI -- in which reinforcers are delayed. When a response is reinforced, both concurrent schedules usually stop and become unavailable, and a delay starts. Sometimes the delays are spent in blackout, with no response required to get the final reinforcer (an FT schedule); sometimes the delays are themselves schedules with an associated stimulus, like an FI schedule, that require responding.
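A rough sketch of the trial structure just described (the schedule values are arbitrary assumptions, and the "subject" here simply enters whichever initial link arms first, which glosses over real choice behaviour):

```python
import random

def concurrent_chains_trial(vi_initial=60.0, terminal_delays=(10.0, 20.0)):
    """One simplified concurrent-chains trial: equal VI initial links
    (the choice phase), then the entered side's terminal-link delay
    (the outcome phase), ending in food."""
    # Exponential intervals approximate constant-probability VI schedules
    arm_times = [random.expovariate(1.0 / vi_initial) for _ in range(2)]
    side = arm_times.index(min(arm_times))     # side entered first
    time_to_food = arm_times[side] + terminal_delays[side]
    return side, time_to_food

print(concurrent_chains_trial())
```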

[Diagram: the concurrent-chains procedure. Initial links (choice phase): concurrent VI VI on two keys. Terminal links (outcome phase): VI a-s and VI b-s schedules, each ending in food.]

An example of a concurrent-chains experiment: MacEwen (1972) investigated choice between two terminal-link FI, and two terminal-link VI, schedules, one of which was always twice as long as the other. The initial links were always concurrent VI 60-s VI 60-s schedules.

The terminal-link schedules kept a constant reinforcer delay (and immediacy) ratio: all immediacy ratios were 2:1.


From the generalised matching law, we would expect: log(B_1/B_2) = a_d · log(D_2/D_1) + log c. D_2/D_1 was kept constant throughout, so if a_d were constant, we would expect no change in choice with changes in the absolute size of the delays.

But choice did change, so a_d did NOT remain constant. Still, this gives us some data to answer some other questions…

Shape of the Delay Function: Now that we have some data… How does reinforcer value change over time? What is the shape of the decay function?

Basically, the effects that reinforcers have on behaviour decrease -- rapidly -- as the reinforcers are more and more delayed after the reinforced response. [Graph: how reinforcer value generally changes with delay.]

Delay Functions: What is the “real” delay function? Candidates: V_t = V_0/(1 + Kt); V_t = V_0/(1 + Kt)^s; V_t = V_0/(M + Kt^s); V_t = V_0/(M + t^s); V_t = V_0·exp(−Mt).

Exponential versus hyperbolic decay. It is important to understand how the effects of reinforcers decay over time, because different sorts of decay predict different effects. The two main candidates: Exponential decay -- the rate of decay remains constant over time. Hyperbolic decay -- the rate of decay decreases over time -- as in memory, too.

Exponential decay: V_t = V_0 · e^(−bt), where V_t is the value of the delayed reinforcer at time t; V_0 is the value of the reinforcer at 0-s delay; t is the delay in seconds; b is a parameter that determines the rate of decay; and e is the base of natural logarithms.

Hyperbolic decay: V_t = V_0 / (1 + t/h). All the variables are the same as in the exponential equation, except that h is the half-life of the decay -- the time over which V_0 is reduced to half its initial value. Hyperbolic decay is strongly supported by Mazur’s research.
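A small numeric sketch of the two equations (the parameter values V_0, b, and h are illustrative assumptions, not fits to any data); it tabulates both decays and checks the half-life property of h:

```python
import math

V0 = 10.0    # value at 0-s delay (illustrative)
b  = 0.15    # exponential decay-rate parameter (illustrative)
h  = 4.0     # hyperbolic half-life, in seconds (illustrative)

def exponential(t):
    # V_t = V0 * e^(-b t): a constant proportional rate of decay
    return V0 * math.exp(-b * t)

def hyperbolic(t):
    # V_t = V0 / (1 + t/h): decay slows down as t grows
    return V0 / (1 + t / h)

for t in [0, 2, 4, 8, 16, 32]:
    print(f"t = {t:>2} s   exp: {exponential(t):5.2f}   hyp: {hyperbolic(t):5.2f}")

assert abs(hyperbolic(h) - V0 / 2) < 1e-9   # at t = h, value has halved
```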

Two sorts of decay fitted to MacEwen’s (1972) data: hyperbolic is clearly better. Not that clean, but… [Graph: relative rate as a function of delay, with both fits.]

Studying Delay Using Indifference: Titration procedures.

B. Titration -- finding the indifference point. The titration procedure was introduced by Mazur: one standard (constant) delay and one adjusting delay. These may differ in what schedule they are (e.g., FT versus VT with the same size reinforcers for both), or they may be the same schedule (both FT, say) with different magnitudes of reinforcers. What the procedure does is find the value of the adjusting delay that is equally preferred to the standard delay -- the indifference point in choice.

For example: the reinforcer magnitudes are the same; the standard schedule is VT 30 s; the adjusting schedule is FT. How long would the FT schedule need to become to make preference equal?

Titration: Procedure. Trials are in blocks of 4. The first 2 are forced choice, randomly one to each alternative; the last 2 are free choice. If, on the last 2 trials, the subject chooses the adjusting schedule twice, the adjusting delay is increased by a small amount. If it chooses the standard twice, the adjusting delay is decreased by a small amount. If choice is equal (1 of each), there is no change. (Compare the von Békésy tracking procedure in audition.) A sketch of this adjustment rule follows.
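A sketch of that adjustment rule (the step size, starting delay, and the toy subject are all assumptions for illustration):

```python
import random

def run_titration(prefers_adjusting, n_blocks=200, start=10.0, step=1.0):
    """Blocks of 4 trials: 2 forced (exposure only, not modelled here),
    then 2 free choices that drive the adjusting delay up or down."""
    adj = start
    for _ in range(n_blocks):
        free = [prefers_adjusting(adj), prefers_adjusting(adj)]
        if all(free):        # picked adjusting twice -> make it longer
            adj += step
        elif not any(free):  # picked standard twice -> make adjusting shorter
            adj -= step
        # one of each: leave the adjusting delay unchanged
    return adj               # hovers around the indifference point

# Toy subject whose indifference point is an adjusting delay of 18 s
subject = lambda adj: random.random() < 1 / (1 + (adj / 18.0) ** 4)
print(run_titration(subject))   # settles near 18
```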

[Diagram: Mazur’s titration procedure -- trial start; choice (peck); standard delay with red houselight vs adjusting delay with green houselight; 6-s or 2-s food, followed by blackout (BO); then ITI. Why the post-reinforcer blackout?]

Mazur’s Findings: Different magnitudes, finding the delay –a 2-s reinforcer delayed 8 s = a 6-s reinforcer delayed 20 s. Equal magnitudes, variable vs. fixed delay –a fixed delay of 20 s = a variable delay of 30 s. Why the preference for variable? –Hyperbolic decay and interval weighting.
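Why hyperbolic decay favours variability -- a sketch with made-up numbers (a 50/50 mix of 5-s and 55-s delays versus their fixed 30-s mean; V_0 and h are assumptions as in the sketch above): because the hyperbola is convex, occasional short delays raise the average value more than occasional long delays lower it.

```python
V0, h = 10.0, 4.0                     # illustrative parameters
value = lambda t: V0 / (1 + t / h)    # hyperbolic value of a delay of t s

fixed    = value(30)                           # fixed 30-s delay
variable = 0.5 * value(5) + 0.5 * value(55)    # 50/50 mix, same 30-s mean

print(fixed, variable)   # ~1.18 vs ~2.56: the variable delay is worth more
```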

Moving on to Self-Control: Which would you prefer? –$1 in an hour –$2 tomorrow

Moving on to Self-Control: Which would you prefer? –$1 in a month –$2 in a month and a day

Here’s the problem: preference reversal. In positive self-control, the further you are away from the smaller and larger reinforcers, the more likely you are to accept the larger, more delayed reinforcer. But the closer you get to the first one, the more likely you are to choose the smaller, more immediate one.

Friday night: “Alright, I am setting my alarm clock to wake me up at 6.00 am tomorrow morning, and then I’ll go jogging.”... Saturday 6.00 am: “Hmm… maybe not today.”

Outside the laboratory, the majority of reinforcers are delayed, so studying the effects of delayed reinforcers is very important. To understand why preference reversal occurs, we need to know how the value of a reinforcer changes with the time by which it is delayed... Assume: at the moment in time when we make the choice, we choose the reinforcer that has the highest current value...

Animal research: Preference reversal. Green, Fisher, Perlow, & Sherman (1981): choice between a 2-s and a 6-s reinforcer. The larger reinforcer was delayed 4 s more than the smaller. The choice response (across conditions) was required from 2 to 28 s before the smaller reinforcer. We will call this time T.

[Timeline diagram, repeated across three slides: the choice point comes T s before the small reinforcer (T varied from 2 to 28 s across conditions), and the large reinforcer follows 4 s after the small one.]

Green et al. (continued): Thus, if T was 10 s, at the choice point the smaller reinforcer was 10 s away and the larger was 14 s away. So, as T is changed over conditions, we should see preference reversal.

Control condition: two equal-sized reinforcers were delayed, one 28 s and the other 32 s. Preference was strongly towards the reinforcer that came sooner. So, at delays that long, pigeons can still clearly tell which reinforcer comes sooner and which later.

Which Delay Function Predicts This?

Only hyperbolic decay can explain preference reversal.

[Graph: hyperbolic predictions shown the same way; choice reverses where the two value curves cross.]
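A numeric sketch of that claim using Green et al.’s amounts (2-s vs 6-s reinforcers, 4 s apart; the decay parameters K and b are assumptions): hyperbolic values cross as T grows, exponential values never do.

```python
import math

A_ss, A_ll, gap = 2.0, 6.0, 4.0   # Green et al.: LL comes 4 s after SS
K, b = 1.0, 0.5                   # assumed hyperbolic / exponential rates

for T in [0.5, 1, 2, 4, 10, 28]:
    hyp_ss = A_ss / (1 + K * T)
    hyp_ll = A_ll / (1 + K * (T + gap))
    exp_ss = A_ss * math.exp(-b * T)
    exp_ll = A_ll * math.exp(-b * (T + gap))
    print(f"T={T:>4}: hyperbolic -> {'SS' if hyp_ss > hyp_ll else 'LL'},"
          f" exponential -> {'SS' if exp_ss > exp_ll else 'LL'}")

# The exponential ratio exp_ss/exp_ll = (A_ss/A_ll) * e^(b*gap) is the
# same at every T, so exponential decay can never produce a reversal.
```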

Using strict matching theory to explain preference reversal. The concatenated strict matching law for reinforcer magnitude and delay (see the generalised matching lecture) is: B_1/B_2 = (M_1/M_2) · (D_2/D_1), where M is reinforcer magnitude and D is reinforcer delay. Note that for delay, a longer delay is less preferred, and therefore D_2 is on top. (OK, we know strict matching isn’t right, and delay sensitivity isn’t constant.)

We will take the situation used by Green et al. (1981) and work through what the STRICT matching law predicts. The baseline is: M_1 = 2, M_2 = 6, D_1 = 0, D_2 = 4. The predicted choice ratio, B_1/B_2 = (2/6)(4/0), is infinite. Thus, the subject is predicted always to take the smaller, zero-delayed reinforcer.

Now add T = 0.5 s, so M_1 = 2, M_2 = 6, D_1 = 0.5, D_2 = 4.5, and B_1/B_2 = (2/6)(4.5/0.5) = 3. The subject is predicted to prefer the smaller-magnitude reinforcer three times more than the larger-magnitude reinforcer, and again be impulsive. But its preference for the immediate reinforcer has decreased a lot.

Then, when T = 1, B_1/B_2 = (2/6)(5/1) ≈ 1.7. The choice is now less impulsive.

For T = 2, the preference ratio B_1/B_2 is 1 -- so now the matching law predicts indifference between the two choices. For T = 10, the preference ratio is more than 2:1 towards the larger, more delayed reinforcer. That is, the subject is now showing self-control. The whole function is shown next -- predictions for Green et al. (1981) assuming strict matching.

This graph shows log(B_2/B_1), rather than B_1/B_2; it shows how self-control increases as you go back in time from when the reinforcers are due. [Graph: positive log ratios = self-control; negative = impulsive.]
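The worked values above can be reproduced in a few lines (a direct transcription of the strict-matching arithmetic; the D_1 = 0 guard stands in for the infinite baseline case):

```python
M1, M2, gap = 2.0, 6.0, 4.0   # Green et al. magnitudes; LL is 4 s later

def preference(T):
    # Strict matching: B1/B2 = (M1/M2) * (D2/D1)
    D1, D2 = T, T + gap
    return float("inf") if D1 == 0 else (M1 / M2) * (D2 / D1)

for T in [0, 0.5, 1, 2, 10]:
    print(f"T = {T:>4}: B1/B2 = {preference(T):.2f}")
# T=0 -> inf; T=0.5 -> 3.00; T=1 -> 1.67; T=2 -> 1.00 (indifference);
# T=10 -> 0.47, i.e., B2/B1 is about 2.1 -- self-control.
```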

[Graph: Green et al.’s actual data.]

Commitment: do something now so that later you don’t have the choice to do the bad thing -- e.g., giving away the Halloween candy.

Commitment in the laboratory: Rachlin & Green (1972). Pigeons chose between EITHER allowing themselves a later choice between a small short-delay (SS) reinforcer and a large long-delay (LL) reinforcer, OR denying themselves this later choice, so that only the LL reinforcer was available.

[Diagram: Rachlin & Green’s (1972) procedure -- an initial commitment choice T s before the final links; one key leads, after a blackout, to a later choice between the smaller-sooner and larger-later reinforcers; the other leads, after a blackout, to the larger-later reinforcer with no choice.]


As the time T at which the commitment response was offered was moved earlier in time from the reinforcers (from 0.5 to 16 s), preference should reverse. Indeed, Rachlin and Green found that 4 out of 5 birds developed commitment (what we might call a commitment strategy) when T was larger.


Mischel & Baker (1975): The experimenter puts one pretzel on a table and leaves the room for an unspecified amount of time. If the child rings a bell, the experimenter comes back and the child can eat the pretzel. If the child waits, the experimenter comes back with 3 pretzels. Most children chose the impulsive option. But there is apparently a correlation with age, SES, and IQ scores. (Correlation!)


Mischel & Baker (1975): Self-control was less likely if children were instructed to think about the taste of the pretzels (e.g., how crunchy they are). Self-control was more likely if they were instructed to think about the shape or colour of the pretzels.

Much of the human data has been replicated with animals by Grosch & Neuringer (1981). For example, making food reinforcers visible disrupted self-control, but an extraneous task helped self-control.

Can nonhumans be trained to show sustained self-control? Mazur & Logue (1978) -- fading in self-control. Choice 1: delay 6 s, magnitude 2 s; Choice 2: delay 6 s, magnitude 6 s. The pigeons preferred Choice 2 (larger magnitude, same delay) -- self-control. Over 11,000 trials, they faded the delay to the smaller magnitude (Choice 1) to 0 s -- and self-control was maintained!

Additionally, and this is important, self-control was maintained even when the outcomes were reversed between the keys. In other words, the pigeons didn’t have to be re-taught to choose the self-control option; they applied it to the new situation.

Contingency contracting: a common therapeutic procedure. E.g., “I give you my CD collection, and agree that if I don’t lose 0.5 kg per week, you can chop up one of my CDs -- each week.” You use the facts of self-control -- i.e., you say “let’s start this a couple of weeks from now” and the client will readily agree; if you said “starting today”, they most likely would not. It’s easy to give up anything next week...

Other Commitment Procedures: tell your friend to pick you up; let everyone know you’ve stopped smoking; avoid discriminative stimuli; train incompatible behaviors; bring consequences closer in time.

Social dilemmas: A lot of the world’s problems are problems of self-control on a macro scale -- investment strategies, for example. Rachlin, H. (2006). Notes on discounting. Journal of the Experimental Analysis of Behavior, 85: “In general, if a variable can be expressed as a function of its own maximum value, that function may be called a discount function. Delay discounting and probability discounting are commonly studied in psychology, but memory, matching, and economic utility also may be viewed as discounting processes.”