Law of Effect
Animals improve on performance:
–optimize
–not just repeat the same behavior, but make it better and more efficient
Adaptation to the environment via learning.
Herrnstein considers this behavior change/adaptation a question, not an answer:
–e.g., what are they adapting to? how do they adapt? how do they know to adapt? etc.
Wants to know whether there is a similar way in which animals optimize, and whether it can be described by a unified paradigm.

Reinforcement as strength:
Reinforcement = making a stronger link between responding and reward.
Relative frequency as a measure of response-reinforcer strength:
–Absolute rates: P1/time and Sr/time
–Relative rates: P1/P2 and Sr1/Sr2
–Response rate as a function of reinforcer rate

Reinforcement as strength:
Plot the proportion of responses as a function of the proportion of reward:
–Should be a linear relationship
–As the rate of reward increases, the rate of responding should increase
Note: this is a continuous measure, not discrete trials; the animal has more "choice":
–Discrete trial: trial by trial
–Free operant: the animal controls how many responses it makes

Reinforcement as strength:
Differences when the organism controls the rate vs. when time controls the rate:
–Get exclusive choice on FR or VR schedules: faster responding = more reinforcers
–On interval schedules, faster responding does not necessarily get you more
–But it should alter the rate of one response alternative in comparison to another
–BUT: VI schedules allow examination of changes in response rate as a function of a predetermined rate of reinforcement
–With VI schedules, reinforcer rate can be used as the independent variable!
This becomes the basis of the matching law:
–PL/(PL + PR) = RL/(RL + RR)
–The relative rate of responding should approximate the relative rate of reinforcement
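Below is a minimal sketch (Python, with made-up session counts) of the relative-rate computation the matching law is built on; the variable names and numbers are illustrative, not from any real experiment.

```python
# Minimal sketch of the matching relation on hypothetical session data.

def relative_rate(left: float, right: float) -> float:
    """Proportion allocated to the left alternative: left / (left + right)."""
    return left / (left + right)

# Hypothetical concurrent VI VI session: responses made and reinforcers earned.
responses_left, responses_right = 1200, 400      # responses per session
reinforcers_left, reinforcers_right = 60, 20     # reinforcers per session

rel_responses = relative_rate(responses_left, responses_right)        # 0.75
rel_reinforcers = relative_rate(reinforcers_left, reinforcers_right)  # 0.75

# Matching: PL/(PL + PR) should approximate RL/(RL + RR).
print(rel_responses, rel_reinforcers)
```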

A side bar: The Use of CODs
COD = changeover delay.
Use of a COD affects response strength and choice:
–Shull and Pliskoff (1967): compared COD and no COD
–Got a better approximation of matching with a COD
Why important:
–The COD is not the controlling factor;
–the controlling factor = the response ratio
–The COD increases discriminability between the two reinforcement schedules
Increased discriminability = better "matching." Why?

Herrnstein's Matching Equation (1961)
Begin with a single response and reinforcer:
P1 = kR1 / (R1 + Ro)
P1 = rate of responding on alternative 1
R1 = rate of reinforcement for alternative 1
Ro = rate of reinforcement from unaccounted sources
k = asymptote of response rate
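A short sketch of this hyperbola; the equation is from the slide, but the specific k and Ro values below are illustrative assumptions.

```python
# Herrnstein's single-alternative equation: P1 = k*R1 / (R1 + Ro).

def herrnstein(r1: float, k: float = 100.0, ro: float = 10.0) -> float:
    """Predicted response rate for reinforcement rate r1.

    k  = asymptotic response rate (responses/hr), illustrative value
    ro = rate of extraneous (unaccounted) reinforcement, illustrative value
    """
    return k * r1 / (r1 + ro)

# As R1 grows, predicted responding approaches (but never exceeds) k:
for r1 in (5, 10, 20, 40, 80, 160):
    print(r1, round(herrnstein(r1), 1))
```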

Can derive a more general two-choice equation:
P1 = kR1 / (R1 + R2 + Ro)
P2 = kR2 / (R1 + R2 + Ro)

Cancelling out:
P1/P2 = [kR1 / (R1 + R2 + Ro)] / [kR2 / (R1 + R2 + Ro)]
The k's and the shared denominators cancel, leaving P1/P2 = R1/R2.

Two-Parameter Matching Equation
P1/P2 = R1/R2
–Assumes that Ro is equal for both P1 and P2
–What are some possible "Ro"s?
Note that everything here is measurable!

What does this mean?
The relative rate of responding varies with the relative rate of reinforcement.
This must have some effect on absolute rates of responding as well.
Simple matching law: P1 = kR1/(R1 + Ro)
–makes a hyperbolic function
–some maximum rate of responding

How to plot?
Plot response rate (R/min) as a function of reinforcement rate (Sr/min):
–Makes a hyperbola
–Decelerating, ascending curve
–Why decelerating? Why reach asymptote?
Note: this is a STEADY-STATE theory, not an acquisition model.

Example: Plot response rate as a function of reinforcer rate.
[Figure: responses per hour plotted as a function of reinforcers per hour.]
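A sketch that reproduces this kind of plot, assuming numpy and matplotlib are available; the k and Ro values are illustrative.

```python
# Plot the decelerating hyperbola described above (illustrative parameters).
import numpy as np
import matplotlib.pyplot as plt

k, ro = 100.0, 10.0                # asymptote, extraneous reinforcement
r1 = np.linspace(0, 200, 400)      # reinforcers per hour
p1 = k * r1 / (r1 + ro)            # Herrnstein's hyperbola

plt.plot(r1, p1)
plt.axhline(k, linestyle="--", label="asymptote k")
plt.xlabel("Reinforcers per hour")
plt.ylabel("Responses per hour")
plt.title("Response rate as a hyperbolic function of reinforcement rate")
plt.legend()
plt.show()
```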

Factors affecting the hyperbola
Absolute response rates are affected by reinforcement rates:
–The higher the reinforcement rate, the higher the rate of responding
–True up to some point (the asymptote): why?
Can also plot P1/P2 = R1/R2 and get the same general trend.

Baum, 1974 Generalized Matching Law

Describes the basic matching law: P1/(P1 + P2) = R1/(R1 + R2)
Revises it to: P1/P2 = R1/R2
Notes that Staddon (1968) found you can take logs to get straight lines.
Also adds two parameters: b and a.
New version: log(P1/P2) = a·log(R1/R2) + log b
–Equivalently: P1/P2 = b(R1/R2)^a
–where a = the undermatching, or sensitivity-to-reward, parameter
–and b = bias
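A minimal sketch of how a and b can be estimated: fit an ordinary least-squares line in log-log coordinates, where the slope is a and the intercept is log b. The response and reinforcer ratios below are fabricated for illustration.

```python
# Estimate generalized matching law parameters from (fabricated) ratios:
# log(P1/P2) = a*log(R1/R2) + log(b)   (Baum, 1974)
import numpy as np

p_ratio = np.array([0.25, 0.5, 1.0, 2.0, 4.0])   # P1/P2 across conditions
r_ratio = np.array([0.20, 0.45, 1.0, 2.2, 5.0])  # R1/R2 across conditions

# Straight line in log-log coordinates; slope = a, intercept = log10(b).
slope, intercept = np.polyfit(np.log10(r_ratio), np.log10(p_ratio), 1)

a = slope            # sensitivity: a < 1 undermatching, a > 1 overmatching
b = 10 ** intercept  # bias: b > 1 favors alternative 1
print(f"a = {a:.3f}, b = {b:.3f}")
```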

What is Undermatching?
Perfect sensitivity to reward, or "matching": a = 1.0
Undermatching, or undersensitivity to reward: a < 1.0
–Any preference less extreme than the matching relation would predict
–A systematic deviation from the matching relation for preferences toward both alternatives, in the direction of indifference
–The organism is less sensitive than the reinforcer ratios would predict to changes in those reinforcer ratios

What is Overmatching?
a > 1.0: overmatching, or oversensitivity to reward
–A preference that is MORE extreme than the equation would predict
–A systematic deviation from the matching relation for preferences toward the better alternative, to the neglect of the lesser alternative
–The organism is more sensitive than predicted to differences between the reinforcer alternatives
Reward sensitivity = a discrimination or sensitivity parameter:
–it tells us how sensitive the animal is to changes in the rate of reward between the two alternatives

[Three example plots shown on the slide:]
–This is an example of almost perfect matching with little bias. Why?
–This is an example of undermatching with some bias toward the RIGHT feeder. Why?
–This is an example of overmatching with little bias. Why?
Is overmatching BETTER than matching or undermatching? Why or why not?

Factors affecting the a (undermatching) parameter:
–Discriminability between the stimuli signaling the two schedules
–Discriminability between the two rates of reinforcement
–Component duration
–COD and COD duration
–Deprivation level
–Social interactions during the experiment
–Others?

Bias
Definition: the magnitude of preference is shifted toward one reinforcer even when there is apparent equality between the rewards.
–Unaccounted-for preference
–It is the experimenter's failure to make both alternatives equal!
Calculated using the intercept of the line:
–A positive bias is a preference for R1
–A negative bias is a preference for R2

Four Sources of Bias
–Response bias
–Discrepancy between scheduled and obtained reinforcement
–Qualitatively different reinforcers
–Qualitatively different reinforcement schedules
Examples:
–Difficulty of making the response: one response key harder to push than the other
–Qualitatively different reinforcers: Spam vs. steak
–Color
–Preference for a side of the box, etc.

Qualitatively Different Rewards
The matching law only takes into consideration the rate of reward.
If the rewards are qualitatively different, this must be added in:
–So: P1/P2 = (V1/V2)·(R1/R2)^a
–Must add an additional factor for qualitative differences
–Assumes value stays constant regardless of the richness of a reinforcement schedule
Interestingly, can get U-shaped functions rather than hyperbolas:
–Has to do with the changing value of reward ratios when dealing with qualitatively different reinforcers
–Different satiation/habituation points for each type of reward
–Move to economic models that allow for U-shaped rather than hyperbolic functions.
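A small sketch of this value-augmented equation; the value ratio and sensitivity parameter below are illustrative assumptions, not fitted values.

```python
# Choice with qualitatively different rewards: P1/P2 = (V1/V2) * (R1/R2)**a.

def choice_ratio(r1: float, r2: float, v1: float = 1.0, v2: float = 1.0,
                 a: float = 0.9) -> float:
    """Predicted response ratio P1/P2 when reinforcer value differs."""
    return (v1 / v2) * (r1 / r2) ** a

# Equal reinforcement rates, but alternative 1 pays steak and 2 pays Spam:
print(choice_ratio(30, 30, v1=3.0, v2=1.0))  # preference driven by value alone
```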

Qualitatively different reinforcement schedules
Use of VI versus VR:
–The animal should show exclusive choice for the VR, or minimal responding on the VI
–Can control response rate, but not time
–Not "matching" in the typical sense, but still optimizing

So, does the matching law work?
It is a really OLD model!
–Matching holds up well under mathematical and data tests
–Some limitations to the model
–Tells us about sensitivity to reward and bias

Applications: McDowell, 1984
Wants to apply Herrnstein's equation to clinical settings:
–uses Herrnstein's equation: P1 = kR1/(R1 + Ro)
Makes several important points about Herrnstein's equation:
–Ro governs the rapidity with which the hyperbola reaches asymptote
–thus: extraneous reinforcement can affect response strength (rate)

Shows the importance of the equation:
–Contingent reinforcement supports a higher rate of responding in barren environments than in rich environments
–High rates of Ro can affect the situation
–When few other Sr's are available, your Sr's matter more

Applications: McDowell
Law of diminishing returns:
–A given increment in reinforcement rate (delta-Sr) produces a larger increment in response rate (delta-r) when the prevailing rate of contingent reinforcement is low than when it is high
–Response rate increases hyperbolically with increases in reinforcement
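A worked numeric example of the diminishing-returns point, using the hyperbola with illustrative k and Ro: adding the same 10 reinforcers/hr buys far more behavior when the prevailing rate is low.

```python
# Same increment in reinforcement, different starting rates (illustrative).

def p(r1: float, k: float = 100.0, ro: float = 20.0) -> float:
    """Herrnstein's hyperbola with assumed k and Ro."""
    return k * r1 / (r1 + ro)

delta = 10  # add 10 reinforcers/hr in each case
print(p(10 + delta) - p(10))    # starting low:  ~16.7 responses/hr gained
print(p(100 + delta) - p(100))  # starting high:  ~1.3 responses/hr gained
```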

Applications: McDowell
Reinforcement by the experimenter/therapist DOES NOT OCCUR in isolation; must deal with Ro.
–What else, and where else, is your client getting reward?
–What are they comparing YOUR reward to?

Demonstrates with several human studies
Bradshaw, Szabadi, & Bevan (1976; 1977; 1978):
–button pressing for money
–organisms matched
Bradshaw et al. (1981): used manic-depressive subjects:
–response rate was hyperbolic regardless of mood state
–k was larger and Ro smaller when manic
–k was smaller and Ro larger when depressed

Demonstrates with several human studies
McDowell study: boy with SIB (scratching):
–used punishment (obtained reprimands) for scratching
–found a large value of Ro, but the kid did match
–Ro was so pervasive that it was the dominant source of reinforcement
Ayllon & Roberts (1974): 5th-grade boys and studying:
–reading-test performance was rewarded (R1)
–disruptive behavior/attention = Ro
–found that when reinforcement for reading (R1) was increased, responding increased and disruptive behavior decreased (reduced values of Ro)

Demonstrates with several human studies
Critchfield: shows the law works well in sports: three-point shots, and running vs. passing:
–Basketball
–Football
Why choose football?
–Play calling = individual behavior
–Quarterback
–Offensive coordinator and head coach

Demonstrates with several human studies
Highly skilled players:
–When calling a play, consider the success/failure of the previous attempt in the decision for the next play
–Individual differences in play-calling patterns (passing- vs. rushing-oriented teams)
–Focus at the team level

General Method
Data obtained from the NFL:
–primary data:
–number of passing/rushing plays
–net yards gained
Data taken from ESPN websites.

General Method
Several characteristics:
–Plays categorized as rushing or passing based on what occurred rather than what was called (no way of knowing that)
–Sacks = failed pass plays
–Yards gained on a completion counted even if a fumble occurred after the catch
Fit data to the matching equation (see the sketch below):
–the ratio of yards gained through passing vs. rushing was used as a predictor of the ratio of pass plays to rush plays called
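A sketch of this fit on fabricated, NFL-style team totals; none of these numbers come from the actual study, and the fit is the same log-log regression shown earlier.

```python
# Fit play-calling to the generalized matching law on fabricated data:
# ratio of pass/rush yards (reinforcement) predicts pass/rush plays (behavior).
import numpy as np

# Hypothetical per-team season totals:
# (pass plays, rush plays, pass yards, rush yards)
teams = [
    (550, 420, 3800, 1700),
    (500, 470, 3300, 2100),
    (600, 380, 4200, 1500),
    (480, 490, 3100, 2200),
]

play_ratio = np.array([p / r for p, r, _, _ in teams])      # behavior ratio
yard_ratio = np.array([py / ry for _, _, py, ry in teams])  # reinforcer ratio

a, log_b = np.polyfit(np.log10(yard_ratio), np.log10(play_ratio), 1)
print(f"sensitivity a = {a:.3f}, bias b = {10 ** log_b:.3f}")
```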

Season-aggregate league outcome
a = 0.725, r² = 75.7; b = (in favor of rushing)
Historical comparisons:
– –2004 fell outside the typical range
–r² decreases about 4% per year across years, suggesting more variability in play calling
Why?
–Shift in rules designed to favor passing
–Free-agency rules
–Salary caps

Comparison with other leagues
Differences:
–NFL Europe:
–CFL: a = .544, r² = .567
–Arena Football: a = .56, r² = .784
–United Indoor Football League: a = .613, r²
–National Women's Football Association: a = .55, r² = .709
–NCAA Atlantic Coast: a = 0.63, r² = .809
–NCAA Western: a = .868, r² = .946
–NCAA Mid-America: a = 0.509, r² = .634
Generally good fits: r² =
8 of 9 leagues favored passing rather than rushing:
–CFL: rushing rather than passing (turnover risk?)

Conditional play calling
Examined specific circumstances:
–examined down number (1, 2, 3): how does matching change?
–a decreases with down
–less likely to pass as downs increase
–is this surprising? Why or why not?

To reduce behavior a la Herrnstein
Increase the rate of reinforcement for the concurrently available response alternatives:
–The child engages in out-of-seat behavior because it is reinforcing
–Increase the rate of reinforcement for IN-SEAT behaviors

Game-by-game outcomes: regular-season games
–Preseason fits relatively poor: a = .43
–Later in the season, better fits: a = .58
–Postseason slightly better: a = .59
Why?

Actual therapy situation: reduce behavior a la Herrnstein
It is like a DRO schedule, except:
–not reinforcing incompatible responses
–arranging the environment such that the relative rate of reinforcement for the desired response is higher than the relative rate of reinforcement for the undesired behavior
Get more for "being good" than for "being bad."

To reduce behavior a la Herrnstein
Take-home message: it is the disparity between the two relative rates of reinforcement that is important, not the incompatibility of the two responses.

Dealing with noncontingent reinforcement (Ro)
An example: unconditional positive regard = free, noncontingent reinforcement:
–It will reduce the frequency of undesired responding
–BUT it will also reduce behaviors that you may want!!!

Dealing with noncontingent reinforcement (Ro)
To increase responding, 3 ways:
–increase the rate of contingent reinforcement
–decrease the rate of concurrently available reinforcement of one alternative
–decrease the rate of free, noncontingent reinforcement

Dealing with noncontingent reinforcement (Ro)
Works well in rich environments, where there is more opportunity to alter reinforcement rates:
–You don't have to add reinforcers; you can DECREASE reinforcement to alter the situation and avoid satiation/habituation
–Allows for contextual changes in reinforcement

Behavioral Contrast
Behavioral contrast: an often-found "side effect." Original study: Reynolds (1961):
–pigeons on CONC schedules of reinforcement, with equal schedules at first
–then reinforcement was extinguished on one alternative
–got a HUGE change in responding on the non-EXT alternative
Why? Behavioral contrast: the value of the remaining schedule changed.
Also called the Pullman effect!!!!

Behavioral Contrast
Helps explain "side effects" of reinforcement:
–e.g., put a boy's talking to the teacher during class on EXT, and the kid then talks more to peers
Why?
–P1/P2 = R1/R2: 100/100 = 100/100
–But then one option goes to EXT:
–P1/P2 = R1/R2: 100/100 = 100/0?

Behavioral Contrast
Example: a boy talks to the teacher during class, so the teacher puts the talking on EXT, but then the kid talks more to peers.
–Look at the ratios: P1/P2 = R1/R2

Behavioral Contrast
Let's plug in values:
–before, talking to the teacher is highly valuable: P1/P2 = 100/50
–now, talking to the teacher is not valuable: P1/P2 = 1/50
The alternative is much more "preferable" than in the original situation.
–If you alter Ro, you get similar changes!

Can mathematically predict
Responses:
–P1 = staying in seat
–P2 = out of seat
Rewards:
–R1 = rewards for staying in seat
–R2 = rewards for being out of seat
–Ro = rewards for playing around in seat
What happens as we vary each of these? (See the numeric sketch below.)
P1 = kR1/(R1 + R2 + Ro)
P2 = kR2/(R1 + R2 + Ro)
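A numeric sketch of these predictions; the equations are from the slide, but the reinforcement rates and k below are made up for illustration.

```python
# Vary R1, R2, and Ro in the two-alternative Herrnstein equations
# (illustrative numbers): P1 = k*R1/(R1+R2+Ro), P2 = k*R2/(R1+R2+Ro).

def rates(r1: float, r2: float, ro: float, k: float = 100.0):
    total = r1 + r2 + ro
    return k * r1 / total, k * r2 / total  # (in-seat, out-of-seat) rates

print(rates(r1=10, r2=10, ro=10))  # baseline: behavior split evenly
print(rates(r1=30, r2=10, ro=10))  # reward in-seat more -> in-seat rises
print(rates(r1=10, r2=2,  ro=10))  # cut reward for out-of-seat -> it falls
print(rates(r1=10, r2=10, ro=2))   # cut extraneous reward -> both rise
```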

Conclusions
Clinical applications: MUST consider broader environmental conceptualizations of problem behavior.
Must account for sources of reinforcement other than those provided by the therapist:
–again, Herrnstein's idea of the context of reinforcement
–if not, you shoot yourself in the old therapeutic foot