Response-Reinforcer Relation

Two types of relationship exist between a response and a reinforcer.

1. Causal relationship (contingency): (R) - (O)
- Doing (R) produces (O).
- The extent to which the response is necessary and sufficient for the occurrence of the reinforcer.

2. Temporal relationship (contiguity): the amount of time between response and reinforcer.
- Temporal contiguity: (R) - (O), the outcome is immediate.
- Or with some delay: (R) - - - delay - - - (O).

Response-reinforcer contingency and temporal contiguity are independent of each other.
- There can be an (R)-(O) contingency with either a short or a long temporal relationship: mow the lawn and get paid immediately, or get paid next Tuesday.
- There can be an immediate outcome that occurs only part of the time: put money in a slot machine and sometimes you win, sometimes you do not.

Effects of Temporal Contiguity

Temporal contiguity: (R) - (O), an immediate outcome produces the best learning.
- Even short delays CAN hinder learning.
- Dickinson et al. (1992): rats were reinforced for lever-pressing, and the delay between response and reinforcer was varied. As the delay increased, conditioned lever pressing decreased dramatically (see Figure 5.11).

Questions:
- Why is instrumental conditioning so sensitive to a delay of reinforcement?
- Is this also true for human behavior? How much delay can people tolerate?
- What are some examples of a long delay between behavior and outcome?

Dealing with delay of reinforcement: even rats can tolerate some delay if they bridge the delay
- with a conditioned reinforcer, or
- with a marking procedure.

The Principles of Learning and Behavior, 7e by Michael Domjan Copyright © 2015 Wadsworth Publishing, a division of Cengage Learning. All rights reserved.
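The steep loss of responding with delay can be summarized with a hyperbolic discounting function, V = A / (1 + kD), a standard model of reinforcer value in the delay literature (Mazur's model). The sketch below is only illustrative; the discount rate k is an assumed value, not a parameter fitted to the Dickinson et al. (1992) data.

```python
# Hyperbolic delay discounting: a reinforcer's effectiveness falls off
# sharply with delay, V = A / (1 + k*D).
# The discount rate k below is illustrative, not fitted to any experiment.

def discounted_value(amount: float, delay_s: float, k: float = 0.5) -> float:
    """Value of a reinforcer of size `amount` delivered after `delay_s` seconds."""
    return amount / (1 + k * delay_s)

for delay in (0, 2, 8, 32):
    print(f"delay {delay:>2}s -> value {discounted_value(1.0, delay):.2f}")
```

Even a few seconds of delay cuts the modeled value substantially, which parallels the sharp drop in lever pressing the slide describes.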

Secondary or Conditioned Reinforcer

- Primary reinforcers: usually food, drink, and pleasure.
- A secondary, or conditioned, reinforcer is associated with the primary reinforcer.

Dog-training example:
- First present clicker-food pairings.
- Then use response "sit" with outcome "clicker" to train the dog to sit.
- The clicker is a conditioned reinforcer used to bridge the gap until the primary reinforcer.
- However, the clicker-food pairing only needs to be reinstated occasionally.
- Response "sit" with outcome "verbal praise" for training a dog to sit: is verbal praise a conditioned reinforcer?

What works as a conditioned reinforcer for humans?
- What is a common conditioned reinforcer for people? Coaches, instructors, and parents use verbal praise.
- What is the primary reinforcer?

Marking Procedure

Marking the target response can bridge delayed reinforcement:
- Use a specific stimulus, such as a flashing light, in conjunction with the response.
- In discrete-trial maze studies, the rat is moved to a holding chamber during the delay.
- Lieberman et al. (1979) tested whether rats could learn a correct turn or choice in a maze despite a long delay of reward.

Williams (1999) compared three groups (see Figure 5.12):
- No signal group: lever press, 30-second delay, then food delivery.
- Marked group: lever press, 5-second light, 25-second delay, then food delivery.
- Blocked group: lever press, 25-second delay, 5-second light just before food delivery.

Results: marking improves learning, while blocking prevents learning.
- The marked group can use the light signal to fill the delay.
- In the blocked group, the light just before the food interferes with learning.
- However, even the no-signal group shows some learning, so those rats must be bridging the delay by some other means, such as ?

Figure 5.12 – Acquisition of lever pressing in rats with a 30-second delay of reinforcement.
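The three Williams (1999) conditions can be laid out as event timelines. This is only a sketch of the procedure's structure; the tuple representation and condition labels are my own, with the timings taken from the description above.

```python
# Event timelines for the three Williams (1999) conditions.
# Tuples are (event, duration_s); point events have duration 0.
# This layout is my own representation, with timings from the text.

conditions = {
    "no_signal": [("lever press", 0), ("delay", 30), ("food", 0)],
    "marked":    [("lever press", 0), ("light", 5), ("delay", 25), ("food", 0)],
    "blocked":   [("lever press", 0), ("delay", 25), ("light", 5), ("food", 0)],
}

for name, events in conditions.items():
    total = sum(d for _, d in events)
    trace = " -> ".join(f"{e}({d}s)" if d else e for e, d in events)
    print(f"{name:9} total delay {total}s: {trace}")
```

Laying the groups out this way makes the key point visible: total response-to-food delay is identical (30 s) in all three groups; only where the light falls differs, yet that placement decides whether learning is helped or blocked.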

Response-Reinforcer Contingency

Contingency: the outcome is "contingent on" a particular response.
- Positive contingency: the response produces the outcome.
- Negative contingency: the response prevents or eliminates the outcome.

Studies of delay of reinforcement demonstrated that a perfect contingency (R - O) is not sufficient to produce strong instrumental responding.
- This led to the conclusion that response-reinforcer contiguity, rather than contingency, was the critical factor.

Skinner's superstition experiment supported this conclusion:
- Food was presented to pigeons every 15 seconds regardless of the behavior of the bird.
- The birds showed stereotyped behavior patterns.
- Skinner's operant-conditioning explanation: adventitious (accidental) reinforcement of the bird's behavior, stressing the importance of contiguity between the response and the reinforcer.
- However, Skinner got this one wrong.
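One common way to quantify the degree of response-reinforcer contingency, a standard index in the contingency-learning literature rather than something specific to this chapter, is Delta-P: the probability of the outcome given a response minus the probability of the outcome given no response. A minimal sketch:

```python
# Delta-P contingency index: P(outcome | response) - P(outcome | no response).
# Positive means responding makes the outcome more likely; zero means the
# outcome is independent of behavior, as in Skinner's superstition procedure,
# where food arrived every 15 s no matter what the pigeon did.

def delta_p(p_outcome_given_response: float,
            p_outcome_given_no_response: float) -> float:
    return p_outcome_given_response - p_outcome_given_no_response

print(delta_p(1.0, 0.0))  # perfect positive contingency
print(delta_p(0.5, 0.5))  # zero contingency: outcome independent of responding
print(delta_p(0.0, 1.0))  # perfect negative contingency (response prevents outcome)
```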

Staddon and Simmelhag Superstition Experiment

A landmark study that challenged Skinner's interpretation (see Figure 5.13):
- Similar procedure, except with a fixed time interval of 12 s.
- The birds were observed and their behavior recorded on all sessions.

They found two types of responses at asymptote:
- Interim responses: started immediately after food delivery but terminated several seconds before food (e.g., turning circles, flapping wings); these differed among pigeons and intervals.
- Terminal responses: started mid-interval and continued until food was delivered. For all pigeons the terminal response was pecking; terminal responses were reinforced.

The differences between interim and terminal responses can be explained by behavior systems:
- Terminal responses are species-specific behavior that is part of focal search.
- Interim responses are more like general search behavior.


Controllability of Reinforcers

The response controls the reinforcer:
- A strong contingency allows control over the reinforcer.
- A weak contingency does not allow control over the reinforcer.

This can be seen in positive reinforcement (eating candy, for example). However, most of the research has used negative reinforcement, which has more clinical application:
- Responses remove or prevent an aversive event.
- Taking drugs to reduce pain; jumping back to avoid getting run over.

Early experiments used escape-avoidance learning in dogs:
- Unavoidable ("uncontrollable") shock disrupted subsequent avoidance learning.
- Avoidable ("controllable") shock did not disrupt avoidance learning.
- This is called the learned-helplessness effect because the dogs failed to avoid the aversive shock even when they had the opportunity; that is, they gave up.

Learned-Helplessness (LH) Effect

The triadic design for LH experiments is outlined in Table 5.2.

Phase one: exposure to inescapable shock.
- Group 1 (Escape): restrained and given unsignaled shock to the tail; could terminate the shock by spinning a wheel in front of them.
- Group 2 (Yoked): placed in the same restraint and given the same number and pattern of shocks, but could not terminate the shocks.
- Group 3 (Control): simply placed in the restraint; no shocks.

Phase two: all groups were put in a two-compartment shuttle box and taught a normal escape/avoidance reaction.
- Avoid the shock by responding during a 10-s warning light, or escape the shock once it came on, by jumping to the other side of the compartment.
- If the subject did not respond within 60 seconds, the shock was terminated.
- The experiment tested whether the phase-one experience affected escape/avoidance learning.

Wheel-turn apparatus used in LH experiments.

Table 5.2 – The Triadic Design Used in Studies of the Learned-Helplessness Effect.
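The defining feature of the triadic design, identical shock exposure with controllability as the only difference between the first two groups, can be sketched as follows. The shock durations below are illustrative values, not data from any experiment.

```python
# Sketch of the yoking logic in the triadic design: the yoked subject
# receives exactly the shock schedule the escape subject produced, but its
# own responses have no effect. Durations here are illustrative only.

import random

random.seed(0)

# Escape subject: each shock lasts until a wheel turn ends it, so the
# durations vary with the subject's own escape latencies.
escape_shock_durations = [random.uniform(1, 5) for _ in range(10)]

# Yoked subject: identical number, pattern, and duration of shocks, copied
# from the escape subject, so only controllability differs between groups.
yoked_shock_durations = list(escape_shock_durations)

# Control subject: restrained, no shocks at all.
control_shock_durations = []

print("escape total shock:", round(sum(escape_shock_durations), 1), "s")
print("yoked total shock: ", round(sum(yoked_shock_durations), 1), "s")
```

Because the yoked subject's exposure is a verbatim copy of the escape subject's, any phase-two difference between these two groups must come from controllability, not from the amount of shock received.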


Learned-Helplessness (LH) Effect

Results from the triadic design for LH experiments:
- The Escape group learned as easily as the Control group.
- But the Yoked group showed an impairment. This deficit in learning is the learned-helplessness effect.
- The failure to learn was due to the inability to control shock in phase one.

According to Seligman and Maier, the lack of control in phase one led to the development of a general expectation that behavior is irrelevant to shock offset. This expectation of lack of control transferred to the new situation in phase two, causing retardation of learning in the shuttle box.

Learned-Helplessness (LH) Hypothesis

Based on the conclusion that animals can perceive the contingency between their behavior and the reinforcer. When outcomes are independent of the subject's behavior, the subject develops a state of learned helplessness, which is manifest in two ways:
1. A motivational loss, indicated by a decline in performance and a heightened level of passivity.
2. A generalized expectation that reinforcers will continue to be independent of its behavior; this persistent belief is the cause of the future learning deficit.

The LH hypothesis has been challenged by studies showing that it is not the lack of control that leads to the LH outcome, but rather the inability to predict the reinforcer:
1. Receiving predictable, inescapable shock is less damaging than receiving unsignaled shock: if inescapable shock is signaled, there is less of a learning deficit.
2. Presentation of stimuli following the offset of inescapable shock eliminates the LH deficit: when the house light was turned off for a few seconds at shock offset, the Yoked/feedback group learned as well as the Escape and No-shock groups.

Alternatives to the LH Hypothesis

Activity deficit hypothesis:
- Reduced activity, similar to freezing.
- Not supported by Y-maze choice learning.

Attentional deficit hypothesis:
- Inescapable shock may cause animals to pay less attention to their actions.
- If an animal fails to pay attention to its behavior, it will have difficulty associating its actions with reinforcers in escape-avoidance conditioning.
- When the response is marked by an external stimulus, which helps the animal pay attention to the appropriate response, the LH deficit is reduced.

Stimulus relations in escape conditioning:
- Why doesn't controllable shock produce deficits in learning?
- Receiving shock produces strong emotional responses.
- In the escape condition, the response (turning the wheel) terminates the shock, very similar to actual escape from a predator, i.e., negative reinforcement.
- The response signals "safety," so there are safety-signal feedback cues (see Figure 5.14).

Figure 5.14 – Stimulus relations in an escape-conditioning trial. Shock-cessation feedback cues are experienced at the start of the escape response, just before the termination of shock. Safety-signal feedback cues are experienced just after the termination of shock, at the start of the intertrial interval.

Alternatives to the LH Hypothesis: Safety-Signal Feedback

Safety-signal feedback hypothesis, Minor (1988, 1990):
- An inescapable-shock group was given a safety signal at the termination of shock: a 5-second light in the first study, an audio-visual combination in the second.
- Adding the safety signal eliminated the learned-helplessness effect.

Conclusion: subjects do not need to be able to escape from the aversive event. Predictability, knowing when it will begin and end, prevents learned helplessness.