Chapters 5 and 7 Operant Learning
Operant (Instrumental) Learning Stimulus --> Response --> Outcome
Classical vs. Operant Classical –Reflex action –Neutral stimulus associated with US –Outside of subject’s control Operant –Strengthens/weakens “voluntary” action –Subject does/doesn’t respond Can occur together
Edward Thorndike Animal intelligence Comparative psychology
Experiments Chicks, cats, dogs Single animals Observational learning
Puzzle Box Thorndike 1898, p. 8
Trial-and-Error Thorndike 1898, p. 19
Law of Effect “When particular stimulus-response sequences are followed by pleasure, those responses tend to be ‘stamped in’; responses followed by pain tend to be ‘stamped out’.” (Thorndike 1911) Reinforced Punished
Methodology Subjects Apparatus Escape latency Time-curves
All images Thorndike 1898, p. 18
Theory Incremental learning S-R Direct experience
Revision Scientific method Observational learning in non-humans
www1.appstate.edu/~kms/classes/psy3202/images/puzzleboxes.gif
B.F. Skinner Operant response –The unit of behaviour –Effect it has on environment Skinner’s approach (video) Operant chamber (video)
Discrete Trial & Free Operant Discrete –One trial at a time –Re-set apparatus –Measure a behaviour –Latency, running speed, reduction in errors –E.g., maze Free –Automatic repeat –Less disruptive for subject –Response rate –E.g., operant chamber
Three-Term Contingency Contingency: Y iff X 1. Discriminative stimulus (S^D) 2. Operant response (R) 3. Outcome (O) –Appetitive or aversive
Outcomes and Effects Positive –Something is delivered Negative –Something is removed Reinforcer –Causes behaviour to increase Punisher –Causes behaviour to decrease Whether an outcome is a “reinforcer” or a “punisher” is defined by its effect on behaviour
Four Basic Operant Relations
                           Response causes stimulus to be:
                           Presented                       Removed
Response rate increases    Positive reinforcement          Negative reinforcement
                           e.g. lever press --> get food   e.g. lever press --> stop shock
Response rate decreases    Positive punishment             Negative punishment
                           e.g. lever press --> get shock  e.g. lever press --> food lost
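The four relations come from two independent yes/no questions: was a stimulus presented or removed, and did the response rate go up or down? A minimal Python sketch, using hypothetical enum names (`StimulusChange`, `RateChange`) to label the two axes, shows how each name follows from those answers:

```python
from enum import Enum


class StimulusChange(Enum):
    """What the response does to the stimulus (hypothetical labels)."""
    PRESENTED = "presented"
    REMOVED = "removed"


class RateChange(Enum):
    """What then happens to the response rate."""
    INCREASES = "increases"
    DECREASES = "decreases"


def operant_relation(stimulus: StimulusChange, rate: RateChange) -> str:
    # "Positive"/"negative" only says whether a stimulus is added or taken away.
    sign = "Positive" if stimulus is StimulusChange.PRESENTED else "Negative"
    # "Reinforcement"/"punishment" is decided by the effect on behaviour.
    kind = "reinforcement" if rate is RateChange.INCREASES else "punishment"
    return f"{sign} {kind}"


# The four lever-press examples from the slide.
examples = {
    "lever press --> get food": (StimulusChange.PRESENTED, RateChange.INCREASES),
    "lever press --> stop shock": (StimulusChange.REMOVED, RateChange.INCREASES),
    "lever press --> get shock": (StimulusChange.PRESENTED, RateChange.DECREASES),
    "lever press --> food lost": (StimulusChange.REMOVED, RateChange.DECREASES),
}

for label, (stimulus, rate) in examples.items():
    print(f"{label}: {operant_relation(stimulus, rate)}")
```

Note that the rate change is an observation, not an assumption: a stimulus the experimenter expected to be unpleasant still counts as a reinforcer if responding increases.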
Types of Reinforcers Primary –Not dependent on an association with other reinforcers Secondary (“Conditioned Reinforcer”) –Neutral stimulus paired with primary reinforcer
Secondary Reinforcers “Bridging”, “clicker” Secondary reinforcers extinguish without periodic pairings with the primary reinforcer Generally weaker than primary Less prone to satiation Generalized reinforcer –Paired with many other kinds of reinforcers
Neurobiology of Reinforcement Pleasure centres of brain (reward pathway) –Electrical stimulation of brain (ESB) Dopamine –Major neurotransmitter –Released by appetitive stimuli
Dopamine Release Different amounts of dopamine released Unexpected reinforcement --> more dopamine release –Gains in the learning curve shrink as learning proceeds –Rescorla-Wagner model –The more you’ve learned, the less “surprising” the reinforcer; less dopamine released; less reinforcing
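The Rescorla-Wagner point can be sketched numerically with the standard delta rule ΔV = αβ(λ − V). The learning rate 0.3 and asymptote λ = 1 are arbitrary assumptions, and reading the prediction error (λ − V) as a stand-in for dopamine release is an interpretive analogy, not a physiological model.

```python
def rescorla_wagner(trials: int, alpha_beta: float = 0.3, lam: float = 1.0) -> list[tuple[float, float]]:
    """Simulate repeated reinforced trials.

    Returns (prediction_error, associative_strength) for each trial.
    alpha_beta and lam are illustrative values, not fitted parameters.
    """
    v = 0.0  # associative strength before any learning
    history = []
    for _ in range(trials):
        error = lam - v          # "surprise": how unexpected the reinforcer still is
        v += alpha_beta * error  # learning on this trial is proportional to the surprise
        history.append((error, v))
    return history


for trial, (error, v) in enumerate(rescorla_wagner(5), start=1):
    print(f"trial {trial}: surprise = {error:.3f}, V = {v:.3f}")
```

Over five trials the surprise term falls from 1.000 to 0.240 while V climbs from 0.300 to 0.832: each reinforcer adds less learning than the one before.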
Addictive Internal/external triggers of dopamine release –Orgasm, cocaine, crack Dopamine-releasing stimuli can be very addictive Dopamine is a precursor of epinephrine (adrenaline) –“Thrill junkies” –Tolerance develops
Strength of Operant Learning Condition practically any behaviour Shaping (successive approximations)
Shaping a Lever Press Gradual process Reinforce more appropriate/precise responses Feedback
Response Chains Sequences of behaviours in specific order Objective: primary reinforcer Conditioned reinforcers Discriminative stimuli
Backwards Chaining Often used with “complex” training Start with last response in chain Next, second last response Third last, etc.
Chaining S^D: discriminative stimulus; R: response; SR: secondary reinforcer; PR: primary reinforcer S^D3 --> R3: climb up --> SR3 (= S^D2) --> R2: walk --> SR2 (= S^D1) --> R1: climb down --> PR
Forward Chaining Start with first response Add additional links in chain
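Backward and forward chaining differ only in which end of the chain is trained first. A small sketch, assuming the three-link ladder chain from the chaining slide (climb up, walk, climb down) and a hypothetical helper `training_stages`:

```python
def training_stages(chain: list[str], backward: bool) -> list[list[str]]:
    """List the segment of the chain practised at each training stage.

    `chain` is given in performance order (first response first).
    """
    n = len(chain)
    if backward:
        # Start with the last link (the one that earns the primary reinforcer),
        # then add the link before it at each new stage.
        return [chain[n - k:] for k in range(1, n + 1)]
    # Forward chaining: start with the first link and add the next one each stage.
    return [chain[:k] for k in range(1, n + 1)]


ladder = ["climb up", "walk", "climb down"]
for stage in training_stages(ladder, backward=True):
    print("backward:", " --> ".join(stage))
for stage in training_stages(ladder, backward=False):
    print("forward: ", " --> ".join(stage))
```

In the backward version every stage ends with "climb down", so every practice run finishes with the link that earns the primary reinforcer.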
Factors in Operant Learning
Contiguity Time between behaviour & outcome Delays allow other behaviours to occur in between, allow forgetting, and leave the response unreinforced in the meantime (extinction) –Learning with delay if stimulus “placeholder” provided (conditioned reinforcer?) Especially important for punishment
Contingency Correlation between behaviour & outcome Strong vs. random contingency Both reinforcement and punishment
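One common way to put a number on a behaviour-outcome contingency is ΔP: the probability of the outcome given the response minus its probability given no response. The slide does not name a specific measure, so treat this as an illustration; the counts are made up.

```python
def delta_p(resp_with_outcome: int, resp_without_outcome: int,
            noresp_with_outcome: int, noresp_without_outcome: int) -> float:
    """Delta-P = P(outcome | response) - P(outcome | no response).

    Arguments are counts from a 2 x 2 table of observation periods.
    """
    p_given_response = resp_with_outcome / (resp_with_outcome + resp_without_outcome)
    p_given_no_response = noresp_with_outcome / (noresp_with_outcome + noresp_without_outcome)
    return p_given_response - p_given_no_response


# Hypothetical counts: 20 periods with the response, 20 without.
strong_contingency = delta_p(18, 2, 1, 19)    # outcome almost only after responding
random_contingency = delta_p(10, 10, 10, 10)  # outcome equally likely either way
print(f"strong contingency: {strong_contingency:.2f}")  # 0.85
print(f"random contingency: {random_contingency:.2f}")  # 0.00
```

The same measure works whether the outcome is a reinforcer or a punisher; a negative value means the response makes the outcome less likely.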
Outcome Characteristics Larger reinforcers/punishers --> stronger learning –Not a linear effect Qualitative differences in reinforcers and punishers –Species & individual differences Intensity of punisher –Tolerance
Task Characteristics Some tasks easier to learn than others Species & individual differences Innate and/or prior conditioning
Deprivation Levels Generally, the greater the deprivation, the more effective the reinforcer Reinforcer satiation Deprivation can motivate punishable responses
Reinforcers in Punishment What maintains undesired behaviour? Benefit? Alternative sources of reinforcement –Find other ways to provide acceptable reinforcement
Latent Learning Motivation Learning behaviour Performing behaviour
Tolman & Honzik (1930) [Figure: average errors by day for three groups: food, no food, and no food until day 11]
Extinction Response no longer produces same outcome Extinction burst Variability of behaviour Aggression and frustration Spontaneous recovery
Behaviour Modification Also “behaviour analysis” Alter behaviour via operant conditioning Therapy Reinforcement vs. punishment
Problems with Punishment in Behaviour Modification Application of the punisher Incorrect use of punishment –Creates new problems or worsens the negative consequences of punishment Tolerance –Start with strong punisher –Gradually reduce General reluctance to administer
Possible Consequences of Punishment Escape Aggression, violence –At punisher, self, other Apathy –General suppression of other behaviours Abuse –Permanent damage Imitation
Alternatives to Using Punishment
Response Prevention Make it impossible to do punishable behaviour Circumvention Younger children
Extinction Identify reinforcer of behaviour Withhold reinforcer Difficult to ID reinforcer Extinction bursts Slow
Differential Reinforcement Differential reinforcement of low responses (DRL) –Only reinforce behaviour when response occurs at low frequency Differential reinforcement of zero responses (DR0) –Reinforcement contingent on not performing behaviour at all (in some time period)
Differential reinforcement of alternative behaviour (DRA) –Reinforcer gained from undesired behaviour now only available when some alternative behaviour done Differential reinforcement of incompatible behaviour (DRI) –Reinforce behaviour completely incompatible with undesired response
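The DRL and DR0 rules can be written as small decision functions over response times. This is a minimal sketch with hypothetical function names, and it assumes one particular variant of each schedule: DRL defined by the minimum time between responses, and DR0 by fixed, non-resetting intervals. Real procedures vary. DRA and DRI depend on which alternative behaviour is chosen, so they are not coded here.

```python
def drl(response_times: list[float], min_irt: float) -> list[bool]:
    """DRL: reinforce a response only if at least `min_irt` seconds have passed
    since the previous response (or since the session started)."""
    reinforced = []
    last = 0.0
    for t in response_times:
        reinforced.append(t - last >= min_irt)
        last = t  # every response, reinforced or not, restarts the clock
    return reinforced


def dr0(response_times: list[float], interval: float, session_length: float) -> list[float]:
    """DR0: split the session into fixed intervals and deliver a reinforcer at the
    end of every interval in which the behaviour did not occur at all."""
    deliveries = []
    start = 0.0
    while start + interval <= session_length:
        end = start + interval
        if not any(start <= t < end for t in response_times):
            deliveries.append(end)
        start = end
    return deliveries


responses = [3.0, 5.0, 15.0, 30.0]  # hypothetical response times in seconds
print(drl(responses, min_irt=10.0))                          # [False, False, True, True]
print(dr0(responses, interval=10.0, session_length=40.0))    # [30.0]
```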
Noncontingent Reinforcement Provide desired reinforcer on regular basis regardless of what is being done No correlation between response and outcome May work because subject gets reinforcer for “free” Problems if reinforcer comes after some other undesired behaviour (new acquisition)
Negative Punishment Removal of pleasant stimulus Time-out Popular in human behaviour modification
Other Techniques for Behavioural Deceleration Overcorrection –Repetitions of alternate, desired behaviour Restitution Positive practice –Technically, punishment Stimulus satiation
Escape and Avoidance
Definitions Escape –Get away from aversive stimulus that is in progress Avoidance –Get away from aversive stimulus before it begins
Shuttle Box Solomon & Wynne (1953) –Dogs –Chamber with barrier; Shock –Light off as signal
Theory Issues For escape, no ambiguity –Aversive removed, behaviour increases = negative reinforcement What about avoidance? –Shuttles before shock –Behaviour increases –Nothing obvious removed or delivered Mowrer & Lamoreaux (1942) –“…not getting something can hardly, in and of itself, qualify as rewarding.”
Two-Process Theory Classical and operant conditioning –Shock = US –Fear/pain/jump/twitch/ squeal = UR –Darkness = CS –Fear of dark = CR Fear: heart rate, breathing, stomach cramps, etc. Negative reinforcement –Removal of fear (CR) Escape from CS, not avoidance of shock Two-process treats avoidance as just another type of escape behaviour
Support for Two-Process Theory Rescorla & LoLordo (1965) Dog in shuttlebox –No signal –Response gives “safe time” Pair tone with shock –Tone increases rate of response CS can amplify avoidance Conditioned inhibition can reduce avoidance
Problems with Two-Process Theory Avoidance without observable fear –Heart rate –Not consistent Fear diminishes with avoidance learning
Measuring Fear Kamin, Brimer, and Black (1963) –Lever press ---> food –Auditory CS ---> avoidance in shuttle box until: 1, 3, 9, 27 avoidances in a row –CS in operant chamber; check for suppression of lever press
Results Fear decreases during extended avoidance training But, avoidance still strong Even low fear is enough? [Figure: avoidance responses and responding]
Extinction in Avoidance Behaviour Odd prediction from two-process theory “Yo-yo” effect Avoidance should toggle But! Avoidance is extremely persistent [Figure: successful avoidance trials and # of US received]
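The “yo-yo” prediction can be reproduced with a toy simulation. All of the following are assumptions: fear of the CS is learned with a Rescorla-Wagner-style update, the subject avoids whenever fear reaches a fixed threshold (0.5 here), and a successful avoidance means the shock does not follow the CS on that trial. Under two-process theory that is enough to make avoidance switch on and off.

```python
def two_process_yoyo(trials: int, alpha: float = 0.3, threshold: float = 0.5) -> list[bool]:
    """Return, for each trial, whether the simulated subject avoided the shock."""
    fear = 0.0  # conditioned fear elicited by the CS
    avoided = []
    for _ in range(trials):
        avoid = fear >= threshold     # avoidance is motivated by fear of the CS
        lam = 0.0 if avoid else 1.0   # the shock (US) follows the CS only if not avoided
        fear += alpha * (lam - fear)  # fear grows after shocks, extinguishes after avoidance
        avoided.append(avoid)
    return avoided


print(two_process_yoyo(8))
# [False, False, True, False, True, False, True, False]
```

After two shocked trials the simulated subject alternates between avoiding and being shocked. Real avoidance responding is far more persistent than this, which is the problem the slide points to.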
One-Process Theory Classical conditioning component unnecessary Two interpretations of reinforcer –Molar vs. molecular –Negative reinforcement: Overall reduction in exposure to punishers is reinforcer (text interpretation) –Positive reinforcement: Avoidance itself is reinforcer; subject gets reinforced by “safety” on a trial
Sidman Avoidance Task Free-operant avoidance –Can avoidance be learned if no warning CS? Shock at random intervals Response gives safe time Extensive training --> learn avoidance –But, usually never perfect –High variability across subjects Two-process theory suggests: –Time becomes a CS (time elicits fear)
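A Sidman-style free-operant avoidance session can be simulated in one-second steps. In this sketch (hypothetical function and parameter names, arbitrary values) shocks arrive at random with a fixed probability per second, except during the safe time that follows each response; no warning CS is ever presented.

```python
import random


def sidman_session(response_times: list[int], duration: int,
                   shock_prob: float, safe_time: int, seed: int = 0) -> list[int]:
    """Return the seconds at which shocks were delivered.

    response_times: seconds at which the subject responded.
    shock_prob: chance of a shock in any unprotected second.
    safe_time: seconds of protection that each response buys.
    """
    rng = random.Random(seed)  # seeded so a run can be reproduced
    responses = sorted(response_times)
    shocks = []
    last_response = None
    i = 0
    for t in range(duration):
        # Register every response made up to and including this second.
        while i < len(responses) and responses[i] <= t:
            last_response = responses[i]
            i += 1
        protected = last_response is not None and t - last_response < safe_time
        if not protected and rng.random() < shock_prob:
            shocks.append(t)
    return shocks


# A subject that never responds versus one that responds every 10 s.
print(len(sidman_session([], duration=60, shock_prob=0.2, safe_time=10)))
print(sidman_session(list(range(0, 60, 10)), duration=60, shock_prob=0.2, safe_time=10))  # []
```

Responding at least once per safe-time window removes every shock, but nothing in the session signals when the window is about to close; this is why two-process theory has to treat the passage of time itself as the CS.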
Herrnstein & Hineline (1966) Rapid and slow shock rate schedules Response switches schedules Shocks presented randomly, no signal Responses give shock reduction Reduction in shock frequency is reinforcer
Learned Helplessness Behaviour has no effect on situation Generalizes Laboratory –Give inescapable shocks –Shuttle box –Will not switch sides –Expectation that behaviour has no effect
Learned Helplessness in Humans Depression Situations beyond your control Three dimensions –Situation: specific or global –Attribute: internal or external –Time: short-term or long-term
Therapeutic Application Confidence building (“can not fail”) –Implementation issues Tasks that can be successfully completed –Produces immunization –Escapable condition … inescapable condition Learned helplessness less likely to develop
Theories of Operant Conditioning
Hull’s Drive Reduction Theory Animals have motivational states (drives) Necessary for survival Reinforcers are things that reduce drives Physiological value –Reduce physiological state
Drive Reduction Reinforcers Works well with primary reinforcers Many secondary reinforcers have no physiological value Hull: association links secondary to drive Some reinforcers hard to classify as primary or secondary Some increase a physiological state Some necessities undetectable Roller coasters Vitamins Saccharin
Relative Value Theory & Premack Principle Treat reinforcers as behaviours Is it the food, or the behaviour of eating that is the reinforcer? Behavioural probability scale Greater or lesser value of behaviours relative to one another No distinction between primary and secondary
Premack Principle One behaviour will reinforce a second behaviour –High probability behaviour reinforces low probability behaviour Baseline probability scale –Time –Rank order Reinforcement relativity –No absolutes Probability of response = Time spent on response / Total time
Example Behaviours –Eat ice cream (I), play video game (V), read book (B) Baseline (30 minutes) –Student 1: I (2min), V (8min), B (20min) Scale: I -- V -- B –Student 2: I (8min), V (20min), B (2min) Scale: B -- I -- V Student 1: V reinforces I, B reinforces V & I Student 2: I reinforces B, V reinforces I & B
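The worked example follows directly from the probability formula on the Premack Principle slide. A minimal sketch with hypothetical helper names; ties between behaviours are not handled.

```python
def premack_scale(baseline: dict[str, float]) -> list[str]:
    """Order behaviours from least to most probable.

    Probability of a behaviour = time spent on it / total baseline time.
    """
    total = sum(baseline.values())
    probability = {behaviour: minutes / total for behaviour, minutes in baseline.items()}
    return sorted(probability, key=lambda b: probability[b])


def reinforcement_relations(baseline: dict[str, float]) -> dict[str, list[str]]:
    """Map each behaviour to the lower-probability behaviours it can reinforce."""
    scale = premack_scale(baseline)
    return {scale[i]: scale[:i] for i in range(1, len(scale))}


# Baseline minutes from the 30-minute example (I = ice cream, V = video game, B = book).
student_1 = {"I": 2, "V": 8, "B": 20}
student_2 = {"I": 8, "V": 20, "B": 2}

print(premack_scale(student_1), reinforcement_relations(student_1))
# ['I', 'V', 'B'] {'V': ['I'], 'B': ['I', 'V']}
print(premack_scale(student_2), reinforcement_relations(student_2))
# ['B', 'I', 'V'] {'I': ['B'], 'V': ['B', 'I']}
```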
Problems Baseline phase –Fair rating? –How to compare very different behaviours Time problems –What if time not important to behaviour? –Behaviour duration? –Length of baseline period?
Response Deprivation Theory Deprived behaviours = reinforcing behaviours Drop below baseline level of performance Not relative frequency of one behaviour compared to another (i.e., Premack) Level of deprivation for a behaviour
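A common formalization of response deprivation (usually credited to Timberlake and Allison) compares the schedule with the baseline. The contingent behaviour is deprived, and so should act as a reinforcer, when I / C > O_I / O_C. Here I is the amount of instrumental behaviour required, C is the amount of contingent behaviour it earns, and O_I and O_C are their baseline levels. The sketch reuses Student 1's baseline from the Premack example; the schedules themselves are hypothetical.

```python
def is_response_deprived(required_instrumental: float, earned_contingent: float,
                         baseline_instrumental: float, baseline_contingent: float) -> bool:
    """True if the schedule holds the contingent behaviour below its baseline level.

    Equivalent test: doing the instrumental behaviour at its baseline level
    would earn less of the contingent behaviour than the subject chose at baseline.
    """
    return (required_instrumental / earned_contingent
            > baseline_instrumental / baseline_contingent)


# Student 1 baseline: reading (B) 20 min, video game (V) 8 min.
# V is LESS probable than B, so Premack predicts V cannot reinforce reading.
print(is_response_deprived(10, 2, 20, 8))  # True: 10 min reading earns only 2 min gaming
print(is_response_deprived(10, 8, 20, 8))  # False: gaming is not restricted
```

This is the contrast with Premack: what matters is how far the schedule pushes a behaviour below its own baseline, not its rank relative to other behaviours.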