Chapters 5 and 7 Operant Learning. Operant (Instrumental) Learning Stimulus Response Outcome.

Chapters 5 and 7 Operant Learning

Operant (Instrumental) Learning Stimulus --> Response --> Outcome

Classical vs. Operant Classical –Reflex action –Neutral stimulus associated with US –Outside of subject’s control Operant –Strengthens/weakens “voluntary” action –Subject does/doesn’t respond Can occur together

Edward Thorndike Animal intelligence Comparative psychology

Experiments Chicks, cats, dogs Single animals Observational learning

Puzzle Box Thorndike 1898, p. 8

Trial-and-Error Thorndike 1898, p. 19

Law of Effect “When particular stimulus-response sequences are followed by pleasure, those responses tend to be ‘stamped in’; responses followed by pain tend to be ‘stamped out’.” (Thorndike 1911) ‘Stamped in’ = reinforced; ‘stamped out’ = punished

Methodology Subjects Apparatus Escape latency Time-curves

All images Thorndike 1898, p. 18

Theory Incremental learning S-R Direct experience

Revision Scientific method Observational learning in non-humans

www1.appstate.edu/~kms/classes/psy3202/images/puzzleboxes.gif

B.F. Skinner Operant response –The unit of behaviour –Effect it has on environment Skinner’s approach (video) Operant chamber (video)

Discrete Trial & Free Operant Discrete –One trial at a time –Re-set apparatus –Measure a behaviour –Latency, running speed, reduction in errors –E.g., maze Free –Automatic repeat –Less disruptive for subject –Response rate –E.g., operant chamber

Three-Term Contingency Contingency: Y iff X 1. Discriminative stimulus (S^D) 2. Operant response (R) 3. Outcome (O) –Appetitive or aversive

Outcomes and Effects Positive –Something is delivered Negative –Something is removed Reinforcer –Causes behaviour to increase Punisher –Causes behaviour to decrease The effect on behaviour determines whether an outcome counts as a “reinforcer” or a “punisher”

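Four Basic Operant Relations
Response causes stimulus to be presented, response rate increases: Positive Reinforcement (e.g., lever press --> get food)
Response causes stimulus to be removed, response rate increases: Negative Reinforcement (e.g., lever press --> stop shock)
Response causes stimulus to be presented, response rate decreases: Positive Punishment (e.g., lever press --> get shock)
Response causes stimulus to be removed, response rate decreases: Negative Punishment (e.g., lever press --> food lost)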
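The four relations come from crossing two questions: is the stimulus presented or removed, and does the response rate go up or down? A minimal Python sketch of that lookup follows; the function name and the string labels are illustrative choices, not part of the slides.

```python
def classify_operant_relation(stimulus_change: str, rate_change: str) -> str:
    """Name the operant relation for a stimulus change and its effect on response rate.

    stimulus_change: "presented" or "removed" (what the response does to the stimulus)
    rate_change: "increases" or "decreases" (what then happens to the response rate)
    """
    if stimulus_change not in ("presented", "removed"):
        raise ValueError("stimulus_change must be 'presented' or 'removed'")
    if rate_change not in ("increases", "decreases"):
        raise ValueError("rate_change must be 'increases' or 'decreases'")

    # Positive/negative describes the stimulus; reinforcement/punishment describes the effect.
    sign = "Positive" if stimulus_change == "presented" else "Negative"
    effect = "Reinforcement" if rate_change == "increases" else "Punishment"
    return f"{sign} {effect}"


# Lever press stops a shock and lever pressing becomes more frequent.
print(classify_operant_relation("removed", "increases"))
```

Note that the label depends on the observed change in behaviour, not on whether the stimulus seems pleasant or unpleasant.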

Types of Reinforcers Primary –Not dependent on an association with other reinforcers Secondary (“Conditioned Reinforcer”) –Neutral stimulus paired with primary reinforcer

Secondary Reinforcers “Bridging”, “clicker” Secondary extinction without periodic pairings with primary Generally weaker than primary Less prone to satiation Generalized reinforcer –Paired with many other kinds of reinforcers

Neurobiology of Reinforcement Pleasure centres of brain (reward pathway) –Electrical stimulation of brain (ESB) Dopamine –Major neurotransmitter –Released by appetitive stimuli

Dopamine Release Different amounts of dopamine released Unexpected reinforcement --> more dopamine release –Decreasing learning curve –Rescorla-Wagner –Less “surprising” the more you’ve learned; less dopamine released; less reinforcing
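The link between "surprise" and a decreasing learning curve can be illustrated with the Rescorla-Wagner update, in which associative strength V moves toward an asymptote by a fraction of the prediction error on each trial. The sketch below is only an illustration: the learning rate and asymptote values are arbitrary assumptions, and treating the prediction error as a stand-in for the amount of dopamine released is a simplification of the slide's point, not a physiological model.

```python
def rescorla_wagner(n_trials: int, learning_rate: float = 0.3, asymptote: float = 1.0):
    """Return a list of (associative_strength, prediction_error) pairs, one per trial.

    learning_rate and asymptote are hypothetical values chosen for illustration.
    """
    strength = 0.0
    history = []
    for _ in range(n_trials):
        # Prediction error: how "surprising" the reinforcer is on this trial.
        error = asymptote - strength
        history.append((strength, error))
        # Learning is proportional to surprise, so gains shrink as learning proceeds.
        strength += learning_rate * error
    return history


for trial, (strength, error) in enumerate(rescorla_wagner(5), start=1):
    print(f"trial {trial}: strength={strength:.3f}, surprise={error:.3f}")
```

The surprise term is largest on the first trial and shrinks on every later one, which matches the slide's claim that a well-learned reinforcer is less surprising.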

Addictive Internal/external drugs –Orgasm, cocaine, crack Dopamine very addictive Dopamine converts to epinephrine (adrenaline) –“Thrill junkies” –Tolerance develops

Strength of Operant Learning Condition practically any behaviour Shaping (successive approximations)

Shaping a Lever Press Gradual process Reinforce more appropriate/precise responses Feedback

Response Chains Sequences of behaviours in specific order Objective: primary reinforcer Conditioned reinforcers Discriminative stimuli

Backwards Chaining Often used with “complex” training Start with last response in chain Next, second last response Third last, etc.

Chaining S^D: discriminative stimulus R: response SR: secondary reinforcer PR: primary reinforcer R3: climb up R2: walk R1: climb down [Figure: chain diagram linking S^D3, R3, SR3, S^D2, R2, SR2, S^D1, R1 and PR]

Forward Chaining Start with first response Add additional links in chain

Factors in Operant Learning

Contiguity Time between behaviour & outcome Delays let other behaviours occur, forgetting, extinction (behaviour w/o reinforcement) –Learning with delay if stimulus “placeholder” provided (conditioned reinforcer?) Important re: punishment

Contingency Correlation between behaviour & outcome Strong vs. random contingency Both reinforcement and punishment
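One common way to put a number on the behaviour-outcome relation is the difference between the probability of the outcome given a response and the probability of the outcome given no response. The slide does not name a specific measure, so this delta-P calculation is an assumption chosen for illustration, as are the function name and the example counts.

```python
def contingency(resp_outcome: int, resp_no_outcome: int,
                noresp_outcome: int, noresp_no_outcome: int) -> float:
    """Delta-P: P(outcome | response) minus P(outcome | no response).

    Arguments are counts of observation intervals in each of the four cells.
    """
    p_outcome_given_response = resp_outcome / (resp_outcome + resp_no_outcome)
    p_outcome_given_no_response = noresp_outcome / (noresp_outcome + noresp_no_outcome)
    return p_outcome_given_response - p_outcome_given_no_response


# Strong contingency: the outcome mostly follows the response (hypothetical counts).
print(contingency(9, 1, 1, 9))
# Random contingency: the outcome is equally likely with or without the response.
print(contingency(5, 5, 5, 5))
```

A value near 1 means the outcome depends heavily on the response; a value near 0 corresponds to the random contingency on the slide.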

Outcome Characteristics Larger reinforcers/punishers --> stronger learning –Not a linear effect Qualitative differences in reinforcers and punishers –Species & individual differences Intensity of punisher –Tolerance

Task Characteristics Some tasks easier to learn than others Species & individual differences Innate and/or prior conditioning

Deprivation Levels Generally, the greater the deprivation, the more effective the reinforcer Reinforcer satiation Deprivation can motivate punishable responses

Reinforcers in Punishment What maintains undesired behaviour? Benefit? Alternative sources of reinforcement –Find other ways to provide acceptable reinforcement

Latent Learning Motivation Learning behaviour Performing behaviour

Tolman & Honzik (1930) [Figure: average errors across days for three groups: food, no food, and no food until day 11]

Extinction Response no longer produces same outcome Extinction burst Variability of behaviour Aggression and frustration Spontaneous recovery

Behaviour Modification Also “behaviour analysis” Alter behaviour via operant conditioning Therapy Reinforcement vs. punishment

Problems with Punishment in Behaviour Modification Application of the punisher Incorrect use of punishment –Creates issues or exacerbates punishment consequences Tolerance –Start with strong punisher –Gradually reduce General reluctance to administer

Possible Consequences of Punishment Escape Aggression, violence –At punisher, self, other Apathy –General suppression of other behaviours Abuse –Permanent damage Imitation

Alternatives to Using Punishment

Response Prevention Make it impossible to do punishable behaviour Circumvention Younger children

Extinction Identify reinforcer of behaviour Withhold reinforcer Difficult to ID reinforcer Extinction bursts Slow

Differential Reinforcement Differential reinforcement of low responses (DRL) –Only reinforce behaviour when response occurs at low frequency Differential reinforcement of zero responses (DR0) –Reinforcement contingent on not performing behaviour at all (in some time period)

Differential reinforcement of alternative behaviour (DRA) –Reinforcer gained from undesired behaviour now only available when some alternative behaviour done Differential reinforcement of incompatible behaviour (DRI) –Reinforce behaviour completely incompatible with undesired response

Noncontingent Reinforcement Provide desired reinforcer on regular basis regardless of what is being done No correlation between response and outcome May work because subject gets reinforcer for “free” Problems if reinforcer comes after some other undesired behaviour (new acquisition)

Negative Punishment Removal of pleasant stimulus Time-out Popular in human behaviour modification

Other Techniques for Behavioural Deceleration Overcorrection –Repetitions of alternate, desired behaviour Restitution Positive practice –Technically, punishment Stimulus satiation

Escape and Avoidance

Definitions Escape –Get away from aversive stimulus that is in progress Avoidance –Get away from aversive stimulus before it begins

Shuttle Box Solomon & Wynne (1953) –Dogs –Chamber with barrier; Shock –Light off as signal

Theory Issues For escape, no ambiguity –Aversive removed, behaviour increases = negative reinforcement What about avoidance? –Shuttles before shock –Behaviour increases –Nothing obvious removed or delivered Mowrer & Lamoreaux (1942) –“…not getting something can hardly, in and of itself, qualify as rewarding.”

Two-Process Theory Classical and operant conditioning –Shock = US –Fear/pain/jump/twitch/squeal = UR –Darkness = CS –Fear of dark = CR Fear: heart rate, breathing, stomach cramps, etc. Negative reinforcement –Removal of fear (CR) Escape from CS, not avoidance of shock Two-process treats avoidance as just another type of escape behaviour

Support for Two-Process Theory Rescorla & LoLordo (1965) Dog in shuttlebox –No signal –Response gives “safe time” Pair tone with shock –Tone increases rate of response CS can amplify avoidance Conditioned inhibition can reduce avoidance

Problems with Two-Process Theory Avoidance without observable fear –Heart rate –Not consistent Fear diminishes with avoidance learning

Measuring Fear Kamin, Brimer, and Black (1963) –Lever press ---> food –Auditory CS ---> avoidance in shuttle box until: 1, 3, 9, 27 avoidances in a row –CS in operant chamber; check for suppression of lever press
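Suppression of lever pressing during a CS is commonly summarized with a suppression ratio: responses during the CS divided by the sum of responses during the CS and during an equal period just before it. The slide does not say which measure was used, so the formula here is an assumption based on that common convention, and the counts are hypothetical.

```python
def suppression_ratio(cs_responses: int, pre_cs_responses: int) -> float:
    """Ratio of CS responding to total responding (CS plus an equal pre-CS period).

    0.0 means complete suppression (strong fear); 0.5 means no suppression.
    """
    total = cs_responses + pre_cs_responses
    if total == 0:
        raise ValueError("no responses recorded in either period")
    return cs_responses / total


# Hypothetical counts: 20 presses before the CS, 5 presses during it.
print(suppression_ratio(5, 20))
```

Under this convention, a ratio that climbs back toward 0.5 after many avoidances in a row would indicate that fear of the CS has declined.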

Results Fear decreases during extended avoidance training But, avoidance still strong Even low fear is enough? [Figure: responding as a function of number of avoidance responses]

Extinction in Avoidance Behaviour Odd prediction from two-process theory “Yo-yo” effect Avoidance should toggle But! Avoidance is extremely persistent [Figure: successful avoidance trials and # of US received]

One-Process Theory Classical conditioning component unnecessary Two interpretations of reinforcer –Molar vs. molecular –Negative reinforcement: Overall reduction in exposure to punishers is reinforcer (text interpretation) –Positive reinforcement: Avoidance itself is reinforcer; subject gets reinforced by “safety” on a trial

Sidman Avoidance Task Free-operant avoidance –Can avoidance be learned if no warning CS? Shock at random intervals Response gives safe time Extensive training --> learn avoidance –But, usually never perfect –High variability across subjects Two-process theory suggests: –Time becomes a CS (time elicits fear)

Herrnstein & Hineline (1966) Rapid and slow shock rate schedules Response switches schedules Shocks presented randomly, no signal Responses give shock reduction Reduction in shock frequency is reinforcer

Learned Helplessness Behaviour has no effect on situation Generalizes Laboratory –Give inescapable shocks –Shuttle box –Will not switch sides –Expectation that behaviour has no effect

Learned Helplessness in Humans Depression Situations beyond your control Three dimensions –Situation: specific or global –Attribute: internal or external –Time: short-term or long-term

Therapeutic Application Confidence building (“can not fail”) –Implementation issues Tasks that can be successfully completed –Produces immunization –Escapable condition … inescapable condition Learned helplessness less likely to develop

Theories of Operant Conditioning

Hull’s Drive Reduction Theory Animals have motivational states (drives) Necessary for survival Reinforcers are things that reduce drives Physiological value –Reduce physiological state

Drive Reduction Reinforcers Works well with primary reinforcers Many secondary reinforcers have no physiological value Hull: association links secondary to drive Some reinforcers hard to classify as primary or secondary Some increase a physiological state Some necessities undetectable Roller coasters Vitamins Saccharin

Relative Value Theory & Premack Principle Treat reinforcers as behaviours Is it the food, or the behaviour of eating that is the reinforcer? Behavioural probability scale Greater or lesser value of behaviours relative to one another No distinction between primary and secondary

Premack Principle One behaviour will reinforce a second behaviour –High probability behaviour reinforces low probability behaviour Baseline probability scale –Time –Rank order Reinforcement relativity –No absolutes Probability of response = (Time spent on response) / (Total time)

Example Behaviours –Eat ice cream (I), play video game (V), read book (B) Baseline (30 minutes) –Student 1: I (2min), V (8min), B (20min) Scale: I -- V -- B –Student 2: I (8min), V (20min), B (2min) Scale: B -- I -- V Student 1: V reinforces I, B reinforces V & I Student 2: I reinforces B, V reinforces I & B
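The example can be checked with a short Python sketch that turns baseline minutes into probabilities and lists which behaviours should reinforce a given target. The function names are hypothetical; the minute values are the ones given for the two students.

```python
def baseline_probabilities(minutes: dict) -> dict:
    """Probability of each behaviour = time spent on it / total baseline time."""
    total = sum(minutes.values())
    return {behaviour: time / total for behaviour, time in minutes.items()}


def can_reinforce(minutes: dict, target: str) -> list:
    """Behaviours with a higher baseline probability than the target, lowest first."""
    probs = baseline_probabilities(minutes)
    higher = [b for b in probs if probs[b] > probs[target]]
    return sorted(higher, key=lambda b: probs[b])


# I = eat ice cream, V = play video game, B = read book (30-minute baseline).
student_1 = {"I": 2, "V": 8, "B": 20}
student_2 = {"I": 8, "V": 20, "B": 2}

for name, minutes in (("Student 1", student_1), ("Student 2", student_2)):
    for target in minutes:
        print(name, target, "is reinforced by", can_reinforce(minutes, target))
```

For each student, the behaviour at the top of the scale has an empty list, since under the Premack principle nothing in the set has a higher probability to reinforce it.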

Problems Baseline phase –Fair rating? –How to compare very different behaviours Time problems –What if time not important to behaviour? –Behaviour duration? –Length of baseline period?

Response Deprivation Theory Deprived behaviours = reinforcing behaviours Drop below baseline level of performance Not relative frequency of one behaviour compared to another (i.e., Premack) Level of deprivation for a behaviour
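A common formalization of response deprivation compares a schedule's ratio of required instrumental behaviour to permitted contingent behaviour against the ratio of their baselines: if the schedule ratio is larger, the contingent behaviour is pushed below its baseline level and should act as a reinforcer. The slide does not give a formula, so this inequality is an assumption, and the schedule values below are hypothetical; the baseline minutes reuse Student 1 from the Premack example.

```python
def is_response_deprived(required_instrumental: float, allowed_contingent: float,
                         baseline_instrumental: float, baseline_contingent: float) -> bool:
    """True if the schedule holds the contingent behaviour below its baseline level.

    Tests required/allowed > baseline_instrumental/baseline_contingent,
    cross-multiplied to avoid division.
    """
    return (required_instrumental * baseline_contingent
            > baseline_instrumental * allowed_contingent)


# Student 1 baseline: reading 20 min (instrumental), ice cream 2 min (contingent).
# Schedule A: 20 min of reading earns 1 min of ice cream -> below baseline access.
print(is_response_deprived(20, 1, 20, 2))
# Schedule B: 20 min of reading earns 4 min of ice cream -> no deprivation.
print(is_response_deprived(20, 4, 20, 2))
```

Under schedule A a low-probability behaviour (eating ice cream) could reinforce a high-probability one (reading), which the Premack principle alone would not predict.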