Operant Conditioning
Year 12 Psychology, Unit 4, Area of Study 1 (chapter 10, page 476)

Trial and Error Learning
Learning by trying different possibilities until the correct outcome is achieved.
Also known as ‘instrumental learning’ because the individual is ‘instrumental’ in learning the correct response.
More recently known as ‘operant conditioning’ because the individual ‘operates’ on the environment to solve a problem.

Trial and Error Learning: Edward Thorndike’s Cats
Thorndike conducted the first studies of trial and error learning; he was interested in animal intelligence.
A hungry cat was put in a ‘puzzle box’ and a piece of fish was placed outside the box (it could be seen and smelt but was just out of the cat’s reach).
To get the fish, the cat had to push a lever that opened a door on the side of the box.
Learning was measured as the time it took the cat to escape from the box.
The cat tried numerous ineffective strategies (trial and error).
Eventually, the cat accidentally pushed the lever and the door opened. The cat was then rewarded with the food.
The cat was put into the box again to repeat the test: each time, it used trial and error but became progressively quicker at using the lever. The number of incorrect behaviours was also reduced.
After approximately 7 trials, the cat went directly to the lever. Pushing the lever had become a deliberate response because the cat had learned the positive consequence of making that response.

Trial and Error Learning: Edward Thorndike’s Cats
Based on his results, Thorndike developed the Law of Effect:
Behaviour that is accompanied or followed by ‘satisfying’ consequences is strengthened (more likely to occur). E.g. pushing the lever is followed by getting the fish.
Behaviour that is followed by an ‘annoying’ consequence is weakened (less likely to occur). E.g. not pushing the lever (doing anything else) is followed by still being stuck in the box.
Activity: 10.11
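
The Law of Effect can be illustrated with a small simulation. This is only a sketch: the function name `law_of_effect`, the starting strength of 0.1 and the `step` size are made up for illustration and are not values Thorndike measured. It shows a response getting stronger over 7 rewarded trials, like the cat going to the lever more and more directly.

```python
def law_of_effect(strength, satisfying, step=0.2):
    """Return the updated response strength (between 0 and 1).

    Hypothetical update rule, for illustration only:
    a satisfying consequence nudges strength toward 1,
    an annoying consequence nudges it toward 0.
    """
    if satisfying:
        # Satisfying consequence: behaviour becomes more likely.
        return strength + step * (1.0 - strength)
    # Annoying consequence: behaviour becomes less likely.
    return strength - step * strength


# Lever pushing starts out as a weak, accidental response.
strength = 0.1
history = []
for trial in range(7):
    # Each lever push is followed by the fish (satisfying).
    strength = law_of_effect(strength, satisfying=True)
    history.append(round(strength, 3))

print(history)
```

Calling `law_of_effect(strength, satisfying=False)` instead would model a behaviour such as scratching at the walls, which is followed by still being stuck in the box and so fades over trials.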

Operant Conditioning
An organism will tend to repeat behaviours (operants = responses) that have desirable consequences (i.e. rewards) or that will enable it to avoid undesirable consequences (i.e. punishments).
Also, an organism will tend not to repeat behaviours that lead to undesirable consequences.
Stemmed from Thorndike’s work on ‘instrumental learning’ with cats.
The most famous experiments in operant conditioning were conducted by B.F. Skinner using his ‘Skinner Box’.

Operant Conditioning: Three-Phase Model
Based on Thorndike’s law of effect.
1. Stimulus (S)
2. Operant Response (R)
3. Consequence (C); sometimes also referred to as (S) because it is a stimulus in the form of a consequence.
So: S → R → C, where the probability of (R) occurring after (S) depends on previous experiences of (C).

Operant Conditioning: Skinner’s Rats
A hungry rat was placed in a Skinner Box.
It scurried around randomly, touching the floor, walls etc.
Eventually it accidentally pressed the lever, which dispensed a food pellet: the rat ate.
The rat continued its random movements and eventually pressed the lever again: the rat ate.
With additional repetitions of lever pressing followed by food, the rat’s random movements began to disappear and were replaced by more consistent lever pressing.
Eventually the rat was pressing the lever as fast as it could eat each pellet.
The pellet was a reward (reinforcer) for the correct response.

Elements of Operant Conditioning
Reinforcement: applying a reward/positive stimulus (positive reinforcement) or removing a negative stimulus (negative reinforcement) to encourage the production of desired behaviour.
Reinforcer: any object/event that increases the probability that an operant behaviour will occur again.
Punishment: applying a negative/unpleasant stimulus to discourage unwanted behaviour.
Schedules of Reinforcement: frequency and manner in which a desired response is reinforced (either positively or negatively).
Activities: & 10.17

[Cartoon examples: Positive Reinforcement (Reward); Punishment; Negative Reinforcement (if they lay eggs, they don’t get cooked!)]

This Should Be Punishment, But …

Elements of Operant Conditioning: Schedules of Reinforcement
Continuous Reinforcement: reinforcer is applied immediately after every correct response/behaviour.
Partial Reinforcement: reinforcer is only applied after some correct responses, but not all. Behaviour learned this way is more difficult to change and more resistant to extinction.
Ratio: reinforcement given after a certain number of correct responses.
Interval: reinforcement given after a certain amount of time has passed since the last correct response.
Fixed: reinforcement given on a regular basis, such as after every 3rd response or every 10 seconds.
Variable: reinforcement given in an unpredictable or random way.
Activity: 10.14

Elements of Operant Conditioning: Schedules of Reinforcement
So, using the info from the previous slide, there are four main schedules of partial reinforcement:
Fixed-ratio schedule: ?
Variable-ratio schedule: ?
Fixed-interval schedule: ?
Variable-interval schedule: ?
See pages

Elements of Operant Conditioning: Schedules of Reinforcement
So, using the info from the previous slide, there are four main schedules of partial reinforcement:
Fixed-ratio schedule: reinforcer given after a set (fixed) number (ratio) of correct responses.
Variable-ratio schedule: reinforcer given after an unpredictable (variable) number (ratio) of correct responses.
Fixed-interval schedule: reinforcer given after a set (fixed) period of time (interval) since the last correct response.
Variable-interval schedule: reinforcer given after an unpredictable (variable) period of time (interval) since the last correct response.
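
The four schedules can be compared side by side with a short simulation. This is a rough sketch, not a model from the textbook: the function names `ratio_schedule` and `interval_schedule`, the numbers used (every 3rd response, 10 seconds, one response every 4 seconds) and the random ranges are all made up for illustration. It also assumes the common laboratory convention that, on an interval schedule, the first correct response made after the interval has elapsed is the one that is reinforced.

```python
import random


def ratio_schedule(n_responses, required):
    """Return which responses (numbered from 1) earn a reinforcer.

    `required` is a function giving how many correct responses are
    needed for the next reinforcer: a constant for a fixed-ratio
    schedule, a random number for a variable-ratio schedule.
    """
    reinforced = []
    count, need = 0, required()
    for response in range(1, n_responses + 1):
        count += 1
        if count >= need:
            reinforced.append(response)
            count, need = 0, required()  # start counting again
    return reinforced


def interval_schedule(response_times, wait):
    """Return the response times (in seconds) that earn a reinforcer.

    `wait` is a function giving the length of the next interval.
    Assumption: the first response at or after the end of the
    interval is reinforced, then a new interval starts.
    """
    reinforced = []
    available_at = wait()
    for t in response_times:
        if t >= available_at:
            reinforced.append(t)
            available_at = t + wait()
    return reinforced


rng = random.Random(0)          # seeded so the run is repeatable
times = range(0, 60, 4)         # one response every 4 seconds

fixed_ratio = ratio_schedule(12, lambda: 3)
variable_ratio = ratio_schedule(12, lambda: rng.randint(1, 5))
fixed_interval = interval_schedule(times, lambda: 10)
variable_interval = interval_schedule(times, lambda: rng.uniform(5, 15))

print("Fixed ratio:      ", fixed_ratio)
print("Variable ratio:   ", variable_ratio)
print("Fixed interval:   ", fixed_interval)
print("Variable interval:", variable_interval)
```

With the fixed schedules the learner can predict exactly when the next reinforcer is due; with the variable schedules the gaps change from one reinforcer to the next.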

Which Schedule Is More Effective?

Factors That Influence the Effectiveness of Operant Conditioning
Order of Presentation: reinforcement/punishment must be presented after the behaviour so that it is learned as a consequence of that behaviour.
Timing: reinforcement/punishment is most effective when presented immediately after the behaviour (this also increases the strength of the response).
Appropriateness: reinforcement/punishment must be specific to the likes/dislikes of the individual (otherwise my ‘reward’ could be your ‘punishment’).

Key Processes in Operant Conditioning
Acquisition: speed may vary depending on the complexity of the behaviour being learned.
Extinction: less likely to occur when partial reinforcement is used, because the organism is used to not getting the reinforcer every time.
Spontaneous Recovery, Stimulus Generalisation and Stimulus Discrimination: same as when discussed in Classical Conditioning.
Activity: 10.21

Applications of Operant Conditioning
Shaping: reinforcement is given for each response that moves closer to the final goal behaviour, e.g. teaching a baby to talk: “Ddd”, “Daaa”, “Dad”. Also known as the ‘method of successive approximations’.
Token Economies: reinforcers (tokens) are given for desired behaviour and can then be exchanged for other reinforcers (rewards). Tokens may also be removed as punishment. Ensures reinforcement (reward) is appropriate. Could backfire if the token is misunderstood or the underlying cause of the behaviour is not addressed (see page 500).
Activity: 10.22
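
A token economy works a bit like a simple points account, which can be sketched in a few lines of code. Everything here is hypothetical: the class name `TokenEconomy`, the reward "free time" and its price of 3 tokens are invented for illustration only.

```python
class TokenEconomy:
    """Toy model of a token economy (illustrative only)."""

    def __init__(self, prices):
        self.prices = dict(prices)  # reward name -> cost in tokens
        self.tokens = 0

    def reinforce(self, amount=1):
        # Desired behaviour: tokens are given as reinforcers.
        self.tokens += amount

    def punish(self, amount=1):
        # Unwanted behaviour: tokens are removed (never below zero).
        self.tokens = max(0, self.tokens - amount)

    def exchange(self, reward):
        # Tokens are swapped for a reward the individual actually wants.
        cost = self.prices[reward]
        if self.tokens < cost:
            return False
        self.tokens -= cost
        return True


economy = TokenEconomy({"free time": 3})
for _ in range(4):
    economy.reinforce()      # four desired behaviours, four tokens
economy.punish()             # one unwanted behaviour, one token removed
got_reward = economy.exchange("free time")
print(got_reward, economy.tokens)
```

Because the individual chooses what to exchange tokens for, the final reward is more likely to be appropriate for them.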

Classical vs. Operant Conditioning
Role of the Learner: passive (classical) vs. active (operant).
Timing of the Stimulus and Response: immediate (classical) vs. delayed (operant); response depends on stimuli (classical) vs. reinforcer depends on response (operant).
Nature of the Response: reflexive/involuntary (classical) vs. voluntary (operant).
Activity: 10.26

Reminders …
The next section of your textbook is ‘One-Trial Learning’ but we have already discussed this in the Classical Conditioning slides. Page 507 outlines a good experiment.
Don’t forget to keep track of the key knowledge dot points that we are covering and tick each one as you become confident with it.
The person who can best monitor your progress and understanding is YOU. Don’t cheat yourself.
Miss Moore is awesome. As if you’d forget that.