Operant Conditioning Skinner, positive & negative reinforcement, response cost, punishment and schedules of reinforcement
Three-phase model of operant conditioning
Skinner’s term is “operant conditioning”; Thorndike called it “instrumental learning”.
An operant is a response (or set of responses) that “operates” or acts upon the environment to produce some kind of effect. Eg. in Thorndike’s experiment the operants were the cat biting the bar and clawing at the box.
Based on Thorndike’s law of effect: an organism will tend to repeat a behaviour (operant) that has a desirable consequence (cat gets fish) or that enables it to avoid an undesirable consequence (detention). An organism will tend not to repeat a behaviour that has an undesirable consequence (a speeding fine leads to less speeding).
Components of operant conditioning: S-R-C
S = Stimulus that comes before the operant response
R = Operant response to the stimulus
C = Consequence of the operant response
Example: Thorndike’s cat puzzle box experiment
S = the box
R = sequence of movements to open the door (operating on the environment)
C = escaping the box and getting the fish
Stimulus (S) → Response (R) → Consequence (C)
Skinner box
Reinforcement and Punishment
Skinner’s and Thorndike’s studies provide evidence for the concept of reinforcement, because learning through operant conditioning occurs as a result of the consequences of behaviour. Reinforcement and punishment are the main aspects of operant conditioning.
Reinforcement
Reinforcement occurs when a stimulus (object or event) strengthens or increases the likelihood or frequency of the response that it follows.
A reinforcer is any stimulus (object or event) that increases the likelihood of the response that it follows. The reinforcer is the stimulus that allows reinforcement to occur.
Positive and Negative reinforcement
Positive reinforcement (adds something) +
Presenting a stimulus (positive reinforcer) that strengthens or increases the likelihood of a desired response by providing a satisfying consequence. Eg. being well behaved in class to get a gold star next to your name; cleaning your room to get pocket money.
Negative reinforcement (takes something away) -
Removing an unpleasant stimulus, which strengthens or increases the likelihood of a desired response. Eg. leaving home early one day and finding no traffic on the road may encourage you to leave home early again (response) in the future to avoid heavy traffic (removal of unpleasant stimulus).
Schedules of reinforcement
A schedule of reinforcement is a program that determines how often reinforcement is given in relation to the correct response.
Continuous reinforcement = reinforcement is provided immediately after every correct / desired response.
Partial reinforcement = reinforcement is provided for some correct / desired responses but not all of them.
4 types of partial reinforcement
Fixed-ratio schedule
A reinforcer is given after a set (fixed) number of correct responses (ratio). Eg. a ratio of 1:5 means one reinforcer for every five correct responses. Eg. factory workers may be paid a certain amount for every 5 garments that they make.
Variable-ratio schedule
A reinforcer is given after an unpredictable (variable) number of correct responses (ratio). Eg. one reinforcer for every 5 correct responses on average, but delivered after a different number each time (after 1, then 7, then 11 responses, etc).
Partial reinforcement
Fixed-interval schedule
A reinforcer is given after a specific, fixed period of time (interval) has elapsed since the previous reinforcer, provided the correct response has been made. Eg. workers who are given monthly reviews may work harder in the weeks leading up to their review than in the days after it.
Variable-interval schedule
A reinforcer is given after irregular (variable) periods of time (interval) have passed, provided the correct response has been made. The intervals have a mean length, but each individual interval is unpredictable.
On both interval schedules, responses made before the interval has passed are not reinforced, even if they are correct.
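The four partial schedules can be sketched as a small simulation. This is only an illustrative sketch: the function names, the default values, and the choice of a uniform random distribution for the variable schedules are assumptions made here, not part of the original notes. Each function returns which responses would earn a reinforcer.

```python
import random


def fixed_ratio(n_responses, ratio=5):
    """Reinforce every `ratio`-th correct response (responses numbered from 1)."""
    return [i for i in range(1, n_responses + 1) if i % ratio == 0]


def variable_ratio(n_responses, mean_ratio=5, seed=0):
    """Reinforce after an unpredictable number of responses.

    Assumption: each required count is drawn uniformly from
    1 .. 2*mean_ratio - 1, so the counts average out to `mean_ratio`.
    """
    rng = random.Random(seed)
    reinforced = []
    count = 0
    needed = rng.randint(1, 2 * mean_ratio - 1)
    for i in range(1, n_responses + 1):
        count += 1
        if count == needed:
            reinforced.append(i)
            count = 0
            needed = rng.randint(1, 2 * mean_ratio - 1)
    return reinforced


def fixed_interval(response_times, interval=10.0):
    """Reinforce the first response made once `interval` time units
    have passed since the previous reinforcer (or the session start)."""
    reinforced = []
    last = 0.0
    for t in response_times:
        if t - last >= interval:
            reinforced.append(t)
            last = t
        # Responses made before the interval has elapsed earn nothing.
    return reinforced


def variable_interval(response_times, mean_interval=10.0, seed=0):
    """Like fixed_interval, but the required wait changes unpredictably.

    Assumption: each wait is drawn uniformly from 0 .. 2*mean_interval.
    """
    rng = random.Random(seed)
    reinforced = []
    last = 0.0
    wait = rng.uniform(0, 2 * mean_interval)
    for t in response_times:
        if t - last >= wait:
            reinforced.append(t)
            last = t
            wait = rng.uniform(0, 2 * mean_interval)
    return reinforced


# Usage: 20 responses on a 1:5 fixed-ratio schedule.
print(fixed_ratio(20, ratio=5))  # [5, 10, 15, 20]
# Usage: responses at these times on a fixed interval of 10 time units.
print(fixed_interval([1, 5, 10, 12, 19, 21, 35], interval=10))  # [10, 21, 35]
# The variable schedules depend on the random seed, so no fixed output is shown.
print(variable_ratio(20, mean_ratio=5, seed=1))
```

Note that the ratio schedules count responses, while the interval schedules only look at how much time has passed; extra responses during an interval make no difference to when the next reinforcer arrives.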
Punishment
Punishment is the delivery of an unpleasant consequence following a response, or the removal of a pleasant consequence following a response.
Eg. delivery of an unpleasant consequence following a response (smacking a child after they misbehave)
Eg. removal of a pleasant consequence following a response (losing money through a fine)
Punishment is different to negative reinforcement. Negative reinforcement (NR) is the removal of an unpleasant stimulus, which increases the likelihood of a response recurring. Punishment imposes an unpleasant consequence (or removes a pleasant one) and decreases the likelihood of the response recurring. Also, punishment is ‘given’ or ‘applied’, whereas in negative reinforcement the unpleasant stimulus is avoided or prevented.
Positive and Negative punishment
Positive punishment +
The presentation of an unpleasant stimulus that decreases or weakens the likelihood of the response occurring again. Eg. after arriving late to sport training, being made to run 5 laps to decrease the likelihood that you will be late again.
Negative punishment (response cost) -
The removal of a pleasant stimulus, which decreases or weakens the likelihood of the response occurring again. Eg. having your mobile phone taken away for using it in class.
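The four types of consequence differ on just two questions: is a stimulus added or removed, and does the response become more or less likely? A minimal sketch of that two-by-two lookup follows; the function name and the string labels are hypothetical, chosen here for illustration only.

```python
# Lookup table: (what happens to the stimulus, effect on the response) -> type.
CONSEQUENCE_TYPES = {
    ("added", "increases"): "positive reinforcement",
    ("removed", "increases"): "negative reinforcement",
    ("added", "decreases"): "positive punishment",
    ("removed", "decreases"): "negative punishment (response cost)",
}


def classify_consequence(stimulus_change, effect_on_response):
    """Name the type of consequence.

    stimulus_change: "added" or "removed"
    effect_on_response: "increases" or "decreases"
    """
    key = (stimulus_change, effect_on_response)
    if key not in CONSEQUENCE_TYPES:
        raise ValueError(f"unknown combination: {key}")
    return CONSEQUENCE_TYPES[key]


# Gold star given for good behaviour, and good behaviour increases.
print(classify_consequence("added", "increases"))    # positive reinforcement
# Heavy traffic avoided by leaving early, and leaving early increases.
print(classify_consequence("removed", "increases"))  # negative reinforcement
# Laps imposed for lateness, and lateness decreases.
print(classify_consequence("added", "decreases"))    # positive punishment
# Phone taken away for using it in class, and phone use decreases.
print(classify_consequence("removed", "decreases"))  # negative punishment (response cost)
```

"Positive" and "negative" here refer only to adding or removing a stimulus, not to whether the consequence is good or bad.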
Factors that influence the effectiveness of reinforcement and punishment: OAT
O = Order of presentation. To be effective, the reinforcement or punishment must be presented after the response, never before.
A = Appropriateness. The reinforcement or punishment must be appropriate for the behaviour or response that has occurred, and must also suit the characteristics of the individual.
T = Timing. Reinforcement and punishment should be given immediately after the response has occurred.
Key processes in Operant Conditioning
Acquisition
The establishment of a response through reinforcement. The types of behaviour learned through operant conditioning are more complex than the simple responses of classical conditioning.
Extinction
Gradual decrease in the strength of a conditioned (learned) response following consistent non-reinforcement of that response. Eg. when Skinner’s rats stopped receiving food pellets, their conditioned response (pressing the lever) was extinguished. Extinction is less likely to occur after partial reinforcement (eg. gamblers are less likely to stop because the reward is unpredictable).
Spontaneous recovery
After apparent extinction, the organism again exhibits the response in the absence of reinforcement. The response is weaker and doesn’t last long.
Stimulus generalisation
Occurs when the correct response is made to another stimulus that is similar to the original. Eg. athletes may generalise the sound of a car backfiring to a starter’s pistol and begin running.
Stimulus discrimination
The organism makes the correct response to one stimulus, for which it is reinforced, but does not respond to other, similar stimuli. Eg. a sniffer dog will only bark at certain smells (drugs and specific plant matter), not at every smell.
Applications of Operant Conditioning
Applications of behaviour modification include shaping and token economies.
Shaping
Also known as the ‘method of successive approximations’. It means giving reinforcement for any response that successively approximates or moves towards the desired response or behaviour. Eg. shaping may be used when teaching and encouraging young children to swim.
Token Economies
A token economy is a setting in which an individual who exhibits desired behaviour receives tokens (reinforcers). The tokens are collected and can later be exchanged for other reinforcers in the form of tangible rewards. Eg. in prison, an inmate’s good behaviour may earn him a token which can be cashed in for special rewards such as cigarettes and privileges. Token economies can easily fail, especially if people feel they are being manipulated.
Comparison of Classical and Operant conditioning See handout for similarities and differences