Operant Conditioning

In operant conditioning, the subject operates on the environment. In this module, you will learn many of the fundamental principles about what variables control operant behavior. As with classical, or respondent, conditioning, these principles operate all the time, in our everyday lives, whether we are aware of them or not.

As you read the text, keep the following questions in mind.
Is there any difference between reinforcement and reward?
How would you teach a rat in a Skinner box to press a bar for reinforcement?
How would you teach a pigeon to execute a sequence of several behaviors?
What does operant conditioning have to do with human behavior?


A reinforcer increases the probability that a behavior it follows will be repeated

A central concept of operant conditioning is that any given behavior is dependent upon the consequences of that behavior. If these consequences make the behavior more likely to occur in the future, they are called reinforcers. It is important to notice what this statement says and also what it does not say. It does not say, for example, that reinforcement is the same thing as reward, or that reinforcement is something that makes the subject feel good. It does not say that a reinforcement is something that enhances the subject's self-esteem. It does say only that a reinforcer increases the frequency of the response it follows.

An overwhelming reason for noting these properties carefully is to avoid equating reinforcement with reward. There is no guarantee that rewards will be reinforcing. Rewards often come much too late, long after the important behavior has occurred. For example, a boy may be awarded a new baseball bat for getting a certain number of hits during the season. Such a reward will not in itself affect anyone's hitting skills; it merely acknowledges that, somehow, the boy has already become a good hitter. As we shall see in the following pages, a reinforcer must always follow immediately the behavior that is to be affected by it. The actual reinforcers for good hitting are probably the sight of the ball going over the fence, and the cheering of teammates. With an animal, there is no problem about furnishing the reinforcer. One simply deprives the animal of something, say food, and then uses whatever was withheld as a reinforcer.


A frequently used device in operant-conditioning experiments is the Skinner box. The essentials of a Skinner box are simple. There is no fixed size that the Skinner box has to be, but 14- to 16-inch sides and ends are adequate. If a lever-pressing response is to be studied, there must be a lever protruding from one end of the box. For reinforcement there should be either a food pan or a small dipper, producing a drop or so of water for a reinforcer. The box must have a cover to prevent the animal from escaping.

A rat is tamed by frequent handling and allowed to become familiar with the box by being allowed to explore and sniff around it. The animal is then simply placed in the box and permitted to do as he likes. Notice the contrast with the classical conditioning environment where the animal was restrained in a harness. In an operant environment usually no constraint is placed on the animal except for being confined to the experimental space.

Next, the experimenter, by means of several practice trials, makes sure that when the animal hears the food pellet drop or the dipper click, he will run immediately to the food pan or dipper.

After this preparation, the experimenter need only place the rat in the box and permit him to explore. Sooner or later he will depress the bar (lever) and be presented with a reinforcer. Once this happens, the rat will press the bar more and more often, until the bar pressing reaches a stable rate.
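The rise in bar pressing described above can be illustrated with a small simulation. This is only a sketch under assumed numbers: the starting press probability, the learning rate, and the ceiling are hypothetical values chosen for illustration, and the update rule (moving the probability a fixed fraction of the way toward a ceiling after each reinforced press) is one simple way to model the idea, not something specified in this module.

```python
import random


def reinforce(p, rate=0.2, ceiling=0.9):
    """Move the press probability a fraction of the way toward a ceiling."""
    return p + rate * (ceiling - p)


def simulate_bar_pressing(steps=300, p_start=0.02, seed=0):
    """Return the press probability after each time step (assumed model)."""
    rng = random.Random(seed)
    p = p_start
    history = [p]
    for _ in range(steps):
        if rng.random() < p:   # the rat happens to press the bar
            p = reinforce(p)   # food follows, so pressing becomes more likely
        history.append(p)
    return history


history = simulate_bar_pressing()
print(f"press probability: start {history[0]:.2f}, end {history[-1]:.2f}")
```

In this sketch the probability never decreases, because every press is followed by food; it levels off near the assumed ceiling, which corresponds loosely to the stable rate mentioned in the text.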


If the rat, for some reason or other, does not go near the bar, the experimenter may simply wait. If the experimenter has the time, the rat will eventually press the bar. Alternatively, he can reinforce successively closer approximations of bar pressing until bar pressing finally occurs. For example, he might have to reinforce the rat just for moving closer and closer to the bar. He finally reinforces the rat for standing up close to the bar, then for touching the bar, and finally for pressing the bar.
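The shaping procedure just described can be written out as a short sketch. The list of approximations follows the steps named in the text; the function name, the sample session, and the rule that the criterion rises by one step after each reinforcement are assumptions made for illustration.

```python
# Successive approximations, ordered from crudest to the target behavior.
STEPS = ["moves toward bar", "stands near bar", "touches bar", "presses bar"]


def shape(observed_behaviors, steps=STEPS):
    """Return (reinforced, final_criterion) for a sequence of observed behaviors.

    A behavior is reinforced only if it is at least as close to the target
    as the current criterion; the criterion then moves one step closer.
    """
    criterion = 0                      # index of the approximation currently required
    reinforced = []
    for behavior in observed_behaviors:
        if behavior not in steps:
            continue                   # irrelevant behavior is never reinforced
        level = steps.index(behavior)
        if level >= criterion:
            reinforced.append(behavior)
            criterion = min(level + 1, len(steps) - 1)
    return reinforced, steps[criterion]


session = [
    "grooms", "moves toward bar", "moves toward bar", "stands near bar",
    "touches bar", "stands near bar", "presses bar", "presses bar",
]
reinforced, criterion = shape(session)
print(reinforced)
print("final criterion:", criterion)
```

Note that the second "moves toward bar" and the later "stands near bar" earn nothing: once a closer approximation has been reinforced, the cruder one no longer meets the criterion.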

Behavior shaping is inherent in everyone's day-to-day interactions

Is the shaping technique useful in real situations? You be the judge from the following account. The father of a two-and-a-half-year-old girl and a four-year-old boy was watching his children at a playground. Here is a scene he observed. The girl was sitting at the top of the slide, hanging on and screaming in terror. Her brother was pushing and tugging, trying to get the little girl to go down the slide, but to no avail. Father, to his credit, did not become alarmed or annoyed with the scene. He simply went over to the slide and broke up the confrontation, carrying the girl down to the bottom of the slide and seating her on the end of it. He then asked her to jump off and run over to him, for which he gave generous verbal reinforcement (strengthening the last part of the behavior first). He next placed her a couple of feet up on the slide and told her to slide the rest of the way herself. She did, and more reinforcement resulted. He then placed her a little higher on the slide and urged her to go the rest of the way herself. She did, and more reinforcement resulted. He then placed her a little higher on the slide and urged her to go the rest of the way, jump off, and run up to him. This happened, and the next step was to place her still higher on the slide, and so on until finally she was laughing and sliding from the top of the slide. The disturbing conflict had been resolved, and the girl had learned to slide.

A good deal of this reinforcing for successive approximations goes on in everyday life. The beginning music student, for example, is not expected to play Bach. He is reinforced for almost any sound he gets from the instrument in the initial stages. Athletic coaches have all learned to reinforce approximations to the behavior they want: "That's taking a cut at it, Charlie."

These examples also illustrate that we humans are perfectly willing to reinforce approximations when the behavior we want is not yet in the subject's repertoire. In the beginning, the music student simply is unable to play Bach; the beginning baseball player is incapable of hitting a home run. It is when the subject has the behavior in his repertoire but does not emit it that we become annoyed. The subject may be referred to as "stubborn." In the case of the laboratory rat, however, the experimenter reinforces successive approximations, despite the fact that the rat is perfectly capable of walking to the bar, standing up, and pressing it. The experimenter does not worry about the cause of the rat's initial tendency not to press the bar; he simply uses shaping to obtain the desired behavior.


Withholding reinforcement for the occurrence of a learned response will eventually make it disappear

Just as responses can be strengthened by reinforcement, so can they be weakened by extinction: arranging things so that no reinforcer follows the behavior. Once a response is on extinction, however, it is important that no exceptions to the extinction operations occur. If exceptions do occur, the response is being intermittently reinforced and being made more resistant to extinction. Extinction is easy enough to accomplish in the experimental laboratory. One merely flips a switch to the off position and matters are automatically arranged so that a response previously reinforced, bar pressing for example, now has no consequences. When this happens, a behavior being extinguished declines in strength more or less regularly until finally it ceases to occur altogether.

This simple procedure of withholding reinforcement for a response we wish to weaken is, in the everyday world, terribly difficult to accomplish. For example, suppose a father accompanied by his young son is visiting a friend. Nothing much happens until the child starts to say, "Let's go, Dad." Dad decides to put this behavior on extinction. But, commonly, after hearing "Let's go, Dad," half a dozen times, father loses his patience and reacts. His reaction, "We'll go when I'm ready if you don't mind," is a consequence for the behavior and will more than likely maintain it. The "off switch" is not working. And when the "off switch" works only intermittently, we are producing a response which is more difficult to extinguish than if it had been reinforced every time it occurred.


By going through a similar class of operations as were used to demonstrate spontaneous recovery in classical conditioning, we may demonstrate analogous phenomena in operant conditioning. First, we extinguish until virtually no responding occurs. Second, we move the animal from the apparatus and return him to his own home cage. Third, after some time has elapsed, we return the animal to the apparatus and observe a partial recovery of his tendency to respond despite the lack of reinforcement. An extinction curve followed by a rest period in his home cage, followed by an additional smaller extinction curve after he is returned to the apparatus, is sketched below.
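As a rough stand-in for the shape of such a curve, the following sketch prints two extinction sessions separated by a rest period. All numbers are assumptions chosen for illustration: the starting rate of 20 responses per minute, the geometric decline, and the recovery to 40 percent of the original rate are hypothetical, not data from any experiment.

```python
def extinction_curve(start_rate, decay=0.7, minutes=8):
    """Responses per minute while no response is reinforced (assumed geometric decline)."""
    return [start_rate * decay ** t for t in range(minutes)]


def extinction_with_rest(start_rate=20.0, recovered_fraction=0.4, decay=0.7, minutes=8):
    """Return the first extinction curve and the smaller one seen after a rest period."""
    first = extinction_curve(start_rate, decay, minutes)
    # After rest in the home cage, responding partially recovers (assumed fraction).
    second = extinction_curve(start_rate * recovered_fraction, decay, minutes)
    return first, second


def text_plot(rates):
    """Draw one row of '#' marks per minute, one mark per response (rounded)."""
    return "\n".join("#" * round(r) for r in rates)


first, second = extinction_with_rest()
print("First extinction session:")
print(text_plot(first))
print("(rest period in home cage)")
print("Second extinction session:")
print(text_plot(second))
```

The second curve starts well below the first curve's starting point but above where the first session ended, which is the partial recovery the text describes.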


In operant conditioning, we may want to establish precise stimulus control of the desired response. For example, suppose we want a light to control the occurrence of a bar press. That is, we want bar pressing when the light is on and not when the light is off. How would we go about it? You probably already have the answer. What we can do is simply reinforce bar pressing when the light is on and extinguish bar pressing when the light is off. If you repeat these operations several times, eventually the rat will only press the bar (and get reinforced) when the light is on; when the light is off, it will not press the bar. The stimulus situation in which an animal does get reinforced is often called S+ (for positive stimulus); the situation in which he does not get reinforced is called S- (for negative stimulus). When the animal responds in the presence of S+, but not in the presence of S-, then S+ (the light, in this case) is said to have become a discriminative stimulus for bar pressing. The light stimulus, S+, is controlling the bar-pressing response.
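The discrimination procedure (reinforce in S+, extinguish in S-) can be sketched as a simulation that keeps a separate press probability for each situation. The starting probabilities, the learning rate, and the update rule are assumptions for illustration only.

```python
import random


def discrimination_training(trials=2000, rate=0.1, seed=1):
    """Return the press probability in each situation after training (assumed model)."""
    rng = random.Random(seed)
    p_press = {"light on": 0.5, "light off": 0.5}   # assumed starting tendencies
    for _ in range(trials):
        situation = rng.choice(["light on", "light off"])
        if rng.random() < p_press[situation]:        # the rat presses the bar
            if situation == "light on":
                # S+: the press is reinforced, so pressing here becomes more likely.
                p_press[situation] += rate * (1.0 - p_press[situation])
            else:
                # S-: the press has no consequence, so pressing here becomes less likely.
                p_press[situation] -= rate * p_press[situation]
    return p_press


result = discrimination_training()
print(f"P(press | light on)  = {result['light on']:.2f}")
print(f"P(press | light off) = {result['light off']:.2f}")
```

The two probabilities start out equal and drift apart, so that the light comes to control whether pressing occurs.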


In humans, most behaviors occur in complex sequences

Although it is very valuable in helping us to understand behavior generally, the bar-pressing response is not a perfect analogy for human behavior. The same response is not repeated over and over again in ordinary human affairs. We are much more likely, once we have emitted a response, to move on to a different one. For example, we get up in the morning, go to the shower, go back to the bedroom, dress, then proceed to breakfast, then go out to the car, and so on. Psychologists have studied events such as this in which one activity leads to another as instances of what they call chaining. How do these chains get established? Can we create them deliberately?

Let us do a hypothetical laboratory experiment in order to study chaining. Suppose you want to set up a chain in which a rat first presses a panel, then pulls a string, then presses a bar, and then gets reinforced. How would you go about it? One thing we are reasonably sure of is that we cannot proceed as in the bar-pressing experiment. We cannot simply insert the animal in the box and wait for him to perform the chain. A general rule about chaining is that each response has to produce the stimulus for the next component. Furthermore, one starts building the chain from the end back to the first member of the chain. How would this work in the case of the chain we have just described? The first behavior one would establish would be the last response in the chain: bar pressing followed by presentation of food. Next, one would get stimulus control of the bar press by establishing the light as a discriminative stimulus.

S+ (light)-----> R (bar press)-------> Reinforcer (food)

Reinforcing the last behavior in a chain acts to reinforce the entire chain

Any discriminative stimulus can serve as a reinforcer for the behavior that precedes it. Therefore, any response that produces the light will be reinforced by the light, and of course, responses which do not turn the light on go unreinforced. Most rats will spontaneously investigate a string dangling from the ceiling of the cage. Pulling the string will be reinforced by the light coming on.

This procedure would add one more link to the chain.

?------> S (string)-------> R (string pull)-------> S (light)---------> R (bar press)--------> S (food) ?

Note that the food is shown here as a stimulus (S), rather than as a reinforcer. This is done to illustrate that the food serves the same function in the chain as the light, which reinforces the string pull. In fact we could use food to stimulate another response, such as climbing through a hoop, in order to obtain a drop of water. Either the food or the light could then be called a reinforcing stimulus.

Also note that we could add another behavior at the beginning of the chain by making the appearance of the string contingent on having performed a response. For example, pushing a panel might be required in order to produce the string, which can then be pulled to turn on the light, which in turn signals it is time for bar pressing, which in turn produces a click, and food. Now, of course, in order to add another member to the chain we would have to bring panel pushing under stimulus control. Then matters would have to be arranged so that a new member of the chain produces the stimulus for panel pushing.
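The rule that a chain is built from its last link backward, with each discriminative stimulus reinforcing the response before it, can be sketched as follows. The function name and the tuple layout are assumptions for illustration; the cue for panel pushing is a placeholder, since the text leaves that stimulus unspecified.

```python
def backward_chaining_plan(chain, final_reinforcer="food"):
    """Return training steps in the order they would be taught.

    chain: list of (discriminative_stimulus, response) pairs in the order
    the animal will eventually perform them.
    Each step is a tuple (response, stimulus, reinforcer).
    """
    plan = []
    reinforcer = final_reinforcer
    for stimulus, response in reversed(chain):
        plan.append((response, stimulus, reinforcer))
        # The stimulus for this link reinforces the response that comes before it.
        reinforcer = stimulus
    return plan


chain = [
    ("cue for panel push (not specified in the text)", "panel push"),
    ("string", "string pull"),
    ("light", "bar press"),
]
for step, (response, stimulus, reinforcer) in enumerate(backward_chaining_plan(chain), start=1):
    print(f"{step}. Establish '{response}' in the presence of '{stimulus}', reinforced by '{reinforcer}'.")
```

Read from top to bottom, the printed plan starts with bar pressing for food and ends with panel pushing, the reverse of the order in which the rat will finally perform the chain.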


Many people (and maybe you have done it yourself) blow on dice and instruct them concerning what number to come up next. You may have seen people cross the street to avoid walking under a ladder or to avoid a black cat, and so on. Can events like this be studied in the laboratory? One experimenter who thought they could was B. F. Skinner (Skinner, 1948). Skinner arranged a box so that food was presented to food-deprived pigeons automatically every fifteen seconds "with no reference whatsoever to the bird's behavior." In superstitious conditioning, the reinforcement is not contingent on any behavior. A word about the word contingent: Reinforcement is contingent upon some behavior only when the reinforcement can occur if, and only if, the behavior occurs. Superstitious behavior results from noncontingent reinforcement.

A naive observer coming upon Skinner's experiment might have considered the behavior of the birds odd. One was making counterclockwise turns; another was constantly sticking its head into the upper corners of the cage; another was making a "tossing" response as if placing its head beneath an invisible bar and lifting it repeatedly.

What happened in Skinner's experiment was that whatever behavior was occurring at the time of reinforcement tended to reoccur, even though it had no effect whatever in bringing about reinforcement. For example, the bird who was turning in circles just happened to be turning counterclockwise when the reinforcement appeared. Following the reinforcement, the bird repeated this performance and as he was repeating it, another reinforcement occurred. Thus, what might have appeared to be odd behavior to an outsider turns out to be a quite straightforward case of operant conditioning.
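The accidental strengthening described here can be sketched with a simulation in which food arrives on a fixed clock and whatever the bird happens to be doing at that moment gains strength. The three behaviors come from the text; the starting strengths, the size of the boost, and the sampling rule are assumptions for illustration.

```python
import random


def superstition(seconds=600, interval=15, boost=2, seed=3):
    """Return (strength per behavior, number of food deliveries) under noncontingent reinforcement."""
    rng = random.Random(seed)
    behaviors = ["turn counterclockwise", "head in corner", "head toss"]
    strength = {b: 1 for b in behaviors}   # assumed equal starting tendencies
    deliveries = 0
    for t in range(1, seconds + 1):
        # The bird does one thing each second, favoring stronger behaviors.
        weights = [strength[b] for b in behaviors]
        current = rng.choices(behaviors, weights=weights)[0]
        if t % interval == 0:
            # Food arrives on the clock, not because of the behavior,
            # but the behavior in progress is strengthened anyway.
            strength[current] += boost
            deliveries += 1
    return strength, deliveries


strength, deliveries = superstition()
print(f"{deliveries} food deliveries")
for behavior, value in strength.items():
    print(f"{behavior}: strength {value}")
```

Because a strengthened behavior is more likely to be in progress at the next delivery, an early accident tends to feed on itself, even though no behavior ever produces the food.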


Superstitious behavior is reinforced, but the reinforcer is not contingent on the behavior

It is quite likely that similar relationships establish and maintain many human rituals and superstitions. The rituals that card players and crapshooters develop are perhaps good examples. When a card player walks around the table to change his luck and intermittently gets a good hand, or when the dice intermittently do as they are told, we have a similar relationship. Of course, humans' superstitions are likely to get socially reinforced and maintained in addition to whatever accidental consequences might occur. For example, if one finds it reinforcing to have others pay attention to him, there is hardly a better way than to announce, "I'm going to change my luck!" and then march resolutely around the table.

In the world of business, there are many opportunities for superstitious conditioning. For example, in businesses where proposals are written for funding, why does proposal B get accepted and proposal A get rejected? "Well, it's quite obvious," says one principal. "It is because proposal B was much fatter than proposal A, and that impresses the funders." "No," says another. "It's because I went to see the funder before we started on the proposal this time." And so it goes. It should be emphasized that any one of these properties surrounding a successful event may indeed have been responsible for its success. But it is almost as likely that a property someone thinks is responsible has nothing to do with the successful event. It was just "hanging around" when the event occurred.



Now test yourself without looking back.

1. What is a reinforcer?

2. How is an animal prepared for conditioning in a Skinner box?

3. Contrast the environment of an animal in reflex conditioning with that of an animal in operant conditioning.

4. What is the technical term for reinforcement of successive approximations? _

5. Explain how you could shape bar pressing in a Skinner box.

6. How is extinction of operant conditioning carried out in the laboratory?

7. In the course of operant conditioning, an experimenter conditions a rat to press the bar only when a light is on. In this case the light is a____________________________________

8. Explain briefly how you might teach a rat in a Skinner box to pull a string, then press a bar and get


9. A Skinner box is arranged so that a pellet appears every 20 seconds, no matter what the rat does. After a while, the rat is running in clockwise circles inside the box. This is an example of_______________________


Now, be sure to do the exercises that follow





Operant conditioning influences the way the subject "operates" on the environment in order to receive a reinforcer. Anything that increases the frequency of the behavior it follows is a reinforcer. Write down the reinforcer in each of the following examples.

a. A rat receives a food pellet each time it presses a bar.

b. A child gets lots of attention each time it spills milk in a restaurant.

c. A pigeon receives a drop of water when it pecks at the right circle. ____________________________________________________________3

A pigeon pecks at a circle, then receives a drop of water. It pecks at it again, and receives another drop of water. What is likely to happen to its rate of pecking?

What is the water called in this example?


In classical conditioning experiments, the animal is carefully restrained. In operant conditioning, on the other hand, the animal is usually free to roam within its immediate environment. A device called a Skinner box is frequently used in operant conditioning. Within a Skinner box the animal is:

a. carefully restrained from moving about.
b. allowed to roam and get out if it wishes.
c. free to roam all it wishes inside the box. ___________________________________________________7

Skinner boxes are used in:

a. classical conditioning.
b. reflex conditioning.
c. respondent conditioning.
d. operant conditioning. _____________________________________________________8

Which of the following will be required in a Skinner box if you are going to condition a rat to pull a string to receive a food pellet?

a. A dish full of pellets
b. A bar or lever
c. A dish for pellets to drop into
d. A string
e. A cover for the box ________________________________________________________5

Suppose you want to condition a rat to pull a string to receive a food pellet. You put it in the Skinner box described in the question above. Then you wait, and watch. When will you give the rat a food pellet? ________________________________________________________________________4

Suppose the rat shows no inclination to go near the string. You would then start to reinforce successive approximations to the desired behavior. This means you would:

a. not give him a pellet until he pulls the string.
b. give him a pellet when he goes reasonably close to it, then only give pellets as he gets closer and closer to pulling it.
c. hold the rat near the string.
d. get a new rat. ______________________________________________________________________ 2

1 it will increase; reinforcer
2 b
3 a. food pellet b. attention c. water

4 when he pulls the string

5 c, d, e

6 successive reinforcement

7 c

8 d

Reinforcing successive approximations to the desired behavior is commonly called shaping. How would you shape a rat to press a bar for a pellet? __________________________________________________________________ 4

We have already seen how responses can be strengthened by reinforcement. Responses can also be weakened by extinction. Extinction in operant conditioning is similar to extinction in classical conditioning. How do you think you might extinguish bar-pressing behavior in a conditioned rat?


A rat has been conditioned to pull a string by receiving a food pellet every time he does so. The experimenter then disconnects the string from the pellet dispenser, so the rat gets no more pellets no matter how often he pulls the string. What happens to the rate of string pulling?

What is this process called? ________________________________________________________________2

The key factor in extinguishing behavior is never to reinforce the behavior you are trying to extinguish. If the behavior is reinforced occasionally, it will become more resistant to extinction. In which of the following will extinction probably take place easily?

a. A child cries for a lollipop every week at the grocery store. He only gets one about every three weeks.

b. A rat gets a pellet every time it presses a bar. The experimenter turns off the pellet dispenser. _____________________________________________________________ 8

Suppose a student places a pigeon in a Skinner box, and sets it up so a reinforcer is presented every minute, regardless of what the pigeon does. Here the reinforcer is not contingent on behavior. It is called ___________________________________________reinforcement. ________________________________________________________________ 6

The pigeon in the question above may eventually start exhibiting odd behavior, such as running around in circles or poking his head repeatedly into the same corner. Superstitious conditioning has then taken place. From what you now know about operant conditioning, how might the pigeon happen to run around in circles almost constantly?

_______________________________________________________________ 7

A man may always use the same pen to take exams, because he feels the pen is lucky. The first time he used that pen, he got a B on the exam. The second time he got an A. This is an example of ____________________________________________________________________ 3


3 superstitious conditioning

1 Never give him another pellet for pressing the bar.

2 it decreases; extinction

4 Give him a pellet when he goes close to it, and then only as he gets closer, eventually touches, and finally presses the bar.

6 noncontingent

7 He might have gone around in a circle accidentally the first few times the water was presented, then continued the behavior.

8 b

Usually behavior is not as simple as the events we have been working with. When you leave the house to come to class, for example, you get in your car, fasten your seatbelt, insert the key in the ignition, turn the key, and so on. You go through this sequence, or chain, almost automatically. Experimenters have shown, through experiments with rats and pigeons, that chains can be quite easily taught in the laboratory. Which of the examples sounds like a chain that might be taught in a Skinner box?

a. The rat will pull a string, then press a bar, then get reinforced.
b. The rat will press a bar, get reinforced, then pull the string
c. A pigeon will turn a circle counterclockwise, then get reinforced.
d. A rat goes to the back of the box, picks up a marble, carries it to the front end of the box, drops it down a chute, then presses a bar, and gets reinforced.


The process of chaining is used to condition an animal to perform a sequence of behaviors. Let us take a simple chain and figure out how to condition a rat. We want the rat to press a panel, then pull a string, then press the bar, and finally get reinforced. Chaining always begins with the back end of the chain, so the first behavior to be conditioned would be ___________________________________________________________________4

Once the rat can press the bar, we need to have stimulus control over this response. We want him to press the bar only in the presence of light; that is, we want the presence of the light to be a discriminative stimulus. In terms of reinforcement and extinction, how do you think this might be done?


Now any behavior that turns on the light will automatically be reinforced since the rat will, when the light goes on, press the bar and receive a pellet. The next step we have to establish is pulling the string to turn on the light. What shall we do if the rat shows no inclination to pull the string?


Now the sequence is: pull the string (which turns on the light), then press the bar (which gets the reinforcement). In order to add the panel pressing we must first arrange that the rat pulls the string in only one situation. The simplest is to make the appearance of the string itself be the signal to pull it. That is, the appearance of the string is the


Any behavior that causes the string to appear is going to be reinforced. Refer to the simple chain described above and write what behavior the appearance of the string should be contingent on.


The process of conditioning an animal to perform a sequence or chain of events is called ______________________________________________________________________ 5

In conditioning an animal to perform a chain of events, you always start at the



1 Reinforce small approximations of the string-pulling behavior, using the light as a reinforcer.

2 by reinforcing bar pressing when the light is on, and extinguishing it when the light is off

3 discriminative stimulus

4 bar pressing

5 chaining

7 a, d

8 back end of the chain

9 panel pressing




1. Anything that increases the frequency of the response it follows is a____________________________

2. A Skinner box is:

a. a device for classical conditioning.
b. a device for operant conditioning.
c. a box with something for the animal to "operate" on, and something to provide reinforcement.
d. a device used to measure galvanic skin responses.

3. A conditioning experiment in which the animal is restrained in a harness is most likely an experiment in_________________________________________

4. Suppose an experimenter is trying to condition a rat to press a bar, and the rat goes nowhere near the bar. What should the experimenter do?

5. In which of the following situations might shaping be appropriate?

a. A beginning music student is being taught to play, eventually, a violin concerto.
b. A parent has decided to teach a child to tie his shoes.
c. A rat has been conditioned to press a bar, but, for some reason, does not do so.
d. A little league ballplayer cannot hit the ball out of the infield.

6. How would a parent extinguish a behavior in a child?

7. How could you cause a light to become a discriminative stimulus for the response of bar pressing in a rat?

8. When superstitious conditioning is being done in the laboratory, what type of reinforcement is being used?



