SmartPups: The Nitty Gritty of Clean Training, Part 2: Reinforcement Schedules

As someone who uses reward based training, I hear this one all the time: "but if I train with treats, my dog will only ever listen when I have treats!"
That could be true- if you never change your reinforcement schedule and never fade out food treats. If you work with a trainer who understands reinforcement schedules and how to use them to fade out food treats and fine tune behavior, this is never a problem.

I will fight my natural tendency to give way more information than is necessary in this post, but this is one of those topics that gets a bit tricky so I use extra words to explain and re-explain myself. I apologize in advance for all the repetition.

A reinforcement schedule is a rule or pre-set program that determines how and when a response will be rewarded. Different stages of learning use different reinforcement schedules- when learning a new behavior we reward differently than when strengthening or proofing a known behavior. We also use different reinforcement schedules for different reasons in training, depending on the behavior we are encouraging or discouraging.

Look at you, learning about training!

Easy, painless, and you are still awake.

Let's dive in to the good stuff!

When I begin training a new behavior or cue with a dog, we start with a Continuous Reinforcement Schedule, which is abbreviated as CRF. In this type of reinforcement, the dog gets a reward every time they offer the desired behavior. We use this when teaching new behaviors because we want the dog to learn that the new behavior is a really great thing to do- it always gets attention and a treat! If your dog gets a treat, pat on the head, and an enthusiastic "good boy!" every time he sits, he's going to try sitting more often. This comes in handy when we build behaviors on top of each other, because they always have a strong base behavior to fall back on if there is a regression in training. Regression can happen because training suddenly stops for a period of time or because of a change in environment/stimuli. When using a CRF, it is important to only use it until the dog understands the behavior, then switch to a less predictable schedule (this is also when we begin to give different rewards based on the quality of response, but that's covered below in Differential Reinforcement Schedules).

Partial Reinforcement Schedules (PRF) reward the desired response only after certain responses, either after a set ratio (number of responses) or interval (period of time). We can use these schedules to fine tune behavior once the basics are understood.
Within this schedule, there are five different types of reinforcement:

Fixed Ratio: The dog gets a reward after a predetermined number of responses. For example, you can train your dog to "count" using this method by rewarding after say, three barks and labeling it "count to three". This could be done with any number, of course!
Variable Ratio: The dog gets rewarded after a different number of responses each time, but the average number of responses per reward is determined by you. If you want an average of three responses, you would reward after: 1, 4, 2, 5, 3. The average of these is 3. This is what I use to start fading out treats in training the 'heel'. At first, the dog is rewarded every step for staying in the 'heel' position. As they get better at staying in position in anticipation of treats, the treats are given after one step, four steps, two steps, five steps, three steps. They are getting rewarded on average every three steps, but it's not always three exactly and they are getting fewer treats than in initial training. Over time, we simply make the average a bigger number.
Random Ratio: This is the other way to build strong behaviors. In random ratio, the dog gets a reward sometimes, but not other times. It should be as random as possible. Truly random rewarding is hard for us people to wrap our heads around; we try to make patterns so it makes sense in our minds. Dogs are great at figuring out patterns, so they soon learn if we are actually making a pattern and predict it. This can be used in training the 'heel' just like the above example, but we would want to keep it random, instead of aiming for an average number of steps.
Fixed Interval: The dog gets a reward only when the behavior is offered after a set period of time has elapsed since the previous response. This is something that we don't really use much in training because it actually isn't terribly useful in most training situations. The idea is that a dog offers a behavior, like 'sit', and gets a reward. There would be a predetermined interval, let's say 4 seconds, that the dog needs to wait until it can offer the 'sit' again and get a reward. If they sit at 1, 2, or 3 seconds, they get no reward. Any 'sit' after 4 seconds gets a reward. Over time, responses on the part of the dog go up because they know they have to offer the behavior to get a reward. It's a fun thing to do, but really has little real use in day to day training. The problem is that a dog can get distracted and forget to respond with the correct behavior in that interval, so we can't effectively train anything that's well remembered.
Variable Interval: Just like variable ratio above, this is a reward for different responses, averaging a number you have picked. The difference is that this is rewarding for a period of time instead of a number of responses. The dog would get a reward for the correct behavior after a period of time has elapsed, but that interval of time will vary within an average. Like fixed interval this can result in a steady string of responses, but since the response is dependent on the animal offering it, can be tricky to use in training.
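
If you like seeing things written out, here is a rough Python sketch of the variable ratio and random ratio ideas above. It is only an illustration, not a training tool: the function names, the `spread` setting, and the idea of giving every response the same fixed chance of a reward for random ratio are my own assumptions.

```python
import random

def variable_ratio_schedule(average, count, spread=2, seed=None):
    """Return `count` response requirements (each >= 1) whose mean is exactly `average`."""
    if average - spread < 1:
        raise ValueError("spread is too large: every requirement must be at least 1")
    rng = random.Random(seed)
    schedule = []
    # Build mirrored pairs (average - d, average + d) so the mean never drifts.
    for _ in range(count // 2):
        d = rng.randint(0, spread)
        schedule += [average - d, average + d]
    if count % 2:
        schedule.append(average)  # an odd leftover slot gets the average itself
    rng.shuffle(schedule)  # hide the pairing so the order is hard to predict
    return schedule

def random_ratio_reward(probability, rng=random):
    """Random ratio (assumed form): every correct response has the same chance of a reward."""
    return rng.random() < probability

# Example: five 'heel' rewards that average out to one treat every three steps.
steps_between_treats = variable_ratio_schedule(average=3, count=5, seed=1)
```

To fade treats further, you would call `variable_ratio_schedule` again with a bigger `average`, which matches the "make the average a bigger number" step described above.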

When using a Differential Reinforcement Schedule, rewards are given after certain types of responses are offered or after certain rates of response are offered. Basically, this means that the dog gets a reward based on the quality of their response or the frequency of offering the correct response. This is what we use to fine tune behaviors, to build complex behaviors, or work with especially nervous, anxious, or reactive dogs.

1. Response Type schedules are simply the quality of the response- a 'down' with the belly all the way on the ground is preferred over a 'down' with the belly tucked up and not touching the ground.

Within this, there are three types of schedules which we use to get the desired behavior and remove unwanted behaviors:

a. Differential Reinforcement of Incompatible Behaviors (DRI): A dog who jumps to greet people will be rewarded for any behavior that they can't do while jumping. Sitting, lying down, or simply standing would all be considered incompatible behaviors. These incompatible behaviors become more rewarding than the problem behavior (jumping).
b. Differential Reinforcement of Other Behaviors (DRO): A dog who barks at passers-by on walks can be rewarded for doing anything that is not barking. These other behaviors become more rewarding than barking, so the barking starts to diminish.

c. Differential Reinforcement of Excellent Behaviors (DRE): A dog who perfectly heels on command when asked the first time, then sits in position when the handler comes to a stop would get a reward because that is an ideal response. We tend to reward these great responses a bit longer because they are the ultimate goal and we want them to become the normal. By rewarding these great behaviors, all others extinguish themselves.

2. Response Rate Schedules are ones that require a dog to respond at a certain rate for that reward. The reward is based on the offering of the correct behavior within the correct time period. Much like fixed interval and variable interval training, these aren't of as much use in dog training, but here you go anyway.

Differential Reinforcement of High Rates (DRH): A dog is only rewarded for offering the 'look' behavior if it occurs within 7 seconds of the previous response. If the dog looks at 1, 2, 3, 4, 5, 6, or 7 seconds, they get a reward. If it is more than 7 seconds, they get no reward. This is used to build a steady stream of responses.
Differential Reinforcement of Low Rates (DRL): A dog is only rewarded for offering the 'look' behavior after a specified period of time has elapsed, let's say 7 seconds. Any look after 7 seconds gets a reward, anything before 7 seconds does not.
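
The two response rate rules boil down to a simple timing check, which can be sketched in Python like this. The function names are made up for illustration, and how to treat a 'look' at exactly 7 seconds is my assumption: here it counts as inside the DRH window and as too early for DRL.

```python
def drh_reward(seconds_since_last, limit=7.0):
    """DRH: reward only a response that comes within `limit` seconds of the previous one."""
    return seconds_since_last <= limit

def drl_reward(seconds_since_last, wait=7.0):
    """DRL: reward only a response that comes after more than `wait` seconds have passed."""
    return seconds_since_last > wait

# Gaps (in seconds) between one 'look' and the next.
gaps = [2, 7, 9]
drh_results = [drh_reward(g) for g in gaps]  # quick looks earn the reward
drl_results = [drl_reward(g) for g in gaps]  # only the slow look earns the reward
```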

A Duration Reinforcement Schedule requires the dog to respond throughout a set period of time; these periods of time can be fixed, variable, or random. The classic example for this is the 'stay' cue. A dog is asked to hold the stay position for a period of time. Initially in training, we work with a short period of time and build it up gradually to longer duration and out of sight stays.

Fixed Duration: The dog has to stay for 1 minute to get a reward. If they get up before that minute is up, there is no reward.
Variable Duration: The dog has to stay for an average of 1 minute to get the reward. This is the best way to lengthen the duration of a stay because you are on average staying within the time period you know the dog will tolerate, but can gradually increase the duration by increasing the average.
Random Duration: The dog is asked to stay for random periods of time, rewarded only if they do so. This is a great way to lengthen duration also, because the dog can't predict how long you will be gone. If we simply leave for longer each time, the dog predicts that the time period will be longer, since they are good at putting together patterns.
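
For the variable duration 'stay', here is one possible way to plan a session in Python. Treat it as a sketch under my own assumptions: the function name, the `wiggle` setting (how far a single stay may drift from the average, as a fraction), and the 30/45/60 second progression are all invented for the example.

```python
import random

def variable_stay_durations(average_seconds, count, wiggle=0.5, seed=None):
    """Return `count` stay lengths (in seconds) whose mean is `average_seconds`.

    Offsets are drawn within +/- `wiggle` and then re-centered, so a single
    stay can land a little outside that band, but the average stays on target.
    """
    if not 0 <= wiggle <= 0.5:
        raise ValueError("wiggle must be between 0 and 0.5 to keep every stay positive")
    rng = random.Random(seed)
    offsets = [rng.uniform(-wiggle, wiggle) for _ in range(count)]
    mean_offset = sum(offsets) / count
    # Subtract the mean offset so the offsets cancel out overall.
    return [average_seconds * (1 + o - mean_offset) for o in offsets]

# Lengthen the stay gradually by raising the average from session to session.
plan = {avg: variable_stay_durations(avg, count=6, seed=avg) for avg in (30, 45, 60)}
```

Each session mixes shorter and longer stays, so the dog cannot count on every stay being longer than the last one, while the average still creeps up.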

Still awake? Good job, you're almost done!

So what does all that mean? It means that you can fine tune and train different behaviors using different reinforcement schedules. Within this, you can even give different types of rewards based on responses (more on that another day).
There are three lessons I want you to come away with from this:
1. There is a strong relationship between continuous reinforcement and degradation of behavior even before the food is faded. If a behavior is always followed by a treat, over time the dog has no motivation to offer the behavior quickly or perfectly. If a behavior is always followed by a treat and the treats suddenly stop, the behavior stops too because the behavior is no longer paying off as it had been! Dogs who are on a continuous reinforcement schedule too long end up with sloppy or slow behaviors and behave like spoiled children, demanding things they want.
2. Random and variable reinforcement result in the strongest behaviors, with a much lower incidence of the behavior extinguishing as rewards fade. If a behavior is always rewarded initially and then randomly or variably rewarded, there is still always the possibility of a reward, so the behavior continues with the same strength. This is how a slot machine works. The machines pay out on a variable or random schedule, so it is very difficult to predict exactly when they will. The longer you keep putting coins in, the more convinced you become that it will pay off next time.
3. It is very difficult for us humans to be truly random, which is why we tend to use variable rates of reinforcement in training. That way, your human need for some order is met and your dog is still not getting rewarded every single time, so we still get strong behaviors.

The real point in telling you all of this, aside from giving you great reading material for your next bout of insomnia or a new drinking game (count how many times the word reinforcement is in here) is to demonstrate that the person who trains you and your dog should know a LOT about learning and training. It's not just a matter of tossing a collar on a dog and grabbing some treats- my 3 year old son can do that. It's not a matter of putting a pinch, choke, or prong collar on your dog and yanking him around to demonstrate "who is boss". Training and subsequent learning should be intentional, systematic, soundly based in science and well executed. There should be some room for flexibility with each individual dog/human pair and nobody should be pushed to the point of breaking or shutting down in training. Once you reach that point, nothing good is being taught.

Resources:
Excel-Erated Learning; Explaining in Plain English How Dogs Learn and How Best to Teach Them by Pam Reid, pgs. 48-59

http://www.lifecircles-inc.com/Learningtheories/behaviorism/Skinner.html

http://www.educateautism.com/applied-behaviour-analysis/schedules-of-reinforcement.html

Sunday, July 10, 2016
