It will soon be rockfish season again, and even though most of us are better at talking about the fish we caught than we are at catching them, let’s take a moment to discuss a vital data science practice — avoiding reinforcement bias — that will help us improve our catch out on the water.
Imagine that a group of biologists has asked you to survey a lake’s fish population for a particular disease. You row out to the center of the lake and start catching fish, recording whether each one has the disease before throwing it back. Before long, though, you notice a problem: many of the fish are ones you have caught before. Several possible causes come to mind. Perhaps the disease makes fish easier to catch? Maybe this spot on the lake attracts diseased fish? Could there be fewer fish in the lake than you originally thought? Whatever the cause, you must ask yourself whether fishing in this one spot gives you an adequate picture of the overall fish population, or whether you should move your boat to other locations. And since bait is finite, you need to make your casts count. Where should you focus your efforts to find fish with the disease most efficiently?
Using predictive analytics, you can first build a model from previous samples to highlight the locations on the lake with the highest probability of holding sick fish. With the model’s results in hand, you then row from spot to spot and repeat the same process (catch, identify, release), except that now the results from each spot become new input for your next model run. This feedback loop lets the model’s predictive power improve from cycle to cycle.
Understanding Reinforcement Bias
Soon your original problem returns, but on a larger scale: your model repeatedly sends you to the locations that have been paying off, so you keep sampling an increasingly static group of fish. Is your model pointing you toward these spots for any reason other than past success? The feedback loop, while critical to the model’s accuracy, has introduced a reinforcement bias: each cycle’s successes reinforce the next cycle’s predictions, much as rewards do in reinforcement learning. Models that learn from their own feedback loops are especially vulnerable to this hidden bias.
A feedback-enabled predictive model typically starts with a small sample of the population of interest. Outcomes are labeled as successes or failures for model training, and the results of acting on the model’s predictions (the feedback) become inputs for future model runs. A model does not “know” why a given subject is of interest (good or bad, positive or negative, and so on); it simply matches each subject against past “interesting” cases as best it can. This cycle of 1) model prediction, 2) action on the cases of interest, and 3) learning from feedback repeats throughout the model’s lifecycle.
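To make the three-step cycle concrete, here is a minimal Python simulation of it. This is a sketch under stated assumptions, not a real modeling pipeline: the “model” is simply each spot’s observed disease rate, spots with no data score 0 because the model has never seen them, and `run_greedy_loop`, `true_rates`, and `initial_sample` are hypothetical names fed with made-up inputs.

```python
import random


def run_greedy_loop(true_rates, initial_sample, cycles, picks_per_cycle,
                    fish_per_visit=20, seed=0):
    """Greedy predict -> act -> learn loop over fishing spots (illustrative sketch).

    true_rates: hidden chance that a fish at each spot is diseased (made-up values).
    initial_sample: {spot: (sick, caught)} from the small starting sample.
    Returns how many times each spot was visited.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    sick = [0] * n
    caught = [0] * n
    for spot, (s, c) in initial_sample.items():
        sick[spot], caught[spot] = s, c
    visits = [0] * n
    for _ in range(cycles):
        # 1) Model prediction: observed disease rate; spots with no data score 0.
        scores = [sick[i] / caught[i] if caught[i] else 0.0 for i in range(n)]
        # 2) Take action on the top-scoring spots only.
        ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
        for spot in ranked[:picks_per_cycle]:
            visits[spot] += 1
            # 3) Learn from feedback: every labeled fish updates the counts.
            for _ in range(fish_per_visit):
                caught[spot] += 1
                if rng.random() < true_rates[spot]:
                    sick[spot] += 1
    return visits


rates = [0.2, 0.3, 0.1, 0.6, 0.05]    # spot 3 is the real hotspot (made up)
start = {0: (2, 10), 1: (1, 10)}      # the initial sample covered spots 0 and 1 only
print(run_greedy_loop(rates, start, cycles=10, picks_per_cycle=2))
# [10, 10, 0, 0, 0]
```

Because spots 0 and 1 came from the initial sample and keep returning positive feedback, they absorb every visit, while spot 3, the actual hotspot in this made-up setup, keeps a score of 0 and is never visited.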
Feedback-enabled models are often very effective at identifying high-interest cases, but they have drawbacks. The model can only make predictions based on what it has seen, so other interesting cases can exist without ever being detected. In our fishing example, there may be prime casting spots we never visited simply because our model didn’t have enough information to describe them adequately. Often, the model’s prediction for a subject rests largely on that subject’s outcome in a previous cycle, which leads to repeated scrutiny of the same cases. Over-scrutinizing some cases means neglecting others, and the result is an ever-growing blind spot in the predictive results.
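One way to watch for that blind spot is to log which subjects were reviewed in each cycle and track how much of the population has ever been examined. The sketch below assumes a hypothetical log format (a list of per-cycle lists of integer subject ids) and a hypothetical `coverage_report` helper; a coverage curve that flattens while cycles keep running is one warning sign of reinforcement bias.

```python
def coverage_report(history, population_size):
    """Return (coverage after each cycle, ids never reviewed).

    history: list of per-cycle lists of reviewed subject ids, where ids are
    assumed to be the integers 0..population_size-1 (hypothetical log format).
    """
    seen = set()
    curve = []
    for reviewed in history:
        seen.update(reviewed)
        # Fraction of the population reviewed at least once so far.
        curve.append(len(seen) / population_size)
    never_reviewed = [i for i in range(population_size) if i not in seen]
    return curve, never_reviewed


curve, blind_spot = coverage_report([[0, 1], [0, 1], [0, 2]], population_size=5)
print(curve, blind_spot)  # [0.4, 0.4, 0.6] [3, 4]
```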
Managing Reinforcement Bias
The reinforcement a feedback loop provides is effective at first, but its benefit peters out as an area becomes “overfished.” To manage reinforcement bias, apply the following techniques over the model’s lifecycle:
Understand What You’re Looking For
A model does not understand why a given subject is or is not of interest, that is, why it is labeled 1 or 0. This is often a big advantage, because a model can discover patterns a human expert never thought of or considered. On the other hand, an expert can hypothesize new ideas (or types of “interestingness”) that have not yet shown up often enough in past data for the model to pick up on. The domain expert and the data scientist should spend time brainstorming such cases before the model is deployed. Subsequent examination of the results may lead to further refinement and expansion of the model’s inputs.
Random Selection During Model Runs
To allow for the chance of finding new fish, spend at least some time casting in new places, even ones that look unpromising at first. Random selection offers the best chance of finding unpredicted (but interesting) cases. Policy should therefore require that in every model cycle, a randomly selected subset of the population (say 10%) receives post-modeling scrutiny, preferably from a domain expert. The results of this review, whether interesting or not, feed back into the model and improve its predictive power.
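A minimal sketch of that policy follows, assuming the model’s output is a dict of scores keyed by subject id. The function name and parameters are hypothetical; `random_frac=0.10` mirrors the “say 10%” above and is read here as a fraction of the whole population, drawn uniformly from the subjects the model did not already pick.

```python
import random


def select_for_review(scores, top_k, random_frac=0.10, seed=None):
    """Return (model_picks, random_picks) for one review cycle.

    scores: {subject_id: model score}; higher means more likely to be of interest.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    model_picks = ranked[:top_k]
    # Everything the model passed over is eligible for random review.
    remaining = ranked[top_k:]
    n_random = min(len(remaining), max(1, round(random_frac * len(scores))))
    random_picks = rng.sample(remaining, n_random)
    return model_picks, random_picks


scores = {spot: spot / 50 for spot in range(50)}   # made-up scores
model_picks, random_picks = select_for_review(scores, top_k=5, seed=7)
print(model_picks)        # [49, 48, 47, 46, 45]
print(len(random_picks))  # 5
```

The random picks are reviewed regardless of their scores, and their labels, positive or negative, go back into training; that is what gives the model data on spots it would otherwise never visit.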
Heuristic Subject Selection
Don’t exclude any region for too long. Subjects that have gone a set number of cycles without review should eventually receive human review. Also look for subjects at the extremes of any dimension (variable) as candidates for examination. Reinforcement bias is most common in cyclical (recurring) review processes, so track which subjects have been vetted and when, and consider augmenting the predictive model with a heuristic that targets unvetted cases. This helps balance two goals that pull against each other: model accuracy and search coverage.
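The two heuristics above can be sketched as a function that runs alongside the model each cycle. The names, the `last_reviewed` and `features` dictionaries, and the `max_gap` and `n_extreme` parameters are all hypothetical, and “extreme” is assumed here to mean the lowest and highest values of each variable.

```python
def heuristic_picks(last_reviewed, current_cycle, max_gap, features, n_extreme=1):
    """Return sorted ids of subjects to review regardless of model score.

    last_reviewed: {subject: cycle of its last human review, or None if never}.
    features: {subject: {variable_name: numeric value}}.
    """
    # Rule 1: no subject goes more than max_gap cycles without review.
    stale = {s for s, last in last_reviewed.items()
             if last is None or current_cycle - last >= max_gap}
    # Rule 2: the n_extreme lowest and highest subjects on each variable.
    extreme = set()
    if n_extreme > 0:
        variables = {v for row in features.values() for v in row}
        for v in variables:
            ranked = sorted((s for s in features if v in features[s]),
                            key=lambda s: features[s][v])
            extreme.update(ranked[:n_extreme])
            extreme.update(ranked[-n_extreme:])
    return sorted(stale | extreme)


last = {"a": 9, "b": 5, "c": None, "d": 8}
depth = {"a": {"depth_m": 3.0}, "b": {"depth_m": 10.0},
         "c": {"depth_m": 5.0}, "d": {"depth_m": 1.0}}
print(heuristic_picks(last, current_cycle=10, max_gap=3, features=depth))
# ['b', 'c', 'd']
```

In this sketch the returned spots would be reviewed in addition to the model’s top picks, trading some review effort for coverage: spots “b” and “c” are overdue, while “d” and “b” sit at the shallow and deep extremes.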
Conclusion
Repeated sampling in search of interesting cases, and the resulting refinement of the guiding predictive models, creates a positive feedback loop that is powerful but may “peter out” as the initially promising area of the lake becomes overfished. Data scientists must stay vigilant about this reinforcement bias and intentionally make what appear to be suboptimal choices in order to introduce fresh information into the loop. Work closely with domain experts and front-line analysts to understand the details of the problem they are working on, and produce a model, and a modeling process, that will remain operationally effective over many cycles of use.