Algorithmic bias is a popular topic; see, for example, this article describing how Microsoft is working on a dashboard product to detect unfair bias in algorithms. When a typical person (not a statistician) uses the term “bias,” they usually have in mind unfair prejudgment, or stacking of the deck, against a person based on some aspect of that person’s identity (race, gender, ethnic background, religion, nationality, etc.). Until recently, “bias” meant something very different to statisticians.
Statistical bias is the tendency of an estimator, algorithm, or other system (it could be a physical system like a gun) to miss a target systematically in a particular direction; for example, in the plot on the right, the shots nearly always land above the negatively sloped diagonal, while the target is at (0, 0).
One example of statistical bias is the standard formula for variance. Suppose you have a randomly drawn sample of homes. Common sense tells us that the average home value in the sample is a good estimate of the average home value in the larger population. One sample might be too low, another too high, due to the luck of the draw, but, on balance, the sample average will tend toward the population average. This is true of the average (mean). It is not true of the variance, however, if you compute it as the simple average of squared deviations from the sample mean, that is, dividing by n: that version of the sample variance systematically underestimates the true variance of the population; it is biased. Only if you divide the sum of squared deviations by n-1 is the sample variance unbiased.
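To see the bias numerically, here is a minimal simulation sketch in Python (the population parameters, sample size, and number of simulations below are arbitrary choices for illustration): repeatedly draw small samples, compute the variance with divisor n and with divisor n-1, and compare the long-run averages to the true population variance.

```python
import numpy as np

rng = np.random.default_rng(0)

true_var = 4.0      # population variance (standard deviation = 2)
n = 10              # small samples make the bias easy to see
n_sims = 100_000    # number of simulated samples

biased, unbiased = [], []
for _ in range(n_sims):
    sample = rng.normal(loc=100, scale=np.sqrt(true_var), size=n)
    dev_sq = (sample - sample.mean()) ** 2
    biased.append(dev_sq.sum() / n)          # divide by n: biased low
    unbiased.append(dev_sq.sum() / (n - 1))  # divide by n-1: unbiased

print(f"true variance:           {true_var:.3f}")
print(f"mean of divide-by-n:     {np.mean(biased):.3f}")    # about (n-1)/n * 4 = 3.6
print(f"mean of divide-by-(n-1): {np.mean(unbiased):.3f}")  # about 4.0
```

The divide-by-n average comes out around 3.6, i.e., (n-1)/n times the true variance of 4, while the n-1 version lands near 4: the systematic shortfall is the bias.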
Of course, this statistical definition has nothing to do with “fairness.”
Now, the popular conception of bias as involving unfairness to a group of people based on identity is increasingly making its way into the fields of statistics and machine learning; consider this article by Matthias Spielkamp published in the MIT Technology Review: Inspecting Algorithms for Bias.
This sort of algorithmic bias falls into several categories:
- One category is what the courts term “disparate impact,” where an algorithm produces different results for one group compared to another, even though the algorithm does not explicitly base decisions on group identity; results end up differing by group for causes that are inextricably linked to other factors. For example, home ownership rates are linked to neighborhood crime rates, and Hispanics have lower home ownership rates, so any algorithm making predictions (say, for burglary insurance) on the basis of home ownership will appear to discriminate against Hispanics even though the predictions of exposure to burglary may be accurate (see the first code sketch after this list).
- A related category is where an algorithm may not explicitly take group membership into account, and yet the predictions end up being inaccurate in ways that are unfair to certain groups. ProPublica analyzed the COMPAS system for predicting recidivism (commission of additional crimes after conviction for an initial crime) and found that it incorrectly flagged Black defendants as future recidivists at a higher rate than white defendants (the second code sketch after this list shows this kind of per-group error-rate check).
- A third category is the “black-box” problem, where a machine learning system masks the roles played by the predictors, so you may not know whether or how group membership plays a role in the predictions. As with categories 1 and 2, the predictions might be accurate or inaccurate, but the lack of explainability lowers confidence and hampers further analysis and improvement of the algorithm (one possible diagnostic appears in the third code sketch after this list).
- A fourth category of algorithmic bias in supervised predictive modeling arises when the training data has been labeled by humans whose judgment is biased, so that bias is built into the models from the beginning. This is less of a problem when the label is fact-based (the batter got a hit or not) rather than judgment-based (how elegantly the ballerina danced) [1].
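To make the first (disparate impact) category concrete, here is a minimal, hypothetical simulation sketch in Python. The groups, home-ownership rates, and burglary risks below are invented for illustration; the point is that a model whose only input is home ownership can be accurate for each group and yet produce different average predictions by group, because that input is correlated with group membership.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Two hypothetical groups; group B has a lower home-ownership rate (invented numbers).
group = rng.choice(["A", "B"], size=n)
own_rate = np.where(group == "A", 0.70, 0.45)
owns_home = rng.random(n) < own_rate

# In this toy world, burglary risk depends only on home ownership
# (owners tend to live in lower-crime neighborhoods).
true_risk = np.where(owns_home, 0.02, 0.06)
burgled = rng.random(n) < true_risk

# A model that uses only home ownership, never group membership.
predicted_risk = np.where(owns_home, 0.02, 0.06)

for g in ["A", "B"]:
    m = group == g
    print(f"group {g}: mean predicted risk = {predicted_risk[m].mean():.4f}, "
          f"observed burglary rate = {burgled[m].mean():.4f}")
```

Group B ends up with a higher average predicted risk than group A even though the predictions track the observed burglary rates for both groups.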
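For the second category, the kind of check ProPublica ran can be sketched as a per-group error-rate audit. The data below are synthetic and hypothetical (they are not the COMPAS data), with a skew deliberately built into the scores for group B so the audit has something to detect; the code only illustrates the mechanics of comparing false positive rates across groups.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Synthetic data: group labels, actual recidivism outcomes, and a risk score
# that is deliberately shifted upward for group B.
group = rng.choice(["A", "B"], size=n)
reoffended = rng.random(n) < 0.30
score = rng.random(n) + np.where(group == "B", 0.15, 0.0)
flagged_high_risk = score > 0.70

for g in ["A", "B"]:
    # False positive rate: share flagged high risk among those who did NOT reoffend.
    did_not_reoffend = (group == g) & ~reoffended
    fpr = flagged_high_risk[did_not_reoffend].mean()
    print(f"group {g}: false positive rate = {fpr:.3f}")
```

Here group B shows a markedly higher false positive rate than group A; that asymmetry in errors, rather than overall accuracy, is what the ProPublica analysis highlighted.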
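For the third (black-box) category, one possible diagnostic (not mentioned in the original post, just a common technique) is permutation importance: shuffle one predictor at a time on held-out data and see how much accuracy drops. The sketch below assumes scikit-learn and uses invented features, including a hypothetical proxy variable correlated with group membership, to show how such a probe can reveal what is driving an otherwise opaque model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 5_000

# Invented features: a proxy correlated with group membership, plus two noise variables.
proxy = rng.normal(size=n)
noise1 = rng.normal(size=n)
noise2 = rng.normal(size=n)
X = np.column_stack([proxy, noise1, noise2])
y = (proxy + 0.3 * rng.normal(size=n)) > 0   # outcome driven mostly by the proxy

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An opaque model: its internals do not directly show each feature's role.
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn on held-out data and record the drop in accuracy.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, drop in zip(["proxy", "noise1", "noise2"], result.importances_mean):
    print(f"{name}: mean accuracy drop when shuffled = {drop:.3f}")
```

A large accuracy drop for the proxy and near-zero drops for the noise variables would suggest the black box is leaning almost entirely on the proxy, which is the kind of insight the lack of explainability otherwise hides.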
Cathy O’Neil has an engaging discussion of these and related issues in her provocative book Weapons of Math Destruction.
[1] Editor’s Note: This bias was suggested by Dr. John Elder.