Using this example, the British computer scientist Bryce Goodman and the statistician Seth Flaxman illustrate in a technical essay the dilemma inherent in the attractive commercial possibilities of big data: they often operate right at the blurry boundary of illicit discrimination.
Under the European General Data Protection Regulation, health data count as particularly sensitive data, whose commercial processing is permitted only with explicit consent and special safeguards.
In addition, when sensitive data are used, everyone has the right to have the algorithm involved clearly explained, to object to the result, and to demand a decision by a human being.
But when actuaries use such calculation models – also known as algorithms – to help decide who gets insurance at what price, they do not access health data directly. "Either the regulation is interpreted narrowly, covering only the direct use of sensitive data, and then there is no protection against discrimination," Goodman and Flaxman conclude. Or, under a broader interpretation, it also covers the use of data that allows strong inferences about sensitive attributes: "Then the regulation is not practical."
The latter is because newer-generation algorithms can hardly be explained in comprehensible terms. This is all the more true of learning systems that develop the algorithm on their own: then even the developers no longer know exactly which data are weighted and interpreted in which combinations and how.
Risks to companies
The economist Adair Morse of the University of California and the central bank economist Karen Pence examine the problems arising in finance in their article "Technological Innovation and Discrimination in Household Finance".
They highlight the major risks companies face when rules and current technical possibilities drift apart. Just as clear is the great and difficult challenge for legislators and courts to find a workable solution.
The good news: data-driven decisions reduce the scope for the particularly unsavory form of discrimination known in economic jargon as "taste-based discrimination", i.e. discrimination rooted in aversion or prejudice. When decision-makers have less discretion, they can no longer so easily disadvantage people they dislike.
The bad news: the second form, "statistical discrimination", could become more widespread through data-based profiling and decision-making.
Statistical discrimination occurs when, for lack of information about an individual – for example about their creditworthiness or life expectancy – one falls back on the average values of a group to which that person belongs. A woman is then classified differently from a man. This was common practice among life insurers until the courts banned it as discriminatory.
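The mechanism can be made concrete in a few lines. The sketch below uses hypothetical numbers and a made-up scoring function purely for illustration: two applicants whose individual risk is unknown receive different scores solely because their groups' historical averages differ.

```python
# Illustrative sketch of statistical discrimination (hypothetical numbers):
# missing individual information is replaced by a group average.
def estimated_risk(individual_risk, group_avg_risk):
    """Return the individual's known risk, else fall back on the group average."""
    return individual_risk if individual_risk is not None else group_avg_risk

# Historical average default rates per group (invented for illustration)
group_avg = {"A": 0.04, "B": 0.09}

# Neither applicant's individual risk is known, so both get their group's average
score_a = estimated_risk(None, group_avg["A"])  # 0.04
score_b = estimated_risk(None, group_avg["B"])  # 0.09

# Identical individuals, different treatment – purely by group membership
print(score_a == score_b)  # False
```

The point of the sketch: no animus is involved anywhere; the unequal treatment follows mechanically from substituting group statistics for missing individual data.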
As the amount of data grows, it becomes harder for commercial users to avoid this form of discrimination. At the same time, it becomes harder for courts and regulators to prove and punish it.
Statistical discrimination serves to maximize profits – a motive that the law in principle accepts as legitimate. But the limits of what is allowed are blurred. It becomes problematic, for example, if the trust or the higher information costs of disadvantaged groups are exploited to charge them higher prices than others. This can conflict with the rules on equal access to goods and services.
Sellers of cars or loans, for instance, have long used rules of thumb about different groups' ability to compare prices in order to differentiate their pricing. But if algorithms can now calculate from large amounts of data how price-sensitive each individual is, and in principle every buyer can be offered a different price, this takes on a whole new quality.
Discrimination can also occur in an unexpected way, for example if minorities are heavily underrepresented in the data sets used to train artificial intelligence. With facial recognition, this has led to fair-skinned people and men being recognized much more reliably than dark-skinned people and women.
When credit default probabilities are estimated, the small-sample effect means the estimate for minorities is significantly more uncertain. If the algorithm restricts lending to default risks below a threshold, that alone can lead to rejections for the minority, even if loan defaults are no more common in their group.
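This small-sample effect can be illustrated with a short calculation. The sketch below uses invented numbers and a simple conservative rule (observed rate plus roughly two standard errors must stay below the threshold) as one possible stand-in for "risk must be below a threshold with confidence": both groups show the same observed default rate, but the small group is rejected because its estimate is too imprecise.

```python
import math

def conservative_risk(defaults, n, z=1.96):
    """Observed default rate plus a z-standard-error safety margin."""
    p = defaults / n
    se = math.sqrt(p * (1 - p) / n)  # standard error shrinks as n grows
    return p + z * se

threshold = 0.06  # hypothetical maximum acceptable default risk

# Same observed default rate of 5% in both groups, very different sample sizes
majority = conservative_risk(500, 10_000)  # large sample: tight estimate
minority = conservative_risk(5, 100)       # small sample: wide uncertainty

print(majority < threshold)  # True  – majority group passes
print(minority < threshold)  # False – same rate, rejected for uncertainty alone
```

With 10,000 observations the margin around the 5% rate is well under one percentage point; with 100 observations it exceeds four percentage points, pushing the conservative estimate past the threshold even though the underlying rates are identical.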
While the courts could easily recognize and judge differing insurance premiums for men and women, it is hardly possible today to determine whether indirect discrimination based on gender, age or health is taking place, because the algorithms sometimes draw on thousands of features. Even if the protected features themselves are excluded, they are very often captured indirectly – for example because men rarely buy tampons, and medication purchases allow good inferences about a person's age and state of health.
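How a removed protected attribute survives in proxies can be shown with a toy example. The tiny dataset and proxy rule below are invented for illustration: gender is deleted from the inputs, yet a single "neutral" purchase feature reconstructs it perfectly.

```python
# Hypothetical toy data: gender is excluded from the model's inputs,
# but a purchase feature correlates almost perfectly with it.
customers = [
    {"gender": "f", "buys_tampons": True},
    {"gender": "f", "buys_tampons": True},
    {"gender": "m", "buys_tampons": False},
    {"gender": "m", "buys_tampons": False},
]

def inferred_gender(row):
    # A proxy rule a learning system could pick up implicitly
    return "f" if row["buys_tampons"] else "m"

# How well does the proxy reconstruct the "removed" protected attribute?
accuracy = sum(inferred_gender(r) == r["gender"] for r in customers) / len(customers)
print(accuracy)  # 1.0 – the protected feature is fully recoverable
```

In real data the correlation is rarely perfect for any single feature, but with thousands of features combined, the reconstruction can come close – which is exactly why inspecting the input variables alone proves little.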
It is therefore all the more urgent that politicians, regulators and courts take on the task of adapting the relevant laws, rulings and regulatory requirements on discrimination to a new situation in which unequal treatment is baked into the data sets and is "decided" by algorithms that are difficult to understand. "The decisions of the next few years will influence whether discrimination in financial services will become pervasive or not," Morse and Pence warn.
While they offer no fully worked-out solution to this problem, they do indicate where one might lie. They consider the existing test regime – which examines the variables used in a commercial decision for signs of discrimination – to be no longer practical.
Instead, they advocate thinking about an output-oriented model. Under such a model, discrimination would be deemed present whenever the outcome leaves people with a protected characteristic worse off than people without it.
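An output-oriented check of the kind Morse and Pence have in mind might look roughly like the following sketch. The data and the ratio-based comparison are illustrative assumptions, not their proposal: instead of inspecting the model's inputs, one compares realized outcomes between people with and without a protected characteristic.

```python
# Hypothetical decision log: (group, approved?) pairs
decisions = [
    ("protected", True), ("protected", False), ("protected", False),
    ("other", True), ("other", True), ("other", False),
]

def approval_rate(group):
    """Share of approvals among applicants in the given group."""
    outcomes = [approved for g, approved in decisions if g == group]
    return sum(outcomes) / len(outcomes)

# Compare outcomes, not input variables
ratio = approval_rate("protected") / approval_rate("other")
print(round(ratio, 2))  # 0.5 – the protected group is approved half as often
```

Such a check works even when the model is a black box with thousands of features, because it only needs the decisions themselves – which is precisely its appeal when the inputs can no longer be meaningfully audited.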
At the same time, they complain that economic research is not contributing as much to the necessary adjustment as it could. Research centered on the interests of consumers hardly takes place; instead, it focuses almost exclusively on the interests of the data-using companies.
The reason is the power of those who hold the data. "The algorithms are complex, and the data are the property of the technology corporations," they explain, adding: "A lot of research is therefore necessarily done in partnership with them."