Humans have a long history of creating products that are not aligned with everyone's needs. “Women at the wheel were 47% more likely to be seriously injured in car accidents until 2011, because car manufacturers were not required to use female dummies in crash tests,” explains Tulsee Doshi, lead of Google's machine learning fairness initiative, in a talk from the ACM TechTalks series. Because the dummies were not representative, those responsible for the safety of all drivers did not understand how seat belts and airbags affected a substantial share of them in a collision.
It is not an isolated case. In the fifties, Kodak calibrated the color of its film using reference cards featuring a white model. It took until the nineties for, of all people, chocolate makers and wood manufacturers to complain that the color of their products was not rendered well in photographs. “These two examples are not machine learning, and they are not examples of bad intentions or a desire to discriminate; they are examples of what happens when we design technologies around the people building them. The push to launch something quickly can let unconscious biases and stereotypes seep into our products.”
The flawed dummy standard did not fall from the sky, and neither did the missing colors; the fault lies with the humans who were at the wheel when deciding how things would be done. “Humans are also at the center of the development of machine learning,” says Doshi. And they too can get it wrong with no more intent than having their new product ready as soon as possible.
It happens in the best of families, and in Google's too. One example is Perspective, an API created with the noble goal of promoting healthier conversations online and making content moderation easier. How it works is simple: the system assigns each piece of content a score, close to zero if it is harmless and closer to one if it is toxic.
Doshi gives the example of two possible comments on a picture of a puppy. The comment “What a sweet puppy, I want to hug him forever” gets a score of 0.07. Instead, “This is the worst example of a puppy I've ever seen” reaches 0.84. “It's a nasty and hateful comment,” explains Doshi. But the dog is none the wiser, so it would all have remained an anecdote. The problem came when the development team built a demo and opened it to users. “A user entered two sentences: ‘I'm straight’ and ‘I'm gay’,” recalls the expert. The scores Perspective returned were 0.04 and 0.86, respectively. “Of course, this is a difference we don't want to see in our products. We don't want the presence of an identity term to drastically change the prediction.”
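For a sense of what calling such a service looks like, here is a minimal sketch against Perspective's publicly documented Comment Analyzer endpoint. The endpoint URL, request fields, and response structure reflect the public documentation as best understood here and should be treated as assumptions to verify; the API key is a placeholder.

```python
# Minimal sketch: scoring comments with the Perspective Comment Analyzer API.
# Endpoint and field names follow the public docs; verify before relying on them.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder, obtained from Google Cloud
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def toxicity(text: str) -> float:
    """Return a toxicity score between 0 (harmless) and 1 (toxic)."""
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=body, timeout=10)
    response.raise_for_status()
    result = response.json()
    return result["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

for comment in ["I'm straight", "I'm gay"]:
    print(comment, toxicity(comment))
```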
Where biases are born
In general, putting a machine learning system into production follows a common procedure: collect data, label it, train a model to meet certain objectives, integrate it into a product, and put it in front of users who interact with it. “The interesting thing is that unfairness can enter the system at any point in the process,” says Doshi. Even users can introduce their own biases through the way they use the product.
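As a purely illustrative sketch (the function bodies are toy placeholders, not anything Google ships), the stages of that pipeline, and the points where bias can creep in, might be laid out like this:

```python
# Schematic view of the machine learning pipeline described above,
# with comments marking where bias can enter. All bodies are toy stand-ins.

def collect_data():
    # Bias can enter here: who is represented in the data, and who is missing?
    return ["what a sweet puppy", "I'm gay", "I'm straight"]

def label_data(examples):
    # Bias can enter here: raters bring their own assumptions to the labels.
    return [(text, 0) for text in examples]

def train_model(labelled):
    # Bias can enter here: the objective may reward majority-group accuracy.
    return lambda text: 0.0  # dummy model that scores everything as harmless

def deploy(model, user_inputs):
    # Bias can enter here too: users interact with, and feed back into, the system.
    return {text: model(text) for text in user_inputs}

model = train_model(label_data(collect_data()))
print(deploy(model, ["I'm straight", "I'm gay"]))
```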
“It is very rare to find a single cause or a single solution for these problems, and it is often the way these different causes interact that produces results like the ones we have been discussing,” explains the expert. Two examples are a gender classifier and Google's own translator. The first, whose job was to classify images, produced a higher error rate for black women. In the second, translations from certain languages were problematic: from Turkish, “doctor” was translated into the masculine by default and “nurse” into the feminine.
They are two different problems with two different solutions. In the case of the classifier, the answer was to collect more data on black women to train the model better. For Google Translate, the team looked for a way to give the user as much information as possible: “We decided to show both contexts, both the masculine and the feminine version,” Doshi summarizes. “These two solutions are valuable ways to move the conversation about fairness forward, and they are two ways to make the user experience inclusive and equitable, but they are very different. One approach is more technical and data-driven, and the other takes a product-design perspective.”
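Spotting the classifier problem in the first place depends on measuring performance separately for each group rather than as a single aggregate number. Below is a minimal sketch of that kind of disaggregated evaluation; the field names and toy records are illustrative assumptions, not Google's actual evaluation code.

```python
# Minimal sketch of a disaggregated evaluation: error rates per subgroup
# instead of one overall accuracy figure.
from collections import defaultdict

def error_rates_by_group(records):
    """records: iterable of dicts with 'group', 'label', and 'prediction' keys."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        if r["prediction"] != r["label"]:
            errors[r["group"]] += 1
    return {g: errors[g] / totals[g] for g in totals}

records = [
    {"group": "group_a", "label": 1, "prediction": 1},
    {"group": "group_a", "label": 0, "prediction": 0},
    {"group": "group_b", "label": 1, "prediction": 0},
    {"group": "group_b", "label": 0, "prediction": 0},
]
print(error_rates_by_group(records))  # e.g. {'group_a': 0.0, 'group_b': 0.5}
```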
If the data won't come to Muhammad
In the case of the API that measures the toxicity of content, the path is more winding. They started by collecting more data, through what they called Project Pride: “We went to different pride parades around the world to collect positive comments about and from the LGBTQ community.” Another option would have been to generate synthetic data.
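One common way to produce that kind of synthetic data (described here as an assumption about the general technique, not as the exact method the team used) is to fill sentence templates with different identity terms and then check that the model scores the resulting sentences similarly:

```python
# Sketch of template-based synthetic data: the templates and identity terms
# below are illustrative, not a dataset Google actually used.
TEMPLATES = ["I am {}.", "My friend is {}.", "Being {} is part of who I am."]
IDENTITY_TERMS = ["straight", "gay", "lesbian", "bisexual", "transgender"]

synthetic_sentences = [
    template.format(term)
    for template in TEMPLATES
    for term in IDENTITY_TERMS
]
for sentence in synthetic_sentences[:5]:
    print(sentence)
```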
They also tried to keep the model from taking identity terms into account in its assessments. “For example, if I have the phrase ‘some people are Indian’, I can take the term Indian and replace it with a blank identity token,” explains Doshi. In this way all identities receive the same treatment, but information is also lost. “This can be harmful because it might be useful to know when certain identity terms are used offensively. We have to be careful not to wrongly classify comments as toxic, but we also need to make sure we are not missing comments that really are toxic.”
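A minimal sketch of that masking idea follows; the term list and the placeholder token are illustrative assumptions.

```python
# Sketch of identity masking: identity terms are replaced with a neutral
# placeholder token so all identities receive the same treatment.
import re

IDENTITY_TERMS = ["indian", "gay", "straight", "muslim", "jewish"]
IDENTITY_TOKEN = "[IDENTITY]"

def mask_identities(text: str) -> str:
    pattern = r"\b(" + "|".join(IDENTITY_TERMS) + r")\b"
    return re.sub(pattern, IDENTITY_TOKEN, text, flags=re.IGNORECASE)

print(mask_identities("Some people are Indian"))
# -> "Some people are [IDENTITY]"
```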
Another possible way to address this imbalance is to take into account the differences in the model's performance for the different groups (in the initial example, the gap between the scores obtained for the identities straight and gay) and to introduce a penalty that forces the model to minimize that distance.
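In training terms, such a penalty can be expressed as an extra term added to the loss that grows with the gap between the average scores of two groups. The sketch below uses PyTorch and a squared gap with a tunable weight; both choices are illustrative assumptions, not the exact formulation used for Perspective.

```python
# Sketch of a gap penalty: the training loss is increased when average scores
# for two identity groups drift apart, pushing the model to close the gap.
import torch

def loss_with_gap_penalty(base_loss, scores_group_a, scores_group_b, weight=1.0):
    """base_loss: ordinary task loss; scores_*: model outputs per group."""
    gap = scores_group_a.mean() - scores_group_b.mean()
    return base_loss + weight * gap.pow(2)

# Example with dummy values
base = torch.tensor(0.30)
a = torch.tensor([0.04, 0.10, 0.07])   # e.g. scores for "straight" examples
b = torch.tensor([0.86, 0.70, 0.75])   # e.g. scores for "gay" examples
print(loss_with_gap_penalty(base, a, b).item())
```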
“Each of these approaches can bring a really significant improvement, and you can also see that these improvements are different for different groups,” says the expert. Against this backdrop, she recommends keeping in mind that there are no one-size-fits-all solutions and that many alternatives have pros and cons. “As a result, it is important that we be clear and transparent about the choices we are making.”