How can "algorithmic fairness" become part of machine learning?


The rise of big data and artificial intelligence has brought a lot of convenience to our lives.

When we open a news app, we no longer see the same editor-picked recommendations as everyone else, but news that an AI engine has carefully selected after learning our daily preferences. When we open an e-commerce app, we no longer see what the merchant wants to sell, but what we want to buy. When we open a travel website, we no longer see an overwhelming pile of classic destinations, but travel routes tailored to us.

But this also brings some hidden worries: because everyone sees different content, will AI sell me more expensive products, or push certain opinions on me?

In fact, this is entirely possible. In China there is even a dedicated term for the phenomenon: "big data killing" (using big data to charge loyal customers higher prices).



But algorithms sometimes decide more than the price of goods and the content pushed to us. As artificial intelligence is applied in public domains such as counter-terrorism, taxation, pre-trial assessment, medical care and insurance, its judgments also affect the welfare of each of us.

Such examples have occurred many times around the world. A chat bot launched on Twitter was "taught" by netizens to swear within a day of going online and had to be taken offline; the image recognition feature in Google Photos grouped photos of Black people together with chimpanzees; job sites by default showed female users ads for lower-paying positions than the ones shown to men; and so on.

In the United States, a judge even sentenced a suspect whose only offense was car theft to eight years in prison, simply because COMPAS, the artificial intelligence tool assisting the trial, rated the person "very dangerous".

As we mentioned in the article "What the implications of artificial intelligence in the real world can be brought to the real world," artificial intelligence (whether in reality or in science fiction) always completes its self-construction by rapidly learning from the history of a human society it stands apart from, so it has a built-in "moral flaw."

So, is there a way to embed anti-discrimination—or algorithmic fairness—in the design of machine learning models in reality?

The answer is: it may be feasible.

In a recent Harvard Business Review article, "Make 'Fairness by Design' Part of Machine Learning," four co-authors explain, from the perspective of product construction, some ideas for preventing algorithmic discrimination. The authors are Ahmed Abbasi, Associate Dean of the Center for Business Analytics at the University of Virginia's McIntire School of Commerce; Li Jingjing, Assistant Professor of Information Technology at the same school; Gari Clifford, head of the Department of Biomedical Informatics at Emory University; and Herman Taylor, professor of medicine at the Morehouse School of Medicine and director of its Cardiovascular Research Institute.

Several medical experts are among the authors because these ideas were distilled from the design of an IoT medical platform built in cooperation with the US federal government.

The project collects data through mobile devices and various IoT devices, processes the data with machine learning models to predict stroke and early cardiovascular disease, and helps doctors make diagnoses and judgments.

In the project design, they used these steps to reduce the possibility of algorithmic racial and gender discrimination:

1. Pair data scientists with social scientists

Generally speaking, data scientists and social scientists speak different languages.

For data scientists, "bias" has a specific technical meaning: it refers to the level of segmentation in a classification model.

Similarly, the term "discriminatory potential" refers to the extent to which a model can accurately distinguish between categories of data (e.g., patients with high and low risk of cardiovascular disease).

Achieving greater "discriminatory potential" is a primary goal in data science. When social scientists talk about bias or discriminatory potential, by contrast, they are more likely referring to questions of equity. Social scientists are often better placed to offer a humanistic perspective on fairness and prejudice.
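To make the data scientists' sense of "discriminatory potential" concrete, here is a minimal sketch that scores how well a model separates high-risk from low-risk patients. It assumes the common area-under-the-ROC-curve (AUC) measure; the article does not say which measure the authors used, and the scores and the function name `pairwise_auc` are made up for illustration.

```python
def pairwise_auc(scores_high_risk, scores_low_risk):
    """Probability that a random high-risk patient gets a higher score than a
    random low-risk patient (ties count as half). 0.5 is chance, 1.0 is perfect."""
    wins = 0.0
    for high in scores_high_risk:
        for low in scores_low_risk:
            if high > low:
                wins += 1.0
            elif high == low:
                wins += 0.5
    return wins / (len(scores_high_risk) * len(scores_low_risk))


# Hypothetical model scores, invented for this example.
high_risk_scores = [0.9, 0.8, 0.4]
low_risk_scores = [0.7, 0.3, 0.2]

auc = pairwise_auc(high_risk_scores, low_risk_scores)
print(f"AUC = {auc:.3f}")
```

A high value here says nothing about fairness in the social scientists' sense: a model can separate the two classes well overall and still perform poorly for one population group.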

From the outset, their project made sure to include psychologists, psychometricians, epidemiologists, and people who specialize in the health of different populations. This allowed the whole team to understand, earlier and better, the demographic biases that might creep into the machine learning process.

2. Label carefully

Before a model is built, the raw big data the team obtains is often unstructured, such as long passages of text entered by users or images from imaging examinations.

This unstructured data is first structured and labeled by humans and then used to train machine learning models.

This is very common in machine learning. For example, Google Photos has a page that asks you to help decide whether a picture shows a cat.



In more complex situations, people may have to judge manually which texts express positive emotions and which express negative ones.

Manual labeling services have become a typical business model in the era of big data; many crowdsourcing platforms and outsourcing companies take on the labeling of technology companies' massive data streams.

But because humans carry prejudices rooted in culture, race and religion, those prejudices may be transferred into the structured data during labeling.

In the authors' project, they anticipated that this could bias the final model.

For example, even when two people's actual health levels (the ideal values) are equivalent, a person whose data contains many spelling mistakes and grammatical errors may be given a lower score by some labelers.

This may eventually produce a health prediction model that is biased against people whose writing contains grammatical or spelling errors.

The authors found that one way to reduce this bias is to include modules on potential bias cases in the training given to labelers. In their own project, however, they relied more on self-reported structured data submitted by users, so this problem did not arise, because users do not discriminate against themselves. That said, self-reporting occasionally brings other problems.

3. Combine traditional machine learning metrics with fairness measures

In the past, the quality of a machine learning model was always evaluated with a set of performance-related metrics, such as overall performance, class-level performance, or the model's general applicability.

Introducing fairness measures into model evaluation can correct some of the problems caused by prejudice or discrimination. This is in fact an improvement in model performance: correcting these problems means the model no longer makes very large errors for certain groups, which can improve overall accuracy.

In the authors' project, the researchers examined the model's performance across different population groups as well as the model's underlying assumptions. Important fairness measures include within-group and across-group true/false positive and negative rates, and the degree of dependence on demographic variables.

For groups that currently appear to be treated fairly, if demographic variables carry a large weight relative to other variables and act as the main drivers of predictions, the model may still be susceptible to bias on future data.
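As a rough illustration of the group-level measures described in this section, the sketch below computes true positive and false positive rates separately for each population group and reports the largest gap between groups. The data, the group names "a" and "b", and the helper names `group_rates` and `max_gap` are invented for the example; the article does not describe the authors' actual implementation.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return {group: {"tpr": ..., "fpr": ...}} for binary labels (1 = high risk)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        rates[group] = {
            # None marks a rate that is undefined for this group.
            "tpr": c["tp"] / positives if positives else None,
            "fpr": c["fp"] / negatives if negatives else None,
        }
    return rates


def max_gap(rates, metric):
    """Largest difference in one metric between any two groups."""
    values = [r[metric] for r in rates.values() if r[metric] is not None]
    return max(values) - min(values)


# Toy labels and predictions, invented for this example.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
for name, r in sorted(rates.items()):
    print(name, r)
print("TPR gap:", max_gap(rates, "tpr"))
print("FPR gap:", max_gap(rates, "fpr"))
```

In this toy data the model is right for every member of group "a" but misses half of the high-risk members of group "b", a difference that a single overall accuracy figure would hide.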

4. When sampling, balance representativeness and critical mass constraints [1]

Eliminating irrelevant discriminatory data does not mean ignoring specific extreme cases. In traditional statistical sampling, a sample is generally considered adequate as long as it reflects the characteristics of the entire population being sampled.

One problem with this approach is that it under-represents particular minority groups within the population. On the surface this does not look like a big problem, because the model can still "accurately" predict incidence for the population as a whole. But when applied to individuals in those specific groups, the model's predictions of their incidence will be significantly too high or too low.

In the authors' project, they heavily oversampled certain disease-related population groups, so that the resulting machine learning model gives more accurate answers both when predicting for an "ordinary person" and when predicting for a "special group".
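The idea of oversampling can be sketched as follows. This is a minimal example of naive random oversampling with replacement, which is only one of several possible techniques and not necessarily the one the authors used; the group names, the threshold of 30, and the function `oversample_to_minimum` are assumptions made for illustration.

```python
import random
from collections import Counter


def oversample_to_minimum(records, group_of, min_per_group, seed=0):
    """Duplicate randomly chosen records of small groups until every group
    has at least `min_per_group` records. Larger groups are left unchanged."""
    rng = random.Random(seed)
    by_group = {}
    for record in records:
        by_group.setdefault(group_of(record), []).append(record)

    result = []
    for members in by_group.values():
        result.extend(members)
        shortfall = min_per_group - len(members)
        if shortfall > 0:
            # Sample with replacement from the group's own records.
            result.extend(rng.choices(members, k=shortfall))
    return result


# Hypothetical population: 95 majority records and 5 minority records.
records = [{"group": "majority", "id": i} for i in range(95)]
records += [{"group": "minority", "id": 95 + i} for i in range(5)]

balanced = oversample_to_minimum(records, lambda r: r["group"], min_per_group=30)
counts = Counter(r["group"] for r in balanced)
print(counts)
```

Because duplicated records carry no new information, a model trained this way is usually evaluated on held-out data that was not oversampled.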

5. More important than technical means is staying aware

The article notes that even with the measures above, the possibility of discrimination in model building cannot be completely eliminated. So the authors usually had to pause at various stages of model building and training to check whether potentially discriminatory factors had entered the model.

The authors also mention two ways to correct a model that has become discriminatory. One is to remove all demographic-related information from the training data; the other is to introduce additional fairness measures into machine learning, such as manually increasing or decreasing the importance of minority or edge cases.

In the authors' project, they found that such corrections are very effective for the parts of algorithm training that are susceptible to demographic bias. After this set of rules was implemented, the model's final fairness measures improved significantly, and its overall accuracy also rose by a few percentage points.
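The two corrections mentioned in this section might look roughly like this in code. It is a sketch under stated assumptions: the field names (`race`, `gender`, `systolic_bp`, `resting_hr`) are hypothetical, and inverse-frequency weighting is just one simple way to raise the importance of minority cases, not a method the article attributes to the authors.

```python
from collections import Counter

# Hypothetical demographic field names, chosen only for this example.
DEMOGRAPHIC_FIELDS = frozenset({"race", "gender"})


def drop_demographics(record, fields=DEMOGRAPHIC_FIELDS):
    """Correction 1: return a copy of the record without demographic fields."""
    return {key: value for key, value in record.items() if key not in fields}


def inverse_frequency_weights(groups):
    """Correction 2: weight each example inversely to its group's size, so that
    every group contributes the same total weight during training."""
    counts = Counter(groups)
    total = len(groups)
    n_groups = len(counts)
    return [total / (n_groups * counts[g]) for g in groups]


# Made-up records for illustration.
records = [
    {"systolic_bp": 150, "resting_hr": 82, "gender": "f", "race": "x"},
    {"systolic_bp": 128, "resting_hr": 70, "gender": "m", "race": "y"},
]
features = [drop_demographics(r) for r in records]
print(features)

groups = ["a", "a", "a", "b"]
weights = inverse_frequency_weights(groups)
print(weights)
```

Weights of this kind can be passed to training routines that accept per-example weights. Note that removing demographic fields does not remove other variables that are correlated with them, which is one reason the periodic checks described above remain necessary.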

After Facebook, Google and various other Internet companies were hit by "algorithmic discrimination scandals", a wave of opposition to algorithmic discrimination swept Europe and the United States. Many technologists do not think highly of this movement and see it as "political correctness steering technological innovation".



But in the authors' view, fairness by design is not about putting political correctness ahead of model accuracy. Through careful design and thinking, fairness by design helps developers build more reliable and more accurate models. It gives the machine a deeper understanding of the complexities behind each demographic factor.

Introducing fairness by design does not mean smoothing out the results of machine learning under the principle that "everyone is equal". It means bringing in opposing viewpoints and examining the different stages of the machine learning process from the perspectives of different people, different groups and different social strata.

In the authors' Stroke Belt project, fairness by design enabled them to develop a predictive model with higher overall performance, broader applicability across populations, and greater robustness, allowing the healthcare system to intervene earlier and more accurately with high-risk groups.

Perhaps every algorithm engineer who is still chasing model efficiency and performance should start thinking about bringing fairness by design into their work. It will not only let you build a fairer model, but also help you achieve your original goal: a better model.

[1] "Critical mass" is a term from social dynamics. It describes the point at which something in a social system has gained enough momentum to sustain itself and to fuel its own future growth.

Take a big city as a simple example. If one person stops and looks up at the sky, no one will pay attention, and passers-by will carry on with what they were doing. If three people stop and look up, a few more people may stop to see what they are doing, but will soon go back to their own business. But if the number of people on the street looking up at the sky rises to five to seven, others may join in out of curiosity to see what they are looking at. Those numbers, three people or five, are the critical points of this clustering effect.
