Wake-Up Call - Data Bias and Corporations

Some of you are not going to like this blog post. I reckon my pragmatism on this matter will be thrown to the wolves of optimism and idealism, but I understand that. In fact, I welcome it. This post is not meant in any way to encourage or even justify this behavior. Rather, it is meant to highlight that certain characters in this equation do not have their goals aligned with my crowd, the AI Safety crowd, regardless of their flowery rhetoric.

Corporations exist for one reason: the benefit of their shareholders. They do not exist to make your life better, or easier. They do not exist to employ people. They do not exist to pay taxes or provide a social good. They exist to maximize profits for their shareholders.

Now, sometimes companies will identify social good, community employment, bettering life, and other peripherals as goals, but these are secondary goals, to be pursued only if the cost to the bottom line is sufficiently low. Let me illustrate. Company A makes $1 million in profit and says that it wants to employ people in its community. Every time it hires someone, it costs $100k to do so. Will it hire 11 people? No way. Certainly it won't hire 10, and probably not even 9. Instead it will employ exactly the number of people that helps that $1 million be maintained and grown. Depending upon its outlook on the market, that might mean hiring zero people or 100, but the decision rests on its analysis of profit ONLY. The hiring decision is secondary, a function of the profit math; it does not occur because of the ancillary hiring goal. Any attempt to convince the company of the value of social good will merely be factored into its analysis of how it can maintain and grow that $1 million profit. If that social good carries even a small probability of eroding the $1 million, it will be pushed aside. This is not a debate; there is no "yeah, but". Corporations have a long and distinguished track record of seeking profit first. It's what they were designed for.
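To make that arithmetic concrete, here is a minimal sketch in Python. The cost figure and the diminishing-returns revenue numbers are hypothetical; the point is only that the hire count falls out of the profit math, never out of any employment goal.

```python
# A minimal sketch of the point above: the number hired falls out of
# the profit math alone. All figures are hypothetical.

COST_PER_HIRE = 100_000  # annual cost of one additional employee

def optimal_hires(marginal_revenue_per_hire):
    """Hire only while the revenue an extra employee is expected to add
    exceeds what that employee costs."""
    hires = 0
    for revenue in marginal_revenue_per_hire:
        if revenue <= COST_PER_HIRE:
            break
        hires += 1
    return hires

# A bullish outlook: early hires add a lot of revenue, later ones less.
bullish = [400_000, 250_000, 150_000, 90_000, 40_000]
# A bearish outlook: no hire pays for itself.
bearish = [80_000, 60_000, 40_000]

print(optimal_hires(bullish))  # 3 -- hires beyond the third cost more than they return
print(optimal_hires(bearish))  # 0 -- the community-employment goal never enters the math
```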

With that background, let's talk about data bias. For those unfamiliar, let me start from the dictionary definition of bias:

Bias — Oxford definitions

NOUN

  1. Inclination or prejudice for or against one person or group, especially in a way considered to be unfair.
  2. A concentration on or interest in one particular area or subject.

  3. A systematic distortion of a statistical result due to a factor not allowed for in its derivation.

VERB

  1. Cause to feel or show inclination or prejudice for or against someone or something.

With the increased use of artificial intelligence (AI) and machine learning (ML), companies acquire enormous amounts of data. That data is the input, the fuel, for AI and ML to learn from and to inform their decision-making. Data gathered from a myriad of sources can pick up bias along the way, hence the term data bias. These biases may arise from how the data is collected, organized, interpreted, or implemented. Or the bias may be embedded in the data at the source itself: society. There is plenty of bias in society today, and thus there is plenty of bias in our data sets.
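To see how the collection process alone can distort a result, consider this toy sketch. The groups, the survey split, and the preference rates are all invented for illustration:

```python
# A toy sketch of collection bias. A company surveys widget preferences,
# but its survey channel over-samples group A. All numbers are invented.

true_population = {"group_a": 0.5, "group_b": 0.5}  # the real market split
survey_sample   = {"group_a": 0.9, "group_b": 0.1}  # who actually answered

# Each group prefers design X at a different rate.
prefers_design_x = {"group_a": 0.8, "group_b": 0.2}

def estimated_demand(group_weights):
    """Demand for design X, weighted by each group's share."""
    return sum(group_weights[g] * prefers_design_x[g] for g in group_weights)

print(estimated_demand(survey_sample))    # 0.74 -- the biased estimate
print(estimated_demand(true_population))  # 0.50 -- the real demand
```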

So it is fairly safe to assume that much, if not all, of the data being fed into these systems carries some amount of bias. AI safety advocates like myself claim that companies should monitor for and remove that bias. But here is the problem: from the company's perspective, this endeavor is likely to produce suboptimal results from their AI and ML. Let me explain.

If you are trying to model and analyze the best design for your widget, you will use your ML and AI to identify the key criteria for that product to create or meet demand in the marketplace. The goal is to maximize sales of your widget. If the company takes in data, manipulates it, and during that process introduces bias into a reasonably pure data set, it has likely corrupted its result. That is the kind of data bias a corporation wants to avoid: it reduces profitability.
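Here is a sketch of that kind of self-inflicted corruption, using an invented data set: a routine cleaning step (dropping incomplete records) silently shrinks one group's share of a previously representative sample.

```python
# A sketch of process-induced bias: a routine cleaning step that skews
# a previously representative data set. All records are invented.

records = [
    {"group": "a", "rating": 4.0},
    {"group": "a", "rating": 3.5},
    {"group": "b", "rating": 4.5},
    {"group": "b", "rating": None},  # group b skips this field more often
    {"group": "b", "rating": None},
]

# Naive cleaning: drop anything incomplete.
cleaned = [r for r in records if r["rating"] is not None]

share_before = sum(r["group"] == "b" for r in records) / len(records)
share_after  = sum(r["group"] == "b" for r in cleaned) / len(cleaned)
print(share_before, share_after)  # 0.6 -> 0.33: group b's share shrank
```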

Now, let's assume the same process with no internal corruption: careful, well-intentioned procedures for collection, organization, and input into the models. The data is the data as it was acquired, representative of the marketplace the company is trying to build product for. This process will likely identify the right product for that marketplace, provided the company's models are good. These are procedures a company is likely to adopt; they are in its interest.

Now consider further: what if THAT data is biased? What if the product has elements of bias embedded in it, but it is profitable and meets the corporate sales objectives? What is the company's responsibility here? Has it done something wrong? Is it perpetuating, or even deepening, bias? Activists and purists answer "absolutely, yes". But you can certainly argue the other side: the company is simply meeting the world where it is, to maximize its profits. This is classic goal misalignment. The AI safety crowd knows that removing bias is good for society; the company doesn't include societal goals in its evaluation process.

The AI safety crowd has two options: argue that the removal of bias will result in MORE profit, or argue that the company should sacrifice profit (or at least forgo potential profit) by limiting the spread of bias. The former is difficult to prove. The AI safety crowd should be looking for this argument, because it is the ultimate winning argument (the proverbial win-win). However, I am skeptical that we can find it. It doesn't make intuitive sense that a data set describing the marketplace, once altered to remove the bias of that marketplace, will somehow be BETTER tailored to sell.

So that leaves us pleading the case for social good, likely at the expense of profit. That concept has significant implications for the development of AI and ML at the corporate level:

  1. There is little or no incentive for a company to identify embedded, societal bias in advance.
  2. The identification of bias would require the devotion of resources, diverted from the more obvious mission.
  3. The company would be responsible for adjusting/accounting for the bias, and that requires a moral judgement that corporations are rarely equipped to make.
  4. The search for bias is neither easy nor obvious; even a proactive company, willing to sacrifice profit, will find it a daunting challenge.
  5. Bias can remain hidden for long periods of time.

Given those challenges, I suspect that while companies may pay lip service to the problem of bias, they are more likely to avoid the issue and focus on their substantial business challenges.

Only a regime of truly independent audit, with pre-established rules to identify and remedy bias, can succeed in removing it from a company or a data set. There are simply too many hurdles, with too poorly defined benefits, for a company to readily pursue a course of sufficient protection against bias on its own. That is not to say there won't be times, markets, or even whole companies that identify benefits from the removal of bias, because there may be. But across the board, with limited resources and internal expertise, it is highly unlikely that the majority of corporations will prioritize removing bias.
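As an illustration of what one pre-established audit rule might look like, here is a minimal sketch of the "four-fifths" disparate-impact test used in some U.S. hiring audits. The outcome counts below are hypothetical:

```python
# A minimal sketch of a pre-established audit rule: the "four-fifths"
# disparate-impact test. The outcome counts below are hypothetical.

def disparate_impact_ratio(selected, total):
    """Ratio of the lowest group selection rate to the highest."""
    rates = {g: selected[g] / total[g] for g in total}
    return min(rates.values()) / max(rates.values())

# Hypothetical model outcomes handed to an independent auditor.
selected = {"group_a": 40, "group_b": 12}
total    = {"group_a": 100, "group_b": 100}

ratio = disparate_impact_ratio(selected, total)
print(f"impact ratio: {ratio:.2f}")  # 0.30
print("flag for remedy" if ratio < 0.8 else "passes the rule")
```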

Remember, this is not a justification or endorsement of bias, but rather a wake-up call for the AI Safety movement. This isn't going to be easy. We must be proactive to remedy bias. We must be creative. It is our responsibility to change the will of the marketplace to root out bias from our society (our data) as well as from our corporations. Only when companies' profits are reduced for dealing in biased products, or better yet increased for actively pursuing the removal of bias, will we actually see substantial gains against bias in all its forms.