Posted on Sep 21, 2017 by Glyn Moody

Opening the black boxes: algorithmic bias and the need for accountability




Here on Privacy News Online we’ve written a number of stories about the privacy implications of DNA. There’s an important case going through the Californian courts at the moment that involves DNA and privacy, but whose ramifications go far beyond those issues:

“In this case, a defendant was linked to a series of rapes by a DNA matching software program called TrueAllele. The defendant wants to examine how TrueAllele takes in a DNA sample and analyzes potential matches, as part of his challenge to the prosecution’s evidence. However, prosecutors and the manufacturers of TrueAllele’s software argue that the source code is a trade secret, and therefore should not be disclosed to anyone.”

The Electronic Frontier Foundation (EFF) points out that there are two big problems here. One is the basic right of somebody accused of a crime to be able to examine and challenge the evidence that is being used against them. In this case, that’s not possible, because the manufacturer of the TrueAllele software is unwilling to allow the source code that determines whether or not there is a DNA match to be released. Particularly egregious is the fact that the company is claiming that its right to maintain a supposed trade secret outweighs the accused’s right to a fair trial.

But beyond that issue, there is another that is certain to have a big impact on the world of privacy. It involves the increasing use of algorithms to make judgements about us. An algorithm is just a fancy way of saying a set of rules, usually implemented as software encoding mathematical equations. The manufacturer's refusal to release TrueAllele's source code is therefore a refusal to permit the accused in the Californian case to examine and challenge the algorithmic rules that are being applied.

If this position is allowed to stand, we run the risk of turning algorithms into black boxes whose results we are forced to accept, but whose workings we may not query. In particular, we won’t know what personal information has been used in the decision-making process, and thus how our privacy is being affected.

It’s not just outright errors in rules that are a problem. As a recent article in MIT Technology Review pointed out, even more insidious, because more subtle, is the presence of algorithmic bias:

“Algorithmic bias is shaping up to be a major societal issue at a critical moment in the evolution of machine learning and AI. If the bias lurking inside the algorithms that make ever-more-important decisions goes unrecognized and unchecked, it could have serious negative consequences, especially for poorer communities and minorities. The eventual outcry might also stymie the progress of an incredibly useful technology.”

Algorithmic bias can enter systems in two main ways. One is through the algorithm’s basic rules, which may contain incorrect assumptions that skew the output results. Another is through the use of biased training data. Many of the latest algorithm-based systems draw their power from being trained on large holdings of real-world data. This allows hidden patterns to be detected and used for future analysis of new data. But if the training data has inherent biases, those will be propagated into the algorithm’s output.
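As a toy illustration of the second route, here is a deliberately naive sketch (hypothetical labels and data, not any real system): a model that simply memorizes the majority label per group will faithfully reproduce whatever skew its training data contains.

```python
from collections import Counter

def train(examples):
    """'Train' a naive model: memorize the majority label for each feature value."""
    by_feature = {}
    for feature, label in examples:
        by_feature.setdefault(feature, []).append(label)
    return {f: Counter(labels).most_common(1)[0][0]
            for f, labels in by_feature.items()}

def predict(model, feature, default="high_risk"):
    # Unseen feature values fall back to a default -- itself a bias choice.
    return model.get(feature, default)

# Skewed training data: group "B" appears only in negative examples,
# so the model learns that membership in "B" alone predicts risk.
training = [
    ("A", "low_risk"), ("A", "low_risk"), ("A", "high_risk"),
    ("B", "high_risk"), ("B", "high_risk"),
]

model = train(training)
print(predict(model, "A"))  # low_risk
print(predict(model, "B"))  # high_risk -- inherited from the skewed samples
print(predict(model, "C"))  # high_risk -- the default applied to an unseen group
```

Nothing in the code is "wrong" in the engineering sense; the unfairness lives entirely in the data it was given, which is exactly what makes it hard to spot from the outside.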

Even though algorithmic systems are being rolled out rapidly, and across a wide range of sectors, people are only beginning to grapple with the deep problems they can bring, and to try to come up with solutions. For example, AlgorithmWatch is:

“a non-profit initiative to evaluate and shed light on algorithmic decision making processes that have a social relevance, meaning they are used either to predict or prescribe human action or to make decisions automatically.”

Its algorithmic decision making (ADM) manifesto states: “The fact that most ADM procedures are black boxes to the people affected by them is not a law of nature. It must end.” Another initiative helping to open up those black boxes is AI Now, one of whose aims is tackling the problem of data bias:

“Data reflects the social and political conditions in which it is collected. AI is only able to “see” what is in the data it’s given. This, along with many other factors, can lead to biased and unfair outcomes. AI Now researches and measures the nature of such bias, how bias is defined and by whom, and the impact of such bias on diverse populations.”

There’s already a book on the issues raised by algorithmic bias, called “Weapons of Math Destruction”, whose author now heads up a new company working in this field, O’Neil Risk Consulting & Algorithmic Auditing (ORCAA). As well as auditing algorithms, ORCAA also offers risk evaluation. That’s an important point. Inscrutable algorithms are not just a problem for the people whose lives they may affect dramatically – as the case in California makes plain. They may also lead to costly legal action against companies whose algorithms turn out to contain unsuspected biases that have resulted in erroneous or unfair decisions. The sooner we come up with a legal framework allowing or even requiring the outside review of key algorithms, the better it will be for the public, for companies, and for society as a whole.

Featured image by shahzairul.

About Glyn Moody

Glyn Moody is a freelance journalist who writes and speaks about privacy, surveillance, digital rights, open source, copyright, patents and general policy issues involving digital technology. He started covering the business use of the Internet in 1994, and wrote the first mainstream feature about Linux, which appeared in Wired in August 1997. His book, “Rebel Code,” is the first and only detailed history of the rise of open source, while his subsequent work, “The Digital Code of Life,” explores bioinformatics – the intersection of computing with genomics.




1 Comment

  1. Ash

    AI is inherently biased. All AI algorithms can be considered compression algorithms: input data, with multiple attributes, exists in a high-dimensional space and is compressed into a set of transformed data points for human interpretation. All compression algorithms mangle and lose information. In order to be efficient, these algorithms will mash together two different parts of their input data space (a part with guilty attributes and a part with innocent attributes) that should not logically be connected. This is an unacceptable error, but since there is an insignificant number of training data points in both of these parts, the algorithm deems the error acceptable. A visualization of this issue appears in the machine learning paper titled ‘Visual Word Ambiguity’, which delves into visual recognition:
    https://ivi.fnwi.uva.nl/isis/publications/2010/vanGemertTPAMI2010/thumbnail.jpg
    To make the point accessible to the lay person, consider the problem of deciding whether a shark you encounter is lethal. Aquatic creatures have many attributes: size, weight, color, genus, habitat (and more). There are several types of sharks, and they have different sets of attributes as well. But if your algorithm has only ever been trained on data from a few white sharks acquired from lethal encounters, then it will mash all those shark attributes together. Then, given data of a new shark sighting, it will always predict that the new shark is lethal, regardless of its attributes. Even if it’s a tiny baby shark, AI will always say: “You need a bigger boat!”
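The shark scenario can be sketched in a few lines of Python (hypothetical measurements, and a deliberately naive one-nearest-neighbour classifier): when every training example carries the same label, every query gets that label too.

```python
import math

# Hypothetical training set: only great whites from lethal encounters.
# Features: (length in metres, weight in kg)
training = [
    ((4.5, 1100.0), "lethal"),
    ((5.0, 1300.0), "lethal"),
    ((4.8, 1200.0), "lethal"),
]

def nearest_label(point):
    """1-nearest-neighbour: return the label of the closest training example."""
    _, label = min(training, key=lambda ex: math.dist(ex[0], point))
    return label

# A tiny baby shark is nothing like the training examples, yet the
# only label the model has ever seen is "lethal", so that is its answer.
print(nearest_label((0.3, 2.0)))  # lethal
```

The model is working exactly as designed; the problem is that "closest training example" is meaningless when the training set covers only one corner of the space.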
    This issue is not isolated to AI; it ails human intelligence as well, where it’s called stereotyping.
    The most pertinent point here is that it is notoriously difficult to conclusively prove that an AI algorithm is wrong. The defendant can theoretically acquire counter-examples to substantiate his point. Indeed, if the AI algorithm were re-trained with this new data, it would, being unprejudiced, acquit the defendant. But typically this data isn’t available.
    So the defendant must quantify the error or bias in the AI algorithm and prove it is unfair. The problem here is that the methods for measuring bias, like cross-validation, are themselves entirely dependent on the available data. While they help identify a poorly trained AI algorithm, they can only do so up to the quality of the data available to them.
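That limitation is easy to demonstrate (again with the hypothetical one-sided shark data): leave-one-out cross-validation reports perfect accuracy, because every held-out example looks just like the rest of the biased data.

```python
import math

# The same one-sided, hypothetical training set as above.
data = [
    ((4.5, 1100.0), "lethal"),
    ((5.0, 1300.0), "lethal"),
    ((4.8, 1200.0), "lethal"),
]

def nearest_label(train, point):
    """1-nearest-neighbour prediction from a given training set."""
    _, label = min(train, key=lambda ex: math.dist(ex[0], point))
    return label

# Leave-one-out cross-validation: hold out each example in turn,
# "train" on the rest, and check whether the prediction matches.
correct = 0
for i, (point, label) in enumerate(data):
    held_out = data[:i] + data[i + 1:]
    if nearest_label(held_out, point) == label:
        correct += 1

print(f"cross-validated accuracy: {correct}/{len(data)}")  # 3/3 -- looks perfect
```

The validation score is flawless precisely because the test data carries the same bias as the training data, which is the commenter's point: the measurement tool cannot see past the data it is given.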
    In conclusion, even if TrueAllele’s algorithm were made available to the defendant, they could not conclusively prove their innocence. This is a very big problem with AI, which needs to address the issue of inherent bias and develop good ways to quantify it.

    3 months ago